Commit 10fb619d3e0618528b7ac6c20cad6262020cf947
1 parent
f3d1138b
Split documentation into multiple pages, change theme
Showing 16 changed files with 6263 additions and 6261 deletions
TODO
| @@ -30,8 +30,6 @@ Before release: | @@ -30,8 +30,6 @@ Before release: | ||
| 30 | I can do about, and it doesn't seem worth fixing. Maybe mention it | 30 | I can do about, and it doesn't seem worth fixing. Maybe mention it |
| 31 | somewhere? | 31 | somewhere? |
| 32 | * README-maintainer: Fix installation of documentation to website | 32 | * README-maintainer: Fix installation of documentation to website |
| 33 | -* Get navigation working properly | ||
| 34 | -* Figure out where to put :ref:`search` so we get doc search | ||
| 35 | 33 | ||
| 36 | Soon: | 34 | Soon: |
| 37 | 35 |
manual/acknowledgement.rst
0 → 100644
| 1 | +.. _acknowledgments: | ||
| 2 | + | ||
| 3 | +Acknowledgment | ||
| 4 | +============== | ||
| 5 | + | ||
| 6 | +QPDF was originally created in 2001 and modified periodically between | ||
| 7 | +2001 and 2005 during my employment at `Apex CoVantage | ||
| 8 | +<http://www.apexcovantage.com>`__. Upon my departure from Apex, the | ||
| 9 | +company graciously allowed me to take ownership of the software and | ||
| 10 | +continue maintaining it as an open source project, a decision for which I | ||
| 11 | +am very grateful. I have made considerable enhancements to it since | ||
| 12 | +that time. I feel fortunate to have worked for people who would make | ||
| 13 | +such a decision. This work would not have been possible without their | ||
| 14 | +support. |
manual/cli.rst
0 → 100644
| 1 | +.. _ref.using: | ||
| 2 | + | ||
| 3 | +Running QPDF | ||
| 4 | +============ | ||
| 5 | + | ||
| 6 | +This chapter describes how to run the qpdf program from the command | ||
| 7 | +line. | ||
| 8 | + | ||
| 9 | +.. _ref.invocation: | ||
| 10 | + | ||
| 11 | +Basic Invocation | ||
| 12 | +---------------- | ||
| 13 | + | ||
| 14 | +When running qpdf, the basic invocation is as follows: | ||
| 15 | + | ||
| 16 | +:: | ||
| 17 | + | ||
| 18 | + qpdf [ options ] { infilename | --empty } outfilename | ||
| 19 | + | ||
| 20 | +This converts PDF file :samp:`infilename` to PDF file | ||
| 21 | +:samp:`outfilename`. The output file is functionally | ||
| 22 | +identical to the input file but may have been structurally reorganized. | ||
| 23 | +Also, orphaned objects will be removed from the file. Many | ||
| 24 | +transformations are available as controlled by the options below. In | ||
| 25 | +place of :samp:`infilename`, the parameter | ||
| 26 | +:samp:`--empty` may be specified. This causes qpdf to | ||
| 27 | +use a dummy input file that contains zero pages. The only normal use | ||
| 28 | +case for using :samp:`--empty` would be if you were | ||
| 29 | +going to add pages from another source, as discussed in :ref:`ref.page-selection`. | ||
| 30 | + | ||
| 31 | +If :samp:`@filename` appears as a word anywhere in the | ||
| 32 | +command-line, it will be read line by line, and each line will be | ||
| 33 | +treated as a command-line argument. Leading and trailing whitespace is | ||
| 34 | +intentionally not removed from lines, which makes it possible to handle | ||
| 35 | +arguments that start or end with spaces. The :samp:`@-` | ||
| 36 | +option allows arguments to be read from standard input. This allows qpdf | ||
| 37 | +to be invoked with an arbitrary number of arbitrarily long arguments. It | ||
| 38 | +is also very useful for avoiding having to pass passwords on the command | ||
| 39 | +line. Note that the :samp:`@filename` can't appear in | ||
| 40 | +the middle of an argument, so constructs such as | ||
| 41 | +:samp:`--arg=@option` will not work. You would have to | ||
| 42 | +include the argument and its options together in the arguments file. | ||
| 43 | + | ||
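The arguments-file mechanism described above can be sketched in a few lines of Python. This is only an illustration: the helper name `write_args_file` and the file name `qpdf-args.txt` are hypothetical, and the sketch assumes the file is written as UTF-8 with one argument per line. It does not run qpdf; it only prepares the file and the `@filename` word to put on the command line.

```python
import os
import tempfile


def write_args_file(path, args):
    """Write one argument per line and return the @filename word for qpdf."""
    for arg in args:
        if "\n" in arg or "\r" in arg:
            raise ValueError("an argument cannot span several lines")
    # newline="" writes each line exactly as given; nothing is trimmed,
    # so leading and trailing spaces in an argument are kept.
    with open(path, "w", encoding="utf-8", newline="") as handle:
        for arg in args:
            handle.write(arg + "\n")
    return "@" + path


if __name__ == "__main__":
    directory = tempfile.mkdtemp()
    args_path = os.path.join(directory, "qpdf-args.txt")
    # The whole --password=... option goes on one line of the file.
    word = write_args_file(args_path, ["--password=my secret ", "in.pdf", "out.pdf"])
    print(["qpdf", word])
```

Because the option and its value share one line, this avoids the unsupported `--arg=@option` form while still keeping the password off the command line.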
| 44 | +:samp:`outfilename` does not have to be seekable, even | ||
| 45 | +when generating linearized files. Specifying ":samp:`-`" | ||
| 46 | +as :samp:`outfilename` means to write to standard | ||
| 47 | +output. If you want to overwrite the input file with the output, use the | ||
| 48 | +option :samp:`--replace-input` and omit the output file | ||
| 49 | +name. You can't specify the same file as both the input and the output. | ||
| 50 | +If you do this, qpdf will tell you about the | ||
| 51 | +:samp:`--replace-input` option. | ||
| 52 | + | ||
| 53 | +Most options require an output file, but some testing or inspection | ||
| 54 | +commands do not. These are specifically noted. | ||
| 55 | + | ||
| 56 | +.. _ref.exit-status: | ||
| 57 | + | ||
| 58 | +Exit Status | ||
| 59 | +~~~~~~~~~~~ | ||
| 60 | + | ||
| 61 | +The exit status of :command:`qpdf` may be interpreted as | ||
| 62 | +follows: | ||
| 63 | + | ||
| 64 | +- ``0``: no errors or warnings were found. The file may still have | ||
| 65 | + problems qpdf can't detect. If | ||
| 66 | + :samp:`--warning-exit-0` was specified, exit status 0 | ||
| 67 | + is used even if there are warnings. | ||
| 68 | + | ||
| 69 | +- ``2``: errors were found. qpdf was not able to fully process the | ||
| 70 | + file. | ||
| 71 | + | ||
| 72 | +- ``3``: qpdf encountered problems that it was able to recover from. In | ||
| 73 | + some cases, the resulting file may still be damaged. Note that qpdf | ||
| 74 | + still exits with status ``3`` if it finds warnings even when | ||
| 75 | + :samp:`--no-warn` is specified. With | ||
| 76 | + :samp:`--warning-exit-0`, warnings without errors | ||
| 77 | + exit with status 0 instead of 3. | ||
| 78 | + | ||
| 79 | +Note that :command:`qpdf` never exits with status ``1``. | ||
| 80 | +If you get an exit status of ``1``, it was something else, like the | ||
| 81 | +shell not being able to find or execute :command:`qpdf`. | ||
| 82 | + | ||
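A script that wraps qpdf might translate the exit statuses listed above as in the following sketch. The function name and the returned labels are made up for illustration; only the numeric meanings come from the list above. Values other than 0, 1, 2, and 3 are not covered by the text, so the sketch reports them as unknown.

```python
def classify_exit_status(status):
    """Map a qpdf exit status to a short label (labels are illustrative)."""
    if status == 0:
        # Also used for warnings-only runs when --warning-exit-0 is given.
        return "ok"
    if status == 2:
        return "errors"
    if status == 3:
        # Warnings were found, even if --no-warn hid them from stderr.
        return "warnings"
    if status == 1:
        # qpdf itself never exits with 1; the shell or launcher failed.
        return "not-from-qpdf"
    return "unknown"


if __name__ == "__main__":
    for code in (0, 1, 2, 3):
        print(code, classify_exit_status(code))
```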
| 83 | +.. _ref.shell-completion: | ||
| 84 | + | ||
| 85 | +Shell Completion | ||
| 86 | +---------------- | ||
| 87 | + | ||
| 88 | +Starting in qpdf version 8.3.0, qpdf provides its own completion support | ||
| 89 | +for zsh and bash. You can enable bash completion with :command:`eval | ||
| 90 | +$(qpdf --completion-bash)` and zsh completion with | ||
| 91 | +:command:`eval $(qpdf --completion-zsh)`. If | ||
| 92 | +:command:`qpdf` is not in your path, you should invoke it | ||
| 93 | +above with an absolute path. If you invoke it with a relative path, it | ||
| 94 | +will warn you, and the completion won't work if you're in a different | ||
| 95 | +directory. | ||
| 96 | + | ||
| 97 | +qpdf will use ``argv[0]`` to figure out where its executable is. This | ||
| 98 | +may produce unwanted results in some cases, especially if you are trying | ||
| 99 | +to use completion with a copy of qpdf that is built from source. You can | ||
| 100 | +specify a full path to the qpdf you want to use for completion in the | ||
| 101 | +``QPDF_EXECUTABLE`` environment variable. | ||
| 102 | + | ||
| 103 | +.. _ref.basic-options: | ||
| 104 | + | ||
| 105 | +Basic Options | ||
| 106 | +------------- | ||
| 107 | + | ||
| 108 | +The following options are the most common ones and perform commonly | ||
| 109 | +needed transformations. | ||
| 110 | + | ||
| 111 | +:samp:`--help` | ||
| 112 | + Display command-line invocation help. | ||
| 113 | + | ||
| 114 | +:samp:`--version` | ||
| 115 | + Display the current version of qpdf. | ||
| 116 | + | ||
| 117 | +:samp:`--copyright` | ||
| 118 | + Show detailed copyright information. | ||
| 119 | + | ||
| 120 | +:samp:`--show-crypto` | ||
| 121 | + Show a list of available crypto providers, each on a line by itself. | ||
| 122 | + The default provider is always listed first. See :ref:`ref.crypto` for more information about crypto | ||
| 123 | + providers. | ||
| 124 | + | ||
| 125 | +:samp:`--completion-bash` | ||
| 126 | + Output a completion command you can eval to enable shell completion | ||
| 127 | + from bash. | ||
| 128 | + | ||
| 129 | +:samp:`--completion-zsh` | ||
| 130 | + Output a completion command you can eval to enable shell completion | ||
| 131 | + from zsh. | ||
| 132 | + | ||
| 133 | +:samp:`--password={password}` | ||
| 134 | + Specifies a password for accessing encrypted files. To read the | ||
| 135 | + password from a file or standard input, you can use | ||
| 136 | + :samp:`--password-file`, added in qpdf 10.2. Note | ||
| 137 | + that you can also use :samp:`@filename` or | ||
| 138 | + :samp:`@-` as described above to put the password in | ||
| 139 | + a file or pass it via standard input, but you would do so by | ||
| 140 | + specifying the entire | ||
| 141 | + :samp:`--password={password}` | ||
| 142 | + option in the file. Syntax such as | ||
| 143 | + :samp:`--password=@filename` won't work since | ||
| 144 | + :samp:`@filename` is not recognized in the middle of | ||
| 145 | + an argument. | ||
| 146 | + | ||
| 147 | +:samp:`--password-file={filename}` | ||
| 148 | + Reads the first line from the specified file and uses it as the | ||
| 149 | + password for accessing encrypted files. | ||
| 150 | + :samp:`{filename}` | ||
| 151 | + may be ``-`` to read the password from standard input. Note that, in | ||
| 152 | + this case, the password is echoed and there is no prompt, so use with | ||
| 153 | + caution. | ||
| 154 | + | ||
| 155 | +:samp:`--is-encrypted` | ||
| 156 | + Silently exit with status 0 if the file is encrypted or status 2 if | ||
| 157 | + the file is not encrypted. This is useful for shell scripts. Other | ||
| 158 | + options are ignored if this is given. This option is mutually | ||
| 159 | + exclusive with :samp:`--requires-password`. Both this | ||
| 160 | + option and :samp:`--requires-password` exit with | ||
| 161 | + status 2 for non-encrypted files. | ||
| 162 | + | ||
| 163 | +:samp:`--requires-password` | ||
| 164 | + Silently exit with status 0 if a password (other than as supplied) is | ||
| 165 | + required. Exit with status 2 if the file is not encrypted. Exit with | ||
| 166 | + status 3 if the file is encrypted but requires no password or the | ||
| 167 | + correct password has been supplied. This is useful for shell scripts. | ||
| 168 | + Note that any supplied password is used when opening the file. When | ||
| 169 | + used with a :samp:`--password` option, this option | ||
| 170 | + can be used to check the correctness of the password. In that case, | ||
| 171 | + an exit status of 3 means the file works with the supplied password. | ||
| 172 | + This option is mutually exclusive with | ||
| 173 | + :samp:`--is-encrypted`. Both this option and | ||
| 174 | + :samp:`--is-encrypted` exit with status 2 for | ||
| 175 | + non-encrypted files. | ||
| 176 | + | ||
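The exit statuses of :samp:`--is-encrypted` and :samp:`--requires-password` differ from the general ones, so a wrapper script may want a separate lookup. The sketch below is a minimal illustration with hypothetical function names and labels; it only encodes the statuses stated in the two option descriptions above and returns ``None`` for anything else.

```python
def interpret_is_encrypted(status):
    """Interpret the exit status of `qpdf --is-encrypted FILE`."""
    return {0: "encrypted", 2: "not-encrypted"}.get(status)


def interpret_requires_password(status):
    """Interpret the exit status of `qpdf --requires-password FILE`."""
    return {
        0: "password-required",  # a password other than the supplied one is needed
        2: "not-encrypted",
        3: "opens-as-is",        # no password needed, or the supplied one is correct
    }.get(status)


if __name__ == "__main__":
    print(interpret_is_encrypted(0))
    print(interpret_requires_password(3))
```

When a :samp:`--password` option is supplied alongside :samp:`--requires-password`, the `opens-as-is` result is the one that confirms the password works.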
| 177 | +:samp:`--verbose` | ||
| 178 | + Increase verbosity of output. For now, this just prints some | ||
| 179 | + indication of any file that it creates. | ||
| 180 | + | ||
| 181 | +:samp:`--progress` | ||
| 182 | + Indicate progress while writing files. | ||
| 183 | + | ||
| 184 | +:samp:`--no-warn` | ||
| 185 | + Suppress writing of warnings to stderr. If warnings were detected and | ||
| 186 | + suppressed, :command:`qpdf` will still exit with exit | ||
| 187 | + code 3. See also :samp:`--warning-exit-0`. | ||
| 188 | + | ||
| 189 | +:samp:`--warning-exit-0` | ||
| 190 | + If warnings are found but no errors, exit with exit code 0 instead of 3. | ||
| 191 | + When combined with :samp:`--no-warn`, the effect is | ||
| 192 | + for :command:`qpdf` to completely ignore warnings. | ||
| 193 | + | ||
| 194 | +:samp:`--linearize` | ||
| 195 | + Causes generation of a linearized (web-optimized) output file. | ||
| 196 | + | ||
| 197 | +:samp:`--replace-input` | ||
| 198 | + If specified, the output file name should be omitted. This option | ||
| 199 | + tells qpdf to replace the input file with the output. It does this by | ||
| 200 | + writing to | ||
| 201 | + :file:`{infilename}.~qpdf-temp#` | ||
| 202 | + and, when done, overwriting the input file with the temporary file. | ||
| 203 | + If there were any warnings, the original input is saved as | ||
| 204 | + :file:`{infilename}.~qpdf-orig`. | ||
| 205 | + | ||
| 206 | +:samp:`--copy-encryption=file` | ||
| 207 | + Encrypt the file using the same encryption parameters, including user | ||
| 208 | + and owner password, as the specified file. Use | ||
| 209 | + :samp:`--encryption-file-password` to specify a | ||
| 210 | + password if one is needed to open this file. Note that copying the | ||
| 211 | + encryption parameters from a file also copies the first half of | ||
| 212 | + ``/ID`` from the file since this is part of the encryption | ||
| 213 | + parameters. | ||
| 214 | + | ||
| 215 | +:samp:`--encryption-file-password=password` | ||
| 216 | + If the file specified with :samp:`--copy-encryption` | ||
| 217 | + requires a password, specify the password using this option. Note | ||
| 218 | + that only one of the user or owner password is required. Both | ||
| 219 | + passwords will be preserved since QPDF does not distinguish between | ||
| 220 | + the two passwords. It is possible to preserve encryption parameters, | ||
| 221 | + including the owner password, from a file even if you don't know the | ||
| 222 | + file's owner password. | ||
| 223 | + | ||
| 224 | +:samp:`--allow-weak-crypto` | ||
| 225 | + Starting with version 10.4, qpdf issues warnings when requested to | ||
| 226 | + create files using RC4 encryption. This option suppresses those | ||
| 227 | + warnings. In future versions of qpdf, qpdf will refuse to create | ||
| 228 | + files with weak cryptography when this flag is not given. See :ref:`ref.weak-crypto` for additional details. | ||
| 229 | + | ||
| 230 | +:samp:`--encrypt options --` | ||
| 231 | + Causes generation of an encrypted output file. Please see :ref:`ref.encryption-options` for details on how to specify | ||
| 232 | + encryption parameters. | ||
| 233 | + | ||
| 234 | +:samp:`--decrypt` | ||
| 235 | + Removes any encryption on the file. A password must be supplied if | ||
| 236 | + the file is password protected. | ||
| 237 | + | ||
| 238 | +:samp:`--password-is-hex-key` | ||
| 239 | + Overrides the usual computation/retrieval of the PDF file's | ||
| 240 | + encryption key from user/owner password with an explicit | ||
| 241 | + specification of the encryption key. When this option is specified, | ||
| 242 | + the argument to the :samp:`--password` option is | ||
| 243 | + interpreted as a hexadecimal-encoded key value. This only applies to | ||
| 244 | + the password used to open the main input file. It does not apply to | ||
| 245 | + other files opened by :samp:`--pages` or other | ||
| 246 | + options or to files being written. | ||
| 247 | + | ||
| 248 | + Most users will never have a need for this option, and no standard | ||
| 249 | + viewers support this mode of operation, but it can be useful for | ||
| 250 | + forensic or investigatory purposes. For example, if a PDF file is | ||
| 251 | + encrypted with an unknown password, a brute-force attack using the | ||
| 252 | + key directly is sometimes more efficient than one using the password. | ||
| 253 | + Also, if a file is heavily damaged, it may be possible to derive the | ||
| 254 | + encryption key and recover parts of the file using it directly. To | ||
| 255 | + expose the encryption key used by an encrypted file that you can open | ||
| 256 | + normally, use the :samp:`--show-encryption-key` | ||
| 257 | + option. | ||
| 258 | + | ||
| 259 | +:samp:`--suppress-password-recovery` | ||
| 260 | + Ordinarily, qpdf attempts to automatically compensate for passwords | ||
| 261 | + specified in the wrong character encoding. This option suppresses | ||
| 262 | + that behavior. Under normal conditions, there are no reasons to use | ||
| 263 | + this option. See :ref:`ref.unicode-passwords` for a | ||
| 264 | + discussion. | ||
| 265 | + | ||
| 266 | +:samp:`--password-mode={mode}` | ||
| 267 | + This option can be used to fine-tune how qpdf interprets Unicode | ||
| 268 | + (non-ASCII) password strings passed on the command line. With the | ||
| 269 | + exception of the :samp:`hex-bytes` mode, these only | ||
| 270 | + apply to passwords provided when encrypting files. The | ||
| 271 | + :samp:`hex-bytes` mode also applies to passwords | ||
| 272 | + specified for reading files. For additional discussion of the | ||
| 273 | + supported password modes and when you might want to use them, see | ||
| 274 | + :ref:`ref.unicode-passwords`. The following modes | ||
| 275 | + are supported: | ||
| 276 | + | ||
| 277 | + - :samp:`auto`: Automatically determine whether the | ||
| 278 | + specified password is a properly encoded Unicode (UTF-8) string, | ||
| 279 | + and transcode it as required by the PDF spec based on the type of | ||
| 280 | + encryption being applied. On Windows starting with version 8.4.0, | ||
| 281 | + and on almost all other modern platforms, incoming passwords will | ||
| 282 | + be properly encoded in UTF-8, so this is almost always what you | ||
| 283 | + want. | ||
| 284 | + | ||
| 285 | + - :samp:`unicode`: Tells qpdf that the incoming | ||
| 286 | + password is UTF-8, overriding whatever its automatic detection | ||
| 287 | + determines. The only difference between this mode and | ||
| 288 | + :samp:`auto` is that qpdf will fail with an error | ||
| 289 | + message if the password is not valid UTF-8 instead of falling back | ||
| 290 | + to :samp:`bytes` mode with a warning. | ||
| 291 | + | ||
| 292 | + - :samp:`bytes`: Interpret the password as a literal | ||
| 293 | + byte string. For non-Windows platforms, this is what versions of | ||
| 294 | + qpdf prior to 8.4.0 did. For Windows platforms, there is no way to | ||
| 295 | + specify strings of binary data on the command line directly, but | ||
| 296 | + you can use the :samp:`@filename` option to do it, | ||
| 297 | + in which case this option forces qpdf to respect the string of | ||
| 298 | + bytes as provided. This option will allow you to encrypt PDF files | ||
| 299 | + with passwords that will not be usable by other readers. | ||
| 300 | + | ||
| 301 | + - :samp:`hex-bytes`: Interpret the password as a | ||
| 302 | + hex-encoded string. This provides a way to pass binary data as a | ||
| 303 | + password on all platforms including Windows. As with | ||
| 304 | + :samp:`bytes`, this option may allow creation of | ||
| 305 | + files that can't be opened by other readers. This mode affects | ||
| 306 | + qpdf's interpretation of passwords specified for decrypting files | ||
| 307 | + as well as for encrypting them. It makes it possible to specify | ||
| 308 | + strings that are encoded in some manner other than the system's | ||
| 309 | + default encoding. | ||
| 310 | + | ||
| 311 | +:samp:`--rotate=[+|-]angle[:page-range]` | ||
| 312 | + Apply rotation to specified pages. The | ||
| 313 | + :samp:`page-range` portion of the option value has | ||
| 314 | + the same format as page ranges in :ref:`ref.page-selection`. If the page range is omitted, the | ||
| 315 | + rotation is applied to all pages. The :samp:`angle` | ||
| 316 | + portion of the parameter may be either 0, 90, 180, or 270. If | ||
| 317 | + preceded by :samp:`+` or :samp:`-`, | ||
| 318 | + the angle is added to or subtracted from the specified pages' | ||
| 319 | + original rotations. This is almost always what you want. Otherwise | ||
| 320 | + the pages' rotations are set to the exact value, which may cause the | ||
| 321 | + appearances of the pages to be inconsistent, especially for scans. | ||
| 322 | + For example, the command :command:`qpdf in.pdf out.pdf | ||
| 323 | + --rotate=+90:2,4,6 --rotate=180:7-8` would rotate pages | ||
| 324 | + 2, 4, and 6 90 degrees clockwise from their original rotation and | ||
| 325 | + force the rotation of pages 7 through 8 to 180 degrees regardless of | ||
| 326 | + their original rotation, and the command :command:`qpdf in.pdf | ||
| 327 | + out.pdf --rotate=+180` would rotate all pages by 180 | ||
| 328 | + degrees. | ||
| 329 | + | ||
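The difference between relative and absolute rotation can be illustrated with a small sketch. Both helper names are hypothetical. The first only assembles the option string in the format shown above; the second models the resulting page rotation, assuming rotations wrap around at 360 degrees (an assumption of this sketch, not something stated above).

```python
VALID_ANGLES = (0, 90, 180, 270)


def rotate_option(angle, sign="", page_range=None):
    """Build a --rotate=[+|-]angle[:page-range] option string."""
    if angle not in VALID_ANGLES:
        raise ValueError("angle must be 0, 90, 180, or 270")
    if sign not in ("", "+", "-"):
        raise ValueError("sign must be '', '+', or '-'")
    option = "--rotate=" + sign + str(angle)
    if page_range:
        option += ":" + page_range
    return option


def resulting_rotation(original, angle, sign=""):
    """Model the page rotation after the option is applied."""
    if sign == "+":
        return (original + angle) % 360  # added to the original rotation
    if sign == "-":
        return (original - angle) % 360  # subtracted from the original rotation
    return angle  # no sign: the rotation is set to the exact value


if __name__ == "__main__":
    print(rotate_option(90, "+", "2,4,6"), rotate_option(180, page_range="7-8"))
    print(resulting_rotation(270, 180, "+"), resulting_rotation(270, 180))
```

A scanned page already rotated by 270 degrees ends up at 90 with `+180` but at 180 with a plain `180`, which is why the signed form is usually the safer choice.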
| 330 | +:samp:`--keep-files-open={[yn]}` | ||
| 331 | + This option controls whether qpdf keeps individual files open while | ||
| 332 | + merging. Prior to version 8.1.0, qpdf always kept all files open, but | ||
| 333 | + this meant that the number of files that could be merged was limited | ||
| 334 | + by the operating system's open file limit. Version 8.1.0 opened files | ||
| 335 | + as they were referenced and closed them after each read, but this | ||
| 336 | + caused a major performance impact. Version 8.2.0 optimized the | ||
| 337 | + performance but did so in a way that, for local file systems, there | ||
| 338 | + was a small but unavoidable performance hit, but for networked file | ||
| 339 | + systems, the performance impact could be very high. Starting with | ||
| 340 | + version 8.2.1, the default behavior is that files are kept open if no | ||
| 341 | + more than 200 files are specified, but this default behavior can be | ||
| 342 | + explicitly overridden with the | ||
| 343 | + :samp:`--keep-files-open` flag. If you are merging | ||
| 344 | + more than 200 files but fewer than the operating system's max open | ||
| 345 | + files limit, you may want to use | ||
| 346 | + :samp:`--keep-files-open=y`, especially if working | ||
| 347 | + over a networked file system. If you are using a local file system | ||
| 348 | + where the overhead is low and you might sometimes merge more than the | ||
| 349 | + OS limit's number of files from a script and are not worried about a | ||
| 350 | + few seconds additional processing time, you may want to specify | ||
| 351 | + :samp:`--keep-files-open=n`. The threshold for | ||
| 352 | + switching may be changed from the default 200 with the | ||
| 353 | + :samp:`--keep-files-open-threshold` option. | ||
| 354 | + | ||
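The default decision described for version 8.2.1 and later can be modeled as below. This is a sketch of the documented rule only, not qpdf's actual code; the function and parameter names are hypothetical.

```python
def keeps_files_open(num_files, keep_files_open=None, threshold=200):
    """Model whether input files stay open while merging.

    keep_files_open mirrors --keep-files-open={y|n}; None means the flag
    was not given. threshold mirrors --keep-files-open-threshold.
    """
    if keep_files_open is not None:
        return keep_files_open == "y"  # an explicit flag overrides the default
    return num_files <= threshold      # "no more than 200 files" by default


if __name__ == "__main__":
    print(keeps_files_open(200), keeps_files_open(201))
    print(keeps_files_open(500, keep_files_open="y"))
```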
| 355 | +:samp:`--keep-files-open-threshold={count}` | ||
| 356 | + If specified, overrides the default value of 200 used as the | ||
| 357 | + threshold for qpdf deciding whether or not to keep files open. See | ||
| 358 | + :samp:`--keep-files-open` for details. | ||
| 359 | + | ||
| 360 | +:samp:`--pages options --` | ||
| 361 | + Select specific pages from one or more input files. See :ref:`ref.page-selection` for details on how to do | ||
| 362 | + page selection (splitting and merging). | ||
| 363 | + | ||
| 364 | +:samp:`--collate={n}` | ||
| 365 | + When specified, collate rather than concatenate pages from files | ||
| 366 | + specified with :samp:`--pages`. With a numeric | ||
| 367 | + argument, collate in groups of :samp:`{n}`. | ||
| 368 | + The default is 1. See :ref:`ref.page-selection` for additional details. | ||
| 369 | + | ||
| 370 | +:samp:`--flatten-rotation` | ||
| 371 | + For each page that is rotated using the ``/Rotate`` key in the page's | ||
| 372 | + dictionary, remove the ``/Rotate`` key and implement the identical | ||
| 373 | + rotation semantics by modifying the page's contents. This option can | ||
| 374 | + be useful to prepare files for buggy PDF applications that don't | ||
| 375 | + properly handle rotated pages. | ||
| 376 | + | ||
| 377 | +:samp:`--split-pages=[n]` | ||
| 378 | + Write each group of :samp:`n` pages to a separate | ||
| 379 | + output file. If :samp:`n` is not specified, create | ||
| 380 | + single pages. Output file names are generated as follows: | ||
| 381 | + | ||
| 382 | + - If the string ``%d`` appears in the output file name, it is | ||
| 383 | + replaced with a range of zero-padded page numbers starting from 1. | ||
| 384 | + | ||
| 385 | + - Otherwise, if the output file name ends in | ||
| 386 | + :file:`.pdf` (case insensitive), a zero-padded | ||
| 387 | + page range, preceded by a dash, is inserted before the file | ||
| 388 | + extension. | ||
| 389 | + | ||
| 390 | + - Otherwise, the file name is appended with a zero-padded page range | ||
| 391 | + preceded by a dash. | ||
| 392 | + | ||
| 393 | + Page ranges are a single number in the case of single-page groups or | ||
| 394 | + two numbers separated by a dash otherwise. For example, if | ||
| 395 | + :file:`infile.pdf` has 12 pages | ||
| 396 | + | ||
| 397 | + - :command:`qpdf --split-pages infile.pdf %d-out` | ||
| 398 | + would generate files :file:`01-out` through | ||
| 399 | + :file:`12-out` | ||
| 400 | + | ||
| 401 | + - :command:`qpdf --split-pages=2 infile.pdf | ||
| 402 | + outfile.pdf` would generate files | ||
| 403 | + :file:`outfile-01-02.pdf` through | ||
| 404 | + :file:`outfile-11-12.pdf` | ||
| 405 | + | ||
| 406 | + - :command:`qpdf --split-pages infile.pdf | ||
| 407 | + something.else` would generate files | ||
| 408 | + :file:`something.else-01` through | ||
| 409 | + :file:`something.else-12` | ||
| 410 | + | ||
| 411 | + Note that outlines, threads, and other global features of the | ||
| 412 | + original PDF file are not preserved. For each page of output, this | ||
| 413 | + option creates an empty PDF and copies a single page from the input | ||
| 414 | + into it. If you require the global data, you will have to run | ||
| 415 | + :command:`qpdf` with the | ||
| 416 | + :samp:`--pages` option once for each file. Using | ||
| 417 | + :samp:`--split-pages` is much faster if you don't | ||
| 418 | + require the global data. | ||
| 419 | + | ||
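The naming rules for :samp:`--split-pages` can be sketched as follows. This is an illustration rather than qpdf's implementation, and it rests on three assumptions not spelled out above: the zero-padding width equals the number of digits in the total page count, only the first ``%d`` is replaced, and a short final group is still written as two numbers when groups hold more than one page. The function name is hypothetical.

```python
def split_page_names(outfilename, num_pages, group_size=1):
    """Return the output file names for a --split-pages run (sketch)."""
    width = len(str(num_pages))  # assumption: pad to the widest page number
    names = []
    for first in range(1, num_pages + 1, group_size):
        last = min(first + group_size - 1, num_pages)
        page_range = str(first).zfill(width)
        if group_size > 1:
            page_range += "-" + str(last).zfill(width)
        if "%d" in outfilename:
            names.append(outfilename.replace("%d", page_range, 1))
        elif outfilename.lower().endswith(".pdf"):
            # Insert "-range" before the extension, keeping its original case.
            names.append(outfilename[:-4] + "-" + page_range + outfilename[-4:])
        else:
            names.append(outfilename + "-" + page_range)
    return names


if __name__ == "__main__":
    print(split_page_names("outfile.pdf", 12, 2))
```

With a 12-page input, the sketch reproduces the three examples given above.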
| 420 | +:samp:`--overlay options --` | ||
| 421 | + Overlay pages from another file onto the output pages. See :ref:`ref.overlay-underlay` for details on | ||
| 422 | + overlay/underlay. | ||
| 423 | + | ||
| 424 | +:samp:`--underlay options --` | ||
| 425 | + Underlay pages from another file beneath the output pages. See :ref:`ref.overlay-underlay` for details on | ||
| 426 | + overlay/underlay. | ||
| 427 | + | ||
| 428 | +Password-protected files may be opened by specifying a password. By | ||
| 429 | +default, qpdf will preserve any encryption data associated with a file. | ||
| 430 | +If :samp:`--decrypt` is specified, qpdf will attempt to | ||
| 431 | +remove any encryption information. If :samp:`--encrypt` | ||
| 432 | +is specified, qpdf will replace the document's encryption parameters | ||
| 433 | +with whatever is specified. | ||
| 434 | + | ||
| 435 | +Note that qpdf does not obey encryption restrictions already imposed on | ||
| 436 | +the file. Doing so would be meaningless since qpdf can be used to remove | ||
| 437 | +encryption from the file entirely. This functionality is not intended to | ||
| 438 | +be used for bypassing copyright restrictions or other restrictions | ||
| 439 | +placed on files by their producers. | ||
| 440 | + | ||
| 441 | +Prior to 8.4.0, in the case of passwords that contain characters that | ||
| 442 | +fall outside of 7-bit US-ASCII, qpdf left the burden of supplying | ||
| 443 | +properly encoded encryption and decryption passwords to the user. | ||
| 444 | +Starting in qpdf 8.4.0, qpdf does this automatically in most cases. For | ||
| 445 | +an in-depth discussion, please see :ref:`ref.unicode-passwords`. Previous versions of this manual | ||
| 446 | +described workarounds using the :command:`iconv` command. | ||
| 447 | +Such workarounds are no longer required or recommended with qpdf 8.4.0. | ||
| 448 | +However, for backward compatibility, qpdf attempts to detect those | ||
| 449 | +workarounds and do the right thing in most cases. | ||
| 450 | + | ||
| 451 | +.. _ref.encryption-options: | ||
| 452 | + | ||
| 453 | +Encryption Options | ||
| 454 | +------------------ | ||
| 455 | + | ||
| 456 | +To change the encryption parameters of a file, use the :samp:`--encrypt` flag. | ||
| 457 | +The syntax is | ||
| 458 | + | ||
| 459 | +:: | ||
| 460 | + | ||
| 461 | + --encrypt user-password owner-password key-length [ restrictions ] -- | ||
| 462 | + | ||
| 463 | +Note that ":samp:`--`" terminates parsing of encryption | ||
| 464 | +flags and must be present even if no restrictions are present. | ||
| 465 | + | ||
| 466 | +Either or both of the user password and the owner password may be empty | ||
| 467 | +strings. Starting in qpdf 10.2, qpdf defaults to not allowing creation | ||
| 468 | +of PDF files with a non-empty user password, an empty owner password, | ||
| 469 | +and a 256-bit key since such files can be opened with no password. If | ||
| 470 | +you want to create such files, specify the encryption option | ||
| 471 | +:samp:`--allow-insecure`, as described below. | ||
| 472 | + | ||
| 473 | +The value for | ||
| 474 | +:samp:`{key-length}` may | ||
| 475 | +be 40, 128, or 256. The restriction flags are dependent upon key length. | ||
| 476 | +When no additional restrictions are given, the default is to be fully | ||
| 477 | +permissive. | ||
| 478 | + | ||
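The shape of the :samp:`--encrypt` argument group, including the mandatory terminating :samp:`--`, can be sketched as a small builder. The function name is hypothetical, and the check for the insecure combination only mirrors the rule stated above for qpdf 10.2 and later; qpdf performs its own validation regardless.

```python
def encrypt_args(user_password, owner_password, key_length,
                 restrictions=(), allow_insecure=False):
    """Build the argument list for --encrypt ... -- (sketch)."""
    if key_length not in (40, 128, 256):
        raise ValueError("key length must be 40, 128, or 256")
    # Non-empty user password + empty owner password + 256-bit key is
    # the combination that needs --allow-insecure.
    insecure = key_length == 256 and user_password != "" and owner_password == ""
    if insecure and not allow_insecure:
        raise ValueError("this combination requires --allow-insecure")
    args = ["--encrypt", user_password, owner_password, str(key_length)]
    if insecure:
        args.append("--allow-insecure")
    args.extend(restrictions)
    args.append("--")  # required even when no restrictions are given
    return args


if __name__ == "__main__":
    print(encrypt_args("user", "owner", 256, ["--print=none"]))
```

Passing the result as a list to a process launcher, rather than joining it into a shell string, keeps empty passwords intact as separate arguments.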
| 479 | +If :samp:`{key-length}` | ||
| 480 | +is 40, the following restriction options are available: | ||
| 481 | + | ||
| 482 | +:samp:`--print=[yn]` | ||
| 483 | + Determines whether or not to allow printing. | ||
| 484 | + | ||
| 485 | +:samp:`--modify=[yn]` | ||
| 486 | + Determines whether or not to allow document modification. | ||
| 487 | + | ||
| 488 | +:samp:`--extract=[yn]` | ||
| 489 | + Determines whether or not to allow text/image extraction. | ||
| 490 | + | ||
| 491 | +:samp:`--annotate=[yn]` | ||
| 492 | + Determines whether or not to allow comments and form fill-in and | ||
| 493 | + signing. | ||
| 494 | + | ||
| 495 | +If :samp:`{key-length}` | ||
| 496 | +is 128, the following restriction options are available: | ||
| 497 | + | ||
| 498 | +:samp:`--accessibility=[yn]` | ||
| 499 | + Determines whether or not to allow accessibility to visually | ||
| 500 | + impaired. The qpdf library disregards this field when AES is used or | ||
| 501 | + when 256-bit encryption is used. You should really never disable | ||
| 502 | + accessibility, but qpdf lets you do it in case you need to configure | ||
| 503 | + a file this way for testing purposes. The PDF spec says that | ||
| 504 | + conforming readers should disregard this permission and always allow | ||
| 505 | + accessibility. | ||
| 506 | + | ||
| 507 | +:samp:`--extract=[yn]` | ||
| 508 | + Determines whether or not to allow text/graphic extraction. | ||
| 509 | + | ||
| 510 | +:samp:`--assemble=[yn]` | ||
| 511 | + Determines whether document assembly (rotation and reordering of | ||
| 512 | + pages) is allowed. | ||
| 513 | + | ||
| 514 | +:samp:`--annotate=[yn]` | ||
| 515 | + Determines whether modifying annotations is allowed. This includes | ||
| 516 | + adding comments and filling in form fields. Also allows editing of | ||
| 517 | + form fields if :samp:`--modify-other=y` is given. | ||
| 518 | + | ||
| 519 | +:samp:`--form=[yn]` | ||
| 520 | + Determines whether filling form fields is allowed. | ||
| 521 | + | ||
| 522 | +:samp:`--modify-other=[yn]` | ||
| 523 | + Allow all document editing except those controlled separately by the | ||
| 524 | + :samp:`--assemble`, | ||
| 525 | + :samp:`--annotate`, and | ||
| 526 | + :samp:`--form` options. | ||
| 527 | + | ||
| 528 | +:samp:`--print={print-opt}` | ||
| 529 | + Controls printing access. | ||
| 530 | + :samp:`{print-opt}` | ||
| 531 | + may be one of the following: | ||
| 532 | + | ||
| 533 | + - :samp:`full`: allow full printing | ||
| 534 | + | ||
| 535 | + - :samp:`low`: allow low-resolution printing only | ||
| 536 | + | ||
| 537 | + - :samp:`none`: disallow printing | ||
| 538 | + | ||
| 539 | +:samp:`--modify={modify-opt}` | ||
| 540 | + Controls modify access. This way of controlling modify access has | ||
| 541 | + less granularity than new options added in qpdf 8.4. | ||
| 542 | + :samp:`{modify-opt}` | ||
| 543 | + may be one of the following: | ||
| 544 | + | ||
| 545 | + - :samp:`all`: allow full document modification | ||
| 546 | + | ||
| 547 | + - :samp:`annotate`: allow comment authoring, form | ||
| 548 | + operations, and document assembly | ||
| 549 | + | ||
| 550 | + - :samp:`form`: allow form field fill-in and signing | ||
| 551 | + and document assembly | ||
| 552 | + | ||
| 553 | + - :samp:`assembly`: allow document assembly only | ||
| 554 | + | ||
| 555 | + - :samp:`none`: allow no modifications | ||
| 556 | + | ||
| 557 | + Using the :samp:`--modify` option does not allow you | ||
| 558 | + to create certain combinations of permissions such as allowing form | ||
| 559 | + filling but not allowing document assembly. Starting with qpdf 8.4, | ||
| 560 | + you can either just use the other options to control fields | ||
| 561 | + individually, or you can use something like :samp:`--modify=form | ||
| 562 | + --assemble=n` to fine tune. | ||
| 563 | + | ||
| 564 | +:samp:`--cleartext-metadata` | ||
| 565 | + If specified, any metadata stream in the document will be left | ||
| 566 | + unencrypted even if the rest of the document is encrypted. This also | ||
| 567 | + forces the PDF version to be at least 1.5. | ||
| 568 | + | ||
| 569 | +:samp:`--use-aes=[yn]` | ||
| 570 | + If :samp:`--use-aes=y` is specified, AES encryption | ||
| 571 | + will be used instead of RC4 encryption. This forces the PDF version | ||
| 572 | + to be at least 1.6. | ||
| 573 | + | ||
| 574 | +:samp:`--allow-insecure` | ||
| 575 | + From qpdf 10.2, qpdf defaults to not allowing creation of PDF files | ||
| 576 | + where the user password is non-empty, the owner password is empty, | ||
| 577 | + and a 256-bit key is in use. Files created in this way are insecure | ||
| 578 | + since they can be opened without a password. Users would ordinarily | ||
| 579 | + never want to create such files. If you are using qpdf to | ||
| 580 | + intentionally create strange files for testing (a definite valid use | ||
| 581 | + of qpdf!), this option allows you to create such insecure files. | ||
| 582 | + | ||
| 583 | +:samp:`--force-V4` | ||
| 584 | + Use of this option forces the ``/V`` and ``/R`` parameters in the | ||
| 585 | + document's encryption dictionary to be set to the value ``4``. As | ||
| 586 | + qpdf will automatically do this when required, there is no reason to | ||
| 587 | + ever use this option. It exists primarily for use in testing qpdf | ||
| 588 | + itself. This option also forces the PDF version to be at least 1.5. | ||
| 589 | + | ||
| 590 | +If :samp:`{key-length}` | ||
| 591 | +is 256, the minimum PDF version is 1.7 with extension level 8, and the | ||
| 592 | +AES-based encryption format used is the PDF 2.0 encryption method | ||
| 593 | +supported by Acrobat X. The same options are available as with 128 bits | ||
| 594 | +with the following exceptions: | ||
| 595 | + | ||
| 596 | +:samp:`--use-aes` | ||
| 597 | + This option is not available with 256-bit keys. AES is always used | ||
| 598 | + with 256-bit encryption keys. | ||
| 599 | + | ||
| 600 | +:samp:`--force-V4` | ||
| 601 | + This option is not available with 256-bit keys. | ||
| 602 | + | ||
| 603 | +:samp:`--force-R5` | ||
| 604 | + If specified, qpdf sets the minimum version to 1.7 at extension level | ||
| 605 | + 3 and writes the deprecated encryption format used by Acrobat version | ||
| 606 | + IX. This option should not be used in practice to generate PDF files | ||
| 607 | + that will be in general use, but it can be useful to generate files | ||
| 608 | + if you are trying to test proper support in another application for | ||
| 609 | + PDF files encrypted in this way. | ||
| 610 | + | ||
| 611 | +The default for each permission option is to be fully permissive. | ||
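When qpdf is driven from a script, the permission options above can be assembled programmatically. The following is a minimal sketch, not part of qpdf itself. It assumes the positional form ``--encrypt user-password owner-password key-length [restrictions] --`` described earlier in this chapter, and the helper name ``encrypt_args`` is hypothetical. It only builds the argument list and does not run qpdf.

```python
from typing import Dict, List, Optional


def encrypt_args(
    infile: str,
    outfile: str,
    user_password: str,
    owner_password: str,
    key_length: int = 256,
    restrictions: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Build an argument list for a qpdf encryption run (hypothetical helper).

    Each entry in `restrictions` maps an option name without the leading
    dashes (for example "print" or "assemble") to its value.
    """
    args = ["qpdf", "--encrypt", user_password, owner_password, str(key_length)]
    for name, value in (restrictions or {}).items():
        args.append(f"--{name}={value}")
    # "--" ends the encryption options; the input and output files follow.
    args += ["--", infile, outfile]
    return args


if __name__ == "__main__":
    print(encrypt_args("in.pdf", "out.pdf", "user", "owner",
                       restrictions={"print": "low", "assemble": "n"}))
```

The resulting list can be passed to ``subprocess.run`` if qpdf is installed. Any permission that is left out stays at its fully permissive default.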
| 612 | + | ||
| 613 | +.. _ref.page-selection: | ||
| 614 | + | ||
| 615 | +Page Selection Options | ||
| 616 | +---------------------- | ||
| 617 | + | ||
| 618 | +Starting with qpdf 3.0, it is possible to split and merge PDF files by | ||
| 619 | +selecting pages from one or more input files. Whatever file is given as | ||
| 620 | +the primary input file is used as the starting point, but its pages are | ||
| 621 | +replaced with pages as specified. | ||
| 622 | + | ||
| 623 | +:: | ||
| 624 | + | ||
| 625 | + --pages input-file [ --password=password ] [ page-range ] [ ... ] -- | ||
| 626 | + | ||
| 627 | +Multiple input files may be specified. Each one is given as the name of | ||
| 628 | +the input file, an optional password (if required to open the file), and | ||
| 629 | +the range of pages. Note that ":samp:`--`" terminates | ||
| 630 | +parsing of page selection flags. | ||
| 631 | + | ||
| 632 | +Starting with qpdf 8.4, the special input file name | ||
| 633 | +":file:`.`" can be used as a shortcut for the | ||
| 634 | +primary input filename. | ||
| 635 | + | ||
| 636 | +For each file that pages should be taken from, specify the file, a | ||
| 637 | +password needed to open the file (if any), and a page range. The | ||
| 638 | +password needs to be given only once per file. If any of the input files | ||
| 639 | +are the same as the primary input file or the file used to copy | ||
| 640 | +encryption parameters (if specified), you do not need to repeat the | ||
| 641 | +password here. The same file can be repeated multiple times. If a file | ||
| 642 | +that is repeated has a password, the password only has to be given the | ||
| 643 | +first time. All non-page data (info, outlines, page numbers, etc.) are | ||
| 644 | +taken from the primary input file. To discard these, use | ||
| 645 | +:samp:`--empty` as the primary input. | ||
| 646 | + | ||
| 647 | +Starting with qpdf 5.0.0, it is possible to omit the page range. If qpdf | ||
| 648 | +sees a value in the place where it expects a page range and that value | ||
| 649 | +is not a valid range but is a valid file name, qpdf will implicitly use | ||
| 650 | +the range ``1-z``, meaning that it will include all pages in the file. | ||
| 651 | +This makes it possible to easily combine all pages in a set of files | ||
| 652 | +with a command like :command:`qpdf --empty out.pdf --pages \*.pdf | ||
| 653 | +--`. | ||
| 654 | + | ||
| 655 | +The page range is a set of numbers separated by commas, ranges of | ||
| 656 | +numbers separated by dashes, or combinations of those. The character "z" | ||
| 657 | +represents the last page. A number preceded by an "r" indicates to count | ||
| 658 | +from the end, so ``r3-r1`` would be the last three pages of the | ||
| 659 | +document. Pages can appear in any order. Ranges can appear with a high | ||
| 660 | +number followed by a low number, which causes the pages to appear in | ||
| 661 | +reverse. Numbers may be repeated in a page range. A page range may be | ||
| 662 | +optionally appended with ``:even`` or ``:odd`` to indicate only the even | ||
| 663 | +or odd pages in the given range. Note that even and odd refer to the | ||
| 664 | +positions within the specified range, not whether the original number | ||
| 665 | +is even or odd. | ||
| 666 | + | ||
| 667 | +Example page ranges: | ||
| 668 | + | ||
| 669 | +- ``1,3,5-9,15-12``: pages 1, 3, 5, 6, 7, 8, 9, 15, 14, 13, and 12 in | ||
| 670 | + that order. | ||
| 671 | + | ||
| 672 | +- ``z-1``: all pages in the document in reverse | ||
| 673 | + | ||
| 674 | +- ``r3-r1``: the last three pages of the document | ||
| 675 | + | ||
| 676 | +- ``r1-r3``: the last three pages of the document in reverse order | ||
| 677 | + | ||
| 678 | +- ``1-20:even``: even pages from 2 to 20 | ||
| 679 | + | ||
| 680 | +- ``5,7-9,12:odd``: pages 5, 8, and 12, which are the pages in odd | ||
| 681 | + positions from among the original range, which represents pages 5, 7, | ||
| 682 | + 8, 9, and 12. | ||
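The page range rules above can be modeled in a few lines of Python. This is an illustrative sketch of the documented semantics only, not qpdf's own parser. The function name ``expand_page_range`` and its error handling are assumptions made for the example.

```python
from typing import List


def expand_page_range(spec: str, num_pages: int) -> List[int]:
    """Expand a page range such as "1,3,5-9,15-12" into a list of page numbers."""
    body, _, parity = spec.partition(":")
    if parity not in ("", "even", "odd"):
        raise ValueError(f"unknown modifier: {parity}")

    def page_number(token: str) -> int:
        if token == "z":  # "z" is the last page
            number = num_pages
        elif token.startswith("r"):  # "rN" counts from the end; r1 is the last page
            number = num_pages + 1 - int(token[1:])
        else:
            number = int(token)
        if not 1 <= number <= num_pages:
            raise ValueError(f"page out of range: {token}")
        return number

    pages: List[int] = []
    for item in body.split(","):
        first, dash, last = item.partition("-")
        start = page_number(first)
        end = page_number(last) if dash else start
        step = 1 if end >= start else -1  # a high-to-low range runs in reverse
        pages.extend(range(start, end + step, step))

    # :odd and :even select by position within the expanded list,
    # not by whether the page number itself is odd or even.
    if parity == "odd":
        pages = pages[0::2]
    elif parity == "even":
        pages = pages[1::2]
    return pages


if __name__ == "__main__":
    print(expand_page_range("5,7-9,12:odd", 20))
```

Selecting by position is what makes ``5,7-9,12:odd`` yield pages 5, 8, and 12 rather than 5, 7, and 9.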
| 683 | + | ||
| 684 | +Starting in qpdf version 8.3, you can specify the | ||
| 685 | +:samp:`--collate` option. Note that this option is | ||
| 686 | +specified outside of :samp:`--pages ... --`. When | ||
| 687 | +:samp:`--collate` is specified, it changes the meaning | ||
| 688 | +of :samp:`--pages` so that the specified files, as | ||
| 689 | +modified by page ranges, are collated rather than concatenated. For | ||
| 690 | +example, if you add the files :file:`odd.pdf` and | ||
| 691 | +:file:`even.pdf` containing odd and even pages of a | ||
| 692 | +document respectively, you could run :command:`qpdf --collate odd.pdf | ||
| 693 | +--pages odd.pdf even.pdf -- all.pdf` to collate the pages. | ||
| 694 | +This would pick page 1 from odd, page 1 from even, page 2 from odd, page | ||
| 695 | +2 from even, etc. until all pages have been included. Any number of | ||
| 696 | +files and page ranges can be specified. If any file has fewer pages, | ||
| 697 | +that file is just skipped when its pages have all been included. For | ||
| 698 | +example, if you ran :command:`qpdf --collate --empty --pages a.pdf | ||
| 699 | +1-5 b.pdf 6-4 c.pdf r1 -- out.pdf`, you would get the | ||
| 700 | +following pages in this order: | ||
| 701 | + | ||
| 702 | +- a.pdf page 1 | ||
| 703 | + | ||
| 704 | +- b.pdf page 6 | ||
| 705 | + | ||
| 706 | +- c.pdf last page | ||
| 707 | + | ||
| 708 | +- a.pdf page 2 | ||
| 709 | + | ||
| 710 | +- b.pdf page 5 | ||
| 711 | + | ||
| 712 | +- a.pdf page 3 | ||
| 713 | + | ||
| 714 | +- b.pdf page 4 | ||
| 715 | + | ||
| 716 | +- a.pdf page 4 | ||
| 717 | + | ||
| 718 | +- a.pdf page 5 | ||
| 719 | + | ||
| 720 | +Starting in qpdf version 10.2, you may specify a numeric argument to | ||
| 721 | +:samp:`--collate`. With | ||
| 722 | +:samp:`--collate={n}`, | ||
| 723 | +pull groups of :samp:`{n}` pages from each file, | ||
| 724 | +again, stopping when there are no more pages. For example, if you ran | ||
| 725 | +:command:`qpdf --collate=2 --empty --pages a.pdf 1-5 b.pdf 6-4 c.pdf | ||
| 726 | +r1 -- out.pdf`, you would get the following pages in this | ||
| 727 | +order: | ||
| 728 | + | ||
| 729 | +- a.pdf page 1 | ||
| 730 | + | ||
| 731 | +- a.pdf page 2 | ||
| 732 | + | ||
| 733 | +- b.pdf page 6 | ||
| 734 | + | ||
| 735 | +- b.pdf page 5 | ||
| 736 | + | ||
| 737 | +- c.pdf last page | ||
| 738 | + | ||
| 739 | +- a.pdf page 3 | ||
| 740 | + | ||
| 741 | +- a.pdf page 4 | ||
| 742 | + | ||
| 743 | +- b.pdf page 4 | ||
| 744 | + | ||
| 745 | +- a.pdf page 5 | ||
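The collation order shown in the two lists above can be reproduced with a short Python model. This is a sketch of the behavior as described here, not qpdf's implementation. The function name ``collate`` and the use of (file, page) tuples are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Page = Tuple[str, object]  # (file name, page label), for illustration only


def collate(page_lists: Sequence[Sequence[Page]], group_size: int = 1) -> List[Page]:
    """Interleave pages from several inputs, taking group_size pages per turn."""
    positions = [0] * len(page_lists)
    result: List[Page] = []
    # Keep cycling through the inputs until every one is exhausted.
    while any(pos < len(pages) for pos, pages in zip(positions, page_lists)):
        for index, pages in enumerate(page_lists):
            chunk = pages[positions[index]:positions[index] + group_size]
            result.extend(chunk)  # an exhausted input contributes nothing
            positions[index] += len(chunk)
    return result


if __name__ == "__main__":
    a = [("a.pdf", n) for n in (1, 2, 3, 4, 5)]   # a.pdf 1-5
    b = [("b.pdf", n) for n in (6, 5, 4)]         # b.pdf 6-4
    c = [("c.pdf", "last")]                       # c.pdf r1
    for page in collate([a, b, c], group_size=2):
        print(page)
```

With ``group_size=1`` the model matches plain :samp:`--collate`, and with ``group_size=2`` it matches :samp:`--collate=2`.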
| 746 | + | ||
| 747 | +Starting in qpdf version 8.3, when you split and merge files, any page | ||
| 748 | +labels (page numbers) are preserved in the final file. It is expected | ||
| 749 | +that more document features will be preserved by splitting and merging. | ||
| 750 | +In the meantime, semantics of splitting and merging vary across | ||
| 751 | +features. For example, the document's outlines (bookmarks) point to | ||
| 752 | +actual page objects, so if you select some pages and not others, | ||
| 753 | +bookmarks that point to pages that are in the output file will work, and | ||
| 754 | +remaining bookmarks will not work. A future version of | ||
| 755 | +:command:`qpdf` may do a better job at handling these | ||
| 756 | +issues. (Note that the qpdf library already contains all of the APIs | ||
| 757 | +required in order to implement this in your own application if you need | ||
| 758 | +it.) In the meantime, you can always use | ||
| 759 | +:samp:`--empty` as the primary input file to avoid | ||
| 760 | +copying all of that from the first file. For example, to take pages 1 | ||
| 761 | +through 5 from :file:`infile.pdf` while preserving | ||
| 762 | +all metadata associated with that file, you could use | ||
| 763 | + | ||
| 764 | +:: | ||
| 765 | + | ||
| 766 | + qpdf infile.pdf --pages . 1-5 -- outfile.pdf | ||
| 767 | + | ||
| 768 | +If you wanted pages 1 through 5 from | ||
| 769 | +:file:`infile.pdf` but you wanted the rest of the | ||
| 770 | +metadata to be dropped, you could instead run | ||
| 771 | + | ||
| 772 | +:: | ||
| 773 | + | ||
| 774 | + qpdf --empty --pages infile.pdf 1-5 -- outfile.pdf | ||
| 775 | + | ||
| 776 | +If you wanted to take pages 1 through 5 from | ||
| 777 | +:file:`file1.pdf` and pages 11 through 15 from | ||
| 778 | +:file:`file2.pdf` in reverse, taking document-level | ||
| 779 | +metadata from :file:`file2.pdf`, you would run | ||
| 780 | + | ||
| 781 | +:: | ||
| 782 | + | ||
| 783 | + qpdf file2.pdf --pages file1.pdf 1-5 . 15-11 -- outfile.pdf | ||
| 784 | + | ||
| 785 | +If, for some reason, you wanted to take the first page of an encrypted | ||
| 786 | +file called :file:`encrypted.pdf` with password | ||
| 787 | +``pass`` and repeat it twice in an output file, and if you wanted to | ||
| 788 | +drop document-level metadata but preserve encryption, you would use | ||
| 789 | + | ||
| 790 | +:: | ||
| 791 | + | ||
| 792 | + qpdf --empty --copy-encryption=encrypted.pdf --encryption-file-password=pass | ||
| 793 | + --pages encrypted.pdf --password=pass 1 ./encrypted.pdf --password=pass 1 -- | ||
| 794 | + outfile.pdf | ||
| 795 | + | ||
| 796 | +Note that we had to specify the password all three times because giving | ||
| 797 | +a password as :samp:`--encryption-file-password` doesn't | ||
| 798 | +count for page selection, and as far as qpdf is concerned, | ||
| 799 | +:file:`encrypted.pdf` and | ||
| 800 | +:file:`./encrypted.pdf` are separate files. These | ||
| 801 | +are all corner cases that most users should hopefully never have to be | ||
| 802 | +bothered with. | ||
| 803 | + | ||
| 804 | +Prior to version 8.4, it was not possible to specify the same page from | ||
| 805 | +the same file directly more than once, and the workaround of specifying | ||
| 806 | +the same file in more than one way was required. Version 8.4 removes | ||
| 807 | +this limitation, but there is still a valid use case. When you specify | ||
| 808 | +the same page from the same file more than once, qpdf will share objects | ||
| 809 | +between the pages. If you are going to do further manipulation on the | ||
| 810 | +file and need the two instances of the same original page to be deep | ||
| 811 | +copies, then you can specify the file in two different ways. For example, | ||
| 812 | +:command:`qpdf in.pdf --pages . 1 ./in.pdf 1 -- out.pdf` | ||
| 813 | +would create a file with two copies of the first page of the input, and | ||
| 814 | +the two copies would not share any objects in common. This includes fonts, | ||
| 815 | +images, and anything else the page references. | ||
| 816 | + | ||
| 817 | +.. _ref.overlay-underlay: | ||
| 818 | + | ||
| 819 | +Overlay and Underlay Options | ||
| 820 | +---------------------------- | ||
| 821 | + | ||
| 822 | +Starting with qpdf 8.4, it is possible to overlay or underlay pages from | ||
| 823 | +other files onto the output generated by qpdf. Specify overlay or | ||
| 824 | +underlay as follows: | ||
| 825 | + | ||
| 826 | +:: | ||
| 827 | + | ||
| 828 | + { --overlay | --underlay } file [ options ] -- | ||
| 829 | + | ||
| 830 | +Overlay and underlay options are processed late, so they can be combined | ||
| 831 | +with other operations like merging and will apply to the final output. The | ||
| 832 | +:samp:`--overlay` and :samp:`--underlay` | ||
| 833 | +options work the same way, except underlay pages are drawn underneath | ||
| 834 | +the page to which they are applied, possibly obscured by the original | ||
| 835 | +page, and overlay pages are drawn on top of the page to which they are | ||
| 836 | +applied, possibly obscuring the page. You can combine overlay and | ||
| 837 | +underlay. | ||
| 838 | + | ||
| 839 | +The default behavior of overlay and underlay is that pages are taken | ||
| 840 | +from the overlay/underlay file in sequence and applied to corresponding | ||
| 841 | +pages in the output until there are no more output pages. If the overlay | ||
| 842 | +or underlay file runs out of pages, remaining output pages are left | ||
| 843 | +alone. This behavior can be modified by options, which are provided | ||
| 844 | +between the :samp:`--overlay` or | ||
| 845 | +:samp:`--underlay` flag and the | ||
| 846 | +:samp:`--` option. The following options are supported: | ||
| 847 | + | ||
| 848 | +- :samp:`--password=password`: supply a password if the | ||
| 849 | + overlay/underlay file is encrypted. | ||
| 850 | + | ||
| 851 | +- :samp:`--to=page-range`: a range of pages in the same | ||
| 852 | + format described in :ref:`ref.page-selection` | ||
| 853 | + indicates which pages in the output should have the overlay/underlay | ||
| 854 | + applied. If not specified, overlay/underlay are applied to all pages. | ||
| 855 | + | ||
| 856 | +- :samp:`--from=[page-range]`: a range of pages that | ||
| 857 | + specifies which pages in the overlay/underlay file will be used for | ||
| 858 | + overlay or underlay. If not specified, all pages will be used. This | ||
| 859 | + can be explicitly specified to be empty if | ||
| 860 | + :samp:`--repeat` is used. | ||
| 861 | + | ||
| 862 | +- :samp:`--repeat=page-range`: an optional range of | ||
| 863 | + pages that specifies which pages in the overlay/underlay file will be | ||
| 864 | + repeated after the "from" pages are used up. If you want to repeat a | ||
| 865 | + range of pages starting at the beginning, you can explicitly use | ||
| 866 | + :samp:`--from=`. | ||
| 867 | + | ||
| 868 | +Here are some examples. | ||
| 869 | + | ||
| 870 | +- :command:`--overlay o.pdf --to=1-5 --from=1-3 --repeat=4 | ||
| 871 | + --`: overlay the first three pages from file | ||
| 872 | + :file:`o.pdf` onto the first three pages of the | ||
| 873 | + output, then overlay page 4 from :file:`o.pdf` | ||
| 874 | + onto pages 4 and 5 of the output. Leave remaining output pages | ||
| 875 | + untouched. | ||
| 876 | + | ||
| 877 | +- :command:`--underlay footer.pdf --from= --repeat=1,2 | ||
| 878 | + --`: Underlay page 1 of | ||
| 879 | + :file:`footer.pdf` on all odd output pages, and | ||
| 880 | + underlay page 2 of :file:`footer.pdf` on all even | ||
| 881 | + output pages. | ||
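The interaction of :samp:`--to`, :samp:`--from`, and :samp:`--repeat` can be modeled as a mapping from output pages to overlay pages. The sketch below follows the description and examples above and is not taken from qpdf's source. The name ``overlay_plan`` is hypothetical, and the page lists are assumed to be already expanded from their range syntax.

```python
from typing import Dict, Sequence


def overlay_plan(
    to_pages: Sequence[int],
    from_pages: Sequence[int],
    repeat_pages: Sequence[int] = (),
) -> Dict[int, int]:
    """Map each affected output page to the overlay/underlay page applied to it."""
    plan: Dict[int, int] = {}
    for index, out_page in enumerate(to_pages):
        if index < len(from_pages):
            # Pages from the "from" range are used first, in sequence.
            plan[out_page] = from_pages[index]
        elif repeat_pages:
            # Once "from" is used up, cycle through the "repeat" pages.
            offset = index - len(from_pages)
            plan[out_page] = repeat_pages[offset % len(repeat_pages)]
        else:
            # No pages left: remaining output pages are left alone.
            break
    return plan


if __name__ == "__main__":
    # --overlay o.pdf --to=1-5 --from=1-3 --repeat=4 --
    print(overlay_plan([1, 2, 3, 4, 5], [1, 2, 3], [4]))
    # --underlay footer.pdf --from= --repeat=1,2 --   (six-page output)
    print(overlay_plan([1, 2, 3, 4, 5, 6], [], [1, 2]))
```

An empty "from" list corresponds to :samp:`--from=`, which is why the second call alternates between the two footer pages starting at the first output page.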
| 882 | + | ||
| 883 | +.. _ref.attachments: | ||
| 884 | + | ||
| 885 | +Embedded Files/Attachments Options | ||
| 886 | +---------------------------------- | ||
| 887 | + | ||
| 888 | +Starting with qpdf 10.2, you can work with file attachments in PDF files | ||
| 889 | +from the command line. The following options are available: | ||
| 890 | + | ||
| 891 | +:samp:`--list-attachments` | ||
| 892 | + Show the "key" and stream number for embedded files. With | ||
| 893 | + :samp:`--verbose`, additional information, including | ||
| 894 | + preferred file name, description, dates, and more, is also displayed. | ||
| 895 | + The key is usually but not always equal to the file name, and is | ||
| 896 | + needed by some of the other options. | ||
| 897 | + | ||
| 898 | +:samp:`--show-attachment={key}` | ||
| 899 | + Write the contents of the specified attachment to standard output as | ||
| 900 | + binary data. The key should match one of the keys shown by | ||
| 901 | + :samp:`--list-attachments`. If specified multiple | ||
| 902 | + times, only the last attachment will be shown. | ||
| 903 | + | ||
| 904 | +:samp:`--add-attachment {file} {options} --` | ||
| 905 | + Add or replace an attachment with the contents of | ||
| 906 | + :samp:`{file}`. This may be specified more | ||
| 907 | + than once. The following additional options may appear before the | ||
| 908 | + ``--`` that ends this option: | ||
| 909 | + | ||
| 910 | + :samp:`--key={key}` | ||
| 911 | + The key to use to register the attachment in the embedded files | ||
| 912 | + table. Defaults to the last path element of | ||
| 913 | + :samp:`{file}`. | ||
| 914 | + | ||
| 915 | + :samp:`--filename={name}` | ||
| 916 | + The file name to be used for the attachment. This is what is | ||
| 917 | + usually displayed to the user and is the name most graphical PDF | ||
| 918 | + viewers will use when saving a file. It defaults to the last path | ||
| 919 | + element of :samp:`{file}`. | ||
| 920 | + | ||
| 921 | + :samp:`--creationdate={date}` | ||
| 922 | + The attachment's creation date in PDF format; defaults to the | ||
| 923 | + current time. The date format is explained below. | ||
| 924 | + | ||
| 925 | + :samp:`--moddate={date}` | ||
| 926 | + The attachment's modification date in PDF format; defaults to the | ||
| 927 | + current time. The date format is explained below. | ||
| 928 | + | ||
| 929 | + :samp:`--mimetype={type/subtype}` | ||
| 930 | + The mime type for the attachment, e.g. ``text/plain`` or | ||
| 931 | + ``application/pdf``. Note that the mimetype appears in a field | ||
| 932 | + called ``/Subtype`` in the PDF but actually includes the full type | ||
| 933 | + and subtype of the mime type. | ||
| 934 | + | ||
| 935 | + :samp:`--description={"text"}` | ||
| 936 | + Descriptive text for the attachment, displayed by some PDF | ||
| 937 | + viewers. | ||
| 938 | + | ||
| 939 | + :samp:`--replace` | ||
| 940 | + Indicates that any existing attachment with the same key should be | ||
| 941 | + replaced by the new attachment. Otherwise, | ||
| 942 | + :command:`qpdf` gives an error if an attachment | ||
| 943 | + with that key is already present. | ||
| 944 | + | ||
| 945 | +:samp:`--remove-attachment={key}` | ||
| 946 | + Remove the specified attachment. This doesn't only remove the | ||
| 947 | + attachment from the embedded files table but also clears out the file | ||
| 948 | + specification. That means that any potential internal links to the | ||
| 949 | + attachment will be broken. This option may be specified multiple | ||
| 950 | + times. Run with :samp:`--verbose` to see status of | ||
| 951 | + the removal. | ||
| 952 | + | ||
| 953 | +:samp:`--copy-attachments-from {file} {options} --` | ||
| 954 | + Copy attachments from another file. This may be specified more than | ||
| 955 | + once. The following additional options may appear before the ``--`` | ||
| 956 | + that ends this option: | ||
| 957 | + | ||
| 958 | + :samp:`--password={password}` | ||
| 959 | + If required, the password needed to open | ||
| 960 | + :samp:`{file}` | ||
| 961 | + | ||
| 962 | + :samp:`--prefix={prefix}` | ||
| 963 | + Only required if the file from which attachments are being copied | ||
| 964 | + has attachments with keys that conflict with attachments already | ||
| 965 | + in the file. In this case, the specified prefix will be prepended | ||
| 966 | + to each key. This affects only the key in the embedded files | ||
| 967 | + table, not the file name. The PDF specification doesn't preclude | ||
| 968 | + multiple attachments having the same file name. | ||
| 969 | + | ||
| 970 | +When a date is required, the date should conform to the PDF date format | ||
| 971 | +specification, which is | ||
| 972 | +``D:``\ :samp:`{yyyymmddhhmmss<z>}`, where | ||
| 973 | +:samp:`{<z>}` is either ``Z`` for UTC or a | ||
| 974 | +timezone offset in the form :samp:`{-hh'mm'}` or | ||
| 975 | +:samp:`{+hh'mm'}`. Examples: | ||
| 976 | +``D:20210207161528-05'00'``, ``D:20210207211528Z``. | ||
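If dates for :samp:`--creationdate` or :samp:`--moddate` are generated by a script, a small helper can produce the format described above. This is a minimal sketch using only the Python standard library. The function name ``pdf_date`` is an assumption, and the sketch requires a timezone-aware ``datetime`` so that the offset is never guessed.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(moment: datetime) -> str:
    """Format a timezone-aware datetime as a PDF date string."""
    offset = moment.utcoffset()
    if offset is None:
        raise ValueError("pdf_date requires a timezone-aware datetime")
    stamp = moment.strftime("D:%Y%m%d%H%M%S")
    if offset == timedelta(0):
        return stamp + "Z"  # UTC is written as a literal Z
    total_minutes = int(offset.total_seconds() // 60)
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    # Offsets are written as +hh'mm' or -hh'mm'
    return f"{stamp}{sign}{hours:02d}'{minutes:02d}'"


if __name__ == "__main__":
    eastern = timezone(timedelta(hours=-5))
    print(pdf_date(datetime(2021, 2, 7, 16, 15, 28, tzinfo=eastern)))
    print(pdf_date(datetime(2021, 2, 7, 21, 15, 28, tzinfo=timezone.utc)))
```

The two calls reproduce the example dates given above.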
| 977 | + | ||
| 978 | +.. _ref.advanced-parsing: | ||
| 979 | + | ||
| 980 | +Advanced Parsing Options | ||
| 981 | +------------------------ | ||
| 982 | + | ||
| 983 | +These options control aspects of how qpdf reads PDF files. Mostly these | ||
| 984 | +are of use to people who are working with damaged files. There is little | ||
| 985 | +reason to use these options unless you are trying to solve specific | ||
| 986 | +problems. The following options are available: | ||
| 987 | + | ||
| 988 | +:samp:`--suppress-recovery` | ||
| 989 | + Prevents qpdf from attempting to recover damaged files. | ||
| 990 | + | ||
| 991 | +:samp:`--ignore-xref-streams` | ||
| 992 | + Tells qpdf to ignore any cross-reference streams. | ||
| 993 | + | ||
| 994 | +Ordinarily, qpdf will attempt to recover from certain types of errors in | ||
| 995 | +PDF files. These include errors in the cross-reference table, certain | ||
| 996 | +types of object numbering errors, and certain types of stream length | ||
| 997 | +errors. Sometimes, qpdf may think it has recovered but may not have | ||
| 998 | +actually recovered, so care should be taken when using this option as | ||
| 999 | +some data loss is possible. The | ||
| 1000 | +:samp:`--suppress-recovery` option will prevent qpdf | ||
| 1001 | +from attempting recovery. In this case, it will fail on the first error | ||
| 1002 | +that it encounters. | ||
| 1003 | + | ||
| 1004 | +Ordinarily, qpdf reads cross-reference streams when they are present in | ||
| 1005 | +a PDF file. If :samp:`--ignore-xref-streams` is | ||
| 1006 | +specified, qpdf will ignore any cross-reference streams for hybrid PDF | ||
| 1007 | +files. The purpose of hybrid files is to make some content available to | ||
| 1008 | +viewers that are not aware of cross-reference streams. It is almost | ||
| 1009 | +never desirable to ignore them. The only time when you might want to use | ||
| 1010 | +this feature is if you are testing creation of hybrid PDF files and wish | ||
| 1011 | +to see how a PDF consumer that doesn't understand object and | ||
| 1012 | +cross-reference streams would interpret such a file. | ||
| 1013 | + | ||
| 1014 | +.. _ref.advanced-transformation: | ||
| 1015 | + | ||
| 1016 | +Advanced Transformation Options | ||
| 1017 | +------------------------------- | ||
| 1018 | + | ||
| 1019 | +These transformation options control fine points of how qpdf creates the | ||
| 1020 | +output file. Mostly these are of use only to people who are very | ||
| 1021 | +familiar with the PDF file format or who are PDF developers. The | ||
| 1022 | +following options are available: | ||
| 1023 | + | ||
| 1024 | +:samp:`--compress-streams={[yn]}` | ||
| 1025 | + By default, or with :samp:`--compress-streams=y`, | ||
| 1026 | + qpdf will compress any stream with no other filters applied to it | ||
| 1027 | + with the ``/FlateDecode`` filter when it writes it. To suppress this | ||
| 1028 | + behavior and preserve uncompressed streams as uncompressed, use | ||
| 1029 | + :samp:`--compress-streams=n`. | ||
| 1030 | + | ||
| 1031 | +:samp:`--decode-level={option}` | ||
| 1032 | + Controls which streams qpdf tries to decode. The default is | ||
| 1033 | + :samp:`generalized`. The following options are | ||
| 1034 | + available: | ||
| 1035 | + | ||
| 1036 | + - :samp:`none`: do not attempt to decode any streams | ||
| 1037 | + | ||
| 1038 | + - :samp:`generalized`: decode streams filtered with | ||
| 1039 | + supported generalized filters: ``/LZWDecode``, ``/FlateDecode``, | ||
| 1040 | + ``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define generalized | ||
| 1041 | + filters as those to be used for general-purpose compression or | ||
| 1042 | + encoding, as opposed to filters specifically designed for image | ||
| 1043 | + data. Note that, by default, streams already compressed with | ||
| 1044 | + ``/FlateDecode`` are not uncompressed and recompressed unless you | ||
| 1045 | + also specify :samp:`--recompress-flate`. | ||
| 1046 | + | ||
| 1047 | + - :samp:`specialized`: in addition to generalized, | ||
| 1048 | + decode streams with supported non-lossy specialized filters; | ||
| 1049 | + currently this is just ``/RunLengthDecode`` | ||
| 1050 | + | ||
| 1051 | + - :samp:`all`: in addition to generalized and | ||
| 1052 | + specialized, decode streams with supported lossy filters; | ||
| 1053 | + currently this is just ``/DCTDecode`` (JPEG) | ||
| 1054 | + | ||
| 1055 | +:samp:`--stream-data={option}` | ||
| 1056 | + Controls transformation of stream data. This option predates the | ||
| 1057 | + :samp:`--compress-streams` and | ||
| 1058 | + :samp:`--decode-level` options. Those options can be | ||
| 1059 | + used to achieve the same effect with more control. The value of | ||
| 1060 | + :samp:`{option}` may | ||
| 1061 | + be one of the following: | ||
| 1062 | + | ||
| 1063 | + - :samp:`compress`: recompress stream data when | ||
| 1064 | + possible (default); equivalent to | ||
| 1065 | + :samp:`--compress-streams=y` | ||
| 1066 | + :samp:`--decode-level=generalized`. Does not | ||
| 1067 | + recompress streams already compressed with ``/FlateDecode`` unless | ||
| 1068 | + :samp:`--recompress-flate` is also specified. | ||
| 1069 | + | ||
| 1070 | + - :samp:`preserve`: leave all stream data as is; | ||
| 1071 | + equivalent to :samp:`--compress-streams=n` | ||
| 1072 | + :samp:`--decode-level=none` | ||
| 1073 | + | ||
| 1074 | + - :samp:`uncompress`: uncompress stream data | ||
| 1075 | + compressed with generalized filters when possible; equivalent to | ||
| 1076 | + :samp:`--compress-streams=n` | ||
| 1077 | + :samp:`--decode-level=generalized` | ||
| 1078 | + | ||
| 1079 | +:samp:`--recompress-flate` | ||
| 1080 | + By default, streams already compressed with ``/FlateDecode`` are left | ||
| 1081 | + alone rather than being uncompressed and recompressed. This option | ||
| 1082 | + causes qpdf to uncompress and recompress the streams. There is a | ||
| 1083 | + significant performance cost to using this option, but you probably | ||
| 1084 | + want to use it if you specify | ||
| 1085 | + :samp:`--compression-level`. | ||
| 1086 | + | ||
| 1087 | +:samp:`--compression-level={level}` | ||
| 1088 | + When writing new streams that are compressed with ``/FlateDecode``, | ||
| 1089 | + use the specified compression level. The value of | ||
| 1090 | + :samp:`{level}` should be a number from 1 to 9 and is | ||
| 1091 | + passed directly to zlib, which implements deflate compression. Note | ||
| 1092 | + that qpdf doesn't uncompress and recompress streams by default. To | ||
| 1093 | + have this option apply to already compressed streams, you should also | ||
| 1094 | + specify :samp:`--recompress-flate`. If your goal is | ||
| 1095 | + to shrink the size of PDF files, you should also use | ||
| 1096 | + :samp:`--object-streams=generate`. | ||
| 1097 | + | ||
| 1098 | +:samp:`--normalize-content=[yn]` | ||
| 1099 | + Enables or disables normalization of content streams. Content | ||
| 1100 | + normalization is enabled by default in QDF mode. Please see :ref:`ref.qdf` for additional discussion of QDF mode. | ||
| 1101 | + | ||
| 1102 | +:samp:`--object-streams={mode}` | ||
| 1103 | + Controls handling of object streams. The value of | ||
| 1104 | + :samp:`{mode}` may be | ||
| 1105 | + one of the following: | ||
| 1106 | + | ||
| 1107 | + - :samp:`preserve`: preserve original object streams | ||
| 1108 | + (default) | ||
| 1109 | + | ||
| 1110 | + - :samp:`disable`: don't write any object streams | ||
| 1111 | + | ||
| 1112 | + - :samp:`generate`: use object streams wherever | ||
| 1113 | + possible | ||
| 1114 | + | ||
| 1115 | +:samp:`--preserve-unreferenced` | ||
| 1116 | + Tells qpdf to preserve objects that are not referenced when writing | ||
| 1117 | + the file. Ordinarily any object that is not referenced in a traversal | ||
| 1118 | + of the document from the trailer dictionary will be discarded. This | ||
| 1119 | + may be useful in working with some damaged files or inspecting files | ||
| 1120 | + with known unreferenced objects. | ||
| 1121 | + | ||
| 1122 | + This flag is ignored for linearized files and has the effect of | ||
| 1123 | + causing objects in the new file to be written in order by object ID | ||
| 1124 | + from the original file. This does not mean that object numbers will | ||
| 1125 | + be the same since qpdf may create stream lengths as direct or | ||
| 1126 | + indirect differently from the original file, and the original file | ||
| 1127 | + may have gaps in its numbering. | ||
| 1128 | + | ||
| 1129 | + See also :samp:`--preserve-unreferenced-resources`, | ||
| 1130 | + which does something completely different. | ||
| 1131 | + | ||
| 1132 | +:samp:`--remove-unreferenced-resources={option}` | ||
| 1133 | + The :samp:`{option}` may be ``auto``, | ||
| 1134 | + ``yes``, or ``no``. The default is ``auto``. | ||
| 1135 | + | ||
| 1136 | + Starting with qpdf 8.1, when splitting pages, qpdf is able to attempt | ||
| 1137 | + to remove images and fonts that are not used by a page even if they | ||
| 1138 | + are referenced in the page's resources dictionary. When shared | ||
| 1139 | + resources are in use, this behavior can greatly reduce the file sizes | ||
| 1140 | + of split pages, but the analysis is very slow. In versions from 8.1 | ||
| 1141 | + through 9.1.1, qpdf did this analysis by default. Starting in qpdf | ||
| 1142 | + 10.0.0, if ``auto`` is used, qpdf does a quick analysis of the file | ||
| 1143 | + to determine whether the file is likely to have unreferenced objects | ||
| 1144 | + on pages, a pattern that frequently occurs when resource dictionaries | ||
| 1145 | + are shared across multiple pages and rarely occurs otherwise. If it | ||
| 1146 | + discovers this pattern, then it will attempt to remove unreferenced | ||
| 1147 | + resources. Usually this means you get the slower splitting speed only | ||
| 1148 | + when it's actually going to create smaller files. You can suppress | ||
| 1149 | + removal of unreferenced resources altogether by specifying ``no`` or | ||
| 1150 | + force it to do the full algorithm by specifying ``yes``. | ||
| 1151 | + | ||
| 1152 | + Other than cases in which you don't care about file size and care a | ||
| 1153 | + lot about runtime, there are few reasons to use this option, | ||
| 1154 | + especially now that ``auto`` mode is supported. One reason to use | ||
| 1155 | + this is if you suspect that qpdf is removing resources it shouldn't | ||
| 1156 | + be removing. If you encounter that case, please report it as a bug at | ||
| 1157 | + https://github.com/qpdf/qpdf/issues/. | ||
| 1158 | + | ||
| 1159 | +:samp:`--preserve-unreferenced-resources` | ||
| 1160 | + This is a synonym for | ||
| 1161 | + :samp:`--remove-unreferenced-resources=no`. | ||
| 1162 | + | ||
| 1163 | + See also :samp:`--preserve-unreferenced`, which does | ||
| 1164 | + something completely different. | ||
| 1165 | + | ||
| 1166 | +:samp:`--newline-before-endstream` | ||
| 1167 | + Tells qpdf to insert a newline before the ``endstream`` keyword, not | ||
| 1168 | + counted in the length, after any stream content even if the last | ||
| 1169 | + character of the stream was a newline. This may result in two | ||
| 1170 | + newlines in some cases. This is a requirement of PDF/A. While qpdf | ||
| 1171 | + doesn't specifically know how to generate PDF/A-compliant PDFs, this | ||
| 1172 | + at least prevents it from removing compliance on already compliant | ||
| 1173 | + files. | ||
| 1174 | + | ||
| 1175 | +:samp:`--linearize-pass1={file}` | ||
| 1176 | + Write the first pass of linearization to the named file. The | ||
| 1177 | + resulting file is not a valid PDF file. This option is useful only | ||
| 1178 | + for debugging ``QPDFWriter``'s linearization code. When qpdf | ||
| 1179 | + linearizes files, it writes the file in two passes, using the first | ||
| 1180 | + pass to calculate sizes and offsets that are required for hint tables | ||
| 1181 | + and the linearization dictionary. Ordinarily, the first pass is | ||
| 1182 | + discarded. This option enables it to be captured. | ||
| 1183 | + | ||
| 1184 | +:samp:`--coalesce-contents` | ||
| 1185 | + When a page's contents are split across multiple streams, this option | ||
| 1186 | + causes qpdf to combine them into a single stream. Use of this option | ||
| 1187 | + is never necessary for ordinary usage, but it can help when working | ||
| 1188 | + with some files in some cases. For example, this can also be combined | ||
| 1189 | + with QDF mode or content normalization to make it easier to look at | ||
| 1190 | + all of a page's contents at once. | ||
| 1191 | + | ||
| 1192 | +:samp:`--flatten-annotations={option}` | ||
| 1193 | + This option collapses annotations into the pages' contents with | ||
| 1194 | + special handling for form fields. Ordinarily, an annotation is | ||
| 1195 | + rendered separately and on top of the page. Combining annotations | ||
| 1196 | + into the page's contents effectively freezes the placement of the | ||
| 1197 | + annotations, making them look right after various page | ||
| 1198 | + transformations. The library functionality backing this option was | ||
| 1199 | + added for the benefit of programs that want to create *n-up* page | ||
| 1200 | + layouts and other similar things that don't work well with | ||
| 1201 | + annotations. The :samp:`{option}` parameter | ||
| 1202 | + may be any of the following: | ||
| 1203 | + | ||
| 1204 | + - :samp:`all`: include all annotations that are not | ||
| 1205 | + marked invisible or hidden | ||
| 1206 | + | ||
| 1207 | + - :samp:`print`: only include annotations that | ||
| 1208 | + indicate that they should appear when the page is printed | ||
| 1209 | + | ||
| 1210 | + - :samp:`screen`: omit annotations that indicate | ||
| 1211 | + they should not appear on the screen | ||
| 1212 | + | ||
| 1213 | + Note that form fields are special because the annotations that are | ||
| 1214 | + used to render filled-in form fields may become out of date from the | ||
| 1215 | + fields' values if the form is filled in by a program that doesn't | ||
| 1216 | + know how to update the appearances. If qpdf detects this case, its | ||
| 1217 | + default behavior is not to flatten those annotations because doing so | ||
| 1218 | + would cause the value of the form field to be lost. This gives you a | ||
| 1219 | + chance to go back and resave the form with a program that knows how | ||
| 1220 | + to generate appearances. QPDF itself can generate appearances with | ||
| 1221 | + some limitations. See the | ||
| 1222 | + :samp:`--generate-appearances` option below. | ||
| 1223 | + | ||
| 1224 | +:samp:`--generate-appearances` | ||
| 1225 | + If a file contains interactive form fields and indicates that the | ||
| 1226 | + appearances are out of date with the values of the form, this flag | ||
| 1227 | + will regenerate appearances, subject to a few limitations. Note that | ||
| 1228 | + there is not usually a reason to do this, but it can be necessary | ||
| 1229 | + before using the :samp:`--flatten-annotations` | ||
| 1230 | + option. Most of these limitations are not a problem with well-behaved PDF files. | ||
| 1231 | + The limitations are as follows: | ||
| 1232 | + | ||
| 1233 | + - Radio button and checkbox appearances use the pre-set values in | ||
| 1234 | + the PDF file. QPDF just makes sure that the correct appearance is | ||
| 1235 | + displayed based on the value of the field. This is fine for PDF | ||
| 1236 | + files that create their forms properly. Some PDF writers save | ||
| 1237 | + appearances for fields when they change, which could cause some | ||
| 1238 | + controls to have inconsistent appearances. | ||
| 1239 | + | ||
| 1240 | + - For text fields and list boxes, any characters that fall outside | ||
| 1241 | + of US-ASCII or, if detected, "Windows ANSI" or "Mac Roman" | ||
| 1242 | + encoding, will be replaced by the ``?`` character. | ||
| 1243 | + | ||
| 1244 | + - Quadding is ignored. Quadding is used to specify whether the | ||
| 1245 | + contents of a field should be left, center, or right aligned with | ||
| 1246 | + the field. | ||
| 1247 | + | ||
| 1248 | + - Rich text, multi-line, and other more elaborate formatting | ||
| 1249 | + directives are ignored. | ||
| 1250 | + | ||
| 1251 | + - There is no support for multi-select fields or signature fields. | ||
| 1252 | + | ||
| 1253 | + If qpdf doesn't do a good enough job with your form, use an external | ||
| 1254 | + application to save your filled-in form before processing it with | ||
| 1255 | + qpdf. | ||
| 1256 | + | ||
| 1257 | +:samp:`--optimize-images` | ||
| 1258 | + This flag causes qpdf to recompress all images that are not | ||
| 1259 | + compressed with DCT (JPEG) using DCT compression as long as doing so | ||
| 1260 | + decreases the size in bytes of the image data and the image does not | ||
| 1261 | + fall below minimum specified dimensions. Useful information is | ||
| 1262 | + provided when used in combination with | ||
| 1263 | + :samp:`--verbose`. See also the | ||
| 1264 | + :samp:`--oi-min-width`, | ||
| 1265 | + :samp:`--oi-min-height`, and | ||
| 1266 | + :samp:`--oi-min-area` options. By default, starting | ||
| 1267 | + in qpdf 8.4, inline images are converted to regular images and | ||
| 1268 | + optimized as well. Use :samp:`--keep-inline-images` | ||
| 1269 | + to prevent inline images from being included. | ||
| 1270 | + | ||
| 1271 | +:samp:`--oi-min-width={width}` | ||
| 1272 | + Avoid optimizing images whose width is below the specified amount. If | ||
| 1273 | + omitted, the default is 128 pixels. Use 0 for no minimum. | ||
| 1274 | + | ||
| 1275 | +:samp:`--oi-min-height={height}` | ||
| 1276 | + Avoid optimizing images whose height is below the specified amount. | ||
| 1277 | + If omitted, the default is 128 pixels. Use 0 for no minimum. | ||
| 1278 | + | ||
| 1279 | +:samp:`--oi-min-area={area-in-pixels}` | ||
| 1280 | + Avoid optimizing images whose pixel count (width × height) is below | ||
| 1281 | + the specified amount. If omitted, the default is 16,384 pixels. Use 0 | ||
| 1282 | + for no minimum. | ||
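The three size thresholds above can be read as independent filters. The following is a minimal Python sketch of that reading, assuming an image is skipped as soon as any one measurement falls below its threshold. The function name `passes_size_thresholds` is hypothetical and is not part of qpdf; only the default values come from the option descriptions.

```python
def passes_size_thresholds(width, height,
                           min_width=128, min_height=128, min_area=16384):
    """Return True if an image is large enough to be considered.

    Defaults mirror the documented defaults of --oi-min-width,
    --oi-min-height, and --oi-min-area. Passing 0 for a threshold
    disables that check, since no dimension can be below 0.
    """
    if width < min_width:
        return False
    if height < min_height:
        return False
    if width * height < min_area:
        return False
    return True
```

Under this sketch, a 100 × 400 image is skipped with the defaults because of its width, even though its area exceeds 16,384 pixels.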
| 1283 | + | ||
| 1284 | +:samp:`--externalize-inline-images` | ||
| 1285 | + Convert inline images to regular images. By default, images whose | ||
| 1286 | + data is at least 1,024 bytes are converted when this option is | ||
| 1287 | + selected. Use :samp:`--ii-min-bytes` to change the | ||
| 1288 | + size threshold. This option is implicitly selected when | ||
| 1289 | + :samp:`--optimize-images` is selected. Use | ||
| 1290 | + :samp:`--keep-inline-images` to exclude inline images | ||
| 1291 | + from image optimization. | ||
| 1292 | + | ||
| 1293 | +:samp:`--ii-min-bytes={bytes}` | ||
| 1294 | + Avoid converting inline images whose size is below the specified | ||
| 1295 | + minimum size to regular images. If omitted, the default is 1,024 | ||
| 1296 | + bytes. Use 0 for no minimum. | ||
| 1297 | + | ||
| 1298 | +:samp:`--keep-inline-images` | ||
| 1299 | + Prevent inline images from being included in image optimization. This | ||
| 1300 | + option has no effect when :samp:`--optimize-images` | ||
| 1301 | + is not specified. | ||
| 1302 | + | ||
| 1303 | +:samp:`--remove-page-labels` | ||
| 1304 | + Remove page labels from the output file. | ||
| 1305 | + | ||
| 1306 | +:samp:`--qdf` | ||
| 1307 | + Turns on QDF mode. For additional information on QDF, please see :ref:`ref.qdf`. Note that :samp:`--linearize` | ||
| 1308 | + disables QDF mode. | ||
| 1309 | + | ||
| 1310 | +:samp:`--min-version={version}` | ||
| 1311 | + Forces the PDF version of the output file to be at least | ||
| 1312 | + :samp:`{version}`. In other words, if the | ||
| 1313 | + input file has a lower version than the specified version, the | ||
| 1314 | + specified version will be used. If the input file has a higher | ||
| 1315 | + version, the input file's original version will be used. It is seldom | ||
| 1316 | + necessary to use this option since qpdf will automatically increase | ||
| 1317 | + the version as needed when adding features that require newer PDF | ||
| 1318 | + readers. | ||
| 1319 | + | ||
| 1320 | + The version number may be expressed in the form | ||
| 1321 | + :samp:`{major.minor.extension-level}`, in | ||
| 1322 | + which case the version is interpreted as | ||
| 1323 | + :samp:`{major.minor}` at extension level | ||
| 1324 | + :samp:`{extension-level}`. For example, | ||
| 1325 | + version ``1.7.8`` represents version 1.7 at extension level 8. Note | ||
| 1326 | + that minimal syntax checking is done on the command line. | ||
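The version rule for this option can be sketched in a few lines of Python. This is an illustration only: `parse_pdf_version` and `output_version` are hypothetical helper names, and treating a missing extension level as 0 is an assumption made for the sake of comparison, not something the option description states.

```python
def parse_pdf_version(text):
    """Split 'major.minor' or 'major.minor.extension-level' into a tuple."""
    parts = [int(p) for p in text.split(".")]
    if len(parts) == 2:
        parts.append(0)  # assumption: no extension level means level 0
    if len(parts) != 3:
        raise ValueError("expected major.minor[.extension-level]: " + text)
    return tuple(parts)


def output_version(input_version, min_version):
    """Pick the higher of the input file's version and the requested minimum."""
    return max(parse_pdf_version(input_version),
               parse_pdf_version(min_version))
```

For example, `parse_pdf_version("1.7.8")` yields `(1, 7, 8)`, which corresponds to version 1.7 at extension level 8.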
| 1327 | + | ||
| 1328 | +:samp:`--force-version={version}` | ||
| 1329 | + This option forces the PDF version to be the exact version specified | ||
| 1330 | + *even when the file may have content that is not supported in that | ||
| 1331 | + version*. The version number is interpreted in the same way as with | ||
| 1332 | + :samp:`--min-version` so that extension levels can be | ||
| 1333 | + set. In some cases, forcing the output file's PDF version to be lower | ||
| 1334 | + than that of the input file will cause qpdf to disable certain | ||
| 1335 | + features of the document. Specifically, 256-bit keys are disabled if | ||
| 1336 | + the version is less than 1.7 with extension level 8 (except R5 is | ||
| 1337 | + disabled if less than 1.7 with extension level 3), AES encryption is | ||
| 1338 | + disabled if the version is less than 1.6, cleartext metadata and | ||
| 1339 | + object streams are disabled if less than 1.5, 128-bit encryption keys | ||
| 1340 | + are disabled if less than 1.4, and all encryption is disabled if less | ||
| 1341 | + than 1.3. Even with these precautions, qpdf won't be able to do | ||
| 1342 | + things like eliminate use of newer image compression schemes, | ||
| 1343 | + transparency groups, or other features that may have been added in | ||
| 1344 | + more recent versions of PDF. | ||
| 1345 | + | ||
| 1346 | + As a general rule, with the exception of big structural things like | ||
| 1347 | + the use of object streams or AES encryption, PDF viewers are supposed | ||
| 1348 | + to ignore features in files that they don't support from newer | ||
| 1349 | + versions. This means that forcing the version to a lower version may | ||
| 1350 | + make it possible to open your PDF file with an older version, though | ||
| 1351 | + bear in mind that some of the original document's functionality may | ||
| 1352 | + be lost. | ||
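The version cutoffs listed for :samp:`--force-version` can be summarized as a lookup. The Python sketch below restates only the thresholds given in the text; the function name `disabled_features` and the feature labels are hypothetical, and qpdf's internal logic may be organized differently.

```python
def disabled_features(major, minor, extension_level=0):
    """List the features the text says are disabled at a forced version."""
    version = (major, minor, extension_level)
    disabled = []
    if version < (1, 7, 8):
        disabled.append("256-bit keys (other than R5)")
    if version < (1, 7, 3):
        disabled.append("256-bit keys (R5)")
    if version < (1, 6, 0):
        disabled.append("AES encryption")
    if version < (1, 5, 0):
        disabled.append("cleartext metadata")
        disabled.append("object streams")
    if version < (1, 4, 0):
        disabled.append("128-bit encryption keys")
    if version < (1, 3, 0):
        disabled.append("all encryption")
    return disabled
```

Forcing version 1.5, for instance, would leave object streams available but rule out AES encryption and 256-bit keys.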
| 1353 | + | ||
| 1354 | +By default, when a stream is encoded using non-lossy filters that qpdf | ||
| 1355 | +understands and is not already compressed using a good compression | ||
| 1356 | +scheme, qpdf will uncompress and recompress streams. Assuming proper | ||
| 1357 | +filter implementations, this is safe and generally results in smaller files. | ||
| 1358 | +This behavior may also be explicitly requested with | ||
| 1359 | +:samp:`--stream-data=compress`. | ||
| 1360 | + | ||
| 1361 | +When :samp:`--normalize-content=y` is specified, qpdf | ||
| 1362 | +will attempt to normalize whitespace and newlines in page content | ||
| 1363 | +streams. This is generally safe but could, in some cases, cause damage | ||
| 1364 | +to the content streams. This option is intended for people who wish to | ||
| 1365 | +study PDF content streams or to debug PDF content. You should not use | ||
| 1366 | +this for "production" PDF files. | ||
| 1367 | + | ||
| 1368 | +When normalizing content, if qpdf runs into any lexical errors, it will | ||
| 1369 | +print a warning indicating that content may be damaged. The only | ||
| 1370 | +situation in which qpdf is known to cause damage during content | ||
| 1371 | +normalization is when a page's contents are split across multiple | ||
| 1372 | +streams and streams are split in the middle of a lexical token such as a | ||
| 1373 | +string, name, or inline image. Note that files that do this are invalid | ||
| 1374 | +since the PDF specification states that content streams are not to be | ||
| 1375 | +split in the middle of a token. If you want to inspect the original | ||
| 1376 | +content streams in an uncompressed format, you can always run with | ||
| 1377 | +:samp:`--qdf --normalize-content=n` for a QDF file | ||
| 1378 | +without content normalization, or alternatively | ||
| 1379 | +:samp:`--stream-data=uncompress` for a regular non-QDF | ||
| 1380 | +mode file with uncompressed streams. These will both uncompress all the | ||
| 1381 | +streams but will not attempt to normalize content. Please note that if | ||
| 1382 | +you are using content normalization or QDF mode for the purpose of | ||
| 1383 | +manually inspecting files, you don't have to care about this. | ||
| 1384 | + | ||
| 1385 | +Object streams, also known as compressed objects, were introduced into | ||
| 1386 | +the PDF specification at version 1.5, corresponding to Acrobat 6. Some | ||
| 1387 | +older PDF viewers may not support files with object streams. qpdf can be | ||
| 1388 | +used to transform files with object streams to files without object | ||
| 1389 | +streams or vice versa. As mentioned above, there are three object stream | ||
| 1390 | +modes: :samp:`preserve`, | ||
| 1391 | +:samp:`disable`, and :samp:`generate`. | ||
| 1392 | + | ||
| 1393 | +In :samp:`preserve` mode, the relationship between objects | ||
| 1394 | +and the streams that contain them is preserved from the original file. | ||
| 1395 | +In :samp:`disable` mode, all objects are written as | ||
| 1396 | +regular, uncompressed objects. The resulting file should be readable by | ||
| 1397 | +older PDF viewers. (Of course, the content of the files may include | ||
| 1398 | +features not supported by older viewers, but at least the structure will | ||
| 1399 | +be supported.) In :samp:`generate` mode, qpdf will | ||
| 1400 | +create its own object streams. This will usually result in more compact | ||
| 1401 | +PDF files, though they may not be readable by older viewers. In this | ||
| 1402 | +mode, qpdf will also make sure the PDF version number in the header is | ||
| 1403 | +at least 1.5. | ||
| 1404 | + | ||
| 1405 | +The :samp:`--qdf` flag turns on QDF mode, which changes | ||
| 1406 | +some of the defaults described above. Specifically, in QDF mode, by | ||
| 1407 | +default, stream data is uncompressed, content streams are normalized, | ||
| 1408 | +and encryption is removed. These defaults can still be overridden by | ||
| 1409 | +specifying the appropriate options as described above. Additionally, in | ||
| 1410 | +QDF mode, stream lengths are stored as indirect objects, objects are | ||
| 1411 | +laid out in a less efficient but more readable fashion, and the | ||
| 1412 | +documents are interspersed with comments that make it easier for the | ||
| 1413 | +user to find things and also make it possible for | ||
| 1414 | +:command:`fix-qdf` to work properly. QDF mode is intended | ||
| 1415 | +for people, mostly developers, who wish to inspect or modify PDF files | ||
| 1416 | +in a text editor. For details, please see :ref:`ref.qdf`. | ||
| 1417 | + | ||
| 1418 | +.. _ref.testing-options: | ||
| 1419 | + | ||
| 1420 | +Testing, Inspection, and Debugging Options | ||
| 1421 | +------------------------------------------ | ||
| 1422 | + | ||
| 1423 | +These options can be useful for digging into PDF files or for use in | ||
| 1424 | +automated test suites for software that uses the qpdf library. When any | ||
| 1425 | +of the options in this section are specified, no output file should be | ||
| 1426 | +given. The following options are available: | ||
| 1427 | + | ||
| 1428 | +:samp:`--deterministic-id` | ||
| 1429 | + Causes generation of a deterministic value for /ID. This prevents use | ||
| 1430 | + of timestamp and output file name information in the /ID generation. | ||
| 1431 | + Instead, at some slight additional runtime cost, the /ID field is | ||
| 1432 | + generated to include a digest of the significant parts of the content | ||
| 1433 | + of the output PDF file. This means that a given qpdf operation should | ||
| 1434 | + generate the same /ID each time it is run, which can be useful when | ||
| 1435 | + caching results or for generation of some test data. Use of this flag | ||
| 1436 | + is not compatible with creation of encrypted files. | ||
| 1437 | + | ||
| 1438 | +:samp:`--static-id` | ||
| 1439 | + Causes generation of a fixed value for /ID. This is intended for | ||
| 1440 | + testing only. Never use it for production files. If you are trying to | ||
| 1441 | + get the same /ID each time for a given file and you are not | ||
| 1442 | + generating encrypted files, consider using the | ||
| 1443 | + :samp:`--deterministic-id` option. | ||
| 1444 | + | ||
| 1445 | +:samp:`--static-aes-iv` | ||
| 1446 | + Causes use of a static initialization vector for AES-CBC. This is | ||
| 1447 | + intended for testing only so that output files can be reproducible. | ||
| 1448 | + Never use it for production files. This option in particular is not | ||
| 1449 | + secure since it significantly weakens the encryption. | ||
| 1450 | + | ||
| 1451 | +:samp:`--no-original-object-ids` | ||
| 1452 | + Suppresses inclusion of original object ID comments in QDF files. | ||
| 1453 | + This can be useful when generating QDF files for test purposes, | ||
| 1454 | + particularly when comparing them to determine whether two PDF files | ||
| 1455 | + have identical content. | ||
| 1456 | + | ||
| 1457 | +:samp:`--show-encryption` | ||
| 1458 | + Shows document encryption parameters. Also shows the document's user | ||
| 1459 | + password if the owner password is given. | ||
| 1460 | + | ||
| 1461 | +:samp:`--show-encryption-key` | ||
| 1462 | + When encryption information is being displayed, as when | ||
| 1463 | + :samp:`--check` or | ||
| 1464 | + :samp:`--show-encryption` is given, display the | ||
| 1465 | + computed or retrieved encryption key as a hexadecimal string. This | ||
| 1466 | + value is not ordinarily useful to users, but it can be used as the | ||
| 1467 | + argument to :samp:`--password` if the | ||
| 1468 | + :samp:`--password-is-hex-key` is specified. Note | ||
| 1469 | + that, when PDF files are encrypted, passwords and other metadata are | ||
| 1470 | + used only to compute an encryption key, and the encryption key is | ||
| 1471 | + what is actually used for encryption. This enables retrieval of that | ||
| 1472 | + key. | ||
| 1473 | + | ||
| 1474 | +:samp:`--check-linearization` | ||
| 1475 | + Checks file integrity and linearization status. | ||
| 1476 | + | ||
| 1477 | +:samp:`--show-linearization` | ||
| 1478 | + Checks and displays all data in the linearization hint tables. | ||
| 1479 | + | ||
| 1480 | +:samp:`--show-xref` | ||
| 1481 | + Shows the contents of the cross-reference table in a human-readable | ||
| 1482 | + form. This is especially useful for files with cross-reference | ||
| 1483 | + streams, which are stored in a binary format. | ||
| 1484 | + | ||
| 1485 | +:samp:`--show-object=trailer|obj[,gen]` | ||
| 1486 | + Show the contents of the given object. This is especially useful for | ||
| 1487 | + inspecting objects that are inside of object streams (also known as | ||
| 1488 | + "compressed objects"). | ||
| 1489 | + | ||
| 1490 | +:samp:`--raw-stream-data` | ||
| 1491 | + When used along with the :samp:`--show-object` | ||
| 1492 | + option, if the object is a stream, shows the raw stream data instead | ||
| 1493 | + of the object's contents. | ||
| 1494 | + | ||
| 1495 | +:samp:`--filtered-stream-data` | ||
| 1496 | + When used along with the :samp:`--show-object` | ||
| 1497 | + option, if the object is a stream, shows the filtered stream data | ||
| 1498 | + instead of the object's contents. If the stream is filtered using filters | ||
| 1499 | + that qpdf does not support, an error will be issued. | ||
| 1500 | + | ||
| 1501 | +:samp:`--show-npages` | ||
| 1502 | + Prints the number of pages in the input file on a line by itself. | ||
| 1503 | + Since the number of pages appears by itself on a line, this option | ||
| 1504 | + can be useful for scripting if you need to know the number of pages | ||
| 1505 | + in a file. | ||
| 1506 | + | ||
| 1507 | +:samp:`--show-pages` | ||
| 1508 | + Shows the object and generation number for each page dictionary | ||
| 1509 | + object and for each content stream associated with the page. Having | ||
| 1510 | + this information makes it more convenient to inspect objects from a | ||
| 1511 | + particular page. | ||
| 1512 | + | ||
| 1513 | +:samp:`--with-images` | ||
| 1514 | + When used along with :samp:`--show-pages`, also shows | ||
| 1515 | + the object and generation numbers for the image objects on each page. | ||
| 1516 | + (At present, information about images in shared resource dictionaries | ||
| 1517 | + is not output by this command. This is discussed in a comment in the | ||
| 1518 | + source code.) | ||
| 1519 | + | ||
| 1520 | +:samp:`--json` | ||
| 1521 | + Generate a JSON representation of the file. This is described in | ||
| 1522 | + depth in :ref:`ref.json`. | ||
| 1523 | + | ||
| 1524 | +:samp:`--json-help` | ||
| 1525 | + Describe the format of the JSON output. | ||
| 1526 | + | ||
| 1527 | +:samp:`--json-key=key` | ||
| 1528 | + This option is repeatable. If specified, only top-level keys | ||
| 1529 | + specified will be included in the JSON output. If not specified, all | ||
| 1530 | + keys will be shown. | ||
| 1531 | + | ||
| 1532 | +:samp:`--json-object=trailer|obj[,gen]` | ||
| 1533 | + This option is repeatable. If specified, only specified objects will | ||
| 1534 | + be shown in the "``objects``" key of the JSON output. If absent, all | ||
| 1535 | + objects will be shown. | ||
| 1536 | + | ||
| 1537 | +:samp:`--check` | ||
| 1538 | + Checks file structure as well as encryption, linearization, and | ||
| 1539 | + encoding of stream data. A file for which | ||
| 1540 | + :samp:`--check` reports no errors may still have | ||
| 1541 | + errors in stream data content but should otherwise be structurally | ||
| 1542 | + sound. If :samp:`--check` finds any errors, qpdf will exit | ||
| 1543 | + with a status of 2. There are some recoverable conditions that | ||
| 1544 | + :samp:`--check` detects. These are issued as warnings | ||
| 1545 | + instead of errors. If qpdf finds no errors but finds warnings, it | ||
| 1546 | + will exit with a status of 3 (as of version 2.0.4). When | ||
| 1547 | + :samp:`--check` is combined with other options, | ||
| 1548 | + checks are always performed before any other options are processed. | ||
| 1549 | + For erroneous files, :samp:`--check` will cause qpdf | ||
| 1550 | + to attempt to recover, after which other options are effectively | ||
| 1551 | + operating on the recovered file. Combining | ||
| 1552 | + :samp:`--check` with other options in this way can be | ||
| 1553 | + useful for manually recovering severely damaged files. Note that | ||
| 1554 | + :samp:`--check` produces no output to standard output | ||
| 1555 | + when everything is valid, so if you are using this to | ||
| 1556 | + programmatically validate files in bulk, it is safe to run without | ||
| 1557 | + output redirected to :file:`/dev/null` and just | ||
| 1558 | + check for a 0 exit code. | ||
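For bulk validation, the exit statuses described for :samp:`--check` can be mapped to outcomes in a small wrapper. The following Python sketch assumes a `qpdf` executable is on the search path; the helper names `classify_check_status` and `run_check` are hypothetical, and any status other than 0, 2, or 3 is reported as unknown rather than guessed at.

```python
import subprocess


def classify_check_status(code):
    """Translate a qpdf --check exit status into a short label."""
    if code == 0:
        return "ok"        # no errors and no warnings
    if code == 2:
        return "errors"    # --check found errors
    if code == 3:
        return "warnings"  # no errors, but recoverable conditions
    return "unknown"


def run_check(path):
    """Run qpdf --check on one file and classify the result."""
    result = subprocess.run(
        ["qpdf", "--check", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return classify_check_status(result.returncode)
```

A calling script could then collect every file for which `run_check` returns something other than `"ok"` and inspect those by hand.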
| 1559 | + | ||
| 1560 | +The :samp:`--raw-stream-data` and | ||
| 1561 | +:samp:`--filtered-stream-data` options are ignored | ||
| 1562 | +unless :samp:`--show-object` is given. Either of these | ||
| 1563 | +options will cause the stream data to be written to standard output. In | ||
| 1564 | +order to avoid commingling of stream data with other output, it is | ||
| 1565 | +recommended that these options not be combined with other test/inspection | ||
| 1566 | +options. | ||
| 1567 | + | ||
| 1568 | +If :samp:`--filtered-stream-data` is given and | ||
| 1569 | +:samp:`--normalize-content=y` is also given, qpdf will | ||
| 1570 | +attempt to normalize the stream data as if it is a page content stream. | ||
| 1571 | +This attempt will be made even if it is not a page content stream, in | ||
| 1572 | +which case it will produce unusable results. | ||
| 1573 | + | ||
| 1574 | +.. _ref.unicode-passwords: | ||
| 1575 | + | ||
| 1576 | +Unicode Passwords | ||
| 1577 | +----------------- | ||
| 1578 | + | ||
| 1579 | +At the library API level, all methods that perform encryption and | ||
| 1580 | +decryption interpret passwords as strings of bytes. It is up to the | ||
| 1581 | +caller to ensure that they are appropriately encoded. Starting with qpdf | ||
| 1582 | +version 8.4.0, qpdf will attempt to make this easier for you when | ||
| 1583 | +you interact with qpdf via its command line interface. The PDF specification | ||
| 1584 | +requires passwords used to encrypt files with 40-bit or 128-bit | ||
| 1585 | +encryption to be encoded with PDF Doc encoding. This encoding is a | ||
| 1586 | +single-byte encoding that supports ISO-Latin-1 and a handful of other | ||
| 1587 | +commonly used characters. It has a large overlap with Windows ANSI but | ||
| 1588 | +is not exactly the same. There is generally not a way to provide PDF Doc | ||
| 1589 | +encoded strings on the command line. As such, qpdf versions prior to | ||
| 1590 | +8.4.0 would often create PDF files that couldn't be opened with other | ||
| 1591 | +software when given a password with non-ASCII characters to encrypt a | ||
| 1592 | +file with 40-bit or 128-bit encryption. Starting with qpdf 8.4.0, qpdf | ||
| 1593 | +recognizes the encoding of the parameter and transcodes it as needed. | ||
| 1594 | +The rest of this section provides the details about exactly how qpdf | ||
| 1595 | +behaves. Most users will not need to know this information, but it might | ||
| 1596 | +be useful if you have been working around qpdf's old behavior or if you | ||
| 1597 | +are using qpdf to generate encrypted files for testing other PDF | ||
| 1598 | +software. | ||
| 1599 | + | ||
| 1600 | +A note about Windows: when qpdf builds, it attempts to determine what it | ||
| 1601 | +has to do to use ``wmain`` instead of ``main`` on Windows. The ``wmain`` | ||
| 1602 | +function is an alternative entry point that receives all arguments as | ||
| 1603 | +UTF-16-encoded strings. When qpdf starts up this way, it converts all | ||
| 1604 | +the strings to UTF-8 encoding and then invokes the regular main. This | ||
| 1605 | +means that, as far as qpdf is concerned, it receives its command-line | ||
| 1606 | +arguments with UTF-8 encoding, just as it would in any modern Linux or | ||
| 1607 | +UNIX environment. | ||
| 1608 | + | ||
| 1609 | +If a file is being encrypted with 40-bit or 128-bit encryption and the | ||
| 1610 | +supplied password is not a valid UTF-8 string, qpdf will fall back to | ||
| 1611 | +the behavior of interpreting the password as a string of bytes. If you | ||
| 1612 | +have old scripts that encrypt files by passing the output of | ||
| 1613 | +:command:`iconv` to qpdf, you no longer need to do that, | ||
| 1614 | +but if you do, qpdf should still work. The only exception would be for | ||
| 1615 | +the extremely unlikely case of a password that is encoded with a | ||
| 1616 | +single-byte encoding but also happens to be valid UTF-8. Such a password | ||
| 1617 | +would contain strings of even numbers of characters that alternate | ||
| 1618 | +between accented letters and symbols. In the extremely unlikely event | ||
| 1619 | +that you are intentionally using such passwords and qpdf is thwarting | ||
| 1620 | +you by interpreting them as UTF-8, you can use | ||
| 1621 | +:samp:`--password-mode=bytes` to suppress qpdf's | ||
| 1622 | +automatic behavior. | ||
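The fallback described above amounts to a UTF-8 validity test. The Python sketch below illustrates the idea and shows why a single-byte password can occasionally be mistaken for UTF-8. It is not qpdf's implementation, and the function name `interpret_password` is hypothetical.

```python
def interpret_password(raw):
    """Treat raw bytes as Unicode if they are valid UTF-8, else as bytes."""
    try:
        return ("unicode", raw.decode("utf-8"))
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to using the bytes unchanged.
        return ("bytes", raw)


# A lone Latin-1 "e with acute" (0xE9) is not valid UTF-8, so it stays bytes.
latin1_password = "\u00e9".encode("latin-1")

# The two bytes 0xC3 0xA9 are valid UTF-8 for the same letter, but they are
# also a legal Latin-1 string: an accented letter followed by a symbol.
ambiguous_password = b"\xc3\xa9"
```

The second value is the kind of ambiguous password for which :samp:`--password-mode=bytes` exists.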
| 1623 | + | ||
| 1624 | +The :samp:`--password-mode` option, as described earlier | ||
| 1625 | +in this chapter, can be used to change qpdf's interpretation of supplied | ||
| 1626 | +passwords. There are very few reasons to use this option. One would be | ||
| 1627 | +the unlikely case described in the previous paragraph in which the | ||
| 1628 | +supplied password happens to be valid UTF-8 but isn't supposed to be | ||
| 1629 | +UTF-8. Your best bet would be just to provide the password as a valid | ||
| 1630 | +UTF-8 string, but you could also use | ||
| 1631 | +:samp:`--password-mode=bytes`. Another reason to use | ||
| 1632 | +:samp:`--password-mode=bytes` would be to intentionally | ||
| 1633 | +generate PDF files encrypted with passwords that are not properly | ||
| 1634 | +encoded. The qpdf test suite does this to generate invalid files for the | ||
| 1635 | +purpose of testing its password recovery capability. If you were trying | ||
| 1636 | +to create intentionally incorrect files for similar purposes, the | ||
| 1637 | +:samp:`bytes` password mode can enable you to do this. | ||
| 1638 | + | ||
| 1639 | +When qpdf attempts to decrypt a file with a password that contains | ||
| 1640 | +non-ASCII characters, it will generate a list of alternative passwords | ||
| 1641 | +by attempting to interpret the password as each of a handful of | ||
| 1642 | +different coding systems and then transcode them to the required format. | ||
| 1643 | +This helps to compensate for the supplied password being given in the | ||
| 1644 | +wrong coding system, such as would happen if you used the | ||
| 1645 | +:command:`iconv` workaround that was previously needed. | ||
| 1646 | +It also generates passwords by doing the reverse operation: translating | ||
| 1647 | +from correct to incorrect encoding of the password. This would enable | ||
| 1648 | +qpdf to decrypt files using passwords that were improperly encoded by | ||
| 1649 | +whatever software encrypted the files, including older versions of qpdf | ||
| 1650 | +invoked without properly encoded passwords. The combination of these two | ||
| 1651 | +recovery methods should make qpdf transparently open most encrypted | ||
| 1652 | +files with the password supplied correctly but in the wrong coding | ||
| 1653 | +system. There are no real downsides to this behavior, but if you don't | ||
| 1654 | +want qpdf to do this, you can use the | ||
| 1655 | +:samp:`--suppress-password-recovery` option. One reason | ||
| 1656 | +to do that is to ensure that you know the exact password that was used | ||
| 1657 | +to encrypt the file. | ||
| 1658 | + | ||
| 1659 | +With these changes, qpdf now generates compliant passwords in most | ||
| 1660 | +cases. There are still some exceptions. In particular, the PDF | ||
| 1661 | +specification directs compliant writers to normalize Unicode passwords | ||
| 1662 | +and to perform certain transformations on passwords with bidirectional | ||
| 1663 | +text. Implementing this functionality requires using a real Unicode | ||
| 1664 | +library like ICU. If a client application that uses qpdf wants to do | ||
| 1665 | +this, the qpdf library will accept the resulting passwords, but qpdf | ||
| 1666 | +will not perform these transformations itself. It is possible that this | ||
| 1667 | +will be addressed in a future version of qpdf. The ``QPDFWriter`` | ||
| 1668 | +methods that enable encryption on the output file accept passwords as | ||
| 1669 | +strings of bytes. | ||
| 1670 | + | ||
| 1671 | +Please note that the :samp:`--password-is-hex-key` | ||
| 1672 | +option is unrelated to all this. This flag bypasses the normal process | ||
| 1673 | +of going from password to encryption string entirely, allowing the raw | ||
| 1674 | +encryption key to be specified directly. This is useful for forensic | ||
| 1675 | +purposes or for brute-force recovery of files with unknown passwords. |
manual/conf.py
| @@ -11,4 +11,7 @@ project = 'QPDF' | @@ -11,4 +11,7 @@ project = 'QPDF' | ||
| 11 | copyright = '2005-2021, Jay Berkenbilt' | 11 | copyright = '2005-2021, Jay Berkenbilt' |
| 12 | author = 'Jay Berkenbilt' | 12 | author = 'Jay Berkenbilt' |
| 13 | release = '10.4.0' | 13 | release = '10.4.0' |
| 14 | -html_theme = 'alabaster' | 14 | +html_theme = 'agogo' |
| 15 | +html_theme_options = { | ||
| 16 | + "body_max_width": None, | ||
| 17 | +} |
manual/design.rst
0 → 100644
| 1 | +.. _ref.design: | ||
| 2 | + | ||
| 3 | +Design and Library Notes | ||
| 4 | +======================== | ||
| 5 | + | ||
| 6 | +.. _ref.design.intro: | ||
| 7 | + | ||
| 8 | +Introduction | ||
| 9 | +------------ | ||
| 10 | + | ||
| 11 | +This section was written prior to the implementation of the qpdf package | ||
| 12 | +and was subsequently modified to reflect the implementation. In some | ||
| 13 | +cases, for purposes of explanation, it may differ slightly from the | ||
| 14 | +actual implementation. As always, the source code and test suite are | ||
| 15 | +authoritative. Even if there are some errors, this document should serve | ||
| 16 | +as a road map to understanding how this code works. | ||
| 17 | + | ||
| 18 | +In general, one should adhere strictly to a specification when writing | ||
| 19 | +but be liberal in reading. This way, the product of our software will be | ||
| 20 | +accepted by the widest range of other programs, and we will accept the | ||
| 21 | +widest range of input files. This library attempts to conform to that | ||
| 22 | +philosophy whenever possible but also aims to provide strict checking | ||
| 23 | +for people who want to validate PDF files. If you don't want to see | ||
| 24 | +warnings and are trying to write something that is tolerant, you can | ||
| 25 | +call ``setSuppressWarnings(true)``. If you want to fail on the first | ||
| 26 | +error, you can call ``setAttemptRecovery(false)``. The default behavior | ||
| 27 | +is to generate warnings for recoverable problems. Note that recovery | |
| 28 | +will not always produce the desired results even if it is able to get | ||
| 29 | +through the file. Unlike most other PDF software, which produces generic | |
| 30 | +warnings such as "This file is damaged," qpdf generally issues a | |
| 31 | +detailed error message that would be most useful to a PDF developer. | ||
| 32 | +This is by design as there seems to be a shortage of PDF validation | ||
| 33 | +tools out there. This was, in fact, one of the major motivations behind | ||
| 34 | +the initial creation of qpdf. | ||
| 35 | + | ||
| 36 | +.. _ref.design-goals: | ||
| 37 | + | ||
| 38 | +Design Goals | ||
| 39 | +------------ | ||
| 40 | + | ||
| 41 | +The QPDF package includes support for reading and rewriting PDF files. | ||
| 42 | +It aims to hide from the user details involving object locations, | ||
| 43 | +modified (appended) PDF files, the directness/indirectness of objects, | ||
| 44 | +and stream filters including encryption. It does not aim to hide | ||
| 45 | +knowledge of the object hierarchy or content stream contents. Put | ||
| 46 | +another way, a user of the qpdf library is expected to have knowledge | ||
| 47 | +about how PDF files work, but is not expected to have to keep track of | ||
| 48 | +bookkeeping details such as file positions. | ||
| 49 | + | ||
| 50 | +A user of the library never has to care whether an object is direct or | ||
| 51 | +indirect, though it is possible to determine whether an object is direct | ||
| 52 | +or not if this information is needed. All access to objects deals with | ||
| 53 | +this transparently. All memory management details are also handled by | ||
| 54 | +the library. | ||
| 55 | + | ||
| 56 | +The ``PointerHolder`` object is used internally by the library to deal | ||
| 57 | +with memory management. This is basically a smart pointer object very | ||
| 58 | +similar in spirit to C++11's ``std::shared_ptr`` object, but predating | |
| 59 | +it by several years. This library also makes use of a technique for | ||
| 60 | +giving fine-grained access to methods in one class to other classes by | ||
| 61 | +using public subclasses with friends and only private members that in | ||
| 62 | +turn call private methods of the containing class. See | ||
| 63 | +``QPDFObjectHandle::Factory`` as an example. | ||
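The access technique described above can be sketched roughly as follows. This is a minimal illustration, not qpdf's actual code: the ``Handle`` and ``Document`` classes and the ``make`` method are hypothetical names chosen for the example. The nested ``Factory`` class is public, but all of its members are private, so only the class it names as a friend can reach the private constructor of the containing class.

```cpp
#include <cassert>

class Document; // hypothetical: the one class allowed to create handles

class Handle
{
  public:
    // Public nested class with only private members. Its friend
    // (Document) can call make(), which forwards to Handle's private
    // constructor. No other class can.
    class Factory
    {
        friend class Document;

      private:
        static Handle make(int id) { return Handle(id); }
    };

    int id() const { return id_; }

  private:
    explicit Handle(int id) : id_(id) {}
    int id_;
};

class Document
{
  public:
    Handle getObject(int id)
    {
        assert(id > 0); // this sketch assumes object IDs start at 1
        return Handle::Factory::make(id);
    }
};
```

Compared with making ``Document`` a friend of ``Handle`` directly, this grants access to one specific operation rather than to every private member.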
| 64 | + | ||
| 65 | +The top-level qpdf class is ``QPDF``. A ``QPDF`` object represents a PDF | ||
| 66 | +file. The library provides methods for both accessing and mutating PDF | ||
| 67 | +files. | ||
| 68 | + | ||
| 69 | +The primary class for interacting with PDF objects is | ||
| 70 | +``QPDFObjectHandle``. Instances of this class can be passed around by | ||
| 71 | +value, copied, stored in containers, etc. with very low overhead. | ||
| 72 | +Instances of ``QPDFObjectHandle`` created by reading from a file will | ||
| 73 | +always contain a reference back to the ``QPDF`` object from which they | ||
| 74 | +were created. A ``QPDFObjectHandle`` may be direct or indirect. If | ||
| 75 | +indirect, the ``QPDFObject`` the ``PointerHolder`` initially points to | ||
| 76 | +is a null pointer. In this case, the first attempt to access the | ||
| 77 | +underlying ``QPDFObject`` will result in the ``QPDFObject`` being | ||
| 78 | +resolved via a call to the referenced ``QPDF`` instance. This makes it | ||
| 79 | +essentially impossible to make coding errors in which certain things | ||
| 80 | +will work for some PDF files and not for others based on which objects | ||
| 81 | +are direct and which objects are indirect. | ||
| 82 | + | ||
| 83 | +Instances of ``QPDFObjectHandle`` can be directly created and modified | ||
| 84 | +using static factory methods in the ``QPDFObjectHandle`` class. There | ||
| 85 | +are factory methods for each type of object as well as a convenience | ||
| 86 | +method ``QPDFObjectHandle::parse`` that creates an object from a string | ||
| 87 | +representation of the object. Existing instances of ``QPDFObjectHandle`` | ||
| 88 | +can also be modified in several ways. See comments in | ||
| 89 | +:file:`QPDFObjectHandle.hh` for details. | ||
| 90 | + | ||
| 91 | +An instance of ``QPDF`` is constructed by using the class's default | ||
| 92 | +constructor. If desired, the ``QPDF`` object may be configured with | ||
| 93 | +various methods that change its default behavior. Then the | ||
| 94 | +``QPDF::processFile()`` method is passed the name of a PDF file, which | ||
| 95 | +permanently associates the file with that QPDF object. A password may | ||
| 96 | +also be given for access to password-protected files. QPDF does not | ||
| 97 | +enforce encryption parameters and will treat user and owner passwords | ||
| 98 | +equivalently. Either password may be used to access an encrypted file. | ||
| 99 | +``QPDF`` will allow recovery of a user password given an owner password. | ||
| 100 | +The input PDF file must be seekable. (Output files written by | ||
| 101 | +``QPDFWriter`` need not be seekable, even when creating linearized | ||
| 102 | +files.) During construction, ``QPDF`` validates the PDF file's header, | ||
| 103 | +and then reads the cross reference tables and trailer dictionaries. The | ||
| 104 | +``QPDF`` class keeps only the first trailer dictionary though it does | ||
| 105 | +read all of them so it can check the ``/Prev`` key. ``QPDF`` class users | ||
| 106 | +may request the root object and the trailer dictionary specifically. The | ||
| 107 | +cross reference table is kept private. Objects may then be requested by | ||
| 108 | +number or by walking the object tree. | |
| 109 | + | ||
| 110 | +When a PDF file has a cross-reference stream instead of a | ||
| 111 | +cross-reference table and trailer, requesting the document's trailer | ||
| 112 | +dictionary returns the stream dictionary from the cross-reference stream | ||
| 113 | +instead. | ||
| 114 | + | ||
| 115 | +There are some convenience routines for very common operations such as | ||
| 116 | +walking the page tree and returning a vector of all page objects. For | ||
| 117 | +full details, please see the header files | ||
| 118 | +:file:`QPDF.hh` and | ||
| 119 | +:file:`QPDFObjectHandle.hh`. There are also some | ||
| 120 | +additional helper classes that provide higher level API functions for | ||
| 121 | +certain document constructions. These are discussed in :ref:`ref.helper-classes`. | ||
| 122 | + | ||
| 123 | +.. _ref.helper-classes: | ||
| 124 | + | ||
| 125 | +Helper Classes | ||
| 126 | +-------------- | ||
| 127 | + | ||
| 128 | +QPDF version 8.1 introduced the concept of helper classes. Helper | ||
| 129 | +classes are intended to contain higher level APIs that allow developers | ||
| 130 | +to work with certain document constructs at an abstraction level above | ||
| 131 | +that of ``QPDFObjectHandle`` while staying true to qpdf's philosophy of | ||
| 132 | +not hiding document structure from the developer. As with qpdf in | ||
| 133 | +general, the goal is to take away some of the more tedious bookkeeping | |
| 134 | +aspects of working with PDF files, not to remove the need for the | ||
| 135 | +developer to understand how the PDF construction in question works. The | ||
| 136 | +driving factor behind the creation of helper classes was to allow the | ||
| 137 | +evolution of higher level interfaces in qpdf without polluting the | ||
| 138 | +interfaces of the main top-level classes ``QPDF`` and | ||
| 139 | +``QPDFObjectHandle``. | ||
| 140 | + | ||
| 141 | +There are two kinds of helper classes: *document* helpers and *object* | ||
| 142 | +helpers. Document helpers are constructed with a reference to a ``QPDF`` | ||
| 143 | +object and provide methods for working with structures that are at the | ||
| 144 | +document level. Object helpers are constructed with an instance of a | ||
| 145 | +``QPDFObjectHandle`` and provide methods for working with specific types | ||
| 146 | +of objects. | ||
| 147 | + | ||
| 148 | +Examples of document helpers include ``QPDFPageDocumentHelper``, which | ||
| 149 | +contains methods for operating on the document's page trees, such as | ||
| 150 | +enumerating all pages of a document and adding and removing pages; and | ||
| 151 | +``QPDFAcroFormDocumentHelper``, which contains document-level methods | ||
| 152 | +related to interactive forms, such as enumerating form fields and | ||
| 153 | +creating mappings between form fields and annotations. | ||
| 154 | + | ||
| 155 | +Examples of object helpers include ``QPDFPageObjectHelper`` for | ||
| 156 | +performing operations on pages such as page rotation and some operations | ||
| 157 | +on content streams, ``QPDFFormFieldObjectHelper`` for performing | ||
| 158 | +operations related to interactive form fields, and | ||
| 159 | +``QPDFAnnotationObjectHelper`` for working with annotations. | ||
| 160 | + | ||
| 161 | +It is always possible to retrieve the underlying ``QPDF`` reference from | ||
| 162 | +a document helper and the underlying ``QPDFObjectHandle`` reference from | ||
| 163 | +an object helper. Helpers are designed to be helpers, not wrappers. The | ||
| 164 | +intention is that, in general, it is safe to freely intermix operations | ||
| 165 | +that use helpers with operations that use the underlying objects. | ||
| 166 | +Document and object helpers do not attempt to provide a complete | ||
| 167 | +interface for working with the things they are helping with, nor do they | ||
| 168 | +attempt to encapsulate underlying structures. They just provide a few | ||
| 169 | +methods to help with error-prone, repetitive, or complex tasks. In some | ||
| 170 | +cases, a helper object may cache some information that is expensive to | ||
| 171 | +gather. In such cases, the helper classes are implemented so that their | ||
| 172 | +own methods keep the cache consistent, and the header file will provide | ||
| 173 | +a method to invalidate the cache and a description of what kinds of | ||
| 174 | +operations would make the cache invalid. If in doubt, you can always | ||
| 175 | +discard a helper class and create a new one with the same underlying | ||
| 176 | +objects, which will ensure that you have discarded any stale | ||
| 177 | +information. | ||
| 178 | + | ||
| 179 | +By convention, document helpers are called | |
| 180 | +``QPDFSomethingDocumentHelper`` and are derived from | ||
| 181 | +``QPDFDocumentHelper``, and object helpers are called | ||
| 182 | +``QPDFSomethingObjectHelper`` and are derived from ``QPDFObjectHelper``. | ||
| 183 | +For details on specific helpers, please see their header files. You can | ||
| 184 | +find them by looking at | ||
| 185 | +:file:`include/qpdf/QPDF*DocumentHelper.hh` and | ||
| 186 | +:file:`include/qpdf/QPDF*ObjectHelper.hh`. | ||
| 187 | + | ||
| 188 | +In order to avoid creation of circular dependencies, the following | ||
| 189 | +general guidelines are followed with helper classes: | ||
| 190 | + | ||
| 191 | +- Core class interfaces do not know about helper classes. For example, | ||
| 192 | + no methods of ``QPDF`` or ``QPDFObjectHandle`` will include helper | ||
| 193 | + classes in their interfaces. | ||
| 194 | + | ||
| 195 | +- Interfaces of object helpers will usually not use document helpers in | ||
| 196 | + their interfaces. This is because it is much more useful for document | ||
| 197 | + helpers to have methods that return object helpers. Most operations | ||
| 198 | + in PDF files start at the document level and go from there to the | ||
| 199 | + object level rather than the other way around. It can sometimes be | ||
| 200 | + useful to map back from object-level structures to document-level | ||
| 201 | + structures. If there is a desire to do this, it will generally be | ||
| 202 | + provided by a method in the document helper class. | ||
| 203 | + | ||
| 204 | +- Most of the time, object helpers don't know about other object | ||
| 205 | + helpers. However, in some cases, one type of object may be a | ||
| 206 | + container for another type of object, in which case it may make sense | ||
| 207 | + for the outer object to know about the inner object. For example, | ||
| 208 | + there are methods in the ``QPDFPageObjectHelper`` that know | ||
| 209 | + ``QPDFAnnotationObjectHelper`` because references to annotations are | ||
| 210 | + contained in page dictionaries. | ||
| 211 | + | ||
| 212 | +- Any helper or core library class may use helpers in their | ||
| 213 | + implementations. | ||
| 214 | + | ||
| 215 | +Prior to qpdf version 8.1, higher level interfaces were added as | ||
| 216 | +"convenience functions" in either ``QPDF`` or ``QPDFObjectHandle``. For | ||
| 217 | +compatibility, older convenience functions for operating with pages will | ||
| 218 | +remain in those classes even as alternatives are provided in helper | ||
| 219 | +classes. Going forward, new higher level interfaces will be provided | ||
| 220 | +using helper classes. | ||
| 221 | + | ||
| 222 | +.. _ref.implementation-notes: | ||
| 223 | + | ||
| 224 | +Implementation Notes | ||
| 225 | +-------------------- | ||
| 226 | + | ||
| 227 | +This section contains a few notes about QPDF's internal implementation, | ||
| 228 | +particularly around what it does when it first processes a file. This | ||
| 229 | +section is a bit of a simplification of what it actually does, but it | ||
| 230 | +could serve as a starting point to someone trying to understand the | ||
| 231 | +implementation. There is nothing in this section that you need to know | ||
| 232 | +to use the qpdf library. | ||
| 233 | + | ||
| 234 | +``QPDFObject`` is the basic PDF Object class. It is an abstract base | ||
| 235 | +class from which are derived classes for each type of PDF object. | ||
| 236 | +Clients do not interact with Objects directly but instead interact with | ||
| 237 | +``QPDFObjectHandle``. | ||
| 238 | + | ||
| 239 | +When the ``QPDF`` class creates a new object, it dynamically allocates | ||
| 240 | +the appropriate type of ``QPDFObject`` and immediately hands the pointer | ||
| 241 | +to an instance of ``QPDFObjectHandle``. The parser reads a token from | ||
| 242 | +the current file position. If the token is not either a dictionary or | |
| 243 | +array opener, an object is immediately constructed from the single token | ||
| 244 | +and the parser returns. Otherwise, the parser iterates in a special mode | ||
| 245 | +in which it accumulates objects until it finds a balancing closer. | ||
| 246 | +During this process, the "``R``" keyword is recognized and an indirect | ||
| 247 | +``QPDFObjectHandle`` may be constructed. | ||
| 248 | + | ||
| 249 | +The ``QPDF::resolve()`` method, which is used to resolve an indirect | ||
| 250 | +object, may be invoked from the ``QPDFObjectHandle`` class. It first | ||
| 251 | +checks a cache to see whether this object has already been read. If not, | ||
| 252 | +it reads the object from the PDF file and caches it. It then returns the | |
| 253 | +resulting ``QPDFObjectHandle``. The calling object handle then replaces | ||
| 254 | +its ``PointerHolder<QPDFObject>`` with the one from the newly returned | |
| 255 | +``QPDFObjectHandle``. In this way, only a single copy of any direct | ||
| 256 | +object need exist and clients can access objects transparently without | ||
| 257 | +knowing or caring whether they are direct or indirect objects. | |
| 258 | +Additionally, no object is ever read from the file more than once. That | ||
| 259 | +means that only the portions of the PDF file that are actually needed | ||
| 260 | +are ever read from the input file, thus allowing the qpdf package to | ||
| 261 | +take advantage of this important design goal of PDF files. | ||
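The resolve-and-cache behavior described above can be sketched as follows. This is a simplified illustration under stated assumptions, not qpdf's implementation: ``Object``, ``Store``, and ``LazyHandle`` are hypothetical stand-ins for ``QPDFObject``, ``QPDF``, and ``QPDFObjectHandle``, ``std::shared_ptr`` stands in for ``PointerHolder``, and the "file" is just a callback that produces a string.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a parsed PDF object.
struct Object
{
    std::string value;
};

// Hypothetical stand-in for the class that owns the file and the cache.
class Store
{
  public:
    using ObjGen = std::pair<int, int>; // (object ID, generation)

    explicit Store(std::function<std::string(ObjGen)> reader) :
        reader_(std::move(reader))
    {
    }

    // Return the cached object, reading it only on the first request.
    std::shared_ptr<Object> resolve(ObjGen og)
    {
        assert(og.first > 0); // this sketch assumes object IDs start at 1
        auto it = cache_.find(og);
        if (it == cache_.end()) {
            ++reads_;
            auto obj = std::make_shared<Object>(Object{reader_(og)});
            it = cache_.emplace(og, obj).first;
        }
        return it->second;
    }

    int reads() const { return reads_; }

  private:
    std::function<std::string(ObjGen)> reader_;
    std::map<ObjGen, std::shared_ptr<Object>> cache_;
    int reads_ = 0;
};

// An indirect handle starts with a null pointer and replaces it with
// the shared, cached object on first access.
class LazyHandle
{
  public:
    LazyHandle(Store& store, Store::ObjGen og) : store_(&store), og_(og) {}

    const std::string& value()
    {
        if (!obj_) {
            obj_ = store_->resolve(og_);
        }
        return obj_->value;
    }

  private:
    Store* store_;
    Store::ObjGen og_;
    std::shared_ptr<Object> obj_; // null until first access
};
```

Two handles that refer to the same object ID and generation end up sharing one cached object, and the reader callback runs at most once per object.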
| 262 | + | ||
| 263 | +If the requested object is inside of an object stream, the object stream | ||
| 264 | +itself is first read into memory. Then the tokenizer reads objects from | ||
| 265 | +the memory stream based on the offset information stored in the stream. | ||
| 266 | +Those individual objects are cached, after which the temporary buffer | ||
| 267 | +holding the object stream contents is discarded. In this way, the first | |
| 268 | +time an object in an object stream is requested, all objects in the | ||
| 269 | +stream are cached. | ||
| 270 | + | ||
| 271 | +The following example should clarify how ``QPDF`` processes a simple | ||
| 272 | +file. | ||
| 273 | + | ||
| 274 | +- Client constructs ``QPDF`` ``pdf`` and calls | ||
| 275 | + ``pdf.processFile("a.pdf");``. | ||
| 276 | + | ||
| 277 | +- The ``QPDF`` class checks the beginning of | ||
| 278 | + :file:`a.pdf` for a PDF header. It then reads the | ||
| 279 | + cross reference table mentioned at the end of the file, ensuring that | ||
| 280 | + it is looking before the last ``%%EOF``. After getting to the ``trailer`` | |
| 281 | + keyword, it invokes the parser. | ||
| 282 | + | ||
| 283 | +- The parser sees "``<<``", so it calls itself recursively in | ||
| 284 | + dictionary creation mode. | ||
| 285 | + | ||
| 286 | +- In dictionary creation mode, the parser keeps accumulating objects | ||
| 287 | + until it encounters "``>>``". Each object that is read is pushed onto | ||
| 288 | + a stack. If "``R``" is read, the last two objects on the stack are | ||
| 289 | + inspected. If they are integers, they are popped off the stack and | ||
| 290 | + their values are used to construct an indirect object handle which is | ||
| 291 | + then pushed onto the stack. When "``>>``" is finally read, the stack | ||
| 292 | + is converted into a ``QPDF_Dictionary`` which is placed in a | ||
| 293 | + ``QPDFObjectHandle`` and returned. | ||
| 294 | + | ||
| 295 | +- The resulting dictionary is saved as the trailer dictionary. | ||
| 296 | + | ||
| 297 | +- The ``/Prev`` key is searched. If present, ``QPDF`` seeks to that | ||
| 298 | + point and repeats except that the new trailer dictionary is not | ||
| 299 | + saved. If ``/Prev`` is not present, the initial parsing process is | ||
| 300 | + complete. | ||
| 301 | + | ||
| 302 | + If there is an encryption dictionary, the document's encryption | ||
| 303 | + parameters are initialized. | ||
| 304 | + | ||
| 305 | +- The client requests the root object. The ``QPDF`` class gets the value of | |
| 306 | + the root key from the trailer dictionary and returns it. It is an unresolved | |
| 307 | + indirect ``QPDFObjectHandle``. | ||
| 308 | + | ||
| 309 | +- The client requests the ``/Pages`` key from the root | |
| 310 | + ``QPDFObjectHandle``. The ``QPDFObjectHandle`` notices that it is | ||
| 311 | + indirect so it asks ``QPDF`` to resolve it. ``QPDF`` looks in the | ||
| 312 | + object cache for an object with the root dictionary's object ID and | ||
| 313 | + generation number. Upon not seeing it, it checks the cross reference | ||
| 314 | + table, gets the offset, and reads the object present at that offset. | ||
| 315 | + It stores the result in the object cache and returns the cached | ||
| 316 | + result. The calling ``QPDFObjectHandle`` replaces its object pointer | ||
| 317 | + with the one from the resolved ``QPDFObjectHandle``, verifies that it | |
| 318 | + is a valid dictionary object, and returns the (unresolved indirect) | |
| 319 | + ``QPDFObject`` handle to the top of the Pages hierarchy. | ||
| 320 | + | ||
| 321 | + As the client continues to request objects, the same process is | ||
| 322 | + followed for each new requested object. | ||
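The dictionary creation step in the walk-through above can be sketched as a small stack machine. This is an illustrative simplification, not qpdf's parser: it assumes the input has already been split into string tokens, the ``Item`` type and the function names are hypothetical, and nested containers are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical item type for this sketch.
struct Item
{
    enum Kind { Integer, Other, Indirect };
    Kind kind;
    std::string text; // original token (empty for Indirect)
    int id;           // object ID (Indirect only)
    int gen;          // generation (Indirect only)
};

inline bool is_integer(const std::string& tok)
{
    if (tok.empty()) {
        return false;
    }
    for (unsigned char c : tok) {
        if (!std::isdigit(c)) {
            return false;
        }
    }
    return true;
}

// Accumulate the tokens that follow "<<" on a stack until ">>" is seen.
// When "R" is read and the top two items are integers, pop them and
// push a single indirect reference in their place.
inline std::vector<Item>
accumulate_dictionary_items(const std::vector<std::string>& tokens)
{
    std::vector<Item> stack;
    for (const std::string& tok : tokens) {
        if (tok == ">>") {
            return stack;
        }
        auto n = stack.size();
        if (tok == "R" && n >= 2 && stack[n - 2].kind == Item::Integer &&
            stack[n - 1].kind == Item::Integer) {
            Item ref{
                Item::Indirect,
                "",
                std::stoi(stack[n - 2].text),
                std::stoi(stack[n - 1].text)};
            stack.pop_back();
            stack.pop_back();
            stack.push_back(ref);
            assert(stack.size() == n - 1); // two integers became one item
        } else {
            Item::Kind kind = is_integer(tok) ? Item::Integer : Item::Other;
            stack.push_back(Item{kind, tok, 0, 0});
        }
    }
    throw std::runtime_error("dictionary not closed: missing >>");
}
```

For the tokens ``/Root 1 0 R /Size 12 >>``, the resulting stack holds four items: the name ``/Root``, an indirect reference to object 1 generation 0, the name ``/Size``, and the integer 12. A real parser would then pair these up as keys and values.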
| 323 | + | ||
| 324 | +.. _ref.casting: | ||
| 325 | + | ||
| 326 | +Casting Policy | ||
| 327 | +-------------- | ||
| 328 | + | ||
| 329 | +This section describes the casting policy followed by qpdf's | ||
| 330 | +implementation. This is of no concern to qpdf's end users and largely of no | |
| 331 | +concern to people writing code that uses qpdf, but it could be of | ||
| 332 | +interest to people who are porting qpdf to a new platform or who are | ||
| 333 | +making modifications to the code. | ||
| 334 | + | ||
| 335 | +The C++ code in qpdf is free of old-style casts except where unavoidable | ||
| 336 | +(e.g. where the old-style cast is in a macro provided by a third-party | ||
| 337 | +header file). When there is a need for a cast, it is handled, in order | ||
| 338 | +of preference, by rewriting the code to avoid the need for a cast, | ||
| 339 | +calling ``const_cast``, calling ``static_cast``, calling | ||
| 340 | +``reinterpret_cast``, or calling some combination of the above. As a | ||
| 341 | +last resort, a compiler-specific ``#pragma`` may be used to suppress a | ||
| 342 | +warning that we don't want to fix. Examples may include suppressing | ||
| 343 | +warnings about the use of old-style casts in code that is shared between | ||
| 344 | +C and C++ code. | ||
| 345 | + | ||
| 346 | +The ``QIntC`` namespace, provided by | ||
| 347 | +:file:`include/qpdf/QIntC.hh`, implements safe | ||
| 348 | +functions for converting between integer types. These functions do range | ||
| 349 | +checking and throw a ``std::range_error``, which is a subclass of | |
| 350 | +``std::runtime_error``, if conversion from one integer type to another | ||
| 351 | +results in loss of information. There are many cases in which we have to | ||
| 352 | +move between different integer types because of incompatible integer | ||
| 353 | +types used in interoperable interfaces. Some are unavoidable, such as | ||
| 354 | +moving between sizes and offsets, and others are there because of old | ||
| 355 | +code that is too entrenched to be fixable without breaking source | |
| 356 | +compatibility and causing pain for users. QPDF is compiled with extra | ||
| 357 | +warnings to detect conversions with potential data loss, and all such | ||
| 358 | +cases should be fixed by either using a function from ``QIntC`` or a | ||
| 359 | +``static_cast``. | ||
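To illustrate the kind of range-checked conversion described above, here is a minimal sketch. It is not the code in ``QIntC.hh``: the ``checked_convert`` name and its round-trip approach are assumptions made for this example. It shares only the stated behavior of throwing ``std::range_error`` when a conversion would lose information.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical helper, not part of qpdf. Converts between integer
// types and throws std::range_error if the value does not survive.
template <typename To, typename From>
To checked_convert(From v)
{
    static_assert(
        std::is_integral<To>::value && std::is_integral<From>::value,
        "checked_convert works on integer types only");
    To result = static_cast<To>(v);
    // The value is preserved only if it survives a round trip and
    // keeps its sign. The sign check catches cases such as -1
    // becoming a large unsigned number.
    bool same_value = (static_cast<From>(result) == v);
    bool same_sign = ((v < From(0)) == (result < To(0)));
    if (!(same_value && same_sign)) {
        throw std::range_error(
            "integer conversion out of range: " + std::to_string(v));
    }
    return result;
}
```

With this sketch, ``checked_convert<int>(42LL)`` returns 42, while ``checked_convert<unsigned int>(-1)`` and ``checked_convert<unsigned char>(300)`` both throw.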
| 360 | + | ||
| 361 | +When the intention is just to switch the type because of exchanging data | ||
| 362 | +between incompatible interfaces, use ``QIntC``. This is the usual case. | ||
| 363 | +However, there are some cases in which we are explicitly intending to | ||
| 364 | +use the exact same bit pattern with a different type. This is most | ||
| 365 | +common when switching between signed and unsigned characters. A lot of | ||
| 366 | +qpdf's code uses unsigned characters internally, but ``std::string`` and | ||
| 367 | +``char`` are signed. Using ``QIntC::to_char`` would be wrong for | ||
| 368 | +converting from unsigned to signed characters because a negative | ||
| 369 | +``char`` value and the corresponding ``unsigned char`` value greater | ||
| 370 | +than 127 *mean the same thing*. There are also | ||
| 371 | +cases in which we use ``static_cast`` when working with bit fields where | ||
| 372 | +we are not representing a numerical value but rather a bunch of bits | ||
| 373 | +packed together in some integer type. Also note that ``size_t`` and | ||
| 374 | +``long`` both typically differ between 32-bit and 64-bit environments, | ||
| 375 | +so sometimes an explicit cast may not be needed to avoid warnings on one | ||
| 376 | +platform but may be needed on another. A conversion with ``QIntC`` | ||
| 377 | +should always be used when the types are different even if the | ||
| 378 | +underlying size is the same. QPDF's CI build builds on 32-bit and 64-bit | ||
| 379 | +platforms, and the test suite is very thorough, so it is hard to make | ||
| 380 | +any of the potential errors here without being caught in build or test. | ||
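The signed/unsigned character case can be shown with a short sketch. The function names here are hypothetical and are not part of qpdf. The point is that ``static_cast`` keeps the bit pattern of each byte, whereas a range-checked conversion would reject every byte above 127 on platforms where ``char`` is signed.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Copy raw bytes into a std::string, preserving each bit pattern.
inline std::string bytes_to_string(const std::vector<unsigned char>& bytes)
{
    std::string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        out.push_back(static_cast<char>(b));
    }
    return out;
}

// Read a byte back out of a std::string as an unsigned value.
inline unsigned char byte_at(const std::string& s, std::string::size_type i)
{
    assert(i < s.size());
    return static_cast<unsigned char>(s[i]);
}
```

A byte such as ``0xC8`` stored through ``bytes_to_string`` comes back unchanged from ``byte_at``, regardless of whether ``char`` is signed on the platform.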
| 381 | + | ||
| 382 | +Non-const ``unsigned char*`` is used in the ``Pipeline`` interface. The | ||
| 383 | +pipeline interface has a ``write`` call that uses ``unsigned char*`` | ||
| 384 | +without a ``const`` qualifier. The main reason for this is | ||
| 385 | +to support pipelines that make calls to third-party libraries, such as | ||
| 386 | +zlib, that don't include ``const`` in their interfaces. Unfortunately, | ||
| 387 | +there are many places in the code where it is desirable to have | ||
| 388 | +``const char*`` with pipelines. None of the pipeline implementations | ||
| 389 | +in qpdf | ||
| 390 | +currently modify the data passed to write, and doing so would be counter | ||
| 391 | +to the intent of ``Pipeline``, but there is nothing in the code to | ||
| 392 | +prevent this from being done. There are places in the code where | ||
| 393 | +``const_cast`` is used to remove the const-ness of pointers going into | ||
| 394 | +``Pipeline``\ s. This could theoretically be unsafe, but there is | ||
| 395 | +adequate testing to assert that it is safe and will remain safe in | ||
| 396 | +qpdf's code. | ||
| 397 | + | ||
| 398 | +.. _ref.encryption: | ||
| 399 | + | ||
| 400 | +Encryption | ||
| 401 | +---------- | ||
| 402 | + | ||
| 403 | +Encryption is supported transparently by qpdf. When opening a PDF file, | ||
| 404 | +if an encryption dictionary exists, the ``QPDF`` object processes this | ||
| 405 | +dictionary using the password (if any) provided. The primary decryption | ||
| 406 | +key is computed and cached. No further access is made to the encryption | ||
| 407 | +dictionary after that time. When an object is read from a file, the | ||
| 408 | +object ID and generation of the object in which it is contained are | |
| 409 | +always known. Using this information along with the stored encryption | ||
| 410 | +key, all stream and string objects are transparently decrypted. Raw | ||
| 411 | +encrypted objects are never stored in memory. This way, nothing in the | ||
| 412 | +library ever has to know or care whether it is reading an encrypted | ||
| 413 | +file. | ||
| 414 | + | ||
| 415 | +An interface is also provided for writing encrypted streams and strings | ||
| 416 | +given an encryption key. This is used by ``QPDFWriter`` when it rewrites | ||
| 417 | +encrypted files. | ||
| 418 | + | ||
| 419 | +When copying encrypted files, unless otherwise directed, qpdf will | ||
| 420 | +preserve any encryption in force in the original file. qpdf can do this | ||
| 421 | +with either the user or the owner password. There is no difference in | ||
| 422 | +capability based on which password is used. When 40- or 128-bit | |
| 423 | +encryption keys are used, the user password can be recovered with the | |
| 424 | +owner password. With 256-bit keys, the user and owner passwords are used | |
| 425 | +independently to encrypt the actual encryption key, so while either can | ||
| 426 | +be used, the owner password can no longer be used to recover the user | ||
| 427 | +password. | ||
| 428 | + | ||
| 429 | +Starting with version 4.0.0, qpdf can read files that are not encrypted | ||
| 430 | +but that contain encrypted attachments, but it cannot write such files. | ||
| 431 | +qpdf also requires the password to be specified in order to open the | ||
| 432 | +file, not just to extract attachments, since once the file is open, all | ||
| 433 | +decryption is handled transparently. When copying files like this while | ||
| 434 | +preserving encryption, qpdf will apply the file's encryption to | ||
| 435 | +everything in the file, not just to the attachments. When decrypting the | ||
| 436 | +file, qpdf will decrypt the attachments. In general, when copying PDF | ||
| 437 | +files with multiple encryption formats, qpdf will choose the newest | ||
| 438 | +format. The only exception to this is that clear-text metadata will be | ||
| 439 | +preserved as clear-text if it is that way in the original file. | ||
| 440 | + | ||
| 441 | +One point of confusion some people have about encrypted PDF files is | ||
| 442 | +that encryption is not the same as password protection. Password | ||
| 443 | +protected files are always encrypted, but it is also possible to create | ||
| 444 | +encrypted files that do not have passwords. Internally, such files use | ||
| 445 | +the empty string as a password, and most readers try the empty string | ||
| 446 | +first to see if it works and prompt for a password only if the empty | ||
| 447 | +string doesn't work. Normally such files have an empty user password and | ||
| 448 | +a non-empty owner password. In that way, if the file is opened by an | ||
| 449 | +ordinary reader without specification of password, the restrictions | ||
| 450 | +specified in the encryption dictionary can be enforced. Most users | ||
| 451 | +wouldn't even realize such a file was encrypted. Since qpdf always | ||
| 452 | +ignores the restrictions (except for the purpose of reporting what they | ||
| 453 | +are), qpdf doesn't care which password you use. QPDF will allow you to | ||
| 454 | +create PDF files with non-empty user passwords and empty owner | ||
| 455 | +passwords. Some readers will require a password when you open these | ||
| 456 | +files, and others will open the files without a password and not enforce | ||
| 457 | +restrictions. Having a non-empty user password and an empty owner | ||
| 458 | +password doesn't really make sense because it would mean that opening | ||
| 459 | +the file with the user password would be more restrictive than not | ||
| 460 | +supplying a password at all. QPDF also allows you to create PDF files | ||
| 461 | +with the same password as both the user and owner password. Some readers | ||
| 462 | +will not ever allow such files to be accessed without restrictions | ||
| 463 | +because they never try the password as the owner password if it works as | ||
| 464 | +the user password. Nonetheless, one of the powerful aspects of qpdf is | ||
| 465 | +that it allows you to finely specify the way encrypted files are | ||
| 466 | +created, even if the results are not useful to some readers. One use | ||
| 467 | +case for this would be for testing a PDF reader to ensure that it | ||
| 468 | +handles odd configurations of input files. | ||
| 469 | + | ||
| 470 | +.. _ref.random-numbers: | ||
| 471 | + | ||
| 472 | +Random Number Generation | ||
| 473 | +------------------------ | ||
| 474 | + | ||
| 475 | +QPDF generates random numbers to support generation of encrypted data. | ||
| 476 | +Starting in qpdf 10.0.0, qpdf uses the crypto provider as its source of | ||
| 477 | +random numbers. Older versions used the OS-provided source of secure | ||
| 478 | +random numbers or, if allowed at build time, insecure random numbers | ||
| 479 | +from stdlib. Starting with version 5.1.0, you can disable use of | ||
| 480 | +OS-provided secure random numbers at build time. This is especially | ||
| 481 | +useful on Windows if you want to avoid a dependency on Microsoft's | ||
| 482 | +cryptography API. You can also supply your own random data provider. For | ||
| 483 | +details on how to do this, please refer to the top-level README.md file | ||
| 484 | +in the source distribution and to comments in | ||
| 485 | +:file:`QUtil.hh`. | ||
| 486 | + | ||
| 487 | +.. _ref.adding-and-remove-pages: | ||
| 488 | + | ||
| 489 | +Adding and Removing Pages | ||
| 490 | +------------------------- | ||
| 491 | + | ||
| 492 | +While qpdf's API has supported adding and modifying objects for some | ||
| 493 | +time, version 3.0 introduces specific methods for adding and removing | ||
| 494 | +pages. These are largely convenience routines that handle two tricky | ||
| 495 | +issues: pushing inheritable resources from the ``/Pages`` tree down to | ||
| 496 | +individual pages and manipulation of the ``/Pages`` tree itself. For | ||
| 497 | +details, see ``addPage`` and surrounding methods in | ||
| 498 | +:file:`QPDF.hh`. | ||
| 499 | + | ||
| 500 | +.. _ref.reserved-objects: | ||
| 501 | + | ||
| 502 | +Reserving Object Numbers | ||
| 503 | +------------------------ | ||
| 504 | + | ||
| 505 | +Version 3.0 of qpdf introduced the concept of reserved objects. These | ||
| 506 | +are seldom needed for ordinary operations, but there are cases in which | ||
| 507 | +you may want to add a series of indirect objects with references to each | ||
| 508 | +other to a ``QPDF`` object. This causes a problem because you can't | ||
| 509 | +determine the object ID that a new indirect object will have until you | ||
| 510 | +add it to the ``QPDF`` object with ``QPDF::makeIndirectObject``. The | ||
| 511 | +only way to add two mutually referential objects to a ``QPDF`` object | ||
| 512 | +prior to version 3.0 would be to add the new objects first and then make | ||
| 513 | +them refer to each other after adding them. Now it is possible to create | ||
| 514 | +a *reserved object* using | ||
| 515 | +``QPDFObjectHandle::newReserved``. This is an indirect object that stays | ||
| 516 | +"unresolved" even if it is queried for its type. So now, if you want to | ||
| 517 | +create a set of mutually referential objects, you can create | ||
| 518 | +reservations for each one of them and use those reservations to | ||
| 519 | +construct the references. When finished, you can call | ||
| 520 | +``QPDF::replaceReserved`` to replace the reserved objects with the real | ||
| 521 | +ones. This functionality will never be needed by most applications, but | ||
| 522 | +it is used internally by QPDF when copying objects from other PDF files, | ||
| 523 | +as discussed in :ref:`ref.foreign-objects`. For an example of how to use reserved | ||
| 524 | +objects, search for ``newReserved`` in | ||
| 525 | +:file:`test_driver.cc` in qpdf's sources. | ||
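The reservation idea described above can be illustrated with a toy model. This is only a sketch under stated assumptions: ``ToyDocument``, ``ToyValue``, ``reserve``, and ``isReserved`` are hypothetical names invented for this illustration and are not part of the qpdf API. The real calls are ``QPDFObjectHandle::newReserved`` and ``QPDF::replaceReserved``; consult the qpdf headers for their actual signatures.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Toy stand-in for an object value: only the IDs it refers to.
// Hypothetical type, not qpdf's object model.
struct ToyValue {
    std::vector<int> refs;
};

// Toy document that hands out object IDs before the values exist.
class ToyDocument {
public:
    // Allocate an ID now; the value is supplied later.
    int reserve() {
        int id = next_id_++;
        table_[id] = Slot{true, ToyValue{}};
        return id;
    }

    // Replace a reservation with the real value. The ID must be reserved.
    void replaceReserved(int id, ToyValue value) {
        auto it = table_.find(id);
        assert(it != table_.end() && it->second.reserved);
        it->second = Slot{false, std::move(value)};
    }

    bool isReserved(int id) const { return table_.at(id).reserved; }

    const ToyValue& get(int id) const {
        assert(!table_.at(id).reserved);
        return table_.at(id).value;
    }

private:
    struct Slot {
        bool reserved;
        ToyValue value;
    };
    int next_id_ = 1;
    std::map<int, Slot> table_;
};
```

Because both IDs are known before either value is built, two objects can refer to each other without a separate fix-up pass after insertion.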
| 526 | + | ||
| 527 | +.. _ref.foreign-objects: | ||
| 528 | + | ||
| 529 | +Copying Objects From Other PDF Files | ||
| 530 | +------------------------------------ | ||
| 531 | + | ||
| 532 | +Version 3.0 of qpdf introduced the ability to copy objects into a | ||
| 533 | +``QPDF`` object from a different ``QPDF`` object, which we refer to as | ||
| 534 | +*foreign objects*. This allows arbitrary | ||
| 535 | +merging of PDF files. The "from" ``QPDF`` object must remain valid after | ||
| 536 | +the copy as discussed in the note below. The | ||
| 537 | +:command:`qpdf` command-line tool provides limited | ||
| 538 | +support for basic page selection, including merging in pages from other | ||
| 539 | +files, but the library's API makes it possible to implement arbitrarily | ||
| 540 | +complex merging operations. The main method for copying foreign objects | ||
| 541 | +is ``QPDF::copyForeignObject``. This takes an indirect object from | ||
| 542 | +another ``QPDF`` and copies it recursively into this object while | ||
| 543 | +preserving all object structure, including circular references. This | ||
| 544 | +means you can add a direct object that you create from scratch to a | ||
| 545 | +``QPDF`` object with ``QPDF::makeIndirectObject``, and you can add an | ||
| 546 | +indirect object from another file with ``QPDF::copyForeignObject``. The | ||
| 547 | +fact that ``QPDF::makeIndirectObject`` does not automatically detect a | ||
| 548 | +foreign object and copy it is an explicit design decision. Copying a | ||
| 549 | +foreign object seems like a sufficiently significant thing to do that it | ||
| 550 | +should be done explicitly. | ||
| 551 | + | ||
| 552 | +The other way to copy foreign objects is by passing a page from one | ||
| 553 | +``QPDF`` to another by calling ``QPDF::addPage``. In contrast to | ||
| 554 | +``QPDF::makeIndirectObject``, this method automatically distinguishes | ||
| 555 | +between indirect objects in the current file, foreign objects, and | ||
| 556 | +direct objects. | ||
| 557 | + | ||
| 558 | +Please note: when you copy objects from one ``QPDF`` to another, the | ||
| 559 | +source ``QPDF`` object must remain valid until you have finished with | ||
| 560 | +the destination object. This is because the original object is still | ||
| 561 | +used to retrieve any referenced stream data from the copied object. | ||
| 562 | + | ||
| 563 | +.. _ref.rewriting: | ||
| 564 | + | ||
| 565 | +Writing PDF Files | ||
| 566 | +----------------- | ||
| 567 | + | ||
| 568 | +The qpdf library supports file writing of ``QPDF`` objects to PDF files | ||
| 569 | +through the ``QPDFWriter`` class. The ``QPDFWriter`` class has two | ||
| 570 | +writing modes: one for non-linearized files, and one for linearized | ||
| 571 | +files. See :ref:`ref.linearization` for a description of how | |
| 572 | +linearization is implemented. This section describes how we write | |
| 573 | +non-linearized files including the creation of QDF files (see :ref:`ref.qdf`). | |
| 574 | + | ||
| 575 | +This outline was written prior to implementation and is not exactly | ||
| 576 | +accurate, but it provides a correct "notional" idea of how writing | ||
| 577 | +works. Look at the code in ``QPDFWriter`` for exact details. | ||
| 578 | + | ||
| 579 | +- Initialize state: | ||
| 580 | + | ||
| 581 | + - next object number = 1 | ||
| 582 | + | ||
| 583 | + - object queue = empty | ||
| 584 | + | ||
| 585 | + - renumber table: old object id/generation to new id/0 = empty | ||
| 586 | + | ||
| 587 | + - xref table: new id -> offset = empty | ||
| 588 | + | ||
| 589 | +- Create a QPDF object from a file. | ||
| 590 | + | ||
| 591 | +- Write header for new PDF file. | ||
| 592 | + | ||
| 593 | +- Request the trailer dictionary. | ||
| 594 | + | ||
| 595 | +- For each value that is an indirect object, grab the next object | ||
| 596 | + number (via an operation that returns and increments the number). Map | ||
| 597 | + object to new number in renumber table. Push object onto queue. | ||
| 598 | + | ||
| 599 | +- While there are more objects on the queue: | ||
| 600 | + | ||
| 601 | + - Pop queue. | ||
| 602 | + | ||
| 603 | + - Look up object's new number *n* in the renumbering table. | ||
| 604 | + | ||
| 605 | + - Store current offset into xref table. | ||
| 606 | + | ||
| 607 | + - Write ``:samp:`{n}` 0 obj``. | ||
| 608 | + | ||
| 609 | + - If object is null, whether direct or indirect, write out null, | ||
| 610 | + thus eliminating unresolvable indirect object references. | ||
| 611 | + | ||
| 612 | + - If the object is a stream, write stream contents, piped | |
| 613 | + through any filters as required, to a memory buffer. Use this | ||
| 614 | + buffer to determine the stream length. | ||
| 615 | + | ||
| 616 | + - If object is not a stream, array, or dictionary, write out its | ||
| 617 | + contents. | ||
| 618 | + | ||
| 619 | + - If object is an array or dictionary (including stream), traverse | ||
| 620 | + its elements (for array) or values (for dictionaries), handling | ||
| 621 | + recursive dictionaries and arrays, looking for indirect objects. | ||
| 622 | + When an indirect object is found, if it is not resolvable, ignore. | ||
| 623 | + (This case is handled when writing it out.) Otherwise, look it up | ||
| 624 | + in the renumbering table. If not found, grab the next available | ||
| 625 | + object number, assign to the referenced object in the renumbering | ||
| 626 | + table, and push the referenced object onto the queue. As a special | ||
| 627 | + case, when writing out a stream dictionary, replace length, | ||
| 628 | + filters, and decode parameters as required. | ||
| 629 | + | ||
| 630 | + Write out dictionary or array, replacing any unresolvable indirect | |
| 631 | + object references with null (the PDF spec says a reference to a | |
| 632 | + non-existent object is legal and resolves to null) and any | |
| 633 | + resolvable ones with references to the renumbered objects. | ||
| 634 | + | ||
| 635 | + - If the object is a stream, write ``stream\n``, the stream contents | ||
| 636 | + (from the memory buffer), and ``\nendstream\n``. | ||
| 637 | + | ||
| 638 | + - When done, write ``endobj``. | ||
| 639 | + | ||
| 640 | +Once we have finished the queue, all referenced objects will have been | ||
| 641 | +written out and all deleted objects or unreferenced objects will have | ||
| 642 | +been skipped. The new cross-reference table will contain an offset for | ||
| 643 | +every new object number from 1 up to the number of objects written. This | ||
| 644 | +can be used to write out a new xref table. Finally we can write out the | ||
| 645 | +trailer dictionary with appropriately computed /ID (see spec, 8.3, File | ||
| 646 | +Identifiers), the cross reference table offset, and ``%%EOF``. | ||
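The queue-and-renumber part of the outline above can be sketched with a toy object graph. This is a minimal illustration, not the code in ``QPDFWriter``: ``ToyObject`` and ``renumber`` are hypothetical names, objects are reduced to the list of old IDs they reference, and nothing is actually written.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <vector>

// Toy stand-in for a PDF object: just the old IDs of the indirect
// objects it references. This is NOT qpdf's object model.
struct ToyObject {
    std::vector<int> refs;
};

// Returns old id -> new id for every object reachable from the trailer.
// New IDs are assigned in the order objects are first seen, starting at 1.
std::map<int, int> renumber(const std::map<int, ToyObject>& objects,
                            const std::vector<int>& trailer_refs)
{
    std::map<int, int> renumber_table;
    std::deque<int> queue;
    int next_id = 1;

    auto enqueue = [&](int old_id) {
        if (objects.count(old_id) == 0) {
            return; // unresolvable reference: ignored here, written as null
        }
        if (renumber_table.count(old_id) != 0) {
            return; // already has a new number
        }
        renumber_table[old_id] = next_id++;
        queue.push_back(old_id);
    };

    // Seed the queue from the trailer dictionary's indirect references.
    for (int r : trailer_refs) {
        enqueue(r);
    }

    // Process the queue; each object may add more objects to it.
    while (!queue.empty()) {
        int old_id = queue.front();
        queue.pop_front();
        for (int r : objects.at(old_id).refs) {
            enqueue(r);
        }
    }

    // New IDs are contiguous from 1 up to the number of objects kept.
    assert(static_cast<int>(renumber_table.size()) == next_id - 1);
    return renumber_table;
}
```

Objects that are never reached from the trailer never enter the queue, which is how unreferenced objects get skipped, and circular references are harmless because an object that already has a new number is not queued again.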
| 647 | + | ||
| 648 | +.. _ref.filtered-streams: | ||
| 649 | + | ||
| 650 | +Filtered Streams | ||
| 651 | +---------------- | ||
| 652 | + | ||
| 653 | +Support for streams is implemented through the ``Pipeline`` interface | ||
| 654 | +which was designed for this package. | ||
| 655 | + | ||
| 656 | +When reading streams, create a series of ``Pipeline`` objects. The | ||
| 657 | +``Pipeline`` abstract base requires implementation of ``write()`` and | |
| 658 | +``finish()`` and provides an implementation of ``getNext()``. Each | ||
| 659 | +pipeline object, upon receiving data, does whatever it is going to do | ||
| 660 | +and then writes the data (possibly modified) to its successor. | ||
| 661 | +Alternatively, a pipeline may be an end-of-the-line pipeline that does | ||
| 662 | +something like store its output to a file or a memory buffer ignoring a | ||
| 663 | +successor. For additional details, look at | ||
| 664 | +:file:`Pipeline.hh`. | ||
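The chaining idea can be sketched with a simplified analogue. The classes below (``ToyPipeline``, ``Upcase``, ``Collect``) are hypothetical and use ``std::string`` for brevity; they only mirror the ``write()``, ``finish()``, and ``getNext()`` roles described above and do not reproduce the signatures in :file:`Pipeline.hh`.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Simplified analogue of a pipeline stage (hypothetical, not qpdf's class).
class ToyPipeline {
public:
    explicit ToyPipeline(ToyPipeline* next = nullptr) : next_(next) {}
    virtual ~ToyPipeline() = default;
    virtual void write(const std::string& data) = 0;
    virtual void finish() = 0;

protected:
    // Provided by the base class; only valid for stages with a successor.
    ToyPipeline* getNext() const {
        assert(next_ != nullptr);
        return next_;
    }

private:
    ToyPipeline* next_;
};

// Middle stage: transforms the data, then passes it to its successor.
class Upcase : public ToyPipeline {
public:
    explicit Upcase(ToyPipeline* next) : ToyPipeline(next) {}
    void write(const std::string& data) override {
        std::string out;
        for (unsigned char c : data) {
            out.push_back(static_cast<char>(std::toupper(c)));
        }
        getNext()->write(out);
    }
    void finish() override { getNext()->finish(); }
};

// End-of-the-line stage: stores its input in memory, has no successor.
class Collect : public ToyPipeline {
public:
    Collect() : ToyPipeline(nullptr) {}
    void write(const std::string& data) override { buffer_ += data; }
    void finish() override { finished_ = true; }
    const std::string& data() const { return buffer_; }
    bool finished() const { return finished_; }

private:
    std::string buffer_;
    bool finished_ = false;
};
```

A chain is built back to front: construct the terminal stage first, then hand its address to the stage that feeds it, and write to the head of the chain.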
| 665 | + | ||
| 666 | +``QPDF`` can read raw or filtered streams. When reading a filtered | ||
| 667 | +stream, the ``QPDF`` class creates a ``Pipeline`` object for one of each | ||
| 668 | +appropriate filter object and chains them together. The last filter | ||
| 669 | +should write to whatever type of output is required. The ``QPDF`` class | ||
| 670 | +has an interface to write raw or filtered stream contents to a given | ||
| 671 | +pipeline. | ||
| 672 | + | ||
| 673 | +.. _ref.object-accessors: | ||
| 674 | + | ||
| 675 | +Object Accessor Methods | ||
| 676 | +----------------------- | ||
| 677 | + | ||
| 678 | +.. | ||
| 679 | + This section is referenced in QPDFObjectHandle.hh | ||
| 680 | + | ||
| 681 | +For general information about how to access instances of | ||
| 682 | +``QPDFObjectHandle``, please see the comments in | ||
| 683 | +:file:`QPDFObjectHandle.hh`. Search for "Accessor | ||
| 684 | +methods". This section provides a more in-depth discussion of the | ||
| 685 | +behavior and the rationale for the behavior. | ||
| 686 | + | ||
| 687 | +*Why were type errors made into warnings?* When type checks were | ||
| 688 | +introduced into qpdf in the early days, it was expected that type errors | ||
| 689 | +would only occur as a result of programmer error. However, in practice, | ||
| 690 | +type errors would occur with malformed PDF files because of assumptions | ||
| 691 | +made in code, including code within the qpdf library and code written by | ||
| 692 | +library users. The most common case would be chaining calls to | ||
| 693 | +``getKey()`` to access keys deep within a dictionary. In many cases, | ||
| 694 | +qpdf would be able to recover from these situations, but the old | ||
| 695 | +behavior often resulted in crashes rather than graceful recovery. For | ||
| 696 | +this reason, the errors were changed to warnings. | ||
| 697 | + | ||
| 698 | +*Why even warn about type errors when the user can't usually do anything | ||
| 699 | +about them?* Type warnings are extremely valuable during development. | ||
| 700 | +Since it's impossible to catch at compile time things like typos in | ||
| 701 | +dictionary key names or logic errors around what the structure of a PDF | ||
| 702 | +file might be, the presence of type warnings can save lots of developer | ||
| 703 | +time. They have also proven useful in exposing issues in qpdf itself | ||
| 704 | +that would have otherwise gone undetected. | ||
| 705 | + | ||
| 706 | +*Can there be a type-safe ``QPDFObjectHandle``?* It would be great if | ||
| 707 | +``QPDFObjectHandle`` could be more strongly typed so that you'd have to | ||
| 708 | +check that something was of a particular type before calling | |
| 709 | +type-specific accessor methods. However, implementing this at this stage | ||
| 710 | +of the library's history would be quite difficult, and it would make | |
| 711 | +the common pattern of drilling into an object no longer work. While it | ||
| 712 | +would be possible to have a parallel interface, it would create a lot of | ||
| 713 | +extra code. If qpdf were written in a language like Rust, an interface | |
| 714 | +like this would make a lot of sense, but, for a variety of reasons, the | ||
| 715 | +qpdf API is consistent with other APIs of its time, relying on exception | ||
| 716 | +handling to catch errors. The underlying PDF objects are inherently not | ||
| 717 | +type-safe. Forcing stronger type safety in ``QPDFObjectHandle`` would | ||
| 718 | +ultimately cause a lot more code to have to be written and would likely | |
| 719 | +make software that uses qpdf more brittle, and even so, checks would | ||
| 720 | +have to occur at runtime. | ||
| 721 | + | ||
| 722 | +*Why do type errors sometimes raise exceptions?* The way warnings work | ||
| 723 | +in qpdf requires a ``QPDF`` object to be associated with an object | ||
| 724 | +handle for a warning to be issued. It would be nice if this could be | ||
| 725 | +fixed, but it would require major changes to the API. Rather than | ||
| 726 | +throwing away these conditions, we convert them to exceptions. It's not | ||
| 727 | +that bad though. Since any object handle that was read from a file has | ||
| 728 | +an associated ``QPDF`` object, it would only be type errors on objects | ||
| 729 | +that were created explicitly that would cause exceptions, and in that | ||
| 730 | +case, type errors are much more likely to be the result of a coding | ||
| 731 | +error than invalid input. | ||
| 732 | + | ||
| 733 | +*Why does the behavior of a type exception differ between the C and C++ | ||
| 734 | +API?* There is no way to throw and catch exceptions in C short of | ||
| 735 | +something like ``setjmp`` and ``longjmp``, and that approach is not | ||
| 736 | +portable across language barriers. Since the C API is often used from | ||
| 737 | +other languages, it's important to keep things as simple as possible. | ||
| 738 | +Starting in qpdf 10.5, exceptions that used to crash code using the C | ||
| 739 | +API will be written to stderr by default, and it is possible to register | ||
| 740 | +an error handler. There's no reason that the error handler can't | ||
| 741 | +simulate exception handling in some way, such as by using ``setjmp`` and | ||
| 742 | +``longjmp`` or by setting some variable that can be checked after | ||
| 743 | +library calls are made. In retrospect, it might have been better if the | ||
| 744 | +C API object handle methods returned error codes like the other methods | ||
| 745 | +and set return values in passed-in pointers, but this would complicate | ||
| 746 | +both the implementation and the use of the library for a case that is | ||
| 747 | +actually quite rare and largely avoidable. |
manual/index.rst
| @@ -9,6261 +9,16 @@ QPDF version |release| | @@ -9,6261 +9,16 @@ QPDF version |release| | ||
| 9 | :maxdepth: 2 | 9 | :maxdepth: 2 |
| 10 | :caption: Contents: | 10 | :caption: Contents: |
| 11 | 11 | ||
| 12 | -.. _ref.overview: | ||
| 13 | - | ||
| 14 | -What is QPDF? | ||
| 15 | -============= | ||
| 16 | - | ||
| 17 | -QPDF is a program and C++ library for structural, content-preserving | ||
| 18 | -transformations on PDF files. QPDF's website is located at | ||
| 19 | -https://qpdf.sourceforge.io/. QPDF's source code is hosted on github | ||
| 20 | -at https://github.com/qpdf/qpdf. | ||
| 21 | - | ||
| 22 | -QPDF provides many useful capabilities to developers of PDF-producing | ||
| 23 | -software or for people who just want to look at the innards of a PDF | ||
| 24 | -file to learn more about how they work. With QPDF, it is possible to | ||
| 25 | -copy objects from one PDF file into another and to manipulate the list | ||
| 26 | -of pages in a PDF file. This makes it possible to merge and split PDF | ||
| 27 | -files. The QPDF library also makes it possible for you to create PDF | ||
| 28 | -files from scratch. In this mode, you are responsible for supplying | ||
| 29 | -all the contents of the file, while the QPDF library takes care of all | ||
| 30 | -the syntactical representation of the objects, creation of cross | ||
| 31 | -references tables and, if you use them, object streams, encryption, | ||
| 32 | -linearization, and other syntactic details. You are still responsible | ||
| 33 | -for generating PDF content on your own. | ||
| 34 | - | ||
| 35 | -QPDF has been designed with very few external dependencies, and it is | ||
| 36 | -intentionally very lightweight. QPDF is *not* a PDF content creation | ||
| 37 | -library, a PDF viewer, or a program capable of converting PDF into other | ||
| 38 | -formats. In particular, QPDF knows nothing about the semantics of PDF | ||
| 39 | -content streams. If you are looking for something that can do that, you | ||
| 40 | -should look elsewhere. However, once you have a valid PDF file, QPDF can | ||
| 41 | -be used to transform that file in ways that perhaps your original PDF | ||
| 42 | -creation tool can't handle. For example, many programs generate simple PDF | ||
| 43 | -files but can't password-protect them, web-optimize them, or perform | ||
| 44 | -other transformations of that type. | ||
| 45 | - | ||
| 46 | -.. _ref.license: | ||
| 47 | - | ||
| 48 | -License | ||
| 49 | -======= | ||
| 50 | - | ||
| 51 | -QPDF is licensed under `the Apache License, Version 2.0 | ||
| 52 | -<http://www.apache.org/licenses/LICENSE-2.0>`__ (the "License"). | ||
| 53 | -Unless required by applicable law or agreed to in writing, software | ||
| 54 | -distributed under the License is distributed on an "AS IS" BASIS, | ||
| 55 | -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or | ||
| 56 | -implied. See the License for the specific language governing | ||
| 57 | -permissions and limitations under the License. | ||
| 58 | - | ||
| 59 | -.. _ref.installing: | ||
| 60 | - | ||
| 61 | -Building and Installing QPDF | ||
| 62 | -============================ | ||
| 63 | - | ||
| 64 | -This chapter describes how to build and install qpdf. Please see also | ||
| 65 | -the :file:`README.md` and | ||
| 66 | -:file:`INSTALL` files in the source distribution. | ||
| 67 | - | ||
| 68 | -.. _ref.prerequisites: | ||
| 69 | - | ||
| 70 | -System Requirements | ||
| 71 | -------------------- | ||
| 72 | - | ||
| 73 | -The qpdf package has few external dependencies. In order to build qpdf, | ||
| 74 | -the following packages are required: | ||
| 75 | - | ||
| 76 | -- A C++ compiler that supports C++-14. | ||
| 77 | - | ||
| 78 | -- zlib: http://www.zlib.net/ | ||
| 79 | - | ||
| 80 | -- jpeg: http://www.ijg.org/files/ or https://libjpeg-turbo.org/ | ||
| 81 | - | ||
| 82 | -- *Recommended but not required:* gnutls: https://www.gnutls.org/ to be | ||
| 83 | - able to use the gnutls crypto provider, and/or openssl: | ||
| 84 | - https://openssl.org/ to be able to use the openssl crypto provider. | ||
| 85 | - | ||
| 86 | -- gnu make 3.81 or newer: http://www.gnu.org/software/make | ||
| 87 | - | ||
| 88 | -- perl version 5.8 or newer: http://www.perl.org/; required for running | ||
| 89 | - the test suite. Starting with qpdf version 9.1.1, perl is no longer | ||
| 90 | - required at runtime. | ||
| 91 | - | ||
| 92 | -- GNU diffutils (any version): http://www.gnu.org/software/diffutils/ | ||
| 93 | - is required to run the test suite. Note that this is the version of | ||
| 94 | - diff present on virtually all GNU/Linux systems. This is required | ||
| 95 | - because the test suite uses :command:`diff -u`. | ||
| 96 | - | ||
| 97 | -Part of qpdf's test suite does comparisons of the contents of PDF files by | | |
| 98 | -converting them to images and comparing the images. The image comparison | | |
| 99 | -tests are disabled by default. Those tests are not required for | ||
| 100 | -determining correctness of a qpdf build if you have not modified the | ||
| 101 | -code since the test suite also contains expected output files that are | ||
| 102 | -compared literally. The image comparison tests provide an extra check to | ||
| 103 | -make sure that any content transformations don't break the rendering of | ||
| 104 | -pages. Transformations that affect the content streams themselves are | ||
| 105 | -off by default and are only provided to help developers look into the | ||
| 106 | -contents of PDF files. If you are making deep changes to the library | ||
| 107 | -that cause changes in the contents of the files that qpdf generates, | | |
| 108 | -then you should enable the image comparison tests. Enable them by | ||
| 109 | -running :command:`configure` with the | ||
| 110 | -:samp:`--enable-test-compare-images` flag. If you enable | ||
| 111 | -this, the following additional requirements are required by the test | ||
| 112 | -suite. Note that in no case are these items required to use qpdf. | ||
| 113 | - | ||
| 114 | -- libtiff: http://www.remotesensing.org/libtiff/ | ||
| 115 | - | ||
| 116 | -- GhostScript version 8.60 or newer: http://www.ghostscript.com | ||
| 117 | - | ||
| 118 | -If you do not enable this, then you do not need to have tiff and | ||
| 119 | -ghostscript. | ||
| 120 | - | ||
| 121 | -Pre-built documentation is distributed with qpdf, so you should | ||
| 122 | -generally not need to rebuild the documentation. In order to build the | ||
| 123 | -documentation from source, you need to install `Sphinx | ||
| 124 | -<https://sphinx-doc.org>`__. To build the PDF version of the | ||
| 125 | -documentation, you need `pdflatex`, `latexmk`, and a fairly complete | ||
| 126 | -LaTeX installation. Detailed requirements can be found in the Sphinx | ||
| 127 | -documentation. | ||
| 128 | - | ||
| 129 | -.. _ref.building: | ||
| 130 | - | ||
| 131 | -Build Instructions | ||
| 132 | ------------------- | ||
| 133 | - | ||
| 134 | -Building qpdf on UNIX is generally just a matter of running | ||
| 135 | - | ||
| 136 | -:: | ||
| 137 | - | ||
| 138 | - ./configure | ||
| 139 | - make | ||
| 140 | - | ||
| 141 | -You can also run :command:`make check` to run the test | ||
| 142 | -suite and :command:`make install` to install. Please run | ||
| 143 | -:command:`./configure --help` for options on what can be | ||
| 144 | -configured. You can also set the value of ``DESTDIR`` during | ||
| 145 | -installation to install to a temporary location, as is common with many | ||
| 146 | -open source packages. Please see also the | ||
| 147 | -:file:`README.md` and | ||
| 148 | -:file:`INSTALL` files in the source distribution. | ||
| 149 | - | ||
| 150 | -Building on Windows is a little bit more complicated. For details, | ||
| 151 | -please see :file:`README-windows.md` in the source | ||
| 152 | -distribution. You can also download a binary distribution for Windows. | ||
| 153 | -There is a port of qpdf to Visual C++ version 6 in the | ||
| 154 | -:file:`contrib` area generously contributed by Jian | ||
| 155 | -Ma. This is also discussed in more detail in | ||
| 156 | -:file:`README-windows.md`. | ||
| 157 | - | ||
| 158 | -While ``wchar_t`` is part of the C++ standard, qpdf uses it in only one | ||
| 159 | -place in the public API, and it's just in a helper function. It is | ||
| 160 | -possible to build qpdf on a system that doesn't have ``wchar_t``, and | ||
| 161 | -it's also possible to compile a program that uses qpdf on a system | ||
| 162 | -without ``wchar_t`` as long as you don't call that one method. This is a | ||
| 163 | -very unusual situation. For a detailed discussion, please see the | ||
| 164 | -top-level README.md file in qpdf's source distribution. | ||
| 165 | - | ||
| 166 | -There are some other things you can do with the build. Although qpdf | ||
| 167 | -uses :command:`autoconf`, it does not use | ||
| 168 | -:command:`automake` but instead uses a | ||
| 169 | -hand-crafted non-recursive Makefile that requires gnu make. If you're | ||
| 170 | -really interested, please read the comments in the top-level | ||
| 171 | -:file:`Makefile`. | ||
| 172 | - | ||
| 173 | -.. _ref.crypto: | ||
| 174 | - | ||
| 175 | -Crypto Providers | ||
| 176 | ----------------- | ||
| 177 | - | ||
| 178 | -Starting with qpdf 9.1.0, the qpdf library can be built with multiple | ||
| 179 | -implementations of providers of cryptographic functions, which we refer | ||
| 180 | -to as "crypto providers." At the time of writing, a crypto | ||
| 181 | -implementation must provide MD5 and SHA2 (256, 384, and 512-bit) hashes | ||
| 182 | -and RC4 and AES256 with and without CBC encryption. In the future, if | ||
| 183 | -digital signature is added to qpdf, there may be additional requirements | ||
| 184 | -beyond this. | ||
| 185 | - | ||
| 186 | -Starting with qpdf version 9.1.0, the available implementations are | ||
| 187 | -``native`` and ``gnutls``. In qpdf 10.0.0, ``openssl`` was added. | ||
| 188 | -Additional implementations may be added if needed. It is also possible | ||
| 189 | -for a developer to provide their own implementation without modifying | ||
| 190 | -the qpdf library. | ||
| 191 | - | ||
| 192 | -.. _ref.crypto.build: | ||
| 193 | - | ||
| 194 | -Build Support For Crypto Providers | ||
| 195 | -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| 196 | - | ||
| 197 | -When building with qpdf's build system, crypto providers can be enabled | ||
| 198 | -at build time using various :command:`./configure` | ||
| 199 | -options. The default behavior is for | ||
| 200 | -:command:`./configure` to discover which crypto providers | ||
| 201 | -can be supported based on available external libraries, to build all | ||
| 202 | -available crypto providers, and to use an external provider as the | ||
| 203 | -default over the native one. This behavior can be changed with the | ||
| 204 | -following flags to :command:`./configure`: | ||
| 205 | - | ||
| 206 | -- :samp:`--enable-crypto-{x}` | ||
| 207 | - (where :samp:`{x}` is a supported crypto | ||
| 208 | - provider): enable the :samp:`{x}` crypto | ||
| 209 | - provider, requiring any external dependencies it needs | ||
| 210 | - | ||
| 211 | -- :samp:`--disable-crypto-{x}`: | ||
| 212 | - disable the :samp:`{x}` provider, and do not | ||
| 213 | - link against its dependencies even if they are available | ||
| 214 | - | ||
| 215 | -- :samp:`--with-default-crypto={x}`: | ||
| 216 | - make :samp:`{x}` the default provider even if | ||
| 217 | - a higher priority one is available | ||
| 218 | - | ||
| 219 | -- :samp:`--disable-implicit-crypto`: only build crypto | ||
| 220 | - providers that are explicitly requested with an | ||
| 221 | - :samp:`--enable-crypto-{x}` | ||
| 222 | - option | ||
| 223 | - | ||
| 224 | -For example, if you want to guarantee that the gnutls crypto provider is | ||
| 225 | -used and that the native provider is not built, you could run | ||
| 226 | -:command:`./configure --enable-crypto-gnutls | ||
| 227 | ---disable-implicit-crypto`. | ||
| 228 | - | ||
| 229 | -If you build qpdf using your own build system, in order for qpdf to work | ||
| 230 | -at all, you need to enable at least one crypto provider. The file | ||
| 231 | -:file:`libqpdf/qpdf/qpdf-config.h.in` provides | ||
| 232 | -macros ``DEFAULT_CRYPTO``, whose value must be a string naming the | ||
| 233 | -default crypto provider, and various symbols starting with | ||
| 234 | -``USE_CRYPTO_``, at least one of which has to be enabled. Additionally, | ||
| 235 | -you must compile the source files that implement a crypto provider. To | ||
| 236 | -get a list of those files, look at | ||
| 237 | -:file:`libqpdf/build.mk`. If you want to omit a | ||
| 238 | -particular crypto provider, as long as its ``USE_CRYPTO_`` symbol is | ||
| 239 | -undefined, you can completely ignore the source files that belong to a | ||
| 240 | -particular crypto provider. Additionally, crypto providers may have | ||
| 241 | -their own external dependencies that can be omitted if the crypto | ||
| 242 | -provider is not used. For example, if you are building qpdf yourself and | ||
| 243 | -are using an environment that does not support gnutls or openssl, you | ||
| 244 | -can ensure that ``USE_CRYPTO_NATIVE`` is defined, ``USE_CRYPTO_GNUTLS`` | ||
| 245 | -is not defined, and ``DEFAULT_CRYPTO`` is defined to ``"native"``. Then | ||
| 246 | -you must include the source files used in the native implementation, | ||
| 247 | -some of which were added or renamed from earlier versions, to your | ||
| 248 | -build, and you can ignore | ||
| 249 | -:file:`QPDFCrypto_gnutls.cc`. Always consult | ||
| 250 | -:file:`libqpdf/build.mk` to get the list of source | ||
| 251 | -files you need to build. | ||
| 252 | - | ||
| 253 | -.. _ref.crypto.runtime: | ||
| 254 | - | ||
| 255 | -Runtime Crypto Provider Selection | ||
| 256 | -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| 257 | - | ||
| 258 | -You can use the :samp:`--show-crypto` option to | ||
| 259 | -:command:`qpdf` to get a list of available crypto | ||
| 260 | -providers. The default provider is always listed first, and the rest are | ||
| 261 | -listed in lexical order. Each crypto provider is listed on a line by | ||
| 262 | -itself with no other text, enabling the output of this command to be | ||
| 263 | -used easily in scripts. | ||
| 264 | - | ||
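Because each provider is printed on its own line with the default first, the output is easy to consume from a script. The following is a minimal sketch, not part of qpdf; the function name ``parse_show_crypto`` and the sample text are made up for illustration, and real output depends on how qpdf was built.

```python
from typing import List, Tuple


def parse_show_crypto(output: str) -> Tuple[str, List[str]]:
    """Split text in the format printed by `qpdf --show-crypto`.

    Returns (default_provider, other_providers). Relies only on the
    documented format: one provider per line, default listed first.
    """
    providers = [line.strip() for line in output.splitlines() if line.strip()]
    if not providers:
        raise ValueError("no crypto providers listed")
    return providers[0], providers[1:]


# Illustrative input only; a real build may list different providers.
sample = "gnutls\nnative\nopenssl\n"
default, others = parse_show_crypto(sample)
```

In a real script, the ``output`` argument would be whatever was captured from running :command:`qpdf --show-crypto`.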
| 265 | -You can override which crypto provider is used by setting the | ||
| 266 | -``QPDF_CRYPTO_PROVIDER`` environment variable. There are few reasons to | ||
| 267 | -ever do this, but you might want to do it if you were explicitly trying | ||
| 268 | -to compare behavior of two different crypto providers while testing | ||
| 269 | -performance or reproducing a bug. It could also be useful for people who | ||
| 270 | -are implementing their own crypto providers. | ||
| 271 | - | ||
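When comparing providers from a test script, one approach is to build a separate environment for each run. This is a small sketch under that assumption; the helper name ``env_with_provider`` is hypothetical, and only the ``QPDF_CRYPTO_PROVIDER`` variable comes from qpdf.

```python
import os


def env_with_provider(provider, base=None):
    """Return a copy of the environment with QPDF_CRYPTO_PROVIDER set.

    `base` defaults to the current process environment; the original
    mapping is never modified.
    """
    env = dict(os.environ if base is None else base)
    env["QPDF_CRYPTO_PROVIDER"] = provider
    return env


# The result could be passed as env= to subprocess.run when invoking qpdf.
example_env = env_with_provider("native", {"PATH": "/usr/bin"})
```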
| 272 | -.. _ref.crypto.develop: | ||
| 273 | - | ||
| 274 | -Crypto Provider Information for Developers | ||
| 275 | -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| 276 | - | ||
| 277 | -If you are writing code that uses libqpdf and you want to force a | ||
| 278 | -certain crypto provider to be used, you can call the method | ||
| 279 | -``QPDFCryptoProvider::setDefaultProvider``. The argument is the name of | ||
| 280 | -a built-in or developer-supplied provider. To add your own crypto | ||
| 281 | -provider, you have to create a class derived from ``QPDFCryptoImpl`` and | ||
| 282 | -register it with ``QPDFCryptoProvider``. For additional information, see | ||
| 283 | -comments in :file:`include/qpdf/QPDFCryptoImpl.hh`. | ||
| 284 | - | ||
| 285 | -.. _ref.crypto.design: | ||
| 286 | - | ||
| 287 | -Crypto Provider Design Notes | ||
| 288 | -~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| 289 | - | ||
| 290 | -This section describes a few bits of rationale for why the crypto | ||
| 291 | -provider interface was set up the way it was. You don't need to know any | ||
| 292 | -of this information, but it's provided for the record and in case it's | ||
| 293 | -interesting. | ||
| 294 | - | ||
| 295 | -As a general rule, I want to avoid as much as possible including large | ||
| 296 | -blocks of code that are conditionally compiled such that, in most | ||
| 297 | -builds, some code is never built. This is dangerous because it makes it | ||
| 298 | -very easy for invalid code to creep in unnoticed. As such, I want it to | ||
| 299 | -be possible to build qpdf with all available crypto providers, and this | ||
| 300 | -is the way I build qpdf for local development. At the same time, if a | ||
| 301 | -particular packager feels that it is a security liability for qpdf to | ||
| 302 | -use crypto functionality from anything other than a library that gets | ||
| 303 | -considerable scrutiny for this specific purpose (such as gnutls, | ||
| 304 | -openssl, or nettle), then I want to give that packager the ability to | ||
| 305 | -completely disable qpdf's native implementation. Or if someone wants to | ||
| 306 | -avoid adding a dependency on one of the external crypto providers, I | ||
| 307 | -don't want the availability of the provider to impose additional | ||
| 308 | -external dependencies within that environment. Both of these are | ||
| 309 | -situations that I know to be true for some users of qpdf. | ||
| 310 | - | ||
| 311 | -I want registration and selection of crypto providers to be thread-safe, | ||
| 312 | -and I want it to work deterministically for a developer to provide their | ||
| 313 | -own crypto provider and be able to set it up as the default. This was | ||
| 314 | -the primary motivation behind requiring C++-11 as doing so enabled me to | ||
| 315 | -exploit the guaranteed thread safety of local block static | ||
| 316 | -initialization. The ``QPDFCryptoProvider`` class uses a singleton | ||
| 317 | -pattern with thread-safe initialization to create the singleton instance | ||
| 318 | -of ``QPDFCryptoProvider`` and exposes only static methods in its public | ||
| 319 | -interface. In this way, if a developer wants to call any | ||
| 320 | -``QPDFCryptoProvider`` methods, the library guarantees the | ||
| 321 | -``QPDFCryptoProvider`` is fully initialized and all built-in crypto | ||
| 322 | -providers are registered. Making ``QPDFCryptoProvider`` actually know | ||
| 323 | -about all the built-in providers may seem a bit sad at first, but this | ||
| 324 | -choice makes it extremely clear exactly what the initialization behavior | ||
| 325 | -is. There's no question about provider implementations automatically | ||
| 326 | -registering themselves in a nondeterministic order. It also means that | ||
| 327 | -implementations do not need to know anything about the provider | ||
| 328 | -interface, which makes them easier to test in isolation. Another | ||
| 329 | -advantage of this approach is that a developer who wants to develop | ||
| 330 | -their own crypto provider can do so in complete isolation from the qpdf | ||
| 331 | -library and, with just two calls, can make qpdf use their provider in | ||
| 332 | -their application. If they decided to contribute their code, plugging it | ||
| 333 | -into the qpdf library would require a very small change to qpdf's source | ||
| 334 | -code. | ||
| 335 | - | ||
| 336 | -The decision to make the crypto provider selectable at runtime was one I | ||
| 337 | -struggled with a little, but I decided to do it for various reasons. | ||
| 338 | -Allowing an end user to switch crypto providers easily could be very | ||
| 339 | -useful for reproducing a potential bug. If a user reports a bug that | ||
| 340 | -some cryptographic thing is broken, I can easily ask that person to try | ||
| 341 | -with the ``QPDF_CRYPTO_PROVIDER`` variable set to different values. The | ||
| 342 | -same could apply in the event of a performance problem. This also makes | ||
| 343 | -it easier for qpdf's own test suite to exercise code with different | ||
| 344 | -providers without having to make every program that links with qpdf | ||
| 345 | -aware of the possibility of multiple providers. In qpdf's continuous | ||
| 346 | -integration environment, the entire test suite is run for each supported | ||
| 347 | -crypto provider. This is made simple by being able to select the | ||
| 348 | -provider using an environment variable. | ||
| 349 | - | ||
| 350 | -Finally, making crypto providers selectable in this way establishes a | ||
| 351 | -pattern that I may follow again in the future for stream filter | ||
| 352 | -providers. One could imagine a future enhancement where someone could | ||
| 353 | -provide their own implementations for basic filters like | ||
| 354 | -``/FlateDecode`` or for other filters that qpdf doesn't support. | ||
| 355 | -Implementing the registration functions and internal storage of | ||
| 356 | -registered providers was also easier using C++-11's functional | ||
| 357 | -interfaces, which was another reason to require C++-11 at this time. | ||
| 358 | - | ||
| 359 | -.. _ref.packaging: | ||
| 360 | - | ||
| 361 | -Notes for Packagers | ||
| 362 | -------------------- | ||
| 363 | - | ||
| 364 | -If you are packaging qpdf for an operating system distribution, here are | ||
| 365 | -some things you may want to keep in mind: | ||
| 366 | - | ||
| 367 | -- Starting in qpdf version 9.1.1, qpdf no longer has a runtime | ||
| 368 | - dependency on perl. This is because fix-qdf was rewritten in C++. | ||
| 369 | - However, qpdf still has a build-time dependency on perl. | ||
| 370 | - | ||
| 371 | -- Make sure you are getting the intended behavior with regard to crypto | ||
| 372 | - providers. Read :ref:`ref.crypto.build` for details. | ||
| 373 | - | ||
| 374 | -- Passing :samp:`--enable-show-failed-test-output` to | ||
| 375 | - :command:`./configure` will cause any failed test | ||
| 376 | - output to be written to the console. This can be very useful for | ||
| 377 | - seeing test failures generated by autobuilders where you can't access | ||
| 378 | - qtest.log after the fact. | ||
| 379 | - | ||
| 380 | -- If qpdf's build environment detects the presence of autoconf and | ||
| 381 | - related tools, it will check to ensure that automatically generated | ||
| 382 | - files are up-to-date with recorded checksums and fail if it detects a | ||
| 383 | - discrepancy. This feature is intended to prevent you from | ||
| 384 | - accidentally forgetting to regenerate automatic files after modifying | ||
| 385 | - their sources. If your packaging environment automatically refreshes | ||
| 386 | - automatic files, it can cause this check to fail. Suppress qpdf's | ||
| 387 | - checks by passing :samp:`--disable-check-autofiles` | ||
| 388 | - to :command:`./configure`. This is safe since qpdf's | ||
| 389 | - :command:`autogen.sh` just runs autotools in the | ||
| 390 | - normal way. | ||
| 391 | - | ||
| 392 | -- QPDF's :command:`make install` does not install | ||
| 393 | - completion files by default, but as a packager, it's good if you | ||
| 394 | - install them wherever your distribution expects such files to go. You | ||
| 395 | - can find completion files to install in the | ||
| 396 | - :file:`completions` directory. | ||
| 397 | - | ||
| 398 | -- Packagers are encouraged to install the source files from the | ||
| 399 | - :file:`examples` directory along with qpdf | ||
| 400 | - development packages. | ||
| 401 | - | ||
| 402 | -.. _ref.using: | ||
| 403 | - | ||
| 404 | -Running QPDF | ||
| 405 | -============ | ||
| 406 | - | ||
| 407 | -This chapter describes how to run the qpdf program from the command | ||
| 408 | -line. | ||
| 409 | - | ||
| 410 | -.. _ref.invocation: | ||
| 411 | - | ||
| 412 | -Basic Invocation | ||
| 413 | ----------------- | ||
| 414 | - | ||
| 415 | -When running qpdf, the basic invocation is as follows: | ||
| 416 | - | ||
| 417 | -:: | ||
| 418 | - | ||
| 419 | - qpdf [ options ] { infilename | --empty } outfilename | ||
| 420 | - | ||
| 421 | -This converts PDF file :samp:`infilename` to PDF file | ||
| 422 | -:samp:`outfilename`. The output file is functionally | ||
| 423 | -identical to the input file but may have been structurally reorganized. | ||
| 424 | -Also, orphaned objects will be removed from the file. Many | ||
| 425 | -transformations are available as controlled by the options below. In | ||
| 426 | -place of :samp:`infilename`, the parameter | ||
| 427 | -:samp:`--empty` may be specified. This causes qpdf to | ||
| 428 | -use a dummy input file that contains zero pages. The only normal use | ||
| 429 | -case for using :samp:`--empty` would be if you were | ||
| 430 | -going to add pages from another source, as discussed in :ref:`ref.page-selection`. | ||
| 431 | - | ||
| 432 | -If :samp:`@filename` appears as a word anywhere in the | ||
| 433 | -command-line, it will be read line by line, and each line will be | ||
| 434 | -treated as a command-line argument. Leading and trailing whitespace is | ||
| 435 | -intentionally not removed from lines, which makes it possible to handle | ||
| 436 | -arguments that start or end with spaces. The :samp:`@-` | ||
| 437 | -option allows arguments to be read from standard input. This allows qpdf | ||
| 438 | -to be invoked with an arbitrary number of arbitrarily long arguments. It | ||
| 439 | -is also very useful for avoiding having to pass passwords on the command | ||
| 440 | -line. Note that the :samp:`@filename` can't appear in | ||
| 441 | -the middle of an argument, so constructs such as | ||
| 442 | -:samp:`--arg=@option` will not work. You would have to | ||
| 443 | -include the argument and its options together in the arguments file. | ||
| 444 | - | ||
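As a rough illustration of the argument-file mechanism, the sketch below writes one argument per line to a temporary file and returns the corresponding ``@filename`` token. The helper name ``write_args_file`` is hypothetical. The sketch does not strip whitespace, since qpdf is described above as preserving it, and it rejects arguments containing line breaks because each line is treated as one argument.

```python
import os
import tempfile


def write_args_file(args):
    """Write one argument per line and return the '@path' token for qpdf."""
    for arg in args:
        if "\n" in arg or "\r" in arg:
            raise ValueError("an argument cannot span multiple lines")
    # mkstemp creates the file readable only by the current user on POSIX
    # systems, which matters if the file holds a password.
    fd, path = tempfile.mkstemp(suffix=".args", text=True)
    with os.fdopen(fd, "w") as f:
        for arg in args:
            f.write(arg + "\n")
    return "@" + path


token = write_args_file(["--password=my secret ", "--decrypt"])
```

The returned token would then be passed as a single word on the command line, alongside the input and output file names. The caller is responsible for deleting the file afterward.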
| 445 | -:samp:`outfilename` does not have to be seekable, even | ||
| 446 | -when generating linearized files. Specifying ":samp:`-`" | ||
| 447 | -as :samp:`outfilename` means to write to standard | ||
| 448 | -output. If you want to overwrite the input file with the output, use the | ||
| 449 | -option :samp:`--replace-input` and omit the output file | ||
| 450 | -name. You can't specify the same file as both the input and the output. | ||
| 451 | -If you do this, qpdf will tell you about the | ||
| 452 | -:samp:`--replace-input` option. | ||
| 453 | - | ||
| 454 | -Most options require an output file, but some testing or inspection | ||
| 455 | -commands do not. These are specifically noted. | ||
| 456 | - | ||
| 457 | -.. _ref.exit-status: | ||
| 458 | - | ||
| 459 | -Exit Status | ||
| 460 | -~~~~~~~~~~~ | ||
| 461 | - | ||
| 462 | -The exit status of :command:`qpdf` may be interpreted as | ||
| 463 | -follows: | ||
| 464 | - | ||
| 465 | -- ``0``: no errors or warnings were found. The file may still have | ||
| 466 | - problems qpdf can't detect. If | ||
| 467 | - :samp:`--warning-exit-0` was specified, exit status 0 | ||
| 468 | - is used even if there are warnings. | ||
| 469 | - | ||
| 470 | -- ``2``: errors were found. qpdf was not able to fully process the | ||
| 471 | - file. | ||
| 472 | - | ||
| 473 | -- ``3``: qpdf encountered problems that it was able to recover from. In | ||
| 474 | - some cases, the resulting file may still be damaged. Note that qpdf | ||
| 475 | - still exits with status ``3`` if it finds warnings even when | ||
| 476 | - :samp:`--no-warn` is specified. With | ||
| 477 | - :samp:`--warning-exit-0`, warnings without errors | ||
| 478 | - exit with status 0 instead of 3. | ||
| 479 | - | ||
| 480 | -Note that :command:`qpdf` never exits with status ``1``. | ||
| 481 | -If you get an exit status of ``1``, it was something else, like the | ||
| 482 | -shell not being able to find or execute :command:`qpdf`. | ||
| 483 | - | ||
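A script that wraps :command:`qpdf` might translate these statuses as in the following sketch. The function names and returned strings are illustrative choices, not anything defined by qpdf; only the meanings of statuses 0, 2, and 3 come from the list above.

```python
import subprocess


def describe_exit_status(code: int) -> str:
    """Map a qpdf exit status to a short description."""
    if code == 0:
        return "success"
    if code == 2:
        return "errors"
    if code == 3:
        return "warnings"
    # qpdf itself never exits with any other status (including 1).
    return "not produced by qpdf"


def run_qpdf(args):
    """Run qpdf with the given argument list and describe the outcome.

    Assumes qpdf is on the PATH; if it is not, subprocess.run raises
    FileNotFoundError instead of returning a status.
    """
    result = subprocess.run(["qpdf", *args])
    return describe_exit_status(result.returncode)
```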
| 484 | -.. _ref.shell-completion: | ||
| 485 | - | ||
| 486 | -Shell Completion | ||
| 487 | ----------------- | ||
| 488 | - | ||
| 489 | -Starting in qpdf version 8.3.0, qpdf provides its own completion support | ||
| 490 | -for zsh and bash. You can enable bash completion with :command:`eval | ||
| 491 | -$(qpdf --completion-bash)` and zsh completion with | ||
| 492 | -:command:`eval $(qpdf --completion-zsh)`. If | ||
| 493 | -:command:`qpdf` is not in your path, you should invoke it | ||
| 494 | -above with an absolute path. If you invoke it with a relative path, it | ||
| 495 | -will warn you, and the completion won't work if you're in a different | ||
| 496 | -directory. | ||
| 497 | - | ||
| 498 | -qpdf will use ``argv[0]`` to figure out where its executable is. This | ||
| 499 | -may produce unwanted results in some cases, especially if you are trying | ||
| 500 | -to use completion with a copy of qpdf that is built from source. You can | ||
| 501 | -specify a full path to the qpdf you want to use for completion in the | ||
| 502 | -``QPDF_EXECUTABLE`` environment variable. | ||
| 503 | - | ||
| 504 | -.. _ref.basic-options: | ||
| 505 | - | ||
| 506 | -Basic Options | ||
| 507 | -------------- | ||
| 508 | - | ||
| 509 | -The following options are the most common ones and perform commonly | ||
| 510 | -needed transformations. | ||
| 511 | - | ||
| 512 | -:samp:`--help` | ||
| 513 | - Display command-line invocation help. | ||
| 514 | - | ||
| 515 | -:samp:`--version` | ||
| 516 | - Display the current version of qpdf. | ||
| 517 | - | ||
| 518 | -:samp:`--copyright` | ||
| 519 | - Show detailed copyright information. | ||
| 520 | - | ||
| 521 | -:samp:`--show-crypto` | ||
| 522 | - Show a list of available crypto providers, each on a line by itself. | ||
| 523 | - The default provider is always listed first. See :ref:`ref.crypto` for more information about crypto | ||
| 524 | - providers. | ||
| 525 | - | ||
| 526 | -:samp:`--completion-bash` | ||
| 527 | - Output a completion command you can eval to enable shell completion | ||
| 528 | - from bash. | ||
| 529 | - | ||
| 530 | -:samp:`--completion-zsh` | ||
| 531 | - Output a completion command you can eval to enable shell completion | ||
| 532 | - from zsh. | ||
| 533 | - | ||
| 534 | -:samp:`--password={password}` | ||
| 535 | - Specifies a password for accessing encrypted files. To read the | ||
| 536 | - password from a file or standard input, you can use | ||
| 537 | - :samp:`--password-file`, added in qpdf 10.2. Note | ||
| 538 | - that you can also use :samp:`@filename` or | ||
| 539 | - :samp:`@-` as described above to put the password in | ||
| 540 | - a file or pass it via standard input, but you would do so by | ||
| 541 | - specifying the entire | ||
| 542 | - :samp:`--password={password}` | ||
| 543 | - option in the file. Syntax such as | ||
| 544 | - :samp:`--password=@filename` won't work since | ||
| 545 | - :samp:`@filename` is not recognized in the middle of | ||
| 546 | - an argument. | ||
| 547 | - | ||
| 548 | -:samp:`--password-file={filename}` | ||
| 549 | - Reads the first line from the specified file and uses it as the | ||
| 550 | - password for accessing encrypted files. | ||
| 551 | - :samp:`{filename}` | ||
| 552 | - may be ``-`` to read the password from standard input. Note that, in | ||
| 553 | - this case, the password is echoed and there is no prompt, so use with | ||
| 554 | - caution. | ||
| 555 | - | ||
| 556 | -:samp:`--is-encrypted` | ||
| 557 | - Silently exit with status 0 if the file is encrypted or status 2 if | ||
| 558 | - the file is not encrypted. This is useful for shell scripts. Other | ||
| 559 | - options are ignored if this is given. This option is mutually | ||
| 560 | - exclusive with :samp:`--requires-password`. Both this | ||
| 561 | - option and :samp:`--requires-password` exit with | ||
| 562 | - status 2 for non-encrypted files. | ||
| 563 | - | ||
| 564 | -:samp:`--requires-password` | ||
| 565 | - Silently exit with status 0 if a password (other than as supplied) is | ||
| 566 | - required. Exit with status 2 if the file is not encrypted. Exit with | ||
| 567 | - status 3 if the file is encrypted but requires no password or the | ||
| 568 | - correct password has been supplied. This is useful for shell scripts. | ||
| 569 | - Note that any supplied password is used when opening the file. When | ||
| 570 | - used with a :samp:`--password` option, this option | ||
| 571 | - can be used to check the correctness of the password. In that case, | ||
| 572 | - an exit status of 3 means the file works with the supplied password. | ||
| 573 | - This option is mutually exclusive with | ||
| 574 | - :samp:`--is-encrypted`. Both this option and | ||
| 575 | - :samp:`--is-encrypted` exit with status 2 for | ||
| 576 | - non-encrypted files. | ||
| 577 | - | ||
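Since :samp:`--is-encrypted` and :samp:`--requires-password` give different meanings to the same exit statuses, a script may want to keep the two interpretations separate. The sketch below restates the two descriptions above as lookup tables; the function names and wording of the results are hypothetical.

```python
def interpret_is_encrypted(code: int) -> str:
    """Meaning of the exit status of `qpdf --is-encrypted`."""
    meanings = {0: "encrypted", 2: "not encrypted"}
    return meanings.get(code, "unexpected status")


def interpret_requires_password(code: int) -> str:
    """Meaning of the exit status of `qpdf --requires-password`."""
    meanings = {
        0: "a password other than the one supplied is required",
        2: "not encrypted",
        3: "encrypted, but opens with the supplied password or none",
    }
    return meanings.get(code, "unexpected status")
```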
| 578 | -:samp:`--verbose` | ||
| 579 | - Increase verbosity of output. For now, this just prints some | ||
| 580 | - indication of any file that it creates. | ||
| 581 | - | ||
| 582 | -:samp:`--progress` | ||
| 583 | - Indicate progress while writing files. | ||
| 584 | - | ||
| 585 | -:samp:`--no-warn` | ||
| 586 | - Suppress writing of warnings to stderr. If warnings were detected and | ||
| 587 | - suppressed, :command:`qpdf` will still exit with exit | ||
| 588 | - code 3. See also :samp:`--warning-exit-0`. | ||
| 589 | - | ||
| 590 | -:samp:`--warning-exit-0` | ||
| 591 | - If warnings are found but no errors, exit with exit code 0 instead of 3. | ||
| 592 | - When combined with :samp:`--no-warn`, the effect is | ||
| 593 | - for :command:`qpdf` to completely ignore warnings. | ||
| 594 | - | ||
| 595 | -:samp:`--linearize` | ||
| 596 | - Causes generation of a linearized (web-optimized) output file. | ||
| 597 | - | ||
| 598 | -:samp:`--replace-input` | ||
| 599 | - If specified, the output file name should be omitted. This option | ||
| 600 | - tells qpdf to replace the input file with the output. It does this by | ||
| 601 | - writing to | ||
| 602 | - :file:`{infilename}.~qpdf-temp#` | ||
| 603 | - and, when done, overwriting the input file with the temporary file. | ||
| 604 | - If there were any warnings, the original input is saved as | ||
| 605 | - :file:`{infilename}.~qpdf-orig`. | ||
| 606 | - | ||
| 607 | -:samp:`--copy-encryption=file` | ||
| 608 | - Encrypt the file using the same encryption parameters, including user | ||
| 609 | - and owner password, as the specified file. Use | ||
| 610 | - :samp:`--encryption-file-password` to specify a | ||
| 611 | - password if one is needed to open this file. Note that copying the | ||
| 612 | - encryption parameters from a file also copies the first half of | ||
| 613 | - ``/ID`` from the file since this is part of the encryption | ||
| 614 | - parameters. | ||
| 615 | - | ||
| 616 | -:samp:`--encryption-file-password=password` | ||
| 617 | - If the file specified with :samp:`--copy-encryption` | ||
| 618 | - requires a password, specify the password using this option. Note | ||
| 619 | - that only one of the user or owner password is required. Both | ||
| 620 | - passwords will be preserved since QPDF does not distinguish between | ||
| 621 | - the two passwords. It is possible to preserve encryption parameters, | ||
| 622 | - including the owner password, from a file even if you don't know the | ||
| 623 | - file's owner password. | ||
| 624 | - | ||
| 625 | -:samp:`--allow-weak-crypto` | ||
| 626 | - Starting with version 10.4, qpdf issues warnings when requested to | ||
| 627 | - create files using RC4 encryption. This option suppresses those | ||
| 628 | - warnings. In future versions of qpdf, qpdf will refuse to create | ||
| 629 | - files with weak cryptography when this flag is not given. See :ref:`ref.weak-crypto` for additional details. | ||
| 630 | - | ||
| 631 | -:samp:`--encrypt options --` | ||
| 632 | - Causes generation of an encrypted output file. Please see :ref:`ref.encryption-options` for details on how to specify | ||
| 633 | - encryption parameters. | ||
| 634 | - | ||
| 635 | -:samp:`--decrypt` | ||
| 636 | - Removes any encryption on the file. A password must be supplied if | ||
| 637 | - the file is password protected. | ||
| 638 | - | ||
| 639 | -:samp:`--password-is-hex-key` | ||
| 640 | - Overrides the usual computation/retrieval of the PDF file's | ||
| 641 | - encryption key from user/owner password with an explicit | ||
| 642 | - specification of the encryption key. When this option is specified, | ||
| 643 | - the argument to the :samp:`--password` option is | ||
| 644 | - interpreted as a hexadecimal-encoded key value. This only applies to | ||
| 645 | - the password used to open the main input file. It does not apply to | ||
| 646 | - other files opened by :samp:`--pages` or other | ||
| 647 | - options or to files being written. | ||
| 648 | - | ||
| 649 | - Most users will never have a need for this option, and no standard | ||
| 650 | - viewers support this mode of operation, but it can be useful for | ||
| 651 | - forensic or investigatory purposes. For example, if a PDF file is | ||
| 652 | - encrypted with an unknown password, a brute-force attack using the | ||
| 653 | - key directly is sometimes more efficient than one using the password. | ||
| 654 | - Also, if a file is heavily damaged, it may be possible to derive the | ||
| 655 | - encryption key and recover parts of the file using it directly. To | ||
| 656 | - expose the encryption key used by an encrypted file that you can open | ||
| 657 | - normally, use the :samp:`--show-encryption-key` | ||
| 658 | - option. | ||
| 659 | - | ||
| 660 | -:samp:`--suppress-password-recovery` | ||
| 661 | - Ordinarily, qpdf attempts to automatically compensate for passwords | ||
| 662 | - specified in the wrong character encoding. This option suppresses | ||
| 663 | - that behavior. Under normal conditions, there are no reasons to use | ||
| 664 | - this option. See :ref:`ref.unicode-passwords` for a | ||
| 665 | - discussion. | ||
| 666 | - | ||
| 667 | -:samp:`--password-mode={mode}` | ||
| 668 | - This option can be used to fine-tune how qpdf interprets Unicode | ||
| 669 | - (non-ASCII) password strings passed on the command line. With the | ||
| 670 | - exception of the :samp:`hex-bytes` mode, these only | ||
| 671 | - apply to passwords provided when encrypting files. The | ||
| 672 | - :samp:`hex-bytes` mode also applies to passwords | ||
| 673 | - specified for reading files. For additional discussion of the | ||
| 674 | - supported password modes and when you might want to use them, see | ||
| 675 | - :ref:`ref.unicode-passwords`. The following modes | ||
| 676 | - are supported: | ||
| 677 | - | ||
| 678 | - - :samp:`auto`: Automatically determine whether the | ||
| 679 | - specified password is a properly encoded Unicode (UTF-8) string, | ||
| 680 | - and transcode it as required by the PDF spec based on the type | ||
| 681 | - of encryption being applied. On Windows starting with version 8.4.0, | ||
| 682 | - and on almost all other modern platforms, incoming passwords will | ||
| 683 | - be properly encoded in UTF-8, so this is almost always what you | ||
| 684 | - want. | ||
| 685 | - | ||
| 686 | - - :samp:`unicode`: Tells qpdf that the incoming | ||
| 687 | - password is UTF-8, overriding whatever its automatic detection | ||
| 688 | - determines. The only difference between this mode and | ||
| 689 | - :samp:`auto` is that qpdf will fail with an error | ||
| 690 | - message if the password is not valid UTF-8 instead of falling back | ||
| 691 | - to :samp:`bytes` mode with a warning. | ||
| 692 | - | ||
| 693 | - - :samp:`bytes`: Interpret the password as a literal | ||
| 694 | - byte string. For non-Windows platforms, this is what versions of | ||
| 695 | - qpdf prior to 8.4.0 did. For Windows platforms, there is no way to | ||
| 696 | - specify strings of binary data on the command line directly, but | ||
| 697 | - you can use the :samp:`@filename` option to do it, | ||
| 698 | - in which case this option forces qpdf to respect the string of | ||
| 699 | - bytes as provided. This option will allow you to encrypt PDF files | ||
| 700 | - with passwords that will not be usable by other readers. | ||
| 701 | - | ||
| 702 | - - :samp:`hex-bytes`: Interpret the password as a | ||
| 703 | - hex-encoded string. This provides a way to pass binary data as a | ||
| 704 | - password on all platforms including Windows. As with | ||
| 705 | - :samp:`bytes`, this option may allow creation of | ||
| 706 | - files that can't be opened by other readers. This mode affects | ||
| 707 | - qpdf's interpretation of passwords specified for decrypting files | ||
| 708 | - as well as for encrypting them. It makes it possible to specify | ||
| 709 | - strings that are encoded in some manner other than the system's | ||
| 710 | - default encoding. | ||
| 711 | - | ||
| 712 | -:samp:`--rotate=[+|-]angle[:page-range]` | ||
| 713 | - Apply rotation to specified pages. The | ||
| 714 | - :samp:`page-range` portion of the option value has | ||
| 715 | - the same format as page ranges in :ref:`ref.page-selection`. If the page range is omitted, the | ||
| 716 | - rotation is applied to all pages. The :samp:`angle` | ||
| 717 | - portion of the parameter may be either 0, 90, 180, or 270. If | ||
| 718 | - preceded by :samp:`+` or :samp:`-`, | ||
| 719 | - the angle is added to or subtracted from the specified pages' | ||
| 720 | - original rotations. This is almost always what you want. Otherwise | ||
| 721 | - the pages' rotations are set to the exact value, which may cause the | ||
| 722 | - appearances of the pages to be inconsistent, especially for scans. | ||
| 723 | - For example, the command :command:`qpdf in.pdf out.pdf | ||
| 724 | - --rotate=+90:2,4,6 --rotate=180:7-8` would rotate pages | ||
| 725 | - 2, 4, and 6 90 degrees clockwise from their original rotation and | ||
| 726 | - force the rotation of pages 7 through 8 to 180 degrees regardless of | ||
| 727 | - their original rotation, and the command :command:`qpdf in.pdf | ||
| 728 | - out.pdf --rotate=+180` would rotate all pages by 180 | ||
| 729 | - degrees. | ||
| 730 | - | ||
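The difference between relative and absolute angles can be modeled as in the sketch below. It is not qpdf's implementation; the function name ``apply_rotate`` is made up, and wrapping the result into the range 0 to 270 with modulo 360 is an assumption about how relative rotations combine.

```python
def apply_rotate(original: int, spec: str) -> int:
    """Return a page's new rotation given the angle part of --rotate.

    `spec` is something like "+90", "-90", or "180" (no page range).
    """
    sign = spec[0] if spec[:1] in ("+", "-") else ""
    angle = int(spec[1:] if sign else spec)
    if angle not in (0, 90, 180, 270):
        raise ValueError("angle must be 0, 90, 180, or 270")
    if sign == "+":
        return (original + angle) % 360  # relative: add to the original
    if sign == "-":
        return (original - angle) % 360  # relative: subtract from it
    return angle  # absolute: ignore the original rotation
```

For a page whose original rotation is 90, ``"+90"`` yields 180, while a bare ``"180"`` forces 180 regardless of the starting value.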
| 731 | -:samp:`--keep-files-open={[yn]}` | ||
| 732 | - This option controls whether qpdf keeps individual files open while | ||
| 733 | - merging. Prior to version 8.1.0, qpdf always kept all files open, but | ||
| 734 | - this meant that the number of files that could be merged was limited | ||
| 735 | - by the operating system's open file limit. Version 8.1.0 opened files | ||
| 736 | - as they were referenced and closed them after each read, but this | ||
| 737 | - caused a major performance impact. Version 8.2.0 optimized the | ||
| 738 | - performance but did so in a way that, for local file systems, there | ||
| 739 | - was a small but unavoidable performance hit, but for networked file | ||
| 740 | - systems, the performance impact could be very high. Starting with | ||
| 741 | - version 8.2.1, the default behavior is that files are kept open if no | ||
| 742 | - more than 200 files are specified, but this default behavior can be | ||
| 743 | - explicitly overridden with the | ||
| 744 | - :samp:`--keep-files-open` flag. If you are merging | ||
| 745 | - more than 200 files but fewer than the operating system's max open | ||
| 746 | - files limit, you may want to use | ||
| 747 | - :samp:`--keep-files-open=y`, especially if working | ||
| 748 | - over a networked file system. If you are using a local file system | ||
| 749 | - where the overhead is low and you might sometimes merge more files | ||
| 750 | - than the OS limit allows from a script and are not worried about a | ||
| 751 | - few seconds additional processing time, you may want to specify | ||
| 752 | - :samp:`--keep-files-open=n`. The threshold for | ||
| 753 | - switching may be changed from the default 200 with the | ||
| 754 | - :samp:`--keep-files-open-threshold` option. | ||
| 755 | - | ||
| 756 | -:samp:`--keep-files-open-threshold={count}` | ||
| 757 | - If specified, overrides the default value of 200 used as the | ||
| 758 | - threshold for qpdf deciding whether or not to keep files open. See | ||
| 759 | - :samp:`--keep-files-open` for details. | ||
| 760 | - | ||
| 761 | -:samp:`--pages options --` | ||
| 762 | - Select specific pages from one or more input files. See :ref:`ref.page-selection` for details on how to do | ||
| 763 | - page selection (splitting and merging). | ||
| 764 | - | ||
| 765 | -:samp:`--collate={n}` | ||
| 766 | - When specified, collate rather than concatenate pages from files | ||
| 767 | - specified with :samp:`--pages`. With a numeric | ||
| 768 | - argument, collate in groups of :samp:`{n}`. | ||
| 769 | - The default is 1. See :ref:`ref.page-selection` for additional details. | ||
| 770 | - | ||
| 771 | -:samp:`--flatten-rotation` | ||
| 772 | - For each page that is rotated using the ``/Rotate`` key in the page's | ||
| 773 | - dictionary, remove the ``/Rotate`` key and implement the identical | ||
| 774 | - rotation semantics by modifying the page's contents. This option can | ||
| 775 | - be useful to prepare files for buggy PDF applications that don't | ||
| 776 | - properly handle rotated pages. | ||
| 777 | - | ||
| 778 | -:samp:`--split-pages=[n]` | ||
| 779 | - Write each group of :samp:`n` pages to a separate | ||
| 780 | - output file. If :samp:`n` is not specified, create | ||
| 781 | - single pages. Output file names are generated as follows: | ||
| 782 | - | ||
| 783 | - - If the string ``%d`` appears in the output file name, it is | ||
| 784 | - replaced with a range of zero-padded page numbers starting from 1. | ||
| 785 | - | ||
| 786 | - - Otherwise, if the output file name ends in | ||
| 787 | - :file:`.pdf` (case insensitive), a zero-padded | ||
| 788 | - page range, preceded by a dash, is inserted before the file | ||
| 789 | - extension. | ||
| 790 | - | ||
| 791 | - - Otherwise, the file name is appended with a zero-padded page range | ||
| 792 | - preceded by a dash. | ||
| 793 | - | ||
| 794 | - Page ranges are a single number in the case of single-page groups or | ||
| 795 | - two numbers separated by a dash otherwise. For example, if | ||
| 796 | - :file:`infile.pdf` has 12 pages | ||
| 797 | - | ||
| 798 | - - :command:`qpdf --split-pages infile.pdf %d-out` | ||
| 799 | - would generate files :file:`01-out` through | ||
| 800 | - :file:`12-out` | ||
| 801 | - | ||
| 802 | - - :command:`qpdf --split-pages=2 infile.pdf | ||
| 803 | - outfile.pdf` would generate files | ||
| 804 | - :file:`outfile-01-02.pdf` through | ||
| 805 | - :file:`outfile-11-12.pdf` | ||
| 806 | - | ||
| 807 | - - :command:`qpdf --split-pages infile.pdf | ||
| 808 | - something.else` would generate files | ||
| 809 | - :file:`something.else-01` through | ||
| 810 | - :file:`something.else-12` | ||
| 811 | - | ||
| 812 | - Note that outlines, threads, and other global features of the | ||
| 813 | - original PDF file are not preserved. For each page of output, this | ||
| 814 | - option creates an empty PDF and copies a single page from the input | ||
| 815 | - into it. If you require the global data, you will have to run | ||
| 816 | - :command:`qpdf` with the | ||
| 817 | - :samp:`--pages` option once for each file. Using | ||
| 818 | - :samp:`--split-pages` is much faster if you don't | ||
| 819 | - require the global data. | ||
| 820 | - | ||
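The output naming rules for :samp:`--split-pages` described above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not qpdf's own code. The function name ``split_output_names`` is hypothetical, and two details are assumptions inferred from the examples rather than stated rules: the padding width is taken from the number of digits in the total page count, and a short final group containing one page is written as a single number.

```python
def split_output_names(pattern, total_pages, group_size=1):
    """Return the output file names for splitting total_pages into groups."""
    # Assumption: pad to the number of digits in the total page count.
    width = len(str(total_pages))
    names = []
    for first in range(1, total_pages + 1, group_size):
        last = min(first + group_size - 1, total_pages)
        if first == last:
            # Single-page group: one zero-padded number.
            page_range = f"{first:0{width}d}"
        else:
            # Multi-page group: two zero-padded numbers joined by a dash.
            page_range = f"{first:0{width}d}-{last:0{width}d}"
        if "%d" in pattern:
            # Rule 1: %d is replaced by the page range.
            name = pattern.replace("%d", page_range)
        elif pattern.lower().endswith(".pdf"):
            # Rule 2: insert "-range" before a .pdf extension (any case).
            name = f"{pattern[:-4]}-{page_range}{pattern[-4:]}"
        else:
            # Rule 3: append "-range" to the name.
            name = f"{pattern}-{page_range}"
        names.append(name)
    return names
```

Calling ``split_output_names("outfile.pdf", 12, 2)`` yields six names, from ``outfile-01-02.pdf`` through ``outfile-11-12.pdf``, which matches the second example above.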
| 821 | -:samp:`--overlay options --` | ||
| 822 | - Overlay pages from another file onto the output pages. See :ref:`ref.overlay-underlay` for details on | ||
| 823 | - overlay/underlay. | ||
| 824 | - | ||
| 825 | -:samp:`--underlay options --` | ||
| 826 | - Underlay pages from another file beneath the output pages. See :ref:`ref.overlay-underlay` for details on | ||
| 827 | - overlay/underlay. | ||
| 828 | - | ||
| 829 | -Password-protected files may be opened by specifying a password. By | ||
| 830 | -default, qpdf will preserve any encryption data associated with a file. | ||
| 831 | -If :samp:`--decrypt` is specified, qpdf will attempt to | ||
| 832 | -remove any encryption information. If :samp:`--encrypt` | ||
| 833 | -is specified, qpdf will replace the document's encryption parameters | ||
| 834 | -with whatever is specified. | ||
| 835 | - | ||
| 836 | -Note that qpdf does not obey encryption restrictions already imposed on | ||
| 837 | -the file. Doing so would be meaningless since qpdf can be used to remove | ||
| 838 | -encryption from the file entirely. This functionality is not intended to | ||
| 839 | -be used for bypassing copyright restrictions or other restrictions | ||
| 840 | -placed on files by their producers. | ||
| 841 | - | ||
| 842 | -Prior to 8.4.0, in the case of passwords that contain characters that | ||
| 843 | -fall outside of 7-bit US-ASCII, qpdf left the burden of supplying | ||
| 844 | -properly encoded encryption and decryption passwords to the user. | ||
| 845 | -Starting in qpdf 8.4.0, qpdf does this automatically in most cases. For | ||
| 846 | -an in-depth discussion, please see :ref:`ref.unicode-passwords`. Previous versions of this manual | ||
| 847 | -described workarounds using the :command:`iconv` command. | ||
| 848 | -Such workarounds are no longer required or recommended with qpdf 8.4.0. | ||
| 849 | -However, for backward compatibility, qpdf attempts to detect those | ||
| 850 | -workarounds and do the right thing in most cases. | ||
| 851 | - | ||
| 852 | -.. _ref.encryption-options: | ||
| 853 | - | ||
| 854 | -Encryption Options | ||
| 855 | ------------------- | ||
| 856 | - | ||
| 857 | -To change the encryption parameters of a file, use the :samp:`--encrypt` flag. | ||
| 858 | -The syntax is | ||
| 859 | - | ||
| 860 | -:: | ||
| 861 | - | ||
| 862 | - --encrypt user-password owner-password key-length [ restrictions ] -- | ||
| 863 | - | ||
| 864 | -Note that ":samp:`--`" terminates parsing of encryption | ||
| 865 | -flags and must be present even if no restrictions are present. | ||
| 866 | - | ||
| 867 | -Either or both of the user password and the owner password may be empty | ||
| 868 | -strings. Starting in qpdf 10.2, qpdf defaults to not allowing creation | ||
| 869 | -of PDF files with a non-empty user password, an empty owner password, | ||
| 870 | -and a 256-bit key since such files can be opened with no password. If | ||
| 871 | -you want to create such files, specify the encryption option | ||
| 872 | -:samp:`--allow-insecure`, as described below. | ||
| 873 | - | ||
| 874 | -The value for | ||
| 875 | -:samp:`{key-length}` may | ||
| 876 | -be 40, 128, or 256. The restriction flags are dependent upon key length. | ||
| 877 | -When no additional restrictions are given, the default is to be fully | ||
| 878 | -permissive. | ||
| 879 | - | ||
| 880 | -If :samp:`{key-length}` | ||
| 881 | -is 40, the following restriction options are available: | ||
| 882 | - | ||
| 883 | -:samp:`--print=[yn]` | ||
| 884 | - Determines whether or not to allow printing. | ||
| 885 | - | ||
| 886 | -:samp:`--modify=[yn]` | ||
| 887 | - Determines whether or not to allow document modification. | ||
| 888 | - | ||
| 889 | -:samp:`--extract=[yn]` | ||
| 890 | - Determines whether or not to allow text/image extraction. | ||
| 891 | - | ||
| 892 | -:samp:`--annotate=[yn]` | ||
| 893 | - Determines whether or not to allow comments and form fill-in and | ||
| 894 | - signing. | ||
| 895 | - | ||
| 896 | -If :samp:`{key-length}` | ||
| 897 | -is 128, the following restriction options are available: | ||
| 898 | - | ||
| 899 | -:samp:`--accessibility=[yn]` | ||
| 900 | - Determines whether or not to allow accessibility to the visually | ||
| 901 | - impaired. The qpdf library disregards this field when AES is used or | ||
| 902 | - when 256-bit encryption is used. You should really never disable | ||
| 903 | - accessibility, but qpdf lets you do it in case you need to configure | ||
| 904 | - a file this way for testing purposes. The PDF spec says that | ||
| 905 | - conforming readers should disregard this permission and always allow | ||
| 906 | - accessibility. | ||
| 907 | - | ||
| 908 | -:samp:`--extract=[yn]` | ||
| 909 | - Determines whether or not to allow text/graphic extraction. | ||
| 910 | - | ||
| 911 | -:samp:`--assemble=[yn]` | ||
| 912 | - Determines whether document assembly (rotation and reordering of | ||
| 913 | - pages) is allowed. | ||
| 914 | - | ||
| 915 | -:samp:`--annotate=[yn]` | ||
| 916 | - Determines whether modifying annotations is allowed. This includes | ||
| 917 | - adding comments and filling in form fields. Also allows editing of | ||
| 918 | - form fields if :samp:`--modify-other=y` is given. | ||
| 919 | - | ||
| 920 | -:samp:`--form=[yn]` | ||
| 921 | - Determines whether filling form fields is allowed. | ||
| 922 | - | ||
| 923 | -:samp:`--modify-other=[yn]` | ||
| 924 | - Allow all document editing except those controlled separately by the | ||
| 925 | - :samp:`--assemble`, | ||
| 926 | - :samp:`--annotate`, and | ||
| 927 | - :samp:`--form` options. | ||
| 928 | - | ||
| 929 | -:samp:`--print={print-opt}` | ||
| 930 | - Controls printing access. | ||
| 931 | - :samp:`{print-opt}` | ||
| 932 | - may be one of the following: | ||
| 933 | - | ||
| 934 | - - :samp:`full`: allow full printing | ||
| 935 | - | ||
| 936 | - - :samp:`low`: allow low-resolution printing only | ||
| 937 | - | ||
| 938 | - - :samp:`none`: disallow printing | ||
| 939 | - | ||
| 940 | -:samp:`--modify={modify-opt}` | ||
| 941 | - Controls modify access. This way of controlling modify access has | ||
| 942 | - less granularity than new options added in qpdf 8.4. | ||
| 943 | - :samp:`{modify-opt}` | ||
| 944 | - may be one of the following: | ||
| 945 | - | ||
| 946 | - - :samp:`all`: allow full document modification | ||
| 947 | - | ||
| 948 | - - :samp:`annotate`: allow comment authoring, form | ||
| 949 | - operations, and document assembly | ||
| 950 | - | ||
| 951 | - - :samp:`form`: allow form field fill-in and signing | ||
| 952 | - and document assembly | ||
| 953 | - | ||
| 954 | - - :samp:`assembly`: allow document assembly only | ||
| 955 | - | ||
| 956 | - - :samp:`none`: allow no modifications | ||
| 957 | - | ||
| 958 | - Using the :samp:`--modify` option does not allow you | ||
| 959 | - to create certain combinations of permissions such as allowing form | ||
| 960 | - filling but not allowing document assembly. Starting with qpdf 8.4, | ||
| 961 | - you can either just use the other options to control fields | ||
| 962 | - individually, or you can use something like :samp:`--modify=form | ||
| 963 | - --assemble=n` to fine-tune. | ||
| 964 | - | ||
| 965 | -:samp:`--cleartext-metadata` | ||
| 966 | - If specified, any metadata stream in the document will be left | ||
| 967 | - unencrypted even if the rest of the document is encrypted. This also | ||
| 968 | - forces the PDF version to be at least 1.5. | ||
| 969 | - | ||
| 970 | -:samp:`--use-aes=[yn]` | ||
| 971 | - If :samp:`--use-aes=y` is specified, AES encryption | ||
| 972 | - will be used instead of RC4 encryption. This forces the PDF version | ||
| 973 | - to be at least 1.6. | ||
| 974 | - | ||
| 975 | -:samp:`--allow-insecure` | ||
| 976 | - From qpdf 10.2, qpdf defaults to not allowing creation of PDF files | ||
| 977 | - where the user password is non-empty, the owner password is empty, | ||
| 978 | - and a 256-bit key is in use. Files created in this way are insecure | ||
| 979 | - since they can be opened without a password. Users would ordinarily | ||
| 980 | - never want to create such files. If you are using qpdf to | ||
| 981 | - intentionally create strange files for testing (a definitely valid use | ||
| 982 | - of qpdf!), this option allows you to create such insecure files. | ||
| 983 | - | ||
| 984 | -:samp:`--force-V4` | ||
| 985 | - Use of this option forces the ``/V`` and ``/R`` parameters in the | ||
| 986 | - document's encryption dictionary to be set to the value ``4``. As | ||
| 987 | - qpdf will automatically do this when required, there is no reason to | ||
| 988 | - ever use this option. It exists primarily for use in testing qpdf | ||
| 989 | - itself. This option also forces the PDF version to be at least 1.5. | ||
| 990 | - | ||
| 991 | -If :samp:`{key-length}` | ||
| 992 | -is 256, the minimum PDF version is 1.7 with extension level 8, and the | ||
| 993 | -AES-based encryption format used is the PDF 2.0 encryption method | ||
| 994 | -supported by Acrobat X. The same options are available as with 128 bits | ||
| 995 | -with the following exceptions: | ||
| 996 | - | ||
| 997 | -:samp:`--use-aes` | ||
| 998 | - This option is not available with 256-bit keys. AES is always used | ||
| 999 | - with 256-bit encryption keys. | ||
| 1000 | - | ||
| 1001 | -:samp:`--force-V4` | ||
| 1002 | - This option is not available with 256-bit keys. | ||
| 1003 | - | ||
| 1004 | -:samp:`--force-R5` | ||
| 1005 | - If specified, qpdf sets the minimum version to 1.7 at extension level | ||
| 1006 | - 3 and writes the deprecated encryption format used by Acrobat version | ||
| 1007 | - IX. This option should not be used in practice to generate PDF files | ||
| 1008 | - that will be in general use, but it can be useful to generate files | ||
| 1009 | - if you are trying to test proper support in another application for | ||
| 1010 | - PDF files encrypted in this way. | ||
| 1011 | - | ||
| 1012 | -The default for each permission option is to be fully permissive. | ||
| 1013 | - | ||
| 1014 | -.. _ref.page-selection: | ||
| 1015 | - | ||
| 1016 | -Page Selection Options | ||
| 1017 | ----------------------- | ||
| 1018 | - | ||
| 1019 | -Starting with qpdf 3.0, it is possible to split and merge PDF files by | ||
| 1020 | -selecting pages from one or more input files. Whatever file is given as | ||
| 1021 | -the primary input file is used as the starting point, but its pages are | ||
| 1022 | -replaced with pages as specified. | ||
| 1023 | - | ||
| 1024 | -:: | ||
| 1025 | - | ||
| 1026 | - --pages input-file [ --password=password ] [ page-range ] [ ... ] -- | ||
| 1027 | - | ||
| 1028 | -Multiple input files may be specified. Each one is given as the name of | ||
| 1029 | -the input file, an optional password (if required to open the file), and | ||
| 1030 | -the range of pages. Note that ":samp:`--`" terminates | ||
| 1031 | -parsing of page selection flags. | ||
| 1032 | - | ||
| 1033 | -Starting with qpdf 8.4, the special input file name | ||
| 1034 | -":file:`.`" can be used as a shortcut for the | ||
| 1035 | -primary input filename. | ||
| 1036 | - | ||
| 1037 | -For each file that pages should be taken from, specify the file, a | ||
| 1038 | -password needed to open the file (if any), and a page range. The | ||
| 1039 | -password needs to be given only once per file. If any of the input files | ||
| 1040 | -are the same as the primary input file or the file used to copy | ||
| 1041 | -encryption parameters (if specified), you do not need to repeat the | ||
| 1042 | -password here. The same file can be repeated multiple times. If a file | ||
| 1043 | -that is repeated has a password, the password only has to be given the | ||
| 1044 | -first time. All non-page data (info, outlines, page numbers, etc.) are | ||
| 1045 | -taken from the primary input file. To discard these, use | ||
| 1046 | -:samp:`--empty` as the primary input. | ||
| 1047 | - | ||
| 1048 | -Starting with qpdf 5.0.0, it is possible to omit the page range. If qpdf | ||
| 1049 | -sees a value in the place where it expects a page range and that value | ||
| 1050 | -is not a valid range but is a valid file name, qpdf will implicitly use | ||
| 1051 | -the range ``1-z``, meaning that it will include all pages in the file. | ||
| 1052 | -This makes it possible to easily combine all pages in a set of files | ||
| 1053 | -with a command like :command:`qpdf --empty out.pdf --pages \*.pdf | ||
| 1054 | ---`. | ||
| 1055 | - | ||
| 1056 | -The page range is a set of numbers separated by commas, ranges of | ||
| 1057 | -numbers separated by dashes, or combinations of those. The character "z" | ||
| 1058 | -represents the last page. A number preceded by an "r" indicates to count | ||
| 1059 | -from the end, so ``r3-r1`` would be the last three pages of the | ||
| 1060 | -document. Pages can appear in any order. Ranges can appear with a high | ||
| 1061 | -number followed by a low number, which causes the pages to appear in | ||
| 1062 | -reverse. Numbers may be repeated in a page range. A page range may be | ||
| 1063 | -optionally appended with ``:even`` or ``:odd`` to indicate only the even | ||
| 1064 | -or odd pages in the given range. Note that even and odd refer to the | ||
| 1065 | -positions within the specified range, not whether the original number | ||
| 1066 | -is even or odd. | ||
| 1067 | - | ||
| 1068 | -Example page ranges: | ||
| 1069 | - | ||
| 1070 | -- ``1,3,5-9,15-12``: pages 1, 3, 5, 6, 7, 8, 9, 15, 14, 13, and 12 in | ||
| 1071 | - that order. | ||
| 1072 | - | ||
| 1073 | -- ``z-1``: all pages in the document in reverse | ||
| 1074 | - | ||
| 1075 | -- ``r3-r1``: the last three pages of the document | ||
| 1076 | - | ||
| 1077 | -- ``r1-r3``: the last three pages of the document in reverse order | ||
| 1078 | - | ||
| 1079 | -- ``1-20:even``: even pages from 2 to 20 | ||
| 1080 | - | ||
| 1081 | -- ``5,7-9,12:odd``: pages 5, 8, and 12, which are the pages in odd | ||
| 1082 | - positions from among the original range, which represents pages 5, 7, | ||
| 1083 | - 8, 9, and 12. | ||
| 1084 | - | ||
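The page range syntax described above can be illustrated with a small Python sketch. It follows the rules as stated in this section and is not taken from qpdf's source. The name ``parse_page_range`` and the ``last_page`` parameter are hypothetical, and rejecting out-of-range page numbers is an assumption.

```python
def parse_page_range(spec, last_page):
    """Expand a page range such as "1,3,5-9,15-12" into a list of page numbers."""
    body, _, selector = spec.partition(":")
    if selector not in ("", "even", "odd"):
        raise ValueError(f"unknown selector: {selector!r}")

    def page_number(token):
        if token == "z":
            number = last_page  # "z" is the last page
        elif token.startswith("r"):
            number = last_page + 1 - int(token[1:])  # "rN" counts from the end
        else:
            number = int(token)
        if not 1 <= number <= last_page:
            raise ValueError(f"page out of range: {token!r}")
        return number

    pages = []
    for item in body.split(","):
        start, dash, end = item.partition("-")
        first = page_number(start)
        last = page_number(end) if dash else first
        # A high number followed by a low number yields pages in reverse.
        step = 1 if first <= last else -1
        pages.extend(range(first, last + step, step))

    # :odd and :even select by position within the expanded range,
    # not by whether the page number itself is odd or even.
    if selector == "odd":
        pages = pages[0::2]
    elif selector == "even":
        pages = pages[1::2]
    return pages
```

For a 20-page document, ``parse_page_range("5,7-9,12:odd", 20)`` returns ``[5, 8, 12]``, in agreement with the last example in the list above.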
| 1085 | -Starting in qpdf version 8.3, you can specify the | ||
| 1086 | -:samp:`--collate` option. Note that this option is | ||
| 1087 | -specified outside of :samp:`--pages ... --`. When | ||
| 1088 | -:samp:`--collate` is specified, it changes the meaning | ||
| 1089 | -of :samp:`--pages` so that the specified files, as | ||
| 1090 | -modified by page ranges, are collated rather than concatenated. For | ||
| 1091 | -example, if you add the files :file:`odd.pdf` and | ||
| 1092 | -:file:`even.pdf` containing odd and even pages of a | ||
| 1093 | -document respectively, you could run :command:`qpdf --collate odd.pdf | ||
| 1094 | ---pages odd.pdf even.pdf -- all.pdf` to collate the pages. | ||
| 1095 | -This would pick page 1 from odd, page 1 from even, page 2 from odd, page | ||
| 1096 | -2 from even, etc. until all pages have been included. Any number of | ||
| 1097 | -files and page ranges can be specified. If any file has fewer pages, | ||
| 1098 | -that file is just skipped when its pages have all been included. For | ||
| 1099 | -example, if you ran :command:`qpdf --collate --empty --pages a.pdf | ||
| 1100 | -1-5 b.pdf 6-4 c.pdf r1 -- out.pdf`, you would get the | ||
| 1101 | -following pages in this order: | ||
| 1102 | - | ||
| 1103 | -- a.pdf page 1 | ||
| 1104 | - | ||
| 1105 | -- b.pdf page 6 | ||
| 1106 | - | ||
| 1107 | -- c.pdf last page | ||
| 1108 | - | ||
| 1109 | -- a.pdf page 2 | ||
| 1110 | - | ||
| 1111 | -- b.pdf page 5 | ||
| 1112 | - | ||
| 1113 | -- a.pdf page 3 | ||
| 1114 | - | ||
| 1115 | -- b.pdf page 4 | ||
| 1116 | - | ||
| 1117 | -- a.pdf page 4 | ||
| 1118 | - | ||
| 1119 | -- a.pdf page 5 | ||
| 1120 | - | ||
| 1121 | -Starting in qpdf version 10.2, you may specify a numeric argument to | ||
| 1122 | -:samp:`--collate`. With | ||
| 1123 | -:samp:`--collate={n}`, | ||
| 1124 | -pull groups of :samp:`{n}` pages from each file, | ||
| 1125 | -again, stopping when there are no more pages. For example, if you ran | ||
| 1126 | -:command:`qpdf --collate=2 --empty --pages a.pdf 1-5 b.pdf 6-4 c.pdf | ||
| 1127 | -r1 -- out.pdf`, you would get the following pages in this | ||
| 1128 | -order: | ||
| 1129 | - | ||
| 1130 | -- a.pdf page 1 | ||
| 1131 | - | ||
| 1132 | -- a.pdf page 2 | ||
| 1133 | - | ||
| 1134 | -- b.pdf page 6 | ||
| 1135 | - | ||
| 1136 | -- b.pdf page 5 | ||
| 1137 | - | ||
| 1138 | -- c.pdf last page | ||
| 1139 | - | ||
| 1140 | -- a.pdf page 3 | ||
| 1141 | - | ||
| 1142 | -- a.pdf page 4 | ||
| 1143 | - | ||
| 1144 | -- b.pdf page 4 | ||
| 1145 | - | ||
| 1146 | -- a.pdf page 5 | ||
| 1147 | - | ||
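The two collation orders listed above can be reproduced with a short Python sketch. It models each input as an already expanded list of pages and is an illustration of the described behavior, not qpdf's implementation. The name ``collate`` and the string labels used for pages are hypothetical.

```python
def collate(page_lists, group_size=1):
    """Interleave pages from several inputs, group_size pages at a time."""
    result = []
    offset = 0
    # Keep taking rounds until every input has been used up.
    while any(offset < len(pages) for pages in page_lists):
        for pages in page_lists:
            # An exhausted input contributes an empty slice and is skipped.
            result.extend(pages[offset:offset + group_size])
        offset += group_size
    return result


a = ["a1", "a2", "a3", "a4", "a5"]  # a.pdf 1-5
b = ["b6", "b5", "b4"]              # b.pdf 6-4
c = ["c-last"]                      # c.pdf r1

print(collate([a, b, c]))
print(collate([a, b, c], group_size=2))
```

The first call corresponds to :samp:`--collate` and the second to :samp:`--collate=2` with the same :samp:`--pages` arguments as in the examples above.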
| 1148 | -Starting in qpdf version 8.3, when you split and merge files, any page | ||
| 1149 | -labels (page numbers) are preserved in the final file. It is expected | ||
| 1150 | -that more document features will be preserved by splitting and merging. | ||
| 1151 | -In the meantime, semantics of splitting and merging vary across | ||
| 1152 | -features. For example, the document's outlines (bookmarks) point to | ||
| 1153 | -actual page objects, so if you select some pages and not others, | ||
| 1154 | -bookmarks that point to pages that are in the output file will work, and | ||
| 1155 | -remaining bookmarks will not work. A future version of | ||
| 1156 | -:command:`qpdf` may do a better job at handling these | ||
| 1157 | -issues. (Note that the qpdf library already contains all of the APIs | ||
| 1158 | -required in order to implement this in your own application if you need | ||
| 1159 | -it.) In the meantime, you can always use | ||
| 1160 | -:samp:`--empty` as the primary input file to avoid | ||
| 1161 | -copying all of that from the first file. For example, to take pages 1 | ||
| 1162 | -through 5 from :file:`infile.pdf` while preserving | ||
| 1163 | -all metadata associated with that file, you could use | ||
| 1164 | - | ||
| 1165 | -:: | ||
| 1166 | - | ||
| 1167 | - qpdf infile.pdf --pages . 1-5 -- outfile.pdf | ||
| 1168 | - | ||
| 1169 | -If you wanted pages 1 through 5 from | ||
| 1170 | -:file:`infile.pdf` but you wanted the rest of the | ||
| 1171 | -metadata to be dropped, you could instead run | ||
| 1172 | - | ||
| 1173 | -:: | ||
| 1174 | - | ||
| 1175 | - qpdf --empty --pages infile.pdf 1-5 -- outfile.pdf | ||
| 1176 | - | ||
| 1177 | -If you wanted to take pages 1 through 5 from | ||
| 1178 | -:file:`file1.pdf` and pages 11 through 15 from | ||
| 1179 | -:file:`file2.pdf` in reverse, taking document-level | ||
| 1180 | -metadata from :file:`file2.pdf`, you would run | ||
| 1181 | - | ||
| 1182 | -:: | ||
| 1183 | - | ||
| 1184 | - qpdf file2.pdf --pages file1.pdf 1-5 . 15-11 -- outfile.pdf | ||
| 1185 | - | ||
| 1186 | -If, for some reason, you wanted to take the first page of an encrypted | ||
| 1187 | -file called :file:`encrypted.pdf` with password | ||
| 1188 | -``pass`` and repeat it twice in an output file, and if you wanted to | ||
| 1189 | -drop document-level metadata but preserve encryption, you would use | ||
| 1190 | - | ||
| 1191 | -:: | ||
| 1192 | - | ||
| 1193 | - qpdf --empty --copy-encryption=encrypted.pdf --encryption-file-password=pass | ||
| 1194 | - --pages encrypted.pdf --password=pass 1 ./encrypted.pdf --password=pass 1 -- | ||
| 1195 | - outfile.pdf | ||
| 1196 | - | ||
| 1197 | -Note that we had to specify the password all three times because giving | ||
| 1198 | -a password as :samp:`--encryption-file-password` doesn't | ||
| 1199 | -count for page selection, and as far as qpdf is concerned, | ||
| 1200 | -:file:`encrypted.pdf` and | ||
| 1201 | -:file:`./encrypted.pdf` are separate files. These | ||
| 1202 | -are all corner cases that most users should hopefully never have to be | ||
| 1203 | -bothered with. | ||
| 1204 | - | ||
| 1205 | -Prior to version 8.4, it was not possible to specify the same page from | ||
| 1206 | -the same file directly more than once, and the workaround of specifying | ||
| 1207 | -the same file in more than one way was required. Version 8.4 removes | ||
| 1208 | -this limitation, but there is still a valid use case. When you specify | ||
| 1209 | -the same page from the same file more than once, qpdf will share objects | ||
| 1210 | -between the pages. If you are going to do further manipulation on the | ||
| 1211 | -file and need the two instances of the same original page to be deep | ||
| 1212 | -copies, then you can specify the file in two different ways. For example | ||
| 1213 | -:command:`qpdf in.pdf --pages . 1 ./in.pdf 1 -- out.pdf` | ||
| 1214 | -would create a file with two copies of the first page of the input, and | ||
| 1215 | -the two copies would not share any objects in common. This includes fonts, | ||
| 1216 | -images, and anything else the page references. | ||
| 1217 | - | ||
| 1218 | -.. _ref.overlay-underlay: | ||
| 1219 | - | ||
| 1220 | -Overlay and Underlay Options | ||
| 1221 | ----------------------------- | ||
| 1222 | - | ||
| 1223 | -Starting with qpdf 8.4, it is possible to overlay or underlay pages from | ||
| 1224 | -other files onto the output generated by qpdf. Specify overlay or | ||
| 1225 | -underlay as follows: | ||
| 1226 | - | ||
| 1227 | -:: | ||
| 1228 | - | ||
| 1229 | - { --overlay | --underlay } file [ options ] -- | ||
| 1230 | - | ||
| 1231 | -Overlay and underlay options are processed late, so they can be combined | ||
| 1232 | -with other operations such as merging and will apply to the final output. The | ||
| 1233 | -:samp:`--overlay` and :samp:`--underlay` | ||
| 1234 | -options work the same way, except underlay pages are drawn underneath | ||
| 1235 | -the page to which they are applied, possibly obscured by the original | ||
| 1236 | -page, and overlay files are drawn on top of the page to which they are | ||
| 1237 | -applied, possibly obscuring the page. You can combine overlay and | ||
| 1238 | -underlay. | ||
| 1239 | - | ||
| 1240 | -The default behavior of overlay and underlay is that pages are taken | ||
| 1241 | -from the overlay/underlay file in sequence and applied to corresponding | ||
| 1242 | -pages in the output until there are no more output pages. If the overlay | ||
| 1243 | -or underlay file runs out of pages, remaining output pages are left | ||
| 1244 | -alone. This behavior can be modified by options, which are provided | ||
| 1245 | -between the :samp:`--overlay` or | ||
| 1246 | -:samp:`--underlay` flag and the | ||
| 1247 | -:samp:`--` option. The following options are supported: | ||
| 1248 | - | ||
| 1249 | -- :samp:`--password=password`: supply a password if the | ||
| 1250 | - overlay/underlay file is encrypted. | ||
| 1251 | - | ||
| 1252 | -- :samp:`--to=page-range`: a range of pages in the same | ||
| 1253 | - format described in :ref:`ref.page-selection` | ||
| 1254 | - indicates which pages in the output should have the overlay/underlay | ||
| 1255 | - applied. If not specified, overlay/underlay are applied to all pages. | ||
| 1256 | - | ||
| 1257 | -- :samp:`--from=[page-range]`: a range of pages that | ||
| 1258 | - specifies which pages in the overlay/underlay file will be used for | ||
| 1259 | - overlay or underlay. If not specified, all pages will be used. This | ||
| 1260 | - can be explicitly specified to be empty if | ||
| 1261 | - :samp:`--repeat` is used. | ||
| 1262 | - | ||
| 1263 | -- :samp:`--repeat=page-range`: an optional range of | ||
| 1264 | - pages that specifies which pages in the overlay/underlay file will be | ||
| 1265 | - repeated after the "from" pages are used up. If you want to repeat a | ||
| 1266 | - range of pages starting at the beginning, you can explicitly use | ||
| 1267 | - :samp:`--from=`. | ||
| 1268 | - | ||
| 1269 | -Here are some examples. | ||
| 1270 | - | ||
| 1271 | -- :command:`--overlay o.pdf --to=1-5 --from=1-3 --repeat=4 | ||
| 1272 | - --`: overlay the first three pages from file | ||
| 1273 | - :file:`o.pdf` onto the first three pages of the | ||
| 1274 | - output, then overlay page 4 from :file:`o.pdf` | ||
| 1275 | - onto pages 4 and 5 of the output. Leave remaining output pages | ||
| 1276 | - untouched. | ||
| 1277 | - | ||
| 1278 | -- :command:`--underlay footer.pdf --from= --repeat=1,2 | ||
| 1279 | - --`: Underlay page 1 of | ||
| 1280 | - :file:`footer.pdf` on all odd output pages, and | ||
| 1281 | - underlay page 2 of :file:`footer.pdf` on all even | ||
| 1282 | - output pages. | ||
| 1283 | - | ||
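The way :samp:`--to`, :samp:`--from`, and :samp:`--repeat` combine can be sketched in Python as a mapping from output pages to overlay/underlay pages. This follows the description and examples above and is not qpdf's own code. The name ``overlay_plan`` is hypothetical, and all three arguments are assumed to be page lists that have already been expanded from their page ranges.

```python
def overlay_plan(to_pages, from_pages, repeat_pages=()):
    """Map each output page to the overlay/underlay page applied to it."""
    plan = {}
    for index, target in enumerate(to_pages):
        if index < len(from_pages):
            # Use the "from" pages in sequence first.
            plan[target] = from_pages[index]
        elif repeat_pages:
            # Then cycle through the "repeat" pages.
            cycle_index = (index - len(from_pages)) % len(repeat_pages)
            plan[target] = repeat_pages[cycle_index]
        # Otherwise the output page is left alone.
    return plan


# --to=1-5 --from=1-3 --repeat=4
print(overlay_plan([1, 2, 3, 4, 5], [1, 2, 3], [4]))
# --from= --repeat=1,2 applied to a six-page output
print(overlay_plan([1, 2, 3, 4, 5, 6], [], [1, 2]))
```

Output pages that do not appear in the returned mapping are the ones left untouched.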
| 1284 | -.. _ref.attachments: | ||
| 1285 | - | ||
| 1286 | -Embedded Files/Attachments Options | ||
| 1287 | ----------------------------------- | ||
| 1288 | - | ||
| 1289 | -Starting with qpdf 10.2, you can work with file attachments in PDF files | ||
| 1290 | -from the command line. The following options are available: | ||
| 1291 | - | ||
| 1292 | -:samp:`--list-attachments` | ||
| 1293 | - Show the "key" and stream number for embedded files. With | ||
| 1294 | - :samp:`--verbose`, additional information, including | ||
| 1295 | - preferred file name, description, dates, and more, is also displayed. | ||
| 1296 | - The key is usually but not always equal to the file name, and is | ||
| 1297 | - needed by some of the other options. | ||
| 1298 | - | ||
| 1299 | -:samp:`--show-attachment={key}` | ||
| 1300 | - Write the contents of the specified attachment to standard output as | ||
| 1301 | - binary data. The key should match one of the keys shown by | ||
| 1302 | - :samp:`--list-attachments`. If specified multiple | ||
| 1303 | - times, only the last attachment will be shown. | ||
| 1304 | - | ||
| 1305 | -:samp:`--add-attachment {file} {options} --` | ||
| 1306 | - Add or replace an attachment with the contents of | ||
| 1307 | - :samp:`{file}`. This may be specified more | ||
| 1308 | - than once. The following additional options may appear before the | ||
| 1309 | - ``--`` that ends this option: | ||
| 1310 | - | ||
| 1311 | - :samp:`--key={key}` | ||
| 1312 | - The key to use to register the attachment in the embedded files | ||
| 1313 | - table. Defaults to the last path element of | ||
| 1314 | - :samp:`{file}`. | ||
| 1315 | - | ||
| 1316 | - :samp:`--filename={name}` | ||
| 1317 | - The file name to be used for the attachment. This is what is | ||
| 1318 | - usually displayed to the user and is the name most graphical PDF | ||
| 1319 | - viewers will use when saving a file. It defaults to the last path | ||
| 1320 | - element of :samp:`{file}`. | ||
| 1321 | - | ||
| 1322 | - :samp:`--creationdate={date}` | ||
| 1323 | - The attachment's creation date in PDF format; defaults to the | ||
| 1324 | - current time. The date format is explained below. | ||
| 1325 | - | ||
| 1326 | - :samp:`--moddate={date}` | ||
| 1327 | - The attachment's modification date in PDF format; defaults to the | ||
| 1328 | - current time. The date format is explained below. | ||
| 1329 | - | ||
| 1330 | - :samp:`--mimetype={type/subtype}` | ||
| 1331 | - The mime type for the attachment, e.g. ``text/plain`` or | ||
| 1332 | - ``application/pdf``. Note that the mimetype appears in a field | ||
| 1333 | - called ``/Subtype`` in the PDF but actually includes the full type | ||
| 1334 | - and subtype of the mime type. | ||
| 1335 | - | ||
| 1336 | - :samp:`--description={"text"}` | ||
| 1337 | - Descriptive text for the attachment, displayed by some PDF | ||
| 1338 | - viewers. | ||
| 1339 | - | ||
| 1340 | - :samp:`--replace` | ||
| 1341 | - Indicates that any existing attachment with the same key should be | ||
| 1342 | - replaced by the new attachment. Otherwise, | ||
| 1343 | - :command:`qpdf` gives an error if an attachment | ||
| 1344 | - with that key is already present. | ||
| 1345 | - | ||
| 1346 | -:samp:`--remove-attachment={key}` | ||
| 1347 | - Remove the specified attachment. This doesn't only remove the | ||
| 1348 | - attachment from the embedded files table but also clears out the file | ||
| 1349 | - specification. That means that any potential internal links to the | ||
| 1350 | - attachment will be broken. This option may be specified multiple | ||
| 1351 | - times. Run with :samp:`--verbose` to see status of | ||
| 1352 | - the removal. | ||
| 1353 | - | ||
| 1354 | -:samp:`--copy-attachments-from {file} {options} --` | ||
| 1355 | - Copy attachments from another file. This may be specified more than | ||
| 1356 | - once. The following additional options may appear before the ``--`` | ||
| 1357 | - that ends this option: | ||
| 1358 | - | ||
| 1359 | - :samp:`--password={password}` | ||
| 1360 | - If required, the password needed to open | ||
| 1361 | - :samp:`{file}` | ||
| 1362 | - | ||
| 1363 | - :samp:`--prefix={prefix}` | ||
| 1364 | - Only required if the file from which attachments are being copied | ||
| 1365 | - has attachments with keys that conflict with attachments already | ||
| 1366 | - in the file. In this case, the specified prefix will be prepended | ||
| 1367 | - to each key. This affects only the key in the embedded files | ||
| 1368 | - table, not the file name. The PDF specification doesn't preclude | ||
| 1369 | - multiple attachments having the same file name. | ||
| 1370 | - | ||
| 1371 | -When a date is required, the date should conform to the PDF date format | ||
| 1372 | -specification, which is | ||
| 1373 | -``D:``\ :samp:`{yyyymmddhhmmss<z>}`, where | ||
| 1374 | -:samp:`{<z>}` is either ``Z`` for UTC or a | ||
| 1375 | -timezone offset in the form :samp:`{-hh'mm'}` or | ||
| 1376 | -:samp:`{+hh'mm'}`. Examples: | ||
| 1377 | -``D:20210207161528-05'00'``, ``D:20210207211528Z``. | ||
| 1378 | - | ||
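As a convenience, a date string in this format can be produced from a timezone-aware Python ``datetime``. The helper below is a sketch using only the standard library. The name ``pdf_date`` is hypothetical and is not part of qpdf.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(moment):
    """Format a timezone-aware datetime as D:yyyymmddhhmmss plus Z or an offset."""
    offset = moment.utcoffset()
    if offset is None:
        raise ValueError("a timezone-aware datetime is required")
    stamp = moment.strftime("D:%Y%m%d%H%M%S")
    if offset == timedelta(0):
        return stamp + "Z"  # UTC
    sign = "+" if offset > timedelta(0) else "-"
    minutes = abs(offset) // timedelta(minutes=1)
    # Offset is written as hh'mm' with apostrophes after each field.
    return f"{stamp}{sign}{minutes // 60:02d}'{minutes % 60:02d}'"


eastern = timezone(timedelta(hours=-5))
print(pdf_date(datetime(2021, 2, 7, 16, 15, 28, tzinfo=eastern)))
print(pdf_date(datetime(2021, 2, 7, 21, 15, 28, tzinfo=timezone.utc)))
```

The two calls reproduce the example values given above and could be passed to :samp:`--creationdate` or :samp:`--moddate`.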
| 1379 | -.. _ref.advanced-parsing: | ||
| 1380 | - | ||
| 1381 | -Advanced Parsing Options | ||
| 1382 | ------------------------- | ||
| 1383 | - | ||
| 1384 | -These options control aspects of how qpdf reads PDF files. Mostly these | ||
| 1385 | -are of use to people who are working with damaged files. There is little | ||
| 1386 | -reason to use these options unless you are trying to solve specific | ||
| 1387 | -problems. The following options are available: | ||
| 1388 | - | ||
| 1389 | -:samp:`--suppress-recovery` | ||
| 1390 | - Prevents qpdf from attempting to recover damaged files. | ||
| 1391 | - | ||
| 1392 | -:samp:`--ignore-xref-streams` | ||
| 1393 | - Tells qpdf to ignore any cross-reference streams. | ||
| 1394 | - | ||
| 1395 | -Ordinarily, qpdf will attempt to recover from certain types of errors in | ||
| 1396 | -PDF files. These include errors in the cross-reference table, certain | ||
| 1397 | -types of object numbering errors, and certain types of stream length | ||
| 1398 | -errors. Sometimes, qpdf may think it has recovered but may not have | ||
| 1399 | -actually recovered, so care should be taken when using this option as | ||
| 1400 | -some data loss is possible. The | ||
| 1401 | -:samp:`--suppress-recovery` option will prevent qpdf | ||
| 1402 | -from attempting recovery. In this case, it will fail on the first error | ||
| 1403 | -that it encounters. | ||
| 1404 | - | ||
| 1405 | -Ordinarily, qpdf reads cross-reference streams when they are present in | ||
| 1406 | -a PDF file. If :samp:`--ignore-xref-streams` is | ||
| 1407 | -specified, qpdf will ignore any cross-reference streams for hybrid PDF | ||
| 1408 | -files. The purpose of hybrid files is to make some content available to | ||
| 1409 | -viewers that are not aware of cross-reference streams. It is almost | ||
| 1410 | -never desirable to ignore them. The only time when you might want to use | ||
| 1411 | -this feature is if you are testing creation of hybrid PDF files and wish | ||
| 1412 | -to see how a PDF consumer that doesn't understand object and | ||
| 1413 | -cross-reference streams would interpret such a file. | ||
| 1414 | - | ||
| 1415 | -.. _ref.advanced-transformation: | ||
| 1416 | - | ||
| 1417 | -Advanced Transformation Options | ||
| 1418 | -------------------------------- | ||
| 1419 | - | ||
| 1420 | -These transformation options control fine points of how qpdf creates the | ||
| 1421 | -output file. Mostly these are of use only to people who are very | ||
| 1422 | -familiar with the PDF file format or who are PDF developers. The | ||
| 1423 | -following options are available: | ||
| 1424 | - | ||
| 1425 | -:samp:`--compress-streams={[yn]}` | ||
| 1426 | - By default, or with :samp:`--compress-streams=y`, | ||
| 1427 | - qpdf will compress any stream with no other filters applied to it | ||
| 1428 | - with the ``/FlateDecode`` filter when it writes it. To suppress this | ||
| 1429 | - behavior and preserve uncompressed streams as uncompressed, use | ||
| 1430 | - :samp:`--compress-streams=n`. | ||
| 1431 | - | ||
| 1432 | -:samp:`--decode-level={option}` | ||
| 1433 | - Controls which streams qpdf tries to decode. The default is | ||
| 1434 | - :samp:`generalized`. The following options are | ||
| 1435 | - available: | ||
| 1436 | - | ||
| 1437 | - - :samp:`none`: do not attempt to decode any streams | ||
| 1438 | - | ||
| 1439 | - - :samp:`generalized`: decode streams filtered with | ||
| 1440 | - supported generalized filters: ``/LZWDecode``, ``/FlateDecode``, | ||
| 1441 | - ``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define generalized | ||
| 1442 | - filters as those to be used for general-purpose compression or | ||
| 1443 | - encoding, as opposed to filters specifically designed for image | ||
| 1444 | - data. Note that, by default, streams already compressed with | ||
| 1445 | - ``/FlateDecode`` are not uncompressed and recompressed unless you | ||
| 1446 | - also specify :samp:`--recompress-flate`. | ||
| 1447 | - | ||
| 1448 | - - :samp:`specialized`: in addition to generalized, | ||
| 1449 | - decode streams with supported non-lossy specialized filters; | ||
| 1450 | - currently this is just ``/RunLengthDecode`` | ||
| 1451 | - | ||
| 1452 | - - :samp:`all`: in addition to generalized and | ||
| 1453 | - specialized, decode streams with supported lossy filters; | ||
| 1454 | - currently this is just ``/DCTDecode`` (JPEG) | ||
| 1455 | - | ||
| 1456 | -:samp:`--stream-data={option}` | ||
| 1457 | - Controls transformation of stream data. This option predates the | ||
| 1458 | - :samp:`--compress-streams` and | ||
| 1459 | - :samp:`--decode-level` options. Those options can be | ||
| 1460 | - used to achieve the same effect with more control. The value of | ||
| 1461 | - :samp:`{option}` may | ||
| 1462 | - be one of the following: | ||
| 1463 | - | ||
| 1464 | - - :samp:`compress`: recompress stream data when | ||
| 1465 | - possible (default); equivalent to | ||
| 1466 | - :samp:`--compress-streams=y` | ||
| 1467 | - :samp:`--decode-level=generalized`. Does not | ||
| 1468 | - recompress streams already compressed with ``/FlateDecode`` unless | ||
| 1469 | - :samp:`--recompress-flate` is also specified. | ||
| 1470 | - | ||
| 1471 | - - :samp:`preserve`: leave all stream data as is; | ||
| 1472 | - equivalent to :samp:`--compress-streams=n` | ||
| 1473 | - :samp:`--decode-level=none` | ||
| 1474 | - | ||
| 1475 | - - :samp:`uncompress`: uncompress stream data | ||
| 1476 | - compressed with generalized filters when possible; equivalent to | ||
| 1477 | - :samp:`--compress-streams=n` | ||
| 1478 | - :samp:`--decode-level=generalized` | ||
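
The equivalences listed above can be summarized as a lookup table. This is only an illustrative sketch in Python; the names ``STREAM_DATA_EQUIVALENTS`` and ``expand_stream_data`` are hypothetical and are not part of qpdf. The option strings themselves are taken from the list above.

```python
# Each --stream-data value mapped to the newer options the text
# describes as equivalent.
STREAM_DATA_EQUIVALENTS = {
    "compress": ["--compress-streams=y", "--decode-level=generalized"],
    "preserve": ["--compress-streams=n", "--decode-level=none"],
    "uncompress": ["--compress-streams=n", "--decode-level=generalized"],
}


def expand_stream_data(option: str) -> list:
    """Return the equivalent newer options for a --stream-data value."""
    try:
        # Return a copy so callers cannot modify the table by accident.
        return list(STREAM_DATA_EQUIVALENTS[option])
    except KeyError:
        raise ValueError(f"unknown --stream-data value: {option!r}") from None


if __name__ == "__main__":
    for name in STREAM_DATA_EQUIVALENTS:
        print(f"--stream-data={name} ->", " ".join(expand_stream_data(name)))
```

A script that builds qpdf command lines could use such a table to move from ``--stream-data`` to the finer-grained options without changing behavior.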
| 1479 | - | ||
| 1480 | -:samp:`--recompress-flate` | ||
| 1481 | - By default, streams already compressed with ``/FlateDecode`` are left | ||
| 1482 | - alone rather than being uncompressed and recompressed. This option | ||
| 1483 | - causes qpdf to uncompress and recompress the streams. There is a | ||
| 1484 | - significant performance cost to using this option, but you probably | ||
| 1485 | - want to use it if you specify | ||
| 1486 | - :samp:`--compression-level`. | ||
| 1487 | - | ||
| 1488 | -:samp:`--compression-level={level}` | ||
| 1489 | - When writing new streams that are compressed with ``/FlateDecode``, | ||
| 1490 | - use the specified compression level. The value of | ||
| 1491 | - :samp:`{level}` should be a number from 1 to 9 and is | ||
| 1492 | - passed directly to zlib, which implements deflate compression. Note | ||
| 1493 | - that qpdf doesn't uncompress and recompress streams by default. To | ||
| 1494 | - have this option apply to already compressed streams, you should also | ||
| 1495 | - specify :samp:`--recompress-flate`. If your goal is | ||
| 1496 | - to shrink the size of PDF files, you should also use | ||
| 1497 | - :samp:`--object-streams=generate`. | ||
| 1498 | - | ||
| 1499 | -:samp:`--normalize-content=[yn]` | ||
| 1500 | - Enables or disables normalization of content streams. Content | ||
| 1501 | - normalization is enabled by default in QDF mode. Please see :ref:`ref.qdf` for additional discussion of QDF mode. | ||
| 1502 | - | ||
| 1503 | -:samp:`--object-streams={mode}` | ||
| 1504 | - Controls handling of object streams. The value of | ||
| 1505 | - :samp:`{mode}` may be | ||
| 1506 | - one of the following: | ||
| 1507 | - | ||
| 1508 | - - :samp:`preserve`: preserve original object streams | ||
| 1509 | - (default) | ||
| 1510 | - | ||
| 1511 | - - :samp:`disable`: don't write any object streams | ||
| 1512 | - | ||
| 1513 | - - :samp:`generate`: use object streams wherever | ||
| 1514 | - possible | ||
| 1515 | - | ||
| 1516 | -:samp:`--preserve-unreferenced` | ||
| 1517 | - Tells qpdf to preserve objects that are not referenced when writing | ||
| 1518 | - the file. Ordinarily any object that is not referenced in a traversal | ||
| 1519 | - of the document from the trailer dictionary will be discarded. This | ||
| 1520 | - may be useful in working with some damaged files or inspecting files | ||
| 1521 | - with known unreferenced objects. | ||
| 1522 | - | ||
| 1523 | - This flag is ignored for linearized files and has the effect of | ||
| 1524 | - causing objects in the new file to be written in order by object ID | ||
| 1525 | - from the original file. This does not mean that object numbers will | ||
| 1526 | - be the same since qpdf may create stream lengths as direct or | ||
| 1527 | - indirect differently from the original file, and the original file | ||
| 1528 | - may have gaps in its numbering. | ||
| 1529 | - | ||
| 1530 | - See also :samp:`--preserve-unreferenced-resources`, | ||
| 1531 | - which does something completely different. | ||
| 1532 | - | ||
| 1533 | -:samp:`--remove-unreferenced-resources={option}` | ||
| 1534 | - The :samp:`{option}` may be ``auto``, | ||
| 1535 | - ``yes``, or ``no``. The default is ``auto``. | ||
| 1536 | - | ||
| 1537 | - Starting with qpdf 8.1, when splitting pages, qpdf is able to attempt | ||
| 1538 | - to remove images and fonts that are not used by a page even if they | ||
| 1539 | - are referenced in the page's resources dictionary. When shared | ||
| 1540 | - resources are in use, this behavior can greatly reduce the file sizes | ||
| 1541 | - of split pages, but the analysis is very slow. In versions from 8.1 | ||
| 1542 | - through 9.1.1, qpdf did this analysis by default. Starting in qpdf | ||
| 1543 | - 10.0.0, if ``auto`` is used, qpdf does a quick analysis of the file | ||
| 1544 | - to determine whether the file is likely to have unreferenced objects | ||
| 1545 | - on pages, a pattern that frequently occurs when resource dictionaries | ||
| 1546 | - are shared across multiple pages and rarely occurs otherwise. If it | ||
| 1547 | - discovers this pattern, then it will attempt to remove unreferenced | ||
| 1548 | - resources. Usually this means you get the slower splitting speed only | ||
| 1549 | - when it's actually going to create smaller files. You can suppress | ||
| 1550 | - removal of unreferenced resources altogether by specifying ``no`` or | ||
| 1551 | - force it to do the full algorithm by specifying ``yes``. | ||
| 1552 | - | ||
| 1553 | - Other than cases in which you don't care about file size and care a | ||
| 1554 | - lot about runtime, there are few reasons to use this option, | ||
| 1555 | - especially now that ``auto`` mode is supported. One reason to use | ||
| 1556 | - this is if you suspect that qpdf is removing resources it shouldn't | ||
| 1557 | - be removing. If you encounter that case, please report it as a bug at | ||
| 1558 | - https://github.com/qpdf/qpdf/issues/. | ||
| 1559 | - | ||
| 1560 | -:samp:`--preserve-unreferenced-resources` | ||
| 1561 | - This is a synonym for | ||
| 1562 | - :samp:`--remove-unreferenced-resources=no`. | ||
| 1563 | - | ||
| 1564 | - See also :samp:`--preserve-unreferenced`, which does | ||
| 1565 | - something completely different. | ||
| 1566 | - | ||
| 1567 | -:samp:`--newline-before-endstream` | ||
| 1568 | - Tells qpdf to insert a newline before the ``endstream`` keyword, not | ||
| 1569 | - counted in the length, after any stream content even if the last | ||
| 1570 | - character of the stream was a newline. This may result in two | ||
| 1571 | - newlines in some cases. This is a requirement of PDF/A. While qpdf | ||
| 1572 | - doesn't specifically know how to generate PDF/A-compliant PDFs, this | ||
| 1573 | - at least prevents it from removing compliance on already compliant | ||
| 1574 | - files. | ||
| 1575 | - | ||
| 1576 | -:samp:`--linearize-pass1={file}` | ||
| 1577 | - Write the first pass of linearization to the named file. The | ||
| 1578 | - resulting file is not a valid PDF file. This option is useful only | ||
| 1579 | - for debugging ``QPDFWriter``'s linearization code. When qpdf | ||
| 1580 | - linearizes files, it writes the file in two passes, using the first | ||
| 1581 | - pass to calculate sizes and offsets that are required for hint tables | ||
| 1582 | - and the linearization dictionary. Ordinarily, the first pass is | ||
| 1583 | - discarded. This option enables it to be captured. | ||
| 1584 | - | ||
| 1585 | -:samp:`--coalesce-contents` | ||
| 1586 | - When a page's contents are split across multiple streams, this option | ||
| 1587 | - causes qpdf to combine them into a single stream. Use of this option | ||
| 1588 | - is never necessary for ordinary usage, but it can help when working | ||
| 1589 | - with some files in some cases. For example, this can also be combined | ||
| 1590 | - with QDF mode or content normalization to make it easier to look at | ||
| 1591 | - all of a page's contents at once. | ||
| 1592 | - | ||
| 1593 | -:samp:`--flatten-annotations={option}` | ||
| 1594 | - This option collapses annotations into the pages' contents with | ||
| 1595 | - special handling for form fields. Ordinarily, an annotation is | ||
| 1596 | - rendered separately and on top of the page. Combining annotations | ||
| 1597 | - into the page's contents effectively freezes the placement of the | ||
| 1598 | - annotations, making them look right after various page | ||
| 1599 | - transformations. The library functionality backing this option was | ||
| 1600 | - added for the benefit of programs that want to create *n-up* page | ||
| 1601 | - layouts and other similar things that don't work well with | ||
| 1602 | - annotations. The :samp:`{option}` parameter | ||
| 1603 | - may be any of the following: | ||
| 1604 | - | ||
| 1605 | - - :samp:`all`: include all annotations that are not | ||
| 1606 | - marked invisible or hidden | ||
| 1607 | - | ||
| 1608 | - - :samp:`print`: only include annotations that | ||
| 1609 | - indicate that they should appear when the page is printed | ||
| 1610 | - | ||
| 1611 | - - :samp:`screen`: omit annotations that indicate | ||
| 1612 | - they should not appear on the screen | ||
| 1613 | - | ||
| 1614 | - Note that form fields are special because the annotations that are | ||
| 1615 | - used to render filled-in form fields may become out of date from the | ||
| 1616 | - fields' values if the form is filled in by a program that doesn't | ||
| 1617 | - know how to update the appearances. If qpdf detects this case, its | ||
| 1618 | - default behavior is not to flatten those annotations because doing so | ||
| 1619 | - would cause the value of the form field to be lost. This gives you a | ||
| 1620 | - chance to go back and resave the form with a program that knows how | ||
| 1621 | - to generate appearances. QPDF itself can generate appearances with | ||
| 1622 | - some limitations. See the | ||
| 1623 | - :samp:`--generate-appearances` option below. | ||
| 1624 | - | ||
| 1625 | -:samp:`--generate-appearances` | ||
| 1626 | - If a file contains interactive form fields and indicates that the | ||
| 1627 | - appearances are out of date with the values of the form, this flag | ||
| 1628 | - will regenerate appearances, subject to a few limitations. Note that | ||
| 1629 | - there is not usually a reason to do this, but it can be necessary | ||
| 1630 | - before using the :samp:`--flatten-annotations` | ||
| 1631 | - option. Most of these are not a problem with well-behaved PDF files. | ||
| 1632 | - The limitations are as follows: | ||
| 1633 | - | ||
| 1634 | - - Radio button and checkbox appearances use the pre-set values in | ||
| 1635 | - the PDF file. QPDF just makes sure that the correct appearance is | ||
| 1636 | - displayed based on the value of the field. This is fine for PDF | ||
| 1637 | - files that create their forms properly. Some PDF writers save | ||
| 1638 | - appearances for fields when they change, which could cause some | ||
| 1639 | - controls to have inconsistent appearances. | ||
| 1640 | - | ||
| 1641 | - - For text fields and list boxes, any characters that fall outside | ||
| 1642 | - of US-ASCII or, if detected, "Windows ANSI" or "Mac Roman" | ||
| 1643 | - encoding, will be replaced by the ``?`` character. | ||
| 1644 | - | ||
| 1645 | - - Quadding is ignored. Quadding is used to specify whether the | ||
| 1646 | - contents of a field should be left, center, or right aligned with | ||
| 1647 | - the field. | ||
| 1648 | - | ||
| 1649 | - - Rich text, multi-line, and other more elaborate formatting | ||
| 1650 | - directives are ignored. | ||
| 1651 | - | ||
| 1652 | - - There is no support for multi-select fields or signature fields. | ||
| 1653 | - | ||
| 1654 | - If qpdf doesn't do a good enough job with your form, use an external | ||
| 1655 | - application to save your filled-in form before processing it with | ||
| 1656 | - qpdf. | ||
| 1657 | - | ||
| 1658 | -:samp:`--optimize-images` | ||
| 1659 | - This flag causes qpdf to recompress all images that are not | ||
| 1660 | - compressed with DCT (JPEG) using DCT compression as long as doing so | ||
| 1661 | - decreases the size in bytes of the image data and the image does not | ||
| 1662 | - fall below minimum specified dimensions. Useful information is | ||
| 1663 | - provided when used in combination with | ||
| 1664 | - :samp:`--verbose`. See also the | ||
| 1665 | - :samp:`--oi-min-width`, | ||
| 1666 | - :samp:`--oi-min-height`, and | ||
| 1667 | - :samp:`--oi-min-area` options. By default, starting | ||
| 1668 | - in qpdf 8.4, inline images are converted to regular images and | ||
| 1669 | - optimized as well. Use :samp:`--keep-inline-images` | ||
| 1670 | - to prevent inline images from being included. | ||
| 1671 | - | ||
| 1672 | -:samp:`--oi-min-width={width}` | ||
| 1673 | - Avoid optimizing images whose width is below the specified amount. If | ||
| 1674 | - omitted, the default is 128 pixels. Use 0 for no minimum. | ||
| 1675 | - | ||
| 1676 | -:samp:`--oi-min-height={height}` | ||
| 1677 | - Avoid optimizing images whose height is below the specified amount. | ||
| 1678 | - If omitted, the default is 128 pixels. Use 0 for no minimum. | ||
| 1679 | - | ||
| 1680 | -:samp:`--oi-min-area={area-in-pixels}` | ||
| 1681 | - Avoid optimizing images whose pixel count (width × height) is below | ||
| 1682 | - the specified amount. If omitted, the default is 16,384 pixels. Use 0 | ||
| 1683 | - for no minimum. | ||
| 1684 | - | ||
| 1685 | -:samp:`--externalize-inline-images` | ||
| 1686 | - Convert inline images to regular images. By default, images whose | ||
| 1687 | - data is at least 1,024 bytes are converted when this option is | ||
| 1688 | - selected. Use :samp:`--ii-min-bytes` to change the | ||
| 1689 | - size threshold. This option is implicitly selected when | ||
| 1690 | - :samp:`--optimize-images` is selected. Use | ||
| 1691 | - :samp:`--keep-inline-images` to exclude inline images | ||
| 1692 | - from image optimization. | ||
| 1693 | - | ||
| 1694 | -:samp:`--ii-min-bytes={bytes}` | ||
| 1695 | - Avoid converting inline images whose size is below the specified | ||
| 1696 | - minimum size to regular images. If omitted, the default is 1,024 | ||
| 1697 | - bytes. Use 0 for no minimum. | ||
| 1698 | - | ||
| 1699 | -:samp:`--keep-inline-images` | ||
| 1700 | - Prevent inline images from being included in image optimization. This | ||
| 1701 | - option has no effect when :samp:`--optimize-images` | ||
| 1702 | - is not specified. | ||
| 1703 | - | ||
| 1704 | -:samp:`--remove-page-labels` | ||
| 1705 | - Remove page labels from the output file. | ||
| 1706 | - | ||
| 1707 | -:samp:`--qdf` | ||
| 1708 | - Turns on QDF mode. For additional information on QDF, please see :ref:`ref.qdf`. Note that :samp:`--linearize` | ||
| 1709 | - disables QDF mode. | ||
| 1710 | - | ||
| 1711 | -:samp:`--min-version={version}` | ||
| 1712 | - Forces the PDF version of the output file to be at least | ||
| 1713 | - :samp:`{version}`. In other words, if the | ||
| 1714 | - input file has a lower version than the specified version, the | ||
| 1715 | - specified version will be used. If the input file has a higher | ||
| 1716 | - version, the input file's original version will be used. It is seldom | ||
| 1717 | - necessary to use this option since qpdf will automatically increase | ||
| 1718 | - the version as needed when adding features that require newer PDF | ||
| 1719 | - readers. | ||
| 1720 | - | ||
| 1721 | - The version number may be expressed in the form | ||
| 1722 | - :samp:`{major.minor.extension-level}`, in | ||
| 1723 | - which case the version is interpreted as | ||
| 1724 | - :samp:`{major.minor}` at extension level | ||
| 1725 | - :samp:`{extension-level}`. For example, | ||
| 1726 | - version ``1.7.8`` represents version 1.7 at extension level 8. Note | ||
| 1727 | - that minimal syntax checking is done on the command line. | ||
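
The version rule described above can be illustrated with a short sketch. This is Python written for illustration only, not qpdf's implementation; the function names are hypothetical, and treating a missing extension level as 0 is an assumption made here.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Split major.minor[.extension-level] into three integers.

    A missing extension level is treated as 0 (an assumption of this sketch).
    """
    parts = version.split(".")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected major.minor[.extension-level]: {version!r}")
    major, minor = int(parts[0]), int(parts[1])
    extension_level = int(parts[2]) if len(parts) == 3 else 0
    return major, minor, extension_level


def effective_min_version(input_version: str, min_version: str) -> Tuple[int, int, int]:
    """Return the higher of the input file's version and the requested minimum."""
    return max(parse_version(input_version), parse_version(min_version))


if __name__ == "__main__":
    print(parse_version("1.7.8"))
    print(effective_min_version("1.4", "1.7.8"))
    print(effective_min_version("2.0", "1.5"))
```

Comparing the tuples element by element matches the described behavior: a lower input version is raised to the specified version, and a higher input version is kept.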
| 1728 | - | ||
| 1729 | -:samp:`--force-version={version}` | ||
| 1730 | - This option forces the PDF version to be the exact version specified | ||
| 1731 | - *even when the file may have content that is not supported in that | ||
| 1732 | - version*. The version number is interpreted in the same way as with | ||
| 1733 | - :samp:`--min-version` so that extension levels can be | ||
| 1734 | - set. In some cases, forcing the output file's PDF version to be lower | ||
| 1735 | - than that of the input file will cause qpdf to disable certain | ||
| 1736 | - features of the document. Specifically, 256-bit keys are disabled if | ||
| 1737 | - the version is less than 1.7 with extension level 8 (except R5 is | ||
| 1738 | - disabled if less than 1.7 with extension level 3), AES encryption is | ||
| 1739 | - disabled if the version is less than 1.6, cleartext metadata and | ||
| 1740 | - object streams are disabled if less than 1.5, 128-bit encryption keys | ||
| 1741 | - are disabled if less than 1.4, and all encryption is disabled if less | ||
| 1742 | - than 1.3. Even with these precautions, qpdf won't be able to do | ||
| 1743 | - things like eliminate use of newer image compression schemes, | ||
| 1744 | - transparency groups, or other features that may have been added in | ||
| 1745 | - more recent versions of PDF. | ||
| 1746 | - | ||
| 1747 | - As a general rule, with the exception of big structural things like | ||
| 1748 | - the use of object streams or AES encryption, PDF viewers are supposed | ||
| 1749 | - to ignore features in files that they don't support from newer | ||
| 1750 | - versions. This means that forcing the version to a lower version may | ||
| 1751 | - make it possible to open your PDF file with an older version, though | ||
| 1752 | - bear in mind that some of the original document's functionality may | ||
| 1753 | - be lost. | ||
| 1754 | - | ||
| 1755 | -By default, when a stream is encoded using non-lossy filters that qpdf | ||
| 1756 | -understands and is not already compressed using a good compression | ||
| 1757 | -scheme, qpdf will uncompress and recompress streams. Assuming proper | ||
| 1758 | -filter implementations, this is safe and generally results in smaller files. | ||
| 1759 | -This behavior may also be explicitly requested with | ||
| 1760 | -:samp:`--stream-data=compress`. | ||
| 1761 | - | ||
| 1762 | -When :samp:`--normalize-content=y` is specified, qpdf | ||
| 1763 | -will attempt to normalize whitespace and newlines in page content | ||
| 1764 | -streams. This is generally safe but could, in some cases, cause damage | ||
| 1765 | -to the content streams. This option is intended for people who wish to | ||
| 1766 | -study PDF content streams or to debug PDF content. You should not use | ||
| 1767 | -this for "production" PDF files. | ||
| 1768 | - | ||
| 1769 | -When normalizing content, if qpdf runs into any lexical errors, it will | ||
| 1770 | -print a warning indicating that content may be damaged. The only | ||
| 1771 | -situation in which qpdf is known to cause damage during content | ||
| 1772 | -normalization is when a page's contents are split across multiple | ||
| 1773 | -streams and streams are split in the middle of a lexical token such as a | ||
| 1774 | -string, name, or inline image. Note that files that do this are invalid | ||
| 1775 | -since the PDF specification states that content streams are not to be | ||
| 1776 | -split in the middle of a token. If you want to inspect the original | ||
| 1777 | -content streams in an uncompressed format, you can always run with | ||
| 1778 | -:samp:`--qdf --normalize-content=n` for a QDF file | ||
| 1779 | -without content normalization, or alternatively | ||
| 1780 | -:samp:`--stream-data=uncompress` for a regular non-QDF | ||
| 1781 | -mode file with uncompressed streams. These will both uncompress all the | ||
| 1782 | -streams but will not attempt to normalize content. Please note that if | ||
| 1783 | -you are using content normalization or QDF mode for the purpose of | ||
| 1784 | -manually inspecting files, you don't have to care about this. | ||
| 1785 | - | ||
| 1786 | -Object streams, also known as compressed objects, were introduced into | ||
| 1787 | -the PDF specification at version 1.5, corresponding to Acrobat 6. Some | ||
| 1788 | -older PDF viewers may not support files with object streams. qpdf can be | ||
| 1789 | -used to transform files with object streams to files without object | ||
| 1790 | -streams or vice versa. As mentioned above, there are three object stream | ||
| 1791 | -modes: :samp:`preserve`, | ||
| 1792 | -:samp:`disable`, and :samp:`generate`. | ||
| 1793 | - | ||
| 1794 | -In :samp:`preserve` mode, the relationship to objects | ||
| 1795 | -and the streams that contain them is preserved from the original file. | ||
| 1796 | -In :samp:`disable` mode, all objects are written as | ||
| 1797 | -regular, uncompressed objects. The resulting file should be readable by | ||
| 1798 | -older PDF viewers. (Of course, the content of the files may include | ||
| 1799 | -features not supported by older viewers, but at least the structure will | ||
| 1800 | -be supported.) In :samp:`generate` mode, qpdf will | ||
| 1801 | -create its own object streams. This will usually result in more compact | ||
| 1802 | -PDF files, though they may not be readable by older viewers. In this | ||
| 1803 | -mode, qpdf will also make sure the PDF version number in the header is | ||
| 1804 | -at least 1.5. | ||
| 1805 | - | ||
| 1806 | -The :samp:`--qdf` flag turns on QDF mode, which changes | ||
| 1807 | -some of the defaults described above. Specifically, in QDF mode, by | ||
| 1808 | -default, stream data is uncompressed, content streams are normalized, | ||
| 1809 | -and encryption is removed. These defaults can still be overridden by | ||
| 1810 | -specifying the appropriate options as described above. Additionally, in | ||
| 1811 | -QDF mode, stream lengths are stored as indirect objects, objects are | ||
| 1812 | -laid out in a less efficient but more readable fashion, and the | ||
| 1813 | -documents are interspersed with comments that make it easier for the | ||
| 1814 | -user to find things and also make it possible for | ||
| 1815 | -:command:`fix-qdf` to work properly. QDF mode is intended | ||
| 1816 | -for people, mostly developers, who wish to inspect or modify PDF files | ||
| 1817 | -in a text editor. For details, please see :ref:`ref.qdf`. | ||
| 1818 | - | ||
| 1819 | -.. _ref.testing-options: | ||
| 1820 | - | ||
| 1821 | -Testing, Inspection, and Debugging Options | ||
| 1822 | ------------------------------------------- | ||
| 1823 | - | ||
| 1824 | -These options can be useful for digging into PDF files or for use in | ||
| 1825 | -automated test suites for software that uses the qpdf library. When any | ||
| 1826 | -of the options in this section are specified, no output file should be | ||
| 1827 | -given. The following options are available: | ||
| 1828 | - | ||
| 1829 | -:samp:`--deterministic-id` | ||
| 1830 | - Causes generation of a deterministic value for /ID. This prevents use | ||
| 1831 | - of timestamp and output file name information in the /ID generation. | ||
| 1832 | - Instead, at some slight additional runtime cost, the /ID field is | ||
| 1833 | - generated to include a digest of the significant parts of the content | ||
| 1834 | - of the output PDF file. This means that a given qpdf operation should | ||
| 1835 | - generate the same /ID each time it is run, which can be useful when | ||
| 1836 | - caching results or for generation of some test data. Use of this flag | ||
| 1837 | - is not compatible with creation of encrypted files. | ||
| 1838 | - | ||
| 1839 | -:samp:`--static-id` | ||
| 1840 | - Causes generation of a fixed value for /ID. This is intended for | ||
| 1841 | - testing only. Never use it for production files. If you are trying to | ||
| 1842 | - get the same /ID each time for a given file and you are not | ||
| 1843 | - generating encrypted files, consider using the | ||
| 1844 | - :samp:`--deterministic-id` option. | ||
| 1845 | - | ||
| 1846 | -:samp:`--static-aes-iv` | ||
| 1847 | - Causes use of a static initialization vector for AES-CBC. This is | ||
| 1848 | - intended for testing only so that output files can be reproducible. | ||
| 1849 | - Never use it for production files. This option in particular is not | ||
| 1850 | - secure since it significantly weakens the encryption. | ||
| 1851 | - | ||
| 1852 | -:samp:`--no-original-object-ids` | ||
| 1853 | - Suppresses inclusion of original object ID comments in QDF files. | ||
| 1854 | - This can be useful when generating QDF files for test purposes, | ||
| 1855 | - particularly when comparing them to determine whether two PDF files | ||
| 1856 | - have identical content. | ||
| 1857 | - | ||
| 1858 | -:samp:`--show-encryption` | ||
| 1859 | - Shows document encryption parameters. Also shows the document's user | ||
| 1860 | - password if the owner password is given. | ||
| 1861 | - | ||
| 1862 | -:samp:`--show-encryption-key` | ||
| 1863 | - When encryption information is being displayed, as when | ||
| 1864 | - :samp:`--check` or | ||
| 1865 | - :samp:`--show-encryption` is given, display the | ||
| 1866 | - computed or retrieved encryption key as a hexadecimal string. This | ||
| 1867 | - value is not ordinarily useful to users, but it can be used as the | ||
| 1868 | - argument to :samp:`--password` if the | ||
| 1869 | - :samp:`--password-is-hex-key` is specified. Note | ||
| 1870 | - that, when PDF files are encrypted, passwords and other metadata are | ||
| 1871 | - used only to compute an encryption key, and the encryption key is | ||
| 1872 | - what is actually used for encryption. This enables retrieval of that | ||
| 1873 | - key. | ||
| 1874 | - | ||
| 1875 | -:samp:`--check-linearization` | ||
| 1876 | - Checks file integrity and linearization status. | ||
| 1877 | - | ||
| 1878 | -:samp:`--show-linearization` | ||
| 1879 | - Checks and displays all data in the linearization hint tables. | ||
| 1880 | - | ||
| 1881 | -:samp:`--show-xref` | ||
| 1882 | - Shows the contents of the cross-reference table in a human-readable | ||
| 1883 | - form. This is especially useful for files with cross-reference | ||
| 1884 | - streams which are stored in a binary format. | ||
| 1885 | - | ||
| 1886 | -:samp:`--show-object=trailer|obj[,gen]` | ||
| 1887 | - Show the contents of the given object. This is especially useful for | ||
| 1888 | - inspecting objects that are inside of object streams (also known as | ||
| 1889 | - "compressed objects"). | ||
| 1890 | - | ||
| 1891 | -:samp:`--raw-stream-data` | ||
| 1892 | - When used along with the :samp:`--show-object` | ||
| 1893 | - option, if the object is a stream, shows the raw stream data instead | ||
| 1894 | - of the object's contents. | ||
| 1895 | - | ||
| 1896 | -:samp:`--filtered-stream-data` | ||
| 1897 | - When used along with the :samp:`--show-object` | ||
| 1898 | - option, if the object is a stream, shows the filtered stream data | ||
| 1899 | - instead of the object's contents. If the stream is filtered using filters | ||
| 1900 | - that qpdf does not support, an error will be issued. | ||
| 1901 | - | ||
| 1902 | -:samp:`--show-npages` | ||
| 1903 | - Prints the number of pages in the input file on a line by itself. | ||
| 1904 | - Since the number of pages appears by itself on a line, this option | ||
| 1905 | - can be useful for scripting if you need to know the number of pages | ||
| 1906 | - in a file. | ||
| 1907 | - | ||
| 1908 | -:samp:`--show-pages` | ||
| 1909 | - Shows the object and generation number for each page dictionary | ||
| 1910 | - object and for each content stream associated with the page. Having | ||
| 1911 | - this information makes it more convenient to inspect objects from a | ||
| 1912 | - particular page. | ||
| 1913 | - | ||
| 1914 | -:samp:`--with-images` | ||
| 1915 | - When used along with :samp:`--show-pages`, also shows | ||
| 1916 | - the object and generation numbers for the image objects on each page. | ||
| 1917 | - (At present, information about images in shared resource dictionaries | ||
| 1918 | - is not output by this command. This is discussed in a comment in the | ||
| 1919 | - source code.) | ||
| 1920 | - | ||
| 1921 | -:samp:`--json` | ||
| 1922 | - Generate a JSON representation of the file. This is described in | ||
| 1923 | - depth in :ref:`ref.json`. | ||
| 1924 | - | ||
| 1925 | -:samp:`--json-help` | ||
| 1926 | - Describe the format of the JSON output. | ||
| 1927 | - | ||
| 1928 | -:samp:`--json-key=key` | ||
| 1929 | - This option is repeatable. If specified, only top-level keys | ||
| 1930 | - specified will be included in the JSON output. If not specified, all | ||
| 1931 | - keys will be shown. | ||
| 1932 | - | ||
| 1933 | -:samp:`--json-object=trailer|obj[,gen]` | ||
| 1934 | - This option is repeatable. If specified, only specified objects will | ||
| 1935 | - be shown in the "``objects``" key of the JSON output. If absent, all | ||
| 1936 | - objects will be shown. | ||
| 1937 | - | ||
| 1938 | -:samp:`--check` | ||
| 1939 | - Checks file structure as well as encryption, linearization, and | ||
| 1940 | - encoding of stream data. A file for which | ||
| 1941 | - :samp:`--check` reports no errors may still have | ||
| 1942 | - errors in stream data content but should otherwise be structurally | ||
| 1943 | - sound. If :samp:`--check` finds any errors, qpdf will exit | ||
| 1944 | - with a status of 2. There are some recoverable conditions that | ||
| 1945 | - :samp:`--check` detects. These are issued as warnings | ||
| 1946 | - instead of errors. If qpdf finds no errors but finds warnings, it | ||
| 1947 | - will exit with a status of 3 (as of version 2.0.4). When | ||
| 1948 | - :samp:`--check` is combined with other options, | ||
| 1949 | - checks are always performed before any other options are processed. | ||
| 1950 | - For erroneous files, :samp:`--check` will cause qpdf | ||
| 1951 | - to attempt to recover, after which other options are effectively | ||
| 1952 | - operating on the recovered file. Combining | ||
| 1953 | - :samp:`--check` with other options in this way can be | ||
| 1954 | - useful for manually recovering severely damaged files. Note that | ||
| 1955 | - :samp:`--check` produces no output to standard output | ||
| 1956 | - when everything is valid, so if you are using this to | ||
| 1957 | - programmatically validate files in bulk, it is safe to run without | ||
| 1958 | - output redirected to :file:`/dev/null` and just | ||
| 1959 | - check for a 0 exit code. | ||
| 1960 | - | ||
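The exit statuses described for `--check` lend themselves to a small wrapper for bulk validation. The following is a minimal sketch, not part of qpdf: the helper names `classify_check_status` and `check_pdf` are hypothetical, and it assumes that `qpdf` is on the `PATH`. Only the statuses named above (0, 2, and 3) are interpreted; anything else is reported as unknown.

```python
import subprocess

# Exit statuses described in the text for `qpdf --check`.
STATUS_MEANING = {0: "ok", 2: "errors", 3: "warnings"}


def classify_check_status(returncode: int) -> str:
    """Map a qpdf exit status to a short label; other values are 'unknown'."""
    return STATUS_MEANING.get(returncode, "unknown")


def check_pdf(path: str) -> str:
    """Run `qpdf --check PATH` and classify the result (needs qpdf installed)."""
    result = subprocess.run(["qpdf", "--check", path], capture_output=True)
    return classify_check_status(result.returncode)
```

A bulk validator could call `check_pdf` on each file and treat anything other than `"ok"` as needing attention.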
| 1961 | -The :samp:`--raw-stream-data` and | ||
| 1962 | -:samp:`--filtered-stream-data` options are ignored | ||
| 1963 | -unless :samp:`--show-object` is given. Either of these | ||
| 1964 | -options will cause the stream data to be written to standard output. In | ||
| 1965 | -order to avoid commingling of stream data with other output, it is | ||
| 1966 | -recommended that these options not be combined with other test/inspection | ||
| 1967 | -options. | ||
| 1968 | - | ||
| 1969 | -If :samp:`--filtered-stream-data` is given and | ||
| 1970 | -:samp:`--normalize-content=y` is also given, qpdf will | ||
| 1971 | -attempt to normalize the stream data as if it is a page content stream. | ||
| 1972 | -This attempt will be made even if it is not a page content stream, in | ||
| 1973 | -which case it will produce unusable results. | ||
| 1974 | - | ||
| 1975 | -.. _ref.unicode-passwords: | ||
| 1976 | - | ||
| 1977 | -Unicode Passwords | ||
| 1978 | ------------------ | ||
| 1979 | - | ||
| 1980 | -At the library API level, all methods that perform encryption and | ||
| 1981 | -decryption interpret passwords as strings of bytes. It is up to the | ||
| 1982 | -caller to ensure that they are appropriately encoded. Starting with qpdf | ||
| 1983 | -version 8.4.0, qpdf will attempt to make this easier for you when | ||
| 1984 | -you interact with qpdf via its command line interface. The PDF specification | ||
| 1985 | -requires passwords used to encrypt files with 40-bit or 128-bit | ||
| 1986 | -encryption to be encoded with PDF Doc encoding. This encoding is a | ||
| 1987 | -single-byte encoding that supports ISO-Latin-1 and a handful of other | ||
| 1988 | -commonly used characters. It has a large overlap with Windows ANSI but | ||
| 1989 | -is not exactly the same. There is generally not a way to provide PDF Doc | ||
| 1990 | -encoded strings on the command line. As such, qpdf versions prior to | ||
| 1991 | -8.4.0 would often create PDF files that couldn't be opened with other | ||
| 1992 | -software when given a password with non-ASCII characters to encrypt a | ||
| 1993 | -file with 40-bit or 128-bit encryption. Starting with qpdf 8.4.0, qpdf | ||
| 1994 | -recognizes the encoding of the parameter and transcodes it as needed. | ||
| 1995 | -The rest of this section provides the details about exactly how qpdf | ||
| 1996 | -behaves. Most users will not need to know this information, but it might | ||
| 1997 | -be useful if you have been working around qpdf's old behavior or if you | ||
| 1998 | -are using qpdf to generate encrypted files for testing other PDF | ||
| 1999 | -software. | ||
| 2000 | - | ||
| 2001 | -A note about Windows: when qpdf builds, it attempts to determine what it | ||
| 2002 | -has to do to use ``wmain`` instead of ``main`` on Windows. The ``wmain`` | ||
| 2003 | -function is an alternative entry point that receives all arguments as | ||
| 2004 | -UTF-16-encoded strings. When qpdf starts up this way, it converts all | ||
| 2005 | -the strings to UTF-8 encoding and then invokes the regular main. This | ||
| 2006 | -means that, as far as qpdf is concerned, it receives its command-line | ||
| 2007 | -arguments with UTF-8 encoding, just as it would in any modern Linux or | ||
| 2008 | -UNIX environment. | ||
| 2009 | - | ||
| 2010 | -If a file is being encrypted with 40-bit or 128-bit encryption and the | ||
| 2011 | -supplied password is not a valid UTF-8 string, qpdf will fall back to | ||
| 2012 | -the behavior of interpreting the password as a string of bytes. If you | ||
| 2013 | -have old scripts that encrypt files by passing the output of | ||
| 2014 | -:command:`iconv` to qpdf, you no longer need to do that, | ||
| 2015 | -but if you do, qpdf should still work. The only exception would be for | ||
| 2016 | -the extremely unlikely case of a password that is encoded with a | ||
| 2017 | -single-byte encoding but also happens to be valid UTF-8. Such a password | ||
| 2018 | -would contain strings of even numbers of characters that alternate | ||
| 2019 | -between accented letters and symbols. In the extremely unlikely event | ||
| 2020 | -that you are intentionally using such passwords and qpdf is thwarting | ||
| 2021 | -you by interpreting them as UTF-8, you can use | ||
| 2022 | -:samp:`--password-mode=bytes` to suppress qpdf's | ||
| 2023 | -automatic behavior. | ||
| 2024 | - | ||
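The ambiguous case described above can be reproduced without qpdf. This sketch only illustrates the encoding overlap; it does not show how qpdf itself detects encodings, and the helper name `is_valid_utf8` is hypothetical.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# "café" in a single-byte encoding (Latin-1) ends in the lone byte 0xE9,
# which is not valid UTF-8, so it would be treated as a string of bytes.
latin1_password = "caf\u00e9".encode("latin-1")

# The Latin-1 text "Ã©" (an accented letter followed by a symbol) is the
# byte pair 0xC3 0xA9, which is also the UTF-8 encoding of "é".
ambiguous_password = "\u00c3\u00a9".encode("latin-1")
```

Here `is_valid_utf8(latin1_password)` is false while `is_valid_utf8(ambiguous_password)` is true, which is the situation in which `--password-mode=bytes` would be needed.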
| 2025 | -The :samp:`--password-mode` option, as described earlier | ||
| 2026 | -in this chapter, can be used to change qpdf's interpretation of supplied | ||
| 2027 | -passwords. There are very few reasons to use this option. One would be | ||
| 2028 | -the unlikely case described in the previous paragraph in which the | ||
| 2029 | -supplied password happens to be valid UTF-8 but isn't supposed to be | ||
| 2030 | -UTF-8. Your best bet would be just to provide the password as a valid | ||
| 2031 | -UTF-8 string, but you could also use | ||
| 2032 | -:samp:`--password-mode=bytes`. Another reason to use | ||
| 2033 | -:samp:`--password-mode=bytes` would be to intentionally | ||
| 2034 | -generate PDF files encrypted with passwords that are not properly | ||
| 2035 | -encoded. The qpdf test suite does this to generate invalid files for the | ||
| 2036 | -purpose of testing its password recovery capability. If you were trying | ||
| 2037 | -to create intentionally incorrect files for a similar purpose, the | ||
| 2038 | -:samp:`bytes` password mode can enable you to do this. | ||
| 2039 | - | ||
| 2040 | -When qpdf attempts to decrypt a file with a password that contains | ||
| 2041 | -non-ASCII characters, it will generate a list of alternative passwords | ||
| 2042 | -by attempting to interpret the password as each of a handful of | ||
| 2043 | -different coding systems and then transcode them to the required format. | ||
| 2044 | -This helps to compensate for the supplied password being given in the | ||
| 2045 | -wrong coding system, such as would happen if you used the | ||
| 2046 | -:command:`iconv` workaround that was previously needed. | ||
| 2047 | -It also generates passwords by doing the reverse operation: translating | ||
| 2048 | -from correct to incorrect encoding of the password. This would enable | ||
| 2049 | -qpdf to decrypt files using passwords that were improperly encoded by | ||
| 2050 | -whatever software encrypted the files, including older versions of qpdf | ||
| 2051 | -invoked without properly encoded passwords. The combination of these two | ||
| 2052 | -recovery methods should make qpdf transparently open most encrypted | ||
| 2053 | -files with the password supplied correctly but in the wrong coding | ||
| 2054 | -system. There are no real downsides to this behavior, but if you don't | ||
| 2055 | -want qpdf to do this, you can use the | ||
| 2056 | -:samp:`--suppress-password-recovery` option. One reason | ||
| 2057 | -to do that is to ensure that you know the exact password that was used | ||
| 2058 | -to encrypt the file. | ||
| 2059 | - | ||
| 2060 | -With these changes, qpdf now generates compliant passwords in most | ||
| 2061 | -cases. There are still some exceptions. In particular, the PDF | ||
| 2062 | -specification directs compliant writers to normalize Unicode passwords | ||
| 2063 | -and to perform certain transformations on passwords with bidirectional | ||
| 2064 | -text. Implementing this functionality requires using a real Unicode | ||
| 2065 | -library like ICU. If a client application that uses qpdf wants to do | ||
| 2066 | -this, the qpdf library will accept the resulting passwords, but qpdf | ||
| 2067 | -will not perform these transformations itself. It is possible that this | ||
| 2068 | -will be addressed in a future version of qpdf. The ``QPDFWriter`` | ||
| 2069 | -methods that enable encryption on the output file accept passwords as | ||
| 2070 | -strings of bytes. | ||
| 2071 | - | ||
| 2072 | -Please note that the :samp:`--password-is-hex-key` | ||
| 2073 | -option is unrelated to all this. This flag bypasses the normal process | ||
| 2074 | -of going from password to encryption string entirely, allowing the raw | ||
| 2075 | -encryption key to be specified directly. This is useful for forensic | ||
| 2076 | -purposes or for brute-force recovery of files with unknown passwords. | ||
| 2077 | - | ||
| 2078 | -.. _ref.qdf: | ||
| 2079 | - | ||
| 2080 | -QDF Mode | ||
| 2081 | -======== | ||
| 2082 | - | ||
| 2083 | -In QDF mode, qpdf creates PDF files in what we call *QDF | ||
| 2084 | -form*. A PDF file in QDF form, sometimes called a QDF | ||
| 2085 | -file, is a completely valid PDF file that has ``%QDF-1.0`` as its third | ||
| 2086 | -line (after the PDF header and binary characters) and has certain other | ||
| 2087 | -characteristics. The purpose of QDF form is to make it possible to edit | ||
| 2088 | -PDF files, with some restrictions, in an ordinary text editor. This can | ||
| 2089 | -be very useful for experimenting with different PDF constructs or for | ||
| 2090 | -making one-off edits to PDF files (though there are other reasons why | ||
| 2091 | -this may not always work). Note that QDF mode does not support | ||
| 2092 | -linearized files. If you enable linearization, QDF mode is automatically | ||
| 2093 | -disabled. | ||
| 2094 | - | ||
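Because a QDF file carries ``%QDF-1.0`` as its third line, a script can cheaply test for QDF form before attempting text edits. This is a minimal sketch under that single assumption; the function name `looks_like_qdf` and the 1024-byte read limit are illustrative choices, and the check says nothing about whether the rest of the file still satisfies the QDF constraints.

```python
def looks_like_qdf(data: bytes) -> bool:
    """Return True if the third line of the file is the %QDF-1.0 marker."""
    # Only the start of the file matters, so avoid splitting the whole thing.
    lines = data[:1024].splitlines()
    return len(lines) >= 3 and lines[2].strip() == b"%QDF-1.0"


# Synthetic file headers for illustration: PDF header, binary comment, marker.
qdf_header = b"%PDF-1.3\n%\xbf\xf7\xa2\xfe\n%QDF-1.0\n\n1 0 obj\n"
plain_header = b"%PDF-1.3\n%\xbf\xf7\xa2\xfe\n1 0 obj\n"
```

In practice the bytes would come from something like `open(path, "rb").read(1024)`.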
| 2095 | -It is ordinarily very difficult to edit PDF files in a text editor for | ||
| 2096 | -two reasons: most meaningful data in PDF files is compressed, and PDF | ||
| 2097 | -files are full of offset and length information that makes it hard to | ||
| 2098 | -add or remove data. A QDF file is organized in a manner such that, if | ||
| 2099 | -edits are kept within certain constraints, the | ||
| 2100 | -:command:`fix-qdf` program, distributed with qpdf, is | ||
| 2101 | -able to restore edited files to a correct state. The | ||
| 2102 | -:command:`fix-qdf` program takes no command-line | ||
| 2103 | -arguments. It reads a possibly edited QDF file from standard input and | ||
| 2104 | -writes a repaired file to standard output. | ||
| 2105 | - | ||
| 2106 | -The following attributes characterize a QDF file: | ||
| 2107 | - | ||
| 2108 | -- All objects appear in numerical order in the PDF file, including when | ||
| 2109 | - objects appear in object streams. | ||
| 2110 | - | ||
| 2111 | -- Objects are printed in an easy-to-read format, and all line endings | ||
| 2112 | - are normalized to UNIX line endings. | ||
| 2113 | - | ||
| 2114 | -- Unless specifically overridden, streams appear uncompressed (when | ||
| 2115 | - qpdf supports the filters and they are compressed with a non-lossy | ||
| 2116 | - compression scheme), and most content streams are normalized (line | ||
| 2117 | - endings are converted to just UNIX-style linefeeds). | ||
| 2118 | - | ||
| 2119 | -- All stream lengths are represented as indirect objects, and the | ||
| 2120 | - stream length object is always the next object after the stream. If | ||
| 2121 | - the stream data does not end with a newline, an extra newline is | ||
| 2122 | - inserted, and a special comment appears after the stream indicating | ||
| 2123 | - that this has been done. | ||
| 2124 | - | ||
| 2125 | -- If the PDF file contains object streams, if object stream *n* | ||
| 2126 | - contains *k* objects, those objects are numbered from *n+1* through | ||
| 2127 | - *n+k*, and the object number/offset pairs appear on a separate line | ||
| 2128 | - for each object. Additionally, each object in the object stream is | ||
| 2129 | - preceded by a comment indicating its object number and index. This | ||
| 2130 | - makes it very easy to find objects in object streams. | ||
| 2131 | - | ||
| 2132 | -- All beginnings of objects, ``stream`` tokens, ``endstream`` tokens, | ||
| 2133 | - and ``endobj`` tokens appear on lines by themselves. A blank line | ||
| 2134 | - follows every ``endobj`` token. | ||
| 2135 | - | ||
| 2136 | -- If there is a cross-reference stream, it is unfiltered. | ||
| 2137 | - | ||
| 2138 | -- Page dictionaries and page content streams are marked with special | ||
| 2139 | - comments that make them easy to find. | ||
| 2140 | - | ||
| 2141 | -- Comments precede each object indicating the object number of the | ||
| 2142 | - corresponding object in the original file. | ||
| 2143 | - | ||
| 2144 | -When editing a QDF file, any edits can be made as long as the above | ||
| 2145 | -constraints are maintained. This means that you can freely edit a page's | ||
| 2146 | -content without worrying about messing up the QDF file. It is also | ||
| 2147 | -possible to add new objects so long as those objects are added after the | ||
| 2148 | -last object in the file or subsequent objects are renumbered. If a QDF | ||
| 2149 | -file has object streams in it, you can always add the new objects before | ||
| 2150 | -the xref stream and then change the number of the xref stream, since | ||
| 2151 | -nothing generally ever references it by number. | ||
| 2152 | - | ||
| 2153 | -It is not generally practical to remove objects from QDF files without | ||
| 2154 | -messing up object numbering, but if you remove all references to an | ||
| 2155 | -object, you can run qpdf on the file (after running | ||
| 2156 | -:command:`fix-qdf`), and qpdf will omit the now-orphaned | ||
| 2157 | -object. | ||
| 2158 | - | ||
| 2159 | -When :command:`fix-qdf` is run, it goes through the file | ||
| 2160 | -and recomputes the following parts of the file: | ||
| 2161 | - | ||
| 2162 | -- the ``/N``, ``/W``, and ``/First`` keys of all object stream | ||
| 2163 | - dictionaries | ||
| 2164 | - | ||
| 2165 | -- the pairs of numbers representing object numbers and offsets of | ||
| 2166 | - objects in object streams | ||
| 2167 | - | ||
| 2168 | -- all stream lengths | ||
| 2169 | - | ||
| 2170 | -- the cross-reference table or cross-reference stream | ||
| 2171 | - | ||
| 2172 | -- the offset to the cross-reference table or cross-reference stream | ||
| 2173 | - following the ``startxref`` token | ||
| 2174 | - | ||
| 2175 | -.. _ref.using-library: | ||
| 2176 | - | ||
| 2177 | -Using the QPDF Library | ||
| 2178 | -====================== | ||
| 2179 | - | ||
| 2180 | -.. _ref.using.from-cxx: | ||
| 2181 | - | ||
| 2182 | -Using QPDF from C++ | ||
| 2183 | -------------------- | ||
| 2184 | - | ||
| 2185 | -The source tree for the qpdf package has an | ||
| 2186 | -:file:`examples` directory that contains a few | ||
| 2187 | -example programs. The :file:`qpdf/qpdf.cc` source | ||
| 2188 | -file also serves as a useful example since it exercises almost all of | ||
| 2189 | -the qpdf library's public interface. The best source of documentation on | ||
| 2190 | -the library itself is reading comments in | ||
| 2191 | -:file:`include/qpdf/QPDF.hh`, | ||
| 2192 | -:file:`include/qpdf/QPDFWriter.hh`, and | ||
| 2193 | -:file:`include/qpdf/QPDFObjectHandle.hh`. | ||
| 2194 | - | ||
| 2195 | -All header files are installed in the | ||
| 2196 | -:file:`include/qpdf` directory. It is recommended that | ||
| 2197 | -you use ``#include <qpdf/QPDF.hh>`` rather than adding | ||
| 2198 | -:file:`include/qpdf` to your include path. | ||
| 2199 | - | ||
| 2200 | -When linking against the qpdf static library, you may also need to | ||
| 2201 | -specify ``-lz -ljpeg`` on your link command. If your system understands | ||
| 2202 | -how to read libtool :file:`.la` files, this may not | ||
| 2203 | -be necessary. | ||
| 2204 | - | ||
| 2205 | -The qpdf library is safe to use in a multithreaded program, but no | ||
| 2206 | -individual ``QPDF`` object instance (including ``QPDF``, | ||
| 2207 | -``QPDFObjectHandle``, or ``QPDFWriter``) can be used in more than one | ||
| 2208 | -thread at a time. Multiple threads may simultaneously work with | ||
| 2209 | -different instances of these and all other QPDF objects. | ||
| 2210 | - | ||
| 2211 | -.. _ref.using.other-languages: | ||
| 2212 | - | ||
| 2213 | -Using QPDF from other languages | ||
| 2214 | -------------------------------- | ||
| 2215 | - | ||
| 2216 | -The qpdf library is implemented in C++, which makes it hard to use | ||
| 2217 | -directly in other languages. There are a few things that can help. | ||
| 2218 | - | ||
| 2219 | -"C" | ||
| 2220 | - The qpdf library includes a "C" language interface that provides a | ||
| 2221 | - subset of the overall capabilities. The header file | ||
| 2222 | - :file:`qpdf/qpdf-c.h` includes information about | ||
| 2223 | - its use. As long as you use a C++ linker, you can link C programs | ||
| 2224 | - with qpdf and use the C API. For languages that can directly load | ||
| 2225 | - methods from a shared library, the C API can also be useful. People | ||
| 2226 | - have reported success using the C API from other languages on Windows | ||
| 2227 | - by directly calling functions in the DLL. | ||
| 2228 | - | ||
| 2229 | -Python | ||
| 2230 | - A Python module called | ||
| 2231 | - `pikepdf <https://pypi.org/project/pikepdf/>`__ provides a clean and | ||
| 2232 | - highly functional set of Python bindings to the qpdf library. Using | ||
| 2233 | - pikepdf, you can work with PDF files in a natural way and combine | ||
| 2234 | - qpdf's capabilities with other functionality provided by Python's | ||
| 2235 | - rich standard library and available modules. | ||
| 2236 | - | ||
| 2237 | -Other Languages | ||
| 2238 | - Starting with version 8.3.0, the :command:`qpdf` | ||
| 2239 | - command-line tool can produce a JSON representation of the PDF file's | ||
| 2240 | - non-content data. This can facilitate interacting programmatically | ||
| 2241 | - with PDF files through qpdf's command line interface. For more | ||
| 2242 | - information, please see :ref:`ref.json`. | ||
| 2243 | - | ||
| 2244 | -.. _ref.unicode-files: | ||
| 2245 | - | ||
| 2246 | -A Note About Unicode File Names | ||
| 2247 | -------------------------------- | ||
| 2248 | - | ||
| 2249 | -When strings are passed to qpdf library routines either as ``char*`` or | ||
| 2250 | -as ``std::string``, they are treated as byte arrays except where | ||
| 2251 | -otherwise noted. When Unicode is desired, qpdf wants UTF-8 unless | ||
| 2252 | -otherwise noted in comments in header files. In modern UNIX/Linux | ||
| 2253 | -environments, this generally does the right thing. In Windows, it's a | ||
| 2254 | -bit more complicated. Starting in qpdf 8.4.0, passwords that contain | ||
| 2255 | -Unicode characters are handled much better, and starting in qpdf 8.4.1, | ||
| 2256 | -the library attempts to properly handle Unicode characters in filenames. | ||
| 2257 | -In particular, in Windows, if a UTF-8 encoded string is used as a | ||
| 2258 | -filename in either ``QPDF`` or ``QPDFWriter``, it is internally | ||
| 2259 | -converted to ``wchar_t*``, and Unicode-aware Windows APIs are used. As | ||
| 2260 | -such, qpdf will generally operate properly on files with non-ASCII | ||
| 2261 | -characters in their names as long as the filenames are UTF-8 encoded for | ||
| 2262 | -passing into the qpdf library API, but there are still some rough edges, | ||
| 2263 | -such as the encoding of the filenames in error messages or CLI output | ||
| 2264 | -messages. Patches or bug reports are welcome for any continuing issues | ||
| 2265 | -with Unicode file names in Windows. | ||
| 2266 | - | ||
| 2267 | -.. _ref.weak-crypto: | ||
| 2268 | - | ||
| 2269 | -Weak Cryptography | ||
| 2270 | -================= | ||
| 2271 | - | ||
| 2272 | -Starting with version 10.4, qpdf is taking steps to reduce the likelihood | ||
| 2273 | -of a user *accidentally* creating PDF files with insecure cryptography | ||
| 2274 | -but will continue to allow creation of such files indefinitely with | ||
| 2275 | -explicit acknowledgment. | ||
| 2276 | - | ||
| 2277 | -The PDF file format makes use of RC4, which is known to be a weak | ||
| 2278 | -cryptography algorithm, and MD5, which is a weak hashing algorithm. In | ||
| 2279 | -version 10.4, qpdf generates warnings for some (but not all) cases of | ||
| 2280 | -writing files with weak cryptography when invoked from the command-line. | ||
| 2281 | -These warnings can be suppressed using the | ||
| 2282 | -:samp:`--allow-weak-crypto` option. | ||
| 2283 | - | ||
| 2284 | -It is planned for qpdf version 11 to be stricter, making it an error to | ||
| 2285 | -write files with insecure cryptography from the command-line tool in | ||
| 2286 | -most cases without specifying the | ||
| 2287 | -:samp:`--allow-weak-crypto` flag and also to require | ||
| 2288 | -explicit steps when using the C++ library to enable use of insecure | ||
| 2289 | -cryptography. | ||
| 2290 | - | ||
| 2291 | -Note that qpdf must always retain support for weak cryptographic | ||
| 2292 | -algorithms since this is required for reading older PDF files that use | ||
| 2293 | -them. Additionally, qpdf will always retain the ability to create files | ||
| 2294 | -using weak cryptographic algorithms since, as a development tool, qpdf | ||
| 2295 | -explicitly supports creating older or deprecated types of PDF files | ||
| 2296 | -since these are sometimes needed to test or work with older versions of | ||
| 2297 | -software. Even if other cryptography libraries drop support for RC4 or | ||
| 2298 | -MD5, qpdf can always fall back to its internal implementations of those | ||
| 2299 | -algorithms, so they are not going to disappear from qpdf. | ||
| 2300 | - | ||
| 2301 | -.. _ref.json: | ||
| 2302 | - | ||
| 2303 | -QPDF JSON | ||
| 2304 | -========= | ||
| 2305 | - | ||
| 2306 | -.. _ref.json-overview: | ||
| 2307 | - | ||
| 2308 | -Overview | ||
| 2309 | --------- | ||
| 2310 | - | ||
| 2311 | -Beginning with qpdf version 8.3.0, the :command:`qpdf` | ||
| 2312 | -command-line program can produce a JSON representation of the | ||
| 2313 | -non-content data in a PDF file. It includes a dump in JSON format of all | ||
| 2314 | -objects in the PDF file excluding the content of streams. This JSON | ||
| 2315 | -representation makes it very easy to look in detail at the structure of | ||
| 2316 | -a given PDF file, and it also provides a great way to work with PDF | ||
| 2317 | -files programmatically from the command-line in languages that can't | ||
| 2318 | -call or link with the qpdf library directly. Note that stream data can | ||
| 2319 | -be extracted from PDF files using other qpdf command-line options. | ||
| 2320 | - | ||
| 2321 | -.. _ref.json-guarantees: | ||
| 2322 | - | ||
| 2323 | -JSON Guarantees | ||
| 2324 | ---------------- | ||
| 2325 | - | ||
| 2326 | -The qpdf JSON representation includes a JSON serialization of the raw | ||
| 2327 | -objects in the PDF file as well as some computed information in a more | ||
| 2328 | -easily extracted format. QPDF provides some guarantees about its JSON | ||
| 2329 | -format. These guarantees are designed to simplify the experience of a | ||
| 2330 | -developer working with the JSON format. | ||
| 2331 | - | ||
| 2332 | -Compatibility | ||
| 2333 | - The top-level JSON object output is a dictionary. The JSON output | ||
| 2334 | - contains various nested dictionaries and arrays. With the exception | ||
| 2335 | - of dictionaries that are populated by the fields of objects from the | ||
| 2336 | - file, all instances of a dictionary are guaranteed to have exactly | ||
| 2337 | - the same keys. Future versions of qpdf are free to add additional | ||
| 2338 | - keys but not to remove keys or change the type of object that a key | ||
| 2339 | - points to. The qpdf program validates this guarantee, and in the | ||
| 2340 | - unlikely event that a bug in qpdf should cause it to generate data | ||
| 2341 | - that doesn't conform to this rule, it will ask you to file a bug | ||
| 2342 | - report. | ||
| 2343 | - | ||
| 2344 | - The top-level JSON structure contains a "``version``" key whose value | ||
| 2345 | - is a simple integer. The value of the ``version`` key will be | ||
| 2346 | - incremented if a non-compatible change is made. A non-compatible | ||
| 2347 | - change would be any change that involves removal of a key, a change | ||
| 2348 | - to the format of data pointed to by a key, or a semantic change that | ||
| 2349 | - requires a different interpretation of a previously existing key. A | ||
| 2350 | - strong effort will be made to avoid breaking compatibility. | ||
| 2351 | - | ||
| 2352 | -Documentation | ||
| 2353 | - The :command:`qpdf` command can be invoked with the | ||
| 2354 | - :samp:`--json-help` option. This will output a JSON | ||
| 2355 | - structure that has the same structure as the JSON output that qpdf | ||
| 2356 | - generates, except that each field in the help output is a description | ||
| 2357 | - of the corresponding field in the JSON output. The specific | ||
| 2358 | - guarantees are as follows: | ||
| 2359 | - | ||
| 2360 | - - A dictionary in the help output means that the corresponding | ||
| 2361 | - location in the actual JSON output is also a dictionary with | ||
| 2362 | - exactly the same keys; that is, no keys present in help are absent | ||
| 2363 | - in the real output, and no keys will be present in the real output | ||
| 2364 | - that are not in help. As a special case, if the dictionary has a | ||
| 2365 | - single key whose name starts with ``<`` and ends with ``>``, it | ||
| 2366 | - means that the JSON output is a dictionary that can have any keys, | ||
| 2367 | - each of which conforms to the value of the special key. This is | ||
| 2368 | - used for cases in which the keys of the dictionary are things like | ||
| 2369 | - object IDs. | ||
| 2370 | - | ||
| 2371 | - - A string in the help output is a description of the item that | ||
| 2372 | - appears in the corresponding location of the actual output. The | ||
| 2373 | - corresponding output can have any format. | ||
| 2374 | - | ||
| 2375 | - - An array in the help output always contains a single element. It | ||
| 2376 | - indicates that the corresponding location in the actual output is | ||
| 2377 | - also an array, and that each element of the array has whatever | ||
| 2378 | - format is implied by the single element of the help output's | ||
| 2379 | - array. | ||
| 2380 | - | ||
| 2381 | - For example, the help output includes a "``pagelabels``" | ||
| 2382 | - key whose value is an array of one element. That element is a | ||
| 2383 | - dictionary with keys "``index``" and "``label``". In addition to | ||
| 2384 | - describing the meaning of those keys, this tells you that the actual | ||
| 2385 | - JSON output will contain a ``pagelabels`` array, each of whose | ||
| 2386 | - elements is a dictionary that contains an ``index`` key, a ``label`` | ||
| 2387 | - key, and no other keys. | ||
| 2388 | - | ||
| 2389 | -Directness and Simplicity | ||
| 2390 | - The JSON output contains the value of every object in the file, but | ||
| 2391 | - it also contains some processed data. This is analogous to how qpdf's | ||
| 2392 | - library interface works. The processed data is similar to the helper | ||
| 2393 | - functions in that it allows you to look at certain aspects of the PDF | ||
| 2394 | - file without having to understand all the nuances of the PDF | ||
| 2395 | - specification, while the raw objects allow you to mine the PDF for | ||
| 2396 | - anything that the higher-level interfaces are lacking. | ||
| 2397 | - | ||
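The three documentation guarantees (dictionary, string, single-element array) translate directly into a recursive check of real output against the `--json-help` structure. The sketch below is an illustration, not qpdf's own validator: the function name `conforms` and the sample data are hypothetical, and the check is deliberately strict. It does not allow for fields that are null because the information is absent, or for keys omitted when only some top-level keys are requested.

```python
def conforms(help_node, actual) -> bool:
    """Check `actual` JSON data against the shape described by `help_node`."""
    if isinstance(help_node, str):
        # A string is only a description; the real value can have any format.
        return True
    if isinstance(help_node, list):
        # A one-element array describes every element of the real array.
        return (
            len(help_node) == 1
            and isinstance(actual, list)
            and all(conforms(help_node[0], item) for item in actual)
        )
    if isinstance(help_node, dict):
        if not isinstance(actual, dict):
            return False
        keys = list(help_node)
        if len(keys) == 1 and keys[0].startswith("<") and keys[0].endswith(">"):
            # Special case: any keys are allowed; each value follows the pattern.
            return all(conforms(help_node[keys[0]], v) for v in actual.values())
        return set(actual) == set(help_node) and all(
            conforms(help_node[k], actual[k]) for k in help_node
        )
    return False


# Hypothetical help fragment and outputs, modeled on the pagelabels example.
help_doc = {"pagelabels": [{"index": "starting page", "label": "label dictionary"}]}
good_output = {"pagelabels": [{"index": 0, "label": {"/S": "/D"}}]}
bad_output = {"pagelabels": [{"index": 0}]}
```

With this data, `conforms(help_doc, good_output)` holds, while `bad_output` is rejected because its element lacks the `label` key.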
| 2398 | -.. _json.limitations: | ||
| 2399 | - | ||
| 2400 | -Limitations of JSON Representation | ||
| 2401 | ----------------------------------- | ||
| 2402 | - | ||
| 2403 | -There are a few limitations to be aware of with the JSON structure: | ||
| 2404 | - | ||
| 2405 | -- Strings, names, and indirect object references in the original PDF | ||
| 2406 | - file are all converted to strings in the JSON representation. In the | ||
| 2407 | - case of a "normal" PDF file, you can tell the difference because a | ||
| 2408 | - name starts with a slash (``/``), and an indirect object reference | ||
| 2409 | - looks like ``n n R``, but if there were to be a string that looked | ||
| 2410 | - like a name or indirect object reference, there would be no way to | ||
| 2411 | - tell this from the JSON output. Note that there are certain cases | ||
| 2412 | - where you know for sure what something is, such as knowing that | ||
| 2413 | - dictionary keys in objects are always names and that certain things | ||
| 2414 | - in the higher-level computed data are known to contain indirect | ||
| 2415 | - object references. | ||
| 2416 | - | ||
| 2417 | -- The JSON format doesn't support binary data very well. Mostly the | ||
| 2418 | - details are not important, but they are presented here for | ||
| 2419 | - information. When qpdf outputs a string in the JSON representation, | ||
| 2420 | - it converts the string to UTF-8, assuming usual PDF string semantics. | ||
| 2421 | - Specifically, if the original string is UTF-16, it is converted to | ||
| 2422 | - UTF-8. Otherwise, it is assumed to have PDF doc encoding, and is | ||
| 2423 | - converted to UTF-8 with that assumption. This causes strange things | ||
| 2424 | - to happen to binary strings. For example, if you had the binary | ||
| 2425 | - string ``<038051>``, this would be output to the JSON as ``\u0003•Q`` | |
| 2426 | - because ``03`` is not a printable character and ``80`` is the bullet | ||
| 2427 | - character in PDF doc encoding and is mapped to the Unicode value | ||
| 2428 | - ``2022``. Since ``51`` is ``Q``, it is output as is. If you wanted to | ||
| 2429 | - convert back from here to a binary string, you would have to recognize | |
| 2430 | - Unicode values whose code points are higher than ``0xFF`` and map | ||
| 2431 | - those back to their corresponding PDF doc encoding characters. There | ||
| 2432 | - is no way to tell the difference between a Unicode string that was | ||
| 2433 | - originally encoded as UTF-16 or one that was converted from PDF doc | ||
| 2434 | - encoding. In other words, it's best if you don't try to use the JSON | ||
| 2435 | - format to extract binary strings from the PDF file, but if you really | ||
| 2436 | - had to, it could be done. Note that qpdf's | ||
| 2437 | - :samp:`--show-object` option does not have this | ||
| 2438 | - limitation and will reveal the string as encoded in the original | ||
| 2439 | - file. | ||
| 2440 | - | ||
| 2441 | -.. _json.considerations: | ||
| 2442 | - | ||
| 2443 | -JSON: Special Considerations | ||
| 2444 | ----------------------------- | ||
| 2445 | - | ||
| 2446 | -For the most part, the built-in JSON help tells you everything you need | ||
| 2447 | -to know about the JSON format, but there are a few non-obvious things to | ||
| 2448 | -be aware of: | ||
| 2449 | - | ||
| 2450 | -- While qpdf guarantees that keys present in the help will be present | ||
| 2451 | - in the output, those fields may be null or empty if the information | ||
| 2452 | - is not known or absent in the file. Also, if you specify | ||
| 2453 | - :samp:`--json-keys`, the keys that are not listed | ||
| 2454 | - will be excluded entirely except for those that | ||
| 2455 | - :samp:`--json-help` says are always present. | ||
| 2456 | - | ||
| 2457 | -- In a few places, there are keys with names containing | ||
| 2458 | - ``pageposfrom1``. The values of these keys are null or an integer. If | ||
| 2459 | - an integer, they point to a page index within the file numbering from | ||
| 2460 | - 1. Note that JSON indexes from 0, and you would also use 0-based | ||
| 2461 | - indexing using the API. However, 1-based indexing is easier in this | ||
| 2462 | - case because the command-line syntax for specifying page ranges is | ||
| 2463 | - 1-based. If you were going to write a program that looked through the | ||
| 2464 | - JSON for information about specific pages and then use the | ||
| 2465 | - command-line to extract those pages, 1-based indexing is easier. | ||
| 2466 | - Besides, it's more convenient to subtract 1 in a program written in a | |
| 2467 | - real programming language than it is to add 1 in shell code. | |
| 2468 | - | ||
| 2469 | -- The image information included in the ``page`` section of the JSON | ||
| 2470 | - output includes the key "``filterable``". Note that the value of this | ||
| 2471 | - field may depend on the :samp:`--decode-level` that | ||
| 2472 | - you invoke qpdf with. The JSON output includes a top-level key | ||
| 2473 | - "``parameters``" that indicates the decode level used for computing | ||
| 2474 | - whether a stream was filterable. For example, jpeg images will be | ||
| 2475 | - shown as not filterable by default, but they will be shown as | ||
| 2476 | - filterable if you run :command:`qpdf --json | ||
| 2477 | - --decode-level=all`. | ||
| 2478 | - | ||
| 2479 | -.. _ref.design: | ||
| 2480 | - | ||
| 2481 | -Design and Library Notes | ||
| 2482 | -======================== | ||
| 2483 | - | ||
| 2484 | -.. _ref.design.intro: | ||
| 2485 | - | ||
| 2486 | -Introduction | ||
| 2487 | ------------- | ||
| 2488 | - | ||
| 2489 | -This section was written prior to the implementation of the qpdf package | ||
| 2490 | -and was subsequently modified to reflect the implementation. In some | ||
| 2491 | -cases, for purposes of explanation, it may differ slightly from the | ||
| 2492 | -actual implementation. As always, the source code and test suite are | ||
| 2493 | -authoritative. Even if there are some errors, this document should serve | ||
| 2494 | -as a road map to understanding how this code works. | ||
| 2495 | - | ||
| 2496 | -In general, one should adhere strictly to a specification when writing | ||
| 2497 | -but be liberal in reading. This way, the product of our software will be | ||
| 2498 | -accepted by the widest range of other programs, and we will accept the | ||
| 2499 | -widest range of input files. This library attempts to conform to that | ||
| 2500 | -philosophy whenever possible but also aims to provide strict checking | ||
| 2501 | -for people who want to validate PDF files. If you don't want to see | ||
| 2502 | -warnings and are trying to write something that is tolerant, you can | ||
| 2503 | -call ``setSuppressWarnings(true)``. If you want to fail on the first | ||
| 2504 | -error, you can call ``setAttemptRecovery(false)``. The default behavior | ||
| 2505 | -is to generate warnings for recoverable problems. Note that recovery | |
| 2506 | -will not always produce the desired results even if it is able to get | |
| 2507 | -through the file. Unlike most other PDF software, which produces generic | |
| 2508 | -warnings such as "This file is damaged," qpdf generally issues a | |
| 2509 | -detailed error message that would be most useful to a PDF developer. | |
| 2510 | -This is by design as there seems to be a shortage of PDF validation | ||
| 2511 | -tools out there. This was, in fact, one of the major motivations behind | ||
| 2512 | -the initial creation of qpdf. | ||
| 2513 | - | ||
| 2514 | -.. _ref.design-goals: | ||
| 2515 | - | ||
| 2516 | -Design Goals | ||
| 2517 | ------------- | ||
| 2518 | - | ||
| 2519 | -The QPDF package includes support for reading and rewriting PDF files. | ||
| 2520 | -It aims to hide from the user details involving object locations, | ||
| 2521 | -modified (appended) PDF files, the directness/indirectness of objects, | ||
| 2522 | -and stream filters including encryption. It does not aim to hide | ||
| 2523 | -knowledge of the object hierarchy or content stream contents. Put | ||
| 2524 | -another way, a user of the qpdf library is expected to have knowledge | ||
| 2525 | -about how PDF files work, but is not expected to have to keep track of | ||
| 2526 | -bookkeeping details such as file positions. | ||
| 2527 | - | ||
| 2528 | -A user of the library never has to care whether an object is direct or | ||
| 2529 | -indirect, though it is possible to determine whether an object is direct | ||
| 2530 | -or not if this information is needed. All access to objects deals with | ||
| 2531 | -this transparently. All memory management details are also handled by | ||
| 2532 | -the library. | ||
| 2533 | - | ||
| 2534 | -The ``PointerHolder`` object is used internally by the library to deal | ||
| 2535 | -with memory management. This is basically a smart pointer object very | ||
| 2536 | -similar in spirit to C++11's ``std::shared_ptr`` object, but predating | |
| 2537 | -it by several years. This library also makes use of a technique for | ||
| 2538 | -giving fine-grained access to methods in one class to other classes by | ||
| 2539 | -using public subclasses with friends and only private members that in | ||
| 2540 | -turn call private methods of the containing class. See | ||
| 2541 | -``QPDFObjectHandle::Factory`` as an example. | ||
| 2542 | - | ||
| 2543 | -The top-level qpdf class is ``QPDF``. A ``QPDF`` object represents a PDF | ||
| 2544 | -file. The library provides methods for both accessing and mutating PDF | ||
| 2545 | -files. | ||
| 2546 | - | ||
| 2547 | -The primary class for interacting with PDF objects is | ||
| 2548 | -``QPDFObjectHandle``. Instances of this class can be passed around by | ||
| 2549 | -value, copied, stored in containers, etc. with very low overhead. | ||
| 2550 | -Instances of ``QPDFObjectHandle`` created by reading from a file will | ||
| 2551 | -always contain a reference back to the ``QPDF`` object from which they | ||
| 2552 | -were created. A ``QPDFObjectHandle`` may be direct or indirect. If | ||
| 2553 | -indirect, the ``QPDFObject`` the ``PointerHolder`` initially points to | ||
| 2554 | -is a null pointer. In this case, the first attempt to access the | ||
| 2555 | -underlying ``QPDFObject`` will result in the ``QPDFObject`` being | ||
| 2556 | -resolved via a call to the referenced ``QPDF`` instance. This makes it | ||
| 2557 | -essentially impossible to make coding errors in which certain things | ||
| 2558 | -will work for some PDF files and not for others based on which objects | ||
| 2559 | -are direct and which objects are indirect. | ||
| 2560 | - | ||
| 2561 | -Instances of ``QPDFObjectHandle`` can be directly created and modified | ||
| 2562 | -using static factory methods in the ``QPDFObjectHandle`` class. There | ||
| 2563 | -are factory methods for each type of object as well as a convenience | ||
| 2564 | -method ``QPDFObjectHandle::parse`` that creates an object from a string | ||
| 2565 | -representation of the object. Existing instances of ``QPDFObjectHandle`` | ||
| 2566 | -can also be modified in several ways. See comments in | ||
| 2567 | -:file:`QPDFObjectHandle.hh` for details. | ||
| 2568 | - | ||
| 2569 | -An instance of ``QPDF`` is constructed by using the class's default | ||
| 2570 | -constructor. If desired, the ``QPDF`` object may be configured with | ||
| 2571 | -various methods that change its default behavior. Then the | ||
| 2572 | -``QPDF::processFile()`` method is passed the name of a PDF file, which | ||
| 2573 | -permanently associates the file with that QPDF object. A password may | ||
| 2574 | -also be given for access to password-protected files. QPDF does not | ||
| 2575 | -enforce encryption parameters and will treat user and owner passwords | ||
| 2576 | -equivalently. Either password may be used to access an encrypted file. | ||
| 2577 | -``QPDF`` will allow recovery of a user password given an owner password. | ||
| 2578 | -The input PDF file must be seekable. (Output files written by | ||
| 2579 | -``QPDFWriter`` need not be seekable, even when creating linearized | ||
| 2580 | -files.) During construction, ``QPDF`` validates the PDF file's header, | ||
| 2581 | -and then reads the cross reference tables and trailer dictionaries. The | ||
| 2582 | -``QPDF`` class keeps only the first trailer dictionary though it does | ||
| 2583 | -read all of them so it can check the ``/Prev`` key. ``QPDF`` class users | ||
| 2584 | -may request the root object and the trailer dictionary specifically. The | ||
| 2585 | -cross reference table is kept private. Objects may then be requested by | ||
| 2586 | -number or by walking the object tree. | |
| 2587 | - | ||
| 2588 | -When a PDF file has a cross-reference stream instead of a | ||
| 2589 | -cross-reference table and trailer, requesting the document's trailer | ||
| 2590 | -dictionary returns the stream dictionary from the cross-reference stream | ||
| 2591 | -instead. | ||
| 2592 | - | ||
| 2593 | -There are some convenience routines for very common operations such as | ||
| 2594 | -walking the page tree and returning a vector of all page objects. For | ||
| 2595 | -full details, please see the header files | ||
| 2596 | -:file:`QPDF.hh` and | ||
| 2597 | -:file:`QPDFObjectHandle.hh`. There are also some | ||
| 2598 | -additional helper classes that provide higher level API functions for | ||
| 2599 | -certain document constructions. These are discussed in :ref:`ref.helper-classes`. | ||
| 2600 | - | ||
| 2601 | -.. _ref.helper-classes: | ||
| 2602 | - | ||
| 2603 | -Helper Classes | ||
| 2604 | --------------- | ||
| 2605 | - | ||
| 2606 | -QPDF version 8.1 introduced the concept of helper classes. Helper | ||
| 2607 | -classes are intended to contain higher level APIs that allow developers | ||
| 2608 | -to work with certain document constructs at an abstraction level above | ||
| 2609 | -that of ``QPDFObjectHandle`` while staying true to qpdf's philosophy of | ||
| 2610 | -not hiding document structure from the developer. As with qpdf in | ||
| 2611 | -general, the goal is to take away some of the more tedious bookkeeping | |
| 2612 | -aspects of working with PDF files, not to remove the need for the | ||
| 2613 | -developer to understand how the PDF construction in question works. The | ||
| 2614 | -driving factor behind the creation of helper classes was to allow the | ||
| 2615 | -evolution of higher level interfaces in qpdf without polluting the | ||
| 2616 | -interfaces of the main top-level classes ``QPDF`` and | ||
| 2617 | -``QPDFObjectHandle``. | ||
| 2618 | - | ||
| 2619 | -There are two kinds of helper classes: *document* helpers and *object* | ||
| 2620 | -helpers. Document helpers are constructed with a reference to a ``QPDF`` | ||
| 2621 | -object and provide methods for working with structures that are at the | ||
| 2622 | -document level. Object helpers are constructed with an instance of a | ||
| 2623 | -``QPDFObjectHandle`` and provide methods for working with specific types | ||
| 2624 | -of objects. | ||
| 2625 | - | ||
| 2626 | -Examples of document helpers include ``QPDFPageDocumentHelper``, which | ||
| 2627 | -contains methods for operating on the document's page trees, such as | ||
| 2628 | -enumerating all pages of a document and adding and removing pages; and | ||
| 2629 | -``QPDFAcroFormDocumentHelper``, which contains document-level methods | ||
| 2630 | -related to interactive forms, such as enumerating form fields and | ||
| 2631 | -creating mappings between form fields and annotations. | ||
| 2632 | - | ||
| 2633 | -Examples of object helpers include ``QPDFPageObjectHelper`` for | ||
| 2634 | -performing operations on pages such as page rotation and some operations | ||
| 2635 | -on content streams, ``QPDFFormFieldObjectHelper`` for performing | ||
| 2636 | -operations related to interactive form fields, and | ||
| 2637 | -``QPDFAnnotationObjectHelper`` for working with annotations. | ||
| 2638 | - | ||
| 2639 | -It is always possible to retrieve the underlying ``QPDF`` reference from | ||
| 2640 | -a document helper and the underlying ``QPDFObjectHandle`` reference from | ||
| 2641 | -an object helper. Helpers are designed to be helpers, not wrappers. The | ||
| 2642 | -intention is that, in general, it is safe to freely intermix operations | ||
| 2643 | -that use helpers with operations that use the underlying objects. | ||
| 2644 | -Document and object helpers do not attempt to provide a complete | ||
| 2645 | -interface for working with the things they are helping with, nor do they | ||
| 2646 | -attempt to encapsulate underlying structures. They just provide a few | ||
| 2647 | -methods to help with error-prone, repetitive, or complex tasks. In some | ||
| 2648 | -cases, a helper object may cache some information that is expensive to | ||
| 2649 | -gather. In such cases, the helper classes are implemented so that their | ||
| 2650 | -own methods keep the cache consistent, and the header file will provide | ||
| 2651 | -a method to invalidate the cache and a description of what kinds of | ||
| 2652 | -operations would make the cache invalid. If in doubt, you can always | ||
| 2653 | -discard a helper class and create a new one with the same underlying | ||
| 2654 | -objects, which will ensure that you have discarded any stale | ||
| 2655 | -information. | ||
| 2656 | - | ||
| 2657 | -By convention, document helpers are called | |
| 2658 | -``QPDFSomethingDocumentHelper`` and are derived from | ||
| 2659 | -``QPDFDocumentHelper``, and object helpers are called | ||
| 2660 | -``QPDFSomethingObjectHelper`` and are derived from ``QPDFObjectHelper``. | ||
| 2661 | -For details on specific helpers, please see their header files. You can | ||
| 2662 | -find them by looking at | ||
| 2663 | -:file:`include/qpdf/QPDF*DocumentHelper.hh` and | ||
| 2664 | -:file:`include/qpdf/QPDF*ObjectHelper.hh`. | ||
| 2665 | - | ||
| 2666 | -In order to avoid creation of circular dependencies, the following | ||
| 2667 | -general guidelines are followed with helper classes: | ||
| 2668 | - | ||
| 2669 | -- Core class interfaces do not know about helper classes. For example, | ||
| 2670 | - no methods of ``QPDF`` or ``QPDFObjectHandle`` will include helper | ||
| 2671 | - classes in their interfaces. | ||
| 2672 | - | ||
| 2673 | -- Interfaces of object helpers will usually not use document helpers in | ||
| 2674 | - their interfaces. This is because it is much more useful for document | ||
| 2675 | - helpers to have methods that return object helpers. Most operations | ||
| 2676 | - in PDF files start at the document level and go from there to the | ||
| 2677 | - object level rather than the other way around. It can sometimes be | ||
| 2678 | - useful to map back from object-level structures to document-level | ||
| 2679 | - structures. If there is a desire to do this, it will generally be | ||
| 2680 | - provided by a method in the document helper class. | ||
| 2681 | - | ||
| 2682 | -- Most of the time, object helpers don't know about other object | ||
| 2683 | - helpers. However, in some cases, one type of object may be a | ||
| 2684 | - container for another type of object, in which case it may make sense | ||
| 2685 | - for the outer object to know about the inner object. For example, | ||
| 2686 | - there are methods in the ``QPDFPageObjectHelper`` that know | ||
| 2687 | - ``QPDFAnnotationObjectHelper`` because references to annotations are | ||
| 2688 | - contained in page dictionaries. | ||
| 2689 | - | ||
| 2690 | -- Any helper or core library class may use helpers in their | ||
| 2691 | - implementations. | ||
| 2692 | - | ||
| 2693 | -Prior to qpdf version 8.1, higher level interfaces were added as | ||
| 2694 | -"convenience functions" in either ``QPDF`` or ``QPDFObjectHandle``. For | ||
| 2695 | -compatibility, older convenience functions for operating with pages will | ||
| 2696 | -remain in those classes even as alternatives are provided in helper | ||
| 2697 | -classes. Going forward, new higher level interfaces will be provided | ||
| 2698 | -using helper classes. | ||
| 2699 | - | ||
| 2700 | -.. _ref.implementation-notes: | ||
| 2701 | - | ||
| 2702 | -Implementation Notes | ||
| 2703 | --------------------- | ||
| 2704 | - | ||
| 2705 | -This section contains a few notes about QPDF's internal implementation, | ||
| 2706 | -particularly around what it does when it first processes a file. This | ||
| 2707 | -section is a bit of a simplification of what it actually does, but it | ||
| 2708 | -could serve as a starting point to someone trying to understand the | ||
| 2709 | -implementation. There is nothing in this section that you need to know | ||
| 2710 | -to use the qpdf library. | ||
| 2711 | - | ||
| 2712 | -``QPDFObject`` is the basic PDF Object class. It is an abstract base | ||
| 2713 | -class from which are derived classes for each type of PDF object. | ||
| 2714 | -Clients do not interact with Objects directly but instead interact with | ||
| 2715 | -``QPDFObjectHandle``. | ||
| 2716 | - | ||
| 2717 | -When the ``QPDF`` class creates a new object, it dynamically allocates | ||
| 2718 | -the appropriate type of ``QPDFObject`` and immediately hands the pointer | ||
| 2719 | -to an instance of ``QPDFObjectHandle``. The parser reads a token from | ||
| 2720 | -the current file position. If the token is neither a dictionary nor an | |
| 2721 | -array opener, an object is immediately constructed from the single token | |
| 2722 | -and the parser returns. Otherwise, the parser iterates in a special mode | ||
| 2723 | -in which it accumulates objects until it finds a balancing closer. | ||
| 2724 | -During this process, the "``R``" keyword is recognized and an indirect | ||
| 2725 | -``QPDFObjectHandle`` may be constructed. | ||
| 2726 | - | ||
| 2727 | -The ``QPDF::resolve()`` method, which is used to resolve an indirect | ||
| 2728 | -object, may be invoked from the ``QPDFObjectHandle`` class. It first | ||
| 2729 | -checks a cache to see whether this object has already been read. If not, | ||
| 2730 | -it reads the object from the PDF file and caches it. It then returns the | |
| 2731 | -resulting ``QPDFObjectHandle``. The calling object handle then replaces | |
| 2732 | -its ``PointerHolder<QPDFObject>`` with the one from the newly returned | |
| 2733 | -``QPDFObjectHandle``. In this way, only a single copy of any direct | |
| 2734 | -object need exist and clients can access objects transparently without | |
| 2735 | -knowing or caring whether they are direct or indirect objects. | |
| 2736 | -Additionally, no object is ever read from the file more than once. That | ||
| 2737 | -means that only the portions of the PDF file that are actually needed | ||
| 2738 | -are ever read from the input file, thus allowing the qpdf package to | ||
| 2739 | -take advantage of this important design goal of PDF files. | ||
| 2740 | - | ||
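The resolve-and-cache behavior described above can be illustrated with a minimal, self-contained sketch. The names ``Document``, ``Handle``, and ``Object`` are hypothetical stand-ins invented for this illustration, ``std::shared_ptr`` is used in place of ``PointerHolder``, and a map of strings plays the role of the input file. This is not qpdf's actual implementation.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a parsed PDF object (not a qpdf class).
struct Object
{
    std::string value;
};

// Hypothetical stand-in for the document: owns the "file" and the cache.
class Document
{
  public:
    using ObjGen = std::pair<int, int>; // object ID, generation

    // Pretend this object exists at some offset in the input file.
    void addToFile(ObjGen og, std::string const& v) { file_[og] = v; }

    // Return the cached object, reading from the "file" only on first use.
    std::shared_ptr<Object> resolve(ObjGen og)
    {
        auto it = cache_.find(og);
        if (it == cache_.end()) {
            ++reads_; // count how often the file is actually consulted
            auto obj = std::make_shared<Object>(Object{file_.at(og)});
            it = cache_.emplace(og, obj).first;
        }
        return it->second;
    }

    int reads() const { return reads_; }

  private:
    std::map<ObjGen, std::string> file_;
    std::map<ObjGen, std::shared_ptr<Object>> cache_;
    int reads_ = 0;
};

// Hypothetical indirect handle: holds a null pointer until first access.
class Handle
{
  public:
    Handle(Document& doc, Document::ObjGen og) : doc_(&doc), og_(og) {}

    std::string const& value()
    {
        if (!obj_) {
            obj_ = doc_->resolve(og_); // swap in the shared, cached object
        }
        return obj_->value;
    }

  private:
    Document* doc_;
    Document::ObjGen og_;
    std::shared_ptr<Object> obj_;
};
```

In this sketch, two handles that refer to the same object ID and generation end up sharing one ``Object``, and the simulated file is read once no matter how many handles are dereferenced.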
| 2741 | -If the requested object is inside of an object stream, the object stream | ||
| 2742 | -itself is first read into memory. Then the tokenizer reads objects from | ||
| 2743 | -the memory stream based on the offset information stored in the stream. | ||
| 2744 | -Those individual objects are cached, after which the temporary buffer | |
| 2745 | -holding the object stream contents is discarded. In this way, the first | |
| 2746 | -time an object in an object stream is requested, all objects in the | ||
| 2747 | -stream are cached. | ||
| 2748 | - | ||
| 2749 | -The following example should clarify how ``QPDF`` processes a simple | ||
| 2750 | -file. | ||
| 2751 | - | ||
| 2752 | -- Client constructs ``QPDF`` ``pdf`` and calls | ||
| 2753 | - ``pdf.processFile("a.pdf");``. | ||
| 2754 | - | ||
| 2755 | -- The ``QPDF`` class checks the beginning of | ||
| 2756 | - :file:`a.pdf` for a PDF header. It then reads the | ||
| 2757 | - cross reference table mentioned at the end of the file, ensuring that | ||
| 2758 | - it is looking before the last ``%%EOF``. After getting to the ``trailer`` | |
| 2759 | - keyword, it invokes the parser. | |
| 2760 | - | ||
| 2761 | -- The parser sees "``<<``", so it calls itself recursively in | ||
| 2762 | - dictionary creation mode. | ||
| 2763 | - | ||
| 2764 | -- In dictionary creation mode, the parser keeps accumulating objects | ||
| 2765 | - until it encounters "``>>``". Each object that is read is pushed onto | ||
| 2766 | - a stack. If "``R``" is read, the last two objects on the stack are | ||
| 2767 | - inspected. If they are integers, they are popped off the stack and | ||
| 2768 | - their values are used to construct an indirect object handle which is | ||
| 2769 | - then pushed onto the stack. When "``>>``" is finally read, the stack | ||
| 2770 | - is converted into a ``QPDF_Dictionary`` which is placed in a | ||
| 2771 | - ``QPDFObjectHandle`` and returned. | ||
| 2772 | - | ||
| 2773 | -- The resulting dictionary is saved as the trailer dictionary. | ||
| 2774 | - | ||
| 2775 | -- The ``/Prev`` key is searched. If present, ``QPDF`` seeks to that | ||
| 2776 | - point and repeats except that the new trailer dictionary is not | ||
| 2777 | - saved. If ``/Prev`` is not present, the initial parsing process is | ||
| 2778 | - complete. | ||
| 2779 | - | ||
| 2780 | - If there is an encryption dictionary, the document's encryption | ||
| 2781 | - parameters are initialized. | ||
| 2782 | - | ||
| 2783 | -- The client requests the root object. The ``QPDF`` class gets the value of | |
| 2784 | - the root key from the trailer dictionary and returns it. It is an unresolved | |
| 2785 | - indirect ``QPDFObjectHandle``. | |
| 2786 | - | |
| 2787 | -- The client requests the ``/Pages`` key from the root | |
| 2788 | - ``QPDFObjectHandle``. The ``QPDFObjectHandle`` notices that it is | |
| 2789 | - indirect so it asks ``QPDF`` to resolve it. ``QPDF`` looks in the | ||
| 2790 | - object cache for an object with the root dictionary's object ID and | ||
| 2791 | - generation number. Upon not seeing it, it checks the cross reference | ||
| 2792 | - table, gets the offset, and reads the object present at that offset. | ||
| 2793 | - It stores the result in the object cache and returns the cached | ||
| 2794 | - result. The calling ``QPDFObjectHandle`` replaces its object pointer | ||
| 2795 | - with the one from the resolved ``QPDFObjectHandle``, verifies that it | |
| 2796 | - is a valid dictionary object, and returns the (unresolved indirect) | |
| 2797 | - ``QPDFObject`` handle to the top of the Pages hierarchy. | ||
| 2798 | - | ||
| 2799 | - As the client continues to request objects, the same process is | ||
| 2800 | - followed for each new requested object. | ||
| 2801 | - | ||
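The dictionary creation step in the walkthrough above, in which ``R`` collapses the two preceding integers into an indirect reference, can be sketched as follows. ``Item`` and ``accumulate`` are hypothetical names chosen for this illustration, and the input is assumed to be already split into tokens. The real parser works on a token stream and builds ``QPDFObjectHandle`` instances rather than plain structs.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical parsed item: an integer, an indirect reference, or anything else.
struct Item
{
    enum Kind { Integer, Indirect, Other };
    Kind kind;
    int a;            // integer value, or object ID for Indirect
    int b;            // generation for Indirect, otherwise 0
    std::string text; // original token text (empty for Indirect)
};

static bool isInteger(std::string const& t)
{
    if (t.empty()) {
        return false;
    }
    for (unsigned char c : t) {
        if (!std::isdigit(c)) {
            return false;
        }
    }
    return true;
}

// Accumulate the tokens that follow "<<" until the balancing ">>" is seen.
std::vector<Item> accumulate(std::vector<std::string> const& tokens)
{
    std::vector<Item> stack;
    for (auto const& t : tokens) {
        if (t == ">>") {
            break; // balancing closer: the stack now holds the contents
        }
        std::size_t n = stack.size();
        if (t == "R" && n >= 2 && stack[n - 2].kind == Item::Integer &&
            stack[n - 1].kind == Item::Integer) {
            // Pop "id gen" and push a single indirect reference instead.
            Item ref{Item::Indirect, stack[n - 2].a, stack[n - 1].a, ""};
            stack.pop_back();
            stack.pop_back();
            stack.push_back(ref);
        } else if (isInteger(t)) {
            stack.push_back(Item{Item::Integer, std::stoi(t), 0, t});
        } else {
            stack.push_back(Item{Item::Other, 0, 0, t});
        }
    }
    return stack;
}
```

For the tokens ``/Root 1 0 R /Size 7 >>``, this sketch leaves four items on the stack: the name ``/Root``, an indirect reference to object 1 generation 0, the name ``/Size``, and the integer 7. Pairing them up into keys and values would be the final conversion step.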
| 2802 | -.. _ref.casting: | ||
| 2803 | - | ||
| 2804 | -Casting Policy | ||
| 2805 | --------------- | ||
| 2806 | - | ||
| 2807 | -This section describes the casting policy followed by qpdf's | ||
| 2808 | -implementation. This is of no concern to qpdf's end users and largely of no | |
| 2809 | -concern to people writing code that uses qpdf, but it could be of | ||
| 2810 | -interest to people who are porting qpdf to a new platform or who are | ||
| 2811 | -making modifications to the code. | ||
| 2812 | - | ||
| 2813 | -The C++ code in qpdf is free of old-style casts except where unavoidable | ||
| 2814 | -(e.g. where the old-style cast is in a macro provided by a third-party | ||
| 2815 | -header file). When there is a need for a cast, it is handled, in order | ||
| 2816 | -of preference, by rewriting the code to avoid the need for a cast, | ||
| 2817 | -calling ``const_cast``, calling ``static_cast``, calling | ||
| 2818 | -``reinterpret_cast``, or calling some combination of the above. As a | ||
| 2819 | -last resort, a compiler-specific ``#pragma`` may be used to suppress a | ||
| 2820 | -warning that we don't want to fix. Examples may include suppressing | ||
| 2821 | -warnings about the use of old-style casts in code that is shared between | ||
| 2822 | -C and C++ code. | ||
| 2823 | - | ||
| 2824 | -The ``QIntC`` namespace, provided by | ||
| 2825 | -:file:`include/qpdf/QIntC.hh`, implements safe | ||
| 2826 | -functions for converting between integer types. These functions do range | ||
| 2827 | -checking and throw a ``std::range_error``, which is a subclass of | |
| 2828 | -``std::runtime_error``, if conversion from one integer type to another | ||
| 2829 | -results in loss of information. There are many cases in which we have to | ||
| 2830 | -move between different integer types because of incompatible integer | ||
| 2831 | -types used in interoperable interfaces. Some are unavoidable, such as | ||
| 2832 | -moving between sizes and offsets, and others are there because of old | ||
| 2833 | -code that is too entrenched to be fixable without breaking source | |
| 2834 | -compatibility and causing pain for users. QPDF is compiled with extra | ||
| 2835 | -warnings to detect conversions with potential data loss, and all such | ||
| 2836 | -cases should be fixed by either using a function from ``QIntC`` or a | ||
| 2837 | -``static_cast``. | ||
| 2838 | - | ||
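To make the idea of a range-checked conversion concrete, here is a minimal sketch of one way such a function could be written. ``checked_convert`` is a hypothetical name used only for this illustration. It is not the interface of ``QIntC``, whose actual functions are declared in :file:`include/qpdf/QIntC.hh`.

```cpp
#include <stdexcept>
#include <type_traits>

// Hypothetical range-checked conversion between integer types. It throws
// std::range_error if the value cannot be represented in the target type.
template <typename To, typename From>
To checked_convert(From value)
{
    static_assert(
        std::is_integral<To>::value && std::is_integral<From>::value,
        "checked_convert only works with integer types");
    To result = static_cast<To>(value);
    // The conversion is lossless only if converting back gives the original
    // value and the sign did not change along the way.
    bool round_trips = (static_cast<From>(result) == value);
    bool same_sign = ((result < To(0)) == (value < From(0)));
    if (!(round_trips && same_sign)) {
        throw std::range_error("integer conversion would lose information");
    }
    return result;
}
```

With this sketch, ``checked_convert<int>(42LL)`` returns 42, while ``checked_convert<unsigned char>(300)`` and ``checked_convert<unsigned int>(-1)`` both throw. A bare ``static_cast`` would silently produce 44 and a large positive value in those two cases.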
| 2839 | -When the intention is just to switch the type because of exchanging data | ||
| 2840 | -between incompatible interfaces, use ``QIntC``. This is the usual case. | ||
| 2841 | -However, there are some cases in which we are explicitly intending to | ||
| 2842 | -use the exact same bit pattern with a different type. This is most | ||
| 2843 | -common when switching between signed and unsigned characters. A lot of | ||
| 2844 | -qpdf's code uses unsigned characters internally, but ``std::string`` uses | |
| 2845 | -``char``, which is signed on many platforms. Using ``QIntC::to_char`` would be wrong for | |
| 2846 | -converting from unsigned to signed characters because a negative | ||
| 2847 | -``char`` value and the corresponding ``unsigned char`` value greater | ||
| 2848 | -than 127 *mean the same thing*. There are also | ||
| 2849 | -cases in which we use ``static_cast`` when working with bit fields where | ||
| 2850 | -we are not representing a numerical value but rather a bunch of bits | ||
| 2851 | -packed together in some integer type. Also note that ``size_t`` and | ||
| 2852 | -``long`` both typically differ between 32-bit and 64-bit environments, | ||
| 2853 | -so sometimes an explicit cast may not be needed to avoid warnings on one | ||
| 2854 | -platform but may be needed on another. A conversion with ``QIntC`` | ||
| 2855 | -should always be used when the types are different even if the | ||
| 2856 | -underlying size is the same. QPDF's CI build builds on 32-bit and 64-bit | ||
| 2857 | -platforms, and the test suite is very thorough, so it is hard to make | ||
| 2858 | -any of the potential errors here without being caught in build or test. | ||
| 2859 | - | ||
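The distinction between converting a value and reusing a bit pattern can be shown with a short sketch. ``bytes_to_string`` and ``byte_at`` are hypothetical helpers written for this illustration and are not part of qpdf.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: copy raw bytes into a std::string, keeping each
// byte's bit pattern exactly as it is.
std::string bytes_to_string(std::vector<unsigned char> const& bytes)
{
    std::string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        // A range-checked conversion would reject b > 127 wherever char is
        // signed. The intent here is to keep the same 8 bits, so a plain
        // static_cast is the right tool.
        out.push_back(static_cast<char>(b));
    }
    return out;
}

// Hypothetical helper: read a byte back out as an unsigned value.
unsigned char byte_at(std::string const& s, std::size_t i)
{
    return static_cast<unsigned char>(s.at(i));
}
```

Storing the bytes ``03 80 51`` and reading the second one back with ``byte_at`` yields ``0x80`` again, whether or not ``char`` is signed on the platform.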
| 2860 | -Non-const ``unsigned char*`` is used in the ``Pipeline`` interface. The | ||
| 2861 | -pipeline interface has a ``write`` call that uses ``unsigned char*`` | ||
| 2862 | -without a ``const`` qualifier. The main reason for this is | ||
| 2863 | -to support pipelines that make calls to third-party libraries, such as | ||
| 2864 | -zlib, that don't include ``const`` in their interfaces. Unfortunately, | ||
| 2865 | -there are many places in the code where it is desirable to have | ||
| 2866 | -``const char*`` with pipelines. None of the pipeline implementations | ||
| 2867 | -in qpdf | ||
| 2868 | -currently modify the data passed to write, and doing so would be counter | ||
| 2869 | -to the intent of ``Pipeline``, but there is nothing in the code to | ||
| 2870 | -prevent this from being done. There are places in the code where | ||
| 2871 | -``const_cast`` is used to remove the const-ness of pointers going into | ||
| 2872 | -``Pipeline``\ s. This could theoretically be unsafe, but there is | ||
| 2873 | -adequate testing to assert that it is safe and will remain safe in | ||
| 2874 | -qpdf's code. | ||
| 2875 | - | ||
| 2876 | -.. _ref.encryption: | ||
| 2877 | - | ||
| 2878 | -Encryption | ||
| 2879 | ----------- | ||
| 2880 | - | ||
| 2881 | -Encryption is supported transparently by qpdf. When opening a PDF file, | ||
| 2882 | -if an encryption dictionary exists, the ``QPDF`` object processes this | ||
| 2883 | -dictionary using the password (if any) provided. The primary decryption | ||
| 2884 | -key is computed and cached. No further access is made to the encryption | ||
| 2885 | -dictionary after that time. When an object is read from a file, the | ||
| 2886 | -object ID and generation of the object in which it is contained is | ||
| 2887 | -always known. Using this information along with the stored encryption | ||
| 2888 | -key, all stream and string objects are transparently decrypted. Raw | ||
| 2889 | -encrypted objects are never stored in memory. This way, nothing in the | ||
| 2890 | -library ever has to know or care whether it is reading an encrypted | ||
| 2891 | -file. | ||
| 2892 | - | ||
| 2893 | -An interface is also provided for writing encrypted streams and strings | ||
| 2894 | -given an encryption key. This is used by ``QPDFWriter`` when it rewrites | ||
| 2895 | -encrypted files. | ||
| 2896 | - | ||
| 2897 | -When copying encrypted files, unless otherwise directed, qpdf will | ||
| 2898 | -preserve any encryption in force in the original file. qpdf can do this | ||
| 2899 | -with either the user or the owner password. There is no difference in | ||
| 2900 | -capability based on which password is used. When 40-bit or 128-bit | |
| 2901 | -encryption keys are used, the user password can be recovered with the | |
| 2902 | -owner password. With 256-bit keys, the user and owner passwords are used | |
| 2903 | -independently to encrypt the actual encryption key, so while either can | ||
| 2904 | -be used, the owner password can no longer be used to recover the user | ||
| 2905 | -password. | ||
| 2906 | - | ||
| 2907 | -Starting with version 4.0.0, qpdf can read files that are not encrypted | ||
| 2908 | -but that contain encrypted attachments, but it cannot write such files. | ||
| 2909 | -qpdf also requires the password to be specified in order to open the | ||
| 2910 | -file, not just to extract attachments, since once the file is open, all | ||
| 2911 | -decryption is handled transparently. When copying files like this while | ||
| 2912 | -preserving encryption, qpdf will apply the file's encryption to | ||
| 2913 | -everything in the file, not just to the attachments. When decrypting the | ||
| 2914 | -file, qpdf will decrypt the attachments. In general, when copying PDF | ||
| 2915 | -files with multiple encryption formats, qpdf will choose the newest | ||
| 2916 | -format. The only exception to this is that clear-text metadata will be | ||
| 2917 | -preserved as clear-text if it is that way in the original file. | ||
| 2918 | - | ||
| 2919 | -One point of confusion some people have about encrypted PDF files is | ||
| 2920 | -that encryption is not the same as password protection. Password | ||
| 2921 | -protected files are always encrypted, but it is also possible to create | ||
| 2922 | -encrypted files that do not have passwords. Internally, such files use | ||
| 2923 | -the empty string as a password, and most readers try the empty string | ||
| 2924 | -first to see if it works and prompt for a password only if the empty | ||
| 2925 | -string doesn't work. Normally such files have an empty user password and | ||
| 2926 | -a non-empty owner password. In that way, if the file is opened by an | ||
| 2927 | -ordinary reader without specification of password, the restrictions | ||
| 2928 | -specified in the encryption dictionary can be enforced. Most users | ||
| 2929 | -wouldn't even realize such a file was encrypted. Since qpdf always | ||
| 2930 | -ignores the restrictions (except for the purpose of reporting what they | ||
| 2931 | -are), qpdf doesn't care which password you use. QPDF will allow you to | ||
| 2932 | -create PDF files with non-empty user passwords and empty owner | ||
| 2933 | -passwords. Some readers will require a password when you open these | ||
| 2934 | -files, and others will open the files without a password and not enforce | ||
| 2935 | -restrictions. Having a non-empty user password and an empty owner | ||
| 2936 | -password doesn't really make sense because it would mean that opening | ||
| 2937 | -the file with the user password would be more restrictive than not | ||
| 2938 | -supplying a password at all. QPDF also allows you to create PDF files | ||
| 2939 | -with the same password as both the user and owner password. Some readers | ||
| 2940 | -will not ever allow such files to be accessed without restrictions | ||
| 2941 | -because they never try the password as the owner password if it works as | ||
| 2942 | -the user password. Nonetheless, one of the powerful aspects of qpdf is | ||
| 2943 | -that it allows you to finely specify the way encrypted files are | ||
| 2944 | -created, even if the results are not useful to some readers. One use | ||
| 2945 | -case for this would be for testing a PDF reader to ensure that it | ||
| 2946 | -handles odd configurations of input files. | ||
| 2947 | - | ||
| 2948 | -.. _ref.random-numbers: | ||
| 2949 | - | ||
| 2950 | -Random Number Generation | ||
| 2951 | ------------------------- | ||
| 2952 | - | ||
| 2953 | -QPDF generates random numbers to support generation of encrypted data. | ||
| 2954 | -Starting in qpdf 10.0.0, qpdf uses the crypto provider as its source of | ||
| 2955 | -random numbers. Older versions used the OS-provided source of secure | ||
| 2956 | -random numbers or, if allowed at build time, insecure random numbers | ||
| 2957 | -from stdlib. Starting with version 5.1.0, you can disable use of | ||
| 2958 | -OS-provided secure random numbers at build time. This is especially | ||
| 2959 | -useful on Windows if you want to avoid a dependency on Microsoft's | ||
| 2960 | -cryptography API. You can also supply your own random data provider. For | ||
| 2961 | -details on how to do this, please refer to the top-level README.md file | ||
| 2962 | -in the source distribution and to comments in | ||
| 2963 | -:file:`QUtil.hh`. | ||
| 2964 | - | ||
| 2965 | -.. _ref.adding-and-remove-pages: | ||
| 2966 | - | ||
| 2967 | -Adding and Removing Pages | ||
| 2968 | -------------------------- | ||
| 2969 | - | ||
| 2970 | -While qpdf's API has supported adding and modifying objects for some | ||
| 2971 | -time, version 3.0 introduces specific methods for adding and removing | ||
| 2972 | -pages. These are largely convenience routines that handle two tricky | ||
| 2973 | -issues: pushing inheritable resources from the ``/Pages`` tree down to | ||
| 2974 | -individual pages and manipulation of the ``/Pages`` tree itself. For | ||
| 2975 | -details, see ``addPage`` and surrounding methods in | ||
| 2976 | -:file:`QPDF.hh`. | ||
| 2977 | - | ||
| 2978 | -.. _ref.reserved-objects: | ||
| 2979 | - | ||
| 2980 | -Reserving Object Numbers | ||
| 2981 | ------------------------- | ||
| 2982 | - | ||
| 2983 | -Version 3.0 of qpdf introduced the concept of reserved objects. These | ||
| 2984 | -are seldom needed for ordinary operations, but there are cases in which | ||
| 2985 | -you may want to add a series of indirect objects with references to each | ||
| 2986 | -other to a ``QPDF`` object. This causes a problem because you can't | ||
| 2987 | -determine the object ID that a new indirect object will have until you | ||
| 2988 | -add it to the ``QPDF`` object with ``QPDF::makeIndirectObject``. The | ||
| 2989 | -only way to add two mutually referential objects to a ``QPDF`` object | ||
| 2990 | -prior to version 3.0 would be to add the new objects first and then make | ||
| 2991 | -them refer to each other after adding them. Now it is possible to create | ||
| 2992 | -a *reserved object* using | ||
| 2993 | -``QPDFObjectHandle::newReserved``. This is an indirect object that stays | ||
| 2994 | -"unresolved" even if it is queried for its type. So now, if you want to | ||
| 2995 | -create a set of mutually referential objects, you can create | ||
| 2996 | -reservations for each one of them and use those reservations to | ||
| 2997 | -construct the references. When finished, you can call | ||
| 2998 | -``QPDF::replaceReserved`` to replace the reserved objects with the real | ||
| 2999 | -ones. This functionality will never be needed by most applications, but | ||
| 3000 | -it is used internally by QPDF when copying objects from other PDF files, | ||
| 3001 | -as discussed in :ref:`ref.foreign-objects`. For an example of how to use reserved | ||
| 3002 | -objects, search for ``newReserved`` in | ||
| 3003 | -:file:`test_driver.cc` in qpdf's sources. | ||
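The reservation idea described above can be illustrated with a small toy model. This is only a sketch in Python, not qpdf's C++ API: ``ObjectStore``, ``new_reserved``, and ``replace_reserved`` are hypothetical names chosen for illustration. The point is that each reservation hands out an object ID up front, so two objects can refer to each other before either one has real content.

```python
# Toy model only: ObjectStore and its methods are hypothetical stand-ins,
# not the qpdf API.
class ObjectStore:
    RESERVED = object()  # placeholder marking an ID with no real value yet

    def __init__(self):
        self.objects = {}   # object id -> value or RESERVED
        self.next_id = 1

    def new_reserved(self):
        """Hand out an object ID now; the real value arrives later."""
        oid = self.next_id
        self.next_id += 1
        self.objects[oid] = self.RESERVED
        return oid

    def replace_reserved(self, oid, value):
        """Swap the placeholder for the real object."""
        if self.objects.get(oid) is not self.RESERVED:
            raise ValueError(f"object {oid} is not reserved")
        self.objects[oid] = value


store = ObjectStore()
a = store.new_reserved()
b = store.new_reserved()
# Both IDs are known, so each object can reference the other.
store.replace_reserved(a, {"/Peer": ("ref", b)})
store.replace_reserved(b, {"/Peer": ("ref", a)})
```

Without the reservation step, the ID of ``b`` would not be known when ``a`` is built, which is the problem the section describes.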
| 3004 | - | ||
| 3005 | -.. _ref.foreign-objects: | ||
| 3006 | - | ||
| 3007 | -Copying Objects From Other PDF Files | ||
| 3008 | ------------------------------------- | ||
| 3009 | - | ||
| 3010 | -Version 3.0 of qpdf introduced the ability to copy objects into a | ||
| 3011 | -``QPDF`` object from a different ``QPDF`` object, which we refer to as | ||
| 3012 | -*foreign objects*. This allows arbitrary | ||
| 3013 | -merging of PDF files. The "from" ``QPDF`` object must remain valid after | ||
| 3014 | -the copy as discussed in the note below. The | ||
| 3015 | -:command:`qpdf` command-line tool provides limited | ||
| 3016 | -support for basic page selection, including merging in pages from other | ||
| 3017 | -files, but the library's API makes it possible to implement arbitrarily | ||
| 3018 | -complex merging operations. The main method for copying foreign objects | ||
| 3019 | -is ``QPDF::copyForeignObject``. This takes an indirect object from | ||
| 3020 | -another ``QPDF`` and copies it recursively into this object while | ||
| 3021 | -preserving all object structure, including circular references. This | ||
| 3022 | -means you can add a direct object that you create from scratch to a | ||
| 3023 | -``QPDF`` object with ``QPDF::makeIndirectObject``, and you can add an | ||
| 3024 | -indirect object from another file with ``QPDF::copyForeignObject``. The | ||
| 3025 | -fact that ``QPDF::makeIndirectObject`` does not automatically detect a | ||
| 3026 | -foreign object and copy it is an explicit design decision. Copying a | ||
| 3027 | -foreign object seems like a sufficiently significant thing to do that it | ||
| 3028 | -should be done explicitly. | ||
| 3029 | - | ||
| 3030 | -The other way to copy foreign objects is by passing a page from one | ||
| 3031 | -``QPDF`` to another by calling ``QPDF::addPage``. In contrast to | ||
| 3032 | -``QPDF::makeIndirectObject``, this method automatically distinguishes | ||
| 3033 | -between indirect objects in the current file, foreign objects, and | ||
| 3034 | -direct objects. | ||
| 3035 | - | ||
| 3036 | -Please note: when you copy objects from one ``QPDF`` to another, the | ||
| 3037 | -source ``QPDF`` object must remain valid until you have finished with | ||
| 3038 | -the destination object. This is because the original object is still | ||
| 3039 | -used to retrieve any referenced stream data from the copied object. | ||
| 3040 | - | ||
| 3041 | -.. _ref.rewriting: | ||
| 3042 | - | ||
| 3043 | -Writing PDF Files | ||
| 3044 | ------------------ | ||
| 3045 | - | ||
| 3046 | -The qpdf library supports file writing of ``QPDF`` objects to PDF files | ||
| 3047 | -through the ``QPDFWriter`` class. The ``QPDFWriter`` class has two | ||
| 3048 | -writing modes: one for non-linearized files, and one for linearized | ||
| 3049 | -files. See :ref:`ref.linearization` for a description of how | ||
| 3050 | -linearization is implemented. This section describes how we write | ||
| 3051 | -non-linearized files including the creation of QDF files (see :ref:`ref.qdf`). | ||
| 3052 | - | ||
| 3053 | -This outline was written prior to implementation and is not exactly | ||
| 3054 | -accurate, but it provides a correct "notional" idea of how writing | ||
| 3055 | -works. Look at the code in ``QPDFWriter`` for exact details. | ||
| 3056 | - | ||
| 3057 | -- Initialize state: | ||
| 3058 | - | ||
| 3059 | - - next object number = 1 | ||
| 3060 | - | ||
| 3061 | - - object queue = empty | ||
| 3062 | - | ||
| 3063 | - - renumber table: old object id/generation to new id/0 = empty | ||
| 3064 | - | ||
| 3065 | - - xref table: new id -> offset = empty | ||
| 3066 | - | ||
| 3067 | -- Create a QPDF object from a file. | ||
| 3068 | - | ||
| 3069 | -- Write header for new PDF file. | ||
| 3070 | - | ||
| 3071 | -- Request the trailer dictionary. | ||
| 3072 | - | ||
| 3073 | -- For each value that is an indirect object, grab the next object | ||
| 3074 | - number (via an operation that returns and increments the number). Map | ||
| 3075 | - object to new number in renumber table. Push object onto queue. | ||
| 3076 | - | ||
| 3077 | -- While there are more objects on the queue: | ||
| 3078 | - | ||
| 3079 | - - Pop queue. | ||
| 3080 | - | ||
| 3081 | - - Look up object's new number *n* in the renumbering table. | ||
| 3082 | - | ||
| 3083 | - - Store current offset into xref table. | ||
| 3084 | - | ||
| 3085 | - - Write ``:samp:`{n}` 0 obj``. | ||
| 3086 | - | ||
| 3087 | - - If object is null, whether direct or indirect, write out null, | ||
| 3088 | - thus eliminating unresolvable indirect object references. | ||
| 3089 | - | ||
| 3090 | -  - If the object is a stream, write stream contents, piped | ||
| 3091 | - through any filters as required, to a memory buffer. Use this | ||
| 3092 | - buffer to determine the stream length. | ||
| 3093 | - | ||
| 3094 | - - If object is not a stream, array, or dictionary, write out its | ||
| 3095 | - contents. | ||
| 3096 | - | ||
| 3097 | - - If object is an array or dictionary (including stream), traverse | ||
| 3098 | - its elements (for array) or values (for dictionaries), handling | ||
| 3099 | - recursive dictionaries and arrays, looking for indirect objects. | ||
| 3100 | - When an indirect object is found, if it is not resolvable, ignore. | ||
| 3101 | - (This case is handled when writing it out.) Otherwise, look it up | ||
| 3102 | - in the renumbering table. If not found, grab the next available | ||
| 3103 | - object number, assign to the referenced object in the renumbering | ||
| 3104 | - table, and push the referenced object onto the queue. As a special | ||
| 3105 | - case, when writing out a stream dictionary, replace length, | ||
| 3106 | - filters, and decode parameters as required. | ||
| 3107 | - | ||
| 3108 | - Write out dictionary or array, replacing any unresolvable indirect | ||
| 3109 | -     object references with null (the PDF spec says a reference to a | ||
| 3110 | -     non-existent object is legal and resolves to null) and any | ||
| 3111 | - resolvable ones with references to the renumbered objects. | ||
| 3112 | - | ||
| 3113 | - - If the object is a stream, write ``stream\n``, the stream contents | ||
| 3114 | - (from the memory buffer), and ``\nendstream\n``. | ||
| 3115 | - | ||
| 3116 | - - When done, write ``endobj``. | ||
| 3117 | - | ||
| 3118 | -Once we have finished the queue, all referenced objects will have been | ||
| 3119 | -written out and all deleted objects or unreferenced objects will have | ||
| 3120 | -been skipped. The new cross-reference table will contain an offset for | ||
| 3121 | -every new object number from 1 up to the number of objects written. This | ||
| 3122 | -can be used to write out a new xref table. Finally we can write out the | ||
| 3123 | -trailer dictionary with appropriately computed /ID (see spec, 8.3, File | ||
| 3124 | -Identifiers), the cross reference table offset, and ``%%EOF``. | ||
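The queue-driven renumbering in the outline above can be sketched in a few lines of Python. This is a notional model under stated assumptions, not the ``QPDFWriter`` code: ``Ref`` and ``renumber_and_order`` are hypothetical names, objects are plain Python values keyed by ``(id, generation)``, and streams, offsets, and actual output are left out.

```python
from collections import deque


class Ref:
    """Hypothetical stand-in for an indirect reference."""
    def __init__(self, obj_id, gen=0):
        self.key = (obj_id, gen)


def renumber_and_order(objects, trailer):
    """Return (renumber table, write order) for objects reachable from trailer."""
    renumber = {}          # old (id, gen) -> new object number
    queue = deque()
    next_number = 1

    def enqueue(value):
        nonlocal next_number
        if isinstance(value, Ref):
            # Unresolvable references are skipped here; a writer would
            # emit null for them when writing the containing object.
            if value.key in objects and value.key not in renumber:
                renumber[value.key] = next_number
                next_number += 1
                queue.append(value.key)
        elif isinstance(value, dict):
            for item in value.values():
                enqueue(item)
        elif isinstance(value, list):
            for item in value:
                enqueue(item)

    enqueue(trailer)
    order = []
    while queue:
        key = queue.popleft()
        order.append((renumber[key], key))
        enqueue(objects[key])   # discover references while "writing"
    return renumber, order


objects = {
    (10, 0): {"/Pages": Ref(20)},
    (20, 0): {"/Kids": [Ref(30), Ref(99)], "/Parent": Ref(10)},
    (30, 0): "page",
    (40, 0): "orphan",     # never referenced, so never written
}
renumber, order = renumber_and_order(objects, {"/Root": Ref(10)})
```

In this model the circular ``/Parent`` reference is harmless because an object already in the renumber table is not queued again, and the dangling reference to object 99 is simply never queued.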
| 3125 | - | ||
| 3126 | -.. _ref.filtered-streams: | ||
| 3127 | - | ||
| 3128 | -Filtered Streams | ||
| 3129 | ----------------- | ||
| 3130 | - | ||
| 3131 | -Support for streams is implemented through the ``Pipeline`` interface | ||
| 3132 | -which was designed for this package. | ||
| 3133 | - | ||
| 3134 | -When reading streams, create a series of ``Pipeline`` objects. The | ||
| 3135 | -``Pipeline`` abstract base class requires implementations of ``write()`` and | ||
| 3136 | -``finish()`` and provides an implementation of ``getNext()``. Each | ||
| 3137 | -pipeline object, upon receiving data, does whatever it is going to do | ||
| 3138 | -and then writes the data (possibly modified) to its successor. | ||
| 3139 | -Alternatively, a pipeline may be an end-of-the-line pipeline that does | ||
| 3140 | -something like store its output to a file or a memory buffer ignoring a | ||
| 3141 | -successor. For additional details, look at | ||
| 3142 | -:file:`Pipeline.hh`. | ||
| 3143 | - | ||
| 3144 | -``QPDF`` can read raw or filtered streams. When reading a filtered | ||
| 3145 | -stream, the ``QPDF`` class creates one ``Pipeline`` object for each | ||
| 3146 | -appropriate filter and chains them together. The last filter | ||
| 3147 | -should write to whatever type of output is required. The ``QPDF`` class | ||
| 3148 | -has an interface to write raw or filtered stream contents to a given | ||
| 3149 | -pipeline. | ||
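As a rough illustration of the chaining described above, here is a minimal Python sketch. It models the idea only and is not the C++ interface in :file:`Pipeline.hh`: the class names ``Inflate`` and ``Buffer`` and the method name ``get_next`` are assumptions made for this example. One stage decompresses data and hands it to its successor; the last stage is an end-of-the-line pipeline that stores what it receives.

```python
import zlib


class Pipeline:
    """Base class: subclasses implement write() and finish()."""
    def __init__(self, next_pipeline=None):
        self._next = next_pipeline

    def get_next(self):
        return self._next

    def write(self, data):
        raise NotImplementedError

    def finish(self):
        raise NotImplementedError


class Inflate(Pipeline):
    """Decompress incoming data and pass the result downstream."""
    def __init__(self, next_pipeline):
        super().__init__(next_pipeline)
        self._z = zlib.decompressobj()

    def write(self, data):
        self.get_next().write(self._z.decompress(data))

    def finish(self):
        self.get_next().write(self._z.flush())
        self.get_next().finish()


class Buffer(Pipeline):
    """End-of-the-line pipeline that collects its input in memory."""
    def __init__(self):
        super().__init__(None)
        self.chunks = []
        self.finished = False

    def write(self, data):
        self.chunks.append(data)

    def finish(self):
        self.finished = True

    def value(self):
        return b"".join(self.chunks)


sink = Buffer()
head = Inflate(sink)
compressed = zlib.compress(b"BT /F1 12 Tf ET")
head.write(compressed[:5])      # data may arrive in arbitrary pieces
head.write(compressed[5:])
head.finish()
```

The caller only ever talks to the head of the chain; swapping ``Buffer`` for a stage that writes to a file would not change the code that feeds the data in.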
| 3150 | - | ||
| 3151 | -.. _ref.object-accessors: | ||
| 3152 | - | ||
| 3153 | -Object Accessor Methods | ||
| 3154 | ------------------------ | ||
| 3155 | - | ||
| 3156 | -.. | ||
| 3157 | - This section is referenced in QPDFObjectHandle.hh | ||
| 3158 | - | ||
| 3159 | -For general information about how to access instances of | ||
| 3160 | -``QPDFObjectHandle``, please see the comments in | ||
| 3161 | -:file:`QPDFObjectHandle.hh`. Search for "Accessor | ||
| 3162 | -methods". This section provides a more in-depth discussion of the | ||
| 3163 | -behavior and the rationale for the behavior. | ||
| 3164 | - | ||
| 3165 | -*Why were type errors made into warnings?* When type checks were | ||
| 3166 | -introduced into qpdf in the early days, it was expected that type errors | ||
| 3167 | -would only occur as a result of programmer error. However, in practice, | ||
| 3168 | -type errors would occur with malformed PDF files because of assumptions | ||
| 3169 | -made in code, including code within the qpdf library and code written by | ||
| 3170 | -library users. The most common case would be chaining calls to | ||
| 3171 | -``getKey()`` to access keys deep within a dictionary. In many cases, | ||
| 3172 | -qpdf would be able to recover from these situations, but the old | ||
| 3173 | -behavior often resulted in crashes rather than graceful recovery. For | ||
| 3174 | -this reason, the errors were changed to warnings. | ||
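The difference between failing hard and warning can be seen in a small model of chained key lookups. This is an illustrative Python sketch, not the ``QPDFObjectHandle`` implementation: the ``Handle`` class, its method names, and the warning text are all hypothetical. A lookup on something that is not a dictionary records a warning and yields a null handle, so the calling code keeps running.

```python
class Handle:
    """Hypothetical stand-in for an object handle."""
    def __init__(self, value, warnings):
        self.value = value
        self.warnings = warnings    # shared list collecting warnings

    def get_key(self, key):
        if not isinstance(self.value, dict):
            # Warn and return null instead of raising.
            self.warnings.append(
                f"expected dictionary when reading {key}; treating as null")
            return Handle(None, self.warnings)
        return Handle(self.value.get(key), self.warnings)

    def is_null(self):
        return self.value is None


warnings = []
# Malformed input: /Pages should be a dictionary but is an integer.
root = Handle({"/Root": {"/Pages": 42}}, warnings)
leaf = root.get_key("/Root").get_key("/Pages").get_key("/Kids")
```

Here the chain ends in a null handle with one recorded warning, which the caller can report or ignore, rather than an exception raised partway through the chain.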
| 3175 | - | ||
| 3176 | -*Why even warn about type errors when the user can't usually do anything | ||
| 3177 | -about them?* Type warnings are extremely valuable during development. | ||
| 3178 | -Since it's impossible to catch at compile time things like typos in | ||
| 3179 | -dictionary key names or logic errors around what the structure of a PDF | ||
| 3180 | -file might be, the presence of type warnings can save lots of developer | ||
| 3181 | -time. They have also proven useful in exposing issues in qpdf itself | ||
| 3182 | -that would have otherwise gone undetected. | ||
| 3183 | - | ||
| 3184 | -*Can there be a type-safe ``QPDFObjectHandle``?* It would be great if | ||
| 3185 | -``QPDFObjectHandle`` could be more strongly typed so that you'd have to | ||
| 3186 | -check that something was of a particular type before calling | ||
| 3187 | -type-specific accessor methods. However, implementing this at this stage | ||
| 3188 | -of the library's history would be quite difficult, and it would make | ||
| 3189 | -the common pattern of drilling into an object no longer work. While it | ||
| 3190 | -would be possible to have a parallel interface, it would create a lot of | ||
| 3191 | -extra code. If qpdf were written in a language like Rust, an interface | ||
| 3192 | -like this would make a lot of sense, but, for a variety of reasons, the | ||
| 3193 | -qpdf API is consistent with other APIs of its time, relying on exception | ||
| 3194 | -handling to catch errors. The underlying PDF objects are inherently not | ||
| 3195 | -type-safe. Forcing stronger type safety in ``QPDFObjectHandle`` would | ||
| 3196 | -ultimately cause a lot more code to have to be written and would likely | ||
| 3197 | -make software that uses qpdf more brittle, and even so, checks would | ||
| 3198 | -have to occur at runtime. | ||
| 3199 | - | ||
| 3200 | -*Why do type errors sometimes raise exceptions?* The way warnings work | ||
| 3201 | -in qpdf requires a ``QPDF`` object to be associated with an object | ||
| 3202 | -handle for a warning to be issued. It would be nice if this could be | ||
| 3203 | -fixed, but it would require major changes to the API. Rather than | ||
| 3204 | -throwing away these conditions, we convert them to exceptions. It's not | ||
| 3205 | -that bad though. Since any object handle that was read from a file has | ||
| 3206 | -an associated ``QPDF`` object, it would only be type errors on objects | ||
| 3207 | -that were created explicitly that would cause exceptions, and in that | ||
| 3208 | -case, type errors are much more likely to be the result of a coding | ||
| 3209 | -error than invalid input. | ||
| 3210 | - | ||
| 3211 | -*Why does the behavior of a type exception differ between the C and C++ | ||
| 3212 | -API?* There is no way to throw and catch exceptions in C short of | ||
| 3213 | -something like ``setjmp`` and ``longjmp``, and that approach is not | ||
| 3214 | -portable across language barriers. Since the C API is often used from | ||
| 3215 | -other languages, it's important to keep things as simple as possible. | ||
| 3216 | -Starting in qpdf 10.5, exceptions that used to crash code using the C | ||
| 3217 | -API will be written to stderr by default, and it is possible to register | ||
| 3218 | -an error handler. There's no reason that the error handler can't | ||
| 3219 | -simulate exception handling in some way, such as by using ``setjmp`` and | ||
| 3220 | -``longjmp`` or by setting some variable that can be checked after | ||
| 3221 | -library calls are made. In retrospect, it might have been better if the | ||
| 3222 | -C API object handle methods returned error codes like the other methods | ||
| 3223 | -and set return values in passed-in pointers, but this would complicate | ||
| 3224 | -both the implementation and the use of the library for a case that is | ||
| 3225 | -actually quite rare and largely avoidable. | ||
| 3226 | - | ||
| 3227 | -.. _ref.linearization: | ||
| 3228 | - | ||
| 3229 | -Linearization | ||
| 3230 | -============= | ||
| 3231 | - | ||
| 3232 | -This chapter describes how ``QPDF`` and ``QPDFWriter`` implement | ||
| 3233 | -creation and processing of linearized PDFs. | ||
| 3234 | - | ||
| 3235 | -.. _ref.linearization-strategy: | ||
| 3236 | - | ||
| 3237 | -Basic Strategy for Linearization | ||
| 3238 | --------------------------------- | ||
| 3239 | - | ||
| 3240 | -To avoid the incestuous problem of having the qpdf library validate its | ||
| 3241 | -own linearized files, we have a special linearized file checking mode | ||
| 3242 | -which can be invoked via :command:`qpdf | ||
| 3243 | ---check-linearization` (or :command:`qpdf | ||
| 3244 | ---check`). This mode reads the linearization parameter | ||
| 3245 | -dictionary and the hint streams and validates that object ordering, | ||
| 3246 | -parameters, and hint stream contents are correct. The validation code | ||
| 3247 | -was first tested against linearized files created by external tools | ||
| 3248 | -(Acrobat and pdlin) and then used to validate files created by | ||
| 3249 | -``QPDFWriter`` itself. | ||
| 3250 | - | ||
| 3251 | -.. _ref.linearized.preparation: | ||
| 3252 | - | ||
| 3253 | -Preparing For Linearization | ||
| 3254 | ---------------------------- | ||
| 3255 | - | ||
| 3256 | -Before creating a linearized PDF file from any other PDF file, the PDF | ||
| 3257 | -file must be altered such that all page attributes are propagated down | ||
| 3258 | -to the page level (and not inherited from parents in the ``/Pages`` | ||
| 3259 | -tree). We also have to know which objects refer to which other objects, | ||
| 3260 | -being concerned with page boundaries and a few other cases. We refer to | ||
| 3261 | -this part of preparing the PDF file as | ||
| 3262 | -*optimization*, discussed in | ||
| 3263 | -:ref:`ref.optimization`. Note that, in this context, the | ||
| 3264 | -term *optimization* is a qpdf term, and the | ||
| 3265 | -term *linearization* is a term from the PDF | ||
| 3266 | -specification. Do not be confused by the fact that many applications | ||
| 3267 | -refer to linearization as optimization or web optimization. | ||
| 3268 | - | ||
| 3269 | -When creating linearized PDF files from optimized PDF files, there are | ||
| 3270 | -really only a few issues that need to be dealt with: | ||
| 3271 | - | ||
| 3272 | -- Creation of hints tables | ||
| 3273 | - | ||
| 3274 | -- Placing objects in the correct order | ||
| 3275 | - | ||
| 3276 | -- Filling in offsets and byte sizes | ||
| 3277 | - | ||
| 3278 | -.. _ref.optimization: | ||
| 3279 | - | ||
| 3280 | -Optimization | ||
| 3281 | ------------- | ||
| 3282 | - | ||
| 3283 | -In order to perform various operations such as linearization and | ||
| 3284 | -splitting files into pages, it is necessary to know which objects are | ||
| 3285 | -referenced by which pages, page thumbnails, and root and trailer | ||
| 3286 | -dictionary keys. It is also necessary to ensure that all page-level | ||
| 3287 | -attributes appear directly at the page level and are not inherited from | ||
| 3288 | -parents in the pages tree. | ||
| 3289 | - | ||
| 3290 | -We refer to the process of enforcing these constraints as | ||
| 3291 | -*optimization*. As mentioned above, note | ||
| 3292 | -that some applications refer to linearization as optimization. Although | ||
| 3293 | -this optimization was initially motivated by the need to create | ||
| 3294 | -linearized files, we are using these terms separately. | ||
| 3295 | - | ||
| 3296 | -PDF file optimization is implemented in the | ||
| 3297 | -:file:`QPDF_optimization.cc` source file. That file | ||
| 3298 | -is richly commented and serves as the primary reference for the | ||
| 3299 | -optimization process. | ||
| 3300 | - | ||
| 3301 | -After optimization has been completed, the private member variables | ||
| 3302 | -``obj_user_to_objects`` and ``object_to_obj_users`` in ``QPDF`` have | ||
| 3303 | -been populated. Any object that has more than one value in the | ||
| 3304 | -``object_to_obj_users`` table is shared. Any object that has exactly one | ||
| 3305 | -value in the ``object_to_obj_users`` table is private. To find all the | ||
| 3306 | -private objects in a page or a trailer or root dictionary key, one | ||
| 3307 | -merely has to make this determination for each element in the | ||
| 3308 | -``obj_user_to_objects`` table for the given page or key. | ||
| 3309 | - | ||
| 3310 | -Note that pages and thumbnails have different object user types, so the | ||
| 3311 | -above test on a page will not include objects that are referenced only | ||
| 3312 | -by the page's thumbnail dictionary. | ||
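The shared/private test described above amounts to counting users per object. The following Python sketch shows the idea under simplifying assumptions: object users are modeled as ``(kind, index)`` tuples and objects as plain integers, and the helper names ``build_tables`` and ``private_objects`` are hypothetical. Only the two table names come from the text.

```python
from collections import defaultdict


def build_tables(user_refs):
    """Build both lookup directions from a user -> objects mapping."""
    obj_user_to_objects = defaultdict(set)
    object_to_obj_users = defaultdict(set)
    for user, objs in user_refs.items():
        for obj in objs:
            obj_user_to_objects[user].add(obj)
            object_to_obj_users[obj].add(user)
    return obj_user_to_objects, object_to_obj_users


def private_objects(user, obj_user_to_objects, object_to_obj_users):
    """Objects used by this user and by no other user."""
    return {obj for obj in obj_user_to_objects[user]
            if len(object_to_obj_users[obj]) == 1}


user_refs = {
    ("page", 0): {4, 5, 7},
    ("page", 1): {6, 7},      # object 7 is shared by both pages
    ("thumb", 0): {8},        # thumbnails are a separate user type
}
by_user, by_object = build_tables(user_refs)
page0_private = private_objects(("page", 0), by_user, by_object)
shared = {obj for obj, users in by_object.items() if len(users) > 1}
```

Because the thumbnail for page 0 is its own user, object 8 does not show up in the result for ``("page", 0)``.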
| 3313 | - | ||
| 3314 | -.. _ref.linearization.writing: | ||
| 3315 | - | ||
| 3316 | -Writing Linearized Files | ||
| 3317 | ------------------------- | ||
| 3318 | - | ||
| 3319 | -We will create files with only primary hint streams. We will never write | ||
| 3320 | -overflow hint streams. (As of PDF version 1.4, Acrobat doesn't either, | ||
| 3321 | -and they are never necessary.) The hint streams contain offset | ||
| 3322 | -information to objects that point to where they would be if the hint | ||
| 3323 | -stream were not present. This means that we have to calculate all object | ||
| 3324 | -positions before we can generate and write the hint table. This means | ||
| 3325 | -that we have to generate the file in two passes. To make this reliable, | ||
| 3326 | -``QPDFWriter`` in linearization mode invokes exactly the same code twice | ||
| 3327 | -to write the file to a pipeline. | ||
| 3328 | - | ||
| 3329 | -In the first pass, the target pipeline is a count pipeline chained to a | ||
| 3330 | -discard pipeline. The count pipeline simply passes its data through to | ||
| 3331 | -the next pipeline in the chain but can return the number of bytes passed | ||
| 3332 | -through it at any intermediate point. The discard pipeline is an end of | ||
| 3333 | -line pipeline that just throws its data away. The hint stream is not | ||
| 3334 | -written and dummy values with adequate padding are stored in the first | ||
| 3335 | -cross reference table, linearization parameter dictionary, and /Prev key | ||
| 3336 | -of the first trailer dictionary. All the offset, length, object | ||
| 3337 | -renumbering information, and anything else we need for the second pass | ||
| 3338 | -is stored. | ||
| 3339 | - | ||
| 3340 | -At the end of the first pass, this information is passed to the ``QPDF`` | ||
| 3341 | -class which constructs a compressed hint stream in a memory buffer and | ||
| 3342 | -returns it. ``QPDFWriter`` uses this information to write a complete | ||
| 3343 | -hint stream object into a memory buffer. At this point, the length of | ||
| 3344 | -the hint stream is known. | ||
| 3345 | - | ||
| 3346 | -In the second pass, the end of the pipeline chain is a regular file | ||
| 3347 | -instead of a discard pipeline, and we have known values for all the | ||
| 3348 | -offsets and lengths that we didn't have in the first pass. We have to | ||
| 3349 | -adjust offsets that appear after the start of the hint stream by the | ||
| 3350 | -length of the hint stream, which is known. Anything that is of variable | ||
| 3351 | -length is padded, with the padding code surrounding any writing code | ||
| 3352 | -that differs in the two passes. This ensures that changes to the way | ||
| 3353 | -things are represented never result in offsets that were gathered | ||
| 3354 | -during the first pass becoming incorrect for the second pass. | ||
| 3355 | - | ||
| 3356 | -Using this strategy, we can write linearized files to a non-seekable | ||
| 3357 | -output stream with only a single pass to disk or wherever the output is | ||
| 3358 | -going. | ||
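The two-pass approach can be sketched as follows. This Python model makes several simplifying assumptions and is not the ``QPDFWriter`` implementation: the class names ``Count``, ``Discard``, and ``Collect``, the function ``write_pass``, and the toy hint format (a comment line of zero-padded offsets) are all invented for illustration. The same writing code runs twice, first into a counting stage that discards its data, then into a real sink with the hint included.

```python
class Discard:
    """End-of-the-line sink that throws its data away."""
    def write(self, data):
        pass


class Collect:
    """End-of-the-line sink standing in for the real output."""
    def __init__(self):
        self.data = bytearray()

    def write(self, data):
        self.data += data


class Count:
    """Pass data through while tracking how many bytes went by."""
    def __init__(self, next_sink):
        self.count = 0
        self.next_sink = next_sink

    def write(self, data):
        self.count += len(data)
        self.next_sink.write(data)


def write_pass(objects, hint, sink):
    """Same code for both passes; hint is None on the first pass."""
    out = Count(sink)
    out.write(b"%PDF-1.4\n")
    hint_offset = out.count
    if hint is not None:
        out.write(hint)
    offsets = []
    for body in objects:
        offsets.append(out.count)
        out.write(body)
    return hint_offset, offsets


objects = [b"1 0 obj\n(first)\nendobj\n", b"2 0 obj\n(second)\nendobj\n"]

# Pass 1: nothing is kept except the offsets.
_, first_offsets = write_pass(objects, None, Discard())

# The hint records offsets as if the hint were absent; fixed-width
# fields keep its length independent of the values.
hint = ("% hint " + " ".join(f"{o:010d}" for o in first_offsets) + "\n").encode()

# Pass 2: real output, written sequentially in one go.
final = Collect()
hint_offset, second_offsets = write_pass(objects, hint, final)

# Offsets at or after the hint shift by exactly the hint's length.
predicted = [o + len(hint) if o >= hint_offset else o for o in first_offsets]
```

Since the second pass never seeks backward, the output sink in this model could just as well be a pipe or a socket.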
| 3359 | - | ||
| 3360 | -.. _ref.linearization-data: | ||
| 3361 | - | ||
| 3362 | -Calculating Linearization Data | ||
| 3363 | ------------------------------- | ||
| 3364 | - | ||
| 3365 | -Once a file is optimized, we have information about which objects access | ||
| 3366 | -which other objects. We can then process these tables to decide which | ||
| 3367 | -part (as described in "Linearized PDF Document Structure" in the PDF | ||
| 3368 | -specification) each object is contained within. This tells us the exact | ||
| 3369 | -order in which objects are written. The ``QPDFWriter`` class asks for | ||
| 3370 | -this information and enqueues objects for writing in the proper order. | ||
| 3371 | -It also turns on a check that causes an exception to be thrown if an | ||
| 3372 | -object is encountered that has not already been queued. (This could | ||
| 3373 | -happen only if there were a bug in the traversal code used to calculate | ||
| 3374 | -the linearization data.) | ||
| 3375 | - | ||
| 3376 | -.. _ref.linearization-issues: | ||
| 3377 | - | ||
| 3378 | -Known Issues with Linearization | ||
| 3379 | -------------------------------- | ||
| 3380 | - | ||
| 3381 | -There are a handful of known issues with this linearization code. These | ||
| 3382 | -issues do not appear to impact the behavior of linearized files which | ||
| 3383 | -still work as intended: it is possible for a web browser to begin to | ||
| 3384 | -display them before they are fully downloaded. In fact, it seems that | ||
| 3385 | -various other programs that create linearized files have many of these | ||
| 3386 | -same issues. These items make reference to terminology used in the | ||
| 3387 | -linearization appendix of the PDF specification. | ||
| 3388 | - | ||
| 3389 | -- Thread Dictionary information keys appear in part 4 with the rest of | ||
| 3390 | - Threads instead of in part 9. Objects in part 9 are not grouped | ||
| 3391 | - together functionally. | ||
| 3392 | - | ||
| 3393 | -- We are not calculating numerators for shared object positions within | ||
| 3394 | - content streams or interleaving them within content streams. | ||
| 3395 | - | ||
| 3396 | -- We generate only page offset, shared object, and outline hint tables. | ||
| 3397 | - It would be relatively easy to add some additional tables. We gather | ||
| 3398 | - most of the information needed to create thumbnail hint tables. There | ||
| 3399 | - are comments in the code about this. | ||
| 3400 | - | ||
| 3401 | -.. _ref.linearization-debugging: | ||
| 3402 | - | ||
| 3403 | -Debugging Note | ||
| 3404 | --------------- | ||
| 3405 | - | ||
| 3406 | -The :command:`qpdf --show-linearization` command can show | ||
| 3407 | -the complete contents of linearization hint streams. To look at the raw | ||
| 3408 | -data, you can extract the filtered contents of the linearization hint | ||
| 3409 | -tables using :command:`qpdf --show-object=n | ||
| 3410 | ---filtered-stream-data`. Then, to convert this into a bit | ||
| 3411 | -stream (since linearization tables are bit streams written without | ||
| 3412 | -regard to byte boundaries), you can pipe the resulting data through the | ||
| 3413 | -following perl code: | ||
| 3414 | - | ||
| 3415 | -.. code-block:: perl | ||
| 3416 | - | ||
| 3417 | - use bytes; | ||
| 3418 | - binmode STDIN; | ||
| 3419 | - undef $/; | ||
| 3420 | - my $a = <STDIN>; | ||
| 3421 | - my @ch = split(//, $a); | ||
| 3422 | - map { printf("%08b", ord($_)) } @ch; | ||
| 3423 | - print "\n"; | ||
| 3424 | - | ||
| 3425 | -.. _ref.object-and-xref-streams: | ||
| 3426 | - | ||
| 3427 | -Object and Cross-Reference Streams | ||
| 3428 | -================================== | ||
| 3429 | - | ||
| 3430 | -This chapter provides information about the implementation of object | ||
| 3431 | -stream and cross-reference stream support in qpdf. | ||
| 3432 | - | ||
| 3433 | -.. _ref.object-streams: | ||
| 3434 | - | ||
| 3435 | -Object Streams | ||
| 3436 | --------------- | ||
| 3437 | - | ||
| 3438 | -Object streams can contain any regular object except the following: | ||
| 3439 | - | ||
| 3440 | -- stream objects | ||
| 3441 | - | ||
| 3442 | -- objects with generation > 0 | ||
| 3443 | - | ||
| 3444 | -- the encryption dictionary | ||
| 3445 | - | ||
| 3446 | -- objects containing the /Length of another stream | ||
| 3447 | - | ||
| 3448 | -In addition, Adobe Reader (at least as of version 8.0.0) appears to not | ||
| 3449 | -be able to handle having the document catalog appear in an object stream | ||
| 3450 | -if the file is encrypted, though this is not specifically disallowed by | ||
| 3451 | -the specification. | ||
| 3452 | - | ||
| 3453 | -There are additional restrictions for linearized files. See | ||
| 3454 | -:ref:`ref.object-streams-linearization` for details. | ||
| 3455 | - | ||
| 3456 | -The PDF specification refers to objects in object streams as "compressed | ||
| 3457 | -objects" regardless of whether the object stream is compressed. | ||
| 3458 | - | ||
| 3459 | -The generation number of every object in an object stream must be zero. | ||
| 3460 | -It is possible to delete and replace an object in an object stream with | ||
| 3461 | -a regular object. | ||
| 3462 | - | ||
| 3463 | -The object stream dictionary has the following keys: | ||
| 3464 | - | ||
| 3465 | -- ``/N``: number of objects | ||
| 3466 | - | ||
| 3467 | -- ``/First``: byte offset of first object | ||
| 3468 | - | ||
| 3469 | -- ``/Extends``: indirect reference to stream that this extends | ||
| 3470 | - | ||
| 3471 | -Stream collections are formed with ``/Extends``. They must form a | ||
| 3472 | -directed acyclic graph. These can be used for semantic information and | ||
| 3473 | -are not meaningful to the PDF document's syntactic structure. Although | ||
| 3474 | -qpdf preserves stream collections, it never generates them and doesn't | ||
| 3475 | -make use of this information in any way. | ||
| 3476 | - | ||
| 3477 | -The specification recommends limiting the number of objects in an object | ||
| 3478 | -stream for efficiency in reading and decoding. Acrobat 6 uses no more | ||
| 3479 | -than 100 objects per object stream for linearized files and no more than 200 | ||
| 3480 | -objects per stream for non-linearized files. ``QPDFWriter``, in object | ||
| 3481 | -stream generation mode, never puts more than 100 objects in an object | ||
| 3482 | -stream. | ||
| 3483 | - | ||
| 3484 | -Object stream contents consist of *N* pairs of integers, each of which | ||
| 3485 | -is the object number and the byte offset of the object relative to the | ||
| 3486 | -first object in the stream, followed by the objects themselves, | ||
| 3487 | -concatenated. | ||
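
The layout described above can be illustrated with a minimal Python sketch. It assumes the stream data has already been decoded (unfiltered) and that ``/N`` and ``/First`` have been read from the stream dictionary. The function name ``split_object_stream`` and the sample data are hypothetical and are not part of qpdf.

```python
def split_object_stream(data: bytes, n: int, first: int) -> dict:
    """Split decoded object stream data into {object number: raw object bytes}.

    data  -- decoded contents of the object stream
    n     -- value of /N (number of objects)
    first -- value of /First (byte offset of the first object)
    """
    # The header holds N pairs of integers: object number, relative offset.
    header = data[:first].split()
    if len(header) < 2 * n:
        raise ValueError("object stream header has fewer than /N pairs")
    pairs = [(int(header[2 * i]), int(header[2 * i + 1])) for i in range(n)]

    objects = {}
    for i, (objnum, offset) in enumerate(pairs):
        # Offsets are relative to the first object, so add /First.
        start = first + offset
        end = first + pairs[i + 1][1] if i + 1 < n else len(data)
        objects[objnum] = data[start:end].strip()
    return objects


# Made-up example: objects 11 and 12 stored in one object stream.
header = b"11 0 12 18 "
body = b"<< /Type /Font >> [1 2 3]"
result = split_object_stream(header + body, n=2, first=len(header))
print(result)
```

The sketch ignores comments and malformed headers, which a real parser would have to handle.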
| 3488 | - | ||
| 3489 | -.. _ref.xref-streams: | ||
| 3490 | - | ||
| 3491 | -Cross-Reference Streams | ||
| 3492 | ------------------------ | ||
| 3493 | - | ||
| 3494 | -For non-hybrid files, the value following ``startxref`` is the byte | ||
| 3495 | -offset to the xref stream rather than the word ``xref``. | ||
| 3496 | - | ||
| 3497 | -For hybrid files (files containing both xref tables and cross-reference | ||
| 3498 | -streams), the xref table's trailer dictionary contains the key | ||
| 3499 | -``/XRefStm`` whose value is the byte offset to a cross-reference stream | ||
| 3500 | -that supplements the xref table. A PDF 1.5-compliant application should | ||
| 3501 | -read the xref table first. Then it should replace any object that it has | ||
| 3502 | -already seen with any defined in the xref stream. Then it should follow | ||
| 3503 | -any ``/Prev`` pointer in the original xref table's trailer dictionary. | ||
| 3504 | -The specification is not clear about what should be done, if anything, | ||
| 3505 | -with a ``/Prev`` pointer in the xref stream referenced by an xref table. | ||
| 3506 | -The ``QPDF`` class ignores it, which is probably reasonable since, if | ||
| 3507 | -this case were to appear for any sensible PDF file, the previous xref | ||
| 3508 | -table would probably have a corresponding ``/XRefStm`` pointer of its | ||
| 3509 | -own. For example, if a hybrid file were appended, the appended section | ||
| 3510 | -would have its own xref table and ``/XRefStm``. The appended xref table | ||
| 3511 | -would point to the previous xref table which would point to the | ||
| 3512 | -``/XRefStm``, meaning that the new ``/XRefStm`` doesn't have to point to | ||
| 3513 | -it. | ||
| 3514 | - | ||
| 3515 | -Since xref streams must be read very early, they may not be encrypted, | ||
| 3516 | -and they may not contain indirect objects for keys required to read them, | ||
| 3517 | -which are these: | ||
| 3518 | - | ||
| 3519 | -- ``/Type``: value ``/XRef`` | ||
| 3520 | - | ||
| 3521 | -- ``/Size``: value *n+1*: where *n* is highest object number (same as | ||
| 3522 | - ``/Size`` in the trailer dictionary) | ||
| 3523 | - | ||
| 3524 | -- ``/Index`` (optional): value | ||
| 3525 | - ``[:samp:`{n count}` ...]`` used to determine | ||
| 3526 | - which objects' information is stored in this stream. The default is | ||
| 3527 | - ``[0 /Size]``. | ||
| 3528 | - | ||
| 3529 | -- ``/Prev``: value :samp:`{offset}`: byte | ||
| 3530 | - offset of previous xref stream (same as ``/Prev`` in the trailer | ||
| 3531 | - dictionary) | ||
| 3532 | - | ||
| 3533 | -- ``/W [...]``: sizes of each field in the xref table | ||
| 3534 | - | ||
| 3535 | -The other fields in the xref stream, which may be indirect if desired, | ||
| 3536 | -are the union of those from the xref table's trailer dictionary. | ||
| 3537 | - | ||
| 3538 | -.. _ref.xref-stream-data: | ||
| 3539 | - | ||
| 3540 | -Cross-Reference Stream Data | ||
| 3541 | -~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| 3542 | - | ||
| 3543 | -The stream data is binary and encoded in big-endian byte order. Entries | ||
| 3544 | -are concatenated, and each entry has a length equal to the total of the | ||
| 3545 | -entries in ``/W`` above. Each entry consists of one or more fields, the | ||
| 3546 | -first of which is the type of the field. The number of bytes for each | ||
| 3547 | -field is given by ``/W`` above. A 0 in ``/W`` indicates that the field | ||
| 3548 | -is omitted and has the default value. The default value for the field | ||
| 3549 | -type is "``1``". All other default values are "``0``". | ||
| 3550 | - | ||
| 3551 | -PDF 1.5 has three field types: | ||
| 3552 | - | ||
| 3553 | -- 0: for free objects. Format: ``0 obj next-generation``, same as the | ||
| 3554 | - free table in a traditional cross-reference table | ||
| 3555 | - | ||
| 3556 | -- 1: regular non-compressed object. Format: ``1 offset generation`` | ||
| 3557 | - | ||
| 3558 | -- 2: for objects in object streams. Format: ``2 object-stream-number | ||
| 3559 | - index``, the number of the object stream containing the object and the | ||
| 3560 | - index within the object stream of the object. | ||
| 3561 | - | ||
| 3562 | -It seems standard to have the first entry in the table be ``0 0 0`` | ||
| 3563 | -instead of ``0 0 ffff`` if there are no deleted objects. | ||
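
The entry layout described above can be decoded with a minimal Python sketch. It assumes the stream data has already been decoded and that ``/W`` and ``/Index`` are supplied as plain lists of integers. The function name ``decode_xref_stream`` and the sample bytes are hypothetical and are not taken from qpdf.

```python
def decode_xref_stream(data: bytes, w: list, index: list) -> dict:
    """Decode xref stream data into {object number: (type, field2, field3)}.

    data  -- decoded contents of the cross-reference stream
    w     -- value of /W, the width in bytes of each of the three fields
    index -- value of /Index, pairs of (first object number, count)
    """
    defaults = (1, 0, 0)  # omitted type defaults to 1, other fields to 0
    entries = {}
    pos = 0
    for i in range(0, len(index), 2):
        first_obj, count = index[i], index[i + 1]
        for objnum in range(first_obj, first_obj + count):
            fields = []
            for width, default in zip(w, defaults):
                if width == 0:
                    # A zero width means the field is omitted.
                    fields.append(default)
                else:
                    # Fields are unsigned big-endian integers.
                    fields.append(int.from_bytes(data[pos:pos + width], "big"))
                    pos += width
            entries[objnum] = tuple(fields)
    return entries


# Made-up example with /W [1 2 1] and /Index [0 3]:
#   object 0: free entry "0 0 0"
#   object 1: uncompressed object at offset 17, generation 0
#   object 2: stored in object stream 5 at index 0
sample = bytes.fromhex("00000000" "01001100" "02000500")
table = decode_xref_stream(sample, w=[1, 2, 1], index=[0, 3])
print(table)
```

The sketch does not check for truncated data; a real reader would verify that the data length matches the total of ``/W`` times the number of entries.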
| 3564 | - | ||
| 3565 | -.. _ref.object-streams-linearization: | ||
| 3566 | - | ||
| 3567 | -Implications for Linearized Files | ||
| 3568 | ---------------------------------- | ||
| 3569 | - | ||
| 3570 | -For linearized files, the linearization dictionary, document catalog, | ||
| 3571 | -and page objects may not be contained in object streams. | ||
| 3572 | - | ||
| 3573 | -Objects stored within object streams are given the highest range of | ||
| 3574 | -object numbers within the main and first-page cross-reference sections. | ||
| 3575 | - | ||
| 3576 | -It is okay to use cross-reference streams in place of regular xref | ||
| 3577 | -tables. There are no special considerations. | ||
| 3578 | - | ||
| 3579 | -Hint data refers to object streams themselves, not the objects in the | ||
| 3580 | -streams. Shared object references should also be made to the object | ||
| 3581 | -streams. There are no references in any hint tables to the object numbers | ||
| 3582 | -of compressed objects (objects within object streams). | ||
| 3583 | - | ||
| 3584 | -When numbering objects, all shared objects within both the first and | ||
| 3585 | -second halves of the linearized files must be numbered consecutively | ||
| 3586 | -after all normal uncompressed objects in that half. | ||
| 3587 | - | ||
| 3588 | -.. _ref.object-stream-implementation: | ||
| 3589 | - | ||
| 3590 | -Implementation Notes | ||
| 3591 | --------------------- | ||
| 3592 | - | ||
| 3593 | -There are three modes for writing object streams: | ||
| 3594 | -:samp:`disable`, :samp:`preserve`, and | ||
| 3595 | -:samp:`generate`. In disable mode, we do not generate | ||
| 3596 | -any object streams, and we also generate an xref table rather than xref | ||
| 3597 | -streams. This can be used to generate PDF files that are viewable with | ||
| 3598 | -older readers. In preserve mode, we write object streams such that | ||
| 3599 | -written object streams contain the same objects and ``/Extends`` | ||
| 3600 | -relationships as in the original file. This is equal to disable if the | ||
| 3601 | -file has no object streams. In generate, we create object streams | ||
| 3602 | -ourselves by grouping objects that are allowed in object streams | ||
| 3603 | -together in sets of no more than 100 objects. We also ensure that the | ||
| 3604 | -PDF version is at least 1.5 in generate mode, but we preserve the | ||
| 3605 | -version header in the other modes. The default is | ||
| 3606 | -:samp:`preserve`. | ||
| 3607 | - | ||
| 3608 | -We do not support creation of hybrid files. When we write files, even in | ||
| 3609 | -preserve mode, we will lose any xref tables and merge any appended | ||
| 3610 | -sections. | ||
| 3611 | - | ||
| 3612 | -.. _ref.release-notes: | ||
| 3613 | - | ||
| 3614 | -Release Notes | ||
| 3615 | -============= | ||
| 3616 | - | ||
| 3617 | -For a detailed list of changes, please see the file | ||
| 3618 | -:file:`ChangeLog` in the source distribution. | ||
| 3619 | - | ||
| 3620 | -10.5.0: XXX Month dd, YYYY | ||
| 3621 | - - Library Enhancements | ||
| 3622 | - | ||
| 3623 | - - Since qpdf version 8, using object accessor methods on an | ||
| 3624 | - instance of ``QPDFObjectHandle`` may create warnings if the | ||
| 3625 | - object is not of the expected type. These warnings now have an | ||
| 3626 | - error code of ``qpdf_e_object`` instead of | ||
| 3627 | - ``qpdf_e_damaged_pdf``. Also, comments have been added to | ||
| 3628 | - :file:`QPDFObjectHandle.hh` to explain in more detail what the | ||
| 3629 | - behavior is. See :ref:`ref.object-accessors` for a more in-depth | ||
| 3630 | - discussion. | ||
| 3631 | - | ||
| 3632 | - - Add ``Pl_Buffer::getMallocBuffer()`` to initialize a buffer | ||
| 3633 | - allocated with ``malloc()`` for better cross-language | ||
| 3634 | - interoperability. | ||
| 3635 | - | ||
| 3636 | - - C API Enhancements | ||
| 3637 | - | ||
| 3638 | - - Overhaul error handling for the object handle functions in the C API. | ||
| 3639 | - Some rare error conditions that would previously have caused a | ||
| 3640 | - crash are now trapped and reported, and the functions that | ||
| 3641 | - generate them return fallback values. See comments in the | ||
| 3642 | - ``ERROR HANDLING`` section of :file:`include/qpdf/qpdf-c.h` for | ||
| 3643 | - details. In particular, exceptions thrown by the underlying C++ | ||
| 3644 | - code when calling object accessors are caught and converted into | ||
| 3645 | - errors. The errors can be checked by calling ``qpdf_has_error``. | ||
| 3646 | - Use ``qpdf_silence_errors`` to prevent the error from being | ||
| 3647 | - written to stderr. | ||
| 3648 | - | ||
| 3649 | - - Add ``qpdf_get_last_string_length`` to the C API to get the | ||
| 3650 | - length of the last string that was returned. This is needed to | ||
| 3651 | - handle strings that contain embedded null characters. | ||
| 3652 | - | ||
| 3653 | - - Add ``qpdf_oh_is_initialized`` and | ||
| 3654 | - ``qpdf_oh_new_uninitialized`` to the C API to make it possible | ||
| 3655 | - to work with uninitialized objects. | ||
| 3656 | - | ||
| 3657 | - - Add ``qpdf_oh_new_object`` to the C API. This allows you to | ||
| 3658 | - clone an object handle. | ||
| 3659 | - | ||
| 3660 | - - Add ``qpdf_get_object_by_id``, ``qpdf_make_indirect_object``, | ||
| 3661 | - and ``qpdf_replace_object``, exposing the corresponding methods | ||
| 3662 | - in ``QPDF`` and ``QPDFObjectHandle``. | ||
| 3663 | - | ||
| 3664 | - - Add several functions for working with pages. See ``PAGE | ||
| 3665 | - FUNCTIONS`` in ``include/qpdf/qpdf-c.h`` for details. | ||
| 3666 | - | ||
| 3667 | - - Add several functions for working with streams. See ``STREAM | ||
| 3668 | - FUNCTIONS`` in ``include/qpdf/qpdf-c.h`` for details. | ||
| 3669 | - | ||
| 3670 | - - Add ``qpdf_oh_get_type_code`` and ``qpdf_oh_get_type_name``. | ||
| 3671 | - | ||
| 3672 | - - Documentation change | ||
| 3673 | - | ||
| 3674 | - - The documentation sources have been switched from docbook to | ||
| 3675 | - reStructuredText processed with `Sphinx | ||
| 3676 | - <https://sphinx-doc.org>`__. This is mostly transparent (other | ||
| 3677 | - than format change) with the exception that all section links | ||
| 3678 | - have changed. What used to be `#ref.something` is now | ||
| 3679 | - `#something`. A top-to-bottom review of the documentation is | ||
| 3680 | - planned for an upcoming release. | ||
| 3681 | - | ||
| 3682 | -10.4.0: November 16, 2021 | ||
| 3683 | - - Handling of Weak Cryptography Algorithms | ||
| 3684 | - | ||
| 3685 | - - From the qpdf CLI, the | ||
| 3686 | - :samp:`--allow-weak-crypto` option is now required to | ||
| 3687 | - suppress a warning when explicitly creating PDF files using RC4 | ||
| 3688 | - encryption. While qpdf will always retain the ability to read | ||
| 3689 | - and write such files, doing so will require explicit | ||
| 3690 | - acknowledgment moving forward. For qpdf 10.4, this change only | ||
| 3691 | - affects the command-line tool. Starting in qpdf 11, there will | ||
| 3692 | - be small API changes to require explicit acknowledgment in | ||
| 3693 | - those cases as well. For additional information, see :ref:`ref.weak-crypto`. | ||
| 3694 | - | ||
| 3695 | - - Bug Fixes | ||
| 3696 | - | ||
| 3697 | - - Fix potential bounds error when handling shell completion that | ||
| 3698 | - could occur when given bogus input. | ||
| 3699 | - | ||
| 3700 | - - Properly handle overlay/underlay on completely empty pages | ||
| 3701 | - (with no resource dictionary). | ||
| 3702 | - | ||
| 3703 | - - Fix crash that could occur under certain conditions when using | ||
| 3704 | - :samp:`--pages` with files that had form | ||
| 3705 | - fields. | ||
| 3706 | - | ||
| 3707 | - - Library Enhancements | ||
| 3708 | - | ||
| 3709 | - - Make ``QPDF::findPage`` functions public. | ||
| 3710 | - | ||
| 3711 | - - Add methods to ``Pl_Flate`` to be able to receive warnings on | ||
| 3712 | - certain recoverable conditions. | ||
| 3713 | - | ||
| 3714 | - - Add an extra check to the library to detect when foreign | ||
| 3715 | - objects are inserted directly (instead of using | ||
| 3716 | - ``QPDF::copyForeignObject``) at the time of insertion rather | ||
| 3717 | - than when the file is written. Catching the error sooner makes | ||
| 3718 | - it much easier to locate the incorrect code. | ||
| 3719 | - | ||
| 3720 | - - CLI Enhancements | ||
| 3721 | - | ||
| 3722 | - - Improve diagnostics around parsing | ||
| 3723 | - :samp:`--pages` command-line options | ||
| 3724 | - | ||
| 3725 | - - Packaging Changes | ||
| 3726 | - | ||
| 3727 | - - The Windows binary distribution is now built with crypto | ||
| 3728 | - provided by OpenSSL 3.0. | ||
| 3729 | - | ||
| 3730 | -10.3.2: May 8, 2021 | ||
| 3731 | - - Bug Fixes | ||
| 3732 | - | ||
| 3733 | - - When generating a file while preserving object streams, | ||
| 3734 | - unreferenced objects are correctly removed unless | ||
| 3735 | - :samp:`--preserve-unreferenced` is specified. | ||
| 3736 | - | ||
| 3737 | - - Library Enhancements | ||
| 3738 | - | ||
| 3739 | - - When adding a page that already exists, make a shallow copy | ||
| 3740 | - instead of throwing an exception. This makes the library | ||
| 3741 | - behavior consistent with the CLI behavior. See | ||
| 3742 | - :file:`ChangeLog` for additional notes. | ||
| 3743 | - | ||
| 3744 | -10.3.1: March 11, 2021 | ||
| 3745 | - - Bug Fixes | ||
| 3746 | - | ||
| 3747 | - - Form field copying failed on files where /DR was a direct | ||
| 3748 | - object in the document-level form dictionary. | ||
| 3749 | - | ||
| 3750 | -10.3.0: March 4, 2021 | ||
| 3751 | - - Bug Fixes | ||
| 3752 | - | ||
| 3753 | - - The code for handling form fields when copying pages from | ||
| 3754 | - 10.2.0 was not quite right and didn't work in a number of | ||
| 3755 | - situations, such as when the same page was copied multiple | ||
| 3756 | - times or when there were conflicting resource or field names | ||
| 3757 | - across multiple copies. The 10.3.0 code has been much more | ||
| 3758 | - thoroughly tested with more complex cases and with a multitude | ||
| 3759 | - of readers and should be much closer to correct. The 10.2.0 | ||
| 3760 | - code worked well enough for page splitting or for copying pages | ||
| 3761 | - with form fields into documents that didn't already have them | ||
| 3762 | - but was still not quite correct in handling of field-level | ||
| 3763 | - resources. | ||
| 3764 | - | ||
| 3765 | - - When ``QPDF::replaceObject`` or ``QPDF::swapObjects`` is | ||
| 3766 | - called, existing ``QPDFObjectHandle`` instances no longer point | ||
| 3767 | - to the old objects. The next time they are accessed, they | ||
| 3768 | - automatically notice the change to the underlying object and | ||
| 3769 | - update themselves. This resolves a very longstanding source of | ||
| 3770 | - confusion, albeit in a very rarely used method call. | ||
| 3771 | - | ||
| 3772 | - - Fix form field handling code to look for default appearances, | ||
| 3773 | - quadding, and default resources in the right places. The code | ||
| 3774 | - was not looking for things in the document-level interactive | ||
| 3775 | - form dictionary that it was supposed to be finding there. This | ||
| 3776 | - required adding a few new methods to | ||
| 3777 | - ``QPDFFormFieldObjectHelper``. | ||
| 3778 | - | ||
| 3779 | - - Library Enhancements | ||
| 3780 | - | ||
| 3781 | - - Reworked the code that handles copying annotations and form | ||
| 3782 | - fields during page operations. There were additional methods | ||
| 3783 | - added to the public API from 10.2.0 and one deprecation of a | ||
| 3784 | - method added in 10.2.0. The majority of the API changes are in | ||
| 3785 | - methods most people would never call and that will hopefully be | ||
| 3786 | - superseded by higher-level interfaces for handling page copies. | ||
| 3787 | - Please see the :file:`ChangeLog` file for | ||
| 3788 | - details. | ||
| 3789 | - | ||
| 3790 | - - The method ``QPDF::numWarnings`` was added so that you can tell | ||
| 3791 | - whether any warnings happened during a specific block of code. | ||
| 3792 | - | ||
| 3793 | -10.2.0: February 23, 2021 | ||
| 3794 | - - CLI Behavior Changes | ||
| 3795 | - | ||
| 3796 | - - Operations that work on combining pages are much better about | ||
| 3797 | - protecting form fields. In particular, | ||
| 3798 | - :samp:`--split-pages` and | ||
| 3799 | - :samp:`--pages` now preserve interactive form | ||
| 3800 | - functionality by copying the relevant form field information | ||
| 3801 | - from the original files. Additionally, if you use | ||
| 3802 | - :samp:`--pages` to select only some pages from | ||
| 3803 | - the original input file, unused form fields are removed, which | ||
| 3804 | - prevents lots of unused annotations from being retained. | ||
| 3805 | - | ||
| 3806 | - - By default, :command:`qpdf` no longer allows | ||
| 3807 | - creation of encrypted PDF files whose user password is | ||
| 3808 | - non-empty and owner password is empty when a 256-bit key is in | ||
| 3809 | - use. The :samp:`--allow-insecure` option, | ||
| 3810 | - specified inside the :samp:`--encrypt` options, | ||
| 3811 | - allows creation of such files. Behavior changes in the CLI are | ||
| 3812 | - avoided when possible, but an exception was made here because | ||
| 3813 | - this is security-related. qpdf must always allow creation of | ||
| 3814 | - weird files for testing purposes, but it should not default to | ||
| 3815 | - letting users unknowingly create insecure files. | ||
| 3816 | - | ||
| 3817 | - - Library Behavior Changes | ||
| 3818 | - | ||
| 3819 | - - Note: the changes in this section cause differences in output | ||
| 3820 | - in some cases. These differences change the syntax of the PDF | ||
| 3821 | - but do not change the semantics (meaning). I make a strong | ||
| 3822 | - effort to avoid gratuitous changes in qpdf's output so that | ||
| 3823 | - qpdf changes don't break people's tests. In this case, the | ||
| 3824 | - changes significantly improve the readability of the generated | ||
| 3825 | - PDF and don't affect any output that's generated by simple | ||
| 3826 | - transformation. If you are annoyed by having to update test | ||
| 3827 | - files, please rest assured that changes like this have been and | ||
| 3828 | - will continue to be rare events. | ||
| 3829 | - | ||
| 3830 | - - ``QPDFObjectHandle::newUnicodeString`` now uses whichever of | ||
| 3831 | - ASCII, PDFDocEncoding, or UTF-16 is sufficient to encode all | ||
| 3832 | - the characters in the string. This reduces needless encoding in | ||
| 3833 | - UTF-16 of strings that can be encoded in ASCII. This change may | ||
| 3834 | - cause qpdf to generate different output than before when form | ||
| 3835 | - field values are set using ``QPDFFormFieldObjectHelper`` but | ||
| 3836 | - does not change the meaning of the output. | ||
| 3837 | - | ||
| 3838 | - - The code that places form XObjects and also the code that | ||
| 3839 | - flattens rotations trim trailing zeroes from real numbers that | ||
| 3840 | - they calculate. This causes slight (but semantically | ||
| 3841 | - equivalent) differences in generated appearance streams and | ||
| 3842 | - form XObject invocations in overlay/underlay code or in user | ||
| 3843 | - code that calls the methods that place form XObjects on a page. | ||
| 3844 | - | ||
| 3845 | - - CLI Enhancements | ||
| 3846 | - | ||
| 3847 | - - Add new command line options for listing, saving, adding, | ||
| 3848 | - removing, and copying file attachments. See :ref:`ref.attachments` for details. | ||
| 3849 | - | ||
| 3850 | - - Page splitting and merging operations, as well as | ||
| 3851 | - :samp:`--flatten-rotation`, are better behaved | ||
| 3852 | - with respect to annotations and interactive form fields. In | ||
| 3853 | - most cases, interactive form field functionality and proper | ||
| 3854 | - formatting and functionality of annotations is preserved by | ||
| 3855 | - these operations. There are still some cases that aren't | ||
| 3856 | - perfect, such as when functionality of annotations depends on | ||
| 3857 | - document-level data that qpdf doesn't yet understand or when | ||
| 3858 | - there are problems with referential integrity among form fields | ||
| 3859 | - and annotations (e.g., when a single form field object or its | ||
| 3860 | - associated annotations are shared across multiple pages, a case | ||
| 3861 | - that is out of spec but that works in most viewers anyway). | ||
| 3862 | - | ||
| 3863 | - - The option | ||
| 3864 | - :samp:`--password-file={filename}` | ||
| 3865 | - can now be used to read the decryption password from a file. | ||
| 3866 | - You can use ``-`` as the file name to read the password from | ||
| 3867 | - standard input. This is an easier/more obvious way to read | ||
| 3868 | - passwords from files or standard input than using | ||
| 3869 | - :samp:`@file` for this purpose. | ||
| 3870 | - | ||
| 3871 | - - Add some information about attachments to the json output, and | ||
| 3872 | - added ``attachments`` as an additional json key. The | ||
| 3873 | - information included here is limited to the preferred name and | ||
| 3874 | - content stream and a reference to the file spec object. This is | ||
| 3875 | - enough detail for clients to avoid the hassle of navigating a | ||
| 3876 | - name tree and provides what is needed for basic enumeration and | ||
| 3877 | - extraction of attachments. More detailed information can be | ||
| 3878 | - obtained by following the reference to the file spec object. | ||
| 3879 | - | ||
| 3880 | - - Add numeric option to :samp:`--collate`. If | ||
| 3881 | - :samp:`--collate={n}` | ||
| 3882 | - is given, take pages in groups of | ||
| 3883 | - :samp:`{n}` from the given files. | ||
| 3884 | - | ||
| 3885 | - - It is now valid to provide :samp:`--rotate=0` | ||
| 3886 | - to clear rotation from a page. | ||
| 3887 | - | ||
| 3888 | - - Library Enhancements | ||
| 3889 | - | ||
| 3890 | - - This release includes numerous additions to the API. Not all | ||
| 3891 | - changes are listed here. Please see the | ||
| 3892 | - :file:`ChangeLog` file in the source | ||
| 3893 | - distribution for a comprehensive list. Highlights appear below. | ||
| 3894 | - | ||
| 3895 | - - Add ``QPDFObjectHandle::ditems()`` and | ||
| 3896 | - ``QPDFObjectHandle::aitems()`` that enable C++-style iteration, | ||
| 3897 | - including range-for iteration, over dictionary and array | ||
| 3898 | - QPDFObjectHandles. See comments in | ||
| 3899 | - :file:`include/qpdf/QPDFObjectHandle.hh` | ||
| 3900 | - and | ||
| 3901 | - :file:`examples/pdf-name-number-tree.cc` | ||
| 3902 | - for details. | ||
| 3903 | - | ||
| 3904 | - - Add ``QPDFObjectHandle::copyStream`` for making a copy of a | ||
| 3905 | - stream within the same ``QPDF`` instance. | ||
| 3906 | - | ||
| 3907 | - - Add new helper classes for supporting file attachments, also | ||
| 3908 | - known as embedded files. New classes are | ||
| 3909 | - ``QPDFEmbeddedFileDocumentHelper``, | ||
| 3910 | - ``QPDFFileSpecObjectHelper``, and ``QPDFEFStreamObjectHelper``. | ||
| 3911 | - See their respective headers for details and | ||
| 3912 | - :file:`examples/pdf-attach-file.cc` for an | ||
| 3913 | - example. | ||
| 3914 | - | ||
| 3915 | - - Add a version of ``QPDFObjectHandle::parse`` that takes a | ||
| 3916 | - ``QPDF`` pointer as context so that it can parse strings | ||
| 3917 | - containing indirect object references. This is illustrated in | ||
| 3918 | - :file:`examples/pdf-attach-file.cc`. | ||
| 3919 | - | ||
| 3920 | - - Re-implement ``QPDFNameTreeObjectHelper`` and | ||
| 3921 | - ``QPDFNumberTreeObjectHelper`` to be more efficient, add an | ||
| 3922 | - iterator-based API, give them the capability to repair broken | ||
| 3923 | - trees, and create methods for modifying the trees. With this | ||
| 3924 | - change, qpdf has a robust read/write implementation of name and | ||
| 3925 | - number trees. | ||
| 3926 | - | ||
| 3927 | - - Add new versions of ``QPDFObjectHandle::replaceStreamData`` | ||
| 3928 | - that take ``std::function`` objects for cases when you need | ||
| 3929 | - something between a static string and a full-fledged | ||
| 3930 | - StreamDataProvider. Using this with ``QUtil::file_provider`` is | ||
| 3931 | - a very easy way to create a stream from the contents of a file. | ||
| 3932 | - | ||
| 3933 | - - The ``QPDFMatrix`` class, formerly a private, internal class, | ||
| 3934 | - has been added to the public API. See | ||
| 3935 | - :file:`include/qpdf/QPDFMatrix.hh` for | ||
| 3936 | - details. This class is for working with transformation | ||
| 3937 | - matrices. Some methods in ``QPDFPageObjectHelper`` make use of | ||
| 3938 | - this to make information about transformation matrices | ||
| 3939 | - available. For an example, see | ||
| 3940 | - :file:`examples/pdf-overlay-page.cc`. | ||
| 3941 | - | ||
| 3942 | - - Several new methods were added to | ||
| 3943 | - ``QPDFAcroFormDocumentHelper`` for adding, removing, getting | ||
| 3944 | - information about, and enumerating form fields. | ||
| 3945 | - | ||
| 3946 | - - Add method | ||
| 3947 | - ``QPDFAcroFormDocumentHelper::transformAnnotations``, which | ||
| 3948 | - applies a transformation to each annotation on a page. | ||
| 3949 | - | ||
| 3950 | - - Add ``QPDFPageObjectHelper::copyAnnotations``, which copies | ||
| 3951 | - annotations and, if applicable, associated form fields, from | ||
| 3952 | - one page to another, possibly transforming the rectangles. | ||
| 3953 | - | ||
| 3954 | - - Build Changes | ||
| 3955 | - | ||
| 3956 | - - A C++-14 compiler is now required to build qpdf. There is no | ||
| 3957 | - intention to require anything newer than that for a while. | ||
| 3958 | - C++-14 includes modest enhancements to C++-11 and appears to be | ||
| 3959 | - supported about as widely as C++-11. | ||
| 3960 | - | ||
| 3961 | - - Bug Fixes | ||
| 3962 | - | ||
| 3963 | - - The :samp:`--flatten-rotation` option applies | ||
| 3964 | - transformations to any annotations that may be on the page. | ||
| 3965 | - | ||
| 3966 | - - If a form XObject lacks a resources dictionary, consider any | ||
| 3967 | - names in that form XObject to be referenced from the containing | ||
| 3968 | - page. This is compliant with older PDF versions. Also detect if | ||
| 3969 | - any form XObjects have any unresolved names and, if so, don't | ||
| 3970 | - remove unreferenced resources from them or from the page that | ||
| 3971 | - contains them. Unfortunately this has the side effect of | ||
| 3972 | - preventing removal of unreferenced resources in some cases | ||
| 3973 | - where names appear that don't refer to resources, such as with | ||
| 3974 | - tagged PDF. This is a bit of a corner case that is not likely | ||
| 3975 | - to cause a significant problem in practice, but the only side | ||
| 3976 | - effect would be lack of removal of shared resources. A future | ||
| 3977 | - version of qpdf may be more sophisticated in its detection of | ||
| 3978 | - names that refer to resources. | ||
| 3979 | - | ||
| 3980 | - - Properly handle strings if they appear in inline image | ||
| 3981 | - dictionaries while externalizing inline images. | ||
| 3982 | - | ||
| 3983 | -10.1.0: January 5, 2021 | ||
| 3984 | - - CLI Enhancements | ||
| 3985 | - | ||
| 3986 | - - Add :samp:`--flatten-rotation` command-line | ||
| 3987 | - option, which causes all pages that are rotated using | ||
| 3988 | - parameters in the page's dictionary to instead be identically | ||
| 3989 | - rotated in the page's contents. The change is not user-visible | ||
| 3990 | - for compliant PDF readers but can be used to work around broken | ||
| 3991 | - PDF applications that don't properly handle page rotation. | ||
| 3992 | - | ||
| 3993 | - - Library Enhancements | ||
| 3994 | - | ||
| 3995 | - - Support for user-provided (pluggable, modular) stream filters. | ||
| 3996 | - It is now possible to derive a class from ``QPDFStreamFilter`` | ||
| 3997 | - and register it with ``QPDF`` so that regular library methods, | ||
| 3998 | - including those used by ``QPDFWriter``, can decode streams with | ||
| 3999 | - filters not directly supported by the library. The example | ||
| 4000 | - :file:`examples/pdf-custom-filter.cc` | ||
| 4001 | - illustrates how to use this capability. | ||
| 4002 | - | ||
| 4003 | - - Add methods to ``QPDFPageObjectHelper`` to iterate through | ||
| 4004 | - XObjects on a page or form XObjects, possibly recursing into | ||
| 4005 | - nested form XObjects: ``forEachXObject``, ``forEachImage``, | ||
| 4006 | - ``forEachFormXObject``. | ||
| 4007 | - | ||
| 4008 | - - Enhance several methods in ``QPDFPageObjectHelper`` to work | ||
| 4009 | - with form XObjects as well as pages, as noted in comments. See | ||
| 4010 | - :file:`ChangeLog` for a full list. | ||
| 4011 | - | ||
| 4012 | - - Rename some functions in ``QPDFPageObjectHelper``, while | ||
| 4013 | - keeping old names for compatibility: | ||
| 4014 | - | ||
| 4015 | - - ``getPageImages`` to ``getImages`` | ||
| 4016 | - | ||
| 4017 | - - ``filterPageContents`` to ``filterContents`` | ||
| 4018 | - | ||
| 4019 | - - ``pipePageContents`` to ``pipeContents`` | ||
| 4020 | - | ||
| 4021 | - - ``parsePageContents`` to ``parseContents`` | ||
| 4022 | - | ||
| 4023 | - - Add method ``QPDFPageObjectHelper::getFormXObjects`` to return | ||
| 4024 | - a map of form XObjects directly on a page or form XObject | ||
| 4025 | - | ||
| 4026 | - - Add new helper methods to ``QPDFObjectHandle``: | ||
| 4027 | - ``isFormXObject``, ``isImage`` | ||
| 4028 | - | ||
| 4029 | - - Add the optional ``allow_streams`` parameter to | ||
| 4030 | - ``QPDFObjectHandle::makeDirect``. When | ||
| 4031 | - ``QPDFObjectHandle::makeDirect`` is called in this way, it | ||
| 4032 | - preserves references to streams rather than throwing an | ||
| 4033 | - exception. | ||
| 4034 | - | ||
| 4035 | - - Add ``QPDFObjectHandle::setFilterOnWrite`` method. Calling this | ||
| 4036 | - on a stream prevents ``QPDFWriter`` from attempting to | ||
| 4037 | - uncompress, recompress, or otherwise filter a stream even if it | ||
| 4038 | - could. Developers can use this to protect streams that are | ||
| 4039 | - optimized or should be protected from ``QPDFWriter``'s default | ||
| 4040 | - behavior for any other reason. | ||
| 4041 | - | ||
| 4042 | - - Add ``ostream`` ``<<`` operator for ``QPDFObjGen``. This is | ||
| 4043 | - useful to have for debugging. | ||
| 4044 | - | ||
| 4045 | - - Add method ``QPDFPageObjectHelper::flattenRotation``, which | ||
| 4046 | - replaces a page's ``/Rotate`` keyword by rotating the page | ||
| 4047 | - within the content stream and altering the page's bounding | ||
| 4048 | - boxes so the rendering is the same. This can be used to work | ||
| 4049 | - around buggy PDF readers that can't properly handle page | ||
| 4050 | - rotation. | ||
| 4051 | - | ||
| 4052 | - - C API Enhancements | ||
| 4053 | - | ||
| 4054 | - - Add several new functions to the C API for working with | ||
| 4055 | - objects. These are wrappers around many of the methods in | ||
| 4056 | - ``QPDFObjectHandle``. Their inclusion adds considerable new | ||
| 4057 | - capability to the C API. | ||
| 4058 | - | ||
| 4059 | - - Add ``qpdf_register_progress_reporter`` to the C API, | ||
| 4060 | - corresponding to ``QPDFWriter::registerProgressReporter``. | ||
| 4061 | - | ||
| 4062 | - - Performance Enhancements | ||
| 4063 | - | ||
| 4064 | - - Improve steps ``QPDFWriter`` takes to prepare a ``QPDF`` object | ||
| 4065 | - for writing, resulting in about an 8% improvement in write | ||
| 4066 | - performance while allowing indirect objects to appear in | ||
| 4067 | - ``/DecodeParms``. | ||
| 4068 | - | ||
| 4069 | - - When extracting pages, the :command:`qpdf` CLI | ||
| 4070 | - only removes unreferenced resources from the pages that are | ||
| 4071 | - being kept, resulting in a significant performance improvement | ||
| 4072 | - when extracting small numbers of pages from large, complex | ||
| 4073 | - documents. | ||
| 4074 | - | ||
| 4075 | - - Bug Fixes | ||
| 4076 | - | ||
| 4077 | - - ``QPDFPageObjectHelper::externalizeInlineImages`` was not | ||
| 4078 | - externalizing images referenced from form XObjects that | ||
| 4079 | - appeared on the page. | ||
| 4080 | - | ||
| 4081 | - - ``QPDFObjectHandle::filterPageContents`` was broken for pages | ||
| 4082 | - with multiple content streams. | ||
| 4083 | - | ||
| 4084 | - - Tweak zsh completion code to behave a little better with | ||
| 4085 | - respect to path completion. | ||
| 4086 | - | ||
| 4087 | -10.0.4: November 21, 2020 | ||
| 4088 | - - Bug Fixes | ||
| 4089 | - | ||
| 4090 | - - Fix a handful of integer overflows. This includes cases found | ||
| 4091 | - by fuzzing as well as having qpdf not do range checking on | ||
| 4092 | - unused values in the xref stream. | ||
| 4093 | - | ||
| 4094 | -10.0.3: October 31, 2020 | ||
| 4095 | - - Bug Fixes | ||
| 4096 | - | ||
| 4097 | - - The fix to the bug involving copying streams with indirect | ||
| 4098 | - filters was incorrect and introduced a new, more serious bug. | ||
| 4099 | - The original bug has been fixed correctly, as has the bug | ||
| 4100 | - introduced in 10.0.2. | ||
| 4101 | - | ||
| 4102 | -10.0.2: October 27, 2020 | ||
| 4103 | - - Bug Fixes | ||
| 4104 | - | ||
| 4105 | - - When concatenating content streams, as with | ||
| 4106 | - :samp:`--coalesce-contents`, there were cases | ||
| 4107 | - in which qpdf would merge two lexical tokens together, creating | ||
| 4108 | - invalid results. A newline is now inserted between merged | ||
| 4109 | - content streams if one is not already present. | ||
| 4110 | - | ||
| 4111 | - - Fix an internal error that could occur when copying foreign | ||
| 4112 | - streams whose stream data had been replaced using a stream data | ||
| 4113 | - provider if those streams had indirect filters or decode | ||
| 4114 | - parameters. This is a rare corner case. | ||
| 4115 | - | ||
| 4116 | - - Ensure that the caller's locale settings do not change the | ||
| 4117 | - results of numeric conversions performed internally by the qpdf | ||
| 4118 | - library. Note that the problem here could only be caused when | ||
| 4119 | - the qpdf library was used programmatically. Using the qpdf CLI | ||
| 4120 | - already ignored the user's locale for numeric conversion. | ||
| 4121 | - | ||
| 4122 | - - Fix several instances in which warnings were not suppressed in | ||
| 4123 | - spite of :samp:`--no-warn` and/or errors or | ||
| 4124 | - warnings were written to standard output rather than standard | ||
| 4125 | - error. | ||
| 4126 | - | ||
| 4127 | - - Fixed a memory leak that could occur under specific | ||
| 4128 | - circumstances when | ||
| 4129 | - :samp:`--object-streams=generate` was used. | ||
| 4130 | - | ||
| 4131 | - - Fix various integer overflows and similar conditions found by | ||
| 4132 | - the OSS-Fuzz project. | ||
| 4133 | - | ||
| 4134 | - - Enhancements | ||
| 4135 | - | ||
| 4136 | - - New option :samp:`--warning-exit-0` causes qpdf | ||
| 4137 | - to exit with a status of ``0`` rather than ``3`` if there are | ||
| 4138 | - warnings but no errors. Combine with | ||
| 4139 | - :samp:`--no-warn` to completely ignore | ||
| 4140 | - warnings. | ||
| 4141 | - | ||
| 4142 | - - Performance improvements have been made to | ||
| 4143 | - ``QPDF::processMemoryFile``. | ||
| 4144 | - | ||
| 4145 | - - The OpenSSL crypto provider produces more detailed error | ||
| 4146 | - messages. | ||
| 4147 | - | ||
| 4148 | - - Build Changes | ||
| 4149 | - | ||
| 4150 | - - The option :samp:`--disable-rpath` is now | ||
| 4151 | - supported by qpdf's :command:`./configure` | ||
| 4152 | - script. Some distributions' packaging standards recommend the | ||
| 4153 | - use of this option. | ||
| 4154 | - | ||
| 4155 | - - Selection of a printf format string for ``long long`` has | ||
| 4156 | - been moved from ``ifdefs`` to an autoconf | ||
| 4157 | - test. If you are using your own build system, you will need to | ||
| 4158 | - provide a value for ``LL_FMT`` in | ||
| 4159 | - :file:`libqpdf/qpdf/qpdf-config.h`, which | ||
| 4160 | - would typically be ``"%lld"`` or, for some Windows compilers, | ||
| 4161 | - ``"%I64d"``. | ||
| 4162 | - | ||
| 4163 | - - Several improvements were made to build-time configuration of | ||
| 4164 | - the OpenSSL crypto provider. | ||
| 4165 | - | ||
| 4166 | - - A nearly stand-alone Linux binary zip file is now included with | ||
| 4167 | - the qpdf release. This is built on an older (but supported) | ||
| 4168 | - Ubuntu LTS release, but would work on most reasonably recent | ||
| 4169 | - Linux distributions. It contains only the executables and | ||
| 4170 | - required shared libraries that would not be present on a | ||
| 4171 | - minimal system. It can be used for including qpdf in a minimal | ||
| 4172 | - environment, such as a docker container. The zip file is also | ||
| 4173 | - known to work as a layer in AWS Lambda. | ||
| 4174 | - | ||
| 4175 | - - QPDF's automated build has been migrated from Azure Pipelines | ||
| 4176 | - to GitHub Actions. | ||
| 4177 | - | ||
| 4178 | - - Windows-specific Changes | ||
| 4179 | - | ||
| 4180 | - - The Windows executables distributed with qpdf releases now use | ||
| 4181 | - the OpenSSL crypto provider by default. The native crypto | ||
| 4182 | - provider is also compiled in and can be selected at runtime | ||
| 4183 | - with the ``QPDF_CRYPTO_PROVIDER`` environment variable. | ||
| 4184 | - | ||
| 4185 | - - Improvements have been made to how a cryptographic provider is | ||
| 4186 | - obtained in the native Windows crypto implementation. However, | ||
| 4187 | - this is mostly overshadowed by OpenSSL being used by default. | ||
| 4188 | - | ||
| 4189 | -10.0.1: April 9, 2020 | ||
| 4190 | - - Bug Fixes | ||
| 4191 | - | ||
| 4192 | - - 10.0.0 introduced a bug in which calling | ||
| 4193 | - ``QPDFObjectHandle::getStreamData`` on a stream that can't be | ||
| 4194 | - filtered was returning the raw data instead of throwing an | ||
| 4195 | - exception. This is now fixed. | ||
| 4196 | - | ||
| 4197 | - - Fix a bug that was preventing qpdf from linking with some | ||
| 4198 | - versions of clang on some platforms. | ||
| 4199 | - | ||
| 4200 | - - Enhancements | ||
| 4201 | - | ||
| 4202 | - - Improve the :file:`pdf-invert-images` | ||
| 4203 | - example to avoid having to load all the images into RAM at the | ||
| 4204 | - same time. | ||
| 4205 | - | ||
| 4206 | -10.0.0: April 6, 2020 | ||
| 4207 | - - Performance Enhancements | ||
| 4208 | - | ||
| 4209 | - - The qpdf library and executable should run much faster in this | ||
| 4210 | - version than in the last several releases. Several internal | ||
| 4211 | - library optimizations have been made, and there has been | ||
| 4212 | - improved behavior on page splitting as well. This version of | ||
| 4213 | - qpdf should outperform any of the 8.x or 9.x versions. | ||
| 4214 | - | ||
| 4215 | - - Incompatible API (source-level) Changes (minor) | ||
| 4216 | - | ||
| 4217 | - - The ``QUtil::srandom`` method was removed. It didn't do | ||
| 4218 | - anything unless insecure random numbers were compiled in, and | ||
| 4219 | - they have been off by default for a long time. If you were | ||
| 4220 | - calling it, just remove the call since it wasn't doing anything | ||
| 4221 | - anyway. | ||
| 4222 | - | ||
| 4223 | - - Build/Packaging Changes | ||
| 4224 | - | ||
| 4225 | - - Add an ``openssl`` crypto provider, which is implemented with | ||
| 4226 | - OpenSSL and also works with BoringSSL. Thanks to Dean Scarff | ||
| 4227 | - for this contribution. If you maintain qpdf for a distribution, | ||
| 4228 | - pay special attention to make sure that you are including | ||
| 4229 | - support for the crypto providers you want. Package maintainers | ||
| 4230 | - will have to weigh the advantages of allowing users to pick a | ||
| 4231 | - crypto provider at runtime against the disadvantages of adding | ||
| 4232 | - more dependencies to qpdf. | ||
| 4233 | - | ||
| 4234 | - - Allow qpdf to be built on stripped down systems whose C/C++ | ||
| 4235 | - libraries lack the ``wchar_t`` type. Search for ``wchar_t`` in | ||
| 4236 | - qpdf's README.md for details. This should be very rare, but it | ||
| 4237 | - is known to be helpful in some embedded environments. | ||
| 4238 | - | ||
| 4239 | - - CLI Enhancements | ||
| 4240 | - | ||
| 4241 | - - Add ``objectinfo`` key to the JSON output. This will be a place | ||
| 4242 | - to put computed metadata or other information about PDF objects | ||
| 4243 | - that are not immediately evident in other ways or that seem | ||
| 4244 | - useful for some other reason. In this version, information is | ||
| 4245 | - provided about each object indicating whether it is a stream | ||
| 4246 | - and, if so, what its length and filters are. Without this, it | ||
| 4247 | - was not possible to tell conclusively from the JSON output | ||
| 4248 | - alone whether or not an object was a stream. Run | ||
| 4249 | - :command:`qpdf --json-help` for details. | ||
| 4250 | - | ||
| 4251 | - - Add new option | ||
| 4252 | - :samp:`--remove-unreferenced-resources` which | ||
| 4253 | - takes ``auto``, ``yes``, or ``no`` as arguments. The new | ||
| 4254 | - ``auto`` mode, which is the default, performs a fast heuristic | ||
| 4255 | - over a PDF file when splitting pages to determine whether the | ||
| 4256 | - expensive process of finding and removing unreferenced | ||
| 4257 | - resources is likely to be of benefit. For most files, this new | ||
| 4258 | - default will result in a significant performance improvement | ||
| 4259 | - for splitting pages. See :ref:`ref.advanced-transformation` for a more detailed | ||
| 4260 | - discussion. | ||
| 4261 | - | ||
| 4262 | - - The :samp:`--preserve-unreferenced-resources` | ||
| 4263 | - option is now just a synonym for | ||
| 4264 | - :samp:`--remove-unreferenced-resources=no`. | ||
| 4265 | - | ||
| 4266 | - - If the ``QPDF_EXECUTABLE`` environment variable is set when | ||
| 4267 | - invoking :command:`qpdf --completion-bash` or | ||
| 4268 | - :command:`qpdf --completion-zsh`, the completion | ||
| 4269 | - command that it outputs will refer to qpdf using the value of | ||
| 4270 | - that variable rather than what :command:`qpdf` | ||
| 4271 | - determines its executable path to be. This can be useful when | ||
| 4272 | - wrapping :command:`qpdf` with a script, working | ||
| 4273 | - with a version in the source tree, using an AppImage, or other | ||
| 4274 | - situations where there is some indirection. | ||
| 4275 | - | ||
| 4276 | - - Library Enhancements | ||
| 4277 | - | ||
| 4278 | - - Random number generation is now delegated to the crypto | ||
| 4279 | - provider. The old behavior is still used by the native crypto | ||
| 4280 | - provider. It is still possible to provide your own random | ||
| 4281 | - number generator. | ||
| 4282 | - | ||
| 4283 | - - Add a new version of | ||
| 4284 | - ``QPDFObjectHandle::StreamDataProvider::provideStreamData`` | ||
| 4285 | - that accepts the ``suppress_warnings`` and ``will_retry`` | ||
| 4286 | - options and allows a success code to be returned. This makes it | ||
| 4287 | - possible to implement a ``StreamDataProvider`` that calls | ||
| 4288 | - ``pipeStreamData`` on another stream and to pass the response | ||
| 4289 | - back to the caller, which enables better error handling on | ||
| 4290 | - those proxied streams. | ||
| 4291 | - | ||
| 4292 | - - Update ``QPDFObjectHandle::pipeStreamData`` to return an | ||
| 4293 | - overall success code that goes beyond whether or not filtered | ||
| 4294 | - data was written successfully. This allows better error | ||
| 4295 | - handling of cases that were not filtering errors. You have to | ||
| 4296 | - call this explicitly. Methods in previously existing APIs have | ||
| 4297 | - the same semantics as before. | ||
| 4298 | - | ||
| 4299 | - - The ``QPDFPageObjectHelper::placeFormXObject`` method now | ||
| 4300 | - allows separate control over whether it should be willing to | ||
| 4301 | - shrink or expand objects to fit them better into the | ||
| 4302 | - destination rectangle. The previous behavior was that shrinking | ||
| 4303 | - was allowed but expansion was not. The previous behavior is | ||
| 4304 | - still the default. | ||
| 4305 | - | ||
| 4306 | - - When calling the C API, any non-zero value passed to a boolean | ||
| 4307 | - parameter is treated as ``TRUE``. Previously only the value | ||
| 4308 | - ``1`` was accepted. This makes the C API behave more like most | ||
| 4309 | - C interfaces and is known to improve compatibility with some | ||
| 4310 | - Windows environments that dynamically load the DLL and call | ||
| 4311 | - functions from it. | ||
| 4312 | - | ||
| 4313 | - - Add ``QPDFObjectHandle::unsafeShallowCopy`` for copying only | ||
| 4314 | - top-level dictionary keys or array items. This is unsafe | ||
| 4315 | - because it creates a situation in which changing a lower-level | ||
| 4316 | - item in one object may also change it in another object, but | ||
| 4317 | - for cases in which you *know* you are only inserting or | ||
| 4318 | - replacing top-level items, it is much faster than | ||
| 4319 | - ``QPDFObjectHandle::shallowCopy``. | ||
| 4320 | - | ||
| 4321 | - - Add ``QPDFObjectHandle::filterAsContents``, which filters a | ||
| 4322 | - stream's data as a content stream. This is useful for parsing | ||
| 4323 | - the contents for form XObjects in the same way as parsing page | ||
| 4324 | - content streams. | ||
| 4325 | - | ||
| 4326 | - - Bug Fixes | ||
| 4327 | - | ||
| 4328 | - - When detecting and removing unreferenced resources during page | ||
| 4329 | - splitting, traverse into form XObjects and handle their | ||
| 4330 | - resources dictionaries as well. | ||
| 4331 | - | ||
| 4332 | - - The same error recovery is applied to streams in files other | ||
| 4333 | - than the primary input file when merging or splitting pages. | ||
| 4334 | - | ||
| 4335 | -9.1.1: January 26, 2020 | ||
| 4336 | - - Build/Packaging Changes | ||
| 4337 | - | ||
| 4338 | - - The fix-qdf program was converted from perl to C++. As such, | ||
| 4339 | - qpdf no longer has a runtime dependency on perl. | ||
| 4340 | - | ||
| 4341 | - - Library Enhancements | ||
| 4342 | - | ||
| 4343 | - - Added new helper routine ``QUtil::call_main_from_wmain`` which | ||
| 4344 | - converts ``wchar_t`` arguments to UTF-8 encoded strings. This | ||
| 4345 | - is useful for qpdf because library methods expect file names to | ||
| 4346 | - be UTF-8 encoded, even on Windows. | ||
| 4347 | - | ||
| 4348 | - - Added new ``QUtil::read_lines_from_file`` methods that take | ||
| 4349 | - ``FILE*`` arguments and that allow preservation of end-of-line | ||
| 4350 | - characters. This also fixes a bug where | ||
| 4351 | - ``QUtil::read_lines_from_file`` wouldn't work properly with | ||
| 4352 | - Unicode filenames. | ||
| 4353 | - | ||
| 4354 | - - CLI Enhancements | ||
| 4355 | - | ||
| 4356 | - - Added options :samp:`--is-encrypted` and | ||
| 4357 | - :samp:`--requires-password` for testing whether | ||
| 4358 | - a file is encrypted or requires a password other than the | ||
| 4359 | - supplied (or empty) password. These communicate via exit | ||
| 4360 | - status, making them useful for shell scripts. They also work on | ||
| 4361 | - encrypted files with unknown passwords. | ||
| 4362 | - | ||
| 4363 | - - Added ``encrypt`` key to JSON options. With the exception of | ||
| 4364 | - the reconstructed user password for older encryption formats, | ||
| 4365 | - this provides the same information as | ||
| 4366 | - :samp:`--show-encryption` but in a consistent, | ||
| 4367 | - parseable format. See output of :command:`qpdf | ||
| 4368 | - --json-help` for details. | ||
| 4369 | - | ||
| 4370 | - - Bug Fixes | ||
| 4371 | - | ||
| 4372 | - - In QDF mode, be sure not to write more than one XRef stream to | ||
| 4373 | - a file, even when | ||
| 4374 | - :samp:`--preserve-unreferenced` is used. | ||
| 4375 | - :command:`fix-qdf` assumes that there is only | ||
| 4376 | - one XRef stream, and that it appears at the end of the file. | ||
| 4377 | - | ||
| 4378 | - - When externalizing inline images, properly handle images whose | ||
| 4379 | - color space is a reference to an object in the page's resource | ||
| 4380 | - dictionary. | ||
| 4381 | - | ||
| 4382 | - - Windows-specific fix for acquiring crypt context with a new | ||
| 4383 | - keyset. | ||
| 4384 | - | ||
| 4385 | -9.1.0: November 17, 2019 | ||
| 4386 | - - Build Changes | ||
| 4387 | - | ||
| 4388 | - - A C++11 compiler is now required to build qpdf. | ||
| 4389 | - | ||
| 4390 | - - A new crypto provider that uses gnutls for crypto functions is | ||
| 4391 | - now available and can be enabled at build time. See :ref:`ref.crypto` for more information about crypto | ||
| 4392 | - providers and :ref:`ref.crypto.build` for specific information about | ||
| 4393 | - the build. | ||
| 4394 | - | ||
| 4395 | - - Library Enhancements | ||
| 4396 | - | ||
| 4397 | - - Incorporate contribution from Masamichi Hosoda to properly | ||
| 4398 | - handle signature dictionaries by not including them in object | ||
| 4399 | - streams, formatting the ``/Contents`` key as a hexadecimal | ||
| 4400 | - string, and excluding the ``/Contents`` key from encryption and | ||
| 4401 | - decryption. | ||
| 4402 | - | ||
| 4403 | - - Incorporate contribution from Masamichi Hosoda to provide new | ||
| 4404 | - API calls for getting file-level information about input and | ||
| 4405 | - output files, enabling certain operations on the files at the | ||
| 4406 | - file level rather than the object level. New methods include | ||
| 4407 | - ``QPDF::getXRefTable()``, | ||
| 4408 | - ``QPDFObjectHandle::getParsedOffset()``, | ||
| 4409 | - ``QPDFWriter::getRenumberedObjGen(QPDFObjGen)``, and | ||
| 4410 | - ``QPDFWriter::getWrittenXRefTable()``. | ||
| 4411 | - | ||
| 4412 | - - Support build-time and runtime selectable crypto providers. | ||
| 4413 | - This includes the addition of new classes | ||
| 4414 | - ``QPDFCryptoProvider`` and ``QPDFCryptoImpl`` and the | ||
| 4415 | - recognition of the ``QPDF_CRYPTO_PROVIDER`` environment | ||
| 4416 | - variable. Crypto providers are described in depth in :ref:`ref.crypto`. | ||
| 4417 | - | ||
| 4418 | - - CLI Enhancements | ||
| 4419 | - | ||
| 4420 | - - Addition of the :samp:`--show-crypto` option in | ||
| 4421 | - support of selectable crypto providers, as described in :ref:`ref.crypto`. | ||
| 4422 | - | ||
| 4423 | - - Allow ``:even`` or ``:odd`` to be appended to numeric ranges | ||
| 4424 | - for specification of the even or odd pages from among the pages | ||
| 4425 | - specified in the range. | ||
| 4426 | - | ||
| 4427 | - - Fix shell wildcard expansion behavior (``*`` and ``?``) of the | ||
| 4428 | - :command:`qpdf.exe` as built by MSVC. | ||
| 4429 | - | ||
| 4430 | -9.0.2: October 12, 2019 | ||
| 4431 | - - Bug Fix | ||
| 4432 | - | ||
| 4433 | - - Fix the name of the temporary file used by | ||
| 4434 | - :samp:`--replace-input` so that it doesn't | ||
| 4435 | - require path splitting and works with paths that include | ||
| 4436 | - directories. | ||
| 4437 | - | ||
| 4438 | -9.0.1: September 20, 2019 | ||
| 4439 | - - Bug Fixes/Enhancements | ||
| 4440 | - | ||
| 4441 | - - Fix some build and test issues on big-endian systems and | ||
| 4442 | - compilers with characters that are unsigned by default. The | ||
| 4443 | - problems were in build and test only. There were no actual bugs | ||
| 4444 | - in the qpdf library itself relating to endianness or unsigned | ||
| 4445 | - characters. | ||
| 4446 | - | ||
| 4447 | - - When a dictionary has a duplicated key, report this with a | ||
| 4448 | - warning. The behavior of the library in this case is unchanged, | ||
| 4449 | - but the error condition is no longer silently ignored. | ||
| 4450 | - | ||
| 4451 | - - When a form field's display rectangle is erroneously specified | ||
| 4452 | - with inverted coordinates, detect and correct this situation. | ||
| 4453 | - This prevents some form fields from being flipped when flattening | ||
| 4454 | - annotations on files with this condition. | ||
| 4455 | - | ||
| 4456 | -9.0.0: August 31, 2019 | ||
| 4457 | - - Incompatible API (source-level) Changes (minor) | ||
| 4458 | - | ||
| 4459 | - - The method ``QUtil::strcasecmp`` has been renamed to | ||
| 4460 | - ``QUtil::str_compare_nocase``. This incompatible change is | ||
| 4461 | - necessary to enable qpdf to build on platforms that define | ||
| 4462 | - ``strcasecmp`` as a macro. | ||
| 4463 | - | ||
| 4464 | - - The ``QPDF::copyForeignObject`` method had an overloaded | ||
| 4465 | - version that took a boolean parameter that was not used. If you | ||
| 4466 | - were using this version, just omit the extra parameter. | ||
| 4467 | - | ||
| 4468 | - - There was a version of ``QPDFTokenizer::expectInlineImage`` that | ||
| 4469 | - took no arguments. This version has been removed since it | ||
| 4470 | - caused the tokenizer to return incorrect inline images. A new | ||
| 4471 | - version was added some time ago that produces correct output. | ||
| 4472 | - This is a very low level method that doesn't make sense to call | ||
| 4473 | - outside of qpdf's lexical engine. There are higher level | ||
| 4474 | - methods for tokenizing content streams. | ||
| 4475 | - | ||
| 4476 | - - Change ``QPDFOutlineDocumentHelper::getTopLevelOutlines`` and | ||
| 4477 | - ``QPDFOutlineObjectHelper::getKids`` to return a | ||
| 4478 | - ``std::vector`` instead of a ``std::list`` of | ||
| 4479 | - ``QPDFOutlineObjectHelper`` objects. | ||
| 4480 | - | ||
| 4481 | - - Remove method ``QPDFTokenizer::allowPoundAnywhereInName``. This | ||
| 4482 | - function would allow creation of name tokens whose value would | ||
| 4483 | - change when unparsed, which is never the correct behavior. | ||
| 4484 | - | ||
| 4485 | - - CLI Enhancements | ||
| 4486 | - | ||
| 4487 | - - The :samp:`--replace-input` option may be given | ||
| 4488 | - in place of an output file name. This causes qpdf to overwrite | ||
| 4489 | - the input file with the output. See the description of | ||
| 4490 | - :samp:`--replace-input` in :ref:`ref.basic-options` for more details. | ||
| 4491 | - | ||
| 4492 | - - The :samp:`--recompress-flate` option instructs | ||
| 4493 | - :command:`qpdf` to recompress streams that are | ||
| 4494 | - already compressed with ``/FlateDecode``. Useful with | ||
| 4495 | - :samp:`--compression-level`. | ||
| 4496 | - | ||
| 4497 | - - The | ||
| 4498 | - :samp:`--compression-level={level}` | ||
| 4499 | - option sets the zlib compression level used for any streams compressed | ||
| 4500 | - by ``/FlateDecode``. Most effective when combined with | ||
| 4501 | - :samp:`--recompress-flate`. | ||
| 4502 | - | ||
| 4503 | - - Library Enhancements | ||
| 4504 | - | ||
| 4505 | - - A new namespace ``QIntC``, provided by | ||
| 4506 | - :file:`qpdf/QIntC.hh`, provides safe | ||
| 4507 | - conversion methods between different integer types. These | ||
| 4508 | - conversion methods do range checking to ensure that the cast | ||
| 4509 | - can be performed with no loss of information. Every use of | ||
| 4510 | - ``static_cast`` in the library was inspected to see if it could | ||
| 4511 | - use one of these safe converters instead. See :ref:`ref.casting` for additional details. | ||
| 4512 | - | ||
| 4513 | - - Method ``QPDF::anyWarnings`` tells whether there have been any | ||
| 4514 | - warnings without clearing the list of warnings. | ||
| 4515 | - | ||
| 4516 | - - Method ``QPDF::closeInputSource`` closes or otherwise releases | ||
| 4517 | - the input source. This enables the input file to be deleted or | ||
| 4518 | - renamed. | ||
| 4519 | - | ||
| 4520 | - - New methods have been added to ``QUtil`` for converting back | ||
| 4521 | - and forth between strings and unsigned integers: | ||
| 4522 | - ``uint_to_string``, ``uint_to_string_base``, | ||
| 4523 | - ``string_to_uint``, and ``string_to_ull``. | ||
| 4524 | - | ||
| 4525 | - - New methods have been added to ``QPDFObjectHandle`` that return | ||
| 4526 | - the value of ``Integer`` objects as ``int`` or ``unsigned int`` | ||
| 4527 | - with range checking and sensible fallback values, and a new | ||
| 4528 | - method was added to return an unsigned value. This makes it | ||
| 4529 | - easier to write code that is safe from unintentional data loss. | ||
| 4530 | - Functions: ``getUIntValue``, ``getIntValueAsInt``, | ||
| 4531 | - ``getUIntValueAsUInt``. | ||
| 4532 | - | ||
| 4533 | - - When parsing content streams with | ||
| 4534 | - ``QPDFObjectHandle::ParserCallbacks``, in place of the method | ||
| 4535 | - ``handleObject(QPDFObjectHandle)``, the developer may override | ||
| 4536 | - ``handleObject(QPDFObjectHandle, size_t offset, size_t | ||
| 4537 | - length)``. If this method is defined, it will | ||
| 4538 | - be invoked with the object along with its offset and length | ||
| 4539 | - within the overall contents being parsed. Intervening spaces | ||
| 4540 | - and comments are not included in offset and length. | ||
| 4541 | - Additionally, a new method ``contentSize(size_t)`` may be | ||
| 4542 | - implemented. If present, it will be called prior to the first | ||
| 4543 | - call to ``handleObject`` with the total size in bytes of the | ||
| 4544 | - combined contents. | ||
| 4545 | - | ||
| 4546 | - - New methods ``QPDF::userPasswordMatched`` and | ||
| 4547 | - ``QPDF::ownerPasswordMatched`` have been added to enable a | ||
| 4548 | - caller to determine whether the supplied password was the user | ||
| 4549 | - password, the owner password, or both. This information is also | ||
| 4550 | - displayed by :command:`qpdf --show-encryption` | ||
| 4551 | - and :command:`qpdf --check`. | ||
| 4552 | - | ||
| 4553 | - - Static method ``Pl_Flate::setCompressionLevel`` can be called | ||
| 4554 | - to set the zlib compression level globally used by all | ||
| 4555 | - instances of ``Pl_Flate`` in deflate mode. | ||
| 4556 | - | ||
| 4557 | - - The method ``QPDFWriter::setRecompressFlate`` can be called to | ||
| 4558 | - tell ``QPDFWriter`` to uncompress and recompress streams | ||
| 4559 | - already compressed with ``/FlateDecode``. | ||
| 4560 | - | ||
| 4561 | - - The underlying implementation of QPDF arrays has been enhanced | ||
| 4562 | - to be much more memory efficient when dealing with arrays with | ||
| 4563 | - lots of nulls. This enables qpdf to use drastically less memory | ||
| 4564 | - for certain types of files. | ||
| 4565 | - | ||
| 4566 | - - When traversing the pages tree, if nodes are encountered with | ||
| 4567 | - invalid types, the types are fixed, and a warning is issued. | ||
| 4568 | - | ||
| 4569 | - - A new helper method ``QUtil::read_file_into_memory`` was added. | ||
| 4570 | - | ||
| 4571 | - - All conditions previously reported by | ||
| 4572 | - ``QPDF::checkLinearization()`` as errors are now presented as | ||
| 4573 | - warnings. | ||
| 4574 | - | ||
| 4575 | - - Name tokens containing the ``#`` character not preceded by two | ||
| 4576 | - hexadecimal digits, which is invalid in PDF 1.2 and above, are | ||
| 4577 | - properly handled by the library: a warning is generated, and | ||
| 4578 | - the name token is properly preserved, even if invalid, in the | ||
| 4579 | - output. See :file:`ChangeLog` for a more | ||
| 4580 | - complete description of this change. | ||
| 4581 | - | ||
| 4582 | - - Bug Fixes | ||
| 4583 | - | ||
| 4584 | - - A small handful of memory issues, assertion failures, and | ||
| 4585 | - unhandled exceptions that could occur on badly mangled input | ||
| 4586 | - files have been fixed. Most of these problems were found by | ||
| 4587 | - Google's OSS-Fuzz project. | ||
| 4588 | - | ||
| 4589 | - - When :command:`qpdf --check` or | ||
| 4590 | - :command:`qpdf --check-linearization` encounters | ||
| 4591 | - a file with linearization warnings but not errors, it now | ||
| 4592 | - properly exits with exit code 3 instead of 2. | ||
| 4593 | - | ||
| 4594 | - - The :samp:`--completion-bash` and | ||
| 4595 | - :samp:`--completion-zsh` options now work | ||
| 4596 | - properly when qpdf is invoked as an AppImage. | ||
| 4597 | - | ||
| 4598 | - - Calling ``QPDFWriter::set*EncryptionParameters`` on a | ||
| 4599 | - ``QPDFWriter`` object whose output filename has not yet been | ||
| 4600 | - set no longer produces a segmentation fault. | ||
| 4601 | - | ||
| 4602 | - - When reading encrypted files, follow the spec more closely | ||
| 4603 | - regarding encryption key length. This allows qpdf to open | ||
| 4604 | - encrypted files in most cases when they have invalid or missing | ||
| 4605 | - /Length keys in the encryption dictionary. | ||
| 4606 | - | ||
| 4607 | - - Build Changes | ||
| 4608 | - | ||
| 4609 | - - On platforms that support it, qpdf now builds with | ||
| 4610 | - :samp:`-fvisibility=hidden`. If you build qpdf | ||
| 4611 | - with your own build system, this is now safe to use. This | ||
| 4612 | - prevents methods that are not part of the public API from being | ||
| 4613 | - exported by the shared library, and makes qpdf's ELF shared | ||
| 4614 | - libraries (used on Linux, MacOS, and most other UNIX flavors) | ||
| 4615 | - behave more like the Windows DLL. Since the DLL already behaves | ||
| 4616 | - in much this way, it is unlikely that there are any methods | ||
| 4617 | - that were accidentally not exported. However, with ELF shared | ||
| 4618 | - libraries, typeinfo for some classes has to be explicitly | ||
| 4619 | - exported. If there are problems in dynamically linked code | ||
| 4620 | - catching exceptions or subclassing, this could be the reason. | ||
| 4621 | - If you see this, please report a bug at | ||
| 4622 | - https://github.com/qpdf/qpdf/issues/. | ||
| 4623 | - | ||
| 4624 | - - QPDF is now compiled with integer conversion and sign | ||
| 4625 | - conversion warnings enabled. Numerous changes were made to the | ||
| 4626 | - library to make this safe. | ||
| 4627 | - | ||
| 4628 | - - QPDF's :command:`make install` target explicitly | ||
| 4629 | - specifies the mode to use when installing files instead of | ||
| 4630 | - relying on the user's umask. It was previously doing this for some | ||
| 4631 | - files but not others. | ||
| 4632 | - | ||
| 4633 | - - If :command:`pkg-config` is available, use it to | ||
| 4634 | - locate :file:`libjpeg` and | ||
| 4635 | - :file:`zlib` dependencies, falling back on | ||
| 4636 | - old behavior if unsuccessful. | ||
| 4637 | - | ||
| 4638 | - - Other Notes | ||
| 4639 | - | ||
| 4640 | - - QPDF has been fully integrated into `Google's OSS-Fuzz | ||
| 4641 | - project <https://github.com/google/oss-fuzz>`__. This project | ||
| 4642 | - exercises code with randomly mutated inputs and is great for | ||
| 4643 | - discovering hidden crashes and security issues. | ||
| 4644 | - Several bugs found by oss-fuzz have already been fixed in qpdf. | ||
| 4645 | - | ||
| 4646 | -8.4.2: May 18, 2019 | ||
| 4647 | - This release has just one change: correction of a buffer overrun in | ||
| 4648 | - the Windows code used to open files. Windows users should take this | ||
| 4649 | - update. There are no code changes that affect non-Windows releases. | ||
| 4650 | - | ||
| 4651 | -8.4.1: April 27, 2019 | ||
| 4652 | - - Enhancements | ||
| 4653 | - | ||
| 4654 | - - When :command:`qpdf --version` is run, it will | ||
| 4655 | - detect if the qpdf CLI was built with a different version of | ||
| 4656 | - qpdf than the library, which may indicate a problem with the | ||
| 4657 | - installation. | ||
| 4658 | - | ||
| 4659 | - - New option :samp:`--remove-page-labels` will | ||
| 4660 | - remove page labels before generating output. This used to | ||
| 4661 | - happen if you ran :command:`qpdf --empty --pages .. | ||
| 4662 | - --`, but the behavior changed in qpdf 8.3.0. This | ||
| 4663 | - option enables people who were relying on the old behavior to | ||
| 4664 | - get it again. | ||
| 4665 | - | ||
| 4666 | - - New option | ||
| 4667 | - :samp:`--keep-files-open-threshold={count}` | ||
| 4668 | - can be used to override the number of files that qpdf will use to | ||
| 4669 | - trigger the behavior of not keeping all files open when merging | ||
| 4670 | - files. This may be necessary if your system allows fewer than | ||
| 4671 | - the default value of 200 files to be open at the same time. | ||
| 4672 | - | ||
| 4673 | - - Bug Fixes | ||
| 4674 | - | ||
| 4675 | - - Handle Unicode characters in filenames on Windows. The changes | ||
| 4676 | - to support Unicode on the CLI in Windows broke Unicode | ||
| 4677 | - filenames for Windows. | ||
| 4678 | - | ||
| 4679 | - - Slightly tighten logic that determines whether an object is a | ||
| 4680 | - page. This should resolve problems in some rare files where | ||
| 4681 | - some non-page objects were passing qpdf's test for whether | ||
| 4682 | - something was a page, thus causing them to be erroneously lost | ||
| 4683 | - during page splitting operations. | ||
| 4684 | - | ||
| 4685 | - - Revert change that included preservation of outlines | ||
| 4686 | - (bookmarks) in :samp:`--split-pages`. The way | ||
| 4687 | - it was implemented in 8.3.0 and 8.4.0 caused a very significant | ||
| 4688 | - degradation of performance for splitting certain files. A | ||
| 4689 | - future release of qpdf may re-introduce the behavior in a more | ||
| 4690 | - performant and also more correct fashion. | ||
| 4691 | - | ||
| 4692 | - - In JSON mode, add missing leading 0 to decimal values between | ||
| 4693 | - -1 and 1 even if not present in the input. The JSON | ||
| 4694 | - specification requires the leading 0. The PDF specification | ||
| 4695 | - does not. | ||
| 4696 | - | ||
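The leading-zero fix above can be sketched in a few lines of Python. This is an illustrative helper only, not the code qpdf uses; the name ``add_leading_zero`` is hypothetical, and it handles only the two spellings the note mentions (values between -1 and 1 written without a leading ``0``).

```python
import json


def add_leading_zero(pdf_number):
    """Return a JSON-compatible spelling of a PDF real such as '.5' or '-.5'."""
    if pdf_number.startswith("."):
        return "0" + pdf_number
    if pdf_number.startswith("-."):
        return "-0" + pdf_number[1:]
    return pdf_number


# A strict JSON parser accepts the normalized spelling.
value = json.loads(add_leading_zero("-.25"))
```

Python's own ``json`` module rejects the bare ``.5`` form, which is the kind of downstream failure the change avoids.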
| 4697 | -8.4.0: February 1, 2019 | ||
| 4698 | - - Command-line Enhancements | ||
| 4699 | - | ||
| 4700 | - - *Non-compatible CLI change:* The qpdf command-line tool | ||
| 4701 | - interprets passwords given at the command-line differently from | ||
| 4702 | - previous releases when the passwords contain non-ASCII | ||
| 4703 | - characters. In some cases, the behavior differs from previous | ||
| 4704 | - releases. For a discussion of the current behavior, please see | ||
| 4705 | - :ref:`ref.unicode-passwords`. The | ||
| 4706 | - incompatibilities are as follows: | ||
| 4707 | - | ||
| 4708 | - - On Windows, qpdf now receives all command-line options as | ||
| 4709 | - Unicode strings if it can figure out the appropriate | ||
| 4710 | - compile/link options. This is enabled at least for MSVC and | ||
| 4711 | - mingw builds. That means that if non-ASCII strings are | ||
| 4712 | - passed to the qpdf CLI in Windows, qpdf will now correctly | ||
| 4713 | - receive them. In the past, they would have either been | ||
| 4714 | - encoded as Windows code page 1252 (also known as "Windows | ||
| 4715 | - ANSI") or as something unintelligible. In almost all cases, | ||
| 4716 | - qpdf is able to properly interpret Unicode arguments now, | ||
| 4717 | - whereas in the past, it would almost never interpret them | ||
| 4718 | - properly. The result is that non-ASCII passwords given to | ||
| 4719 | - the qpdf CLI on Windows now have a much greater chance of | ||
| 4720 | - creating PDF files that can be opened by a variety of | ||
| 4721 | - readers. In the past, usually files encrypted from the | ||
| 4722 | - Windows CLI using non-ASCII passwords would not be readable | ||
| 4723 | - by most viewers. Note that the current version of qpdf is | ||
| 4724 | - able to decrypt files that it previously created using the | ||
| 4725 | - previously supplied password. | ||
| 4726 | - | ||
| 4727 | - - The PDF specification requires passwords to be encoded as | ||
| 4728 | - UTF-8 for 256-bit encryption and with PDF Doc encoding for | ||
| 4729 | - 40-bit or 128-bit encryption. Older versions of qpdf left it | ||
| 4730 | - up to the user to provide passwords with the correct | ||
| 4731 | - encoding. The qpdf CLI now detects when a password is given | ||
| 4732 | - with UTF-8 encoding and automatically transcodes it to what | ||
| 4733 | - the PDF spec requires. While this is almost always the | ||
| 4734 | - correct behavior, it is possible to override the behavior if | ||
| 4735 | - there is some reason to do so. This is discussed in more | ||
| 4736 | - depth in :ref:`ref.unicode-passwords`. | ||
| 4737 | - | ||
| 4738 | - - New options | ||
| 4739 | - :samp:`--externalize-inline-images`, | ||
| 4740 | - :samp:`--ii-min-bytes`, and | ||
| 4741 | - :samp:`--keep-inline-images` control qpdf's | ||
| 4742 | - handling of inline images and possible conversion of them to | ||
| 4743 | - regular images. By default, | ||
| 4744 | - :samp:`--optimize-images` now also applies to | ||
| 4745 | - inline images. These options are discussed in :ref:`ref.advanced-transformation`. | ||
| 4746 | - | ||
| 4747 | - - Add options :samp:`--overlay` and | ||
| 4748 | - :samp:`--underlay` for overlaying or | ||
| 4749 | - underlaying pages of other files onto output pages. See | ||
| 4750 | - :ref:`ref.overlay-underlay` for | ||
| 4751 | - details. | ||
| 4752 | - | ||
| 4753 | - - When opening an encrypted file with a password, if the | ||
| 4754 | - specified password doesn't work and the password contains any | ||
| 4755 | - non-ASCII characters, qpdf will try a number of alternative | ||
| 4756 | - passwords to try to compensate for possible character encoding | ||
| 4757 | - errors. This behavior can be suppressed with the | ||
| 4758 | - :samp:`--suppress-password-recovery` option. | ||
| 4759 | - See :ref:`ref.unicode-passwords` for a full | ||
| 4760 | - discussion. | ||
| 4761 | - | ||
| 4762 | - - Add the :samp:`--password-mode` option to | ||
| 4763 | - fine-tune how qpdf interprets password arguments, especially | ||
| 4764 | - when they contain non-ASCII characters. See :ref:`ref.unicode-passwords` for more information. | ||
| 4765 | - | ||
| 4766 | - - In the :samp:`--pages` option, it is now | ||
| 4767 | - possible to copy the same page more than once from the same | ||
| 4768 | - file without using the previous workaround of specifying two | ||
| 4769 | - different paths to the same file. | ||
| 4770 | - | ||
| 4771 | - - In the :samp:`--pages` option, allow use of "." | ||
| 4772 | - as a shortcut for the primary input file. That way, you can do | ||
| 4773 | - :command:`qpdf in.pdf --pages . 1-2 -- out.pdf` | ||
| 4774 | - instead of having to repeat :file:`in.pdf` | ||
| 4775 | - in the command. | ||
| 4776 | - | ||
| 4777 | - - When encrypting with 128-bit and 256-bit encryption, new | ||
| 4778 | - encryption options :samp:`--assemble`, | ||
| 4779 | - :samp:`--annotate`, | ||
| 4780 | - :samp:`--form`, and | ||
| 4781 | - :samp:`--modify-other` allow finer-grained | ||
| 4782 | - control over permissions. Before, the | ||
| 4783 | - :samp:`--modify` option only configured certain | ||
| 4784 | - predefined groups of permissions. | ||
| 4785 | - | ||
| 4786 | - - Bug Fixes and Enhancements | ||
| 4787 | - | ||
| 4788 | - - *Potential data-loss bug:* Versions of qpdf between 8.1.0 and | ||
| 4789 | - 8.3.0 had a bug that could cause page splitting and merging | ||
| 4790 | - operations to drop some font or image resources if the PDF | ||
| 4791 | - file's internal structure shared these resource lists across | ||
| 4792 | - pages and if some but not all of the pages in the output did | ||
| 4793 | - not reference all the fonts and images. Using the | ||
| 4794 | - :samp:`--preserve-unreferenced-resources` | ||
| 4795 | - option would work around the incorrect behavior. This bug was | ||
| 4796 | - the result of a typo in the code and a deficiency in the test | ||
| 4797 | - suite. The case that triggered the error was known, just not | ||
| 4798 | - handled properly. This case is now exercised in qpdf's test | ||
| 4799 | - suite and properly handled. | ||
| 4800 | - | ||
| 4801 | - - When optimizing images, detect and refuse to optimize images | ||
| 4802 | - that can't be converted to JPEG because of bit depth or color | ||
| 4803 | - space. | ||
| 4804 | - | ||
| 4805 | - - Linearization and page manipulation APIs now detect and recover | ||
| 4806 | - from files that have duplicate Page objects in the pages tree. | ||
| 4807 | - | ||
| 4808 | - - Bug fix: when using the older option | ||
| 4809 | - :samp:`--stream-data=compress` with object | ||
| 4810 | - streams, object streams and xref streams were not compressed. | ||
| 4811 | - | ||
| 4812 | - - When the tokenizer returns inline image tokens, delimiters | ||
| 4813 | - following ``ID`` and ``EI`` operators are no longer excluded. | ||
| 4814 | - This makes it possible to reliably extract the actual image | ||
| 4815 | - data. | ||
| 4816 | - | ||
| 4817 | - - Library Enhancements | ||
| 4818 | - | ||
| 4819 | - - Add method ``QPDFPageObjectHelper::externalizeInlineImages`` to | ||
| 4820 | - convert inline images to regular images. | ||
| 4821 | - | ||
| 4822 | - - Add method ``QUtil::possible_repaired_encodings()`` to generate | ||
| 4823 | - a list of strings that represent other ways the given string | ||
| 4824 | - could have been encoded. This is the method the QPDF CLI uses | ||
| 4825 | - to generate the strings it tries when recovering incorrectly | ||
| 4826 | - encoded Unicode passwords. | ||
| 4827 | - | ||
| 4828 | - - Add new versions of | ||
| 4829 | - ``QPDFWriter::setR{3,4,5,6}EncryptionParameters`` that allow | ||
| 4830 | - more granular setting of permissions bits. See | ||
| 4831 | - :file:`QPDFWriter.hh` for details. | ||
| 4832 | - | ||
| 4833 | - - Add new versions of the transcoders from UTF-8 to single-byte | ||
| 4834 | - coding systems in ``QUtil`` that report success or failure | ||
| 4835 | - rather than just substituting a specified unknown character. | ||
| 4836 | - | ||
| 4837 | - - Add method ``QUtil::analyze_encoding()`` to determine whether a | ||
| 4838 | - string has high-bit characters and whether it appears to be UTF-16 or | ||
| 4839 | - valid UTF-8 encoding. | ||
| 4840 | - | ||
| 4841 | - - Add new method ``QPDFPageObjectHelper::shallowCopyPage()`` to | ||
| 4842 | - create a new page that is a "shallow copy" of a page. The | ||
| 4843 | - resulting object is an indirect object ready to be passed to | ||
| 4844 | - ``QPDFPageDocumentHelper::addPage()`` for either the original | ||
| 4845 | - ``QPDF`` object or a different one. This is what the | ||
| 4846 | - :command:`qpdf` command-line tool uses to copy | ||
| 4847 | - the same page multiple times from the same file during | ||
| 4848 | - splitting and merging operations. | ||
| 4849 | - | ||
| 4850 | - - Add method ``QPDF::getUniqueId()``, which returns a unique | ||
| 4851 | - identifier for the given QPDF object. The identifier will be | ||
| 4852 | - unique across the life of the application. The returned value | ||
| 4853 | - can be safely used as a map key. | ||
| 4854 | - | ||
| 4855 | - - Add method ``QPDF::setImmediateCopyFrom``. This further | ||
| 4856 | - enhances qpdf's ability to allow a ``QPDF`` object from which | ||
| 4857 | - objects are being copied to go out of scope before the | ||
| 4858 | - destination object is written. If you call this method on a | ||
| 4859 | - ``QPDF`` instance, objects copied *from* this instance will be | ||
| 4860 | - copied immediately instead of lazily. This option uses more | ||
| 4861 | - memory but allows the source object to go out of scope before | ||
| 4862 | - the destination object is written in all cases. See comments in | ||
| 4863 | - :file:`QPDF.hh` for details. | ||
| 4864 | - | ||
| 4865 | - - Add method ``QPDFPageObjectHelper::getAttribute`` for | ||
| 4866 | - retrieving an attribute from the page dictionary taking | ||
| 4867 | - inheritance into consideration, and optionally making a copy if | ||
| 4868 | - your intention is to modify the attribute. | ||
| 4869 | - | ||
| 4870 | - - Fix long-standing limitation of | ||
| 4871 | - ``QPDFPageObjectHelper::getPageImages`` so that it now properly | ||
| 4872 | - reports images from inherited resources dictionaries, | ||
| 4873 | - eliminating the need to call | ||
| 4874 | - ``QPDFPageDocumentHelper::pushInheritedAttributesToPage`` in | ||
| 4875 | - this case. | ||
| 4876 | - | ||
| 4877 | - - Add method ``QPDFObjectHandle::getUniqueResourceName`` for | ||
| 4878 | - finding an unused name in a resource dictionary. | ||
| 4879 | - | ||
| 4880 | - - Add method ``QPDFPageObjectHelper::getFormXObjectForPage`` for | ||
| 4881 | - generating a form XObject equivalent to a page. The resulting | ||
| 4882 | - object can be used in the same file or copied to another file | ||
| 4883 | - with ``copyForeignObject``. This can be useful for implementing | ||
| 4884 | - underlay, overlay, n-up, thumbnails, or any other functionality | ||
| 4885 | - requiring replication of pages in other contexts. | ||
| 4886 | - | ||
| 4887 | - - Add method ``QPDFPageObjectHelper::placeFormXObject`` for | ||
| 4888 | - generating content stream text that places a given form XObject | ||
| 4889 | - on a page, centered and fit within a specified rectangle. This | ||
| 4890 | - method takes care of computing the proper transformation matrix | ||
| 4891 | - and may optionally compensate for rotation or scaling of the | ||
| 4892 | - destination page. | ||
| 4893 | - | ||
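To make the "centered and fit within a specified rectangle" idea from the ``placeFormXObject`` entry concrete, here is a hedged Python sketch of the arithmetic. It is not qpdf's implementation: it ignores rotation, always scales uniformly, and the names ``fit_matrix``, ``place_content``, and the resource name ``/Fx1`` are made up for the example. The only PDF facts relied on are the six-number matrix form and the ``q``, ``cm``, ``Do``, ``Q`` content stream operators.

```python
def fit_matrix(bbox, rect):
    """Return [a, b, c, d, e, f] that scales bbox uniformly to fit,
    centered, inside rect. Both arguments are (x0, y0, x1, y1)."""
    bx0, by0, bx1, by1 = bbox
    rx0, ry0, rx1, ry1 = rect
    # Largest uniform scale at which the bbox still fits in the rectangle.
    scale = min((rx1 - rx0) / (bx1 - bx0), (ry1 - ry0) / (by1 - by0))
    # Translate so the scaled bbox center lands on the rectangle center.
    e = (rx0 + rx1) / 2 - scale * (bx0 + bx1) / 2
    f = (ry0 + ry1) / 2 - scale * (by0 + by1) / 2
    return [scale, 0, 0, scale, e, f]


def place_content(matrix, name="/Fx1"):
    """Build content stream text that draws the XObject under the matrix."""
    numbers = " ".join(f"{v:g}" for v in matrix)
    return f"q {numbers} cm {name} Do Q"


matrix = fit_matrix((0, 0, 200, 100), (0, 0, 100, 100))
content = place_content(matrix)
```

Wrapping the placement in ``q`` and ``Q`` keeps the transformation from leaking into whatever the page draws next.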
| 4894 | - - Build Improvements | ||
| 4895 | - | ||
| 4896 | - - Add new configure option | ||
| 4897 | - :samp:`--enable-avoid-windows-handle`, which | ||
| 4898 | - causes the preprocessor symbol ``AVOID_WINDOWS_HANDLE`` to be | ||
| 4899 | - defined. When defined, qpdf will avoid referencing the Windows | ||
| 4900 | - ``HANDLE`` type, which is disallowed with certain versions of | ||
| 4901 | - the Windows SDK. | ||
| 4902 | - | ||
| 4903 | - - For Windows builds, attempt to determine what options, if any, | ||
| 4904 | - have to be passed to the compiler and linker to enable use of | ||
| 4905 | - ``wmain``. This causes the preprocessor symbol | ||
| 4906 | - ``WINDOWS_WMAIN`` to be defined. If you do your own builds with | ||
| 4907 | - other compilers, you can define this symbol to cause ``wmain`` | ||
| 4908 | - to be used. This is needed to allow the Windows | ||
| 4909 | - :command:`qpdf` command to receive Unicode | ||
| 4910 | - command-line options. | ||
| 4911 | - | ||
| 4912 | -8.3.0: January 7, 2019 | ||
| 4913 | - - Command-line Enhancements | ||
| 4914 | - | ||
| 4915 | - - Shell completion: you can now use eval :command:`$(qpdf | ||
| 4916 | - --completion-bash)` and eval :command:`$(qpdf | ||
| 4917 | - --completion-zsh)` to enable shell completion for | ||
| 4918 | - bash and zsh. | ||
| 4919 | - | ||
| 4920 | - - Page numbers (also known as page labels) are now preserved when | ||
| 4921 | - merging and splitting files with the | ||
| 4922 | - :samp:`--pages` and | ||
| 4923 | - :samp:`--split-pages` options. | ||
| 4924 | - | ||
| 4925 | - - Bookmarks are partially preserved when splitting pages with the | ||
| 4926 | - :samp:`--split-pages` option. Specifically, the | ||
| 4927 | - outlines dictionary and some supporting metadata are copied | ||
| 4928 | - into the split files. The result is that all bookmarks from the | ||
| 4929 | - original file appear, those that point to pages that are | ||
| 4930 | - preserved work, and those that point to pages that are not | ||
| 4931 | - preserved don't do anything. This is an interim step toward | ||
| 4932 | - proper support for bookmarks in splitting and merging | ||
| 4933 | - operations. | ||
| 4934 | - | ||
| 4935 | - - Page collation: add new option | ||
| 4936 | - :samp:`--collate`. When specified, the | ||
| 4937 | - semantics of :samp:`--pages` change from | ||
| 4938 | - concatenation to collation. See :ref:`ref.page-selection` for examples and discussion. | ||
| 4939 | - | ||
| 4940 | - - Generation of information in JSON format, primarily to | ||
| 4941 | - facilitate use of qpdf from languages other than C++. Add new | ||
| 4942 | - options :samp:`--json`, | ||
| 4943 | - :samp:`--json-key`, and | ||
| 4944 | - :samp:`--json-object` to generate a JSON | ||
| 4945 | - representation of the PDF file. Run :command:`qpdf | ||
| 4946 | - --json-help` to get a description of the JSON | ||
| 4947 | - format. For more information, see :ref:`ref.json`. | ||
| 4948 | - | ||
| 4949 | - - The :samp:`--generate-appearances` flag will | ||
| 4950 | - cause qpdf to generate appearances for form fields if the PDF | ||
| 4951 | - file indicates that form field appearances are out of date. | ||
| 4952 | - This can happen when PDF forms are filled in by a program that | ||
| 4953 | - doesn't know how to regenerate the appearances of the filled-in | ||
| 4954 | - fields. | ||
| 4955 | - | ||
| 4956 | - - The :samp:`--flatten-annotations` flag can be | ||
| 4957 | - used to *flatten* annotations, including form fields. | ||
| 4958 | - Ordinarily, annotations are drawn separately from the page. | ||
| 4959 | - Flattening annotations is the process of combining their | ||
| 4960 | - appearances into the page's contents. You might want to do this | ||
| 4961 | - if you are going to rotate or combine pages using a tool that | ||
| 4962 | - doesn't understand about annotations. You may also want to use | ||
| 4963 | - :samp:`--generate-appearances` when using this | ||
| 4964 | - flag since annotations for outdated form fields are not | ||
| 4965 | - flattened as that would cause loss of information. | ||
| 4966 | - | ||
| 4967 | - - The :samp:`--optimize-images` flag tells qpdf | ||
| 4968 | - to recompress every image using DCT (JPEG) compression as | ||
| 4969 | - long as the image is not already compressed with lossy | ||
| 4970 | - compression and recompressing the image reduces its size. The | ||
| 4971 | - additional options :samp:`--oi-min-width`, | ||
| 4972 | - :samp:`--oi-min-height`, and | ||
| 4973 | - :samp:`--oi-min-area` prevent recompression of | ||
| 4974 | - images whose width, height, or pixel area (width × height) are | ||
| 4975 | - below a specified threshold. | ||
| 4976 | - | ||
| 4977 | - - The :samp:`--show-object` option can now be | ||
| 4978 | - given as :samp:`--show-object=trailer` to show | ||
| 4979 | - the trailer dictionary. | ||
| 4980 | - | ||
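The difference between concatenation and collation mentioned in the ``--collate`` entry above can be shown with plain lists standing in for the pages selected from each input file. This is a conceptual Python sketch, not qpdf code; the function names are hypothetical, and the treatment of inputs with unequal page counts (continuing with whatever pages remain) is an assumption of the sketch.

```python
from itertools import chain, zip_longest

_MISSING = object()


def concatenate(*page_lists):
    """All pages of the first input, then all pages of the next, and so on."""
    return list(chain.from_iterable(page_lists))


def collate(*page_lists):
    """First page of each input, then second page of each, and so on."""
    out = []
    for group in zip_longest(*page_lists, fillvalue=_MISSING):
        # Skip inputs that have already run out of pages.
        out.extend(page for page in group if page is not _MISSING)
    return out


odd = ["p1", "p3", "p5"]
even = ["p2", "p4"]
joined = concatenate(odd, even)
interleaved = collate(odd, even)
```

A typical use of collation is recombining separately scanned odd and even pages into reading order.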
| 4981 | - - Bug Fixes and Enhancements | ||
| 4982 | - | ||
| 4983 | - - QPDF now automatically detects and recovers from dangling | ||
| 4984 | - references. If a PDF file contained an indirect reference to a | ||
| 4985 | - non-existent object, which is valid, when adding a new object | ||
| 4986 | - to the file, it was possible for the new object to take the | ||
| 4987 | - object ID of the dangling reference, thereby causing the | ||
| 4988 | - dangling reference to point to the new object. This case is now | ||
| 4989 | - prevented. | ||
| 4990 | - | ||
| 4991 | - - Fixes to form field setting code: strings are always written in | ||
| 4992 | - UTF-16 format, and checkboxes and radio buttons are handled | ||
| 4993 | - properly with respect to synchronization of values and | ||
| 4994 | - appearance states. | ||
| 4995 | - | ||
| 4996 | - - The ``QPDF::checkLinearization()`` method no longer causes the program | ||
| 4997 | - to crash when it detects problems with linearization data. | ||
| 4998 | - Instead, it issues a normal warning or error. | ||
| 4999 | - | ||
| 5000 | - - Ordinarily qpdf treats an argument of the form | ||
| 5001 | - :samp:`@file` to mean that command-line options | ||
| 5002 | - should be read from :file:`file`. Now, if | ||
| 5003 | - :file:`file` does not exist but | ||
| 5004 | - :file:`@file` does, qpdf will treat | ||
| 5005 | - :file:`@file` as a regular option. This | ||
| 5006 | - makes it possible to work more easily with PDF files whose | ||
| 5007 | - names happen to start with the ``@`` character. | ||
| 5008 | - | ||
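The ``@file`` rule in the last entry above amounts to a small decision procedure. The Python sketch below restates it under stated assumptions: ``classify_argument`` and its return labels are hypothetical, the existence check is injectable so the logic can be exercised without touching the filesystem, and special cases qpdf may handle are left out.

```python
import os


def classify_argument(arg, exists=os.path.exists):
    """Decide whether a command-line argument names an options file."""
    if arg.startswith("@") and len(arg) > 1:
        name = arg[1:]
        if not exists(name) and exists(arg):
            # Only the literal '@name' exists: treat it as an ordinary argument.
            return ("argument", arg)
        # Usual meaning: read more command-line options from 'name'.
        return ("options-file", name)
    return ("argument", arg)
```

For example, with only a file literally named ``@scan.pdf`` on disk, ``classify_argument("@scan.pdf")`` returns ``("argument", "@scan.pdf")``.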
| 5009 | - - Library Enhancements | ||
| 5010 | - | ||
| 5011 | - - Remove the restriction in most cases that the source QPDF | ||
| 5012 | - object used in a ``QPDF::copyForeignObject`` call has to stick | ||
| 5013 | - around until the destination QPDF is written. The exceptional | ||
| 5014 | - case is when the source stream gets its data using a | ||
| 5015 | - ``QPDFObjectHandle::StreamDataProvider``. For a more in-depth | ||
| 5016 | - discussion, see comments around ``copyForeignObject`` in | ||
| 5017 | - :file:`QPDF.hh`. | ||
| 5018 | - | ||
| 5019 | - - Add new method ``QPDFWriter::getFinalVersion()``, which returns | ||
| 5020 | - the PDF version that will ultimately be written to the final | ||
| 5021 | - file. See comments in :file:`QPDFWriter.hh` | ||
| 5022 | - for some restrictions on its use. | ||
| 5023 | - | ||
| 5024 | - - Add several methods for transcoding strings to some of the | ||
| 5025 | - character sets used in PDF files: ``QUtil::utf8_to_ascii``, | ||
| 5026 | - ``QUtil::utf8_to_win_ansi``, ``QUtil::utf8_to_mac_roman``, and | ||
| 5027 | - ``QUtil::utf8_to_utf16``. For the single-byte encodings that | ||
| 5028 | - support only a limited character set, these methods replace | ||
| 5029 | - unsupported characters with a specified substitute. | ||
| 5030 | - | ||
| 5031 | - - Add new methods to ``QPDFAnnotationObjectHelper`` and | ||
| 5032 | - ``QPDFFormFieldObjectHelper`` for querying flags and | ||
| 5033 | - interpretation of different field types. Define constants in | ||
| 5034 | - :file:`qpdf/Constants.h` to help with | ||
| 5035 | - interpretation of flag values. | ||
| 5036 | - | ||
| 5037 | - - Add new methods | ||
| 5038 | - ``QPDFAcroFormDocumentHelper::generateAppearancesIfNeeded`` and | ||
| 5039 | - ``QPDFFormFieldObjectHelper::generateAppearance`` for | ||
| 5040 | - generating appearance streams. See discussion in | ||
| 5041 | - :file:`QPDFFormFieldObjectHelper.hh` for | ||
| 5042 | - limitations. | ||
| 5043 | - | ||
| 5044 | - - Add two new helper functions for dealing with resource | ||
| 5045 | - dictionaries: ``QPDFObjectHandle::getResourceNames()`` returns | ||
| 5046 | - a list of all second-level keys, which correspond to the names | ||
| 5047 | - of resources, and ``QPDFObjectHandle::mergeResources()`` merges | ||
| 5048 | - two resources dictionaries as long as they have non-conflicting | ||
| 5049 | - keys. These methods are useful for certain types of objects | ||
| 5050 | - that resolve resources from multiple places, such as form | ||
| 5051 | - fields. | ||
| 5052 | - | ||
| 5053 | - - Add methods ``QPDFPageDocumentHelper::flattenAnnotations()`` | ||
| 5054 | - and | ||
| 5055 | - ``QPDFAnnotationObjectHelper::getPageContentForAppearance()`` | ||
| 5056 | - for handling low-level details of annotation flattening. | ||
| 5057 | - | ||
| 5058 | - - Add new helper classes: ``QPDFOutlineDocumentHelper``, | ||
| 5059 | - ``QPDFOutlineObjectHelper``, ``QPDFPageLabelDocumentHelper``, | ||
| 5060 | - ``QPDFNameTreeObjectHelper``, and | ||
| 5061 | - ``QPDFNumberTreeObjectHelper``. | ||
| 5062 | - | ||
| 5063 | - - Add method ``QPDFObjectHandle::getJSON()`` that returns a JSON | ||
| 5064 | - representation of the object. Call ``serialize()`` on the | ||
| 5065 | - result to convert it to a string. | ||
| 5066 | - | ||
| 5067 | - - Add a simple JSON serializer. This is not a complete or | ||
| 5068 | - general-purpose JSON library. It allows assembly and | ||
| 5069 | - serialization of JSON structures with some restrictions, which | ||
| 5070 | - are described in the header file. This is the serializer used | ||
| 5071 | - by qpdf's new JSON representation. | ||
| 5072 | - | ||
| 5073 | - - Add new ``QPDFObjectHandle::Matrix`` class along with a few | ||
| 5074 | - convenience methods for dealing with six-element numerical | ||
| 5075 | - arrays as matrices. | ||
| 5076 | - | ||
| 5077 | - - Add new method ``QPDFObjectHandle::wrapInArray``, which returns | ||
| 5078 | - the object itself if it is an array, or an array containing the | ||
| 5079 | - object otherwise. This is a common construct in PDF. This | ||
| 5080 | - method prevents you from having to explicitly test whether | ||
| 5081 | - something is a single element or an array. | ||
| 5082 | - | ||
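The "single element or array" construct that ``wrapInArray`` addresses is easy to picture with ordinary Python values standing in for PDF objects. This is an analogy only, not the qpdf API; ``wrap_in_array`` is a hypothetical name. A page's ``/Contents`` entry, which may be a single stream or an array of streams, is one place the pattern appears in PDF.

```python
def wrap_in_array(obj):
    """Return obj unchanged if it is already a list, else a one-element list."""
    return obj if isinstance(obj, list) else [obj]


# Callers can now loop without first testing what they were given.
single = wrap_in_array("stream-1")
several = wrap_in_array(["stream-1", "stream-2"])
total = sum(len(wrap_in_array(c)) for c in ("a", ["b", "c"]))
```

The benefit is that every consumer can iterate uniformly instead of branching on the value's type.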
| 5083 | - - Build Improvements | ||
| 5084 | - | ||
| 5085 | - - It is no longer necessary to run | ||
| 5086 | - :command:`autogen.sh` to build from a pristine | ||
| 5087 | - checkout. Automatically generated files are now committed so | ||
| 5088 | - that it is possible to build on platforms without autoconf | ||
| 5089 | - directly from a clean checkout of the repository. The | ||
| 5090 | - :command:`configure` script detects if the files | ||
| 5091 | - are out of date when it also determines that the tools are | ||
| 5092 | - present to regenerate them. | ||
| 5093 | - | ||
| 5094 | - - Pull requests and the master branch are now built automatically | ||
| 5095 | - in `Azure | ||
| 5096 | - Pipelines <https://dev.azure.com/qpdf/qpdf/_build>`__, which is | ||
| 5097 | - free for open source projects. The build includes Linux, mac, | ||
| 5098 | - Windows 32-bit and 64-bit with mingw and MSVC, and an AppImage | ||
| 5099 | - build. Official qpdf releases are now built with Azure | ||
| 5100 | - Pipelines. | ||
| 5101 | - | ||
| 5102 | - - Notes for Packagers | ||
| 5103 | - | ||
| 5104 | - - A new section has been added to the documentation with notes | ||
| 5105 | - for packagers. Please see :ref:`ref.packaging`. | ||
| 5106 | - | ||
| 5107 | - - The qpdf build detects out-of-date automatically generated files. If | ||
| 5108 | - your packaging system automatically refreshes libtool or | ||
| 5109 | - autoconf files, it could cause this check to fail. To avoid | ||
| 5110 | - this problem, pass | ||
| 5111 | - :samp:`--disable-check-autofiles` to | ||
| 5112 | - :command:`configure`. | ||
| 5113 | - | ||
| 5114 | - - If you would like to have qpdf completion enabled | ||
| 5115 | - automatically, you can install completion files in the | ||
| 5116 | - distribution's default location. You can find sample completion | ||
| 5117 | - files to install in the :file:`completions` | ||
| 5118 | - directory. | ||
| 5119 | - | ||
| 5120 | -8.2.1: August 18, 2018 | ||
| 5121 | - - Command-line Enhancements | ||
| 5122 | - | ||
| 5123 | - - Add | ||
| 5124 | - :samp:`--keep-files-open={[yn]}` | ||
| 5125 | - to override default determination of whether to keep files open | ||
| 5126 | - when merging. Please see the discussion of | ||
| 5127 | - :samp:`--keep-files-open` in :ref:`ref.basic-options` for additional details. | ||
| 5128 | - | ||
| 5129 | -8.2.0: August 16, 2018 | ||
| 5130 | - - Command-line Enhancements | ||
| 5131 | - | ||
| 5132 | - - Add :samp:`--no-warn` option to suppress | ||
| 5133 | - issuing warning messages. If there are any conditions that | ||
| 5134 | - would have caused warnings to be issued, the exit status is | ||
| 5135 | - still 3. | ||
| 5136 | - | ||
| 5137 | - - Bug Fixes and Optimizations | ||
| 5138 | - | ||
| 5139 | - - Performance fix: optimize page merging operation to avoid | ||
| 5140 | - unnecessary open/close calls on files being merged. This solves | ||
| 5141 | - a dramatic slow-down that was observed when merging certain | ||
| 5142 | - types of files. | ||
| 5143 | - | ||
| 5144 | - - Optimize how memory was used for the TIFF predictor, | ||
| 5145 | - drastically improving performance and memory usage for files | ||
| 5146 | - containing high-resolution images compressed with Flate using | ||
| 5147 | - the TIFF predictor. | ||
| 5148 | - | ||
| 5149 | - - Bug fix: end of line characters were not properly handled | ||
| 5150 | - inside strings in some cases. | ||
| 5151 | - | ||
| 5152 | - - Bug fix: using :samp:`--progress` on very small | ||
| 5153 | - files could cause an infinite loop. | ||
| 5154 | - | ||
| 5155 | - - API enhancements | ||
| 5156 | - | ||
| 5157 | - - Add new class ``QPDFSystemError``, derived from | ||
| 5158 | - ``std::runtime_error``, which is now thrown by | ||
| 5159 | - ``QUtil::throw_system_error``. This enables the triggering | ||
| 5160 | - ``errno`` value to be retrieved. | ||
| 5161 | - | ||
| 5162 | - - Add ``ClosedFileInputSource::stayOpen`` method, enabling a | ||
| 5163 | - ``ClosedFileInputSource`` to stay open during manually | ||
| 5164 | - indicated periods of high activity, thus reducing the overhead | ||
| 5165 | - of frequent open/close operations. | ||
| 5166 | - | ||
| 5167 | - - Build Changes | ||
| 5168 | - | ||
| 5169 | - - For the mingw builds, change the name of the DLL import library | ||
| 5170 | - from :file:`libqpdf.a` to | ||
| 5171 | - :file:`libqpdf.dll.a` to more accurately | ||
| 5172 | - reflect that it is an import library rather than a static | ||
| 5173 | - library. This potentially clears the way for supporting a | ||
| 5174 | - static library in the future, though presently, the qpdf | ||
| 5175 | - Windows build only builds the DLL and executables. | ||
| 5176 | - | ||
| 5177 | -8.1.0: June 23, 2018 | ||
| 5178 | - - Usability Improvements | ||
| 5179 | - | ||
| 5180 | - - When splitting files, qpdf detects fonts and images that the | ||
| 5181 | - document metadata claims are referenced from a page but are not | ||
| 5182 | - actually referenced and omits them from the output file. This | ||
| 5183 | - change can cause a significant reduction in the size of split | ||
| 5184 | - PDF files for files created by some software packages. In some | ||
| 5185 | - cases, it can also make page splitting slower. Prior versions | ||
| 5186 | - of qpdf would believe the document metadata and sometimes | ||
| 5187 | - include all the images from all the other pages even though the | ||
| 5188 | - pages were no longer present. In the unlikely event that the | ||
| 5189 | - old behavior should be desired, or if you have a case where | ||
| 5190 | - page splitting is very slow, the old behavior (and speed) can | ||
| 5191 | - be enabled by specifying | ||
| 5192 | - :samp:`--preserve-unreferenced-resources`. For | ||
| 5193 | - additional details, please see :ref:`ref.advanced-transformation`. | ||
| 5194 | - | ||
| 5195 | - - When merging multiple PDF files, qpdf no longer leaves all the | ||
| 5196 | - files open. This makes it possible to merge numbers of files | ||
| 5197 | - that may exceed the operating system's limit for the maximum | ||
| 5198 | - number of open files. | ||
| 5199 | - | ||
| 5200 | - - The :samp:`--rotate` option's syntax has been | ||
| 5201 | - extended to make the page range optional. If you specify | ||
| 5202 | - :samp:`--rotate={angle}` | ||
| 5203 | - without specifying a page range, the rotation will be applied | ||
| 5204 | - to all pages. This can be especially useful for adjusting a PDF | ||
| 5205 | - created from a multi-page document that was scanned upside | ||
| 5206 | - down. | ||
| 5207 | - | ||
| 5208 | - - When merging multiple files, the | ||
| 5209 | - :samp:`--verbose` option now prints information | ||
| 5210 | - about each file as it operates on that file. | ||
| 5211 | - | ||
| 5212 | - - When the :samp:`--progress` option is | ||
| 5213 | - specified, qpdf will print a running indicator of its best | ||
| 5214 | - guess at how far through the writing process it is. Note that, | ||
| 5215 | - as with all progress meters, it's an approximation. This option | ||
| 5216 | - is implemented in a way that makes it useful for software that | ||
| 5217 | - uses the qpdf library; see API Enhancements below. | ||
| 5218 | - | ||
| 5219 | - - Bug Fixes | ||
| 5220 | - | ||
| 5221 | - - Properly decrypt files that use revision 3 of the standard | ||
| 5222 | - security handler but use 40 bit keys (even though revision 3 | ||
| 5223 | - supports 128-bit keys). | ||
| 5224 | - | ||
| 5225 | - - Limit depth of nested data structures to prevent crashes from | ||
| 5226 | - certain types of malformed (malicious) PDFs. | ||
| 5227 | - | ||
| 5228 | - - In "newline before endstream" mode, insert the required extra | ||
| 5229 | - newline before the ``endstream`` at the end of object streams. | ||
| 5230 | - This one case was previously omitted. | ||
| 5231 | - | ||
| 5232 | - - API Enhancements | ||
| 5233 | - | ||
| 5234 | - - The first round of higher level "helper" interfaces has been | ||
| 5235 | - introduced. These are designed to provide a more convenient way | ||
| 5236 | - of interacting with certain document features than using | ||
| 5237 | - ``QPDFObjectHandle`` directly. For details on helpers, see | ||
| 5238 | - :ref:`ref.helper-classes`. Specific additional | ||
| 5239 | - interfaces are described below. | ||
| 5240 | - | ||
| 5241 | - - Add two new document helper classes: ``QPDFPageDocumentHelper`` | ||
| 5242 | - for working with pages, and ``QPDFAcroFormDocumentHelper`` for | ||
| 5243 | - working with interactive forms. No old methods have been | ||
| 5244 | - removed, but ``QPDFPageDocumentHelper`` is now the preferred | ||
| 5245 | - way to perform operations on pages rather than calling the old | ||
| 5246 | - methods in ``QPDFObjectHandle`` and ``QPDF`` directly. Comments | ||
| 5247 | - in the header files direct you to the new interfaces. Please | ||
| 5248 | - see the header files and :file:`ChangeLog` | ||
| 5249 | - for additional details. | ||
| 5250 | - | ||
| 5251 | - - Add three new object helper classes: ``QPDFPageObjectHelper`` for | ||
| 5252 | - pages, ``QPDFFormFieldObjectHelper`` for interactive form | ||
| 5253 | - fields, and ``QPDFAnnotationObjectHelper`` for annotations. All | ||
| 5254 | - three classes are fairly sparse at the moment, but they have | ||
| 5255 | - some useful, basic functionality. | ||
| 5256 | - | ||
| 5257 | - - A new example program | ||
| 5258 | - :file:`examples/pdf-set-form-values.cc` has | ||
| 5259 | - been added that illustrates use of the new document and object | ||
| 5260 | - helpers. | ||
| 5261 | - | ||
| 5262 | - - The method ``QPDFWriter::registerProgressReporter`` has been | ||
| 5263 | - added. This method allows you to register a function that is | ||
| 5264 | - called by ``QPDFWriter`` to update your idea of the percentage | ||
| 5265 | - it thinks it is through writing its output. Client programs can | ||
| 5266 | - use this to implement reasonably accurate progress meters. The | ||
| 5267 | - :command:`qpdf` command line tool uses this to | ||
| 5268 | - implement its :samp:`--progress` option. | ||
| 5269 | - | ||
| 5270 | - - New methods ``QPDFObjectHandle::newUnicodeString`` and | ||
| 5271 | - ``QPDFObject::unparseBinary`` have been added to allow for more | ||
| 5272 | - convenient creation of strings that are explicitly encoded | ||
| 5273 | - using big-endian UTF-16. This is useful for creating strings | ||
| 5274 | - that appear outside of content streams, such as labels, form | ||
| 5275 | - fields, outlines, document metadata, etc. | ||
| 5276 | - | ||
| 5277 | - - A new class ``QPDFObjectHandle::Rectangle`` has been added to | ||
| 5278 | - ease working with PDF rectangles, which are just arrays of four | ||
| 5279 | - numeric values. | ||
| 5280 | - | ||
| 5281 | -8.0.2: March 6, 2018 | ||
| 5282 | - - When a loop is detected while following cross reference streams or | ||
| 5283 | - tables, treat this as damage instead of silently ignoring the | ||
| 5284 | - previous table. This prevents loss of otherwise recoverable data | ||
| 5285 | - in some damaged files. | ||
| 5286 | - | ||
| 5287 | - - Properly handle pages with no contents. | ||
| 5288 | - | ||
| 5289 | -8.0.1: March 4, 2018 | ||
| 5290 | - - Disregard data check errors when uncompressing ``/FlateDecode`` | ||
| 5291 | - streams. This is consistent with most other PDF readers and allows | ||
| 5292 | - qpdf to recover data from another class of malformed PDF files. | ||
| 5293 | - | ||
| 5294 | - - On the command line when specifying page ranges, support preceding | ||
| 5295 | - a page number by "r" to indicate that it should be counted from | ||
| 5296 | - the end. For example, the range ``r3-r1`` would indicate the last | ||
| 5297 | - three pages of a document. | ||
| 5298 | - | ||
| 5299 | -8.0.0: February 25, 2018 | ||
| 5300 | - - Packaging and Distribution Changes | ||
| 5301 | - | ||
| 5302 | - - QPDF is now distributed as an | ||
| 5303 | - `AppImage <https://appimage.org/>`__ in addition to all the | ||
| 5304 | - other ways it is distributed. The AppImage can be found in the | ||
| 5305 | - download area with the other packages. Thanks to Kurt Pfeifle | ||
| 5306 | - and Simon Peter for their contributions. | ||
| 5307 | - | ||
| 5308 | - - Bug Fixes | ||
| 5309 | - | ||
| 5310 | - - ``QPDFObjectHandle::getUTF8Val`` now properly treats | ||
| 5311 | - non-Unicode strings as encoded with PDF Doc Encoding. | ||
| 5312 | - | ||
| 5313 | - - Improvements to handling of objects in PDF files that are not | ||
| 5314 | - of the expected type. In most cases, qpdf will be able to warn | ||
| 5315 | - for such cases rather than fail with an exception. Previous | ||
| 5316 | - versions of qpdf would sometimes fail with errors such as | ||
| 5317 | - "operation for dictionary object attempted on object of wrong | ||
| 5318 | - type". This situation should be mostly or entirely eliminated | ||
| 5319 | - now. | ||
| 5320 | - | ||
| 5321 | - - Enhancements to the :command:`qpdf` Command-line | ||
| 5322 | - Tool. All new options listed here are documented in more detail in | ||
| 5323 | - :ref:`ref.using`. | ||
| 5324 | - | ||
| 5325 | - - The option | ||
| 5326 | - :samp:`--linearize-pass1={file}` | ||
| 5327 | - has been added for debugging qpdf's linearization code. | ||
| 5328 | - | ||
| 5329 | - - The option :samp:`--coalesce-contents` can be | ||
| 5330 | - used to combine content streams of a page whose contents are an | ||
| 5331 | - array of streams into a single stream. | ||
| 5332 | - | ||
| 5333 | - - API Enhancements. All new API calls are documented in their | ||
| 5334 | - respective classes' header files. There are no non-compatible | ||
| 5335 | - changes to the API. | ||
| 5336 | - | ||
| 5337 | - - Add function ``qpdf_check_pdf`` to the C API. This function | ||
| 5338 | - does basic checking that is a subset of what :command:`qpdf | ||
| 5339 | - --check` performs. | ||
| 5340 | - | ||
| 5341 | - - Major enhancements to the lexical layer of qpdf. For a complete | ||
| 5342 | - list of enhancements, please refer to the | ||
| 5343 | - :file:`ChangeLog` file. Most of the changes | ||
| 5344 | - result in improvements to qpdf's ability to handle erroneous | ||
| 5345 | - files. It is also possible for programs to handle whitespace, | ||
| 5346 | - comments, and inline images as tokens. | ||
| 5347 | - | ||
| 5348 | - - New API for working with PDF content streams at a lexical | ||
| 5349 | - level. The new class ``QPDFObjectHandle::TokenFilter`` allows | ||
| 5350 | - the developer to provide token handlers. Token filters can be | ||
| 5351 | - used with several different methods in ``QPDFObjectHandle`` as | ||
| 5352 | - well as with a lower-level interface. See comments in | ||
| 5353 | - :file:`QPDFObjectHandle.hh` as well as the | ||
| 5354 | - new examples | ||
| 5355 | - :file:`examples/pdf-filter-tokens.cc` and | ||
| 5356 | - :file:`examples/pdf-count-strings.cc` for | ||
| 5357 | - details. | ||
| 5358 | - | ||
| 5359 | -7.1.1: February 4, 2018 | ||
| 5360 | - - Bug fix: files whose /ID fields were other than 16 bytes long can | ||
| 5361 | - now be properly linearized. | ||
| 5362 | - | ||
| 5363 | - - A few compile and link issues have been corrected for some | ||
| 5364 | - platforms. | ||
| 5365 | - | ||
| 5366 | -7.1.0: January 14, 2018 | ||
| 5367 | - - PDF files contain streams that may be compressed with various | ||
| 5368 | - compression algorithms which, in some cases, may be enhanced by | ||
| 5369 | - various predictor functions. Previously only the PNG up predictor | ||
| 5370 | - was supported. In this version, all the PNG predictors as well as | ||
| 5371 | - the TIFF predictor are supported. This increases the range of | ||
| 5372 | - files that qpdf is able to handle. | ||
| 5373 | - | ||
| 5374 | - - QPDF now allows a raw encryption key to be specified in place of a | ||
| 5375 | - password when opening encrypted files, and will optionally display | ||
| 5376 | - the encryption key used by a file. This is a non-standard | ||
| 5377 | - operation, but it can be useful in certain situations. Please see | ||
| 5378 | - the discussion of :samp:`--password-is-hex-key` in | ||
| 5379 | - :ref:`ref.basic-options` or the comments around | ||
| 5380 | - ``QPDF::setPasswordIsHexKey`` in | ||
| 5381 | - :file:`QPDF.hh` for additional details. | ||
| 5382 | - | ||
| 5383 | - - Bug fix: numbers ending with a trailing decimal point are now | ||
| 5384 | - properly recognized as numbers. | ||
| 5385 | - | ||
| 5386 | - - Bug fix: when building qpdf from source on some platforms | ||
| 5387 | - (especially MacOS), the build could get confused by older versions | ||
| 5388 | - of qpdf installed on the system. This has been corrected. | ||
| 5389 | - | ||
| 5390 | -7.0.0: September 15, 2017 | ||
| 5391 | - - Packaging and Distribution Changes | ||
| 5392 | - | ||
| 5393 | - - QPDF's primary license is now `version 2.0 of the Apache | ||
| 5394 | - License <http://www.apache.org/licenses/LICENSE-2.0>`__ rather | ||
| 5395 | - than version 2.0 of the Artistic License. You may still, at | ||
| 5396 | - your option, consider qpdf to be licensed with version 2.0 of | ||
| 5397 | - the Artistic license. | ||
| 5398 | - | ||
| 5399 | - - QPDF no longer has a dependency on the PCRE (Perl-Compatible | ||
| 5400 | - Regular Expression) library. QPDF now has an added dependency | ||
| 5401 | - on the JPEG library. | ||
| 5402 | - | ||
| 5403 | - - Bug Fixes | ||
| 5404 | - | ||
| 5405 | - - This release contains many bug fixes for various infinite | ||
| 5406 | - loops, memory leaks, and other memory errors that could be | ||
| 5407 | - encountered with specially crafted or otherwise erroneous PDF | ||
| 5408 | - files. | ||
| 5409 | - | ||
| 5410 | - - New Features | ||
| 5411 | - | ||
| 5412 | - - QPDF now supports reading and writing streams encoded with JPEG | ||
| 5413 | - or RunLength encoding. Library API enhancements and | ||
| 5414 | - command-line options have been added to control this behavior. | ||
| 5415 | - See command-line options | ||
| 5416 | - :samp:`--compress-streams` and | ||
| 5417 | - :samp:`--decode-level` and methods | ||
| 5418 | - ``QPDFWriter::setCompressStreams`` and | ||
| 5419 | - ``QPDFWriter::setDecodeLevel``. | ||
| 5420 | - | ||
| 5421 | - - QPDF is much better at recovering from broken files. In most | ||
| 5422 | - cases, qpdf will skip invalid objects and will preserve broken | ||
| 5423 | - stream data by not attempting to filter broken streams. QPDF is | ||
| 5424 | - now able to recover or at least not crash on dozens of broken | ||
| 5425 | - test files I have received over the past few years. | ||
| 5426 | - | ||
| 5427 | - - Page rotation is now supported and accessible from both the | ||
| 5428 | - library and the command line. | ||
| 5429 | - | ||
| 5430 | - - ``QPDFWriter`` supports writing files in a way that preserves | ||
| 5431 | - PCLm compliance in support of driverless printing. This is very | ||
| 5432 | - specialized and is only useful to applications that already | ||
| 5433 | - know how to create PCLm files. | ||
| 5434 | - | ||
| 5435 | - - Enhancements to the :command:`qpdf` Command-line | ||
| 5436 | - Tool. All new options listed here are documented in more detail in | ||
| 5437 | - :ref:`ref.using`. | ||
| 5438 | - | ||
| 5439 | - - Command-line arguments can now be read from files or standard | ||
| 5440 | - input using ``@file`` or ``@-`` syntax. Please see :ref:`ref.invocation`. | ||
| 5441 | - | ||
| 5442 | - - :samp:`--rotate`: request page rotation | ||
| 5443 | - | ||
| 5444 | - - :samp:`--newline-before-endstream`: ensure that | ||
| 5445 | - a newline appears before every ``endstream`` keyword in the | ||
| 5446 | - file; used to prevent qpdf from breaking PDF/A compliance on | ||
| 5447 | - already compliant files. | ||
| 5448 | - | ||
| 5449 | - - :samp:`--preserve-unreferenced`: preserve | ||
| 5450 | - unreferenced objects in the input PDF | ||
| 5451 | - | ||
| 5452 | - - :samp:`--split-pages`: break output into chunks | ||
| 5453 | - with fixed numbers of pages | ||
| 5454 | - | ||
| 5455 | - - :samp:`--verbose`: print the name of each | ||
| 5456 | - output file that is created | ||
| 5457 | - | ||
| 5458 | - - :samp:`--compress-streams` and | ||
| 5459 | - :samp:`--decode-level` replace | ||
| 5460 | - :samp:`--stream-data` for improving granularity | ||
| 5461 | - of controlling compression and decompression of stream data. | ||
| 5462 | - The :samp:`--stream-data` option will remain | ||
| 5463 | - available. | ||
| 5464 | - | ||
| 5465 | - - When running :command:`qpdf --check` with other | ||
| 5466 | - options, checks are always run first. This enables qpdf to | ||
| 5467 | - perform its full recovery logic before outputting other | ||
| 5468 | - information. This can be especially useful when manually | ||
| 5469 | - recovering broken files, looking at qpdf's regenerated cross | ||
| 5470 | - reference table, or other similar operations. | ||
| 5471 | - | ||
| 5472 | - - Process :command:`--pages` earlier so that other | ||
| 5473 | - options like :samp:`--show-pages` or | ||
| 5474 | - :samp:`--split-pages` can operate on the file | ||
| 5475 | - after page splitting/merging has occurred. | ||
| 5476 | - | ||
| 5477 | - - API Changes. All new API calls are documented in their respective | ||
| 5478 | - classes' header files. | ||
| 5479 | - | ||
| 5480 | - - ``QPDFObjectHandle::rotatePage``: apply rotation to a page | ||
| 5481 | - object | ||
| 5482 | - | ||
| 5483 | - - ``QPDFWriter::setNewlineBeforeEndstream``: force newline to | ||
| 5484 | - appear before ``endstream`` | ||
| 5485 | - | ||
| 5486 | - - ``QPDFWriter::setPreserveUnreferencedObjects``: preserve | ||
| 5487 | - unreferenced objects that appear in the input PDF. The default | ||
| 5488 | - behavior is to discard them. | ||
| 5489 | - | ||
| 5490 | - - New ``Pipeline`` types ``Pl_RunLength`` and ``Pl_DCT`` are | ||
| 5491 | - available for developers who wish to produce or consume | ||
| 5492 | - RunLength or DCT stream data directly. The | ||
| 5493 | - :file:`examples/pdf-create.cc` example | ||
| 5494 | - illustrates their use. | ||
| 5495 | - | ||
| 5496 | - - ``QPDFWriter::setCompressStreams`` and | ||
| 5497 | - ``QPDFWriter::setDecodeLevel`` methods control handling of | ||
| 5498 | - different types of stream compression. | ||
| 5499 | - | ||
| 5500 | - - Add new C API functions ``qpdf_set_compress_streams``, | ||
| 5501 | - ``qpdf_set_decode_level``, | ||
| 5502 | - ``qpdf_set_preserve_unreferenced_objects``, and | ||
| 5503 | - ``qpdf_set_newline_before_endstream`` corresponding to the new | ||
| 5504 | - ``QPDFWriter`` methods. | ||
| 5505 | - | ||
| 5506 | -6.0.0: November 10, 2015 | ||
| 5507 | - - Implement :samp:`--deterministic-id` command-line | ||
| 5508 | - option and ``QPDFWriter::setDeterministicID`` as well as C API | ||
| 5509 | - function ``qpdf_set_deterministic_ID`` for generating a | ||
| 5510 | - deterministic ID for non-encrypted files. When this option is | ||
| 5511 | - selected, the ID of the file depends on the contents of the output | ||
| 5512 | - file, and not on transient items such as the timestamp or output | ||
| 5513 | - file name. | ||
| 5514 | - | ||
| 5515 | - - Make qpdf more tolerant of files whose xref table entries are not | ||
| 5516 | - the correct length. | ||
| 5517 | - | ||
| 5518 | -5.1.3: May 24, 2015 | ||
| 5519 | - - Bug fix: fix-qdf was not properly handling files that contained | ||
| 5520 | - object streams with more than 255 objects in them. | ||
| 5521 | - | ||
| 5522 | - - Bug fix: qpdf was not properly initializing Microsoft's secure | ||
| 5523 | - crypto provider on fresh Windows installations that had not had | ||
| 5524 | - any keys created yet. | ||
| 5525 | - | ||
| 5526 | - - Fix a few errors found by Gynvael Coldwind and Mateusz Jurczyk of | ||
| 5527 | - the Google Security Team. Please see the ChangeLog for details. | ||
| 5528 | - | ||
| 5529 | - - Properly handle pages that have no contents at all. There were | ||
| 5530 | - many cases in which qpdf handled this fine, but a few methods | ||
| 5531 | - blindly obtained page contents without handling the possibility that | ||
| 5532 | - there were no contents. | ||
| 5533 | - | ||
| 5534 | - - Make qpdf more robust for a few more kinds of problems that may | ||
| 5535 | - occur in invalid PDF files. | ||
| 5536 | - | ||
| 5537 | -5.1.2: June 7, 2014 | ||
| 5538 | - - Bug fix: linearizing files could create a corrupted output file | ||
| 5539 | - under extremely unlikely file size circumstances. See ChangeLog | ||
| 5540 | - for details. The odds of getting hit by this are very low, though | ||
| 5541 | - one person did. | ||
| 5542 | - | ||
| 5543 | - - Bug fix: qpdf would fail to write files that had streams with | ||
| 5544 | - decode parameters referencing other streams. | ||
| 5545 | - | ||
| 5546 | - - New example program: :command:`pdf-split-pages`: | ||
| 5547 | - efficiently split PDF files into individual pages. The example | ||
| 5548 | - program does this more efficiently than using :command:`qpdf | ||
| 5549 | - --pages` to do it. | ||
| 5550 | - | ||
| 5551 | - - Packaging fix: Visual C++ binaries did not support Windows XP. | ||
| 5552 | - This has been rectified by updating the compilers used to generate | ||
| 5553 | - the release binaries. | ||
| 5554 | - | ||
| 5555 | -5.1.1: January 14, 2014 | ||
| 5556 | - - Performance fix: copying foreign objects could be very slow with | ||
| 5557 | - certain types of files. This was most likely to be visible during | ||
| 5558 | - page splitting and was due to traversing the same objects multiple | ||
| 5559 | - times in some cases. | ||
| 5560 | - | ||
| 5561 | -5.1.0: December 17, 2013 | ||
| 5562 | - - Added runtime option (``QUtil::setRandomDataProvider``) to supply | ||
| 5563 | - your own random data provider. You can use this if you want to | ||
| 5564 | - avoid using the OS-provided secure random number generation | ||
| 5565 | - facility or stdlib's less secure version. See comments in | ||
| 5566 | - include/qpdf/QUtil.hh for details. | ||
| 5567 | - | ||
| 5568 | - - Fixed image comparison tests to not create 12-bit-per-pixel images | ||
| 5569 | - since some versions of tiffcmp have bugs in comparing them in some | ||
| 5570 | - cases. This increases the disk space required by the image | ||
| 5571 | - comparison tests, which are off by default anyway. | ||
| 5572 | - | ||
| 5573 | - - Introduce a number of small fixes for compilation on the latest | ||
| 5574 | - clang in MacOS and the latest Visual C++ in Windows. | ||
| 5575 | - | ||
| 5576 | - - Be able to handle broken files that end the xref table header with | ||
| 5577 | - a space instead of a newline. | ||
| 5578 | - | ||
| 5579 | -5.0.1: October 18, 2013 | ||
| 5580 | - - Thanks to a detailed review by Florian Weimer and the Red Hat | ||
| 5581 | - Product Security Team, this release includes a number of | ||
| 5582 | - non-user-visible security hardening changes. Please see the | ||
| 5583 | - ChangeLog file in the source distribution for the complete list. | ||
| 5584 | - | ||
| 5585 | - - When available, operating system-specific secure random number | ||
| 5586 | - generation is used for generating initialization vectors and other | ||
| 5587 | - random values used during encryption or file creation. For the | ||
| 5588 | - Windows build, this results in an added dependency on Microsoft's | ||
| 5589 | - cryptography API. To disable the OS-specific cryptography and use | ||
| 5590 | - the old version, pass the | ||
| 5591 | - :samp:`--enable-insecure-random` option to | ||
| 5592 | - :command:`./configure`. | ||
| 5593 | - | ||
| 5594 | - - The :command:`qpdf` command-line tool now issues a | ||
| 5595 | - warning when :samp:`--accessibility=n` is specified | ||
| 5596 | - for newer encryption versions stating that the option is ignored. | ||
| 5597 | - qpdf, per the spec, has always ignored this flag, but it | ||
| 5598 | - previously did so silently. This warning is issued only by the | ||
| 5599 | - command-line tool, not by the library. The library's handling of | ||
| 5600 | - this flag is unchanged. | ||
| 5601 | - | ||
| 5602 | -5.0.0: July 10, 2013 | ||
| 5603 | - - Bug fix: previous versions of qpdf would lose objects with | ||
| 5604 | - generation != 0 when generating object streams. Fixing this | ||
| 5605 | - required changes to the public API. | ||
| 5606 | - | ||
| 5607 | - - Removed methods from public API that were only supposed to be | ||
| 5608 | - called by QPDFWriter and couldn't realistically be called anywhere | ||
| 5609 | - else. See ChangeLog for details. | ||
| 5610 | - | ||
| 5611 | - - New ``QPDFObjGen`` class added to represent an object | ||
| 5612 | - ID/generation pair. ``QPDFObjectHandle::getObjGen()`` is now | ||
| 5613 | - preferred over ``QPDFObjectHandle::getObjectID()`` and | ||
| 5614 | - ``QPDFObjectHandle::getGeneration()`` as it makes it less likely | ||
| 5615 | - for people to accidentally write code that ignores the generation | ||
| 5616 | - number. See :file:`QPDF.hh` and | ||
| 5617 | - :file:`QPDFObjectHandle.hh` for additional | ||
| 5618 | - notes. | ||
| 5619 | - | ||
| 5620 | - - Add :samp:`--show-npages` command-line option to | ||
| 5621 | - the :command:`qpdf` command to show the number of | ||
| 5622 | - pages in a file. | ||
| 5623 | - | ||
| 5624 | - - Allow omission of the page range within | ||
| 5625 | - :samp:`--pages` for the | ||
| 5626 | - :command:`qpdf` command. When omitted, the page | ||
| 5627 | - range is implicitly taken to be all the pages in the file. | ||
| 5628 | - | ||
| 5629 | - - Various enhancements were made to support different types of | ||
| 5630 | - broken files or broken readers. Details can be found in | ||
| 5631 | - :file:`ChangeLog`. | ||
| 5632 | - | ||
| 5633 | -4.1.0: April 14, 2013 | ||
| 5634 | - - Note to people including qpdf in distributions: the | ||
| 5635 | - :file:`.la` files generated by libtool are now | ||
| 5636 | - installed by qpdf's :command:`make install` target. | ||
| 5637 | - Before, they were not installed. This means that if your | ||
| 5638 | - distribution does not want to include | ||
| 5639 | - :file:`.la` files, you must remove them as | ||
| 5640 | - part of your packaging process. | ||
| 5641 | - | ||
| 5642 | - - Major enhancement: API enhancements have been made to support | ||
| 5643 | - parsing of content streams. This enhancement includes the | ||
| 5644 | - following changes: | ||
| 5645 | - | ||
| 5646 | - - ``QPDFObjectHandle::parseContentStream`` method parses objects | ||
| 5647 | - in a content stream and calls handlers in a callback class. The | ||
| 5648 | - example | ||
| 5649 | - :file:`examples/pdf-parse-content.cc` | ||
| 5650 | - illustrates how this may be used. | ||
| 5651 | - | ||
| 5652 | - - ``QPDFObjectHandle`` can now represent operators and inline | ||
| 5653 | - images, object types that may only appear in content streams. | ||
| 5654 | - | ||
| 5655 | - - Method ``QPDFObjectHandle::getTypeCode()`` returns an | ||
| 5656 | - enumerated type value representing the underlying object type. | ||
| 5657 | - Method ``QPDFObjectHandle::getTypeName()`` returns a text | ||
| 5658 | - string describing the name of the type of a | ||
| 5659 | - ``QPDFObjectHandle`` object. These methods can be used for more | ||
| 5660 | - efficient parsing and debugging/diagnostic messages. | ||
| 5661 | - | ||
| 5662 | - - :command:`qpdf --check` now parses all pages' | ||
| 5663 | - content streams in addition to doing other checks. While there are | ||
| 5664 | - still many types of errors that cannot be detected, syntactic | ||
| 5665 | - errors in content streams will now be reported. | ||
| 5666 | - | ||
| 5667 | - - Minor compilation enhancements have been made to facilitate easier | ||
| 5668 | - support for a broader range of compilers and compiler | ||
| 5669 | - versions. | ||
| 5670 | - | ||
| 5671 | - - Warning flags have been moved into a separate variable in | ||
| 5672 | - :file:`autoconf.mk` | ||
| 5673 | - | ||
| 5674 | - - The configure flag :samp:`--enable-werror` works | ||
| 5675 | - for Microsoft compilers. | ||
| 5676 | - | ||
| 5677 | - - All MSVC CRT security warnings have been resolved. | ||
| 5678 | - | ||
| 5679 | - - All C-style casts in C++ code have been replaced by C++ casts, | ||
| 5680 | - and many casts that had been included to suppress higher | ||
| 5681 | - warning levels for some compilers have been removed, primarily | ||
| 5682 | - for clarity. Places where integer type coercion occurs have | ||
| 5683 | - been scrutinized. A new casting policy has been documented in | ||
| 5684 | - the manual. This is of concern mainly to people porting qpdf to | ||
| 5685 | - new platforms or compilers. It is not visible to programmers | ||
| 5686 | - writing code that uses the library. | ||
| 5687 | - | ||
| 5688 | - - Some internal limits have been removed in code that converts | ||
| 5689 | - numbers to strings. This is largely invisible to users, but it | ||
| 5690 | - does trigger a bug in some older versions of mingw-w64's C++ | ||
| 5691 | - library. See :file:`README-windows.md` in | ||
| 5692 | - the source distribution if you think this may affect you. The | ||
| 5693 | - copy of the DLL distributed with qpdf's binary distribution is | ||
| 5694 | - not affected by this problem. | ||
| 5695 | - | ||
| 5696 | - - The RPM spec file previously included with qpdf has been removed. | ||
| 5697 | - This is because virtually all Linux distributions include qpdf now | ||
| 5698 | - that it is a dependency of CUPS filters. | ||
| 5699 | - | ||
| 5700 | - - A few bug fixes are included: | ||
| 5701 | - | ||
| 5702 | - - Overridden compressed objects are properly handled. Before, | ||
| 5703 | - there were certain constructs that could cause qpdf to see old | ||
| 5704 | - versions of some objects. The most usual manifestation of this | ||
| 5705 | - was loss of filled in form values for certain files. | ||
| 5706 | - | ||
| 5707 | - - Installation no longer uses GNU/Linux-specific versions of some | ||
| 5708 | - commands, so :command:`make install` works on | ||
| 5709 | - Solaris with native tools. | ||
| 5710 | - | ||
| 5711 | - - The 64-bit mingw Windows binary package no longer includes a | ||
| 5712 | - 32-bit DLL. | ||
| 5713 | - | ||
| 5714 | -4.0.1: January 17, 2013 | ||
| 5715 | - - Fix detection of binary attachments in test suite to avoid false | ||
| 5716 | - test failures on some platforms. | ||
| 5717 | - | ||
| 5718 | - - Add clarifying comment in :file:`QPDF.hh` to | ||
| 5719 | - methods that return the user password explaining that it is no | ||
| 5720 | - longer possible with newer encryption formats to recover the user | ||
| 5721 | - password knowing the owner password. In earlier encryption | ||
| 5722 | - formats, the user password was encrypted in the file using the | ||
| 5723 | - owner password. In newer encryption formats, a separate encryption | ||
| 5724 | - key is used on the file, and that key is independently encrypted | ||
| 5725 | - using both the user password and the owner password. | ||
| 5726 | - | ||
| 5727 | -4.0.0: December 31, 2012 | ||
| 5728 | - - Major enhancement: support has been added for newer encryption | ||
| 5729 | - schemes supported by version X of Adobe Acrobat. This includes use | ||
| 5730 | - of 127-character passwords, 256-bit encryption keys, and the | ||
| 5731 | - encryption scheme specified in ISO 32000-2, the PDF 2.0 | ||
| 5732 | - specification. This scheme can be chosen from the command line by | ||
| 5733 | - specifying use of 256-bit keys. qpdf also supports the deprecated | ||
| 5734 | - encryption method used by Acrobat IX. This encryption style has | ||
| 5735 | - known security weaknesses and should not be used in practice. | ||
| 5736 | - However, such files exist "in the wild," so support for this | ||
| 5737 | - scheme is still useful. New methods | ||
| 5738 | - ``QPDFWriter::setR6EncryptionParameters`` (for the PDF 2.0 scheme) | ||
| 5739 | - and ``QPDFWriter::setR5EncryptionParameters`` (for the deprecated | ||
| 5740 | - scheme) have been added to enable these new encryption schemes. | ||
| 5741 | - Corresponding functions have been added to the C API as well. | ||
| 5742 | - | ||
| 5743 | - - Full support for Adobe extension levels in PDF version | ||
| 5744 | - information. Starting with PDF version 1.7, corresponding to ISO | ||
| 5745 | - 32000, Adobe adds new functionality by increasing the extension | ||
| 5746 | - level rather than increasing the version. This support includes | ||
| 5747 | - addition of the ``QPDF::getExtensionLevel`` method for retrieving | ||
| 5748 | - the document's extension level, addition of versions of | ||
| 5749 | - ``QPDFWriter::setMinimumPDFVersion`` and | ||
| 5750 | - ``QPDFWriter::forcePDFVersion`` that accept an extension level, | ||
| 5751 | - and extended syntax for specifying forced and minimum versions on | ||
| 5752 | - the command line as described in :ref:`ref.advanced-transformation`. Corresponding functions | ||
| 5753 | - have been added to the C API as well. | ||
| 5754 | - | ||
| 5755 | - - Minor fixes to prevent qpdf from referencing objects in the file | ||
| 5756 | - that are not referenced in the file's overall structure. Most | ||
| 5757 | - files don't have any such objects, but some files contain | |
| 5758 | - unreferenced objects with errors, so these fixes prevent qpdf from | ||
| 5759 | - needlessly rejecting or complaining about such objects. | ||
| 5760 | - | ||
| 5761 | - - Add new generalized methods for reading and writing files from/to | ||
| 5762 | - programmer-defined sources. The method | ||
| 5763 | - ``QPDF::processInputSource`` allows the programmer to use any | ||
| 5764 | - input source for the input file, and | ||
| 5765 | - ``QPDFWriter::setOutputPipeline`` allows the programmer to write | ||
| 5766 | - the output file through any pipeline. These methods would make it | ||
| 5767 | - possible to perform any number of specialized operations, such as | ||
| 5768 | - accessing external storage systems, creating bindings for qpdf in | ||
| 5769 | - other programming languages that have their own I/O systems, etc. | ||
| 5770 | - | ||
| 5771 | - - Add new method ``QPDF::getEncryptionKey`` for retrieving the | ||
| 5772 | - underlying encryption key used in the file. | ||
| 5773 | - | ||
| 5774 | - - This release includes a small handful of non-compatible API | ||
| 5775 | - changes. While effort is made to avoid such changes, all the | ||
| 5776 | - non-compatible API changes in this version were to parts of the | ||
| 5777 | - API that would likely never be used outside the library itself. In | ||
| 5778 | - all cases, the altered methods or structures were parts of the | ||
| 5779 | - ``QPDF`` class that either were public to enable them to be called | |
| 5780 | - from ``QPDFWriter`` or were part of validation code that was | |
| 5781 | - over-zealous in reporting problems in parts of the file that would | ||
| 5782 | - not ordinarily be referenced. In no case did any of the removed | ||
| 5783 | - methods do anything worse than falsely report error conditions in | |
| 5784 | - files that were broken in ways that didn't matter. The following | ||
| 5785 | - public parts of the ``QPDF`` class were changed in a | ||
| 5786 | - non-compatible way: | ||
| 5787 | - | ||
| 5788 | - - Updated nested ``QPDF::EncryptionData`` class to add fields | ||
| 5789 | - needed by the newer encryption formats, member variables | ||
| 5790 | - changed to private so that future changes will not require | ||
| 5791 | - breaking backward compatibility. | ||
| 5792 | - | ||
| 5793 | - - Added additional parameters to ``compute_data_key``, which is | ||
| 5794 | - used by ``QPDFWriter`` to compute the encryption key used to | ||
| 5795 | - encrypt a specific object. | ||
| 5796 | - | ||
| 5797 | - - Removed the method ``flattenScalarReferences``. This method was | ||
| 5798 | - previously used prior to writing a new PDF file, but it has the | ||
| 5799 | - undesired side effect of causing qpdf to read objects in the | ||
| 5800 | - file that were not referenced. Some otherwise valid files have | |
| 5801 | - unreferenced objects with errors in them, so this could cause | ||
| 5802 | - qpdf to reject files that would be accepted by virtually all | ||
| 5803 | - other PDF readers. In fact, qpdf relied on only a very small | ||
| 5804 | - part of what flattenScalarReferences did, so only this part has | ||
| 5805 | - been preserved, and it is now done directly inside | ||
| 5806 | - ``QPDFWriter``. | ||
| 5807 | - | ||
| 5808 | - - Removed the method ``decodeStreams``. This method was used by | ||
| 5809 | - the :samp:`--check` option of the | ||
| 5810 | - :command:`qpdf` command-line tool to force all | ||
| 5811 | - streams in the file to be decoded, but it also suffered from | ||
| 5812 | - the problem of opening otherwise unreferenced streams and thus | ||
| 5813 | - could report false positives. The | |
| 5814 | - :samp:`--check` option now causes qpdf to go | ||
| 5815 | - through all the motions of writing a new file based on the | ||
| 5816 | - original one, so it will always reference and check exactly | ||
| 5817 | - those parts of a file that any ordinary viewer would check. | ||
| 5818 | - | ||
| 5819 | - - Removed the method ``trimTrailerForWrite``. This method was | ||
| 5820 | - used by ``QPDFWriter`` to modify the original QPDF object by | ||
| 5821 | - removing fields from the trailer dictionary that wouldn't apply | ||
| 5822 | - to the newly written file. This functionality, though generally | ||
| 5823 | - harmless, was a poor implementation and has been replaced by | ||
| 5824 | - having QPDFWriter filter these out when copying the trailer | ||
| 5825 | - rather than modifying the original QPDF object. (Note that qpdf | ||
| 5826 | - never modifies the original file itself.) | ||
| 5827 | - | ||
| 5828 | - - Allow the PDF header to appear anywhere in the first 1024 bytes of | ||
| 5829 | - the file. This is consistent with what other readers do. | ||
| 5830 | - | ||
| 5831 | - - Fix the :command:`pkg-config` files to list zlib | ||
| 5832 | - and pcre in ``Requires.private`` to better support static linking | ||
| 5833 | - using :command:`pkg-config`. | ||
| 5834 | - | ||
| 5835 | -3.0.2: September 6, 2012 | ||
| 5836 | - - Bug fix: ``QPDFWriter::setOutputMemory`` did not work when not | ||
| 5837 | - used with ``QPDFWriter::setStaticID``, which made it pretty much | ||
| 5838 | - useless. This has been fixed. | ||
| 5839 | - | ||
| 5840 | - - New API call ``QPDFWriter::setExtraHeaderText`` inserts additional | ||
| 5841 | - text near the header of the PDF file. The intended use case is to | ||
| 5842 | - insert comments that may be consumed by a downstream application, | ||
| 5843 | - though other use cases may exist. | ||
| 5844 | - | ||
| 5845 | -3.0.1: August 11, 2012 | ||
| 5846 | - - Version 3.0.0 included addition of files for | ||
| 5847 | - :command:`pkg-config`, but this was not mentioned | ||
| 5848 | - in the release notes. The release notes for 3.0.0 were updated to | ||
| 5849 | - mention this. | ||
| 5850 | - | ||
| 5851 | - - Bug fix: if an object stream ended with a scalar object not | ||
| 5852 | - followed by space, qpdf would incorrectly report that it | ||
| 5853 | - encountered a premature EOF. This bug has been in qpdf since | ||
| 5854 | - version 2.0. | |
| 5855 | - | ||
| 5856 | -3.0.0: August 2, 2012 | ||
| 5857 | - - Acknowledgment: I would like to express gratitude for the | ||
| 5858 | - contributions of Tobias Hoffmann toward the release of qpdf | ||
| 5859 | - version 3.0. He is responsible for most of the implementation and | ||
| 5860 | - design of the new API for manipulating pages, and contributed code | ||
| 5861 | - and ideas for many of the improvements made in version 3.0. | ||
| 5862 | - Without his work, this release would certainly not have happened | ||
| 5863 | - as soon as it did, if at all. | ||
| 5864 | - | ||
| 5865 | - - *Non-compatible API changes:* | ||
| 5866 | - | ||
| 5867 | - - The method ``QPDFObjectHandle::replaceStreamData`` that uses a | ||
| 5868 | - ``StreamDataProvider`` to provide the stream data no longer | ||
| 5869 | - takes a ``length`` parameter. The parameter was removed since | ||
| 5870 | - this provides the user an opportunity to simplify the calling | ||
| 5871 | - code. This method was introduced in version 2.2. At the time, | ||
| 5872 | - the ``length`` parameter was required in order to ensure that | ||
| 5873 | - calls to the stream data provider returned the same length for a | ||
| 5874 | - specific stream every time they were invoked. In particular, the | ||
| 5875 | - linearization code depends on this. Instead, qpdf 3.0 and newer | ||
| 5876 | - check for that constraint explicitly. The first time the stream | ||
| 5877 | - data provider is called for a specific stream, the actual length | ||
| 5878 | - is saved, and subsequent calls are required to return the same | ||
| 5879 | - number of bytes. This means the calling code no longer has to | ||
| 5880 | - compute the length in advance, which can be a significant | ||
| 5881 | - simplification. If your code fails to compile because of the | ||
| 5882 | - extra argument and you don't want to make other changes to your | ||
| 5883 | - code, just omit the argument. | ||
| 5884 | - | ||
| 5885 | - - Many methods take ``long long`` instead of other integer types. | ||
| 5886 | - Most if not all existing code should compile fine with this | ||
| 5887 | - change since such parameters had always previously been smaller | ||
| 5888 | - types. This change was required to support files larger than two | ||
| 5889 | - gigabytes in size. | ||
| 5890 | - | ||
| 5891 | - - Support has been added for large files. The test suite verifies | ||
| 5892 | - support for files larger than 4 gigabytes, and manual testing has | ||
| 5893 | - verified support for files larger than 10 gigabytes. Large file | ||
| 5894 | - support is available for both 32-bit and 64-bit platforms as long | ||
| 5895 | - as the compiler and underlying platforms support it. | ||
| 5896 | - | ||
| 5897 | - - Support for page selection (splitting and merging PDF files) has | ||
| 5898 | - been added to the :command:`qpdf` command-line | ||
| 5899 | - tool. See :ref:`ref.page-selection`. | ||
| 5900 | - | ||
| 5901 | - - Options have been added to the :command:`qpdf` | ||
| 5902 | - command-line tool for copying encryption parameters from another | ||
| 5903 | - file. See :ref:`ref.basic-options`. | ||
| 5904 | - | ||
| 5905 | - - New methods have been added to the ``QPDF`` object for adding and | ||
| 5906 | - removing pages. See :ref:`ref.adding-and-remove-pages`. | ||
| 5907 | - | ||
| 5908 | - - New methods have been added to the ``QPDF`` object for copying | ||
| 5909 | - objects from other PDF files. See :ref:`ref.foreign-objects`. | |
| 5910 | - | ||
| 5911 | - - A new method ``QPDFObjectHandle::parse`` has been added for | ||
| 5912 | - constructing ``QPDFObjectHandle`` objects from a string | ||
| 5913 | - description. | ||
| 5914 | - | ||
| 5915 | - - Methods have been added to ``QPDFWriter`` to allow writing to an | ||
| 5916 | - already open stdio ``FILE*`` in addition to writing to standard | |
| 5917 | - output or a named file. Methods have been added to ``QPDF`` to be | ||
| 5918 | - able to process a file from an already open stdio ``FILE*``. This | ||
| 5919 | - makes it possible to read and write PDF from secure temporary | ||
| 5920 | - files that have been unlinked prior to being fully read or | ||
| 5921 | - written. | ||
| 5922 | - | ||
| 5923 | - The ``QPDF::emptyPDF`` method can be used to allow creation of PDF files | |
| 5924 | - from scratch. The example | ||
| 5925 | - :file:`examples/pdf-create.cc` illustrates how | ||
| 5926 | - it can be used. | ||
| 5927 | - | ||
| 5928 | - Several methods that take ``PointerHolder<Buffer>`` can now also | |
| 5929 | - accept ``std::string`` arguments. | ||
| 5930 | - | ||
| 5931 | - - Many new convenience methods have been added to the library, most | ||
| 5932 | - in ``QPDFObjectHandle``. See :file:`ChangeLog` | ||
| 5933 | - for a full list. | ||
| 5934 | - | ||
| 5935 | - - When building on a platform that supports ELF shared libraries | ||
| 5936 | - (such as Linux), symbol versions are enabled by default. They can | ||
| 5937 | - be disabled by passing | ||
| 5938 | - :samp:`--disable-ld-version-script` to | ||
| 5939 | - :command:`./configure`. | ||
| 5940 | - | ||
| 5941 | - - The file :file:`libqpdf.pc` is now installed | ||
| 5942 | - to support :command:`pkg-config`. | ||
| 5943 | - | ||
| 5944 | - - Image comparison tests are off by default now since they are not | ||
| 5945 | - needed to verify a correct build or port of qpdf. They are needed | ||
| 5946 | - only when changing the actual PDF output generated by qpdf. You | ||
| 5947 | - should enable them if you are making deep changes to qpdf itself. | ||
| 5948 | - See :file:`README.md` for details. | ||
| 5949 | - | ||
| 5950 | - - Large file tests are off by default but can be turned on with | ||
| 5951 | - :command:`./configure` or by setting an environment | ||
| 5952 | - variable before running the test suite. See | ||
| 5953 | - :file:`README.md` for details. | ||
| 5954 | - | ||
| 5955 | - - When qpdf's test suite fails, failures are not printed to the | ||
| 5956 | - terminal anymore by default. Instead, find them in | ||
| 5957 | - :file:`build/qtest.log`. For packagers who are | ||
| 5958 | - building with an autobuilder, you can add the | ||
| 5959 | - :samp:`--enable-show-failed-test-output` option to | ||
| 5960 | - :command:`./configure` to restore the old behavior. | ||
| 5961 | - | ||
| 5962 | -2.3.1: December 28, 2011 | ||
| 5963 | - - Fix thread-safety problem resulting from non-thread-safe use of | ||
| 5964 | - the PCRE library. | ||
| 5965 | - | ||
| 5966 | - - Made a few minor documentation fixes. | ||
| 5967 | - | ||
| 5968 | - - Add workaround for a bug that appears in some versions of | ||
| 5969 | - ghostscript to the test suite. | |
| 5970 | - | ||
| 5971 | - - Fix minor build issue for Visual C++ 2010. | ||
| 5972 | - | ||
| 5973 | -2.3.0: August 11, 2011 | ||
| 5974 | - - Bug fix: when preserving existing encryption on encrypted files | ||
| 5975 | - with cleartext metadata, older qpdf versions would generate | ||
| 5976 | - password-protected files with no valid password. This operation | ||
| 5977 | - now works. This bug only affected files created by copying | ||
| 5978 | - existing encryption parameters; explicit encryption with | ||
| 5979 | - specification of cleartext metadata worked before and continues to | ||
| 5980 | - work. | ||
| 5981 | - | ||
| 5982 | - - Enhance ``QPDFWriter`` with a new constructor that allows you to | ||
| 5983 | - delay the specification of the output file. When using this | ||
| 5984 | - constructor, you may now call ``QPDFWriter::setOutputFilename`` to | ||
| 5985 | - specify the output file, or you may use | ||
| 5986 | - ``QPDFWriter::setOutputMemory`` to cause ``QPDFWriter`` to write | ||
| 5987 | - the resulting PDF file to a memory buffer. You may then use | ||
| 5988 | - ``QPDFWriter::getBuffer`` to retrieve the memory buffer. | ||
| 5989 | - | ||
| 5990 | - - Add new API call ``QPDF::replaceObject`` for replacing objects by | ||
| 5991 | - object ID. | |
| 5992 | - | ||
| 5993 | - - Add new API call ``QPDF::swapObjects`` for swapping two objects by | ||
| 5994 | - object ID. | |
| 5995 | - | ||
| 5996 | - - Add ``QPDFObjectHandle::getDictAsMap`` and | ||
| 5997 | - ``QPDFObjectHandle::getArrayAsVector`` to allow retrieval of | ||
| 5998 | - dictionary objects as maps and array objects as vectors. | ||
| 5999 | - | ||
| 6000 | - - Add functions ``qpdf_get_info_key`` and ``qpdf_set_info_key`` to | ||
| 6001 | - the C API for manipulating string fields of the document's | ||
| 6002 | - ``/Info`` dictionary. | ||
| 6003 | - | ||
| 6004 | - - Add functions ``qpdf_init_write_memory``, | ||
| 6005 | - ``qpdf_get_buffer_length``, and ``qpdf_get_buffer`` to the C API | ||
| 6006 | - for writing PDF files to a memory buffer instead of a file. | ||
| 6007 | - | ||
| 6008 | -2.2.4: June 25, 2011 | ||
| 6009 | - - Fix installation and compilation issues; no functionality changes. | ||
| 6010 | - | ||
| 6011 | -2.2.3: April 30, 2011 | ||
| 6012 | - - Handle some damaged streams with incorrect characters following | ||
| 6013 | - the stream keyword. | ||
| 6014 | - | ||
| 6015 | - - Improve handling of inline images when normalizing content | ||
| 6016 | - streams. | ||
| 6017 | - | ||
| 6018 | - - Enhance error recovery to properly handle files that use object 0 | ||
| 6019 | - as a regular object, which is specifically disallowed by the spec. | ||
| 6020 | - | ||
| 6021 | -2.2.2: October 4, 2010 | ||
| 6022 | - - Add new function ``qpdf_read_memory`` to the C API to call | ||
| 6023 | - ``QPDF::processMemoryFile``. This was an omission in qpdf 2.2.1. | ||
| 6024 | - | ||
| 6025 | -2.2.1: October 1, 2010 | ||
| 6026 | - - Add new method ``QPDF::setOutputStreams`` to replace ``std::cout`` | ||
| 6027 | - and ``std::cerr`` with other streams for generation of diagnostic | ||
| 6028 | - messages and error messages. This can be useful for GUIs or other | ||
| 6029 | - applications that want to capture any output generated by the | ||
| 6030 | - library to present to the user in some other way. Note that QPDF | ||
| 6031 | - does not write to ``std::cout`` (or the specified output stream) | ||
| 6032 | - except where explicitly mentioned in | ||
| 6033 | - :file:`QPDF.hh`, and that the only use of the | ||
| 6034 | - error stream is for warnings. Note also that output of warnings is | ||
| 6035 | - suppressed when ``setSuppressWarnings(true)`` is called. | ||
| 6036 | - | ||
| 6037 | - - Add new method ``QPDF::processMemoryFile`` for operating on PDF | ||
| 6038 | - files that are loaded into memory rather than in a file on disk. | ||
| 6039 | - | ||
| 6040 | - - Give a warning but otherwise ignore empty PDF objects by treating | ||
| 6041 | - them as null. Empty objects are not permitted by the PDF | |
| 6042 | - specification but have been known to appear in some actual PDF | ||
| 6043 | - files. | ||
| 6044 | - | ||
| 6045 | - Handle inline image filter abbreviations when they appear as stream | |
| 6046 | - filter abbreviations. The PDF specification does not allow use of | ||
| 6047 | - stream filter abbreviations in this way, but Adobe Reader and some | ||
| 6048 | - other PDF readers accept them since they sometimes appear | ||
| 6049 | - incorrectly in actual PDF files. | ||
| 6050 | - | ||
| 6051 | - - Implement miscellaneous enhancements to ``PointerHolder`` and | ||
| 6052 | - ``Buffer`` to support other changes. | ||
| 6053 | - | ||
| 6054 | -2.2.0: August 14, 2010 | ||
| 6055 | - - Add new methods to ``QPDFObjectHandle`` (``newStream`` and | ||
| 6056 | - ``replaceStreamData``) for creating new streams and replacing | |
| 6057 | - stream data. This makes it possible to perform a wide range of | ||
| 6058 | - operations that were not previously possible. | ||
| 6059 | - | ||
| 6060 | - - Add new helper method in ``QPDFObjectHandle`` | ||
| 6061 | - (``addPageContents``) for appending or prepending new content | ||
| 6062 | - streams to a page. This method makes it possible to manipulate | ||
| 6063 | - content streams without having to be concerned whether a page's | ||
| 6064 | - contents are a single stream or an array of streams. | ||
| 6065 | - | ||
| 6066 | - - Add new method in ``QPDFObjectHandle``: ``replaceOrRemoveKey``, | ||
| 6067 | - which replaces a dictionary key with a given value unless the | ||
| 6068 | - value is null, in which case it removes the key instead. | ||
| 6069 | - | ||
| 6070 | - - Add new method in ``QPDFObjectHandle``: ``getRawStreamData``, | ||
| 6071 | - which returns the raw (unfiltered) stream data into a buffer. This | ||
| 6072 | - complements the ``getStreamData`` method, which returns the | ||
| 6073 | - filtered (uncompressed) stream data and can only be used when the | ||
| 6074 | - stream's data is filterable. | ||
| 6075 | - | ||
| 6076 | - - Provide two new examples: | ||
| 6077 | - :command:`pdf-double-page-size` and | ||
| 6078 | - :command:`pdf-invert-images` that illustrate the | ||
| 6079 | - newly added interfaces. | ||
| 6080 | - | ||
| 6081 | - - Fix a memory leak that would cause loss of a few bytes for every | ||
| 6082 | - object involved in a cycle of object references. Thanks to Jian Ma | ||
| 6083 | - for calling my attention to the leak. | ||
| 6084 | - | ||
| 6085 | -2.1.5: April 25, 2010 | ||
| 6086 | - - Remove restriction of file identifier strings to 16 bytes. This | ||
| 6087 | - unnecessary restriction was preventing qpdf from being able to | ||
| 6088 | - encrypt or decrypt files with identifier strings that were not | ||
| 6089 | - exactly 16 bytes long. The specification imposes no such | ||
| 6090 | - restriction. | ||
| 6091 | - | ||
| 6092 | -2.1.4: April 18, 2010 | ||
| 6093 | - - Apply the same padding calculation fix from version 2.1.2 to the | ||
| 6094 | - main cross reference stream as well. | ||
| 6095 | - | ||
| 6096 | - - Since :command:`qpdf --check` only performs limited | ||
| 6097 | - checks, clarify the output to make it clear that there still may | ||
| 6098 | - be errors that qpdf can't check. This should make it less | ||
| 6099 | - surprising to people when another PDF reader is unable to read a | ||
| 6100 | - file that qpdf thinks is okay. | ||
| 6101 | - | ||
| 6102 | -2.1.3: March 27, 2010 | ||
| 6103 | - - Fix bug that could cause a failure when rewriting PDF files that | ||
| 6104 | - contain object streams with unreferenced objects that in turn | ||
| 6105 | - reference indirect scalars. | ||
| 6106 | - | ||
| 6107 | - - Don't complain about (invalid) AES streams that aren't a multiple | ||
| 6108 | - of 16 bytes. Instead, pad them before decrypting. | ||
| 6109 | - | ||
| 6110 | -2.1.2: January 24, 2010 | ||
| 6111 | - - Fix bug in padding around first half cross reference stream in | ||
| 6112 | - linearized files. The bug could cause an assertion failure when | ||
| 6113 | - linearizing certain unlucky files. | ||
| 6114 | - | ||
| 6115 | -2.1.1: December 14, 2009 | ||
| 6116 | - - No changes in functionality; insert missing include in an internal | ||
| 6117 | - library header file to support gcc 4.4, and update test suite to | ||
| 6118 | - ignore broken Adobe Reader installations. | ||
| 6119 | - | ||
| 6120 | -2.1: October 30, 2009 | ||
| 6121 | - - This is the first version of qpdf to include Windows support. On | ||
| 6122 | - Windows, it is possible to build a DLL. Additionally, a partial | ||
| 6123 | - C-language API has been introduced, which makes it possible to | ||
| 6124 | - call qpdf functions from non-C++ environments. I am very grateful | ||
| 6125 | - to Žarko Gajić (http://zarko-gajic.iz.hr/) for tirelessly testing | |
| 6126 | - numerous pre-release versions of this DLL and providing many | ||
| 6127 | - excellent suggestions on improving the interface. | ||
| 6128 | - | ||
| 6129 | - For programming to the C interface, please see the header file | ||
| 6130 | - :file:`qpdf/qpdf-c.h` and the example | ||
| 6131 | - :file:`examples/pdf-linearize.c`. | ||
| 6132 | - | ||
| 6133 | - Žarko Gajić has written a Delphi wrapper for qpdf, which can be | |
| 6134 | - downloaded from qpdf's download site. Žarko's Delphi wrapper is | |
| 6135 | - released with the same licensing terms as qpdf itself and comes | ||
| 6136 | - with this disclaimer: "Delphi wrapper unit | ||
| 6137 | - :file:`qpdf.pas` created by Žarko Gajić | |
| 6138 | - (http://zarko-gajic.iz.hr/). Use at your own risk and for whatever | ||
| 6139 | - purpose you want. No support is provided. Sample code is | ||
| 6140 | - provided." | ||
| 6141 | - | ||
| 6142 | - - Support has been added for AES encryption and crypt filters. | ||
| 6143 | - Although qpdf does not presently support files that use PKI-based | ||
| 6144 | - encryption, with the addition of AES and crypt filters, qpdf is | ||
| 6145 | - now able to open most encrypted files created with newer | |
| 6146 | - versions of Acrobat or other PDF creation software. Note that I | ||
| 6147 | - have not been able to get very many files encrypted in this way, | ||
| 6148 | - so it's possible there could still be some cases that qpdf can't | ||
| 6149 | - handle. Please report them if you find them. | ||
| 6150 | - | ||
| 6151 | - - Many error messages have been improved to include more information | ||
| 6152 | - in hopes of making qpdf a more useful tool for PDF experts to use | ||
| 6153 | - in manually recovering damaged PDF files. | ||
| 6154 | - | ||
| 6155 | - - Attempt to avoid compressing metadata streams if possible. This is | ||
| 6156 | - consistent with other PDF creation applications. | ||
| 6157 | - | ||
| 6158 | - Provide new command-line options for AES encryption, cleartext | |
| 6159 | - metadata, and setting the minimum and forced PDF versions of | ||
| 6160 | - output files. | ||
| 6161 | - | ||
| 6162 | - - Add additional methods to the ``QPDF`` object for querying the | ||
| 6163 | - document's permissions. Although qpdf does not enforce these | ||
| 6164 | - permissions, it does make them available so that applications that | ||
| 6165 | - use qpdf can enforce permissions. | ||
| 6166 | - | ||
| 6167 | - - The :samp:`--check` option to | ||
| 6168 | - :command:`qpdf` has been extended to include some | ||
| 6169 | - additional information. | ||
| 6170 | - | ||
| 6171 | - - *Non-compatible API changes:* | ||
| 6172 | - | ||
| 6173 | - - QPDF's exception handling mechanism now uses | ||
| 6174 | - ``std::logic_error`` for internal errors and | ||
| 6175 | - ``std::runtime_error`` for runtime errors in favor of the now | ||
| 6176 | - removed ``QEXC`` classes used in previous versions. The ``QEXC`` | ||
| 6177 | - exception classes predated the addition of the | ||
| 6178 | - :file:`<stdexcept>` header file to the C++ standard library. | ||
| 6179 | - Most of the exceptions thrown by the qpdf library itself are | ||
| 6180 | - still of type ``QPDFExc`` which is now derived from | ||
| 6181 | - ``std::runtime_error``. Programs that catch an instance of | ||
| 6182 | - ``std::exception`` and display it by calling the ``what()`` | |
| 6183 | - method will not need to be changed. | ||
| 6184 | - | ||
| 6185 | - - The ``QPDFExc`` class now internally represents various fields | ||
| 6186 | - of the error condition and provides interfaces for querying | ||
| 6187 | - them. Among the fields is a numeric error code that can help | ||
| 6188 | - applications act differently on (a small number of) different | ||
| 6189 | - error conditions. See :file:`QPDFExc.hh` for details. | ||
| 6190 | - | ||
| 6191 | - - Warnings can be retrieved from qpdf as instances of ``QPDFExc`` | ||
| 6192 | - instead of strings. | ||
| 6193 | - | ||
| 6194 | - - The nested ``QPDF::EncryptionData`` class's constructor takes an | ||
| 6195 | - additional argument. This class is primarily intended to be used | ||
| 6196 | - by ``QPDFWriter``. There's not really anything useful an | ||
| 6197 | - end-user application could do with it. It probably shouldn't | ||
| 6198 | - really be part of the public interface to begin with. Likewise, | ||
| 6199 | - some of the methods for computing internal encryption dictionary | ||
| 6200 | - parameters have changed to support ``/R=4`` encryption. | ||
| 6201 | - | ||
| 6202 | - - The method ``QPDF::getUserPassword`` has been removed since it | ||
| 6203 | - didn't do what people would think it did. There are now two new | ||
| 6204 | - methods: ``QPDF::getPaddedUserPassword`` and | ||
| 6205 | - ``QPDF::getTrimmedUserPassword``. The first one does what the | ||
| 6206 | - old ``QPDF::getUserPassword`` method used to do, which is to | ||
| 6207 | - return the password with possible binary padding as specified by | ||
| 6208 | - the PDF specification. The second one returns a human-readable | ||
| 6209 | - password string. | ||
| 6210 | - | ||
| 6211 | - - The enumerated types that used to be nested in ``QPDFWriter`` | ||
| 6212 | - have moved to top-level enumerated types and are now defined in | ||
| 6213 | - the file :file:`qpdf/Constants.h`. This enables them to be | ||
| 6214 | - shared by both the C and C++ interfaces. | ||
| 6215 | - | ||
| 6216 | -2.0.6: May 3, 2009 | ||
| 6217 | - - Do not attempt to uncompress streams that have decode parameters | ||
| 6218 | - we don't recognize. Earlier versions of qpdf would have rejected | ||
| 6219 | - files with such streams. | ||
| 6220 | - | ||
| 6221 | -2.0.5: March 10, 2009 | ||
| 6222 | - - Improve error handling in the LZW decoder, and fix a small error | ||
| 6223 | - introduced in the previous version with regard to handling full | ||
| 6224 | - tables. The LZW decoder has been more strongly verified in this | ||
| 6225 | - release. | ||
| 6226 | - | ||
| 6227 | -2.0.4: February 21, 2009 | ||
| 6228 | - - Include proper support for LZW streams encoded without the "early | ||
| 6229 | - code change" flag. Special thanks to Atom Smasher who reported the | ||
| 6230 | - problem and provided an input file compressed in this way, which I | ||
| 6231 | - did not previously have. | ||
| 6232 | - | ||
| 6233 | - - Implement some improvements to file recovery logic. | ||
| 6234 | - | ||
| 6235 | -2.0.3: February 15, 2009 | ||
| 6236 | - - Compile cleanly with gcc 4.4. | ||
| 6237 | - | ||
| 6238 | - - Handle strings encoded as UTF-16BE properly. | ||
| 6239 | - | ||
| 6240 | -2.0.2: June 30, 2008 | ||
| 6241 | - - Update test suite to work properly with a | ||
| 6242 | - non-:command:`bash` | ||
| 6243 | - :file:`/bin/sh` and with Perl 5.10. No changes | ||
| 6244 | - were made to the actual qpdf source code itself for this release. | ||
| 6245 | - | ||
| 6246 | -2.0.1: May 6, 2008 | ||
| 6247 | - - No changes in functionality or interface. This release includes | ||
| 6248 | - fixes to the source code so that qpdf compiles properly and passes | ||
| 6249 | - its test suite on a broader range of platforms. See | ||
| 6250 | - :file:`ChangeLog` in the source distribution | ||
| 6251 | - for details. | ||
| 6252 | - | ||
| 6253 | -2.0: April 29, 2008 | ||
| 6254 | - - First public release. | ||
| 6255 | - | ||
| 6256 | -.. _acknowledgments: | ||
| 6257 | - | ||
| 6258 | -Acknowledgment | ||
| 6259 | -============== | ||
| 6260 | - | ||
| 6261 | -QPDF was originally created in 2001 and modified periodically between | ||
| 6262 | -2001 and 2005 during my employment at `Apex CoVantage | ||
| 6263 | -<http://www.apexcovantage.com>`__. Upon my departure from Apex, the | ||
| 6264 | -company graciously allowed me to take ownership of the software and | ||
| 6265 | -continue maintaining it as an open source project, a decision for which I | ||
| 6266 | -am very grateful. I have made considerable enhancements to it since | ||
| 6267 | -that time. I feel fortunate to have worked for people who would make | ||
| 6268 | -such a decision. This work would not have been possible without their | ||
| 6269 | -support. | 12 | + overview |
| 13 | + license | ||
| 14 | + installation | ||
| 15 | + cli | ||
| 16 | + qdf | ||
| 17 | + library | ||
| 18 | + weak-crypto | ||
| 19 | + json | ||
| 20 | + design | ||
| 21 | + linearization | ||
| 22 | + object-streams | ||
| 23 | + release-notes | ||
| 24 | + acknowledgement |
manual/installation.rst
0 → 100644
| 1 | +.. _ref.installing: | ||
| 2 | + | ||
| 3 | +Building and Installing QPDF | ||
| 4 | +============================ | ||
| 5 | + | ||
| 6 | +This chapter describes how to build and install qpdf. Please see also | ||
| 7 | +the :file:`README.md` and | ||
| 8 | +:file:`INSTALL` files in the source distribution. | ||
| 9 | + | ||
| 10 | +.. _ref.prerequisites: | ||
| 11 | + | ||
| 12 | +System Requirements | ||
| 13 | +------------------- | ||
| 14 | + | ||
| 15 | +The qpdf package has few external dependencies. In order to build qpdf, | ||
| 16 | +the following packages are required: | ||
| 17 | + | ||
| 18 | +- A C++ compiler that supports C++-14. | ||
| 19 | + | ||
| 20 | +- zlib: http://www.zlib.net/ | ||
| 21 | + | ||
| 22 | +- jpeg: http://www.ijg.org/files/ or https://libjpeg-turbo.org/ | ||
| 23 | + | ||
| 24 | +- *Recommended but not required:* gnutls: https://www.gnutls.org/ to be | ||
| 25 | + able to use the gnutls crypto provider, and/or openssl: | ||
| 26 | + https://openssl.org/ to be able to use the openssl crypto provider. | ||
| 27 | + | ||
| 28 | +- gnu make 3.81 or newer: http://www.gnu.org/software/make | ||
| 29 | + | ||
| 30 | +- perl version 5.8 or newer: http://www.perl.org/; required for running | ||
| 31 | + the test suite. Starting with qpdf version 9.1.1, perl is no longer | ||
| 32 | + required at runtime. | ||
| 33 | + | ||
| 34 | +- GNU diffutils (any version): http://www.gnu.org/software/diffutils/ | ||
| 35 | + is required to run the test suite. Note that this is the version of | ||
| 36 | + diff present on virtually all GNU/Linux systems. This is required | ||
| 37 | + because the test suite uses :command:`diff -u`. | ||
| 38 | + | ||
| 39 | +Part of qpdf's test suite does comparisons of the contents of PDF files by | ||
| 40 | +converting them to images and comparing the images. The image comparison | ||
| 41 | +tests are disabled by default. Those tests are not required for | ||
| 42 | +determining correctness of a qpdf build if you have not modified the | ||
| 43 | +code since the test suite also contains expected output files that are | ||
| 44 | +compared literally. The image comparison tests provide an extra check to | ||
| 45 | +make sure that any content transformations don't break the rendering of | ||
| 46 | +pages. Transformations that affect the content streams themselves are | ||
| 47 | +off by default and are only provided to help developers look into the | ||
| 48 | +contents of PDF files. If you are making deep changes to the library | ||
| 49 | +that cause changes in the contents of the files that qpdf generates, | ||
| 50 | +then you should enable the image comparison tests. Enable them by | ||
| 51 | +running :command:`configure` with the | ||
| 52 | +:samp:`--enable-test-compare-images` flag. If you enable | ||
| 53 | +this, the following additional packages are required by the test | ||
| 54 | +suite. Note that in no case are these items required to use qpdf. | ||
| 55 | + | ||
| 56 | +- libtiff: http://www.remotesensing.org/libtiff/ | ||
| 57 | + | ||
| 58 | +- GhostScript version 8.60 or newer: http://www.ghostscript.com | ||
| 59 | + | ||
| 60 | +If you do not enable this, then you do not need to have tiff and | ||
| 61 | +ghostscript. | ||
| 62 | + | ||
| 63 | +Pre-built documentation is distributed with qpdf, so you should | ||
| 64 | +generally not need to rebuild the documentation. In order to build the | ||
| 65 | +documentation from source, you need to install `Sphinx | ||
| 66 | +<https://sphinx-doc.org>`__. To build the PDF version of the | ||
| 67 | +documentation, you need `pdflatex`, `latexmk`, and a fairly complete | ||
| 68 | +LaTeX installation. Detailed requirements can be found in the Sphinx | ||
| 69 | +documentation. | ||
| 70 | + | ||
| 71 | +.. _ref.building: | ||
| 72 | + | ||
| 73 | +Build Instructions | ||
| 74 | +------------------ | ||
| 75 | + | ||
| 76 | +Building qpdf on UNIX is generally just a matter of running | ||
| 77 | + | ||
| 78 | +:: | ||
| 79 | + | ||
| 80 | + ./configure | ||
| 81 | + make | ||
| 82 | + | ||
| 83 | +You can also run :command:`make check` to run the test | ||
| 84 | +suite and :command:`make install` to install. Please run | ||
| 85 | +:command:`./configure --help` for options on what can be | ||
| 86 | +configured. You can also set the value of ``DESTDIR`` during | ||
| 87 | +installation to install to a temporary location, as is common with many | ||
| 88 | +open source packages. Please see also the | ||
| 89 | +:file:`README.md` and | ||
| 90 | +:file:`INSTALL` files in the source distribution. | ||
| 91 | + | ||
| 92 | +Building on Windows is a little bit more complicated. For details, | ||
| 93 | +please see :file:`README-windows.md` in the source | ||
| 94 | +distribution. You can also download a binary distribution for Windows. | ||
| 95 | +There is a port of qpdf to Visual C++ version 6 in the | ||
| 96 | +:file:`contrib` area generously contributed by Jian | ||
| 97 | +Ma. This is also discussed in more detail in | ||
| 98 | +:file:`README-windows.md`. | ||
| 99 | + | ||
| 100 | +While ``wchar_t`` is part of the C++ standard, qpdf uses it in only one | ||
| 101 | +place in the public API, and it's just in a helper function. It is | ||
| 102 | +possible to build qpdf on a system that doesn't have ``wchar_t``, and | ||
| 103 | +it's also possible to compile a program that uses qpdf on a system | ||
| 104 | +without ``wchar_t`` as long as you don't call that one method. This is a | ||
| 105 | +very unusual situation. For a detailed discussion, please see the | ||
| 106 | +top-level README.md file in qpdf's source distribution. | ||
| 107 | + | ||
| 108 | +There are some other things you can do with the build. Although qpdf | ||
| 109 | +uses :command:`autoconf`, it does not use | ||
| 110 | +:command:`automake` but instead uses a | ||
| 111 | +hand-crafted non-recursive Makefile that requires gnu make. If you're | ||
| 112 | +really interested, please read the comments in the top-level | ||
| 113 | +:file:`Makefile`. | ||
| 114 | + | ||
| 115 | +.. _ref.crypto: | ||
| 116 | + | ||
| 117 | +Crypto Providers | ||
| 118 | +---------------- | ||
| 119 | + | ||
| 120 | +Starting with qpdf 9.1.0, the qpdf library can be built with multiple | ||
| 121 | +implementations of providers of cryptographic functions, which we refer | ||
| 122 | +to as "crypto providers." At the time of writing, a crypto | ||
| 123 | +implementation must provide MD5 and SHA2 (256, 384, and 512-bit) hashes | ||
| 124 | +and RC4 and AES256 with and without CBC encryption. In the future, if | ||
| 125 | +digital signature is added to qpdf, there may be additional requirements | ||
| 126 | +beyond this. | ||
| 127 | + | ||
| 128 | +Starting with qpdf version 9.1.0, the available implementations are | ||
| 129 | +``native`` and ``gnutls``. In qpdf 10.0.0, ``openssl`` was added. | ||
| 130 | +Additional implementations may be added if needed. It is also possible | ||
| 131 | +for a developer to provide their own implementation without modifying | ||
| 132 | +the qpdf library. | ||
| 133 | + | ||
| 134 | +.. _ref.crypto.build: | ||
| 135 | + | ||
| 136 | +Build Support For Crypto Providers | ||
| 137 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| 138 | + | ||
| 139 | +When building with qpdf's build system, crypto providers can be enabled | ||
| 140 | +at build time using various :command:`./configure` | ||
| 141 | +options. The default behavior is for | ||
| 142 | +:command:`./configure` to discover which crypto providers | ||
| 143 | +can be supported based on available external libraries, to build all | ||
| 144 | +available crypto providers, and to use an external provider as the | ||
| 145 | +default over the native one. This behavior can be changed with the | ||
| 146 | +following flags to :command:`./configure`: | ||
| 147 | + | ||
| 148 | +- :samp:`--enable-crypto-{x}` | ||
| 149 | + (where :samp:`{x}` is a supported crypto | ||
| 150 | + provider): enable the :samp:`{x}` crypto | ||
| 151 | + provider, requiring any external dependencies it needs | ||
| 152 | + | ||
| 153 | +- :samp:`--disable-crypto-{x}`: | ||
| 154 | + disable the :samp:`{x}` provider, and do not | ||
| 155 | + link against its dependencies even if they are available | ||
| 156 | + | ||
| 157 | +- :samp:`--with-default-crypto={x}`: | ||
| 158 | + make :samp:`{x}` the default provider even if | ||
| 159 | + a higher priority one is available | ||
| 160 | + | ||
| 161 | +- :samp:`--disable-implicit-crypto`: only build crypto | ||
| 162 | + providers that are explicitly requested with an | ||
| 163 | + :samp:`--enable-crypto-{x}` | ||
| 164 | + option | ||
| 165 | + | ||
| 166 | +For example, if you want to guarantee that the gnutls crypto provider is | ||
| 167 | +used and that the native provider is not built, you could run | ||
| 168 | +:command:`./configure --enable-crypto-gnutls | ||
| 169 | +--disable-implicit-crypto`. | ||
| 170 | + | ||
| 171 | +If you build qpdf using your own build system, in order for qpdf to work | ||
| 172 | +at all, you need to enable at least one crypto provider. The file | ||
| 173 | +:file:`libqpdf/qpdf/qpdf-config.h.in` provides | ||
| 174 | +macros ``DEFAULT_CRYPTO``, whose value must be a string naming the | ||
| 175 | +default crypto provider, and various symbols starting with | ||
| 176 | +``USE_CRYPTO_``, at least one of which has to be enabled. Additionally, | ||
| 177 | +you must compile the source files that implement a crypto provider. To | ||
| 178 | +get a list of those files, look at | ||
| 179 | +:file:`libqpdf/build.mk`. If you want to omit a | ||
| 180 | +particular crypto provider, as long as its ``USE_CRYPTO_`` symbol is | ||
| 181 | +undefined, you can completely ignore the source files that belong to a | ||
| 182 | +particular crypto provider. Additionally, crypto providers may have | ||
| 183 | +their own external dependencies that can be omitted if the crypto | ||
| 184 | +provider is not used. For example, if you are building qpdf yourself and | ||
| 185 | +are using an environment that does not support gnutls or openssl, you | ||
| 186 | +can ensure that ``USE_CRYPTO_NATIVE`` is defined, ``USE_CRYPTO_GNUTLS`` | ||
| 187 | +is not defined, and ``DEFAULT_CRYPTO`` is defined to ``"native"``. Then | ||
| 188 | +you must include the source files used in the native implementation, | ||
| 189 | +some of which were added or renamed from earlier versions, to your | ||
| 190 | +build, and you can ignore | ||
| 191 | +:file:`QPDFCrypto_gnutls.cc`. Always consult | ||
| 192 | +:file:`libqpdf/build.mk` to get the list of source | ||
| 193 | +files you need to build. | ||
| 194 | + | ||
| 195 | +.. _ref.crypto.runtime: | ||
| 196 | + | ||
| 197 | +Runtime Crypto Provider Selection | ||
| 198 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| 199 | + | ||
| 200 | +You can use the :samp:`--show-crypto` option to | ||
| 201 | +:command:`qpdf` to get a list of available crypto | ||
| 202 | +providers. The default provider is always listed first, and the rest are | ||
| 203 | +listed in lexical order. Each crypto provider is listed on a line by | ||
| 204 | +itself with no other text, enabling the output of this command to be | ||
| 205 | +used easily in scripts. | ||
| 206 | + | ||
| 207 | +You can override which crypto provider is used by setting the | ||
| 208 | +``QPDF_CRYPTO_PROVIDER`` environment variable. There are few reasons to | ||
| 209 | +ever do this, but you might want to do it if you were explicitly trying | ||
| 210 | +to compare behavior of two different crypto providers while testing | ||
| 211 | +performance or reproducing a bug. It could also be useful for people who | ||
| 212 | +are implementing their own crypto providers. | ||
| 213 | + | ||
| 214 | +.. _ref.crypto.develop: | ||
| 215 | + | ||
| 216 | +Crypto Provider Information for Developers | ||
| 217 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| 218 | + | ||
| 219 | +If you are writing code that uses libqpdf and you want to force a | ||
| 220 | +certain crypto provider to be used, you can call the method | ||
| 221 | +``QPDFCryptoProvider::setDefaultProvider``. The argument is the name of | ||
| 222 | +a built-in or developer-supplied provider. To add your own crypto | ||
| 223 | +provider, you have to create a class derived from ``QPDFCryptoImpl`` and | ||
| 224 | +register it with ``QPDFCryptoProvider``. For additional information, see | ||
| 225 | +comments in :file:`include/qpdf/QPDFCryptoImpl.hh`. | ||
| 226 | + | ||
| 227 | +.. _ref.crypto.design: | ||
| 228 | + | ||
| 229 | +Crypto Provider Design Notes | ||
| 230 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| 231 | + | ||
| 232 | +This section describes a few bits of rationale for why the crypto | ||
| 233 | +provider interface was set up the way it was. You don't need to know any | ||
| 234 | +of this information, but it's provided for the record and in case it's | ||
| 235 | +interesting. | ||
| 236 | + | ||
| 237 | +As a general rule, I want to avoid as much as possible including large | ||
| 238 | +blocks of code that are conditionally compiled such that, in most | ||
| 239 | +builds, some code is never built. This is dangerous because it makes it | ||
| 240 | +very easy for invalid code to creep in unnoticed. As such, I want it to | ||
| 241 | +be possible to build qpdf with all available crypto providers, and this | ||
| 242 | +is the way I build qpdf for local development. At the same time, if a | ||
| 243 | +particular packager feels that it is a security liability for qpdf to | ||
| 244 | +use crypto functionality from other than a library that gets | ||
| 245 | +considerable scrutiny for this specific purpose (such as gnutls, | ||
| 246 | +openssl, or nettle), then I want to give that packager the ability to | ||
| 247 | +completely disable qpdf's native implementation. Or if someone wants to | ||
| 248 | +avoid adding a dependency on one of the external crypto providers, I | ||
| 249 | +don't want the availability of the provider to impose additional | ||
| 250 | +external dependencies within that environment. Both of these are | ||
| 251 | +situations that I know to be true for some users of qpdf. | ||
| 252 | + | ||
| 253 | +I want registration and selection of crypto providers to be thread-safe, | ||
| 254 | +and I want it to work deterministically for a developer to provide their | ||
| 255 | +own crypto provider and be able to set it up as the default. This was | ||
| 256 | +the primary motivation behind requiring C++-11 as doing so enabled me to | ||
| 257 | +exploit the guaranteed thread safety of local block static | ||
| 258 | +initialization. The ``QPDFCryptoProvider`` class uses a singleton | ||
| 259 | +pattern with thread-safe initialization to create the singleton instance | ||
| 260 | +of ``QPDFCryptoProvider`` and exposes only static methods in its public | ||
| 261 | +interface. In this way, if a developer wants to call any | ||
| 262 | +``QPDFCryptoProvider`` methods, the library guarantees the | ||
| 263 | +``QPDFCryptoProvider`` is fully initialized and all built-in crypto | ||
| 264 | +providers are registered. Making ``QPDFCryptoProvider`` actually know | ||
| 265 | +about all the built-in providers may seem a bit sad at first, but this | ||
| 266 | +choice makes it extremely clear exactly what the initialization behavior | ||
| 267 | +is. There's no question about provider implementations automatically | ||
| 268 | +registering themselves in a nondeterministic order. It also means that | ||
| 269 | +implementations do not need to know anything about the provider | ||
| 270 | +interface, which makes them easier to test in isolation. Another | ||
| 271 | +advantage of this approach is that a developer who wants to develop | ||
| 272 | +their own crypto provider can do so in complete isolation from the qpdf | ||
| 273 | +library and, with just two calls, can make qpdf use their provider in | ||
| 274 | +their application. If they decided to contribute their code, plugging it | ||
| 275 | +into the qpdf library would require a very small change to qpdf's source | ||
| 276 | +code. | ||
| 277 | + | ||
| 278 | +The decision to make the crypto provider selectable at runtime was one I | ||
| 279 | +struggled with a little, but I decided to do it for various reasons. | ||
| 280 | +Allowing an end user to switch crypto providers easily could be very | ||
| 281 | +useful for reproducing a potential bug. If a user reports a bug that | ||
| 282 | +some cryptographic thing is broken, I can easily ask that person to try | ||
| 283 | +with the ``QPDF_CRYPTO_PROVIDER`` variable set to different values. The | ||
| 284 | +same could apply in the event of a performance problem. This also makes | ||
| 285 | +it easier for qpdf's own test suite to exercise code with different | ||
| 286 | +providers without having to make every program that links with qpdf | ||
| 287 | +aware of the possibility of multiple providers. In qpdf's continuous | ||
| 288 | +integration environment, the entire test suite is run for each supported | ||
| 289 | +crypto provider. This is made simple by being able to select the | ||
| 290 | +provider using an environment variable. | ||
| 291 | + | ||
| 292 | +Finally, making crypto providers selectable in this way establishes a | ||
| 293 | +pattern that I may follow again in the future for stream filter | ||
| 294 | +providers. One could imagine a future enhancement where someone could | ||
| 295 | +provide their own implementations for basic filters like | ||
| 296 | +``/FlateDecode`` or for other filters that qpdf doesn't support. | ||
| 297 | +Implementing the registration functions and internal storage of | ||
| 298 | +registered providers was also easier using C++-11's functional | ||
| 299 | +interfaces, which was another reason to require C++-11 at this time. | ||
| 300 | + | ||
| 301 | +.. _ref.packaging: | ||
| 302 | + | ||
| 303 | +Notes for Packagers | ||
| 304 | +------------------- | ||
| 305 | + | ||
| 306 | +If you are packaging qpdf for an operating system distribution, here are | ||
| 307 | +some things you may want to keep in mind: | ||
| 308 | + | ||
| 309 | +- Starting in qpdf version 9.1.1, qpdf no longer has a runtime | ||
| 310 | + dependency on perl. This is because fix-qdf was rewritten in C++. | ||
| 311 | + However, qpdf still has a build-time dependency on perl. | ||
| 312 | + | ||
| 313 | +- Make sure you are getting the intended behavior with regard to crypto | ||
| 314 | + providers. Read :ref:`ref.crypto.build` for details. | ||
| 315 | + | ||
| 316 | +- Passing :samp:`--enable-show-failed-test-output` to | ||
| 317 | + :command:`./configure` will cause any failed test | ||
| 318 | + output to be written to the console. This can be very useful for | ||
| 319 | + seeing test failures generated by autobuilders where you can't access | ||
| 320 | + qtest.log after the fact. | ||
| 321 | + | ||
| 322 | +- If qpdf's build environment detects the presence of autoconf and | ||
| 323 | + related tools, it will check to ensure that automatically generated | ||
| 324 | + files are up-to-date with recorded checksums and fail if it detects a | ||
| 325 | + discrepancy. This feature is intended to prevent you from | ||
| 326 | + accidentally forgetting to regenerate automatic files after modifying | ||
| 327 | + their sources. If your packaging environment automatically refreshes | ||
| 328 | + automatic files, it can cause this check to fail. Suppress qpdf's | ||
| 329 | + checks by passing :samp:`--disable-check-autofiles` | ||
| 330 | +   to :command:`./configure`. This is safe since qpdf's | ||
| 331 | + :command:`autogen.sh` just runs autotools in the | ||
| 332 | + normal way. | ||
| 333 | + | ||
| 334 | +- QPDF's :command:`make install` does not install | ||
| 335 | + completion files by default, but as a packager, it's good if you | ||
| 336 | + install them wherever your distribution expects such files to go. You | ||
| 337 | + can find completion files to install in the | ||
| 338 | + :file:`completions` directory. | ||
| 339 | + | ||
| 340 | +- Packagers are encouraged to install the source files from the | ||
| 341 | + :file:`examples` directory along with qpdf | ||
| 342 | + development packages. |
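
Note on "Runtime Crypto Provider Selection" above: the manual says that :samp:`--show-crypto` prints one provider per line with the default first, and that ``QPDF_CRYPTO_PROVIDER`` overrides the provider at runtime. A minimal Python sketch of scripting against that behavior might look like the following. The function names are hypothetical helpers invented for this illustration and are not part of qpdf, and ``list_providers`` assumes a ``qpdf`` executable is on the ``PATH``.

```python
import os
import subprocess


def parse_show_crypto(output):
    """Split `qpdf --show-crypto` output into (default, others).

    Relies on the documented format: one provider name per line,
    with the default provider listed first.
    """
    names = [line.strip() for line in output.splitlines() if line.strip()]
    if not names:
        raise ValueError("no crypto providers listed")
    return names[0], names[1:]


def provider_env(provider, base_env=None):
    """Return a copy of the environment that selects `provider`.

    Hypothetical helper: it only sets QPDF_CRYPTO_PROVIDER, the
    variable named in the manual, and leaves everything else alone.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["QPDF_CRYPTO_PROVIDER"] = provider
    return env


def list_providers(qpdf="qpdf"):
    """Run qpdf and return (default, others). Needs qpdf installed."""
    result = subprocess.run(
        [qpdf, "--show-crypto"], check=True, capture_output=True, text=True
    )
    return parse_show_crypto(result.stdout)
```

To compare two providers while reproducing a bug, one could run the same qpdf command once per name returned by ``list_providers()``, passing ``env=provider_env(name)`` to ``subprocess.run``.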
manual/json.rst
0 → 100644
| 1 | +.. _ref.json: | ||
| 2 | + | ||
| 3 | +QPDF JSON | ||
| 4 | +========= | ||
| 5 | + | ||
| 6 | +.. _ref.json-overview: | ||
| 7 | + | ||
| 8 | +Overview | ||
| 9 | +-------- | ||
| 10 | + | ||
| 11 | +Beginning with qpdf version 8.3.0, the :command:`qpdf` | ||
| 12 | +command-line program can produce a JSON representation of the | ||
| 13 | +non-content data in a PDF file. It includes a dump in JSON format of all | ||
| 14 | +objects in the PDF file excluding the content of streams. This JSON | ||
| 15 | +representation makes it very easy to look in detail at the structure of | ||
| 16 | +a given PDF file, and it also provides a great way to work with PDF | ||
| 17 | +files programmatically from the command-line in languages that can't | ||
| 18 | +call or link with the qpdf library directly. Note that stream data can | ||
| 19 | +be extracted from PDF files using other qpdf command-line options. | ||
| 20 | + | ||
| 21 | +.. _ref.json-guarantees: | ||
| 22 | + | ||
| 23 | +JSON Guarantees | ||
| 24 | +--------------- | ||
| 25 | + | ||
| 26 | +The qpdf JSON representation includes a JSON serialization of the raw | ||
| 27 | +objects in the PDF file as well as some computed information in a more | ||
| 28 | +easily extracted format. QPDF provides some guarantees about its JSON | ||
| 29 | +format. These guarantees are designed to simplify the experience of a | ||
| 30 | +developer working with the JSON format. | ||
| 31 | + | ||
| 32 | +Compatibility | ||
| 33 | + The top-level JSON object output is a dictionary. The JSON output | ||
| 34 | + contains various nested dictionaries and arrays. With the exception | ||
| 35 | + of dictionaries that are populated by the fields of objects from the | ||
| 36 | + file, all instances of a dictionary are guaranteed to have exactly | ||
| 37 | + the same keys. Future versions of qpdf are free to add additional | ||
| 38 | + keys but not to remove keys or change the type of object that a key | ||
| 39 | + points to. The qpdf program validates this guarantee, and in the | ||
| 40 | + unlikely event that a bug in qpdf should cause it to generate data | ||
| 41 | + that doesn't conform to this rule, it will ask you to file a bug | ||
| 42 | + report. | ||
| 43 | + | ||
| 44 | + The top-level JSON structure contains a "``version``" key whose value | ||
| 45 | +   is a simple integer. The value of the ``version`` key will be | ||
| 46 | + incremented if a non-compatible change is made. A non-compatible | ||
| 47 | + change would be any change that involves removal of a key, a change | ||
| 48 | + to the format of data pointed to by a key, or a semantic change that | ||
| 49 | + requires a different interpretation of a previously existing key. A | ||
| 50 | + strong effort will be made to avoid breaking compatibility. | ||
| 51 | + | ||
| 52 | +Documentation | ||
| 53 | + The :command:`qpdf` command can be invoked with the | ||
| 54 | + :samp:`--json-help` option. This will output a JSON | ||
| 55 | + structure that has the same structure as the JSON output that qpdf | ||
| 56 | + generates, except that each field in the help output is a description | ||
| 57 | + of the corresponding field in the JSON output. The specific | ||
| 58 | + guarantees are as follows: | ||
| 59 | + | ||
| 60 | + - A dictionary in the help output means that the corresponding | ||
| 61 | + location in the actual JSON output is also a dictionary with | ||
| 62 | + exactly the same keys; that is, no keys present in help are absent | ||
| 63 | + in the real output, and no keys will be present in the real output | ||
| 64 | + that are not in help. As a special case, if the dictionary has a | ||
| 65 | + single key whose name starts with ``<`` and ends with ``>``, it | ||
| 66 | + means that the JSON output is a dictionary that can have any keys, | ||
| 67 | + each of which conforms to the value of the special key. This is | ||
| 68 | + used for cases in which the keys of the dictionary are things like | ||
| 69 | + object IDs. | ||
| 70 | + | ||
| 71 | + - A string in the help output is a description of the item that | ||
| 72 | + appears in the corresponding location of the actual output. The | ||
| 73 | + corresponding output can have any format. | ||
| 74 | + | ||
| 75 | + - An array in the help output always contains a single element. It | ||
| 76 | + indicates that the corresponding location in the actual output is | ||
| 77 | + also an array, and that each element of the array has whatever | ||
| 78 | + format is implied by the single element of the help output's | ||
| 79 | + array. | ||
| 80 | + | ||
| 81 | +   For example, the help output includes a "``pagelabels``" | ||
| 82 | + key whose value is an array of one element. That element is a | ||
| 83 | + dictionary with keys "``index``" and "``label``". In addition to | ||
| 84 | + describing the meaning of those keys, this tells you that the actual | ||
| 85 | + JSON output will contain a ``pagelabels`` array, each of whose | ||
| 86 | + elements is a dictionary that contains an ``index`` key, a ``label`` | ||
| 87 | + key, and no other keys. | ||
| 88 | + | ||
| 89 | +Directness and Simplicity | ||
| 90 | + The JSON output contains the value of every object in the file, but | ||
| 91 | + it also contains some processed data. This is analogous to how qpdf's | ||
| 92 | + library interface works. The processed data is similar to the helper | ||
| 93 | + functions in that it allows you to look at certain aspects of the PDF | ||
| 94 | + file without having to understand all the nuances of the PDF | ||
| 95 | + specification, while the raw objects allow you to mine the PDF for | ||
| 96 | + anything that the higher-level interfaces are lacking. | ||
| 97 | + | ||
| 98 | +.. _json.limitations: | ||
| 99 | + | ||
| 100 | +Limitations of JSON Representation | ||
| 101 | +---------------------------------- | ||
| 102 | + | ||
| 103 | +There are a few limitations to be aware of with the JSON structure: | ||
| 104 | + | ||
| 105 | +- Strings, names, and indirect object references in the original PDF | ||
| 106 | + file are all converted to strings in the JSON representation. In the | ||
| 107 | + case of a "normal" PDF file, you can tell the difference because a | ||
| 108 | + name starts with a slash (``/``), and an indirect object reference | ||
| 109 | + looks like ``n n R``, but if there were to be a string that looked | ||
| 110 | + like a name or indirect object reference, there would be no way to | ||
| 111 | + tell this from the JSON output. Note that there are certain cases | ||
| 112 | + where you know for sure what something is, such as knowing that | ||
| 113 | + dictionary keys in objects are always names and that certain things | ||
| 114 | + in the higher-level computed data are known to contain indirect | ||
| 115 | + object references. | ||
| 116 | + | ||
| 117 | +- The JSON format doesn't support binary data very well. Mostly the | ||
| 118 | + details are not important, but they are presented here for | ||
| 119 | + information. When qpdf outputs a string in the JSON representation, | ||
| 120 | + it converts the string to UTF-8, assuming usual PDF string semantics. | ||
| 121 | + Specifically, if the original string is UTF-16, it is converted to | ||
| 122 | + UTF-8. Otherwise, it is assumed to have PDF doc encoding, and is | ||
| 123 | + converted to UTF-8 with that assumption. This causes strange things | ||
| 124 | + to happen to binary strings. For example, if you had the binary | ||
| 125 | +   string ``<038051>``, this would be output to the JSON as ``\u0003•Q`` | ||
| 126 | + because ``03`` is not a printable character and ``80`` is the bullet | ||
| 127 | + character in PDF doc encoding and is mapped to the Unicode value | ||
| 128 | + ``2022``. Since ``51`` is ``Q``, it is output as is. If you wanted to | ||
| 129 | +   convert back from here to a binary string, you would have to recognize | ||
| 130 | + Unicode values whose code points are higher than ``0xFF`` and map | ||
| 131 | + those back to their corresponding PDF doc encoding characters. There | ||
| 132 | + is no way to tell the difference between a Unicode string that was | ||
| 133 | + originally encoded as UTF-16 or one that was converted from PDF doc | ||
| 134 | + encoding. In other words, it's best if you don't try to use the JSON | ||
| 135 | + format to extract binary strings from the PDF file, but if you really | ||
| 136 | + had to, it could be done. Note that qpdf's | ||
| 137 | + :samp:`--show-object` option does not have this | ||
| 138 | + limitation and will reveal the string as encoded in the original | ||
| 139 | + file. | ||
| 140 | + | ||
| 141 | +.. _json.considerations: | ||
| 142 | + | ||
| 143 | +JSON: Special Considerations | ||
| 144 | +---------------------------- | ||
| 145 | + | ||
| 146 | +For the most part, the built-in JSON help tells you everything you need | ||
| 147 | +to know about the JSON format, but there are a few non-obvious things to | ||
| 148 | +be aware of: | ||
| 149 | + | ||
| 150 | +- While qpdf guarantees that keys present in the help will be present | ||
| 151 | + in the output, those fields may be null or empty if the information | ||
| 152 | + is not known or absent in the file. Also, if you specify | ||
| 153 | + :samp:`--json-keys`, the keys that are not listed | ||
| 154 | + will be excluded entirely except for those that | ||
| 155 | + :samp:`--json-help` says are always present. | ||
| 156 | + | ||
| 157 | +- In a few places, there are keys with names containing | ||
| 158 | + ``pageposfrom1``. The values of these keys are null or an integer. If | ||
| 159 | + an integer, they point to a page index within the file numbering from | ||
| 160 | + 1. Note that JSON indexes from 0, and you would also use 0-based | ||
| 161 | + indexing using the API. However, 1-based indexing is easier in this | ||
| 162 | + case because the command-line syntax for specifying page ranges is | ||
| 163 | + 1-based. If you were going to write a program that looked through the | ||
| 164 | + JSON for information about specific pages and then use the | ||
| 165 | + command-line to extract those pages, 1-based indexing is easier. | ||
| 166 | +   Besides, it's more convenient to subtract 1 in a program in a real | ||
| 167 | +   programming language than it is to add 1 in shell code. | ||
| 168 | + | ||
| 169 | +- The image information included in the ``page`` section of the JSON | ||
| 170 | + output includes the key "``filterable``". Note that the value of this | ||
| 171 | + field may depend on the :samp:`--decode-level` that | ||
| 172 | + you invoke qpdf with. The JSON output includes a top-level key | ||
| 173 | + "``parameters``" that indicates the decode level used for computing | ||
| 174 | + whether a stream was filterable. For example, jpeg images will be | ||
| 175 | + shown as not filterable by default, but they will be shown as | ||
| 176 | + filterable if you run :command:`qpdf --json | ||
| 177 | + --decode-level=all`. |
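
Note on the "Documentation" guarantees above: the three rules for reading :samp:`--json-help` output (dictionaries have exactly the same keys unless the single key is wrapped in ``<`` and ``>``, strings describe a value of any format, arrays hold one element that describes every item) can be sketched as a small recursive check. This is an illustration only, not qpdf's own validation code. The function name ``conforms`` is hypothetical, it assumes output produced without :samp:`--json-keys`, and it accepts ``null`` anywhere, which is an assumption based on the statement that fields may be null when information is absent.

```python
def conforms(help_node, actual):
    """Return True if `actual` matches the shape described by `help_node`."""
    if actual is None:
        # Assumption: a null value is acceptable at any position.
        return True
    if isinstance(help_node, str):
        # A help string is a description; the real value may be anything.
        return True
    if isinstance(help_node, list):
        # A help array has one element describing every real element.
        if len(help_node) != 1 or not isinstance(actual, list):
            return False
        return all(conforms(help_node[0], item) for item in actual)
    if isinstance(help_node, dict):
        if not isinstance(actual, dict):
            return False
        keys = list(help_node)
        if len(keys) == 1 and keys[0].startswith("<") and keys[0].endswith(">"):
            # Special case: any keys allowed, each value follows one pattern.
            return all(conforms(help_node[keys[0]], v) for v in actual.values())
        # Otherwise the key sets must be identical.
        if set(actual) != set(help_node):
            return False
        return all(conforms(help_node[k], actual[k]) for k in help_node)
    return False
```

With the ``pagelabels`` example from the text, a help fragment ``{"pagelabels": [{"index": "...", "label": "..."}]}`` accepts an output whose ``pagelabels`` entries each have exactly the keys ``index`` and ``label``, and rejects an entry that carries any additional key.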
manual/library.rst
0 → 100644
| 1 | +.. _ref.using-library: | ||
| 2 | + | ||
| 3 | +Using the QPDF Library | ||
| 4 | +====================== | ||
| 5 | + | ||
| 6 | +.. _ref.using.from-cxx: | ||
| 7 | + | ||
| 8 | +Using QPDF from C++ | ||
| 9 | +------------------- | ||
| 10 | + | ||
| 11 | +The source tree for the qpdf package has an | ||
| 12 | +:file:`examples` directory that contains a few | ||
| 13 | +example programs. The :file:`qpdf/qpdf.cc` source | ||
| 14 | +file also serves as a useful example since it exercises almost all of | ||
| 15 | +the qpdf library's public interface. The best source of documentation on | ||
| 16 | +the library itself is reading comments in | ||
| 17 | +:file:`include/qpdf/QPDF.hh`, | ||
| 18 | +:file:`include/qpdf/QPDFWriter.hh`, and | ||
| 19 | +:file:`include/qpdf/QPDFObjectHandle.hh`. | ||
| 20 | + | ||
| 21 | +All header files are installed in the | ||
| 22 | +:file:`include/qpdf` directory. It is recommended that | ||
| 23 | +you use ``#include <qpdf/QPDF.hh>`` rather than adding | ||
| 24 | +:file:`include/qpdf` to your include path. | ||
| 25 | + | ||
| 26 | +When linking against the qpdf static library, you may also need to | ||
| 27 | +specify ``-lz -ljpeg`` on your link command. If your system understands | ||
| 28 | +how to read libtool :file:`.la` files, this may not | ||
| 29 | +be necessary. | ||
| 30 | + | ||
| 31 | +The qpdf library is safe to use in a multithreaded program, but no | ||
| 32 | +individual ``QPDF`` object instance (including ``QPDF``, | ||
| 33 | +``QPDFObjectHandle``, or ``QPDFWriter``) can be used in more than one | ||
| 34 | +thread at a time. Multiple threads may simultaneously work with | ||
| 35 | +different instances of these and all other QPDF objects. | ||
| 36 | + | ||
| 37 | +.. _ref.using.other-languages: | ||
| 38 | + | ||
| 39 | +Using QPDF from other languages | ||
| 40 | +------------------------------- | ||
| 41 | + | ||
| 42 | +The qpdf library is implemented in C++, which makes it hard to use | ||
| 43 | +directly in other languages. There are a few things that can help. | ||
| 44 | + | ||
| 45 | +"C" | ||
| 46 | + The qpdf library includes a "C" language interface that provides a | ||
| 47 | + subset of the overall capabilities. The header file | ||
| 48 | + :file:`qpdf/qpdf-c.h` includes information about | ||
| 49 | + its use. As long as you use a C++ linker, you can link C programs | ||
| 50 | + with qpdf and use the C API. For languages that can directly load | ||
| 51 | + methods from a shared library, the C API can also be useful. People | ||
| 52 | + have reported success using the C API from other languages on Windows | ||
| 53 | + by directly calling functions in the DLL. | ||
| 54 | + | ||
| 55 | +Python | ||
| 56 | + A Python module called | ||
| 57 | + `pikepdf <https://pypi.org/project/pikepdf/>`__ provides a clean and | ||
| 58 | + highly functional set of Python bindings to the qpdf library. Using | ||
| 59 | + pikepdf, you can work with PDF files in a natural way and combine | ||
| 60 | + qpdf's capabilities with other functionality provided by Python's | ||
| 61 | + rich standard library and available modules. | ||
| 62 | + | ||
| 63 | +Other Languages | ||
| 64 | + Starting with version 8.3.0, the :command:`qpdf` | ||
| 65 | + command-line tool can produce a JSON representation of the PDF file's | ||
| 66 | + non-content data. This can facilitate interacting programmatically | ||
| 67 | + with PDF files through qpdf's command line interface. For more | ||
| 68 | + information, please see :ref:`ref.json`. | ||
| 69 | + | ||
| 70 | +.. _ref.unicode-files: | ||
| 71 | + | ||
| 72 | +A Note About Unicode File Names | ||
| 73 | +------------------------------- | ||
| 74 | + | ||
| 75 | +When strings are passed to qpdf library routines either as ``char*`` or | ||
| 76 | +as ``std::string``, they are treated as byte arrays except where | ||
| 77 | +otherwise noted. When Unicode is desired, qpdf wants UTF-8 unless | ||
| 78 | +otherwise noted in comments in header files. In modern UNIX/Linux | ||
| 79 | +environments, this generally does the right thing. In Windows, it's a | ||
| 80 | +bit more complicated. Starting in qpdf 8.4.0, passwords that contain | ||
| 81 | +Unicode characters are handled much better, and starting in qpdf 8.4.1, | ||
| 82 | +the library attempts to properly handle Unicode characters in filenames. | ||
| 83 | +In particular, in Windows, if a UTF-8 encoded string is used as a | ||
| 84 | +filename in either ``QPDF`` or ``QPDFWriter``, it is internally | ||
| 85 | +converted to ``wchar_t*``, and Unicode-aware Windows APIs are used. As | ||
| 86 | +such, qpdf will generally operate properly on files with non-ASCII | ||
| 87 | +characters in their names as long as the filenames are UTF-8 encoded for | ||
| 88 | +passing into the qpdf library API, but there are still some rough edges, | ||
| 89 | +such as the encoding of the filenames in error messages or CLI output | ||
| 90 | +messages. Patches or bug reports are welcome for any continuing issues | ||
| 91 | +with Unicode file names in Windows. |
manual/license.rst
0 → 100644
| 1 | +.. _ref.license: | ||
| 2 | + | ||
| 3 | +License | ||
| 4 | +======= | ||
| 5 | + | ||
| 6 | +QPDF is licensed under `the Apache License, Version 2.0 | ||
| 7 | +<http://www.apache.org/licenses/LICENSE-2.0>`__ (the "License"). | ||
| 8 | +Unless required by applicable law or agreed to in writing, software | ||
| 9 | +distributed under the License is distributed on an "AS IS" BASIS, | ||
| 10 | +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or | ||
| 11 | +implied. See the License for the specific language governing | ||
| 12 | +permissions and limitations under the License. |
manual/linearization.rst
0 → 100644
| 1 | +.. _ref.linearization: | ||
| 2 | + | ||
| 3 | +Linearization | ||
| 4 | +============= | ||
| 5 | + | ||
| 6 | +This chapter describes how ``QPDF`` and ``QPDFWriter`` implement | ||
| 7 | +creation and processing of linearized PDFs. | ||
| 8 | + | ||
| 9 | +.. _ref.linearization-strategy: | ||
| 10 | + | ||
| 11 | +Basic Strategy for Linearization | ||
| 12 | +-------------------------------- | ||
| 13 | + | ||
| 14 | +To avoid the incestuous problem of having the qpdf library validate its | ||
| 15 | +own linearized files, we have a special linearized file checking mode | ||
| 16 | +which can be invoked via :command:`qpdf | ||
| 17 | +--check-linearization` (or :command:`qpdf | ||
| 18 | +--check`). This mode reads the linearization parameter | ||
| 19 | +dictionary and the hint streams and validates that object ordering, | ||
| 20 | +parameters, and hint stream contents are correct. The validation code | ||
| 21 | +was first tested against linearized files created by external tools | ||
| 22 | +(Acrobat and pdlin) and then used to validate files created by | ||
| 23 | +``QPDFWriter`` itself. | ||
| 24 | + | ||
| 25 | +.. _ref.linearized.preparation: | ||
| 26 | + | ||
| 27 | +Preparing For Linearization | ||
| 28 | +--------------------------- | ||
| 29 | + | ||
| 30 | +Before creating a linearized PDF file from any other PDF file, the PDF | ||
| 31 | +file must be altered such that all page attributes are propagated down | ||
| 32 | +to the page level (and not inherited from parents in the ``/Pages`` | ||
| 33 | +tree). We also have to know which objects refer to which other objects, | ||
| 34 | +being concerned with page boundaries and a few other cases. We refer to | ||
| 35 | +this part of preparing the PDF file as | ||
| 36 | +*optimization*, discussed in | ||
| 37 | +:ref:`ref.optimization`. Note that, in this context, the | ||
| 38 | +term *optimization* is a qpdf term, and the | ||
| 39 | +term *linearization* is a term from the PDF | ||
| 40 | +specification. Do not be confused by the fact that many applications | ||
| 41 | +refer to linearization as optimization or web optimization. | ||
| 42 | + | ||
| 43 | +When creating linearized PDF files from optimized PDF files, there are | ||
| 44 | +really only a few issues that need to be dealt with: | ||
| 45 | + | ||
| 46 | +- Creation of hints tables | ||
| 47 | + | ||
| 48 | +- Placing objects in the correct order | ||
| 49 | + | ||
| 50 | +- Filling in offsets and byte sizes | ||
| 51 | + | ||
| 52 | +.. _ref.optimization: | ||
| 53 | + | ||
| 54 | +Optimization | ||
| 55 | +------------ | ||
| 56 | + | ||
| 57 | +In order to perform various operations such as linearization and | ||
| 58 | +splitting files into pages, it is necessary to know which objects are | ||
| 59 | +referenced by which pages, page thumbnails, and root and trailer | ||
| 60 | +dictionary keys. It is also necessary to ensure that all page-level | ||
| 61 | +attributes appear directly at the page level and are not inherited from | ||
| 62 | +parents in the pages tree. | ||
| 63 | + | ||
| 64 | +We refer to the process of enforcing these constraints as | ||
| 65 | +*optimization*. As mentioned above, note | ||
| 66 | +that some applications refer to linearization as optimization. Although | ||
| 67 | +this optimization was initially motivated by the need to create | ||
| 68 | +linearized files, we are using these terms separately. | ||
| 69 | + | ||
| 70 | +PDF file optimization is implemented in the | ||
| 71 | +:file:`QPDF_optimization.cc` source file. That file | ||
| 72 | +is richly commented and serves as the primary reference for the | ||
| 73 | +optimization process. | ||
| 74 | + | ||
| 75 | +After optimization has been completed, the private member variables | ||
| 76 | +``obj_user_to_objects`` and ``object_to_obj_users`` in ``QPDF`` have | ||
| 77 | +been populated. Any object that has more than one value in the | ||
| 78 | +``object_to_obj_users`` table is shared. Any object that has exactly one | ||
| 79 | +value in the ``object_to_obj_users`` table is private. To find all the | ||
| 80 | +private objects in a page or a trailer or root dictionary key, one | ||
| 81 | +merely has to make this determination for each element in the | ||
| 82 | +``obj_user_to_objects`` table for the given page or key. | ||
| 83 | + | ||
| 84 | +Note that pages and thumbnails have different object user types, so the | ||
| 85 | +above test on a page will not include objects referenced by the page's | ||
| 86 | +thumbnail dictionary and nothing else. | ||
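The shared/private test described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code in ``QPDF_optimization.cc``: the table names mirror the member variables mentioned above, but the tuple keys such as ``("page", 0)`` and the sample object numbers are made up for the example.

```python
from collections import defaultdict


def invert(obj_user_to_objects):
    """Build object_to_obj_users from obj_user_to_objects."""
    object_to_obj_users = defaultdict(set)
    for user, objects in obj_user_to_objects.items():
        for obj in objects:
            object_to_obj_users[obj].add(user)
    return dict(object_to_obj_users)


def private_objects(user, obj_user_to_objects, object_to_obj_users):
    """Objects referenced by `user` and by no other object user."""
    return {
        obj
        for obj in obj_user_to_objects[user]
        if len(object_to_obj_users[obj]) == 1
    }


# Hypothetical sample data: object 7 is used by two pages, so it is shared.
obj_user_to_objects = {
    ("page", 0): {3, 4, 7},
    ("page", 1): {5, 7},
    ("thumb", 0): {8},
}
object_to_obj_users = invert(obj_user_to_objects)
```

Because the thumbnail is a separate object user in this sample, object 8 does not show up when asking for the private objects of page 0.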
| 87 | + | ||
| 88 | +.. _ref.linearization.writing: | ||
| 89 | + | ||
| 90 | +Writing Linearized Files | ||
| 91 | +------------------------ | ||
| 92 | + | ||
| 93 | +We will create files with only primary hint streams. We will never write | ||
| 94 | +overflow hint streams. (As of PDF version 1.4, Acrobat doesn't either, | ||
| 95 | +and they are never necessary.) The hint streams contain offset | ||
| 96 | +information to objects that point to where they would be if the hint | ||
| 97 | +stream were not present. This means that we have to calculate all object | ||
| 98 | +positions before we can generate and write the hint table. This means | ||
| 99 | +that we have to generate the file in two passes. To make this reliable, | ||
| 100 | +``QPDFWriter`` in linearization mode invokes exactly the same code twice | ||
| 101 | +to write the file to a pipeline. | ||
| 102 | + | ||
| 103 | +In the first pass, the target pipeline is a count pipeline chained to a | ||
| 104 | +discard pipeline. The count pipeline simply passes its data through to | ||
| 105 | +the next pipeline in the chain but can return the number of bytes passed | ||
| 106 | +through it at any intermediate point. The discard pipeline is an end of | ||
| 107 | +line pipeline that just throws its data away. The hint stream is not | ||
| 108 | +written and dummy values with adequate padding are stored in the first | ||
| 109 | +cross reference table, linearization parameter dictionary, and /Prev key | ||
| 110 | +of the first trailer dictionary. All the offset, length, object | ||
| 111 | +renumbering information, and anything else we need for the second pass | ||
| 112 | +is stored. | ||
| 113 | + | ||
| 114 | +At the end of the first pass, this information is passed to the ``QPDF`` | ||
| 115 | +class which constructs a compressed hint stream in a memory buffer and | ||
| 116 | +returns it. ``QPDFWriter`` uses this information to write a complete | ||
| 117 | +hint stream object into a memory buffer. At this point, the length of | ||
| 118 | +the hint stream is known. | ||
| 119 | + | ||
| 120 | +In the second pass, the end of the pipeline chain is a regular file | ||
| 121 | +instead of a discard pipeline, and we have known values for all the | ||
| 122 | +offsets and lengths that we didn't have in the first pass. We have to | ||
| 123 | +adjust offsets that appear after the start of the hint stream by the | ||
| 124 | +length of the hint stream, which is known. Anything that is of variable | ||
| 125 | +length is padded, with the padding code surrounding any writing code | ||
| 126 | +that differs in the two passes. This ensures that changes to the way | ||
| 127 | +things are represented never result in offsets that were gathered | ||
| 128 | +during the first pass becoming incorrect for the second pass. | ||
| 129 | + | ||
| 130 | +Using this strategy, we can write linearized files to a non-seekable | ||
| 131 | +output stream with only a single pass to disk or wherever the output is | ||
| 132 | +going. | ||
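The two-pass strategy can be illustrated with a toy sketch in Python. Everything here is an assumption made for the example: the ``Count`` and ``Discard`` classes only imitate the roles of the count and discard pipelines, and the "file format" (a header, a fixed-width offset table, an optional hint block, then the objects) is invented and is not PDF. What it does show is the same writing code being run twice, padded fields keeping sizes stable between passes, and offsets being shifted by the length of the hint data.

```python
import io


class Discard:
    """End-of-line pipeline that throws its data away."""

    def write(self, data):
        pass


class Count:
    """Passes data through and remembers how many bytes it has seen."""

    def __init__(self, next_pipeline):
        self.next = next_pipeline
        self.count = 0

    def write(self, data):
        self.count += len(data)
        self.next.write(data)


def write_toy(out, objects, offsets, hint):
    """Write the toy file; return the offset at which each object landed."""
    counter = Count(out)
    counter.write(b"%TOY\n")
    for off in offsets:
        # Fixed-width (padded) fields have the same size in both passes.
        counter.write(b"%010d\n" % off)
    counter.write(hint)
    seen = []
    for obj in objects:
        seen.append(counter.count)
        counter.write(obj)
    return seen


def write_two_pass(objects, out):
    # Pass 1: dummy offsets, no hint data, output discarded.
    first = write_toy(Discard(), objects, [0] * len(objects), b"")
    # The hint records offsets as they would be without the hint itself.
    hint = ("hint:" + ",".join(map(str, first)) + "\n").encode()
    # Pass 2: everything after the hint moves by the hint's length.
    adjusted = [off + len(hint) for off in first]
    second = write_toy(out, objects, adjusted, hint)
    assert second == adjusted
    return adjusted
```

Note that the real output is written sequentially exactly once, so ``out`` only needs a ``write`` method and never has to seek.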
| 133 | + | ||
| 134 | +.. _ref.linearization-data: | ||
| 135 | + | ||
| 136 | +Calculating Linearization Data | ||
| 137 | +------------------------------ | ||
| 138 | + | ||
| 139 | +Once a file is optimized, we have information about which objects access | ||
| 140 | +which other objects. We can then process these tables to decide which | ||
| 141 | +part (as described in "Linearized PDF Document Structure" in the PDF | ||
| 142 | +specification) each object is contained within. This tells us the exact | ||
| 143 | +order in which objects are written. The ``QPDFWriter`` class asks for | ||
| 144 | +this information and enqueues objects for writing in the proper order. | ||
| 145 | +It also turns on a check that causes an exception to be thrown if an | ||
| 146 | +object is encountered that has not already been queued. (This could | ||
| 147 | +happen only if there were a bug in the traversal code used to calculate | ||
| 148 | +the linearization data.) | ||
| 149 | + | ||
| 150 | +.. _ref.linearization-issues: | ||
| 151 | + | ||
| 152 | +Known Issues with Linearization | ||
| 153 | +------------------------------- | ||
| 154 | + | ||
| 155 | +There are a handful of known issues with this linearization code. These | ||
| 156 | +issues do not appear to impact the behavior of linearized files which | ||
| 157 | +still work as intended: it is possible for a web browser to begin to | ||
| 158 | +display them before they are fully downloaded. In fact, it seems that | ||
| 159 | +various other programs that create linearized files have many of these | ||
| 160 | +same issues. These items make reference to terminology used in the | ||
| 161 | +linearization appendix of the PDF specification. | ||
| 162 | + | ||
| 163 | +- Thread Dictionary information keys appear in part 4 with the rest of | ||
| 164 | + Threads instead of in part 9. Objects in part 9 are not grouped | ||
| 165 | + together functionally. | ||
| 166 | + | ||
| 167 | +- We are not calculating numerators for shared object positions within | ||
| 168 | + content streams or interleaving them within content streams. | ||
| 169 | + | ||
| 170 | +- We generate only page offset, shared object, and outline hint tables. | ||
| 171 | + It would be relatively easy to add some additional tables. We gather | ||
| 172 | + most of the information needed to create thumbnail hint tables. There | ||
| 173 | + are comments in the code about this. | ||
| 174 | + | ||
| 175 | +.. _ref.linearization-debugging: | ||
| 176 | + | ||
| 177 | +Debugging Note | ||
| 178 | +-------------- | ||
| 179 | + | ||
| 180 | +The :command:`qpdf --show-linearization` command can show | ||
| 181 | +the complete contents of linearization hint streams. To look at the raw | ||
| 182 | +data, you can extract the filtered contents of the linearization hint | ||
| 183 | +tables using :command:`qpdf --show-object=n | ||
| 184 | +--filtered-stream-data`. Then, to convert this into a bit | ||
| 185 | +stream (since linearization tables are bit streams written without | ||
| 186 | +regard to byte boundaries), you can pipe the resulting data through the | ||
| 187 | +following perl code: | ||
| 188 | + | ||
| 189 | +.. code-block:: perl | ||
| 190 | + | ||
| 191 | + use bytes; | ||
| 192 | + binmode STDIN; | ||
| 193 | + undef $/; | ||
| 194 | + my $a = <STDIN>; | ||
| 195 | + my @ch = split(//, $a); | ||
| 196 | + map { printf("%08b", ord($_)) } @ch; | ||
| 197 | + print "\n"; |
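Once you have the raw bytes, it can also be handy to pull out individual fields of a given bit width. The following Python sketch is one way to do that. The ``BitReader`` class is a hypothetical helper, not part of qpdf, and it assumes the bits are read most significant bit first, which matches the order in which the perl code above prints them.

```python
class BitReader:
    """Read unsigned integers of arbitrary bit width from a byte string.

    Hypothetical helper for illustration; assumes most-significant-bit-first
    order and ignores byte boundaries.
    """

    def __init__(self, data: bytes):
        # Same representation the perl snippet prints: 8 bits per byte.
        self.bits = "".join(f"{b:08b}" for b in data)
        self.pos = 0

    def read(self, nbits: int) -> int:
        if nbits == 0:
            return 0
        if self.pos + nbits > len(self.bits):
            raise EOFError("not enough bits left")
        chunk = self.bits[self.pos:self.pos + nbits]
        self.pos += nbits
        return int(chunk, 2)
```

The field widths to pass to ``read`` come from the hint table headers as described in the PDF specification; they are not hard-coded here.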
manual/object-streams.rst
0 → 100644
| 1 | +.. _ref.object-and-xref-streams: | ||
| 2 | + | ||
| 3 | +Object and Cross-Reference Streams | ||
| 4 | +================================== | ||
| 5 | + | ||
| 6 | +This chapter provides information about the implementation of object | ||
| 7 | +stream and cross-reference stream support in qpdf. | ||
| 8 | + | ||
| 9 | +.. _ref.object-streams: | ||
| 10 | + | ||
| 11 | +Object Streams | ||
| 12 | +-------------- | ||
| 13 | + | ||
| 14 | +Object streams can contain any regular object except the following: | ||
| 15 | + | ||
| 16 | +- stream objects | ||
| 17 | + | ||
| 18 | +- objects with generation > 0 | ||
| 19 | + | ||
| 20 | +- the encryption dictionary | ||
| 21 | + | ||
| 22 | +- objects containing the /Length of another stream | ||
| 23 | + | ||
| 24 | +In addition, Adobe Reader (at least as of version 8.0.0) appears to not | ||
| 25 | +be able to handle having the document catalog appear in an object stream | ||
| 26 | +if the file is encrypted, though this is not specifically disallowed by | ||
| 27 | +the specification. | ||
| 28 | + | ||
| 29 | +There are additional restrictions for linearized files. See | ||
| 30 | +:ref:`ref.object-streams-linearization` for details. | ||
| 31 | + | ||
| 32 | +The PDF specification refers to objects in object streams as "compressed | ||
| 33 | +objects" regardless of whether the object stream is compressed. | ||
| 34 | + | ||
| 35 | +The generation number of every object in an object stream must be zero. | ||
| 36 | +It is possible to delete and replace an object in an object stream with | ||
| 37 | +a regular object. | ||
| 38 | + | ||
| 39 | +The object stream dictionary has the following keys: | ||
| 40 | + | ||
| 41 | +- ``/N``: number of objects | ||
| 42 | + | ||
| 43 | +- ``/First``: byte offset of first object | ||
| 44 | + | ||
| 45 | +- ``/Extends``: indirect reference to stream that this extends | ||
| 46 | + | ||
| 47 | +Stream collections are formed with ``/Extends``. They must form a | ||
| 48 | +directed acyclic graph. These can be used for semantic information and | ||
| 49 | +are not meaningful to the PDF document's syntactic structure. Although | ||
| 50 | +qpdf preserves stream collections, it never generates them and doesn't | ||
| 51 | +make use of this information in any way. | ||
| 52 | + | ||
| 53 | +The specification recommends limiting the number of objects in an object | ||
| 54 | +stream for efficiency in reading and decoding. Acrobat 6 uses no more | ||
| 55 | +than 100 objects per object stream for linearized files and no more than 200 | ||
| 56 | +objects per stream for non-linearized files. ``QPDFWriter``, in object | ||
| 57 | +stream generation mode, never puts more than 100 objects in an object | ||
| 58 | +stream. | ||
| 59 | + | ||
| 60 | +Object stream contents consist of *N* pairs of integers, each of which | ||
| 61 | +is the object number and the byte offset of the object relative to the | ||
| 62 | +first object in the stream, followed by the objects themselves, | ||
| 63 | +concatenated. | ||
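To make the layout concrete, here is a minimal Python sketch that splits already-decoded object stream data into its objects using ``/N`` and ``/First``. The function name ``split_object_stream`` is hypothetical, and the sketch assumes that offsets appear in increasing order and that the stream data has already been unfiltered. It is not how the ``QPDF`` class reads object streams.

```python
def split_object_stream(data: bytes, n: int, first: int):
    """Return {object number: raw object bytes} for an object stream.

    `n` is the value of /N and `first` is the value of /First.
    Assumes `data` is unfiltered and offsets are increasing.
    """
    # The first `first` bytes hold N pairs: object number, relative offset.
    tokens = data[:first].split()
    nums = [int(tok) for tok in tokens[:2 * n]]
    pairs = list(zip(nums[0::2], nums[1::2]))

    objects = {}
    for i, (objnum, offset) in enumerate(pairs):
        start = first + offset
        # An object ends where the next one starts, or at end of data.
        end = first + pairs[i + 1][1] if i + 1 < len(pairs) else len(data)
        objects[objnum] = data[start:end].strip()
    return objects
```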
| 64 | + | ||
| 65 | +.. _ref.xref-streams: | ||
| 66 | + | ||
| 67 | +Cross-Reference Streams | ||
| 68 | +----------------------- | ||
| 69 | + | ||
| 70 | +For non-hybrid files, the value following ``startxref`` is the byte | ||
| 71 | +offset to the xref stream rather than the word ``xref``. | ||
| 72 | + | ||
| 73 | +For hybrid files (files containing both xref tables and cross-reference | ||
| 74 | +streams), the xref table's trailer dictionary contains the key | ||
| 75 | +``/XRefStm`` whose value is the byte offset to a cross-reference stream | ||
| 76 | +that supplements the xref table. A PDF 1.5-compliant application should | ||
| 77 | +read the xref table first. Then it should replace any object that it has | ||
| 78 | +already seen with any defined in the xref stream. Then it should follow | ||
| 79 | +any ``/Prev`` pointer in the original xref table's trailer dictionary. | ||
| 80 | +The specification is not clear about what should be done, if anything, | ||
| 81 | +with a ``/Prev`` pointer in the xref stream referenced by an xref table. | ||
| 82 | +The ``QPDF`` class ignores it, which is probably reasonable since, if | ||
| 83 | +this case were to appear for any sensible PDF file, the previous xref | ||
| 84 | +table would probably have a corresponding ``/XRefStm`` pointer of its | ||
| 85 | +own. For example, if a hybrid file were appended, the appended section | ||
| 86 | +would have its own xref table and ``/XRefStm``. The appended xref table | ||
| 87 | +would point to the previous xref table which would point to the | ||
| 88 | +``/XRefStm``, meaning that the new ``/XRefStm`` doesn't have to point to | ||
| 89 | +it. | ||
| 90 | + | ||
| 91 | +Since xref streams must be read very early, they may not be encrypted, | ||
| 92 | +and they may not contain indirect objects for keys required to read them, | ||
| 93 | +which are these: | ||
| 94 | + | ||
| 95 | +- ``/Type``: value ``/XRef`` | ||
| 96 | + | ||
| 97 | +- ``/Size``: value *n+1*: where *n* is highest object number (same as | ||
| 98 | + ``/Size`` in the trailer dictionary) | ||
| 99 | + | ||
| 100 | +- ``/Index`` (optional): value | ||
| 101 | + ``[:samp:`{n count}` ...]`` used to determine | ||
| 102 | + which objects' information is stored in this stream. The default is | ||
| 103 | + ``[0 /Size]``. | ||
| 104 | + | ||
| 105 | +- ``/Prev``: value :samp:`{offset}`: byte | ||
| 106 | + offset of previous xref stream (same as ``/Prev`` in the trailer | ||
| 107 | + dictionary) | ||
| 108 | + | ||
| 109 | +- ``/W [...]``: sizes of each field in the xref table | ||
| 110 | + | ||
| 111 | +The other fields in the xref stream, which may be indirect if desired, | ||
| 112 | +are the union of those from the xref table's trailer dictionary. | ||
| 113 | + | ||
| 114 | +.. _ref.xref-stream-data: | ||
| 115 | + | ||
| 116 | +Cross-Reference Stream Data | ||
| 117 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~ | ||
| 118 | + | ||
| 119 | +The stream data is binary and encoded in big-endian byte order. Entries | ||
| 120 | +are concatenated, and each entry has a length equal to the total of the | ||
| 121 | +entries in ``/W`` above. Each entry consists of one or more fields, the | ||
| 122 | +first of which is the type of the field. The number of bytes for each | ||
| 123 | +field is given by ``/W`` above. A 0 in ``/W`` indicates that the field | ||
| 124 | +is omitted and has the default value. The default value for the field | ||
| 125 | +type is "``1``". All other default values are "``0``". | ||
| 126 | + | ||
| 127 | +PDF 1.5 has three field types: | ||
| 128 | + | ||
| 129 | +- 0: for free objects. Format: ``0 obj next-generation``, same as the | ||
| 130 | + free table in a traditional cross-reference table | ||
| 131 | + | ||
| 132 | +- 1: regular non-compressed object. Format: ``1 offset generation`` | ||
| 133 | + | ||
| 134 | +- 2: for objects in object streams. Format: ``2 object-stream-number | ||
| 135 | + index``, the number of object stream containing the object and the | ||
| 136 | + index within the object stream of the object. | ||
| 137 | + | ||
| 138 | +It seems standard to have the first entry in the table be ``0 0 0`` | ||
| 139 | +instead of ``0 0 ffff`` if there are no deleted objects. | ||
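The encoding described in this section can be sketched as a short Python decoder. This is an illustration under stated assumptions rather than qpdf's implementation: the function name ``decode_xref_stream`` and its parameters are hypothetical, the input is assumed to be already-unfiltered stream data, and no attempt is made to handle malformed input beyond a length check.

```python
def decode_xref_stream(data: bytes, w, index=None, size=None):
    """Decode xref stream data into {object number: (type, field2, field3)}.

    `w` is the /W array, `index` the /Index array (defaults to [0, size]),
    and `size` the value of /Size. Hypothetical helper for illustration.
    """
    defaults = (1, 0, 0)  # default type is 1; other fields default to 0
    if index is None:
        index = [0, size]
    entry_len = sum(w)
    total = sum(index[1::2])
    if len(data) < entry_len * total:
        raise ValueError("xref stream data is too short")

    entries = {}
    pos = 0
    # /Index holds pairs: first object number, count of entries.
    for start, count in zip(index[0::2], index[1::2]):
        for objnum in range(start, start + count):
            fields = []
            for width, default in zip(w, defaults):
                if width == 0:
                    fields.append(default)
                else:
                    # Fields are unsigned big-endian integers.
                    fields.append(int.from_bytes(data[pos:pos + width], "big"))
                    pos += width
            entries[objnum] = tuple(fields)
    return entries
```

For a type 1 entry the second and third fields are the offset and generation, and for a type 2 entry they are the object stream number and the index within that stream.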
| 140 | + | ||
| 141 | +.. _ref.object-streams-linearization: | ||
| 142 | + | ||
| 143 | +Implications for Linearized Files | ||
| 144 | +--------------------------------- | ||
| 145 | + | ||
| 146 | +For linearized files, the linearization dictionary, document catalog, | ||
| 147 | +and page objects may not be contained in object streams. | ||
| 148 | + | ||
| 149 | +Objects stored within object streams are given the highest range of | ||
| 150 | +object numbers within the main and first-page cross-reference sections. | ||
| 151 | + | ||
| 152 | +It is okay to use cross-reference streams in place of regular xref | ||
| 153 | +tables. There are no special considerations. | ||
| 154 | + | ||
| 155 | +Hint data refers to object streams themselves, not the objects in the | ||
| 156 | +streams. Shared object references should also be made to the object | ||
| 157 | +streams. There are no references in any hint tables to the object numbers | ||
| 158 | +of compressed objects (objects within object streams). | ||
| 159 | + | ||
| 160 | +When numbering objects, all shared objects within both the first and | ||
| 161 | +second halves of the linearized files must be numbered consecutively | ||
| 162 | +after all normal uncompressed objects in that half. | ||
| 163 | + | ||
| 164 | +.. _ref.object-stream-implementation: | ||
| 165 | + | ||
| 166 | +Implementation Notes | ||
| 167 | +-------------------- | ||
| 168 | + | ||
| 169 | +There are three modes for writing object streams: | ||
| 170 | +:samp:`disable`, :samp:`preserve`, and | ||
| 171 | +:samp:`generate`. In disable mode, we do not generate | ||
| 172 | +any object streams, and we also generate an xref table rather than xref | ||
| 173 | +streams. This can be used to generate PDF files that are viewable with | ||
| 174 | +older readers. In preserve mode, we write object streams such that | ||
| 175 | +written object streams contain the same objects and ``/Extends`` | ||
| 176 | +relationships as in the original file. This is equal to disable if the | ||
| 177 | +file has no object streams. In generate, we create object streams | ||
| 178 | +ourselves by grouping objects that are allowed in object streams | ||
| 179 | +together in sets of no more than 100 objects. We also ensure that the | ||
| 180 | +PDF version is at least 1.5 in generate mode, but we preserve the | ||
| 181 | +version header in the other modes. The default is | ||
| 182 | +:samp:`preserve`. | ||
| 183 | + | ||
| 184 | +We do not support creation of hybrid files. When we write files, even in | ||
| 185 | +preserve mode, we will lose any xref tables and merge any appended | ||
| 186 | +sections. |
manual/overview.rst
0 → 100644
| 1 | +.. _ref.overview: | ||
| 2 | + | ||
| 3 | +What is QPDF? | ||
| 4 | +============= | ||
| 5 | + | ||
| 6 | +QPDF is a program and C++ library for structural, content-preserving | ||
| 7 | +transformations on PDF files. QPDF's website is located at | ||
| 8 | +https://qpdf.sourceforge.io/. QPDF's source code is hosted on GitHub | ||
| 9 | +at https://github.com/qpdf/qpdf. | ||
| 10 | + | ||
| 11 | +QPDF provides many useful capabilities to developers of PDF-producing | ||
| 12 | +software or for people who just want to look at the innards of a PDF | ||
| 13 | +file to learn more about how they work. With QPDF, it is possible to | ||
| 14 | +copy objects from one PDF file into another and to manipulate the list | ||
| 15 | +of pages in a PDF file. This makes it possible to merge and split PDF | ||
| 16 | +files. The QPDF library also makes it possible for you to create PDF | ||
| 17 | +files from scratch. In this mode, you are responsible for supplying | ||
| 18 | +all the contents of the file, while the QPDF library takes care of all | ||
| 19 | +the syntactical representation of the objects, creation of | ||
| 20 | +cross-reference tables and, if you use them, object streams, encryption, | ||
| 21 | +linearization, and other syntactic details. You are still responsible | ||
| 22 | +for generating PDF content on your own. | ||
| 23 | + | ||
| 24 | +QPDF has been designed with very few external dependencies, and it is | ||
| 25 | +intentionally very lightweight. QPDF is *not* a PDF content creation | ||
| 26 | +library, a PDF viewer, or a program capable of converting PDF into other | ||
| 27 | +formats. In particular, QPDF knows nothing about the semantics of PDF | ||
| 28 | +content streams. If you are looking for something that can do that, you | ||
| 29 | +should look elsewhere. However, once you have a valid PDF file, QPDF can | ||
| 30 | +be used to transform that file in ways that perhaps your original PDF | ||
| 31 | +creation tool can't handle. For example, many programs generate simple PDF | ||
| 32 | +files but can't password-protect them, web-optimize them, or perform | ||
| 33 | +other transformations of that type. |
manual/qdf.rst
0 → 100644
| 1 | +.. _ref.qdf: | ||
| 2 | + | ||
| 3 | +QDF Mode | ||
| 4 | +======== | ||
| 5 | + | ||
| 6 | +In QDF mode, qpdf creates PDF files in what we call *QDF | ||
| 7 | +form*. A PDF file in QDF form, sometimes called a QDF | ||
| 8 | +file, is a completely valid PDF file that has ``%QDF-1.0`` as its third | ||
| 9 | +line (after the PDF header and binary characters) and has certain other | ||
| 10 | +characteristics. The purpose of QDF form is to make it possible to edit | ||
| 11 | +PDF files, with some restrictions, in an ordinary text editor. This can | ||
| 12 | +be very useful for experimenting with different PDF constructs or for | ||
| 13 | +making one-off edits to PDF files (though there are other reasons why | ||
| 14 | +this may not always work). Note that QDF mode does not support | ||
| 15 | +linearized files. If you enable linearization, QDF mode is automatically | ||
| 16 | +disabled. | ||
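As a small illustration of the ``%QDF-1.0`` marker, the following Python sketch checks whether a file's third line identifies it as being in QDF form. The function name ``is_qdf`` is hypothetical and not part of qpdf, and the check looks only at the marker line; it does not verify any of the other characteristics of a QDF file.

```python
def is_qdf(data: bytes) -> bool:
    """Return True if the third line of `data` is the QDF marker.

    Hypothetical helper for illustration; checks the marker only.
    """
    # Line 1 is the PDF header, line 2 the binary-characters comment.
    lines = data.split(b"\n", 3)
    return (
        len(lines) >= 3
        and lines[0].startswith(b"%PDF-")
        and lines[2].rstrip(b"\r") == b"%QDF-1.0"
    )
```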
| 17 | + | ||
| 18 | +It is ordinarily very difficult to edit PDF files in a text editor for | ||
| 19 | +two reasons: most meaningful data in PDF files is compressed, and PDF | ||
| 20 | +files are full of offset and length information that makes it hard to | ||
| 21 | +add or remove data. A QDF file is organized in a manner such that, if | ||
| 22 | +edits are kept within certain constraints, the | ||
| 23 | +:command:`fix-qdf` program, distributed with qpdf, is | ||
| 24 | +able to restore edited files to a correct state. The | ||
| 25 | +:command:`fix-qdf` program takes no command-line | ||
| 26 | +arguments. It reads a possibly edited QDF file from standard input and | ||
| 27 | +writes a repaired file to standard output. | ||
| 28 | + | ||
| 29 | +The following attributes characterize a QDF file: | ||
| 30 | + | ||
| 31 | +- All objects appear in numerical order in the PDF file, including when | ||
| 32 | + objects appear in object streams. | ||
| 33 | + | ||
| 34 | +- Objects are printed in an easy-to-read format, and all line endings | ||
| 35 | + are normalized to UNIX line endings. | ||
| 36 | + | ||
| 37 | +- Unless specifically overridden, streams appear uncompressed (when | ||
| 38 | + qpdf supports the filters and they are compressed with a non-lossy | ||
| 39 | + compression scheme), and most content streams are normalized (line | ||
| 40 | + endings are converted to just UNIX-style linefeeds). | ||
| 41 | + | ||
| 42 | +- All stream lengths are represented as indirect objects, and the | ||
| 43 | + stream length object is always the next object after the stream. If | ||
| 44 | + the stream data does not end with a newline, an extra newline is | ||
| 45 | + inserted, and a special comment appears after the stream indicating | ||
| 46 | + that this has been done. | ||
| 47 | + | ||
| 48 | +- If the PDF file contains object streams, if object stream *n* | ||
| 49 | + contains *k* objects, those objects are numbered from *n+1* through | ||
| 50 | + *n+k*, and the object number/offset pairs appear on a separate line | ||
| 51 | + for each object. Additionally, each object in the object stream is | ||
| 52 | + preceded by a comment indicating its object number and index. This | ||
| 53 | + makes it very easy to find objects in object streams. | ||
| 54 | + | ||
| 55 | +- All beginnings of objects, ``stream`` tokens, ``endstream`` tokens, | ||
| 56 | + and ``endobj`` tokens appear on lines by themselves. A blank line | ||
| 57 | + follows every ``endobj`` token. | ||
| 58 | + | ||
| 59 | +- If there is a cross-reference stream, it is unfiltered. | ||
| 60 | + | ||
| 61 | +- Page dictionaries and page content streams are marked with special | ||
| 62 | + comments that make them easy to find. | ||
| 63 | + | ||
| 64 | +- Comments precede each object indicating the object number of the | ||
| 65 | + corresponding object in the original file. | ||
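The object stream numbering rule in the list above can be restated as a short Python sketch. The function names ``qdf_object_numbers`` and ``index_in_stream`` are hypothetical and are not part of qpdf; the sketch only illustrates the rule that object stream *n* containing *k* objects holds objects *n+1* through *n+k*.

```python
def qdf_object_numbers(stream_number: int, count: int) -> list[int]:
    """Return the object numbers stored in a QDF object stream.

    Per the numbering rule above, object stream n containing k objects
    holds objects n+1 through n+k.
    """
    if count < 0:
        raise ValueError("count must be non-negative")
    return list(range(stream_number + 1, stream_number + count + 1))


def index_in_stream(stream_number: int, object_number: int) -> int:
    """Return the zero-based index of an object inside its object stream."""
    index = object_number - stream_number - 1
    if index < 0:
        raise ValueError("object does not belong to this object stream")
    return index


numbers = qdf_object_numbers(5, 3)
```

Because the numbering is contiguous, the index of an object within its stream follows directly from the two object numbers, which is why objects are easy to locate in a QDF file.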
| 66 | + | ||
| 67 | +When editing a QDF file, any edits can be made as long as the above | ||
| 68 | +constraints are maintained. This means that you can freely edit a page's | ||
| 69 | +content without worrying about messing up the QDF file. It is also | ||
| 70 | +possible to add new objects so long as those objects are added after the | ||
| 71 | +last object in the file or subsequent objects are renumbered. If a QDF | ||
| 72 | +file has object streams in it, you can always add the new objects before | ||
| 73 | +the xref stream and then change the number of the xref stream, since | ||
| 74 | +nothing generally ever references it by number. | ||
| 75 | + | ||
| 76 | +It is not generally practical to remove objects from QDF files without | ||
| 77 | +messing up object numbering, but if you remove all references to an | ||
| 78 | +object, you can run qpdf on the file (after running | ||
| 79 | +:command:`fix-qdf`), and qpdf will omit the now-orphaned | ||
| 80 | +object. | ||
| 81 | + | ||
| 82 | +When :command:`fix-qdf` is run, it goes through the file | ||
| 83 | +and recomputes the following parts of the file: | ||
| 84 | + | ||
| 85 | +- the ``/N``, ``/W``, and ``/First`` keys of all object stream | ||
| 86 | + dictionaries | ||
| 87 | + | ||
| 88 | +- the pairs of numbers representing object numbers and offsets of | ||
| 89 | + objects in object streams | ||
| 90 | + | ||
| 91 | +- all stream lengths | ||
| 92 | + | ||
| 93 | +- the cross-reference table or cross-reference stream | ||
| 94 | + | ||
| 95 | +- the offset to the cross-reference table or cross-reference stream | ||
| 96 | + following the ``startxref`` token |
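As an illustration of the last item, the following Python sketch recomputes the offset that follows ``startxref`` for a file that uses a classic cross-reference table. It is not the code used by :command:`fix-qdf`; the function names are hypothetical, and the sketch assumes that the ``xref`` and ``startxref`` keywords each appear on a line by themselves with UNIX line endings. Files that use a cross-reference stream are not handled.

```python
def xref_table_offset(data: bytes) -> int:
    """Return the byte offset of the last ``xref`` keyword in the file."""
    idx = data.rfind(b"\nxref\n")
    if idx < 0:
        raise ValueError("no cross-reference table found")
    # Skip the newline that precedes the keyword.
    return idx + 1


def rewrite_startxref(data: bytes) -> bytes:
    """Replace the number after ``startxref`` with the real xref offset."""
    offset = xref_table_offset(data)
    marker = b"startxref\n"
    pos = data.rfind(marker)
    if pos < 0:
        raise ValueError("no startxref token found")
    value_start = pos + len(marker)
    value_end = data.index(b"\n", value_start)
    return data[:value_start] + str(offset).encode("ascii") + data[value_end:]


sample = (
    b"%PDF-1.3\n"
    b"1 0 obj\n<< >>\nendobj\n\n"
    b"xref\n0 2\n"
    b"trailer << /Size 2 >>\n"
    b"startxref\n0\n%%EOF\n"
)
fixed = rewrite_startxref(sample)
```

Since the ``startxref`` token comes after the cross-reference table, rewriting the number does not move the table, so a single pass is enough in this simplified case.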
manual/release-notes.rst
0 → 100644
| 1 | +.. _ref.release-notes: | ||
| 2 | + | ||
| 3 | +Release Notes | ||
| 4 | +============= | ||
| 5 | + | ||
| 6 | +For a detailed list of changes, please see the file | ||
| 7 | +:file:`ChangeLog` in the source distribution. | ||
| 8 | + | ||
| 9 | +10.5.0: XXX Month dd, YYYY | ||
| 10 | + - Library Enhancements | ||
| 11 | + | ||
| 12 | + - Since qpdf version 8, using object accessor methods on an | ||
| 13 | + instance of ``QPDFObjectHandle`` may create warnings if the | ||
| 14 | + object is not of the expected type. These warnings now have an | ||
| 15 | + error code of ``qpdf_e_object`` instead of | ||
| 16 | + ``qpdf_e_damaged_pdf``. Also, comments have been added to | ||
| 17 | + :file:`QPDFObjectHandle.hh` to explain in more detail what the | ||
| 18 | + behavior is. See :ref:`ref.object-accessors` for a more in-depth | ||
| 19 | + discussion. | ||
| 20 | + | ||
| 21 | + - Add ``Pl_Buffer::getMallocBuffer()`` to initialize a buffer | ||
| 22 | + allocated with ``malloc()`` for better cross-language | ||
| 23 | + interoperability. | ||
| 24 | + | ||
| 25 | + - C API Enhancements | ||
| 26 | + | ||
| 27 | +    - Overhaul error handling for the object handle functions in the C API. | ||
| 28 | + Some rare error conditions that would previously have caused a | ||
| 29 | + crash are now trapped and reported, and the functions that | ||
| 30 | + generate them return fallback values. See comments in the | ||
| 31 | + ``ERROR HANDLING`` section of :file:`include/qpdf/qpdf-c.h` for | ||
| 32 | + details. In particular, exceptions thrown by the underlying C++ | ||
| 33 | + code when calling object accessors are caught and converted into | ||
| 34 | +      errors. The errors can be checked by calling ``qpdf_has_error``. | ||
| 35 | + Use ``qpdf_silence_errors`` to prevent the error from being | ||
| 36 | + written to stderr. | ||
| 37 | + | ||
| 38 | + - Add ``qpdf_get_last_string_length`` to the C API to get the | ||
| 39 | + length of the last string that was returned. This is needed to | ||
| 40 | + handle strings that contain embedded null characters. | ||
| 41 | + | ||
| 42 | + - Add ``qpdf_oh_is_initialized`` and | ||
| 43 | + ``qpdf_oh_new_uninitialized`` to the C API to make it possible | ||
| 44 | + to work with uninitialized objects. | ||
| 45 | + | ||
| 46 | + - Add ``qpdf_oh_new_object`` to the C API. This allows you to | ||
| 47 | + clone an object handle. | ||
| 48 | + | ||
| 49 | + - Add ``qpdf_get_object_by_id``, ``qpdf_make_indirect_object``, | ||
| 50 | + and ``qpdf_replace_object``, exposing the corresponding methods | ||
| 51 | + in ``QPDF`` and ``QPDFObjectHandle``. | ||
| 52 | + | ||
| 53 | + - Add several functions for working with pages. See ``PAGE | ||
| 54 | + FUNCTIONS`` in ``include/qpdf/qpdf-c.h`` for details. | ||
| 55 | + | ||
| 56 | + - Add several functions for working with streams. See ``STREAM | ||
| 57 | + FUNCTIONS`` in ``include/qpdf/qpdf-c.h`` for details. | ||
| 58 | + | ||
| 59 | + - Add ``qpdf_oh_get_type_code`` and ``qpdf_oh_get_type_name``. | ||
| 60 | + | ||
| 61 | + - Documentation change | ||
| 62 | + | ||
| 63 | + - The documentation sources have been switched from docbook to | ||
| 64 | + reStructuredText processed with `Sphinx | ||
| 65 | + <https://sphinx-doc.org>`__. This is mostly transparent (other | ||
| 66 | + than format change) with the exception that all section links | ||
| 67 | + have changed. What used to be `#ref.something` is now | ||
| 68 | + `#something`. A top-to-bottom review of the documentation is | ||
| 69 | + planned for an upcoming release. | ||
| 70 | + | ||
| 71 | +10.4.0: November 16, 2021 | ||
| 72 | + - Handling of Weak Cryptography Algorithms | ||
| 73 | + | ||
| 74 | + - From the qpdf CLI, the | ||
| 75 | +      :samp:`--allow-weak-crypto` option is now required to | ||
| 76 | + suppress a warning when explicitly creating PDF files using RC4 | ||
| 77 | + encryption. While qpdf will always retain the ability to read | ||
| 78 | + and write such files, doing so will require explicit | ||
| 79 | + acknowledgment moving forward. For qpdf 10.4, this change only | ||
| 80 | + affects the command-line tool. Starting in qpdf 11, there will | ||
| 81 | + be small API changes to require explicit acknowledgment in | ||
| 82 | + those cases as well. For additional information, see :ref:`ref.weak-crypto`. | ||
| 83 | + | ||
| 84 | + - Bug Fixes | ||
| 85 | + | ||
| 86 | + - Fix potential bounds error when handling shell completion that | ||
| 87 | + could occur when given bogus input. | ||
| 88 | + | ||
| 89 | + - Properly handle overlay/underlay on completely empty pages | ||
| 90 | + (with no resource dictionary). | ||
| 91 | + | ||
| 92 | + - Fix crash that could occur under certain conditions when using | ||
| 93 | + :samp:`--pages` with files that had form | ||
| 94 | + fields. | ||
| 95 | + | ||
| 96 | + - Library Enhancements | ||
| 97 | + | ||
| 98 | + - Make ``QPDF::findPage`` functions public. | ||
| 99 | + | ||
| 100 | + - Add methods to ``Pl_Flate`` to be able to receive warnings on | ||
| 101 | + certain recoverable conditions. | ||
| 102 | + | ||
| 103 | + - Add an extra check to the library to detect when foreign | ||
| 104 | + objects are inserted directly (instead of using | ||
| 105 | + ``QPDF::copyForeignObject``) at the time of insertion rather | ||
| 106 | + than when the file is written. Catching the error sooner makes | ||
| 107 | + it much easier to locate the incorrect code. | ||
| 108 | + | ||
| 109 | + - CLI Enhancements | ||
| 110 | + | ||
| 111 | + - Improve diagnostics around parsing | ||
| 112 | + :samp:`--pages` command-line options | ||
| 113 | + | ||
| 114 | + - Packaging Changes | ||
| 115 | + | ||
| 116 | + - The Windows binary distribution is now built with crypto | ||
| 117 | + provided by OpenSSL 3.0. | ||
| 118 | + | ||
| 119 | +10.3.2: May 8, 2021 | ||
| 120 | + - Bug Fixes | ||
| 121 | + | ||
| 122 | + - When generating a file while preserving object streams, | ||
| 123 | + unreferenced objects are correctly removed unless | ||
| 124 | + :samp:`--preserve-unreferenced` is specified. | ||
| 125 | + | ||
| 126 | + - Library Enhancements | ||
| 127 | + | ||
| 128 | + - When adding a page that already exists, make a shallow copy | ||
| 129 | + instead of throwing an exception. This makes the library | ||
| 130 | + behavior consistent with the CLI behavior. See | ||
| 131 | + :file:`ChangeLog` for additional notes. | ||
| 132 | + | ||
| 133 | +10.3.1: March 11, 2021 | ||
| 134 | + - Bug Fixes | ||
| 135 | + | ||
| 136 | + - Form field copying failed on files where /DR was a direct | ||
| 137 | + object in the document-level form dictionary. | ||
| 138 | + | ||
| 139 | +10.3.0: March 4, 2021 | ||
| 140 | + - Bug Fixes | ||
| 141 | + | ||
| 142 | + - The code for handling form fields when copying pages from | ||
| 143 | + 10.2.0 was not quite right and didn't work in a number of | ||
| 144 | + situations, such as when the same page was copied multiple | ||
| 145 | + times or when there were conflicting resource or field names | ||
| 146 | + across multiple copies. The 10.3.0 code has been much more | ||
| 147 | + thoroughly tested with more complex cases and with a multitude | ||
| 148 | + of readers and should be much closer to correct. The 10.2.0 | ||
| 149 | + code worked well enough for page splitting or for copying pages | ||
| 150 | + with form fields into documents that didn't already have them | ||
| 151 | + but was still not quite correct in handling of field-level | ||
| 152 | + resources. | ||
| 153 | + | ||
| 154 | + - When ``QPDF::replaceObject`` or ``QPDF::swapObjects`` is | ||
| 155 | + called, existing ``QPDFObjectHandle`` instances no longer point | ||
| 156 | + to the old objects. The next time they are accessed, they | ||
| 157 | + automatically notice the change to the underlying object and | ||
| 158 | + update themselves. This resolves a very longstanding source of | ||
| 159 | + confusion, albeit in a very rarely used method call. | ||
| 160 | + | ||
| 161 | + - Fix form field handling code to look for default appearances, | ||
| 162 | + quadding, and default resources in the right places. The code | ||
| 163 | + was not looking for things in the document-level interactive | ||
| 164 | + form dictionary that it was supposed to be finding there. This | ||
| 165 | + required adding a few new methods to | ||
| 166 | + ``QPDFFormFieldObjectHelper``. | ||
| 167 | + | ||
| 168 | + - Library Enhancements | ||
| 169 | + | ||
| 170 | + - Reworked the code that handles copying annotations and form | ||
| 171 | + fields during page operations. There were additional methods | ||
| 172 | +      added to the public API from 10.2.0 and one deprecation of a | ||
| 173 | + method added in 10.2.0. The majority of the API changes are in | ||
| 174 | + methods most people would never call and that will hopefully be | ||
| 175 | + superseded by higher-level interfaces for handling page copies. | ||
| 176 | + Please see the :file:`ChangeLog` file for | ||
| 177 | + details. | ||
| 178 | + | ||
| 179 | + - The method ``QPDF::numWarnings`` was added so that you can tell | ||
| 180 | + whether any warnings happened during a specific block of code. | ||
| 181 | + | ||
| 182 | +10.2.0: February 23, 2021 | ||
| 183 | + - CLI Behavior Changes | ||
| 184 | + | ||
| 185 | + - Operations that work on combining pages are much better about | ||
| 186 | + protecting form fields. In particular, | ||
| 187 | + :samp:`--split-pages` and | ||
| 188 | +      :samp:`--pages` now preserve interactive form | ||
| 189 | + functionality by copying the relevant form field information | ||
| 190 | + from the original files. Additionally, if you use | ||
| 191 | + :samp:`--pages` to select only some pages from | ||
| 192 | + the original input file, unused form fields are removed, which | ||
| 193 | + prevents lots of unused annotations from being retained. | ||
| 194 | + | ||
| 195 | + - By default, :command:`qpdf` no longer allows | ||
| 196 | + creation of encrypted PDF files whose user password is | ||
| 197 | + non-empty and owner password is empty when a 256-bit key is in | ||
| 198 | + use. The :samp:`--allow-insecure` option, | ||
| 199 | + specified inside the :samp:`--encrypt` options, | ||
| 200 | + allows creation of such files. Behavior changes in the CLI are | ||
| 201 | + avoided when possible, but an exception was made here because | ||
| 202 | + this is security-related. qpdf must always allow creation of | ||
| 203 | + weird files for testing purposes, but it should not default to | ||
| 204 | + letting users unknowingly create insecure files. | ||
| 205 | + | ||
| 206 | + - Library Behavior Changes | ||
| 207 | + | ||
| 208 | + - Note: the changes in this section cause differences in output | ||
| 209 | + in some cases. These differences change the syntax of the PDF | ||
| 210 | + but do not change the semantics (meaning). I make a strong | ||
| 211 | + effort to avoid gratuitous changes in qpdf's output so that | ||
| 212 | + qpdf changes don't break people's tests. In this case, the | ||
| 213 | + changes significantly improve the readability of the generated | ||
| 214 | + PDF and don't affect any output that's generated by simple | ||
| 215 | + transformation. If you are annoyed by having to update test | ||
| 216 | + files, please rest assured that changes like this have been and | ||
| 217 | + will continue to be rare events. | ||
| 218 | + | ||
| 219 | + - ``QPDFObjectHandle::newUnicodeString`` now uses whichever of | ||
| 220 | +      ASCII, PDFDocEncoding, or UTF-16 is sufficient to encode all | ||
| 221 | + the characters in the string. This reduces needless encoding in | ||
| 222 | + UTF-16 of strings that can be encoded in ASCII. This change may | ||
| 223 | + cause qpdf to generate different output than before when form | ||
| 224 | + field values are set using ``QPDFFormFieldObjectHelper`` but | ||
| 225 | + does not change the meaning of the output. | ||
| 226 | + | ||
| 227 | + - The code that places form XObjects and also the code that | ||
| 228 | + flattens rotations trim trailing zeroes from real numbers that | ||
| 229 | + they calculate. This causes slight (but semantically | ||
| 230 | + equivalent) differences in generated appearance streams and | ||
| 231 | + form XObject invocations in overlay/underlay code or in user | ||
| 232 | + code that calls the methods that place form XObjects on a page. | ||
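The trimming of trailing zeroes described in this section can be pictured with a small Python sketch. This is not qpdf's implementation; the function name ``trim_trailing_zeroes`` is hypothetical, and the sketch assumes the number is already formatted as a plain decimal string without an exponent.

```python
def trim_trailing_zeroes(text: str) -> str:
    """Drop zeroes after the decimal point that carry no information.

    Strings without a decimal point are returned unchanged so that
    integers such as "100" keep their value.
    """
    if "." not in text:
        return text
    trimmed = text.rstrip("0")
    # If nothing is left after the decimal point, drop the point as well.
    if trimmed.endswith("."):
        trimmed = trimmed[:-1]
    return trimmed


examples = [trim_trailing_zeroes(s) for s in ("1.500000", "2.000", "100", "0.25")]
```

The trimmed and untrimmed strings denote the same number, which is why the resulting PDF differs only in syntax.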
| 233 | + | ||
| 234 | + - CLI Enhancements | ||
| 235 | + | ||
| 236 | + - Add new command line options for listing, saving, adding, | ||
| 237 | +      removing, and copying file attachments. See :ref:`ref.attachments` for details. | ||
| 238 | + | ||
| 239 | + - Page splitting and merging operations, as well as | ||
| 240 | + :samp:`--flatten-rotation`, are better behaved | ||
| 241 | + with respect to annotations and interactive form fields. In | ||
| 242 | + most cases, interactive form field functionality and proper | ||
| 243 | + formatting and functionality of annotations is preserved by | ||
| 244 | + these operations. There are still some cases that aren't | ||
| 245 | + perfect, such as when functionality of annotations depends on | ||
| 246 | + document-level data that qpdf doesn't yet understand or when | ||
| 247 | + there are problems with referential integrity among form fields | ||
| 248 | + and annotations (e.g., when a single form field object or its | ||
| 249 | + associated annotations are shared across multiple pages, a case | ||
| 250 | + that is out of spec but that works in most viewers anyway). | ||
| 251 | + | ||
| 252 | + - The option | ||
| 253 | + :samp:`--password-file={filename}` | ||
| 254 | + can now be used to read the decryption password from a file. | ||
| 255 | + You can use ``-`` as the file name to read the password from | ||
| 256 | + standard input. This is an easier/more obvious way to read | ||
| 257 | + passwords from files or standard input than using | ||
| 258 | + :samp:`@file` for this purpose. | ||
| 259 | + | ||
| 260 | + - Add some information about attachments to the json output, and | ||
| 261 | +      add ``attachments`` as an additional json key. The | ||
| 262 | + information included here is limited to the preferred name and | ||
| 263 | + content stream and a reference to the file spec object. This is | ||
| 264 | + enough detail for clients to avoid the hassle of navigating a | ||
| 265 | + name tree and provides what is needed for basic enumeration and | ||
| 266 | + extraction of attachments. More detailed information can be | ||
| 267 | + obtained by following the reference to the file spec object. | ||
| 268 | + | ||
| 269 | + - Add numeric option to :samp:`--collate`. If | ||
| 270 | + :samp:`--collate={n}` | ||
| 271 | + is given, take pages in groups of | ||
| 272 | + :samp:`{n}` from the given files. | ||
| 273 | + | ||
| 274 | + - It is now valid to provide :samp:`--rotate=0` | ||
| 275 | + to clear rotation from a page. | ||
| 276 | + | ||
| 277 | + - Library Enhancements | ||
| 278 | + | ||
| 279 | + - This release includes numerous additions to the API. Not all | ||
| 280 | + changes are listed here. Please see the | ||
| 281 | + :file:`ChangeLog` file in the source | ||
| 282 | + distribution for a comprehensive list. Highlights appear below. | ||
| 283 | + | ||
| 284 | + - Add ``QPDFObjectHandle::ditems()`` and | ||
| 285 | + ``QPDFObjectHandle::aitems()`` that enable C++-style iteration, | ||
| 286 | + including range-for iteration, over dictionary and array | ||
| 287 | + QPDFObjectHandles. See comments in | ||
| 288 | + :file:`include/qpdf/QPDFObjectHandle.hh` | ||
| 289 | + and | ||
| 290 | + :file:`examples/pdf-name-number-tree.cc` | ||
| 291 | + for details. | ||
| 292 | + | ||
| 293 | + - Add ``QPDFObjectHandle::copyStream`` for making a copy of a | ||
| 294 | + stream within the same ``QPDF`` instance. | ||
| 295 | + | ||
| 296 | + - Add new helper classes for supporting file attachments, also | ||
| 297 | + known as embedded files. New classes are | ||
| 298 | + ``QPDFEmbeddedFileDocumentHelper``, | ||
| 299 | + ``QPDFFileSpecObjectHelper``, and ``QPDFEFStreamObjectHelper``. | ||
| 300 | + See their respective headers for details and | ||
| 301 | + :file:`examples/pdf-attach-file.cc` for an | ||
| 302 | + example. | ||
| 303 | + | ||
| 304 | + - Add a version of ``QPDFObjectHandle::parse`` that takes a | ||
| 305 | + ``QPDF`` pointer as context so that it can parse strings | ||
| 306 | + containing indirect object references. This is illustrated in | ||
| 307 | + :file:`examples/pdf-attach-file.cc`. | ||
| 308 | + | ||
| 309 | + - Re-implement ``QPDFNameTreeObjectHelper`` and | ||
| 310 | + ``QPDFNumberTreeObjectHelper`` to be more efficient, add an | ||
| 311 | + iterator-based API, give them the capability to repair broken | ||
| 312 | + trees, and create methods for modifying the trees. With this | ||
| 313 | + change, qpdf has a robust read/write implementation of name and | ||
| 314 | + number trees. | ||
| 315 | + | ||
| 316 | + - Add new versions of ``QPDFObjectHandle::replaceStreamData`` | ||
| 317 | + that take ``std::function`` objects for cases when you need | ||
| 318 | + something between a static string and a full-fledged | ||
| 319 | + StreamDataProvider. Using this with ``QUtil::file_provider`` is | ||
| 320 | + a very easy way to create a stream from the contents of a file. | ||
| 321 | + | ||
| 322 | + - The ``QPDFMatrix`` class, formerly a private, internal class, | ||
| 323 | + has been added to the public API. See | ||
| 324 | + :file:`include/qpdf/QPDFMatrix.hh` for | ||
| 325 | + details. This class is for working with transformation | ||
| 326 | + matrices. Some methods in ``QPDFPageObjectHelper`` make use of | ||
| 327 | + this to make information about transformation matrices | ||
| 328 | + available. For an example, see | ||
| 329 | + :file:`examples/pdf-overlay-page.cc`. | ||
| 330 | + | ||
| 331 | + - Several new methods were added to | ||
| 332 | + ``QPDFAcroFormDocumentHelper`` for adding, removing, getting | ||
| 333 | + information about, and enumerating form fields. | ||
| 334 | + | ||
| 335 | + - Add method | ||
| 336 | + ``QPDFAcroFormDocumentHelper::transformAnnotations``, which | ||
| 337 | + applies a transformation to each annotation on a page. | ||
| 338 | + | ||
| 339 | + - Add ``QPDFPageObjectHelper::copyAnnotations``, which copies | ||
| 340 | + annotations and, if applicable, associated form fields, from | ||
| 341 | + one page to another, possibly transforming the rectangles. | ||
| 342 | + | ||
| 343 | + - Build Changes | ||
| 344 | + | ||
| 345 | + - A C++-14 compiler is now required to build qpdf. There is no | ||
| 346 | + intention to require anything newer than that for a while. | ||
| 347 | + C++-14 includes modest enhancements to C++-11 and appears to be | ||
| 348 | + supported about as widely as C++-11. | ||
| 349 | + | ||
| 350 | + - Bug Fixes | ||
| 351 | + | ||
| 352 | + - The :samp:`--flatten-rotation` option applies | ||
| 353 | + transformations to any annotations that may be on the page. | ||
| 354 | + | ||
| 355 | + - If a form XObject lacks a resources dictionary, consider any | ||
| 356 | + names in that form XObject to be referenced from the containing | ||
| 357 | + page. This is compliant with older PDF versions. Also detect if | ||
| 358 | + any form XObjects have any unresolved names and, if so, don't | ||
| 359 | + remove unreferenced resources from them or from the page that | ||
| 360 | + contains them. Unfortunately this has the side effect of | ||
| 361 | + preventing removal of unreferenced resources in some cases | ||
| 362 | + where names appear that don't refer to resources, such as with | ||
| 363 | + tagged PDF. This is a bit of a corner case that is not likely | ||
| 364 | + to cause a significant problem in practice, but the only side | ||
| 365 | + effect would be lack of removal of shared resources. A future | ||
| 366 | + version of qpdf may be more sophisticated in its detection of | ||
| 367 | + names that refer to resources. | ||
| 368 | + | ||
| 369 | + - Properly handle strings if they appear in inline image | ||
| 370 | + dictionaries while externalizing inline images. | ||
| 371 | + | ||
| 372 | +10.1.0: January 5, 2021 | ||
| 373 | + - CLI Enhancements | ||
| 374 | + | ||
| 375 | + - Add :samp:`--flatten-rotation` command-line | ||
| 376 | + option, which causes all pages that are rotated using | ||
| 377 | + parameters in the page's dictionary to instead be identically | ||
| 378 | + rotated in the page's contents. The change is not user-visible | ||
| 379 | + for compliant PDF readers but can be used to work around broken | ||
| 380 | + PDF applications that don't properly handle page rotation. | ||
| 381 | + | ||
| 382 | + - Library Enhancements | ||
| 383 | + | ||
| 384 | + - Support for user-provided (pluggable, modular) stream filters. | ||
| 385 | + It is now possible to derive a class from ``QPDFStreamFilter`` | ||
| 386 | + and register it with ``QPDF`` so that regular library methods, | ||
| 387 | + including those used by ``QPDFWriter``, can decode streams with | ||
| 388 | + filters not directly supported by the library. The example | ||
| 389 | + :file:`examples/pdf-custom-filter.cc` | ||
| 390 | + illustrates how to use this capability. | ||
| 391 | + | ||
| 392 | + - Add methods to ``QPDFPageObjectHelper`` to iterate through | ||
| 393 | + XObjects on a page or form XObjects, possibly recursing into | ||
| 394 | +      nested form XObjects: ``forEachXObject``, ``forEachImage``, | ||
| 395 | + ``forEachFormXObject``. | ||
| 396 | + | ||
| 397 | + - Enhance several methods in ``QPDFPageObjectHelper`` to work | ||
| 398 | + with form XObjects as well as pages, as noted in comments. See | ||
| 399 | + :file:`ChangeLog` for a full list. | ||
| 400 | + | ||
| 401 | + - Rename some functions in ``QPDFPageObjectHelper``, while | ||
| 402 | + keeping old names for compatibility: | ||
| 403 | + | ||
| 404 | + - ``getPageImages`` to ``getImages`` | ||
| 405 | + | ||
| 406 | + - ``filterPageContents`` to ``filterContents`` | ||
| 407 | + | ||
| 408 | + - ``pipePageContents`` to ``pipeContents`` | ||
| 409 | + | ||
| 410 | + - ``parsePageContents`` to ``parseContents`` | ||
| 411 | + | ||
| 412 | + - Add method ``QPDFPageObjectHelper::getFormXObjects`` to return | ||
| 413 | + a map of form XObjects directly on a page or form XObject | ||
| 414 | + | ||
| 415 | + - Add new helper methods to ``QPDFObjectHandle``: | ||
| 416 | + ``isFormXObject``, ``isImage`` | ||
| 417 | + | ||
| 418 | +    - Add the optional ``allow_streams`` parameter to | ||
| 419 | + ``QPDFObjectHandle::makeDirect``. When | ||
| 420 | + ``QPDFObjectHandle::makeDirect`` is called in this way, it | ||
| 421 | + preserves references to streams rather than throwing an | ||
| 422 | + exception. | ||
| 423 | + | ||
| 424 | + - Add ``QPDFObjectHandle::setFilterOnWrite`` method. Calling this | ||
| 425 | + on a stream prevents ``QPDFWriter`` from attempting to | ||
| 426 | + uncompress, recompress, or otherwise filter a stream even if it | ||
| 427 | +      could. Developers can use this to protect streams that are | ||
| 428 | +      optimized or should be protected from ``QPDFWriter``'s default | ||
| 429 | +      behavior for any other reason. | ||
| 430 | + | ||
| 431 | + - Add ``ostream`` ``<<`` operator for ``QPDFObjGen``. This is | ||
| 432 | + useful to have for debugging. | ||
| 433 | + | ||
| 434 | + - Add method ``QPDFPageObjectHelper::flattenRotation``, which | ||
| 435 | + replaces a page's ``/Rotate`` keyword by rotating the page | ||
| 436 | + within the content stream and altering the page's bounding | ||
| 437 | + boxes so the rendering is the same. This can be used to work | ||
| 438 | + around buggy PDF readers that can't properly handle page | ||
| 439 | + rotation. | ||
| 440 | + | ||
| 441 | + - C API Enhancements | ||
| 442 | + | ||
| 443 | + - Add several new functions to the C API for working with | ||
| 444 | + objects. These are wrappers around many of the methods in | ||
| 445 | + ``QPDFObjectHandle``. Their inclusion adds considerable new | ||
| 446 | + capability to the C API. | ||
| 447 | + | ||
| 448 | + - Add ``qpdf_register_progress_reporter`` to the C API, | ||
| 449 | + corresponding to ``QPDFWriter::registerProgressReporter``. | ||
| 450 | + | ||
| 451 | + - Performance Enhancements | ||
| 452 | + | ||
| 453 | + - Improve steps ``QPDFWriter`` takes to prepare a ``QPDF`` object | ||
| 454 | + for writing, resulting in about an 8% improvement in write | ||
| 455 | + performance while allowing indirect objects to appear in | ||
| 456 | + ``/DecodeParms``. | ||
| 457 | + | ||
| 458 | + - When extracting pages, the :command:`qpdf` CLI | ||
| 459 | + only removes unreferenced resources from the pages that are | ||
| 460 | + being kept, resulting in a significant performance improvement | ||
| 461 | + when extracting small numbers of pages from large, complex | ||
| 462 | + documents. | ||
| 463 | + | ||
| 464 | + - Bug Fixes | ||
| 465 | + | ||
| 466 | + - ``QPDFPageObjectHelper::externalizeInlineImages`` was not | ||
| 467 | + externalizing images referenced from form XObjects that | ||
| 468 | + appeared on the page. | ||
| 469 | + | ||
| 470 | + - ``QPDFObjectHandle::filterPageContents`` was broken for pages | ||
| 471 | + with multiple content streams. | ||
| 472 | + | ||
| 473 | + - Tweak zsh completion code to behave a little better with | ||
| 474 | + respect to path completion. | ||
| 475 | + | ||
| 476 | +10.0.4: November 21, 2020 | ||
| 477 | + - Bug Fixes | ||
| 478 | + | ||
| 479 | + - Fix a handful of integer overflows. This includes cases found | ||
| 480 | + by fuzzing as well as having qpdf not do range checking on | ||
| 481 | + unused values in the xref stream. | ||
| 482 | + | ||
| 483 | +10.0.3: October 31, 2020 | ||
| 484 | + - Bug Fixes | ||
| 485 | + | ||
| 486 | + - The fix to the bug involving copying streams with indirect | ||
| 487 | + filters was incorrect and introduced a new, more serious bug. | ||
| 488 | + The original bug has been fixed correctly, as has the bug | ||
| 489 | + introduced in 10.0.2. | ||
| 490 | + | ||
| 491 | +10.0.2: October 27, 2020 | ||
| 492 | + - Bug Fixes | ||
| 493 | + | ||
| 494 | + - When concatenating content streams, as with | ||
| 495 | + :samp:`--coalesce-contents`, there were cases | ||
| 496 | + in which qpdf would merge two lexical tokens together, creating | ||
| 497 | + invalid results. A newline is now inserted between merged | ||
| 498 | + content streams if one is not already present. | ||
| 499 | + | ||
| 500 | + - Fix an internal error that could occur when copying foreign | ||
| 501 | + streams whose stream data had been replaced using a stream data | ||
| 502 | + provider if those streams had indirect filters or decode | ||
| 503 | + parameters. This is a rare corner case. | ||
| 504 | + | ||
| 505 | + - Ensure that the caller's locale settings do not change the | ||
| 506 | + results of numeric conversions performed internally by the qpdf | ||
| 507 | + library. Note that the problem here could only be caused when | ||
| 508 | + the qpdf library was used programmatically. Using the qpdf CLI | ||
| 509 | + already ignored the user's locale for numeric conversion. | ||
| 510 | + | ||
| 511 | + - Fix several instances in which warnings were not suppressed in | ||
| 512 | + spite of :samp:`--no-warn` and/or errors or | ||
| 513 | + warnings were written to standard output rather than standard | ||
| 514 | + error. | ||
| 515 | + | ||
| 516 | + - Fixed a memory leak that could occur under specific | ||
| 517 | + circumstances when | ||
| 518 | + :samp:`--object-streams=generate` was used. | ||
| 519 | + | ||
| 520 | + - Fix various integer overflows and similar conditions found by | ||
| 521 | + the OSS-Fuzz project. | ||
| 522 | + | ||
| 523 | + - Enhancements | ||
| 524 | + | ||
| 525 | + - New option :samp:`--warning-exit-0` causes qpdf | ||
| 526 | + to exit with a status of ``0`` rather than ``3`` if there are | ||
| 527 | + warnings but no errors. Combine with | ||
| 528 | + :samp:`--no-warn` to completely ignore | ||
| 529 | + warnings. | ||
| 530 | + | ||
| 531 | + - Performance improvements have been made to | ||
| 532 | + ``QPDF::processMemoryFile``. | ||
| 533 | + | ||
| 534 | + - The OpenSSL crypto provider produces more detailed error | ||
| 535 | + messages. | ||
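The effect of :samp:`--warning-exit-0` on the exit status can be summarized with a small Python sketch. The function name ``exit_status`` is hypothetical. The values ``0`` and ``3`` come from the text above; the value ``2`` for errors is an assumption made for this illustration.

```python
def exit_status(errors: int, warnings: int, warning_exit_0: bool = False) -> int:
    """Model the exit status for a run with the given error/warning counts."""
    if errors > 0:
        return 2  # assumed value for runs that hit errors
    if warnings > 0 and not warning_exit_0:
        return 3  # warnings but no errors
    return 0


status_default = exit_status(errors=0, warnings=1)
status_with_option = exit_status(errors=0, warnings=1, warning_exit_0=True)
```

In this model the option changes the result only when there are warnings and no errors.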
| 536 | + | ||
| 537 | + - Build Changes | ||
| 538 | + | ||
| 539 | + - The option :samp:`--disable-rpath` is now | ||
| 540 | + supported by qpdf's :command:`./configure` | ||
| 541 | +      script. Some distributions' packaging standards recommend the | ||
| 542 | + use of this option. | ||
| 543 | + | ||
| 544 | + - Selection of a printf format string for ``long long`` has | ||
| 545 | + been moved from ``ifdefs`` to an autoconf | ||
| 546 | + test. If you are using your own build system, you will need to | ||
| 547 | + provide a value for ``LL_FMT`` in | ||
| 548 | + :file:`libqpdf/qpdf/qpdf-config.h`, which | ||
| 549 | + would typically be ``"%lld"`` or, for some Windows compilers, | ||
| 550 | + ``"%I64d"``. | ||
| 551 | + | ||
| 552 | + - Several improvements were made to build-time configuration of | ||
| 553 | + the OpenSSL crypto provider. | ||
| 554 | + | ||
| 555 | + - A nearly stand-alone Linux binary zip file is now included with | ||
| 556 | + the qpdf release. This is built on an older (but supported) | ||
| 557 | + Ubuntu LTS release, but would work on most reasonably recent | ||
| 558 | + Linux distributions. It contains only the executables and | ||
| 559 | + required shared libraries that would not be present on a | ||
| 560 | + minimal system. It can be used for including qpdf in a minimal | ||
| 561 | + environment, such as a docker container. The zip file is also | ||
| 562 | + known to work as a layer in AWS Lambda. | ||
| 563 | + | ||
| 564 | + - QPDF's automated build has been migrated from Azure Pipelines | ||
| 565 | + to GitHub Actions. | ||
| 566 | + | ||
| 567 | + - Windows-specific Changes | ||
| 568 | + | ||
| 569 | + - The Windows executables distributed with qpdf releases now use | ||
| 570 | + the OpenSSL crypto provider by default. The native crypto | ||
| 571 | + provider is also compiled in and can be selected at runtime | ||
| 572 | + with the ``QPDF_CRYPTO_PROVIDER`` environment variable. | ||
| 573 | + | ||
| 574 | + - Improvements have been made to how a cryptographic provider is | ||
| 575 | + obtained in the native Windows crypto implementation. However, | |
| 576 | + this is mostly shadowed by OpenSSL being used by default. | |
| 577 | + | ||
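The ``LL_FMT`` note in the build changes above can be illustrated with a minimal sketch. It assumes a custom build system that supplies ``LL_FMT`` itself rather than taking it from the autoconf-generated ``qpdf-config.h``. The ``_MSC_VER`` test and the ``format_long_long`` helper are hypothetical and only show how such a macro is typically consumed.

```cpp
#include <cstdio>
#include <string>

// Assumption: in a real build this macro comes from
// libqpdf/qpdf/qpdf-config.h. The compiler test below is a stand-in for
// whatever check your own build system performs.
#ifndef LL_FMT
# if defined(_MSC_VER) && (_MSC_VER < 1800)
#  define LL_FMT "%I64d"
# else
#  define LL_FMT "%lld"
# endif
#endif

// Hypothetical helper: format a long long with the configured format string.
std::string format_long_long(long long value)
{
    char buf[32];
    std::snprintf(buf, sizeof(buf), LL_FMT, value);
    return std::string(buf);
}
```

Because ``LL_FMT`` is a string literal, it can also be concatenated into longer format strings at compile time, for example ``"offset " LL_FMT "\n"``.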
| 578 | +10.0.1: April 9, 2020 | ||
| 579 | + - Bug Fixes | ||
| 580 | + | ||
| 581 | + - 10.0.0 introduced a bug in which calling | ||
| 582 | + ``QPDFObjectHandle::getStreamData`` on a stream that can't be | ||
| 583 | + filtered was returning the raw data instead of throwing an | ||
| 584 | + exception. This is now fixed. | ||
| 585 | + | ||
| 586 | + - Fix a bug that was preventing qpdf from linking with some | ||
| 587 | + versions of clang on some platforms. | ||
| 588 | + | ||
| 589 | + - Enhancements | ||
| 590 | + | ||
| 591 | + - Improve the :file:`pdf-invert-images` | ||
| 592 | + example to avoid having to load all the images into RAM at the | ||
| 593 | + same time. | ||
| 594 | + | ||
| 595 | +10.0.0: April 6, 2020 | ||
| 596 | + - Performance Enhancements | ||
| 597 | + | ||
| 598 | + - The qpdf library and executable should run much faster in this | ||
| 599 | + version than in the last several releases. Several internal | ||
| 600 | + library optimizations have been made, and there has been | ||
| 601 | + improved behavior on page splitting as well. This version of | ||
| 602 | + qpdf should outperform any of the 8.x or 9.x versions. | ||
| 603 | + | ||
| 604 | + - Incompatible API (source-level) Changes (minor) | ||
| 605 | + | ||
| 606 | + - The ``QUtil::srandom`` method was removed. It didn't do | ||
| 607 | + anything unless insecure random numbers were compiled in, and | ||
| 608 | + they have been off by default for a long time. If you were | ||
| 609 | + calling it, just remove the call since it wasn't doing anything | ||
| 610 | + anyway. | ||
| 611 | + | ||
| 612 | + - Build/Packaging Changes | ||
| 613 | + | ||
| 614 | + - Add an ``openssl`` crypto provider, which is implemented with | |
| 615 | + OpenSSL and also works with BoringSSL. Thanks to Dean Scarff | ||
| 616 | + for this contribution. If you maintain qpdf for a distribution, | ||
| 617 | + pay special attention to make sure that you are including | ||
| 618 | + support for the crypto providers you want. Package maintainers | ||
| 619 | + will have to weigh the advantages of allowing users to pick a | ||
| 620 | + crypto provider at runtime against the disadvantages of adding | ||
| 621 | + more dependencies to qpdf. | ||
| 622 | + | ||
| 623 | + - Allow qpdf to be built on stripped-down systems whose C/C++ | |
| 624 | + libraries lack the ``wchar_t`` type. Search for ``wchar_t`` in | ||
| 625 | + qpdf's README.md for details. This should be very rare, but it | ||
| 626 | + is known to be helpful in some embedded environments. | ||
| 627 | + | ||
| 628 | + - CLI Enhancements | ||
| 629 | + | ||
| 630 | + - Add ``objectinfo`` key to the JSON output. This will be a place | ||
| 631 | + to put computed metadata or other information about PDF objects | ||
| 632 | + that are not immediately evident in other ways or that seem | ||
| 633 | + useful for some other reason. In this version, information is | ||
| 634 | + provided about each object indicating whether it is a stream | ||
| 635 | + and, if so, what its length and filters are. Without this, it | ||
| 636 | + was not possible to tell conclusively from the JSON output | ||
| 637 | + alone whether or not an object was a stream. Run | ||
| 638 | + :command:`qpdf --json-help` for details. | ||
| 639 | + | ||
| 640 | + - Add new option | ||
| 641 | + :samp:`--remove-unreferenced-resources` which | ||
| 642 | + takes ``auto``, ``yes``, or ``no`` as arguments. The new | ||
| 643 | + ``auto`` mode, which is the default, performs a fast heuristic | ||
| 644 | + over a PDF file when splitting pages to determine whether the | ||
| 645 | + expensive process of finding and removing unreferenced | ||
| 646 | + resources is likely to be of benefit. For most files, this new | ||
| 647 | + default will result in a significant performance improvement | ||
| 648 | + for splitting pages. See :ref:`ref.advanced-transformation` for a more detailed | ||
| 649 | + discussion. | ||
| 650 | + | ||
| 651 | + - The :samp:`--preserve-unreferenced-resources` | |
| 652 | + option is now just a synonym for | |
| 653 | + :samp:`--remove-unreferenced-resources=no`. | ||
| 654 | + | ||
| 655 | + - If the ``QPDF_EXECUTABLE`` environment variable is set when | ||
| 656 | + invoking :command:`qpdf --completion-bash` or | |
| 657 | + :command:`qpdf --completion-zsh`, the completion | |
| 658 | + command that it outputs will refer to qpdf using the value of | ||
| 659 | + that variable rather than what :command:`qpdf` | ||
| 660 | + determines its executable path to be. This can be useful when | ||
| 661 | + wrapping :command:`qpdf` with a script, working | ||
| 662 | + with a version in the source tree, using an AppImage, or other | ||
| 663 | + situations where there is some indirection. | ||
| 664 | + | ||
| 665 | + - Library Enhancements | ||
| 666 | + | ||
| 667 | + - Random number generation is now delegated to the crypto | ||
| 668 | + provider. The old behavior is still used by the native crypto | ||
| 669 | + provider. It is still possible to provide your own random | ||
| 670 | + number generator. | ||
| 671 | + | ||
| 672 | + - Add a new version of | ||
| 673 | + ``QPDFObjectHandle::StreamDataProvider::provideStreamData`` | ||
| 674 | + that accepts the ``suppress_warnings`` and ``will_retry`` | ||
| 675 | + options and allows a success code to be returned. This makes it | ||
| 676 | + possible to implement a ``StreamDataProvider`` that calls | ||
| 677 | + ``pipeStreamData`` on another stream and to pass the response | ||
| 678 | + back to the caller, which enables better error handling on | ||
| 679 | + those proxied streams. | ||
| 680 | + | ||
| 681 | + - Update ``QPDFObjectHandle::pipeStreamData`` to return an | ||
| 682 | + overall success code that goes beyond whether or not filtered | ||
| 683 | + data was written successfully. This allows better error | ||
| 684 | + handling of cases that were not filtering errors. You have to | ||
| 685 | + call this explicitly. Methods in previously existing APIs have | ||
| 686 | + the same semantics as before. | ||
| 687 | + | ||
| 688 | + - The ``QPDFPageObjectHelper::placeFormXObject`` method now | ||
| 689 | + allows separate control over whether it should be willing to | ||
| 690 | + shrink or expand objects to fit them better into the | ||
| 691 | + destination rectangle. The previous behavior was that shrinking | ||
| 692 | + was allowed but expansion was not. The previous behavior is | ||
| 693 | + still the default. | ||
| 694 | + | ||
| 695 | + - When calling the C API, any non-zero value passed to a boolean | ||
| 696 | + parameter is treated as ``TRUE``. Previously only the value | ||
| 697 | + ``1`` was accepted. This makes the C API behave more like most | ||
| 698 | + C interfaces and is known to improve compatibility with some | ||
| 699 | + Windows environments that dynamically load the DLL and call | ||
| 700 | + functions from it. | ||
| 701 | + | ||
| 702 | + - Add ``QPDFObjectHandle::unsafeShallowCopy`` for copying only | ||
| 703 | + top-level dictionary keys or array items. This is unsafe | ||
| 704 | + because it creates a situation in which changing a lower-level | ||
| 705 | + item in one object may also change it in another object, but | ||
| 706 | + for cases in which you *know* you are only inserting or | ||
| 707 | + replacing top-level items, it is much faster than | ||
| 708 | + ``QPDFObjectHandle::shallowCopy``. | ||
| 709 | + | ||
| 710 | + - Add ``QPDFObjectHandle::filterAsContents``, which filters a | |
| 711 | + stream's data as a content stream. This is useful for parsing | ||
| 712 | + the contents for form XObjects in the same way as parsing page | ||
| 713 | + content streams. | ||
| 714 | + | ||
| 715 | + - Bug Fixes | ||
| 716 | + | ||
| 717 | + - When detecting and removing unreferenced resources during page | ||
| 718 | + splitting, traverse into form XObjects and handle their | ||
| 719 | + resources dictionaries as well. | ||
| 720 | + | ||
| 721 | + - The same error recovery is applied to streams in files other than | |
| 722 | + the primary input file when merging or splitting pages. | |
| 723 | + | ||
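The hazard described for ``QPDFObjectHandle::unsafeShallowCopy`` in the 10.0.0 notes above (top-level items are copied, lower-level items stay shared) can be sketched without qpdf at all. The ``Node`` type and the ``top_level_copy`` and ``make_leaf`` helpers below are hypothetical stand-ins, not qpdf's classes or implementation. They only model the aliasing behavior the note warns about.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-in for a PDF object: an integer value plus a
// dictionary of child objects held by shared pointer.
struct Node
{
    int value = 0;
    std::map<std::string, std::shared_ptr<Node>> keys;
};

// Build a leaf node holding a single integer.
std::shared_ptr<Node> make_leaf(int v)
{
    auto n = std::make_shared<Node>();
    n->value = v;
    return n;
}

// Copy only the top-level key table. The child nodes are NOT cloned, so
// both dictionaries keep pointing at the same lower-level objects.
std::shared_ptr<Node> top_level_copy(std::shared_ptr<Node> const& src)
{
    auto copy = std::make_shared<Node>();
    copy->value = src->value;
    copy->keys = src->keys;
    return copy;
}
```

Adding or replacing a key in the copy leaves the original untouched, which is the safe use the note describes. Modifying a child reached through the copy changes the original as well, which is why this kind of copy is only appropriate when you know that only top-level items will be inserted or replaced.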
| 724 | +9.1.1: January 26, 2020 | ||
| 725 | + - Build/Packaging Changes | ||
| 726 | + | ||
| 727 | + - The fix-qdf program was converted from perl to C++. As such, | ||
| 728 | + qpdf no longer has a runtime dependency on perl. | ||
| 729 | + | ||
| 730 | + - Library Enhancements | ||
| 731 | + | ||
| 732 | + - Added new helper routine ``QUtil::call_main_from_wmain`` which | ||
| 733 | + converts ``wchar_t`` arguments to UTF-8 encoded strings. This | ||
| 734 | + is useful for qpdf because library methods expect file names to | ||
| 735 | + be UTF-8 encoded, even on Windows. | |
| 736 | + | ||
| 737 | + - Added new ``QUtil::read_lines_from_file`` methods that take | ||
| 738 | + ``FILE*`` arguments and that allow preservation of end-of-line | ||
| 739 | + characters. This also fixes a bug where | ||
| 740 | + ``QUtil::read_lines_from_file`` wouldn't work properly with | ||
| 741 | + Unicode filenames. | ||
| 742 | + | ||
| 743 | + - CLI Enhancements | ||
| 744 | + | ||
| 745 | + - Added options :samp:`--is-encrypted` and | ||
| 746 | + :samp:`--requires-password` for testing whether | ||
| 747 | + a file is encrypted or requires a password other than the | ||
| 748 | + supplied (or empty) password. These communicate via exit | ||
| 749 | + status, making them useful for shell scripts. They also work on | ||
| 750 | + encrypted files with unknown passwords. | ||
| 751 | + | ||
| 752 | + - Added ``encrypt`` key to JSON options. With the exception of | ||
| 753 | + the reconstructed user password for older encryption formats, | ||
| 754 | + this provides the same information as | ||
| 755 | + :samp:`--show-encryption` but in a consistent, | ||
| 756 | + parseable format. See output of :command:`qpdf | ||
| 757 | + --json-help` for details. | ||
| 758 | + | ||
| 759 | + - Bug Fixes | ||
| 760 | + | ||
| 761 | + - In QDF mode, be sure not to write more than one XRef stream to | ||
| 762 | + a file, even when | ||
| 763 | + :samp:`--preserve-unreferenced` is used. | ||
| 764 | + :command:`fix-qdf` assumes that there is only | ||
| 765 | + one XRef stream, and that it appears at the end of the file. | ||
| 766 | + | ||
| 767 | + - When externalizing inline images, properly handle images whose | ||
| 768 | + color space is a reference to an object in the page's resource | ||
| 769 | + dictionary. | ||
| 770 | + | ||
| 771 | + - Windows-specific fix for acquiring crypt context with a new | ||
| 772 | + keyset. | ||
| 773 | + | ||
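The 9.1.1 notes above mention converting ``wchar_t`` arguments to UTF-8 encoded strings. As a rough sketch of the final step of such a conversion, the function below encodes a single Unicode code point as UTF-8. The name ``encode_utf8`` is hypothetical and this is not the code behind ``QUtil::call_main_from_wmain``. On Windows, where ``wchar_t`` holds UTF-16 code units, surrogate pairs would have to be combined into code points first, which this sketch does not do.

```cpp
#include <string>

// Encode one Unicode code point as UTF-8. The caller is assumed to pass a
// valid code point no greater than 0x10FFFF.
std::string encode_utf8(unsigned long cp)
{
    std::string out;
    if (cp < 0x80) {
        // One byte: plain ASCII.
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // Two bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        // Three bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Four bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```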
| 774 | +9.1.0: November 17, 2019 | ||
| 775 | + - Build Changes | ||
| 776 | + | ||
| 777 | + - A C++11 compiler is now required to build qpdf. | |
| 778 | + | ||
| 779 | + - A new crypto provider that uses gnutls for crypto functions is | ||
| 780 | + now available and can be enabled at build time. See :ref:`ref.crypto` for more information about crypto | ||
| 781 | + providers and :ref:`ref.crypto.build` for specific information about | ||
| 782 | + the build. | ||
| 783 | + | ||
| 784 | + - Library Enhancements | ||
| 785 | + | ||
| 786 | + - Incorporate contribution from Masamichi Hosoda to properly | ||
| 787 | + handle signature dictionaries by not including them in object | ||
| 788 | + streams, formatting the ``/Contents`` key as a hexadecimal | |
| 789 | + string, and excluding the ``/Contents`` key from encryption and | ||
| 790 | + decryption. | ||
| 791 | + | ||
| 792 | + - Incorporate contribution from Masamichi Hosoda to provide new | ||
| 793 | + API calls for getting file-level information about input and | ||
| 794 | + output files, enabling certain operations on the files at the | ||
| 795 | + file level rather than the object level. New methods include | ||
| 796 | + ``QPDF::getXRefTable()``, | ||
| 797 | + ``QPDFObjectHandle::getParsedOffset()``, | ||
| 798 | + ``QPDFWriter::getRenumberedObjGen(QPDFObjGen)``, and | ||
| 799 | + ``QPDFWriter::getWrittenXRefTable()``. | ||
| 800 | + | ||
| 801 | + - Support build-time and runtime selectable crypto providers. | ||
| 802 | + This includes the addition of new classes | ||
| 803 | + ``QPDFCryptoProvider`` and ``QPDFCryptoImpl`` and the | ||
| 804 | + recognition of the ``QPDF_CRYPTO_PROVIDER`` environment | ||
| 805 | + variable. Crypto providers are described in depth in :ref:`ref.crypto`. | ||
| 806 | + | ||
| 807 | + - CLI Enhancements | ||
| 808 | + | ||
| 809 | + - Addition of the :samp:`--show-crypto` option in | ||
| 810 | + support of selectable crypto providers, as described in :ref:`ref.crypto`. | ||
| 811 | + | ||
| 812 | + - Allow ``:even`` or ``:odd`` to be appended to numeric ranges | ||
| 813 | + for specification of the even or odd pages from among the pages | ||
| 814 | + specified in the range. | ||
| 815 | + | ||
| 816 | + - Fix shell wildcard expansion behavior (``*`` and ``?``) of | |
| 817 | + :command:`qpdf.exe` as built by MSVC. | |
| 818 | + | ||
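The ``:even`` and ``:odd`` range modifiers from the 9.1.0 notes above can be sketched as a filter over an already expanded page list. This sketch assumes that the modifier refers to a page's position within the selected range (first, second, third, and so on) and not to the page number itself. The function name ``filter_by_position`` is hypothetical and this is not qpdf's range parser.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Keep the pages at odd or even positions (1-based) within an already
// expanded range. An empty modifier returns the range unchanged.
std::vector<int>
filter_by_position(std::vector<int> const& pages, std::string const& modifier)
{
    if (modifier.empty()) {
        return pages;
    }
    if (modifier != "odd" && modifier != "even") {
        throw std::invalid_argument("modifier must be \"odd\" or \"even\"");
    }
    bool const want_odd = (modifier == "odd");
    std::vector<int> result;
    for (std::size_t i = 0; i < pages.size(); ++i) {
        bool const odd_position = ((i + 1) % 2) == 1;
        if (odd_position == want_odd) {
            result.push_back(pages[i]);
        }
    }
    return result;
}
```

Under that assumption, a range that expands to pages 2, 3, 4, 5, 6 yields 2, 4, 6 for ``odd`` (the first, third, and fifth entries) and 3, 5 for ``even``.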
| 819 | +9.0.2: October 12, 2019 | ||
| 820 | + - Bug Fix | ||
| 821 | + | ||
| 822 | + - Fix the name of the temporary file used by | ||
| 823 | + :samp:`--replace-input` so that it doesn't | ||
| 824 | + require path splitting and works with paths that include | |
| 825 | + directories. | ||
| 826 | + | ||
| 827 | +9.0.1: September 20, 2019 | ||
| 828 | + - Bug Fixes/Enhancements | ||
| 829 | + | ||
| 830 | + - Fix some build and test issues on big-endian systems and | ||
| 831 | + compilers with characters that are unsigned by default. The | ||
| 832 | + problems were in build and test only. There were no actual bugs | ||
| 833 | + in the qpdf library itself relating to endianness or unsigned | ||
| 834 | + characters. | ||
| 835 | + | ||
| 836 | + - When a dictionary has a duplicated key, report this with a | ||
| 837 | + warning. The behavior of the library in this case is unchanged, | ||
| 838 | + but the error condition is no longer silently ignored. | ||
| 839 | + | ||
| 840 | + - When a form field's display rectangle is erroneously specified | ||
| 841 | + with inverted coordinates, detect and correct this situation. | ||
| 842 | + This prevents some form fields from being flipped when flattening | |
| 843 | + annotations on files with this condition. | ||
| 844 | + | ||
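The 9.0.1 fix for form field rectangles with inverted coordinates can be pictured as a normalization step. The notes do not say how qpdf performs the correction, so the ``Rect`` type and ``normalize`` function below are assumptions made purely for illustration: the sketch swaps coordinates so that the lower-left corner really is below and to the left of the upper-right corner.

```cpp
#include <utility>

// Hypothetical rectangle in PDF order: lower-left x/y, upper-right x/y.
struct Rect
{
    double llx;
    double lly;
    double urx;
    double ury;
};

// Return a copy of the rectangle with each axis ordered low-to-high.
Rect normalize(Rect r)
{
    if (r.llx > r.urx) {
        std::swap(r.llx, r.urx);
    }
    if (r.lly > r.ury) {
        std::swap(r.lly, r.ury);
    }
    return r;
}
```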
| 845 | +9.0.0: August 31, 2019 | ||
| 846 | + - Incompatible API (source-level) Changes (minor) | ||
| 847 | + | ||
| 848 | + - The method ``QUtil::strcasecmp`` has been renamed to | ||
| 849 | + ``QUtil::str_compare_nocase``. This incompatible change is | ||
| 850 | + necessary to enable qpdf to build on platforms that define | ||
| 851 | + ``strcasecmp`` as a macro. | ||
| 852 | + | ||
| 853 | + - The ``QPDF::copyForeignObject`` method had an overloaded | ||
| 854 | + version that took a boolean parameter that was not used. If you | ||
| 855 | + were using this version, just omit the extra parameter. | ||
| 856 | + | ||
| 857 | + - There was a version of ``QPDFTokenizer::expectInlineImage`` that | |
| 858 | + took no arguments. This version has been removed since it | ||
| 859 | + caused the tokenizer to return incorrect inline images. A new | ||
| 860 | + version was added some time ago that produces correct output. | ||
| 861 | + This is a very low level method that doesn't make sense to call | ||
| 862 | + outside of qpdf's lexical engine. There are higher level | ||
| 863 | + methods for tokenizing content streams. | ||
| 864 | + | ||
| 865 | + - Change ``QPDFOutlineDocumentHelper::getTopLevelOutlines`` and | ||
| 866 | + ``QPDFOutlineObjectHelper::getKids`` to return a | ||
| 867 | + ``std::vector`` instead of a ``std::list`` of | ||
| 868 | + ``QPDFOutlineObjectHelper`` objects. | ||
| 869 | + | ||
| 870 | + - Remove method ``QPDFTokenizer::allowPoundAnywhereInName``. This | ||
| 871 | + function would allow creation of name tokens whose value would | ||
| 872 | + change when unparsed, which is never the correct behavior. | ||
| 873 | + | ||
| 874 | + - CLI Enhancements | ||
| 875 | + | ||
| 876 | + - The :samp:`--replace-input` option may be given | ||
| 877 | + in place of an output file name. This causes qpdf to overwrite | ||
| 878 | + the input file with the output. See the description of | ||
| 879 | + :samp:`--replace-input` in :ref:`ref.basic-options` for more details. | ||
| 880 | + | ||
| 881 | + - The :samp:`--recompress-flate` option instructs | |
| 882 | + :command:`qpdf` to recompress streams that are | ||
| 883 | + already compressed with ``/FlateDecode``. Useful with | ||
| 884 | + :samp:`--compression-level`. | ||
| 885 | + | ||
| 886 | + - The | ||
| 887 | + :samp:`--compression-level={level}` | |
| 888 | + option sets the zlib compression level used for any streams compressed | |
| 889 | + by ``/FlateDecode``. Most effective when combined with | ||
| 890 | + :samp:`--recompress-flate`. | ||
| 891 | + | ||
| 892 | + - Library Enhancements | ||
| 893 | + | ||
| 894 | + - A new namespace ``QIntC``, provided by | ||
| 895 | + :file:`qpdf/QIntC.hh`, provides safe | ||
| 896 | + conversion methods between different integer types. These | ||
| 897 | + conversion methods do range checking to ensure that the cast | ||
| 898 | + can be performed with no loss of information. Every use of | ||
| 899 | + ``static_cast`` in the library was inspected to see if it could | ||
| 900 | + use one of these safe converters instead. See :ref:`ref.casting` for additional details. | ||
| 901 | + | ||
| 902 | + - Method ``QPDF::anyWarnings`` tells whether there have been any | ||
| 903 | + warnings without clearing the list of warnings. | ||
| 904 | + | ||
| 905 | + - Method ``QPDF::closeInputSource`` closes or otherwise releases | ||
| 906 | + the input source. This enables the input file to be deleted or | ||
| 907 | + renamed. | ||
| 908 | + | ||
| 909 | + - New methods have been added to ``QUtil`` for converting back | ||
| 910 | + and forth between strings and unsigned integers: | ||
| 911 | + ``uint_to_string``, ``uint_to_string_base``, | ||
| 912 | + ``string_to_uint``, and ``string_to_ull``. | ||
| 913 | + | ||
| 914 | + - New methods have been added to ``QPDFObjectHandle`` that return | ||
| 915 | + the value of ``Integer`` objects as ``int`` or ``unsigned int`` | ||
| 916 | + with range checking and sensible fallback values, and a new | ||
| 917 | + method was added to return an unsigned value. This makes it | ||
| 918 | + easier to write code that is safe from unintentional data loss. | ||
| 919 | + Functions: ``getUIntValue``, ``getIntValueAsInt``, | ||
| 920 | + ``getUIntValueAsUInt``. | ||
| 921 | + | ||
| 922 | + - When parsing content streams with | ||
| 923 | + ``QPDFObjectHandle::ParserCallbacks``, in place of the method | ||
| 924 | + ``handleObject(QPDFObjectHandle)``, the developer may override | ||
| 925 | + ``handleObject(QPDFObjectHandle, size_t offset, size_t | ||
| 926 | + length)``. If this method is defined, it will | ||
| 927 | + be invoked with the object along with its offset and length | ||
| 928 | + within the overall contents being parsed. Intervening spaces | ||
| 929 | + and comments are not included in offset and length. | ||
| 930 | + Additionally, a new method ``contentSize(size_t)`` may be | ||
| 931 | + implemented. If present, it will be called prior to the first | ||
| 932 | + call to ``handleObject`` with the total size in bytes of the | ||
| 933 | + combined contents. | ||
| 934 | + | ||
| 935 | + - New methods ``QPDF::userPasswordMatched`` and | ||
| 936 | + ``QPDF::ownerPasswordMatched`` have been added to enable a | ||
| 937 | + caller to determine whether the supplied password was the user | ||
| 938 | + password, the owner password, or both. This information is also | ||
| 939 | + displayed by :command:`qpdf --show-encryption` | ||
| 940 | + and :command:`qpdf --check`. | ||
| 941 | + | ||
| 942 | + - Static method ``Pl_Flate::setCompressionLevel`` can be called | ||
| 943 | + to set the zlib compression level globally used by all | ||
| 944 | + instances of ``Pl_Flate`` in deflate mode. | |
| 945 | + | ||
| 946 | + - The method ``QPDFWriter::setRecompressFlate`` can be called to | ||
| 947 | + tell ``QPDFWriter`` to uncompress and recompress streams | ||
| 948 | + already compressed with ``/FlateDecode``. | ||
| 949 | + | ||
| 950 | + - The underlying implementation of QPDF arrays has been enhanced | ||
| 951 | + to be much more memory efficient when dealing with arrays with | ||
| 952 | + lots of nulls. This enables qpdf to use drastically less memory | ||
| 953 | + for certain types of files. | ||
| 954 | + | ||
| 955 | + - When traversing the pages tree, if nodes are encountered with | ||
| 956 | + invalid types, the types are fixed, and a warning is issued. | ||
| 957 | + | ||
| 958 | + - A new helper method ``QUtil::read_file_into_memory`` was added. | ||
| 959 | + | ||
| 960 | + - All conditions previously reported by | ||
| 961 | + ``QPDF::checkLinearization()`` as errors are now presented as | ||
| 962 | + warnings. | ||
| 963 | + | ||
| 964 | + - Name tokens containing the ``#`` character not preceded by two | ||
| 965 | + hexadecimal digits, which is invalid in PDF 1.2 and above, are | ||
| 966 | + properly handled by the library: a warning is generated, and | ||
| 967 | + the name token is properly preserved, even if invalid, in the | ||
| 968 | + output. See :file:`ChangeLog` for a more | ||
| 969 | + complete description of this change. | ||
| 970 | + | ||
| 971 | + - Bug Fixes | ||
| 972 | + | ||
| 973 | + - A small handful of memory issues, assertion failures, and | ||
| 974 | + unhandled exceptions that could occur on badly mangled input | ||
| 975 | + files have been fixed. Most of these problems were found by | ||
| 976 | + Google's OSS-Fuzz project. | ||
| 977 | + | ||
| 978 | + - When :command:`qpdf --check` or | ||
| 979 | + :command:`qpdf --check-linearization` encounters | ||
| 980 | + a file with linearization warnings but not errors, it now | ||
| 981 | + properly exits with exit code 3 instead of 2. | ||
| 982 | + | ||
| 983 | + - The :samp:`--completion-bash` and | ||
| 984 | + :samp:`--completion-zsh` options now work | ||
| 985 | + properly when qpdf is invoked as an AppImage. | ||
| 986 | + | ||
| 987 | + - Calling ``QPDFWriter::set*EncryptionParameters`` on a | ||
| 988 | + ``QPDFWriter`` object whose output filename has not yet been | ||
| 989 | + set no longer produces a segmentation fault. | ||
| 990 | + | ||
| 991 | + - When reading encrypted files, follow the spec more closely | ||
| 992 | + regarding encryption key length. This allows qpdf to open | ||
| 993 | + encrypted files in most cases when they have invalid or missing | ||
| 994 | + ``/Length`` keys in the encryption dictionary. | |
| 995 | + | ||
| 996 | + - Build Changes | ||
| 997 | + | ||
| 998 | + - On platforms that support it, qpdf now builds with | ||
| 999 | + :samp:`-fvisibility=hidden`. If you build qpdf | ||
| 1000 | + with your own build system, this is now safe to use. This | ||
| 1001 | + prevents methods that are not part of the public API from being | ||
| 1002 | + exported by the shared library, and makes qpdf's ELF shared | ||
| 1003 | + libraries (used on Linux, macOS, and most other UNIX flavors) | |
| 1004 | + behave more like the Windows DLL. Since the DLL already behaves | ||
| 1005 | + in much this way, it is unlikely that there are any methods | ||
| 1006 | + that were accidentally not exported. However, with ELF shared | ||
| 1007 | + libraries, typeinfo for some classes has to be explicitly | ||
| 1008 | + exported. If there are problems in dynamically linked code | ||
| 1009 | + catching exceptions or subclassing, this could be the reason. | ||
| 1010 | + If you see this, please report a bug at | ||
| 1011 | + https://github.com/qpdf/qpdf/issues/. | ||
| 1012 | + | ||
| 1013 | + - QPDF is now compiled with integer conversion and sign | ||
| 1014 | + conversion warnings enabled. Numerous changes were made to the | ||
| 1015 | + library to make this safe. | ||
| 1016 | + | ||
| 1017 | + - QPDF's :command:`make install` target explicitly | ||
| 1018 | + specifies the mode to use when installing files instead of | ||
| 1019 | + relying on the user's umask. It was previously doing this for some | |
| 1020 | + files but not others. | ||
| 1021 | + | ||
| 1022 | + - If :command:`pkg-config` is available, use it to | ||
| 1023 | + locate :file:`libjpeg` and | ||
| 1024 | + :file:`zlib` dependencies, falling back on | ||
| 1025 | + old behavior if unsuccessful. | ||
| 1026 | + | ||
| 1027 | + - Other Notes | ||
| 1028 | + | ||
| 1029 | + - QPDF has been fully integrated into `Google's OSS-Fuzz | ||
| 1030 | + project <https://github.com/google/oss-fuzz>`__. This project | ||
| 1031 | + exercises code with randomly mutated inputs and is great for | ||
| 1032 | + discovering hidden crashes and security issues. | |
| 1033 | + Several bugs found by oss-fuzz have already been fixed in qpdf. | ||
| 1034 | + | ||
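The range-checked conversions that the 9.0.0 notes attribute to the ``QIntC`` namespace can be approximated with a short sketch. The ``checked_cast`` template below is hypothetical and is not the interface of :file:`qpdf/QIntC.hh`. It only demonstrates the idea of refusing a cast that would lose information.

```cpp
#include <stdexcept>
#include <type_traits>

// Convert between integer types, throwing if the value does not survive
// the round trip or if the sign changes along the way.
template <typename To, typename From>
To checked_cast(From value)
{
    static_assert(
        std::is_integral<To>::value && std::is_integral<From>::value,
        "checked_cast only supports integer types");
    To const result = static_cast<To>(value);
    bool const round_trips = (static_cast<From>(result) == value);
    bool const same_sign = ((result < To{}) == (value < From{}));
    if (!(round_trips && same_sign)) {
        throw std::range_error("integer conversion would lose information");
    }
    return result;
}
```

A call such as ``checked_cast<int>(42LL)`` returns 42, while ``checked_cast<unsigned int>(-1)`` and ``checked_cast<unsigned char>(300)`` throw ``std::range_error``. The round-trip test catches truncation and the sign test catches values that wrap between signed and unsigned types of the same width.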
| 1035 | +8.4.2: May 18, 2019 | ||
| 1036 | + This release has just one change: correction of a buffer overrun in | ||
| 1037 | + the Windows code used to open files. Windows users should take this | ||
| 1038 | + update. There are no code changes that affect non-Windows releases. | ||
| 1039 | + | ||
| 1040 | +8.4.1: April 27, 2019 | ||
| 1041 | + - Enhancements | ||
| 1042 | + | ||
| 1043 | + - When :command:`qpdf --version` is run, it will | ||
| 1044 | + detect if the qpdf CLI was built with a different version of | ||
| 1045 | + qpdf than the library, which may indicate a problem with the | ||
| 1046 | + installation. | ||
| 1047 | + | ||
| 1048 | + - New option :samp:`--remove-page-labels` will | ||
| 1049 | + remove page labels before generating output. This used to | ||
| 1050 | + happen if you ran :command:`qpdf --empty --pages .. | ||
| 1051 | + --`, but the behavior changed in qpdf 8.3.0. This | ||
| 1052 | + option enables people who were relying on the old behavior to | ||
| 1053 | + get it again. | ||
| 1054 | + | ||
| 1055 | + - New option | ||
| 1056 | + :samp:`--keep-files-open-threshold={count}` | ||
| 1057 | + can be used to override the number of files that qpdf will use to | |
| 1058 | + trigger the behavior of not keeping all files open when merging | ||
| 1059 | + files. This may be necessary if your system allows fewer than | ||
| 1060 | + the default value of 200 files to be open at the same time. | ||
| 1061 | + | ||
| 1062 | + - Bug Fixes | ||
| 1063 | + | ||
| 1064 | + - Handle Unicode characters in filenames on Windows. The changes | ||
| 1065 | + to support Unicode on the CLI in Windows broke Unicode | ||
| 1066 | + filenames for Windows. | ||
| 1067 | + | ||
| 1068 | + - Slightly tighten logic that determines whether an object is a | ||
| 1069 | + page. This should resolve problems in some rare files where | ||
| 1070 | + some non-page objects were passing qpdf's test for whether | ||
| 1071 | + something was a page, thus causing them to be erroneously lost | ||
| 1072 | + during page splitting operations. | ||
| 1073 | + | ||
| 1074 | + - Revert change that included preservation of outlines | ||
| 1075 | + (bookmarks) in :samp:`--split-pages`. The way | ||
| 1076 | + it was implemented in 8.3.0 and 8.4.0 caused a very significant | ||
| 1077 | + degradation of performance for splitting certain files. A | ||
| 1078 | + future release of qpdf may re-introduce the behavior in a more | ||
| 1079 | + performant and also more correct fashion. | ||
| 1080 | + | ||
| 1081 | + - In JSON mode, add missing leading 0 to decimal values between | ||
| 1082 | + -1 and 1 even if not present in the input. The JSON | ||
| 1083 | + specification requires the leading 0. The PDF specification | ||
| 1084 | + does not. | ||
| 1085 | + | ||
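The JSON fix in the 8.4.1 notes above (adding the leading 0 that PDF allows numbers to omit) amounts to a small string transformation. The helper name ``add_leading_zero`` is hypothetical and this is only a sketch of the idea, not qpdf's code. It assumes the input is already a syntactically valid PDF number.

```cpp
#include <string>

// Insert the leading 0 that JSON requires when a decimal begins with "."
// or "-.". Any other input is returned unchanged.
std::string add_leading_zero(std::string const& number)
{
    if (!number.empty() && number[0] == '.') {
        return "0" + number;
    }
    if (number.size() > 1 && number[0] == '-' && number[1] == '.') {
        return "-0" + number.substr(1);
    }
    return number;
}
```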
| 1086 | +8.4.0: February 1, 2019 | ||
| 1087 | + - Command-line Enhancements | ||
| 1088 | + | ||
| 1089 | + - *Non-compatible CLI change:* The qpdf command-line tool | ||
| 1090 | + interprets passwords given at the command-line differently from | ||
| 1091 | + previous releases when the passwords contain non-ASCII | ||
| 1092 | + characters. In some cases, the behavior differs from previous | ||
| 1093 | + releases. For a discussion of the current behavior, please see | ||
| 1094 | + :ref:`ref.unicode-passwords`. The | ||
| 1095 | + incompatibilities are as follows: | ||
| 1096 | + | ||
| 1097 | + - On Windows, qpdf now receives all command-line options as | ||
| 1098 | + Unicode strings if it can figure out the appropriate | ||
| 1099 | + compile/link options. This is enabled at least for MSVC and | ||
| 1100 | + mingw builds. That means that if non-ASCII strings are | ||
| 1101 | + passed to the qpdf CLI in Windows, qpdf will now correctly | ||
| 1102 | + receive them. In the past, they would have either been | ||
| 1103 | + encoded as Windows code page 1252 (also known as "Windows | |
| 1104 | + ANSI") or as something unintelligible. In almost all cases, | |
| 1105 | + qpdf is able to properly interpret Unicode arguments now, | ||
| 1106 | + whereas in the past, it would almost never interpret them | ||
| 1107 | + properly. The result is that non-ASCII passwords given to | ||
| 1108 | + the qpdf CLI on Windows now have a much greater chance of | ||
| 1109 | + creating PDF files that can be opened by a variety of | ||
| 1110 | + readers. In the past, usually files encrypted from the | ||
| 1111 | + Windows CLI using non-ASCII passwords would not be readable | ||
| 1112 | + by most viewers. Note that the current version of qpdf is | ||
| 1113 | + able to decrypt files that it previously created using the | ||
| 1114 | + previously supplied password. | ||
| 1115 | + | ||
| 1116 | + - The PDF specification requires passwords to be encoded as | ||
| 1117 | + UTF-8 for 256-bit encryption and with PDF Doc encoding for | ||
| 1118 | + 40-bit or 128-bit encryption. Older versions of qpdf left it | ||
| 1119 | + up to the user to provide passwords with the correct | ||
| 1120 | + encoding. The qpdf CLI now detects when a password is given | ||
| 1121 | + with UTF-8 encoding and automatically transcodes it to what | ||
| 1122 | + the PDF spec requires. While this is almost always the | ||
| 1123 | + correct behavior, it is possible to override the behavior if | ||
| 1124 | + there is some reason to do so. This is discussed in more | ||
| 1125 | + depth in :ref:`ref.unicode-passwords`. | ||
| 1126 | + | ||
| 1127 | + - New options | ||
| 1128 | + :samp:`--externalize-inline-images`, | ||
| 1129 | + :samp:`--ii-min-bytes`, and | ||
| 1130 | + :samp:`--keep-inline-images` control qpdf's | ||
| 1131 | + handling of inline images and possible conversion of them to | ||
| 1132 | + regular images. By default, | ||
| 1133 | + :samp:`--optimize-images` now also applies to | ||
| 1134 | + inline images. These options are discussed in :ref:`ref.advanced-transformation`. | ||
| 1135 | + | ||
| 1136 | + - Add options :samp:`--overlay` and | ||
| 1137 | + :samp:`--underlay` for overlaying or | ||
| 1138 | + underlaying pages of other files onto output pages. See | ||
| 1139 | + :ref:`ref.overlay-underlay` for | ||
| 1140 | + details. | ||
| 1141 | + | ||
| 1142 | + - When opening an encrypted file with a password, if the | ||
| 1143 | + specified password doesn't work and the password contains any | ||
| 1144 | + non-ASCII characters, qpdf will try a number of alternative | ||
| 1145 | + passwords to try to compensate for possible character encoding | ||
| 1146 | + errors. This behavior can be suppressed with the | ||
| 1147 | + :samp:`--suppress-password-recovery` option. | ||
| 1148 | + See :ref:`ref.unicode-passwords` for a full | ||
| 1149 | + discussion. | ||
| 1150 | + | ||
| 1151 | + - Add the :samp:`--password-mode` option to | ||
| 1152 | + fine-tune how qpdf interprets password arguments, especially | ||
| 1153 | + when they contain non-ASCII characters. See :ref:`ref.unicode-passwords` for more information. | ||
| 1154 | + | ||
| 1155 | + - In the :samp:`--pages` option, it is now | ||
| 1156 | + possible to copy the same page more than once from the same | ||
| 1157 | + file without using the previous workaround of specifying two | ||
| 1158 | + different paths to the same file. | ||
| 1159 | + | ||
| 1160 | + - In the :samp:`--pages` option, allow use of "." | ||
| 1161 | + as a shortcut for the primary input file. That way, you can do | ||
| 1162 | + :command:`qpdf in.pdf --pages . 1-2 -- out.pdf` | ||
| 1163 | + instead of having to repeat :file:`in.pdf` | ||
| 1164 | + in the command. | ||
| 1165 | + | ||
| 1166 | + - When encrypting with 128-bit and 256-bit encryption, new | ||
| 1167 | + encryption options :samp:`--assemble`, | ||
| 1168 | + :samp:`--annotate`, | ||
| 1169 | + :samp:`--form`, and | ||
| 1170 | + :samp:`--modify-other` allow more fine-grained | ||
| 1171 | + control over permissions. Before, the | ||
| 1172 | + :samp:`--modify` option only configured certain | ||
| 1173 | + predefined groups of permissions. | ||
| 1174 | + | ||
| 1175 | + - Bug Fixes and Enhancements | ||
| 1176 | + | ||
| 1177 | + - *Potential data-loss bug:* Versions of qpdf between 8.1.0 and | ||
| 1178 | + 8.3.0 had a bug that could cause page splitting and merging | ||
| 1179 | + operations to drop some font or image resources if the PDF | ||
| 1180 | + file's internal structure shared these resource lists across | ||
| 1181 | + pages and if some but not all of the pages in the output did | ||
| 1182 | + not reference all the fonts and images. Using the | ||
| 1183 | + :samp:`--preserve-unreferenced-resources` | ||
| 1184 | + option would work around the incorrect behavior. This bug was | ||
| 1185 | + the result of a typo in the code and a deficiency in the test | ||
| 1186 | + suite. The case that triggered the error was known, just not | ||
| 1187 | + handled properly. This case is now exercised in qpdf's test | ||
| 1188 | + suite and properly handled. | ||
| 1189 | + | ||
| 1190 | + - When optimizing images, detect and refuse to optimize images | ||
| 1191 | + that can't be converted to JPEG because of bit depth or color | ||
| 1192 | + space. | ||
| 1193 | + | ||
| 1194 | + - Linearization and page manipulation APIs now detect and recover | ||
| 1195 | + from files that have duplicate Page objects in the pages tree. | ||
| 1196 | + | ||
| 1197 | + - Bug fix: when using the older option | ||
| 1198 | + :samp:`--stream-data=compress` with object | ||
| 1199 | + streams, object streams and xref streams were not compressed. | ||
| 1200 | + | ||
| 1201 | + - When the tokenizer returns inline image tokens, delimiters | ||
| 1202 | + following ``ID`` and ``EI`` operators are no longer excluded. | ||
| 1203 | + This makes it possible to reliably extract the actual image | ||
| 1204 | + data. | ||
| 1205 | + | ||
| 1206 | + - Library Enhancements | ||
| 1207 | + | ||
| 1208 | + - Add method ``QPDFPageObjectHelper::externalizeInlineImages`` to | ||
| 1209 | + convert inline images to regular images. | ||
| 1210 | + | ||
| 1211 | + - Add method ``QUtil::possible_repaired_encodings()`` to generate | ||
| 1212 | + a list of strings that represent other ways the given string | ||
| 1213 | + could have been encoded. This is the method the QPDF CLI uses | ||
| 1214 | + to generate the strings it tries when recovering incorrectly | ||
| 1215 | + encoded Unicode passwords. | ||
| 1216 | + | ||
| 1217 | + - Add new versions of | ||
| 1218 | + ``QPDFWriter::setR{3,4,5,6}EncryptionParameters`` that allow | ||
| 1219 | + more granular setting of permissions bits. See | ||
| 1220 | + :file:`QPDFWriter.hh` for details. | ||
| 1221 | + | ||
| 1222 | + - Add new versions of the transcoders from UTF-8 to single-byte | ||
| 1223 | + coding systems in ``QUtil`` that report success or failure | ||
| 1224 | + rather than just substituting a specified unknown character. | ||
| 1225 | + | ||
| 1226 | + - Add method ``QUtil::analyze_encoding()`` to determine whether a | ||
| 1227 | + string has high-bit characters and whether it appears to be UTF-16 or | ||
| 1228 | + valid UTF-8 encoding. | ||
| 1229 | + | ||
| 1230 | + - Add new method ``QPDFPageObjectHelper::shallowCopyPage()`` to | ||
| 1231 | + create a new page that is a "shallow copy" of a page. The | ||
| 1232 | + resulting object is an indirect object ready to be passed to | ||
| 1233 | + ``QPDFPageDocumentHelper::addPage()`` for either the original | ||
| 1234 | + ``QPDF`` object or a different one. This is what the | ||
| 1235 | + :command:`qpdf` command-line tool uses to copy | ||
| 1236 | + the same page multiple times from the same file during | ||
| 1237 | + splitting and merging operations. | ||
| 1238 | + | ||
| 1239 | + - Add method ``QPDF::getUniqueId()``, which returns a unique | ||
| 1240 | + identifier for the given QPDF object. The identifier will be | ||
| 1241 | + unique across the life of the application. The returned value | ||
| 1242 | + can be safely used as a map key. | ||
| 1243 | + | ||
| 1244 | + - Add method ``QPDF::setImmediateCopyFrom``. This further | ||
| 1245 | + enhances qpdf's ability to allow a ``QPDF`` object from which | ||
| 1246 | + objects are being copied to go out of scope before the | ||
| 1247 | + destination object is written. If you call this method on a | ||
| 1248 | + ``QPDF`` instance, objects copied *from* this instance will be | ||
| 1249 | + copied immediately instead of lazily. This option uses more | ||
| 1250 | + memory but allows the source object to go out of scope before | ||
| 1251 | + the destination object is written in all cases. See comments in | ||
| 1252 | + :file:`QPDF.hh` for details. | ||
| 1253 | + | ||
| 1254 | + - Add method ``QPDFPageObjectHelper::getAttribute`` for | ||
| 1255 | + retrieving an attribute from the page dictionary taking | ||
| 1256 | + inheritance into consideration, and optionally making a copy if | ||
| 1257 | + your intention is to modify the attribute. | ||
| 1258 | + | ||
| 1259 | + - Fix long-standing limitation of | ||
| 1260 | + ``QPDFPageObjectHelper::getPageImages`` so that it now properly | ||
| 1261 | + reports images from inherited resources dictionaries, | ||
| 1262 | + eliminating the need to call | ||
| 1263 | + ``QPDFPageDocumentHelper::pushInheritedAttributesToPage`` in | ||
| 1264 | + this case. | ||
| 1265 | + | ||
| 1266 | + - Add method ``QPDFObjectHandle::getUniqueResourceName`` for | ||
| 1267 | + finding an unused name in a resource dictionary. | ||
| 1268 | + | ||
| 1269 | + - Add method ``QPDFPageObjectHelper::getFormXObjectForPage`` for | ||
| 1270 | + generating a form XObject equivalent to a page. The resulting | ||
| 1271 | + object can be used in the same file or copied to another file | ||
| 1272 | + with ``copyForeignObject``. This can be useful for implementing | ||
| 1273 | + underlay, overlay, n-up, thumbnails, or any other functionality | ||
| 1274 | + requiring replication of pages in other contexts. | ||
| 1275 | + | ||
| 1276 | + - Add method ``QPDFPageObjectHelper::placeFormXObject`` for | ||
| 1277 | + generating content stream text that places a given form XObject | ||
| 1278 | + on a page, centered and fit within a specified rectangle. This | ||
| 1279 | + method takes care of computing the proper transformation matrix | ||
| 1280 | + and may optionally compensate for rotation or scaling of the | ||
| 1281 | + destination page. | ||
| 1282 | + | ||
| 1283 | + - Build Improvements | ||
| 1284 | + | ||
| 1285 | + - Add new configure option | ||
| 1286 | + :samp:`--enable-avoid-windows-handle`, which | ||
| 1287 | + causes the preprocessor symbol ``AVOID_WINDOWS_HANDLE`` to be | ||
| 1288 | + defined. When defined, qpdf will avoid referencing the Windows | ||
| 1289 | + ``HANDLE`` type, which is disallowed with certain versions of | ||
| 1290 | + the Windows SDK. | ||
| 1291 | + | ||
| 1292 | + - For Windows builds, attempt to determine what options, if any, | ||
| 1293 | + have to be passed to the compiler and linker to enable use of | ||
| 1294 | + ``wmain``. This causes the preprocessor symbol | ||
| 1295 | + ``WINDOWS_WMAIN`` to be defined. If you do your own builds with | ||
| 1296 | + other compilers, you can define this symbol to cause ``wmain`` | ||
| 1297 | + to be used. This is needed to allow the Windows | ||
| 1298 | + :command:`qpdf` command to receive Unicode | ||
| 1299 | + command-line options. | ||
| 1300 | + | ||
| 1301 | +8.3.0: January 7, 2019 | ||
| 1302 | + - Command-line Enhancements | ||
| 1303 | + | ||
| 1304 | + - Shell completion: you can now use eval :command:`$(qpdf | ||
| 1305 | + --completion-bash)` and eval :command:`$(qpdf | ||
| 1306 | + --completion-zsh)` to enable shell completion for | ||
| 1307 | + bash and zsh. | ||
| 1308 | + | ||
| 1309 | + - Page numbers (also known as page labels) are now preserved when | ||
| 1310 | + merging and splitting files with the | ||
| 1311 | + :samp:`--pages` and | ||
| 1312 | + :samp:`--split-pages` options. | ||
| 1313 | + | ||
| 1314 | + - Bookmarks are partially preserved when splitting pages with the | ||
| 1315 | + :samp:`--split-pages` option. Specifically, the | ||
| 1316 | + outlines dictionary and some supporting metadata are copied | ||
| 1317 | + into the split files. The result is that all bookmarks from the | ||
| 1318 | + original file appear, those that point to pages that are | ||
| 1319 | + preserved work, and those that point to pages that are not | ||
| 1320 | + preserved don't do anything. This is an interim step toward | ||
| 1321 | + proper support for bookmarks in splitting and merging | ||
| 1322 | + operations. | ||
| 1323 | + | ||
| 1324 | + - Page collation: add new option | ||
| 1325 | + :samp:`--collate`. When specified, the | ||
| 1326 | + semantics of :samp:`--pages` change from | ||
| 1327 | + concatenation to collation. See :ref:`ref.page-selection` for examples and discussion. | ||
| 1328 | + | ||
| 1329 | + - Generation of information in JSON format, primarily to | ||
| 1330 | + facilitate use of qpdf from languages other than C++. Add new | ||
| 1331 | + options :samp:`--json`, | ||
| 1332 | + :samp:`--json-key`, and | ||
| 1333 | + :samp:`--json-object` to generate a JSON | ||
| 1334 | + representation of the PDF file. Run :command:`qpdf | ||
| 1335 | + --json-help` to get a description of the JSON | ||
| 1336 | + format. For more information, see :ref:`ref.json`. | ||
| 1337 | + | ||
| 1338 | + - The :samp:`--generate-appearances` flag will | ||
| 1339 | + cause qpdf to generate appearances for form fields if the PDF | ||
| 1340 | + file indicates that form field appearances are out of date. | ||
| 1341 | + This can happen when PDF forms are filled in by a program that | ||
| 1342 | + doesn't know how to regenerate the appearances of the filled-in | ||
| 1343 | + fields. | ||
| 1344 | + | ||
| 1345 | + - The :samp:`--flatten-annotations` flag can be | ||
| 1346 | + used to *flatten* annotations, including form fields. | ||
| 1347 | + Ordinarily, annotations are drawn separately from the page. | ||
| 1348 | + Flattening annotations is the process of combining their | ||
| 1349 | + appearances into the page's contents. You might want to do this | ||
| 1350 | + if you are going to rotate or combine pages using a tool that | ||
| 1351 | + doesn't understand annotations. You may also want to use | ||
| 1352 | + :samp:`--generate-appearances` when using this | ||
| 1353 | + flag since annotations for outdated form fields are not | ||
| 1354 | + flattened as that would cause loss of information. | ||
| 1355 | + | ||
| 1356 | + - The :samp:`--optimize-images` flag tells qpdf | ||
| 1357 | + to recompress every image using DCT (JPEG) compression as | ||
| 1358 | + long as the image is not already compressed with lossy | ||
| 1359 | + compression and recompressing the image reduces its size. The | ||
| 1360 | + additional options :samp:`--oi-min-width`, | ||
| 1361 | + :samp:`--oi-min-height`, and | ||
| 1362 | + :samp:`--oi-min-area` prevent recompression of | ||
| 1363 | + images whose width, height, or pixel area (width × height) are | ||
| 1364 | + below a specified threshold. | ||
| 1365 | + | ||
| 1366 | + - The :samp:`--show-object` option can now be | ||
| 1367 | + given as :samp:`--show-object=trailer` to show | ||
| 1368 | + the trailer dictionary. | ||
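
The difference between concatenation and collation mentioned for :samp:`--collate` above can be illustrated with a minimal sketch. This is not qpdf's implementation: the function names and page labels are hypothetical, and skipping a group once it runs out of pages is an assumption about the intended behavior.

```python
from itertools import zip_longest


def concatenate(page_groups):
    """Default order: every page of the first group, then the next group."""
    return [page for group in page_groups for page in group]


def collate(page_groups):
    """Collated order: one page from each group in turn.

    Assumption: a group that has run out of pages is simply skipped.
    """
    missing = object()  # sentinel marking an exhausted group
    result = []
    for turn in zip_longest(*page_groups, fillvalue=missing):
        result.extend(page for page in turn if page is not missing)
    return result


# Hypothetical page labels: odd pages scanned into one file, even into another.
odd = ["a1", "a3", "a5"]
even = ["b2", "b4"]
print(concatenate([odd, even]))
print(collate([odd, even]))
```

Collation of this kind is typically useful for reassembling a document whose odd and even sides were scanned into separate files.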
| 1369 | + | ||
| 1370 | + - Bug Fixes and Enhancements | ||
| 1371 | + | ||
| 1372 | + - QPDF now automatically detects and recovers from dangling | ||
| 1373 | + references. If a PDF file contained an indirect reference to a | ||
| 1374 | + non-existent object, which is valid, when adding a new object | ||
| 1375 | + to the file, it was possible for the new object to take the | ||
| 1376 | + object ID of the dangling reference, thereby causing the | ||
| 1377 | + dangling reference to point to the new object. This case is now | ||
| 1378 | + prevented. | ||
| 1379 | + | ||
| 1380 | + - Fixes to form field setting code: strings are always written in | ||
| 1381 | + UTF-16 format, and checkboxes and radio buttons are handled | ||
| 1382 | + properly with respect to synchronization of values and | ||
| 1383 | + appearance states. | ||
| 1384 | + | ||
| 1385 | + - The ``QPDF::checkLinearization()`` method no longer causes the program | ||
| 1386 | + to crash when it detects problems with linearization data. | ||
| 1387 | + Instead, it issues a normal warning or error. | ||
| 1388 | + | ||
| 1389 | + - Ordinarily qpdf treats an argument of the form | ||
| 1390 | + :samp:`@file` to mean that command-line options | ||
| 1391 | + should be read from :file:`file`. Now, if | ||
| 1392 | + :file:`file` does not exist but | ||
| 1393 | + :file:`@file` does, qpdf will treat | ||
| 1394 | + :file:`@file` as a regular option. This | ||
| 1395 | + makes it possible to work more easily with PDF files whose | ||
| 1396 | + names happen to start with the ``@`` character. | ||
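
The :samp:`@file` rule described above can be sketched as a small decision function. This is only an illustration of the stated rule, not qpdf's code: ``classify_argument`` and its return values are hypothetical, and keeping the options-file interpretation when neither path exists is an assumption (the text does not cover that case).

```python
import os


def classify_argument(arg):
    """Classify one command-line argument.

    Returns ("options-file", path) when options should be read from a file,
    or ("regular", arg) when the argument is used as given.
    """
    if arg.startswith("@") and len(arg) > 1:
        options_path = arg[1:]
        # New rule: "file" is missing but "@file" exists, so the argument
        # is a regular one (for example, a PDF whose name starts with "@").
        if not os.path.exists(options_path) and os.path.exists(arg):
            return ("regular", arg)
        # Assumption: in every other case the ordinary meaning is kept.
        return ("options-file", options_path)
    return ("regular", arg)
```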
| 1397 | + | ||
| 1398 | + - Library Enhancements | ||
| 1399 | + | ||
| 1400 | + - Remove the restriction in most cases that the source QPDF | ||
| 1401 | + object used in a ``QPDF::copyForeignObject`` call has to stick | ||
| 1402 | + around until the destination QPDF is written. The exceptional | ||
| 1403 | + case is when the source stream gets its data using a | ||
| 1404 | + ``QPDFObjectHandle::StreamDataProvider``. For a more in-depth | ||
| 1405 | + discussion, see comments around ``copyForeignObject`` in | ||
| 1406 | + :file:`QPDF.hh`. | ||
| 1407 | + | ||
| 1408 | + - Add new method ``QPDFWriter::getFinalVersion()``, which returns | ||
| 1409 | + the PDF version that will ultimately be written to the final | ||
| 1410 | + file. See comments in :file:`QPDFWriter.hh` | ||
| 1411 | + for some restrictions on its use. | ||
| 1412 | + | ||
| 1413 | + - Add several methods for transcoding strings to some of the | ||
| 1414 | + character sets used in PDF files: ``QUtil::utf8_to_ascii``, | ||
| 1415 | + ``QUtil::utf8_to_win_ansi``, ``QUtil::utf8_to_mac_roman``, and | ||
| 1416 | + ``QUtil::utf8_to_utf16``. For the single-byte encodings that | ||
| 1417 | + support only a limited character set, these methods replace | ||
| 1418 | + unsupported characters with a specified substitute. | ||
| 1419 | + | ||
| 1420 | + - Add new methods to ``QPDFAnnotationObjectHelper`` and | ||
| 1421 | + ``QPDFFormFieldObjectHelper`` for querying flags and | ||
| 1422 | + interpretation of different field types. Define constants in | ||
| 1423 | + :file:`qpdf/Constants.h` to help with | ||
| 1424 | + interpretation of flag values. | ||
| 1425 | + | ||
| 1426 | + - Add new methods | ||
| 1427 | + ``QPDFAcroFormDocumentHelper::generateAppearancesIfNeeded`` and | ||
| 1428 | + ``QPDFFormFieldObjectHelper::generateAppearance`` for | ||
| 1429 | + generating appearance streams. See discussion in | ||
| 1430 | + :file:`QPDFFormFieldObjectHelper.hh` for | ||
| 1431 | + limitations. | ||
| 1432 | + | ||
| 1433 | + - Add two new helper functions for dealing with resource | ||
| 1434 | + dictionaries: ``QPDFObjectHandle::getResourceNames()`` returns | ||
| 1435 | + a list of all second-level keys, which correspond to the names | ||
| 1436 | + of resources, and ``QPDFObjectHandle::mergeResources()`` merges | ||
| 1437 | + two resources dictionaries as long as they have non-conflicting | ||
| 1438 | + keys. These methods are useful for certain types of objects | ||
| 1439 | + that resolve resources from multiple places, such as form | ||
| 1440 | + fields. | ||
| 1441 | + | ||
| 1442 | + - Add methods ``QPDFPageDocumentHelper::flattenAnnotations()`` | ||
| 1443 | + and | ||
| 1444 | + ``QPDFAnnotationObjectHelper::getPageContentForAppearance()`` | ||
| 1445 | + for handling low-level details of annotation flattening. | ||
| 1446 | + | ||
| 1447 | + - Add new helper classes: ``QPDFOutlineDocumentHelper``, | ||
| 1448 | + ``QPDFOutlineObjectHelper``, ``QPDFPageLabelDocumentHelper``, | ||
| 1449 | + ``QPDFNameTreeObjectHelper``, and | ||
| 1450 | + ``QPDFNumberTreeObjectHelper``. | ||
| 1451 | + | ||
| 1452 | + - Add method ``QPDFObjectHandle::getJSON()`` that returns a JSON | ||
| 1453 | + representation of the object. Call ``serialize()`` on the | ||
| 1454 | + result to convert it to a string. | ||
| 1455 | + | ||
| 1456 | + - Add a simple JSON serializer. This is not a complete or | ||
| 1457 | + general-purpose JSON library. It allows assembly and | ||
| 1458 | + serialization of JSON structures with some restrictions, which | ||
| 1459 | + are described in the header file. This is the serializer used | ||
| 1460 | + by qpdf's new JSON representation. | ||
| 1461 | + | ||
| 1462 | + - Add new ``QPDFObjectHandle::Matrix`` class along with a few | ||
| 1463 | + convenience methods for dealing with six-element numerical | ||
| 1464 | + arrays as matrices. | ||
| 1465 | + | ||
| 1466 | + - Add new method ``QPDFObjectHandle::wrapInArray``, which returns | ||
| 1467 | + the object itself if it is an array, or an array containing the | ||
| 1468 | + object otherwise. This is a common construct in PDF. This | ||
| 1469 | + method prevents you from having to explicitly test whether | ||
| 1470 | + something is a single element or an array. | ||
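
The "single element or array" construct that ``wrapInArray`` addresses can be shown with a language-neutral sketch. The Python below is an analogy only, using plain lists and dictionaries in place of qpdf objects; ``wrap_in_array`` and ``content_streams`` are hypothetical names and are not part of the qpdf API.

```python
def wrap_in_array(obj):
    """Return obj unchanged if it is already a list, else a one-element list."""
    return obj if isinstance(obj, list) else [obj]


def content_streams(page):
    """List a page's content streams, whether /Contents is one item or many."""
    return wrap_in_array(page["/Contents"])


# Callers no longer need to test which of the two forms they were given.
single = {"/Contents": "stream-1"}
several = {"/Contents": ["stream-1", "stream-2"]}
print(content_streams(single))
print(content_streams(several))
```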
| 1471 | + | ||
| 1472 | + - Build Improvements | ||
| 1473 | + | ||
| 1474 | + - It is no longer necessary to run | ||
| 1475 | + :command:`autogen.sh` to build from a pristine | ||
| 1476 | + checkout. Automatically generated files are now committed so | ||
| 1477 | + that it is possible to build on platforms without autoconf | ||
| 1478 | + directly from a clean checkout of the repository. The | ||
| 1479 | + :command:`configure` script detects if the files | ||
| 1480 | + are out of date when it also determines that the tools are | ||
| 1481 | + present to regenerate them. | ||
| 1482 | + | ||
| 1483 | + - Pull requests and the master branch are now built automatically | ||
| 1484 | + in `Azure | ||
| 1485 | + Pipelines <https://dev.azure.com/qpdf/qpdf/_build>`__, which is | ||
| 1486 | + free for open source projects. The build includes Linux, macOS, | ||
| 1487 | + Windows 32-bit and 64-bit with mingw and MSVC, and an AppImage | ||
| 1488 | + build. Official qpdf releases are now built with Azure | ||
| 1489 | + Pipelines. | ||
| 1490 | + | ||
| 1491 | + - Notes for Packagers | ||
| 1492 | + | ||
| 1493 | + - A new section has been added to the documentation with notes | ||
| 1494 | + for packagers. Please see :ref:`ref.packaging`. | ||
| 1495 | + | ||
| 1496 | + - The qpdf build detects out-of-date automatically generated files. If | ||
| 1497 | + your packaging system automatically refreshes libtool or | ||
| 1498 | + autoconf files, it could cause this check to fail. To avoid | ||
| 1499 | + this problem, pass | ||
| 1500 | + :samp:`--disable-check-autofiles` to | ||
| 1501 | + :command:`configure`. | ||
| 1502 | + | ||
| 1503 | + - If you would like to have qpdf completion enabled | ||
| 1504 | + automatically, you can install completion files in the | ||
| 1505 | + distribution's default location. You can find sample completion | ||
| 1506 | + files to install in the :file:`completions` | ||
| 1507 | + directory. | ||
| 1508 | + | ||
| 1509 | +8.2.1: August 18, 2018 | ||
| 1510 | + - Command-line Enhancements | ||
| 1511 | + | ||
| 1512 | + - Add | ||
| 1513 | + :samp:`--keep-files-open={[yn]}` | ||
| 1514 | + to override default determination of whether to keep files open | ||
| 1515 | + when merging. Please see the discussion of | ||
| 1516 | + :samp:`--keep-files-open` in :ref:`ref.basic-options` for additional details. | ||
| 1517 | + | ||
| 1518 | +8.2.0: August 16, 2018 | ||
| 1519 | + - Command-line Enhancements | ||
| 1520 | + | ||
| 1521 | + - Add :samp:`--no-warn` option to suppress | ||
| 1522 | + issuing warning messages. If there are any conditions that | ||
| 1523 | + would have caused warnings to be issued, the exit status is | ||
| 1524 | + still 3. | ||
| 1525 | + | ||
| 1526 | + - Bug Fixes and Optimizations | ||
| 1527 | + | ||
| 1528 | + - Performance fix: optimize page merging operation to avoid | ||
| 1529 | + unnecessary open/close calls on files being merged. This solves | ||
| 1530 | + a dramatic slow-down that was observed when merging certain | ||
| 1531 | + types of files. | ||
| 1532 | + | ||
| 1533 | + - Optimize how memory was used for the TIFF predictor, | ||
| 1534 | + drastically improving performance and memory usage for files | ||
| 1535 | + containing high-resolution images compressed with Flate using | ||
| 1536 | + the TIFF predictor. | ||
| 1537 | + | ||
| 1538 | + - Bug fix: end of line characters were not properly handled | ||
| 1539 | + inside strings in some cases. | ||
| 1540 | + | ||
| 1541 | + - Bug fix: using :samp:`--progress` on very small | ||
| 1542 | + files could cause an infinite loop. | ||
| 1543 | + | ||
| 1544 | + - API enhancements | ||
| 1545 | + | ||
| 1546 | + - Add new class ``QPDFSystemError``, derived from | ||
| 1547 | + ``std::runtime_error``, which is now thrown by | ||
| 1548 | + ``QUtil::throw_system_error``. This enables the triggering | ||
| 1549 | + ``errno`` value to be retrieved. | ||
| 1550 | + | ||
| 1551 | + - Add ``ClosedFileInputSource::stayOpen`` method, enabling a | ||
| 1552 | + ``ClosedFileInputSource`` to stay open during manually | ||
| 1553 | + indicated periods of high activity, thus reducing the overhead | ||
| 1554 | + of frequent open/close operations. | ||
| 1555 | + | ||
| 1556 | + - Build Changes | ||
| 1557 | + | ||
| 1558 | + - For the mingw builds, change the name of the DLL import library | ||
| 1559 | + from :file:`libqpdf.a` to | ||
| 1560 | + :file:`libqpdf.dll.a` to more accurately | ||
| 1561 | + reflect that it is an import library rather than a static | ||
| 1562 | + library. This potentially clears the way for supporting a | ||
| 1563 | + static library in the future, though presently, the qpdf | ||
| 1564 | + Windows build only builds the DLL and executables. | ||
| 1565 | + | ||
| 1566 | +8.1.0: June 23, 2018 | ||
| 1567 | + - Usability Improvements | ||
| 1568 | + | ||
| 1569 | + - When splitting files, qpdf detects fonts and images that the | ||
| 1570 | + document metadata claims are referenced from a page but are not | ||
| 1571 | + actually referenced and omits them from the output file. This | ||
| 1572 | + change can cause a significant reduction in the size of split | ||
| 1573 | + PDF files for files created by some software packages. In some | ||
| 1574 | + cases, it can also make page splitting slower. Prior versions | ||
| 1575 | + of qpdf would believe the document metadata and sometimes | ||
| 1576 | + include all the images from all the other pages even though the | ||
| 1577 | + pages were no longer present. In the unlikely event that the | ||
| 1578 | + old behavior should be desired, or if you have a case where | ||
| 1579 | + page splitting is very slow, the old behavior (and speed) can | ||
| 1580 | + be enabled by specifying | ||
| 1581 | + :samp:`--preserve-unreferenced-resources`. For | ||
| 1582 | + additional details, please see :ref:`ref.advanced-transformation`. | ||
| 1583 | + | ||
| 1584 | + - When merging multiple PDF files, qpdf no longer leaves all the | ||
| 1585 | + files open. This makes it possible to merge numbers of files | ||
| 1586 | + that may exceed the operating system's limit for the maximum | ||
| 1587 | + number of open files. | ||
| 1588 | + | ||
| 1589 | + - The :samp:`--rotate` option's syntax has been | ||
| 1590 | + extended to make the page range optional. If you specify | ||
| 1591 | + :samp:`--rotate={angle}` | ||
| 1592 | + without specifying a page range, the rotation will be applied | ||
| 1593 | + to all pages. This can be especially useful for adjusting a PDF | ||
| 1594 | + created from a multi-page document that was scanned upside | ||
| 1595 | + down. | ||
| 1596 | + | ||
| 1597 | + - When merging multiple files, the | ||
| 1598 | + :samp:`--verbose` option now prints information | ||
| 1599 | + about each file as it operates on that file. | ||
| 1600 | + | ||
| 1601 | + - When the :samp:`--progress` option is | ||
| 1602 | + specified, qpdf will print a running indicator of its best | ||
| 1603 | + guess at how far through the writing process it is. Note that, | ||
| 1604 | + as with all progress meters, it's an approximation. This option | ||
| 1605 | + is implemented in a way that makes it useful for software that | ||
| 1606 | + uses the qpdf library; see API Enhancements below. | ||
| 1607 | + | ||
| 1608 | + - Bug Fixes | ||
| 1609 | + | ||
| 1610 | + - Properly decrypt files that use revision 3 of the standard | ||
| 1611 | + security handler but use 40-bit keys (even though revision 3 | ||
| 1612 | + supports 128-bit keys). | ||
| 1613 | + | ||
| 1614 | + - Limit depth of nested data structures to prevent crashes from | ||
| 1615 | + certain types of malformed (malicious) PDFs. | ||
| 1616 | + | ||
| 1617 | + - In "newline before endstream" mode, insert the required extra | ||
| 1618 | + newline before the ``endstream`` at the end of object streams. | ||
| 1619 | + This one case was previously omitted. | ||
| 1620 | + | ||
| 1621 | + - API Enhancements | ||
| 1622 | + | ||
| 1623 | + - The first round of higher level "helper" interfaces has been | ||
| 1624 | + introduced. These are designed to provide a more convenient way | ||
| 1625 | + of interacting with certain document features than using | ||
| 1626 | + ``QPDFObjectHandle`` directly. For details on helpers, see | ||
| 1627 | + :ref:`ref.helper-classes`. Specific additional | ||
| 1628 | + interfaces are described below. | ||
| 1629 | + | ||
| 1630 | + - Add two new document helper classes: ``QPDFPageDocumentHelper`` | ||
| 1631 | + for working with pages, and ``QPDFAcroFormDocumentHelper`` for | ||
| 1632 | + working with interactive forms. No old methods have been | ||
| 1633 | + removed, but ``QPDFPageDocumentHelper`` is now the preferred | ||
| 1634 | + way to perform operations on pages rather than calling the old | ||
| 1635 | + methods in ``QPDFObjectHandle`` and ``QPDF`` directly. Comments | ||
| 1636 | + in the header files direct you to the new interfaces. Please | ||
| 1637 | + see the header files and :file:`ChangeLog` | ||
| 1638 | + for additional details. | ||
| 1639 | + | ||
| 1640 | + - Add three new object helper classes: ``QPDFPageObjectHelper`` for | ||
| 1641 | + pages, ``QPDFFormFieldObjectHelper`` for interactive form | ||
| 1642 | + fields, and ``QPDFAnnotationObjectHelper`` for annotations. All | ||
| 1643 | + three classes are fairly sparse at the moment, but they have | ||
| 1644 | + some useful, basic functionality. | ||
| 1645 | + | ||
| 1646 | + - A new example program | ||
| 1647 | + :file:`examples/pdf-set-form-values.cc` has | ||
| 1648 | + been added that illustrates use of the new document and object | ||
| 1649 | + helpers. | ||
| 1650 | + | ||
| 1651 | + - The method ``QPDFWriter::registerProgressReporter`` has been | ||
| 1652 | + added. This method allows you to register a function that is | ||
| 1653 | + called by ``QPDFWriter`` to update your idea of the percentage | ||
| 1654 | + it thinks it is through writing its output. Client programs can | ||
| 1655 | + use this to implement reasonably accurate progress meters. The | ||
| 1656 | + :command:`qpdf` command line tool uses this to | ||
| 1657 | + implement its :samp:`--progress` option. | ||
| 1658 | + | ||
| 1659 | + - New methods ``QPDFObjectHandle::newUnicodeString`` and | ||
| 1660 | + ``QPDFObject::unparseBinary`` have been added to allow for more | ||
| 1661 | + convenient creation of strings that are explicitly encoded | ||
| 1662 | + using big-endian UTF-16. This is useful for creating strings | ||
| 1663 | + that appear outside of content streams, such as labels, form | ||
| 1664 | + fields, outlines, document metadata, etc. | ||
| 1665 | + | ||
| 1666 | + - A new class ``QPDFObjectHandle::Rectangle`` has been added to | ||
| 1667 | + ease working with PDF rectangles, which are just arrays of four | ||
| 1668 | + numeric values. | ||
| 1669 | + | ||
| 1670 | +8.0.2: March 6, 2018 | ||
| 1671 | + - When a loop is detected while following cross reference streams or | ||
| 1672 | + tables, treat this as damage instead of silently ignoring the | ||
| 1673 | + previous table. This prevents loss of otherwise recoverable data | ||
| 1674 | + in some damaged files. | ||
| 1675 | + | ||
| 1676 | + - Properly handle pages with no contents. | ||
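
The loop handling described for cross reference tables in 8.0.2 can be pictured with a minimal sketch. It assumes a simplified model in which each cross reference section is identified by its offset and points to at most one previous section; ``xref_chain`` and ``DamagedFileError`` are hypothetical names, and this is not how qpdf itself is written.

```python
class DamagedFileError(Exception):
    """Raised when the chain of cross reference sections loops back on itself."""


def xref_chain(previous_of, start_offset):
    """Follow the chain of sections from the newest one backwards.

    previous_of maps a section offset to the offset of the previous section,
    or to None for the oldest section.
    """
    seen = set()
    order = []
    offset = start_offset
    while offset is not None:
        if offset in seen:
            # Report the loop as damage rather than quietly stopping, so the
            # caller can fall back to whatever recovery it supports.
            raise DamagedFileError(f"cross reference loop at offset {offset}")
        seen.add(offset)
        order.append(offset)
        offset = previous_of[offset]
    return order
```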
| 1677 | + | ||
| 1678 | +8.0.1: March 4, 2018 | ||
| 1679 | + - Disregard data check errors when uncompressing ``/FlateDecode`` | ||
| 1680 | + streams. This is consistent with most other PDF readers and allows | ||
| 1681 | + qpdf to recover data from another class of malformed PDF files. | ||
| 1682 | + | ||
| 1683 | + - On the command line when specifying page ranges, support preceding | ||
| 1684 | + a page number by "r" to indicate that it should be counted from | ||
| 1685 | + the end. For example, the range ``r3-r1`` would indicate the last | ||
| 1686 | + three pages of a document. | ||
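
The "r" prefix described above can be sketched as follows. This is an illustration of the counting rule only, not qpdf's parser: ``resolve_page`` and ``expand_range`` are hypothetical helpers, only a single ``first-last`` range is handled, and producing a descending list when the first page is greater than the last is an assumption.

```python
def resolve_page(token, page_count):
    """Convert a token such as "7" or "r2" to a 1-based page number."""
    if token.startswith("r"):
        # "r1" is the last page, "r2" the one before it, and so on.
        number = page_count - int(token[1:]) + 1
    else:
        number = int(token)
    if not 1 <= number <= page_count:
        raise ValueError(f"page {token!r} is out of range")
    return number


def expand_range(spec, page_count):
    """Expand one "first-last" range (or a single page) into page numbers."""
    first_token, _, last_token = spec.partition("-")
    first = resolve_page(first_token, page_count)
    last = resolve_page(last_token or first_token, page_count)
    step = 1 if first <= last else -1  # assumption: reversed ranges descend
    return list(range(first, last + step, step))


# In a 10-page document, r3-r1 selects the last three pages.
print(expand_range("r3-r1", 10))
```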
| 1687 | + | ||
| 1688 | +8.0.0: February 25, 2018 | ||
| 1689 | + - Packaging and Distribution Changes | ||
| 1690 | + | ||
| 1691 | + - QPDF is now distributed as an | ||
| 1692 | + `AppImage <https://appimage.org/>`__ in addition to all the | ||
| 1693 | + other ways it is distributed. The AppImage can be found in the | ||
| 1694 | + download area with the other packages. Thanks to Kurt Pfeifle | ||
| 1695 | + and Simon Peter for their contributions. | ||
| 1696 | + | ||
| 1697 | + - Bug Fixes | ||
| 1698 | + | ||
| 1699 | + - ``QPDFObjectHandle::getUTF8Val`` now properly treats | ||
| 1700 | + non-Unicode strings as encoded with PDF Doc Encoding. | ||
| 1701 | + | ||
| 1702 | + - Improvements to handling of objects in PDF files that are not | ||
| 1703 | + of the expected type. In most cases, qpdf will be able to warn | ||
| 1704 | + for such cases rather than fail with an exception. Previous | ||
| 1705 | + versions of qpdf would sometimes fail with errors such as | ||
| 1706 | + "operation for dictionary object attempted on object of wrong | ||
| 1707 | + type". This situation should be mostly or entirely eliminated | ||
| 1708 | + now. | ||
| 1709 | + | ||
| 1710 | + - Enhancements to the :command:`qpdf` Command-line | ||
| 1711 | + Tool. All new options listed here are documented in more detail in | ||
| 1712 | + :ref:`ref.using`. | ||
| 1713 | + | ||
| 1714 | + - The option | ||
| 1715 | + :samp:`--linearize-pass1={file}` | ||
| 1716 | + has been added for debugging qpdf's linearization code. | ||
| 1717 | + | ||
| 1718 | + - The option :samp:`--coalesce-contents` can be | ||
| 1719 | + used to combine content streams of a page whose contents are an | ||
| 1720 | + array of streams into a single stream. | ||
| 1721 | + | ||
| 1722 | + - API Enhancements. All new API calls are documented in their | ||
| 1723 | + respective classes' header files. There are no non-compatible | ||
| 1724 | + changes to the API. | ||
| 1725 | + | ||
| 1726 | + - Add function ``qpdf_check_pdf`` to the C API. This function | ||
| 1727 | + does basic checking that is a subset of what :command:`qpdf | ||
| 1728 | + --check` performs. | ||
| 1729 | + | ||
| 1730 | + - Major enhancements to the lexical layer of qpdf. For a complete | ||
| 1731 | + list of enhancements, please refer to the | ||
| 1732 | + :file:`ChangeLog` file. Most of the changes | ||
| 1733 | + result in improvements to qpdf's ability to handle erroneous | ||
| 1734 | + files. It is also possible for programs to handle whitespace, | ||
| 1735 | + comments, and inline images as tokens. | ||
| 1736 | + | ||
| 1737 | + - New API for working with PDF content streams at a lexical | ||
| 1738 | + level. The new class ``QPDFObjectHandle::TokenFilter`` allows | ||
| 1739 | + the developer to provide token handlers. Token filters can be | ||
| 1740 | + used with several different methods in ``QPDFObjectHandle`` as | ||
| 1741 | + well as with a lower-level interface. See comments in | ||
| 1742 | + :file:`QPDFObjectHandle.hh` as well as the | ||
| 1743 | + new examples | ||
| 1744 | + :file:`examples/pdf-filter-tokens.cc` and | ||
| 1745 | + :file:`examples/pdf-count-strings.cc` for | ||
| 1746 | + details. | ||
| 1747 | + | ||
| 1748 | +7.1.1: February 4, 2018 | ||
| 1749 | + - Bug fix: files whose /ID fields were other than 16 bytes long can | ||
| 1750 | + now be properly linearized. | ||
| 1751 | + | ||
| 1752 | + - A few compile and link issues have been corrected for some | ||
| 1753 | + platforms. | ||
| 1754 | + | ||
| 1755 | +7.1.0: January 14, 2018 | ||
| 1756 | + - PDF files contain streams that may be compressed with various | ||
| 1757 | + compression algorithms which, in some cases, may be enhanced by | ||
| 1758 | + various predictor functions. Previously only the PNG up predictor | ||
| 1759 | + was supported. In this version, all the PNG predictors as well as | ||
| 1760 | + the TIFF predictor are supported. This increases the range of | ||
| 1761 | + files that qpdf is able to handle. | ||
| 1762 | + | ||
| 1763 | + - QPDF now allows a raw encryption key to be specified in place of a | ||
| 1764 | + password when opening encrypted files, and will optionally display | ||
| 1765 | + the encryption key used by a file. This is a non-standard | ||
| 1766 | + operation, but it can be useful in certain situations. Please see | ||
| 1767 | + the discussion of :samp:`--password-is-hex-key` in | ||
| 1768 | + :ref:`ref.basic-options` or the comments around | ||
| 1769 | + ``QPDF::setPasswordIsHexKey`` in | ||
| 1770 | + :file:`QPDF.hh` for additional details. | ||
| 1771 | + | ||
| 1772 | + - Bug fix: numbers ending with a trailing decimal point are now | ||
| 1773 | + properly recognized as numbers. | ||
| 1774 | + | ||
| 1775 | + - Bug fix: when building qpdf from source on some platforms | ||
| 1776 | + (especially MacOS), the build could get confused by older versions | ||
| 1777 | + of qpdf installed on the system. This has been corrected. | ||
| 1778 | + | ||
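To make the predictor discussion in 7.1.0 concrete, here is a minimal sketch of undoing the PNG "Up" predictor, the one that was supported before this release. It follows the PNG filter definition (each row starts with a filter-type byte, and type 2 adds the byte directly above). The function name ``undo_png_up`` and the ``columns`` parameter (bytes per row) are hypothetical, and this is not qpdf's implementation.

```python
def undo_png_up(data: bytes, columns: int) -> bytes:
    """Reverse the PNG "Up" filter (filter type 2) on row-oriented data.

    Each encoded row is one filter-type byte followed by `columns` data bytes.
    """
    row_size = columns + 1
    if len(data) % row_size != 0:
        raise ValueError("data length is not a whole number of rows")
    prior = bytes(columns)  # the row above the first row is treated as zeros
    out = bytearray()
    for offset in range(0, len(data), row_size):
        filter_type = data[offset]
        if filter_type != 2:
            raise ValueError(f"this sketch only handles Up (2), got {filter_type}")
        encoded = data[offset + 1 : offset + row_size]
        # Each decoded byte is the encoded byte plus the byte above it, mod 256.
        decoded = bytes((e + p) & 0xFF for e, p in zip(encoded, prior))
        out += decoded
        prior = decoded
    return bytes(out)
```

The other PNG predictors (Sub, Average, Paeth) and the TIFF predictor differ only in which neighbouring bytes contribute to each decoded byte.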
| 1779 | +7.0.0: September 15, 2017 | ||
| 1780 | + - Packaging and Distribution Changes | ||
| 1781 | + | ||
| 1782 | + - QPDF's primary license is now `version 2.0 of the Apache | ||
| 1783 | + License <http://www.apache.org/licenses/LICENSE-2.0>`__ rather | ||
| 1784 | + than version 2.0 of the Artistic License. You may still, at | ||
| 1785 | + your option, consider qpdf to be licensed with version 2.0 of | ||
| 1786 | + the Artistic license. | ||
| 1787 | + | ||
| 1788 | + - QPDF no longer has a dependency on the PCRE (Perl-Compatible | ||
| 1789 | + Regular Expression) library. QPDF now has an added dependency | ||
| 1790 | + on the JPEG library. | ||
| 1791 | + | ||
| 1792 | + - Bug Fixes | ||
| 1793 | + | ||
| 1794 | + - This release contains many bug fixes for various infinite | ||
| 1795 | + loops, memory leaks, and other memory errors that could be | ||
| 1796 | + encountered with specially crafted or otherwise erroneous PDF | ||
| 1797 | + files. | ||
| 1798 | + | ||
| 1799 | + - New Features | ||
| 1800 | + | ||
| 1801 | + - QPDF now supports reading and writing streams encoded with JPEG | ||
| 1802 | + or RunLength encoding. Library API enhancements and | ||
| 1803 | + command-line options have been added to control this behavior. | ||
| 1804 | + See command-line options | ||
| 1805 | + :samp:`--compress-streams` and | ||
| 1806 | + :samp:`--decode-level` and methods | ||
| 1807 | + ``QPDFWriter::setCompressStreams`` and | ||
| 1808 | + ``QPDFWriter::setDecodeLevel``. | ||
| 1809 | + | ||
| 1810 | + - QPDF is much better at recovering from broken files. In most | ||
| 1811 | + cases, qpdf will skip invalid objects and will preserve broken | ||
| 1812 | + stream data by not attempting to filter broken streams. QPDF is | ||
| 1813 | + now able to recover or at least not crash on dozens of broken | ||
| 1814 | + test files I have received over the past few years. | ||
| 1815 | + | ||
| 1816 | + - Page rotation is now supported and accessible from both the | ||
| 1817 | + library and the command line. | ||
| 1818 | + | ||
| 1819 | + - ``QPDFWriter`` supports writing files in a way that preserves | ||
| 1820 | + PCLm compliance in support of driverless printing. This is very | ||
| 1821 | + specialized and is only useful to applications that already | ||
| 1822 | + know how to create PCLm files. | ||
| 1823 | + | ||
| 1824 | + - Enhancements to the :command:`qpdf` Command-line | ||
| 1825 | + Tool. All new options listed here are documented in more detail in | ||
| 1826 | + :ref:`ref.using`. | ||
| 1827 | + | ||
| 1828 | + - Command-line arguments can now be read from files or standard | ||
| 1829 | + input using ``@file`` or ``@-`` syntax. Please see :ref:`ref.invocation`. | ||
| 1830 | + | ||
| 1831 | + - :samp:`--rotate`: request page rotation | ||
| 1832 | + | ||
| 1833 | + - :samp:`--newline-before-endstream`: ensure that | ||
| 1834 | + a newline appears before every ``endstream`` keyword in the | ||
| 1835 | + file; used to prevent qpdf from breaking PDF/A compliance on | ||
| 1836 | + already compliant files. | ||
| 1837 | + | ||
| 1838 | + - :samp:`--preserve-unreferenced`: preserve | ||
| 1839 | + unreferenced objects in the input PDF | ||
| 1840 | + | ||
| 1841 | + - :samp:`--split-pages`: break output into chunks | ||
| 1842 | + with fixed numbers of pages | ||
| 1843 | + | ||
| 1844 | + - :samp:`--verbose`: print the name of each | ||
| 1845 | + output file that is created | ||
| 1846 | + | ||
| 1847 | + - :samp:`--compress-streams` and | ||
| 1848 | + :samp:`--decode-level` replace | ||
| 1849 | + :samp:`--stream-data` for improving granularity | ||
| 1850 | + of controlling compression and decompression of stream data. | ||
| 1851 | + The :samp:`--stream-data` option will remain | ||
| 1852 | + available. | ||
| 1853 | + | ||
| 1854 | + - When running :command:`qpdf --check` with other | ||
| 1855 | + options, checks are always run first. This enables qpdf to | ||
| 1856 | + perform its full recovery logic before outputting other | ||
| 1857 | + information. This can be especially useful when manually | ||
| 1858 | + recovering broken files, looking at qpdf's regenerated cross | ||
| 1859 | + reference table, or other similar operations. | ||
| 1860 | + | ||
| 1861 | + - Process :command:`--pages` earlier so that other | ||
| 1862 | + options like :samp:`--show-pages` or | ||
| 1863 | + :samp:`--split-pages` can operate on the file | ||
| 1864 | + after page splitting/merging has occurred. | ||
| 1865 | + | ||
| 1866 | + - API Changes. All new API calls are documented in their respective | ||
| 1867 | + classes' header files. | ||
| 1868 | + | ||
| 1869 | + - ``QPDFObjectHandle::rotatePage``: apply rotation to a page | ||
| 1870 | + object | ||
| 1871 | + | ||
| 1872 | + - ``QPDFWriter::setNewlineBeforeEndstream``: force newline to | ||
| 1873 | + appear before ``endstream`` | ||
| 1874 | + | ||
| 1875 | + - ``QPDFWriter::setPreserveUnreferencedObjects``: preserve | ||
| 1876 | + unreferenced objects that appear in the input PDF. The default | ||
| 1877 | + behavior is to discard them. | ||
| 1878 | + | ||
| 1879 | + - New ``Pipeline`` types ``Pl_RunLength`` and ``Pl_DCT`` are | ||
| 1880 | + available for developers who wish to produce or consume | ||
| 1881 | + RunLength or DCT stream data directly. The | ||
| 1882 | + :file:`examples/pdf-create.cc` example | ||
| 1883 | + illustrates their use. | ||
| 1884 | + | ||
| 1885 | + - ``QPDFWriter::setCompressStreams`` and | ||
| 1886 | + ``QPDFWriter::setDecodeLevel`` methods control handling of | ||
| 1887 | + different types of stream compression. | ||
| 1888 | + | ||
| 1889 | + - Add new C API functions ``qpdf_set_compress_streams``, | ||
| 1890 | + ``qpdf_set_decode_level``, | ||
| 1891 | + ``qpdf_set_preserve_unreferenced_objects``, and | ||
| 1892 | + ``qpdf_set_newline_before_endstream`` corresponding to the new | ||
| 1893 | + ``QPDFWriter`` methods. | ||
| 1894 | + | ||
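The RunLength encoding mentioned under New Features for 7.0.0 is simple enough to sketch in a few lines. The decoder below follows the run-length scheme defined for PDF's ``RunLengthDecode`` filter: a length byte below 128 introduces a literal run, a length byte above 128 introduces a repeated byte, and 128 marks end of data. The function name ``run_length_decode`` is hypothetical; this is an illustration, not the code behind ``Pl_RunLength``.

```python
def run_length_decode(data: bytes) -> bytes:
    """Decode run-length data as used by PDF's RunLengthDecode filter."""
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        i += 1
        if length == 128:
            # End-of-data marker: anything after it is ignored.
            break
        if length < 128:
            # Literal run: copy the next length + 1 bytes unchanged.
            count = length + 1
            chunk = data[i : i + count]
            if len(chunk) != count:
                raise ValueError("truncated literal run")
            out += chunk
            i += count
        else:
            # Repeated run: the next byte is repeated 257 - length times.
            if i >= len(data):
                raise ValueError("truncated repeated run")
            out += bytes([data[i]]) * (257 - length)
            i += 1
    return bytes(out)
```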
| 1895 | +6.0.0: November 10, 2015 | ||
| 1896 | + - Implement :samp:`--deterministic-id` command-line | ||
| 1897 | + option and ``QPDFWriter::setDeterministicID`` as well as C API | ||
| 1898 | + function ``qpdf_set_deterministic_ID`` for generating a | ||
| 1899 | + deterministic ID for non-encrypted files. When this option is | ||
| 1900 | + selected, the ID of the file depends on the contents of the output | ||
| 1901 | + file, and not on transient items such as the timestamp or output | ||
| 1902 | + file name. | ||
| 1903 | + | ||
| 1904 | + - Make qpdf more tolerant of files whose xref table entries are not | ||
| 1905 | + the correct length. | ||
| 1906 | + | ||
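The idea behind a deterministic ID can be sketched as follows. The source states only that the ID depends on the output contents rather than on the timestamp or output file name. The digest used here (SHA-256 truncated to 16 bytes), the seed layout, and the function names are all assumptions made for illustration and do not describe what ``QPDFWriter::setDeterministicID`` does internally.

```python
import hashlib
import time


def deterministic_id(output_bytes: bytes) -> str:
    """Derive an ID from the output contents only (illustrative digest choice)."""
    return hashlib.sha256(output_bytes).digest()[:16].hex()


def transient_id(filename: str, now=None) -> str:
    """Derive an ID from transient items: a timestamp and the output file name."""
    stamp = time.time() if now is None else now
    seed = f"{stamp!r}|{filename}".encode("utf-8")
    return hashlib.sha256(seed).digest()[:16].hex()
```

Two runs that produce byte-identical output get the same ``deterministic_id``, whereas ``transient_id`` changes whenever the clock or the file name does, which is what makes reproducible output difficult without the option.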
| 1907 | +5.1.3: May 24, 2015 | ||
| 1908 | + - Bug fix: fix-qdf was not properly handling files that contained | ||
| 1909 | + object streams with more than 255 objects in them. | ||
| 1910 | + | ||
| 1911 | + - Bug fix: qpdf was not properly initializing Microsoft's secure | ||
| 1912 | + crypto provider on fresh Windows installations that had not had | ||
| 1913 | + any keys created yet. | ||
| 1914 | + | ||
| 1915 | + - Fix a few errors found by Gynvael Coldwind and Mateusz Jurczyk of | ||
| 1916 | + the Google Security Team. Please see the ChangeLog for details. | ||
| 1917 | + | ||
| 1918 | + - Properly handle pages that have no contents at all. There were | ||
| 1919 | + many cases in which qpdf handled this fine, but a few methods | ||
| 1920 | + blindly obtained page contents without handling the possibility that | ||
| 1921 | + there were no contents. | ||
| 1922 | + | ||
| 1923 | + - Make qpdf more robust for a few more kinds of problems that may | ||
| 1924 | + occur in invalid PDF files. | ||
| 1925 | + | ||
| 1926 | +5.1.2: June 7, 2014 | ||
| 1927 | + - Bug fix: linearizing files could create a corrupted output file | ||
| 1928 | + under extremely unlikely file size circumstances. See ChangeLog | ||
| 1929 | + for details. The odds of getting hit by this are very low, though | ||
| 1930 | + one person did. | ||
| 1931 | + | ||
| 1932 | + - Bug fix: qpdf would fail to write files that had streams with | ||
| 1933 | + decode parameters referencing other streams. | ||
| 1934 | + | ||
| 1935 | + - New example program: :command:`pdf-split-pages`: | ||
| 1936 | + efficiently split PDF files into individual pages. The example | ||
| 1937 | + program does this more efficiently than using :command:`qpdf | ||
| 1938 | + --pages` to do it. | ||
| 1939 | + | ||
| 1940 | + - Packaging fix: Visual C++ binaries did not support Windows XP. | ||
| 1941 | + This has been rectified by updating the compilers used to generate | ||
| 1942 | + the release binaries. | ||
| 1943 | + | ||
| 1944 | +5.1.1: January 14, 2014 | ||
| 1945 | + - Performance fix: copying foreign objects could be very slow with | ||
| 1946 | + certain types of files. This was most likely to be visible during | ||
| 1947 | + page splitting and was due to traversing the same objects multiple | ||
| 1948 | + times in some cases. | ||
| 1949 | + | ||
| 1950 | +5.1.0: December 17, 2013 | ||
| 1951 | + - Added runtime option (``QUtil::setRandomDataProvider``) to supply | ||
| 1952 | + your own random data provider. You can use this if you want to | ||
| 1953 | + avoid using the OS-provided secure random number generation | ||
| 1954 | + facility or stdlib's less secure version. See comments in | ||
| 1955 | + include/qpdf/QUtil.hh for details. | ||
| 1956 | + | ||
| 1957 | + - Fixed image comparison tests to not create 12-bit-per-pixel images | ||
| 1958 | + since some versions of tiffcmp have bugs in comparing them in some | ||
| 1959 | + cases. This increases the disk space required by the image | ||
| 1960 | + comparison tests, which are off by default anyway. | ||
| 1961 | + | ||
| 1962 | + - Introduce a number of small fixes for compilation on the latest | ||
| 1963 | + clang in MacOS and the latest Visual C++ in Windows. | ||
| 1964 | + | ||
| 1965 | + - Be able to handle broken files that end the xref table header with | ||
| 1966 | + a space instead of a newline. | ||
| 1967 | + | ||
| 1968 | +5.0.1: October 18, 2013 | ||
| 1969 | + - Thanks to a detailed review by Florian Weimer and the Red Hat | ||
| 1970 | + Product Security Team, this release includes a number of | ||
| 1971 | + non-user-visible security hardening changes. Please see the | ||
| 1972 | + ChangeLog file in the source distribution for the complete list. | ||
| 1973 | + | ||
| 1974 | + - When available, operating system-specific secure random number | ||
| 1975 | + generation is used for generating initialization vectors and other | ||
| 1976 | + random values used during encryption or file creation. For the | ||
| 1977 | + Windows build, this results in an added dependency on Microsoft's | ||
| 1978 | + cryptography API. To disable the OS-specific cryptography and use | ||
| 1979 | + the old version, pass the | ||
| 1980 | + :samp:`--enable-insecure-random` option to | ||
| 1981 | + :command:`./configure`. | ||
| 1982 | + | ||
| 1983 | + - The :command:`qpdf` command-line tool now issues a | ||
| 1984 | + warning when :samp:`--accessibility=n` is specified | ||
| 1985 | + for newer encryption versions stating that the option is ignored. | ||
| 1986 | + qpdf, per the spec, has always ignored this flag, but it | ||
| 1987 | + previously did so silently. This warning is issued only by the | ||
| 1988 | + command-line tool, not by the library. The library's handling of | ||
| 1989 | + this flag is unchanged. | ||
| 1990 | + | ||
| 1991 | +5.0.0: July 10, 2013 | ||
| 1992 | + - Bug fix: previous versions of qpdf would lose objects with | ||
| 1993 | + generation != 0 when generating object streams. Fixing this | ||
| 1994 | + required changes to the public API. | ||
| 1995 | + | ||
| 1996 | + - Removed methods from public API that were only supposed to be | ||
| 1997 | + called by QPDFWriter and couldn't realistically be called anywhere | ||
| 1998 | + else. See ChangeLog for details. | ||
| 1999 | + | ||
| 2000 | + - New ``QPDFObjGen`` class added to represent an object | ||
| 2001 | + ID/generation pair. ``QPDFObjectHandle::getObjGen()`` is now | ||
| 2002 | + preferred over ``QPDFObjectHandle::getObjectID()`` and | ||
| 2003 | + ``QPDFObjectHandle::getGeneration()`` as it makes it less likely | ||
| 2004 | + for people to accidentally write code that ignores the generation | ||
| 2005 | + number. See :file:`QPDF.hh` and | ||
| 2006 | + :file:`QPDFObjectHandle.hh` for additional | ||
| 2007 | + notes. | ||
| 2008 | + | ||
| 2009 | + - Add :samp:`--show-npages` command-line option to | ||
| 2010 | + the :command:`qpdf` command to show the number of | ||
| 2011 | + pages in a file. | ||
| 2012 | + | ||
| 2013 | + - Allow omission of the page range within | ||
| 2014 | + :samp:`--pages` for the | ||
| 2015 | + :command:`qpdf` command. When omitted, the page | ||
| 2016 | + range is implicitly taken to be all the pages in the file. | ||
| 2017 | + | ||
| 2018 | + - Various enhancements were made to support different types of | ||
| 2019 | + broken files or broken readers. Details can be found in | ||
| 2020 | + :file:`ChangeLog`. | ||
| 2021 | + | ||
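The motivation for ``QPDFObjGen`` in 5.0.0 can be shown with a small stand-in. The ``ObjGen`` tuple and the two index functions below are hypothetical Python analogues, not the C++ class. They only demonstrate how keying by object ID alone silently drops an object whose generation is not 0.

```python
from typing import NamedTuple


class ObjGen(NamedTuple):
    """Stand-in for an object ID/generation pair."""
    obj: int
    gen: int


def index_by_id_only(entries):
    # Ignores the generation: later entries with the same ID overwrite earlier ones.
    return {og.obj: value for og, value in entries}


def index_by_objgen(entries):
    # Keys on the full (ID, generation) pair, so both objects survive.
    return {og: value for og, value in entries}


entries = [
    (ObjGen(7, 0), "first object"),
    (ObjGen(7, 1), "second object"),
    (ObjGen(8, 0), "third object"),
]
```

With these entries, the ID-only index holds two objects while the pair-keyed index holds all three, which is the kind of loss the bug fix at the top of the 5.0.0 list describes.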
| 2022 | +4.1.0: April 14, 2013 | ||
| 2023 | + - Note to people including qpdf in distributions: the | ||
| 2024 | + :file:`.la` files generated by libtool are now | ||
| 2025 | + installed by qpdf's :command:`make install` target. | ||
| 2026 | + Before, they were not installed. This means that if your | ||
| 2027 | + distribution does not want to include | ||
| 2028 | + :file:`.la` files, you must remove them as | ||
| 2029 | + part of your packaging process. | ||
| 2030 | + | ||
| 2031 | + - Major enhancement: API enhancements have been made to support | ||
| 2032 | + parsing of content streams. This enhancement includes the | ||
| 2033 | + following changes: | ||
| 2034 | + | ||
| 2035 | + - ``QPDFObjectHandle::parseContentStream`` method parses objects | ||
| 2036 | + in a content stream and calls handlers in a callback class. The | ||
| 2037 | + example | ||
| 2038 | + :file:`examples/pdf-parse-content.cc` | ||
| 2039 | + illustrates how this may be used. | ||
| 2040 | + | ||
| 2041 | + - ``QPDFObjectHandle`` can now represent operators and inline | ||
| 2042 | + images, object types that may only appear in content streams. | ||
| 2043 | + | ||
| 2044 | + - Method ``QPDFObjectHandle::getTypeCode()`` returns an | ||
| 2045 | + enumerated type value representing the underlying object type. | ||
| 2046 | + Method ``QPDFObjectHandle::getTypeName()`` returns a text | ||
| 2047 | + string describing the name of the type of a | ||
| 2048 | + ``QPDFObjectHandle`` object. These methods can be used for more | ||
| 2049 | + efficient parsing and debugging/diagnostic messages. | ||
| 2050 | + | ||
| 2051 | + - :command:`qpdf --check` now parses all pages' | ||
| 2052 | + content streams in addition to doing other checks. While there are | ||
| 2053 | + still many types of errors that cannot be detected, syntactic | ||
| 2054 | + errors in content streams will now be reported. | ||
| 2055 | + | ||
| 2056 | + - Minor compilation enhancements have been made to facilitate | ||
| 2057 | + easier support for a broader range of compilers and compiler | ||
| 2058 | + versions. | ||
| 2059 | + | ||
| 2060 | + - Warning flags have been moved into a separate variable in | ||
| 2061 | + :file:`autoconf.mk`. | ||
| 2062 | + | ||
| 2063 | + - The configure flag :samp:`--enable-werror` works | ||
| 2064 | + for Microsoft compilers. | ||
| 2065 | + | ||
| 2066 | + - All MSVC CRT security warnings have been resolved. | ||
| 2067 | + | ||
| 2068 | + - All C-style casts in C++ Code have been replaced by C++ casts, | ||
| 2069 | + and many casts that had been included to suppress higher | ||
| 2070 | + warning levels for some compilers have been removed, primarily | ||
| 2071 | + for clarity. Places where integer type coercion occurs have | ||
| 2072 | + been scrutinized. A new casting policy has been documented in | ||
| 2073 | + the manual. This is of concern mainly to people porting qpdf to | ||
| 2074 | + new platforms or compilers. It is not visible to programmers | ||
| 2075 | + writing code that uses the library. | ||
| 2076 | + | ||
| 2077 | + - Some internal limits have been removed in code that converts | ||
| 2078 | + numbers to strings. This is largely invisible to users, but it | ||
| 2079 | + does trigger a bug in some older versions of mingw-w64's C++ | ||
| 2080 | + library. See :file:`README-windows.md` in | ||
| 2081 | + the source distribution if you think this may affect you. The | ||
| 2082 | + copy of the DLL distributed with qpdf's binary distribution is | ||
| 2083 | + not affected by this problem. | ||
| 2084 | + | ||
| 2085 | + - The RPM spec file previously included with qpdf has been removed. | ||
| 2086 | + This is because virtually all Linux distributions include qpdf now | ||
| 2087 | + that it is a dependency of CUPS filters. | ||
| 2088 | + | ||
| 2089 | + - A few bug fixes are included: | ||
| 2090 | + | ||
| 2091 | + - Overridden compressed objects are properly handled. Before, | ||
| 2092 | + there were certain constructs that could cause qpdf to see old | ||
| 2093 | + versions of some objects. The most usual manifestation of this | ||
| 2094 | + was loss of filled in form values for certain files. | ||
| 2095 | + | ||
| 2096 | + - Installation no longer uses GNU/Linux-specific versions of some | ||
| 2097 | + commands, so :command:`make install` works on | ||
| 2098 | + Solaris with native tools. | ||
| 2099 | + | ||
| 2100 | + - The 64-bit mingw Windows binary package no longer includes a | ||
| 2101 | + 32-bit DLL. | ||
| 2102 | + | ||
| 2103 | +4.0.1: January 17, 2013 | ||
| 2104 | + - Fix detection of binary attachments in test suite to avoid false | ||
| 2105 | + test failures on some platforms. | ||
| 2106 | + | ||
| 2107 | + - Add clarifying comment in :file:`QPDF.hh` to | ||
| 2108 | + methods that return the user password explaining that it is no | ||
| 2109 | + longer possible with newer encryption formats to recover the user | ||
| 2110 | + password knowing the owner password. In earlier encryption | ||
| 2111 | + formats, the user password was encrypted in the file using the | ||
| 2112 | + owner password. In newer encryption formats, a separate encryption | ||
| 2113 | + key is used on the file, and that key is independently encrypted | ||
| 2114 | + using both the user password and the owner password. | ||
| 2115 | + | ||
| 2116 | +4.0.0: December 31, 2012 | ||
| 2117 | + - Major enhancement: support has been added for newer encryption | ||
| 2118 | + schemes supported by version X of Adobe Acrobat. This includes use | ||
| 2119 | + of 127-character passwords, 256-bit encryption keys, and the | ||
| 2120 | + encryption scheme specified in ISO 32000-2, the PDF 2.0 | ||
| 2121 | + specification. This scheme can be chosen from the command line by | ||
| 2122 | + specifying use of 256-bit keys. qpdf also supports the deprecated | ||
| 2123 | + encryption method used by Acrobat IX. This encryption style has | ||
| 2124 | + known security weaknesses and should not be used in practice. | ||
| 2125 | + However, such files exist "in the wild," so support for this | ||
| 2126 | + scheme is still useful. New methods | ||
| 2127 | + ``QPDFWriter::setR6EncryptionParameters`` (for the PDF 2.0 scheme) | ||
| 2128 | + and ``QPDFWriter::setR5EncryptionParameters`` (for the deprecated | ||
| 2129 | + scheme) have been added to enable these new encryption schemes. | ||
| 2130 | + Corresponding functions have been added to the C API as well. | ||
| 2131 | + | ||
| 2132 | + - Full support for Adobe extension levels in PDF version | ||
| 2133 | + information. Starting with PDF version 1.7, corresponding to ISO | ||
| 2134 | + 32000, Adobe adds new functionality by increasing the extension | ||
| 2135 | + level rather than increasing the version. This support includes | ||
| 2136 | + addition of the ``QPDF::getExtensionLevel`` method for retrieving | ||
| 2137 | + the document's extension level, addition of versions of | ||
| 2138 | + ``QPDFWriter::setMinimumPDFVersion`` and | ||
| 2139 | + ``QPDFWriter::forcePDFVersion`` that accept an extension level, | ||
| 2140 | + and extended syntax for specifying forced and minimum versions on | ||
| 2141 | + the command line as described in :ref:`ref.advanced-transformation`. Corresponding functions | ||
| 2142 | + have been added to the C API as well. | ||
| 2143 | + | ||
| 2144 | + - Minor fixes to prevent qpdf from referencing objects in the file | ||
| 2145 | + that are not referenced in the file's overall structure. Most | ||
| 2146 | + files don't have any such objects, but some files contain | ||
| 2147 | + unreferenced objects with errors, so these fixes prevent qpdf from | ||
| 2148 | + needlessly rejecting or complaining about such objects. | ||
| 2149 | + | ||
| 2150 | + - Add new generalized methods for reading and writing files from/to | ||
| 2151 | + programmer-defined sources. The method | ||
| 2152 | + ``QPDF::processInputSource`` allows the programmer to use any | ||
| 2153 | + input source for the input file, and | ||
| 2154 | + ``QPDFWriter::setOutputPipeline`` allows the programmer to write | ||
| 2155 | + the output file through any pipeline. These methods would make it | ||
| 2156 | + possible to perform any number of specialized operations, such as | ||
| 2157 | + accessing external storage systems, creating bindings for qpdf in | ||
| 2158 | + other programming languages that have their own I/O systems, etc. | ||
| 2159 | + | ||
| 2160 | + - Add new method ``QPDF::getEncryptionKey`` for retrieving the | ||
| 2161 | + underlying encryption key used in the file. | ||
| 2162 | + | ||
| 2163 | + - This release includes a small handful of non-compatible API | ||
| 2164 | + changes. While effort is made to avoid such changes, all the | ||
| 2165 | + non-compatible API changes in this version were to parts of the | ||
| 2166 | + API that would likely never be used outside the library itself. In | ||
| 2167 | + all cases, the altered methods or structures were parts of the | ||
| 2168 | + ``QPDF`` class that either were public to enable them to be called from | ||
| 2169 | + ``QPDFWriter`` or were part of validation code that was | ||
| 2170 | + over-zealous in reporting problems in parts of the file that would | ||
| 2171 | + not ordinarily be referenced. In no case did any of the removed | ||
| 2172 | + methods do anything worse than falsely report error conditions in | ||
| 2173 | + files that were broken in ways that didn't matter. The following | ||
| 2174 | + public parts of the ``QPDF`` class were changed in a | ||
| 2175 | + non-compatible way: | ||
| 2176 | + | ||
| 2177 | + - Updated nested ``QPDF::EncryptionData`` class to add fields | ||
| 2178 | + needed by the newer encryption formats, member variables | ||
| 2179 | + changed to private so that future changes will not require | ||
| 2180 | + breaking backward compatibility. | ||
| 2181 | + | ||
| 2182 | + - Added additional parameters to ``compute_data_key``, which is | ||
| 2183 | + used by ``QPDFWriter`` to compute the encryption key used to | ||
| 2184 | + encrypt a specific object. | ||
| 2185 | + | ||
| 2186 | + - Removed the method ``flattenScalarReferences``. This method was | ||
| 2187 | + previously used prior to writing a new PDF file, but it has the | ||
| 2188 | + undesired side effect of causing qpdf to read objects in the | ||
| 2189 | + file that were not referenced. Some otherwise valid files have | ||
| 2190 | + unreferenced objects with errors in them, so this could cause | ||
| 2191 | + qpdf to reject files that would be accepted by virtually all | ||
| 2192 | + other PDF readers. In fact, qpdf relied on only a very small | ||
| 2193 | + part of what flattenScalarReferences did, so only this part has | ||
| 2194 | + been preserved, and it is now done directly inside | ||
| 2195 | + ``QPDFWriter``. | ||
| 2196 | + | ||
| 2197 | + - Removed the method ``decodeStreams``. This method was used by | ||
| 2198 | + the :samp:`--check` option of the | ||
| 2199 | + :command:`qpdf` command-line tool to force all | ||
| 2200 | + streams in the file to be decoded, but it also suffered from | ||
| 2201 | + the problem of opening otherwise unreferenced streams and thus | ||
| 2202 | + could report false positives. The | ||
| 2203 | + :samp:`--check` option now causes qpdf to go | ||
| 2204 | + through all the motions of writing a new file based on the | ||
| 2205 | + original one, so it will always reference and check exactly | ||
| 2206 | + those parts of a file that any ordinary viewer would check. | ||
| 2207 | + | ||
| 2208 | + - Removed the method ``trimTrailerForWrite``. This method was | ||
| 2209 | + used by ``QPDFWriter`` to modify the original QPDF object by | ||
| 2210 | + removing fields from the trailer dictionary that wouldn't apply | ||
| 2211 | + to the newly written file. This functionality, though generally | ||
| 2212 | + harmless, was a poor implementation and has been replaced by | ||
| 2213 | + having QPDFWriter filter these out when copying the trailer | ||
| 2214 | + rather than modifying the original QPDF object. (Note that qpdf | ||
| 2215 | + never modifies the original file itself.) | ||
| 2216 | + | ||
| 2217 | + - Allow the PDF header to appear anywhere in the first 1024 bytes of | ||
| 2218 | + the file. This is consistent with what other readers do. | ||
| 2219 | + | ||
| 2220 | + - Fix the :command:`pkg-config` files to list zlib | ||
| 2221 | + and pcre in ``Requires.private`` to better support static linking | ||
| 2222 | + using :command:`pkg-config`. | ||
| 2223 | + | ||
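The relaxed header rule in 4.0.0 (the PDF header may appear anywhere in the first 1024 bytes) can be sketched as a simple search. The function name ``find_pdf_header`` is hypothetical, and requiring the whole ``%PDF-`` marker to fit inside the window is an assumption of this sketch rather than a statement about qpdf's exact behavior.

```python
def find_pdf_header(data: bytes, window: int = 1024) -> int:
    """Return the offset of "%PDF-" within the first `window` bytes, or -1."""
    # bytes.find only reports matches that lie entirely inside data[0:window].
    return data.find(b"%PDF-", 0, window)
```

A file with a few bytes of leading junk, such as ``b"junk\n%PDF-1.7\n"``, is accepted with a nonzero offset, while a file whose header starts beyond the window is rejected.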
| 2224 | +3.0.2: September 6, 2012 | ||
| 2225 | + - Bug fix: ``QPDFWriter::setOutputMemory`` did not work when not | ||
| 2226 | + used with ``QPDFWriter::setStaticID``, which made it pretty much | ||
| 2227 | + useless. This has been fixed. | ||
| 2228 | + | ||
| 2229 | + - New API call ``QPDFWriter::setExtraHeaderText`` inserts additional | ||
| 2230 | + text near the header of the PDF file. The intended use case is to | ||
| 2231 | + insert comments that may be consumed by a downstream application, | ||
| 2232 | + though other use cases may exist. | ||
| 2233 | + | ||
| 2234 | +3.0.1: August 11, 2012 | ||
| 2235 | + - Version 3.0.0 included addition of files for | ||
| 2236 | + :command:`pkg-config`, but this was not mentioned | ||
| 2237 | + in the release notes. The release notes for 3.0.0 were updated to | ||
| 2238 | + mention this. | ||
| 2239 | + | ||
| 2240 | + - Bug fix: if an object stream ended with a scalar object not | ||
| 2241 | + followed by space, qpdf would incorrectly report that it | ||
| 2242 | + encountered a premature EOF. This bug has been in qpdf since | ||
| 2243 | + version 2.0. | ||
| 2244 | + | ||
| 2245 | +3.0.0: August 2, 2012 | ||
| 2246 | + - Acknowledgment: I would like to express gratitude for the | ||
| 2247 | + contributions of Tobias Hoffmann toward the release of qpdf | ||
| 2248 | + version 3.0. He is responsible for most of the implementation and | ||
| 2249 | + design of the new API for manipulating pages, and contributed code | ||
| 2250 | + and ideas for many of the improvements made in version 3.0. | ||
| 2251 | + Without his work, this release would certainly not have happened | ||
| 2252 | + as soon as it did, if at all. | ||
| 2253 | + | ||
| 2254 | + - *Non-compatible API changes:* | ||
| 2255 | + | ||
| 2256 | + - The method ``QPDFObjectHandle::replaceStreamData`` that uses a | ||
| 2257 | + ``StreamDataProvider`` to provide the stream data no longer | ||
| 2258 | + takes a ``length`` parameter. The parameter was removed since | ||
| 2259 | + this provides the user an opportunity to simplify the calling | ||
| 2260 | + code. This method was introduced in version 2.2. At the time, | ||
| 2261 | + the ``length`` parameter was required in order to ensure that | ||
| 2262 | + calls to the stream data provider returned the same length for a | ||
| 2263 | + specific stream every time they were invoked. In particular, the | ||
| 2264 | + linearization code depends on this. Instead, qpdf 3.0 and newer | ||
| 2265 | + check for that constraint explicitly. The first time the stream | ||
| 2266 | + data provider is called for a specific stream, the actual length | ||
| 2267 | + is saved, and subsequent calls are required to return the same | ||
| 2268 | + number of bytes. This means the calling code no longer has to | ||
| 2269 | + compute the length in advance, which can be a significant | ||
| 2270 | + simplification. If your code fails to compile because of the | ||
| 2271 | + extra argument and you don't want to make other changes to your | ||
| 2272 | + code, just omit the argument. | ||
| 2273 | + | ||
| 2274 | + - Many methods take ``long long`` instead of other integer types. | ||
| 2275 | + Most if not all existing code should compile fine with this | ||
| 2276 | + change since such parameters had always previously been smaller | ||
| 2277 | + types. This change was required to support files larger than two | ||
| 2278 | + gigabytes in size. | ||
| 2279 | + | ||
| 2280 | + - Support has been added for large files. The test suite verifies | ||
| 2281 | + support for files larger than 4 gigabytes, and manual testing has | ||
| 2282 | + verified support for files larger than 10 gigabytes. Large file | ||
| 2283 | + support is available for both 32-bit and 64-bit platforms as long | ||
| 2284 | + as the compiler and underlying platforms support it. | ||
| 2285 | + | ||
| 2286 | + - Support for page selection (splitting and merging PDF files) has | ||
| 2287 | + been added to the :command:`qpdf` command-line | ||
| 2288 | + tool. See :ref:`ref.page-selection`. | ||
| 2289 | + | ||
| 2290 | + - Options have been added to the :command:`qpdf` | ||
| 2291 | + command-line tool for copying encryption parameters from another | ||
| 2292 | + file. See :ref:`ref.basic-options`. | ||
| 2293 | + | ||
| 2294 | + - New methods have been added to the ``QPDF`` object for adding and | ||
| 2295 | + removing pages. See :ref:`ref.adding-and-remove-pages`. | ||
| 2296 | + | ||
| 2297 | + - New methods have been added to the ``QPDF`` object for copying | ||
| 2298 | + objects from other PDF files. See :ref:`ref.foreign-objects`. | |
| 2299 | + | ||
| 2300 | + - A new method ``QPDFObjectHandle::parse`` has been added for | ||
| 2301 | + constructing ``QPDFObjectHandle`` objects from a string | ||
| 2302 | + description. | ||
| 2303 | + | ||
| 2304 | + - Methods have been added to ``QPDFWriter`` to allow writing to an | ||
| 2305 | + already open stdio ``FILE*`` in addition to writing to standard | |
| 2306 | + output or a named file. Methods have been added to ``QPDF`` to be | ||
| 2307 | + able to process a file from an already open stdio ``FILE*``. This | ||
| 2308 | + makes it possible to read and write PDF from secure temporary | ||
| 2309 | + files that have been unlinked prior to being fully read or | ||
| 2310 | + written. | ||
| 2311 | + | ||
| 2312 | + - The ``QPDF::emptyPDF`` method can be used to allow creation of PDF files | |
| 2313 | + from scratch. The example | ||
| 2314 | + :file:`examples/pdf-create.cc` illustrates how | ||
| 2315 | + it can be used. | ||
| 2316 | + | ||
| 2317 | + - Several methods that take ``PointerHolder<Buffer>`` can now also | |
| 2318 | + accept ``std::string`` arguments. | ||
| 2319 | + | ||
| 2320 | + - Many new convenience methods have been added to the library, most | ||
| 2321 | + in ``QPDFObjectHandle``. See :file:`ChangeLog` | ||
| 2322 | + for a full list. | ||
| 2323 | + | ||
| 2324 | + - When building on a platform that supports ELF shared libraries | ||
| 2325 | + (such as Linux), symbol versions are enabled by default. They can | ||
| 2326 | + be disabled by passing | ||
| 2327 | + :samp:`--disable-ld-version-script` to | ||
| 2328 | + :command:`./configure`. | ||
| 2329 | + | ||
| 2330 | + - The file :file:`libqpdf.pc` is now installed | ||
| 2331 | + to support :command:`pkg-config`. | ||
| 2332 | + | ||
| 2333 | + - Image comparison tests are off by default now since they are not | ||
| 2334 | + needed to verify a correct build or port of qpdf. They are needed | ||
| 2335 | + only when changing the actual PDF output generated by qpdf. You | ||
| 2336 | + should enable them if you are making deep changes to qpdf itself. | ||
| 2337 | + See :file:`README.md` for details. | ||
| 2338 | + | ||
| 2339 | + - Large file tests are off by default but can be turned on with | ||
| 2340 | + :command:`./configure` or by setting an environment | ||
| 2341 | + variable before running the test suite. See | ||
| 2342 | + :file:`README.md` for details. | ||
| 2343 | + | ||
| 2344 | + - When qpdf's test suite fails, failures are not printed to the | ||
| 2345 | + terminal anymore by default. Instead, find them in | ||
| 2346 | + :file:`build/qtest.log`. For packagers who are | ||
| 2347 | + building with an autobuilder, you can add the | ||
| 2348 | + :samp:`--enable-show-failed-test-output` option to | ||
| 2349 | + :command:`./configure` to restore the old behavior. | ||
| 2350 | + | ||
| 2351 | +2.3.1: December 28, 2011 | ||
| 2352 | + - Fix thread-safety problem resulting from non-thread-safe use of | ||
| 2353 | + the PCRE library. | ||
| 2354 | + | ||
| 2355 | + - Made a few minor documentation fixes. | ||
| 2356 | + | ||
| 2357 | + - Add workaround for a bug that appears in some versions of | ||
| 2358 | + ghostscript to the test suite. | |
| 2359 | + | ||
| 2360 | + - Fix minor build issue for Visual C++ 2010. | ||
| 2361 | + | ||
| 2362 | +2.3.0: August 11, 2011 | ||
| 2363 | + - Bug fix: when preserving existing encryption on encrypted files | ||
| 2364 | + with cleartext metadata, older qpdf versions would generate | ||
| 2365 | + password-protected files with no valid password. This operation | ||
| 2366 | + now works. This bug only affected files created by copying | ||
| 2367 | + existing encryption parameters; explicit encryption with | ||
| 2368 | + specification of cleartext metadata worked before and continues to | ||
| 2369 | + work. | ||
| 2370 | + | ||
| 2371 | + - Enhance ``QPDFWriter`` with a new constructor that allows you to | ||
| 2372 | + delay the specification of the output file. When using this | ||
| 2373 | + constructor, you may now call ``QPDFWriter::setOutputFilename`` to | ||
| 2374 | + specify the output file, or you may use | ||
| 2375 | + ``QPDFWriter::setOutputMemory`` to cause ``QPDFWriter`` to write | ||
| 2376 | + the resulting PDF file to a memory buffer. You may then use | ||
| 2377 | + ``QPDFWriter::getBuffer`` to retrieve the memory buffer. | ||
| 2378 | + | ||
| 2379 | + - Add new API call ``QPDF::replaceObject`` for replacing objects by | ||
| 2380 | + object ID. | |
| 2381 | + | ||
| 2382 | + - Add new API call ``QPDF::swapObjects`` for swapping two objects by | ||
| 2383 | + object ID. | |
| 2384 | + | ||
| 2385 | + - Add ``QPDFObjectHandle::getDictAsMap`` and | ||
| 2386 | + ``QPDFObjectHandle::getArrayAsVector`` to allow retrieval of | ||
| 2387 | + dictionary objects as maps and array objects as vectors. | ||
| 2388 | + | ||
| 2389 | + - Add functions ``qpdf_get_info_key`` and ``qpdf_set_info_key`` to | ||
| 2390 | + the C API for manipulating string fields of the document's | ||
| 2391 | + ``/Info`` dictionary. | ||
| 2392 | + | ||
| 2393 | + - Add functions ``qpdf_init_write_memory``, | ||
| 2394 | + ``qpdf_get_buffer_length``, and ``qpdf_get_buffer`` to the C API | ||
| 2395 | + for writing PDF files to a memory buffer instead of a file. | ||
| 2396 | + | ||
| 2397 | +2.2.4: June 25, 2011 | ||
| 2398 | + - Fix installation and compilation issues; no functionality changes. | ||
| 2399 | + | ||
| 2400 | +2.2.3: April 30, 2011 | ||
| 2401 | + - Handle some damaged streams with incorrect characters following | ||
| 2402 | + the stream keyword. | ||
| 2403 | + | ||
| 2404 | + - Improve handling of inline images when normalizing content | ||
| 2405 | + streams. | ||
| 2406 | + | ||
| 2407 | + - Enhance error recovery to properly handle files that use object 0 | ||
| 2408 | + as a regular object, which is specifically disallowed by the spec. | ||
| 2409 | + | ||
| 2410 | +2.2.2: October 4, 2010 | ||
| 2411 | + - Add new function ``qpdf_read_memory`` to the C API to call | ||
| 2412 | + ``QPDF::processMemoryFile``. This was an omission in qpdf 2.2.1. | ||
| 2413 | + | ||
| 2414 | +2.2.1: October 1, 2010 | ||
| 2415 | + - Add new method ``QPDF::setOutputStreams`` to replace ``std::cout`` | ||
| 2416 | + and ``std::cerr`` with other streams for generation of diagnostic | ||
| 2417 | + messages and error messages. This can be useful for GUIs or other | ||
| 2418 | + applications that want to capture any output generated by the | ||
| 2419 | + library to present to the user in some other way. Note that QPDF | ||
| 2420 | + does not write to ``std::cout`` (or the specified output stream) | ||
| 2421 | + except where explicitly mentioned in | ||
| 2422 | + :file:`QPDF.hh`, and that the only use of the | ||
| 2423 | + error stream is for warnings. Note also that output of warnings is | ||
| 2424 | + suppressed when ``setSuppressWarnings(true)`` is called. | ||
| 2425 | + | ||
| 2426 | + - Add new method ``QPDF::processMemoryFile`` for operating on PDF | ||
| 2427 | + files that are loaded into memory rather than in a file on disk. | ||
| 2428 | + | ||
| 2429 | + - Give a warning but otherwise ignore empty PDF objects by treating | ||
| 2430 | + them as null. Empty objects are not permitted by the PDF | |
| 2431 | + specification but have been known to appear in some actual PDF | ||
| 2432 | + files. | ||
| 2433 | + | ||
| 2434 | + - Handle inline image filter abbreviations when they appear as stream | |
| 2435 | + filter abbreviations. The PDF specification does not allow use of | ||
| 2436 | + stream filter abbreviations in this way, but Adobe Reader and some | ||
| 2437 | + other PDF readers accept them since they sometimes appear | ||
| 2438 | + incorrectly in actual PDF files. | ||
| 2439 | + | ||
| 2440 | + - Implement miscellaneous enhancements to ``PointerHolder`` and | ||
| 2441 | + ``Buffer`` to support other changes. | ||
| 2442 | + | ||
| 2443 | +2.2.0: August 14, 2010 | ||
| 2444 | + - Add new methods to ``QPDFObjectHandle`` (``newStream`` and | ||
| 2445 | + ``replaceStreamData``) for creating new streams and replacing | |
| 2446 | + stream data. This makes it possible to perform a wide range of | ||
| 2447 | + operations that were not previously possible. | ||
| 2448 | + | ||
| 2449 | + - Add new helper method in ``QPDFObjectHandle`` | ||
| 2450 | + (``addPageContents``) for appending or prepending new content | ||
| 2451 | + streams to a page. This method makes it possible to manipulate | ||
| 2452 | + content streams without having to be concerned whether a page's | ||
| 2453 | + contents are a single stream or an array of streams. | ||
| 2454 | + | ||
| 2455 | + - Add new method in ``QPDFObjectHandle``: ``replaceOrRemoveKey``, | ||
| 2456 | + which replaces a dictionary key with a given value unless the | ||
| 2457 | + value is null, in which case it removes the key instead. | ||
| 2458 | + | ||
| 2459 | + - Add new method in ``QPDFObjectHandle``: ``getRawStreamData``, | ||
| 2460 | + which returns the raw (unfiltered) stream data into a buffer. This | ||
| 2461 | + complements the ``getStreamData`` method, which returns the | ||
| 2462 | + filtered (uncompressed) stream data and can only be used when the | ||
| 2463 | + stream's data is filterable. | ||
| 2464 | + | ||
| 2465 | + - Provide two new examples: | ||
| 2466 | + :command:`pdf-double-page-size` and | ||
| 2467 | + :command:`pdf-invert-images` that illustrate the | ||
| 2468 | + newly added interfaces. | ||
| 2469 | + | ||
| 2470 | + - Fix a memory leak that would cause loss of a few bytes for every | ||
| 2471 | + object involved in a cycle of object references. Thanks to Jian Ma | ||
| 2472 | + for calling my attention to the leak. | ||
| 2473 | + | ||
| 2474 | +2.1.5: April 25, 2010 | ||
| 2475 | + - Remove restriction of file identifier strings to 16 bytes. This | ||
| 2476 | + unnecessary restriction was preventing qpdf from being able to | ||
| 2477 | + encrypt or decrypt files with identifier strings that were not | ||
| 2478 | + exactly 16 bytes long. The specification imposes no such | ||
| 2479 | + restriction. | ||
| 2480 | + | ||
| 2481 | +2.1.4: April 18, 2010 | ||
| 2482 | + - Apply the same padding calculation fix from version 2.1.2 to the | ||
| 2483 | + main cross reference stream as well. | ||
| 2484 | + | ||
| 2485 | + - Since :command:`qpdf --check` only performs limited | ||
| 2486 | + checks, clarify the output to make it clear that there still may | ||
| 2487 | + be errors that qpdf can't check. This should make it less | ||
| 2488 | + surprising to people when another PDF reader is unable to read a | ||
| 2489 | + file that qpdf thinks is okay. | ||
| 2490 | + | ||
| 2491 | +2.1.3: March 27, 2010 | ||
| 2492 | + - Fix bug that could cause a failure when rewriting PDF files that | ||
| 2493 | + contain object streams with unreferenced objects that in turn | ||
| 2494 | + reference indirect scalars. | ||
| 2495 | + | ||
| 2496 | + - Don't complain about (invalid) AES streams that aren't a multiple | ||
| 2497 | + of 16 bytes. Instead, pad them before decrypting. | ||
| 2498 | + | ||
| 2499 | +2.1.2: January 24, 2010 | ||
| 2500 | + - Fix bug in padding around first half cross reference stream in | ||
| 2501 | + linearized files. The bug could cause an assertion failure when | ||
| 2502 | + linearizing certain unlucky files. | ||
| 2503 | + | ||
| 2504 | +2.1.1: December 14, 2009 | ||
| 2505 | + - No changes in functionality; insert missing include in an internal | ||
| 2506 | + library header file to support gcc 4.4, and update test suite to | ||
| 2507 | + ignore broken Adobe Reader installations. | ||
| 2508 | + | ||
| 2509 | +2.1: October 30, 2009 | ||
| 2510 | + - This is the first version of qpdf to include Windows support. On | ||
| 2511 | + Windows, it is possible to build a DLL. Additionally, a partial | ||
| 2512 | + C-language API has been introduced, which makes it possible to | ||
| 2513 | + call qpdf functions from non-C++ environments. I am very grateful | ||
| 2514 | + to Žarko Gajić (http://zarko-gajic.iz.hr/) for tirelessly testing | |
| 2515 | + numerous pre-release versions of this DLL and providing many | ||
| 2516 | + excellent suggestions on improving the interface. | ||
| 2517 | + | ||
| 2518 | + For programming to the C interface, please see the header file | ||
| 2519 | + :file:`qpdf/qpdf-c.h` and the example | ||
| 2520 | + :file:`examples/pdf-linearize.c`. | ||
| 2521 | + | ||
| 2522 | + - Žarko Gajić has written a Delphi wrapper for qpdf, which can be | |
| 2523 | + downloaded from qpdf's download site. Žarko's Delphi wrapper is | |
| 2524 | + released with the same licensing terms as qpdf itself and comes | ||
| 2525 | + with this disclaimer: "Delphi wrapper unit | ||
| 2526 | + :file:`qpdf.pas` created by Žarko Gajić | |
| 2527 | + (http://zarko-gajic.iz.hr/). Use at your own risk and for whatever | ||
| 2528 | + purpose you want. No support is provided. Sample code is | ||
| 2529 | + provided." | ||
| 2530 | + | ||
| 2531 | + - Support has been added for AES encryption and crypt filters. | ||
| 2532 | + Although qpdf does not presently support files that use PKI-based | ||
| 2533 | + encryption, with the addition of AES and crypt filters, qpdf is | ||
| 2534 | + now able to open most encrypted files created with newer | |
| 2535 | + versions of Acrobat or other PDF creation software. Note that I | ||
| 2536 | + have not been able to get very many files encrypted in this way, | ||
| 2537 | + so it's possible there could still be some cases that qpdf can't | ||
| 2538 | + handle. Please report them if you find them. | ||
| 2539 | + | ||
| 2540 | + - Many error messages have been improved to include more information | ||
| 2541 | + in hopes of making qpdf a more useful tool for PDF experts to use | ||
| 2542 | + in manually recovering damaged PDF files. | ||
| 2543 | + | ||
| 2544 | + - Attempt to avoid compressing metadata streams if possible. This is | ||
| 2545 | + consistent with other PDF creation applications. | ||
| 2546 | + | ||
| 2547 | + - Provide new command-line options for AES encryption, cleartext | |
| 2548 | + metadata, and setting the minimum and forced PDF versions of | ||
| 2549 | + output files. | ||
| 2550 | + | ||
| 2551 | + - Add additional methods to the ``QPDF`` object for querying the | ||
| 2552 | + document's permissions. Although qpdf does not enforce these | ||
| 2553 | + permissions, it does make them available so that applications that | ||
| 2554 | + use qpdf can enforce permissions. | ||
| 2555 | + | ||
| 2556 | + - The :samp:`--check` option to | ||
| 2557 | + :command:`qpdf` has been extended to include some | ||
| 2558 | + additional information. | ||
| 2559 | + | ||
| 2560 | + - *Non-compatible API changes:* | ||
| 2561 | + | ||
| 2562 | + - QPDF's exception handling mechanism now uses | ||
| 2563 | + ``std::logic_error`` for internal errors and | ||
| 2564 | + ``std::runtime_error`` for runtime errors in favor of the now | ||
| 2565 | + removed ``QEXC`` classes used in previous versions. The ``QEXC`` | ||
| 2566 | + exception classes predated the addition of the | ||
| 2567 | + :file:`<stdexcept>` header file to the C++ standard library. | ||
| 2568 | + Most of the exceptions thrown by the qpdf library itself are | ||
| 2569 | + still of type ``QPDFExc`` which is now derived from | ||
| 2570 | + ``std::runtime_error``. Programs that catch an instance of | ||
| 2571 | + ``std::exception`` and display it by calling the ``what()`` | |
| 2572 | + method will not need to be changed. | ||
| 2573 | + | ||
| 2574 | + - The ``QPDFExc`` class now internally represents various fields | ||
| 2575 | + of the error condition and provides interfaces for querying | ||
| 2576 | + them. Among the fields is a numeric error code that can help | ||
| 2577 | + applications act differently on (a small number of) different | ||
| 2578 | + error conditions. See :file:`QPDFExc.hh` for details. | ||
| 2579 | + | ||
| 2580 | + - Warnings can be retrieved from qpdf as instances of ``QPDFExc`` | ||
| 2581 | + instead of strings. | ||
| 2582 | + | ||
| 2583 | + - The nested ``QPDF::EncryptionData`` class's constructor takes an | ||
| 2584 | + additional argument. This class is primarily intended to be used | ||
| 2585 | + by ``QPDFWriter``. There's not really anything useful an | ||
| 2586 | + end-user application could do with it. It probably shouldn't | ||
| 2587 | + really be part of the public interface to begin with. Likewise, | ||
| 2588 | + some of the methods for computing internal encryption dictionary | ||
| 2589 | + parameters have changed to support ``/R=4`` encryption. | ||
| 2590 | + | ||
| 2591 | + - The method ``QPDF::getUserPassword`` has been removed since it | ||
| 2592 | + didn't do what people would think it did. There are now two new | ||
| 2593 | + methods: ``QPDF::getPaddedUserPassword`` and | ||
| 2594 | + ``QPDF::getTrimmedUserPassword``. The first one does what the | ||
| 2595 | + old ``QPDF::getUserPassword`` method used to do, which is to | ||
| 2596 | + return the password with possible binary padding as specified by | ||
| 2597 | + the PDF specification. The second one returns a human-readable | ||
| 2598 | + password string. | ||
| 2599 | + | ||
| 2600 | + - The enumerated types that used to be nested in ``QPDFWriter`` | ||
| 2601 | + have moved to top-level enumerated types and are now defined in | ||
| 2602 | + the file :file:`qpdf/Constants.h`. This enables them to be | ||
| 2603 | + shared by both the C and C++ interfaces. | ||
| 2604 | + | ||
| 2605 | +2.0.6: May 3, 2009 | ||
| 2606 | + - Do not attempt to uncompress streams that have decode parameters | ||
| 2607 | + we don't recognize. Earlier versions of qpdf would have rejected | ||
| 2608 | + files with such streams. | ||
| 2609 | + | ||
| 2610 | +2.0.5: March 10, 2009 | ||
| 2611 | + - Improve error handling in the LZW decoder, and fix a small error | ||
| 2612 | + introduced in the previous version with regard to handling full | ||
| 2613 | + tables. The LZW decoder has been more strongly verified in this | ||
| 2614 | + release. | ||
| 2615 | + | ||
| 2616 | +2.0.4: February 21, 2009 | ||
| 2617 | + - Include proper support for LZW streams encoded without the "early | ||
| 2618 | + code change" flag. Special thanks to Atom Smasher who reported the | ||
| 2619 | + problem and provided an input file compressed in this way, which I | ||
| 2620 | + did not previously have. | ||
| 2621 | + | ||
| 2622 | + - Implement some improvements to file recovery logic. | ||
| 2623 | + | ||
| 2624 | +2.0.3: February 15, 2009 | ||
| 2625 | + - Compile cleanly with gcc 4.4. | ||
| 2626 | + | ||
| 2627 | + - Handle strings encoded as UTF-16BE properly. | ||
| 2628 | + | ||
| 2629 | +2.0.2: June 30, 2008 | ||
| 2630 | + - Update test suite to work properly with a | ||
| 2631 | + non-:command:`bash` | ||
| 2632 | + :file:`/bin/sh` and with Perl 5.10. No changes | ||
| 2633 | + were made to the actual qpdf source code itself for this release. | ||
| 2634 | + | ||
| 2635 | +2.0.1: May 6, 2008 | ||
| 2636 | + - No changes in functionality or interface. This release includes | ||
| 2637 | + fixes to the source code so that qpdf compiles properly and passes | ||
| 2638 | + its test suite on a broader range of platforms. See | ||
| 2639 | + :file:`ChangeLog` in the source distribution | ||
| 2640 | + for details. | ||
| 2641 | + | ||
| 2642 | +2.0: April 29, 2008 | ||
| 2643 | + - First public release. |
manual/weak-crypto.rst
0 → 100644
| 1 | +.. _ref.weak-crypto: | ||
| 2 | + | ||
| 3 | +Weak Cryptography | ||
| 4 | +================= | ||
| 5 | + | ||
| 6 | +Starting with version 10.4, qpdf is taking steps to reduce the likelihood | |
| 7 | +of a user *accidentally* creating PDF files with insecure cryptography | ||
| 8 | +but will continue to allow creation of such files indefinitely with | ||
| 9 | +explicit acknowledgment. | ||
| 10 | + | ||
| 11 | +The PDF file format makes use of RC4, which is known to be a weak | ||
| 12 | +cryptography algorithm, and MD5, which is a weak hashing algorithm. In | ||
| 13 | +version 10.4, qpdf generates warnings for some (but not all) cases of | ||
| 14 | +writing files with weak cryptography when invoked from the command-line. | ||
| 15 | +These warnings can be suppressed using the | ||
| 16 | +:samp:`--allow-weak-crypto` option. | ||
| 17 | + | ||
| 18 | +It is planned for qpdf version 11 to be stricter, making it an error to | ||
| 19 | +write files with insecure cryptography from the command-line tool in | ||
| 20 | +most cases without specifying the | ||
| 21 | +:samp:`--allow-weak-crypto` flag and also to require | ||
| 22 | +explicit steps when using the C++ library to enable use of insecure | ||
| 23 | +cryptography. | ||
| 24 | + | ||
| 25 | +Note that qpdf must always retain support for weak cryptographic | ||
| 26 | +algorithms since this is required for reading older PDF files that use | ||
| 27 | +them. Additionally, qpdf will always retain the ability to create files | |
| 28 | +using weak cryptographic algorithms since, as a development tool, qpdf | ||
| 29 | +explicitly supports creating older or deprecated types of PDF files | ||
| 30 | +since these are sometimes needed to test or work with older versions of | ||
| 31 | +software. Even if other cryptography libraries drop support for RC4 or | ||
| 32 | +MD5, qpdf can always fall back to its internal implementations of those | ||
| 33 | +algorithms, so they are not going to disappear from qpdf. |