Commit 10fb619d3e0618528b7ac6c20cad6262020cf947

Authored by Jay Berkenbilt
1 parent f3d1138b

Split documentation into multiple pages, change theme

... ... @@ -30,8 +30,6 @@ Before release:
30 30 I can do about, and it doesn't seem worth fixing. Maybe mention it
31 31 somewhere?
32 32 * README-maintainer: Fix installation of documentation to website
33   -* Get navigation working properly
34   -* Figure out where to put :ref:`search` so we get doc search
35 33  
36 34 Soon:
37 35  
... ...
manual/acknowledgement.rst 0 → 100644
  1 +.. _acknowledgments:
  2 +
  3 +Acknowledgment
  4 +==============
  5 +
  6 +QPDF was originally created in 2001 and modified periodically between
  7 +2001 and 2005 during my employment at `Apex CoVantage
  8 +<http://www.apexcovantage.com>`__. Upon my departure from Apex, the
  9 +company graciously allowed me to take ownership of the software and
  10 +continue maintaining it as an open source project, a decision for which I
  11 +am very grateful. I have made considerable enhancements to it since
  12 +that time. I feel fortunate to have worked for people who would make
  13 +such a decision. This work would not have been possible without their
  14 +support.
... ...
manual/cli.rst 0 → 100644
  1 +.. _ref.using:
  2 +
  3 +Running QPDF
  4 +============
  5 +
  6 +This chapter describes how to run the qpdf program from the command
  7 +line.
  8 +
  9 +.. _ref.invocation:
  10 +
  11 +Basic Invocation
  12 +----------------
  13 +
  14 +When running qpdf, the basic invocation is as follows:
  15 +
  16 +::
  17 +
  18 + qpdf [ options ] { infilename | --empty } outfilename
  19 +
  20 +This converts PDF file :samp:`infilename` to PDF file
  21 +:samp:`outfilename`. The output file is functionally
  22 +identical to the input file but may have been structurally reorganized.
  23 +Also, orphaned objects will be removed from the file. Many
  24 +transformations are available as controlled by the options below. In
  25 +place of :samp:`infilename`, the parameter
  26 +:samp:`--empty` may be specified. This causes qpdf to
  27 +use a dummy input file that contains zero pages. The only normal use
  28 +case for using :samp:`--empty` would be if you were
  29 +going to add pages from another source, as discussed in :ref:`ref.page-selection`.
  30 +
  31 +If :samp:`@filename` appears as a word anywhere in the
  32 +command-line, it will be read line by line, and each line will be
  33 +treated as a command-line argument. Leading and trailing whitespace is
  34 +intentionally not removed from lines, which makes it possible to handle
  35 +arguments that start or end with spaces. The :samp:`@-`
  36 +option allows arguments to be read from standard input. This allows qpdf
  37 +to be invoked with an arbitrary number of arbitrarily long arguments. It
  38 +is also very useful for avoiding having to pass passwords on the command
  39 +line. Note that the :samp:`@filename` can't appear in
  40 +the middle of an argument, so constructs such as
  41 +:samp:`--arg=@option` will not work. You would have to
  42 +include the argument and its options together in the arguments file.
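As a rough illustration of the arguments-file mechanism described above, the following Python sketch writes one argument per line to a private temporary file and builds a command that refers to it with ``@``. The helper names ``write_args_file`` and ``build_command`` are hypothetical and not part of qpdf; the sketch only assumes the documented behavior that each line is taken verbatim as one argument.

```python
import os
import tempfile
from typing import List


def write_args_file(args: List[str], directory: str) -> str:
    """Write one argument per line and return the file's path.

    Hypothetical helper. Whitespace inside each argument is written
    unchanged, since the text above says it is not stripped.
    """
    for arg in args:
        if "\n" in arg or "\r" in arg:
            raise ValueError("an argument cannot contain a line break")
    # mkstemp creates the file readable and writable only by the owner,
    # which is useful when a line carries a password.
    fd, path = tempfile.mkstemp(suffix=".args", dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as handle:
        for arg in args:
            handle.write(arg + "\n")
    return path


def build_command(args_path: str, infile: str, outfile: str) -> List[str]:
    """Build an argument vector that pulls options from the file."""
    return ["qpdf", "@" + args_path, infile, outfile]
```

Note that the whole ``--password=...`` option goes on a single line of the file, because ``@filename`` is not recognized in the middle of an argument.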
  43 +
  44 +:samp:`outfilename` does not have to be seekable, even
  45 +when generating linearized files. Specifying ":samp:`-`"
  46 +as :samp:`outfilename` means to write to standard
  47 +output. If you want to overwrite the input file with the output, use the
  48 +option :samp:`--replace-input` and omit the output file
  49 +name. You can't specify the same file as both the input and the output.
  50 +If you do this, qpdf will tell you about the
  51 +:samp:`--replace-input` option.
  52 +
  53 +Most options require an output file, but some testing or inspection
  54 +commands do not. These are specifically noted.
  55 +
  56 +.. _ref.exit-status:
  57 +
  58 +Exit Status
  59 +~~~~~~~~~~~
  60 +
  61 +The exit status of :command:`qpdf` may be interpreted as
  62 +follows:
  63 +
  64 +- ``0``: no errors or warnings were found. The file may still have
  65 + problems qpdf can't detect. If
  66 + :samp:`--warning-exit-0` was specified, exit status 0
  67 + is used even if there are warnings.
  68 +
  69 +- ``2``: errors were found. qpdf was not able to fully process the
  70 + file.
  71 +
  72 +- ``3``: qpdf encountered problems that it was able to recover from. In
  73 + some cases, the resulting file may still be damaged. Note that qpdf
  74 + still exits with status ``3`` if it finds warnings even when
  75 + :samp:`--no-warn` is specified. With
  76 + :samp:`--warning-exit-0`, warnings without errors
  77 + exit with status 0 instead of 3.
  78 +
  79 +Note that :command:`qpdf` never exits with status ``1``.
  80 +If you get an exit status of ``1``, it was something else, like the
  81 +shell not being able to find or execute :command:`qpdf`.
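A script that wraps qpdf might interpret the exit status along these lines. This is a minimal sketch of the rules listed above; the function name and the returned labels are hypothetical.

```python
def describe_exit_status(status: int) -> str:
    """Map a qpdf exit status to a short label (labels are illustrative)."""
    if status == 0:
        # Also reached for warnings when --warning-exit-0 was given.
        return "ok"
    if status == 2:
        return "errors"
    if status == 3:
        return "warnings"
    # qpdf itself never exits with 1; the shell or OS produced this value.
    return "not-from-qpdf"
```

A caller would typically pass ``subprocess.run(...).returncode`` to this function and treat ``"warnings"`` as a prompt to inspect the output file.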
  82 +
  83 +.. _ref.shell-completion:
  84 +
  85 +Shell Completion
  86 +----------------
  87 +
  88 +Starting in qpdf version 8.3.0, qpdf provides its own completion support
  89 +for zsh and bash. You can enable bash completion with :command:`eval
  90 +$(qpdf --completion-bash)` and zsh completion with
  91 +:command:`eval $(qpdf --completion-zsh)`. If
  92 +:command:`qpdf` is not in your path, you should invoke it
  93 +above with an absolute path. If you invoke it with a relative path, it
  94 +will warn you, and the completion won't work if you're in a different
  95 +directory.
  96 +
  97 +qpdf will use ``argv[0]`` to figure out where its executable is. This
  98 +may produce unwanted results in some cases, especially if you are trying
  99 +to use completion with a copy of qpdf that is built from source. You can
  100 +specify a full path to the qpdf you want to use for completion in the
  101 +``QPDF_EXECUTABLE`` environment variable.
  102 +
  103 +.. _ref.basic-options:
  104 +
  105 +Basic Options
  106 +-------------
  107 +
  108 +The following options are the most common ones and perform commonly
  109 +needed transformations.
  110 +
  111 +:samp:`--help`
  112 + Display command-line invocation help.
  113 +
  114 +:samp:`--version`
  115 + Display the current version of qpdf.
  116 +
  117 +:samp:`--copyright`
  118 + Show detailed copyright information.
  119 +
  120 +:samp:`--show-crypto`
  121 + Show a list of available crypto providers, each on a line by itself.
  122 + The default provider is always listed first. See :ref:`ref.crypto` for more information about crypto
  123 + providers.
  124 +
  125 +:samp:`--completion-bash`
  126 + Output a completion command you can eval to enable shell completion
  127 + from bash.
  128 +
  129 +:samp:`--completion-zsh`
  130 + Output a completion command you can eval to enable shell completion
  131 + from zsh.
  132 +
  133 +:samp:`--password={password}`
  134 + Specifies a password for accessing encrypted files. To read the
  135 + password from a file or standard input, you can use
  136 + :samp:`--password-file`, added in qpdf 10.2. Note
  137 + that you can also use :samp:`@filename` or
  138 + :samp:`@-` as described above to put the password in
  139 + a file or pass it via standard input, but you would do so by
  140 + specifying the entire
  141 + :samp:`--password={password}`
  142 + option in the file. Syntax such as
  143 + :samp:`--password=@filename` won't work since
  144 + :samp:`@filename` is not recognized in the middle of
  145 + an argument.
  146 +
  147 +:samp:`--password-file={filename}`
  148 + Reads the first line from the specified file and uses it as the
  149 + password for accessing encrypted files.
  150 + :samp:`{filename}`
  151 + may be ``-`` to read the password from standard input. Note that, in
  152 + this case, the password is echoed and there is no prompt, so use with
  153 + caution.
  154 +
  155 +:samp:`--is-encrypted`
  156 + Silently exit with status 0 if the file is encrypted or status 2 if
  157 + the file is not encrypted. This is useful for shell scripts. Other
  158 + options are ignored if this is given. This option is mutually
  159 + exclusive with :samp:`--requires-password`. Both this
  160 + option and :samp:`--requires-password` exit with
  161 + status 2 for non-encrypted files.
  162 +
  163 +:samp:`--requires-password`
  164 + Silently exit with status 0 if a password (other than as supplied) is
  165 + required. Exit with status 2 if the file is not encrypted. Exit with
  166 + status 3 if the file is encrypted but requires no password or the
  167 + correct password has been supplied. This is useful for shell scripts.
  168 + Note that any supplied password is used when opening the file. When
  169 + used with a :samp:`--password` option, this option
  170 + can be used to check the correctness of the password. In that case,
  171 + an exit status of 3 means the file works with the supplied password.
  172 + This option is mutually exclusive with
  173 + :samp:`--is-encrypted`. Both this option and
  174 + :samp:`--is-encrypted` exit with status 2 for
  175 + non-encrypted files.
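Because these two options reuse exit statuses 0, 2, and 3 with different meanings, a script may want to decode them explicitly. The following sketch encodes the rules stated above; the function names and return values are hypothetical.

```python
def interpret_is_encrypted(status: int) -> bool:
    """Decode the exit status of a run with --is-encrypted."""
    if status == 0:
        return True
    if status == 2:
        return False
    raise ValueError(f"unexpected exit status {status}")


def interpret_requires_password(status: int) -> str:
    """Decode the exit status of a run with --requires-password."""
    if status == 0:
        return "password-required"   # none given, or the given one is wrong
    if status == 2:
        return "not-encrypted"
    if status == 3:
        return "opens-as-supplied"   # no password needed, or correct one given
    raise ValueError(f"unexpected exit status {status}")
```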
  176 +
  177 +:samp:`--verbose`
  178 + Increase verbosity of output. For now, this just prints some
  179 + indication of any file that it creates.
  180 +
  181 +:samp:`--progress`
  182 + Indicate progress while writing files.
  183 +
  184 +:samp:`--no-warn`
  185 + Suppress writing of warnings to stderr. If warnings were detected and
  186 + suppressed, :command:`qpdf` will still exit with exit
  187 + code 3. See also :samp:`--warning-exit-0`.
  188 +
  189 +:samp:`--warning-exit-0`
  190 + If warnings are found but no errors, exit with exit code 0 instead of 3.
  191 + When combined with :samp:`--no-warn`, the effect is
  192 + for :command:`qpdf` to completely ignore warnings.
  193 +
  194 +:samp:`--linearize`
  195 + Causes generation of a linearized (web-optimized) output file.
  196 +
  197 +:samp:`--replace-input`
  198 + If specified, the output file name should be omitted. This option
  199 + tells qpdf to replace the input file with the output. It does this by
  200 + writing to
  201 + :file:`{infilename}.~qpdf-temp#`
  202 + and, when done, overwriting the input file with the temporary file.
  203 + If there were any warnings, the original input is saved as
  204 + :file:`{infilename}.~qpdf-orig`.
  205 +
  206 +:samp:`--copy-encryption=file`
  207 + Encrypt the file using the same encryption parameters, including user
  208 + and owner password, as the specified file. Use
  209 + :samp:`--encryption-file-password` to specify a
  210 + password if one is needed to open this file. Note that copying the
  211 + encryption parameters from a file also copies the first half of
  212 + ``/ID`` from the file since this is part of the encryption
  213 + parameters.
  214 +
  215 +:samp:`--encryption-file-password=password`
  216 + If the file specified with :samp:`--copy-encryption`
  217 + requires a password, specify the password using this option. Note
  218 + that only one of the user or owner password is required. Both
  219 + passwords will be preserved since QPDF does not distinguish between
  220 + the two passwords. It is possible to preserve encryption parameters,
  221 + including the owner password, from a file even if you don't know the
  222 + file's owner password.
  223 +
  224 +:samp:`--allow-weak-crypto`
  225 + Starting with version 10.4, qpdf issues warnings when requested to
  226 + create files using RC4 encryption. This option suppresses those
  227 + warnings. In future versions of qpdf, qpdf will refuse to create
  228 + files with weak cryptography when this flag is not given. See :ref:`ref.weak-crypto` for additional details.
  229 +
  230 +:samp:`--encrypt options --`
  231 + Causes generation of an encrypted output file. Please see :ref:`ref.encryption-options` for details on how to specify
  232 + encryption parameters.
  233 +
  234 +:samp:`--decrypt`
  235 + Removes any encryption on the file. A password must be supplied if
  236 + the file is password protected.
  237 +
  238 +:samp:`--password-is-hex-key`
  239 + Overrides the usual computation/retrieval of the PDF file's
  240 + encryption key from user/owner password with an explicit
  241 + specification of the encryption key. When this option is specified,
  242 + the argument to the :samp:`--password` option is
  243 + interpreted as a hexadecimal-encoded key value. This only applies to
  244 + the password used to open the main input file. It does not apply to
  245 + other files opened by :samp:`--pages` or other
  246 + options or to files being written.
  247 +
  248 + Most users will never have a need for this option, and no standard
  249 + viewers support this mode of operation, but it can be useful for
  250 + forensic or investigatory purposes. For example, if a PDF file is
  251 + encrypted with an unknown password, a brute-force attack using the
  252 + key directly is sometimes more efficient than one using the password.
  253 + Also, if a file is heavily damaged, it may be possible to derive the
  254 + encryption key and recover parts of the file using it directly. To
  255 + expose the encryption key used by an encrypted file that you can open
  256 + normally, use the :samp:`--show-encryption-key`
  257 + option.
  258 +
  259 +:samp:`--suppress-password-recovery`
  260 + Ordinarily, qpdf attempts to automatically compensate for passwords
  261 + specified in the wrong character encoding. This option suppresses
  262 + that behavior. Under normal conditions, there are no reasons to use
  263 + this option. See :ref:`ref.unicode-passwords` for a
  264 + discussion.
  265 +
  266 +:samp:`--password-mode={mode}`
  267 + This option can be used to fine-tune how qpdf interprets Unicode
  268 + (non-ASCII) password strings passed on the command line. With the
  269 + exception of the :samp:`hex-bytes` mode, these only
  270 + apply to passwords provided when encrypting files. The
  271 + :samp:`hex-bytes` mode also applies to passwords
  272 + specified for reading files. For additional discussion of the
  273 + supported password modes and when you might want to use them, see
  274 + :ref:`ref.unicode-passwords`. The following modes
  275 + are supported:
  276 +
  277 + - :samp:`auto`: Automatically determine whether the
  278 + specified password is a properly encoded Unicode (UTF-8) string,
  279 + and transcode it as required by the PDF spec based on the type of
  280 + encryption being applied. On Windows starting with version 8.4.0,
  281 + and on almost all other modern platforms, incoming passwords will
  282 + be properly encoded in UTF-8, so this is almost always what you
  283 + want.
  284 +
  285 + - :samp:`unicode`: Tells qpdf that the incoming
  286 + password is UTF-8, overriding whatever its automatic detection
  287 + determines. The only difference between this mode and
  288 + :samp:`auto` is that qpdf will fail with an error
  289 + message if the password is not valid UTF-8 instead of falling back
  290 + to :samp:`bytes` mode with a warning.
  291 +
  292 + - :samp:`bytes`: Interpret the password as a literal
  293 + byte string. For non-Windows platforms, this is what versions of
  294 + qpdf prior to 8.4.0 did. For Windows platforms, there is no way to
  295 + specify strings of binary data on the command line directly, but
  296 + you can use the :samp:`@filename` option to do it,
  297 + in which case this option forces qpdf to respect the string of
  298 + bytes as provided. This option will allow you to encrypt PDF files
  299 + with passwords that will not be usable by other readers.
  300 +
  301 + - :samp:`hex-bytes`: Interpret the password as a
  302 + hex-encoded string. This provides a way to pass binary data as a
  303 + password on all platforms including Windows. As with
  304 + :samp:`bytes`, this option may allow creation of
  305 + files that can't be opened by other readers. This mode affects
  306 + qpdf's interpretation of passwords specified for decrypting files
  307 + as well as for encrypting them. It makes it possible to specify
  308 + strings that are encoded in some manner other than the system's
  309 + default encoding.
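To illustrate the :samp:`hex-bytes` mode, the sketch below produces the hex-encoded form of a password in a chosen character encoding. It covers only the encoding step; the function name is hypothetical, and it assumes that qpdf accepts lowercase hexadecimal digits.

```python
def hex_password(password: str, encoding: str) -> str:
    """Return the hex encoding of the password's bytes in `encoding`.

    Hypothetical helper for use with --password-mode=hex-bytes.
    """
    return password.encode(encoding).hex()


# The same text yields different byte strings in different encodings.
latin1_form = hex_password("pässword", "latin-1")
utf8_form = hex_password("pässword", "utf-8")
```

The result would then be passed as the value of :samp:`--password` together with :samp:`--password-mode=hex-bytes`.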
  310 +
  311 +:samp:`--rotate=[+|-]angle[:page-range]`
  312 + Apply rotation to specified pages. The
  313 + :samp:`page-range` portion of the option value has
  314 + the same format as page ranges in :ref:`ref.page-selection`. If the page range is omitted, the
  315 + rotation is applied to all pages. The :samp:`angle`
  316 + portion of the parameter may be either 0, 90, 180, or 270. If
  317 + preceded by :samp:`+` or :samp:`-`,
  318 + the angle is added to or subtracted from the specified pages'
  319 + original rotations. This is almost always what you want. Otherwise
  320 + the pages' rotations are set to the exact value, which may cause the
  321 + appearances of the pages to be inconsistent, especially for scans.
  322 + For example, the command :command:`qpdf in.pdf out.pdf
  323 + --rotate=+90:2,4,6 --rotate=180:7-8` would rotate pages
  324 + 2, 4, and 6 90 degrees clockwise from their original rotation and
  325 + force the rotation of pages 7 through 8 to 180 degrees regardless of
  326 + their original rotation, and the command :command:`qpdf in.pdf
  327 + out.pdf --rotate=+180` would rotate all pages by 180
  328 + degrees.
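The difference between relative and absolute rotation can be sketched as follows. This is an illustration of the rules above, not qpdf's implementation; the function names are hypothetical, and the page-range part is returned unparsed.

```python
import re
from typing import Optional, Tuple

# [+|-]angle[:page-range], where angle is 0, 90, 180, or 270.
_ROTATE = re.compile(r"([+-]?)(0|90|180|270)(?::(.+))?")


def parse_rotate(value: str) -> Tuple[str, int, Optional[str]]:
    """Split a --rotate value into (sign, angle, page_range)."""
    match = _ROTATE.fullmatch(value)
    if match is None:
        raise ValueError(f"invalid rotate value: {value!r}")
    sign, angle, page_range = match.groups()
    return sign, int(angle), page_range


def new_rotation(original: int, sign: str, angle: int) -> int:
    """Compute a page's rotation after applying one --rotate option."""
    if sign == "+":
        return (original + angle) % 360
    if sign == "-":
        return (original - angle) % 360
    return angle  # no sign: the rotation is set to the exact value
```

With ``+90``, a page whose rotation is already 270 ends up at 0, while a plain ``180`` forces 180 regardless of the starting value.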
  329 +
  330 +:samp:`--keep-files-open={[yn]}`
  331 + This option controls whether qpdf keeps individual files open while
  332 + merging. Prior to version 8.1.0, qpdf always kept all files open, but
  333 + this meant that the number of files that could be merged was limited
  334 + by the operating system's open file limit. Version 8.1.0 opened files
  335 + as they were referenced and closed them after each read, but this
  336 + caused a major performance impact. Version 8.2.0 optimized the
  337 + performance but did so in a way that, for local file systems, there
  338 + was a small but unavoidable performance hit, but for networked file
  339 + systems, the performance impact could be very high. Starting with
  340 + version 8.2.1, the default behavior is that files are kept open if no
  341 + more than 200 files are specified, but this default behavior can be
  342 + explicitly overridden with the
  343 + :samp:`--keep-files-open` flag. If you are merging
  344 + more than 200 files but fewer than the operating system's max open
  345 + files limit, you may want to use
  346 + :samp:`--keep-files-open=y`, especially if working
  347 + over a networked file system. If you are using a local file system
  348 + where the overhead is low and you might sometimes merge more than the
  349 + OS limit's number of files from a script and are not worried about a
  350 + few seconds additional processing time, you may want to specify
  351 + :samp:`--keep-files-open=n`. The threshold for
  352 + switching may be changed from the default 200 with the
  353 + :samp:`--keep-files-open-threshold` option.
  354 +
  355 +:samp:`--keep-files-open-threshold={count}`
  356 + If specified, overrides the default value of 200 used as the
  357 + threshold for qpdf deciding whether or not to keep files open. See
  358 + :samp:`--keep-files-open` for details.
  359 +
  360 +:samp:`--pages options --`
  361 + Select specific pages from one or more input files. See :ref:`ref.page-selection` for details on how to do
  362 + page selection (splitting and merging).
  363 +
  364 +:samp:`--collate={n}`
  365 + When specified, collate rather than concatenate pages from files
  366 + specified with :samp:`--pages`. With a numeric
  367 + argument, collate in groups of :samp:`{n}`.
  368 + The default is 1. See :ref:`ref.page-selection` for additional details.
  369 +
  370 +:samp:`--flatten-rotation`
  371 + For each page that is rotated using the ``/Rotate`` key in the page's
  372 + dictionary, remove the ``/Rotate`` key and implement the identical
  373 + rotation semantics by modifying the page's contents. This option can
  374 + be useful to prepare files for buggy PDF applications that don't
  375 + properly handle rotated pages.
  376 +
  377 +:samp:`--split-pages=[n]`
  378 + Write each group of :samp:`n` pages to a separate
  379 + output file. If :samp:`n` is not specified, create
  380 + single pages. Output file names are generated as follows:
  381 +
  382 + - If the string ``%d`` appears in the output file name, it is
  383 + replaced with a range of zero-padded page numbers starting from 1.
  384 +
  385 + - Otherwise, if the output file name ends in
  386 + :file:`.pdf` (case insensitive), a zero-padded
  387 + page range, preceded by a dash, is inserted before the file
  388 + extension.
  389 +
  390 + - Otherwise, the file name is appended with a zero-padded page range
  391 + preceded by a dash.
  392 +
  393 + Page ranges are a single number in the case of single-page groups or
  394 + two numbers separated by a dash otherwise. For example, if
  395 + :file:`infile.pdf` has 12 pages
  396 +
  397 + - :command:`qpdf --split-pages infile.pdf %d-out`
  398 + would generate files :file:`01-out` through
  399 + :file:`12-out`
  400 +
  401 + - :command:`qpdf --split-pages=2 infile.pdf
  402 + outfile.pdf` would generate files
  403 + :file:`outfile-01-02.pdf` through
  404 + :file:`outfile-11-12.pdf`
  405 +
  406 + - :command:`qpdf --split-pages infile.pdf
  407 + something.else` would generate files
  408 + :file:`something.else-01` through
  409 + :file:`something.else-12`
  410 +
  411 + Note that outlines, threads, and other global features of the
  412 + original PDF file are not preserved. For each page of output, this
  413 + option creates an empty PDF and copies a single page from the output
  414 + into it. If you require the global data, you will have to run
  415 + :command:`qpdf` with the
  416 + :samp:`--pages` option once for each file. Using
  417 + :samp:`--split-pages` is much faster if you don't
  418 + require the global data.
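The naming rules for :samp:`--split-pages` can be sketched in Python as below. The function name is hypothetical. Two details are assumptions inferred from the examples rather than stated rules: the padding width equals the number of digits in the total page count, and a final group containing only one page is named with a single number.

```python
from typing import List


def split_output_names(outfile: str, num_pages: int, group_size: int = 1) -> List[str]:
    """Predict the output file names for a --split-pages run (sketch)."""
    width = len(str(num_pages))  # assumption: pad to the widest page number
    names = []
    for first in range(1, num_pages + 1, group_size):
        last = min(first + group_size - 1, num_pages)
        page_range = f"{first:0{width}d}"
        if last != first:
            page_range += f"-{last:0{width}d}"
        if "%d" in outfile:
            name = outfile.replace("%d", page_range)
        elif outfile.lower().endswith(".pdf"):
            # Insert the range, preceded by a dash, before the extension.
            name = f"{outfile[:-4]}-{page_range}{outfile[-4:]}"
        else:
            name = f"{outfile}-{page_range}"
        names.append(name)
    return names
```

For a 12-page input, this reproduces the three examples given above.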
  419 +
  420 +:samp:`--overlay options --`
  421 + Overlay pages from another file onto the output pages. See :ref:`ref.overlay-underlay` for details on
  422 + overlay/underlay.
  423 +
  424 +:samp:`--underlay options --`
  425 + Underlay pages from another file beneath the output pages. See :ref:`ref.overlay-underlay` for details on
  426 + overlay/underlay.
  427 +
  428 +Password-protected files may be opened by specifying a password. By
  429 +default, qpdf will preserve any encryption data associated with a file.
  430 +If :samp:`--decrypt` is specified, qpdf will attempt to
  431 +remove any encryption information. If :samp:`--encrypt`
  432 +is specified, qpdf will replace the document's encryption parameters
  433 +with whatever is specified.
  434 +
  435 +Note that qpdf does not obey encryption restrictions already imposed on
  436 +the file. Doing so would be meaningless since qpdf can be used to remove
  437 +encryption from the file entirely. This functionality is not intended to
  438 +be used for bypassing copyright restrictions or other restrictions
  439 +placed on files by their producers.
  440 +
  441 +Prior to 8.4.0, in the case of passwords that contain characters that
  442 +fall outside of 7-bit US-ASCII, qpdf left the burden of supplying
  443 +properly encoded encryption and decryption passwords to the user.
  444 +Starting in qpdf 8.4.0, qpdf does this automatically in most cases. For
  445 +an in-depth discussion, please see :ref:`ref.unicode-passwords`. Previous versions of this manual
  446 +described workarounds using the :command:`iconv` command.
  447 +Such workarounds are no longer required or recommended with qpdf 8.4.0.
  448 +However, for backward compatibility, qpdf attempts to detect those
  449 +workarounds and do the right thing in most cases.
  450 +
  451 +.. _ref.encryption-options:
  452 +
  453 +Encryption Options
  454 +------------------
  455 +
  456 +To change the encryption parameters of a file, use the :samp:`--encrypt` flag.
  457 +The syntax is
  458 +
  459 +::
  460 +
  461 + --encrypt user-password owner-password key-length [ restrictions ] --
  462 +
  463 +Note that ":samp:`--`" terminates parsing of encryption
  464 +flags and must be present even if no restrictions are present.
  465 +
  466 +Either or both of the user password and the owner password may be empty
  467 +strings. Starting in qpdf 10.2, qpdf defaults to not allowing creation
  468 +of PDF files with a non-empty user password, an empty owner password,
  469 +and a 256-bit key since such files can be opened with no password. If
  470 +you want to create such files, specify the encryption option
  471 +:samp:`--allow-insecure`, as described below.
  472 +
  473 +The value for
  474 +:samp:`{key-length}` may
  475 +be 40, 128, or 256. The restriction flags are dependent upon key length.
  476 +When no additional restrictions are given, the default is to be fully
  477 +permissive.
  478 +
  479 +If :samp:`{key-length}`
  480 +is 40, the following restriction options are available:
  481 +
  482 +:samp:`--print=[yn]`
  483 + Determines whether or not to allow printing.
  484 +
  485 +:samp:`--modify=[yn]`
  486 + Determines whether or not to allow document modification.
  487 +
  488 +:samp:`--extract=[yn]`
  489 + Determines whether or not to allow text/image extraction.
  490 +
  491 +:samp:`--annotate=[yn]`
  492 + Determines whether or not to allow comments and form fill-in and
  493 + signing.
  494 +
  495 +If :samp:`{key-length}`
  496 +is 128, the following restriction options are available:
  497 +
  498 +:samp:`--accessibility=[yn]`
  499 + Determines whether or not to allow accessibility to visually
  500 + impaired. The qpdf library disregards this field when AES is used or
  501 + when 256-bit encryption is used. You should really never disable
  502 + accessibility, but qpdf lets you do it in case you need to configure
  503 + a file this way for testing purposes. The PDF spec says that
  504 + conforming readers should disregard this permission and always allow
  505 + accessibility.
  506 +
  507 +:samp:`--extract=[yn]`
  508 + Determines whether or not to allow text/graphic extraction.
  509 +
  510 +:samp:`--assemble=[yn]`
  511 + Determines whether document assembly (rotation and reordering of
  512 + pages) is allowed.
  513 +
  514 +:samp:`--annotate=[yn]`
  515 + Determines whether modifying annotations is allowed. This includes
  516 + adding comments and filling in form fields. Also allows editing of
  517 + form fields if :samp:`--modify-other=y` is given.
  518 +
  519 +:samp:`--form=[yn]`
  520 + Determines whether filling form fields is allowed.
  521 +
  522 +:samp:`--modify-other=[yn]`
  523 + Allow all document editing except those controlled separately by the
  524 + :samp:`--assemble`,
  525 + :samp:`--annotate`, and
  526 + :samp:`--form` options.
  527 +
  528 +:samp:`--print={print-opt}`
  529 + Controls printing access.
  530 + :samp:`{print-opt}`
  531 + may be one of the following:
  532 +
  533 + - :samp:`full`: allow full printing
  534 +
  535 + - :samp:`low`: allow low-resolution printing only
  536 +
  537 + - :samp:`none`: disallow printing
  538 +
  539 +:samp:`--modify={modify-opt}`
  540 + Controls modify access. This way of controlling modify access has
  541 + less granularity than new options added in qpdf 8.4.
  542 + :samp:`{modify-opt}`
  543 + may be one of the following:
  544 +
  545 + - :samp:`all`: allow full document modification
  546 +
  547 + - :samp:`annotate`: allow comment authoring, form
  548 + operations, and document assembly
  549 +
  550 + - :samp:`form`: allow form field fill-in and signing
  551 + and document assembly
  552 +
  553 + - :samp:`assembly`: allow document assembly only
  554 +
  555 + - :samp:`none`: allow no modifications
  556 +
  557 + Using the :samp:`--modify` option does not allow you
  558 + to create certain combinations of permissions such as allowing form
  559 + filling but not allowing document assembly. Starting with qpdf 8.4,
  560 + you can either just use the other options to control fields
  561 + individually, or you can use something like :samp:`--modify=form
  562 + --assemble=n` to fine tune.
  563 +
  564 +:samp:`--cleartext-metadata`
  565 + If specified, any metadata stream in the document will be left
  566 + unencrypted even if the rest of the document is encrypted. This also
  567 + forces the PDF version to be at least 1.5.
  568 +
  569 +:samp:`--use-aes=[yn]`
  570 + If :samp:`--use-aes=y` is specified, AES encryption
  571 + will be used instead of RC4 encryption. This forces the PDF version
  572 + to be at least 1.6.
  573 +
  574 +:samp:`--allow-insecure`
  575 + From qpdf 10.2, qpdf defaults to not allowing creation of PDF files
  576 + where the user password is non-empty, the owner password is empty,
  577 + and a 256-bit key is in use. Files created in this way are insecure
  578 + since they can be opened without a password. Users would ordinarily
  579 + never want to create such files. If you are using qpdf to
  580 + intentionally create strange files for testing (a definite valid use
  581 + of qpdf!), this option allows you to create such insecure files.
  582 +
  583 +:samp:`--force-V4`
  584 + Use of this option forces the ``/V`` and ``/R`` parameters in the
  585 + document's encryption dictionary to be set to the value ``4``. As
  586 + qpdf will automatically do this when required, there is no reason to
  587 + ever use this option. It exists primarily for use in testing qpdf
  588 + itself. This option also forces the PDF version to be at least 1.5.
  589 +
  590 +If :samp:`{key-length}`
  591 +is 256, the minimum PDF version is 1.7 with extension level 8, and the
  592 +AES-based encryption format used is the PDF 2.0 encryption method
  593 +supported by Acrobat X. The same options are available as with 128 bits
  594 +with the following exceptions:
  595 +
  596 +:samp:`--use-aes`
  597 + This option is not available with 256-bit keys. AES is always used
  598 + with 256-bit encryption keys.
  599 +
  600 +:samp:`--force-V4`
  601 + This option is not available with 256-bit keys.
  602 +
  603 +:samp:`--force-R5`
  604 + If specified, qpdf sets the minimum version to 1.7 at extension level
  605 + 3 and writes the deprecated encryption format used by Acrobat version
  606 + IX. This option should not be used in practice to generate PDF files
  607 + that will be in general use, but it can be useful to generate files
  608 + if you are trying to test proper support in another application for
  609 + PDF files encrypted in this way.
  610 +
  611 +The default for each permission option is to be fully permissive.
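A script that assembles the :samp:`--encrypt` portion of a command line might look like the following sketch. The function name and its parameters are hypothetical. It mirrors the syntax and the insecure-combination rule described above on the caller's side; it does not check that each restriction is valid for the chosen key length.

```python
from typing import List, Sequence


def build_encrypt_args(
    user_password: str,
    owner_password: str,
    key_length: int,
    restrictions: Sequence[str] = (),
    allow_insecure: bool = False,
) -> List[str]:
    """Build the argument list from --encrypt through the closing --."""
    if key_length not in (40, 128, 256):
        raise ValueError("key length must be 40, 128, or 256")
    args = ["--encrypt", user_password, owner_password, str(key_length)]
    # Non-empty user password, empty owner password, 256-bit key:
    # refused by default starting in qpdf 10.2.
    insecure = key_length == 256 and user_password != "" and owner_password == ""
    if insecure:
        if not allow_insecure:
            raise ValueError("insecure combination; pass allow_insecure=True")
        args.append("--allow-insecure")
    args.extend(restrictions)
    args.append("--")  # required even when there are no restrictions
    return args
```

For example, ``build_encrypt_args("", "owner", 128, ["--print=low", "--modify=none"])`` yields a list that can be spliced between the qpdf options and the input file name.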
  612 +
  613 +.. _ref.page-selection:
  614 +
  615 +Page Selection Options
  616 +----------------------
  617 +
  618 +Starting with qpdf 3.0, it is possible to split and merge PDF files by
  619 +selecting pages from one or more input files. Whatever file is given as
  620 +the primary input file is used as the starting point, but its pages are
  621 +replaced with pages as specified.
  622 +
  623 +::
  624 +
  625 + --pages input-file [ --password=password ] [ page-range ] [ ... ] --
  626 +
  627 +Multiple input files may be specified. Each one is given as the name of
  628 +the input file, an optional password (if required to open the file), and
  629 +the range of pages. Note that ":samp:`--`" terminates
  630 +parsing of page selection flags.
  631 +
  632 +Starting with qpdf 8.4, the special input file name
  633 +":file:`.`" can be used as a shortcut for the
  634 +primary input filename.
  635 +
  636 +For each file that pages should be taken from, specify the file, a
  637 +password needed to open the file (if any), and a page range. The
  638 +password needs to be given only once per file. If any of the input files
  639 +are the same as the primary input file or the file used to copy
  640 +encryption parameters (if specified), you do not need to repeat the
  641 +password here. The same file can be repeated multiple times. If a file
  642 +that is repeated has a password, the password only has to be given the
  643 +first time. All non-page data (info, outlines, page numbers, etc.) are
  644 +taken from the primary input file. To discard these, use
  645 +:samp:`--empty` as the primary input.
  646 +
  647 +Starting with qpdf 5.0.0, it is possible to omit the page range. If qpdf
  648 +sees a value in the place where it expects a page range and that value
  649 +is not a valid range but is a valid file name, qpdf will implicitly use
  650 +the range ``1-z``, meaning that it will include all pages in the file.
  651 +This makes it possible to easily combine all pages in a set of files
  652 +with a command like :command:`qpdf --empty out.pdf --pages \*.pdf
  653 +--`.
  654 +
  655 +The page range is a set of numbers separated by commas, ranges of
  656 +numbers separated by dashes, or combinations of those. The character "z"
  657 +represents the last page. A number preceded by an "r" indicates to count
  658 +from the end, so ``r3-r1`` would be the last three pages of the
  659 +document. Pages can appear in any order. Ranges can appear with a high
  660 +number followed by a low number, which causes the pages to appear in
  661 +reverse. Numbers may be repeated in a page range. A page range may be
  662 +optionally appended with ``:even`` or ``:odd`` to indicate only the even
  663 +or odd pages in the given range. Note that even and odd refer to the
  664 +positions within the specified range, not whether the original number
  665 +is even or odd.
  666 +
  667 +Example page ranges:
  668 +
  669 +- ``1,3,5-9,15-12``: pages 1, 3, 5, 6, 7, 8, 9, 15, 14, 13, and 12 in
  670 + that order.
  671 +
  672 +- ``z-1``: all pages in the document in reverse
  673 +
  674 +- ``r3-r1``: the last three pages of the document
  675 +
  676 +- ``r1-r3``: the last three pages of the document in reverse order
  677 +
  678 +- ``1-20:even``: even pages from 2 to 20
  679 +
  680 +- ``5,7-9,12:odd``: pages 5, 8, and 12, which are the pages in odd
  681 + positions from among the original range, which represents pages 5, 7,
  682 + 8, 9, and 12.
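
The page range rules above can be modeled in a few lines of Python. This is only a sketch of the semantics as described in this section, not qpdf's actual parser; the function name ``parse_page_range`` and its ``page_count`` parameter are hypothetical.

```python
from __future__ import annotations


def parse_page_range(spec: str, page_count: int) -> list[int]:
    """Expand a range such as "1,3,5-9,15-12:odd" into 1-based page numbers."""
    # Split off an optional ":even" or ":odd" suffix.
    body, _, parity = spec.partition(":")
    if parity not in ("", "even", "odd"):
        raise ValueError(f"unknown modifier: {parity!r}")

    def page(token: str) -> int:
        # "z" is the last page; "rN" counts from the end, so "r1" is the last page.
        if token == "z":
            number = page_count
        elif token.startswith("r"):
            number = page_count + 1 - int(token[1:])
        else:
            number = int(token)
        if not 1 <= number <= page_count:
            raise ValueError(f"page out of range: {token!r}")
        return number

    pages: list[int] = []
    for part in body.split(","):
        first, dash, last = part.partition("-")
        start = page(first)
        end = page(last) if dash else start
        # A high-to-low range yields the pages in reverse order.
        step = 1 if start <= end else -1
        pages.extend(range(start, end + step, step))

    # even/odd select by position within the expanded list, not by page number.
    if parity == "odd":
        pages = pages[0::2]
    elif parity == "even":
        pages = pages[1::2]
    return pages
```

For instance, ``parse_page_range("5,7-9,12:odd", 20)`` expands the range to pages 5, 7, 8, 9, and 12 and then keeps the first, third, and fifth entries.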
  683 +
  684 +Starting in qpdf version 8.3, you can specify the
  685 +:samp:`--collate` option. Note that this option is
  686 +specified outside of :samp:`--pages ... --`. When
  687 +:samp:`--collate` is specified, it changes the meaning
  688 +of :samp:`--pages` so that the specified files, as
  689 +modified by page ranges, are collated rather than concatenated. For
  690 +example, if you add the files :file:`odd.pdf` and
  691 +:file:`even.pdf` containing odd and even pages of a
  692 +document respectively, you could run :command:`qpdf --collate odd.pdf
  693 +--pages odd.pdf even.pdf -- all.pdf` to collate the pages.
  694 +This would pick page 1 from odd, page 1 from even, page 2 from odd, page
  695 +2 from even, etc. until all pages have been included. Any number of
  696 +files and page ranges can be specified. If any file has fewer pages,
  697 +that file is just skipped when its pages have all been included. For
  698 +example, if you ran :command:`qpdf --collate --empty --pages a.pdf
  699 +1-5 b.pdf 6-4 c.pdf r1 -- out.pdf`, you would get the
  700 +following pages in this order:
  701 +
  702 +- a.pdf page 1
  703 +
  704 +- b.pdf page 6
  705 +
  706 +- c.pdf last page
  707 +
  708 +- a.pdf page 2
  709 +
  710 +- b.pdf page 5
  711 +
  712 +- a.pdf page 3
  713 +
  714 +- b.pdf page 4
  715 +
  716 +- a.pdf page 4
  717 +
  718 +- a.pdf page 5
  719 +
  720 +Starting in qpdf version 10.2, you may specify a numeric argument to
  721 +:samp:`--collate`. With
  722 +:samp:`--collate={n}`,
  723 +pull groups of :samp:`{n}` pages from each file,
  724 +again, stopping when there are no more pages. For example, if you ran
  725 +:command:`qpdf --collate=2 --empty --pages a.pdf 1-5 b.pdf 6-4 c.pdf
  726 +r1 -- out.pdf`, you would get the following pages in this
  727 +order:
  728 +
  729 +- a.pdf page 1
  730 +
  731 +- a.pdf page 2
  732 +
  733 +- b.pdf page 6
  734 +
  735 +- b.pdf page 5
  736 +
  737 +- c.pdf last page
  738 +
  739 +- a.pdf page 3
  740 +
  741 +- a.pdf page 4
  742 +
  743 +- b.pdf page 4
  744 +
  745 +- a.pdf page 5
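
The collation order can be reproduced with a short Python model. This is a sketch of the behavior described above, not qpdf's implementation; the ``collate`` function and the ``(name, pages)`` input shape are assumptions made for illustration, and each page list is assumed to be already expanded from its page range.

```python
def collate(selections, n=1):
    """Interleave page lists, taking n pages at a time from each file in turn."""
    if n < 1:
        raise ValueError("n must be at least 1")
    result = []
    offset = 0
    longest = max((len(pages) for _, pages in selections), default=0)
    while offset < longest:
        for name, pages in selections:
            # A file whose pages are used up contributes an empty slice,
            # so it is simply skipped on this pass.
            result.extend((name, p) for p in pages[offset:offset + n])
        offset += n
    return result
```

Called with the three selections from the examples (``a.pdf`` pages 1 to 5, ``b.pdf`` pages 6, 5, 4, and the last page of ``c.pdf``), ``n=1`` and ``n=2`` give the two orderings listed above.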
  746 +
  747 +Starting in qpdf version 8.3, when you split and merge files, any page
  748 +labels (page numbers) are preserved in the final file. It is expected
  749 +that more document features will be preserved by splitting and merging.
  750 +In the meantime, semantics of splitting and merging vary across
  751 +features. For example, the document's outlines (bookmarks) point to
  752 +actual page objects, so if you select some pages and not others,
  753 +bookmarks that point to pages that are in the output file will work, and
  754 +remaining bookmarks will not work. A future version of
  755 +:command:`qpdf` may do a better job at handling these
  756 +issues. (Note that the qpdf library already contains all of the APIs
  757 +required in order to implement this in your own application if you need
  758 +it.) In the meantime, you can always use
  759 +:samp:`--empty` as the primary input file to avoid
  760 +copying all of that from the first file. For example, to take pages 1
  761 +through 5 from :file:`infile.pdf` while preserving
  762 +all metadata associated with that file, you could use
  763 +
  764 +::
  765 +
  766 + qpdf infile.pdf --pages . 1-5 -- outfile.pdf
  767 +
  768 +If you wanted pages 1 through 5 from
  769 +:file:`infile.pdf` but you wanted the rest of the
  770 +metadata to be dropped, you could instead run
  771 +
  772 +::
  773 +
  774 + qpdf --empty --pages infile.pdf 1-5 -- outfile.pdf
  775 +
  776 +If you wanted to take pages 1 through 5 from
  777 +:file:`file1.pdf` and pages 11 through 15 from
  778 +:file:`file2.pdf` in reverse, taking document-level
  779 +metadata from :file:`file2.pdf`, you would run
  780 +
  781 +::
  782 +
  783 + qpdf file2.pdf --pages file1.pdf 1-5 . 15-11 -- outfile.pdf
  784 +
  785 +If, for some reason, you wanted to take the first page of an encrypted
  786 +file called :file:`encrypted.pdf` with password
  787 +``pass`` and repeat it twice in an output file, and if you wanted to
  788 +drop document-level metadata but preserve encryption, you would use
  789 +
  790 +::
  791 +
  792 + qpdf --empty --copy-encryption=encrypted.pdf --encryption-file-password=pass
  793 + --pages encrypted.pdf --password=pass 1 ./encrypted.pdf --password=pass 1 --
  794 + outfile.pdf
  795 +
  796 +Note that we had to specify the password all three times because giving
  797 +a password as :samp:`--encryption-file-password` doesn't
  798 +count for page selection, and as far as qpdf is concerned,
  799 +:file:`encrypted.pdf` and
  800 +:file:`./encrypted.pdf` are separate files. These
  801 +are all corner cases that most users should hopefully never have to be
  802 +bothered with.
  803 +
  804 +Prior to version 8.4, it was not possible to specify the same page from
  805 +the same file directly more than once, and the workaround of specifying
  806 +the same file in more than one way was required. Version 8.4 removes
  807 +this limitation, but there is still a valid use case. When you specify
  808 +the same page from the same file more than once, qpdf will share objects
  809 +between the pages. If you are going to do further manipulation on the
  810 +file and need the two instances of the same original page to be deep
  811 +copies, then you can specify the file in two different ways. For example,
  812 +:command:`qpdf in.pdf --pages . 1 ./in.pdf 1 -- out.pdf`
  813 +would create a file with two copies of the first page of the input, and
  814 +the two copies would not share any objects in common. This includes fonts,
  815 +images, and anything else the page references.
  816 +
  817 +.. _ref.overlay-underlay:
  818 +
  819 +Overlay and Underlay Options
  820 +----------------------------
  821 +
  822 +Starting with qpdf 8.4, it is possible to overlay or underlay pages from
  823 +other files onto the output generated by qpdf. Specify overlay or
  824 +underlay as follows:
  825 +
  826 +::
  827 +
  828 + { --overlay | --underlay } file [ options ] --
  829 +
  830 +Overlay and underlay options are processed late, so they can be combined
  831 +with other operations like merging and will apply to the final output. The
  832 +:samp:`--overlay` and :samp:`--underlay`
  833 +options work the same way, except underlay pages are drawn underneath
  834 +the page to which they are applied, possibly obscured by the original
  835 +page, and overlay pages are drawn on top of the page to which they are
  836 +applied, possibly obscuring the page. You can combine overlay and
  837 +underlay.
  838 +
  839 +The default behavior of overlay and underlay is that pages are taken
  840 +from the overlay/underlay file in sequence and applied to corresponding
  841 +pages in the output until there are no more output pages. If the overlay
  842 +or underlay file runs out of pages, remaining output pages are left
  843 +alone. This behavior can be modified by options, which are provided
  844 +between the :samp:`--overlay` or
  845 +:samp:`--underlay` flag and the
  846 +:samp:`--` option. The following options are supported:
  847 +
  848 +- :samp:`--password=password`: supply a password if the
  849 + overlay/underlay file is encrypted.
  850 +
  851 +- :samp:`--to=page-range`: a range of pages in the same
  852 + form as described in :ref:`ref.page-selection`
  853 + indicates which pages in the output should have the overlay/underlay
  854 + applied. If not specified, overlay/underlay are applied to all pages.
  855 +
  856 +- :samp:`--from=[page-range]`: a range of pages that
  857 + specifies which pages in the overlay/underlay file will be used for
  858 + overlay or underlay. If not specified, all pages will be used. This
  859 + can be explicitly specified to be empty if
  860 + :samp:`--repeat` is used.
  861 +
  862 +- :samp:`--repeat=page-range`: an optional range of
  863 + pages that specifies which pages in the overlay/underlay file will be
  864 + repeated after the "from" pages are used up. If you want to repeat a
  865 + range of pages starting at the beginning, you can explicitly use
  866 + :samp:`--from=`.
  867 +
  868 +Here are some examples.
  869 +
  870 +- :command:`--overlay o.pdf --to=1-5 --from=1-3 --repeat=4
  871 + --`: overlay the first three pages from file
  872 + :file:`o.pdf` onto the first three pages of the
  873 + output, then overlay page 4 from :file:`o.pdf`
  874 + onto pages 4 and 5 of the output. Leave remaining output pages
  875 + untouched.
  876 +
  877 +- :command:`--underlay footer.pdf --from= --repeat=1,2
  878 + --`: Underlay page 1 of
  879 + :file:`footer.pdf` on all odd output pages, and
  880 + underlay page 2 of :file:`footer.pdf` on all even
  881 + output pages.
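
One way to picture how :samp:`--to`, :samp:`--from`, and :samp:`--repeat` interact is as a mapping from output pages to overlay pages. The Python below is a sketch based only on the description and examples above. The ``overlay_plan`` name is hypothetical, and the assumption that repeat pages are cycled in order is inferred from the footer example.

```python
def overlay_plan(to_pages, from_pages, repeat_pages=()):
    """Map each targeted output page to the overlay page applied to it."""
    plan = {}
    for index, out_page in enumerate(to_pages):
        if index < len(from_pages):
            # Use the "from" pages in sequence first.
            plan[out_page] = from_pages[index]
        elif repeat_pages:
            # Then cycle through the repeat pages (assumed behavior).
            position = (index - len(from_pages)) % len(repeat_pages)
            plan[out_page] = repeat_pages[position]
        # With no repeat pages left to use, the output page is left alone.
    return plan
```

With ``to_pages=[1, 2, 3, 4, 5]``, ``from_pages=[1, 2, 3]``, and ``repeat_pages=[4]``, output pages 4 and 5 both map to overlay page 4, as in the first example.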
  882 +
  883 +.. _ref.attachments:
  884 +
  885 +Embedded Files/Attachments Options
  886 +----------------------------------
  887 +
  888 +Starting with qpdf 10.2, you can work with file attachments in PDF files
  889 +from the command line. The following options are available:
  890 +
  891 +:samp:`--list-attachments`
  892 + Show the "key" and stream number for embedded files. With
  893 + :samp:`--verbose`, additional information, including
  894 + preferred file name, description, dates, and more, is also displayed.
  895 + The key is usually but not always equal to the file name, and is
  896 + needed by some of the other options.
  897 +
  898 +:samp:`--show-attachment={key}`
  899 + Write the contents of the specified attachment to standard output as
  900 + binary data. The key should match one of the keys shown by
  901 + :samp:`--list-attachments`. If specified multiple
  902 + times, only the last attachment will be shown.
  903 +
  904 +:samp:`--add-attachment {file} {options} --`
  905 + Add or replace an attachment with the contents of
  906 + :samp:`{file}`. This may be specified more
  907 + than once. The following additional options may appear before the
  908 + ``--`` that ends this option:
  909 +
  910 + :samp:`--key={key}`
  911 + The key to use to register the attachment in the embedded files
  912 + table. Defaults to the last path element of
  913 + :samp:`{file}`.
  914 +
  915 + :samp:`--filename={name}`
  916 + The file name to be used for the attachment. This is what is
  917 + usually displayed to the user and is the name most graphical PDF
  918 + viewers will use when saving a file. It defaults to the last path
  919 + element of :samp:`{file}`.
  920 +
  921 + :samp:`--creationdate={date}`
  922 + The attachment's creation date in PDF format; defaults to the
  923 + current time. The date format is explained below.
  924 +
  925 + :samp:`--moddate={date}`
  926 + The attachment's modification date in PDF format; defaults to the
  927 + current time. The date format is explained below.
  928 +
  929 + :samp:`--mimetype={type/subtype}`
  930 + The mime type for the attachment, e.g. ``text/plain`` or
  931 + ``application/pdf``. Note that the mimetype appears in a field
  932 + called ``/Subtype`` in the PDF but actually includes the full type
  933 + and subtype of the mime type.
  934 +
  935 + :samp:`--description={"text"}`
  936 + Descriptive text for the attachment, displayed by some PDF
  937 + viewers.
  938 +
  939 + :samp:`--replace`
  940 + Indicates that any existing attachment with the same key should be
  941 + replaced by the new attachment. Otherwise,
  942 + :command:`qpdf` gives an error if an attachment
  943 + with that key is already present.
  944 +
  945 +:samp:`--remove-attachment={key}`
  946 + Remove the specified attachment. This doesn't only remove the
  947 + attachment from the embedded files table but also clears out the file
  948 + specification. That means that any potential internal links to the
  949 + attachment will be broken. This option may be specified multiple
  950 + times. Run with :samp:`--verbose` to see status of
  951 + the removal.
  952 +
  953 +:samp:`--copy-attachments-from {file} {options} --`
  954 + Copy attachments from another file. This may be specified more than
  955 + once. The following additional options may appear before the ``--``
  956 + that ends this option:
  957 +
  958 + :samp:`--password={password}`
  959 + If required, the password needed to open
  960 + :samp:`{file}`
  961 +
  962 + :samp:`--prefix={prefix}`
  963 + Only required if the file from which attachments are being copied
  964 + has attachments with keys that conflict with attachments already
  965 + in the file. In this case, the specified prefix will be prepended
  966 + to each key. This affects only the key in the embedded files
  967 + table, not the file name. The PDF specification doesn't preclude
  968 + multiple attachments having the same file name.
  969 +
  970 +When a date is required, the date should conform to the PDF date format
  971 +specification, which is
  972 +``D:``\ :samp:`{yyyymmddhhmmss<z>}`, where
  973 +:samp:`{<z>}` is either ``Z`` for UTC or a
  974 +timezone offset in the form :samp:`{-hh'mm'}` or
  975 +:samp:`{+hh'mm'}`. Examples:
  976 +``D:20210207161528-05'00'``, ``D:20210207211528Z``.
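
If you generate these dates from a script, a helper along the following lines may be useful. It is a minimal sketch using Python's standard ``datetime`` module; the ``pdf_date`` name is hypothetical, and the helper assumes a timezone-aware value.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(moment: datetime) -> str:
    """Format a timezone-aware datetime as D:yyyymmddhhmmss<z>."""
    offset = moment.utcoffset()
    if offset is None:
        raise ValueError("a timezone-aware datetime is required")
    stamp = moment.strftime("D:%Y%m%d%H%M%S")
    if offset == timedelta(0):
        return stamp + "Z"  # UTC
    minutes = int(offset.total_seconds()) // 60
    sign = "+" if minutes >= 0 else "-"
    hours, mins = divmod(abs(minutes), 60)
    return f"{stamp}{sign}{hours:02d}'{mins:02d}'"
```

The resulting string can be passed to options such as :samp:`--creationdate` or :samp:`--moddate`.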
  977 +
  978 +.. _ref.advanced-parsing:
  979 +
  980 +Advanced Parsing Options
  981 +------------------------
  982 +
  983 +These options control aspects of how qpdf reads PDF files. Mostly these
  984 +are of use to people who are working with damaged files. There is little
  985 +reason to use these options unless you are trying to solve specific
  986 +problems. The following options are available:
  987 +
  988 +:samp:`--suppress-recovery`
  989 + Prevents qpdf from attempting to recover damaged files.
  990 +
  991 +:samp:`--ignore-xref-streams`
  992 + Tells qpdf to ignore any cross-reference streams.
  993 +
  994 +Ordinarily, qpdf will attempt to recover from certain types of errors in
  995 +PDF files. These include errors in the cross-reference table, certain
  996 +types of object numbering errors, and certain types of stream length
  997 +errors. Sometimes, qpdf may think it has recovered but may not have
  998 +actually recovered, so care should be taken when using this option as
  999 +some data loss is possible. The
  1000 +:samp:`--suppress-recovery` option will prevent qpdf
  1001 +from attempting recovery. In this case, it will fail on the first error
  1002 +that it encounters.
  1003 +
  1004 +Ordinarily, qpdf reads cross-reference streams when they are present in
  1005 +a PDF file. If :samp:`--ignore-xref-streams` is
  1006 +specified, qpdf will ignore any cross-reference streams for hybrid PDF
  1007 +files. The purpose of hybrid files is to make some content available to
  1008 +viewers that are not aware of cross-reference streams. It is almost
  1009 +never desirable to ignore them. The only time when you might want to use
  1010 +this feature is if you are testing creation of hybrid PDF files and wish
  1011 +to see how a PDF consumer that doesn't understand object and
  1012 +cross-reference streams would interpret such a file.
  1013 +
  1014 +.. _ref.advanced-transformation:
  1015 +
  1016 +Advanced Transformation Options
  1017 +-------------------------------
  1018 +
  1019 +These transformation options control fine points of how qpdf creates the
  1020 +output file. Mostly these are of use only to people who are very
  1021 +familiar with the PDF file format or who are PDF developers. The
  1022 +following options are available:
  1023 +
  1024 +:samp:`--compress-streams={[yn]}`
  1025 + By default, or with :samp:`--compress-streams=y`,
  1026 + qpdf will compress any stream with no other filters applied to it
  1027 + with the ``/FlateDecode`` filter when it writes it. To suppress this
  1028 + behavior and preserve uncompressed streams as uncompressed, use
  1029 + :samp:`--compress-streams=n`.
  1030 +
  1031 +:samp:`--decode-level={option}`
  1032 + Controls which streams qpdf tries to decode. The default is
  1033 + :samp:`generalized`. The following options are
  1034 + available:
  1035 +
  1036 + - :samp:`none`: do not attempt to decode any streams
  1037 +
  1038 + - :samp:`generalized`: decode streams filtered with
  1039 + supported generalized filters: ``/LZWDecode``, ``/FlateDecode``,
  1040 + ``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define generalized
  1041 + filters as those to be used for general-purpose compression or
  1042 + encoding, as opposed to filters specifically designed for image
  1043 + data. Note that, by default, streams already compressed with
  1044 + ``/FlateDecode`` are not uncompressed and recompressed unless you
  1045 + also specify :samp:`--recompress-flate`.
  1046 +
  1047 + - :samp:`specialized`: in addition to generalized,
  1048 + decode streams with supported non-lossy specialized filters;
  1049 + currently this is just ``/RunLengthDecode``
  1050 +
  1051 + - :samp:`all`: in addition to generalized and
  1052 + specialized, decode streams with supported lossy filters;
  1053 + currently this is just ``/DCTDecode`` (JPEG)
  1054 +
  1055 +:samp:`--stream-data={option}`
  1056 + Controls transformation of stream data. This option predates the
  1057 + :samp:`--compress-streams` and
  1058 + :samp:`--decode-level` options. Those options can be
  1059 + used to achieve the same effect with more control. The value of
  1060 + :samp:`{option}` may
  1061 + be one of the following:
  1062 +
  1063 + - :samp:`compress`: recompress stream data when
  1064 + possible (default); equivalent to
  1065 + :samp:`--compress-streams=y`
  1066 + :samp:`--decode-level=generalized`. Does not
  1067 + recompress streams already compressed with ``/FlateDecode`` unless
  1068 + :samp:`--recompress-flate` is also specified.
  1069 +
  1070 + - :samp:`preserve`: leave all stream data as is;
  1071 + equivalent to :samp:`--compress-streams=n`
  1072 + :samp:`--decode-level=none`
  1073 +
  1074 + - :samp:`uncompress`: uncompress stream data
  1075 + compressed with generalized filters when possible; equivalent to
  1076 + :samp:`--compress-streams=n`
  1077 + :samp:`--decode-level=generalized`
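
The equivalences above can be captured in a small lookup table, which may help when updating older scripts that still pass :samp:`--stream-data`. This is an illustrative sketch only; the ``modernize`` helper is hypothetical and is not part of qpdf.

```python
# Equivalences taken from the list above; the helper itself is only illustrative.
STREAM_DATA_EQUIVALENTS = {
    "compress": ["--compress-streams=y", "--decode-level=generalized"],
    "preserve": ["--compress-streams=n", "--decode-level=none"],
    "uncompress": ["--compress-streams=n", "--decode-level=generalized"],
}


def modernize(args):
    """Replace each --stream-data=VALUE argument with its newer equivalents."""
    prefix = "--stream-data="
    result = []
    for arg in args:
        if arg.startswith(prefix):
            # An unknown value raises KeyError rather than being passed through.
            result.extend(STREAM_DATA_EQUIVALENTS[arg[len(prefix):]])
        else:
            result.append(arg)
    return result
```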
  1078 +
  1079 +:samp:`--recompress-flate`
  1080 + By default, streams already compressed with ``/FlateDecode`` are left
  1081 + alone rather than being uncompressed and recompressed. This option
  1082 + causes qpdf to uncompress and recompress the streams. There is a
  1083 + significant performance cost to using this option, but you probably
  1084 + want to use it if you specify
  1085 + :samp:`--compression-level`.
  1086 +
  1087 +:samp:`--compression-level={level}`
  1088 + When writing new streams that are compressed with ``/FlateDecode``,
  1089 + use the specified compression level. The value of
  1090 + :samp:`level` should be a number from 1 to 9 and is
  1091 + passed directly to zlib, which implements deflate compression. Note
  1092 + that qpdf doesn't uncompress and recompress streams by default. To
  1093 + have this option apply to already compressed streams, you should also
  1094 + specify :samp:`--recompress-flate`. If your goal is
  1095 + to shrink the size of PDF files, you should also use
  1096 + :samp:`--object-streams=generate`.
  1097 +
  1098 +:samp:`--normalize-content=[yn]`
  1099 + Enables or disables normalization of content streams. Content
  1100 + normalization is enabled by default in QDF mode. Please see :ref:`ref.qdf` for additional discussion of QDF mode.
  1101 +
  1102 +:samp:`--object-streams={mode}`
  1103 + Controls handling of object streams. The value of
  1104 + :samp:`{mode}` may be
  1105 + one of the following:
  1106 +
  1107 + - :samp:`preserve`: preserve original object streams
  1108 + (default)
  1109 +
  1110 + - :samp:`disable`: don't write any object streams
  1111 +
  1112 + - :samp:`generate`: use object streams wherever
  1113 + possible
  1114 +
  1115 +:samp:`--preserve-unreferenced`
  1116 + Tells qpdf to preserve objects that are not referenced when writing
  1117 + the file. Ordinarily any object that is not referenced in a traversal
  1118 + of the document from the trailer dictionary will be discarded. This
  1119 + may be useful in working with some damaged files or inspecting files
  1120 + with known unreferenced objects.
  1121 +
  1122 + This flag is ignored for linearized files and has the effect of
  1123 + causing objects in the new file to be written in order by object ID
  1124 + from the original file. This does not mean that object numbers will
  1125 + be the same since qpdf may create stream lengths as direct or
  1126 + indirect differently from the original file, and the original file
  1127 + may have gaps in its numbering.
  1128 +
  1129 + See also :samp:`--preserve-unreferenced-resources`,
  1130 + which does something completely different.
  1131 +
  1132 +:samp:`--remove-unreferenced-resources={option}`
  1133 + The :samp:`{option}` may be ``auto``,
  1134 + ``yes``, or ``no``. The default is ``auto``.
  1135 +
  1136 + Starting with qpdf 8.1, when splitting pages, qpdf is able to attempt
  1137 + to remove images and fonts that are not used by a page even if they
  1138 + are referenced in the page's resources dictionary. When shared
  1139 + resources are in use, this behavior can greatly reduce the file sizes
  1140 + of split pages, but the analysis is very slow. In versions from 8.1
  1141 + through 9.1.1, qpdf did this analysis by default. Starting in qpdf
  1142 + 10.0.0, if ``auto`` is used, qpdf does a quick analysis of the file
  1143 + to determine whether the file is likely to have unreferenced objects
  1144 + on pages, a pattern that frequently occurs when resource dictionaries
  1145 + are shared across multiple pages and rarely occurs otherwise. If it
  1146 + discovers this pattern, then it will attempt to remove unreferenced
  1147 + resources. Usually this means you get the slower splitting speed only
  1148 + when it's actually going to create smaller files. You can suppress
  1149 + removal of unreferenced resources altogether by specifying ``no`` or
  1150 + force it to do the full algorithm by specifying ``yes``.
  1151 +
  1152 + Other than cases in which you don't care about file size and care a
  1153 + lot about runtime, there are few reasons to use this option,
  1154 + especially now that ``auto`` mode is supported. One reason to use
  1155 + this is if you suspect that qpdf is removing resources it shouldn't
  1156 + be removing. If you encounter that case, please report it as a bug at
  1157 + https://github.com/qpdf/qpdf/issues/.
  1158 +
  1159 +:samp:`--preserve-unreferenced-resources`
  1160 + This is a synonym for
  1161 + :samp:`--remove-unreferenced-resources=no`.
  1162 +
  1163 + See also :samp:`--preserve-unreferenced`, which does
  1164 + something completely different.
  1165 +
  1166 +:samp:`--newline-before-endstream`
  1167 + Tells qpdf to insert a newline before the ``endstream`` keyword, not
  1168 + counted in the length, after any stream content even if the last
  1169 + character of the stream was a newline. This may result in two
  1170 + newlines in some cases. This is a requirement of PDF/A. While qpdf
  1171 + doesn't specifically know how to generate PDF/A-compliant PDFs, this
  1172 + at least prevents it from removing compliance on already compliant
  1173 + files.
  1174 +
  1175 +:samp:`--linearize-pass1={file}`
  1176 + Write the first pass of linearization to the named file. The
  1177 + resulting file is not a valid PDF file. This option is useful only
  1178 + for debugging ``QPDFWriter``'s linearization code. When qpdf
  1179 + linearizes files, it writes the file in two passes, using the first
  1180 + pass to calculate sizes and offsets that are required for hint tables
  1181 + and the linearization dictionary. Ordinarily, the first pass is
  1182 + discarded. This option enables it to be captured.
  1183 +
  1184 +:samp:`--coalesce-contents`
  1185 + When a page's contents are split across multiple streams, this option
  1186 + causes qpdf to combine them into a single stream. Use of this option
  1187 + is never necessary for ordinary usage, but it can help when working
  1188 + with some files in some cases. For example, this can also be combined
  1189 + with QDF mode or content normalization to make it easier to look at
  1190 + all of a page's contents at once.
  1191 +
  1192 +:samp:`--flatten-annotations={option}`
  1193 + This option collapses annotations into the pages' contents with
  1194 + special handling for form fields. Ordinarily, an annotation is
  1195 + rendered separately and on top of the page. Combining annotations
  1196 + into the page's contents effectively freezes the placement of the
  1197 + annotations, making them look right after various page
  1198 + transformations. The library functionality backing this option was
  1199 + added for the benefit of programs that want to create *n-up* page
  1200 + layouts and other similar things that don't work well with
  1201 + annotations. The :samp:`{option}` parameter
  1202 + may be any of the following:
  1203 +
  1204 + - :samp:`all`: include all annotations that are not
  1205 + marked invisible or hidden
  1206 +
  1207 + - :samp:`print`: only include annotations that
  1208 + indicate that they should appear when the page is printed
  1209 +
  1210 + - :samp:`screen`: omit annotations that indicate
  1211 + they should not appear on the screen
  1212 +
  1213 + Note that form fields are special because the annotations that are
  1214 + used to render filled-in form fields may become out of date from the
  1215 + fields' values if the form is filled in by a program that doesn't
  1216 + know how to update the appearances. If qpdf detects this case, its
  1217 + default behavior is not to flatten those annotations because doing so
  1218 + would cause the value of the form field to be lost. This gives you a
  1219 + chance to go back and resave the form with a program that knows how
  1220 + to generate appearances. QPDF itself can generate appearances with
  1221 + some limitations. See the
  1222 + :samp:`--generate-appearances` option below.
  1223 +
  1224 +:samp:`--generate-appearances`
  1225 + If a file contains interactive form fields and indicates that the
  1226 + appearances are out of date with the values of the form, this flag
  1227 + will regenerate appearances, subject to a few limitations. Note that
  1228 + there is not usually a reason to do this, but it can be necessary
  1229 + before using the :samp:`--flatten-annotations`
  1230 + option. Most of the limitations are not a problem with well-behaved PDF
  1231 + files. The limitations are as follows:
  1232 +
  1233 + - Radio button and checkbox appearances use the pre-set values in
  1234 + the PDF file. QPDF just makes sure that the correct appearance is
  1235 + displayed based on the value of the field. This is fine for PDF
  1236 + files that create their forms properly. Some PDF writers save
  1237 + appearances for fields when they change, which could cause some
  1238 + controls to have inconsistent appearances.
  1239 +
  1240 + - For text fields and list boxes, any characters that fall outside
  1241 + of US-ASCII or, if detected, "Windows ANSI" or "Mac Roman"
  1242 + encoding, will be replaced by the ``?`` character.
  1243 +
  1244 + - Quadding is ignored. Quadding is used to specify whether the
  1245 + contents of a field should be left, center, or right aligned with
  1246 + the field.
  1247 +
  1248 + - Rich text, multi-line, and other more elaborate formatting
  1249 + directives are ignored.
  1250 +
  1251 + - There is no support for multi-select fields or signature fields.
  1252 +
  1253 + If qpdf doesn't do a good enough job with your form, use an external
  1254 + application to save your filled-in form before processing it with
  1255 + qpdf.
  1256 +
  1257 +:samp:`--optimize-images`
  1258 + This flag causes qpdf to recompress all images that are not
  1259 + compressed with DCT (JPEG) using DCT compression as long as doing so
  1260 + decreases the size in bytes of the image data and the image does not
  1261 + fall below minimum specified dimensions. Useful information is
  1262 + provided when used in combination with
  1263 + :samp:`--verbose`. See also the
  1264 + :samp:`--oi-min-width`,
  1265 + :samp:`--oi-min-height`, and
  1266 + :samp:`--oi-min-area` options. By default, starting
  1267 + in qpdf 8.4, inline images are converted to regular images and
  1268 + optimized as well. Use :samp:`--keep-inline-images`
  1269 + to prevent inline images from being included.
  1270 +
  1271 +:samp:`--oi-min-width={width}`
  1272 + Avoid optimizing images whose width is below the specified amount. If
  1273 + omitted, the default is 128 pixels. Use 0 for no minimum.
  1274 +
  1275 +:samp:`--oi-min-height={height}`
  1276 + Avoid optimizing images whose height is below the specified amount.
  1277 + If omitted, the default is 128 pixels. Use 0 for no minimum.
  1278 +
  1279 +:samp:`--oi-min-area={area-in-pixels}`
  1280 + Avoid optimizing images whose pixel count (width × height) is below
  1281 + the specified amount. If omitted, the default is 16,384 pixels. Use 0
  1282 + for no minimum.
  1283 +
  1284 +:samp:`--externalize-inline-images`
  1285 + Convert inline images to regular images. By default, images whose
  1286 + data is at least 1,024 bytes are converted when this option is
  1287 + selected. Use :samp:`--ii-min-bytes` to change the
  1288 + size threshold. This option is implicitly selected when
  1289 + :samp:`--optimize-images` is selected. Use
  1290 + :samp:`--keep-inline-images` to exclude inline images
  1291 + from image optimization.
  1292 +
  1293 +:samp:`--ii-min-bytes={bytes}`
  1294 + Avoid converting inline images whose size is below the specified
  1295 + minimum size to regular images. If omitted, the default is 1,024
  1296 + bytes. Use 0 for no minimum.
  1297 +
  1298 +:samp:`--keep-inline-images`
  1299 + Prevent inline images from being included in image optimization. This
  1300 + option has no effect when :samp:`--optimize-images`
  1301 + is not specified.
  1302 +
  1303 +:samp:`--remove-page-labels`
  1304 + Remove page labels from the output file.
  1305 +
  1306 +:samp:`--qdf`
  1307 + Turns on QDF mode. For additional information on QDF, please see :ref:`ref.qdf`. Note that :samp:`--linearize`
  1308 + disables QDF mode.
  1309 +
  1310 +:samp:`--min-version={version}`
  1311 + Forces the PDF version of the output file to be at least
  1312 + :samp:`{version}`. In other words, if the
  1313 + input file has a lower version than the specified version, the
  1314 + specified version will be used. If the input file has a higher
  1315 + version, the input file's original version will be used. It is seldom
  1316 + necessary to use this option since qpdf will automatically increase
  1317 + the version as needed when adding features that require newer PDF
  1318 + readers.
  1319 +
  1320 + The version number may be expressed in the form
  1321 + :samp:`{major.minor.extension-level}`, in
  1322 + which case the version is interpreted as
  1323 + :samp:`{major.minor}` at extension level
  1324 + :samp:`{extension-level}`. For example,
  1325 + version ``1.7.8`` represents version 1.7 at extension level 8. Note
  1326 + that minimal syntax checking is done on the command line.
  1327 +
  1328 +:samp:`--force-version={version}`
  1329 + This option forces the PDF version to be the exact version specified
  1330 + *even when the file may have content that is not supported in that
  1331 + version*. The version number is interpreted in the same way as with
  1332 + :samp:`--min-version` so that extension levels can be
  1333 + set. In some cases, forcing the output file's PDF version to be lower
  1334 + than that of the input file will cause qpdf to disable certain
  1335 + features of the document. Specifically, 256-bit keys are disabled if
  1336 + the version is less than 1.7 with extension level 8 (except R5 is
  1337 + disabled if less than 1.7 with extension level 3), AES encryption is
  1338 + disabled if the version is less than 1.6, cleartext metadata and
  1339 + object streams are disabled if less than 1.5, 128-bit encryption keys
  1340 + are disabled if less than 1.4, and all encryption is disabled if less
  1341 + than 1.3. Even with these precautions, qpdf won't be able to do
  1342 + things like eliminate use of newer image compression schemes,
  1343 + transparency groups, or other features that may have been added in
  1344 + more recent versions of PDF.
  1345 +
  1346 + As a general rule, with the exception of big structural things like
  1347 + the use of object streams or AES encryption, PDF viewers are supposed
  1348 + to ignore features in files that they don't support from newer
  1349 + versions. This means that forcing the version to a lower version may
  1350 + make it possible to open your PDF file with an older version, though
  1351 + bear in mind that some of the original document's functionality may
  1352 + be lost.
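
The version syntax and the feature cut-offs listed for :samp:`--force-version` can be sketched in a few lines of Python. This is only an illustration of the rules stated above, not qpdf's implementation; the function names and feature labels are hypothetical.

```python
def parse_pdf_version(text):
    """Split 'major.minor[.extension-level]' into a tuple of three ints."""
    parts = [int(p) for p in text.split(".")]
    if len(parts) == 2:
        parts.append(0)  # no extension level given
    if len(parts) != 3:
        raise ValueError(f"expected major.minor[.extension-level]: {text!r}")
    return tuple(parts)


# (threshold, feature) pairs taken from the description above; a feature is
# disabled when the forced version is lower than its threshold.
FORCE_VERSION_RULES = [
    ((1, 7, 8), "256-bit keys (other than R5)"),
    ((1, 7, 3), "256-bit keys (R5)"),
    ((1, 6, 0), "AES encryption"),
    ((1, 5, 0), "cleartext metadata"),
    ((1, 5, 0), "object streams"),
    ((1, 4, 0), "128-bit encryption keys"),
    ((1, 3, 0), "all encryption"),
]


def features_disabled_by_forced_version(text):
    """List the features that would be disabled when forcing this version."""
    version = parse_pdf_version(text)
    return [name for threshold, name in FORCE_VERSION_RULES if version < threshold]
```

For example, ``features_disabled_by_forced_version("1.7.5")`` reports only the non-R5 256-bit keys, while ``"1.4"`` additionally reports AES encryption, cleartext metadata, and object streams.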
  1353 +
  1354 +By default, when a stream is encoded using non-lossy filters that qpdf
  1355 +understands and is not already compressed using a good compression
  1356 +scheme, qpdf will uncompress and recompress streams. Assuming proper
  1357 +filter implementations, this is safe and generally results in smaller files.
  1358 +This behavior may also be explicitly requested with
  1359 +:samp:`--stream-data=compress`.
  1360 +
  1361 +When :samp:`--normalize-content=y` is specified, qpdf
  1362 +will attempt to normalize whitespace and newlines in page content
  1363 +streams. This is generally safe but could, in some cases, cause damage
  1364 +to the content streams. This option is intended for people who wish to
  1365 +study PDF content streams or to debug PDF content. You should not use
  1366 +this for "production" PDF files.
  1367 +
  1368 +When normalizing content, if qpdf runs into any lexical errors, it will
  1369 +print a warning indicating that content may be damaged. The only
  1370 +situation in which qpdf is known to cause damage during content
  1371 +normalization is when a page's contents are split across multiple
  1372 +streams and streams are split in the middle of a lexical token such as a
  1373 +string, name, or inline image. Note that files that do this are invalid
  1374 +since the PDF specification states that content streams are not to be
  1375 +split in the middle of a token. If you want to inspect the original
  1376 +content streams in an uncompressed format, you can always run with
  1377 +:samp:`--qdf --normalize-content=n` for a QDF file
  1378 +without content normalization, or alternatively
  1379 +:samp:`--stream-data=uncompress` for a regular non-QDF
  1380 +mode file with uncompressed streams. These will both uncompress all the
  1381 +streams but will not attempt to normalize content. Please note that if
  1382 +you are using content normalization or QDF mode for the purpose of
  1383 +manually inspecting files, you don't have to care about this.
  1384 +
  1385 +Object streams, also known as compressed objects, were introduced into
  1386 +the PDF specification at version 1.5, corresponding to Acrobat 6. Some
  1387 +older PDF viewers may not support files with object streams. qpdf can be
  1388 +used to transform files with object streams to files without object
  1389 +streams or vice versa. As mentioned above, there are three object stream
  1390 +modes: :samp:`preserve`,
  1391 +:samp:`disable`, and :samp:`generate`.
  1392 +
  1393 +In :samp:`preserve` mode, the relationship between objects
  1394 +and the streams that contain them is preserved from the original file.
  1395 +In :samp:`disable` mode, all objects are written as
  1396 +regular, uncompressed objects. The resulting file should be readable by
  1397 +older PDF viewers. (Of course, the content of the files may include
  1398 +features not supported by older viewers, but at least the structure will
  1399 +be supported.) In :samp:`generate` mode, qpdf will
  1400 +create its own object streams. This will usually result in more compact
  1401 +PDF files, though they may not be readable by older viewers. In this
  1402 +mode, qpdf will also make sure the PDF version number in the header is
  1403 +at least 1.5.
  1404 +
  1405 +The :samp:`--qdf` flag turns on QDF mode, which changes
  1406 +some of the defaults described above. Specifically, in QDF mode, by
  1407 +default, stream data is uncompressed, content streams are normalized,
  1408 +and encryption is removed. These defaults can still be overridden by
  1409 +specifying the appropriate options as described above. Additionally, in
  1410 +QDF mode, stream lengths are stored as indirect objects, objects are
  1411 +laid out in a less efficient but more readable fashion, and the
  1412 +documents are interspersed with comments that make it easier for the
  1413 +user to find things and also make it possible for
  1414 +:command:`fix-qdf` to work properly. QDF mode is intended
  1415 +for people, mostly developers, who wish to inspect or modify PDF files
  1416 +in a text editor. For details, please see :ref:`ref.qdf`.
  1417 +
  1418 +.. _ref.testing-options:
  1419 +
  1420 +Testing, Inspection, and Debugging Options
  1421 +------------------------------------------
  1422 +
  1423 +These options can be useful for digging into PDF files or for use in
  1424 +automated test suites for software that uses the qpdf library. When any
  1425 +of the options in this section are specified, no output file should be
  1426 +given. The following options are available:
  1427 +
  1428 +:samp:`--deterministic-id`
  1429 + Causes generation of a deterministic value for /ID. This prevents use
  1430 + of timestamp and output file name information in the /ID generation.
  1431 + Instead, at some slight additional runtime cost, the /ID field is
  1432 + generated to include a digest of the significant parts of the content
  1433 + of the output PDF file. This means that a given qpdf operation should
  1434 + generate the same /ID each time it is run, which can be useful when
  1435 + caching results or for generation of some test data. Use of this flag
  1436 + is not compatible with creation of encrypted files.
  1437 +
  1438 +:samp:`--static-id`
  1439 + Causes generation of a fixed value for /ID. This is intended for
  1440 + testing only. Never use it for production files. If you are trying to
  1441 + get the same /ID each time for a given file and you are not
  1442 + generating encrypted files, consider using the
  1443 + :samp:`--deterministic-id` option.
  1444 +
  1445 +:samp:`--static-aes-iv`
  1446 + Causes use of a static initialization vector for AES-CBC. This is
  1447 + intended for testing only so that output files can be reproducible.
  1448 + Never use it for production files. This option in particular is not
  1449 + secure since it significantly weakens the encryption.
  1450 +
  1451 +:samp:`--no-original-object-ids`
  1452 + Suppresses inclusion of original object ID comments in QDF files.
  1453 + This can be useful when generating QDF files for test purposes,
  1454 + particularly when comparing them to determine whether two PDF files
  1455 + have identical content.
  1456 +
  1457 +:samp:`--show-encryption`
  1458 + Shows document encryption parameters. Also shows the document's user
  1459 + password if the owner password is given.
  1460 +
  1461 +:samp:`--show-encryption-key`
  1462 + When encryption information is being displayed, as when
  1463 + :samp:`--check` or
  1464 + :samp:`--show-encryption` is given, display the
  1465 + computed or retrieved encryption key as a hexadecimal string. This
  1466 + value is not ordinarily useful to users, but it can be used as the
  1467 + argument to :samp:`--password` if the
  1468 + :samp:`--password-is-hex-key` is specified. Note
  1469 + that, when PDF files are encrypted, passwords and other metadata are
  1470 + used only to compute an encryption key, and the encryption key is
  1471 + what is actually used for encryption. This enables retrieval of that
  1472 + key.
  1473 +
  1474 +:samp:`--check-linearization`
  1475 + Checks file integrity and linearization status.
  1476 +
  1477 +:samp:`--show-linearization`
  1478 + Checks and displays all data in the linearization hint tables.
  1479 +
  1480 +:samp:`--show-xref`
  1481 + Shows the contents of the cross-reference table in a human-readable
  1482 + form. This is especially useful for files with cross-reference
  1483 + streams which are stored in a binary format.
  1484 +
  1485 +:samp:`--show-object=trailer|obj[,gen]`
  1486 + Show the contents of the given object. This is especially useful for
  1487 + inspecting objects that are inside of object streams (also known as
  1488 + "compressed objects").
  1489 +
  1490 +:samp:`--raw-stream-data`
  1491 + When used along with the :samp:`--show-object`
  1492 + option, if the object is a stream, shows the raw stream data instead
  1493 + of the object's contents.
  1494 +
  1495 +:samp:`--filtered-stream-data`
  1496 + When used along with the :samp:`--show-object`
  1497 + option, if the object is a stream, shows the filtered stream data
  1498 + instead of the object's contents. If the stream is filtered using filters
  1499 + that qpdf does not support, an error will be issued.
  1500 +
  1501 +:samp:`--show-npages`
  1502 + Prints the number of pages in the input file on a line by itself.
  1503 + Since the number of pages appears by itself on a line, this option
  1504 + can be useful for scripting if you need to know the number of pages
  1505 + in a file.
  1506 +
  1507 +:samp:`--show-pages`
  1508 + Shows the object and generation number for each page dictionary
  1509 + object and for each content stream associated with the page. Having
  1510 + this information makes it more convenient to inspect objects from a
  1511 + particular page.
  1512 +
  1513 +:samp:`--with-images`
  1514 + When used along with :samp:`--show-pages`, also shows
  1515 + the object and generation numbers for the image objects on each page.
  1516 + (At present, information about images in shared resource dictionaries
  1517 + is not output by this command. This is discussed in a comment in the
  1518 + source code.)
  1519 +
  1520 +:samp:`--json`
  1521 + Generate a JSON representation of the file. This is described in
  1522 + depth in :ref:`ref.json`.
  1523 +
  1524 +:samp:`--json-help`
  1525 + Describe the format of the JSON output.
  1526 +
  1527 +:samp:`--json-key=key`
  1528 + This option is repeatable. If specified, only top-level keys
  1529 + specified will be included in the JSON output. If not specified, all
  1530 + keys will be shown.
  1531 +
  1532 +:samp:`--json-object=trailer|obj[,gen]`
  1533 + This option is repeatable. If specified, only specified objects will
  1534 + be shown in the "``objects``" key of the JSON output. If absent, all
  1535 + objects will be shown.
  1536 +
  1537 +:samp:`--check`
  1538 + Checks file structure as well as encryption, linearization, and
  1539 + encoding of stream data. A file for which
  1540 + :samp:`--check` reports no errors may still have
  1541 + errors in stream data content but should otherwise be structurally
  1542 + sound. If :samp:`--check` finds any errors, qpdf will exit
  1543 + with a status of 2. There are some recoverable conditions that
  1544 + :samp:`--check` detects. These are issued as warnings
  1545 + instead of errors. If qpdf finds no errors but finds warnings, it
  1546 + will exit with a status of 3 (as of version 2.0.4). When
  1547 + :samp:`--check` is combined with other options,
  1548 + checks are always performed before any other options are processed.
  1549 + For erroneous files, :samp:`--check` will cause qpdf
  1550 + to attempt to recover, after which other options are effectively
  1551 + operating on the recovered file. Combining
  1552 + :samp:`--check` with other options in this way can be
  1553 + useful for manually recovering severely damaged files. Note that
  1554 + :samp:`--check` produces no output to standard output
  1555 + when everything is valid, so if you are using this to
  1556 + programmatically validate files in bulk, it is safe to run without
  1557 + output redirected to :file:`/dev/null` and just
  1558 + check for a 0 exit code.
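
The exit statuses described for :samp:`--check` lend themselves to a small wrapper when validating files in bulk. The following Python sketch assumes that :command:`qpdf` is on the ``PATH``; the names ``classify_check_status`` and ``check_pdf`` are hypothetical, and only the statuses 0, 2, and 3 mentioned above are interpreted.

```python
import subprocess

# Exit statuses as described above: 0 = no problems, 2 = errors, 3 = warnings.
CHECK_STATUS = {0: "ok", 2: "errors", 3: "warnings"}


def classify_check_status(returncode):
    """Map a qpdf --check exit status to a short label."""
    return CHECK_STATUS.get(returncode, "unknown")


def check_pdf(path):
    """Run 'qpdf --check' on one file and classify the result.

    Requires the qpdf executable to be installed and on PATH.
    """
    result = subprocess.run(
        ["qpdf", "--check", path], capture_output=True, text=True
    )
    return classify_check_status(result.returncode)
```

A bulk validator could call ``check_pdf`` on each file and collect the files whose label is not ``"ok"``.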
  1559 +
  1560 +The :samp:`--raw-stream-data` and
  1561 +:samp:`--filtered-stream-data` options are ignored
  1562 +unless :samp:`--show-object` is given. Either of these
  1563 +options will cause the stream data to be written to standard output. In
  1564 +order to avoid commingling of stream data with other output, it is
  1565 +recommended that these options not be combined with other test/inspection
  1566 +options.
  1567 +
  1568 +If :samp:`--filtered-stream-data` is given and
  1569 +:samp:`--normalize-content=y` is also given, qpdf will
  1570 +attempt to normalize the stream data as if it is a page content stream.
  1571 +This attempt will be made even if it is not a page content stream, in
  1572 +which case it will produce unusable results.
  1573 +
  1574 +.. _ref.unicode-passwords:
  1575 +
  1576 +Unicode Passwords
  1577 +-----------------
  1578 +
  1579 +At the library API level, all methods that perform encryption and
  1580 +decryption interpret passwords as strings of bytes. It is up to the
  1581 +caller to ensure that they are appropriately encoded. Starting with qpdf
  1582 +version 8.4.0, qpdf will attempt to make this easier for you when
  1583 +you interact with qpdf via its command line interface. The PDF specification
  1584 +requires passwords used to encrypt files with 40-bit or 128-bit
  1585 +encryption to be encoded with PDF Doc encoding. This encoding is a
  1586 +single-byte encoding that supports ISO-Latin-1 and a handful of other
  1587 +commonly used characters. It has a large overlap with Windows ANSI but
  1588 +is not exactly the same. There is generally not a way to provide PDF Doc
  1589 +encoded strings on the command line. As such, qpdf versions prior to
  1590 +8.4.0 would often create PDF files that couldn't be opened with other
  1591 +software when given a password with non-ASCII characters to encrypt a
  1592 +file with 40-bit or 128-bit encryption. Starting with qpdf 8.4.0, qpdf
  1593 +recognizes the encoding of the parameter and transcodes it as needed.
  1594 +The rest of this section provides the details about exactly how qpdf
  1595 +behaves. Most users will not need to know this information, but it might
  1596 +be useful if you have been working around qpdf's old behavior or if you
  1597 +are using qpdf to generate encrypted files for testing other PDF
  1598 +software.
  1599 +
  1600 +A note about Windows: when qpdf builds, it attempts to determine what it
  1601 +has to do to use ``wmain`` instead of ``main`` on Windows. The ``wmain``
  1602 +function is an alternative entry point that receives all arguments as
  1603 +UTF-16-encoded strings. When qpdf starts up this way, it converts all
  1604 +the strings to UTF-8 encoding and then invokes the regular main. This
  1605 +means that, as far as qpdf is concerned, it receives its command-line
  1606 +arguments with UTF-8 encoding, just as it would in any modern Linux or
  1607 +UNIX environment.
  1608 +
  1609 +If a file is being encrypted with 40-bit or 128-bit encryption and the
  1610 +supplied password is not a valid UTF-8 string, qpdf will fall back to
  1611 +the behavior of interpreting the password as a string of bytes. If you
  1612 +have old scripts that encrypt files by passing the output of
  1613 +:command:`iconv` to qpdf, you no longer need to do that,
  1614 +but if you do, qpdf should still work. The only exception would be for
  1615 +the extremely unlikely case of a password that is encoded with a
  1616 +single-byte encoding but also happens to be valid UTF-8. Such a password
  1617 +would contain strings of even numbers of characters that alternate
  1618 +between accented letters and symbols. In the extremely unlikely event
  1619 +that you are intentionally using such passwords and qpdf is thwarting
  1620 +you by interpreting them as UTF-8, you can use
  1621 +:samp:`--password-mode=bytes` to suppress qpdf's
  1622 +automatic behavior.
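
The fallback rule in this paragraph can be pictured with a short Python sketch. It is not qpdf's code: Latin-1 is used as a stand-in for PDF Doc encoding (which is similar but not identical), the function name is hypothetical, and the handling of characters that the single-byte encoding cannot represent is an assumption.

```python
def password_bytes_for_legacy_encryption(supplied, force_bytes=False):
    """Choose the bytes to use as a password for 40-bit or 128-bit encryption.

    supplied    -- the password exactly as received on the command line (bytes)
    force_bytes -- mimics --password-mode=bytes: never reinterpret the input
    """
    if force_bytes:
        return supplied
    try:
        text = supplied.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to treating the password as raw bytes.
        return supplied
    try:
        # Stand-in for transcoding to PDF Doc encoding.
        return text.encode("latin-1")
    except UnicodeEncodeError:
        # Assumption for this sketch: leave unrepresentable passwords alone.
        return supplied
```

The ambiguous case mentioned above shows up with the Latin-1 bytes ``b"\xc3\xa9"`` (the two characters ``Ã©``): they are also valid UTF-8 for ``é``, so the sketch transcodes them to ``b"\xe9"`` unless ``force_bytes`` is set.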
  1623 +
  1624 +The :samp:`--password-mode` option, as described earlier
  1625 +in this chapter, can be used to change qpdf's interpretation of supplied
  1626 +passwords. There are very few reasons to use this option. One would be
  1627 +the unlikely case described in the previous paragraph in which the
  1628 +supplied password happens to be valid UTF-8 but isn't supposed to be
  1629 +UTF-8. Your best bet would be just to provide the password as a valid
  1630 +UTF-8 string, but you could also use
  1631 +:samp:`--password-mode=bytes`. Another reason to use
  1632 +:samp:`--password-mode=bytes` would be to intentionally
  1633 +generate PDF files encrypted with passwords that are not properly
  1634 +encoded. The qpdf test suite does this to generate invalid files for the
  1635 +purpose of testing its password recovery capability. If you were trying
  1636 +to create intentionally incorrect files for similar purposes, the
  1637 +:samp:`bytes` password mode can enable you to do this.
  1638 +
  1639 +When qpdf attempts to decrypt a file with a password that contains
  1640 +non-ASCII characters, it will generate a list of alternative passwords
  1641 +by attempting to interpret the password as each of a handful of
  1642 +different coding systems and then transcode them to the required format.
  1643 +This helps to compensate for the supplied password being given in the
  1644 +wrong coding system, such as would happen if you used the
  1645 +:command:`iconv` workaround that was previously needed.
  1646 +It also generates passwords by doing the reverse operation: translating
  1647 +from correct to incorrect encoding of the password. This would enable
  1648 +qpdf to decrypt files using passwords that were improperly encoded by
  1649 +whatever software encrypted the files, including older versions of qpdf
  1650 +invoked without properly encoded passwords. The combination of these two
  1651 +recovery methods should make qpdf transparently open most encrypted
  1652 +files with the password supplied correctly but in the wrong coding
  1653 +system. There are no real downsides to this behavior, but if you don't
  1654 +want qpdf to do this, you can use the
  1655 +:samp:`--suppress-password-recovery` option. One reason
  1656 +to do that is to ensure that you know the exact password that was used
  1657 +to encrypt the file.
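
As a rough illustration of the two recovery directions described above, the Python sketch below builds a list of candidate passwords from a supplied one. It is not qpdf's algorithm: the set of encodings (``latin-1`` and ``cp1252``) is an assumption chosen for the example, and the function name is hypothetical.

```python
SINGLE_BYTE_ENCODINGS = ("latin-1", "cp1252")  # assumed set, for illustration


def candidate_passwords(supplied):
    """Return the supplied password bytes plus alternative encodings of it."""
    candidates = [supplied]

    def add(candidate):
        if candidate not in candidates:
            candidates.append(candidate)

    # Direction 1: the password was supplied as UTF-8, but the file expects a
    # single-byte encoding.
    try:
        text = supplied.decode("utf-8")
    except UnicodeDecodeError:
        text = None
    if text is not None:
        for encoding in SINGLE_BYTE_ENCODINGS:
            try:
                add(text.encode(encoding))
            except UnicodeEncodeError:
                pass

    # Direction 2: the supplied bytes are in a single-byte encoding, but the
    # file was encrypted with the UTF-8 form of the same characters.
    for encoding in SINGLE_BYTE_ENCODINGS:
        try:
            add(supplied.decode(encoding).encode("utf-8"))
        except UnicodeDecodeError:
            pass

    return candidates
```

A pure ASCII password yields a single candidate, which matches the statement that this process only applies to passwords containing non-ASCII characters.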
  1658 +
  1659 +With these changes, qpdf now generates compliant passwords in most
  1660 +cases. There are still some exceptions. In particular, the PDF
  1661 +specification directs compliant writers to normalize Unicode passwords
  1662 +and to perform certain transformations on passwords with bidirectional
  1663 +text. Implementing this functionality requires using a real Unicode
  1664 +library like ICU. If a client application that uses qpdf wants to do
  1665 +this, the qpdf library will accept the resulting passwords, but qpdf
  1666 +will not perform these transformations itself. It is possible that this
  1667 +will be addressed in a future version of qpdf. The ``QPDFWriter``
  1668 +methods that enable encryption on the output file accept passwords as
  1669 +strings of bytes.
  1670 +
  1671 +Please note that the :samp:`--password-is-hex-key`
  1672 +option is unrelated to all this. This flag bypasses the normal process
  1673 +of going from password to encryption string entirely, allowing the raw
  1674 +encryption key to be specified directly. This is useful for forensic
  1675 +purposes or for brute-force recovery of files with unknown passwords.
... ...
manual/conf.py
... ... @@ -11,4 +11,7 @@ project = 'QPDF'
11 11 copyright = '2005-2021, Jay Berkenbilt'
12 12 author = 'Jay Berkenbilt'
13 13 release = '10.4.0'
14   -html_theme = 'alabaster'
  14 +html_theme = 'agogo'
  15 +html_theme_options = {
  16 + "body_max_width": None,
  17 +}
... ...
manual/design.rst 0 → 100644
  1 +.. _ref.design:
  2 +
  3 +Design and Library Notes
  4 +========================
  5 +
  6 +.. _ref.design.intro:
  7 +
  8 +Introduction
  9 +------------
  10 +
  11 +This section was written prior to the implementation of the qpdf package
  12 +and was subsequently modified to reflect the implementation. In some
  13 +cases, for purposes of explanation, it may differ slightly from the
  14 +actual implementation. As always, the source code and test suite are
  15 +authoritative. Even if there are some errors, this document should serve
  16 +as a road map to understanding how this code works.
  17 +
  18 +In general, one should adhere strictly to a specification when writing
  19 +but be liberal in reading. This way, the product of our software will be
  20 +accepted by the widest range of other programs, and we will accept the
  21 +widest range of input files. This library attempts to conform to that
  22 +philosophy whenever possible but also aims to provide strict checking
  23 +for people who want to validate PDF files. If you don't want to see
  24 +warnings and are trying to write something that is tolerant, you can
  25 +call ``setSuppressWarnings(true)``. If you want to fail on the first
  26 +error, you can call ``setAttemptRecovery(false)``. The default behavior
  27 +is to generate warnings for recoverable problems. Note that recovery
  28 +will not always produce the desired results even if it is able to get
  29 +through the file. Unlike most other PDF software, which produces generic
  30 +warnings such as "This file is damaged," qpdf generally issues a
  31 +detailed error message that would be most useful to a PDF developer.
  32 +This is by design as there seems to be a shortage of PDF validation
  33 +tools out there. This was, in fact, one of the major motivations behind
  34 +the initial creation of qpdf.
  35 +
  36 +.. _ref.design-goals:
  37 +
  38 +Design Goals
  39 +------------
  40 +
  41 +The QPDF package includes support for reading and rewriting PDF files.
  42 +It aims to hide from the user details involving object locations,
  43 +modified (appended) PDF files, the directness/indirectness of objects,
  44 +and stream filters including encryption. It does not aim to hide
  45 +knowledge of the object hierarchy or content stream contents. Put
  46 +another way, a user of the qpdf library is expected to have knowledge
  47 +about how PDF files work, but is not expected to have to keep track of
  48 +bookkeeping details such as file positions.
  49 +
  50 +A user of the library never has to care whether an object is direct or
  51 +indirect, though it is possible to determine whether an object is direct
  52 +or not if this information is needed. All access to objects deals with
  53 +this transparently. All memory management details are also handled by
  54 +the library.
  55 +
  56 +The ``PointerHolder`` object is used internally by the library to deal
  57 +with memory management. This is basically a smart pointer object very
  58 +similar in spirit to C++11's ``std::shared_ptr`` object, but predating
  59 +it by several years. This library also makes use of a technique for
  60 +giving fine-grained access to methods in one class to other classes by
  61 +using public subclasses with friends and only private members that in
  62 +turn call private methods of the containing class. See
  63 +``QPDFObjectHandle::Factory`` as an example.
  64 +
  65 +The top-level qpdf class is ``QPDF``. A ``QPDF`` object represents a PDF
  66 +file. The library provides methods for both accessing and mutating PDF
  67 +files.
  68 +
  69 +The primary class for interacting with PDF objects is
  70 +``QPDFObjectHandle``. Instances of this class can be passed around by
  71 +value, copied, stored in containers, etc. with very low overhead.
  72 +Instances of ``QPDFObjectHandle`` created by reading from a file will
  73 +always contain a reference back to the ``QPDF`` object from which they
  74 +were created. A ``QPDFObjectHandle`` may be direct or indirect. If
  75 +indirect, the ``QPDFObject`` the ``PointerHolder`` initially points to
  76 +is a null pointer. In this case, the first attempt to access the
  77 +underlying ``QPDFObject`` will result in the ``QPDFObject`` being
  78 +resolved via a call to the referenced ``QPDF`` instance. This makes it
  79 +essentially impossible to make coding errors in which certain things
  80 +will work for some PDF files and not for others based on which objects
  81 +are direct and which objects are indirect.
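
The lazy resolution of indirect objects described above can be modeled conceptually in Python. This is a simplified sketch of the idea only; ``Document`` and ``Handle`` are hypothetical stand-ins and do not correspond to the actual C++ classes or their interfaces.

```python
class Document:
    """Hypothetical owner that can look up objects by number."""

    def __init__(self, objects):
        self._objects = objects
        self.resolve_count = 0  # counts lookups, to show laziness

    def resolve(self, objid):
        self.resolve_count += 1
        return self._objects[objid]


class Handle:
    """Hypothetical handle that hides whether its object is direct or indirect."""

    def __init__(self, value=None, owner=None, objid=None):
        self._value = value
        self._owner = owner
        self._objid = objid
        self._resolved = owner is None  # direct handles need no resolution

    @classmethod
    def direct(cls, value):
        return cls(value=value)

    @classmethod
    def indirect(cls, owner, objid):
        return cls(owner=owner, objid=objid)

    def is_indirect(self):
        return self._owner is not None

    def get(self):
        # Resolve through the owner on first access only, then reuse the result.
        if not self._resolved:
            self._value = self._owner.resolve(self._objid)
            self._resolved = True
        return self._value
```

Callers use ``get()`` the same way for both kinds of handle, which is the property that prevents code from working only for files that happen to use direct objects.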
  82 +
  83 +Instances of ``QPDFObjectHandle`` can be directly created and modified
  84 +using static factory methods in the ``QPDFObjectHandle`` class. There
  85 +are factory methods for each type of object as well as a convenience
  86 +method ``QPDFObjectHandle::parse`` that creates an object from a string
  87 +representation of the object. Existing instances of ``QPDFObjectHandle``
  88 +can also be modified in several ways. See comments in
  89 +:file:`QPDFObjectHandle.hh` for details.
  90 +
  91 +An instance of ``QPDF`` is constructed by using the class's default
  92 +constructor. If desired, the ``QPDF`` object may be configured with
  93 +various methods that change its default behavior. Then the
  94 +``QPDF::processFile()`` method is passed the name of a PDF file, which
  95 +permanently associates the file with that QPDF object. A password may
  96 +also be given for access to password-protected files. QPDF does not
  97 +enforce encryption parameters and will treat user and owner passwords
  98 +equivalently. Either password may be used to access an encrypted file.
  99 +``QPDF`` will allow recovery of a user password given an owner password.
  100 +The input PDF file must be seekable. (Output files written by
  101 +``QPDFWriter`` need not be seekable, even when creating linearized
  102 +files.) During construction, ``QPDF`` validates the PDF file's header,
  103 +and then reads the cross reference tables and trailer dictionaries. The
  104 +``QPDF`` class keeps only the first trailer dictionary though it does
  105 +read all of them so it can check the ``/Prev`` key. ``QPDF`` class users
  106 +may request the root object and the trailer dictionary specifically. The
  107 +cross reference table is kept private. Objects may then be requested by
  108 +number or by walking the object tree.
  109 +
  110 +When a PDF file has a cross-reference stream instead of a
  111 +cross-reference table and trailer, requesting the document's trailer
  112 +dictionary returns the stream dictionary from the cross-reference stream
  113 +instead.
  114 +
  115 +There are some convenience routines for very common operations such as
  116 +walking the page tree and returning a vector of all page objects. For
  117 +full details, please see the header files
  118 +:file:`QPDF.hh` and
  119 +:file:`QPDFObjectHandle.hh`. There are also some
  120 +additional helper classes that provide higher level API functions for
  121 +certain document constructions. These are discussed in :ref:`ref.helper-classes`.
  122 +
  123 +.. _ref.helper-classes:
  124 +
  125 +Helper Classes
  126 +--------------
  127 +
  128 +QPDF version 8.1 introduced the concept of helper classes. Helper
  129 +classes are intended to contain higher level APIs that allow developers
  130 +to work with certain document constructs at an abstraction level above
  131 +that of ``QPDFObjectHandle`` while staying true to qpdf's philosophy of
  132 +not hiding document structure from the developer. As with qpdf in
  133 +general, the goal is to take away some of the more tedious bookkeeping
  134 +aspects of working with PDF files, not to remove the need for the
  135 +developer to understand how the PDF construction in question works. The
  136 +driving factor behind the creation of helper classes was to allow the
  137 +evolution of higher level interfaces in qpdf without polluting the
  138 +interfaces of the main top-level classes ``QPDF`` and
  139 +``QPDFObjectHandle``.
  140 +
  141 +There are two kinds of helper classes: *document* helpers and *object*
  142 +helpers. Document helpers are constructed with a reference to a ``QPDF``
  143 +object and provide methods for working with structures that are at the
  144 +document level. Object helpers are constructed with an instance of a
  145 +``QPDFObjectHandle`` and provide methods for working with specific types
  146 +of objects.
  147 +
  148 +Examples of document helpers include ``QPDFPageDocumentHelper``, which
  149 +contains methods for operating on the document's page trees, such as
  150 +enumerating all pages of a document and adding and removing pages; and
  151 +``QPDFAcroFormDocumentHelper``, which contains document-level methods
  152 +related to interactive forms, such as enumerating form fields and
  153 +creating mappings between form fields and annotations.
  154 +
  155 +Examples of object helpers include ``QPDFPageObjectHelper`` for
  156 +performing operations on pages such as page rotation and some operations
  157 +on content streams, ``QPDFFormFieldObjectHelper`` for performing
  158 +operations related to interactive form fields, and
  159 +``QPDFAnnotationObjectHelper`` for working with annotations.
  160 +
  161 +It is always possible to retrieve the underlying ``QPDF`` reference from
  162 +a document helper and the underlying ``QPDFObjectHandle`` reference from
  163 +an object helper. Helpers are designed to be helpers, not wrappers. The
  164 +intention is that, in general, it is safe to freely intermix operations
  165 +that use helpers with operations that use the underlying objects.
  166 +Document and object helpers do not attempt to provide a complete
  167 +interface for working with the things they are helping with, nor do they
  168 +attempt to encapsulate underlying structures. They just provide a few
  169 +methods to help with error-prone, repetitive, or complex tasks. In some
  170 +cases, a helper object may cache some information that is expensive to
  171 +gather. In such cases, the helper classes are implemented so that their
  172 +own methods keep the cache consistent, and the header file will provide
  173 +a method to invalidate the cache and a description of what kinds of
  174 +operations would make the cache invalid. If in doubt, you can always
  175 +discard a helper class and create a new one with the same underlying
  176 +objects, which will ensure that you have discarded any stale
  177 +information.
  178 +
  179 +By convention, document helpers are called
  180 +``QPDFSomethingDocumentHelper`` and are derived from
  181 +``QPDFDocumentHelper``, and object helpers are called
  182 +``QPDFSomethingObjectHelper`` and are derived from ``QPDFObjectHelper``.
  183 +For details on specific helpers, please see their header files. You can
  184 +find them by looking at
  185 +:file:`include/qpdf/QPDF*DocumentHelper.hh` and
  186 +:file:`include/qpdf/QPDF*ObjectHelper.hh`.
  187 +
  188 +In order to avoid creation of circular dependencies, the following
  189 +general guidelines are followed with helper classes:
  190 +
  191 +- Core class interfaces do not know about helper classes. For example,
  192 + no methods of ``QPDF`` or ``QPDFObjectHandle`` will include helper
  193 + classes in their interfaces.
  194 +
  195 +- Interfaces of object helpers will usually not use document helpers in
  196 + their interfaces. This is because it is much more useful for document
  197 + helpers to have methods that return object helpers. Most operations
  198 + in PDF files start at the document level and go from there to the
  199 + object level rather than the other way around. It can sometimes be
  200 + useful to map back from object-level structures to document-level
  201 + structures. If there is a desire to do this, it will generally be
  202 + provided by a method in the document helper class.
  203 +
  204 +- Most of the time, object helpers don't know about other object
  205 + helpers. However, in some cases, one type of object may be a
  206 + container for another type of object, in which case it may make sense
  207 + for the outer object to know about the inner object. For example,
  208 + there are methods in the ``QPDFPageObjectHelper`` that know
  209 + ``QPDFAnnotationObjectHelper`` because references to annotations are
  210 + contained in page dictionaries.
  211 +
  212 +- Any helper or core library class may use helpers in their
  213 + implementations.
  214 +
  215 +Prior to qpdf version 8.1, higher level interfaces were added as
  216 +"convenience functions" in either ``QPDF`` or ``QPDFObjectHandle``. For
  217 +compatibility, older convenience functions for operating with pages will
  218 +remain in those classes even as alternatives are provided in helper
  219 +classes. Going forward, new higher level interfaces will be provided
  220 +using helper classes.
  221 +
  222 +.. _ref.implementation-notes:
  223 +
  224 +Implementation Notes
  225 +--------------------
  226 +
  227 +This section contains a few notes about QPDF's internal implementation,
  228 +particularly around what it does when it first processes a file. This
  229 +section is a bit of a simplification of what it actually does, but it
  230 +could serve as a starting point to someone trying to understand the
  231 +implementation. There is nothing in this section that you need to know
  232 +to use the qpdf library.
  233 +
  234 +``QPDFObject`` is the basic PDF Object class. It is an abstract base
  235 +class from which are derived classes for each type of PDF object.
  236 +Clients do not interact with Objects directly but instead interact with
  237 +``QPDFObjectHandle``.
  238 +
  239 +When the ``QPDF`` class creates a new object, it dynamically allocates
  240 +the appropriate type of ``QPDFObject`` and immediately hands the pointer
  241 +to an instance of ``QPDFObjectHandle``. The parser reads a token from
  242 +the current file position. If the token is not either a dictionary or
  243 +array opener, an object is immediately constructed from the single token
  244 +and the parser returns. Otherwise, the parser iterates in a special mode
  245 +in which it accumulates objects until it finds a balancing closer.
  246 +During this process, the "``R``" keyword is recognized and an indirect
  247 +``QPDFObjectHandle`` may be constructed.
  248 +
  249 +The ``QPDF::resolve()`` method, which is used to resolve an indirect
  250 +object, may be invoked from the ``QPDFObjectHandle`` class. It first
  251 +checks a cache to see whether this object has already been read. If not,
  252 +it reads the object from the PDF file and caches it. It then returns the
  253 +resulting ``QPDFObjectHandle``. The calling object handle then replaces
  254 +its ``PointerHolder<QPDFObject>`` with the one from the newly returned
  255 +``QPDFObjectHandle``. In this way, only a single copy of any direct
  256 +object need exist and clients can access objects transparently without
  257 +knowing or caring whether they are direct or indirect objects.
  258 +Additionally, no object is ever read from the file more than once. That
  259 +means that only the portions of the PDF file that are actually needed
  260 +are ever read from the input file, thus allowing the qpdf package to
  261 +take advantage of this important design goal of PDF files.
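The caching behavior described above can be illustrated with a small, self-contained sketch. This is a toy model under stated assumptions, not qpdf's actual code: ``ToyObject``, ``ToyResolver``, and the ``Loader`` callback are hypothetical names invented for illustration, and the real ``QPDF::resolve()`` works with ``QPDFObjectHandle`` and the cross-reference table rather than a callback.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Toy stand-in for a parsed PDF object (hypothetical, not a qpdf class).
struct ToyObject
{
    std::string value;
};

// Toy resolver: caches objects by (object ID, generation) so that each
// object is loaded from the "file" at most once.
class ToyResolver
{
  public:
    using ObjGen = std::pair<int, int>;
    using Loader = std::function<std::string(ObjGen const&)>;

    explicit ToyResolver(Loader fn) :
        loader(std::move(fn))
    {
    }

    std::shared_ptr<ToyObject>
    resolve(int obj, int gen)
    {
        ObjGen key(obj, gen);
        auto it = cache.find(key);
        if (it == cache.end()) {
            // Cache miss: read the object once and remember it.
            ++reads;
            auto loaded = std::make_shared<ToyObject>(ToyObject{loader(key)});
            it = cache.emplace(key, loaded).first;
        }
        // Every caller shares the same underlying object.
        return it->second;
    }

    int reads = 0; // number of times the loader was actually invoked

  private:
    Loader loader;
    std::map<ObjGen, std::shared_ptr<ToyObject>> cache;
};
```

Resolving the same object ID and generation twice returns the same shared pointer and invokes the loader only once, which mirrors the "never read more than once" property described above.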
  262 +
  263 +If the requested object is inside of an object stream, the object stream
  264 +itself is first read into memory. Then the tokenizer reads objects from
  265 +the memory stream based on the offset information stored in the stream.
  266 +Those individual objects are cached, after which the temporary buffer
  267 +holding the object stream contents is discarded. In this way, the first
  268 +time an object in an object stream is requested, all objects in the
  269 +stream are cached.
  270 +
  271 +The following example should clarify how ``QPDF`` processes a simple
  272 +file.
  273 +
  274 +- Client constructs ``QPDF`` ``pdf`` and calls
  275 + ``pdf.processFile("a.pdf");``.
  276 +
  277 +- The ``QPDF`` class checks the beginning of
  278 + :file:`a.pdf` for a PDF header. It then reads the
  279 + cross reference table mentioned at the end of the file, ensuring that
  280 + it is looking before the last ``%%EOF``. After getting to the ``trailer``
  281 + keyword, it invokes the parser.
  282 +
  283 +- The parser sees "``<<``", so it calls itself recursively in
  284 + dictionary creation mode.
  285 +
  286 +- In dictionary creation mode, the parser keeps accumulating objects
  287 + until it encounters "``>>``". Each object that is read is pushed onto
  288 + a stack. If "``R``" is read, the last two objects on the stack are
  289 + inspected. If they are integers, they are popped off the stack and
  290 + their values are used to construct an indirect object handle which is
  291 + then pushed onto the stack. When "``>>``" is finally read, the stack
  292 + is converted into a ``QPDF_Dictionary`` which is placed in a
  293 + ``QPDFObjectHandle`` and returned.
  294 +
  295 +- The resulting dictionary is saved as the trailer dictionary.
  296 +
  297 +- The ``/Prev`` key is searched. If present, ``QPDF`` seeks to that
  298 + point and repeats except that the new trailer dictionary is not
  299 + saved. If ``/Prev`` is not present, the initial parsing process is
  300 + complete.
  301 +
  302 + If there is an encryption dictionary, the document's encryption
  303 + parameters are initialized.
  304 +
  305 +- The client requests the root object. The ``QPDF`` class gets the value of
  306 + the root key from the trailer dictionary and returns it. It is an unresolved
  307 + indirect ``QPDFObjectHandle``.
  308 +
  309 +- The client requests the ``/Pages`` key from the root
  310 + ``QPDFObjectHandle``. The ``QPDFObjectHandle`` notices that it is
  311 + indirect so it asks ``QPDF`` to resolve it. ``QPDF`` looks in the
  312 + object cache for an object with the root dictionary's object ID and
  313 + generation number. Upon not seeing it, it checks the cross reference
  314 + table, gets the offset, and reads the object present at that offset.
  315 + It stores the result in the object cache and returns the cached
  316 + result. The calling ``QPDFObjectHandle`` replaces its object pointer
  317 + with the one from the resolved ``QPDFObjectHandle``, verifies that it
  318 + is a valid dictionary object, and returns the (unresolved indirect)
  319 + ``QPDFObject`` handle to the top of the Pages hierarchy.
  320 +
  321 + As the client continues to request objects, the same process is
  322 + followed for each new requested object.
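The dictionary-creation step in the example above (push each object onto a stack, and collapse two integers followed by ``R`` into an indirect reference) can be sketched as follows. This is a simplified illustration that assumes pre-split string tokens; ``ToyItem`` and ``accumulate`` are hypothetical names and do not correspond to qpdf's parser classes.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Toy stack entry (hypothetical): an integer, an indirect reference,
// or any other token kept as raw text.
struct ToyItem
{
    enum Kind { Integer, Indirect, Other };
    Kind kind;
    std::string text; // raw token text for Integer and Other
    int obj;          // object ID, used only for Indirect
    int gen;          // generation, used only for Indirect
};

// True if the token consists only of decimal digits.
static bool
isUnsignedInteger(std::string const& token)
{
    if (token.empty()) {
        return false;
    }
    for (unsigned char c : token) {
        if (!std::isdigit(c)) {
            return false;
        }
    }
    return true;
}

// Accumulate tokens onto a stack. When "R" is read and the last two
// entries are integers, pop them and push an indirect reference.
std::vector<ToyItem>
accumulate(std::vector<std::string> const& tokens)
{
    std::vector<ToyItem> stack;
    for (auto const& token : tokens) {
        size_t n = stack.size();
        if (token == "R" && n >= 2 && stack[n - 1].kind == ToyItem::Integer &&
            stack[n - 2].kind == ToyItem::Integer) {
            int gen = std::stoi(stack[n - 1].text);
            int obj = std::stoi(stack[n - 2].text);
            stack.pop_back();
            stack.pop_back();
            stack.push_back(ToyItem{ToyItem::Indirect, "", obj, gen});
        } else if (isUnsignedInteger(token)) {
            stack.push_back(ToyItem{ToyItem::Integer, token, 0, 0});
        } else {
            stack.push_back(ToyItem{ToyItem::Other, token, 0, 0});
        }
    }
    return stack;
}
```

For the token sequence ``/Root 1 0 R /Size 7``, the resulting stack holds four entries: a name, an indirect reference to object 1 generation 0, another name, and the integer 7. A real parser would then pair these up as dictionary keys and values.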
  323 +
  324 +.. _ref.casting:
  325 +
  326 +Casting Policy
  327 +--------------
  328 +
  329 +This section describes the casting policy followed by qpdf's
  330 +implementation. This is of no concern to qpdf's end users and largely of no
  331 +concern to people writing code that uses qpdf, but it could be of
  332 +interest to people who are porting qpdf to a new platform or who are
  333 +making modifications to the code.
  334 +
  335 +The C++ code in qpdf is free of old-style casts except where unavoidable
  336 +(e.g. where the old-style cast is in a macro provided by a third-party
  337 +header file). When there is a need for a cast, it is handled, in order
  338 +of preference, by rewriting the code to avoid the need for a cast,
  339 +calling ``const_cast``, calling ``static_cast``, calling
  340 +``reinterpret_cast``, or calling some combination of the above. As a
  341 +last resort, a compiler-specific ``#pragma`` may be used to suppress a
  342 +warning that we don't want to fix. Examples may include suppressing
  343 +warnings about the use of old-style casts in code that is shared between
  344 +C and C++ code.
  345 +
  346 +The ``QIntC`` namespace, provided by
  347 +:file:`include/qpdf/QIntC.hh`, implements safe
  348 +functions for converting between integer types. These functions do range
  349 +checking and throw a ``std::range_error``, which is a subclass of
  350 +``std::runtime_error``, if conversion from one integer type to another
  351 +results in loss of information. There are many cases in which we have to
  352 +move between different integer types because of incompatible integer
  353 +types used in interoperable interfaces. Some are unavoidable, such as
  354 +moving between sizes and offsets, and others are there because of old
  355 +code that is too entrenched to be fixable without breaking source
  356 +compatibility and causing pain for users. QPDF is compiled with extra
  357 +warnings to detect conversions with potential data loss, and all such
  358 +cases should be fixed by either using a function from ``QIntC`` or a
  359 +``static_cast``.
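To make the idea of a range-checked integer conversion concrete, here is a minimal sketch. It is not the implementation in :file:`include/qpdf/QIntC.hh`; ``checked_int_cast`` is a hypothetical name used only for illustration, and the round-trip check shown is one common technique for detecting loss of information.

```cpp
#include <stdexcept>
#include <type_traits>

// Hypothetical helper (not part of qpdf): convert between integer
// types, throwing std::range_error if the value does not survive.
template <typename To, typename From>
To
checked_int_cast(From value)
{
    static_assert(
        std::is_integral<To>::value && std::is_integral<From>::value,
        "checked_int_cast requires integer types");
    // Out-of-range signed results are implementation-defined before
    // C++20; mainstream compilers wrap modulo 2^N, which the checks
    // below rely on.
    To result = static_cast<To>(value);
    // The value must round-trip unchanged...
    bool changed = static_cast<From>(result) != value;
    // ...and must not flip between negative and non-negative.
    bool sign_flipped = (value < 0) != (result < 0);
    if (changed || sign_flipped) {
        throw std::range_error("integer conversion loses information");
    }
    return result;
}
```

With this sketch, converting ``-1`` to ``unsigned int`` or ``300`` to ``unsigned char`` throws, while converting ``200`` to ``unsigned char`` succeeds. Converting an ``unsigned char`` holding 200 to ``signed char`` also throws, which is why a plain ``static_cast`` is the right tool when the intent is to keep the same bit pattern.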
  360 +
  361 +When the intention is just to switch the type because of exchanging data
  362 +between incompatible interfaces, use ``QIntC``. This is the usual case.
  363 +However, there are some cases in which we are explicitly intending to
  364 +use the exact same bit pattern with a different type. This is most
  365 +common when switching between signed and unsigned characters. A lot of
  366 +qpdf's code uses unsigned characters internally, but ``std::string`` and
  367 +``char`` are signed. Using ``QIntC::to_char`` would be wrong for
  368 +converting from unsigned to signed characters because a negative
  369 +``char`` value and the corresponding ``unsigned char`` value greater
  370 +than 127 *mean the same thing*. There are also
  371 +cases in which we use ``static_cast`` when working with bit fields where
  372 +we are not representing a numerical value but rather a bunch of bits
  373 +packed together in some integer type. Also note that ``size_t`` and
  374 +``long`` both typically differ between 32-bit and 64-bit environments,
  375 +so sometimes an explicit cast may not be needed to avoid warnings on one
  376 +platform but may be needed on another. A conversion with ``QIntC``
  377 +should always be used when the types are different even if the
  378 +underlying size is the same. QPDF's CI build builds on 32-bit and 64-bit
  379 +platforms, and the test suite is very thorough, so it is hard to make
  380 +any of the potential errors here without being caught in build or test.
  381 +
  382 +Non-const ``unsigned char*`` is used in the ``Pipeline`` interface. The
  383 +pipeline interface has a ``write`` call that uses ``unsigned char*``
  384 +without a ``const`` qualifier. The main reason for this is
  385 +to support pipelines that make calls to third-party libraries, such as
  386 +zlib, that don't include ``const`` in their interfaces. Unfortunately,
  387 +there are many places in the code where it is desirable to have
  388 +``const char*`` with pipelines. None of the pipeline implementations
  389 +in qpdf
  390 +currently modify the data passed to write, and doing so would be counter
  391 +to the intent of ``Pipeline``, but there is nothing in the code to
  392 +prevent this from being done. There are places in the code where
  393 +``const_cast`` is used to remove the const-ness of pointers going into
  394 +``Pipeline``\ s. This could theoretically be unsafe, but there is
  395 +adequate testing to assert that it is safe and will remain safe in
  396 +qpdf's code.
  397 +
  398 +.. _ref.encryption:
  399 +
  400 +Encryption
  401 +----------
  402 +
  403 +Encryption is supported transparently by qpdf. When opening a PDF file,
  404 +if an encryption dictionary exists, the ``QPDF`` object processes this
  405 +dictionary using the password (if any) provided. The primary decryption
  406 +key is computed and cached. No further access is made to the encryption
  407 +dictionary after that time. When an object is read from a file, the
  408 +object ID and generation of the object in which it is contained is
  409 +always known. Using this information along with the stored encryption
  410 +key, all stream and string objects are transparently decrypted. Raw
  411 +encrypted objects are never stored in memory. This way, nothing in the
  412 +library ever has to know or care whether it is reading an encrypted
  413 +file.
  414 +
  415 +An interface is also provided for writing encrypted streams and strings
  416 +given an encryption key. This is used by ``QPDFWriter`` when it rewrites
  417 +encrypted files.
  418 +
  419 +When copying encrypted files, unless otherwise directed, qpdf will
  420 +preserve any encryption in force in the original file. qpdf can do this
  421 +with either the user or the owner password. There is no difference in
  422 +capability based on which password is used. When 40- or 128-bit
  423 +encryption keys are used, the user password can be recovered with the
  424 +owner password. With 256-bit keys, the user and owner passwords are used
  425 +independently to encrypt the actual encryption key, so while either can
  426 +be used, the owner password can no longer be used to recover the user
  427 +password.
  428 +
  429 +Starting with version 4.0.0, qpdf can read files that are not encrypted
  430 +but that contain encrypted attachments, but it cannot write such files.
  431 +qpdf also requires the password to be specified in order to open the
  432 +file, not just to extract attachments, since once the file is open, all
  433 +decryption is handled transparently. When copying files like this while
  434 +preserving encryption, qpdf will apply the file's encryption to
  435 +everything in the file, not just to the attachments. When decrypting the
  436 +file, qpdf will decrypt the attachments. In general, when copying PDF
  437 +files with multiple encryption formats, qpdf will choose the newest
  438 +format. The only exception to this is that clear-text metadata will be
  439 +preserved as clear-text if it is that way in the original file.
  440 +
  441 +One point of confusion some people have about encrypted PDF files is
  442 +that encryption is not the same as password protection. Password
  443 +protected files are always encrypted, but it is also possible to create
  444 +encrypted files that do not have passwords. Internally, such files use
  445 +the empty string as a password, and most readers try the empty string
  446 +first to see if it works and prompt for a password only if the empty
  447 +string doesn't work. Normally such files have an empty user password and
  448 +a non-empty owner password. In that way, if the file is opened by an
  449 +ordinary reader without specification of password, the restrictions
  450 +specified in the encryption dictionary can be enforced. Most users
  451 +wouldn't even realize such a file was encrypted. Since qpdf always
  452 +ignores the restrictions (except for the purpose of reporting what they
  453 +are), qpdf doesn't care which password you use. QPDF will allow you to
  454 +create PDF files with non-empty user passwords and empty owner
  455 +passwords. Some readers will require a password when you open these
  456 +files, and others will open the files without a password and not enforce
  457 +restrictions. Having a non-empty user password and an empty owner
  458 +password doesn't really make sense because it would mean that opening
  459 +the file with the user password would be more restrictive than not
  460 +supplying a password at all. QPDF also allows you to create PDF files
  461 +with the same password as both the user and owner password. Some readers
  462 +will not ever allow such files to be accessed without restrictions
  463 +because they never try the password as the owner password if it works as
  464 +the user password. Nonetheless, one of the powerful aspects of qpdf is
  465 +that it allows you to finely specify the way encrypted files are
  466 +created, even if the results are not useful to some readers. One use
  467 +case for this would be for testing a PDF reader to ensure that it
  468 +handles odd configurations of input files.
  469 +
  470 +.. _ref.random-numbers:
  471 +
  472 +Random Number Generation
  473 +------------------------
  474 +
  475 +QPDF generates random numbers to support generation of encrypted data.
  476 +Starting in qpdf 10.0.0, qpdf uses the crypto provider as its source of
  477 +random numbers. Older versions used the OS-provided source of secure
  478 +random numbers or, if allowed at build time, insecure random numbers
  479 +from stdlib. Starting with version 5.1.0, you can disable use of
  480 +OS-provided secure random numbers at build time. This is especially
  481 +useful on Windows if you want to avoid a dependency on Microsoft's
  482 +cryptography API. You can also supply your own random data provider. For
  483 +details on how to do this, please refer to the top-level README.md file
  484 +in the source distribution and to comments in
  485 +:file:`QUtil.hh`.
  486 +
  487 +.. _ref.adding-and-remove-pages:
  488 +
  489 +Adding and Removing Pages
  490 +-------------------------
  491 +
  492 +While qpdf's API has supported adding and modifying objects for some
  493 +time, version 3.0 introduces specific methods for adding and removing
  494 +pages. These are largely convenience routines that handle two tricky
  495 +issues: pushing inheritable resources from the ``/Pages`` tree down to
  496 +individual pages and manipulation of the ``/Pages`` tree itself. For
  497 +details, see ``addPage`` and surrounding methods in
  498 +:file:`QPDF.hh`.
  499 +
  500 +.. _ref.reserved-objects:
  501 +
  502 +Reserving Object Numbers
  503 +------------------------
  504 +
  505 +Version 3.0 of qpdf introduced the concept of reserved objects. These
  506 +are seldom needed for ordinary operations, but there are cases in which
  507 +you may want to add a series of indirect objects with references to each
  508 +other to a ``QPDF`` object. This causes a problem because you can't
  509 +determine the object ID that a new indirect object will have until you
  510 +add it to the ``QPDF`` object with ``QPDF::makeIndirectObject``. The
  511 +only way to add two mutually referential objects to a ``QPDF`` object
  512 +prior to version 3.0 would be to add the new objects first and then make
  513 +them refer to each other after adding them. Now it is possible to create
  514 +a *reserved object* using
  515 +``QPDFObjectHandle::newReserved``. This is an indirect object that stays
  516 +"unresolved" even if it is queried for its type. So now, if you want to
  517 +create a set of mutually referential objects, you can create
  518 +reservations for each one of them and use those reservations to
  519 +construct the references. When finished, you can call
  520 +``QPDF::replaceReserved`` to replace the reserved objects with the real
  521 +ones. This functionality will never be needed by most applications, but
  522 +it is used internally by QPDF when copying objects from other PDF files,
  523 +as discussed in :ref:`ref.foreign-objects`. For an example of how to use reserved
  524 +objects, search for ``newReserved`` in
  525 +:file:`test_driver.cc` in qpdf's sources.
  526 +
  527 +.. _ref.foreign-objects:
  528 +
  529 +Copying Objects From Other PDF Files
  530 +------------------------------------
  531 +
  532 +Version 3.0 of qpdf introduced the ability to copy objects into a
  533 +``QPDF`` object from a different ``QPDF`` object, which we refer to as
  534 +*foreign objects*. This allows arbitrary
  535 +merging of PDF files. The "from" ``QPDF`` object must remain valid after
  536 +the copy as discussed in the note below. The
  537 +:command:`qpdf` command-line tool provides limited
  538 +support for basic page selection, including merging in pages from other
  539 +files, but the library's API makes it possible to implement arbitrarily
  540 +complex merging operations. The main method for copying foreign objects
  541 +is ``QPDF::copyForeignObject``. This takes an indirect object from
  542 +another ``QPDF`` and copies it recursively into this object while
  543 +preserving all object structure, including circular references. This
  544 +means you can add a direct object that you create from scratch to a
  545 +``QPDF`` object with ``QPDF::makeIndirectObject``, and you can add an
  546 +indirect object from another file with ``QPDF::copyForeignObject``. The
  547 +fact that ``QPDF::makeIndirectObject`` does not automatically detect a
  548 +foreign object and copy it is an explicit design decision. Copying a
  549 +foreign object seems like a sufficiently significant thing to do that it
  550 +should be done explicitly.
  551 +
  552 +The other way to copy foreign objects is by passing a page from one
  553 +``QPDF`` to another by calling ``QPDF::addPage``. In contrast to
  554 +``QPDF::makeIndirectObject``, this method automatically distinguishes
  555 +between indirect objects in the current file, foreign objects, and
  556 +direct objects.
  557 +
  558 +Please note: when you copy objects from one ``QPDF`` to another, the
  559 +source ``QPDF`` object must remain valid until you have finished with
  560 +the destination object. This is because the original object is still
  561 +used to retrieve any referenced stream data from the copied object.
  562 +
  563 +.. _ref.rewriting:
  564 +
  565 +Writing PDF Files
  566 +-----------------
  567 +
  568 +The qpdf library supports file writing of ``QPDF`` objects to PDF files
  569 +through the ``QPDFWriter`` class. The ``QPDFWriter`` class has two
  570 +writing modes: one for non-linearized files, and one for linearized
  571 +files. See :ref:`ref.linearization` for a description of how
  572 +linearization is implemented. This section describes how we write
  573 +non-linearized files including the creation of QDF files (see :ref:`ref.qdf`).
  574 +
  575 +This outline was written prior to implementation and is not exactly
  576 +accurate, but it provides a correct "notional" idea of how writing
  577 +works. Look at the code in ``QPDFWriter`` for exact details.
  578 +
  579 +- Initialize state:
  580 +
  581 + - next object number = 1
  582 +
  583 + - object queue = empty
  584 +
  585 + - renumber table: old object id/generation to new id/0 = empty
  586 +
  587 + - xref table: new id -> offset = empty
  588 +
  589 +- Create a QPDF object from a file.
  590 +
  591 +- Write header for new PDF file.
  592 +
  593 +- Request the trailer dictionary.
  594 +
  595 +- For each value that is an indirect object, grab the next object
  596 + number (via an operation that returns and increments the number). Map
  597 + object to new number in renumber table. Push object onto queue.
  598 +
  599 +- While there are more objects on the queue:
  600 +
  601 + - Pop queue.
  602 +
  603 + - Look up object's new number *n* in the renumbering table.
  604 +
  605 + - Store current offset into xref table.
  606 +
  607 + - Write ``:samp:`{n}` 0 obj``.
  608 +
  609 + - If object is null, whether direct or indirect, write out null,
  610 + thus eliminating unresolvable indirect object references.
  611 +
  612 + - If the object is a stream, write stream contents, piped
  613 + through any filters as required, to a memory buffer. Use this
  614 + buffer to determine the stream length.
  615 +
  616 + - If object is not a stream, array, or dictionary, write out its
  617 + contents.
  618 +
  619 + - If object is an array or dictionary (including stream), traverse
  620 + its elements (for array) or values (for dictionaries), handling
  621 + recursive dictionaries and arrays, looking for indirect objects.
  622 + When an indirect object is found, if it is not resolvable, ignore.
  623 + (This case is handled when writing it out.) Otherwise, look it up
  624 + in the renumbering table. If not found, grab the next available
  625 + object number, assign to the referenced object in the renumbering
  626 + table, and push the referenced object onto the queue. As a special
  627 + case, when writing out a stream dictionary, replace length,
  628 + filters, and decode parameters as required.
  629 +
  630 + Write out dictionary or array, replacing any unresolvable indirect
  631 + object references with null (the PDF spec says a reference to a
  632 + non-existent object is legal and resolves to null) and any
  633 + resolvable ones with references to the renumbered objects.
  634 +
  635 + - If the object is a stream, write ``stream\n``, the stream contents
  636 + (from the memory buffer), and ``\nendstream\n``.
  637 +
  638 + - When done, write ``endobj``.
  639 +
  640 +Once we have finished the queue, all referenced objects will have been
  641 +written out and all deleted objects or unreferenced objects will have
  642 +been skipped. The new cross-reference table will contain an offset for
  643 +every new object number from 1 up to the number of objects written. This
  644 +can be used to write out a new xref table. Finally we can write out the
  645 +trailer dictionary with appropriately computed /ID (see spec, 8.3, File
  646 +Identifiers), the cross reference table offset, and ``%%EOF``.
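The renumbering part of the outline above is essentially a breadth-first traversal of the object graph. The following sketch shows only that part, under simplifying assumptions: objects are modeled as a map from old object ID to the IDs they reference, generations are ignored, and nothing is written to a file. ``ToyGraph`` and ``renumber`` are hypothetical names, not ``QPDFWriter`` internals.

```cpp
#include <deque>
#include <map>
#include <vector>

// Toy object graph (hypothetical): old object ID -> referenced old IDs.
using ToyGraph = std::map<int, std::vector<int>>;

// Return old ID -> new ID for every object reachable from `roots`,
// numbering from 1 in the order objects are first encountered.
std::map<int, int>
renumber(ToyGraph const& graph, std::vector<int> const& roots)
{
    std::map<int, int> renumber_table;
    std::deque<int> queue;
    int next_object_number = 1;

    auto enqueue = [&](int old_id) {
        if (graph.count(old_id) == 0) {
            // Unresolvable reference: ignore here; a real writer
            // would emit null in its place.
            return;
        }
        if (renumber_table.count(old_id) != 0) {
            return; // already assigned a new number
        }
        renumber_table[old_id] = next_object_number++;
        queue.push_back(old_id);
    };

    // Start from the indirect objects referenced by the trailer.
    for (int root : roots) {
        enqueue(root);
    }
    while (!queue.empty()) {
        int old_id = queue.front();
        queue.pop_front();
        // Traverse the object's references, queueing any new ones.
        for (int ref : graph.at(old_id)) {
            enqueue(ref);
        }
    }
    return renumber_table;
}
```

Objects that are never reached from the roots receive no new number, which corresponds to deleted or unreferenced objects being skipped, and circular references terminate because each object is queued at most once.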
  647 +
  648 +.. _ref.filtered-streams:
  649 +
  650 +Filtered Streams
  651 +----------------
  652 +
  653 +Support for streams is implemented through the ``Pipeline`` interface
  654 +which was designed for this package.
  655 +
  656 +When reading streams, create a series of ``Pipeline`` objects. The
  657 +``Pipeline`` abstract base requires implementation of ``write()`` and
  658 +``finish()`` and provides an implementation of ``getNext()``. Each
  659 +pipeline object, upon receiving data, does whatever it is going to do
  660 +and then writes the data (possibly modified) to its successor.
  661 +Alternatively, a pipeline may be an end-of-the-line pipeline that does
  662 +something like store its output to a file or a memory buffer ignoring a
  663 +successor. For additional details, look at
  664 +:file:`Pipeline.hh`.
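As a rough illustration of the chaining idea, the sketch below builds a two-stage chain in which a transforming stage forwards its output to an end-of-the-line stage that stores data in memory. It is a toy model: ``ToyPipeline``, ``ToyUppercase``, and ``ToyBuffer`` are hypothetical names, and the interface uses ``std::string`` for brevity rather than the buffer types of qpdf's real ``Pipeline`` class, so consult :file:`Pipeline.hh` for the actual signatures.

```cpp
#include <cctype>
#include <string>

// Toy abstract base (hypothetical): subclasses implement write() and
// finish(); getNext() gives access to the successor, if any.
class ToyPipeline
{
  public:
    explicit ToyPipeline(ToyPipeline* next = nullptr) :
        next(next)
    {
    }
    virtual ~ToyPipeline() = default;
    virtual void write(std::string const& data) = 0;
    virtual void finish() = 0;

  protected:
    ToyPipeline*
    getNext()
    {
        return next;
    }

  private:
    ToyPipeline* next;
};

// A stage that modifies the data and passes it to its successor.
class ToyUppercase: public ToyPipeline
{
  public:
    explicit ToyUppercase(ToyPipeline* next) :
        ToyPipeline(next)
    {
    }
    void
    write(std::string const& data) override
    {
        std::string out;
        for (unsigned char c : data) {
            out += static_cast<char>(std::toupper(c));
        }
        getNext()->write(out);
    }
    void
    finish() override
    {
        getNext()->finish();
    }
};

// An end-of-the-line stage that stores its input in memory.
class ToyBuffer: public ToyPipeline
{
  public:
    void
    write(std::string const& data) override
    {
        buffer += data;
    }
    void
    finish() override
    {
        finished = true;
    }
    std::string buffer;
    bool finished = false;
};
```

Constructing a ``ToyBuffer``, wrapping it in a ``ToyUppercase``, and writing to the outer stage leaves the transformed data in the buffer; calling ``finish()`` on the outer stage propagates down the chain.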
  665 +
  666 +``QPDF`` can read raw or filtered streams. When reading a filtered
  667 +stream, the ``QPDF`` class creates a ``Pipeline`` object for one of each
  668 +appropriate filter object and chains them together. The last filter
  669 +should write to whatever type of output is required. The ``QPDF`` class
  670 +has an interface to write raw or filtered stream contents to a given
  671 +pipeline.
  672 +
  673 +.. _ref.object-accessors:
  674 +
  675 +Object Accessor Methods
  676 +-----------------------
  677 +
  678 +..
  679 + This section is referenced in QPDFObjectHandle.hh
  680 +
  681 +For general information about how to access instances of
  682 +``QPDFObjectHandle``, please see the comments in
  683 +:file:`QPDFObjectHandle.hh`. Search for "Accessor
  684 +methods". This section provides a more in-depth discussion of the
  685 +behavior and the rationale for the behavior.
  686 +
  687 +*Why were type errors made into warnings?* When type checks were
  688 +introduced into qpdf in the early days, it was expected that type errors
  689 +would only occur as a result of programmer error. However, in practice,
  690 +type errors would occur with malformed PDF files because of assumptions
  691 +made in code, including code within the qpdf library and code written by
  692 +library users. The most common case would be chaining calls to
  693 +``getKey()`` to access keys deep within a dictionary. In many cases,
  694 +qpdf would be able to recover from these situations, but the old
  695 +behavior often resulted in crashes rather than graceful recovery. For
  696 +this reason, the errors were changed to warnings.
  697 +
  698 +*Why even warn about type errors when the user can't usually do anything
  699 +about them?* Type warnings are extremely valuable during development.
  700 +Since it's impossible to catch at compile time things like typos in
  701 +dictionary key names or logic errors around what the structure of a PDF
  702 +file might be, the presence of type warnings can save lots of developer
  703 +time. They have also proven useful in exposing issues in qpdf itself
  704 +that would have otherwise gone undetected.
  705 +
  706 +*Can there be a type-safe ``QPDFObjectHandle``?* It would be great if
  707 +``QPDFObjectHandle`` could be more strongly typed so that you'd have to
  708 +check that something was of a particular type before calling
  709 +type-specific accessor methods. However, implementing this at this stage
  710 +of the library's history would be quite difficult, and it would make
  711 +the common pattern of drilling into an object no longer work. While it
  712 +would be possible to have a parallel interface, it would create a lot of
  713 +extra code. If qpdf were written in a language like Rust, an interface
  714 +like this would make a lot of sense, but, for a variety of reasons, the
  715 +qpdf API is consistent with other APIs of its time, relying on exception
  716 +handling to catch errors. The underlying PDF objects are inherently not
  717 +type-safe. Forcing stronger type safety in ``QPDFObjectHandle`` would
  718 +ultimately cause a lot more code to have to be written and would likely
  719 +make software that uses qpdf more brittle, and even so, checks would
  720 +have to occur at runtime.
  721 +
  722 +*Why do type errors sometimes raise exceptions?* The way warnings work
  723 +in qpdf requires a ``QPDF`` object to be associated with an object
  724 +handle for a warning to be issued. It would be nice if this could be
  725 +fixed, but it would require major changes to the API. Rather than
  726 +throwing away these conditions, we convert them to exceptions. It's not
  727 +that bad though. Since any object handle that was read from a file has
  728 +an associated ``QPDF`` object, it would only be type errors on objects
  729 +that were created explicitly that would cause exceptions, and in that
  730 +case, type errors are much more likely to be the result of a coding
  731 +error than invalid input.
  732 +
  733 +*Why does the behavior of a type exception differ between the C and C++
  734 +API?* There is no way to throw and catch exceptions in C short of
  735 +something like ``setjmp`` and ``longjmp``, and that approach is not
  736 +portable across language barriers. Since the C API is often used from
  737 +other languages, it's important to keep things as simple as possible.
  738 +Starting in qpdf 10.5, exceptions that used to crash code using the C
  739 +API will be written to stderr by default, and it is possible to register
  740 +an error handler. There's no reason that the error handler can't
  741 +simulate exception handling in some way, such as by using ``setjmp`` and
  742 +``longjmp`` or by setting some variable that can be checked after
  743 +library calls are made. In retrospect, it might have been better if the
  744 +C API object handle methods returned error codes like the other methods
  745 +and set return values in passed-in pointers, but this would complicate
  746 +both the implementation and the use of the library for a case that is
  747 +actually quite rare and largely avoidable.
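The last answer above mentions that an error handler could set a variable that is checked after library calls. A minimal sketch of that pattern follows. Every name in it (``demo_register_error_handler``, ``demo_as_int``, ``ErrorState``, ``record_error``) is hypothetical and is not part of qpdf's C API; the sketch only assumes a C-style boundary that catches C++ exceptions and forwards them to a registered callback, falling back to stderr when none is registered.

```cpp
#include <cstdio>
#include <exception>
#include <string>

// Everything here is hypothetical and only illustrates the pattern; none
// of these names are part of qpdf's C API.

// Signature of a C-style error callback.
typedef void (*demo_error_handler)(char const* message, void* user_data);

static demo_error_handler g_handler = nullptr;
static void* g_user_data = nullptr;

// Register a callback to receive errors that cannot cross the C boundary
// as exceptions.
void demo_register_error_handler(demo_error_handler handler, void* user_data)
{
    g_handler = handler;
    g_user_data = user_data;
}

// C-style accessor: returns a fallback value instead of throwing.
int demo_as_int(char const* text)
{
    try {
        return std::stoi(text);
    } catch (std::exception const& e) {
        if (g_handler) {
            g_handler(e.what(), g_user_data);
        } else {
            // No handler registered: report on stderr by default.
            std::fprintf(stderr, "demo error: %s\n", e.what());
        }
        return 0;
    }
}

// Caller side: state that the handler fills in and the caller inspects
// after each library call.
struct ErrorState
{
    bool failed = false;
    std::string message;
};

void record_error(char const* message, void* user_data)
{
    ErrorState* state = static_cast<ErrorState*>(user_data);
    state->failed = true;
    state->message = message;
}
```

A caller would register ``record_error`` with a pointer to an ``ErrorState``, call ``demo_as_int``, and then test ``failed`` before trusting the returned value. This keeps the accessor's return type simple at the cost of requiring the caller to remember the check.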
... ...
manual/index.rst
... ... @@ -9,6261 +9,16 @@ QPDF version |release|
9 9 :maxdepth: 2
10 10 :caption: Contents:
11 11  
12   -.. _ref.overview:
13   -
14   -What is QPDF?
15   -=============
16   -
17   -QPDF is a program and C++ library for structural, content-preserving
18   -transformations on PDF files. QPDF's website is located at
19   -https://qpdf.sourceforge.io/. QPDF's source code is hosted on GitHub
20   -at https://github.com/qpdf/qpdf.
21   -
22   -QPDF provides many useful capabilities to developers of PDF-producing
23   -software or for people who just want to look at the innards of a PDF
24   -file to learn more about how they work. With QPDF, it is possible to
25   -copy objects from one PDF file into another and to manipulate the list
26   -of pages in a PDF file. This makes it possible to merge and split PDF
27   -files. The QPDF library also makes it possible for you to create PDF
28   -files from scratch. In this mode, you are responsible for supplying
29   -all the contents of the file, while the QPDF library takes care of all
30   -the syntactical representation of the objects, creation of cross
31   -references tables and, if you use them, object streams, encryption,
32   -linearization, and other syntactic details. You are still responsible
33   -for generating PDF content on your own.
34   -
35   -QPDF has been designed with very few external dependencies, and it is
36   -intentionally very lightweight. QPDF is *not* a PDF content creation
37   -library, a PDF viewer, or a program capable of converting PDF into other
38   -formats. In particular, QPDF knows nothing about the semantics of PDF
39   -content streams. If you are looking for something that can do that, you
40   -should look elsewhere. However, once you have a valid PDF file, QPDF can
41   -be used to transform that file in ways that perhaps your original PDF
42   -creation tool can't handle. For example, many programs generate simple PDF
43   -files but can't password-protect them, web-optimize them, or perform
44   -other transformations of that type.
45   -
46   -.. _ref.license:
47   -
48   -License
49   -=======
50   -
51   -QPDF is licensed under `the Apache License, Version 2.0
52   -<http://www.apache.org/licenses/LICENSE-2.0>`__ (the "License").
53   -Unless required by applicable law or agreed to in writing, software
54   -distributed under the License is distributed on an "AS IS" BASIS,
55   -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
56   -implied. See the License for the specific language governing
57   -permissions and limitations under the License.
58   -
59   -.. _ref.installing:
60   -
61   -Building and Installing QPDF
62   -============================
63   -
64   -This chapter describes how to build and install qpdf. Please see also
65   -the :file:`README.md` and
66   -:file:`INSTALL` files in the source distribution.
67   -
68   -.. _ref.prerequisites:
69   -
70   -System Requirements
71   --------------------
72   -
73   -The qpdf package has few external dependencies. In order to build qpdf,
74   -the following packages are required:
75   -
76   -- A C++ compiler that supports C++-14.
77   -
78   -- zlib: http://www.zlib.net/
79   -
80   -- jpeg: http://www.ijg.org/files/ or https://libjpeg-turbo.org/
81   -
82   -- *Recommended but not required:* gnutls: https://www.gnutls.org/ to be
83   - able to use the gnutls crypto provider, and/or openssl:
84   - https://openssl.org/ to be able to use the openssl crypto provider.
85   -
86   -- gnu make 3.81 or newer: http://www.gnu.org/software/make
87   -
88   -- perl version 5.8 or newer: http://www.perl.org/; required for running
89   - the test suite. Starting with qpdf version 9.1.1, perl is no longer
90   - required at runtime.
91   -
92   -- GNU diffutils (any version): http://www.gnu.org/software/diffutils/
93   - is required to run the test suite. Note that this is the version of
94   - diff present on virtually all GNU/Linux systems. This is required
95   - because the test suite uses :command:`diff -u`.
96   -
97   -Part of qpdf's test suite does comparisons of the contents of PDF files by
98   -converting them to images and comparing the images. The image comparison
99   -tests are disabled by default. Those tests are not required for
100   -determining correctness of a qpdf build if you have not modified the
101   -code since the test suite also contains expected output files that are
102   -compared literally. The image comparison tests provide an extra check to
103   -make sure that any content transformations don't break the rendering of
104   -pages. Transformations that affect the content streams themselves are
105   -off by default and are only provided to help developers look into the
106   -contents of PDF files. If you are making deep changes to the library
107   -that cause changes in the contents of the files that qpdf generates,
108   -then you should enable the image comparison tests. Enable them by
109   -running :command:`configure` with the
110   -:samp:`--enable-test-compare-images` flag. If you enable
111   -this, the following additional packages are required by the test
112   -suite. Note that in no case are these items required to use qpdf.
113   -
114   -- libtiff: http://www.remotesensing.org/libtiff/
115   -
116   -- GhostScript version 8.60 or newer: http://www.ghostscript.com
117   -
118   -If you do not enable this, then you do not need to have tiff and
119   -ghostscript.
120   -
121   -Pre-built documentation is distributed with qpdf, so you should
122   -generally not need to rebuild the documentation. In order to build the
123   -documentation from source, you need to install `Sphinx
124   -<https://sphinx-doc.org>`__. To build the PDF version of the
125   -documentation, you need `pdflatex`, `latexmk`, and a fairly complete
126   -LaTeX installation. Detailed requirements can be found in the Sphinx
127   -documentation.
128   -
129   -.. _ref.building:
130   -
131   -Build Instructions
132   -------------------
133   -
134   -Building qpdf on UNIX is generally just a matter of running
135   -
136   -::
137   -
138   - ./configure
139   - make
140   -
141   -You can also run :command:`make check` to run the test
142   -suite and :command:`make install` to install. Please run
143   -:command:`./configure --help` for options on what can be
144   -configured. You can also set the value of ``DESTDIR`` during
145   -installation to install to a temporary location, as is common with many
146   -open source packages. Please see also the
147   -:file:`README.md` and
148   -:file:`INSTALL` files in the source distribution.
149   -
150   -Building on Windows is a little bit more complicated. For details,
151   -please see :file:`README-windows.md` in the source
152   -distribution. You can also download a binary distribution for Windows.
153   -There is a port of qpdf to Visual C++ version 6 in the
154   -:file:`contrib` area generously contributed by Jian
155   -Ma. This is also discussed in more detail in
156   -:file:`README-windows.md`.
157   -
158   -While ``wchar_t`` is part of the C++ standard, qpdf uses it in only one
159   -place in the public API, and it's just in a helper function. It is
160   -possible to build qpdf on a system that doesn't have ``wchar_t``, and
161   -it's also possible to compile a program that uses qpdf on a system
162   -without ``wchar_t`` as long as you don't call that one method. This is a
163   -very unusual situation. For a detailed discussion, please see the
164   -top-level README.md file in qpdf's source distribution.
165   -
166   -There are some other things you can do with the build. Although qpdf
167   -uses :command:`autoconf`, it does not use
168   -:command:`automake` but instead uses a
169   -hand-crafted non-recursive Makefile that requires gnu make. If you're
170   -really interested, please read the comments in the top-level
171   -:file:`Makefile`.
172   -
173   -.. _ref.crypto:
174   -
175   -Crypto Providers
176   -----------------
177   -
178   -Starting with qpdf 9.1.0, the qpdf library can be built with multiple
179   -implementations of providers of cryptographic functions, which we refer
180   -to as "crypto providers." At the time of writing, a crypto
181   -implementation must provide MD5 and SHA2 (256, 384, and 512-bit) hashes
182   -and RC4 and AES256 with and without CBC encryption. In the future, if
183   -digital signature is added to qpdf, there may be additional requirements
184   -beyond this.
185   -
186   -Starting with qpdf version 9.1.0, the available implementations are
187   -``native`` and ``gnutls``. In qpdf 10.0.0, ``openssl`` was added.
188   -Additional implementations may be added if needed. It is also possible
189   -for a developer to provide their own implementation without modifying
190   -the qpdf library.
191   -
192   -.. _ref.crypto.build:
193   -
194   -Build Support For Crypto Providers
195   -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
196   -
197   -When building with qpdf's build system, crypto providers can be enabled
198   -at build time using various :command:`./configure`
199   -options. The default behavior is for
200   -:command:`./configure` to discover which crypto providers
201   -can be supported based on available external libraries, to build all
202   -available crypto providers, and to use an external provider as the
203   -default over the native one. This behavior can be changed with the
204   -following flags to :command:`./configure`:
205   -
206   -- :samp:`--enable-crypto-{x}`
207   - (where :samp:`{x}` is a supported crypto
208   - provider): enable the :samp:`{x}` crypto
209   - provider, requiring any external dependencies it needs
210   -
211   -- :samp:`--disable-crypto-{x}`:
212   - disable the :samp:`{x}` provider, and do not
213   - link against its dependencies even if they are available
214   -
215   -- :samp:`--with-default-crypto={x}`:
216   - make :samp:`{x}` the default provider even if
217   - a higher priority one is available
218   -
219   -- :samp:`--disable-implicit-crypto`: only build crypto
220   - providers that are explicitly requested with an
221   - :samp:`--enable-crypto-{x}`
222   - option
223   -
224   -For example, if you want to guarantee that the gnutls crypto provider is
225   -used and that the native provider is not built, you could run
226   -:command:`./configure --enable-crypto-gnutls
227   ---disable-implicit-crypto`.
228   -
229   -If you build qpdf using your own build system, in order for qpdf to work
230   -at all, you need to enable at least one crypto provider. The file
231   -:file:`libqpdf/qpdf/qpdf-config.h.in` provides
232   -macros ``DEFAULT_CRYPTO``, whose value must be a string naming the
233   -default crypto provider, and various symbols starting with
234   -``USE_CRYPTO_``, at least one of which has to be enabled. Additionally,
235   -you must compile the source files that implement a crypto provider. To
236   -get a list of those files, look at
237   -:file:`libqpdf/build.mk`. If you want to omit a
238   -particular crypto provider, as long as its ``USE_CRYPTO_`` symbol is
239   -undefined, you can completely ignore the source files that belong to a
240   -particular crypto provider. Additionally, crypto providers may have
241   -their own external dependencies that can be omitted if the crypto
242   -provider is not used. For example, if you are building qpdf yourself and
243   -are using an environment that does not support gnutls or openssl, you
244   -can ensure that ``USE_CRYPTO_NATIVE`` is defined, ``USE_CRYPTO_GNUTLS``
245   -is not defined, and ``DEFAULT_CRYPTO`` is defined to ``"native"``. Then
246   -you must include the source files used in the native implementation,
247   -some of which were added or renamed from earlier versions, to your
248   -build, and you can ignore
249   -:file:`QPDFCrypto_gnutls.cc`. Always consult
250   -:file:`libqpdf/build.mk` to get the list of source
251   -files you need to build.
252   -
253   -.. _ref.crypto.runtime:
254   -
255   -Runtime Crypto Provider Selection
256   -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
257   -
258   -You can use the :samp:`--show-crypto` option to
259   -:command:`qpdf` to get a list of available crypto
260   -providers. The default provider is always listed first, and the rest are
261   -listed in lexical order. Each crypto provider is listed on a line by
262   -itself with no other text, enabling the output of this command to be
263   -used easily in scripts.
264   -
265   -You can override which crypto provider is used by setting the
266   -``QPDF_CRYPTO_PROVIDER`` environment variable. There are few reasons to
267   -ever do this, but you might want to do it if you were explicitly trying
268   -to compare behavior of two different crypto providers while testing
269   -performance or reproducing a bug. It could also be useful for people who
270   -are implementing their own crypto providers.
271   -
272   -.. _ref.crypto.develop:
273   -
274   -Crypto Provider Information for Developers
275   -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
276   -
277   -If you are writing code that uses libqpdf and you want to force a
278   -certain crypto provider to be used, you can call the method
279   -``QPDFCryptoProvider::setDefaultProvider``. The argument is the name of
280   -a built-in or developer-supplied provider. To add your own crypto
281   -provider, you have to create a class derived from ``QPDFCryptoImpl`` and
282   -register it with ``QPDFCryptoProvider``. For additional information, see
283   -comments in :file:`include/qpdf/QPDFCryptoImpl.hh`.
284   -
285   -.. _ref.crypto.design:
286   -
287   -Crypto Provider Design Notes
288   -~~~~~~~~~~~~~~~~~~~~~~~~~~~~
289   -
290   -This section describes a few bits of rationale for why the crypto
291   -provider interface was set up the way it was. You don't need to know any
292   -of this information, but it's provided for the record and in case it's
293   -interesting.
294   -
295   -As a general rule, I want to avoid as much as possible including large
296   -blocks of code that are conditionally compiled such that, in most
297   -builds, some code is never built. This is dangerous because it makes it
298   -very easy for invalid code to creep in unnoticed. As such, I want it to
299   -be possible to build qpdf with all available crypto providers, and this
300   -is the way I build qpdf for local development. At the same time, if a
301   -particular packager feels that it is a security liability for qpdf to
302   -use crypto functionality from other than a library that gets
303   -considerable scrutiny for this specific purpose (such as gnutls,
304   -openssl, or nettle), then I want to give that packager the ability to
305   -completely disable qpdf's native implementation. Or if someone wants to
306   -avoid adding a dependency on one of the external crypto providers, I
307   -don't want the availability of the provider to impose additional
308   -external dependencies within that environment. Both of these are
309   -situations that I know to be true for some users of qpdf.
310   -
311   -I want registration and selection of crypto providers to be thread-safe,
312   -and I want it to work deterministically for a developer to provide their
313   -own crypto provider and be able to set it up as the default. This was
314   -the primary motivation behind requiring C++-11 as doing so enabled me to
315   -exploit the guaranteed thread safety of local block static
316   -initialization. The ``QPDFCryptoProvider`` class uses a singleton
317   -pattern with thread-safe initialization to create the singleton instance
318   -of ``QPDFCryptoProvider`` and exposes only static methods in its public
319   -interface. In this way, if a developer wants to call any
320   -``QPDFCryptoProvider`` methods, the library guarantees the
321   -``QPDFCryptoProvider`` is fully initialized and all built-in crypto
322   -providers are registered. Making ``QPDFCryptoProvider`` actually know
323   -about all the built-in providers may seem a bit sad at first, but this
324   -choice makes it extremely clear exactly what the initialization behavior
325   -is. There's no question about provider implementations automatically
326   -registering themselves in a nondeterministic order. It also means that
327   -implementations do not need to know anything about the provider
328   -interface, which makes them easier to test in isolation. Another
329   -advantage of this approach is that a developer who wants to develop
330   -their own crypto provider can do so in complete isolation from the qpdf
331   -library and, with just two calls, can make qpdf use their provider in
332   -their application. If they decided to contribute their code, plugging it
333   -into the qpdf library would require a very small change to qpdf's source
334   -code.
335   -
336   -The decision to make the crypto provider selectable at runtime was one I
337   -struggled with a little, but I decided to do it for various reasons.
338   -Allowing an end user to switch crypto providers easily could be very
339   -useful for reproducing a potential bug. If a user reports a bug that
340   -some cryptographic thing is broken, I can easily ask that person to try
341   -with the ``QPDF_CRYPTO_PROVIDER`` variable set to different values. The
342   -same could apply in the event of a performance problem. This also makes
343   -it easier for qpdf's own test suite to exercise code with different
344   -providers without having to make every program that links with qpdf
345   -aware of the possibility of multiple providers. In qpdf's continuous
346   -integration environment, the entire test suite is run for each supported
347   -crypto provider. This is made simple by being able to select the
348   -provider using an environment variable.
349   -
350   -Finally, making crypto providers selectable in this way establishes a
351   -pattern that I may follow again in the future for stream filter
352   -providers. One could imagine a future enhancement where someone could
353   -provide their own implementations for basic filters like
354   -``/FlateDecode`` or for other filters that qpdf doesn't support.
355   -Implementing the registration functions and internal storage of
356   -registered providers was also easier using C++-11's functional
357   -interfaces, which was another reason to require C++-11 at this time.
358   -
359   -.. _ref.packaging:
360   -
361   -Notes for Packagers
362   --------------------
363   -
364   -If you are packaging qpdf for an operating system distribution, here are
365   -some things you may want to keep in mind:
366   -
367   -- Starting in qpdf version 9.1.1, qpdf no longer has a runtime
368   - dependency on perl. This is because fix-qdf was rewritten in C++.
369   - However, qpdf still has a build-time dependency on perl.
370   -
371   -- Make sure you are getting the intended behavior with regard to crypto
372   - providers. Read :ref:`ref.crypto.build` for details.
373   -
374   -- Passing :samp:`--enable-show-failed-test-output` to
375   - :command:`./configure` will cause any failed test
376   - output to be written to the console. This can be very useful for
377   - seeing test failures generated by autobuilders where you can't access
378   - qtest.log after the fact.
379   -
380   -- If qpdf's build environment detects the presence of autoconf and
381   - related tools, it will check to ensure that automatically generated
382   - files are up-to-date with recorded checksums and fail if it detects a
383   - discrepancy. This feature is intended to prevent you from
384   - accidentally forgetting to regenerate automatic files after modifying
385   - their sources. If your packaging environment automatically refreshes
386   - automatic files, it can cause this check to fail. Suppress qpdf's
387   - checks by passing :samp:`--disable-check-autofiles`
388   - to :command:`./configure`. This is safe since qpdf's
389   - :command:`autogen.sh` just runs autotools in the
390   - normal way.
391   -
392   -- QPDF's :command:`make install` does not install
393   - completion files by default, but as a packager, it's good if you
394   - install them wherever your distribution expects such files to go. You
395   - can find completion files to install in the
396   - :file:`completions` directory.
397   -
398   -- Packagers are encouraged to install the source files from the
399   - :file:`examples` directory along with qpdf
400   - development packages.
401   -
402   -.. _ref.using:
403   -
404   -Running QPDF
405   -============
406   -
407   -This chapter describes how to run the qpdf program from the command
408   -line.
409   -
410   -.. _ref.invocation:
411   -
412   -Basic Invocation
413   -----------------
414   -
415   -When running qpdf, the basic invocation is as follows:
416   -
417   -::
418   -
419   - qpdf [ options ] { infilename | --empty } outfilename
420   -
421   -This converts PDF file :samp:`infilename` to PDF file
422   -:samp:`outfilename`. The output file is functionally
423   -identical to the input file but may have been structurally reorganized.
424   -Also, orphaned objects will be removed from the file. Many
425   -transformations are available as controlled by the options below. In
426   -place of :samp:`infilename`, the parameter
427   -:samp:`--empty` may be specified. This causes qpdf to
428   -use a dummy input file that contains zero pages. The only normal use
429   -case for using :samp:`--empty` would be if you were
430   -going to add pages from another source, as discussed in :ref:`ref.page-selection`.
431   -
432   -If :samp:`@filename` appears as a word anywhere in the
433   -command-line, it will be read line by line, and each line will be
434   -treated as a command-line argument. Leading and trailing whitespace is
435   -intentionally not removed from lines, which makes it possible to handle
436   -arguments that start or end with spaces. The :samp:`@-`
437   -option allows arguments to be read from standard input. This allows qpdf
438   -to be invoked with an arbitrary number of arbitrarily long arguments. It
439   -is also very useful for avoiding having to pass passwords on the command
440   -line. Note that the :samp:`@filename` can't appear in
441   -the middle of an argument, so constructs such as
442   -:samp:`--arg=@option` will not work. You would have to
443   -include the argument and its options together in the arguments file.
444   -
445   -:samp:`outfilename` does not have to be seekable, even
446   -when generating linearized files. Specifying ":samp:`-`"
447   -as :samp:`outfilename` means to write to standard
448   -output. If you want to overwrite the input file with the output, use the
449   -option :samp:`--replace-input` and omit the output file
450   -name. You can't specify the same file as both the input and the output.
451   -If you do this, qpdf will tell you about the
452   -:samp:`--replace-input` option.
453   -
454   -Most options require an output file, but some testing or inspection
455   -commands do not. These are specifically noted.
456   -
457   -.. _ref.exit-status:
458   -
459   -Exit Status
460   -~~~~~~~~~~~
461   -
462   -The exit status of :command:`qpdf` may be interpreted as
463   -follows:
464   -
465   -- ``0``: no errors or warnings were found. The file may still have
466   - problems qpdf can't detect. If
467   - :samp:`--warning-exit-0` was specified, exit status 0
468   - is used even if there are warnings.
469   -
470   -- ``2``: errors were found. qpdf was not able to fully process the
471   - file.
472   -
473   -- ``3``: qpdf encountered problems that it was able to recover from. In
474   - some cases, the resulting file may still be damaged. Note that qpdf
475   - still exits with status ``3`` if it finds warnings even when
476   - :samp:`--no-warn` is specified. With
477   - :samp:`--warning-exit-0`, warnings without errors
478   - exit with status 0 instead of 3.
479   -
480   -Note that :command:`qpdf` never exits with status ``1``.
481   -If you get an exit status of ``1``, it was something else, like the
482   -shell not being able to find or execute :command:`qpdf`.
483   -
484   -.. _ref.shell-completion:
485   -
486   -Shell Completion
487   -----------------
488   -
489   -Starting in qpdf version 8.3.0, qpdf provides its own completion support
490   -for zsh and bash. You can enable bash completion with :command:`eval
491   -$(qpdf --completion-bash)` and zsh completion with
492   -:command:`eval $(qpdf --completion-zsh)`. If
493   -:command:`qpdf` is not in your path, you should invoke it
494   -above with an absolute path. If you invoke it with a relative path, it
495   -will warn you, and the completion won't work if you're in a different
496   -directory.
497   -
498   -qpdf will use ``argv[0]`` to figure out where its executable is. This
499   -may produce unwanted results in some cases, especially if you are trying
500   -to use completion with a copy of qpdf that is built from source. You can
501   -specify a full path to the qpdf you want to use for completion in the
502   -``QPDF_EXECUTABLE`` environment variable.
503   -
504   -.. _ref.basic-options:
505   -
506   -Basic Options
507   --------------
508   -
509   -The following options are the most common ones and perform commonly
510   -needed transformations.
511   -
512   -:samp:`--help`
513   - Display command-line invocation help.
514   -
515   -:samp:`--version`
516   - Display the current version of qpdf.
517   -
518   -:samp:`--copyright`
519   - Show detailed copyright information.
520   -
521   -:samp:`--show-crypto`
522   - Show a list of available crypto providers, each on a line by itself.
523   - The default provider is always listed first. See :ref:`ref.crypto` for more information about crypto
524   - providers.
525   -
526   -:samp:`--completion-bash`
527   - Output a completion command you can eval to enable shell completion
528   - from bash.
529   -
530   -:samp:`--completion-zsh`
531   - Output a completion command you can eval to enable shell completion
532   - from zsh.
533   -
534   -:samp:`--password={password}`
535   - Specifies a password for accessing encrypted files. To read the
536   - password from a file or standard input, you can use
537   - :samp:`--password-file`, added in qpdf 10.2. Note
538   - that you can also use :samp:`@filename` or
539   - :samp:`@-` as described above to put the password in
540   - a file or pass it via standard input, but you would do so by
541   - specifying the entire
542   - :samp:`--password={password}`
543   - option in the file. Syntax such as
544   - :samp:`--password=@filename` won't work since
545   - :samp:`@filename` is not recognized in the middle of
546   - an argument.
547   -
548   -:samp:`--password-file={filename}`
549   - Reads the first line from the specified file and uses it as the
550   - password for accessing encrypted files.
551   - :samp:`{filename}`
552   - may be ``-`` to read the password from standard input. Note that, in
553   - this case, the password is echoed and there is no prompt, so use with
554   - caution.
555   -
556   -:samp:`--is-encrypted`
557   - Silently exit with status 0 if the file is encrypted or status 2 if
558   - the file is not encrypted. This is useful for shell scripts. Other
559   - options are ignored if this is given. This option is mutually
560   - exclusive with :samp:`--requires-password`. Both this
561   - option and :samp:`--requires-password` exit with
562   - status 2 for non-encrypted files.
563   -
564   -:samp:`--requires-password`
565   - Silently exit with status 0 if a password (other than as supplied) is
566   - required. Exit with status 2 if the file is not encrypted. Exit with
567   - status 3 if the file is encrypted but requires no password or the
568   - correct password has been supplied. This is useful for shell scripts.
569   - Note that any supplied password is used when opening the file. When
570   - used with a :samp:`--password` option, this option
571   - can be used to check the correctness of the password. In that case,
572   - an exit status of 3 means the file works with the supplied password.
573   - This option is mutually exclusive with
574   - :samp:`--is-encrypted`. Both this option and
575   - :samp:`--is-encrypted` exit with status 2 for
576   - non-encrypted files.
577   -
578   -:samp:`--verbose`
579   - Increase verbosity of output. For now, this just prints some
580   - indication of any file that it creates.
581   -
582   -:samp:`--progress`
583   - Indicate progress while writing files.
584   -
585   -:samp:`--no-warn`
586   - Suppress writing of warnings to stderr. If warnings were detected and
587   - suppressed, :command:`qpdf` will still exit with exit
588   - code 3. See also :samp:`--warning-exit-0`.
589   -
590   -:samp:`--warning-exit-0`
591   - If warnings are found but no errors, exit with exit code 0 instead of 3.
592   - When combined with :samp:`--no-warn`, the effect is
593   - for :command:`qpdf` to completely ignore warnings.
594   -
595   -:samp:`--linearize`
596   - Causes generation of a linearized (web-optimized) output file.
597   -
598   -:samp:`--replace-input`
599   - If specified, the output file name should be omitted. This option
600   - tells qpdf to replace the input file with the output. It does this by
601   - writing to
602   - :file:`{infilename}.~qpdf-temp#`
603   - and, when done, overwriting the input file with the temporary file.
604   - If there were any warnings, the original input is saved as
605   - :file:`{infilename}.~qpdf-orig`.
606   -
607   -:samp:`--copy-encryption=file`
608   - Encrypt the file using the same encryption parameters, including user
609   - and owner password, as the specified file. Use
610   - :samp:`--encryption-file-password` to specify a
611   - password if one is needed to open this file. Note that copying the
612   - encryption parameters from a file also copies the first half of
613   - ``/ID`` from the file since this is part of the encryption
614   - parameters.
615   -
616   -:samp:`--encryption-file-password=password`
617   - If the file specified with :samp:`--copy-encryption`
618   - requires a password, specify the password using this option. Note
619   - that only one of the user or owner password is required. Both
620   - passwords will be preserved since QPDF does not distinguish between
621   - the two passwords. It is possible to preserve encryption parameters,
622   - including the owner password, from a file even if you don't know the
623   - file's owner password.
624   -
625   -:samp:`--allow-weak-crypto`
626   - Starting with version 10.4, qpdf issues warnings when requested to
627   - create files using RC4 encryption. This option suppresses those
628   - warnings. In future versions of qpdf, qpdf will refuse to create
629   - files with weak cryptography when this flag is not given. See :ref:`ref.weak-crypto` for additional details.
630   -
631   -:samp:`--encrypt options --`
632   - Causes generation of an encrypted output file. Please see :ref:`ref.encryption-options` for details on how to specify
633   - encryption parameters.
634   -
635   -:samp:`--decrypt`
636   - Removes any encryption on the file. A password must be supplied if
637   - the file is password protected.
638   -
639   -:samp:`--password-is-hex-key`
640   - Overrides the usual computation/retrieval of the PDF file's
641   - encryption key from user/owner password with an explicit
642   - specification of the encryption key. When this option is specified,
643   - the argument to the :samp:`--password` option is
644   - interpreted as a hexadecimal-encoded key value. This only applies to
645   - the password used to open the main input file. It does not apply to
646   - other files opened by :samp:`--pages` or other
647   - options or to files being written.
648   -
649   - Most users will never have a need for this option, and no standard
650   - viewers support this mode of operation, but it can be useful for
651   - forensic or investigatory purposes. For example, if a PDF file is
652   - encrypted with an unknown password, a brute-force attack using the
653   - key directly is sometimes more efficient than one using the password.
654   - Also, if a file is heavily damaged, it may be possible to derive the
655   - encryption key and recover parts of the file using it directly. To
656   - expose the encryption key used by an encrypted file that you can open
657   - normally, use the :samp:`--show-encryption-key`
658   - option.
659   -
660   -:samp:`--suppress-password-recovery`
661   - Ordinarily, qpdf attempts to automatically compensate for passwords
662   - specified in the wrong character encoding. This option suppresses
663   - that behavior. Under normal conditions, there are no reasons to use
664   - this option. See :ref:`ref.unicode-passwords` for a
665   - discussion.
666   -
667   -:samp:`--password-mode={mode}`
668   - This option can be used to fine-tune how qpdf interprets Unicode
669   - (non-ASCII) password strings passed on the command line. With the
670   - exception of the :samp:`hex-bytes` mode, these only
671   - apply to passwords provided when encrypting files. The
672   - :samp:`hex-bytes` mode also applies to passwords
673   - specified for reading files. For additional discussion of the
674   - supported password modes and when you might want to use them, see
675   - :ref:`ref.unicode-passwords`. The following modes
676   - are supported:
677   -
678   - - :samp:`auto`: Automatically determine whether the
679   - specified password is a properly encoded Unicode (UTF-8) string,
680   - and transcode it as required by the PDF spec based on the type of
681   - encryption being applied. On Windows starting with version 8.4.0,
682   - and on almost all other modern platforms, incoming passwords will
683   - be properly encoded in UTF-8, so this is almost always what you
684   - want.
685   -
686   - - :samp:`unicode`: Tells qpdf that the incoming
687   - password is UTF-8, overriding whatever its automatic detection
688   - determines. The only difference between this mode and
689   - :samp:`auto` is that qpdf will fail with an error
690   - message if the password is not valid UTF-8 instead of falling back
691   - to :samp:`bytes` mode with a warning.
692   -
693   - - :samp:`bytes`: Interpret the password as a literal
694   - byte string. For non-Windows platforms, this is what versions of
695   - qpdf prior to 8.4.0 did. For Windows platforms, there is no way to
696   - specify strings of binary data on the command line directly, but
697   - you can use the :samp:`@filename` option to do it,
698   - in which case this option forces qpdf to respect the string of
699   - bytes as provided. This option will allow you to encrypt PDF files
700   - with passwords that will not be usable by other readers.
701   -
702   - - :samp:`hex-bytes`: Interpret the password as a
703   - hex-encoded string. This provides a way to pass binary data as a
704   - password on all platforms including Windows. As with
705   - :samp:`bytes`, this option may allow creation of
706   - files that can't be opened by other readers. This mode affects
707   - qpdf's interpretation of passwords specified for decrypting files
708   - as well as for encrypting them. It makes it possible to specify
709   - strings that are encoded in some manner other than the system's
710   - default encoding.
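As a rough illustration of the :samp:`hex-bytes` mode described above, the following Python sketch turns a password into a string of hex digits using an explicitly chosen encoding. The helper name ``hex_bytes_password`` is hypothetical and not part of qpdf, and the sketch assumes that qpdf accepts plain hex digits with no separators.

```python
def hex_bytes_password(password: str, encoding: str) -> str:
    """Encode a password with the given encoding and return it as hex digits."""
    return password.encode(encoding).hex()


# The same visible password yields different bytes in different encodings:
# "é" is one byte (0xe9) in Latin-1 but two bytes (0xc3 0xa9) in UTF-8.
latin1_arg = hex_bytes_password("café", "latin-1")
utf8_arg = hex_bytes_password("café", "utf-8")

print(latin1_arg)  # 636166e9
print(utf8_arg)    # 636166c3a9
```

The resulting string could then be supplied as the password together with :samp:`--password-mode=hex-bytes`, for example to open a file whose password was stored in Latin-1 rather than in the system's default encoding.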
711   -
712   -:samp:`--rotate=[+|-]angle[:page-range]`
713   - Apply rotation to specified pages. The
714   - :samp:`page-range` portion of the option value has
715   - the same format as page ranges in :ref:`ref.page-selection`. If the page range is omitted, the
716   - rotation is applied to all pages. The :samp:`angle`
717   - portion of the parameter may be either 0, 90, 180, or 270. If
718   - preceded by :samp:`+` or :samp:`-`,
719   - the angle is added to or subtracted from the specified pages'
720   - original rotations. This is almost always what you want. Otherwise
721   - the pages' rotations are set to the exact value, which may cause the
722   - appearances of the pages to be inconsistent, especially for scans.
723   - For example, the command :command:`qpdf in.pdf out.pdf
724   - --rotate=+90:2,4,6 --rotate=180:7-8` would rotate pages
725   - 2, 4, and 6 90 degrees clockwise from their original rotation and
726   - force the rotation of pages 7 through 8 to 180 degrees regardless of
727   - their original rotation, and the command :command:`qpdf in.pdf
728   - out.pdf --rotate=+180` would rotate all pages by 180
729   - degrees.
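The difference between relative and absolute angles in :samp:`--rotate` can be sketched in Python as follows. This is only a model of the semantics described above, not qpdf's implementation. The function name ``new_rotation`` is hypothetical, and normalizing the result to the range 0 to 270 is an assumption.

```python
def new_rotation(original: int, spec: str) -> int:
    """Return a page's rotation after applying an angle such as '+90', '-90', or '180'."""
    sign = spec[0] if spec[:1] in ("+", "-") else ""
    angle = int(spec[len(sign):])
    if angle not in (0, 90, 180, 270):
        raise ValueError(f"unsupported angle: {spec!r}")
    if sign == "+":
        # Relative: add to the page's existing rotation.
        return (original + angle) % 360
    if sign == "-":
        # Relative: subtract from the page's existing rotation.
        return (original - angle) % 360
    # Absolute: the existing rotation is ignored.
    return angle


print(new_rotation(90, "+90"))   # relative rotation of an already rotated page
print(new_rotation(90, "180"))   # absolute rotation, original value discarded
```

With a relative angle, a page that was already rotated keeps its orientation relative to the others. With an absolute angle, every selected page ends up with the same ``/Rotate`` value, which is why mixed-orientation scans can come out inconsistent.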
730   -
731   -:samp:`--keep-files-open={[yn]}`
732   - This option controls whether qpdf keeps individual files open while
733   - merging. Prior to version 8.1.0, qpdf always kept all files open, but
734   - this meant that the number of files that could be merged was limited
735   - by the operating system's open file limit. Version 8.1.0 opened files
736   - as they were referenced and closed them after each read, but this
737   - caused a major performance impact. Version 8.2.0 optimized the
738   - performance but did so in a way that, for local file systems, there
739   - was a small but unavoidable performance hit, but for networked file
740   - systems, the performance impact could be very high. Starting with
741   - version 8.2.1, the default behavior is that files are kept open if no
742   - more than 200 files are specified, but this default behavior can be
743   - explicitly overridden with the
744   - :samp:`--keep-files-open` flag. If you are merging
745   - more than 200 files but fewer than the operating system's max open
746   - files limit, you may want to use
747   - :samp:`--keep-files-open=y`, especially if working
748   - over a networked file system. If you are using a local file system
749   - where the overhead is low and you might sometimes merge more than the
750   - OS limit's number of files from a script and are not worried about a
751   - few seconds additional processing time, you may want to specify
752   - :samp:`--keep-files-open=n`. The threshold for
753   - switching may be changed from the default 200 with the
754   - :samp:`--keep-files-open-threshold` option.
755   -
756   -:samp:`--keep-files-open-threshold={count}`
757   - If specified, overrides the default value of 200 used as the
758   - threshold for qpdf deciding whether or not to keep files open. See
759   - :samp:`--keep-files-open` for details.
760   -
761   -:samp:`--pages options --`
762   - Select specific pages from one or more input files. See :ref:`ref.page-selection` for details on how to do
763   - page selection (splitting and merging).
764   -
765   -:samp:`--collate={n}`
766   - When specified, collate rather than concatenate pages from files
767   - specified with :samp:`--pages`. With a numeric
768   - argument, collate in groups of :samp:`{n}`.
769   - The default is 1. See :ref:`ref.page-selection` for additional details.
770   -
771   -:samp:`--flatten-rotation`
772   - For each page that is rotated using the ``/Rotate`` key in the page's
773   - dictionary, remove the ``/Rotate`` key and implement the identical
774   - rotation semantics by modifying the page's contents. This option can
775   - be useful to prepare files for buggy PDF applications that don't
776   - properly handle rotated pages.
777   -
778   -:samp:`--split-pages=[n]`
779   - Write each group of :samp:`n` pages to a separate
780   - output file. If :samp:`n` is not specified, create
781   - single pages. Output file names are generated as follows:
782   -
783   - - If the string ``%d`` appears in the output file name, it is
784   - replaced with a range of zero-padded page numbers starting from 1.
785   -
786   - - Otherwise, if the output file name ends in
787   - :file:`.pdf` (case insensitive), a zero-padded
788   - page range, preceded by a dash, is inserted before the file
789   - extension.
790   -
791   - - Otherwise, the file name is appended with a zero-padded page range
792   - preceded by a dash.
793   -
794   - Page ranges are a single number in the case of single-page groups or
795   - two numbers separated by a dash otherwise. For example, if
796   - :file:`infile.pdf` has 12 pages
797   -
798   - - :command:`qpdf --split-pages infile.pdf %d-out`
799   - would generate files :file:`01-out` through
800   - :file:`12-out`
801   -
802   - - :command:`qpdf --split-pages=2 infile.pdf
803   - outfile.pdf` would generate files
804   - :file:`outfile-01-02.pdf` through
805   - :file:`outfile-11-12.pdf`
806   -
807   - - :command:`qpdf --split-pages infile.pdf
808   - something.else` would generate files
809   - :file:`something.else-01` through
810   - :file:`something.else-12`
811   -
812   - Note that outlines, threads, and other global features of the
813   - original PDF file are not preserved. For each page of output, this
814   - option creates an empty PDF and copies a single page from the output
815   - into it. If you require the global data, you will have to run
816   - :command:`qpdf` with the
817   - :samp:`--pages` option once for each file. Using
818   - :samp:`--split-pages` is much faster if you don't
819   - require the global data.
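The file naming rules for :samp:`--split-pages` can be modeled with a short Python sketch. The function name ``split_output_names`` is hypothetical. Padding page numbers to the width of the total page count, and replacing every occurrence of ``%d``, are assumptions that match the examples above but are not confirmed by the text.

```python
from typing import List


def split_output_names(outfile: str, total_pages: int, n: int = 1) -> List[str]:
    """Model the output file names generated for groups of n pages."""
    width = len(str(total_pages))  # assumption: pad to the widest page number
    names = []
    for first in range(1, total_pages + 1, n):
        last = min(first + n - 1, total_pages)
        if first == last:
            page_range = f"{first:0{width}d}"
        else:
            page_range = f"{first:0{width}d}-{last:0{width}d}"
        if "%d" in outfile:
            # Rule 1: substitute the page range for %d.
            names.append(outfile.replace("%d", page_range))
        elif outfile.lower().endswith(".pdf"):
            # Rule 2: insert "-range" before the .pdf extension.
            names.append(f"{outfile[:-4]}-{page_range}{outfile[-4:]}")
        else:
            # Rule 3: append "-range" to the whole name.
            names.append(f"{outfile}-{page_range}")
    return names


print(split_output_names("outfile.pdf", 12, 2))
```

For a 12-page input this reproduces the names shown in the examples, such as :file:`outfile-01-02.pdf` through :file:`outfile-11-12.pdf`.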
820   -
821   -:samp:`--overlay options --`
822   - Overlay pages from another file onto the output pages. See :ref:`ref.overlay-underlay` for details on
823   - overlay/underlay.
824   -
825   -:samp:`--underlay options --`
826   - Underlay pages from another file beneath the output pages. See :ref:`ref.overlay-underlay` for details on
827   - overlay/underlay.
828   -
829   -Password-protected files may be opened by specifying a password. By
830   -default, qpdf will preserve any encryption data associated with a file.
831   -If :samp:`--decrypt` is specified, qpdf will attempt to
832   -remove any encryption information. If :samp:`--encrypt`
833   -is specified, qpdf will replace the document's encryption parameters
834   -with whatever is specified.
835   -
836   -Note that qpdf does not obey encryption restrictions already imposed on
837   -the file. Doing so would be meaningless since qpdf can be used to remove
838   -encryption from the file entirely. This functionality is not intended to
839   -be used for bypassing copyright restrictions or other restrictions
840   -placed on files by their producers.
841   -
842   -Prior to 8.4.0, in the case of passwords that contain characters that
843   -fall outside of 7-bit US-ASCII, qpdf left the burden of supplying
844   -properly encoded encryption and decryption passwords to the user.
845   -Starting in qpdf 8.4.0, qpdf does this automatically in most cases. For
846   -an in-depth discussion, please see :ref:`ref.unicode-passwords`. Previous versions of this manual
847   -described workarounds using the :command:`iconv` command.
848   -Such workarounds are no longer required or recommended with qpdf 8.4.0.
849   -However, for backward compatibility, qpdf attempts to detect those
850   -workarounds and do the right thing in most cases.
851   -
852   -.. _ref.encryption-options:
853   -
854   -Encryption Options
855   -------------------
856   -
857   -To change the encryption parameters of a file, use the :samp:`--encrypt` flag.
858   -The syntax is
859   -
860   -::
861   -
862   - --encrypt user-password owner-password key-length [ restrictions ] --
863   -
864   -Note that ":samp:`--`" terminates parsing of encryption
865   -flags and must be present even if no restrictions are present.
866   -
867   -Either or both of the user password and the owner password may be empty
868   -strings. Starting in qpdf 10.2, qpdf defaults to not allowing creation
869   -of PDF files with a non-empty user password, an empty owner password,
870   -and a 256-bit key since such files can be opened with no password. If
871   -you want to create such files, specify the encryption option
872   -:samp:`--allow-insecure`, as described below.
873   -
874   -The value for
875   -:samp:`{key-length}` may
876   -be 40, 128, or 256. The restriction flags are dependent upon key length.
877   -When no additional restrictions are given, the default is to be fully
878   -permissive.
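When qpdf is driven from a script, the :samp:`--encrypt` syntax above can be assembled programmatically. The following Python sketch builds the argument list and mirrors the rule about insecure 256-bit files. The function name ``encrypt_args`` and its parameters are hypothetical, and the check is a simplified model of the behavior described here rather than qpdf's own validation.

```python
from typing import List, Sequence


def encrypt_args(
    user_password: str,
    owner_password: str,
    key_length: int,
    restrictions: Sequence[str] = (),
    allow_insecure: bool = False,
) -> List[str]:
    """Build the '--encrypt ... --' portion of a qpdf command line."""
    if key_length not in (40, 128, 256):
        raise ValueError("key length must be 40, 128, or 256")
    # A non-empty user password with an empty owner password and a
    # 256-bit key produces a file that can be opened with no password.
    insecure = user_password != "" and owner_password == "" and key_length == 256
    if insecure and not allow_insecure:
        raise ValueError("insecure combination; pass allow_insecure=True to override")
    args = ["--encrypt", user_password, owner_password, str(key_length)]
    args.extend(restrictions)
    if insecure:
        args.append("--allow-insecure")
    args.append("--")  # always required, even with no restrictions
    return args


print(encrypt_args("user", "owner", 256, ["--print=none"]))
```

The returned list is meant to be combined with the program name and the input and output file names, for example by passing the complete list to ``subprocess.run``. Passing a list rather than a shell string avoids quoting problems with empty passwords.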
879   -
880   -If :samp:`{key-length}`
881   -is 40, the following restriction options are available:
882   -
883   -:samp:`--print=[yn]`
884   - Determines whether or not to allow printing.
885   -
886   -:samp:`--modify=[yn]`
887   - Determines whether or not to allow document modification.
888   -
889   -:samp:`--extract=[yn]`
890   - Determines whether or not to allow text/image extraction.
891   -
892   -:samp:`--annotate=[yn]`
893   - Determines whether or not to allow comments and form fill-in and
894   - signing.
895   -
896   -If :samp:`{key-length}`
897   -is 128, the following restriction options are available:
898   -
899   -:samp:`--accessibility=[yn]`
900   - Determines whether or not to allow accessibility to the visually
901   - impaired. The qpdf library disregards this field when AES is used or
902   - when 256-bit encryption is used. You should really never disable
903   - accessibility, but qpdf lets you do it in case you need to configure
904   - a file this way for testing purposes. The PDF spec says that
905   - conforming readers should disregard this permission and always allow
906   - accessibility.
907   -
908   -:samp:`--extract=[yn]`
909   - Determines whether or not to allow text/graphic extraction.
910   -
911   -:samp:`--assemble=[yn]`
912   - Determines whether document assembly (rotation and reordering of
913   - pages) is allowed.
914   -
915   -:samp:`--annotate=[yn]`
916   - Determines whether modifying annotations is allowed. This includes
917   - adding comments and filling in form fields. Also allows editing of
918   - form fields if :samp:`--modify-other=y` is given.
919   -
920   -:samp:`--form=[yn]`
921   - Determines whether filling form fields is allowed.
922   -
923   -:samp:`--modify-other=[yn]`
924   - Allow all document editing except those controlled separately by the
925   - :samp:`--assemble`,
926   - :samp:`--annotate`, and
927   - :samp:`--form` options.
928   -
929   -:samp:`--print={print-opt}`
930   - Controls printing access.
931   - :samp:`{print-opt}`
932   - may be one of the following:
933   -
934   - - :samp:`full`: allow full printing
935   -
936   - - :samp:`low`: allow low-resolution printing only
937   -
938   - - :samp:`none`: disallow printing
939   -
940   -:samp:`--modify={modify-opt}`
941   - Controls modify access. This way of controlling modify access has
942   - less granularity than new options added in qpdf 8.4.
943   - :samp:`{modify-opt}`
944   - may be one of the following:
945   -
946   - - :samp:`all`: allow full document modification
947   -
948   - - :samp:`annotate`: allow comment authoring, form
949   - operations, and document assembly
950   -
951   - - :samp:`form`: allow form field fill-in and signing
952   - and document assembly
953   -
954   - - :samp:`assembly`: allow document assembly only
955   -
956   - - :samp:`none`: allow no modifications
957   -
958   - Using the :samp:`--modify` option does not allow you
959   - to create certain combinations of permissions such as allowing form
960   - filling but not allowing document assembly. Starting with qpdf 8.4,
961   - you can either just use the other options to control fields
962   - individually, or you can use something like :samp:`--modify=form
963   - --assemble=n` to fine tune.
964   -
965   -:samp:`--cleartext-metadata`
966   - If specified, any metadata stream in the document will be left
967   - unencrypted even if the rest of the document is encrypted. This also
968   - forces the PDF version to be at least 1.5.
969   -
970   -:samp:`--use-aes=[yn]`
971   - If :samp:`--use-aes=y` is specified, AES encryption
972   - will be used instead of RC4 encryption. This forces the PDF version
973   - to be at least 1.6.
974   -
975   -:samp:`--allow-insecure`
976   - From qpdf 10.2, qpdf defaults to not allowing creation of PDF files
977   - where the user password is non-empty, the owner password is empty,
978   - and a 256-bit key is in use. Files created in this way are insecure
979   - since they can be opened without a password. Users would ordinarily
980   - never want to create such files. If you are using qpdf to
981   - intentionally create strange files for testing (a definitely valid use
982   - of qpdf!), this option allows you to create such insecure files.
983   -
984   -:samp:`--force-V4`
985   - Use of this option forces the ``/V`` and ``/R`` parameters in the
986   - document's encryption dictionary to be set to the value ``4``. As
987   - qpdf will automatically do this when required, there is no reason to
988   - ever use this option. It exists primarily for use in testing qpdf
989   - itself. This option also forces the PDF version to be at least 1.5.
990   -
991   -If :samp:`{key-length}`
992   -is 256, the minimum PDF version is 1.7 with extension level 8, and the
993   -AES-based encryption format used is the PDF 2.0 encryption method
994   -supported by Acrobat X. The same options are available as with 128 bits
995   -with the following exceptions:
996   -
997   -:samp:`--use-aes`
998   - This option is not available with 256-bit keys. AES is always used
999   - with 256-bit encryption keys.
1000   -
1001   -:samp:`--force-V4`
1002   - This option is not available with 256-bit keys.
1003   -
1004   -:samp:`--force-R5`
1005   - If specified, qpdf sets the minimum version to 1.7 at extension level
1006   - 3 and writes the deprecated encryption format used by Acrobat version
1007   - IX. This option should not be used in practice to generate PDF files
1008   - that will be in general use, but it can be useful to generate files
1009   - if you are trying to test proper support in another application for
1010   - PDF files encrypted in this way.
1011   -
1012   -The default for each permission option is to be fully permissive.
1013   -
1014   -.. _ref.page-selection:
1015   -
1016   -Page Selection Options
1017   -----------------------
1018   -
1019   -Starting with qpdf 3.0, it is possible to split and merge PDF files by
1020   -selecting pages from one or more input files. Whatever file is given as
1021   -the primary input file is used as the starting point, but its pages are
1022   -replaced with pages as specified.
1023   -
1024   -::
1025   -
1026   - --pages input-file [ --password=password ] [ page-range ] [ ... ] --
1027   -
1028   -Multiple input files may be specified. Each one is given as the name of
1029   -the input file, an optional password (if required to open the file), and
1030   -the range of pages. Note that ":samp:`--`" terminates
1031   -parsing of page selection flags.
1032   -
1033   -Starting with qpdf 8.4, the special input file name
1034   -":file:`.`" can be used as a shortcut for the
1035   -primary input filename.
1036   -
1037   -For each file that pages should be taken from, specify the file, a
1038   -password needed to open the file (if any), and a page range. The
1039   -password needs to be given only once per file. If any of the input files
1040   -are the same as the primary input file or the file used to copy
1041   -encryption parameters (if specified), you do not need to repeat the
1042   -password here. The same file can be repeated multiple times. If a file
1043   -that is repeated has a password, the password only has to be given the
1044   -first time. All non-page data (info, outlines, page numbers, etc.) are
1045   -taken from the primary input file. To discard these, use
1046   -:samp:`--empty` as the primary input.
1047   -
1048   -Starting with qpdf 5.0.0, it is possible to omit the page range. If qpdf
1049   -sees a value in the place where it expects a page range and that value
1050   -is not a valid range but is a valid file name, qpdf will implicitly use
1051   -the range ``1-z``, meaning that it will include all pages in the file.
1052   -This makes it possible to easily combine all pages in a set of files
1053   -with a command like :command:`qpdf --empty out.pdf --pages \*.pdf
1054   ---`.
1055   -
1056   -The page range is a set of numbers separated by commas, ranges of
1057   -numbers separated by dashes, or combinations of those. The character "z"
1058   -represents the last page. A number preceded by an "r" indicates to count
1059   -from the end, so ``r3-r1`` would be the last three pages of the
1060   -document. Pages can appear in any order. Ranges can appear with a high
1061   -number followed by a low number, which causes the pages to appear in
1062   -reverse. Numbers may be repeated in a page range. A page range may be
1063   -optionally appended with ``:even`` or ``:odd`` to indicate only the even
1064   -or odd pages in the given range. Note that even and odd refer to the
1065   -positions within the specified range, not whether the original number
1066   -is even or odd.
1067   -
1068   -Example page ranges:
1069   -
1070   -- ``1,3,5-9,15-12``: pages 1, 3, 5, 6, 7, 8, 9, 15, 14, 13, and 12 in
1071   - that order.
1072   -
1073   -- ``z-1``: all pages in the document in reverse
1074   -
1075   -- ``r3-r1``: the last three pages of the document
1076   -
1077   -- ``r1-r3``: the last three pages of the document in reverse order
1078   -
1079   -- ``1-20:even``: even pages from 2 to 20
1080   -
1081   -- ``5,7-9,12:odd``: pages 5, 8, and 12, which are the pages in odd
1082   - positions from among the original range, which represents pages 5, 7,
1083   - 8, 9, and 12.
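The page range rules above can be expressed as a small Python sketch that expands a range into a list of page numbers. This is only an illustration of the syntax as described here, not qpdf's parser. The function name ``expand_page_range`` is hypothetical, and the error handling is an assumption.

```python
from typing import List


def expand_page_range(spec: str, num_pages: int) -> List[int]:
    """Expand a page range such as '1,3,5-9,15-12' into page numbers."""
    selector = None
    if spec.endswith(":even"):
        spec, selector = spec[:-5], "even"
    elif spec.endswith(":odd"):
        spec, selector = spec[:-4], "odd"

    def page(token: str) -> int:
        if token == "z":
            number = num_pages  # "z" is the last page
        elif token.startswith("r"):
            number = num_pages + 1 - int(token[1:])  # count from the end
        else:
            number = int(token)
        if not 1 <= number <= num_pages:
            raise ValueError(f"page {token!r} is out of range")
        return number

    pages: List[int] = []
    for part in spec.split(","):
        if "-" in part:
            start, end = (page(token) for token in part.split("-", 1))
            step = 1 if start <= end else -1  # high-low ranges run in reverse
            pages.extend(range(start, end + step, step))
        else:
            pages.append(page(part))

    # :odd and :even select by position within the expanded range,
    # not by the parity of the page number itself.
    if selector == "odd":
        pages = pages[0::2]
    elif selector == "even":
        pages = pages[1::2]
    return pages


print(expand_page_range("5,7-9,12:odd", 20))  # [5, 8, 12]
```

Applying ``:odd`` or ``:even`` after the full expansion is what makes ``5,7-9,12:odd`` yield pages 5, 8, and 12 even though 8 and 12 are even-numbered pages.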
1084   -
1085   -Starting in qpdf version 8.3, you can specify the
1086   -:samp:`--collate` option. Note that this option is
1087   -specified outside of :samp:`--pages ... --`. When
1088   -:samp:`--collate` is specified, it changes the meaning
1089   -of :samp:`--pages` so that the specified files, as
1090   -modified by page ranges, are collated rather than concatenated. For
1091   -example, if you add the files :file:`odd.pdf` and
1092   -:file:`even.pdf` containing odd and even pages of a
1093   -document respectively, you could run :command:`qpdf --collate odd.pdf
1094   ---pages odd.pdf even.pdf -- all.pdf` to collate the pages.
1095   -This would pick page 1 from odd, page 1 from even, page 2 from odd, page
1096   -2 from even, etc. until all pages have been included. Any number of
1097   -files and page ranges can be specified. If any file has fewer pages,
1098   -that file is just skipped when its pages have all been included. For
1099   -example, if you ran :command:`qpdf --collate --empty --pages a.pdf
1100   -1-5 b.pdf 6-4 c.pdf r1 -- out.pdf`, you would get the
1101   -following pages in this order:
1102   -
1103   -- a.pdf page 1
1104   -
1105   -- b.pdf page 6
1106   -
1107   -- c.pdf last page
1108   -
1109   -- a.pdf page 2
1110   -
1111   -- b.pdf page 5
1112   -
1113   -- a.pdf page 3
1114   -
1115   -- b.pdf page 4
1116   -
1117   -- a.pdf page 4
1118   -
1119   -- a.pdf page 5
1120   -
1121   -Starting in qpdf version 10.2, you may specify a numeric argument to
1122   -:samp:`--collate`. With
1123   -:samp:`--collate={n}`,
1124   -pull groups of :samp:`{n}` pages from each file,
1125   -again, stopping when there are no more pages. For example, if you ran
1126   -:command:`qpdf --collate=2 --empty --pages a.pdf 1-5 b.pdf 6-4 c.pdf
1127   -r1 -- out.pdf`, you would get the following pages in this
1128   -order:
1129   -
1130   -- a.pdf page 1
1131   -
1132   -- a.pdf page 2
1133   -
1134   -- b.pdf page 6
1135   -
1136   -- b.pdf page 5
1137   -
1138   -- c.pdf last page
1139   -
1140   -- a.pdf page 3
1141   -
1142   -- a.pdf page 4
1143   -
1144   -- b.pdf page 4
1145   -
1146   -- a.pdf page 5
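Both collation orders shown above follow from one round-robin rule, which the following Python sketch models. The function name ``collate`` and the page labels are hypothetical. Each input is given as the list of pages already selected by its page range.

```python
from typing import List, Sequence


def collate(inputs: Sequence[Sequence[str]], n: int = 1) -> List[str]:
    """Take groups of n pages from each input in turn until all are used up."""
    result: List[str] = []
    longest = max((len(pages) for pages in inputs), default=0)
    offset = 0
    while offset < longest:
        for pages in inputs:
            # An exhausted input contributes an empty slice and is skipped.
            result.extend(pages[offset:offset + n])
        offset += n
    return result


a = ["a1", "a2", "a3", "a4", "a5"]  # a.pdf 1-5
b = ["b6", "b5", "b4"]              # b.pdf 6-4
c = ["c-last"]                      # c.pdf r1

print(collate([a, b, c]))       # order for --collate
print(collate([a, b, c], n=2))  # order for --collate=2
```

Running the sketch prints the pages in the same order as the two lists above, which makes it easy to preview a collation before running qpdf on real files.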
1147   -
1148   -Starting in qpdf version 8.3, when you split and merge files, any page
1149   -labels (page numbers) are preserved in the final file. It is expected
1150   -that more document features will be preserved by splitting and merging.
1151   -In the meantime, semantics of splitting and merging vary across
1152   -features. For example, the document's outlines (bookmarks) point to
1153   -actual page objects, so if you select some pages and not others,
1154   -bookmarks that point to pages that are in the output file will work, and
1155   -remaining bookmarks will not work. A future version of
1156   -:command:`qpdf` may do a better job at handling these
1157   -issues. (Note that the qpdf library already contains all of the APIs
1158   -required in order to implement this in your own application if you need
1159   -it.) In the meantime, you can always use
1160   -:samp:`--empty` as the primary input file to avoid
1161   -copying all of that from the first file. For example, to take pages 1
1162   -through 5 from :file:`infile.pdf` while preserving
1163   -all metadata associated with that file, you could use
1164   -
1165   -::
1166   -
1167   - qpdf infile.pdf --pages . 1-5 -- outfile.pdf
1168   -
1169   -If you wanted pages 1 through 5 from
1170   -:file:`infile.pdf` but you wanted the rest of the
1171   -metadata to be dropped, you could instead run
1172   -
1173   -::
1174   -
1175   - qpdf --empty --pages infile.pdf 1-5 -- outfile.pdf
1176   -
1177   -If you wanted to take pages 1 through 5 from
1178   -:file:`file1.pdf` and pages 11 through 15 from
1179   -:file:`file2.pdf` in reverse, taking document-level
1180   -metadata from :file:`file2.pdf`, you would run
1181   -
1182   -::
1183   -
1184   - qpdf file2.pdf --pages file1.pdf 1-5 . 15-11 -- outfile.pdf
1185   -
1186   -If, for some reason, you wanted to take the first page of an encrypted
1187   -file called :file:`encrypted.pdf` with password
1188   -``pass`` and repeat it twice in an output file, and if you wanted to
1189   -drop document-level metadata but preserve encryption, you would use
1190   -
1191   -::
1192   -
1193   - qpdf --empty --copy-encryption=encrypted.pdf --encryption-file-password=pass
1194   - --pages encrypted.pdf --password=pass 1 ./encrypted.pdf --password=pass 1 --
1195   - outfile.pdf
1196   -
1197   -Note that we had to specify the password all three times because giving
1198   -a password as :samp:`--encryption-file-password` doesn't
1199   -count for page selection, and as far as qpdf is concerned,
1200   -:file:`encrypted.pdf` and
1201   -:file:`./encrypted.pdf` are separate files. These
1202   -are all corner cases that most users should hopefully never have to be
1203   -bothered with.
1204   -
1205   -Prior to version 8.4, it was not possible to specify the same page from
1206   -the same file directly more than once, and the workaround of specifying
1207   -the same file in more than one way was required. Version 8.4 removes
1208   -this limitation, but there is still a valid use case. When you specify
1209   -the same page from the same file more than once, qpdf will share objects
1210   -between the pages. If you are going to do further manipulation on the
1211   -file and need the two instances of the same original page to be deep
1212   -copies, then you can specify the file in two different ways. For example
1213   -:command:`qpdf in.pdf --pages . 1 ./in.pdf 1 -- out.pdf`
1214   -would create a file with two copies of the first page of the input, and
1215   -the two copies would not share any objects in common. This includes fonts,
1216   -images, and anything else the page references.
1217   -
1218   -.. _ref.overlay-underlay:
1219   -
1220   -Overlay and Underlay Options
1221   -----------------------------
1222   -
1223   -Starting with qpdf 8.4, it is possible to overlay or underlay pages from
1224   -other files onto the output generated by qpdf. Specify overlay or
1225   -underlay as follows:
1226   -
1227   -::
1228   -
1229   - { --overlay | --underlay } file [ options ] --
1230   -
1231   -Overlay and underlay options are processed late, so they can be combined
1232   -with other options like merging and will apply to the final output. The
1233   -:samp:`--overlay` and :samp:`--underlay`
1234   -options work the same way, except underlay pages are drawn underneath
1235   -the page to which they are applied, possibly obscured by the original
1236   -page, and overlay pages are drawn on top of the page to which they are
1237   -applied, possibly obscuring the page. You can combine overlay and
1238   -underlay.
1239   -
1240   -The default behavior of overlay and underlay is that pages are taken
1241   -from the overlay/underlay file in sequence and applied to corresponding
1242   -pages in the output until there are no more output pages. If the overlay
1243   -or underlay file runs out of pages, remaining output pages are left
1244   -alone. This behavior can be modified by options, which are provided
1245   -between the :samp:`--overlay` or
1246   -:samp:`--underlay` flag and the
1247   -:samp:`--` option. The following options are supported:
1248   -
1249   -- :samp:`--password=password`: supply a password if the
1250   - overlay/underlay file is encrypted.
1251   -
1252   -- :samp:`--to=page-range`: a range of pages in the same
1253   - format described in :ref:`ref.page-selection`
1254   - indicates which pages in the output should have the overlay/underlay
1255   - applied. If not specified, overlay/underlay are applied to all pages.
1256   -
1257   -- :samp:`--from=[page-range]`: a range of pages that
1258   - specifies which pages in the overlay/underlay file will be used for
1259   - overlay or underlay. If not specified, all pages will be used. This
1260   - can be explicitly specified to be empty if
1261   - :samp:`--repeat` is used.
1262   -
1263   -- :samp:`--repeat=page-range`: an optional range of
1264   - pages that specifies which pages in the overlay/underlay file will be
1265   - repeated after the "from" pages are used up. If you want to repeat a
1266   - range of pages starting at the beginning, you can explicitly use
1267   - :samp:`--from=`.
1268   -
1269   -Here are some examples.
1270   -
1271   -- :command:`--overlay o.pdf --to=1-5 --from=1-3 --repeat=4
1272   - --`: overlay the first three pages from file
1273   - :file:`o.pdf` onto the first three pages of the
1274   - output, then overlay page 4 from :file:`o.pdf`
1275   - onto pages 4 and 5 of the output. Leave remaining output pages
1276   - untouched.
1277   -
1278   -- :command:`--underlay footer.pdf --from= --repeat=1,2
1279   - --`: Underlay page 1 of
1280   - :file:`footer.pdf` on all odd output pages, and
1281   - underlay page 2 of :file:`footer.pdf` on all even
1282   - output pages.
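
The way :samp:`--to`, :samp:`--from`, and :samp:`--repeat` interact can be sketched in a few lines of Python. This is only an illustration of the behavior described above, not qpdf's implementation. The function name ``map_overlay_pages`` is hypothetical, and the page ranges are assumed to be already expanded into lists of page numbers.

```python
from typing import Dict, List


def map_overlay_pages(
    to_pages: List[int],
    from_pages: List[int],
    repeat_pages: List[int],
) -> Dict[int, int]:
    """Map each output page to the overlay/underlay page applied to it.

    Hypothetical helper: pages are taken from from_pages in sequence,
    then repeat_pages are cycled. Output pages with no entry in the
    result are left alone.
    """
    mapping: Dict[int, int] = {}
    for index, out_page in enumerate(to_pages):
        if index < len(from_pages):
            # Still consuming the "from" pages in order.
            mapping[out_page] = from_pages[index]
        elif repeat_pages:
            # "from" pages are used up: cycle through the "repeat" pages.
            position = (index - len(from_pages)) % len(repeat_pages)
            mapping[out_page] = repeat_pages[position]
        else:
            # Overlay file ran out of pages and nothing repeats.
            break
    return mapping


# First example above: --to=1-5 --from=1-3 --repeat=4
first = map_overlay_pages([1, 2, 3, 4, 5], [1, 2, 3], [4])

# Second example above, for a six-page output: --from= --repeat=1,2
second = map_overlay_pages([1, 2, 3, 4, 5, 6], [], [1, 2])
```

In the first case, output pages 4 and 5 both map to overlay page 4. In the second, odd output pages map to page 1 and even output pages map to page 2.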
1283   -
1284   -.. _ref.attachments:
1285   -
1286   -Embedded Files/Attachments Options
1287   -----------------------------------
1288   -
1289   -Starting with qpdf 10.2, you can work with file attachments in PDF files
1290   -from the command line. The following options are available:
1291   -
1292   -:samp:`--list-attachments`
1293   - Show the "key" and stream number for embedded files. With
1294   - :samp:`--verbose`, additional information, including
1295   - preferred file name, description, dates, and more, is also displayed.
1296   - The key is usually but not always equal to the file name, and is
1297   - needed by some of the other options.
1298   -
1299   -:samp:`--show-attachment={key}`
1300   - Write the contents of the specified attachment to standard output as
1301   - binary data. The key should match one of the keys shown by
1302   - :samp:`--list-attachments`. If specified multiple
1303   - times, only the last attachment will be shown.
1304   -
1305   -:samp:`--add-attachment {file} {options} --`
1306   - Add or replace an attachment with the contents of
1307   - :samp:`{file}`. This may be specified more
1308   - than once. The following additional options may appear before the
1309   - ``--`` that ends this option:
1310   -
1311   - :samp:`--key={key}`
1312   - The key to use to register the attachment in the embedded files
1313   - table. Defaults to the last path element of
1314   - :samp:`{file}`.
1315   -
1316   - :samp:`--filename={name}`
1317   - The file name to be used for the attachment. This is what is
1318   - usually displayed to the user and is the name most graphical PDF
1319   - viewers will use when saving a file. It defaults to the last path
1320   - element of :samp:`{file}`.
1321   -
1322   - :samp:`--creationdate={date}`
1323   - The attachment's creation date in PDF format; defaults to the
1324   - current time. The date format is explained below.
1325   -
1326   - :samp:`--moddate={date}`
1327   - The attachment's modification date in PDF format; defaults to the
1328   - current time. The date format is explained below.
1329   -
1330   - :samp:`--mimetype={type/subtype}`
1331   - The mime type for the attachment, e.g. ``text/plain`` or
1332   - ``application/pdf``. Note that the mimetype appears in a field
1333   - called ``/Subtype`` in the PDF but actually includes the full type
1334   - and subtype of the mime type.
1335   -
1336   - :samp:`--description={"text"}`
1337   - Descriptive text for the attachment, displayed by some PDF
1338   - viewers.
1339   -
1340   - :samp:`--replace`
1341   - Indicates that any existing attachment with the same key should be
1342   - replaced by the new attachment. Otherwise,
1343   - :command:`qpdf` gives an error if an attachment
1344   - with that key is already present.
1345   -
1346   -:samp:`--remove-attachment={key}`
1347   - Remove the specified attachment. This doesn't only remove the
1348   - attachment from the embedded files table but also clears out the file
1349   - specification. That means that any potential internal links to the
1350   - attachment will be broken. This option may be specified multiple
1351   - times. Run with :samp:`--verbose` to see status of
1352   - the removal.
1353   -
1354   -:samp:`--copy-attachments-from {file} {options} --`
1355   - Copy attachments from another file. This may be specified more than
1356   - once. The following additional options may appear before the ``--``
1357   - that ends this option:
1358   -
1359   - :samp:`--password={password}`
1360   - If required, the password needed to open
1361   - :samp:`{file}`.
1362   -
1363   - :samp:`--prefix={prefix}`
1364   - Only required if the file from which attachments are being copied
1365   - has attachments with keys that conflict with attachments already
1366   - in the file. In this case, the specified prefix will be prepended
1367   - to each key. This affects only the key in the embedded files
1368   - table, not the file name. The PDF specification doesn't preclude
1369   - multiple attachments having the same file name.
1370   -
1371   -When a date is required, the date should conform to the PDF date format
1372   -specification, which is
1373   -``D:``\ :samp:`{yyyymmddhhmmss<z>}`, where
1374   -:samp:`{<z>}` is either ``Z`` for UTC or a
1375   -timezone offset in the form :samp:`{-hh'mm'}` or
1376   -:samp:`{+hh'mm'}`. Examples:
1377   -``D:20210207161528-05'00'``, ``D:20210207211528Z``.
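
If you generate these dates from a script, a small helper can produce the format described above. The following is a minimal sketch using only the Python standard library. The function name ``pdf_date`` is an assumption for illustration and is not part of qpdf.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(moment: datetime) -> str:
    """Format a timezone-aware datetime as a PDF date string.

    Hypothetical helper: emits D:yyyymmddhhmmss followed by Z for UTC
    or an offset in the form +hh'mm' or -hh'mm'.
    """
    offset = moment.utcoffset()
    if offset is None:
        raise ValueError("a timezone-aware datetime is required")
    base = moment.strftime("D:%Y%m%d%H%M%S")
    if offset == timedelta(0):
        return base + "Z"
    total_minutes = int(offset.total_seconds()) // 60
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{base}{sign}{hours:02d}'{minutes:02d}'"


# The two example dates given above.
eastern = datetime(2021, 2, 7, 16, 15, 28, tzinfo=timezone(timedelta(hours=-5)))
utc = datetime(2021, 2, 7, 21, 15, 28, tzinfo=timezone.utc)
```

The resulting string can be passed as the value of :samp:`--creationdate` or :samp:`--moddate`.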
1378   -
1379   -.. _ref.advanced-parsing:
1380   -
1381   -Advanced Parsing Options
1382   -------------------------
1383   -
1384   -These options control aspects of how qpdf reads PDF files. Mostly these
1385   -are of use to people who are working with damaged files. There is little
1386   -reason to use these options unless you are trying to solve specific
1387   -problems. The following options are available:
1388   -
1389   -:samp:`--suppress-recovery`
1390   - Prevents qpdf from attempting to recover damaged files.
1391   -
1392   -:samp:`--ignore-xref-streams`
1393   - Tells qpdf to ignore any cross-reference streams.
1394   -
1395   -Ordinarily, qpdf will attempt to recover from certain types of errors in
1396   -PDF files. These include errors in the cross-reference table, certain
1397   -types of object numbering errors, and certain types of stream length
1398   -errors. Sometimes, qpdf may think it has recovered but may not have
1399   -actually recovered, so care should be taken when using this option as
1400   -some data loss is possible. The
1401   -:samp:`--suppress-recovery` option will prevent qpdf
1402   -from attempting recovery. In this case, it will fail on the first error
1403   -that it encounters.
1404   -
1405   -Ordinarily, qpdf reads cross-reference streams when they are present in
1406   -a PDF file. If :samp:`--ignore-xref-streams` is
1407   -specified, qpdf will ignore any cross-reference streams for hybrid PDF
1408   -files. The purpose of hybrid files is to make some content available to
1409   -viewers that are not aware of cross-reference streams. It is almost
1410   -never desirable to ignore them. The only time when you might want to use
1411   -this feature is if you are testing creation of hybrid PDF files and wish
1412   -to see how a PDF consumer that doesn't understand object and
1413   -cross-reference streams would interpret such a file.
1414   -
1415   -.. _ref.advanced-transformation:
1416   -
1417   -Advanced Transformation Options
1418   --------------------------------
1419   -
1420   -These transformation options control fine points of how qpdf creates the
1421   -output file. Mostly these are of use only to people who are very
1422   -familiar with the PDF file format or who are PDF developers. The
1423   -following options are available:
1424   -
1425   -:samp:`--compress-streams={[yn]}`
1426   - By default, or with :samp:`--compress-streams=y`,
1427   - qpdf will compress any stream with no other filters applied to it
1428   - with the ``/FlateDecode`` filter when it writes it. To suppress this
1429   - behavior and preserve uncompressed streams as uncompressed, use
1430   - :samp:`--compress-streams=n`.
1431   -
1432   -:samp:`--decode-level={option}`
1433   - Controls which streams qpdf tries to decode. The default is
1434   - :samp:`generalized`. The following options are
1435   - available:
1436   -
1437   - - :samp:`none`: do not attempt to decode any streams
1438   -
1439   - - :samp:`generalized`: decode streams filtered with
1440   - supported generalized filters: ``/LZWDecode``, ``/FlateDecode``,
1441   - ``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define generalized
1442   - filters as those to be used for general-purpose compression or
1443   - encoding, as opposed to filters specifically designed for image
1444   - data. Note that, by default, streams already compressed with
1445   - ``/FlateDecode`` are not uncompressed and recompressed unless you
1446   - also specify :samp:`--recompress-flate`.
1447   -
1448   - - :samp:`specialized`: in addition to generalized,
1449   - decode streams with supported non-lossy specialized filters;
1450   - currently this is just ``/RunLengthDecode``
1451   -
1452   - - :samp:`all`: in addition to generalized and
1453   - specialized, decode streams with supported lossy filters;
1454   - currently this is just ``/DCTDecode`` (JPEG)
1455   -
1456   -:samp:`--stream-data={option}`
1457   - Controls transformation of stream data. This option predates the
1458   - :samp:`--compress-streams` and
1459   - :samp:`--decode-level` options. Those options can be
1460   - used to achieve the same effect with more control. The value of
1461   - :samp:`{option}` may
1462   - be one of the following:
1463   -
1464   - - :samp:`compress`: recompress stream data when
1465   - possible (default); equivalent to
1466   - :samp:`--compress-streams=y`
1467   - :samp:`--decode-level=generalized`. Does not
1468   - recompress streams already compressed with ``/FlateDecode`` unless
1469   - :samp:`--recompress-flate` is also specified.
1470   -
1471   - - :samp:`preserve`: leave all stream data as is;
1472   - equivalent to :samp:`--compress-streams=n`
1473   - :samp:`--decode-level=none`
1474   -
1475   - - :samp:`uncompress`: uncompress stream data
1476   - compressed with generalized filters when possible; equivalent to
1477   - :samp:`--compress-streams=n`
1478   - :samp:`--decode-level=generalized`
1479   -
1480   -:samp:`--recompress-flate`
1481   - By default, streams already compressed with ``/FlateDecode`` are left
1482   - alone rather than being uncompressed and recompressed. This option
1483   - causes qpdf to uncompress and recompress the streams. There is a
1484   - significant performance cost to using this option, but you probably
1485   - want to use it if you specify
1486   - :samp:`--compression-level`.
1487   -
1488   -:samp:`--compression-level={level}`
1489   - When writing new streams that are compressed with ``/FlateDecode``,
1490   - use the specified compression level. The value of
1491   - :samp:`level` should be a number from 1 to 9 and is
1492   - passed directly to zlib, which implements deflate compression. Note
1493   - that qpdf doesn't uncompress and recompress streams by default. To
1494   - have this option apply to already compressed streams, you should also
1495   - specify :samp:`--recompress-flate`. If your goal is
1496   - to shrink the size of PDF files, you should also use
1497   - :samp:`--object-streams=generate`.
1498   -
1499   -:samp:`--normalize-content=[yn]`
1500   - Enables or disables normalization of content streams. Content
1501   - normalization is enabled by default in QDF mode. Please see :ref:`ref.qdf` for additional discussion of QDF mode.
1502   -
1503   -:samp:`--object-streams={mode}`
1504   - Controls handling of object streams. The value of
1505   - :samp:`{mode}` may be
1506   - one of the following:
1507   -
1508   - - :samp:`preserve`: preserve original object streams
1509   - (default)
1510   -
1511   - - :samp:`disable`: don't write any object streams
1512   -
1513   - - :samp:`generate`: use object streams wherever
1514   - possible
1515   -
1516   -:samp:`--preserve-unreferenced`
1517   - Tells qpdf to preserve objects that are not referenced when writing
1518   - the file. Ordinarily any object that is not referenced in a traversal
1519   - of the document from the trailer dictionary will be discarded. This
1520   - may be useful in working with some damaged files or inspecting files
1521   - with known unreferenced objects.
1522   -
1523   - This flag is ignored for linearized files. For other files, it also
1524   - causes objects in the new file to be written in order by object ID
1525   - from the original file. This does not mean that object numbers will
1526   - be the same since qpdf may create stream lengths as direct or
1527   - indirect differently from the original file, and the original file
1528   - may have gaps in its numbering.
1529   -
1530   - See also :samp:`--preserve-unreferenced-resources`,
1531   - which does something completely different.
1532   -
1533   -:samp:`--remove-unreferenced-resources={option}`
1534   - The :samp:`{option}` may be ``auto``,
1535   - ``yes``, or ``no``. The default is ``auto``.
1536   -
1537   - Starting with qpdf 8.1, when splitting pages, qpdf is able to attempt
1538   - to remove images and fonts that are not used by a page even if they
1539   - are referenced in the page's resources dictionary. When shared
1540   - resources are in use, this behavior can greatly reduce the file sizes
1541   - of split pages, but the analysis is very slow. In versions from 8.1
1542   - through 9.1.1, qpdf did this analysis by default. Starting in qpdf
1543   - 10.0.0, if ``auto`` is used, qpdf does a quick analysis of the file
1544   - to determine whether the file is likely to have unreferenced objects
1545   - on pages, a pattern that frequently occurs when resource dictionaries
1546   - are shared across multiple pages and rarely occurs otherwise. If it
1547   - discovers this pattern, then it will attempt to remove unreferenced
1548   - resources. Usually this means you get the slower splitting speed only
1549   - when it's actually going to create smaller files. You can suppress
1550   - removal of unreferenced resources altogether by specifying ``no`` or
1551   - force it to do the full algorithm by specifying ``yes``.
1552   -
1553   - Other than cases in which you don't care about file size and care a
1554   - lot about runtime, there are few reasons to use this option,
1555   - especially now that ``auto`` mode is supported. One reason to use
1556   - this is if you suspect that qpdf is removing resources it shouldn't
1557   - be removing. If you encounter that case, please report it as a bug at
1558   - https://github.com/qpdf/qpdf/issues/.
1559   -
1560   -:samp:`--preserve-unreferenced-resources`
1561   - This is a synonym for
1562   - :samp:`--remove-unreferenced-resources=no`.
1563   -
1564   - See also :samp:`--preserve-unreferenced`, which does
1565   - something completely different.
1566   -
1567   -:samp:`--newline-before-endstream`
1568   - Tells qpdf to insert a newline before the ``endstream`` keyword, not
1569   - counted in the length, after any stream content even if the last
1570   - character of the stream was a newline. This may result in two
1571   - newlines in some cases. This is a requirement of PDF/A. While qpdf
1572   - doesn't specifically know how to generate PDF/A-compliant PDFs, this
1573   - at least prevents it from removing compliance on already compliant
1574   - files.
1575   -
1576   -:samp:`--linearize-pass1={file}`
1577   - Write the first pass of linearization to the named file. The
1578   - resulting file is not a valid PDF file. This option is useful only
1579   - for debugging ``QPDFWriter``'s linearization code. When qpdf
1580   - linearizes files, it writes the file in two passes, using the first
1581   - pass to calculate sizes and offsets that are required for hint tables
1582   - and the linearization dictionary. Ordinarily, the first pass is
1583   - discarded. This option enables it to be captured.
1584   -
1585   -:samp:`--coalesce-contents`
1586   - When a page's contents are split across multiple streams, this option
1587   - causes qpdf to combine them into a single stream. Use of this option
1588   - is never necessary for ordinary usage, but it can help when working
1589   - with some files in some cases. For example, this can also be combined
1590   - with QDF mode or content normalization to make it easier to look at
1591   - all of a page's contents at once.
1592   -
1593   -:samp:`--flatten-annotations={option}`
1594   - This option collapses annotations into the pages' contents with
1595   - special handling for form fields. Ordinarily, an annotation is
1596   - rendered separately and on top of the page. Combining annotations
1597   - into the page's contents effectively freezes the placement of the
1598   - annotations, making them look right after various page
1599   - transformations. The library functionality backing this option was
1600   - added for the benefit of programs that want to create *n-up* page
1601   - layouts and other similar things that don't work well with
1602   - annotations. The :samp:`{option}` parameter
1603   - may be any of the following:
1604   -
1605   - - :samp:`all`: include all annotations that are not
1606   - marked invisible or hidden
1607   -
1608   - - :samp:`print`: only include annotations that
1609   - indicate that they should appear when the page is printed
1610   -
1611   - - :samp:`screen`: omit annotations that indicate
1612   - they should not appear on the screen
1613   -
1614   - Note that form fields are special because the annotations that are
1615   - used to render filled-in form fields may become out of date from the
1616   - fields' values if the form is filled in by a program that doesn't
1617   - know how to update the appearances. If qpdf detects this case, its
1618   - default behavior is not to flatten those annotations because doing so
1619   - would cause the value of the form field to be lost. This gives you a
1620   - chance to go back and resave the form with a program that knows how
1621   - to generate appearances. QPDF itself can generate appearances with
1622   - some limitations. See the
1623   - :samp:`--generate-appearances` option below.
1624   -
1625   -:samp:`--generate-appearances`
1626   - If a file contains interactive form fields and indicates that the
1627   - appearances are out of date with the values of the form, this flag
1628   - will regenerate appearances, subject to a few limitations. Note that
1629   - there is not usually a reason to do this, but it can be necessary
1630   - before using the :samp:`--flatten-annotations`
1631   - option. Most of these limitations are not a problem with well-behaved PDF files.
1632   - The limitations are as follows:
1633   -
1634   - - Radio button and checkbox appearances use the pre-set values in
1635   - the PDF file. QPDF just makes sure that the correct appearance is
1636   - displayed based on the value of the field. This is fine for PDF
1637   - files that create their forms properly. Some PDF writers save
1638   - appearances for fields when they change, which could cause some
1639   - controls to have inconsistent appearances.
1640   -
1641   - - For text fields and list boxes, any characters that fall outside
1642   - of US-ASCII or, if detected, "Windows ANSI" or "Mac Roman"
1643   - encoding, will be replaced by the ``?`` character.
1644   -
1645   - - Quadding is ignored. Quadding is used to specify whether the
1646   - contents of a field should be left, center, or right aligned with
1647   - the field.
1648   -
1649   - - Rich text, multi-line, and other more elaborate formatting
1650   - directives are ignored.
1651   -
1652   - - There is no support for multi-select fields or signature fields.
1653   -
1654   - If qpdf doesn't do a good enough job with your form, use an external
1655   - application to save your filled-in form before processing it with
1656   - qpdf.
1657   -
1658   -:samp:`--optimize-images`
1659   - This flag causes qpdf to recompress all images that are not
1660   - compressed with DCT (JPEG) using DCT compression as long as doing so
1661   - decreases the size in bytes of the image data and the image does not
1662   - fall below minimum specified dimensions. Useful information is
1663   - provided when used in combination with
1664   - :samp:`--verbose`. See also the
1665   - :samp:`--oi-min-width`,
1666   - :samp:`--oi-min-height`, and
1667   - :samp:`--oi-min-area` options. By default, starting
1668   - in qpdf 8.4, inline images are converted to regular images and
1669   - optimized as well. Use :samp:`--keep-inline-images`
1670   - to prevent inline images from being included.
1671   -
1672   -:samp:`--oi-min-width={width}`
1673   - Avoid optimizing images whose width is below the specified amount. If
1674   - omitted, the default is 128 pixels. Use 0 for no minimum.
1675   -
1676   -:samp:`--oi-min-height={height}`
1677   - Avoid optimizing images whose height is below the specified amount.
1678   - If omitted, the default is 128 pixels. Use 0 for no minimum.
1679   -
1680   -:samp:`--oi-min-area={area-in-pixels}`
1681   - Avoid optimizing images whose pixel count (width × height) is below
1682   - the specified amount. If omitted, the default is 16,384 pixels. Use 0
1683   - for no minimum.
1684   -
1685   -:samp:`--externalize-inline-images`
1686   - Convert inline images to regular images. By default, images whose
1687   - data is at least 1,024 bytes are converted when this option is
1688   - selected. Use :samp:`--ii-min-bytes` to change the
1689   - size threshold. This option is implicitly selected when
1690   - :samp:`--optimize-images` is selected. Use
1691   - :samp:`--keep-inline-images` to exclude inline images
1692   - from image optimization.
1693   -
1694   -:samp:`--ii-min-bytes={bytes}`
1695   - Avoid converting inline images whose size is below the specified
1696   - minimum size to regular images. If omitted, the default is 1,024
1697   - bytes. Use 0 for no minimum.
1698   -
1699   -:samp:`--keep-inline-images`
1700   - Prevent inline images from being included in image optimization. This
1701   - option has no effect when :samp:`--optimize-images`
1702   - is not specified.
1703   -
1704   -:samp:`--remove-page-labels`
1705   - Remove page labels from the output file.
1706   -
1707   -:samp:`--qdf`
1708   - Turns on QDF mode. For additional information on QDF, please see :ref:`ref.qdf`. Note that :samp:`--linearize`
1709   - disables QDF mode.
1710   -
1711   -:samp:`--min-version={version}`
1712   - Forces the PDF version of the output file to be at least
1713   - :samp:`{version}`. In other words, if the
1714   - input file has a lower version than the specified version, the
1715   - specified version will be used. If the input file has a higher
1716   - version, the input file's original version will be used. It is seldom
1717   - necessary to use this option since qpdf will automatically increase
1718   - the version as needed when adding features that require newer PDF
1719   - readers.
1720   -
1721   - The version number may be expressed in the form
1722   - :samp:`{major.minor.extension-level}`, in
1723   - which case the version is interpreted as
1724   - :samp:`{major.minor}` at extension level
1725   - :samp:`{extension-level}`. For example,
1726   - version ``1.7.8`` represents version 1.7 at extension level 8. Note
1727   - that minimal syntax checking is done on the command line.
1728   -
1729   -:samp:`--force-version={version}`
1730   - This option forces the PDF version to be the exact version specified
1731   - *even when the file may have content that is not supported in that
1732   - version*. The version number is interpreted in the same way as with
1733   - :samp:`--min-version` so that extension levels can be
1734   - set. In some cases, forcing the output file's PDF version to be lower
1735   - than that of the input file will cause qpdf to disable certain
1736   - features of the document. Specifically, 256-bit keys are disabled if
1737   - the version is less than 1.7 with extension level 8 (except R5 is
1738   - disabled if less than 1.7 with extension level 3), AES encryption is
1739   - disabled if the version is less than 1.6, cleartext metadata and
1740   - object streams are disabled if less than 1.5, 128-bit encryption keys
1741   - are disabled if less than 1.4, and all encryption is disabled if less
1742   - than 1.3. Even with these precautions, qpdf won't be able to do
1743   - things like eliminate use of newer image compression schemes,
1744   - transparency groups, or other features that may have been added in
1745   - more recent versions of PDF.
1746   -
1747   - As a general rule, with the exception of big structural things like
1748   - the use of object streams or AES encryption, PDF viewers are supposed
1749   - to ignore features in files that they don't support from newer
1750   - versions. This means that forcing the version to a lower version may
1751   - make it possible to open your PDF file with an older version, though
1752   - bear in mind that some of the original document's functionality may
1753   - be lost.
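
The version rules for :samp:`--min-version` and :samp:`--force-version` can be summarized in a short Python sketch. This is an illustration of the text above, not qpdf's own code. The names ``parse_version``, ``apply_min_version``, and ``disabled_features`` are hypothetical, and treating an omitted extension level as 0 is an assumption made for comparison purposes.

```python
from typing import List, Tuple

Version = Tuple[int, int, int]  # (major, minor, extension level)


def parse_version(text: str) -> Version:
    """Parse major.minor or major.minor.extension-level."""
    parts = text.split(".")
    if len(parts) not in (2, 3):
        raise ValueError(f"unrecognized version: {text!r}")
    numbers = [int(part) for part in parts]
    # Assumption: a missing extension level compares as 0.
    extension = numbers[2] if len(numbers) == 3 else 0
    return (numbers[0], numbers[1], extension)


def apply_min_version(input_version: str, min_version: str) -> Version:
    """Return the higher of the input version and the requested minimum."""
    return max(parse_version(input_version), parse_version(min_version))


# Thresholds taken from the --force-version description above: a feature
# is disabled when the forced version is lower than its threshold.
FORCE_VERSION_THRESHOLDS = [
    ((1, 7, 8), "256-bit keys (other than R5)"),
    ((1, 7, 3), "256-bit keys (R5)"),
    ((1, 6, 0), "AES encryption"),
    ((1, 5, 0), "cleartext metadata"),
    ((1, 5, 0), "object streams"),
    ((1, 4, 0), "128-bit encryption keys"),
    ((1, 3, 0), "all encryption"),
]


def disabled_features(forced_version: str) -> List[str]:
    """List the features described above as disabled at a forced version."""
    version = parse_version(forced_version)
    return [
        name
        for threshold, name in FORCE_VERSION_THRESHOLDS
        if version < threshold
    ]
```

For example, ``apply_min_version("1.4", "1.6")`` yields version 1.6 at extension level 0, and ``disabled_features("1.5")`` lists both kinds of 256-bit keys and AES encryption. This sketch does not model qpdf raising the version automatically when it adds features.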
1754   -
1755   -By default, when a stream is encoded using non-lossy filters that qpdf
1756   -understands and is not already compressed using a good compression
1757   -scheme, qpdf will uncompress and recompress streams. Assuming proper
1758   -filter implementations, this is safe and generally results in smaller files.
1759   -This behavior may also be explicitly requested with
1760   -:samp:`--stream-data=compress`.
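
For scripts that still use :samp:`--stream-data`, the equivalences listed earlier can be captured in a small lookup table. The sketch below only restates that list in Python. The names ``STREAM_DATA_EQUIVALENTS`` and ``expand_stream_data`` are hypothetical.

```python
from typing import Dict, List

# Each --stream-data value and the newer options it is equivalent to,
# as listed in the description of --stream-data above.
STREAM_DATA_EQUIVALENTS: Dict[str, List[str]] = {
    "compress": ["--compress-streams=y", "--decode-level=generalized"],
    "preserve": ["--compress-streams=n", "--decode-level=none"],
    "uncompress": ["--compress-streams=n", "--decode-level=generalized"],
}


def expand_stream_data(option: str) -> List[str]:
    """Return the newer options equivalent to --stream-data=option."""
    if option not in STREAM_DATA_EQUIVALENTS:
        raise ValueError(f"unknown --stream-data value: {option!r}")
    # Return a copy so callers cannot modify the table.
    return list(STREAM_DATA_EQUIVALENTS[option])
```

Spelling out the two newer options makes it easier to change one of them independently, for example to raise the decode level while leaving compression off.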
1761   -
1762   -When :samp:`--normalize-content=y` is specified, qpdf
1763   -will attempt to normalize whitespace and newlines in page content
1764   -streams. This is generally safe but could, in some cases, cause damage
1765   -to the content streams. This option is intended for people who wish to
1766   -study PDF content streams or to debug PDF content. You should not use
1767   -this for "production" PDF files.
1768   -
1769   -When normalizing content, if qpdf runs into any lexical errors, it will
1770   -print a warning indicating that content may be damaged. The only
1771   -situation in which qpdf is known to cause damage during content
1772   -normalization is when a page's contents are split across multiple
1773   -streams and streams are split in the middle of a lexical token such as a
1774   -string, name, or inline image. Note that files that do this are invalid
1775   -since the PDF specification states that content streams are not to be
1776   -split in the middle of a token. If you want to inspect the original
1777   -content streams in an uncompressed format, you can always run with
1778   -:samp:`--qdf --normalize-content=n` for a QDF file
1779   -without content normalization, or alternatively
1780   -:samp:`--stream-data=uncompress` for a regular non-QDF
1781   -mode file with uncompressed streams. These will both uncompress all the
1782   -streams but will not attempt to normalize content. Please note that if
1783   -you are using content normalization or QDF mode for the purpose of
1784   -manually inspecting files, you don't have to care about this.
1785   -
1786   -Object streams, also known as compressed objects, were introduced into
1787   -the PDF specification at version 1.5, corresponding to Acrobat 6. Some
1788   -older PDF viewers may not support files with object streams. qpdf can be
1789   -used to transform files with object streams to files without object
1790   -streams or vice versa. As mentioned above, there are three object stream
1791   -modes: :samp:`preserve`,
1792   -:samp:`disable`, and :samp:`generate`.
1793   -
1794   -In :samp:`preserve` mode, the relationship between objects
1795   -and the streams that contain them is preserved from the original file.
1796   -In :samp:`disable` mode, all objects are written as
1797   -regular, uncompressed objects. The resulting file should be readable by
1798   -older PDF viewers. (Of course, the content of the files may include
1799   -features not supported by older viewers, but at least the structure will
1800   -be supported.) In :samp:`generate` mode, qpdf will
1801   -create its own object streams. This will usually result in more compact
1802   -PDF files, though they may not be readable by older viewers. In this
1803   -mode, qpdf will also make sure the PDF version number in the header is
1804   -at least 1.5.
1805   -
1806   -The :samp:`--qdf` flag turns on QDF mode, which changes
1807   -some of the defaults described above. Specifically, in QDF mode, by
1808   -default, stream data is uncompressed, content streams are normalized,
1809   -and encryption is removed. These defaults can still be overridden by
1810   -specifying the appropriate options as described above. Additionally, in
1811   -QDF mode, stream lengths are stored as indirect objects, objects are
1812   -laid out in a less efficient but more readable fashion, and the
1813   -documents are interspersed with comments that make it easier for the
1814   -user to find things and also make it possible for
1815   -:command:`fix-qdf` to work properly. QDF mode is intended
1816   -for people, mostly developers, who wish to inspect or modify PDF files
1817   -in a text editor. For details, please see :ref:`ref.qdf`.
1818   -
1819   -.. _ref.testing-options:
1820   -
1821   -Testing, Inspection, and Debugging Options
1822   -------------------------------------------
1823   -
1824   -These options can be useful for digging into PDF files or for use in
1825   -automated test suites for software that uses the qpdf library. When any
1826   -of the options in this section are specified, no output file should be
1827   -given. The following options are available:
1828   -
1829   -:samp:`--deterministic-id`
1830   - Causes generation of a deterministic value for /ID. This prevents use
1831   - of timestamp and output file name information in the /ID generation.
1832   - Instead, at some slight additional runtime cost, the /ID field is
1833   - generated to include a digest of the significant parts of the content
1834   - of the output PDF file. This means that a given qpdf operation should
1835   - generate the same /ID each time it is run, which can be useful when
1836   - caching results or for generation of some test data. Use of this flag
1837   - is not compatible with creation of encrypted files.
1838   -
1839   -:samp:`--static-id`
1840   - Causes generation of a fixed value for /ID. This is intended for
1841   - testing only. Never use it for production files. If you are trying to
1842   - get the same /ID each time for a given file and you are not
1843   - generating encrypted files, consider using the
1844   - :samp:`--deterministic-id` option.
1845   -
1846   -:samp:`--static-aes-iv`
1847   - Causes use of a static initialization vector for AES-CBC. This is
1848   - intended for testing only so that output files can be reproducible.
1849   - Never use it for production files. This option in particular is not
1850   - secure since it significantly weakens the encryption.
1851   -
1852   -:samp:`--no-original-object-ids`
1853   - Suppresses inclusion of original object ID comments in QDF files.
1854   - This can be useful when generating QDF files for test purposes,
1855   - particularly when comparing them to determine whether two PDF files
1856   - have identical content.
1857   -
1858   -:samp:`--show-encryption`
1859   - Shows document encryption parameters. Also shows the document's user
1860   - password if the owner password is given.
1861   -
1862   -:samp:`--show-encryption-key`
1863   - When encryption information is being displayed, as when
1864   - :samp:`--check` or
1865   - :samp:`--show-encryption` is given, display the
1866   - computed or retrieved encryption key as a hexadecimal string. This
1867   - value is not ordinarily useful to users, but it can be used as the
1868   - argument to :samp:`--password` if the
1869   - :samp:`--password-is-hex-key` is specified. Note
1870   - that, when PDF files are encrypted, passwords and other metadata are
1871   - used only to compute an encryption key, and the encryption key is
1872   - what is actually used for encryption. This enables retrieval of that
1873   - key.
1874   -
1875   -:samp:`--check-linearization`
1876   - Checks file integrity and linearization status.
1877   -
1878   -:samp:`--show-linearization`
1879   - Checks and displays all data in the linearization hint tables.
1880   -
1881   -:samp:`--show-xref`
1882   - Shows the contents of the cross-reference table in a human-readable
1883   - form. This is especially useful for files with cross-reference
1884   - streams which are stored in a binary format.
1885   -
1886   -:samp:`--show-object=trailer|obj[,gen]`
1887   - Show the contents of the given object. This is especially useful for
1888   - inspecting objects that are inside of object streams (also known as
1889   - "compressed objects").
1890   -
1891   -:samp:`--raw-stream-data`
1892   - When used along with the :samp:`--show-object`
1893   - option, if the object is a stream, shows the raw stream data instead
1894   - of the object's contents.
1895   -
1896   -:samp:`--filtered-stream-data`
1897   - When used along with the :samp:`--show-object`
1898   - option, if the object is a stream, shows the filtered stream data
1899   - instead of the object's contents. If the stream is filtered using filters
1900   - that qpdf does not support, an error will be issued.
1901   -
1902   -:samp:`--show-npages`
1903   - Prints the number of pages in the input file on a line by itself.
1904   - Since the number of pages appears by itself on a line, this option
1905   - can be useful for scripting if you need to know the number of pages
1906   - in a file.
1907   -
1908   -:samp:`--show-pages`
1909   - Shows the object and generation number for each page dictionary
1910   - object and for each content stream associated with the page. Having
1911   - this information makes it more convenient to inspect objects from a
1912   - particular page.
1913   -
1914   -:samp:`--with-images`
1915   - When used along with :samp:`--show-pages`, also shows
1916   - the object and generation numbers for the image objects on each page.
1917   - (At present, information about images in shared resource dictionaries
1918   - is not output by this command. This is discussed in a comment in the
1919   - source code.)
1920   -
1921   -:samp:`--json`
1922   - Generate a JSON representation of the file. This is described in
1923   - depth in :ref:`ref.json`.
1924   -
1925   -:samp:`--json-help`
1926   - Describe the format of the JSON output.
1927   -
1928   -:samp:`--json-key=key`
1929   - This option is repeatable. If specified, only top-level keys
1930   - specified will be included in the JSON output. If not specified, all
1931   - keys will be shown.
1932   -
1933   -:samp:`--json-object=trailer|obj[,gen]`
1934   - This option is repeatable. If specified, only specified objects will
1935   - be shown in the "``objects``" key of the JSON output. If absent, all
1936   - objects will be shown.
1937   -
1938   -:samp:`--check`
1939   - Checks file structure as well as encryption, linearization, and
1940   - encoding of stream data. A file for which
1941   - :samp:`--check` reports no errors may still have
1942   - errors in stream data content but should otherwise be structurally
1943   - sound. If :samp:`--check` finds any errors, qpdf will exit
1944   - with a status of 2. There are some recoverable conditions that
1945   - :samp:`--check` detects. These are issued as warnings
1946   - instead of errors. If qpdf finds no errors but finds warnings, it
1947   - will exit with a status of 3 (as of version 2.0.4). When
1948   - :samp:`--check` is combined with other options,
1949   - checks are always performed before any other options are processed.
1950   - For erroneous files, :samp:`--check` will cause qpdf
1951   - to attempt to recover, after which other options are effectively
1952   - operating on the recovered file. Combining
1953   - :samp:`--check` with other options in this way can be
1954   - useful for manually recovering severely damaged files. Note that
1955   - :samp:`--check` produces no output to standard error
1956   - when everything is valid, so if you are using this to
1957   - programmatically validate files in bulk, it is safe to run without
1958   - output redirected to :file:`/dev/null` and just
1959   - check for a 0 exit code.
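The exit statuses described for :samp:`--check` lend themselves to scripting. The following is a minimal sketch in Python, not part of qpdf itself; the helper names ``classify_check_status`` and ``check_pdf`` are hypothetical, and the sketch assumes a :command:`qpdf` executable is available on the search path.

```python
import subprocess

# Exit statuses as described above: 0 = no problems found,
# 3 = warnings but no errors, 2 = errors.
STATUS_BY_EXIT_CODE = {0: "ok", 3: "warnings", 2: "errors"}


def classify_check_status(returncode):
    """Map a qpdf exit status to a short label (hypothetical helper)."""
    return STATUS_BY_EXIT_CODE.get(returncode, "unknown")


def check_pdf(path, qpdf="qpdf"):
    """Run `qpdf --check` on one file and classify the result.

    All output is discarded; only the exit status is examined.
    """
    result = subprocess.run(
        [qpdf, "--check", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,  # non-zero statuses are expected and handled here
    )
    return classify_check_status(result.returncode)
```

A bulk validation script could call ``check_pdf`` on each file and collect those for which the result is not ``"ok"``.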
1960   -
1961   -The :samp:`--raw-stream-data` and
1962   -:samp:`--filtered-stream-data` options are ignored
1963   -unless :samp:`--show-object` is given. Either of these
1964   -options will cause the stream data to be written to standard output. In
1965   -order to avoid commingling of stream data with other output, it is
1966   -recommended that these options not be combined with other test/inspection
1967   -options.
1968   -
1969   -If :samp:`--filtered-stream-data` is given and
1970   -:samp:`--normalize-content=y` is also given, qpdf will
1971   -attempt to normalize the stream data as if it is a page content stream.
1972   -This attempt will be made even if it is not a page content stream, in
1973   -which case it will produce unusable results.
1974   -
1975   -.. _ref.unicode-passwords:
1976   -
1977   -Unicode Passwords
1978   ------------------
1979   -
1980   -At the library API level, all methods that perform encryption and
1981   -decryption interpret passwords as strings of bytes. It is up to the
1982   -caller to ensure that they are appropriately encoded. Starting with qpdf
1983   -version 8.4.0, qpdf will attempt to make this easier for you when
1984   -you interact with qpdf via its command line interface. The PDF specification
1985   -requires passwords used to encrypt files with 40-bit or 128-bit
1986   -encryption to be encoded with PDF Doc encoding. This encoding is a
1987   -single-byte encoding that supports ISO-Latin-1 and a handful of other
1988   -commonly used characters. It has a large overlap with Windows ANSI but
1989   -is not exactly the same. There is generally not a way to provide PDF Doc
1990   -encoded strings on the command line. As such, qpdf versions prior to
1991   -8.4.0 would often create PDF files that couldn't be opened with other
1992   -software when given a password with non-ASCII characters to encrypt a
1993   -file with 40-bit or 128-bit encryption. Starting with qpdf 8.4.0, qpdf
1994   -recognizes the encoding of the parameter and transcodes it as needed.
1995   -The rest of this section provides the details about exactly how qpdf
1996   -behaves. Most users will not need to know this information, but it might
1997   -be useful if you have been working around qpdf's old behavior or if you
1998   -are using qpdf to generate encrypted files for testing other PDF
1999   -software.
2000   -
2001   -A note about Windows: when qpdf builds, it attempts to determine what it
2002   -has to do to use ``wmain`` instead of ``main`` on Windows. The ``wmain``
2003   -function is an alternative entry point that receives all arguments as
2004   -UTF-16-encoded strings. When qpdf starts up this way, it converts all
2005   -the strings to UTF-8 encoding and then invokes the regular main. This
2006   -means that, as far as qpdf is concerned, it receives its command-line
2007   -arguments with UTF-8 encoding, just as it would in any modern Linux or
2008   -UNIX environment.
2009   -
2010   -If a file is being encrypted with 40-bit or 128-bit encryption and the
2011   -supplied password is not a valid UTF-8 string, qpdf will fall back to
2012   -the behavior of interpreting the password as a string of bytes. If you
2013   -have old scripts that encrypt files by passing the output of
2014   -:command:`iconv` to qpdf, you no longer need to do that,
2015   -but if you do, qpdf should still work. The only exception would be for
2016   -the extremely unlikely case of a password that is encoded with a
2017   -single-byte encoding but also happens to be valid UTF-8. Such a password
2018   -would contain strings of even numbers of characters that alternate
2019   -between accented letters and symbols. In the extremely unlikely event
2020   -that you are intentionally using such passwords and qpdf is thwarting
2021   -you by interpreting them as UTF-8, you can use
2022   -:samp:`--password-mode=bytes` to suppress qpdf's
2023   -automatic behavior.
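The fallback described above can be illustrated with a short sketch. This is not qpdf's implementation; ``interpret_password`` is a hypothetical name, and the sketch only models the decision between treating the argument as UTF-8 text and treating it as raw bytes.

```python
def interpret_password(raw):
    """Return ("unicode", text) if raw is valid UTF-8, else ("bytes", raw).

    This mirrors the rule described above: a password that is not valid
    UTF-8 is used as an uninterpreted string of bytes.
    """
    try:
        return ("unicode", raw.decode("utf-8"))
    except UnicodeDecodeError:
        return ("bytes", raw)


# A lone Latin-1 "e with acute accent" (0xE9) is not valid UTF-8, so it
# is passed through as bytes.
latin1_password = b"\xe9"

# The unlikely ambiguous case: the two Latin-1 characters 0xC3 0xA9
# (an accented letter followed by a symbol) are also the UTF-8 encoding
# of a single "e with acute accent".
ambiguous_password = b"\xc3\xa9"
```

For the ambiguous input, the sketch chooses the UTF-8 interpretation, which is the situation in which :samp:`--password-mode=bytes` would be needed.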
2024   -
2025   -The :samp:`--password-mode` option, as described earlier
2026   -in this chapter, can be used to change qpdf's interpretation of supplied
2027   -passwords. There are very few reasons to use this option. One would be
2028   -the unlikely case described in the previous paragraph in which the
2029   -supplied password happens to be valid UTF-8 but isn't supposed to be
2030   -UTF-8. Your best bet would be just to provide the password as a valid
2031   -UTF-8 string, but you could also use
2032   -:samp:`--password-mode=bytes`. Another reason to use
2033   -:samp:`--password-mode=bytes` would be to intentionally
2034   -generate PDF files encrypted with passwords that are not properly
2035   -encoded. The qpdf test suite does this to generate invalid files for the
2036   -purpose of testing its password recovery capability. If you were trying
2037   -to create intentionally incorrect files for similar purposes, the
2038   -:samp:`bytes` password mode can enable you to do this.
2039   -
2040   -When qpdf attempts to decrypt a file with a password that contains
2041   -non-ASCII characters, it will generate a list of alternative passwords
2042   -by attempting to interpret the password as each of a handful of
2043   -different coding systems and then transcode them to the required format.
2044   -This helps to compensate for the supplied password being given in the
2045   -wrong coding system, such as would happen if you used the
2046   -:command:`iconv` workaround that was previously needed.
2047   -It also generates passwords by doing the reverse operation: translating
2048   -from correct to incorrect encoding of the password. This would enable
2049   -qpdf to decrypt files using passwords that were improperly encoded by
2050   -whatever software encrypted the files, including older versions of qpdf
2051   -invoked without properly encoded passwords. The combination of these two
2052   -recovery methods should make qpdf transparently open most encrypted
2053   -files with the password supplied correctly but in the wrong coding
2054   -system. There are no real downsides to this behavior, but if you don't
2055   -want qpdf to do this, you can use the
2056   -:samp:`--suppress-password-recovery` option. One reason
2057   -to do that is to ensure that you know the exact password that was used
2058   -to encrypt the file.
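The general idea of generating alternative passwords can be sketched as follows. This is an illustration only: the function name ``candidate_passwords`` and the particular list of encodings are assumptions made for the example and are not taken from qpdf, which uses its own set of coding systems.

```python
def candidate_passwords(supplied, encodings=("utf-8", "latin-1", "cp1252")):
    """Return the supplied password followed by re-encoded alternatives.

    Each encoding is tried as a possible interpretation of the supplied
    bytes; the resulting text is then written out in every encoding.
    Because every pair of encodings is tried, this covers both a password
    given in the wrong encoding and one that was stored in the wrong
    encoding by the software that encrypted the file.
    """
    candidates = [supplied]
    for source in encodings:
        try:
            text = supplied.decode(source)
        except UnicodeDecodeError:
            continue  # the bytes are not valid in this encoding
        for target in encodings:
            try:
                candidate = text.encode(target)
            except UnicodeEncodeError:
                continue  # the text cannot be represented in this encoding
            if candidate not in candidates:
                candidates.append(candidate)
    return candidates
```

A password consisting only of ASCII characters yields a single candidate, which is consistent with the recovery logic applying only to passwords that contain non-ASCII characters.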
2059   -
2060   -With these changes, qpdf now generates compliant passwords in most
2061   -cases. There are still some exceptions. In particular, the PDF
2062   -specification directs compliant writers to normalize Unicode passwords
2063   -and to perform certain transformations on passwords with bidirectional
2064   -text. Implementing this functionality requires using a real Unicode
2065   -library like ICU. If a client application that uses qpdf wants to do
2066   -this, the qpdf library will accept the resulting passwords, but qpdf
2067   -will not perform these transformations itself. It is possible that this
2068   -will be addressed in a future version of qpdf. The ``QPDFWriter``
2069   -methods that enable encryption on the output file accept passwords as
2070   -strings of bytes.
2071   -
2072   -Please note that the :samp:`--password-is-hex-key`
2073   -option is unrelated to all this. This flag bypasses the normal process
2074   -of going from password to encryption key entirely, allowing the raw
2075   -encryption key to be specified directly. This is useful for forensic
2076   -purposes or for brute-force recovery of files with unknown passwords.
2077   -
2078   -.. _ref.qdf:
2079   -
2080   -QDF Mode
2081   -========
2082   -
2083   -In QDF mode, qpdf creates PDF files in what we call *QDF
2084   -form*. A PDF file in QDF form, sometimes called a QDF
2085   -file, is a completely valid PDF file that has ``%QDF-1.0`` as its third
2086   -line (after the PDF header and binary characters) and has certain other
2087   -characteristics. The purpose of QDF form is to make it possible to edit
2088   -PDF files, with some restrictions, in an ordinary text editor. This can
2089   -be very useful for experimenting with different PDF constructs or for
2090   -making one-off edits to PDF files (though there are other reasons why
2091   -this may not always work). Note that QDF mode does not support
2092   -linearized files. If you enable linearization, QDF mode is automatically
2093   -disabled.
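As a small illustration of the marker described above, the following sketch tests whether a file's third line is ``%QDF-1.0``. The function name ``is_qdf`` is hypothetical, and checking this marker does not verify any of the other characteristics of QDF form.

```python
def is_qdf(data):
    """Return True if the third line of the PDF data is the QDF marker."""
    # Split off at most the first three lines; the rest stays unsplit.
    lines = data.split(b"\n", 3)
    return len(lines) >= 3 and lines[2].rstrip(b"\r") == b"%QDF-1.0"
```

It could be used on the first few hundred bytes of a file, for example ``is_qdf(open(path, "rb").read(256))``.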
2094   -
2095   -It is ordinarily very difficult to edit PDF files in a text editor for
2096   -two reasons: most meaningful data in PDF files is compressed, and PDF
2097   -files are full of offset and length information that makes it hard to
2098   -add or remove data. A QDF file is organized in a manner such that, if
2099   -edits are kept within certain constraints, the
2100   -:command:`fix-qdf` program, distributed with qpdf, is
2101   -able to restore edited files to a correct state. The
2102   -:command:`fix-qdf` program takes no command-line
2103   -arguments. It reads a possibly edited QDF file from standard input and
2104   -writes a repaired file to standard output.
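Because :command:`fix-qdf` is a pure filter from standard input to standard output, it is easy to drive from a script. The following sketch assumes :command:`fix-qdf` is on the search path; the helper names ``run_filter`` and ``fix_qdf`` are hypothetical.

```python
import subprocess


def run_filter(command, data):
    """Feed data to a command's standard input and return its standard output."""
    result = subprocess.run(
        command,
        input=data,
        stdout=subprocess.PIPE,
        check=True,  # raise CalledProcessError if the command fails
    )
    return result.stdout


def fix_qdf(edited, program="fix-qdf"):
    """Repair an edited QDF file held in memory (bytes in, bytes out)."""
    return run_filter([program], edited)
```

A script might read an edited file in binary mode, pass its contents to ``fix_qdf``, and write the result to a new file.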
2105   -
2106   -The following attributes characterize a QDF file:
2107   -
2108   -- All objects appear in numerical order in the PDF file, including when
2109   - objects appear in object streams.
2110   -
2111   -- Objects are printed in an easy-to-read format, and all line endings
2112   - are normalized to UNIX line endings.
2113   -
2114   -- Unless specifically overridden, streams appear uncompressed (when
2115   - qpdf supports the filters and they are compressed with a non-lossy
2116   - compression scheme), and most content streams are normalized (line
2117   - endings are converted to just UNIX-style linefeeds).
2118   -
2119   -- All stream lengths are represented as indirect objects, and the
2120   - stream length object is always the next object after the stream. If
2121   - the stream data does not end with a newline, an extra newline is
2122   - inserted, and a special comment appears after the stream indicating
2123   - that this has been done.
2124   -
2125   -- If the PDF file contains object streams, if object stream *n*
2126   - contains *k* objects, those objects are numbered from *n+1* through
2127   - *n+k*, and the object number/offset pairs appear on a separate line
2128   - for each object. Additionally, each object in the object stream is
2129   - preceded by a comment indicating its object number and index. This
2130   - makes it very easy to find objects in object streams.
2131   -
2132   -- All beginnings of objects, ``stream`` tokens, ``endstream`` tokens,
2133   - and ``endobj`` tokens appear on lines by themselves. A blank line
2134   - follows every ``endobj`` token.
2135   -
2136   -- If there is a cross-reference stream, it is unfiltered.
2137   -
2138   -- Page dictionaries and page content streams are marked with special
2139   - comments that make them easy to find.
2140   -
2141   -- Comments precede each object indicating the object number of the
2142   - corresponding object in the original file.
2143   -
2144   -When editing a QDF file, any edits can be made as long as the above
2145   -constraints are maintained. This means that you can freely edit a page's
2146   -content without worrying about messing up the QDF file. It is also
2147   -possible to add new objects so long as those objects are added after the
2148   -last object in the file or subsequent objects are renumbered. If a QDF
2149   -file has object streams in it, you can always add the new objects before
2150   -the xref stream and then change the number of the xref stream, since
2151   -nothing generally ever references it by number.
2152   -
2153   -It is not generally practical to remove objects from QDF files without
2154   -messing up object numbering, but if you remove all references to an
2155   -object, you can run qpdf on the file (after running
2156   -:command:`fix-qdf`), and qpdf will omit the now-orphaned
2157   -object.
2158   -
2159   -When :command:`fix-qdf` is run, it goes through the file
2160   -and recomputes the following parts of the file:
2161   -
2162   -- the ``/N``, ``/W``, and ``/First`` keys of all object stream
2163   - dictionaries
2164   -
2165   -- the pairs of numbers representing object numbers and offsets of
2166   - objects in object streams
2167   -
2168   -- all stream lengths
2169   -
2170   -- the cross-reference table or cross-reference stream
2171   -
2172   -- the offset to the cross-reference table or cross-reference stream
2173   - following the ``startxref`` token
2174   -
2175   -.. _ref.using-library:
2176   -
2177   -Using the QPDF Library
2178   -======================
2179   -
2180   -.. _ref.using.from-cxx:
2181   -
2182   -Using QPDF from C++
2183   --------------------
2184   -
2185   -The source tree for the qpdf package has an
2186   -:file:`examples` directory that contains a few
2187   -example programs. The :file:`qpdf/qpdf.cc` source
2188   -file also serves as a useful example since it exercises almost all of
2189   -the qpdf library's public interface. The best source of documentation on
2190   -the library itself is reading comments in
2191   -:file:`include/qpdf/QPDF.hh`,
2192   -:file:`include/qpdf/QPDFWriter.hh`, and
2193   -:file:`include/qpdf/QPDFObjectHandle.hh`.
2194   -
2195   -All header files are installed in the
2196   -:file:`include/qpdf` directory. It is recommended that
2197   -you use ``#include <qpdf/QPDF.hh>`` rather than adding
2198   -:file:`include/qpdf` to your include path.
2199   -
2200   -When linking against the qpdf static library, you may also need to
2201   -specify ``-lz -ljpeg`` on your link command. If your system understands
2202   -how to read libtool :file:`.la` files, this may not
2203   -be necessary.
2204   -
2205   -The qpdf library is safe to use in a multithreaded program, but no
2206   -individual ``QPDF`` object instance (including ``QPDF``,
2207   -``QPDFObjectHandle``, or ``QPDFWriter``) can be used in more than one
2208   -thread at a time. Multiple threads may simultaneously work with
2209   -different instances of these and all other QPDF objects.
2210   -
2211   -.. _ref.using.other-languages:
2212   -
2213   -Using QPDF from other languages
2214   --------------------------------
2215   -
2216   -The qpdf library is implemented in C++, which makes it hard to use
2217   -directly in other languages. There are a few things that can help.
2218   -
2219   -"C"
2220   - The qpdf library includes a "C" language interface that provides a
2221   - subset of the overall capabilities. The header file
2222   - :file:`qpdf/qpdf-c.h` includes information about
2223   - its use. As long as you use a C++ linker, you can link C programs
2224   - with qpdf and use the C API. For languages that can directly load
2225   - methods from a shared library, the C API can also be useful. People
2226   - have reported success using the C API from other languages on Windows
2227   - by directly calling functions in the DLL.
2228   -
2229   -Python
2230   - A Python module called
2231   - `pikepdf <https://pypi.org/project/pikepdf/>`__ provides a clean and
2232   - highly functional set of Python bindings to the qpdf library. Using
2233   - pikepdf, you can work with PDF files in a natural way and combine
2234   - qpdf's capabilities with other functionality provided by Python's
2235   - rich standard library and available modules.
2236   -
2237   -Other Languages
2238   - Starting with version 8.3.0, the :command:`qpdf`
2239   - command-line tool can produce a JSON representation of the PDF file's
2240   - non-content data. This can facilitate interacting programmatically
2241   - with PDF files through qpdf's command line interface. For more
2242   - information, please see :ref:`ref.json`.
2243   -
2244   -.. _ref.unicode-files:
2245   -
2246   -A Note About Unicode File Names
2247   --------------------------------
2248   -
2249   -When strings are passed to qpdf library routines either as ``char*`` or
2250   -as ``std::string``, they are treated as byte arrays except where
2251   -otherwise noted. When Unicode is desired, qpdf wants UTF-8 unless
2252   -otherwise noted in comments in header files. In modern UNIX/Linux
2253   -environments, this generally does the right thing. In Windows, it's a
2254   -bit more complicated. Starting in qpdf 8.4.0, passwords that contain
2255   -Unicode characters are handled much better, and starting in qpdf 8.4.1,
2256   -the library attempts to properly handle Unicode characters in filenames.
2257   -In particular, in Windows, if a UTF-8 encoded string is used as a
2258   -filename in either ``QPDF`` or ``QPDFWriter``, it is internally
2259   -converted to ``wchar_t*``, and Unicode-aware Windows APIs are used. As
2260   -such, qpdf will generally operate properly on files with non-ASCII
2261   -characters in their names as long as the filenames are UTF-8 encoded for
2262   -passing into the qpdf library API, but there are still some rough edges,
2263   -such as the encoding of the filenames in error messages or CLI output
2264   -messages. Patches or bug reports are welcome for any continuing issues
2265   -with Unicode file names in Windows.
2266   -
2267   -.. _ref.weak-crypto:
2268   -
2269   -Weak Cryptography
2270   -=================
2271   -
2272   -Starting with version 10.4, qpdf is taking steps to reduce the likelihood
2273   -of a user *accidentally* creating PDF files with insecure cryptography
2274   -but will continue to allow creation of such files indefinitely with
2275   -explicit acknowledgment.
2276   -
2277   -The PDF file format makes use of RC4, which is known to be a weak
2278   -cryptography algorithm, and MD5, which is a weak hashing algorithm. In
2279   -version 10.4, qpdf generates warnings for some (but not all) cases of
2280   -writing files with weak cryptography when invoked from the command-line.
2281   -These warnings can be suppressed using the
2282   -:samp:`--allow-weak-crypto` option.
2283   -
2284   -It is planned for qpdf version 11 to be stricter, making it an error to
2285   -write files with insecure cryptography from the command-line tool in
2286   -most cases without specifying the
2287   -:samp:`--allow-weak-crypto` flag and also to require
2288   -explicit steps when using the C++ library to enable use of insecure
2289   -cryptography.
2290   -
2291   -Note that qpdf must always retain support for weak cryptographic
2292   -algorithms since this is required for reading older PDF files that use
2293   -them. Additionally, qpdf will always retain the ability to create files
2294   -using weak cryptographic algorithms since, as a development tool, qpdf
2295   -explicitly supports creating older or deprecated types of PDF files
2296   -since these are sometimes needed to test or work with older versions of
2297   -software. Even if other cryptography libraries drop support for RC4 or
2298   -MD5, qpdf can always fall back to its internal implementations of those
2299   -algorithms, so they are not going to disappear from qpdf.
2300   -
2301   -.. _ref.json:
2302   -
2303   -QPDF JSON
2304   -=========
2305   -
2306   -.. _ref.json-overview:
2307   -
2308   -Overview
2309   ---------
2310   -
2311   -Beginning with qpdf version 8.3.0, the :command:`qpdf`
2312   -command-line program can produce a JSON representation of the
2313   -non-content data in a PDF file. It includes a dump in JSON format of all
2314   -objects in the PDF file excluding the content of streams. This JSON
2315   -representation makes it very easy to look in detail at the structure of
2316   -a given PDF file, and it also provides a great way to work with PDF
2317   -files programmatically from the command-line in languages that can't
2318   -call or link with the qpdf library directly. Note that stream data can
2319   -be extracted from PDF files using other qpdf command-line options.
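As an illustration of working with the JSON output from another language, the following Python sketch builds a command line from the :samp:`--json`, :samp:`--json-key`, and :samp:`--json-object` options and parses the result. The helper names are hypothetical, and the sketch assumes a :command:`qpdf` executable of version 8.3.0 or later on the search path.

```python
import json
import subprocess


def build_json_command(infile, keys=(), objects=(), qpdf="qpdf"):
    """Build the argument list for a `qpdf --json` invocation.

    keys and objects become repeated --json-key and --json-object options.
    """
    command = [qpdf, "--json"]
    command += ["--json-key=" + key for key in keys]
    command += ["--json-object=" + obj for obj in objects]
    command.append(infile)
    return command


def load_qpdf_json(infile, keys=(), objects=(), qpdf="qpdf"):
    """Run qpdf and return its JSON output as Python data."""
    result = subprocess.run(
        build_json_command(infile, keys, objects, qpdf),
        stdout=subprocess.PIPE,
        check=False,  # a non-zero status may only indicate warnings
    )
    return json.loads(result.stdout)
```

For example, ``load_qpdf_json("in.pdf", keys=["pagelabels"])`` would request output limited to the ``pagelabels`` key plus any keys that are always present.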
2320   -
2321   -.. _ref.json-guarantees:
2322   -
2323   -JSON Guarantees
2324   ----------------
2325   -
2326   -The qpdf JSON representation includes a JSON serialization of the raw
2327   -objects in the PDF file as well as some computed information in a more
2328   -easily extracted format. QPDF provides some guarantees about its JSON
2329   -format. These guarantees are designed to simplify the experience of a
2330   -developer working with the JSON format.
2331   -
2332   -Compatibility
2333   - The top-level JSON object output is a dictionary. The JSON output
2334   - contains various nested dictionaries and arrays. With the exception
2335   - of dictionaries that are populated by the fields of objects from the
2336   - file, all instances of a dictionary are guaranteed to have exactly
2337   - the same keys. Future versions of qpdf are free to add additional
2338   - keys but not to remove keys or change the type of object that a key
2339   - points to. The qpdf program validates this guarantee, and in the
2340   - unlikely event that a bug in qpdf should cause it to generate data
2341   - that doesn't conform to this rule, it will ask you to file a bug
2342   - report.
2343   -
2344   - The top-level JSON structure contains a "``version``" key whose value
2345   - is a simple integer. The value of the ``version`` key will be
2346   - incremented if a non-compatible change is made. A non-compatible
2347   - change would be any change that involves removal of a key, a change
2348   - to the format of data pointed to by a key, or a semantic change that
2349   - requires a different interpretation of a previously existing key. A
2350   - strong effort will be made to avoid breaking compatibility.
2351   -
2352   -Documentation
2353   - The :command:`qpdf` command can be invoked with the
2354   - :samp:`--json-help` option. This will output a JSON
2355   - structure that has the same structure as the JSON output that qpdf
2356   - generates, except that each field in the help output is a description
2357   - of the corresponding field in the JSON output. The specific
2358   - guarantees are as follows:
2359   -
2360   - - A dictionary in the help output means that the corresponding
2361   - location in the actual JSON output is also a dictionary with
2362   - exactly the same keys; that is, no keys present in help are absent
2363   - in the real output, and no keys will be present in the real output
2364   - that are not in help. As a special case, if the dictionary has a
2365   - single key whose name starts with ``<`` and ends with ``>``, it
2366   - means that the JSON output is a dictionary that can have any keys,
2367   - each of which conforms to the value of the special key. This is
2368   - used for cases in which the keys of the dictionary are things like
2369   - object IDs.
2370   -
2371   - - A string in the help output is a description of the item that
2372   - appears in the corresponding location of the actual output. The
2373   - corresponding output can have any format.
2374   -
2375   - - An array in the help output always contains a single element. It
2376   - indicates that the corresponding location in the actual output is
2377   - also an array, and that each element of the array has whatever
2378   - format is implied by the single element of the help output's
2379   - array.
2380   -
2381   - For example, the help output includes a "``pagelabels``"
2382   - key whose value is an array of one element. That element is a
2383   - dictionary with keys "``index``" and "``label``". In addition to
2384   - describing the meaning of those keys, this tells you that the actual
2385   - JSON output will contain a ``pagelabels`` array, each of whose
2386   - elements is a dictionary that contains an ``index`` key, a ``label``
2387   - key, and no other keys.
2388   -
2389   -Directness and Simplicity
2390   - The JSON output contains the value of every object in the file, but
2391   - it also contains some processed data. This is analogous to how qpdf's
2392   - library interface works. The processed data is similar to the helper
2393   - functions in that it allows you to look at certain aspects of the PDF
2394   - file without having to understand all the nuances of the PDF
2395   - specification, while the raw objects allow you to mine the PDF for
2396   - anything that the higher-level interfaces are lacking.
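The rules given under Documentation for relating the help output to the real output can be expressed as a short recursive check. This is a sketch of those rules rather than qpdf's own validation code; the name ``conforms`` is hypothetical, and accepting ``None`` anywhere is an assumption made here because fields may be null when information is not known.

```python
def conforms(help_node, actual):
    """Check a value from the JSON output against the help structure."""
    if actual is None:
        # Assumption: a field may be null if the information is unknown.
        return True
    if isinstance(help_node, dict):
        if not isinstance(actual, dict):
            return False
        keys = list(help_node)
        # Special case: a single key of the form <...> stands for any key.
        if len(keys) == 1 and keys[0].startswith("<") and keys[0].endswith(">"):
            return all(conforms(help_node[keys[0]], v) for v in actual.values())
        # Otherwise the real output must have exactly the same keys.
        return set(actual) == set(keys) and all(
            conforms(help_node[k], actual[k]) for k in keys
        )
    if isinstance(help_node, list):
        # The help array's single element describes every real element.
        return (
            isinstance(actual, list)
            and len(help_node) == 1
            and all(conforms(help_node[0], item) for item in actual)
        )
    # A string in the help output is a description; any format is allowed.
    return True
```

Applied to the ``pagelabels`` example, a help structure of ``{"pagelabels": [{"index": "...", "label": "..."}]}`` accepts any array of dictionaries that each have exactly the keys ``index`` and ``label``.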
2397   -
2398   -.. _json.limitations:
2399   -
2400   -Limitations of JSON Representation
2401   -----------------------------------
2402   -
2403   -There are a few limitations to be aware of with the JSON structure:
2404   -
2405   -- Strings, names, and indirect object references in the original PDF
2406   - file are all converted to strings in the JSON representation. In the
2407   - case of a "normal" PDF file, you can tell the difference because a
2408   - name starts with a slash (``/``), and an indirect object reference
2409   - looks like ``n n R``, but if there were to be a string that looked
2410   - like a name or indirect object reference, there would be no way to
2411   - tell this from the JSON output. Note that there are certain cases
2412   - where you know for sure what something is, such as knowing that
2413   - dictionary keys in objects are always names and that certain things
2414   - in the higher-level computed data are known to contain indirect
2415   - object references.
2416   -
2417   -- The JSON format doesn't support binary data very well. Mostly the
2418   - details are not important, but they are presented here for
2419   - information. When qpdf outputs a string in the JSON representation,
2420   - it converts the string to UTF-8, assuming usual PDF string semantics.
2421   - Specifically, if the original string is UTF-16, it is converted to
2422   - UTF-8. Otherwise, it is assumed to have PDF doc encoding, and is
2423   - converted to UTF-8 with that assumption. This causes strange things
2424   - to happen to binary strings. For example, if you had the binary
2425   - string ``<038051>``, this would be output to the JSON as ``\u0003•Q``
2426   - because ``03`` is not a printable character and ``80`` is the bullet
2427   - character in PDF doc encoding and is mapped to the Unicode value
2428   - ``2022``. Since ``51`` is ``Q``, it is output as is. If you wanted to
2429   - convert back from here to a binary string, you would have to recognize
2430   - Unicode values whose code points are higher than ``0xFF`` and map
2431   - those back to their corresponding PDF doc encoding characters. There
2432   - is no way to tell the difference between a Unicode string that was
2433   - originally encoded as UTF-16 or one that was converted from PDF doc
2434   - encoding. In other words, it's best if you don't try to use the JSON
2435   - format to extract binary strings from the PDF file, but if you really
2436   - had to, it could be done. Note that qpdf's
2437   - :samp:`--show-object` option does not have this
2438   - limitation and will reveal the string as encoded in the original
2439   - file.
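The reverse mapping described in the second item above can be sketched as follows. The table here is deliberately partial: it contains only the bullet character from the example, and a real implementation would need the complete PDF doc encoding table. The names ``PDFDOC_FROM_UNICODE`` and ``json_string_to_bytes`` are hypothetical. As a simplification that follows the description above, code points up to ``0xFF`` are treated as single bytes.

```python
# Partial reverse table: Unicode code point -> PDF doc encoding byte.
# Only the bullet from the example is included; the rest is omitted.
PDFDOC_FROM_UNICODE = {0x2022: 0x80}


def json_string_to_bytes(text):
    """Map a string from the JSON output back to a binary string."""
    out = bytearray()
    for char in text:
        code_point = ord(char)
        if code_point <= 0xFF:
            out.append(code_point)
        elif code_point in PDFDOC_FROM_UNICODE:
            out.append(PDFDOC_FROM_UNICODE[code_point])
        else:
            raise ValueError("no mapping for U+%04X" % code_point)
    return bytes(out)
```

With this sketch, the JSON string from the example maps back to the three bytes ``03 80 51``. The same function would give wrong results for a string that was originally UTF-16, which is the ambiguity described above.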
2440   -
2441   -.. _json.considerations:
2442   -
2443   -JSON: Special Considerations
2444   -----------------------------
2445   -
2446   -For the most part, the built-in JSON help tells you everything you need
2447   -to know about the JSON format, but there are a few non-obvious things to
2448   -be aware of:
2449   -
2450   -- While qpdf guarantees that keys present in the help will be present
2451   - in the output, those fields may be null or empty if the information
2452   - is not known or absent in the file. Also, if you specify
2453   - :samp:`--json-key`, the keys that are not listed
2454   - will be excluded entirely except for those that
2455   - :samp:`--json-help` says are always present.
2456   -
2457   -- In a few places, there are keys with names containing
2458   - ``pageposfrom1``. The values of these keys are null or an integer. If
2459   - an integer, they point to a page index within the file numbering from
2460   - 1. Note that JSON indexes from 0, and you would also use 0-based
2461   - indexing using the API. However, 1-based indexing is easier in this
2462   - case because the command-line syntax for specifying page ranges is
2463   - 1-based. If you were going to write a program that looked through the
2464   - JSON for information about specific pages and then use the
2465   - command-line to extract those pages, 1-based indexing is easier.
2466   - Besides, it's more convenient to subtract 1 in a program written in a
2467   - real programming language than it is to add 1 in shell code.
2468   -
2469   -- The image information included in the ``page`` section of the JSON
2470   - output includes the key "``filterable``". Note that the value of this
2471   - field may depend on the :samp:`--decode-level` that
2472   - you invoke qpdf with. The JSON output includes a top-level key
2473   - "``parameters``" that indicates the decode level used for computing
2474   - whether a stream was filterable. For example, jpeg images will be
2475   - shown as not filterable by default, but they will be shown as
2476   - filterable if you run :command:`qpdf --json
2477   - --decode-level=all`.
2478   -
2479   -.. _ref.design:
2480   -
2481   -Design and Library Notes
2482   -========================
2483   -
2484   -.. _ref.design.intro:
2485   -
2486   -Introduction
2487   -------------
2488   -
2489   -This section was written prior to the implementation of the qpdf package
2490   -and was subsequently modified to reflect the implementation. In some
2491   -cases, for purposes of explanation, it may differ slightly from the
2492   -actual implementation. As always, the source code and test suite are
2493   -authoritative. Even if there are some errors, this document should serve
2494   -as a road map to understanding how this code works.
2495   -
2496   -In general, one should adhere strictly to a specification when writing
2497   -but be liberal in reading. This way, the product of our software will be
2498   -accepted by the widest range of other programs, and we will accept the
2499   -widest range of input files. This library attempts to conform to that
2500   -philosophy whenever possible but also aims to provide strict checking
2501   -for people who want to validate PDF files. If you don't want to see
2502   -warnings and are trying to write something that is tolerant, you can
2503   -call ``setSuppressWarnings(true)``. If you want to fail on the first
2504   -error, you can call ``setAttemptRecovery(false)``. The default behavior
2505   -is to generate warnings for recoverable problems. Note that recovery
2506   -will not always produce the desired results even if it is able to get
2507   -through the file. Unlike most other PDF software, which produces generic
2508   -warnings such as "This file is damaged," qpdf generally issues a
2509   -detailed error message that would be most useful to a PDF developer.
2510   -This is by design as there seems to be a shortage of PDF validation
2511   -tools out there. This was, in fact, one of the major motivations behind
2512   -the initial creation of qpdf.
2513   -
2514   -.. _ref.design-goals:
2515   -
2516   -Design Goals
2517   -------------
2518   -
2519   -The QPDF package includes support for reading and rewriting PDF files.
2520   -It aims to hide from the user details involving object locations,
2521   -modified (appended) PDF files, the directness/indirectness of objects,
2522   -and stream filters including encryption. It does not aim to hide
2523   -knowledge of the object hierarchy or content stream contents. Put
2524   -another way, a user of the qpdf library is expected to have knowledge
2525   -about how PDF files work, but is not expected to have to keep track of
2526   -bookkeeping details such as file positions.
2527   -
2528   -A user of the library never has to care whether an object is direct or
2529   -indirect, though it is possible to determine whether an object is direct
2530   -or not if this information is needed. All access to objects deals with
2531   -this transparently. All memory management details are also handled by
2532   -the library.
2533   -
2534   -The ``PointerHolder`` object is used internally by the library to deal
2535   -with memory management. This is basically a smart pointer object very
2536   -similar in spirit to C++11's ``std::shared_ptr`` object, but predating
2537   -it by several years. This library also makes use of a technique for
2538   -giving fine-grained access to methods in one class to other classes by
2539   -using public subclasses with friends and only private members that in
2540   -turn call private methods of the containing class. See
2541   -``QPDFObjectHandle::Factory`` as an example.
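The access technique described above can be sketched in isolation. The following is a minimal illustration, not qpdf's actual code: ``Handle``, ``Document``, and the ``newIndirect`` method are hypothetical stand-ins chosen for this example. The nested ``Factory`` class is public but has only private members, so only its friend can reach the private factory method of the containing class.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical classes; they only illustrate the access pattern.
class Document;

class Handle
{
  public:
    std::string const& describe() const { return description_; }

    // Public nested class with only private members. Its friends, and
    // nobody else, can call through to Handle's private factory method.
    class Factory
    {
        friend class Document;

      private:
        static Handle newIndirect(int id)
        {
            // A nested class may use private members of its enclosing class.
            return Handle::newIndirect(id);
        }
    };

  private:
    explicit Handle(std::string description) :
        description_(std::move(description))
    {
    }

    static Handle newIndirect(int id)
    {
        return Handle("indirect object " + std::to_string(id));
    }

    std::string description_;
};

class Document
{
  public:
    // Document is a friend of Handle::Factory, so this call is allowed.
    Handle getObject(int id) { return Handle::Factory::newIndirect(id); }
};
```

In this sketch, code outside ``Document`` cannot call ``Handle::Factory::newIndirect`` or ``Handle::newIndirect``, yet ``Document`` does not become a friend of ``Handle`` as a whole and so gains no access to its other private members.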
2542   -
2543   -The top-level qpdf class is ``QPDF``. A ``QPDF`` object represents a PDF
2544   -file. The library provides methods for both accessing and mutating PDF
2545   -files.
2546   -
2547   -The primary class for interacting with PDF objects is
2548   -``QPDFObjectHandle``. Instances of this class can be passed around by
2549   -value, copied, stored in containers, etc. with very low overhead.
2550   -Instances of ``QPDFObjectHandle`` created by reading from a file will
2551   -always contain a reference back to the ``QPDF`` object from which they
2552   -were created. A ``QPDFObjectHandle`` may be direct or indirect. If
2553   -indirect, the ``QPDFObject`` the ``PointerHolder`` initially points to
2554   -is a null pointer. In this case, the first attempt to access the
2555   -underlying ``QPDFObject`` will result in the ``QPDFObject`` being
2556   -resolved via a call to the referenced ``QPDF`` instance. This makes it
2557   -essentially impossible to make coding errors in which certain things
2558   -will work for some PDF files and not for others based on which objects
2559   -are direct and which objects are indirect.
2560   -
2561   -Instances of ``QPDFObjectHandle`` can be directly created and modified
2562   -using static factory methods in the ``QPDFObjectHandle`` class. There
2563   -are factory methods for each type of object as well as a convenience
2564   -method ``QPDFObjectHandle::parse`` that creates an object from a string
2565   -representation of the object. Existing instances of ``QPDFObjectHandle``
2566   -can also be modified in several ways. See comments in
2567   -:file:`QPDFObjectHandle.hh` for details.
2568   -
2569   -An instance of ``QPDF`` is constructed by using the class's default
2570   -constructor. If desired, the ``QPDF`` object may be configured with
2571   -various methods that change its default behavior. Then the
2572   -``QPDF::processFile()`` method is passed the name of a PDF file, which
2573   -permanently associates the file with that QPDF object. A password may
2574   -also be given for access to password-protected files. QPDF does not
2575   -enforce encryption parameters and will treat user and owner passwords
2576   -equivalently. Either password may be used to access an encrypted file.
2577   -``QPDF`` will allow recovery of a user password given an owner password.
2578   -The input PDF file must be seekable. (Output files written by
2579   -``QPDFWriter`` need not be seekable, even when creating linearized
2580   -files.) During construction, ``QPDF`` validates the PDF file's header,
2581   -and then reads the cross reference tables and trailer dictionaries. The
2582   -``QPDF`` class keeps only the first trailer dictionary though it does
2583   -read all of them so it can check the ``/Prev`` key. ``QPDF`` class users
2584   -may request the root object and the trailer dictionary specifically. The
2585   -cross reference table is kept private. Objects may then be requested by
2586   -number or by walking the object tree.
2587   -
2588   -When a PDF file has a cross-reference stream instead of a
2589   -cross-reference table and trailer, requesting the document's trailer
2590   -dictionary returns the stream dictionary from the cross-reference stream
2591   -instead.
2592   -
2593   -There are some convenience routines for very common operations such as
2594   -walking the page tree and returning a vector of all page objects. For
2595   -full details, please see the header files
2596   -:file:`QPDF.hh` and
2597   -:file:`QPDFObjectHandle.hh`. There are also some
2598   -additional helper classes that provide higher level API functions for
2599   -certain document constructions. These are discussed in :ref:`ref.helper-classes`.
2600   -
2601   -.. _ref.helper-classes:
2602   -
2603   -Helper Classes
2604   ---------------
2605   -
2606   -QPDF version 8.1 introduced the concept of helper classes. Helper
2607   -classes are intended to contain higher level APIs that allow developers
2608   -to work with certain document constructs at an abstraction level above
2609   -that of ``QPDFObjectHandle`` while staying true to qpdf's philosophy of
2610   -not hiding document structure from the developer. As with qpdf in
2611   -general, the goal is to take away some of the more tedious bookkeeping
2612   -aspects of working with PDF files, not to remove the need for the
2613   -developer to understand how the PDF construction in question works. The
2614   -driving factor behind the creation of helper classes was to allow the
2615   -evolution of higher level interfaces in qpdf without polluting the
2616   -interfaces of the main top-level classes ``QPDF`` and
2617   -``QPDFObjectHandle``.
2618   -
2619   -There are two kinds of helper classes: *document* helpers and *object*
2620   -helpers. Document helpers are constructed with a reference to a ``QPDF``
2621   -object and provide methods for working with structures that are at the
2622   -document level. Object helpers are constructed with an instance of a
2623   -``QPDFObjectHandle`` and provide methods for working with specific types
2624   -of objects.
2625   -
2626   -Examples of document helpers include ``QPDFPageDocumentHelper``, which
2627   -contains methods for operating on the document's page trees, such as
2628   -enumerating all pages of a document and adding and removing pages; and
2629   -``QPDFAcroFormDocumentHelper``, which contains document-level methods
2630   -related to interactive forms, such as enumerating form fields and
2631   -creating mappings between form fields and annotations.
2632   -
2633   -Examples of object helpers include ``QPDFPageObjectHelper`` for
2634   -performing operations on pages such as page rotation and some operations
2635   -on content streams, ``QPDFFormFieldObjectHelper`` for performing
2636   -operations related to interactive form fields, and
2637   -``QPDFAnnotationObjectHelper`` for working with annotations.
2638   -
2639   -It is always possible to retrieve the underlying ``QPDF`` reference from
2640   -a document helper and the underlying ``QPDFObjectHandle`` reference from
2641   -an object helper. Helpers are designed to be helpers, not wrappers. The
2642   -intention is that, in general, it is safe to freely intermix operations
2643   -that use helpers with operations that use the underlying objects.
2644   -Document and object helpers do not attempt to provide a complete
2645   -interface for working with the things they are helping with, nor do they
2646   -attempt to encapsulate underlying structures. They just provide a few
2647   -methods to help with error-prone, repetitive, or complex tasks. In some
2648   -cases, a helper object may cache some information that is expensive to
2649   -gather. In such cases, the helper classes are implemented so that their
2650   -own methods keep the cache consistent, and the header file will provide
2651   -a method to invalidate the cache and a description of what kinds of
2652   -operations would make the cache invalid. If in doubt, you can always
2653   -discard a helper class and create a new one with the same underlying
2654   -objects, which will ensure that you have discarded any stale
2655   -information.
2656   -
2657   -By convention, document helpers are called
2658   -``QPDFSomethingDocumentHelper`` and are derived from
2659   -``QPDFDocumentHelper``, and object helpers are called
2660   -``QPDFSomethingObjectHelper`` and are derived from ``QPDFObjectHelper``.
2661   -For details on specific helpers, please see their header files. You can
2662   -find them by looking at
2663   -:file:`include/qpdf/QPDF*DocumentHelper.hh` and
2664   -:file:`include/qpdf/QPDF*ObjectHelper.hh`.
2665   -
2666   -In order to avoid creation of circular dependencies, the following
2667   -general guidelines are followed with helper classes:
2668   -
2669   -- Core class interfaces do not know about helper classes. For example,
2670   - no methods of ``QPDF`` or ``QPDFObjectHandle`` will include helper
2671   - classes in their interfaces.
2672   -
2673   -- Interfaces of object helpers will usually not use document helpers in
2674   - their interfaces. This is because it is much more useful for document
2675   - helpers to have methods that return object helpers. Most operations
2676   - in PDF files start at the document level and go from there to the
2677   - object level rather than the other way around. It can sometimes be
2678   - useful to map back from object-level structures to document-level
2679   - structures. If there is a desire to do this, it will generally be
2680   - provided by a method in the document helper class.
2681   -
2682   -- Most of the time, object helpers don't know about other object
2683   - helpers. However, in some cases, one type of object may be a
2684   - container for another type of object, in which case it may make sense
2685   - for the outer object to know about the inner object. For example,
2686   - there are methods in the ``QPDFPageObjectHelper`` that know
2687   - ``QPDFAnnotationObjectHelper`` because references to annotations are
2688   - contained in page dictionaries.
2689   -
2690   -- Any helper or core library class may use helpers in their
2691   - implementations.
2692   -
2693   -Prior to qpdf version 8.1, higher level interfaces were added as
2694   -"convenience functions" in either ``QPDF`` or ``QPDFObjectHandle``. For
2695   -compatibility, older convenience functions for operating with pages will
2696   -remain in those classes even as alternatives are provided in helper
2697   -classes. Going forward, new higher level interfaces will be provided
2698   -using helper classes.
2699   -
2700   -.. _ref.implementation-notes:
2701   -
2702   -Implementation Notes
2703   ---------------------
2704   -
2705   -This section contains a few notes about QPDF's internal implementation,
2706   -particularly around what it does when it first processes a file. This
2707   -section is a bit of a simplification of what it actually does, but it
2708   -could serve as a starting point to someone trying to understand the
2709   -implementation. There is nothing in this section that you need to know
2710   -to use the qpdf library.
2711   -
2712   -``QPDFObject`` is the basic PDF Object class. It is an abstract base
2713   -class from which are derived classes for each type of PDF object.
2714   -Clients do not interact with Objects directly but instead interact with
2715   -``QPDFObjectHandle``.
2716   -
2717   -When the ``QPDF`` class creates a new object, it dynamically allocates
2718   -the appropriate type of ``QPDFObject`` and immediately hands the pointer
2719   -to an instance of ``QPDFObjectHandle``. The parser reads a token from
2720   -the current file position. If the token is not either a dictionary or
2721   -array opener, an object is immediately constructed from the single token
2722   -and the parser returns. Otherwise, the parser iterates in a special mode
2723   -in which it accumulates objects until it finds a balancing closer.
2724   -During this process, the "``R``" keyword is recognized and an indirect
2725   -``QPDFObjectHandle`` may be constructed.
2726   -
2727   -The ``QPDF::resolve()`` method, which is used to resolve an indirect
2728   -object, may be invoked from the ``QPDFObjectHandle`` class. It first
2729   -checks a cache to see whether this object has already been read. If not,
2730   -it reads the object from the PDF file and caches it. It then returns the
2731   -resulting ``QPDFObjectHandle``. The calling object handle then replaces
2732   -its ``PointerHolder<QPDFObject>`` with the one from the newly returned
2733   -``QPDFObjectHandle``. In this way, only a single copy of any direct
2734   -object need exist and clients can access objects transparently without
2735   -knowing or caring whether they are direct or indirect objects.
2736   -Additionally, no object is ever read from the file more than once. That
2737   -means that only the portions of the PDF file that are actually needed
2738   -are ever read from the input file, thus allowing the qpdf package to
2739   -take advantage of this important design goal of PDF files.
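The resolve-and-cache behavior described above can be modeled with a small sketch. This is a simplified illustration under stated assumptions, not qpdf's implementation: ``ToyDoc``, ``ToyHandle``, and ``ToyObject`` are hypothetical names, the "file" is just a map from object ID to text, and ``std::shared_ptr`` stands in for ``PointerHolder``.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a parsed object.
struct ToyObject
{
    std::string value;
};

class ToyDoc
{
  public:
    // Pretend file contents: object ID -> text stored for that object.
    void addToFile(int id, std::string text) { file_[id] = std::move(text); }

    // Return the cached object, reading it from the "file" only once.
    std::shared_ptr<ToyObject> resolve(int id)
    {
        auto it = cache_.find(id);
        if (it != cache_.end()) {
            return it->second;
        }
        ++reads_;
        auto obj = std::make_shared<ToyObject>(ToyObject{file_.at(id)});
        cache_[id] = obj;
        return obj;
    }

    int reads() const { return reads_; }

  private:
    std::map<int, std::string> file_;
    std::map<int, std::shared_ptr<ToyObject>> cache_;
    int reads_ = 0;
};

class ToyHandle
{
  public:
    ToyHandle(ToyDoc& doc, int id) : doc_(&doc), id_(id) {}

    bool resolved() const { return obj_ != nullptr; }

    // The first access replaces the null pointer with the resolved object.
    std::string const& value()
    {
        if (!obj_) {
            obj_ = doc_->resolve(id_);
        }
        return obj_->value;
    }

  private:
    ToyDoc* doc_;
    int id_;
    std::shared_ptr<ToyObject> obj_; // null until first access
};
```

Two handles that refer to the same object ID end up sharing one cached object, and the read counter in the model stays at one no matter how many times either handle is queried.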
2740   -
2741   -If the requested object is inside of an object stream, the object stream
2742   -itself is first read into memory. Then the tokenizer reads objects from
2743   -the memory stream based on the offset information stored in the stream.
2744   -Those individual objects are cached, after which the temporary buffer
2745   -holding the object stream contents is discarded. In this way, the first
2746   -time an object in an object stream is requested, all objects in the
2747   -stream are cached.
2748   -
2749   -The following example should clarify how ``QPDF`` processes a simple
2750   -file.
2751   -
2752   -- Client constructs ``QPDF`` ``pdf`` and calls
2753   - ``pdf.processFile("a.pdf");``.
2754   -
2755   -- The ``QPDF`` class checks the beginning of
2756   - :file:`a.pdf` for a PDF header. It then reads the
2757   - cross reference table mentioned at the end of the file, ensuring that
2758   - it is looking before the last ``%%EOF``. After getting to the ``trailer``
2759   - keyword, it invokes the parser.
2760   -
2761   -- The parser sees "``<<``", so it calls itself recursively in
2762   - dictionary creation mode.
2763   -
2764   -- In dictionary creation mode, the parser keeps accumulating objects
2765   - until it encounters "``>>``". Each object that is read is pushed onto
2766   - a stack. If "``R``" is read, the last two objects on the stack are
2767   - inspected. If they are integers, they are popped off the stack and
2768   - their values are used to construct an indirect object handle which is
2769   - then pushed onto the stack. When "``>>``" is finally read, the stack
2770   - is converted into a ``QPDF_Dictionary`` which is placed in a
2771   - ``QPDFObjectHandle`` and returned.
2772   -
2773   -- The resulting dictionary is saved as the trailer dictionary.
2774   -
2775   -- The ``/Prev`` key is searched. If present, ``QPDF`` seeks to that
2776   - point and repeats except that the new trailer dictionary is not
2777   - saved. If ``/Prev`` is not present, the initial parsing process is
2778   - complete.
2779   -
2780   - If there is an encryption dictionary, the document's encryption
2781   - parameters are initialized.
2782   -
2783   -- The client requests the root object. The ``QPDF`` class gets the value of
2784   - the root key from the trailer dictionary and returns it. It is an unresolved
2785   - indirect ``QPDFObjectHandle``.
2786   -
2787   -- The client requests the ``/Pages`` key from the root
2788   - ``QPDFObjectHandle``. The ``QPDFObjectHandle`` notices that it is
2789   - indirect so it asks ``QPDF`` to resolve it. ``QPDF`` looks in the
2790   - object cache for an object with the root dictionary's object ID and
2791   - generation number. Upon not seeing it, it checks the cross reference
2792   - table, gets the offset, and reads the object present at that offset.
2793   - It stores the result in the object cache and returns the cached
2794   - result. The calling ``QPDFObjectHandle`` replaces its object pointer
2795   - with the one from the resolved ``QPDFObjectHandle``, verifies that it
2796   - is a valid dictionary object, and returns the (unresolved indirect)
2797   - ``QPDFObjectHandle`` to the top of the Pages hierarchy.
2798   -
2799   - As the client continues to request objects, the same process is
2800   - followed for each new requested object.
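The dictionary creation mode in the walkthrough above can be sketched as a small stack-based parser. This is a toy model, not qpdf's parser: it assumes the input is already split into tokens, it handles only names, integers, and the ``R`` keyword, it does not recurse into nested dictionaries or arrays, and ``Item`` and ``parseDictionary`` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parsed item: an integer, a name, or an indirect reference.
struct Item
{
    enum Kind { Integer, Name, Indirect };
    Kind kind = Integer;
    int number = 0;     // integer value, or object ID for Indirect
    int generation = 0; // only meaningful for Indirect
    std::string name;   // only meaningful for Name
};

std::map<std::string, Item>
parseDictionary(std::vector<std::string> const& tokens)
{
    if (tokens.empty() || tokens.front() != "<<") {
        throw std::runtime_error("expected <<");
    }
    std::vector<Item> stack;
    for (std::size_t i = 1; i < tokens.size(); ++i) {
        std::string const& t = tokens[i];
        if (t == ">>") {
            // Convert the stack of key/value pairs into a dictionary.
            if (stack.size() % 2 != 0) {
                throw std::runtime_error("dictionary has a key without a value");
            }
            std::map<std::string, Item> dict;
            for (std::size_t j = 0; j < stack.size(); j += 2) {
                if (stack[j].kind != Item::Name) {
                    throw std::runtime_error("dictionary key is not a name");
                }
                dict[stack[j].name] = stack[j + 1];
            }
            return dict;
        } else if (t == "R") {
            // The last two items must be integers: object ID and generation.
            std::size_t n = stack.size();
            if (n < 2 || stack[n - 1].kind != Item::Integer ||
                stack[n - 2].kind != Item::Integer) {
                throw std::runtime_error("R must follow two integers");
            }
            Item ref;
            ref.kind = Item::Indirect;
            ref.number = stack[n - 2].number;
            ref.generation = stack[n - 1].number;
            stack.pop_back();
            stack.pop_back();
            stack.push_back(ref);
        } else if (t[0] == '/') {
            Item name;
            name.kind = Item::Name;
            name.name = t;
            stack.push_back(name);
        } else {
            Item integer;
            integer.kind = Item::Integer;
            integer.number = std::stoi(t); // throws on non-numeric tokens
            stack.push_back(integer);
        }
    }
    throw std::runtime_error("unterminated dictionary");
}
```

Given the tokens ``<<``, ``/Root``, ``1``, ``0``, ``R``, ``/Size``, ``7``, ``>>``, the sketch produces a dictionary in which ``/Root`` is an indirect reference to object 1, generation 0, and ``/Size`` is the integer 7.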
2801   -
2802   -.. _ref.casting:
2803   -
2804   -Casting Policy
2805   ---------------
2806   -
2807   -This section describes the casting policy followed by qpdf's
2808   -implementation. This is of no concern to qpdf's end users and largely of no
2809   -concern to people writing code that uses qpdf, but it could be of
2810   -interest to people who are porting qpdf to a new platform or who are
2811   -making modifications to the code.
2812   -
2813   -The C++ code in qpdf is free of old-style casts except where unavoidable
2814   -(e.g. where the old-style cast is in a macro provided by a third-party
2815   -header file). When there is a need for a cast, it is handled, in order
2816   -of preference, by rewriting the code to avoid the need for a cast,
2817   -calling ``const_cast``, calling ``static_cast``, calling
2818   -``reinterpret_cast``, or calling some combination of the above. As a
2819   -last resort, a compiler-specific ``#pragma`` may be used to suppress a
2820   -warning that we don't want to fix. Examples may include suppressing
2821   -warnings about the use of old-style casts in code that is shared between
2822   -C and C++ code.
2823   -
2824   -The ``QIntC`` namespace, provided by
2825   -:file:`include/qpdf/QIntC.hh`, implements safe
2826   -functions for converting between integer types. These functions do range
2827   -checking and throw a ``std::range_error``, which is a subclass of
2828   -``std::runtime_error``, if conversion from one integer type to another
2829   -results in loss of information. There are many cases in which we have to
2830   -move between different integer types because of incompatible integer
2831   -types used in interoperable interfaces. Some are unavoidable, such as
2832   -moving between sizes and offsets, and others are there because of old
2833   -code that is too entrenched to be fixable without breaking source
2834   -compatibility and causing pain for users. QPDF is compiled with extra
2835   -warnings to detect conversions with potential data loss, and all such
2836   -cases should be fixed by either using a function from ``QIntC`` or a
2837   -``static_cast``.
2838   -
2839   -When the intention is just to switch the type because of exchanging data
2840   -between incompatible interfaces, use ``QIntC``. This is the usual case.
2841   -However, there are some cases in which we are explicitly intending to
2842   -use the exact same bit pattern with a different type. This is most
2843   -common when switching between signed and unsigned characters. A lot of
2844   -qpdf's code uses unsigned characters internally, but ``std::string`` uses
2845   -``char``, which is signed on most platforms. Using ``QIntC::to_char`` would be wrong for
2846   -converting from unsigned to signed characters because a negative
2847   -``char`` value and the corresponding ``unsigned char`` value greater
2848   -than 127 *mean the same thing*. There are also
2849   -cases in which we use ``static_cast`` when working with bit fields where
2850   -we are not representing a numerical value but rather a bunch of bits
2851   -packed together in some integer type. Also note that ``size_t`` and
2852   -``long`` both typically differ between 32-bit and 64-bit environments,
2853   -so sometimes an explicit cast may not be needed to avoid warnings on one
2854   -platform but may be needed on another. A conversion with ``QIntC``
2855   -should always be used when the types are different even if the
2856   -underlying size is the same. QPDF's CI build builds on 32-bit and 64-bit
2857   -platforms, and the test suite is very thorough, so it is hard to make
2858   -any of the potential errors here without being caught in build or test.
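The distinction between a range-checked conversion and a deliberate reinterpretation of the same bits can be illustrated with a short sketch. The code below does not use ``QIntC``; ``checked_convert`` and ``byte_to_char`` are hypothetical helpers written for this example, and the check shown (round trip plus sign comparison) is one common way to detect loss of information, not necessarily how qpdf does it.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper, loosely in the spirit of a range-checked
// integer conversion. Throws std::range_error if the value does not
// fit in the destination type.
template <typename To, typename From>
To checked_convert(From value)
{
    static_assert(
        std::is_integral<From>::value && std::is_integral<To>::value,
        "integer types only");
    To result = static_cast<To>(value);
    // The value must survive a round trip and must keep its sign.
    bool round_trips = (static_cast<From>(result) == value);
    bool same_sign = ((result < To(0)) == (value < From(0)));
    if (!(round_trips && same_sign)) {
        throw std::range_error("integer conversion loses information");
    }
    return result;
}

// Deliberate reuse of the same bit pattern: a byte above 127 and the
// char that holds it mean the same thing, so a range check would be
// wrong here and a plain static_cast is what we want.
inline char byte_to_char(unsigned char b)
{
    return static_cast<char>(b);
}
```

With these helpers, converting ``300`` to ``unsigned char`` or ``-1`` to ``unsigned int`` throws, while ``byte_to_char(0xE9)`` succeeds and converts back to the same ``unsigned char`` value.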
2859   -
2860   -Non-const ``unsigned char*`` is used in the ``Pipeline`` interface. The
2861   -pipeline interface has a ``write`` call that uses ``unsigned char*``
2862   -without a ``const`` qualifier. The main reason for this is
2863   -to support pipelines that make calls to third-party libraries, such as
2864   -zlib, that don't include ``const`` in their interfaces. Unfortunately,
2865   -there are many places in the code where it is desirable to have
2866   -``const char*`` with pipelines. None of the pipeline implementations
2867   -in qpdf
2868   -currently modify the data passed to write, and doing so would be counter
2869   -to the intent of ``Pipeline``, but there is nothing in the code to
2870   -prevent this from being done. There are places in the code where
2871   -``const_cast`` is used to remove the const-ness of pointers going into
2872   -``Pipeline``\ s. This could theoretically be unsafe, but there is
2873   -adequate testing to assert that it is safe and will remain safe in
2874   -qpdf's code.
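The use of ``const_cast`` described above can be sketched as follows. ``ToySink`` is a hypothetical stand-in for a pipeline stage whose ``write`` takes a non-const pointer; it is not qpdf's ``Pipeline`` class, and ``writeString`` is an illustrative helper. The cast is safe in this sketch only because ``write`` never modifies the bytes it receives.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a pipeline stage; not qpdf's Pipeline class.
class ToySink
{
  public:
    // Non-const pointer, mirroring interfaces that omit const.
    void write(unsigned char* data, std::size_t len)
    {
        for (std::size_t i = 0; i < len; ++i) {
            collected_.push_back(static_cast<char>(data[i]));
        }
    }

    std::string const& collected() const { return collected_; }

  private:
    std::string collected_;
};

// Feed constant data into an interface that lacks const.
void writeString(ToySink& sink, std::string const& s)
{
    // First view the characters as bytes, then drop const-ness.
    // This relies on write() treating the buffer as read-only.
    auto const* bytes = reinterpret_cast<unsigned char const*>(s.data());
    sink.write(const_cast<unsigned char*>(bytes), s.size());
}
```

Keeping the cast inside a single helper like this confines the assumption that the receiver does not modify its input to one place that can be reviewed and tested.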
2875   -
2876   -.. _ref.encryption:
2877   -
2878   -Encryption
2879   -----------
2880   -
2881   -Encryption is supported transparently by qpdf. When opening a PDF file,
2882   -if an encryption dictionary exists, the ``QPDF`` object processes this
2883   -dictionary using the password (if any) provided. The primary decryption
2884   -key is computed and cached. No further access is made to the encryption
2885   -dictionary after that time. When an object is read from a file, the
2886   -object ID and generation of the object in which it is contained are
2887   -always known. Using this information along with the stored encryption
2888   -key, all stream and string objects are transparently decrypted. Raw
2889   -encrypted objects are never stored in memory. This way, nothing in the
2890   -library ever has to know or care whether it is reading an encrypted
2891   -file.
2892   -
2893   -An interface is also provided for writing encrypted streams and strings
2894   -given an encryption key. This is used by ``QPDFWriter`` when it rewrites
2895   -encrypted files.
2896   -
2897   -When copying encrypted files, unless otherwise directed, qpdf will
2898   -preserve any encryption in force in the original file. qpdf can do this
2899   -with either the user or the owner password. There is no difference in
2900   -capability based on which password is used. When 40-bit or 128-bit
2901   -encryption keys are used, the user password can be recovered with the
2902   -owner password. With 256-bit keys, the user and owner passwords are used
2903   -independently to encrypt the actual encryption key, so while either can
2904   -be used, the owner password can no longer be used to recover the user
2905   -password.
2906   -
2907   -Starting with version 4.0.0, qpdf can read files that are not encrypted
2908   -but that contain encrypted attachments, but it cannot write such files.
2909   -qpdf also requires the password to be specified in order to open the
2910   -file, not just to extract attachments, since once the file is open, all
2911   -decryption is handled transparently. When copying files like this while
2912   -preserving encryption, qpdf will apply the file's encryption to
2913   -everything in the file, not just to the attachments. When decrypting the
2914   -file, qpdf will decrypt the attachments. In general, when copying PDF
2915   -files with multiple encryption formats, qpdf will choose the newest
2916   -format. The only exception to this is that clear-text metadata will be
2917   -preserved as clear-text if it is that way in the original file.
2918   -
2919   -One point of confusion some people have about encrypted PDF files is
2920   -that encryption is not the same as password protection. Password
2921   -protected files are always encrypted, but it is also possible to create
2922   -encrypted files that do not have passwords. Internally, such files use
2923   -the empty string as a password, and most readers try the empty string
2924   -first to see if it works and prompt for a password only if the empty
2925   -string doesn't work. Normally such files have an empty user password and
2926   -a non-empty owner password. In that way, if the file is opened by an
2927   -ordinary reader without specification of password, the restrictions
2928   -specified in the encryption dictionary can be enforced. Most users
2929   -wouldn't even realize such a file was encrypted. Since qpdf always
2930   -ignores the restrictions (except for the purpose of reporting what they
2931   -are), qpdf doesn't care which password you use. QPDF will allow you to
2932   -create PDF files with non-empty user passwords and empty owner
2933   -passwords. Some readers will require a password when you open these
2934   -files, and others will open the files without a password and not enforce
2935   -restrictions. Having a non-empty user password and an empty owner
2936   -password doesn't really make sense because it would mean that opening
2937   -the file with the user password would be more restrictive than not
2938   -supplying a password at all. QPDF also allows you to create PDF files
2939   -with the same password as both the user and owner password. Some readers
2940   -will not ever allow such files to be accessed without restrictions
2941   -because they never try the password as the owner password if it works as
2942   -the user password. Nonetheless, one of the powerful aspects of qpdf is
2943   -that it allows you to finely specify the way encrypted files are
2944   -created, even if the results are not useful to some readers. One use
2945   -case for this would be for testing a PDF reader to ensure that it
2946   -handles odd configurations of input files.
2947   -
2948   -.. _ref.random-numbers:
2949   -
2950   -Random Number Generation
2951   -------------------------
2952   -
2953   -QPDF generates random numbers to support generation of encrypted data.
2954   -Starting in qpdf 10.0.0, qpdf uses the crypto provider as its source of
2955   -random numbers. Older versions used the OS-provided source of secure
2956   -random numbers or, if allowed at build time, insecure random numbers
2957   -from stdlib. Starting with version 5.1.0, you can disable use of
2958   -OS-provided secure random numbers at build time. This is especially
2959   -useful on Windows if you want to avoid a dependency on Microsoft's
2960   -cryptography API. You can also supply your own random data provider. For
2961   -details on how to do this, please refer to the top-level README.md file
2962   -in the source distribution and to comments in
2963   -:file:`QUtil.hh`.
2964   -
2965   -.. _ref.adding-and-remove-pages:
2966   -
2967   -Adding and Removing Pages
2968   --------------------------
2969   -
2970   -While qpdf's API has supported adding and modifying objects for some
2971   -time, version 3.0 introduces specific methods for adding and removing
2972   -pages. These are largely convenience routines that handle two tricky
2973   -issues: pushing inheritable resources from the ``/Pages`` tree down to
2974   -individual pages and manipulation of the ``/Pages`` tree itself. For
2975   -details, see ``addPage`` and surrounding methods in
2976   -:file:`QPDF.hh`.
2977   -
2978   -.. _ref.reserved-objects:
2979   -
2980   -Reserving Object Numbers
2981   -------------------------
2982   -
2983   -Version 3.0 of qpdf introduced the concept of reserved objects. These
2984   -are seldom needed for ordinary operations, but there are cases in which
2985   -you may want to add a series of indirect objects with references to each
2986   -other to a ``QPDF`` object. This causes a problem because you can't
2987   -determine the object ID that a new indirect object will have until you
2988   -add it to the ``QPDF`` object with ``QPDF::makeIndirectObject``. The
2989   -only way to add two mutually referential objects to a ``QPDF`` object
2990   -prior to version 3.0 would be to add the new objects first and then make
2991   -them refer to each other after adding them. Now it is possible to create
2992   -a *reserved object* using
2993   -``QPDFObjectHandle::newReserved``. This is an indirect object that stays
2994   -"unresolved" even if it is queried for its type. So now, if you want to
2995   -create a set of mutually referential objects, you can create
2996   -reservations for each one of them and use those reservations to
2997   -construct the references. When finished, you can call
2998   -``QPDF::replaceReserved`` to replace the reserved objects with the real
2999   -ones. This functionality will never be needed by most applications, but
3000   -it is used internally by QPDF when copying objects from other PDF files,
3001   -as discussed in :ref:`ref.foreign-objects`. For an example of how to use reserved
3002   -objects, search for ``newReserved`` in
3003   -:file:`test_driver.cc` in qpdf's sources.
3004   -
3005   -.. _ref.foreign-objects:
3006   -
3007   -Copying Objects From Other PDF Files
3008   -------------------------------------
3009   -
3010   -Version 3.0 of qpdf introduced the ability to copy objects into a
3011   -``QPDF`` object from a different ``QPDF`` object, which we refer to as
3012   -*foreign objects*. This allows arbitrary
3013   -merging of PDF files. The "from" ``QPDF`` object must remain valid after
3014   -the copy as discussed in the note below. The
3015   -:command:`qpdf` command-line tool provides limited
3016   -support for basic page selection, including merging in pages from other
3017   -files, but the library's API makes it possible to implement arbitrarily
3018   -complex merging operations. The main method for copying foreign objects
3019   -is ``QPDF::copyForeignObject``. This takes an indirect object from
3020   -another ``QPDF`` and copies it recursively into this object while
3021   -preserving all object structure, including circular references. This
3022   -means you can add a direct object that you create from scratch to a
3023   -``QPDF`` object with ``QPDF::makeIndirectObject``, and you can add an
3024   -indirect object from another file with ``QPDF::copyForeignObject``. The
3025   -fact that ``QPDF::makeIndirectObject`` does not automatically detect a
3026   -foreign object and copy it is an explicit design decision. Copying a
3027   -foreign object seems like a sufficiently significant thing to do that it
3028   -should be done explicitly.
3029   -
3030   -The other way to copy foreign objects is by passing a page from one
3031   -``QPDF`` to another by calling ``QPDF::addPage``. In contrast to
3032   -``QPDF::makeIndirectObject``, this method automatically distinguishes
3033   -between indirect objects in the current file, foreign objects, and
3034   -direct objects.
3035   -
3036   -Please note: when you copy objects from one ``QPDF`` to another, the
3037   -source ``QPDF`` object must remain valid until you have finished with
3038   -the destination object. This is because the original object is still
3039   -used to retrieve any referenced stream data from the copied object.
3040   -
3041   -.. _ref.rewriting:
3042   -
3043   -Writing PDF Files
3044   ------------------
3045   -
3046   -The qpdf library supports file writing of ``QPDF`` objects to PDF files
3047   -through the ``QPDFWriter`` class. The ``QPDFWriter`` class has two
3048   -writing modes: one for non-linearized files, and one for linearized
3049   -files. See :ref:`ref.linearization` for a description of how
3050   -linearization is implemented. This section describes how we write
3051   -non-linearized files, including the creation of QDF files (see :ref:`ref.qdf`).
3052   -
3053   -This outline was written prior to implementation and is not exactly
3054   -accurate, but it provides a correct "notional" idea of how writing
3055   -works. Look at the code in ``QPDFWriter`` for exact details.
3056   -
3057   -- Initialize state:
3058   -
3059   - - next object number = 1
3060   -
3061   - - object queue = empty
3062   -
3063   - - renumber table: old object id/generation to new id/0 = empty
3064   -
3065   - - xref table: new id -> offset = empty
3066   -
3067   -- Create a QPDF object from a file.
3068   -
3069   -- Write header for new PDF file.
3070   -
3071   -- Request the trailer dictionary.
3072   -
3073   -- For each value that is an indirect object, grab the next object
3074   - number (via an operation that returns and increments the number). Map
3075   - object to new number in renumber table. Push object onto queue.
3076   -
3077   -- While there are more objects on the queue:
3078   -
3079   - - Pop queue.
3080   -
3081   - - Look up object's new number *n* in the renumbering table.
3082   -
3083   - - Store current offset into xref table.
3084   -
3085   - Write :samp:`{n} 0 obj`.
3086   -
3087   - - If object is null, whether direct or indirect, write out null,
3088   - thus eliminating unresolvable indirect object references.
3089   -
3090   - If the object is a stream, write stream contents, piped
3091   - through any filters as required, to a memory buffer. Use this
3092   - buffer to determine the stream length.
3093   -
3094   - - If object is not a stream, array, or dictionary, write out its
3095   - contents.
3096   -
3097   - - If object is an array or dictionary (including stream), traverse
3098   - its elements (for array) or values (for dictionaries), handling
3099   - recursive dictionaries and arrays, looking for indirect objects.
3100   - When an indirect object is found, if it is not resolvable, ignore.
3101   - (This case is handled when writing it out.) Otherwise, look it up
3102   - in the renumbering table. If not found, grab the next available
3103   - object number, assign to the referenced object in the renumbering
3104   - table, and push the referenced object onto the queue. As a special
3105   - case, when writing out a stream dictionary, replace length,
3106   - filters, and decode parameters as required.
3107   -
3108   - Write out dictionary or array, replacing any unresolvable indirect
3109   - object references with null (the PDF specification says that a
3110   - reference to a non-existent object is legal and resolves to null) and any
3111   - resolvable ones with references to the renumbered objects.
3112   -
3113   - - If the object is a stream, write ``stream\n``, the stream contents
3114   - (from the memory buffer), and ``\nendstream\n``.
3115   -
3116   - - When done, write ``endobj``.
3117   -
3118   -Once we have finished the queue, all referenced objects will have been
3119   -written out and all deleted objects or unreferenced objects will have
3120   -been skipped. The new cross-reference table will contain an offset for
3121   -every new object number from 1 up to the number of objects written. This
3122   -can be used to write out a new xref table. Finally we can write out the
3123   -trailer dictionary with appropriately computed /ID (see spec, 8.3, File
3124   -Identifiers), the cross reference table offset, and ``%%EOF``.
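
The queue-and-renumber traversal outlined above can be sketched in a few lines of Python. This is a notional illustration only, not the code in ``QPDFWriter``: the object model (a dict mapping old object IDs to values), the ``Ref`` class standing in for an indirect reference, and the function name ``renumber`` are all assumptions made for the example. It covers only the numbering and ordering steps, not the actual writing of objects or the xref table.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for an indirect reference to an old object ID."""
    old_id: int


def renumber(objects, trailer):
    """Return (renumber table, write order) for objects reachable from trailer.

    objects maps old object IDs to values built from dicts, lists, Refs,
    and scalars. Unresolvable references are ignored during traversal.
    """
    table = {}       # old object ID -> new object ID
    order = []       # old object IDs in the order they would be written
    queue = deque()
    next_id = 1

    def visit(value):
        nonlocal next_id
        if isinstance(value, Ref):
            # Skip unresolvable references and objects already numbered.
            if value.old_id in objects and value.old_id not in table:
                table[value.old_id] = next_id
                next_id += 1
                queue.append(value.old_id)
        elif isinstance(value, dict):
            for item in value.values():
                visit(item)
        elif isinstance(value, list):
            for item in value:
                visit(item)

    visit(trailer)
    while queue:
        old_id = queue.popleft()
        order.append(old_id)
        visit(objects[old_id])
    return table, order
```

Objects that are never reached from the trailer never receive a new number, which is how unreferenced objects end up being skipped in this model.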
3125   -
3126   -.. _ref.filtered-streams:
3127   -
3128   -Filtered Streams
3129   -----------------
3130   -
3131   -Support for streams is implemented through the ``Pipeline`` interface
3132   -which was designed for this package.
3133   -
3134   -When reading streams, create a series of ``Pipeline`` objects. The
3135   -``Pipeline`` abstract base class requires implementations of ``write()`` and
3136   -``finish()`` and provides an implementation of ``getNext()``. Each
3137   -pipeline object, upon receiving data, does whatever it is going to do
3138   -and then writes the data (possibly modified) to its successor.
3139   -Alternatively, a pipeline may be an end-of-the-line pipeline that does
3140   -something like store its output to a file or a memory buffer ignoring a
3141   -successor. For additional details, look at
3142   -:file:`Pipeline.hh`.
3143   -
3144   -``QPDF`` can read raw or filtered streams. When reading a filtered
3145   -stream, the ``QPDF`` class creates a ``Pipeline`` object for each
3146   -appropriate filter and chains them together. The last filter
3147   -should write to whatever type of output is required. The ``QPDF`` class
3148   -has an interface to write raw or filtered stream contents to a given
3149   -pipeline.
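
As a rough illustration of the chaining idea, here is a minimal Python analogue. It is a sketch under stated assumptions rather than a rendering of :file:`Pipeline.hh`: the class names ``FlateDecode`` and ``Buffer`` and the method ``get_next`` are invented for this example, and Python's ``zlib`` module stands in for a real stream filter.

```python
import zlib
from abc import ABC, abstractmethod


class Pipeline(ABC):
    """Base class: subclasses implement write() and finish()."""

    def __init__(self, next_pipeline=None):
        self._next = next_pipeline

    def get_next(self):
        return self._next

    @abstractmethod
    def write(self, data: bytes) -> None: ...

    @abstractmethod
    def finish(self) -> None: ...


class FlateDecode(Pipeline):
    """Hypothetical filter: inflates data and passes it to its successor."""

    def __init__(self, next_pipeline):
        super().__init__(next_pipeline)
        self._inflater = zlib.decompressobj()

    def write(self, data):
        self.get_next().write(self._inflater.decompress(data))

    def finish(self):
        self.get_next().write(self._inflater.flush())
        self.get_next().finish()


class Buffer(Pipeline):
    """End-of-the-line pipeline: collects its input in memory."""

    def __init__(self):
        super().__init__(None)
        self.chunks = []
        self.finished = False

    def write(self, data):
        self.chunks.append(data)

    def finish(self):
        self.finished = True

    def getvalue(self):
        return b"".join(self.chunks)
```

A caller would build the chain from the end backwards, for example ``FlateDecode(Buffer())``, write the raw stream data to the head of the chain in as many pieces as it likes, and call ``finish()`` once at the end.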
3150   -
3151   -.. _ref.object-accessors:
3152   -
3153   -Object Accessor Methods
3154   ------------------------
3155   -
3156   -..
3157   - This section is referenced in QPDFObjectHandle.hh
3158   -
3159   -For general information about how to access instances of
3160   -``QPDFObjectHandle``, please see the comments in
3161   -:file:`QPDFObjectHandle.hh`. Search for "Accessor
3162   -methods". This section provides a more in-depth discussion of the
3163   -behavior and the rationale for the behavior.
3164   -
3165   -*Why were type errors made into warnings?* When type checks were
3166   -introduced into qpdf in the early days, it was expected that type errors
3167   -would only occur as a result of programmer error. However, in practice,
3168   -type errors would occur with malformed PDF files because of assumptions
3169   -made in code, including code within the qpdf library and code written by
3170   -library users. The most common case would be chaining calls to
3171   -``getKey()`` to access keys deep within a dictionary. In many cases,
3172   -qpdf would be able to recover from these situations, but the old
3173   -behavior often resulted in crashes rather than graceful recovery. For
3174   -this reason, the errors were changed to warnings.
3175   -
3176   -*Why even warn about type errors when the user can't usually do anything
3177   -about them?* Type warnings are extremely valuable during development.
3178   -Since it's impossible to catch at compile time things like typos in
3179   -dictionary key names or logic errors around what the structure of a PDF
3180   -file might be, the presence of type warnings can save lots of developer
3181   -time. They have also proven useful in exposing issues in qpdf itself
3182   -that would have otherwise gone undetected.
3183   -
3184   -*Can there be a type-safe ``QPDFObjectHandle``?* It would be great if
3185   -``QPDFObjectHandle`` could be more strongly typed so that you'd have to
3186   -check that something was of a particular type before calling
3187   -type-specific accessor methods. However, implementing this at this stage
3188   -of the library's history would be quite difficult, and it would make
3189   -the common pattern of drilling into an object no longer work. While it
3190   -would be possible to have a parallel interface, it would create a lot of
3191   -extra code. If qpdf were written in a language like Rust, an interface
3192   -like this would make a lot of sense, but, for a variety of reasons, the
3193   -qpdf API is consistent with other APIs of its time, relying on exception
3194   -handling to catch errors. The underlying PDF objects are inherently not
3195   -type-safe. Forcing stronger type safety in ``QPDFObjectHandle`` would
3196   -ultimately cause a lot more code to have to be written and would likely
3197   -make software that uses qpdf more brittle, and even so, checks would
3198   -have to occur at runtime.
3199   -
3200   -*Why do type errors sometimes raise exceptions?* The way warnings work
3201   -in qpdf requires a ``QPDF`` object to be associated with an object
3202   -handle for a warning to be issued. It would be nice if this could be
3203   -fixed, but it would require major changes to the API. Rather than
3204   -throwing away these conditions, we convert them to exceptions. It's not
3205   -that bad though. Since any object handle that was read from a file has
3206   -an associated ``QPDF`` object, it would only be type errors on objects
3207   -that were created explicitly that would cause exceptions, and in that
3208   -case, type errors are much more likely to be the result of a coding
3209   -error than invalid input.
3210   -
3211   -*Why does the behavior of a type exception differ between the C and C++
3212   -API?* There is no way to throw and catch exceptions in C short of
3213   -something like ``setjmp`` and ``longjmp``, and that approach is not
3214   -portable across language barriers. Since the C API is often used from
3215   -other languages, it's important to keep things as simple as possible.
3216   -Starting in qpdf 10.5, exceptions that used to crash code using the C
3217   -API will be written to stderr by default, and it is possible to register
3218   -an error handler. There's no reason that the error handler can't
3219   -simulate exception handling in some way, such as by using ``setjmp`` and
3220   -``longjmp`` or by setting some variable that can be checked after
3221   -library calls are made. In retrospect, it might have been better if the
3222   -C API object handle methods returned error codes like the other methods
3223   -and set return values in passed-in pointers, but this would complicate
3224   -both the implementation and the use of the library for a case that is
3225   -actually quite rare and largely avoidable.
3226   -
3227   -.. _ref.linearization:
3228   -
3229   -Linearization
3230   -=============
3231   -
3232   -This chapter describes how ``QPDF`` and ``QPDFWriter`` implement
3233   -creation and processing of linearized PDFs.
3234   -
3235   -.. _ref.linearization-strategy:
3236   -
3237   -Basic Strategy for Linearization
3238   ---------------------------------
3239   -
3240   -To avoid the circular problem of having the qpdf library validate its
3241   -own linearized files, we have a special linearized file checking mode
3242   -which can be invoked via :command:`qpdf
3243   ---check-linearization` (or :command:`qpdf
3244   ---check`). This mode reads the linearization parameter
3245   -dictionary and the hint streams and validates that object ordering,
3246   -parameters, and hint stream contents are correct. The validation code
3247   -was first tested against linearized files created by external tools
3248   -(Acrobat and pdlin) and then used to validate files created by
3249   -``QPDFWriter`` itself.
3250   -
3251   -.. _ref.linearized.preparation:
3252   -
3253   -Preparing For Linearization
3254   ----------------------------
3255   -
3256   -Before creating a linearized PDF file from any other PDF file, the PDF
3257   -file must be altered such that all page attributes are propagated down
3258   -to the page level (and not inherited from parents in the ``/Pages``
3259   -tree). We also have to know which objects refer to which other objects,
3260   -being concerned with page boundaries and a few other cases. We refer to
3261   -this part of preparing the PDF file as
3262   -*optimization*, discussed in
3263   -:ref:`ref.optimization`. Note that, in this context, the
3264   -term *optimization* is a qpdf term, and the
3265   -term *linearization* is a term from the PDF
3266   -specification. Do not be confused by the fact that many applications
3267   -refer to linearization as optimization or web optimization.
3268   -
3269   -When creating linearized PDF files from optimized PDF files, there are
3270   -really only a few issues that need to be dealt with:
3271   -
3272   -- Creation of hints tables
3273   -
3274   -- Placing objects in the correct order
3275   -
3276   -- Filling in offsets and byte sizes
3277   -
3278   -.. _ref.optimization:
3279   -
3280   -Optimization
3281   -------------
3282   -
3283   -In order to perform various operations such as linearization and
3284   -splitting files into pages, it is necessary to know which objects are
3285   -referenced by which pages, page thumbnails, and root and trailer
3286   -dictionary keys. It is also necessary to ensure that all page-level
3287   -attributes appear directly at the page level and are not inherited from
3288   -parents in the pages tree.
3289   -
3290   -We refer to the process of enforcing these constraints as
3291   -*optimization*. As mentioned above, note
3292   -that some applications refer to linearization as optimization. Although
3293   -this optimization was initially motivated by the need to create
3294   -linearized files, we are using these terms separately.
3295   -
3296   -PDF file optimization is implemented in the
3297   -:file:`QPDF_optimization.cc` source file. That file
3298   -is richly commented and serves as the primary reference for the
3299   -optimization process.
3300   -
3301   -After optimization has been completed, the private member variables
3302   -``obj_user_to_objects`` and ``object_to_obj_users`` in ``QPDF`` have
3303   -been populated. Any object that has more than one value in the
3304   -``object_to_obj_users`` table is shared. Any object that has exactly one
3305   -value in the ``object_to_obj_users`` table is private. To find all the
3306   -private objects in a page or a trailer or root dictionary key, one
3307   -merely has to make this determination for each element in the
3308   -``obj_user_to_objects`` table for the given page or key.
3309   -
3310   -Note that pages and thumbnails have different object user types, so the
3311   -above test on a page will not include objects that are referenced only by
3312   -the page's thumbnail dictionary.
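
The shared/private test described above amounts to a lookup in two inverse tables. The following Python sketch models them as plain dictionaries of sets. The user labels such as ``"page:0"`` and the helper names ``invert`` and ``private_objects`` are hypothetical; the real tables are private members of ``QPDF`` with their own key types.

```python
def invert(obj_user_to_objects):
    """Build object -> set of users from user -> set of objects."""
    object_to_obj_users = {}
    for user, objs in obj_user_to_objects.items():
        for obj in objs:
            object_to_obj_users.setdefault(obj, set()).add(user)
    return object_to_obj_users


def private_objects(user, obj_user_to_objects, object_to_obj_users):
    """Return the objects used by `user` and by no other object user."""
    return {
        obj
        for obj in obj_user_to_objects[user]
        if len(object_to_obj_users[obj]) == 1
    }
```

Because a page and its thumbnail are separate users in this model, an object listed only under a thumbnail user is private to the thumbnail and does not show up when the test is run for the page.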
3313   -
3314   -.. _ref.linearization.writing:
3315   -
3316   -Writing Linearized Files
3317   -------------------------
3318   -
3319   -We will create files with only primary hint streams. We will never write
3320   -overflow hint streams. (As of PDF version 1.4, Acrobat doesn't either,
3321   -and they are never necessary.) The hint streams contain offset
3322   -information to objects that point to where they would be if the hint
3323   -stream were not present. This means that we have to calculate all object
3324   -positions before we can generate and write the hint table. This means
3325   -that we have to generate the file in two passes. To make this reliable,
3326   -``QPDFWriter`` in linearization mode invokes exactly the same code twice
3327   -to write the file to a pipeline.
3328   -
3329   -In the first pass, the target pipeline is a count pipeline chained to a
3330   -discard pipeline. The count pipeline simply passes its data through to
3331   -the next pipeline in the chain but can return the number of bytes passed
3332   -through it at any intermediate point. The discard pipeline is an
3333   -end-of-the-line pipeline that just throws its data away. The hint stream is not
3334   -written and dummy values with adequate padding are stored in the first
3335   -cross reference table, linearization parameter dictionary, and /Prev key
3336   -of the first trailer dictionary. All the offset, length, object
3337   -renumbering information, and anything else we need for the second pass
3338   -is stored.
3339   -
3340   -At the end of the first pass, this information is passed to the ``QPDF``
3341   -class which constructs a compressed hint stream in a memory buffer and
3342   -returns it. ``QPDFWriter`` uses this information to write a complete
3343   -hint stream object into a memory buffer. At this point, the length of
3344   -the hint stream is known.
3345   -
3346   -In the second pass, the end of the pipeline chain is a regular file
3347   -instead of a discard pipeline, and we have known values for all the
3348   -offsets and lengths that we didn't have in the first pass. We have to
3349   -adjust offsets that appear after the start of the hint stream by the
3350   -length of the hint stream, which is known. Anything that is of variable
3351   -length is padded, with the padding code surrounding any writing code
3352   -that differs in the two passes. This ensures that changes to the way
3353   -things are represented never result in offsets that were gathered
3354   -during the first pass becoming incorrect for the second pass.
3355   -
3356   -Using this strategy, we can write linearized files to a non-seekable
3357   -output stream with only a single pass to disk or wherever the output is
3358   -going.
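
The two-pass idea can be shown with a toy example in Python. This is only a sketch of the technique, not of ``QPDFWriter``: the ``Count`` and ``Discard`` classes, the ``write_file`` function, and the made-up file layout are assumptions for illustration, and the sketch does not model inserting a hint stream. What it does show is the same writing code running twice, with a fixed-width padded field keeping the layout identical in both passes.

```python
import io


class Count:
    """Passes data through to the next writer and tracks bytes seen so far."""

    def __init__(self, next_writer):
        self.count = 0
        self._next = next_writer

    def write(self, data):
        self.count += len(data)
        self._next.write(data)


class Discard:
    """End-of-the-line pipeline that throws its data away."""

    def write(self, data):
        pass


def write_file(out, body_offset=None):
    """Write a toy file and return the offset at which the body starts.

    The header stores the body offset in a zero-padded field, so the
    layout is the same whether or not the real value is known yet.
    """
    pipe = Count(out)
    value = 0 if body_offset is None else body_offset
    pipe.write(b"offset=%010d\n" % value)  # dummy value in the first pass
    start = pipe.count
    pipe.write(b"body\n")
    return start


first = write_file(Discard())       # pass 1: measure only
result = io.BytesIO()
second = write_file(result, first)  # pass 2: same code, real values
```

Only the second pass touches the real output, and it writes strictly forward, which is why a non-seekable destination is sufficient.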
3359   -
3360   -.. _ref.linearization-data:
3361   -
3362   -Calculating Linearization Data
3363   -------------------------------
3364   -
3365   -Once a file is optimized, we have information about which objects access
3366   -which other objects. We can then process these tables to decide which
3367   -part (as described in "Linearized PDF Document Structure" in the PDF
3368   -specification) each object is contained within. This tells us the exact
3369   -order in which objects are written. The ``QPDFWriter`` class asks for
3370   -this information and enqueues objects for writing in the proper order.
3371   -It also turns on a check that causes an exception to be thrown if an
3372   -object is encountered that has not already been queued. (This could
3373   -happen only if there were a bug in the traversal code used to calculate
3374   -the linearization data.)
3375   -
3376   -.. _ref.linearization-issues:
3377   -
3378   -Known Issues with Linearization
3379   --------------------------------
3380   -
3381   -There are a handful of known issues with this linearization code. These
3382   -issues do not appear to impact the behavior of linearized files which
3383   -still work as intended: it is possible for a web browser to begin to
3384   -display them before they are fully downloaded. In fact, it seems that
3385   -various other programs that create linearized files have many of these
3386   -same issues. These items make reference to terminology used in the
3387   -linearization appendix of the PDF specification.
3388   -
3389   -- Thread Dictionary information keys appear in part 4 with the rest of
3390   - Threads instead of in part 9. Objects in part 9 are not grouped
3391   - together functionally.
3392   -
3393   -- We are not calculating numerators for shared object positions within
3394   - content streams or interleaving them within content streams.
3395   -
3396   -- We generate only page offset, shared object, and outline hint tables.
3397   - It would be relatively easy to add some additional tables. We gather
3398   - most of the information needed to create thumbnail hint tables. There
3399   - are comments in the code about this.
3400   -
3401   -.. _ref.linearization-debugging:
3402   -
3403   -Debugging Note
3404   ---------------
3405   -
3406   -The :command:`qpdf --show-linearization` command can show
3407   -the complete contents of linearization hint streams. To look at the raw
3408   -data, you can extract the filtered contents of the linearization hint
3409   -tables using :command:`qpdf --show-object=n
3410   ---filtered-stream-data`. Then, to convert this into a bit
3411   -stream (since linearization tables are bit streams written without
3412   -regard to byte boundaries), you can pipe the resulting data through the
3413   -following perl code:
3414   -
3415   -.. code-block:: perl
3416   -
3417   - use bytes;
3418   - binmode STDIN;
3419   - undef $/;
3420   - my $a = <STDIN>;
3421   - my @ch = split(//, $a);
3422   - map { printf("%08b", ord($_)) } @ch;
3423   - print "\n";
3424   -
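
For readers who prefer Python, the same byte-to-bit conversion can be sketched as follows. The function name ``to_bit_string`` is an assumption made for this example and is not part of qpdf.

```python
def to_bit_string(data: bytes) -> str:
    """Render each byte as eight binary digits, most significant bit first."""
    return "".join(f"{byte:08b}" for byte in data)
```

To use it as a filter like the perl code, one could read the input with ``sys.stdin.buffer.read()`` and print the result followed by a newline.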
3425   -.. _ref.object-and-xref-streams:
3426   -
3427   -Object and Cross-Reference Streams
3428   -==================================
3429   -
3430   -This chapter provides information about the implementation of object
3431   -stream and cross-reference stream support in qpdf.
3432   -
3433   -.. _ref.object-streams:
3434   -
3435   -Object Streams
3436   ---------------
3437   -
3438   -Object streams can contain any regular object except the following:
3439   -
3440   -- stream objects
3441   -
3442   -- objects with generation > 0
3443   -
3444   -- the encryption dictionary
3445   -
3446   -- objects containing the /Length of another stream
3447   -
3448   -In addition, Adobe Reader (at least as of version 8.0.0) appears to not
3449   -be able to handle having the document catalog appear in an object stream
3450   -if the file is encrypted, though this is not specifically disallowed by
3451   -the specification.
3452   -
3453   -There are additional restrictions for linearized files. See
3454   -:ref:`ref.object-streams-linearization` for details.
3455   -
3456   -The PDF specification refers to objects in object streams as "compressed
3457   -objects" regardless of whether the object stream is compressed.
3458   -
3459   -The generation number of every object in an object stream must be zero.
3460   -It is possible to delete and replace an object in an object stream with
3461   -a regular object.
3462   -
3463   -The object stream dictionary has the following keys:
3464   -
3465   -- ``/N``: number of objects
3466   -
3467   -- ``/First``: byte offset of first object
3468   -
3469   -- ``/Extends``: indirect reference to stream that this extends
3470   -
3471   -Stream collections are formed with ``/Extends``. They must form a
3472   -directed acyclic graph. These can be used for semantic information and
3473   -are not meaningful to the PDF document's syntactic structure. Although
3474   -qpdf preserves stream collections, it never generates them and doesn't
3475   -make use of this information in any way.
3476   -
3477   -The specification recommends limiting the number of objects in an object
3478   -stream for efficiency in reading and decoding. Acrobat 6 uses no more
3479   -than 100 objects per object stream for linearized files and no more than 200
3480   -objects per stream for non-linearized files. ``QPDFWriter``, in object
3481   -stream generation mode, never puts more than 100 objects in an object
3482   -stream.
3483   -
3484   -Object stream contents consist of *N* pairs of integers, each of which
3485   -is the object number and the byte offset of the object relative to the
3486   -first object in the stream, followed by the objects themselves,
3487   -concatenated.
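
To make the layout concrete, here is a small Python sketch that splits already-decoded object stream data into its member objects. It is an illustration under assumptions, not qpdf's parser: the function name ``parse_object_stream`` is invented, the offsets are assumed to appear in increasing order, and each object is returned as raw bytes rather than being parsed.

```python
def parse_object_stream(data: bytes, n: int, first: int):
    """Split decoded object stream data into {object number: raw bytes}.

    n and first correspond to the /N and /First dictionary keys.
    """
    # The first `first` bytes hold n pairs: object number, relative offset.
    header = [int(token) for token in data[:first].split()]
    pairs = list(zip(header[0:2 * n:2], header[1:2 * n:2]))
    objects = {}
    for i, (number, offset) in enumerate(pairs):
        start = first + offset
        # An object ends where the next one begins, or at end of data.
        end = first + pairs[i + 1][1] if i + 1 < n else len(data)
        objects[number] = data[start:end].strip()
    return objects
```

Note that the offsets are relative to ``/First``, not to the start of the stream data, which is why ``first`` is added to each one.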
3488   -
3489   -.. _ref.xref-streams:
3490   -
3491   -Cross-Reference Streams
3492   ------------------------
3493   -
3494   -For non-hybrid files, the value following ``startxref`` is the byte
3495   -offset to the xref stream rather than the word ``xref``.
3496   -
3497   -For hybrid files (files containing both xref tables and cross-reference
3498   -streams), the xref table's trailer dictionary contains the key
3499   -``/XRefStm`` whose value is the byte offset to a cross-reference stream
3500   -that supplements the xref table. A PDF 1.5-compliant application should
3501   -read the xref table first. Then it should replace any object that it has
3502   -already seen with any defined in the xref stream. Then it should follow
3503   -any ``/Prev`` pointer in the original xref table's trailer dictionary.
3504   -The specification is not clear about what should be done, if anything,
3505   -with a ``/Prev`` pointer in the xref stream referenced by an xref table.
3506   -The ``QPDF`` class ignores it, which is probably reasonable since, if
3507   -this case were to appear for any sensible PDF file, the previous xref
3508   -table would probably have a corresponding ``/XRefStm`` pointer of its
3509   -own. For example, if a hybrid file were appended, the appended section
3510   -would have its own xref table and ``/XRefStm``. The appended xref table
3511   -would point to the previous xref table which would point to the
3512   -``/XRefStm``, meaning that the new ``/XRefStm`` doesn't have to point to
3513   -it.
3514   -
3515   -Since xref streams must be read very early, they may not be encrypted,
3516   -and they may not contain indirect objects for keys required to read them,
3517   -which are these:
3518   -
3519   -- ``/Type``: value ``/XRef``
3520   -
3521   -- ``/Size``: value *n+1*: where *n* is highest object number (same as
3522   - ``/Size`` in the trailer dictionary)
3523   -
3524   -- ``/Index`` (optional): value
3525   - ``[:samp:`{n count}` ...]`` used to determine
3526   - which objects' information is stored in this stream. The default is
3527   - ``[0 /Size]``.
3528   -
3529   -- ``/Prev``: value :samp:`{offset}`: byte
3530   - offset of previous xref stream (same as ``/Prev`` in the trailer
3531   - dictionary)
3532   -
3533   -- ``/W [...]``: sizes of each field in the xref table
3534   -
3535   -The other fields in the xref stream, which may be indirect if desired,
3536   -are the union of those from the xref table's trailer dictionary.
3537   -
3538   -.. _ref.xref-stream-data:
3539   -
3540   -Cross-Reference Stream Data
3541   -~~~~~~~~~~~~~~~~~~~~~~~~~~~
3542   -
3543   -The stream data is binary and encoded in big-endian byte order. Entries
3544   -are concatenated, and each entry has a length equal to the total of the
3545   -entries in ``/W`` above. Each entry consists of one or more fields, the
3546   -first of which is the type of the field. The number of bytes for each
3547   -field is given by ``/W`` above. A 0 in ``/W`` indicates that the field
3548   -is omitted and has the default value. The default value for the field
3549   -type is "``1``". All other default values are "``0``".
3550   -
3551   -PDF 1.5 has three field types:
3552   -
3553   -- 0: for free objects. Format: ``0 obj next-generation``, same as the
3554   - free table in a traditional cross-reference table
3555   -
3556   -- 1: regular non-compressed object. Format: ``1 offset generation``
3557   -
3558   -- 2: for objects in object streams. Format: ``2 object-stream-number
3559   - index``, the number of the object stream containing the object and the
3560   - index within the object stream of the object.
3561   -
3562   -It seems standard to have the first entry in the table be ``0 0 0``
3563   -instead of ``0 0 ffff`` if there are no deleted objects.
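
The entry layout described above can be decoded with a few lines of Python. This is a hedged sketch, not qpdf's implementation: the function name ``decode_xref_entries`` is an assumption, the input is taken to be the already-decoded stream data, and ``w`` is taken to be a three-element ``/W`` array with a non-zero total width.

```python
def decode_xref_entries(data: bytes, w):
    """Decode cross-reference stream data into (type, field2, field3) tuples.

    w is the /W array: the byte width of each of the three fields.
    """
    entry_size = sum(w)
    defaults = (1, 0, 0)  # an omitted type field defaults to 1
    entries = []
    for pos in range(0, len(data), entry_size):
        fields = []
        offset = pos
        for width, default in zip(w, defaults):
            if width == 0:
                fields.append(default)
            else:
                # Fields are unsigned big-endian integers.
                chunk = data[offset:offset + width]
                fields.append(int.from_bytes(chunk, "big"))
            offset += width
        entries.append(tuple(fields))
    return entries
```

With ``/W [1 2 1]``, for example, each entry occupies four bytes: one for the type, two for the offset or object stream number, and one for the generation or index.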
3564   -
3565   -.. _ref.object-streams-linearization:
3566   -
3567   -Implications for Linearized Files
3568   ----------------------------------
3569   -
3570   -For linearized files, the linearization dictionary, document catalog,
3571   -and page objects may not be contained in object streams.
3572   -
3573   -Objects stored within object streams are given the highest range of
3574   -object numbers within the main and first-page cross-reference sections.
3575   -
3576   -It is okay to use cross-reference streams in place of regular xref
3577   -tables. There are no special considerations.
3578   -
3579   -Hint data refers to object streams themselves, not the objects in the
3580   -streams. Shared object references should also be made to the object
3581   -streams. There are no references in any hint tables to the object numbers
3582   -of compressed objects (objects within object streams).
3583   -
3584   -When numbering objects, all shared objects within both the first and
3585   -second halves of the linearized files must be numbered consecutively
3586   -after all normal uncompressed objects in that half.
3587   -
3588   -.. _ref.object-stream-implementation:
3589   -
3590   -Implementation Notes
3591   ---------------------
3592   -
3593   -There are three modes for writing object streams:
3594   -:samp:`disable`, :samp:`preserve`, and
3595   -:samp:`generate`. In disable mode, we do not generate
3596   -any object streams, and we also generate an xref table rather than xref
3597   -streams. This can be used to generate PDF files that are viewable with
3598   -older readers. In preserve mode, we write object streams such that
3599   -written object streams contain the same objects and ``/Extends``
3600   -relationships as in the original file. This is equal to disable if the
3601   -file has no object streams. In generate, we create object streams
3602   -ourselves by grouping objects that are allowed in object streams
3603   -together in sets of no more than 100 objects. We also ensure that the
3604   -PDF version is at least 1.5 in generate mode, but we preserve the
3605   -version header in the other modes. The default is
3606   -:samp:`preserve`.
3607   -
3608   -We do not support creation of hybrid files. When we write files, even in
3609   -preserve mode, we will lose any xref tables and merge any appended
3610   -sections.
3611   -
3612   -.. _ref.release-notes:
3613   -
3614   -Release Notes
3615   -=============
3616   -
3617   -For a detailed list of changes, please see the file
3618   -:file:`ChangeLog` in the source distribution.
3619   -
3620   -10.5.0: XXX Month dd, YYYY
3621   - - Library Enhancements
3622   -
3623   - - Since qpdf version 8, using object accessor methods on an
3624   - instance of ``QPDFObjectHandle`` may create warnings if the
3625   - object is not of the expected type. These warnings now have an
3626   - error code of ``qpdf_e_object`` instead of
3627   - ``qpdf_e_damaged_pdf``. Also, comments have been added to
3628   - :file:`QPDFObjectHandle.hh` to explain in more detail what the
3629   - behavior is. See :ref:`ref.object-accessors` for a more in-depth
3630   - discussion.
3631   -
3632   - - Add ``Pl_Buffer::getMallocBuffer()`` to initialize a buffer
3633   - allocated with ``malloc()`` for better cross-language
3634   - interoperability.
3635   -
3636   - - C API Enhancements
3637   -
3638   - - Overhaul error handling for the object handle functions C API.
3639   - Some rare error conditions that would previously have caused a
3640   - crash are now trapped and reported, and the functions that
3641   - generate them return fallback values. See comments in the
3642   - ``ERROR HANDLING`` section of :file:`include/qpdf/qpdf-c.h` for
3643   - details. In particular, exceptions thrown by the underlying C++
3644   - code when calling object accessors are caught and converted into
3645   - errors. The errors can be checked by calling ``qpdf_has_error``.
3646   - Use ``qpdf_silence_errors`` to prevent the error from being
3647   - written to stderr.
3648   -
3649   - - Add ``qpdf_get_last_string_length`` to the C API to get the
3650   - length of the last string that was returned. This is needed to
3651   - handle strings that contain embedded null characters.
3652   -
3653   - - Add ``qpdf_oh_is_initialized`` and
3654   - ``qpdf_oh_new_uninitialized`` to the C API to make it possible
3655   - to work with uninitialized objects.
3656   -
3657   - - Add ``qpdf_oh_new_object`` to the C API. This allows you to
3658   - clone an object handle.
3659   -
3660   - - Add ``qpdf_get_object_by_id``, ``qpdf_make_indirect_object``,
3661   - and ``qpdf_replace_object``, exposing the corresponding methods
3662   - in ``QPDF`` and ``QPDFObjectHandle``.
3663   -
3664   - - Add several functions for working with pages. See ``PAGE
3665   - FUNCTIONS`` in ``include/qpdf/qpdf-c.h`` for details.
3666   -
3667   - - Add several functions for working with streams. See ``STREAM
3668   - FUNCTIONS`` in ``include/qpdf/qpdf-c.h`` for details.
3669   -
3670   - - Add ``qpdf_oh_get_type_code`` and ``qpdf_oh_get_type_name``.
3671   -
3672   - - Documentation change
3673   -
3674   - - The documentation sources have been switched from docbook to
3675   - reStructuredText processed with `Sphinx
3676   - <https://sphinx-doc.org>`__. This is mostly transparent (other
3677   - than format change) with the exception that all section links
3678   - have changed. What used to be ``#ref.something`` is now
3679   - ``#something``. A top-to-bottom review of the documentation is
3680   - planned for an upcoming release.
3681   -
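The 10.5.0 note on ``qpdf_get_last_string_length`` says the length is needed for strings with embedded null characters. The sketch below shows why, without calling qpdf at all: both helpers are hypothetical, and the raw buffer stands in for a string returned through a C interface.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: copy a C buffer using an explicit length, so bytes
// after an embedded null character are preserved.
std::string copy_with_length(char const* data, std::size_t length)
{
    return std::string(data, length);
}

// Hypothetical helper: copy a C buffer the conventional way, which stops at
// the first null character and silently drops the rest.
std::string copy_until_null(char const* data)
{
    return std::string(data);
}
```

A caller that only has the pointer recovers the truncated form; a caller that also knows the length recovers the whole string.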
3682   -10.4.0: November 16, 2021
3683   - - Handling of Weak Cryptography Algorithms
3684   -
3685   - - From the qpdf CLI, the
3686   - :samp:`--allow-weak-crypto` option is now required to
3687   - suppress a warning when explicitly creating PDF files using RC4
3688   - encryption. While qpdf will always retain the ability to read
3689   - and write such files, doing so will require explicit
3690   - acknowledgment moving forward. For qpdf 10.4, this change only
3691   - affects the command-line tool. Starting in qpdf 11, there will
3692   - be small API changes to require explicit acknowledgment in
3693   - those cases as well. For additional information, see :ref:`ref.weak-crypto`.
3694   -
3695   - - Bug Fixes
3696   -
3697   - - Fix potential bounds error when handling shell completion that
3698   - could occur when given bogus input.
3699   -
3700   - - Properly handle overlay/underlay on completely empty pages
3701   - (with no resource dictionary).
3702   -
3703   - - Fix crash that could occur under certain conditions when using
3704   - :samp:`--pages` with files that had form
3705   - fields.
3706   -
3707   - - Library Enhancements
3708   -
3709   - - Make ``QPDF::findPage`` functions public.
3710   -
3711   - - Add methods to ``Pl_Flate`` to be able to receive warnings on
3712   - certain recoverable conditions.
3713   -
3714   - - Add an extra check to the library to detect when foreign
3715   - objects are inserted directly (instead of using
3716   - ``QPDF::copyForeignObject``) at the time of insertion rather
3717   - than when the file is written. Catching the error sooner makes
3718   - it much easier to locate the incorrect code.
3719   -
3720   - - CLI Enhancements
3721   -
3722   - - Improve diagnostics around parsing
3723   - :samp:`--pages` command-line options
3724   -
3725   - - Packaging Changes
3726   -
3727   - - The Windows binary distribution is now built with crypto
3728   - provided by OpenSSL 3.0.
3729   -
3730   -10.3.2: May 8, 2021
3731   - - Bug Fixes
3732   -
3733   - - When generating a file while preserving object streams,
3734   - unreferenced objects are correctly removed unless
3735   - :samp:`--preserve-unreferenced` is specified.
3736   -
3737   - - Library Enhancements
3738   -
3739   - - When adding a page that already exists, make a shallow copy
3740   - instead of throwing an exception. This makes the library
3741   - behavior consistent with the CLI behavior. See
3742   - :file:`ChangeLog` for additional notes.
3743   -
3744   -10.3.1: March 11, 2021
3745   - - Bug Fixes
3746   -
3747   - - Form field copying failed on files where /DR was a direct
3748   - object in the document-level form dictionary.
3749   -
3750   -10.3.0: March 4, 2021
3751   - - Bug Fixes
3752   -
3753   - - The code for handling form fields when copying pages from
3754   - 10.2.0 was not quite right and didn't work in a number of
3755   - situations, such as when the same page was copied multiple
3756   - times or when there were conflicting resource or field names
3757   - across multiple copies. The 10.3.0 code has been much more
3758   - thoroughly tested with more complex cases and with a multitude
3759   - of readers and should be much closer to correct. The 10.2.0
3760   - code worked well enough for page splitting or for copying pages
3761   - with form fields into documents that didn't already have them
3762   - but was still not quite correct in handling of field-level
3763   - resources.
3764   -
3765   - - When ``QPDF::replaceObject`` or ``QPDF::swapObjects`` is
3766   - called, existing ``QPDFObjectHandle`` instances no longer point
3767   - to the old objects. The next time they are accessed, they
3768   - automatically notice the change to the underlying object and
3769   - update themselves. This resolves a very longstanding source of
3770   - confusion, albeit in a very rarely used method call.
3771   -
3772   - - Fix form field handling code to look for default appearances,
3773   - quadding, and default resources in the right places. The code
3774   - was not looking for things in the document-level interactive
3775   - form dictionary that it was supposed to be finding there. This
3776   - required adding a few new methods to
3777   - ``QPDFFormFieldObjectHelper``.
3778   -
3779   - - Library Enhancements
3780   -
3781   - - Reworked the code that handles copying annotations and form
3782   - fields during page operations. There were additional methods
3783   - added to the public API from 10.2.0 and one deprecation of a
3784   - method added in 10.2.0. The majority of the API changes are in
3785   - methods most people would never call and that will hopefully be
3786   - superseded by higher-level interfaces for handling page copies.
3787   - Please see the :file:`ChangeLog` file for
3788   - details.
3789   -
3790   - - The method ``QPDF::numWarnings`` was added so that you can tell
3791   - whether any warnings happened during a specific block of code.
3792   -
3793   -10.2.0: February 23, 2021
3794   - - CLI Behavior Changes
3795   -
3796   - - Operations that work on combining pages are much better about
3797   - protecting form fields. In particular,
3798   - :samp:`--split-pages` and
3799   - :samp:`--pages` now preserve interactive form
3800   - functionality by copying the relevant form field information
3801   - from the original files. Additionally, if you use
3802   - :samp:`--pages` to select only some pages from
3803   - the original input file, unused form fields are removed, which
3804   - prevents lots of unused annotations from being retained.
3805   -
3806   - - By default, :command:`qpdf` no longer allows
3807   - creation of encrypted PDF files whose user password is
3808   - non-empty and owner password is empty when a 256-bit key is in
3809   - use. The :samp:`--allow-insecure` option,
3810   - specified inside the :samp:`--encrypt` options,
3811   - allows creation of such files. Behavior changes in the CLI are
3812   - avoided when possible, but an exception was made here because
3813   - this is security-related. qpdf must always allow creation of
3814   - weird files for testing purposes, but it should not default to
3815   - letting users unknowingly create insecure files.
3816   -
3817   - - Library Behavior Changes
3818   -
3819   - - Note: the changes in this section cause differences in output
3820   - in some cases. These differences change the syntax of the PDF
3821   - but do not change the semantics (meaning). I make a strong
3822   - effort to avoid gratuitous changes in qpdf's output so that
3823   - qpdf changes don't break people's tests. In this case, the
3824   - changes significantly improve the readability of the generated
3825   - PDF and don't affect any output that's generated by simple
3826   - transformation. If you are annoyed by having to update test
3827   - files, please rest assured that changes like this have been and
3828   - will continue to be rare events.
3829   -
3830   - - ``QPDFObjectHandle::newUnicodeString`` now uses whichever of
3831   - ASCII, PDFDocEncoding, or UTF-16 is sufficient to encode all
3832   - the characters in the string. This reduces needless encoding in
3833   - UTF-16 of strings that can be encoded in ASCII. This change may
3834   - cause qpdf to generate different output than before when form
3835   - field values are set using ``QPDFFormFieldObjectHelper`` but
3836   - does not change the meaning of the output.
3837   -
3838   - - The code that places form XObjects and also the code that
3839   - flattens rotations trim trailing zeroes from real numbers that
3840   - they calculate. This causes slight (but semantically
3841   - equivalent) differences in generated appearance streams and
3842   - form XObject invocations in overlay/underlay code or in user
3843   - code that calls the methods that place form XObjects on a page.
3844   -
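The trimming of trailing zeroes from real numbers mentioned above can be sketched as a string operation. ``trim_trailing_zeroes`` is a hypothetical helper written for illustration; it is not the function qpdf uses, and it assumes the input is already a plain decimal string without an exponent.

```cpp
#include <string>

// Hypothetical helper: remove trailing zeroes after the decimal point, and
// the decimal point itself if nothing is left after it.
std::string trim_trailing_zeroes(std::string s)
{
    if (s.find('.') == std::string::npos) {
        return s; // integers such as "100" are left untouched
    }
    while (!s.empty() && s.back() == '0') {
        s.pop_back();
    }
    if (!s.empty() && s.back() == '.') {
        s.pop_back();
    }
    return s;
}
```

The trimmed and untrimmed forms denote the same number, which is why the change affects syntax but not meaning.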
3845   - - CLI Enhancements
3846   -
3847   - - Add new command line options for listing, saving, adding,
3848   - removing, and copying file attachments. See :ref:`ref.attachments` for details.
3849   -
3850   - - Page splitting and merging operations, as well as
3851   - :samp:`--flatten-rotation`, are better behaved
3852   - with respect to annotations and interactive form fields. In
3853   - most cases, interactive form field functionality and proper
3854   - formatting and functionality of annotations is preserved by
3855   - these operations. There are still some cases that aren't
3856   - perfect, such as when functionality of annotations depends on
3857   - document-level data that qpdf doesn't yet understand or when
3858   - there are problems with referential integrity among form fields
3859   - and annotations (e.g., when a single form field object or its
3860   - associated annotations are shared across multiple pages, a case
3861   - that is out of spec but that works in most viewers anyway).
3862   -
3863   - - The option
3864   - :samp:`--password-file={filename}`
3865   - can now be used to read the decryption password from a file.
3866   - You can use ``-`` as the file name to read the password from
3867   - standard input. This is an easier/more obvious way to read
3868   - passwords from files or standard input than using
3869   - :samp:`@file` for this purpose.
3870   -
3871   - - Add some information about attachments to the json output, and
3872   - add ``attachments`` as an additional json key. The
3873   - information included here is limited to the preferred name and
3874   - content stream and a reference to the file spec object. This is
3875   - enough detail for clients to avoid the hassle of navigating a
3876   - name tree and provides what is needed for basic enumeration and
3877   - extraction of attachments. More detailed information can be
3878   - obtained by following the reference to the file spec object.
3879   -
3880   - - Add numeric option to :samp:`--collate`. If
3881   - :samp:`--collate={n}`
3882   - is given, take pages in groups of
3883   - :samp:`{n}` from the given files.
3884   -
3885   - - It is now valid to provide :samp:`--rotate=0`
3886   - to clear rotation from a page.
3887   -
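The behavior of the numeric form of :samp:`--collate` described above, taking pages in groups of :samp:`{n}` from each file in turn, can be sketched as follows. ``collate_pages`` and ``PageRef`` are hypothetical names, pages are plain integers, and the handling of files that run out early (remaining files keep contributing) is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// (file index, page number) in output order.
using PageRef = std::pair<std::size_t, int>;

// Hypothetical helper: visit the files round-robin, taking up to n pages
// from each file per round until every file is exhausted.
std::vector<PageRef>
collate_pages(std::vector<std::vector<int>> const& files, std::size_t n)
{
    if (n == 0) {
        throw std::invalid_argument("group size must be positive");
    }
    std::vector<PageRef> result;
    std::vector<std::size_t> next(files.size(), 0); // next unread page per file
    bool took_any = true;
    while (took_any) {
        took_any = false;
        for (std::size_t f = 0; f < files.size(); ++f) {
            std::size_t end = std::min(files[f].size(), next[f] + n);
            for (; next[f] < end; ++next[f]) {
                result.emplace_back(f, files[f][next[f]]);
                took_any = true;
            }
        }
    }
    return result;
}
```

For a four-page file A and a two-page file B with ``n = 2``, this sketch produces A1, A2, B1, B2, A3, A4.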
3888   - - Library Enhancements
3889   -
3890   - - This release includes numerous additions to the API. Not all
3891   - changes are listed here. Please see the
3892   - :file:`ChangeLog` file in the source
3893   - distribution for a comprehensive list. Highlights appear below.
3894   -
3895   - - Add ``QPDFObjectHandle::ditems()`` and
3896   - ``QPDFObjectHandle::aitems()`` that enable C++-style iteration,
3897   - including range-for iteration, over dictionary and array
3898   - QPDFObjectHandles. See comments in
3899   - :file:`include/qpdf/QPDFObjectHandle.hh`
3900   - and
3901   - :file:`examples/pdf-name-number-tree.cc`
3902   - for details.
3903   -
3904   - - Add ``QPDFObjectHandle::copyStream`` for making a copy of a
3905   - stream within the same ``QPDF`` instance.
3906   -
3907   - - Add new helper classes for supporting file attachments, also
3908   - known as embedded files. New classes are
3909   - ``QPDFEmbeddedFileDocumentHelper``,
3910   - ``QPDFFileSpecObjectHelper``, and ``QPDFEFStreamObjectHelper``.
3911   - See their respective headers for details and
3912   - :file:`examples/pdf-attach-file.cc` for an
3913   - example.
3914   -
3915   - - Add a version of ``QPDFObjectHandle::parse`` that takes a
3916   - ``QPDF`` pointer as context so that it can parse strings
3917   - containing indirect object references. This is illustrated in
3918   - :file:`examples/pdf-attach-file.cc`.
3919   -
3920   - - Re-implement ``QPDFNameTreeObjectHelper`` and
3921   - ``QPDFNumberTreeObjectHelper`` to be more efficient, add an
3922   - iterator-based API, give them the capability to repair broken
3923   - trees, and create methods for modifying the trees. With this
3924   - change, qpdf has a robust read/write implementation of name and
3925   - number trees.
3926   -
3927   - - Add new versions of ``QPDFObjectHandle::replaceStreamData``
3928   - that take ``std::function`` objects for cases when you need
3929   - something between a static string and a full-fledged
3930   - StreamDataProvider. Using this with ``QUtil::file_provider`` is
3931   - a very easy way to create a stream from the contents of a file.
3932   -
3933   - - The ``QPDFMatrix`` class, formerly a private, internal class,
3934   - has been added to the public API. See
3935   - :file:`include/qpdf/QPDFMatrix.hh` for
3936   - details. This class is for working with transformation
3937   - matrices. Some methods in ``QPDFPageObjectHelper`` make use of
3938   - this to make information about transformation matrices
3939   - available. For an example, see
3940   - :file:`examples/pdf-overlay-page.cc`.
3941   -
3942   - - Several new methods were added to
3943   - ``QPDFAcroFormDocumentHelper`` for adding, removing, getting
3944   - information about, and enumerating form fields.
3945   -
3946   - - Add method
3947   - ``QPDFAcroFormDocumentHelper::transformAnnotations``, which
3948   - applies a transformation to each annotation on a page.
3949   -
3950   - - Add ``QPDFPageObjectHelper::copyAnnotations``, which copies
3951   - annotations and, if applicable, associated form fields, from
3952   - one page to another, possibly transforming the rectangles.
3953   -
3954   - - Build Changes
3955   -
3956   - - A C++-14 compiler is now required to build qpdf. There is no
3957   - intention to require anything newer than that for a while.
3958   - C++-14 includes modest enhancements to C++-11 and appears to be
3959   - supported about as widely as C++-11.
3960   -
3961   - - Bug Fixes
3962   -
3963   - - The :samp:`--flatten-rotation` option applies
3964   - transformations to any annotations that may be on the page.
3965   -
3966   - - If a form XObject lacks a resources dictionary, consider any
3967   - names in that form XObject to be referenced from the containing
3968   - page. This is compliant with older PDF versions. Also detect if
3969   - any form XObjects have any unresolved names and, if so, don't
3970   - remove unreferenced resources from them or from the page that
3971   - contains them. Unfortunately this has the side effect of
3972   - preventing removal of unreferenced resources in some cases
3973   - where names appear that don't refer to resources, such as with
3974   - tagged PDF. This is a bit of a corner case that is not likely
3975   - to cause a significant problem in practice, and the only side
3976   - effect would be lack of removal of shared resources. A future
3977   - version of qpdf may be more sophisticated in its detection of
3978   - names that refer to resources.
3979   -
3980   - - Properly handle strings if they appear in inline image
3981   - dictionaries while externalizing inline images.
3982   -
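The encoding choice described for ``QPDFObjectHandle::newUnicodeString`` in the 10.2.0 notes (use whichever of ASCII, PDFDocEncoding, or UTF-16 is sufficient) can be sketched as a scan over code points. This is an illustration only: ``choose_encoding`` is a hypothetical helper, and the PDFDocEncoding test is deliberately simplified to the range 0xA1 through 0xFF excluding 0xAD, where PDFDocEncoding byte values coincide with Unicode code points. The real encoding can represent additional characters that this sketch sends to UTF-16.

```cpp
#include <string>

enum class TextEncoding { ascii, pdf_doc, utf16 };

// Hypothetical helper: return the narrowest encoding able to hold the text.
// Simplifying assumption: outside ASCII, only code points 0xA1..0xFF
// (except 0xAD) are treated as representable in PDFDocEncoding.
TextEncoding choose_encoding(std::u32string const& text)
{
    bool all_ascii = true;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            continue; // plain ASCII fits every encoding
        }
        all_ascii = false;
        bool fits_pdf_doc = (cp >= 0xA1 && cp <= 0xFF && cp != 0xAD);
        if (!fits_pdf_doc) {
            return TextEncoding::utf16; // one wide character forces UTF-16
        }
    }
    return all_ascii ? TextEncoding::ascii : TextEncoding::pdf_doc;
}
```

Under these assumptions, a pure ASCII form field value is no longer written as UTF-16, which is the source of the output differences the note warns about.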
3983   -10.1.0: January 5, 2021
3984   - - CLI Enhancements
3985   -
3986   - - Add :samp:`--flatten-rotation` command-line
3987   - option, which causes all pages that are rotated using
3988   - parameters in the page's dictionary to instead be identically
3989   - rotated in the page's contents. The change is not user-visible
3990   - for compliant PDF readers but can be used to work around broken
3991   - PDF applications that don't properly handle page rotation.
3992   -
3993   - - Library Enhancements
3994   -
3995   - - Support for user-provided (pluggable, modular) stream filters.
3996   - It is now possible to derive a class from ``QPDFStreamFilter``
3997   - and register it with ``QPDF`` so that regular library methods,
3998   - including those used by ``QPDFWriter``, can decode streams with
3999   - filters not directly supported by the library. The example
4000   - :file:`examples/pdf-custom-filter.cc`
4001   - illustrates how to use this capability.
4002   -
4003   - - Add methods to ``QPDFPageObjectHelper`` to iterate through
4004   - XObjects on a page or form XObjects, possibly recursing into
4005   - nested form XObjects: ``forEachXObject``, ``forEachImage``,
4006   - ``forEachFormXObject``.
4007   -
4008   - - Enhance several methods in ``QPDFPageObjectHelper`` to work
4009   - with form XObjects as well as pages, as noted in comments. See
4010   - :file:`ChangeLog` for a full list.
4011   -
4012   - - Rename some functions in ``QPDFPageObjectHelper``, while
4013   - keeping old names for compatibility:
4014   -
4015   - - ``getPageImages`` to ``getImages``
4016   -
4017   - - ``filterPageContents`` to ``filterContents``
4018   -
4019   - - ``pipePageContents`` to ``pipeContents``
4020   -
4021   - - ``parsePageContents`` to ``parseContents``
4022   -
4023   - - Add method ``QPDFPageObjectHelper::getFormXObjects`` to return
4024   - a map of form XObjects directly on a page or form XObject
4025   -
4026   - - Add new helper methods to ``QPDFObjectHandle``:
4027   - ``isFormXObject``, ``isImage``
4028   -
4029   - Add the optional ``allow_streams`` parameter to
4030   - ``QPDFObjectHandle::makeDirect``. When
4031   - ``QPDFObjectHandle::makeDirect`` is called in this way, it
4032   - preserves references to streams rather than throwing an
4033   - exception.
4034   -
4035   - - Add ``QPDFObjectHandle::setFilterOnWrite`` method. Calling this
4036   - on a stream prevents ``QPDFWriter`` from attempting to
4037   - uncompress, recompress, or otherwise filter a stream even if it
4038   - could. Developers can use this to protect streams that are
4039   - optimized or that should be protected from ``QPDFWriter``'s default
4040   - behavior for any other reason.
4041   -
4042   - - Add ``ostream`` ``<<`` operator for ``QPDFObjGen``. This is
4043   - useful to have for debugging.
4044   -
4045   - - Add method ``QPDFPageObjectHelper::flattenRotation``, which
4046   - replaces a page's ``/Rotate`` keyword by rotating the page
4047   - within the content stream and altering the page's bounding
4048   - boxes so the rendering is the same. This can be used to work
4049   - around buggy PDF readers that can't properly handle page
4050   - rotation.
4051   -
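The idea behind ``QPDFPageObjectHelper::flattenRotation``, replacing ``/Rotate`` with an equivalent transformation in the content stream, can be illustrated with the matrices involved. The sketch below is derived from the geometry of clockwise page rotation, not copied from qpdf. It assumes the unrotated page box is ``[0 0 width height]``, and ``rotation_matrix`` and ``transform_point`` are hypothetical names. The real method also adjusts the page's bounding boxes, which is not shown.

```cpp
#include <array>
#include <stdexcept>

// PDF matrix [a b c d e f]: x' = a*x + c*y + e, y' = b*x + d*y + f.
using Matrix = std::array<double, 6>;

// Hypothetical helper: matrix that maps unrotated page coordinates to the
// coordinates they occupy after a clockwise rotation by `rotate` degrees.
// Assumes the unrotated page box is [0 0 width height].
Matrix rotation_matrix(int rotate, double width, double height)
{
    switch (((rotate % 360) + 360) % 360) {
    case 0:
        return {1.0, 0.0, 0.0, 1.0, 0.0, 0.0};
    case 90:
        return {0.0, -1.0, 1.0, 0.0, 0.0, width};
    case 180:
        return {-1.0, 0.0, 0.0, -1.0, width, height};
    case 270:
        return {0.0, 1.0, -1.0, 0.0, height, 0.0};
    default:
        throw std::invalid_argument("rotation must be a multiple of 90");
    }
}

// Apply a PDF matrix to a point.
std::array<double, 2> transform_point(Matrix const& m, double x, double y)
{
    return {m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]};
}
```

For a 200 by 300 page rotated by 90 degrees, the original bottom-left corner lands at the top-left corner of the resulting 300 by 200 page.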
4052   - - C API Enhancements
4053   -
4054   - - Add several new functions to the C API for working with
4055   - objects. These are wrappers around many of the methods in
4056   - ``QPDFObjectHandle``. Their inclusion adds considerable new
4057   - capability to the C API.
4058   -
4059   - - Add ``qpdf_register_progress_reporter`` to the C API,
4060   - corresponding to ``QPDFWriter::registerProgressReporter``.
4061   -
4062   - - Performance Enhancements
4063   -
4064   - - Improve steps ``QPDFWriter`` takes to prepare a ``QPDF`` object
4065   - for writing, resulting in about an 8% improvement in write
4066   - performance while allowing indirect objects to appear in
4067   - ``/DecodeParms``.
4068   -
4069   - - When extracting pages, the :command:`qpdf` CLI
4070   - only removes unreferenced resources from the pages that are
4071   - being kept, resulting in a significant performance improvement
4072   - when extracting small numbers of pages from large, complex
4073   - documents.
4074   -
4075   - - Bug Fixes
4076   -
4077   - - ``QPDFPageObjectHelper::externalizeInlineImages`` was not
4078   - externalizing images referenced from form XObjects that
4079   - appeared on the page.
4080   -
4081   - - ``QPDFObjectHandle::filterPageContents`` was broken for pages
4082   - with multiple content streams.
4083   -
4084   - - Tweak zsh completion code to behave a little better with
4085   - respect to path completion.
4086   -
4087   -10.0.4: November 21, 2020
4088   - - Bug Fixes
4089   -
4090   - - Fix a handful of integer overflows. This includes cases found
4091   - by fuzzing as well as having qpdf not do range checking on
4092   - unused values in the xref stream.
4093   -
4094   -10.0.3: October 31, 2020
4095   - - Bug Fixes
4096   -
4097   - - The fix to the bug involving copying streams with indirect
4098   - filters was incorrect and introduced a new, more serious bug.
4099   - The original bug has been fixed correctly, as has the bug
4100   - introduced in 10.0.2.
4101   -
4102   -10.0.2: October 27, 2020
4103   - - Bug Fixes
4104   -
4105   - - When concatenating content streams, as with
4106   - :samp:`--coalesce-contents`, there were cases
4107   - in which qpdf would merge two lexical tokens together, creating
4108   - invalid results. A newline is now inserted between merged
4109   - content streams if one is not already present.
4110   -
4111   - - Fix an internal error that could occur when copying foreign
4112   - streams whose stream data had been replaced using a stream data
4113   - provider if those streams had indirect filters or decode
4114   - parameters. This is a rare corner case.
4115   -
4116   - - Ensure that the caller's locale settings do not change the
4117   - results of numeric conversions performed internally by the qpdf
4118   - library. Note that the problem here could only be caused when
4119   - the qpdf library was used programmatically. Using the qpdf CLI
4120   - already ignored the user's locale for numeric conversion.
4121   -
4122   - - Fix several instances in which warnings were not suppressed in
4123   - spite of :samp:`--no-warn` and/or errors or
4124   - warnings were written to standard output rather than standard
4125   - error.
4126   -
4127   - - Fixed a memory leak that could occur under specific
4128   - circumstances when
4129   - :samp:`--object-streams=generate` was used.
4130   -
4131   - - Fix various integer overflows and similar conditions found by
4132   - the OSS-Fuzz project.
4133   -
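The :samp:`--coalesce-contents` fix listed first in this section can be illustrated with plain strings. ``coalesce_streams`` is a hypothetical helper, not qpdf's code; it shows only the rule stated above, that a newline is inserted between merged content streams when one is not already present.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: concatenate content streams, inserting a newline
// between two streams only when the text so far does not already end in one.
std::string coalesce_streams(std::vector<std::string> const& streams)
{
    std::string result;
    for (auto const& s : streams) {
        if (!result.empty() && result.back() != '\n') {
            result += '\n'; // keeps the last token of one stream apart
                            // from the first token of the next
        }
        result += s;
    }
    return result;
}
```

Without the separator, streams ending in ``RG`` and starting with ``q`` would merge into the single invalid token ``RGq``.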
4134   - - Enhancements
4135   -
4136   - - New option :samp:`--warning-exit-0` causes qpdf
4137   - to exit with a status of ``0`` rather than ``3`` if there are
4138   - warnings but no errors. Combine with
4139   - :samp:`--no-warn` to completely ignore
4140   - warnings.
4141   -
4142   - - Performance improvements have been made to
4143   - ``QPDF::processMemoryFile``.
4144   -
4145   - - The OpenSSL crypto provider produces more detailed error
4146   - messages.
4147   -
4148   - - Build Changes
4149   -
4150   - - The option :samp:`--disable-rpath` is now
4151   - supported by qpdf's :command:`./configure`
4152   - script. Some distributions' packaging standards recommend the
4153   - use of this option.
4154   -
4155   - - Selection of a printf format string for ``long long`` has
4156   - been moved from ``ifdefs`` to an autoconf
4157   - test. If you are using your own build system, you will need to
4158   - provide a value for ``LL_FMT`` in
4159   - :file:`libqpdf/qpdf/qpdf-config.h`, which
4160   - would typically be ``"%lld"`` or, for some Windows compilers,
4161   - ``"%I64d"``.
4162   -
4163   - - Several improvements were made to build-time configuration of
4164   - the OpenSSL crypto provider.
4165   -
4166   - - A nearly stand-alone Linux binary zip file is now included with
4167   - the qpdf release. This is built on an older (but supported)
4168   - Ubuntu LTS release, but would work on most reasonably recent
4169   - Linux distributions. It contains only the executables and
4170   - required shared libraries that would not be present on a
4171   - minimal system. It can be used for including qpdf in a minimal
4172   - environment, such as a docker container. The zip file is also
4173   - known to work as a layer in AWS Lambda.
4174   -
4175   - - QPDF's automated build has been migrated from Azure Pipelines
4176   - to GitHub Actions.
4177   -
4178   - - Windows-specific Changes
4179   -
4180   - - The Windows executables distributed with qpdf releases now use
4181   - the OpenSSL crypto provider by default. The native crypto
4182   - provider is also compiled in and can be selected at runtime
4183   - with the ``QPDF_CRYPTO_PROVIDER`` environment variable.
4184   -
4185   - - Improvements have been made to how a cryptographic provider is
4186   - obtained in the native Windows crypto implementation. However,
4187   - this is mostly shadowed by OpenSSL being used by default.
4188   -
4189   -10.0.1: April 9, 2020
4190   - - Bug Fixes
4191   -
4192   - - 10.0.0 introduced a bug in which calling
4193   - ``QPDFObjectHandle::getStreamData`` on a stream that can't be
4194   - filtered was returning the raw data instead of throwing an
4195   - exception. This is now fixed.
4196   -
4197   - - Fix a bug that was preventing qpdf from linking with some
4198   - versions of clang on some platforms.
4199   -
4200   - - Enhancements
4201   -
4202   - - Improve the :file:`pdf-invert-images`
4203   - example to avoid having to load all the images into RAM at the
4204   - same time.
4205   -
4206   -10.0.0: April 6, 2020
4207   - - Performance Enhancements
4208   -
4209   - - The qpdf library and executable should run much faster in this
4210   - version than in the last several releases. Several internal
4211   - library optimizations have been made, and there has been
4212   - improved behavior on page splitting as well. This version of
4213   - qpdf should outperform any of the 8.x or 9.x versions.
4214   -
4215   - - Incompatible API (source-level) Changes (minor)
4216   -
4217   - - The ``QUtil::srandom`` method was removed. It didn't do
4218   - anything unless insecure random numbers were compiled in, and
4219   - they have been off by default for a long time. If you were
4220   - calling it, just remove the call since it wasn't doing anything
4221   - anyway.
4222   -
4223   - - Build/Packaging Changes
4224   -
4225   - Add an ``openssl`` crypto provider, which is implemented with
4226   - OpenSSL and also works with BoringSSL. Thanks to Dean Scarff
4227   - for this contribution. If you maintain qpdf for a distribution,
4228   - pay special attention to make sure that you are including
4229   - support for the crypto providers you want. Package maintainers
4230   - will have to weigh the advantages of allowing users to pick a
4231   - crypto provider at runtime against the disadvantages of adding
4232   - more dependencies to qpdf.
4233   -
4234   - Allow qpdf to be built on stripped down systems whose C/C++
4235   - libraries lack the ``wchar_t`` type. Search for ``wchar_t`` in
4236   - qpdf's README.md for details. This should be very rare, but it
4237   - is known to be helpful in some embedded environments.
4238   -
4239   - - CLI Enhancements
4240   -
4241   - - Add ``objectinfo`` key to the JSON output. This will be a place
4242   - to put computed metadata or other information about PDF objects
4243   - that are not immediately evident in other ways or that seem
4244   - useful for some other reason. In this version, information is
4245   - provided about each object indicating whether it is a stream
4246   - and, if so, what its length and filters are. Without this, it
4247   - was not possible to tell conclusively from the JSON output
4248   - alone whether or not an object was a stream. Run
4249   - :command:`qpdf --json-help` for details.
4250   -
4251   - - Add new option
4252   - :samp:`--remove-unreferenced-resources` which
4253   - takes ``auto``, ``yes``, or ``no`` as arguments. The new
4254   - ``auto`` mode, which is the default, performs a fast heuristic
4255   - over a PDF file when splitting pages to determine whether the
4256   - expensive process of finding and removing unreferenced
4257   - resources is likely to be of benefit. For most files, this new
4258   - default will result in a significant performance improvement
4259   - for splitting pages. See :ref:`ref.advanced-transformation` for a more detailed
4260   - discussion.
4261   -
4262   - The :samp:`--preserve-unreferenced-resources` option
4263   - is now just a synonym for
4264   - :samp:`--remove-unreferenced-resources=no`.
4265   -
4266   - - If the ``QPDF_EXECUTABLE`` environment variable is set when
4267   - invoking :command:`qpdf --bash-completion` or
4268   - :command:`qpdf --zsh-completion`, the completion
4269   - command that it outputs will refer to qpdf using the value of
4270   - that variable rather than what :command:`qpdf`
4271   - determines its executable path to be. This can be useful when
4272   - wrapping :command:`qpdf` with a script, working
4273   - with a version in the source tree, using an AppImage, or other
4274   - situations where there is some indirection.
4275   -
4276   - - Library Enhancements
4277   -
4278   - - Random number generation is now delegated to the crypto
4279   - provider. The old behavior is still used by the native crypto
4280   - provider. It is still possible to provide your own random
4281   - number generator.
4282   -
4283   - - Add a new version of
4284   - ``QPDFObjectHandle::StreamDataProvider::provideStreamData``
4285   - that accepts the ``suppress_warnings`` and ``will_retry``
4286   - options and allows a success code to be returned. This makes it
4287   - possible to implement a ``StreamDataProvider`` that calls
4288   - ``pipeStreamData`` on another stream and to pass the response
4289   - back to the caller, which enables better error handling on
4290   - those proxied streams.
4291   -
4292   - - Update ``QPDFObjectHandle::pipeStreamData`` to return an
4293   - overall success code that goes beyond whether or not filtered
4294   - data was written successfully. This allows better error
4295   - handling of cases that were not filtering errors. You have to
4296   - call this explicitly. Methods in previously existing APIs have
4297   - the same semantics as before.
4298   -
4299   - - The ``QPDFPageObjectHelper::placeFormXObject`` method now
4300   - allows separate control over whether it should be willing to
4301   - shrink or expand objects to fit them better into the
4302   - destination rectangle. The previous behavior was that shrinking
4303   - was allowed but expansion was not. The previous behavior is
4304   - still the default.
4305   -
4306   - - When calling the C API, any non-zero value passed to a boolean
4307   - parameter is treated as ``TRUE``. Previously only the value
4308   - ``1`` was accepted. This makes the C API behave more like most
4309   - C interfaces and is known to improve compatibility with some
4310   - Windows environments that dynamically load the DLL and call
4311   - functions from it.
4312   -
4313   - - Add ``QPDFObjectHandle::unsafeShallowCopy`` for copying only
4314   - top-level dictionary keys or array items. This is unsafe
4315   - because it creates a situation in which changing a lower-level
4316   - item in one object may also change it in another object, but
4317   - for cases in which you *know* you are only inserting or
4318   - replacing top-level items, it is much faster than
4319   - ``QPDFObjectHandle::shallowCopy``.
4320   -
4321   - - Add ``QPDFObjectHandle::filterAsContents``, which filters a
4322   - stream's data as a content stream. This is useful for parsing
4323   - the contents for form XObjects in the same way as parsing page
4324   - content streams.
4325   -
4326   - - Bug Fixes
4327   -
4328   - - When detecting and removing unreferenced resources during page
4329   - splitting, traverse into form XObjects and handle their
4330   - resources dictionaries as well.
4331   -
4332   - - The same error recovery is applied to streams in files other than
4333   - the primary input file when merging or splitting pages.
4334   -
4335   -9.1.1: January 26, 2020
4336   - - Build/Packaging Changes
4337   -
4338   - - The fix-qdf program was converted from perl to C++. As such,
4339   - qpdf no longer has a runtime dependency on perl.
4340   -
4341   - - Library Enhancements
4342   -
4343   - - Added new helper routine ``QUtil::call_main_from_wmain`` which
4344   - converts ``wchar_t`` arguments to UTF-8 encoded strings. This
4345   - is useful for qpdf because library methods expect file names to
4346   - be UTF-8 encoded, even on Windows.
4347   -
4348   - - Added new ``QUtil::read_lines_from_file`` methods that take
4349   - ``FILE*`` arguments and that allow preservation of end-of-line
4350   - characters. This also fixes a bug where
4351   - ``QUtil::read_lines_from_file`` wouldn't work properly with
4352   - Unicode filenames.
4353   -
4354   - - CLI Enhancements
4355   -
4356   - - Added options :samp:`--is-encrypted` and
4357   - :samp:`--requires-password` for testing whether
4358   - a file is encrypted or requires a password other than the
4359   - supplied (or empty) password. These communicate via exit
4360   - status, making them useful for shell scripts. They also work on
4361   - encrypted files with unknown passwords.
4362   -
4363   - - Added ``encrypt`` key to JSON options. With the exception of
4364   - the reconstructed user password for older encryption formats,
4365   - this provides the same information as
4366   - :samp:`--show-encryption` but in a consistent,
4367   - parseable format. See output of :command:`qpdf
4368   - --json-help` for details.
4369   -
4370   - - Bug Fixes
4371   -
4372   - - In QDF mode, be sure not to write more than one XRef stream to
4373   - a file, even when
4374   - :samp:`--preserve-unreferenced` is used.
4375   - :command:`fix-qdf` assumes that there is only
4376   - one XRef stream, and that it appears at the end of the file.
4377   -
4378   - - When externalizing inline images, properly handle images whose
4379   - color space is a reference to an object in the page's resource
4380   - dictionary.
4381   -
4382   - - Windows-specific fix for acquiring crypt context with a new
4383   - keyset.
4384   -
4385   -9.1.0: November 17, 2019
4386   - - Build Changes
4387   -
4388   - - A C++-11 compiler is now required to build qpdf.
4389   -
4390   - - A new crypto provider that uses gnutls for crypto functions is
4391   - now available and can be enabled at build time. See :ref:`ref.crypto` for more information about crypto
4392   - providers and :ref:`ref.crypto.build` for specific information about
4393   - the build.
4394   -
4395   - - Library Enhancements
4396   -
4397   - - Incorporate contribution from Masamichi Hosoda to properly
4398   - handle signature dictionaries by not including them in object
4399   - streams, formatting the ``Contents`` key as a hexadecimal
4400   - string, and excluding the ``/Contents`` key from encryption and
4401   - decryption.
4402   -
4403   - - Incorporate contribution from Masamichi Hosoda to provide new
4404   - API calls for getting file-level information about input and
4405   - output files, enabling certain operations on the files at the
4406   - file level rather than the object level. New methods include
4407   - ``QPDF::getXRefTable()``,
4408   - ``QPDFObjectHandle::getParsedOffset()``,
4409   - ``QPDFWriter::getRenumberedObjGen(QPDFObjGen)``, and
4410   - ``QPDFWriter::getWrittenXRefTable()``.
4411   -
4412   - - Support build-time and runtime selectable crypto providers.
4413   - This includes the addition of new classes
4414   - ``QPDFCryptoProvider`` and ``QPDFCryptoImpl`` and the
4415   - recognition of the ``QPDF_CRYPTO_PROVIDER`` environment
4416   - variable. Crypto providers are described in depth in :ref:`ref.crypto`.
4417   -
4418   - - CLI Enhancements
4419   -
4420   - - Addition of the :samp:`--show-crypto` option in
4421   - support of selectable crypto providers, as described in :ref:`ref.crypto`.
4422   -
4423   - - Allow ``:even`` or ``:odd`` to be appended to numeric ranges
4424   - for specification of the even or odd pages from among the pages
4425   - specified in the range.
4426   -
4427   - - Fix shell wildcard expansion behavior (``*`` and ``?``) of the
4428   - :command:`qpdf.exe` as built by MSVC.
4429   -
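The ``:even`` and ``:odd`` suffixes described for 9.1.0 can be illustrated with a small sketch. It assumes that selection is by position within the pages the range produced (``:odd`` keeps the first, third, and so on; ``:even`` keeps the second, fourth, and so on) rather than by original page number. The function name ``select_parity`` is hypothetical and is not part of qpdf.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not part of qpdf. Given the pages already selected
// by a numeric range, keep every other page. Assumption: position within
// the selection, not the original page number, decides what is kept.
std::vector<int>
select_parity(std::vector<int> const& selected, std::string const& which)
{
    if (which != "even" && which != "odd") {
        throw std::invalid_argument("which must be \"even\" or \"odd\"");
    }
    // "odd" starts with the first selected page, "even" with the second.
    std::size_t start = (which == "odd") ? 0 : 1;
    std::vector<int> result;
    for (std::size_t i = start; i < selected.size(); i += 2) {
        result.push_back(selected[i]);
    }
    return result;
}
```

Under this assumption, a range that expands to pages 2, 3, 4 yields pages 2 and 4 for ``:odd`` and page 3 for ``:even``.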
4430   -9.0.2: October 12, 2019
4431   - - Bug Fix
4432   -
4433   - - Fix the name of the temporary file used by
4434   - :samp:`--replace-input` so that it doesn't
4435   - require path splitting and works with paths that include
4436   - directories.
4437   -
4438   -9.0.1: September 20, 2019
4439   - - Bug Fixes/Enhancements
4440   -
4441   - - Fix some build and test issues on big-endian systems and
4442   - compilers with characters that are unsigned by default. The
4443   - problems were in build and test only. There were no actual bugs
4444   - in the qpdf library itself relating to endianness or unsigned
4445   - characters.
4446   -
4447   - - When a dictionary has a duplicated key, report this with a
4448   - warning. The behavior of the library in this case is unchanged,
4449   - but the error condition is no longer silently ignored.
4450   -
4451   - - When a form field's display rectangle is erroneously specified
4452   - with inverted coordinates, detect and correct this situation.
4453   - This avoids some form fields from being flipped when flattening
4454   - annotations on files with this condition.
4455   -
4456   -9.0.0: August 31, 2019
4457   - - Incompatible API (source-level) Changes (minor)
4458   -
4459   - - The method ``QUtil::strcasecmp`` has been renamed to
4460   - ``QUtil::str_compare_nocase``. This incompatible change is
4461   - necessary to enable qpdf to build on platforms that define
4462   - ``strcasecmp`` as a macro.
4463   -
4464   - - The ``QPDF::copyForeignObject`` method had an overloaded
4465   - version that took a boolean parameter that was not used. If you
4466   - were using this version, just omit the extra parameter.
4467   -
4468   - - There was a version ``QPDFTokenizer::expectInlineImage`` that
4469   - took no arguments. This version has been removed since it
4470   - caused the tokenizer to return incorrect inline images. A new
4471   - version was added some time ago that produces correct output.
4472   - This is a very low level method that doesn't make sense to call
4473   - outside of qpdf's lexical engine. There are higher level
4474   - methods for tokenizing content streams.
4475   -
4476   - - Change ``QPDFOutlineDocumentHelper::getTopLevelOutlines`` and
4477   - ``QPDFOutlineObjectHelper::getKids`` to return a
4478   - ``std::vector`` instead of a ``std::list`` of
4479   - ``QPDFOutlineObjectHelper`` objects.
4480   -
4481   - - Remove method ``QPDFTokenizer::allowPoundAnywhereInName``. This
4482   - function would allow creation of name tokens whose value would
4483   - change when unparsed, which is never the correct behavior.
4484   -
4485   - - CLI Enhancements
4486   -
4487   - - The :samp:`--replace-input` option may be given
4488   - in place of an output file name. This causes qpdf to overwrite
4489   - the input file with the output. See the description of
4490   - :samp:`--replace-input` in :ref:`ref.basic-options` for more details.
4491   -
4492   - - The :samp:`--recompress-flate` instructs
4493   - :command:`qpdf` to recompress streams that are
4494   - already compressed with ``/FlateDecode``. Useful with
4495   - :samp:`--compression-level`.
4496   -
4497   - - The
4498   - :samp:`--compression-level={level}`
4499   - sets the zlib compression level used for any streams compressed
4500   - by ``/FlateDecode``. Most effective when combined with
4501   - :samp:`--recompress-flate`.
4502   -
4503   - - Library Enhancements
4504   -
4505   - - A new namespace ``QIntC``, provided by
4506   - :file:`qpdf/QIntC.hh`, provides safe
4507   - conversion methods between different integer types. These
4508   - conversion methods do range checking to ensure that the cast
4509   - can be performed with no loss of information. Every use of
4510   - ``static_cast`` in the library was inspected to see if it could
4511   - use one of these safe converters instead. See :ref:`ref.casting` for additional details.
4512   -
4513   - - Method ``QPDF::anyWarnings`` tells whether there have been any
4514   - warnings without clearing the list of warnings.
4515   -
4516   - - Method ``QPDF::closeInputSource`` closes or otherwise releases
4517   - the input source. This enables the input file to be deleted or
4518   - renamed.
4519   -
4520   - - New methods have been added to ``QUtil`` for converting back
4521   - and forth between strings and unsigned integers:
4522   - ``uint_to_string``, ``uint_to_string_base``,
4523   - ``string_to_uint``, and ``string_to_ull``.
4524   -
4525   - - New methods have been added to ``QPDFObjectHandle`` that return
4526   - the value of ``Integer`` objects as ``int`` or ``unsigned int``
4527   - with range checking and sensible fallback values, and a new
4528   - method was added to return an unsigned value. This makes it
4529   - easier to write code that is safe from unintentional data loss.
4530   - Functions: ``getUIntValue``, ``getIntValueAsInt``,
4531   - ``getUIntValueAsUInt``.
4532   -
4533   - - When parsing content streams with
4534   - ``QPDFObjectHandle::ParserCallbacks``, in place of the method
4535   - ``handleObject(QPDFObjectHandle)``, the developer may override
4536   - ``handleObject(QPDFObjectHandle, size_t offset, size_t
4537   - length)``. If this method is defined, it will
4538   - be invoked with the object along with its offset and length
4539   - within the overall contents being parsed. Intervening spaces
4540   - and comments are not included in offset and length.
4541   - Additionally, a new method ``contentSize(size_t)`` may be
4542   - implemented. If present, it will be called prior to the first
4543   - call to ``handleObject`` with the total size in bytes of the
4544   - combined contents.
4545   -
4546   - - New methods ``QPDF::userPasswordMatched`` and
4547   - ``QPDF::ownerPasswordMatched`` have been added to enable a
4548   - caller to determine whether the supplied password was the user
4549   - password, the owner password, or both. This information is also
4550   - displayed by :command:`qpdf --show-encryption`
4551   - and :command:`qpdf --check`.
4552   -
4553   - - Static method ``Pl_Flate::setCompressionLevel`` can be called
4554   - to set the zlib compression level globally used by all
4555   - instances of ``Pl_Flate`` in deflate mode.
4556   -
4557   - - The method ``QPDFWriter::setRecompressFlate`` can be called to
4558   - tell ``QPDFWriter`` to uncompress and recompress streams
4559   - already compressed with ``/FlateDecode``.
4560   -
4561   - - The underlying implementation of QPDF arrays has been enhanced
4562   - to be much more memory efficient when dealing with arrays with
4563   - lots of nulls. This enables qpdf to use drastically less memory
4564   - for certain types of files.
4565   -
4566   - - When traversing the pages tree, if nodes are encountered with
4567   - invalid types, the types are fixed, and a warning is issued.
4568   -
4569   - - A new helper method ``QUtil::read_file_into_memory`` was added.
4570   -
4571   - - All conditions previously reported by
4572   - ``QPDF::checkLinearization()`` as errors are now presented as
4573   - warnings.
4574   -
4575   - - Name tokens containing the ``#`` character not preceded by two
4576   - hexadecimal digits, which is invalid in PDF 1.2 and above, are
4577   - properly handled by the library: a warning is generated, and
4578   - the name token is properly preserved, even if invalid, in the
4579   - output. See :file:`ChangeLog` for a more
4580   - complete description of this change.
4581   -
4582   - - Bug Fixes
4583   -
4584   - - A small handful of memory issues, assertion failures, and
4585   - unhandled exceptions that could occur on badly mangled input
4586   - files have been fixed. Most of these problems were found by
4587   - Google's OSS-Fuzz project.
4588   -
4589   - - When :command:`qpdf --check` or
4590   - :command:`qpdf --check-linearization` encounters
4591   - a file with linearization warnings but not errors, it now
4592   - properly exits with exit code 3 instead of 2.
4593   -
4594   - - The :samp:`--completion-bash` and
4595   - :samp:`--completion-zsh` options now work
4596   - properly when qpdf is invoked as an AppImage.
4597   -
4598   - - Calling ``QPDFWriter::set*EncryptionParameters`` on a
4599   - ``QPDFWriter`` object whose output filename has not yet been
4600   - set no longer produces a segmentation fault.
4601   -
4602   - - When reading encrypted files, follow the spec more closely
4603   - regarding encryption key length. This allows qpdf to open
4604   - encrypted files in most cases when they have invalid or missing
4605   - /Length keys in the encryption dictionary.
4606   -
4607   - - Build Changes
4608   -
4609   - - On platforms that support it, qpdf now builds with
4610   - :samp:`-fvisibility=hidden`. If you build qpdf
4611   - with your own build system, this is now safe to use. This
4612   - prevents methods that are not part of the public API from being
4613   - exported by the shared library, and makes qpdf's ELF shared
4614   - libraries (used on Linux, MacOS, and most other UNIX flavors)
4615   - behave more like the Windows DLL. Since the DLL already behaves
4616   - in much this way, it is unlikely that there are any methods
4617   - that were accidentally not exported. However, with ELF shared
4618   - libraries, typeinfo for some classes has to be explicitly
4619   - exported. If there are problems in dynamically linked code
4620   - catching exceptions or subclassing, this could be the reason.
4621   - If you see this, please report a bug at
4622   - https://github.com/qpdf/qpdf/issues/.
4623   -
4624   - - QPDF is now compiled with integer conversion and sign
4625   - conversion warnings enabled. Numerous changes were made to the
4626   - library to make this safe.
4627   -
4628   - - QPDF's :command:`make install` target explicitly
4629   - specifies the mode to use when installing files instead of
4630   - relying on the user's umask. It was previously doing this for some
4631   - files but not others.
4632   -
4633   - - If :command:`pkg-config` is available, use it to
4634   - locate :file:`libjpeg` and
4635   - :file:`zlib` dependencies, falling back on
4636   - old behavior if unsuccessful.
4637   -
4638   - - Other Notes
4639   -
4640   - - QPDF has been fully integrated into `Google's OSS-Fuzz
4641   - project <https://github.com/google/oss-fuzz>`__. This project
4642   - exercises code with randomly mutated inputs and is great for
4643   - discovering hidden crashes and security issues.
4644   - Several bugs found by oss-fuzz have already been fixed in qpdf.
4645   -
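The 9.0.0 notes describe ``QIntC`` as providing integer conversions that do range checking so that a cast loses no information. The sketch below shows one common way to implement that idea in plain C++11. It is not the ``QIntC`` code, and the name ``checked_cast`` and the choice of ``std::range_error`` are assumptions made for illustration.

```cpp
#include <stdexcept>
#include <type_traits>

// Hypothetical range-checked conversion, not qpdf's QIntC implementation.
// The value is converted, converted back, and compared with the original;
// a sign comparison catches signed/unsigned mismatches that survive the
// round trip (for example, -1 converted to unsigned int).
template <typename To, typename From>
To
checked_cast(From value)
{
    static_assert(
        std::is_integral<To>::value && std::is_integral<From>::value,
        "checked_cast only handles integer types");
    To converted = static_cast<To>(value);
    bool round_trips = (static_cast<From>(converted) == value);
    bool same_sign = ((converted < To(0)) == (value < From(0)));
    if (!(round_trips && same_sign)) {
        throw std::range_error("integer conversion would lose information");
    }
    return converted;
}
```

A call such as ``checked_cast<int>(42LL)`` returns the value unchanged, while ``checked_cast<unsigned int>(-1)`` throws instead of silently wrapping.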
4646   -8.4.2: May 18, 2019
4647   - This release has just one change: correction of a buffer overrun in
4648   - the Windows code used to open files. Windows users should take this
4649   - update. There are no code changes that affect non-Windows releases.
4650   -
4651   -8.4.1: April 27, 2019
4652   - - Enhancements
4653   -
4654   - - When :command:`qpdf --version` is run, it will
4655   - detect if the qpdf CLI was built with a different version of
4656   - qpdf than the library, which may indicate a problem with the
4657   - installation.
4658   -
4659   - - New option :samp:`--remove-page-labels` will
4660   - remove page labels before generating output. This used to
4661   - happen if you ran :command:`qpdf --empty --pages ..
4662   - --`, but the behavior changed in qpdf 8.3.0. This
4663   - option enables people who were relying on the old behavior to
4664   - get it again.
4665   -
4666   - - New option
4667   - :samp:`--keep-files-open-threshold={count}`
4668   - can be used to override number of files that qpdf will use to
4669   - trigger the behavior of not keeping all files open when merging
4670   - files. This may be necessary if your system allows fewer than
4671   - the default value of 200 files to be open at the same time.
4672   -
4673   - - Bug Fixes
4674   -
4675   - - Handle Unicode characters in filenames on Windows. The changes
4676   - to support Unicode on the CLI in Windows broke Unicode
4677   - filenames for Windows.
4678   -
4679   - - Slightly tighten logic that determines whether an object is a
4680   - page. This should resolve problems in some rare files where
4681   - some non-page objects were passing qpdf's test for whether
4682   - something was a page, thus causing them to be erroneously lost
4683   - during page splitting operations.
4684   -
4685   - - Revert change that included preservation of outlines
4686   - (bookmarks) in :samp:`--split-pages`. The way
4687   - it was implemented in 8.3.0 and 8.4.0 caused a very significant
4688   - degradation of performance for splitting certain files. A
4689   - future release of qpdf may re-introduce the behavior in a more
4690   - performant and also more correct fashion.
4691   -
4692   - - In JSON mode, add missing leading 0 to decimal values between
4693   - -1 and 1 even if not present in the input. The JSON
4694   - specification requires the leading 0. The PDF specification
4695   - does not.
4696   -
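The last 8.4.1 fix concerns decimal values such as ``.5`` or ``-.25``, which PDF accepts but JSON does not. The following sketch shows the kind of normalization involved. The function ``add_leading_zero`` is hypothetical, handles only an optional leading minus sign, and is not taken from the qpdf sources.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of qpdf. If the digits of a decimal string
// start with '.', insert a '0' before it so that the result is a valid JSON
// number: ".5" becomes "0.5" and "-.25" becomes "-0.25". Other input is
// returned unchanged.
std::string
add_leading_zero(std::string const& number)
{
    // Skip an optional leading minus sign.
    std::size_t digits_start = (!number.empty() && number[0] == '-') ? 1 : 0;
    if (digits_start < number.size() && number[digits_start] == '.') {
        std::string fixed = number;
        fixed.insert(digits_start, "0");
        return fixed;
    }
    return number;
}
```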
4697   -8.4.0: February 1, 2019
4698   - - Command-line Enhancements
4699   -
4700   - - *Non-compatible CLI change:* The qpdf command-line tool
4701   - interprets passwords given at the command-line differently from
4702   - previous releases when the passwords contain non-ASCII
4703   - characters. In some cases, the behavior differs from previous
4704   - releases. For a discussion of the current behavior, please see
4705   - :ref:`ref.unicode-passwords`. The
4706   - incompatibilities are as follows:
4707   -
4708   - - On Windows, qpdf now receives all command-line options as
4709   - Unicode strings if it can figure out the appropriate
4710   - compile/link options. This is enabled at least for MSVC and
4711   - mingw builds. That means that if non-ASCII strings are
4712   - passed to the qpdf CLI in Windows, qpdf will now correctly
4713   - receive them. In the past, they would have either been
4714   - encoded as Windows code page 1252 (also known as "Windows
4715   - ANSI") or as something unintelligible. In almost all cases,
4716   - qpdf is able to properly interpret Unicode arguments now,
4717   - whereas in the past, it would almost never interpret them
4718   - properly. The result is that non-ASCII passwords given to
4719   - the qpdf CLI on Windows now have a much greater chance of
4720   - creating PDF files that can be opened by a variety of
4721   - readers. In the past, usually files encrypted from the
4722   - Windows CLI using non-ASCII passwords would not be readable
4723   - by most viewers. Note that the current version of qpdf is
4724   - able to decrypt files that it previously created using the
4725   - previously supplied password.
4726   -
4727   - - The PDF specification requires passwords to be encoded as
4728   - UTF-8 for 256-bit encryption and with PDF Doc encoding for
4729   - 40-bit or 128-bit encryption. Older versions of qpdf left it
4730   - up to the user to provide passwords with the correct
4731   - encoding. The qpdf CLI now detects when a password is given
4732   - with UTF-8 encoding and automatically transcodes it to what
4733   - the PDF spec requires. While this is almost always the
4734   - correct behavior, it is possible to override the behavior if
4735   - there is some reason to do so. This is discussed in more
4736   - depth in :ref:`ref.unicode-passwords`.
4737   -
4738   - - New options
4739   - :samp:`--externalize-inline-images`,
4740   - :samp:`--ii-min-bytes`, and
4741   - :samp:`--keep-inline-images` control qpdf's
4742   - handling of inline images and possible conversion of them to
4743   - regular images. By default,
4744   - :samp:`--optimize-images` now also applies to
4745   - inline images. These options are discussed in :ref:`ref.advanced-transformation`.
4746   -
4747   - - Add options :samp:`--overlay` and
4748   - :samp:`--underlay` for overlaying or
4749   - underlaying pages of other files onto output pages. See
4750   - :ref:`ref.overlay-underlay` for
4751   - details.
4752   -
4753   - - When opening an encrypted file with a password, if the
4754   - specified password doesn't work and the password contains any
4755   - non-ASCII characters, qpdf will try a number of alternative
4756   - passwords to try to compensate for possible character encoding
4757   - errors. This behavior can be suppressed with the
4758   - :samp:`--suppress-password-recovery` option.
4759   - See :ref:`ref.unicode-passwords` for a full
4760   - discussion.
4761   -
4762   - - Add the :samp:`--password-mode` option to
4763   - fine-tune how qpdf interprets password arguments, especially
4764   - when they contain non-ASCII characters. See :ref:`ref.unicode-passwords` for more information.
4765   -
4766   - - In the :samp:`--pages` option, it is now
4767   - possible to copy the same page more than once from the same
4768   - file without using the previous workaround of specifying two
4769   - different paths to the same file.
4770   -
4771   - - In the :samp:`--pages` option, allow use of "."
4772   - as a shortcut for the primary input file. That way, you can do
4773   - :command:`qpdf in.pdf --pages . 1-2 -- out.pdf`
4774   - instead of having to repeat :file:`in.pdf`
4775   - in the command.
4776   -
4777   - - When encrypting with 128-bit and 256-bit encryption, new
4778   - encryption options :samp:`--assemble`,
4779   - :samp:`--annotate`,
4780   - :samp:`--form`, and
4781   - :samp:`--modify-other` allow more fine-grained
4782   - control in configuring options. Before, the
4783   - :samp:`--modify` option only configured certain
4784   - predefined groups of permissions.
4785   -
4786   - - Bug Fixes and Enhancements
4787   -
4788   - - *Potential data-loss bug:* Versions of qpdf between 8.1.0 and
4789   - 8.3.0 had a bug that could cause page splitting and merging
4790   - operations to drop some font or image resources if the PDF
4791   - file's internal structure shared these resource lists across
4792   - pages and if some but not all of the pages in the output did
4793   - not reference all the fonts and images. Using the
4794   - :samp:`--preserve-unreferenced-resources`
4795   - option would work around the incorrect behavior. This bug was
4796   - the result of a typo in the code and a deficiency in the test
4797   - suite. The case that triggered the error was known, just not
4798   - handled properly. This case is now exercised in qpdf's test
4799   - suite and properly handled.
4800   -
4801   - - When optimizing images, detect and refuse to optimize images
4802   - that can't be converted to JPEG because of bit depth or color
4803   - space.
4804   -
4805   - - Linearization and page manipulation APIs now detect and recover
4806   - from files that have duplicate Page objects in the pages tree.
4807   -
4808   - - When the older option
4809   - :samp:`--stream-data=compress` was used with object
4810   - streams, object streams and xref streams were not compressed. This is now fixed.
4811   -
4812   - - When the tokenizer returns inline image tokens, delimiters
4813   - following ``ID`` and ``EI`` operators are no longer excluded.
4814   - This makes it possible to reliably extract the actual image
4815   - data.
4816   -
4817   - - Library Enhancements
4818   -
4819   - - Add method ``QPDFPageObjectHelper::externalizeInlineImages`` to
4820   - convert inline images to regular images.
4821   -
4822   - - Add method ``QUtil::possible_repaired_encodings()`` to generate
4823   - a list of strings that represent other ways the given string
4824   - could have been encoded. This is the method the QPDF CLI uses
4825   - to generate the strings it tries when recovering incorrectly
4826   - encoded Unicode passwords.
4827   -
4828   - - Add new versions of
4829   - ``QPDFWriter::setR{3,4,5,6}EncryptionParameters`` that allow
4830   - more granular setting of permissions bits. See
4831   - :file:`QPDFWriter.hh` for details.
4832   -
4833   - - Add new versions of the transcoders from UTF-8 to single-byte
4834   - coding systems in ``QUtil`` that report success or failure
4835   - rather than just substituting a specified unknown character.
4836   -
4837   - - Add method ``QUtil::analyze_encoding()`` to determine whether a
4838   - string has high-bit characters and appears to be UTF-16 or
4839   - valid UTF-8 encoding.
4840   -
4841   - - Add new method ``QPDFPageObjectHelper::shallowCopyPage()`` to
4842   - copy a new page that is a "shallow copy" of a page. The
4843   - resulting object is an indirect object ready to be passed to
4844   - ``QPDFPageDocumentHelper::addPage()`` for either the original
4845   - ``QPDF`` object or a different one. This is what the
4846   - :command:`qpdf` command-line tool uses to copy
4847   - the same page multiple times from the same file during
4848   - splitting and merging operations.
4849   -
4850   - - Add method ``QPDF::getUniqueId()``, which returns a unique
4851   - identifier for the given QPDF object. The identifier will be
4852   - unique across the life of the application. The returned value
4853   - can be safely used as a map key.
4854   -
4855   - - Add method ``QPDF::setImmediateCopyFrom``. This further
4856   - enhances qpdf's ability to allow a ``QPDF`` object from which
4857   - objects are being copied to go out of scope before the
4858   - destination object is written. If you call this method on a
4859   - ``QPDF`` instance, objects copied *from* this instance will be
4860   - copied immediately instead of lazily. This option uses more
4861   - memory but allows the source object to go out of scope before
4862   - the destination object is written in all cases. See comments in
4863   - :file:`QPDF.hh` for details.
4864   -
4865   - - Add method ``QPDFPageObjectHelper::getAttribute`` for
4866   - retrieving an attribute from the page dictionary taking
4867   - inheritance into consideration, and optionally making a copy if
4868   - your intention is to modify the attribute.
4869   -
4870   - - Fix long-standing limitation of
4871   - ``QPDFPageObjectHelper::getPageImages`` so that it now properly
4872   - reports images from inherited resources dictionaries,
4873   - eliminating the need to call
4874   - ``QPDFPageDocumentHelper::pushInheritedAttributesToPage`` in
4875   - this case.
4876   -
4877   - - Add method ``QPDFObjectHandle::getUniqueResourceName`` for
4878   - finding an unused name in a resource dictionary.
4879   -
4880   - - Add method ``QPDFPageObjectHelper::getFormXObjectForPage`` for
4881   - generating a form XObject equivalent to a page. The resulting
4882   - object can be used in the same file or copied to another file
4883   - with ``copyForeignObject``. This can be useful for implementing
4884   - underlay, overlay, n-up, thumbnails, or any other functionality
4885   - requiring replication of pages in other contexts.
4886   -
4887   - - Add method ``QPDFPageObjectHelper::placeFormXObject`` for
4888   - generating content stream text that places a given form XObject
4889   - on a page, centered and fit within a specified rectangle. This
4890   - method takes care of computing the proper transformation matrix
4891   - and may optionally compensate for rotation or scaling of the
4892   - destination page.
4893   -
4894   - - Build Improvements
4895   -
4896   - - Add new configure option
4897   - :samp:`--enable-avoid-windows-handle`, which
4898   - causes the preprocessor symbol ``AVOID_WINDOWS_HANDLE`` to be
4899   - defined. When defined, qpdf will avoid referencing the Windows
4900   - ``HANDLE`` type, which is disallowed with certain versions of
4901   - the Windows SDK.
4902   -
4903   - - For Windows builds, attempt to determine what options, if any,
4904   - have to be passed to the compiler and linker to enable use of
4905   - ``wmain``. This causes the preprocessor symbol
4906   - ``WINDOWS_WMAIN`` to be defined. If you do your own builds with
4907   - other compilers, you can define this symbol to cause ``wmain``
4908   - to be used. This is needed to allow the Windows
4909   - :command:`qpdf` command to receive Unicode
4910   - command-line options.
4911   -
4912   -8.3.0: January 7, 2019
4913   - - Command-line Enhancements
4914   -
4915   - - Shell completion: you can now use eval :command:`$(qpdf
4916   - --completion-bash)` and eval :command:`$(qpdf
4917   - --completion-zsh)` to enable shell completion for
4918   - bash and zsh.
4919   -
4920   - - Page numbers (also known as page labels) are now preserved when
4921   - merging and splitting files with the
4922   - :samp:`--pages` and
4923   - :samp:`--split-pages` options.
4924   -
4925   - - Bookmarks are partially preserved when splitting pages with the
4926   - :samp:`--split-pages` option. Specifically, the
4927   - outlines dictionary and some supporting metadata are copied
4928   - into the split files. The result is that all bookmarks from the
4929   - original file appear, those that point to pages that are
4930   - preserved work, and those that point to pages that are not
4931   - preserved don't do anything. This is an interim step toward
4932   - proper support for bookmarks in splitting and merging
4933   - operations.
4934   -
4935   - - Page collation: add new option
4936   - :samp:`--collate`. When specified, the
4937   - semantics of :samp:`--pages` change from
4938   - concatenation to collation. See :ref:`ref.page-selection` for examples and discussion.
4939   -
4940   - - Generation of information in JSON format, primarily to
4941   - facilitate use of qpdf from languages other than C++. Add new
4942   - options :samp:`--json`,
4943   - :samp:`--json-key`, and
4944   - :samp:`--json-object` to generate a JSON
4945   - representation of the PDF file. Run :command:`qpdf
4946   - --json-help` to get a description of the JSON
4947   - format. For more information, see :ref:`ref.json`.
4948   -
4949   - - The :samp:`--generate-appearances` flag will
4950   - cause qpdf to generate appearances for form fields if the PDF
4951   - file indicates that form field appearances are out of date.
4952   - This can happen when PDF forms are filled in by a program that
4953   - doesn't know how to regenerate the appearances of the filled-in
4954   - fields.
4955   -
4956   - - The :samp:`--flatten-annotations` flag can be
4957   - used to *flatten* annotations, including form fields.
4958   - Ordinarily, annotations are drawn separately from the page.
4959   - Flattening annotations is the process of combining their
4960   - appearances into the page's contents. You might want to do this
4961   - if you are going to rotate or combine pages using a tool that
4962   - doesn't understand annotations. You may also want to use
4963   - :samp:`--generate-appearances` when using this
4964   - flag since annotations for outdated form fields are not
4965   - flattened as that would cause loss of information.
4966   -
4967   - - The :samp:`--optimize-images` flag tells qpdf
4968   - to recompress every image using DCT (JPEG) compression as
4969   - long as the image is not already compressed with lossy
4970   - compression and recompressing the image reduces its size. The
4971   - additional options :samp:`--oi-min-width`,
4972   - :samp:`--oi-min-height`, and
4973   - :samp:`--oi-min-area` prevent recompression of
4974   - images whose width, height, or pixel area (width × height) are
4975   - below a specified threshold.
4976   -
4977   - - The :samp:`--show-object` option can now be
4978   - given as :samp:`--show-object=trailer` to show
4979   - the trailer dictionary.
4980   -
4981   - - Bug Fixes and Enhancements
4982   -
4983   - - QPDF now automatically detects and recovers from dangling
4984   - references. If a PDF file contained an indirect reference to a
4985   - non-existent object, which is valid, when adding a new object
4986   - to the file, it was possible for the new object to take the
4987   - object ID of the dangling reference, thereby causing the
4988   - dangling reference to point to the new object. This case is now
4989   - prevented.
4990   -
4991   - - Fixes to form field setting code: strings are always written in
4992   - UTF-16 format, and checkboxes and radio buttons are handled
4993   - properly with respect to synchronization of values and
4994   - appearance states.
4995   -
4996   - - The ``QPDF::checkLinearization()`` method no longer causes the program
4997   - to crash when it detects problems with linearization data.
4998   - Instead, it issues a normal warning or error.
4999   -
5000   - - Ordinarily qpdf treats an argument of the form
5001   - :samp:`@file` to mean that command-line options
5002   - should be read from :file:`file`. Now, if
5003   - :file:`file` does not exist but
5004   - :file:`@file` does, qpdf will treat
5005   - :file:`@file` as a regular option. This
5006   - makes it possible to work more easily with PDF files whose
5007   - names happen to start with the ``@`` character.
5008   -
5009   - - Library Enhancements
5010   -
5011   - - Remove the restriction in most cases that the source QPDF
5012   - object used in a ``QPDF::copyForeignObject`` call has to stick
5013   - around until the destination QPDF is written. The exceptional
5014   - case is when the source stream gets its data using a
5015   - QPDFObjectHandle::StreamDataProvider. For a more in-depth
5016   - discussion, see comments around ``copyForeignObject`` in
5017   - :file:`QPDF.hh`.
5018   -
5019   - - Add new method ``QPDFWriter::getFinalVersion()``, which returns
5020   - the PDF version that will ultimately be written to the final
5021   - file. See comments in :file:`QPDFWriter.hh`
5022   - for some restrictions on its use.
5023   -
5024   - - Add several methods for transcoding strings to some of the
5025   - character sets used in PDF files: ``QUtil::utf8_to_ascii``,
5026   - ``QUtil::utf8_to_win_ansi``, ``QUtil::utf8_to_mac_roman``, and
5027   - ``QUtil::utf8_to_utf16``. For the single-byte encodings that
5028   - support only a limited character set, these methods replace
5029   - unsupported characters with a specified substitute.
5030   -
5031   - - Add new methods to ``QPDFAnnotationObjectHelper`` and
5032   - ``QPDFFormFieldObjectHelper`` for querying flags and
5033   - interpretation of different field types. Define constants in
5034   - :file:`qpdf/Constants.h` to help with
5035   - interpretation of flag values.
5036   -
5037   - - Add new methods
5038   - ``QPDFAcroFormDocumentHelper::generateAppearancesIfNeeded`` and
5039   - ``QPDFFormFieldObjectHelper::generateAppearance`` for
5040   - generating appearance streams. See discussion in
5041   - :file:`QPDFFormFieldObjectHelper.hh` for
5042   - limitations.
5043   -
5044   - - Add two new helper functions for dealing with resource
5045   - dictionaries: ``QPDFObjectHandle::getResourceNames()`` returns
5046   - a list of all second-level keys, which correspond to the names
5047   - of resources, and ``QPDFObjectHandle::mergeResources()`` merges
5048   - two resource dictionaries as long as they have non-conflicting
5049   - keys. These methods are useful for certain types of objects
5050   - that resolve resources from multiple places, such as form
5051   - fields.
5052   -
5053   - - Add methods ``QPDFPageDocumentHelper::flattenAnnotations()``
5054   - and
5055   - ``QPDFAnnotationObjectHelper::getPageContentForAppearance()``
5056   - for handling low-level details of annotation flattening.
5057   -
5058   - - Add new helper classes: ``QPDFOutlineDocumentHelper``,
5059   - ``QPDFOutlineObjectHelper``, ``QPDFPageLabelDocumentHelper``,
5060   - ``QPDFNameTreeObjectHelper``, and
5061   - ``QPDFNumberTreeObjectHelper``.
5062   -
5063   - - Add method ``QPDFObjectHandle::getJSON()`` that returns a JSON
5064   - representation of the object. Call ``serialize()`` on the
5065   - result to convert it to a string.
5066   -
5067   - - Add a simple JSON serializer. This is not a complete or
5068   - general-purpose JSON library. It allows assembly and
5069   - serialization of JSON structures with some restrictions, which
5070   - are described in the header file. This is the serializer used
5071   - by qpdf's new JSON representation.
5072   -
5073   - - Add new ``QPDFObjectHandle::Matrix`` class along with a few
5074   - convenience methods for dealing with six-element numerical
5075   - arrays as matrices.
5076   -
5077   - - Add new method ``QPDFObjectHandle::wrapInArray``, which returns
5078   - the object itself if it is an array, or an array containing the
5079   - object otherwise. This is a common construct in PDF. This
5080   - method prevents you from having to explicitly test whether
5081   - something is a single element or an array.
5082   -
5083   - - Build Improvements
5084   -
5085   - - It is no longer necessary to run
5086   - :command:`autogen.sh` to build from a pristine
5087   - checkout. Automatically generated files are now committed so
5088   - that it is possible to build on platforms without autoconf
5089   - directly from a clean checkout of the repository. The
5090   - :command:`configure` script detects if the files
5091   - are out of date when it also determines that the tools are
5092   - present to regenerate them.
5093   -
5094   - - Pull requests and the master branch are now built automatically
5095   - in `Azure
5096   - Pipelines <https://dev.azure.com/qpdf/qpdf/_build>`__, which is
5097   - free for open source projects. The build includes Linux, mac,
5098   - Windows 32-bit and 64-bit with mingw and MSVC, and an AppImage
5099   - build. Official qpdf releases are now built with Azure
5100   - Pipelines.
5101   -
5102   - - Notes for Packagers
5103   -
5104   - - A new section has been added to the documentation with notes
5105   - for packagers. Please see :ref:`ref.packaging`.
5106   -
5107   - - The qpdf build detects out-of-date automatically generated files. If
5108   - your packaging system automatically refreshes libtool or
5109   - autoconf files, it could cause this check to fail. To avoid
5110   - this problem, pass
5111   - :samp:`--disable-check-autofiles` to
5112   - :command:`configure`.
5113   -
5114   - - If you would like to have qpdf completion enabled
5115   - automatically, you can install completion files in the
5116   - distribution's default location. You can find sample completion
5117   - files to install in the :file:`completions`
5118   - directory.
5119   -
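The ``QPDFObjectHandle::wrapInArray`` behavior described in the 8.3.0 library enhancements can be pictured with a small standalone sketch. It is written in Python for brevity and does not call the qpdf library: ``wrap_in_array`` is a hypothetical stand-in that works on plain Python values, with a list playing the role of a PDF array.

```python
def wrap_in_array(obj):
    """Return obj unchanged if it is already a list, else a one-element list."""
    # Hypothetical model of the behavior described for
    # QPDFObjectHandle::wrapInArray; a Python list stands in for a PDF array.
    return obj if isinstance(obj, list) else [obj]


def count_items(value):
    # The caller no longer needs to test "single element or array?" first.
    return len(wrap_in_array(value))
```

With this helper, a value that may be either a single object or an array of objects can be handled by one code path, which is the convenience the release note describes.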
5120   -8.2.1: August 18, 2018
5121   - - Command-line Enhancements
5122   -
5123   - - Add
5124   - :samp:`--keep-files-open={[yn]}`
5125   - to override default determination of whether to keep files open
5126   - when merging. Please see the discussion of
5127   - :samp:`--keep-files-open` in :ref:`ref.basic-options` for additional details.
5128   -
5129   -8.2.0: August 16, 2018
5130   - - Command-line Enhancements
5131   -
5132   - - Add :samp:`--no-warn` option to suppress
5133   - issuing warning messages. If there are any conditions that
5134   - would have caused warnings to be issued, the exit status is
5135   - still 3.
5136   -
5137   - - Bug Fixes and Optimizations
5138   -
5139   - - Performance fix: optimize page merging operation to avoid
5140   - unnecessary open/close calls on files being merged. This solves
5141   - a dramatic slow-down that was observed when merging certain
5142   - types of files.
5143   -
5144   - - Optimize how memory was used for the TIFF predictor,
5145   - drastically improving performance and memory usage for files
5146   - containing high-resolution images compressed with Flate using
5147   - the TIFF predictor.
5148   -
5149   - - Bug fix: end of line characters were not properly handled
5150   - inside strings in some cases.
5151   -
5152   - - Bug fix: using :samp:`--progress` on very small
5153   - files could cause an infinite loop.
5154   -
5155   - - API enhancements
5156   -
5157   - - Add new class ``QPDFSystemError``, derived from
5158   - ``std::runtime_error``, which is now thrown by
5159   - ``QUtil::throw_system_error``. This enables the triggering
5160   - ``errno`` value to be retrieved.
5161   -
5162   - - Add ``ClosedFileInputSource::stayOpen`` method, enabling a
5163   - ``ClosedFileInputSource`` to stay open during manually
5164   - indicated periods of high activity, thus reducing the overhead
5165   - of frequent open/close operations.
5166   -
5167   - - Build Changes
5168   -
5169   - - For the mingw builds, change the name of the DLL import library
5170   - from :file:`libqpdf.a` to
5171   - :file:`libqpdf.dll.a` to more accurately
5172   - reflect that it is an import library rather than a static
5173   - library. This potentially clears the way for supporting a
5174   - static library in the future, though presently, the qpdf
5175   - Windows build only builds the DLL and executables.
5176   -
5177   -8.1.0: June 23, 2018
5178   - - Usability Improvements
5179   -
5180   - - When splitting files, qpdf detects fonts and images that the
5181   - document metadata claims are referenced from a page but are not
5182   - actually referenced and omits them from the output file. This
5183   - change can cause a significant reduction in the size of split
5184   - PDF files for files created by some software packages. In some
5185   - cases, it can also make page splitting slower. Prior versions
5186   - of qpdf would believe the document metadata and sometimes
5187   - include all the images from all the other pages even though the
5188   - pages were no longer present. In the unlikely event that the
5189   - old behavior should be desired, or if you have a case where
5190   - page splitting is very slow, the old behavior (and speed) can
5191   - be enabled by specifying
5192   - :samp:`--preserve-unreferenced-resources`. For
5193   - additional details, please see :ref:`ref.advanced-transformation`.
5194   -
5195   - - When merging multiple PDF files, qpdf no longer leaves all the
5196   - files open. This makes it possible to merge numbers of files
5197   - that may exceed the operating system's limit for the maximum
5198   - number of open files.
5199   -
5200   - - The :samp:`--rotate` option's syntax has been
5201   - extended to make the page range optional. If you specify
5202   - :samp:`--rotate={angle}`
5203   - without specifying a page range, the rotation will be applied
5204   - to all pages. This can be especially useful for adjusting a PDF
5205   - created from a multi-page document that was scanned upside
5206   - down.
5207   -
5208   - - When merging multiple files, the
5209   - :samp:`--verbose` option now prints information
5210   - about each file as it operates on that file.
5211   -
5212   - - When the :samp:`--progress` option is
5213   - specified, qpdf will print a running indicator of its best
5214   - guess at how far through the writing process it is. Note that,
5215   - as with all progress meters, it's an approximation. This option
5216   - is implemented in a way that makes it useful for software that
5217   - uses the qpdf library; see API Enhancements below.
5218   -
5219   - - Bug Fixes
5220   -
5221   - - Properly decrypt files that use revision 3 of the standard
5222   - security handler but use 40-bit keys (even though revision 3
5223   - supports 128-bit keys).
5224   -
5225   - - Limit depth of nested data structures to prevent crashes from
5226   - certain types of malformed (malicious) PDFs.
5227   -
5228   - - In "newline before endstream" mode, insert the required extra
5229   - newline before the ``endstream`` at the end of object streams.
5230   - This one case was previously omitted.
5231   -
5232   - - API Enhancements
5233   -
5234   - - The first round of higher level "helper" interfaces has been
5235   - introduced. These are designed to provide a more convenient way
5236   - of interacting with certain document features than using
5237   - ``QPDFObjectHandle`` directly. For details on helpers, see
5238   - :ref:`ref.helper-classes`. Specific additional
5239   - interfaces are described below.
5240   -
5241   - - Add two new document helper classes: ``QPDFPageDocumentHelper``
5242   - for working with pages, and ``QPDFAcroFormDocumentHelper`` for
5243   - working with interactive forms. No old methods have been
5244   - removed, but ``QPDFPageDocumentHelper`` is now the preferred
5245   - way to perform operations on pages rather than calling the old
5246   - methods in ``QPDFObjectHandle`` and ``QPDF`` directly. Comments
5247   - in the header files direct you to the new interfaces. Please
5248   - see the header files and :file:`ChangeLog`
5249   - for additional details.
5250   -
5251   - - Add three new object helper classes: ``QPDFPageObjectHelper`` for
5252   - pages, ``QPDFFormFieldObjectHelper`` for interactive form
5253   - fields, and ``QPDFAnnotationObjectHelper`` for annotations. All
5254   - three classes are fairly sparse at the moment, but they have
5255   - some useful, basic functionality.
5256   -
5257   - - A new example program
5258   - :file:`examples/pdf-set-form-values.cc` has
5259   - been added that illustrates use of the new document and object
5260   - helpers.
5261   -
5262   - - The method ``QPDFWriter::registerProgressReporter`` has been
5263   - added. This method allows you to register a function that is
5264   - called by ``QPDFWriter`` to report its estimate of the percentage
5265   - of its output that it has written. Client programs can
5266   - use this to implement reasonably accurate progress meters. The
5267   - :command:`qpdf` command line tool uses this to
5268   - implement its :samp:`--progress` option.
5269   -
5270   - - New methods ``QPDFObjectHandle::newUnicodeString`` and
5271   - ``QPDFObject::unparseBinary`` have been added to allow for more
5272   - convenient creation of strings that are explicitly encoded
5273   - using big-endian UTF-16. This is useful for creating strings
5274   - that appear outside of content streams, such as labels, form
5275   - fields, outlines, document metadata, etc.
5276   -
5277   - - A new class ``QPDFObjectHandle::Rectangle`` has been added to
5278   - ease working with PDF rectangles, which are just arrays of four
5279   - numeric values.
5280   -
5281   -8.0.2: March 6, 2018
5282   - - When a loop is detected while following cross reference streams or
5283   - tables, treat this as damage instead of silently ignoring the
5284   - previous table. This prevents loss of otherwise recoverable data
5285   - in some damaged files.
5286   -
5287   - - Properly handle pages with no contents.
5288   -
5289   -8.0.1: March 4, 2018
5290   - - Disregard data check errors when uncompressing ``/FlateDecode``
5291   - streams. This is consistent with most other PDF readers and allows
5292   - qpdf to recover data from another class of malformed PDF files.
5293   -
5294   - - On the command line when specifying page ranges, support preceding
5295   - a page number by "r" to indicate that it should be counted from
5296   - the end. For example, the range ``r3-r1`` would indicate the last
5297   - three pages of a document.
5298   -
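The "r" page-number syntax introduced in 8.0.1 can be illustrated with a short standalone sketch. This is not qpdf's parser: ``resolve_page`` and ``expand_range`` are hypothetical names, and the sketch handles only a single ascending ``first-last`` range.

```python
def resolve_page(token, npages):
    """Convert a page token such as "5" or "r2" to a 1-based page number."""
    if token.startswith("r"):
        # "r1" is the last page, "r2" the one before it, and so on.
        return npages - int(token[1:]) + 1
    return int(token)


def expand_range(spec, npages):
    """Expand one ascending "first-last" range into a list of page numbers."""
    first, last = (resolve_page(t, npages) for t in spec.split("-"))
    return list(range(first, last + 1))
```

For a 10-page document, ``expand_range("r3-r1", 10)`` yields pages 8, 9, and 10, matching the "last three pages" reading given in the release note.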
5299   -8.0.0: February 25, 2018
5300   - - Packaging and Distribution Changes
5301   -
5302   - - QPDF is now distributed as an
5303   - `AppImage <https://appimage.org/>`__ in addition to all the
5304   - other ways it is distributed. The AppImage can be found in the
5305   - download area with the other packages. Thanks to Kurt Pfeifle
5306   - and Simon Peter for their contributions.
5307   -
5308   - - Bug Fixes
5309   -
5310   - - ``QPDFObjectHandle::getUTF8Val`` now properly treats
5311   - non-Unicode strings as encoded with PDF Doc Encoding.
5312   -
5313   - - Improvements to handling of objects in PDF files that are not
5314   - of the expected type. In most cases, qpdf will be able to warn
5315   - for such cases rather than fail with an exception. Previous
5316   - versions of qpdf would sometimes fail with errors such as
5317   - "operation for dictionary object attempted on object of wrong
5318   - type". This situation should be mostly or entirely eliminated
5319   - now.
5320   -
5321   - - Enhancements to the :command:`qpdf` Command-line
5322   - Tool. All new options listed here are documented in more detail in
5323   - :ref:`ref.using`.
5324   -
5325   - - The option
5326   - :samp:`--linearize-pass1={file}`
5327   - has been added for debugging qpdf's linearization code.
5328   -
5329   - - The option :samp:`--coalesce-contents` can be
5330   - used to combine content streams of a page whose contents are an
5331   - array of streams into a single stream.
5332   -
5333   - - API Enhancements. All new API calls are documented in their
5334   - respective classes' header files. There are no non-compatible
5335   - changes to the API.
5336   -
5337   - - Add function ``qpdf_check_pdf`` to the C API. This function
5338   - does basic checking that is a subset of what :command:`qpdf
5339   - --check` performs.
5340   -
5341   - - Major enhancements to the lexical layer of qpdf. For a complete
5342   - list of enhancements, please refer to the
5343   - :file:`ChangeLog` file. Most of the changes
5344   - result in improvements to qpdf's ability to handle erroneous
5345   - files. It is also possible for programs to handle whitespace,
5346   - comments, and inline images as tokens.
5347   -
5348   - - New API for working with PDF content streams at a lexical
5349   - level. The new class ``QPDFObjectHandle::TokenFilter`` allows
5350   - the developer to provide token handlers. Token filters can be
5351   - used with several different methods in ``QPDFObjectHandle`` as
5352   - well as with a lower-level interface. See comments in
5353   - :file:`QPDFObjectHandle.hh` as well as the
5354   - new examples
5355   - :file:`examples/pdf-filter-tokens.cc` and
5356   - :file:`examples/pdf-count-strings.cc` for
5357   - details.
5358   -
5359   -7.1.1: February 4, 2018
5360   - - Bug fix: files whose /ID fields were other than 16 bytes long can
5361   - now be properly linearized.
5362   -
5363   - - A few compile and link issues have been corrected for some
5364   - platforms.
5365   -
5366   -7.1.0: January 14, 2018
5367   - - PDF files contain streams that may be compressed with various
5368   - compression algorithms which, in some cases, may be enhanced by
5369   - various predictor functions. Previously only the PNG up predictor
5370   - was supported. In this version, all the PNG predictors as well as
5371   - the TIFF predictor are supported. This increases the range of
5372   - files that qpdf is able to handle.
5373   -
5374   - - QPDF now allows a raw encryption key to be specified in place of a
5375   - password when opening encrypted files, and will optionally display
5376   - the encryption key used by a file. This is a non-standard
5377   - operation, but it can be useful in certain situations. Please see
5378   - the discussion of :samp:`--password-is-hex-key` in
5379   - :ref:`ref.basic-options` or the comments around
5380   - ``QPDF::setPasswordIsHexKey`` in
5381   - :file:`QPDF.hh` for additional details.
5382   -
5383   - - Bug fix: numbers ending with a trailing decimal point are now
5384   - properly recognized as numbers.
5385   -
5386   - - Bug fix: when building qpdf from source on some platforms
5387   - (especially MacOS), the build could get confused by older versions
5388   - of qpdf installed on the system. This has been corrected.
5389   -
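For readers unfamiliar with the predictor functions mentioned in the 7.1.0 notes, here is a minimal sketch of reversing the PNG "Up" predictor, in which each stored byte is the difference from the byte directly above it in the previous row. It is an illustration only, not qpdf's implementation: ``undo_png_up`` is a hypothetical name, and the rows are assumed to be equal in length with any per-row filter-type byte already stripped.

```python
def undo_png_up(rows):
    """Reverse the PNG "Up" filter for a list of equal-length byte rows."""
    # The row above the first row is treated as all zeros.
    previous = bytes(len(rows[0])) if rows else b""
    decoded = []
    for row in rows:
        # Add the byte directly above, wrapping around modulo 256.
        current = bytes((b + above) & 0xFF for b, above in zip(row, previous))
        decoded.append(current)
        previous = current
    return decoded
```

The other PNG predictors and the TIFF predictor differ in which neighboring bytes they combine, but they follow the same pattern of rebuilding each row from already decoded data.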
5390   -7.0.0: September 15, 2017
5391   - - Packaging and Distribution Changes
5392   -
5393   - - QPDF's primary license is now `version 2.0 of the Apache
5394   - License <http://www.apache.org/licenses/LICENSE-2.0>`__ rather
5395   - than version 2.0 of the Artistic License. You may still, at
5396   - your option, consider qpdf to be licensed with version 2.0 of
5397   - the Artistic license.
5398   -
5399   - - QPDF no longer has a dependency on the PCRE (Perl-Compatible
5400   - Regular Expression) library. QPDF now has an added dependency
5401   - on the JPEG library.
5402   -
5403   - - Bug Fixes
5404   -
5405   - - This release contains many bug fixes for various infinite
5406   - loops, memory leaks, and other memory errors that could be
5407   - encountered with specially crafted or otherwise erroneous PDF
5408   - files.
5409   -
5410   - - New Features
5411   -
5412   - - QPDF now supports reading and writing streams encoded with JPEG
5413   - or RunLength encoding. Library API enhancements and
5414   - command-line options have been added to control this behavior.
5415   - See command-line options
5416   - :samp:`--compress-streams` and
5417   - :samp:`--decode-level` and methods
5418   - ``QPDFWriter::setCompressStreams`` and
5419   - ``QPDFWriter::setDecodeLevel``.
5420   -
5421   - - QPDF is much better at recovering from broken files. In most
5422   - cases, qpdf will skip invalid objects and will preserve broken
5423   - stream data by not attempting to filter broken streams. QPDF is
5424   - now able to recover or at least not crash on dozens of broken
5425   - test files I have received over the past few years.
5426   -
5427   - - Page rotation is now supported and accessible from both the
5428   - library and the command line.
5429   -
5430   - - ``QPDFWriter`` supports writing files in a way that preserves
5431   - PCLm compliance in support of driverless printing. This is very
5432   - specialized and is only useful to applications that already
5433   - know how to create PCLm files.
5434   -
5435   - - Enhancements to the :command:`qpdf` Command-line
5436   - Tool. All new options listed here are documented in more detail in
5437   - :ref:`ref.using`.
5438   -
5439   - - Command-line arguments can now be read from files or standard
5440   - input using ``@file`` or ``@-`` syntax. Please see :ref:`ref.invocation`.
5441   -
5442   - - :samp:`--rotate`: request page rotation
5443   -
5444   - - :samp:`--newline-before-endstream`: ensure that
5445   - a newline appears before every ``endstream`` keyword in the
5446   - file; used to prevent qpdf from breaking PDF/A compliance on
5447   - already compliant files.
5448   -
5449   - - :samp:`--preserve-unreferenced`: preserve
5450   - unreferenced objects in the input PDF
5451   -
5452   - - :samp:`--split-pages`: break output into chunks
5453   - with fixed numbers of pages
5454   -
5455   - - :samp:`--verbose`: print the name of each
5456   - output file that is created
5457   -
5458   - - :samp:`--compress-streams` and
5459   - :samp:`--decode-level` replace
5460   - :samp:`--stream-data` for improving granularity
5461   - of controlling compression and decompression of stream data.
5462   - The :samp:`--stream-data` option will remain
5463   - available.
5464   -
5465   - - When running :command:`qpdf --check` with other
5466   - options, checks are always run first. This enables qpdf to
5467   - perform its full recovery logic before outputting other
5468   - information. This can be especially useful when manually
5469   - recovering broken files, looking at qpdf's regenerated cross
5470   - reference table, or other similar operations.
5471   -
5472   - - Process :command:`--pages` earlier so that other
5473   - options like :samp:`--show-pages` or
5474   - :samp:`--split-pages` can operate on the file
5475   - after page splitting/merging has occurred.
5476   -
5477   - - API Changes. All new API calls are documented in their respective
5478   - classes' header files.
5479   -
5480   - - ``QPDFObjectHandle::rotatePage``: apply rotation to a page
5481   - object
5482   -
5483   - - ``QPDFWriter::setNewlineBeforeEndstream``: force newline to
5484   - appear before ``endstream``
5485   -
5486   - - ``QPDFWriter::setPreserveUnreferencedObjects``: preserve
5487   - unreferenced objects that appear in the input PDF. The default
5488   - behavior is to discard them.
5489   -
5490   - - New ``Pipeline`` types ``Pl_RunLength`` and ``Pl_DCT`` are
5491   - available for developers who wish to produce or consume
5492   - RunLength or DCT stream data directly. The
5493   - :file:`examples/pdf-create.cc` example
5494   - illustrates their use.
5495   -
5496   - - ``QPDFWriter::setCompressStreams`` and
5497   - ``QPDFWriter::setDecodeLevel`` methods control handling of
5498   - different types of stream compression.
5499   -
5500   - - Add new C API functions ``qpdf_set_compress_streams``,
5501   - ``qpdf_set_decode_level``,
5502   - ``qpdf_set_preserve_unreferenced_objects``, and
5503   - ``qpdf_set_newline_before_endstream`` corresponding to the new
5504   - ``QPDFWriter`` methods.
5505   -
5506   -6.0.0: November 10, 2015
5507   - - Implement :samp:`--deterministic-id` command-line
5508   - option and ``QPDFWriter::setDeterministicID`` as well as C API
5509   - function ``qpdf_set_deterministic_ID`` for generating a
5510   - deterministic ID for non-encrypted files. When this option is
5511   - selected, the ID of the file depends on the contents of the output
5512   - file, and not on transient items such as the timestamp or output
5513   - file name.
5514   -
5515   - - Make qpdf more tolerant of files whose xref table entries are not
5516   - the correct length.
5517   -
5518   -5.1.3: May 24, 2015
5519   - - Bug fix: fix-qdf was not properly handling files that contained
5520   - object streams with more than 255 objects in them.
5521   -
5522   - - Bug fix: qpdf was not properly initializing Microsoft's secure
5523   - crypto provider on fresh Windows installations that had not had
5524   - any keys created yet.
5525   -
5526   - - Fix a few errors found by Gynvael Coldwind and Mateusz Jurczyk of
5527   - the Google Security Team. Please see the ChangeLog for details.
5528   -
5529   - - Properly handle pages that have no contents at all. There were
5530   - many cases in which qpdf handled this fine, but a few methods
5531   - blindly obtained page contents without handling the possibility that
5532   - there were no contents.
5533   -
5534   - - Make qpdf more robust for a few more kinds of problems that may
5535   - occur in invalid PDF files.
5536   -
5537   -5.1.2: June 7, 2014
5538   - - Bug fix: linearizing files could create a corrupted output file
5539   - under extremely unlikely file size circumstances. See ChangeLog
5540   - for details. The odds of getting hit by this are very low, though
5541   - one person did.
5542   -
5543   - - Bug fix: qpdf would fail to write files that had streams with
5544   - decode parameters referencing other streams.
5545   -
5546   - - New example program: :command:`pdf-split-pages`:
5547   - efficiently split PDF files into individual pages. The example
5548   - program does this more efficiently than using :command:`qpdf
5549   - --pages` to do it.
5550   -
5551   - - Packaging fix: Visual C++ binaries did not support Windows XP.
5552   - This has been rectified by updating the compilers used to generate
5553   - the release binaries.
5554   -
5555   -5.1.1: January 14, 2014
5556   - - Performance fix: copying foreign objects could be very slow with
5557   - certain types of files. This was most likely to be visible during
5558   - page splitting and was due to traversing the same objects multiple
5559   - times in some cases.
5560   -
5561   -5.1.0: December 17, 2013
5562   - - Added runtime option (``QUtil::setRandomDataProvider``) to supply
5563   - your own random data provider. You can use this if you want to
5564   - avoid using the OS-provided secure random number generation
5565   - facility or stdlib's less secure version. See comments in
5566   - include/qpdf/QUtil.hh for details.
5567   -
5568   - - Fixed image comparison tests to not create 12-bit-per-pixel images
5569   - since some versions of tiffcmp have bugs in comparing them in some
5570   - cases. This increases the disk space required by the image
5571   - comparison tests, which are off by default anyway.
5572   -
5573   - - Introduce a number of small fixes for compilation on the latest
5574   - clang in MacOS and the latest Visual C++ in Windows.
5575   -
5576   - - Be able to handle broken files that end the xref table header with
5577   - a space instead of a newline.
5578   -
5579   -5.0.1: October 18, 2013
5580   - - Thanks to a detailed review by Florian Weimer and the Red Hat
5581   - Product Security Team, this release includes a number of
5582   - non-user-visible security hardening changes. Please see the
5583   - ChangeLog file in the source distribution for the complete list.
5584   -
5585   - - When available, operating system-specific secure random number
5586   - generation is used for generating initialization vectors and other
5587   - random values used during encryption or file creation. For the
5588   - Windows build, this results in an added dependency on Microsoft's
5589   - cryptography API. To disable the OS-specific cryptography and use
5590   - the old version, pass the
5591   - :samp:`--enable-insecure-random` option to
5592   - :command:`./configure`.
5593   -
5594   - - The :command:`qpdf` command-line tool now issues a
5595   - warning when :samp:`--accessibility=n` is specified
5596   - for newer encryption versions stating that the option is ignored.
5597   - qpdf, per the spec, has always ignored this flag, but it
5598   - previously did so silently. This warning is issued only by the
5599   - command-line tool, not by the library. The library's handling of
5600   - this flag is unchanged.
5601   -
5602   -5.0.0: July 10, 2013
5603   - - Bug fix: previous versions of qpdf would lose objects with
5604   - generation != 0 when generating object streams. Fixing this
5605   - required changes to the public API.
5606   -
5607   - - Removed methods from public API that were only supposed to be
5608   - called by QPDFWriter and couldn't realistically be called anywhere
5609   - else. See ChangeLog for details.
5610   -
5611   - - New ``QPDFObjGen`` class added to represent an object
5612   - ID/generation pair. ``QPDFObjectHandle::getObjGen()`` is now
5613   - preferred over ``QPDFObjectHandle::getObjectID()`` and
5614   - ``QPDFObjectHandle::getGeneration()`` as it makes it less likely
5615   - for people to accidentally write code that ignores the generation
5616   - number. See :file:`QPDF.hh` and
5617   - :file:`QPDFObjectHandle.hh` for additional
5618   - notes.
5619   -
5620   - - Add :samp:`--show-npages` command-line option to
5621   - the :command:`qpdf` command to show the number of
5622   - pages in a file.
5623   -
5624   - - Allow omission of the page range within
5625   - :samp:`--pages` for the
5626   - :command:`qpdf` command. When omitted, the page
5627   - range is implicitly taken to be all the pages in the file.
5628   -
5629   - - Various enhancements were made to support different types of
5630   - broken files or broken readers. Details can be found in
5631   - :file:`ChangeLog`.
5632   -
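The 5.0.0 note on ``QPDFObjGen`` is about code that ignores the generation number. A minimal sketch of that general hazard, using only the standard library: ``ObjGenKey`` and both functions are hypothetical illustrations and are not part of qpdf's API, nor a reconstruction of the original bug.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for an object ID/generation pair (not the real QPDFObjGen).
using ObjGenKey = std::pair<int, int>;

// Keyed by object ID only: the second generation overwrites the first entry.
inline std::size_t count_by_id_only()
{
    std::map<int, std::string> table;
    table[7] = "object 7, generation 0";
    table[7] = "object 7, generation 1";
    return table.size();
}

// Keyed by (ID, generation): the two entries stay distinct.
inline std::size_t count_by_id_and_generation()
{
    std::map<ObjGenKey, std::string> table;
    table[{7, 0}] = "object 7, generation 0";
    table[{7, 1}] = "object 7, generation 1";
    return table.size();
}
```

Carrying the pair around as one value makes it harder to drop the generation by accident, which is the motivation the note gives for preferring ``getObjGen()``.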
5633   -4.1.0: April 14, 2013
5634   - - Note to people including qpdf in distributions: the
5635   - :file:`.la` files generated by libtool are now
5636   - installed by qpdf's :command:`make install` target.
5637   - Before, they were not installed. This means that if your
5638   - distribution does not want to include
5639   - :file:`.la` files, you must remove them as
5640   - part of your packaging process.
5641   -
5642   - - Major enhancement: API enhancements have been made to support
5643   - parsing of content streams. This enhancement includes the
5644   - following changes:
5645   -
5646   - - ``QPDFObjectHandle::parseContentStream`` method parses objects
5647   - in a content stream and calls handlers in a callback class. The
5648   - example
5649   - :file:`examples/pdf-parse-content.cc`
5650   - illustrates how this may be used.
5651   -
5652   - - ``QPDFObjectHandle`` can now represent operators and inline
5653   - images, object types that may only appear in content streams.
5654   -
5655   - - Method ``QPDFObjectHandle::getTypeCode()`` returns an
5656   - enumerated type value representing the underlying object type.
5657   - Method ``QPDFObjectHandle::getTypeName()`` returns a text
5658   - string describing the name of the type of a
5659   - ``QPDFObjectHandle`` object. These methods can be used for more
5660   - efficient parsing and debugging/diagnostic messages.
5661   -
5662   - - :command:`qpdf --check` now parses all pages'
5663   - content streams in addition to doing other checks. While there are
5664   - still many types of errors that cannot be detected, syntactic
5665   - errors in content streams will now be reported.
5666   -
5667   - - Minor compilation enhancements have been made to facilitate
5668   - support for a broader range of compilers and compiler
5669   - versions.
5670   -
5671   - - Warning flags have been moved into a separate variable in
5672   - :file:`autoconf.mk`.
5673   -
5674   - - The configure flag :samp:`--enable-werror` works
5675   - for Microsoft compilers.
5676   -
5677   - - All MSVC CRT security warnings have been resolved.
5678   -
5679   - - All C-style casts in C++ code have been replaced by C++ casts,
5680   - and many casts that had been included to suppress higher
5681   - warning levels for some compilers have been removed, primarily
5682   - for clarity. Places where integer type coercion occurs have
5683   - been scrutinized. A new casting policy has been documented in
5684   - the manual. This is of concern mainly to people porting qpdf to
5685   - new platforms or compilers. It is not visible to programmers
5686   - writing code that uses the library.
5687   -
5688   - - Some internal limits have been removed in code that converts
5689   - numbers to strings. This is largely invisible to users, but it
5690   - does trigger a bug in some older versions of mingw-w64's C++
5691   - library. See :file:`README-windows.md` in
5692   - the source distribution if you think this may affect you. The
5693   - copy of the DLL distributed with qpdf's binary distribution is
5694   - not affected by this problem.
5695   -
5696   - - The RPM spec file previously included with qpdf has been removed.
5697   - This is because virtually all Linux distributions include qpdf now
5698   - that it is a dependency of CUPS filters.
5699   -
5700   - - A few bug fixes are included:
5701   -
5702   - - Overridden compressed objects are properly handled. Before,
5703   - there were certain constructs that could cause qpdf to see old
5704   - versions of some objects. The most usual manifestation of this
5705   - was loss of filled in form values for certain files.
5706   -
5707   - - Installation no longer uses GNU/Linux-specific versions of some
5708   - commands, so :command:`make install` works on
5709   - Solaris with native tools.
5710   -
5711   - - The 64-bit mingw Windows binary package no longer includes a
5712   - 32-bit DLL.
5713   -
5714   -4.0.1: January 17, 2013
5715   - - Fix detection of binary attachments in test suite to avoid false
5716   - test failures on some platforms.
5717   -
5718   - - Add clarifying comment in :file:`QPDF.hh` to
5719   - methods that return the user password explaining that it is no
5720   - longer possible with newer encryption formats to recover the user
5721   - password knowing the owner password. In earlier encryption
5722   - formats, the user password was encrypted in the file using the
5723   - owner password. In newer encryption formats, a separate encryption
5724   - key is used on the file, and that key is independently encrypted
5725   - using both the user password and the owner password.
5726   -
5727   -4.0.0: December 31, 2012
5728   - - Major enhancement: support has been added for newer encryption
5729   - schemes supported by version X of Adobe Acrobat. This includes use
5730   - of 127-character passwords, 256-bit encryption keys, and the
5731   - encryption scheme specified in ISO 32000-2, the PDF 2.0
5732   - specification. This scheme can be chosen from the command line by
5733   - specifying use of 256-bit keys. qpdf also supports the deprecated
5734   - encryption method used by Acrobat IX. This encryption style has
5735   - known security weaknesses and should not be used in practice.
5736   - However, such files exist "in the wild," so support for this
5737   - scheme is still useful. New methods
5738   - ``QPDFWriter::setR6EncryptionParameters`` (for the PDF 2.0 scheme)
5739   - and ``QPDFWriter::setR5EncryptionParameters`` (for the deprecated
5740   - scheme) have been added to enable these new encryption schemes.
5741   - Corresponding functions have been added to the C API as well.
5742   -
5743   - - Full support for Adobe extension levels in PDF version
5744   - information. Starting with PDF version 1.7, corresponding to ISO
5745   - 32000, Adobe adds new functionality by increasing the extension
5746   - level rather than increasing the version. This support includes
5747   - addition of the ``QPDF::getExtensionLevel`` method for retrieving
5748   - the document's extension level, addition of versions of
5749   - ``QPDFWriter::setMinimumPDFVersion`` and
5750   - ``QPDFWriter::forcePDFVersion`` that accept an extension level,
5751   - and extended syntax for specifying forced and minimum versions on
5752   - the command line as described in :ref:`ref.advanced-transformation`. Corresponding functions
5753   - have been added to the C API as well.
5754   -
5755   - - Minor fixes to prevent qpdf from referencing objects in the file
5756   - that are not referenced in the file's overall structure. Most
5757   - files don't have any such objects, but some files contain
5758   - unreferenced objects with errors, so these fixes prevent qpdf from
5759   - needlessly rejecting or complaining about such objects.
5760   -
5761   - - Add new generalized methods for reading and writing files from/to
5762   - programmer-defined sources. The method
5763   - ``QPDF::processInputSource`` allows the programmer to use any
5764   - input source for the input file, and
5765   - ``QPDFWriter::setOutputPipeline`` allows the programmer to write
5766   - the output file through any pipeline. These methods would make it
5767   - possible to perform any number of specialized operations, such as
5768   - accessing external storage systems, creating bindings for qpdf in
5769   - other programming languages that have their own I/O systems, etc.
5770   -
5771   - - Add new method ``QPDF::getEncryptionKey`` for retrieving the
5772   - underlying encryption key used in the file.
5773   -
5774   - - This release includes a small handful of non-compatible API
5775   - changes. While effort is made to avoid such changes, all the
5776   - non-compatible API changes in this version were to parts of the
5777   - API that would likely never be used outside the library itself. In
5778   - all cases, the altered methods or structures were parts of the
5779   - ``QPDF`` class that were either public so they could be called from
5780   - ``QPDFWriter`` or part of validation code that was
5781   - over-zealous in reporting problems in parts of the file that would
5782   - not ordinarily be referenced. In no case did any of the removed
5783   - methods do anything worse than falsely report error conditions in
5784   - files that were broken in ways that didn't matter. The following
5785   - public parts of the ``QPDF`` class were changed in a
5786   - non-compatible way:
5787   -
5788   - - Updated nested ``QPDF::EncryptionData`` class to add fields
5789   - needed by the newer encryption formats; member variables were
5790   - changed to private so that future changes will not require
5791   - breaking backward compatibility.
5792   -
5793   - - Added additional parameters to ``compute_data_key``, which is
5794   - used by ``QPDFWriter`` to compute the encryption key used to
5795   - encrypt a specific object.
5796   -
5797   - - Removed the method ``flattenScalarReferences``. This method was
5798   - previously used prior to writing a new PDF file, but it has the
5799   - undesired side effect of causing qpdf to read objects in the
5800   - file that were not referenced. Some otherwise valid files have
5801   - unreferenced objects with errors in them, so this could cause
5802   - qpdf to reject files that would be accepted by virtually all
5803   - other PDF readers. In fact, qpdf relied on only a very small
5804   - part of what flattenScalarReferences did, so only this part has
5805   - been preserved, and it is now done directly inside
5806   - ``QPDFWriter``.
5807   -
5808   - - Removed the method ``decodeStreams``. This method was used by
5809   - the :samp:`--check` option of the
5810   - :command:`qpdf` command-line tool to force all
5811   - streams in the file to be decoded, but it also suffered from
5812   - the problem of opening otherwise unreferenced streams and thus
5813   - could report false positives. The
5814   - :samp:`--check` option now causes qpdf to go
5815   - through all the motions of writing a new file based on the
5816   - original one, so it will always reference and check exactly
5817   - those parts of a file that any ordinary viewer would check.
5818   -
5819   - - Removed the method ``trimTrailerForWrite``. This method was
5820   - used by ``QPDFWriter`` to modify the original QPDF object by
5821   - removing fields from the trailer dictionary that wouldn't apply
5822   - to the newly written file. This functionality, though generally
5823   - harmless, was a poor implementation and has been replaced by
5824   - having QPDFWriter filter these out when copying the trailer
5825   - rather than modifying the original QPDF object. (Note that qpdf
5826   - never modifies the original file itself.)
5827   -
5828   - - Allow the PDF header to appear anywhere in the first 1024 bytes of
5829   - the file. This is consistent with what other readers do.
5830   -
5831   - - Fix the :command:`pkg-config` files to list zlib
5832   - and pcre in ``Requires.private`` to better support static linking
5833   - using :command:`pkg-config`.
5834   -
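The 4.0.0 note about accepting a PDF header anywhere in the first 1024 bytes can be sketched as a bounded search. This is an illustration only, not qpdf's implementation: ``find_pdf_header`` is a hypothetical helper, and it assumes the whole ``%PDF-`` marker must fall inside the 1024-byte window.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Returns the offset of "%PDF-" if it lies entirely within the first
// 1024 bytes of data, or std::string::npos otherwise (assumed rule).
inline std::size_t find_pdf_header(const std::string& data)
{
    static const std::string marker = "%PDF-";
    const std::size_t window = std::min<std::size_t>(data.size(), 1024);
    // Search only the leading window so later occurrences are ignored.
    return data.substr(0, window).find(marker);
}
```

A caller would treat a result other than ``std::string::npos`` as the offset at which the PDF content starts.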
5835   -3.0.2: September 6, 2012
5836   - - Bug fix: ``QPDFWriter::setOutputMemory`` did not work when not
5837   - used with ``QPDFWriter::setStaticID``, which made it pretty much
5838   - useless. This has been fixed.
5839   -
5840   - - New API call ``QPDFWriter::setExtraHeaderText`` inserts additional
5841   - text near the header of the PDF file. The intended use case is to
5842   - insert comments that may be consumed by a downstream application,
5843   - though other use cases may exist.
5844   -
5845   -3.0.1: August 11, 2012
5846   - - Version 3.0.0 included addition of files for
5847   - :command:`pkg-config`, but this was not mentioned
5848   - in the release notes. The release notes for 3.0.0 were updated to
5849   - mention this.
5850   -
5851   - - Bug fix: if an object stream ended with a scalar object not
5852   - followed by space, qpdf would incorrectly report that it
5853   - encountered a premature EOF. This bug has been in qpdf since
5854   - version 2.0.
5855   -
5856   -3.0.0: August 2, 2012
5857   - - Acknowledgment: I would like to express gratitude for the
5858   - contributions of Tobias Hoffmann toward the release of qpdf
5859   - version 3.0. He is responsible for most of the implementation and
5860   - design of the new API for manipulating pages, and contributed code
5861   - and ideas for many of the improvements made in version 3.0.
5862   - Without his work, this release would certainly not have happened
5863   - as soon as it did, if at all.
5864   -
5865   - - *Non-compatible API changes:*
5866   -
5867   - - The method ``QPDFObjectHandle::replaceStreamData`` that uses a
5868   - ``StreamDataProvider`` to provide the stream data no longer
5869   - takes a ``length`` parameter. The parameter was removed since
5870   - this provides the user an opportunity to simplify the calling
5871   - code. This method was introduced in version 2.2. At the time,
5872   - the ``length`` parameter was required in order to ensure that
5873   - calls to the stream data provider returned the same length for a
5874   - specific stream every time they were invoked. In particular, the
5875   - linearization code depends on this. Instead, qpdf 3.0 and newer
5876   - check for that constraint explicitly. The first time the stream
5877   - data provider is called for a specific stream, the actual length
5878   - is saved, and subsequent calls are required to return the same
5879   - number of bytes. This means the calling code no longer has to
5880   - compute the length in advance, which can be a significant
5881   - simplification. If your code fails to compile because of the
5882   - extra argument and you don't want to make other changes to your
5883   - code, just omit the argument.
5884   -
5885   - - Many methods take ``long long`` instead of other integer types.
5886   - Most if not all existing code should compile fine with this
5887   - change since such parameters had always previously been smaller
5888   - types. This change was required to support files larger than two
5889   - gigabytes in size.
5890   -
5891   - - Support has been added for large files. The test suite verifies
5892   - support for files larger than 4 gigabytes, and manual testing has
5893   - verified support for files larger than 10 gigabytes. Large file
5894   - support is available for both 32-bit and 64-bit platforms as long
5895   - as the compiler and underlying platforms support it.
5896   -
5897   - - Support for page selection (splitting and merging PDF files) has
5898   - been added to the :command:`qpdf` command-line
5899   - tool. See :ref:`ref.page-selection`.
5900   -
5901   - - Options have been added to the :command:`qpdf`
5902   - command-line tool for copying encryption parameters from another
5903   - file. See :ref:`ref.basic-options`.
5904   -
5905   - - New methods have been added to the ``QPDF`` object for adding and
5906   - removing pages. See :ref:`ref.adding-and-remove-pages`.
5907   -
5908   - - New methods have been added to the ``QPDF`` object for copying
5909   - objects from other PDF files. See :ref:`ref.foreign-objects`.
5910   -
5911   - - A new method ``QPDFObjectHandle::parse`` has been added for
5912   - constructing ``QPDFObjectHandle`` objects from a string
5913   - description.
5914   -
5915   - - Methods have been added to ``QPDFWriter`` to allow writing to an
5916   - already open stdio ``FILE*`` in addition to writing to standard
5917   - output or a named file. Methods have been added to ``QPDF`` to be
5918   - able to process a file from an already open stdio ``FILE*``. This
5919   - makes it possible to read and write PDF from secure temporary
5920   - files that have been unlinked prior to being fully read or
5921   - written.
5922   -
5923   - - The ``QPDF::emptyPDF`` method can be used to allow creation of PDF files
5924   - from scratch. The example
5925   - :file:`examples/pdf-create.cc` illustrates how
5926   - it can be used.
5927   -
5928   - - Several methods that take ``PointerHolder<Buffer>`` can now also
5929   - accept ``std::string`` arguments.
5930   -
5931   - - Many new convenience methods have been added to the library, most
5932   - in ``QPDFObjectHandle``. See :file:`ChangeLog`
5933   - for a full list.
5934   -
5935   - - When building on a platform that supports ELF shared libraries
5936   - (such as Linux), symbol versions are enabled by default. They can
5937   - be disabled by passing
5938   - :samp:`--disable-ld-version-script` to
5939   - :command:`./configure`.
5940   -
5941   - - The file :file:`libqpdf.pc` is now installed
5942   - to support :command:`pkg-config`.
5943   -
5944   - - Image comparison tests are off by default now since they are not
5945   - needed to verify a correct build or port of qpdf. They are needed
5946   - only when changing the actual PDF output generated by qpdf. You
5947   - should enable them if you are making deep changes to qpdf itself.
5948   - See :file:`README.md` for details.
5949   -
5950   - - Large file tests are off by default but can be turned on with
5951   - :command:`./configure` or by setting an environment
5952   - variable before running the test suite. See
5953   - :file:`README.md` for details.
5954   -
5955   - - When qpdf's test suite fails, failures are not printed to the
5956   - terminal anymore by default. Instead, find them in
5957   - :file:`build/qtest.log`. For packagers who are
5958   - building with an autobuilder, you can add the
5959   - :samp:`--enable-show-failed-test-output` option to
5960   - :command:`./configure` to restore the old behavior.
5961   -
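The 3.0.0 note on ``replaceStreamData`` says the length returned by a stream data provider is saved on the first call and later calls must return the same number of bytes. A minimal sketch of that kind of check follows. ``LengthCheckedProvider`` and its callback signature are hypothetical and do not mirror qpdf's ``StreamDataProvider`` interface.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical wrapper: remembers the length produced for each stream on
// the first call and rejects any later call that yields a different length.
class LengthCheckedProvider
{
  public:
    using Provider = std::function<std::string(int stream_id)>;

    explicit LengthCheckedProvider(Provider provider) :
        provider_(std::move(provider))
    {
    }

    std::string provide(int stream_id)
    {
        std::string data = provider_(stream_id);
        auto found = first_length_.find(stream_id);
        if (found == first_length_.end()) {
            // First call for this stream: save the actual length.
            first_length_[stream_id] = data.size();
        } else if (found->second != data.size()) {
            throw std::runtime_error(
                "stream data provider returned a different length");
        }
        return data;
    }

  private:
    Provider provider_;
    std::map<int, std::size_t> first_length_;
};
```

With a check like this in place, the caller never has to compute the length in advance, which is the simplification the note describes.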
5962   -2.3.1: December 28, 2011
5963   - - Fix thread-safety problem resulting from non-thread-safe use of
5964   - the PCRE library.
5965   -
5966   - - Made a few minor documentation fixes.
5967   -
5968   - - Add workaround for a bug that appears in some versions of
5969   - ghostscript to the test suite.
5970   -
5971   - - Fix minor build issue for Visual C++ 2010.
5972   -
5973   -2.3.0: August 11, 2011
5974   - - Bug fix: when preserving existing encryption on encrypted files
5975   - with cleartext metadata, older qpdf versions would generate
5976   - password-protected files with no valid password. This operation
5977   - now works. This bug only affected files created by copying
5978   - existing encryption parameters; explicit encryption with
5979   - specification of cleartext metadata worked before and continues to
5980   - work.
5981   -
5982   - - Enhance ``QPDFWriter`` with a new constructor that allows you to
5983   - delay the specification of the output file. When using this
5984   - constructor, you may now call ``QPDFWriter::setOutputFilename`` to
5985   - specify the output file, or you may use
5986   - ``QPDFWriter::setOutputMemory`` to cause ``QPDFWriter`` to write
5987   - the resulting PDF file to a memory buffer. You may then use
5988   - ``QPDFWriter::getBuffer`` to retrieve the memory buffer.
5989   -
5990   - - Add new API call ``QPDF::replaceObject`` for replacing objects by
5991   - object ID.
5992   -
5993   - - Add new API call ``QPDF::swapObjects`` for swapping two objects by
5994   - object ID.
5995   -
5996   - - Add ``QPDFObjectHandle::getDictAsMap`` and
5997   - ``QPDFObjectHandle::getArrayAsVector`` to allow retrieval of
5998   - dictionary objects as maps and array objects as vectors.
5999   -
6000   - - Add functions ``qpdf_get_info_key`` and ``qpdf_set_info_key`` to
6001   - the C API for manipulating string fields of the document's
6002   - ``/Info`` dictionary.
6003   -
6004   - - Add functions ``qpdf_init_write_memory``,
6005   - ``qpdf_get_buffer_length``, and ``qpdf_get_buffer`` to the C API
6006   - for writing PDF files to a memory buffer instead of a file.
6007   -
6008   -2.2.4: June 25, 2011
6009   - - Fix installation and compilation issues; no functionality changes.
6010   -
6011   -2.2.3: April 30, 2011
6012   - - Handle some damaged streams with incorrect characters following
6013   - the stream keyword.
6014   -
6015   - - Improve handling of inline images when normalizing content
6016   - streams.
6017   -
6018   - - Enhance error recovery to properly handle files that use object 0
6019   - as a regular object, which is specifically disallowed by the spec.
6020   -
6021   -2.2.2: October 4, 2010
6022   - - Add new function ``qpdf_read_memory`` to the C API to call
6023   - ``QPDF::processMemoryFile``. This was an omission in qpdf 2.2.1.
6024   -
6025   -2.2.1: October 1, 2010
6026   - - Add new method ``QPDF::setOutputStreams`` to replace ``std::cout``
6027   - and ``std::cerr`` with other streams for generation of diagnostic
6028   - messages and error messages. This can be useful for GUIs or other
6029   - applications that want to capture any output generated by the
6030   - library to present to the user in some other way. Note that QPDF
6031   - does not write to ``std::cout`` (or the specified output stream)
6032   - except where explicitly mentioned in
6033   - :file:`QPDF.hh`, and that the only use of the
6034   - error stream is for warnings. Note also that output of warnings is
6035   - suppressed when ``setSuppressWarnings(true)`` is called.
6036   -
6037   - - Add new method ``QPDF::processMemoryFile`` for operating on PDF
6038   - files that are loaded into memory rather than in a file on disk.
6039   -
6040   - - Give a warning but otherwise ignore empty PDF objects by treating
6041   - them as null. Empty objects are not permitted by the PDF
6042   - specification but have been known to appear in some actual PDF
6043   - files.
6044   -
6045   - - Handle inline image filter abbreviations when they appear as stream
6046   - filter abbreviations. The PDF specification does not allow use of
6047   - stream filter abbreviations in this way, but Adobe Reader and some
6048   - other PDF readers accept them since they sometimes appear
6049   - incorrectly in actual PDF files.
6050   -
6051   - - Implement miscellaneous enhancements to ``PointerHolder`` and
6052   - ``Buffer`` to support other changes.
6053   -
6054   -2.2.0: August 14, 2010
6055   - - Add new methods to ``QPDFObjectHandle`` (``newStream`` and
6056   - ``replaceStreamData``) for creating new streams and replacing
6057   - stream data. This makes it possible to perform a wide range of
6058   - operations that were not previously possible.
6059   -
6060   - - Add new helper method in ``QPDFObjectHandle``
6061   - (``addPageContents``) for appending or prepending new content
6062   - streams to a page. This method makes it possible to manipulate
6063   - content streams without having to be concerned whether a page's
6064   - contents are a single stream or an array of streams.
6065   -
6066   - - Add new method in ``QPDFObjectHandle``: ``replaceOrRemoveKey``,
6067   - which replaces a dictionary key with a given value unless the
6068   - value is null, in which case it removes the key instead.
6069   -
6070   - - Add new method in ``QPDFObjectHandle``: ``getRawStreamData``,
6071   - which returns the raw (unfiltered) stream data into a buffer. This
6072   - complements the ``getStreamData`` method, which returns the
6073   - filtered (uncompressed) stream data and can only be used when the
6074   - stream's data is filterable.
6075   -
6076   - - Provide two new examples:
6077   - :command:`pdf-double-page-size` and
6078   - :command:`pdf-invert-images` that illustrate the
6079   - newly added interfaces.
6080   -
6081   - - Fix a memory leak that would cause loss of a few bytes for every
6082   - object involved in a cycle of object references. Thanks to Jian Ma
6083   - for calling my attention to the leak.
6084   -
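The behavior described for ``replaceOrRemoveKey`` in the 2.2.0 notes (replace the key unless the value is null, in which case remove it) can be modeled in a few lines. This sketch assumes C++17 and uses a plain ``std::map`` as a hypothetical dictionary with ``std::nullopt`` standing in for a PDF null. It is not the ``QPDFObjectHandle`` implementation.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical dictionary model; std::nullopt plays the role of a PDF null.
using Dict = std::map<std::string, std::string>;

inline void replace_or_remove_key(
    Dict& dict, const std::string& key, const std::optional<std::string>& value)
{
    if (value) {
        dict[key] = *value; // non-null: replace (or add) the key
    } else {
        dict.erase(key); // null: remove the key instead
    }
}
```

Folding both cases into one call saves the caller from branching on null before every dictionary update.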
6085   -2.1.5: April 25, 2010
6086   - - Remove restriction of file identifier strings to 16 bytes. This
6087   - unnecessary restriction was preventing qpdf from being able to
6088   - encrypt or decrypt files with identifier strings that were not
6089   - exactly 16 bytes long. The specification imposes no such
6090   - restriction.
6091   -
6092   -2.1.4: April 18, 2010
6093   - - Apply the same padding calculation fix from version 2.1.2 to the
6094   - main cross reference stream as well.
6095   -
6096   - - Since :command:`qpdf --check` only performs limited
6097   - checks, clarify the output to make it clear that there still may
6098   - be errors that qpdf can't check. This should make it less
6099   - surprising to people when another PDF reader is unable to read a
6100   - file that qpdf thinks is okay.
6101   -
6102   -2.1.3: March 27, 2010
6103   - - Fix bug that could cause a failure when rewriting PDF files that
6104   - contain object streams with unreferenced objects that in turn
6105   - reference indirect scalars.
6106   -
6107   - - Don't complain about (invalid) AES streams that aren't a multiple
6108   - of 16 bytes. Instead, pad them before decrypting.
6109   -
6110   -2.1.2: January 24, 2010
6111   - - Fix bug in padding around first half cross reference stream in
6112   - linearized files. The bug could cause an assertion failure when
6113   - linearizing certain unlucky files.
6114   -
6115   -2.1.1: December 14, 2009
6116   - - No changes in functionality; insert missing include in an internal
6117   - library header file to support gcc 4.4, and update test suite to
6118   - ignore broken Adobe Reader installations.
6119   -
6120   -2.1: October 30, 2009
6121   - - This is the first version of qpdf to include Windows support. On
6122   - Windows, it is possible to build a DLL. Additionally, a partial
6123   - C-language API has been introduced, which makes it possible to
6124   - call qpdf functions from non-C++ environments. I am very grateful
6125   - to Žarko Gajić (http://zarko-gajic.iz.hr/) for tirelessly testing
6126   - numerous pre-release versions of this DLL and providing many
6127   - excellent suggestions on improving the interface.
6128   -
6129   - For programming to the C interface, please see the header file
6130   - :file:`qpdf/qpdf-c.h` and the example
6131   - :file:`examples/pdf-linearize.c`.
6132   -
6133   - - Žarko Gajić has written a Delphi wrapper for qpdf, which can be
6134   - downloaded from qpdf's download site. Žarko's Delphi wrapper is
6135   - released with the same licensing terms as qpdf itself and comes
6136   - with this disclaimer: "Delphi wrapper unit
6137   - :file:`qpdf.pas` created by Žarko Gajić
6138   - (http://zarko-gajic.iz.hr/). Use at your own risk and for whatever
6139   - purpose you want. No support is provided. Sample code is
6140   - provided."
6141   -
6142   - - Support has been added for AES encryption and crypt filters.
6143   - Although qpdf does not presently support files that use PKI-based
6144   - encryption, with the addition of AES and crypt filters, qpdf is
6145   - now able to open most encrypted files created with newer
6146   - versions of Acrobat or other PDF creation software. Note that I
6147   - have not been able to get very many files encrypted in this way,
6148   - so it's possible there could still be some cases that qpdf can't
6149   - handle. Please report them if you find them.
6150   -
6151   - - Many error messages have been improved to include more information
6152   - in hopes of making qpdf a more useful tool for PDF experts to use
6153   - in manually recovering damaged PDF files.
6154   -
6155   - - Attempt to avoid compressing metadata streams if possible. This is
6156   - consistent with other PDF creation applications.
6157   -
6158   - - Provide new command-line options for AES encryption, cleartext
6159   - metadata, and setting the minimum and forced PDF versions of
6160   - output files.
6161   -
6162   - - Add additional methods to the ``QPDF`` object for querying the
6163   - document's permissions. Although qpdf does not enforce these
6164   - permissions, it does make them available so that applications that
6165   - use qpdf can enforce permissions.
6166   -
6167   - - The :samp:`--check` option to
6168   - :command:`qpdf` has been extended to include some
6169   - additional information.
6170   -
6171   - - *Non-compatible API changes:*
6172   -
6173   - - QPDF's exception handling mechanism now uses
6174   - ``std::logic_error`` for internal errors and
6175   - ``std::runtime_error`` for runtime errors in favor of the now
6176   - removed ``QEXC`` classes used in previous versions. The ``QEXC``
6177   - exception classes predated the addition of the
6178   - :file:`<stdexcept>` header file to the C++ standard library.
6179   - Most of the exceptions thrown by the qpdf library itself are
6180   - still of type ``QPDFExc`` which is now derived from
6181   - ``std::runtime_error``. Programs that catch an instance of
6182   - ``std::exception`` and display it by calling the ``what()``
6183   - method will not need to be changed.
6184   -
6185   - - The ``QPDFExc`` class now internally represents various fields
6186   - of the error condition and provides interfaces for querying
6187   - them. Among the fields is a numeric error code that can help
6188   - applications act differently on (a small number of) different
6189   - error conditions. See :file:`QPDFExc.hh` for details.
6190   -
6191   - - Warnings can be retrieved from qpdf as instances of ``QPDFExc``
6192   - instead of strings.
6193   -
6194   - - The nested ``QPDF::EncryptionData`` class's constructor takes an
6195   - additional argument. This class is primarily intended to be used
6196   - by ``QPDFWriter``. There's not really anything useful an
6197   - end-user application could do with it. It probably shouldn't
6198   - really be part of the public interface to begin with. Likewise,
6199   - some of the methods for computing internal encryption dictionary
6200   - parameters have changed to support ``/R=4`` encryption.
6201   -
6202   - - The method ``QPDF::getUserPassword`` has been removed since it
6203   - didn't do what people would think it did. There are now two new
6204   - methods: ``QPDF::getPaddedUserPassword`` and
6205   - ``QPDF::getTrimmedUserPassword``. The first one does what the
6206   - old ``QPDF::getUserPassword`` method used to do, which is to
6207   - return the password with possible binary padding as specified by
6208   - the PDF specification. The second one returns a human-readable
6209   - password string.
6210   -
6211   - - The enumerated types that used to be nested in ``QPDFWriter``
6212   - have moved to top-level enumerated types and are now defined in
6213   - the file :file:`qpdf/Constants.h`. This enables them to be
6214   - shared by both the C and C++ interfaces.
6215   -
6216   -2.0.6: May 3, 2009
6217   - - Do not attempt to uncompress streams that have decode parameters
6218   - we don't recognize. Earlier versions of qpdf would have rejected
6219   - files with such streams.
6220   -
6221   -2.0.5: March 10, 2009
6222   - - Improve error handling in the LZW decoder, and fix a small error
6223   - introduced in the previous version with regard to handling full
6224   - tables. The LZW decoder has been more strongly verified in this
6225   - release.
6226   -
6227   -2.0.4: February 21, 2009
6228   - - Include proper support for LZW streams encoded without the "early
6229   - code change" flag. Special thanks to Atom Smasher who reported the
6230   - problem and provided an input file compressed in this way, which I
6231   - did not previously have.
6232   -
6233   - - Implement some improvements to file recovery logic.
6234   -
6235   -2.0.3: February 15, 2009
6236   - - Compile cleanly with gcc 4.4.
6237   -
6238   - - Handle strings encoded as UTF-16BE properly.
6239   -
6240   -2.0.2: June 30, 2008
6241   - - Update test suite to work properly with a
6242   - non-:command:`bash`
6243   - :file:`/bin/sh` and with Perl 5.10. No changes
6244   - were made to the actual qpdf source code itself for this release.
6245   -
6246   -2.0.1: May 6, 2008
6247   - - No changes in functionality or interface. This release includes
6248   - fixes to the source code so that qpdf compiles properly and passes
6249   - its test suite on a broader range of platforms. See
6250   - :file:`ChangeLog` in the source distribution
6251   - for details.
6252   -
6253   -2.0: April 29, 2008
6254   - - First public release.
6255   -
6256   -.. _acknowledgments:
6257   -
6258   -Acknowledgment
6259   -==============
6260   -
6261   -QPDF was originally created in 2001 and modified periodically between
6262   -2001 and 2005 during my employment at `Apex CoVantage
6263   -<http://www.apexcovantage.com>`__. Upon my departure from Apex, the
6264   -company graciously allowed me to take ownership of the software and
6265   -continue maintaining it as an open source project, a decision for which I
6266   -am very grateful. I have made considerable enhancements to it since
6267   -that time. I feel fortunate to have worked for people who would make
6268   -such a decision. This work would not have been possible without their
6269   -support.
  12 + overview
  13 + license
  14 + installation
  15 + cli
  16 + qdf
  17 + library
  18 + weak-crypto
  19 + json
  20 + design
  21 + linearization
  22 + object-streams
  23 + release-notes
  24 + acknowledgement
... ...
manual/installation.rst 0 → 100644
  1 +.. _ref.installing:
  2 +
  3 +Building and Installing QPDF
  4 +============================
  5 +
  6 +This chapter describes how to build and install qpdf. Please see also
  7 +the :file:`README.md` and
  8 +:file:`INSTALL` files in the source distribution.
  9 +
  10 +.. _ref.prerequisites:
  11 +
  12 +System Requirements
  13 +-------------------
  14 +
  15 +The qpdf package has few external dependencies. In order to build qpdf,
  16 +the following packages are required:
  17 +
  18 +- A C++ compiler that supports C++-14.
  19 +
  20 +- zlib: http://www.zlib.net/
  21 +
  22 +- jpeg: http://www.ijg.org/files/ or https://libjpeg-turbo.org/
  23 +
  24 +- *Recommended but not required:* gnutls: https://www.gnutls.org/ to be
  25 + able to use the gnutls crypto provider, and/or openssl:
  26 + https://openssl.org/ to be able to use the openssl crypto provider.
  27 +
  28 +- gnu make 3.81 or newer: http://www.gnu.org/software/make
  29 +
  30 +- perl version 5.8 or newer: http://www.perl.org/; required for running
  31 + the test suite. Starting with qpdf version 9.1.1, perl is no longer
  32 + required at runtime.
  33 +
  34 +- GNU diffutils (any version): http://www.gnu.org/software/diffutils/
  35 + is required to run the test suite. Note that this is the version of
  36 + diff present on virtually all GNU/Linux systems. This is required
  37 + because the test suite uses :command:`diff -u`.
  38 +
  39 +Part of qpdf's test suite does comparisons of the contents of PDF files by
  40 +converting them to images and comparing the images. The image comparison
  41 +tests are disabled by default. Those tests are not required for
  42 +determining correctness of a qpdf build if you have not modified the
  43 +code since the test suite also contains expected output files that are
  44 +compared literally. The image comparison tests provide an extra check to
  45 +make sure that any content transformations don't break the rendering of
  46 +pages. Transformations that affect the content streams themselves are
  47 +off by default and are only provided to help developers look into the
  48 +contents of PDF files. If you are making deep changes to the library
  49 +that cause changes in the contents of the files that qpdf generates,
  50 +then you should enable the image comparison tests. Enable them by
  51 +running :command:`configure` with the
  52 +:samp:`--enable-test-compare-images` flag. If you enable
  53 +this, the following additional requirements are required by the test
  54 +suite. Note that in no case are these items required to use qpdf.
  55 +
  56 +- libtiff: http://www.remotesensing.org/libtiff/
  57 +
  58 +- GhostScript version 8.60 or newer: http://www.ghostscript.com
  59 +
  60 +If you do not enable this, then you do not need to have tiff and
  61 +ghostscript.
  62 +
  63 +Pre-built documentation is distributed with qpdf, so you should
  64 +generally not need to rebuild the documentation. In order to build the
  65 +documentation from source, you need to install `Sphinx
  66 +<https://sphinx-doc.org>`__. To build the PDF version of the
  67 +documentation, you need `pdflatex`, `latexmk`, and a fairly complete
  68 +LaTeX installation. Detailed requirements can be found in the Sphinx
  69 +documentation.
  70 +
  71 +.. _ref.building:
  72 +
  73 +Build Instructions
  74 +------------------
  75 +
  76 +Building qpdf on UNIX is generally just a matter of running
  77 +
  78 +::
  79 +
  80 + ./configure
  81 + make
  82 +
  83 +You can also run :command:`make check` to run the test
  84 +suite and :command:`make install` to install. Please run
  85 +:command:`./configure --help` for options on what can be
  86 +configured. You can also set the value of ``DESTDIR`` during
  87 +installation to install to a temporary location, as is common with many
  88 +open source packages. Please see also the
  89 +:file:`README.md` and
  90 +:file:`INSTALL` files in the source distribution.
  91 +
  92 +Building on Windows is a little bit more complicated. For details,
  93 +please see :file:`README-windows.md` in the source
  94 +distribution. You can also download a binary distribution for Windows.
  95 +There is a port of qpdf to Visual C++ version 6 in the
  96 +:file:`contrib` area generously contributed by Jian
  97 +Ma. This is also discussed in more detail in
  98 +:file:`README-windows.md`.
  99 +
  100 +While ``wchar_t`` is part of the C++ standard, qpdf uses it in only one
  101 +place in the public API, and it's just in a helper function. It is
  102 +possible to build qpdf on a system that doesn't have ``wchar_t``, and
  103 +it's also possible to compile a program that uses qpdf on a system
  104 +without ``wchar_t`` as long as you don't call that one method. This is a
  105 +very unusual situation. For a detailed discussion, please see the
  106 +top-level README.md file in qpdf's source distribution.
  107 +
  108 +There are some other things you can do with the build. Although qpdf
  109 +uses :command:`autoconf`, it does not use
  110 +:command:`automake` but instead uses a
  111 +hand-crafted non-recursive Makefile that requires gnu make. If you're
  112 +really interested, please read the comments in the top-level
  113 +:file:`Makefile`.
  114 +
  115 +.. _ref.crypto:
  116 +
  117 +Crypto Providers
  118 +----------------
  119 +
  120 +Starting with qpdf 9.1.0, the qpdf library can be built with multiple
  121 +implementations of providers of cryptographic functions, which we refer
  122 +to as "crypto providers." At the time of writing, a crypto
  123 +implementation must provide MD5 and SHA2 (256, 384, and 512-bit) hashes
  124 +and RC4 and AES256 with and without CBC encryption. In the future, if
  125 +digital signature is added to qpdf, there may be additional requirements
  126 +beyond this.
  127 +
  128 +Starting with qpdf version 9.1.0, the available implementations are
  129 +``native`` and ``gnutls``. In qpdf 10.0.0, ``openssl`` was added.
  130 +Additional implementations may be added if needed. It is also possible
  131 +for a developer to provide their own implementation without modifying
  132 +the qpdf library.
  133 +
  134 +.. _ref.crypto.build:
  135 +
  136 +Build Support For Crypto Providers
  137 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  138 +
  139 +When building with qpdf's build system, crypto providers can be enabled
  140 +at build time using various :command:`./configure`
  141 +options. The default behavior is for
  142 +:command:`./configure` to discover which crypto providers
  143 +can be supported based on available external libraries, to build all
  144 +available crypto providers, and to use an external provider as the
  145 +default over the native one. This behavior can be changed with the
  146 +following flags to :command:`./configure`:
  147 +
  148 +- :samp:`--enable-crypto-{x}`
  149 + (where :samp:`{x}` is a supported crypto
  150 + provider): enable the :samp:`{x}` crypto
  151 + provider, requiring any external dependencies it needs
  152 +
  153 +- :samp:`--disable-crypto-{x}`:
  154 + disable the :samp:`{x}` provider, and do not
  155 + link against its dependencies even if they are available
  156 +
  157 +- :samp:`--with-default-crypto={x}`:
  158 + make :samp:`{x}` the default provider even if
  159 + a higher priority one is available
  160 +
  161 +- :samp:`--disable-implicit-crypto`: only build crypto
  162 + providers that are explicitly requested with an
  163 + :samp:`--enable-crypto-{x}`
  164 + option
  165 +
  166 +For example, if you want to guarantee that the gnutls crypto provider is
  167 +used and that the native provider is not built, you could run
  168 +:command:`./configure --enable-crypto-gnutls
  169 +--disable-implicit-crypto`.
  170 +
  171 +If you build qpdf using your own build system, in order for qpdf to work
  172 +at all, you need to enable at least one crypto provider. The file
  173 +:file:`libqpdf/qpdf/qpdf-config.h.in` provides
  174 +macros ``DEFAULT_CRYPTO``, whose value must be a string naming the
  175 +default crypto provider, and various symbols starting with
  176 +``USE_CRYPTO_``, at least one of which has to be enabled. Additionally,
  177 +you must compile the source files that implement a crypto provider. To
  178 +get a list of those files, look at
  179 +:file:`libqpdf/build.mk`. If you want to omit a
  180 +particular crypto provider, as long as its ``USE_CRYPTO_`` symbol is
  181 +undefined, you can completely ignore the source files that belong to a
  182 +particular crypto provider. Additionally, crypto providers may have
  183 +their own external dependencies that can be omitted if the crypto
  184 +provider is not used. For example, if you are building qpdf yourself and
  185 +are using an environment that does not support gnutls or openssl, you
  186 +can ensure that ``USE_CRYPTO_NATIVE`` is defined, ``USE_CRYPTO_GNUTLS``
  187 +is not defined, and ``DEFAULT_CRYPTO`` is defined to ``"native"``. Then
  188 +you must include the source files used in the native implementation,
  189 +some of which were added or renamed from earlier versions, to your
  190 +build, and you can ignore
  191 +:file:`QPDFCrypto_gnutls.cc`. Always consult
  192 +:file:`libqpdf/build.mk` to get the list of source
  193 +files you need to build.
  194 +
  195 +.. _ref.crypto.runtime:
  196 +
  197 +Runtime Crypto Provider Selection
  198 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  199 +
  200 +You can use the :samp:`--show-crypto` option to
  201 +:command:`qpdf` to get a list of available crypto
  202 +providers. The default provider is always listed first, and the rest are
  203 +listed in lexical order. Each crypto provider is listed on a line by
  204 +itself with no other text, enabling the output of this command to be
  205 +used easily in scripts.
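
Because the default provider comes first and every provider sits on its own line, the output is straightforward to consume from a script. The following is a minimal sketch, not part of qpdf; the function name ``parse_show_crypto`` and the sample output are assumptions made for illustration, and the providers actually listed depend on how qpdf was built.

```python
from typing import List, Tuple


def parse_show_crypto(output: str) -> Tuple[str, List[str]]:
    """Split the text printed by `qpdf --show-crypto` into (default, others).

    Relies only on the documented layout: one provider per line,
    default provider first.
    """
    providers = [line.strip() for line in output.splitlines() if line.strip()]
    if not providers:
        raise ValueError("no crypto providers listed")
    return providers[0], providers[1:]


# Illustrative text only; a real build may list different providers.
sample_output = "gnutls\nnative\nopenssl\n"
default_provider, other_providers = parse_show_crypto(sample_output)
print("default:", default_provider)
print("others:", other_providers)
```

In a real script, the ``output`` argument would be whatever was captured from running the command.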
  206 +
  207 +You can override which crypto provider is used by setting the
  208 +``QPDF_CRYPTO_PROVIDER`` environment variable. There are few reasons to
  209 +ever do this, but you might want to do it if you were explicitly trying
  210 +to compare behavior of two different crypto providers while testing
  211 +performance or reproducing a bug. It could also be useful for people who
  212 +are implementing their own crypto providers.
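
When comparing providers from a script, one approach is to prepare a separate environment per provider and pass it to each qpdf invocation. This is a sketch under assumptions: the helper name ``env_for_provider`` is hypothetical, and only the ``QPDF_CRYPTO_PROVIDER`` variable itself comes from the text above.

```python
import os
from typing import Dict, Optional


def env_for_provider(provider: str,
                     base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with QPDF_CRYPTO_PROVIDER set.

    The caller's mapping (or os.environ) is left untouched.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["QPDF_CRYPTO_PROVIDER"] = provider
    return env


# One environment per provider under comparison.
environments = {name: env_for_provider(name, {"PATH": "/usr/bin"})
                for name in ("native", "gnutls")}
print(environments["native"]["QPDF_CRYPTO_PROVIDER"])
```

Each resulting dictionary can be handed to ``subprocess.run(..., env=...)`` when launching qpdf, so the same command line is exercised once per provider.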
  213 +
  214 +.. _ref.crypto.develop:
  215 +
  216 +Crypto Provider Information for Developers
  217 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  218 +
  219 +If you are writing code that uses libqpdf and you want to force a
  220 +certain crypto provider to be used, you can call the method
  221 +``QPDFCryptoProvider::setDefaultProvider``. The argument is the name of
  222 +a built-in or developer-supplied provider. To add your own crypto
  223 +provider, you have to create a class derived from ``QPDFCryptoImpl`` and
  224 +register it with ``QPDFCryptoProvider``. For additional information, see
  225 +comments in :file:`include/qpdf/QPDFCryptoImpl.hh`.
  226 +
  227 +.. _ref.crypto.design:
  228 +
  229 +Crypto Provider Design Notes
  230 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  231 +
  232 +This section describes a few bits of rationale for why the crypto
  233 +provider interface was set up the way it was. You don't need to know any
  234 +of this information, but it's provided for the record and in case it's
  235 +interesting.
  236 +
  237 +As a general rule, I want to avoid as much as possible including large
  238 +blocks of code that are conditionally compiled such that, in most
  239 +builds, some code is never built. This is dangerous because it makes it
  240 +very easy for invalid code to creep in unnoticed. As such, I want it to
  241 +be possible to build qpdf with all available crypto providers, and this
  242 +is the way I build qpdf for local development. At the same time, if a
  243 +particular packager feels that it is a security liability for qpdf to
  244 +use crypto functionality from other than a library that gets
  245 +considerable scrutiny for this specific purpose (such as gnutls,
  246 +openssl, or nettle), then I want to give that packager the ability to
  247 +completely disable qpdf's native implementation. Or if someone wants to
  248 +avoid adding a dependency on one of the external crypto providers, I
  249 +don't want the availability of the provider to impose additional
  250 +external dependencies within that environment. Both of these are
  251 +situations that I know to be true for some users of qpdf.
  252 +
  253 +I want registration and selection of crypto providers to be thread-safe,
  254 +and I want it to work deterministically for a developer to provide their
  255 +own crypto provider and be able to set it up as the default. This was
  256 +the primary motivation behind requiring C++-11 as doing so enabled me to
  257 +exploit the guaranteed thread safety of local block static
  258 +initialization. The ``QPDFCryptoProvider`` class uses a singleton
  259 +pattern with thread-safe initialization to create the singleton instance
  260 +of ``QPDFCryptoProvider`` and exposes only static methods in its public
  261 +interface. In this way, if a developer wants to call any
  262 +``QPDFCryptoProvider`` methods, the library guarantees the
  263 +``QPDFCryptoProvider`` is fully initialized and all built-in crypto
  264 +providers are registered. Making ``QPDFCryptoProvider`` actually know
  265 +about all the built-in providers may seem a bit sad at first, but this
  266 +choice makes it extremely clear exactly what the initialization behavior
  267 +is. There's no question about provider implementations automatically
  268 +registering themselves in a nondeterministic order. It also means that
  269 +implementations do not need to know anything about the provider
  270 +interface, which makes them easier to test in isolation. Another
  271 +advantage of this approach is that a developer who wants to develop
  272 +their own crypto provider can do so in complete isolation from the qpdf
  273 +library and, with just two calls, can make qpdf use their provider in
  274 +their application. If they decided to contribute their code, plugging it
  275 +into the qpdf library would require a very small change to qpdf's source
  276 +code.
  277 +
  278 +The decision to make the crypto provider selectable at runtime was one I
  279 +struggled with a little, but I decided to do it for various reasons.
  280 +Allowing an end user to switch crypto providers easily could be very
  281 +useful for reproducing a potential bug. If a user reports a bug that
  282 +some cryptographic thing is broken, I can easily ask that person to try
  283 +with the ``QPDF_CRYPTO_PROVIDER`` variable set to different values. The
  284 +same could apply in the event of a performance problem. This also makes
  285 +it easier for qpdf's own test suite to exercise code with different
  286 +providers without having to make every program that links with qpdf
  287 +aware of the possibility of multiple providers. In qpdf's continuous
  288 +integration environment, the entire test suite is run for each supported
  289 +crypto provider. This is made simple by being able to select the
  290 +provider using an environment variable.
  291 +
  292 +Finally, making crypto providers selectable in this way establishes a
  293 +pattern that I may follow again in the future for stream filter
  294 +providers. One could imagine a future enhancement where someone could
  295 +provide their own implementations for basic filters like
  296 +``/FlateDecode`` or for other filters that qpdf doesn't support.
  297 +Implementing the registration functions and internal storage of
  298 +registered providers was also easier using C++-11's functional
  299 +interfaces, which was another reason to require C++-11 at this time.
  300 +
  301 +.. _ref.packaging:
  302 +
  303 +Notes for Packagers
  304 +-------------------
  305 +
  306 +If you are packaging qpdf for an operating system distribution, here are
  307 +some things you may want to keep in mind:
  308 +
  309 +- Starting in qpdf version 9.1.1, qpdf no longer has a runtime
  310 + dependency on perl. This is because fix-qdf was rewritten in C++.
  311 + However, qpdf still has a build-time dependency on perl.
  312 +
  313 +- Make sure you are getting the intended behavior with regard to crypto
  314 + providers. Read :ref:`ref.crypto.build` for details.
  315 +
  316 +- Passing :samp:`--enable-show-failed-test-output` to
  317 + :command:`./configure` will cause any failed test
  318 + output to be written to the console. This can be very useful for
  319 + seeing test failures generated by autobuilders where you can't access
  320 + qtest.log after the fact.
  321 +
  322 +- If qpdf's build environment detects the presence of autoconf and
  323 + related tools, it will check to ensure that automatically generated
  324 + files are up-to-date with recorded checksums and fail if it detects a
  325 + discrepancy. This feature is intended to prevent you from
  326 + accidentally forgetting to regenerate automatic files after modifying
  327 + their sources. If your packaging environment automatically refreshes
  328 + automatic files, it can cause this check to fail. Suppress qpdf's
  329 + checks by passing :samp:`--disable-check-autofiles`
  330 + to :command:`./configure`. This is safe since qpdf's
  331 + :command:`autogen.sh` just runs autotools in the
  332 + normal way.
  333 +
  334 +- QPDF's :command:`make install` does not install
  335 + completion files by default, but as a packager, it's good if you
  336 + install them wherever your distribution expects such files to go. You
  337 + can find completion files to install in the
  338 + :file:`completions` directory.
  339 +
  340 +- Packagers are encouraged to install the source files from the
  341 + :file:`examples` directory along with qpdf
  342 + development packages.
... ...
manual/json.rst 0 → 100644
  1 +.. _ref.json:
  2 +
  3 +QPDF JSON
  4 +=========
  5 +
  6 +.. _ref.json-overview:
  7 +
  8 +Overview
  9 +--------
  10 +
  11 +Beginning with qpdf version 8.3.0, the :command:`qpdf`
  12 +command-line program can produce a JSON representation of the
  13 +non-content data in a PDF file. It includes a dump in JSON format of all
  14 +objects in the PDF file excluding the content of streams. This JSON
  15 +representation makes it very easy to look in detail at the structure of
  16 +a given PDF file, and it also provides a great way to work with PDF
  17 +files programmatically from the command-line in languages that can't
  18 +call or link with the qpdf library directly. Note that stream data can
  19 +be extracted from PDF files using other qpdf command-line options.
  20 +
  21 +.. _ref.json-guarantees:
  22 +
  23 +JSON Guarantees
  24 +---------------
  25 +
  26 +The qpdf JSON representation includes a JSON serialization of the raw
  27 +objects in the PDF file as well as some computed information in a more
  28 +easily extracted format. QPDF provides some guarantees about its JSON
  29 +format. These guarantees are designed to simplify the experience of a
  30 +developer working with the JSON format.
  31 +
  32 +Compatibility
  33 + The top-level JSON object output is a dictionary. The JSON output
  34 + contains various nested dictionaries and arrays. With the exception
  35 + of dictionaries that are populated by the fields of objects from the
  36 + file, all instances of a dictionary are guaranteed to have exactly
  37 + the same keys. Future versions of qpdf are free to add additional
  38 + keys but not to remove keys or change the type of object that a key
  39 + points to. The qpdf program validates this guarantee, and in the
  40 + unlikely event that a bug in qpdf should cause it to generate data
  41 + that doesn't conform to this rule, it will ask you to file a bug
  42 + report.
  43 +
  44 + The top-level JSON structure contains a "``version``" key whose value
  45 + is a simple integer. The value of the ``version`` key will be
  46 + incremented if a non-compatible change is made. A non-compatible
  47 + change would be any change that involves removal of a key, a change
  48 + to the format of data pointed to by a key, or a semantic change that
  49 + requires a different interpretation of a previously existing key. A
  50 + strong effort will be made to avoid breaking compatibility.
  51 +
  52 +Documentation
  53 + The :command:`qpdf` command can be invoked with the
  54 + :samp:`--json-help` option. This will output a JSON
  55 + structure that has the same structure as the JSON output that qpdf
  56 + generates, except that each field in the help output is a description
  57 + of the corresponding field in the JSON output. The specific
  58 + guarantees are as follows:
  59 +
  60 + - A dictionary in the help output means that the corresponding
  61 + location in the actual JSON output is also a dictionary with
  62 + exactly the same keys; that is, no keys present in help are absent
  63 + in the real output, and no keys will be present in the real output
  64 + that are not in help. As a special case, if the dictionary has a
  65 + single key whose name starts with ``<`` and ends with ``>``, it
  66 + means that the JSON output is a dictionary that can have any keys,
  67 + each of which conforms to the value of the special key. This is
  68 + used for cases in which the keys of the dictionary are things like
  69 + object IDs.
  70 +
  71 + - A string in the help output is a description of the item that
  72 + appears in the corresponding location of the actual output. The
  73 + corresponding output can have any format.
  74 +
  75 + - An array in the help output always contains a single element. It
  76 + indicates that the corresponding location in the actual output is
  77 + also an array, and that each element of the array has whatever
  78 + format is implied by the single element of the help output's
  79 + array.
  80 +
  81 + For example, the help output includes a "``pagelabels``"
  82 + key whose value is an array of one element. That element is a
  83 + dictionary with keys "``index``" and "``label``". In addition to
  84 + describing the meaning of those keys, this tells you that the actual
  85 + JSON output will contain a ``pagelabels`` array, each of whose
  86 + elements is a dictionary that contains an ``index`` key, a ``label``
  87 + key, and no other keys.
  88 +
  89 +Directness and Simplicity
  90 + The JSON output contains the value of every object in the file, but
  91 + it also contains some processed data. This is analogous to how qpdf's
  92 + library interface works. The processed data is similar to the helper
  93 + functions in that it allows you to look at certain aspects of the PDF
  94 + file without having to understand all the nuances of the PDF
  95 + specification, while the raw objects allow you to mine the PDF for
  96 + anything that the higher-level interfaces are lacking.
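
The three rules listed under Documentation above (dictionary, string, array) can be sketched as a small recursive check. This is an illustration, not qpdf's own validation code. The function name ``conforms`` and the sample data are hypothetical, and accepting ``None`` anywhere is an assumption based on the statement elsewhere in this chapter that fields may be null when information is absent. The sketch also assumes output produced without :samp:`--json-keys`.

```python
def conforms(help_node, actual):
    """Check `actual` JSON data against the matching part of the help output."""
    if actual is None:
        # Assumption: a null is acceptable wherever information is absent.
        return True
    if isinstance(help_node, dict):
        if not isinstance(actual, dict):
            return False
        keys = list(help_node)
        # Special case: a single "<...>" key describes every value of a
        # dictionary whose keys are arbitrary (for example, object IDs).
        if len(keys) == 1 and keys[0].startswith("<") and keys[0].endswith(">"):
            return all(conforms(help_node[keys[0]], v) for v in actual.values())
        # Otherwise both dictionaries must have exactly the same keys.
        if set(help_node) != set(actual):
            return False
        return all(conforms(help_node[k], actual[k]) for k in help_node)
    if isinstance(help_node, list):
        # The help array holds one element describing every output element.
        if not isinstance(actual, list):
            return False
        return all(conforms(help_node[0], item) for item in actual)
    # A string in the help output is a description; any format is allowed.
    return True


# Hypothetical fragments shaped like the pagelabels example in the text.
help_fragment = {"pagelabels": [{"index": "description", "label": "description"}]}
good = {"pagelabels": [{"index": 0, "label": {"/S": "/D"}}]}
bad = {"pagelabels": [{"index": 0, "label": {}, "extra": 1}]}
print(conforms(help_fragment, good), conforms(help_fragment, bad))
```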
  97 +
  98 +.. _json.limitations:
  99 +
  100 +Limitations of JSON Representation
  101 +----------------------------------
  102 +
  103 +There are a few limitations to be aware of with the JSON structure:
  104 +
  105 +- Strings, names, and indirect object references in the original PDF
  106 + file are all converted to strings in the JSON representation. In the
  107 + case of a "normal" PDF file, you can tell the difference because a
  108 + name starts with a slash (``/``), and an indirect object reference
  109 + looks like ``n n R``, but if there were to be a string that looked
  110 + like a name or indirect object reference, there would be no way to
  111 + tell this from the JSON output. Note that there are certain cases
  112 + where you know for sure what something is, such as knowing that
  113 + dictionary keys in objects are always names and that certain things
  114 + in the higher-level computed data are known to contain indirect
  115 + object references.
  116 +
  117 +- The JSON format doesn't support binary data very well. Mostly the
  118 + details are not important, but they are presented here for
  119 + information. When qpdf outputs a string in the JSON representation,
  120 + it converts the string to UTF-8, assuming usual PDF string semantics.
  121 + Specifically, if the original string is UTF-16, it is converted to
  122 + UTF-8. Otherwise, it is assumed to have PDF doc encoding, and is
  123 + converted to UTF-8 with that assumption. This causes strange things
  124 + to happen to binary strings. For example, if you had the binary
  125 + string ``<038051>``, this would be output to the JSON as ``\u0003•Q``
  126 + because ``03`` is not a printable character and ``80`` is the bullet
  127 + character in PDF doc encoding and is mapped to the Unicode value
  128 + ``2022``. Since ``51`` is ``Q``, it is output as is. If you wanted to
  129 + convert back from here to a binary string, you would have to recognize
  130 + Unicode values whose code points are higher than ``0xFF`` and map
  131 + those back to their corresponding PDF doc encoding characters. There
  132 + is no way to tell the difference between a Unicode string that was
  133 + originally encoded as UTF-16 or one that was converted from PDF doc
  134 + encoding. In other words, it's best if you don't try to use the JSON
  135 + format to extract binary strings from the PDF file, but if you really
  136 + had to, it could be done. Note that qpdf's
  137 + :samp:`--show-object` option does not have this
  138 + limitation and will reveal the string as encoded in the original
  139 + file.
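
To make the ``<038051>`` example concrete, here is a rough sketch of the conversion in both directions. It is not qpdf's implementation. The mapping table is deliberately incomplete and contains only the single entry mentioned above (byte ``80`` to Unicode ``2022``); every other byte is passed through unchanged, which is a simplifying assumption and is not accurate for all of PDF doc encoding. The function names are hypothetical.

```python
import json

# Deliberately incomplete: only the one mapping named in the text.
PDFDOC_TO_UNICODE = {0x80: 0x2022}
UNICODE_TO_PDFDOC = {u: b for b, u in PDFDOC_TO_UNICODE.items()}


def pdf_string_to_text(raw: bytes) -> str:
    """Decode a PDF string as UTF-16 (if marked by a BOM) or PDF doc encoding."""
    if raw.startswith(b"\xfe\xff"):
        return raw[2:].decode("utf-16-be")
    # Simplification: bytes missing from the table keep their own value.
    return "".join(chr(PDFDOC_TO_UNICODE.get(b, b)) for b in raw)


def text_to_binary(text: str) -> bytes:
    """Map code points above 0xFF back to their PDF doc encoding bytes."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        # Raises KeyError for characters outside the partial table.
        out.append(UNICODE_TO_PDFDOC[cp] if cp > 0xFF else cp)
    return bytes(out)


text = pdf_string_to_text(bytes.fromhex("038051"))
print(json.dumps(text, ensure_ascii=False))
print(text_to_binary(text).hex())
```

The round trip only works here because the sketch assumes the string was never UTF-16 to begin with, which is exactly the ambiguity described above.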
  140 +
  141 +.. _json.considerations:
  142 +
  143 +JSON: Special Considerations
  144 +----------------------------
  145 +
  146 +For the most part, the built-in JSON help tells you everything you need
  147 +to know about the JSON format, but there are a few non-obvious things to
  148 +be aware of:
  149 +
  150 +- While qpdf guarantees that keys present in the help will be present
  151 + in the output, those fields may be null or empty if the information
  152 + is not known or absent in the file. Also, if you specify
  153 + :samp:`--json-keys`, the keys that are not listed
  154 + will be excluded entirely except for those that
  155 + :samp:`--json-help` says are always present.
  156 +
  157 +- In a few places, there are keys with names containing
  158 + ``pageposfrom1``. The values of these keys are null or an integer. If
  159 + an integer, they point to a page index within the file numbering from
  160 + 1. Note that JSON indexes from 0, and you would also use 0-based
  161 + indexing using the API. However, 1-based indexing is easier in this
  162 + case because the command-line syntax for specifying page ranges is
  163 + 1-based. If you were going to write a program that looked through the
  164 + JSON for information about specific pages and then use the
  165 + command-line to extract those pages, 1-based indexing is easier.
  166 + Besides, it's more convenient to subtract 1 in a program written in a
  167 + real programming language than it is to add 1 in shell code.
  168 +
  169 +- The image information included in the ``page`` section of the JSON
  170 + output includes the key "``filterable``". Note that the value of this
  171 + field may depend on the :samp:`--decode-level` that
  172 + you invoke qpdf with. The JSON output includes a top-level key
  173 + "``parameters``" that indicates the decode level used for computing
  174 + whether a stream was filterable. For example, jpeg images will be
  175 + shown as not filterable by default, but they will be shown as
  176 + filterable if you run :command:`qpdf --json
  177 + --decode-level=all`.
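
For the ``pageposfrom1`` keys described above, a program typically needs two small conversions: one to 0-based indexes for use within the program, and one to a page list for the command line. The following is a sketch with hypothetical helper names. It assumes that a comma-separated list of 1-based page numbers is an acceptable page range for the command line.

```python
from typing import Iterable, Optional


def to_zero_based(pageposfrom1: Optional[int]) -> Optional[int]:
    """Convert a pageposfrom1 value (null or 1-based) to a 0-based index."""
    return None if pageposfrom1 is None else pageposfrom1 - 1


def page_range(positions: Iterable[Optional[int]]) -> str:
    """Join 1-based positions into a comma-separated list, skipping nulls."""
    return ",".join(str(p) for p in positions if p is not None)


# Hypothetical values as they might appear in pageposfrom1 fields.
values = [3, None, 1]
print([to_zero_based(v) for v in values])
print(page_range(values))
```

The values pass through to the command line unchanged, while the subtraction happens only on the program side.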
... ...
manual/library.rst 0 → 100644
  1 +.. _ref.using-library:
  2 +
  3 +Using the QPDF Library
  4 +======================
  5 +
  6 +.. _ref.using.from-cxx:
  7 +
  8 +Using QPDF from C++
  9 +-------------------
  10 +
  11 +The source tree for the qpdf package has an
  12 +:file:`examples` directory that contains a few
  13 +example programs. The :file:`qpdf/qpdf.cc` source
  14 +file also serves as a useful example since it exercises almost all of
  15 +the qpdf library's public interface. The best source of documentation on
  16 +the library itself is reading comments in
  17 +:file:`include/qpdf/QPDF.hh`,
  18 +:file:`include/qpdf/QPDFWriter.hh`, and
  19 +:file:`include/qpdf/QPDFObjectHandle.hh`.
  20 +
  21 +All header files are installed in the
  22 +:file:`include/qpdf` directory. It is recommended that
  23 +you use ``#include <qpdf/QPDF.hh>`` rather than adding
  24 +:file:`include/qpdf` to your include path.
  25 +
  26 +When linking against the qpdf static library, you may also need to
  27 +specify ``-lz -ljpeg`` on your link command. If your system understands
  28 +how to read libtool :file:`.la` files, this may not
  29 +be necessary.
  30 +
  31 +The qpdf library is safe to use in a multithreaded program, but no
  32 +individual ``QPDF`` object instance (including ``QPDF``,
  33 +``QPDFObjectHandle``, or ``QPDFWriter``) can be used in more than one
  34 +thread at a time. Multiple threads may simultaneously work with
  35 +different instances of these and all other QPDF objects.
  36 +
  37 +.. _ref.using.other-languages:
  38 +
  39 +Using QPDF from other languages
  40 +-------------------------------
  41 +
  42 +The qpdf library is implemented in C++, which makes it hard to use
  43 +directly in other languages. There are a few things that can help.
  44 +
  45 +"C"
  46 + The qpdf library includes a "C" language interface that provides a
  47 + subset of the overall capabilities. The header file
  48 + :file:`qpdf/qpdf-c.h` includes information about
  49 + its use. As long as you use a C++ linker, you can link C programs
  50 + with qpdf and use the C API. For languages that can directly load
  51 + methods from a shared library, the C API can also be useful. People
  52 + have reported success using the C API from other languages on Windows
  53 + by directly calling functions in the DLL.
  54 +
  55 +Python
  56 + A Python module called
  57 + `pikepdf <https://pypi.org/project/pikepdf/>`__ provides a clean and
  58 + highly functional set of Python bindings to the qpdf library. Using
  59 + pikepdf, you can work with PDF files in a natural way and combine
  60 + qpdf's capabilities with other functionality provided by Python's
  61 + rich standard library and available modules.
  62 +
  63 +Other Languages
  64 + Starting with version 8.3.0, the :command:`qpdf`
  65 + command-line tool can produce a JSON representation of the PDF file's
  66 + non-content data. This can facilitate interacting programmatically
  67 + with PDF files through qpdf's command line interface. For more
  68 + information, please see :ref:`ref.json`.
  69 +
  70 +.. _ref.unicode-files:
  71 +
  72 +A Note About Unicode File Names
  73 +-------------------------------
  74 +
  75 +When strings are passed to qpdf library routines either as ``char*`` or
  76 +as ``std::string``, they are treated as byte arrays except where
  77 +otherwise noted. When Unicode is desired, qpdf wants UTF-8 unless
  78 +otherwise noted in comments in header files. In modern UNIX/Linux
  79 +environments, this generally does the right thing. In Windows, it's a
  80 +bit more complicated. Starting in qpdf 8.4.0, passwords that contain
  81 +Unicode characters are handled much better, and starting in qpdf 8.4.1,
  82 +the library attempts to properly handle Unicode characters in filenames.
  83 +In particular, in Windows, if a UTF-8 encoded string is used as a
  84 +filename in either ``QPDF`` or ``QPDFWriter``, it is internally
  85 +converted to ``wchar_t*``, and Unicode-aware Windows APIs are used. As
  86 +such, qpdf will generally operate properly on files with non-ASCII
  87 +characters in their names as long as the filenames are UTF-8 encoded for
  88 +passing into the qpdf library API, but there are still some rough edges,
  89 +such as the encoding of the filenames in error messages or CLI output
  90 +messages. Patches or bug reports are welcome for any continuing issues
  91 +with Unicode file names in Windows.
... ...
manual/license.rst 0 → 100644
  1 +.. _ref.license:
  2 +
  3 +License
  4 +=======
  5 +
  6 +QPDF is licensed under `the Apache License, Version 2.0
  7 +<http://www.apache.org/licenses/LICENSE-2.0>`__ (the "License").
  8 +Unless required by applicable law or agreed to in writing, software
  9 +distributed under the License is distributed on an "AS IS" BASIS,
  10 +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
  11 +implied. See the License for the specific language governing
  12 +permissions and limitations under the License.
... ...
manual/linearization.rst 0 → 100644
  1 +.. _ref.linearization:
  2 +
  3 +Linearization
  4 +=============
  5 +
  6 +This chapter describes how ``QPDF`` and ``QPDFWriter`` implement
  7 +creation and processing of linearized PDFs.
  8 +
  9 +.. _ref.linearization-strategy:
  10 +
  11 +Basic Strategy for Linearization
  12 +--------------------------------
  13 +
  14 +To avoid the incestuous problem of having the qpdf library validate its
  15 +own linearized files, we have a special linearized file checking mode
  16 +which can be invoked via :command:`qpdf
  17 +--check-linearization` (or :command:`qpdf
  18 +--check`). This mode reads the linearization parameter
  19 +dictionary and the hint streams and validates that object ordering,
  20 +parameters, and hint stream contents are correct. The validation code
  21 +was first tested against linearized files created by external tools
  22 +(Acrobat and pdlin) and then used to validate files created by
  23 +``QPDFWriter`` itself.
  24 +
  25 +.. _ref.linearized.preparation:
  26 +
  27 +Preparing For Linearization
  28 +---------------------------
  29 +
  30 +Before creating a linearized PDF file from any other PDF file, the PDF
  31 +file must be altered such that all page attributes are propagated down
  32 +to the page level (and not inherited from parents in the ``/Pages``
  33 +tree). We also have to know which objects refer to which other objects,
  34 +being concerned with page boundaries and a few other cases. We refer to
  35 +this part of preparing the PDF file as
  36 +*optimization*, discussed in
  37 +:ref:`ref.optimization`. Note that, in this context, the
  38 +term *optimization* is a qpdf term, and the
  39 +term *linearization* is a term from the PDF
  40 +specification. Do not be confused by the fact that many applications
  41 +refer to linearization as optimization or web optimization.
  42 +
  43 +When creating linearized PDF files from optimized PDF files, there are
  44 +really only a few issues that need to be dealt with:
  45 +
  46 +- Creation of hints tables
  47 +
  48 +- Placing objects in the correct order
  49 +
  50 +- Filling in offsets and byte sizes
  51 +
  52 +.. _ref.optimization:
  53 +
  54 +Optimization
  55 +------------
  56 +
  57 +In order to perform various operations such as linearization and
  58 +splitting files into pages, it is necessary to know which objects are
  59 +referenced by which pages, page thumbnails, and root and trailer
  60 +dictionary keys. It is also necessary to ensure that all page-level
  61 +attributes appear directly at the page level and are not inherited from
  62 +parents in the pages tree.
  63 +
  64 +We refer to the process of enforcing these constraints as
  65 +*optimization*. As mentioned above, note
  66 +that some applications refer to linearization as optimization. Although
  67 +this optimization was initially motivated by the need to create
  68 +linearized files, we are using these terms separately.
  69 +
  70 +PDF file optimization is implemented in the
  71 +:file:`QPDF_optimization.cc` source file. That file
  72 +is richly commented and serves as the primary reference for the
  73 +optimization process.
  74 +
  75 +After optimization has been completed, the private member variables
  76 +``obj_user_to_objects`` and ``object_to_obj_users`` in ``QPDF`` have
  77 +been populated. Any object that has more than one value in the
  78 +``object_to_obj_users`` table is shared. Any object that has exactly one
  79 +value in the ``object_to_obj_users`` table is private. To find all the
  80 +private objects in a page or a trailer or root dictionary key, one
  81 +merely has to make this determination for each element in the
  82 +``obj_user_to_objects`` table for the given page or key.
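
The shared/private test described above can be sketched in a few lines of
Python. This is only an illustration, not qpdf's implementation: the
dictionaries are hypothetical stand-ins for the private
``obj_user_to_objects`` and ``object_to_obj_users`` tables, and the user
labels and object numbers are invented for the example.

.. code-block:: python

    from collections import defaultdict


    def invert(obj_user_to_objects):
        """Build the object -> users table from the user -> objects table."""
        object_to_obj_users = defaultdict(set)
        for user, objects in obj_user_to_objects.items():
            for obj in objects:
                object_to_obj_users[obj].add(user)
        return dict(object_to_obj_users)


    def private_objects(user, obj_user_to_objects, object_to_obj_users):
        """Return the objects referenced by `user` and by no other user."""
        return {
            obj
            for obj in obj_user_to_objects[user]
            if len(object_to_obj_users[obj]) == 1
        }


    # Hypothetical data: object numbers keyed by made-up user labels.
    obj_user_to_objects = {
        "page 1": {4, 5, 9},
        "page 2": {6, 9},
        "thumb 1": {7},
    }
    object_to_obj_users = invert(obj_user_to_objects)
    page1_private = private_objects(
        "page 1", obj_user_to_objects, object_to_obj_users
    )

Object 9 has two users in this made-up data, so it counts as shared and is
left out of the private set for ``"page 1"``.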
  83 +
  84 +Note that pages and thumbnails have different object user types, so the
  85 +above test on a page will not include objects referenced by the page's
  86 +thumbnail dictionary and nothing else.
  87 +
  88 +.. _ref.linearization.writing:
  89 +
  90 +Writing Linearized Files
  91 +------------------------
  92 +
  93 +We will create files with only primary hint streams. We will never write
  94 +overflow hint streams. (As of PDF version 1.4, Acrobat doesn't either,
  95 +and they are never necessary.) The hint streams contain offset
  96 +information to objects that point to where they would be if the hint
  97 +stream were not present. This means that we have to calculate all object
  98 +positions before we can generate and write the hint table, which in turn
  99 +requires generating the file in two passes. To make this reliable,
  100 +``QPDFWriter`` in linearization mode invokes exactly the same code twice
  101 +to write the file to a pipeline.
  102 +
  103 +In the first pass, the target pipeline is a count pipeline chained to a
  104 +discard pipeline. The count pipeline simply passes its data through to
  105 +the next pipeline in the chain but can return the number of bytes passed
  106 +through it at any intermediate point. The discard pipeline is an end of
  107 +line pipeline that just throws its data away. The hint stream is not
  108 +written and dummy values with adequate padding are stored in the first
  109 +cross reference table, linearization parameter dictionary, and /Prev key
  110 +of the first trailer dictionary. All the offset, length, object
  111 +renumbering information, and anything else we need for the second pass
  112 +is stored.
  113 +
  114 +At the end of the first pass, this information is passed to the ``QPDF``
  115 +class which constructs a compressed hint stream in a memory buffer and
  116 +returns it. ``QPDFWriter`` uses this information to write a complete
  117 +hint stream object into a memory buffer. At this point, the length of
  118 +the hint stream is known.
  119 +
  120 +In the second pass, the end of the pipeline chain is a regular file
  121 +instead of a discard pipeline, and we have known values for all the
  122 +offsets and lengths that we didn't have in the first pass. We have to
  123 +adjust offsets that appear after the start of the hint stream by the
  124 +length of the hint stream, which is known. Anything that is of variable
  125 +length is padded, with the padding code surrounding any writing code
  126 +that differs in the two passes. This ensures that changes to the way
  127 +things are represented never result in offsets that were gathered
  128 +during the first pass becoming incorrect for the second pass.
  129 +
  130 +Using this strategy, we can write linearized files to a non-seekable
  131 +output stream with only a single pass to disk or wherever the output is
  132 +going.
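
The two-pass idea can be sketched with a toy file format in Python. This is
a simplified illustration under stated assumptions, not ``QPDFWriter``'s
code: the class names, the ``write_pass`` helper, the ``%TOY`` header, and
the fixed-width offset table are all invented for the example. The same
writing function runs twice, first into a counting pipeline chained to a
discard pipeline, then into a real sink with offsets shifted by the
now-known hint length.

.. code-block:: python

    class DiscardPipeline:
        """End-of-line pipeline that throws its data away."""

        def write(self, data):
            pass


    class BufferPipeline:
        """End-of-line pipeline that keeps its data (stands in for a file)."""

        def __init__(self):
            self.chunks = []

        def write(self, data):
            self.chunks.append(data)

        def getvalue(self):
            return b"".join(self.chunks)


    class CountPipeline:
        """Passes data to the next pipeline and counts the bytes seen."""

        def __init__(self, next_pipeline):
            self.next = next_pipeline
            self.count = 0

        def write(self, data):
            self.count += len(data)
            self.next.write(data)


    def write_pass(pipeline, objects, hint=b"", known_offsets=None):
        """Write header, optional hint, objects, and a padded offset table.

        Returns the offset at which each object started in this pass.
        """
        counter = CountPipeline(pipeline)
        counter.write(b"%TOY\n")
        counter.write(hint)
        offsets = []
        for obj in objects:
            offsets.append(counter.count)
            counter.write(obj)
        # The first pass has no offsets yet, so dummy zeros are written.
        # Zero-padding to a fixed width keeps the table the same size in
        # both passes.
        table = known_offsets if known_offsets is not None else [0] * len(objects)
        for off in table:
            counter.write(b"%010d\n" % off)
        return offsets


    objects = [b"1 0 obj\n<< >>\nendobj\n", b"2 0 obj\n(hello)\nendobj\n"]

    # Pass 1: count bytes, discard output, no hint data yet.
    first = write_pass(DiscardPipeline(), objects)

    # Build the hint from pass-1 offsets (positions as if it were absent).
    hint = b"".join(b"%d " % off for off in first) + b"\n"

    # Pass 2: real output; everything after the hint shifts by its length.
    shifted = [off + len(hint) for off in first]
    out = BufferPipeline()
    second = write_pass(out, objects, hint=hint, known_offsets=shifted)

In this sketch the offsets observed during the second pass match the shifted
first-pass offsets, which is the property the padding is meant to protect.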
  133 +
  134 +.. _ref.linearization-data:
  135 +
  136 +Calculating Linearization Data
  137 +------------------------------
  138 +
  139 +Once a file is optimized, we have information about which objects access
  140 +which other objects. We can then process these tables to decide which
  141 +part (as described in "Linearized PDF Document Structure" in the PDF
  142 +specification) each object is contained within. This tells us the exact
  143 +order in which objects are written. The ``QPDFWriter`` class asks for
  144 +this information and enqueues objects for writing in the proper order.
  145 +It also turns on a check that causes an exception to be thrown if an
  146 +object is encountered that has not already been queued. (This could
  147 +happen only if there were a bug in the traversal code used to calculate
  148 +the linearization data.)
  149 +
  150 +.. _ref.linearization-issues:
  151 +
  152 +Known Issues with Linearization
  153 +-------------------------------
  154 +
  155 +There are a handful of known issues with this linearization code. These
  156 +issues do not appear to impact the behavior of linearized files which
  157 +still work as intended: it is possible for a web browser to begin to
  158 +display them before they are fully downloaded. In fact, it seems that
  159 +various other programs that create linearized files have many of these
  160 +same issues. These items make reference to terminology used in the
  161 +linearization appendix of the PDF specification.
  162 +
  163 +- Thread Dictionary information keys appear in part 4 with the rest of
  164 + Threads instead of in part 9. Objects in part 9 are not grouped
  165 + together functionally.
  166 +
  167 +- We are not calculating numerators for shared object positions within
  168 + content streams or interleaving them within content streams.
  169 +
  170 +- We generate only page offset, shared object, and outline hint tables.
  171 + It would be relatively easy to add some additional tables. We gather
  172 + most of the information needed to create thumbnail hint tables. There
  173 + are comments in the code about this.
  174 +
  175 +.. _ref.linearization-debugging:
  176 +
  177 +Debugging Note
  178 +--------------
  179 +
  180 +The :command:`qpdf --show-linearization` command can show
  181 +the complete contents of linearization hint streams. To look at the raw
  182 +data, you can extract the filtered contents of the linearization hint
  183 +tables using :command:`qpdf --show-object=n
  184 +--filtered-stream-data`. Then, to convert this into a bit
  185 +stream (since linearization tables are bit streams written without
  186 +regard to byte boundaries), you can pipe the resulting data through the
  187 +following perl code:
  188 +
  189 +.. code-block:: perl
  190 +
  191 + use bytes;
  192 + binmode STDIN;
  193 + undef $/;
  194 + my $a = <STDIN>;
  195 + my @ch = split(//, $a);
  196 + map { printf("%08b", ord($_)) } @ch;
  197 + print "\n";
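
For readers who prefer Python, the following sketch produces the same bit
string and then reads fields of arbitrary bit widths from it. The helper
names are invented for the example, and it assumes fields are stored
most-significant bit first; the widths used below are arbitrary and do not
correspond to any real hint table.

.. code-block:: python

    def to_bit_string(data):
        """Return eight binary digits per byte, like the perl snippet."""
        return "".join(format(byte, "08b") for byte in data)


    def read_fields(data, widths):
        """Read consecutive bit fields of the given widths from `data`."""
        bits = to_bit_string(data)
        values = []
        pos = 0
        for width in widths:
            if pos + width > len(bits):
                raise ValueError("not enough bits")
            # A zero-width field carries no data and is read as 0.
            values.append(int(bits[pos:pos + width], 2) if width else 0)
            pos += width
        return values


    sample = bytes([0b10110011, 0b01000000])
    fields = read_fields(sample, [3, 5, 2])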
... ...
manual/object-streams.rst 0 → 100644
  1 +.. _ref.object-and-xref-streams:
  2 +
  3 +Object and Cross-Reference Streams
  4 +==================================
  5 +
  6 +This chapter provides information about the implementation of object
  7 +stream and cross-reference stream support in qpdf.
  8 +
  9 +.. _ref.object-streams:
  10 +
  11 +Object Streams
  12 +--------------
  13 +
  14 +Object streams can contain any regular object except the following:
  15 +
  16 +- stream objects
  17 +
  18 +- objects with generation > 0
  19 +
  20 +- the encryption dictionary
  21 +
  22 +- objects containing the /Length of another stream
  23 +
  24 +In addition, Adobe Reader (at least as of version 8.0.0) appears to not
  25 +be able to handle having the document catalog appear in an object stream
  26 +if the file is encrypted, though this is not specifically disallowed by
  27 +the specification.
  28 +
  29 +There are additional restrictions for linearized files. See
  30 +:ref:`ref.object-streams-linearization` for details.
  31 +
  32 +The PDF specification refers to objects in object streams as "compressed
  33 +objects" regardless of whether the object stream is compressed.
  34 +
  35 +The generation number of every object in an object stream must be zero.
  36 +It is possible to delete and replace an object in an object stream with
  37 +a regular object.
  38 +
  39 +The object stream dictionary has the following keys:
  40 +
  41 +- ``/N``: number of objects
  42 +
  43 +- ``/First``: byte offset of first object
  44 +
  45 +- ``/Extends``: indirect reference to stream that this extends
  46 +
  47 +Stream collections are formed with ``/Extends``. They must form a
  48 +directed acyclic graph. These can be used for semantic information and
  49 +are not meaningful to the PDF document's syntactic structure. Although
  50 +qpdf preserves stream collections, it never generates them and doesn't
  51 +make use of this information in any way.
  52 +
  53 +The specification recommends limiting the number of objects in an object
  54 +stream for efficiency in reading and decoding. Acrobat 6 uses no more
  55 +than 100 objects per object stream for linearized files and no more than 200
  56 +objects per stream for non-linearized files. ``QPDFWriter``, in object
  57 +stream generation mode, never puts more than 100 objects in an object
  58 +stream.
  59 +
  60 +Object stream contents consist of *N* pairs of integers, each of which
  61 +is the object number and the byte offset of the object relative to the
  62 +first object in the stream, followed by the objects themselves,
  63 +concatenated.
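
As a rough illustration of that layout, the following Python sketch splits
already-decoded object stream data into individual objects. It is not how
qpdf parses object streams: the function name is invented, the sample data
is hand-built, and the sketch assumes that offsets appear in increasing
order.

.. code-block:: python

    def split_object_stream(data, n, first):
        """Split decoded object stream data into {object number: raw bytes}.

        `n` and `first` are the /N and /First values from the stream
        dictionary.
        """
        header = data[:first].split()
        if len(header) < 2 * n:
            raise ValueError("object stream header is too short")
        pairs = [(int(header[2 * i]), int(header[2 * i + 1])) for i in range(n)]
        result = {}
        for i, (objnum, offset) in enumerate(pairs):
            start = first + offset
            # An object runs until the next object's offset or the end of data.
            end = first + pairs[i + 1][1] if i + 1 < n else len(data)
            result[objnum] = data[start:end].strip()
        return result


    # Hand-built example: two objects, numbered 11 and 12.
    body = b"<< /Type /Example >> [1 2 3]"
    header = b"11 0 12 21 "
    data = header + body
    objects = split_object_stream(data, n=2, first=len(header))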
  64 +
  65 +.. _ref.xref-streams:
  66 +
  67 +Cross-Reference Streams
  68 +-----------------------
  69 +
  70 +For non-hybrid files, the value following ``startxref`` is the byte
  71 +offset to the xref stream rather than the word ``xref``.
  72 +
  73 +For hybrid files (files containing both xref tables and cross-reference
  74 +streams), the xref table's trailer dictionary contains the key
  75 +``/XRefStm`` whose value is the byte offset to a cross-reference stream
  76 +that supplements the xref table. A PDF 1.5-compliant application should
  77 +read the xref table first. Then it should replace any object that it has
  78 +already seen with any defined in the xref stream. Then it should follow
  79 +any ``/Prev`` pointer in the original xref table's trailer dictionary.
  80 +The specification is not clear about what should be done, if anything,
  81 +with a ``/Prev`` pointer in the xref stream referenced by an xref table.
  82 +The ``QPDF`` class ignores it, which is probably reasonable since, if
  83 +this case were to appear for any sensible PDF file, the previous xref
  84 +table would probably have a corresponding ``/XRefStm`` pointer of its
  85 +own. For example, if a hybrid file were appended, the appended section
  86 +would have its own xref table and ``/XRefStm``. The appended xref table
  87 +would point to the previous xref table which would point to the
  88 +``/XRefStm``, meaning that the new ``/XRefStm`` doesn't have to point to
  89 +it.
  90 +
  91 +Since xref streams must be read very early, they may not be encrypted,
  92 +and they may not contain indirect objects for keys required to read them,
  93 +which are these:
  94 +
  95 +- ``/Type``: value ``/XRef``
  96 +
  97 +- ``/Size``: value *n+1*: where *n* is highest object number (same as
  98 + ``/Size`` in the trailer dictionary)
  99 +
  100 +- ``/Index`` (optional): value
  101 + ``[:samp:`{n count}` ...]`` used to determine
  102 + which objects' information is stored in this stream. The default is
  103 + ``[0 /Size]``.
  104 +
  105 +- ``/Prev``: value :samp:`{offset}`: byte
  106 + offset of previous xref stream (same as ``/Prev`` in the trailer
  107 + dictionary)
  108 +
  109 +- ``/W [...]``: sizes of each field in the xref table
  110 +
  111 +The other fields in the xref stream, which may be indirect if desired,
  112 +are the union of those from the xref table's trailer dictionary.
  113 +
  114 +.. _ref.xref-stream-data:
  115 +
  116 +Cross-Reference Stream Data
  117 +~~~~~~~~~~~~~~~~~~~~~~~~~~~
  118 +
  119 +The stream data is binary and encoded in big-endian byte order. Entries
  120 +are concatenated, and each entry has a length equal to the total of the
  121 +entries in ``/W`` above. Each entry consists of one or more fields, the
  122 +first of which is the type of the field. The number of bytes for each
  123 +field is given by ``/W`` above. A 0 in ``/W`` indicates that the field
  124 +is omitted and has the default value. The default value for the field
  125 +type is "``1``". All other default values are "``0``".
  126 +
  127 +PDF 1.5 has three field types:
  128 +
  129 +- 0: for free objects. Format: ``0 obj next-generation``, same as the
  130 + free table in a traditional cross-reference table
  131 +
  132 +- 1: regular non-compressed object. Format: ``1 offset generation``
  133 +
  134 +- 2: for objects in object streams. Format: ``2 object-stream-number
  135 + index``, the number of the object stream containing the object and the
  136 + index within the object stream of the object.
  137 +
  138 +It seems standard to have the first entry in the table be ``0 0 0``
  139 +instead of ``0 0 ffff`` if there are no deleted objects.
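
The entry layout described in this section can be sketched as a small
Python decoder. This is an illustration only, not qpdf's reader: the
function name is invented, the sample bytes are hand-built, and the sketch
assumes the stream data has already been decoded and that ``/W`` has three
elements.

.. code-block:: python

    def decode_xref_entries(data, w):
        """Decode cross-reference stream data into 3-tuples of integers.

        `w` is the three-element /W array giving each field's byte width.
        """
        entry_len = sum(w)
        if entry_len == 0 or len(data) % entry_len != 0:
            raise ValueError("stream data does not match /W")
        defaults = (1, 0, 0)  # a zero-width type field defaults to type 1
        entries = []
        for pos in range(0, len(data), entry_len):
            fields = []
            offset = pos
            for width, default in zip(w, defaults):
                if width == 0:
                    fields.append(default)
                else:
                    # Fields are big-endian unsigned integers.
                    chunk = data[offset:offset + width]
                    fields.append(int.from_bytes(chunk, "big"))
                offset += width
            entries.append(tuple(fields))
        return entries


    # Hand-built data for /W [1 2 1].
    data = bytes([
        0, 0x00, 0x00, 0,    # type 0: next free object 0, generation 0
        1, 0x01, 0x2C, 0,    # type 1: offset 300, generation 0
        2, 0x00, 0x05, 3,    # type 2: in object stream 5, at index 3
    ])
    entries = decode_xref_entries(data, [1, 2, 1])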
  140 +
  141 +.. _ref.object-streams-linearization:
  142 +
  143 +Implications for Linearized Files
  144 +---------------------------------
  145 +
  146 +For linearized files, the linearization dictionary, document catalog,
  147 +and page objects may not be contained in object streams.
  148 +
  149 +Objects stored within object streams are given the highest range of
  150 +object numbers within the main and first-page cross-reference sections.
  151 +
  152 +It is okay to use cross-reference streams in place of regular xref
  153 +tables. There are no special considerations.
  154 +
  155 +Hint data refers to object streams themselves, not the objects in the
  156 +streams. Shared object references should also be made to the object
  157 +streams. There are no references in any hint tables to the object numbers
  158 +of compressed objects (objects within object streams).
  159 +
  160 +When numbering objects, all shared objects within both the first and
  161 +second halves of the linearized files must be numbered consecutively
  162 +after all normal uncompressed objects in that half.
  163 +
  164 +.. _ref.object-stream-implementation:
  165 +
  166 +Implementation Notes
  167 +--------------------
  168 +
  169 +There are three modes for writing object streams:
  170 +:samp:`disable`, :samp:`preserve`, and
  171 +:samp:`generate`. In disable mode, we do not generate
  172 +any object streams, and we also generate an xref table rather than xref
  173 +streams. This can be used to generate PDF files that are viewable with
  174 +older readers. In preserve mode, we write object streams such that
  175 +written object streams contain the same objects and ``/Extends``
  176 +relationships as in the original file. This is equal to disable if the
  177 +file has no object streams. In generate, we create object streams
  178 +ourselves by grouping objects that are allowed in object streams
  179 +together in sets of no more than 100 objects. We also ensure that the
  180 +PDF version is at least 1.5 in generate mode, but we preserve the
  181 +version header in the other modes. The default is
  182 +:samp:`preserve`.
  183 +
  184 +We do not support creation of hybrid files. When we write files, even in
  185 +preserve mode, we will lose any xref tables and merge any appended
  186 +sections.
... ...
manual/overview.rst 0 → 100644
  1 +.. _ref.overview:
  2 +
  3 +What is QPDF?
  4 +=============
  5 +
  6 +QPDF is a program and C++ library for structural, content-preserving
  7 +transformations on PDF files. QPDF's website is located at
  8 +https://qpdf.sourceforge.io/. QPDF's source code is hosted on GitHub
  9 +at https://github.com/qpdf/qpdf.
  10 +
  11 +QPDF provides many useful capabilities to developers of PDF-producing
  12 +software or for people who just want to look at the innards of a PDF
  13 +file to learn more about how they work. With QPDF, it is possible to
  14 +copy objects from one PDF file into another and to manipulate the list
  15 +of pages in a PDF file. This makes it possible to merge and split PDF
  16 +files. The QPDF library also makes it possible for you to create PDF
  17 +files from scratch. In this mode, you are responsible for supplying
  18 +all the contents of the file, while the QPDF library takes care of all
  19 +the syntactical representation of the objects, creation of
  20 +cross-reference tables and, if you use them, object streams, encryption,
  21 +linearization, and other syntactic details. You are still responsible
  22 +for generating PDF content on your own.
  23 +
  24 +QPDF has been designed with very few external dependencies, and it is
  25 +intentionally very lightweight. QPDF is *not* a PDF content creation
  26 +library, a PDF viewer, or a program capable of converting PDF into other
  27 +formats. In particular, QPDF knows nothing about the semantics of PDF
  28 +content streams. If you are looking for something that can do that, you
  29 +should look elsewhere. However, once you have a valid PDF file, QPDF can
  30 +be used to transform that file in ways that perhaps your original PDF
  31 +creation tool can't handle. For example, many programs generate simple PDF
  32 +files but can't password-protect them, web-optimize them, or perform
  33 +other transformations of that type.
... ...
manual/qdf.rst 0 → 100644
  1 +.. _ref.qdf:
  2 +
  3 +QDF Mode
  4 +========
  5 +
  6 +In QDF mode, qpdf creates PDF files in what we call *QDF
  7 +form*. A PDF file in QDF form, sometimes called a QDF
  8 +file, is a completely valid PDF file that has ``%QDF-1.0`` as its third
  9 +line (after the pdf header and binary characters) and has certain other
  10 +characteristics. The purpose of QDF form is to make it possible to edit
  11 +PDF files, with some restrictions, in an ordinary text editor. This can
  12 +be very useful for experimenting with different PDF constructs or for
  13 +making one-off edits to PDF files (though there are other reasons why
  14 +this may not always work). Note that QDF mode does not support
  15 +linearized files. If you enable linearization, QDF mode is automatically
  16 +disabled.
  17 +
  18 +It is ordinarily very difficult to edit PDF files in a text editor for
  19 +two reasons: most meaningful data in PDF files is compressed, and PDF
  20 +files are full of offset and length information that makes it hard to
  21 +add or remove data. A QDF file is organized in a manner such that, if
  22 +edits are kept within certain constraints, the
  23 +:command:`fix-qdf` program, distributed with qpdf, is
  24 +able to restore edited files to a correct state. The
  25 +:command:`fix-qdf` program takes no command-line
  26 +arguments. It reads a possibly edited QDF file from standard input and
  27 +writes a repaired file to standard output.
  28 +
  29 +The following attributes characterize a QDF file:
  30 +
  31 +- All objects appear in numerical order in the PDF file, including when
  32 + objects appear in object streams.
  33 +
  34 +- Objects are printed in an easy-to-read format, and all line endings
  35 + are normalized to UNIX line endings.
  36 +
  37 +- Unless specifically overridden, streams appear uncompressed (when
  38 + qpdf supports the filters and they are compressed with a non-lossy
  39 + compression scheme), and most content streams are normalized (line
  40 + endings are converted to just UNIX-style linefeeds).
  41 +
  42 +- All stream lengths are represented as indirect objects, and the
  43 + stream length object is always the next object after the stream. If
  44 + the stream data does not end with a newline, an extra newline is
  45 + inserted, and a special comment appears after the stream indicating
  46 + that this has been done.
  47 +
  48 +- If the PDF file contains object streams, if object stream *n*
  49 + contains *k* objects, those objects are numbered from *n+1* through
  50 + *n+k*, and the object number/offset pairs appear on a separate line
  51 + for each object. Additionally, each object in the object stream is
  52 + preceded by a comment indicating its object number and index. This
  53 + makes it very easy to find objects in object streams.
  54 +
  55 +- All beginnings of objects, ``stream`` tokens, ``endstream`` tokens,
  56 + and ``endobj`` tokens appear on lines by themselves. A blank line
  57 + follows every ``endobj`` token.
  58 +
  59 +- If there is a cross-reference stream, it is unfiltered.
  60 +
  61 +- Page dictionaries and page content streams are marked with special
  62 + comments that make them easy to find.
  63 +
  64 +- Comments precede each object indicating the object number of the
  65 + corresponding object in the original file.
  66 +
  67 +When editing a QDF file, any edits can be made as long as the above
  68 +constraints are maintained. This means that you can freely edit a page's
  69 +content without worrying about messing up the QDF file. It is also
  70 +possible to add new objects so long as those objects are added after the
  71 +last object in the file or subsequent objects are renumbered. If a QDF
  72 +file has object streams in it, you can always add the new objects before
  73 +the xref stream and then change the number of the xref stream, since
  74 +nothing generally ever references it by number.
  75 +
  76 +It is not generally practical to remove objects from QDF files without
  77 +messing up object numbering, but if you remove all references to an
  78 +object, you can run qpdf on the file (after running
  79 +:command:`fix-qdf`), and qpdf will omit the now-orphaned
  80 +object.
  81 +
  82 +When :command:`fix-qdf` is run, it goes through the file
  83 +and recomputes the following parts of the file:
  84 +
  85 +- the ``/N``, ``/W``, and ``/First`` keys of all object stream
  86 + dictionaries
  87 +
  88 +- the pairs of numbers representing object numbers and offsets of
  89 + objects in object streams
  90 +
  91 +- all stream lengths
  92 +
  93 +- the cross-reference table or cross-reference stream
  94 +
  95 +- the offset to the cross-reference table or cross-reference stream
  96 + following the ``startxref`` token
... ...
manual/release-notes.rst 0 → 100644
  1 +.. _ref.release-notes:
  2 +
  3 +Release Notes
  4 +=============
  5 +
  6 +For a detailed list of changes, please see the file
  7 +:file:`ChangeLog` in the source distribution.
  8 +
  9 +10.5.0: XXX Month dd, YYYY
  10 + - Library Enhancements
  11 +
  12 + - Since qpdf version 8, using object accessor methods on an
  13 + instance of ``QPDFObjectHandle`` may create warnings if the
  14 + object is not of the expected type. These warnings now have an
  15 + error code of ``qpdf_e_object`` instead of
  16 + ``qpdf_e_damaged_pdf``. Also, comments have been added to
  17 + :file:`QPDFObjectHandle.hh` to explain in more detail what the
  18 + behavior is. See :ref:`ref.object-accessors` for a more in-depth
  19 + discussion.
  20 +
  21 + - Add ``Pl_Buffer::getMallocBuffer()`` to initialize a buffer
  22 + allocated with ``malloc()`` for better cross-language
  23 + interoperability.
  24 +
  25 + - C API Enhancements
  26 +
  27 + - Overhaul error handling for the object handle functions in the C API.
  28 + Some rare error conditions that would previously have caused a
  29 + crash are now trapped and reported, and the functions that
  30 + generate them return fallback values. See comments in the
  31 + ``ERROR HANDLING`` section of :file:`include/qpdf/qpdf-c.h` for
  32 + details. In particular, exceptions thrown by the underlying C++
  33 + code when calling object accessors are caught and converted into
  34 + errors. The errors can be checked by calling ``qpdf_has_error``.
  35 + Use ``qpdf_silence_errors`` to prevent the error from being
  36 + written to stderr.
  37 +
  38 + - Add ``qpdf_get_last_string_length`` to the C API to get the
  39 + length of the last string that was returned. This is needed to
  40 + handle strings that contain embedded null characters.
  41 +
  42 + - Add ``qpdf_oh_is_initialized`` and
  43 + ``qpdf_oh_new_uninitialized`` to the C API to make it possible
  44 + to work with uninitialized objects.
  45 +
  46 + - Add ``qpdf_oh_new_object`` to the C API. This allows you to
  47 + clone an object handle.
  48 +
  49 + - Add ``qpdf_get_object_by_id``, ``qpdf_make_indirect_object``,
  50 + and ``qpdf_replace_object``, exposing the corresponding methods
  51 + in ``QPDF`` and ``QPDFObjectHandle``.
  52 +
  53 + - Add several functions for working with pages. See ``PAGE
  54 + FUNCTIONS`` in ``include/qpdf/qpdf-c.h`` for details.
  55 +
  56 + - Add several functions for working with streams. See ``STREAM
  57 + FUNCTIONS`` in ``include/qpdf/qpdf-c.h`` for details.
  58 +
  59 + - Add ``qpdf_oh_get_type_code`` and ``qpdf_oh_get_type_name``.
  60 +
  61 + - Documentation change
  62 +
  63 + - The documentation sources have been switched from docbook to
  64 + reStructuredText processed with `Sphinx
  65 + <https://sphinx-doc.org>`__. This is mostly transparent (other
  66 + than format change) with the exception that all section links
  67 + have changed. What used to be `#ref.something` is now
  68 + `#something`. A top-to-bottom review of the documentation is
  69 + planned for an upcoming release.
  70 +
  71 +10.4.0: November 16, 2021
  72 + - Handling of Weak Cryptography Algorithms
  73 +
  74 + - From the qpdf CLI, the
  75 + :samp:`--allow-weak-crypto` option is now required to
  76 + suppress a warning when explicitly creating PDF files using RC4
  77 + encryption. While qpdf will always retain the ability to read
  78 + and write such files, doing so will require explicit
  79 + acknowledgment moving forward. For qpdf 10.4, this change only
  80 + affects the command-line tool. Starting in qpdf 11, there will
  81 + be small API changes to require explicit acknowledgment in
  82 + those cases as well. For additional information, see :ref:`ref.weak-crypto`.
  83 +
  84 + - Bug Fixes
  85 +
  86 + - Fix potential bounds error when handling shell completion that
  87 + could occur when given bogus input.
  88 +
  89 + - Properly handle overlay/underlay on completely empty pages
  90 + (with no resource dictionary).
  91 +
  92 + - Fix crash that could occur under certain conditions when using
  93 + :samp:`--pages` with files that had form
  94 + fields.
  95 +
  96 + - Library Enhancements
  97 +
  98 + - Make ``QPDF::findPage`` functions public.
  99 +
  100 + - Add methods to ``Pl_Flate`` to be able to receive warnings on
  101 + certain recoverable conditions.
  102 +
  103 + - Add an extra check to the library to detect when foreign
  104 + objects are inserted directly (instead of using
  105 + ``QPDF::copyForeignObject``) at the time of insertion rather
  106 + than when the file is written. Catching the error sooner makes
  107 + it much easier to locate the incorrect code.
  108 +
  109 + - CLI Enhancements
  110 +
  111 + - Improve diagnostics around parsing
  112 + :samp:`--pages` command-line options.
  113 +
  114 + - Packaging Changes
  115 +
  116 + - The Windows binary distribution is now built with crypto
  117 + provided by OpenSSL 3.0.
  118 +
  119 +10.3.2: May 8, 2021
  120 + - Bug Fixes
  121 +
  122 + - When generating a file while preserving object streams,
  123 + unreferenced objects are correctly removed unless
  124 + :samp:`--preserve-unreferenced` is specified.
  125 +
  126 + - Library Enhancements
  127 +
  128 + - When adding a page that already exists, make a shallow copy
  129 + instead of throwing an exception. This makes the library
  130 + behavior consistent with the CLI behavior. See
  131 + :file:`ChangeLog` for additional notes.
  132 +
  133 +10.3.1: March 11, 2021
  134 + - Bug Fixes
  135 +
  136 + - Form field copying failed on files where /DR was a direct
  137 + object in the document-level form dictionary.
  138 +
  139 +10.3.0: March 4, 2021
  140 + - Bug Fixes
  141 +
  142 + - The code for handling form fields when copying pages from
  143 + 10.2.0 was not quite right and didn't work in a number of
  144 + situations, such as when the same page was copied multiple
  145 + times or when there were conflicting resource or field names
  146 + across multiple copies. The 10.3.0 code has been much more
  147 + thoroughly tested with more complex cases and with a multitude
  148 + of readers and should be much closer to correct. The 10.2.0
  149 + code worked well enough for page splitting or for copying pages
  150 + with form fields into documents that didn't already have them
  151 + but was still not quite correct in handling of field-level
  152 + resources.
  153 +
  154 + - When ``QPDF::replaceObject`` or ``QPDF::swapObjects`` is
  155 + called, existing ``QPDFObjectHandle`` instances no longer point
  156 + to the old objects. The next time they are accessed, they
  157 + automatically notice the change to the underlying object and
  158 + update themselves. This resolves a very longstanding source of
  159 + confusion, albeit in a very rarely used method call.
  160 +
  161 + - Fix form field handling code to look for default appearances,
  162 + quadding, and default resources in the right places. The code
  163 + was not looking for things in the document-level interactive
  164 + form dictionary that it was supposed to be finding there. This
  165 + required adding a few new methods to
  166 + ``QPDFFormFieldObjectHelper``.
  167 +
  168 + - Library Enhancements
  169 +
  170 + - Reworked the code that handles copying annotations and form
  171 + fields during page operations. There were additional methods
  172 + added to the public API from 10.2.0 and one deprecation of a
  173 + method added in 10.2.0. The majority of the API changes are in
  174 + methods most people would never call and that will hopefully be
  175 + superseded by higher-level interfaces for handling page copies.
  176 + Please see the :file:`ChangeLog` file for
  177 + details.
  178 +
  179 + - The method ``QPDF::numWarnings`` was added so that you can tell
  180 + whether any warnings happened during a specific block of code.
  181 +
  182 +10.2.0: February 23, 2021
  183 + - CLI Behavior Changes
  184 +
  185 + - Operations that work on combining pages are much better about
  186 + protecting form fields. In particular,
  187 + :samp:`--split-pages` and
  188 + :samp:`--pages` now preserve interactive form
  189 + functionality by copying the relevant form field information
  190 + from the original files. Additionally, if you use
  191 + :samp:`--pages` to select only some pages from
  192 + the original input file, unused form fields are removed, which
  193 + prevents lots of unused annotations from being retained.
  194 +
  195 + - By default, :command:`qpdf` no longer allows
  196 + creation of encrypted PDF files whose user password is
  197 + non-empty and owner password is empty when a 256-bit key is in
  198 + use. The :samp:`--allow-insecure` option,
  199 + specified inside the :samp:`--encrypt` options,
  200 + allows creation of such files. Behavior changes in the CLI are
  201 + avoided when possible, but an exception was made here because
  202 + this is security-related. qpdf must always allow creation of
  203 + weird files for testing purposes, but it should not default to
  204 + letting users unknowingly create insecure files.
  205 +
  206 + - Library Behavior Changes
  207 +
  208 + - Note: the changes in this section cause differences in output
  209 + in some cases. These differences change the syntax of the PDF
  210 + but do not change the semantics (meaning). I make a strong
  211 + effort to avoid gratuitous changes in qpdf's output so that
  212 + qpdf changes don't break people's tests. In this case, the
  213 + changes significantly improve the readability of the generated
  214 + PDF and don't affect any output that's generated by simple
  215 + transformation. If you are annoyed by having to update test
  216 + files, please rest assured that changes like this have been and
  217 + will continue to be rare events.
  218 +
  219 + - ``QPDFObjectHandle::newUnicodeString`` now uses whichever of
  220 + ASCII, PDFDocEncoding, or UTF-16 is sufficient to encode all
  221 + the characters in the string. This reduces needless encoding in
  222 + UTF-16 of strings that can be encoded in ASCII. This change may
  223 + cause qpdf to generate different output than before when form
  224 + field values are set using ``QPDFFormFieldObjectHelper`` but
  225 + does not change the meaning of the output.
  226 +
  227 + - The code that places form XObjects and also the code that
  228 + flattens rotations trim trailing zeroes from real numbers that
  229 + they calculate. This causes slight (but semantically
  230 + equivalent) differences in generated appearance streams and
  231 + form XObject invocations in overlay/underlay code or in user
  232 + code that calls the methods that place form XObjects on a page.
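As a rough illustration of the trailing-zero trimming described above (not qpdf's actual code), the following Python sketch formats a real number with a fixed number of decimal places and then strips trailing zeroes. The helper name ``trim_real`` and the default precision of 5 are assumptions made for this example only.

```python
def trim_real(value: float, decimal_places: int = 5) -> str:
    """Format a real number, then drop trailing zeroes and a dangling point."""
    text = f"{value:.{decimal_places}f}"
    if "." in text:
        # Strip zeroes after the decimal point, then the point itself if bare.
        text = text.rstrip("0").rstrip(".")
    # Avoid emitting "-0" for tiny negative values that round to zero.
    return "0" if text in ("-0", "") else text
```

The trimmed and untrimmed forms denote the same number, which is why output that differs only in this way is semantically equivalent.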
  233 +
  234 + - CLI Enhancements
  235 +
  236 + - Add new command line options for listing, saving, adding,
  237 + removing, and copying file attachments. See :ref:`ref.attachments` for details.
  238 +
  239 + - Page splitting and merging operations, as well as
  240 + :samp:`--flatten-rotation`, are better behaved
  241 + with respect to annotations and interactive form fields. In
  242 + most cases, interactive form field functionality and proper
  243 + formatting and functionality of annotations is preserved by
  244 + these operations. There are still some cases that aren't
  245 + perfect, such as when functionality of annotations depends on
  246 + document-level data that qpdf doesn't yet understand or when
  247 + there are problems with referential integrity among form fields
  248 + and annotations (e.g., when a single form field object or its
  249 + associated annotations are shared across multiple pages, a case
  250 + that is out of spec but that works in most viewers anyway).
  251 +
  252 + - The option
  253 + :samp:`--password-file={filename}`
  254 + can now be used to read the decryption password from a file.
  255 + You can use ``-`` as the file name to read the password from
  256 + standard input. This is an easier/more obvious way to read
  257 + passwords from files or standard input than using
  258 + :samp:`@file` for this purpose.
  259 +
  260 + - Add some information about attachments to the JSON output, and
  261 + add ``attachments`` as an additional JSON key. The
  262 + information included here is limited to the preferred name and
  263 + content stream and a reference to the file spec object. This is
  264 + enough detail for clients to avoid the hassle of navigating a
  265 + name tree and provides what is needed for basic enumeration and
  266 + extraction of attachments. More detailed information can be
  267 + obtained by following the reference to the file spec object.
  268 +
  269 + - Add numeric option to :samp:`--collate`. If
  270 + :samp:`--collate={n}`
  271 + is given, take pages in groups of
  272 + :samp:`{n}` from the given files.
  273 +
  274 + - It is now valid to provide :samp:`--rotate=0`
  275 + to clear rotation from a page.
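The grouped collation described for :samp:`--collate={n}` above can be pictured with a small Python sketch. This is an illustration of the idea, not qpdf's implementation: the function name ``collate`` and the representation of pages as strings are hypothetical, and the handling of files with unequal page counts (continuing with whichever files still have pages) is an assumption.

```python
from typing import List, Sequence


def collate(files: Sequence[Sequence[str]], n: int = 1) -> List[str]:
    """Interleave pages from several files, taking up to n pages at a time."""
    if n < 1:
        raise ValueError("group size must be at least 1")
    result: List[str] = []
    positions = [0] * len(files)
    # Keep cycling through the files until every file is exhausted.
    while any(pos < len(pages) for pos, pages in zip(positions, files)):
        for i, pages in enumerate(files):
            start = positions[i]
            result.extend(pages[start:start + n])
            positions[i] = min(start + n, len(pages))
    return result
```

With ``n`` equal to 1, this reduces to taking one page from each file in turn.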
  276 +
  277 + - Library Enhancements
  278 +
  279 + - This release includes numerous additions to the API. Not all
  280 + changes are listed here. Please see the
  281 + :file:`ChangeLog` file in the source
  282 + distribution for a comprehensive list. Highlights appear below.
  283 +
  284 + - Add ``QPDFObjectHandle::ditems()`` and
  285 + ``QPDFObjectHandle::aitems()`` that enable C++-style iteration,
  286 + including range-for iteration, over dictionary and array
  287 + QPDFObjectHandles. See comments in
  288 + :file:`include/qpdf/QPDFObjectHandle.hh`
  289 + and
  290 + :file:`examples/pdf-name-number-tree.cc`
  291 + for details.
  292 +
  293 + - Add ``QPDFObjectHandle::copyStream`` for making a copy of a
  294 + stream within the same ``QPDF`` instance.
  295 +
  296 + - Add new helper classes for supporting file attachments, also
  297 + known as embedded files. New classes are
  298 + ``QPDFEmbeddedFileDocumentHelper``,
  299 + ``QPDFFileSpecObjectHelper``, and ``QPDFEFStreamObjectHelper``.
  300 + See their respective headers for details and
  301 + :file:`examples/pdf-attach-file.cc` for an
  302 + example.
  303 +
  304 + - Add a version of ``QPDFObjectHandle::parse`` that takes a
  305 + ``QPDF`` pointer as context so that it can parse strings
  306 + containing indirect object references. This is illustrated in
  307 + :file:`examples/pdf-attach-file.cc`.
  308 +
  309 + - Re-implement ``QPDFNameTreeObjectHelper`` and
  310 + ``QPDFNumberTreeObjectHelper`` to be more efficient, add an
  311 + iterator-based API, give them the capability to repair broken
  312 + trees, and create methods for modifying the trees. With this
  313 + change, qpdf has a robust read/write implementation of name and
  314 + number trees.
  315 +
  316 + - Add new versions of ``QPDFObjectHandle::replaceStreamData``
  317 + that take ``std::function`` objects for cases when you need
  318 + something between a static string and a full-fledged
  319 + StreamDataProvider. Using this with ``QUtil::file_provider`` is
  320 + a very easy way to create a stream from the contents of a file.
  321 +
  322 + - The ``QPDFMatrix`` class, formerly a private, internal class,
  323 + has been added to the public API. See
  324 + :file:`include/qpdf/QPDFMatrix.hh` for
  325 + details. This class is for working with transformation
  326 + matrices. Some methods in ``QPDFPageObjectHelper`` make use of
  327 + this to make information about transformation matrices
  328 + available. For an example, see
  329 + :file:`examples/pdf-overlay-page.cc`.
  330 +
  331 + - Several new methods were added to
  332 + ``QPDFAcroFormDocumentHelper`` for adding, removing, getting
  333 + information about, and enumerating form fields.
  334 +
  335 + - Add method
  336 + ``QPDFAcroFormDocumentHelper::transformAnnotations``, which
  337 + applies a transformation to each annotation on a page.
  338 +
  339 + - Add ``QPDFPageObjectHelper::copyAnnotations``, which copies
  340 + annotations and, if applicable, associated form fields, from
  341 + one page to another, possibly transforming the rectangles.
  342 +
  343 + - Build Changes
  344 +
  345 + - A C++-14 compiler is now required to build qpdf. There is no
  346 + intention to require anything newer than that for a while.
  347 + C++-14 includes modest enhancements to C++-11 and appears to be
  348 + supported about as widely as C++-11.
  349 +
  350 + - Bug Fixes
  351 +
  352 + - The :samp:`--flatten-rotation` option applies
  353 + transformations to any annotations that may be on the page.
  354 +
  355 + - If a form XObject lacks a resources dictionary, consider any
  356 + names in that form XObject to be referenced from the containing
  357 + page. This is compliant with older PDF versions. Also detect if
  358 + any form XObjects have any unresolved names and, if so, don't
  359 + remove unreferenced resources from them or from the page that
  360 + contains them. Unfortunately this has the side effect of
  361 + preventing removal of unreferenced resources in some cases
  362 + where names appear that don't refer to resources, such as with
  363 + tagged PDF. This is a bit of a corner case that is not likely
  364 + to cause a significant problem in practice, and the only side
  365 + effect would be lack of removal of shared resources. A future
  366 + version of qpdf may be more sophisticated in its detection of
  367 + names that refer to resources.
  368 +
  369 + - Properly handle strings if they appear in inline image
  370 + dictionaries while externalizing inline images.
  371 +
  372 +10.1.0: January 5, 2021
  373 + - CLI Enhancements
  374 +
  375 + - Add :samp:`--flatten-rotation` command-line
  376 + option, which causes all pages that are rotated using
  377 + parameters in the page's dictionary to instead be identically
  378 + rotated in the page's contents. The change is not user-visible
  379 + for compliant PDF readers but can be used to work around broken
  380 + PDF applications that don't properly handle page rotation.
  381 +
  382 + - Library Enhancements
  383 +
  384 + - Support for user-provided (pluggable, modular) stream filters.
  385 + It is now possible to derive a class from ``QPDFStreamFilter``
  386 + and register it with ``QPDF`` so that regular library methods,
  387 + including those used by ``QPDFWriter``, can decode streams with
  388 + filters not directly supported by the library. The example
  389 + :file:`examples/pdf-custom-filter.cc`
  390 + illustrates how to use this capability.
  391 +
  392 + - Add methods to ``QPDFPageObjectHelper`` to iterate through
  393 + XObjects on a page or form XObjects, possibly recursing into
  394 + nested form XObjects: ``forEachXObject``, ``forEachImage``,
  395 + ``forEachFormXObject``.
  396 +
  397 + - Enhance several methods in ``QPDFPageObjectHelper`` to work
  398 + with form XObjects as well as pages, as noted in comments. See
  399 + :file:`ChangeLog` for a full list.
  400 +
  401 + - Rename some functions in ``QPDFPageObjectHelper``, while
  402 + keeping old names for compatibility:
  403 +
  404 + - ``getPageImages`` to ``getImages``
  405 +
  406 + - ``filterPageContents`` to ``filterContents``
  407 +
  408 + - ``pipePageContents`` to ``pipeContents``
  409 +
  410 + - ``parsePageContents`` to ``parseContents``
  411 +
  412 + - Add method ``QPDFPageObjectHelper::getFormXObjects`` to return
  413 + a map of form XObjects directly on a page or form XObject
  414 +
  415 + - Add new helper methods to ``QPDFObjectHandle``:
  416 + ``isFormXObject``, ``isImage``
  417 +
  418 + - Add the optional ``allow_streams`` parameter to
  419 + ``QPDFObjectHandle::makeDirect``. When
  420 + ``QPDFObjectHandle::makeDirect`` is called in this way, it
  421 + preserves references to streams rather than throwing an
  422 + exception.
  423 +
  424 + - Add ``QPDFObjectHandle::setFilterOnWrite`` method. Calling this
  425 + on a stream prevents ``QPDFWriter`` from attempting to
  426 + uncompress, recompress, or otherwise filter a stream even if it
  427 + could. Developers can use this to protect streams that are
  428 + already optimized or that should be protected from ``QPDFWriter``'s
  429 + default behavior for any other reason.
  430 +
  431 + - Add ``ostream`` ``<<`` operator for ``QPDFObjGen``. This is
  432 + useful to have for debugging.
  433 +
  434 + - Add method ``QPDFPageObjectHelper::flattenRotation``, which
  435 + replaces a page's ``/Rotate`` keyword by rotating the page
  436 + within the content stream and altering the page's bounding
  437 + boxes so the rendering is the same. This can be used to work
  438 + around buggy PDF readers that can't properly handle page
  439 + rotation.
  440 +
  441 + - C API Enhancements
  442 +
  443 + - Add several new functions to the C API for working with
  444 + objects. These are wrappers around many of the methods in
  445 + ``QPDFObjectHandle``. Their inclusion adds considerable new
  446 + capability to the C API.
  447 +
  448 + - Add ``qpdf_register_progress_reporter`` to the C API,
  449 + corresponding to ``QPDFWriter::registerProgressReporter``.
  450 +
  451 + - Performance Enhancements
  452 +
  453 + - Improve steps ``QPDFWriter`` takes to prepare a ``QPDF`` object
  454 + for writing, resulting in about an 8% improvement in write
  455 + performance while allowing indirect objects to appear in
  456 + ``/DecodeParms``.
  457 +
  458 + - When extracting pages, the :command:`qpdf` CLI
  459 + only removes unreferenced resources from the pages that are
  460 + being kept, resulting in a significant performance improvement
  461 + when extracting small numbers of pages from large, complex
  462 + documents.
  463 +
  464 + - Bug Fixes
  465 +
  466 + - ``QPDFPageObjectHelper::externalizeInlineImages`` was not
  467 + externalizing images referenced from form XObjects that
  468 + appeared on the page.
  469 +
  470 + - ``QPDFObjectHandle::filterPageContents`` was broken for pages
  471 + with multiple content streams.
  472 +
  473 + - Tweak zsh completion code to behave a little better with
  474 + respect to path completion.
  475 +
  476 +10.0.4: November 21, 2020
  477 + - Bug Fixes
  478 +
  479 + - Fix a handful of integer overflows. This includes cases found
  480 + by fuzzing as well as having qpdf not do range checking on
  481 + unused values in the xref stream.
  482 +
  483 +10.0.3: October 31, 2020
  484 + - Bug Fixes
  485 +
  486 + - The fix to the bug involving copying streams with indirect
  487 + filters was incorrect and introduced a new, more serious bug.
  488 + The original bug has been fixed correctly, as has the bug
  489 + introduced in 10.0.2.
  490 +
  491 +10.0.2: October 27, 2020
  492 + - Bug Fixes
  493 +
  494 + - When concatenating content streams, as with
  495 + :samp:`--coalesce-contents`, there were cases
  496 + in which qpdf would merge two lexical tokens together, creating
  497 + invalid results. A newline is now inserted between merged
  498 + content streams if one is not already present.
  499 +
  500 + - Fix an internal error that could occur when copying foreign
  501 + streams whose stream data had been replaced using a stream data
  502 + provider if those streams had indirect filters or decode
  503 + parameters. This is a rare corner case.
  504 +
  505 + - Ensure that the caller's locale settings do not change the
  506 + results of numeric conversions performed internally by the qpdf
  507 + library. Note that the problem here could only be caused when
  508 + the qpdf library was used programmatically. Using the qpdf CLI
  509 + already ignored the user's locale for numeric conversion.
  510 +
  511 + - Fix several instances in which warnings were not suppressed in
  512 + spite of :samp:`--no-warn` and/or errors or
  513 + warnings were written to standard output rather than standard
  514 + error.
  515 +
  516 + - Fixed a memory leak that could occur under specific
  517 + circumstances when
  518 + :samp:`--object-streams=generate` was used.
  519 +
  520 + - Fix various integer overflows and similar conditions found by
  521 + the OSS-Fuzz project.
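The token-merging problem fixed for :samp:`--coalesce-contents` in the list above can be illustrated with a short Python sketch. It is a simplified picture under assumed names (``coalesce_streams`` is hypothetical) and does not reproduce qpdf's code: it joins content streams and inserts a newline between them only when the preceding data does not already end with one.

```python
from typing import Iterable


def coalesce_streams(streams: Iterable[bytes]) -> bytes:
    """Concatenate content streams, separating them with a newline if needed."""
    merged = bytearray()
    for data in streams:
        # Without a separator, "... 0 0 m" + "q ..." would fuse into "mq".
        if merged and not merged.endswith(b"\n"):
            merged += b"\n"
        merged += data
    return bytes(merged)
```

Because whitespace separates tokens in a content stream, the extra newline keeps the last token of one stream distinct from the first token of the next.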
  522 +
  523 + - Enhancements
  524 +
  525 + - New option :samp:`--warning-exit-0` causes qpdf
  526 + to exit with a status of ``0`` rather than ``3`` if there are
  527 + warnings but no errors. Combine with
  528 + :samp:`--no-warn` to completely ignore
  529 + warnings.
  530 +
  531 + - Performance improvements have been made to
  532 + ``QPDF::processMemoryFile``.
  533 +
  534 + - The OpenSSL crypto provider produces more detailed error
  535 + messages.
  536 +
  537 + - Build Changes
  538 +
  539 + - The option :samp:`--disable-rpath` is now
  540 + supported by qpdf's :command:`./configure`
  541 + script. Some distributions' packaging standards recommended the
  542 + use of this option.
  543 +
  544 + - Selection of a printf format string for ``long long`` has
  545 + been moved from ``ifdefs`` to an autoconf
  546 + test. If you are using your own build system, you will need to
  547 + provide a value for ``LL_FMT`` in
  548 + :file:`libqpdf/qpdf/qpdf-config.h`, which
  549 + would typically be ``"%lld"`` or, for some Windows compilers,
  550 + ``"%I64d"``.
  551 +
  552 + - Several improvements were made to build-time configuration of
  553 + the OpenSSL crypto provider.
  554 +
  555 + - A nearly stand-alone Linux binary zip file is now included with
  556 + the qpdf release. This is built on an older (but supported)
  557 + Ubuntu LTS release, but would work on most reasonably recent
  558 + Linux distributions. It contains only the executables and
  559 + required shared libraries that would not be present on a
  560 + minimal system. It can be used for including qpdf in a minimal
  561 + environment, such as a docker container. The zip file is also
  562 + known to work as a layer in AWS Lambda.
  563 +
  564 + - QPDF's automated build has been migrated from Azure Pipelines
  565 + to GitHub Actions.
  566 +
  567 + - Windows-specific Changes
  568 +
  569 + - The Windows executables distributed with qpdf releases now use
  570 + the OpenSSL crypto provider by default. The native crypto
  571 + provider is also compiled in and can be selected at runtime
  572 + with the ``QPDF_CRYPTO_PROVIDER`` environment variable.
  573 +
  574 + - Improvements have been made to how a cryptographic provider is
  575 + obtained in the native Windows crypto implementation. However,
  576 + this is mostly shadowed by OpenSSL being used by default.
  577 +
  578 +10.0.1: April 9, 2020
  579 + - Bug Fixes
  580 +
  581 + - 10.0.0 introduced a bug in which calling
  582 + ``QPDFObjectHandle::getStreamData`` on a stream that can't be
  583 + filtered was returning the raw data instead of throwing an
  584 + exception. This is now fixed.
  585 +
  586 + - Fix a bug that was preventing qpdf from linking with some
  587 + versions of clang on some platforms.
  588 +
  589 + - Enhancements
  590 +
  591 + - Improve the :file:`pdf-invert-images`
  592 + example to avoid having to load all the images into RAM at the
  593 + same time.
  594 +
  595 +10.0.0: April 6, 2020
  596 + - Performance Enhancements
  597 +
  598 + - The qpdf library and executable should run much faster in this
  599 + version than in the last several releases. Several internal
  600 + library optimizations have been made, and there has been
  601 + improved behavior on page splitting as well. This version of
  602 + qpdf should outperform any of the 8.x or 9.x versions.
  603 +
  604 + - Incompatible API (source-level) Changes (minor)
  605 +
  606 + - The ``QUtil::srandom`` method was removed. It didn't do
  607 + anything unless insecure random numbers were compiled in, and
  608 + they have been off by default for a long time. If you were
  609 + calling it, just remove the call since it wasn't doing anything
  610 + anyway.
  611 +
  612 + - Build/Packaging Changes
  613 +
  614 + - Add an ``openssl`` crypto provider, which is implemented with
  615 + OpenSSL and also works with BoringSSL. Thanks to Dean Scarff
  616 + for this contribution. If you maintain qpdf for a distribution,
  617 + pay special attention to make sure that you are including
  618 + support for the crypto providers you want. Package maintainers
  619 + will have to weigh the advantages of allowing users to pick a
  620 + crypto provider at runtime against the disadvantages of adding
  621 + more dependencies to qpdf.
  622 +
  623 + - Allow qpdf to be built on stripped down systems whose C/C++
  624 + libraries lack the ``wchar_t`` type. Search for ``wchar_t`` in
  625 + qpdf's README.md for details. This should be very rare, but it
  626 + is known to be helpful in some embedded environments.
  627 +
  628 + - CLI Enhancements
  629 +
  630 + - Add ``objectinfo`` key to the JSON output. This will be a place
  631 + to put computed metadata or other information about PDF objects
  632 + that are not immediately evident in other ways or that seem
  633 + useful for some other reason. In this version, information is
  634 + provided about each object indicating whether it is a stream
  635 + and, if so, what its length and filters are. Without this, it
  636 + was not possible to tell conclusively from the JSON output
  637 + alone whether or not an object was a stream. Run
  638 + :command:`qpdf --json-help` for details.
  639 +
  640 + - Add new option
  641 + :samp:`--remove-unreferenced-resources` which
  642 + takes ``auto``, ``yes``, or ``no`` as arguments. The new
  643 + ``auto`` mode, which is the default, performs a fast heuristic
  644 + over a PDF file when splitting pages to determine whether the
  645 + expensive process of finding and removing unreferenced
  646 + resources is likely to be of benefit. For most files, this new
  647 + default will result in a significant performance improvement
  648 + for splitting pages. See :ref:`ref.advanced-transformation` for a more detailed
  649 + discussion.
  650 +
  651 + - The :samp:`--preserve-unreferenced-resources` option
  652 + is now just a synonym for
  653 + :samp:`--remove-unreferenced-resources=no`.
  654 +
  655 + - If the ``QPDF_EXECUTABLE`` environment variable is set when
  656 + invoking :command:`qpdf --bash-completion` or
  657 + :command:`qpdf --zsh-completion`, the completion
  658 + command that it outputs will refer to qpdf using the value of
  659 + that variable rather than what :command:`qpdf`
  660 + determines its executable path to be. This can be useful when
  661 + wrapping :command:`qpdf` with a script, working
  662 + with a version in the source tree, using an AppImage, or other
  663 + situations where there is some indirection.
  664 +
  665 + - Library Enhancements
  666 +
  667 + - Random number generation is now delegated to the crypto
  668 + provider. The old behavior is still used by the native crypto
  669 + provider. It is still possible to provide your own random
  670 + number generator.
  671 +
  672 + - Add a new version of
  673 + ``QPDFObjectHandle::StreamDataProvider::provideStreamData``
  674 + that accepts the ``suppress_warnings`` and ``will_retry``
  675 + options and allows a success code to be returned. This makes it
  676 + possible to implement a ``StreamDataProvider`` that calls
  677 + ``pipeStreamData`` on another stream and to pass the response
  678 + back to the caller, which enables better error handling on
  679 + those proxied streams.
  680 +
  681 + - Update ``QPDFObjectHandle::pipeStreamData`` to return an
  682 + overall success code that goes beyond whether or not filtered
  683 + data was written successfully. This allows better error
  684 + handling of cases that were not filtering errors. You have to
  685 + call this explicitly. Methods in previously existing APIs have
  686 + the same semantics as before.
  687 +
  688 + - The ``QPDFPageObjectHelper::placeFormXObject`` method now
  689 + allows separate control over whether it should be willing to
  690 + shrink or expand objects to fit them better into the
  691 + destination rectangle. The previous behavior was that shrinking
  692 + was allowed but expansion was not. The previous behavior is
  693 + still the default.
  694 +
  695 + - When calling the C API, any non-zero value passed to a boolean
  696 + parameter is treated as ``TRUE``. Previously only the value
  697 + ``1`` was accepted. This makes the C API behave more like most
  698 + C interfaces and is known to improve compatibility with some
  699 + Windows environments that dynamically load the DLL and call
  700 + functions from it.
  701 +
  702 + - Add ``QPDFObjectHandle::unsafeShallowCopy`` for copying only
  703 + top-level dictionary keys or array items. This is unsafe
  704 + because it creates a situation in which changing a lower-level
  705 + item in one object may also change it in another object, but
  706 + for cases in which you *know* you are only inserting or
  707 + replacing top-level items, it is much faster than
  708 + ``QPDFObjectHandle::shallowCopy``.
  709 +
  710 + - Add ``QPDFObjectHandle::filterAsContents``, which filters a
  711 + stream's data as a content stream. This is useful for parsing
  712 + the contents for form XObjects in the same way as parsing page
  713 + content streams.
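The hazard described for ``QPDFObjectHandle::unsafeShallowCopy`` in the list above is the usual one for any copy that duplicates only top-level items. The Python sketch below is an analogy using plain dictionaries and the standard ``copy`` module, not the qpdf API; the dictionary contents are made up for illustration.

```python
import copy

page = {"/Type": "/Page", "/Resources": {"/Font": {"/F1": "font-1"}}}

# A shallow copy duplicates only the top-level keys; nested values are shared.
clone = copy.copy(page)

# Safe: inserting or replacing a top-level item affects only the copy.
clone["/Rotate"] = 90

# Unsafe: modifying a lower-level item is visible through both objects.
clone["/Resources"]["/Font"]["/F2"] = "font-2"
```

After this runs, ``page`` has no ``/Rotate`` key but does contain ``/F2``, which is the kind of surprise the note warns about when lower-level items are changed.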
  714 +
  715 + - Bug Fixes
  716 +
  717 + - When detecting and removing unreferenced resources during page
  718 + splitting, traverse into form XObjects and handle their
  719 + resources dictionaries as well.
  720 +
  721 + - The same error recovery is applied to streams in files other than
  722 + the primary input file when merging or splitting pages.
  723 +
  724 +9.1.1: January 26, 2020
  725 + - Build/Packaging Changes
  726 +
  727 + - The fix-qdf program was converted from perl to C++. As such,
  728 + qpdf no longer has a runtime dependency on perl.
  729 +
  730 + - Library Enhancements
  731 +
  732 + - Added new helper routine ``QUtil::call_main_from_wmain`` which
  733 + converts ``wchar_t`` arguments to UTF-8 encoded strings. This
  734 + is useful for qpdf because library methods expect file names to
  735 + be UTF-8 encoded, even on Windows.
  736 +
  737 + - Added new ``QUtil::read_lines_from_file`` methods that take
  738 + ``FILE*`` arguments and that allow preservation of end-of-line
  739 + characters. This also fixes a bug where
  740 + ``QUtil::read_lines_from_file`` wouldn't work properly with
  741 + Unicode filenames.
  742 +
  743 + - CLI Enhancements
  744 +
  745 + - Added options :samp:`--is-encrypted` and
  746 + :samp:`--requires-password` for testing whether
  747 + a file is encrypted or requires a password other than the
  748 + supplied (or empty) password. These communicate via exit
  749 + status, making them useful for shell scripts. They also work on
  750 + encrypted files with unknown passwords.
  751 +
  752 + - Added ``encrypt`` key to JSON options. With the exception of
  753 + the reconstructed user password for older encryption formats,
  754 + this provides the same information as
  755 + :samp:`--show-encryption` but in a consistent,
  756 + parseable format. See output of :command:`qpdf
  757 + --json-help` for details.
  758 +
  759 + - Bug Fixes
  760 +
  761 + - In QDF mode, be sure not to write more than one XRef stream to
  762 + a file, even when
  763 + :samp:`--preserve-unreferenced` is used.
  764 + :command:`fix-qdf` assumes that there is only
  765 + one XRef stream, and that it appears at the end of the file.
  766 +
  767 + - When externalizing inline images, properly handle images whose
  768 + color space is a reference to an object in the page's resource
  769 + dictionary.
  770 +
  771 + - Windows-specific fix for acquiring crypt context with a new
  772 + keyset.
  773 +
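As an illustration of the ``QUtil::call_main_from_wmain`` entry in 9.1.1, the following is a minimal sketch of converting UTF-16 code units to a UTF-8 encoded string. It is not qpdf's implementation. The function name ``utf16_to_utf8`` is hypothetical, and it takes ``char16_t`` input rather than ``wchar_t`` so that it behaves the same on every platform. Replacing unpaired surrogates with U+FFFD is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the qpdf API: convert UTF-16 code
// units to a UTF-8 encoded std::string.
std::string utf16_to_utf8(std::u16string const& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        unsigned long cp = in[i];
        bool high = (cp >= 0xD800 && cp <= 0xDBFF);
        bool next_low = (i + 1 < in.size()) &&
            (in[i + 1] >= 0xDC00) && (in[i + 1] <= 0xDFFF);
        if (high && next_low) {
            // Combine a surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            // Unpaired surrogate: substitute U+FFFD (assumption).
            cp = 0xFFFD;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

A Windows ``wmain`` could apply a conversion of this kind to each argument before handing the results to code that expects UTF-8 file names.
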
  774 +9.1.0: November 17, 2019
  775 + - Build Changes
  776 +
  777 + - A C++11 compiler is now required to build qpdf.
  778 +
  779 + - A new crypto provider that uses gnutls for crypto functions is
  780 + now available and can be enabled at build time. See :ref:`ref.crypto` for more information about crypto
  781 + providers and :ref:`ref.crypto.build` for specific information about
  782 + the build.
  783 +
  784 + - Library Enhancements
  785 +
  786 + - Incorporate contribution from Masamichi Hosoda to properly
  787 + handle signature dictionaries by not including them in object
  788 + streams, formatting the ``/Contents`` key as a hexadecimal
  789 + string, and excluding the ``/Contents`` key from encryption and
  790 + decryption.
  791 +
  792 + - Incorporate contribution from Masamichi Hosoda to provide new
  793 + API calls for getting file-level information about input and
  794 + output files, enabling certain operations on the files at the
  795 + file level rather than the object level. New methods include
  796 + ``QPDF::getXRefTable()``,
  797 + ``QPDFObjectHandle::getParsedOffset()``,
  798 + ``QPDFWriter::getRenumberedObjGen(QPDFObjGen)``, and
  799 + ``QPDFWriter::getWrittenXRefTable()``.
  800 +
  801 + - Support build-time and runtime selectable crypto providers.
  802 + This includes the addition of new classes
  803 + ``QPDFCryptoProvider`` and ``QPDFCryptoImpl`` and the
  804 + recognition of the ``QPDF_CRYPTO_PROVIDER`` environment
  805 + variable. Crypto providers are described in depth in :ref:`ref.crypto`.
  806 +
  807 + - CLI Enhancements
  808 +
  809 + - Addition of the :samp:`--show-crypto` option in
  810 + support of selectable crypto providers, as described in :ref:`ref.crypto`.
  811 +
  812 + - Allow ``:even`` or ``:odd`` to be appended to numeric ranges
  813 + for specification of the even or odd pages from among the pages
  814 + specified in the range.
  815 +
  816 + - Fix shell wildcard expansion behavior (``*`` and ``?``) of
  817 + :command:`qpdf.exe` as built by MSVC.
  818 +
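The ``:even`` and ``:odd`` range modifiers from 9.1.0 can be sketched as a filter over an already-expanded page range. This is only an illustration: ``select_by_position`` and ``Parity`` are hypothetical names, and the sketch assumes that "odd" and "even" refer to positions within the range (first, third, ... or second, fourth, ...) rather than to the page numbers themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Parity { odd, even };

// Hypothetical helper, not part of the qpdf API. Keeps the 1st, 3rd,
// 5th, ... (odd) or 2nd, 4th, 6th, ... (even) entries of a page range
// that has already been expanded into individual page numbers.
std::vector<int> select_by_position(std::vector<int> const& pages, Parity parity)
{
    std::vector<int> result;
    std::size_t start = (parity == Parity::odd) ? 0 : 1;
    for (std::size_t i = start; i < pages.size(); i += 2) {
        assert(pages[i] >= 1); // page numbers are 1-based
        result.push_back(pages[i]);
    }
    return result;
}
```

Under that assumption, applying the odd filter to pages 3 through 7 keeps pages 3, 5, and 7, and the even filter keeps pages 4 and 6.
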
  819 +9.0.2: October 12, 2019
  820 + - Bug Fix
  821 +
  822 + - Fix the name of the temporary file used by
  823 + :samp:`--replace-input` so that it doesn't
  824 + require path splitting and works with paths that include
  825 + directories.
  826 +
  827 +9.0.1: September 20, 2019
  828 + - Bug Fixes/Enhancements
  829 +
  830 + - Fix some build and test issues on big-endian systems and
  831 + compilers with characters that are unsigned by default. The
  832 + problems were in build and test only. There were no actual bugs
  833 + in the qpdf library itself relating to endianness or unsigned
  834 + characters.
  835 +
  836 + - When a dictionary has a duplicated key, report this with a
  837 + warning. The behavior of the library in this case is unchanged,
  838 + but the error condition is no longer silently ignored.
  839 +
  840 + - When a form field's display rectangle is erroneously specified
  841 + with inverted coordinates, detect and correct this situation.
  842 + This avoids some form fields from being flipped when flattening
  843 + annotations on files with this condition.
  844 +
  845 +9.0.0: August 31, 2019
  846 + - Incompatible API (source-level) Changes (minor)
  847 +
  848 + - The method ``QUtil::strcasecmp`` has been renamed to
  849 + ``QUtil::str_compare_nocase``. This incompatible change is
  850 + necessary to enable qpdf to build on platforms that define
  851 + ``strcasecmp`` as a macro.
  852 +
  853 + - The ``QPDF::copyForeignObject`` method had an overloaded
  854 + version that took a boolean parameter that was not used. If you
  855 + were using this version, just omit the extra parameter.
  856 +
  857 + - There was a version of ``QPDFTokenizer::expectInlineImage`` that
  858 + took no arguments. This version has been removed since it
  859 + caused the tokenizer to return incorrect inline images. A new
  860 + version was added some time ago that produces correct output.
  861 + This is a very low level method that doesn't make sense to call
  862 + outside of qpdf's lexical engine. There are higher level
  863 + methods for tokenizing content streams.
  864 +
  865 + - Change ``QPDFOutlineDocumentHelper::getTopLevelOutlines`` and
  866 + ``QPDFOutlineObjectHelper::getKids`` to return a
  867 + ``std::vector`` instead of a ``std::list`` of
  868 + ``QPDFOutlineObjectHelper`` objects.
  869 +
  870 + - Remove method ``QPDFTokenizer::allowPoundAnywhereInName``. This
  871 + function would allow creation of name tokens whose value would
  872 + change when unparsed, which is never the correct behavior.
  873 +
  874 + - CLI Enhancements
  875 +
  876 + - The :samp:`--replace-input` option may be given
  877 + in place of an output file name. This causes qpdf to overwrite
  878 + the input file with the output. See the description of
  879 + :samp:`--replace-input` in :ref:`ref.basic-options` for more details.
  880 +
  881 + - The :samp:`--recompress-flate` option instructs
  882 + :command:`qpdf` to recompress streams that are
  883 + already compressed with ``/FlateDecode``. Useful with
  884 + :samp:`--compression-level`.
  885 +
  886 + - The
  887 + :samp:`--compression-level={level}` option
  888 + sets the zlib compression level used for any streams compressed
  889 + by ``/FlateDecode``. Most effective when combined with
  890 + :samp:`--recompress-flate`.
  891 +
  892 + - Library Enhancements
  893 +
  894 + - A new namespace ``QIntC``, provided by
  895 + :file:`qpdf/QIntC.hh`, provides safe
  896 + conversion methods between different integer types. These
  897 + conversion methods do range checking to ensure that the cast
  898 + can be performed with no loss of information. Every use of
  899 + ``static_cast`` in the library was inspected to see if it could
  900 + use one of these safe converters instead. See :ref:`ref.casting` for additional details.
  901 +
  902 + - Method ``QPDF::anyWarnings`` tells whether there have been any
  903 + warnings without clearing the list of warnings.
  904 +
  905 + - Method ``QPDF::closeInputSource`` closes or otherwise releases
  906 + the input source. This enables the input file to be deleted or
  907 + renamed.
  908 +
  909 + - New methods have been added to ``QUtil`` for converting back
  910 + and forth between strings and unsigned integers:
  911 + ``uint_to_string``, ``uint_to_string_base``,
  912 + ``string_to_uint``, and ``string_to_ull``.
  913 +
  914 + - New methods have been added to ``QPDFObjectHandle`` that return
  915 + the value of ``Integer`` objects as ``int`` or ``unsigned int``
  916 + with range checking and sensible fallback values, and a new
  917 + method was added to return an unsigned value. This makes it
  918 + easier to write code that is safe from unintentional data loss.
  919 + Functions: ``getUIntValue``, ``getIntValueAsInt``,
  920 + ``getUIntValueAsUInt``.
  921 +
  922 + - When parsing content streams with
  923 + ``QPDFObjectHandle::ParserCallbacks``, in place of the method
  924 + ``handleObject(QPDFObjectHandle)``, the developer may override
  925 + ``handleObject(QPDFObjectHandle, size_t offset, size_t
  926 + length)``. If this method is defined, it will
  927 + be invoked with the object along with its offset and length
  928 + within the overall contents being parsed. Intervening spaces
  929 + and comments are not included in offset and length.
  930 + Additionally, a new method ``contentSize(size_t)`` may be
  931 + implemented. If present, it will be called prior to the first
  932 + call to ``handleObject`` with the total size in bytes of the
  933 + combined contents.
  934 +
  935 + - New methods ``QPDF::userPasswordMatched`` and
  936 + ``QPDF::ownerPasswordMatched`` have been added to enable a
  937 + caller to determine whether the supplied password was the user
  938 + password, the owner password, or both. This information is also
  939 + displayed by :command:`qpdf --show-encryption`
  940 + and :command:`qpdf --check`.
  941 +
  942 + - Static method ``Pl_Flate::setCompressionLevel`` can be called
  943 + to set the zlib compression level globally used by all
  944 + instances of ``Pl_Flate`` in deflate mode.
  945 +
  946 + - The method ``QPDFWriter::setRecompressFlate`` can be called to
  947 + tell ``QPDFWriter`` to uncompress and recompress streams
  948 + already compressed with ``/FlateDecode``.
  949 +
  950 + - The underlying implementation of QPDF arrays has been enhanced
  951 + to be much more memory efficient when dealing with arrays with
  952 + lots of nulls. This enables qpdf to use drastically less memory
  953 + for certain types of files.
  954 +
  955 + - When traversing the pages tree, if nodes are encountered with
  956 + invalid types, the types are fixed, and a warning is issued.
  957 +
  958 + - A new helper method ``QUtil::read_file_into_memory`` was added.
  959 +
  960 + - All conditions previously reported by
  961 + ``QPDF::checkLinearization()`` as errors are now presented as
  962 + warnings.
  963 +
  964 + - Name tokens containing the ``#`` character not preceded by two
  965 + hexadecimal digits, which is invalid in PDF 1.2 and above, are
  966 + properly handled by the library: a warning is generated, and
  967 + the name token is properly preserved, even if invalid, in the
  968 + output. See :file:`ChangeLog` for a more
  969 + complete description of this change.
  970 +
  971 + - Bug Fixes
  972 +
  973 + - A small handful of memory issues, assertion failures, and
  974 + unhandled exceptions that could occur on badly mangled input
  975 + files have been fixed. Most of these problems were found by
  976 + Google's OSS-Fuzz project.
  977 +
  978 + - When :command:`qpdf --check` or
  979 + :command:`qpdf --check-linearization` encounters
  980 + a file with linearization warnings but not errors, it now
  981 + properly exits with exit code 3 instead of 2.
  982 +
  983 + - The :samp:`--completion-bash` and
  984 + :samp:`--completion-zsh` options now work
  985 + properly when qpdf is invoked as an AppImage.
  986 +
  987 + - Calling ``QPDFWriter::set*EncryptionParameters`` on a
  988 + ``QPDFWriter`` object whose output filename has not yet been
  989 + set no longer produces a segmentation fault.
  990 +
  991 + - When reading encrypted files, follow the spec more closely
  992 + regarding encryption key length. This allows qpdf to open
  993 + encrypted files in most cases when they have invalid or missing
  994 + ``/Length`` keys in the encryption dictionary.
  995 +
  996 + - Build Changes
  997 +
  998 + - On platforms that support it, qpdf now builds with
  999 + :samp:`-fvisibility=hidden`. If you build qpdf
  1000 + with your own build system, this is now safe to use. This
  1001 + prevents methods that are not part of the public API from being
  1002 + exported by the shared library, and makes qpdf's ELF shared
  1003 + libraries (used on Linux, MacOS, and most other UNIX flavors)
  1004 + behave more like the Windows DLL. Since the DLL already behaves
  1005 + in much this way, it is unlikely that there are any methods
  1006 + that were accidentally not exported. However, with ELF shared
  1007 + libraries, typeinfo for some classes has to be explicitly
  1008 + exported. If there are problems in dynamically linked code
  1009 + catching exceptions or subclassing, this could be the reason.
  1010 + If you see this, please report a bug at
  1011 + https://github.com/qpdf/qpdf/issues/.
  1012 +
  1013 + - QPDF is now compiled with integer conversion and sign
  1014 + conversion warnings enabled. Numerous changes were made to the
  1015 + library to make this safe.
  1016 +
  1017 + - QPDF's :command:`make install` target explicitly
  1018 + specifies the mode to use when installing files instead of
  1019 + relying on the user's umask. It was previously doing this for some
  1020 + files but not others.
  1021 +
  1022 + - If :command:`pkg-config` is available, use it to
  1023 + locate :file:`libjpeg` and
  1024 + :file:`zlib` dependencies, falling back on
  1025 + old behavior if unsuccessful.
  1026 +
  1027 + - Other Notes
  1028 +
  1029 + - QPDF has been fully integrated into `Google's OSS-Fuzz
  1030 + project <https://github.com/google/oss-fuzz>`__. This project
  1031 + exercises code with randomly mutated inputs and is great for
  1032 + discovering hidden crashes and security issues.
  1033 + Several bugs found by oss-fuzz have already been fixed in qpdf.
  1034 +
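The range-checked conversions described for ``QIntC`` in 9.0.0 follow a general pattern that can be sketched as below. This is not the ``QIntC`` API: ``checked_cast`` is a hypothetical name, and throwing ``std::range_error`` on failure is an assumption of this sketch. See :file:`qpdf/QIntC.hh` for the real interface.

```cpp
#include <cassert>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper, not the QIntC API: convert between integer
// types, throwing if the value cannot be represented in the target.
template <typename To, typename From>
To checked_cast(From value)
{
    static_assert(
        std::is_integral<To>::value && std::is_integral<From>::value,
        "checked_cast only supports integer types");
    bool ok = false;
    if (std::is_signed<From>::value && static_cast<long long>(value) < 0) {
        // Negative input: the target must be signed and wide enough.
        ok = std::is_signed<To>::value &&
            (static_cast<long long>(value) >=
             static_cast<long long>(std::numeric_limits<To>::min()));
    } else {
        // Non-negative input: compare magnitudes as unsigned values.
        ok = static_cast<unsigned long long>(value) <=
            static_cast<unsigned long long>(std::numeric_limits<To>::max());
    }
    if (!ok) {
        throw std::range_error("integer value out of range for target type");
    }
    return static_cast<To>(value);
}
```

The design point is that a failed conversion is reported rather than silently truncated, which is what a bare ``static_cast`` would do.
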
  1035 +8.4.2: May 18, 2019
  1036 + This release has just one change: correction of a buffer overrun in
  1037 + the Windows code used to open files. Windows users should take this
  1038 + update. There are no code changes that affect non-Windows releases.
  1039 +
  1040 +8.4.1: April 27, 2019
  1041 + - Enhancements
  1042 +
  1043 + - When :command:`qpdf --version` is run, it will
  1044 + detect if the qpdf CLI was built with a different version of
  1045 + qpdf than the library, which may indicate a problem with the
  1046 + installation.
  1047 +
  1048 + - New option :samp:`--remove-page-labels` will
  1049 + remove page labels before generating output. This used to
  1050 + happen if you ran :command:`qpdf --empty --pages ..
  1051 + --`, but the behavior changed in qpdf 8.3.0. This
  1052 + option enables people who were relying on the old behavior to
  1053 + get it again.
  1054 +
  1055 + - New option
  1056 + :samp:`--keep-files-open-threshold={count}`
  1057 + can be used to override the number of files that qpdf will use to
  1058 + trigger the behavior of not keeping all files open when merging
  1059 + files. This may be necessary if your system allows fewer than
  1060 + the default value of 200 files to be open at the same time.
  1061 +
  1062 + - Bug Fixes
  1063 +
  1064 + - Handle Unicode characters in filenames on Windows. The changes
  1065 + to support Unicode on the CLI in Windows broke Unicode
  1066 + filenames for Windows.
  1067 +
  1068 + - Slightly tighten logic that determines whether an object is a
  1069 + page. This should resolve problems in some rare files where
  1070 + some non-page objects were passing qpdf's test for whether
  1071 + something was a page, thus causing them to be erroneously lost
  1072 + during page splitting operations.
  1073 +
  1074 + - Revert change that included preservation of outlines
  1075 + (bookmarks) in :samp:`--split-pages`. The way
  1076 + it was implemented in 8.3.0 and 8.4.0 caused a very significant
  1077 + degradation of performance for splitting certain files. A
  1078 + future release of qpdf may re-introduce the behavior in a more
  1079 + performant and also more correct fashion.
  1080 +
  1081 + - In JSON mode, add missing leading 0 to decimal values between
  1082 + -1 and 1 even if not present in the input. The JSON
  1083 + specification requires the leading 0. The PDF specification
  1084 + does not.
  1085 +
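The JSON leading-zero fix in 8.4.1 can be illustrated with a small string transformation. This is a sketch only, not qpdf's code: ``add_leading_zero`` is a hypothetical name, and the function handles just the case named above (a missing ``0`` before the decimal point, optionally after a minus sign).

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of the qpdf API: turn a PDF-style
// real such as ".5" or "-.5" into a JSON-compatible "0.5" or "-0.5".
std::string add_leading_zero(std::string const& pdf_number)
{
    assert(!pdf_number.empty());
    std::string result = pdf_number;
    std::string::size_type pos = (result[0] == '-') ? 1 : 0;
    if (pos < result.size() && result[pos] == '.') {
        // The digit before the decimal point is missing: insert it.
        result.insert(pos, "0");
    }
    return result;
}
```

Values that already have a digit before the decimal point, and integers, pass through unchanged.
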
  1086 +8.4.0: February 1, 2019
  1087 + - Command-line Enhancements
  1088 +
  1089 + - *Non-compatible CLI change:* The qpdf command-line tool
  1090 + interprets passwords given at the command-line differently from
  1091 + previous releases when the passwords contain non-ASCII
  1092 + characters. In some cases, the behavior differs from previous
  1093 + releases. For a discussion of the current behavior, please see
  1094 + :ref:`ref.unicode-passwords`. The
  1095 + incompatibilities are as follows:
  1096 +
  1097 + - On Windows, qpdf now receives all command-line options as
  1098 + Unicode strings if it can figure out the appropriate
  1099 + compile/link options. This is enabled at least for MSVC and
  1100 + mingw builds. That means that if non-ASCII strings are
  1101 + passed to the qpdf CLI in Windows, qpdf will now correctly
  1102 + receive them. In the past, they would have either been
  1103 + encoded as Windows code page 1252 (also known as "Windows
  1104 + ANSI") or as something unintelligible. In almost all cases,
  1105 + qpdf is able to properly interpret Unicode arguments now,
  1106 + whereas in the past, it would almost never interpret them
  1107 + properly. The result is that non-ASCII passwords given to
  1108 + the qpdf CLI on Windows now have a much greater chance of
  1109 + creating PDF files that can be opened by a variety of
  1110 + readers. In the past, usually files encrypted from the
  1111 + Windows CLI using non-ASCII passwords would not be readable
  1112 + by most viewers. Note that the current version of qpdf is
  1113 + able to decrypt files that it previously created using the
  1114 + previously supplied password.
  1115 +
  1116 + - The PDF specification requires passwords to be encoded as
  1117 + UTF-8 for 256-bit encryption and with PDF Doc encoding for
  1118 + 40-bit or 128-bit encryption. Older versions of qpdf left it
  1119 + up to the user to provide passwords with the correct
  1120 + encoding. The qpdf CLI now detects when a password is given
  1121 + with UTF-8 encoding and automatically transcodes it to what
  1122 + the PDF spec requires. While this is almost always the
  1123 + correct behavior, it is possible to override the behavior if
  1124 + there is some reason to do so. This is discussed in more
  1125 + depth in :ref:`ref.unicode-passwords`.
  1126 +
  1127 + - New options
  1128 + :samp:`--externalize-inline-images`,
  1129 + :samp:`--ii-min-bytes`, and
  1130 + :samp:`--keep-inline-images` control qpdf's
  1131 + handling of inline images and possible conversion of them to
  1132 + regular images. By default,
  1133 + :samp:`--optimize-images` now also applies to
  1134 + inline images. These options are discussed in :ref:`ref.advanced-transformation`.
  1135 +
  1136 + - Add options :samp:`--overlay` and
  1137 + :samp:`--underlay` for overlaying or
  1138 + underlaying pages of other files onto output pages. See
  1139 + :ref:`ref.overlay-underlay` for
  1140 + details.
  1141 +
  1142 + - When opening an encrypted file with a password, if the
  1143 + specified password doesn't work and the password contains any
  1144 + non-ASCII characters, qpdf will try a number of alternative
  1145 + passwords to try to compensate for possible character encoding
  1146 + errors. This behavior can be suppressed with the
  1147 + :samp:`--suppress-password-recovery` option.
  1148 + See :ref:`ref.unicode-passwords` for a full
  1149 + discussion.
  1150 +
  1151 + - Add the :samp:`--password-mode` option to
  1152 + fine-tune how qpdf interprets password arguments, especially
  1153 + when they contain non-ASCII characters. See :ref:`ref.unicode-passwords` for more information.
  1154 +
  1155 + - In the :samp:`--pages` option, it is now
  1156 + possible to copy the same page more than once from the same
  1157 + file without using the previous workaround of specifying two
  1158 + different paths to the same file.
  1159 +
  1160 + - In the :samp:`--pages` option, allow use of "."
  1161 + as a shortcut for the primary input file. That way, you can do
  1162 + :command:`qpdf in.pdf --pages . 1-2 -- out.pdf`
  1163 + instead of having to repeat :file:`in.pdf`
  1164 + in the command.
  1165 +
  1166 + - When encrypting with 128-bit and 256-bit encryption, new
  1167 + encryption options :samp:`--assemble`,
  1168 + :samp:`--annotate`,
  1169 + :samp:`--form`, and
  1170 + :samp:`--modify-other` allow finer
  1171 + granularity in configuring options. Before, the
  1172 + :samp:`--modify` option only configured certain
  1173 + predefined groups of permissions.
  1174 +
  1175 + - Bug Fixes and Enhancements
  1176 +
  1177 + - *Potential data-loss bug:* Versions of qpdf between 8.1.0 and
  1178 + 8.3.0 had a bug that could cause page splitting and merging
  1179 + operations to drop some font or image resources if the PDF
  1180 + file's internal structure shared these resource lists across
  1181 + pages and if some but not all of the pages in the output did
  1182 + not reference all the fonts and images. Using the
  1183 + :samp:`--preserve-unreferenced-resources`
  1184 + option would work around the incorrect behavior. This bug was
  1185 + the result of a typo in the code and a deficiency in the test
  1186 + suite. The case that triggered the error was known, just not
  1187 + handled properly. This case is now exercised in qpdf's test
  1188 + suite and properly handled.
  1189 +
  1190 + - When optimizing images, detect and refuse to optimize images
  1191 + that can't be converted to JPEG because of bit depth or color
  1192 + space.
  1193 +
  1194 + - Linearization and page manipulation APIs now detect and recover
  1195 + from files that have duplicate Page objects in the pages tree.
  1196 +
  1197 + - When the older option
  1198 + :samp:`--stream-data=compress` was used with object
  1199 + streams, object streams and xref streams were not compressed. This is now fixed.
  1200 +
  1201 + - When the tokenizer returns inline image tokens, delimiters
  1202 + following ``ID`` and ``EI`` operators are no longer excluded.
  1203 + This makes it possible to reliably extract the actual image
  1204 + data.
  1205 +
  1206 + - Library Enhancements
  1207 +
  1208 + - Add method ``QPDFPageObjectHelper::externalizeInlineImages`` to
  1209 + convert inline images to regular images.
  1210 +
  1211 + - Add method ``QUtil::possible_repaired_encodings()`` to generate
  1212 + a list of strings that represent other ways the given string
  1213 + could have been encoded. This is the method the QPDF CLI uses
  1214 + to generate the strings it tries when recovering incorrectly
  1215 + encoded Unicode passwords.
  1216 +
  1217 + - Add new versions of
  1218 + ``QPDFWriter::setR{3,4,5,6}EncryptionParameters`` that allow
  1219 + more granular setting of permissions bits. See
  1220 + :file:`QPDFWriter.hh` for details.
  1221 +
  1222 + - Add new versions of the transcoders from UTF-8 to single-byte
  1223 + coding systems in ``QUtil`` that report success or failure
  1224 + rather than just substituting a specified unknown character.
  1225 +
  1226 + - Add method ``QUtil::analyze_encoding()`` to determine whether a
  1227 + string has high-bit characters and appears to be UTF-16 or
  1228 + valid UTF-8 encoding.
  1229 +
  1230 + - Add new method ``QPDFPageObjectHelper::shallowCopyPage()`` to
  1231 + create a new page that is a "shallow copy" of a page. The
  1232 + resulting object is an indirect object ready to be passed to
  1233 + ``QPDFPageDocumentHelper::addPage()`` for either the original
  1234 + ``QPDF`` object or a different one. This is what the
  1235 + :command:`qpdf` command-line tool uses to copy
  1236 + the same page multiple times from the same file during
  1237 + splitting and merging operations.
  1238 +
  1239 + - Add method ``QPDF::getUniqueId()``, which returns a unique
  1240 + identifier for the given QPDF object. The identifier will be
  1241 + unique across the life of the application. The returned value
  1242 + can be safely used as a map key.
  1243 +
  1244 + - Add method ``QPDF::setImmediateCopyFrom``. This further
  1245 + enhances qpdf's ability to allow a ``QPDF`` object from which
  1246 + objects are being copied to go out of scope before the
  1247 + destination object is written. If you call this method on a
  1248 + ``QPDF`` instance, objects copied *from* this instance will be
  1249 + copied immediately instead of lazily. This option uses more
  1250 + memory but allows the source object to go out of scope before
  1251 + the destination object is written in all cases. See comments in
  1252 + :file:`QPDF.hh` for details.
  1253 +
  1254 + - Add method ``QPDFPageObjectHelper::getAttribute`` for
  1255 + retrieving an attribute from the page dictionary taking
  1256 + inheritance into consideration, and optionally making a copy if
  1257 + your intention is to modify the attribute.
  1258 +
  1259 + - Fix long-standing limitation of
  1260 + ``QPDFPageObjectHelper::getPageImages`` so that it now properly
  1261 + reports images from inherited resources dictionaries,
  1262 + eliminating the need to call
  1263 + ``QPDFPageDocumentHelper::pushInheritedAttributesToPage`` in
  1264 + this case.
  1265 +
  1266 + - Add method ``QPDFObjectHandle::getUniqueResourceName`` for
  1267 + finding an unused name in a resource dictionary.
  1268 +
  1269 + - Add method ``QPDFPageObjectHelper::getFormXObjectForPage`` for
  1270 + generating a form XObject equivalent to a page. The resulting
  1271 + object can be used in the same file or copied to another file
  1272 + with ``copyForeignObject``. This can be useful for implementing
  1273 + underlay, overlay, n-up, thumbnails, or any other functionality
  1274 + requiring replication of pages in other contexts.
  1275 +
  1276 + - Add method ``QPDFPageObjectHelper::placeFormXObject`` for
  1277 + generating content stream text that places a given form XObject
  1278 + on a page, centered and fit within a specified rectangle. This
  1279 + method takes care of computing the proper transformation matrix
  1280 + and may optionally compensate for rotation or scaling of the
  1281 + destination page.
  1282 +
  1283 + - Build Improvements
  1284 +
  1285 + - Add new configure option
  1286 + :samp:`--enable-avoid-windows-handle`, which
  1287 + causes the preprocessor symbol ``AVOID_WINDOWS_HANDLE`` to be
  1288 + defined. When defined, qpdf will avoid referencing the Windows
  1289 + ``HANDLE`` type, which is disallowed with certain versions of
  1290 + the Windows SDK.
  1291 +
  1292 + - For Windows builds, attempt to determine what options, if any,
  1293 + have to be passed to the compiler and linker to enable use of
  1294 + ``wmain``. This causes the preprocessor symbol
  1295 + ``WINDOWS_WMAIN`` to be defined. If you do your own builds with
  1296 + other compilers, you can define this symbol to cause ``wmain``
  1297 + to be used. This is needed to allow the Windows
  1298 + :command:`qpdf` command to receive Unicode
  1299 + command-line options.
  1300 +
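The ``QPDFPageObjectHelper::placeFormXObject`` entry in 8.4.0 mentions computing a transformation matrix that centers a form XObject and fits it within a rectangle. The arithmetic behind that idea can be sketched as follows. This is not qpdf's implementation: ``Rect``, ``Placement``, and ``fit_centered`` are hypothetical names, and the sketch ignores rotation and any scaling options the real method offers.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical types for illustration only.
struct Rect { double llx, lly, urx, ury; };
struct Placement { double scale, tx, ty; };

// Compute a uniform scale and translation that fit bbox inside target
// and center it there. The matching PDF content stream operator would
// be "scale 0 0 scale tx ty cm".
Placement fit_centered(Rect const& bbox, Rect const& target)
{
    double bw = bbox.urx - bbox.llx;
    double bh = bbox.ury - bbox.lly;
    double tw = target.urx - target.llx;
    double th = target.ury - target.lly;
    assert(bw > 0 && bh > 0);
    // Use the smaller ratio so that both dimensions fit.
    double scale = std::min(tw / bw, th / bh);
    // Center the scaled box, then compensate for the bbox origin.
    double tx = target.llx + (tw - bw * scale) / 2.0 - bbox.llx * scale;
    double ty = target.lly + (th - bh * scale) / 2.0 - bbox.lly * scale;
    return Placement{scale, tx, ty};
}
```

For a 100 by 200 bounding box placed in a 50 by 50 rectangle at the origin, this yields a scale of 0.25 with a horizontal offset of 12.5 and no vertical offset.
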
  1301 +8.3.0: January 7, 2019
  1302 + - Command-line Enhancements
  1303 +
  1304 + - Shell completion: you can now use eval :command:`$(qpdf
  1305 + --completion-bash)` and eval :command:`$(qpdf
  1306 + --completion-zsh)` to enable shell completion for
  1307 + bash and zsh.
  1308 +
  1309 + - Page numbers (also known as page labels) are now preserved when
  1310 + merging and splitting files with the
  1311 + :samp:`--pages` and
  1312 + :samp:`--split-pages` options.
  1313 +
  1314 + - Bookmarks are partially preserved when splitting pages with the
  1315 + :samp:`--split-pages` option. Specifically, the
  1316 + outlines dictionary and some supporting metadata are copied
  1317 + into the split files. The result is that all bookmarks from the
  1318 + original file appear, those that point to pages that are
  1319 + preserved work, and those that point to pages that are not
  1320 + preserved don't do anything. This is an interim step toward
  1321 + proper support for bookmarks in splitting and merging
  1322 + operations.
  1323 +
  1324 + - Page collation: add new option
  1325 + :samp:`--collate`. When specified, the
  1326 + semantics of :samp:`--pages` change from
  1327 + concatenation to collation. See :ref:`ref.page-selection` for examples and discussion.
  1328 +
  1329 + - Generation of information in JSON format, primarily to
  1330 + facilitate use of qpdf from languages other than C++. Add new
  1331 + options :samp:`--json`,
  1332 + :samp:`--json-key`, and
  1333 + :samp:`--json-object` to generate a JSON
  1334 + representation of the PDF file. Run :command:`qpdf
  1335 + --json-help` to get a description of the JSON
  1336 + format. For more information, see :ref:`ref.json`.
  1337 +
  1338 + - The :samp:`--generate-appearances` flag will
  1339 + cause qpdf to generate appearances for form fields if the PDF
  1340 + file indicates that form field appearances are out of date.
  1341 + This can happen when PDF forms are filled in by a program that
  1342 + doesn't know how to regenerate the appearances of the filled-in
  1343 + fields.
  1344 +
  1345 + - The :samp:`--flatten-annotations` flag can be
  1346 + used to *flatten* annotations, including form fields.
  1347 + Ordinarily, annotations are drawn separately from the page.
  1348 + Flattening annotations is the process of combining their
  1349 + appearances into the page's contents. You might want to do this
  1350 + if you are going to rotate or combine pages using a tool that
  1351 + doesn't understand annotations. You may also want to use
  1352 + :samp:`--generate-appearances` when using this
  1353 + flag since annotations for outdated form fields are not
  1354 + flattened as that would cause loss of information.
  1355 +
  1356 + - The :samp:`--optimize-images` flag tells qpdf
  1357 + to recompress every image using DCT (JPEG) compression as
  1358 + long as the image is not already compressed with lossy
  1359 + compression and recompressing the image reduces its size. The
  1360 + additional options :samp:`--oi-min-width`,
  1361 + :samp:`--oi-min-height`, and
  1362 + :samp:`--oi-min-area` prevent recompression of
  1363 + images whose width, height, or pixel area (width × height) are
  1364 + below a specified threshold.
  1365 +
  1366 + - The :samp:`--show-object` option can now be
  1367 + given as :samp:`--show-object=trailer` to show
  1368 + the trailer dictionary.
  1369 +
  1370 + - Bug Fixes and Enhancements
  1371 +
  1372 + - QPDF now automatically detects and recovers from dangling
  1373 + references. If a PDF file contained an indirect reference to a
  1374 + non-existent object, which is valid, when adding a new object
  1375 + to the file, it was possible for the new object to take the
  1376 + object ID of the dangling reference, thereby causing the
  1377 + dangling reference to point to the new object. This case is now
  1378 + prevented.
  1379 +
  1380 + - Fixes to form field setting code: strings are always written in
  1381 + UTF-16 format, and checkboxes and radio buttons are handled
  1382 + properly with respect to synchronization of values and
  1383 + appearance states.
  1384 +
  1385 + - The ``QPDF::checkLinearization()`` method no longer causes the program
  1386 + to crash when it detects problems with linearization data.
  1387 + Instead, it issues a normal warning or error.
  1388 +
  1389 + - Ordinarily qpdf treats an argument of the form
  1390 + :samp:`@file` to mean that command-line options
  1391 + should be read from :file:`file`. Now, if
  1392 + :file:`file` does not exist but
  1393 + :file:`@file` does, qpdf will treat
  1394 + :file:`@file` as a regular option. This
  1395 + makes it possible to work more easily with PDF files whose
  1396 + names happen to start with the ``@`` character.
  1397 +
  1398 + - Library Enhancements
  1399 +
  1400 + - Remove the restriction in most cases that the source QPDF
  1401 + object used in a ``QPDF::copyForeignObject`` call has to stick
  1402 + around until the destination QPDF is written. The exceptional
  1403 + case is when the source stream gets its data using a
  1404 + ``QPDFObjectHandle::StreamDataProvider``. For a more in-depth
  1405 + discussion, see comments around ``copyForeignObject`` in
  1406 + :file:`QPDF.hh`.
  1407 +
  1408 + - Add new method ``QPDFWriter::getFinalVersion()``, which returns
  1409 + the PDF version that will ultimately be written to the final
  1410 + file. See comments in :file:`QPDFWriter.hh`
  1411 + for some restrictions on its use.
  1412 +
  1413 + - Add several methods for transcoding strings to some of the
  1414 + character sets used in PDF files: ``QUtil::utf8_to_ascii``,
  1415 + ``QUtil::utf8_to_win_ansi``, ``QUtil::utf8_to_mac_roman``, and
  1416 + ``QUtil::utf8_to_utf16``. For the single-byte encodings that
  1417 + support only a limited character set, these methods replace
  1418 + unsupported characters with a specified substitute.
  1419 +
  1420 + - Add new methods to ``QPDFAnnotationObjectHelper`` and
  1421 + ``QPDFFormFieldObjectHelper`` for querying flags and
  1422 + interpretation of different field types. Define constants in
  1423 + :file:`qpdf/Constants.h` to help with
  1424 + interpretation of flag values.
  1425 +
  1426 + - Add new methods
  1427 + ``QPDFAcroFormDocumentHelper::generateAppearancesIfNeeded`` and
  1428 + ``QPDFFormFieldObjectHelper::generateAppearance`` for
  1429 + generating appearance streams. See discussion in
  1430 + :file:`QPDFFormFieldObjectHelper.hh` for
  1431 + limitations.
  1432 +
  1433 + - Add two new helper functions for dealing with resource
  1434 + dictionaries: ``QPDFObjectHandle::getResourceNames()`` returns
  1435 + a list of all second-level keys, which correspond to the names
  1436 + of resources, and ``QPDFObjectHandle::mergeResources()`` merges
  1437 + two resource dictionaries as long as they have non-conflicting
  1438 + keys. These methods are useful for certain types of objects
  1439 + that resolve resources from multiple places, such as form
  1440 + fields.
  1441 +
  1442 + - Add methods ``QPDFPageDocumentHelper::flattenAnnotations()``
  1443 + and
  1444 + ``QPDFAnnotationObjectHelper::getPageContentForAppearance()``
  1445 + for handling low-level details of annotation flattening.
  1446 +
  1447 + - Add new helper classes: ``QPDFOutlineDocumentHelper``,
  1448 + ``QPDFOutlineObjectHelper``, ``QPDFPageLabelDocumentHelper``,
  1449 + ``QPDFNameTreeObjectHelper``, and
  1450 + ``QPDFNumberTreeObjectHelper``.
  1451 +
  1452 + - Add method ``QPDFObjectHandle::getJSON()`` that returns a JSON
  1453 + representation of the object. Call ``serialize()`` on the
  1454 + result to convert it to a string.
  1455 +
  1456 + - Add a simple JSON serializer. This is not a complete or
  1457 + general-purpose JSON library. It allows assembly and
  1458 + serialization of JSON structures with some restrictions, which
  1459 + are described in the header file. This is the serializer used
  1460 + by qpdf's new JSON representation.
  1461 +
  1462 + - Add new ``QPDFObjectHandle::Matrix`` class along with a few
  1463 + convenience methods for dealing with six-element numerical
  1464 + arrays as matrices.
  1465 +
  1466 + - Add new method ``QPDFObjectHandle::wrapInArray``, which returns
  1467 + the object itself if it is an array, or an array containing the
  1468 + object otherwise. This is a common construct in PDF. This
  1469 + method prevents you from having to explicitly test whether
  1470 + something is a single element or an array.
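The idea behind ``QPDFObjectHandle::wrapInArray`` can be shown with a small language-neutral sketch. The Python function below is a hypothetical stand-in that uses a plain list where qpdf uses a PDF array; it is not the qpdf API.

```python
def wrap_in_array(obj):
    """Return obj unchanged if it is already a list, else a one-element list.

    Hypothetical stand-in for the "single value or array" pattern that
    is common in PDF; not the qpdf API.
    """
    if isinstance(obj, list):
        return obj
    return [obj]


# Callers can then iterate without testing the shape first.
for item in wrap_in_array("single-stream"):
    print(item)
```

A page's contents, for example, may be either one stream or an array of streams, so code that walks them can treat both cases the same way.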
  1471 +
  1472 + - Build Improvements
  1473 +
  1474 + - It is no longer necessary to run
  1475 + :command:`autogen.sh` to build from a pristine
  1476 + checkout. Automatically generated files are now committed so
  1477 + that it is possible to build on platforms without autoconf
  1478 + directly from a clean checkout of the repository. The
  1479 + :command:`configure` script detects if the files
  1480 + are out of date when it also determines that the tools are
  1481 + present to regenerate them.
  1482 +
  1483 + - Pull requests and the master branch are now built automatically
  1484 + in `Azure
  1485 + Pipelines <https://dev.azure.com/qpdf/qpdf/_build>`__, which is
  1486 + free for open source projects. The build includes Linux, mac,
  1487 + Windows 32-bit and 64-bit with mingw and MSVC, and an AppImage
  1488 + build. Official qpdf releases are now built with Azure
  1489 + Pipelines.
  1490 +
  1491 + - Notes for Packagers
  1492 +
  1493 + - A new section has been added to the documentation with notes
  1494 + for packagers. Please see :ref:`ref.packaging`.
  1495 +
  1496 + - The qpdf build detects out-of-date automatically generated files. If
  1497 + your packaging system automatically refreshes libtool or
  1498 + autoconf files, it could cause this check to fail. To avoid
  1499 + this problem, pass
  1500 + :samp:`--disable-check-autofiles` to
  1501 + :command:`configure`.
  1502 +
  1503 + - If you would like to have qpdf completion enabled
  1504 + automatically, you can install completion files in the
  1505 + distribution's default location. You can find sample completion
  1506 + files to install in the :file:`completions`
  1507 + directory.
  1508 +
  1509 +8.2.1: August 18, 2018
  1510 + - Command-line Enhancements
  1511 +
  1512 + - Add
  1513 + :samp:`--keep-files-open={[yn]}`
  1514 + to override default determination of whether to keep files open
  1515 + when merging. Please see the discussion of
  1516 + :samp:`--keep-files-open` in :ref:`ref.basic-options` for additional details.
  1517 +
  1518 +8.2.0: August 16, 2018
  1519 + - Command-line Enhancements
  1520 +
  1521 + - Add :samp:`--no-warn` option to suppress
  1522 + issuing warning messages. If there are any conditions that
  1523 + would have caused warnings to be issued, the exit status is
  1524 + still 3.
  1525 +
  1526 + - Bug Fixes and Optimizations
  1527 +
  1528 + - Performance fix: optimize page merging operation to avoid
  1529 + unnecessary open/close calls on files being merged. This solves
  1530 + a dramatic slow-down that was observed when merging certain
  1531 + types of files.
  1532 +
  1533 + - Optimize how memory was used for the TIFF predictor,
  1534 + drastically improving performance and memory usage for files
  1535 + containing high-resolution images compressed with Flate using
  1536 + the TIFF predictor.
  1537 +
  1538 + - Bug fix: end of line characters were not properly handled
  1539 + inside strings in some cases.
  1540 +
  1541 + - Bug fix: using :samp:`--progress` on very small
  1542 + files could cause an infinite loop.
  1543 +
  1544 + - API enhancements
  1545 +
  1546 + - Add new class ``QPDFSystemError``, derived from
  1547 + ``std::runtime_error``, which is now thrown by
  1548 + ``QUtil::throw_system_error``. This enables the triggering
  1549 + ``errno`` value to be retrieved.
  1550 +
  1551 + - Add ``ClosedFileInputSource::stayOpen`` method, enabling a
  1552 + ``ClosedFileInputSource`` to stay open during manually
  1553 + indicated periods of high activity, thus reducing the overhead
  1554 + of frequent open/close operations.
  1555 +
  1556 + - Build Changes
  1557 +
  1558 + - For the mingw builds, change the name of the DLL import library
  1559 + from :file:`libqpdf.a` to
  1560 + :file:`libqpdf.dll.a` to more accurately
  1561 + reflect that it is an import library rather than a static
  1562 + library. This potentially clears the way for supporting a
  1563 + static library in the future, though presently, the qpdf
  1564 + Windows build only builds the DLL and executables.
  1565 +
  1566 +8.1.0: June 23, 2018
  1567 + - Usability Improvements
  1568 +
  1569 + - When splitting files, qpdf detects fonts and images that the
  1570 + document metadata claims are referenced from a page but are not
  1571 + actually referenced and omits them from the output file. This
  1572 + change can cause a significant reduction in the size of split
  1573 + PDF files for files created by some software packages. In some
  1574 + cases, it can also make page splitting slower. Prior versions
  1575 + of qpdf would believe the document metadata and sometimes
  1576 + include all the images from all the other pages even though the
  1577 + pages were no longer present. In the unlikely event that the
  1578 + old behavior should be desired, or if you have a case where
  1579 + page splitting is very slow, the old behavior (and speed) can
  1580 + be enabled by specifying
  1581 + :samp:`--preserve-unreferenced-resources`. For
  1582 + additional details, please see :ref:`ref.advanced-transformation`.
  1583 +
  1584 + - When merging multiple PDF files, qpdf no longer leaves all the
  1585 + files open. This makes it possible to merge numbers of files
  1586 + that may exceed the operating system's limit for the maximum
  1587 + number of open files.
  1588 +
  1589 + - The :samp:`--rotate` option's syntax has been
  1590 + extended to make the page range optional. If you specify
  1591 + :samp:`--rotate={angle}`
  1592 + without specifying a page range, the rotation will be applied
  1593 + to all pages. This can be especially useful for adjusting a PDF
  1594 + created from a multi-page document that was scanned upside
  1595 + down.
  1596 +
  1597 + - When merging multiple files, the
  1598 + :samp:`--verbose` option now prints information
  1599 + about each file as it operates on that file.
  1600 +
  1601 + - When the :samp:`--progress` option is
  1602 + specified, qpdf will print a running indicator of its best
  1603 + guess at how far through the writing process it is. Note that,
  1604 + as with all progress meters, it's an approximation. This option
  1605 + is implemented in a way that makes it useful for software that
  1606 + uses the qpdf library; see API Enhancements below.
  1607 +
  1608 + - Bug Fixes
  1609 +
  1610 + - Properly decrypt files that use revision 3 of the standard
  1611 + security handler but use 40-bit keys (even though revision 3
  1612 + supports 128-bit keys).
  1613 +
  1614 + - Limit depth of nested data structures to prevent crashes from
  1615 + certain types of malformed (malicious) PDFs.
  1616 +
  1617 + - In "newline before endstream" mode, insert the required extra
  1618 + newline before the ``endstream`` at the end of object streams.
  1619 + This one case was previously omitted.
  1620 +
  1621 + - API Enhancements
  1622 +
  1623 + - The first round of higher level "helper" interfaces has been
  1624 + introduced. These are designed to provide a more convenient way
  1625 + of interacting with certain document features than using
  1626 + ``QPDFObjectHandle`` directly. For details on helpers, see
  1627 + :ref:`ref.helper-classes`. Specific additional
  1628 + interfaces are described below.
  1629 +
  1630 + - Add two new document helper classes: ``QPDFPageDocumentHelper``
  1631 + for working with pages, and ``QPDFAcroFormDocumentHelper`` for
  1632 + working with interactive forms. No old methods have been
  1633 + removed, but ``QPDFPageDocumentHelper`` is now the preferred
  1634 + way to perform operations on pages rather than calling the old
  1635 + methods in ``QPDFObjectHandle`` and ``QPDF`` directly. Comments
  1636 + in the header files direct you to the new interfaces. Please
  1637 + see the header files and :file:`ChangeLog`
  1638 + for additional details.
  1639 +
  1640 + - Add three new object helper classes: ``QPDFPageObjectHelper`` for
  1641 + pages, ``QPDFFormFieldObjectHelper`` for interactive form
  1642 + fields, and ``QPDFAnnotationObjectHelper`` for annotations. All
  1643 + three classes are fairly sparse at the moment, but they have
  1644 + some useful, basic functionality.
  1645 +
  1646 + - A new example program
  1647 + :file:`examples/pdf-set-form-values.cc` has
  1648 + been added that illustrates use of the new document and object
  1649 + helpers.
  1650 +
  1651 + - The method ``QPDFWriter::registerProgressReporter`` has been
  1652 + added. This method allows you to register a function that
  1653 + ``QPDFWriter`` calls to report its estimate of how far, as a
  1654 + percentage, it is through writing its output. Client programs can
  1655 + use this to implement reasonably accurate progress meters. The
  1656 + :command:`qpdf` command line tool uses this to
  1657 + implement its :samp:`--progress` option.
  1658 +
  1659 + - New methods ``QPDFObjectHandle::newUnicodeString`` and
  1660 + ``QPDFObject::unparseBinary`` have been added to allow for more
  1661 + convenient creation of strings that are explicitly encoded
  1662 + using big-endian UTF-16. This is useful for creating strings
  1663 + that appear outside of content streams, such as labels, form
  1664 + fields, outlines, document metadata, etc.
  1665 +
  1666 + - A new class ``QPDFObjectHandle::Rectangle`` has been added to
  1667 + ease working with PDF rectangles, which are just arrays of four
  1668 + numeric values.
  1669 +
  1670 +8.0.2: March 6, 2018
  1671 + - When a loop is detected while following cross reference streams or
  1672 + tables, treat this as damage instead of silently ignoring the
  1673 + previous table. This prevents loss of otherwise recoverable data
  1674 + in some damaged files.
  1675 +
  1676 + - Properly handle pages with no contents.
  1677 +
  1678 +8.0.1: March 4, 2018
  1679 + - Disregard data check errors when uncompressing ``/FlateDecode``
  1680 + streams. This is consistent with most other PDF readers and allows
  1681 + qpdf to recover data from another class of malformed PDF files.
  1682 +
  1683 + - On the command line when specifying page ranges, support preceding
  1684 + a page number by "r" to indicate that it should be counted from
  1685 + the end. For example, the range ``r3-r1`` would indicate the last
  1686 + three pages of a document.
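The "r" notation above can be sketched as follows. This is a minimal illustration that assumes a range of the simple form ``first-last``; the function names are hypothetical, and qpdf's actual range parser accepts more forms than this handles.

```python
def resolve_page(token, page_count):
    """Turn a page token such as "3" or "r3" into a 1-based page number."""
    if token.startswith("r"):
        # "rN" counts from the end: r1 is the last page.
        page = page_count - int(token[1:]) + 1
    else:
        page = int(token)
    if not 1 <= page <= page_count:
        raise ValueError(f"page {token!r} is out of range")
    return page


def resolve_range(spec, page_count):
    """Expand a "first-last" range into a list of 1-based page numbers."""
    first, last = (resolve_page(t, page_count) for t in spec.split("-"))
    step = 1 if first <= last else -1
    return list(range(first, last + step, step))


print(resolve_range("r3-r1", 10))  # prints [8, 9, 10]
```

For a 10-page document, ``r3`` resolves to page 8 and ``r1`` to page 10, which gives the last three pages.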
  1687 +
  1688 +8.0.0: February 25, 2018
  1689 + - Packaging and Distribution Changes
  1690 +
  1691 + - QPDF is now distributed as an
  1692 + `AppImage <https://appimage.org/>`__ in addition to all the
  1693 + other ways it is distributed. The AppImage can be found in the
  1694 + download area with the other packages. Thanks to Kurt Pfeifle
  1695 + and Simon Peter for their contributions.
  1696 +
  1697 + - Bug Fixes
  1698 +
  1699 + - ``QPDFObjectHandle::getUTF8Val`` now properly treats
  1700 + non-Unicode strings as encoded with PDF Doc Encoding.
  1701 +
  1702 + - Improvements to handling of objects in PDF files that are not
  1703 + of the expected type. In most cases, qpdf will be able to warn
  1704 + for such cases rather than fail with an exception. Previous
  1705 + versions of qpdf would sometimes fail with errors such as
  1706 + "operation for dictionary object attempted on object of wrong
  1707 + type". This situation should be mostly or entirely eliminated
  1708 + now.
  1709 +
  1710 + - Enhancements to the :command:`qpdf` Command-line
  1711 + Tool. All new options listed here are documented in more detail in
  1712 + :ref:`ref.using`.
  1713 +
  1714 + - The option
  1715 + :samp:`--linearize-pass1={file}`
  1716 + has been added for debugging qpdf's linearization code.
  1717 +
  1718 + - The option :samp:`--coalesce-contents` can be
  1719 + used to combine content streams of a page whose contents are an
  1720 + array of streams into a single stream.
  1721 +
  1722 + - API Enhancements. All new API calls are documented in their
  1723 + respective classes' header files. There are no non-compatible
  1724 + changes to the API.
  1725 +
  1726 + - Add function ``qpdf_check_pdf`` to the C API. This function
  1727 + does basic checking that is a subset of what :command:`qpdf
  1728 + --check` performs.
  1729 +
  1730 + - Major enhancements to the lexical layer of qpdf. For a complete
  1731 + list of enhancements, please refer to the
  1732 + :file:`ChangeLog` file. Most of the changes
  1733 + result in improvements to qpdf's ability to handle erroneous
  1734 + files. It is also possible for programs to handle whitespace,
  1735 + comments, and inline images as tokens.
  1736 +
  1737 + - New API for working with PDF content streams at a lexical
  1738 + level. The new class ``QPDFObjectHandle::TokenFilter`` allows
  1739 + the developer to provide token handlers. Token filters can be
  1740 + used with several different methods in ``QPDFObjectHandle`` as
  1741 + well as with a lower-level interface. See comments in
  1742 + :file:`QPDFObjectHandle.hh` as well as the
  1743 + new examples
  1744 + :file:`examples/pdf-filter-tokens.cc` and
  1745 + :file:`examples/pdf-count-strings.cc` for
  1746 + details.
  1747 +
  1748 +7.1.1: February 4, 2018
  1749 + - Bug fix: files whose /ID fields were other than 16 bytes long can
  1750 + now be properly linearized.
  1751 +
  1752 + - A few compile and link issues have been corrected for some
  1753 + platforms.
  1754 +
  1755 +7.1.0: January 14, 2018
  1756 + - PDF files contain streams that may be compressed with various
  1757 + compression algorithms which, in some cases, may be enhanced by
  1758 + various predictor functions. Previously only the PNG up predictor
  1759 + was supported. In this version, all the PNG predictors as well as
  1760 + the TIFF predictor are supported. This increases the range of
  1761 + files that qpdf is able to handle.
  1762 +
  1763 + - QPDF now allows a raw encryption key to be specified in place of a
  1764 + password when opening encrypted files, and will optionally display
  1765 + the encryption key used by a file. This is a non-standard
  1766 + operation, but it can be useful in certain situations. Please see
  1767 + the discussion of :samp:`--password-is-hex-key` in
  1768 + :ref:`ref.basic-options` or the comments around
  1769 + ``QPDF::setPasswordIsHexKey`` in
  1770 + :file:`QPDF.hh` for additional details.
  1771 +
  1772 + - Bug fix: numbers ending with a trailing decimal point are now
  1773 + properly recognized as numbers.
  1774 +
  1775 + - Bug fix: when building qpdf from source on some platforms
  1776 + (especially MacOS), the build could get confused by older versions
  1777 + of qpdf installed on the system. This has been corrected.
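As background for the predictor support mentioned in the 7.1.0 notes, the PNG "Up" predictor stores each byte as its difference from the byte directly above it in the previous row. The sketch below reverses that step. It is an illustration, not qpdf's code, and it assumes that every row is ``columns`` bytes long and is preceded by a filter-type byte of 2 (Up).

```python
def decode_png_up(data, columns):
    """Undo PNG "Up" prediction on already-inflated stream data.

    Each row is one filter-type byte followed by `columns` data bytes.
    Only filter type 2 (Up) is handled in this sketch.
    """
    row_size = columns + 1
    if len(data) % row_size != 0:
        raise ValueError("data is not a whole number of rows")
    prior = bytes(columns)  # the row above the first row is all zeros
    out = bytearray()
    for start in range(0, len(data), row_size):
        if data[start] != 2:
            raise ValueError("only the Up filter is handled here")
        raw = data[start + 1:start + row_size]
        # Add the byte above, modulo 256, to recover the original byte.
        row = bytes((r + p) & 0xFF for r, p in zip(raw, prior))
        out += row
        prior = row
    return bytes(out)
```

The other PNG filter types (Sub, Average, Paeth) follow the same row-by-row pattern but also use bytes to the left in the current row.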
  1778 +
  1779 +7.0.0: September 15, 2017
  1780 + - Packaging and Distribution Changes
  1781 +
  1782 + - QPDF's primary license is now `version 2.0 of the Apache
  1783 + License <http://www.apache.org/licenses/LICENSE-2.0>`__ rather
  1784 + than version 2.0 of the Artistic License. You may still, at
  1785 + your option, consider qpdf to be licensed with version 2.0 of
  1786 + the Artistic License.
  1787 +
  1788 + - QPDF no longer has a dependency on the PCRE (Perl-Compatible
  1789 + Regular Expression) library. QPDF now has an added dependency
  1790 + on the JPEG library.
  1791 +
  1792 + - Bug Fixes
  1793 +
  1794 + - This release contains many bug fixes for various infinite
  1795 + loops, memory leaks, and other memory errors that could be
  1796 + encountered with specially crafted or otherwise erroneous PDF
  1797 + files.
  1798 +
  1799 + - New Features
  1800 +
  1801 + - QPDF now supports reading and writing streams encoded with JPEG
  1802 + or RunLength encoding. Library API enhancements and
  1803 + command-line options have been added to control this behavior.
  1804 + See command-line options
  1805 + :samp:`--compress-streams` and
  1806 + :samp:`--decode-level` and methods
  1807 + ``QPDFWriter::setCompressStreams`` and
  1808 + ``QPDFWriter::setDecodeLevel``.
  1809 +
  1810 + - QPDF is much better at recovering from broken files. In most
  1811 + cases, qpdf will skip invalid objects and will preserve broken
  1812 + stream data by not attempting to filter broken streams. QPDF is
  1813 + now able to recover or at least not crash on dozens of broken
  1814 + test files I have received over the past few years.
  1815 +
  1816 + - Page rotation is now supported and accessible from both the
  1817 + library and the command line.
  1818 +
  1819 + - ``QPDFWriter`` supports writing files in a way that preserves
  1820 + PCLm compliance in support of driverless printing. This is very
  1821 + specialized and is only useful to applications that already
  1822 + know how to create PCLm files.
  1823 +
  1824 + - Enhancements to the :command:`qpdf` Command-line
  1825 + Tool. All new options listed here are documented in more detail in
  1826 + :ref:`ref.using`.
  1827 +
  1828 + - Command-line arguments can now be read from files or standard
  1829 + input using ``@file`` or ``@-`` syntax. Please see :ref:`ref.invocation`.
  1830 +
  1831 + - :samp:`--rotate`: request page rotation
  1832 +
  1833 + - :samp:`--newline-before-endstream`: ensure that
  1834 + a newline appears before every ``endstream`` keyword in the
  1835 + file; used to prevent qpdf from breaking PDF/A compliance on
  1836 + already compliant files.
  1837 +
  1838 + - :samp:`--preserve-unreferenced`: preserve
  1839 + unreferenced objects in the input PDF
  1840 +
  1841 + - :samp:`--split-pages`: break output into chunks
  1842 + with fixed numbers of pages
  1843 +
  1844 + - :samp:`--verbose`: print the name of each
  1845 + output file that is created
  1846 +
  1847 + - :samp:`--compress-streams` and
  1848 + :samp:`--decode-level` replace
  1849 + :samp:`--stream-data` for improving granularity
  1850 + of controlling compression and decompression of stream data.
  1851 + The :samp:`--stream-data` option will remain
  1852 + available.
  1853 +
  1854 + - When running :command:`qpdf --check` with other
  1855 + options, checks are always run first. This enables qpdf to
  1856 + perform its full recovery logic before outputting other
  1857 + information. This can be especially useful when manually
  1858 + recovering broken files, looking at qpdf's regenerated cross
  1859 + reference table, or other similar operations.
  1860 +
  1861 + - Process :samp:`--pages` earlier so that other
  1862 + options like :samp:`--show-pages` or
  1863 + :samp:`--split-pages` can operate on the file
  1864 + after page splitting/merging has occurred.
  1865 +
  1866 + - API Changes. All new API calls are documented in their respective
  1867 + classes' header files.
  1868 +
  1869 + - ``QPDFObjectHandle::rotatePage``: apply rotation to a page
  1870 + object
  1871 +
  1872 + - ``QPDFWriter::setNewlineBeforeEndstream``: force newline to
  1873 + appear before ``endstream``
  1874 +
  1875 + - ``QPDFWriter::setPreserveUnreferencedObjects``: preserve
  1876 + unreferenced objects that appear in the input PDF. The default
  1877 + behavior is to discard them.
  1878 +
  1879 + - New ``Pipeline`` types ``Pl_RunLength`` and ``Pl_DCT`` are
  1880 + available for developers who wish to produce or consume
  1881 + RunLength or DCT stream data directly. The
  1882 + :file:`examples/pdf-create.cc` example
  1883 + illustrates their use.
  1884 +
  1885 + - ``QPDFWriter::setCompressStreams`` and
  1886 + ``QPDFWriter::setDecodeLevel`` methods control handling of
  1887 + different types of stream compression.
  1888 +
  1889 + - Add new C API functions ``qpdf_set_compress_streams``,
  1890 + ``qpdf_set_decode_level``,
  1891 + ``qpdf_set_preserve_unreferenced_objects``, and
  1892 + ``qpdf_set_newline_before_endstream`` corresponding to the new
  1893 + ``QPDFWriter`` methods.
  1894 +
  1895 +6.0.0: November 10, 2015
  1896 + - Implement :samp:`--deterministic-id` command-line
  1897 + option and ``QPDFWriter::setDeterministicID`` as well as C API
  1898 + function ``qpdf_set_deterministic_ID`` for generating a
  1899 + deterministic ID for non-encrypted files. When this option is
  1900 + selected, the ID of the file depends on the contents of the output
  1901 + file, and not on transient items such as the timestamp or output
  1902 + file name.
  1903 +
  1904 + - Make qpdf more tolerant of files whose xref table entries are not
  1905 + the correct length.
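The idea behind a deterministic ID, as described for :samp:`--deterministic-id` above, is that the ID is derived only from the output contents. The sketch below shows the general technique with a truncated SHA-256 digest. The hash choice and the function name are assumptions made for illustration, not a description of how qpdf computes the ID.

```python
import hashlib


def deterministic_id(output_bytes):
    """Derive a 16-byte ID from content alone (illustrative only).

    No timestamp or file name is mixed in, so identical content
    always yields the same ID.
    """
    return hashlib.sha256(output_bytes).digest()[:16]


same_a = deterministic_id(b"%PDF-1.7 example content")
same_b = deterministic_id(b"%PDF-1.7 example content")
print(same_a == same_b)  # prints True
```

Because nothing transient feeds the hash, writing the same content twice gives the same ID, which makes outputs reproducible and easy to compare.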
  1906 +
  1907 +5.1.3: May 24, 2015
  1908 + - Bug fix: fix-qdf was not properly handling files that contained
  1909 + object streams with more than 255 objects in them.
  1910 +
  1911 + - Bug fix: qpdf was not properly initializing Microsoft's secure
  1912 + crypto provider on fresh Windows installations that had not had
  1913 + any keys created yet.
  1914 +
  1915 + - Fix a few errors found by Gynvael Coldwind and Mateusz Jurczyk of
  1916 + the Google Security Team. Please see the ChangeLog for details.
  1917 +
  1918 + - Properly handle pages that have no contents at all. There were
  1919 + many cases in which qpdf handled this fine, but a few methods
  1920 + blindly obtained page contents without handling the possibility that
  1921 + there were no contents.
  1922 +
  1923 + - Make qpdf more robust for a few more kinds of problems that may
  1924 + occur in invalid PDF files.
  1925 +
  1926 +5.1.2: June 7, 2014
  1927 + - Bug fix: linearizing files could create a corrupted output file
  1928 + under extremely unlikely file size circumstances. See ChangeLog
  1929 + for details. The odds of getting hit by this are very low, though
  1930 + one person did.
  1931 +
  1932 + - Bug fix: qpdf would fail to write files that had streams with
  1933 + decode parameters referencing other streams.
  1934 +
  1935 + - New example program: :command:`pdf-split-pages`:
  1936 + efficiently split PDF files into individual pages. The example
  1937 + program does this more efficiently than using :command:`qpdf
  1938 + --pages` to do it.
  1939 +
  1940 + - Packaging fix: Visual C++ binaries did not support Windows XP.
  1941 + This has been rectified by updating the compilers used to generate
  1942 + the release binaries.
  1943 +
  1944 +5.1.1: January 14, 2014
  1945 + - Performance fix: copying foreign objects could be very slow with
  1946 + certain types of files. This was most likely to be visible during
  1947 + page splitting and was due to traversing the same objects multiple
  1948 + times in some cases.
  1949 +
  1950 +5.1.0: December 17, 2013
  1951 + - Added runtime option (``QUtil::setRandomDataProvider``) to supply
  1952 + your own random data provider. You can use this if you want to
  1953 + avoid using the OS-provided secure random number generation
  1954 + facility or stdlib's less secure version. See comments in
  1955 + :file:`include/qpdf/QUtil.hh` for details.
  1956 +
  1957 + - Fixed image comparison tests to not create 12-bit-per-pixel images
  1958 + since some versions of tiffcmp have bugs in comparing them in some
  1959 + cases. This increases the disk space required by the image
  1960 + comparison tests, which are off by default anyway.
  1961 +
  1962 + - Introduce a number of small fixes for compilation on the latest
  1963 + clang in MacOS and the latest Visual C++ in Windows.
  1964 +
  1965 + - Be able to handle broken files that end the xref table header with
  1966 + a space instead of a newline.
  1967 +
  1968 +5.0.1: October 18, 2013
  1969 + - Thanks to a detailed review by Florian Weimer and the Red Hat
  1970 + Product Security Team, this release includes a number of
  1971 + non-user-visible security hardening changes. Please see the
  1972 + ChangeLog file in the source distribution for the complete list.
  1973 +
  1974 + - When available, operating system-specific secure random number
  1975 + generation is used for generating initialization vectors and other
  1976 + random values used during encryption or file creation. For the
  1977 + Windows build, this results in an added dependency on Microsoft's
  1978 + cryptography API. To disable the OS-specific cryptography and use
  1979 + the old version, pass the
  1980 + :samp:`--enable-insecure-random` option to
  1981 + :command:`./configure`.
  1982 +
  1983 + - The :command:`qpdf` command-line tool now issues a
  1984 + warning when :samp:`--accessibility=n` is specified
  1985 + for newer encryption versions stating that the option is ignored.
  1986 + qpdf, per the spec, has always ignored this flag, but it
  1987 + previously did so silently. This warning is issued only by the
  1988 + command-line tool, not by the library. The library's handling of
  1989 + this flag is unchanged.
  1990 +
  1991 +5.0.0: July 10, 2013
  1992 + - Bug fix: previous versions of qpdf would lose objects with
  1993 + generation != 0 when generating object streams. Fixing this
  1994 + required changes to the public API.
  1995 +
  1996 + - Removed methods from public API that were only supposed to be
  1997 + called by QPDFWriter and couldn't realistically be called anywhere
  1998 + else. See ChangeLog for details.
  1999 +
  2000 + - New ``QPDFObjGen`` class added to represent an object
  2001 + ID/generation pair. ``QPDFObjectHandle::getObjGen()`` is now
  2002 + preferred over ``QPDFObjectHandle::getObjectID()`` and
  2003 + ``QPDFObjectHandle::getGeneration()`` as it makes it less likely
  2004 + for people to accidentally write code that ignores the generation
  2005 + number. See :file:`QPDF.hh` and
  2006 + :file:`QPDFObjectHandle.hh` for additional
  2007 + notes.
  2008 +
  2009 + - Add :samp:`--show-npages` command-line option to
  2010 + the :command:`qpdf` command to show the number of
  2011 + pages in a file.
  2012 +
  2013 + - Allow omission of the page range within
  2014 + :samp:`--pages` for the
  2015 + :command:`qpdf` command. When omitted, the page
  2016 + range is implicitly taken to be all the pages in the file.
  2017 +
  2018 + - Various enhancements were made to support different types of
  2019 + broken files or broken readers. Details can be found in
  2020 + :file:`ChangeLog`.
  2021 +
  2022 +4.1.0: April 14, 2013
  2023 + - Note to people including qpdf in distributions: the
  2024 + :file:`.la` files generated by libtool are now
  2025 + installed by qpdf's :command:`make install` target.
  2026 + Before, they were not installed. This means that if your
  2027 + distribution does not want to include
  2028 + :file:`.la` files, you must remove them as
  2029 + part of your packaging process.
  2030 +
  2031 + - Major enhancement: API enhancements have been made to support
  2032 + parsing of content streams. This enhancement includes the
  2033 + following changes:
  2034 +
  2035 + - ``QPDFObjectHandle::parseContentStream`` method parses objects
  2036 + in a content stream and calls handlers in a callback class. The
  2037 + example
  2038 + :file:`examples/pdf-parse-content.cc`
  2039 + illustrates how this may be used.
  2040 +
  2041 + - ``QPDFObjectHandle`` can now represent operators and inline
  2042 + images, object types that may only appear in content streams.
  2043 +
  2044 + - Method ``QPDFObjectHandle::getTypeCode()`` returns an
  2045 + enumerated type value representing the underlying object type.
  2046 + Method ``QPDFObjectHandle::getTypeName()`` returns a text
  2047 + string describing the name of the type of a
  2048 + ``QPDFObjectHandle`` object. These methods can be used for more
  2049 + efficient parsing and debugging/diagnostic messages.
  2050 +
  2051 + - :command:`qpdf --check` now parses all pages'
  2052 + content streams in addition to doing other checks. While there are
  2053 + still many types of errors that cannot be detected, syntactic
  2054 + errors in content streams will now be reported.
  2055 +
  2056 + - Minor compilation enhancements have been made to facilitate
  2057 + support for a broader range of compilers and compiler
  2058 + versions.
  2059 +
  2060 + - Warning flags have been moved into a separate variable in
  2061 + :file:`autoconf.mk`.
  2062 +
  2063 + - The configure flag :samp:`--enable-werror` works
  2064 + for Microsoft compilers.
  2065 +
  2066 + - All MSVC CRT security warnings have been resolved.
  2067 +
  2068 + - All C-style casts in C++ code have been replaced by C++ casts,
  2069 + and many casts that had been included to suppress higher
  2070 + warning levels for some compilers have been removed, primarily
  2071 + for clarity. Places where integer type coercion occurs have
  2072 + been scrutinized. A new casting policy has been documented in
  2073 + the manual. This is of concern mainly to people porting qpdf to
  2074 + new platforms or compilers. It is not visible to programmers
  2075 + writing code that uses the library.
  2076 +
  2077 + - Some internal limits have been removed in code that converts
  2078 + numbers to strings. This is largely invisible to users, but it
  2079 + does trigger a bug in some older versions of mingw-w64's C++
  2080 + library. See :file:`README-windows.md` in
  2081 + the source distribution if you think this may affect you. The
  2082 + copy of the DLL distributed with qpdf's binary distribution is
  2083 + not affected by this problem.
  2084 +
  2085 + - The RPM spec file previously included with qpdf has been removed.
  2086 + This is because virtually all Linux distributions include qpdf now
  2087 + that it is a dependency of CUPS filters.
  2088 +
  2089 + - A few bug fixes are included:
  2090 +
  2091 + - Overridden compressed objects are properly handled. Before,
  2092 + there were certain constructs that could cause qpdf to see old
  2093 + versions of some objects. The most common manifestation of this
  2094 + was loss of filled-in form values for certain files.
  2095 +
  2096 + - Installation no longer uses GNU/Linux-specific versions of some
  2097 + commands, so :command:`make install` works on
  2098 + Solaris with native tools.
  2099 +
  2100 + - The 64-bit mingw Windows binary package no longer includes a
  2101 + 32-bit DLL.
  2102 +
  2103 +4.0.1: January 17, 2013
  2104 + - Fix detection of binary attachments in test suite to avoid false
  2105 + test failures on some platforms.
  2106 +
  2107 + - Add clarifying comment in :file:`QPDF.hh` to
  2108 + methods that return the user password explaining that it is no
  2109 + longer possible with newer encryption formats to recover the user
  2110 + password knowing the owner password. In earlier encryption
  2111 + formats, the user password was encrypted in the file using the
  2112 + owner password. In newer encryption formats, a separate encryption
  2113 + key is used on the file, and that key is independently encrypted
  2114 + using both the user password and the owner password.
  2115 +
  2116 +4.0.0: December 31, 2012
  2117 + - Major enhancement: support has been added for newer encryption
  2118 + schemes supported by version X of Adobe Acrobat. This includes use
  2119 + of 127-character passwords, 256-bit encryption keys, and the
  2120 + encryption scheme specified in ISO 32000-2, the PDF 2.0
  2121 + specification. This scheme can be chosen from the command line by
  2122 + specifying use of 256-bit keys. qpdf also supports the deprecated
  2123 + encryption method used by Acrobat IX. This encryption style has
  2124 + known security weaknesses and should not be used in practice.
  2125 + However, such files exist "in the wild," so support for this
  2126 + scheme is still useful. New methods
  2127 + ``QPDFWriter::setR6EncryptionParameters`` (for the PDF 2.0 scheme)
  2128 + and ``QPDFWriter::setR5EncryptionParameters`` (for the deprecated
  2129 + scheme) have been added to enable these new encryption schemes.
  2130 + Corresponding functions have been added to the C API as well.
  2131 +
  2132 + - Full support for Adobe extension levels in PDF version
  2133 + information. Starting with PDF version 1.7, corresponding to ISO
  2134 + 32000, Adobe adds new functionality by increasing the extension
  2135 + level rather than increasing the version. This support includes
  2136 + addition of the ``QPDF::getExtensionLevel`` method for retrieving
  2137 + the document's extension level, addition of versions of
  2138 + ``QPDFWriter::setMinimumPDFVersion`` and
  2139 + ``QPDFWriter::forcePDFVersion`` that accept an extension level,
  2140 + and extended syntax for specifying forced and minimum versions on
  2141 + the command line as described in :ref:`ref.advanced-transformation`. Corresponding functions
  2142 + have been added to the C API as well.
  2143 +
  2144 + - Minor fixes to prevent qpdf from referencing objects in the file
  2145 + that are not referenced in the file's overall structure. Most
  2146 + files don't have any such objects, but some files contain
  2147 + unreferenced objects with errors, so these fixes prevent qpdf from
  2148 + needlessly rejecting or complaining about such objects.
  2149 +
  2150 + - Add new generalized methods for reading and writing files from/to
  2151 + programmer-defined sources. The method
  2152 + ``QPDF::processInputSource`` allows the programmer to use any
  2153 + input source for the input file, and
  2154 + ``QPDFWriter::setOutputPipeline`` allows the programmer to write
  2155 + the output file through any pipeline. These methods would make it
  2156 + possible to perform any number of specialized operations, such as
  2157 + accessing external storage systems, creating bindings for qpdf in
  2158 + other programming languages that have their own I/O systems, etc.
  2159 +
  2160 + - Add new method ``QPDF::getEncryptionKey`` for retrieving the
  2161 + underlying encryption key used in the file.
  2162 +
  2163 + - This release includes a small handful of non-compatible API
  2164 + changes. While effort is made to avoid such changes, all the
  2165 + non-compatible API changes in this version were to parts of the
  2166 + API that would likely never be used outside the library itself. In
  2167 + all cases, the altered methods or structures were parts of the
  2168 + ``QPDF`` class that either were public so that they could be called
  2169 + from ``QPDFWriter`` or were part of validation code that was
  2170 + over-zealous in reporting problems in parts of the file that would
  2171 + not ordinarily be referenced. In no case did any of the removed
  2172 + methods do anything worse than falsely report error conditions in
  2173 + files that were broken in ways that didn't matter. The following
  2174 + public parts of the ``QPDF`` class were changed in a
  2175 + non-compatible way:
  2176 +
  2177 + - Updated nested ``QPDF::EncryptionData`` class to add fields
  2178 + needed by the newer encryption formats, member variables
  2179 + changed to private so that future changes will not require
  2180 + breaking backward compatibility.
  2181 +
  2182 + - Added additional parameters to ``compute_data_key``, which is
  2183 + used by ``QPDFWriter`` to compute the encryption key used to
  2184 + encrypt a specific object.
  2185 +
  2186 + - Removed the method ``flattenScalarReferences``. This method was
  2187 + previously used prior to writing a new PDF file, but it has the
  2188 + undesired side effect of causing qpdf to read objects in the
  2189 + file that were not referenced. Some otherwise valid files have
  2190 + unreferenced objects with errors in them, so this could cause
  2191 + qpdf to reject files that would be accepted by virtually all
  2192 + other PDF readers. In fact, qpdf relied on only a very small
  2193 + part of what flattenScalarReferences did, so only this part has
  2194 + been preserved, and it is now done directly inside
  2195 + ``QPDFWriter``.
  2196 +
  2197 + - Removed the method ``decodeStreams``. This method was used by
  2198 + the :samp:`--check` option of the
  2199 + :command:`qpdf` command-line tool to force all
  2200 + streams in the file to be decoded, but it also suffered from
  2201 + the problem of opening otherwise unreferenced streams and thus
  2202 + could report false positives. The
  2203 + :samp:`--check` option now causes qpdf to go
  2204 + through all the motions of writing a new file based on the
  2205 + original one, so it will always reference and check exactly
  2206 + those parts of a file that any ordinary viewer would check.
  2207 +
  2208 + - Removed the method ``trimTrailerForWrite``. This method was
  2209 + used by ``QPDFWriter`` to modify the original QPDF object by
  2210 + removing fields from the trailer dictionary that wouldn't apply
  2211 + to the newly written file. This functionality, though generally
  2212 + harmless, was a poor implementation and has been replaced by
  2213 + having QPDFWriter filter these out when copying the trailer
  2214 + rather than modifying the original QPDF object. (Note that qpdf
  2215 + never modifies the original file itself.)
  2216 +
  2217 + - Allow the PDF header to appear anywhere in the first 1024 bytes of
  2218 + the file. This is consistent with what other readers do.
  2219 +
  2220 + - Fix the :command:`pkg-config` files to list zlib
  2221 + and pcre in ``Requires.private`` to better support static linking
  2222 + using :command:`pkg-config`.
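The relaxed header rule in the 4.0.0 notes above can be sketched as follows. This is only an illustration, not qpdf's actual code: the function name ``find_pdf_header`` is hypothetical, and the sketch assumes the whole ``%PDF-`` marker must lie within the first 1024 bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of qpdf's API: return the offset of the
// "%PDF-" marker if it lies entirely within the first 1024 bytes of
// `data`, or std::string::npos otherwise.
std::size_t find_pdf_header(const std::string& data)
{
    const std::size_t window = 1024;
    // substr clamps to the string length, so short inputs are fine.
    return data.substr(0, window).find("%PDF-");
}
```

A caller would treat ``std::string::npos`` as "no header found" and any other return value as the number of junk bytes that precede the header.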
  2223 +
  2224 +3.0.2: September 6, 2012
  2225 + - Bug fix: ``QPDFWriter::setOutputMemory`` did not work when not
  2226 + used with ``QPDFWriter::setStaticID``, which made it pretty much
  2227 + useless. This has been fixed.
  2228 +
  2229 + - New API call ``QPDFWriter::setExtraHeaderText`` inserts additional
  2230 + text near the header of the PDF file. The intended use case is to
  2231 + insert comments that may be consumed by a downstream application,
  2232 + though other use cases may exist.
  2233 +
  2234 +3.0.1: August 11, 2012
  2235 + - Version 3.0.0 included addition of files for
  2236 + :command:`pkg-config`, but this was not mentioned
  2237 + in the release notes. The release notes for 3.0.0 were updated to
  2238 + mention this.
  2239 +
  2240 + - Bug fix: if an object stream ended with a scalar object not
  2241 + followed by space, qpdf would incorrectly report that it
  2242 + encountered a premature EOF. This bug has been in qpdf since
  2243 + version 2.0.
  2244 +
  2245 +3.0.0: August 2, 2012
  2246 + - Acknowledgment: I would like to express gratitude for the
  2247 + contributions of Tobias Hoffmann toward the release of qpdf
  2248 + version 3.0. He is responsible for most of the implementation and
  2249 + design of the new API for manipulating pages, and contributed code
  2250 + and ideas for many of the improvements made in version 3.0.
  2251 + Without his work, this release would certainly not have happened
  2252 + as soon as it did, if at all.
  2253 +
  2254 + - *Non-compatible API changes:*
  2255 +
  2256 + - The method ``QPDFObjectHandle::replaceStreamData`` that uses a
  2257 + ``StreamDataProvider`` to provide the stream data no longer
  2258 + takes a ``length`` parameter. The parameter was removed since
  2259 + this provides the user an opportunity to simplify the calling
  2260 + code. This method was introduced in version 2.2. At the time,
  2261 + the ``length`` parameter was required in order to ensure that
  2262 + calls to the stream data provider returned the same length for a
  2263 + specific stream every time they were invoked. In particular, the
  2264 + linearization code depends on this. Instead, qpdf 3.0 and newer
  2265 + check for that constraint explicitly. The first time the stream
  2266 + data provider is called for a specific stream, the actual length
  2267 + is saved, and subsequent calls are required to return the same
  2268 + number of bytes. This means the calling code no longer has to
  2269 + compute the length in advance, which can be a significant
  2270 + simplification. If your code fails to compile because of the
  2271 + extra argument and you don't want to make other changes to your
  2272 + code, just omit the argument.
  2273 +
  2274 + - Many methods take ``long long`` instead of other integer types.
  2275 + Most if not all existing code should compile fine with this
  2276 + change since such parameters had always previously been smaller
  2277 + types. This change was required to support files larger than two
  2278 + gigabytes in size.
  2279 +
  2280 + - Support has been added for large files. The test suite verifies
  2281 + support for files larger than 4 gigabytes, and manual testing has
  2282 + verified support for files larger than 10 gigabytes. Large file
  2283 + support is available for both 32-bit and 64-bit platforms as long
  2284 + as the compiler and underlying platforms support it.
  2285 +
  2286 + - Support for page selection (splitting and merging PDF files) has
  2287 + been added to the :command:`qpdf` command-line
  2288 + tool. See :ref:`ref.page-selection`.
  2289 +
  2290 + - Options have been added to the :command:`qpdf`
  2291 + command-line tool for copying encryption parameters from another
  2292 + file. See :ref:`ref.basic-options`.
  2293 +
  2294 + - New methods have been added to the ``QPDF`` object for adding and
  2295 + removing pages. See :ref:`ref.adding-and-remove-pages`.
  2296 +
  2297 + - New methods have been added to the ``QPDF`` object for copying
  2298 + objects from other PDF files. See :ref:`ref.foreign-objects`
  2299 +
  2300 + - A new method ``QPDFObjectHandle::parse`` has been added for
  2301 + constructing ``QPDFObjectHandle`` objects from a string
  2302 + description.
  2303 +
  2304 + - Methods have been added to ``QPDFWriter`` to allow writing to an
  2305 + already open stdio ``FILE*`` in addition to writing to standard
  2306 + output or a named file. Methods have been added to ``QPDF`` to be
  2307 + able to process a file from an already open stdio ``FILE*``. This
  2308 + makes it possible to read and write PDF from secure temporary
  2309 + files that have been unlinked prior to being fully read or
  2310 + written.
  2311 +
  2312 + - The ``QPDF::emptyPDF`` method can be used to allow creation of PDF files
  2313 + from scratch. The example
  2314 + :file:`examples/pdf-create.cc` illustrates how
  2315 + it can be used.
  2316 +
  2317 + - Several methods that take ``PointerHolder<Buffer>`` can now also
  2318 + accept ``std::string`` arguments.
  2319 +
  2320 + - Many new convenience methods have been added to the library, most
  2321 + in ``QPDFObjectHandle``. See :file:`ChangeLog`
  2322 + for a full list.
  2323 +
  2324 + - When building on a platform that supports ELF shared libraries
  2325 + (such as Linux), symbol versions are enabled by default. They can
  2326 + be disabled by passing
  2327 + :samp:`--disable-ld-version-script` to
  2328 + :command:`./configure`.
  2329 +
  2330 + - The file :file:`libqpdf.pc` is now installed
  2331 + to support :command:`pkg-config`.
  2332 +
  2333 + - Image comparison tests are off by default now since they are not
  2334 + needed to verify a correct build or port of qpdf. They are needed
  2335 + only when changing the actual PDF output generated by qpdf. You
  2336 + should enable them if you are making deep changes to qpdf itself.
  2337 + See :file:`README.md` for details.
  2338 +
  2339 + - Large file tests are off by default but can be turned on with
  2340 + :command:`./configure` or by setting an environment
  2341 + variable before running the test suite. See
  2342 + :file:`README.md` for details.
  2343 +
  2344 + - When qpdf's test suite fails, failures are not printed to the
  2345 + terminal anymore by default. Instead, find them in
  2346 + :file:`build/qtest.log`. For packagers who are
  2347 + building with an autobuilder, you can add the
  2348 + :samp:`--enable-show-failed-test-output` option to
  2349 + :command:`./configure` to restore the old behavior.
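The length constraint described above for stream data providers (save the length on the first call, require the same length on later calls) can be sketched like this. The class ``StreamLengthChecker`` and its integer stream identifier are hypothetical and exist only for illustration; qpdf's internal check is not shown in these notes and may differ.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>

// Hypothetical sketch, not qpdf's implementation: remember the length
// reported the first time data is provided for a stream and reject any
// later call that reports a different length for the same stream.
class StreamLengthChecker
{
  public:
    // `stream_id` identifies the stream; `length` is the number of bytes
    // the provider produced on this call.
    void check(int stream_id, std::size_t length)
    {
        // emplace inserts only if the id is new; otherwise it returns
        // the existing entry so the saved length can be compared.
        auto result = lengths_.emplace(stream_id, length);
        if (!result.second && result.first->second != length) {
            throw std::runtime_error(
                "stream data provider returned a different length");
        }
    }

  private:
    std::map<int, std::size_t> lengths_;
};
```

With a check of this kind in place, calling code does not need to compute the length in advance, which is the simplification the note describes.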
  2350 +
  2351 +2.3.1: December 28, 2011
  2352 + - Fix thread-safety problem resulting from non-thread-safe use of
  2353 + the PCRE library.
  2354 +
  2355 + - Made a few minor documentation fixes.
  2356 +
  2357 + - Add workaround for a bug that appears in some versions of
  2358 + ghostscript to the test suite.
  2359 +
  2360 + - Fix minor build issue for Visual C++ 2010.
  2361 +
  2362 +2.3.0: August 11, 2011
  2363 + - Bug fix: when preserving existing encryption on encrypted files
  2364 + with cleartext metadata, older qpdf versions would generate
  2365 + password-protected files with no valid password. This operation
  2366 + now works. This bug only affected files created by copying
  2367 + existing encryption parameters; explicit encryption with
  2368 + specification of cleartext metadata worked before and continues to
  2369 + work.
  2370 +
  2371 + - Enhance ``QPDFWriter`` with a new constructor that allows you to
  2372 + delay the specification of the output file. When using this
  2373 + constructor, you may now call ``QPDFWriter::setOutputFilename`` to
  2374 + specify the output file, or you may use
  2375 + ``QPDFWriter::setOutputMemory`` to cause ``QPDFWriter`` to write
  2376 + the resulting PDF file to a memory buffer. You may then use
  2377 + ``QPDFWriter::getBuffer`` to retrieve the memory buffer.
  2378 +
  2379 + - Add new API call ``QPDF::replaceObject`` for replacing objects by
  2380 + object ID.
  2381 +
  2382 + - Add new API call ``QPDF::swapObjects`` for swapping two objects by
  2383 + object ID.
  2384 +
  2385 + - Add ``QPDFObjectHandle::getDictAsMap`` and
  2386 + ``QPDFObjectHandle::getArrayAsVector`` to allow retrieval of
  2387 + dictionary objects as maps and array objects as vectors.
  2388 +
  2389 + - Add functions ``qpdf_get_info_key`` and ``qpdf_set_info_key`` to
  2390 + the C API for manipulating string fields of the document's
  2391 + ``/Info`` dictionary.
  2392 +
  2393 + - Add functions ``qpdf_init_write_memory``,
  2394 + ``qpdf_get_buffer_length``, and ``qpdf_get_buffer`` to the C API
  2395 + for writing PDF files to a memory buffer instead of a file.
  2396 +
  2397 +2.2.4: June 25, 2011
  2398 + - Fix installation and compilation issues; no functionality changes.
  2399 +
  2400 +2.2.3: April 30, 2011
  2401 + - Handle some damaged streams with incorrect characters following
  2402 + the stream keyword.
  2403 +
  2404 + - Improve handling of inline images when normalizing content
  2405 + streams.
  2406 +
  2407 + - Enhance error recovery to properly handle files that use object 0
  2408 + as a regular object, which is specifically disallowed by the spec.
  2409 +
  2410 +2.2.2: October 4, 2010
  2411 + - Add new function ``qpdf_read_memory`` to the C API to call
  2412 + ``QPDF::processMemoryFile``. This was an omission in qpdf 2.2.1.
  2413 +
  2414 +2.2.1: October 1, 2010
  2415 + - Add new method ``QPDF::setOutputStreams`` to replace ``std::cout``
  2416 + and ``std::cerr`` with other streams for generation of diagnostic
  2417 + messages and error messages. This can be useful for GUIs or other
  2418 + applications that want to capture any output generated by the
  2419 + library to present to the user in some other way. Note that QPDF
  2420 + does not write to ``std::cout`` (or the specified output stream)
  2421 + except where explicitly mentioned in
  2422 + :file:`QPDF.hh`, and that the only use of the
  2423 + error stream is for warnings. Note also that output of warnings is
  2424 + suppressed when ``setSuppressWarnings(true)`` is called.
  2425 +
  2426 + - Add new method ``QPDF::processMemoryFile`` for operating on PDF
  2427 + files that are loaded into memory rather than in a file on disk.
  2428 +
  2429 + - Give a warning but otherwise ignore empty PDF objects by treating
  2430 + them as null. Empty objects are not permitted by the PDF
  2431 + specification but have been known to appear in some actual PDF
  2432 + files.
  2433 +
  2434 + - Handle inline image filter abbreviations when they appear as stream
  2435 + filter abbreviations. The PDF specification does not allow use of
  2436 + stream filter abbreviations in this way, but Adobe Reader and some
  2437 + other PDF readers accept them since they sometimes appear
  2438 + incorrectly in actual PDF files.
  2439 +
  2440 + - Implement miscellaneous enhancements to ``PointerHolder`` and
  2441 + ``Buffer`` to support other changes.
  2442 +
  2443 +2.2.0: August 14, 2010
  2444 + - Add new methods to ``QPDFObjectHandle`` (``newStream`` and
  2445 + ``replaceStreamData``) for creating new streams and replacing
  2446 + stream data. This makes it possible to perform a wide range of
  2447 + operations that were not previously possible.
  2448 +
  2449 + - Add new helper method in ``QPDFObjectHandle``
  2450 + (``addPageContents``) for appending or prepending new content
  2451 + streams to a page. This method makes it possible to manipulate
  2452 + content streams without having to be concerned whether a page's
  2453 + contents are a single stream or an array of streams.
  2454 +
  2455 + - Add new method in ``QPDFObjectHandle``: ``replaceOrRemoveKey``,
  2456 + which replaces a dictionary key with a given value unless the
  2457 + value is null, in which case it removes the key instead.
  2458 +
  2459 + - Add new method in ``QPDFObjectHandle``: ``getRawStreamData``,
  2460 + which returns the raw (unfiltered) stream data into a buffer. This
  2461 + complements the ``getStreamData`` method, which returns the
  2462 + filtered (uncompressed) stream data and can only be used when the
  2463 + stream's data is filterable.
  2464 +
  2465 + - Provide two new examples:
  2466 + :command:`pdf-double-page-size` and
  2467 + :command:`pdf-invert-images` that illustrate the
  2468 + newly added interfaces.
  2469 +
  2470 + - Fix a memory leak that would cause loss of a few bytes for every
  2471 + object involved in a cycle of object references. Thanks to Jian Ma
  2472 + for calling my attention to the leak.
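The behavior described above for ``replaceOrRemoveKey`` (replace the key unless the value is null, in which case remove it) can be illustrated without qpdf. In this sketch the dictionary is a plain ``std::map`` and a null pointer stands in for the PDF null object; both choices, and the name ``replace_or_remove_key``, are assumptions made for illustration and do not reflect the real ``QPDFObjectHandle`` signature.

```cpp
#include <map>
#include <string>

// Stand-in for a PDF dictionary: keys and values are plain strings.
using Dict = std::map<std::string, std::string>;

// Hypothetical sketch of the semantics only: a null `value` removes the
// key, and any other value replaces (or adds) it.
void replace_or_remove_key(Dict& dict, const std::string& key,
                           const std::string* value)
{
    if (value == nullptr) {
        dict.erase(key);
    } else {
        dict[key] = *value;
    }
}
```

Folding both cases into one call saves the caller from testing for null before deciding whether to set or delete a key.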
  2473 +
  2474 +2.1.5: April 25, 2010
  2475 + - Remove restriction of file identifier strings to 16 bytes. This
  2476 + unnecessary restriction was preventing qpdf from being able to
  2477 + encrypt or decrypt files with identifier strings that were not
  2478 + exactly 16 bytes long. The specification imposes no such
  2479 + restriction.
  2480 +
  2481 +2.1.4: April 18, 2010
  2482 + - Apply the same padding calculation fix from version 2.1.2 to the
  2483 + main cross reference stream as well.
  2484 +
  2485 + - Since :command:`qpdf --check` only performs limited
  2486 + checks, clarify the output to make it clear that there still may
  2487 + be errors that qpdf can't check. This should make it less
  2488 + surprising to people when another PDF reader is unable to read a
  2489 + file that qpdf thinks is okay.
  2490 +
  2491 +2.1.3: March 27, 2010
  2492 + - Fix bug that could cause a failure when rewriting PDF files that
  2493 + contain object streams with unreferenced objects that in turn
  2494 + reference indirect scalars.
  2495 +
  2496 + - Don't complain about (invalid) AES streams that aren't a multiple
  2497 + of 16 bytes. Instead, pad them before decrypting.
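Padding a buffer up to the AES block size, as mentioned in the 2.1.3 notes above, might look like the following. The function name ``pad_to_aes_block`` is hypothetical, and the use of zero bytes as padding is an assumption; these notes do not say which padding bytes qpdf uses.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: extend `data` to the next multiple of 16 bytes.
// Zero bytes are used as padding here purely as an assumption.
std::string pad_to_aes_block(std::string data)
{
    const std::size_t block = 16;
    const std::size_t remainder = data.size() % block;
    if (remainder != 0) {
        data.append(block - remainder, '\0');
    }
    return data;
}
```

Input that is already a multiple of 16 bytes, including empty input, is returned unchanged.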
  2498 +
  2499 +2.1.2: January 24, 2010
  2500 + - Fix bug in padding around first half cross reference stream in
  2501 + linearized files. The bug could cause an assertion failure when
  2502 + linearizing certain unlucky files.
  2503 +
  2504 +2.1.1: December 14, 2009
  2505 + - No changes in functionality; insert missing include in an internal
  2506 + library header file to support gcc 4.4, and update test suite to
  2507 + ignore broken Adobe Reader installations.
  2508 +
  2509 +2.1: October 30, 2009
  2510 + - This is the first version of qpdf to include Windows support. On
  2511 + Windows, it is possible to build a DLL. Additionally, a partial
  2512 + C-language API has been introduced, which makes it possible to
  2513 + call qpdf functions from non-C++ environments. I am very grateful
  2514 + to Žarko Gajić (http://zarko-gajic.iz.hr/) for tirelessly testing
  2515 + numerous pre-release versions of this DLL and providing many
  2516 + excellent suggestions on improving the interface.
  2517 +
  2518 + For programming to the C interface, please see the header file
  2519 + :file:`qpdf/qpdf-c.h` and the example
  2520 + :file:`examples/pdf-linearize.c`.
  2521 +
  2522 + - Žarko Gajić has written a Delphi wrapper for qpdf, which can be
  2523 + downloaded from qpdf's download site. Žarko's Delphi wrapper is
  2524 + released with the same licensing terms as qpdf itself and comes
  2525 + with this disclaimer: "Delphi wrapper unit
  2526 + :file:`qpdf.pas` created by Žarko Gajić
  2527 + (http://zarko-gajic.iz.hr/). Use at your own risk and for whatever
  2528 + purpose you want. No support is provided. Sample code is
  2529 + provided."
  2530 +
  2531 + - Support has been added for AES encryption and crypt filters.
  2532 + Although qpdf does not presently support files that use PKI-based
  2533 + encryption, with the addition of AES and crypt filters, qpdf is
  2534 + now able to open most encrypted files created with newer
  2535 + versions of Acrobat or other PDF creation software. Note that I
  2536 + have not been able to get very many files encrypted in this way,
  2537 + so it's possible there could still be some cases that qpdf can't
  2538 + handle. Please report them if you find them.
  2539 +
  2540 + - Many error messages have been improved to include more information
  2541 + in hopes of making qpdf a more useful tool for PDF experts to use
  2542 + in manually recovering damaged PDF files.
  2543 +
  2544 + - Attempt to avoid compressing metadata streams if possible. This is
  2545 + consistent with other PDF creation applications.
  2546 +
  2547 + - Provide new command-line options for AES encryption, cleartext
  2548 + metadata, and setting the minimum and forced PDF versions of
  2549 + output files.
  2550 +
  2551 + - Add additional methods to the ``QPDF`` object for querying the
  2552 + document's permissions. Although qpdf does not enforce these
  2553 + permissions, it does make them available so that applications that
  2554 + use qpdf can enforce permissions.
  2555 +
  2556 + - The :samp:`--check` option to
  2557 + :command:`qpdf` has been extended to include some
  2558 + additional information.
  2559 +
  2560 + - *Non-compatible API changes:*
  2561 +
  2562 + - QPDF's exception handling mechanism now uses
  2563 + ``std::logic_error`` for internal errors and
  2564 + ``std::runtime_error`` for runtime errors in favor of the now
  2565 + removed ``QEXC`` classes used in previous versions. The ``QEXC``
  2566 + exception classes predated the addition of the
  2567 + :file:`<stdexcept>` header file to the C++ standard library.
  2568 + Most of the exceptions thrown by the qpdf library itself are
  2569 + still of type ``QPDFExc`` which is now derived from
  2570 + ``std::runtime_error``. Programs that catch an instance of
  2571 + ``std::exception`` and display it by calling the ``what()``
  2572 + method will not need to be changed.
  2573 +
  2574 + - The ``QPDFExc`` class now internally represents various fields
  2575 + of the error condition and provides interfaces for querying
  2576 + them. Among the fields is a numeric error code that can help
  2577 + applications act differently on (a small number of) different
  2578 + error conditions. See :file:`QPDFExc.hh` for details.
  2579 +
  2580 + - Warnings can be retrieved from qpdf as instances of ``QPDFExc``
  2581 + instead of strings.
  2582 +
  2583 + - The nested ``QPDF::EncryptionData`` class's constructor takes an
  2584 + additional argument. This class is primarily intended to be used
  2585 + by ``QPDFWriter``. There's not really anything useful an
  2586 + end-user application could do with it. It probably shouldn't
  2587 + really be part of the public interface to begin with. Likewise,
  2588 + some of the methods for computing internal encryption dictionary
  2589 + parameters have changed to support ``/R=4`` encryption.
  2590 +
  2591 + - The method ``QPDF::getUserPassword`` has been removed since it
  2592 + didn't do what people would think it did. There are now two new
  2593 + methods: ``QPDF::getPaddedUserPassword`` and
  2594 + ``QPDF::getTrimmedUserPassword``. The first one does what the
  2595 + old ``QPDF::getUserPassword`` method used to do, which is to
  2596 + return the password with possible binary padding as specified by
  2597 + the PDF specification. The second one returns a human-readable
  2598 + password string.
  2599 +
  2600 + - The enumerated types that used to be nested in ``QPDFWriter``
  2601 + have moved to top-level enumerated types and are now defined in
  2602 + the file :file:`qpdf/Constants.h`. This enables them to be
  2603 + shared by both the C and C++ interfaces.
  2604 +
  2605 +2.0.6: May 3, 2009
  2606 + - Do not attempt to uncompress streams that have decode parameters
  2607 + we don't recognize. Earlier versions of qpdf would have rejected
  2608 + files with such streams.
  2609 +
  2610 +2.0.5: March 10, 2009
  2611 + - Improve error handling in the LZW decoder, and fix a small error
  2612 + introduced in the previous version with regard to handling full
  2613 + tables. The LZW decoder has been more strongly verified in this
  2614 + release.
  2615 +
  2616 +2.0.4: February 21, 2009
  2617 + - Include proper support for LZW streams encoded without the "early
  2618 + code change" flag. Special thanks to Atom Smasher who reported the
  2619 + problem and provided an input file compressed in this way, which I
  2620 + did not previously have.
  2621 +
  2622 + - Implement some improvements to file recovery logic.
  2623 +
  2624 +2.0.3: February 15, 2009
  2625 + - Compile cleanly with gcc 4.4.
  2626 +
  2627 + - Handle strings encoded as UTF-16BE properly.
  2628 +
  2629 +2.0.2: June 30, 2008
  2630 + - Update test suite to work properly with a
  2631 + non-:command:`bash`
  2632 + :file:`/bin/sh` and with Perl 5.10. No changes
  2633 + were made to the actual qpdf source code itself for this release.
  2634 +
  2635 +2.0.1: May 6, 2008
  2636 + - No changes in functionality or interface. This release includes
  2637 + fixes to the source code so that qpdf compiles properly and passes
  2638 + its test suite on a broader range of platforms. See
  2639 + :file:`ChangeLog` in the source distribution
  2640 + for details.
  2641 +
  2642 +2.0: April 29, 2008
  2643 + - First public release.
... ...
manual/weak-crypto.rst 0 → 100644
  1 +.. _ref.weak-crypto:
  2 +
  3 +Weak Cryptography
  4 +=================
  5 +
  6 +Starting with version 10.4, qpdf is taking steps to reduce the likelihood
  7 +of a user *accidentally* creating PDF files with insecure cryptography
  8 +but will continue to allow creation of such files indefinitely with
  9 +explicit acknowledgment.
  10 +
  11 +The PDF file format makes use of RC4, which is known to be a weak
  12 +cryptography algorithm, and MD5, which is a weak hashing algorithm. In
  13 +version 10.4, qpdf generates warnings for some (but not all) cases of
  14 +writing files with weak cryptography when invoked from the command-line.
  15 +These warnings can be suppressed using the
  16 +:samp:`--allow-weak-crypto` option.
  17 +
  18 +It is planned for qpdf version 11 to be stricter, making it an error to
  19 +write files with insecure cryptography from the command-line tool in
  20 +most cases without specifying the
  21 +:samp:`--allow-weak-crypto` flag and also to require
  22 +explicit steps when using the C++ library to enable use of insecure
  23 +cryptography.
  24 +
  25 +Note that qpdf must always retain support for weak cryptographic
  26 +algorithms since this is required for reading older PDF files that use
  27 +them. Additionally, qpdf will always retain the ability to create files
  28 +using weak cryptographic algorithms since, as a development tool, qpdf
  29 +explicitly supports creating older or deprecated types of PDF files
  30 +since these are sometimes needed to test or work with older versions of
  31 +software. Even if other cryptography libraries drop support for RC4 or
  32 +MD5, qpdf can always fall back to its internal implementations of those
  33 +algorithms, so they are not going to disappear from qpdf.
... ...