Commit 10fb619d3e0618528b7ac6c20cad6262020cf947

Authored by Jay Berkenbilt
1 parent f3d1138b

Split documentation into multiple pages, change theme

@@ -30,8 +30,6 @@ Before release:
  30 30   I can do about, and it doesn't seem worth fixing. Maybe mention it
  31 31   somewhere?
  32 32  * README-maintainer: Fix installation of documentation to website
  33    -* Get navigation working properly
  34    -* Figure out where to put :ref:`search` so we get doc search
  35 33
  36 34  Soon:
  37 35
manual/acknowledgement.rst 0 → 100644
  1 +.. _acknowledgments:
  2 +
  3 +Acknowledgment
  4 +==============
  5 +
  6 +QPDF was originally created in 2001 and modified periodically between
  7 +2001 and 2005 during my employment at `Apex CoVantage
  8 +<http://www.apexcovantage.com>`__. Upon my departure from Apex, the
  9 +company graciously allowed me to take ownership of the software and
  10 +continue maintaining it as an open source project, a decision for which I
  11 +am very grateful. I have made considerable enhancements to it since
  12 +that time. I feel fortunate to have worked for people who would make
  13 +such a decision. This work would not have been possible without their
  14 +support.
manual/cli.rst 0 → 100644
  1 +.. _ref.using:
  2 +
  3 +Running QPDF
  4 +============
  5 +
  6 +This chapter describes how to run the qpdf program from the command
  7 +line.
  8 +
  9 +.. _ref.invocation:
  10 +
  11 +Basic Invocation
  12 +----------------
  13 +
  14 +When running qpdf, the basic invocation is as follows:
  15 +
  16 +::
  17 +
  18 + qpdf [ options ] { infilename | --empty } outfilename
  19 +
  20 +This converts PDF file :samp:`infilename` to PDF file
  21 +:samp:`outfilename`. The output file is functionally
  22 +identical to the input file but may have been structurally reorganized.
  23 +Also, orphaned objects will be removed from the file. Many
  24 +transformations are available as controlled by the options below. In
  25 +place of :samp:`infilename`, the parameter
  26 +:samp:`--empty` may be specified. This causes qpdf to
  27 +use a dummy input file that contains zero pages. The only normal use
  28 +case for using :samp:`--empty` would be if you were
  29 +going to add pages from another source, as discussed in :ref:`ref.page-selection`.
  30 +
  31 +If :samp:`@filename` appears as a word anywhere in the
  32 +command-line, it will be read line by line, and each line will be
  33 +treated as a command-line argument. Leading and trailing whitespace is
  34 +intentionally not removed from lines, which makes it possible to handle
  35 +arguments that start or end with spaces. The :samp:`@-`
  36 +option allows arguments to be read from standard input. This allows qpdf
  37 +to be invoked with an arbitrary number of arbitrarily long arguments. It
  38 +is also very useful for avoiding having to pass passwords on the command
  39 +line. Note that the :samp:`@filename` can't appear in
  40 +the middle of an argument, so constructs such as
  41 +:samp:`--arg=@option` will not work. You would have to
  42 +include the argument and its options together in the arguments file.
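
As a minimal sketch of the arguments-file mechanism described above, the following Python helper writes one argument per line to a temporary file that could then be passed to qpdf as ``@path``. The helper name ``write_args_file``, the file suffix, and the example password are hypothetical; only the one-argument-per-line format comes from the text above.

```python
import os
import tempfile


def write_args_file(args):
    """Write each argument on its own line and return the file's path.

    Hypothetical helper: whitespace inside an argument is written as-is,
    because qpdf does not strip leading or trailing whitespace from lines.
    """
    fd, path = tempfile.mkstemp(suffix=".args", text=True)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        for arg in args:
            if "\n" in arg or "\r" in arg:
                # A line-oriented file cannot hold an argument that spans lines.
                raise ValueError("argument contains a line break: %r" % arg)
            f.write(arg + "\n")
    return path


# The whole --password option goes into the file, not just its value.
path = write_args_file(["--password=secret word ", "in.pdf", "out.pdf"])
command = ["qpdf", "@" + path]  # the password is not part of the command line
```

The resulting ``command`` list could be handed to ``subprocess.run``. The temporary file should be deleted afterwards since it contains the password.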
  43 +
  44 +:samp:`outfilename` does not have to be seekable, even
  45 +when generating linearized files. Specifying ":samp:`-`"
  46 +as :samp:`outfilename` means to write to standard
  47 +output. If you want to overwrite the input file with the output, use the
  48 +option :samp:`--replace-input` and omit the output file
  49 +name. You can't specify the same file as both the input and the output.
  50 +If you do this, qpdf will tell you about the
  51 +:samp:`--replace-input` option.
  52 +
  53 +Most options require an output file, but some testing or inspection
  54 +commands do not. These are specifically noted.
  55 +
  56 +.. _ref.exit-status:
  57 +
  58 +Exit Status
  59 +~~~~~~~~~~~
  60 +
  61 +The exit status of :command:`qpdf` may be interpreted as
  62 +follows:
  63 +
  64 +- ``0``: no errors or warnings were found. The file may still have
  65 + problems qpdf can't detect. If
  66 + :samp:`--warning-exit-0` was specified, exit status 0
  67 + is used even if there are warnings.
  68 +
  69 +- ``2``: errors were found. qpdf was not able to fully process the
  70 + file.
  71 +
  72 +- ``3``: qpdf encountered problems that it was able to recover from. In
  73 + some cases, the resulting file may still be damaged. Note that qpdf
  74 + still exits with status ``3`` if it finds warnings even when
  75 + :samp:`--no-warn` is specified. With
  76 + :samp:`--warning-exit-0`, warnings without errors
  77 + exit with status 0 instead of 3.
  78 +
  79 +Note that :command:`qpdf` never exits with status ``1``.
  80 +If you get an exit status of ``1``, it was something else, like the
  81 +shell not being able to find or execute :command:`qpdf`.
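
The status codes above could be interpreted in a wrapper script along the following lines. This is only a sketch: the function name and the wording of the messages are hypothetical, and the mapping covers only the general meanings listed in this section.

```python
# General meanings of qpdf exit statuses, as listed in this section.
QPDF_EXIT_MEANINGS = {
    0: "success: no errors or warnings detected",
    2: "error: qpdf could not fully process the file",
    3: "warning: qpdf recovered, but the output may still be damaged",
}


def describe_qpdf_exit(status):
    """Return a short description of a qpdf exit status (hypothetical helper)."""
    # qpdf itself never exits with 1, so any other value came from elsewhere,
    # for example a shell that could not find or execute qpdf.
    return QPDF_EXIT_MEANINGS.get(
        status, "not a qpdf status; check how qpdf was launched"
    )
```

The value passed in would typically be the ``returncode`` of a ``subprocess.run`` call. Note that :samp:`--is-encrypted` and :samp:`--requires-password` give these codes different meanings, so this mapping does not apply to them.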
  82 +
  83 +.. _ref.shell-completion:
  84 +
  85 +Shell Completion
  86 +----------------
  87 +
  88 +Starting in qpdf version 8.3.0, qpdf provides its own completion support
  89 +for zsh and bash. You can enable bash completion with :command:`eval
  90 +$(qpdf --completion-bash)` and zsh completion with
  91 +:command:`eval $(qpdf --completion-zsh)`. If
  92 +:command:`qpdf` is not in your path, you should invoke it
  93 +above with an absolute path. If you invoke it with a relative path, it
  94 +will warn you, and the completion won't work if you're in a different
  95 +directory.
  96 +
  97 +qpdf will use ``argv[0]`` to figure out where its executable is. This
  98 +may produce unwanted results in some cases, especially if you are trying
  99 +to use completion with a copy of qpdf that is built from source. You can
  100 +specify a full path to the qpdf you want to use for completion in the
  101 +``QPDF_EXECUTABLE`` environment variable.
  102 +
  103 +.. _ref.basic-options:
  104 +
  105 +Basic Options
  106 +-------------
  107 +
  108 +The following options are the most common ones and perform commonly
  109 +needed transformations.
  110 +
  111 +:samp:`--help`
  112 + Display command-line invocation help.
  113 +
  114 +:samp:`--version`
  115 + Display the current version of qpdf.
  116 +
  117 +:samp:`--copyright`
  118 + Show detailed copyright information.
  119 +
  120 +:samp:`--show-crypto`
  121 + Show a list of available crypto providers, each on a line by itself.
  122 + The default provider is always listed first. See :ref:`ref.crypto` for more information about crypto
  123 + providers.
  124 +
  125 +:samp:`--completion-bash`
  126 + Output a completion command you can eval to enable shell completion
  127 + from bash.
  128 +
  129 +:samp:`--completion-zsh`
  130 + Output a completion command you can eval to enable shell completion
  131 + from zsh.
  132 +
  133 +:samp:`--password={password}`
  134 + Specifies a password for accessing encrypted files. To read the
  135 + password from a file or standard input, you can use
  136 + :samp:`--password-file`, added in qpdf 10.2. Note
  137 + that you can also use :samp:`@filename` or
  138 + :samp:`@-` as described above to put the password in
  139 + a file or pass it via standard input, but you would do so by
  140 + specifying the entire
  141 + :samp:`--password={password}`
  142 + option in the file. Syntax such as
  143 + :samp:`--password=@filename` won't work since
  144 + :samp:`@filename` is not recognized in the middle of
  145 + an argument.
  146 +
  147 +:samp:`--password-file={filename}`
  148 + Reads the first line from the specified file and uses it as the
  149 + password for accessing encrypted files.
  150 + :samp:`{filename}`
  151 + may be ``-`` to read the password from standard input. Note that, in
  152 + this case, the password is echoed and there is no prompt, so use with
  153 + caution.
  154 +
  155 +:samp:`--is-encrypted`
  156 + Silently exit with status 0 if the file is encrypted or status 2 if
  157 + the file is not encrypted. This is useful for shell scripts. Other
  158 + options are ignored if this is given. This option is mutually
  159 + exclusive with :samp:`--requires-password`. Both this
  160 + option and :samp:`--requires-password` exit with
  161 + status 2 for non-encrypted files.
  162 +
  163 +:samp:`--requires-password`
  164 + Silently exit with status 0 if a password (other than as supplied) is
  165 + required. Exit with status 2 if the file is not encrypted. Exit with
  166 + status 3 if the file is encrypted but requires no password or the
  167 + correct password has been supplied. This is useful for shell scripts.
  168 + Note that any supplied password is used when opening the file. When
  169 + used with a :samp:`--password` option, this option
  170 + can be used to check the correctness of the password. In that case,
  171 + an exit status of 3 means the file works with the supplied password.
  172 + This option is mutually exclusive with
  173 + :samp:`--is-encrypted`. Both this option and
  174 + :samp:`--is-encrypted` exit with status 2 for
  175 + non-encrypted files.
  176 +
  177 +:samp:`--verbose`
  178 + Increase verbosity of output. For now, this just prints some
  179 + indication of any file that it creates.
  180 +
  181 +:samp:`--progress`
  182 + Indicate progress while writing files.
  183 +
  184 +:samp:`--no-warn`
  185 + Suppress writing of warnings to stderr. If warnings were detected and
  186 + suppressed, :command:`qpdf` will still exit with exit
  187 + code 3. See also :samp:`--warning-exit-0`.
  188 +
  189 +:samp:`--warning-exit-0`
  190 + If warnings are found but no errors, exit with exit code 0 instead of 3.
  191 + When combined with :samp:`--no-warn`, the effect is
  192 + for :command:`qpdf` to completely ignore warnings.
  193 +
  194 +:samp:`--linearize`
  195 + Causes generation of a linearized (web-optimized) output file.
  196 +
  197 +:samp:`--replace-input`
  198 + If specified, the output file name should be omitted. This option
  199 + tells qpdf to replace the input file with the output. It does this by
  200 + writing to
  201 + :file:`{infilename}.~qpdf-temp#`
  202 + and, when done, overwriting the input file with the temporary file.
  203 + If there were any warnings, the original input is saved as
  204 + :file:`{infilename}.~qpdf-orig`.
  205 +
  206 +:samp:`--copy-encryption=file`
  207 + Encrypt the file using the same encryption parameters, including user
  208 + and owner password, as the specified file. Use
  209 + :samp:`--encryption-file-password` to specify a
  210 + password if one is needed to open this file. Note that copying the
  211 + encryption parameters from a file also copies the first half of
  212 + ``/ID`` from the file since this is part of the encryption
  213 + parameters.
  214 +
  215 +:samp:`--encryption-file-password=password`
  216 + If the file specified with :samp:`--copy-encryption`
  217 + requires a password, specify the password using this option. Note
  218 + that only one of the user or owner password is required. Both
  219 + passwords will be preserved since QPDF does not distinguish between
  220 + the two passwords. It is possible to preserve encryption parameters,
  221 + including the owner password, from a file even if you don't know the
  222 + file's owner password.
  223 +
  224 +:samp:`--allow-weak-crypto`
  225 + Starting with version 10.4, qpdf issues warnings when requested to
  226 + create files using RC4 encryption. This option suppresses those
  227 + warnings. In future versions of qpdf, qpdf will refuse to create
  228 + files with weak cryptography when this flag is not given. See :ref:`ref.weak-crypto` for additional details.
  229 +
  230 +:samp:`--encrypt options --`
  231 + Causes generation of an encrypted output file. Please see :ref:`ref.encryption-options` for details on how to specify
  232 + encryption parameters.
  233 +
  234 +:samp:`--decrypt`
  235 + Removes any encryption on the file. A password must be supplied if
  236 + the file is password protected.
  237 +
  238 +:samp:`--password-is-hex-key`
  239 + Overrides the usual computation/retrieval of the PDF file's
  240 + encryption key from user/owner password with an explicit
  241 + specification of the encryption key. When this option is specified,
  242 + the argument to the :samp:`--password` option is
  243 + interpreted as a hexadecimal-encoded key value. This only applies to
  244 + the password used to open the main input file. It does not apply to
  245 + other files opened by :samp:`--pages` or other
  246 + options or to files being written.
  247 +
  248 + Most users will never have a need for this option, and no standard
  249 + viewers support this mode of operation, but it can be useful for
  250 + forensic or investigatory purposes. For example, if a PDF file is
  251 + encrypted with an unknown password, a brute-force attack using the
  252 + key directly is sometimes more efficient than one using the password.
  253 + Also, if a file is heavily damaged, it may be possible to derive the
  254 + encryption key and recover parts of the file using it directly. To
  255 + expose the encryption key used by an encrypted file that you can open
  256 + normally, use the :samp:`--show-encryption-key`
  257 + option.
  258 +
  259 +:samp:`--suppress-password-recovery`
  260 + Ordinarily, qpdf attempts to automatically compensate for passwords
  261 + specified in the wrong character encoding. This option suppresses
  262 + that behavior. Under normal conditions, there are no reasons to use
  263 + this option. See :ref:`ref.unicode-passwords` for a
  264 + discussion.
  265 +
  266 +:samp:`--password-mode={mode}`
  267 + This option can be used to fine-tune how qpdf interprets Unicode
  268 + (non-ASCII) password strings passed on the command line. With the
  269 + exception of the :samp:`hex-bytes` mode, these only
  270 + apply to passwords provided when encrypting files. The
  271 + :samp:`hex-bytes` mode also applies to passwords
  272 + specified for reading files. For additional discussion of the
  273 + supported password modes and when you might want to use them, see
  274 + :ref:`ref.unicode-passwords`. The following modes
  275 + are supported:
  276 +
  277 + - :samp:`auto`: Automatically determine whether the
  278 + specified password is a properly encoded Unicode (UTF-8) string,
  279 + and transcode it as required by the PDF spec based on the type of
  280 + encryption being applied. On Windows starting with version 8.4.0,
  281 + and on almost all other modern platforms, incoming passwords will
  282 + be properly encoded in UTF-8, so this is almost always what you
  283 + want.
  284 +
  285 + - :samp:`unicode`: Tells qpdf that the incoming
  286 + password is UTF-8, overriding whatever its automatic detection
  287 + determines. The only difference between this mode and
  288 + :samp:`auto` is that qpdf will fail with an error
  289 + message if the password is not valid UTF-8 instead of falling back
  290 + to :samp:`bytes` mode with a warning.
  291 +
  292 + - :samp:`bytes`: Interpret the password as a literal
  293 + byte string. For non-Windows platforms, this is what versions of
  294 + qpdf prior to 8.4.0 did. For Windows platforms, there is no way to
  295 + specify strings of binary data on the command line directly, but
  296 + you can use the :samp:`@filename` option to do it,
  297 + in which case this option forces qpdf to respect the string of
  298 + bytes as provided. This option will allow you to encrypt PDF files
  299 + with passwords that will not be usable by other readers.
  300 +
  301 + - :samp:`hex-bytes`: Interpret the password as a
  302 + hex-encoded string. This provides a way to pass binary data as a
  303 + password on all platforms including Windows. As with
  304 + :samp:`bytes`, this option may allow creation of
  305 + files that can't be opened by other readers. This mode affects
  306 + qpdf's interpretation of passwords specified for decrypting files
  307 + as well as for encrypting them. It makes it possible to specify
  308 + strings that are encoded in some manner other than the system's
  309 + default encoding.
  310 +
  311 +:samp:`--rotate=[+|-]angle[:page-range]`
  312 + Apply rotation to specified pages. The
  313 + :samp:`page-range` portion of the option value has
  314 + the same format as page ranges in :ref:`ref.page-selection`. If the page range is omitted, the
  315 + rotation is applied to all pages. The :samp:`angle`
  316 + portion of the parameter may be 0, 90, 180, or 270. If
  317 + preceded by :samp:`+` or :samp:`-`,
  318 + the angle is added to or subtracted from the specified pages'
  319 + original rotations. This is almost always what you want. Otherwise
  320 + the pages' rotations are set to the exact value, which may cause the
  321 + appearances of the pages to be inconsistent, especially for scans.
  322 + For example, the command :command:`qpdf in.pdf out.pdf
  323 + --rotate=+90:2,4,6 --rotate=180:7-8` would rotate pages
  324 + 2, 4, and 6 90 degrees clockwise from their original rotation and
  325 + force the rotation of pages 7 through 8 to 180 degrees regardless of
  326 + their original rotation, and the command :command:`qpdf in.pdf
  327 + out.pdf --rotate=+180` would rotate all pages by 180
  328 + degrees.
  329 +
  330 +:samp:`--keep-files-open={[yn]}`
  331 + This option controls whether qpdf keeps individual files open while
  332 + merging. Prior to version 8.1.0, qpdf always kept all files open, but
  333 + this meant that the number of files that could be merged was limited
  334 + by the operating system's open file limit. Version 8.1.0 opened files
  335 + as they were referenced and closed them after each read, but this
  336 + caused a major performance impact. Version 8.2.0 optimized the
  337 + performance but did so in a way that, for local file systems, there
  338 + was a small but unavoidable performance hit, but for networked file
  339 + systems, the performance impact could be very high. Starting with
  340 + version 8.2.1, the default behavior is that files are kept open if no
  341 + more than 200 files are specified, but this default behavior can be
  342 + explicitly overridden with the
  343 + :samp:`--keep-files-open` flag. If you are merging
  344 + more than 200 files but fewer than the operating system's max open
  345 + files limit, you may want to use
  346 + :samp:`--keep-files-open=y`, especially if working
  347 + over a networked file system. If you are using a local file system
  348 + where the overhead is low, you might sometimes merge more files than the
  349 + OS limit allows from a script, and you are not worried about a
  350 + few seconds additional processing time, you may want to specify
  351 + :samp:`--keep-files-open=n`. The threshold for
  352 + switching may be changed from the default 200 with the
  353 + :samp:`--keep-files-open-threshold` option.
  354 +
  355 +:samp:`--keep-files-open-threshold={count}`
  356 + If specified, overrides the default value of 200 used as the
  357 + threshold for qpdf deciding whether or not to keep files open. See
  358 + :samp:`--keep-files-open` for details.
  359 +
  360 +:samp:`--pages options --`
  361 + Select specific pages from one or more input files. See :ref:`ref.page-selection` for details on how to do
  362 + page selection (splitting and merging).
  363 +
  364 +:samp:`--collate={n}`
  365 + When specified, collate rather than concatenate pages from files
  366 + specified with :samp:`--pages`. With a numeric
  367 + argument, collate in groups of :samp:`{n}`.
  368 + The default is 1. See :ref:`ref.page-selection` for additional details.
  369 +
  370 +:samp:`--flatten-rotation`
  371 + For each page that is rotated using the ``/Rotate`` key in the page's
  372 + dictionary, remove the ``/Rotate`` key and implement the identical
  373 + rotation semantics by modifying the page's contents. This option can
  374 + be useful to prepare files for buggy PDF applications that don't
  375 + properly handle rotated pages.
  376 +
  377 +:samp:`--split-pages=[n]`
  378 + Write each group of :samp:`n` pages to a separate
  379 + output file. If :samp:`n` is not specified, create
  380 + single pages. Output file names are generated as follows:
  381 +
  382 + - If the string ``%d`` appears in the output file name, it is
  383 + replaced with a range of zero-padded page numbers starting from 1.
  384 +
  385 + - Otherwise, if the output file name ends in
  386 + :file:`.pdf` (case insensitive), a zero-padded
  387 + page range, preceded by a dash, is inserted before the file
  388 + extension.
  389 +
  390 + - Otherwise, the file name is appended with a zero-padded page range
  391 + preceded by a dash.
  392 +
  393 + Page ranges are a single number in the case of single-page groups or
  394 + two numbers separated by a dash otherwise. For example, if
  395 + :file:`infile.pdf` has 12 pages
  396 +
  397 + - :command:`qpdf --split-pages infile.pdf %d-out`
  398 + would generate files :file:`01-out` through
  399 + :file:`12-out`
  400 +
  401 + - :command:`qpdf --split-pages=2 infile.pdf
  402 + outfile.pdf` would generate files
  403 + :file:`outfile-01-02.pdf` through
  404 + :file:`outfile-11-12.pdf`
  405 +
  406 + - :command:`qpdf --split-pages infile.pdf
  407 + something.else` would generate files
  408 + :file:`something.else-01` through
  409 + :file:`something.else-12`
  410 +
  411 + Note that outlines, threads, and other global features of the
  412 + original PDF file are not preserved. For each page of output, this
  413 + option creates an empty PDF and copies a single page from the output
  414 + into it. If you require the global data, you will have to run
  415 + :command:`qpdf` with the
  416 + :samp:`--pages` option once for each file. Using
  417 + :samp:`--split-pages` is much faster if you don't
  418 + require the global data.
  419 +
  420 +:samp:`--overlay options --`
  421 + Overlay pages from another file onto the output pages. See :ref:`ref.overlay-underlay` for details on
  422 + overlay/underlay.
  423 +
  424 +:samp:`--underlay options --`
  425 + Underlay pages from another file beneath the output pages. See :ref:`ref.overlay-underlay` for details on
  426 + overlay/underlay.
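
The output file naming rules described for :samp:`--split-pages` above can be sketched in Python as follows. This is an illustration rather than qpdf's actual implementation: the function name is hypothetical, the padding width is assumed to be the number of digits in the total page count, only the first ``%d`` is replaced, and a short final group is assumed to keep the two-number form.

```python
def split_output_names(outfile, total_pages, n=1):
    """List the file names --split-pages=n would be expected to produce."""
    width = len(str(total_pages))  # assumed zero-padding width
    names = []
    for first in range(1, total_pages + 1, n):
        last = min(first + n - 1, total_pages)
        page_range = f"{first:0{width}d}"
        if n > 1:
            # Multi-page groups use two numbers separated by a dash.
            page_range += f"-{last:0{width}d}"
        if "%d" in outfile:
            # Rule 1: %d is replaced by the page range.
            name = outfile.replace("%d", page_range, 1)
        elif outfile.lower().endswith(".pdf"):
            # Rule 2: insert "-range" before the .pdf extension.
            name = f"{outfile[:-4]}-{page_range}{outfile[-4:]}"
        else:
            # Rule 3: append "-range" to the file name.
            name = f"{outfile}-{page_range}"
        names.append(name)
    return names
```

For a 12-page input, ``split_output_names("outfile.pdf", 12, 2)`` starts with ``outfile-01-02.pdf`` and ends with ``outfile-11-12.pdf``, matching the example given for :samp:`--split-pages=2`.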
  427 +
  428 +Password-protected files may be opened by specifying a password. By
  429 +default, qpdf will preserve any encryption data associated with a file.
  430 +If :samp:`--decrypt` is specified, qpdf will attempt to
  431 +remove any encryption information. If :samp:`--encrypt`
  432 +is specified, qpdf will replace the document's encryption parameters
  433 +with whatever is specified.
  434 +
  435 +Note that qpdf does not obey encryption restrictions already imposed on
  436 +the file. Doing so would be meaningless since qpdf can be used to remove
  437 +encryption from the file entirely. This functionality is not intended to
  438 +be used for bypassing copyright restrictions or other restrictions
  439 +placed on files by their producers.
  440 +
  441 +Prior to 8.4.0, in the case of passwords that contain characters that
  442 +fall outside of 7-bit US-ASCII, qpdf left the burden of supplying
  443 +properly encoded encryption and decryption passwords to the user.
  444 +Starting in qpdf 8.4.0, qpdf does this automatically in most cases. For
  445 +an in-depth discussion, please see :ref:`ref.unicode-passwords`. Previous versions of this manual
  446 +described workarounds using the :command:`iconv` command.
  447 +Such workarounds are no longer required or recommended with qpdf 8.4.0.
  448 +However, for backward compatibility, qpdf attempts to detect those
  449 +workarounds and do the right thing in most cases.
  450 +
  451 +.. _ref.encryption-options:
  452 +
  453 +Encryption Options
  454 +------------------
  455 +
  456 +To change the encryption parameters of a file, use the :samp:`--encrypt` flag.
  457 +The syntax is
  458 +
  459 +::
  460 +
  461 + --encrypt user-password owner-password key-length [ restrictions ] --
  462 +
  463 +Note that ":samp:`--`" terminates parsing of encryption
  464 +flags and must be present even if no restrictions are present.
  465 +
  466 +Either or both of the user password and the owner password may be empty
  467 +strings. Starting in qpdf 10.2, qpdf defaults to not allowing creation
  468 +of PDF files with a non-empty user password, an empty owner password,
  469 +and a 256-bit key since such files can be opened with no password. If
  470 +you want to create such files, specify the encryption option
  471 +:samp:`--allow-insecure`, as described below.
  472 +
  473 +The value for
  474 +:samp:`{key-length}` may
  475 +be 40, 128, or 256. The restriction flags are dependent upon key length.
  476 +When no additional restrictions are given, the default is to be fully
  477 +permissive.
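
A script that assembles the :samp:`--encrypt` portion of a command line might look like the sketch below. The function name and its parameters are hypothetical. The checks only mirror the rules stated above: the key length must be 40, 128, or 256, the insecure combination requires :samp:`--allow-insecure`, and the terminating :samp:`--` is always present.

```python
def build_encrypt_args(user_password, owner_password, key_length,
                       restrictions=(), allow_insecure=False):
    """Return the --encrypt ... -- argument list (hypothetical helper)."""
    if key_length not in (40, 128, 256):
        raise ValueError("key length must be 40, 128, or 256")

    # Non-empty user password + empty owner password + 256-bit key
    # yields a file that can be opened with no password.
    insecure = (key_length == 256 and user_password != ""
                and owner_password == "")
    if insecure and not allow_insecure:
        raise ValueError("insecure combination; pass allow_insecure=True")

    args = ["--encrypt", user_password, owner_password, str(key_length)]
    if insecure:
        args.append("--allow-insecure")
    args.extend(restrictions)
    args.append("--")  # required even when there are no restrictions
    return args
```

For example, ``build_encrypt_args("", "owner", 256)`` returns ``["--encrypt", "", "owner", "256", "--"]``.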
  478 +
  479 +If :samp:`{key-length}`
  480 +is 40, the following restriction options are available:
  481 +
  482 +:samp:`--print=[yn]`
  483 + Determines whether or not to allow printing.
  484 +
  485 +:samp:`--modify=[yn]`
  486 + Determines whether or not to allow document modification.
  487 +
  488 +:samp:`--extract=[yn]`
  489 + Determines whether or not to allow text/image extraction.
  490 +
  491 +:samp:`--annotate=[yn]`
  492 + Determines whether or not to allow comments and form fill-in and
  493 + signing.
  494 +
  495 +If :samp:`{key-length}`
  496 +is 128, the following restriction options are available:
  497 +
  498 +:samp:`--accessibility=[yn]`
  499 + Determines whether or not to allow accessibility to the visually
  500 + impaired. The qpdf library disregards this field when AES is used or
  501 + when 256-bit encryption is used. You should really never disable
  502 + accessibility, but qpdf lets you do it in case you need to configure
  503 + a file this way for testing purposes. The PDF spec says that
  504 + conforming readers should disregard this permission and always allow
  505 + accessibility.
  506 +
  507 +:samp:`--extract=[yn]`
  508 + Determines whether or not to allow text/graphic extraction.
  509 +
  510 +:samp:`--assemble=[yn]`
  511 + Determines whether document assembly (rotation and reordering of
  512 + pages) is allowed.
  513 +
  514 +:samp:`--annotate=[yn]`
  515 + Determines whether modifying annotations is allowed. This includes
  516 + adding comments and filling in form fields. Also allows editing of
  517 + form fields if :samp:`--modify-other=y` is given.
  518 +
  519 +:samp:`--form=[yn]`
  520 + Determines whether filling form fields is allowed.
  521 +
  522 +:samp:`--modify-other=[yn]`
  523 + Allow all document editing except those controlled separately by the
  524 + :samp:`--assemble`,
  525 + :samp:`--annotate`, and
  526 + :samp:`--form` options.
  527 +
  528 +:samp:`--print={print-opt}`
  529 + Controls printing access.
  530 + :samp:`{print-opt}`
  531 + may be one of the following:
  532 +
  533 + - :samp:`full`: allow full printing
  534 +
  535 + - :samp:`low`: allow low-resolution printing only
  536 +
  537 + - :samp:`none`: disallow printing
  538 +
  539 +:samp:`--modify={modify-opt}`
  540 + Controls modify access. This way of controlling modify access has
  541 + less granularity than new options added in qpdf 8.4.
  542 + :samp:`{modify-opt}`
  543 + may be one of the following:
  544 +
  545 + - :samp:`all`: allow full document modification
  546 +
  547 + - :samp:`annotate`: allow comment authoring, form
  548 + operations, and document assembly
  549 +
  550 + - :samp:`form`: allow form field fill-in and signing
  551 + and document assembly
  552 +
  553 + - :samp:`assembly`: allow document assembly only
  554 +
  555 + - :samp:`none`: allow no modifications
  556 +
  557 + Using the :samp:`--modify` option does not allow you
  558 + to create certain combinations of permissions such as allowing form
  559 + filling but not allowing document assembly. Starting with qpdf 8.4,
  560 + you can either just use the other options to control fields
  561 + individually, or you can use something like :samp:`--modify=form
  562 + --assemble=n` to fine-tune.
  563 +
  564 +:samp:`--cleartext-metadata`
  565 + If specified, any metadata stream in the document will be left
  566 + unencrypted even if the rest of the document is encrypted. This also
  567 + forces the PDF version to be at least 1.5.
  568 +
  569 +:samp:`--use-aes=[yn]`
  570 + If :samp:`--use-aes=y` is specified, AES encryption
  571 + will be used instead of RC4 encryption. This forces the PDF version
  572 + to be at least 1.6.
  573 +
  574 +:samp:`--allow-insecure`
  575 + From qpdf 10.2, qpdf defaults to not allowing creation of PDF files
  576 + where the user password is non-empty, the owner password is empty,
  577 + and a 256-bit key is in use. Files created in this way are insecure
  578 + since they can be opened without a password. Users would ordinarily
  579 + never want to create such files. If you are using qpdf to
  580 + intentionally create strange files for testing (a definite valid use
  581 + of qpdf!), this option allows you to create such insecure files.
  582 +
  583 +:samp:`--force-V4`
  584 + Use of this option forces the ``/V`` and ``/R`` parameters in the
  585 + document's encryption dictionary to be set to the value ``4``. As
  586 + qpdf will automatically do this when required, there is no reason to
  587 + ever use this option. It exists primarily for use in testing qpdf
  588 + itself. This option also forces the PDF version to be at least 1.5.
  589 +
  590 +If :samp:`{key-length}`
  591 +is 256, the minimum PDF version is 1.7 with extension level 8, and the
  592 +AES-based encryption format used is the PDF 2.0 encryption method
  593 +supported by Acrobat X. The same options are available as with 128 bits
  594 +with the following exceptions:
  595 +
  596 +:samp:`--use-aes`
  597 + This option is not available with 256-bit keys. AES is always used
  598 + with 256-bit encryption keys.
  599 +
  600 +:samp:`--force-V4`
  601 + This option is not available with 256-bit keys.
  602 +
  603 +:samp:`--force-R5`
  604 + If specified, qpdf sets the minimum version to 1.7 at extension level
  605 + 3 and writes the deprecated encryption format used by Acrobat version
  606 + IX. This option should not be used in practice to generate PDF files
  607 + that will be in general use, but it can be useful to generate files
  608 + if you are trying to test proper support in another application for
  609 + PDF files encrypted in this way.
  610 +
  611 +The default for each permission option is to be fully permissive.
  612 +
  613 +.. _ref.page-selection:
  614 +
  615 +Page Selection Options
  616 +----------------------
  617 +
  618 +Starting with qpdf 3.0, it is possible to split and merge PDF files by
  619 +selecting pages from one or more input files. Whatever file is given as
  620 +the primary input file is used as the starting point, but its pages are
  621 +replaced with pages as specified.
  622 +
  623 +::
  624 +
  625 + --pages input-file [ --password=password ] [ page-range ] [ ... ] --
  626 +
  627 +Multiple input files may be specified. Each one is given as the name of
  628 +the input file, an optional password (if required to open the file), and
  629 +the range of pages. Note that ":samp:`--`" terminates
  630 +parsing of page selection flags.
  631 +
  632 +Starting with qpdf 8.4, the special input file name
  633 +":file:`.`" can be used as a shortcut for the
  634 +primary input filename.
  635 +
  636 +For each file that pages should be taken from, specify the file, a
  637 +password needed to open the file (if any), and a page range. The
  638 +password needs to be given only once per file. If any of the input files
  639 +are the same as the primary input file or the file used to copy
  640 +encryption parameters (if specified), you do not need to repeat the
  641 +password here. The same file can be repeated multiple times. If a file
  642 +that is repeated has a password, the password only has to be given the
  643 +first time. All non-page data (info, outlines, page numbers, etc.) are
  644 +taken from the primary input file. To discard these, use
  645 +:samp:`--empty` as the primary input.
  646 +
  647 +Starting with qpdf 5.0.0, it is possible to omit the page range. If qpdf
  648 +sees a value in the place where it expects a page range and that value
  649 +is not a valid range but is a valid file name, qpdf will implicitly use
  650 +the range ``1-z``, meaning that it will include all pages in the file.
  651 +This makes it possible to easily combine all pages in a set of files
  652 +with a command like :command:`qpdf --empty out.pdf --pages \*.pdf
  653 +--`.
  654 +
  655 +The page range is a set of numbers separated by commas, ranges of
  656 +numbers separated by dashes, or combinations of those. The character "z"
  657 +represents the last page. A number preceded by an "r" indicates to count
  658 +from the end, so ``r3-r1`` would be the last three pages of the
  659 +document. Pages can appear in any order. Ranges can appear with a high
  660 +number followed by a low number, which causes the pages to appear in
  661 +reverse. Numbers may be repeated in a page range. A page range may be
  662 +optionally appended with ``:even`` or ``:odd`` to indicate only the even
  663 +or odd pages in the given range. Note that even and odd refer to the
  664 +positions within the specified range, not whether the original number
  665 +is even or odd.
  666 +
  667 +Example page ranges:
  668 +
  669 +- ``1,3,5-9,15-12``: pages 1, 3, 5, 6, 7, 8, 9, 15, 14, 13, and 12 in
  670 + that order.
  671 +
  672 +- ``z-1``: all pages in the document in reverse
  673 +
  674 +- ``r3-r1``: the last three pages of the document
  675 +
  676 +- ``r1-r3``: the last three pages of the document in reverse order
  677 +
  678 +- ``1-20:even``: even pages from 2 to 20
  679 +
  680 +- ``5,7-9,12:odd``: pages 5, 8, and 12, which are the pages in odd
  681 + positions from among the original range, which represents pages 5, 7,
  682 + 8, 9, and 12.
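The range rules above can be sketched in a few lines of Python. This is only an illustration of the documented semantics, not qpdf's own parser; the helper name ``parse_page_range`` and the ``num_pages`` argument are assumptions made for the example.

```python
def parse_page_range(spec, num_pages):
    """Expand a page-range string into a list of 1-based page numbers."""
    # Split off an optional ":even" / ":odd" suffix.
    parity = None
    if ":" in spec:
        spec, parity = spec.rsplit(":", 1)
        if parity not in ("even", "odd"):
            raise ValueError(f"unknown modifier: {parity}")

    def page_number(token):
        # "z" is the last page; "rN" counts from the end (r1 is the last page).
        if token == "z":
            n = num_pages
        elif token.startswith("r"):
            n = num_pages - int(token[1:]) + 1
        else:
            n = int(token)
        if not 1 <= n <= num_pages:
            raise ValueError(f"page out of range: {token}")
        return n

    pages = []
    for part in spec.split(","):
        if "-" in part:
            first, last = (page_number(t) for t in part.split("-"))
            step = 1 if first <= last else -1  # high-low ranges run in reverse
            pages.extend(range(first, last + step, step))
        else:
            pages.append(page_number(part))

    # Parity refers to positions within the expanded list, not page numbers.
    if parity == "odd":
        pages = pages[0::2]
    elif parity == "even":
        pages = pages[1::2]
    return pages
```

For instance, ``parse_page_range("5,7-9,12:odd", 12)`` first expands to pages 5, 7, 8, 9, and 12 and then keeps the first, third, and fifth entries.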
  683 +
  684 +Starting in qpdf version 8.3, you can specify the
  685 +:samp:`--collate` option. Note that this option is
  686 +specified outside of :samp:`--pages ... --`. When
  687 +:samp:`--collate` is specified, it changes the meaning
  688 +of :samp:`--pages` so that the specified files, as
  689 +modified by page ranges, are collated rather than concatenated. For
  690 +example, if you add the files :file:`odd.pdf` and
  691 +:file:`even.pdf` containing odd and even pages of a
  692 +document respectively, you could run :command:`qpdf --collate odd.pdf
  693 +--pages odd.pdf even.pdf -- all.pdf` to collate the pages.
  694 +This would pick page 1 from odd, page 1 from even, page 2 from odd, page
  695 +2 from even, etc. until all pages have been included. Any number of
  696 +files and page ranges can be specified. If any file has fewer pages,
  697 +that file is just skipped when its pages have all been included. For
  698 +example, if you ran :command:`qpdf --collate --empty --pages a.pdf
  699 +1-5 b.pdf 6-4 c.pdf r1 -- out.pdf`, you would get the
  700 +following pages in this order:
  701 +
  702 +- a.pdf page 1
  703 +
  704 +- b.pdf page 6
  705 +
  706 +- c.pdf last page
  707 +
  708 +- a.pdf page 2
  709 +
  710 +- b.pdf page 5
  711 +
  712 +- a.pdf page 3
  713 +
  714 +- b.pdf page 4
  715 +
  716 +- a.pdf page 4
  717 +
  718 +- a.pdf page 5
  719 +
  720 +Starting in qpdf version 10.2, you may specify a numeric argument to
  721 +:samp:`--collate`. With
  722 +:samp:`--collate={n}`,
  723 +pull groups of :samp:`{n}` pages from each file,
  724 +again, stopping when there are no more pages. For example, if you ran
  725 +:command:`qpdf --collate=2 --empty --pages a.pdf 1-5 b.pdf 6-4 c.pdf
  726 +r1 -- out.pdf`, you would get the following pages in this
  727 +order:
  728 +
  729 +- a.pdf page 1
  730 +
  731 +- a.pdf page 2
  732 +
  733 +- b.pdf page 6
  734 +
  735 +- b.pdf page 5
  736 +
  737 +- c.pdf last page
  738 +
  739 +- a.pdf page 3
  740 +
  741 +- a.pdf page 4
  742 +
  743 +- b.pdf page 4
  744 +
  745 +- a.pdf page 5
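The ordering produced by :samp:`--collate` and :samp:`--collate={n}` can be modeled as a round-robin over the selected page lists. The following is a minimal sketch of that behavior as described above, not qpdf's implementation; the function name ``collate`` and the string labels used for pages are assumptions for illustration.

```python
def collate(page_lists, n=1):
    """Interleave several page lists, taking n pages from each in turn."""
    if n < 1:
        raise ValueError("group size must be at least 1")
    positions = [0] * len(page_lists)
    result = []
    # Keep making rounds until every list has been fully consumed.
    while any(pos < len(pages) for pos, pages in zip(positions, page_lists)):
        for i, pages in enumerate(page_lists):
            # An exhausted list contributes an empty slice, so it is skipped.
            result.extend(pages[positions[i]:positions[i] + n])
            positions[i] += n
    return result


# Page selections from the examples above: a.pdf 1-5, b.pdf 6-4, c.pdf r1.
a = ["a1", "a2", "a3", "a4", "a5"]
b = ["b6", "b5", "b4"]
c = ["c-last"]
print(collate([a, b, c]))
print(collate([a, b, c], n=2))
```

With these inputs, the two calls reproduce the two page orders listed above.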
  746 +
  747 +Starting in qpdf version 8.3, when you split and merge files, any page
  748 +labels (page numbers) are preserved in the final file. It is expected
  749 +that more document features will be preserved by splitting and merging.
  750 +In the meantime, semantics of splitting and merging vary across
  751 +features. For example, the document's outlines (bookmarks) point to
  752 +actual page objects, so if you select some pages and not others,
  753 +bookmarks that point to pages that are in the output file will work, and
  754 +remaining bookmarks will not work. A future version of
  755 +:command:`qpdf` may do a better job at handling these
  756 +issues. (Note that the qpdf library already contains all of the APIs
  757 +required in order to implement this in your own application if you need
  758 +it.) In the meantime, you can always use
  759 +:samp:`--empty` as the primary input file to avoid
  760 +copying all of that from the first file. For example, to take pages 1
  761 +through 5 from :file:`infile.pdf` while preserving
  762 +all metadata associated with that file, you could use
  763 +
  764 +::
  765 +
  766 + qpdf infile.pdf --pages . 1-5 -- outfile.pdf
  767 +
  768 +If you wanted pages 1 through 5 from
  769 +:file:`infile.pdf` but you wanted the rest of the
  770 +metadata to be dropped, you could instead run
  771 +
  772 +::
  773 +
  774 + qpdf --empty --pages infile.pdf 1-5 -- outfile.pdf
  775 +
  776 +If you wanted to take pages 1 through 5 from
  777 +:file:`file1.pdf` and pages 11 through 15 from
  778 +:file:`file2.pdf` in reverse, taking document-level
  779 +metadata from :file:`file2.pdf`, you would run
  780 +
  781 +::
  782 +
  783 + qpdf file2.pdf --pages file1.pdf 1-5 . 15-11 -- outfile.pdf
  784 +
  785 +If, for some reason, you wanted to take the first page of an encrypted
  786 +file called :file:`encrypted.pdf` with password
  787 +``pass`` and repeat it twice in an output file, and if you wanted to
  788 +drop document-level metadata but preserve encryption, you would use
  789 +
  790 +::
  791 +
  792 + qpdf --empty --copy-encryption=encrypted.pdf --encryption-file-password=pass
  793 + --pages encrypted.pdf --password=pass 1 ./encrypted.pdf --password=pass 1 --
  794 + outfile.pdf
  795 +
  796 +Note that we had to specify the password all three times because giving
  797 +a password as :samp:`--encryption-file-password` doesn't
  798 +count for page selection, and as far as qpdf is concerned,
  799 +:file:`encrypted.pdf` and
  800 +:file:`./encrypted.pdf` are separate files. These
  801 +are all corner cases that most users should hopefully never have to be
  802 +bothered with.
  803 +
  804 +Prior to version 8.4, it was not possible to specify the same page from
  805 +the same file directly more than once, and the workaround of specifying
  806 +the same file in more than one way was required. Version 8.4 removes
  807 +this limitation, but there is still a valid use case. When you specify
  808 +the same page from the same file more than once, qpdf will share objects
  809 +between the pages. If you are going to do further manipulation on the
  810 +file and need the two instances of the same original page to be deep
  811 +copies, then you can specify the file in two different ways. For example,
  812 +:command:`qpdf in.pdf --pages . 1 ./in.pdf 1 -- out.pdf`
  813 +would create a file with two copies of the first page of the input, and
  814 +the two copies would not share any objects in common. This includes fonts,
  815 +images, and anything else the page references.
  816 +
  817 +.. _ref.overlay-underlay:
  818 +
  819 +Overlay and Underlay Options
  820 +----------------------------
  821 +
  822 +Starting with qpdf 8.4, it is possible to overlay or underlay pages from
  823 +other files onto the output generated by qpdf. Specify overlay or
  824 +underlay as follows:
  825 +
  826 +::
  827 +
  828 + { --overlay | --underlay } file [ options ] --
  829 +
  830 +Overlay and underlay options are processed late, so they can be combined
  831 +with other options like merging and will apply to the final output. The
  832 +:samp:`--overlay` and :samp:`--underlay`
  833 +options work the same way, except underlay pages are drawn underneath
  834 +the page to which they are applied, possibly obscured by the original
  835 +page, and overlay pages are drawn on top of the page to which they are
  836 +applied, possibly obscuring the page. You can combine overlay and
  837 +underlay.
  838 +
  839 +The default behavior of overlay and underlay is that pages are taken
  840 +from the overlay/underlay file in sequence and applied to corresponding
  841 +pages in the output until there are no more output pages. If the overlay
  842 +or underlay file runs out of pages, remaining output pages are left
  843 +alone. This behavior can be modified by options, which are provided
  844 +between the :samp:`--overlay` or
  845 +:samp:`--underlay` flag and the
  846 +:samp:`--` option. The following options are supported:
  847 +
  848 +- :samp:`--password=password`: supply a password if the
  849 + overlay/underlay file is encrypted.
  850 +
  851 +- :samp:`--to=page-range`: a range of pages in the same
  852 + form as described in :ref:`ref.page-selection`
  853 + indicates which pages in the output should have the overlay/underlay
  854 + applied. If not specified, overlay/underlay are applied to all pages.
  855 +
  856 +- :samp:`--from=[page-range]`: a range of pages that
  857 + specifies which pages in the overlay/underlay file will be used for
  858 + overlay or underlay. If not specified, all pages will be used. This
  859 + can be explicitly specified to be empty if
  860 + :samp:`--repeat` is used.
  861 +
  862 +- :samp:`--repeat=page-range`: an optional range of
  863 + pages that specifies which pages in the overlay/underlay file will be
  864 + repeated after the "from" pages are used up. If you want to repeat a
  865 + range of pages starting at the beginning, you can explicitly use
  866 + :samp:`--from=`.
  867 +
  868 +Here are some examples.
  869 +
  870 +- :command:`--overlay o.pdf --to=1-5 --from=1-3 --repeat=4
  871 + --`: overlay the first three pages from file
  872 + :file:`o.pdf` onto the first three pages of the
  873 + output, then overlay page 4 from :file:`o.pdf`
  874 + onto pages 4 and 5 of the output. Leave remaining output pages
  875 + untouched.
  876 +
  877 +- :command:`--underlay footer.pdf --from= --repeat=1,2
  878 + --`: Underlay page 1 of
  879 + :file:`footer.pdf` on all odd output pages, and
  880 + underlay page 2 of :file:`footer.pdf` on all even
  881 + output pages.
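The interaction of :samp:`--to`, :samp:`--from`, and :samp:`--repeat` can be sketched as a mapping from output pages to overlay/underlay pages. This is an illustrative model of the behavior described above, not qpdf's code; ``overlay_plan`` is a hypothetical helper, and the page lists are assumed to be already expanded from their range strings.

```python
from itertools import chain, cycle


def overlay_plan(to_pages, from_pages, repeat_pages=()):
    """Map each targeted output page to the overlay page drawn on it."""
    # Use the "from" pages once, then cycle through the "repeat" pages.
    # With no repeat pages, the source simply runs out and the remaining
    # output pages get no entry, which means they are left alone.
    source = chain(from_pages, cycle(repeat_pages))
    return dict(zip(to_pages, source))


# --overlay o.pdf --to=1-5 --from=1-3 --repeat=4 --
print(overlay_plan([1, 2, 3, 4, 5], [1, 2, 3], [4]))

# --underlay footer.pdf --from= --repeat=1,2 --   (output assumed to have 5 pages)
print(overlay_plan([1, 2, 3, 4, 5], [], [1, 2]))
```

In the second call, the empty "from" list corresponds to :samp:`--from=`, so the repeated pages are applied starting at the first output page.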
  882 +
  883 +.. _ref.attachments:
  884 +
  885 +Embedded Files/Attachments Options
  886 +----------------------------------
  887 +
  888 +Starting with qpdf 10.2, you can work with file attachments in PDF files
  889 +from the command line. The following options are available:
  890 +
  891 +:samp:`--list-attachments`
  892 + Show the "key" and stream number for embedded files. With
  893 + :samp:`--verbose`, additional information, including
  894 + preferred file name, description, dates, and more, is also displayed.
  895 + The key is usually but not always equal to the file name, and is
  896 + needed by some of the other options.
  897 +
  898 +:samp:`--show-attachment={key}`
  899 + Write the contents of the specified attachment to standard output as
  900 + binary data. The key should match one of the keys shown by
  901 + :samp:`--list-attachments`. If specified multiple
  902 + times, only the last attachment will be shown.
  903 +
  904 +:samp:`--add-attachment {file} {options} --`
  905 + Add or replace an attachment with the contents of
  906 + :samp:`{file}`. This may be specified more
  907 + than once. The following additional options may appear before the
  908 + ``--`` that ends this option:
  909 +
  910 + :samp:`--key={key}`
  911 + The key to use to register the attachment in the embedded files
  912 + table. Defaults to the last path element of
  913 + :samp:`{file}`.
  914 +
  915 + :samp:`--filename={name}`
  916 + The file name to be used for the attachment. This is what is
  917 + usually displayed to the user and is the name most graphical PDF
  918 + viewers will use when saving a file. It defaults to the last path
  919 + element of :samp:`{file}`.
  920 +
  921 + :samp:`--creationdate={date}`
  922 + The attachment's creation date in PDF format; defaults to the
  923 + current time. The date format is explained below.
  924 +
  925 + :samp:`--moddate={date}`
  926 + The attachment's modification date in PDF format; defaults to the
  927 + current time. The date format is explained below.
  928 +
  929 + :samp:`--mimetype={type/subtype}`
  930 + The mime type for the attachment, e.g. ``text/plain`` or
  931 + ``application/pdf``. Note that the mimetype appears in a field
  932 + called ``/Subtype`` in the PDF but actually includes the full type
  933 + and subtype of the mime type.
  934 +
  935 + :samp:`--description={"text"}`
  936 + Descriptive text for the attachment, displayed by some PDF
  937 + viewers.
  938 +
  939 + :samp:`--replace`
  940 + Indicates that any existing attachment with the same key should be
  941 + replaced by the new attachment. Otherwise,
  942 + :command:`qpdf` gives an error if an attachment
  943 + with that key is already present.
  944 +
  945 +:samp:`--remove-attachment={key}`
  946 + Remove the specified attachment. This doesn't only remove the
  947 + attachment from the embedded files table but also clears out the file
  948 + specification. That means that any potential internal links to the
  949 + attachment will be broken. This option may be specified multiple
  950 + times. Run with :samp:`--verbose` to see status of
  951 + the removal.
  952 +
  953 +:samp:`--copy-attachments-from {file} {options} --`
  954 + Copy attachments from another file. This may be specified more than
  955 + once. The following additional options may appear before the ``--``
  956 + that ends this option:
  957 +
  958 + :samp:`--password={password}`
  959 + If required, the password needed to open
  960 + :samp:`{file}`
  961 +
  962 + :samp:`--prefix={prefix}`
  963 + Only required if the file from which attachments are being copied
  964 + has attachments with keys that conflict with attachments already
  965 + in the file. In this case, the specified prefix will be prepended
  966 + to each key. This affects only the key in the embedded files
  967 + table, not the file name. The PDF specification doesn't preclude
  968 + multiple attachments having the same file name.
  969 +
  970 +When a date is required, the date should conform to the PDF date format
  971 +specification, which is
  972 +``D:``\ :samp:`{yyyymmddhhmmss<z>}`, where
  973 +:samp:`{<z>}` is either ``Z`` for UTC or a
  974 +timezone offset in the form :samp:`{-hh'mm'}` or
  975 +:samp:`{+hh'mm'}`. Examples:
  976 +``D:20210207161528-05'00'``, ``D:20210207211528Z``.
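If you generate these dates from a script, a small helper along the following lines may be convenient. It is a sketch that follows the format described above; the function name ``pdf_date`` is an assumption, and it is not part of qpdf.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(dt):
    """Format a timezone-aware datetime as a PDF date string."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("datetime must be timezone-aware")
    stamp = dt.strftime("D:%Y%m%d%H%M%S")
    minutes = int(offset.total_seconds()) // 60
    if minutes == 0:
        return stamp + "Z"  # UTC is written as a literal Z
    sign = "+" if minutes > 0 else "-"
    hours, mins = divmod(abs(minutes), 60)
    return f"{stamp}{sign}{hours:02d}'{mins:02d}'"


eastern = timezone(timedelta(hours=-5))
print(pdf_date(datetime(2021, 2, 7, 16, 15, 28, tzinfo=eastern)))
print(pdf_date(datetime(2021, 2, 7, 21, 15, 28, tzinfo=timezone.utc)))
```

The two calls produce the two example dates shown above. The result can be passed as the value of :samp:`--creationdate` or :samp:`--moddate`.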
  977 +
  978 +.. _ref.advanced-parsing:
  979 +
  980 +Advanced Parsing Options
  981 +------------------------
  982 +
  983 +These options control aspects of how qpdf reads PDF files. Mostly these
  984 +are of use to people who are working with damaged files. There is little
  985 +reason to use these options unless you are trying to solve specific
  986 +problems. The following options are available:
  987 +
  988 +:samp:`--suppress-recovery`
  989 + Prevents qpdf from attempting to recover damaged files.
  990 +
  991 +:samp:`--ignore-xref-streams`
  992 + Tells qpdf to ignore any cross-reference streams.
  993 +
  994 +Ordinarily, qpdf will attempt to recover from certain types of errors in
  995 +PDF files. These include errors in the cross-reference table, certain
  996 +types of object numbering errors, and certain types of stream length
  997 +errors. Sometimes, qpdf may think it has recovered but may not have
  998 +actually recovered, so care should be taken when relying on recovery, as
  999 +some data loss is possible. The
  1000 +:samp:`--suppress-recovery` option will prevent qpdf
  1001 +from attempting recovery. In this case, it will fail on the first error
  1002 +that it encounters.
  1003 +
  1004 +Ordinarily, qpdf reads cross-reference streams when they are present in
  1005 +a PDF file. If :samp:`--ignore-xref-streams` is
  1006 +specified, qpdf will ignore any cross-reference streams for hybrid PDF
  1007 +files. The purpose of hybrid files is to make some content available to
  1008 +viewers that are not aware of cross-reference streams. It is almost
  1009 +never desirable to ignore them. The only time when you might want to use
  1010 +this feature is if you are testing creation of hybrid PDF files and wish
  1011 +to see how a PDF consumer that doesn't understand object and
  1012 +cross-reference streams would interpret such a file.
  1013 +
  1014 +.. _ref.advanced-transformation:
  1015 +
  1016 +Advanced Transformation Options
  1017 +-------------------------------
  1018 +
  1019 +These transformation options control fine points of how qpdf creates the
  1020 +output file. Mostly these are of use only to people who are very
  1021 +familiar with the PDF file format or who are PDF developers. The
  1022 +following options are available:
  1023 +
  1024 +:samp:`--compress-streams={[yn]}`
  1025 + By default, or with :samp:`--compress-streams=y`,
  1026 + qpdf will compress any stream with no other filters applied to it
  1027 + with the ``/FlateDecode`` filter when it writes it. To suppress this
  1028 + behavior and preserve uncompressed streams as uncompressed, use
  1029 + :samp:`--compress-streams=n`.
  1030 +
  1031 +:samp:`--decode-level={option}`
  1032 + Controls which streams qpdf tries to decode. The default is
  1033 + :samp:`generalized`. The following options are
  1034 + available:
  1035 +
  1036 + - :samp:`none`: do not attempt to decode any streams
  1037 +
  1038 + - :samp:`generalized`: decode streams filtered with
  1039 + supported generalized filters: ``/LZWDecode``, ``/FlateDecode``,
  1040 + ``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define generalized
  1041 + filters as those to be used for general-purpose compression or
  1042 + encoding, as opposed to filters specifically designed for image
  1043 + data. Note that, by default, streams already compressed with
  1044 + ``/FlateDecode`` are not uncompressed and recompressed unless you
  1045 + also specify :samp:`--recompress-flate`.
  1046 +
  1047 + - :samp:`specialized`: in addition to generalized,
  1048 + decode streams with supported non-lossy specialized filters;
  1049 + currently this is just ``/RunLengthDecode``
  1050 +
  1051 + - :samp:`all`: in addition to generalized and
  1052 + specialized, decode streams with supported lossy filters;
  1053 + currently this is just ``/DCTDecode`` (JPEG)
  1054 +
  1055 +:samp:`--stream-data={option}`
  1056 + Controls transformation of stream data. This option predates the
  1057 + :samp:`--compress-streams` and
  1058 + :samp:`--decode-level` options. Those options can be
  1059 + used to achieve the same effect with more control. The value of
  1060 + :samp:`{option}` may
  1061 + be one of the following:
  1062 +
  1063 + - :samp:`compress`: recompress stream data when
  1064 + possible (default); equivalent to
  1065 + :samp:`--compress-streams=y`
  1066 + :samp:`--decode-level=generalized`. Does not
  1067 + recompress streams already compressed with ``/FlateDecode`` unless
  1068 + :samp:`--recompress-flate` is also specified.
  1069 +
  1070 + - :samp:`preserve`: leave all stream data as is;
  1071 + equivalent to :samp:`--compress-streams=n`
  1072 + :samp:`--decode-level=none`
  1073 +
  1074 + - :samp:`uncompress`: uncompress stream data
  1075 + compressed with generalized filters when possible; equivalent to
  1076 + :samp:`--compress-streams=n`
  1077 + :samp:`--decode-level=generalized`
  1078 +
  1079 +:samp:`--recompress-flate`
  1080 + By default, streams already compressed with ``/FlateDecode`` are left
  1081 + alone rather than being uncompressed and recompressed. This option
  1082 + causes qpdf to uncompress and recompress the streams. There is a
  1083 + significant performance cost to using this option, but you probably
  1084 + want to use it if you specify
  1085 + :samp:`--compression-level`.
  1086 +
  1087 +:samp:`--compression-level={level}`
  1088 + When writing new streams that are compressed with ``/FlateDecode``,
  1089 + use the specified compression level. The value of
  1090 + :samp:`{level}` should be a number from 1 to 9 and is
  1091 + passed directly to zlib, which implements deflate compression. Note
  1092 + that qpdf doesn't uncompress and recompress streams by default. To
  1093 + have this option apply to already compressed streams, you should also
  1094 + specify :samp:`--recompress-flate`. If your goal is
  1095 + to shrink the size of PDF files, you should also use
  1096 + :samp:`--object-streams=generate`.
  1097 +
  1098 +:samp:`--normalize-content=[yn]`
  1099 + Enables or disables normalization of content streams. Content
  1100 + normalization is enabled by default in QDF mode. Please see :ref:`ref.qdf` for additional discussion of QDF mode.
  1101 +
  1102 +:samp:`--object-streams={mode}`
  1103 + Controls handling of object streams. The value of
  1104 + :samp:`{mode}` may be
  1105 + one of the following:
  1106 +
  1107 + - :samp:`preserve`: preserve original object streams
  1108 + (default)
  1109 +
  1110 + - :samp:`disable`: don't write any object streams
  1111 +
  1112 + - :samp:`generate`: use object streams wherever
  1113 + possible
  1114 +
  1115 +:samp:`--preserve-unreferenced`
  1116 + Tells qpdf to preserve objects that are not referenced when writing
  1117 + the file. Ordinarily any object that is not referenced in a traversal
  1118 + of the document from the trailer dictionary will be discarded. This
  1119 + may be useful in working with some damaged files or inspecting files
  1120 + with known unreferenced objects.
  1121 +
  1122 + This flag is ignored for linearized files and has the effect of
  1123 + causing objects in the new file to be written in order by object ID
  1124 + from the original file. This does not mean that object numbers will
  1125 + be the same since qpdf may create stream lengths as direct or
  1126 + indirect differently from the original file, and the original file
  1127 + may have gaps in its numbering.
  1128 +
  1129 + See also :samp:`--preserve-unreferenced-resources`,
  1130 + which does something completely different.
  1131 +
  1132 +:samp:`--remove-unreferenced-resources={option}`
  1133 + The :samp:`{option}` may be ``auto``,
  1134 + ``yes``, or ``no``. The default is ``auto``.
  1135 +
  1136 + Starting with qpdf 8.1, when splitting pages, qpdf is able to attempt
  1137 + to remove images and fonts that are not used by a page even if they
  1138 + are referenced in the page's resources dictionary. When shared
  1139 + resources are in use, this behavior can greatly reduce the file sizes
  1140 + of split pages, but the analysis is very slow. In versions from 8.1
  1141 + through 9.1.1, qpdf did this analysis by default. Starting in qpdf
  1142 + 10.0.0, if ``auto`` is used, qpdf does a quick analysis of the file
  1143 + to determine whether the file is likely to have unreferenced objects
  1144 + on pages, a pattern that frequently occurs when resource dictionaries
  1145 + are shared across multiple pages and rarely occurs otherwise. If it
  1146 + discovers this pattern, then it will attempt to remove unreferenced
  1147 + resources. Usually this means you get the slower splitting speed only
  1148 + when it's actually going to create smaller files. You can suppress
  1149 + removal of unreferenced resources altogether by specifying ``no`` or
  1150 + force it to do the full algorithm by specifying ``yes``.
  1151 +
  1152 + Other than cases in which you don't care about file size and care a
  1153 + lot about runtime, there are few reasons to use this option,
  1154 + especially now that ``auto`` mode is supported. One reason to use
  1155 + this is if you suspect that qpdf is removing resources it shouldn't
  1156 + be removing. If you encounter that case, please report it as a bug at
  1157 + https://github.com/qpdf/qpdf/issues/.
  1158 +
  1159 +:samp:`--preserve-unreferenced-resources`
  1160 + This is a synonym for
  1161 + :samp:`--remove-unreferenced-resources=no`.
  1162 +
  1163 + See also :samp:`--preserve-unreferenced`, which does
  1164 + something completely different.
  1165 +
  1166 +:samp:`--newline-before-endstream`
  1167 + Tells qpdf to insert a newline before the ``endstream`` keyword, not
  1168 + counted in the length, after any stream content even if the last
  1169 + character of the stream was a newline. This may result in two
  1170 + newlines in some cases. This is a requirement of PDF/A. While qpdf
  1171 + doesn't specifically know how to generate PDF/A-compliant PDFs, this
  1172 + at least prevents it from removing compliance on already compliant
  1173 + files.
  1174 +
  1175 +:samp:`--linearize-pass1={file}`
  1176 + Write the first pass of linearization to the named file. The
  1177 + resulting file is not a valid PDF file. This option is useful only
  1178 + for debugging ``QPDFWriter``'s linearization code. When qpdf
  1179 + linearizes files, it writes the file in two passes, using the first
  1180 + pass to calculate sizes and offsets that are required for hint tables
  1181 + and the linearization dictionary. Ordinarily, the first pass is
  1182 + discarded. This option enables it to be captured.
  1183 +
  1184 +:samp:`--coalesce-contents`
  1185 + When a page's contents are split across multiple streams, this option
  1186 + causes qpdf to combine them into a single stream. Use of this option
  1187 + is never necessary for ordinary usage, but it can help when working
  1188 + with some files in some cases. For example, this can also be combined
  1189 + with QDF mode or content normalization to make it easier to look at
  1190 + all of a page's contents at once.
  1191 +
  1192 +:samp:`--flatten-annotations={option}`
  1193 + This option collapses annotations into the pages' contents with
  1194 + special handling for form fields. Ordinarily, an annotation is
  1195 + rendered separately and on top of the page. Combining annotations
  1196 + into the page's contents effectively freezes the placement of the
  1197 + annotations, making them look right after various page
  1198 + transformations. The library functionality backing this option was
  1199 + added for the benefit of programs that want to create *n-up* page
  1200 + layouts and other similar things that don't work well with
  1201 + annotations. The :samp:`{option}` parameter
  1202 + may be any of the following:
  1203 +
  1204 + - :samp:`all`: include all annotations that are not
  1205 + marked invisible or hidden
  1206 +
  1207 + - :samp:`print`: only include annotations that
  1208 + indicate that they should appear when the page is printed
  1209 +
  1210 + - :samp:`screen`: omit annotations that indicate
  1211 + they should not appear on the screen
  1212 +
  1213 + Note that form fields are special because the annotations that are
  1214 + used to render filled-in form fields may become out of date from the
  1215 + fields' values if the form is filled in by a program that doesn't
  1216 + know how to update the appearances. If qpdf detects this case, its
  1217 + default behavior is not to flatten those annotations because doing so
  1218 + would cause the value of the form field to be lost. This gives you a
  1219 + chance to go back and resave the form with a program that knows how
  1220 + to generate appearances. QPDF itself can generate appearances with
  1221 + some limitations. See the
  1222 + :samp:`--generate-appearances` option below.
  1223 +
  1224 +:samp:`--generate-appearances`
  1225 + If a file contains interactive form fields and indicates that the
  1226 + appearances are out of date with the values of the form, this flag
  1227 + will regenerate appearances, subject to a few limitations. Note that
  1228 + there is not usually a reason to do this, but it can be necessary
  1229 + before using the :samp:`--flatten-annotations`
  1230 + option. Most of the limitations are not a problem with well-behaved PDF files.
  1231 + The limitations are as follows:
  1232 +
  1233 + - Radio button and checkbox appearances use the pre-set values in
  1234 + the PDF file. QPDF just makes sure that the correct appearance is
  1235 + displayed based on the value of the field. This is fine for PDF
  1236 + files that create their forms properly. Some PDF writers save
  1237 + appearances for fields when they change, which could cause some
  1238 + controls to have inconsistent appearances.
  1239 +
  1240 + - For text fields and list boxes, any characters that fall outside
  1241 + of US-ASCII or, if detected, "Windows ANSI" or "Mac Roman"
  1242 + encoding, will be replaced by the ``?`` character.
  1243 +
  1244 + - Quadding is ignored. Quadding is used to specify whether the
  1245 + contents of a field should be left, center, or right aligned with
  1246 + the field.
  1247 +
  1248 + - Rich text, multi-line, and other more elaborate formatting
  1249 + directives are ignored.
  1250 +
  1251 + - There is no support for multi-select fields or signature fields.
  1252 +
  1253 + If qpdf doesn't do a good enough job with your form, use an external
  1254 + application to save your filled-in form before processing it with
  1255 + qpdf.
  1256 +
  1257 +:samp:`--optimize-images`
  1258 + This flag causes qpdf to recompress all images that are not
  1259 + compressed with DCT (JPEG) using DCT compression as long as doing so
  1260 + decreases the size in bytes of the image data and the image does not
  1261 + fall below minimum specified dimensions. Useful information is
  1262 + provided when used in combination with
  1263 + :samp:`--verbose`. See also the
  1264 + :samp:`--oi-min-width`,
  1265 + :samp:`--oi-min-height`, and
  1266 + :samp:`--oi-min-area` options. By default, starting
  1267 + in qpdf 8.4, inline images are converted to regular images and
  1268 + optimized as well. Use :samp:`--keep-inline-images`
  1269 + to prevent inline images from being included.
  1270 +
  1271 +:samp:`--oi-min-width={width}`
  1272 + Avoid optimizing images whose width is below the specified amount. If
  1273 + omitted, the default is 128 pixels. Use 0 for no minimum.
  1274 +
  1275 +:samp:`--oi-min-height={height}`
  1276 + Avoid optimizing images whose height is below the specified amount.
  1277 + If omitted, the default is 128 pixels. Use 0 for no minimum.
  1278 +
  1279 +:samp:`--oi-min-area={area-in-pixels}`
  1280 + Avoid optimizing images whose pixel count (width × height) is below
  1281 + the specified amount. If omitted, the default is 16,384 pixels. Use 0
  1282 + for no minimum.
  1283 +
  1284 +:samp:`--externalize-inline-images`
  1285 + Convert inline images to regular images. By default, images whose
  1286 + data is at least 1,024 bytes are converted when this option is
  1287 + selected. Use :samp:`--ii-min-bytes` to change the
  1288 + size threshold. This option is implicitly selected when
  1289 + :samp:`--optimize-images` is selected. Use
  1290 + :samp:`--keep-inline-images` to exclude inline images
  1291 + from image optimization.
  1292 +
  1293 +:samp:`--ii-min-bytes={bytes}`
  1294 + Avoid converting inline images whose size is below the specified
  1295 + minimum size to regular images. If omitted, the default is 1,024
  1296 + bytes. Use 0 for no minimum.
  1297 +
  1298 +:samp:`--keep-inline-images`
  1299 + Prevent inline images from being included in image optimization. This
  1300 + option has no effect when :samp:`--optimize-images`
  1301 + is not specified.
  1302 +
  1303 +:samp:`--remove-page-labels`
  1304 + Remove page labels from the output file.
  1305 +
  1306 +:samp:`--qdf`
  1307 + Turns on QDF mode. For additional information on QDF, please see :ref:`ref.qdf`. Note that :samp:`--linearize`
  1308 + disables QDF mode.
  1309 +
  1310 +:samp:`--min-version={version}`
  1311 + Forces the PDF version of the output file to be at least
  1312 + :samp:`{version}`. In other words, if the
  1313 + input file has a lower version than the specified version, the
  1314 + specified version will be used. If the input file has a higher
  1315 + version, the input file's original version will be used. It is seldom
  1316 + necessary to use this option since qpdf will automatically increase
  1317 + the version as needed when adding features that require newer PDF
  1318 + readers.
  1319 +
  1320 + The version number may be expressed in the form
  1321 + :samp:`{major.minor.extension-level}`, in
  1322 + which case the version is interpreted as
  1323 + :samp:`{major.minor}` at extension level
  1324 + :samp:`{extension-level}`. For example,
  1325 + version ``1.7.8`` represents version 1.7 at extension level 8. Note
  1326 + that minimal syntax checking is done on the command line.
  1327 +
  1328 +:samp:`--force-version={version}`
  1329 + This option forces the PDF version to be the exact version specified
  1330 + *even when the file may have content that is not supported in that
  1331 + version*. The version number is interpreted in the same way as with
  1332 + :samp:`--min-version` so that extension levels can be
  1333 + set. In some cases, forcing the output file's PDF version to be lower
  1334 + than that of the input file will cause qpdf to disable certain
  1335 + features of the document. Specifically, 256-bit keys are disabled if
  1336 + the version is less than 1.7 with extension level 8 (except R5 is
  1337 + disabled if less than 1.7 with extension level 3), AES encryption is
  1338 + disabled if the version is less than 1.6, cleartext metadata and
  1339 + object streams are disabled if less than 1.5, 128-bit encryption keys
  1340 + are disabled if less than 1.4, and all encryption is disabled if less
  1341 + than 1.3. Even with these precautions, qpdf won't be able to do
  1342 + things like eliminate use of newer image compression schemes,
  1343 + transparency groups, or other features that may have been added in
  1344 + more recent versions of PDF.
  1345 +
  1346 + As a general rule, with the exception of big structural things like
  1347 + the use of object streams or AES encryption, PDF viewers are supposed
  1348 + to ignore features in files that they don't support from newer
  1349 + versions. This means that forcing the version to a lower version may
  1350 + make it possible to open your PDF file with an older version, though
  1351 + bear in mind that some of the original document's functionality may
  1352 + be lost.
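
The version rules described above for :samp:`--min-version` and :samp:`--force-version` can be illustrated with a short Python sketch. This is not qpdf's implementation: the names ``PDFVersion``, ``parse_version``, ``apply_min_version``, and ``disabled_features`` are hypothetical, and the thresholds are simply transcribed from the description above.

```python
from typing import List, NamedTuple


class PDFVersion(NamedTuple):
    """Hypothetical container for major.minor plus an extension level."""
    major: int
    minor: int
    extension_level: int = 0


def parse_version(text: str) -> PDFVersion:
    """Parse 'major.minor' or 'major.minor.extension-level'."""
    parts = [int(p) for p in text.split(".")]
    if len(parts) == 2:
        return PDFVersion(parts[0], parts[1], 0)
    if len(parts) == 3:
        return PDFVersion(parts[0], parts[1], parts[2])
    raise ValueError("unrecognized PDF version: %r" % (text,))


def apply_min_version(input_version: str, minimum: str) -> PDFVersion:
    """--min-version: the output version is the higher of the two."""
    return max(parse_version(input_version), parse_version(minimum))


def disabled_features(forced_version: str) -> List[str]:
    """List the features the text says are disabled under --force-version."""
    v = parse_version(forced_version)
    disabled = []
    if v < PDFVersion(1, 7, 8):
        disabled.append("256-bit keys (other than R5)")
    if v < PDFVersion(1, 7, 3):
        disabled.append("256-bit keys (R5)")
    if v < PDFVersion(1, 6):
        disabled.append("AES encryption")
    if v < PDFVersion(1, 5):
        disabled.extend(["cleartext metadata", "object streams"])
    if v < PDFVersion(1, 4):
        disabled.append("128-bit encryption keys")
    if v < PDFVersion(1, 3):
        disabled.append("all encryption")
    return disabled
```

Because the tuples compare field by field, ``1.7.5`` sorts between ``1.7.3`` and ``1.7.8``, so in this sketch only the non-R5 256-bit keys would be reported as disabled for that version.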
  1353 +
  1354 +By default, when a stream is encoded using non-lossy filters that qpdf
  1355 +understands and is not already compressed using a good compression
  1356 +scheme, qpdf will uncompress and recompress streams. Assuming proper
  1357 +filter implementations, this is safe and generally results in smaller files.
  1358 +This behavior may also be explicitly requested with
  1359 +:samp:`--stream-data=compress`.
  1360 +
  1361 +When :samp:`--normalize-content=y` is specified, qpdf
  1362 +will attempt to normalize whitespace and newlines in page content
  1363 +streams. This is generally safe but could, in some cases, cause damage
  1364 +to the content streams. This option is intended for people who wish to
  1365 +study PDF content streams or to debug PDF content. You should not use
  1366 +this for "production" PDF files.
  1367 +
  1368 +When normalizing content, if qpdf runs into any lexical errors, it will
  1369 +print a warning indicating that content may be damaged. The only
  1370 +situation in which qpdf is known to cause damage during content
  1371 +normalization is when a page's contents are split across multiple
  1372 +streams and streams are split in the middle of a lexical token such as a
  1373 +string, name, or inline image. Note that files that do this are invalid
  1374 +since the PDF specification states that content streams are not to be
  1375 +split in the middle of a token. If you want to inspect the original
  1376 +content streams in an uncompressed format, you can always run with
  1377 +:samp:`--qdf --normalize-content=n` for a QDF file
  1378 +without content normalization, or alternatively
  1379 +:samp:`--stream-data=uncompress` for a regular non-QDF
  1380 +mode file with uncompressed streams. These will both uncompress all the
  1381 +streams but will not attempt to normalize content. Please note that if
  1382 +you are using content normalization or QDF mode for the purpose of
  1383 +manually inspecting files, you don't have to care about this.
  1384 +
  1385 +Object streams, also known as compressed objects, were introduced into
  1386 +the PDF specification at version 1.5, corresponding to Acrobat 6. Some
  1387 +older PDF viewers may not support files with object streams. qpdf can be
  1388 +used to transform files with object streams to files without object
  1389 +streams or vice versa. As mentioned above, there are three object stream
  1390 +modes: :samp:`preserve`,
  1391 +:samp:`disable`, and :samp:`generate`.
  1392 +
  1393 +In :samp:`preserve` mode, the relationship between objects
  1394 +and the streams that contain them is preserved from the original file.
  1395 +In :samp:`disable` mode, all objects are written as
  1396 +regular, uncompressed objects. The resulting file should be readable by
  1397 +older PDF viewers. (Of course, the content of the files may include
  1398 +features not supported by older viewers, but at least the structure will
  1399 +be supported.) In :samp:`generate` mode, qpdf will
  1400 +create its own object streams. This will usually result in more compact
  1401 +PDF files, though they may not be readable by older viewers. In this
  1402 +mode, qpdf will also make sure the PDF version number in the header is
  1403 +at least 1.5.
  1404 +
  1405 +The :samp:`--qdf` flag turns on QDF mode, which changes
  1406 +some of the defaults described above. Specifically, in QDF mode, by
  1407 +default, stream data is uncompressed, content streams are normalized,
  1408 +and encryption is removed. These defaults can still be overridden by
  1409 +specifying the appropriate options as described above. Additionally, in
  1410 +QDF mode, stream lengths are stored as indirect objects, objects are
  1411 +laid out in a less efficient but more readable fashion, and the
  1412 +documents are interspersed with comments that make it easier for the
  1413 +user to find things and also make it possible for
  1414 +:command:`fix-qdf` to work properly. QDF mode is intended
  1415 +for people, mostly developers, who wish to inspect or modify PDF files
  1416 +in a text editor. For details, please see :ref:`ref.qdf`.
  1417 +
  1418 +.. _ref.testing-options:
  1419 +
  1420 +Testing, Inspection, and Debugging Options
  1421 +------------------------------------------
  1422 +
  1423 +These options can be useful for digging into PDF files or for use in
  1424 +automated test suites for software that uses the qpdf library. When any
  1425 +of the options in this section are specified, no output file should be
  1426 +given. The following options are available:
  1427 +
  1428 +:samp:`--deterministic-id`
  1429 + Causes generation of a deterministic value for /ID. This prevents use
  1430 + of timestamp and output file name information in the /ID generation.
  1431 + Instead, at some slight additional runtime cost, the /ID field is
  1432 + generated to include a digest of the significant parts of the content
  1433 + of the output PDF file. This means that a given qpdf operation should
  1434 + generate the same /ID each time it is run, which can be useful when
  1435 + caching results or for generation of some test data. Use of this flag
  1436 + is not compatible with creation of encrypted files.
  1437 +
  1438 +:samp:`--static-id`
  1439 + Causes generation of a fixed value for /ID. This is intended for
  1440 + testing only. Never use it for production files. If you are trying to
  1441 + get the same /ID each time for a given file and you are not
  1442 + generating encrypted files, consider using the
  1443 + :samp:`--deterministic-id` option.
  1444 +
  1445 +:samp:`--static-aes-iv`
  1446 + Causes use of a static initialization vector for AES-CBC. This is
  1447 + intended for testing only so that output files can be reproducible.
  1448 + Never use it for production files. This option in particular is not
  1449 + secure since it significantly weakens the encryption.
  1450 +
  1451 +:samp:`--no-original-object-ids`
  1452 + Suppresses inclusion of original object ID comments in QDF files.
  1453 + This can be useful when generating QDF files for test purposes,
  1454 + particularly when comparing them to determine whether two PDF files
  1455 + have identical content.
  1456 +
  1457 +:samp:`--show-encryption`
  1458 + Shows document encryption parameters. Also shows the document's user
  1459 + password if the owner password is given.
  1460 +
  1461 +:samp:`--show-encryption-key`
  1462 + When encryption information is being displayed, as when
  1463 + :samp:`--check` or
  1464 + :samp:`--show-encryption` is given, display the
  1465 + computed or retrieved encryption key as a hexadecimal string. This
  1466 + value is not ordinarily useful to users, but it can be used as the
  1467 + argument to :samp:`--password` if the
  1468 + :samp:`--password-is-hex-key` is specified. Note
  1469 + that, when PDF files are encrypted, passwords and other metadata are
  1470 + used only to compute an encryption key, and the encryption key is
  1471 + what is actually used for encryption. This enables retrieval of that
  1472 + key.
  1473 +
  1474 +:samp:`--check-linearization`
  1475 + Checks file integrity and linearization status.
  1476 +
  1477 +:samp:`--show-linearization`
  1478 + Checks and displays all data in the linearization hint tables.
  1479 +
  1480 +:samp:`--show-xref`
  1481 + Shows the contents of the cross-reference table in a human-readable
  1482 + form. This is especially useful for files with cross-reference
  1483 + streams which are stored in a binary format.
  1484 +
  1485 +:samp:`--show-object=trailer|obj[,gen]`
  1486 + Show the contents of the given object. This is especially useful for
  1487 + inspecting objects that are inside of object streams (also known as
  1488 + "compressed objects").
  1489 +
  1490 +:samp:`--raw-stream-data`
  1491 + When used along with the :samp:`--show-object`
  1492 + option, if the object is a stream, shows the raw stream data instead
  1493 + of the object's contents.
  1494 +
  1495 +:samp:`--filtered-stream-data`
  1496 + When used along with the :samp:`--show-object`
  1497 + option, if the object is a stream, shows the filtered stream data
  1498 + instead of the object's contents. If the stream is filtered using filters
  1499 + that qpdf does not support, an error will be issued.
  1500 +
  1501 +:samp:`--show-npages`
  1502 + Prints the number of pages in the input file on a line by itself.
  1503 + Since the number of pages appears by itself on a line, this option
  1504 + can be useful for scripting if you need to know the number of pages
  1505 + in a file.
  1506 +
  1507 +:samp:`--show-pages`
  1508 + Shows the object and generation number for each page dictionary
  1509 + object and for each content stream associated with the page. Having
  1510 + this information makes it more convenient to inspect objects from a
  1511 + particular page.
  1512 +
  1513 +:samp:`--with-images`
  1514 + When used along with :samp:`--show-pages`, also shows
  1515 + the object and generation numbers for the image objects on each page.
  1516 + (At present, information about images in shared resource dictionaries
  1517 + is not output by this command. This is discussed in a comment in the
  1518 + source code.)
  1519 +
  1520 +:samp:`--json`
  1521 + Generate a JSON representation of the file. This is described in
  1522 + depth in :ref:`ref.json`.
  1523 +
  1524 +:samp:`--json-help`
  1525 + Describe the format of the JSON output.
  1526 +
  1527 +:samp:`--json-key=key`
  1528 + This option is repeatable. If specified, only top-level keys
  1529 + specified will be included in the JSON output. If not specified, all
  1530 + keys will be shown.
  1531 +
  1532 +:samp:`--json-object=trailer|obj[,gen]`
  1533 + This option is repeatable. If specified, only specified objects will
  1534 + be shown in the "``objects``" key of the JSON output. If absent, all
  1535 + objects will be shown.
  1536 +
  1537 +:samp:`--check`
  1538 + Checks file structure as well as encryption, linearization, and
  1539 + encoding of stream data. A file for which
  1540 + :samp:`--check` reports no errors may still have
  1541 + errors in stream data content but should otherwise be structurally
  1542 + sound. If :samp:`--check` finds any errors, qpdf will exit
  1543 + with a status of 2. There are some recoverable conditions that
  1544 + :samp:`--check` detects. These are issued as warnings
  1545 + instead of errors. If qpdf finds no errors but finds warnings, it
  1546 + will exit with a status of 3 (as of version 2.0.4). When
  1547 + :samp:`--check` is combined with other options,
  1548 + checks are always performed before any other options are processed.
  1549 + For erroneous files, :samp:`--check` will cause qpdf
  1550 + to attempt to recover, after which other options are effectively
  1551 + operating on the recovered file. Combining
  1552 + :samp:`--check` with other options in this way can be
  1553 + useful for manually recovering severely damaged files. Note that
  1554 + :samp:`--check` produces no output to standard output
  1555 + when everything is valid, so if you are using this to
  1556 + programmatically validate files in bulk, it is safe to run without
  1557 + output redirected to :file:`/dev/null` and just
  1558 + check for a 0 exit code.
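
For bulk validation, the exit statuses described for :samp:`--check` can be mapped to outcomes in a small wrapper. The following Python sketch is only an illustration: the function names ``describe_check_status`` and ``check_pdf`` are hypothetical, and it assumes a :command:`qpdf` executable is available on the search path.

```python
import subprocess


def describe_check_status(returncode: int) -> str:
    """Translate the exit statuses described above into a label."""
    if returncode == 0:
        return "ok"
    if returncode == 3:
        return "warnings"  # recoverable conditions only
    if returncode == 2:
        return "errors"
    return "unexpected"


def check_pdf(path: str) -> str:
    """Run qpdf --check on one file (assumes qpdf is on PATH)."""
    result = subprocess.run(
        ["qpdf", "--check", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    return describe_check_status(result.returncode)
```

A caller could loop over a directory and collect every file for which ``check_pdf`` returns something other than ``"ok"``.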
  1559 +
  1560 +The :samp:`--raw-stream-data` and
  1561 +:samp:`--filtered-stream-data` options are ignored
  1562 +unless :samp:`--show-object` is given. Either of these
  1563 +options will cause the stream data to be written to standard output. In
  1564 +order to avoid commingling of stream data with other output, it is
  1565 +recommended that these options not be combined with other test/inspection
  1566 +options.
  1567 +
  1568 +If :samp:`--filtered-stream-data` is given and
  1569 +:samp:`--normalize-content=y` is also given, qpdf will
  1570 +attempt to normalize the stream data as if it is a page content stream.
  1571 +This attempt will be made even if it is not a page content stream, in
  1572 +which case it will produce unusable results.
  1573 +
  1574 +.. _ref.unicode-passwords:
  1575 +
  1576 +Unicode Passwords
  1577 +-----------------
  1578 +
  1579 +At the library API level, all methods that perform encryption and
  1580 +decryption interpret passwords as strings of bytes. It is up to the
  1581 +caller to ensure that they are appropriately encoded. Starting with qpdf
  1582 +version 8.4.0, qpdf will attempt to make this easier for you when
  1583 +you interact with qpdf via its command line interface. The PDF specification
  1584 +requires passwords used to encrypt files with 40-bit or 128-bit
  1585 +encryption to be encoded with PDF Doc encoding. This encoding is a
  1586 +single-byte encoding that supports ISO-Latin-1 and a handful of other
  1587 +commonly used characters. It has a large overlap with Windows ANSI but
  1588 +is not exactly the same. There is generally not a way to provide PDF Doc
  1589 +encoded strings on the command line. As such, qpdf versions prior to
  1590 +8.4.0 would often create PDF files that couldn't be opened with other
  1591 +software when given a password with non-ASCII characters to encrypt a
  1592 +file with 40-bit or 128-bit encryption. Starting with qpdf 8.4.0, qpdf
  1593 +recognizes the encoding of the parameter and transcodes it as needed.
  1594 +The rest of this section provides the details about exactly how qpdf
  1595 +behaves. Most users will not need to know this information, but it might
  1596 +be useful if you have been working around qpdf's old behavior or if you
  1597 +are using qpdf to generate encrypted files for testing other PDF
  1598 +software.
  1599 +
  1600 +A note about Windows: when qpdf builds, it attempts to determine what it
  1601 +has to do to use ``wmain`` instead of ``main`` on Windows. The ``wmain``
  1602 +function is an alternative entry point that receives all arguments as
  1603 +UTF-16-encoded strings. When qpdf starts up this way, it converts all
  1604 +the strings to UTF-8 encoding and then invokes the regular main. This
  1605 +means that, as far as qpdf is concerned, it receives its command-line
  1606 +arguments with UTF-8 encoding, just as it would in any modern Linux or
  1607 +UNIX environment.
  1608 +
  1609 +If a file is being encrypted with 40-bit or 128-bit encryption and the
  1610 +supplied password is not a valid UTF-8 string, qpdf will fall back to
  1611 +the behavior of interpreting the password as a string of bytes. If you
  1612 +have old scripts that encrypt files by passing the output of
  1613 +:command:`iconv` to qpdf, you no longer need to do that,
  1614 +but if you do, qpdf should still work. The only exception would be for
  1615 +the extremely unlikely case of a password that is encoded with a
  1616 +single-byte encoding but also happens to be valid UTF-8. Such a password
  1617 +would contain strings of even numbers of characters that alternate
  1618 +between accented letters and symbols. In the extremely unlikely event
  1619 +that you are intentionally using such passwords and qpdf is thwarting
  1620 +you by interpreting them as UTF-8, you can use
  1621 +:samp:`--password-mode=bytes` to suppress qpdf's
  1622 +automatic behavior.
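
The fallback described above, including the ambiguous case, can be sketched in a few lines of Python. This is a simplified illustration rather than qpdf's code: ``interpret_password`` and its ``force_bytes`` parameter are hypothetical names, with ``force_bytes=True`` standing in for :samp:`--password-mode=bytes`.

```python
from typing import Tuple, Union


def interpret_password(
    raw: bytes, force_bytes: bool = False
) -> Tuple[str, Union[str, bytes]]:
    """Decide whether a supplied password is treated as text or raw bytes.

    Returns ("unicode", text) when the bytes are valid UTF-8 and
    ("bytes", raw) when they are not, or when the caller forces bytes.
    """
    if force_bytes:
        return ("bytes", raw)
    try:
        return ("unicode", raw.decode("utf-8"))
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to treating the password as bytes.
        return ("bytes", raw)
```

The two bytes ``0xC3 0xA9`` show the ambiguity: they are a valid UTF-8 encoding of "é", but in a single-byte encoding such as ISO-Latin-1 they would be the two characters "Ã©", an accented letter followed by a symbol. In this sketch they are taken as UTF-8 unless ``force_bytes`` is set.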
  1623 +
  1624 +The :samp:`--password-mode` option, as described earlier
  1625 +in this chapter, can be used to change qpdf's interpretation of supplied
  1626 +passwords. There are very few reasons to use this option. One would be
  1627 +the unlikely case described in the previous paragraph in which the
  1628 +supplied password happens to be valid UTF-8 but isn't supposed to be
  1629 +UTF-8. Your best bet would be just to provide the password as a valid
  1630 +UTF-8 string, but you could also use
  1631 +:samp:`--password-mode=bytes`. Another reason to use
  1632 +:samp:`--password-mode=bytes` would be to intentionally
  1633 +generate PDF files encrypted with passwords that are not properly
  1634 +encoded. The qpdf test suite does this to generate invalid files for the
  1635 +purpose of testing its password recovery capability. If you were trying
  1636 +to create intentionally incorrect files for similar purposes, the
  1637 +:samp:`bytes` password mode can enable you to do this.
  1638 +
  1639 +When qpdf attempts to decrypt a file with a password that contains
  1640 +non-ASCII characters, it will generate a list of alternative passwords
  1641 +by attempting to interpret the password as each of a handful of
  1642 +different coding systems and then transcode them to the required format.
  1643 +This helps to compensate for the supplied password being given in the
  1644 +wrong coding system, such as would happen if you used the
  1645 +:command:`iconv` workaround that was previously needed.
  1646 +It also generates passwords by doing the reverse operation: translating
  1647 +from correct to incorrect encoding of the password. This would enable
  1648 +qpdf to decrypt files using passwords that were improperly encoded by
  1649 +whatever software encrypted the files, including older versions of qpdf
  1650 +invoked without properly encoded passwords. The combination of these two
  1651 +recovery methods should make qpdf transparently open most encrypted
  1652 +files with the password supplied correctly but in the wrong coding
  1653 +system. There are no real downsides to this behavior, but if you don't
  1654 +want qpdf to do this, you can use the
  1655 +:samp:`--suppress-password-recovery` option. One reason
  1656 +to do that is to ensure that you know the exact password that was used
  1657 +to encrypt the file.
  1658 +
  1659 +With these changes, qpdf now generates compliant passwords in most
  1660 +cases. There are still some exceptions. In particular, the PDF
  1661 +specification directs compliant writers to normalize Unicode passwords
  1662 +and to perform certain transformations on passwords with bidirectional
  1663 +text. Implementing this functionality requires using a real Unicode
  1664 +library like ICU. If a client application that uses qpdf wants to do
  1665 +this, the qpdf library will accept the resulting passwords, but qpdf
  1666 +will not perform these transformations itself. It is possible that this
  1667 +will be addressed in a future version of qpdf. The ``QPDFWriter``
  1668 +methods that enable encryption on the output file accept passwords as
  1669 +strings of bytes.
  1670 +
  1671 +Please note that the :samp:`--password-is-hex-key`
  1672 +option is unrelated to all this. This flag bypasses the normal process
  1673 +of going from password to encryption string entirely, allowing the raw
  1674 +encryption key to be specified directly. This is useful for forensic
  1675 +purposes or for brute-force recovery of files with unknown passwords.
manual/conf.py
@@ -11,4 +11,7 @@ project = 'QPDF' @@ -11,4 +11,7 @@ project = 'QPDF'
11 copyright = '2005-2021, Jay Berkenbilt' 11 copyright = '2005-2021, Jay Berkenbilt'
12 author = 'Jay Berkenbilt' 12 author = 'Jay Berkenbilt'
13 release = '10.4.0' 13 release = '10.4.0'
14 -html_theme = 'alabaster' 14 +html_theme = 'agogo'
  15 +html_theme_options = {
  16 + "body_max_width": None,
  17 +}
manual/design.rst 0 → 100644
  1 +.. _ref.design:
  2 +
  3 +Design and Library Notes
  4 +========================
  5 +
  6 +.. _ref.design.intro:
  7 +
  8 +Introduction
  9 +------------
  10 +
  11 +This section was written prior to the implementation of the qpdf package
  12 +and was subsequently modified to reflect the implementation. In some
  13 +cases, for purposes of explanation, it may differ slightly from the
  14 +actual implementation. As always, the source code and test suite are
  15 +authoritative. Even if there are some errors, this document should serve
  16 +as a road map to understanding how this code works.
  17 +
  18 +In general, one should adhere strictly to a specification when writing
  19 +but be liberal in reading. This way, the product of our software will be
  20 +accepted by the widest range of other programs, and we will accept the
  21 +widest range of input files. This library attempts to conform to that
  22 +philosophy whenever possible but also aims to provide strict checking
  23 +for people who want to validate PDF files. If you don't want to see
  24 +warnings and are trying to write something that is tolerant, you can
  25 +call ``setSuppressWarnings(true)``. If you want to fail on the first
  26 +error, you can call ``setAttemptRecovery(false)``. The default behavior
  27 +is to generate warnings for recoverable problems. Note that recovery
  28 +will not always produce the desired results even if it is able to get
  29 +through the file. Unlike most other PDF software, which produces generic
  30 +warnings such as "This file is damaged," qpdf generally issues a
  31 +detailed error message that would be most useful to a PDF developer.
  32 +This is by design as there seems to be a shortage of PDF validation
  33 +tools out there. This was, in fact, one of the major motivations behind
  34 +the initial creation of qpdf.
  35 +
  36 +.. _ref.design-goals:
  37 +
  38 +Design Goals
  39 +------------
  40 +
  41 +The QPDF package includes support for reading and rewriting PDF files.
  42 +It aims to hide from the user details involving object locations,
  43 +modified (appended) PDF files, the directness/indirectness of objects,
  44 +and stream filters including encryption. It does not aim to hide
  45 +knowledge of the object hierarchy or content stream contents. Put
  46 +another way, a user of the qpdf library is expected to have knowledge
  47 +about how PDF files work, but is not expected to have to keep track of
  48 +bookkeeping details such as file positions.
  49 +
  50 +A user of the library never has to care whether an object is direct or
  51 +indirect, though it is possible to determine whether an object is direct
  52 +or not if this information is needed. All access to objects deals with
  53 +this transparently. All memory management details are also handled by
  54 +the library.
  55 +
  56 +The ``PointerHolder`` object is used internally by the library to deal
  57 +with memory management. This is basically a smart pointer object very
  58 +similar in spirit to C++11's ``std::shared_ptr`` object, but predating
  59 +it by several years. This library also makes use of a technique for
  60 +giving fine-grained access to methods in one class to other classes by
  61 +using public subclasses with friends and only private members that in
  62 +turn call private methods of the containing class. See
  63 +``QPDFObjectHandle::Factory`` as an example.
  64 +
  65 +The top-level qpdf class is ``QPDF``. A ``QPDF`` object represents a PDF
  66 +file. The library provides methods for both accessing and mutating PDF
  67 +files.
  68 +
  69 +The primary class for interacting with PDF objects is
  70 +``QPDFObjectHandle``. Instances of this class can be passed around by
  71 +value, copied, stored in containers, etc. with very low overhead.
  72 +Instances of ``QPDFObjectHandle`` created by reading from a file will
  73 +always contain a reference back to the ``QPDF`` object from which they
  74 +were created. A ``QPDFObjectHandle`` may be direct or indirect. If
  75 +indirect, the ``QPDFObject`` the ``PointerHolder`` initially points to
  76 +is a null pointer. In this case, the first attempt to access the
  77 +underlying ``QPDFObject`` will result in the ``QPDFObject`` being
  78 +resolved via a call to the referenced ``QPDF`` instance. This makes it
  79 +essentially impossible to make coding errors in which certain things
  80 +will work for some PDF files and not for others based on which objects
  81 +are direct and which objects are indirect.
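
The lazy resolution described above can be pictured with a small conceptual model. The following Python sketch is an analogy only, not the C++ API: ``Document``, ``Handle``, and their methods are hypothetical names that stand in for ``QPDF`` and ``QPDFObjectHandle``.

```python
_UNRESOLVED = object()  # sentinel meaning "not looked up yet"


class Document:
    """Stand-in for the owning document; maps (obj, gen) to a value."""

    def __init__(self, objects):
        self._objects = objects
        self.resolve_count = 0

    def resolve(self, objid):
        self.resolve_count += 1
        return self._objects[objid]


class Handle:
    """Stand-in for an object handle that may be direct or indirect."""

    def __init__(self, value=_UNRESOLVED, owner=None, objid=None):
        self._value = value
        self._owner = owner
        self._objid = objid

    @classmethod
    def direct(cls, value):
        return cls(value=value)

    @classmethod
    def indirect(cls, owner, objid):
        return cls(owner=owner, objid=objid)

    def is_indirect(self):
        return self._objid is not None

    def get(self):
        # The first access resolves through the owning document;
        # later accesses reuse the resolved value.
        if self._value is _UNRESOLVED:
            self._value = self._owner.resolve(self._objid)
        return self._value
```

Code that calls ``get()`` behaves the same for direct and indirect handles, which is the property the paragraph above describes.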
  82 +
  83 +Instances of ``QPDFObjectHandle`` can be directly created and modified
  84 +using static factory methods in the ``QPDFObjectHandle`` class. There
  85 +are factory methods for each type of object as well as a convenience
  86 +method ``QPDFObjectHandle::parse`` that creates an object from a string
  87 +representation of the object. Existing instances of ``QPDFObjectHandle``
  88 +can also be modified in several ways. See comments in
  89 +:file:`QPDFObjectHandle.hh` for details.
  90 +
  91 +An instance of ``QPDF`` is constructed by using the class's default
  92 +constructor. If desired, the ``QPDF`` object may be configured with
  93 +various methods that change its default behavior. Then the
  94 +``QPDF::processFile()`` method is passed the name of a PDF file, which
  95 +permanently associates the file with that QPDF object. A password may
  96 +also be given for access to password-protected files. QPDF does not
  97 +enforce encryption parameters and will treat user and owner passwords
  98 +equivalently. Either password may be used to access an encrypted file.
  99 +``QPDF`` will allow recovery of a user password given an owner password.
  100 +The input PDF file must be seekable. (Output files written by
  101 +``QPDFWriter`` need not be seekable, even when creating linearized
  102 +files.) During construction, ``QPDF`` validates the PDF file's header,
  103 +and then reads the cross reference tables and trailer dictionaries. The
  104 +``QPDF`` class keeps only the first trailer dictionary though it does
  105 +read all of them so it can check the ``/Prev`` key. ``QPDF`` class users
  106 +may request the root object and the trailer dictionary specifically. The
  107 +cross reference table is kept private. Objects may then be requested by
  108 +number or by walking the object tree.
  109 +
  110 +When a PDF file has a cross-reference stream instead of a
  111 +cross-reference table and trailer, requesting the document's trailer
  112 +dictionary returns the stream dictionary from the cross-reference stream
  113 +instead.
  114 +
  115 +There are some convenience routines for very common operations such as
  116 +walking the page tree and returning a vector of all page objects. For
  117 +full details, please see the header files
  118 +:file:`QPDF.hh` and
  119 +:file:`QPDFObjectHandle.hh`. There are also some
  120 +additional helper classes that provide higher level API functions for
  121 +certain document constructions. These are discussed in :ref:`ref.helper-classes`.
  122 +
  123 +.. _ref.helper-classes:
  124 +
  125 +Helper Classes
  126 +--------------
  127 +
  128 +QPDF version 8.1 introduced the concept of helper classes. Helper
  129 +classes are intended to contain higher level APIs that allow developers
  130 +to work with certain document constructs at an abstraction level above
  131 +that of ``QPDFObjectHandle`` while staying true to qpdf's philosophy of
  132 +not hiding document structure from the developer. As with qpdf in
  133 +general, the goal is to take away some of the more tedious bookkeeping
  134 +aspects of working with PDF files, not to remove the need for the
  135 +developer to understand how the PDF construction in question works. The
  136 +driving factor behind the creation of helper classes was to allow the
  137 +evolution of higher level interfaces in qpdf without polluting the
  138 +interfaces of the main top-level classes ``QPDF`` and
  139 +``QPDFObjectHandle``.
  140 +
  141 +There are two kinds of helper classes: *document* helpers and *object*
  142 +helpers. Document helpers are constructed with a reference to a ``QPDF``
  143 +object and provide methods for working with structures that are at the
  144 +document level. Object helpers are constructed with an instance of a
  145 +``QPDFObjectHandle`` and provide methods for working with specific types
  146 +of objects.
  147 +
  148 +Examples of document helpers include ``QPDFPageDocumentHelper``, which
  149 +contains methods for operating on the document's page trees, such as
  150 +enumerating all pages of a document and adding and removing pages; and
  151 +``QPDFAcroFormDocumentHelper``, which contains document-level methods
  152 +related to interactive forms, such as enumerating form fields and
  153 +creating mappings between form fields and annotations.
  154 +
  155 +Examples of object helpers include ``QPDFPageObjectHelper`` for
  156 +performing operations on pages such as page rotation and some operations
  157 +on content streams, ``QPDFFormFieldObjectHelper`` for performing
  158 +operations related to interactive form fields, and
  159 +``QPDFAnnotationObjectHelper`` for working with annotations.
  160 +
  161 +It is always possible to retrieve the underlying ``QPDF`` reference from
  162 +a document helper and the underlying ``QPDFObjectHandle`` reference from
  163 +an object helper. Helpers are designed to be helpers, not wrappers. The
  164 +intention is that, in general, it is safe to freely intermix operations
  165 +that use helpers with operations that use the underlying objects.
  166 +Document and object helpers do not attempt to provide a complete
  167 +interface for working with the things they are helping with, nor do they
  168 +attempt to encapsulate underlying structures. They just provide a few
  169 +methods to help with error-prone, repetitive, or complex tasks. In some
  170 +cases, a helper object may cache some information that is expensive to
  171 +gather. In such cases, the helper classes are implemented so that their
  172 +own methods keep the cache consistent, and the header file will provide
  173 +a method to invalidate the cache and a description of what kinds of
  174 +operations would make the cache invalid. If in doubt, you can always
  175 +discard a helper class and create a new one with the same underlying
  176 +objects, which will ensure that you have discarded any stale
  177 +information.
  178 +
  179 +By convention, document helpers are called
  180 +``QPDFSomethingDocumentHelper`` and are derived from
  181 +``QPDFDocumentHelper``, and object helpers are called
  182 +``QPDFSomethingObjectHelper`` and are derived from ``QPDFObjectHelper``.
  183 +For details on specific helpers, please see their header files. You can
  184 +find them by looking at
  185 +:file:`include/qpdf/QPDF*DocumentHelper.hh` and
  186 +:file:`include/qpdf/QPDF*ObjectHelper.hh`.
  187 +
  188 +In order to avoid creation of circular dependencies, the following
  189 +general guidelines are followed with helper classes:
  190 +
  191 +- Core class interfaces do not know about helper classes. For example,
  192 + no methods of ``QPDF`` or ``QPDFObjectHandle`` will include helper
  193 + classes in their interfaces.
  194 +
  195 +- Interfaces of object helpers will usually not use document helpers in
  196 + their interfaces. This is because it is much more useful for document
  197 + helpers to have methods that return object helpers. Most operations
  198 + in PDF files start at the document level and go from there to the
  199 + object level rather than the other way around. It can sometimes be
  200 + useful to map back from object-level structures to document-level
  201 + structures. If there is a desire to do this, it will generally be
  202 + provided by a method in the document helper class.
  203 +
  204 +- Most of the time, object helpers don't know about other object
  205 + helpers. However, in some cases, one type of object may be a
  206 + container for another type of object, in which case it may make sense
  207 + for the outer object to know about the inner object. For example,
  208 + there are methods in the ``QPDFPageObjectHelper`` that know about
  209 + ``QPDFAnnotationObjectHelper`` because references to annotations are
  210 + contained in page dictionaries.
  211 +
  212 +- Any helper or core library class may use helpers in their
  213 + implementations.
  214 +
  215 +Prior to qpdf version 8.1, higher level interfaces were added as
  216 +"convenience functions" in either ``QPDF`` or ``QPDFObjectHandle``. For
  217 +compatibility, older convenience functions for operating with pages will
  218 +remain in those classes even as alternatives are provided in helper
  219 +classes. Going forward, new higher level interfaces will be provided
  220 +using helper classes.
  221 +
  222 +.. _ref.implementation-notes:
  223 +
  224 +Implementation Notes
  225 +--------------------
  226 +
  227 +This section contains a few notes about QPDF's internal implementation,
  228 +particularly around what it does when it first processes a file. This
  229 +section is a bit of a simplification of what it actually does, but it
  230 +could serve as a starting point to someone trying to understand the
  231 +implementation. There is nothing in this section that you need to know
  232 +to use the qpdf library.
  233 +
  234 +``QPDFObject`` is the basic PDF Object class. It is an abstract base
  235 +class from which are derived classes for each type of PDF object.
  236 +Clients do not interact with Objects directly but instead interact with
  237 +``QPDFObjectHandle``.
  238 +
  239 +When the ``QPDF`` class creates a new object, it dynamically allocates
  240 +the appropriate type of ``QPDFObject`` and immediately hands the pointer
  241 +to an instance of ``QPDFObjectHandle``. The parser reads a token from
  242 +the current file position. If the token is not either a dictionary or
  243 +array opener, an object is immediately constructed from the single token
  244 +and the parser returns. Otherwise, the parser iterates in a special mode
  245 +in which it accumulates objects until it finds a balancing closer.
  246 +During this process, the "``R``" keyword is recognized and an indirect
  247 +``QPDFObjectHandle`` may be constructed.
  248 +
  249 +The ``QPDF::resolve()`` method, which is used to resolve an indirect
  250 +object, may be invoked from the ``QPDFObjectHandle`` class. It first
  251 +checks a cache to see whether this object has already been read. If not,
  252 +it reads the object from the PDF file and caches it. It then returns the
  253 +resulting ``QPDFObjectHandle``. The calling object handle then replaces
  254 +its ``PointerHolder<QPDFObject>`` with the one from the newly returned
  255 +``QPDFObjectHandle``. In this way, only a single copy of any direct
  256 +object need exist and clients can access objects transparently without
  257 +knowing or caring whether they are direct or indirect objects.
  258 +Additionally, no object is ever read from the file more than once. That
  259 +means that only the portions of the PDF file that are actually needed
  260 +are ever read from the input file, thus allowing the qpdf package to
  261 +take advantage of this important design goal of PDF files.
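
The caching behavior described above can be illustrated with a small standalone sketch. It does not use qpdf's real classes: ``SketchObject``, ``SketchResolver``, and the ``reader`` callback are hypothetical stand-ins for the object type, the resolving logic, and the act of reading an object from the file.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a parsed PDF object; not a qpdf class.
struct SketchObject
{
    std::string value;
};

using ObjGen = std::pair<int, int>; // (object ID, generation)

class SketchResolver
{
  public:
    // "r" models reading an object from the file at its xref offset.
    explicit SketchResolver(std::function<std::string(ObjGen)> r) :
        reader(std::move(r))
    {
    }

    // Return the cached object, consulting the "file" only on first use.
    std::shared_ptr<SketchObject> resolve(ObjGen og)
    {
        assert(og.first > 0); // indirect objects have positive IDs
        auto it = cache.find(og);
        if (it == cache.end()) {
            ++reads;
            auto obj = std::make_shared<SketchObject>(SketchObject{reader(og)});
            it = cache.emplace(og, obj).first;
        }
        return it->second;
    }

    int reads = 0; // number of times the underlying reader was invoked

  private:
    std::function<std::string(ObjGen)> reader;
    std::map<ObjGen, std::shared_ptr<SketchObject>> cache;
};
```

Every caller that resolves the same object ID and generation receives a pointer to the same shared object, which is the property that lets handles swap in the resolved pointer without copying.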
  262 +
  263 +If the requested object is inside of an object stream, the object stream
  264 +itself is first read into memory. Then the tokenizer reads objects from
  265 +the memory stream based on the offset information stored in the stream.
  266 +Those individual objects are cached, after which the temporary buffer
  267 +holding the object stream contents is discarded. In this way, the first
  268 +time an object in an object stream is requested, all objects in the
  269 +stream are cached.
  270 +
  271 +The following example should clarify how ``QPDF`` processes a simple
  272 +file.
  273 +
  274 +- Client constructs ``QPDF`` ``pdf`` and calls
  275 + ``pdf.processFile("a.pdf");``.
  276 +
  277 +- The ``QPDF`` class checks the beginning of
  278 + :file:`a.pdf` for a PDF header. It then reads the
  279 + cross reference table mentioned at the end of the file, ensuring that
  280 + it is looking before the last ``%%EOF``. After getting to the
  281 + ``trailer`` keyword, it invokes the parser.
  282 +
  283 +- The parser sees "``<<``", so it calls itself recursively in
  284 + dictionary creation mode.
  285 +
  286 +- In dictionary creation mode, the parser keeps accumulating objects
  287 + until it encounters "``>>``". Each object that is read is pushed onto
  288 + a stack. If "``R``" is read, the last two objects on the stack are
  289 + inspected. If they are integers, they are popped off the stack and
  290 + their values are used to construct an indirect object handle which is
  291 + then pushed onto the stack. When "``>>``" is finally read, the stack
  292 + is converted into a ``QPDF_Dictionary`` which is placed in a
  293 + ``QPDFObjectHandle`` and returned.
  294 +
  295 +- The resulting dictionary is saved as the trailer dictionary.
  296 +
  297 +- The trailer dictionary is searched for the ``/Prev`` key. If
  298 + present, ``QPDF`` seeks to that point and repeats except that the
  299 + new trailer dictionary is not saved. If ``/Prev`` is not present,
  300 + the initial parsing process is complete.
  301 +
  302 + If there is an encryption dictionary, the document's encryption
  303 + parameters are initialized.
  304 +
  305 +- The client requests the root object. The ``QPDF`` class gets the
  306 + value of the root key from the trailer dictionary and returns it. It
  307 + is an unresolved indirect ``QPDFObjectHandle``.
  308 +
  309 +- The client requests the ``/Pages`` key from the root
  310 + ``QPDFObjectHandle``. The ``QPDFObjectHandle`` notices that it is
  311 + indirect so it asks ``QPDF`` to resolve it. ``QPDF`` looks in the
  312 + object cache for an object with the root dictionary's object ID and
  313 + generation number. Upon not seeing it, it checks the cross reference
  314 + table, gets the offset, and reads the object present at that offset.
  315 + It stores the result in the object cache and returns the cached
  316 + result. The calling ``QPDFObjectHandle`` replaces its object pointer
  317 + with the one from the resolved ``QPDFObjectHandle``, verifies that it
  318 + is a valid dictionary object, and returns the (unresolved indirect)
  319 + ``QPDFObjectHandle`` for the top of the Pages hierarchy.
  320 +
  321 + As the client continues to request objects, the same process is
  322 + followed for each new requested object.
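
The stack handling in dictionary creation mode can be sketched as follows. This is a simplified model rather than qpdf's parser: it works on pre-split string tokens, does not handle nested containers, and the ``Item`` type and ``accumulate_until_closer`` function are hypothetical names chosen for illustration.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified item: either a raw token or an indirect reference.
struct Item
{
    bool indirect;
    std::string token; // meaningful when !indirect
    int obj;           // meaningful when indirect
    int gen;           // meaningful when indirect
};

// True if the item is a raw token made up only of decimal digits.
static bool is_integer(Item const& item)
{
    if (item.indirect || item.token.empty()) {
        return false;
    }
    for (unsigned char ch : item.token) {
        if (!std::isdigit(ch)) {
            return false;
        }
    }
    return true;
}

// Accumulate items starting at tokens[pos] until ">>" is seen. On return,
// pos is just past the ">>" (or at the end if no closer was found).
std::vector<Item>
accumulate_until_closer(std::vector<std::string> const& tokens, std::size_t& pos)
{
    assert(pos <= tokens.size());
    std::vector<Item> stack;
    while (pos < tokens.size() && tokens[pos] != ">>") {
        std::string const& t = tokens[pos++];
        std::size_t n = stack.size();
        if (t == "R" && n >= 2 && is_integer(stack[n - 2]) && is_integer(stack[n - 1])) {
            // The last two integers become an indirect reference.
            int obj = std::stoi(stack[n - 2].token);
            int gen = std::stoi(stack[n - 1].token);
            stack.pop_back();
            stack.pop_back();
            stack.push_back(Item{true, "", obj, gen});
        } else {
            stack.push_back(Item{false, t, 0, 0});
        }
    }
    if (pos < tokens.size()) {
        ++pos; // consume ">>"
    }
    return stack;
}
```

A caller would invoke this after consuming ``<<`` and then pair up the returned items as keys and values to build the dictionary.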
  323 +
  324 +.. _ref.casting:
  325 +
  326 +Casting Policy
  327 +--------------
  328 +
  329 +This section describes the casting policy followed by qpdf's
  330 +implementation. This is of no concern to qpdf's end users and largely of no
  331 +concern to people writing code that uses qpdf, but it could be of
  332 +interest to people who are porting qpdf to a new platform or who are
  333 +making modifications to the code.
  334 +
  335 +The C++ code in qpdf is free of old-style casts except where unavoidable
  336 +(e.g. where the old-style cast is in a macro provided by a third-party
  337 +header file). When there is a need for a cast, it is handled, in order
  338 +of preference, by rewriting the code to avoid the need for a cast,
  339 +calling ``const_cast``, calling ``static_cast``, calling
  340 +``reinterpret_cast``, or calling some combination of the above. As a
  341 +last resort, a compiler-specific ``#pragma`` may be used to suppress a
  342 +warning that we don't want to fix. Examples may include suppressing
  343 +warnings about the use of old-style casts in code that is shared between
  344 +C and C++ code.
  345 +
  346 +The ``QIntC`` namespace, provided by
  347 +:file:`include/qpdf/QIntC.hh`, implements safe
  348 +functions for converting between integer types. These functions do range
  349 +checking and throw a ``std::range_error``, which is a subclass of
  350 +``std::runtime_error``, if conversion from one integer type to another
  351 +results in loss of information. There are many cases in which we have to
  352 +move between different integer types because of incompatible integer
  353 +types used in interoperable interfaces. Some are unavoidable, such as
  354 +moving between sizes and offsets, and others are there because of old
  355 +code that is too entrenched to be fixable without breaking source
  356 +compatibility and causing pain for users. QPDF is compiled with extra
  357 +warnings to detect conversions with potential data loss, and all such
  358 +cases should be fixed by either using a function from ``QIntC`` or a
  359 +``static_cast``.
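
To make the idea of a range-checked conversion concrete, here is a minimal sketch. It is not the implementation in :file:`include/qpdf/QIntC.hh`; the function name ``checked_int_cast`` is hypothetical, and the check used (round trip plus sign comparison) is just one common way to detect loss of information.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper in the spirit of QIntC; not qpdf's actual code.
template <typename To, typename From>
To checked_int_cast(From value)
{
    static_assert(std::is_integral<From>::value, "From must be an integer type");
    static_assert(std::is_integral<To>::value, "To must be an integer type");
    To result = static_cast<To>(value);
    // The value must survive a round trip, and the sign must not flip
    // (which catches signed/unsigned conversions of the same width).
    bool round_trips = (static_cast<From>(result) == value);
    bool same_sign = ((result < To{}) == (value < From{}));
    if (!(round_trips && same_sign)) {
        throw std::range_error("integer conversion would lose information");
    }
    return result;
}
```

For example, ``checked_int_cast<unsigned int>(-1)`` throws, whereas a bare ``static_cast`` would silently produce a large positive value.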
  360 +
  361 +When the intention is just to switch the type because of exchanging data
  362 +between incompatible interfaces, use ``QIntC``. This is the usual case.
  363 +However, there are some cases in which we are explicitly intending to
  364 +use the exact same bit pattern with a different type. This is most
  365 +common when switching between signed and unsigned characters. A lot of
  366 +qpdf's code uses unsigned characters internally, but ``std::string`` and
  367 +``char`` are often signed. Using ``QIntC::to_char`` would be wrong for
  368 +converting from unsigned to signed characters because a negative
  369 +``char`` value and the corresponding ``unsigned char`` value greater
  370 +than 127 *mean the same thing*. There are also
  371 +cases in which we use ``static_cast`` when working with bit fields where
  372 +we are not representing a numerical value but rather a bunch of bits
  373 +packed together in some integer type. Also note that ``size_t`` and
  374 +``long`` both typically differ between 32-bit and 64-bit environments,
  375 +so sometimes an explicit cast may not be needed to avoid warnings on one
  376 +platform but may be needed on another. A conversion with ``QIntC``
  377 +should always be used when the types are different even if the
  378 +underlying size is the same. QPDF's CI build builds on 32-bit and 64-bit
  379 +platforms, and the test suite is very thorough, so it is hard to make
  380 +any of the potential errors here without being caught in build or test.
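
The bit-pattern case can be illustrated with a short sketch. The function ``bytes_to_string`` is a hypothetical helper, not part of qpdf; it shows why ``static_cast`` is the appropriate tool when unsigned bytes are stored in a ``std::string``.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copy raw bytes into a std::string, keeping each
// byte's bit pattern intact.
std::string bytes_to_string(unsigned char const* data, std::size_t len)
{
    assert(data != nullptr || len == 0);
    std::string result;
    result.reserve(len);
    for (std::size_t i = 0; i < len; ++i) {
        // A range-checked conversion would reject bytes above 127 where
        // char is signed; static_cast simply reinterprets the byte.
        result.push_back(static_cast<char>(data[i]));
    }
    return result;
}
```

Casting each stored ``char`` back to ``unsigned char`` recovers the original byte values, which is the sense in which the two representations mean the same thing.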
  381 +
  382 +Non-const ``unsigned char*`` is used in the ``Pipeline`` interface. The
  383 +pipeline interface has a ``write`` call that uses ``unsigned char*``
  384 +without a ``const`` qualifier. The main reason for this is
  385 +to support pipelines that make calls to third-party libraries, such as
  386 +zlib, that don't include ``const`` in their interfaces. Unfortunately,
  387 +there are many places in the code where it is desirable to have
  388 +``const char*`` with pipelines. None of the pipeline implementations
  389 +in qpdf
  390 +currently modify the data passed to write, and doing so would be counter
  391 +to the intent of ``Pipeline``, but there is nothing in the code to
  392 +prevent this from being done. There are places in the code where
  393 +``const_cast`` is used to remove the const-ness of pointers going into
  394 +``Pipeline``\ s. This could theoretically be unsafe, but there is
  395 +adequate testing to assert that it is safe and will remain safe in
  396 +qpdf's code.
  397 +
  398 +.. _ref.encryption:
  399 +
  400 +Encryption
  401 +----------
  402 +
  403 +Encryption is supported transparently by qpdf. When opening a PDF file,
  404 +if an encryption dictionary exists, the ``QPDF`` object processes this
  405 +dictionary using the password (if any) provided. The primary decryption
  406 +key is computed and cached. No further access is made to the encryption
  407 +dictionary after that time. When an object is read from a file, the
  408 +object ID and generation of the object in which it is contained are
  409 +always known. Using this information along with the stored encryption
  410 +key, all stream and string objects are transparently decrypted. Raw
  411 +encrypted objects are never stored in memory. This way, nothing in the
  412 +library ever has to know or care whether it is reading an encrypted
  413 +file.
  414 +
  415 +An interface is also provided for writing encrypted streams and strings
  416 +given an encryption key. This is used by ``QPDFWriter`` when it rewrites
  417 +encrypted files.
  418 +
  419 +When copying encrypted files, unless otherwise directed, qpdf will
  420 +preserve any encryption in force in the original file. qpdf can do this
  421 +with either the user or the owner password. There is no difference in
  422 +capability based on which password is used. When 40- or 128-bit
  423 +encryption keys are used, the user password can be recovered with the
  424 +owner password. With 256-bit keys, the user and owner passwords are used
  425 +independently to encrypt the actual encryption key, so while either can
  426 +be used, the owner password can no longer be used to recover the user
  427 +password.
  428 +
  429 +Starting with version 4.0.0, qpdf can read files that are not encrypted
  430 +but that contain encrypted attachments, but it cannot write such files.
  431 +qpdf also requires the password to be specified in order to open the
  432 +file, not just to extract attachments, since once the file is open, all
  433 +decryption is handled transparently. When copying files like this while
  434 +preserving encryption, qpdf will apply the file's encryption to
  435 +everything in the file, not just to the attachments. When decrypting the
  436 +file, qpdf will decrypt the attachments. In general, when copying PDF
  437 +files with multiple encryption formats, qpdf will choose the newest
  438 +format. The only exception to this is that clear-text metadata will be
  439 +preserved as clear-text if it is that way in the original file.
  440 +
  441 +One point of confusion some people have about encrypted PDF files is
  442 +that encryption is not the same as password protection. Password
  443 +protected files are always encrypted, but it is also possible to create
  444 +encrypted files that do not have passwords. Internally, such files use
  445 +the empty string as a password, and most readers try the empty string
  446 +first to see if it works and prompt for a password only if the empty
  447 +string doesn't work. Normally such files have an empty user password and
  448 +a non-empty owner password. In that way, if the file is opened by an
  449 +ordinary reader without specification of password, the restrictions
  450 +specified in the encryption dictionary can be enforced. Most users
  451 +wouldn't even realize such a file was encrypted. Since qpdf always
  452 +ignores the restrictions (except for the purpose of reporting what they
  453 +are), qpdf doesn't care which password you use. QPDF will allow you to
  454 +create PDF files with non-empty user passwords and empty owner
  455 +passwords. Some readers will require a password when you open these
  456 +files, and others will open the files without a password and not enforce
  457 +restrictions. Having a non-empty user password and an empty owner
  458 +password doesn't really make sense because it would mean that opening
  459 +the file with the user password would be more restrictive than not
  460 +supplying a password at all. QPDF also allows you to create PDF files
  461 +with the same password as both the user and owner password. Some readers
  462 +will not ever allow such files to be accessed without restrictions
  463 +because they never try the password as the owner password if it works as
  464 +the user password. Nonetheless, one of the powerful aspects of qpdf is
  465 +that it allows you to finely specify the way encrypted files are
  466 +created, even if the results are not useful to some readers. One use
  467 +case for this would be for testing a PDF reader to ensure that it
  468 +handles odd configurations of input files.
  469 +
  470 +.. _ref.random-numbers:
  471 +
  472 +Random Number Generation
  473 +------------------------
  474 +
  475 +QPDF generates random numbers to support generation of encrypted data.
  476 +Starting in qpdf 10.0.0, qpdf uses the crypto provider as its source of
  477 +random numbers. Older versions used the OS-provided source of secure
  478 +random numbers or, if allowed at build time, insecure random numbers
  479 +from stdlib. Starting with version 5.1.0, you can disable use of
  480 +OS-provided secure random numbers at build time. This is especially
  481 +useful on Windows if you want to avoid a dependency on Microsoft's
  482 +cryptography API. You can also supply your own random data provider. For
  483 +details on how to do this, please refer to the top-level README.md file
  484 +in the source distribution and to comments in
  485 +:file:`QUtil.hh`.
  486 +
  487 +.. _ref.adding-and-remove-pages:
  488 +
  489 +Adding and Removing Pages
  490 +-------------------------
  491 +
  492 +While qpdf's API has supported adding and modifying objects for some
  493 +time, version 3.0 introduces specific methods for adding and removing
  494 +pages. These are largely convenience routines that handle two tricky
  495 +issues: pushing inheritable resources from the ``/Pages`` tree down to
  496 +individual pages and manipulation of the ``/Pages`` tree itself. For
  497 +details, see ``addPage`` and surrounding methods in
  498 +:file:`QPDF.hh`.
  499 +
  500 +.. _ref.reserved-objects:
  501 +
  502 +Reserving Object Numbers
  503 +------------------------
  504 +
  505 +Version 3.0 of qpdf introduced the concept of reserved objects. These
  506 +are seldom needed for ordinary operations, but there are cases in which
  507 +you may want to add a series of indirect objects with references to each
  508 +other to a ``QPDF`` object. This causes a problem because you can't
  509 +determine the object ID that a new indirect object will have until you
  510 +add it to the ``QPDF`` object with ``QPDF::makeIndirectObject``. The
  511 +only way to add two mutually referential objects to a ``QPDF`` object
  512 +prior to version 3.0 would be to add the new objects first and then make
  513 +them refer to each other after adding them. Now it is possible to create
  514 +a *reserved object* using
  515 +``QPDFObjectHandle::newReserved``. This is an indirect object that stays
  516 +"unresolved" even if it is queried for its type. So now, if you want to
  517 +create a set of mutually referential objects, you can create
  518 +reservations for each one of them and use those reservations to
  519 +construct the references. When finished, you can call
  520 +``QPDF::replaceReserved`` to replace the reserved objects with the real
  521 +ones. This functionality will never be needed by most applications, but
  522 +it is used internally by QPDF when copying objects from other PDF files,
  523 +as discussed in :ref:`ref.foreign-objects`. For an example of how to use reserved
  524 +objects, search for ``newReserved`` in
  525 +:file:`test_driver.cc` in qpdf's sources.
  526 +
  527 +.. _ref.foreign-objects:
  528 +
  529 +Copying Objects From Other PDF Files
  530 +------------------------------------
  531 +
  532 +Version 3.0 of qpdf introduced the ability to copy objects into a
  533 +``QPDF`` object from a different ``QPDF`` object, which we refer to as
  534 +*foreign objects*. This allows arbitrary
  535 +merging of PDF files. The "from" ``QPDF`` object must remain valid after
  536 +the copy as discussed in the note below. The
  537 +:command:`qpdf` command-line tool provides limited
  538 +support for basic page selection, including merging in pages from other
  539 +files, but the library's API makes it possible to implement arbitrarily
  540 +complex merging operations. The main method for copying foreign objects
  541 +is ``QPDF::copyForeignObject``. This takes an indirect object from
  542 +another ``QPDF`` and copies it recursively into this object while
  543 +preserving all object structure, including circular references. This
  544 +means you can add a direct object that you create from scratch to a
  545 +``QPDF`` object with ``QPDF::makeIndirectObject``, and you can add an
  546 +indirect object from another file with ``QPDF::copyForeignObject``. The
  547 +fact that ``QPDF::makeIndirectObject`` does not automatically detect a
  548 +foreign object and copy it is an explicit design decision. Copying a
  549 +foreign object seems like a sufficiently significant thing to do that it
  550 +should be done explicitly.
  551 +
  552 +The other way to copy foreign objects is by passing a page from one
  553 +``QPDF`` to another by calling ``QPDF::addPage``. In contrast to
  554 +``QPDF::makeIndirectObject``, this method automatically distinguishes
  555 +between indirect objects in the current file, foreign objects, and
  556 +direct objects.
  557 +
  558 +Please note: when you copy objects from one ``QPDF`` to another, the
  559 +source ``QPDF`` object must remain valid until you have finished with
  560 +the destination object. This is because the original object is still
  561 +used to retrieve any referenced stream data from the copied object.
  562 +
  563 +.. _ref.rewriting:
  564 +
  565 +Writing PDF Files
  566 +-----------------
  567 +
  568 +The qpdf library supports file writing of ``QPDF`` objects to PDF files
  569 +through the ``QPDFWriter`` class. The ``QPDFWriter`` class has two
  570 +writing modes: one for non-linearized files, and one for linearized
  571 +files. See :ref:`ref.linearization` for a description of how
  572 +linearization is implemented. This section describes how we write
  573 +non-linearized files including the creation of QDF files (see :ref:`ref.qdf`).
  574 +
  575 +This outline was written prior to implementation and is not exactly
  576 +accurate, but it provides a correct "notional" idea of how writing
  577 +works. Look at the code in ``QPDFWriter`` for exact details.
  578 +
  579 +- Initialize state:
  580 +
  581 + - next object number = 1
  582 +
  583 + - object queue = empty
  584 +
  585 + - renumber table: old object id/generation to new id/0 = empty
  586 +
  587 + - xref table: new id -> offset = empty
  588 +
  589 +- Create a QPDF object from a file.
  590 +
  591 +- Write header for new PDF file.
  592 +
  593 +- Request the trailer dictionary.
  594 +
  595 +- For each value that is an indirect object, grab the next object
  596 + number (via an operation that returns and increments the number). Map
  597 + object to new number in renumber table. Push object onto queue.
  598 +
  599 +- While there are more objects on the queue:
  600 +
  601 + - Pop queue.
  602 +
  603 + - Look up object's new number *n* in the renumbering table.
  604 +
  605 + - Store current offset into xref table.
  606 +
  607 + - Write :samp:`{n} 0 obj`.
  608 +
  609 + - If object is null, whether direct or indirect, write out null,
  610 + thus eliminating unresolvable indirect object references.
  611 +
  612 + - If the object is a stream, write stream contents, piped
  613 + through any filters as required, to a memory buffer. Use this
  614 + buffer to determine the stream length.
  615 +
  616 + - If object is not a stream, array, or dictionary, write out its
  617 + contents.
  618 +
  619 + - If object is an array or dictionary (including stream), traverse
  620 + its elements (for array) or values (for dictionaries), handling
  621 + recursive dictionaries and arrays, looking for indirect objects.
  622 + When an indirect object is found, if it is not resolvable, ignore.
  623 + (This case is handled when writing it out.) Otherwise, look it up
  624 + in the renumbering table. If not found, grab the next available
  625 + object number, assign to the referenced object in the renumbering
  626 + table, and push the referenced object onto the queue. As a special
  627 + case, when writing out a stream dictionary, replace length,
  628 + filters, and decode parameters as required.
  629 +
  630 + Write out dictionary or array, replacing any unresolvable indirect
  631 + object references with null (the PDF spec says a reference to a
  632 + non-existent object is legal and resolves to null) and any
  633 + resolvable ones with references to the renumbered objects.
  634 +
  635 + - If the object is a stream, write ``stream\n``, the stream contents
  636 + (from the memory buffer), and ``\nendstream\n``.
  637 +
  638 + - When done, write ``endobj``.
  639 +
  640 +Once we have finished the queue, all referenced objects will have been
  641 +written out and all deleted objects or unreferenced objects will have
  642 +been skipped. The new cross-reference table will contain an offset for
  643 +every new object number from 1 up to the number of objects written. This
  644 +can be used to write out a new xref table. Finally we can write out the
  645 +trailer dictionary with appropriately computed ``/ID`` (see spec, 8.3, File
  646 +Identifiers), the cross reference table offset, and ``%%EOF``.
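
The renumbering portion of the outline above can be sketched in isolation. This is a notional model, not the code in ``QPDFWriter``: objects are reduced to a map from old object ID to the old IDs they reference, generations are ignored, and the names ``ObjectGraph`` and ``renumber_reachable`` are hypothetical.

```cpp
#include <cassert>
#include <deque>
#include <map>
#include <vector>

// Hypothetical model: each old object ID maps to the old IDs it references.
using ObjectGraph = std::map<int, std::vector<int>>;

// Walk the objects reachable from the trailer's indirect references and
// return the renumber table (old ID -> new ID), numbering from 1 in the
// order in which objects are queued for writing.
std::map<int, int>
renumber_reachable(ObjectGraph const& objects, std::vector<int> const& trailer_refs)
{
    std::map<int, int> renumber;
    std::deque<int> queue;
    int next_object_number = 1;

    auto enqueue = [&](int old_id) {
        if (objects.count(old_id) == 0) {
            return; // unresolvable reference: it would be written as null
        }
        if (renumber.count(old_id) != 0) {
            return; // already assigned a new number
        }
        renumber[old_id] = next_object_number++;
        queue.push_back(old_id);
    };

    for (int ref : trailer_refs) {
        enqueue(ref);
    }
    while (!queue.empty()) {
        int old_id = queue.front();
        queue.pop_front();
        // A real writer would emit the object here and record its offset.
        for (int ref : objects.at(old_id)) {
            enqueue(ref);
        }
    }
    return renumber;
}
```

Objects that are never reached from the trailer do not appear in the returned table, which corresponds to unreferenced objects being skipped in the output.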
  647 +
  648 +.. _ref.filtered-streams:
  649 +
  650 +Filtered Streams
  651 +----------------
  652 +
  653 +Support for streams is implemented through the ``Pipeline`` interface
  654 +which was designed for this package.
  655 +
  656 +When reading streams, create a series of ``Pipeline`` objects. The
  657 +``Pipeline`` abstract base class requires implementation of ``write()`` and
  658 +``finish()`` and provides an implementation of ``getNext()``. Each
  659 +pipeline object, upon receiving data, does whatever it is going to do
  660 +and then writes the data (possibly modified) to its successor.
  661 +Alternatively, a pipeline may be an end-of-the-line pipeline that does
  662 +something like store its output to a file or a memory buffer ignoring a
  663 +successor. For additional details, look at
  664 +:file:`Pipeline.hh`.
  665 +
  666 +``QPDF`` can read raw or filtered streams. When reading a filtered
  667 +stream, the ``QPDF`` class creates a ``Pipeline`` object for each
  668 +appropriate filter and chains them together. The last filter
  669 +should write to whatever type of output is required. The ``QPDF`` class
  670 +has an interface to write raw or filtered stream contents to a given
  671 +pipeline.
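
The chaining idea can be shown with a small self-contained sketch. The classes below are hypothetical and only loosely modeled on the description above. The real interface is in :file:`Pipeline.hh` and differs in detail; for instance, this sketch takes ``const`` data for simplicity.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical base class; not qpdf's Pipeline.
class SketchPipeline
{
  public:
    explicit SketchPipeline(SketchPipeline* next = nullptr) :
        next(next)
    {
    }
    virtual ~SketchPipeline() = default;
    virtual void write(unsigned char const* data, std::size_t len) = 0;
    virtual void finish() = 0;

  protected:
    SketchPipeline* getNext() { return next; }

  private:
    SketchPipeline* next;
};

// End-of-the-line pipeline: stores its input in memory, has no successor.
class BufferSink: public SketchPipeline
{
  public:
    void write(unsigned char const* data, std::size_t len) override
    {
        contents.append(reinterpret_cast<char const*>(data), len);
    }
    void finish() override { finished = true; }

    std::string contents;
    bool finished = false;
};

// Filter: upper-cases ASCII letters and passes the result to its successor.
class UpperCaseFilter: public SketchPipeline
{
  public:
    explicit UpperCaseFilter(SketchPipeline* next) :
        SketchPipeline(next)
    {
        assert(next != nullptr); // a filter needs somewhere to write
    }
    void write(unsigned char const* data, std::size_t len) override
    {
        std::string out;
        out.reserve(len);
        for (std::size_t i = 0; i < len; ++i) {
            out.push_back(static_cast<char>(std::toupper(data[i])));
        }
        getNext()->write(reinterpret_cast<unsigned char const*>(out.data()), out.size());
    }
    void finish() override { getNext()->finish(); }
};
```

To use it, construct the sink first, pass its address to the filter, then call ``write`` and ``finish`` on the filter; the transformed data ends up in the sink's ``contents``.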
  672 +
  673 +.. _ref.object-accessors:
  674 +
  675 +Object Accessor Methods
  676 +-----------------------
  677 +
  678 +..
  679 + This section is referenced in QPDFObjectHandle.hh
  680 +
  681 +For general information about how to access instances of
  682 +``QPDFObjectHandle``, please see the comments in
  683 +:file:`QPDFObjectHandle.hh`. Search for "Accessor
  684 +methods". This section provides a more in-depth discussion of the
  685 +behavior and the rationale for the behavior.
  686 +
  687 +*Why were type errors made into warnings?* When type checks were
  688 +introduced into qpdf in the early days, it was expected that type errors
  689 +would only occur as a result of programmer error. However, in practice,
  690 +type errors would occur with malformed PDF files because of assumptions
  691 +made in code, including code within the qpdf library and code written by
  692 +library users. The most common case would be chaining calls to
  693 +``getKey()`` to access keys deep within a dictionary. In many cases,
  694 +qpdf would be able to recover from these situations, but the old
  695 +behavior often resulted in crashes rather than graceful recovery. For
  696 +this reason, the errors were changed to warnings.
  697 +
  698 +*Why even warn about type errors when the user can't usually do anything
  699 +about them?* Type warnings are extremely valuable during development.
  700 +Since it's impossible to catch at compile time things like typos in
  701 +dictionary key names or logic errors around what the structure of a PDF
  702 +file might be, the presence of type warnings can save lots of developer
  703 +time. They have also proven useful in exposing issues in qpdf itself
  704 +that would have otherwise gone undetected.
  705 +
  706 +*Can there be a type-safe ``QPDFObjectHandle``?* It would be great if
  707 +``QPDFObjectHandle`` could be more strongly typed so that you'd have to
  708 +check that something was of a particular type before calling
  709 +type-specific accessor methods. However, implementing this at this stage
  710 +of the library's history would be quite difficult, and it would make
  711 +the common pattern of drilling into an object no longer work. While it
  712 +would be possible to have a parallel interface, it would create a lot of
  713 +extra code. If qpdf were written in a language like Rust, an interface
  714 +like this would make a lot of sense, but, for a variety of reasons, the
  715 +qpdf API is consistent with other APIs of its time, relying on exception
  716 +handling to catch errors. The underlying PDF objects are inherently not
  717 +type-safe. Forcing stronger type safety in ``QPDFObjectHandle`` would
  718 +ultimately cause a lot more code to have to be written and would likely
  719 +make software that uses qpdf more brittle, and even so, checks would
  720 +have to occur at runtime.
  721 +
  722 +*Why do type errors sometimes raise exceptions?* The way warnings work
  723 +in qpdf requires a ``QPDF`` object to be associated with an object
  724 +handle for a warning to be issued. It would be nice if this could be
  725 +fixed, but it would require major changes to the API. Rather than
  726 +throwing away these conditions, we convert them to exceptions. It's not
  727 +that bad though. Since any object handle that was read from a file has
  728 +an associated ``QPDF`` object, it would only be type errors on objects
  729 +that were created explicitly that would cause exceptions, and in that
  730 +case, type errors are much more likely to be the result of a coding
  731 +error than invalid input.
  732 +
  733 +*Why does the behavior of a type exception differ between the C and C++
  734 +API?* There is no way to throw and catch exceptions in C short of
  735 +something like ``setjmp`` and ``longjmp``, and that approach is not
  736 +portable across language barriers. Since the C API is often used from
  737 +other languages, it's important to keep things as simple as possible.
  738 +Starting in qpdf 10.5, exceptions that used to crash code using the C
  739 +API will be written to stderr by default, and it is possible to register
  740 +an error handler. There's no reason that the error handler can't
  741 +simulate exception handling in some way, such as by using ``setjmp`` and
  742 +``longjmp`` or by setting some variable that can be checked after
  743 +library calls are made. In retrospect, it might have been better if the
  744 +C API object handle methods returned error codes like the other methods
  745 +and set return values in passed-in pointers, but this would complicate
  746 +both the implementation and the use of the library for a case that is
  747 +actually quite rare and largely avoidable.
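
As a rough illustration of the "set some variable that can be checked after library calls" idea described above, the following is a minimal, self-contained C sketch. It does not use qpdf's real C API: ``register_error_handler``, ``get_int_value``, and ``checked_get_int`` are hypothetical stand-ins invented for this example, and the real registration function and handler signature should be taken from qpdf's C API header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins: these are NOT qpdf's real C API names. */
typedef void (*error_handler_fn)(const char* message, void* data);

static error_handler_fn g_handler = NULL;
static void* g_handler_data = NULL;

/* Hypothetical registration call, standing in for the library's own. */
static void register_error_handler(error_handler_fn fn, void* data)
{
    g_handler = fn;
    g_handler_data = data;
}

/* Hypothetical accessor: on a type mismatch it reports through the
   registered handler and returns a fallback value instead of throwing. */
static int get_int_value(int is_integer, int value)
{
    if (!is_integer) {
        if (g_handler != NULL) {
            g_handler("integer accessor called on object of wrong type",
                      g_handler_data);
        }
        return 0;
    }
    return value;
}

/* Caller-side state that the handler fills in. */
struct error_state {
    int failed;
    char message[128];
};

/* Error handler: records the failure so the caller can check it later. */
static void record_error(const char* message, void* data)
{
    struct error_state* st = (struct error_state*)data;
    assert(st != NULL);
    st->failed = 1;
    strncpy(st->message, message, sizeof(st->message) - 1);
    st->message[sizeof(st->message) - 1] = '\0';
}

/* Wrapper that turns the handler-based report into a return code:
   0 on success, -1 if the handler was invoked during the call. */
static int checked_get_int(int is_integer, int value, int* out)
{
    struct error_state st = {0, ""};
    register_error_handler(record_error, &st);
    *out = get_int_value(is_integer, value);
    register_error_handler(NULL, NULL);
    return st.failed ? -1 : 0;
}
```

A caller would invoke ``checked_get_int`` and test the return code before trusting the output value. This gives the "error code plus passed-in pointer" shape mentioned in the text without changing the underlying accessor.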
manual/index.rst
@@ -9,6261 +9,16 @@ QPDF version |release| @@ -9,6261 +9,16 @@ QPDF version |release|
9 :maxdepth: 2 9 :maxdepth: 2
10 :caption: Contents: 10 :caption: Contents:
11 11
12 -.. _ref.overview:  
13 -  
14 -What is QPDF?  
15 -=============  
16 -  
17 -QPDF is a program and C++ library for structural, content-preserving  
18 -transformations on PDF files. QPDF's website is located at  
19 -https://qpdf.sourceforge.io/. QPDF's source code is hosted on github  
20 -at https://github.com/qpdf/qpdf.  
21 -  
22 -QPDF provides many useful capabilities to developers of PDF-producing  
23 -software or for people who just want to look at the innards of a PDF  
24 -file to learn more about how they work. With QPDF, it is possible to  
25 -copy objects from one PDF file into another and to manipulate the list  
26 -of pages in a PDF file. This makes it possible to merge and split PDF  
27 -files. The QPDF library also makes it possible for you to create PDF  
28 -files from scratch. In this mode, you are responsible for supplying  
29 -all the contents of the file, while the QPDF library takes care of all  
30 -the syntactical representation of the objects, creation of cross  
31 -references tables and, if you use them, object streams, encryption,  
32 -linearization, and other syntactic details. You are still responsible  
33 -for generating PDF content on your own.  
34 -  
35 -QPDF has been designed with very few external dependencies, and it is  
36 -intentionally very lightweight. QPDF is *not* a PDF content creation  
37 -library, a PDF viewer, or a program capable of converting PDF into other  
38 -formats. In particular, QPDF knows nothing about the semantics of PDF  
39 -content streams. If you are looking for something that can do that, you  
40 -should look elsewhere. However, once you have a valid PDF file, QPDF can  
41 -be used to transform that file in ways that perhaps your original PDF  
42 -creation tool can't handle. For example, many programs generate simple PDF  
43 -files but can't password-protect them, web-optimize them, or perform  
44 -other transformations of that type.  
45 -  
46 -.. _ref.license:  
47 -  
48 -License  
49 -=======  
50 -  
51 -QPDF is licensed under `the Apache License, Version 2.0  
52 -<http://www.apache.org/licenses/LICENSE-2.0>`__ (the "License").  
53 -Unless required by applicable law or agreed to in writing, software  
54 -distributed under the License is distributed on an "AS IS" BASIS,  
55 -WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or  
56 -implied. See the License for the specific language governing  
57 -permissions and limitations under the License.  
58 -  
59 -.. _ref.installing:  
60 -  
61 -Building and Installing QPDF  
62 -============================  
63 -  
64 -This chapter describes how to build and install qpdf. Please see also  
65 -the :file:`README.md` and  
66 -:file:`INSTALL` files in the source distribution.  
67 -  
68 -.. _ref.prerequisites:  
69 -  
70 -System Requirements  
71 --------------------  
72 -  
73 -The qpdf package has few external dependencies. In order to build qpdf,  
74 -the following packages are required:  
75 -  
76 -- A C++ compiler that supports C++-14.  
77 -  
78 -- zlib: http://www.zlib.net/  
79 -  
80 -- jpeg: http://www.ijg.org/files/ or https://libjpeg-turbo.org/  
81 -  
82 -- *Recommended but not required:* gnutls: https://www.gnutls.org/ to be  
83 - able to use the gnutls crypto provider, and/or openssl:  
84 - https://openssl.org/ to be able to use the openssl crypto provider.  
85 -  
86 -- gnu make 3.81 or newer: http://www.gnu.org/software/make  
87 -  
88 -- perl version 5.8 or newer: http://www.perl.org/; required for running  
89 - the test suite. Starting with qpdf version 9.1.1, perl is no longer  
90 - required at runtime.  
91 -  
92 -- GNU diffutils (any version): http://www.gnu.org/software/diffutils/  
93 - is required to run the test suite. Note that this is the version of  
94 - diff present on virtually all GNU/Linux systems. This is required  
95 - because the test suite uses :command:`diff -u`.  
96 -  
97 -Part of qpdf's test suite does comparisons of the contents of PDF files by  
98 -converting them to images and comparing the images. The image comparison  
99 -tests are disabled by default. Those tests are not required for  
100 -determining correctness of a qpdf build if you have not modified the  
101 -code since the test suite also contains expected output files that are  
102 -compared literally. The image comparison tests provide an extra check to  
103 -make sure that any content transformations don't break the rendering of  
104 -pages. Transformations that affect the content streams themselves are  
105 -off by default and are only provided to help developers look into the  
106 -contents of PDF files. If you are making deep changes to the library  
107 -that cause changes in the contents of the files that qpdf generates,  
108 -then you should enable the image comparison tests. Enable them by  
109 -running :command:`configure` with the  
110 -:samp:`--enable-test-compare-images` flag. If you enable  
111 -this, the following additional packages are required by the test  
112 -suite. Note that in no case are these items required to use qpdf.  
113 -  
114 -- libtiff: http://www.remotesensing.org/libtiff/  
115 -  
116 -- GhostScript version 8.60 or newer: http://www.ghostscript.com  
117 -  
118 -If you do not enable this, then you do not need to have tiff and  
119 -ghostscript.  
120 -  
121 -Pre-built documentation is distributed with qpdf, so you should  
122 -generally not need to rebuild the documentation. In order to build the  
123 -documentation from source, you need to install `Sphinx  
124 -<https://sphinx-doc.org>`__. To build the PDF version of the  
125 -documentation, you need `pdflatex`, `latexmk`, and a fairly complete  
126 -LaTeX installation. Detailed requirements can be found in the Sphinx  
127 -documentation.  
128 -  
129 -.. _ref.building:  
130 -  
131 -Build Instructions  
132 -------------------  
133 -  
134 -Building qpdf on UNIX is generally just a matter of running  
135 -  
136 -::  
137 -  
138 - ./configure  
139 - make  
140 -  
141 -You can also run :command:`make check` to run the test  
142 -suite and :command:`make install` to install. Please run  
143 -:command:`./configure --help` for options on what can be  
144 -configured. You can also set the value of ``DESTDIR`` during  
145 -installation to install to a temporary location, as is common with many  
146 -open source packages. Please see also the  
147 -:file:`README.md` and  
148 -:file:`INSTALL` files in the source distribution.  
149 -  
150 -Building on Windows is a little bit more complicated. For details,  
151 -please see :file:`README-windows.md` in the source  
152 -distribution. You can also download a binary distribution for Windows.  
153 -There is a port of qpdf to Visual C++ version 6 in the  
154 -:file:`contrib` area generously contributed by Jian  
155 -Ma. This is also discussed in more detail in  
156 -:file:`README-windows.md`.  
157 -  
158 -While ``wchar_t`` is part of the C++ standard, qpdf uses it in only one  
159 -place in the public API, and it's just in a helper function. It is  
160 -possible to build qpdf on a system that doesn't have ``wchar_t``, and  
161 -it's also possible to compile a program that uses qpdf on a system  
162 -without ``wchar_t`` as long as you don't call that one method. This is a  
163 -very unusual situation. For a detailed discussion, please see the  
164 -top-level README.md file in qpdf's source distribution.  
165 -  
166 -There are some other things you can do with the build. Although qpdf  
167 -uses :command:`autoconf`, it does not use  
168 -:command:`automake` but instead uses a  
169 -hand-crafted non-recursive Makefile that requires gnu make. If you're  
170 -really interested, please read the comments in the top-level  
171 -:file:`Makefile`.  
172 -  
173 -.. _ref.crypto:  
174 -  
175 -Crypto Providers  
176 -----------------  
177 -  
178 -Starting with qpdf 9.1.0, the qpdf library can be built with multiple  
179 -implementations of providers of cryptographic functions, which we refer  
180 -to as "crypto providers." At the time of writing, a crypto  
181 -implementation must provide MD5 and SHA2 (256, 384, and 512-bit) hashes  
182 -and RC4 and AES256 with and without CBC encryption. In the future, if  
183 -digital signature is added to qpdf, there may be additional requirements  
184 -beyond this.  
185 -  
186 -Starting with qpdf version 9.1.0, the available implementations are  
187 -``native`` and ``gnutls``. In qpdf 10.0.0, ``openssl`` was added.  
188 -Additional implementations may be added if needed. It is also possible  
189 -for a developer to provide their own implementation without modifying  
190 -the qpdf library.  
191 -  
192 -.. _ref.crypto.build:  
193 -  
194 -Build Support For Crypto Providers  
195 -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
196 -  
197 -When building with qpdf's build system, crypto providers can be enabled  
198 -at build time using various :command:`./configure`  
199 -options. The default behavior is for  
200 -:command:`./configure` to discover which crypto providers  
201 -can be supported based on available external libraries, to build all  
202 -available crypto providers, and to use an external provider as the  
203 -default over the native one. This behavior can be changed with the  
204 -following flags to :command:`./configure`:  
205 -  
206 -- :samp:`--enable-crypto-{x}`  
207 - (where :samp:`{x}` is a supported crypto  
208 - provider): enable the :samp:`{x}` crypto  
209 - provider, requiring any external dependencies it needs  
210 -  
211 -- :samp:`--disable-crypto-{x}`:  
212 - disable the :samp:`{x}` provider, and do not  
213 - link against its dependencies even if they are available  
214 -  
215 -- :samp:`--with-default-crypto={x}`:  
216 - make :samp:`{x}` the default provider even if  
217 - a higher priority one is available  
218 -  
219 -- :samp:`--disable-implicit-crypto`: only build crypto  
220 - providers that are explicitly requested with an  
221 - :samp:`--enable-crypto-{x}`  
222 - option  
223 -  
224 -For example, if you want to guarantee that the gnutls crypto provider is  
225 -used and that the native provider is not built, you could run  
226 -:command:`./configure --enable-crypto-gnutls  
227 ---disable-implicit-crypto`.  
228 -  
229 -If you build qpdf using your own build system, in order for qpdf to work  
230 -at all, you need to enable at least one crypto provider. The file  
231 -:file:`libqpdf/qpdf/qpdf-config.h.in` provides  
232 -macros ``DEFAULT_CRYPTO``, whose value must be a string naming the  
233 -default crypto provider, and various symbols starting with  
234 -``USE_CRYPTO_``, at least one of which has to be enabled. Additionally,  
235 -you must compile the source files that implement a crypto provider. To  
236 -get a list of those files, look at  
237 -:file:`libqpdf/build.mk`. If you want to omit a  
238 -particular crypto provider, as long as its ``USE_CRYPTO_`` symbol is  
239 -undefined, you can completely ignore the source files that belong to a  
240 -particular crypto provider. Additionally, crypto providers may have  
241 -their own external dependencies that can be omitted if the crypto  
242 -provider is not used. For example, if you are building qpdf yourself and  
243 -are using an environment that does not support gnutls or openssl, you  
244 -can ensure that ``USE_CRYPTO_NATIVE`` is defined, ``USE_CRYPTO_GNUTLS``  
245 -is not defined, and ``DEFAULT_CRYPTO`` is defined to ``"native"``. Then  
246 -you must include the source files used in the native implementation,  
247 -some of which were added or renamed from earlier versions, to your  
248 -build, and you can ignore  
249 -:file:`QPDFCrypto_gnutls.cc`. Always consult  
250 -:file:`libqpdf/build.mk` to get the list of source  
251 -files you need to build.  
252 -  
253 -.. _ref.crypto.runtime:  
254 -  
255 -Runtime Crypto Provider Selection  
256 -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
257 -  
258 -You can use the :samp:`--show-crypto` option to  
259 -:command:`qpdf` to get a list of available crypto  
260 -providers. The default provider is always listed first, and the rest are  
261 -listed in lexical order. Each crypto provider is listed on a line by  
262 -itself with no other text, enabling the output of this command to be  
263 -used easily in scripts.  
264 -  
265 -You can override which crypto provider is used by setting the  
266 -``QPDF_CRYPTO_PROVIDER`` environment variable. There are few reasons to  
267 -ever do this, but you might want to do it if you were explicitly trying  
268 -to compare behavior of two different crypto providers while testing  
269 -performance or reproducing a bug. It could also be useful for people who  
270 -are implementing their own crypto providers.  
271 -  
272 -.. _ref.crypto.develop:  
273 -  
274 -Crypto Provider Information for Developers  
275 -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
276 -  
277 -If you are writing code that uses libqpdf and you want to force a  
278 -certain crypto provider to be used, you can call the method  
279 -``QPDFCryptoProvider::setDefaultProvider``. The argument is the name of  
280 -a built-in or developer-supplied provider. To add your own crypto  
281 -provider, you have to create a class derived from ``QPDFCryptoImpl`` and  
282 -register it with ``QPDFCryptoProvider``. For additional information, see  
283 -comments in :file:`include/qpdf/QPDFCryptoImpl.hh`.  
284 -  
285 -.. _ref.crypto.design:  
286 -  
287 -Crypto Provider Design Notes  
288 -~~~~~~~~~~~~~~~~~~~~~~~~~~~~  
289 -  
290 -This section describes a few bits of rationale for why the crypto  
291 -provider interface was set up the way it was. You don't need to know any  
292 -of this information, but it's provided for the record and in case it's  
293 -interesting.  
294 -  
295 -As a general rule, I want to avoid as much as possible including large  
296 -blocks of code that are conditionally compiled such that, in most  
297 -builds, some code is never built. This is dangerous because it makes it  
298 -very easy for invalid code to creep in unnoticed. As such, I want it to  
299 -be possible to build qpdf with all available crypto providers, and this  
300 -is the way I build qpdf for local development. At the same time, if a  
301 -particular packager feels that it is a security liability for qpdf to  
302 -use crypto functionality from other than a library that gets  
303 -considerable scrutiny for this specific purpose (such as gnutls,  
304 -openssl, or nettle), then I want to give that packager the ability to  
305 -completely disable qpdf's native implementation. Or if someone wants to  
306 -avoid adding a dependency on one of the external crypto providers, I  
307 -don't want the availability of the provider to impose additional  
308 -external dependencies within that environment. Both of these are  
309 -situations that I know to be true for some users of qpdf.  
310 -  
311 -I want registration and selection of crypto providers to be thread-safe,  
312 -and I want it to work deterministically for a developer to provide their  
313 -own crypto provider and be able to set it up as the default. This was  
314 -the primary motivation behind requiring C++-11 as doing so enabled me to  
315 -exploit the guaranteed thread safety of local block static  
316 -initialization. The ``QPDFCryptoProvider`` class uses a singleton  
317 -pattern with thread-safe initialization to create the singleton instance  
318 -of ``QPDFCryptoProvider`` and exposes only static methods in its public  
319 -interface. In this way, if a developer wants to call any  
320 -``QPDFCryptoProvider`` methods, the library guarantees the  
321 -``QPDFCryptoProvider`` is fully initialized and all built-in crypto  
322 -providers are registered. Making ``QPDFCryptoProvider`` actually know  
323 -about all the built-in providers may seem a bit sad at first, but this  
324 -choice makes it extremely clear exactly what the initialization behavior  
325 -is. There's no question about provider implementations automatically  
326 -registering themselves in a nondeterministic order. It also means that  
327 -implementations do not need to know anything about the provider  
328 -interface, which makes them easier to test in isolation. Another  
329 -advantage of this approach is that a developer who wants to develop  
330 -their own crypto provider can do so in complete isolation from the qpdf  
331 -library and, with just two calls, can make qpdf use their provider in  
332 -their application. If they decided to contribute their code, plugging it  
333 -into the qpdf library would require a very small change to qpdf's source  
334 -code.  
335 -  
336 -The decision to make the crypto provider selectable at runtime was one I  
337 -struggled with a little, but I decided to do it for various reasons.  
338 -Allowing an end user to switch crypto providers easily could be very  
339 -useful for reproducing a potential bug. If a user reports a bug that  
340 -some cryptographic thing is broken, I can easily ask that person to try  
341 -with the ``QPDF_CRYPTO_PROVIDER`` variable set to different values. The  
342 -same could apply in the event of a performance problem. This also makes  
343 -it easier for qpdf's own test suite to exercise code with different  
344 -providers without having to make every program that links with qpdf  
345 -aware of the possibility of multiple providers. In qpdf's continuous  
346 -integration environment, the entire test suite is run for each supported  
347 -crypto provider. This is made simple by being able to select the  
348 -provider using an environment variable.  
349 -  
350 -Finally, making crypto providers selectable in this way establishes a  
351 -pattern that I may follow again in the future for stream filter  
352 -providers. One could imagine a future enhancement where someone could  
353 -provide their own implementations for basic filters like  
354 -``/FlateDecode`` or for other filters that qpdf doesn't support.  
355 -Implementing the registration functions and internal storage of  
356 -registered providers was also easier using C++-11's functional  
357 -interfaces, which was another reason to require C++-11 at this time.  
358 -  
359 -.. _ref.packaging:  
360 -  
361 -Notes for Packagers  
362 --------------------  
363 -  
364 -If you are packaging qpdf for an operating system distribution, here are  
365 -some things you may want to keep in mind:  
366 -  
367 -- Starting in qpdf version 9.1.1, qpdf no longer has a runtime  
368 - dependency on perl. This is because fix-qdf was rewritten in C++.  
369 - However, qpdf still has a build-time dependency on perl.  
370 -  
371 -- Make sure you are getting the intended behavior with regard to crypto  
372 - providers. Read :ref:`ref.crypto.build` for details.  
373 -  
374 -- Passing :samp:`--enable-show-failed-test-output` to  
375 - :command:`./configure` will cause any failed test  
376 - output to be written to the console. This can be very useful for  
377 - seeing test failures generated by autobuilders where you can't access  
378 - qtest.log after the fact.  
379 -  
380 -- If qpdf's build environment detects the presence of autoconf and  
381 - related tools, it will check to ensure that automatically generated  
382 - files are up-to-date with recorded checksums and fail if it detects a  
383 - discrepancy. This feature is intended to prevent you from  
384 - accidentally forgetting to regenerate automatic files after modifying  
385 - their sources. If your packaging environment automatically refreshes  
386 - automatic files, it can cause this check to fail. Suppress qpdf's  
387 - checks by passing :samp:`--disable-check-autofiles`  
388 - to :command:`./configure`. This is safe since qpdf's  
389 - :command:`autogen.sh` just runs autotools in the  
390 - normal way.  
391 -  
392 -- QPDF's :command:`make install` does not install  
393 - completion files by default, but as a packager, it's good if you  
394 - install them wherever your distribution expects such files to go. You  
395 - can find completion files to install in the  
396 - :file:`completions` directory.  
397 -  
398 -- Packagers are encouraged to install the source files from the  
399 - :file:`examples` directory along with qpdf  
400 - development packages.  
401 -  
402 -.. _ref.using:  
403 -  
404 -Running QPDF  
405 -============  
406 -  
407 -This chapter describes how to run the qpdf program from the command  
408 -line.  
409 -  
410 -.. _ref.invocation:  
411 -  
412 -Basic Invocation  
413 -----------------  
414 -  
415 -When running qpdf, the basic invocation is as follows:  
416 -  
417 -::  
418 -  
419 - qpdf [ options ] { infilename | --empty } outfilename  
420 -  
421 -This converts PDF file :samp:`infilename` to PDF file  
422 -:samp:`outfilename`. The output file is functionally  
423 -identical to the input file but may have been structurally reorganized.  
424 -Also, orphaned objects will be removed from the file. Many  
425 -transformations are available as controlled by the options below. In  
426 -place of :samp:`infilename`, the parameter  
427 -:samp:`--empty` may be specified. This causes qpdf to  
428 -use a dummy input file that contains zero pages. The only normal use  
429 -case for using :samp:`--empty` would be if you were  
430 -going to add pages from another source, as discussed in :ref:`ref.page-selection`.  
431 -  
432 -If :samp:`@filename` appears as a word anywhere in the  
433 -command-line, it will be read line by line, and each line will be  
434 -treated as a command-line argument. Leading and trailing whitespace is  
435 -intentionally not removed from lines, which makes it possible to handle  
436 -arguments that start or end with spaces. The :samp:`@-`  
437 -option allows arguments to be read from standard input. This allows qpdf  
438 -to be invoked with an arbitrary number of arbitrarily long arguments. It  
439 -is also very useful for avoiding having to pass passwords on the command  
440 -line. Note that the :samp:`@filename` can't appear in  
441 -the middle of an argument, so constructs such as  
442 -:samp:`--arg=@option` will not work. You would have to  
443 -include the argument and its options together in the arguments file.  
444 -  
445 -:samp:`outfilename` does not have to be seekable, even  
446 -when generating linearized files. Specifying ":samp:`-`"  
447 -as :samp:`outfilename` means to write to standard  
448 -output. If you want to overwrite the input file with the output, use the  
449 -option :samp:`--replace-input` and omit the output file  
450 -name. You can't specify the same file as both the input and the output.  
451 -If you do this, qpdf will tell you about the  
452 -:samp:`--replace-input` option.  
453 -  
454 -Most options require an output file, but some testing or inspection  
455 -commands do not. These are specifically noted.  
456 -  
457 -.. _ref.exit-status:  
458 -  
459 -Exit Status  
460 -~~~~~~~~~~~  
461 -  
462 -The exit status of :command:`qpdf` may be interpreted as  
463 -follows:  
464 -  
465 -- ``0``: no errors or warnings were found. The file may still have  
466 - problems qpdf can't detect. If  
467 - :samp:`--warning-exit-0` was specified, exit status 0  
468 - is used even if there are warnings.  
469 -  
470 -- ``2``: errors were found. qpdf was not able to fully process the  
471 - file.  
472 -  
473 -- ``3``: qpdf encountered problems that it was able to recover from. In  
474 - some cases, the resulting file may still be damaged. Note that qpdf  
475 - still exits with status ``3`` if it finds warnings even when  
476 - :samp:`--no-warn` is specified. With  
477 - :samp:`--warning-exit-0`, warnings without errors  
478 - exit with status 0 instead of 3.  
479 -  
480 -Note that :command:`qpdf` never exits with status ``1``.  
481 -If you get an exit status of ``1``, it was something else, like the  
482 -shell not being able to find or execute :command:`qpdf`.  
483 -  
484 -.. _ref.shell-completion:  
485 -  
486 -Shell Completion  
487 -----------------  
488 -  
489 -Starting in qpdf version 8.3.0, qpdf provides its own completion support  
490 -for zsh and bash. You can enable bash completion with :command:`eval  
491 -$(qpdf --completion-bash)` and zsh completion with  
492 -:command:`eval $(qpdf --completion-zsh)`. If  
493 -:command:`qpdf` is not in your path, you should invoke it  
494 -above with an absolute path. If you invoke it with a relative path, it  
495 -will warn you, and the completion won't work if you're in a different  
496 -directory.  
497 -  
498 -qpdf will use ``argv[0]`` to figure out where its executable is. This  
499 -may produce unwanted results in some cases, especially if you are trying  
500 -to use completion with a copy of qpdf that is built from source. You can  
501 -specify a full path to the qpdf you want to use for completion in the  
502 -``QPDF_EXECUTABLE`` environment variable.  
503 -  
504 -.. _ref.basic-options:  
505 -  
506 -Basic Options  
507 --------------  
508 -  
509 -The following options are the most common ones and perform commonly  
510 -needed transformations.  
511 -  
512 -:samp:`--help`  
513 - Display command-line invocation help.  
514 -  
515 -:samp:`--version`  
516 - Display the current version of qpdf.  
517 -  
518 -:samp:`--copyright`  
519 - Show detailed copyright information.  
520 -  
521 -:samp:`--show-crypto`  
522 - Show a list of available crypto providers, each on a line by itself.  
523 - The default provider is always listed first. See :ref:`ref.crypto` for more information about crypto  
524 - providers.  
525 -  
526 -:samp:`--completion-bash`  
527 - Output a completion command you can eval to enable shell completion  
528 - from bash.  
529 -  
530 -:samp:`--completion-zsh`  
531 - Output a completion command you can eval to enable shell completion  
532 - from zsh.  
533 -  
534 -:samp:`--password={password}`  
535 - Specifies a password for accessing encrypted files. To read the  
536 - password from a file or standard input, you can use  
537 - :samp:`--password-file`, added in qpdf 10.2. Note  
538 - that you can also use :samp:`@filename` or  
539 - :samp:`@-` as described above to put the password in  
540 - a file or pass it via standard input, but you would do so by  
541 - specifying the entire  
542 - :samp:`--password={password}`  
543 - option in the file. Syntax such as  
544 - :samp:`--password=@filename` won't work since  
545 - :samp:`@filename` is not recognized in the middle of  
546 - an argument.  
547 -  
548 -:samp:`--password-file={filename}`  
549 - Reads the first line from the specified file and uses it as the  
550 - password for accessing encrypted files.  
551 - :samp:`{filename}`  
552 - may be ``-`` to read the password from standard input. Note that, in  
553 - this case, the password is echoed and there is no prompt, so use with  
554 - caution.  
555 -  
556 -:samp:`--is-encrypted`  
557 - Silently exit with status 0 if the file is encrypted or status 2 if  
558 - the file is not encrypted. This is useful for shell scripts. Other  
559 - options are ignored if this is given. This option is mutually  
560 - exclusive with :samp:`--requires-password`. Both this  
561 - option and :samp:`--requires-password` exit with  
562 - status 2 for non-encrypted files.  
563 -  
564 -:samp:`--requires-password`  
565 - Silently exit with status 0 if a password (other than as supplied) is  
566 - required. Exit with status 2 if the file is not encrypted. Exit with  
567 - status 3 if the file is encrypted but requires no password or the  
568 - correct password has been supplied. This is useful for shell scripts.  
569 - Note that any supplied password is used when opening the file. When  
570 - used with a :samp:`--password` option, this option  
571 - can be used to check the correctness of the password. In that case,  
572 - an exit status of 3 means the file works with the supplied password.  
573 - This option is mutually exclusive with  
574 - :samp:`--is-encrypted`. Both this option and  
575 - :samp:`--is-encrypted` exit with status 2 for  
576 - non-encrypted files.  
577 -  
578 -:samp:`--verbose`  
579 - Increase verbosity of output. For now, this just prints some  
580 - indication of any file that it creates.  
581 -  
582 -:samp:`--progress`  
583 - Indicate progress while writing files.  
584 -  
585 -:samp:`--no-warn`  
586 - Suppress writing of warnings to stderr. If warnings were detected and  
587 - suppressed, :command:`qpdf` will still exit with exit  
588 - code 3. See also :samp:`--warning-exit-0`.  
589 -  
590 -:samp:`--warning-exit-0`  
591 - If warnings are found but no errors, exit with exit code 0 instead of 3.  
592 - When combined with :samp:`--no-warn`, the effect is  
593 - for :command:`qpdf` to completely ignore warnings.  
594 -  
595 -:samp:`--linearize`  
596 - Causes generation of a linearized (web-optimized) output file.  
597 -  
598 -:samp:`--replace-input`  
599 - If specified, the output file name should be omitted. This option  
600 - tells qpdf to replace the input file with the output. It does this by  
601 - writing to  
602 - :file:`{infilename}.~qpdf-temp#`  
603 - and, when done, overwriting the input file with the temporary file.  
604 - If there were any warnings, the original input is saved as  
605 - :file:`{infilename}.~qpdf-orig`.  
606 -  
607 -:samp:`--copy-encryption=file`  
608 - Encrypt the file using the same encryption parameters, including user  
609 - and owner password, as the specified file. Use  
610 - :samp:`--encryption-file-password` to specify a  
611 - password if one is needed to open this file. Note that copying the  
612 - encryption parameters from a file also copies the first half of  
613 - ``/ID`` from the file since this is part of the encryption  
614 - parameters.  
615 -  
616 -:samp:`--encryption-file-password=password`  
617 - If the file specified with :samp:`--copy-encryption`  
618 - requires a password, specify the password using this option. Note  
619 - that only one of the user or owner password is required. Both  
620 - passwords will be preserved since QPDF does not distinguish between  
621 - the two passwords. It is possible to preserve encryption parameters,  
622 - including the owner password, from a file even if you don't know the  
623 - file's owner password.  
624 -  
625 -:samp:`--allow-weak-crypto`  
626 - Starting with version 10.4, qpdf issues warnings when requested to  
627 - create files using RC4 encryption. This option suppresses those  
628 - warnings. In future versions of qpdf, qpdf will refuse to create  
629 - files with weak cryptography when this flag is not given. See :ref:`ref.weak-crypto` for additional details.  
630 -  
631 -:samp:`--encrypt options --`  
632 - Causes generation of an encrypted output file. Please see :ref:`ref.encryption-options` for details on how to specify
633 - encryption parameters.  
634 -  
635 -:samp:`--decrypt`  
636 - Removes any encryption on the file. A password must be supplied if  
637 - the file is password protected.  
638 -  
639 -:samp:`--password-is-hex-key`  
640 - Overrides the usual computation/retrieval of the PDF file's  
641 - encryption key from user/owner password with an explicit  
642 - specification of the encryption key. When this option is specified,  
643 - the argument to the :samp:`--password` option is  
644 - interpreted as a hexadecimal-encoded key value. This only applies to  
645 - the password used to open the main input file. It does not apply to  
646 - other files opened by :samp:`--pages` or other  
647 - options or to files being written.  
648 -  
649 - Most users will never have a need for this option, and no standard  
650 - viewers support this mode of operation, but it can be useful for  
651 - forensic or investigatory purposes. For example, if a PDF file is  
652 - encrypted with an unknown password, a brute-force attack using the  
653 - key directly is sometimes more efficient than one using the password.  
654 - Also, if a file is heavily damaged, it may be possible to derive the  
655 - encryption key and recover parts of the file using it directly. To  
656 - expose the encryption key used by an encrypted file that you can open  
657 - normally, use the :samp:`--show-encryption-key`  
658 - option.  
659 -  
660 -:samp:`--suppress-password-recovery`  
661 - Ordinarily, qpdf attempts to automatically compensate for passwords  
662 - specified in the wrong character encoding. This option suppresses  
663 - that behavior. Under normal conditions, there are no reasons to use  
664 - this option. See :ref:`ref.unicode-passwords` for a  
665 - discussion.
666 -  
667 -:samp:`--password-mode={mode}`  
668 - This option can be used to fine-tune how qpdf interprets Unicode  
669 - (non-ASCII) password strings passed on the command line. With the  
670 - exception of the :samp:`hex-bytes` mode, these only  
671 - apply to passwords provided when encrypting files. The  
672 - :samp:`hex-bytes` mode also applies to passwords  
673 - specified for reading files. For additional discussion of the  
674 - supported password modes and when you might want to use them, see  
675 - :ref:`ref.unicode-passwords`. The following modes  
676 - are supported:  
677 -  
678 - - :samp:`auto`: Automatically determine whether the  
679 - specified password is a properly encoded Unicode (UTF-8) string,  
680 - and transcode it as required by the PDF spec based on the type of
681 - encryption being applied. On Windows starting with version 8.4.0,  
682 - and on almost all other modern platforms, incoming passwords will  
683 - be properly encoded in UTF-8, so this is almost always what you  
684 - want.  
685 -  
686 - - :samp:`unicode`: Tells qpdf that the incoming  
687 - password is UTF-8, overriding whatever its automatic detection  
688 - determines. The only difference between this mode and  
689 - :samp:`auto` is that qpdf will fail with an error  
690 - message if the password is not valid UTF-8 instead of falling back  
691 - to :samp:`bytes` mode with a warning.  
692 -  
693 - - :samp:`bytes`: Interpret the password as a literal  
694 - byte string. For non-Windows platforms, this is what versions of  
695 - qpdf prior to 8.4.0 did. For Windows platforms, there is no way to  
696 - specify strings of binary data on the command line directly, but  
697 - you can use the :samp:`@filename` option to do it,  
698 - in which case this option forces qpdf to respect the string of  
699 - bytes as provided. This option will allow you to encrypt PDF files  
700 - with passwords that will not be usable by other readers.  
701 -  
702 - - :samp:`hex-bytes`: Interpret the password as a  
703 - hex-encoded string. This provides a way to pass binary data as a  
704 - password on all platforms including Windows. As with  
705 - :samp:`bytes`, this option may allow creation of  
706 - files that can't be opened by other readers. This mode affects  
707 - qpdf's interpretation of passwords specified for decrypting files  
708 - as well as for encrypting them. It makes it possible to specify  
709 - strings that are encoded in some manner other than the system's  
710 - default encoding.  
711 -  
712 -:samp:`--rotate=[+|-]angle[:page-range]`  
713 - Apply rotation to specified pages. The  
714 - :samp:`page-range` portion of the option value has  
715 - the same format as page ranges in :ref:`ref.page-selection`. If the page range is omitted, the  
716 - rotation is applied to all pages. The :samp:`angle`  
717 - portion of the parameter may be either 0, 90, 180, or 270. If  
718 - preceded by :samp:`+` or :samp:`-`,  
719 - the angle is added to or subtracted from the specified pages'  
720 - original rotations. This is almost always what you want. Otherwise  
721 - the pages' rotations are set to the exact value, which may cause the  
722 - appearances of the pages to be inconsistent, especially for scans.  
723 - For example, the command :command:`qpdf in.pdf out.pdf  
724 - --rotate=+90:2,4,6 --rotate=180:7-8` would rotate pages  
725 - 2, 4, and 6 90 degrees clockwise from their original rotation and  
726 - force the rotation of pages 7 through 8 to 180 degrees regardless of  
727 - their original rotation, and the command :command:`qpdf in.pdf  
728 - out.pdf --rotate=+180` would rotate all pages by 180  
729 - degrees.  
730 -  
731 -:samp:`--keep-files-open={[yn]}`  
732 - This option controls whether qpdf keeps individual files open while  
733 - merging. Prior to version 8.1.0, qpdf always kept all files open, but  
734 - this meant that the number of files that could be merged was limited  
735 - by the operating system's open file limit. Version 8.1.0 opened files  
736 - as they were referenced and closed them after each read, but this  
737 - caused a major performance impact. Version 8.2.0 optimized the  
738 - performance but did so in a way that, for local file systems, there  
739 - was a small but unavoidable performance hit, but for networked file  
740 - systems, the performance impact could be very high. Starting with  
741 - version 8.2.1, the default behavior is that files are kept open if no  
742 - more than 200 files are specified, but this default behavior can be  
743 - explicitly overridden with the  
744 - :samp:`--keep-files-open` flag. If you are merging  
745 - more than 200 files but fewer than the operating system's max open
746 - files limit, you may want to use  
747 - :samp:`--keep-files-open=y`, especially if working  
748 - over a networked file system. If you are using a local file system  
749 - where the overhead is low and you might sometimes merge more than the  
750 - OS limit's number of files from a script and are not worried about a  
751 - few seconds additional processing time, you may want to specify  
752 - :samp:`--keep-files-open=n`. The threshold for  
753 - switching may be changed from the default 200 with the  
754 - :samp:`--keep-files-open-threshold` option.  
755 -  
756 -:samp:`--keep-files-open-threshold={count}`  
757 - If specified, overrides the default value of 200 used as the  
758 - threshold for qpdf deciding whether or not to keep files open. See  
759 - :samp:`--keep-files-open` for details.  
760 -  
761 -:samp:`--pages options --`  
762 - Select specific pages from one or more input files. See :ref:`ref.page-selection` for details on how to do  
763 - page selection (splitting and merging).  
764 -  
765 -:samp:`--collate={n}`  
766 - When specified, collate rather than concatenate pages from files  
767 - specified with :samp:`--pages`. With a numeric  
768 - argument, collate in groups of :samp:`{n}`.  
769 - The default is 1. See :ref:`ref.page-selection` for additional details.  
770 -  
771 -:samp:`--flatten-rotation`  
772 - For each page that is rotated using the ``/Rotate`` key in the page's  
773 - dictionary, remove the ``/Rotate`` key and implement the identical  
774 - rotation semantics by modifying the page's contents. This option can  
775 - be useful to prepare files for buggy PDF applications that don't  
776 - properly handle rotated pages.  
777 -  
778 -:samp:`--split-pages=[n]`  
779 - Write each group of :samp:`n` pages to a separate  
780 - output file. If :samp:`n` is not specified, create  
781 - single pages. Output file names are generated as follows:  
782 -  
783 - - If the string ``%d`` appears in the output file name, it is  
784 - replaced with a range of zero-padded page numbers starting from 1.  
785 -  
786 - - Otherwise, if the output file name ends in  
787 - :file:`.pdf` (case insensitive), a zero-padded  
788 - page range, preceded by a dash, is inserted before the file  
789 - extension.  
790 -  
791 - - Otherwise, the file name is appended with a zero-padded page range  
792 - preceded by a dash.  
793 -  
794 - Page ranges are a single number in the case of single-page groups or  
795 - two numbers separated by a dash otherwise. For example, if  
796 - :file:`infile.pdf` has 12 pages  
797 -  
798 - - :command:`qpdf --split-pages infile.pdf %d-out`  
799 - would generate files :file:`01-out` through  
800 - :file:`12-out`  
801 -  
802 - - :command:`qpdf --split-pages=2 infile.pdf  
803 - outfile.pdf` would generate files  
804 - :file:`outfile-01-02.pdf` through  
805 - :file:`outfile-11-12.pdf`  
806 -  
807 - - :command:`qpdf --split-pages infile.pdf  
808 - something.else` would generate files  
809 - :file:`something.else-01` through  
810 - :file:`something.else-12`  
811 -  
812 - Note that outlines, threads, and other global features of the  
813 - original PDF file are not preserved. For each page of output, this  
814 - option creates an empty PDF and copies a single page from the output  
815 - into it. If you require the global data, you will have to run  
816 - :command:`qpdf` with the  
817 - :samp:`--pages` option once for each file. Using  
818 - :samp:`--split-pages` is much faster if you don't  
819 - require the global data.  
820 -  
821 -:samp:`--overlay options --`  
822 - Overlay pages from another file onto the output pages. See :ref:`ref.overlay-underlay` for details on  
823 - overlay/underlay.  
824 -  
825 -:samp:`--underlay options --`  
826 - Underlay pages from another file beneath the output pages. See :ref:`ref.overlay-underlay` for details on
827 - overlay/underlay.  
828 -  
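The output file naming rules listed under :samp:`--split-pages` above can be modeled with a short Python sketch. This is an illustration of the documented rules, not qpdf's own code. The function name ``split_output_names`` is hypothetical, and two details are assumptions: that the zero-padding width equals the number of digits in the total page count, and that a short final group is still written as a two-number range.

```python
def split_output_names(pattern: str, total_pages: int, group_size: int = 1) -> list[str]:
    """Model the documented --split-pages naming rules (hypothetical helper)."""
    # Assumption: pad to the number of digits in the total page count.
    width = len(str(total_pages))
    names = []
    for first in range(1, total_pages + 1, group_size):
        last = min(first + group_size - 1, total_pages)
        page_range = f"{first:0{width}d}"
        # Single-page groups use one number; larger groups use "first-last".
        if group_size > 1:
            page_range += f"-{last:0{width}d}"
        if "%d" in pattern:
            # Rule 1: %d is replaced with the page range.
            name = pattern.replace("%d", page_range)
        elif pattern.lower().endswith(".pdf"):
            # Rule 2: insert "-range" before a case-insensitive .pdf extension.
            name = f"{pattern[:-4]}-{page_range}{pattern[-4:]}"
        else:
            # Rule 3: append "-range" to the file name.
            name = f"{pattern}-{page_range}"
        names.append(name)
    return names
```

Calling ``split_output_names("outfile.pdf", 12, 2)`` reproduces the second example above, from :file:`outfile-01-02.pdf` through :file:`outfile-11-12.pdf`.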
829 -Password-protected files may be opened by specifying a password. By  
830 -default, qpdf will preserve any encryption data associated with a file.  
831 -If :samp:`--decrypt` is specified, qpdf will attempt to  
832 -remove any encryption information. If :samp:`--encrypt`  
833 -is specified, qpdf will replace the document's encryption parameters  
834 -with whatever is specified.  
835 -  
836 -Note that qpdf does not obey encryption restrictions already imposed on  
837 -the file. Doing so would be meaningless since qpdf can be used to remove  
838 -encryption from the file entirely. This functionality is not intended to  
839 -be used for bypassing copyright restrictions or other restrictions  
840 -placed on files by their producers.  
841 -  
842 -Prior to 8.4.0, in the case of passwords that contain characters that  
843 -fall outside of 7-bit US-ASCII, qpdf left the burden of supplying  
844 -properly encoded encryption and decryption passwords to the user.  
845 -Starting in qpdf 8.4.0, qpdf does this automatically in most cases. For  
846 -an in-depth discussion, please see :ref:`ref.unicode-passwords`. Previous versions of this manual  
847 -described workarounds using the :command:`iconv` command.  
848 -Such workarounds are no longer required or recommended with qpdf 8.4.0.  
849 -However, for backward compatibility, qpdf attempts to detect those  
850 -workarounds and do the right thing in most cases.  
851 -  
852 -.. _ref.encryption-options:  
853 -  
854 -Encryption Options  
855 -------------------  
856 -  
857 -To change the encryption parameters of a file, use the --encrypt flag.  
858 -The syntax is  
859 -  
860 -::  
861 -  
862 - --encrypt user-password owner-password key-length [ restrictions ] --  
863 -  
864 -Note that ":samp:`--`" terminates parsing of encryption  
865 -flags and must be present even if no restrictions are present.  
866 -  
867 -Either or both of the user password and the owner password may be empty  
868 -strings. Starting in qpdf 10.2, qpdf defaults to not allowing creation  
869 -of PDF files with a non-empty user password, an empty owner password,  
870 -and a 256-bit key since such files can be opened with no password. If  
871 -you want to create such files, specify the encryption option  
872 -:samp:`--allow-insecure`, as described below.  
873 -  
874 -The value for  
875 -:samp:`{key-length}` may  
876 -be 40, 128, or 256. The restriction flags are dependent upon key length.  
877 -When no additional restrictions are given, the default is to be fully  
878 -permissive.  
879 -  
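When qpdf is driven from a script, the :samp:`--encrypt` argument group can be assembled programmatically. The following Python sketch is a hypothetical helper (``build_encrypt_args`` is not part of qpdf). It only builds an argument list and applies two rules stated above: the key length must be 40, 128, or 256, and a non-empty user password with an empty owner password and a 256-bit key requires :samp:`--allow-insecure`. It does not check that the restriction flags are valid for the chosen key length.

```python
def build_encrypt_args(user_password, owner_password, key_length,
                       restrictions=(), allow_insecure=False):
    """Build the '--encrypt ... --' argument group (hypothetical helper)."""
    if key_length not in (40, 128, 256):
        raise ValueError("key length must be 40, 128, or 256")
    # Combination that qpdf 10.2 and later refuses by default.
    insecure = user_password != "" and owner_password == "" and key_length == 256
    if insecure and not allow_insecure:
        raise ValueError("this combination requires --allow-insecure")
    args = ["--encrypt", user_password, owner_password, str(key_length)]
    args.extend(restrictions)
    if insecure:
        args.append("--allow-insecure")
    # The terminating "--" is required even when no restrictions are given.
    args.append("--")
    return args
```

The resulting list can be spliced into a full command line, for example ``["qpdf", "in.pdf", *build_encrypt_args("", "owner", 256), "out.pdf"]``, where the file names and password are placeholders.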
880 -If :samp:`{key-length}`  
881 -is 40, the following restriction options are available:  
882 -  
883 -:samp:`--print=[yn]`  
884 - Determines whether or not to allow printing.  
885 -  
886 -:samp:`--modify=[yn]`  
887 - Determines whether or not to allow document modification.  
888 -  
889 -:samp:`--extract=[yn]`  
890 - Determines whether or not to allow text/image extraction.  
891 -  
892 -:samp:`--annotate=[yn]`  
893 - Determines whether or not to allow comments and form fill-in and  
894 - signing.  
895 -  
896 -If :samp:`{key-length}`  
897 -is 128, the following restriction options are available:  
898 -  
899 -:samp:`--accessibility=[yn]`  
900 - Determines whether or not to allow accessibility to the visually
901 - impaired. The qpdf library disregards this field when AES is used or  
902 - when 256-bit encryption is used. You should really never disable  
903 - accessibility, but qpdf lets you do it in case you need to configure  
904 - a file this way for testing purposes. The PDF spec says that  
905 - conforming readers should disregard this permission and always allow  
906 - accessibility.  
907 -  
908 -:samp:`--extract=[yn]`  
909 - Determines whether or not to allow text/graphic extraction.  
910 -  
911 -:samp:`--assemble=[yn]`  
912 - Determines whether document assembly (rotation and reordering of  
913 - pages) is allowed.  
914 -  
915 -:samp:`--annotate=[yn]`  
916 - Determines whether modifying annotations is allowed. This includes  
917 - adding comments and filling in form fields. Also allows editing of  
918 - form fields if :samp:`--modify-other=y` is given.  
919 -  
920 -:samp:`--form=[yn]`  
921 - Determines whether filling form fields is allowed.  
922 -  
923 -:samp:`--modify-other=[yn]`  
924 - Allow all document editing except those controlled separately by the  
925 - :samp:`--assemble`,  
926 - :samp:`--annotate`, and  
927 - :samp:`--form` options.  
928 -  
929 -:samp:`--print={print-opt}`  
930 - Controls printing access.  
931 - :samp:`{print-opt}`  
932 - may be one of the following:  
933 -  
934 - - :samp:`full`: allow full printing  
935 -  
936 - - :samp:`low`: allow low-resolution printing only  
937 -  
938 - - :samp:`none`: disallow printing  
939 -  
940 -:samp:`--modify={modify-opt}`  
941 - Controls modify access. This way of controlling modify access has  
942 - less granularity than new options added in qpdf 8.4.  
943 - :samp:`{modify-opt}`  
944 - may be one of the following:  
945 -  
946 - - :samp:`all`: allow full document modification  
947 -  
948 - - :samp:`annotate`: allow comment authoring, form  
949 - operations, and document assembly  
950 -  
951 - - :samp:`form`: allow form field fill-in and signing  
952 - and document assembly  
953 -  
954 - - :samp:`assembly`: allow document assembly only  
955 -  
956 - - :samp:`none`: allow no modifications  
957 -  
958 - Using the :samp:`--modify` option does not allow you  
959 - to create certain combinations of permissions such as allowing form  
960 - filling but not allowing document assembly. Starting with qpdf 8.4,  
961 - you can either just use the other options to control fields  
962 - individually, or you can use something like :samp:`--modify=form  
963 - --assemble=n` to fine-tune.
964 -  
965 -:samp:`--cleartext-metadata`  
966 - If specified, any metadata stream in the document will be left  
967 - unencrypted even if the rest of the document is encrypted. This also  
968 - forces the PDF version to be at least 1.5.  
969 -  
970 -:samp:`--use-aes=[yn]`  
971 - If :samp:`--use-aes=y` is specified, AES encryption  
972 - will be used instead of RC4 encryption. This forces the PDF version  
973 - to be at least 1.6.  
974 -  
975 -:samp:`--allow-insecure`  
976 - From qpdf 10.2, qpdf defaults to not allowing creation of PDF files  
977 - where the user password is non-empty, the owner password is empty,  
978 - and a 256-bit key is in use. Files created in this way are insecure  
979 - since they can be opened without a password. Users would ordinarily  
980 - never want to create such files. If you are using qpdf to  
981 - intentionally create strange files for testing (a definitely valid use
982 - of qpdf!), this option allows you to create such insecure files.  
983 -  
984 -:samp:`--force-V4`  
985 - Use of this option forces the ``/V`` and ``/R`` parameters in the  
986 - document's encryption dictionary to be set to the value ``4``. As  
987 - qpdf will automatically do this when required, there is no reason to  
988 - ever use this option. It exists primarily for use in testing qpdf  
989 - itself. This option also forces the PDF version to be at least 1.5.  
990 -  
991 -If :samp:`{key-length}`  
992 -is 256, the minimum PDF version is 1.7 with extension level 8, and the  
993 -AES-based encryption format used is the PDF 2.0 encryption method  
994 - supported by Acrobat X. The same options are available as with 128 bits
995 -with the following exceptions:  
996 -  
997 -:samp:`--use-aes`  
998 - This option is not available with 256-bit keys. AES is always used  
999 - with 256-bit encryption keys.  
1000 -  
1001 -:samp:`--force-V4`  
1002 - This option is not available with 256-bit keys.
1003 -  
1004 -:samp:`--force-R5`  
1005 - If specified, qpdf sets the minimum version to 1.7 at extension level  
1006 - 3 and writes the deprecated encryption format used by Acrobat version  
1007 - IX. This option should not be used in practice to generate PDF files  
1008 - that will be in general use, but it can be useful to generate files  
1009 - if you are trying to test proper support in another application for  
1010 - PDF files encrypted in this way.  
1011 -  
1012 -The default for each permission option is to be fully permissive.  
1013 -  
1014 -.. _ref.page-selection:  
1015 -  
1016 -Page Selection Options  
1017 -----------------------  
1018 -  
1019 -Starting with qpdf 3.0, it is possible to split and merge PDF files by  
1020 -selecting pages from one or more input files. Whatever file is given as  
1021 -the primary input file is used as the starting point, but its pages are  
1022 -replaced with pages as specified.  
1023 -  
1024 -::  
1025 -  
1026 - --pages input-file [ --password=password ] [ page-range ] [ ... ] --  
1027 -  
1028 -Multiple input files may be specified. Each one is given as the name of  
1029 -the input file, an optional password (if required to open the file), and  
1030 -the range of pages. Note that ":samp:`--`" terminates  
1031 -parsing of page selection flags.  
1032 -  
1033 -Starting with qpdf 8.4, the special input file name
1034 -":file:`.`" can be used as a shortcut for the  
1035 -primary input filename.  
1036 -  
1037 -For each file that pages should be taken from, specify the file, a  
1038 -password needed to open the file (if any), and a page range. The  
1039 -password needs to be given only once per file. If any of the input files  
1040 -are the same as the primary input file or the file used to copy  
1041 -encryption parameters (if specified), you do not need to repeat the  
1042 -password here. The same file can be repeated multiple times. If a file  
1043 -that is repeated has a password, the password only has to be given the  
1044 -first time. All non-page data (info, outlines, page numbers, etc.) are  
1045 -taken from the primary input file. To discard these, use  
1046 -:samp:`--empty` as the primary input.  
1047 -  
1048 -Starting with qpdf 5.0.0, it is possible to omit the page range. If qpdf  
1049 -sees a value in the place where it expects a page range and that value  
1050 -is not a valid range but is a valid file name, qpdf will implicitly use  
1051 -the range ``1-z``, meaning that it will include all pages in the file.  
1052 -This makes it possible to easily combine all pages in a set of files  
1053 -with a command like :command:`qpdf --empty out.pdf --pages \*.pdf  
1054 ---`.  
1055 -  
1056 -The page range is a set of numbers separated by commas, ranges of  
1057 -numbers separated by dashes, or combinations of those. The character "z"
1058 -represents the last page. A number preceded by an "r" indicates to count  
1059 -from the end, so ``r3-r1`` would be the last three pages of the  
1060 -document. Pages can appear in any order. Ranges can appear with a high  
1061 -number followed by a low number, which causes the pages to appear in  
1062 -reverse. Numbers may be repeated in a page range. A page range may be  
1063 -optionally appended with ``:even`` or ``:odd`` to indicate only the even  
1064 -or odd pages in the given range. Note that even and odd refer to the  
1065 -positions within the specified range, not whether the original number
1066 -is even or odd.  
1067 -  
1068 -Example page ranges:  
1069 -  
1070 -- ``1,3,5-9,15-12``: pages 1, 3, 5, 6, 7, 8, 9, 15, 14, 13, and 12 in  
1071 - that order.  
1072 -  
1073 -- ``z-1``: all pages in the document in reverse  
1074 -  
1075 -- ``r3-r1``: the last three pages of the document  
1076 -  
1077 -- ``r1-r3``: the last three pages of the document in reverse order  
1078 -  
1079 -- ``1-20:even``: even pages from 2 to 20  
1080 -  
1081 -- ``5,7-9,12:odd``: pages 5, 8, and 12, which are the pages in odd
1082 - positions from among the original range, which represents pages 5, 7,  
1083 - 8, 9, and 12.  
1084 -  
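The page range syntax described above can be modeled in a few lines of Python. This is a minimal sketch of the documented semantics, not qpdf's parser. The function name ``parse_page_range`` is hypothetical, and the error handling (raising ``ValueError`` for out-of-range pages) is an assumption.

```python
def parse_page_range(spec: str, num_pages: int) -> list[int]:
    """Expand a page range such as '1,3,5-9,15-12' into page numbers."""
    body, _, parity = spec.partition(":")
    if parity not in ("", "even", "odd"):
        raise ValueError(f"unknown suffix: {parity}")

    def page_number(token: str) -> int:
        if token == "z":                      # "z" is the last page
            return num_pages
        if token.startswith("r"):             # "rN" counts from the end
            number = num_pages + 1 - int(token[1:])
        else:
            number = int(token)
        if not 1 <= number <= num_pages:
            raise ValueError(f"page out of range: {token}")
        return number

    pages = []
    for part in body.split(","):
        first, dash, last = part.partition("-")
        start = page_number(first)
        end = page_number(last) if dash else start
        step = 1 if end >= start else -1      # high-to-low ranges run in reverse
        pages.extend(range(start, end + step, step))

    # :odd and :even select by position within the expanded range,
    # not by the page number itself.
    if parity == "odd":
        pages = pages[0::2]
    elif parity == "even":
        pages = pages[1::2]
    return pages
```

Applying the sketch to ``5,7-9,12:odd`` in a 12-page document first expands the range to pages 5, 7, 8, 9, and 12, then keeps the first, third, and fifth entries.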
1085 -Starting in qpdf version 8.3, you can specify the  
1086 -:samp:`--collate` option. Note that this option is  
1087 -specified outside of :samp:`--pages ... --`. When
1088 -:samp:`--collate` is specified, it changes the meaning  
1089 -of :samp:`--pages` so that the specified files, as  
1090 -modified by page ranges, are collated rather than concatenated. For  
1091 -example, if you add the files :file:`odd.pdf` and  
1092 -:file:`even.pdf` containing odd and even pages of a  
1093 -document respectively, you could run :command:`qpdf --collate odd.pdf  
1094 ---pages odd.pdf even.pdf -- all.pdf` to collate the pages.  
1095 -This would pick page 1 from odd, page 1 from even, page 2 from odd, page  
1096 -2 from even, etc. until all pages have been included. Any number of  
1097 -files and page ranges can be specified. If any file has fewer pages,  
1098 -that file is just skipped when its pages have all been included. For  
1099 -example, if you ran :command:`qpdf --collate --empty --pages a.pdf  
1100 -1-5 b.pdf 6-4 c.pdf r1 -- out.pdf`, you would get the  
1101 -following pages in this order:  
1102 -  
1103 -- a.pdf page 1  
1104 -  
1105 -- b.pdf page 6  
1106 -  
1107 -- c.pdf last page  
1108 -  
1109 -- a.pdf page 2  
1110 -  
1111 -- b.pdf page 5  
1112 -  
1113 -- a.pdf page 3  
1114 -  
1115 -- b.pdf page 4  
1116 -  
1117 -- a.pdf page 4  
1118 -  
1119 -- a.pdf page 5  
1120 -  
1121 -Starting in qpdf version 10.2, you may specify a numeric argument to  
1122 -:samp:`--collate`. With  
1123 -:samp:`--collate={n}`,  
1124 -pull groups of :samp:`{n}` pages from each file,  
1125 -again, stopping when there are no more pages. For example, if you ran  
1126 -:command:`qpdf --collate=2 --empty --pages a.pdf 1-5 b.pdf 6-4 c.pdf  
1127 -r1 -- out.pdf`, you would get the following pages in this  
1128 -order:  
1129 -  
1130 -- a.pdf page 1  
1131 -  
1132 -- a.pdf page 2  
1133 -  
1134 -- b.pdf page 6  
1135 -  
1136 -- b.pdf page 5  
1137 -  
1138 -- c.pdf last page  
1139 -  
1140 -- a.pdf page 3  
1141 -  
1142 -- a.pdf page 4  
1143 -  
1144 -- b.pdf page 4  
1145 -  
1146 -- a.pdf page 5  
1147 -  
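Both collation orders shown above follow from one rule: take up to *n* pages from each file in turn and skip files whose pages are used up. The Python sketch below models that rule. It is an illustration of the documented behavior, not qpdf's implementation, and the function name ``collate`` and the ``(file, page)`` tuples are hypothetical.

```python
def collate(page_lists, n=1):
    """Interleave page lists in groups of n, skipping exhausted lists."""
    result = []
    longest = max((len(pages) for pages in page_lists), default=0)
    offset = 0
    while offset < longest:
        for pages in page_lists:
            # An empty slice means this file has no pages left.
            result.extend(pages[offset:offset + n])
        offset += n
    return result


# Page lists after applying the ranges "1-5", "6-4", and "r1".
a = [("a.pdf", p) for p in (1, 2, 3, 4, 5)]
b = [("b.pdf", p) for p in (6, 5, 4)]
c = [("c.pdf", "last")]

for entry in collate([a, b, c], n=2):
    print(entry)
```

With ``n=1`` the same inputs give the order listed for plain :samp:`--collate`.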
1148 -Starting in qpdf version 8.3, when you split and merge files, any page  
1149 -labels (page numbers) are preserved in the final file. It is expected  
1150 -that more document features will be preserved by splitting and merging.  
1151 -In the meantime, semantics of splitting and merging vary across
1152 -features. For example, the document's outlines (bookmarks) point to  
1153 -actual page objects, so if you select some pages and not others,  
1154 -bookmarks that point to pages that are in the output file will work, and  
1155 -remaining bookmarks will not work. A future version of  
1156 -:command:`qpdf` may do a better job at handling these  
1157 -issues. (Note that the qpdf library already contains all of the APIs  
1158 -required in order to implement this in your own application if you need  
1159 -it.) In the meantime, you can always use
1160 -:samp:`--empty` as the primary input file to avoid  
1161 -copying all of that from the first file. For example, to take pages 1  
1162 -through 5 from :file:`infile.pdf` while preserving
1163 -all metadata associated with that file, you could use  
1164 -  
1165 -::  
1166 -  
1167 - qpdf infile.pdf --pages . 1-5 -- outfile.pdf  
1168 -  
1169 -If you wanted pages 1 through 5 from  
1170 -:file:`infile.pdf` but you wanted the rest of the  
1171 -metadata to be dropped, you could instead run  
1172 -  
1173 -::  
1174 -  
1175 - qpdf --empty --pages infile.pdf 1-5 -- outfile.pdf  
1176 -  
1177 -If you wanted to take pages 1 through 5 from  
1178 -:file:`file1.pdf` and pages 11 through 15 from  
1179 -:file:`file2.pdf` in reverse, taking document-level  
1180 -metadata from :file:`file2.pdf`, you would run  
1181 -  
1182 -::  
1183 -  
1184 - qpdf file2.pdf --pages file1.pdf 1-5 . 15-11 -- outfile.pdf  
1185 -  
1186 -If, for some reason, you wanted to take the first page of an encrypted  
1187 -file called :file:`encrypted.pdf` with password  
1188 -``pass`` and repeat it twice in an output file, and if you wanted to  
1189 -drop document-level metadata but preserve encryption, you would use  
1190 -  
1191 -::  
1192 -  
1193 - qpdf --empty --copy-encryption=encrypted.pdf --encryption-file-password=pass \
1194 - --pages encrypted.pdf --password=pass 1 ./encrypted.pdf --password=pass 1 -- \
1195 - outfile.pdf
1196 -  
1197 -Note that we had to specify the password all three times because giving  
1198 -a password as :samp:`--encryption-file-password` doesn't  
1199 -count for page selection, and as far as qpdf is concerned,  
1200 -:file:`encrypted.pdf` and  
1201 -:file:`./encrypted.pdf` are separate files. These
1202 -are all corner cases that most users should hopefully never have to be  
1203 -bothered with.  
1204 -  
1205 -Prior to version 8.4, it was not possible to specify the same page from  
1206 -the same file directly more than once, and the workaround of specifying  
1207 -the same file in more than one way was required. Version 8.4 removes  
1208 -this limitation, but there is still a valid use case. When you specify  
1209 -the same page from the same file more than once, qpdf will share objects  
1210 -between the pages. If you are going to do further manipulation on the  
1211 -file and need the two instances of the same original page to be deep  
1212 -copies, then you can specify the file in two different ways. For example  
1213 -:command:`qpdf in.pdf --pages . 1 ./in.pdf 1 -- out.pdf`  
1214 -would create a file with two copies of the first page of the input, and  
1215 -the two copies would not share any objects. This includes fonts,
1216 -images, and anything else the page references.  
1217 -  
1218 -.. _ref.overlay-underlay:  
1219 -  
1220 -Overlay and Underlay Options  
1221 -----------------------------  
1222 -  
1223 -Starting with qpdf 8.4, it is possible to overlay or underlay pages from  
1224 -other files onto the output generated by qpdf. Specify overlay or  
1225 -underlay as follows:  
1226 -  
1227 -::  
1228 -  
1229 - { --overlay | --underlay } file [ options ] --  
1230 -  
1231 -Overlay and underlay options are processed late, so they can be combined  
1232 -with other operations like merging and will apply to the final output. The
1233 -:samp:`--overlay` and :samp:`--underlay`  
1234 -options work the same way, except underlay pages are drawn underneath  
1235 -the page to which they are applied, possibly obscured by the original  
1236 -page, and overlay pages are drawn on top of the page to which they are
1237 -applied, possibly obscuring the page. You can combine overlay and  
1238 -underlay.  
1239 -  
1240 -The default behavior of overlay and underlay is that pages are taken  
1241 -from the overlay/underlay file in sequence and applied to corresponding  
1242 -pages in the output until there are no more output pages. If the overlay  
1243 -or underlay file runs out of pages, remaining output pages are left  
1244 -alone. This behavior can be modified by options, which are provided  
1245 -between the :samp:`--overlay` or  
1246 -:samp:`--underlay` flag and the  
1247 -:samp:`--` option. The following options are supported:  
1248 -  
1249 -- :samp:`--password=password`: supply a password if the  
1250 - overlay/underlay file is encrypted.  
1251 -  
1252 -- :samp:`--to=page-range`: a range of pages in the same  
1253 - format described in :ref:`ref.page-selection`
1254 - indicates which pages in the output should have the overlay/underlay  
1255 - applied. If not specified, overlay/underlay are applied to all pages.  
1256 -  
1257 -- :samp:`--from=[page-range]`: a range of pages that  
1258 - specifies which pages in the overlay/underlay file will be used for  
1259 - overlay or underlay. If not specified, all pages will be used. This  
1260 - can be explicitly specified to be empty if  
1261 - :samp:`--repeat` is used.  
1262 -  
1263 -- :samp:`--repeat=page-range`: an optional range of  
1264 - pages that specifies which pages in the overlay/underlay file will be  
1265 - repeated after the "from" pages are used up. If you want to repeat a  
1266 - range of pages starting at the beginning, you can explicitly use  
1267 - :samp:`--from=`.  
1268 -  
1269 -Here are some examples.  
1270 -  
1271 -- :command:`--overlay o.pdf --to=1-5 --from=1-3 --repeat=4  
1272 - --`: overlay the first three pages from file  
1273 - :file:`o.pdf` onto the first three pages of the  
1274 - output, then overlay page 4 from :file:`o.pdf`  
1275 - onto pages 4 and 5 of the output. Leave remaining output pages  
1276 - untouched.  
1277 -  
1278 -- :command:`--underlay footer.pdf --from= --repeat=1,2  
1279 - --`: Underlay page 1 of  
1280 - :file:`footer.pdf` on all odd output pages, and  
1281 - underlay page 2 of :file:`footer.pdf` on all even  
1282 - output pages.  
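
The interaction of :samp:`--to`, :samp:`--from`, and :samp:`--repeat` in the two examples above can be modeled with a short Python sketch. This is only an illustration of the behavior as described in this section, not qpdf's implementation. The function name ``overlay_plan`` and its parameters are hypothetical, and page ranges are assumed to be already expanded into lists of 1-based page numbers.

```python
def overlay_plan(to_pages, from_pages, repeat_pages=()):
    """Map each output page to the overlay/underlay page drawn on it.

    Hypothetical model of the documented behavior: "from" pages are
    consumed in order, then "repeat" pages are cycled; output pages
    with no source page left are omitted (left untouched).
    """
    plan = {}
    for index, out_page in enumerate(to_pages):
        if index < len(from_pages):
            # Still inside the "from" range: apply pages in sequence.
            plan[out_page] = from_pages[index]
        elif repeat_pages:
            # "from" pages are used up: cycle through the repeat range.
            cycle_pos = (index - len(from_pages)) % len(repeat_pages)
            plan[out_page] = repeat_pages[cycle_pos]
    return plan


# --overlay o.pdf --to=1-5 --from=1-3 --repeat=4 --
first = overlay_plan([1, 2, 3, 4, 5], [1, 2, 3], [4])

# --underlay footer.pdf --from= --repeat=1,2 --  (six-page output assumed)
second = overlay_plan([1, 2, 3, 4, 5, 6], [], [1, 2])
```

Under these assumptions, ``first`` sends overlay page 4 to output pages 4 and 5, and ``second`` alternates footer pages 1 and 2 across odd and even output pages.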
1283 -  
1284 -.. _ref.attachments:  
1285 -  
1286 -Embedded Files/Attachments Options  
1287 -----------------------------------  
1288 -  
1289 -Starting with qpdf 10.2, you can work with file attachments in PDF files  
1290 -from the command line. The following options are available:  
1291 -  
1292 -:samp:`--list-attachments`  
1293 - Show the "key" and stream number for embedded files. With  
1294 - :samp:`--verbose`, additional information, including  
1295 - preferred file name, description, dates, and more is also displayed.
1296 - The key is usually but not always equal to the file name, and is  
1297 - needed by some of the other options.  
1298 -  
1299 -:samp:`--show-attachment={key}`  
1300 - Write the contents of the specified attachment to standard output as  
1301 - binary data. The key should match one of the keys shown by  
1302 - :samp:`--list-attachments`. If specified multiple  
1303 - times, only the last attachment will be shown.  
1304 -  
1305 -:samp:`--add-attachment {file} {options} --`  
1306 - Add or replace an attachment with the contents of  
1307 - :samp:`{file}`. This may be specified more  
1308 - than once. The following additional options may appear before the  
1309 - ``--`` that ends this option:  
1310 -  
1311 - :samp:`--key={key}`  
1312 - The key to use to register the attachment in the embedded files  
1313 - table. Defaults to the last path element of  
1314 - :samp:`{file}`.  
1315 -  
1316 - :samp:`--filename={name}`  
1317 - The file name to be used for the attachment. This is what is  
1318 - usually displayed to the user and is the name most graphical PDF  
1319 - viewers will use when saving a file. It defaults to the last path  
1320 - element of :samp:`{file}`.  
1321 -  
1322 - :samp:`--creationdate={date}`  
1323 - The attachment's creation date in PDF format; defaults to the  
1324 - current time. The date format is explained below.  
1325 -  
1326 - :samp:`--moddate={date}`  
1327 - The attachment's modification date in PDF format; defaults to the  
1328 - current time. The date format is explained below.  
1329 -  
1330 - :samp:`--mimetype={type/subtype}`  
1331 - The mime type for the attachment, e.g. ``text/plain`` or  
1332 - ``application/pdf``. Note that the mimetype appears in a field  
1333 - called ``/Subtype`` in the PDF but actually includes the full type  
1334 - and subtype of the mime type.  
1335 -  
1336 - :samp:`--description={"text"}`  
1337 - Descriptive text for the attachment, displayed by some PDF  
1338 - viewers.  
1339 -  
1340 - :samp:`--replace`  
1341 - Indicates that any existing attachment with the same key should be  
1342 - replaced by the new attachment. Otherwise,  
1343 - :command:`qpdf` gives an error if an attachment  
1344 - with that key is already present.  
1345 -  
1346 -:samp:`--remove-attachment={key}`  
1347 - Remove the specified attachment. This doesn't only remove the  
1348 - attachment from the embedded files table but also clears out the file  
1349 - specification. That means that any potential internal links to the  
1350 - attachment will be broken. This option may be specified multiple  
1351 - times. Run with :samp:`--verbose` to see status of  
1352 - the removal.  
1353 -  
1354 -:samp:`--copy-attachments-from {file} {options} --`  
1355 - Copy attachments from another file. This may be specified more than  
1356 - once. The following additional options may appear before the ``--``  
1357 - that ends this option:  
1358 -  
1359 - :samp:`--password={password}`  
1360 - If required, the password needed to open  
1361 - :samp:`{file}`  
1362 -  
1363 - :samp:`--prefix={prefix}`  
1364 - Only required if the file from which attachments are being copied  
1365 - has attachments with keys that conflict with attachments already  
1366 - in the file. In this case, the specified prefix will be prepended  
1367 - to each key. This affects only the key in the embedded files  
1368 - table, not the file name. The PDF specification doesn't preclude  
1369 - multiple attachments having the same file name.  
1370 -  
1371 -When a date is required, the date should conform to the PDF date format  
1372 -specification, which is  
1373 -``D:``\ :samp:`{yyyymmddhhmmss<z>}`, where  
1374 -:samp:`{<z>}` is either ``Z`` for UTC or a  
1375 -timezone offset in the form :samp:`{-hh'mm'}` or  
1376 -:samp:`{+hh'mm'}`. Examples:  
1377 -``D:20210207161528-05'00'``, ``D:20210207211528Z``.  
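
As an illustration of the date format just described, the following Python sketch builds a PDF date string from a timezone-aware ``datetime``. It is a minimal sketch and not part of qpdf. The helper name ``pdf_date`` is hypothetical, and it assumes the caller always supplies a timezone.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(moment):
    """Format a timezone-aware datetime as D:yyyymmddhhmmss<z>."""
    offset = moment.utcoffset()
    if offset is None:
        raise ValueError("a timezone-aware datetime is required")
    stamp = moment.strftime("%Y%m%d%H%M%S")
    if offset == timedelta(0):
        # UTC is written with a trailing Z.
        return f"D:{stamp}Z"
    total_minutes = int(offset.total_seconds() // 60)
    sign = "+" if total_minutes >= 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    # Offsets are written as +hh'mm' or -hh'mm'.
    return f"D:{stamp}{sign}{hours:02d}'{minutes:02d}'"


eastern = timezone(timedelta(hours=-5))
local_example = pdf_date(datetime(2021, 2, 7, 16, 15, 28, tzinfo=eastern))
utc_example = pdf_date(datetime(2021, 2, 7, 21, 15, 28, tzinfo=timezone.utc))
```

The resulting strings could then be passed to :samp:`--creationdate` or :samp:`--moddate`.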
1378 -  
1379 -.. _ref.advanced-parsing:  
1380 -  
1381 -Advanced Parsing Options  
1382 -------------------------  
1383 -  
1384 -These options control aspects of how qpdf reads PDF files. Mostly these  
1385 -are of use to people who are working with damaged files. There is little  
1386 -reason to use these options unless you are trying to solve specific  
1387 -problems. The following options are available:  
1388 -  
1389 -:samp:`--suppress-recovery`  
1390 - Prevents qpdf from attempting to recover damaged files.  
1391 -  
1392 -:samp:`--ignore-xref-streams`  
1393 - Tells qpdf to ignore any cross-reference streams.  
1394 -  
1395 -Ordinarily, qpdf will attempt to recover from certain types of errors in  
1396 -PDF files. These include errors in the cross-reference table, certain  
1397 -types of object numbering errors, and certain types of stream length  
1398 -errors. Sometimes, qpdf may think it has recovered but may not have  
1399 -actually recovered, so care should be taken when using this option as  
1400 -some data loss is possible. The  
1401 -:samp:`--suppress-recovery` option will prevent qpdf  
1402 -from attempting recovery. In this case, it will fail on the first error  
1403 -that it encounters.  
1404 -  
1405 -Ordinarily, qpdf reads cross-reference streams when they are present in  
1406 -a PDF file. If :samp:`--ignore-xref-streams` is  
1407 -specified, qpdf will ignore any cross-reference streams for hybrid PDF  
1408 -files. The purpose of hybrid files is to make some content available to  
1409 -viewers that are not aware of cross-reference streams. It is almost  
1410 -never desirable to ignore them. The only time when you might want to use  
1411 -this feature is if you are testing creation of hybrid PDF files and wish  
1412 -to see how a PDF consumer that doesn't understand object and  
1413 -cross-reference streams would interpret such a file.  
1414 -  
1415 -.. _ref.advanced-transformation:  
1416 -  
1417 -Advanced Transformation Options  
1418 --------------------------------  
1419 -  
1420 -These transformation options control fine points of how qpdf creates the  
1421 -output file. Mostly these are of use only to people who are very  
1422 -familiar with the PDF file format or who are PDF developers. The  
1423 -following options are available:  
1424 -  
1425 -:samp:`--compress-streams={[yn]}`  
1426 - By default, or with :samp:`--compress-streams=y`,  
1427 - qpdf will compress any stream with no other filters applied to it  
1428 - with the ``/FlateDecode`` filter when it writes it. To suppress this  
1429 - behavior and preserve uncompressed streams as uncompressed, use  
1430 - :samp:`--compress-streams=n`.  
1431 -  
1432 -:samp:`--decode-level={option}`  
1433 - Controls which streams qpdf tries to decode. The default is  
1434 - :samp:`generalized`. The following options are  
1435 - available:  
1436 -  
1437 - - :samp:`none`: do not attempt to decode any streams  
1438 -  
1439 - - :samp:`generalized`: decode streams filtered with  
1440 - supported generalized filters: ``/LZWDecode``, ``/FlateDecode``,  
1441 - ``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define generalized  
1442 - filters as those to be used for general-purpose compression or  
1443 - encoding, as opposed to filters specifically designed for image  
1444 - data. Note that, by default, streams already compressed with  
1445 - ``/FlateDecode`` are not uncompressed and recompressed unless you  
1446 - also specify :samp:`--recompress-flate`.  
1447 -  
1448 - - :samp:`specialized`: in addition to generalized,  
1449 - decode streams with supported non-lossy specialized filters;  
1450 - currently this is just ``/RunLengthDecode``  
1451 -  
1452 - - :samp:`all`: in addition to generalized and  
1453 - specialized, decode streams with supported lossy filters;  
1454 - currently this is just ``/DCTDecode`` (JPEG)  
1455 -  
1456 -:samp:`--stream-data={option}`  
1457 - Controls transformation of stream data. This option predates the  
1458 - :samp:`--compress-streams` and  
1459 - :samp:`--decode-level` options. Those options can be  
1460 - used to achieve the same effect with more control. The value of
1461 - :samp:`{option}` may  
1462 - be one of the following:  
1463 -  
1464 - - :samp:`compress`: recompress stream data when  
1465 - possible (default); equivalent to  
1466 - :samp:`--compress-streams=y`  
1467 - :samp:`--decode-level=generalized`. Does not  
1468 - recompress streams already compressed with ``/FlateDecode`` unless  
1469 - :samp:`--recompress-flate` is also specified.  
1470 -  
1471 - - :samp:`preserve`: leave all stream data as is;  
1472 - equivalent to :samp:`--compress-streams=n`  
1473 - :samp:`--decode-level=none`  
1474 -  
1475 - - :samp:`uncompress`: uncompress stream data  
1476 - compressed with generalized filters when possible; equivalent to  
1477 - :samp:`--compress-streams=n`  
1478 - :samp:`--decode-level=generalized`  
1479 -  
1480 -:samp:`--recompress-flate`  
1481 - By default, streams already compressed with ``/FlateDecode`` are left  
1482 - alone rather than being uncompressed and recompressed. This option  
1483 - causes qpdf to uncompress and recompress the streams. There is a  
1484 - significant performance cost to using this option, but you probably  
1485 - want to use it if you specify  
1486 - :samp:`--compression-level`.  
1487 -  
1488 -:samp:`--compression-level={level}`  
1489 - When writing new streams that are compressed with ``/FlateDecode``,  
1490 - use the specified compression level. The value of  
1491 - :samp:`level` should be a number from 1 to 9 and is  
1492 - passed directly to zlib, which implements deflate compression. Note  
1493 - that qpdf doesn't uncompress and recompress streams by default. To  
1494 - have this option apply to already compressed streams, you should also  
1495 - specify :samp:`--recompress-flate`. If your goal is  
1496 - to shrink the size of PDF files, you should also use  
1497 - :samp:`--object-streams=generate`.  
1498 -  
1499 -:samp:`--normalize-content=[yn]`  
1500 - Enables or disables normalization of content streams. Content  
1501 - normalization is enabled by default in QDF mode. Please see :ref:`ref.qdf` for additional discussion of QDF mode.  
1502 -  
1503 -:samp:`--object-streams={mode}`  
1504 - Controls handling of object streams. The value of  
1505 - :samp:`{mode}` may be  
1506 - one of the following:  
1507 -  
1508 - - :samp:`preserve`: preserve original object streams  
1509 - (default)  
1510 -  
1511 - - :samp:`disable`: don't write any object streams  
1512 -  
1513 - - :samp:`generate`: use object streams wherever  
1514 - possible  
1515 -  
1516 -:samp:`--preserve-unreferenced`  
1517 - Tells qpdf to preserve objects that are not referenced when writing  
1518 - the file. Ordinarily any object that is not referenced in a traversal  
1519 - of the document from the trailer dictionary will be discarded. This  
1520 - may be useful in working with some damaged files or inspecting files  
1521 - with known unreferenced objects.  
1522 -  
1523 - This flag is ignored for linearized files and has the effect of  
1524 - causing objects in the new file to be written in order by object ID  
1525 - from the original file. This does not mean that object numbers will  
1526 - be the same since qpdf may create stream lengths as direct or  
1527 - indirect differently from the original file, and the original file  
1528 - may have gaps in its numbering.  
1529 -  
1530 - See also :samp:`--preserve-unreferenced-resources`,  
1531 - which does something completely different.  
1532 -  
1533 -:samp:`--remove-unreferenced-resources={option}`  
1534 - The :samp:`{option}` may be ``auto``,  
1535 - ``yes``, or ``no``. The default is ``auto``.  
1536 -  
1537 - Starting with qpdf 8.1, when splitting pages, qpdf is able to attempt  
1538 - to remove images and fonts that are not used by a page even if they  
1539 - are referenced in the page's resources dictionary. When shared  
1540 - resources are in use, this behavior can greatly reduce the file sizes  
1541 - of split pages, but the analysis is very slow. In versions from 8.1  
1542 - through 9.1.1, qpdf did this analysis by default. Starting in qpdf  
1543 - 10.0.0, if ``auto`` is used, qpdf does a quick analysis of the file  
1544 - to determine whether the file is likely to have unreferenced objects  
1545 - on pages, a pattern that frequently occurs when resource dictionaries  
1546 - are shared across multiple pages and rarely occurs otherwise. If it  
1547 - discovers this pattern, then it will attempt to remove unreferenced  
1548 - resources. Usually this means you get the slower splitting speed only  
1549 - when it's actually going to create smaller files. You can suppress  
1550 - removal of unreferenced resources altogether by specifying ``no`` or  
1551 - force it to do the full algorithm by specifying ``yes``.  
1552 -  
1553 - Other than cases in which you don't care about file size and care a  
1554 - lot about runtime, there are few reasons to use this option,  
1555 - especially now that ``auto`` mode is supported. One reason to use  
1556 - this is if you suspect that qpdf is removing resources it shouldn't  
1557 - be removing. If you encounter that case, please report it as a bug at
1558 - https://github.com/qpdf/qpdf/issues/.  
1559 -  
1560 -:samp:`--preserve-unreferenced-resources`  
1561 - This is a synonym for  
1562 - :samp:`--remove-unreferenced-resources=no`.  
1563 -  
1564 - See also :samp:`--preserve-unreferenced`, which does  
1565 - something completely different.  
1566 -  
1567 -:samp:`--newline-before-endstream`  
1568 - Tells qpdf to insert a newline before the ``endstream`` keyword, not  
1569 - counted in the length, after any stream content even if the last  
1570 - character of the stream was a newline. This may result in two  
1571 - newlines in some cases. This is a requirement of PDF/A. While qpdf  
1572 - doesn't specifically know how to generate PDF/A-compliant PDFs, this  
1573 - at least prevents it from removing compliance on already compliant  
1574 - files.  
1575 -  
1576 -:samp:`--linearize-pass1={file}`  
1577 - Write the first pass of linearization to the named file. The  
1578 - resulting file is not a valid PDF file. This option is useful only  
1579 - for debugging ``QPDFWriter``'s linearization code. When qpdf  
1580 - linearizes files, it writes the file in two passes, using the first  
1581 - pass to calculate sizes and offsets that are required for hint tables  
1582 - and the linearization dictionary. Ordinarily, the first pass is  
1583 - discarded. This option enables it to be captured.  
1584 -  
1585 -:samp:`--coalesce-contents`  
1586 - When a page's contents are split across multiple streams, this option  
1587 - causes qpdf to combine them into a single stream. Use of this option  
1588 - is never necessary for ordinary usage, but it can help when working  
1589 - with some files in some cases. For example, this can be combined
1590 - with QDF mode or content normalization to make it easier to look at  
1591 - all of a page's contents at once.  
1592 -  
1593 -:samp:`--flatten-annotations={option}`  
1594 - This option collapses annotations into the pages' contents with  
1595 - special handling for form fields. Ordinarily, an annotation is  
1596 - rendered separately and on top of the page. Combining annotations  
1597 - into the page's contents effectively freezes the placement of the  
1598 - annotations, making them look right after various page  
1599 - transformations. The library functionality backing this option was  
1600 - added for the benefit of programs that want to create *n-up* page  
1601 - layouts and other similar things that don't work well with  
1602 - annotations. The :samp:`{option}` parameter  
1603 - may be any of the following:  
1604 -  
1605 - - :samp:`all`: include all annotations that are not  
1606 - marked invisible or hidden  
1607 -  
1608 - - :samp:`print`: only include annotations that  
1609 - indicate that they should appear when the page is printed  
1610 -  
1611 - - :samp:`screen`: omit annotations that indicate  
1612 - they should not appear on the screen  
1613 -  
1614 - Note that form fields are special because the annotations that are  
1615 - used to render filled-in form fields may become out of date from the  
1616 - fields' values if the form is filled in by a program that doesn't  
1617 - know how to update the appearances. If qpdf detects this case, its  
1618 - default behavior is not to flatten those annotations because doing so  
1619 - would cause the value of the form field to be lost. This gives you a  
1620 - chance to go back and resave the form with a program that knows how  
1621 - to generate appearances. QPDF itself can generate appearances with  
1622 - some limitations. See the  
1623 - :samp:`--generate-appearances` option below.  
1624 -  
1625 -:samp:`--generate-appearances`  
1626 - If a file contains interactive form fields and indicates that the  
1627 - appearances are out of date with the values of the form, this flag  
1628 - will regenerate appearances, subject to a few limitations. Note that  
1629 - there is not usually a reason to do this, but it can be necessary  
1630 - before using the :samp:`--flatten-annotations`  
1631 - option. Most of these are not a problem with well-behaved PDF files.  
1632 - The limitations are as follows:  
1633 -  
1634 - - Radio button and checkbox appearances use the pre-set values in  
1635 - the PDF file. QPDF just makes sure that the correct appearance is  
1636 - displayed based on the value of the field. This is fine for PDF  
1637 - files that create their forms properly. Some PDF writers save  
1638 - appearances for fields when they change, which could cause some  
1639 - controls to have inconsistent appearances.  
1640 -  
1641 - - For text fields and list boxes, any characters that fall outside  
1642 - of US-ASCII or, if detected, "Windows ANSI" or "Mac Roman"  
1643 - encoding, will be replaced by the ``?`` character.  
1644 -  
1645 - - Quadding is ignored. Quadding is used to specify whether the  
1646 - contents of a field should be left, center, or right aligned with  
1647 - the field.  
1648 -  
1649 - - Rich text, multi-line, and other more elaborate formatting  
1650 - directives are ignored.  
1651 -  
1652 - - There is no support for multi-select fields or signature fields.  
1653 -  
1654 - If qpdf doesn't do a good enough job with your form, use an external  
1655 - application to save your filled-in form before processing it with  
1656 - qpdf.  
1657 -  
1658 -:samp:`--optimize-images`  
1659 - This flag causes qpdf to recompress all images that are not  
1660 - compressed with DCT (JPEG) using DCT compression as long as doing so  
1661 - decreases the size in bytes of the image data and the image does not  
1662 - fall below minimum specified dimensions. Useful information is  
1663 - provided when used in combination with  
1664 - :samp:`--verbose`. See also the  
1665 - :samp:`--oi-min-width`,  
1666 - :samp:`--oi-min-height`, and  
1667 - :samp:`--oi-min-area` options. By default, starting  
1668 - in qpdf 8.4, inline images are converted to regular images and  
1669 - optimized as well. Use :samp:`--keep-inline-images`  
1670 - to prevent inline images from being included.  
1671 -  
1672 -:samp:`--oi-min-width={width}`  
1673 - Avoid optimizing images whose width is below the specified amount. If  
1674 - omitted, the default is 128 pixels. Use 0 for no minimum.  
1675 -  
1676 -:samp:`--oi-min-height={height}`  
1677 - Avoid optimizing images whose height is below the specified amount.  
1678 - If omitted, the default is 128 pixels. Use 0 for no minimum.  
1679 -  
1680 -:samp:`--oi-min-area={area-in-pixels}`  
1681 - Avoid optimizing images whose pixel count (width × height) is below
1682 - the specified amount. If omitted, the default is 16,384 pixels. Use 0  
1683 - for no minimum.  
1684 -  
1685 -:samp:`--externalize-inline-images`  
1686 - Convert inline images to regular images. By default, images whose  
1687 - data is at least 1,024 bytes are converted when this option is  
1688 - selected. Use :samp:`--ii-min-bytes` to change the  
1689 - size threshold. This option is implicitly selected when  
1690 - :samp:`--optimize-images` is selected. Use  
1691 - :samp:`--keep-inline-images` to exclude inline images  
1692 - from image optimization.  
1693 -  
1694 -:samp:`--ii-min-bytes={bytes}`  
1695 - Avoid converting inline images whose size is below the specified  
1696 - minimum size to regular images. If omitted, the default is 1,024  
1697 - bytes. Use 0 for no minimum.  
1698 -  
1699 -:samp:`--keep-inline-images`  
1700 - Prevent inline images from being included in image optimization. This  
1701 - option has no effect when :samp:`--optimize-images`
1702 - is not specified.  
1703 -  
1704 -:samp:`--remove-page-labels`  
1705 - Remove page labels from the output file.  
1706 -  
1707 -:samp:`--qdf`  
1708 - Turns on QDF mode. For additional information on QDF, please see :ref:`ref.qdf`. Note that :samp:`--linearize`  
1709 - disables QDF mode.  
1710 -  
1711 -:samp:`--min-version={version}`  
1712 - Forces the PDF version of the output file to be at least  
1713 - :samp:`{version}`. In other words, if the  
1714 - input file has a lower version than the specified version, the  
1715 - specified version will be used. If the input file has a higher  
1716 - version, the input file's original version will be used. It is seldom  
1717 - necessary to use this option since qpdf will automatically increase  
1718 - the version as needed when adding features that require newer PDF  
1719 - readers.  
1720 -  
1721 - The version number may be expressed in the form  
1722 - :samp:`{major.minor.extension-level}`, in  
1723 - which case the version is interpreted as  
1724 - :samp:`{major.minor}` at extension level  
1725 - :samp:`{extension-level}`. For example,  
1726 - version ``1.7.8`` represents version 1.7 at extension level 8. Note  
1727 - that minimal syntax checking is done on the command line.  
1728 -  
1729 -:samp:`--force-version={version}`  
1730 - This option forces the PDF version to be the exact version specified  
1731 - *even when the file may have content that is not supported in that  
1732 - version*. The version number is interpreted in the same way as with  
1733 - :samp:`--min-version` so that extension levels can be  
1734 - set. In some cases, forcing the output file's PDF version to be lower  
1735 - than that of the input file will cause qpdf to disable certain  
1736 - features of the document. Specifically, 256-bit keys are disabled if  
1737 - the version is less than 1.7 with extension level 8 (except R5 is  
1738 - disabled if less than 1.7 with extension level 3), AES encryption is  
1739 - disabled if the version is less than 1.6, cleartext metadata and  
1740 - object streams are disabled if less than 1.5, 128-bit encryption keys  
1741 - are disabled if less than 1.4, and all encryption is disabled if less  
1742 - than 1.3. Even with these precautions, qpdf won't be able to do  
1743 - things like eliminate use of newer image compression schemes,  
1744 - transparency groups, or other features that may have been added in  
1745 - more recent versions of PDF.  
1746 -  
1747 - As a general rule, with the exception of big structural things like  
1748 - the use of object streams or AES encryption, PDF viewers are supposed  
1749 - to ignore features in files that they don't support from newer  
1750 - versions. This means that forcing the version to a lower version may  
1751 - make it possible to open your PDF file with an older version, though  
1752 - bear in mind that some of the original document's functionality may  
1753 - be lost.  
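
The version syntax accepted by :samp:`--min-version` and :samp:`--force-version` can be illustrated with a small Python sketch. This is not qpdf's parser. The names ``parse_pdf_version`` and ``apply_min_version`` are hypothetical, and treating a missing extension level as 0 is an assumption made here for comparison purposes.

```python
def parse_pdf_version(text):
    """Split 'major.minor' or 'major.minor.extension-level' into integers."""
    parts = text.split(".")
    if len(parts) not in (2, 3) or not all(part.isdigit() for part in parts):
        raise ValueError(f"unrecognized PDF version: {text!r}")
    major, minor = int(parts[0]), int(parts[1])
    # Assumption: no extension level given means extension level 0.
    extension_level = int(parts[2]) if len(parts) == 3 else 0
    return (major, minor, extension_level)


def apply_min_version(input_version, min_version):
    """Model --min-version: the output is at least min_version."""
    return max(parse_pdf_version(input_version), parse_pdf_version(min_version))


raised = apply_min_version("1.4", "1.7.8")   # input is older, so it is raised
kept = apply_min_version("2.0", "1.5")       # input is newer, so it is kept
```

By contrast, :samp:`--force-version` would use the specified version unconditionally, which is why it can produce a header version lower than the file's content requires.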
1754 -  
1755 -By default, when a stream is encoded using non-lossy filters that qpdf  
1756 -understands and is not already compressed using a good compression  
1757 -scheme, qpdf will uncompress and recompress streams. Assuming proper  
1758 -filter implementations, this is safe and generally results in smaller files.
1759 -This behavior may also be explicitly requested with  
1760 -:samp:`--stream-data=compress`.  
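
The equivalences between :samp:`--stream-data` and the newer :samp:`--compress-streams` and :samp:`--decode-level` options, as listed earlier in this section, can be summarized in a small Python lookup table. This is only a restatement of the documented mapping for use in scripts. The names ``STREAM_DATA_EQUIVALENTS`` and ``expand_stream_data`` are hypothetical.

```python
# Each --stream-data mode and the pair of options documented as equivalent.
STREAM_DATA_EQUIVALENTS = {
    "compress": ["--compress-streams=y", "--decode-level=generalized"],
    "preserve": ["--compress-streams=n", "--decode-level=none"],
    "uncompress": ["--compress-streams=n", "--decode-level=generalized"],
}


def expand_stream_data(mode):
    """Return the newer options equivalent to --stream-data=<mode>."""
    try:
        # Copy the list so callers cannot modify the table.
        return list(STREAM_DATA_EQUIVALENTS[mode])
    except KeyError:
        raise ValueError(f"unknown --stream-data mode: {mode!r}") from None


default_options = expand_stream_data("compress")
```

Keeping the two settings separate makes it possible to express combinations that :samp:`--stream-data` cannot, such as a different decode level with compression left on.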
1761 -  
1762 -When :samp:`--normalize-content=y` is specified, qpdf  
1763 -will attempt to normalize whitespace and newlines in page content  
1764 -streams. This is generally safe but could, in some cases, cause damage  
1765 -to the content streams. This option is intended for people who wish to  
1766 -study PDF content streams or to debug PDF content. You should not use  
1767 -this for "production" PDF files.  
1768 -  
1769 -When normalizing content, if qpdf runs into any lexical errors, it will  
1770 -print a warning indicating that content may be damaged. The only  
1771 -situation in which qpdf is known to cause damage during content  
1772 -normalization is when a page's contents are split across multiple  
1773 -streams and streams are split in the middle of a lexical token such as a  
1774 -string, name, or inline image. Note that files that do this are invalid  
1775 -since the PDF specification states that content streams are not to be  
1776 -split in the middle of a token. If you want to inspect the original  
1777 -content streams in an uncompressed format, you can always run with  
1778 -:samp:`--qdf --normalize-content=n` for a QDF file  
1779 -without content normalization, or alternatively  
1780 -:samp:`--stream-data=uncompress` for a regular non-QDF  
1781 -mode file with uncompressed streams. These will both uncompress all the  
1782 -streams but will not attempt to normalize content. Please note that if  
1783 -you are using content normalization or QDF mode for the purpose of  
1784 -manually inspecting files, you don't have to care about this.  
1785 -  
1786 -Object streams, also known as compressed objects, were introduced into  
1787 -the PDF specification at version 1.5, corresponding to Acrobat 6. Some  
1788 -older PDF viewers may not support files with object streams. qpdf can be  
1789 -used to transform files with object streams to files without object  
1790 -streams or vice versa. As mentioned above, there are three object stream  
1791 -modes: :samp:`preserve`,  
1792 -:samp:`disable`, and :samp:`generate`.  
1793 -  
1794 -In :samp:`preserve` mode, the relationship between objects
1795 -and the streams that contain them is preserved from the original file.  
1796 -In :samp:`disable` mode, all objects are written as  
1797 -regular, uncompressed objects. The resulting file should be readable by  
1798 -older PDF viewers. (Of course, the content of the files may include  
1799 -features not supported by older viewers, but at least the structure will  
1800 -be supported.) In :samp:`generate` mode, qpdf will  
1801 -create its own object streams. This will usually result in more compact  
1802 -PDF files, though they may not be readable by older viewers. In this  
1803 -mode, qpdf will also make sure the PDF version number in the header is  
1804 -at least 1.5.  
1805 -  
1806 -The :samp:`--qdf` flag turns on QDF mode, which changes  
1807 -some of the defaults described above. Specifically, in QDF mode, by  
1808 -default, stream data is uncompressed, content streams are normalized,  
1809 -and encryption is removed. These defaults can still be overridden by  
1810 -specifying the appropriate options as described above. Additionally, in  
1811 -QDF mode, stream lengths are stored as indirect objects, objects are  
1812 -laid out in a less efficient but more readable fashion, and the  
1813 -documents are interspersed with comments that make it easier for the  
1814 -user to find things and also make it possible for  
1815 -:command:`fix-qdf` to work properly. QDF mode is intended  
1816 -for people, mostly developers, who wish to inspect or modify PDF files  
1817 -in a text editor. For details, please see :ref:`ref.qdf`.  
1818 -  
1819 -.. _ref.testing-options:  
1820 -  
1821 -Testing, Inspection, and Debugging Options  
1822 -------------------------------------------  
1823 -  
1824 -These options can be useful for digging into PDF files or for use in  
1825 -automated test suites for software that uses the qpdf library. When any  
1826 -of the options in this section are specified, no output file should be  
1827 -given. The following options are available:  
1828 -  
1829 -:samp:`--deterministic-id`  
1830 - Causes generation of a deterministic value for /ID. This prevents use  
1831 - of timestamp and output file name information in the /ID generation.  
1832 - Instead, at some slight additional runtime cost, the /ID field is  
1833 - generated to include a digest of the significant parts of the content  
1834 - of the output PDF file. This means that a given qpdf operation should  
1835 - generate the same /ID each time it is run, which can be useful when  
1836 - caching results or for generation of some test data. Use of this flag  
1837 - is not compatible with creation of encrypted files.  
1838 -  
1839 -:samp:`--static-id`  
1840 - Causes generation of a fixed value for /ID. This is intended for  
1841 - testing only. Never use it for production files. If you are trying to  
1842 - get the same /ID each time for a given file and you are not  
1843 - generating encrypted files, consider using the  
1844 - :samp:`--deterministic-id` option.  
1845 -  
1846 -:samp:`--static-aes-iv`  
1847 - Causes use of a static initialization vector for AES-CBC. This is  
1848 - intended for testing only so that output files can be reproducible.  
1849 - Never use it for production files. This option in particular is not  
1850 - secure since it significantly weakens the encryption.  
1851 -  
1852 -:samp:`--no-original-object-ids`  
1853 - Suppresses inclusion of original object ID comments in QDF files.  
1854 - This can be useful when generating QDF files for test purposes,  
1855 - particularly when comparing them to determine whether two PDF files  
1856 - have identical content.  
1857 -  
1858 -:samp:`--show-encryption`  
1859 - Shows document encryption parameters. Also shows the document's user  
1860 - password if the owner password is given.  
1861 -  
1862 -:samp:`--show-encryption-key`  
1863 - When encryption information is being displayed, as when  
1864 - :samp:`--check` or  
1865 - :samp:`--show-encryption` is given, display the  
1866 - computed or retrieved encryption key as a hexadecimal string. This  
1867 - value is not ordinarily useful to users, but it can be used as the  
1868 - argument to :samp:`--password` if the  
1869 - :samp:`--password-is-hex-key` is specified. Note  
1870 - that, when PDF files are encrypted, passwords and other metadata are  
1871 - used only to compute an encryption key, and the encryption key is  
1872 - what is actually used for encryption. This enables retrieval of that  
1873 - key.  
1874 -  
1875 -:samp:`--check-linearization`  
1876 - Checks file integrity and linearization status.  
1877 -  
1878 -:samp:`--show-linearization`  
1879 - Checks and displays all data in the linearization hint tables.  
1880 -  
1881 -:samp:`--show-xref`  
1882 - Shows the contents of the cross-reference table in a human-readable  
1883 - form. This is especially useful for files with cross-reference  
1884 - streams which are stored in a binary format.  
1885 -  
1886 -:samp:`--show-object=trailer|obj[,gen]`  
1887 - Show the contents of the given object. This is especially useful for  
1888 - inspecting objects that are inside of object streams (also known as  
1889 - "compressed objects").  
1890 -  
1891 -:samp:`--raw-stream-data`  
1892 - When used along with the :samp:`--show-object`  
1893 - option, if the object is a stream, shows the raw stream data instead  
1894 - of the object's contents.  
1895 -  
1896 -:samp:`--filtered-stream-data`  
1897 - When used along with the :samp:`--show-object`  
1898 - option, if the object is a stream, shows the filtered stream data  
1899 - instead of the object's contents. If the stream is filtered using filters  
1900 - that qpdf does not support, an error will be issued.  
1901 -  
1902 -:samp:`--show-npages`  
1903 - Prints the number of pages in the input file on a line by itself.  
1904 - Since the number of pages appears by itself on a line, this option  
1905 - can be useful for scripting if you need to know the number of pages  
1906 - in a file.  
1907 -  
1908 -:samp:`--show-pages`  
1909 - Shows the object and generation number for each page dictionary  
1910 - object and for each content stream associated with the page. Having  
1911 - this information makes it more convenient to inspect objects from a  
1912 - particular page.  
1913 -  
1914 -:samp:`--with-images`  
1915 - When used along with :samp:`--show-pages`, also shows  
1916 - the object and generation numbers for the image objects on each page.  
1917 - (At present, information about images in shared resource dictionaries  
1918 - is not output by this command. This is discussed in a comment in the  
1919 - source code.)  
1920 -  
1921 -:samp:`--json`  
1922 - Generate a JSON representation of the file. This is described in  
1923 - depth in :ref:`ref.json`.  
1924 -  
1925 -:samp:`--json-help`  
1926 - Describe the format of the JSON output.  
1927 -  
1928 -:samp:`--json-key=key`  
1929 - This option is repeatable. If specified, only top-level keys  
1930 - specified will be included in the JSON output. If not specified, all  
1931 - keys will be shown.  
1932 -  
1933 -:samp:`--json-object=trailer|obj[,gen]`  
1934 - This option is repeatable. If specified, only specified objects will  
1935 - be shown in the "``objects``" key of the JSON output. If absent, all  
1936 - objects will be shown.  
1937 -  
1938 -:samp:`--check`  
1939 - Checks file structure as well as encryption, linearization, and  
1940 - encoding of stream data. A file for which  
1941 - :samp:`--check` reports no errors may still have  
1942 - errors in stream data content but should otherwise be structurally  
1943 - sound. If :samp:`--check` reports any errors, qpdf will exit  
1944 - with a status of 2. There are some recoverable conditions that  
1945 - :samp:`--check` detects. These are issued as warnings  
1946 - instead of errors. If qpdf finds no errors but finds warnings, it  
1947 - will exit with a status of 3 (as of version 2.0.4). When  
1948 - :samp:`--check` is combined with other options,  
1949 - checks are always performed before any other options are processed.  
1950 - For erroneous files, :samp:`--check` will cause qpdf  
1951 - to attempt to recover, after which other options are effectively  
1952 - operating on the recovered file. Combining  
1953 - :samp:`--check` with other options in this way can be  
1954 - useful for manually recovering severely damaged files. Note that  
1955 - :samp:`--check` produces no output to standard output  
1956 - when everything is valid, so if you are using this to  
1957 - programmatically validate files in bulk, it is safe to run without  
1958 - output redirected to :file:`/dev/null` and just  
1959 - check for a 0 exit code.  
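The exit statuses described for :samp:`--check` lend themselves to a small wrapper for bulk validation. The following is a minimal sketch, not part of qpdf: the names ``classify_check_status`` and ``check_pdf`` are hypothetical, and it assumes a ``qpdf`` executable is available on the search path.

```python
import subprocess

# Exit statuses as described in the text for `qpdf --check`.
EXIT_OK = 0        # no errors, no warnings
EXIT_ERRORS = 2    # errors were found
EXIT_WARNINGS = 3  # no errors, but recoverable conditions were reported


def classify_check_status(returncode: int) -> str:
    """Map a qpdf exit status to a short label (hypothetical helper)."""
    if returncode == EXIT_OK:
        return "ok"
    if returncode == EXIT_WARNINGS:
        return "warnings"
    if returncode == EXIT_ERRORS:
        return "errors"
    return "unknown"


def check_pdf(path: str, qpdf: str = "qpdf") -> str:
    """Run `qpdf --check` on one file and classify the result.

    Assumes qpdf is installed; output is captured so that only the
    exit status is used to decide the outcome.
    """
    result = subprocess.run([qpdf, "--check", path], capture_output=True)
    return classify_check_status(result.returncode)
```

A caller validating many files could loop over paths and collect every file whose label is not ``"ok"``.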
1960 -  
1961 -The :samp:`--raw-stream-data` and  
1962 -:samp:`--filtered-stream-data` options are ignored  
1963 -unless :samp:`--show-object` is given. Either of these  
1964 -options will cause the stream data to be written to standard output. In  
1965 -order to avoid commingling of stream data with other output, it is  
1966 -recommended that these options not be combined with other test/inspection  
1967 -options.  
1968 -  
1969 -If :samp:`--filtered-stream-data` is given and  
1970 -:samp:`--normalize-content=y` is also given, qpdf will  
1971 -attempt to normalize the stream data as if it is a page content stream.  
1972 -This attempt will be made even if it is not a page content stream, in  
1973 -which case it will produce unusable results.  
1974 -  
1975 -.. _ref.unicode-passwords:  
1976 -  
1977 -Unicode Passwords  
1978 ------------------  
1979 -  
1980 -At the library API level, all methods that perform encryption and  
1981 -decryption interpret passwords as strings of bytes. It is up to the  
1982 -caller to ensure that they are appropriately encoded. Starting with qpdf  
1983 -version 8.4.0, qpdf will attempt to make this easier for you when  
1984 -you interact with qpdf via its command line interface. The PDF specification  
1985 -requires passwords used to encrypt files with 40-bit or 128-bit  
1986 -encryption to be encoded with PDF Doc encoding. This encoding is a  
1987 -single-byte encoding that supports ISO-Latin-1 and a handful of other  
1988 -commonly used characters. It has a large overlap with Windows ANSI but  
1989 -is not exactly the same. There is generally not a way to provide PDF Doc  
1990 -encoded strings on the command line. As such, qpdf versions prior to  
1991 -8.4.0 would often create PDF files that couldn't be opened with other  
1992 -software when given a password with non-ASCII characters to encrypt a  
1993 -file with 40-bit or 128-bit encryption. Starting with qpdf 8.4.0, qpdf  
1994 -recognizes the encoding of the parameter and transcodes it as needed.  
1995 -The rest of this section provides the details about exactly how qpdf  
1996 -behaves. Most users will not need to know this information, but it might  
1997 -be useful if you have been working around qpdf's old behavior or if you  
1998 -are using qpdf to generate encrypted files for testing other PDF  
1999 -software.  
2000 -  
2001 -A note about Windows: when qpdf builds, it attempts to determine what it  
2002 -has to do to use ``wmain`` instead of ``main`` on Windows. The ``wmain``  
2003 -function is an alternative entry point that receives all arguments as  
2004 -UTF-16-encoded strings. When qpdf starts up this way, it converts all  
2005 -the strings to UTF-8 encoding and then invokes the regular main. This  
2006 -means that, as far as qpdf is concerned, it receives its command-line  
2007 -arguments with UTF-8 encoding, just as it would in any modern Linux or  
2008 -UNIX environment.  
2009 -  
2010 -If a file is being encrypted with 40-bit or 128-bit encryption and the  
2011 -supplied password is not a valid UTF-8 string, qpdf will fall back to  
2012 -the behavior of interpreting the password as a string of bytes. If you  
2013 -have old scripts that encrypt files by passing the output of  
2014 -:command:`iconv` to qpdf, you no longer need to do that,  
2015 -but if you do, qpdf should still work. The only exception would be for  
2016 -the extremely unlikely case of a password that is encoded with a  
2017 -single-byte encoding but also happens to be valid UTF-8. Such a password  
2018 -would contain strings of even numbers of characters that alternate  
2019 -between accented letters and symbols. In the extremely unlikely event  
2020 -that you are intentionally using such passwords and qpdf is thwarting  
2021 -you by interpreting them as UTF-8, you can use  
2022 -:samp:`--password-mode=bytes` to suppress qpdf's  
2023 -automatic behavior.  
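The fallback rule and its rare ambiguous case can be illustrated with a few bytes. This is only a sketch of the idea in Python, not qpdf's actual implementation; ``is_valid_utf8`` is a hypothetical helper.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A typical single-byte (Latin-1) password is NOT valid UTF-8, so it
# would be treated as a plain string of bytes.
latin1_password = "caf\u00e9".encode("latin-1")  # b"caf\xe9"

# The unlikely ambiguous case: an accented letter followed by a symbol
# in Latin-1 (A-tilde, copyright sign) is the same byte pair as the
# UTF-8 encoding of e-acute, so it would be interpreted as UTF-8.
ambiguous_password = "\u00c3\u00a9".encode("latin-1")  # b"\xc3\xa9"
```

Here ``is_valid_utf8(latin1_password)`` is false while ``is_valid_utf8(ambiguous_password)`` is true, which is the situation in which :samp:`--password-mode=bytes` would be needed.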
2024 -  
2025 -The :samp:`--password-mode` option, as described earlier  
2026 -in this chapter, can be used to change qpdf's interpretation of supplied  
2027 -passwords. There are very few reasons to use this option. One would be  
2028 -the unlikely case described in the previous paragraph in which the  
2029 -supplied password happens to be valid UTF-8 but isn't supposed to be  
2030 -UTF-8. Your best bet would be just to provide the password as a valid  
2031 -UTF-8 string, but you could also use  
2032 -:samp:`--password-mode=bytes`. Another reason to use  
2033 -:samp:`--password-mode=bytes` would be to intentionally  
2034 -generate PDF files encrypted with passwords that are not properly  
2035 -encoded. The qpdf test suite does this to generate invalid files for the  
2036 -purpose of testing its password recovery capability. If you were trying  
2037 -to create intentionally incorrect files for a similar purpose, the  
2038 -:samp:`bytes` password mode can enable you to do this.  
2039 -  
2040 -When qpdf attempts to decrypt a file with a password that contains  
2041 -non-ASCII characters, it will generate a list of alternative passwords  
2042 -by attempting to interpret the password as each of a handful of  
2043 -different coding systems and then transcode them to the required format.  
2044 -This helps to compensate for the supplied password being given in the  
2045 -wrong coding system, such as would happen if you used the  
2046 -:command:`iconv` workaround that was previously needed.  
2047 -It also generates passwords by doing the reverse operation: translating  
2048 -from correct to incorrect encoding of the password. This would enable  
2049 -qpdf to decrypt files using passwords that were improperly encoded by  
2050 -whatever software encrypted the files, including older versions of qpdf  
2051 -invoked without properly encoded passwords. The combination of these two  
2052 -recovery methods should make qpdf transparently open most encrypted  
2053 -files with the password supplied correctly but in the wrong coding  
2054 -system. There are no real downsides to this behavior, but if you don't  
2055 -want qpdf to do this, you can use the  
2056 -:samp:`--suppress-password-recovery` option. One reason  
2057 -to do that is to ensure that you know the exact password that was used  
2058 -to encrypt the file.  
2059 -  
2060 -With these changes, qpdf now generates compliant passwords in most  
2061 -cases. There are still some exceptions. In particular, the PDF  
2062 -specification directs compliant writers to normalize Unicode passwords  
2063 -and to perform certain transformations on passwords with bidirectional  
2064 -text. Implementing this functionality requires using a real Unicode  
2065 -library like ICU. If a client application that uses qpdf wants to do  
2066 -this, the qpdf library will accept the resulting passwords, but qpdf  
2067 -will not perform these transformations itself. It is possible that this  
2068 -will be addressed in a future version of qpdf. The ``QPDFWriter``  
2069 -methods that enable encryption on the output file accept passwords as  
2070 -strings of bytes.  
2071 -  
2072 -Please note that the :samp:`--password-is-hex-key`  
2073 -option is unrelated to all this. This flag bypasses the normal process  
2074 -of going from password to encryption string entirely, allowing the raw  
2075 -encryption key to be specified directly. This is useful for forensic  
2076 -purposes or for brute-force recovery of files with unknown passwords.  
2077 -  
2078 -.. _ref.qdf:  
2079 -  
2080 -QDF Mode  
2081 -========  
2082 -  
2083 -In QDF mode, qpdf creates PDF files in what we call *QDF  
2084 -form*. A PDF file in QDF form, sometimes called a QDF  
2085 -file, is a completely valid PDF file that has ``%QDF-1.0`` as its third  
2086 -line (after the pdf header and binary characters) and has certain other  
2087 -characteristics. The purpose of QDF form is to make it possible to edit  
2088 -PDF files, with some restrictions, in an ordinary text editor. This can  
2089 -be very useful for experimenting with different PDF constructs or for  
2090 -making one-off edits to PDF files (though there are other reasons why  
2091 -this may not always work). Note that QDF mode does not support  
2092 -linearized files. If you enable linearization, QDF mode is automatically  
2093 -disabled.  
2094 -  
2095 -It is ordinarily very difficult to edit PDF files in a text editor for  
2096 -two reasons: most meaningful data in PDF files is compressed, and PDF  
2097 -files are full of offset and length information that makes it hard to  
2098 -add or remove data. A QDF file is organized in a manner such that, if  
2099 -edits are kept within certain constraints, the  
2100 -:command:`fix-qdf` program, distributed with qpdf, is  
2101 -able to restore edited files to a correct state. The  
2102 -:command:`fix-qdf` program takes no command-line  
2103 -arguments. It reads a possibly edited QDF file from standard input and  
2104 -writes a repaired file to standard output.  
2105 -  
2106 -The following attributes characterize a QDF file:  
2107 -  
2108 -- All objects appear in numerical order in the PDF file, including when  
2109 - objects appear in object streams.  
2110 -  
2111 -- Objects are printed in an easy-to-read format, and all line endings  
2112 - are normalized to UNIX line endings.  
2113 -  
2114 -- Unless specifically overridden, streams appear uncompressed (when  
2115 - qpdf supports the filters and they are compressed with a non-lossy  
2116 - compression scheme), and most content streams are normalized (line  
2117 - endings are converted to just UNIX-style linefeeds).  
2118 -  
2119 -- All stream lengths are represented as indirect objects, and the  
2120 - stream length object is always the next object after the stream. If  
2121 - the stream data does not end with a newline, an extra newline is  
2122 - inserted, and a special comment appears after the stream indicating  
2123 - that this has been done.  
2124 -  
2125 -- If the PDF file contains object streams, if object stream *n*  
2126 - contains *k* objects, those objects are numbered from *n+1* through  
2127 - *n+k*, and the object number/offset pairs appear on a separate line  
2128 - for each object. Additionally, each object in the object stream is  
2129 - preceded by a comment indicating its object number and index. This  
2130 - makes it very easy to find objects in object streams.  
2131 -  
2132 -- All beginnings of objects, ``stream`` tokens, ``endstream`` tokens,  
2133 - and ``endobj`` tokens appear on lines by themselves. A blank line  
2134 - follows every ``endobj`` token.  
2135 -  
2136 -- If there is a cross-reference stream, it is unfiltered.  
2137 -  
2138 -- Page dictionaries and page content streams are marked with special  
2139 - comments that make them easy to find.  
2140 -  
2141 -- Comments precede each object indicating the object number of the  
2142 - corresponding object in the original file.  
2143 -  
2144 -When editing a QDF file, any edits can be made as long as the above  
2145 -constraints are maintained. This means that you can freely edit a page's  
2146 -content without worrying about messing up the QDF file. It is also  
2147 -possible to add new objects so long as those objects are added after the  
2148 -last object in the file or subsequent objects are renumbered. If a QDF  
2149 -file has object streams in it, you can always add the new objects before  
2150 -the xref stream and then change the number of the xref stream, since  
2151 -nothing generally ever references it by number.  
2152 -  
2153 -It is not generally practical to remove objects from QDF files without  
2154 -messing up object numbering, but if you remove all references to an  
2155 -object, you can run qpdf on the file (after running  
2156 -:command:`fix-qdf`), and qpdf will omit the now-orphaned  
2157 -object.  
2158 -  
2159 -When :command:`fix-qdf` is run, it goes through the file  
2160 -and recomputes the following parts of the file:  
2161 -  
2162 -- the ``/N``, ``/W``, and ``/First`` keys of all object stream  
2163 - dictionaries  
2164 -  
2165 -- the pairs of numbers representing object numbers and offsets of  
2166 - objects in object streams  
2167 -  
2168 -- all stream lengths  
2169 -  
2170 -- the cross-reference table or cross-reference stream  
2171 -  
2172 -- the offset to the cross-reference table or cross-reference stream  
2173 - following the ``startxref`` token  
2174 -  
2175 -.. _ref.using-library:  
2176 -  
2177 -Using the QPDF Library  
2178 -======================  
2179 -  
2180 -.. _ref.using.from-cxx:  
2181 -  
2182 -Using QPDF from C++  
2183 --------------------  
2184 -  
2185 -The source tree for the qpdf package has an  
2186 -:file:`examples` directory that contains a few  
2187 -example programs. The :file:`qpdf/qpdf.cc` source  
2188 -file also serves as a useful example since it exercises almost all of  
2189 -the qpdf library's public interface. The best source of documentation on  
2190 -the library itself is reading comments in  
2191 -:file:`include/qpdf/QPDF.hh`,  
2192 -:file:`include/qpdf/QPDFWriter.hh`, and  
2193 -:file:`include/qpdf/QPDFObjectHandle.hh`.  
2194 -  
2195 -All header files are installed in the  
2196 -:file:`include/qpdf` directory. It is recommended that  
2197 -you use ``#include <qpdf/QPDF.hh>`` rather than adding  
2198 -:file:`include/qpdf` to your include path.  
2199 -  
2200 -When linking against the qpdf static library, you may also need to  
2201 -specify ``-lz -ljpeg`` on your link command. If your system understands  
2202 -how to read libtool :file:`.la` files, this may not  
2203 -be necessary.  
2204 -  
2205 -The qpdf library is safe to use in a multithreaded program, but no  
2206 -individual ``QPDF`` object instance (including ``QPDF``,  
2207 -``QPDFObjectHandle``, or ``QPDFWriter``) can be used in more than one  
2208 -thread at a time. Multiple threads may simultaneously work with  
2209 -different instances of these and all other QPDF objects.  
2210 -  
2211 -.. _ref.using.other-languages:  
2212 -  
2213 -Using QPDF from other languages  
2214 --------------------------------  
2215 -  
2216 -The qpdf library is implemented in C++, which makes it hard to use  
2217 -directly in other languages. There are a few things that can help.  
2218 -  
2219 -"C"  
2220 - The qpdf library includes a "C" language interface that provides a  
2221 - subset of the overall capabilities. The header file  
2222 - :file:`qpdf/qpdf-c.h` includes information about  
2223 - its use. As long as you use a C++ linker, you can link C programs  
2224 - with qpdf and use the C API. For languages that can directly load  
2225 - methods from a shared library, the C API can also be useful. People  
2226 - have reported success using the C API from other languages on Windows  
2227 - by directly calling functions in the DLL.  
2228 -  
2229 -Python  
2230 - A Python module called  
2231 - `pikepdf <https://pypi.org/project/pikepdf/>`__ provides a clean and  
2232 - highly functional set of Python bindings to the qpdf library. Using  
2233 - pikepdf, you can work with PDF files in a natural way and combine  
2234 - qpdf's capabilities with other functionality provided by Python's  
2235 - rich standard library and available modules.  
2236 -  
2237 -Other Languages  
2238 - Starting with version 8.3.0, the :command:`qpdf`  
2239 - command-line tool can produce a JSON representation of the PDF file's  
2240 - non-content data. This can facilitate interacting programmatically  
2241 - with PDF files through qpdf's command line interface. For more  
2242 - information, please see :ref:`ref.json`.  
2243 -  
2244 -.. _ref.unicode-files:  
2245 -  
2246 -A Note About Unicode File Names  
2247 --------------------------------  
2248 -  
2249 -When strings are passed to qpdf library routines either as ``char*`` or  
2250 -as ``std::string``, they are treated as byte arrays except where  
2251 -otherwise noted. When Unicode is desired, qpdf wants UTF-8 unless  
2252 -otherwise noted in comments in header files. In modern UNIX/Linux  
2253 -environments, this generally does the right thing. In Windows, it's a  
2254 -bit more complicated. Starting in qpdf 8.4.0, passwords that contain  
2255 -Unicode characters are handled much better, and starting in qpdf 8.4.1,  
2256 -the library attempts to properly handle Unicode characters in filenames.  
2257 -In particular, in Windows, if a UTF-8 encoded string is used as a  
2258 -filename in either ``QPDF`` or ``QPDFWriter``, it is internally  
2259 -converted to ``wchar_t*``, and Unicode-aware Windows APIs are used. As  
2260 -such, qpdf will generally operate properly on files with non-ASCII  
2261 -characters in their names as long as the filenames are UTF-8 encoded for  
2262 -passing into the qpdf library API, but there are still some rough edges,  
2263 -such as the encoding of the filenames in error messages or CLI output  
2264 -messages. Patches or bug reports are welcome for any continuing issues  
2265 -with Unicode file names in Windows.  
2266 -  
2267 -.. _ref.weak-crypto:  
2268 -  
2269 -Weak Cryptography  
2270 -=================  
2271 -  
2272 -Starting with version 10.4, qpdf is taking steps to reduce the likelihood  
2273 -of a user *accidentally* creating PDF files with insecure cryptography  
2274 -but will continue to allow creation of such files indefinitely with  
2275 -explicit acknowledgment.  
2276 -  
2277 -The PDF file format makes use of RC4, which is known to be a weak  
2278 -cryptography algorithm, and MD5, which is a weak hashing algorithm. In  
2279 -version 10.4, qpdf generates warnings for some (but not all) cases of  
2280 -writing files with weak cryptography when invoked from the command-line.  
2281 -These warnings can be suppressed using the  
2282 -:samp:`--allow-weak-crypto` option.  
2283 -  
2284 -It is planned for qpdf version 11 to be stricter, making it an error to  
2285 -write files with insecure cryptography from the command-line tool in  
2286 -most cases without specifying the  
2287 -:samp:`--allow-weak-crypto` flag and also to require  
2288 -explicit steps when using the C++ library to enable use of insecure  
2289 -cryptography.  
2290 -  
2291 -Note that qpdf must always retain support for weak cryptographic  
2292 -algorithms since this is required for reading older PDF files that use  
2293 -them. Additionally, qpdf will always retain the ability to create files  
2294 -using weak cryptographic algorithms since, as a development tool, qpdf  
2295 -explicitly supports creating older or deprecated types of PDF files  
2296 -since these are sometimes needed to test or work with older versions of  
2297 -software. Even if other cryptography libraries drop support for RC4 or  
2298 -MD5, qpdf can always fall back to its internal implementations of those  
2299 -algorithms, so they are not going to disappear from qpdf.  
2300 -  
2301 -.. _ref.json:  
2302 -  
2303 -QPDF JSON  
2304 -=========  
2305 -  
2306 -.. _ref.json-overview:  
2307 -  
2308 -Overview  
2309 ---------  
2310 -  
2311 -Beginning with qpdf version 8.3.0, the :command:`qpdf`  
2312 -command-line program can produce a JSON representation of the  
2313 -non-content data in a PDF file. It includes a dump in JSON format of all  
2314 -objects in the PDF file excluding the content of streams. This JSON  
2315 -representation makes it very easy to look in detail at the structure of  
2316 -a given PDF file, and it also provides a great way to work with PDF  
2317 -files programmatically from the command-line in languages that can't  
2318 -call or link with the qpdf library directly. Note that stream data can  
2319 -be extracted from PDF files using other qpdf command-line options.  
2320 -  
2321 -.. _ref.json-guarantees:  
2322 -  
2323 -JSON Guarantees  
2324 ----------------  
2325 -  
2326 -The qpdf JSON representation includes a JSON serialization of the raw  
2327 -objects in the PDF file as well as some computed information in a more  
2328 -easily extracted format. QPDF provides some guarantees about its JSON  
2329 -format. These guarantees are designed to simplify the experience of a  
2330 -developer working with the JSON format.  
2331 -  
2332 -Compatibility  
2333 - The top-level JSON object output is a dictionary. The JSON output  
2334 - contains various nested dictionaries and arrays. With the exception  
2335 - of dictionaries that are populated by the fields of objects from the  
2336 - file, all instances of a dictionary are guaranteed to have exactly  
2337 - the same keys. Future versions of qpdf are free to add additional  
2338 - keys but not to remove keys or change the type of object that a key  
2339 - points to. The qpdf program validates this guarantee, and in the  
2340 - unlikely event that a bug in qpdf should cause it to generate data  
2341 - that doesn't conform to this rule, it will ask you to file a bug  
2342 - report.  
2343 -  
2344 - The top-level JSON structure contains a "``version``" key whose value  
2345 - is a simple integer. The value of the ``version`` key will be  
2346 - incremented if a non-compatible change is made. A non-compatible  
2347 - change would be any change that involves removal of a key, a change  
2348 - to the format of data pointed to by a key, or a semantic change that  
2349 - requires a different interpretation of a previously existing key. A  
2350 - strong effort will be made to avoid breaking compatibility.  
2351 -  
2352 -Documentation  
2353 - The :command:`qpdf` command can be invoked with the  
2354 - :samp:`--json-help` option. This will output a JSON  
2355 - structure that has the same structure as the JSON output that qpdf  
2356 - generates, except that each field in the help output is a description  
2357 - of the corresponding field in the JSON output. The specific  
2358 - guarantees are as follows:  
2359 -  
2360 - - A dictionary in the help output means that the corresponding  
2361 - location in the actual JSON output is also a dictionary with  
2362 - exactly the same keys; that is, no keys present in help are absent  
2363 - in the real output, and no keys will be present in the real output  
2364 - that are not in help. As a special case, if the dictionary has a  
2365 - single key whose name starts with ``<`` and ends with ``>``, it  
2366 - means that the JSON output is a dictionary that can have any keys,  
2367 - each of which conforms to the value of the special key. This is  
2368 - used for cases in which the keys of the dictionary are things like  
2369 - object IDs.  
2370 -  
2371 - - A string in the help output is a description of the item that  
2372 - appears in the corresponding location of the actual output. The  
2373 - corresponding output can have any format.  
2374 -  
2375 - - An array in the help output always contains a single element. It  
2376 - indicates that the corresponding location in the actual output is  
2377 - also an array, and that each element of the array has whatever  
2378 - format is implied by the single element of the help output's  
2379 - array.  
2380 -  
2381 - For example, the help output includes a "``pagelabels``"  
2382 - key whose value is an array of one element. That element is a  
2383 - dictionary with keys "``index``" and "``label``". In addition to  
2384 - describing the meaning of those keys, this tells you that the actual  
2385 - JSON output will contain a ``pagelabels`` array, each of whose  
2386 - elements is a dictionary that contains an ``index`` key, a ``label``  
2387 - key, and no other keys.  
2388 -  
2389 -Directness and Simplicity  
2390 - The JSON output contains the value of every object in the file, but  
2391 - it also contains some processed data. This is analogous to how qpdf's  
2392 - library interface works. The processed data is similar to the helper  
2393 - functions in that it allows you to look at certain aspects of the PDF  
2394 - file without having to understand all the nuances of the PDF  
2395 - specification, while the raw objects allow you to mine the PDF for  
2396 - anything that the higher-level interfaces are lacking.  
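The three documented rules for reading :samp:`--json-help` output (dictionary, string, single-element array) can be sketched as a recursive conformance check. This is an illustrative sketch only: ``conforms`` is a hypothetical function, the sample data is invented, and the check is deliberately strict, so it does not account for null values or for keys omitted through key filtering.

```python
def conforms(help_node, actual) -> bool:
    """Check `actual` JSON data against the shape of the help output."""
    # Rule: a string in the help is a description; any value is allowed.
    if isinstance(help_node, str):
        return True

    # Rule: an array in the help has one element describing every item.
    if isinstance(help_node, list):
        if len(help_node) != 1 or not isinstance(actual, list):
            return False
        return all(conforms(help_node[0], item) for item in actual)

    # Rule: a dictionary in the help means a dictionary with the same keys.
    if isinstance(help_node, dict):
        if not isinstance(actual, dict):
            return False
        keys = list(help_node)
        # Special case: a single "<...>" key allows arbitrary keys, each
        # of whose values must conform to the special key's value.
        if len(keys) == 1 and keys[0].startswith("<") and keys[0].endswith(">"):
            return all(conforms(help_node[keys[0]], v) for v in actual.values())
        if set(actual) != set(help_node):
            return False
        return all(conforms(help_node[k], actual[k]) for k in help_node)

    return False


# Invented sample modeled on the "pagelabels" example in the text.
sample_help = {"pagelabels": [{"index": "description", "label": "description"}]}
sample_good = {"pagelabels": [{"index": 0, "label": {"/S": "/D"}}]}
sample_bad = {"pagelabels": [{"index": 0}]}  # missing the "label" key
```

With these samples, ``conforms(sample_help, sample_good)`` holds and ``conforms(sample_help, sample_bad)`` does not.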
2397 -  
2398 -.. _json.limitations:  
2399 -  
2400 -Limitations of JSON Representation  
2401 -----------------------------------  
2402 -  
2403 -There are a few limitations to be aware of with the JSON structure:  
2404 -  
2405 -- Strings, names, and indirect object references in the original PDF  
2406 - file are all converted to strings in the JSON representation. In the  
2407 - case of a "normal" PDF file, you can tell the difference because a  
2408 - name starts with a slash (``/``), and an indirect object reference  
2409 - looks like ``n n R``, but if there were to be a string that looked  
2410 - like a name or indirect object reference, there would be no way to  
2411 - tell this from the JSON output. Note that there are certain cases  
2412 - where you know for sure what something is, such as knowing that  
2413 - dictionary keys in objects are always names and that certain things  
2414 - in the higher-level computed data are known to contain indirect  
2415 - object references.  
2416 -  
2417 -- The JSON format doesn't support binary data very well. Mostly the  
2418 - details are not important, but they are presented here for  
2419 - information. When qpdf outputs a string in the JSON representation,  
2420 - it converts the string to UTF-8, assuming usual PDF string semantics.  
2421 - Specifically, if the original string is UTF-16, it is converted to  
2422 - UTF-8. Otherwise, it is assumed to have PDF doc encoding, and is  
2423 - converted to UTF-8 with that assumption. This causes strange things  
2424 - to happen to binary strings. For example, if you had the binary  
2425 - string ``<038051>``, this would be output to the JSON as ``\u0003•Q``  
2426 - because ``03`` is not a printable character and ``80`` is the bullet  
2427 - character in PDF doc encoding and is mapped to the Unicode value  
2428 - ``2022``. Since ``51`` is ``Q``, it is output as is. If you wanted to  
2429 - convert back from here to a binary string, you would have to recognize  
2430 - Unicode values whose code points are higher than ``0xFF`` and map  
2431 - those back to their corresponding PDF doc encoding characters. There  
2432 - is no way to tell the difference between a Unicode string that was  
2433 - originally encoded as UTF-16 or one that was converted from PDF doc  
2434 - encoding. In other words, it's best if you don't try to use the JSON  
2435 - format to extract binary strings from the PDF file, but if you really  
2436 - had to, it could be done. Note that qpdf's  
2437 - :samp:`--show-object` option does not have this  
2438 - limitation and will reveal the string as encoded in the original  
2439 - file.  
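The reverse conversion described in the second limitation can be sketched as follows. This is a hedged illustration rather than anything provided by qpdf: ``json_string_to_bytes`` is a hypothetical helper, the reverse table is deliberately partial (three entries from PDF doc encoding), and it follows the text's simplification of passing code points up to ``0xFF`` through unchanged.

```python
# Partial reverse table: Unicode code point -> PDF doc encoding byte.
# Only a few entries are listed; a complete table would be needed in
# practice.
PDFDOC_REVERSE = {
    0x2022: 0x80,  # bullet
    0x2020: 0x81,  # dagger
    0x2021: 0x82,  # double dagger
}


def json_string_to_bytes(text: str) -> bytes:
    """Best-effort recovery of a binary string from its JSON form.

    Assumes the original string was treated as PDF doc encoding. A
    string that was really UTF-16 cannot be told apart, as noted in
    the text.
    """
    out = bytearray()
    for ch in text:
        code_point = ord(ch)
        if code_point <= 0xFF:
            out.append(code_point)
        elif code_point in PDFDOC_REVERSE:
            out.append(PDFDOC_REVERSE[code_point])
        else:
            raise ValueError(f"no PDF doc encoding byte known for U+{code_point:04X}")
    return bytes(out)


recovered = json_string_to_bytes("\u0003\u2022Q")
```

For the example from the text, ``recovered`` equals ``bytes.fromhex("038051")``.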
2440 -  
2441 -.. _json.considerations:  
2442 -  
2443 -JSON: Special Considerations  
2444 -----------------------------  
2445 -  
2446 -For the most part, the built-in JSON help tells you everything you need  
2447 -to know about the JSON format, but there are a few non-obvious things to  
2448 -be aware of:  
2449 -  
2450 -- While qpdf guarantees that keys present in the help will be present  
2451 - in the output, those fields may be null or empty if the information  
2452 - is not known or absent in the file. Also, if you specify  
2453 - :samp:`--json-key`, the keys that are not listed  
2454 - will be excluded entirely except for those that  
2455 - :samp:`--json-help` says are always present.  
2456 -  
2457 -- In a few places, there are keys with names containing  
2458 - ``pageposfrom1``. The values of these keys are null or an integer. If  
2459 - an integer, they point to a page index within the file numbering from  
2460 - 1. Note that JSON indexes from 0, and you would also use 0-based  
2461 - indexing using the API. However, 1-based indexing is easier in this  
2462 - case because the command-line syntax for specifying page ranges is  
2463 - 1-based. If you were going to write a program that looked through the  
2464 - JSON for information about specific pages and then use the  
2465 - command-line to extract those pages, 1-based indexing is easier.  
2466 - Besides, it's more convenient to subtract 1 in a program written in a  
2467 - real programming language than it is to add 1 in shell code.  
2468 -  
2469 -- The image information included in the ``page`` section of the JSON  
2470 - output includes the key "``filterable``". Note that the value of this  
2471 - field may depend on the :samp:`--decode-level` that  
2472 - you invoke qpdf with. The JSON output includes a top-level key  
2473 - "``parameters``" that indicates the decode level used for computing  
2474 - whether a stream was filterable. For example, jpeg images will be  
2475 - shown as not filterable by default, but they will be shown as  
2476 - filterable if you run :command:`qpdf --json  
2477 - --decode-level=all`.  
2478 -  
2479 -.. _ref.design:  
2480 -  
2481 -Design and Library Notes  
2482 -========================  
2483 -  
2484 -.. _ref.design.intro:  
2485 -  
2486 -Introduction  
2487 -------------  
2488 -  
2489 -This section was written prior to the implementation of the qpdf package  
2490 -and was subsequently modified to reflect the implementation. In some  
2491 -cases, for purposes of explanation, it may differ slightly from the  
2492 -actual implementation. As always, the source code and test suite are  
2493 -authoritative. Even if there are some errors, this document should serve  
2494 -as a road map to understanding how this code works.  
2495 -  
2496 -In general, one should adhere strictly to a specification when writing  
2497 -but be liberal in reading. This way, the product of our software will be  
2498 -accepted by the widest range of other programs, and we will accept the  
2499 -widest range of input files. This library attempts to conform to that  
2500 -philosophy whenever possible but also aims to provide strict checking  
2501 -for people who want to validate PDF files. If you don't want to see  
2502 -warnings and are trying to write something that is tolerant, you can  
2503 -call ``setSuppressWarnings(true)``. If you want to fail on the first  
2504 -error, you can call ``setAttemptRecovery(false)``. The default behavior  
2505 -is to generate warnings for recoverable problems. Note that recovery  
2506 -will not always produce the desired results even if it is able to get  
2507 -through the file. Unlike most other PDF software, which produces generic  
2508 -warnings such as "This file is damaged," qpdf generally issues a  
2509 -detailed error message that would be most useful to a PDF developer.  
2510 -This is by design as there seems to be a shortage of PDF validation  
2511 -tools out there. This was, in fact, one of the major motivations behind  
2512 -the initial creation of qpdf.  
2513 -  
2514 -.. _ref.design-goals:  
2515 -  
2516 -Design Goals  
2517 -------------  
2518 -  
2519 -The QPDF package includes support for reading and rewriting PDF files.  
2520 -It aims to hide from the user details involving object locations,  
2521 -modified (appended) PDF files, the directness/indirectness of objects,  
2522 -and stream filters including encryption. It does not aim to hide  
2523 -knowledge of the object hierarchy or content stream contents. Put  
2524 -another way, a user of the qpdf library is expected to have knowledge  
2525 -about how PDF files work, but is not expected to have to keep track of  
2526 -bookkeeping details such as file positions.  
2527 -  
2528 -A user of the library never has to care whether an object is direct or  
2529 -indirect, though it is possible to determine whether an object is direct  
2530 -or not if this information is needed. All access to objects deals with  
2531 -this transparently. All memory management details are also handled by  
2532 -the library.  
2533 -  
2534 -The ``PointerHolder`` object is used internally by the library to deal  
2535 -with memory management. This is basically a smart pointer object very  
2536 -similar in spirit to C++11's ``std::shared_ptr`` object, but predating  
2537 -it by several years. This library also makes use of a technique for  
2538 -giving fine-grained access to methods in one class to other classes by  
2539 -using public subclasses with friends and only private members that in  
2540 -turn call private methods of the containing class. See  
2541 -``QPDFObjectHandle::Factory`` as an example.  
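
The fine-grained access technique described above can be sketched in a few lines. The class names here (``Widget``, ``Builder``) are hypothetical stand-ins and are not taken from qpdf; only the shape of the pattern is meant to match.

```cpp
#include <cassert>
#include <string>

class Builder; // the one class that is allowed to create Widgets

class Widget
{
  public:
    // Public nested class with friends and only private members. Code
    // that is not a friend can name Widget::Factory but cannot call it.
    class Factory
    {
        friend class Builder;

      private:
        static Widget create(std::string const& name)
        {
            // A nested class may call private members of its
            // containing class.
            return Widget(name);
        }
    };

    std::string const& name() const
    {
        return name_;
    }

  private:
    explicit Widget(std::string const& name) :
        name_(name)
    {
    }

    std::string name_;
};

class Builder
{
  public:
    static Widget build(std::string const& name)
    {
        // Allowed only because Builder is a friend of Widget::Factory.
        return Widget::Factory::create(name);
    }
};

// Usage sketch.
inline void factory_example()
{
    assert(Builder::build("page").name() == "page");
}
```

The benefit over making ``Builder`` a friend of ``Widget`` directly is that ``Builder`` gains access to exactly one operation rather than to every private member.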
2542 -  
2543 -The top-level qpdf class is ``QPDF``. A ``QPDF`` object represents a PDF  
2544 -file. The library provides methods for both accessing and mutating PDF  
2545 -files.  
2546 -  
2547 -The primary class for interacting with PDF objects is  
2548 -``QPDFObjectHandle``. Instances of this class can be passed around by  
2549 -value, copied, stored in containers, etc. with very low overhead.  
2550 -Instances of ``QPDFObjectHandle`` created by reading from a file will  
2551 -always contain a reference back to the ``QPDF`` object from which they  
2552 -were created. A ``QPDFObjectHandle`` may be direct or indirect. If  
2553 -indirect, the ``QPDFObject`` the ``PointerHolder`` initially points to  
2554 -is a null pointer. In this case, the first attempt to access the  
2555 -underlying ``QPDFObject`` will result in the ``QPDFObject`` being  
2556 -resolved via a call to the referenced ``QPDF`` instance. This makes it  
2557 -essentially impossible to make coding errors in which certain things  
2558 -will work for some PDF files and not for others based on which objects  
2559 -are direct and which objects are indirect.  
2560 -  
2561 -Instances of ``QPDFObjectHandle`` can be directly created and modified  
2562 -using static factory methods in the ``QPDFObjectHandle`` class. There  
2563 -are factory methods for each type of object as well as a convenience  
2564 -method ``QPDFObjectHandle::parse`` that creates an object from a string  
2565 -representation of the object. Existing instances of ``QPDFObjectHandle``  
2566 -can also be modified in several ways. See comments in  
2567 -:file:`QPDFObjectHandle.hh` for details.  
2568 -  
2569 -An instance of ``QPDF`` is constructed by using the class's default  
2570 -constructor. If desired, the ``QPDF`` object may be configured with  
2571 -various methods that change its default behavior. Then the  
2572 -``QPDF::processFile()`` method is passed the name of a PDF file, which  
2573 -permanently associates the file with that QPDF object. A password may  
2574 -also be given for access to password-protected files. QPDF does not  
2575 -enforce encryption parameters and will treat user and owner passwords  
2576 -equivalently. Either password may be used to access an encrypted file.  
2577 -``QPDF`` will allow recovery of a user password given an owner password.  
2578 -The input PDF file must be seekable. (Output files written by  
2579 -``QPDFWriter`` need not be seekable, even when creating linearized  
2580 -files.) During construction, ``QPDF`` validates the PDF file's header,  
2581 -and then reads the cross reference tables and trailer dictionaries. The  
2582 -``QPDF`` class keeps only the first trailer dictionary though it does  
2583 -read all of them so it can check the ``/Prev`` key. ``QPDF`` class users  
2584 -may request the root object and the trailer dictionary specifically. The  
2585 -cross reference table is kept private. Objects may then be requested by  
2586 -number or by walking the object tree.  
2587 -  
2588 -When a PDF file has a cross-reference stream instead of a  
2589 -cross-reference table and trailer, requesting the document's trailer  
2590 -dictionary returns the stream dictionary from the cross-reference stream  
2591 -instead.  
2592 -  
2593 -There are some convenience routines for very common operations such as  
2594 -walking the page tree and returning a vector of all page objects. For  
2595 -full details, please see the header files  
2596 -:file:`QPDF.hh` and  
2597 -:file:`QPDFObjectHandle.hh`. There are also some  
2598 -additional helper classes that provide higher level API functions for  
2599 -certain document constructions. These are discussed in :ref:`ref.helper-classes`.  
2600 -  
2601 -.. _ref.helper-classes:  
2602 -  
2603 -Helper Classes  
2604 ---------------  
2605 -  
2606 -QPDF version 8.1 introduced the concept of helper classes. Helper  
2607 -classes are intended to contain higher level APIs that allow developers  
2608 -to work with certain document constructs at an abstraction level above  
2609 -that of ``QPDFObjectHandle`` while staying true to qpdf's philosophy of  
2610 -not hiding document structure from the developer. As with qpdf in  
2611 -general, the goal is to take away some of the more tedious bookkeeping  
2612 -aspects of working with PDF files, not to remove the need for the  
2613 -developer to understand how the PDF construction in question works. The  
2614 -driving factor behind the creation of helper classes was to allow the  
2615 -evolution of higher level interfaces in qpdf without polluting the  
2616 -interfaces of the main top-level classes ``QPDF`` and  
2617 -``QPDFObjectHandle``.  
2618 -  
2619 -There are two kinds of helper classes: *document* helpers and *object*  
2620 -helpers. Document helpers are constructed with a reference to a ``QPDF``  
2621 -object and provide methods for working with structures that are at the  
2622 -document level. Object helpers are constructed with an instance of a  
2623 -``QPDFObjectHandle`` and provide methods for working with specific types  
2624 -of objects.  
2625 -  
2626 -Examples of document helpers include ``QPDFPageDocumentHelper``, which  
2627 -contains methods for operating on the document's page trees, such as  
2628 -enumerating all pages of a document and adding and removing pages; and  
2629 -``QPDFAcroFormDocumentHelper``, which contains document-level methods  
2630 -related to interactive forms, such as enumerating form fields and  
2631 -creating mappings between form fields and annotations.  
2632 -  
2633 -Examples of object helpers include ``QPDFPageObjectHelper`` for  
2634 -performing operations on pages such as page rotation and some operations  
2635 -on content streams, ``QPDFFormFieldObjectHelper`` for performing  
2636 -operations related to interactive form fields, and  
2637 -``QPDFAnnotationObjectHelper`` for working with annotations.  
2638 -  
2639 -It is always possible to retrieve the underlying ``QPDF`` reference from  
2640 -a document helper and the underlying ``QPDFObjectHandle`` reference from  
2641 -an object helper. Helpers are designed to be helpers, not wrappers. The  
2642 -intention is that, in general, it is safe to freely intermix operations  
2643 -that use helpers with operations that use the underlying objects.  
2644 -Document and object helpers do not attempt to provide a complete  
2645 -interface for working with the things they are helping with, nor do they  
2646 -attempt to encapsulate underlying structures. They just provide a few  
2647 -methods to help with error-prone, repetitive, or complex tasks. In some  
2648 -cases, a helper object may cache some information that is expensive to  
2649 -gather. In such cases, the helper classes are implemented so that their  
2650 -own methods keep the cache consistent, and the header file will provide  
2651 -a method to invalidate the cache and a description of what kinds of  
2652 -operations would make the cache invalid. If in doubt, you can always  
2653 -discard a helper class and create a new one with the same underlying  
2654 -objects, which will ensure that you have discarded any stale  
2655 -information.  
2656 -  
2657 -By convention, document helpers are called  
2658 -``QPDFSomethingDocumentHelper`` and are derived from  
2659 -``QPDFDocumentHelper``, and object helpers are called  
2660 -``QPDFSomethingObjectHelper`` and are derived from ``QPDFObjectHelper``.  
2661 -For details on specific helpers, please see their header files. You can  
2662 -find them by looking at  
2663 -:file:`include/qpdf/QPDF*DocumentHelper.hh` and  
2664 -:file:`include/qpdf/QPDF*ObjectHelper.hh`.  
2665 -  
2666 -In order to avoid creation of circular dependencies, the following  
2667 -general guidelines are followed with helper classes:  
2668 -  
2669 -- Core class interfaces do not know about helper classes. For example,  
2670 - no methods of ``QPDF`` or ``QPDFObjectHandle`` will include helper  
2671 - classes in their interfaces.  
2672 -  
2673 -- Interfaces of object helpers will usually not use document helpers in  
2674 - their interfaces. This is because it is much more useful for document  
2675 - helpers to have methods that return object helpers. Most operations  
2676 - in PDF files start at the document level and go from there to the  
2677 - object level rather than the other way around. It can sometimes be  
2678 - useful to map back from object-level structures to document-level  
2679 - structures. If there is a desire to do this, it will generally be  
2680 - provided by a method in the document helper class.  
2681 -  
2682 -- Most of the time, object helpers don't know about other object  
2683 - helpers. However, in some cases, one type of object may be a  
2684 - container for another type of object, in which case it may make sense  
2685 - for the outer object to know about the inner object. For example,  
2686 - there are methods in the ``QPDFPageObjectHelper`` that know  
2687 - ``QPDFAnnotationObjectHelper`` because references to annotations are  
2688 - contained in page dictionaries.  
2689 -  
2690 -- Any helper or core library class may use helpers in their  
2691 - implementations.  
2692 -  
2693 -Prior to qpdf version 8.1, higher level interfaces were added as  
2694 -"convenience functions" in either ``QPDF`` or ``QPDFObjectHandle``. For  
2695 -compatibility, older convenience functions for operating with pages will  
2696 -remain in those classes even as alternatives are provided in helper  
2697 -classes. Going forward, new higher level interfaces will be provided  
2698 -using helper classes.  
2699 -  
2700 -.. _ref.implementation-notes:  
2701 -  
2702 -Implementation Notes  
2703 ---------------------  
2704 -  
2705 -This section contains a few notes about QPDF's internal implementation,  
2706 -particularly around what it does when it first processes a file. This  
2707 -section is a bit of a simplification of what it actually does, but it  
2708 -could serve as a starting point to someone trying to understand the  
2709 -implementation. There is nothing in this section that you need to know  
2710 -to use the qpdf library.  
2711 -  
2712 -``QPDFObject`` is the basic PDF Object class. It is an abstract base  
2713 -class from which are derived classes for each type of PDF object.  
2714 -Clients do not interact with Objects directly but instead interact with  
2715 -``QPDFObjectHandle``.  
2716 -  
2717 -When the ``QPDF`` class creates a new object, it dynamically allocates  
2718 -the appropriate type of ``QPDFObject`` and immediately hands the pointer  
2719 -to an instance of ``QPDFObjectHandle``. The parser reads a token from  
2720 -the current file position. If the token is neither a dictionary nor an  
2721 -array opener, an object is immediately constructed from the single token  
2722 -and the parser returns. Otherwise, the parser iterates in a special mode  
2723 -in which it accumulates objects until it finds a balancing closer.  
2724 -During this process, the "``R``" keyword is recognized and an indirect  
2725 -``QPDFObjectHandle`` may be constructed.  
2726 -  
2727 -The ``QPDF::resolve()`` method, which is used to resolve an indirect  
2728 -object, may be invoked from the ``QPDFObjectHandle`` class. It first  
2729 -checks a cache to see whether this object has already been read. If not,  
2730 -it reads the object from the PDF file and caches it. It then returns the  
2731 -resulting ``QPDFObjectHandle``. The calling object handle then replaces  
2732 -its ``PointerHolder<QPDFObject>`` with the one from the newly returned  
2733 -``QPDFObjectHandle``. In this way, only a single copy of any direct  
2734 -object need exist and clients can access objects transparently without  
2735 -knowing or caring whether they are direct or indirect objects.  
2736 -Additionally, no object is ever read from the file more than once. That  
2737 -means that only the portions of the PDF file that are actually needed  
2738 -are ever read from the input file, thus allowing the qpdf package to  
2739 -take advantage of this important design goal of PDF files.  
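
The resolve-and-cache behavior described above can be modeled with standard library types. This is a simplified sketch, not qpdf's implementation: ``Document``, ``Handle``, and ``Object`` are hypothetical stand-ins for ``QPDF``, ``QPDFObjectHandle``, and ``QPDFObject``, ``std::shared_ptr`` stands in for ``PointerHolder``, and "reading from the file" is simulated by building a string.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Object
{
    std::string value;
};

class Document
{
  public:
    int reads = 0; // how many times an object was "read from the file"

    std::shared_ptr<Object> resolve(int id, int gen)
    {
        auto key = std::make_pair(id, gen);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            // Cache miss: simulate reading the object at its offset.
            ++reads;
            auto obj = std::make_shared<Object>();
            obj->value =
                "object " + std::to_string(id) + " " + std::to_string(gen);
            it = cache_.emplace(key, obj).first;
        }
        return it->second;
    }

  private:
    std::map<std::pair<int, int>, std::shared_ptr<Object>> cache_;
};

class Handle
{
  public:
    Handle(Document* doc, int id, int gen) :
        doc_(doc),
        id_(id),
        gen_(gen)
    {
    }

    std::string const& value()
    {
        assert(doc_ != nullptr);
        if (!obj_) {
            // First access: replace the null pointer with the cached
            // object owned by the document.
            obj_ = doc_->resolve(id_, gen_);
        }
        return obj_->value;
    }

  private:
    Document* doc_;
    int id_;
    int gen_;
    std::shared_ptr<Object> obj_; // null until resolved
};
```

Two handles that refer to the same object ID and generation end up sharing one ``Object``, and the simulated file is read only once.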
2740 -  
2741 -If the requested object is inside of an object stream, the object stream  
2742 -itself is first read into memory. Then the tokenizer reads objects from  
2743 -the memory stream based on the offset information stored in the stream.  
2744 -Those individual objects are cached, after which the temporary buffer  
2745 -holding the object stream contents is discarded. In this way, the first  
2746 -time an object in an object stream is requested, all objects in the  
2747 -stream are cached.  
2748 -  
2749 -The following example should clarify how ``QPDF`` processes a simple  
2750 -file.  
2751 -  
2752 -- Client constructs ``QPDF`` ``pdf`` and calls  
2753 - ``pdf.processFile("a.pdf");``.  
2754 -  
2755 -- The ``QPDF`` class checks the beginning of  
2756 - :file:`a.pdf` for a PDF header. It then reads the  
2757 - cross reference table mentioned at the end of the file, ensuring that  
2758 - it is looking before the last ``%%EOF``. After getting to the ``trailer``  
2759 - keyword, it invokes the parser.  
2760 -  
2761 -- The parser sees "``<<``", so it calls itself recursively in  
2762 - dictionary creation mode.  
2763 -  
2764 -- In dictionary creation mode, the parser keeps accumulating objects  
2765 - until it encounters "``>>``". Each object that is read is pushed onto  
2766 - a stack. If "``R``" is read, the last two objects on the stack are  
2767 - inspected. If they are integers, they are popped off the stack and  
2768 - their values are used to construct an indirect object handle which is  
2769 - then pushed onto the stack. When "``>>``" is finally read, the stack  
2770 - is converted into a ``QPDF_Dictionary`` which is placed in a  
2771 - ``QPDFObjectHandle`` and returned.  
2772 -  
2773 -- The resulting dictionary is saved as the trailer dictionary.  
2774 -  
2775 -- The ``/Prev`` key is searched. If present, ``QPDF`` seeks to that  
2776 - point and repeats except that the new trailer dictionary is not  
2777 - saved. If ``/Prev`` is not present, the initial parsing process is  
2778 - complete.  
2779 -  
2780 - If there is an encryption dictionary, the document's encryption  
2781 - parameters are initialized.  
2782 -  
2783 -- The client requests the root object. The ``QPDF`` class gets the value of  
2784 - the root key from the trailer dictionary and returns it. It is an unresolved  
2785 - indirect ``QPDFObjectHandle``.  
2786 -  
2787 -- The client requests the ``/Pages`` key from the root  
2788 - ``QPDFObjectHandle``. The ``QPDFObjectHandle`` notices that it is  
2789 - indirect so it asks ``QPDF`` to resolve it. ``QPDF`` looks in the  
2790 - object cache for an object with the root dictionary's object ID and  
2791 - generation number. Upon not seeing it, it checks the cross reference  
2792 - table, gets the offset, and reads the object present at that offset.  
2793 - It stores the result in the object cache and returns the cached  
2794 - result. The calling ``QPDFObjectHandle`` replaces its object pointer  
2795 - with the one from the resolved ``QPDFObjectHandle``, verifies that it  
2796 - is a valid dictionary object, and returns the (unresolved indirect)  
2797 - ``QPDFObjectHandle`` to the top of the Pages hierarchy.  
2798 -  
2799 - As the client continues to request objects, the same process is  
2800 - followed for each new requested object.  
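
The dictionary creation step in the walkthrough above, where "``R``" collapses the two preceding integers into an indirect reference, can be sketched as follows. This is an illustration under simplifying assumptions, not qpdf's parser: the input is already split into tokens, nested containers are ignored, and ``Item`` and ``read_dictionary_items`` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Item
{
    enum Kind { Integer, Reference, Other } kind;
    std::string text; // original token text (Integer and Other)
    int first;        // integer value, or object ID for a Reference
    int second;       // generation for a Reference, otherwise 0
};

static bool is_integer(std::string const& tok)
{
    if (tok.empty()) {
        return false;
    }
    for (char c : tok) {
        if (c < '0' || c > '9') {
            return false;
        }
    }
    return true;
}

// Accumulate items on a stack until ">>" is seen. The result is the flat
// key/value sequence that would then be turned into a dictionary.
std::vector<Item> read_dictionary_items(std::vector<std::string> const& tokens)
{
    std::vector<Item> stack;
    for (auto const& tok : tokens) {
        if (tok == ">>") {
            break;
        }
        std::size_t n = stack.size();
        if (tok == "R" && n >= 2 && stack[n - 2].kind == Item::Integer &&
            stack[n - 1].kind == Item::Integer) {
            // Pop the object ID and generation; push an indirect reference.
            Item ref{Item::Reference, "", stack[n - 2].first, stack[n - 1].first};
            stack.pop_back();
            stack.pop_back();
            stack.push_back(ref);
        } else if (is_integer(tok)) {
            stack.push_back(Item{Item::Integer, tok, std::stoi(tok), 0});
        } else {
            stack.push_back(Item{Item::Other, tok, 0, 0});
        }
    }
    return stack;
}

// Usage sketch: "/Root 1 0 R /Size 7 >>" yields four items.
inline void read_dictionary_items_example()
{
    auto items =
        read_dictionary_items({"/Root", "1", "0", "R", "/Size", "7", ">>"});
    assert(items.size() == 4 && items[1].kind == Item::Reference);
}
```

Note that the reference is only recorded here, not followed; resolution happens later, on first access, as the remaining steps of the walkthrough describe.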
2801 -  
2802 -.. _ref.casting:  
2803 -  
2804 -Casting Policy  
2805 ---------------  
2806 -  
2807 -This section describes the casting policy followed by qpdf's  
2808 -implementation. This is of no concern to qpdf's end users and largely of no  
2809 -concern to people writing code that uses qpdf, but it could be of  
2810 -interest to people who are porting qpdf to a new platform or who are  
2811 -making modifications to the code.  
2812 -  
2813 -The C++ code in qpdf is free of old-style casts except where unavoidable  
2814 -(e.g. where the old-style cast is in a macro provided by a third-party  
2815 -header file). When there is a need for a cast, it is handled, in order  
2816 -of preference, by rewriting the code to avoid the need for a cast,  
2817 -calling ``const_cast``, calling ``static_cast``, calling  
2818 -``reinterpret_cast``, or calling some combination of the above. As a  
2819 -last resort, a compiler-specific ``#pragma`` may be used to suppress a  
2820 -warning that we don't want to fix. Examples may include suppressing  
2821 -warnings about the use of old-style casts in code that is shared between  
2822 -C and C++ code.  
2823 -  
2824 -The ``QIntC`` namespace, provided by  
2825 -:file:`include/qpdf/QIntC.hh`, implements safe  
2826 -functions for converting between integer types. These functions do range  
2827 -checking and throw a ``std::range_error``, which is a subclass of  
2828 -``std::runtime_error``, if conversion from one integer type to another  
2829 -results in loss of information. There are many cases in which we have to  
2830 -move between different integer types because of incompatible integer  
2831 -types used in interoperable interfaces. Some are unavoidable, such as  
2832 -moving between sizes and offsets, and others are there because of old  
2833 -code that is too entrenched to be fixable without breaking source  
2834 -compatibility and causing pain for users. QPDF is compiled with extra  
2835 -warnings to detect conversions with potential data loss, and all such  
2836 -cases should be fixed by either using a function from ``QIntC`` or a  
2837 -``static_cast``.  
2838 -  
2839 -When the intention is just to switch the type because of exchanging data  
2840 -between incompatible interfaces, use ``QIntC``. This is the usual case.  
2841 -However, there are some cases in which we are explicitly intending to  
2842 -use the exact same bit pattern with a different type. This is most  
2843 -common when switching between signed and unsigned characters. A lot of  
2844 -qpdf's code uses unsigned characters internally, but ``std::string`` uses  
2845 -``char``, which is signed on many platforms. Using ``QIntC::to_char`` would be wrong for  
2846 -converting from unsigned to signed characters because a negative  
2847 -``char`` value and the corresponding ``unsigned char`` value greater  
2848 -than 127 *mean the same thing*. There are also  
2849 -cases in which we use ``static_cast`` when working with bit fields where  
2850 -we are not representing a numerical value but rather a bunch of bits  
2851 -packed together in some integer type. Also note that ``size_t`` and  
2852 -``long`` both typically differ between 32-bit and 64-bit environments,  
2853 -so sometimes an explicit cast may not be needed to avoid warnings on one  
2854 -platform but may be needed on another. A conversion with ``QIntC``  
2855 -should always be used when the types are different even if the  
2856 -underlying size is the same. QPDF's CI build builds on 32-bit and 64-bit  
2857 -platforms, and the test suite is very thorough, so it is hard to make  
2858 -any of the potential errors here without being caught in build or test.  
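
The distinction drawn above between a value-preserving conversion and a same-bits reinterpretation can be illustrated with a small sketch. ``checked_convert`` and ``same_bits`` are hypothetical helpers written in the spirit of ``QIntC``; they are not its actual interface, for which see :file:`include/qpdf/QIntC.hh`.

```cpp
#include <cassert>
#include <limits>
#include <stdexcept>
#include <string>

// Convert between integer types, throwing std::range_error if the value
// does not survive the conversion.
template <typename To, typename From>
To checked_convert(From value)
{
    static_assert(
        std::numeric_limits<From>::is_integer &&
            std::numeric_limits<To>::is_integer,
        "checked_convert requires integer types");
    To result = static_cast<To>(value);
    bool value_negative =
        std::numeric_limits<From>::is_signed && (value < From(0));
    bool result_negative =
        std::numeric_limits<To>::is_signed && (result < To(0));
    // The value must round-trip and keep its sign.
    if (static_cast<From>(result) != value || value_negative != result_negative) {
        throw std::range_error(
            "integer out of range converting " + std::to_string(value));
    }
    return result;
}

// When the intent is "same bit pattern, different type", a plain
// static_cast is the right tool; a range check would wrongly reject
// every byte above 127.
inline char same_bits(unsigned char ch)
{
    return static_cast<char>(ch);
}

// Usage sketch.
inline void checked_convert_example()
{
    assert(checked_convert<int>(42LL) == 42);
}
```

For example, ``checked_convert<unsigned char>(300)`` throws, while ``same_bits(0x80)`` succeeds and yields a ``char`` holding the same byte.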
2859 -  
2860 -Non-const ``unsigned char*`` is used in the ``Pipeline`` interface. The  
2861 -pipeline interface has a ``write`` call that uses ``unsigned char*``  
2862 -without a ``const`` qualifier. The main reason for this is  
2863 -to support pipelines that make calls to third-party libraries, such as  
2864 -zlib, that don't include ``const`` in their interfaces. Unfortunately,  
2865 -there are many places in the code where it is desirable to have  
2866 -``const char*`` with pipelines. None of the pipeline implementations  
2867 -in qpdf  
2868 -currently modify the data passed to write, and doing so would be counter  
2869 -to the intent of ``Pipeline``, but there is nothing in the code to  
2870 -prevent this from being done. There are places in the code where  
2871 -``const_cast`` is used to remove the const-ness of pointers going into  
2872 -``Pipeline``\ s. This could theoretically be unsafe, but there is  
2873 -adequate testing to assert that it is safe and will remain safe in  
2874 -qpdf's code.  
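
The ``const_cast`` usage described above looks roughly like the following. ``Sink`` and ``StringSink`` are hypothetical stand-ins for a ``Pipeline``-style interface with a non-const ``write``; they are not qpdf classes.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

class Sink
{
  public:
    virtual ~Sink() = default;
    // Non-const pointer, mirroring interfaces of third-party libraries
    // that omit const.
    virtual void write(unsigned char* data, std::size_t len) = 0;
};

class StringSink : public Sink
{
  public:
    void write(unsigned char* data, std::size_t len) override
    {
        // Reads the data but never modifies it.
        buf_.append(reinterpret_cast<char*>(data), len);
    }

    std::string const& str() const
    {
        return buf_;
    }

  private:
    std::string buf_;
};

inline void write_string(Sink& sink, std::string const& s)
{
    // reinterpret_cast changes the pointee type; const_cast removes
    // const. This is safe only as long as no Sink writes through the
    // pointer.
    sink.write(
        const_cast<unsigned char*>(
            reinterpret_cast<unsigned char const*>(s.data())),
        s.size());
}

// Usage sketch.
inline void write_string_example()
{
    StringSink sink;
    write_string(sink, "abc");
    assert(sink.str() == "abc");
}
```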
2875 -  
2876 -.. _ref.encryption:  
2877 -  
2878 -Encryption  
2879 -----------  
2880 -  
2881 -Encryption is supported transparently by qpdf. When opening a PDF file,  
2882 -if an encryption dictionary exists, the ``QPDF`` object processes this  
2883 -dictionary using the password (if any) provided. The primary decryption  
2884 -key is computed and cached. No further access is made to the encryption  
2885 -dictionary after that time. When an object is read from a file, the  
2886 -object ID and generation of the object in which it is contained are  
2887 -always known. Using this information along with the stored encryption  
2888 -key, all stream and string objects are transparently decrypted. Raw  
2889 -encrypted objects are never stored in memory. This way, nothing in the  
2890 -library ever has to know or care whether it is reading an encrypted  
2891 -file.  
2892 -  
2893 -An interface is also provided for writing encrypted streams and strings  
2894 -given an encryption key. This is used by ``QPDFWriter`` when it rewrites  
2895 -encrypted files.  
2896 -  
2897 -When copying encrypted files, unless otherwise directed, qpdf will  
2898 -preserve any encryption in force in the original file. qpdf can do this  
2899 -with either the user or the owner password. There is no difference in  
2900 -capability based on which password is used. When 40-bit or 128-bit  
2901 -encryption keys are used, the user password can be recovered with the  
2902 -owner password. With 256-bit keys, the user and owner passwords are used  
2903 -independently to encrypt the actual encryption key, so while either can  
2904 -be used, the owner password can no longer be used to recover the user  
2905 -password.  
2906 -  
2907 -Starting with version 4.0.0, qpdf can read files that are not encrypted  
2908 -but that contain encrypted attachments, but it cannot write such files.  
2909 -qpdf also requires the password to be specified in order to open the  
2910 -file, not just to extract attachments, since once the file is open, all  
2911 -decryption is handled transparently. When copying files like this while  
2912 -preserving encryption, qpdf will apply the file's encryption to  
2913 -everything in the file, not just to the attachments. When decrypting the  
2914 -file, qpdf will decrypt the attachments. In general, when copying PDF  
2915 -files with multiple encryption formats, qpdf will choose the newest  
2916 -format. The only exception to this is that clear-text metadata will be  
2917 -preserved as clear-text if it is that way in the original file.  
2918 -  
2919 -One point of confusion some people have about encrypted PDF files is  
2920 -that encryption is not the same as password protection. Password  
2921 -protected files are always encrypted, but it is also possible to create  
2922 -encrypted files that do not have passwords. Internally, such files use  
2923 -the empty string as a password, and most readers try the empty string  
2924 -first to see if it works and prompt for a password only if the empty  
2925 -string doesn't work. Normally such files have an empty user password and  
2926 -a non-empty owner password. In that way, if the file is opened by an  
2927 -ordinary reader without specification of password, the restrictions  
2928 -specified in the encryption dictionary can be enforced. Most users  
2929 -wouldn't even realize such a file was encrypted. Since qpdf always  
2930 -ignores the restrictions (except for the purpose of reporting what they  
2931 -are), qpdf doesn't care which password you use. QPDF will allow you to  
2932 -create PDF files with non-empty user passwords and empty owner  
2933 -passwords. Some readers will require a password when you open these  
2934 -files, and others will open the files without a password and not enforce  
2935 -restrictions. Having a non-empty user password and an empty owner  
2936 -password doesn't really make sense because it would mean that opening  
2937 -the file with the user password would be more restrictive than not  
2938 -supplying a password at all. QPDF also allows you to create PDF files  
2939 -with the same password as both the user and owner password. Some readers  
2940 -will not ever allow such files to be accessed without restrictions  
2941 -because they never try the password as the owner password if it works as  
2942 -the user password. Nonetheless, one of the powerful aspects of qpdf is  
2943 -that it allows you to finely specify the way encrypted files are  
2944 -created, even if the results are not useful to some readers. One use  
2945 -case for this would be for testing a PDF reader to ensure that it  
2946 -handles odd configurations of input files.  
2947 -  
2948 -.. _ref.random-numbers:  
2949 -  
2950 -Random Number Generation  
2951 -------------------------  
2952 -  
2953 -QPDF generates random numbers to support generation of encrypted data.  
2954 -Starting in qpdf 10.0.0, qpdf uses the crypto provider as its source of  
2955 -random numbers. Older versions used the OS-provided source of secure  
2956 -random numbers or, if allowed at build time, insecure random numbers  
2957 -from stdlib. Starting with version 5.1.0, you can disable use of  
2958 -OS-provided secure random numbers at build time. This is especially  
2959 -useful on Windows if you want to avoid a dependency on Microsoft's  
2960 -cryptography API. You can also supply your own random data provider. For  
2961 -details on how to do this, please refer to the top-level README.md file  
2962 -in the source distribution and to comments in  
2963 -:file:`QUtil.hh`.  
2964 -  
2965 -.. _ref.adding-and-remove-pages:  
2966 -  
2967 -Adding and Removing Pages  
2968 --------------------------  
2969 -  
2970 -While qpdf's API has supported adding and modifying objects for some  
2971 -time, version 3.0 introduced specific methods for adding and removing  
2972 -pages. These are largely convenience routines that handle two tricky  
2973 -issues: pushing inheritable resources from the ``/Pages`` tree down to  
2974 -individual pages and manipulation of the ``/Pages`` tree itself. For  
2975 -details, see ``addPage`` and surrounding methods in  
2976 -:file:`QPDF.hh`.  
2977 -  
2978 -.. _ref.reserved-objects:  
2979 -  
2980 -Reserving Object Numbers  
2981 -------------------------  
2982 -  
2983 -Version 3.0 of qpdf introduced the concept of reserved objects. These  
2984 -are seldom needed for ordinary operations, but there are cases in which  
2985 -you may want to add a series of indirect objects with references to each  
2986 -other to a ``QPDF`` object. This causes a problem because you can't  
2987 -determine the object ID that a new indirect object will have until you  
2988 -add it to the ``QPDF`` object with ``QPDF::makeIndirectObject``. The  
2989 -only way to add two mutually referential objects to a ``QPDF`` object  
2990 -prior to version 3.0 would be to add the new objects first and then make  
2991 -them refer to each other after adding them. Now it is possible to create  
2992 -a *reserved object* using  
2993 -``QPDFObjectHandle::newReserved``. This is an indirect object that stays  
2994 -"unresolved" even if it is queried for its type. So now, if you want to  
2995 -create a set of mutually referential objects, you can create  
2996 -reservations for each one of them and use those reservations to  
2997 -construct the references. When finished, you can call  
2998 -``QPDF::replaceReserved`` to replace the reserved objects with the real  
2999 -ones. This functionality will never be needed by most applications, but  
3000 -it is used internally by QPDF when copying objects from other PDF files,  
3001 -as discussed in :ref:`ref.foreign-objects`. For an example of how to use reserved  
3002 -objects, search for ``newReserved`` in  
3003 -:file:`test_driver.cc` in qpdf's sources.  
3004 -  
3005 -.. _ref.foreign-objects:  
3006 -  
3007 -Copying Objects From Other PDF Files  
3008 -------------------------------------  
3009 -  
3010 -Version 3.0 of qpdf introduced the ability to copy objects into a  
3011 -``QPDF`` object from a different ``QPDF`` object, which we refer to as  
3012 -*foreign objects*. This allows arbitrary  
3013 -merging of PDF files. The "from" ``QPDF`` object must remain valid after  
3014 -the copy as discussed in the note below. The  
3015 -:command:`qpdf` command-line tool provides limited  
3016 -support for basic page selection, including merging in pages from other  
3017 -files, but the library's API makes it possible to implement arbitrarily  
3018 -complex merging operations. The main method for copying foreign objects  
3019 -is ``QPDF::copyForeignObject``. This takes an indirect object from  
3020 -another ``QPDF`` and copies it recursively into this object while  
3021 -preserving all object structure, including circular references. This  
3022 -means you can add a direct object that you create from scratch to a  
3023 -``QPDF`` object with ``QPDF::makeIndirectObject``, and you can add an  
3024 -indirect object from another file with ``QPDF::copyForeignObject``. The  
3025 -fact that ``QPDF::makeIndirectObject`` does not automatically detect a  
3026 -foreign object and copy it is an explicit design decision. Copying a  
3027 -foreign object seems like a sufficiently significant thing to do that it  
3028 -should be done explicitly.  
3029 -  
3030 -The other way to copy foreign objects is by passing a page from one  
3031 -``QPDF`` to another by calling ``QPDF::addPage``. In contrast to  
3032 -``QPDF::makeIndirectObject``, this method automatically distinguishes  
3033 -between indirect objects in the current file, foreign objects, and  
3034 -direct objects.  
3035 -  
3036 -Please note: when you copy objects from one ``QPDF`` to another, the  
3037 -source ``QPDF`` object must remain valid until you have finished with  
3038 -the destination object. This is because the original object is still  
3039 -used to retrieve any referenced stream data from the copied object.  
3040 -  
3041 -.. _ref.rewriting:  
3042 -  
3043 -Writing PDF Files  
3044 ------------------  
3045 -  
3046 -The qpdf library supports file writing of ``QPDF`` objects to PDF files  
3047 -through the ``QPDFWriter`` class. The ``QPDFWriter`` class has two  
3048 -writing modes: one for non-linearized files, and one for linearized  
3049 -files. See :ref:`ref.linearization` for a description of how  
3050 -linearization is implemented. This section describes how we write  
3051 -non-linearized files, including the creation of QDF files (see :ref:`ref.qdf`).  
3052 -  
3053 -This outline was written prior to implementation and is not exactly  
3054 -accurate, but it provides a correct "notional" idea of how writing  
3055 -works. Look at the code in ``QPDFWriter`` for exact details.  
3056 -  
3057 -- Initialize state:  
3058 -  
3059 - - next object number = 1  
3060 -  
3061 - - object queue = empty  
3062 -  
3063 - - renumber table: old object id/generation to new id/0 = empty  
3064 -  
3065 - - xref table: new id -> offset = empty  
3066 -  
3067 -- Create a QPDF object from a file.  
3068 -  
3069 -- Write header for new PDF file.  
3070 -  
3071 -- Request the trailer dictionary.  
3072 -  
3073 -- For each value that is an indirect object, grab the next object  
3074 - number (via an operation that returns and increments the number). Map  
3075 - object to new number in renumber table. Push object onto queue.  
3076 -  
3077 -- While there are more objects on the queue:  
3078 -  
3079 - - Pop queue.  
3080 -  
3081 - - Look up object's new number *n* in the renumbering table.  
3082 -  
3083 - - Store current offset into xref table.  
3084 -  
3085 - - Write ``:samp:`{n}` 0 obj``.  
3086 -  
3087 - - If object is null, whether direct or indirect, write out null,  
3088 - thus eliminating unresolvable indirect object references.  
3089 -  
3090 - - If the object is a stream, write stream contents, piped  
3091 - through any filters as required, to a memory buffer. Use this  
3092 - buffer to determine the stream length.  
3093 -  
3094 - - If object is not a stream, array, or dictionary, write out its  
3095 - contents.  
3096 -  
3097 - - If object is an array or dictionary (including stream), traverse  
3098 - its elements (for array) or values (for dictionaries), handling  
3099 - recursive dictionaries and arrays, looking for indirect objects.  
3100 - When an indirect object is found, if it is not resolvable, ignore.  
3101 - (This case is handled when writing it out.) Otherwise, look it up  
3102 - in the renumbering table. If not found, grab the next available  
3103 - object number, assign to the referenced object in the renumbering  
3104 - table, and push the referenced object onto the queue. As a special  
3105 - case, when writing out a stream dictionary, replace length,  
3106 - filters, and decode parameters as required.  
3107 -  
3108 - Write out dictionary or array, replacing any unresolvable indirect  
3109 - object references with null (the PDF specification says that a reference to a  
3110 - nonexistent object is legal and resolves to null) and any  
3111 - resolvable ones with references to the renumbered objects.  
3112 -  
3113 - - If the object is a stream, write ``stream\n``, the stream contents  
3114 - (from the memory buffer), and ``\nendstream\n``.  
3115 -  
3116 - - When done, write ``endobj``.  
3117 -  
3118 -Once we have finished the queue, all referenced objects will have been  
3119 -written out and all deleted objects or unreferenced objects will have  
3120 -been skipped. The new cross-reference table will contain an offset for  
3121 -every new object number from 1 up to the number of objects written. This  
3122 -can be used to write out a new xref table. Finally we can write out the  
3123 -trailer dictionary with appropriately computed /ID (see spec, 8.3, File  
3124 -Identifiers), the cross reference table offset, and ``%%EOF``.  
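The queue-driven renumbering in the outline above can be sketched in a few lines of Python. This is only an illustration of the notional algorithm, not the code in ``QPDFWriter``: the ``Ref`` class, the ``renumber`` function, and the use of plain dicts and lists to model PDF objects are all assumptions made for the example.

```python
from collections import deque


class Ref:
    """Hypothetical stand-in for an indirect reference (object id, generation)."""

    def __init__(self, obj_id, gen=0):
        self.key = (obj_id, gen)


def renumber(objects, trailer):
    """Return (renumber table, write order) for objects reachable from trailer.

    objects maps (id, generation) -> value. Values are scalars, Ref
    instances, lists (arrays), or dicts (dictionaries).
    """
    next_id = 1
    renumber_table = {}  # old (id, generation) -> new id (generation is always 0)
    queue = deque()
    order = []

    def visit(value):
        nonlocal next_id
        if isinstance(value, Ref):
            if value.key not in objects:
                return  # unresolvable: ignored here, written as null later
            if value.key not in renumber_table:
                renumber_table[value.key] = next_id
                next_id += 1
                queue.append(value.key)
        elif isinstance(value, list):
            for item in value:
                visit(item)
        elif isinstance(value, dict):
            for item in value.values():
                visit(item)

    visit(trailer)
    while queue:
        key = queue.popleft()
        order.append(renumber_table[key])  # a real writer would emit "n 0 obj" here
        visit(objects[key])
    return renumber_table, order
```

Objects that are never reached from the trailer never enter the renumber table, which is how unreferenced objects end up being dropped.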
3125 -  
3126 -.. _ref.filtered-streams:  
3127 -  
3128 -Filtered Streams  
3129 -----------------  
3130 -  
3131 -Support for streams is implemented through the ``Pipeline`` interface  
3132 -which was designed for this package.  
3133 -  
3134 -When reading streams, create a series of ``Pipeline`` objects. The  
3135 -``Pipeline`` abstract base class requires implementations of ``write()`` and  
3136 -``finish()`` and provides an implementation of ``getNext()``. Each  
3137 -pipeline object, upon receiving data, does whatever it is going to do  
3138 -and then writes the data (possibly modified) to its successor.  
3139 -Alternatively, a pipeline may be an end-of-the-line pipeline that does  
3140 -something like store its output to a file or a memory buffer ignoring a  
3141 -successor. For additional details, look at  
3142 -:file:`Pipeline.hh`.  
3143 -  
3144 -``QPDF`` can read raw or filtered streams. When reading a filtered  
3145 -stream, the ``QPDF`` class creates a ``Pipeline`` object for each  
3146 -appropriate filter and chains them together. The last filter  
3147 -should write to whatever type of output is required. The ``QPDF`` class  
3148 -has an interface to write raw or filtered stream contents to a given  
3149 -pipeline.  
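To make the chaining idea concrete, here is a minimal Python sketch of a pipeline chain. It mirrors the shape described above (``write()``, ``finish()``, and a provided ``get_next()``), but the class names are hypothetical and the hex decoder is deliberately simplified (it does not handle the ``>`` end-of-data marker). It is not a port of :file:`Pipeline.hh`.

```python
import zlib


class Pipeline:
    """Base class: subclasses implement write() and finish()."""

    def __init__(self, next_pipeline=None):
        self._next = next_pipeline

    def get_next(self):
        return self._next

    def write(self, data):
        raise NotImplementedError

    def finish(self):
        raise NotImplementedError


class HexDecodePipeline(Pipeline):
    """Decodes ASCII hex digits and passes the bytes to its successor."""

    def __init__(self, next_pipeline):
        super().__init__(next_pipeline)
        self._pending = b""

    def write(self, data):
        digits = self._pending + bytes(b for b in data if not chr(b).isspace())
        usable = len(digits) - (len(digits) % 2)
        self._pending = digits[usable:]  # keep an odd trailing digit for later
        self.get_next().write(bytes.fromhex(digits[:usable].decode("ascii")))

    def finish(self):
        if self._pending:
            # An odd final digit is treated as if followed by 0.
            self.get_next().write(bytes.fromhex((self._pending + b"0").decode("ascii")))
        self.get_next().finish()


class InflatePipeline(Pipeline):
    """Inflates zlib-compressed data and passes it to its successor."""

    def __init__(self, next_pipeline):
        super().__init__(next_pipeline)
        self._z = zlib.decompressobj()

    def write(self, data):
        self.get_next().write(self._z.decompress(data))

    def finish(self):
        self.get_next().write(self._z.flush())
        self.get_next().finish()


class BufferPipeline(Pipeline):
    """End-of-the-line pipeline that stores its input in memory."""

    def __init__(self):
        super().__init__(None)
        self.chunks = []
        self.finished = False

    def write(self, data):
        self.chunks.append(data)

    def finish(self):
        self.finished = True

    def value(self):
        return b"".join(self.chunks)
```

A chain such as ``HexDecodePipeline(InflatePipeline(BufferPipeline()))`` is built from the last stage backwards, so data written to the head flows through each filter in turn.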
3150 -  
3151 -.. _ref.object-accessors:  
3152 -  
3153 -Object Accessor Methods  
3154 ------------------------  
3155 -  
3156 -..  
3157 - This section is referenced in QPDFObjectHandle.hh  
3158 -  
3159 -For general information about how to access instances of  
3160 -``QPDFObjectHandle``, please see the comments in  
3161 -:file:`QPDFObjectHandle.hh`. Search for "Accessor  
3162 -methods". This section provides a more in-depth discussion of the  
3163 -behavior and the rationale for the behavior.  
3164 -  
3165 -*Why were type errors made into warnings?* When type checks were  
3166 -introduced into qpdf in the early days, it was expected that type errors  
3167 -would only occur as a result of programmer error. However, in practice,  
3168 -type errors would occur with malformed PDF files because of assumptions  
3169 -made in code, including code within the qpdf library and code written by  
3170 -library users. The most common case would be chaining calls to  
3171 -``getKey()`` to access keys deep within a dictionary. In many cases,  
3172 -qpdf would be able to recover from these situations, but the old  
3173 -behavior often resulted in crashes rather than graceful recovery. For  
3174 -this reason, the errors were changed to warnings.  
3175 -  
3176 -*Why even warn about type errors when the user can't usually do anything  
3177 -about them?* Type warnings are extremely valuable during development.  
3178 -Since it's impossible to catch at compile time things like typos in  
3179 -dictionary key names or logic errors around what the structure of a PDF  
3180 -file might be, the presence of type warnings can save lots of developer  
3181 -time. They have also proven useful in exposing issues in qpdf itself  
3182 -that would have otherwise gone undetected.  
3183 -  
3184 -*Can there be a type-safe ``QPDFObjectHandle``?* It would be great if  
3185 -``QPDFObjectHandle`` could be more strongly typed so that you'd have to  
3186 -check that something was of a particular type before calling  
3187 -type-specific accessor methods. However, implementing this at this stage  
3188 -of the library's history would be quite difficult, and it would make  
3189 -the common pattern of drilling into an object no longer work. While it  
3190 -would be possible to have a parallel interface, it would create a lot of  
3191 -extra code. If qpdf were written in a language like Rust, an interface  
3192 -like this would make a lot of sense, but, for a variety of reasons, the  
3193 -qpdf API is consistent with other APIs of its time, relying on exception  
3194 -handling to catch errors. The underlying PDF objects are inherently not  
3195 -type-safe. Forcing stronger type safety in ``QPDFObjectHandle`` would  
3196 -ultimately cause a lot more code to have to be written and would likely  
3197 -make software that uses qpdf more brittle, and even so, checks would  
3198 -have to occur at runtime.  
3199 -  
3200 -*Why do type errors sometimes raise exceptions?* The way warnings work  
3201 -in qpdf requires a ``QPDF`` object to be associated with an object  
3202 -handle for a warning to be issued. It would be nice if this could be  
3203 -fixed, but it would require major changes to the API. Rather than  
3204 -throwing away these conditions, we convert them to exceptions. It's not  
3205 -that bad though. Since any object handle that was read from a file has  
3206 -an associated ``QPDF`` object, it would only be type errors on objects  
3207 -that were created explicitly that would cause exceptions, and in that  
3208 -case, type errors are much more likely to be the result of a coding  
3209 -error than invalid input.  
3210 -  
3211 -*Why does the behavior of a type exception differ between the C and C++  
3212 -API?* There is no way to throw and catch exceptions in C short of  
3213 -something like ``setjmp`` and ``longjmp``, and that approach is not  
3214 -portable across language barriers. Since the C API is often used from  
3215 -other languages, it's important to keep things as simple as possible.  
3216 -Starting in qpdf 10.5, exceptions that used to crash code using the C  
3217 -API will be written to stderr by default, and it is possible to register  
3218 -an error handler. There's no reason that the error handler can't  
3219 -simulate exception handling in some way, such as by using ``setjmp`` and  
3220 -``longjmp`` or by setting some variable that can be checked after  
3221 -library calls are made. In retrospect, it might have been better if the  
3222 -C API object handle methods returned error codes like the other methods  
3223 -and set return values in passed-in pointers, but this would complicate  
3224 -both the implementation and the use of the library for a case that is  
3225 -actually quite rare and largely avoidable.  
3226 -  
3227 -.. _ref.linearization:  
3228 -  
3229 -Linearization  
3230 -=============  
3231 -  
3232 -This chapter describes how ``QPDF`` and ``QPDFWriter`` implement  
3233 -creation and processing of linearized PDFs.  
3234 -  
3235 -.. _ref.linearization-strategy:  
3236 -  
3237 -Basic Strategy for Linearization  
3238 ---------------------------------  
3239 -  
3240 -To avoid the incestuous problem of having the qpdf library validate its  
3241 -own linearized files, we have a special linearized file checking mode  
3242 -which can be invoked via :command:`qpdf  
3243 ---check-linearization` (or :command:`qpdf  
3244 ---check`). This mode reads the linearization parameter  
3245 -dictionary and the hint streams and validates that object ordering,  
3246 -parameters, and hint stream contents are correct. The validation code  
3247 -was first tested against linearized files created by external tools  
3248 -(Acrobat and pdlin) and then used to validate files created by  
3249 -``QPDFWriter`` itself.  
3250 -  
3251 -.. _ref.linearized.preparation:  
3252 -  
3253 -Preparing For Linearization  
3254 ----------------------------  
3255 -  
3256 -Before creating a linearized PDF file from any other PDF file, the PDF  
3257 -file must be altered such that all page attributes are propagated down  
3258 -to the page level (and not inherited from parents in the ``/Pages``  
3259 -tree). We also have to know which objects refer to which other objects,  
3260 -being concerned with page boundaries and a few other cases. We refer to  
3261 -this part of preparing the PDF file as  
3262 -*optimization*, discussed in  
3263 -:ref:`ref.optimization`. Note that, in this context, the  
3264 -term *optimization* is a qpdf term, and the  
3265 -term *linearization* is a term from the PDF  
3266 -specification. Do not be confused by the fact that many applications  
3267 -refer to linearization as optimization or web optimization.  
3268 -  
3269 -When creating linearized PDF files from optimized PDF files, there are  
3270 -really only a few issues that need to be dealt with:  
3271 -  
3272 -- Creation of hints tables  
3273 -  
3274 -- Placing objects in the correct order  
3275 -  
3276 -- Filling in offsets and byte sizes  
3277 -  
3278 -.. _ref.optimization:  
3279 -  
3280 -Optimization  
3281 -------------  
3282 -  
3283 -In order to perform various operations such as linearization and  
3284 -splitting files into pages, it is necessary to know which objects are  
3285 -referenced by which pages, page thumbnails, and root and trailer  
3286 -dictionary keys. It is also necessary to ensure that all page-level  
3287 -attributes appear directly at the page level and are not inherited from  
3288 -parents in the pages tree.  
3289 -  
3290 -We refer to the process of enforcing these constraints as  
3291 -*optimization*. As mentioned above, note  
3292 -that some applications refer to linearization as optimization. Although  
3293 -this optimization was initially motivated by the need to create  
3294 -linearized files, we are using these terms separately.  
3295 -  
3296 -PDF file optimization is implemented in the  
3297 -:file:`QPDF_optimization.cc` source file. That file  
3298 -is richly commented and serves as the primary reference for the  
3299 -optimization process.  
3300 -  
3301 -After optimization has been completed, the private member variables  
3302 -``obj_user_to_objects`` and ``object_to_obj_users`` in ``QPDF`` have  
3303 -been populated. Any object that has more than one value in the  
3304 -``object_to_obj_users`` table is shared. Any object that has exactly one  
3305 -value in the ``object_to_obj_users`` table is private. To find all the  
3306 -private objects in a page or a trailer or root dictionary key, one  
3307 -merely has to make this determination for each element in the  
3308 -``obj_user_to_objects`` table for the given page or key.  
3309 -  
3310 -Note that pages and thumbnails have different object user types, so the  
3311 -above test on a page will not include objects that are referenced only by  
3312 -the page's thumbnail dictionary.  
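The shared/private test described above amounts to inverting one table and counting users. The following Python sketch illustrates it. The function names and the tuples used as object-user keys are hypothetical and only model the two tables; they are not qpdf's internal data structures.

```python
from collections import defaultdict


def invert(obj_user_to_objects):
    """Build object -> set of users from user -> set of objects."""
    object_to_obj_users = defaultdict(set)
    for user, objs in obj_user_to_objects.items():
        for obj in objs:
            object_to_obj_users[obj].add(user)
    return dict(object_to_obj_users)


def private_objects(user, obj_user_to_objects, object_to_obj_users):
    """Objects used by `user` that have exactly one user in the inverse table."""
    return {
        obj
        for obj in obj_user_to_objects[user]
        if len(object_to_obj_users[obj]) == 1
    }
```

Because a page and its thumbnail are distinct keys in the first table, calling ``private_objects`` for the page says nothing about objects reachable only through the thumbnail.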
3313 -  
3314 -.. _ref.linearization.writing:  
3315 -  
3316 -Writing Linearized Files  
3317 -------------------------  
3318 -  
3319 -We will create files with only primary hint streams. We will never write  
3320 -overflow hint streams. (As of PDF version 1.4, Acrobat doesn't either,  
3321 -and they are never necessary.) The hint streams contain offset  
3322 -information to objects that point to where they would be if the hint  
3323 -stream were not present. This means that we have to calculate all object  
3324 -positions before we can generate and write the hint table. As a result,  
3325 -we have to generate the file in two passes. To make this reliable,  
3326 -``QPDFWriter`` in linearization mode invokes exactly the same code twice  
3327 -to write the file to a pipeline.  
3328 -  
3329 -In the first pass, the target pipeline is a count pipeline chained to a  
3330 -discard pipeline. The count pipeline simply passes its data through to  
3331 -the next pipeline in the chain but can return the number of bytes passed  
3332 -through it at any intermediate point. The discard pipeline is an end of  
3333 -line pipeline that just throws its data away. The hint stream is not  
3334 -written and dummy values with adequate padding are stored in the first  
3335 -cross reference table, linearization parameter dictionary, and /Prev key  
3336 -of the first trailer dictionary. All the offset, length, object  
3337 -renumbering information, and anything else we need for the second pass  
3338 -is stored.  
3339 -  
3340 -At the end of the first pass, this information is passed to the ``QPDF``  
3341 -class which constructs a compressed hint stream in a memory buffer and  
3342 -returns it. ``QPDFWriter`` uses this information to write a complete  
3343 -hint stream object into a memory buffer. At this point, the length of  
3344 -the hint stream is known.  
3345 -  
3346 -In the second pass, the end of the pipeline chain is a regular file  
3347 -instead of a discard pipeline, and we have known values for all the  
3348 -offsets and lengths that we didn't have in the first pass. We have to  
3349 -adjust offsets that appear after the start of the hint stream by the  
3350 -length of the hint stream, which is known. Anything that is of variable  
3351 -length is padded, with the padding code surrounding any writing code  
3352 -that differs in the two passes. This ensures that changes to the way  
3353 -things are represented never result in offsets that were gathered  
3354 -during the first pass becoming incorrect for the second pass.  
3355 -  
3356 -Using this strategy, we can write linearized files to a non-seekable  
3357 -output stream with only a single pass to disk or wherever the output is  
3358 -going.  
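The two-pass idea can be illustrated with a toy file format in Python. This is a sketch under loudly stated assumptions: the ``%TOY-1.0`` header, the ``hintlen`` line, and all function and class names are invented for the example and have nothing to do with the real linearized layout. What it does show is the technique: the same writing code runs twice, variable-length values are padded to a fixed width, and offsets gathered in the first pass are shifted by the hint length in the second.

```python
import io


class CountPipeline:
    """Counts bytes passed through; forwards to target, or discards if None."""

    def __init__(self, target=None):
        self.count = 0
        self._target = target

    def write(self, data):
        self.count += len(data)
        if self._target is not None:
            self._target.write(data)


def write_pass(sink, objects, hint=b""):
    """Write the toy file; identical code is used for both passes."""
    sink.write(b"%TOY-1.0\n")
    # Fixed-width field: emits the same number of bytes in both passes.
    sink.write(b"hintlen %010d\n" % len(hint))
    sink.write(hint)
    offsets = []
    for body in objects:
        offsets.append(sink.count)
        sink.write(body + b"\n")
    return offsets


def write_two_pass(objects):
    first = CountPipeline()  # pass 1: count and discard
    offsets = write_pass(first, objects)
    # The hint records where objects would be if the hint were not present.
    hint = " ".join(str(o) for o in offsets).encode("ascii") + b"\n"
    out = io.BytesIO()
    second = CountPipeline(out)  # pass 2: real output, written sequentially
    final_offsets = write_pass(second, objects, hint)
    return out.getvalue(), offsets, final_offsets, hint
```

Note that the second pass only ever appends to its output, which is why a non-seekable destination is sufficient.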
3359 -  
3360 -.. _ref.linearization-data:  
3361 -  
3362 -Calculating Linearization Data  
3363 -------------------------------  
3364 -  
3365 -Once a file is optimized, we have information about which objects access  
3366 -which other objects. We can then process these tables to decide which  
3367 -part (as described in "Linearized PDF Document Structure" in the PDF  
3368 -specification) each object is contained within. This tells us the exact  
3369 -order in which objects are written. The ``QPDFWriter`` class asks for  
3370 -this information and enqueues objects for writing in the proper order.  
3371 -It also turns on a check that causes an exception to be thrown if an  
3372 -object is encountered that has not already been queued. (This could  
3373 -happen only if there were a bug in the traversal code used to calculate  
3374 -the linearization data.)  
3375 -  
3376 -.. _ref.linearization-issues:  
3377 -  
3378 -Known Issues with Linearization  
3379 --------------------------------  
3380 -  
3381 -There are a handful of known issues with this linearization code. These  
3382 -issues do not appear to impact the behavior of linearized files which  
3383 -still work as intended: it is possible for a web browser to begin to  
3384 -display them before they are fully downloaded. In fact, it seems that  
3385 -various other programs that create linearized files have many of these  
3386 -same issues. These items make reference to terminology used in the  
3387 -linearization appendix of the PDF specification.  
3388 -  
3389 -- Thread Dictionary information keys appear in part 4 with the rest of  
3390 - Threads instead of in part 9. Objects in part 9 are not grouped  
3391 - together functionally.  
3392 -  
3393 -- We are not calculating numerators for shared object positions within  
3394 - content streams or interleaving them within content streams.  
3395 -  
3396 -- We generate only page offset, shared object, and outline hint tables.  
3397 - It would be relatively easy to add some additional tables. We gather  
3398 - most of the information needed to create thumbnail hint tables. There  
3399 - are comments in the code about this.  
3400 -  
3401 -.. _ref.linearization-debugging:  
3402 -  
3403 -Debugging Note  
3404 ---------------  
3405 -  
3406 -The :command:`qpdf --show-linearization` command can show  
3407 -the complete contents of linearization hint streams. To look at the raw  
3408 -data, you can extract the filtered contents of the linearization hint  
3409 -tables using :command:`qpdf --show-object=n  
3410 ---filtered-stream-data`. Then, to convert this into a bit  
3411 -stream (since linearization tables are bit streams written without  
3412 -regard to byte boundaries), you can pipe the resulting data through the  
3413 -following perl code:  
3414 -  
3415 -.. code-block:: perl  
3416 -  
3417 - use bytes;  
3418 - binmode STDIN;  
3419 - undef $/;  
3420 - my $a = <STDIN>;  
3421 - my @ch = split(//, $a);  
3422 - map { printf("%08b", ord($_)) } @ch;  
3423 - print "\n";  
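Once the data is viewed as a bit stream, individual fields can be pulled out by width. The short Python sketch below shows one way to do that. The ``read_bit_fields`` helper is hypothetical, and the field widths must come from the hint table header as described in the PDF specification; none are hard-coded here.

```python
def read_bit_fields(data, widths):
    """Read consecutive big-endian bit fields of the given widths from data."""
    bits = "".join(format(byte, "08b") for byte in data)
    values = []
    pos = 0
    for width in widths:
        if pos + width > len(bits):
            raise ValueError("not enough bits in input")
        # A zero-width field carries no bits and is read as 0.
        values.append(int(bits[pos:pos + width], 2) if width else 0)
        pos += width
    return values
```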
3424 -  
3425 -.. _ref.object-and-xref-streams:  
3426 -  
3427 -Object and Cross-Reference Streams  
3428 -==================================  
3429 -  
3430 -This chapter provides information about the implementation of object  
3431 -stream and cross-reference stream support in qpdf.  
3432 -  
3433 -.. _ref.object-streams:  
3434 -  
3435 -Object Streams  
3436 ---------------  
3437 -  
3438 -Object streams can contain any regular object except the following:  
3439 -  
3440 -- stream objects  
3441 -  
3442 -- objects with generation > 0  
3443 -  
3444 -- the encryption dictionary  
3445 -  
3446 -- objects containing the /Length of another stream  
3447 -  
3448 -In addition, Adobe Reader (at least as of version 8.0.0) appears to not  
3449 -be able to handle having the document catalog appear in an object stream  
3450 -if the file is encrypted, though this is not specifically disallowed by  
3451 -the specification.  
3452 -  
3453 -There are additional restrictions for linearized files. See  
3454 -:ref:`ref.object-streams-linearization` for details.  
3455 -  
3456 -The PDF specification refers to objects in object streams as "compressed  
3457 -objects" regardless of whether the object stream is compressed.  
3458 -  
3459 -The generation number of every object in an object stream must be zero.  
3460 -It is possible to delete and replace an object in an object stream with  
3461 -a regular object.  
3462 -  
3463 -The object stream dictionary has the following keys:  
3464 -  
3465 -- ``/N``: number of objects  
3466 -  
3467 -- ``/First``: byte offset of first object  
3468 -  
3469 -- ``/Extends``: indirect reference to stream that this extends  
3470 -  
3471 -Stream collections are formed with ``/Extends``. They must form a  
3472 -directed acyclic graph. These can be used for semantic information and  
3473 -are not meaningful to the PDF document's syntactic structure. Although  
3474 -qpdf preserves stream collections, it never generates them and doesn't  
3475 -make use of this information in any way.  
3476 -  
3477 -The specification recommends limiting the number of objects in an object  
3478 -stream for efficiency in reading and decoding. Acrobat 6 uses no more  
3479 -than 100 objects per object stream for linearized files and no more than 200  
3480 -objects per stream for non-linearized files. ``QPDFWriter``, in object  
3481 -stream generation mode, never puts more than 100 objects in an object  
3482 -stream.  
3483 -  
3484 -Object stream contents consist of *N* pairs of integers, each of which  
3485 -is the object number and the byte offset of the object relative to the  
3486 -first object in the stream, followed by the objects themselves,  
3487 -concatenated.  
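As an illustration of this layout, the following Python sketch splits already-decoded object stream data into its objects using the values of ``/N`` and ``/First``. The function name is hypothetical, and the sketch assumes that the offsets in the header are in increasing order so that each object ends where the next begins.

```python
def split_object_stream(data, n, first):
    """Return {object number: raw object bytes} for decoded object stream data.

    n is the value of /N and first is the value of /First.
    """
    tokens = data[:first].split()
    if len(tokens) < 2 * n:
        raise ValueError("object stream header is too short")
    numbers = [int(tok) for tok in tokens[:2 * n]]
    pairs = [(numbers[i], numbers[i + 1]) for i in range(0, 2 * n, 2)]
    result = {}
    for index, (objnum, offset) in enumerate(pairs):
        start = first + offset  # offsets are relative to the first object
        if index + 1 < n:
            end = first + pairs[index + 1][1]
        else:
            end = len(data)
        result[objnum] = data[start:end].strip()
    return result
```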
3488 -  
3489 -.. _ref.xref-streams:  
3490 -  
3491 -Cross-Reference Streams  
3492 ------------------------  
3493 -  
3494 -For non-hybrid files, the value following ``startxref`` is the byte  
3495 -offset to the xref stream rather than the word ``xref``.  
3496 -  
3497 -For hybrid files (files containing both xref tables and cross-reference  
3498 -streams), the xref table's trailer dictionary contains the key  
3499 -``/XRefStm`` whose value is the byte offset to a cross-reference stream  
3500 -that supplements the xref table. A PDF 1.5-compliant application should  
3501 -read the xref table first. Then it should replace any object that it has  
3502 -already seen with any defined in the xref stream. Then it should follow  
3503 -any ``/Prev`` pointer in the original xref table's trailer dictionary.  
3504 -The specification is not clear about what should be done, if anything,  
3505 -with a ``/Prev`` pointer in the xref stream referenced by an xref table.  
3506 -The ``QPDF`` class ignores it, which is probably reasonable since, if  
3507 -this case were to appear for any sensible PDF file, the previous xref  
3508 -table would probably have a corresponding ``/XRefStm`` pointer of its  
3509 -own. For example, if a hybrid file were appended, the appended section  
3510 -would have its own xref table and ``/XRefStm``. The appended xref table  
3511 -would point to the previous xref table which would point to the  
3512 -``/XRefStm``, meaning that the new ``/XRefStm`` doesn't have to point to  
3513 -it.  
3514 -  
3515 -Since xref streams must be read very early, they may not be encrypted,  
3516 -and they may not contain indirect objects for keys required to read them,  
3517 -which are these:  
3518 -  
3519 -- ``/Type``: value ``/XRef``  
3520 -  
3521 -- ``/Size``: value *n+1*: where *n* is the highest object number (same as  
3522 - ``/Size`` in the trailer dictionary)  
3523 -  
3524 -- ``/Index`` (optional): value  
3525 - ``[:samp:`{n count}` ...]`` used to determine  
3526 - which objects' information is stored in this stream. The default is  
3527 - ``[0 /Size]``.  
3528 -  
3529 -- ``/Prev``: value :samp:`{offset}`: byte  
3530 - offset of previous xref stream (same as ``/Prev`` in the trailer  
3531 - dictionary)  
3532 -  
3533 -- ``/W [...]``: sizes of each field in the xref table  
3534 -  
3535 -The other fields in the xref stream, which may be indirect if desired,  
3536 -are the union of those from the xref table's trailer dictionary.  
3537 -  
3538 -.. _ref.xref-stream-data:  
3539 -  
3540 -Cross-Reference Stream Data  
3541 -~~~~~~~~~~~~~~~~~~~~~~~~~~~  
3542 -  
3543 -The stream data is binary and encoded in big-endian byte order. Entries  
3544 -are concatenated, and each entry has a length equal to the total of the  
3545 -entries in ``/W`` above. Each entry consists of one or more fields, the  
3546 -first of which is the type of the field. The number of bytes for each  
3547 -field is given by ``/W`` above. A 0 in ``/W`` indicates that the field  
3548 -is omitted and has the default value. The default value for the field  
3549 -type is "``1``". All other default values are "``0``".  
3550 -  
3551 -PDF 1.5 has three field types:  
3552 -  
3553 -- 0: for free objects. Format: ``0 obj next-generation``, same as the  
3554 - free table in a traditional cross-reference table  
3555 -  
3556 -- 1: regular non-compressed object. Format: ``1 offset generation``  
3557 -  
3558 -- 2: for objects in object streams. Format: ``2 object-stream-number  
3559 - index``, the number of the object stream containing the object and the  
3560 - index within the object stream of the object.  
3561 -  
3562 -It seems standard to have the first entry in the table be ``0 0 0``  
3563 -instead of ``0 0 ffff`` if there are no deleted objects.  
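The entry format described above can be decoded with a short loop. The Python sketch below is an illustration only: the function name and its parameters are hypothetical, and it expects the already-decoded stream data together with the values of ``/W``, ``/Size``, and (optionally) ``/Index``.

```python
def decode_xref_stream(data, w, size, index=None):
    """Return {object number: (type, field2, field3)} from xref stream data."""
    entry_len = sum(w)
    if entry_len == 0 or len(data) % entry_len != 0:
        raise ValueError("stream length is not a multiple of the entry length")
    if index is None:
        index = [0, size]  # default /Index is [0 /Size]
    objnums = []
    for i in range(0, len(index), 2):
        first, count = index[i], index[i + 1]
        objnums.extend(range(first, first + count))
    if len(objnums) != len(data) // entry_len:
        raise ValueError("/Index does not match the number of entries")
    entries = {}
    pos = 0
    for objnum in objnums:
        fields = []
        for field_number, width in enumerate(w):
            if width == 0:
                # Omitted field: the type defaults to 1, everything else to 0.
                fields.append(1 if field_number == 0 else 0)
            else:
                fields.append(int.from_bytes(data[pos:pos + width], "big"))
                pos += width
        entries[objnum] = tuple(fields)
    return entries
```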
3564 -  
3565 -.. _ref.object-streams-linearization:  
3566 -  
3567 -Implications for Linearized Files  
3568 ----------------------------------  
3569 -  
3570 -For linearized files, the linearization dictionary, document catalog,  
3571 -and page objects may not be contained in object streams.  
3572 -  
3573 -Objects stored within object streams are given the highest range of  
3574 -object numbers within the main and first-page cross-reference sections.  
3575 -  
3576 -It is okay to use cross-reference streams in place of regular xref  
3577 -tables. There are no special considerations.  
3578 -  
3579 -Hint data refers to object streams themselves, not the objects in the  
3580 -streams. Shared object references should also be made to the object  
3581 -streams. There are no references in any hint tables to the object numbers  
3582 -of compressed objects (objects within object streams).  
3583 -  
3584 -When numbering objects, all shared objects within both the first and  
3585 -second halves of the linearized files must be numbered consecutively  
3586 -after all normal uncompressed objects in that half.  
3587 -  
3588 -.. _ref.object-stream-implementation:  
3589 -  
3590 -Implementation Notes  
3591 ---------------------  
3592 -  
3593 -There are three modes for writing object streams:  
3594 -:samp:`disable`, :samp:`preserve`, and  
3595 -:samp:`generate`. In disable mode, we do not generate  
3596 -any object streams, and we also generate an xref table rather than xref  
3597 -streams. This can be used to generate PDF files that are viewable with  
3598 -older readers. In preserve mode, we write object streams such that  
3599 -written object streams contain the same objects and ``/Extends``  
3600 -relationships as in the original file. This is equal to disable if the  
3601 -file has no object streams. In generate, we create object streams  
3602 -ourselves by grouping objects that are allowed in object streams  
3603 -together in sets of no more than 100 objects. We also ensure that the  
3604 -PDF version is at least 1.5 in generate mode, but we preserve the  
3605 -version header in the other modes. The default is  
3606 -:samp:`preserve`.  
3607 -  
3608 -We do not support creation of hybrid files. When we write files, even in  
3609 -preserve mode, we will lose any xref tables and merge any appended  
3610 -sections.  
3611 -  
3612 -.. _ref.release-notes:  
3613 -  
3614 -Release Notes  
3615 -=============  
3616 -  
3617 -For a detailed list of changes, please see the file  
3618 -:file:`ChangeLog` in the source distribution.  
3619 -  
3620 -10.5.0: XXX Month dd, YYYY  
3621 - - Library Enhancements  
3622 -  
3623 - - Since qpdf version 8, using object accessor methods on an  
3624 - instance of ``QPDFObjectHandle`` may create warnings if the  
3625 - object is not of the expected type. These warnings now have an  
3626 - error code of ``qpdf_e_object`` instead of  
3627 - ``qpdf_e_damaged_pdf``. Also, comments have been added to  
3628 - :file:`QPDFObjectHandle.hh` to explain in more detail what the  
3629 - behavior is. See :ref:`ref.object-accessors` for a more in-depth  
3630 - discussion.  
3631 -  
3632 - - Add ``Pl_Buffer::getMallocBuffer()`` to initialize a buffer  
3633 - allocated with ``malloc()`` for better cross-language  
3634 - interoperability.  
3635 -  
3636 - - C API Enhancements  
3637 -  
3638 - - Overhaul error handling for the object handle functions in the C API.
3639 - Some rare error conditions that would previously have caused a  
3640 - crash are now trapped and reported, and the functions that  
3641 - generate them return fallback values. See comments in the  
3642 - ``ERROR HANDLING`` section of :file:`include/qpdf/qpdf-c.h` for  
3643 - details. In particular, exceptions thrown by the underlying C++  
3644 - code when calling object accessors are caught and converted into  
3645 - errors. The errors can be checked by calling ``qpdf_has_error``.
3646 - Use ``qpdf_silence_errors`` to prevent the error from being  
3647 - written to stderr.  
3648 -  
3649 - - Add ``qpdf_get_last_string_length`` to the C API to get the  
3650 - length of the last string that was returned. This is needed to  
3651 - handle strings that contain embedded null characters.  
3652 -  
3653 - - Add ``qpdf_oh_is_initialized`` and  
3654 - ``qpdf_oh_new_uninitialized`` to the C API to make it possible  
3655 - to work with uninitialized objects.  
3656 -  
3657 - - Add ``qpdf_oh_new_object`` to the C API. This allows you to  
3658 - clone an object handle.  
3659 -  
3660 - - Add ``qpdf_get_object_by_id``, ``qpdf_make_indirect_object``,  
3661 - and ``qpdf_replace_object``, exposing the corresponding methods  
3662 - in ``QPDF`` and ``QPDFObjectHandle``.  
3663 -  
3664 - - Add several functions for working with pages. See ``PAGE  
3665 - FUNCTIONS`` in ``include/qpdf/qpdf-c.h`` for details.  
3666 -  
3667 - - Add several functions for working with streams. See ``STREAM  
3668 - FUNCTIONS`` in ``include/qpdf/qpdf-c.h`` for details.  
3669 -  
3670 - - Add ``qpdf_oh_get_type_code`` and ``qpdf_oh_get_type_name``.  
3671 -  
3672 - - Documentation change  
3673 -  
3674 - - The documentation sources have been switched from docbook to  
3675 - reStructuredText processed with `Sphinx  
3676 - <https://sphinx-doc.org>`__. This is mostly transparent (other  
3677 - than format change) with the exception that all section links  
3678 - have changed. What used to be ``#ref.something`` is now
3679 - ``#something``. A top-to-bottom review of the documentation is
3680 - planned for an upcoming release.  
3681 -  
3682 -10.4.0: November 16, 2021  
3683 - - Handling of Weak Cryptography Algorithms  
3684 -  
3685 - - From the qpdf CLI, the  
3686 - :samp:`--allow-weak-crypto` option is now required to
3687 - suppress a warning when explicitly creating PDF files using RC4  
3688 - encryption. While qpdf will always retain the ability to read  
3689 - and write such files, doing so will require explicit  
3690 - acknowledgment moving forward. For qpdf 10.4, this change only  
3691 - affects the command-line tool. Starting in qpdf 11, there will  
3692 - be small API changes to require explicit acknowledgment in  
3693 - those cases as well. For additional information, see :ref:`ref.weak-crypto`.  
3694 -  
3695 - - Bug Fixes  
3696 -  
3697 - - Fix potential bounds error when handling shell completion that  
3698 - could occur when given bogus input.  
3699 -  
3700 - - Properly handle overlay/underlay on completely empty pages  
3701 - (with no resource dictionary).  
3702 -  
3703 - - Fix crash that could occur under certain conditions when using  
3704 - :samp:`--pages` with files that had form  
3705 - fields.  
3706 -  
3707 - - Library Enhancements  
3708 -  
3709 - - Make ``QPDF::findPage`` functions public.  
3710 -  
3711 - - Add methods to ``Pl_Flate`` to be able to receive warnings on  
3712 - certain recoverable conditions.  
3713 -  
3714 - - Add an extra check to the library to detect when foreign  
3715 - objects are inserted directly (instead of using  
3716 - ``QPDF::copyForeignObject``) at the time of insertion rather  
3717 - than when the file is written. Catching the error sooner makes  
3718 - it much easier to locate the incorrect code.  
3719 -  
3720 - - CLI Enhancements  
3721 -  
3722 - - Improve diagnostics around parsing  
3723 - :samp:`--pages` command-line options.
3724 -  
3725 - - Packaging Changes  
3726 -  
3727 - - The Windows binary distribution is now built with crypto  
3728 - provided by OpenSSL 3.0.  
3729 -  
3730 -10.3.2: May 8, 2021  
3731 - - Bug Fixes  
3732 -  
3733 - - When generating a file while preserving object streams,  
3734 - unreferenced objects are correctly removed unless  
3735 - :samp:`--preserve-unreferenced` is specified.  
3736 -  
3737 - - Library Enhancements  
3738 -  
3739 - - When adding a page that already exists, make a shallow copy  
3740 - instead of throwing an exception. This makes the library  
3741 - behavior consistent with the CLI behavior. See  
3742 - :file:`ChangeLog` for additional notes.  
3743 -  
3744 -10.3.1: March 11, 2021  
3745 - - Bug Fixes  
3746 -  
3747 - - Form field copying failed on files where /DR was a direct  
3748 - object in the document-level form dictionary.  
3749 -  
3750 -10.3.0: March 4, 2021  
3751 - - Bug Fixes  
3752 -  
3753 - - The code for handling form fields when copying pages from  
3754 - 10.2.0 was not quite right and didn't work in a number of  
3755 - situations, such as when the same page was copied multiple  
3756 - times or when there were conflicting resource or field names  
3757 - across multiple copies. The 10.3.0 code has been much more  
3758 - thoroughly tested with more complex cases and with a multitude  
3759 - of readers and should be much closer to correct. The 10.2.0  
3760 - code worked well enough for page splitting or for copying pages  
3761 - with form fields into documents that didn't already have them  
3762 - but was still not quite correct in handling of field-level  
3763 - resources.  
3764 -  
3765 - - When ``QPDF::replaceObject`` or ``QPDF::swapObjects`` is  
3766 - called, existing ``QPDFObjectHandle`` instances no longer point  
3767 - to the old objects. The next time they are accessed, they  
3768 - automatically notice the change to the underlying object and  
3769 - update themselves. This resolves a very longstanding source of  
3770 - confusion, albeit in a very rarely used method call.  
3771 -  
3772 - - Fix form field handling code to look for default appearances,  
3773 - quadding, and default resources in the right places. The code  
3774 - was not looking for things in the document-level interactive  
3775 - form dictionary that it was supposed to be finding there. This  
3776 - required adding a few new methods to  
3777 - ``QPDFFormFieldObjectHelper``.  
3778 -  
3779 - - Library Enhancements  
3780 -  
3781 - - Reworked the code that handles copying annotations and form  
3782 - fields during page operations. There were additional methods  
3783 - added to the public API from 10.2.0 and one deprecation of a
3784 - method added in 10.2.0. The majority of the API changes are in  
3785 - methods most people would never call and that will hopefully be  
3786 - superseded by higher-level interfaces for handling page copies.  
3787 - Please see the :file:`ChangeLog` file for  
3788 - details.  
3789 -  
3790 - - The method ``QPDF::numWarnings`` was added so that you can tell  
3791 - whether any warnings happened during a specific block of code.  
3792 -  
3793 -10.2.0: February 23, 2021  
3794 - - CLI Behavior Changes  
3795 -  
3796 - - Operations that work on combining pages are much better about  
3797 - protecting form fields. In particular,  
3798 - :samp:`--split-pages` and  
3799 - :samp:`--pages` now preserve interactive form
3800 - functionality by copying the relevant form field information  
3801 - from the original files. Additionally, if you use  
3802 - :samp:`--pages` to select only some pages from  
3803 - the original input file, unused form fields are removed, which  
3804 - prevents lots of unused annotations from being retained.  
3805 -  
3806 - - By default, :command:`qpdf` no longer allows  
3807 - creation of encrypted PDF files whose user password is  
3808 - non-empty and owner password is empty when a 256-bit key is in  
3809 - use. The :samp:`--allow-insecure` option,  
3810 - specified inside the :samp:`--encrypt` options,  
3811 - allows creation of such files. Behavior changes in the CLI are  
3812 - avoided when possible, but an exception was made here because  
3813 - this is security-related. qpdf must always allow creation of  
3814 - weird files for testing purposes, but it should not default to  
3815 - letting users unknowingly create insecure files.  
3816 -  
3817 - - Library Behavior Changes  
3818 -  
3819 - - Note: the changes in this section cause differences in output  
3820 - in some cases. These differences change the syntax of the PDF  
3821 - but do not change the semantics (meaning). I make a strong  
3822 - effort to avoid gratuitous changes in qpdf's output so that  
3823 - qpdf changes don't break people's tests. In this case, the  
3824 - changes significantly improve the readability of the generated  
3825 - PDF and don't affect any output that's generated by simple  
3826 - transformation. If you are annoyed by having to update test  
3827 - files, please rest assured that changes like this have been and  
3828 - will continue to be rare events.  
3829 -  
3830 - - ``QPDFObjectHandle::newUnicodeString`` now uses whichever of  
3831 - ASCII, PDFDocEncoding, or UTF-16 is sufficient to encode all
3832 - the characters in the string. This reduces needless encoding in  
3833 - UTF-16 of strings that can be encoded in ASCII. This change may  
3834 - cause qpdf to generate different output than before when form  
3835 - field values are set using ``QPDFFormFieldObjectHelper`` but  
3836 - does not change the meaning of the output.  
3837 -  
3838 - - The code that places form XObjects and also the code that  
3839 - flattens rotations trim trailing zeroes from real numbers that  
3840 - they calculate. This causes slight (but semantically  
3841 - equivalent) differences in generated appearance streams and  
3842 - form XObject invocations in overlay/underlay code or in user  
3843 - code that calls the methods that place form XObjects on a page.  
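
The encoding choice described for ``QPDFObjectHandle::newUnicodeString`` above can be sketched as a cascade that tries the narrowest encoding first. This is an illustration only, not the library's code. The function name ``choose_encoding`` is hypothetical, and the PDFDocEncoding test is a deliberate simplification: it accepts only ASCII plus the code points U+00A1 through U+00FF (excluding U+00AD) and ignores the additional characters that PDFDocEncoding assigns elsewhere.

```python
def choose_encoding(text):
    """Return the narrowest encoding label able to represent text.

    Hypothetical helper. The "pdfdoc" branch uses a simplified subset of
    PDFDocEncoding (an assumption for this sketch), so some strings that
    real PDFDocEncoding can represent are reported as needing UTF-16.
    """
    if text.isascii():
        return "ascii"

    def in_simplified_pdfdoc(ch):
        code = ord(ch)
        # ASCII, or U+00A1..U+00FF except U+00AD (simplified subset).
        return code < 0x80 or (0xA1 <= code <= 0xFF and code != 0xAD)

    if all(in_simplified_pdfdoc(ch) for ch in text):
        return "pdfdoc"
    # Anything else falls back to UTF-16.
    return "utf-16"


if __name__ == "__main__":
    for sample in ("hello", "caf\u00e9", "\u65e5\u672c"):
        print(repr(sample), choose_encoding(sample))
```

Checking from narrowest to widest means a plain ASCII string never pays for a UTF-16 encoding, which is the point of the behavior change described above.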
3844 -  
3845 - - CLI Enhancements  
3846 -  
3847 - - Add new command line options for listing, saving, adding,  
3848 - removing, and copying file attachments. See :ref:`ref.attachments` for details.
3849 -  
3850 - - Page splitting and merging operations, as well as  
3851 - :samp:`--flatten-rotation`, are better behaved  
3852 - with respect to annotations and interactive form fields. In  
3853 - most cases, interactive form field functionality and proper  
3854 - formatting and functionality of annotations are preserved by
3855 - these operations. There are still some cases that aren't  
3856 - perfect, such as when functionality of annotations depends on  
3857 - document-level data that qpdf doesn't yet understand or when  
3858 - there are problems with referential integrity among form fields  
3859 - and annotations (e.g., when a single form field object or its  
3860 - associated annotations are shared across multiple pages, a case  
3861 - that is out of spec but that works in most viewers anyway).  
3862 -  
3863 - - The option  
3864 - :samp:`--password-file={filename}`  
3865 - can now be used to read the decryption password from a file.  
3866 - You can use ``-`` as the file name to read the password from  
3867 - standard input. This is an easier/more obvious way to read  
3868 - passwords from files or standard input than using  
3869 - :samp:`@file` for this purpose.  
3870 -  
3871 - - Add some information about attachments to the JSON output, and
3872 - add ``attachments`` as an additional JSON key. The
3873 - information included here is limited to the preferred name and  
3874 - content stream and a reference to the file spec object. This is  
3875 - enough detail for clients to avoid the hassle of navigating a  
3876 - name tree and provides what is needed for basic enumeration and  
3877 - extraction of attachments. More detailed information can be  
3878 - obtained by following the reference to the file spec object.  
3879 -  
3880 - - Add numeric option to :samp:`--collate`. If  
3881 - :samp:`--collate={n}`  
3882 - is given, take pages in groups of  
3883 - :samp:`{n}` from the given files.  
3884 -  
3885 - - It is now valid to provide :samp:`--rotate=0`  
3886 - to clear rotation from a page.  
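
The effect of the numeric :samp:`--collate={n}` option listed above can be pictured with a small sketch. This models only the page ordering and does not call qpdf. The function name ``collate`` and the lists of page labels are hypothetical, and the behavior once one input runs out of pages (the remaining inputs keep contributing) is an assumption of this sketch.

```python
def collate(files, n=1):
    """Interleave pages from several inputs, n pages at a time.

    Hypothetical model: files is a list of page lists, one per input.
    Inputs that run out of pages are skipped (assumed behavior).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    result = []
    positions = [0] * len(files)
    # Keep cycling through the inputs until every one is exhausted.
    while any(pos < len(pages) for pos, pages in zip(positions, files)):
        for i, pages in enumerate(files):
            # Take the next group of up to n pages from this input.
            result.extend(pages[positions[i]:positions[i] + n])
            positions[i] += n
    return result


if __name__ == "__main__":
    a = ["a1", "a2", "a3", "a4", "a5"]
    b = ["b1", "b2"]
    print(collate([a, b], n=2))
```

With ``n=1`` the sketch reduces to plain page-by-page interleaving.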
3887 -  
3888 - - Library Enhancements  
3889 -  
3890 - - This release includes numerous additions to the API. Not all  
3891 - changes are listed here. Please see the  
3892 - :file:`ChangeLog` file in the source  
3893 - distribution for a comprehensive list. Highlights appear below.  
3894 -  
3895 - - Add ``QPDFObjectHandle::ditems()`` and  
3896 - ``QPDFObjectHandle::aitems()`` that enable C++-style iteration,  
3897 - including range-for iteration, over dictionary and array  
3898 - QPDFObjectHandles. See comments in  
3899 - :file:`include/qpdf/QPDFObjectHandle.hh`  
3900 - and  
3901 - :file:`examples/pdf-name-number-tree.cc`  
3902 - for details.  
3903 -  
3904 - - Add ``QPDFObjectHandle::copyStream`` for making a copy of a  
3905 - stream within the same ``QPDF`` instance.  
3906 -  
3907 - - Add new helper classes for supporting file attachments, also  
3908 - known as embedded files. New classes are  
3909 - ``QPDFEmbeddedFileDocumentHelper``,  
3910 - ``QPDFFileSpecObjectHelper``, and ``QPDFEFStreamObjectHelper``.  
3911 - See their respective headers for details and  
3912 - :file:`examples/pdf-attach-file.cc` for an  
3913 - example.  
3914 -  
3915 - - Add a version of ``QPDFObjectHandle::parse`` that takes a  
3916 - ``QPDF`` pointer as context so that it can parse strings  
3917 - containing indirect object references. This is illustrated in  
3918 - :file:`examples/pdf-attach-file.cc`.  
3919 -  
3920 - - Re-implement ``QPDFNameTreeObjectHelper`` and  
3921 - ``QPDFNumberTreeObjectHelper`` to be more efficient, add an  
3922 - iterator-based API, give them the capability to repair broken  
3923 - trees, and create methods for modifying the trees. With this  
3924 - change, qpdf has a robust read/write implementation of name and  
3925 - number trees.  
3926 -  
3927 - - Add new versions of ``QPDFObjectHandle::replaceStreamData``  
3928 - that take ``std::function`` objects for cases when you need  
3929 - something between a static string and a full-fledged  
3930 - StreamDataProvider. Using this with ``QUtil::file_provider`` is  
3931 - a very easy way to create a stream from the contents of a file.  
3932 -  
3933 - - The ``QPDFMatrix`` class, formerly a private, internal class,  
3934 - has been added to the public API. See  
3935 - :file:`include/qpdf/QPDFMatrix.hh` for  
3936 - details. This class is for working with transformation  
3937 - matrices. Some methods in ``QPDFPageObjectHelper`` make use of  
3938 - this to make information about transformation matrices  
3939 - available. For an example, see  
3940 - :file:`examples/pdf-overlay-page.cc`.  
3941 -  
3942 - - Several new methods were added to  
3943 - ``QPDFAcroFormDocumentHelper`` for adding, removing, getting  
3944 - information about, and enumerating form fields.  
3945 -  
3946 - - Add method  
3947 - ``QPDFAcroFormDocumentHelper::transformAnnotations``, which  
3948 - applies a transformation to each annotation on a page.  
3949 -  
3950 - - Add ``QPDFPageObjectHelper::copyAnnotations``, which copies  
3951 - annotations and, if applicable, associated form fields, from  
3952 - one page to another, possibly transforming the rectangles.  
3953 -  
3954 - - Build Changes  
3955 -  
3956 - - A C++-14 compiler is now required to build qpdf. There is no  
3957 - intention to require anything newer than that for a while.  
3958 - C++-14 includes modest enhancements to C++-11 and appears to be  
3959 - supported about as widely as C++-11.  
3960 -  
3961 - - Bug Fixes  
3962 -  
3963 - - The :samp:`--flatten-rotation` option applies  
3964 - transformations to any annotations that may be on the page.  
3965 -  
3966 - - If a form XObject lacks a resources dictionary, consider any  
3967 - names in that form XObject to be referenced from the containing  
3968 - page. This is compliant with older PDF versions. Also detect if  
3969 - any form XObjects have any unresolved names and, if so, don't  
3970 - remove unreferenced resources from them or from the page that  
3971 - contains them. Unfortunately this has the side effect of  
3972 - preventing removal of unreferenced resources in some cases  
3973 - where names appear that don't refer to resources, such as with  
3974 - tagged PDF. This is a bit of a corner case that is not likely  
3975 - to cause a significant problem in practice, and the only side
3976 - effect would be lack of removal of shared resources. A future  
3977 - version of qpdf may be more sophisticated in its detection of  
3978 - names that refer to resources.  
3979 -  
3980 - - Properly handle strings if they appear in inline image  
3981 - dictionaries while externalizing inline images.  
3982 -  
3983 -10.1.0: January 5, 2021  
3984 - - CLI Enhancements  
3985 -  
3986 - - Add :samp:`--flatten-rotation` command-line  
3987 - option, which causes all pages that are rotated using  
3988 - parameters in the page's dictionary to instead be identically  
3989 - rotated in the page's contents. The change is not user-visible  
3990 - for compliant PDF readers but can be used to work around broken  
3991 - PDF applications that don't properly handle page rotation.  
3992 -  
3993 - - Library Enhancements  
3994 -  
3995 - - Support for user-provided (pluggable, modular) stream filters.  
3996 - It is now possible to derive a class from ``QPDFStreamFilter``  
3997 - and register it with ``QPDF`` so that regular library methods,  
3998 - including those used by ``QPDFWriter``, can decode streams with  
3999 - filters not directly supported by the library. The example  
4000 - :file:`examples/pdf-custom-filter.cc`  
4001 - illustrates how to use this capability.  
4002 -  
4003 - - Add methods to ``QPDFPageObjectHelper`` to iterate through  
4004 - XObjects on a page or form XObjects, possibly recursing into  
4005 - nested form XObjects: ``forEachXObject``, ``forEachImage``,
4006 - ``forEachFormXObject``.  
4007 -  
4008 - - Enhance several methods in ``QPDFPageObjectHelper`` to work  
4009 - with form XObjects as well as pages, as noted in comments. See  
4010 - :file:`ChangeLog` for a full list.  
4011 -  
4012 - - Rename some functions in ``QPDFPageObjectHelper``, while  
4013 - keeping old names for compatibility:  
4014 -  
4015 - - ``getPageImages`` to ``getImages``  
4016 -  
4017 - - ``filterPageContents`` to ``filterContents``  
4018 -  
4019 - - ``pipePageContents`` to ``pipeContents``  
4020 -  
4021 - - ``parsePageContents`` to ``parseContents``  
4022 -  
4023 - - Add method ``QPDFPageObjectHelper::getFormXObjects`` to return  
4024 - a map of form XObjects directly on a page or form XObject.
4025 -  
4026 - - Add new helper methods to ``QPDFObjectHandle``:  
4027 - ``isFormXObject``, ``isImage``.
4028 -  
4029 - - Add the optional ``allow_streams`` parameter to
4030 - ``QPDFObjectHandle::makeDirect``. When  
4031 - ``QPDFObjectHandle::makeDirect`` is called in this way, it  
4032 - preserves references to streams rather than throwing an  
4033 - exception.  
4034 -  
4035 - - Add ``QPDFObjectHandle::setFilterOnWrite`` method. Calling this  
4036 - on a stream prevents ``QPDFWriter`` from attempting to  
4037 - uncompress, recompress, or otherwise filter a stream even if it  
4038 - could. Developers can use this to protect streams that are  
4039 - already optimized or that should be protected from ``QPDFWriter``'s
4040 - default behavior for any other reason.
4041 -  
4042 - - Add ``ostream`` ``<<`` operator for ``QPDFObjGen``. This is  
4043 - useful to have for debugging.  
4044 -  
4045 - - Add method ``QPDFPageObjectHelper::flattenRotation``, which  
4046 - replaces a page's ``/Rotate`` keyword by rotating the page  
4047 - within the content stream and altering the page's bounding  
4048 - boxes so the rendering is the same. This can be used to work  
4049 - around buggy PDF readers that can't properly handle page  
4050 - rotation.  
4051 -  
4052 - - C API Enhancements  
4053 -  
4054 - - Add several new functions to the C API for working with  
4055 - objects. These are wrappers around many of the methods in  
4056 - ``QPDFObjectHandle``. Their inclusion adds considerable new  
4057 - capability to the C API.  
4058 -  
4059 - - Add ``qpdf_register_progress_reporter`` to the C API,  
4060 - corresponding to ``QPDFWriter::registerProgressReporter``.  
4061 -  
4062 - - Performance Enhancements  
4063 -  
4064 - - Improve steps ``QPDFWriter`` takes to prepare a ``QPDF`` object  
4065 - for writing, resulting in about an 8% improvement in write  
4066 - performance while allowing indirect objects to appear in  
4067 - ``/DecodeParms``.  
4068 -  
4069 - - When extracting pages, the :command:`qpdf` CLI  
4070 - only removes unreferenced resources from the pages that are  
4071 - being kept, resulting in a significant performance improvement  
4072 - when extracting small numbers of pages from large, complex  
4073 - documents.  
4074 -  
4075 - - Bug Fixes  
4076 -  
4077 - - ``QPDFPageObjectHelper::externalizeInlineImages`` was not  
4078 - externalizing images referenced from form XObjects that  
4079 - appeared on the page.  
4080 -  
4081 - - ``QPDFObjectHandle::filterPageContents`` was broken for pages  
4082 - with multiple content streams.  
4083 -  
4084 - - Tweak zsh completion code to behave a little better with  
4085 - respect to path completion.  
4086 -  
4087 -10.0.4: November 21, 2020  
4088 - - Bug Fixes  
4089 -  
4090 - - Fix a handful of integer overflows. This includes cases found  
4091 - by fuzzing as well as having qpdf not do range checking on  
4092 - unused values in the xref stream.  
4093 -  
4094 -10.0.3: October 31, 2020  
4095 - - Bug Fixes  
4096 -  
4097 - - The fix to the bug involving copying streams with indirect  
4098 - filters was incorrect and introduced a new, more serious bug.  
4099 - The original bug has been fixed correctly, as has the bug  
4100 - introduced in 10.0.2.  
4101 -  
4102 -10.0.2: October 27, 2020  
4103 - - Bug Fixes  
4104 -  
4105 - - When concatenating content streams, as with  
4106 - :samp:`--coalesce-contents`, there were cases  
4107 - in which qpdf would merge two lexical tokens together, creating  
4108 - invalid results. A newline is now inserted between merged  
4109 - content streams if one is not already present.  
4110 -  
4111 - - Fix an internal error that could occur when copying foreign  
4112 - streams whose stream data had been replaced using a stream data  
4113 - provider if those streams had indirect filters or decode  
4114 - parameters. This is a rare corner case.  
4115 -  
4116 - - Ensure that the caller's locale settings do not change the  
4117 - results of numeric conversions performed internally by the qpdf  
4118 - library. Note that the problem here could only be caused when  
4119 - the qpdf library was used programmatically. Using the qpdf CLI  
4120 - already ignored the user's locale for numeric conversion.  
4121 -  
4122 - - Fix several instances in which warnings were not suppressed in  
4123 - spite of :samp:`--no-warn` and/or errors or  
4124 - warnings were written to standard output rather than standard  
4125 - error.  
4126 -  
4127 - - Fixed a memory leak that could occur under specific  
4128 - circumstances when  
4129 - :samp:`--object-streams=generate` was used.  
4130 -  
4131 - - Fix various integer overflows and similar conditions found by  
4132 - the OSS-Fuzz project.  
4133 -  
4134 - - Enhancements  
4135 -  
4136 - - New option :samp:`--warning-exit-0` causes qpdf  
4137 - to exit with a status of ``0`` rather than ``3`` if there are  
4138 - warnings but no errors. Combine with  
4139 - :samp:`--no-warn` to completely ignore  
4140 - warnings.  
4141 -  
4142 - - Performance improvements have been made to  
4143 - ``QPDF::processMemoryFile``.  
4144 -  
4145 - - The OpenSSL crypto provider produces more detailed error  
4146 - messages.  
4147 -  
4148 - - Build Changes  
4149 -  
4150 - - The option :samp:`--disable-rpath` is now  
4151 - supported by qpdf's :command:`./configure`  
4152 - script. Some distributions' packaging standards recommended the  
4153 - use of this option.  
4154 -  
4155 - - Selection of a printf format string for ``long long`` has  
4156 - been moved from ``ifdefs`` to an autoconf  
4157 - test. If you are using your own build system, you will need to  
4158 - provide a value for ``LL_FMT`` in  
4159 - :file:`libqpdf/qpdf/qpdf-config.h`, which  
4160 - would typically be ``"%lld"`` or, for some Windows compilers,  
4161 - ``"%I64d"``.  
4162 -  
4163 - - Several improvements were made to build-time configuration of  
4164 - the OpenSSL crypto provider.  
4165 -  
4166 - - A nearly stand-alone Linux binary zip file is now included with  
4167 - the qpdf release. This is built on an older (but supported)  
4168 - Ubuntu LTS release, but would work on most reasonably recent  
4169 - Linux distributions. It contains only the executables and  
4170 - required shared libraries that would not be present on a  
4171 - minimal system. It can be used for including qpdf in a minimal  
4172 - environment, such as a docker container. The zip file is also  
4173 - known to work as a layer in AWS Lambda.  
4174 -  
4175 - - QPDF's automated build has been migrated from Azure Pipelines  
4176 - to GitHub Actions.  
4177 -  
4178 - - Windows-specific Changes  
4179 -  
4180 - - The Windows executables distributed with qpdf releases now use  
4181 - the OpenSSL crypto provider by default. The native crypto  
4182 - provider is also compiled in and can be selected at runtime  
4183 - with the ``QPDF_CRYPTO_PROVIDER`` environment variable.  
4184 -  
4185 - - Improvements have been made to how a cryptographic provider is  
4186 - obtained in the native Windows crypto implementation. However,
4187 - this is mostly overshadowed by OpenSSL being used by default.
4188 -  
4189 -10.0.1: April 9, 2020  
4190 - - Bug Fixes  
4191 -  
4192 - - 10.0.0 introduced a bug in which calling  
4193 - ``QPDFObjectHandle::getStreamData`` on a stream that can't be  
4194 - filtered was returning the raw data instead of throwing an  
4195 - exception. This is now fixed.  
4196 -  
4197 - - Fix a bug that was preventing qpdf from linking with some  
4198 - versions of clang on some platforms.  
4199 -  
4200 - - Enhancements  
4201 -  
4202 - - Improve the :file:`pdf-invert-images`  
4203 - example to avoid having to load all the images into RAM at the  
4204 - same time.  
4205 -  
4206 -10.0.0: April 6, 2020  
4207 - - Performance Enhancements  
4208 -  
4209 - - The qpdf library and executable should run much faster in this  
4210 - version than in the last several releases. Several internal  
4211 - library optimizations have been made, and there has been  
4212 - improved behavior on page splitting as well. This version of  
4213 - qpdf should outperform any of the 8.x or 9.x versions.  
4214 -  
4215 - - Incompatible API (source-level) Changes (minor)  
4216 -  
4217 - - The ``QUtil::srandom`` method was removed. It didn't do  
4218 - anything unless insecure random numbers were compiled in, and  
4219 - they have been off by default for a long time. If you were  
4220 - calling it, just remove the call since it wasn't doing anything  
4221 - anyway.  
4222 -  
4223 - - Build/Packaging Changes  
4224 -  
4225 - - Add an ``openssl`` crypto provider, which is implemented with
4226 - OpenSSL and also works with BoringSSL. Thanks to Dean Scarff  
4227 - for this contribution. If you maintain qpdf for a distribution,  
4228 - pay special attention to make sure that you are including  
4229 - support for the crypto providers you want. Package maintainers  
4230 - will have to weigh the advantages of allowing users to pick a  
4231 - crypto provider at runtime against the disadvantages of adding  
4232 - more dependencies to qpdf.  
4233 -  
4234 - - Allow qpdf to be built on stripped-down systems whose C/C++
4235 - libraries lack the ``wchar_t`` type. Search for ``wchar_t`` in  
4236 - qpdf's README.md for details. This should be very rare, but it  
4237 - is known to be helpful in some embedded environments.  
4238 -  
4239 - - CLI Enhancements  
4240 -  
4241 - - Add ``objectinfo`` key to the JSON output. This will be a place  
4242 - to put computed metadata or other information about PDF objects  
4243 - that are not immediately evident in other ways or that seem  
4244 - useful for some other reason. In this version, information is  
4245 - provided about each object indicating whether it is a stream  
4246 - and, if so, what its length and filters are. Without this, it  
4247 - was not possible to tell conclusively from the JSON output  
4248 - alone whether or not an object was a stream. Run  
4249 - :command:`qpdf --json-help` for details.  
4250 -  
4251 - - Add new option  
4252 - :samp:`--remove-unreferenced-resources` which  
4253 - takes ``auto``, ``yes``, or ``no`` as arguments. The new  
4254 - ``auto`` mode, which is the default, performs a fast heuristic  
4255 - over a PDF file when splitting pages to determine whether the  
4256 - expensive process of finding and removing unreferenced  
4257 - resources is likely to be of benefit. For most files, this new  
4258 - default will result in a significant performance improvement  
4259 - for splitting pages. See :ref:`ref.advanced-transformation` for a more detailed  
4260 - discussion.  
4261 -  
4262 - - The :samp:`--preserve-unreferenced-resources` option
4263 - is now just a synonym for  
4264 - :samp:`--remove-unreferenced-resources=no`.  
4265 -  
4266 - - If the ``QPDF_EXECUTABLE`` environment variable is set when  
4267 - invoking :command:`qpdf --bash-completion` or  
4268 - :command:`qpdf --zsh-completion`, the completion  
4269 - command that it outputs will refer to qpdf using the value of  
4270 - that variable rather than what :command:`qpdf`  
4271 - determines its executable path to be. This can be useful when  
4272 - wrapping :command:`qpdf` with a script, working  
4273 - with a version in the source tree, using an AppImage, or other  
4274 - situations where there is some indirection.  
4275 -  
4276 - - Library Enhancements  
4277 -  
4278 - - Random number generation is now delegated to the crypto  
4279 - provider. The old behavior is still used by the native crypto  
4280 - provider. It is still possible to provide your own random  
4281 - number generator.  
4282 -  
4283 - - Add a new version of  
4284 - ``QPDFObjectHandle::StreamDataProvider::provideStreamData``  
4285 - that accepts the ``suppress_warnings`` and ``will_retry``  
4286 - options and allows a success code to be returned. This makes it  
4287 - possible to implement a ``StreamDataProvider`` that calls  
4288 - ``pipeStreamData`` on another stream and to pass the response  
4289 - back to the caller, which enables better error handling on  
4290 - those proxied streams.  
4291 -  
4292 - - Update ``QPDFObjectHandle::pipeStreamData`` to return an  
4293 - overall success code that goes beyond whether or not filtered  
4294 - data was written successfully. This allows better error  
4295 - handling of cases that were not filtering errors. You have to  
4296 - call this explicitly. Methods in previously existing APIs have  
4297 - the same semantics as before.  
4298 -  
4299 - - The ``QPDFPageObjectHelper::placeFormXObject`` method now  
4300 - allows separate control over whether it should be willing to  
4301 - shrink or expand objects to fit them better into the  
4302 - destination rectangle. The previous behavior was that shrinking  
4303 - was allowed but expansion was not. The previous behavior is  
4304 - still the default.  
4305 -  
4306 - - When calling the C API, any non-zero value passed to a boolean  
4307 - parameter is treated as ``TRUE``. Previously only the value  
4308 - ``1`` was accepted. This makes the C API behave more like most  
4309 - C interfaces and is known to improve compatibility with some  
4310 - Windows environments that dynamically load the DLL and call  
4311 - functions from it.  
4312 -  
4313 - - Add ``QPDFObjectHandle::unsafeShallowCopy`` for copying only  
4314 - top-level dictionary keys or array items. This is unsafe  
4315 - because it creates a situation in which changing a lower-level  
4316 - item in one object may also change it in another object, but  
4317 - for cases in which you *know* you are only inserting or  
4318 - replacing top-level items, it is much faster than  
4319 - ``QPDFObjectHandle::shallowCopy``.  
4320 -  
4321 - - Add ``QPDFObjectHandle::filterAsContents``, which filters a  
4322 - stream's data as a content stream. This is useful for parsing  
4323 - the contents for form XObjects in the same way as parsing page  
4324 - content streams.  
4325 -  
4326 - - Bug Fixes  
4327 -  
4328 - - When detecting and removing unreferenced resources during page  
4329 - splitting, traverse into form XObjects and handle their  
4330 - resources dictionaries as well.  
4331 -  
4332 - - The same error recovery is applied to streams in files other  
4333 - than the primary input file when merging or splitting pages.  
4334 -  
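The ``placeFormXObject`` entry above describes fitting a form XObject into a destination rectangle with separate control over shrinking and expanding. As a rough illustration of that arithmetic only, here is a minimal Python sketch. The function name ``fit_matrix`` and its parameters are hypothetical, rotation is ignored, and this is not qpdf's implementation. The defaults mirror the behavior the notes describe: shrinking allowed, expansion not.

```python
def fit_matrix(src, dest, allow_shrink=True, allow_expand=False):
    """Return a PDF-style matrix [a, b, c, d, e, f] that centers the
    source box inside the destination rectangle.

    Boxes are (llx, lly, urx, ury) tuples. Rotation is ignored.
    """
    sx, sy, sw, sh = src[0], src[1], src[2] - src[0], src[3] - src[1]
    dx, dy, dw, dh = dest[0], dest[1], dest[2] - dest[0], dest[3] - dest[1]
    if sw <= 0 or sh <= 0:
        raise ValueError("source box must have positive width and height")

    # Largest uniform scale at which the source still fits in the destination.
    scale = min(dw / sw, dh / sh)
    if scale < 1 and not allow_shrink:
        scale = 1.0   # would need shrinking, but shrinking is disabled
    elif scale > 1 and not allow_expand:
        scale = 1.0   # would need expanding, but expanding is disabled

    # Translate so the scaled source center lands on the destination center.
    tx = (dx + dw / 2) - scale * (sx + sw / 2)
    ty = (dy + dh / 2) - scale * (sy + sh / 2)
    return [scale, 0.0, 0.0, scale, tx, ty]
```

With the defaults, a 200x100 source placed in a 100x100 rectangle is scaled by 0.5 and centered vertically, while a 50x50 source is left at its original size and centered.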
4335 -9.1.1: January 26, 2020  
4336 - - Build/Packaging Changes  
4337 -  
4338 - - The fix-qdf program was converted from perl to C++. As such,  
4339 - qpdf no longer has a runtime dependency on perl.  
4340 -  
4341 - - Library Enhancements  
4342 -  
4343 - - Added new helper routine ``QUtil::call_main_from_wmain`` which  
4344 - converts ``wchar_t`` arguments to UTF-8 encoded strings. This  
4345 - is useful for qpdf because library methods expect file names to  
4346 - be UTF-8 encoded, even on Windows.  
4347 -  
4348 - - Added new ``QUtil::read_lines_from_file`` methods that take  
4349 - ``FILE*`` arguments and that allow preservation of end-of-line  
4350 - characters. This also fixes a bug where  
4351 - ``QUtil::read_lines_from_file`` wouldn't work properly with  
4352 - Unicode filenames.  
4353 -  
4354 - - CLI Enhancements  
4355 -  
4356 - - Added options :samp:`--is-encrypted` and  
4357 - :samp:`--requires-password` for testing whether  
4358 - a file is encrypted or requires a password other than the  
4359 - supplied (or empty) password. These communicate via exit  
4360 - status, making them useful for shell scripts. They also work on  
4361 - encrypted files with unknown passwords.  
4362 -  
4363 - - Added ``encrypt`` key to JSON options. With the exception of  
4364 - the reconstructed user password for older encryption formats,  
4365 - this provides the same information as  
4366 - :samp:`--show-encryption` but in a consistent,  
4367 - parseable format. See output of :command:`qpdf  
4368 - --json-help` for details.  
4369 -  
4370 - - Bug Fixes  
4371 -  
4372 - - In QDF mode, be sure not to write more than one XRef stream to  
4373 - a file, even when  
4374 - :samp:`--preserve-unreferenced` is used.  
4375 - :command:`fix-qdf` assumes that there is only  
4376 - one XRef stream, and that it appears at the end of the file.  
4377 -  
4378 - - When externalizing inline images, properly handle images whose  
4379 - color space is a reference to an object in the page's resource  
4380 - dictionary.  
4381 -  
4382 - - Windows-specific fix for acquiring crypt context with a new  
4383 - keyset.  
4384 -  
4385 -9.1.0: November 17, 2019  
4386 - - Build Changes  
4387 -  
4388 - - A C++11 compiler is now required to build qpdf.  
4389 -  
4390 - - A new crypto provider that uses gnutls for crypto functions is  
4391 - now available and can be enabled at build time. See :ref:`ref.crypto` for more information about crypto  
4392 - providers and :ref:`ref.crypto.build` for specific information about  
4393 - the build.  
4394 -  
4395 - - Library Enhancements  
4396 -  
4397 - - Incorporate contribution from Masamichi Hosoda to properly  
4398 - handle signature dictionaries by not including them in object  
4399 - streams, formatting the ``Contents`` key as a hexadecimal  
4400 - string, and excluding the ``/Contents`` key from encryption and  
4401 - decryption.  
4402 -  
4403 - - Incorporate contribution from Masamichi Hosoda to provide new  
4404 - API calls for getting file-level information about input and  
4405 - output files, enabling certain operations on the files at the  
4406 - file level rather than the object level. New methods include  
4407 - ``QPDF::getXRefTable()``,  
4408 - ``QPDFObjectHandle::getParsedOffset()``,  
4409 - ``QPDFWriter::getRenumberedObjGen(QPDFObjGen)``, and  
4410 - ``QPDFWriter::getWrittenXRefTable()``.  
4411 -  
4412 - - Support build-time and runtime selectable crypto providers.  
4413 - This includes the addition of new classes  
4414 - ``QPDFCryptoProvider`` and ``QPDFCryptoImpl`` and the  
4415 - recognition of the ``QPDF_CRYPTO_PROVIDER`` environment  
4416 - variable. Crypto providers are described in depth in :ref:`ref.crypto`.  
4417 -  
4418 - - CLI Enhancements  
4419 -  
4420 - - Addition of the :samp:`--show-crypto` option in  
4421 - support of selectable crypto providers, as described in :ref:`ref.crypto`.  
4422 -  
4423 - - Allow ``:even`` or ``:odd`` to be appended to numeric ranges  
4424 - for specification of the even or odd pages from among the pages  
4425 - specified in the range.  
4426 -  
4427 - - Fix shell wildcard expansion behavior (``*`` and ``?``) of the  
4428 - :command:`qpdf.exe` as built by MSVC.  
4429 -  
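The ``:even`` / ``:odd`` entry above can be illustrated with a short Python sketch. It assumes that "even" and "odd" refer to positions within the selected range (``:odd`` starts with the first selected page, ``:even`` with the second) rather than to the original page numbers. That reading is an assumption, so check the page selection documentation before relying on it. The function name ``apply_modifier`` is hypothetical and not part of qpdf.

```python
def apply_modifier(pages, modifier=None):
    """Select every other page from an already-expanded range.

    pages is the list of page numbers a numeric range expands to;
    modifier is None, "odd", or "even".
    """
    if modifier is None:
        return list(pages)
    if modifier == "odd":
        return list(pages[0::2])   # 1st, 3rd, 5th, ... position
    if modifier == "even":
        return list(pages[1::2])   # 2nd, 4th, 6th, ... position
    raise ValueError(f"unknown modifier: {modifier!r}")
```

Under this assumption, a range that expands to pages 2 through 7 with ``:odd`` yields pages 2, 4, and 6, because those pages occupy the odd positions in the range.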
4430 -9.0.2: October 12, 2019  
4431 - - Bug Fix  
4432 -  
4433 - - Fix the name of the temporary file used by  
4434 - :samp:`--replace-input` so that it doesn't  
4435 - require path splitting and works with paths that include  
4436 - directories.  
4437 -  
4438 -9.0.1: September 20, 2019  
4439 - - Bug Fixes/Enhancements  
4440 -  
4441 - - Fix some build and test issues on big-endian systems and  
4442 - compilers with characters that are unsigned by default. The  
4443 - problems were in build and test only. There were no actual bugs  
4444 - in the qpdf library itself relating to endianness or unsigned  
4445 - characters.  
4446 -  
4447 - - When a dictionary has a duplicated key, report this with a  
4448 - warning. The behavior of the library in this case is unchanged,  
4449 - but the error condition is no longer silently ignored.  
4450 -  
4451 - - When a form field's display rectangle is erroneously specified  
4452 - with inverted coordinates, detect and correct this situation.  
4453 - This prevents some form fields from being flipped when flattening  
4454 - annotations on files with this condition.  
4455 -  
4456 -9.0.0: August 31, 2019  
4457 - - Incompatible API (source-level) Changes (minor)  
4458 -  
4459 - - The method ``QUtil::strcasecmp`` has been renamed to  
4460 - ``QUtil::str_compare_nocase``. This incompatible change is  
4461 - necessary to enable qpdf to build on platforms that define  
4462 - ``strcasecmp`` as a macro.  
4463 -  
4464 - - The ``QPDF::copyForeignObject`` method had an overloaded  
4465 - version that took a boolean parameter that was not used. If you  
4466 - were using this version, just omit the extra parameter.  
4467 -  
4468 - - There was a version of ``QPDFTokenizer::expectInlineImage`` that  
4469 - took no arguments. This version has been removed since it  
4470 - caused the tokenizer to return incorrect inline images. A new  
4471 - version was added some time ago that produces correct output.  
4472 - This is a very low level method that doesn't make sense to call  
4473 - outside of qpdf's lexical engine. There are higher level  
4474 - methods for tokenizing content streams.  
4475 -  
4476 - - Change ``QPDFOutlineDocumentHelper::getTopLevelOutlines`` and  
4477 - ``QPDFOutlineObjectHelper::getKids`` to return a  
4478 - ``std::vector`` instead of a ``std::list`` of  
4479 - ``QPDFOutlineObjectHelper`` objects.  
4480 -  
4481 - - Remove method ``QPDFTokenizer::allowPoundAnywhereInName``. This  
4482 - function would allow creation of name tokens whose value would  
4483 - change when unparsed, which is never the correct behavior.  
4484 -  
4485 - - CLI Enhancements  
4486 -  
4487 - - The :samp:`--replace-input` option may be given  
4488 - in place of an output file name. This causes qpdf to overwrite  
4489 - the input file with the output. See the description of  
4490 - :samp:`--replace-input` in :ref:`ref.basic-options` for more details.  
4491 -  
4492 - - The :samp:`--recompress-flate` instructs  
4493 - :command:`qpdf` to recompress streams that are  
4494 - already compressed with ``/FlateDecode``. Useful with  
4495 - :samp:`--compression-level`.  
4496 -  
4497 - - The  
4498 - :samp:`--compression-level={level}`  
4499 - sets the zlib compression level used for any streams compressed  
4500 - by ``/FlateDecode``. Most effective when combined with  
4501 - :samp:`--recompress-flate`.  
4502 -  
4503 - - Library Enhancements  
4504 -  
4505 - - A new namespace ``QIntC``, provided by  
4506 - :file:`qpdf/QIntC.hh`, provides safe  
4507 - conversion methods between different integer types. These  
4508 - conversion methods do range checking to ensure that the cast  
4509 - can be performed with no loss of information. Every use of  
4510 - ``static_cast`` in the library was inspected to see if it could  
4511 - use one of these safe converters instead. See :ref:`ref.casting` for additional details.  
4512 -  
4513 - - Method ``QPDF::anyWarnings`` tells whether there have been any  
4514 - warnings without clearing the list of warnings.  
4515 -  
4516 - - Method ``QPDF::closeInputSource`` closes or otherwise releases  
4517 - the input source. This enables the input file to be deleted or  
4518 - renamed.  
4519 -  
4520 - - New methods have been added to ``QUtil`` for converting back  
4521 - and forth between strings and unsigned integers:  
4522 - ``uint_to_string``, ``uint_to_string_base``,  
4523 - ``string_to_uint``, and ``string_to_ull``.  
4524 -  
4525 - - New methods have been added to ``QPDFObjectHandle`` that return  
4526 - the value of ``Integer`` objects as ``int`` or ``unsigned int``  
4527 - with range checking and sensible fallback values, and a new  
4528 - method was added to return an unsigned value. This makes it  
4529 - easier to write code that is safe from unintentional data loss.  
4530 - Functions: ``getUIntValue``, ``getIntValueAsInt``,  
4531 - ``getUIntValueAsUInt``.  
4532 -  
4533 - - When parsing content streams with  
4534 - ``QPDFObjectHandle::ParserCallbacks``, in place of the method  
4535 - ``handleObject(QPDFObjectHandle)``, the developer may override  
4536 - ``handleObject(QPDFObjectHandle, size_t offset, size_t  
4537 - length)``. If this method is defined, it will  
4538 - be invoked with the object along with its offset and length  
4539 - within the overall contents being parsed. Intervening spaces  
4540 - and comments are not included in offset and length.  
4541 - Additionally, a new method ``contentSize(size_t)`` may be  
4542 - implemented. If present, it will be called prior to the first  
4543 - call to ``handleObject`` with the total size in bytes of the  
4544 - combined contents.  
4545 -  
4546 - - New methods ``QPDF::userPasswordMatched`` and  
4547 - ``QPDF::ownerPasswordMatched`` have been added to enable a  
4548 - caller to determine whether the supplied password was the user  
4549 - password, the owner password, or both. This information is also  
4550 - displayed by :command:`qpdf --show-encryption`  
4551 - and :command:`qpdf --check`.  
4552 -  
4553 - - Static method ``Pl_Flate::setCompressionLevel`` can be called  
4554 - to set the zlib compression level globally used by all  
4555 - instances of ``Pl_Flate`` in deflate mode.  
4556 -  
4557 - - The method ``QPDFWriter::setRecompressFlate`` can be called to  
4558 - tell ``QPDFWriter`` to uncompress and recompress streams  
4559 - already compressed with ``/FlateDecode``.  
4560 -  
4561 - - The underlying implementation of QPDF arrays has been enhanced  
4562 - to be much more memory efficient when dealing with arrays with  
4563 - lots of nulls. This enables qpdf to use drastically less memory  
4564 - for certain types of files.  
4565 -  
4566 - - When traversing the pages tree, if nodes are encountered with  
4567 - invalid types, the types are fixed, and a warning is issued.  
4568 -  
4569 - - A new helper method ``QUtil::read_file_into_memory`` was added.  
4570 -  
4571 - - All conditions previously reported by  
4572 - ``QPDF::checkLinearization()`` as errors are now presented as  
4573 - warnings.  
4574 -  
4575 - - Name tokens containing the ``#`` character not preceded by two  
4576 - hexadecimal digits, which is invalid in PDF 1.2 and above, are  
4577 - properly handled by the library: a warning is generated, and  
4578 - the name token is properly preserved, even if invalid, in the  
4579 - output. See :file:`ChangeLog` for a more  
4580 - complete description of this change.  
4581 -  
4582 - - Bug Fixes  
4583 -  
4584 - - A small handful of memory issues, assertion failures, and  
4585 - unhandled exceptions that could occur on badly mangled input  
4586 - files have been fixed. Most of these problems were found by  
4587 - Google's OSS-Fuzz project.  
4588 -  
4589 - - When :command:`qpdf --check` or  
4590 - :command:`qpdf --check-linearization` encounters  
4591 - a file with linearization warnings but not errors, it now  
4592 - properly exits with exit code 3 instead of 2.  
4593 -  
4594 - - The :samp:`--completion-bash` and  
4595 - :samp:`--completion-zsh` options now work  
4596 - properly when qpdf is invoked as an AppImage.  
4597 -  
4598 - - Calling ``QPDFWriter::set*EncryptionParameters`` on a  
4599 - ``QPDFWriter`` object whose output filename has not yet been  
4600 - set no longer produces a segmentation fault.  
4601 -  
4602 - - When reading encrypted files, follow the spec more closely  
4603 - regarding encryption key length. This allows qpdf to open  
4604 - encrypted files in most cases when they have invalid or missing  
4605 - /Length keys in the encryption dictionary.  
4606 -  
4607 - - Build Changes  
4608 -  
4609 - - On platforms that support it, qpdf now builds with  
4610 - :samp:`-fvisibility=hidden`. If you build qpdf  
4611 - with your own build system, this is now safe to use. This  
4612 - prevents methods that are not part of the public API from being  
4613 - exported by the shared library, and makes qpdf's ELF shared  
4614 - libraries (used on Linux, MacOS, and most other UNIX flavors)  
4615 - behave more like the Windows DLL. Since the DLL already behaves  
4616 - in much this way, it is unlikely that there are any methods  
4617 - that were accidentally not exported. However, with ELF shared  
4618 - libraries, typeinfo for some classes has to be explicitly  
4619 - exported. If there are problems in dynamically linked code  
4620 - catching exceptions or subclassing, this could be the reason.  
4621 - If you see this, please report a bug at  
4622 - https://github.com/qpdf/qpdf/issues/.  
4623 -  
4624 - - QPDF is now compiled with integer conversion and sign  
4625 - conversion warnings enabled. Numerous changes were made to the  
4626 - library to make this safe.  
4627 -  
4628 - - QPDF's :command:`make install` target explicitly  
4629 - specifies the mode to use when installing files instead of  
4630 - relying on the user's umask. It was previously doing this for some  
4631 - files but not others.  
4632 -  
4633 - - If :command:`pkg-config` is available, use it to  
4634 - locate :file:`libjpeg` and  
4635 - :file:`zlib` dependencies, falling back on  
4636 - old behavior if unsuccessful.  
4637 -  
4638 - - Other Notes  
4639 -  
4640 - - QPDF has been fully integrated into `Google's OSS-Fuzz  
4641 - project <https://github.com/google/oss-fuzz>`__. This project  
4642 - exercises code with randomly mutated inputs and is great for  
4643 - discovering hidden crashes and security issues.  
4644 - Several bugs found by oss-fuzz have already been fixed in qpdf.  
4645 -  
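The 9.0.0 notes above say that arrays with many nulls now use far less memory, but they do not describe how. One common way to get that effect is to store only the non-null items and remember the overall length. The Python sketch below illustrates that general technique under that assumption. The class ``SparseArray`` and its methods are hypothetical and are not qpdf's data structure.

```python
class SparseArray:
    """Array that stores only its non-null items.

    None stands in for the PDF null object in this sketch.
    """

    def __init__(self, size=0):
        self._size = size
        self._items = {}          # index -> non-null value

    def __len__(self):
        return self._size

    def _check(self, index):
        if not 0 <= index < self._size:
            raise IndexError("index out of range")

    def __getitem__(self, index):
        self._check(index)
        return self._items.get(index)   # missing entries read as null

    def __setitem__(self, index, value):
        self._check(index)
        if value is None:
            self._items.pop(index, None)  # storing null frees the slot
        else:
            self._items[index] = value

    def append(self, value):
        self._size += 1
        self[self._size - 1] = value

    def stored(self):
        """Number of entries that actually occupy memory."""
        return len(self._items)
```

An array created with a million null slots and one real value reports a length of one million but keeps a single entry in its internal dictionary.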
4646 -8.4.2: May 18, 2019  
4647 - This release has just one change: correction of a buffer overrun in  
4648 - the Windows code used to open files. Windows users should take this  
4649 - update. There are no code changes that affect non-Windows releases.  
4650 -  
4651 -8.4.1: April 27, 2019  
4652 - - Enhancements  
4653 -  
4654 - - When :command:`qpdf --version` is run, it will  
4655 - detect if the qpdf CLI was built with a different version of  
4656 - qpdf than the library, which may indicate a problem with the  
4657 - installation.  
4658 -  
4659 - - New option :samp:`--remove-page-labels` will  
4660 - remove page labels before generating output. This used to  
4661 - happen if you ran :command:`qpdf --empty --pages ..  
4662 - --`, but the behavior changed in qpdf 8.3.0. This  
4663 - option enables people who were relying on the old behavior to  
4664 - get it again.  
4665 -  
4666 - - New option  
4667 - :samp:`--keep-files-open-threshold={count}`  
4668 - can be used to override the number of files that qpdf will use to  
4669 - trigger the behavior of not keeping all files open when merging  
4670 - files. This may be necessary if your system allows fewer than  
4671 - the default value of 200 files to be open at the same time.  
4672 -  
4673 - - Bug Fixes  
4674 -  
4675 - - Handle Unicode characters in filenames on Windows. The changes  
4676 - to support Unicode on the CLI in Windows broke Unicode  
4677 - filenames for Windows.  
4678 -  
4679 - - Slightly tighten logic that determines whether an object is a  
4680 - page. This should resolve problems in some rare files where  
4681 - some non-page objects were passing qpdf's test for whether  
4682 - something was a page, thus causing them to be erroneously lost  
4683 - during page splitting operations.  
4684 -  
4685 - - Revert change that included preservation of outlines  
4686 - (bookmarks) in :samp:`--split-pages`. The way  
4687 - it was implemented in 8.3.0 and 8.4.0 caused a very significant  
4688 - degradation of performance for splitting certain files. A  
4689 - future release of qpdf may re-introduce the behavior in a more  
4690 - performant and also more correct fashion.  
4691 -  
4692 - - In JSON mode, add missing leading 0 to decimal values between  
4693 - -1 and 1 even if not present in the input. The JSON  
4694 - specification requires the leading 0. The PDF specification  
4695 - does not.  
4696 -  
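The JSON bug fix above comes down to a small text transformation: a PDF real such as ``.5`` or ``-.5`` is legal in PDF but is rejected by JSON parsers until the leading 0 is added. A minimal Python sketch of that one rule follows. The helper name ``json_number`` is hypothetical, the sketch covers only the missing-leading-zero case, and it is not qpdf's code.

```python
import json
import re

# A PDF real such as ".5" or "-.5": optional minus sign, no integer digits.
_MISSING_ZERO = re.compile(r"^(-?)\.(\d+)$")


def json_number(pdf_number):
    """Return pdf_number (a string) with a leading 0 added if it is missing."""
    match = _MISSING_ZERO.match(pdf_number)
    if match:
        sign, fraction = match.groups()
        return f"{sign}0.{fraction}"
    return pdf_number


# The rewritten text is accepted by a strict JSON parser;
# the original "-.25" would raise json.JSONDecodeError.
parsed = json.loads(json_number("-.25"))
```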
4697 -8.4.0: February 1, 2019  
4698 - - Command-line Enhancements  
4699 -  
4700 - - *Non-compatible CLI change:* The qpdf command-line tool  
4701 - interprets passwords given at the command-line differently from  
4702 - previous releases when the passwords contain non-ASCII  
4703 - characters. In some cases, the behavior differs from previous  
4704 - releases. For a discussion of the current behavior, please see  
4705 - :ref:`ref.unicode-passwords`. The  
4706 - incompatibilities are as follows:  
4707 -  
4708 - - On Windows, qpdf now receives all command-line options as  
4709 - Unicode strings if it can figure out the appropriate  
4710 - compile/link options. This is enabled at least for MSVC and  
4711 - mingw builds. That means that if non-ASCII strings are  
4712 - passed to the qpdf CLI in Windows, qpdf will now correctly  
4713 - receive them. In the past, they would have either been  
4714 - encoded as Windows code page 1252 (also known as "Windows  
4715 - ANSI") or as something unintelligible. In almost all cases,  
4716 - qpdf is able to properly interpret Unicode arguments now,  
4717 - whereas in the past, it would almost never interpret them  
4718 - properly. The result is that non-ASCII passwords given to  
4719 - the qpdf CLI on Windows now have a much greater chance of  
4720 - creating PDF files that can be opened by a variety of  
4721 - readers. In the past, usually files encrypted from the  
4722 - Windows CLI using non-ASCII passwords would not be readable  
4723 - by most viewers. Note that the current version of qpdf is  
4724 - able to decrypt files that it previously created using the  
4725 - previously supplied password.  
4726 -  
4727 - - The PDF specification requires passwords to be encoded as  
4728 - UTF-8 for 256-bit encryption and with PDF Doc encoding for  
4729 - 40-bit or 128-bit encryption. Older versions of qpdf left it  
4730 - up to the user to provide passwords with the correct  
4731 - encoding. The qpdf CLI now detects when a password is given  
4732 - with UTF-8 encoding and automatically transcodes it to what  
4733 - the PDF spec requires. While this is almost always the  
4734 - correct behavior, it is possible to override the behavior if  
4735 - there is some reason to do so. This is discussed in more  
4736 - depth in :ref:`ref.unicode-passwords`.  
4737 -  
4738 - - New options  
4739 - :samp:`--externalize-inline-images`,  
4740 - :samp:`--ii-min-bytes`, and  
4741 - :samp:`--keep-inline-images` control qpdf's  
4742 - handling of inline images and possible conversion of them to  
4743 - regular images. By default,  
4744 - :samp:`--optimize-images` now also applies to  
4745 - inline images. These options are discussed in :ref:`ref.advanced-transformation`.  
4746 -  
4747 - - Add options :samp:`--overlay` and  
4748 - :samp:`--underlay` for overlaying or  
4749 - underlaying pages of other files onto output pages. See  
4750 - :ref:`ref.overlay-underlay` for  
4751 - details.  
4752 -  
4753 - - When opening an encrypted file with a password, if the  
4754 - specified password doesn't work and the password contains any  
4755 - non-ASCII characters, qpdf will try a number of alternative  
4756 - passwords to try to compensate for possible character encoding  
4757 - errors. This behavior can be suppressed with the  
4758 - :samp:`--suppress-password-recovery` option.  
4759 - See :ref:`ref.unicode-passwords` for a full  
4760 - discussion.  
4761 -  
4762 - - Add the :samp:`--password-mode` option to  
4763 - fine-tune how qpdf interprets password arguments, especially  
4764 - when they contain non-ASCII characters. See :ref:`ref.unicode-passwords` for more information.  
4765 -  
4766 - - In the :samp:`--pages` option, it is now  
4767 - possible to copy the same page more than once from the same  
4768 - file without using the previous workaround of specifying two  
4769 - different paths to the same file.  
4770 -  
4771 - - In the :samp:`--pages` option, allow use of "."  
4772 - as a shortcut for the primary input file. That way, you can do  
4773 - :command:`qpdf in.pdf --pages . 1-2 -- out.pdf`  
4774 - instead of having to repeat :file:`in.pdf`  
4775 - in the command.  
4776 -  
4777 - - When encrypting with 128-bit and 256-bit encryption, new  
4778 - encryption options :samp:`--assemble`,  
4779 - :samp:`--annotate`,  
4780 - :samp:`--form`, and  
4781 - :samp:`--modify-other` allow more fine-grained  
4782 - control in configuring permissions. Before, the  
4783 - :samp:`--modify` option only configured certain  
4784 - predefined groups of permissions.  
4785 -  
4786 - - Bug Fixes and Enhancements  
4787 -  
4788 - - *Potential data-loss bug:* Versions of qpdf between 8.1.0 and  
4789 - 8.3.0 had a bug that could cause page splitting and merging  
4790 - operations to drop some font or image resources if the PDF  
4791 - file's internal structure shared these resource lists across  
4792 - pages and if some but not all of the pages in the output did  
4793 - not reference all the fonts and images. Using the  
4794 - :samp:`--preserve-unreferenced-resources`  
4795 - option would work around the incorrect behavior. This bug was  
4796 - the result of a typo in the code and a deficiency in the test  
4797 - suite. The case that triggered the error was known, just not  
4798 - handled properly. This case is now exercised in qpdf's test  
4799 - suite and properly handled.  
4800 -  
4801 - - When optimizing images, detect and refuse to optimize images  
4802 - that can't be converted to JPEG because of bit depth or color  
4803 - space.  
4804 -  
4805 - - Linearization and page manipulation APIs now detect and recover  
4806 - from files that have duplicate Page objects in the pages tree.  
4807 -  
4808 - - When using the older option  
4809 - :samp:`--stream-data=compress` with object  
4810 - streams, object streams and xref streams were not compressed. This has been fixed.  
4811 -  
4812 - - When the tokenizer returns inline image tokens, delimiters  
4813 - following ``ID`` and ``EI`` operators are no longer excluded.  
4814 - This makes it possible to reliably extract the actual image  
4815 - data.  
4816 -  
4817 - - Library Enhancements  
4818 -  
4819 - - Add method ``QPDFPageObjectHelper::externalizeInlineImages`` to  
4820 - convert inline images to regular images.  
4821 -  
4822 - - Add method ``QUtil::possible_repaired_encodings()`` to generate  
4823 - a list of strings that represent other ways the given string  
4824 - could have been encoded. This is the method the QPDF CLI uses  
4825 - to generate the strings it tries when recovering incorrectly  
4826 - encoded Unicode passwords.  
4827 -  
4828 - - Add new versions of  
4829 - ``QPDFWriter::setR{3,4,5,6}EncryptionParameters`` that allow  
4830 - more granular setting of permissions bits. See  
4831 - :file:`QPDFWriter.hh` for details.  
4832 -  
4833 - - Add new versions of the transcoders from UTF-8 to single-byte  
4834 - coding systems in ``QUtil`` that report success or failure  
4835 - rather than just substituting a specified unknown character.  
4836 -  
4837 - - Add method ``QUtil::analyze_encoding()`` to determine whether a  
4838 - string has high-bit characters and appears to be UTF-16 or  
4839 - valid UTF-8 encoding.  
4840 -  
4841 - - Add new method ``QPDFPageObjectHelper::shallowCopyPage()`` to  
4842 - create a new page that is a "shallow copy" of a page. The  
4843 - resulting object is an indirect object ready to be passed to  
4844 - ``QPDFPageDocumentHelper::addPage()`` for either the original  
4845 - ``QPDF`` object or a different one. This is what the  
4846 - :command:`qpdf` command-line tool uses to copy  
4847 - the same page multiple times from the same file during  
4848 - splitting and merging operations.  
4849 -  
4850 - - Add method ``QPDF::getUniqueId()``, which returns a unique  
4851 - identifier for the given QPDF object. The identifier will be  
4852 - unique across the life of the application. The returned value  
4853 - can be safely used as a map key.  
4854 -  
4855 - - Add method ``QPDF::setImmediateCopyFrom``. This further  
4856 - enhances qpdf's ability to allow a ``QPDF`` object from which  
4857 - objects are being copied to go out of scope before the  
4858 - destination object is written. If you call this method on a  
4859 - ``QPDF`` instance, objects copied *from* this instance will be  
4860 - copied immediately instead of lazily. This option uses more  
4861 - memory but allows the source object to go out of scope before  
4862 - the destination object is written in all cases. See comments in  
4863 - :file:`QPDF.hh` for details.  
4864 -  
4865 - - Add method ``QPDFPageObjectHelper::getAttribute`` for  
4866 - retrieving an attribute from the page dictionary taking  
4867 - inheritance into consideration, and optionally making a copy if  
4868 - your intention is to modify the attribute.  
4869 -  
4870 - - Fix long-standing limitation of  
4871 - ``QPDFPageObjectHelper::getPageImages`` so that it now properly  
4872 - reports images from inherited resources dictionaries,  
4873 - eliminating the need to call  
4874 - ``QPDFPageDocumentHelper::pushInheritedAttributesToPage`` in  
4875 - this case.  
4876 -  
4877 - - Add method ``QPDFObjectHandle::getUniqueResourceName`` for  
4878 - finding an unused name in a resource dictionary.  
4879 -  
4880 - - Add method ``QPDFPageObjectHelper::getFormXObjectForPage`` for  
4881 - generating a form XObject equivalent to a page. The resulting  
4882 - object can be used in the same file or copied to another file  
4883 - with ``copyForeignObject``. This can be useful for implementing  
4884 - underlay, overlay, n-up, thumbnails, or any other functionality  
4885 - requiring replication of pages in other contexts.  
4886 -  
4887 - - Add method ``QPDFPageObjectHelper::placeFormXObject`` for  
4888 - generating content stream text that places a given form XObject  
4889 - on a page, centered and fit within a specified rectangle. This  
4890 - method takes care of computing the proper transformation matrix  
4891 - and may optionally compensate for rotation or scaling of the  
4892 - destination page.  
4893 -  
4894 - - Build Improvements  
4895 -  
4896 - - Add new configure option  
4897 - :samp:`--enable-avoid-windows-handle`, which  
4898 - causes the preprocessor symbol ``AVOID_WINDOWS_HANDLE`` to be  
4899 - defined. When defined, qpdf will avoid referencing the Windows  
4900 - ``HANDLE`` type, which is disallowed with certain versions of  
4901 - the Windows SDK.  
4902 -  
4903 - - For Windows builds, attempt to determine what options, if any,  
4904 - have to be passed to the compiler and linker to enable use of  
4905 - ``wmain``. This causes the preprocessor symbol  
4906 - ``WINDOWS_WMAIN`` to be defined. If you do your own builds with  
4907 - other compilers, you can define this symbol to cause ``wmain``  
4908 - to be used. This is needed to allow the Windows  
4909 - :command:`qpdf` command to receive Unicode  
4910 - command-line options.  
4911 -  
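Several 8.4.0 entries above depend on deciding how a password argument is encoded: whether it contains high-bit characters, and whether it looks like UTF-16 or valid UTF-8. The Python sketch below shows one way to make that classification. The function name ``classify_password_bytes`` is hypothetical, and recognizing UTF-16 by a big-endian byte order mark is an assumption. It is not the implementation of ``QUtil::analyze_encoding``.

```python
def classify_password_bytes(data):
    """Classify a byte string by its apparent encoding.

    Returns (has_8bit_chars, is_valid_utf8, is_utf16).
    """
    has_8bit_chars = any(byte > 0x7F for byte in data)

    # Assumption: UTF-16 is recognized by a big-endian byte order mark.
    is_utf16 = data.startswith(b"\xfe\xff")

    is_valid_utf8 = False
    if has_8bit_chars and not is_utf16:
        try:
            data.decode("utf-8")
            is_valid_utf8 = True
        except UnicodeDecodeError:
            pass   # high-bit bytes that are not well-formed UTF-8
    return has_8bit_chars, is_valid_utf8, is_utf16
```

A caller could use the result to decide whether to transcode a UTF-8 password to the encoding the PDF specification requires for the chosen key length, or to pass the bytes through unchanged.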
4912 -8.3.0: January 7, 2019  
4913 - - Command-line Enhancements  
4914 -  
4915 - - Shell completion: you can now use eval :command:`$(qpdf  
4916 - --completion-bash)` and eval :command:`$(qpdf  
4917 - --completion-zsh)` to enable shell completion for  
4918 - bash and zsh.  
4919 -  
4920 - - Page numbers (also known as page labels) are now preserved when  
4921 - merging and splitting files with the  
4922 - :samp:`--pages` and  
4923 - :samp:`--split-pages` options.  
4924 -  
4925 - - Bookmarks are partially preserved when splitting pages with the  
4926 - :samp:`--split-pages` option. Specifically, the  
4927 - outlines dictionary and some supporting metadata are copied  
4928 - into the split files. The result is that all bookmarks from the  
4929 - original file appear, those that point to pages that are  
4930 - preserved work, and those that point to pages that are not  
4931 - preserved don't do anything. This is an interim step toward  
4932 - proper support for bookmarks in splitting and merging  
4933 - operations.  
4934 -  
4935 - - Page collation: add new option  
4936 - :samp:`--collate`. When specified, the  
4937 - semantics of :samp:`--pages` change from  
4938 - concatenation to collation. See :ref:`ref.page-selection` for examples and discussion.  
4939 -  
4940 - - Generation of information in JSON format, primarily to  
4941 - facilitate use of qpdf from languages other than C++. Add new  
4942 - options :samp:`--json`,  
4943 - :samp:`--json-key`, and  
4944 - :samp:`--json-object` to generate a JSON  
4945 - representation of the PDF file. Run :command:`qpdf  
4946 - --json-help` to get a description of the JSON  
4947 - format. For more information, see :ref:`ref.json`.  
4948 -  
4949 - - The :samp:`--generate-appearances` flag will  
4950 - cause qpdf to generate appearances for form fields if the PDF  
4951 - file indicates that form field appearances are out of date.  
4952 - This can happen when PDF forms are filled in by a program that  
4953 - doesn't know how to regenerate the appearances of the filled-in  
4954 - fields.  
4955 -  
4956 - - The :samp:`--flatten-annotations` flag can be  
4957 - used to *flatten* annotations, including form fields.  
4958 - Ordinarily, annotations are drawn separately from the page.  
4959 - Flattening annotations is the process of combining their  
4960 - appearances into the page's contents. You might want to do this  
4961 - if you are going to rotate or combine pages using a tool that  
4962 - doesn't understand annotations. You may also want to use  
4963 - :samp:`--generate-appearances` when using this  
4964 - flag since annotations for outdated form fields are not  
4965 - flattened as that would cause loss of information.  
4966 -  
4967 - - The :samp:`--optimize-images` flag tells qpdf  
4968 - to recompress every image using DCT (JPEG) compression as  
4969 - long as the image is not already compressed with lossy  
4970 - compression and recompressing the image reduces its size. The  
4971 - additional options :samp:`--oi-min-width`,  
4972 - :samp:`--oi-min-height`, and  
4973 - :samp:`--oi-min-area` prevent recompression of  
4974 - images whose width, height, or pixel area (width × height) are  
4975 - below a specified threshold.  
4976 -  
4977 - - The :samp:`--show-object` option can now be  
4978 - given as :samp:`--show-object=trailer` to show  
4979 - the trailer dictionary.  
4980 -  
4981 - - Bug Fixes and Enhancements  
4982 -  
4983 - - QPDF now automatically detects and recovers from dangling  
4984 - references. If a PDF file contained an indirect reference to a  
4985 - non-existent object, which is valid, when adding a new object  
4986 - to the file, it was possible for the new object to take the  
4987 - object ID of the dangling reference, thereby causing the  
4988 - dangling reference to point to the new object. This case is now  
4989 - prevented.  
4990 -  
4991 - - Fixes to form field setting code: strings are always written in  
4992 - UTF-16 format, and checkboxes and radio buttons are handled  
4993 - properly with respect to synchronization of values and  
4994 - appearance states.  
4995 -  
4996 - The ``QPDF::checkLinearization()`` method no longer causes the program  
4997 - to crash when it detects problems with linearization data.  
4998 - Instead, it issues a normal warning or error.  
4999 -  
5000 - - Ordinarily qpdf treats an argument of the form  
5001 - :samp:`@file` to mean that command-line options  
5002 - should be read from :file:`file`. Now, if  
5003 - :file:`file` does not exist but  
5004 - :file:`@file` does, qpdf will treat  
5005 - :file:`@file` as a regular option. This  
5006 - makes it possible to work more easily with PDF files whose  
5007 - names happen to start with the ``@`` character.  
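The ``@file`` fallback rule described above can be sketched as a small decision function. This is only an illustration of the rule as stated in the release note, not qpdf's actual implementation; the function name ``classify_argument`` and its return values are hypothetical.

```python
import os


def classify_argument(arg: str) -> str:
    """Classify a command-line argument under the rule described above.

    Hypothetical helper: returns "argument-file" when options should be
    read from the named file, and "regular" otherwise.
    """
    if arg.startswith("@") and len(arg) > 1:
        target = arg[1:]
        # Fallback: `file` is missing but a file literally named `@file`
        # exists, so treat the argument as an ordinary one.
        if not os.path.exists(target) and os.path.exists(arg):
            return "regular"
        return "argument-file"
    return "regular"
```

With a file named ``@scan.pdf`` in the current directory and no file named ``scan.pdf``, ``classify_argument("@scan.pdf")`` returns ``"regular"``, which matches the case of a PDF whose name starts with ``@``.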
5008 -  
5009 - - Library Enhancements  
5010 -  
5011 - - Remove the restriction in most cases that the source QPDF  
5012 - object used in a ``QPDF::copyForeignObject`` call has to stick  
5013 - around until the destination QPDF is written. The exceptional  
5014 - case is when the source stream gets its data using a  
5015 - ``QPDFObjectHandle::StreamDataProvider``. For a more in-depth  
5016 - discussion, see comments around ``copyForeignObject`` in  
5017 - :file:`QPDF.hh`.  
5018 -  
5019 - - Add new method ``QPDFWriter::getFinalVersion()``, which returns  
5020 - the PDF version that will ultimately be written to the final  
5021 - file. See comments in :file:`QPDFWriter.hh`  
5022 - for some restrictions on its use.  
5023 -  
5024 - - Add several methods for transcoding strings to some of the  
5025 - character sets used in PDF files: ``QUtil::utf8_to_ascii``,  
5026 - ``QUtil::utf8_to_win_ansi``, ``QUtil::utf8_to_mac_roman``, and  
5027 - ``QUtil::utf8_to_utf16``. For the single-byte encodings that  
5028 - support only a limited character set, these methods replace  
5029 - unsupported characters with a specified substitute.  
5030 -  
5031 - - Add new methods to ``QPDFAnnotationObjectHelper`` and  
5032 - ``QPDFFormFieldObjectHelper`` for querying flags and  
5033 - interpretation of different field types. Define constants in  
5034 - :file:`qpdf/Constants.h` to help with  
5035 - interpretation of flag values.  
5036 -  
5037 - - Add new methods  
5038 - ``QPDFAcroFormDocumentHelper::generateAppearancesIfNeeded`` and  
5039 - ``QPDFFormFieldObjectHelper::generateAppearance`` for  
5040 - generating appearance streams. See discussion in  
5041 - :file:`QPDFFormFieldObjectHelper.hh` for  
5042 - limitations.  
5043 -  
5044 - - Add two new helper functions for dealing with resource  
5045 - dictionaries: ``QPDFObjectHandle::getResourceNames()`` returns  
5046 - a list of all second-level keys, which correspond to the names  
5047 - of resources, and ``QPDFObjectHandle::mergeResources()`` merges  
5048 - two resources dictionaries as long as they have non-conflicting  
5049 - keys. These methods are useful for certain types of objects  
5050 - that resolve resources from multiple places, such as form  
5051 - fields.  
5052 -  
5053 - - Add methods ``QPDFPageDocumentHelper::flattenAnnotations()``  
5054 - and  
5055 - ``QPDFAnnotationObjectHelper::getPageContentForAppearance()``  
5056 - for handling low-level details of annotation flattening.  
5057 -  
5058 - - Add new helper classes: ``QPDFOutlineDocumentHelper``,  
5059 - ``QPDFOutlineObjectHelper``, ``QPDFPageLabelDocumentHelper``,  
5060 - ``QPDFNameTreeObjectHelper``, and  
5061 - ``QPDFNumberTreeObjectHelper``.  
5062 -  
5063 - - Add method ``QPDFObjectHandle::getJSON()`` that returns a JSON  
5064 - representation of the object. Call ``serialize()`` on the  
5065 - result to convert it to a string.  
5066 -  
5067 - - Add a simple JSON serializer. This is not a complete or  
5068 - general-purpose JSON library. It allows assembly and  
5069 - serialization of JSON structures with some restrictions, which  
5070 - are described in the header file. This is the serializer used  
5071 - by qpdf's new JSON representation.  
5072 -  
5073 - - Add new ``QPDFObjectHandle::Matrix`` class along with a few  
5074 - convenience methods for dealing with six-element numerical  
5075 - arrays as matrices.  
5076 -  
5077 - - Add new method ``QPDFObjectHandle::wrapInArray``, which returns  
5078 - the object itself if it is an array, or an array containing the  
5079 - object otherwise. This is a common construct in PDF. This  
5080 - method prevents you from having to explicitly test whether  
5081 - something is a single element or an array.  
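The "single element or array" construct that ``wrapInArray`` addresses can be shown with a minimal sketch. It models PDF objects as plain Python values (a list stands in for a PDF array) and does not use the qpdf API; ``wrap_in_array`` is a hypothetical stand-in that mirrors the behavior described above.

```python
def wrap_in_array(obj):
    """Return obj unchanged if it is already a list, else a one-element list."""
    return obj if isinstance(obj, list) else [obj]


# A value that may be either one item or an array of items, as is common
# in PDF (for example, a page's contents). The strings are placeholders.
single = "stream-1"
several = ["stream-1", "stream-2"]

for contents in (single, several):
    # The caller iterates the same way in both cases, with no type test.
    for stream in wrap_in_array(contents):
        print(stream)
```

The point of the design is that the type check happens in one place, so calling code can always iterate.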
5082 -  
5083 - - Build Improvements  
5084 -  
5085 - - It is no longer necessary to run  
5086 - :command:`autogen.sh` to build from a pristine  
5087 - checkout. Automatically generated files are now committed so  
5088 - that it is possible to build on platforms without autoconf  
5089 - directly from a clean checkout of the repository. The  
5090 - :command:`configure` script detects if the files  
5091 - are out of date when it also determines that the tools are  
5092 - present to regenerate them.  
5093 -  
5094 - - Pull requests and the master branch are now built automatically  
5095 - in `Azure  
5096 - Pipelines <https://dev.azure.com/qpdf/qpdf/_build>`__, which is  
5097 - free for open source projects. The build includes Linux, macOS,  
5098 - Windows 32-bit and 64-bit with mingw and MSVC, and an AppImage  
5099 - build. Official qpdf releases are now built with Azure  
5100 - Pipelines.  
5101 -  
5102 - - Notes for Packagers  
5103 -  
5104 - - A new section has been added to the documentation with notes  
5105 - for packagers. Please see :ref:`ref.packaging`.  
5106 -  
5107 - The qpdf build detects out-of-date automatically generated files. If  
5108 - your packaging system automatically refreshes libtool or  
5109 - autoconf files, it could cause this check to fail. To avoid  
5110 - this problem, pass  
5111 - :samp:`--disable-check-autofiles` to  
5112 - :command:`configure`.  
5113 -  
5114 - - If you would like to have qpdf completion enabled  
5115 - automatically, you can install completion files in the  
5116 - distribution's default location. You can find sample completion  
5117 - files to install in the :file:`completions`  
5118 - directory.  
5119 -  
5120 -8.2.1: August 18, 2018  
5121 - - Command-line Enhancements  
5122 -  
5123 - - Add  
5124 - :samp:`--keep-files-open={[yn]}`  
5125 - to override default determination of whether to keep files open  
5126 - when merging. Please see the discussion of  
5127 - :samp:`--keep-files-open` in :ref:`ref.basic-options` for additional details.  
5128 -  
5129 -8.2.0: August 16, 2018  
5130 - - Command-line Enhancements  
5131 -  
5132 - - Add :samp:`--no-warn` option to suppress  
5133 - issuing warning messages. If there are any conditions that  
5134 - would have caused warnings to be issued, the exit status is  
5135 - still 3.  
5136 -  
5137 - - Bug Fixes and Optimizations  
5138 -  
5139 - - Performance fix: optimize page merging operation to avoid  
5140 - unnecessary open/close calls on files being merged. This solves  
5141 - a dramatic slow-down that was observed when merging certain  
5142 - types of files.  
5143 -  
5144 - - Optimize how memory was used for the TIFF predictor,  
5145 - drastically improving performance and memory usage for files  
5146 - containing high-resolution images compressed with Flate using  
5147 - the TIFF predictor.  
5148 -  
5149 - - Bug fix: end of line characters were not properly handled  
5150 - inside strings in some cases.  
5151 -  
5152 - - Bug fix: using :samp:`--progress` on very small  
5153 - files could cause an infinite loop.  
5154 -  
5155 - - API enhancements  
5156 -  
5157 - - Add new class ``QPDFSystemError``, derived from  
5158 - ``std::runtime_error``, which is now thrown by  
5159 - ``QUtil::throw_system_error``. This enables the triggering  
5160 - ``errno`` value to be retrieved.  
5161 -  
5162 - - Add ``ClosedFileInputSource::stayOpen`` method, enabling a  
5163 - ``ClosedFileInputSource`` to stay open during manually  
5164 - indicated periods of high activity, thus reducing the overhead  
5165 - of frequent open/close operations.  
5166 -  
5167 - - Build Changes  
5168 -  
5169 - - For the mingw builds, change the name of the DLL import library  
5170 - from :file:`libqpdf.a` to  
5171 - :file:`libqpdf.dll.a` to more accurately  
5172 - reflect that it is an import library rather than a static  
5173 - library. This potentially clears the way for supporting a  
5174 - static library in the future, though presently, the qpdf  
5175 - Windows build only builds the DLL and executables.  
5176 -  
5177 -8.1.0: June 23, 2018  
5178 - - Usability Improvements  
5179 -  
5180 - - When splitting files, qpdf detects fonts and images that the  
5181 - document metadata claims are referenced from a page but are not  
5182 - actually referenced and omits them from the output file. This  
5183 - change can cause a significant reduction in the size of split  
5184 - PDF files for files created by some software packages. In some  
5185 - cases, it can also make page splitting slower. Prior versions  
5186 - of qpdf would believe the document metadata and sometimes  
5187 - include all the images from all the other pages even though the  
5188 - pages were no longer present. In the unlikely event that the  
5189 - old behavior should be desired, or if you have a case where  
5190 - page splitting is very slow, the old behavior (and speed) can  
5191 - be enabled by specifying  
5192 - :samp:`--preserve-unreferenced-resources`. For  
5193 - additional details, please see :ref:`ref.advanced-transformation`.  
5194 -  
5195 - - When merging multiple PDF files, qpdf no longer leaves all the  
5196 - files open. This makes it possible to merge numbers of files  
5197 - that may exceed the operating system's limit for the maximum  
5198 - number of open files.  
5199 -  
5200 - - The :samp:`--rotate` option's syntax has been  
5201 - extended to make the page range optional. If you specify  
5202 - :samp:`--rotate={angle}`  
5203 - without specifying a page range, the rotation will be applied  
5204 - to all pages. This can be especially useful for adjusting a PDF  
5205 - created from a multi-page document that was scanned upside  
5206 - down.  
5207 -  
5208 - - When merging multiple files, the  
5209 - :samp:`--verbose` option now prints information  
5210 - about each file as it operates on that file.  
5211 -  
5212 - - When the :samp:`--progress` option is  
5213 - specified, qpdf will print a running indicator of its best  
5214 - guess at how far through the writing process it is. Note that,  
5215 - as with all progress meters, it's an approximation. This option  
5216 - is implemented in a way that makes it useful for software that  
5217 - uses the qpdf library; see API Enhancements below.  
5218 -  
5219 - - Bug Fixes  
5220 -  
5221 - - Properly decrypt files that use revision 3 of the standard  
5222 - security handler but use 40 bit keys (even though revision 3  
5223 - supports 128-bit keys).  
5224 -  
5225 - - Limit depth of nested data structures to prevent crashes from  
5226 - certain types of malformed (malicious) PDFs.  
5227 -  
5228 - - In "newline before endstream" mode, insert the required extra  
5229 - newline before the ``endstream`` at the end of object streams.  
5230 - This one case was previously omitted.  
5231 -  
5232 - - API Enhancements  
5233 -  
5234 - - The first round of higher level "helper" interfaces has been  
5235 - introduced. These are designed to provide a more convenient way  
5236 - of interacting with certain document features than using  
5237 - ``QPDFObjectHandle`` directly. For details on helpers, see  
5238 - :ref:`ref.helper-classes`. Specific additional  
5239 - interfaces are described below.  
5240 -  
5241 - - Add two new document helper classes: ``QPDFPageDocumentHelper``  
5242 - for working with pages, and ``QPDFAcroFormDocumentHelper`` for  
5243 - working with interactive forms. No old methods have been  
5244 - removed, but ``QPDFPageDocumentHelper`` is now the preferred  
5245 - way to perform operations on pages rather than calling the old  
5246 - methods in ``QPDFObjectHandle`` and ``QPDF`` directly. Comments  
5247 - in the header files direct you to the new interfaces. Please  
5248 - see the header files and :file:`ChangeLog`  
5249 - for additional details.  
5250 -  
5251 - Add three new object helper classes: ``QPDFPageObjectHelper`` for  
5252 - pages, ``QPDFFormFieldObjectHelper`` for interactive form  
5253 - fields, and ``QPDFAnnotationObjectHelper`` for annotations. All  
5254 - three classes are fairly sparse at the moment, but they have  
5255 - some useful, basic functionality.  
5256 -  
5257 - - A new example program  
5258 - :file:`examples/pdf-set-form-values.cc` has  
5259 - been added that illustrates use of the new document and object  
5260 - helpers.  
5261 -  
5262 - - The method ``QPDFWriter::registerProgressReporter`` has been  
5263 - added. This method allows you to register a function that is  
5264 - called by ``QPDFWriter`` to report its estimate of how far, as a  
5265 - percentage, it is through writing its output. Client programs can  
5266 - use this to implement reasonably accurate progress meters. The  
5267 - :command:`qpdf` command line tool uses this to  
5268 - implement its :samp:`--progress` option.  
5269 -  
5270 - - New methods ``QPDFObjectHandle::newUnicodeString`` and  
5271 - ``QPDFObject::unparseBinary`` have been added to allow for more  
5272 - convenient creation of strings that are explicitly encoded  
5273 - using big-endian UTF-16. This is useful for creating strings  
5274 - that appear outside of content streams, such as labels, form  
5275 - fields, outlines, document metadata, etc.  
5276 -  
5277 - - A new class ``QPDFObjectHandle::Rectangle`` has been added to  
5278 - ease working with PDF rectangles, which are just arrays of four  
5279 - numeric values.  
5280 -  
5281 -8.0.2: March 6, 2018  
5282 - - When a loop is detected while following cross reference streams or  
5283 - tables, treat this as damage instead of silently ignoring the  
5284 - previous table. This prevents loss of otherwise recoverable data  
5285 - in some damaged files.  
5286 -  
5287 - - Properly handle pages with no contents.  
5288 -  
5289 -8.0.1: March 4, 2018  
5290 - - Disregard data check errors when uncompressing ``/FlateDecode``  
5291 - streams. This is consistent with most other PDF readers and allows  
5292 - qpdf to recover data from another class of malformed PDF files.  
5293 -  
5294 - - On the command line when specifying page ranges, support preceding  
5295 - a page number by "r" to indicate that it should be counted from  
5296 - the end. For example, the range ``r3-r1`` would indicate the last  
5297 - three pages of a document.  
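The "r" notation can be illustrated by resolving a range against a known page count. This is a minimal sketch under stated assumptions, not qpdf's parser: the helper names are hypothetical, only a single ``first-last`` range is handled, and treating a descending range as reverse order is an assumption of this sketch.

```python
def resolve_page(token: str, npages: int) -> int:
    """Turn a page token into a 1-based page number.

    A leading "r" counts from the end, so "r1" is the last page.
    """
    page = npages - int(token[1:]) + 1 if token.startswith("r") else int(token)
    if not 1 <= page <= npages:
        raise ValueError(f"page {token!r} is outside 1..{npages}")
    return page


def expand_range(spec: str, npages: int) -> list:
    """Expand one "first-last" range (or a single page) into page numbers."""
    first, _, last = spec.partition("-")
    start = resolve_page(first, npages)
    end = resolve_page(last, npages) if last else start
    # Assumption of this sketch: a descending range yields reverse order.
    step = 1 if end >= start else -1
    return list(range(start, end + step, step))
```

For a 10-page document, ``expand_range("r3-r1", 10)`` gives pages 8, 9, and 10, which are the last three pages as in the example above.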
5298 -  
5299 -8.0.0: February 25, 2018  
5300 - - Packaging and Distribution Changes  
5301 -  
5302 - - QPDF is now distributed as an  
5303 - `AppImage <https://appimage.org/>`__ in addition to all the  
5304 - other ways it is distributed. The AppImage can be found in the  
5305 - download area with the other packages. Thanks to Kurt Pfeifle  
5306 - and Simon Peter for their contributions.  
5307 -  
5308 - - Bug Fixes  
5309 -  
5310 - - ``QPDFObjectHandle::getUTF8Val`` now properly treats  
5311 - non-Unicode strings as encoded with PDF Doc Encoding.  
5312 -  
5313 - - Improvements to handling of objects in PDF files that are not  
5314 - of the expected type. In most cases, qpdf will be able to warn  
5315 - for such cases rather than fail with an exception. Previous  
5316 - versions of qpdf would sometimes fail with errors such as  
5317 - "operation for dictionary object attempted on object of wrong  
5318 - type". This situation should be mostly or entirely eliminated  
5319 - now.  
5320 -  
5321 - - Enhancements to the :command:`qpdf` Command-line  
5322 - Tool. All new options listed here are documented in more detail in  
5323 - :ref:`ref.using`.  
5324 -  
5325 - - The option  
5326 - :samp:`--linearize-pass1={file}`  
5327 - has been added for debugging qpdf's linearization code.  
5328 -  
5329 - - The option :samp:`--coalesce-contents` can be  
5330 - used to combine content streams of a page whose contents are an  
5331 - array of streams into a single stream.  
5332 -  
5333 - - API Enhancements. All new API calls are documented in their  
5334 - respective classes' header files. There are no non-compatible  
5335 - changes to the API.  
5336 -  
5337 - - Add function ``qpdf_check_pdf`` to the C API. This function  
5338 - does basic checking that is a subset of what :command:`qpdf  
5339 - --check` performs.  
5340 -  
5341 - - Major enhancements to the lexical layer of qpdf. For a complete  
5342 - list of enhancements, please refer to the  
5343 - :file:`ChangeLog` file. Most of the changes  
5344 - result in improvements to qpdf's ability to handle erroneous  
5345 - files. It is also possible for programs to handle whitespace,  
5346 - comments, and inline images as tokens.  
5347 -  
5348 - - New API for working with PDF content streams at a lexical  
5349 - level. The new class ``QPDFObjectHandle::TokenFilter`` allows  
5350 - the developer to provide token handlers. Token filters can be  
5351 - used with several different methods in ``QPDFObjectHandle`` as  
5352 - well as with a lower-level interface. See comments in  
5353 - :file:`QPDFObjectHandle.hh` as well as the  
5354 - new examples  
5355 - :file:`examples/pdf-filter-tokens.cc` and  
5356 - :file:`examples/pdf-count-strings.cc` for  
5357 - details.  
5358 -  
5359 -7.1.1: February 4, 2018  
5360 - - Bug fix: files whose /ID fields were other than 16 bytes long can  
5361 - now be properly linearized.  
5362 -  
5363 - - A few compile and link issues have been corrected for some  
5364 - platforms.  
5365 -  
5366 -7.1.0: January 14, 2018  
5367 - - PDF files contain streams that may be compressed with various  
5368 - compression algorithms which, in some cases, may be enhanced by  
5369 - various predictor functions. Previously only the PNG up predictor  
5370 - was supported. In this version, all the PNG predictors as well as  
5371 - the TIFF predictor are supported. This increases the range of  
5372 - files that qpdf is able to handle.  
5373 -  
5374 - - QPDF now allows a raw encryption key to be specified in place of a  
5375 - password when opening encrypted files, and will optionally display  
5376 - the encryption key used by a file. This is a non-standard  
5377 - operation, but it can be useful in certain situations. Please see  
5378 - the discussion of :samp:`--password-is-hex-key` in  
5379 - :ref:`ref.basic-options` or the comments around  
5380 - ``QPDF::setPasswordIsHexKey`` in  
5381 - :file:`QPDF.hh` for additional details.  
5382 -  
5383 - - Bug fix: numbers ending with a trailing decimal point are now  
5384 - properly recognized as numbers.  
5385 -  
5386 - - Bug fix: when building qpdf from source on some platforms  
5387 - (especially MacOS), the build could get confused by older versions  
5388 - of qpdf installed on the system. This has been corrected.  
5389 -  
5390 -7.0.0: September 15, 2017  
5391 - - Packaging and Distribution Changes  
5392 -  
5393 - - QPDF's primary license is now `version 2.0 of the Apache  
5394 - License <http://www.apache.org/licenses/LICENSE-2.0>`__ rather  
5395 - than version 2.0 of the Artistic License. You may still, at  
5396 - your option, consider qpdf to be licensed with version 2.0 of  
5397 - the Artistic license.  
5398 -  
5399 - - QPDF no longer has a dependency on the PCRE (Perl-Compatible  
5400 - Regular Expression) library. QPDF now has an added dependency  
5401 - on the JPEG library.  
5402 -  
5403 - - Bug Fixes  
5404 -  
5405 - - This release contains many bug fixes for various infinite  
5406 - loops, memory leaks, and other memory errors that could be  
5407 - encountered with specially crafted or otherwise erroneous PDF  
5408 - files.  
5409 -  
5410 - - New Features  
5411 -  
5412 - - QPDF now supports reading and writing streams encoded with JPEG  
5413 - or RunLength encoding. Library API enhancements and  
5414 - command-line options have been added to control this behavior.  
5415 - See command-line options  
5416 - :samp:`--compress-streams` and  
5417 - :samp:`--decode-level` and methods  
5418 - ``QPDFWriter::setCompressStreams`` and  
5419 - ``QPDFWriter::setDecodeLevel``.  
5420 -  
5421 - - QPDF is much better at recovering from broken files. In most  
5422 - cases, qpdf will skip invalid objects and will preserve broken  
5423 - stream data by not attempting to filter broken streams. QPDF is  
5424 - now able to recover or at least not crash on dozens of broken  
5425 - test files I have received over the past few years.  
5426 -  
5427 - - Page rotation is now supported and accessible from both the  
5428 - library and the command line.  
5429 -  
5430 - - ``QPDFWriter`` supports writing files in a way that preserves  
5431 - PCLm compliance in support of driverless printing. This is very  
5432 - specialized and is only useful to applications that already  
5433 - know how to create PCLm files.  
5434 -  
5435 - - Enhancements to the :command:`qpdf` Command-line  
5436 - Tool. All new options listed here are documented in more detail in  
5437 - :ref:`ref.using`.  
5438 -  
5439 - - Command-line arguments can now be read from files or standard  
5440 - input using ``@file`` or ``@-`` syntax. Please see :ref:`ref.invocation`.  
5441 -  
5442 - - :samp:`--rotate`: request page rotation  
5443 -  
5444 - - :samp:`--newline-before-endstream`: ensure that  
5445 - a newline appears before every ``endstream`` keyword in the  
5446 - file; used to prevent qpdf from breaking PDF/A compliance on  
5447 - already compliant files.  
5448 -  
5449 - - :samp:`--preserve-unreferenced`: preserve  
5450 - unreferenced objects in the input PDF  
5451 -  
5452 - - :samp:`--split-pages`: break output into chunks  
5453 - with fixed numbers of pages  
5454 -  
5455 - - :samp:`--verbose`: print the name of each  
5456 - output file that is created  
5457 -  
5458 - - :samp:`--compress-streams` and  
5459 - :samp:`--decode-level` replace  
5460 - :samp:`--stream-data` for improving granularity  
5461 - of controlling compression and decompression of stream data.  
5462 - The :samp:`--stream-data` option will remain  
5463 - available.  
5464 -  
5465 - - When running :command:`qpdf --check` with other  
5466 - options, checks are always run first. This enables qpdf to  
5467 - perform its full recovery logic before outputting other  
5468 - information. This can be especially useful when manually  
5469 - recovering broken files, looking at qpdf's regenerated cross  
5470 - reference table, or other similar operations.  
5471 -  
5472 - - Process :command:`--pages` earlier so that other  
5473 - options like :samp:`--show-pages` or  
5474 - :samp:`--split-pages` can operate on the file  
5475 - after page splitting/merging has occurred.  
5476 -  
5477 - - API Changes. All new API calls are documented in their respective  
5478 - classes' header files.  
5479 -  
5480 - - ``QPDFObjectHandle::rotatePage``: apply rotation to a page  
5481 - object  
5482 -  
5483 - - ``QPDFWriter::setNewlineBeforeEndstream``: force newline to  
5484 - appear before ``endstream``  
5485 -  
5486 - - ``QPDFWriter::setPreserveUnreferencedObjects``: preserve  
5487 - unreferenced objects that appear in the input PDF. The default  
5488 - behavior is to discard them.  
5489 -  
5490 - - New ``Pipeline`` types ``Pl_RunLength`` and ``Pl_DCT`` are  
5491 - available for developers who wish to produce or consume  
5492 - RunLength or DCT stream data directly. The  
5493 - :file:`examples/pdf-create.cc` example  
5494 - illustrates their use.  
5495 -  
5496 - - ``QPDFWriter::setCompressStreams`` and  
5497 - ``QPDFWriter::setDecodeLevel`` methods control handling of  
5498 - different types of stream compression.  
5499 -  
5500 - - Add new C API functions ``qpdf_set_compress_streams``,  
5501 - ``qpdf_set_decode_level``,  
5502 - ``qpdf_set_preserve_unreferenced_objects``, and  
5503 - ``qpdf_set_newline_before_endstream`` corresponding to the new  
5504 - ``QPDFWriter`` methods.  
5505 -  
5506 -6.0.0: November 10, 2015  
5507 - - Implement :samp:`--deterministic-id` command-line  
5508 - option and ``QPDFWriter::setDeterministicID`` as well as C API  
5509 - function ``qpdf_set_deterministic_ID`` for generating a  
5510 - deterministic ID for non-encrypted files. When this option is  
5511 - selected, the ID of the file depends on the contents of the output  
5512 - file, and not on transient items such as the timestamp or output  
5513 - file name.  
5514 -  
5515 - - Make qpdf more tolerant of files whose xref table entries are not  
5516 - the correct length.  
5517 -  
5518 -5.1.3: May 24, 2015  
5519 - - Bug fix: fix-qdf was not properly handling files that contained  
5520 - object streams with more than 255 objects in them.  
5521 -  
5522 - - Bug fix: qpdf was not properly initializing Microsoft's secure  
5523 - crypto provider on fresh Windows installations that had not had  
5524 - any keys created yet.  
5525 -  
5526 - - Fix a few errors found by Gynvael Coldwind and Mateusz Jurczyk of  
5527 - the Google Security Team. Please see the ChangeLog for details.  
5528 -  
5529 - - Properly handle pages that have no contents at all. There were  
5530 - many cases in which qpdf handled this fine, but a few methods  
5531 - blindly obtained page contents without handling the possibility that  
5532 - there were no contents.  
5533 -  
5534 - - Make qpdf more robust for a few more kinds of problems that may  
5535 - occur in invalid PDF files.  
5536 -  
5537 -5.1.2: June 7, 2014  
5538 - - Bug fix: linearizing files could create a corrupted output file  
5539 - under extremely unlikely file size circumstances. See ChangeLog  
5540 - for details. The odds of getting hit by this are very low, though  
5541 - one person did.  
5542 -  
5543 - - Bug fix: qpdf would fail to write files that had streams with  
5544 - decode parameters referencing other streams.  
5545 -  
5546 - - New example program: :command:`pdf-split-pages`:  
5547 - efficiently split PDF files into individual pages. The example  
5548 - program does this more efficiently than using :command:`qpdf  
5549 - --pages` to do it.  
5550 -  
5551 - - Packaging fix: Visual C++ binaries did not support Windows XP.  
5552 - This has been rectified by updating the compilers used to generate  
5553 - the release binaries.  
5554 -  
5555 -5.1.1: January 14, 2014  
5556 - - Performance fix: copying foreign objects could be very slow with  
5557 - certain types of files. This was most likely to be visible during  
5558 - page splitting and was due to traversing the same objects multiple  
5559 - times in some cases.  
5560 -  
5561 -5.1.0: December 17, 2013  
5562 - - Added runtime option (``QUtil::setRandomDataProvider``) to supply  
5563 - your own random data provider. You can use this if you want to  
5564 - avoid using the OS-provided secure random number generation  
5565 - facility or stdlib's less secure version. See comments in  
5566 - include/qpdf/QUtil.hh for details.  
5567 -  
5568 - - Fixed image comparison tests to not create 12-bit-per-pixel images  
5569 - since some versions of tiffcmp have bugs in comparing them in some  
5570 - cases. This increases the disk space required by the image  
5571 - comparison tests, which are off by default anyway.  
5572 -  
5573 - - Introduce a number of small fixes for compilation on the latest  
5574 - clang in MacOS and the latest Visual C++ in Windows.  
5575 -  
5576 - - Be able to handle broken files that end the xref table header with  
5577 - a space instead of a newline.  
5578 -  
5579 -5.0.1: October 18, 2013  
5580 - - Thanks to a detailed review by Florian Weimer and the Red Hat  
5581 - Product Security Team, this release includes a number of  
5582 - non-user-visible security hardening changes. Please see the  
5583 - ChangeLog file in the source distribution for the complete list.  
5584 -  
5585 - - When available, operating system-specific secure random number  
5586 - generation is used for generating initialization vectors and other  
5587 - random values used during encryption or file creation. For the  
5588 - Windows build, this results in an added dependency on Microsoft's  
5589 - cryptography API. To disable the OS-specific cryptography and use  
5590 - the old version, pass the  
5591 - :samp:`--enable-insecure-random` option to  
5592 - :command:`./configure`.  
5593 -  
5594 - - The :command:`qpdf` command-line tool now issues a  
5595 - warning when :samp:`-accessibility=n` is specified  
5596 - for newer encryption versions stating that the option is ignored.  
5597 - qpdf, per the spec, has always ignored this flag, but it  
5598 - previously did so silently. This warning is issued only by the  
5599 - command-line tool, not by the library. The library's handling of  
5600 - this flag is unchanged.  
5601 -  
5602 -5.0.0: July 10, 2013  
5603 - - Bug fix: previous versions of qpdf would lose objects with  
5604 - generation != 0 when generating object streams. Fixing this  
5605 - required changes to the public API.  
5606 -  
5607 - - Removed methods from public API that were only supposed to be  
5608 - called by QPDFWriter and couldn't realistically be called anywhere  
5609 - else. See ChangeLog for details.  
5610 -  
5611 - - New ``QPDFObjGen`` class added to represent an object  
5612 - ID/generation pair. ``QPDFObjectHandle::getObjGen()`` is now  
5613 - preferred over ``QPDFObjectHandle::getObjectID()`` and  
5614 - ``QPDFObjectHandle::getGeneration()`` as it makes it less likely  
5615 - for people to accidentally write code that ignores the generation  
5616 - number. See :file:`QPDF.hh` and  
5617 - :file:`QPDFObjectHandle.hh` for additional  
5618 - notes.  
5619 -  
5620 - - Add :samp:`--show-npages` command-line option to  
5621 - the :command:`qpdf` command to show the number of  
5622 - pages in a file.  
5623 -  
5624 - - Allow omission of the page range within  
5625 - :samp:`--pages` for the  
5626 - :command:`qpdf` command. When omitted, the page  
5627 - range is implicitly taken to be all the pages in the file.  
5628 -  
5629 - - Various enhancements were made to support different types of  
5630 - broken files or broken readers. Details can be found in  
5631 - :file:`ChangeLog`.  
5632 -  
5633 -4.1.0: April 14, 2013  
5634 - - Note to people including qpdf in distributions: the  
5635 - :file:`.la` files generated by libtool are now  
5636 - installed by qpdf's :command:`make install` target.  
5637 - Before, they were not installed. This means that if your  
5638 - distribution does not want to include  
5639 - :file:`.la` files, you must remove them as  
5640 - part of your packaging process.  
5641 -  
5642 - - Major enhancement: API enhancements have been made to support  
5643 - parsing of content streams. This enhancement includes the  
5644 - following changes:  
5645 -  
5646 - - ``QPDFObjectHandle::parseContentStream`` method parses objects  
5647 - in a content stream and calls handlers in a callback class. The  
5648 - example  
5649 - :file:`examples/pdf-parse-content.cc`  
5650 - illustrates how this may be used.  
5651 -  
5652 - - ``QPDFObjectHandle`` can now represent operators and inline  
5653 - images, object types that may only appear in content streams.  
5654 -  
5655 - - Method ``QPDFObjectHandle::getTypeCode()`` returns an  
5656 - enumerated type value representing the underlying object type.  
5657 - Method ``QPDFObjectHandle::getTypeName()`` returns a text  
5658 - string describing the name of the type of a  
5659 - ``QPDFObjectHandle`` object. These methods can be used for more  
5660 - efficient parsing and debugging/diagnostic messages.  
5661 -  
5662 - - :command:`qpdf --check` now parses all pages'  
5663 - content streams in addition to doing other checks. While there are  
5664 - still many types of errors that cannot be detected, syntactic  
5665 - errors in content streams will now be reported.  
5666 -  
5667 - - Minor compilation enhancements have been made to facilitate  
5668 - support for a broader range of compilers and compiler  
5669 - versions.  
5670 -  
5671 - - Warning flags have been moved into a separate variable in  
5672 - :file:`autoconf.mk`.  
5673 -  
5674 - - The configure flag :samp:`--enable-werror` works  
5675 - for Microsoft compilers.  
5676 -  
5677 - - All MSVC CRT security warnings have been resolved.  
5678 -  
5679 - - All C-style casts in C++ code have been replaced by C++ casts,  
5680 - and many casts that had been included to suppress higher  
5681 - warning levels for some compilers have been removed, primarily  
5682 - for clarity. Places where integer type coercion occurs have  
5683 - been scrutinized. A new casting policy has been documented in  
5684 - the manual. This is of concern mainly to people porting qpdf to  
5685 - new platforms or compilers. It is not visible to programmers  
5686 - writing code that uses the library.  
5687 -  
5688 - - Some internal limits have been removed in code that converts  
5689 - numbers to strings. This is largely invisible to users, but it  
5690 - does trigger a bug in some older versions of mingw-w64's C++  
5691 - library. See :file:`README-windows.md` in  
5692 - the source distribution if you think this may affect you. The  
5693 - copy of the DLL distributed with qpdf's binary distribution is  
5694 - not affected by this problem.  
5695 -  
5696 - - The RPM spec file previously included with qpdf has been removed.  
5697 - This is because virtually all Linux distributions include qpdf now  
5698 - that it is a dependency of CUPS filters.  
5699 -  
5700 - - A few bug fixes are included:  
5701 -  
5702 - - Overridden compressed objects are properly handled. Before,  
5703 - there were certain constructs that could cause qpdf to see old  
5704 - versions of some objects. The most common manifestation of this  
5705 - was loss of filled-in form values for certain files.  
5706 -  
5707 - - Installation no longer uses GNU/Linux-specific versions of some  
5708 - commands, so :command:`make install` works on  
5709 - Solaris with native tools.  
5710 -  
5711 - - The 64-bit mingw Windows binary package no longer includes a  
5712 - 32-bit DLL.  
5713 -  
5714 -4.0.1: January 17, 2013  
5715 - - Fix detection of binary attachments in test suite to avoid false  
5716 - test failures on some platforms.  
5717 -  
5718 - - Add clarifying comment in :file:`QPDF.hh` to  
5719 - methods that return the user password explaining that it is no  
5720 - longer possible with newer encryption formats to recover the user  
5721 - password knowing the owner password. In earlier encryption  
5722 - formats, the user password was encrypted in the file using the  
5723 - owner password. In newer encryption formats, a separate encryption  
5724 - key is used on the file, and that key is independently encrypted  
5725 - using both the user password and the owner password.  
5726 -  
5727 -4.0.0: December 31, 2012  
5728 - - Major enhancement: support has been added for newer encryption  
5729 - schemes supported by version X of Adobe Acrobat. This includes use  
5730 - of 127-character passwords, 256-bit encryption keys, and the  
5731 - encryption scheme specified in ISO 32000-2, the PDF 2.0  
5732 - specification. This scheme can be chosen from the command line by  
5733 - specifying use of 256-bit keys. qpdf also supports the deprecated  
5734 - encryption method used by Acrobat IX. This encryption style has  
5735 - known security weaknesses and should not be used in practice.  
5736 - However, such files exist "in the wild," so support for this  
5737 - scheme is still useful. New methods  
5738 - ``QPDFWriter::setR6EncryptionParameters`` (for the PDF 2.0 scheme)  
5739 - and ``QPDFWriter::setR5EncryptionParameters`` (for the deprecated  
5740 - scheme) have been added to enable these new encryption schemes.  
5741 - Corresponding functions have been added to the C API as well.  
5742 -  
5743 - - Full support for Adobe extension levels in PDF version  
5744 - information. Starting with PDF version 1.7, corresponding to ISO  
5745 - 32000, Adobe adds new functionality by increasing the extension  
5746 - level rather than increasing the version. This support includes  
5747 - addition of the ``QPDF::getExtensionLevel`` method for retrieving  
5748 - the document's extension level, addition of versions of  
5749 - ``QPDFWriter::setMinimumPDFVersion`` and  
5750 - ``QPDFWriter::forcePDFVersion`` that accept an extension level,  
5751 - and extended syntax for specifying forced and minimum versions on  
5752 - the command line as described in :ref:`ref.advanced-transformation`. Corresponding functions  
5753 - have been added to the C API as well.  
5754 -  
5755 - - Minor fixes to prevent qpdf from referencing objects in the file  
5756 - that are not referenced in the file's overall structure. Most  
5757 - files don't have any such objects, but some files contain  
5758 - unreferenced objects with errors, so these fixes prevent qpdf from  
5759 - needlessly rejecting or complaining about such objects.  
5760 -  
5761 - - Add new generalized methods for reading and writing files from/to  
5762 - programmer-defined sources. The method  
5763 - ``QPDF::processInputSource`` allows the programmer to use any  
5764 - input source for the input file, and  
5765 - ``QPDFWriter::setOutputPipeline`` allows the programmer to write  
5766 - the output file through any pipeline. These methods would make it  
5767 - possible to perform any number of specialized operations, such as  
5768 - accessing external storage systems, creating bindings for qpdf in  
5769 - other programming languages that have their own I/O systems, etc.  
5770 -  
5771 - - Add new method ``QPDF::getEncryptionKey`` for retrieving the  
5772 - underlying encryption key used in the file.  
5773 -  
5774 - - This release includes a small handful of non-compatible API  
5775 - changes. While effort is made to avoid such changes, all the  
5776 - non-compatible API changes in this version were to parts of the  
5777 - API that would likely never be used outside the library itself. In  
5778 - all cases, the altered methods or structures were either parts of  
5779 - the ``QPDF`` class that were public to enable them to be called from  
5780 - ``QPDFWriter`` or parts of validation code that was  
5781 - over-zealous in reporting problems in parts of the file that would  
5782 - not ordinarily be referenced. In no case did any of the removed  
5783 - methods do anything worse than falsely report error conditions in  
5784 - files that were broken in ways that didn't matter. The following  
5785 - public parts of the ``QPDF`` class were changed in a  
5786 - non-compatible way:  
5787 -  
5788 - - Updated nested ``QPDF::EncryptionData`` class to add fields  
5789 - needed by the newer encryption formats, member variables  
5790 - changed to private so that future changes will not require  
5791 - breaking backward compatibility.  
5792 -  
5793 - - Added additional parameters to ``compute_data_key``, which is  
5794 - used by ``QPDFWriter`` to compute the encryption key used to  
5795 - encrypt a specific object.  
5796 -  
5797 - - Removed the method ``flattenScalarReferences``. This method was  
5798 - previously used prior to writing a new PDF file, but it has the  
5799 - undesired side effect of causing qpdf to read objects in the  
5800 - file that were not referenced. Some otherwise valid files have  
5801 - unreferenced objects with errors in them, so this could cause  
5802 - qpdf to reject files that would be accepted by virtually all  
5803 - other PDF readers. In fact, qpdf relied on only a very small  
5804 - part of what flattenScalarReferences did, so only this part has  
5805 - been preserved, and it is now done directly inside  
5806 - ``QPDFWriter``.  
5807 -  
5808 - - Removed the method ``decodeStreams``. This method was used by  
5809 - the :samp:`--check` option of the  
5810 - :command:`qpdf` command-line tool to force all  
5811 - streams in the file to be decoded, but it also suffered from  
5812 - the problem of opening otherwise unreferenced streams and thus  
5813 - could report false positives. The  
5814 - :samp:`--check` option now causes qpdf to go  
5815 - through all the motions of writing a new file based on the  
5816 - original one, so it will always reference and check exactly  
5817 - those parts of a file that any ordinary viewer would check.  
5818 -  
5819 - - Removed the method ``trimTrailerForWrite``. This method was  
5820 - used by ``QPDFWriter`` to modify the original QPDF object by  
5821 - removing fields from the trailer dictionary that wouldn't apply  
5822 - to the newly written file. This functionality, though generally  
5823 - harmless, was a poor implementation and has been replaced by  
5824 - having QPDFWriter filter these out when copying the trailer  
5825 - rather than modifying the original QPDF object. (Note that qpdf  
5826 - never modifies the original file itself.)  
5827 -  
5828 - - Allow the PDF header to appear anywhere in the first 1024 bytes of  
5829 - the file. This is consistent with what other readers do.  
5830 -  
5831 - - Fix the :command:`pkg-config` files to list zlib  
5832 - and pcre in ``Requires.private`` to better support static linking  
5833 - using :command:`pkg-config`.  
5834 -  
5835 -3.0.2: September 6, 2012  
5836 - - Bug fix: ``QPDFWriter::setOutputMemory`` did not work when not  
5837 - used with ``QPDFWriter::setStaticID``, which made it pretty much  
5838 - useless. This has been fixed.  
5839 -  
5840 - - New API call ``QPDFWriter::setExtraHeaderText`` inserts additional  
5841 - text near the header of the PDF file. The intended use case is to  
5842 - insert comments that may be consumed by a downstream application,  
5843 - though other use cases may exist.  
5844 -  
5845 -3.0.1: August 11, 2012  
5846 - - Version 3.0.0 included addition of files for  
5847 - :command:`pkg-config`, but this was not mentioned  
5848 - in the release notes. The release notes for 3.0.0 were updated to  
5849 - mention this.  
5850 -  
5851 - - Bug fix: if an object stream ended with a scalar object not  
5852 - followed by space, qpdf would incorrectly report that it  
5853 - encountered a premature EOF. This bug has been in qpdf since  
5854 - version 2.0.  
5855 -  
5856 -3.0.0: August 2, 2012  
5857 - - Acknowledgment: I would like to express gratitude for the  
5858 - contributions of Tobias Hoffmann toward the release of qpdf  
5859 - version 3.0. He is responsible for most of the implementation and  
5860 - design of the new API for manipulating pages, and contributed code  
5861 - and ideas for many of the improvements made in version 3.0.  
5862 - Without his work, this release would certainly not have happened  
5863 - as soon as it did, if at all.  
5864 -  
5865 - - *Non-compatible API changes:*  
5866 -  
5867 - - The method ``QPDFObjectHandle::replaceStreamData`` that uses a  
5868 - ``StreamDataProvider`` to provide the stream data no longer  
5869 - takes a ``length`` parameter. The parameter was removed since  
5870 - this provides the user an opportunity to simplify the calling  
5871 - code. This method was introduced in version 2.2. At the time,  
5872 - the ``length`` parameter was required in order to ensure that  
5873 - calls to the stream data provider returned the same length for a  
5874 - specific stream every time they were invoked. In particular, the  
5875 - linearization code depends on this. Instead, qpdf 3.0 and newer  
5876 - check for that constraint explicitly. The first time the stream  
5877 - data provider is called for a specific stream, the actual length  
5878 - is saved, and subsequent calls are required to return the same  
5879 - number of bytes. This means the calling code no longer has to  
5880 - compute the length in advance, which can be a significant  
5881 - simplification. If your code fails to compile because of the  
5882 - extra argument and you don't want to make other changes to your  
5883 - code, just omit the argument.  
5884 -  
5885 - - Many methods take ``long long`` instead of other integer types.  
5886 - Most if not all existing code should compile fine with this  
5887 - change since such parameters had always previously been smaller  
5888 - types. This change was required to support files larger than two  
5889 - gigabytes in size.  
5890 -  
5891 - - Support has been added for large files. The test suite verifies  
5892 - support for files larger than 4 gigabytes, and manual testing has  
5893 - verified support for files larger than 10 gigabytes. Large file  
5894 - support is available for both 32-bit and 64-bit platforms as long  
5895 - as the compiler and underlying platforms support it.  
5896 -  
5897 - - Support for page selection (splitting and merging PDF files) has  
5898 - been added to the :command:`qpdf` command-line  
5899 - tool. See :ref:`ref.page-selection`.  
5900 -  
5901 - - Options have been added to the :command:`qpdf`  
5902 - command-line tool for copying encryption parameters from another  
5903 - file. See :ref:`ref.basic-options`.  
5904 -  
5905 - - New methods have been added to the ``QPDF`` object for adding and  
5906 - removing pages. See :ref:`ref.adding-and-remove-pages`.  
5907 -  
5908 - - New methods have been added to the ``QPDF`` object for copying  
5909 - objects from other PDF files. See :ref:`ref.foreign-objects`.  
5910 -  
5911 - - A new method ``QPDFObjectHandle::parse`` has been added for  
5912 - constructing ``QPDFObjectHandle`` objects from a string  
5913 - description.  
5914 -  
5915 - - Methods have been added to ``QPDFWriter`` to allow writing to an  
5916 - already open stdio ``FILE*`` in addition to writing to standard  
5917 - output or a named file. Methods have been added to ``QPDF`` to be  
5918 - able to process a file from an already open stdio ``FILE*``. This  
5919 - makes it possible to read and write PDF from secure temporary  
5920 - files that have been unlinked prior to being fully read or  
5921 - written.  
5922 -  
5923 - - The ``QPDF::emptyPDF`` method can be used to allow creation of PDF files  
5924 - from scratch. The example  
5925 - :file:`examples/pdf-create.cc` illustrates how  
5926 - it can be used.  
5927 -  
5928 - - Several methods that take ``PointerHolder<Buffer>`` can now also  
5929 - accept ``std::string`` arguments.  
5930 -  
5931 - - Many new convenience methods have been added to the library, most  
5932 - in ``QPDFObjectHandle``. See :file:`ChangeLog`  
5933 - for a full list.  
5934 -  
5935 - - When building on a platform that supports ELF shared libraries  
5936 - (such as Linux), symbol versions are enabled by default. They can  
5937 - be disabled by passing  
5938 - :samp:`--disable-ld-version-script` to  
5939 - :command:`./configure`.  
5940 -  
5941 - - The file :file:`libqpdf.pc` is now installed  
5942 - to support :command:`pkg-config`.  
5943 -  
5944 - - Image comparison tests are off by default now since they are not  
5945 - needed to verify a correct build or port of qpdf. They are needed  
5946 - only when changing the actual PDF output generated by qpdf. You  
5947 - should enable them if you are making deep changes to qpdf itself.  
5948 - See :file:`README.md` for details.  
5949 -  
5950 - - Large file tests are off by default but can be turned on with  
5951 - :command:`./configure` or by setting an environment  
5952 - variable before running the test suite. See  
5953 - :file:`README.md` for details.  
5954 -  
5955 - - When qpdf's test suite fails, failures are not printed to the  
5956 - terminal anymore by default. Instead, find them in  
5957 - :file:`build/qtest.log`. For packagers who are  
5958 - building with an autobuilder, you can add the  
5959 - :samp:`--enable-show-failed-test-output` option to  
5960 - :command:`./configure` to restore the old behavior.  
5961 -  
5962 -2.3.1: December 28, 2011  
5963 - - Fix thread-safety problem resulting from non-thread-safe use of  
5964 - the PCRE library.  
5965 -  
5966 - - Made a few minor documentation fixes.  
5967 -  
5968 - - Add workaround for a bug that appears in some versions of  
5969 - ghostscript to the test suite.  
5970 -  
5971 - - Fix minor build issue for Visual C++ 2010.  
5972 -  
5973 -2.3.0: August 11, 2011  
5974 - - Bug fix: when preserving existing encryption on encrypted files  
5975 - with cleartext metadata, older qpdf versions would generate  
5976 - password-protected files with no valid password. This operation  
5977 - now works. This bug only affected files created by copying  
5978 - existing encryption parameters; explicit encryption with  
5979 - specification of cleartext metadata worked before and continues to  
5980 - work.  
5981 -  
5982 - - Enhance ``QPDFWriter`` with a new constructor that allows you to  
5983 - delay the specification of the output file. When using this  
5984 - constructor, you may now call ``QPDFWriter::setOutputFilename`` to  
5985 - specify the output file, or you may use  
5986 - ``QPDFWriter::setOutputMemory`` to cause ``QPDFWriter`` to write  
5987 - the resulting PDF file to a memory buffer. You may then use  
5988 - ``QPDFWriter::getBuffer`` to retrieve the memory buffer.  
5989 -  
5990 - - Add new API call ``QPDF::replaceObject`` for replacing objects by  
5991 - object ID.  
5992 -  
5993 - - Add new API call ``QPDF::swapObjects`` for swapping two objects by  
5994 - object ID.  
5995 -  
5996 - - Add ``QPDFObjectHandle::getDictAsMap`` and  
5997 - ``QPDFObjectHandle::getArrayAsVector`` to allow retrieval of  
5998 - dictionary objects as maps and array objects as vectors.  
5999 -  
6000 - - Add functions ``qpdf_get_info_key`` and ``qpdf_set_info_key`` to  
6001 - the C API for manipulating string fields of the document's  
6002 - ``/Info`` dictionary.  
6003 -  
6004 - - Add functions ``qpdf_init_write_memory``,  
6005 - ``qpdf_get_buffer_length``, and ``qpdf_get_buffer`` to the C API  
6006 - for writing PDF files to a memory buffer instead of a file.  
6007 -  
6008 -2.2.4: June 25, 2011  
6009 - - Fix installation and compilation issues; no functionality changes.  
6010 -  
6011 -2.2.3: April 30, 2011  
6012 - - Handle some damaged streams with incorrect characters following  
6013 - the stream keyword.  
6014 -  
6015 - - Improve handling of inline images when normalizing content  
6016 - streams.  
6017 -  
6018 - - Enhance error recovery to properly handle files that use object 0  
6019 - as a regular object, which is specifically disallowed by the spec.  
6020 -  
6021 -2.2.2: October 4, 2010  
6022 - - Add new function ``qpdf_read_memory`` to the C API to call  
6023 - ``QPDF::processMemoryFile``. This was an omission in qpdf 2.2.1.  
6024 -  
6025 -2.2.1: October 1, 2010  
6026 - - Add new method ``QPDF::setOutputStreams`` to replace ``std::cout``  
6027 - and ``std::cerr`` with other streams for generation of diagnostic  
6028 - messages and error messages. This can be useful for GUIs or other  
6029 - applications that want to capture any output generated by the  
6030 - library to present to the user in some other way. Note that QPDF  
6031 - does not write to ``std::cout`` (or the specified output stream)  
6032 - except where explicitly mentioned in  
6033 - :file:`QPDF.hh`, and that the only use of the  
6034 - error stream is for warnings. Note also that output of warnings is  
6035 - suppressed when ``setSuppressWarnings(true)`` is called.  
6036 -  
6037 - - Add new method ``QPDF::processMemoryFile`` for operating on PDF  
6038 - files that are loaded into memory rather than in a file on disk.  
6039 -  
6040 - - Give a warning but otherwise ignore empty PDF objects by treating  
6041 - them as null. Empty objects are not permitted by the PDF  
6042 - specification but have been known to appear in some actual PDF  
6043 - files.  
6044 -  
6045 - - Handle inline image filter abbreviations when they appear as stream  
6046 - filter abbreviations. The PDF specification does not allow use of  
6047 - stream filter abbreviations in this way, but Adobe Reader and some  
6048 - other PDF readers accept them since they sometimes appear  
6049 - incorrectly in actual PDF files.  
6050 -  
6051 - - Implement miscellaneous enhancements to ``PointerHolder`` and  
6052 - ``Buffer`` to support other changes.  
6053 -  
6054 -2.2.0: August 14, 2010  
6055 - - Add new methods to ``QPDFObjectHandle`` (``newStream`` and  
6056 - ``replaceStreamData``) for creating new streams and replacing  
6057 - stream data. This makes it possible to perform a wide range of  
6058 - operations that were not previously possible.  
6059 -  
6060 - - Add new helper method in ``QPDFObjectHandle``  
6061 - (``addPageContents``) for appending or prepending new content  
6062 - streams to a page. This method makes it possible to manipulate  
6063 - content streams without having to be concerned whether a page's  
6064 - contents are a single stream or an array of streams.  
6065 -  
6066 - - Add new method in ``QPDFObjectHandle``: ``replaceOrRemoveKey``,  
6067 - which replaces a dictionary key with a given value unless the  
6068 - value is null, in which case it removes the key instead.  
6069 -  
6070 - - Add new method in ``QPDFObjectHandle``: ``getRawStreamData``,  
6071 - which returns the raw (unfiltered) stream data into a buffer. This  
6072 - complements the ``getStreamData`` method, which returns the  
6073 - filtered (uncompressed) stream data and can only be used when the  
6074 - stream's data is filterable.  
6075 -  
6076 - - Provide two new examples:  
6077 - :command:`pdf-double-page-size` and  
6078 - :command:`pdf-invert-images` that illustrate the  
6079 - newly added interfaces.  
6080 -  
6081 - - Fix a memory leak that would cause loss of a few bytes for every  
6082 - object involved in a cycle of object references. Thanks to Jian Ma  
6083 - for calling my attention to the leak.  
6084 -  
6085 -2.1.5: April 25, 2010  
6086 - - Remove restriction of file identifier strings to 16 bytes. This  
6087 - unnecessary restriction was preventing qpdf from being able to  
6088 - encrypt or decrypt files with identifier strings that were not  
6089 - exactly 16 bytes long. The specification imposes no such  
6090 - restriction.  
6091 -  
6092 -2.1.4: April 18, 2010  
6093 - - Apply the same padding calculation fix from version 2.1.2 to the  
6094 - main cross reference stream as well.  
6095 -  
6096 - - Since :command:`qpdf --check` only performs limited  
6097 - checks, clarify the output to make it clear that there still may  
6098 - be errors that qpdf can't check. This should make it less  
6099 - surprising to people when another PDF reader is unable to read a  
6100 - file that qpdf thinks is okay.  
6101 -  
6102 -2.1.3: March 27, 2010  
6103 - - Fix bug that could cause a failure when rewriting PDF files that  
6104 - contain object streams with unreferenced objects that in turn  
6105 - reference indirect scalars.  
6106 -  
6107 - - Don't complain about (invalid) AES streams that aren't a multiple  
6108 - of 16 bytes. Instead, pad them before decrypting.  
6109 -  
6110 -2.1.2: January 24, 2010  
6111 - - Fix bug in padding around first half cross reference stream in  
6112 - linearized files. The bug could cause an assertion failure when  
6113 - linearizing certain unlucky files.  
6114 -  
6115 -2.1.1: December 14, 2009  
6116 - - No changes in functionality; insert missing include in an internal  
6117 - library header file to support gcc 4.4, and update test suite to  
6118 - ignore broken Adobe Reader installations.  
6119 -  
6120 -2.1: October 30, 2009  
6121 - - This is the first version of qpdf to include Windows support. On  
6122 - Windows, it is possible to build a DLL. Additionally, a partial  
6123 - C-language API has been introduced, which makes it possible to  
6124 - call qpdf functions from non-C++ environments. I am very grateful  
6125 - to Žarko Gajić (http://zarko-gajic.iz.hr/) for tirelessly testing  
6126 - numerous pre-release versions of this DLL and providing many  
6127 - excellent suggestions on improving the interface.  
6128 -  
6129 - For programming to the C interface, please see the header file  
6130 - :file:`qpdf/qpdf-c.h` and the example  
6131 - :file:`examples/pdf-linearize.c`.  
6132 -  
6133 - - Žarko Gajić has written a Delphi wrapper for qpdf, which can be  
6134 - downloaded from qpdf's download site. Žarko's Delphi wrapper is  
6135 - released with the same licensing terms as qpdf itself and comes  
6136 - with this disclaimer: "Delphi wrapper unit  
6137 - :file:`qpdf.pas` created by Žarko Gajić  
6138 - (http://zarko-gajic.iz.hr/). Use at your own risk and for whatever  
6139 - purpose you want. No support is provided. Sample code is  
6140 - provided."  
6141 -  
6142 - - Support has been added for AES encryption and crypt filters.  
6143 - Although qpdf does not presently support files that use PKI-based  
6144 - encryption, with the addition of AES and crypt filters, qpdf is  
6145 - now able to open most encrypted files created with newer  
6146 - versions of Acrobat or other PDF creation software. Note that I  
6147 - have not been able to get very many files encrypted in this way,  
6148 - so it's possible there could still be some cases that qpdf can't  
6149 - handle. Please report them if you find them.  
6150 -  
6151 - - Many error messages have been improved to include more information  
6152 - in hopes of making qpdf a more useful tool for PDF experts to use  
6153 - in manually recovering damaged PDF files.  
6154 -  
6155 - - Attempt to avoid compressing metadata streams if possible. This is  
6156 - consistent with other PDF creation applications.  
6157 -  
6158 - - Provide new command-line options for AES encryption, cleartext  
6159 - metadata, and setting the minimum and forced PDF versions of  
6160 - output files.  
6161 -  
6162 - - Add additional methods to the ``QPDF`` object for querying the  
6163 - document's permissions. Although qpdf does not enforce these  
6164 - permissions, it does make them available so that applications that  
6165 - use qpdf can enforce permissions.  
6166 -  
6167 - - The :samp:`--check` option to  
6168 - :command:`qpdf` has been extended to include some  
6169 - additional information.  
6170 -  
6171 - - *Non-compatible API changes:*  
6172 -  
6173 - - QPDF's exception handling mechanism now uses  
6174 - ``std::logic_error`` for internal errors and  
6175 - ``std::runtime_error`` for runtime errors in favor of the now  
6176 - removed ``QEXC`` classes used in previous versions. The ``QEXC``  
6177 - exception classes predated the addition of the  
6178 - :file:`<stdexcept>` header file to the C++ standard library.  
6179 - Most of the exceptions thrown by the qpdf library itself are  
6180 - still of type ``QPDFExc`` which is now derived from  
6181 - ``std::runtime_error``. Programs that catch an instance of  
6182 - ``std::exception`` and display it by calling the ``what()``  
6183 - method will not need to be changed.  
6184 -  
6185 - - The ``QPDFExc`` class now internally represents various fields  
6186 - of the error condition and provides interfaces for querying  
6187 - them. Among the fields is a numeric error code that can help  
6188 - applications act differently on (a small number of) different  
6189 - error conditions. See :file:`QPDFExc.hh` for details.  
6190 -  
6191 - - Warnings can be retrieved from qpdf as instances of ``QPDFExc``  
6192 - instead of strings.  
6193 -  
6194 - - The nested ``QPDF::EncryptionData`` class's constructor takes an  
6195 - additional argument. This class is primarily intended to be used  
6196 - by ``QPDFWriter``. There's not really anything useful an  
6197 - end-user application could do with it. It probably shouldn't  
6198 - really be part of the public interface to begin with. Likewise,  
6199 - some of the methods for computing internal encryption dictionary  
6200 - parameters have changed to support ``/R=4`` encryption.  
6201 -  
6202 - - The method ``QPDF::getUserPassword`` has been removed since it  
6203 - didn't do what people would think it did. There are now two new  
6204 - methods: ``QPDF::getPaddedUserPassword`` and  
6205 - ``QPDF::getTrimmedUserPassword``. The first one does what the  
6206 - old ``QPDF::getUserPassword`` method used to do, which is to  
6207 - return the password with possible binary padding as specified by  
6208 - the PDF specification. The second one returns a human-readable  
6209 - password string.  
6210 -  
6211 - - The enumerated types that used to be nested in ``QPDFWriter``  
6212 - have moved to top-level enumerated types and are now defined in  
6213 - the file :file:`qpdf/Constants.h`. This enables them to be  
6214 - shared by both the C and C++ interfaces.  
6215 -  
6216 -2.0.6: May 3, 2009  
6217 - - Do not attempt to uncompress streams that have decode parameters  
6218 - we don't recognize. Earlier versions of qpdf would have rejected  
6219 - files with such streams.  
6220 -  
6221 -2.0.5: March 10, 2009  
6222 - - Improve error handling in the LZW decoder, and fix a small error  
6223 - introduced in the previous version with regard to handling full  
6224 - tables. The LZW decoder has been more strongly verified in this  
6225 - release.  
6226 -  
6227 -2.0.4: February 21, 2009  
6228 - - Include proper support for LZW streams encoded without the "early  
6229 - code change" flag. Special thanks to Atom Smasher who reported the  
6230 - problem and provided an input file compressed in this way, which I  
6231 - did not previously have.  
6232 -  
6233 - - Implement some improvements to file recovery logic.  
6234 -  
6235 -2.0.3: February 15, 2009  
6236 - - Compile cleanly with gcc 4.4.  
6237 -  
6238 - - Handle strings encoded as UTF-16BE properly.  
6239 -  
6240 -2.0.2: June 30, 2008  
6241 - - Update test suite to work properly with a  
6242 - non-:command:`bash`  
6243 - :file:`/bin/sh` and with Perl 5.10. No changes  
6244 - were made to the actual qpdf source code itself for this release.  
6245 -  
6246 -2.0.1: May 6, 2008  
6247 - - No changes in functionality or interface. This release includes  
6248 - fixes to the source code so that qpdf compiles properly and passes  
6249 - its test suite on a broader range of platforms. See  
6250 - :file:`ChangeLog` in the source distribution  
6251 - for details.  
6252 -  
6253 -2.0: April 29, 2008  
6254 - - First public release.  
6255 -  
6256 -.. _acknowledgments:  
6257 -  
6258 -Acknowledgment  
6259 -==============  
6260 -  
6261 -QPDF was originally created in 2001 and modified periodically between  
6262 -2001 and 2005 during my employment at `Apex CoVantage  
6263 -<http://www.apexcovantage.com>`__. Upon my departure from Apex, the  
6264 -company graciously allowed me to take ownership of the software and  
6265 -continue maintaining it as an open source project, a decision for which I  
6266 -am very grateful. I have made considerable enhancements to it since  
6267 -that time. I feel fortunate to have worked for people who would make  
6268 -such a decision. This work would not have been possible without their  
6269 -support.
  12 + overview
  13 + license
  14 + installation
  15 + cli
  16 + qdf
  17 + library
  18 + weak-crypto
  19 + json
  20 + design
  21 + linearization
  22 + object-streams
  23 + release-notes
  24 + acknowledgement
manual/installation.rst 0 → 100644
  1 +.. _ref.installing:
  2 +
  3 +Building and Installing QPDF
  4 +============================
  5 +
  6 +This chapter describes how to build and install qpdf. Please see also
  7 +the :file:`README.md` and
  8 +:file:`INSTALL` files in the source distribution.
  9 +
  10 +.. _ref.prerequisites:
  11 +
  12 +System Requirements
  13 +-------------------
  14 +
  15 +The qpdf package has few external dependencies. In order to build qpdf,
  16 +the following packages are required:
  17 +
  18 +- A C++ compiler that supports C++-14.
  19 +
  20 +- zlib: http://www.zlib.net/
  21 +
  22 +- jpeg: http://www.ijg.org/files/ or https://libjpeg-turbo.org/
  23 +
  24 +- *Recommended but not required:* gnutls: https://www.gnutls.org/ to be
  25 + able to use the gnutls crypto provider, and/or openssl:
  26 + https://openssl.org/ to be able to use the openssl crypto provider.
  27 +
  28 +- gnu make 3.81 or newer: http://www.gnu.org/software/make
  29 +
  30 +- perl version 5.8 or newer: http://www.perl.org/; required for running
  31 + the test suite. Starting with qpdf version 9.1.1, perl is no longer
  32 + required at runtime.
  33 +
  34 +- GNU diffutils (any version): http://www.gnu.org/software/diffutils/
  35 + is required to run the test suite. Note that this is the version of
  36 + diff present on virtually all GNU/Linux systems. This is required
  37 + because the test suite uses :command:`diff -u`.
  38 +
  39 +Part of qpdf's test suite does comparisons of the contents of PDF files by
  40 +converting them to images and comparing the images. The image comparison
  41 +tests are disabled by default. Those tests are not required for
  42 +determining correctness of a qpdf build if you have not modified the
  43 +code since the test suite also contains expected output files that are
  44 +compared literally. The image comparison tests provide an extra check to
  45 +make sure that any content transformations don't break the rendering of
  46 +pages. Transformations that affect the content streams themselves are
  47 +off by default and are only provided to help developers look into the
  48 +contents of PDF files. If you are making deep changes to the library
  49 +that cause changes in the contents of the files that qpdf generates,
  50 +then you should enable the image comparison tests. Enable them by
  51 +running :command:`configure` with the
  52 +:samp:`--enable-test-compare-images` flag. If you enable
  53 +this, the following additional packages are required by the test
  54 +suite. Note that in no case are these items required to use qpdf.
  55 +
  56 +- libtiff: http://www.remotesensing.org/libtiff/
  57 +
  58 +- GhostScript version 8.60 or newer: http://www.ghostscript.com
  59 +
  60 +If you do not enable this, then you do not need to have tiff and
  61 +ghostscript.
  62 +
  63 +Pre-built documentation is distributed with qpdf, so you should
  64 +generally not need to rebuild the documentation. In order to build the
  65 +documentation from source, you need to install `Sphinx
  66 +<https://sphinx-doc.org>`__. To build the PDF version of the
  67 +documentation, you need `pdflatex`, `latexmk`, and a fairly complete
  68 +LaTeX installation. Detailed requirements can be found in the Sphinx
  69 +documentation.
  70 +
  71 +.. _ref.building:
  72 +
  73 +Build Instructions
  74 +------------------
  75 +
  76 +Building qpdf on UNIX is generally just a matter of running
  77 +
  78 +::
  79 +
  80 + ./configure
  81 + make
  82 +
  83 +You can also run :command:`make check` to run the test
  84 +suite and :command:`make install` to install. Please run
  85 +:command:`./configure --help` for options on what can be
  86 +configured. You can also set the value of ``DESTDIR`` during
  87 +installation to install to a temporary location, as is common with many
  88 +open source packages. Please see also the
  89 +:file:`README.md` and
  90 +:file:`INSTALL` files in the source distribution.
  91 +
  92 +Building on Windows is a little bit more complicated. For details,
  93 +please see :file:`README-windows.md` in the source
  94 +distribution. You can also download a binary distribution for Windows.
  95 +There is a port of qpdf to Visual C++ version 6 in the
  96 +:file:`contrib` area generously contributed by Jian
  97 +Ma. This is also discussed in more detail in
  98 +:file:`README-windows.md`.
  99 +
  100 +While ``wchar_t`` is part of the C++ standard, qpdf uses it in only one
  101 +place in the public API, and it's just in a helper function. It is
  102 +possible to build qpdf on a system that doesn't have ``wchar_t``, and
  103 +it's also possible to compile a program that uses qpdf on a system
  104 +without ``wchar_t`` as long as you don't call that one method. This is a
  105 +very unusual situation. For a detailed discussion, please see the
  106 +top-level README.md file in qpdf's source distribution.
  107 +
  108 +There are some other things you can do with the build. Although qpdf
  109 +uses :command:`autoconf`, it does not use
  110 +:command:`automake` but instead uses a
  111 +hand-crafted non-recursive Makefile that requires gnu make. If you're
  112 +really interested, please read the comments in the top-level
  113 +:file:`Makefile`.
  114 +
  115 +.. _ref.crypto:
  116 +
  117 +Crypto Providers
  118 +----------------
  119 +
  120 +Starting with qpdf 9.1.0, the qpdf library can be built with multiple
  121 +implementations of providers of cryptographic functions, which we refer
  122 +to as "crypto providers." At the time of writing, a crypto
  123 +implementation must provide MD5 and SHA2 (256, 384, and 512-bit) hashes
  124 +and RC4 and AES256 with and without CBC encryption. In the future, if
  125 +digital signature is added to qpdf, there may be additional requirements
  126 +beyond this.
  127 +
  128 +Starting with qpdf version 9.1.0, the available implementations are
  129 +``native`` and ``gnutls``. In qpdf 10.0.0, ``openssl`` was added.
  130 +Additional implementations may be added if needed. It is also possible
  131 +for a developer to provide their own implementation without modifying
  132 +the qpdf library.
  133 +
  134 +.. _ref.crypto.build:
  135 +
  136 +Build Support For Crypto Providers
  137 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  138 +
  139 +When building with qpdf's build system, crypto providers can be enabled
  140 +at build time using various :command:`./configure`
  141 +options. The default behavior is for
  142 +:command:`./configure` to discover which crypto providers
  143 +can be supported based on available external libraries, to build all
  144 +available crypto providers, and to use an external provider as the
  145 +default over the native one. This behavior can be changed with the
  146 +following flags to :command:`./configure`:
  147 +
  148 +- :samp:`--enable-crypto-{x}`
  149 + (where :samp:`{x}` is a supported crypto
  150 + provider): enable the :samp:`{x}` crypto
  151 + provider, requiring any external dependencies it needs
  152 +
  153 +- :samp:`--disable-crypto-{x}`:
  154 + disable the :samp:`{x}` provider, and do not
  155 + link against its dependencies even if they are available
  156 +
  157 +- :samp:`--with-default-crypto={x}`:
  158 + make :samp:`{x}` the default provider even if
  159 + a higher priority one is available
  160 +
  161 +- :samp:`--disable-implicit-crypto`: only build crypto
  162 + providers that are explicitly requested with an
  163 + :samp:`--enable-crypto-{x}`
  164 + option
  165 +
  166 +For example, if you want to guarantee that the gnutls crypto provider is
  167 +used and that the native provider is not built, you could run
  168 +:command:`./configure --enable-crypto-gnutls
  169 +--disable-implicit-crypto`.
  170 +
  171 +If you build qpdf using your own build system, in order for qpdf to work
  172 +at all, you need to enable at least one crypto provider. The file
  173 +:file:`libqpdf/qpdf/qpdf-config.h.in` provides
  174 +macros ``DEFAULT_CRYPTO``, whose value must be a string naming the
  175 +default crypto provider, and various symbols starting with
  176 +``USE_CRYPTO_``, at least one of which has to be enabled. Additionally,
  177 +you must compile the source files that implement a crypto provider. To
  178 +get a list of those files, look at
  179 +:file:`libqpdf/build.mk`. If you want to omit a
  180 +particular crypto provider, as long as its ``USE_CRYPTO_`` symbol is
  181 +undefined, you can completely ignore the source files that belong to a
  182 +particular crypto provider. Additionally, crypto providers may have
  183 +their own external dependencies that can be omitted if the crypto
  184 +provider is not used. For example, if you are building qpdf yourself and
  185 +are using an environment that does not support gnutls or openssl, you
  186 +can ensure that ``USE_CRYPTO_NATIVE`` is defined, ``USE_CRYPTO_GNUTLS``
  187 +is not defined, and ``DEFAULT_CRYPTO`` is defined to ``"native"``. Then
  188 +you must include the source files used in the native implementation,
  189 +some of which were added or renamed from earlier versions, to your
  190 +build, and you can ignore
  191 +:file:`QPDFCrypto_gnutls.cc`. Always consult
  192 +:file:`libqpdf/build.mk` to get the list of source
  193 +files you need to build.
  194 +
  195 +.. _ref.crypto.runtime:
  196 +
  197 +Runtime Crypto Provider Selection
  198 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  199 +
  200 +You can use the :samp:`--show-crypto` option to
  201 +:command:`qpdf` to get a list of available crypto
  202 +providers. The default provider is always listed first, and the rest are
  203 +listed in lexical order. Each crypto provider is listed on a line by
  204 +itself with no other text, enabling the output of this command to be
  205 +used easily in scripts.
  206 +
  207 +You can override which crypto provider is used by setting the
  208 +``QPDF_CRYPTO_PROVIDER`` environment variable. There are few reasons to
  209 +ever do this, but you might want to do it if you were explicitly trying
  210 +to compare behavior of two different crypto providers while testing
  211 +performance or reproducing a bug. It could also be useful for people who
  212 +are implementing their own crypto providers.
  213 +
  214 +.. _ref.crypto.develop:
  215 +
  216 +Crypto Provider Information for Developers
  217 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  218 +
  219 +If you are writing code that uses libqpdf and you want to force a
  220 +certain crypto provider to be used, you can call the method
  221 +``QPDFCryptoProvider::setDefaultProvider``. The argument is the name of
  222 +a built-in or developer-supplied provider. To add your own crypto
  223 +provider, you have to create a class derived from ``QPDFCryptoImpl`` and
  224 +register it with ``QPDFCryptoProvider``. For additional information, see
  225 +comments in :file:`include/qpdf/QPDFCryptoImpl.hh`.
  226 +
  227 +.. _ref.crypto.design:
  228 +
  229 +Crypto Provider Design Notes
  230 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  231 +
  232 +This section describes a few bits of rationale for why the crypto
  233 +provider interface was set up the way it was. You don't need to know any
  234 +of this information, but it's provided for the record and in case it's
  235 +interesting.
  236 +
  237 +As a general rule, I want to avoid as much as possible including large
  238 +blocks of code that are conditionally compiled such that, in most
  239 +builds, some code is never built. This is dangerous because it makes it
  240 +very easy for invalid code to creep in unnoticed. As such, I want it to
  241 +be possible to build qpdf with all available crypto providers, and this
  242 +is the way I build qpdf for local development. At the same time, if a
  243 +particular packager feels that it is a security liability for qpdf to
  244 +use crypto functionality from other than a library that gets
  245 +considerable scrutiny for this specific purpose (such as gnutls,
  246 +openssl, or nettle), then I want to give that packager the ability to
  247 +completely disable qpdf's native implementation. Or if someone wants to
  248 +avoid adding a dependency on one of the external crypto providers, I
  249 +don't want the availability of the provider to impose additional
  250 +external dependencies within that environment. Both of these are
  251 +situations that I know to be true for some users of qpdf.
  252 +
  253 +I want registration and selection of crypto providers to be thread-safe,
  254 +and I want it to work deterministically for a developer to provide their
  255 +own crypto provider and be able to set it up as the default. This was
  256 +the primary motivation behind requiring C++-11 as doing so enabled me to
  257 +exploit the guaranteed thread safety of local block static
  258 +initialization. The ``QPDFCryptoProvider`` class uses a singleton
  259 +pattern with thread-safe initialization to create the singleton instance
  260 +of ``QPDFCryptoProvider`` and exposes only static methods in its public
  261 +interface. In this way, if a developer wants to call any
  262 +``QPDFCryptoProvider`` methods, the library guarantees the
  263 +``QPDFCryptoProvider`` is fully initialized and all built-in crypto
  264 +providers are registered. Making ``QPDFCryptoProvider`` actually know
  265 +about all the built-in providers may seem a bit sad at first, but this
  266 +choice makes it extremely clear exactly what the initialization behavior
  267 +is. There's no question about provider implementations automatically
  268 +registering themselves in a nondeterministic order. It also means that
  269 +implementations do not need to know anything about the provider
  270 +interface, which makes them easier to test in isolation. Another
  271 +advantage of this approach is that a developer who wants to develop
  272 +their own crypto provider can do so in complete isolation from the qpdf
  273 +library and, with just two calls, can make qpdf use their provider in
  274 +their application. If they decided to contribute their code, plugging it
  275 +into the qpdf library would require a very small change to qpdf's source
  276 +code.
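
The initialization scheme described above can be sketched as follows. This is only an illustration of the technique (a singleton created through a function-local static, exposing only static methods, with the built-in provider registered in the constructor). The names ``ProviderRegistry``, ``CryptoImpl``, ``NativeImpl``, and ``ExampleImpl`` are hypothetical stand-ins and are not qpdf's actual classes; see :file:`include/qpdf/QPDFCryptoImpl.hh` for the real interface.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical stand-in for a crypto implementation interface.
class CryptoImpl
{
  public:
    virtual ~CryptoImpl() = default;
    virtual std::string name() const = 0;
};

// Hypothetical built-in implementation.
class NativeImpl: public CryptoImpl
{
  public:
    std::string name() const override { return "native"; }
};

// Hypothetical developer-supplied implementation.
class ExampleImpl: public CryptoImpl
{
  public:
    std::string name() const override { return "example"; }
};

class ProviderRegistry
{
  public:
    using Factory = std::function<std::shared_ptr<CryptoImpl>()>;

    // Register a factory under a name, replacing any earlier one.
    static void registerImpl(std::string const& name, Factory f)
    {
        auto& r = instance();
        std::lock_guard<std::mutex> lock(r.mutex_);
        r.factories_[name] = std::move(f);
    }

    // Select which registered provider getImpl() returns.
    static void setDefaultProvider(std::string const& name)
    {
        auto& r = instance();
        std::lock_guard<std::mutex> lock(r.mutex_);
        if (r.factories_.count(name) == 0) {
            throw std::logic_error("unknown crypto provider: " + name);
        }
        r.default_ = name;
    }

    // Create an instance of the current default provider.
    static std::shared_ptr<CryptoImpl> getImpl()
    {
        auto& r = instance();
        std::lock_guard<std::mutex> lock(r.mutex_);
        return r.factories_.at(r.default_)();
    }

  private:
    // Built-in providers are registered here, in a fixed order.
    ProviderRegistry() : default_("native")
    {
        factories_["native"] = [] { return std::make_shared<NativeImpl>(); };
    }

    // C++11 guarantees that this local static is initialized exactly
    // once, even if several threads reach this point concurrently.
    static ProviderRegistry& instance()
    {
        static ProviderRegistry the_instance;
        return the_instance;
    }

    std::mutex mutex_;
    std::map<std::string, Factory> factories_;
    std::string default_;
};
```

In this sketch, a developer would make the registry use their own provider with two calls: ``ProviderRegistry::registerImpl("example", [] { return std::make_shared<ExampleImpl>(); })`` followed by ``ProviderRegistry::setDefaultProvider("example")``.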
  277 +
  278 +The decision to make the crypto provider selectable at runtime was one I
  279 +struggled with a little, but I decided to do it for various reasons.
  280 +Allowing an end user to switch crypto providers easily could be very
  281 +useful for reproducing a potential bug. If a user reports a bug that
  282 +some cryptographic thing is broken, I can easily ask that person to try
  283 +with the ``QPDF_CRYPTO_PROVIDER`` variable set to different values. The
  284 +same could apply in the event of a performance problem. This also makes
  285 +it easier for qpdf's own test suite to exercise code with different
  286 +providers without having to make every program that links with qpdf
  287 +aware of the possibility of multiple providers. In qpdf's continuous
  288 +integration environment, the entire test suite is run for each supported
  289 +crypto provider. This is made simple by being able to select the
  290 +provider using an environment variable.
  291 +
  292 +Finally, making crypto providers selectable in this way establishes a
  293 +pattern that I may follow again in the future for stream filter
  294 +providers. One could imagine a future enhancement where someone could
  295 +provide their own implementations for basic filters like
  296 +``/FlateDecode`` or for other filters that qpdf doesn't support.
  297 +Implementing the registration functions and internal storage of
  298 +registered providers was also easier using C++-11's functional
  299 +interfaces, which was another reason to require C++-11 at this time.
  300 +
  301 +.. _ref.packaging:
  302 +
  303 +Notes for Packagers
  304 +-------------------
  305 +
  306 +If you are packaging qpdf for an operating system distribution, here are
  307 +some things you may want to keep in mind:
  308 +
  309 +- Starting in qpdf version 9.1.1, qpdf no longer has a runtime
  310 + dependency on perl. This is because fix-qdf was rewritten in C++.
  311 + However, qpdf still has a build-time dependency on perl.
  312 +
  313 +- Make sure you are getting the intended behavior with regard to crypto
  314 + providers. Read :ref:`ref.crypto.build` for details.
  315 +
  316 +- Passing :samp:`--enable-show-failed-test-output` to
  317 + :command:`./configure` will cause any failed test
  318 + output to be written to the console. This can be very useful for
  319 + seeing test failures generated by autobuilders where you can't access
  320 + qtest.log after the fact.
  321 +
  322 +- If qpdf's build environment detects the presence of autoconf and
  323 + related tools, it will check to ensure that automatically generated
  324 + files are up-to-date with recorded checksums and fail if it detects a
  325 + discrepancy. This feature is intended to prevent you from
  326 + accidentally forgetting to regenerate automatic files after modifying
  327 + their sources. If your packaging environment automatically refreshes
  328 + automatic files, it can cause this check to fail. Suppress qpdf's
  329 + checks by passing :samp:`--disable-check-autofiles`
  330 + to :command:`./configure`. This is safe since qpdf's
  331 + :command:`autogen.sh` just runs autotools in the
  332 + normal way.
  333 +
  334 +- QPDF's :command:`make install` does not install
  335 + completion files by default, but as a packager, it's good if you
  336 + install them wherever your distribution expects such files to go. You
  337 + can find completion files to install in the
  338 + :file:`completions` directory.
  339 +
  340 +- Packagers are encouraged to install the source files from the
  341 + :file:`examples` directory along with qpdf
  342 + development packages.
manual/json.rst 0 → 100644
  1 +.. _ref.json:
  2 +
  3 +QPDF JSON
  4 +=========
  5 +
  6 +.. _ref.json-overview:
  7 +
  8 +Overview
  9 +--------
  10 +
  11 +Beginning with qpdf version 8.3.0, the :command:`qpdf`
  12 +command-line program can produce a JSON representation of the
  13 +non-content data in a PDF file. It includes a dump in JSON format of all
  14 +objects in the PDF file excluding the content of streams. This JSON
  15 +representation makes it very easy to look in detail at the structure of
  16 +a given PDF file, and it also provides a great way to work with PDF
  17 +files programmatically from the command-line in languages that can't
  18 +call or link with the qpdf library directly. Note that stream data can
  19 +be extracted from PDF files using other qpdf command-line options.
  20 +
  21 +.. _ref.json-guarantees:
  22 +
  23 +JSON Guarantees
  24 +---------------
  25 +
  26 +The qpdf JSON representation includes a JSON serialization of the raw
  27 +objects in the PDF file as well as some computed information in a more
  28 +easily extracted format. QPDF provides some guarantees about its JSON
  29 +format. These guarantees are designed to simplify the experience of a
  30 +developer working with the JSON format.
  31 +
  32 +Compatibility
  33 + The top-level JSON object output is a dictionary. The JSON output
  34 + contains various nested dictionaries and arrays. With the exception
  35 + of dictionaries that are populated by the fields of objects from the
  36 + file, all instances of a dictionary are guaranteed to have exactly
  37 + the same keys. Future versions of qpdf are free to add additional
  38 + keys but not to remove keys or change the type of object that a key
  39 + points to. The qpdf program validates this guarantee, and in the
  40 + unlikely event that a bug in qpdf should cause it to generate data
  41 + that doesn't conform to this rule, it will ask you to file a bug
  42 + report.
  43 +
  44 + The top-level JSON structure contains a "``version``" key whose value
  45 + is a simple integer. The value of the ``version`` key will be
  46 + incremented if a non-compatible change is made. A non-compatible
  47 + change would be any change that involves removal of a key, a change
  48 + to the format of data pointed to by a key, or a semantic change that
  49 + requires a different interpretation of a previously existing key. A
  50 + strong effort will be made to avoid breaking compatibility.
  51 +
  52 +Documentation
  53 + The :command:`qpdf` command can be invoked with the
  54 + :samp:`--json-help` option. This will output a JSON
  55 + structure that has the same structure as the JSON output that qpdf
  56 + generates, except that each field in the help output is a description
  57 + of the corresponding field in the JSON output. The specific
  58 + guarantees are as follows:
  59 +
  60 + - A dictionary in the help output means that the corresponding
  61 + location in the actual JSON output is also a dictionary with
  62 + exactly the same keys; that is, no keys present in help are absent
  63 + in the real output, and no keys will be present in the real output
  64 + that are not in help. As a special case, if the dictionary has a
  65 + single key whose name starts with ``<`` and ends with ``>``, it
  66 + means that the JSON output is a dictionary that can have any keys,
  67 + each of which conforms to the value of the special key. This is
  68 + used for cases in which the keys of the dictionary are things like
  69 + object IDs.
  70 +
  71 + - A string in the help output is a description of the item that
  72 + appears in the corresponding location of the actual output. The
  73 + corresponding output can have any format.
  74 +
  75 + - An array in the help output always contains a single element. It
  76 + indicates that the corresponding location in the actual output is
  77 + also an array, and that each element of the array has whatever
  78 + format is implied by the single element of the help output's
  79 + array.
  80 +
  81 + For example, the help output includes a "``pagelabels``"
  82 + key whose value is an array of one element. That element is a
  83 + dictionary with keys "``index``" and "``label``". In addition to
  84 + describing the meaning of those keys, this tells you that the actual
  85 + JSON output will contain a ``pagelabels`` array, each of whose
  86 + elements is a dictionary that contains an ``index`` key, a ``label``
  87 + key, and no other keys.
  88 +
  89 +Directness and Simplicity
  90 + The JSON output contains the value of every object in the file, but
  91 + it also contains some processed data. This is analogous to how qpdf's
  92 + library interface works. The processed data is similar to the helper
  93 + functions in that it allows you to look at certain aspects of the PDF
  94 + file without having to understand all the nuances of the PDF
  95 + specification, while the raw objects allow you to mine the PDF for
  96 + anything that the higher-level interfaces are lacking.
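
The three rules listed under Documentation above can be expressed as a small recursive check. The following is a minimal sketch, assuming a hypothetical in-memory tree (``Node``, ``scalar``, ``dictOf``, ``arrayOf``) rather than any real JSON library or qpdf API. It does not account for keys omitted because of :samp:`--json-keys`.

```cpp
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal JSON tree; not a qpdf type.
struct Node;
using NodePtr = std::shared_ptr<Node>;

struct Node
{
    enum Kind { Dict, Array, Scalar };
    Kind kind = Scalar;
    std::map<std::string, NodePtr> members; // used when kind == Dict
    std::vector<NodePtr> elements;          // used when kind == Array
};

// Any string, number, boolean, or null.
NodePtr scalar()
{
    return std::make_shared<Node>();
}

NodePtr dictOf(std::map<std::string, NodePtr> members)
{
    auto n = std::make_shared<Node>();
    n->kind = Node::Dict;
    n->members = std::move(members);
    return n;
}

NodePtr arrayOf(std::vector<NodePtr> elements)
{
    auto n = std::make_shared<Node>();
    n->kind = Node::Array;
    n->elements = std::move(elements);
    return n;
}

// A key such as "<...>" stands for "any key" in the help output.
static bool isWildcardKey(std::string const& key)
{
    return key.size() >= 2 && key.front() == '<' && key.back() == '>';
}

// Return true if `actual` has the shape described by `help`.
bool conforms(Node const& help, Node const& actual)
{
    switch (help.kind) {
    case Node::Scalar:
        // A string in the help is a description; any format is allowed.
        return true;
    case Node::Array:
        // The help array has one element describing every actual element.
        if (actual.kind != Node::Array || help.elements.size() != 1) {
            return false;
        }
        for (auto const& e : actual.elements) {
            if (!conforms(*help.elements[0], *e)) {
                return false;
            }
        }
        return true;
    case Node::Dict:
        if (actual.kind != Node::Dict) {
            return false;
        }
        if (help.members.size() == 1 &&
            isWildcardKey(help.members.begin()->first)) {
            // Any keys are allowed; each value follows the wildcard's value.
            for (auto const& kv : actual.members) {
                if (!conforms(*help.members.begin()->second, *kv.second)) {
                    return false;
                }
            }
            return true;
        }
        // Otherwise the key sets must match exactly.
        if (help.members.size() != actual.members.size()) {
            return false;
        }
        for (auto const& kv : help.members) {
            auto it = actual.members.find(kv.first);
            if (it == actual.members.end() ||
                !conforms(*kv.second, *it->second)) {
                return false;
            }
        }
        return true;
    }
    return false;
}
```

For the ``pagelabels`` example, a help tree built as ``arrayOf({dictOf({{"index", scalar()}, {"label", scalar()}})})`` accepts an array of any length whose elements have exactly the keys ``index`` and ``label``, and rejects an element that is missing ``label``.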
  97 +
  98 +.. _json.limitations:
  99 +
  100 +Limitations of JSON Representation
  101 +----------------------------------
  102 +
  103 +There are a few limitations to be aware of with the JSON structure:
  104 +
  105 +- Strings, names, and indirect object references in the original PDF
  106 + file are all converted to strings in the JSON representation. In the
  107 + case of a "normal" PDF file, you can tell the difference because a
  108 + name starts with a slash (``/``), and an indirect object reference
  109 + looks like ``n n R``, but if there were to be a string that looked
  110 + like a name or indirect object reference, there would be no way to
  111 + tell this from the JSON output. Note that there are certain cases
  112 + where you know for sure what something is, such as knowing that
  113 + dictionary keys in objects are always names and that certain things
  114 + in the higher-level computed data are known to contain indirect
  115 + object references.
  116 +
  117 +- The JSON format doesn't support binary data very well. Mostly the
  118 + details are not important, but they are presented here for
  119 + information. When qpdf outputs a string in the JSON representation,
  120 + it converts the string to UTF-8, assuming usual PDF string semantics.
  121 + Specifically, if the original string is UTF-16, it is converted to
  122 + UTF-8. Otherwise, it is assumed to have PDF doc encoding, and is
  123 + converted to UTF-8 with that assumption. This causes strange things
  124 + to happen to binary strings. For example, if you had the binary
  125 + string ``<038051>``, this would be output to the JSON as ``\u0003•Q``
  126 + because ``03`` is not a printable character and ``80`` is the bullet
  127 + character in PDF doc encoding and is mapped to the Unicode value
  128 + ``2022``. Since ``51`` is ``Q``, it is output as is. If you wanted to
  129 + convert back from here to a binary string, you would have to recognize
  130 + Unicode values whose code points are higher than ``0xFF`` and map
  131 + those back to their corresponding PDF doc encoding characters. There
  132 + is no way to tell the difference between a Unicode string that was
  133 + originally encoded as UTF-16 or one that was converted from PDF doc
  134 + encoding. In other words, it's best if you don't try to use the JSON
  135 + format to extract binary strings from the PDF file, but if you really
  136 + had to, it could be done. Note that qpdf's
  137 + :samp:`--show-object` option does not have this
  138 + limitation and will reveal the string as encoded in the original
  139 + file.
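
The ``<038051>`` example above can be reproduced with a short sketch. This is not qpdf's implementation: the function names are hypothetical, and the only PDF doc encoding mapping included is the one mentioned in the text (``80`` to Unicode ``2022``). Every other byte is passed through as the code point of the same value, which is a simplification.

```cpp
#include <string>

// Append one Unicode code point (U+0000 through U+FFFF) as UTF-8.
static void appendUtf8(std::string& out, unsigned long cp)
{
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: treat each byte as PDF doc encoding and
// produce UTF-8. Only the 0x80 (bullet) mapping is implemented.
std::string pdfDocToUtf8(std::string const& bytes)
{
    std::string out;
    for (unsigned char ch : bytes) {
        unsigned long cp = ch;
        if (ch == 0x80) {
            cp = 0x2022; // bullet
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

Given the three bytes ``03 80 51``, this produces the byte ``03``, the three-byte UTF-8 encoding of U+2022, and ``Q``. A JSON writer would then escape the control character as ``\u0003``. Because the bullet is indistinguishable afterwards from one that came from a UTF-16 string, the original bytes cannot be recovered with certainty.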
  140 +
  141 +.. _json.considerations:
  142 +
  143 +JSON: Special Considerations
  144 +----------------------------
  145 +
  146 +For the most part, the built-in JSON help tells you everything you need
  147 +to know about the JSON format, but there are a few non-obvious things to
  148 +be aware of:
  149 +
  150 +- While qpdf guarantees that keys present in the help will be present
  151 + in the output, those fields may be null or empty if the information
  152 + is not known or absent in the file. Also, if you specify
  153 + :samp:`--json-keys`, the keys that are not listed
  154 + will be excluded entirely except for those that
  155 + :samp:`--json-help` says are always present.
  156 +
  157 +- In a few places, there are keys with names containing
  158 + ``pageposfrom1``. The values of these keys are null or an integer. If
  159 + an integer, they point to a page index within the file numbering from
  160 + 1. Note that JSON indexes from 0, and you would also use 0-based
  161 + indexing using the API. However, 1-based indexing is easier in this
  162 + case because the command-line syntax for specifying page ranges is
  163 + 1-based. If you were going to write a program that looked through the
  164 + JSON for information about specific pages and then use the
  165 + command-line to extract those pages, 1-based indexing is easier.
  166 + Besides, it's more convenient to subtract 1 in a program written in a real
  167 + programming language than it is to add 1 in shell code.
  168 +
  169 +- The image information included in the ``page`` section of the JSON
  170 + output includes the key "``filterable``". Note that the value of this
  171 + field may depend on the :samp:`--decode-level` that
  172 + you invoke qpdf with. The JSON output includes a top-level key
  173 + "``parameters``" that indicates the decode level used for computing
  174 + whether a stream was filterable. For example, jpeg images will be
  175 + shown as not filterable by default, but they will be shown as
  176 + filterable if you run :command:`qpdf --json
  177 + --decode-level=all`.
manual/library.rst 0 → 100644
  1 +.. _ref.using-library:
  2 +
  3 +Using the QPDF Library
  4 +======================
  5 +
  6 +.. _ref.using.from-cxx:
  7 +
  8 +Using QPDF from C++
  9 +-------------------
  10 +
  11 +The source tree for the qpdf package has an
  12 +:file:`examples` directory that contains a few
  13 +example programs. The :file:`qpdf/qpdf.cc` source
  14 +file also serves as a useful example since it exercises almost all of
  15 +the qpdf library's public interface. The best source of documentation on
  16 +the library itself is reading comments in
  17 +:file:`include/qpdf/QPDF.hh`,
  18 +:file:`include/qpdf/QPDFWriter.hh`, and
  19 +:file:`include/qpdf/QPDFObjectHandle.hh`.
  20 +
  21 +All header files are installed in the
  22 +:file:`include/qpdf` directory. It is recommended that
  23 +you use ``#include <qpdf/QPDF.hh>`` rather than adding
  24 +:file:`include/qpdf` to your include path.
  25 +
  26 +When linking against the qpdf static library, you may also need to
  27 +specify ``-lz -ljpeg`` on your link command. If your system understands
  28 +how to read libtool :file:`.la` files, this may not
  29 +be necessary.
  30 +
  31 +The qpdf library is safe to use in a multithreaded program, but no
  32 +individual ``QPDF`` object instance (including ``QPDF``,
  33 +``QPDFObjectHandle``, or ``QPDFWriter``) can be used in more than one
  34 +thread at a time. Multiple threads may simultaneously work with
  35 +different instances of these and all other QPDF objects.
  36 +
  37 +.. _ref.using.other-languages:
  38 +
  39 +Using QPDF from other languages
  40 +-------------------------------
  41 +
  42 +The qpdf library is implemented in C++, which makes it hard to use
  43 +directly in other languages. There are a few things that can help.
  44 +
  45 +"C"
  46 + The qpdf library includes a "C" language interface that provides a
  47 + subset of the overall capabilities. The header file
  48 + :file:`qpdf/qpdf-c.h` includes information about
  49 + its use. As long as you use a C++ linker, you can link C programs
  50 + with qpdf and use the C API. For languages that can directly load
  51 + methods from a shared library, the C API can also be useful. People
  52 + have reported success using the C API from other languages on Windows
  53 + by directly calling functions in the DLL.
  54 +
  55 +Python
  56 + A Python module called
  57 + `pikepdf <https://pypi.org/project/pikepdf/>`__ provides a clean and
  58 + highly functional set of Python bindings to the qpdf library. Using
  59 + pikepdf, you can work with PDF files in a natural way and combine
  60 + qpdf's capabilities with other functionality provided by Python's
  61 + rich standard library and available modules.
  62 +
  63 +Other Languages
  64 + Starting with version 8.3.0, the :command:`qpdf`
  65 + command-line tool can produce a JSON representation of the PDF file's
  66 + non-content data. This can facilitate interacting programmatically
  67 + with PDF files through qpdf's command line interface. For more
  68 + information, please see :ref:`ref.json`.
  69 +
  70 +.. _ref.unicode-files:
  71 +
  72 +A Note About Unicode File Names
  73 +-------------------------------
  74 +
  75 +When strings are passed to qpdf library routines either as ``char*`` or
  76 +as ``std::string``, they are treated as byte arrays except where
  77 +otherwise noted. When Unicode is desired, qpdf wants UTF-8 unless
  78 +otherwise noted in comments in header files. In modern UNIX/Linux
  79 +environments, this generally does the right thing. In Windows, it's a
  80 +bit more complicated. Starting in qpdf 8.4.0, passwords that contain
  81 +Unicode characters are handled much better, and starting in qpdf 8.4.1,
  82 +the library attempts to properly handle Unicode characters in filenames.
  83 +In particular, in Windows, if a UTF-8 encoded string is used as a
  84 +filename in either ``QPDF`` or ``QPDFWriter``, it is internally
  85 +converted to ``wchar_t*``, and Unicode-aware Windows APIs are used. As
  86 +such, qpdf will generally operate properly on files with non-ASCII
  87 +characters in their names as long as the filenames are UTF-8 encoded for
  88 +passing into the qpdf library API, but there are still some rough edges,
  89 +such as the encoding of the filenames in error messages or CLI output
  90 +messages. Patches or bug reports are welcome for any continuing issues
  91 +with Unicode file names in Windows.
manual/license.rst 0 → 100644
  1 +.. _ref.license:
  2 +
  3 +License
  4 +=======
  5 +
  6 +QPDF is licensed under `the Apache License, Version 2.0
  7 +<http://www.apache.org/licenses/LICENSE-2.0>`__ (the "License").
  8 +Unless required by applicable law or agreed to in writing, software
  9 +distributed under the License is distributed on an "AS IS" BASIS,
  10 +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
  11 +implied. See the License for the specific language governing
  12 +permissions and limitations under the License.
manual/linearization.rst 0 → 100644
  1 +.. _ref.linearization:
  2 +
  3 +Linearization
  4 +=============
  5 +
  6 +This chapter describes how ``QPDF`` and ``QPDFWriter`` implement
  7 +creation and processing of linearized PDFs.
  8 +
  9 +.. _ref.linearization-strategy:
  10 +
  11 +Basic Strategy for Linearization
  12 +--------------------------------
  13 +
  14 +To avoid the incestuous problem of having the qpdf library validate its
  15 +own linearized files, we have a special linearized file checking mode
  16 +which can be invoked via :command:`qpdf
  17 +--check-linearization` (or :command:`qpdf
  18 +--check`). This mode reads the linearization parameter
  19 +dictionary and the hint streams and validates that object ordering,
  20 +parameters, and hint stream contents are correct. The validation code
  21 +was first tested against linearized files created by external tools
  22 +(Acrobat and pdlin) and then used to validate files created by
  23 +``QPDFWriter`` itself.
  24 +
  25 +.. _ref.linearized.preparation:
  26 +
  27 +Preparing For Linearization
  28 +---------------------------
  29 +
  30 +Before creating a linearized PDF file from any other PDF file, the PDF
  31 +file must be altered such that all page attributes are propagated down
  32 +to the page level (and not inherited from parents in the ``/Pages``
  33 +tree). We also have to know which objects refer to which other objects,
  34 +being concerned with page boundaries and a few other cases. We refer to
  35 +this part of preparing the PDF file as
  36 +*optimization*, discussed in
  37 +:ref:`ref.optimization`. Note that, in this context, the
  38 +term *optimization* is a qpdf term, and the
  39 +term *linearization* is a term from the PDF
  40 +specification. Do not be confused by the fact that many applications
  41 +refer to linearization as optimization or web optimization.
  42 +
  43 +When creating linearized PDF files from optimized PDF files, there are
  44 +really only a few issues that need to be dealt with:
  45 +
  46 +- Creation of hints tables
  47 +
  48 +- Placing objects in the correct order
  49 +
  50 +- Filling in offsets and byte sizes
  51 +
  52 +.. _ref.optimization:
  53 +
  54 +Optimization
  55 +------------
  56 +
  57 +In order to perform various operations such as linearization and
  58 +splitting files into pages, it is necessary to know which objects are
  59 +referenced by which pages, page thumbnails, and root and trailer
  60 +dictionary keys. It is also necessary to ensure that all page-level
  61 +attributes appear directly at the page level and are not inherited from
  62 +parents in the pages tree.
  63 +
  64 +We refer to the process of enforcing these constraints as
  65 +*optimization*. As mentioned above, note
  66 +that some applications refer to linearization as optimization. Although
  67 +this optimization was initially motivated by the need to create
  68 +linearized files, we are using these terms separately.
  69 +
  70 +PDF file optimization is implemented in the
  71 +:file:`QPDF_optimization.cc` source file. That file
  72 +is richly commented and serves as the primary reference for the
  73 +optimization process.
  74 +
  75 +After optimization has been completed, the private member variables
  76 +``obj_user_to_objects`` and ``object_to_obj_users`` in ``QPDF`` have
  77 +been populated. Any object that has more than one value in the
  78 +``object_to_obj_users`` table is shared. Any object that has exactly one
  79 +value in the ``object_to_obj_users`` table is private. To find all the
  80 +private objects in a page or a trailer or root dictionary key, one
  81 +merely has to make this determination for each element in the
  82 +``obj_user_to_objects`` table for the given page or key.
  83 +
  84 +Note that pages and thumbnails have different object user types, so the
  85 +above test on a page will not include objects referenced by the page's
  86 +thumbnail dictionary and nothing else.
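
The shared/private test described above can be sketched in Python. This is only an illustration under stated assumptions: the two tables are modelled as plain dictionaries of sets, and the user labels (``page:1``, ``thumb:1``) and the helper names are hypothetical. In qpdf itself these tables are private members of the C++ ``QPDF`` class.

```python
from collections import defaultdict


def build_reverse_table(obj_user_to_objects):
    """Invert a user -> objects table into an object -> users table."""
    object_to_obj_users = defaultdict(set)
    for user, objects in obj_user_to_objects.items():
        for obj in objects:
            object_to_obj_users[obj].add(user)
    return dict(object_to_obj_users)


def private_objects(user, obj_user_to_objects, object_to_obj_users):
    """Return the objects referenced by `user` and by no other user."""
    return {
        obj
        for obj in obj_user_to_objects[user]
        if len(object_to_obj_users[obj]) == 1
    }


# Hypothetical data: object 20 is used by two pages, so it is shared.
obj_user_to_objects = {
    "page:1": {10, 11, 20},
    "page:2": {12, 20},
    "thumb:1": {30},  # thumbnails are a separate user type from pages
}
object_to_obj_users = build_reverse_table(obj_user_to_objects)
```

With this data, ``private_objects("page:1", ...)`` yields objects 10 and 11. Object 30 is not included because it belongs to the thumbnail user, not the page user.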
  87 +
  88 +.. _ref.linearization.writing:
  89 +
  90 +Writing Linearized Files
  91 +------------------------
  92 +
  93 +We will create files with only primary hint streams. We will never write
  94 +overflow hint streams. (As of PDF version 1.4, Acrobat doesn't either,
  95 +and they are never necessary.) The hint streams contain offset
  96 +information to objects that point to where they would be if the hint
  97 +stream were not present. This means that we have to calculate all object
  98 +positions before we can generate and write the hint table. This means
  99 +that we have to generate the file in two passes. To make this reliable,
  100 +``QPDFWriter`` in linearization mode invokes exactly the same code twice
  101 +to write the file to a pipeline.
  102 +
  103 +In the first pass, the target pipeline is a count pipeline chained to a
  104 +discard pipeline. The count pipeline simply passes its data through to
  105 +the next pipeline in the chain but can return the number of bytes passed
  106 +through it at any intermediate point. The discard pipeline is an end of
  107 +line pipeline that just throws its data away. The hint stream is not
  108 +written and dummy values with adequate padding are stored in the first
  109 +cross reference table, linearization parameter dictionary, and /Prev key
  110 +of the first trailer dictionary. All the offset, length, object
  111 +renumbering information, and anything else we need for the second pass
  112 +is stored.
  113 +
  114 +At the end of the first pass, this information is passed to the ``QPDF``
  115 +class which constructs a compressed hint stream in a memory buffer and
  116 +returns it. ``QPDFWriter`` uses this information to write a complete
  117 +hint stream object into a memory buffer. At this point, the length of
  118 +the hint stream is known.
  119 +
  120 +In the second pass, the end of the pipeline chain is a regular file
  121 +instead of a discard pipeline, and we have known values for all the
  122 +offsets and lengths that we didn't have in the first pass. We have to
  123 +adjust offsets that appear after the start of the hint stream by the
  124 +length of the hint stream, which is known. Anything that is of variable
  125 +length is padded, with the padding code surrounding any writing code
  126 +that differs in the two passes. This ensures that changes to the way
  127 +things are represented never result in offsets that were gathered
  128 +during the first pass becoming incorrect for the second pass.
  129 +
  130 +Using this strategy, we can write linearized files to a non-seekable
  131 +output stream with only a single pass to disk or wherever the output is
  132 +going.
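
The two-pass strategy can be illustrated with a toy sketch in Python. None of this is qpdf code: the ``Count`` and ``Discard`` classes, the ``%TOY`` file layout, and the textual hint are all invented for illustration. What the sketch does share with the description above is the shape of the approach: the same writing function runs twice, variable-length values are padded to a fixed width, and offsets gathered in the first pass are shifted by the length of the hint data.

```python
import io


class Discard:
    """End-of-chain pipeline that throws its data away."""

    def write(self, data):
        pass


class Count:
    """Pass data through to the next pipeline while counting bytes."""

    def __init__(self, next_pipeline):
        self.next = next_pipeline
        self.count = 0

    def write(self, data):
        self.count += len(data)
        self.next.write(data)


def write_body(pipeline, objects, hint=b"", offsets=None):
    """Write a toy file and return each object's offset as observed."""
    seen = []
    pipeline.write(b"%TOY\n")
    pipeline.write(hint)  # empty during the first pass
    for body in objects:
        seen.append(pipeline.count)
        pipeline.write(body)
    # Fixed-width (padded) offsets keep this table the same length in
    # both passes, whatever the actual values turn out to be.
    table = offsets if offsets is not None else [0] * len(objects)
    for off in table:
        pipeline.write(b"%010d\n" % off)
    return seen


def write_linearized(objects, out):
    """Run the same writer twice: once to measure, once for real."""
    first = Count(Discard())
    offsets = write_body(first, objects)
    # The toy hint records where objects would be without the hint.
    hint = b"HINT:" + b",".join(b"%d" % o for o in offsets) + b"\n"
    adjusted = [o + len(hint) for o in offsets]
    second = Count(out)
    actual = write_body(second, objects, hint, adjusted)
    assert actual == adjusted  # first-pass data is still valid
    return adjusted
```

Because the second pass only ever appends to ``out``, the output can be any object with a ``write`` method, including one that cannot seek.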
  133 +
  134 +.. _ref.linearization-data:
  135 +
  136 +Calculating Linearization Data
  137 +------------------------------
  138 +
  139 +Once a file is optimized, we have information about which objects access
  140 +which other objects. We can then process these tables to decide which
  141 +part (as described in "Linearized PDF Document Structure" in the PDF
  142 +specification) each object is contained within. This tells us the exact
  143 +order in which objects are written. The ``QPDFWriter`` class asks for
  144 +this information and enqueues objects for writing in the proper order.
  145 +It also turns on a check that causes an exception to be thrown if an
  146 +object is encountered that has not already been queued. (This could
  147 +happen only if there were a bug in the traversal code used to calculate
  148 +the linearization data.)
  149 +
  150 +.. _ref.linearization-issues:
  151 +
  152 +Known Issues with Linearization
  153 +-------------------------------
  154 +
  155 +There are a handful of known issues with this linearization code. These
  156 +issues do not appear to impact the behavior of linearized files which
  157 +still work as intended: it is possible for a web browser to begin to
  158 +display them before they are fully downloaded. In fact, it seems that
  159 +various other programs that create linearized files have many of these
  160 +same issues. These items make reference to terminology used in the
  161 +linearization appendix of the PDF specification.
  162 +
  163 +- Thread Dictionary information keys appear in part 4 with the rest of
  164 + Threads instead of in part 9. Objects in part 9 are not grouped
  165 + together functionally.
  166 +
  167 +- We are not calculating numerators for shared object positions within
  168 + content streams or interleaving them within content streams.
  169 +
  170 +- We generate only page offset, shared object, and outline hint tables.
  171 + It would be relatively easy to add some additional tables. We gather
  172 + most of the information needed to create thumbnail hint tables. There
  173 + are comments in the code about this.
  174 +
  175 +.. _ref.linearization-debugging:
  176 +
  177 +Debugging Note
  178 +--------------
  179 +
  180 +The :command:`qpdf --show-linearization` command can show
  181 +the complete contents of linearization hint streams. To look at the raw
  182 +data, you can extract the filtered contents of the linearization hint
  183 +tables using :command:`qpdf --show-object=n
  184 +--filtered-stream-data`. Then, to convert this into a bit
  185 +stream (since linearization tables are bit streams written without
  186 +regard to byte boundaries), you can pipe the resulting data through the
  187 +following perl code:
  188 +
  189 +.. code-block:: perl
  190 +
  191 + use bytes;
  192 + binmode STDIN;
  193 + undef $/;
  194 + my $a = <STDIN>;
  195 + my @ch = split(//, $a);
  196 + map { printf("%08b", ord($_)) } @ch;
  197 + print "\n";
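
As a complement to the Perl filter, here is a small Python sketch that both prints the bit string and reads fields of arbitrary bit width from it. It assumes that fields are packed with the high-order bit first. The ``BitReader`` class and ``to_bit_string`` function are names invented for this example, not part of qpdf.

```python
class BitReader:
    """Read unsigned integers of arbitrary bit width from bytes."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            # Take bits from most significant to least significant.
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


def to_bit_string(data: bytes) -> str:
    """Same output as the Perl snippet, minus the trailing newline."""
    return "".join(f"{b:08b}" for b in data)
```

For example, the two bytes ``A5 0F`` form the bit string ``1010010100001111``. Reading a 3-bit field and then a 7-bit field from it crosses the byte boundary without any special handling.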
manual/object-streams.rst 0 → 100644
  1 +.. _ref.object-and-xref-streams:
  2 +
  3 +Object and Cross-Reference Streams
  4 +==================================
  5 +
  6 +This chapter provides information about the implementation of object
  7 +stream and cross-reference stream support in qpdf.
  8 +
  9 +.. _ref.object-streams:
  10 +
  11 +Object Streams
  12 +--------------
  13 +
  14 +Object streams can contain any regular object except the following:
  15 +
  16 +- stream objects
  17 +
  18 +- objects with generation > 0
  19 +
  20 +- the encryption dictionary
  21 +
  22 +- objects containing the /Length of another stream
  23 +
  24 +In addition, Adobe Reader (at least as of version 8.0.0) appears to not
  25 +be able to handle having the document catalog appear in an object stream
  26 +if the file is encrypted, though this is not specifically disallowed by
  27 +the specification.
  28 +
  29 +There are additional restrictions for linearized files. See
  30 +:ref:`ref.object-streams-linearization` for details.
  31 +
  32 +The PDF specification refers to objects in object streams as "compressed
  33 +objects" regardless of whether the object stream is compressed.
  34 +
  35 +The generation number of every object in an object stream must be zero.
  36 +It is possible to delete and replace an object in an object stream with
  37 +a regular object.
  38 +
  39 +The object stream dictionary has the following keys:
  40 +
  41 +- ``/N``: number of objects
  42 +
  43 +- ``/First``: byte offset of first object
  44 +
  45 +- ``/Extends``: indirect reference to stream that this extends
  46 +
  47 +Stream collections are formed with ``/Extends``. They must form a
  48 +directed acyclic graph. These can be used for semantic information and
  49 +are not meaningful to the PDF document's syntactic structure. Although
  50 +qpdf preserves stream collections, it never generates them and doesn't
  51 +make use of this information in any way.
  52 +
  53 +The specification recommends limiting the number of objects in an object
  54 +stream for efficiency in reading and decoding. Acrobat 6 uses no more
  55 +than 100 objects per object stream for linearized files and no more than 200
  56 +objects per stream for non-linearized files. ``QPDFWriter``, in object
  57 +stream generation mode, never puts more than 100 objects in an object
  58 +stream.
  59 +
  60 +Object stream contents consist of *N* pairs of integers, each of which
  61 +is the object number and the byte offset of the object relative to the
  62 +first object in the stream, followed by the objects themselves,
  63 +concatenated.
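
The layout just described can be sketched as a small Python parser. This is a simplified illustration, not qpdf's implementation: it takes already-decoded stream data plus the ``/N`` and ``/First`` values, and it assumes the offsets in the header appear in increasing order. The function name is hypothetical.

```python
def parse_object_stream(data: bytes, n: int, first: int):
    """Split decoded object stream data into {object number: raw bytes}."""
    # The header holds n pairs: object number, offset relative to /First.
    tokens = data[:first].split()
    numbers = [int(tok) for tok in tokens[: 2 * n]]
    pairs = list(zip(numbers[0::2], numbers[1::2]))

    objects = {}
    for i, (objnum, offset) in enumerate(pairs):
        start = first + offset
        # Each object ends where the next begins (assumes sorted offsets).
        end = first + pairs[i + 1][1] if i + 1 < n else len(data)
        objects[objnum] = data[start:end].strip()
    return objects


# Build a tiny stream so that the offsets are correct by construction.
bodies = [b"<< /A 1 >>\n", b"[1 2 3]\n"]
header = b"11 0 12 %d\n" % len(bodies[0])
stream = header + b"".join(bodies)
objs = parse_object_stream(stream, n=2, first=len(header))
```

Here ``objs`` maps object 11 to the dictionary and object 12 to the array.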
  64 +
  65 +.. _ref.xref-streams:
  66 +
  67 +Cross-Reference Streams
  68 +-----------------------
  69 +
  70 +For non-hybrid files, the value following ``startxref`` is the byte
  71 +offset to the xref stream rather than the word ``xref``.
  72 +
  73 +For hybrid files (files containing both xref tables and cross-reference
  74 +streams), the xref table's trailer dictionary contains the key
  75 +``/XRefStm`` whose value is the byte offset to a cross-reference stream
  76 +that supplements the xref table. A PDF 1.5-compliant application should
  77 +read the xref table first. Then it should replace any object that it has
  78 +already seen with any defined in the xref stream. Then it should follow
  79 +any ``/Prev`` pointer in the original xref table's trailer dictionary.
  80 +The specification is not clear about what should be done, if anything,
  81 +with a ``/Prev`` pointer in the xref stream referenced by an xref table.
  82 +The ``QPDF`` class ignores it, which is probably reasonable since, if
  83 +this case were to appear for any sensible PDF file, the previous xref
  84 +table would probably have a corresponding ``/XRefStm`` pointer of its
  85 +own. For example, if a hybrid file were appended, the appended section
  86 +would have its own xref table and ``/XRefStm``. The appended xref table
  87 +would point to the previous xref table which would point to the
  88 +``/XRefStm``, meaning that the new ``/XRefStm`` doesn't have to point to
  89 +it.
  90 +
  91 +Since xref streams must be read very early, they may not be encrypted,
  92 +and they may not contain indirect objects for keys required to read them,
  93 +which are these:
  94 +
  95 +- ``/Type``: value ``/XRef``
  96 +
  97 +- ``/Size``: value *n+1*: where *n* is highest object number (same as
  98 + ``/Size`` in the trailer dictionary)
  99 +
  100 +- ``/Index`` (optional): value
  101 + ``[:samp:`{n count}` ...]`` used to determine
  102 + which objects' information is stored in this stream. The default is
  103 + ``[0 /Size]``.
  104 +
  105 +- ``/Prev``: value :samp:`{offset}`: byte
  106 + offset of previous xref stream (same as ``/Prev`` in the trailer
  107 + dictionary)
  108 +
  109 +- ``/W [...]``: sizes of each field in the xref table
  110 +
  111 +The other fields in the xref stream, which may be indirect if desired,
  112 +are the union of those from the xref table's trailer dictionary.
  113 +
  114 +.. _ref.xref-stream-data:
  115 +
  116 +Cross-Reference Stream Data
  117 +~~~~~~~~~~~~~~~~~~~~~~~~~~~
  118 +
  119 +The stream data is binary and encoded in big-endian byte order. Entries
  120 +are concatenated, and each entry has a length equal to the total of the
  121 +entries in ``/W`` above. Each entry consists of one or more fields, the
  122 +first of which is the type of the entry. The number of bytes for each
  123 +field is given by ``/W`` above. A 0 in ``/W`` indicates that the field
  124 +is omitted and has the default value. The default value for the field
  125 +type is "``1``". All other default values are "``0``".
  126 +
  127 +PDF 1.5 has three field types:
  128 +
  129 +- 0: for free objects. Format: ``0 obj next-generation``, same as the
  130 + free table in a traditional cross-reference table
  131 +
  132 +- 1: regular non-compressed object. Format: ``1 offset generation``
  133 +
  134 +- 2: for objects in object streams. Format: ``2 object-stream-number
  135 + index``, the number of the object stream containing the object and the
  136 + index within the object stream of the object.
  137 +
  138 +It seems standard to have the first entry in the table be ``0 0 0``
  139 +instead of ``0 0 ffff`` if there are no deleted objects.
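
Decoding the stream data as described in this section can be sketched in Python. The function below is an illustration only. It takes already-decoded stream bytes together with the ``/W`` array and an ``/Index`` array, and its name and return shape are choices made for this example. A caller with no ``/Index`` key would pass ``[0, size]``.

```python
def decode_xref_stream(data: bytes, w, index):
    """Return {object number: (type, field2, field3)}."""
    defaults = (1, 0, 0)  # an omitted type field defaults to 1
    entries = {}
    pos = 0
    # /Index is a flat list of (first object number, count) pairs.
    for first_obj, count in zip(index[0::2], index[1::2]):
        for objnum in range(first_obj, first_obj + count):
            fields = []
            for width, default in zip(w, defaults):
                if width == 0:
                    fields.append(default)
                else:
                    raw = data[pos:pos + width]
                    fields.append(int.from_bytes(raw, "big"))
                pos += width
            entries[objnum] = tuple(fields)
    return entries


# Three entries with /W [1 2 1]: a free entry, a regular object at byte
# offset 17, and an object at index 0 of object stream number 5.
sample = bytes.fromhex("00000000" "01001100" "02000500")
table = decode_xref_stream(sample, w=[1, 2, 1], index=[0, 3])
```

Each tuple is interpreted according to its first element, following the three types listed above.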
  140 +
  141 +.. _ref.object-streams-linearization:
  142 +
  143 +Implications for Linearized Files
  144 +---------------------------------
  145 +
  146 +For linearized files, the linearization dictionary, document catalog,
  147 +and page objects may not be contained in object streams.
  148 +
  149 +Objects stored within object streams are given the highest range of
  150 +object numbers within the main and first-page cross-reference sections.
  151 +
  152 +It is okay to use cross-reference streams in place of regular xref
  153 +tables. There are no special considerations.
  154 +
  155 +Hint data refers to object streams themselves, not the objects in the
  156 +streams. Shared object references should also be made to the object
  157 +streams. There are no references in any hint tables to the object numbers
  158 +of compressed objects (objects within object streams).
  159 +
  160 +When numbering objects, all shared objects within both the first and
  161 +second halves of the linearized files must be numbered consecutively
  162 +after all normal uncompressed objects in that half.
  163 +
  164 +.. _ref.object-stream-implementation:
  165 +
  166 +Implementation Notes
  167 +--------------------
  168 +
  169 +There are three modes for writing object streams:
  170 +:samp:`disable`, :samp:`preserve`, and
  171 +:samp:`generate`. In disable mode, we do not generate
  172 +any object streams, and we also generate an xref table rather than xref
  173 +streams. This can be used to generate PDF files that are viewable with
  174 +older readers. In preserve mode, we write object streams such that
  175 +written object streams contain the same objects and ``/Extends``
  176 +relationships as in the original file. This is equivalent to disable mode if the
  177 +file has no object streams. In generate, we create object streams
  178 +ourselves by grouping objects that are allowed in object streams
  179 +together in sets of no more than 100 objects. We also ensure that the
  180 +PDF version is at least 1.5 in generate mode, but we preserve the
  181 +version header in the other modes. The default is
  182 +:samp:`preserve`.
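
The grouping step of generate mode can be pictured with a short Python sketch. This only illustrates "sets of no more than 100 objects". The eligibility predicate is a stand-in supplied by the caller, and the way ``QPDFWriter`` actually selects and orders objects may differ.

```python
def group_for_object_streams(object_numbers, is_allowed, max_per_stream=100):
    """Chunk eligible object numbers into groups of at most max_per_stream."""
    eligible = [n for n in object_numbers if is_allowed(n)]
    return [
        eligible[i:i + max_per_stream]
        for i in range(0, len(eligible), max_per_stream)
    ]


# Hypothetical predicate: pretend every tenth object is a stream object,
# which may not be placed inside an object stream.
groups = group_for_object_streams(range(1, 251), lambda n: n % 10 != 0)
```

With 225 eligible objects, this produces two full groups of 100 and one group of 25.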
  183 +
  184 +We do not support creation of hybrid files. When we write files, even in
  185 +preserve mode, we will lose any xref tables and merge any appended
  186 +sections.
manual/overview.rst 0 → 100644
  1 +.. _ref.overview:
  2 +
  3 +What is QPDF?
  4 +=============
  5 +
  6 +QPDF is a program and C++ library for structural, content-preserving
  7 +transformations on PDF files. QPDF's website is located at
  8 +https://qpdf.sourceforge.io/. QPDF's source code is hosted on GitHub
  9 +at https://github.com/qpdf/qpdf.
  10 +
  11 +QPDF provides many useful capabilities to developers of PDF-producing
  12 +software or for people who just want to look at the innards of a PDF
  13 +file to learn more about how they work. With QPDF, it is possible to
  14 +copy objects from one PDF file into another and to manipulate the list
  15 +of pages in a PDF file. This makes it possible to merge and split PDF
  16 +files. The QPDF library also makes it possible for you to create PDF
  17 +files from scratch. In this mode, you are responsible for supplying
  18 +all the contents of the file, while the QPDF library takes care of all
  19 +the syntactical representation of the objects, creation of
  20 +cross-reference tables and, if you use them, object streams, encryption,
  21 +linearization, and other syntactic details. You are still responsible
  22 +for generating PDF content on your own.
  23 +
  24 +QPDF has been designed with very few external dependencies, and it is
  25 +intentionally very lightweight. QPDF is *not* a PDF content creation
  26 +library, a PDF viewer, or a program capable of converting PDF into other
  27 +formats. In particular, QPDF knows nothing about the semantics of PDF
  28 +content streams. If you are looking for something that can do that, you
  29 +should look elsewhere. However, once you have a valid PDF file, QPDF can
  30 +be used to transform that file in ways that perhaps your original PDF
  31 +creation tool can't handle. For example, many programs generate simple PDF
  32 +files but can't password-protect them, web-optimize them, or perform
  33 +other transformations of that type.
manual/qdf.rst 0 → 100644
  1 +.. _ref.qdf:
  2 +
  3 +QDF Mode
  4 +========
  5 +
  6 +In QDF mode, qpdf creates PDF files in what we call *QDF
  7 +form*. A PDF file in QDF form, sometimes called a QDF
  8 +file, is a completely valid PDF file that has ``%QDF-1.0`` as its third
  9 +line (after the pdf header and binary characters) and has certain other
  10 +characteristics. The purpose of QDF form is to make it possible to edit
  11 +PDF files, with some restrictions, in an ordinary text editor. This can
  12 +be very useful for experimenting with different PDF constructs or for
  13 +making one-off edits to PDF files (though there are other reasons why
  14 +this may not always work). Note that QDF mode does not support
  15 +linearized files. If you enable linearization, QDF mode is automatically
  16 +disabled.
  17 +
  18 +It is ordinarily very difficult to edit PDF files in a text editor for
  19 +two reasons: most meaningful data in PDF files is compressed, and PDF
  20 +files are full of offset and length information that makes it hard to
  21 +add or remove data. A QDF file is organized in a manner such that, if
  22 +edits are kept within certain constraints, the
  23 +:command:`fix-qdf` program, distributed with qpdf, is
  24 +able to restore edited files to a correct state. The
  25 +:command:`fix-qdf` program takes no command-line
  26 +arguments. It reads a possibly edited QDF file from standard input and
  27 +writes a repaired file to standard output.
  28 +
  29 +The following attributes characterize a QDF file:
  30 +
  31 +- All objects appear in numerical order in the PDF file, including when
  32 + objects appear in object streams.
  33 +
  34 +- Objects are printed in an easy-to-read format, and all line endings
  35 + are normalized to UNIX line endings.
  36 +
  37 +- Unless specifically overridden, streams appear uncompressed (when
  38 + qpdf supports the filters and they are compressed with a non-lossy
  39 + compression scheme), and most content streams are normalized (line
  40 + endings are converted to just UNIX-style linefeeds).
  41 +
  42 +- All stream lengths are represented as indirect objects, and the
  43 + stream length object is always the next object after the stream. If
  44 + the stream data does not end with a newline, an extra newline is
  45 + inserted, and a special comment appears after the stream indicating
  46 + that this has been done.
  47 +
  48 +- If the PDF file contains object streams, if object stream *n*
  49 + contains *k* objects, those objects are numbered from *n+1* through
  50 + *n+k*, and the object number/offset pairs appear on a separate line
  51 + for each object. Additionally, each object in the object stream is
  52 + preceded by a comment indicating its object number and index. This
  53 + makes it very easy to find objects in object streams.
  54 +
  55 +- All beginnings of objects, ``stream`` tokens, ``endstream`` tokens,
  56 + and ``endobj`` tokens appear on lines by themselves. A blank line
  57 + follows every ``endobj`` token.
  58 +
  59 +- If there is a cross-reference stream, it is unfiltered.
  60 +
  61 +- Page dictionaries and page content streams are marked with special
  62 + comments that make them easy to find.
  63 +
  64 +- Comments precede each object indicating the object number of the
  65 + corresponding object in the original file.
  66 +
  67 +When editing a QDF file, any edits can be made as long as the above
  68 +constraints are maintained. This means that you can freely edit a page's
  69 +content without worrying about messing up the QDF file. It is also
  70 +possible to add new objects so long as those objects are added after the
  71 +last object in the file or subsequent objects are renumbered. If a QDF
  72 +file has object streams in it, you can always add the new objects before
  73 +the xref stream and then change the number of the xref stream, since
  74 +nothing generally ever references it by number.
  75 +
  76 +It is not generally practical to remove objects from QDF files without
  77 +messing up object numbering, but if you remove all references to an
  78 +object, you can run qpdf on the file (after running
  79 +:command:`fix-qdf`), and qpdf will omit the now-orphaned
  80 +object.
  81 +
  82 +When :command:`fix-qdf` is run, it goes through the file
  83 +and recomputes the following parts of the file:
  84 +
  85 +- the ``/N``, ``/W``, and ``/First`` keys of all object stream
  86 + dictionaries
  87 +
  88 +- the pairs of numbers representing object numbers and offsets of
  89 + objects in object streams
  90 +
  91 +- all stream lengths
  92 +
  93 +- the cross-reference table or cross-reference stream
  94 +
  95 +- the offset to the cross-reference table or cross-reference stream
  96 + following the ``startxref`` token
manual/release-notes.rst 0 → 100644
  1 +.. _ref.release-notes:
  2 +
  3 +Release Notes
  4 +=============
  5 +
  6 +For a detailed list of changes, please see the file
  7 +:file:`ChangeLog` in the source distribution.
  8 +
  9 +10.5.0: XXX Month dd, YYYY
  10 + - Library Enhancements
  11 +
  12 + - Since qpdf version 8, using object accessor methods on an
  13 + instance of ``QPDFObjectHandle`` may create warnings if the
  14 + object is not of the expected type. These warnings now have an
  15 + error code of ``qpdf_e_object`` instead of
  16 + ``qpdf_e_damaged_pdf``. Also, comments have been added to
  17 + :file:`QPDFObjectHandle.hh` to explain in more detail what the
  18 + behavior is. See :ref:`ref.object-accessors` for a more in-depth
  19 + discussion.
  20 +
  21 + - Add ``Pl_Buffer::getMallocBuffer()`` to initialize a buffer
  22 + allocated with ``malloc()`` for better cross-language
  23 + interoperability.
  24 +
  25 + - C API Enhancements
  26 +
  27 + - Overhaul error handling for the object handle functions in the C API.
  28 + Some rare error conditions that would previously have caused a
  29 + crash are now trapped and reported, and the functions that
  30 + generate them return fallback values. See comments in the
  31 + ``ERROR HANDLING`` section of :file:`include/qpdf/qpdf-c.h` for
  32 + details. In particular, exceptions thrown by the underlying C++
  33 + code when calling object accessors are caught and converted into
  34 + errors. The errors can be checked by calling ``qpdf_has_error``.
  35 + Use ``qpdf_silence_errors`` to prevent the error from being
  36 + written to stderr.
  37 +
  38 + - Add ``qpdf_get_last_string_length`` to the C API to get the
  39 + length of the last string that was returned. This is needed to
  40 + handle strings that contain embedded null characters.
  41 +
  42 + - Add ``qpdf_oh_is_initialized`` and
  43 + ``qpdf_oh_new_uninitialized`` to the C API to make it possible
  44 + to work with uninitialized objects.
  45 +
  46 + - Add ``qpdf_oh_new_object`` to the C API. This allows you to
  47 + clone an object handle.
  48 +
  49 + - Add ``qpdf_get_object_by_id``, ``qpdf_make_indirect_object``,
  50 + and ``qpdf_replace_object``, exposing the corresponding methods
  51 + in ``QPDF`` and ``QPDFObjectHandle``.
  52 +
  53 + - Add several functions for working with pages. See ``PAGE
  54 + FUNCTIONS`` in ``include/qpdf/qpdf-c.h`` for details.
  55 +
  56 + - Add several functions for working with streams. See ``STREAM
  57 + FUNCTIONS`` in ``include/qpdf/qpdf-c.h`` for details.
  58 +
  59 + - Add ``qpdf_oh_get_type_code`` and ``qpdf_oh_get_type_name``.
  60 +
  61 + - Documentation change
  62 +
  63 + - The documentation sources have been switched from docbook to
  64 + reStructuredText processed with `Sphinx
  65 + <https://sphinx-doc.org>`__. This is mostly transparent (other
  66 + than format change) with the exception that all section links
  67 + have changed. What used to be ``#ref.something`` is now
  68 + ``#something``. A top-to-bottom review of the documentation is
  69 + planned for an upcoming release.
  70 +
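The note above on ``qpdf_get_last_string_length`` concerns strings that contain embedded null characters. The following is a minimal sketch of why a separately reported length matters. It does not call qpdf: ``ReturnedString`` and both helper functions are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C-style API that returns a pointer and
// reports the length separately. This is NOT part of the qpdf C API.
struct ReturnedString
{
    const char* data;
    std::size_t length;
};

// Copy every byte, including embedded nulls, using the explicit length.
inline std::string copy_with_length(ReturnedString const& s)
{
    return std::string(s.data, s.length);
}

// Copy only up to the first null byte, as strlen-based code would.
inline std::string copy_with_strlen(ReturnedString const& s)
{
    return std::string(s.data, std::strlen(s.data));
}
```

With a buffer such as ``abc\0def``, the length-based copy keeps all seven bytes, while the ``strlen``-based copy stops after ``abc``.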
  71 +10.4.0: November 16, 2021
  72 + - Handling of Weak Cryptography Algorithms
  73 +
  74 + - From the qpdf CLI, the
  75 + :samp:`--allow-weak-crypto` option is now required to
  76 + suppress a warning when explicitly creating PDF files using RC4
  77 + encryption. While qpdf will always retain the ability to read
  78 + and write such files, doing so will require explicit
  79 + acknowledgment moving forward. For qpdf 10.4, this change only
  80 + affects the command-line tool. Starting in qpdf 11, there will
  81 + be small API changes to require explicit acknowledgment in
  82 + those cases as well. For additional information, see :ref:`ref.weak-crypto`.
  83 +
  84 + - Bug Fixes
  85 +
  86 + - Fix potential bounds error when handling shell completion that
  87 + could occur when given bogus input.
  88 +
  89 + - Properly handle overlay/underlay on completely empty pages
  90 + (with no resource dictionary).
  91 +
  92 + - Fix crash that could occur under certain conditions when using
  93 + :samp:`--pages` with files that had form
  94 + fields.
  95 +
  96 + - Library Enhancements
  97 +
  98 + - Make ``QPDF::findPage`` functions public.
  99 +
  100 + - Add methods to ``Pl_Flate`` to be able to receive warnings on
  101 + certain recoverable conditions.
  102 +
  103 + - Add an extra check to the library to detect when foreign
  104 + objects are inserted directly (instead of using
  105 + ``QPDF::copyForeignObject``) at the time of insertion rather
  106 + than when the file is written. Catching the error sooner makes
  107 + it much easier to locate the incorrect code.
  108 +
  109 + - CLI Enhancements
  110 +
  111 + - Improve diagnostics around parsing
  112 + :samp:`--pages` command-line options.
  113 +
  114 + - Packaging Changes
  115 +
  116 + - The Windows binary distribution is now built with crypto
  117 + provided by OpenSSL 3.0.
  118 +
  119 +10.3.2: May 8, 2021
  120 + - Bug Fixes
  121 +
  122 + - When generating a file while preserving object streams,
  123 + unreferenced objects are correctly removed unless
  124 + :samp:`--preserve-unreferenced` is specified.
  125 +
  126 + - Library Enhancements
  127 +
  128 + - When adding a page that already exists, make a shallow copy
  129 + instead of throwing an exception. This makes the library
  130 + behavior consistent with the CLI behavior. See
  131 + :file:`ChangeLog` for additional notes.
  132 +
  133 +10.3.1: March 11, 2021
  134 + - Bug Fixes
  135 +
  136 + - Form field copying failed on files where /DR was a direct
  137 + object in the document-level form dictionary.
  138 +
  139 +10.3.0: March 4, 2021
  140 + - Bug Fixes
  141 +
  142 + - The code for handling form fields when copying pages from
  143 + 10.2.0 was not quite right and didn't work in a number of
  144 + situations, such as when the same page was copied multiple
  145 + times or when there were conflicting resource or field names
  146 + across multiple copies. The 10.3.0 code has been much more
  147 + thoroughly tested with more complex cases and with a multitude
  148 + of readers and should be much closer to correct. The 10.2.0
  149 + code worked well enough for page splitting or for copying pages
  150 + with form fields into documents that didn't already have them
  151 + but was still not quite correct in handling of field-level
  152 + resources.
  153 +
  154 + - When ``QPDF::replaceObject`` or ``QPDF::swapObjects`` is
  155 + called, existing ``QPDFObjectHandle`` instances no longer point
  156 + to the old objects. The next time they are accessed, they
  157 + automatically notice the change to the underlying object and
  158 + update themselves. This resolves a very longstanding source of
  159 + confusion, albeit in a very rarely used method call.
  160 +
  161 + - Fix form field handling code to look for default appearances,
  162 + quadding, and default resources in the right places. The code
  163 + was not looking for things in the document-level interactive
  164 + form dictionary that it was supposed to be finding there. This
  165 + required adding a few new methods to
  166 + ``QPDFFormFieldObjectHelper``.
  167 +
  168 + - Library Enhancements
  169 +
  170 + - Reworked the code that handles copying annotations and form
  171 + fields during page operations. There were additional methods
  172 + added to the public API from 10.2.0 and one deprecation of a
  173 + method added in 10.2.0. The majority of the API changes are in
  174 + methods most people would never call and that will hopefully be
  175 + superseded by higher-level interfaces for handling page copies.
  176 + Please see the :file:`ChangeLog` file for
  177 + details.
  178 +
  179 + - The method ``QPDF::numWarnings`` was added so that you can tell
  180 + whether any warnings happened during a specific block of code.
  181 +
  182 +10.2.0: February 23, 2021
  183 + - CLI Behavior Changes
  184 +
  185 + - Operations that work on combining pages are much better about
  186 + protecting form fields. In particular,
  187 + :samp:`--split-pages` and
  188 + :samp:`--pages` now preserve interactive form
  189 + functionality by copying the relevant form field information
  190 + from the original files. Additionally, if you use
  191 + :samp:`--pages` to select only some pages from
  192 + the original input file, unused form fields are removed, which
  193 + prevents lots of unused annotations from being retained.
  194 +
  195 + - By default, :command:`qpdf` no longer allows
  196 + creation of encrypted PDF files whose user password is
  197 + non-empty and owner password is empty when a 256-bit key is in
  198 + use. The :samp:`--allow-insecure` option,
  199 + specified inside the :samp:`--encrypt` options,
  200 + allows creation of such files. Behavior changes in the CLI are
  201 + avoided when possible, but an exception was made here because
  202 + this is security-related. qpdf must always allow creation of
  203 + weird files for testing purposes, but it should not default to
  204 + letting users unknowingly create insecure files.
  205 +
  206 + - Library Behavior Changes
  207 +
  208 + - Note: the changes in this section cause differences in output
  209 + in some cases. These differences change the syntax of the PDF
  210 + but do not change the semantics (meaning). I make a strong
  211 + effort to avoid gratuitous changes in qpdf's output so that
  212 + qpdf changes don't break people's tests. In this case, the
  213 + changes significantly improve the readability of the generated
  214 + PDF and don't affect any output that's generated by simple
  215 + transformation. If you are annoyed by having to update test
  216 + files, please rest assured that changes like this have been and
  217 + will continue to be rare events.
  218 +
  219 + - ``QPDFObjectHandle::newUnicodeString`` now uses whichever of
  220 + ASCII, PDFDocEncoding, or UTF-16 is sufficient to encode all
  221 + the characters in the string. This reduces needless encoding in
  222 + UTF-16 of strings that can be encoded in ASCII. This change may
  223 + cause qpdf to generate different output than before when form
  224 + field values are set using ``QPDFFormFieldObjectHelper`` but
  225 + does not change the meaning of the output.
  226 +
  227 + - The code that places form XObjects and also the code that
  228 + flattens rotations trim trailing zeroes from real numbers that
  229 + they calculate. This causes slight (but semantically
  230 + equivalent) differences in generated appearance streams and
  231 + form XObject invocations in overlay/underlay code or in user
  232 + code that calls the methods that place form XObjects on a page.
  233 +
  234 + - CLI Enhancements
  235 +
  236 + - Add new command line options for listing, saving, adding,
  237 + removing, and copying file attachments. See :ref:`ref.attachments` for details.
  238 +
  239 + - Page splitting and merging operations, as well as
  240 + :samp:`--flatten-rotation`, are better behaved
  241 + with respect to annotations and interactive form fields. In
  242 + most cases, interactive form field functionality and proper
  243 + formatting and functionality of annotations is preserved by
  244 + these operations. There are still some cases that aren't
  245 + perfect, such as when functionality of annotations depends on
  246 + document-level data that qpdf doesn't yet understand or when
  247 + there are problems with referential integrity among form fields
  248 + and annotations (e.g., when a single form field object or its
  249 + associated annotations are shared across multiple pages, a case
  250 + that is out of spec but that works in most viewers anyway).
  251 +
  252 + - The option
  253 + :samp:`--password-file={filename}`
  254 + can now be used to read the decryption password from a file.
  255 + You can use ``-`` as the file name to read the password from
  256 + standard input. This is an easier/more obvious way to read
  257 + passwords from files or standard input than using
  258 + :samp:`@file` for this purpose.
  259 +
  260 + - Add some information about attachments to the json output, and
  261 + add ``attachments`` as an additional json key. The
  262 + information included here is limited to the preferred name and
  263 + content stream and a reference to the file spec object. This is
  264 + enough detail for clients to avoid the hassle of navigating a
  265 + name tree and provides what is needed for basic enumeration and
  266 + extraction of attachments. More detailed information can be
  267 + obtained by following the reference to the file spec object.
  268 +
  269 + - Add numeric option to :samp:`--collate`. If
  270 + :samp:`--collate={n}`
  271 + is given, take pages in groups of
  272 + :samp:`{n}` from the given files.
  273 +
  274 + - It is now valid to provide :samp:`--rotate=0`
  275 + to clear rotation from a page.
  276 +
  277 + - Library Enhancements
  278 +
  279 + - This release includes numerous additions to the API. Not all
  280 + changes are listed here. Please see the
  281 + :file:`ChangeLog` file in the source
  282 + distribution for a comprehensive list. Highlights appear below.
  283 +
  284 + - Add ``QPDFObjectHandle::ditems()`` and
  285 + ``QPDFObjectHandle::aitems()`` that enable C++-style iteration,
  286 + including range-for iteration, over dictionary and array
  287 + QPDFObjectHandles. See comments in
  288 + :file:`include/qpdf/QPDFObjectHandle.hh`
  289 + and
  290 + :file:`examples/pdf-name-number-tree.cc`
  291 + for details.
  292 +
  293 + - Add ``QPDFObjectHandle::copyStream`` for making a copy of a
  294 + stream within the same ``QPDF`` instance.
  295 +
  296 + - Add new helper classes for supporting file attachments, also
  297 + known as embedded files. New classes are
  298 + ``QPDFEmbeddedFileDocumentHelper``,
  299 + ``QPDFFileSpecObjectHelper``, and ``QPDFEFStreamObjectHelper``.
  300 + See their respective headers for details and
  301 + :file:`examples/pdf-attach-file.cc` for an
  302 + example.
  303 +
  304 + - Add a version of ``QPDFObjectHandle::parse`` that takes a
  305 + ``QPDF`` pointer as context so that it can parse strings
  306 + containing indirect object references. This is illustrated in
  307 + :file:`examples/pdf-attach-file.cc`.
  308 +
  309 + - Re-implement ``QPDFNameTreeObjectHelper`` and
  310 + ``QPDFNumberTreeObjectHelper`` to be more efficient, add an
  311 + iterator-based API, give them the capability to repair broken
  312 + trees, and create methods for modifying the trees. With this
  313 + change, qpdf has a robust read/write implementation of name and
  314 + number trees.
  315 +
  316 + - Add new versions of ``QPDFObjectHandle::replaceStreamData``
  317 + that take ``std::function`` objects for cases when you need
  318 + something between a static string and a full-fledged
  319 + StreamDataProvider. Using this with ``QUtil::file_provider`` is
  320 + a very easy way to create a stream from the contents of a file.
  321 +
  322 + - The ``QPDFMatrix`` class, formerly a private, internal class,
  323 + has been added to the public API. See
  324 + :file:`include/qpdf/QPDFMatrix.hh` for
  325 + details. This class is for working with transformation
  326 + matrices. Some methods in ``QPDFPageObjectHelper`` make use of
  327 + this to make information about transformation matrices
  328 + available. For an example, see
  329 + :file:`examples/pdf-overlay-page.cc`.
  330 +
  331 + - Several new methods were added to
  332 + ``QPDFAcroFormDocumentHelper`` for adding, removing, getting
  333 + information about, and enumerating form fields.
  334 +
  335 + - Add method
  336 + ``QPDFAcroFormDocumentHelper::transformAnnotations``, which
  337 + applies a transformation to each annotation on a page.
  338 +
  339 + - Add ``QPDFPageObjectHelper::copyAnnotations``, which copies
  340 + annotations and, if applicable, associated form fields, from
  341 + one page to another, possibly transforming the rectangles.
  342 +
  343 + - Build Changes
  344 +
  345 + - A C++-14 compiler is now required to build qpdf. There is no
  346 + intention to require anything newer than that for a while.
  347 + C++-14 includes modest enhancements to C++-11 and appears to be
  348 + supported about as widely as C++-11.
  349 +
  350 + - Bug Fixes
  351 +
  352 + - The :samp:`--flatten-rotation` option applies
  353 + transformations to any annotations that may be on the page.
  354 +
  355 + - If a form XObject lacks a resources dictionary, consider any
  356 + names in that form XObject to be referenced from the containing
  357 + page. This is compliant with older PDF versions. Also detect if
  358 + any form XObjects have any unresolved names and, if so, don't
  359 + remove unreferenced resources from them or from the page that
  360 + contains them. Unfortunately this has the side effect of
  361 + preventing removal of unreferenced resources in some cases
  362 + where names appear that don't refer to resources, such as with
  363 + tagged PDF. This is a bit of a corner case that is not likely
  364 + to cause a significant problem in practice, since the only side
  365 + effect would be lack of removal of shared resources. A future
  366 + version of qpdf may be more sophisticated in its detection of
  367 + names that refer to resources.
  368 +
  369 + - Properly handle strings if they appear in inline image
  370 + dictionaries while externalizing inline images.
  371 +
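The :samp:`--collate={n}` description above (take pages in groups of n from the given files) can be sketched as follows. This illustrates the ordering only and is not qpdf's implementation: the name ``collate_groups`` is hypothetical, and the handling of files that run out of pages early (they are skipped while the others continue) is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// (file label, page count) on input; (file label, 1-based page) on output.
using PageRef = std::pair<std::string, int>;

// Take up to n pages at a time from each input in turn until every
// input is exhausted. Hypothetical helper; not a qpdf API.
inline std::vector<PageRef>
collate_groups(std::vector<PageRef> const& inputs, int n)
{
    std::vector<PageRef> out;
    if (n < 1) {
        return out;
    }
    std::vector<int> next(inputs.size(), 1); // next page to take per input
    bool progressed = true;
    while (progressed) {
        progressed = false;
        for (std::size_t i = 0; i < inputs.size(); ++i) {
            int last = std::min(next[i] + n - 1, inputs[i].second);
            for (int p = next[i]; p <= last; ++p) {
                out.emplace_back(inputs[i].first, p);
                progressed = true;
            }
            next[i] = last + 1;
        }
    }
    return out;
}
```

For inputs ``A`` with five pages and ``B`` with three pages and ``n = 2``, this sketch yields A1, A2, B1, B2, A3, A4, B3, A5.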
  372 +10.1.0: January 5, 2021
  373 + - CLI Enhancements
  374 +
  375 + - Add :samp:`--flatten-rotation` command-line
  376 + option, which causes all pages that are rotated using
  377 + parameters in the page's dictionary to instead be identically
  378 + rotated in the page's contents. The change is not user-visible
  379 + for compliant PDF readers but can be used to work around broken
  380 + PDF applications that don't properly handle page rotation.
  381 +
  382 + - Library Enhancements
  383 +
  384 + - Support for user-provided (pluggable, modular) stream filters.
  385 + It is now possible to derive a class from ``QPDFStreamFilter``
  386 + and register it with ``QPDF`` so that regular library methods,
  387 + including those used by ``QPDFWriter``, can decode streams with
  388 + filters not directly supported by the library. The example
  389 + :file:`examples/pdf-custom-filter.cc`
  390 + illustrates how to use this capability.
  391 +
  392 + - Add methods to ``QPDFPageObjectHelper`` to iterate through
  393 + XObjects on a page or form XObjects, possibly recursing into
  394 + nested form XObjects: ``forEachXObject``, ``forEachImage``,
  395 + ``forEachFormXObject``.
  396 +
  397 + - Enhance several methods in ``QPDFPageObjectHelper`` to work
  398 + with form XObjects as well as pages, as noted in comments. See
  399 + :file:`ChangeLog` for a full list.
  400 +
  401 + - Rename some functions in ``QPDFPageObjectHelper``, while
  402 + keeping old names for compatibility:
  403 +
  404 + - ``getPageImages`` to ``getImages``
  405 +
  406 + - ``filterPageContents`` to ``filterContents``
  407 +
  408 + - ``pipePageContents`` to ``pipeContents``
  409 +
  410 + - ``parsePageContents`` to ``parseContents``
  411 +
  412 + - Add method ``QPDFPageObjectHelper::getFormXObjects`` to return
  413 + a map of form XObjects directly on a page or form XObject
  414 +
  415 + - Add new helper methods to ``QPDFObjectHandle``:
  416 + ``isFormXObject``, ``isImage``
  417 +
  418 + - Add the optional ``allow_streams`` parameter to
  419 + ``QPDFObjectHandle::makeDirect``. When
  420 + ``QPDFObjectHandle::makeDirect`` is called in this way, it
  421 + preserves references to streams rather than throwing an
  422 + exception.
  423 +
  424 + - Add ``QPDFObjectHandle::setFilterOnWrite`` method. Calling this
  425 + on a stream prevents ``QPDFWriter`` from attempting to
  426 + uncompress, recompress, or otherwise filter a stream even if it
  427 + could. Developers can use this to protect streams that are
  428 + already optimized or that should be protected from ``QPDFWriter``'s
  429 + default behavior for any other reason.
  430 +
  431 + - Add ``ostream`` ``<<`` operator for ``QPDFObjGen``. This is
  432 + useful to have for debugging.
  433 +
  434 + - Add method ``QPDFPageObjectHelper::flattenRotation``, which
  435 + replaces a page's ``/Rotate`` keyword by rotating the page
  436 + within the content stream and altering the page's bounding
  437 + boxes so the rendering is the same. This can be used to work
  438 + around buggy PDF readers that can't properly handle page
  439 + rotation.
  440 +
  441 + - C API Enhancements
  442 +
  443 + - Add several new functions to the C API for working with
  444 + objects. These are wrappers around many of the methods in
  445 + ``QPDFObjectHandle``. Their inclusion adds considerable new
  446 + capability to the C API.
  447 +
  448 + - Add ``qpdf_register_progress_reporter`` to the C API,
  449 + corresponding to ``QPDFWriter::registerProgressReporter``.
  450 +
  451 + - Performance Enhancements
  452 +
  453 + - Improve steps ``QPDFWriter`` takes to prepare a ``QPDF`` object
  454 + for writing, resulting in about an 8% improvement in write
  455 + performance while allowing indirect objects to appear in
  456 + ``/DecodeParms``.
  457 +
  458 + - When extracting pages, the :command:`qpdf` CLI
  459 + only removes unreferenced resources from the pages that are
  460 + being kept, resulting in a significant performance improvement
  461 + when extracting small numbers of pages from large, complex
  462 + documents.
  463 +
  464 + - Bug Fixes
  465 +
  466 + - ``QPDFPageObjectHelper::externalizeInlineImages`` was not
  467 + externalizing images referenced from form XObjects that
  468 + appeared on the page.
  469 +
  470 + - ``QPDFObjectHandle::filterPageContents`` was broken for pages
  471 + with multiple content streams.
  472 +
  473 + - Tweak zsh completion code to behave a little better with
  474 + respect to path completion.
  475 +
  476 +10.0.4: November 21, 2020
  477 + - Bug Fixes
  478 +
  479 + - Fix a handful of integer overflows. This includes cases found
  480 + by fuzzing as well as having qpdf not do range checking on
  481 + unused values in the xref stream.
  482 +
  483 +10.0.3: October 31, 2020
  484 + - Bug Fixes
  485 +
  486 + - The fix to the bug involving copying streams with indirect
  487 + filters was incorrect and introduced a new, more serious bug.
  488 + The original bug has been fixed correctly, as has the bug
  489 + introduced in 10.0.2.
  490 +
  491 +10.0.2: October 27, 2020
  492 + - Bug Fixes
  493 +
  494 + - When concatenating content streams, as with
  495 + :samp:`--coalesce-contents`, there were cases
  496 + in which qpdf would merge two lexical tokens together, creating
  497 + invalid results. A newline is now inserted between merged
  498 + content streams if one is not already present.
  499 +
  500 + - Fix an internal error that could occur when copying foreign
  501 + streams whose stream data had been replaced using a stream data
  502 + provider if those streams had indirect filters or decode
  503 + parameters. This is a rare corner case.
  504 +
  505 + - Ensure that the caller's locale settings do not change the
  506 + results of numeric conversions performed internally by the qpdf
  507 + library. Note that the problem here could only be caused when
  508 + the qpdf library was used programmatically. Using the qpdf CLI
  509 + already ignored the user's locale for numeric conversion.
  510 +
  511 + - Fix several instances in which warnings were not suppressed in
  512 + spite of :samp:`--no-warn` and/or errors or
  513 + warnings were written to standard output rather than standard
  514 + error.
  515 +
  516 + - Fixed a memory leak that could occur under specific
  517 + circumstances when
  518 + :samp:`--object-streams=generate` was used.
  519 +
  520 + - Fix various integer overflows and similar conditions found by
  521 + the OSS-Fuzz project.
  522 +
  523 + - Enhancements
  524 +
  525 + - New option :samp:`--warning-exit-0` causes qpdf
  526 + to exit with a status of ``0`` rather than ``3`` if there are
  527 + warnings but no errors. Combine with
  528 + :samp:`--no-warn` to completely ignore
  529 + warnings.
  530 +
  531 + - Performance improvements have been made to
  532 + ``QPDF::processMemoryFile``.
  533 +
  534 + - The OpenSSL crypto provider produces more detailed error
  535 + messages.
  536 +
  537 + - Build Changes
  538 +
  539 + - The option :samp:`--disable-rpath` is now
  540 + supported by qpdf's :command:`./configure`
  541 + script. Some distributions' packaging standards recommend the
  542 + use of this option.
  543 +
  544 + - Selection of a printf format string for ``long long`` has
  545 + been moved from ``ifdefs`` to an autoconf
  546 + test. If you are using your own build system, you will need to
  547 + provide a value for ``LL_FMT`` in
  548 + :file:`libqpdf/qpdf/qpdf-config.h`, which
  549 + would typically be ``"%lld"`` or, for some Windows compilers,
  550 + ``"%I64d"``.
  551 +
  552 + - Several improvements were made to build-time configuration of
  553 + the OpenSSL crypto provider.
  554 +
  555 + - A nearly stand-alone Linux binary zip file is now included with
  556 + the qpdf release. This is built on an older (but supported)
  557 + Ubuntu LTS release, but would work on most reasonably recent
  558 + Linux distributions. It contains only the executables and
  559 + required shared libraries that would not be present on a
  560 + minimal system. It can be used for including qpdf in a minimal
  561 + environment, such as a docker container. The zip file is also
  562 + known to work as a layer in AWS Lambda.
  563 +
  564 + - QPDF's automated build has been migrated from Azure Pipelines
  565 + to GitHub Actions.
  566 +
  567 + - Windows-specific Changes
  568 +
  569 + - The Windows executables distributed with qpdf releases now use
  570 + the OpenSSL crypto provider by default. The native crypto
  571 + provider is also compiled in and can be selected at runtime
  572 + with the ``QPDF_CRYPTO_PROVIDER`` environment variable.
  573 +
  574 + - Improvements have been made to how a cryptographic provider is
  575 + obtained in the native Windows crypto implementation. However,
  576 + this is mostly overshadowed by OpenSSL being used by default.
  577 +
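For those using their own build system, the ``LL_FMT`` note above can be sketched as follows. The fallback definition and the helper name ``format_long_long`` are assumptions for illustration. In a real build, the value would come from :file:`libqpdf/qpdf/qpdf-config.h`.

```cpp
#include <cstdio>
#include <string>

// Assumption for illustration: supply LL_FMT here only if the build
// configuration has not already defined it.
#ifndef LL_FMT
# define LL_FMT "%lld"
#endif

// Format a long long using whatever format string LL_FMT names.
// Hypothetical helper; not part of qpdf.
inline std::string format_long_long(long long value)
{
    char buf[32];
    std::snprintf(buf, sizeof(buf), LL_FMT, value);
    return std::string(buf);
}
```

A compiler that needs ``"%I64d"`` would define ``LL_FMT`` accordingly before this code is compiled; the rest stays unchanged.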
  578 +10.0.1: April 9, 2020
  579 + - Bug Fixes
  580 +
  581 + - 10.0.0 introduced a bug in which calling
  582 + ``QPDFObjectHandle::getStreamData`` on a stream that can't be
  583 + filtered was returning the raw data instead of throwing an
  584 + exception. This is now fixed.
  585 +
  586 + - Fix a bug that was preventing qpdf from linking with some
  587 + versions of clang on some platforms.
  588 +
  589 + - Enhancements
  590 +
  591 + - Improve the :file:`pdf-invert-images`
  592 + example to avoid having to load all the images into RAM at the
  593 + same time.
  594 +
  595 +10.0.0: April 6, 2020
  596 + - Performance Enhancements
  597 +
  598 + - The qpdf library and executable should run much faster in this
  599 + version than in the last several releases. Several internal
  600 + library optimizations have been made, and there has been
  601 + improved behavior on page splitting as well. This version of
  602 + qpdf should outperform any of the 8.x or 9.x versions.
  603 +
  604 + - Incompatible API (source-level) Changes (minor)
  605 +
  606 + - The ``QUtil::srandom`` method was removed. It didn't do
  607 + anything unless insecure random numbers were compiled in, and
  608 + they have been off by default for a long time. If you were
  609 + calling it, just remove the call since it wasn't doing anything
  610 + anyway.
  611 +
  612 + - Build/Packaging Changes
  613 +
  614 + - Add an ``openssl`` crypto provider, which is implemented with
  615 + OpenSSL and also works with BoringSSL. Thanks to Dean Scarff
  616 + for this contribution. If you maintain qpdf for a distribution,
  617 + pay special attention to make sure that you are including
  618 + support for the crypto providers you want. Package maintainers
  619 + will have to weigh the advantages of allowing users to pick a
  620 + crypto provider at runtime against the disadvantages of adding
  621 + more dependencies to qpdf.
  622 +
  623 + - Allow qpdf to be built on stripped down systems whose C/C++
  624 + libraries lack the ``wchar_t`` type. Search for ``wchar_t`` in
  625 + qpdf's README.md for details. This should be very rare, but it
  626 + is known to be helpful in some embedded environments.
  627 +
  628 + - CLI Enhancements
  629 +
  630 + - Add ``objectinfo`` key to the JSON output. This will be a place
  631 + to put computed metadata or other information about PDF objects
  632 + that are not immediately evident in other ways or that seem
  633 + useful for some other reason. In this version, information is
  634 + provided about each object indicating whether it is a stream
  635 + and, if so, what its length and filters are. Without this, it
  636 + was not possible to tell conclusively from the JSON output
  637 + alone whether or not an object was a stream. Run
  638 + :command:`qpdf --json-help` for details.
  639 +
  640 + - Add new option
  641 + :samp:`--remove-unreferenced-resources` which
  642 + takes ``auto``, ``yes``, or ``no`` as arguments. The new
  643 + ``auto`` mode, which is the default, performs a fast heuristic
  644 + over a PDF file when splitting pages to determine whether the
  645 + expensive process of finding and removing unreferenced
  646 + resources is likely to be of benefit. For most files, this new
  647 + default will result in a significant performance improvement
  648 + for splitting pages. See :ref:`ref.advanced-transformation` for a more detailed
  649 + discussion.
  650 +
  651 + - The :samp:`--preserve-unreferenced-resources` option
  652 + is now just a synonym for
  653 + :samp:`--remove-unreferenced-resources=no`.
  654 +
  655 + - If the ``QPDF_EXECUTABLE`` environment variable is set when
  656 + invoking :command:`qpdf --bash-completion` or
  657 + :command:`qpdf --zsh-completion`, the completion
  658 + command that it outputs will refer to qpdf using the value of
  659 + that variable rather than what :command:`qpdf`
  660 + determines its executable path to be. This can be useful when
  661 + wrapping :command:`qpdf` with a script, working
  662 + with a version in the source tree, using an AppImage, or other
  663 + situations where there is some indirection.
  664 +
  665 + - Library Enhancements
  666 +
  667 + - Random number generation is now delegated to the crypto
  668 + provider. The old behavior is still used by the native crypto
  669 + provider. It is still possible to provide your own random
  670 + number generator.
  671 +
  672 + - Add a new version of
  673 + ``QPDFObjectHandle::StreamDataProvider::provideStreamData``
  674 + that accepts the ``suppress_warnings`` and ``will_retry``
  675 + options and allows a success code to be returned. This makes it
  676 + possible to implement a ``StreamDataProvider`` that calls
  677 + ``pipeStreamData`` on another stream and to pass the response
  678 + back to the caller, which enables better error handling on
  679 + those proxied streams.
  680 +
  681 + - Update ``QPDFObjectHandle::pipeStreamData`` to return an
  682 + overall success code that goes beyond whether or not filtered
  683 + data was written successfully. This allows better error
  684 + handling of cases that were not filtering errors. You have to
  685 + call this explicitly. Methods in previously existing APIs have
  686 + the same semantics as before.
  687 +
  688 + - The ``QPDFPageObjectHelper::placeFormXObject`` method now
  689 + allows separate control over whether it should be willing to
  690 + shrink or expand objects to fit them better into the
  691 + destination rectangle. The previous behavior was that shrinking
  692 + was allowed but expansion was not. The previous behavior is
  693 + still the default.
  694 +
  695 + - When calling the C API, any non-zero value passed to a boolean
  696 + parameter is treated as ``TRUE``. Previously only the value
  697 + ``1`` was accepted. This makes the C API behave more like most
  698 + C interfaces and is known to improve compatibility with some
  699 + Windows environments that dynamically load the DLL and call
  700 + functions from it.
  701 +
  702 + - Add ``QPDFObjectHandle::unsafeShallowCopy`` for copying only
  703 + top-level dictionary keys or array items. This is unsafe
  704 + because it creates a situation in which changing a lower-level
  705 + item in one object may also change it in another object, but
  706 + for cases in which you *know* you are only inserting or
  707 + replacing top-level items, it is much faster than
  708 + ``QPDFObjectHandle::shallowCopy``.
  709 +
  710 + - Add ``QPDFObjectHandle::filterAsContents``, which filters a
  711 + stream's data as a content stream. This is useful for parsing
  712 + the contents for form XObjects in the same way as parsing page
  713 + content streams.
  714 +
  715 + - Bug Fixes
  716 +
  717 + - When detecting and removing unreferenced resources during page
  718 + splitting, traverse into form XObjects and handle their
  719 + resources dictionaries as well.
  720 +
  721 + - The same error recovery is applied to streams in files other than
  722 + the primary input file when merging or splitting pages.
  723 +
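The hazard described above for ``QPDFObjectHandle::unsafeShallowCopy`` (a change to a lower-level item in one object may also change it in another) can be illustrated with a toy model. This sketch does not use qpdf: ``Dict`` and ``top_level_copy`` are hypothetical, and shared pointers stand in for nested objects.

```cpp
#include <map>
#include <memory>
#include <string>

// Hypothetical toy object model, not qpdf's: a dictionary whose values
// are shared pointers to nested dictionaries.
struct Dict
{
    int value = 0;
    std::map<std::string, std::shared_ptr<Dict>> items;
};

// Copy only the top-level key table. The nested objects the keys point
// to remain shared between the original and the copy.
inline Dict top_level_copy(Dict const& d)
{
    return d;
}
```

Inserting or replacing a top-level key in the copy leaves the original untouched, but modifying a nested object through the copy is visible through the original as well. That is why such a copy is safe only when the caller changes top-level items alone.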
  724 +9.1.1: January 26, 2020
  725 + - Build/Packaging Changes
  726 +
  727 + - The fix-qdf program was converted from perl to C++. As such,
  728 + qpdf no longer has a runtime dependency on perl.
  729 +
  730 + - Library Enhancements
  731 +
  732 + - Added new helper routine ``QUtil::call_main_from_wmain`` which
  733 + converts ``wchar_t`` arguments to UTF-8 encoded strings. This
  734 + is useful for qpdf because library methods expect file names to
  735 + be UTF-8 encoded, even on Windows.
  736 +
  737 + - Added new ``QUtil::read_lines_from_file`` methods that take
  738 + ``FILE*`` arguments and that allow preservation of end-of-line
  739 + characters. This also fixes a bug where
  740 + ``QUtil::read_lines_from_file`` wouldn't work properly with
  741 + Unicode filenames.
  742 +
  743 + - CLI Enhancements
  744 +
  745 + - Added options :samp:`--is-encrypted` and
  746 + :samp:`--requires-password` for testing whether
  747 + a file is encrypted or requires a password other than the
  748 + supplied (or empty) password. These communicate via exit
  749 + status, making them useful for shell scripts. They also work on
  750 + encrypted files with unknown passwords.
  751 +
  752 + - Added ``encrypt`` key to JSON options. With the exception of
  753 + the reconstructed user password for older encryption formats,
  754 + this provides the same information as
  755 + :samp:`--show-encryption` but in a consistent,
  756 + parseable format. See output of :command:`qpdf
  757 + --json-help` for details.
  758 +
  759 + - Bug Fixes
  760 +
  761 + - In QDF mode, be sure not to write more than one XRef stream to
  762 + a file, even when
  763 + :samp:`--preserve-unreferenced` is used.
  764 + :command:`fix-qdf` assumes that there is only
  765 + one XRef stream, and that it appears at the end of the file.
  766 +
  767 + - When externalizing inline images, properly handle images whose
  768 + color space is a reference to an object in the page's resource
  769 + dictionary.
  770 +
  771 + - Windows-specific fix for acquiring crypt context with a new
  772 + keyset.
  773 +
  774 +9.1.0: November 17, 2019
  775 + - Build Changes
  776 +
  777 + - A C++11 compiler is now required to build qpdf.
  778 +
  779 + - A new crypto provider that uses gnutls for crypto functions is
  780 + now available and can be enabled at build time. See :ref:`ref.crypto` for more information about crypto
  781 + providers and :ref:`ref.crypto.build` for specific information about
  782 + the build.
  783 +
  784 + - Library Enhancements
  785 +
  786 + - Incorporate contribution from Masamichi Hosoda to properly
  787 + handle signature dictionaries by not including them in object
  788 + streams, formatting the ``/Contents`` key as a hexadecimal
  789 + string, and excluding the ``/Contents`` key from encryption and
  790 + decryption.
  791 +
  792 + - Incorporate contribution from Masamichi Hosoda to provide new
  793 + API calls for getting file-level information about input and
  794 + output files, enabling certain operations on the files at the
  795 + file level rather than the object level. New methods include
  796 + ``QPDF::getXRefTable()``,
  797 + ``QPDFObjectHandle::getParsedOffset()``,
  798 + ``QPDFWriter::getRenumberedObjGen(QPDFObjGen)``, and
  799 + ``QPDFWriter::getWrittenXRefTable()``.
  800 +
  801 + - Support build-time and runtime selectable crypto providers.
  802 + This includes the addition of new classes
  803 + ``QPDFCryptoProvider`` and ``QPDFCryptoImpl`` and the
  804 + recognition of the ``QPDF_CRYPTO_PROVIDER`` environment
  805 + variable. Crypto providers are described in depth in :ref:`ref.crypto`.
  806 +
  807 + - CLI Enhancements
  808 +
  809 + - Addition of the :samp:`--show-crypto` option in
  810 + support of selectable crypto providers, as described in :ref:`ref.crypto`.
  811 +
  812 + - Allow ``:even`` or ``:odd`` to be appended to numeric ranges
  813 + for specification of the even or odd pages from among the pages
  814 + specified in the range.
  815 +
  816 + - Fix shell wildcard expansion behavior (``*`` and ``?``) of
  817 + :command:`qpdf.exe` as built by MSVC.
  818 +
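The ``:even`` and ``:odd`` modifiers introduced in 9.1.0 can be pictured with a short sketch. It is not qpdf's implementation: the ``Parity`` type and ``select_pages`` function are hypothetical names, and the sketch assumes that "even" and "odd" refer to positions within the selected range rather than to the page numbers themselves.

```cpp
#include <cassert>
#include <vector>

// Hypothetical illustration only; these names are not part of qpdf.
enum class Parity { all, odd, even };

// Return pages first..last (1-based, inclusive), optionally keeping only
// the entries at odd or even positions within that range (assumption:
// parity is judged by position in the range, not by page number).
std::vector<int> select_pages(int first, int last, Parity parity)
{
    assert(first >= 1 && last >= first);
    std::vector<int> result;
    int position = 1;
    for (int page = first; page <= last; ++page, ++position) {
        bool const is_odd = (position % 2) == 1;
        if (parity == Parity::all ||
            (parity == Parity::odd && is_odd) ||
            (parity == Parity::even && !is_odd)) {
            result.push_back(page);
        }
    }
    return result;
}
```

Under that assumption, a range of 2 through 7 with the odd modifier yields pages 2, 4, and 6, and the same range with the even modifier yields pages 3, 5, and 7.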
  819 +9.0.2: October 12, 2019
  820 + - Bug Fix
  821 +
  822 + - Fix the name of the temporary file used by
  823 + :samp:`--replace-input` so that it doesn't
  824 + require path splitting and works with paths that include
  825 + directories.
  826 +
  827 +9.0.1: September 20, 2019
  828 + - Bug Fixes/Enhancements
  829 +
  830 + - Fix some build and test issues on big-endian systems and
  831 + compilers with characters that are unsigned by default. The
  832 + problems were in build and test only. There were no actual bugs
  833 + in the qpdf library itself relating to endianness or unsigned
  834 + characters.
  835 +
  836 + - When a dictionary has a duplicated key, report this with a
  837 + warning. The behavior of the library in this case is unchanged,
  838 + but the error condition is no longer silently ignored.
  839 +
  840 + - When a form field's display rectangle is erroneously specified
  841 + with inverted coordinates, detect and correct this situation.
  842 + This prevents some form fields from being flipped when flattening
  843 + annotations on files with this condition.
  844 +
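The correction of inverted display rectangles mentioned for 9.0.1 amounts to reordering coordinates. The following is a minimal sketch of that idea, not the code qpdf uses; the ``Rect`` type and ``normalize_rect`` function are hypothetical and assume a rectangle given as lower-left and upper-right corners with finite coordinates.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical rectangle type: lower-left (llx, lly), upper-right (urx, ury).
struct Rect
{
    double llx;
    double lly;
    double urx;
    double ury;
};

// Return a copy of r whose corners are ordered so that llx <= urx and
// lly <= ury. Coordinates are assumed to be finite (no NaN values).
Rect normalize_rect(Rect const& r)
{
    Rect out;
    out.llx = std::min(r.llx, r.urx);
    out.lly = std::min(r.lly, r.ury);
    out.urx = std::max(r.llx, r.urx);
    out.ury = std::max(r.lly, r.ury);
    assert(out.llx <= out.urx && out.lly <= out.ury);
    return out;
}
```

A rectangle that is already well ordered passes through unchanged, so the function can be applied unconditionally before computing a transformation.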
  845 +9.0.0: August 31, 2019
  846 + - Incompatible API (source-level) Changes (minor)
  847 +
  848 + - The method ``QUtil::strcasecmp`` has been renamed to
  849 + ``QUtil::str_compare_nocase``. This incompatible change is
  850 + necessary to enable qpdf to build on platforms that define
  851 + ``strcasecmp`` as a macro.
  852 +
  853 + - The ``QPDF::copyForeignObject`` method had an overloaded
  854 + version that took a boolean parameter that was not used. If you
  855 + were using this version, just omit the extra parameter.
  856 +
  857 + - There was a version of ``QPDFTokenizer::expectInlineImage`` that
  858 + took no arguments. This version has been removed since it
  859 + caused the tokenizer to return incorrect inline images. A new
  860 + version was added some time ago that produces correct output.
  861 + This is a very low level method that doesn't make sense to call
  862 + outside of qpdf's lexical engine. There are higher level
  863 + methods for tokenizing content streams.
  864 +
  865 + - Change ``QPDFOutlineDocumentHelper::getTopLevelOutlines`` and
  866 + ``QPDFOutlineObjectHelper::getKids`` to return a
  867 + ``std::vector`` instead of a ``std::list`` of
  868 + ``QPDFOutlineObjectHelper`` objects.
  869 +
  870 + - Remove method ``QPDFTokenizer::allowPoundAnywhereInName``. This
  871 + function would allow creation of name tokens whose value would
  872 + change when unparsed, which is never the correct behavior.
  873 +
  874 + - CLI Enhancements
  875 +
  876 + - The :samp:`--replace-input` option may be given
  877 + in place of an output file name. This causes qpdf to overwrite
  878 + the input file with the output. See the description of
  879 + :samp:`--replace-input` in :ref:`ref.basic-options` for more details.
  880 +
  881 + - The :samp:`--recompress-flate` option instructs
  882 + :command:`qpdf` to recompress streams that are
  883 + already compressed with ``/FlateDecode``. Useful with
  884 + :samp:`--compression-level`.
  885 +
  886 + - The
  887 + :samp:`--compression-level={level}`
  888 + option sets the zlib compression level used for any streams compressed
  889 + by ``/FlateDecode``. Most effective when combined with
  890 + :samp:`--recompress-flate`.
  891 +
  892 + - Library Enhancements
  893 +
  894 + - A new namespace ``QIntC``, provided by
  895 + :file:`qpdf/QIntC.hh`, provides safe
  896 + conversion methods between different integer types. These
  897 + conversion methods do range checking to ensure that the cast
  898 + can be performed with no loss of information. Every use of
  899 + ``static_cast`` in the library was inspected to see if it could
  900 + use one of these safe converters instead. See :ref:`ref.casting` for additional details.
  901 +
  902 + - Method ``QPDF::anyWarnings`` tells whether there have been any
  903 + warnings without clearing the list of warnings.
  904 +
  905 + - Method ``QPDF::closeInputSource`` closes or otherwise releases
  906 + the input source. This enables the input file to be deleted or
  907 + renamed.
  908 +
  909 + - New methods have been added to ``QUtil`` for converting back
  910 + and forth between strings and unsigned integers:
  911 + ``uint_to_string``, ``uint_to_string_base``,
  912 + ``string_to_uint``, and ``string_to_ull``.
  913 +
  914 + - New methods have been added to ``QPDFObjectHandle`` that return
  915 + the value of ``Integer`` objects as ``int`` or ``unsigned int``
  916 + with range checking and sensible fallback values, and a new
  917 + method was added to return an unsigned value. This makes it
  918 + easier to write code that is safe from unintentional data loss.
  919 + Functions: ``getUIntValue``, ``getIntValueAsInt``,
  920 + ``getUIntValueAsUInt``.
  921 +
  922 + - When parsing content streams with
  923 + ``QPDFObjectHandle::ParserCallbacks``, in place of the method
  924 + ``handleObject(QPDFObjectHandle)``, the developer may override
  925 + ``handleObject(QPDFObjectHandle, size_t offset, size_t
  926 + length)``. If this method is defined, it will
  927 + be invoked with the object along with its offset and length
  928 + within the overall contents being parsed. Intervening spaces
  929 + and comments are not included in offset and length.
  930 + Additionally, a new method ``contentSize(size_t)`` may be
  931 + implemented. If present, it will be called prior to the first
  932 + call to ``handleObject`` with the total size in bytes of the
  933 + combined contents.
  934 +
  935 + - New methods ``QPDF::userPasswordMatched`` and
  936 + ``QPDF::ownerPasswordMatched`` have been added to enable a
  937 + caller to determine whether the supplied password was the user
  938 + password, the owner password, or both. This information is also
  939 + displayed by :command:`qpdf --show-encryption`
  940 + and :command:`qpdf --check`.
  941 +
  942 + - Static method ``Pl_Flate::setCompressionLevel`` can be called
  943 + to set the zlib compression level globally used by all
  944 + instances of ``Pl_Flate`` in deflate mode.
  945 +
  946 + - The method ``QPDFWriter::setRecompressFlate`` can be called to
  947 + tell ``QPDFWriter`` to uncompress and recompress streams
  948 + already compressed with ``/FlateDecode``.
  949 +
  950 + - The underlying implementation of QPDF arrays has been enhanced
  951 + to be much more memory efficient when dealing with arrays with
  952 + lots of nulls. This enables qpdf to use drastically less memory
  953 + for certain types of files.
  954 +
  955 + - When traversing the pages tree, if nodes are encountered with
  956 + invalid types, the types are fixed, and a warning is issued.
  957 +
  958 + - A new helper method ``QUtil::read_file_into_memory`` was added.
  959 +
  960 + - All conditions previously reported by
  961 + ``QPDF::checkLinearization()`` as errors are now presented as
  962 + warnings.
  963 +
  964 + - Name tokens containing the ``#`` character not preceded by two
  965 + hexadecimal digits, which is invalid in PDF 1.2 and above, are
  966 + properly handled by the library: a warning is generated, and
  967 + the name token is properly preserved, even if invalid, in the
  968 + output. See :file:`ChangeLog` for a more
  969 + complete description of this change.
  970 +
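The range-checked conversions described above for ``QIntC`` follow a common pattern that can be sketched independently of qpdf. The ``checked_cast`` template below is a hypothetical helper written for illustration; it is not taken from :file:`qpdf/QIntC.hh`, and the real interface may differ.

```cpp
#include <cassert>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper, not part of qpdf: convert between integer types,
// throwing std::range_error if the value does not fit in the target type.
template <typename To, typename From>
To checked_cast(From value)
{
    static_assert(
        std::is_integral<To>::value && std::is_integral<From>::value,
        "checked_cast only supports integer types");

    if (std::is_signed<From>::value && value < 0) {
        // Negative values never fit in an unsigned target; for a signed
        // target, compare against its minimum using the widest signed type.
        if (!std::is_signed<To>::value ||
            static_cast<long long>(value) <
                static_cast<long long>(std::numeric_limits<To>::min())) {
            throw std::range_error("integer conversion would underflow");
        }
    } else if (
        static_cast<unsigned long long>(value) >
        static_cast<unsigned long long>(std::numeric_limits<To>::max())) {
        throw std::range_error("integer conversion would overflow");
    }

    To result = static_cast<To>(value);
    // Sanity check: converting back must reproduce the original value.
    assert(static_cast<From>(result) == value);
    return result;
}
```

Failing loudly is the point of the design: a plain ``static_cast`` would silently wrap or truncate, which is the kind of unintentional data loss the release notes describe.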
  971 + - Bug Fixes
  972 +
  973 + - A small handful of memory issues, assertion failures, and
  974 + unhandled exceptions that could occur on badly mangled input
  975 + files have been fixed. Most of these problems were found by
  976 + Google's OSS-Fuzz project.
  977 +
  978 + - When :command:`qpdf --check` or
  979 + :command:`qpdf --check-linearization` encounters
  980 + a file with linearization warnings but not errors, it now
  981 + properly exits with exit code 3 instead of 2.
  982 +
  983 + - The :samp:`--completion-bash` and
  984 + :samp:`--completion-zsh` options now work
  985 + properly when qpdf is invoked as an AppImage.
  986 +
  987 + - Calling ``QPDFWriter::set*EncryptionParameters`` on a
  988 + ``QPDFWriter`` object whose output filename has not yet been
  989 + set no longer produces a segmentation fault.
  990 +
  991 + - When reading encrypted files, follow the spec more closely
  992 + regarding encryption key length. This allows qpdf to open
  993 + encrypted files in most cases when they have invalid or missing
  994 + ``/Length`` keys in the encryption dictionary.
  995 +
  996 + - Build Changes
  997 +
  998 + - On platforms that support it, qpdf now builds with
  999 + :samp:`-fvisibility=hidden`. If you build qpdf
  1000 + with your own build system, this is now safe to use. This
  1001 + prevents methods that are not part of the public API from being
  1002 + exported by the shared library, and makes qpdf's ELF shared
  1003 + libraries (used on Linux, MacOS, and most other UNIX flavors)
  1004 + behave more like the Windows DLL. Since the DLL already behaves
  1005 + in much this way, it is unlikely that there are any methods
  1006 + that were accidentally not exported. However, with ELF shared
  1007 + libraries, typeinfo for some classes has to be explicitly
  1008 + exported. If there are problems in dynamically linked code
  1009 + catching exceptions or subclassing, this could be the reason.
  1010 + If you see this, please report a bug at
  1011 + https://github.com/qpdf/qpdf/issues/.
  1012 +
  1013 + - QPDF is now compiled with integer conversion and sign
  1014 + conversion warnings enabled. Numerous changes were made to the
  1015 + library to make this safe.
  1016 +
  1017 + - QPDF's :command:`make install` target explicitly
  1018 + specifies the mode to use when installing files instead of
  1019 + relying on the user's umask. It was previously doing this for some
  1020 + files but not others.
  1021 +
  1022 + - If :command:`pkg-config` is available, use it to
  1023 + locate :file:`libjpeg` and
  1024 + :file:`zlib` dependencies, falling back on
  1025 + old behavior if unsuccessful.
  1026 +
  1027 + - Other Notes
  1028 +
  1029 + - QPDF has been fully integrated into `Google's OSS-Fuzz
  1030 + project <https://github.com/google/oss-fuzz>`__. This project
  1031 + exercises code with randomly mutated inputs and is great for
  1032 + discovering hidden crashes and security issues.
  1033 + Several bugs found by oss-fuzz have already been fixed in qpdf.
  1034 +
  1035 +8.4.2: May 18, 2019
  1036 + This release has just one change: correction of a buffer overrun in
  1037 + the Windows code used to open files. Windows users should take this
  1038 + update. There are no code changes that affect non-Windows releases.
  1039 +
  1040 +8.4.1: April 27, 2019
  1041 + - Enhancements
  1042 +
  1043 + - When :command:`qpdf --version` is run, it will
  1044 + detect if the qpdf CLI was built with a different version of
  1045 + qpdf than the library, which may indicate a problem with the
  1046 + installation.
  1047 +
  1048 + - New option :samp:`--remove-page-labels` will
  1049 + remove page labels before generating output. This used to
  1050 + happen if you ran :command:`qpdf --empty --pages ..
  1051 + --`, but the behavior changed in qpdf 8.3.0. This
  1052 + option enables people who were relying on the old behavior to
  1053 + get it again.
  1054 +
  1055 + - New option
  1056 + :samp:`--keep-files-open-threshold={count}`
  1057 + can be used to override the number of files that qpdf will use to
  1058 + trigger the behavior of not keeping all files open when merging
  1059 + files. This may be necessary if your system allows fewer than
  1060 + the default value of 200 files to be open at the same time.
  1061 +
  1062 + - Bug Fixes
  1063 +
  1064 + - Handle Unicode characters in filenames on Windows. The changes
  1065 + to support Unicode on the CLI in Windows broke Unicode
  1066 + filenames for Windows.
  1067 +
  1068 + - Slightly tighten logic that determines whether an object is a
  1069 + page. This should resolve problems in some rare files where
  1070 + some non-page objects were passing qpdf's test for whether
  1071 + something was a page, thus causing them to be erroneously lost
  1072 + during page splitting operations.
  1073 +
  1074 + - Revert change that included preservation of outlines
  1075 + (bookmarks) in :samp:`--split-pages`. The way
  1076 + it was implemented in 8.3.0 and 8.4.0 caused a very significant
  1077 + degradation of performance for splitting certain files. A
  1078 + future release of qpdf may re-introduce the behavior in a more
  1079 + performant and also more correct fashion.
  1080 +
  1081 + - In JSON mode, add missing leading 0 to decimal values between
  1082 + -1 and 1 even if not present in the input. The JSON
  1083 + specification requires the leading 0. The PDF specification
  1084 + does not.
  1085 +
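The leading-zero fix for JSON output in 8.4.1 can be illustrated with a small sketch. This is not qpdf's code: ``add_leading_zero`` is a hypothetical function that assumes its argument is a non-empty, otherwise valid numeric token and handles only the ``.5`` and ``-.5`` forms.

```cpp
#include <cassert>
#include <string>

// Hypothetical illustration: make a PDF-style real number acceptable to
// JSON by inserting the leading 0 that JSON requires before the decimal
// point. Only the forms ".5" and "-.5" are rewritten.
std::string add_leading_zero(std::string const& number)
{
    assert(!number.empty());
    if (number[0] == '.') {
        return "0" + number;
    }
    if (number.size() >= 2 && number[0] == '-' && number[1] == '.') {
        return "-0" + number.substr(1);
    }
    return number;
}
```

Numbers that already have a digit before the decimal point are returned unchanged.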
  1086 +8.4.0: February 1, 2019
  1087 + - Command-line Enhancements
  1088 +
  1089 + - *Non-compatible CLI change:* The qpdf command-line tool
  1090 + interprets passwords given at the command-line differently from
  1091 + previous releases when the passwords contain non-ASCII
  1092 + characters. In some cases, the behavior differs from previous
  1093 + releases. For a discussion of the current behavior, please see
  1094 + :ref:`ref.unicode-passwords`. The
  1095 + incompatibilities are as follows:
  1096 +
  1097 + - On Windows, qpdf now receives all command-line options as
  1098 + Unicode strings if it can figure out the appropriate
  1099 + compile/link options. This is enabled at least for MSVC and
  1100 + mingw builds. That means that if non-ASCII strings are
  1101 + passed to the qpdf CLI in Windows, qpdf will now correctly
  1102 + receive them. In the past, they would have either been
  1103 + encoded as Windows code page 1252 (also known as "Windows
  1104 + ANSI") or as something unintelligible. In almost all cases,
  1105 + qpdf is able to properly interpret Unicode arguments now,
  1106 + whereas in the past, it would almost never interpret them
  1107 + properly. The result is that non-ASCII passwords given to
  1108 + the qpdf CLI on Windows now have a much greater chance of
  1109 + creating PDF files that can be opened by a variety of
  1110 + readers. In the past, usually files encrypted from the
  1111 + Windows CLI using non-ASCII passwords would not be readable
  1112 + by most viewers. Note that the current version of qpdf is
  1113 + able to decrypt files that it previously created using the
  1114 + previously supplied password.
  1115 +
  1116 + - The PDF specification requires passwords to be encoded as
  1117 + UTF-8 for 256-bit encryption and with PDF Doc encoding for
  1118 + 40-bit or 128-bit encryption. Older versions of qpdf left it
  1119 + up to the user to provide passwords with the correct
  1120 + encoding. The qpdf CLI now detects when a password is given
  1121 + with UTF-8 encoding and automatically transcodes it to what
  1122 + the PDF spec requires. While this is almost always the
  1123 + correct behavior, it is possible to override the behavior if
  1124 + there is some reason to do so. This is discussed in more
  1125 + depth in :ref:`ref.unicode-passwords`.
  1126 +
  1127 + - New options
  1128 + :samp:`--externalize-inline-images`,
  1129 + :samp:`--ii-min-bytes`, and
  1130 + :samp:`--keep-inline-images` control qpdf's
  1131 + handling of inline images and possible conversion of them to
  1132 + regular images. By default,
  1133 + :samp:`--optimize-images` now also applies to
  1134 + inline images. These options are discussed in :ref:`ref.advanced-transformation`.
  1135 +
  1136 + - Add options :samp:`--overlay` and
  1137 + :samp:`--underlay` for overlaying or
  1138 + underlaying pages of other files onto output pages. See
  1139 + :ref:`ref.overlay-underlay` for
  1140 + details.
  1141 +
  1142 + - When opening an encrypted file with a password, if the
  1143 + specified password doesn't work and the password contains any
  1144 + non-ASCII characters, qpdf will try a number of alternative
  1145 + passwords to try to compensate for possible character encoding
  1146 + errors. This behavior can be suppressed with the
  1147 + :samp:`--suppress-password-recovery` option.
  1148 + See :ref:`ref.unicode-passwords` for a full
  1149 + discussion.
  1150 +
  1151 + - Add the :samp:`--password-mode` option to
  1152 + fine-tune how qpdf interprets password arguments, especially
  1153 + when they contain non-ASCII characters. See :ref:`ref.unicode-passwords` for more information.
  1154 +
  1155 + - In the :samp:`--pages` option, it is now
  1156 + possible to copy the same page more than once from the same
  1157 + file without using the previous workaround of specifying two
  1158 + different paths to the same file.
  1159 +
  1160 + - In the :samp:`--pages` option, allow use of "."
  1161 + as a shortcut for the primary input file. That way, you can do
  1162 + :command:`qpdf in.pdf --pages . 1-2 -- out.pdf`
  1163 + instead of having to repeat :file:`in.pdf`
  1164 + in the command.
  1165 +
  1166 + - When encrypting with 128-bit and 256-bit encryption, new
  1167 + encryption options :samp:`--assemble`,
  1168 + :samp:`--annotate`,
  1169 + :samp:`--form`, and
  1170 + :samp:`--modify-other` allow finer-grained
  1171 + control over permissions. Before, the
  1172 + :samp:`--modify` option only configured certain
  1173 + predefined groups of permissions.
  1174 +
  1175 + - Bug Fixes and Enhancements
  1176 +
  1177 + - *Potential data-loss bug:* Versions of qpdf between 8.1.0 and
  1178 + 8.3.0 had a bug that could cause page splitting and merging
  1179 + operations to drop some font or image resources if the PDF
  1180 + file's internal structure shared these resource lists across
  1181 + pages and if some but not all of the pages in the output did
  1182 + not reference all the fonts and images. Using the
  1183 + :samp:`--preserve-unreferenced-resources`
  1184 + option would work around the incorrect behavior. This bug was
  1185 + the result of a typo in the code and a deficiency in the test
  1186 + suite. The case that triggered the error was known, just not
  1187 + handled properly. This case is now exercised in qpdf's test
  1188 + suite and properly handled.
  1189 +
  1190 + - When optimizing images, detect and refuse to optimize images
  1191 + that can't be converted to JPEG because of bit depth or color
  1192 + space.
  1193 +
  1194 + - Linearization and page manipulation APIs now detect and recover
  1195 + from files that have duplicate Page objects in the pages tree.
  1196 +
  1197 + - When the older option
  1198 + :samp:`--stream-data=compress` was used with object
  1199 + streams, object streams and xref streams were not compressed.
  1200 +
  1201 + - When the tokenizer returns inline image tokens, delimiters
  1202 + following ``ID`` and ``EI`` operators are no longer excluded.
  1203 + This makes it possible to reliably extract the actual image
  1204 + data.
  1205 +
  1206 + - Library Enhancements
  1207 +
  1208 + - Add method ``QPDFPageObjectHelper::externalizeInlineImages`` to
  1209 + convert inline images to regular images.
  1210 +
  1211 + - Add method ``QUtil::possible_repaired_encodings()`` to generate
  1212 + a list of strings that represent other ways the given string
  1213 + could have been encoded. This is the method the QPDF CLI uses
  1214 + to generate the strings it tries when recovering incorrectly
  1215 + encoded Unicode passwords.
  1216 +
  1217 + - Add new versions of
  1218 + ``QPDFWriter::setR{3,4,5,6}EncryptionParameters`` that allow
  1219 + more granular setting of permissions bits. See
  1220 + :file:`QPDFWriter.hh` for details.
  1221 +
  1222 + - Add new versions of the transcoders from UTF-8 to single-byte
  1223 + coding systems in ``QUtil`` that report success or failure
  1224 + rather than just substituting a specified unknown character.
  1225 +
  1226 + - Add method ``QUtil::analyze_encoding()`` to determine whether a
  1227 + string has high-bit characters and whether it appears to be UTF-16 or
  1228 + valid UTF-8 encoding.
  1229 +
  1230 + - Add new method ``QPDFPageObjectHelper::shallowCopyPage()`` to
  1231 + create a new page that is a "shallow copy" of a page. The
  1232 + resulting object is an indirect object ready to be passed to
  1233 + ``QPDFPageDocumentHelper::addPage()`` for either the original
  1234 + ``QPDF`` object or a different one. This is what the
  1235 + :command:`qpdf` command-line tool uses to copy
  1236 + the same page multiple times from the same file during
  1237 + splitting and merging operations.
  1238 +
  1239 + - Add method ``QPDF::getUniqueId()``, which returns a unique
  1240 + identifier for the given QPDF object. The identifier will be
  1241 + unique across the life of the application. The returned value
  1242 + can be safely used as a map key.
  1243 +
  1244 + - Add method ``QPDF::setImmediateCopyFrom``. This further
  1245 + enhances qpdf's ability to allow a ``QPDF`` object from which
  1246 + objects are being copied to go out of scope before the
  1247 + destination object is written. If you call this method on a
  1248 + ``QPDF`` instance, objects copied *from* this instance will be
  1249 + copied immediately instead of lazily. This option uses more
  1250 + memory but allows the source object to go out of scope before
  1251 + the destination object is written in all cases. See comments in
  1252 + :file:`QPDF.hh` for details.
  1253 +
  1254 + - Add method ``QPDFPageObjectHelper::getAttribute`` for
  1255 + retrieving an attribute from the page dictionary taking
  1256 + inheritance into consideration, and optionally making a copy if
  1257 + your intention is to modify the attribute.
  1258 +
  1259 + - Fix long-standing limitation of
  1260 + ``QPDFPageObjectHelper::getPageImages`` so that it now properly
  1261 + reports images from inherited resources dictionaries,
  1262 + eliminating the need to call
  1263 + ``QPDFPageDocumentHelper::pushInheritedAttributesToPage`` in
  1264 + this case.
  1265 +
  1266 + - Add method ``QPDFObjectHandle::getUniqueResourceName`` for
  1267 + finding an unused name in a resource dictionary.
  1268 +
  1269 + - Add method ``QPDFPageObjectHelper::getFormXObjectForPage`` for
  1270 + generating a form XObject equivalent to a page. The resulting
  1271 + object can be used in the same file or copied to another file
  1272 + with ``copyForeignObject``. This can be useful for implementing
  1273 + underlay, overlay, n-up, thumbnails, or any other functionality
  1274 + requiring replication of pages in other contexts.
  1275 +
  1276 + - Add method ``QPDFPageObjectHelper::placeFormXObject`` for
  1277 + generating content stream text that places a given form XObject
  1278 + on a page, centered and fit within a specified rectangle. This
  1279 + method takes care of computing the proper transformation matrix
  1280 + and may optionally compensate for rotation or scaling of the
  1281 + destination page.
  1282 +
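Finding an unused name in a resource dictionary, as described above for ``getUniqueResourceName``, is essentially a search over candidate names. The sketch below shows one way to do it under stated assumptions: ``unique_name`` is a hypothetical stand-in, the set of used names is modeled as a ``std::set``, and candidates are formed by appending an increasing integer suffix to a prefix. The actual qpdf method may behave differently.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical illustration, not the qpdf API: return the first name of
// the form prefix + N that is not in `used`, starting the search at
// min_suffix. On return, min_suffix holds the suffix that was chosen so
// that a later search can resume from there.
std::string unique_name(
    std::set<std::string> const& used,
    std::string const& prefix,
    int& min_suffix)
{
    assert(min_suffix >= 0);
    for (;;) {
        std::string candidate = prefix + std::to_string(min_suffix);
        if (used.count(candidate) == 0) {
            return candidate;
        }
        ++min_suffix;
    }
}
```

Passing the suffix by reference avoids rescanning names already known to be taken when several new resources are added in a row.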
  1283 + - Build Improvements
  1284 +
  1285 + - Add new configure option
  1286 + :samp:`--enable-avoid-windows-handle`, which
  1287 + causes the preprocessor symbol ``AVOID_WINDOWS_HANDLE`` to be
  1288 + defined. When defined, qpdf will avoid referencing the Windows
  1289 + ``HANDLE`` type, which is disallowed with certain versions of
  1290 + the Windows SDK.
  1291 +
  1292 + - For Windows builds, attempt to determine what options, if any,
  1293 + have to be passed to the compiler and linker to enable use of
  1294 + ``wmain``. This causes the preprocessor symbol
  1295 + ``WINDOWS_WMAIN`` to be defined. If you do your own builds with
  1296 + other compilers, you can define this symbol to cause ``wmain``
  1297 + to be used. This is needed to allow the Windows
  1298 + :command:`qpdf` command to receive Unicode
  1299 + command-line options.
  1300 +
  1301 +8.3.0: January 7, 2019
  1302 + - Command-line Enhancements
  1303 +
  1304 + - Shell completion: you can now use eval :command:`$(qpdf
  1305 + --completion-bash)` and eval :command:`$(qpdf
  1306 + --completion-zsh)` to enable shell completion for
  1307 + bash and zsh.
  1308 +
  1309 + - Page numbers (also known as page labels) are now preserved when
  1310 + merging and splitting files with the
  1311 + :samp:`--pages` and
  1312 + :samp:`--split-pages` options.
  1313 +
  1314 + - Bookmarks are partially preserved when splitting pages with the
  1315 + :samp:`--split-pages` option. Specifically, the
  1316 + outlines dictionary and some supporting metadata are copied
  1317 + into the split files. The result is that all bookmarks from the
  1318 + original file appear, those that point to pages that are
  1319 + preserved work, and those that point to pages that are not
  1320 + preserved don't do anything. This is an interim step toward
  1321 + proper support for bookmarks in splitting and merging
  1322 + operations.
  1323 +
  1324 + - Page collation: add new option
  1325 + :samp:`--collate`. When specified, the
  1326 + semantics of :samp:`--pages` change from
  1327 + concatenation to collation. See :ref:`ref.page-selection` for examples and discussion.
  1328 +
  1329 + - Generation of information in JSON format, primarily to
  1330 + facilitate use of qpdf from languages other than C++. Add new
  1331 + options :samp:`--json`,
  1332 + :samp:`--json-key`, and
  1333 + :samp:`--json-object` to generate a JSON
  1334 + representation of the PDF file. Run :command:`qpdf
  1335 + --json-help` to get a description of the JSON
  1336 + format. For more information, see :ref:`ref.json`.
  1337 +
  1338 + - The :samp:`--generate-appearances` flag will
  1339 + cause qpdf to generate appearances for form fields if the PDF
  1340 + file indicates that form field appearances are out of date.
  1341 + This can happen when PDF forms are filled in by a program that
  1342 + doesn't know how to regenerate the appearances of the filled-in
  1343 + fields.
  1344 +
  1345 + - The :samp:`--flatten-annotations` flag can be
  1346 + used to *flatten* annotations, including form fields.
  1347 + Ordinarily, annotations are drawn separately from the page.
  1348 + Flattening annotations is the process of combining their
  1349 + appearances into the page's contents. You might want to do this
  1350 + if you are going to rotate or combine pages using a tool that
  1351 + doesn't understand annotations. You may also want to use
  1352 + :samp:`--generate-appearances` when using this
  1353 + flag since annotations for outdated form fields are not
  1354 + flattened as that would cause loss of information.
  1355 +
  1356 + - The :samp:`--optimize-images` flag tells qpdf
  1357 + to recompress every image using DCT (JPEG) compression as
  1358 + long as the image is not already compressed with lossy
  1359 + compression and recompressing the image reduces its size. The
  1360 + additional options :samp:`--oi-min-width`,
  1361 + :samp:`--oi-min-height`, and
  1362 + :samp:`--oi-min-area` prevent recompression of
  1363 + images whose width, height, or pixel area (width × height) are
  1364 + below a specified threshold.
  1365 +
  1366 + - The :samp:`--show-object` option can now be
  1367 + given as :samp:`--show-object=trailer` to show
  1368 + the trailer dictionary.
  1369 +
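The difference between concatenation and collation for ``--collate`` can be sketched as follows. This is an illustration rather than qpdf's implementation: ``collate`` is a hypothetical function, pages are modeled as strings, and the sketch assumes that an input which runs out of pages is simply skipped in later rounds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration: take the first page from each input, then
// the second page from each input, and so on. Inputs that have run out
// of pages are skipped (assumption).
std::vector<std::string>
collate(std::vector<std::vector<std::string>> const& inputs)
{
    std::size_t longest = 0;
    for (auto const& pages : inputs) {
        longest = std::max(longest, pages.size());
    }

    std::size_t total = 0;
    std::vector<std::string> result;
    for (std::size_t i = 0; i < longest; ++i) {
        for (auto const& pages : inputs) {
            if (i < pages.size()) {
                result.push_back(pages[i]);
            }
        }
    }
    for (auto const& pages : inputs) {
        total += pages.size();
    }
    // Every input page must appear exactly once in the output.
    assert(result.size() == total);
    return result;
}
```

With inputs of three pages and two pages, concatenation would give a1 a2 a3 b1 b2, while this collation gives a1 b1 a2 b2 a3.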
  1370 + - Bug Fixes and Enhancements
  1371 +
  1372 + - QPDF now automatically detects and recovers from dangling
  1373 + references. If a PDF file contained an indirect reference to a
  1374 + non-existent object, which is valid, when adding a new object
  1375 + to the file, it was possible for the new object to take the
  1376 + object ID of the dangling reference, thereby causing the
  1377 + dangling reference to point to the new object. This case is now
  1378 + prevented.
  1379 +
  1380 + - Fixes to form field setting code: strings are always written in
  1381 + UTF-16 format, and checkboxes and radio buttons are handled
  1382 + properly with respect to synchronization of values and
  1383 + appearance states.
  1384 +
  1385 + - ``QPDF::checkLinearization()`` no longer causes the program
  1386 + to crash when it detects problems with linearization data.
  1387 + Instead, it issues a normal warning or error.
  1388 +
  1389 + - Ordinarily qpdf treats an argument of the form
  1390 + :samp:`@file` to mean that command-line options
  1391 + should be read from :file:`file`. Now, if
  1392 + :file:`file` does not exist but
  1393 + :file:`@file` does, qpdf will treat
  1394 + :file:`@file` as a regular option. This
  1395 + makes it possible to work more easily with PDF files whose
  1396 + names happen to start with the ``@`` character.
  1397 +
  1398 + - Library Enhancements
  1399 +
  1400 + - Remove the restriction in most cases that the source QPDF
  1401 + object used in a ``QPDF::copyForeignObject`` call has to stick
  1402 + around until the destination QPDF is written. The exceptional
  1403 + case is when the source stream gets its data using a
  1404 + ``QPDFObjectHandle::StreamDataProvider``. For a more in-depth
  1405 + discussion, see comments around ``copyForeignObject`` in
  1406 + :file:`QPDF.hh`.
  1407 +
  1408 + - Add new method ``QPDFWriter::getFinalVersion()``, which returns
  1409 + the PDF version that will ultimately be written to the final
  1410 + file. See comments in :file:`QPDFWriter.hh`
  1411 + for some restrictions on its use.
  1412 +
  1413 + - Add several methods for transcoding strings to some of the
  1414 + character sets used in PDF files: ``QUtil::utf8_to_ascii``,
  1415 + ``QUtil::utf8_to_win_ansi``, ``QUtil::utf8_to_mac_roman``, and
  1416 + ``QUtil::utf8_to_utf16``. For the single-byte encodings that
  1417 + support only a limited character set, these methods replace
  1418 + unsupported characters with a specified substitute.
  1419 +
  1420 + - Add new methods to ``QPDFAnnotationObjectHelper`` and
  1421 + ``QPDFFormFieldObjectHelper`` for querying flags and
  1422 + interpretation of different field types. Define constants in
  1423 + :file:`qpdf/Constants.h` to help with
  1424 + interpretation of flag values.
  1425 +
  1426 + - Add new methods
  1427 + ``QPDFAcroFormDocumentHelper::generateAppearancesIfNeeded`` and
  1428 + ``QPDFFormFieldObjectHelper::generateAppearance`` for
  1429 + generating appearance streams. See discussion in
  1430 + :file:`QPDFFormFieldObjectHelper.hh` for
  1431 + limitations.
  1432 +
  1433 + - Add two new helper functions for dealing with resource
  1434 + dictionaries: ``QPDFObjectHandle::getResourceNames()`` returns
  1435 + a list of all second-level keys, which correspond to the names
  1436 + of resources, and ``QPDFObjectHandle::mergeResources()`` merges
  1437 + two resource dictionaries as long as they have non-conflicting
  1438 + keys. These methods are useful for certain types of objects
  1439 + that resolve resources from multiple places, such as form
  1440 + fields.
  1441 +
  1442 + - Add methods ``QPDFPageDocumentHelper::flattenAnnotations()``
  1443 + and
  1444 + ``QPDFAnnotationObjectHelper::getPageContentForAppearance()``
  1445 + for handling low-level details of annotation flattening.
  1446 +
  1447 + - Add new helper classes: ``QPDFOutlineDocumentHelper``,
  1448 + ``QPDFOutlineObjectHelper``, ``QPDFPageLabelDocumentHelper``,
  1449 + ``QPDFNameTreeObjectHelper``, and
  1450 + ``QPDFNumberTreeObjectHelper``.
  1451 +
  1452 + - Add method ``QPDFObjectHandle::getJSON()`` that returns a JSON
  1453 + representation of the object. Call ``serialize()`` on the
  1454 + result to convert it to a string.
  1455 +
  1456 + - Add a simple JSON serializer. This is not a complete or
  1457 + general-purpose JSON library. It allows assembly and
  1458 + serialization of JSON structures with some restrictions, which
  1459 + are described in the header file. This is the serializer used
  1460 + by qpdf's new JSON representation.
  1461 +
  1462 + - Add new ``QPDFObjectHandle::Matrix`` class along with a few
  1463 + convenience methods for dealing with six-element numerical
  1464 + arrays as matrices.
  1465 +
  1466 + - Add new method ``QPDFObjectHandle::wrapInArray``, which returns
  1467 + the object itself if it is an array, or an array containing the
  1468 + object otherwise. This is a common construct in PDF. This
  1469 + method prevents you from having to explicitly test whether
  1470 + something is a single element or an array.
  1471 +
  1472 + - Build Improvements
  1473 +
  1474 + - It is no longer necessary to run
  1475 + :command:`autogen.sh` to build from a pristine
  1476 + checkout. Automatically generated files are now committed so
  1477 + that it is possible to build on platforms without autoconf
  1478 + directly from a clean checkout of the repository. The
  1479 + :command:`configure` script detects if the files
  1480 + are out of date when it also determines that the tools are
  1481 + present to regenerate them.
  1482 +
  1483 + - Pull requests and the master branch are now built automatically
  1484 + in `Azure
  1485 + Pipelines <https://dev.azure.com/qpdf/qpdf/_build>`__, which is
  1486 + free for open source projects. The build includes Linux, macOS,
  1487 + Windows 32-bit and 64-bit with mingw and MSVC, and an AppImage
  1488 + build. Official qpdf releases are now built with Azure
  1489 + Pipelines.
  1490 +
  1491 + - Notes for Packagers
  1492 +
  1493 + - A new section has been added to the documentation with notes
  1494 + for packagers. Please see :ref:`ref.packaging`.
  1495 +
  1496 + - The qpdf build detects out-of-date automatically generated files. If
  1497 + your packaging system automatically refreshes libtool or
  1498 + autoconf files, it could cause this check to fail. To avoid
  1499 + this problem, pass
  1500 + :samp:`--disable-check-autofiles` to
  1501 + :command:`configure`.
  1502 +
  1503 + - If you would like to have qpdf completion enabled
  1504 + automatically, you can install completion files in the
  1505 + distribution's default location. You can find sample completion
  1506 + files to install in the :file:`completions`
  1507 + directory.
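
The ``@file`` fallback described under "Bug Fixes and Enhancements" above can be pictured with a small sketch. This is not qpdf's implementation: the function name ``expand_args`` is hypothetical, and the one-argument-per-line file format is an assumption made for illustration. It only shows the decision order: read options from ``file`` when it exists, otherwise leave ``@file`` as an ordinary argument.

```python
import os


def expand_args(args):
    """Expand @file arguments (illustrative sketch, not qpdf's code).

    Assumption: an argument file holds one argument per line.
    """
    expanded = []
    for arg in args:
        if arg.startswith("@") and len(arg) > 1:
            path = arg[1:]
            if os.path.exists(path):
                # "file" exists: read command-line options from it.
                with open(path) as f:
                    expanded.extend(
                        line.rstrip("\n") for line in f if line.strip()
                    )
                continue
        # "file" does not exist (or the argument is not @-prefixed):
        # keep the argument literally, so a PDF named "@doc.pdf" still works.
        expanded.append(arg)
    return expanded
```

With a file ``opts`` containing two options, ``expand_args(["@opts", "in.pdf"])`` yields those two options followed by ``in.pdf``, while ``expand_args(["@doc.pdf"])`` is returned unchanged when no file named ``doc.pdf`` exists.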
  1508 +
  1509 +8.2.1: August 18, 2018
  1510 + - Command-line Enhancements
  1511 +
  1512 + - Add
  1513 + :samp:`--keep-files-open={[yn]}`
  1514 + to override default determination of whether to keep files open
  1515 + when merging. Please see the discussion of
  1516 + :samp:`--keep-files-open` in :ref:`ref.basic-options` for additional details.
  1517 +
  1518 +8.2.0: August 16, 2018
  1519 + - Command-line Enhancements
  1520 +
  1521 + - Add :samp:`--no-warn` option to suppress
  1522 + issuing warning messages. If there are any conditions that
  1523 + would have caused warnings to be issued, the exit status is
  1524 + still 3.
  1525 +
  1526 + - Bug Fixes and Optimizations
  1527 +
  1528 + - Performance fix: optimize page merging operation to avoid
  1529 + unnecessary open/close calls on files being merged. This solves
  1530 + a dramatic slow-down that was observed when merging certain
  1531 + types of files.
  1532 +
  1533 + - Optimize how memory was used for the TIFF predictor,
  1534 + drastically improving performance and memory usage for files
  1535 + containing high-resolution images compressed with Flate using
  1536 + the TIFF predictor.
  1537 +
  1538 + - Bug fix: end of line characters were not properly handled
  1539 + inside strings in some cases.
  1540 +
  1541 + - Bug fix: using :samp:`--progress` on very small
  1542 + files could cause an infinite loop.
  1543 +
  1544 + - API enhancements
  1545 +
  1546 + - Add new class ``QPDFSystemError``, derived from
  1547 + ``std::runtime_error``, which is now thrown by
  1548 + ``QUtil::throw_system_error``. This enables the triggering
  1549 + ``errno`` value to be retrieved.
  1550 +
  1551 + - Add ``ClosedFileInputSource::stayOpen`` method, enabling a
  1552 + ``ClosedFileInputSource`` to stay open during manually
  1553 + indicated periods of high activity, thus reducing the overhead
  1554 + of frequent open/close operations.
  1555 +
  1556 + - Build Changes
  1557 +
  1558 + - For the mingw builds, change the name of the DLL import library
  1559 + from :file:`libqpdf.a` to
  1560 + :file:`libqpdf.dll.a` to more accurately
  1561 + reflect that it is an import library rather than a static
  1562 + library. This potentially clears the way for supporting a
  1563 + static library in the future, though presently, the qpdf
  1564 + Windows build only builds the DLL and executables.
  1565 +
  1566 +8.1.0: June 23, 2018
  1567 + - Usability Improvements
  1568 +
  1569 + - When splitting files, qpdf detects fonts and images that the
  1570 + document metadata claims are referenced from a page but are not
  1571 + actually referenced and omits them from the output file. This
  1572 + change can cause a significant reduction in the size of split
  1573 + PDF files for files created by some software packages. In some
  1574 + cases, it can also make page splitting slower. Prior versions
  1575 + of qpdf would believe the document metadata and sometimes
  1576 + include all the images from all the other pages even though the
  1577 + pages were no longer present. In the unlikely event that the
  1578 + old behavior should be desired, or if you have a case where
  1579 + page splitting is very slow, the old behavior (and speed) can
  1580 + be enabled by specifying
  1581 + :samp:`--preserve-unreferenced-resources`. For
  1582 + additional details, please see :ref:`ref.advanced-transformation`.
  1583 +
  1584 + - When merging multiple PDF files, qpdf no longer leaves all the
  1585 + files open. This makes it possible to merge numbers of files
  1586 + that may exceed the operating system's limit for the maximum
  1587 + number of open files.
  1588 +
  1589 + - The :samp:`--rotate` option's syntax has been
  1590 + extended to make the page range optional. If you specify
  1591 + :samp:`--rotate={angle}`
  1592 + without specifying a page range, the rotation will be applied
  1593 + to all pages. This can be especially useful for adjusting a PDF
  1594 + created from a multi-page document that was scanned upside
  1595 + down.
  1596 +
  1597 + - When merging multiple files, the
  1598 + :samp:`--verbose` option now prints information
  1599 + about each file as it operates on that file.
  1600 +
  1601 + - When the :samp:`--progress` option is
  1602 + specified, qpdf will print a running indicator of its best
  1603 + guess at how far through the writing process it is. Note that,
  1604 + as with all progress meters, it's an approximation. This option
  1605 + is implemented in a way that makes it useful for software that
  1606 + uses the qpdf library; see API Enhancements below.
  1607 +
  1608 + - Bug Fixes
  1609 +
  1610 + - Properly decrypt files that use revision 3 of the standard
  1611 + security handler but use 40-bit keys (even though revision 3
  1612 + supports 128-bit keys).
  1613 +
  1614 + - Limit depth of nested data structures to prevent crashes from
  1615 + certain types of malformed (malicious) PDFs.
  1616 +
  1617 + - In "newline before endstream" mode, insert the required extra
  1618 + newline before the ``endstream`` at the end of object streams.
  1619 + This one case was previously omitted.
  1620 +
  1621 + - API Enhancements
  1622 +
  1623 + - The first round of higher level "helper" interfaces has been
  1624 + introduced. These are designed to provide a more convenient way
  1625 + of interacting with certain document features than using
  1626 + ``QPDFObjectHandle`` directly. For details on helpers, see
  1627 + :ref:`ref.helper-classes`. Specific additional
  1628 + interfaces are described below.
  1629 +
  1630 + - Add two new document helper classes: ``QPDFPageDocumentHelper``
  1631 + for working with pages, and ``QPDFAcroFormDocumentHelper`` for
  1632 + working with interactive forms. No old methods have been
  1633 + removed, but ``QPDFPageDocumentHelper`` is now the preferred
  1634 + way to perform operations on pages rather than calling the old
  1635 + methods in ``QPDFObjectHandle`` and ``QPDF`` directly. Comments
  1636 + in the header files direct you to the new interfaces. Please
  1637 + see the header files and :file:`ChangeLog`
  1638 + for additional details.
  1639 +
  1640 + - Add three new object helper classes: ``QPDFPageObjectHelper`` for
  1641 + pages, ``QPDFFormFieldObjectHelper`` for interactive form
  1642 + fields, and ``QPDFAnnotationObjectHelper`` for annotations. All
  1643 + three classes are fairly sparse at the moment, but they have
  1644 + some useful, basic functionality.
  1645 +
  1646 + - A new example program
  1647 + :file:`examples/pdf-set-form-values.cc` has
  1648 + been added that illustrates use of the new document and object
  1649 + helpers.
  1650 +
  1651 + - The method ``QPDFWriter::registerProgressReporter`` has been
  1652 + added. This method allows you to register a function that
  1653 + ``QPDFWriter`` calls to report its estimate of what percentage of
  1654 + the output it has written so far. Client programs can
  1655 + use this to implement reasonably accurate progress meters. The
  1656 + :command:`qpdf` command line tool uses this to
  1657 + implement its :samp:`--progress` option.
  1658 +
  1659 + - New methods ``QPDFObjectHandle::newUnicodeString`` and
  1660 + ``QPDFObject::unparseBinary`` have been added to allow for more
  1661 + convenient creation of strings that are explicitly encoded
  1662 + using big-endian UTF-16. This is useful for creating strings
  1663 + that appear outside of content streams, such as labels, form
  1664 + fields, outlines, document metadata, etc.
  1665 +
  1666 + - A new class ``QPDFObjectHandle::Rectangle`` has been added to
  1667 + ease working with PDF rectangles, which are just arrays of four
  1668 + numeric values.
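
The 8.1.0 notes above mention strings that are explicitly encoded using big-endian UTF-16. As a rough illustration of what such a string looks like at the byte level, the sketch below builds one in Python. The function name ``encode_pdf_text_string`` is hypothetical and this is not how ``QPDFObjectHandle::newUnicodeString`` is implemented; it only shows the byte layout, in which the UTF-16BE data is preceded by the byte order mark ``FE FF``.

```python
import codecs


def encode_pdf_text_string(text: str) -> bytes:
    """Return text as big-endian UTF-16 bytes with a leading BOM.

    Illustrative helper only; the name is not part of any qpdf API.
    """
    # codecs.BOM_UTF16_BE is b"\xfe\xff"; the payload follows in UTF-16BE.
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")
```

For example, the single character ``A`` becomes the four bytes ``FE FF 00 41``.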
  1669 +
  1670 +8.0.2: March 6, 2018
  1671 + - When a loop is detected while following cross reference streams or
  1672 + tables, treat this as damage instead of silently ignoring the
  1673 + previous table. This prevents loss of otherwise recoverable data
  1674 + in some damaged files.
  1675 +
  1676 + - Properly handle pages with no contents.
  1677 +
  1678 +8.0.1: March 4, 2018
  1679 + - Disregard data check errors when uncompressing ``/FlateDecode``
  1680 + streams. This is consistent with most other PDF readers and allows
  1681 + qpdf to recover data from another class of malformed PDF files.
  1682 +
  1683 + - On the command line when specifying page ranges, support preceding
  1684 + a page number by "r" to indicate that it should be counted from
  1685 + the end. For example, the range ``r3-r1`` would indicate the last
  1686 + three pages of a document.
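
The "r" prefix for page numbers described in the 8.0.1 notes above can be sketched as follows. This is a simplified illustration, not qpdf's page range parser: the function names are hypothetical, only a single ``first-last`` range or a single page is handled, and the treatment of a descending range as reversed order is an assumption.

```python
def resolve_page(token: str, npages: int) -> int:
    """Convert one page token to a 1-based page number.

    A leading "r" means the page is counted from the end,
    so "r1" is the last page.
    """
    if token.startswith("r"):
        return npages + 1 - int(token[1:])
    return int(token)


def resolve_range(spec: str, npages: int) -> list:
    """Expand a single range such as "r3-r1" (illustrative sketch)."""
    first, sep, last = spec.partition("-")
    start = resolve_page(first, npages)
    end = resolve_page(last, npages) if sep else start
    # Assumption: a descending range lists pages in reverse order.
    step = 1 if end >= start else -1
    return list(range(start, end + step, step))
```

For a 10-page document, ``resolve_range("r3-r1", 10)`` gives pages 8, 9, and 10, which are the last three pages.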
  1687 +
  1688 +8.0.0: February 25, 2018
  1689 + - Packaging and Distribution Changes
  1690 +
  1691 + - QPDF is now distributed as an
  1692 + `AppImage <https://appimage.org/>`__ in addition to all the
  1693 + other ways it is distributed. The AppImage can be found in the
  1694 + download area with the other packages. Thanks to Kurt Pfeifle
  1695 + and Simon Peter for their contributions.
  1696 +
  1697 + - Bug Fixes
  1698 +
  1699 + - ``QPDFObjectHandle::getUTF8Val`` now properly treats
  1700 + non-Unicode strings as encoded with PDF Doc Encoding.
  1701 +
  1702 + - Improvements to handling of objects in PDF files that are not
  1703 + of the expected type. In most cases, qpdf will be able to warn
  1704 + for such cases rather than fail with an exception. Previous
  1705 + versions of qpdf would sometimes fail with errors such as
  1706 + "operation for dictionary object attempted on object of wrong
  1707 + type". This situation should be mostly or entirely eliminated
  1708 + now.
  1709 +
  1710 + - Enhancements to the :command:`qpdf` Command-line
  1711 + Tool. All new options listed here are documented in more detail in
  1712 + :ref:`ref.using`.
  1713 +
  1714 + - The option
  1715 + :samp:`--linearize-pass1={file}`
  1716 + has been added for debugging qpdf's linearization code.
  1717 +
  1718 + - The option :samp:`--coalesce-contents` can be
  1719 + used to combine content streams of a page whose contents are an
  1720 + array of streams into a single stream.
  1721 +
  1722 + - API Enhancements. All new API calls are documented in their
  1723 + respective classes' header files. There are no non-compatible
  1724 + changes to the API.
  1725 +
  1726 + - Add function ``qpdf_check_pdf`` to the C API. This function
  1727 + does basic checking that is a subset of what :command:`qpdf
  1728 + --check` performs.
  1729 +
  1730 + - Major enhancements to the lexical layer of qpdf. For a complete
  1731 + list of enhancements, please refer to the
  1732 + :file:`ChangeLog` file. Most of the changes
  1733 + result in improvements to qpdf's ability to handle erroneous
  1734 + files. It is also possible for programs to handle whitespace,
  1735 + comments, and inline images as tokens.
  1736 +
  1737 + - New API for working with PDF content streams at a lexical
  1738 + level. The new class ``QPDFObjectHandle::TokenFilter`` allows
  1739 + the developer to provide token handlers. Token filters can be
  1740 + used with several different methods in ``QPDFObjectHandle`` as
  1741 + well as with a lower-level interface. See comments in
  1742 + :file:`QPDFObjectHandle.hh` as well as the
  1743 + new examples
  1744 + :file:`examples/pdf-filter-tokens.cc` and
  1745 + :file:`examples/pdf-count-strings.cc` for
  1746 + details.
  1747 +
  1748 +7.1.1: February 4, 2018
  1749 + - Bug fix: files whose ``/ID`` fields were other than 16 bytes long can
  1750 + now be properly linearized.
  1751 +
  1752 + - A few compile and link issues have been corrected for some
  1753 + platforms.
  1754 +
  1755 +7.1.0: January 14, 2018
  1756 + - PDF files contain streams that may be compressed with various
  1757 + compression algorithms which, in some cases, may be enhanced by
  1758 + various predictor functions. Previously only the PNG up predictor
  1759 + was supported. In this version, all the PNG predictors as well as
  1760 + the TIFF predictor are supported. This increases the range of
  1761 + files that qpdf is able to handle.
  1762 +
  1763 + - QPDF now allows a raw encryption key to be specified in place of a
  1764 + password when opening encrypted files, and will optionally display
  1765 + the encryption key used by a file. This is a non-standard
  1766 + operation, but it can be useful in certain situations. Please see
  1767 + the discussion of :samp:`--password-is-hex-key` in
  1768 + :ref:`ref.basic-options` or the comments around
  1769 + ``QPDF::setPasswordIsHexKey`` in
  1770 + :file:`QPDF.hh` for additional details.
  1771 +
  1772 + - Bug fix: numbers ending with a trailing decimal point are now
  1773 + properly recognized as numbers.
  1774 +
  1775 + - Bug fix: when building qpdf from source on some platforms
  1776 + (especially MacOS), the build could get confused by older versions
  1777 + of qpdf installed on the system. This has been corrected.
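
The 7.1.0 notes above refer to the PNG up predictor. As background, the sketch below shows how the PNG "Up" filter is reversed: each byte of a row is stored as its difference from the byte directly above it, so decoding adds the previous decoded row back, modulo 256. The function name is hypothetical and the input is assumed to be rows with the per-row filter-type byte already removed; this is not qpdf's code.

```python
def png_up_decode(rows):
    """Undo the PNG "Up" filter on a list of equal-length byte rows.

    Illustrative sketch: rows are assumed to have the filter-type
    byte already stripped.
    """
    # The row above the first row is treated as all zeros.
    prev = bytes(len(rows[0])) if rows else b""
    decoded = []
    for row in rows:
        cur = bytes((b + p) & 0xFF for b, p in zip(row, prev))
        decoded.append(cur)
        prev = cur
    return decoded
```

The other PNG predictors differ only in which neighboring bytes (left, above, or a combination) are used to predict each byte.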
  1778 +
  1779 +7.0.0: September 15, 2017
  1780 + - Packaging and Distribution Changes
  1781 +
  1782 + - QPDF's primary license is now `version 2.0 of the Apache
  1783 + License <http://www.apache.org/licenses/LICENSE-2.0>`__ rather
  1784 + than version 2.0 of the Artistic License. You may still, at
  1785 + your option, consider qpdf to be licensed with version 2.0 of
  1786 + the Artistic License.
  1787 +
  1788 + - QPDF no longer has a dependency on the PCRE (Perl-Compatible
  1789 + Regular Expression) library. QPDF now has an added dependency
  1790 + on the JPEG library.
  1791 +
  1792 + - Bug Fixes
  1793 +
  1794 + - This release contains many bug fixes for various infinite
  1795 + loops, memory leaks, and other memory errors that could be
  1796 + encountered with specially crafted or otherwise erroneous PDF
  1797 + files.
  1798 +
  1799 + - New Features
  1800 +
  1801 + - QPDF now supports reading and writing streams encoded with JPEG
  1802 + or RunLength encoding. Library API enhancements and
  1803 + command-line options have been added to control this behavior.
  1804 + See command-line options
  1805 + :samp:`--compress-streams` and
  1806 + :samp:`--decode-level` and methods
  1807 + ``QPDFWriter::setCompressStreams`` and
  1808 + ``QPDFWriter::setDecodeLevel``.
  1809 +
  1810 + - QPDF is much better at recovering from broken files. In most
  1811 + cases, qpdf will skip invalid objects and will preserve broken
  1812 + stream data by not attempting to filter broken streams. QPDF is
  1813 + now able to recover or at least not crash on dozens of broken
  1814 + test files I have received over the past few years.
  1815 +
  1816 + - Page rotation is now supported and accessible from both the
  1817 + library and the command line.
  1818 +
  1819 + - ``QPDFWriter`` supports writing files in a way that preserves
  1820 + PCLm compliance in support of driverless printing. This is very
  1821 + specialized and is only useful to applications that already
  1822 + know how to create PCLm files.
  1823 +
  1824 + - Enhancements to the :command:`qpdf` Command-line
  1825 + Tool. All new options listed here are documented in more detail in
  1826 + :ref:`ref.using`.
  1827 +
  1828 + - Command-line arguments can now be read from files or standard
  1829 + input using ``@file`` or ``@-`` syntax. Please see :ref:`ref.invocation`.
  1830 +
  1831 + - :samp:`--rotate`: request page rotation
  1832 +
  1833 + - :samp:`--newline-before-endstream`: ensure that
  1834 + a newline appears before every ``endstream`` keyword in the
  1835 + file; used to prevent qpdf from breaking PDF/A compliance on
  1836 + already compliant files.
  1837 +
  1838 + - :samp:`--preserve-unreferenced`: preserve
  1839 + unreferenced objects in the input PDF
  1840 +
  1841 + - :samp:`--split-pages`: break output into chunks
  1842 + with fixed numbers of pages
  1843 +
  1844 + - :samp:`--verbose`: print the name of each
  1845 + output file that is created
  1846 +
  1847 + - :samp:`--compress-streams` and
  1848 + :samp:`--decode-level` replace
  1849 + :samp:`--stream-data` for improving granularity
  1850 + of controlling compression and decompression of stream data.
  1851 + The :samp:`--stream-data` option will remain
  1852 + available.
  1853 +
  1854 + - When running :command:`qpdf --check` with other
  1855 + options, checks are always run first. This enables qpdf to
  1856 + perform its full recovery logic before outputting other
  1857 + information. This can be especially useful when manually
  1858 + recovering broken files, looking at qpdf's regenerated cross
  1859 + reference table, or other similar operations.
  1860 +
  1861 + - Process :samp:`--pages` earlier so that other
  1862 + options like :samp:`--show-pages` or
  1863 + :samp:`--split-pages` can operate on the file
  1864 + after page splitting/merging has occurred.
  1865 +
  1866 + - API Changes. All new API calls are documented in their respective
  1867 + classes' header files.
  1868 +
  1869 + - ``QPDFObjectHandle::rotatePage``: apply rotation to a page
  1870 + object
  1871 +
  1872 + - ``QPDFWriter::setNewlineBeforeEndstream``: force newline to
  1873 + appear before ``endstream``
  1874 +
  1875 + - ``QPDFWriter::setPreserveUnreferencedObjects``: preserve
  1876 + unreferenced objects that appear in the input PDF. The default
  1877 + behavior is to discard them.
  1878 +
  1879 + - New ``Pipeline`` types ``Pl_RunLength`` and ``Pl_DCT`` are
  1880 + available for developers who wish to produce or consume
  1881 + RunLength or DCT stream data directly. The
  1882 + :file:`examples/pdf-create.cc` example
  1883 + illustrates their use.
  1884 +
  1885 + - ``QPDFWriter::setCompressStreams`` and
  1886 + ``QPDFWriter::setDecodeLevel`` methods control handling of
  1887 + different types of stream compression.
  1888 +
  1889 + - Add new C API functions ``qpdf_set_compress_streams``,
  1890 + ``qpdf_set_decode_level``,
  1891 + ``qpdf_set_preserve_unreferenced_objects``, and
  1892 + ``qpdf_set_newline_before_endstream`` corresponding to the new
  1893 + ``QPDFWriter`` methods.
  1894 +
  1895 +6.0.0: November 10, 2015
  1896 + - Implement :samp:`--deterministic-id` command-line
  1897 + option and ``QPDFWriter::setDeterministicID`` as well as C API
  1898 + function ``qpdf_set_deterministic_ID`` for generating a
  1899 + deterministic ID for non-encrypted files. When this option is
  1900 + selected, the ID of the file depends on the contents of the output
  1901 + file, and not on transient items such as the timestamp or output
  1902 + file name.
  1903 +
  1904 + - Make qpdf more tolerant of files whose xref table entries are not
  1905 + the correct length.
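
The idea behind the deterministic ID in the 6.0.0 notes above is that the identifier is derived from the output contents rather than from transient values such as the current time or the output file name. The sketch below illustrates that idea only. It is not the algorithm qpdf uses: the function name is hypothetical, and hashing the whole output with MD5 is an assumption chosen because it yields a 16-byte value.

```python
import hashlib


def deterministic_id(output_bytes: bytes) -> str:
    """Derive an ID from file contents only (illustrative sketch).

    Assumption: MD5 over the full output stands in for whatever
    digest input the real implementation uses.
    """
    return hashlib.md5(output_bytes).hexdigest()
```

Two runs that produce byte-identical output get the same ID, and any change to the contents changes the ID, which is what makes the result reproducible across runs.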
  1906 +
  1907 +5.1.3: May 24, 2015
  1908 + - Bug fix: fix-qdf was not properly handling files that contained
  1909 + object streams with more than 255 objects in them.
  1910 +
  1911 + - Bug fix: qpdf was not properly initializing Microsoft's secure
  1912 + crypto provider on fresh Windows installations that had not had
  1913 + any keys created yet.
  1914 +
  1915 + - Fix a few errors found by Gynvael Coldwind and Mateusz Jurczyk of
  1916 + the Google Security Team. Please see the ChangeLog for details.
  1917 +
  1918 + - Properly handle pages that have no contents at all. There were
  1919 + many cases in which qpdf handled this fine, but a few methods
  1920 + blindly obtained page contents without handling the possibility that
  1921 + there were no contents.
  1922 +
  1923 + - Make qpdf more robust for a few more kinds of problems that may
  1924 + occur in invalid PDF files.
  1925 +
  1926 +5.1.2: June 7, 2014
  1927 + - Bug fix: linearizing files could create a corrupted output file
  1928 + under extremely unlikely file size circumstances. See ChangeLog
  1929 + for details. The odds of getting hit by this are very low, though
  1930 + one person did encounter it.
  1931 +
  1932 + - Bug fix: qpdf would fail to write files that had streams with
  1933 + decode parameters referencing other streams.
  1934 +
  1935 + - New example program: :command:`pdf-split-pages`:
  1936 + efficiently split PDF files into individual pages. The example
  1937 + program does this more efficiently than using :command:`qpdf
  1938 + --pages` to do it.
  1939 +
  1940 + - Packaging fix: Visual C++ binaries did not support Windows XP.
  1941 + This has been rectified by updating the compilers used to generate
  1942 + the release binaries.
  1943 +
  1944 +5.1.1: January 14, 2014
  1945 + - Performance fix: copying foreign objects could be very slow with
  1946 + certain types of files. This was most likely to be visible during
  1947 + page splitting and was due to traversing the same objects multiple
  1948 + times in some cases.
  1949 +
  1950 +5.1.0: December 17, 2013
  1951 + - Added runtime option (``QUtil::setRandomDataProvider``) to supply
  1952 + your own random data provider. You can use this if you want to
  1953 + avoid using the OS-provided secure random number generation
  1954 + facility or stdlib's less secure version. See comments in
  1955 + :file:`include/qpdf/QUtil.hh` for details.
  1956 +
  1957 + - Fixed image comparison tests to not create 12-bit-per-pixel images
  1958 + since some versions of tiffcmp have bugs in comparing them in some
  1959 + cases. This increases the disk space required by the image
  1960 + comparison tests, which are off by default anyway.
  1961 +
  1962 + - Introduce a number of small fixes for compilation on the latest
  1963 + clang in MacOS and the latest Visual C++ in Windows.
  1964 +
  1965 + - Be able to handle broken files that end the xref table header with
  1966 + a space instead of a newline.
  1967 +
  1968 +5.0.1: October 18, 2013
  1969 + - Thanks to a detailed review by Florian Weimer and the Red Hat
  1970 + Product Security Team, this release includes a number of
  1971 + non-user-visible security hardening changes. Please see the
  1972 + ChangeLog file in the source distribution for the complete list.
  1973 +
  1974 + - When available, operating system-specific secure random number
  1975 + generation is used for generating initialization vectors and other
  1976 + random values used during encryption or file creation. For the
  1977 + Windows build, this results in an added dependency on Microsoft's
  1978 + cryptography API. To disable the OS-specific cryptography and use
  1979 + the old version, pass the
  1980 + :samp:`--enable-insecure-random` option to
  1981 + :command:`./configure`.
  1982 +
  1983 + - The :command:`qpdf` command-line tool now issues a
  1984 + warning when :samp:`--accessibility=n` is specified
  1985 + for newer encryption versions stating that the option is ignored.
  1986 + qpdf, per the spec, has always ignored this flag, but it
  1987 + previously did so silently. This warning is issued only by the
  1988 + command-line tool, not by the library. The library's handling of
  1989 + this flag is unchanged.
  1990 +
  1991 +5.0.0: July 10, 2013
  1992 + - Bug fix: previous versions of qpdf would lose objects with
  1993 + generation != 0 when generating object streams. Fixing this
  1994 + required changes to the public API.
  1995 +
  1996 + - Removed methods from public API that were only supposed to be
  1997 + called by QPDFWriter and couldn't realistically be called anywhere
  1998 + else. See ChangeLog for details.
  1999 +
  2000 + - New ``QPDFObjGen`` class added to represent an object
  2001 + ID/generation pair. ``QPDFObjectHandle::getObjGen()`` is now
  2002 + preferred over ``QPDFObjectHandle::getObjectID()`` and
  2003 + ``QPDFObjectHandle::getGeneration()`` as it makes it less likely
  2004 + for people to accidentally write code that ignores the generation
  2005 + number. See :file:`QPDF.hh` and
  2006 + :file:`QPDFObjectHandle.hh` for additional
  2007 + notes.
  2008 +
  2009 + - Add :samp:`--show-npages` command-line option to
  2010 + the :command:`qpdf` command to show the number of
  2011 + pages in a file.
  2012 +
  2013 + - Allow omission of the page range within
  2014 + :samp:`--pages` for the
  2015 + :command:`qpdf` command. When omitted, the page
  2016 + range is implicitly taken to be all the pages in the file.
  2017 +
  2018 + - Various enhancements were made to support different types of
  2019 + broken files or broken readers. Details can be found in
  2020 + :file:`ChangeLog`.
  2021 +
  2022 +4.1.0: April 14, 2013
  2023 + - Note to people including qpdf in distributions: the
  2024 + :file:`.la` files generated by libtool are now
  2025 + installed by qpdf's :command:`make install` target.
  2026 + Before, they were not installed. This means that if your
  2027 + distribution does not want to include
  2028 + :file:`.la` files, you must remove them as
  2029 + part of your packaging process.
  2030 +
  2031 + - Major enhancement: API enhancements have been made to support
  2032 + parsing of content streams. This enhancement includes the
  2033 + following changes:
  2034 +
  2035 + - ``QPDFObjectHandle::parseContentStream`` method parses objects
  2036 + in a content stream and calls handlers in a callback class. The
  2037 + example
  2038 + :file:`examples/pdf-parse-content.cc`
  2039 + illustrates how this may be used.
  2040 +
  2041 + - ``QPDFObjectHandle`` can now represent operators and inline
  2042 + images, object types that may only appear in content streams.
  2043 +
  2044 + - Method ``QPDFObjectHandle::getTypeCode()`` returns an
  2045 + enumerated type value representing the underlying object type.
  2046 + Method ``QPDFObjectHandle::getTypeName()`` returns a text
  2047 + string describing the name of the type of a
  2048 + ``QPDFObjectHandle`` object. These methods can be used for more
  2049 + efficient parsing and debugging/diagnostic messages.
  2050 +
  2051 + - :command:`qpdf --check` now parses all pages'
  2052 + content streams in addition to doing other checks. While there are
  2053 + still many types of errors that cannot be detected, syntactic
  2054 + errors in content streams will now be reported.
  2055 +
  2056 + - Minor compilation enhancements have been made to facilitate
  2057 + support for a broader range of compilers and compiler
  2058 + versions.
  2059 +
  2060 + - Warning flags have been moved into a separate variable in
  2061 + :file:`autoconf.mk`.
  2062 +
  2063 + - The configure flag :samp:`--enable-werror` now works
  2064 + for Microsoft compilers.
  2065 +
  2066 + - All MSVC CRT security warnings have been resolved.
  2067 +
  2068 + - All C-style casts in C++ code have been replaced by C++ casts,
  2069 + and many casts that had been included to suppress higher
  2070 + warning levels for some compilers have been removed, primarily
  2071 + for clarity. Places where integer type coercion occurs have
  2072 + been scrutinized. A new casting policy has been documented in
  2073 + the manual. This is of concern mainly to people porting qpdf to
  2074 + new platforms or compilers. It is not visible to programmers
  2075 + writing code that uses the library.
  2076 +
  2077 + - Some internal limits have been removed in code that converts
  2078 + numbers to strings. This is largely invisible to users, but it
  2079 + does trigger a bug in some older versions of mingw-w64's C++
  2080 + library. See :file:`README-windows.md` in
  2081 + the source distribution if you think this may affect you. The
  2082 + copy of the DLL distributed with qpdf's binary distribution is
  2083 + not affected by this problem.
  2084 +
  2085 + - The RPM spec file previously included with qpdf has been removed.
  2086 + This is because virtually all Linux distributions include qpdf now
  2087 + that it is a dependency of CUPS filters.
  2088 +
  2089 + - A few bug fixes are included:
  2090 +
  2091 + - Overridden compressed objects are properly handled. Before,
  2092 + there were certain constructs that could cause qpdf to see old
  2093 + versions of some objects. The most common manifestation of this
  2094 + was loss of filled-in form values for certain files.
  2095 +
  2096 + - Installation no longer uses GNU/Linux-specific versions of some
  2097 + commands, so :command:`make install` works on
  2098 + Solaris with native tools.
  2099 +
  2100 + - The 64-bit mingw Windows binary package no longer includes a
  2101 + 32-bit DLL.
  2102 +
  2103 +4.0.1: January 17, 2013
  2104 + - Fix detection of binary attachments in test suite to avoid false
  2105 + test failures on some platforms.
  2106 +
  2107 + - Add clarifying comment in :file:`QPDF.hh` to
  2108 + methods that return the user password explaining that it is no
  2109 + longer possible with newer encryption formats to recover the user
  2110 + password knowing the owner password. In earlier encryption
  2111 + formats, the user password was encrypted in the file using the
  2112 + owner password. In newer encryption formats, a separate encryption
  2113 + key is used on the file, and that key is independently encrypted
  2114 + using both the user password and the owner password.
  2115 +
  2116 +4.0.0: December 31, 2012
  2117 + - Major enhancement: support has been added for newer encryption
  2118 + schemes supported by version X of Adobe Acrobat. This includes use
  2119 + of 127-character passwords, 256-bit encryption keys, and the
  2120 + encryption scheme specified in ISO 32000-2, the PDF 2.0
  2121 + specification. This scheme can be chosen from the command line by
  2122 + specifying use of 256-bit keys. qpdf also supports the deprecated
  2123 + encryption method used by Acrobat IX. This encryption style has
  2124 + known security weaknesses and should not be used in practice.
  2125 + However, such files exist "in the wild," so support for this
  2126 + scheme is still useful. New methods
  2127 + ``QPDFWriter::setR6EncryptionParameters`` (for the PDF 2.0 scheme)
  2128 + and ``QPDFWriter::setR5EncryptionParameters`` (for the deprecated
  2129 + scheme) have been added to enable these new encryption schemes.
  2130 + Corresponding functions have been added to the C API as well.
  2131 +
  2132 + - Full support for Adobe extension levels in PDF version
  2133 + information. Starting with PDF version 1.7, corresponding to ISO
  2134 + 32000, Adobe adds new functionality by increasing the extension
  2135 + level rather than increasing the version. This support includes
  2136 + addition of the ``QPDF::getExtensionLevel`` method for retrieving
  2137 + the document's extension level, addition of versions of
  2138 + ``QPDFWriter::setMinimumPDFVersion`` and
  2139 + ``QPDFWriter::forcePDFVersion`` that accept an extension level,
  2140 + and extended syntax for specifying forced and minimum versions on
  2141 + the command line as described in :ref:`ref.advanced-transformation`. Corresponding functions
  2142 + have been added to the C API as well.
  2143 +
  2144 + - Minor fixes to prevent qpdf from referencing objects in the file
  2145 + that are not referenced in the file's overall structure. Most
  2146 + files don't have any such objects, but some files contain
  2147 + unreferenced objects with errors, so these fixes prevent qpdf from
  2148 + needlessly rejecting or complaining about such objects.
  2149 +
  2150 + - Add new generalized methods for reading and writing files from/to
  2151 + programmer-defined sources. The method
  2152 + ``QPDF::processInputSource`` allows the programmer to use any
  2153 + input source for the input file, and
  2154 + ``QPDFWriter::setOutputPipeline`` allows the programmer to write
  2155 + the output file through any pipeline. These methods would make it
  2156 + possible to perform any number of specialized operations, such as
  2157 + accessing external storage systems, creating bindings for qpdf in
  2158 + other programming languages that have their own I/O systems, etc.
  2159 +
  2160 + - Add new method ``QPDF::getEncryptionKey`` for retrieving the
  2161 + underlying encryption key used in the file.
  2162 +
  2163 + - This release includes a small handful of non-compatible API
  2164 + changes. While effort is made to avoid such changes, all the
  2165 + non-compatible API changes in this version were to parts of the
  2166 + API that would likely never be used outside the library itself. In
  2167 + all cases, the altered methods or structures were parts of the
  2168 + ``QPDF`` class that either were public so that they could be called
  2169 + from ``QPDFWriter`` or were part of validation code that was
  2170 + over-zealous in reporting problems in parts of the file that would
  2171 + not ordinarily be referenced. In no case did any of the removed
  2172 + methods do anything worse than falsely report error conditions in
  2173 + files that were broken in ways that didn't matter. The following
  2174 + public parts of the ``QPDF`` class were changed in a
  2175 + non-compatible way:
  2176 +
  2177 + - Updated nested ``QPDF::EncryptionData`` class to add fields
  2178 + needed by the newer encryption formats, member variables
  2179 + changed to private so that future changes will not require
  2180 + breaking backward compatibility.
  2181 +
  2182 + - Added additional parameters to ``compute_data_key``, which is
  2183 + used by ``QPDFWriter`` to compute the encryption key used to
  2184 + encrypt a specific object.
  2185 +
  2186 + - Removed the method ``flattenScalarReferences``. This method was
  2187 + previously used prior to writing a new PDF file, but it has the
  2188 + undesired side effect of causing qpdf to read objects in the
  2189 + file that were not referenced. Some otherwise valid files have
  2190 + unreferenced objects with errors in them, so this could cause
  2191 + qpdf to reject files that would be accepted by virtually all
  2192 + other PDF readers. In fact, qpdf relied on only a very small
  2193 + part of what ``flattenScalarReferences`` did, so only this part has
  2194 + been preserved, and it is now done directly inside
  2195 + ``QPDFWriter``.
  2196 +
  2197 + - Removed the method ``decodeStreams``. This method was used by
  2198 + the :samp:`--check` option of the
  2199 + :command:`qpdf` command-line tool to force all
  2200 + streams in the file to be decoded, but it also suffered from
  2201 + the problem of opening otherwise unreferenced streams and thus
  2202 + could report false positives. The
  2203 + :samp:`--check` option now causes qpdf to go
  2204 + through all the motions of writing a new file based on the
  2205 + original one, so it will always reference and check exactly
  2206 + those parts of a file that any ordinary viewer would check.
  2207 +
  2208 + - Removed the method ``trimTrailerForWrite``. This method was
  2209 + used by ``QPDFWriter`` to modify the original QPDF object by
  2210 + removing fields from the trailer dictionary that wouldn't apply
  2211 + to the newly written file. This functionality, though generally
  2212 + harmless, was a poor implementation and has been replaced by
  2213 + having ``QPDFWriter`` filter these out when copying the trailer
  2214 + rather than modifying the original QPDF object. (Note that qpdf
  2215 + never modifies the original file itself.)
  2216 +
  2217 + - Allow the PDF header to appear anywhere in the first 1024 bytes of
  2218 + the file. This is consistent with what other readers do.
  2219 +
  2220 + - Fix the :command:`pkg-config` files to list zlib
  2221 + and pcre in ``Requires.private`` to better support static linking
  2222 + using :command:`pkg-config`.
  2223 +
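The header rule introduced in 4.0.0 can be illustrated with a minimal, standalone C++ sketch. This is not qpdf's implementation: the function name ``find_pdf_header`` is hypothetical, and the sketch assumes that "anywhere in the first 1024 bytes" means the ``%PDF-`` marker must start at an offset below 1024.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the qpdf API. Returns the offset at which
// the "%PDF-" marker starts if that offset is within the first 1024 bytes of
// the data, or -1 if no such marker is found there.
long find_pdf_header(const std::string& data)
{
    static const std::string marker = "%PDF-";
    const std::size_t limit = 1024; // assumption: marker must start before this offset
    const std::size_t pos = data.find(marker);
    if (pos == std::string::npos || pos >= limit) {
        return -1;
    }
    return static_cast<long>(pos);
}
```

A reader that tolerates leading junk would skip to the returned offset before parsing the version number; a result of -1 means the input would be treated as lacking a usable header.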
  2224 +3.0.2: September 6, 2012
  2225 + - Bug fix: ``QPDFWriter::setOutputMemory`` did not work when not
  2226 + used with ``QPDFWriter::setStaticID``, which made it pretty much
  2227 + useless. This has been fixed.
  2228 +
  2229 + - New API call ``QPDFWriter::setExtraHeaderText`` inserts additional
  2230 + text near the header of the PDF file. The intended use case is to
  2231 + insert comments that may be consumed by a downstream application,
  2232 + though other use cases may exist.
  2233 +
  2234 +3.0.1: August 11, 2012
  2235 + - Version 3.0.0 included addition of files for
  2236 + :command:`pkg-config`, but this was not mentioned
  2237 + in the release notes. The release notes for 3.0.0 were updated to
  2238 + mention this.
  2239 +
  2240 + - Bug fix: if an object stream ended with a scalar object not
  2241 + followed by space, qpdf would incorrectly report that it
  2242 + encountered a premature EOF. This bug has been in qpdf since
  2243 + version 2.0.
  2244 +
  2245 +3.0.0: August 2, 2012
  2246 + - Acknowledgment: I would like to express gratitude for the
  2247 + contributions of Tobias Hoffmann toward the release of qpdf
  2248 + version 3.0. He is responsible for most of the implementation and
  2249 + design of the new API for manipulating pages, and contributed code
  2250 + and ideas for many of the improvements made in version 3.0.
  2251 + Without his work, this release would certainly not have happened
  2252 + as soon as it did, if at all.
  2253 +
  2254 + - *Non-compatible API changes:*
  2255 +
  2256 + - The method ``QPDFObjectHandle::replaceStreamData`` that uses a
  2257 + ``StreamDataProvider`` to provide the stream data no longer
  2258 + takes a ``length`` parameter. The parameter was removed since
  2259 + this provides the user an opportunity to simplify the calling
  2260 + code. This method was introduced in version 2.2. At the time,
  2261 + the ``length`` parameter was required in order to ensure that
  2262 + calls to the stream data provider returned the same length for a
  2263 + specific stream every time they were invoked. In particular, the
  2264 + linearization code depends on this. Instead, qpdf 3.0 and newer
  2265 + check for that constraint explicitly. The first time the stream
  2266 + data provider is called for a specific stream, the actual length
  2267 + is saved, and subsequent calls are required to return the same
  2268 + number of bytes. This means the calling code no longer has to
  2269 + compute the length in advance, which can be a significant
  2270 + simplification. If your code fails to compile because of the
  2271 + extra argument and you don't want to make other changes to your
  2272 + code, just omit the argument.
  2273 +
  2274 + - Many methods take ``long long`` instead of other integer types.
  2275 + Most if not all existing code should compile fine with this
  2276 + change since such parameters had always previously been smaller
  2277 + types. This change was required to support files larger than two
  2278 + gigabytes in size.
  2279 +
  2280 + - Support has been added for large files. The test suite verifies
  2281 + support for files larger than 4 gigabytes, and manual testing has
  2282 + verified support for files larger than 10 gigabytes. Large file
  2283 + support is available for both 32-bit and 64-bit platforms as long
  2284 + as the compiler and underlying platforms support it.
  2285 +
  2286 + - Support for page selection (splitting and merging PDF files) has
  2287 + been added to the :command:`qpdf` command-line
  2288 + tool. See :ref:`ref.page-selection`.
  2289 +
  2290 + - Options have been added to the :command:`qpdf`
  2291 + command-line tool for copying encryption parameters from another
  2292 + file. See :ref:`ref.basic-options`.
  2293 +
  2294 + - New methods have been added to the ``QPDF`` object for adding and
  2295 + removing pages. See :ref:`ref.adding-and-remove-pages`.
  2296 +
  2297 + - New methods have been added to the ``QPDF`` object for copying
  2298 + objects from other PDF files. See :ref:`ref.foreign-objects`.
  2299 +
  2300 + - A new method ``QPDFObjectHandle::parse`` has been added for
  2301 + constructing ``QPDFObjectHandle`` objects from a string
  2302 + description.
  2303 +
  2304 + - Methods have been added to ``QPDFWriter`` to allow writing to an
  2305 + already open stdio ``FILE*`` in addition to writing to standard
  2306 + output or a named file. Methods have been added to ``QPDF`` to be
  2307 + able to process a file from an already open stdio ``FILE*``. This
  2308 + makes it possible to read and write PDF from secure temporary
  2309 + files that have been unlinked prior to being fully read or
  2310 + written.
  2311 +
  2312 + - The ``QPDF::emptyPDF`` method can be used to allow creation of PDF files
  2313 + from scratch. The example
  2314 + :file:`examples/pdf-create.cc` illustrates how
  2315 + it can be used.
  2316 +
  2317 + - Several methods that take ``PointerHolder<Buffer>`` can now also
  2318 + accept ``std::string`` arguments.
  2319 +
  2320 + - Many new convenience methods have been added to the library, most
  2321 + in ``QPDFObjectHandle``. See :file:`ChangeLog`
  2322 + for a full list.
  2323 +
  2324 + - When building on a platform that supports ELF shared libraries
  2325 + (such as Linux), symbol versions are enabled by default. They can
  2326 + be disabled by passing
  2327 + :samp:`--disable-ld-version-script` to
  2328 + :command:`./configure`.
  2329 +
  2330 + - The file :file:`libqpdf.pc` is now installed
  2331 + to support :command:`pkg-config`.
  2332 +
  2333 + - Image comparison tests are off by default now since they are not
  2334 + needed to verify a correct build or port of qpdf. They are needed
  2335 + only when changing the actual PDF output generated by qpdf. You
  2336 + should enable them if you are making deep changes to qpdf itself.
  2337 + See :file:`README.md` for details.
  2338 +
  2339 + - Large file tests are off by default but can be turned on with
  2340 + :command:`./configure` or by setting an environment
  2341 + variable before running the test suite. See
  2342 + :file:`README.md` for details.
  2343 +
  2344 + - When qpdf's test suite fails, failures are not printed to the
  2345 + terminal anymore by default. Instead, find them in
  2346 + :file:`build/qtest.log`. For packagers who are
  2347 + building with an autobuilder, you can add the
  2348 + :samp:`--enable-show-failed-test-output` option to
  2349 + :command:`./configure` to restore the old behavior.
  2350 +
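The length check described in the 3.0.0 notes for ``replaceStreamData`` (save the actual length on the first call, require the same number of bytes on later calls) can be sketched in standalone C++. The class ``LengthTracker`` and its integer stream identifier are hypothetical names chosen for illustration; this is not the code qpdf uses internally.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>

// Hypothetical sketch, not qpdf code. Remembers the length observed the first
// time data is provided for a stream and rejects any later call that reports
// a different length for the same stream.
class LengthTracker
{
  public:
    void record(int stream_id, std::size_t length)
    {
        auto it = first_length_.find(stream_id);
        if (it == first_length_.end()) {
            // First call for this stream: save the actual length.
            first_length_[stream_id] = length;
        } else if (it->second != length) {
            // Later call disagrees with the saved length.
            throw std::runtime_error(
                "stream data provider returned a different length");
        }
    }

  private:
    std::map<int, std::size_t> first_length_;
};
```

With a check of this kind, the calling code does not need to compute the length in advance; it only has to produce the same bytes each time it is asked.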
  2351 +2.3.1: December 28, 2011
  2352 + - Fix thread-safety problem resulting from non-thread-safe use of
  2353 + the PCRE library.
  2354 +
  2355 + - Made a few minor documentation fixes.
  2356 +
  2357 + - Add workaround for a bug that appears in some versions of
  2358 + ghostscript to the test suite.
  2359 +
  2360 + - Fix minor build issue for Visual C++ 2010.
  2361 +
  2362 +2.3.0: August 11, 2011
  2363 + - Bug fix: when preserving existing encryption on encrypted files
  2364 + with cleartext metadata, older qpdf versions would generate
  2365 + password-protected files with no valid password. This operation
  2366 + now works. This bug only affected files created by copying
  2367 + existing encryption parameters; explicit encryption with
  2368 + specification of cleartext metadata worked before and continues to
  2369 + work.
  2370 +
  2371 + - Enhance ``QPDFWriter`` with a new constructor that allows you to
  2372 + delay the specification of the output file. When using this
  2373 + constructor, you may now call ``QPDFWriter::setOutputFilename`` to
  2374 + specify the output file, or you may use
  2375 + ``QPDFWriter::setOutputMemory`` to cause ``QPDFWriter`` to write
  2376 + the resulting PDF file to a memory buffer. You may then use
  2377 + ``QPDFWriter::getBuffer`` to retrieve the memory buffer.
  2378 +
  2379 + - Add new API call ``QPDF::replaceObject`` for replacing objects by
  2380 + object ID.
  2381 +
  2382 + - Add new API call ``QPDF::swapObjects`` for swapping two objects by
  2383 + object ID.
  2384 +
  2385 + - Add ``QPDFObjectHandle::getDictAsMap`` and
  2386 + ``QPDFObjectHandle::getArrayAsVector`` to allow retrieval of
  2387 + dictionary objects as maps and array objects as vectors.
  2388 +
  2389 + - Add functions ``qpdf_get_info_key`` and ``qpdf_set_info_key`` to
  2390 + the C API for manipulating string fields of the document's
  2391 + ``/Info`` dictionary.
  2392 +
  2393 + - Add functions ``qpdf_init_write_memory``,
  2394 + ``qpdf_get_buffer_length``, and ``qpdf_get_buffer`` to the C API
  2395 + for writing PDF files to a memory buffer instead of a file.
  2396 +
  2397 +2.2.4: June 25, 2011
  2398 + - Fix installation and compilation issues; no functionality changes.
  2399 +
  2400 +2.2.3: April 30, 2011
  2401 + - Handle some damaged streams with incorrect characters following
  2402 + the stream keyword.
  2403 +
  2404 + - Improve handling of inline images when normalizing content
  2405 + streams.
  2406 +
  2407 + - Enhance error recovery to properly handle files that use object 0
  2408 + as a regular object, which is specifically disallowed by the spec.
  2409 +
  2410 +2.2.2: October 4, 2010
  2411 + - Add new function ``qpdf_read_memory`` to the C API to call
  2412 + ``QPDF::processMemoryFile``. This was an omission in qpdf 2.2.1.
  2413 +
  2414 +2.2.1: October 1, 2010
  2415 + - Add new method ``QPDF::setOutputStreams`` to replace ``std::cout``
  2416 + and ``std::cerr`` with other streams for generation of diagnostic
  2417 + messages and error messages. This can be useful for GUIs or other
  2418 + applications that want to capture any output generated by the
  2419 + library to present to the user in some other way. Note that QPDF
  2420 + does not write to ``std::cout`` (or the specified output stream)
  2421 + except where explicitly mentioned in
  2422 + :file:`QPDF.hh`, and that the only use of the
  2423 + error stream is for warnings. Note also that output of warnings is
  2424 + suppressed when ``setSuppressWarnings(true)`` is called.
  2425 +
  2426 + - Add new method ``QPDF::processMemoryFile`` for operating on PDF
  2427 + files that are loaded into memory rather than in a file on disk.
  2428 +
  2429 + - Give a warning but otherwise ignore empty PDF objects by treating
  2430 + them as null. Empty objects are not permitted by the PDF
  2431 + specification but have been known to appear in some actual PDF
  2432 + files.
  2433 +
  2434 + - Handle inline image filter abbreviations when they appear as stream
  2435 + filter abbreviations. The PDF specification does not allow use of
  2436 + stream filter abbreviations in this way, but Adobe Reader and some
  2437 + other PDF readers accept them since they sometimes appear
  2438 + incorrectly in actual PDF files.
  2439 +
  2440 + - Implement miscellaneous enhancements to ``PointerHolder`` and
  2441 + ``Buffer`` to support other changes.
  2442 +
  2443 +2.2.0: August 14, 2010
  2444 + - Add new methods to ``QPDFObjectHandle`` (``newStream`` and
  2445 + ``replaceStreamData``) for creating new streams and replacing
  2446 + stream data. This makes it possible to perform a wide range of
  2447 + operations that were not previously possible.
  2448 +
  2449 + - Add new helper method in ``QPDFObjectHandle``
  2450 + (``addPageContents``) for appending or prepending new content
  2451 + streams to a page. This method makes it possible to manipulate
  2452 + content streams without having to be concerned whether a page's
  2453 + contents are a single stream or an array of streams.
  2454 +
  2455 + - Add new method in ``QPDFObjectHandle``: ``replaceOrRemoveKey``,
  2456 + which replaces a dictionary key with a given value unless the
  2457 + value is null, in which case it removes the key instead.
  2458 +
  2459 + - Add new method in ``QPDFObjectHandle``: ``getRawStreamData``,
  2460 + which returns the raw (unfiltered) stream data into a buffer. This
  2461 + complements the ``getStreamData`` method, which returns the
  2462 + filtered (uncompressed) stream data and can only be used when the
  2463 + stream's data is filterable.
  2464 +
  2465 + - Provide two new examples:
  2466 + :command:`pdf-double-page-size` and
  2467 + :command:`pdf-invert-images` that illustrate the
  2468 + newly added interfaces.
  2469 +
  2470 + - Fix a memory leak that would cause loss of a few bytes for every
  2471 + object involved in a cycle of object references. Thanks to Jian Ma
  2472 + for calling my attention to the leak.
  2473 +
  2474 +2.1.5: April 25, 2010
  2475 + - Remove restriction of file identifier strings to 16 bytes. This
  2476 + unnecessary restriction was preventing qpdf from being able to
  2477 + encrypt or decrypt files with identifier strings that were not
  2478 + exactly 16 bytes long. The specification imposes no such
  2479 + restriction.
  2480 +
  2481 +2.1.4: April 18, 2010
  2482 + - Apply the same padding calculation fix from version 2.1.2 to the
  2483 + main cross reference stream as well.
  2484 +
  2485 + - Since :command:`qpdf --check` only performs limited
  2486 + checks, clarify the output to make it clear that there still may
  2487 + be errors that qpdf can't check. This should make it less
  2488 + surprising to people when another PDF reader is unable to read a
  2489 + file that qpdf thinks is okay.
  2490 +
  2491 +2.1.3: March 27, 2010
  2492 + - Fix bug that could cause a failure when rewriting PDF files that
  2493 + contain object streams with unreferenced objects that in turn
  2494 + reference indirect scalars.
  2495 +
  2496 + - Don't complain about (invalid) AES streams that aren't a multiple
  2497 + of 16 bytes. Instead, pad them before decrypting.
  2498 +
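The AES change in 2.1.3 amounts to rounding the input up to a whole number of 16-byte blocks before decryption. The following standalone C++ sketch shows only that rounding step. The function name ``pad_to_block`` is hypothetical, and the choice of zero bytes as filler is an assumption; the release note does not say which filler qpdf uses.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the qpdf API. Extends the data to the next
// multiple of block_size (which must be nonzero). The zero-byte filler is an
// assumption made for this sketch.
std::string pad_to_block(std::string data, std::size_t block_size = 16)
{
    const std::size_t remainder = data.size() % block_size;
    if (remainder != 0) {
        data.append(block_size - remainder, '\0');
    }
    return data;
}
```

Data whose size is already a multiple of the block size is returned unchanged.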
  2499 +2.1.2: January 24, 2010
  2500 + - Fix bug in padding around first half cross reference stream in
  2501 + linearized files. The bug could cause an assertion failure when
  2502 + linearizing certain unlucky files.
  2503 +
  2504 +2.1.1: December 14, 2009
  2505 + - No changes in functionality; insert missing include in an internal
  2506 + library header file to support gcc 4.4, and update test suite to
  2507 + ignore broken Adobe Reader installations.
  2508 +
  2509 +2.1: October 30, 2009
  2510 + - This is the first version of qpdf to include Windows support. On
  2511 + Windows, it is possible to build a DLL. Additionally, a partial
  2512 + C-language API has been introduced, which makes it possible to
  2513 + call qpdf functions from non-C++ environments. I am very grateful
  2514 + to Žarko Gajić (http://zarko-gajic.iz.hr/) for tirelessly testing
  2515 + numerous pre-release versions of this DLL and providing many
  2516 + excellent suggestions on improving the interface.
  2517 +
  2518 + For programming to the C interface, please see the header file
  2519 + :file:`qpdf/qpdf-c.h` and the example
  2520 + :file:`examples/pdf-linearize.c`.
  2521 +
  2522 + - Žarko Gajić has written a Delphi wrapper for qpdf, which can be
  2523 + downloaded from qpdf's download site. Žarko's Delphi wrapper is
  2524 + released with the same licensing terms as qpdf itself and comes
  2525 + with this disclaimer: "Delphi wrapper unit
  2526 + :file:`qpdf.pas` created by Žarko Gajić
  2527 + (http://zarko-gajic.iz.hr/). Use at your own risk and for whatever
  2528 + purpose you want. No support is provided. Sample code is
  2529 + provided."
  2530 +
  2531 + - Support has been added for AES encryption and crypt filters.
  2532 + Although qpdf does not presently support files that use PKI-based
  2533 + encryption, with the addition of AES and crypt filters, qpdf is
  2534 + now able to open most encrypted files created with newer
  2535 + versions of Acrobat or other PDF creation software. Note that I
  2536 + have not been able to get very many files encrypted in this way,
  2537 + so it's possible there could still be some cases that qpdf can't
  2538 + handle. Please report them if you find them.
  2539 +
  2540 + - Many error messages have been improved to include more information
  2541 + in hopes of making qpdf a more useful tool for PDF experts to use
  2542 + in manually recovering damaged PDF files.
  2543 +
  2544 + - Attempt to avoid compressing metadata streams if possible. This is
  2545 + consistent with other PDF creation applications.
  2546 +
  2547 + - Provide new command-line options for AES encryption, cleartext
  2548 + metadata, and setting the minimum and forced PDF versions of
  2549 + output files.
  2550 +
  2551 + - Add additional methods to the ``QPDF`` object for querying the
  2552 + document's permissions. Although qpdf does not enforce these
  2553 + permissions, it does make them available so that applications that
  2554 + use qpdf can enforce permissions.
  2555 +
  2556 + - The :samp:`--check` option to
  2557 + :command:`qpdf` has been extended to include some
  2558 + additional information.
  2559 +
  2560 + - *Non-compatible API changes:*
  2561 +
  2562 + - QPDF's exception handling mechanism now uses
  2563 + ``std::logic_error`` for internal errors and
  2564 + ``std::runtime_error`` for runtime errors in favor of the now
  2565 + removed ``QEXC`` classes used in previous versions. The ``QEXC``
  2566 + exception classes predated the addition of the
  2567 + :file:`<stdexcept>` header file to the C++ standard library.
  2568 + Most of the exceptions thrown by the qpdf library itself are
  2569 + still of type ``QPDFExc`` which is now derived from
  2570 + ``std::runtime_error``. Programs that catch an instance of
  2571 + ``std::exception`` and display it by calling the ``what()``
  2572 + method will not need to be changed.
  2573 +
  2574 + - The ``QPDFExc`` class now internally represents various fields
  2575 + of the error condition and provides interfaces for querying
  2576 + them. Among the fields is a numeric error code that can help
  2577 + applications act differently on (a small number of) different
  2578 + error conditions. See :file:`QPDFExc.hh` for details.
  2579 +
  2580 + - Warnings can be retrieved from qpdf as instances of ``QPDFExc``
  2581 + instead of strings.
  2582 +
  2583 + - The nested ``QPDF::EncryptionData`` class's constructor takes an
  2584 + additional argument. This class is primarily intended to be used
  2585 + by ``QPDFWriter``. There's not really anything useful an
  2586 + end-user application could do with it. It probably shouldn't
  2587 + really be part of the public interface to begin with. Likewise,
  2588 + some of the methods for computing internal encryption dictionary
  2589 + parameters have changed to support ``/R=4`` encryption.
  2590 +
  2591 + - The method ``QPDF::getUserPassword`` has been removed since it
  2592 + didn't do what people would think it did. There are now two new
  2593 + methods: ``QPDF::getPaddedUserPassword`` and
  2594 + ``QPDF::getTrimmedUserPassword``. The first one does what the
  2595 + old ``QPDF::getUserPassword`` method used to do, which is to
  2596 + return the password with possible binary padding as specified by
  2597 + the PDF specification. The second one returns a human-readable
  2598 + password string.
  2599 +
  2600 + - The enumerated types that used to be nested in ``QPDFWriter``
  2601 + have moved to top-level enumerated types and are now defined in
  2602 + the file :file:`qpdf/Constants.h`. This enables them to be
  2603 + shared by both the C and C++ interfaces.
  2604 +
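The difference between a padded and a trimmed user password, as described in the 2.1 notes, can be sketched in standalone C++. The 32-byte padding string below is the one defined for the standard security handler in the PDF specification. The function names ``pad_password`` and ``trim_password`` are hypothetical and are not the qpdf methods; the trimming logic is one plausible way to undo the padding, not necessarily how qpdf does it.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Padding string from the PDF specification's standard security handler.
static const unsigned char kPadding[32] = {
    0x28, 0xbf, 0x4e, 0x5e, 0x4e, 0x75, 0x8a, 0x41,
    0x64, 0x00, 0x4e, 0x56, 0xff, 0xfa, 0x01, 0x08,
    0x2e, 0x2e, 0x00, 0xb6, 0xd0, 0x68, 0x3e, 0x80,
    0x2f, 0x0c, 0xa9, 0xfe, 0x64, 0x53, 0x69, 0x7a};
static const std::size_t kKeyBytes = 32;

// Hypothetical helper: truncate the password to 32 bytes, then fill the
// remainder with bytes taken from the start of the padding string.
std::string pad_password(const std::string& password)
{
    std::string result = password.substr(0, kKeyBytes);
    const std::size_t needed = kKeyBytes - result.size();
    result.append(reinterpret_cast<const char*>(kPadding), needed);
    return result;
}

// Hypothetical helper: find the earliest position from which the rest of the
// value matches the start of the padding string, and cut the value there.
std::string trim_password(const std::string& padded)
{
    const std::string p = padded.substr(0, kKeyBytes);
    const char* pad = reinterpret_cast<const char*>(kPadding);
    for (std::size_t i = 0; i <= p.size(); ++i) {
        if (std::equal(p.begin() + static_cast<std::ptrdiff_t>(i), p.end(), pad)) {
            return p.substr(0, i);
        }
    }
    return p; // not reached: the empty tail at i == p.size() always matches
}
```

The padded form may contain arbitrary binary bytes, which is why a separate trimmed form is more useful when a password has to be shown to a person.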
  2605 +2.0.6: May 3, 2009
  2606 + - Do not attempt to uncompress streams that have decode parameters
  2607 + we don't recognize. Earlier versions of qpdf would have rejected
  2608 + files with such streams.
  2609 +
  2610 +2.0.5: March 10, 2009
  2611 + - Improve error handling in the LZW decoder, and fix a small error
  2612 + introduced in the previous version with regard to handling full
  2613 + tables. The LZW decoder has been more strongly verified in this
  2614 + release.
  2615 +
  2616 +2.0.4: February 21, 2009
  2617 + - Include proper support for LZW streams encoded without the "early
  2618 + code change" flag. Special thanks to Atom Smasher who reported the
  2619 + problem and provided an input file compressed in this way, which I
  2620 + did not previously have.
  2621 +
  2622 + - Implement some improvements to file recovery logic.
  2623 +
  2624 +2.0.3: February 15, 2009
  2625 + - Compile cleanly with gcc 4.4.
  2626 +
  2627 + - Handle strings encoded as UTF-16BE properly.
  2628 +
  2629 +2.0.2: June 30, 2008
  2630 + - Update test suite to work properly with a
  2631 + non-:command:`bash`
  2632 + :file:`/bin/sh` and with Perl 5.10. No changes
  2633 + were made to the actual qpdf source code itself for this release.
  2634 +
  2635 +2.0.1: May 6, 2008
  2636 + - No changes in functionality or interface. This release includes
  2637 + fixes to the source code so that qpdf compiles properly and passes
  2638 + its test suite on a broader range of platforms. See
  2639 + :file:`ChangeLog` in the source distribution
  2640 + for details.
  2641 +
  2642 +2.0: April 29, 2008
  2643 + - First public release.
manual/weak-crypto.rst 0 → 100644
  1 +.. _ref.weak-crypto:
  2 +
  3 +Weak Cryptography
  4 +=================
  5 +
  6 +Starting with version 10.4, qpdf is taking steps to reduce the likelihood
  7 +of a user *accidentally* creating PDF files with insecure cryptography
  8 +but will continue to allow creation of such files indefinitely with
  9 +explicit acknowledgment.
  10 +
  11 +The PDF file format makes use of RC4, which is known to be a weak
  12 +cryptography algorithm, and MD5, which is a weak hashing algorithm. In
  13 +version 10.4, qpdf generates warnings for some (but not all) cases of
  14 +writing files with weak cryptography when invoked from the command-line.
  15 +These warnings can be suppressed using the
  16 +:samp:`--allow-weak-crypto` option.
  17 +
  18 +It is planned for qpdf version 11 to be stricter, making it an error to
  19 +write files with insecure cryptography from the command-line tool in
  20 +most cases without specifying the
  21 +:samp:`--allow-weak-crypto` flag and also to require
  22 +explicit steps when using the C++ library to enable use of insecure
  23 +cryptography.
  24 +
  25 +Note that qpdf must always retain support for weak cryptographic
  26 +algorithms since this is required for reading older PDF files that use
  27 +them. Additionally, qpdf will always retain the ability to create files
  28 +using weak cryptographic algorithms since, as a development tool, qpdf
  29 +explicitly supports creating older or deprecated types of PDF files
  30 +since these are sometimes needed to test or work with older versions of
  31 +software. Even if other cryptography libraries drop support for RC4 or
  32 +MD5, qpdf can always fall back to its internal implementations of those
  33 +algorithms, so they are not going to disappear from qpdf.