Commit abb6a16ed16b6137b829bc88a6f2b8b3b6c8cf35

Authored by Jay Berkenbilt
1 parent 22d53f92

Insert output of pandoc as is

pandoc -f docbook -t rst qpdf-manual.xml >| /tmp/a.rst
Insert /tmp/a.rst into existing index.rst
Showing 1 changed file with 6324 additions and 0 deletions
manual/index.rst
... ... @@ -6,6 +6,6330 @@ QPDF version |release|
6 6 :maxdepth: 2
7 7 :caption: Contents:
8 8  
  9 +.. _acknowledgments:
  10 +
  11 +General Information
  12 +===================
  13 +
  14 +QPDF is a program that does structural, content-preserving
  15 +transformations on PDF files. QPDF's website is located at
  16 +https://qpdf.sourceforge.io/. QPDF's source code is hosted on GitHub at
  17 +https://github.com/qpdf/qpdf.
  18 +
  19 +QPDF is licensed under `the Apache License, Version
  20 +2.0 <http://www.apache.org/licenses/LICENSE-2.0>`__ (the "License").
  21 +Unless required by applicable law or agreed to in writing, software
  22 +distributed under the License is distributed on an "AS IS" BASIS,
  23 +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  24 +See the License for the specific language governing permissions and
  25 +limitations under the License.
  26 +
  27 +Versions of qpdf prior to version 7 were released under the terms of
  28 +`the Artistic License, version
  29 +2.0 <https://opensource.org/licenses/Artistic-2.0>`__. At your option,
  30 +you may continue to consider qpdf to be licensed under those terms. The
  31 +Apache License 2.0 permits everything that the Artistic License 2.0
  32 +permits but is slightly less restrictive. Allowing the Artistic License
  33 +to continue being used is primarily to help people who may have to get
  34 +specific approval to use qpdf in their products.
  35 +
  36 +QPDF is intentionally released with a permissive license. However, if
  37 +there is some reason that the licensing terms don't work for your
  38 +requirements, please feel free to contact the copyright holder to make
  39 +other arrangements.
  40 +
  41 +QPDF was originally created in 2001 and modified periodically between
  42 +2001 and 2005 during my employment at `Apex
  43 +CoVantage <http://www.apexcovantage.com>`__. Upon my departure from
  44 +Apex, the company graciously allowed me to take ownership of the
  45 +software and continue maintaining it as an open source project, a decision
  46 +for which I am very grateful. I have made considerable enhancements to
  47 +it since that time. I feel fortunate to have worked for people who would
  48 +make such a decision. This work would not have been possible without
  49 +their support.
  50 +
  51 +.. _ref.overview:
  52 +
  53 +What is QPDF?
  54 +=============
  55 +
  56 +QPDF is a program that does structural, content-preserving
  57 +transformations on PDF files. It could have been called something like
  58 +*pdf-to-pdf*. It also provides many useful capabilities to developers of
  59 +PDF-producing software or for people who just want to look at the
  60 +innards of a PDF file to learn more about how they work.
  61 +
  62 +With QPDF, it is possible to copy objects from one PDF file into another
  63 +and to manipulate the list of pages in a PDF file. This makes it
  64 +possible to merge and split PDF files. The QPDF library also makes it
  65 +possible for you to create PDF files from scratch. In this mode, you are
  66 +responsible for supplying all the contents of the file, while the QPDF
  67 +library takes care of all the syntactical representation of the
  68 +objects, creation of cross-reference tables and, if you use them,
  69 +object streams, encryption, linearization, and other syntactic details.
  70 +You are still responsible for generating PDF content on your own.
  71 +
  72 +QPDF has been designed with very few external dependencies, and it is
  73 +intentionally very lightweight. QPDF is *not* a PDF content creation
  74 +library, a PDF viewer, or a program capable of converting PDF into other
  75 +formats. In particular, QPDF knows nothing about the semantics of PDF
  76 +content streams. If you are looking for something that can do that, you
  77 +should look elsewhere. However, once you have a valid PDF file, QPDF can
  78 +be used to transform that file in ways that perhaps your original PDF
  79 +creation software can't handle. For example, many programs generate simple PDF
  80 +files but can't password-protect them, web-optimize them, or perform
  81 +other transformations of that type.
  82 +
  83 +.. _ref.installing:
  84 +
  85 +Building and Installing QPDF
  86 +============================
  87 +
  88 +This chapter describes how to build and install qpdf. Please see also
  89 +the @1@filename@1@README.md@2@filename@2@ and
  90 +@1@filename@1@INSTALL@2@filename@2@ files in the source distribution.
  91 +
  92 +.. _ref.prerequisites:
  93 +
  94 +System Requirements
  95 +-------------------
  96 +
  97 +The qpdf package has few external dependencies. In order to build qpdf,
  98 +the following packages are required:
  99 +
  100 +- A C++ compiler that supports C++14.
  101 +
  102 +- zlib: http://www.zlib.net/
  103 +
  104 +- jpeg: http://www.ijg.org/files/ or https://libjpeg-turbo.org/
  105 +
  106 +- *Recommended but not required:* gnutls: https://www.gnutls.org/ to be
  107 + able to use the gnutls crypto provider, and/or openssl:
  108 + https://openssl.org/ to be able to use the openssl crypto provider.
  109 +
  110 +- GNU make 3.81 or newer: http://www.gnu.org/software/make
  111 +
  112 +- perl version 5.8 or newer: http://www.perl.org/; required for running
  113 + the test suite. Starting with qpdf version 9.1.1, perl is no longer
  114 + required at runtime.
  115 +
  116 +- GNU diffutils (any version): http://www.gnu.org/software/diffutils/
  117 + is required to run the test suite. Note that this is the version of
  118 + diff present on virtually all GNU/Linux systems. This is required
  119 + because the test suite uses @1@command@1@diff -u@2@command@2@.
  120 +
  121 +Part of qpdf's test suite does comparisons of the contents of PDF files by
  122 +converting them to images and comparing the images. The image comparison
  123 +tests are disabled by default. Those tests are not required for
  124 +determining correctness of a qpdf build if you have not modified the
  125 +code since the test suite also contains expected output files that are
  126 +compared literally. The image comparison tests provide an extra check to
  127 +make sure that any content transformations don't break the rendering of
  128 +pages. Transformations that affect the content streams themselves are
  129 +off by default and are only provided to help developers look into the
  130 +contents of PDF files. If you are making deep changes to the library
  131 +that cause changes in the contents of the files that qpdf generates,
  132 +then you should enable the image comparison tests. Enable them by
  133 +running @1@command@1@configure@2@command@2@ with the
  134 +@1@option@1@--enable-test-compare-images@2@option@2@ flag. If you enable
  135 +this, the following additional packages are required by the test
  136 +suite. Note that in no case are these items required to use qpdf.
  137 +
  138 +- libtiff: http://www.remotesensing.org/libtiff/
  139 +
  140 +- GhostScript version 8.60 or newer: http://www.ghostscript.com
  141 +
  142 +If you do not enable this, then you do not need to have tiff and
  143 +ghostscript.
  144 +
  145 +Pre-built documentation is distributed with qpdf, so you should
  146 +generally not need to rebuild the documentation. In order to build the
  147 +documentation from its docbook sources, you need the docbook XML style
  148 +sheets (http://downloads.sourceforge.net/docbook/). To build the PDF
  149 +version of the documentation, you need Apache fop
  150 +(http://xml.apache.org/fop/) version 0.94 or higher.
  151 +
  152 +.. _ref.building:
  153 +
  154 +Build Instructions
  155 +------------------
  156 +
  157 +Building qpdf on UNIX is generally just a matter of running
  158 +
  159 +::
  160 +
  161 + ./configure
  162 + make
  163 +
  164 +You can also run @1@command@1@make check@2@command@2@ to run the test
  165 +suite and @1@command@1@make install@2@command@2@ to install. Please run
  166 +@1@command@1@./configure --help@2@command@2@ for options on what can be
  167 +configured. You can also set the value of ``DESTDIR`` during
  168 +installation to install to a temporary location, as is common with many
  169 +open source packages. Please see also the
  170 +@1@filename@1@README.md@2@filename@2@ and
  171 +@1@filename@1@INSTALL@2@filename@2@ files in the source distribution.
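
As a rough sketch of the sequence described above, the shell function below chains the configure, build, test, and staged-install steps. The function name ``build_and_stage`` and its two parameters are illustrative assumptions and not part of qpdf; only ``./configure``, ``make``, ``make check``, ``make install``, and ``DESTDIR`` come from the text.

```shell
# Hypothetical helper (not part of qpdf): configure, build, test, and
# install a qpdf source tree into a staging directory via DESTDIR.
#   $1 = path to the unpacked qpdf source tree (assumed layout)
#   $2 = staging directory that receives the installed files
build_and_stage() {
    src_dir=$1
    stage_dir=$2
    (
        # The subshell keeps the caller's working directory unchanged.
        cd "$src_dir" || exit 1
        ./configure &&
        make &&
        make check &&
        make install DESTDIR="$stage_dir"
    )
}
```

It could be called as ``build_and_stage ./qpdf-src /tmp/qpdf-stage``, where both paths are placeholders. Each step runs only if the previous one succeeded, so a failing test suite prevents the install step.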
  172 +
  173 +Building on Windows is a little bit more complicated. For details,
  174 +please see @1@filename@1@README-windows.md@2@filename@2@ in the source
  175 +distribution. You can also download a binary distribution for Windows.
  176 +There is a port of qpdf to Visual C++ version 6 in the
  177 +@1@filename@1@contrib@2@filename@2@ area generously contributed by Jian
  178 +Ma. This is also discussed in more detail in
  179 +@1@filename@1@README-windows.md@2@filename@2@.
  180 +
  181 +While ``wchar_t`` is part of the C++ standard, qpdf uses it in only one
  182 +place in the public API, and it's just in a helper function. It is
  183 +possible to build qpdf on a system that doesn't have ``wchar_t``, and
  184 +it's also possible to compile a program that uses qpdf on a system
  185 +without ``wchar_t`` as long as you don't call that one method. This is a
  186 +very unusual situation. For a detailed discussion, please see the
  187 +top-level README.md file in qpdf's source distribution.
  188 +
  189 +There are some other things you can do with the build. Although qpdf
  190 +uses @1@application@1@autoconf@2@application@2@, it does not use
  191 +@1@application@1@automake@2@application@2@ but instead uses a
  192 +hand-crafted non-recursive Makefile that requires GNU make. If you're
  193 +really interested, please read the comments in the top-level
  194 +@1@filename@1@Makefile@2@filename@2@.
  195 +
  196 +.. _ref.crypto:
  197 +
  198 +Crypto Providers
  199 +----------------
  200 +
  201 +Starting with qpdf 9.1.0, the qpdf library can be built with multiple
  202 +implementations of providers of cryptographic functions, which we refer
  203 +to as "crypto providers." At the time of writing, a crypto
  204 +implementation must provide MD5 and SHA2 (256, 384, and 512-bit) hashes
  205 +and RC4 and AES256 with and without CBC encryption. In the future, if
  206 +digital signature support is added to qpdf, there may be additional requirements
  207 +beyond this.
  208 +
  209 +Starting with qpdf version 9.1.0, the available implementations are
  210 +``native`` and ``gnutls``. In qpdf 10.0.0, ``openssl`` was added.
  211 +Additional implementations may be added if needed. It is also possible
  212 +for a developer to provide their own implementation without modifying
  213 +the qpdf library.
  214 +
  215 +.. _ref.crypto.build:
  216 +
  217 +Build Support For Crypto Providers
  218 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  219 +
  220 +When building with qpdf's build system, crypto providers can be enabled
  221 +at build time using various @1@command@1@./configure@2@command@2@
  222 +options. The default behavior is for
  223 +@1@command@1@./configure@2@command@2@ to discover which crypto providers
  224 +can be supported based on available external libraries, to build all
  225 +available crypto providers, and to use an external provider as the
  226 +default over the native one. This behavior can be changed with the
  227 +following flags to @1@command@1@./configure@2@command@2@:
  228 +
  229 +- @1@option@1@--enable-crypto-@1@replaceable@1@x@2@replaceable@2@@2@option@2@
  230 + (where @1@replaceable@1@x@2@replaceable@2@ is a supported crypto
  231 + provider): enable the @1@replaceable@1@x@2@replaceable@2@ crypto
  232 + provider, requiring any external dependencies it needs
  233 +
  234 +- @1@option@1@--disable-crypto-@1@replaceable@1@x@2@replaceable@2@@2@option@2@:
  235 + disable the @1@replaceable@1@x@2@replaceable@2@ provider, and do not
  236 + link against its dependencies even if they are available
  237 +
  238 +- @1@option@1@--with-default-crypto=@1@replaceable@1@x@2@replaceable@2@@2@option@2@:
  239 + make @1@replaceable@1@x@2@replaceable@2@ the default provider even if
  240 + a higher priority one is available
  241 +
  242 +- @1@option@1@--disable-implicit-crypto@2@option@2@: only build crypto
  243 + providers that are explicitly requested with an
  244 + @1@option@1@--enable-crypto-@1@replaceable@1@x@2@replaceable@2@@2@option@2@
  245 + option
  246 +
  247 +For example, if you want to guarantee that the gnutls crypto provider is
  248 +used and that the native provider is not built, you could run
  249 +@1@command@1@./configure --enable-crypto-gnutls
  250 +--disable-implicit-crypto@2@command@2@.
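
The flag combination above can be generated for any provider name. The following is a minimal sketch: the function name ``crypto_configure_flags`` is hypothetical, and adding ``--with-default-crypto`` alongside the other two flags is an assumption made for explicitness (it should be redundant when only one provider is built).

```shell
# Hypothetical helper (not part of qpdf): print the ./configure flags that
# build exactly one crypto provider and name it as the default.
#   $1 = provider name, e.g. native, gnutls, or openssl
crypto_configure_flags() {
    provider=$1
    # One flag per line; all three flag names are taken from the list above.
    printf '%s\n' \
        "--enable-crypto-$provider" \
        "--disable-implicit-crypto" \
        "--with-default-crypto=$provider"
}
```

A possible use is ``./configure $(crypto_configure_flags gnutls)``, relying on the shell's word splitting to turn each printed line into one argument.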
  251 +
  252 +If you build qpdf using your own build system, in order for qpdf to work
  253 +at all, you need to enable at least one crypto provider. The file
  254 +@1@filename@1@libqpdf/qpdf/qpdf-config.h.in@2@filename@2@ provides
  255 +the macro ``DEFAULT_CRYPTO``, whose value must be a string naming the
  256 +default crypto provider, and various symbols starting with
  257 +``USE_CRYPTO_``, at least one of which has to be enabled. Additionally,
  258 +you must compile the source files that implement a crypto provider. To
  259 +get a list of those files, look at
  260 +@1@filename@1@libqpdf/build.mk@2@filename@2@. If you want to omit a
  261 +particular crypto provider, as long as its ``USE_CRYPTO_`` symbol is
  262 +undefined, you can completely ignore the source files that belong to a
  263 +particular crypto provider. Additionally, crypto providers may have
  264 +their own external dependencies that can be omitted if the crypto
  265 +provider is not used. For example, if you are building qpdf yourself and
  266 +are using an environment that does not support gnutls or openssl, you
  267 +can ensure that ``USE_CRYPTO_NATIVE`` is defined, ``USE_CRYPTO_GNUTLS``
  268 +is not defined, and ``DEFAULT_CRYPTO`` is defined to ``"native"``. Then
  269 +you must include the source files used in the native implementation,
  270 +some of which were added or renamed from earlier versions, to your
  271 +build, and you can ignore
  272 +@1@filename@1@QPDFCrypto_gnutls.cc@2@filename@2@. Always consult
  273 +@1@filename@1@libqpdf/build.mk@2@filename@2@ to get the list of source
  274 +files you need to build.
  275 +
  276 +.. _ref.crypto.runtime:
  277 +
  278 +Runtime Crypto Provider Selection
  279 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  280 +
  281 +You can use the @1@option@1@--show-crypto@2@option@2@ option to
  282 +@1@command@1@qpdf@2@command@2@ to get a list of available crypto
  283 +providers. The default provider is always listed first, and the rest are
  284 +listed in lexical order. Each crypto provider is listed on a line by
  285 +itself with no other text, enabling the output of this command to be
  286 +used easily in scripts.
  287 +
  288 +You can override which crypto provider is used by setting the
  289 +``QPDF_CRYPTO_PROVIDER`` environment variable. There are few reasons to
  290 +ever do this, but you might want to do it if you were explicitly trying
  291 +to compare behavior of two different crypto providers while testing
  292 +performance or reproducing a bug. It could also be useful for people who
  293 +are implementing their own crypto providers.
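
To illustrate how the one-provider-per-line output and the environment variable might be combined in a script, here is a small sketch. Both function names are hypothetical; the only qpdf features used are ``--show-crypto`` and ``QPDF_CRYPTO_PROVIDER`` as described above.

```shell
# Print the default crypto provider. Per the text above, it is the first
# line of the --show-crypto output.
default_crypto_provider() {
    qpdf --show-crypto | head -n 1
}

# Hypothetical helper: run the given command once per available provider,
# with QPDF_CRYPTO_PROVIDER set for that run only, and report each exit
# status so the runs can be compared.
with_each_crypto_provider() {
    for provider in $(qpdf --show-crypto); do
        QPDF_CRYPTO_PROVIDER=$provider "$@"
        echo "$provider: exit status $?"
    done
}
```

For example, ``with_each_crypto_provider qpdf --linearize in.pdf out.pdf`` (file names are placeholders) would repeat the same transformation under every provider the build supports.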
  294 +
  295 +.. _ref.crypto.develop:
  296 +
  297 +Crypto Provider Information for Developers
  298 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  299 +
  300 +If you are writing code that uses libqpdf and you want to force a
  301 +certain crypto provider to be used, you can call the method
  302 +``QPDFCryptoProvider::setDefaultProvider``. The argument is the name of
  303 +a built-in or developer-supplied provider. To add your own crypto
  304 +provider, you have to create a class derived from ``QPDFCryptoImpl`` and
  305 +register it with ``QPDFCryptoProvider``. For additional information, see
  306 +comments in @1@filename@1@include/qpdf/QPDFCryptoImpl.hh@2@filename@2@.
  307 +
  308 +.. _ref.crypto.design:
  309 +
  310 +Crypto Provider Design Notes
  311 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  312 +
  313 +This section describes a few bits of rationale for why the crypto
  314 +provider interface was set up the way it was. You don't need to know any
  315 +of this information, but it's provided for the record and in case it's
  316 +interesting.
  317 +
  318 +As a general rule, I want to avoid as much as possible including large
  319 +blocks of code that are conditionally compiled such that, in most
  320 +builds, some code is never built. This is dangerous because it makes it
  321 +very easy for invalid code to creep in unnoticed. As such, I want it to
  322 +be possible to build qpdf with all available crypto providers, and this
  323 +is the way I build qpdf for local development. At the same time, if a
  324 +particular packager feels that it is a security liability for qpdf to
  325 +use crypto functionality from other than a library that gets
  326 +considerable scrutiny for this specific purpose (such as gnutls,
  327 +openssl, or nettle), then I want to give that packager the ability to
  328 +completely disable qpdf's native implementation. Or if someone wants to
  329 +avoid adding a dependency on one of the external crypto providers, I
  330 +don't want the availability of the provider to impose additional
  331 +external dependencies within that environment. Both of these are
  332 +situations that I know to be true for some users of qpdf.
  333 +
  334 +I want registration and selection of crypto providers to be thread-safe,
  335 +and I want it to work deterministically for a developer to provide their
  336 +own crypto provider and be able to set it up as the default. This was
  337 +the primary motivation behind requiring C++11, as doing so enabled me to
  338 +exploit the guaranteed thread safety of local block static
  339 +initialization. The ``QPDFCryptoProvider`` class uses a singleton
  340 +pattern with thread-safe initialization to create the singleton instance
  341 +of ``QPDFCryptoProvider`` and exposes only static methods in its public
  342 +interface. In this way, if a developer wants to call any
  343 +``QPDFCryptoProvider`` methods, the library guarantees the
  344 +``QPDFCryptoProvider`` is fully initialized and all built-in crypto
  345 +providers are registered. Making ``QPDFCryptoProvider`` actually know
  346 +about all the built-in providers may seem a bit sad at first, but this
  347 +choice makes it extremely clear exactly what the initialization behavior
  348 +is. There's no question about provider implementations automatically
  349 +registering themselves in a nondeterministic order. It also means that
  350 +implementations do not need to know anything about the provider
  351 +interface, which makes them easier to test in isolation. Another
  352 +advantage of this approach is that a developer who wants to develop
  353 +their own crypto provider can do so in complete isolation from the qpdf
  354 +library and, with just two calls, can make qpdf use their provider in
  355 +their application. If they decided to contribute their code, plugging it
  356 +into the qpdf library would require a very small change to qpdf's source
  357 +code.
  358 +
  359 +The decision to make the crypto provider selectable at runtime was one I
  360 +struggled with a little, but I decided to do it for various reasons.
  361 +Allowing an end user to switch crypto providers easily could be very
  362 +useful for reproducing a potential bug. If a user reports a bug that
  363 +some cryptographic thing is broken, I can easily ask that person to try
  364 +with the ``QPDF_CRYPTO_PROVIDER`` variable set to different values. The
  365 +same could apply in the event of a performance problem. This also makes
  366 +it easier for qpdf's own test suite to exercise code with different
  367 +providers without having to make every program that links with qpdf
  368 +aware of the possibility of multiple providers. In qpdf's continuous
  369 +integration environment, the entire test suite is run for each supported
  370 +crypto provider. This is made simple by being able to select the
  371 +provider using an environment variable.
  372 +
  373 +Finally, making crypto providers selectable in this way establishes a
  374 +pattern that I may follow again in the future for stream filter
  375 +providers. One could imagine a future enhancement where someone could
  376 +provide their own implementations for basic filters like
  377 +``/FlateDecode`` or for other filters that qpdf doesn't support.
  378 +Implementing the registration functions and internal storage of
  379 +registered providers was also easier using C++11's functional
  380 +interfaces, which was another reason to require C++11 at this time.
  381 +
  382 +.. _ref.packaging:
  383 +
  384 +Notes for Packagers
  385 +-------------------
  386 +
  387 +If you are packaging qpdf for an operating system distribution, here are
  388 +some things you may want to keep in mind:
  389 +
  390 +- Starting in qpdf version 9.1.1, qpdf no longer has a runtime
  391 + dependency on perl. This is because fix-qdf was rewritten in C++.
  392 + However, qpdf still has a build-time dependency on perl.
  393 +
  394 +- Make sure you are getting the intended behavior with regard to crypto
  395 + providers. Read `Build Support For Crypto
  396 + Providers <#ref.crypto.build>`__ for details.
  397 +
  398 +- Passing @1@option@1@--enable-show-failed-test-output@2@option@2@ to
  399 + @1@command@1@./configure@2@command@2@ will cause any failed test
  400 + output to be written to the console. This can be very useful for
  401 + seeing test failures generated by autobuilders where you can't access
  402 + qtest.log after the fact.
  403 +
  404 +- If qpdf's build environment detects the presence of autoconf and
  405 + related tools, it will check to ensure that automatically generated
  406 + files are up-to-date with recorded checksums and fail if it detects a
  407 + discrepancy. This feature is intended to prevent you from
  408 + accidentally forgetting to regenerate automatic files after modifying
  409 + their sources. If your packaging environment automatically refreshes
  410 + automatic files, it can cause this check to fail. Suppress qpdf's
  411 + checks by passing @1@option@1@--disable-check-autofiles@2@option@2@
  412 + to @1@command@1@./configure@2@command@2@. This is safe since qpdf's
  413 + @1@command@1@autogen.sh@2@command@2@ just runs autotools in the
  414 + normal way.
  415 +
  416 +- QPDF's @1@command@1@make install@2@command@2@ does not install
  417 + completion files by default, but as a packager, it's good if you
  418 + install them wherever your distribution expects such files to go. You
  419 + can find completion files to install in the
  420 + @1@filename@1@completions@2@filename@2@ directory.
  421 +
  422 +- Packagers are encouraged to install the source files from the
  423 + @1@filename@1@examples@2@filename@2@ directory along with qpdf
  424 + development packages.
  425 +
  426 +.. _ref.using:
  427 +
  428 +Running QPDF
  429 +============
  430 +
  431 +This chapter describes how to run the qpdf program from the command
  432 +line.
  433 +
  434 +.. _ref.invocation:
  435 +
  436 +Basic Invocation
  437 +----------------
  438 +
  439 +When running qpdf, the basic invocation is as follows:
  440 +
  441 +::
  442 +
  443 + @1@command@1@qpdf@2@command@2@@1@option@1@ [ @1@replaceable@1@options@2@replaceable@2@ ] { @1@replaceable@1@infilename@2@replaceable@2@ | @1@option@1@--empty@2@option@2@ } [ @1@replaceable@1@page_selection_options@2@replaceable@2@ ] @1@replaceable@1@outfilename@2@replaceable@2@@2@option@2@
  444 +
  445 +This converts PDF file @1@option@1@infilename@2@option@2@ to PDF file
  446 +@1@option@1@outfilename@2@option@2@. The output file is functionally
  447 +identical to the input file but may have been structurally reorganized.
  448 +Also, orphaned objects will be removed from the file. Many
  449 +transformations are available as controlled by the options below. In
  450 +place of @1@option@1@infilename@2@option@2@, the parameter
  451 +@1@option@1@--empty@2@option@2@ may be specified. This causes qpdf to
  452 +use a dummy input file that contains zero pages. The only normal use
  453 +case for using @1@option@1@--empty@2@option@2@ would be if you were
  454 +going to add pages from another source, as discussed in `Page Selection
  455 +Options <#ref.page-selection>`__.
  456 +
  457 +If @1@option@1@@filename@2@option@2@ appears as a word anywhere in the
  458 +command-line, it will be read line by line, and each line will be
  459 +treated as a command-line argument. Leading and trailing whitespace is
  460 +intentionally not removed from lines, which makes it possible to handle
  461 +arguments that start or end with spaces. The @1@option@1@@-@2@option@2@
  462 +option allows arguments to be read from standard input. This allows qpdf
  463 +to be invoked with an arbitrary number of arbitrarily long arguments. It
  464 +is also very useful for avoiding having to pass passwords on the command
  465 +line. Note that the @1@option@1@@filename@2@option@2@ can't appear in
  466 +the middle of an argument, so constructs such as
  467 +@1@option@1@--arg=@option@2@option@2@ will not work. You would have to
  468 +include the argument and its options together in the arguments file.
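
The one-argument-per-line layout can be produced mechanically. The sketch below assumes a hypothetical helper named ``write_qpdf_argfile``; it cannot represent arguments that themselves contain newlines, since each line of the file is read as a separate argument.

```shell
# Hypothetical helper (not part of qpdf): write each argument on its own
# line, the layout read by qpdf's @filename mechanism.
#   $1 = path of the arguments file to create
#   remaining arguments = the qpdf arguments to store
write_qpdf_argfile() {
    argfile=$1
    shift
    # A newly created file is readable only by its owner, since it may hold
    # a password. The subshell keeps the caller's umask unchanged.
    (umask 077 && printf '%s\n' "$@" > "$argfile")
}

# Example use (file names and password are placeholders):
#   write_qpdf_argfile args.txt --password=secret --linearize
#   qpdf @args.txt in.pdf out.pdf
```

Because each argument is written whole, an option and its value such as ``--password=secret`` stay together on one line, which is what the restriction described above requires.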
  469 +
  470 +@1@option@1@outfilename@2@option@2@ does not have to be seekable, even
  471 +when generating linearized files. Specifying "@1@option@1@-@2@option@2@"
  472 +as @1@option@1@outfilename@2@option@2@ means to write to standard
  473 +output. If you want to overwrite the input file with the output, use the
  474 +option @1@option@1@--replace-input@2@option@2@ and omit the output file
  475 +name. You can't specify the same file as both the input and the output.
  476 +If you do this, qpdf will tell you about the
  477 +@1@option@1@--replace-input@2@option@2@ option.
  478 +
  479 +Most options require an output file, but some testing or inspection
  480 +commands do not. These are specifically noted.
  481 +
  482 +.. _ref.exit-status:
  483 +
  484 +Exit Status
  485 +~~~~~~~~~~~
  486 +
  487 +The exit status of @1@command@1@qpdf@2@command@2@ may be interpreted as
  488 +follows:
  489 +
  490 +- ``0``: no errors or warnings were found. The file may still have
  491 + problems qpdf can't detect. If
  492 + @1@option@1@--warning-exit-0@2@option@2@ was specified, exit status 0
  493 + is used even if there are warnings.
  494 +
  495 +- ``2``: errors were found. qpdf was not able to fully process the
  496 + file.
  497 +
  498 +- ``3``: qpdf encountered problems that it was able to recover from. In
  499 + some cases, the resulting file may still be damaged. Note that qpdf
  500 + still exits with status ``3`` if it finds warnings even when
  501 + @1@option@1@--no-warn@2@option@2@ is specified. With
  502 + @1@option@1@--warning-exit-0@2@option@2@, warnings without errors
  503 + exit with status 0 instead of 3.
  504 +
  505 +Note that @1@command@1@qpdf@2@command@2@ never exits with status ``1``.
  506 +If you get an exit status of ``1``, it was something else, like the
  507 +shell not being able to find or execute @1@command@1@qpdf@2@command@2@.
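
A script that calls qpdf may want to distinguish these cases explicitly. The following is a minimal sketch under the status meanings listed above; the function names ``describe_qpdf_status`` and ``run_qpdf`` and the message wording are assumptions, not part of qpdf.

```shell
# Hypothetical helper: translate a qpdf exit status into a short message,
# following the meanings listed above.
describe_qpdf_status() {
    case "$1" in
        0) echo "ok: no errors or warnings" ;;
        2) echo "error: qpdf could not fully process the file" ;;
        3) echo "warning: qpdf recovered from problems" ;;
        *) echo "unexpected status $1: probably not from qpdf itself" ;;
    esac
}

# Hypothetical wrapper: run qpdf with the given arguments, report the
# outcome on stderr, and pass the original exit status through.
run_qpdf() {
    qpdf "$@"
    status=$?
    describe_qpdf_status "$status" >&2
    return "$status"
}
```

Status ``1`` falls into the catch-all branch because, as noted above, qpdf itself never produces it.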
  508 +
  509 +.. _ref.shell-completion:
  510 +
  511 +Shell Completion
  512 +----------------
  513 +
  514 +Starting in qpdf version 8.3.0, qpdf provides its own completion support
  515 +for zsh and bash. You can enable bash completion with @1@command@1@eval
  516 +$(qpdf --completion-bash)@2@command@2@ and zsh completion with
  517 +@1@command@1@eval $(qpdf --completion-zsh)@2@command@2@. If
  518 +@1@command@1@qpdf@2@command@2@ is not in your path, you should invoke it
  519 +above with an absolute path. If you invoke it with a relative path, it
  520 +will warn you, and the completion won't work if you're in a different
  521 +directory.
  522 +
  523 +qpdf will use ``argv[0]`` to figure out where its executable is. This
  524 +may produce unwanted results in some cases, especially if you are trying
  525 +to use completion with a copy of qpdf that is built from source. You can
  526 +specify a full path to the qpdf you want to use for completion in the
  527 +``QPDF_EXECUTABLE`` environment variable.
  528 +
  529 +.. _ref.basic-options:
  530 +
  531 +Basic Options
  532 +-------------
  533 +
  534 +The following options are the most common ones and perform commonly
  535 +needed transformations.
  536 +
  537 +@1@option@1@--help@2@option@2@
  538 + Display command-line invocation help.
  539 +
  540 +@1@option@1@--version@2@option@2@
  541 + Display the current version of qpdf.
  542 +
  543 +@1@option@1@--copyright@2@option@2@
  544 + Show detailed copyright information.
  545 +
  546 +@1@option@1@--show-crypto@2@option@2@
  547 + Show a list of available crypto providers, each on a line by itself.
  548 + The default provider is always listed first. See `Crypto
  549 + Providers <#ref.crypto>`__ for more information about crypto
  550 + providers.
  551 +
  552 +@1@option@1@--completion-bash@2@option@2@
  553 + Output a completion command you can eval to enable shell completion
  554 + from bash.
  555 +
  556 +@1@option@1@--completion-zsh@2@option@2@
  557 + Output a completion command you can eval to enable shell completion
  558 + from zsh.
  559 +
  560 +@1@option@1@--password=@1@replaceable@1@password@2@replaceable@2@@2@option@2@
  561 + Specifies a password for accessing encrypted files. To read the
  562 + password from a file or standard input, you can use
  563 + @1@option@1@--password-file@2@option@2@, added in qpdf 10.2. Note
  564 + that you can also use @1@option@1@@filename@2@option@2@ or
  565 + @1@option@1@@-@2@option@2@ as described above to put the password in
  566 + a file or pass it via standard input, but you would do so by
  567 + specifying the entire
  568 + @1@option@1@--password=@1@replaceable@1@password@2@replaceable@2@@2@option@2@
  569 + option in the file. Syntax such as
  570 + @1@option@1@--password=@filename@2@option@2@ won't work since
  571 + @1@option@1@@filename@2@option@2@ is not recognized in the middle of
  572 + an argument.
  573 +
  574 +@1@option@1@--password-file=@1@replaceable@1@filename@2@replaceable@2@@2@option@2@
  575 + Reads the first line from the specified file and uses it as the
  576 + password for accessing encrypted files.
  577 + @1@option@1@@1@replaceable@1@filename@2@replaceable@2@@2@option@2@
  578 + may be ``-`` to read the password from standard input. Note that, in
  579 + this case, the password is echoed and there is no prompt, so use with
  580 + caution.
  581 +
  582 +@1@option@1@--is-encrypted@2@option@2@
  583 + Silently exit with status 0 if the file is encrypted or status 2 if
  584 + the file is not encrypted. This is useful for shell scripts. Other
  585 + options are ignored if this is given. This option is mutually
  586 + exclusive with @1@option@1@--requires-password@2@option@2@. Both this
  587 + option and @1@option@1@--requires-password@2@option@2@ exit with
  588 + status 2 for non-encrypted files.
  589 +
  590 +@1@option@1@--requires-password@2@option@2@
  591 + Silently exit with status 0 if a password (other than as supplied) is
  592 + required. Exit with status 2 if the file is not encrypted. Exit with
  593 + status 3 if the file is encrypted but requires no password or the
  594 + correct password has been supplied. This is useful for shell scripts.
  595 + Note that any supplied password is used when opening the file. When
  596 + used with a @1@option@1@--password@2@option@2@ option, this option
  597 + can be used to check the correctness of the password. In that case,
  598 + an exit status of 3 means the file works with the supplied password.
  599 + This option is mutually exclusive with
  600 + @1@option@1@--is-encrypted@2@option@2@. Both this option and
  601 + @1@option@1@--is-encrypted@2@option@2@ exit with status 2 for
  602 + non-encrypted files.
  603 +
  604 +@1@option@1@--verbose@2@option@2@
  605 + Increase verbosity of output. For now, this just prints some
  606 + indication of any file that it creates.
  607 +
  608 +@1@option@1@--progress@2@option@2@
  609 + Indicate progress while writing files.
  610 +
  611 +@1@option@1@--no-warn@2@option@2@
  612 + Suppress writing of warnings to stderr. If warnings were detected and
  613 + suppressed, @1@command@1@qpdf@2@command@2@ will still exit with exit
  614 + code 3. See also @1@option@1@--warning-exit-0@2@option@2@.
  615 +
  616 +@1@option@1@--warning-exit-0@2@option@2@
  617 + If warnings are found but no errors, exit with exit code 0 instead of 3.
  618 + When combined with @1@option@1@--no-warn@2@option@2@, the effect is
  619 + for @1@command@1@qpdf@2@command@2@ to completely ignore warnings.
  620 +
  621 +@1@option@1@--linearize@2@option@2@
  622 + Causes generation of a linearized (web-optimized) output file.
  623 +
  624 +@1@option@1@--replace-input@2@option@2@
  625 + If specified, the output file name should be omitted. This option
  626 + tells qpdf to replace the input file with the output. It does this by
  627 + writing to
  628 + @1@filename@1@@1@replaceable@1@infilename@2@replaceable@2@.~qpdf-temp#@2@filename@2@
  629 + and, when done, overwriting the input file with the temporary file.
  630 + If there were any warnings, the original input is saved as
  631 + @1@filename@1@@1@replaceable@1@infilename@2@replaceable@2@.~qpdf-orig@2@filename@2@.
  632 +
  633 +@1@option@1@--copy-encryption=file@2@option@2@
  634 + Encrypt the file using the same encryption parameters, including user
  635 + and owner password, as the specified file. Use
  636 + @1@option@1@--encryption-file-password@2@option@2@ to specify a
  637 + password if one is needed to open this file. Note that copying the
  638 + encryption parameters from a file also copies the first half of
  639 + ``/ID`` from the file since this is part of the encryption
  640 + parameters.
  641 +
  642 +@1@option@1@--encryption-file-password=password@2@option@2@
  643 + If the file specified with @1@option@1@--copy-encryption@2@option@2@
  644 + requires a password, specify the password using this option. Note
  645 + that only one of the user or owner password is required. Both
  646 + passwords will be preserved since QPDF does not distinguish between
  647 + the two passwords. It is possible to preserve encryption parameters,
  648 + including the owner password, from a file even if you don't know the
  649 + file's owner password.
  650 +
  651 +@1@option@1@--allow-weak-crypto@2@option@2@
  652 + Starting with version 10.4, qpdf issues warnings when requested to
  653 + create files using RC4 encryption. This option suppresses those
  654 + warnings. In future versions of qpdf, qpdf will refuse to create
  655 + files with weak cryptography when this flag is not given. See `Weak
  656 + Cryptography <#ref.weak-crypto>`__ for additional details.
  657 +
  658 +@1@option@1@--encrypt options --@2@option@2@
  659 + Causes generation of an encrypted output file. Please see `Encryption
  660 + Options <#ref.encryption-options>`__ for details on how to specify
  661 + encryption parameters.
  662 +
  663 +@1@option@1@--decrypt@2@option@2@
  664 + Removes any encryption on the file. A password must be supplied if
  665 + the file is password protected.
  666 +
  667 +@1@option@1@--password-is-hex-key@2@option@2@
  668 + Overrides the usual computation/retrieval of the PDF file's
  669 + encryption key from user/owner password with an explicit
  670 + specification of the encryption key. When this option is specified,
  671 + the argument to the @1@option@1@--password@2@option@2@ option is
  672 + interpreted as a hexadecimal-encoded key value. This only applies to
  673 + the password used to open the main input file. It does not apply to
  674 + other files opened by @1@option@1@--pages@2@option@2@ or other
  675 + options or to files being written.
  676 +
  677 + Most users will never have a need for this option, and no standard
  678 + viewers support this mode of operation, but it can be useful for
  679 + forensic or investigatory purposes. For example, if a PDF file is
  680 + encrypted with an unknown password, a brute-force attack using the
  681 + key directly is sometimes more efficient than one using the password.
  682 + Also, if a file is heavily damaged, it may be possible to derive the
  683 + encryption key and recover parts of the file using it directly. To
  684 + expose the encryption key used by an encrypted file that you can open
  685 + normally, use the @1@option@1@--show-encryption-key@2@option@2@
  686 + option.
  687 +
  688 +@1@option@1@--suppress-password-recovery@2@option@2@
  689 + Ordinarily, qpdf attempts to automatically compensate for passwords
  690 + specified in the wrong character encoding. This option suppresses
  691 + that behavior. Under normal conditions, there are no reasons to use
  692 + this option. See `Unicode Passwords <#ref.unicode-passwords>`__ for a
  693 + discussion.
  694 +
  695 +@1@option@1@--password-mode=@1@replaceable@1@mode@2@replaceable@2@@2@option@2@
  696 + This option can be used to fine-tune how qpdf interprets Unicode
  697 + (non-ASCII) password strings passed on the command line. With the
  698 + exception of the @1@option@1@hex-bytes@2@option@2@ mode, these only
  699 + apply to passwords provided when encrypting files. The
  700 + @1@option@1@hex-bytes@2@option@2@ mode also applies to passwords
  701 + specified for reading files. For additional discussion of the
  702 + supported password modes and when you might want to use them, see
  703 + `Unicode Passwords <#ref.unicode-passwords>`__. The following modes
  704 + are supported:
  705 +
  706 + - @1@option@1@auto@2@option@2@: Automatically determine whether the
  707 + specified password is a properly encoded Unicode (UTF-8) string,
  708 + and transcode it as required by the PDF spec based on the type
  709 + of encryption being applied. On Windows starting with version 8.4.0,
  710 + and on almost all other modern platforms, incoming passwords will
  711 + be properly encoded in UTF-8, so this is almost always what you
  712 + want.
  713 +
  714 + - @1@option@1@unicode@2@option@2@: Tells qpdf that the incoming
  715 + password is UTF-8, overriding whatever its automatic detection
  716 + determines. The only difference between this mode and
  717 + @1@option@1@auto@2@option@2@ is that qpdf will fail with an error
  718 + message if the password is not valid UTF-8 instead of falling back
  719 + to @1@option@1@bytes@2@option@2@ mode with a warning.
  720 +
  721 + - @1@option@1@bytes@2@option@2@: Interpret the password as a literal
  722 + byte string. For non-Windows platforms, this is what versions of
  723 + qpdf prior to 8.4.0 did. For Windows platforms, there is no way to
  724 + specify strings of binary data on the command line directly, but
  725 + you can use the @1@option@1@@filename@2@option@2@ option to do it,
  726 + in which case this option forces qpdf to respect the string of
  727 + bytes as provided. This option will allow you to encrypt PDF files
  728 + with passwords that will not be usable by other readers.
  729 +
  730 + - @1@option@1@hex-bytes@2@option@2@: Interpret the password as a
  731 + hex-encoded string. This provides a way to pass binary data as a
  732 + password on all platforms including Windows. As with
  733 + @1@option@1@bytes@2@option@2@, this option may allow creation of
  734 + files that can't be opened by other readers. This mode affects
  735 + qpdf's interpretation of passwords specified for decrypting files
  736 + as well as for encrypting them. It makes it possible to specify
  737 + strings that are encoded in some manner other than the system's
  738 + default encoding.
  739 +
  740 +@1@option@1@--rotate=[+|-]angle[:page-range]@2@option@2@
  741 + Apply rotation to specified pages. The
  742 + @1@option@1@page-range@2@option@2@ portion of the option value has
  743 + the same format as page ranges in `Page Selection
  744 + Options <#ref.page-selection>`__. If the page range is omitted, the
  745 + rotation is applied to all pages. The @1@option@1@angle@2@option@2@
  746 + portion of the parameter may be 0, 90, 180, or 270. If
  747 + preceded by @1@option@1@+@2@option@2@ or @1@option@1@-@2@option@2@,
  748 + the angle is added to or subtracted from the specified pages'
  749 + original rotations. This is almost always what you want. Otherwise
  750 + the pages' rotations are set to the exact value, which may cause the
  751 + appearances of the pages to be inconsistent, especially for scans.
  752 + For example, the command @1@command@1@qpdf in.pdf out.pdf
  753 + --rotate=+90:2,4,6 --rotate=180:7-8@2@command@2@ would rotate pages
  754 + 2, 4, and 6 90 degrees clockwise from their original rotation and
  755 + force the rotation of pages 7 through 8 to 180 degrees regardless of
  756 + their original rotation, and the command @1@command@1@qpdf in.pdf
  757 + out.pdf --rotate=+180@2@command@2@ would rotate all pages by 180
  758 + degrees.
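The difference between relative and absolute rotation described above can be sketched in a few lines of Python. This is only an illustration of the semantics, not qpdf's implementation: the function name ``apply_rotation`` is hypothetical, and normalizing the result into the range 0 to 359 is an assumption.

```python
def apply_rotation(original, spec):
    """Return the new rotation for a page whose current rotation is `original`.

    `spec` is the angle part of --rotate, for example "+90", "-90", or "180".
    """
    angle = int(spec)  # int() accepts a leading "+" or "-"
    if abs(angle) not in (0, 90, 180, 270):
        raise ValueError("angle must be 0, 90, 180, or 270")
    if spec.startswith(("+", "-")):
        # Relative: add to or subtract from the page's original rotation.
        return (original + angle) % 360  # assumption: normalize to 0..359
    # Absolute: force the rotation to the exact value given.
    return angle
```

For a page that is already rotated by 90 degrees, ``+90`` yields 180, while a plain ``180`` yields 180 no matter what the original rotation was.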
  759 +
  760 +@1@option@1@--keep-files-open=@1@replaceable@1@[yn]@2@replaceable@2@@2@option@2@
  761 + This option controls whether qpdf keeps individual files open while
  762 + merging. Prior to version 8.1.0, qpdf always kept all files open, but
  763 + this meant that the number of files that could be merged was limited
  764 + by the operating system's open file limit. Version 8.1.0 opened files
  765 + as they were referenced and closed them after each read, but this
  766 + caused a major performance impact. Version 8.2.0 optimized the
  767 + performance but did so in a way that, for local file systems, there
  768 + was a small but unavoidable performance hit, but for networked file
  769 + systems, the performance impact could be very high. Starting with
  770 + version 8.2.1, the default behavior is that files are kept open if no
  771 + more than 200 files are specified, but that the behavior can be
  772 + explicitly overridden with the
  773 + @1@option@1@--keep-files-open@2@option@2@ flag. If you are merging
  774 + more than 200 files but fewer than the operating system's max open
  775 + files limit, you may want to use
  776 + @1@option@1@--keep-files-open=y@2@option@2@, especially if working
  777 + over a networked file system. If you are using a local file system
  778 + where the overhead is low and you might sometimes merge more than the
  779 + OS limit's number of files from a script and are not worried about a
  780 + few seconds additional processing time, you may want to specify
  781 + @1@option@1@--keep-files-open=n@2@option@2@. The threshold for
  782 + switching may be changed from the default 200 with the
  783 + @1@option@1@--keep-files-open-threshold@2@option@2@ option.
  784 +
  785 +@1@option@1@--keep-files-open-threshold=@1@replaceable@1@count@2@replaceable@2@@2@option@2@
  786 + If specified, overrides the default value of 200 used as the
  787 + threshold for qpdf deciding whether or not to keep files open. See
  788 + @1@option@1@--keep-files-open@2@option@2@ for details.
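As a rough sketch of the default decision described for these two options, the following Python function returns whether files would be kept open. The function and parameter names are hypothetical, and the logic is only a reading of the text above, not code taken from qpdf.

```python
def keep_files_open(file_count, override=None, threshold=200):
    """Decide whether input files stay open while merging.

    override: True or False when --keep-files-open=y or n is given, else None.
    threshold: the value of --keep-files-open-threshold (200 by default).
    """
    if override is not None:
        # An explicit --keep-files-open setting always wins.
        return override
    # Default: keep files open if no more than `threshold` files are specified.
    return file_count <= threshold
```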
  789 +
  790 +@1@option@1@--pages options --@2@option@2@
  791 + Select specific pages from one or more input files. See `Page
  792 + Selection Options <#ref.page-selection>`__ for details on how to do
  793 + page selection (splitting and merging).
  794 +
  795 +@1@option@1@--collate=@1@replaceable@1@n@2@replaceable@2@@2@option@2@
  796 + When specified, collate rather than concatenate pages from files
  797 + specified with @1@option@1@--pages@2@option@2@. With a numeric
  798 + argument, collate in groups of @1@replaceable@1@n@2@replaceable@2@.
  799 + The default is 1. See `Page Selection
  800 + Options <#ref.page-selection>`__ for additional details.
  801 +
  802 +@1@option@1@--flatten-rotation@2@option@2@
  803 + For each page that is rotated using the ``/Rotate`` key in the page's
  804 + dictionary, remove the ``/Rotate`` key and implement the identical
  805 + rotation semantics by modifying the page's contents. This option can
  806 + be useful to prepare files for buggy PDF applications that don't
  807 + properly handle rotated pages.
  808 +
  809 +@1@option@1@--split-pages=[n]@2@option@2@
  810 + Write each group of @1@option@1@n@2@option@2@ pages to a separate
  811 + output file. If @1@option@1@n@2@option@2@ is not specified, create
  812 + single pages. Output file names are generated as follows:
  813 +
  814 + - If the string ``%d`` appears in the output file name, it is
  815 + replaced with a range of zero-padded page numbers starting from 1.
  816 +
  817 + - Otherwise, if the output file name ends in
  818 + @1@filename@1@.pdf@2@filename@2@ (case insensitive), a zero-padded
  819 + page range, preceded by a dash, is inserted before the file
  820 + extension.
  821 +
  822 + - Otherwise, the file name is appended with a zero-padded page range
  823 + preceded by a dash.
  824 +
  825 + Page ranges are a single number in the case of single-page groups or
  826 + two numbers separated by a dash otherwise. For example, if
  827 + @1@filename@1@infile.pdf@2@filename@2@ has 12 pages
  828 +
  829 + - @1@command@1@qpdf --split-pages infile.pdf %d-out@2@command@2@
  830 + would generate files @1@filename@1@01-out@2@filename@2@ through
  831 + @1@filename@1@12-out@2@filename@2@
  832 +
  833 + - @1@command@1@qpdf --split-pages=2 infile.pdf
  834 + outfile.pdf@2@command@2@ would generate files
  835 + @1@filename@1@outfile-01-02.pdf@2@filename@2@ through
  836 + @1@filename@1@outfile-11-12.pdf@2@filename@2@
  837 +
  838 + - @1@command@1@qpdf --split-pages infile.pdf
  839 + something.else@2@command@2@ would generate files
  840 + @1@filename@1@something.else-01@2@filename@2@ through
  841 + @1@filename@1@something.else-12@2@filename@2@
  842 +
  843 + Note that outlines, threads, and other global features of the
  844 + original PDF file are not preserved. For each page of output, this
  845 + option creates an empty PDF and copies a single page from the output
  846 + into it. If you require the global data, you will have to run
  847 + @1@command@1@qpdf@2@command@2@ with the
  848 + @1@option@1@--pages@2@option@2@ option once for each file. Using
  849 + @1@option@1@--split-pages@2@option@2@ is much faster if you don't
  850 + require the global data.
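The output file naming rules for @1@option@1@--split-pages@2@option@2@ can be sketched in Python as follows. This is an illustration under stated assumptions, not qpdf's own code: the function name is hypothetical, the padding width is assumed to be the number of digits in the last page number (which matches the 12-page examples above), and every occurrence of ``%d`` is assumed to be replaced.

```python
def split_output_names(pattern, page_count, group_size=1):
    """Return the output file names for splitting page_count pages."""
    width = len(str(page_count))  # assumption: pad to the width of the last page
    names = []
    for first in range(1, page_count + 1, group_size):
        last = min(first + group_size - 1, page_count)
        # A single number for one-page groups, otherwise "first-last".
        page_range = f"{first:0{width}d}"
        if last != first:
            page_range += f"-{last:0{width}d}"
        if "%d" in pattern:
            name = pattern.replace("%d", page_range)  # assumption: all %d replaced
        elif pattern.lower().endswith(".pdf"):
            # Insert "-range" before the (case-insensitive) .pdf extension.
            name = f"{pattern[:-4]}-{page_range}{pattern[-4:]}"
        else:
            name = f"{pattern}-{page_range}"
        names.append(name)
    return names
```

Calling ``split_output_names("outfile.pdf", 12, 2)`` produces six names, the first and last of which match the second example above.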
  851 +
  852 +@1@option@1@--overlay options --@2@option@2@
  853 + Overlay pages from another file onto the output pages. See `Overlay
  854 + and Underlay Options <#ref.overlay-underlay>`__ for details on
  855 + overlay/underlay.
  856 +
  857 +@1@option@1@--underlay options --@2@option@2@
  858 + Underlay pages from another file beneath the output pages. See `Overlay
  859 + and Underlay Options <#ref.overlay-underlay>`__ for details on
  860 + overlay/underlay.
  861 +
  862 +Password-protected files may be opened by specifying a password. By
  863 +default, qpdf will preserve any encryption data associated with a file.
  864 +If @1@option@1@--decrypt@2@option@2@ is specified, qpdf will attempt to
  865 +remove any encryption information. If @1@option@1@--encrypt@2@option@2@
  866 +is specified, qpdf will replace the document's encryption parameters
  867 +with whatever is specified.
  868 +
  869 +Note that qpdf does not obey encryption restrictions already imposed on
  870 +the file. Doing so would be meaningless since qpdf can be used to remove
  871 +encryption from the file entirely. This functionality is not intended to
  872 +be used for bypassing copyright restrictions or other restrictions
  873 +placed on files by their producers.
  874 +
  875 +Prior to 8.4.0, in the case of passwords that contain characters that
  876 +fall outside of 7-bit US-ASCII, qpdf left the burden of supplying
  877 +properly encoded encryption and decryption passwords to the user.
  878 +Starting in qpdf 8.4.0, qpdf does this automatically in most cases. For
  879 +an in-depth discussion, please see `Unicode
  880 +Passwords <#ref.unicode-passwords>`__. Previous versions of this manual
  881 +described workarounds using the @1@command@1@iconv@2@command@2@ command.
  882 +Such workarounds are no longer required or recommended with qpdf 8.4.0.
  883 +However, for backward compatibility, qpdf attempts to detect those
  884 +workarounds and do the right thing in most cases.
  885 +
  886 +.. _ref.encryption-options:
  887 +
  888 +Encryption Options
  889 +------------------
  890 +
  891 +To change the encryption parameters of a file, use the --encrypt flag.
  892 +The syntax is
  893 +
  894 +::
  895 +
  896 + @1@option@1@--encrypt @1@replaceable@1@user-password@2@replaceable@2@ @1@replaceable@1@owner-password@2@replaceable@2@ @1@replaceable@1@key-length@2@replaceable@2@ [ @1@replaceable@1@restrictions@2@replaceable@2@ ] --@2@option@2@
  897 +
  898 +Note that "@1@option@1@--@2@option@2@" terminates parsing of encryption
  899 +flags and must be present even if no restrictions are present.
  900 +
  901 +Either or both of the user password and the owner password may be empty
  902 +strings. Starting in qpdf 10.2, qpdf defaults to not allowing creation
  903 +of PDF files with a non-empty user password, an empty owner password,
  904 +and a 256-bit key since such files can be opened with no password. If
  905 +you want to create such files, specify the encryption option
  906 +@1@option@1@--allow-insecure@2@option@2@, as described below.
  907 +
  908 +The value for
  909 +@1@option@1@@1@replaceable@1@key-length@2@replaceable@2@@2@option@2@ may
  910 +be 40, 128, or 256. The restriction flags are dependent upon key length.
  911 +When no additional restrictions are given, the default is to be fully
  912 +permissive.
  913 +
  914 +If @1@option@1@@1@replaceable@1@key-length@2@replaceable@2@@2@option@2@
  915 +is 40, the following restriction options are available:
  916 +
  917 +@1@option@1@--print=[yn]@2@option@2@
  918 + Determines whether or not to allow printing.
  919 +
  920 +@1@option@1@--modify=[yn]@2@option@2@
  921 + Determines whether or not to allow document modification.
  922 +
  923 +@1@option@1@--extract=[yn]@2@option@2@
  924 + Determines whether or not to allow text/image extraction.
  925 +
  926 +@1@option@1@--annotate=[yn]@2@option@2@
  927 + Determines whether or not to allow comments and form fill-in and
  928 + signing.
  929 +
  930 +If @1@option@1@@1@replaceable@1@key-length@2@replaceable@2@@2@option@2@
  931 +is 128, the following restriction options are available:
  932 +
  933 +@1@option@1@--accessibility=[yn]@2@option@2@
  934 + Determines whether or not to allow accessibility to the visually
  935 + impaired. The qpdf library disregards this field when AES is used or
  936 + when 256-bit encryption is used. You should really never disable
  937 + accessibility, but qpdf lets you do it in case you need to configure
  938 + a file this way for testing purposes. The PDF spec says that
  939 + conforming readers should disregard this permission and always allow
  940 + accessibility.
  941 +
  942 +@1@option@1@--extract=[yn]@2@option@2@
  943 + Determines whether or not to allow text/graphic extraction.
  944 +
  945 +@1@option@1@--assemble=[yn]@2@option@2@
  946 + Determines whether document assembly (rotation and reordering of
  947 + pages) is allowed.
  948 +
  949 +@1@option@1@--annotate=[yn]@2@option@2@
  950 + Determines whether modifying annotations is allowed. This includes
  951 + adding comments and filling in form fields. Also allows editing of
  952 + form fields if @1@option@1@--modify-other=y@2@option@2@ is given.
  953 +
  954 +@1@option@1@--form=[yn]@2@option@2@
  955 + Determines whether filling form fields is allowed.
  956 +
  957 +@1@option@1@--modify-other=[yn]@2@option@2@
  958 + Allow all document editing except those controlled separately by the
  959 + @1@option@1@--assemble@2@option@2@,
  960 + @1@option@1@--annotate@2@option@2@, and
  961 + @1@option@1@--form@2@option@2@ options.
  962 +
  963 +@1@option@1@--print=@1@replaceable@1@print-opt@2@replaceable@2@@2@option@2@
  964 + Controls printing access.
  965 + @1@option@1@@1@replaceable@1@print-opt@2@replaceable@2@@2@option@2@
  966 + may be one of the following:
  967 +
  968 + - @1@option@1@full@2@option@2@: allow full printing
  969 +
  970 + - @1@option@1@low@2@option@2@: allow low-resolution printing only
  971 +
  972 + - @1@option@1@none@2@option@2@: disallow printing
  973 +
  974 +@1@option@1@--modify=@1@replaceable@1@modify-opt@2@replaceable@2@@2@option@2@
  975 + Controls modify access. This way of controlling modify access has
  976 + less granularity than new options added in qpdf 8.4.
  977 + @1@option@1@@1@replaceable@1@modify-opt@2@replaceable@2@@2@option@2@
  978 + may be one of the following:
  979 +
  980 + - @1@option@1@all@2@option@2@: allow full document modification
  981 +
  982 + - @1@option@1@annotate@2@option@2@: allow comment authoring, form
  983 + operations, and document assembly
  984 +
  985 + - @1@option@1@form@2@option@2@: allow form field fill-in and signing
  986 + and document assembly
  987 +
  988 + - @1@option@1@assembly@2@option@2@: allow document assembly only
  989 +
  990 + - @1@option@1@none@2@option@2@: allow no modifications
  991 +
  992 + Using the @1@option@1@--modify@2@option@2@ option does not allow you
  993 + to create certain combinations of permissions such as allowing form
  994 + filling but not allowing document assembly. Starting with qpdf 8.4,
  995 + you can either just use the other options to control fields
  996 + individually, or you can use something like @1@option@1@--modify=form
  997 + --assemble=n@2@option@2@ to fine tune.
  998 +
  999 +@1@option@1@--cleartext-metadata@2@option@2@
  1000 + If specified, any metadata stream in the document will be left
  1001 + unencrypted even if the rest of the document is encrypted. This also
  1002 + forces the PDF version to be at least 1.5.
  1003 +
  1004 +@1@option@1@--use-aes=[yn]@2@option@2@
  1005 + If @1@option@1@--use-aes=y@2@option@2@ is specified, AES encryption
  1006 + will be used instead of RC4 encryption. This forces the PDF version
  1007 + to be at least 1.6.
  1008 +
  1009 +@1@option@1@--allow-insecure@2@option@2@
  1010 + From qpdf 10.2, qpdf defaults to not allowing creation of PDF files
  1011 + where the user password is non-empty, the owner password is empty,
  1012 + and a 256-bit key is in use. Files created in this way are insecure
  1013 + since they can be opened without a password. Users would ordinarily
  1014 + never want to create such files. If you are using qpdf to
  1015 + intentionally create strange files for testing (a definitely valid use
  1016 + of qpdf!), this option allows you to create such insecure files.
  1017 +
  1018 +@1@option@1@--force-V4@2@option@2@
  1019 + Use of this option forces the ``/V`` and ``/R`` parameters in the
  1020 + document's encryption dictionary to be set to the value ``4``. As
  1021 + qpdf will automatically do this when required, there is no reason to
  1022 + ever use this option. It exists primarily for use in testing qpdf
  1023 + itself. This option also forces the PDF version to be at least 1.5.
  1024 +
  1025 +If @1@option@1@@1@replaceable@1@key-length@2@replaceable@2@@2@option@2@
  1026 +is 256, the minimum PDF version is 1.7 with extension level 8, and the
  1027 +AES-based encryption format used is the PDF 2.0 encryption method
  1028 +supported by Acrobat X. The same options are available as with 128 bits
  1029 +with the following exceptions:
  1030 +
  1031 +@1@option@1@--use-aes@2@option@2@
  1032 + This option is not available with 256-bit keys. AES is always used
  1033 + with 256-bit encryption keys.
  1034 +
  1035 +@1@option@1@--force-V4@2@option@2@
  1036 + This option is not available with 256-bit keys.
  1037 +
  1038 +@1@option@1@--force-R5@2@option@2@
  1039 + If specified, qpdf sets the minimum version to 1.7 at extension level
  1040 + 3 and writes the deprecated encryption format used by Acrobat version
  1041 + IX. This option should not be used in practice to generate PDF files
  1042 + that will be in general use, but it can be useful to generate files
  1043 + if you are trying to test proper support in another application for
  1044 + PDF files encrypted in this way.
  1045 +
  1046 +The default for each permission option is to be fully permissive.
  1047 +
  1048 +.. _ref.page-selection:
  1049 +
  1050 +Page Selection Options
  1051 +----------------------
  1052 +
  1053 +Starting with qpdf 3.0, it is possible to split and merge PDF files by
  1054 +selecting pages from one or more input files. Whatever file is given as
  1055 +the primary input file is used as the starting point, but its pages are
  1056 +replaced with pages as specified.
  1057 +
  1058 +::
  1059 +
  1060 + @1@option@1@--pages @1@replaceable@1@input-file@2@replaceable@2@ [ @1@replaceable@1@--password=password@2@replaceable@2@ ] [ @1@replaceable@1@page-range@2@replaceable@2@ ] [ ... ] --@2@option@2@
  1061 +
  1062 +Multiple input files may be specified. Each one is given as the name of
  1063 +the input file, an optional password (if required to open the file), and
  1064 +the range of pages. Note that "@1@option@1@--@2@option@2@" terminates
  1065 +parsing of page selection flags.
  1066 +
  1067 +Starting with qpdf 8.4, the special input file name
  1068 +"@1@filename@1@.@2@filename@2@" can be used as a shortcut for the
  1069 +primary input filename.
  1070 +
  1071 +For each file that pages should be taken from, specify the file, a
  1072 +password needed to open the file (if any), and a page range. The
  1073 +password needs to be given only once per file. If any of the input files
  1074 +are the same as the primary input file or the file used to copy
  1075 +encryption parameters (if specified), you do not need to repeat the
  1076 +password here. The same file can be repeated multiple times. If a file
  1077 +that is repeated has a password, the password only has to be given the
  1078 +first time. All non-page data (info, outlines, page numbers, etc.) are
  1079 +taken from the primary input file. To discard these, use
  1080 +@1@option@1@--empty@2@option@2@ as the primary input.
  1081 +
  1082 +Starting with qpdf 5.0.0, it is possible to omit the page range. If qpdf
  1083 +sees a value in the place where it expects a page range and that value
  1084 +is not a valid range but is a valid file name, qpdf will implicitly use
  1085 +the range ``1-z``, meaning that it will include all pages in the file.
  1086 +This makes it possible to easily combine all pages in a set of files
  1087 +with a command like @1@command@1@qpdf --empty out.pdf --pages \*.pdf
  1088 +--@2@command@2@.
  1089 +
  1090 +The page range is a set of numbers separated by commas, ranges of
  1091 +numbers separated by dashes, or combinations of those. The character "z"
  1092 +represents the last page. A number preceded by an "r" indicates to count
  1093 +from the end, so ``r3-r1`` would be the last three pages of the
  1094 +document. Pages can appear in any order. Ranges can appear with a high
  1095 +number followed by a low number, which causes the pages to appear in
  1096 +reverse. Numbers may be repeated in a page range. A page range may be
  1097 +optionally appended with ``:even`` or ``:odd`` to indicate only the even
  1098 +or odd pages in the given range. Note that even and odd refer to the
  1099 +positions within the specified range, not whether the original number
  1100 +is even or odd.
  1101 +
  1102 +Example page ranges:
  1103 +
  1104 +- ``1,3,5-9,15-12``: pages 1, 3, 5, 6, 7, 8, 9, 15, 14, 13, and 12 in
  1105 + that order.
  1106 +
  1107 +- ``z-1``: all pages in the document in reverse
  1108 +
  1109 +- ``r3-r1``: the last three pages of the document
  1110 +
  1111 +- ``r1-r3``: the last three pages of the document in reverse order
  1112 +
  1113 +- ``1-20:even``: even pages from 2 to 20
  1114 +
  1115 +- ``5,7-9,12:odd``: pages 5, 8, and 12, which are the pages in odd
  1116 + positions from among the original range, which represents pages 5, 7,
  1117 + 8, 9, and 12.
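The page range syntax described above can be sketched as a small Python parser. This is a minimal illustration of the rules as stated in this section, not qpdf's actual parser; the function name ``parse_page_range`` and its error handling are assumptions.

```python
def parse_page_range(spec, page_count):
    """Expand a page-range string into a list of 1-based page numbers."""
    # Split off an optional ":even" or ":odd" suffix.
    body, _, modifier = spec.partition(":")
    if modifier not in ("", "even", "odd"):
        raise ValueError(f"unknown modifier: {modifier!r}")

    def resolve(token):
        # "z" is the last page; "rN" counts N pages back from the end.
        if token == "z":
            page = page_count
        elif token.startswith("r"):
            page = page_count + 1 - int(token[1:])
        else:
            page = int(token)
        if not 1 <= page <= page_count:
            raise ValueError(f"page out of bounds: {token!r}")
        return page

    pages = []
    for part in body.split(","):
        first, dash, last = part.partition("-")
        start = resolve(first)
        end = resolve(last) if dash else start
        # A high number followed by a low number yields pages in reverse.
        step = 1 if end >= start else -1
        pages.extend(range(start, end + step, step))

    # ":odd" and ":even" select by position within the range, not page number.
    if modifier == "odd":
        pages = pages[0::2]
    elif modifier == "even":
        pages = pages[1::2]
    return pages
```

With a 12-page document, ``parse_page_range("5,7-9,12:odd", 12)`` returns pages 5, 8, and 12, matching the last example in the list.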
  1118 +
  1119 +Starting in qpdf version 8.3, you can specify the
  1120 +@1@option@1@--collate@2@option@2@ option. Note that this option is
  1121 +specified outside of @1@option@1@--pages ... --@2@option@2@. When
  1122 +@1@option@1@--collate@2@option@2@ is specified, it changes the meaning
  1123 +of @1@option@1@--pages@2@option@2@ so that the specified files, as
  1124 +modified by page ranges, are collated rather than concatenated. For
  1125 +example, if you had the files @1@filename@1@odd.pdf@2@filename@2@ and
  1126 +@1@filename@1@even.pdf@2@filename@2@ containing odd and even pages of a
  1127 +document respectively, you could run @1@command@1@qpdf --collate odd.pdf
  1128 +--pages odd.pdf even.pdf -- all.pdf@2@command@2@ to collate the pages.
  1129 +This would pick page 1 from odd, page 1 from even, page 2 from odd, page
  1130 +2 from even, etc. until all pages have been included. Any number of
  1131 +files and page ranges can be specified. If any file has fewer pages,
  1132 +that file is just skipped when its pages have all been included. For
  1133 +example, if you ran @1@command@1@qpdf --collate --empty --pages a.pdf
  1134 +1-5 b.pdf 6-4 c.pdf r1 -- out.pdf@2@command@2@, you would get the
  1135 +following pages in this order:
  1136 +
  1137 +- a.pdf page 1
  1138 +
  1139 +- b.pdf page 6
  1140 +
  1141 +- c.pdf last page
  1142 +
  1143 +- a.pdf page 2
  1144 +
  1145 +- b.pdf page 5
  1146 +
  1147 +- a.pdf page 3
  1148 +
  1149 +- b.pdf page 4
  1150 +
  1151 +- a.pdf page 4
  1152 +
  1153 +- a.pdf page 5
  1154 +
  1155 +Starting in qpdf version 10.2, you may specify a numeric argument to
  1156 +@1@option@1@--collate@2@option@2@. With
  1157 +@1@option@1@--collate=@1@replaceable@1@n@2@replaceable@2@@2@option@2@,
  1158 +pull groups of @1@replaceable@1@n@2@replaceable@2@ pages from each file,
  1159 +again, stopping when there are no more pages. For example, if you ran
  1160 +@1@command@1@qpdf --collate=2 --empty --pages a.pdf 1-5 b.pdf 6-4 c.pdf
  1161 +r1 -- out.pdf@2@command@2@, you would get the following pages in this
  1162 +order:
  1163 +
  1164 +- a.pdf page 1
  1165 +
  1166 +- a.pdf page 2
  1167 +
  1168 +- b.pdf page 6
  1169 +
  1170 +- b.pdf page 5
  1171 +
  1172 +- c.pdf last page
  1173 +
  1174 +- a.pdf page 3
  1175 +
  1176 +- a.pdf page 4
  1177 +
  1178 +- b.pdf page 4
  1179 +
  1180 +- a.pdf page 5
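The collation order in both examples can be reproduced with a short Python sketch. It assumes the page ranges have already been expanded into per-file lists, and the function name ``collate`` is hypothetical; it only illustrates the ordering described here and is not qpdf's implementation.

```python
def collate(page_lists, group_size=1):
    """Interleave pages from several files in groups of group_size."""
    queues = [list(pages) for pages in page_lists]
    result = []
    # Keep cycling through the files until every queue is empty.
    while any(queues):
        for queue in queues:
            # Take up to group_size pages; an exhausted file is just skipped.
            result.extend(queue[:group_size])
            del queue[:group_size]
    return result


# Pages selected by "a.pdf 1-5 b.pdf 6-4 c.pdf r1" in the examples above.
a = ["a1", "a2", "a3", "a4", "a5"]
b = ["b6", "b5", "b4"]
c = ["c-last"]
print(collate([a, b, c]))      # ordering for --collate
print(collate([a, b, c], 2))   # ordering for --collate=2
```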
  1181 +
  1182 +Starting in qpdf version 8.3, when you split and merge files, any page
  1183 +labels (page numbers) are preserved in the final file. It is expected
  1184 +that more document features will be preserved by splitting and merging.
  1185 +In the meantime, semantics of splitting and merging vary across
  1186 +features. For example, the document's outlines (bookmarks) point to
  1187 +actual page objects, so if you select some pages and not others,
  1188 +bookmarks that point to pages that are in the output file will work, and
  1189 +remaining bookmarks will not work. A future version of
  1190 +@1@command@1@qpdf@2@command@2@ may do a better job at handling these
  1191 +issues. (Note that the qpdf library already contains all of the APIs
  1192 +required in order to implement this in your own application if you need
  1193 +it.) In the meantime, you can always use
  1194 +@1@option@1@--empty@2@option@2@ as the primary input file to avoid
  1195 +copying all of that from the first file. For example, to take pages 1
  1196 +through 5 from a @1@filename@1@infile.pdf@2@filename@2@ while preserving
  1197 +all metadata associated with that file, you could use
  1198 +
  1199 +::
  1200 +
  1201 + @1@command@1@qpdf@2@command@2@ @1@option@1@infile.pdf --pages . 1-5 -- outfile.pdf@2@option@2@
  1202 +
  1203 +If you wanted pages 1 through 5 from
  1204 +@1@filename@1@infile.pdf@2@filename@2@ but you wanted the rest of the
  1205 +metadata to be dropped, you could instead run
  1206 +
  1207 +::
  1208 +
  1209 + @1@command@1@qpdf@2@command@2@ @1@option@1@--empty --pages infile.pdf 1-5 -- outfile.pdf@2@option@2@
  1210 +
  1211 +If you wanted to take pages 1 through 5 from
  1212 +@1@filename@1@file1.pdf@2@filename@2@ and pages 11 through 15 from
  1213 +@1@filename@1@file2.pdf@2@filename@2@ in reverse, taking document-level
  1214 +metadata from @1@filename@1@file2.pdf@2@filename@2@, you would run
  1215 +
  1216 +::
  1217 +
  1218 + @1@command@1@qpdf@2@command@2@ @1@option@1@file2.pdf --pages file1.pdf 1-5 . 15-11 -- outfile.pdf@2@option@2@
  1219 +
  1220 +If, for some reason, you wanted to take the first page of an encrypted
  1221 +file called @1@filename@1@encrypted.pdf@2@filename@2@ with password
  1222 +``pass`` and repeat it twice in an output file, and if you wanted to
  1223 +drop document-level metadata but preserve encryption, you would use
  1224 +
  1225 +::
  1226 +
  1227 + @1@command@1@qpdf@2@command@2@ @1@option@1@--empty --copy-encryption=encrypted.pdf --encryption-file-password=pass
  1228 + --pages encrypted.pdf --password=pass 1 ./encrypted.pdf --password=pass 1 --
  1229 + outfile.pdf@2@option@2@
  1230 +
  1231 +Note that we had to specify the password all three times because giving
  1232 +a password as @1@option@1@--encryption-file-password@2@option@2@ doesn't
  1233 +count for page selection, and as far as qpdf is concerned,
  1234 +@1@filename@1@encrypted.pdf@2@filename@2@ and
  1235 +@1@filename@1@./encrypted.pdf@2@filename@2@ are separate files. These
  1236 +are all corner cases that most users should hopefully never have to be
  1237 +bothered with.
  1238 +
  1239 +Prior to version 8.4, it was not possible to specify the same page from
  1240 +the same file directly more than once, and the workaround of specifying
  1241 +the same file in more than one way was required. Version 8.4 removes
  1242 +this limitation, but the workaround still has a valid use. When you specify
  1243 +the same page from the same file more than once, qpdf will share objects
  1244 +between the pages. If you are going to do further manipulation on the
  1245 +file and need the two instances of the same original page to be deep
  1246 +copies, then you can specify the file in two different ways. For example,
  1247 +@1@command@1@qpdf in.pdf --pages . 1 ./in.pdf 1 -- out.pdf@2@command@2@
  1248 +would create a file with two copies of the first page of the input, and
  1249 +the two copies would not share any objects in common. This includes fonts,
  1250 +images, and anything else the page references.
  1251 +
  1252 +.. _ref.overlay-underlay:
  1253 +
  1254 +Overlay and Underlay Options
  1255 +----------------------------
  1256 +
  1257 +Starting with qpdf 8.4, it is possible to overlay or underlay pages from
  1258 +other files onto the output generated by qpdf. Specify overlay or
  1259 +underlay as follows:
  1260 +
  1261 +::
  1262 +
  1263 + { @1@option@1@--overlay@2@option@2@ | @1@option@1@--underlay@2@option@2@ } @1@replaceable@1@file@2@replaceable@2@ [ @1@option@1@options@2@option@2@ ] @1@option@1@--@2@option@2@
  1264 +
  1265 +Overlay and underlay options are processed late, so they can be combined
  1266 +with other operations like merging and will apply to the final output. The
  1267 +@1@option@1@--overlay@2@option@2@ and @1@option@1@--underlay@2@option@2@
  1268 +options work the same way, except underlay pages are drawn underneath
  1269 +the page to which they are applied, possibly obscured by the original
  1270 +page, and overlay pages are drawn on top of the page to which they are
  1271 +applied, possibly obscuring the page. You can combine overlay and
  1272 +underlay.
  1273 +
  1274 +The default behavior of overlay and underlay is that pages are taken
  1275 +from the overlay/underlay file in sequence and applied to corresponding
  1276 +pages in the output until there are no more output pages. If the overlay
  1277 +or underlay file runs out of pages, remaining output pages are left
  1278 +alone. This behavior can be modified by options, which are provided
  1279 +between the @1@option@1@--overlay@2@option@2@ or
  1280 +@1@option@1@--underlay@2@option@2@ flag and the
  1281 +@1@option@1@--@2@option@2@ option. The following options are supported:
  1282 +
  1283 +- @1@option@1@--password=password@2@option@2@: supply a password if the
  1284 + overlay/underlay file is encrypted.
  1285 +
  1286 +- @1@option@1@--to=page-range@2@option@2@: a range of pages, in the
  1287 + form described in `Page Selection Options <#ref.page-selection>`__, that
  1288 + indicates which pages in the output should have the overlay/underlay
  1289 + applied. If not specified, overlay/underlay are applied to all pages.
  1290 +
  1291 +- @1@option@1@--from=[page-range]@2@option@2@: a range of pages that
  1292 + specifies which pages in the overlay/underlay file will be used for
  1293 + overlay or underlay. If not specified, all pages will be used. This
  1294 + can be explicitly specified to be empty if
  1295 + @1@option@1@--repeat@2@option@2@ is used.
  1296 +
  1297 +- @1@option@1@--repeat=page-range@2@option@2@: an optional range of
  1298 + pages that specifies which pages in the overlay/underlay file will be
  1299 + repeated after the "from" pages are used up. If you want to repeat a
  1300 + range of pages starting at the beginning, you can explicitly use
  1301 + @1@option@1@--from=@2@option@2@.
  1302 +
  1303 +Here are some examples.
  1304 +
  1305 +- @1@command@1@--overlay o.pdf --to=1-5 --from=1-3 --repeat=4
  1306 + --@2@command@2@: overlay the first three pages from file
  1307 + @1@filename@1@o.pdf@2@filename@2@ onto the first three pages of the
  1308 + output, then overlay page 4 from @1@filename@1@o.pdf@2@filename@2@
  1309 + onto pages 4 and 5 of the output. Leave remaining output pages
  1310 + untouched.
  1311 +
  1312 +- @1@command@1@--underlay footer.pdf --from= --repeat=1,2
  1313 + --@2@command@2@: Underlay page 1 of
  1314 + @1@filename@1@footer.pdf@2@filename@2@ on all odd output pages, and
  1315 + underlay page 2 of @1@filename@1@footer.pdf@2@filename@2@ on all even
  1316 + output pages.
  1317 +
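The interaction of ``--to``, ``--from``, and ``--repeat`` can be sketched as a mapping from output pages to overlay pages. The Python below is a simplified model of the behavior described above, not qpdf's code; the function name ``overlay_map`` is hypothetical, and page ranges are assumed to have already been expanded into lists of page numbers.

```python
def overlay_map(to_pages, from_pages, repeat_pages=()):
    """Map each selected output page to the overlay page applied to it."""
    mapping = {}
    for i, dest in enumerate(to_pages):
        if i < len(from_pages):
            # "from" pages are applied in sequence first
            mapping[dest] = from_pages[i]
        elif repeat_pages:
            # then the "repeat" pages cycle over the remaining output pages
            k = (i - len(from_pages)) % len(repeat_pages)
            mapping[dest] = repeat_pages[k]
        # otherwise the overlay pages have run out; the page is left alone
    return mapping


# --to=1-5 --from=1-3 --repeat=4
print(overlay_map([1, 2, 3, 4, 5], [1, 2, 3], [4]))
# {1: 1, 2: 2, 3: 3, 4: 4, 5: 4}

# --from= --repeat=1,2 applied to a six-page output
print(overlay_map([1, 2, 3, 4, 5, 6], [], [1, 2]))
# {1: 1, 2: 2, 3: 1, 4: 2, 5: 1, 6: 2}
```

Output pages missing from the returned mapping are the ones left untouched.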
  1318 +.. _ref.attachments:
  1319 +
  1320 +Embedded Files/Attachments Options
  1321 +----------------------------------
  1322 +
  1323 +Starting with qpdf 10.2, you can work with file attachments in PDF files
  1324 +from the command line. The following options are available:
  1325 +
  1326 +@1@option@1@--list-attachments@2@option@2@
  1327 + Show the "key" and stream number for embedded files. With
  1328 + @1@option@1@--verbose@2@option@2@, additional information, including
  1329 + preferred file name, description, dates, and more, is also displayed.
  1330 + The key is usually but not always equal to the file name, and is
  1331 + needed by some of the other options.
  1332 +
  1333 +@1@option@1@--show-attachment=@1@replaceable@1@key@2@replaceable@2@@2@option@2@
  1334 + Write the contents of the specified attachment to standard output as
  1335 + binary data. The key should match one of the keys shown by
  1336 + @1@option@1@--list-attachments@2@option@2@. If specified multiple
  1337 + times, only the last attachment will be shown.
  1338 +
  1339 +@1@option@1@--add-attachment @1@replaceable@1@file@2@replaceable@2@ @1@replaceable@1@options@2@replaceable@2@ --@2@option@2@
  1340 + Add or replace an attachment with the contents of
  1341 + @1@replaceable@1@file@2@replaceable@2@. This may be specified more
  1342 + than once. The following additional options may appear before the
  1343 + ``--`` that ends this option:
  1344 +
  1345 + @1@option@1@--key=@1@replaceable@1@key@2@replaceable@2@@2@option@2@
  1346 + The key to use to register the attachment in the embedded files
  1347 + table. Defaults to the last path element of
  1348 + @1@replaceable@1@file@2@replaceable@2@.
  1349 +
  1350 + @1@option@1@--filename=@1@replaceable@1@name@2@replaceable@2@@2@option@2@
  1351 + The file name to be used for the attachment. This is what is
  1352 + usually displayed to the user and is the name most graphical PDF
  1353 + viewers will use when saving a file. It defaults to the last path
  1354 + element of @1@replaceable@1@file@2@replaceable@2@.
  1355 +
  1356 + @1@option@1@--creationdate=@1@replaceable@1@date@2@replaceable@2@@2@option@2@
  1357 + The attachment's creation date in PDF format; defaults to the
  1358 + current time. The date format is explained below.
  1359 +
  1360 + @1@option@1@--moddate=@1@replaceable@1@date@2@replaceable@2@@2@option@2@
  1361 + The attachment's modification date in PDF format; defaults to the
  1362 + current time. The date format is explained below.
  1363 +
  1364 + @1@option@1@--mimetype=@1@replaceable@1@type/subtype@2@replaceable@2@@2@option@2@
  1365 + The mime type for the attachment, e.g. ``text/plain`` or
  1366 + ``application/pdf``. Note that the mimetype appears in a field
  1367 + called ``/Subtype`` in the PDF but actually includes the full type
  1368 + and subtype of the mime type.
  1369 +
  1370 + @1@option@1@--description=@1@replaceable@1@"text"@2@replaceable@2@@2@option@2@
  1371 + Descriptive text for the attachment, displayed by some PDF
  1372 + viewers.
  1373 +
  1374 + @1@option@1@--replace@2@option@2@
  1375 + Indicates that any existing attachment with the same key should be
  1376 + replaced by the new attachment. Otherwise,
  1377 + @1@command@1@qpdf@2@command@2@ gives an error if an attachment
  1378 + with that key is already present.
  1379 +
  1380 +@1@option@1@--remove-attachment=@1@replaceable@1@key@2@replaceable@2@@2@option@2@
  1381 + Remove the specified attachment. This not only removes the
  1382 + attachment from the embedded files table but also clears out the file
  1383 + specification. That means that any potential internal links to the
  1384 + attachment will be broken. This option may be specified multiple
  1385 + times. Run with @1@option@1@--verbose@2@option@2@ to see status of
  1386 + the removal.
  1387 +
  1388 +@1@option@1@--copy-attachments-from @1@replaceable@1@file@2@replaceable@2@ @1@replaceable@1@options@2@replaceable@2@ --@2@option@2@
  1389 + Copy attachments from another file. This may be specified more than
  1390 + once. The following additional options may appear before the ``--``
  1391 + that ends this option:
  1392 +
  1393 + @1@option@1@--password=@1@replaceable@1@password@2@replaceable@2@@2@option@2@
  1394 + If required, the password needed to open
  1395 + @1@replaceable@1@file@2@replaceable@2@.
  1396 +
  1397 + @1@option@1@--prefix=@1@replaceable@1@prefix@2@replaceable@2@@2@option@2@
  1398 + Only required if the file from which attachments are being copied
  1399 + has attachments with keys that conflict with attachments already
  1400 + in the file. In this case, the specified prefix will be prepended
  1401 + to each key. This affects only the key in the embedded files
  1402 + table, not the file name. The PDF specification doesn't preclude
  1403 + multiple attachments having the same file name.
  1404 +
  1405 +When a date is required, the date should conform to the PDF date format
  1406 +specification, which is
  1407 +``D:``\ @1@replaceable@1@yyyymmddhhmmss<z>@2@replaceable@2@, where
  1408 +@1@replaceable@1@<z>@2@replaceable@2@ is either ``Z`` for UTC or a
  1409 +timezone offset in the form @1@replaceable@1@-hh'mm'@2@replaceable@2@ or
  1410 +@1@replaceable@1@+hh'mm'@2@replaceable@2@. Examples:
  1411 +``D:20210207161528-05'00'``, ``D:20210207211528Z``.
  1412 +
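If you generate these dates from a script, a small helper can produce the format described above. The following Python is a minimal sketch; the function name ``pdf_date`` is hypothetical, and it assumes the caller passes a timezone-aware ``datetime``.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(dt):
    """Format an aware datetime as D:yyyymmddhhmmss plus Z or a +/-hh'mm' offset."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("datetime must be timezone-aware")
    stamp = dt.strftime("D:%Y%m%d%H%M%S")
    total_minutes = int(offset.total_seconds() // 60)
    if total_minutes == 0:
        return stamp + "Z"  # UTC
    sign = "+" if total_minutes > 0 else "-"
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{stamp}{sign}{hours:02d}'{minutes:02d}'"


eastern = timezone(timedelta(hours=-5))
print(pdf_date(datetime(2021, 2, 7, 16, 15, 28, tzinfo=eastern)))
# D:20210207161528-05'00'
print(pdf_date(datetime(2021, 2, 7, 21, 15, 28, tzinfo=timezone.utc)))
# D:20210207211528Z
```

The two printed values match the examples given above, and a string produced this way could be passed to ``--creationdate`` or ``--moddate``.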
  1413 +.. _ref.advanced-parsing:
  1414 +
  1415 +Advanced Parsing Options
  1416 +------------------------
  1417 +
  1418 +These options control aspects of how qpdf reads PDF files. Mostly these
  1419 +are of use to people who are working with damaged files. There is little
  1420 +reason to use these options unless you are trying to solve specific
  1421 +problems. The following options are available:
  1422 +
  1423 +@1@option@1@--suppress-recovery@2@option@2@
  1424 + Prevents qpdf from attempting to recover damaged files.
  1425 +
  1426 +@1@option@1@--ignore-xref-streams@2@option@2@
  1427 + Tells qpdf to ignore any cross-reference streams.
  1428 +
  1429 +Ordinarily, qpdf will attempt to recover from certain types of errors in
  1430 +PDF files. These include errors in the cross-reference table, certain
  1431 +types of object numbering errors, and certain types of stream length
  1432 +errors. Sometimes, qpdf may think it has recovered but may not have
  1433 +actually recovered, so care should be taken with recovered files as
  1434 +some data loss is possible. The
  1435 +@1@option@1@--suppress-recovery@2@option@2@ option will prevent qpdf
  1436 +from attempting recovery. In this case, it will fail on the first error
  1437 +that it encounters.
  1438 +
  1439 +Ordinarily, qpdf reads cross-reference streams when they are present in
  1440 +a PDF file. If @1@option@1@--ignore-xref-streams@2@option@2@ is
  1441 +specified, qpdf will ignore any cross-reference streams for hybrid PDF
  1442 +files. The purpose of hybrid files is to make some content available to
  1443 +viewers that are not aware of cross-reference streams. It is almost
  1444 +never desirable to ignore them. The only time when you might want to use
  1445 +this feature is if you are testing creation of hybrid PDF files and wish
  1446 +to see how a PDF consumer that doesn't understand object and
  1447 +cross-reference streams would interpret such a file.
  1448 +
  1449 +.. _ref.advanced-transformation:
  1450 +
  1451 +Advanced Transformation Options
  1452 +-------------------------------
  1453 +
  1454 +These transformation options control fine points of how qpdf creates the
  1455 +output file. Mostly these are of use only to people who are very
  1456 +familiar with the PDF file format or who are PDF developers. The
  1457 +following options are available:
  1458 +
  1459 +@1@option@1@--compress-streams=@1@replaceable@1@[yn]@2@replaceable@2@@2@option@2@
  1460 + By default, or with @1@option@1@--compress-streams=y@2@option@2@,
  1461 + qpdf will compress any stream with no other filters applied to it
  1462 + with the ``/FlateDecode`` filter when it writes it. To suppress this
  1463 + behavior and preserve uncompressed streams as uncompressed, use
  1464 + @1@option@1@--compress-streams=n@2@option@2@.
  1465 +
  1466 +@1@option@1@--decode-level=@1@replaceable@1@option@2@replaceable@2@@2@option@2@
  1467 + Controls which streams qpdf tries to decode. The default is
  1468 + @1@option@1@generalized@2@option@2@. The following options are
  1469 + available:
  1470 +
  1471 + - @1@option@1@none@2@option@2@: do not attempt to decode any streams
  1472 +
  1473 + - @1@option@1@generalized@2@option@2@: decode streams filtered with
  1474 + supported generalized filters: ``/LZWDecode``, ``/FlateDecode``,
  1475 + ``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define generalized
  1476 + filters as those to be used for general-purpose compression or
  1477 + encoding, as opposed to filters specifically designed for image
  1478 + data. Note that, by default, streams already compressed with
  1479 + ``/FlateDecode`` are not uncompressed and recompressed unless you
  1480 + also specify @1@option@1@--recompress-flate@2@option@2@.
  1481 +
  1482 + - @1@option@1@specialized@2@option@2@: in addition to generalized,
  1483 + decode streams with supported non-lossy specialized filters;
  1484 + currently this is just ``/RunLengthDecode``
  1485 +
  1486 + - @1@option@1@all@2@option@2@: in addition to generalized and
  1487 + specialized, decode streams with supported lossy filters;
  1488 + currently this is just ``/DCTDecode`` (JPEG)
  1489 +
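The decode levels are cumulative, which can be summarized as a lookup table. The Python below is an illustrative model built from the filter names listed above, not part of qpdf; the names ``DECODE_LEVELS`` and ``is_decoded`` are hypothetical, and streams that chain several filters are not modeled.

```python
GENERALIZED = ["/LZWDecode", "/FlateDecode", "/ASCII85Decode", "/ASCIIHexDecode"]
SPECIALIZED = ["/RunLengthDecode"]  # non-lossy, specialized
LOSSY = ["/DCTDecode"]  # JPEG

# Each level includes everything decoded by the level before it.
DECODE_LEVELS = {
    "none": [],
    "generalized": GENERALIZED,
    "specialized": GENERALIZED + SPECIALIZED,
    "all": GENERALIZED + SPECIALIZED + LOSSY,
}


def is_decoded(filter_name, level="generalized"):
    """Return True if a stream using filter_name would be decoded at this level."""
    return filter_name in DECODE_LEVELS[level]


print(is_decoded("/FlateDecode"))               # True
print(is_decoded("/DCTDecode", "specialized"))  # False
print(is_decoded("/DCTDecode", "all"))          # True
```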
  1490 +@1@option@1@--stream-data=@1@replaceable@1@option@2@replaceable@2@@2@option@2@
  1491 + Controls transformation of stream data. This option predates the
  1492 + @1@option@1@--compress-streams@2@option@2@ and
  1493 + @1@option@1@--decode-level@2@option@2@ options. Those options can be
  1494 + used to achieve the same effect with more control. The value of
  1495 + @1@option@1@@1@replaceable@1@option@2@replaceable@2@@2@option@2@ may
  1496 + be one of the following:
  1497 +
  1498 + - @1@option@1@compress@2@option@2@: recompress stream data when
  1499 + possible (default); equivalent to
  1500 + @1@option@1@--compress-streams=y@2@option@2@
  1501 + @1@option@1@--decode-level=generalized@2@option@2@. Does not
  1502 + recompress streams already compressed with ``/FlateDecode`` unless
  1503 + @1@option@1@--recompress-flate@2@option@2@ is also specified.
  1504 +
  1505 + - @1@option@1@preserve@2@option@2@: leave all stream data as is;
  1506 + equivalent to @1@option@1@--compress-streams=n@2@option@2@
  1507 + @1@option@1@--decode-level=none@2@option@2@
  1508 +
  1509 + - @1@option@1@uncompress@2@option@2@: uncompress stream data
  1510 + compressed with generalized filters when possible; equivalent to
  1511 + @1@option@1@--compress-streams=n@2@option@2@
  1512 + @1@option@1@--decode-level=generalized@2@option@2@
  1513 +
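When migrating scripts from ``--stream-data`` to the newer options, the equivalences listed above can be captured in a table. This Python sketch only restates that list; the names ``STREAM_DATA_EQUIVALENTS`` and ``expand_stream_data`` are hypothetical.

```python
STREAM_DATA_EQUIVALENTS = {
    "compress": ["--compress-streams=y", "--decode-level=generalized"],
    "preserve": ["--compress-streams=n", "--decode-level=none"],
    "uncompress": ["--compress-streams=n", "--decode-level=generalized"],
}


def expand_stream_data(value):
    """Return the newer options equivalent to --stream-data=value."""
    try:
        return list(STREAM_DATA_EQUIVALENTS[value])
    except KeyError:
        raise ValueError(f"unknown --stream-data value: {value!r}") from None


print(expand_stream_data("uncompress"))
# ['--compress-streams=n', '--decode-level=generalized']
```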
  1514 +@1@option@1@--recompress-flate@2@option@2@
  1515 + By default, streams already compressed with ``/FlateDecode`` are left
  1516 + alone rather than being uncompressed and recompressed. This option
  1517 + causes qpdf to uncompress and recompress the streams. There is a
  1518 + significant performance cost to using this option, but you probably
  1519 + want to use it if you specify
  1520 + @1@option@1@--compression-level@2@option@2@.
  1521 +
  1522 +@1@option@1@--compression-level=@1@replaceable@1@level@2@replaceable@2@@2@option@2@
  1523 + When writing new streams that are compressed with ``/FlateDecode``,
  1524 + use the specified compression level. The value of
  1525 + @1@option@1@level@2@option@2@ should be a number from 1 to 9 and is
  1526 + passed directly to zlib, which implements deflate compression. Note
  1527 + that qpdf doesn't uncompress and recompress streams by default. To
  1528 + have this option apply to already compressed streams, you should also
  1529 + specify @1@option@1@--recompress-flate@2@option@2@. If your goal is
  1530 + to shrink the size of PDF files, you should also use
  1531 + @1@option@1@--object-streams=generate@2@option@2@.
  1532 +
  1533 +@1@option@1@--normalize-content=[yn]@2@option@2@
  1534 + Enables or disables normalization of content streams. Content
  1535 + normalization is enabled by default in QDF mode. Please see `QDF
  1536 + Mode <#ref.qdf>`__ for additional discussion of QDF mode.
  1537 +
  1538 +@1@option@1@--object-streams=@1@replaceable@1@mode@2@replaceable@2@@2@option@2@
  1539 + Controls handling of object streams. The value of
  1540 + @1@option@1@@1@replaceable@1@mode@2@replaceable@2@@2@option@2@ may be
  1541 + one of the following:
  1542 +
  1543 + - @1@option@1@preserve@2@option@2@: preserve original object streams
  1544 + (default)
  1545 +
  1546 + - @1@option@1@disable@2@option@2@: don't write any object streams
  1547 +
  1548 + - @1@option@1@generate@2@option@2@: use object streams wherever
  1549 + possible
  1550 +
  1551 +@1@option@1@--preserve-unreferenced@2@option@2@
  1552 + Tells qpdf to preserve objects that are not referenced when writing
  1553 + the file. Ordinarily any object that is not referenced in a traversal
  1554 + of the document from the trailer dictionary will be discarded. This
  1555 + may be useful in working with some damaged files or inspecting files
  1556 + with known unreferenced objects.
  1557 +
  1558 + This flag is ignored for linearized files and has the effect of
  1559 + causing objects in the new file to be written in order by object ID
  1560 + from the original file. This does not mean that object numbers will
  1561 + be the same since qpdf may create stream lengths as direct or
  1562 + indirect differently from the original file, and the original file
  1563 + may have gaps in its numbering.
  1564 +
  1565 + See also @1@option@1@--preserve-unreferenced-resources@2@option@2@,
  1566 + which does something completely different.
  1567 +
  1568 +@1@option@1@--remove-unreferenced-resources=@1@replaceable@1@option@2@replaceable@2@@2@option@2@
  1569 + The @1@replaceable@1@option@2@replaceable@2@ may be ``auto``,
  1570 + ``yes``, or ``no``. The default is ``auto``.
  1571 +
  1572 + Starting with qpdf 8.1, when splitting pages, qpdf is able to attempt
  1573 + to remove images and fonts that are not used by a page even if they
  1574 + are referenced in the page's resources dictionary. When shared
  1575 + resources are in use, this behavior can greatly reduce the file sizes
  1576 + of split pages, but the analysis is very slow. In versions from 8.1
  1577 + through 9.1.1, qpdf did this analysis by default. Starting in qpdf
  1578 + 10.0.0, if ``auto`` is used, qpdf does a quick analysis of the file
  1579 + to determine whether the file is likely to have unreferenced objects
  1580 + on pages, a pattern that frequently occurs when resource dictionaries
  1581 + are shared across multiple pages and rarely occurs otherwise. If it
  1582 + discovers this pattern, then it will attempt to remove unreferenced
  1583 + resources. Usually this means you get the slower splitting speed only
  1584 + when it's actually going to create smaller files. You can suppress
  1585 + removal of unreferenced resources altogether by specifying ``no`` or
  1586 + force it to do the full algorithm by specifying ``yes``.
  1587 +
  1588 + Other than cases in which you don't care about file size and care a
  1589 + lot about runtime, there are few reasons to use this option,
  1590 + especially now that ``auto`` mode is supported. One reason to use
  1591 + this is if you suspect that qpdf is removing resources it shouldn't
  1592 + be removing. If you encounter that case, please report it as a bug at
  1593 + https://github.com/qpdf/qpdf/issues/.
  1594 +
  1595 +@1@option@1@--preserve-unreferenced-resources@2@option@2@
  1596 + This is a synonym for
  1597 + @1@option@1@--remove-unreferenced-resources=no@2@option@2@.
  1598 +
  1599 + See also @1@option@1@--preserve-unreferenced@2@option@2@, which does
  1600 + something completely different.
  1601 +
  1602 +@1@option@1@--newline-before-endstream@2@option@2@
  1603 + Tells qpdf to insert a newline before the ``endstream`` keyword, not
  1604 + counted in the length, after any stream content even if the last
  1605 + character of the stream was a newline. This may result in two
  1606 + newlines in some cases. This is a requirement of PDF/A. While qpdf
  1607 + doesn't specifically know how to generate PDF/A-compliant PDFs, this
  1608 + at least prevents it from removing compliance on already compliant
  1609 + files.
  1610 +
  1611 +@1@option@1@--linearize-pass1=@1@replaceable@1@file@2@replaceable@2@@2@option@2@
  1612 + Write the first pass of linearization to the named file. The
  1613 + resulting file is not a valid PDF file. This option is useful only
  1614 + for debugging ``QPDFWriter``'s linearization code. When qpdf
  1615 + linearizes files, it writes the file in two passes, using the first
  1616 + pass to calculate sizes and offsets that are required for hint tables
  1617 + and the linearization dictionary. Ordinarily, the first pass is
  1618 + discarded. This option enables it to be captured.
  1619 +
  1620 +@1@option@1@--coalesce-contents@2@option@2@
  1621 + When a page's contents are split across multiple streams, this option
  1622 + causes qpdf to combine them into a single stream. Use of this option
  1623 + is never necessary for ordinary usage, but it can help when working
  1624 + with some files in some cases. For example, this can be combined
  1625 + with QDF mode or content normalization to make it easier to look at
  1626 + all of a page's contents at once.
  1627 +
  1628 +@1@option@1@--flatten-annotations=@1@replaceable@1@option@2@replaceable@2@@2@option@2@
  1629 + This option collapses annotations into the pages' contents with
  1630 + special handling for form fields. Ordinarily, an annotation is
  1631 + rendered separately and on top of the page. Combining annotations
  1632 + into the page's contents effectively freezes the placement of the
  1633 + annotations, making them look right after various page
  1634 + transformations. The library functionality backing this option was
  1635 + added for the benefit of programs that want to create *n-up* page
  1636 + layouts and other similar things that don't work well with
  1637 + annotations. The @1@replaceable@1@option@2@replaceable@2@ parameter
  1638 + may be any of the following:
  1639 +
  1640 + - @1@option@1@all@2@option@2@: include all annotations that are not
  1641 + marked invisible or hidden
  1642 +
  1643 + - @1@option@1@print@2@option@2@: only include annotations that
  1644 + indicate that they should appear when the page is printed
  1645 +
  1646 + - @1@option@1@screen@2@option@2@: omit annotations that indicate
  1647 + they should not appear on the screen
  1648 +
  1649 + Note that form fields are special because the annotations that are
  1650 + used to render filled-in form fields may become out of date from the
  1651 + fields' values if the form is filled in by a program that doesn't
  1652 + know how to update the appearances. If qpdf detects this case, its
  1653 + default behavior is not to flatten those annotations because doing so
  1654 + would cause the value of the form field to be lost. This gives you a
  1655 + chance to go back and resave the form with a program that knows how
  1656 + to generate appearances. QPDF itself can generate appearances with
  1657 + some limitations. See the
  1658 + @1@option@1@--generate-appearances@2@option@2@ option below.
  1659 +
  1660 +@1@option@1@--generate-appearances@2@option@2@
  1661 + If a file contains interactive form fields and indicates that the
  1662 + appearances are out of date with the values of the form, this flag
  1663 + will regenerate appearances, subject to a few limitations. Note that
  1664 + there is not usually a reason to do this, but it can be necessary
  1665 + before using the @1@option@1@--flatten-annotations@2@option@2@
  1666 + option. The limitations, most of which are not a problem with
  1667 + well-behaved PDF files, are as follows:
  1668 +
  1669 + - Radio button and checkbox appearances use the pre-set values in
  1670 + the PDF file. QPDF just makes sure that the correct appearance is
  1671 + displayed based on the value of the field. This is fine for PDF
  1672 + files that create their forms properly. Some PDF writers save
  1673 + appearances for fields when they change, which could cause some
  1674 + controls to have inconsistent appearances.
  1675 +
  1676 + - For text fields and list boxes, any characters that fall outside
  1677 + of US-ASCII or, if detected, "Windows ANSI" or "Mac Roman"
  1678 + encoding, will be replaced by the ``?`` character.
  1679 +
  1680 + - Quadding is ignored. Quadding is used to specify whether the
  1681 + contents of a field should be left, center, or right aligned with
  1682 + the field.
  1683 +
  1684 + - Rich text, multi-line, and other more elaborate formatting
  1685 + directives are ignored.
  1686 +
  1687 + - There is no support for multi-select fields or signature fields.
  1688 +
  1689 + If qpdf doesn't do a good enough job with your form, use an external
  1690 + application to save your filled-in form before processing it with
  1691 + qpdf.
  1692 +
  1693 +@1@option@1@--optimize-images@2@option@2@
  1694 + This flag causes qpdf to recompress all images that are not
  1695 + compressed with DCT (JPEG) using DCT compression as long as doing so
  1696 + decreases the size in bytes of the image data and the image does not
  1697 + fall below minimum specified dimensions. Useful information is
  1698 + provided when used in combination with
  1699 + @1@option@1@--verbose@2@option@2@. See also the
  1700 + @1@option@1@--oi-min-width@2@option@2@,
  1701 + @1@option@1@--oi-min-height@2@option@2@, and
  1702 + @1@option@1@--oi-min-area@2@option@2@ options. By default, starting
  1703 + in qpdf 8.4, inline images are converted to regular images and
  1704 + optimized as well. Use @1@option@1@--keep-inline-images@2@option@2@
  1705 + to prevent inline images from being included.
  1706 +
  1707 +@1@option@1@--oi-min-width=@1@replaceable@1@width@2@replaceable@2@@2@option@2@
  1708 + Avoid optimizing images whose width is below the specified amount. If
  1709 + omitted, the default is 128 pixels. Use 0 for no minimum.
  1710 +
  1711 +@1@option@1@--oi-min-height=@1@replaceable@1@height@2@replaceable@2@@2@option@2@
  1712 + Avoid optimizing images whose height is below the specified amount.
  1713 + If omitted, the default is 128 pixels. Use 0 for no minimum.
  1714 +
  1715 +@1@option@1@--oi-min-area=@1@replaceable@1@area-in-pixels@2@replaceable@2@@2@option@2@
  1716 + Avoid optimizing images whose pixel count (width × height) is below
  1717 + the specified amount. If omitted, the default is 16,384 pixels. Use 0
  1718 + for no minimum.
  1719 +
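The three size minimums can be read as a single filter applied to each image. The Python below is a rough model under the assumption that an image failing any one of the three minimums is skipped; the function name ``passes_size_minimums`` is hypothetical, and the defaults are the ones stated above.

```python
def passes_size_minimums(width, height, min_width=128, min_height=128,
                         min_area=16384):
    """Return True if an image is large enough to be considered for optimization."""
    if width < min_width or height < min_height:
        return False
    return width * height >= min_area  # a minimum of 0 never excludes anything


print(passes_size_minimums(128, 128))                  # True
print(passes_size_minimums(127, 500))                  # False, too narrow
print(passes_size_minimums(200, 130, min_area=30000))  # False, area is 26000
print(passes_size_minimums(10, 10, 0, 0, 0))           # True, no minimums
```

Passing this check is necessary but not sufficient: as described under ``--optimize-images``, an image is recompressed only if doing so also reduces the size of its data.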
  1720 +@1@option@1@--externalize-inline-images@2@option@2@
  1721 + Convert inline images to regular images. By default, images whose
  1722 + data is at least 1,024 bytes are converted when this option is
  1723 + selected. Use @1@option@1@--ii-min-bytes@2@option@2@ to change the
  1724 + size threshold. This option is implicitly selected when
  1725 + @1@option@1@--optimize-images@2@option@2@ is selected. Use
  1726 + @1@option@1@--keep-inline-images@2@option@2@ to exclude inline images
  1727 + from image optimization.
  1728 +
  1729 +@1@option@1@--ii-min-bytes=@1@replaceable@1@bytes@2@replaceable@2@@2@option@2@
  1730 + Avoid converting inline images whose size is below the specified
  1731 + minimum size to regular images. If omitted, the default is 1,024
  1732 + bytes. Use 0 for no minimum.
  1733 +
  1734 +@1@option@1@--keep-inline-images@2@option@2@
  1735 + Prevent inline images from being included in image optimization. This
  1736 + option has no effect when @1@option@1@--optimize-images@2@option@2@
  1737 + is not specified.
  1738 +
  1739 +@1@option@1@--remove-page-labels@2@option@2@
  1740 + Remove page labels from the output file.
  1741 +
  1742 +@1@option@1@--qdf@2@option@2@
  1743 + Turns on QDF mode. For additional information on QDF, please see `QDF
  1744 + Mode <#ref.qdf>`__. Note that @1@option@1@--linearize@2@option@2@
  1745 + disables QDF mode.
  1746 +
  1747 +@1@option@1@--min-version=@1@replaceable@1@version@2@replaceable@2@@2@option@2@
  1748 + Forces the PDF version of the output file to be at least
  1749 + @1@replaceable@1@version@2@replaceable@2@. In other words, if the
  1750 + input file has a lower version than the specified version, the
  1751 + specified version will be used. If the input file has a higher
  1752 + version, the input file's original version will be used. It is seldom
  1753 + necessary to use this option since qpdf will automatically increase
  1754 + the version as needed when adding features that require newer PDF
  1755 + readers.
  1756 +
  1757 + The version number may be expressed in the form
  1758 + @1@replaceable@1@major.minor.extension-level@2@replaceable@2@, in
  1759 + which case the version is interpreted as
  1760 + @1@replaceable@1@major.minor@2@replaceable@2@ at extension level
  1761 + @1@replaceable@1@extension-level@2@replaceable@2@. For example,
  1762 + version ``1.7.8`` represents version 1.7 at extension level 8. Note
  1763 + that minimal syntax checking is done on the command line.
  1764 +
  1765 +@1@option@1@--force-version=@1@replaceable@1@version@2@replaceable@2@@2@option@2@
  1766 + This option forces the PDF version to be the exact version specified
  1767 + *even when the file may have content that is not supported in that
  1768 + version*. The version number is interpreted in the same way as with
  1769 + @1@option@1@--min-version@2@option@2@ so that extension levels can be
  1770 + set. In some cases, forcing the output file's PDF version to be lower
  1771 + than that of the input file will cause qpdf to disable certain
  1772 + features of the document. Specifically, 256-bit keys are disabled if
  1773 + the version is less than 1.7 with extension level 8 (except R5 is
  1774 + disabled if less than 1.7 with extension level 3), AES encryption is
  1775 + disabled if the version is less than 1.6, cleartext metadata and
  1776 + object streams are disabled if less than 1.5, 128-bit encryption keys
  1777 + are disabled if less than 1.4, and all encryption is disabled if less
  1778 + than 1.3. Even with these precautions, qpdf won't be able to do
  1779 + things like eliminate use of newer image compression schemes,
  1780 + transparency groups, or other features that may have been added in
  1781 + more recent versions of PDF.
  1782 +
  1783 + As a general rule, with the exception of big structural things like
  1784 + the use of object streams or AES encryption, PDF viewers are supposed
  1785 + to ignore features in files that they don't support from newer
  1786 + versions. This means that forcing the version to a lower version may
  1787 + make it possible to open your PDF file with an older version, though
  1788 + bear in mind that some of the original document's functionality may
  1789 + be lost.
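
The version syntax and the feature thresholds described above can be
sketched in a few lines of Python. This is only an illustration of the
rules as stated in this section, not qpdf's own code: the function
names ``parse_version`` and ``features_disabled`` are hypothetical, and
a version given without an extension level is assumed to be at
extension level 0.

```python
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse 'major.minor' or 'major.minor.extension-level'."""
    parts = text.split(".")
    if len(parts) not in (2, 3):
        raise ValueError(
            f"expected major.minor[.extension-level], got {text!r}")
    major, minor = int(parts[0]), int(parts[1])
    # Assumption: a missing extension level is treated as 0.
    extension_level = int(parts[2]) if len(parts) == 3 else 0
    return (major, minor, extension_level)


# (minimum version, feature) pairs taken from the --force-version text.
FEATURE_MINIMUMS = [
    ((1, 7, 8), "256-bit encryption keys (except R5)"),
    ((1, 7, 3), "256-bit encryption keys (R5)"),
    ((1, 6, 0), "AES encryption"),
    ((1, 5, 0), "cleartext metadata"),
    ((1, 5, 0), "object streams"),
    ((1, 4, 0), "128-bit encryption keys"),
    ((1, 3, 0), "all encryption"),
]


def features_disabled(forced_version: str) -> List[str]:
    """List the features that forcing this version would disable."""
    forced = parse_version(forced_version)
    # Tuples compare element by element, so (1, 7, 3) < (1, 7, 8).
    return [name for minimum, name in FEATURE_MINIMUMS if forced < minimum]
```

For example, ``features_disabled("1.4")`` lists everything down to
object streams but leaves 128-bit keys alone, since 1.4 is not lower
than 1.4.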
  1790 +
  1791 +By default, when a stream is encoded using non-lossy filters that qpdf
  1792 +understands and is not already compressed using a good compression
  1793 +scheme, qpdf will uncompress and recompress streams. Assuming proper
  1794 +filter implementations, this is safe and generally results in smaller files.
  1795 +This behavior may also be explicitly requested with
  1796 +@1@option@1@--stream-data=compress@2@option@2@.
  1797 +
  1798 +When @1@option@1@--normalize-content=y@2@option@2@ is specified, qpdf
  1799 +will attempt to normalize whitespace and newlines in page content
  1800 +streams. This is generally safe but could, in some cases, cause damage
  1801 +to the content streams. This option is intended for people who wish to
  1802 +study PDF content streams or to debug PDF content. You should not use
  1803 +this for "production" PDF files.
  1804 +
  1805 +When normalizing content, if qpdf runs into any lexical errors, it will
  1806 +print a warning indicating that content may be damaged. The only
  1807 +situation in which qpdf is known to cause damage during content
  1808 +normalization is when a page's contents are split across multiple
  1809 +streams and streams are split in the middle of a lexical token such as a
  1810 +string, name, or inline image. Note that files that do this are invalid
  1811 +since the PDF specification states that content streams are not to be
  1812 +split in the middle of a token. If you want to inspect the original
  1813 +content streams in an uncompressed format, you can always run with
  1814 +@1@option@1@--qdf --normalize-content=n@2@option@2@ for a QDF file
  1815 +without content normalization, or alternatively
  1816 +@1@option@1@--stream-data=uncompress@2@option@2@ for a regular non-QDF
  1817 +mode file with uncompressed streams. These will both uncompress all the
  1818 +streams but will not attempt to normalize content. Please note that if
  1819 +you are using content normalization or QDF mode for the purpose of
  1820 +manually inspecting files, you don't have to care about this.
  1821 +
  1822 +Object streams, also known as compressed objects, were introduced into
  1823 +the PDF specification at version 1.5, corresponding to Acrobat 6. Some
  1824 +older PDF viewers may not support files with object streams. qpdf can be
  1825 +used to transform files with object streams to files without object
  1826 +streams or vice versa. As mentioned above, there are three object stream
  1827 +modes: @1@option@1@preserve@2@option@2@,
  1828 +@1@option@1@disable@2@option@2@, and @1@option@1@generate@2@option@2@.
  1829 +
  1830 +In @1@option@1@preserve@2@option@2@ mode, the relationship between objects
  1831 +and the streams that contain them is preserved from the original file.
  1832 +In @1@option@1@disable@2@option@2@ mode, all objects are written as
  1833 +regular, uncompressed objects. The resulting file should be readable by
  1834 +older PDF viewers. (Of course, the content of the files may include
  1835 +features not supported by older viewers, but at least the structure will
  1836 +be supported.) In @1@option@1@generate@2@option@2@ mode, qpdf will
  1837 +create its own object streams. This will usually result in more compact
  1838 +PDF files, though they may not be readable by older viewers. In this
  1839 +mode, qpdf will also make sure the PDF version number in the header is
  1840 +at least 1.5.
  1841 +
  1842 +The @1@option@1@--qdf@2@option@2@ flag turns on QDF mode, which changes
  1843 +some of the defaults described above. Specifically, in QDF mode, by
  1844 +default, stream data is uncompressed, content streams are normalized,
  1845 +and encryption is removed. These defaults can still be overridden by
  1846 +specifying the appropriate options as described above. Additionally, in
  1847 +QDF mode, stream lengths are stored as indirect objects, objects are
  1848 +laid out in a less efficient but more readable fashion, and the
  1849 +documents are interspersed with comments that make it easier for the
  1850 +user to find things and also make it possible for
  1851 +@1@command@1@fix-qdf@2@command@2@ to work properly. QDF mode is intended
  1852 +for people, mostly developers, who wish to inspect or modify PDF files
  1853 +in a text editor. For details, please see `QDF Mode <#ref.qdf>`__.
  1854 +
  1855 +.. _ref.testing-options:
  1856 +
  1857 +Testing, Inspection, and Debugging Options
  1858 +------------------------------------------
  1859 +
  1860 +These options can be useful for digging into PDF files or for use in
  1861 +automated test suites for software that uses the qpdf library. When any
  1862 +of the options in this section are specified, no output file should be
  1863 +given. The following options are available:
  1864 +
  1865 +@1@option@1@--deterministic-id@2@option@2@
  1866 + Causes generation of a deterministic value for /ID. This prevents use
  1867 + of timestamp and output file name information in the /ID generation.
  1868 + Instead, at some slight additional runtime cost, the /ID field is
  1869 + generated to include a digest of the significant parts of the content
  1870 + of the output PDF file. This means that a given qpdf operation should
  1871 + generate the same /ID each time it is run, which can be useful when
  1872 + caching results or for generation of some test data. Use of this flag
  1873 + is not compatible with creation of encrypted files.
  1874 +
  1875 +@1@option@1@--static-id@2@option@2@
  1876 + Causes generation of a fixed value for /ID. This is intended for
  1877 + testing only. Never use it for production files. If you are trying to
  1878 + get the same /ID each time for a given file and you are not
  1879 + generating encrypted files, consider using the
  1880 + @1@option@1@--deterministic-id@2@option@2@ option.
  1881 +
  1882 +@1@option@1@--static-aes-iv@2@option@2@
  1883 + Causes use of a static initialization vector for AES-CBC. This is
  1884 + intended for testing only so that output files can be reproducible.
  1885 + Never use it for production files. This option in particular is not
  1886 + secure since it significantly weakens the encryption.
  1887 +
  1888 +@1@option@1@--no-original-object-ids@2@option@2@
  1889 + Suppresses inclusion of original object ID comments in QDF files.
  1890 + This can be useful when generating QDF files for test purposes,
  1891 + particularly when comparing them to determine whether two PDF files
  1892 + have identical content.
  1893 +
  1894 +@1@option@1@--show-encryption@2@option@2@
  1895 + Shows document encryption parameters. Also shows the document's user
  1896 + password if the owner password is given.
  1897 +
  1898 +@1@option@1@--show-encryption-key@2@option@2@
  1899 + When encryption information is being displayed, as when
  1900 + @1@option@1@--check@2@option@2@ or
  1901 + @1@option@1@--show-encryption@2@option@2@ is given, display the
  1902 + computed or retrieved encryption key as a hexadecimal string. This
  1903 + value is not ordinarily useful to users, but it can be used as the
  1904 + argument to @1@option@1@--password@2@option@2@ if the
  1905 + @1@option@1@--password-is-hex-key@2@option@2@ is specified. Note
  1906 + that, when PDF files are encrypted, passwords and other metadata are
  1907 + used only to compute an encryption key, and the encryption key is
  1908 + what is actually used for encryption. This enables retrieval of that
  1909 + key.
  1910 +
  1911 +@1@option@1@--check-linearization@2@option@2@
  1912 + Checks file integrity and linearization status.
  1913 +
  1914 +@1@option@1@--show-linearization@2@option@2@
  1915 + Checks and displays all data in the linearization hint tables.
  1916 +
  1917 +@1@option@1@--show-xref@2@option@2@
  1918 + Shows the contents of the cross-reference table in a human-readable
  1919 + form. This is especially useful for files with cross-reference
  1920 + streams which are stored in a binary format.
  1921 +
  1922 +@1@option@1@--show-object=trailer|obj[,gen]@2@option@2@
  1923 + Show the contents of the given object. This is especially useful for
  1924 + inspecting objects that are inside of object streams (also known as
  1925 + "compressed objects").
  1926 +
  1927 +@1@option@1@--raw-stream-data@2@option@2@
  1928 + When used along with the @1@option@1@--show-object@2@option@2@
  1929 + option, if the object is a stream, shows the raw stream data instead
  1930 + of the object's contents.
  1931 +
  1932 +@1@option@1@--filtered-stream-data@2@option@2@
  1933 + When used along with the @1@option@1@--show-object@2@option@2@
  1934 + option, if the object is a stream, shows the filtered stream data
  1935 + instead of the object's contents. If the stream is filtered using filters
  1936 + that qpdf does not support, an error will be issued.
  1937 +
  1938 +@1@option@1@--show-npages@2@option@2@
  1939 + Prints the number of pages in the input file on a line by itself.
  1940 + Since the number of pages appears by itself on a line, this option
  1941 + can be useful for scripting if you need to know the number of pages
  1942 + in a file.
  1943 +
  1944 +@1@option@1@--show-pages@2@option@2@
  1945 + Shows the object and generation number for each page dictionary
  1946 + object and for each content stream associated with the page. Having
  1947 + this information makes it more convenient to inspect objects from a
  1948 + particular page.
  1949 +
  1950 +@1@option@1@--with-images@2@option@2@
  1951 + When used along with @1@option@1@--show-pages@2@option@2@, also shows
  1952 + the object and generation numbers for the image objects on each page.
  1953 + (At present, information about images in shared resource dictionaries
  1954 + is not output by this command. This is discussed in a comment in the
  1955 + source code.)
  1956 +
  1957 +@1@option@1@--json@2@option@2@
  1958 + Generate a JSON representation of the file. This is described in
  1959 + depth in `QPDF JSON <#ref.json>`__.
  1960 +
  1961 +@1@option@1@--json-help@2@option@2@
  1962 + Describe the format of the JSON output.
  1963 +
  1964 +@1@option@1@--json-key=key@2@option@2@
  1965 + This option is repeatable. If specified, only top-level keys
  1966 + specified will be included in the JSON output. If not specified, all
  1967 + keys will be shown.
  1968 +
  1969 +@1@option@1@--json-object=trailer|obj[,gen]@2@option@2@
  1970 + This option is repeatable. If specified, only specified objects will
  1971 + be shown in the "``objects``" key of the JSON output. If absent, all
  1972 + objects will be shown.
  1973 +
  1974 +@1@option@1@--check@2@option@2@
  1975 + Checks file structure as well as encryption, linearization, and
  1976 + encoding of stream data. A file for which
  1977 + @1@option@1@--check@2@option@2@ reports no errors may still have
  1978 + errors in stream data content but should otherwise be structurally
  1979 + sound. If @1@option@1@--check@2@option@2@ reports any errors, qpdf will exit
  1980 + with a status of 2. There are some recoverable conditions that
  1981 + @1@option@1@--check@2@option@2@ detects. These are issued as warnings
  1982 + instead of errors. If qpdf finds no errors but finds warnings, it
  1983 + will exit with a status of 3 (as of version 2.0.4). When
  1984 + @1@option@1@--check@2@option@2@ is combined with other options,
  1985 + checks are always performed before any other options are processed.
  1986 + For erroneous files, @1@option@1@--check@2@option@2@ will cause qpdf
  1987 + to attempt to recover, after which other options are effectively
  1988 + operating on the recovered file. Combining
  1989 + @1@option@1@--check@2@option@2@ with other options in this way can be
  1990 + useful for manually recovering severely damaged files. Note that
  1991 + @1@option@1@--check@2@option@2@ produces no output to standard output
  1992 + when everything is valid, so if you are using this to
  1993 + programmatically validate files in bulk, it is safe to run without
  1994 + output redirected to @1@filename@1@/dev/null@2@filename@2@ and just
  1995 + check for a 0 exit code.
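
For scripting, the exit statuses described for ``--check`` and the
single-line output of ``--show-npages`` can be wrapped as follows. This
is a minimal sketch that assumes a ``qpdf`` executable is on the search
path; the function names are hypothetical, and discarding the
command's output in ``check_pdf`` is a choice made here, not a
requirement.

```python
import subprocess


def classify_check_status(exit_code: int) -> str:
    """Map a qpdf --check exit status to a short label."""
    if exit_code == 0:
        return "ok"
    if exit_code == 3:
        return "warnings"  # recoverable conditions only
    if exit_code == 2:
        return "errors"
    return "unknown"


def check_pdf(path: str, qpdf: str = "qpdf") -> str:
    """Run qpdf --check on one file and classify the result."""
    result = subprocess.run(
        [qpdf, "--check", path],
        stdout=subprocess.DEVNULL,  # only the exit status is used here
        stderr=subprocess.DEVNULL,
    )
    return classify_check_status(result.returncode)


def parse_npages(output: str) -> int:
    """Parse the output of --show-npages: a number on a line by itself."""
    return int(output.strip())


def count_pages(path: str, qpdf: str = "qpdf") -> int:
    """Return the number of pages reported by qpdf --show-npages."""
    result = subprocess.run(
        [qpdf, "--show-npages", path],
        capture_output=True, text=True, check=True,
    )
    return parse_npages(result.stdout)
```

A bulk validation script could call ``check_pdf`` on each file and
treat anything other than ``"ok"`` as needing attention.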
  1996 +
  1997 +The @1@option@1@--raw-stream-data@2@option@2@ and
  1998 +@1@option@1@--filtered-stream-data@2@option@2@ options are ignored
  1999 +unless @1@option@1@--show-object@2@option@2@ is given. Either of these
  2000 +options will cause the stream data to be written to standard output. In
  2001 +order to avoid commingling of stream data with other output, it is
  2002 +recommended that these options not be combined with other test/inspection
  2003 +options.
  2004 +
  2005 +If @1@option@1@--filtered-stream-data@2@option@2@ is given and
  2006 +@1@option@1@--normalize-content=y@2@option@2@ is also given, qpdf will
  2007 +attempt to normalize the stream data as if it is a page content stream.
  2008 +This attempt will be made even if it is not a page content stream, in
  2009 +which case it will produce unusable results.
  2010 +
  2011 +.. _ref.unicode-passwords:
  2012 +
  2013 +Unicode Passwords
  2014 +-----------------
  2015 +
  2016 +At the library API level, all methods that perform encryption and
  2017 +decryption interpret passwords as strings of bytes. It is up to the
  2018 +caller to ensure that they are appropriately encoded. Starting with qpdf
  2019 +version 8.4.0, qpdf will attempt to make this easier for you when
  2020 +you interact with qpdf via its command line interface. The PDF specification
  2021 +requires passwords used to encrypt files with 40-bit or 128-bit
  2022 +encryption to be encoded with PDF Doc encoding. This encoding is a
  2023 +single-byte encoding that supports ISO-Latin-1 and a handful of other
  2024 +commonly used characters. It has a large overlap with Windows ANSI but
  2025 +is not exactly the same. There is generally not a way to provide PDF Doc
  2026 +encoded strings on the command line. As such, qpdf versions prior to
  2027 +8.4.0 would often create PDF files that couldn't be opened with other
  2028 +software when given a password with non-ASCII characters to encrypt a
  2029 +file with 40-bit or 128-bit encryption. Starting with qpdf 8.4.0, qpdf
  2030 +recognizes the encoding of the parameter and transcodes it as needed.
  2031 +The rest of this section provides the details about exactly how qpdf
  2032 +behaves. Most users will not need to know this information, but it might
  2033 +be useful if you have been working around qpdf's old behavior or if you
  2034 +are using qpdf to generate encrypted files for testing other PDF
  2035 +software.
  2036 +
  2037 +A note about Windows: when qpdf builds, it attempts to determine what it
  2038 +has to do to use ``wmain`` instead of ``main`` on Windows. The ``wmain``
  2039 +function is an alternative entry point that receives all arguments as
  2040 +UTF-16-encoded strings. When qpdf starts up this way, it converts all
  2041 +the strings to UTF-8 encoding and then invokes the regular main. This
  2042 +means that, as far as qpdf is concerned, it receives its command-line
  2043 +arguments with UTF-8 encoding, just as it would in any modern Linux or
  2044 +UNIX environment.
  2045 +
  2046 +If a file is being encrypted with 40-bit or 128-bit encryption and the
  2047 +supplied password is not a valid UTF-8 string, qpdf will fall back to
  2048 +the behavior of interpreting the password as a string of bytes. If you
  2049 +have old scripts that encrypt files by passing the output of
  2050 +@1@command@1@iconv@2@command@2@ to qpdf, you no longer need to do that,
  2051 +but if you do, qpdf should still work. The only exception would be for
  2052 +the extremely unlikely case of a password that is encoded with a
  2053 +single-byte encoding but also happens to be valid UTF-8. Such a password
  2054 +would contain strings of even numbers of characters that alternate
  2055 +between accented letters and symbols. In the extremely unlikely event
  2056 +that you are intentionally using such passwords and qpdf is thwarting
  2057 +you by interpreting them as UTF-8, you can use
  2058 +@1@option@1@--password-mode=bytes@2@option@2@ to suppress qpdf's
  2059 +automatic behavior.
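
The fallback rule described above can be sketched in Python. This is
an illustration only, not qpdf's implementation: the function name and
the ``force_bytes`` parameter are hypothetical, and Latin-1 is used as
a stand-in for PDF Doc encoding, which Python's standard library does
not provide and which differs from Latin-1 for some characters.

```python
def encode_legacy_password(supplied: bytes, force_bytes: bool = False) -> bytes:
    """Encode a password for 40-bit or 128-bit encryption (sketch)."""
    if force_bytes:
        # Corresponds to asking for the password to be used as raw bytes.
        return supplied
    try:
        text = supplied.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to treating the password as bytes.
        return supplied
    # Stand-in for transcoding to PDF Doc encoding. Characters outside
    # Latin-1 raise UnicodeEncodeError in this sketch.
    return text.encode("latin-1")
```

The ambiguous case is easy to reproduce: the two bytes ``C3 A9`` are
valid UTF-8 for "é", but read as a single-byte encoding they are an
accented letter followed by a symbol ("Ã©"). With ``force_bytes=True``
the sketch leaves them untouched.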
  2060 +
  2061 +The @1@option@1@--password-mode@2@option@2@ option, as described earlier
  2062 +in this chapter, can be used to change qpdf's interpretation of supplied
  2063 +passwords. There are very few reasons to use this option. One would be
  2064 +the unlikely case described in the previous paragraph in which the
  2065 +supplied password happens to be valid UTF-8 but isn't supposed to be
  2066 +UTF-8. Your best bet would be just to provide the password as a valid
  2067 +UTF-8 string, but you could also use
  2068 +@1@option@1@--password-mode=bytes@2@option@2@. Another reason to use
  2069 +@1@option@1@--password-mode=bytes@2@option@2@ would be to intentionally
  2070 +generate PDF files encrypted with passwords that are not properly
  2071 +encoded. The qpdf test suite does this to generate invalid files for the
  2072 +purpose of testing its password recovery capability. If you were trying
  2073 +to create intentionally incorrect files for similar purposes, the
  2074 +@1@option@1@bytes@2@option@2@ password mode can enable you to do this.
  2075 +
  2076 +When qpdf attempts to decrypt a file with a password that contains
  2077 +non-ASCII characters, it will generate a list of alternative passwords
  2078 +by attempting to interpret the password as each of a handful of
  2079 +different coding systems and then transcode them to the required format.
  2080 +This helps to compensate for the supplied password being given in the
  2081 +wrong coding system, such as would happen if you used the
  2082 +@1@command@1@iconv@2@command@2@ workaround that was previously needed.
  2083 +It also generates passwords by doing the reverse operation: translating
  2084 +from correct to incorrect encoding of the password. This would enable
  2085 +qpdf to decrypt files using passwords that were improperly encoded by
  2086 +whatever software encrypted the files, including older versions of qpdf
  2087 +invoked without properly encoded passwords. The combination of these two
  2088 +recovery methods should make qpdf transparently open most encrypted
  2089 +files with the password supplied correctly but in the wrong coding
  2090 +system. There are no real downsides to this behavior, but if you don't
  2091 +want qpdf to do this, you can use the
  2092 +@1@option@1@--suppress-password-recovery@2@option@2@ option. One reason
  2093 +to do that is to ensure that you know the exact password that was used
  2094 +to encrypt the file.
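
The idea behind the two recovery directions can be sketched as
follows. This is not qpdf's algorithm: the list of coding systems is a
guess chosen for illustration, UTF-8 stands in for "the required
format", and the function name is hypothetical.

```python
from typing import List

# Assumption: an illustrative set of single-byte coding systems.
LEGACY_ENCODINGS = ("latin-1", "cp1252", "mac_roman")


def candidate_passwords(supplied: bytes) -> List[bytes]:
    """Return the supplied password followed by alternative encodings."""
    candidates = [supplied]

    def add(candidate: bytes) -> None:
        if candidate not in candidates:
            candidates.append(candidate)

    # Direction 1: the password was typed in a legacy coding system, so
    # reinterpret the bytes and transcode them to the expected format.
    for encoding in LEGACY_ENCODINGS:
        try:
            add(supplied.decode(encoding).encode("utf-8"))
        except UnicodeError:
            pass  # these bytes are not valid in this coding system

    # Direction 2: the password is correct, but the file was encrypted
    # with an improperly encoded copy, so reproduce that encoding.
    try:
        text = supplied.decode("utf-8")
    except UnicodeDecodeError:
        return candidates
    for encoding in LEGACY_ENCODINGS:
        try:
            add(text.encode(encoding))
        except UnicodeError:
            pass  # not representable in this coding system
    return candidates
```

A password made only of ASCII characters yields a single candidate,
which matches the statement that this process applies to passwords
containing non-ASCII characters.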
  2095 +
  2096 +With these changes, qpdf now generates compliant passwords in most
  2097 +cases. There are still some exceptions. In particular, the PDF
  2098 +specification directs compliant writers to normalize Unicode passwords
  2099 +and to perform certain transformations on passwords with bidirectional
  2100 +text. Implementing this functionality requires using a real Unicode
  2101 +library like ICU. If a client application that uses qpdf wants to do
  2102 +this, the qpdf library will accept the resulting passwords, but qpdf
  2103 +will not perform these transformations itself. It is possible that this
  2104 +will be addressed in a future version of qpdf. The ``QPDFWriter``
  2105 +methods that enable encryption on the output file accept passwords as
  2106 +strings of bytes.
  2107 +
  2108 +Please note that the @1@option@1@--password-is-hex-key@2@option@2@
  2109 +option is unrelated to all this. This flag bypasses the normal process
  2110 +of going from password to encryption key entirely, allowing the raw
  2111 +encryption key to be specified directly. This is useful for forensic
  2112 +purposes or for brute-force recovery of files with unknown passwords.
  2113 +
  2114 +.. _ref.qdf:
  2115 +
  2116 +QDF Mode
  2117 +========
  2118 +
  2119 +In QDF mode, qpdf creates PDF files in what we call @1@firstterm@1@QDF
  2120 +form@2@firstterm@2@. A PDF file in QDF form, sometimes called a QDF
  2121 +file, is a completely valid PDF file that has ``%QDF-1.0`` as its third
  2122 +line (after the PDF header and binary characters) and has certain other
  2123 +characteristics. The purpose of QDF form is to make it possible to edit
  2124 +PDF files, with some restrictions, in an ordinary text editor. This can
  2125 +be very useful for experimenting with different PDF constructs or for
  2126 +making one-off edits to PDF files (though there are other reasons why
  2127 +this may not always work). Note that QDF mode does not support
  2128 +linearized files. If you enable linearization, QDF mode is automatically
  2129 +disabled.
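
Because the marker sits on a fixed line, a script can recognize a QDF
file cheaply. The following is a minimal sketch based only on the
description above; the function name ``is_qdf`` is hypothetical, and
the check assumes UNIX line endings in the first three lines.

```python
def is_qdf(data: bytes) -> bool:
    """Return True if the third line of the file is the QDF marker."""
    # Split off at most the first three lines; the rest is not needed.
    lines = data.split(b"\n", 3)
    return (
        len(lines) >= 3
        and lines[0].startswith(b"%PDF-")  # the PDF header
        and lines[2] == b"%QDF-1.0"        # marker after the binary line
    )
```

Typical use would be ``is_qdf(open(path, "rb").read(1024))``, since
only the start of the file matters.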
  2130 +
  2131 +It is ordinarily very difficult to edit PDF files in a text editor for
  2132 +two reasons: most meaningful data in PDF files is compressed, and PDF
  2133 +files are full of offset and length information that makes it hard to
  2134 +add or remove data. A QDF file is organized in a manner such that, if
  2135 +edits are kept within certain constraints, the
  2136 +@1@command@1@fix-qdf@2@command@2@ program, distributed with qpdf, is
  2137 +able to restore edited files to a correct state. The
  2138 +@1@command@1@fix-qdf@2@command@2@ program takes no command-line
  2139 +arguments. It reads a possibly edited QDF file from standard input and
  2140 +writes a repaired file to standard output.
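
Since the program is a plain filter from standard input to standard
output, it is easy to drive from a script. The sketch below assumes
that ``fix-qdf`` is on the search path; the function name and the
``command`` parameter are hypothetical and exist only so that a
different filter can be substituted.

```python
import subprocess
from typing import Sequence


def run_fix_qdf(edited: bytes, command: Sequence[str] = ("fix-qdf",)) -> bytes:
    """Pipe an edited QDF file through fix-qdf and return the result."""
    result = subprocess.run(
        list(command),
        input=edited,            # the edited file goes to standard input
        stdout=subprocess.PIPE,  # the repaired file comes from standard output
        check=True,              # raise if the filter reports failure
    )
    return result.stdout
```

For example, the edited file can be read with ``open(path,
"rb").read()``, passed through ``run_fix_qdf``, and the returned bytes
written to a new file.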
  2141 +
  2142 +The following attributes characterize a QDF file:
  2143 +
  2144 +- All objects appear in numerical order in the PDF file, including when
  2145 + objects appear in object streams.
  2146 +
  2147 +- Objects are printed in an easy-to-read format, and all line endings
  2148 + are normalized to UNIX line endings.
  2149 +
  2150 +- Unless specifically overridden, streams appear uncompressed (when
  2151 + qpdf supports the filters and they are compressed with a non-lossy
  2152 + compression scheme), and most content streams are normalized (line
  2153 + endings are converted to just UNIX-style linefeeds).
  2154 +
  2155 +- All stream lengths are represented as indirect objects, and the
  2156 + stream length object is always the next object after the stream. If
  2157 + the stream data does not end with a newline, an extra newline is
  2158 + inserted, and a special comment appears after the stream indicating
  2159 + that this has been done.
  2160 +
  2161 +- If the PDF file contains object streams, if object stream *n*
  2162 + contains *k* objects, those objects are numbered from *n+1* through
  2163 + *n+k*, and the object number/offset pairs appear on a separate line
  2164 + for each object. Additionally, each object in the object stream is
  2165 + preceded by a comment indicating its object number and index. This
  2166 + makes it very easy to find objects in object streams.
  2167 +
  2168 +- All beginnings of objects, ``stream`` tokens, ``endstream`` tokens,
  2169 + and ``endobj`` tokens appear on lines by themselves. A blank line
  2170 + follows every ``endobj`` token.
  2171 +
  2172 +- If there is a cross-reference stream, it is unfiltered.
  2173 +
  2174 +- Page dictionaries and page content streams are marked with special
  2175 + comments that make them easy to find.
  2176 +
  2177 +- Comments precede each object indicating the object number of the
  2178 + corresponding object in the original file.
  2179 +
  2180 +When editing a QDF file, any edits can be made as long as the above
  2181 +constraints are maintained. This means that you can freely edit a page's
  2182 +content without worrying about messing up the QDF file. It is also
  2183 +possible to add new objects so long as those objects are added after the
  2184 +last object in the file or subsequent objects are renumbered. If a QDF
  2185 +file has object streams in it, you can always add the new objects before
  2186 +the xref stream and then change the number of the xref stream, since
  2187 +nothing generally ever references it by number.
  2188 +
  2189 +It is not generally practical to remove objects from QDF files without
  2190 +messing up object numbering, but if you remove all references to an
  2191 +object, you can run qpdf on the file (after running
  2192 +@1@command@1@fix-qdf@2@command@2@), and qpdf will omit the now-orphaned
  2193 +object.
  2194 +
  2195 +When @1@command@1@fix-qdf@2@command@2@ is run, it goes through the file
  2196 +and recomputes the following parts of the file:
  2197 +
  2198 +- the ``/N``, ``/W``, and ``/First`` keys of all object stream
  2199 + dictionaries
  2200 +
  2201 +- the pairs of numbers representing object numbers and offsets of
  2202 + objects in object streams
  2203 +
  2204 +- all stream lengths
  2205 +
  2206 +- the cross-reference table or cross-reference stream
  2207 +
  2208 +- the offset to the cross-reference table or cross-reference stream
  2209 + following the ``startxref`` token
  2210 +
  2211 +.. _ref.using-library:
  2212 +
  2213 +Using the QPDF Library
  2214 +======================
  2215 +
  2216 +.. _ref.using.from-cxx:
  2217 +
  2218 +Using QPDF from C++
  2219 +-------------------
  2220 +
  2221 +The source tree for the qpdf package has an
  2222 +@1@filename@1@examples@2@filename@2@ directory that contains a few
  2223 +example programs. The @1@filename@1@qpdf/qpdf.cc@2@filename@2@ source
  2224 +file also serves as a useful example since it exercises almost all of
  2225 +the qpdf library's public interface. The best source of documentation on
  2226 +the library itself is reading comments in
  2227 +@1@filename@1@include/qpdf/QPDF.hh@2@filename@2@,
  2228 +@1@filename@1@include/qpdf/QPDFWriter.hh@2@filename@2@, and
  2229 +@1@filename@1@include/qpdf/QPDFObjectHandle.hh@2@filename@2@.
  2230 +
  2231 +All header files are installed in the
  2232 +@1@filename@1@include/qpdf@2@filename@2@ directory. It is
  2233 +recommended that you use ``#include <qpdf/QPDF.hh>``
  2234 +rather than adding
  2235 +@1@filename@1@include/qpdf@2@filename@2@ to your include path.
  2236 +
  2237 +When linking against the qpdf static library, you may also need to
  2238 +specify ``-lz -ljpeg`` on your link command. If your system understands
  2239 +how to read libtool @1@filename@1@.la@2@filename@2@ files, this may not
  2240 +be necessary.
  2241 +
  2242 +The qpdf library is safe to use in a multithreaded program, but no
  2243 +individual ``QPDF`` object instance (including ``QPDF``,
  2244 +``QPDFObjectHandle``, or ``QPDFWriter``) can be used in more than one
  2245 +thread at a time. Multiple threads may simultaneously work with
  2246 +different instances of these and all other QPDF objects.
  2247 +
  2248 +.. _ref.using.other-languages:
  2249 +
  2250 +Using QPDF from other languages
  2251 +-------------------------------
  2252 +
  2253 +The qpdf library is implemented in C++, which makes it hard to use
  2254 +directly in other languages. There are a few things that can help.
  2255 +
  2256 +"C"
  2257 + The qpdf library includes a "C" language interface that provides a
  2258 + subset of the overall capabilities. The header file
  2259 + @1@filename@1@qpdf/qpdf-c.h@2@filename@2@ includes information about
  2260 + its use. As long as you use a C++ linker, you can link C programs
  2261 + with qpdf and use the C API. For languages that can directly load
  2262 + methods from a shared library, the C API can also be useful. People
  2263 + have reported success using the C API from other languages on Windows
  2264 + by directly calling functions in the DLL.
  2265 +
  2266 +Python
  2267 + A Python module called
  2268 + `pikepdf <https://pypi.org/project/pikepdf/>`__ provides a clean and
  2269 + highly functional set of Python bindings to the qpdf library. Using
  2270 + pikepdf, you can work with PDF files in a natural way and combine
  2271 + qpdf's capabilities with other functionality provided by Python's
  2272 + rich standard library and available modules.
  2273 +
  2274 +Other Languages
  2275 + Starting with version 8.3.0, the @1@command@1@qpdf@2@command@2@
  2276 + command-line tool can produce a JSON representation of the PDF file's
  2277 + non-content data. This can facilitate interacting programmatically
  2278 + with PDF files through qpdf's command line interface. For more
  2279 + information, please see `QPDF JSON <#ref.json>`__.
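
As an illustration of the command-line route, the following Python
sketch builds a ``qpdf --json`` invocation from the ``--json-key`` and
``--json-object`` options described earlier and parses the result. It
assumes a ``qpdf`` executable is on the search path; the function
names are hypothetical.

```python
import json
import subprocess
from typing import Any, List, Sequence


def build_json_command(path: str,
                       keys: Sequence[str] = (),
                       objects: Sequence[str] = (),
                       qpdf: str = "qpdf") -> List[str]:
    """Build the argument list for a qpdf --json invocation."""
    command = [qpdf, "--json"]
    # Both options are repeatable, so emit one per requested value.
    command += [f"--json-key={key}" for key in keys]
    command += [f"--json-object={obj}" for obj in objects]
    command.append(path)  # no output file is given with inspection options
    return command


def load_pdf_json(path: str, **options: Any) -> Any:
    """Run qpdf --json and return the parsed JSON document."""
    result = subprocess.run(
        build_json_command(path, **options),
        capture_output=True, text=True,
        check=True,  # note: this also rejects exit status 3 (warnings)
    )
    return json.loads(result.stdout)
```

For example, ``load_pdf_json("in.pdf", keys=["objects"],
objects=["trailer"])`` would request only the trailer within the
``objects`` key.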
  2280 +
  2281 +.. _ref.unicode-files:
  2282 +
  2283 +A Note About Unicode File Names
  2284 +-------------------------------
  2285 +
  2286 +When strings are passed to qpdf library routines either as ``char*`` or
  2287 +as ``std::string``, they are treated as byte arrays except where
  2288 +otherwise noted. When Unicode is desired, qpdf wants UTF-8 unless
  2289 +otherwise noted in comments in header files. In modern UNIX/Linux
  2290 +environments, this generally does the right thing. In Windows, it's a
  2291 +bit more complicated. Starting in qpdf 8.4.0, passwords that contain
  2292 +Unicode characters are handled much better, and starting in qpdf 8.4.1,
  2293 +the library attempts to properly handle Unicode characters in filenames.
  2294 +In particular, in Windows, if a UTF-8 encoded string is used as a
  2295 +filename in either ``QPDF`` or ``QPDFWriter``, it is internally
  2296 +converted to ``wchar_t*``, and Unicode-aware Windows APIs are used. As
  2297 +such, qpdf will generally operate properly on files with non-ASCII
  2298 +characters in their names as long as the filenames are UTF-8 encoded for
  2299 +passing into the qpdf library API, but there are still some rough edges,
  2300 +such as the encoding of the filenames in error messages or CLI output
  2301 +messages. Patches or bug reports are welcome for any continuing issues
  2302 +with Unicode file names in Windows.
  2303 +
  2304 +.. _ref.weak-crypto:
  2305 +
  2306 +Weak Cryptography
  2307 +=================
  2308 +
  2309 +Starting with version 10.4, qpdf is taking steps to reduce the likelihood
  2310 +of a user *accidentally* creating PDF files with insecure cryptography
  2311 +but will continue to allow creation of such files indefinitely with
  2312 +explicit acknowledgment.
  2313 +
  2314 +The PDF file format makes use of RC4, which is known to be a weak
  2315 +cryptography algorithm, and MD5, which is a weak hashing algorithm. In
  2316 +version 10.4, qpdf generates warnings for some (but not all) cases of
  2317 +writing files with weak cryptography when invoked from the command-line.
  2318 +These warnings can be suppressed using the
  2319 +@1@option@1@--allow-weak-crypto@2@option@2@ option.
  2320 +
  2321 +It is planned for qpdf version 11 to be stricter, making it an error to
  2322 +write files with insecure cryptography from the command-line tool in
  2323 +most cases without specifying the
  2324 +@1@option@1@--allow-weak-crypto@2@option@2@ flag and also to require
  2325 +explicit steps when using the C++ library to enable use of insecure
  2326 +cryptography.
  2327 +
  2328 +Note that qpdf must always retain support for weak cryptographic
  2329 +algorithms since this is required for reading older PDF files that use
  2330 +them. Additionally, qpdf will always retain the ability to create files
  2331 +using weak cryptographic algorithms since, as a development tool, qpdf
  2332 +explicitly supports creating older or deprecated types of PDF files
  2333 +since these are sometimes needed to test or work with older versions of
  2334 +software. Even if other cryptography libraries drop support for RC4 or
  2335 +MD5, qpdf can always fall back to its internal implementations of those
  2336 +algorithms, so they are not going to disappear from qpdf.
  2337 +
  2338 +.. _ref.json:
  2339 +
  2340 +QPDF JSON
  2341 +=========
  2342 +
  2343 +.. _ref.json-overview:
  2344 +
  2345 +Overview
  2346 +--------
  2347 +
  2348 +Beginning with qpdf version 8.3.0, the @1@command@1@qpdf@2@command@2@
  2349 +command-line program can produce a JSON representation of the
  2350 +non-content data in a PDF file. It includes a dump in JSON format of all
  2351 +objects in the PDF file excluding the content of streams. This JSON
  2352 +representation makes it very easy to look in detail at the structure of
  2353 +a given PDF file, and it also provides a great way to work with PDF
  2354 +files programmatically from the command-line in languages that can't
  2355 +call or link with the qpdf library directly. Note that stream data can
  2356 +be extracted from PDF files using other qpdf command-line options.
  2357 +
  2358 +.. _ref.json-guarantees:
  2359 +
  2360 +JSON Guarantees
  2361 +---------------
  2362 +
  2363 +The qpdf JSON representation includes a JSON serialization of the raw
  2364 +objects in the PDF file as well as some computed information in a more
  2365 +easily extracted format. QPDF provides some guarantees about its JSON
  2366 +format. These guarantees are designed to simplify the experience of a
  2367 +developer working with the JSON format.
  2368 +
  2369 +Compatibility
  2370 + The top-level JSON object output is a dictionary. The JSON output
  2371 + contains various nested dictionaries and arrays. With the exception
  2372 + of dictionaries that are populated by the fields of objects from the
  2373 + file, all instances of a dictionary are guaranteed to have exactly
  2374 + the same keys. Future versions of qpdf are free to add additional
  2375 + keys but not to remove keys or change the type of object that a key
  2376 + points to. The qpdf program validates this guarantee, and in the
  2377 + unlikely event that a bug in qpdf should cause it to generate data
  2378 + that doesn't conform to this rule, it will ask you to file a bug
  2379 + report.
  2380 +
  2381 + The top-level JSON structure contains a "``version``" key whose value
  2382 + is a simple integer. The value of the ``version`` key will be
  2383 + incremented if a non-compatible change is made. A non-compatible
  2384 + change would be any change that involves removal of a key, a change
  2385 + to the format of data pointed to by a key, or a semantic change that
  2386 + requires a different interpretation of a previously existing key. A
  2387 + strong effort will be made to avoid breaking compatibility.
  2388 +
  2389 +Documentation
  2390 + The @1@command@1@qpdf@2@command@2@ command can be invoked with the
  2391 + @1@option@1@--json-help@2@option@2@ option. This will output a JSON
  2392 + structure that has the same structure as the JSON output that qpdf
  2393 + generates, except that each field in the help output is a description
  2394 + of the corresponding field in the JSON output. The specific
  2395 + guarantees are as follows:
  2396 +
  2397 + - A dictionary in the help output means that the corresponding
  2398 + location in the actual JSON output is also a dictionary with
  2399 + exactly the same keys; that is, no keys present in help are absent
  2400 + in the real output, and no keys will be present in the real output
  2401 + that are not in help. As a special case, if the dictionary has a
  2402 + single key whose name starts with ``<`` and ends with ``>``, it
  2403 + means that the JSON output is a dictionary that can have any keys,
  2404 + each of which conforms to the value of the special key. This is
  2405 + used for cases in which the keys of the dictionary are things like
  2406 + object IDs.
  2407 +
  2408 + - A string in the help output is a description of the item that
  2409 + appears in the corresponding location of the actual output. The
  2410 + corresponding output can have any format.
  2411 +
  2412 + - An array in the help output always contains a single element. It
  2413 + indicates that the corresponding location in the actual output is
  2414 + also an array, and that each element of the array has whatever
  2415 + format is implied by the single element of the help output's
  2416 + array.
  2417 +
  2418 + For example, the help output includes a "``pagelabels``"
  2419 + key whose value is an array of one element. That element is a
  2420 + dictionary with keys "``index``" and "``label``". In addition to
  2421 + describing the meaning of those keys, this tells you that the actual
  2422 + JSON output will contain a ``pagelabels`` array, each of whose
  2423 + elements is a dictionary that contains an ``index`` key, a ``label``
  2424 + key, and no other keys.
  2425 +
  2426 +Directness and Simplicity
  2427 + The JSON output contains the value of every object in the file, but
  2428 + it also contains some processed data. This is analogous to how qpdf's
  2429 + library interface works. The processed data is similar to the helper
  2430 + functions in that it allows you to look at certain aspects of the PDF
  2431 + file without having to understand all the nuances of the PDF
  2432 + specification, while the raw objects allow you to mine the PDF for
  2433 + anything that the higher-level interfaces are lacking.
  2434 +
  2435 +.. _json.limitations:
  2436 +
  2437 +Limitations of JSON Representation
  2438 +----------------------------------
  2439 +
  2440 +There are a few limitations to be aware of with the JSON structure:
  2441 +
  2442 +- Strings, names, and indirect object references in the original PDF
  2443 + file are all converted to strings in the JSON representation. In the
  2444 + case of a "normal" PDF file, you can tell the difference because a
  2445 + name starts with a slash (``/``), and an indirect object reference
  2446 + looks like ``n n R``, but if there were to be a string that looked
  2447 + like a name or indirect object reference, there would be no way to
  2448 + tell this from the JSON output. Note that there are certain cases
  2449 + where you know for sure what something is, such as knowing that
  2450 + dictionary keys in objects are always names and that certain things
  2451 + in the higher-level computed data are known to contain indirect
  2452 + object references.
  2453 +
  2454 +- The JSON format doesn't support binary data very well. Mostly the
  2455 + details are not important, but they are presented here for
  2456 + information. When qpdf outputs a string in the JSON representation,
  2457 + it converts the string to UTF-8, assuming usual PDF string semantics.
  2458 + Specifically, if the original string is UTF-16, it is converted to
  2459 + UTF-8. Otherwise, it is assumed to have PDF doc encoding, and is
  2460 + converted to UTF-8 with that assumption. This causes strange things
  2461 + to happen to binary strings. For example, if you had the binary
  2462 + string ``<038051>``, this would be output to the JSON as ``\u0003•Q``
  2463 + because ``03`` is not a printable character and ``80`` is the bullet
  2464 + character in PDF doc encoding and is mapped to the Unicode value
  2465 + ``2022``. Since ``51`` is ``Q``, it is output as is. If you wanted to
  2466 + convert back from here to a binary string, you would have to recognize
  2467 + Unicode values whose code points are higher than ``0xFF`` and map
  2468 + those back to their corresponding PDF doc encoding characters. There
  2469 + is no way to tell the difference between a Unicode string that was
  2470 + originally encoded as UTF-16 or one that was converted from PDF doc
  2471 + encoding. In other words, it's best if you don't try to use the JSON
  2472 + format to extract binary strings from the PDF file, but if you really
  2473 + had to, it could be done. Note that qpdf's
  2474 + @1@option@1@--show-object@2@option@2@ option does not have this
  2475 + limitation and will reveal the string as encoded in the original
  2476 + file.
  2477 +
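The ``<038051>`` example above can be reproduced with a small sketch. This is not qpdf's code: ``to_json_text`` is a hypothetical helper that covers only the byte classes the example uses (control characters, plain ASCII, and the single PDF doc encoding byte ``80``) and rejects every other high byte rather than guessing at its mapping.

```cpp
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of qpdf: converts a PDF doc encoded byte
// string to the text that would appear inside a JSON string literal.
std::string to_json_text(std::string const& pdf_bytes)
{
    std::string out;
    for (unsigned char c : pdf_bytes) {
        if (c == '"' || c == '\\') {
            // Characters that must be backslash-escaped in JSON.
            out += '\\';
            out += static_cast<char>(c);
        } else if (c < 0x20) {
            // Non-printable control characters become \uXXXX escapes.
            char buf[8];
            std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned int>(c));
            out += buf;
        } else if (c < 0x80) {
            // ASCII is output as is.
            out += static_cast<char>(c);
        } else if (c == 0x80) {
            // 0x80 is the bullet in PDF doc encoding: U+2022, UTF-8 E2 80 A2.
            out += "\xe2\x80\xa2";
        } else {
            // Assumption: the rest of the table is out of scope for this sketch.
            throw std::invalid_argument("byte not covered by this sketch");
        }
    }
    return out;
}
```

Going the other way would need the reverse lookup the text describes, and it still could not tell whether the bullet started out as UTF-16 or as byte ``80``.
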
  2478 +.. _json.considerations:
  2479 +
  2480 +JSON: Special Considerations
  2481 +----------------------------
  2482 +
  2483 +For the most part, the built-in JSON help tells you everything you need
  2484 +to know about the JSON format, but there are a few non-obvious things to
  2485 +be aware of:
  2486 +
  2487 +- While qpdf guarantees that keys present in the help will be present
  2488 + in the output, those fields may be null or empty if the information
  2489 + is not known or absent in the file. Also, if you specify
  2490 + @1@option@1@--json-keys@2@option@2@, the keys that are not listed
  2491 + will be excluded entirely except for those that
  2492 + @1@option@1@--json-help@2@option@2@ says are always present.
  2493 +
  2494 +- In a few places, there are keys with names containing
  2495 + ``pageposfrom1``. The values of these keys are null or an integer. If
  2496 + an integer, they point to a page index within the file numbering from
  2497 + 1. Note that JSON indexes from 0, and you would also use 0-based
  2498 + indexing using the API. However, 1-based indexing is easier in this
  2499 + case because the command-line syntax for specifying page ranges is
  2500 + 1-based. If you were going to write a program that looked through the
  2501 + JSON for information about specific pages and then use the
  2502 + command-line to extract those pages, 1-based indexing is easier.
  2503 + Besides, it's more convenient to subtract 1 in a program written in a real
  2504 + programming language than it is to add 1 in shell code.
  2505 +
  2506 +- The image information included in the ``page`` section of the JSON
  2507 + output includes the key "``filterable``". Note that the value of this
  2508 + field may depend on the @1@option@1@--decode-level@2@option@2@ that
  2509 + you invoke qpdf with. The JSON output includes a top-level key
  2510 + "``parameters``" that indicates the decode level used for computing
  2511 + whether a stream was filterable. For example, jpeg images will be
  2512 + shown as not filterable by default, but they will be shown as
  2513 + filterable if you run @1@command@1@qpdf --json
  2514 + --decode-level=all@2@command@2@.
  2515 +
  2516 +.. _ref.design:
  2517 +
  2518 +Design and Library Notes
  2519 +========================
  2520 +
  2521 +.. _ref.design.intro:
  2522 +
  2523 +Introduction
  2524 +------------
  2525 +
  2526 +This section was written prior to the implementation of the qpdf package
  2527 +and was subsequently modified to reflect the implementation. In some
  2528 +cases, for purposes of explanation, it may differ slightly from the
  2529 +actual implementation. As always, the source code and test suite are
  2530 +authoritative. Even if there are some errors, this document should serve
  2531 +as a road map to understanding how this code works.
  2532 +
  2533 +In general, one should adhere strictly to a specification when writing
  2534 +but be liberal in reading. This way, the product of our software will be
  2535 +accepted by the widest range of other programs, and we will accept the
  2536 +widest range of input files. This library attempts to conform to that
  2537 +philosophy whenever possible but also aims to provide strict checking
  2538 +for people who want to validate PDF files. If you don't want to see
  2539 +warnings and are trying to write something that is tolerant, you can
  2540 +call ``setSuppressWarnings(true)``. If you want to fail on the first
  2541 +error, you can call ``setAttemptRecovery(false)``. The default behavior
  2542 +is to generate warnings for recoverable problems. Note that recovery
  2543 +will not always produce the desired results even if it is able to get
  2544 +through the file. Unlike most other PDF software, which produces generic
  2545 +warnings such as "This file is damaged", qpdf generally issues a
  2546 +detailed error message that would be most useful to a PDF developer.
  2547 +This is by design as there seems to be a shortage of PDF validation
  2548 +tools out there. This was, in fact, one of the major motivations behind
  2549 +the initial creation of qpdf.
  2550 +
  2551 +.. _ref.design-goals:
  2552 +
  2553 +Design Goals
  2554 +------------
  2555 +
  2556 +The QPDF package includes support for reading and rewriting PDF files.
  2557 +It aims to hide from the user details involving object locations,
  2558 +modified (appended) PDF files, the directness/indirectness of objects,
  2559 +and stream filters including encryption. It does not aim to hide
  2560 +knowledge of the object hierarchy or content stream contents. Put
  2561 +another way, a user of the qpdf library is expected to have knowledge
  2562 +about how PDF files work, but is not expected to have to keep track of
  2563 +bookkeeping details such as file positions.
  2564 +
  2565 +A user of the library never has to care whether an object is direct or
  2566 +indirect, though it is possible to determine whether an object is direct
  2567 +or not if this information is needed. All access to objects deals with
  2568 +this transparently. All memory management details are also handled by
  2569 +the library.
  2570 +
  2571 +The ``PointerHolder`` object is used internally by the library to deal
  2572 +with memory management. This is basically a smart pointer object very
  2573 +similar in spirit to C++11's ``std::shared_ptr`` object, but predating
  2574 +it by several years. This library also makes use of a technique for
  2575 +giving fine-grained access to methods in one class to other classes by
  2576 +using public subclasses with friends and only private members that in
  2577 +turn call private methods of the containing class. See
  2578 +``QPDFObjectHandle::Factory`` as an example.
  2579 +
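The fine-grained access technique described above can be sketched in isolation. The classes here (``Vault``, ``Vault::Access``, ``Auditor``) are hypothetical stand-ins, not qpdf classes; the sketch only illustrates the pattern of a public nested class whose members are all private and which names its friends explicitly.

```cpp
#include <string>

class Auditor; // hypothetical: the one class that is granted access

class Vault
{
  public:
    // Public nested class with only private members. Nothing except its
    // friends can call them, and each one forwards to a private method of
    // the containing class.
    class Access
    {
        friend class Auditor;

      private:
        static std::string peek(Vault const& v)
        {
            // A nested class may use private members of its enclosing class.
            return v.contents();
        }
    };

  private:
    std::string contents() const
    {
        return "restricted";
    }
};

class Auditor
{
  public:
    static std::string inspect(Vault const& v)
    {
        // Allowed only because Auditor is a friend of Vault::Access.
        return Vault::Access::peek(v);
    }
};
```

With this arrangement ``Auditor`` can reach ``contents()`` only through ``Access::peek``, and any other private member of ``Vault`` stays closed to it, which a plain ``friend class Auditor;`` on ``Vault`` itself would not achieve.
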
  2580 +The top-level qpdf class is ``QPDF``. A ``QPDF`` object represents a PDF
  2581 +file. The library provides methods for both accessing and mutating PDF
  2582 +files.
  2583 +
  2584 +The primary class for interacting with PDF objects is
  2585 +``QPDFObjectHandle``. Instances of this class can be passed around by
  2586 +value, copied, stored in containers, etc. with very low overhead.
  2587 +Instances of ``QPDFObjectHandle`` created by reading from a file will
  2588 +always contain a reference back to the ``QPDF`` object from which they
  2589 +were created. A ``QPDFObjectHandle`` may be direct or indirect. If
  2590 +indirect, the ``QPDFObject`` the ``PointerHolder`` initially points to
  2591 +is a null pointer. In this case, the first attempt to access the
  2592 +underlying ``QPDFObject`` will result in the ``QPDFObject`` being
  2593 +resolved via a call to the referenced ``QPDF`` instance. This makes it
  2594 +essentially impossible to make coding errors in which certain things
  2595 +will work for some PDF files and not for others based on which objects
  2596 +are direct and which objects are indirect.
  2597 +
  2598 +Instances of ``QPDFObjectHandle`` can be directly created and modified
  2599 +using static factory methods in the ``QPDFObjectHandle`` class. There
  2600 +are factory methods for each type of object as well as a convenience
  2601 +method ``QPDFObjectHandle::parse`` that creates an object from a string
  2602 +representation of the object. Existing instances of ``QPDFObjectHandle``
  2603 +can also be modified in several ways. See comments in
  2604 +@1@filename@1@QPDFObjectHandle.hh@2@filename@2@ for details.
  2605 +
  2606 +An instance of ``QPDF`` is constructed by using the class's default
  2607 +constructor. If desired, the ``QPDF`` object may be configured with
  2608 +various methods that change its default behavior. Then the
  2609 +``QPDF::processFile()`` method is passed the name of a PDF file, which
  2610 +permanently associates the file with that QPDF object. A password may
  2611 +also be given for access to password-protected files. QPDF does not
  2612 +enforce encryption parameters and will treat user and owner passwords
  2613 +equivalently. Either password may be used to access an encrypted file.
  2614 +``QPDF`` will allow recovery of a user password given an owner password.
  2615 +The input PDF file must be seekable. (Output files written by
  2616 +``QPDFWriter`` need not be seekable, even when creating linearized
  2617 +files.) During construction, ``QPDF`` validates the PDF file's header,
  2618 +and then reads the cross reference tables and trailer dictionaries. The
  2619 +``QPDF`` class keeps only the first trailer dictionary though it does
  2620 +read all of them so it can check the ``/Prev`` key. ``QPDF`` class users
  2621 +may request the root object and the trailer dictionary specifically. The
  2622 +cross reference table is kept private. Objects may then be requested by
  2623 +number or by walking the object tree.
  2624 +
  2625 +When a PDF file has a cross-reference stream instead of a
  2626 +cross-reference table and trailer, requesting the document's trailer
  2627 +dictionary returns the stream dictionary from the cross-reference stream
  2628 +instead.
  2629 +
  2630 +There are some convenience routines for very common operations such as
  2631 +walking the page tree and returning a vector of all page objects. For
  2632 +full details, please see the header files
  2633 +@1@filename@1@QPDF.hh@2@filename@2@ and
  2634 +@1@filename@1@QPDFObjectHandle.hh@2@filename@2@. There are also some
  2635 +additional helper classes that provide higher level API functions for
  2636 +certain document constructions. These are discussed in `Helper
  2637 +Classes <#ref.helper-classes>`__.
  2638 +
  2639 +.. _ref.helper-classes:
  2640 +
  2641 +Helper Classes
  2642 +--------------
  2643 +
  2644 +QPDF version 8.1 introduced the concept of helper classes. Helper
  2645 +classes are intended to contain higher level APIs that allow developers
  2646 +to work with certain document constructs at an abstraction level above
  2647 +that of ``QPDFObjectHandle`` while staying true to qpdf's philosophy of
  2648 +not hiding document structure from the developer. As with qpdf in
  2649 +general, the goal is to take away some of the more tedious bookkeeping
  2650 +aspects of working with PDF files, not to remove the need for the
  2651 +developer to understand how the PDF construction in question works. The
  2652 +driving factor behind the creation of helper classes was to allow the
  2653 +evolution of higher level interfaces in qpdf without polluting the
  2654 +interfaces of the main top-level classes ``QPDF`` and
  2655 +``QPDFObjectHandle``.
  2656 +
  2657 +There are two kinds of helper classes: *document* helpers and *object*
  2658 +helpers. Document helpers are constructed with a reference to a ``QPDF``
  2659 +object and provide methods for working with structures that are at the
  2660 +document level. Object helpers are constructed with an instance of a
  2661 +``QPDFObjectHandle`` and provide methods for working with specific types
  2662 +of objects.
  2663 +
  2664 +Examples of document helpers include ``QPDFPageDocumentHelper``, which
  2665 +contains methods for operating on the document's page trees, such as
  2666 +enumerating all pages of a document and adding and removing pages; and
  2667 +``QPDFAcroFormDocumentHelper``, which contains document-level methods
  2668 +related to interactive forms, such as enumerating form fields and
  2669 +creating mappings between form fields and annotations.
  2670 +
  2671 +Examples of object helpers include ``QPDFPageObjectHelper`` for
  2672 +performing operations on pages such as page rotation and some operations
  2673 +on content streams, ``QPDFFormFieldObjectHelper`` for performing
  2674 +operations related to interactive form fields, and
  2675 +``QPDFAnnotationObjectHelper`` for working with annotations.
  2676 +
  2677 +It is always possible to retrieve the underlying ``QPDF`` reference from
  2678 +a document helper and the underlying ``QPDFObjectHandle`` reference from
  2679 +an object helper. Helpers are designed to be helpers, not wrappers. The
  2680 +intention is that, in general, it is safe to freely intermix operations
  2681 +that use helpers with operations that use the underlying objects.
  2682 +Document and object helpers do not attempt to provide a complete
  2683 +interface for working with the things they are helping with, nor do they
  2684 +attempt to encapsulate underlying structures. They just provide a few
  2685 +methods to help with error-prone, repetitive, or complex tasks. In some
  2686 +cases, a helper object may cache some information that is expensive to
  2687 +gather. In such cases, the helper classes are implemented so that their
  2688 +own methods keep the cache consistent, and the header file will provide
  2689 +a method to invalidate the cache and a description of what kinds of
  2690 +operations would make the cache invalid. If in doubt, you can always
  2691 +discard a helper class and create a new one with the same underlying
  2692 +objects, which will ensure that you have discarded any stale
  2693 +information.
  2694 +
  2695 +By convention, document helpers are called
  2696 +``QPDFSomethingDocumentHelper`` and are derived from
  2697 +``QPDFDocumentHelper``, and object helpers are called
  2698 +``QPDFSomethingObjectHelper`` and are derived from ``QPDFObjectHelper``.
  2699 +For details on specific helpers, please see their header files. You can
  2700 +find them by looking at
  2701 +@1@filename@1@include/qpdf/QPDF*DocumentHelper.hh@2@filename@2@ and
  2702 +@1@filename@1@include/qpdf/QPDF*ObjectHelper.hh@2@filename@2@.
  2703 +
  2704 +In order to avoid creation of circular dependencies, the following
  2705 +general guidelines are followed with helper classes:
  2706 +
  2707 +- Core class interfaces do not know about helper classes. For example,
  2708 + no methods of ``QPDF`` or ``QPDFObjectHandle`` will include helper
  2709 + classes in their interfaces.
  2710 +
  2711 +- Interfaces of object helpers will usually not use document helpers in
  2712 + their interfaces. This is because it is much more useful for document
  2713 + helpers to have methods that return object helpers. Most operations
  2714 + in PDF files start at the document level and go from there to the
  2715 + object level rather than the other way around. It can sometimes be
  2716 + useful to map back from object-level structures to document-level
  2717 + structures. If there is a desire to do this, it will generally be
  2718 + provided by a method in the document helper class.
  2719 +
  2720 +- Most of the time, object helpers don't know about other object
  2721 + helpers. However, in some cases, one type of object may be a
  2722 + container for another type of object, in which case it may make sense
  2723 + for the outer object to know about the inner object. For example,
  2724 + there are methods in the ``QPDFPageObjectHelper`` that know about
  2725 + ``QPDFAnnotationObjectHelper`` because references to annotations are
  2726 + contained in page dictionaries.
  2727 +
  2728 +- Any helper or core library class may use helpers in their
  2729 + implementations.
  2730 +
  2731 +Prior to qpdf version 8.1, higher level interfaces were added as
  2732 +"convenience functions" in either ``QPDF`` or ``QPDFObjectHandle``. For
  2733 +compatibility, older convenience functions for operating with pages will
  2734 +remain in those classes even as alternatives are provided in helper
  2735 +classes. Going forward, new higher level interfaces will be provided
  2736 +using helper classes.
  2737 +
  2738 +.. _ref.implementation-notes:
  2739 +
  2740 +Implementation Notes
  2741 +--------------------
  2742 +
  2743 +This section contains a few notes about QPDF's internal implementation,
  2744 +particularly around what it does when it first processes a file. This
  2745 +section is a bit of a simplification of what it actually does, but it
  2746 +could serve as a starting point to someone trying to understand the
  2747 +implementation. There is nothing in this section that you need to know
  2748 +to use the qpdf library.
  2749 +
  2750 +``QPDFObject`` is the basic PDF Object class. It is an abstract base
  2751 +class from which are derived classes for each type of PDF object.
  2752 +Clients do not interact with Objects directly but instead interact with
  2753 +``QPDFObjectHandle``.
  2754 +
  2755 +When the ``QPDF`` class creates a new object, it dynamically allocates
  2756 +the appropriate type of ``QPDFObject`` and immediately hands the pointer
  2757 +to an instance of ``QPDFObjectHandle``. The parser reads a token from
  2758 +the current file position. If the token is not either a dictionary or
  2759 +array opener, an object is immediately constructed from the single token
  2760 +and the parser returns. Otherwise, the parser iterates in a special mode
  2761 +in which it accumulates objects until it finds a balancing closer.
  2762 +During this process, the "``R``" keyword is recognized and an indirect
  2763 +``QPDFObjectHandle`` may be constructed.
  2764 +
  2765 +The ``QPDF::resolve()`` method, which is used to resolve an indirect
  2766 +object, may be invoked from the ``QPDFObjectHandle`` class. It first
  2767 +checks a cache to see whether this object has already been read. If not,
  2768 +it reads the object from the PDF file and caches it. It then returns the
  2769 +resulting ``QPDFObjectHandle``. The calling object handle then replaces
  2770 +its ``PointerHolder<QPDFObject>`` with the one from the newly returned
  2771 +``QPDFObjectHandle``. In this way, only a single copy of any direct
  2772 +object need exist and clients can access objects transparently without
  2773 +knowing or caring whether they are direct or indirect objects.
  2774 +Additionally, no object is ever read from the file more than once. That
  2775 +means that only the portions of the PDF file that are actually needed
  2776 +are ever read from the input file, thus allowing the qpdf package to
  2777 +take advantage of this important design goal of PDF files.
  2778 +
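The resolve-and-cache behavior described above can be illustrated with a self-contained sketch. None of these names come from qpdf: ``Document``, ``Handle``, ``Object``, and the ``Loader`` callback are hypothetical, ``std::shared_ptr`` stands in for ``PointerHolder``, and the loader stands in for reading an object from the file.

```cpp
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

struct Object
{
    std::string value;
};

// Hypothetical stand-in for the document: owns the object cache.
class Document
{
  public:
    using Loader = std::function<std::string(int, int)>;

    explicit Document(Loader l) :
        loader(std::move(l))
    {
    }

    std::shared_ptr<Object> resolve(int id, int gen)
    {
        auto key = std::make_pair(id, gen);
        auto it = cache.find(key);
        if (it == cache.end()) {
            // First request: read the object once and remember it.
            ++reads;
            it = cache.emplace(key, std::make_shared<Object>(Object{loader(id, gen)})).first;
        }
        return it->second;
    }

    int reads = 0; // number of times the loader was actually called

  private:
    Loader loader;
    std::map<std::pair<int, int>, std::shared_ptr<Object>> cache;
};

// Hypothetical stand-in for an indirect object handle.
class Handle
{
  public:
    Handle(Document& d, int id, int gen) :
        doc(&d),
        id(id),
        gen(gen)
    {
    }

    std::string const& value()
    {
        if (!obj) {
            // Unresolved: ask the document, then keep the shared pointer.
            obj = doc->resolve(id, gen);
        }
        return obj->value;
    }

  private:
    Document* doc;
    int id;
    int gen;
    std::shared_ptr<Object> obj; // null until first access
};
```

Two handles created for the same object ID and generation end up sharing one ``Object``, and the loader runs once no matter how many handles are accessed.
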
  2779 +If the requested object is inside of an object stream, the object stream
  2780 +itself is first read into memory. Then the tokenizer reads objects from
  2781 +the memory stream based on the offset information stored in the stream.
  2782 +Those individual objects are cached, after which the temporary buffer
  2783 +holding the object stream contents is discarded. In this way, the first
  2784 +time an object in an object stream is requested, all objects in the
  2785 +stream are cached.
  2786 +
  2787 +The following example should clarify how ``QPDF`` processes a simple
  2788 +file.
  2789 +
  2790 +- Client constructs ``QPDF`` ``pdf`` and calls
  2791 + ``pdf.processFile("a.pdf");``.
  2792 +
  2793 +- The ``QPDF`` class checks the beginning of
  2794 + @1@filename@1@a.pdf@2@filename@2@ for a PDF header. It then reads the
  2795 + cross reference table mentioned at the end of the file, ensuring that
  2796 + it is looking before the last ``%%EOF``. After getting to the ``trailer``
  2797 + keyword, it invokes the parser.
  2798 +
  2799 +- The parser sees "``<<``", so it calls itself recursively in
  2800 + dictionary creation mode.
  2801 +
  2802 +- In dictionary creation mode, the parser keeps accumulating objects
  2803 + until it encounters "``>>``". Each object that is read is pushed onto
  2804 + a stack. If "``R``" is read, the last two objects on the stack are
  2805 + inspected. If they are integers, they are popped off the stack and
  2806 + their values are used to construct an indirect object handle which is
  2807 + then pushed onto the stack. When "``>>``" is finally read, the stack
  2808 + is converted into a ``QPDF_Dictionary`` which is placed in a
  2809 + ``QPDFObjectHandle`` and returned.
  2810 +
  2811 +- The resulting dictionary is saved as the trailer dictionary.
  2812 +
  2813 +- The ``/Prev`` key is searched. If present, ``QPDF`` seeks to that
  2814 + point and repeats except that the new trailer dictionary is not
  2815 + saved. If ``/Prev`` is not present, the initial parsing process is
  2816 + complete.
  2817 +
  2818 + If there is an encryption dictionary, the document's encryption
  2819 + parameters are initialized.
  2820 +
  2821 +- The client requests the root object. The ``QPDF`` class gets the value of
  2822 + the root key from the trailer dictionary and returns it. It is an unresolved
  2823 + indirect ``QPDFObjectHandle``.
  2824 +
  2825 +- The client requests the ``/Pages`` key from the root
  2826 + ``QPDFObjectHandle``. The ``QPDFObjectHandle`` notices that it is
  2827 + indirect so it asks ``QPDF`` to resolve it. ``QPDF`` looks in the
  2828 + object cache for an object with the root dictionary's object ID and
  2829 + generation number. Upon not seeing it, it checks the cross reference
  2830 + table, gets the offset, and reads the object present at that offset.
  2831 + It stores the result in the object cache and returns the cached
  2832 + result. The calling ``QPDFObjectHandle`` replaces its object pointer
  2833 + with the one from the resolved ``QPDFObjectHandle``, verifies that it
  2834 + is a valid dictionary object, and returns the (unresolved indirect)
  2835 + ``QPDFObject`` handle to the top of the Pages hierarchy.
  2836 +
  2837 + As the client continues to request objects, the same process is
  2838 + followed for each new requested object.
  2839 +
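The dictionary creation step in the walkthrough above can be sketched as a small stack machine. This is a simplification, not qpdf's parser: tokens are plain strings, ``accumulate`` and ``is_integer`` are hypothetical names, and an indirect reference is represented by the placeholder text ``indirect(id,gen)`` rather than by a real object handle.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True if the token consists only of decimal digits.
static bool is_integer(std::string const& s)
{
    if (s.empty()) {
        return false;
    }
    for (unsigned char c : s) {
        if (!std::isdigit(c)) {
            return false;
        }
    }
    return true;
}

// Accumulates tokens until ">>" and returns the resulting stack.
std::vector<std::string> accumulate(std::vector<std::string> const& tokens)
{
    std::vector<std::string> stack;
    for (auto const& tok : tokens) {
        if (tok == ">>") {
            break; // balancing closer: the stack now holds keys and values
        }
        std::size_t n = stack.size();
        if (tok == "R" && n >= 2 && is_integer(stack[n - 2]) && is_integer(stack[n - 1])) {
            // Pop the object ID and generation, push one indirect reference.
            std::string ref = "indirect(" + stack[n - 2] + "," + stack[n - 1] + ")";
            stack.pop_back();
            stack.pop_back();
            stack.push_back(ref);
        } else {
            stack.push_back(tok);
        }
    }
    return stack;
}
```

For the tokens ``/Root 1 0 R /Size 7 >>``, the stack ends up holding four items, ``/Root``, ``indirect(1,0)``, ``/Size``, and ``7``, which pair up as the keys and values of the dictionary.
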
  2840 +.. _ref.casting:
  2841 +
  2842 +Casting Policy
  2843 +--------------
  2844 +
  2845 +This section describes the casting policy followed by qpdf's
  2846 +implementation. This is of no concern to qpdf's end users and largely of no
  2847 +concern to people writing code that uses qpdf, but it could be of
  2848 +interest to people who are porting qpdf to a new platform or who are
  2849 +making modifications to the code.
  2850 +
  2851 +The C++ code in qpdf is free of old-style casts except where unavoidable
  2852 +(e.g. where the old-style cast is in a macro provided by a third-party
  2853 +header file). When there is a need for a cast, it is handled, in order
  2854 +of preference, by rewriting the code to avoid the need for a cast,
  2855 +calling ``const_cast``, calling ``static_cast``, calling
  2856 +``reinterpret_cast``, or calling some combination of the above. As a
  2857 +last resort, a compiler-specific ``#pragma`` may be used to suppress a
  2858 +warning that we don't want to fix. Examples may include suppressing
  2859 +warnings about the use of old-style casts in code that is shared between
  2860 +C and C++ code.
  2861 +
  2862 +The ``QIntC`` namespace, provided by
  2863 +@1@filename@1@include/qpdf/QIntC.hh@2@filename@2@, implements safe
  2864 +functions for converting between integer types. These functions do range
  2865 +checking and throw a ``std::range_error``, which is a subclass of
  2866 +``std::runtime_error``, if conversion from one integer type to another
  2867 +results in loss of information. There are many cases in which we have to
  2868 +move between different integer types because of incompatible integer
  2869 +types used in interoperable interfaces. Some are unavoidable, such as
  2870 +moving between sizes and offsets, and others are there because of old
  2871 +code that is too entrenched to be fixable without breaking source
  2872 +compatibility and causing pain for users. QPDF is compiled with extra
  2873 +warnings to detect conversions with potential data loss, and all such
  2874 +cases should be fixed by either using a function from ``QIntC`` or a
  2875 +``static_cast``.
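
As a rough illustration of what a range-checked integer conversion looks
like, here is a minimal sketch. The name ``checked_convert`` is hypothetical;
this is not the implementation in ``QIntC``, only an example of the same
idea: convert if the value fits, otherwise throw ``std::range_error``.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper, not part of qpdf: convert between integer types,
// throwing std::range_error if the value does not fit in the target type.
template <typename To, typename From>
To checked_convert(From v)
{
    static_assert(
        std::is_integral<To>::value && std::is_integral<From>::value,
        "integer types only");
    bool ok = false;
    if (std::is_signed<From>::value) {
        long long sv = static_cast<long long>(v);
        if (std::is_signed<To>::value) {
            // signed -> signed: compare both bounds in a wide signed type
            ok = sv >= static_cast<long long>(std::numeric_limits<To>::min()) &&
                sv <= static_cast<long long>(std::numeric_limits<To>::max());
        } else {
            // signed -> unsigned: negative values never fit
            ok = sv >= 0 &&
                static_cast<unsigned long long>(sv) <=
                    static_cast<unsigned long long>(std::numeric_limits<To>::max());
        }
    } else {
        // unsigned source: only the upper bound matters
        unsigned long long uv = static_cast<unsigned long long>(v);
        ok = uv <= static_cast<unsigned long long>(std::numeric_limits<To>::max());
    }
    if (!ok) {
        throw std::range_error("integer conversion would lose information");
    }
    return static_cast<To>(v);
}
```

With this sketch, ``checked_convert<int>(some_size)`` succeeds for small
sizes, while ``checked_convert<unsigned char>(300)`` throws.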
  2876 +
  2877 +When the intention is just to switch the type because of exchanging data
  2878 +between incompatible interfaces, use ``QIntC``. This is the usual case.
  2879 +However, there are some cases in which we are explicitly intending to
  2880 +use the exact same bit pattern with a different type. This is most
  2881 +common when switching between signed and unsigned characters. A lot of
  2882 +qpdf's code uses unsigned characters internally, but ``std::string`` uses
  2883 +``char``, which is signed on most platforms. Using ``QIntC::to_char`` would be wrong for
  2884 +converting from unsigned to signed characters because a negative
  2885 +``char`` value and the corresponding ``unsigned
  2886 + char`` value greater than 127 *mean the same thing*. There are also
  2887 +cases in which we use ``static_cast`` when working with bit fields where
  2888 +we are not representing a numerical value but rather a bunch of bits
  2889 +packed together in some integer type. Also note that ``size_t`` and
  2890 +``long`` both typically differ between 32-bit and 64-bit environments,
  2891 +so sometimes an explicit cast may not be needed to avoid warnings on one
  2892 +platform but may be needed on another. A conversion with ``QIntC``
  2893 +should always be used when the types are different even if the
  2894 +underlying size is the same. QPDF's CI build builds on 32-bit and 64-bit
  2895 +platforms, and the test suite is very thorough, so it is hard to make
  2896 +any of the potential errors here without being caught in build or test.
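
The signed/unsigned character case can be illustrated with a small sketch.
The helper names ``bytes_to_string`` and ``byte_at`` are hypothetical and
not part of qpdf; the sketch assumes the usual two's-complement behavior in
which a ``static_cast`` between ``char`` and ``unsigned char`` keeps the bit
pattern.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Reinterpret each byte as char without changing its bit pattern.
// A range-checked conversion would reject values above 127 where char is
// signed, even though no information is lost.
std::string bytes_to_string(std::vector<unsigned char> const& bytes)
{
    std::string result;
    result.reserve(bytes.size());
    for (unsigned char b : bytes) {
        result.push_back(static_cast<char>(b));
    }
    return result;
}

// Recover the original byte value from a char in the string.
unsigned char byte_at(std::string const& s, std::size_t i)
{
    return static_cast<unsigned char>(s.at(i));
}
```

A byte such as ``0xE9`` survives the round trip through ``std::string``
unchanged, which is why ``static_cast`` is the appropriate tool here.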
  2897 +
  2898 +Non-const ``unsigned char*`` is used in the ``Pipeline`` interface. The
  2899 +pipeline interface has a ``write`` call that uses ``unsigned
  2900 + char*`` without a ``const`` qualifier. The main reason for this is
  2901 +to support pipelines that make calls to third-party libraries, such as
  2902 +zlib, that don't include ``const`` in their interfaces. Unfortunately,
  2903 +there are many places in the code where it is desirable to have ``const
  2904 + char*`` with pipelines. None of the pipeline implementations in qpdf
  2905 +currently modify the data passed to write, and doing so would be counter
  2906 +to the intent of ``Pipeline``, but there is nothing in the code to
  2907 +prevent this from being done. There are places in the code where
  2908 +``const_cast`` is used to remove the const-ness of pointers going into
  2909 +``Pipeline``\ s. This could theoretically be unsafe, but there is
  2910 +adequate testing to assert that it is safe and will remain safe in
  2911 +qpdf's code.
  2912 +
  2913 +.. _ref.encryption:
  2914 +
  2915 +Encryption
  2916 +----------
  2917 +
  2918 +Encryption is supported transparently by qpdf. When opening a PDF file,
  2919 +if an encryption dictionary exists, the ``QPDF`` object processes this
  2920 +dictionary using the password (if any) provided. The primary decryption
  2921 +key is computed and cached. No further access is made to the encryption
  2922 +dictionary after that time. When an object is read from a file, the
  2923 +object ID and generation of the object in which it is contained are
  2924 +always known. Using this information along with the stored encryption
  2925 +key, all stream and string objects are transparently decrypted. Raw
  2926 +encrypted objects are never stored in memory. This way, nothing in the
  2927 +library ever has to know or care whether it is reading an encrypted
  2928 +file.
  2929 +
  2930 +An interface is also provided for writing encrypted streams and strings
  2931 +given an encryption key. This is used by ``QPDFWriter`` when it rewrites
  2932 +encrypted files.
  2933 +
  2934 +When copying encrypted files, unless otherwise directed, qpdf will
  2935 +preserve any encryption in force in the original file. qpdf can do this
  2936 +with either the user or the owner password. There is no difference in
  2937 +capability based on which password is used. When 40-bit or 128-bit
  2938 +encryption keys are used, the user password can be recovered with the
  2939 +owner password. With 256-bit keys, the user and owner passwords are used
  2940 +independently to encrypt the actual encryption key, so while either can
  2941 +be used, the owner password can no longer be used to recover the user
  2942 +password.
  2943 +
  2944 +Starting with version 4.0.0, qpdf can read files that are not encrypted
  2945 +but that contain encrypted attachments, but it cannot write such files.
  2946 +qpdf also requires the password to be specified in order to open the
  2947 +file, not just to extract attachments, since once the file is open, all
  2948 +decryption is handled transparently. When copying files like this while
  2949 +preserving encryption, qpdf will apply the file's encryption to
  2950 +everything in the file, not just to the attachments. When decrypting the
  2951 +file, qpdf will decrypt the attachments. In general, when copying PDF
  2952 +files with multiple encryption formats, qpdf will choose the newest
  2953 +format. The only exception to this is that clear-text metadata will be
  2954 +preserved as clear-text if it is that way in the original file.
  2955 +
  2956 +One point of confusion some people have about encrypted PDF files is
  2957 +that encryption is not the same as password protection. Password
  2958 +protected files are always encrypted, but it is also possible to create
  2959 +encrypted files that do not have passwords. Internally, such files use
  2960 +the empty string as a password, and most readers try the empty string
  2961 +first to see if it works and prompt for a password only if the empty
  2962 +string doesn't work. Normally such files have an empty user password and
  2963 +a non-empty owner password. In that way, if the file is opened by an
  2964 +ordinary reader without specification of password, the restrictions
  2965 +specified in the encryption dictionary can be enforced. Most users
  2966 +wouldn't even realize such a file was encrypted. Since qpdf always
  2967 +ignores the restrictions (except for the purpose of reporting what they
  2968 +are), qpdf doesn't care which password you use. QPDF will allow you to
  2969 +create PDF files with non-empty user passwords and empty owner
  2970 +passwords. Some readers will require a password when you open these
  2971 +files, and others will open the files without a password and not enforce
  2972 +restrictions. Having a non-empty user password and an empty owner
  2973 +password doesn't really make sense because it would mean that opening
  2974 +the file with the user password would be more restrictive than not
  2975 +supplying a password at all. QPDF also allows you to create PDF files
  2976 +with the same password as both the user and owner password. Some readers
  2977 +will not ever allow such files to be accessed without restrictions
  2978 +because they never try the password as the owner password if it works as
  2979 +the user password. Nonetheless, one of the powerful aspects of qpdf is
  2980 +that it allows you to finely specify the way encrypted files are
  2981 +created, even if the results are not useful to some readers. One use
  2982 +case for this would be for testing a PDF reader to ensure that it
  2983 +handles odd configurations of input files.
  2984 +
  2985 +.. _ref.random-numbers:
  2986 +
  2987 +Random Number Generation
  2988 +------------------------
  2989 +
  2990 +QPDF generates random numbers to support generation of encrypted data.
  2991 +Starting in qpdf 10.0.0, qpdf uses the crypto provider as its source of
  2992 +random numbers. Older versions used the OS-provided source of secure
  2993 +random numbers or, if allowed at build time, insecure random numbers
  2994 +from stdlib. Starting with version 5.1.0, you can disable use of
  2995 +OS-provided secure random numbers at build time. This is especially
  2996 +useful on Windows if you want to avoid a dependency on Microsoft's
  2997 +cryptography API. You can also supply your own random data provider. For
  2998 +details on how to do this, please refer to the top-level README.md file
  2999 +in the source distribution and to comments in
  3000 +@1@filename@1@QUtil.hh@2@filename@2@.
  3001 +
  3002 +.. _ref.adding-and-remove-pages:
  3003 +
  3004 +Adding and Removing Pages
  3005 +-------------------------
  3006 +
  3007 +While qpdf's API has supported adding and modifying objects for some
  3008 +time, version 3.0 introduces specific methods for adding and removing
  3009 +pages. These are largely convenience routines that handle two tricky
  3010 +issues: pushing inheritable resources from the ``/Pages`` tree down to
  3011 +individual pages and manipulation of the ``/Pages`` tree itself. For
  3012 +details, see ``addPage`` and surrounding methods in
  3013 +@1@filename@1@QPDF.hh@2@filename@2@.
  3014 +
  3015 +.. _ref.reserved-objects:
  3016 +
  3017 +Reserving Object Numbers
  3018 +------------------------
  3019 +
  3020 +Version 3.0 of qpdf introduced the concept of reserved objects. These
  3021 +are seldom needed for ordinary operations, but there are cases in which
  3022 +you may want to add a series of indirect objects with references to each
  3023 +other to a ``QPDF`` object. This causes a problem because you can't
  3024 +determine the object ID that a new indirect object will have until you
  3025 +add it to the ``QPDF`` object with ``QPDF::makeIndirectObject``. The
  3026 +only way to add two mutually referential objects to a ``QPDF`` object
  3027 +prior to version 3.0 would be to add the new objects first and then make
  3028 +them refer to each other after adding them. Now it is possible to create
  3029 +a @1@firstterm@1@reserved object@2@firstterm@2@ using
  3030 +``QPDFObjectHandle::newReserved``. This is an indirect object that stays
  3031 +"unresolved" even if it is queried for its type. So now, if you want to
  3032 +create a set of mutually referential objects, you can create
  3033 +reservations for each one of them and use those reservations to
  3034 +construct the references. When finished, you can call
  3035 +``QPDF::replaceReserved`` to replace the reserved objects with the real
  3036 +ones. This functionality will never be needed by most applications, but
  3037 +it is used internally by QPDF when copying objects from other PDF files,
  3038 +as discussed in `Copying Objects From Other PDF
  3039 +Files <#ref.foreign-objects>`__. For an example of how to use reserved
  3040 +objects, search for ``newReserved`` in
  3041 +@1@filename@1@test_driver.cc@2@filename@2@ in qpdf's sources.
  3042 +
  3043 +.. _ref.foreign-objects:
  3044 +
  3045 +Copying Objects From Other PDF Files
  3046 +------------------------------------
  3047 +
  3048 +Version 3.0 of qpdf introduced the ability to copy objects into a
  3049 +``QPDF`` object from a different ``QPDF`` object, which we refer to as
  3050 +@1@firstterm@1@foreign objects@2@firstterm@2@. This allows arbitrary
  3051 +merging of PDF files. The "from" ``QPDF`` object must remain valid after
  3052 +the copy as discussed in the note below. The
  3053 +@1@command@1@qpdf@2@command@2@ command-line tool provides limited
  3054 +support for basic page selection, including merging in pages from other
  3055 +files, but the library's API makes it possible to implement arbitrarily
  3056 +complex merging operations. The main method for copying foreign objects
  3057 +is ``QPDF::copyForeignObject``. This takes an indirect object from
  3058 +another ``QPDF`` and copies it recursively into this object while
  3059 +preserving all object structure, including circular references. This
  3060 +means you can add a direct object that you create from scratch to a
  3061 +``QPDF`` object with ``QPDF::makeIndirectObject``, and you can add an
  3062 +indirect object from another file with ``QPDF::copyForeignObject``. The
  3063 +fact that ``QPDF::makeIndirectObject`` does not automatically detect a
  3064 +foreign object and copy it is an explicit design decision. Copying a
  3065 +foreign object seems like a sufficiently significant thing to do that it
  3066 +should be done explicitly.
  3067 +
  3068 +The other way to copy foreign objects is by passing a page from one
  3069 +``QPDF`` to another by calling ``QPDF::addPage``. In contrast to
  3070 +``QPDF::makeIndirectObject``, this method automatically distinguishes
  3071 +between indirect objects in the current file, foreign objects, and
  3072 +direct objects.
  3073 +
  3074 +Please note: when you copy objects from one ``QPDF`` to another, the
  3075 +source ``QPDF`` object must remain valid until you have finished with
  3076 +the destination object. This is because the original object is still
  3077 +used to retrieve any referenced stream data from the copied object.
  3078 +
  3079 +.. _ref.rewriting:
  3080 +
  3081 +Writing PDF Files
  3082 +-----------------
  3083 +
  3084 +The qpdf library supports file writing of ``QPDF`` objects to PDF files
  3085 +through the ``QPDFWriter`` class. The ``QPDFWriter`` class has two
  3086 +writing modes: one for non-linearized files, and one for linearized
  3087 +files. See `Linearization <#ref.linearization>`__ for a description of
  3088 +how linearization is implemented. This section describes how we write
  3089 +non-linearized files including the creation of QDF files (see `QDF
  3090 +Mode <#ref.qdf>`__).
  3091 +
  3092 +This outline was written prior to implementation and is not exactly
  3093 +accurate, but it provides a correct "notional" idea of how writing
  3094 +works. Look at the code in ``QPDFWriter`` for exact details.
  3095 +
  3096 +- Initialize state:
  3097 +
  3098 + - next object number = 1
  3099 +
  3100 + - object queue = empty
  3101 +
  3102 + - renumber table: old object id/generation to new id/0 = empty
  3103 +
  3104 + - xref table: new id -> offset = empty
  3105 +
  3106 +- Create a QPDF object from a file.
  3107 +
  3108 +- Write header for new PDF file.
  3109 +
  3110 +- Request the trailer dictionary.
  3111 +
  3112 +- For each value that is an indirect object, grab the next object
  3113 + number (via an operation that returns and increments the number). Map
  3114 + object to new number in renumber table. Push object onto queue.
  3115 +
  3116 +- While there are more objects on the queue:
  3117 +
  3118 + - Pop queue.
  3119 +
  3120 + - Look up object's new number *n* in the renumbering table.
  3121 +
  3122 + - Store current offset into xref table.
  3123 +
  3124 + - Write ``@1@replaceable@1@n@2@replaceable@2@ 0 obj``.
  3125 +
  3126 + - If object is null, whether direct or indirect, write out null,
  3127 + thus eliminating unresolvable indirect object references.
  3128 +
  3129 + - If the object is a stream, write stream contents, piped
  3130 + through any filters as required, to a memory buffer. Use this
  3131 + buffer to determine the stream length.
  3132 +
  3133 + - If object is not a stream, array, or dictionary, write out its
  3134 + contents.
  3135 +
  3136 + - If object is an array or dictionary (including stream), traverse
  3137 + its elements (for array) or values (for dictionaries), handling
  3138 + recursive dictionaries and arrays, looking for indirect objects.
  3139 + When an indirect object is found, if it is not resolvable, ignore.
  3140 + (This case is handled when writing it out.) Otherwise, look it up
  3141 + in the renumbering table. If not found, grab the next available
  3142 + object number, assign to the referenced object in the renumbering
  3143 + table, and push the referenced object onto the queue. As a special
  3144 + case, when writing out a stream dictionary, replace length,
  3145 + filters, and decode parameters as required.
  3146 +
  3147 + Write out dictionary or array, replacing any unresolvable indirect
  3148 + object references with null (the PDF spec says a reference to a
  3149 + non-existent object is legal and resolves to null) and any
  3150 + resolvable ones with references to the renumbered objects.
  3151 +
  3152 + - If the object is a stream, write ``stream\n``, the stream contents
  3153 + (from the memory buffer), and ``\nendstream\n``.
  3154 +
  3155 + - When done, write ``endobj``.
  3156 +
  3157 +Once we have finished the queue, all referenced objects will have been
  3158 +written out and all deleted objects or unreferenced objects will have
  3159 +been skipped. The new cross-reference table will contain an offset for
  3160 +every new object number from 1 up to the number of objects written. This
  3161 +can be used to write out a new xref table. Finally we can write out the
  3162 +trailer dictionary with appropriately computed /ID (see spec, 8.3, File
  3163 +Identifiers), the cross reference table offset, and ``%%EOF``.
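
The queue-driven renumbering in the outline above can be sketched as
follows. This is a notional model only: ``ToyObjects``, ``RenumberResult``,
and ``renumber_from_trailer`` are hypothetical names, objects are reduced to
lists of the IDs they reference, and nothing is actually written.

```cpp
#include <deque>
#include <map>
#include <vector>

// Toy object model: each old object ID maps to the old IDs it references.
// An ID that is referenced but absent from the map is unresolvable.
using ToyObjects = std::map<int, std::vector<int>>;

struct RenumberResult
{
    std::map<int, int> renumber;  // old ID -> new ID
    std::vector<int> write_order; // old IDs in the order they would be written
};

RenumberResult
renumber_from_trailer(ToyObjects const& objects, std::vector<int> const& trailer_refs)
{
    RenumberResult result;
    std::deque<int> queue;
    int next_object_number = 1;

    // Assign the next number to an object the first time it is seen.
    auto enqueue = [&](int old_id) {
        if (objects.count(old_id) == 0) {
            return; // unresolvable: would be written as null
        }
        if (result.renumber.count(old_id) == 0) {
            result.renumber[old_id] = next_object_number++;
            queue.push_back(old_id);
        }
    };

    for (int id : trailer_refs) {
        enqueue(id);
    }
    while (!queue.empty()) {
        int old_id = queue.front();
        queue.pop_front();
        result.write_order.push_back(old_id);
        for (int ref : objects.at(old_id)) {
            enqueue(ref);
        }
    }
    return result;
}
```

Objects that are never reached from the trailer receive no new number, which
mirrors how unreferenced objects are skipped.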
  3164 +
  3165 +.. _ref.filtered-streams:
  3166 +
  3167 +Filtered Streams
  3168 +----------------
  3169 +
  3170 +Support for streams is implemented through the ``Pipeline`` interface
  3171 +which was designed for this package.
  3172 +
  3173 +When reading streams, create a series of ``Pipeline`` objects. The
  3174 +``Pipeline`` abstract base class requires implementations of ``write()`` and
  3175 +``finish()`` and provides an implementation of ``getNext()``. Each
  3176 +pipeline object, upon receiving data, does whatever it is going to do
  3177 +and then writes the data (possibly modified) to its successor.
  3178 +Alternatively, a pipeline may be an end-of-the-line pipeline that does
  3179 +something like store its output to a file or a memory buffer ignoring a
  3180 +successor. For additional details, look at
  3181 +@1@filename@1@Pipeline.hh@2@filename@2@.
  3182 +
  3183 +``QPDF`` can read raw or filtered streams. When reading a filtered
  3184 +stream, the ``QPDF`` class creates one ``Pipeline`` object for each
  3185 +appropriate filter and chains them together. The last filter
  3186 +should write to whatever type of output is required. The ``QPDF`` class
  3187 +has an interface to write raw or filtered stream contents to a given
  3188 +pipeline.
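
To make the chaining idea concrete, here is a miniature sketch of a
pipeline interface with one filter and one end-of-the-line pipeline. The
classes ``ToyPipeline``, ``ToyUpper``, and ``ToyBuffer`` are hypothetical;
qpdf's real ``Pipeline`` class differs in its details (for example, this
sketch uses a ``const`` data pointer for simplicity).

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical miniature of a pipeline interface; not qpdf's Pipeline class.
class ToyPipeline
{
  public:
    explicit ToyPipeline(ToyPipeline* next = nullptr) : next(next) {}
    virtual ~ToyPipeline() = default;
    virtual void write(unsigned char const* data, std::size_t len) = 0;
    virtual void finish() = 0;

  protected:
    ToyPipeline* getNext() { return next; }

  private:
    ToyPipeline* next;
};

// End-of-the-line pipeline: stores everything it receives in memory.
class ToyBuffer : public ToyPipeline
{
  public:
    void write(unsigned char const* data, std::size_t len) override
    {
        contents.append(reinterpret_cast<char const*>(data), len);
    }
    void finish() override { finished = true; }

    std::string contents;
    bool finished = false;
};

// Filter pipeline: upper-cases ASCII letters and passes the result on.
class ToyUpper : public ToyPipeline
{
  public:
    explicit ToyUpper(ToyPipeline* next) : ToyPipeline(next) {}
    void write(unsigned char const* data, std::size_t len) override
    {
        std::string out;
        for (std::size_t i = 0; i < len; ++i) {
            out.push_back(static_cast<char>(std::toupper(data[i])));
        }
        getNext()->write(reinterpret_cast<unsigned char const*>(out.data()), out.size());
    }
    void finish() override { getNext()->finish(); }
};
```

Constructing ``ToyUpper upper(&sink)`` with a ``ToyBuffer sink`` and writing
to ``upper`` leaves the transformed data in ``sink.contents``.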
  3189 +
  3190 +.. _ref.object-accessors:
  3191 +
  3192 +Object Accessor Methods
  3193 +-----------------------
  3194 +
  3195 +@1@comment: This section is referenced in QPDFObjectHandle.hh @1@
  3196 +
  3197 +For general information about how to access instances of
  3198 +``QPDFObjectHandle``, please see the comments in
  3199 +@1@filename@1@QPDFObjectHandle.hh@2@filename@2@. Search for "Accessor
  3200 +methods". This section provides a more in-depth discussion of the
  3201 +behavior and the rationale for the behavior.
  3202 +
  3203 +*Why were type errors made into warnings?* When type checks were
  3204 +introduced into qpdf in the early days, it was expected that type errors
  3205 +would only occur as a result of programmer error. However, in practice,
  3206 +type errors would occur with malformed PDF files because of assumptions
  3207 +made in code, including code within the qpdf library and code written by
  3208 +library users. The most common case would be chaining calls to
  3209 +``getKey()`` to access keys deep within a dictionary. In many cases,
  3210 +qpdf would be able to recover from these situations, but the old
  3211 +behavior often resulted in crashes rather than graceful recovery. For
  3212 +this reason, the errors were changed to warnings.
  3213 +
  3214 +*Why even warn about type errors when the user can't usually do anything
  3215 +about them?* Type warnings are extremely valuable during development.
  3216 +Since it's impossible to catch at compile time things like typos in
  3217 +dictionary key names or logic errors around what the structure of a PDF
  3218 +file might be, the presence of type warnings can save lots of developer
  3219 +time. They have also proven useful in exposing issues in qpdf itself
  3220 +that would have otherwise gone undetected.
  3221 +
  3222 +*Can there be a type-safe ``QPDFObjectHandle``?* It would be great if
  3223 +``QPDFObjectHandle`` could be more strongly typed so that you'd have to
  3224 +check that something was of a particular type before calling
  3225 +type-specific accessor methods. However, implementing this at this stage
  3226 +of the library's history would be quite difficult, and it would make
  3227 +the common pattern of drilling into an object no longer work. While it
  3228 +would be possible to have a parallel interface, it would create a lot of
  3229 +extra code. If qpdf were written in a language like Rust, an interface
  3230 +like this would make a lot of sense, but, for a variety of reasons, the
  3231 +qpdf API is consistent with other APIs of its time, relying on exception
  3232 +handling to catch errors. The underlying PDF objects are inherently not
  3233 +type-safe. Forcing stronger type safety in ``QPDFObjectHandle`` would
  3234 +ultimately cause a lot more code to have to be written and would likely
  3235 +make software that uses qpdf more brittle, and even so, checks would
  3236 +have to occur at runtime.
  3237 +
  3238 +*Why do type errors sometimes raise exceptions?* The way warnings work
  3239 +in qpdf requires a ``QPDF`` object to be associated with an object
  3240 +handle for a warning to be issued. It would be nice if this could be
  3241 +fixed, but it would require major changes to the API. Rather than
  3242 +throwing away these conditions, we convert them to exceptions. It's not
  3243 +that bad though. Since any object handle that was read from a file has
  3244 +an associated ``QPDF`` object, it would only be type errors on objects
  3245 +that were created explicitly that would cause exceptions, and in that
  3246 +case, type errors are much more likely to be the result of a coding
  3247 +error than invalid input.
  3248 +
  3249 +*Why does the behavior of a type exception differ between the C and C++
  3250 +API?* There is no way to throw and catch exceptions in C short of
  3251 +something like ``setjmp`` and ``longjmp``, and that approach is not
  3252 +portable across language barriers. Since the C API is often used from
  3253 +other languages, it's important to keep things as simple as possible.
  3254 +Starting in qpdf 10.5, exceptions that used to crash code using the C
  3255 +API will be written to stderr by default, and it is possible to register
  3256 +an error handler. There's no reason that the error handler can't
  3257 +simulate exception handling in some way, such as by using ``setjmp`` and
  3258 +``longjmp`` or by setting some variable that can be checked after
  3259 +library calls are made. In retrospect, it might have been better if the
  3260 +C API object handle methods returned error codes like the other methods
  3261 +and set return values in passed-in pointers, but this would complicate
  3262 +both the implementation and the use of the library for a case that is
  3263 +actually quite rare and largely avoidable.
  3264 +
  3265 +.. _ref.linearization:
  3266 +
  3267 +Linearization
  3268 +=============
  3269 +
  3270 +This chapter describes how ``QPDF`` and ``QPDFWriter`` implement
  3271 +creation and processing of linearized PDFs.
  3272 +
  3273 +.. _ref.linearization-strategy:
  3274 +
  3275 +Basic Strategy for Linearization
  3276 +--------------------------------
  3277 +
  3278 +To avoid the incestuous problem of having the qpdf library validate its
  3279 +own linearized files, we have a special linearized file checking mode
  3280 +which can be invoked via @1@command@1@qpdf
  3281 +--check-linearization@2@command@2@ (or @1@command@1@qpdf
  3282 +--check@2@command@2@). This mode reads the linearization parameter
  3283 +dictionary and the hint streams and validates that object ordering,
  3284 +parameters, and hint stream contents are correct. The validation code
  3285 +was first tested against linearized files created by external tools
  3286 +(Acrobat and pdlin) and then used to validate files created by
  3287 +``QPDFWriter`` itself.
  3288 +
  3289 +.. _ref.linearized.preparation:
  3290 +
  3291 +Preparing For Linearization
  3292 +---------------------------
  3293 +
  3294 +Before creating a linearized PDF file from any other PDF file, the PDF
  3295 +file must be altered such that all page attributes are propagated down
  3296 +to the page level (and not inherited from parents in the ``/Pages``
  3297 +tree). We also have to know which objects refer to which other objects,
  3298 +being concerned with page boundaries and a few other cases. We refer to
  3299 +this part of preparing the PDF file as
  3300 +@1@firstterm@1@optimization@2@firstterm@2@, discussed in
  3301 +`Optimization <#ref.optimization>`__. Note that, in this context, the
  3302 +term @1@firstterm@1@optimization@2@firstterm@2@ is a qpdf term, and the
  3303 +term @1@firstterm@1@linearization@2@firstterm@2@ is a term from the PDF
  3304 +specification. Do not be confused by the fact that many applications
  3305 +refer to linearization as optimization or web optimization.
  3306 +
  3307 +When creating linearized PDF files from optimized PDF files, there are
  3308 +really only a few issues that need to be dealt with:
  3309 +
  3310 +- Creation of hints tables
  3311 +
  3312 +- Placing objects in the correct order
  3313 +
  3314 +- Filling in offsets and byte sizes
  3315 +
  3316 +.. _ref.optimization:
  3317 +
  3318 +Optimization
  3319 +------------
  3320 +
  3321 +In order to perform various operations such as linearization and
  3322 +splitting files into pages, it is necessary to know which objects are
  3323 +referenced by which pages, page thumbnails, and root and trailer
  3324 +dictionary keys. It is also necessary to ensure that all page-level
  3325 +attributes appear directly at the page level and are not inherited from
  3326 +parents in the pages tree.
  3327 +
  3328 +We refer to the process of enforcing these constraints as
  3329 +@1@firstterm@1@optimization@2@firstterm@2@. As mentioned above, note
  3330 +that some applications refer to linearization as optimization. Although
  3331 +this optimization was initially motivated by the need to create
  3332 +linearized files, we are using these terms separately.
  3333 +
  3334 +PDF file optimization is implemented in the
  3335 +@1@filename@1@QPDF_optimization.cc@2@filename@2@ source file. That file
  3336 +is richly commented and serves as the primary reference for the
  3337 +optimization process.
  3338 +
  3339 +After optimization has been completed, the private member variables
  3340 +``obj_user_to_objects`` and ``object_to_obj_users`` in ``QPDF`` have
  3341 +been populated. Any object that has more than one value in the
  3342 +``object_to_obj_users`` table is shared. Any object that has exactly one
  3343 +value in the ``object_to_obj_users`` table is private. To find all the
  3344 +private objects in a page or a trailer or root dictionary key, one
  3345 +merely has to make this determination for each element in the
  3346 +``obj_user_to_objects`` table for the given page or key.
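
The shared/private test described above can be sketched with two simplified
tables. The type aliases and function names here are hypothetical; qpdf's
private members use richer key types than the strings and integers assumed
in this illustration.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical simplified tables: users are named by strings, objects by IDs.
using UserToObjects = std::map<std::string, std::set<int>>;
using ObjectToUsers = std::map<int, std::set<std::string>>;

// Build the reverse table from the forward one.
ObjectToUsers invert(UserToObjects const& user_to_objects)
{
    ObjectToUsers result;
    for (auto const& entry : user_to_objects) {
        for (int obj : entry.second) {
            result[obj].insert(entry.first);
        }
    }
    return result;
}

// Objects used by `user` and by no other user.
std::set<int> private_objects(
    UserToObjects const& user_to_objects,
    ObjectToUsers const& object_to_users,
    std::string const& user)
{
    std::set<int> result;
    auto found = user_to_objects.find(user);
    if (found == user_to_objects.end()) {
        return result;
    }
    for (int obj : found->second) {
        auto users = object_to_users.find(obj);
        if (users != object_to_users.end() && users->second.size() == 1) {
            result.insert(obj); // exactly one user: private
        }
    }
    return result;
}
```

An object that appears under two pages has two entries in the reverse table
and is therefore reported as shared rather than private.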
  3347 +
  3348 +Note that pages and thumbnails have different object user types, so the
  3349 +above test on a page will not include objects referenced only by the page's
  3350 +thumbnail dictionary.
  3351 +
  3352 +.. _ref.linearization.writing:
  3353 +
  3354 +Writing Linearized Files
  3355 +------------------------
  3356 +
  3357 +We will create files with only primary hint streams. We will never write
  3358 +overflow hint streams. (As of PDF version 1.4, Acrobat doesn't either,
  3359 +and they are never necessary.) The hint streams contain offset
  3360 +information to objects that point to where they would be if the hint
  3361 +stream were not present. This means that we have to calculate all object
  3362 +positions before we can generate and write the hint table. This means
  3363 +that we have to generate the file in two passes. To make this reliable,
  3364 +``QPDFWriter`` in linearization mode invokes exactly the same code twice
  3365 +to write the file to a pipeline.
  3366 +
  3367 +In the first pass, the target pipeline is a count pipeline chained to a
  3368 +discard pipeline. The count pipeline simply passes its data through to
  3369 +the next pipeline in the chain but can return the number of bytes passed
  3370 +through it at any intermediate point. The discard pipeline is an
  3371 +end-of-the-line pipeline that just throws its data away. The hint stream is not
  3372 +written and dummy values with adequate padding are stored in the first
  3373 +cross reference table, linearization parameter dictionary, and /Prev key
  3374 +of the first trailer dictionary. All the offset, length, object
  3375 +renumbering information, and anything else we need for the second pass
  3376 +is stored.
  3377 +
  3378 +At the end of the first pass, this information is passed to the ``QPDF``
  3379 +class which constructs a compressed hint stream in a memory buffer and
  3380 +returns it. ``QPDFWriter`` uses this information to write a complete
  3381 +hint stream object into a memory buffer. At this point, the length of
  3382 +the hint stream is known.
  3383 +
  3384 +In the second pass, the end of the pipeline chain is a regular file
  3385 +instead of a discard pipeline, and we have known values for all the
  3386 +offsets and lengths that we didn't have in the first pass. We have to
  3387 +adjust offsets that appear after the start of the hint stream by the
  3388 +length of the hint stream, which is known. Anything that is of variable
  3389 +length is padded, with the padding code surrounding any writing code
  3390 +that differs in the two passes. This ensures that changes to the way
  3391 +things are represented never result in offsets that were gathered
  3392 +during the first pass becoming incorrect for the second pass.
  3393 +
  3394 +Using this strategy, we can write linearized files to a non-seekable
  3395 +output stream with only a single pass to disk or wherever the output is
  3396 +going.
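
The two-pass idea can be illustrated with a toy file format. Everything in
this sketch is hypothetical (``ToySink``, ``padded``, ``write_toy_file``,
and the file layout); it only shows why running the same writing code twice
with fixed-width padding keeps the offsets from the first pass valid in the
second.

```cpp
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical sink: always counts bytes, and optionally keeps them.
struct ToySink
{
    explicit ToySink(bool keep) : keep(keep) {}
    void write(std::string const& s)
    {
        count += s.size();
        if (keep) {
            data += s;
        }
    }
    bool keep;
    std::size_t count = 0;
    std::string data;
};

// Format an offset in a fixed-width field so its length never changes.
std::string padded(std::size_t value)
{
    std::ostringstream out;
    out << std::setw(10) << std::setfill('0') << value;
    return out.str();
}

// Write the toy file; return the offset at which the trailer begins.
std::size_t write_toy_file(ToySink& sink, std::size_t trailer_offset)
{
    sink.write("%TOY-1.0\n");
    sink.write("trailer-at " + padded(trailer_offset) + "\n");
    sink.write("body body body\n");
    std::size_t actual = sink.count;
    sink.write("trailer\n");
    return actual;
}

// Run the same writing code twice: once to measure, once for real.
std::string write_two_pass()
{
    ToySink counter(false); // first pass: count and discard
    std::size_t offset = write_toy_file(counter, 0); // dummy value, same width
    ToySink output(true); // second pass: real output
    write_toy_file(output, offset);
    return output.data;
}
```

Because the dummy value and the real value occupy the same number of bytes,
the offset measured in the first pass is still correct in the second.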
  3397 +
  3398 +.. _ref.linearization-data:
  3399 +
  3400 +Calculating Linearization Data
  3401 +------------------------------
  3402 +
  3403 +Once a file is optimized, we have information about which objects access
  3404 +which other objects. We can then process these tables to decide which
  3405 +part (as described in "Linearized PDF Document Structure" in the PDF
  3406 +specification) each object is contained within. This tells us the exact
  3407 +order in which objects are written. The ``QPDFWriter`` class asks for
  3408 +this information and enqueues objects for writing in the proper order.
  3409 +It also turns on a check that causes an exception to be thrown if an
  3410 +object is encountered that has not already been queued. (This could
  3411 +happen only if there were a bug in the traversal code used to calculate
  3412 +the linearization data.)
  3413 +
  3414 +.. _ref.linearization-issues:
  3415 +
  3416 +Known Issues with Linearization
  3417 +-------------------------------
  3418 +
  3419 +There are a handful of known issues with this linearization code. These
  3420 +issues do not appear to impact the behavior of linearized files, which
  3421 +still work as intended: it is possible for a web browser to begin to
  3422 +display them before they are fully downloaded. In fact, it seems that
  3423 +various other programs that create linearized files have many of these
  3424 +same issues. These items make reference to terminology used in the
  3425 +linearization appendix of the PDF specification.
  3426 +
  3427 +- Thread Dictionary information keys appear in part 4 with the rest of
  3428 + Threads instead of in part 9. Objects in part 9 are not grouped
  3429 + together functionally.
  3430 +
  3431 +- We are not calculating numerators for shared object positions within
  3432 + content streams or interleaving them within content streams.
  3433 +
  3434 +- We generate only page offset, shared object, and outline hint tables.
  3435 + It would be relatively easy to add some additional tables. We gather
  3436 + most of the information needed to create thumbnail hint tables. There
  3437 + are comments in the code about this.
  3438 +
  3439 +.. _ref.linearization-debugging:
  3440 +
  3441 +Debugging Note
  3442 +--------------
  3443 +
  3444 +The :command:`qpdf --show-linearization` command can show
  3445 +the complete contents of linearization hint streams. To look at the raw
  3446 +data, you can extract the filtered contents of the linearization hint
  3447 +tables using :command:`qpdf --show-object=n
  3448 +--filtered-stream-data`. Then, to convert this into a bit
  3449 +stream (since linearization tables are bit streams written without
  3450 +regard to byte boundaries), you can pipe the resulting data through the
  3451 +following perl code:
  3452 +
  3453 +::
  3454 +
  3455 + use bytes;
  3456 + binmode STDIN;
  3457 + undef $/;
  3458 + my $a = <STDIN>;
  3459 + my @ch = split(//, $a);
  3460 + map { printf("%08b", ord($_)) } @ch;
  3461 + print "\n";
  3462 +
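As a rough Python counterpart to the Perl filter above (a sketch, not part of qpdf), the following renders bytes as a bit string and reads fixed-width fields that ignore byte boundaries. The helper names ``to_bit_string`` and ``read_bits`` are hypothetical and introduced only for illustration.

```python
def to_bit_string(data):
    """Render each byte as eight binary digits, most significant bit first."""
    return "".join(format(byte, "08b") for byte in data)


def read_bits(bits, pos, width):
    """Read an unsigned field of `width` bits starting at bit offset `pos`.

    Returns (value, next_pos). A zero-width field reads as 0.
    """
    if width == 0:
        return 0, pos
    return int(bits[pos:pos + width], 2), pos + width
```

To use it as a filter, pass ``sys.stdin.buffer.read()`` to ``to_bit_string`` and print the result, then step through the hint table fields with ``read_bits`` using the bit widths given in the table header.
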
  3463 +.. _ref.object-and-xref-streams:
  3464 +
  3465 +Object and Cross-Reference Streams
  3466 +==================================
  3467 +
  3468 +This chapter provides information about the implementation of object
  3469 +stream and cross-reference stream support in qpdf.
  3470 +
  3471 +.. _ref.object-streams:
  3472 +
  3473 +Object Streams
  3474 +--------------
  3475 +
  3476 +Object streams can contain any regular object except the following:
  3477 +
  3478 +- stream objects
  3479 +
  3480 +- objects with generation > 0
  3481 +
  3482 +- the encryption dictionary
  3483 +
  3484 +- objects containing the /Length of another stream
  3485 +
  3486 +In addition, Adobe Reader (at least as of version 8.0.0) appears to not
  3487 +be able to handle having the document catalog appear in an object stream
  3488 +if the file is encrypted, though this is not specifically disallowed by
  3489 +the specification.
  3490 +
  3491 +There are additional restrictions for linearized files. See
  3492 +`Implications for Linearized
  3493 +Files <#ref.object-streams-linearization>`__ for details.
  3494 +
  3495 +The PDF specification refers to objects in object streams as "compressed
  3496 +objects" regardless of whether the object stream is compressed.
  3497 +
  3498 +The generation number of every object in an object stream must be zero.
  3499 +It is possible to delete and replace an object in an object stream with
  3500 +a regular object.
  3501 +
  3502 +The object stream dictionary has the following keys:
  3503 +
  3504 +- ``/N``: number of objects
  3505 +
  3506 +- ``/First``: byte offset of first object
  3507 +
  3508 +- ``/Extends``: indirect reference to stream that this extends
  3509 +
  3510 +Stream collections are formed with ``/Extends``. They must form a
  3511 +directed acyclic graph. These can be used for semantic information and
  3512 +are not meaningful to the PDF document's syntactic structure. Although
  3513 +qpdf preserves stream collections, it never generates them and doesn't
  3514 +make use of this information in any way.
  3515 +
  3516 +The specification recommends limiting the number of objects in an object
  3517 +stream for efficiency in reading and decoding. Acrobat 6 uses no more
  3518 +than 100 objects per object stream for linearized files and no more than 200
  3519 +objects per stream for non-linearized files. ``QPDFWriter``, in object
  3520 +stream generation mode, never puts more than 100 objects in an object
  3521 +stream.
  3522 +
  3523 +Object stream contents consist of *N* pairs of integers, each of which
  3524 +is the object number and the byte offset of the object relative to the
  3525 +first object in the stream, followed by the objects themselves,
  3526 +concatenated.
  3527 +
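The layout described above can be sketched in Python as follows. This is an illustration only, not qpdf's implementation: ``split_object_stream`` is a hypothetical helper, it expects the already decoded (unfiltered) stream data, and it assumes the offsets in the header appear in increasing order so that each object ends where the next one begins.

```python
def split_object_stream(data, n, first):
    """Map object number -> raw bytes for each object in a decoded object stream.

    data:  decoded stream contents
    n:     value of /N (number of objects)
    first: value of /First (byte offset of the first object)
    """
    tokens = data[:first].split()
    if len(tokens) < 2 * n:
        raise ValueError("object stream header is shorter than /N requires")
    numbers = [int(tok) for tok in tokens[:2 * n]]
    # The header is N pairs: (object number, offset relative to /First).
    pairs = list(zip(numbers[0::2], numbers[1::2]))
    objects = {}
    for i, (objnum, offset) in enumerate(pairs):
        start = first + offset
        # Assumption: offsets increase, so the next offset marks the end.
        end = first + pairs[i + 1][1] if i + 1 < n else len(data)
        objects[objnum] = data[start:end].strip()
    return objects
```
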
  3528 +.. _ref.xref-streams:
  3529 +
  3530 +Cross-Reference Streams
  3531 +-----------------------
  3532 +
  3533 +For non-hybrid files, the value following ``startxref`` is the byte
  3534 +offset to the xref stream rather than the word ``xref``.
  3535 +
  3536 +For hybrid files (files containing both xref tables and cross-reference
  3537 +streams), the xref table's trailer dictionary contains the key
  3538 +``/XRefStm`` whose value is the byte offset to a cross-reference stream
  3539 +that supplements the xref table. A PDF 1.5-compliant application should
  3540 +read the xref table first. Then it should replace any object that it has
  3541 +already seen with any defined in the xref stream. Then it should follow
  3542 +any ``/Prev`` pointer in the original xref table's trailer dictionary.
  3543 +The specification is not clear about what should be done, if anything,
  3544 +with a ``/Prev`` pointer in the xref stream referenced by an xref table.
  3545 +The ``QPDF`` class ignores it, which is probably reasonable since, if
  3546 +this case were to appear for any sensible PDF file, the previous xref
  3547 +table would probably have a corresponding ``/XRefStm`` pointer of its
  3548 +own. For example, if a hybrid file were appended, the appended section
  3549 +would have its own xref table and ``/XRefStm``. The appended xref table
  3550 +would point to the previous xref table which would point to the
  3551 +``/XRefStm``, meaning that the new ``/XRefStm`` doesn't have to point to
  3552 +it.
  3553 +
  3554 +Since xref streams must be read very early, they may not be encrypted,
  3555 +and they may not contain indirect objects for keys required to read them,
  3556 +which are these:
  3557 +
  3558 +- ``/Type``: value ``/XRef``
  3559 +
  3560 +- ``/Size``: value *n+1*: where *n* is the highest object number (same as
  3561 + ``/Size`` in the trailer dictionary)
  3562 +
  3563 +- ``/Index`` (optional): value
  3564 + :samp:`[{n count} ...]` used to determine
  3565 + which objects' information is stored in this stream. The default is
  3566 + ``[0 /Size]``.
  3567 +
  3568 +- ``/Prev``: value *offset*: byte
  3569 + offset of previous xref stream (same as ``/Prev`` in the trailer
  3570 + dictionary)
  3571 +
  3572 +- ``/W [...]``: sizes of each field in the xref table
  3573 +
  3574 +The other fields in the xref stream, which may be indirect if desired,
  3575 +are the union of those from the xref table's trailer dictionary.
  3576 +
  3577 +.. _ref.xref-stream-data:
  3578 +
  3579 +Cross-Reference Stream Data
  3580 +~~~~~~~~~~~~~~~~~~~~~~~~~~~
  3581 +
  3582 +The stream data is binary and encoded in big-endian byte order. Entries
  3583 +are concatenated, and each entry has a length equal to the total of the
  3584 +entries in ``/W`` above. Each entry consists of one or more fields, the
  3585 +first of which is the type of the field. The number of bytes for each
  3586 +field is given by ``/W`` above. A 0 in ``/W`` indicates that the field
  3587 +is omitted and has the default value. The default value for the field
  3588 +type is "``1``". All other default values are "``0``".
  3589 +
  3590 +PDF 1.5 has three field types:
  3591 +
  3592 +- 0: for free objects. Format: ``0 obj
  3593 + next-generation``, same as the free table in a traditional
  3594 + cross-reference table
  3595 +
  3596 +- 1: regular non-compressed object. Format: ``1 offset
  3597 + generation``
  3598 +
  3599 +- 2: for objects in object streams. Format: ``2
  3600 + object-stream-number index``, the number of the object stream
  3601 + containing the object and the index within the object stream of the
  3602 + object.
  3603 +
  3604 +It seems standard to have the first entry in the table be ``0 0 0``
  3605 +instead of ``0 0 ffff`` if there are no deleted objects.
  3606 +
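To make the entry format concrete, here is a minimal Python sketch that decodes cross-reference stream data using ``/W`` and ``/Index`` as described above. The function name ``decode_xref_stream`` is hypothetical, the input is assumed to be the already decoded stream data, and this is not how the ``QPDF`` class is implemented.

```python
def decode_xref_stream(data, w, index):
    """Decode xref stream entries into (object number, type, field2, field3).

    data:  decoded stream contents
    w:     the three field widths from /W
    index: flat list of (first object number, count) pairs from /Index
    """
    defaults = (1, 0, 0)  # an omitted type field defaults to 1, others to 0
    pos = 0
    entries = []
    for start, count in zip(index[0::2], index[1::2]):
        for objnum in range(start, start + count):
            fields = []
            for width, default in zip(w, defaults):
                if width == 0:
                    fields.append(default)
                else:
                    # Fields are unsigned integers in big-endian byte order.
                    fields.append(int.from_bytes(data[pos:pos + width], "big"))
                    pos += width
            entries.append((objnum, fields[0], fields[1], fields[2]))
    return entries
```

For type 1 entries the second field is a byte offset and the third a generation; for type 2 entries they are the object stream number and the index within that stream.
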
  3607 +.. _ref.object-streams-linearization:
  3608 +
  3609 +Implications for Linearized Files
  3610 +---------------------------------
  3611 +
  3612 +For linearized files, the linearization dictionary, document catalog,
  3613 +and page objects may not be contained in object streams.
  3614 +
  3615 +Objects stored within object streams are given the highest range of
  3616 +object numbers within the main and first-page cross-reference sections.
  3617 +
  3618 +It is okay to use cross-reference streams in place of regular xref
  3619 +tables. There are no special considerations.
  3620 +
  3621 +Hint data refers to object streams themselves, not the objects in the
  3622 +streams. Shared object references should also be made to the object
  3623 +streams. There are no references in any hint tables to the object numbers
  3624 +of compressed objects (objects within object streams).
  3625 +
  3626 +When numbering objects, all shared objects within both the first and
  3627 +second halves of the linearized files must be numbered consecutively
  3628 +after all normal uncompressed objects in that half.
  3629 +
  3630 +.. _ref.object-stream-implementation:
  3631 +
  3632 +Implementation Notes
  3633 +--------------------
  3634 +
  3635 +There are three modes for writing object streams:
  3636 +:samp:`disable`, :samp:`preserve`, and
  3637 +:samp:`generate`. In disable mode, we do not generate
  3638 +any object streams, and we also generate an xref table rather than xref
  3639 +streams. This can be used to generate PDF files that are viewable with
  3640 +older readers. In preserve mode, we write object streams such that
  3641 +written object streams contain the same objects and ``/Extends``
  3642 +relationships as in the original file. This is equivalent to disable mode if the
  3643 +file has no object streams. In generate mode, we create object streams
  3644 +ourselves by grouping objects that are allowed in object streams
  3645 +together in sets of no more than 100 objects. We also ensure that the
  3646 +PDF version is at least 1.5 in generate mode, but we preserve the
  3647 +version header in the other modes. The default is
  3648 +:samp:`preserve`.
  3649 +
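The grouping step in generate mode can be pictured with the following Python sketch. It is an illustration under stated assumptions, not ``QPDFWriter``'s actual algorithm: the object descriptors are hypothetical dictionaries, and the eligibility test only encodes the general restrictions listed under Object Streams, not the extra ones for linearized or encrypted files.

```python
MAX_OBJECTS_PER_STREAM = 100  # limit stated in the text above


def is_eligible(obj):
    """True if the (hypothetical) object descriptor may go in an object stream."""
    return (
        not obj["is_stream"]
        and obj["generation"] == 0
        and not obj["is_encryption_dict"]
        and not obj["is_stream_length"]
    )


def plan_object_streams(objects, limit=MAX_OBJECTS_PER_STREAM):
    """Group eligible object numbers into batches of at most `limit`."""
    candidates = [obj["number"] for obj in objects if is_eligible(obj)]
    return [candidates[i:i + limit] for i in range(0, len(candidates), limit)]
```
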
  3650 +We do not support creation of hybrid files. When we write files, even in
  3651 +preserve mode, we will lose any xref tables and merge any appended
  3652 +sections.
  3653 +
  3654 +.. _ref.release-notes:
  3655 +
  3656 +Release Notes
  3657 +=============
  3658 +
  3659 +For a detailed list of changes, please see the file
  3660 +:file:`ChangeLog` in the source distribution.
  3661 +
  3662 +10.5.0: XXX Month dd, YYYY
  3663 + - Library Enhancements
  3664 +
  3665 + - Since qpdf version 8, using object accessor methods on an
  3666 + instance of ``QPDFObjectHandle`` may create warnings if the
  3667 + object is not of the expected type. These warnings now have an
  3668 + error code of ``qpdf_e_object`` instead of
  3669 + ``qpdf_e_damaged_pdf``. Also, comments have been added to
  3670 + :file:`QPDFObjectHandle.hh` to explain in
  3671 + more detail what the behavior is. See `Object Accessor
  3672 + Methods <#ref.object-accessors>`__ for a more in-depth
  3673 + discussion.
  3674 +
  3675 + - Overhaul error handling for the object handle functions in the
  3676 + C API. See comments in the "Object handling" section of
  3677 + :file:`include/qpdf/qpdf-c.h` for details.
  3678 + In particular, exceptions thrown by the underlying C++ code
  3679 + when calling object accessors are caught and converted into
  3680 + errors. The errors can be trapped by registering an error
  3681 + handler with ``qpdf_register_oh_error_handler`` or will be
  3682 + written to stderr if no handler is registered.
  3683 +
  3684 + - Add ``qpdf_get_last_string_length`` to the C API to get the
  3685 + length of the last string that was returned. This is needed to
  3686 + handle strings that contain embedded null characters.
  3687 +
  3688 + - Add ``qpdf_oh_is_initialized`` and
  3689 + ``qpdf_oh_new_uninitialized`` to the C API to make it possible
  3690 + to work with uninitialized objects.
  3691 +
  3692 + - Add ``qpdf_oh_new_object`` to the C API. This allows you to
  3693 + clone an object handle.
  3694 +
  3695 + - Add ``qpdf_get_object_by_id``, ``qpdf_make_indirect_object``,
  3696 + and ``qpdf_replace_object``, exposing the corresponding methods
  3697 + in ``QPDF`` and ``QPDFObjectHandle``.
  3698 +
  3699 +10.4.0: November 16, 2021
  3700 + - Handling of Weak Cryptography Algorithms
  3701 +
  3702 + - From the qpdf CLI, the
  3703 + :samp:`--allow-weak-crypto` option is now required to
  3704 + suppress a warning when explicitly creating PDF files using RC4
  3705 + encryption. While qpdf will always retain the ability to read
  3706 + and write such files, doing so will require explicit
  3707 + acknowledgment moving forward. For qpdf 10.4, this change only
  3708 + affects the command-line tool. Starting in qpdf 11, there will
  3709 + be small API changes to require explicit acknowledgment in
  3710 + those cases as well. For additional information, see `Weak
  3711 + Cryptography <#ref.weak-crypto>`__.
  3712 +
  3713 + - Bug Fixes
  3714 +
  3715 + - Fix potential bounds error when handling shell completion that
  3716 + could occur when given bogus input.
  3717 +
  3718 + - Properly handle overlay/underlay on completely empty pages
  3719 + (with no resource dictionary).
  3720 +
  3721 + - Fix crash that could occur under certain conditions when using
  3722 + :samp:`--pages` with files that had form
  3723 + fields.
  3724 +
  3725 + - Library Enhancements
  3726 +
  3727 + - Make ``QPDF::findPage`` functions public.
  3728 +
  3729 + - Add methods to ``Pl_Flate`` to be able to receive warnings on
  3730 + certain recoverable conditions.
  3731 +
  3732 + - Add an extra check to the library to detect when foreign
  3733 + objects are inserted directly (instead of using
  3734 + ``QPDF::copyForeignObject``) at the time of insertion rather
  3735 + than when the file is written. Catching the error sooner makes
  3736 + it much easier to locate the incorrect code.
  3737 +
  3738 + - CLI Enhancements
  3739 +
  3740 + - Improve diagnostics around parsing
  3741 + :samp:`--pages` command-line options
  3742 +
  3743 + - Packaging Changes
  3744 +
  3745 + - The Windows binary distribution is now built with crypto
  3746 + provided by OpenSSL 3.0.
  3747 +
  3748 +10.3.2: May 8, 2021
  3749 + - Bug Fixes
  3750 +
  3751 + - When generating a file while preserving object streams,
  3752 + unreferenced objects are correctly removed unless
  3753 + :samp:`--preserve-unreferenced` is specified.
  3754 +
  3755 + - Library Enhancements
  3756 +
  3757 + - When adding a page that already exists, make a shallow copy
  3758 + instead of throwing an exception. This makes the library
  3759 + behavior consistent with the CLI behavior. See
  3760 + :file:`ChangeLog` for additional notes.
  3761 +
  3762 +10.3.1: March 11, 2021
  3763 + - Bug Fixes
  3764 +
  3765 + - Form field copying failed on files where /DR was a direct
  3766 + object in the document-level form dictionary.
  3767 +
  3768 +10.3.0: March 4, 2021
  3769 + - Bug Fixes
  3770 +
  3771 + - The code for handling form fields when copying pages from
  3772 + 10.2.0 was not quite right and didn't work in a number of
  3773 + situations, such as when the same page was copied multiple
  3774 + times or when there were conflicting resource or field names
  3775 + across multiple copies. The 10.3.0 code has been much more
  3776 + thoroughly tested with more complex cases and with a multitude
  3777 + of readers and should be much closer to correct. The 10.2.0
  3778 + code worked well enough for page splitting or for copying pages
  3779 + with form fields into documents that didn't already have them
  3780 + but was still not quite correct in handling of field-level
  3781 + resources.
  3782 +
  3783 + - When ``QPDF::replaceObject`` or ``QPDF::swapObjects`` is
  3784 + called, existing ``QPDFObjectHandle`` instances no longer point
  3785 + to the old objects. The next time they are accessed, they
  3786 + automatically notice the change to the underlying object and
  3787 + update themselves. This resolves a very longstanding source of
  3788 + confusion, albeit in a very rarely used method call.
  3789 +
  3790 + - Fix form field handling code to look for default appearances,
  3791 + quadding, and default resources in the right places. The code
  3792 + was not looking for things in the document-level interactive
  3793 + form dictionary that it was supposed to be finding there. This
  3794 + required adding a few new methods to
  3795 + ``QPDFFormFieldObjectHelper``.
  3796 +
  3797 + - Library Enhancements
  3798 +
  3799 + - Reworked the code that handles copying annotations and form
  3800 + fields during page operations. There were additional methods
  3801 + added to the public API from 10.2.0 and one deprecation of a
  3802 + method added in 10.2.0. The majority of the API changes are in
  3803 + methods most people would never call and that will hopefully be
  3804 + superseded by higher-level interfaces for handling page copies.
  3805 + Please see the :file:`ChangeLog` file for
  3806 + details.
  3807 +
  3808 + - The method ``QPDF::numWarnings`` was added so that you can tell
  3809 + whether any warnings happened during a specific block of code.
  3810 +
  3811 +10.2.0: February 23, 2021
  3812 + - CLI Behavior Changes
  3813 +
  3814 + - Operations that work on combining pages are much better about
  3815 + protecting form fields. In particular,
  3816 + :samp:`--split-pages` and
  3817 + :samp:`--pages` now preserve interactive form
  3818 + functionality by copying the relevant form field information
  3819 + from the original files. Additionally, if you use
  3820 + :samp:`--pages` to select only some pages from
  3821 + the original input file, unused form fields are removed, which
  3822 + prevents lots of unused annotations from being retained.
  3823 +
  3824 + - By default, :command:`qpdf` no longer allows
  3825 + creation of encrypted PDF files whose user password is
  3826 + non-empty and owner password is empty when a 256-bit key is in
  3827 + use. The :samp:`--allow-insecure` option,
  3828 + specified inside the :samp:`--encrypt` options,
  3829 + allows creation of such files. Behavior changes in the CLI are
  3830 + avoided when possible, but an exception was made here because
  3831 + this is security-related. qpdf must always allow creation of
  3832 + weird files for testing purposes, but it should not default to
  3833 + letting users unknowingly create insecure files.
  3834 +
  3835 + - Library Behavior Changes
  3836 +
  3837 + - Note: the changes in this section cause differences in output
  3838 + in some cases. These differences change the syntax of the PDF
  3839 + but do not change the semantics (meaning). I make a strong
  3840 + effort to avoid gratuitous changes in qpdf's output so that
  3841 + qpdf changes don't break people's tests. In this case, the
  3842 + changes significantly improve the readability of the generated
  3843 + PDF and don't affect any output that's generated by simple
  3844 + transformation. If you are annoyed by having to update test
  3845 + files, please rest assured that changes like this have been and
  3846 + will continue to be rare events.
  3847 +
  3848 + - ``QPDFObjectHandle::newUnicodeString`` now uses whichever of
  3849 + ASCII, PDFDocEncoding, or UTF-16 is sufficient to encode all
  3850 + the characters in the string. This reduces needless encoding in
  3851 + UTF-16 of strings that can be encoded in ASCII. This change may
  3852 + cause qpdf to generate different output than before when form
  3853 + field values are set using ``QPDFFormFieldObjectHelper`` but
  3854 + does not change the meaning of the output.
  3855 +
  3856 + - The code that places form XObjects and also the code that
  3857 + flattens rotations trim trailing zeroes from real numbers that
  3858 + they calculate. This causes slight (but semantically
  3859 + equivalent) differences in generated appearance streams and
  3860 + form XObject invocations in overlay/underlay code or in user
  3861 + code that calls the methods that place form XObjects on a page.
  3862 +
  3863 + - CLI Enhancements
  3864 +
  3865 + - Add new command line options for listing, saving, adding,
  3866 + removing, and copying file attachments. See `Embedded
  3867 + Files/Attachments Options <#ref.attachments>`__ for details.
  3868 +
  3869 + - Page splitting and merging operations, as well as
  3870 + :samp:`--flatten-rotation`, are better behaved
  3871 + with respect to annotations and interactive form fields. In
  3872 + most cases, interactive form field functionality and proper
  3873 + formatting and functionality of annotations is preserved by
  3874 + these operations. There are still some cases that aren't
  3875 + perfect, such as when functionality of annotations depends on
  3876 + document-level data that qpdf doesn't yet understand or when
  3877 + there are problems with referential integrity among form fields
  3878 + and annotations (e.g., when a single form field object or its
  3879 + associated annotations are shared across multiple pages, a case
  3880 + that is out of spec but that works in most viewers anyway).
  3881 +
  3882 + - The option
  3883 + :samp:`--password-file={filename}`
  3884 + can now be used to read the decryption password from a file.
  3885 + You can use ``-`` as the file name to read the password from
  3886 + standard input. This is an easier/more obvious way to read
  3887 + passwords from files or standard input than using
  3888 + :samp:`@file` for this purpose.
  3889 +
  3890 + - Add some information about attachments to the json output, and
  3891 + add ``attachments`` as an additional json key. The
  3892 + information included here is limited to the preferred name and
  3893 + content stream and a reference to the file spec object. This is
  3894 + enough detail for clients to avoid the hassle of navigating a
  3895 + name tree and provides what is needed for basic enumeration and
  3896 + extraction of attachments. More detailed information can be
  3897 + obtained by following the reference to the file spec object.
  3898 +
  3899 + - Add numeric option to :samp:`--collate`. If
  3900 + :samp:`--collate={n}`
  3901 + is given, take pages in groups of
  3902 + *n* from the given files.
  3903 +
  3904 + - It is now valid to provide :samp:`--rotate=0`
  3905 + to clear rotation from a page.
  3906 +
  3907 + - Library Enhancements
  3908 +
  3909 + - This release includes numerous additions to the API. Not all
  3910 + changes are listed here. Please see the
  3911 + :file:`ChangeLog` file in the source
  3912 + distribution for a comprehensive list. Highlights appear below.
  3913 +
  3914 + - Add ``QPDFObjectHandle::ditems()`` and
  3915 + ``QPDFObjectHandle::aitems()`` that enable C++-style iteration,
  3916 + including range-for iteration, over dictionary and array
  3917 + QPDFObjectHandles. See comments in
  3918 + :file:`include/qpdf/QPDFObjectHandle.hh`
  3919 + and
  3920 + :file:`examples/pdf-name-number-tree.cc`
  3921 + for details.
  3922 +
  3923 + - Add ``QPDFObjectHandle::copyStream`` for making a copy of a
  3924 + stream within the same ``QPDF`` instance.
  3925 +
  3926 + - Add new helper classes for supporting file attachments, also
  3927 + known as embedded files. New classes are
  3928 + ``QPDFEmbeddedFileDocumentHelper``,
  3929 + ``QPDFFileSpecObjectHelper``, and ``QPDFEFStreamObjectHelper``.
  3930 + See their respective headers for details and
  3931 + :file:`examples/pdf-attach-file.cc` for an
  3932 + example.
  3933 +
  3934 + - Add a version of ``QPDFObjectHandle::parse`` that takes a
  3935 + ``QPDF`` pointer as context so that it can parse strings
  3936 + containing indirect object references. This is illustrated in
  3937 + :file:`examples/pdf-attach-file.cc`.
  3938 +
  3939 + - Re-implement ``QPDFNameTreeObjectHelper`` and
  3940 + ``QPDFNumberTreeObjectHelper`` to be more efficient, add an
  3941 + iterator-based API, give them the capability to repair broken
  3942 + trees, and create methods for modifying the trees. With this
  3943 + change, qpdf has a robust read/write implementation of name and
  3944 + number trees.
  3945 +
  3946 + - Add new versions of ``QPDFObjectHandle::replaceStreamData``
  3947 + that take ``std::function`` objects for cases when you need
  3948 + something between a static string and a full-fledged
  3949 + StreamDataProvider. Using this with ``QUtil::file_provider`` is
  3950 + a very easy way to create a stream from the contents of a file.
  3951 +
  3952 + - The ``QPDFMatrix`` class, formerly a private, internal class,
  3953 + has been added to the public API. See
  3954 + :file:`include/qpdf/QPDFMatrix.hh` for
  3955 + details. This class is for working with transformation
  3956 + matrices. Some methods in ``QPDFPageObjectHelper`` make use of
  3957 + this to make information about transformation matrices
  3958 + available. For an example, see
  3959 + :file:`examples/pdf-overlay-page.cc`.
  3960 +
  3961 + - Several new methods were added to
  3962 + ``QPDFAcroFormDocumentHelper`` for adding, removing, getting
  3963 + information about, and enumerating form fields.
  3964 +
  3965 + - Add method
  3966 + ``QPDFAcroFormDocumentHelper::transformAnnotations``, which
  3967 + applies a transformation to each annotation on a page.
  3968 +
  3969 + - Add ``QPDFPageObjectHelper::copyAnnotations``, which copies
  3970 + annotations and, if applicable, associated form fields, from
  3971 + one page to another, possibly transforming the rectangles.
  3972 +
  3973 + - Build Changes
  3974 +
  3975 + - A C++-14 compiler is now required to build qpdf. There is no
  3976 + intention to require anything newer than that for a while.
  3977 + C++-14 includes modest enhancements to C++-11 and appears to be
  3978 + supported about as widely as C++-11.
  3979 +
  3980 + - Bug Fixes
  3981 +
  3982 + - The :samp:`--flatten-rotation` option applies
  3983 + transformations to any annotations that may be on the page.
  3984 +
  3985 + - If a form XObject lacks a resources dictionary, consider any
  3986 + names in that form XObject to be referenced from the containing
  3987 + page. This is compliant with older PDF versions. Also detect if
  3988 + any form XObjects have any unresolved names and, if so, don't
  3989 + remove unreferenced resources from them or from the page that
  3990 + contains them. Unfortunately this has the side effect of
  3991 + preventing removal of unreferenced resources in some cases
  3992 + where names appear that don't refer to resources, such as with
  3993 + tagged PDF. This is a bit of a corner case that is not likely
  3994 + to cause a significant problem in practice, but the only side
  3995 + effect would be lack of removal of shared resources. A future
  3996 + version of qpdf may be more sophisticated in its detection of
  3997 + names that refer to resources.
  3998 +
  3999 + - Properly handle strings if they appear in inline image
  4000 + dictionaries while externalizing inline images.
  4001 +
  4002 +10.1.0: January 5, 2021
  4003 + - CLI Enhancements
  4004 +
  4005 + - Add :samp:`--flatten-rotation` command-line
  4006 + option, which causes all pages that are rotated using
  4007 + parameters in the page's dictionary to instead be identically
  4008 + rotated in the page's contents. The change is not user-visible
  4009 + for compliant PDF readers but can be used to work around broken
  4010 + PDF applications that don't properly handle page rotation.
  4011 +
  4012 + - Library Enhancements
  4013 +
  4014 + - Support for user-provided (pluggable, modular) stream filters.
  4015 + It is now possible to derive a class from ``QPDFStreamFilter``
  4016 + and register it with ``QPDF`` so that regular library methods,
  4017 + including those used by ``QPDFWriter``, can decode streams with
  4018 + filters not directly supported by the library. The example
  4019 + :file:`examples/pdf-custom-filter.cc`
  4020 + illustrates how to use this capability.
  4021 +
  4022 + - Add methods to ``QPDFPageObjectHelper`` to iterate through
  4023 + XObjects on a page or form XObjects, possibly recursing into
  4024 + nested form XObjects: ``forEachXObject``, ``forEachImage``,
  4025 + ``forEachFormXObject``.
  4026 +
  4027 + - Enhance several methods in ``QPDFPageObjectHelper`` to work
  4028 + with form XObjects as well as pages, as noted in comments. See
  4029 + :file:`ChangeLog` for a full list.
  4030 +
  4031 + - Rename some functions in ``QPDFPageObjectHelper``, while
  4032 + keeping old names for compatibility:
  4033 +
  4034 + - ``getPageImages`` to ``getImages``
  4035 +
  4036 + - ``filterPageContents`` to ``filterContents``
  4037 +
  4038 + - ``pipePageContents`` to ``pipeContents``
  4039 +
  4040 + - ``parsePageContents`` to ``parseContents``
  4041 +
  4042 + - Add method ``QPDFPageObjectHelper::getFormXObjects`` to return
  4043 + a map of form XObjects directly on a page or form XObject
  4044 +
  4045 + - Add new helper methods to ``QPDFObjectHandle``:
  4046 + ``isFormXObject``, ``isImage``
  4047 +
  4048 + - Add the optional ``allow_streams`` parameter to
  4049 + ``QPDFObjectHandle::makeDirect``. When
  4050 + ``QPDFObjectHandle::makeDirect`` is called in this way, it
  4051 + preserves references to streams rather than throwing an
  4052 + exception.
  4053 +
  4054 + - Add ``QPDFObjectHandle::setFilterOnWrite`` method. Calling this
  4055 + on a stream prevents ``QPDFWriter`` from attempting to
  4056 + uncompress, recompress, or otherwise filter a stream even if it
  4057 + could. Developers can use this to protect streams that are already
  4058 + optimized or that should be protected from ``QPDFWriter``'s default
  4059 + behavior for any other reason.
  4060 +
  4061 + - Add ``ostream`` ``<<`` operator for ``QPDFObjGen``. This is
  4062 + useful to have for debugging.
  4063 +
  4064 + - Add method ``QPDFPageObjectHelper::flattenRotation``, which
  4065 + replaces a page's ``/Rotate`` keyword by rotating the page
  4066 + within the content stream and altering the page's bounding
  4067 + boxes so the rendering is the same. This can be used to work
  4068 + around buggy PDF readers that can't properly handle page
  4069 + rotation.
  4070 +
  4071 + - C API Enhancements
  4072 +
  4073 + - Add several new functions to the C API for working with
  4074 + objects. These are wrappers around many of the methods in
  4075 + ``QPDFObjectHandle``. Their inclusion adds considerable new
  4076 + capability to the C API.
  4077 +
  4078 + - Add ``qpdf_register_progress_reporter`` to the C API,
  4079 + corresponding to ``QPDFWriter::registerProgressReporter``.
  4080 +
  4081 + - Performance Enhancements
  4082 +
  4083 + - Improve steps ``QPDFWriter`` takes to prepare a ``QPDF`` object
  4084 + for writing, resulting in about an 8% improvement in write
  4085 + performance while allowing indirect objects to appear in
  4086 + ``/DecodeParms``.
  4087 +
  4088 + - When extracting pages, the @1@command@1@qpdf@2@command@2@ CLI
  4089 + only removes unreferenced resources from the pages that are
  4090 + being kept, resulting in a significant performance improvement
  4091 + when extracting small numbers of pages from large, complex
  4092 + documents.
  4093 +
  4094 + - Bug Fixes
  4095 +
  4096 + - ``QPDFPageObjectHelper::externalizeInlineImages`` was not
  4097 + externalizing images referenced from form XObjects that
  4098 + appeared on the page.
  4099 +
  4100 + - ``QPDFObjectHandle::filterPageContents`` was broken for pages
  4101 + with multiple content streams.
  4102 +
  4103 + - Tweak zsh completion code to behave a little better with
  4104 + respect to path completion.
  4105 +
  4106 +10.0.4: November 21, 2020
  4107 + - Bug Fixes
  4108 +
  4109 + - Fix a handful of integer overflows. This includes cases found
  4110 + by fuzzing as well as having qpdf not do range checking on
  4111 + unused values in the xref stream.
  4112 +
  4113 +10.0.3: October 31, 2020
  4114 + - Bug Fixes
  4115 +
  4116 + - The fix to the bug involving copying streams with indirect
  4117 + filters was incorrect and introduced a new, more serious bug.
  4118 + The original bug has been fixed correctly, as has the bug
  4119 + introduced in 10.0.2.
  4120 +
  4121 +10.0.2: October 27, 2020
  4122 + - Bug Fixes
  4123 +
  4124 + - When concatenating content streams, as with
  4125 + @1@option@1@--coalesce-contents@2@option@2@, there were cases
  4126 + in which qpdf would merge two lexical tokens together, creating
  4127 + invalid results. A newline is now inserted between merged
  4128 + content streams if one is not already present.
  4129 +
  4130 + - Fix an internal error that could occur when copying foreign
  4131 + streams whose stream data had been replaced using a stream data
  4132 + provider if those streams had indirect filters or decode
  4133 + parameters. This is a rare corner case.
  4134 +
  4135 + - Ensure that the caller's locale settings do not change the
  4136 + results of numeric conversions performed internally by the qpdf
  4137 + library. Note that the problem here could only be caused when
  4138 + the qpdf library was used programmatically. Using the qpdf CLI
  4139 + already ignored the user's locale for numeric conversion.
  4140 +
  4141 + - Fix several instances in which warnings were not suppressed in
  4142 + spite of @1@option@1@--no-warn@2@option@2@ and/or errors or
  4143 + warnings were written to standard output rather than standard
  4144 + error.
  4145 +
  4146 + - Fixed a memory leak that could occur under specific
  4147 + circumstances when
  4148 + @1@option@1@--object-streams=generate@2@option@2@ was used.
  4149 +
  4150 + - Fix various integer overflows and similar conditions found by
  4151 + the OSS-Fuzz project.
  4152 +
  4153 + - Enhancements
  4154 +
  4155 + - New option @1@option@1@--warning-exit-0@2@option@2@ causes qpdf
  4156 + to exit with a status of ``0`` rather than ``3`` if there are
  4157 + warnings but no errors. Combine with
  4158 + @1@option@1@--no-warn@2@option@2@ to completely ignore
  4159 + warnings.
  4160 +
  4161 + - Performance improvements have been made to
  4162 + ``QPDF::processMemoryFile``.
  4163 +
  4164 + - The OpenSSL crypto provider produces more detailed error
  4165 + messages.
  4166 +
  4167 + - Build Changes
  4168 +
  4169 + - The option @1@option@1@--disable-rpath@2@option@2@ is now
  4170 + supported by qpdf's @1@command@1@./configure@2@command@2@
  4171 + script. Some distributions' packaging standards recommend the
  4172 + use of this option.
  4173 +
  4174 + - Selection of a printf format string for ``long
  4175 + long`` has been moved from ``ifdefs`` to an autoconf
  4176 + test. If you are using your own build system, you will need to
  4177 + provide a value for ``LL_FMT`` in
  4178 + @1@filename@1@libqpdf/qpdf/qpdf-config.h@2@filename@2@, which
  4179 + would typically be ``"%lld"`` or, for some Windows compilers,
  4180 + ``"%I64d"``.
  4181 +
  4182 + - Several improvements were made to build-time configuration of
  4183 + the OpenSSL crypto provider.
  4184 +
  4185 + - A nearly stand-alone Linux binary zip file is now included with
  4186 + the qpdf release. This is built on an older (but supported)
  4187 + Ubuntu LTS release, but would work on most reasonably recent
  4188 + Linux distributions. It contains only the executables and
  4189 + required shared libraries that would not be present on a
  4190 + minimal system. It can be used for including qpdf in a minimal
  4191 + environment, such as a docker container. The zip file is also
  4192 + known to work as a layer in AWS Lambda.
  4193 +
  4194 + - QPDF's automated build has been migrated from Azure Pipelines
  4195 + to GitHub Actions.
  4196 +
  4197 + - Windows-specific Changes
  4198 +
  4199 + - The Windows executables distributed with qpdf releases now use
  4200 + the OpenSSL crypto provider by default. The native crypto
  4201 + provider is also compiled in and can be selected at runtime
  4202 + with the ``QPDF_CRYPTO_PROVIDER`` environment variable.
  4203 +
  4204 + - Improvements have been made to how a cryptographic provider is
  4205 + obtained in the native Windows crypto implementation. However,
  4206 + this is mostly shadowed by OpenSSL being used by default.
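
The 10.0.2 fix for merged content streams can be illustrated with a minimal sketch. The helper name ``join_content_streams`` is hypothetical and is not part of the qpdf API; it only shows why a separator is needed. Without one, a stream ending in ``10`` followed by a stream starting with ``10`` would yield the single token ``1010``.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper, not a qpdf function: concatenate the data of
// several content streams, inserting a newline between two streams
// when the accumulated data does not already end with one.
std::string
join_content_streams(std::vector<std::string> const& streams)
{
    std::string result;
    for (auto const& s : streams) {
        if (!result.empty() && result.back() != '\n') {
            // Keep the last token of the previous stream separate
            // from the first token of the next one.
            result += '\n';
        }
        result += s;
    }
    return result;
}
```

For example, joining ``"0 0 m 10"`` and ``"10 l S"`` this way keeps the two ``10`` operands distinct, while streams that already end in a newline are left unchanged.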
  4207 +
  4208 +10.0.1: April 9, 2020
  4209 + - Bug Fixes
  4210 +
  4211 + - 10.0.0 introduced a bug in which calling
  4212 + ``QPDFObjectHandle::getStreamData`` on a stream that can't be
  4213 + filtered was returning the raw data instead of throwing an
  4214 + exception. This is now fixed.
  4215 +
  4216 + - Fix a bug that was preventing qpdf from linking with some
  4217 + versions of clang on some platforms.
  4218 +
  4219 + - Enhancements
  4220 +
  4221 + - Improve the @1@filename@1@pdf-invert-images@2@filename@2@
  4222 + example to avoid having to load all the images into RAM at the
  4223 + same time.
  4224 +
  4225 +10.0.0: April 6, 2020
  4226 + - Performance Enhancements
  4227 +
  4228 + - The qpdf library and executable should run much faster in this
  4229 + version than in the last several releases. Several internal
  4230 + library optimizations have been made, and there has been
  4231 + improved behavior on page splitting as well. This version of
  4232 + qpdf should outperform any of the 8.x or 9.x versions.
  4233 +
  4234 + - Incompatible API (source-level) Changes (minor)
  4235 +
  4236 + - The ``QUtil::srandom`` method was removed. It didn't do
  4237 + anything unless insecure random numbers were compiled in, and
  4238 + they have been off by default for a long time. If you were
  4239 + calling it, just remove the call since it wasn't doing anything
  4240 + anyway.
  4241 +
  4242 + - Build/Packaging Changes
  4243 +
  4244 + - Add an ``openssl`` crypto provider, which is implemented with
  4245 + OpenSSL and also works with BoringSSL. Thanks to Dean Scarff
  4246 + for this contribution. If you maintain qpdf for a distribution,
  4247 + pay special attention to make sure that you are including
  4248 + support for the crypto providers you want. Package maintainers
  4249 + will have to weigh the advantages of allowing users to pick a
  4250 + crypto provider at runtime against the disadvantages of adding
  4251 + more dependencies to qpdf.
  4252 +
  4253 + - Allow qpdf to be built on stripped down systems whose C/C++
  4254 + libraries lack the ``wchar_t`` type. Search for ``wchar_t`` in
  4255 + qpdf's README.md for details. This should be very rare, but it
  4256 + is known to be helpful in some embedded environments.
  4257 +
  4258 + - CLI Enhancements
  4259 +
  4260 + - Add ``objectinfo`` key to the JSON output. This will be a place
  4261 + to put computed metadata or other information about PDF objects
  4262 + that are not immediately evident in other ways or that seem
  4263 + useful for some other reason. In this version, information is
  4264 + provided about each object indicating whether it is a stream
  4265 + and, if so, what its length and filters are. Without this, it
  4266 + was not possible to tell conclusively from the JSON output
  4267 + alone whether or not an object was a stream. Run
  4268 + @1@command@1@qpdf --json-help@2@command@2@ for details.
  4269 +
  4270 + - Add new option
  4271 + @1@option@1@--remove-unreferenced-resources@2@option@2@ which
  4272 + takes ``auto``, ``yes``, or ``no`` as arguments. The new
  4273 + ``auto`` mode, which is the default, performs a fast heuristic
  4274 + over a PDF file when splitting pages to determine whether the
  4275 + expensive process of finding and removing unreferenced
  4276 + resources is likely to be of benefit. For most files, this new
  4277 + default will result in a significant performance improvement
  4278 + for splitting pages. See `Advanced Transformation
  4279 + Options <#ref.advanced-transformation>`__ for a more detailed
  4280 + discussion.
  4281 +
  4282 + - The @1@option@1@--preserve-unreferenced-resources@2@option@2@
  4283 + is now just a synonym for
  4284 + @1@option@1@--remove-unreferenced-resources=no@2@option@2@.
  4285 +
  4286 + - If the ``QPDF_EXECUTABLE`` environment variable is set when
  4287 + invoking @1@command@1@qpdf --completion-bash@2@command@2@ or
  4288 + @1@command@1@qpdf --completion-zsh@2@command@2@, the completion
  4289 + command that it outputs will refer to qpdf using the value of
  4290 + that variable rather than what @1@command@1@qpdf@2@command@2@
  4291 + determines its executable path to be. This can be useful when
  4292 + wrapping @1@command@1@qpdf@2@command@2@ with a script, working
  4293 + with a version in the source tree, using an AppImage, or other
  4294 + situations where there is some indirection.
  4295 +
  4296 + - Library Enhancements
  4297 +
  4298 + - Random number generation is now delegated to the crypto
  4299 + provider. The old behavior is still used by the native crypto
  4300 + provider. It is still possible to provide your own random
  4301 + number generator.
  4302 +
  4303 + - Add a new version of
  4304 + ``QPDFObjectHandle::StreamDataProvider::provideStreamData``
  4305 + that accepts the ``suppress_warnings`` and ``will_retry``
  4306 + options and allows a success code to be returned. This makes it
  4307 + possible to implement a ``StreamDataProvider`` that calls
  4308 + ``pipeStreamData`` on another stream and to pass the response
  4309 + back to the caller, which enables better error handling on
  4310 + those proxied streams.
  4311 +
  4312 + - Update ``QPDFObjectHandle::pipeStreamData`` to return an
  4313 + overall success code that goes beyond whether or not filtered
  4314 + data was written successfully. This allows better error
  4315 + handling of cases that were not filtering errors. You have to
  4316 + call this explicitly. Methods in previously existing APIs have
  4317 + the same semantics as before.
  4318 +
  4319 + - The ``QPDFPageObjectHelper::placeFormXObject`` method now
  4320 + allows separate control over whether it should be willing to
  4321 + shrink or expand objects to fit them better into the
  4322 + destination rectangle. The previous behavior was that shrinking
  4323 + was allowed but expansion was not. The previous behavior is
  4324 + still the default.
  4325 +
  4326 + - When calling the C API, any non-zero value passed to a boolean
  4327 + parameter is treated as ``TRUE``. Previously only the value
  4328 + ``1`` was accepted. This makes the C API behave more like most
  4329 + C interfaces and is known to improve compatibility with some
  4330 + Windows environments that dynamically load the DLL and call
  4331 + functions from it.
  4332 +
  4333 + - Add ``QPDFObjectHandle::unsafeShallowCopy`` for copying only
  4334 + top-level dictionary keys or array items. This is unsafe
  4335 + because it creates a situation in which changing a lower-level
  4336 + item in one object may also change it in another object, but
  4337 + for cases in which you *know* you are only inserting or
  4338 + replacing top-level items, it is much faster than
  4339 + ``QPDFObjectHandle::shallowCopy``.
  4340 +
  4341 + - Add ``QPDFObjectHandle::filterAsContents``, which filters a
  4342 + stream's data as a content stream. This is useful for parsing
  4343 + the contents for form XObjects in the same way as parsing page
  4344 + content streams.
  4345 +
  4346 + - Bug Fixes
  4347 +
  4348 + - When detecting and removing unreferenced resources during page
  4349 + splitting, traverse into form XObjects and handle their
  4350 + resources dictionaries as well.
  4351 +
  4352 + - The same error recovery is applied to streams in files other
  4353 + than the primary input file when merging or splitting pages.
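
The aliasing hazard described for ``QPDFObjectHandle::unsafeShallowCopy`` in 10.0.0 can be sketched with a toy model. The ``Node`` and ``Dict`` types and the ``shallow_copy_top_level`` function below are hypothetical stand-ins, not qpdf classes; the sketch assumes nested items are held by shared pointer, which is only an approximation of how the library manages objects.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Toy stand-in for a PDF dictionary (hypothetical, not a qpdf type).
// Values are held by shared pointer, so copying the map copies the
// pointers and leaves the nested nodes shared.
struct Node;
using Dict = std::map<std::string, std::shared_ptr<Node>>;

struct Node
{
    int value = 0;
    Dict children;
};

// Copy only the top-level keys. Nested nodes remain shared with the
// source, which is what makes this kind of copy fast but unsafe.
Dict
shallow_copy_top_level(Dict const& source)
{
    return source;
}
```

Adding or replacing a top-level key in the copy leaves the original untouched, but modifying a nested node through the copy is visible through the original as well. That is the situation the release note warns about.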
  4354 +
  4355 +9.1.1: January 26, 2020
  4356 + - Build/Packaging Changes
  4357 +
  4358 + - The fix-qdf program was converted from perl to C++. As such,
  4359 + qpdf no longer has a runtime dependency on perl.
  4360 +
  4361 + - Library Enhancements
  4362 +
  4363 + - Added new helper routine ``QUtil::call_main_from_wmain`` which
  4364 + converts ``wchar_t`` arguments to UTF-8 encoded strings. This
  4365 + is useful for qpdf because library methods expect file names to
  4366 + be UTF-8 encoded, even on Windows.
  4367 +
  4368 + - Added new ``QUtil::read_lines_from_file`` methods that take
  4369 + ``FILE*`` arguments and that allow preservation of end-of-line
  4370 + characters. This also fixes a bug where
  4371 + ``QUtil::read_lines_from_file`` wouldn't work properly with
  4372 + Unicode filenames.
  4373 +
  4374 + - CLI Enhancements
  4375 +
  4376 + - Added options @1@option@1@--is-encrypted@2@option@2@ and
  4377 + @1@option@1@--requires-password@2@option@2@ for testing whether
  4378 + a file is encrypted or requires a password other than the
  4379 + supplied (or empty) password. These communicate via exit
  4380 + status, making them useful for shell scripts. They also work on
  4381 + encrypted files with unknown passwords.
  4382 +
  4383 + - Added ``encrypt`` key to JSON options. With the exception of
  4384 + the reconstructed user password for older encryption formats,
  4385 + this provides the same information as
  4386 + @1@option@1@--show-encryption@2@option@2@ but in a consistent,
  4387 + parseable format. See output of @1@command@1@qpdf
  4388 + --json-help@2@command@2@ for details.
  4389 +
  4390 + - Bug Fixes
  4391 +
  4392 + - In QDF mode, be sure not to write more than one XRef stream to
  4393 + a file, even when
  4394 + @1@option@1@--preserve-unreferenced@2@option@2@ is used.
  4395 + @1@command@1@fix-qdf@2@command@2@ assumes that there is only
  4396 + one XRef stream, and that it appears at the end of the file.
  4397 +
  4398 + - When externalizing inline images, properly handle images whose
  4399 + color space is a reference to an object in the page's resource
  4400 + dictionary.
  4401 +
  4402 + - Windows-specific fix for acquiring crypt context with a new
  4403 + keyset.
  4404 +
  4405 +9.1.0: November 17, 2019
  4406 + - Build Changes
  4407 +
  4408 + - A C++11 compiler is now required to build qpdf.
  4409 +
  4410 + - A new crypto provider that uses gnutls for crypto functions is
  4411 + now available and can be enabled at build time. See `Crypto
  4412 + Providers <#ref.crypto>`__ for more information about crypto
  4413 + providers and `Build Support For Crypto
  4414 + Providers <#ref.crypto.build>`__ for specific information about
  4415 + the build.
  4416 +
  4417 + - Library Enhancements
  4418 +
  4419 + - Incorporate contribution from Masamichi Hosoda to properly
  4420 + handle signature dictionaries by not including them in object
  4421 + streams, formatting the ``Contents`` key as a hexadecimal
  4422 + string, and excluding the ``/Contents`` key from encryption and
  4423 + decryption.
  4424 +
  4425 + - Incorporate contribution from Masamichi Hosoda to provide new
  4426 + API calls for getting file-level information about input and
  4427 + output files, enabling certain operations on the files at the
  4428 + file level rather than the object level. New methods include
  4429 + ``QPDF::getXRefTable()``,
  4430 + ``QPDFObjectHandle::getParsedOffset()``,
  4431 + ``QPDFWriter::getRenumberedObjGen(QPDFObjGen)``, and
  4432 + ``QPDFWriter::getWrittenXRefTable()``.
  4433 +
  4434 + - Support build-time and runtime selectable crypto providers.
  4435 + This includes the addition of new classes
  4436 + ``QPDFCryptoProvider`` and ``QPDFCryptoImpl`` and the
  4437 + recognition of the ``QPDF_CRYPTO_PROVIDER`` environment
  4438 + variable. Crypto providers are described in depth in `Crypto
  4439 + Providers <#ref.crypto>`__.
  4440 +
  4441 + - CLI Enhancements
  4442 +
  4443 + - Addition of the @1@option@1@--show-crypto@2@option@2@ option in
  4444 + support of selectable crypto providers, as described in `Crypto
  4445 + Providers <#ref.crypto>`__.
  4446 +
  4447 + - Allow ``:even`` or ``:odd`` to be appended to numeric ranges
  4448 + for specification of the even or odd pages from among the pages
  4449 + specified in the range.
  4450 +
  4451 + - Fix shell wildcard expansion behavior (``*`` and ``?``) of the
  4452 + @1@command@1@qpdf.exe@2@command@2@ as built by MSVC.
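
The ``:even`` and ``:odd`` range suffixes added in 9.1.0 can be sketched as a selection over an already expanded range. This is an illustration only: the function name is hypothetical, and the sketch assumes that "even" and "odd" refer to positions within the range, counted from 1, rather than to the page numbers themselves.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Parity { odd, even };

// Hypothetical helper, not qpdf code: keep every other page from an
// expanded page range. Position 1 is the first page in the range, so
// index 0 counts as "odd" and index 1 as "even" (an assumption).
std::vector<int>
select_by_position(std::vector<int> const& pages, Parity parity)
{
    std::vector<int> result;
    std::size_t start = (parity == Parity::odd) ? 0 : 1;
    for (std::size_t i = start; i < pages.size(); i += 2) {
        result.push_back(pages[i]);
    }
    return result;
}
```

Under that assumption, the range 3 through 7 with ``:even`` would select pages 4 and 6, and with ``:odd`` would select pages 3, 5, and 7.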
  4453 +
  4454 +9.0.2: October 12, 2019
  4455 + - Bug Fix
  4456 +
  4457 + - Fix the name of the temporary file used by
  4458 + @1@option@1@--replace-input@2@option@2@ so that it doesn't
  4459 + require path splitting and works with paths that include
  4460 + directories.
  4461 +
  4462 +9.0.1: September 20, 2019
  4463 + - Bug Fixes/Enhancements
  4464 +
  4465 + - Fix some build and test issues on big-endian systems and
  4466 + compilers with characters that are unsigned by default. The
  4467 + problems were in build and test only. There were no actual bugs
  4468 + in the qpdf library itself relating to endianness or unsigned
  4469 + characters.
  4470 +
  4471 + - When a dictionary has a duplicated key, report this with a
  4472 + warning. The behavior of the library in this case is unchanged,
  4473 + but the error condition is no longer silently ignored.
  4474 +
  4475 + - When a form field's display rectangle is erroneously specified
  4476 + with inverted coordinates, detect and correct this situation.
  4477 + This prevents some form fields from being flipped when flattening
  4478 + annotations on files with this condition.
  4479 +
  4480 +9.0.0: August 31, 2019
  4481 + - Incompatible API (source-level) Changes (minor)
  4482 +
  4483 + - The method ``QUtil::strcasecmp`` has been renamed to
  4484 + ``QUtil::str_compare_nocase``. This incompatible change is
  4485 + necessary to enable qpdf to build on platforms that define
  4486 + ``strcasecmp`` as a macro.
  4487 +
  4488 + - The ``QPDF::copyForeignObject`` method had an overloaded
  4489 + version that took a boolean parameter that was not used. If you
  4490 + were using this version, just omit the extra parameter.
  4491 +
  4492 + - There was a version of ``QPDFTokenizer::expectInlineImage`` that
  4493 + took no arguments. This version has been removed since it
  4494 + caused the tokenizer to return incorrect inline images. A new
  4495 + version was added some time ago that produces correct output.
  4496 + This is a very low level method that doesn't make sense to call
  4497 + outside of qpdf's lexical engine. There are higher level
  4498 + methods for tokenizing content streams.
  4499 +
  4500 + - Change ``QPDFOutlineDocumentHelper::getTopLevelOutlines`` and
  4501 + ``QPDFOutlineObjectHelper::getKids`` to return a
  4502 + ``std::vector`` instead of a ``std::list`` of
  4503 + ``QPDFOutlineObjectHelper`` objects.
  4504 +
  4505 + - Remove method ``QPDFTokenizer::allowPoundAnywhereInName``. This
  4506 + function would allow creation of name tokens whose value would
  4507 + change when unparsed, which is never the correct behavior.
  4508 +
  4509 + - CLI Enhancements
  4510 +
  4511 + - The @1@option@1@--replace-input@2@option@2@ option may be given
  4512 + in place of an output file name. This causes qpdf to overwrite
  4513 + the input file with the output. See the description of
  4514 + @1@option@1@--replace-input@2@option@2@ in `Basic
  4515 + Options <#ref.basic-options>`__ for more details.
  4516 +
  4517 + - The @1@option@1@--recompress-flate@2@option@2@ instructs
  4518 + @1@command@1@qpdf@2@command@2@ to recompress streams that are
  4519 + already compressed with ``/FlateDecode``. Useful with
  4520 + @1@option@1@--compression-level@2@option@2@.
  4521 +
  4522 + - The
  4523 + @1@option@1@--compression-level=@1@replaceable@1@level@2@replaceable@2@@2@option@2@
  4524 + sets the zlib compression level used for any streams compressed
  4525 + by ``/FlateDecode``. Most effective when combined with
  4526 + @1@option@1@--recompress-flate@2@option@2@.
  4527 +
  4528 + - Library Enhancements
  4529 +
  4530 + - A new namespace ``QIntC``, provided by
  4531 + @1@filename@1@qpdf/QIntC.hh@2@filename@2@, provides safe
  4532 + conversion methods between different integer types. These
  4533 + conversion methods do range checking to ensure that the cast
  4534 + can be performed with no loss of information. Every use of
  4535 + ``static_cast`` in the library was inspected to see if it could
  4536 + use one of these safe converters instead. See `Casting
  4537 + Policy <#ref.casting>`__ for additional details.
  4538 +
  4539 + - Method ``QPDF::anyWarnings`` tells whether there have been any
  4540 + warnings without clearing the list of warnings.
  4541 +
  4542 + - Method ``QPDF::closeInputSource`` closes or otherwise releases
  4543 + the input source. This enables the input file to be deleted or
  4544 + renamed.
  4545 +
  4546 + - New methods have been added to ``QUtil`` for converting back
  4547 + and forth between strings and unsigned integers:
  4548 + ``uint_to_string``, ``uint_to_string_base``,
  4549 + ``string_to_uint``, and ``string_to_ull``.
  4550 +
  4551 + - New methods have been added to ``QPDFObjectHandle`` that return
  4552 + the value of ``Integer`` objects as ``int`` or ``unsigned int``
  4553 + with range checking and sensible fallback values, and a new
  4554 + method was added to return an unsigned value. This makes it
  4555 + easier to write code that is safe from unintentional data loss.
  4556 + Functions: ``getUIntValue``, ``getIntValueAsInt``,
  4557 + ``getUIntValueAsUInt``.
  4558 +
  4559 + - When parsing content streams with
  4560 + ``QPDFObjectHandle::ParserCallbacks``, in place of the method
  4561 + ``handleObject(QPDFObjectHandle)``, the developer may override
  4562 + ``handleObject(QPDFObjectHandle, size_t offset,
  4563 + size_t length)``. If this method is defined, it will
  4564 + be invoked with the object along with its offset and length
  4565 + within the overall contents being parsed. Intervening spaces
  4566 + and comments are not included in offset and length.
  4567 + Additionally, a new method ``contentSize(size_t)`` may be
  4568 + implemented. If present, it will be called prior to the first
  4569 + call to ``handleObject`` with the total size in bytes of the
  4570 + combined contents.
  4571 +
  4572 + - New methods ``QPDF::userPasswordMatched`` and
  4573 + ``QPDF::ownerPasswordMatched`` have been added to enable a
  4574 + caller to determine whether the supplied password was the user
  4575 + password, the owner password, or both. This information is also
  4576 + displayed by @1@command@1@qpdf --show-encryption@2@command@2@
  4577 + and @1@command@1@qpdf --check@2@command@2@.
  4578 +
  4579 + - Static method ``Pl_Flate::setCompressionLevel`` can be called
  4580 + to set the zlib compression level globally used by all
  4581 + instances of ``Pl_Flate`` in deflate mode.
  4582 +
  4583 + - The method ``QPDFWriter::setRecompressFlate`` can be called to
  4584 + tell ``QPDFWriter`` to uncompress and recompress streams
  4585 + already compressed with ``/FlateDecode``.
  4586 +
  4587 + - The underlying implementation of QPDF arrays has been enhanced
  4588 + to be much more memory efficient when dealing with arrays with
  4589 + lots of nulls. This enables qpdf to use drastically less memory
  4590 + for certain types of files.
  4591 +
  4592 + - When traversing the pages tree, if nodes are encountered with
  4593 + invalid types, the types are fixed, and a warning is issued.
  4594 +
  4595 + - A new helper method ``QUtil::read_file_into_memory`` was added.
  4596 +
  4597 + - All conditions previously reported by
  4598 + ``QPDF::checkLinearization()`` as errors are now presented as
  4599 + warnings.
  4600 +
  4601 + - Name tokens containing the ``#`` character not preceded by two
  4602 + hexadecimal digits, which is invalid in PDF 1.2 and above, are
  4603 + properly handled by the library: a warning is generated, and
  4604 + the name token is properly preserved, even if invalid, in the
  4605 + output. See @1@filename@1@ChangeLog@2@filename@2@ for a more
  4606 + complete description of this change.
  4607 +
  4608 + - Bug Fixes
  4609 +
  4610 + - A small handful of memory issues, assertion failures, and
  4611 + unhandled exceptions that could occur on badly mangled input
  4612 + files have been fixed. Most of these problems were found by
  4613 + Google's OSS-Fuzz project.
  4614 +
  4615 + - When @1@command@1@qpdf --check@2@command@2@ or
  4616 + @1@command@1@qpdf --check-linearization@2@command@2@ encounters
  4617 + a file with linearization warnings but not errors, it now
  4618 + properly exits with exit code 3 instead of 2.
  4619 +
  4620 + - The @1@option@1@--completion-bash@2@option@2@ and
  4621 + @1@option@1@--completion-zsh@2@option@2@ options now work
  4622 + properly when qpdf is invoked as an AppImage.
  4623 +
  4624 + - Calling ``QPDFWriter::set*EncryptionParameters`` on a
  4625 + ``QPDFWriter`` object whose output filename has not yet been
  4626 + set no longer produces a segmentation fault.
  4627 +
  4628 + - When reading encrypted files, follow the spec more closely
  4629 + regarding encryption key length. This allows qpdf to open
  4630 + encrypted files in most cases when they have invalid or missing
  4631 + ``/Length`` keys in the encryption dictionary.
  4632 +
  4633 + - Build Changes
  4634 +
  4635 + - On platforms that support it, qpdf now builds with
  4636 + @1@option@1@-fvisibility=hidden@2@option@2@. If you build qpdf
  4637 + with your own build system, this is now safe to use. This
  4638 + prevents methods that are not part of the public API from being
  4639 + exported by the shared library, and makes qpdf's ELF shared
  4640 + libraries (used on Linux, MacOS, and most other UNIX flavors)
  4641 + behave more like the Windows DLL. Since the DLL already behaves
  4642 + in much this way, it is unlikely that there are any methods
  4643 + that were accidentally not exported. However, with ELF shared
  4644 + libraries, typeinfo for some classes has to be explicitly
  4645 + exported. If there are problems in dynamically linked code
  4646 + catching exceptions or subclassing, this could be the reason.
  4647 + If you see this, please report a bug at
  4648 + https://github.com/qpdf/qpdf/issues/.
  4649 +
  4650 + - QPDF is now compiled with integer conversion and sign
  4651 + conversion warnings enabled. Numerous changes were made to the
  4652 + library to make this safe.
  4653 +
  4654 + - QPDF's @1@command@1@make install@2@command@2@ target explicitly
  4655 + specifies the mode to use when installing files instead of
  4656 + relying on the user's umask. It was previously doing this for some
  4657 + files but not others.
  4658 +
  4659 + - If @1@command@1@pkg-config@2@command@2@ is available, use it to
  4660 + locate @1@filename@1@libjpeg@2@filename@2@ and
  4661 + @1@filename@1@zlib@2@filename@2@ dependencies, falling back on
  4662 + old behavior if unsuccessful.
  4663 +
  4664 + - Other Notes
  4665 +
  4666 + - QPDF has been fully integrated into `Google's OSS-Fuzz
  4667 + project <https://github.com/google/oss-fuzz>`__. This project
  4668 + exercises code with randomly mutated inputs and is great for
  4669 + discovering hidden crashes and security issues.
  4670 + Several bugs found by oss-fuzz have already been fixed in qpdf.
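
The idea behind the range-checked conversions in 9.0.0 can be sketched as follows. This is not the ``QIntC`` implementation; ``checked_cast`` is a hypothetical name, and the sketch uses a round-trip comparison plus a sign check to detect loss of information.

```cpp
#include <cassert>
#include <stdexcept>
#include <type_traits>

// Hypothetical range-checked integer conversion, not the QIntC code.
// The value is converted, converted back, and compared; a sign check
// catches cases such as -1 becoming a large unsigned value.
template <typename To, typename From>
To
checked_cast(From value)
{
    static_assert(
        std::is_integral<To>::value && std::is_integral<From>::value,
        "checked_cast requires integer types");
    To result = static_cast<To>(value);
    bool round_trips = (static_cast<From>(result) == value);
    bool same_sign = ((result < To{}) == (value < From{}));
    if (!(round_trips && same_sign)) {
        throw std::range_error("integer conversion out of range");
    }
    return result;
}
```

With this sketch, ``checked_cast<int>(42LL)`` returns 42, while ``checked_cast<unsigned int>(-1)`` throws instead of silently producing a large positive number.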
  4671 +
  4672 +8.4.2: May 18, 2019
  4673 + This release has just one change: correction of a buffer overrun in
  4674 + the Windows code used to open files. Windows users should take this
  4675 + update. There are no code changes that affect non-Windows releases.
  4676 +
  4677 +8.4.1: April 27, 2019
  4678 + - Enhancements
  4679 +
  4680 + - When @1@command@1@qpdf --version@2@command@2@ is run, it will
  4681 + detect if the qpdf CLI was built with a different version of
  4682 + qpdf than the library, which may indicate a problem with the
  4683 + installation.
  4684 +
  4685 + - New option @1@option@1@--remove-page-labels@2@option@2@ will
  4686 + remove page labels before generating output. This used to
  4687 + happen if you ran @1@command@1@qpdf --empty --pages ..
  4688 + --@2@command@2@, but the behavior changed in qpdf 8.3.0. This
  4689 + option enables people who were relying on the old behavior to
  4690 + get it again.
  4691 +
  4692 + - New option
  4693 + @1@option@1@--keep-files-open-threshold=@1@replaceable@1@count@2@replaceable@2@@2@option@2@
  4694 + can be used to override the number of files that qpdf will use to
  4695 + trigger the behavior of not keeping all files open when merging
  4696 + files. This may be necessary if your system allows fewer than
  4697 + the default value of 200 files to be open at the same time.
  4698 +
  4699 + - Bug Fixes
  4700 +
  4701 + - Handle Unicode characters in filenames on Windows. The changes
  4702 + to support Unicode on the CLI in Windows broke Unicode
  4703 + filenames for Windows.
  4704 +
  4705 + - Slightly tighten logic that determines whether an object is a
  4706 + page. This should resolve problems in some rare files where
  4707 + some non-page objects were passing qpdf's test for whether
  4708 + something was a page, thus causing them to be erroneously lost
  4709 + during page splitting operations.
  4710 +
  4711 + - Revert change that included preservation of outlines
  4712 + (bookmarks) in @1@option@1@--split-pages@2@option@2@. The way
  4713 + it was implemented in 8.3.0 and 8.4.0 caused a very significant
  4714 + degradation of performance for splitting certain files. A
  4715 + future release of qpdf may re-introduce the behavior in a more
  4716 + performant and also more correct fashion.
  4717 +
  4718 + - In JSON mode, add missing leading 0 to decimal values between
  4719 + -1 and 1 even if not present in the input. The JSON
  4720 + specification requires the leading 0. The PDF specification
  4721 + does not.
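
The leading-zero fix described in the last bug-fix entry above can be illustrated with a minimal Python sketch. The helper name ``add_leading_zero`` is hypothetical and this is not qpdf's implementation; it only shows the normalization that turns a PDF-style number such as ``.5`` or ``-.25`` into one that a JSON parser accepts.

```python
import json


def add_leading_zero(pdf_number: str) -> str:
    """Insert a leading 0 when a numeric token starts with a bare '.'.

    Hypothetical helper for illustration only; qpdf's own code is not shown.
    """
    sign = ""
    digits = pdf_number
    if digits[:1] in ("+", "-"):
        # JSON permits a leading '-' but not a leading '+', so '+' is dropped.
        sign = "-" if digits[0] == "-" else ""
        digits = digits[1:]
    if digits.startswith("."):
        digits = "0" + digits
    return sign + digits


for raw in (".5", "-.25", "1.5", "-3"):
    fixed = add_leading_zero(raw)
    # json.loads rejects ".5" and "-.25" but accepts the normalized forms.
    json.loads(fixed)
```

The sketch works on the textual token rather than on a float so that the digits of the original value are left untouched.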
  4722 +
  4723 +8.4.0: February 1, 2019
  4724 + - Command-line Enhancements
  4725 +
  4726 + - *Non-compatible CLI change:* The qpdf command-line tool
  4727 + interprets passwords given at the command-line differently from
  4728 + previous releases when the passwords contain non-ASCII
  4729 + characters. In some cases, the behavior differs from previous
  4730 + releases. For a discussion of the current behavior, please see
  4731 + `Unicode Passwords <#ref.unicode-passwords>`__. The
  4732 + incompatibilities are as follows:
  4733 +
  4734 + - On Windows, qpdf now receives all command-line options as
  4735 + Unicode strings if it can figure out the appropriate
  4736 + compile/link options. This is enabled at least for MSVC and
  4737 + mingw builds. That means that if non-ASCII strings are
  4738 + passed to the qpdf CLI in Windows, qpdf will now correctly
  4739 + receive them. In the past, they would have either been
  4740 + encoded as Windows code page 1252 (also known as "Windows
   4741 + ANSI") or as something unintelligible. In almost all cases,
  4742 + qpdf is able to properly interpret Unicode arguments now,
  4743 + whereas in the past, it would almost never interpret them
  4744 + properly. The result is that non-ASCII passwords given to
  4745 + the qpdf CLI on Windows now have a much greater chance of
  4746 + creating PDF files that can be opened by a variety of
  4747 + readers. In the past, usually files encrypted from the
  4748 + Windows CLI using non-ASCII passwords would not be readable
  4749 + by most viewers. Note that the current version of qpdf is
  4750 + able to decrypt files that it previously created using the
  4751 + previously supplied password.
  4752 +
  4753 + - The PDF specification requires passwords to be encoded as
  4754 + UTF-8 for 256-bit encryption and with PDF Doc encoding for
  4755 + 40-bit or 128-bit encryption. Older versions of qpdf left it
  4756 + up to the user to provide passwords with the correct
  4757 + encoding. The qpdf CLI now detects when a password is given
  4758 + with UTF-8 encoding and automatically transcodes it to what
  4759 + the PDF spec requires. While this is almost always the
  4760 + correct behavior, it is possible to override the behavior if
  4761 + there is some reason to do so. This is discussed in more
  4762 + depth in `Unicode Passwords <#ref.unicode-passwords>`__.
  4763 +
  4764 + - New options
  4765 + @1@option@1@--externalize-inline-images@2@option@2@,
  4766 + @1@option@1@--ii-min-bytes@2@option@2@, and
  4767 + @1@option@1@--keep-inline-images@2@option@2@ control qpdf's
  4768 + handling of inline images and possible conversion of them to
  4769 + regular images. By default,
  4770 + @1@option@1@--optimize-images@2@option@2@ now also applies to
  4771 + inline images. These options are discussed in `Advanced
  4772 + Transformation Options <#ref.advanced-transformation>`__.
  4773 +
  4774 + - Add options @1@option@1@--overlay@2@option@2@ and
  4775 + @1@option@1@--underlay@2@option@2@ for overlaying or
  4776 + underlaying pages of other files onto output pages. See
  4777 + `Overlay and Underlay Options <#ref.overlay-underlay>`__ for
  4778 + details.
  4779 +
  4780 + - When opening an encrypted file with a password, if the
  4781 + specified password doesn't work and the password contains any
  4782 + non-ASCII characters, qpdf will try a number of alternative
  4783 + passwords to try to compensate for possible character encoding
  4784 + errors. This behavior can be suppressed with the
  4785 + @1@option@1@--suppress-password-recovery@2@option@2@ option.
  4786 + See `Unicode Passwords <#ref.unicode-passwords>`__ for a full
  4787 + discussion.
  4788 +
  4789 + - Add the @1@option@1@--password-mode@2@option@2@ option to
  4790 + fine-tune how qpdf interprets password arguments, especially
  4791 + when they contain non-ASCII characters. See `Unicode
  4792 + Passwords <#ref.unicode-passwords>`__ for more information.
  4793 +
  4794 + - In the @1@option@1@--pages@2@option@2@ option, it is now
  4795 + possible to copy the same page more than once from the same
  4796 + file without using the previous workaround of specifying two
  4797 + different paths to the same file.
  4798 +
  4799 + - In the @1@option@1@--pages@2@option@2@ option, allow use of "."
  4800 + as a shortcut for the primary input file. That way, you can do
  4801 + @1@command@1@qpdf in.pdf --pages . 1-2 -- out.pdf@2@command@2@
  4802 + instead of having to repeat @1@filename@1@in.pdf@2@filename@2@
  4803 + in the command.
  4804 +
  4805 + - When encrypting with 128-bit and 256-bit encryption, new
  4806 + encryption options @1@option@1@--assemble@2@option@2@,
  4807 + @1@option@1@--annotate@2@option@2@,
  4808 + @1@option@1@--form@2@option@2@, and
   4809 + @1@option@1@--modify-other@2@option@2@ allow finer-grained
   4810 + control over permissions. Before, the
  4811 + @1@option@1@--modify@2@option@2@ option only configured certain
  4812 + predefined groups of permissions.
  4813 +
  4814 + - Bug Fixes and Enhancements
  4815 +
  4816 + - *Potential data-loss bug:* Versions of qpdf between 8.1.0 and
  4817 + 8.3.0 had a bug that could cause page splitting and merging
  4818 + operations to drop some font or image resources if the PDF
  4819 + file's internal structure shared these resource lists across
  4820 + pages and if some but not all of the pages in the output did
  4821 + not reference all the fonts and images. Using the
  4822 + @1@option@1@--preserve-unreferenced-resources@2@option@2@
  4823 + option would work around the incorrect behavior. This bug was
  4824 + the result of a typo in the code and a deficiency in the test
  4825 + suite. The case that triggered the error was known, just not
  4826 + handled properly. This case is now exercised in qpdf's test
  4827 + suite and properly handled.
  4828 +
  4829 + - When optimizing images, detect and refuse to optimize images
  4830 + that can't be converted to JPEG because of bit depth or color
  4831 + space.
  4832 +
  4833 + - Linearization and page manipulation APIs now detect and recover
  4834 + from files that have duplicate Page objects in the pages tree.
  4835 +
   4836 + - When using the older option
   4837 + @1@option@1@--stream-data=compress@2@option@2@ with object
   4838 + streams, object streams and xref streams were not compressed. This is now fixed.
  4839 +
  4840 + - When the tokenizer returns inline image tokens, delimiters
  4841 + following ``ID`` and ``EI`` operators are no longer excluded.
  4842 + This makes it possible to reliably extract the actual image
  4843 + data.
  4844 +
  4845 + - Library Enhancements
  4846 +
  4847 + - Add method ``QPDFPageObjectHelper::externalizeInlineImages`` to
  4848 + convert inline images to regular images.
  4849 +
  4850 + - Add method ``QUtil::possible_repaired_encodings()`` to generate
  4851 + a list of strings that represent other ways the given string
  4852 + could have been encoded. This is the method the QPDF CLI uses
  4853 + to generate the strings it tries when recovering incorrectly
  4854 + encoded Unicode passwords.
  4855 +
  4856 + - Add new versions of
  4857 + ``QPDFWriter::setR{3,4,5,6}EncryptionParameters`` that allow
  4858 + more granular setting of permissions bits. See
  4859 + @1@filename@1@QPDFWriter.hh@2@filename@2@ for details.
  4860 +
  4861 + - Add new versions of the transcoders from UTF-8 to single-byte
  4862 + coding systems in ``QUtil`` that report success or failure
  4863 + rather than just substituting a specified unknown character.
  4864 +
  4865 + - Add method ``QUtil::analyze_encoding()`` to determine whether a
   4866 + string has high-bit characters and whether it appears to be
   4867 + UTF-16 or valid UTF-8.
  4868 +
  4869 + - Add new method ``QPDFPageObjectHelper::shallowCopyPage()`` to
  4870 + copy a new page that is a "shallow copy" of a page. The
  4871 + resulting object is an indirect object ready to be passed to
  4872 + ``QPDFPageDocumentHelper::addPage()`` for either the original
  4873 + ``QPDF`` object or a different one. This is what the
  4874 + @1@command@1@qpdf@2@command@2@ command-line tool uses to copy
  4875 + the same page multiple times from the same file during
  4876 + splitting and merging operations.
  4877 +
  4878 + - Add method ``QPDF::getUniqueId()``, which returns a unique
  4879 + identifier for the given QPDF object. The identifier will be
  4880 + unique across the life of the application. The returned value
  4881 + can be safely used as a map key.
  4882 +
  4883 + - Add method ``QPDF::setImmediateCopyFrom``. This further
  4884 + enhances qpdf's ability to allow a ``QPDF`` object from which
  4885 + objects are being copied to go out of scope before the
  4886 + destination object is written. If you call this method on a
   4887 + ``QPDF`` instance, objects copied *from* this instance will be
  4888 + copied immediately instead of lazily. This option uses more
  4889 + memory but allows the source object to go out of scope before
  4890 + the destination object is written in all cases. See comments in
  4891 + @1@filename@1@QPDF.hh@2@filename@2@ for details.
  4892 +
  4893 + - Add method ``QPDFPageObjectHelper::getAttribute`` for
  4894 + retrieving an attribute from the page dictionary taking
  4895 + inheritance into consideration, and optionally making a copy if
  4896 + your intention is to modify the attribute.
  4897 +
  4898 + - Fix long-standing limitation of
  4899 + ``QPDFPageObjectHelper::getPageImages`` so that it now properly
  4900 + reports images from inherited resources dictionaries,
  4901 + eliminating the need to call
  4902 + ``QPDFPageDocumentHelper::pushInheritedAttributesToPage`` in
  4903 + this case.
  4904 +
  4905 + - Add method ``QPDFObjectHandle::getUniqueResourceName`` for
  4906 + finding an unused name in a resource dictionary.
  4907 +
  4908 + - Add method ``QPDFPageObjectHelper::getFormXObjectForPage`` for
  4909 + generating a form XObject equivalent to a page. The resulting
  4910 + object can be used in the same file or copied to another file
  4911 + with ``copyForeignObject``. This can be useful for implementing
  4912 + underlay, overlay, n-up, thumbnails, or any other functionality
  4913 + requiring replication of pages in other contexts.
  4914 +
  4915 + - Add method ``QPDFPageObjectHelper::placeFormXObject`` for
  4916 + generating content stream text that places a given form XObject
  4917 + on a page, centered and fit within a specified rectangle. This
  4918 + method takes care of computing the proper transformation matrix
  4919 + and may optionally compensate for rotation or scaling of the
  4920 + destination page.
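
The "centered and fit within a specified rectangle" computation mentioned for ``placeFormXObject`` can be sketched as follows. This is a simplified Python illustration, not qpdf's code: the function names ``fit_matrix`` and ``place_content`` and the resource name ``/Fx1`` are hypothetical, and the sketch assumes uniform scaling with no rotation handling.

```python
def fit_matrix(bbox, rect):
    """Return a PDF matrix (a, b, c, d, e, f) that scales bbox uniformly
    to fit inside rect and centers it there.

    bbox and rect are (llx, lly, urx, ury) tuples. Illustrative only.
    """
    bx0, by0, bx1, by1 = bbox
    rx0, ry0, rx1, ry1 = rect
    bbox_w, bbox_h = bx1 - bx0, by1 - by0
    rect_w, rect_h = rx1 - rx0, ry1 - ry0
    # Use the smaller ratio so that both dimensions fit.
    scale = min(rect_w / bbox_w, rect_h / bbox_h)
    # Translate so the scaled bbox center lands on the rect center.
    tx = (rx0 + rx1) / 2 - scale * (bx0 + bx1) / 2
    ty = (ry0 + ry1) / 2 - scale * (by0 + by1) / 2
    return (scale, 0.0, 0.0, scale, tx, ty)


def place_content(name, matrix):
    """Build content stream text that draws XObject `name` under `matrix`."""
    numbers = " ".join(f"{value:g}" for value in matrix)
    # q/Q save and restore the graphics state around the placement.
    return f"q {numbers} cm {name} Do Q\n"


matrix = fit_matrix((0, 0, 100, 100), (0, 0, 200, 400))
content = place_content("/Fx1", matrix)
```

Here a 100 by 100 form is scaled by 2 to fill the 200-unit width, and the leftover vertical space is split evenly above and below it.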
  4921 +
  4922 + - Build Improvements
  4923 +
  4924 + - Add new configure option
  4925 + @1@option@1@--enable-avoid-windows-handle@2@option@2@, which
  4926 + causes the preprocessor symbol ``AVOID_WINDOWS_HANDLE`` to be
  4927 + defined. When defined, qpdf will avoid referencing the Windows
  4928 + ``HANDLE`` type, which is disallowed with certain versions of
  4929 + the Windows SDK.
  4930 +
  4931 + - For Windows builds, attempt to determine what options, if any,
  4932 + have to be passed to the compiler and linker to enable use of
  4933 + ``wmain``. This causes the preprocessor symbol
  4934 + ``WINDOWS_WMAIN`` to be defined. If you do your own builds with
  4935 + other compilers, you can define this symbol to cause ``wmain``
  4936 + to be used. This is needed to allow the Windows
  4937 + @1@command@1@qpdf@2@command@2@ command to receive Unicode
  4938 + command-line options.
  4939 +
  4940 +8.3.0: January 7, 2019
  4941 + - Command-line Enhancements
  4942 +
  4943 + - Shell completion: you can now use eval @1@command@1@$(qpdf
  4944 + --completion-bash)@2@command@2@ and eval @1@command@1@$(qpdf
  4945 + --completion-zsh)@2@command@2@ to enable shell completion for
  4946 + bash and zsh.
  4947 +
  4948 + - Page numbers (also known as page labels) are now preserved when
  4949 + merging and splitting files with the
  4950 + @1@option@1@--pages@2@option@2@ and
  4951 + @1@option@1@--split-pages@2@option@2@ options.
  4952 +
  4953 + - Bookmarks are partially preserved when splitting pages with the
  4954 + @1@option@1@--split-pages@2@option@2@ option. Specifically, the
  4955 + outlines dictionary and some supporting metadata are copied
  4956 + into the split files. The result is that all bookmarks from the
  4957 + original file appear, those that point to pages that are
  4958 + preserved work, and those that point to pages that are not
  4959 + preserved don't do anything. This is an interim step toward
  4960 + proper support for bookmarks in splitting and merging
  4961 + operations.
  4962 +
  4963 + - Page collation: add new option
  4964 + @1@option@1@--collate@2@option@2@. When specified, the
  4965 + semantics of @1@option@1@--pages@2@option@2@ change from
  4966 + concatenation to collation. See `Page Selection
  4967 + Options <#ref.page-selection>`__ for examples and discussion.
  4968 +
  4969 + - Generation of information in JSON format, primarily to
  4970 + facilitate use of qpdf from languages other than C++. Add new
  4971 + options @1@option@1@--json@2@option@2@,
  4972 + @1@option@1@--json-key@2@option@2@, and
  4973 + @1@option@1@--json-object@2@option@2@ to generate a JSON
  4974 + representation of the PDF file. Run @1@command@1@qpdf
  4975 + --json-help@2@command@2@ to get a description of the JSON
  4976 + format. For more information, see `QPDF JSON <#ref.json>`__.
  4977 +
  4978 + - The @1@option@1@--generate-appearances@2@option@2@ flag will
  4979 + cause qpdf to generate appearances for form fields if the PDF
  4980 + file indicates that form field appearances are out of date.
  4981 + This can happen when PDF forms are filled in by a program that
  4982 + doesn't know how to regenerate the appearances of the filled-in
  4983 + fields.
  4984 +
  4985 + - The @1@option@1@--flatten-annotations@2@option@2@ flag can be
  4986 + used to *flatten* annotations, including form fields.
  4987 + Ordinarily, annotations are drawn separately from the page.
  4988 + Flattening annotations is the process of combining their
  4989 + appearances into the page's contents. You might want to do this
  4990 + if you are going to rotate or combine pages using a tool that
  4991 + doesn't understand about annotations. You may also want to use
  4992 + @1@option@1@--generate-appearances@2@option@2@ when using this
  4993 + flag since annotations for outdated form fields are not
  4994 + flattened as that would cause loss of information.
  4995 +
  4996 + - The @1@option@1@--optimize-images@2@option@2@ flag tells qpdf
   4997 + to recompress every image using DCT (JPEG) compression as
  4998 + long as the image is not already compressed with lossy
  4999 + compression and recompressing the image reduces its size. The
  5000 + additional options @1@option@1@--oi-min-width@2@option@2@,
  5001 + @1@option@1@--oi-min-height@2@option@2@, and
  5002 + @1@option@1@--oi-min-area@2@option@2@ prevent recompression of
   5003 + images whose width, height, or pixel area (width × height) are
  5004 + below a specified threshold.
  5005 +
  5006 + - The @1@option@1@--show-object@2@option@2@ option can now be
  5007 + given as @1@option@1@--show-object=trailer@2@option@2@ to show
  5008 + the trailer dictionary.
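
The difference between concatenation and collation mentioned for @1@option@1@--collate@2@option@2@ can be shown with a small Python sketch. It assumes that collation means taking one page from each input in turn until every input is exhausted; the function names and the use of plain lists to stand in for page sequences are illustrative and are not part of qpdf.

```python
from itertools import chain, zip_longest

_MISSING = object()  # sentinel for inputs that have run out of pages


def concatenate(page_lists):
    """All pages of the first input, then all pages of the second, etc."""
    return list(chain.from_iterable(page_lists))


def collate(page_lists):
    """One page from each input in turn; shorter inputs simply drop out."""
    result = []
    for group in zip_longest(*page_lists, fillvalue=_MISSING):
        result.extend(page for page in group if page is not _MISSING)
    return result


odd = ["a1", "a2", "a3"]
even = ["b1", "b2"]
joined = concatenate([odd, even])
interleaved = collate([odd, even])
```

A typical use of collation is recombining separately scanned front and back sides of a stack of pages.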
  5009 +
  5010 + - Bug Fixes and Enhancements
  5011 +
  5012 + - QPDF now automatically detects and recovers from dangling
  5013 + references. If a PDF file contained an indirect reference to a
  5014 + non-existent object, which is valid, when adding a new object
  5015 + to the file, it was possible for the new object to take the
  5016 + object ID of the dangling reference, thereby causing the
  5017 + dangling reference to point to the new object. This case is now
  5018 + prevented.
  5019 +
  5020 + - Fixes to form field setting code: strings are always written in
  5021 + UTF-16 format, and checkboxes and radio buttons are handled
  5022 + properly with respect to synchronization of values and
  5023 + appearance states.
  5024 +
   5025 + - The ``QPDF::checkLinearization()`` method no longer causes the program
  5026 + to crash when it detects problems with linearization data.
  5027 + Instead, it issues a normal warning or error.
  5028 +
  5029 + - Ordinarily qpdf treats an argument of the form
  5030 + @1@option@1@@file@2@option@2@ to mean that command-line options
  5031 + should be read from @1@filename@1@file@2@filename@2@. Now, if
  5032 + @1@filename@1@file@2@filename@2@ does not exist but
  5033 + @1@filename@1@@file@2@filename@2@ does, qpdf will treat
  5034 + @1@filename@1@@file@2@filename@2@ as a regular option. This
  5035 + makes it possible to work more easily with PDF files whose
  5036 + names happen to start with the ``@`` character.
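
The fallback rule for arguments that start with ``@`` can be sketched in Python as follows. The function name ``expand_at_argument`` is hypothetical, and the assumption that an options file holds one argument per line is made for illustration; the sketch is not taken from qpdf's source.

```python
import os


def expand_at_argument(arg):
    """Return the list of arguments that a single CLI argument stands for.

    Illustrative only: assumes an options file lists one argument per line.
    """
    if not arg.startswith("@"):
        return [arg]
    options_file = arg[1:]
    # New behavior: if "file" is absent but "@file" exists, the argument
    # is an ordinary file name that happens to start with '@'.
    if not os.path.exists(options_file) and os.path.exists(arg):
        return [arg]
    with open(options_file) as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]
```

If neither ``file`` nor ``@file`` exists, the sketch lets the ``open`` call fail, which corresponds to treating the argument as a missing options file.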
  5037 +
  5038 + - Library Enhancements
  5039 +
  5040 + - Remove the restriction in most cases that the source QPDF
  5041 + object used in a ``QPDF::copyForeignObject`` call has to stick
  5042 + around until the destination QPDF is written. The exceptional
   5043 + case is when the source stream gets its data using a
   5044 + ``QPDFObjectHandle::StreamDataProvider``. For a more in-depth
  5045 + discussion, see comments around ``copyForeignObject`` in
  5046 + @1@filename@1@QPDF.hh@2@filename@2@.
  5047 +
  5048 + - Add new method ``QPDFWriter::getFinalVersion()``, which returns
  5049 + the PDF version that will ultimately be written to the final
  5050 + file. See comments in @1@filename@1@QPDFWriter.hh@2@filename@2@
  5051 + for some restrictions on its use.
  5052 +
  5053 + - Add several methods for transcoding strings to some of the
  5054 + character sets used in PDF files: ``QUtil::utf8_to_ascii``,
  5055 + ``QUtil::utf8_to_win_ansi``, ``QUtil::utf8_to_mac_roman``, and
  5056 + ``QUtil::utf8_to_utf16``. For the single-byte encodings that
   5057 + support only a limited character set, these methods replace
  5058 + unsupported characters with a specified substitute.
  5059 +
  5060 + - Add new methods to ``QPDFAnnotationObjectHelper`` and
  5061 + ``QPDFFormFieldObjectHelper`` for querying flags and
  5062 + interpretation of different field types. Define constants in
  5063 + @1@filename@1@qpdf/Constants.h@2@filename@2@ to help with
  5064 + interpretation of flag values.
  5065 +
  5066 + - Add new methods
  5067 + ``QPDFAcroFormDocumentHelper::generateAppearancesIfNeeded`` and
  5068 + ``QPDFFormFieldObjectHelper::generateAppearance`` for
  5069 + generating appearance streams. See discussion in
  5070 + @1@filename@1@QPDFFormFieldObjectHelper.hh@2@filename@2@ for
  5071 + limitations.
  5072 +
  5073 + - Add two new helper functions for dealing with resource
  5074 + dictionaries: ``QPDFObjectHandle::getResourceNames()`` returns
  5075 + a list of all second-level keys, which correspond to the names
  5076 + of resources, and ``QPDFObjectHandle::mergeResources()`` merges
  5077 + two resources dictionaries as long as they have non-conflicting
  5078 + keys. These methods are useful for certain types of objects
  5079 + that resolve resources from multiple places, such as form
  5080 + fields.
  5081 +
  5082 + - Add methods ``QPDFPageDocumentHelper::flattenAnnotations()``
  5083 + and
  5084 + ``QPDFAnnotationObjectHelper::getPageContentForAppearance()``
  5085 + for handling low-level details of annotation flattening.
  5086 +
  5087 + - Add new helper classes: ``QPDFOutlineDocumentHelper``,
  5088 + ``QPDFOutlineObjectHelper``, ``QPDFPageLabelDocumentHelper``,
  5089 + ``QPDFNameTreeObjectHelper``, and
  5090 + ``QPDFNumberTreeObjectHelper``.
  5091 +
  5092 + - Add method ``QPDFObjectHandle::getJSON()`` that returns a JSON
  5093 + representation of the object. Call ``serialize()`` on the
  5094 + result to convert it to a string.
  5095 +
  5096 + - Add a simple JSON serializer. This is not a complete or
  5097 + general-purpose JSON library. It allows assembly and
  5098 + serialization of JSON structures with some restrictions, which
  5099 + are described in the header file. This is the serializer used
  5100 + by qpdf's new JSON representation.
  5101 +
  5102 + - Add new ``QPDFObjectHandle::Matrix`` class along with a few
  5103 + convenience methods for dealing with six-element numerical
  5104 + arrays as matrices.
  5105 +
  5106 + - Add new method ``QPDFObjectHandle::wrapInArray``, which returns
  5107 + the object itself if it is an array, or an array containing the
  5108 + object otherwise. This is a common construct in PDF. This
  5109 + method prevents you from having to explicitly test whether
  5110 + something is a single element or an array.
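
The "single element or array" construct that ``wrapInArray`` addresses can be modeled in a few lines of Python. Plain lists stand in for PDF arrays here, and ``wrap_in_array`` is a hypothetical stand-in for illustration rather than a binding to the C++ method.

```python
def wrap_in_array(obj):
    """Return obj unchanged if it is already a list, else a one-item list."""
    return obj if isinstance(obj, list) else [obj]


def count_content_streams(contents):
    """A page's contents may be one stream or an array of streams.

    Normalizing first means the caller needs only one code path.
    """
    return len(wrap_in_array(contents))


single = count_content_streams("stream-1")
several = count_content_streams(["stream-1", "stream-2"])
```

Without the normalization step, every caller would need its own type test before iterating.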
  5111 +
  5112 + - Build Improvements
  5113 +
  5114 + - It is no longer necessary to run
  5115 + @1@command@1@autogen.sh@2@command@2@ to build from a pristine
  5116 + checkout. Automatically generated files are now committed so
  5117 + that it is possible to build on platforms without autoconf
  5118 + directly from a clean checkout of the repository. The
  5119 + @1@command@1@configure@2@command@2@ script detects if the files
  5120 + are out of date when it also determines that the tools are
  5121 + present to regenerate them.
  5122 +
  5123 + - Pull requests and the master branch are now built automatically
  5124 + in `Azure
  5125 + Pipelines <https://dev.azure.com/qpdf/qpdf/_build>`__, which is
  5126 + free for open source projects. The build includes Linux, mac,
  5127 + Windows 32-bit and 64-bit with mingw and MSVC, and an AppImage
  5128 + build. Official qpdf releases are now built with Azure
  5129 + Pipelines.
  5130 +
  5131 + - Notes for Packagers
  5132 +
  5133 + - A new section has been added to the documentation with notes
  5134 + for packagers. Please see `Notes for
  5135 + Packagers <#ref.packaging>`__.
  5136 +
   5137 + - The qpdf build detects out-of-date automatically generated files. If
  5138 + your packaging system automatically refreshes libtool or
  5139 + autoconf files, it could cause this check to fail. To avoid
  5140 + this problem, pass
  5141 + @1@option@1@--disable-check-autofiles@2@option@2@ to
  5142 + @1@command@1@configure@2@command@2@.
  5143 +
  5144 + - If you would like to have qpdf completion enabled
  5145 + automatically, you can install completion files in the
  5146 + distribution's default location. You can find sample completion
  5147 + files to install in the @1@filename@1@completions@2@filename@2@
  5148 + directory.
  5149 +
  5150 +8.2.1: August 18, 2018
  5151 + - Command-line Enhancements
  5152 +
  5153 + - Add
  5154 + @1@option@1@--keep-files-open=@1@replaceable@1@[yn]@2@replaceable@2@@2@option@2@
  5155 + to override default determination of whether to keep files open
  5156 + when merging. Please see the discussion of
  5157 + @1@option@1@--keep-files-open@2@option@2@ in `Basic
  5158 + Options <#ref.basic-options>`__ for additional details.
  5159 +
  5160 +8.2.0: August 16, 2018
  5161 + - Command-line Enhancements
  5162 +
  5163 + - Add @1@option@1@--no-warn@2@option@2@ option to suppress
  5164 + issuing warning messages. If there are any conditions that
  5165 + would have caused warnings to be issued, the exit status is
  5166 + still 3.
  5167 +
  5168 + - Bug Fixes and Optimizations
  5169 +
  5170 + - Performance fix: optimize page merging operation to avoid
  5171 + unnecessary open/close calls on files being merged. This solves
  5172 + a dramatic slow-down that was observed when merging certain
  5173 + types of files.
  5174 +
  5175 + - Optimize how memory was used for the TIFF predictor,
  5176 + drastically improving performance and memory usage for files
  5177 + containing high-resolution images compressed with Flate using
  5178 + the TIFF predictor.
  5179 +
  5180 + - Bug fix: end of line characters were not properly handled
  5181 + inside strings in some cases.
  5182 +
  5183 + - Bug fix: using @1@option@1@--progress@2@option@2@ on very small
  5184 + files could cause an infinite loop.
  5185 +
  5186 + - API enhancements
  5187 +
  5188 + - Add new class ``QPDFSystemError``, derived from
  5189 + ``std::runtime_error``, which is now thrown by
  5190 + ``QUtil::throw_system_error``. This enables the triggering
  5191 + ``errno`` value to be retrieved.
  5192 +
  5193 + - Add ``ClosedFileInputSource::stayOpen`` method, enabling a
  5194 + ``ClosedFileInputSource`` to stay open during manually
  5195 + indicated periods of high activity, thus reducing the overhead
  5196 + of frequent open/close operations.
  5197 +
  5198 + - Build Changes
  5199 +
  5200 + - For the mingw builds, change the name of the DLL import library
  5201 + from @1@filename@1@libqpdf.a@2@filename@2@ to
  5202 + @1@filename@1@libqpdf.dll.a@2@filename@2@ to more accurately
  5203 + reflect that it is an import library rather than a static
  5204 + library. This potentially clears the way for supporting a
  5205 + static library in the future, though presently, the qpdf
  5206 + Windows build only builds the DLL and executables.
  5207 +
  5208 +8.1.0: June 23, 2018
  5209 + - Usability Improvements
  5210 +
  5211 + - When splitting files, qpdf detects fonts and images that the
  5212 + document metadata claims are referenced from a page but are not
  5213 + actually referenced and omits them from the output file. This
  5214 + change can cause a significant reduction in the size of split
  5215 + PDF files for files created by some software packages. In some
  5216 + cases, it can also make page splitting slower. Prior versions
  5217 + of qpdf would believe the document metadata and sometimes
  5218 + include all the images from all the other pages even though the
  5219 + pages were no longer present. In the unlikely event that the
  5220 + old behavior should be desired, or if you have a case where
  5221 + page splitting is very slow, the old behavior (and speed) can
  5222 + be enabled by specifying
  5223 + @1@option@1@--preserve-unreferenced-resources@2@option@2@. For
  5224 + additional details, please see `Advanced Transformation
  5225 + Options <#ref.advanced-transformation>`__.
  5226 +
  5227 + - When merging multiple PDF files, qpdf no longer leaves all the
  5228 + files open. This makes it possible to merge numbers of files
  5229 + that may exceed the operating system's limit for the maximum
  5230 + number of open files.
  5231 +
  5232 + - The @1@option@1@--rotate@2@option@2@ option's syntax has been
  5233 + extended to make the page range optional. If you specify
  5234 + @1@option@1@--rotate=@1@replaceable@1@angle@2@replaceable@2@@2@option@2@
  5235 + without specifying a page range, the rotation will be applied
  5236 + to all pages. This can be especially useful for adjusting a PDF
  5237 + created from a multi-page document that was scanned upside
  5238 + down.
  5239 +
  5240 + - When merging multiple files, the
  5241 + @1@option@1@--verbose@2@option@2@ option now prints information
  5242 + about each file as it operates on that file.
  5243 +
  5244 + - When the @1@option@1@--progress@2@option@2@ option is
  5245 + specified, qpdf will print a running indicator of its best
  5246 + guess at how far through the writing process it is. Note that,
  5247 + as with all progress meters, it's an approximation. This option
  5248 + is implemented in a way that makes it useful for software that
  5249 + uses the qpdf library; see API Enhancements below.
  5250 +
  5251 + - Bug Fixes
  5252 +
  5253 + - Properly decrypt files that use revision 3 of the standard
   5254 + security handler but use 40-bit keys (even though revision 3
  5255 + supports 128-bit keys).
  5256 +
  5257 + - Limit depth of nested data structures to prevent crashes from
  5258 + certain types of malformed (malicious) PDFs.
  5259 +
  5260 + - In "newline before endstream" mode, insert the required extra
  5261 + newline before the ``endstream`` at the end of object streams.
  5262 + This one case was previously omitted.
  5263 +
  5264 + - API Enhancements
  5265 +
  5266 + - The first round of higher level "helper" interfaces has been
  5267 + introduced. These are designed to provide a more convenient way
  5268 + of interacting with certain document features than using
  5269 + ``QPDFObjectHandle`` directly. For details on helpers, see
  5270 + `Helper Classes <#ref.helper-classes>`__. Specific additional
  5271 + interfaces are described below.
  5272 +
  5273 + - Add two new document helper classes: ``QPDFPageDocumentHelper``
  5274 + for working with pages, and ``QPDFAcroFormDocumentHelper`` for
  5275 + working with interactive forms. No old methods have been
  5276 + removed, but ``QPDFPageDocumentHelper`` is now the preferred
  5277 + way to perform operations on pages rather than calling the old
  5278 + methods in ``QPDFObjectHandle`` and ``QPDF`` directly. Comments
  5279 + in the header files direct you to the new interfaces. Please
  5280 + see the header files and @1@filename@1@ChangeLog@2@filename@2@
  5281 + for additional details.
  5282 +
   5283 + - Add three new object helper classes: ``QPDFPageObjectHelper`` for
  5284 + pages, ``QPDFFormFieldObjectHelper`` for interactive form
  5285 + fields, and ``QPDFAnnotationObjectHelper`` for annotations. All
  5286 + three classes are fairly sparse at the moment, but they have
  5287 + some useful, basic functionality.
  5288 +
  5289 + - A new example program
  5290 + @1@filename@1@examples/pdf-set-form-values.cc@2@filename@2@ has
  5291 + been added that illustrates use of the new document and object
  5292 + helpers.
  5293 +
  5294 + - The method ``QPDFWriter::registerProgressReporter`` has been
  5295 + added. This method allows you to register a function that is
   5296 + called by ``QPDFWriter`` to report its estimate of the percentage
   5297 + of output it has written so far. Client programs can
  5298 + use this to implement reasonably accurate progress meters. The
  5299 + @1@command@1@qpdf@2@command@2@ command line tool uses this to
  5300 + implement its @1@option@1@--progress@2@option@2@ option.
  5301 +
  5302 + - New methods ``QPDFObjectHandle::newUnicodeString`` and
  5303 + ``QPDFObject::unparseBinary`` have been added to allow for more
  5304 + convenient creation of strings that are explicitly encoded
  5305 + using big-endian UTF-16. This is useful for creating strings
  5306 + that appear outside of content streams, such as labels, form
  5307 + fields, outlines, document metadata, etc.
  5308 +
  5309 + - A new class ``QPDFObjectHandle::Rectangle`` has been added to
  5310 + ease working with PDF rectangles, which are just arrays of four
  5311 + numeric values.
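
The big-endian UTF-16 encoding mentioned for ``QPDFObjectHandle::newUnicodeString`` can be illustrated with a short Python sketch. The function name ``to_pdf_utf16`` is hypothetical, and the sketch is not qpdf's implementation. It relies on the PDF convention that a UTF-16 text string starts with the byte order mark ``FE FF``.

```python
import codecs


def to_pdf_utf16(text: str) -> bytes:
    """Encode text as big-endian UTF-16 preceded by the FE FF byte order mark."""
    return codecs.BOM_UTF16_BE + text.encode("utf-16-be")


ascii_label = to_pdf_utf16("A")
greek_label = to_pdf_utf16("\u03c0")  # GREEK SMALL LETTER PI
```

Each character in the Basic Multilingual Plane takes two bytes, so ``"A"`` becomes four bytes in total once the byte order mark is included.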
  5312 +
  5313 +8.0.2: March 6, 2018
  5314 + - When a loop is detected while following cross reference streams or
  5315 + tables, treat this as damage instead of silently ignoring the
  5316 + previous table. This prevents loss of otherwise recoverable data
  5317 + in some damaged files.
  5318 +
  5319 + - Properly handle pages with no contents.
  5320 +
  5321 +8.0.1: March 4, 2018
  5322 + - Disregard data check errors when uncompressing ``/FlateDecode``
  5323 + streams. This is consistent with most other PDF readers and allows
  5324 + qpdf to recover data from another class of malformed PDF files.
  5325 +
  5326 + - On the command line when specifying page ranges, support preceding
  5327 + a page number by "r" to indicate that it should be counted from
  5328 + the end. For example, the range ``r3-r1`` would indicate the last
  5329 + three pages of a document.
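
The "r" notation for counting pages from the end can be sketched in Python as follows. The function names are hypothetical, and the sketch handles only a single page number or a single ascending ``a-b`` range, not the full page range syntax.

```python
def resolve_page(token: str, page_count: int) -> int:
    """Convert a page token to a 1-based page number.

    A leading 'r' counts from the end, so 'r1' is the last page.
    """
    if token.startswith("r"):
        return page_count + 1 - int(token[1:])
    return int(token)


def expand_range(spec: str, page_count: int) -> list:
    """Expand 'a' or 'a-b' (ascending only) into a list of page numbers."""
    first, _, last = spec.partition("-")
    start = resolve_page(first, page_count)
    end = resolve_page(last or first, page_count)
    return list(range(start, end + 1))


last_three = expand_range("r3-r1", 10)
```

For a ten-page document, ``r3`` resolves to page 8 and ``r1`` to page 10, so ``r3-r1`` selects pages 8 through 10.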
  5330 +
  5331 +8.0.0: February 25, 2018
  5332 + - Packaging and Distribution Changes
  5333 +
  5334 + - QPDF is now distributed as an
  5335 + `AppImage <https://appimage.org/>`__ in addition to all the
  5336 + other ways it is distributed. The AppImage can be found in the
  5337 + download area with the other packages. Thanks to Kurt Pfeifle
  5338 + and Simon Peter for their contributions.
  5339 +
  5340 + - Bug Fixes
  5341 +
  5342 + - ``QPDFObjectHandle::getUTF8Val`` now properly treats
  5343 + non-Unicode strings as encoded with PDF Doc Encoding.
  5344 +
  5345 + - Improvements to handling of objects in PDF files that are not
  5346 + of the expected type. In most cases, qpdf will be able to warn
  5347 + for such cases rather than fail with an exception. Previous
  5348 + versions of qpdf would sometimes fail with errors such as
  5349 + "operation for dictionary object attempted on object of wrong
  5350 + type". This situation should be mostly or entirely eliminated
  5351 + now.
  5352 +
  5353 + - Enhancements to the @1@command@1@qpdf@2@command@2@ Command-line
  5354 + Tool. All new options listed here are documented in more detail in
  5355 + `Running QPDF <#ref.using>`__.
  5356 +
  5357 + - The option
  5358 + @1@option@1@--linearize-pass1=@1@replaceable@1@file@2@replaceable@2@@2@option@2@
  5359 + has been added for debugging qpdf's linearization code.
  5360 +
  5361 + - The option @1@option@1@--coalesce-contents@2@option@2@ can be
  5362 + used to combine content streams of a page whose contents are an
  5363 + array of streams into a single stream.
  5364 +
  5365 + - API Enhancements. All new API calls are documented in their
  5366 + respective classes' header files. There are no non-compatible
  5367 + changes to the API.
  5368 +
  5369 + - Add function ``qpdf_check_pdf`` to the C API. This function
  5370 + does basic checking that is a subset of what @1@command@1@qpdf
  5371 + --check@2@command@2@ performs.
  5372 +
  5373 + - Major enhancements to the lexical layer of qpdf. For a complete
  5374 + list of enhancements, please refer to the
  5375 + @1@filename@1@ChangeLog@2@filename@2@ file. Most of the changes
  5376 + result in improvements to qpdf's ability to handle erroneous
  5377 + files. It is also possible for programs to handle whitespace,
  5378 + comments, and inline images as tokens.
  5379 +
  5380 + - New API for working with PDF content streams at a lexical
  5381 + level. The new class ``QPDFObjectHandle::TokenFilter`` allows
  5382 + the developer to provide token handlers. Token filters can be
  5383 + used with several different methods in ``QPDFObjectHandle`` as
  5384 + well as with a lower-level interface. See comments in
  5385 + @1@filename@1@QPDFObjectHandle.hh@2@filename@2@ as well as the
  5386 + new examples
  5387 + @1@filename@1@examples/pdf-filter-tokens.cc@2@filename@2@ and
  5388 + @1@filename@1@examples/pdf-count-strings.cc@2@filename@2@ for
  5389 + details.
  5390 +
  5391 +7.1.1: February 4, 2018
  5392 + - Bug fix: files whose /ID fields were other than 16 bytes long can
  5393 + now be properly linearized.
  5394 +
  5395 + - A few compile and link issues have been corrected for some
  5396 + platforms.
  5397 +
  5398 +7.1.0: January 14, 2018
  5399 + - PDF files contain streams that may be compressed with various
  5400 + compression algorithms which, in some cases, may be enhanced by
  5401 + various predictor functions. Previously only the PNG up predictor
  5402 + was supported. In this version, all the PNG predictors as well as
  5403 + the TIFF predictor are supported. This increases the range of
  5404 + files that qpdf is able to handle.
  5405 +
  5406 + - QPDF now allows a raw encryption key to be specified in place of a
  5407 + password when opening encrypted files, and will optionally display
  5408 + the encryption key used by a file. This is a non-standard
  5409 + operation, but it can be useful in certain situations. Please see
  5410 + the discussion of @1@option@1@--password-is-hex-key@2@option@2@ in
  5411 + `Basic Options <#ref.basic-options>`__ or the comments around
  5412 + ``QPDF::setPasswordIsHexKey`` in
  5413 + @1@filename@1@QPDF.hh@2@filename@2@ for additional details.
  5414 +
  5415 + - Bug fix: numbers ending with a trailing decimal point are now
  5416 + properly recognized as numbers.
  5417 +
  5418 + - Bug fix: when building qpdf from source on some platforms
  5419 + (especially MacOS), the build could get confused by older versions
  5420 + of qpdf installed on the system. This has been corrected.
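
To make the predictor discussion in the 7.1.0 notes concrete, here is a minimal sketch of undoing the PNG "Up" filter for a single row. It is a standalone illustration, not qpdf's code; the name `unfilter_up` is hypothetical, and the filter-type byte that precedes each row in real data is assumed to have been stripped already.

```cpp
#include <cstddef>
#include <vector>

// Undo the PNG "Up" filter for one row: each output byte is the filtered byte
// plus the byte directly above it in the previous decoded row, modulo 256.
// Hypothetical helper for illustration only.
std::vector<unsigned char> unfilter_up(
    std::vector<unsigned char> const& filtered,
    std::vector<unsigned char> const& previous_row)
{
    std::vector<unsigned char> row(filtered.size());
    for (std::size_t i = 0; i < filtered.size(); ++i) {
        // The row above the first row is treated as all zeros.
        unsigned char above = (i < previous_row.size()) ? previous_row[i] : 0;
        row[i] = static_cast<unsigned char>((filtered[i] + above) & 0xff);
    }
    return row;
}
```

The other PNG filters (Sub, Average, Paeth) follow the same pattern but also use the byte to the left, which is why they need the bytes-per-pixel value as well.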
  5421 +
  5422 +7.0.0: September 15, 2017
  5423 + - Packaging and Distribution Changes
  5424 +
  5425 + - QPDF's primary license is now `version 2.0 of the Apache
  5426 + License <http://www.apache.org/licenses/LICENSE-2.0>`__ rather
  5427 + than version 2.0 of the Artistic License. You may still, at
  5428 + your option, consider qpdf to be licensed with version 2.0 of
  5429 + the Artistic license.
  5430 +
  5431 + - QPDF no longer has a dependency on the PCRE (Perl-Compatible
  5432 + Regular Expression) library. QPDF now has an added dependency
  5433 + on the JPEG library.
  5434 +
  5435 + - Bug Fixes
  5436 +
  5437 + - This release contains many bug fixes for various infinite
  5438 + loops, memory leaks, and other memory errors that could be
  5439 + encountered with specially crafted or otherwise erroneous PDF
  5440 + files.
  5441 +
  5442 + - New Features
  5443 +
  5444 + - QPDF now supports reading and writing streams encoded with JPEG
  5445 + or RunLength encoding. Library API enhancements and
  5446 + command-line options have been added to control this behavior.
  5447 + See command-line options
  5448 + @1@option@1@--compress-streams@2@option@2@ and
  5449 + @1@option@1@--decode-level@2@option@2@ and methods
  5450 + ``QPDFWriter::setCompressStreams`` and
  5451 + ``QPDFWriter::setDecodeLevel``.
  5452 +
  5453 + - QPDF is much better at recovering from broken files. In most
  5454 + cases, qpdf will skip invalid objects and will preserve broken
  5455 + stream data by not attempting to filter broken streams. QPDF is
  5456 + now able to recover or at least not crash on dozens of broken
  5457 + test files I have received over the past few years.
  5458 +
  5459 + - Page rotation is now supported and accessible from both the
  5460 + library and the command line.
  5461 +
  5462 + - ``QPDFWriter`` supports writing files in a way that preserves
  5463 + PCLm compliance in support of driverless printing. This is very
  5464 + specialized and is only useful to applications that already
  5465 + know how to create PCLm files.
  5466 +
  5467 + - Enhancements to the @1@command@1@qpdf@2@command@2@ Command-line
  5468 + Tool. All new options listed here are documented in more detail in
  5469 + `Running QPDF <#ref.using>`__.
  5470 +
  5471 + - Command-line arguments can now be read from files or standard
  5472 + input using ``@file`` or ``@-`` syntax. Please see `Basic
  5473 + Invocation <#ref.invocation>`__.
  5474 +
  5475 + - @1@option@1@--rotate@2@option@2@: request page rotation
  5476 +
  5477 + - @1@option@1@--newline-before-endstream@2@option@2@: ensure that
  5478 + a newline appears before every ``endstream`` keyword in the
  5479 + file; used to prevent qpdf from breaking PDF/A compliance on
  5480 + already compliant files.
  5481 +
  5482 + - @1@option@1@--preserve-unreferenced@2@option@2@: preserve
  5483 + unreferenced objects in the input PDF
  5484 +
  5485 + - @1@option@1@--split-pages@2@option@2@: break output into chunks
  5486 + with fixed numbers of pages
  5487 +
  5488 + - @1@option@1@--verbose@2@option@2@: print the name of each
  5489 + output file that is created
  5490 +
  5491 + - @1@option@1@--compress-streams@2@option@2@ and
  5492 + @1@option@1@--decode-level@2@option@2@ replace
  5493 + @1@option@1@--stream-data@2@option@2@ for improving granularity
  5494 + of controlling compression and decompression of stream data.
  5495 + The @1@option@1@--stream-data@2@option@2@ option will remain
  5496 + available.
  5497 +
  5498 + - When running @1@command@1@qpdf --check@2@command@2@ with other
  5499 + options, checks are always run first. This enables qpdf to
  5500 + perform its full recovery logic before outputting other
  5501 + information. This can be especially useful when manually
  5502 + recovering broken files, looking at qpdf's regenerated cross
  5503 + reference table, or other similar operations.
  5504 +
  5505 + - Process @1@command@1@--pages@2@command@2@ earlier so that other
  5506 + options like @1@option@1@--show-pages@2@option@2@ or
  5507 + @1@option@1@--split-pages@2@option@2@ can operate on the file
  5508 + after page splitting/merging has occurred.
  5509 +
  5510 + - API Changes. All new API calls are documented in their respective
  5511 + classes' header files.
  5512 +
  5513 + - ``QPDFObjectHandle::rotatePage``: apply rotation to a page
  5514 + object
  5515 +
  5516 + - ``QPDFWriter::setNewlineBeforeEndstream``: force newline to
  5517 + appear before ``endstream``
  5518 +
  5519 + - ``QPDFWriter::setPreserveUnreferencedObjects``: preserve
  5520 + unreferenced objects that appear in the input PDF. The default
  5521 + behavior is to discard them.
  5522 +
  5523 + - New ``Pipeline`` types ``Pl_RunLength`` and ``Pl_DCT`` are
  5524 + available for developers who wish to produce or consume
  5525 + RunLength or DCT stream data directly. The
  5526 + @1@filename@1@examples/pdf-create.cc@2@filename@2@ example
  5527 + illustrates their use.
  5528 +
  5529 + - ``QPDFWriter::setCompressStreams`` and
  5530 + ``QPDFWriter::setDecodeLevel`` methods control handling of
  5531 + different types of stream compression.
  5532 +
  5533 + - Add new C API functions ``qpdf_set_compress_streams``,
  5534 + ``qpdf_set_decode_level``,
  5535 + ``qpdf_set_preserve_unreferenced_objects``, and
  5536 + ``qpdf_set_newline_before_endstream`` corresponding to the new
  5537 + ``QPDFWriter`` methods.
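
For readers unfamiliar with the RunLength encoding mentioned in the 7.0.0 notes, the sketch below decodes it following the rules in the PDF specification. It is a simplified standalone function, not the ``Pl_RunLength`` pipeline; the name `run_length_decode` is hypothetical, and truncated input is tolerated silently rather than reported.

```cpp
#include <cstddef>
#include <string>

// Decode RunLengthDecode data as described in the PDF specification:
// a length byte of 0-127 copies the next length+1 bytes literally,
// 129-255 repeats the next byte 257-length times, and 128 marks end of data.
// Hypothetical helper for illustration only.
std::string run_length_decode(std::string const& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned int length = static_cast<unsigned char>(in[i++]);
        if (length == 128) {
            break; // end-of-data marker
        }
        if (length < 128) {
            std::size_t count = length + 1;
            if (count > in.size() - i) {
                count = in.size() - i; // tolerate truncated input
            }
            out.append(in, i, count);
            i += count;
        } else if (i < in.size()) {
            out.append(static_cast<std::size_t>(257 - length), in[i++]);
        }
    }
    return out;
}
```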
  5538 +
  5539 +6.0.0: November 10, 2015
  5540 + - Implement @1@option@1@--deterministic-id@2@option@2@ command-line
  5541 + option and ``QPDFWriter::setDeterministicID`` as well as C API
  5542 + function ``qpdf_set_deterministic_ID`` for generating a
  5543 + deterministic ID for non-encrypted files. When this option is
  5544 + selected, the ID of the file depends on the contents of the output
  5545 + file, and not on transient items such as the timestamp or output
  5546 + file name.
  5547 +
  5548 + - Make qpdf more tolerant of files whose xref table entries are not
  5549 + the correct length.
  5550 +
  5551 +5.1.3: May 24, 2015
  5552 + - Bug fix: fix-qdf was not properly handling files that contained
  5553 + object streams with more than 255 objects in them.
  5554 +
  5555 + - Bug fix: qpdf was not properly initializing Microsoft's secure
  5556 + crypto provider on fresh Windows installations that had not had
  5557 + any keys created yet.
  5558 +
  5559 + - Fix a few errors found by Gynvael Coldwind and Mateusz Jurczyk of
  5560 + the Google Security Team. Please see the ChangeLog for details.
  5561 +
  5562 + - Properly handle pages that have no contents at all. There were
  5563 + many cases in which qpdf handled this fine, but a few methods
  5564 + blindly obtained page contents without handling the possibility that
  5565 + there were no contents.
  5566 +
  5567 + - Make qpdf more robust for a few more kinds of problems that may
  5568 + occur in invalid PDF files.
  5569 +
  5570 +5.1.2: June 7, 2014
  5571 + - Bug fix: linearizing files could create a corrupted output file
  5572 + under extremely unlikely file size circumstances. See ChangeLog
  5573 + for details. The odds of getting hit by this are very low, though
  5574 + one person did.
  5575 +
  5576 + - Bug fix: qpdf would fail to write files that had streams with
  5577 + decode parameters referencing other streams.
  5578 +
  5579 + - New example program: @1@command@1@pdf-split-pages@2@command@2@:
  5580 + efficiently split PDF files into individual pages. The example
  5581 + program does this more efficiently than using @1@command@1@qpdf
  5582 + --pages@2@command@2@ to do it.
  5583 +
  5584 + - Packaging fix: Visual C++ binaries did not support Windows XP.
  5585 + This has been rectified by updating the compilers used to generate
  5586 + the release binaries.
  5587 +
  5588 +5.1.1: January 14, 2014
  5589 + - Performance fix: copying foreign objects could be very slow with
  5590 + certain types of files. This was most likely to be visible during
  5591 + page splitting and was due to traversing the same objects multiple
  5592 + times in some cases.
  5593 +
  5594 +5.1.0: December 17, 2013
  5595 + - Added runtime option (``QUtil::setRandomDataProvider``) to supply
  5596 + your own random data provider. You can use this if you want to
  5597 + avoid using the OS-provided secure random number generation
  5598 + facility or stdlib's less secure version. See comments in
  5599 + include/qpdf/QUtil.hh for details.
  5600 +
  5601 + - Fixed image comparison tests to not create 12-bit-per-pixel images
  5602 + since some versions of tiffcmp have bugs in comparing them in some
  5603 + cases. This increases the disk space required by the image
  5604 + comparison tests, which are off by default anyway.
  5605 +
  5606 + - Introduce a number of small fixes for compilation on the latest
  5607 + clang in MacOS and the latest Visual C++ in Windows.
  5608 +
  5609 + - Be able to handle broken files that end the xref table header with
  5610 + a space instead of a newline.
  5611 +
  5612 +5.0.1: October 18, 2013
  5613 + - Thanks to a detailed review by Florian Weimer and the Red Hat
  5614 + Product Security Team, this release includes a number of
  5615 + non-user-visible security hardening changes. Please see the
  5616 + ChangeLog file in the source distribution for the complete list.
  5617 +
  5618 + - When available, operating system-specific secure random number
  5619 + generation is used for generating initialization vectors and other
  5620 + random values used during encryption or file creation. For the
  5621 + Windows build, this results in an added dependency on Microsoft's
  5622 + cryptography API. To disable the OS-specific cryptography and use
  5623 + the old version, pass the
  5624 + @1@option@1@--enable-insecure-random@2@option@2@ option to
  5625 + @1@command@1@./configure@2@command@2@.
  5626 +
  5627 + - The @1@command@1@qpdf@2@command@2@ command-line tool now issues a
  5628 + warning when @1@option@1@--accessibility=n@2@option@2@ is specified
  5629 + for newer encryption versions stating that the option is ignored.
  5630 + qpdf, per the spec, has always ignored this flag, but it
  5631 + previously did so silently. This warning is issued only by the
  5632 + command-line tool, not by the library. The library's handling of
  5633 + this flag is unchanged.
  5634 +
  5635 +5.0.0: July 10, 2013
  5636 + - Bug fix: previous versions of qpdf would lose objects with
  5637 + generation != 0 when generating object streams. Fixing this
  5638 + required changes to the public API.
  5639 +
  5640 + - Removed methods from public API that were only supposed to be
  5641 + called by QPDFWriter and couldn't realistically be called anywhere
  5642 + else. See ChangeLog for details.
  5643 +
  5644 + - New ``QPDFObjGen`` class added to represent an object
  5645 + ID/generation pair. ``QPDFObjectHandle::getObjGen()`` is now
  5646 + preferred over ``QPDFObjectHandle::getObjectID()`` and
  5647 + ``QPDFObjectHandle::getGeneration()`` as it makes it less likely
  5648 + for people to accidentally write code that ignores the generation
  5649 + number. See @1@filename@1@QPDF.hh@2@filename@2@ and
  5650 + @1@filename@1@QPDFObjectHandle.hh@2@filename@2@ for additional
  5651 + notes.
  5652 +
  5653 + - Add @1@option@1@--show-npages@2@option@2@ command-line option to
  5654 + the @1@command@1@qpdf@2@command@2@ command to show the number of
  5655 + pages in a file.
  5656 +
  5657 + - Allow omission of the page range within
  5658 + @1@option@1@--pages@2@option@2@ for the
  5659 + @1@command@1@qpdf@2@command@2@ command. When omitted, the page
  5660 + range is implicitly taken to be all the pages in the file.
  5661 +
  5662 + - Various enhancements were made to support different types of
  5663 + broken files or broken readers. Details can be found in
  5664 + @1@filename@1@ChangeLog@2@filename@2@.
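
The motivation for ``QPDFObjGen`` in the 5.0.0 notes can be shown with a small standalone sketch. The struct `ObjGen` below is a hypothetical stand-in, not the real class, whose interface is documented in the qpdf headers. The point is only that keying a table by object ID alone merges objects that differ in generation.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for an object ID/generation pair; the real class
// is QPDFObjGen.
struct ObjGen
{
    int obj;
    int gen;
    bool operator<(ObjGen const& rhs) const
    {
        return std::make_pair(obj, gen) < std::make_pair(rhs.obj, rhs.gen);
    }
};

// Count distinct entries when keyed by ID alone versus by the full pair.
std::pair<std::size_t, std::size_t> count_distinct()
{
    std::map<int, std::string> by_id;
    std::map<ObjGen, std::string> by_pair;
    ObjGen const objects[] = {{5, 0}, {5, 1}, {6, 0}};
    for (ObjGen const& og : objects) {
        by_id[og.obj] = "seen";   // generation ignored: 5/0 and 5/1 collide
        by_pair[og] = "seen";     // generation kept: all three are distinct
    }
    return {by_id.size(), by_pair.size()};
}
```

Here the ID-only map ends up with two entries while the pair-keyed map keeps all three, which is the kind of silent loss the combined type is meant to make less likely.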
  5665 +
  5666 +4.1.0: April 14, 2013
  5667 + - Note to people including qpdf in distributions: the
  5668 + @1@filename@1@.la@2@filename@2@ files generated by libtool are now
  5669 + installed by qpdf's @1@command@1@make install@2@command@2@ target.
  5670 + Before, they were not installed. This means that if your
  5671 + distribution does not want to include
  5672 + @1@filename@1@.la@2@filename@2@ files, you must remove them as
  5673 + part of your packaging process.
  5674 +
  5675 + - Major enhancement: API enhancements have been made to support
  5676 + parsing of content streams. This enhancement includes the
  5677 + following changes:
  5678 +
  5679 + - ``QPDFObjectHandle::parseContentStream`` method parses objects
  5680 + in a content stream and calls handlers in a callback class. The
  5681 + example
  5682 + @1@filename@1@examples/pdf-parse-content.cc@2@filename@2@
  5683 + illustrates how this may be used.
  5684 +
  5685 + - ``QPDFObjectHandle`` can now represent operators and inline
  5686 + images, object types that may only appear in content streams.
  5687 +
  5688 + - Method ``QPDFObjectHandle::getTypeCode()`` returns an
  5689 + enumerated type value representing the underlying object type.
  5690 + Method ``QPDFObjectHandle::getTypeName()`` returns a text
  5691 + string describing the name of the type of a
  5692 + ``QPDFObjectHandle`` object. These methods can be used for more
  5693 + efficient parsing and debugging/diagnostic messages.
  5694 +
  5695 + - @1@command@1@qpdf --check@2@command@2@ now parses all pages'
  5696 + content streams in addition to doing other checks. While there are
  5697 + still many types of errors that cannot be detected, syntactic
  5698 + errors in content streams will now be reported.
  5699 +
  5700 + - Minor compilation enhancements have been made to facilitate easier
  5701 + support for a broader range of compilers and compiler
  5702 + versions.
  5703 +
  5704 + - Warning flags have been moved into a separate variable in
  5705 + @1@filename@1@autoconf.mk@2@filename@2@.
  5706 +
  5707 + - The configure flag @1@option@1@--enable-werror@2@option@2@ now works
  5708 + for Microsoft compilers.
  5709 +
  5710 + - All MSVC CRT security warnings have been resolved.
  5711 +
  5712 + - All C-style casts in C++ code have been replaced by C++ casts,
  5713 + and many casts that had been included to suppress higher
  5714 + warning levels for some compilers have been removed, primarily
  5715 + for clarity. Places where integer type coercion occurs have
  5716 + been scrutinized. A new casting policy has been documented in
  5717 + the manual. This is of concern mainly to people porting qpdf to
  5718 + new platforms or compilers. It is not visible to programmers
  5719 + writing code that uses the library.
  5720 +
  5721 + - Some internal limits have been removed in code that converts
  5722 + numbers to strings. This is largely invisible to users, but it
  5723 + does trigger a bug in some older versions of mingw-w64's C++
  5724 + library. See @1@filename@1@README-windows.md@2@filename@2@ in
  5725 + the source distribution if you think this may affect you. The
  5726 + copy of the DLL distributed with qpdf's binary distribution is
  5727 + not affected by this problem.
  5728 +
  5729 + - The RPM spec file previously included with qpdf has been removed.
  5730 + This is because virtually all Linux distributions include qpdf now
  5731 + that it is a dependency of CUPS filters.
  5732 +
  5733 + - A few bug fixes are included:
  5734 +
  5735 + - Overridden compressed objects are properly handled. Before,
  5736 + there were certain constructs that could cause qpdf to see old
  5737 + versions of some objects. The most usual manifestation of this
  5738 + was loss of filled in form values for certain files.
  5739 +
  5740 + - Installation no longer uses GNU/Linux-specific versions of some
  5741 + commands, so @1@command@1@make install@2@command@2@ works on
  5742 + Solaris with native tools.
  5743 +
  5744 + - The 64-bit mingw Windows binary package no longer includes a
  5745 + 32-bit DLL.
  5746 +
  5747 +4.0.1: January 17, 2013
  5748 + - Fix detection of binary attachments in test suite to avoid false
  5749 + test failures on some platforms.
  5750 +
  5751 + - Add clarifying comment in @1@filename@1@QPDF.hh@2@filename@2@ to
  5752 + methods that return the user password explaining that it is no
  5753 + longer possible with newer encryption formats to recover the user
  5754 + password knowing the owner password. In earlier encryption
  5755 + formats, the user password was encrypted in the file using the
  5756 + owner password. In newer encryption formats, a separate encryption
  5757 + key is used on the file, and that key is independently encrypted
  5758 + using both the user password and the owner password.
  5759 +
  5760 +4.0.0: December 31, 2012
  5761 + - Major enhancement: support has been added for newer encryption
  5762 + schemes supported by version X of Adobe Acrobat. This includes use
  5763 + of 127-character passwords, 256-bit encryption keys, and the
  5764 + encryption scheme specified in ISO 32000-2, the PDF 2.0
  5765 + specification. This scheme can be chosen from the command line by
  5766 + specifying use of 256-bit keys. qpdf also supports the deprecated
  5767 + encryption method used by Acrobat IX. This encryption style has
  5768 + known security weaknesses and should not be used in practice.
  5769 + However, such files exist "in the wild," so support for this
  5770 + scheme is still useful. New methods
  5771 + ``QPDFWriter::setR6EncryptionParameters`` (for the PDF 2.0 scheme)
  5772 + and ``QPDFWriter::setR5EncryptionParameters`` (for the deprecated
  5773 + scheme) have been added to enable these new encryption schemes.
  5774 + Corresponding functions have been added to the C API as well.
  5775 +
  5776 + - Full support for Adobe extension levels in PDF version
  5777 + information. Starting with PDF version 1.7, corresponding to ISO
  5778 + 32000, Adobe adds new functionality by increasing the extension
  5779 + level rather than increasing the version. This support includes
  5780 + addition of the ``QPDF::getExtensionLevel`` method for retrieving
  5781 + the document's extension level, addition of versions of
  5782 + ``QPDFWriter::setMinimumPDFVersion`` and
  5783 + ``QPDFWriter::forcePDFVersion`` that accept an extension level,
  5784 + and extended syntax for specifying forced and minimum versions on
  5785 + the command line as described in `Advanced Transformation
  5786 + Options <#ref.advanced-transformation>`__. Corresponding functions
  5787 + have been added to the C API as well.
  5788 +
  5789 + - Minor fixes to prevent qpdf from referencing objects in the file
  5790 + that are not referenced in the file's overall structure. Most
  5791 + files don't have any such objects, but some files contain
  5792 + unreferenced objects with errors, so these fixes prevent qpdf from
  5793 + needlessly rejecting or complaining about such objects.
  5794 +
  5795 + - Add new generalized methods for reading and writing files from/to
  5796 + programmer-defined sources. The method
  5797 + ``QPDF::processInputSource`` allows the programmer to use any
  5798 + input source for the input file, and
  5799 + ``QPDFWriter::setOutputPipeline`` allows the programmer to write
  5800 + the output file through any pipeline. These methods would make it
  5801 + possible to perform any number of specialized operations, such as
  5802 + accessing external storage systems, creating bindings for qpdf in
  5803 + other programming languages that have their own I/O systems, etc.
  5804 +
  5805 + - Add new method ``QPDF::getEncryptionKey`` for retrieving the
  5806 + underlying encryption key used in the file.
  5807 +
  5808 + - This release includes a small handful of non-compatible API
  5809 + changes. While effort is made to avoid such changes, all the
  5810 + non-compatible API changes in this version were to parts of the
  5811 + API that would likely never be used outside the library itself. In
  5812 + all cases, the altered methods or structures were either parts of the
  5813 + ``QPDF`` class that were public to enable them to be called from
  5814 + ``QPDFWriter`` or were part of validation code that was
  5815 + over-zealous in reporting problems in parts of the file that would
  5816 + not ordinarily be referenced. In no case did any of the removed
  5817 + methods do anything worse than falsely report error conditions in
  5818 + files that were broken in ways that didn't matter. The following
  5819 + public parts of the ``QPDF`` class were changed in a
  5820 + non-compatible way:
  5821 +
  5822 + - Updated nested ``QPDF::EncryptionData`` class to add fields
  5823 + needed by the newer encryption formats, member variables
  5824 + changed to private so that future changes will not require
  5825 + breaking backward compatibility.
  5826 +
  5827 + - Added additional parameters to ``compute_data_key``, which is
  5828 + used by ``QPDFWriter`` to compute the encryption key used to
  5829 + encrypt a specific object.
  5830 +
  5831 + - Removed the method ``flattenScalarReferences``. This method was
  5832 + previously used prior to writing a new PDF file, but it has the
  5833 + undesired side effect of causing qpdf to read objects in the
  5834 + file that were not referenced. Some otherwise valid files have
  5835 + unreferenced objects with errors in them, so this could cause
  5836 + qpdf to reject files that would be accepted by virtually all
  5837 + other PDF readers. In fact, qpdf relied on only a very small
  5838 + part of what flattenScalarReferences did, so only this part has
  5839 + been preserved, and it is now done directly inside
  5840 + ``QPDFWriter``.
  5841 +
  5842 + - Removed the method ``decodeStreams``. This method was used by
  5843 + the @1@option@1@--check@2@option@2@ option of the
  5844 + @1@command@1@qpdf@2@command@2@ command-line tool to force all
  5845 + streams in the file to be decoded, but it also suffered from
  5846 + the problem of opening otherwise unreferenced streams and thus
  5847 + could report false positives. The
  5848 + @1@option@1@--check@2@option@2@ option now causes qpdf to go
  5849 + through all the motions of writing a new file based on the
  5850 + original one, so it will always reference and check exactly
  5851 + those parts of a file that any ordinary viewer would check.
  5852 +
  5853 + - Removed the method ``trimTrailerForWrite``. This method was
  5854 + used by ``QPDFWriter`` to modify the original QPDF object by
  5855 + removing fields from the trailer dictionary that wouldn't apply
  5856 + to the newly written file. This functionality, though generally
  5857 + harmless, was a poor implementation and has been replaced by
  5858 + having QPDFWriter filter these out when copying the trailer
  5859 + rather than modifying the original QPDF object. (Note that qpdf
  5860 + never modifies the original file itself.)
  5861 +
  5862 + - Allow the PDF header to appear anywhere in the first 1024 bytes of
  5863 + the file. This is consistent with what other readers do.
  5864 +
  5865 + - Fix the @1@command@1@pkg-config@2@command@2@ files to list zlib
  5866 + and pcre in ``Requires.private`` to better support static linking
  5867 + using @1@command@1@pkg-config@2@command@2@.
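
The relaxed header rule in the 4.0.0 notes can be sketched as a simple search. This is an illustration rather than qpdf's actual logic: the function `find_pdf_header` is hypothetical, and it assumes the whole ``%PDF-`` marker must fall within the first 1024 bytes.

```cpp
#include <cstddef>
#include <string>

// Return the offset of "%PDF-" if the marker lies within the first 1024
// bytes of the data, or std::string::npos otherwise. Hypothetical helper.
std::size_t find_pdf_header(std::string const& data)
{
    // substr clamps the length, so shorter inputs are searched in full.
    std::string const head = data.substr(0, 1024);
    return head.find("%PDF-");
}
```

A file with a few bytes of leading junk is therefore still recognized, while a marker that only appears deep inside the file is not.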
  5868 +
  5869 +3.0.2: September 6, 2012
  5870 + - Bug fix: ``QPDFWriter::setOutputMemory`` did not work when not
  5871 + used with ``QPDFWriter::setStaticID``, which made it pretty much
  5872 + useless. This has been fixed.
  5873 +
  5874 + - New API call ``QPDFWriter::setExtraHeaderText`` inserts additional
  5875 + text near the header of the PDF file. The intended use case is to
  5876 + insert comments that may be consumed by a downstream application,
  5877 + though other use cases may exist.
  5878 +
  5879 +3.0.1: August 11, 2012
  5880 + - Version 3.0.0 included addition of files for
  5881 + @1@command@1@pkg-config@2@command@2@, but this was not mentioned
  5882 + in the release notes. The release notes for 3.0.0 were updated to
  5883 + mention this.
  5884 +
  5885 + - Bug fix: if an object stream ended with a scalar object not
  5886 + followed by space, qpdf would incorrectly report that it
  5887 + encountered a premature EOF. This bug has been in qpdf since
  5888 + version 2.0.
  5889 +
  5890 +3.0.0: August 2, 2012
  5891 + - Acknowledgment: I would like to express gratitude for the
  5892 + contributions of Tobias Hoffmann toward the release of qpdf
  5893 + version 3.0. He is responsible for most of the implementation and
  5894 + design of the new API for manipulating pages, and contributed code
  5895 + and ideas for many of the improvements made in version 3.0.
  5896 + Without his work, this release would certainly not have happened
  5897 + as soon as it did, if at all.
  5898 +
  5899 + - *Non-compatible API change:* The version of
  5900 + ``QPDFObjectHandle::replaceStreamData`` that uses a
  5901 + ``StreamDataProvider`` no longer requires (or accepts) a
  5902 + ``length`` parameter. See
  5903 + `Upgrading to 3.0 <#ref.upgrading-to-3.0>`__ for an explanation.
  5904 + While care is taken to avoid non-compatible API changes in
  5905 + general, an exception was made this time because the new interface
  5906 + offers an opportunity to significantly simplify calling code.
  5907 +
  5908 + - Support has been added for large files. The test suite verifies
  5909 + support for files larger than 4 gigabytes, and manual testing has
  5910 + verified support for files larger than 10 gigabytes. Large file
  5911 + support is available for both 32-bit and 64-bit platforms as long
  5912 + as the compiler and underlying platforms support it.
  5913 +
  5914 + - Support for page selection (splitting and merging PDF files) has
  5915 + been added to the @1@command@1@qpdf@2@command@2@ command-line
  5916 + tool. See `Page Selection Options <#ref.page-selection>`__.
  5917 +
  5918 + - Options have been added to the @1@command@1@qpdf@2@command@2@
  5919 + command-line tool for copying encryption parameters from another
  5920 + file. See `Basic Options <#ref.basic-options>`__.
  5921 +
  5922 + - New methods have been added to the ``QPDF`` object for adding and
  5923 + removing pages. See `Adding and Removing
  5924 + Pages <#ref.adding-and-remove-pages>`__.
  5925 +
  5926 + - New methods have been added to the ``QPDF`` object for copying
  5927 + objects from other PDF files. See `Copying Objects From Other PDF
  5928 + Files <#ref.foreign-objects>`__.
  5929 +
  5930 + - A new method ``QPDFObjectHandle::parse`` has been added for
  5931 + constructing ``QPDFObjectHandle`` objects from a string
  5932 + description.
  5933 +
  5934 + - Methods have been added to ``QPDFWriter`` to allow writing to an
  5935 + already open stdio ``FILE*`` in addition to writing to standard
  5936 + output or a named file. Methods have been added to ``QPDF`` to be
  5937 + able to process a file from an already open stdio ``FILE*``. This
  5938 + makes it possible to read and write PDF from secure temporary
  5939 + files that have been unlinked prior to being fully read or
  5940 + written.
  5941 +
  5942 + - The ``QPDF::emptyPDF`` method can be used to allow creation of PDF files
  5943 + from scratch. The example
  5944 + @1@filename@1@examples/pdf-create.cc@2@filename@2@ illustrates how
  5945 + it can be used.
  5946 +
  5947 + - Several methods that take ``PointerHolder<Buffer>`` can now also
  5948 + accept ``std::string`` arguments.
  5949 +
  5950 + - Many new convenience methods have been added to the library, most
  5951 + in ``QPDFObjectHandle``. See @1@filename@1@ChangeLog@2@filename@2@
  5952 + for a full list.
  5953 +
  5954 + - When building on a platform that supports ELF shared libraries
  5955 + (such as Linux), symbol versions are enabled by default. They can
  5956 + be disabled by passing
  5957 + @1@option@1@--disable-ld-version-script@2@option@2@ to
  5958 + @1@command@1@./configure@2@command@2@.
  5959 +
  5960 + - The file @1@filename@1@libqpdf.pc@2@filename@2@ is now installed
  5961 + to support @1@command@1@pkg-config@2@command@2@.
  5962 +
  5963 + - Image comparison tests are off by default now since they are not
  5964 + needed to verify a correct build or port of qpdf. They are needed
  5965 + only when changing the actual PDF output generated by qpdf. You
  5966 + should enable them if you are making deep changes to qpdf itself.
  5967 + See @1@filename@1@README.md@2@filename@2@ for details.
  5968 +
  5969 + - Large file tests are off by default but can be turned on with
  5970 + @1@command@1@./configure@2@command@2@ or by setting an environment
  5971 + variable before running the test suite. See
  5972 + @1@filename@1@README.md@2@filename@2@ for details.
  5973 +
  5974 + - When qpdf's test suite fails, failures are not printed to the
  5975 + terminal anymore by default. Instead, find them in
  5976 + @1@filename@1@build/qtest.log@2@filename@2@. For packagers who are
  5977 + building with an autobuilder, you can add the
  5978 + @1@option@1@--enable-show-failed-test-output@2@option@2@ option to
  5979 + @1@command@1@./configure@2@command@2@ to restore the old behavior.
  5980 +
  5981 +2.3.1: December 28, 2011
  5982 + - Fix thread-safety problem resulting from non-thread-safe use of
  5983 + the PCRE library.
  5984 +
  5985 + - Made a few minor documentation fixes.
  5986 +
  5987 + - Add workaround for a bug that appears in some versions of
  5988 + ghostscript to the test suite.
  5989 +
  5990 + - Fix minor build issue for Visual C++ 2010.
  5991 +
  5992 +2.3.0: August 11, 2011
  5993 + - Bug fix: when preserving existing encryption on encrypted files
  5994 + with cleartext metadata, older qpdf versions would generate
  5995 + password-protected files with no valid password. This operation
  5996 + now works. This bug only affected files created by copying
  5997 + existing encryption parameters; explicit encryption with
  5998 + specification of cleartext metadata worked before and continues to
  5999 + work.
  6000 +
  6001 + - Enhance ``QPDFWriter`` with a new constructor that allows you to
  6002 + delay the specification of the output file. When using this
  6003 + constructor, you may now call ``QPDFWriter::setOutputFilename`` to
  6004 + specify the output file, or you may use
  6005 + ``QPDFWriter::setOutputMemory`` to cause ``QPDFWriter`` to write
  6006 + the resulting PDF file to a memory buffer. You may then use
  6007 + ``QPDFWriter::getBuffer`` to retrieve the memory buffer.
  6008 +
  6009 + - Add new API call ``QPDF::replaceObject`` for replacing objects by
  6010 + object ID.
  6011 +
  6012 + - Add new API call ``QPDF::swapObjects`` for swapping two objects by
  6013 + object ID.
  6014 +
  6015 + - Add ``QPDFObjectHandle::getDictAsMap`` and
  6016 + ``QPDFObjectHandle::getArrayAsVector`` to allow retrieval of
  6017 + dictionary objects as maps and array objects as vectors.
  6018 +
  6019 + - Add functions ``qpdf_get_info_key`` and ``qpdf_set_info_key`` to
  6020 + the C API for manipulating string fields of the document's
  6021 + ``/Info`` dictionary.
  6022 +
  6023 + - Add functions ``qpdf_init_write_memory``,
  6024 + ``qpdf_get_buffer_length``, and ``qpdf_get_buffer`` to the C API
  6025 + for writing PDF files to a memory buffer instead of a file.
  6026 +
  6027 +2.2.4: June 25, 2011
  6028 + - Fix installation and compilation issues; no functionality changes.
  6029 +
  6030 +2.2.3: April 30, 2011
  6031 + - Handle some damaged streams with incorrect characters following
  6032 + the stream keyword.
  6033 +
  6034 + - Improve handling of inline images when normalizing content
  6035 + streams.
  6036 +
  6037 + - Enhance error recovery to properly handle files that use object 0
  6038 + as a regular object, which is specifically disallowed by the spec.
  6039 +
  6040 +2.2.2: October 4, 2010
  6041 + - Add new function ``qpdf_read_memory`` to the C API to call
  6042 + ``QPDF::processMemoryFile``. This was an omission in qpdf 2.2.1.
  6043 +
  6044 +2.2.1: October 1, 2010
  6045 + - Add new method ``QPDF::setOutputStreams`` to replace ``std::cout``
  6046 + and ``std::cerr`` with other streams for generation of diagnostic
  6047 + messages and error messages. This can be useful for GUIs or other
  6048 + applications that want to capture any output generated by the
  6049 + library to present to the user in some other way. Note that QPDF
  6050 + does not write to ``std::cout`` (or the specified output stream)
  6051 + except where explicitly mentioned in
  6052 + @1@filename@1@QPDF.hh@2@filename@2@, and that the only use of the
  6053 + error stream is for warnings. Note also that output of warnings is
  6054 + suppressed when ``setSuppressWarnings(true)`` is called.
  6055 +
  6056 + - Add new method ``QPDF::processMemoryFile`` for operating on PDF
  6057 + files that are loaded into memory rather than in a file on disk.
  6058 +
  6059 + - Give a warning but otherwise ignore empty PDF objects by treating
  6060 + them as null. Empty objects are not permitted by the PDF
  6061 + specification but have been known to appear in some actual PDF
  6062 + files.
  6063 +
  6064 + - Handle inline image filter abbreviations when they appear as stream
  6065 + filter abbreviations. The PDF specification does not allow use of
  6066 + stream filter abbreviations in this way, but Adobe Reader and some
  6067 + other PDF readers accept them since they sometimes appear
  6068 + incorrectly in actual PDF files.
  6069 +
  6070 + - Implement miscellaneous enhancements to ``PointerHolder`` and
  6071 + ``Buffer`` to support other changes.
  6072 +
  6073 +2.2.0: August 14, 2010
  6074 + - Add new methods to ``QPDFObjectHandle`` (``newStream`` and
  6075 + ``replaceStreamData``) for creating new streams and replacing
  6076 + stream data. This makes it possible to perform a wide range of
  6077 + operations that were not previously possible.
  6078 +
  6079 + - Add new helper method in ``QPDFObjectHandle``
  6080 + (``addPageContents``) for appending or prepending new content
  6081 + streams to a page. This method makes it possible to manipulate
  6082 + content streams without having to be concerned whether a page's
  6083 + contents are a single stream or an array of streams.
  6084 +
  6085 + - Add new method in ``QPDFObjectHandle``: ``replaceOrRemoveKey``,
  6086 + which replaces a dictionary key with a given value unless the
  6087 + value is null, in which case it removes the key instead.
  6088 +
  6089 + - Add new method in ``QPDFObjectHandle``: ``getRawStreamData``,
  6090 + which returns the raw (unfiltered) stream data into a buffer. This
  6091 + complements the ``getStreamData`` method, which returns the
  6092 + filtered (uncompressed) stream data and can only be used when the
  6093 + stream's data is filterable.
  6094 +
  6095 + - Provide two new examples:
  6096 + @1@command@1@pdf-double-page-size@2@command@2@ and
  6097 + @1@command@1@pdf-invert-images@2@command@2@ that illustrate the
  6098 + newly added interfaces.
  6099 +
  6100 + - Fix a memory leak that would cause loss of a few bytes for every
  6101 + object involved in a cycle of object references. Thanks to Jian Ma
  6102 + for calling my attention to the leak.
  6103 +
  6104 +2.1.5: April 25, 2010
  6105 + - Remove restriction of file identifier strings to 16 bytes. This
  6106 + unnecessary restriction was preventing qpdf from being able to
  6107 + encrypt or decrypt files with identifier strings that were not
  6108 + exactly 16 bytes long. The specification imposes no such
  6109 + restriction.
  6110 +
  6111 +2.1.4: April 18, 2010
  6112 + - Apply the same padding calculation fix from version 2.1.2 to the
  6113 + main cross reference stream as well.
  6114 +
  6115 + - Since @1@command@1@qpdf --check@2@command@2@ only performs limited
  6116 + checks, clarify the output to make it clear that there still may
  6117 + be errors that qpdf can't check. This should make it less
  6118 + surprising to people when another PDF reader is unable to read a
  6119 + file that qpdf thinks is okay.
  6120 +
  6121 +2.1.3: March 27, 2010
  6122 + - Fix bug that could cause a failure when rewriting PDF files that
  6123 + contain object streams with unreferenced objects that in turn
  6124 + reference indirect scalars.
  6125 +
  6126 + - Don't complain about (invalid) AES streams that aren't a multiple
  6127 + of 16 bytes. Instead, pad them before decrypting.
  6128 +
  6129 +2.1.2: January 24, 2010
  6130 + - Fix bug in padding around first half cross reference stream in
  6131 + linearized files. The bug could cause an assertion failure when
  6132 + linearizing certain unlucky files.
  6133 +
  6134 +2.1.1: December 14, 2009
  6135 + - No changes in functionality; insert missing include in an internal
  6136 + library header file to support gcc 4.4, and update test suite to
  6137 + ignore broken Adobe Reader installations.
  6138 +
  6139 +2.1: October 30, 2009
  6140 + - This is the first version of qpdf to include Windows support. On
  6141 + Windows, it is possible to build a DLL. Additionally, a partial
  6142 + C-language API has been introduced, which makes it possible to
  6143 + call qpdf functions from non-C++ environments. I am very grateful
  6144 + to Žarko Gajić (http://zarko-gajic.iz.hr/) for tirelessly testing
  6145 + numerous pre-release versions of this DLL and providing many
  6146 + excellent suggestions on improving the interface.
  6147 +
  6148 + For programming to the C interface, please see the header file
  6149 + @1@filename@1@qpdf/qpdf-c.h@2@filename@2@ and the example
  6150 + @1@filename@1@examples/pdf-linearize.c@2@filename@2@.
  6151 +
  6152 + - Žarko Gajić has written a Delphi wrapper for qpdf, which can be
  6153 + downloaded from qpdf's download site. Žarko's Delphi wrapper is
  6154 + released with the same licensing terms as qpdf itself and comes
  6155 + with this disclaimer: "Delphi wrapper unit
  6156 + @1@filename@1@qpdf.pas@2@filename@2@ created by Žarko Gajić
  6157 + (http://zarko-gajic.iz.hr/). Use at your own risk and for whatever
  6158 + purpose you want. No support is provided. Sample code is
  6159 + provided."
  6160 +
  6161 + - Support has been added for AES encryption and crypt filters.
  6162 + Although qpdf does not presently support files that use PKI-based
  6163 + encryption, with the addition of AES and crypt filters, qpdf is
  6164 + now able to open most encrypted files created with newer
  6165 + versions of Acrobat or other PDF creation software. Note that I
  6166 + have not been able to get very many files encrypted in this way,
  6167 + so it's possible there could still be some cases that qpdf can't
  6168 + handle. Please report them if you find them.
  6169 +
  6170 + - Many error messages have been improved to include more information
  6171 + in hopes of making qpdf a more useful tool for PDF experts to use
  6172 + in manually recovering damaged PDF files.
  6173 +
  6174 + - Attempt to avoid compressing metadata streams if possible. This is
  6175 + consistent with other PDF creation applications.
  6176 +
  6177 + - Provide new command-line options for AES encryption, cleartext
  6178 + metadata, and setting the minimum and forced PDF versions of
  6179 + output files.
  6180 +
  6181 + - Add additional methods to the ``QPDF`` object for querying the
  6182 + document's permissions. Although qpdf does not enforce these
  6183 + permissions, it does make them available so that applications that
  6184 + use qpdf can enforce permissions.
  6185 +
  6186 + - The @1@option@1@--check@2@option@2@ option to
  6187 + @1@command@1@qpdf@2@command@2@ has been extended to include some
  6188 + additional information.
  6189 +
  6190 + - There have been a handful of non-compatible API changes. For
  6191 + details, see `Upgrading from 2.0 to 2.1 <#ref.upgrading-to-2.1>`__.
  6192 +
  6193 +2.0.6: May 3, 2009
  6194 + - Do not attempt to uncompress streams that have decode parameters
  6195 + we don't recognize. Earlier versions of qpdf would have rejected
  6196 + files with such streams.
  6197 +
  6198 +2.0.5: March 10, 2009
  6199 + - Improve error handling in the LZW decoder, and fix a small error
  6200 + introduced in the previous version with regard to handling full
  6201 + tables. The LZW decoder has been more strongly verified in this
  6202 + release.
  6203 +
  6204 +2.0.4: February 21, 2009
  6205 + - Include proper support for LZW streams encoded without the "early
  6206 + code change" flag. Special thanks to Atom Smasher who reported the
  6207 + problem and provided an input file compressed in this way, which I
  6208 + did not previously have.
  6209 +
  6210 + - Implement some improvements to file recovery logic.
  6211 +
  6212 +2.0.3: February 15, 2009
  6213 + - Compile cleanly with gcc 4.4.
  6214 +
  6215 + - Handle strings encoded as UTF-16BE properly.
  6216 +
  6217 +2.0.2: June 30, 2008
  6218 + - Update test suite to work properly with a
  6219 + non-@1@command@1@bash@2@command@2@
  6220 + @1@filename@1@/bin/sh@2@filename@2@ and with Perl 5.10. No changes
  6221 + were made to the actual qpdf source code itself for this release.
  6222 +
  6223 +2.0.1: May 6, 2008
  6224 + - No changes in functionality or interface. This release includes
  6225 + fixes to the source code so that qpdf compiles properly and passes
  6226 + its test suite on a broader range of platforms. See
  6227 + @1@filename@1@ChangeLog@2@filename@2@ in the source distribution
  6228 + for details.
  6229 +
  6230 +2.0: April 29, 2008
  6231 + - First public release.
  6232 +
  6233 +.. _ref.upgrading-to-2.1:
  6234 +
  6235 +Upgrading from 2.0 to 2.1
  6236 +=========================
  6237 +
  6238 +Although, as a general rule, we like to avoid introducing source-level
  6239 +incompatibilities in qpdf's interface, there were a few non-compatible
  6240 +changes made in this version. A considerable amount of source code that
  6241 +uses qpdf will probably compile without any changes, but in some cases,
  6242 +you may have to update your code. The changes are enumerated here. There
  6243 +are also some new interfaces; for those, please refer to the header
  6244 +files.
  6245 +
  6246 +- QPDF's exception handling mechanism now uses ``std::logic_error`` for
  6247 + internal errors and ``std::runtime_error`` for runtime errors in
  6248 + favor of the now removed ``QEXC`` classes used in previous versions.
  6249 + The ``QEXC`` exception classes predated the addition of the
  6250 + @1@filename@1@<stdexcept>@2@filename@2@ header file to the C++
  6251 + standard library. Most of the exceptions thrown by the qpdf library
  6252 + itself are still of type ``QPDFExc``, which is now derived from
  6253 + ``std::runtime_error``. Programs that caught an instance of
  6254 + ``std::exception`` and displayed it by calling the ``what()`` method
  6255 + will not need to be changed.
  6256 +
  6257 +- The ``QPDFExc`` class now internally represents various fields of the
  6258 + error condition and provides interfaces for querying them. Among the
  6259 + fields is a numeric error code that can help applications act
  6260 + differently on (a small number of) different error conditions. See
  6261 + @1@filename@1@QPDFExc.hh@2@filename@2@ for details.
  6262 +
  6263 +- Warnings can be retrieved from qpdf as instances of ``QPDFExc``
  6264 + instead of strings.
  6265 +
  6266 +- The nested ``QPDF::EncryptionData`` class's constructor takes an
  6267 + additional argument. This class is primarily intended to be used by
  6268 + ``QPDFWriter``. There's not really anything useful an end-user
  6269 + application could do with it. It probably shouldn't really be part of
  6270 + the public interface to begin with. Likewise, some of the methods for
  6271 + computing internal encryption dictionary parameters have changed to
  6272 + support ``/R=4`` encryption.
  6273 +
  6274 +- The method ``QPDF::getUserPassword`` has been removed since it didn't
  6275 + do what people would think it did. There are now two new methods:
  6276 + ``QPDF::getPaddedUserPassword`` and ``QPDF::getTrimmedUserPassword``.
  6277 + The first one does what the old ``QPDF::getUserPassword`` method used
  6278 + to do, which is to return the password with possible binary padding
  6279 + as specified by the PDF specification. The second one returns a
  6280 + human-readable password string.
  6281 +
  6282 +- The enumerated types that used to be nested in ``QPDFWriter`` have
  6283 + moved to top-level enumerated types and are now defined in the file
  6284 + @1@filename@1@qpdf/Constants.h@2@filename@2@. This enables them to be
  6285 + shared by both the C and C++ interfaces.
  6286 +
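The exception-handling change described in the list above can be illustrated with a small sketch. Because ``QPDFExc`` now derives from ``std::runtime_error``, code that catches ``std::exception`` and calls ``what()`` should keep working unchanged. The class ``DemoExc`` and the function ``describe_failure`` below are hypothetical stand-ins introduced only for illustration; they are not part of qpdf and assume nothing beyond the inheritance relationship stated above.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a library exception derived from
// std::runtime_error. This is NOT the real QPDFExc class.
class DemoExc : public std::runtime_error
{
  public:
    explicit DemoExc(std::string const& message) :
        std::runtime_error(message)
    {
    }
};

// Catch through the std::exception base class and return the message.
// This is the calling pattern that needs no changes after the upgrade.
std::string describe_failure()
{
    std::string message;
    try {
        throw DemoExc("damaged file");
    } catch (std::exception const& e) {
        message = e.what();
    }
    return message;
}
```

Code that previously caught the removed ``QEXC`` classes by name would instead need to catch ``std::logic_error``, ``std::runtime_error``, or ``QPDFExc``.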
  6287 +.. _ref.upgrading-to-3.0:
  6288 +
  6289 +Upgrading to 3.0
  6290 +================
  6291 +
  6292 +For the most part, the API for qpdf version 3.0 is backward compatible
  6293 +with versions 2.1 and later. There are two exceptions:
  6294 +
  6295 +- The method ``QPDFObjectHandle::replaceStreamData`` that uses a
  6296 + ``StreamDataProvider`` to provide the stream data no longer takes a
  6297 + ``length`` parameter. While it would have been easy enough to keep
  6298 + the parameter for backward compatibility, in this case, the parameter
  6299 + was removed since this provides the user an opportunity to simplify
  6300 + the calling code. This method was introduced in version 2.2. At the
  6301 + time, the ``length`` parameter was required in order to ensure that
  6302 + calls to the stream data provider returned the same length for a
  6303 + specific stream every time they were invoked. In particular, the
  6304 + linearization code depends on this. Instead, qpdf 3.0 and newer check
  6305 + for that constraint explicitly. The first time the stream data
  6306 + provider is called for a specific stream, the actual length is saved,
  6307 + and subsequent calls are required to return the same number of bytes.
  6308 + This means the calling code no longer has to compute the length in
  6309 + advance, which can be a significant simplification. If your code
  6310 + fails to compile because of the extra argument and you don't want to
  6311 + make other changes to your code, just omit the argument.
  6312 +
  6313 +- Many methods take ``long long`` instead of other integer types. Most
  6314 + if not all existing code should compile fine with this change since
  6315 + such parameters had always previously been smaller types. This change
  6316 + was required to support files larger than two gigabytes in size.
  6317 +
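The length constraint described for ``replaceStreamData`` above (save the actual length on the first call, require the same number of bytes on later calls) can be sketched roughly as follows. ``LengthChecker`` and its integer ``stream_id`` key are hypothetical names chosen for this illustration; the sketch shows the idea only and is not qpdf's internal implementation.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>

// Hypothetical helper, not part of qpdf: remembers how many bytes a
// provider produced for each stream the first time it was called and
// rejects later calls that produce a different length.
class LengthChecker
{
  public:
    // stream_id is an illustrative key identifying one stream.
    void check(int stream_id, std::size_t length)
    {
        auto iter = this->first_length.find(stream_id);
        if (iter == this->first_length.end()) {
            // First call for this stream: save the actual length.
            this->first_length[stream_id] = length;
        } else if (iter->second != length) {
            // Later call disagrees with the saved length.
            throw std::logic_error(
                "stream data provider returned a different length");
        }
    }

  private:
    std::map<int, std::size_t> first_length;
};
```

With a check of this kind in place, the caller no longer has to compute the length in advance; it only has to return the same data each time it is asked.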
  6318 +.. _ref.upgrading-to-4.0:
  6319 +
  6320 +Upgrading to 4.0
  6321 +================
  6322 +
  6323 +While version 4.0 includes a few non-compatible API changes, it is very
  6324 +unlikely that anyone's code would have used any of those parts of the
  6325 +API since they generally required information that would only be
  6326 +available inside the library. In the unlikely event that you should run
  6327 +into trouble, please see the ChangeLog. See also
  6328 +`appendix_title <#ref.release-notes>`__ for a complete list of the
  6329 +non-compatible API changes made in this version.
  6330 +
  6331 +
  6332 +
9 6333 Indices and tables
10 6334 ==================
11 6335  
... ...