Commit abb6a16ed16b6137b829bc88a6f2b8b3b6c8cf35
1 parent
22d53f92
Insert output of pandoc as is
pandoc -f docbook -t rst qpdf-manual.xml >| /tmp/a.rst
Insert /tmp/a.rst into existing index.rst
Showing 1 changed file with 6324 additions and 0 deletions
manual/index.rst
| ... | ... | @@ -6,6 +6,6330 @@ QPDF version |release| |
| 6 | 6 | :maxdepth: 2 |
| 7 | 7 | :caption: Contents: |
| 8 | 8 | |
| 9 | +.. _acknowledgments: | |
| 10 | + | |
| 11 | +General Information | |
| 12 | +=================== | |
| 13 | + | |
| 14 | +QPDF is a program that does structural, content-preserving | |
| 15 | +transformations on PDF files. QPDF's website is located at | |
| 16 | +https://qpdf.sourceforge.io/. QPDF's source code is hosted on GitHub at | |
| 17 | +https://github.com/qpdf/qpdf. | |
| 18 | + | |
| 19 | +QPDF is licensed under `the Apache License, Version | |
| 20 | +2.0 <http://www.apache.org/licenses/LICENSE-2.0>`__ (the "License"). | |
| 21 | +Unless required by applicable law or agreed to in writing, software | |
| 22 | +distributed under the License is distributed on an "AS IS" BASIS, | |
| 23 | +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | |
| 24 | +See the License for the specific language governing permissions and | |
| 25 | +limitations under the License. | |
| 26 | + | |
| 27 | +Versions of qpdf prior to version 7 were released under the terms of | |
| 28 | +`the Artistic License, version | |
| 29 | +2.0 <https://opensource.org/licenses/Artistic-2.0>`__. At your option, | |
| 30 | +you may continue to consider qpdf to be licensed under those terms. The | |
| 31 | +Apache License 2.0 permits everything that the Artistic License 2.0 | |
| 32 | +permits but is slightly less restrictive. Allowing the Artistic License | |
| 33 | +to continue being used is primarily to help people who may have to get | |
| 34 | +specific approval to use qpdf in their products. | |
| 35 | + | |
| 36 | +QPDF is intentionally released with a permissive license. However, if | |
| 37 | +there is some reason that the licensing terms don't work for your | |
| 38 | +requirements, please feel free to contact the copyright holder to make | |
| 39 | +other arrangements. | |
| 40 | + | |
| 41 | +QPDF was originally created in 2001 and modified periodically between | |
| 42 | +2001 and 2005 during my employment at `Apex | |
| 43 | +CoVantage <http://www.apexcovantage.com>`__. Upon my departure from | |
| 44 | +Apex, the company graciously allowed me to take ownership of the | |
| 45 | +software and continue maintaining it as an open source project, a decision | |
| 46 | +for which I am very grateful. I have made considerable enhancements to | |
| 47 | +it since that time. I feel fortunate to have worked for people who would | |
| 48 | +make such a decision. This work would not have been possible without | |
| 49 | +their support. | |
| 50 | + | |
| 51 | +.. _ref.overview: | |
| 52 | + | |
| 53 | +What is QPDF? | |
| 54 | +============= | |
| 55 | + | |
| 56 | +QPDF is a program that does structural, content-preserving | |
| 57 | +transformations on PDF files. It could have been called something like | |
| 58 | +*pdf-to-pdf*. It also provides many useful capabilities to developers of | |
| 59 | +PDF-producing software or for people who just want to look at the | |
| 60 | +innards of a PDF file to learn more about how they work. | |
| 61 | + | |
| 62 | +With QPDF, it is possible to copy objects from one PDF file into another | |
| 63 | +and to manipulate the list of pages in a PDF file. This makes it | |
| 64 | +possible to merge and split PDF files. The QPDF library also makes it | |
| 65 | +possible for you to create PDF files from scratch. In this mode, you are | |
| 66 | +responsible for supplying all the contents of the file, while the QPDF | |
| 67 | +library takes care of all the syntactical representation of the | |
| 68 | +objects, creation of cross-reference tables and, if you use them, | |
| 69 | +object streams, encryption, linearization, and other syntactic details. | |
| 70 | +You are still responsible for generating PDF content on your own. | |
| 71 | + | |
| 72 | +QPDF has been designed with very few external dependencies, and it is | |
| 73 | +intentionally very lightweight. QPDF is *not* a PDF content creation | |
| 74 | +library, a PDF viewer, or a program capable of converting PDF into other | |
| 75 | +formats. In particular, QPDF knows nothing about the semantics of PDF | |
| 76 | +content streams. If you are looking for something that can do that, you | |
| 77 | +should look elsewhere. However, once you have a valid PDF file, QPDF can | |
| 78 | +be used to transform that file in ways that perhaps your original PDF | |
| 79 | +creation software can't handle. For example, many programs generate simple PDF | |
| 80 | +files but can't password-protect them, web-optimize them, or perform | |
| 81 | +other transformations of that type. | |
| 82 | + | |
| 83 | +.. _ref.installing: | |
| 84 | + | |
| 85 | +Building and Installing QPDF | |
| 86 | +============================ | |
| 87 | + | |
| 88 | +This chapter describes how to build and install qpdf. Please see also | |
| 89 | +the @1@filename@1@README.md@2@filename@2@ and | |
| 90 | +@1@filename@1@INSTALL@2@filename@2@ files in the source distribution. | |
| 91 | + | |
| 92 | +.. _ref.prerequisites: | |
| 93 | + | |
| 94 | +System Requirements | |
| 95 | +------------------- | |
| 96 | + | |
| 97 | +The qpdf package has few external dependencies. In order to build qpdf, | |
| 98 | +the following packages are required: | |
| 99 | + | |
| 100 | +- A C++ compiler that supports C++14. | |
| 101 | + | |
| 102 | +- zlib: http://www.zlib.net/ | |
| 103 | + | |
| 104 | +- jpeg: http://www.ijg.org/files/ or https://libjpeg-turbo.org/ | |
| 105 | + | |
| 106 | +- *Recommended but not required:* gnutls: https://www.gnutls.org/ to be | |
| 107 | + able to use the gnutls crypto provider, and/or openssl: | |
| 108 | + https://openssl.org/ to be able to use the openssl crypto provider. | |
| 109 | + | |
| 110 | +- GNU make 3.81 or newer: http://www.gnu.org/software/make | |
| 111 | + | |
| 112 | +- perl version 5.8 or newer: http://www.perl.org/; required for running | |
| 113 | + the test suite. Starting with qpdf version 9.1.1, perl is no longer | |
| 114 | + required at runtime. | |
| 115 | + | |
| 116 | +- GNU diffutils (any version): http://www.gnu.org/software/diffutils/ | |
| 117 | + is required to run the test suite. Note that this is the version of | |
| 118 | + diff present on virtually all GNU/Linux systems. This is required | |
| 119 | + because the test suite uses @1@command@1@diff -u@2@command@2@. | |
| 120 | + | |
| 121 | +Part of qpdf's test suite does comparisons of the contents of PDF files by | |
| 122 | +converting them to images and comparing the images. The image comparison | |
| 123 | +tests are disabled by default. Those tests are not required for | |
| 124 | +determining correctness of a qpdf build if you have not modified the | |
| 125 | +code since the test suite also contains expected output files that are | |
| 126 | +compared literally. The image comparison tests provide an extra check to | |
| 127 | +make sure that any content transformations don't break the rendering of | |
| 128 | +pages. Transformations that affect the content streams themselves are | |
| 129 | +off by default and are only provided to help developers look into the | |
| 130 | +contents of PDF files. If you are making deep changes to the library | |
| 131 | +that cause changes in the contents of the files that qpdf generates, | |
| 132 | +then you should enable the image comparison tests. Enable them by | |
| 133 | +running @1@command@1@configure@2@command@2@ with the | |
| 134 | +@1@option@1@--enable-test-compare-images@2@option@2@ flag. If you enable | |
| 135 | +this, the following additional packages are required by the test | |
| 136 | +suite. Note that in no case are these items required to use qpdf. | |
| 137 | + | |
| 138 | +- libtiff: http://www.remotesensing.org/libtiff/ | |
| 139 | + | |
| 140 | +- GhostScript version 8.60 or newer: http://www.ghostscript.com | |
| 141 | + | |
| 142 | +If you do not enable this, then you do not need to have tiff and | |
| 143 | +ghostscript. | |
| 144 | + | |
| 145 | +Pre-built documentation is distributed with qpdf, so you should | |
| 146 | +generally not need to rebuild the documentation. In order to build the | |
| 147 | +documentation from its docbook sources, you need the docbook XML style | |
| 148 | +sheets (http://downloads.sourceforge.net/docbook/). To build the PDF | |
| 149 | +version of the documentation, you need Apache fop | |
| 150 | +(http://xml.apache.org/fop/) version 0.94 or higher. | |
| 151 | + | |
| 152 | +.. _ref.building: | |
| 153 | + | |
| 154 | +Build Instructions | |
| 155 | +------------------ | |
| 156 | + | |
| 157 | +Building qpdf on UNIX is generally just a matter of running | |
| 158 | + | |
| 159 | +:: | |
| 160 | + | |
| 161 | + ./configure | |
| 162 | + make | |
| 163 | + | |
| 164 | +You can also run @1@command@1@make check@2@command@2@ to run the test | |
| 165 | +suite and @1@command@1@make install@2@command@2@ to install. Please run | |
| 166 | +@1@command@1@./configure --help@2@command@2@ for options on what can be | |
| 167 | +configured. You can also set the value of ``DESTDIR`` during | |
| 168 | +installation to install to a temporary location, as is common with many | |
| 169 | +open source packages. Please see also the | |
| 170 | +@1@filename@1@README.md@2@filename@2@ and | |
| 171 | +@1@filename@1@INSTALL@2@filename@2@ files in the source distribution. | |
| 172 | + | |
| 173 | +Building on Windows is a little bit more complicated. For details, | |
| 174 | +please see @1@filename@1@README-windows.md@2@filename@2@ in the source | |
| 175 | +distribution. You can also download a binary distribution for Windows. | |
| 176 | +There is a port of qpdf to Visual C++ version 6 in the | |
| 177 | +@1@filename@1@contrib@2@filename@2@ area generously contributed by Jian | |
| 178 | +Ma. This is also discussed in more detail in | |
| 179 | +@1@filename@1@README-windows.md@2@filename@2@. | |
| 180 | + | |
| 181 | +While ``wchar_t`` is part of the C++ standard, qpdf uses it in only one | |
| 182 | +place in the public API, and it's just in a helper function. It is | |
| 183 | +possible to build qpdf on a system that doesn't have ``wchar_t``, and | |
| 184 | +it's also possible to compile a program that uses qpdf on a system | |
| 185 | +without ``wchar_t`` as long as you don't call that one method. This is a | |
| 186 | +very unusual situation. For a detailed discussion, please see the | |
| 187 | +top-level README.md file in qpdf's source distribution. | |
| 188 | + | |
| 189 | +There are some other things you can do with the build. Although qpdf | |
| 190 | +uses @1@application@1@autoconf@2@application@2@, it does not use | |
| 191 | +@1@application@1@automake@2@application@2@ but instead uses a | |
| 192 | +hand-crafted non-recursive Makefile that requires GNU make. If you're | |
| 193 | +really interested, please read the comments in the top-level | |
| 194 | +@1@filename@1@Makefile@2@filename@2@. | |
| 195 | + | |
| 196 | +.. _ref.crypto: | |
| 197 | + | |
| 198 | +Crypto Providers | |
| 199 | +---------------- | |
| 200 | + | |
| 201 | +Starting with qpdf 9.1.0, the qpdf library can be built with multiple | |
| 202 | +implementations of providers of cryptographic functions, which we refer | |
| 203 | +to as "crypto providers." At the time of writing, a crypto | |
| 204 | +implementation must provide MD5 and SHA2 (256, 384, and 512-bit) hashes | |
| 205 | +and RC4 and AES256 with and without CBC encryption. In the future, if | |
| 206 | +digital signature support is added to qpdf, there may be additional requirements | |
| 207 | +beyond this. | |
| 208 | + | |
| 209 | +Starting with qpdf version 9.1.0, the available implementations are | |
| 210 | +``native`` and ``gnutls``. In qpdf 10.0.0, ``openssl`` was added. | |
| 211 | +Additional implementations may be added if needed. It is also possible | |
| 212 | +for a developer to provide their own implementation without modifying | |
| 213 | +the qpdf library. | |
| 214 | + | |
| 215 | +.. _ref.crypto.build: | |
| 216 | + | |
| 217 | +Build Support For Crypto Providers | |
| 218 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
| 219 | + | |
| 220 | +When building with qpdf's build system, crypto providers can be enabled | |
| 221 | +at build time using various @1@command@1@./configure@2@command@2@ | |
| 222 | +options. The default behavior is for | |
| 223 | +@1@command@1@./configure@2@command@2@ to discover which crypto providers | |
| 224 | +can be supported based on available external libraries, to build all | |
| 225 | +available crypto providers, and to use an external provider as the | |
| 226 | +default over the native one. This behavior can be changed with the | |
| 227 | +following flags to @1@command@1@./configure@2@command@2@: | |
| 228 | + | |
| 229 | +- @1@option@1@--enable-crypto-@1@replaceable@1@x@2@replaceable@2@@2@option@2@ | |
| 230 | + (where @1@replaceable@1@x@2@replaceable@2@ is a supported crypto | |
| 231 | + provider): enable the @1@replaceable@1@x@2@replaceable@2@ crypto | |
| 232 | + provider, requiring any external dependencies it needs | |
| 233 | + | |
| 234 | +- @1@option@1@--disable-crypto-@1@replaceable@1@x@2@replaceable@2@@2@option@2@: | |
| 235 | + disable the @1@replaceable@1@x@2@replaceable@2@ provider, and do not | |
| 236 | + link against its dependencies even if they are available | |
| 237 | + | |
| 238 | +- @1@option@1@--with-default-crypto=@1@replaceable@1@x@2@replaceable@2@@2@option@2@: | |
| 239 | + make @1@replaceable@1@x@2@replaceable@2@ the default provider even if | |
| 240 | + a higher priority one is available | |
| 241 | + | |
| 242 | +- @1@option@1@--disable-implicit-crypto@2@option@2@: only build crypto | |
| 243 | + providers that are explicitly requested with an | |
| 244 | + @1@option@1@--enable-crypto-@1@replaceable@1@x@2@replaceable@2@@2@option@2@ | |
| 245 | + option | |
| 246 | + | |
| 247 | +For example, if you want to guarantee that the gnutls crypto provider is | |
| 248 | +used and that the native provider is not built, you could run | |
| 249 | +@1@command@1@./configure --enable-crypto-gnutls | |
| 250 | +--disable-implicit-crypto@2@command@2@. | |
| 251 | + | |
| 252 | +If you build qpdf using your own build system, in order for qpdf to work | |
| 253 | +at all, you need to enable at least one crypto provider. The file | |
| 254 | +@1@filename@1@libqpdf/qpdf/qpdf-config.h.in@2@filename@2@ provides | |
| 255 | +macros ``DEFAULT_CRYPTO``, whose value must be a string naming the | |
| 256 | +default crypto provider, and various symbols starting with | |
| 257 | +``USE_CRYPTO_``, at least one of which has to be enabled. Additionally, | |
| 258 | +you must compile the source files that implement a crypto provider. To | |
| 259 | +get a list of those files, look at | |
| 260 | +@1@filename@1@libqpdf/build.mk@2@filename@2@. If you want to omit a | |
| 261 | +particular crypto provider, as long as its ``USE_CRYPTO_`` symbol is | |
| 262 | +undefined, you can completely ignore the source files that belong to | |
| 263 | +that provider. Additionally, crypto providers may have | |
| 264 | +their own external dependencies that can be omitted if the crypto | |
| 265 | +provider is not used. For example, if you are building qpdf yourself and | |
| 266 | +are using an environment that does not support gnutls or openssl, you | |
| 267 | +can ensure that ``USE_CRYPTO_NATIVE`` is defined, ``USE_CRYPTO_GNUTLS`` | |
| 268 | +is not defined, and ``DEFAULT_CRYPTO`` is defined to ``"native"``. Then | |
| 269 | +you must include the source files used in the native implementation, | |
| 270 | +some of which were added or renamed from earlier versions, to your | |
| 271 | +build, and you can ignore | |
| 272 | +@1@filename@1@QPDFCrypto_gnutls.cc@2@filename@2@. Always consult | |
| 273 | +@1@filename@1@libqpdf/build.mk@2@filename@2@ to get the list of source | |
| 274 | +files you need to build. | |
| 275 | + | |
| 276 | +.. _ref.crypto.runtime: | |
| 277 | + | |
| 278 | +Runtime Crypto Provider Selection | |
| 279 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
| 280 | + | |
| 281 | +You can use the @1@option@1@--show-crypto@2@option@2@ option to | |
| 282 | +@1@command@1@qpdf@2@command@2@ to get a list of available crypto | |
| 283 | +providers. The default provider is always listed first, and the rest are | |
| 284 | +listed in lexical order. Each crypto provider is listed on a line by | |
| 285 | +itself with no other text, enabling the output of this command to be | |
| 286 | +used easily in scripts. | |
| 287 | + | |
| 288 | +You can override which crypto provider is used by setting the | |
| 289 | +``QPDF_CRYPTO_PROVIDER`` environment variable. There are few reasons to | |
| 290 | +ever do this, but you might want to do it if you were explicitly trying | |
| 291 | +to compare behavior of two different crypto providers while testing | |
| 292 | +performance or reproducing a bug. It could also be useful for people who | |
| 293 | +are implementing their own crypto providers. | |
| 294 | + | |
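Note on the runtime selection text above: the following is a minimal shell sketch of how the one-provider-per-line output of ``--show-crypto`` might be consumed in a script, and how ``QPDF_CRYPTO_PROVIDER`` could then be set for each provider in turn. The helper names ``default_provider`` and ``other_providers`` and the sample provider list are illustrative assumptions, not part of qpdf. Only the ``--show-crypto`` and ``--version`` options and the environment variable come from the manual text.

```shell
#!/bin/sh
# Hypothetical helpers: read a provider list (one name per line) on stdin.
# The first line is the default provider, as described in the text above.
default_provider() {
    head -n 1
}

# Everything after the first line: the non-default providers.
other_providers() {
    tail -n +2
}

# Illustrative sample only; real output depends on how qpdf was built.
sample='gnutls
native
openssl'

printf '%s\n' "$sample" | default_provider
printf '%s\n' "$sample" | other_providers

# With a real qpdf on PATH, run a command once per available provider.
if command -v qpdf >/dev/null 2>&1; then
    for p in $(qpdf --show-crypto); do
        echo "provider: $p"
        QPDF_CRYPTO_PROVIDER="$p" qpdf --version >/dev/null \
            || echo "provider $p failed"
    done
fi
```

The loop relies on each provider name being a single word on its own line, which is the property the text says makes the output easy to use in scripts.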
| 295 | +.. _ref.crypto.develop: | |
| 296 | + | |
| 297 | +Crypto Provider Information for Developers | |
| 298 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
| 299 | + | |
| 300 | +If you are writing code that uses libqpdf and you want to force a | |
| 301 | +certain crypto provider to be used, you can call the method | |
| 302 | +``QPDFCryptoProvider::setDefaultProvider``. The argument is the name of | |
| 303 | +a built-in or developer-supplied provider. To add your own crypto | |
| 304 | +provider, you have to create a class derived from ``QPDFCryptoImpl`` and | |
| 305 | +register it with ``QPDFCryptoProvider``. For additional information, see | |
| 306 | +comments in @1@filename@1@include/qpdf/QPDFCryptoImpl.hh@2@filename@2@. | |
| 307 | + | |
| 308 | +.. _ref.crypto.design: | |
| 309 | + | |
| 310 | +Crypto Provider Design Notes | |
| 311 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
| 312 | + | |
| 313 | +This section describes a few bits of rationale for why the crypto | |
| 314 | +provider interface was set up the way it was. You don't need to know any | |
| 315 | +of this information, but it's provided for the record and in case it's | |
| 316 | +interesting. | |
| 317 | + | |
| 318 | +As a general rule, I want to avoid as much as possible including large | |
| 319 | +blocks of code that are conditionally compiled such that, in most | |
| 320 | +builds, some code is never built. This is dangerous because it makes it | |
| 321 | +very easy for invalid code to creep in unnoticed. As such, I want it to | |
| 322 | +be possible to build qpdf with all available crypto providers, and this | |
| 323 | +is the way I build qpdf for local development. At the same time, if a | |
| 324 | +particular packager feels that it is a security liability for qpdf to | |
| 325 | +use crypto functionality from anything other than a library that gets | |
| 326 | +considerable scrutiny for this specific purpose (such as gnutls, | |
| 327 | +openssl, or nettle), then I want to give that packager the ability to | |
| 328 | +completely disable qpdf's native implementation. Or if someone wants to | |
| 329 | +avoid adding a dependency on one of the external crypto providers, I | |
| 330 | +don't want the availability of the provider to impose additional | |
| 331 | +external dependencies within that environment. Both of these are | |
| 332 | +situations that I know to be true for some users of qpdf. | |
| 333 | + | |
| 334 | +I want registration and selection of crypto providers to be thread-safe, | |
| 335 | +and I want it to work deterministically for a developer to provide their | |
| 336 | +own crypto provider and be able to set it up as the default. This was | |
| 337 | +the primary motivation behind requiring C++11 as doing so enabled me to | |
| 338 | +exploit the guaranteed thread safety of local block static | |
| 339 | +initialization. The ``QPDFCryptoProvider`` class uses a singleton | |
| 340 | +pattern with thread-safe initialization to create the singleton instance | |
| 341 | +of ``QPDFCryptoProvider`` and exposes only static methods in its public | |
| 342 | +interface. In this way, if a developer wants to call any | |
| 343 | +``QPDFCryptoProvider`` methods, the library guarantees the | |
| 344 | +``QPDFCryptoProvider`` is fully initialized and all built-in crypto | |
| 345 | +providers are registered. Making ``QPDFCryptoProvider`` actually know | |
| 346 | +about all the built-in providers may seem a bit sad at first, but this | |
| 347 | +choice makes it extremely clear exactly what the initialization behavior | |
| 348 | +is. There's no question about provider implementations automatically | |
| 349 | +registering themselves in a nondeterministic order. It also means that | |
| 350 | +implementations do not need to know anything about the provider | |
| 351 | +interface, which makes them easier to test in isolation. Another | |
| 352 | +advantage of this approach is that a developer who wants to develop | |
| 353 | +their own crypto provider can do so in complete isolation from the qpdf | |
| 354 | +library and, with just two calls, can make qpdf use their provider in | |
| 355 | +their application. If they decided to contribute their code, plugging it | |
| 356 | +into the qpdf library would require a very small change to qpdf's source | |
| 357 | +code. | |
| 358 | + | |
| 359 | +The decision to make the crypto provider selectable at runtime was one I | |
| 360 | +struggled with a little, but I decided to do it for various reasons. | |
| 361 | +Allowing an end user to switch crypto providers easily could be very | |
| 362 | +useful for reproducing a potential bug. If a user reports a bug that | |
| 363 | +some cryptographic thing is broken, I can easily ask that person to try | |
| 364 | +with the ``QPDF_CRYPTO_PROVIDER`` variable set to different values. The | |
| 365 | +same could apply in the event of a performance problem. This also makes | |
| 366 | +it easier for qpdf's own test suite to exercise code with different | |
| 367 | +providers without having to make every program that links with qpdf | |
| 368 | +aware of the possibility of multiple providers. In qpdf's continuous | |
| 369 | +integration environment, the entire test suite is run for each supported | |
| 370 | +crypto provider. This is made simple by being able to select the | |
| 371 | +provider using an environment variable. | |
| 372 | + | |
| 373 | +Finally, making crypto providers selectable in this way establishes a | |
| 374 | +pattern that I may follow again in the future for stream filter | |
| 375 | +providers. One could imagine a future enhancement where someone could | |
| 376 | +provide their own implementations for basic filters like | |
| 377 | +``/FlateDecode`` or for other filters that qpdf doesn't support. | |
| 378 | +Implementing the registration functions and internal storage of | |
| 379 | +registered providers was also easier using C++11's functional | |
| 380 | +interfaces, which was another reason to require C++11 at this time. | |
| 381 | + | |
| 382 | +.. _ref.packaging: | |
| 383 | + | |
| 384 | +Notes for Packagers | |
| 385 | +------------------- | |
| 386 | + | |
| 387 | +If you are packaging qpdf for an operating system distribution, here are | |
| 388 | +some things you may want to keep in mind: | |
| 389 | + | |
| 390 | +- Starting in qpdf version 9.1.1, qpdf no longer has a runtime | |
| 391 | + dependency on perl. This is because fix-qdf was rewritten in C++. | |
| 392 | + However, qpdf still has a build-time dependency on perl. | |
| 393 | + | |
| 394 | +- Make sure you are getting the intended behavior with regard to crypto | |
| 395 | + providers. Read `Build Support For Crypto | |
| 396 | + Providers <#ref.crypto.build>`__ for details. | |
| 397 | + | |
| 398 | +- Passing @1@option@1@--enable-show-failed-test-output@2@option@2@ to | |
| 399 | + @1@command@1@./configure@2@command@2@ will cause any failed test | |
| 400 | + output to be written to the console. This can be very useful for | |
| 401 | + seeing test failures generated by autobuilders where you can't access | |
| 402 | + qtest.log after the fact. | |
| 403 | + | |
| 404 | +- If qpdf's build environment detects the presence of autoconf and | |
| 405 | + related tools, it will check to ensure that automatically generated | |
| 406 | + files are up-to-date with recorded checksums and fail if it detects a | |
| 407 | + discrepancy. This feature is intended to prevent you from | |
| 408 | + accidentally forgetting to regenerate automatic files after modifying | |
| 409 | + their sources. If your packaging environment automatically refreshes | |
| 410 | + automatic files, it can cause this check to fail. Suppress qpdf's | |
| 411 | + checks by passing @1@option@1@--disable-check-autofiles@2@option@2@ | |
| 412 | + to @1@command@1@./configure@2@command@2@. This is safe since qpdf's | |
| 413 | + @1@command@1@autogen.sh@2@command@2@ just runs autotools in the | |
| 414 | + normal way. | |
| 415 | + | |
| 416 | +- QPDF's @1@command@1@make install@2@command@2@ does not install | |
| 417 | + completion files by default, but as a packager, it's good if you | |
| 418 | + install them wherever your distribution expects such files to go. You | |
| 419 | + can find completion files to install in the | |
| 420 | + @1@filename@1@completions@2@filename@2@ directory. | |
| 421 | + | |
| 422 | +- Packagers are encouraged to install the source files from the | |
| 423 | + @1@filename@1@examples@2@filename@2@ directory along with qpdf | |
| 424 | + development packages. | |
| 425 | + | |
| 426 | +.. _ref.using: | |
| 427 | + | |
| 428 | +Running QPDF | |
| 429 | +============ | |
| 430 | + | |
| 431 | +This chapter describes how to run the qpdf program from the command | |
| 432 | +line. | |
| 433 | + | |
| 434 | +.. _ref.invocation: | |
| 435 | + | |
| 436 | +Basic Invocation | |
| 437 | +---------------- | |
| 438 | + | |
| 439 | +When running qpdf, the basic invocation is as follows: | |
| 440 | + | |
| 441 | +:: | |
| 442 | + | |
| 443 | + @1@command@1@qpdf@2@command@2@@1@option@1@ [ @1@replaceable@1@options@2@replaceable@2@ ] { @1@replaceable@1@infilename@2@replaceable@2@ | @1@option@1@--empty@2@option@2@ } [ @1@replaceable@1@page_selection_options@2@replaceable@2@ ] @1@replaceable@1@outfilename@2@replaceable@2@@2@option@2@ | |
| 444 | + | |
| 445 | +This converts PDF file @1@option@1@infilename@2@option@2@ to PDF file | |
| 446 | +@1@option@1@outfilename@2@option@2@. The output file is functionally | |
| 447 | +identical to the input file but may have been structurally reorganized. | |
| 448 | +Also, orphaned objects will be removed from the file. Many | |
| 449 | +transformations are available as controlled by the options below. In | |
| 450 | +place of @1@option@1@infilename@2@option@2@, the parameter | |
| 451 | +@1@option@1@--empty@2@option@2@ may be specified. This causes qpdf to | |
| 452 | +use a dummy input file that contains zero pages. The only normal use | |
| 453 | +case for using @1@option@1@--empty@2@option@2@ would be if you were | |
| 454 | +going to add pages from another source, as discussed in `Page Selection | |
| 455 | +Options <#ref.page-selection>`__. | |
| 456 | + | |
| 457 | +If @1@option@1@@filename@2@option@2@ appears as a word anywhere in the | |
| 458 | +command line, the named file will be read line by line, and each line will be | |
| 459 | +treated as a command-line argument. Leading and trailing whitespace is | |
| 460 | +intentionally not removed from lines, which makes it possible to handle | |
| 461 | +arguments that start or end with spaces. The @1@option@1@@-@2@option@2@ | |
| 462 | +option allows arguments to be read from standard input. This allows qpdf | |
| 463 | +to be invoked with an arbitrary number of arbitrarily long arguments. It | |
| 464 | +is also very useful for avoiding having to pass passwords on the command | |
| 465 | +line. Note that the @1@option@1@@filename@2@option@2@ can't appear in | |
| 466 | +the middle of an argument, so constructs such as | |
| 467 | +@1@option@1@--arg=@option@2@option@2@ will not work. You would have to | |
| 468 | +include the argument and its options together in the arguments file. | |
| 469 | + | |
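Note on the arguments-file paragraph above: the following is a minimal shell sketch of building such a file, one argument per line. The helper name ``write_qpdf_args``, the password value, and the file names ``in.pdf`` and ``out.pdf`` are placeholders chosen for illustration. The ``@filename`` behavior is as described in the manual text.

```shell
#!/bin/sh
# Hypothetical helper: write each argument on its own line of an arguments file.
# printf '%s\n' keeps leading and trailing spaces intact, which matters because
# qpdf does not strip whitespace from the lines it reads.
write_qpdf_args() {
    out_file=$1
    shift
    printf '%s\n' "$@" > "$out_file"
}

args_file=$(mktemp)

# The whole --password=... option goes on a single line of the file, since
# @filename is not recognized in the middle of an argument.
write_qpdf_args "$args_file" '--password=example secret ' 'in.pdf' 'out.pdf'

# Run qpdf only if it is installed and the placeholder input file exists.
if command -v qpdf >/dev/null 2>&1 && [ -f in.pdf ]; then
    qpdf "@$args_file"
fi
```

Keeping the password in a file readable only by the invoking user avoids exposing it in the process list, which is the use case the paragraph above mentions.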
| 470 | +@1@option@1@outfilename@2@option@2@ does not have to be seekable, even | |
| 471 | +when generating linearized files. Specifying "@1@option@1@-@2@option@2@" | |
| 472 | +as @1@option@1@outfilename@2@option@2@ means to write to standard | |
| 473 | +output. If you want to overwrite the input file with the output, use the | |
| 474 | +option @1@option@1@--replace-input@2@option@2@ and omit the output file | |
| 475 | +name. You can't specify the same file as both the input and the output. | |
| 476 | +If you do this, qpdf will tell you about the | |
| 477 | +@1@option@1@--replace-input@2@option@2@ option. | |
| 478 | + | |
| 479 | +Most options require an output file, but some testing or inspection | |
| 480 | +commands do not. These are specifically noted. | |
| 481 | + | |
| 482 | +.. _ref.exit-status: | |
| 483 | + | |
| 484 | +Exit Status | |
| 485 | +~~~~~~~~~~~ | |
| 486 | + | |
| 487 | +The exit status of @1@command@1@qpdf@2@command@2@ may be interpreted as | |
| 488 | +follows: | |
| 489 | + | |
| 490 | +- ``0``: no errors or warnings were found. The file may still have | |
| 491 | + problems qpdf can't detect. If | |
| 492 | + @1@option@1@--warning-exit-0@2@option@2@ was specified, exit status 0 | |
| 493 | + is used even if there are warnings. | |
| 494 | + | |
| 495 | +- ``2``: errors were found. qpdf was not able to fully process the | |
| 496 | + file. | |
| 497 | + | |
| 498 | +- ``3``: qpdf encountered problems that it was able to recover from. In | |
| 499 | + some cases, the resulting file may still be damaged. Note that qpdf | |
| 500 | + still exits with status ``3`` if it finds warnings even when | |
| 501 | + @1@option@1@--no-warn@2@option@2@ is specified. With | |
| 502 | + @1@option@1@--warning-exit-0@2@option@2@, warnings without errors | |
| 503 | + exit with status 0 instead of 3. | |
| 504 | + | |
| 505 | +Note that @1@command@1@qpdf@2@command@2@ never exits with status ``1``. | |
| 506 | +If you get an exit status of ``1``, it was something else, like the | |
| 507 | +shell not being able to find or execute @1@command@1@qpdf@2@command@2@. | |
| 508 | + | |
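Note on the exit status list above: the following is a minimal shell sketch of how a wrapper script might branch on these values. The function name ``describe_qpdf_status``, its labels, and the file names are hypothetical. The meanings of ``0``, ``2``, and ``3`` are taken from the list above, and any other value is treated as not being a documented qpdf status.

```shell
#!/bin/sh
# Hypothetical helper: translate an exit status into a short label.
describe_qpdf_status() {
    case "$1" in
        0) echo "ok" ;;        # no errors or warnings found
        2) echo "errors" ;;    # qpdf could not fully process the file
        3) echo "warnings" ;;  # qpdf recovered from problems
        *) echo "other" ;;     # not a documented qpdf status (e.g. 1)
    esac
}

describe_qpdf_status 0
describe_qpdf_status 3
describe_qpdf_status 1

# Typical use with a real file (in.pdf and out.pdf are placeholders).
if command -v qpdf >/dev/null 2>&1 && [ -f in.pdf ]; then
    qpdf in.pdf out.pdf
    describe_qpdf_status "$?"
fi
```

A script that should tolerate recoverable problems can treat both ``ok`` and ``warnings`` as success, or pass ``--warning-exit-0`` so that warnings alone produce status 0.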
| 509 | +.. _ref.shell-completion: | |
| 510 | + | |
| 511 | +Shell Completion | |
| 512 | +---------------- | |
| 513 | + | |
| 514 | +Starting in qpdf version 8.3.0, qpdf provides its own completion support | |
| 515 | +for zsh and bash. You can enable bash completion with @1@command@1@eval | |
| 516 | +$(qpdf --completion-bash)@2@command@2@ and zsh completion with | |
| 517 | +@1@command@1@eval $(qpdf --completion-zsh)@2@command@2@. If | |
| 518 | +@1@command@1@qpdf@2@command@2@ is not in your path, you should invoke it | |
| 519 | +in the commands above with an absolute path. If you invoke it with a relative path, it | |
| 520 | +will warn you, and the completion won't work if you're in a different | |
| 521 | +directory. | |
| 522 | + | |
| 523 | +qpdf will use ``argv[0]`` to figure out where its executable is. This | |
| 524 | +may produce unwanted results in some cases, especially if you are trying | |
| 525 | +to use completion with a copy of qpdf that is built from source. You can | |
| 526 | +specify a full path to the qpdf you want to use for completion in the | |
| 527 | +``QPDF_EXECUTABLE`` environment variable. | |
| 528 | + | |
| 529 | +.. _ref.basic-options: | |
| 530 | + | |
| 531 | +Basic Options | |
| 532 | +------------- | |
| 533 | + | |
| 534 | +The following options are the most common ones and perform commonly | |
| 535 | +needed transformations. | |
| 536 | + | |
| 537 | +@1@option@1@--help@2@option@2@ | |
| 538 | + Display command-line invocation help. | |
| 539 | + | |
| 540 | +@1@option@1@--version@2@option@2@ | |
| 541 | + Display the current version of qpdf. | |
| 542 | + | |
| 543 | +@1@option@1@--copyright@2@option@2@ | |
| 544 | + Show detailed copyright information. | |
| 545 | + | |
| 546 | +@1@option@1@--show-crypto@2@option@2@ | |
| 547 | + Show a list of available crypto providers, each on a line by itself. | |
| 548 | + The default provider is always listed first. See `Crypto | |
| 549 | + Providers <#ref.crypto>`__ for more information about crypto | |
| 550 | + providers. | |
| 551 | + | |
| 552 | +@1@option@1@--completion-bash@2@option@2@ | |
| 553 | + Output a completion command you can eval to enable shell completion | |
| 554 | + from bash. | |
| 555 | + | |
| 556 | +@1@option@1@--completion-zsh@2@option@2@ | |
| 557 | + Output a completion command you can eval to enable shell completion | |
| 558 | + from zsh. | |
| 559 | + | |
| 560 | +@1@option@1@--password=@1@replaceable@1@password@2@replaceable@2@@2@option@2@ | |
| 561 | + Specifies a password for accessing encrypted files. To read the | |
| 562 | + password from a file or standard input, you can use | |
| 563 | + @1@option@1@--password-file@2@option@2@, added in qpdf 10.2. Note | |
| 564 | + that you can also use @1@option@1@@filename@2@option@2@ or | |
| 565 | + @1@option@1@@-@2@option@2@ as described above to put the password in | |
| 566 | + a file or pass it via standard input, but you would do so by | |
| 567 | + specifying the entire | |
| 568 | + @1@option@1@--password=@1@replaceable@1@password@2@replaceable@2@@2@option@2@ | |
| 569 | + option in the file. Syntax such as | |
| 570 | + @1@option@1@--password=@filename@2@option@2@ won't work since | |
| 571 | + @1@option@1@@filename@2@option@2@ is not recognized in the middle of | |
| 572 | + an argument. | |
| 573 | + | |
| 574 | +@1@option@1@--password-file=@1@replaceable@1@filename@2@replaceable@2@@2@option@2@ | |
| 575 | + Reads the first line from the specified file and uses it as the | |
| 576 | + password for accessing encrypted files. | |
| 577 | + @1@option@1@@1@replaceable@1@filename@2@replaceable@2@@2@option@2@ | |
| 578 | + may be ``-`` to read the password from standard input. Note that, in | |
| 579 | + this case, the password is echoed and there is no prompt, so use with | |
| 580 | + caution. | |
| 581 | + | |
| 582 | +@1@option@1@--is-encrypted@2@option@2@ | |
| 583 | + Silently exit with status 0 if the file is encrypted or status 2 if | |
| 584 | + the file is not encrypted. This is useful for shell scripts. Other | |
| 585 | + options are ignored if this is given. This option is mutually | |
| 586 | + exclusive with @1@option@1@--requires-password@2@option@2@. Both this | |
| 587 | + option and @1@option@1@--requires-password@2@option@2@ exit with | |
| 588 | + status 2 for non-encrypted files. | |
| 589 | + | |
| 590 | +@1@option@1@--requires-password@2@option@2@ | |
| 591 | + Silently exit with status 0 if a password (other than as supplied) is | |
| 592 | + required. Exit with status 2 if the file is not encrypted. Exit with | |
| 593 | + status 3 if the file is encrypted but requires no password or the | |
| 594 | + correct password has been supplied. This is useful for shell scripts. | |
| 595 | + Note that any supplied password is used when opening the file. When | |
| 596 | + used with a @1@option@1@--password@2@option@2@ option, this option | |
| 597 | + can be used to check the correctness of the password. In that case, | |
| 598 | + an exit status of 3 means the file works with the supplied password. | |
| 599 | + This option is mutually exclusive with | |
| 600 | + @1@option@1@--is-encrypted@2@option@2@. Both this option and | |
| 601 | + @1@option@1@--is-encrypted@2@option@2@ exit with status 2 for | |
| 602 | + non-encrypted files. | |
| 603 | + | |
| 604 | +@1@option@1@--verbose@2@option@2@ | |
| 605 | + Increase verbosity of output. For now, this just prints some | |
| 606 | + indication of any file that it creates. | |
| 607 | + | |
| 608 | +@1@option@1@--progress@2@option@2@ | |
| 609 | + Indicate progress while writing files. | |
| 610 | + | |
| 611 | +@1@option@1@--no-warn@2@option@2@ | |
| 612 | + Suppress writing of warnings to stderr. If warnings were detected and | |
| 613 | + suppressed, @1@command@1@qpdf@2@command@2@ will still exit with exit | |
| 614 | + code 3. See also @1@option@1@--warning-exit-0@2@option@2@. | |
| 615 | + | |
| 616 | +@1@option@1@--warning-exit-0@2@option@2@ | |
| 617 | + If warnings are found but no errors, exit with exit code 0 instead of 3. | |
| 618 | + When combined with @1@option@1@--no-warn@2@option@2@, the effect is | |
| 619 | + for @1@command@1@qpdf@2@command@2@ to completely ignore warnings. | |
| 620 | + | |
| 621 | +@1@option@1@--linearize@2@option@2@ | |
| 622 | + Causes generation of a linearized (web-optimized) output file. | |
| 623 | + | |
| 624 | +@1@option@1@--replace-input@2@option@2@ | |
| 625 | + If specified, the output file name should be omitted. This option | |
| 626 | + tells qpdf to replace the input file with the output. It does this by | |
| 627 | + writing to | |
| 628 | + @1@filename@1@@1@replaceable@1@infilename@2@replaceable@2@.~qpdf-temp#@2@filename@2@ | |
| 629 | + and, when done, overwriting the input file with the temporary file. | |
| 630 | + If there were any warnings, the original input is saved as | |
| 631 | + @1@filename@1@@1@replaceable@1@infilename@2@replaceable@2@.~qpdf-orig@2@filename@2@. | |
| 632 | + | |
| 633 | +@1@option@1@--copy-encryption=file@2@option@2@ | |
| 634 | + Encrypt the file using the same encryption parameters, including user | |
| 635 | + and owner password, as the specified file. Use | |
| 636 | + @1@option@1@--encryption-file-password@2@option@2@ to specify a | |
| 637 | + password if one is needed to open this file. Note that copying the | |
| 638 | + encryption parameters from a file also copies the first half of | |
| 639 | + ``/ID`` from the file since this is part of the encryption | |
| 640 | + parameters. | |
| 641 | + | |
| 642 | +@1@option@1@--encryption-file-password=password@2@option@2@ | |
| 643 | + If the file specified with @1@option@1@--copy-encryption@2@option@2@ | |
| 644 | + requires a password, specify the password using this option. Note | |
| 645 | + that only one of the user or owner password is required. Both | |
| 646 | + passwords will be preserved since QPDF does not distinguish between | |
| 647 | + the two passwords. It is possible to preserve encryption parameters, | |
| 648 | + including the owner password, from a file even if you don't know the | |
| 649 | + file's owner password. | |
| 650 | + | |
| 651 | +@1@option@1@--allow-weak-crypto@2@option@2@ | |
| 652 | + Starting with version 10.4, qpdf issues warnings when requested to | |
| 653 | + create files using RC4 encryption. This option suppresses those | |
| 654 | + warnings. In future versions of qpdf, qpdf will refuse to create | |
| 655 | + files with weak cryptography when this flag is not given. See `Weak | |
| 656 | + Cryptography <#ref.weak-crypto>`__ for additional details. | |
| 657 | + | |
| 658 | +@1@option@1@--encrypt options --@2@option@2@ | |
| 659 | + Causes generation of an encrypted output file. Please see `Encryption | |
| 660 | + Options <#ref.encryption-options>`__ for details on how to specify | |
| 661 | + encryption parameters. | |
| 662 | + | |
| 663 | +@1@option@1@--decrypt@2@option@2@ | |
| 664 | + Removes any encryption on the file. A password must be supplied if | |
| 665 | + the file is password protected. | |
| 666 | + | |
| 667 | +@1@option@1@--password-is-hex-key@2@option@2@ | |
| 668 | + Overrides the usual computation/retrieval of the PDF file's | |
| 669 | + encryption key from user/owner password with an explicit | |
| 670 | + specification of the encryption key. When this option is specified, | |
| 671 | + the argument to the @1@option@1@--password@2@option@2@ option is | |
| 672 | + interpreted as a hexadecimal-encoded key value. This only applies to | |
| 673 | + the password used to open the main input file. It does not apply to | |
| 674 | + other files opened by @1@option@1@--pages@2@option@2@ or other | |
| 675 | + options or to files being written. | |
| 676 | + | |
| 677 | + Most users will never have a need for this option, and no standard | |
| 678 | + viewers support this mode of operation, but it can be useful for | |
| 679 | + forensic or investigatory purposes. For example, if a PDF file is | |
| 680 | + encrypted with an unknown password, a brute-force attack using the | |
| 681 | + key directly is sometimes more efficient than one using the password. | |
| 682 | + Also, if a file is heavily damaged, it may be possible to derive the | |
| 683 | + encryption key and recover parts of the file using it directly. To | |
| 684 | + expose the encryption key used by an encrypted file that you can open | |
| 685 | + normally, use the @1@option@1@--show-encryption-key@2@option@2@ | |
| 686 | + option. | |
| 687 | + | |
| 688 | +@1@option@1@--suppress-password-recovery@2@option@2@ | |
| 689 | + Ordinarily, qpdf attempts to automatically compensate for passwords | |
| 690 | + specified in the wrong character encoding. This option suppresses | |
| 691 | + that behavior. Under normal conditions, there are no reasons to use | |
| 692 | + this option. See `Unicode Passwords <#ref.unicode-passwords>`__ for a | |
| 693 | + discussion. | |
| 694 | + | |
| 695 | +@1@option@1@--password-mode=@1@replaceable@1@mode@2@replaceable@2@@2@option@2@ | |
| 696 | + This option can be used to fine-tune how qpdf interprets Unicode | |
| 697 | + (non-ASCII) password strings passed on the command line. With the | |
| 698 | + exception of the @1@option@1@hex-bytes@2@option@2@ mode, these only | |
| 699 | + apply to passwords provided when encrypting files. The | |
| 700 | + @1@option@1@hex-bytes@2@option@2@ mode also applies to passwords | |
| 701 | + specified for reading files. For additional discussion of the | |
| 702 | + supported password modes and when you might want to use them, see | |
| 703 | + `Unicode Passwords <#ref.unicode-passwords>`__. The following modes | |
| 704 | + are supported: | |
| 705 | + | |
| 706 | + - @1@option@1@auto@2@option@2@: Automatically determine whether the | |
| 707 | + specified password is a properly encoded Unicode (UTF-8) string, | |
| 708 | + and transcode it as required by the PDF spec based on the type | |
| 709 | + of encryption being applied. On Windows starting with version 8.4.0, | |
| 710 | + and on almost all other modern platforms, incoming passwords will | |
| 711 | + be properly encoded in UTF-8, so this is almost always what you | |
| 712 | + want. | |
| 713 | + | |
| 714 | + - @1@option@1@unicode@2@option@2@: Tells qpdf that the incoming | |
| 715 | + password is UTF-8, overriding whatever its automatic detection | |
| 716 | + determines. The only difference between this mode and | |
| 717 | + @1@option@1@auto@2@option@2@ is that qpdf will fail with an error | |
| 718 | + message if the password is not valid UTF-8 instead of falling back | |
| 719 | + to @1@option@1@bytes@2@option@2@ mode with a warning. | |
| 720 | + | |
| 721 | + - @1@option@1@bytes@2@option@2@: Interpret the password as a literal | |
| 722 | + byte string. For non-Windows platforms, this is what versions of | |
| 723 | + qpdf prior to 8.4.0 did. For Windows platforms, there is no way to | |
| 724 | + specify strings of binary data on the command line directly, but | |
| 725 | + you can use the @1@option@1@@filename@2@option@2@ option to do it, | |
| 726 | + in which case this option forces qpdf to respect the string of | |
| 727 | + bytes as provided. This option will allow you to encrypt PDF files | |
| 728 | + with passwords that will not be usable by other readers. | |
| 729 | + | |
| 730 | + - @1@option@1@hex-bytes@2@option@2@: Interpret the password as a | |
| 731 | + hex-encoded string. This provides a way to pass binary data as a | |
| 732 | + password on all platforms including Windows. As with | |
| 733 | + @1@option@1@bytes@2@option@2@, this option may allow creation of | |
| 734 | + files that can't be opened by other readers. This mode affects | |
| 735 | + qpdf's interpretation of passwords specified for decrypting files | |
| 736 | + as well as for encrypting them. It makes it possible to specify | |
| 737 | + strings that are encoded in some manner other than the system's | |
| 738 | + default encoding. | |
| 739 | + | |
| 740 | +@1@option@1@--rotate=[+|-]angle[:page-range]@2@option@2@ | |
| 741 | + Apply rotation to specified pages. The | |
| 742 | + @1@option@1@page-range@2@option@2@ portion of the option value has | |
| 743 | + the same format as page ranges in `Page Selection | |
| 744 | + Options <#ref.page-selection>`__. If the page range is omitted, the | |
| 745 | + rotation is applied to all pages. The @1@option@1@angle@2@option@2@ | |
| 746 | + portion of the parameter may be either 0, 90, 180, or 270. If | |
| 747 | + preceded by @1@option@1@+@2@option@2@ or @1@option@1@-@2@option@2@, | |
| 748 | + the angle is added to or subtracted from the specified pages' | |
| 749 | + original rotations. This is almost always what you want. Otherwise | |
| 750 | + the pages' rotations are set to the exact value, which may cause the | |
| 751 | + appearances of the pages to be inconsistent, especially for scans. | |
| 752 | + For example, the command @1@command@1@qpdf in.pdf out.pdf | |
| 753 | + --rotate=+90:2,4,6 --rotate=180:7-8@2@command@2@ would rotate pages | |
| 754 | + 2, 4, and 6 90 degrees clockwise from their original rotation and | |
| 755 | + force the rotation of pages 7 through 8 to 180 degrees regardless of | |
| 756 | + their original rotation, and the command @1@command@1@qpdf in.pdf | |
| 757 | + out.pdf --rotate=+180@2@command@2@ would rotate all pages by 180 | |
| 758 | + degrees. | |
| 759 | + | |
| 760 | +@1@option@1@--keep-files-open=@1@replaceable@1@[yn]@2@replaceable@2@@2@option@2@ | |
| 761 | + This option controls whether qpdf keeps individual files open while | |
| 762 | + merging. Prior to version 8.1.0, qpdf always kept all files open, but | |
| 763 | + this meant that the number of files that could be merged was limited | |
| 764 | + by the operating system's open file limit. Version 8.1.0 opened files | |
| 765 | + as they were referenced and closed them after each read, but this | |
| 766 | + caused a major performance impact. Version 8.2.0 optimized the | |
| 767 | + performance but did so in a way that, for local file systems, there | |
| 768 | + was a small but unavoidable performance hit, but for networked file | |
| 769 | + systems, the performance impact could be very high. Starting with | |
| 770 | + version 8.2.1, the default behavior is that files are kept open if no | |
| 771 | + more than 200 files are specified, but that the behavior can be | |
| 772 | + explicitly overridden with the | |
| 773 | + @1@option@1@--keep-files-open@2@option@2@ flag. If you are merging | |
| 774 | + more than 200 files but less than the operating system's max open | |
| 775 | + files limit, you may want to use | |
| 776 | + @1@option@1@--keep-files-open=y@2@option@2@, especially if working | |
| 777 | + over a networked file system. If you are using a local file system | |
| 778 | + where the overhead is low and you might sometimes merge more files than | |
| 779 | + the OS limit allows from a script and are not worried about a | |
| 780 | + few seconds additional processing time, you may want to specify | |
| 781 | + @1@option@1@--keep-files-open=n@2@option@2@. The threshold for | |
| 782 | + switching may be changed from the default 200 with the | |
| 783 | + @1@option@1@--keep-files-open-threshold@2@option@2@ option. | |
| 784 | + | |
| 785 | +@1@option@1@--keep-files-open-threshold=@1@replaceable@1@count@2@replaceable@2@@2@option@2@ | |
| 786 | + If specified, overrides the default value of 200 used as the | |
| 787 | + threshold for qpdf deciding whether or not to keep files open. See | |
| 788 | + @1@option@1@--keep-files-open@2@option@2@ for details. | |
| 789 | + | |
| 790 | +@1@option@1@--pages options --@2@option@2@ | |
| 791 | + Select specific pages from one or more input files. See `Page | |
| 792 | + Selection Options <#ref.page-selection>`__ for details on how to do | |
| 793 | + page selection (splitting and merging). | |
| 794 | + | |
| 795 | +@1@option@1@--collate=@1@replaceable@1@n@2@replaceable@2@@2@option@2@ | |
| 796 | + When specified, collate rather than concatenate pages from files | |
| 797 | + specified with @1@option@1@--pages@2@option@2@. With a numeric | |
| 798 | + argument, collate in groups of @1@replaceable@1@n@2@replaceable@2@. | |
| 799 | + The default is 1. See `Page Selection | |
| 800 | + Options <#ref.page-selection>`__ for additional details. | |
| 801 | + | |
| 802 | +@1@option@1@--flatten-rotation@2@option@2@ | |
| 803 | + For each page that is rotated using the ``/Rotate`` key in the page's | |
| 804 | + dictionary, remove the ``/Rotate`` key and implement the identical | |
| 805 | + rotation semantics by modifying the page's contents. This option can | |
| 806 | + be useful to prepare files for buggy PDF applications that don't | |
| 807 | + properly handle rotated pages. | |
| 808 | + | |
| 809 | +@1@option@1@--split-pages=[n]@2@option@2@ | |
| 810 | + Write each group of @1@option@1@n@2@option@2@ pages to a separate | |
| 811 | + output file. If @1@option@1@n@2@option@2@ is not specified, create | |
| 812 | + single pages. Output file names are generated as follows: | |
| 813 | + | |
| 814 | + - If the string ``%d`` appears in the output file name, it is | |
| 815 | + replaced with a range of zero-padded page numbers starting from 1. | |
| 816 | + | |
| 817 | + - Otherwise, if the output file name ends in | |
| 818 | + @1@filename@1@.pdf@2@filename@2@ (case insensitive), a zero-padded | |
| 819 | + page range, preceded by a dash, is inserted before the file | |
| 820 | + extension. | |
| 821 | + | |
| 822 | + - Otherwise, the file name is appended with a zero-padded page range | |
| 823 | + preceded by a dash. | |
| 824 | + | |
| 825 | + Page ranges are a single number in the case of single-page groups or | |
| 826 | + two numbers separated by a dash otherwise. For example, if | |
| 827 | + @1@filename@1@infile.pdf@2@filename@2@ has 12 pages: | |
| 828 | + | |
| 829 | + - @1@command@1@qpdf --split-pages infile.pdf %d-out@2@command@2@ | |
| 830 | + would generate files @1@filename@1@01-out@2@filename@2@ through | |
| 831 | + @1@filename@1@12-out@2@filename@2@ | |
| 832 | + | |
| 833 | + - @1@command@1@qpdf --split-pages=2 infile.pdf | |
| 834 | + outfile.pdf@2@command@2@ would generate files | |
| 835 | + @1@filename@1@outfile-01-02.pdf@2@filename@2@ through | |
| 836 | + @1@filename@1@outfile-11-12.pdf@2@filename@2@ | |
| 837 | + | |
| 838 | + - @1@command@1@qpdf --split-pages infile.pdf | |
| 839 | + something.else@2@command@2@ would generate files | |
| 840 | + @1@filename@1@something.else-01@2@filename@2@ through | |
| 841 | + @1@filename@1@something.else-12@2@filename@2@ | |
| 842 | + | |
| 843 | + Note that outlines, threads, and other global features of the | |
| 844 | + original PDF file are not preserved. For each page of output, this | |
| 845 | + option creates an empty PDF and copies a single page from the output | |
| 846 | + into it. If you require the global data, you will have to run | |
| 847 | + @1@command@1@qpdf@2@command@2@ with the | |
| 848 | + @1@option@1@--pages@2@option@2@ option once for each file. Using | |
| 849 | + @1@option@1@--split-pages@2@option@2@ is much faster if you don't | |
| 850 | + require the global data. | |
| 851 | + | |
| 852 | +@1@option@1@--overlay options --@2@option@2@ | |
| 853 | + Overlay pages from another file onto the output pages. See `Overlay | |
| 854 | + and Underlay Options <#ref.overlay-underlay>`__ for details on | |
| 855 | + overlay/underlay. | |
| 856 | + | |
| 857 | +@1@option@1@--underlay options --@2@option@2@ | |
| 858 | + Underlay pages from another file beneath the output pages. See `Overlay | |
| 859 | + and Underlay Options <#ref.overlay-underlay>`__ for details on | |
| 860 | + overlay/underlay. | |
| 861 | + | |
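The output naming rules described under @1@option@1@--split-pages@2@option@2@ can be illustrated with a small shell function. This is only a sketch of the documented rules, not qpdf's actual implementation. The function name and its arguments are hypothetical, the padding width is assumed to be supplied by the caller (the examples in the option description use two digits for a 12-page file), and only the first ``%d`` is replaced.

```shell
# Hypothetical illustration of the --split-pages naming rules.
# Arguments: output name, first page, last page, zero-padding width.
split_output_name() {
    name=$1 first=$2 last=$3 width=$4
    fmt="%0${width}d"
    # A single-page group is one number; otherwise "first-last".
    range=$(printf "$fmt" "$first")
    if [ "$first" -ne "$last" ]; then
        range="$range-$(printf "$fmt" "$last")"
    fi
    case "$name" in
        # Rule 1: replace %d with the page range.
        *%d*) printf '%s\n' "${name%%\%d*}${range}${name#*\%d}" ;;
        # Rule 2: name ends in .pdf (any case): insert before the extension.
        *.[Pp][Dd][Ff]) printf '%s\n' "${name%.*}-${range}.${name##*.}" ;;
        # Rule 3: otherwise append the range.
        *) printf '%s\n' "${name}-${range}" ;;
    esac
}
```

For instance, ``split_output_name outfile.pdf 1 2 2`` follows the second rule and ``split_output_name something.else 12 12 2`` follows the third.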
| 862 | +Password-protected files may be opened by specifying a password. By | |
| 863 | +default, qpdf will preserve any encryption data associated with a file. | |
| 864 | +If @1@option@1@--decrypt@2@option@2@ is specified, qpdf will attempt to | |
| 865 | +remove any encryption information. If @1@option@1@--encrypt@2@option@2@ | |
| 866 | +is specified, qpdf will replace the document's encryption parameters | |
| 867 | +with whatever is specified. | |
| 868 | + | |
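Before choosing between these behaviors, a script may want to know whether a file needs a password at all. The sketch below translates the exit statuses documented for @1@option@1@--requires-password@2@option@2@ into messages. The function name is hypothetical, and the commented invocation uses placeholder file and variable names.

```shell
# Hypothetical helper: interpret the exit status of
# "qpdf --requires-password", passed as $1.
password_status() {
    case "$1" in
        0) echo "password required" ;;
        2) echo "not encrypted" ;;
        3) echo "encrypted, opens without a further password" ;;
        *) echo "unexpected status" ;;
    esac
}

# Example use (file.pdf and $pw are placeholders):
#   qpdf --requires-password --password="$pw" file.pdf
#   password_status $?
```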
| 869 | +Note that qpdf does not obey encryption restrictions already imposed on | |
| 870 | +the file. Doing so would be meaningless since qpdf can be used to remove | |
| 871 | +encryption from the file entirely. This functionality is not intended to | |
| 872 | +be used for bypassing copyright restrictions or other restrictions | |
| 873 | +placed on files by their producers. | |
| 874 | + | |
| 875 | +Prior to 8.4.0, in the case of passwords that contain characters that | |
| 876 | +fall outside of 7-bit US-ASCII, qpdf left the burden of supplying | |
| 877 | +properly encoded encryption and decryption passwords to the user. | |
| 878 | +Starting in qpdf 8.4.0, qpdf does this automatically in most cases. For | |
| 879 | +an in-depth discussion, please see `Unicode | |
| 880 | +Passwords <#ref.unicode-passwords>`__. Previous versions of this manual | |
| 881 | +described workarounds using the @1@command@1@iconv@2@command@2@ command. | |
| 882 | +Such workarounds are no longer required or recommended with qpdf 8.4.0. | |
| 883 | +However, for backward compatibility, qpdf attempts to detect those | |
| 884 | +workarounds and do the right thing in most cases. | |
| 885 | + | |
| 886 | +.. _ref.encryption-options: | |
| 887 | + | |
| 888 | +Encryption Options | |
| 889 | +------------------ | |
| 890 | + | |
| 891 | +To change the encryption parameters of a file, use the --encrypt flag. | |
| 892 | +The syntax is | |
| 893 | + | |
| 894 | +:: | |
| 895 | + | |
| 896 | + @1@option@1@--encrypt @1@replaceable@1@user-password@2@replaceable@2@ @1@replaceable@1@owner-password@2@replaceable@2@ @1@replaceable@1@key-length@2@replaceable@2@ [ @1@replaceable@1@restrictions@2@replaceable@2@ ] --@2@option@2@ | |
| 897 | + | |
| 898 | +Note that "@1@option@1@--@2@option@2@" terminates parsing of encryption | |
| 899 | +flags and must be present even if no restrictions are present. | |
| 900 | + | |
| 901 | +Either or both of the user password and the owner password may be empty | |
| 902 | +strings. Starting in qpdf 10.2, qpdf defaults to not allowing creation | |
| 903 | +of PDF files with a non-empty user password, an empty owner password, | |
| 904 | +and a 256-bit key since such files can be opened with no password. If | |
| 905 | +you want to create such files, specify the encryption option | |
| 906 | +@1@option@1@--allow-insecure@2@option@2@, as described below. | |
| 907 | + | |
| 908 | +The value for | |
| 909 | +@1@option@1@@1@replaceable@1@key-length@2@replaceable@2@@2@option@2@ may | |
| 910 | +be 40, 128, or 256. The restriction flags are dependent upon key length. | |
| 911 | +When no additional restrictions are given, the default is to be fully | |
| 912 | +permissive. | |
| 913 | + | |
| 914 | +If @1@option@1@@1@replaceable@1@key-length@2@replaceable@2@@2@option@2@ | |
| 915 | +is 40, the following restriction options are available: | |
| 916 | + | |
| 917 | +@1@option@1@--print=[yn]@2@option@2@ | |
| 918 | + Determines whether or not to allow printing. | |
| 919 | + | |
| 920 | +@1@option@1@--modify=[yn]@2@option@2@ | |
| 921 | + Determines whether or not to allow document modification. | |
| 922 | + | |
| 923 | +@1@option@1@--extract=[yn]@2@option@2@ | |
| 924 | + Determines whether or not to allow text/image extraction. | |
| 925 | + | |
| 926 | +@1@option@1@--annotate=[yn]@2@option@2@ | |
| 927 | + Determines whether or not to allow comments and form fill-in and | |
| 928 | + signing. | |
| 929 | + | |
| 930 | +If @1@option@1@@1@replaceable@1@key-length@2@replaceable@2@@2@option@2@ | |
| 931 | +is 128, the following restriction options are available: | |
| 932 | + | |
| 933 | +@1@option@1@--accessibility=[yn]@2@option@2@ | |
| 934 | + Determines whether or not to allow accessibility to visually | |
| 935 | + impaired. The qpdf library disregards this field when AES is used or | |
| 936 | + when 256-bit encryption is used. You should really never disable | |
| 937 | + accessibility, but qpdf lets you do it in case you need to configure | |
| 938 | + a file this way for testing purposes. The PDF spec says that | |
| 939 | + conforming readers should disregard this permission and always allow | |
| 940 | + accessibility. | |
| 941 | + | |
| 942 | +@1@option@1@--extract=[yn]@2@option@2@ | |
| 943 | + Determines whether or not to allow text/graphic extraction. | |
| 944 | + | |
| 945 | +@1@option@1@--assemble=[yn]@2@option@2@ | |
| 946 | + Determines whether document assembly (rotation and reordering of | |
| 947 | + pages) is allowed. | |
| 948 | + | |
| 949 | +@1@option@1@--annotate=[yn]@2@option@2@ | |
| 950 | + Determines whether modifying annotations is allowed. This includes | |
| 951 | + adding comments and filling in form fields. Also allows editing of | |
| 952 | + form fields if @1@option@1@--modify-other=y@2@option@2@ is given. | |
| 953 | + | |
| 954 | +@1@option@1@--form=[yn]@2@option@2@ | |
| 955 | + Determines whether filling form fields is allowed. | |
| 956 | + | |
| 957 | +@1@option@1@--modify-other=[yn]@2@option@2@ | |
| 958 | + Allow all document editing except those controlled separately by the | |
| 959 | + @1@option@1@--assemble@2@option@2@, | |
| 960 | + @1@option@1@--annotate@2@option@2@, and | |
| 961 | + @1@option@1@--form@2@option@2@ options. | |
| 962 | + | |
| 963 | +@1@option@1@--print=@1@replaceable@1@print-opt@2@replaceable@2@@2@option@2@ | |
| 964 | + Controls printing access. | |
| 965 | + @1@option@1@@1@replaceable@1@print-opt@2@replaceable@2@@2@option@2@ | |
| 966 | + may be one of the following: | |
| 967 | + | |
| 968 | + - @1@option@1@full@2@option@2@: allow full printing | |
| 969 | + | |
| 970 | + - @1@option@1@low@2@option@2@: allow low-resolution printing only | |
| 971 | + | |
| 972 | + - @1@option@1@none@2@option@2@: disallow printing | |
| 973 | + | |
| 974 | +@1@option@1@--modify=@1@replaceable@1@modify-opt@2@replaceable@2@@2@option@2@ | |
| 975 | + Controls modify access. This way of controlling modify access has | |
| 976 | + less granularity than new options added in qpdf 8.4. | |
| 977 | + @1@option@1@@1@replaceable@1@modify-opt@2@replaceable@2@@2@option@2@ | |
| 978 | + may be one of the following: | |
| 979 | + | |
| 980 | + - @1@option@1@all@2@option@2@: allow full document modification | |
| 981 | + | |
| 982 | + - @1@option@1@annotate@2@option@2@: allow comment authoring, form | |
| 983 | + operations, and document assembly | |
| 984 | + | |
| 985 | + - @1@option@1@form@2@option@2@: allow form field fill-in and signing | |
| 986 | + and document assembly | |
| 987 | + | |
| 988 | + - @1@option@1@assembly@2@option@2@: allow document assembly only | |
| 989 | + | |
| 990 | + - @1@option@1@none@2@option@2@: allow no modifications | |
| 991 | + | |
| 992 | + Using the @1@option@1@--modify@2@option@2@ option does not allow you | |
| 993 | + to create certain combinations of permissions such as allowing form | |
| 994 | + filling but not allowing document assembly. Starting with qpdf 8.4, | |
| 995 | + you can either just use the other options to control fields | |
| 996 | + individually, or you can use something like @1@option@1@--modify=form | |
| 997 | + --assemble=n@2@option@2@ to fine tune. | |
| 998 | + | |
| 999 | +@1@option@1@--cleartext-metadata@2@option@2@ | |
| 1000 | + If specified, any metadata stream in the document will be left | |
| 1001 | + unencrypted even if the rest of the document is encrypted. This also | |
| 1002 | + forces the PDF version to be at least 1.5. | |
| 1003 | + | |
| 1004 | +@1@option@1@--use-aes=[yn]@2@option@2@ | |
| 1005 | + If @1@option@1@--use-aes=y@2@option@2@ is specified, AES encryption | |
| 1006 | + will be used instead of RC4 encryption. This forces the PDF version | |
| 1007 | + to be at least 1.6. | |
| 1008 | + | |
| 1009 | +@1@option@1@--allow-insecure@2@option@2@ | |
| 1010 | + From qpdf 10.2, qpdf defaults to not allowing creation of PDF files | |
| 1011 | + where the user password is non-empty, the owner password is empty, | |
| 1012 | + and a 256-bit key is in use. Files created in this way are insecure | |
| 1013 | + since they can be opened without a password. Users would ordinarily | |
| 1014 | + never want to create such files. If you are using qpdf to | |
| 1015 | + intentionally create strange files for testing (a definite valid use | |
| 1016 | + of qpdf!), this option allows you to create such insecure files. | |
| 1017 | + | |
| 1018 | +@1@option@1@--force-V4@2@option@2@ | |
| 1019 | + Use of this option forces the ``/V`` and ``/R`` parameters in the | |
| 1020 | + document's encryption dictionary to be set to the value ``4``. As | |
| 1021 | + qpdf will automatically do this when required, there is no reason to | |
| 1022 | + ever use this option. It exists primarily for use in testing qpdf | |
| 1023 | + itself. This option also forces the PDF version to be at least 1.5. | |
| 1024 | + | |
| 1025 | +If @1@option@1@@1@replaceable@1@key-length@2@replaceable@2@@2@option@2@ | |
| 1026 | +is 256, the minimum PDF version is 1.7 with extension level 8, and the | |
| 1027 | +AES-based encryption format used is the PDF 2.0 encryption method | |
| 1028 | +supported by Acrobat X. The same options are available as with 128 bits | |
| 1029 | +with the following exceptions: | |
| 1030 | + | |
| 1031 | +@1@option@1@--use-aes@2@option@2@ | |
| 1032 | + This option is not available with 256-bit keys. AES is always used | |
| 1033 | + with 256-bit encryption keys. | |
| 1034 | + | |
| 1035 | +@1@option@1@--force-V4@2@option@2@ | |
| 1036 | + This option is not available with 256-bit keys. | |
| 1037 | + | |
| 1038 | +@1@option@1@--force-R5@2@option@2@ | |
| 1039 | + If specified, qpdf sets the minimum version to 1.7 at extension level | |
| 1040 | + 3 and writes the deprecated encryption format used by Acrobat version | |
| 1041 | + IX. This option should not be used in practice to generate PDF files | |
| 1042 | + that will be in general use, but it can be useful to generate files | |
| 1043 | + if you are trying to test proper support in another application for | |
| 1044 | + PDF files encrypted in this way. | |
| 1045 | + | |
| 1046 | +The default for each permission option is to be fully permissive. | |
| 1047 | + | |
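Putting the pieces of this section together, an invocation with a 256-bit key and two restrictions might look like the following sketch. The function name, the ``QPDF_BIN`` variable, and the file names are assumptions made for illustration. Only the flags themselves come from the option descriptions above.

```shell
# Hypothetical helper: write a 256-bit encrypted copy that disallows
# printing and modification. QPDF_BIN is an assumed variable used only
# by this sketch so the command can be swapped out (for example, set it
# to "echo" for a dry run that prints the arguments).
encrypt_restricted() {
    user_pw=$1 owner_pw=$2 infile=$3 outfile=$4
    "${QPDF_BIN:-qpdf}" --encrypt "$user_pw" "$owner_pw" 256 \
        --print=none --modify=none -- "$infile" "$outfile"
}

# Example use (passwords and file names are placeholders):
#   encrypt_restricted userpw ownerpw in.pdf out.pdf
```

Note that the closing ``--`` is still required even though restriction flags are present, since it is what ends the encryption options.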
| 1048 | +.. _ref.page-selection: | |
| 1049 | + | |
| 1050 | +Page Selection Options | |
| 1051 | +---------------------- | |
| 1052 | + | |
| 1053 | +Starting with qpdf 3.0, it is possible to split and merge PDF files by | |
| 1054 | +selecting pages from one or more input files. Whatever file is given as | |
| 1055 | +the primary input file is used as the starting point, but its pages are | |
| 1056 | +replaced with pages as specified. | |
| 1057 | + | |
| 1058 | +:: | |
| 1059 | + | |
| 1060 | + @1@option@1@--pages @1@replaceable@1@input-file@2@replaceable@2@ [ @1@replaceable@1@--password=password@2@replaceable@2@ ] [ @1@replaceable@1@page-range@2@replaceable@2@ ] [ ... ] --@2@option@2@ | |
| 1061 | + | |
| 1062 | +Multiple input files may be specified. Each one is given as the name of | |
| 1063 | +the input file, an optional password (if required to open the file), and | |
| 1064 | +the range of pages. Note that "@1@option@1@--@2@option@2@" terminates | |
| 1065 | +parsing of page selection flags. | |
| 1066 | + | |
| 1067 | +Starting with qpdf 8.4, the special input file name | |
| 1068 | +"@1@filename@1@.@2@filename@2@" can be used as a shortcut for the | |
| 1069 | +primary input filename. | |
| 1070 | + | |
| 1071 | +For each file that pages should be taken from, specify the file, a | |
| 1072 | +password needed to open the file (if any), and a page range. The | |
| 1073 | +password needs to be given only once per file. If any of the input files | |
| 1074 | +are the same as the primary input file or the file used to copy | |
| 1075 | +encryption parameters (if specified), you do not need to repeat the | |
| 1076 | +password here. The same file can be repeated multiple times. If a file | |
| 1077 | +that is repeated has a password, the password only has to be given the | |
| 1078 | +first time. All non-page data (info, outlines, page numbers, etc.) are | |
| 1079 | +taken from the primary input file. To discard these, use | |
| 1080 | +@1@option@1@--empty@2@option@2@ as the primary input. | |
| 1081 | + | |
| 1082 | +Starting with qpdf 5.0.0, it is possible to omit the page range. If qpdf | |
| 1083 | +sees a value in the place where it expects a page range and that value | |
| 1084 | +is not a valid range but is a valid file name, qpdf will implicitly use | |
| 1085 | +the range ``1-z``, meaning that it will include all pages in the file. | |
| 1086 | +This makes it possible to easily combine all pages in a set of files | |
| 1087 | +with a command like @1@command@1@qpdf --empty out.pdf --pages \*.pdf | |
| 1088 | +--@2@command@2@. | |
| 1089 | + | |
| 1090 | +The page range is a set of numbers separated by commas, ranges of | |
| 1091 | +numbers separated by dashes, or combinations of those. The character "z" | |
| 1092 | +represents the last page. A number preceded by an "r" indicates to count | |
| 1093 | +from the end, so ``r3-r1`` would be the last three pages of the | |
| 1094 | +document. Pages can appear in any order. Ranges can appear with a high | |
| 1095 | +number followed by a low number, which causes the pages to appear in | |
| 1096 | +reverse. Numbers may be repeated in a page range. A page range may be | |
| 1097 | +optionally appended with ``:even`` or ``:odd`` to indicate only the even | |
| 1098 | +or odd pages in the given range. Note that even and odd refer to the | |
| 1099 | +positions within the specified range, not whether the original number | |
| 1100 | +is even or odd. | |
| 1101 | + | |
| 1102 | +Example page ranges: | |
| 1103 | + | |
| 1104 | +- ``1,3,5-9,15-12``: pages 1, 3, 5, 6, 7, 8, 9, 15, 14, 13, and 12 in | |
| 1105 | + that order. | |
| 1106 | + | |
| 1107 | +- ``z-1``: all pages in the document in reverse | |
| 1108 | + | |
| 1109 | +- ``r3-r1``: the last three pages of the document | |
| 1110 | + | |
| 1111 | +- ``r1-r3``: the last three pages of the document in reverse order | |
| 1112 | + | |
| 1113 | +- ``1-20:even``: even pages from 2 to 20 | |
| 1114 | + | |
| 1115 | +- ``5,7-9,12:odd``: pages 5, 8, and 12, which are the pages in odd | |
| 1116 | + positions from among the original range, which represents pages 5, 7, | |
| 1117 | + 8, 9, and 12. | |
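The page range rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described in this section, not qpdf's actual parser; the function name ``expand_page_range`` and its ``num_pages`` parameter are hypothetical names chosen for the example.

```python
def expand_page_range(spec: str, num_pages: int) -> list[int]:
    """Expand a page-range string into a list of 1-based page numbers.

    Hypothetical helper that mirrors the rules described in the text;
    it is not part of qpdf.
    """
    # Split off an optional :even / :odd suffix, which applies to
    # positions within the expanded range, not to page numbers.
    selector = None
    if spec.endswith(":even"):
        spec, selector = spec[:-5], "even"
    elif spec.endswith(":odd"):
        spec, selector = spec[:-4], "odd"

    def resolve(token: str) -> int:
        # "z" is the last page; "rN" counts from the end (r1 is the last page).
        if token == "z":
            page = num_pages
        elif token.startswith("r"):
            page = num_pages + 1 - int(token[1:])
        else:
            page = int(token)
        if not 1 <= page <= num_pages:
            raise ValueError(f"page {token!r} is out of range")
        return page

    pages = []
    for part in spec.split(","):
        if "-" in part:
            first, last = (resolve(t) for t in part.split("-"))
            # A high-to-low range yields pages in reverse order.
            step = 1 if first <= last else -1
            pages.extend(range(first, last + step, step))
        else:
            pages.append(resolve(part))

    if selector == "odd":
        pages = pages[0::2]   # positions 1, 3, 5, ...
    elif selector == "even":
        pages = pages[1::2]   # positions 2, 4, 6, ...
    return pages
```

For a 12-page document, ``expand_page_range("5,7-9,12:odd", 12)`` first expands to pages 5, 7, 8, 9, and 12 and then keeps the entries in odd positions, which gives pages 5, 8, and 12 as in the last example above.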
| 1118 | + | |
| 1119 | +Starting in qpdf version 8.3, you can specify the | |
| 1120 | +@1@option@1@--collate@2@option@2@ option. Note that this option is | |
| 1121 | +specified outside of @1@option@1@--pages ... --@2@option@2@. When | |
| 1122 | +@1@option@1@--collate@2@option@2@ is specified, it changes the meaning | |
| 1123 | +of @1@option@1@--pages@2@option@2@ so that the specified files, as | |
| 1124 | +modified by page ranges, are collated rather than concatenated. For | |
| 1125 | +example, if you add the files @1@filename@1@odd.pdf@2@filename@2@ and | |
| 1126 | +@1@filename@1@even.pdf@2@filename@2@ containing odd and even pages of a | |
| 1127 | +document respectively, you could run @1@command@1@qpdf --collate odd.pdf | |
| 1128 | +--pages odd.pdf even.pdf -- all.pdf@2@command@2@ to collate the pages. | |
| 1129 | +This would pick page 1 from odd, page 1 from even, page 2 from odd, page | |
| 1130 | +2 from even, etc. until all pages have been included. Any number of | |
| 1131 | +files and page ranges can be specified. If any file has fewer pages, | |
| 1132 | +that file is just skipped when its pages have all been included. For | |
| 1133 | +example, if you ran @1@command@1@qpdf --collate --empty --pages a.pdf | |
| 1134 | +1-5 b.pdf 6-4 c.pdf r1 -- out.pdf@2@command@2@, you would get the | |
| 1135 | +following pages in this order: | |
| 1136 | + | |
| 1137 | +- a.pdf page 1 | |
| 1138 | + | |
| 1139 | +- b.pdf page 6 | |
| 1140 | + | |
| 1141 | +- c.pdf last page | |
| 1142 | + | |
| 1143 | +- a.pdf page 2 | |
| 1144 | + | |
| 1145 | +- b.pdf page 5 | |
| 1146 | + | |
| 1147 | +- a.pdf page 3 | |
| 1148 | + | |
| 1149 | +- b.pdf page 4 | |
| 1150 | + | |
| 1151 | +- a.pdf page 4 | |
| 1152 | + | |
| 1153 | +- a.pdf page 5 | |
| 1154 | + | |
| 1155 | +Starting in qpdf version 10.2, you may specify a numeric argument to | |
| 1156 | +@1@option@1@--collate@2@option@2@. With | |
| 1157 | +@1@option@1@--collate=@1@replaceable@1@n@2@replaceable@2@@2@option@2@, | |
| 1158 | +pull groups of @1@replaceable@1@n@2@replaceable@2@ pages from each file, | |
| 1159 | +again, stopping when there are no more pages. For example, if you ran | |
| 1160 | +@1@command@1@qpdf --collate=2 --empty --pages a.pdf 1-5 b.pdf 6-4 c.pdf | |
| 1161 | +r1 -- out.pdf@2@command@2@, you would get the following pages in this | |
| 1162 | +order: | |
| 1163 | + | |
| 1164 | +- a.pdf page 1 | |
| 1165 | + | |
| 1166 | +- a.pdf page 2 | |
| 1167 | + | |
| 1168 | +- b.pdf page 6 | |
| 1169 | + | |
| 1170 | +- b.pdf page 5 | |
| 1171 | + | |
| 1172 | +- c.pdf last page | |
| 1173 | + | |
| 1174 | +- a.pdf page 3 | |
| 1175 | + | |
| 1176 | +- a.pdf page 4 | |
| 1177 | + | |
| 1178 | +- b.pdf page 4 | |
| 1179 | + | |
| 1180 | +- a.pdf page 5 | |
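The two orderings above can be reproduced with a small round-robin sketch in Python. It assumes each file's page range has already been expanded into a list; the function name ``collate`` and the tuple representation of pages are hypothetical choices for this illustration and say nothing about how qpdf implements the option internally.

```python
def collate(groups, n=1):
    """Interleave page lists, taking up to n pages from each list per round.

    Lists that run out of pages are simply skipped in later rounds.
    """
    result = []
    offset = 0
    while any(offset < len(group) for group in groups):
        for group in groups:
            # Slicing past the end of a list yields an empty slice,
            # so exhausted files contribute nothing.
            result.extend(group[offset:offset + n])
        offset += n
    return result


# Pages selected by "a.pdf 1-5 b.pdf 6-4 c.pdf r1", as (file, page) pairs.
a = [("a.pdf", p) for p in (1, 2, 3, 4, 5)]
b = [("b.pdf", p) for p in (6, 5, 4)]
c = [("c.pdf", "last")]

one_at_a_time = collate([a, b, c])        # corresponds to --collate
two_at_a_time = collate([a, b, c], n=2)   # corresponds to --collate=2
```

Printing ``one_at_a_time`` and ``two_at_a_time`` gives the same orderings as the two lists shown in this section.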
| 1181 | + | |
| 1182 | +Starting in qpdf version 8.3, when you split and merge files, any page | |
| 1183 | +labels (page numbers) are preserved in the final file. It is expected | |
| 1184 | +that more document features will be preserved by splitting and merging. | |
| 1185 | +In the meantime, semantics of splitting and merging vary across | |
| 1186 | +features. For example, the document's outlines (bookmarks) point to | |
| 1187 | +actual page objects, so if you select some pages and not others, | |
| 1188 | +bookmarks that point to pages that are in the output file will work, and | |
| 1189 | +remaining bookmarks will not work. A future version of | |
| 1190 | +@1@command@1@qpdf@2@command@2@ may do a better job at handling these | |
| 1191 | +issues. (Note that the qpdf library already contains all of the APIs | |
| 1192 | +required in order to implement this in your own application if you need | |
| 1193 | +it.) In the meantime, you can always use | |
| 1194 | +@1@option@1@--empty@2@option@2@ as the primary input file to avoid | |
| 1195 | +copying all of that from the first file. For example, to take pages 1 | |
| 1196 | +through 5 from @1@filename@1@infile.pdf@2@filename@2@ while preserving | |
| 1197 | +all metadata associated with that file, you could use | |
| 1198 | + | |
| 1199 | +:: | |
| 1200 | + | |
| 1201 | + @1@command@1@qpdf@2@command@2@ @1@option@1@infile.pdf --pages . 1-5 -- outfile.pdf@2@option@2@ | |
| 1202 | + | |
| 1203 | +If you wanted pages 1 through 5 from | |
| 1204 | +@1@filename@1@infile.pdf@2@filename@2@ but you wanted the rest of the | |
| 1205 | +metadata to be dropped, you could instead run | |
| 1206 | + | |
| 1207 | +:: | |
| 1208 | + | |
| 1209 | + @1@command@1@qpdf@2@command@2@ @1@option@1@--empty --pages infile.pdf 1-5 -- outfile.pdf@2@option@2@ | |
| 1210 | + | |
| 1211 | +If you wanted to take pages 1 through 5 from | |
| 1212 | +@1@filename@1@file1.pdf@2@filename@2@ and pages 11 through 15 from | |
| 1213 | +@1@filename@1@file2.pdf@2@filename@2@ in reverse, taking document-level | |
| 1214 | +metadata from @1@filename@1@file2.pdf@2@filename@2@, you would run | |
| 1215 | + | |
| 1216 | +:: | |
| 1217 | + | |
| 1218 | + @1@command@1@qpdf@2@command@2@ @1@option@1@file2.pdf --pages file1.pdf 1-5 . 15-11 -- outfile.pdf@2@option@2@ | |
| 1219 | + | |
| 1220 | +If, for some reason, you wanted to take the first page of an encrypted | |
| 1221 | +file called @1@filename@1@encrypted.pdf@2@filename@2@ with password | |
| 1222 | +``pass`` and repeat it twice in an output file, and if you wanted to | |
| 1223 | +drop document-level metadata but preserve encryption, you would use | |
| 1224 | + | |
| 1225 | +:: | |
| 1226 | + | |
| 1227 | + @1@command@1@qpdf@2@command@2@ @1@option@1@--empty --copy-encryption=encrypted.pdf --encryption-file-password=pass | |
| 1228 | + --pages encrypted.pdf --password=pass 1 ./encrypted.pdf --password=pass 1 -- | |
| 1229 | + outfile.pdf@2@option@2@ | |
| 1230 | + | |
| 1231 | +Note that we had to specify the password all three times because giving | |
| 1232 | +a password as @1@option@1@--encryption-file-password@2@option@2@ doesn't | |
| 1233 | +count for page selection, and as far as qpdf is concerned, | |
| 1234 | +@1@filename@1@encrypted.pdf@2@filename@2@ and | |
| 1235 | +@1@filename@1@./encrypted.pdf@2@filename@2@ are separate files. These | |
| 1236 | +are all corner cases that most users should hopefully never have to be | |
| 1237 | +bothered with. | |
| 1238 | + | |
| 1239 | +Prior to version 8.4, it was not possible to specify the same page from | |
| 1240 | +the same file directly more than once, and the workaround of specifying | |
| 1241 | +the same file in more than one way was required. Version 8.4 removes | |
| 1242 | +this limitation, but there is still a valid use case. When you specify | |
| 1243 | +the same page from the same file more than once, qpdf will share objects | |
| 1244 | +between the pages. If you are going to do further manipulation on the | |
| 1245 | +file and need the two instances of the same original page to be deep | |
| 1246 | +copies, then you can specify the file in two different ways. For example | |
| 1247 | +@1@command@1@qpdf in.pdf --pages . 1 ./in.pdf 1 -- out.pdf@2@command@2@ | |
| 1248 | +would create a file with two copies of the first page of the input, and | |
| 1249 | +the two copies would not share any objects in common. This includes fonts, | |
| 1250 | +images, and anything else the page references. | |
| 1251 | + | |
| 1252 | +.. _ref.overlay-underlay: | |
| 1253 | + | |
| 1254 | +Overlay and Underlay Options | |
| 1255 | +---------------------------- | |
| 1256 | + | |
| 1257 | +Starting with qpdf 8.4, it is possible to overlay or underlay pages from | |
| 1258 | +other files onto the output generated by qpdf. Specify overlay or | |
| 1259 | +underlay as follows: | |
| 1260 | + | |
| 1261 | +:: | |
| 1262 | + | |
| 1263 | + { @1@option@1@--overlay@2@option@2@ | @1@option@1@--underlay@2@option@2@ } @1@replaceable@1@file@2@replaceable@2@ [ @1@option@1@options@2@option@2@ ] @1@option@1@--@2@option@2@ | |
| 1264 | + | |
| 1265 | +Overlay and underlay options are processed late, so they can be combined | |
| 1266 | +with other options like merging and will apply to the final output. The | |
| 1267 | +@1@option@1@--overlay@2@option@2@ and @1@option@1@--underlay@2@option@2@ | |
| 1268 | +options work the same way, except underlay pages are drawn underneath | |
| 1269 | +the page to which they are applied, possibly obscured by the original | |
| 1270 | +page, and overlay files are drawn on top of the page to which they are | |
| 1271 | +applied, possibly obscuring the page. You can combine overlay and | |
| 1272 | +underlay. | |
| 1273 | + | |
| 1274 | +The default behavior of overlay and underlay is that pages are taken | |
| 1275 | +from the overlay/underlay file in sequence and applied to corresponding | |
| 1276 | +pages in the output until there are no more output pages. If the overlay | |
| 1277 | +or underlay file runs out of pages, remaining output pages are left | |
| 1278 | +alone. This behavior can be modified by options, which are provided | |
| 1279 | +between the @1@option@1@--overlay@2@option@2@ or | |
| 1280 | +@1@option@1@--underlay@2@option@2@ flag and the | |
| 1281 | +@1@option@1@--@2@option@2@ option. The following options are supported: | |
| 1282 | + | |
| 1283 | +- @1@option@1@--password=password@2@option@2@: supply a password if the | |
| 1284 | + overlay/underlay file is encrypted. | |
| 1285 | + | |
| 1286 | +- @1@option@1@--to=page-range@2@option@2@: a range of pages in the same | |
| 1287 | + form as described in `Page Selection Options <#ref.page-selection>`__ | |
| 1288 | + indicates which pages in the output should have the overlay/underlay | |
| 1289 | + applied. If not specified, overlay/underlay are applied to all pages. | |
| 1290 | + | |
| 1291 | +- @1@option@1@--from=[page-range]@2@option@2@: a range of pages that | |
| 1292 | + specifies which pages in the overlay/underlay file will be used for | |
| 1293 | + overlay or underlay. If not specified, all pages will be used. This | |
| 1294 | + can be explicitly specified to be empty if | |
| 1295 | + @1@option@1@--repeat@2@option@2@ is used. | |
| 1296 | + | |
| 1297 | +- @1@option@1@--repeat=page-range@2@option@2@: an optional range of | |
| 1298 | + pages that specifies which pages in the overlay/underlay file will be | |
| 1299 | + repeated after the "from" pages are used up. If you want to repeat a | |
| 1300 | + range of pages starting at the beginning, you can explicitly use | |
| 1301 | + @1@option@1@--from=@2@option@2@. | |
| 1302 | + | |
| 1303 | +Here are some examples. | |
| 1304 | + | |
| 1305 | +- @1@command@1@--overlay o.pdf --to=1-5 --from=1-3 --repeat=4 | |
| 1306 | + --@2@command@2@: overlay the first three pages from file | |
| 1307 | + @1@filename@1@o.pdf@2@filename@2@ onto the first three pages of the | |
| 1308 | + output, then overlay page 4 from @1@filename@1@o.pdf@2@filename@2@ | |
| 1309 | + onto pages 4 and 5 of the output. Leave remaining output pages | |
| 1310 | + untouched. | |
| 1311 | + | |
| 1312 | +- @1@command@1@--underlay footer.pdf --from= --repeat=1,2 | |
| 1313 | + --@2@command@2@: Underlay page 1 of | |
| 1314 | + @1@filename@1@footer.pdf@2@filename@2@ on all odd output pages, and | |
| 1315 | + underlay page 2 of @1@filename@1@footer.pdf@2@filename@2@ on all even | |
| 1316 | + output pages. | |
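One way to read the interaction of the "to", "from", and "repeat" ranges is sketched below in Python. This is an interpretation of the behavior described in this section, not qpdf's implementation; the function name ``overlay_plan`` is hypothetical, and all three arguments are assumed to be page lists that have already been expanded from their range strings.

```python
def overlay_plan(to_pages, from_pages, repeat_pages=()):
    """Map each destination page to the overlay/underlay page applied to it.

    Destination pages that receive nothing are absent from the result.
    """
    plan = {}
    for index, dest in enumerate(to_pages):
        if index < len(from_pages):
            # "from" pages are consumed in sequence first.
            plan[dest] = from_pages[index]
        elif repeat_pages:
            # After that, the "repeat" pages cycle for the remaining pages.
            extra = index - len(from_pages)
            plan[dest] = repeat_pages[extra % len(repeat_pages)]
        # Otherwise the overlay file has run out: leave this page alone.
    return plan


# --overlay o.pdf --to=1-5 --from=1-3 --repeat=4 --
first_example = overlay_plan([1, 2, 3, 4, 5], [1, 2, 3], [4])

# --underlay footer.pdf --from= --repeat=1,2 --   (six-page output assumed)
second_example = overlay_plan([1, 2, 3, 4, 5, 6], [], [1, 2])
```

In the first example, output pages 4 and 5 both map to page 4 of the overlay file; in the second, odd output pages map to page 1 and even output pages to page 2.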
| 1317 | + | |
| 1318 | +.. _ref.attachments: | |
| 1319 | + | |
| 1320 | +Embedded Files/Attachments Options | |
| 1321 | +---------------------------------- | |
| 1322 | + | |
| 1323 | +Starting with qpdf 10.2, you can work with file attachments in PDF files | |
| 1324 | +from the command line. The following options are available: | |
| 1325 | + | |
| 1326 | +@1@option@1@--list-attachments@2@option@2@ | |
| 1327 | + Show the "key" and stream number for embedded files. With | |
| 1328 | + @1@option@1@--verbose@2@option@2@, additional information, including | |
| 1329 | + preferred file name, description, dates, and more, is also displayed. | |
| 1330 | + The key is usually but not always equal to the file name, and is | |
| 1331 | + needed by some of the other options. | |
| 1332 | + | |
| 1333 | +@1@option@1@--show-attachment=@1@replaceable@1@key@2@replaceable@2@@2@option@2@ | |
| 1334 | + Write the contents of the specified attachment to standard output as | |
| 1335 | + binary data. The key should match one of the keys shown by | |
| 1336 | + @1@option@1@--list-attachments@2@option@2@. If specified multiple | |
| 1337 | + times, only the last attachment will be shown. | |
| 1338 | + | |
| 1339 | +@1@option@1@--add-attachment @1@replaceable@1@file@2@replaceable@2@ @1@replaceable@1@options@2@replaceable@2@ --@2@option@2@ | |
| 1340 | + Add or replace an attachment with the contents of | |
| 1341 | + @1@replaceable@1@file@2@replaceable@2@. This may be specified more | |
| 1342 | + than once. The following additional options may appear before the | |
| 1343 | + ``--`` that ends this option: | |
| 1344 | + | |
| 1345 | + @1@option@1@--key=@1@replaceable@1@key@2@replaceable@2@@2@option@2@ | |
| 1346 | + The key to use to register the attachment in the embedded files | |
| 1347 | + table. Defaults to the last path element of | |
| 1348 | + @1@replaceable@1@file@2@replaceable@2@. | |
| 1349 | + | |
| 1350 | + @1@option@1@--filename=@1@replaceable@1@name@2@replaceable@2@@2@option@2@ | |
| 1351 | + The file name to be used for the attachment. This is what is | |
| 1352 | + usually displayed to the user and is the name most graphical PDF | |
| 1353 | + viewers will use when saving a file. It defaults to the last path | |
| 1354 | + element of @1@replaceable@1@file@2@replaceable@2@. | |
| 1355 | + | |
| 1356 | + @1@option@1@--creationdate=@1@replaceable@1@date@2@replaceable@2@@2@option@2@ | |
| 1357 | + The attachment's creation date in PDF format; defaults to the | |
| 1358 | + current time. The date format is explained below. | |
| 1359 | + | |
| 1360 | + @1@option@1@--moddate=@1@replaceable@1@date@2@replaceable@2@@2@option@2@ | |
| 1361 | + The attachment's modification date in PDF format; defaults to the | |
| 1362 | + current time. The date format is explained below. | |
| 1363 | + | |
| 1364 | + @1@option@1@--mimetype=@1@replaceable@1@type/subtype@2@replaceable@2@@2@option@2@ | |
| 1365 | + The mime type for the attachment, e.g. ``text/plain`` or | |
| 1366 | + ``application/pdf``. Note that the mimetype appears in a field | |
| 1367 | + called ``/Subtype`` in the PDF but actually includes the full type | |
| 1368 | + and subtype of the mime type. | |
| 1369 | + | |
| 1370 | + @1@option@1@--description=@1@replaceable@1@"text"@2@replaceable@2@@2@option@2@ | |
| 1371 | + Descriptive text for the attachment, displayed by some PDF | |
| 1372 | + viewers. | |
| 1373 | + | |
| 1374 | + @1@option@1@--replace@2@option@2@ | |
| 1375 | + Indicates that any existing attachment with the same key should be | |
| 1376 | + replaced by the new attachment. Otherwise, | |
| 1377 | + @1@command@1@qpdf@2@command@2@ gives an error if an attachment | |
| 1378 | + with that key is already present. | |
| 1379 | + | |
| 1380 | +@1@option@1@--remove-attachment=@1@replaceable@1@key@2@replaceable@2@@2@option@2@ | |
| 1381 | + Remove the specified attachment. This doesn't only remove the | |
| 1382 | + attachment from the embedded files table but also clears out the file | |
| 1383 | + specification. That means that any potential internal links to the | |
| 1384 | + attachment will be broken. This option may be specified multiple | |
| 1385 | + times. Run with @1@option@1@--verbose@2@option@2@ to see status of | |
| 1386 | + the removal. | |
| 1387 | + | |
| 1388 | +@1@option@1@--copy-attachments-from @1@replaceable@1@file@2@replaceable@2@ @1@replaceable@1@options@2@replaceable@2@ --@2@option@2@ | |
| 1389 | + Copy attachments from another file. This may be specified more than | |
| 1390 | + once. The following additional options may appear before the ``--`` | |
| 1391 | + that ends this option: | |
| 1392 | + | |
| 1393 | + @1@option@1@--password=@1@replaceable@1@password@2@replaceable@2@@2@option@2@ | |
| 1394 | + If required, the password needed to open | |
| 1395 | + @1@replaceable@1@file@2@replaceable@2@ | |
| 1396 | + | |
| 1397 | + @1@option@1@--prefix=@1@replaceable@1@prefix@2@replaceable@2@@2@option@2@ | |
| 1398 | + Only required if the file from which attachments are being copied | |
| 1399 | + has attachments with keys that conflict with attachments already | |
| 1400 | + in the file. In this case, the specified prefix will be prepended | |
| 1401 | + to each key. This affects only the key in the embedded files | |
| 1402 | + table, not the file name. The PDF specification doesn't preclude | |
| 1403 | + multiple attachments having the same file name. | |
| 1404 | + | |
| 1405 | +When a date is required, the date should conform to the PDF date format | |
| 1406 | +specification, which is | |
| 1407 | +``D:``\ @1@replaceable@1@yyyymmddhhmmss<z>@2@replaceable@2@, where | |
| 1408 | +@1@replaceable@1@<z>@2@replaceable@2@ is either ``Z`` for UTC or a | |
| 1409 | +timezone offset in the form @1@replaceable@1@-hh'mm'@2@replaceable@2@ or | |
| 1410 | +@1@replaceable@1@+hh'mm'@2@replaceable@2@. Examples: | |
| 1411 | +``D:20210207161528-05'00'``, ``D:20210207211528Z``. | |
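If the date comes from a script, a string in this format can be produced with Python's standard ``datetime`` module. The following is a minimal sketch; the helper name ``pdf_date`` is hypothetical, and the sketch assumes a timezone-aware input whose UTC offset is a whole number of minutes.

```python
from datetime import datetime, timedelta, timezone


def pdf_date(dt: datetime) -> str:
    """Format a timezone-aware datetime as a PDF date string."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("datetime must be timezone-aware")
    stamp = dt.strftime("D:%Y%m%d%H%M%S")
    minutes = int(offset.total_seconds()) // 60
    if minutes == 0:
        # UTC is written as a literal Z.
        return stamp + "Z"
    sign = "+" if minutes > 0 else "-"
    hours, mins = divmod(abs(minutes), 60)
    return f"{stamp}{sign}{hours:02d}'{mins:02d}'"


eastern = timezone(timedelta(hours=-5))
local_stamp = pdf_date(datetime(2021, 2, 7, 16, 15, 28, tzinfo=eastern))
utc_stamp = pdf_date(datetime(2021, 2, 7, 21, 15, 28, tzinfo=timezone.utc))
```

The two values computed at the end correspond to the two example dates given above and could be passed to @1@option@1@--creationdate@2@option@2@ or @1@option@1@--moddate@2@option@2@.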
| 1412 | + | |
| 1413 | +.. _ref.advanced-parsing: | |
| 1414 | + | |
| 1415 | +Advanced Parsing Options | |
| 1416 | +------------------------ | |
| 1417 | + | |
| 1418 | +These options control aspects of how qpdf reads PDF files. Mostly these | |
| 1419 | +are of use to people who are working with damaged files. There is little | |
| 1420 | +reason to use these options unless you are trying to solve specific | |
| 1421 | +problems. The following options are available: | |
| 1422 | + | |
| 1423 | +@1@option@1@--suppress-recovery@2@option@2@ | |
| 1424 | + Prevents qpdf from attempting to recover damaged files. | |
| 1425 | + | |
| 1426 | +@1@option@1@--ignore-xref-streams@2@option@2@ | |
| 1427 | + Tells qpdf to ignore any cross-reference streams. | |
| 1428 | + | |
| 1429 | +Ordinarily, qpdf will attempt to recover from certain types of errors in | |
| 1430 | +PDF files. These include errors in the cross-reference table, certain | |
| 1431 | +types of object numbering errors, and certain types of stream length | |
| 1432 | +errors. Sometimes, qpdf may think it has recovered but may not have | |
| 1433 | +actually recovered, so care should be taken when using this option as | |
| 1434 | +some data loss is possible. The | |
| 1435 | +@1@option@1@--suppress-recovery@2@option@2@ option will prevent qpdf | |
| 1436 | +from attempting recovery. In this case, it will fail on the first error | |
| 1437 | +that it encounters. | |
| 1438 | + | |
| 1439 | +Ordinarily, qpdf reads cross-reference streams when they are present in | |
| 1440 | +a PDF file. If @1@option@1@--ignore-xref-streams@2@option@2@ is | |
| 1441 | +specified, qpdf will ignore any cross-reference streams for hybrid PDF | |
| 1442 | +files. The purpose of hybrid files is to make some content available to | |
| 1443 | +viewers that are not aware of cross-reference streams. It is almost | |
| 1444 | +never desirable to ignore them. The only time when you might want to use | |
| 1445 | +this feature is if you are testing creation of hybrid PDF files and wish | |
| 1446 | +to see how a PDF consumer that doesn't understand object and | |
| 1447 | +cross-reference streams would interpret such a file. | |
| 1448 | + | |
| 1449 | +.. _ref.advanced-transformation: | |
| 1450 | + | |
| 1451 | +Advanced Transformation Options | |
| 1452 | +------------------------------- | |
| 1453 | + | |
| 1454 | +These transformation options control fine points of how qpdf creates the | |
| 1455 | +output file. Mostly these are of use only to people who are very | |
| 1456 | +familiar with the PDF file format or who are PDF developers. The | |
| 1457 | +following options are available: | |
| 1458 | + | |
| 1459 | +@1@option@1@--compress-streams=@1@replaceable@1@[yn]@2@replaceable@2@@2@option@2@ | |
| 1460 | + By default, or with @1@option@1@--compress-streams=y@2@option@2@, | |
| 1461 | + qpdf will compress any stream with no other filters applied to it | |
| 1462 | + with the ``/FlateDecode`` filter when it writes it. To suppress this | |
| 1463 | + behavior and preserve uncompressed streams as uncompressed, use | |
| 1464 | + @1@option@1@--compress-streams=n@2@option@2@. | |
| 1465 | + | |
| 1466 | +@1@option@1@--decode-level=@1@replaceable@1@option@2@replaceable@2@@2@option@2@ | |
| 1467 | + Controls which streams qpdf tries to decode. The default is | |
| 1468 | + @1@option@1@generalized@2@option@2@. The following options are | |
| 1469 | + available: | |
| 1470 | + | |
| 1471 | + - @1@option@1@none@2@option@2@: do not attempt to decode any streams | |
| 1472 | + | |
| 1473 | + - @1@option@1@generalized@2@option@2@: decode streams filtered with | |
| 1474 | + supported generalized filters: ``/LZWDecode``, ``/FlateDecode``, | |
| 1475 | + ``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define generalized | |
| 1476 | + filters as those to be used for general-purpose compression or | |
| 1477 | + encoding, as opposed to filters specifically designed for image | |
| 1478 | + data. Note that, by default, streams already compressed with | |
| 1479 | + ``/FlateDecode`` are not uncompressed and recompressed unless you | |
| 1480 | + also specify @1@option@1@--recompress-flate@2@option@2@. | |
| 1481 | + | |
| 1482 | + - @1@option@1@specialized@2@option@2@: in addition to generalized, | |
| 1483 | + decode streams with supported non-lossy specialized filters; | |
| 1484 | + currently this is just ``/RunLengthDecode`` | |
| 1485 | + | |
| 1486 | + - @1@option@1@all@2@option@2@: in addition to generalized and | |
| 1487 | + specialized, decode streams with supported lossy filters; | |
| 1488 | + currently this is just ``/DCTDecode`` (JPEG) | |
| 1489 | + | |
| 1490 | +@1@option@1@--stream-data=@1@replaceable@1@option@2@replaceable@2@@2@option@2@ | |
| 1491 | + Controls transformation of stream data. This option predates the | |
| 1492 | + @1@option@1@--compress-streams@2@option@2@ and | |
| 1493 | + @1@option@1@--decode-level@2@option@2@ options. Those options can be | |
| 1494 | + used to achieve the same effect with more control. The value of | |
| 1495 | + @1@option@1@@1@replaceable@1@option@2@replaceable@2@@2@option@2@ may | |
| 1496 | + be one of the following: | |
| 1497 | + | |
| 1498 | + - @1@option@1@compress@2@option@2@: recompress stream data when | |
| 1499 | + possible (default); equivalent to | |
| 1500 | + @1@option@1@--compress-streams=y@2@option@2@ | |
| 1501 | + @1@option@1@--decode-level=generalized@2@option@2@. Does not | |
| 1502 | + recompress streams already compressed with ``/FlateDecode`` unless | |
| 1503 | + @1@option@1@--recompress-flate@2@option@2@ is also specified. | |
| 1504 | + | |
| 1505 | + - @1@option@1@preserve@2@option@2@: leave all stream data as is; | |
| 1506 | + equivalent to @1@option@1@--compress-streams=n@2@option@2@ | |
| 1507 | + @1@option@1@--decode-level=none@2@option@2@ | |
| 1508 | + | |
| 1509 | + - @1@option@1@uncompress@2@option@2@: uncompress stream data | |
| 1510 | + compressed with generalized filters when possible; equivalent to | |
| 1511 | + @1@option@1@--compress-streams=n@2@option@2@ | |
| 1512 | + @1@option@1@--decode-level=generalized@2@option@2@ | |
| 1513 | + | |
| 1514 | +@1@option@1@--recompress-flate@2@option@2@ | |
| 1515 | + By default, streams already compressed with ``/FlateDecode`` are left | |
| 1516 | + alone rather than being uncompressed and recompressed. This option | |
| 1517 | + causes qpdf to uncompress and recompress the streams. There is a | |
| 1518 | + significant performance cost to using this option, but you probably | |
| 1519 | + want to use it if you specify | |
| 1520 | + @1@option@1@--compression-level@2@option@2@. | |
| 1521 | + | |
| 1522 | +@1@option@1@--compression-level=@1@replaceable@1@level@2@replaceable@2@@2@option@2@ | |
| 1523 | + When writing new streams that are compressed with ``/FlateDecode``, | |
| 1524 | + use the specified compression level. The value of | |
| 1525 | + @1@option@1@level@2@option@2@ should be a number from 1 to 9 and is | |
| 1526 | + passed directly to zlib, which implements deflate compression. Note | |
| 1527 | + that qpdf doesn't uncompress and recompress streams by default. To | |
| 1528 | + have this option apply to already compressed streams, you should also | |
| 1529 | + specify @1@option@1@--recompress-flate@2@option@2@. If your goal is | |
| 1530 | + to shrink the size of PDF files, you should also use | |
| 1531 | + @1@option@1@--object-streams=generate@2@option@2@. | |
| 1532 | + | |
| 1533 | +@1@option@1@--normalize-content=[yn]@2@option@2@ | |
| 1534 | + Enables or disables normalization of content streams. Content | |
| 1535 | + normalization is enabled by default in QDF mode. Please see `QDF | |
| 1536 | + Mode <#ref.qdf>`__ for additional discussion of QDF mode. | |
| 1537 | + | |
| 1538 | +@1@option@1@--object-streams=@1@replaceable@1@mode@2@replaceable@2@@2@option@2@ | |
| 1539 | + Controls handling of object streams. The value of | |
| 1540 | + @1@option@1@@1@replaceable@1@mode@2@replaceable@2@@2@option@2@ may be | |
| 1541 | + one of the following: | |
| 1542 | + | |
| 1543 | + - @1@option@1@preserve@2@option@2@: preserve original object streams | |
| 1544 | + (default) | |
| 1545 | + | |
| 1546 | + - @1@option@1@disable@2@option@2@: don't write any object streams | |
| 1547 | + | |
| 1548 | + - @1@option@1@generate@2@option@2@: use object streams wherever | |
| 1549 | + possible | |
| 1550 | + | |
| 1551 | +@1@option@1@--preserve-unreferenced@2@option@2@ | |
| 1552 | + Tells qpdf to preserve objects that are not referenced when writing | |
| 1553 | + the file. Ordinarily any object that is not referenced in a traversal | |
| 1554 | + of the document from the trailer dictionary will be discarded. This | |
| 1555 | + may be useful in working with some damaged files or inspecting files | |
| 1556 | + with known unreferenced objects. | |
| 1557 | + | |
| 1558 | + This flag is ignored for linearized files and has the effect of | |
| 1559 | + causing objects in the new file to be written in order by object ID | |
| 1560 | + from the original file. This does not mean that object numbers will | |
| 1561 | + be the same since qpdf may create stream lengths as direct or | |
| 1562 | + indirect differently from the original file, and the original file | |
| 1563 | + may have gaps in its numbering. | |
| 1564 | + | |
| 1565 | + See also @1@option@1@--preserve-unreferenced-resources@2@option@2@, | |
| 1566 | + which does something completely different. | |
| 1567 | + | |
| 1568 | +@1@option@1@--remove-unreferenced-resources=@1@replaceable@1@option@2@replaceable@2@@2@option@2@ | |
| 1569 | + The @1@replaceable@1@option@2@replaceable@2@ may be ``auto``, | |
| 1570 | + ``yes``, or ``no``. The default is ``auto``. | |
| 1571 | + | |
| 1572 | + Starting with qpdf 8.1, when splitting pages, qpdf is able to attempt | |
| 1573 | + to remove images and fonts that are not used by a page even if they | |
| 1574 | + are referenced in the page's resources dictionary. When shared | |
| 1575 | + resources are in use, this behavior can greatly reduce the file sizes | |
| 1576 | + of split pages, but the analysis is very slow. In versions from 8.1 | |
| 1577 | + through 9.1.1, qpdf did this analysis by default. Starting in qpdf | |
| 1578 | + 10.0.0, if ``auto`` is used, qpdf does a quick analysis of the file | |
| 1579 | + to determine whether the file is likely to have unreferenced objects | |
| 1580 | + on pages, a pattern that frequently occurs when resource dictionaries | |
| 1581 | + are shared across multiple pages and rarely occurs otherwise. If it | |
| 1582 | + discovers this pattern, then it will attempt to remove unreferenced | |
| 1583 | + resources. Usually this means you get the slower splitting speed only | |
| 1584 | + when it's actually going to create smaller files. You can suppress | |
| 1585 | + removal of unreferenced resources altogether by specifying ``no`` or | |
| 1586 | + force it to do the full algorithm by specifying ``yes``. | |
| 1587 | + | |
| 1588 | + Other than cases in which you don't care about file size and care a | |
| 1589 | + lot about runtime, there are few reasons to use this option, | |
| 1590 | + especially now that ``auto`` mode is supported. One reason to use | |
| 1591 | + this is if you suspect that qpdf is removing resources it shouldn't | |
| 1592 | + be removing. If you encounter that case, please report it as a bug at | |
| 1593 | + https://github.com/qpdf/qpdf/issues/. | |
| 1594 | + | |
| 1595 | +@1@option@1@--preserve-unreferenced-resources@2@option@2@ | |
| 1596 | + This is a synonym for | |
| 1597 | + @1@option@1@--remove-unreferenced-resources=no@2@option@2@. | |
| 1598 | + | |
| 1599 | + See also @1@option@1@--preserve-unreferenced@2@option@2@, which does | |
| 1600 | + something completely different. | |
| 1601 | + | |
| 1602 | +@1@option@1@--newline-before-endstream@2@option@2@ | |
| 1603 | + Tells qpdf to insert a newline before the ``endstream`` keyword, not | |
| 1604 | + counted in the length, after any stream content even if the last | |
| 1605 | + character of the stream was a newline. This may result in two | |
| 1606 | + newlines in some cases. This is a requirement of PDF/A. While qpdf | |
| 1607 | + doesn't specifically know how to generate PDF/A-compliant PDFs, this | |
| 1608 | + at least prevents it from removing compliance on already compliant | |
| 1609 | + files. | |
| 1610 | + | |
| 1611 | +@1@option@1@--linearize-pass1=@1@replaceable@1@file@2@replaceable@2@@2@option@2@ | |
| 1612 | + Write the first pass of linearization to the named file. The | |
| 1613 | + resulting file is not a valid PDF file. This option is useful only | |
| 1614 | + for debugging ``QPDFWriter``'s linearization code. When qpdf | |
| 1615 | + linearizes files, it writes the file in two passes, using the first | |
| 1616 | + pass to calculate sizes and offsets that are required for hint tables | |
| 1617 | + and the linearization dictionary. Ordinarily, the first pass is | |
| 1618 | + discarded. This option enables it to be captured. | |
| 1619 | + | |
| 1620 | +@1@option@1@--coalesce-contents@2@option@2@ | |
| 1621 | + When a page's contents are split across multiple streams, this option | |
| 1622 | + causes qpdf to combine them into a single stream. Use of this option | |
| 1623 | + is never necessary for ordinary usage, but it can help when working | |
| 1624 | + with some files in some cases. For example, this can be combined | |
| 1625 | + with QDF mode or content normalization to make it easier to look at | |
| 1626 | + all of a page's contents at once. | |
| 1627 | + | |
| 1628 | +@1@option@1@--flatten-annotations=@1@replaceable@1@option@2@replaceable@2@@2@option@2@ | |
| 1629 | + This option collapses annotations into the pages' contents with | |
| 1630 | + special handling for form fields. Ordinarily, an annotation is | |
| 1631 | + rendered separately and on top of the page. Combining annotations | |
| 1632 | + into the page's contents effectively freezes the placement of the | |
| 1633 | + annotations, making them look right after various page | |
| 1634 | + transformations. The library functionality backing this option was | |
| 1635 | + added for the benefit of programs that want to create *n-up* page | |
| 1636 | + layouts and other similar things that don't work well with | |
| 1637 | + annotations. The @1@replaceable@1@option@2@replaceable@2@ parameter | |
| 1638 | + may be any of the following: | |
| 1639 | + | |
| 1640 | + - @1@option@1@all@2@option@2@: include all annotations that are not | |
| 1641 | + marked invisible or hidden | |
| 1642 | + | |
| 1643 | + - @1@option@1@print@2@option@2@: only include annotations that | |
| 1644 | + indicate that they should appear when the page is printed | |
| 1645 | + | |
| 1646 | + - @1@option@1@screen@2@option@2@: omit annotations that indicate | |
| 1647 | + they should not appear on the screen | |
| 1648 | + | |
| 1649 | + Note that form fields are special because the annotations that are | |
| 1650 | + used to render filled-in form fields may become out of date from the | |
| 1651 | + fields' values if the form is filled in by a program that doesn't | |
| 1652 | + know how to update the appearances. If qpdf detects this case, its | |
| 1653 | + default behavior is not to flatten those annotations because doing so | |
| 1654 | + would cause the value of the form field to be lost. This gives you a | |
| 1655 | + chance to go back and resave the form with a program that knows how | |
| 1656 | + to generate appearances. QPDF itself can generate appearances with | |
| 1657 | + some limitations. See the | |
| 1658 | + @1@option@1@--generate-appearances@2@option@2@ option below. | |
| 1659 | + | |
| 1660 | +@1@option@1@--generate-appearances@2@option@2@ | |
| 1661 | + If a file contains interactive form fields and indicates that the | |
| 1662 | + appearances are out of date with the values of the form, this flag | |
| 1663 | + will regenerate appearances, subject to a few limitations. Note that | |
| 1664 | + there is not usually a reason to do this, but it can be necessary | |
| 1665 | + before using the @1@option@1@--flatten-annotations@2@option@2@ | |
| 1666 | + option. Most of the limitations are not a problem with well-behaved | |
| 1667 | + PDF files. The limitations are as follows: | |
| 1668 | + | |
| 1669 | + - Radio button and checkbox appearances use the pre-set values in | |
| 1670 | + the PDF file. QPDF just makes sure that the correct appearance is | |
| 1671 | + displayed based on the value of the field. This is fine for PDF | |
| 1672 | + files that create their forms properly. Some PDF writers save | |
| 1673 | + appearances for fields when they change, which could cause some | |
| 1674 | + controls to have inconsistent appearances. | |
| 1675 | + | |
| 1676 | + - For text fields and list boxes, any characters that fall outside | |
| 1677 | + of US-ASCII or, if detected, "Windows ANSI" or "Mac Roman" | |
| 1678 | + encoding, will be replaced by the ``?`` character. | |
| 1679 | + | |
| 1680 | + - Quadding is ignored. Quadding is used to specify whether the | |
| 1681 | + contents of a field should be left, center, or right aligned with | |
| 1682 | + the field. | |
| 1683 | + | |
| 1684 | + - Rich text, multi-line, and other more elaborate formatting | |
| 1685 | + directives are ignored. | |
| 1686 | + | |
| 1687 | + - There is no support for multi-select fields or signature fields. | |
| 1688 | + | |
| 1689 | + If qpdf doesn't do a good enough job with your form, use an external | |
| 1690 | + application to save your filled-in form before processing it with | |
| 1691 | + qpdf. | |
| 1692 | + | |
| 1693 | +@1@option@1@--optimize-images@2@option@2@ | |
| 1694 | + This flag causes qpdf to recompress all images that are not | |
| 1695 | + compressed with DCT (JPEG) using DCT compression as long as doing so | |
| 1696 | + decreases the size in bytes of the image data and the image does not | |
| 1697 | + fall below minimum specified dimensions. Useful information is | |
| 1698 | + provided when used in combination with | |
| 1699 | + @1@option@1@--verbose@2@option@2@. See also the | |
| 1700 | + @1@option@1@--oi-min-width@2@option@2@, | |
| 1701 | + @1@option@1@--oi-min-height@2@option@2@, and | |
| 1702 | + @1@option@1@--oi-min-area@2@option@2@ options. By default, starting | |
| 1703 | + in qpdf 8.4, inline images are converted to regular images and | |
| 1704 | + optimized as well. Use @1@option@1@--keep-inline-images@2@option@2@ | |
| 1705 | + to prevent inline images from being included. | |
| 1706 | + | |
| 1707 | +@1@option@1@--oi-min-width=@1@replaceable@1@width@2@replaceable@2@@2@option@2@ | |
| 1708 | + Avoid optimizing images whose width is below the specified amount. If | |
| 1709 | + omitted, the default is 128 pixels. Use 0 for no minimum. | |
| 1710 | + | |
| 1711 | +@1@option@1@--oi-min-height=@1@replaceable@1@height@2@replaceable@2@@2@option@2@ | |
| 1712 | + Avoid optimizing images whose height is below the specified amount. | |
| 1713 | + If omitted, the default is 128 pixels. Use 0 for no minimum. | |
| 1714 | + | |
| 1715 | +@1@option@1@--oi-min-area=@1@replaceable@1@area-in-pixels@2@replaceable@2@@2@option@2@ | |
| 1716 | + Avoid optimizing images whose pixel count (width × height) is below | |
| 1717 | + the specified amount. If omitted, the default is 16,384 pixels. Use 0 | |
| 1718 | + for no minimum. | |
| 1719 | + | |
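The three size thresholds above can be read as a single predicate. The following is a minimal Python sketch of that reading, not qpdf's actual implementation. The function name ``passes_size_minimums`` is hypothetical, and the assumption that an image must satisfy all three minimums at once is an interpretation of the option descriptions.

```python
def passes_size_minimums(width, height,
                         min_width=128, min_height=128, min_area=16384):
    """Return True if an image is large enough to be considered for optimization.

    Defaults mirror the documented defaults of --oi-min-width,
    --oi-min-height, and --oi-min-area. A minimum of 0 disables that check.
    Assumption: all three minimums must be satisfied together.
    """
    if min_width and width < min_width:
        return False
    if min_height and height < min_height:
        return False
    if min_area and width * height < min_area:
        return False
    return True
```

Under these assumptions, a 128 by 128 image is exactly at the default thresholds and is still eligible, while a 200 by 100 image is skipped because of its height.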
| 1720 | +@1@option@1@--externalize-inline-images@2@option@2@ | |
| 1721 | + Convert inline images to regular images. By default, images whose | |
| 1722 | + data is at least 1,024 bytes are converted when this option is | |
| 1723 | + selected. Use @1@option@1@--ii-min-bytes@2@option@2@ to change the | |
| 1724 | + size threshold. This option is implicitly selected when | |
| 1725 | + @1@option@1@--optimize-images@2@option@2@ is selected. Use | |
| 1726 | + @1@option@1@--keep-inline-images@2@option@2@ to exclude inline images | |
| 1727 | + from image optimization. | |
| 1728 | + | |
| 1729 | +@1@option@1@--ii-min-bytes=@1@replaceable@1@bytes@2@replaceable@2@@2@option@2@ | |
| 1730 | + Avoid converting inline images whose size is below the specified | |
| 1731 | + minimum size to regular images. If omitted, the default is 1,024 | |
| 1732 | + bytes. Use 0 for no minimum. | |
| 1733 | + | |
| 1734 | +@1@option@1@--keep-inline-images@2@option@2@ | |
| 1735 | + Prevent inline images from being included in image optimization. This | |
| 1736 | + option has no effect when @1@option@1@--optimize-images@2@option@2@ | |
| 1737 | + is not specified. | |
| 1738 | + | |
| 1739 | +@1@option@1@--remove-page-labels@2@option@2@ | |
| 1740 | + Remove page labels from the output file. | |
| 1741 | + | |
| 1742 | +@1@option@1@--qdf@2@option@2@ | |
| 1743 | + Turns on QDF mode. For additional information on QDF, please see `QDF | |
| 1744 | + Mode <#ref.qdf>`__. Note that @1@option@1@--linearize@2@option@2@ | |
| 1745 | + disables QDF mode. | |
| 1746 | + | |
| 1747 | +@1@option@1@--min-version=@1@replaceable@1@version@2@replaceable@2@@2@option@2@ | |
| 1748 | + Forces the PDF version of the output file to be at least | |
| 1749 | + @1@replaceable@1@version@2@replaceable@2@. In other words, if the | |
| 1750 | + input file has a lower version than the specified version, the | |
| 1751 | + specified version will be used. If the input file has a higher | |
| 1752 | + version, the input file's original version will be used. It is seldom | |
| 1753 | + necessary to use this option since qpdf will automatically increase | |
| 1754 | + the version as needed when adding features that require newer PDF | |
| 1755 | + readers. | |
| 1756 | + | |
| 1757 | + The version number may be expressed in the form | |
| 1758 | + @1@replaceable@1@major.minor.extension-level@2@replaceable@2@, in | |
| 1759 | + which case the version is interpreted as | |
| 1760 | + @1@replaceable@1@major.minor@2@replaceable@2@ at extension level | |
| 1761 | + @1@replaceable@1@extension-level@2@replaceable@2@. For example, | |
| 1762 | + version ``1.7.8`` represents version 1.7 at extension level 8. Note | |
| 1763 | + that minimal syntax checking is done on the command line. | |
| 1764 | + | |
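As an illustration of the version syntax and the "at least" rule described for ``--min-version``, here is a small Python sketch. It is not qpdf's code. The helper names ``parse_version`` and ``effective_min_version`` are hypothetical, and treating a missing extension level as 0 is an assumption.

```python
def parse_version(text):
    """Split 'major.minor' or 'major.minor.extension-level' into integers."""
    parts = [int(p) for p in text.split(".")]
    if len(parts) == 2:
        parts.append(0)  # assumption: no extension level means level 0
    if len(parts) != 3:
        raise ValueError(f"unsupported version string: {text!r}")
    return tuple(parts)


def effective_min_version(input_version, requested):
    """Return the higher of the input file's version and the requested one."""
    return max(parse_version(input_version), parse_version(requested))
```

With this sketch, ``parse_version("1.7.8")`` yields ``(1, 7, 8)``, that is, version 1.7 at extension level 8, and a 1.7 input file keeps its own version when ``1.5`` is requested.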
| 1765 | +@1@option@1@--force-version=@1@replaceable@1@version@2@replaceable@2@@2@option@2@ | |
| 1766 | + This option forces the PDF version to be the exact version specified | |
| 1767 | + *even when the file may have content that is not supported in that | |
| 1768 | + version*. The version number is interpreted in the same way as with | |
| 1769 | + @1@option@1@--min-version@2@option@2@ so that extension levels can be | |
| 1770 | + set. In some cases, forcing the output file's PDF version to be lower | |
| 1771 | + than that of the input file will cause qpdf to disable certain | |
| 1772 | + features of the document. Specifically, 256-bit keys are disabled if | |
| 1773 | + the version is less than 1.7 with extension level 8 (except R5 is | |
| 1774 | + disabled if less than 1.7 with extension level 3), AES encryption is | |
| 1775 | + disabled if the version is less than 1.6, cleartext metadata and | |
| 1776 | + object streams are disabled if less than 1.5, 128-bit encryption keys | |
| 1777 | + are disabled if less than 1.4, and all encryption is disabled if less | |
| 1778 | + than 1.3. Even with these precautions, qpdf won't be able to do | |
| 1779 | + things like eliminate use of newer image compression schemes, | |
| 1780 | + transparency groups, or other features that may have been added in | |
| 1781 | + more recent versions of PDF. | |
| 1782 | + | |
| 1783 | + As a general rule, with the exception of big structural things like | |
| 1784 | + the use of object streams or AES encryption, PDF viewers are supposed | |
| 1785 | + to ignore features in files that they don't support from newer | |
| 1786 | + versions. This means that forcing the version to a lower version may | |
| 1787 | + make it possible to open your PDF file with an older version, though | |
| 1788 | + bear in mind that some of the original document's functionality may | |
| 1789 | + be lost. | |
| 1790 | + | |
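The version thresholds listed for ``--force-version`` can be summarized as a lookup. The Python sketch below only restates the thresholds from the text above. The function name ``disabled_features`` and the feature labels are hypothetical, and this is not how qpdf itself is structured.

```python
def disabled_features(major, minor, extension_level=0):
    """List the features the text says are disabled for a forced PDF version."""
    version = (major, minor, extension_level)
    disabled = []
    if version < (1, 7, 8):
        disabled.append("256-bit keys (other than R5)")
    if version < (1, 7, 3):
        disabled.append("R5 256-bit keys")
    if version < (1, 6, 0):
        disabled.append("AES encryption")
    if version < (1, 5, 0):
        disabled.append("cleartext metadata")
        disabled.append("object streams")
    if version < (1, 4, 0):
        disabled.append("128-bit encryption keys")
    if version < (1, 3, 0):
        disabled.append("all encryption")
    return disabled
```

For example, forcing version 1.5 would disable AES encryption and 256-bit keys but leave object streams alone, while forcing 1.7 at extension level 8 disables nothing on this list.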
| 1791 | +By default, when a stream is encoded using non-lossy filters that qpdf | |
| 1792 | +understands and is not already compressed using a good compression | |
| 1793 | +scheme, qpdf will uncompress and recompress streams. Assuming proper | |
| 1794 | +filter implementations, this is safe and generally results in smaller files. | |
| 1795 | +This behavior may also be explicitly requested with | |
| 1796 | +@1@option@1@--stream-data=compress@2@option@2@. | |
| 1797 | + | |
| 1798 | +When @1@option@1@--normalize-content=y@2@option@2@ is specified, qpdf | |
| 1799 | +will attempt to normalize whitespace and newlines in page content | |
| 1800 | +streams. This is generally safe but could, in some cases, cause damage | |
| 1801 | +to the content streams. This option is intended for people who wish to | |
| 1802 | +study PDF content streams or to debug PDF content. You should not use | |
| 1803 | +this for "production" PDF files. | |
| 1804 | + | |
| 1805 | +When normalizing content, if qpdf runs into any lexical errors, it will | |
| 1806 | +print a warning indicating that content may be damaged. The only | |
| 1807 | +situation in which qpdf is known to cause damage during content | |
| 1808 | +normalization is when a page's contents are split across multiple | |
| 1809 | +streams and streams are split in the middle of a lexical token such as a | |
| 1810 | +string, name, or inline image. Note that files that do this are invalid | |
| 1811 | +since the PDF specification states that content streams are not to be | |
| 1812 | +split in the middle of a token. If you want to inspect the original | |
| 1813 | +content streams in an uncompressed format, you can always run with | |
| 1814 | +@1@option@1@--qdf --normalize-content=n@2@option@2@ for a QDF file | |
| 1815 | +without content normalization, or alternatively | |
| 1816 | +@1@option@1@--stream-data=uncompress@2@option@2@ for a regular non-QDF | |
| 1817 | +mode file with uncompressed streams. These will both uncompress all the | |
| 1818 | +streams but will not attempt to normalize content. Please note that if | |
| 1819 | +you are using content normalization or QDF mode for the purpose of | |
| 1820 | +manually inspecting files, you don't have to care about this. | |
| 1821 | + | |
| 1822 | +Object streams, also known as compressed objects, were introduced into | |
| 1823 | +the PDF specification at version 1.5, corresponding to Acrobat 6. Some | |
| 1824 | +older PDF viewers may not support files with object streams. qpdf can be | |
| 1825 | +used to transform files with object streams to files without object | |
| 1826 | +streams or vice versa. As mentioned above, there are three object stream | |
| 1827 | +modes: @1@option@1@preserve@2@option@2@, | |
| 1828 | +@1@option@1@disable@2@option@2@, and @1@option@1@generate@2@option@2@. | |
| 1829 | + | |
| 1830 | +In @1@option@1@preserve@2@option@2@ mode, the relationship between objects | |
| 1831 | +and the streams that contain them is preserved from the original file. | |
| 1832 | +In @1@option@1@disable@2@option@2@ mode, all objects are written as | |
| 1833 | +regular, uncompressed objects. The resulting file should be readable by | |
| 1834 | +older PDF viewers. (Of course, the content of the files may include | |
| 1835 | +features not supported by older viewers, but at least the structure will | |
| 1836 | +be supported.) In @1@option@1@generate@2@option@2@ mode, qpdf will | |
| 1837 | +create its own object streams. This will usually result in more compact | |
| 1838 | +PDF files, though they may not be readable by older viewers. In this | |
| 1839 | +mode, qpdf will also make sure the PDF version number in the header is | |
| 1840 | +at least 1.5. | |
| 1841 | + | |
| 1842 | +The @1@option@1@--qdf@2@option@2@ flag turns on QDF mode, which changes | |
| 1843 | +some of the defaults described above. Specifically, in QDF mode, by | |
| 1844 | +default, stream data is uncompressed, content streams are normalized, | |
| 1845 | +and encryption is removed. These defaults can still be overridden by | |
| 1846 | +specifying the appropriate options as described above. Additionally, in | |
| 1847 | +QDF mode, stream lengths are stored as indirect objects, objects are | |
| 1848 | +laid out in a less efficient but more readable fashion, and the | |
| 1849 | +documents are interspersed with comments that make it easier for the | |
| 1850 | +user to find things and also make it possible for | |
| 1851 | +@1@command@1@fix-qdf@2@command@2@ to work properly. QDF mode is intended | |
| 1852 | +for people, mostly developers, who wish to inspect or modify PDF files | |
| 1853 | +in a text editor. For details, please see `QDF Mode <#ref.qdf>`__. | |
| 1854 | + | |
| 1855 | +.. _ref.testing-options: | |
| 1856 | + | |
| 1857 | +Testing, Inspection, and Debugging Options | |
| 1858 | +------------------------------------------ | |
| 1859 | + | |
| 1860 | +These options can be useful for digging into PDF files or for use in | |
| 1861 | +automated test suites for software that uses the qpdf library. When any | |
| 1862 | +of the options in this section are specified, no output file should be | |
| 1863 | +given. The following options are available: | |
| 1864 | + | |
| 1865 | +@1@option@1@--deterministic-id@2@option@2@ | |
| 1866 | + Causes generation of a deterministic value for /ID. This prevents use | |
| 1867 | + of timestamp and output file name information in the /ID generation. | |
| 1868 | + Instead, at some slight additional runtime cost, the /ID field is | |
| 1869 | + generated to include a digest of the significant parts of the content | |
| 1870 | + of the output PDF file. This means that a given qpdf operation should | |
| 1871 | + generate the same /ID each time it is run, which can be useful when | |
| 1872 | + caching results or for generation of some test data. Use of this flag | |
| 1873 | + is not compatible with creation of encrypted files. | |
| 1874 | + | |
| 1875 | +@1@option@1@--static-id@2@option@2@ | |
| 1876 | + Causes generation of a fixed value for /ID. This is intended for | |
| 1877 | + testing only. Never use it for production files. If you are trying to | |
| 1878 | + get the same /ID each time for a given file and you are not | |
| 1879 | + generating encrypted files, consider using the | |
| 1880 | + @1@option@1@--deterministic-id@2@option@2@ option. | |
| 1881 | + | |
| 1882 | +@1@option@1@--static-aes-iv@2@option@2@ | |
| 1883 | + Causes use of a static initialization vector for AES-CBC. This is | |
| 1884 | + intended for testing only so that output files can be reproducible. | |
| 1885 | + Never use it for production files. This option in particular is not | |
| 1886 | + secure since it significantly weakens the encryption. | |
| 1887 | + | |
| 1888 | +@1@option@1@--no-original-object-ids@2@option@2@ | |
| 1889 | + Suppresses inclusion of original object ID comments in QDF files. | |
| 1890 | + This can be useful when generating QDF files for test purposes, | |
| 1891 | + particularly when comparing them to determine whether two PDF files | |
| 1892 | + have identical content. | |
| 1893 | + | |
| 1894 | +@1@option@1@--show-encryption@2@option@2@ | |
| 1895 | + Shows document encryption parameters. Also shows the document's user | |
| 1896 | + password if the owner password is given. | |
| 1897 | + | |
| 1898 | +@1@option@1@--show-encryption-key@2@option@2@ | |
| 1899 | + When encryption information is being displayed, as when | |
| 1900 | + @1@option@1@--check@2@option@2@ or | |
| 1901 | + @1@option@1@--show-encryption@2@option@2@ is given, display the | |
| 1902 | + computed or retrieved encryption key as a hexadecimal string. This | |
| 1903 | + value is not ordinarily useful to users, but it can be used as the | |
| 1904 | + argument to @1@option@1@--password@2@option@2@ if the | |
| 1905 | + @1@option@1@--password-is-hex-key@2@option@2@ is specified. Note | |
| 1906 | + that, when PDF files are encrypted, passwords and other metadata are | |
| 1907 | + used only to compute an encryption key, and the encryption key is | |
| 1908 | + what is actually used for encryption. This enables retrieval of that | |
| 1909 | + key. | |
| 1910 | + | |
| 1911 | +@1@option@1@--check-linearization@2@option@2@ | |
| 1912 | + Checks file integrity and linearization status. | |
| 1913 | + | |
| 1914 | +@1@option@1@--show-linearization@2@option@2@ | |
| 1915 | + Checks and displays all data in the linearization hint tables. | |
| 1916 | + | |
| 1917 | +@1@option@1@--show-xref@2@option@2@ | |
| 1918 | + Shows the contents of the cross-reference table in a human-readable | |
| 1919 | + form. This is especially useful for files with cross-reference | |
| 1920 | + streams which are stored in a binary format. | |
| 1921 | + | |
| 1922 | +@1@option@1@--show-object=trailer|obj[,gen]@2@option@2@ | |
| 1923 | + Show the contents of the given object. This is especially useful for | |
| 1924 | + inspecting objects that are inside of object streams (also known as | |
| 1925 | + "compressed objects"). | |
| 1926 | + | |
| 1927 | +@1@option@1@--raw-stream-data@2@option@2@ | |
| 1928 | + When used along with the @1@option@1@--show-object@2@option@2@ | |
| 1929 | + option, if the object is a stream, shows the raw stream data instead | |
| 1930 | + of the object's contents. | |
| 1931 | + | |
| 1932 | +@1@option@1@--filtered-stream-data@2@option@2@ | |
| 1933 | + When used along with the @1@option@1@--show-object@2@option@2@ | |
| 1934 | + option, if the object is a stream, shows the filtered stream data | |
| 1935 | + instead of the object's contents. If the stream is filtered using filters | |
| 1936 | + that qpdf does not support, an error will be issued. | |
| 1937 | + | |
| 1938 | +@1@option@1@--show-npages@2@option@2@ | |
| 1939 | + Prints the number of pages in the input file on a line by itself. | |
| 1940 | + Since the number of pages appears by itself on a line, this option | |
| 1941 | + can be useful for scripting if you need to know the number of pages | |
| 1942 | + in a file. | |
| 1943 | + | |
| 1944 | +@1@option@1@--show-pages@2@option@2@ | |
| 1945 | + Shows the object and generation number for each page dictionary | |
| 1946 | + object and for each content stream associated with the page. Having | |
| 1947 | + this information makes it more convenient to inspect objects from a | |
| 1948 | + particular page. | |
| 1949 | + | |
| 1950 | +@1@option@1@--with-images@2@option@2@ | |
| 1951 | + When used along with @1@option@1@--show-pages@2@option@2@, also shows | |
| 1952 | + the object and generation numbers for the image objects on each page. | |
| 1953 | + (At present, information about images in shared resource dictionaries | |
| 1954 | + is not output by this command. This is discussed in a comment in the | |
| 1955 | + source code.) | |
| 1956 | + | |
| 1957 | +@1@option@1@--json@2@option@2@ | |
| 1958 | + Generate a JSON representation of the file. This is described in | |
| 1959 | + depth in `QPDF JSON <#ref.json>`__. | |
| 1960 | + | |
| 1961 | +@1@option@1@--json-help@2@option@2@ | |
| 1962 | + Describe the format of the JSON output. | |
| 1963 | + | |
| 1964 | +@1@option@1@--json-key=key@2@option@2@ | |
| 1965 | + This option is repeatable. If specified, only top-level keys | |
| 1966 | + specified will be included in the JSON output. If not specified, all | |
| 1967 | + keys will be shown. | |
| 1968 | + | |
| 1969 | +@1@option@1@--json-object=trailer|obj[,gen]@2@option@2@ | |
| 1970 | + This option is repeatable. If specified, only specified objects will | |
| 1971 | + be shown in the "``objects``" key of the JSON output. If absent, all | |
| 1972 | + objects will be shown. | |
| 1973 | + | |
| 1974 | +@1@option@1@--check@2@option@2@ | |
| 1975 | + Checks file structure as well as encryption, linearization, and | |
| 1976 | + encoding of stream data. A file for which | |
| 1977 | + @1@option@1@--check@2@option@2@ reports no errors may still have | |
| 1978 | + errors in stream data content but should otherwise be structurally | |
| 1979 | + sound. If @1@option@1@--check@2@option@2@ reports any errors, qpdf will exit | |
| 1980 | + with a status of 2. There are some recoverable conditions that | |
| 1981 | + @1@option@1@--check@2@option@2@ detects. These are issued as warnings | |
| 1982 | + instead of errors. If qpdf finds no errors but finds warnings, it | |
| 1983 | + will exit with a status of 3 (as of version 2.0.4). When | |
| 1984 | + @1@option@1@--check@2@option@2@ is combined with other options, | |
| 1985 | + checks are always performed before any other options are processed. | |
| 1986 | + For erroneous files, @1@option@1@--check@2@option@2@ will cause qpdf | |
| 1987 | + to attempt to recover, after which other options are effectively | |
| 1988 | + operating on the recovered file. Combining | |
| 1989 | + @1@option@1@--check@2@option@2@ with other options in this way can be | |
| 1990 | + useful for manually recovering severely damaged files. Note that | |
| 1991 | + @1@option@1@--check@2@option@2@ produces no output to standard output | |
| 1992 | + when everything is valid, so if you are using this to | |
| 1993 | + programmatically validate files in bulk, it is safe to run without | |
| 1994 | + output redirected to @1@filename@1@/dev/null@2@filename@2@ and just | |
| 1995 | + check for a 0 exit code. | |
| 1996 | + | |
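For bulk validation, the exit statuses described for ``--check`` (0 for a clean file, 2 for errors, 3 for warnings only) can be mapped to labels in a wrapper script. The following Python sketch is one possible way to do that. The function names and labels are hypothetical, and ``check_file`` assumes a ``qpdf`` executable is available on ``PATH``.

```python
import subprocess


def classify_check_status(status):
    """Map a qpdf --check exit status to a label, per the description above."""
    if status == 0:
        return "ok"
    if status == 2:
        return "errors"
    if status == 3:
        return "warnings"
    return "unknown"  # any other status is not covered by the text above


def check_file(path):
    """Run `qpdf --check` on one file and classify the result.

    Assumption: qpdf is installed and on PATH. Output is captured so that
    it does not mix with the caller's own output.
    """
    result = subprocess.run(["qpdf", "--check", path], capture_output=True)
    return classify_check_status(result.returncode)
```

A script could then call ``check_file`` for each file in a directory and report only those whose label is not ``"ok"``.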
| 1997 | +The @1@option@1@--raw-stream-data@2@option@2@ and | |
| 1998 | +@1@option@1@--filtered-stream-data@2@option@2@ options are ignored | |
| 1999 | +unless @1@option@1@--show-object@2@option@2@ is given. Either of these | |
| 2000 | +options will cause the stream data to be written to standard output. In | |
| 2001 | +order to avoid commingling of stream data with other output, it is | |
| 2002 | +recommended that these options not be combined with other test/inspection | |
| 2003 | +options. | |
| 2004 | + | |
| 2005 | +If @1@option@1@--filtered-stream-data@2@option@2@ is given and | |
| 2006 | +@1@option@1@--normalize-content=y@2@option@2@ is also given, qpdf will | |
| 2007 | +attempt to normalize the stream data as if it is a page content stream. | |
| 2008 | +This attempt will be made even if it is not a page content stream, in | |
| 2009 | +which case it will produce unusable results. | |
| 2010 | + | |
| 2011 | +.. _ref.unicode-passwords: | |
| 2012 | + | |
| 2013 | +Unicode Passwords | |
| 2014 | +----------------- | |
| 2015 | + | |
| 2016 | +At the library API level, all methods that perform encryption and | |
| 2017 | +decryption interpret passwords as strings of bytes. It is up to the | |
| 2018 | +caller to ensure that they are appropriately encoded. Starting with qpdf | |
| 2019 | +version 8.4.0, qpdf will attempt to make this easier for you when | |
| 2020 | +you interact with qpdf via its command line interface. The PDF specification | |
| 2021 | +requires passwords used to encrypt files with 40-bit or 128-bit | |
| 2022 | +encryption to be encoded with PDF Doc encoding. This encoding is a | |
| 2023 | +single-byte encoding that supports ISO-Latin-1 and a handful of other | |
| 2024 | +commonly used characters. It has a large overlap with Windows ANSI but | |
| 2025 | +is not exactly the same. There is generally not a way to provide PDF Doc | |
| 2026 | +encoded strings on the command line. As such, qpdf versions prior to | |
| 2027 | +8.4.0 would often create PDF files that couldn't be opened with other | |
| 2028 | +software when given a password with non-ASCII characters to encrypt a | |
| 2029 | +file with 40-bit or 128-bit encryption. Starting with qpdf 8.4.0, qpdf | |
| 2030 | +recognizes the encoding of the parameter and transcodes it as needed. | |
| 2031 | +The rest of this section provides the details about exactly how qpdf | |
| 2032 | +behaves. Most users will not need to know this information, but it might | |
| 2033 | +be useful if you have been working around qpdf's old behavior or if you | |
| 2034 | +are using qpdf to generate encrypted files for testing other PDF | |
| 2035 | +software. | |
| 2036 | + | |
| 2037 | +A note about Windows: when qpdf builds, it attempts to determine what it | |
| 2038 | +has to do to use ``wmain`` instead of ``main`` on Windows. The ``wmain`` | |
| 2039 | +function is an alternative entry point that receives all arguments as | |
| 2040 | +UTF-16-encoded strings. When qpdf starts up this way, it converts all | |
| 2041 | +the strings to UTF-8 encoding and then invokes the regular main. This | |
| 2042 | +means that, as far as qpdf is concerned, it receives its command-line | |
| 2043 | +arguments with UTF-8 encoding, just as it would in any modern Linux or | |
| 2044 | +UNIX environment. | |
| 2045 | + | |
| 2046 | +If a file is being encrypted with 40-bit or 128-bit encryption and the | |
| 2047 | +supplied password is not a valid UTF-8 string, qpdf will fall back to | |
| 2048 | +the behavior of interpreting the password as a string of bytes. If you | |
| 2049 | +have old scripts that encrypt files by passing the output of | |
| 2050 | +@1@command@1@iconv@2@command@2@ to qpdf, you no longer need to do that, | |
| 2051 | +but if you do, qpdf should still work. The only exception would be for | |
| 2052 | +the extremely unlikely case of a password that is encoded with a | |
| 2053 | +single-byte encoding but also happens to be valid UTF-8. Such a password | |
| 2054 | +would contain strings of even numbers of characters that alternate | |
| 2055 | +between accented letters and symbols. In the extremely unlikely event | |
| 2056 | +that you are intentionally using such passwords and qpdf is thwarting | |
| 2057 | +you by interpreting them as UTF-8, you can use | |
| 2058 | +@1@option@1@--password-mode=bytes@2@option@2@ to suppress qpdf's | |
| 2059 | +automatic behavior. | |
| 2060 | + | |
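The fallback described above, along with the ambiguous case of a single-byte password that happens to be valid UTF-8, can be illustrated with a short Python sketch. This is an illustration of the idea only, not qpdf's implementation. The function name ``interpret_password`` and the returned labels are hypothetical.

```python
def interpret_password(raw: bytes):
    """Treat the password as Unicode if it is valid UTF-8, else as raw bytes."""
    try:
        return ("unicode", raw.decode("utf-8"))
    except UnicodeDecodeError:
        return ("bytes", raw)


# A Latin-1 encoded e-acute (0xE9) is not valid UTF-8, so it stays as bytes.
latin1_e_acute = "\u00e9".encode("latin-1")

# The Latin-1 string A-tilde followed by a copyright sign (0xC3 0xA9) is also
# the valid UTF-8 encoding of e-acute. This is the ambiguous case: an accented
# letter followed by a symbol.
ambiguous = "\u00c3\u00a9".encode("latin-1")
```

In this sketch, ``interpret_password(ambiguous)`` is read as the single character e-acute, which is the situation where ``--password-mode=bytes`` would be needed to keep the original two bytes.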
| 2061 | +The @1@option@1@--password-mode@2@option@2@ option, as described earlier | |
| 2062 | +in this chapter, can be used to change qpdf's interpretation of supplied | |
| 2063 | +passwords. There are very few reasons to use this option. One would be | |
| 2064 | +the unlikely case described in the previous paragraph in which the | |
| 2065 | +supplied password happens to be valid UTF-8 but isn't supposed to be | |
| 2066 | +UTF-8. Your best bet would be just to provide the password as a valid | |
| 2067 | +UTF-8 string, but you could also use | |
| 2068 | +@1@option@1@--password-mode=bytes@2@option@2@. Another reason to use | |
| 2069 | +@1@option@1@--password-mode=bytes@2@option@2@ would be to intentionally | |
| 2070 | +generate PDF files encrypted with passwords that are not properly | |
| 2071 | +encoded. The qpdf test suite does this to generate invalid files for the | |
| 2072 | +purpose of testing its password recovery capability. If you were trying | |
| 2073 | +to create intentionally incorrect files for similar purposes, the | |
| 2074 | +@1@option@1@bytes@2@option@2@ password mode can enable you to do this. | |
| 2075 | + | |
| 2076 | +When qpdf attempts to decrypt a file with a password that contains | |
| 2077 | +non-ASCII characters, it will generate a list of alternative passwords | |
| 2078 | +by attempting to interpret the password as each of a handful of | |
| 2079 | +different coding systems and then transcode them to the required format. | |
| 2080 | +This helps to compensate for the supplied password being given in the | |
| 2081 | +wrong coding system, such as would happen if you used the | |
| 2082 | +@1@command@1@iconv@2@command@2@ workaround that was previously needed. | |
| 2083 | +It also generates passwords by doing the reverse operation: translating | |
| 2084 | +from a correct to an incorrect encoding of the password. This would enable | |
| 2085 | +qpdf to decrypt files using passwords that were improperly encoded by | |
| 2086 | +whatever software encrypted the files, including older versions of qpdf | |
| 2087 | +invoked without properly encoded passwords. The combination of these two | |
| 2088 | +recovery methods should make qpdf transparently open most encrypted | |
| 2089 | +files with the password supplied correctly but in the wrong coding | |
| 2090 | +system. There are no real downsides to this behavior, but if you don't | |
| 2091 | +want qpdf to do this, you can use the | |
| 2092 | +@1@option@1@--suppress-password-recovery@2@option@2@ option. One reason | |
| 2093 | +to do that is to ensure that you know the exact password that was used | |
| 2094 | +to encrypt the file. | |
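
The recovery idea described above can be sketched roughly as follows. This is a Python illustration, not qpdf's implementation: the function name `candidate_passwords` and the particular list of encodings are assumptions made for the example.

```python
def candidate_passwords(supplied, encodings=("utf-8", "cp1252", "mac_roman")):
    """Return the supplied bytes plus alternative encodings of the same text.

    Each encoding is tried as a guess at how `supplied` was encoded. Every
    successful guess is then re-encoded in each of the other encodings.
    The encoding list is hypothetical, chosen only for illustration.
    """
    candidates = [supplied]
    for source in encodings:
        try:
            text = supplied.decode(source)
        except UnicodeDecodeError:
            continue  # the bytes are not valid in this encoding
        for target in encodings:
            try:
                encoded = text.encode(target)
            except UnicodeEncodeError:
                continue  # the text cannot be represented in this encoding
            if encoded not in candidates:
                candidates.append(encoded)
    return candidates
```

A caller would try each candidate in order and stop at the first one that opens the file. Skipping this loop and trying only the supplied bytes corresponds to what the text describes for @1@option@1@--suppress-password-recovery@2@option@2@.
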
| 2095 | + | |
| 2096 | +With these changes, qpdf now generates compliant passwords in most | |
| 2097 | +cases. There are still some exceptions. In particular, the PDF | |
| 2098 | +specification directs compliant writers to normalize Unicode passwords | |
| 2099 | +and to perform certain transformations on passwords with bidirectional | |
| 2100 | +text. Implementing this functionality requires using a real Unicode | |
| 2101 | +library like ICU. If a client application that uses qpdf wants to do | |
| 2102 | +this, the qpdf library will accept the resulting passwords, but qpdf | |
| 2103 | +will not perform these transformations itself. It is possible that this | |
| 2104 | +will be addressed in a future version of qpdf. The ``QPDFWriter`` | |
| 2105 | +methods that enable encryption on the output file accept passwords as | |
| 2106 | +strings of bytes. | |
| 2107 | + | |
| 2108 | +Please note that the @1@option@1@--password-is-hex-key@2@option@2@ | |
| 2109 | +option is unrelated to all this. This flag bypasses the normal process | |
| 2110 | +of going from password to encryption string entirely, allowing the raw | |
| 2111 | +encryption key to be specified directly. This is useful for forensic | |
| 2112 | +purposes or for brute-force recovery of files with unknown passwords. | |
| 2113 | + | |
| 2114 | +.. _ref.qdf: | |
| 2115 | + | |
| 2116 | +QDF Mode | |
| 2117 | +======== | |
| 2118 | + | |
| 2119 | +In QDF mode, qpdf creates PDF files in what we call @1@firstterm@1@QDF | |
| 2120 | +form@2@firstterm@2@. A PDF file in QDF form, sometimes called a QDF | |
| 2121 | +file, is a completely valid PDF file that has ``%QDF-1.0`` as its third | |
| 2122 | +line (after the PDF header and binary characters) and has certain other | |
| 2123 | +characteristics. The purpose of QDF form is to make it possible to edit | |
| 2124 | +PDF files, with some restrictions, in an ordinary text editor. This can | |
| 2125 | +be very useful for experimenting with different PDF constructs or for | |
| 2126 | +making one-off edits to PDF files (though there are other reasons why | |
| 2127 | +this may not always work). Note that QDF mode does not support | |
| 2128 | +linearized files. If you enable linearization, QDF mode is automatically | |
| 2129 | +disabled. | |
| 2130 | + | |
| 2131 | +It is ordinarily very difficult to edit PDF files in a text editor for | |
| 2132 | +two reasons: most meaningful data in PDF files is compressed, and PDF | |
| 2133 | +files are full of offset and length information that makes it hard to | |
| 2134 | +add or remove data. A QDF file is organized in a manner such that, if | |
| 2135 | +edits are kept within certain constraints, the | |
| 2136 | +@1@command@1@fix-qdf@2@command@2@ program, distributed with qpdf, is | |
| 2137 | +able to restore edited files to a correct state. The | |
| 2138 | +@1@command@1@fix-qdf@2@command@2@ program takes no command-line | |
| 2139 | +arguments. It reads a possibly edited QDF file from standard input and | |
| 2140 | +writes a repaired file to standard output. | |
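
Because the program is described as a plain filter from standard input to standard output, it is easy to drive from a script. The following is a minimal Python sketch under that assumption. The helper name `run_filter` is hypothetical, and the command is passed in by the caller so that nothing about the program beyond its stdin/stdout behavior is assumed.

```python
import subprocess


def run_filter(command, data):
    """Feed `data` (bytes) to `command` on stdin and return its stdout.

    Raises subprocess.CalledProcessError if the command exits with a
    nonzero status.
    """
    result = subprocess.run(
        command, input=data, stdout=subprocess.PIPE, check=True
    )
    return result.stdout


# Intended use, assuming fix-qdf is on the PATH and edited.pdf is a QDF
# file that has been edited by hand:
#
#     with open("edited.pdf", "rb") as f:
#         repaired = run_filter(["fix-qdf"], f.read())
#     with open("repaired.pdf", "wb") as f:
#         f.write(repaired)
```
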
| 2141 | + | |
| 2142 | +The following attributes characterize a QDF file: | |
| 2143 | + | |
| 2144 | +- All objects appear in numerical order in the PDF file, including when | |
| 2145 | + objects appear in object streams. | |
| 2146 | + | |
| 2147 | +- Objects are printed in an easy-to-read format, and all line endings | |
| 2148 | + are normalized to UNIX line endings. | |
| 2149 | + | |
| 2150 | +- Unless specifically overridden, streams appear uncompressed (when | |
| 2151 | + qpdf supports the filters and they are compressed with a non-lossy | |
| 2152 | + compression scheme), and most content streams are normalized (line | |
| 2153 | + endings are converted to just UNIX-style linefeeds). | |
| 2154 | + | |
| 2155 | +- All stream lengths are represented as indirect objects, and the | |
| 2156 | + stream length object is always the next object after the stream. If | |
| 2157 | + the stream data does not end with a newline, an extra newline is | |
| 2158 | + inserted, and a special comment appears after the stream indicating | |
| 2159 | + that this has been done. | |
| 2160 | + | |
| 2161 | +- If the PDF file contains object streams, if object stream *n* | |
| 2162 | + contains *k* objects, those objects are numbered from *n+1* through | |
| 2163 | + *n+k*, and the object number/offset pairs appear on a separate line | |
| 2164 | + for each object. Additionally, each object in the object stream is | |
| 2165 | + preceded by a comment indicating its object number and index. This | |
| 2166 | + makes it very easy to find objects in object streams. | |
| 2167 | + | |
| 2168 | +- All beginnings of objects, ``stream`` tokens, ``endstream`` tokens, | |
| 2169 | + and ``endobj`` tokens appear on lines by themselves. A blank line | |
| 2170 | + follows every ``endobj`` token. | |
| 2171 | + | |
| 2172 | +- If there is a cross-reference stream, it is unfiltered. | |
| 2173 | + | |
| 2174 | +- Page dictionaries and page content streams are marked with special | |
| 2175 | + comments that make them easy to find. | |
| 2176 | + | |
| 2177 | +- Comments precede each object indicating the object number of the | |
| 2178 | + corresponding object in the original file. | |
| 2179 | + | |
| 2180 | +When editing a QDF file, any edits can be made as long as the above | |
| 2181 | +constraints are maintained. This means that you can freely edit a page's | |
| 2182 | +content without worrying about messing up the QDF file. It is also | |
| 2183 | +possible to add new objects so long as those objects are added after the | |
| 2184 | +last object in the file or subsequent objects are renumbered. If a QDF | |
| 2185 | +file has object streams in it, you can always add the new objects before | |
| 2186 | +the xref stream and then change the number of the xref stream, since | |
| 2187 | +nothing generally ever references it by number. | |
| 2188 | + | |
| 2189 | +It is not generally practical to remove objects from QDF files without | |
| 2190 | +messing up object numbering, but if you remove all references to an | |
| 2191 | +object, you can run qpdf on the file (after running | |
| 2192 | +@1@command@1@fix-qdf@2@command@2@), and qpdf will omit the now-orphaned | |
| 2193 | +object. | |
| 2194 | + | |
| 2195 | +When @1@command@1@fix-qdf@2@command@2@ is run, it goes through the file | |
| 2196 | +and recomputes the following parts of the file: | |
| 2197 | + | |
| 2198 | +- the ``/N``, ``/W``, and ``/First`` keys of all object stream | |
| 2199 | + dictionaries | |
| 2200 | + | |
| 2201 | +- the pairs of numbers representing object numbers and offsets of | |
| 2202 | + objects in object streams | |
| 2203 | + | |
| 2204 | +- all stream lengths | |
| 2205 | + | |
| 2206 | +- the cross-reference table or cross-reference stream | |
| 2207 | + | |
| 2208 | +- the offset to the cross-reference table or cross-reference stream | |
| 2209 | + following the ``startxref`` token | |
| 2210 | + | |
| 2211 | +.. _ref.using-library: | |
| 2212 | + | |
| 2213 | +Using the QPDF Library | |
| 2214 | +====================== | |
| 2215 | + | |
| 2216 | +.. _ref.using.from-cxx: | |
| 2217 | + | |
| 2218 | +Using QPDF from C++ | |
| 2219 | +------------------- | |
| 2220 | + | |
| 2221 | +The source tree for the qpdf package has an | |
| 2222 | +@1@filename@1@examples@2@filename@2@ directory that contains a few | |
| 2223 | +example programs. The @1@filename@1@qpdf/qpdf.cc@2@filename@2@ source | |
| 2224 | +file also serves as a useful example since it exercises almost all of | |
| 2225 | +the qpdf library's public interface. The best source of documentation on | |
| 2226 | +the library itself is reading comments in | |
| 2227 | +@1@filename@1@include/qpdf/QPDF.hh@2@filename@2@, | |
| 2228 | +@1@filename@1@include/qpdf/QPDFWriter.hh@2@filename@2@, and | |
| 2229 | +@1@filename@1@include/qpdf/QPDFObjectHandle.hh@2@filename@2@. | |
| 2230 | + | |
| 2231 | +All header files are installed in the | |
| 2232 | +@1@filename@1@include/qpdf@2@filename@2@ directory. It is recommended | |
| 2233 | +that you use | |
| 2234 | +``#include <qpdf/QPDF.hh>`` rather than adding | |
| 2235 | +@1@filename@1@include/qpdf@2@filename@2@ to your include path. | |
| 2236 | + | |
| 2237 | +When linking against the qpdf static library, you may also need to | |
| 2238 | +specify ``-lz -ljpeg`` on your link command. If your system understands | |
| 2239 | +how to read libtool @1@filename@1@.la@2@filename@2@ files, this may not | |
| 2240 | +be necessary. | |
| 2241 | + | |
| 2242 | +The qpdf library is safe to use in a multithreaded program, but no | |
| 2243 | +individual ``QPDF`` object instance (including ``QPDF``, | |
| 2244 | +``QPDFObjectHandle``, or ``QPDFWriter``) can be used in more than one | |
| 2245 | +thread at a time. Multiple threads may simultaneously work with | |
| 2246 | +different instances of these and all other QPDF objects. | |
| 2247 | + | |
| 2248 | +.. _ref.using.other-languages: | |
| 2249 | + | |
| 2250 | +Using QPDF from other languages | |
| 2251 | +------------------------------- | |
| 2252 | + | |
| 2253 | +The qpdf library is implemented in C++, which makes it hard to use | |
| 2254 | +directly in other languages. There are a few things that can help. | |
| 2255 | + | |
| 2256 | +"C" | |
| 2257 | + The qpdf library includes a "C" language interface that provides a | |
| 2258 | + subset of the overall capabilities. The header file | |
| 2259 | + @1@filename@1@qpdf/qpdf-c.h@2@filename@2@ includes information about | |
| 2260 | + its use. As long as you use a C++ linker, you can link C programs | |
| 2261 | + with qpdf and use the C API. For languages that can directly load | |
| 2262 | + methods from a shared library, the C API can also be useful. People | |
| 2263 | + have reported success using the C API from other languages on Windows | |
| 2264 | + by directly calling functions in the DLL. | |
| 2265 | + | |
| 2266 | +Python | |
| 2267 | + A Python module called | |
| 2268 | + `pikepdf <https://pypi.org/project/pikepdf/>`__ provides a clean and | |
| 2269 | + highly functional set of Python bindings to the qpdf library. Using | |
| 2270 | + pikepdf, you can work with PDF files in a natural way and combine | |
| 2271 | + qpdf's capabilities with other functionality provided by Python's | |
| 2272 | + rich standard library and available modules. | |
| 2273 | + | |
| 2274 | +Other Languages | |
| 2275 | + Starting with version 8.3.0, the @1@command@1@qpdf@2@command@2@ | |
| 2276 | + command-line tool can produce a JSON representation of the PDF file's | |
| 2277 | + non-content data. This can facilitate interacting programmatically | |
| 2278 | + with PDF files through qpdf's command line interface. For more | |
| 2279 | + information, please see `QPDF JSON <#ref.json>`__. | |
| 2280 | + | |
| 2281 | +.. _ref.unicode-files: | |
| 2282 | + | |
| 2283 | +A Note About Unicode File Names | |
| 2284 | +------------------------------- | |
| 2285 | + | |
| 2286 | +When strings are passed to qpdf library routines either as ``char*`` or | |
| 2287 | +as ``std::string``, they are treated as byte arrays except where | |
| 2288 | +otherwise noted. When Unicode is desired, qpdf wants UTF-8 unless | |
| 2289 | +otherwise noted in comments in header files. In modern UNIX/Linux | |
| 2290 | +environments, this generally does the right thing. In Windows, it's a | |
| 2291 | +bit more complicated. Starting in qpdf 8.4.0, passwords that contain | |
| 2292 | +Unicode characters are handled much better, and starting in qpdf 8.4.1, | |
| 2293 | +the library attempts to properly handle Unicode characters in filenames. | |
| 2294 | +In particular, in Windows, if a UTF-8 encoded string is used as a | |
| 2295 | +filename in either ``QPDF`` or ``QPDFWriter``, it is internally | |
| 2296 | +converted to ``wchar_t*``, and Unicode-aware Windows APIs are used. As | |
| 2297 | +such, qpdf will generally operate properly on files with non-ASCII | |
| 2298 | +characters in their names as long as the filenames are UTF-8 encoded for | |
| 2299 | +passing into the qpdf library API, but there are still some rough edges, | |
| 2300 | +such as the encoding of the filenames in error messages or CLI output | |
| 2301 | +messages. Patches or bug reports are welcome for any continuing issues | |
| 2302 | +with Unicode file names in Windows. | |
| 2303 | + | |
| 2304 | +.. _ref.weak-crypto: | |
| 2305 | + | |
| 2306 | +Weak Cryptography | |
| 2307 | +================= | |
| 2308 | + | |
| 2309 | +Starting with version 10.4, qpdf is taking steps to reduce the likelihood | |
| 2310 | +of a user *accidentally* creating PDF files with insecure cryptography | |
| 2311 | +but will continue to allow creation of such files indefinitely with | |
| 2312 | +explicit acknowledgment. | |
| 2313 | + | |
| 2314 | +The PDF file format makes use of RC4, which is known to be a weak | |
| 2315 | +cryptography algorithm, and MD5, which is a weak hashing algorithm. In | |
| 2316 | +version 10.4, qpdf generates warnings for some (but not all) cases of | |
| 2317 | +writing files with weak cryptography when invoked from the command-line. | |
| 2318 | +These warnings can be suppressed using the | |
| 2319 | +@1@option@1@--allow-weak-crypto@2@option@2@ option. | |
| 2320 | + | |
| 2321 | +It is planned for qpdf version 11 to be stricter, making it an error to | |
| 2322 | +write files with insecure cryptography from the command-line tool in | |
| 2323 | +most cases without specifying the | |
| 2324 | +@1@option@1@--allow-weak-crypto@2@option@2@ flag and also to require | |
| 2325 | +explicit steps when using the C++ library to enable use of insecure | |
| 2326 | +cryptography. | |
| 2327 | + | |
| 2328 | +Note that qpdf must always retain support for weak cryptographic | |
| 2329 | +algorithms since this is required for reading older PDF files that use | |
| 2330 | +them. Additionally, qpdf will always retain the ability to create files | |
| 2331 | +using weak cryptographic algorithms since, as a development tool, qpdf | |
| 2332 | +explicitly supports creating older or deprecated types of PDF files | |
| 2333 | +since these are sometimes needed to test or work with older versions of | |
| 2334 | +software. Even if other cryptography libraries drop support for RC4 or | |
| 2335 | +MD5, qpdf can always fall back to its internal implementations of those | |
| 2336 | +algorithms, so they are not going to disappear from qpdf. | |
| 2337 | + | |
| 2338 | +.. _ref.json: | |
| 2339 | + | |
| 2340 | +QPDF JSON | |
| 2341 | +========= | |
| 2342 | + | |
| 2343 | +.. _ref.json-overview: | |
| 2344 | + | |
| 2345 | +Overview | |
| 2346 | +-------- | |
| 2347 | + | |
| 2348 | +Beginning with qpdf version 8.3.0, the @1@command@1@qpdf@2@command@2@ | |
| 2349 | +command-line program can produce a JSON representation of the | |
| 2350 | +non-content data in a PDF file. It includes a dump in JSON format of all | |
| 2351 | +objects in the PDF file excluding the content of streams. This JSON | |
| 2352 | +representation makes it very easy to look in detail at the structure of | |
| 2353 | +a given PDF file, and it also provides a great way to work with PDF | |
| 2354 | +files programmatically from the command-line in languages that can't | |
| 2355 | +call or link with the qpdf library directly. Note that stream data can | |
| 2356 | +be extracted from PDF files using other qpdf command-line options. | |
| 2357 | + | |
| 2358 | +.. _ref.json-guarantees: | |
| 2359 | + | |
| 2360 | +JSON Guarantees | |
| 2361 | +--------------- | |
| 2362 | + | |
| 2363 | +The qpdf JSON representation includes a JSON serialization of the raw | |
| 2364 | +objects in the PDF file as well as some computed information in a more | |
| 2365 | +easily extracted format. QPDF provides some guarantees about its JSON | |
| 2366 | +format. These guarantees are designed to simplify the experience of a | |
| 2367 | +developer working with the JSON format. | |
| 2368 | + | |
| 2369 | +Compatibility | |
| 2370 | + The top-level JSON object output is a dictionary. The JSON output | |
| 2371 | + contains various nested dictionaries and arrays. With the exception | |
| 2372 | + of dictionaries that are populated by the fields of objects from the | |
| 2373 | + file, all instances of a dictionary are guaranteed to have exactly | |
| 2374 | + the same keys. Future versions of qpdf are free to add additional | |
| 2375 | + keys but not to remove keys or change the type of object that a key | |
| 2376 | + points to. The qpdf program validates this guarantee, and in the | |
| 2377 | + unlikely event that a bug in qpdf should cause it to generate data | |
| 2378 | + that doesn't conform to this rule, it will ask you to file a bug | |
| 2379 | + report. | |
| 2380 | + | |
| 2381 | + The top-level JSON structure contains a "``version``" key whose value | |
| 2382 | + is a simple integer. The value of the ``version`` key will be | |
| 2383 | + incremented if a non-compatible change is made. A non-compatible | |
| 2384 | + change would be any change that involves removal of a key, a change | |
| 2385 | + to the format of data pointed to by a key, or a semantic change that | |
| 2386 | + requires a different interpretation of a previously existing key. A | |
| 2387 | + strong effort will be made to avoid breaking compatibility. | |
| 2388 | + | |
| 2389 | +Documentation | |
| 2390 | + The @1@command@1@qpdf@2@command@2@ command can be invoked with the | |
| 2391 | + @1@option@1@--json-help@2@option@2@ option. This will output a JSON | |
| 2392 | + structure that has the same structure as the JSON output that qpdf | |
| 2393 | + generates, except that each field in the help output is a description | |
| 2394 | + of the corresponding field in the JSON output. The specific | |
| 2395 | + guarantees are as follows: | |
| 2396 | + | |
| 2397 | + - A dictionary in the help output means that the corresponding | |
| 2398 | + location in the actual JSON output is also a dictionary with | |
| 2399 | + exactly the same keys; that is, no keys present in help are absent | |
| 2400 | + in the real output, and no keys will be present in the real output | |
| 2401 | + that are not in help. As a special case, if the dictionary has a | |
| 2402 | + single key whose name starts with ``<`` and ends with ``>``, it | |
| 2403 | + means that the JSON output is a dictionary that can have any keys, | |
| 2404 | + each of which conforms to the value of the special key. This is | |
| 2405 | + used for cases in which the keys of the dictionary are things like | |
| 2406 | + object IDs. | |
| 2407 | + | |
| 2408 | + - A string in the help output is a description of the item that | |
| 2409 | + appears in the corresponding location of the actual output. The | |
| 2410 | + corresponding output can have any format. | |
| 2411 | + | |
| 2412 | + - An array in the help output always contains a single element. It | |
| 2413 | + indicates that the corresponding location in the actual output is | |
| 2414 | + also an array, and that each element of the array has whatever | |
| 2415 | + format is implied by the single element of the help output's | |
| 2416 | + array. | |
| 2417 | + | |
| 2418 | + For example, the help output includes a "``pagelabels``" | |
| 2419 | + key whose value is an array of one element. That element is a | |
| 2420 | + dictionary with keys "``index``" and "``label``". In addition to | |
| 2421 | + describing the meaning of those keys, this tells you that the actual | |
| 2422 | + JSON output will contain a ``pagelabels`` array, each of whose | |
| 2423 | + elements is a dictionary that contains an ``index`` key, a ``label`` | |
| 2424 | + key, and no other keys. | |
| 2425 | + | |
| 2426 | +Directness and Simplicity | |
| 2427 | + The JSON output contains the value of every object in the file, but | |
| 2428 | + it also contains some processed data. This is analogous to how qpdf's | |
| 2429 | + library interface works. The processed data is similar to the helper | |
| 2430 | + functions in that it allows you to look at certain aspects of the PDF | |
| 2431 | + file without having to understand all the nuances of the PDF | |
| 2432 | + specification, while the raw objects allow you to mine the PDF for | |
| 2433 | + anything that the higher-level interfaces are lacking. | |
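
The three rules listed under Documentation are mechanical enough to check in a few lines. The following Python sketch is one possible reading of them, not code from qpdf: the function name `conforms` is hypothetical, and the sketch does not account for null values or for keys excluded from the output.

```python
def conforms(help_node, actual):
    """Return True if `actual` has the shape described by `help_node`."""
    if isinstance(help_node, dict):
        if not isinstance(actual, dict):
            return False
        keys = list(help_node)
        # Special case: a single key of the form "<...>" stands for
        # arbitrary keys, each of whose values follows the same description.
        if len(keys) == 1 and keys[0].startswith("<") and keys[0].endswith(">"):
            return all(conforms(help_node[keys[0]], v) for v in actual.values())
        # Otherwise the key sets must match exactly.
        return set(actual) == set(keys) and all(
            conforms(help_node[k], actual[k]) for k in keys
        )
    if isinstance(help_node, list):
        # The help array has one element describing every output element.
        return isinstance(actual, list) and all(
            conforms(help_node[0], item) for item in actual
        )
    # A string in the help output is a description; any value is allowed.
    return True
```

Applied to the ``pagelabels`` example, a help structure of `{"pagelabels": [{"index": "...", "label": "..."}]}` accepts any output whose ``pagelabels`` array holds dictionaries with exactly the keys ``index`` and ``label``.
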
| 2434 | + | |
| 2435 | +.. _json.limitations: | |
| 2436 | + | |
| 2437 | +Limitations of JSON Representation | |
| 2438 | +---------------------------------- | |
| 2439 | + | |
| 2440 | +There are a few limitations to be aware of with the JSON structure: | |
| 2441 | + | |
| 2442 | +- Strings, names, and indirect object references in the original PDF | |
| 2443 | + file are all converted to strings in the JSON representation. In the | |
| 2444 | + case of a "normal" PDF file, you can tell the difference because a | |
| 2445 | + name starts with a slash (``/``), and an indirect object reference | |
| 2446 | + looks like ``n n R``, but if there were to be a string that looked | |
| 2447 | + like a name or indirect object reference, there would be no way to | |
| 2448 | + tell this from the JSON output. Note that there are certain cases | |
| 2449 | + where you know for sure what something is, such as knowing that | |
| 2450 | + dictionary keys in objects are always names and that certain things | |
| 2451 | + in the higher-level computed data are known to contain indirect | |
| 2452 | + object references. | |
| 2453 | + | |
| 2454 | +- The JSON format doesn't support binary data very well. Mostly the | |
| 2455 | + details are not important, but they are presented here for | |
| 2456 | + information. When qpdf outputs a string in the JSON representation, | |
| 2457 | + it converts the string to UTF-8, assuming usual PDF string semantics. | |
| 2458 | + Specifically, if the original string is UTF-16, it is converted to | |
| 2459 | + UTF-8. Otherwise, it is assumed to have PDF doc encoding, and is | |
| 2460 | + converted to UTF-8 with that assumption. This causes strange things | |
| 2461 | + to happen to binary strings. For example, if you had the binary | |
| 2462 | + string ``<038051>``, this would be output to the JSON as ``\u0003•Q`` | |
| 2463 | + because ``03`` is not a printable character and ``80`` is the bullet | |
| 2464 | + character in PDF doc encoding and is mapped to the Unicode value | |
| 2465 | + ``2022``. Since ``51`` is ``Q``, it is output as is. If you wanted to | |
| 2466 | + convert back from here to a binary string, you would have to recognize | |
| 2467 | + Unicode values whose code points are higher than ``0xFF`` and map | |
| 2468 | + those back to their corresponding PDF doc encoding characters. There | |
| 2469 | + is no way to tell the difference between a Unicode string that was | |
| 2470 | + originally encoded as UTF-16 and one that was converted from PDF doc | |
| 2471 | + encoding. In other words, it's best if you don't try to use the JSON | |
| 2472 | + format to extract binary strings from the PDF file, but if you really | |
| 2473 | + had to, it could be done. Note that qpdf's | |
| 2474 | + @1@option@1@--show-object@2@option@2@ option does not have this | |
| 2475 | + limitation and will reveal the string as encoded in the original | |
| 2476 | + file. | |
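
The ``<038051>`` example can be reproduced with a short Python sketch. Only the single mapping named in the text (byte ``80`` to the bullet, Unicode ``2022``) is included. Treating every other byte as its own code point is a simplifying assumption, and the function names are hypothetical.

```python
import json

# Partial table: only the mapping mentioned in the text. A complete
# PDF doc encoding table has more entries.
PDF_DOC_TO_UNICODE = {0x80: "\u2022"}  # bullet
UNICODE_TO_PDF_DOC = {ch: byte for byte, ch in PDF_DOC_TO_UNICODE.items()}


def pdf_doc_to_text(raw):
    """Decode bytes, assuming PDF doc encoding, into a Python string."""
    return "".join(PDF_DOC_TO_UNICODE.get(b, chr(b)) for b in raw)


def text_to_pdf_doc(text):
    """Reverse the mapping; raises ValueError for unmapped code points."""
    return bytes(
        UNICODE_TO_PDF_DOC[ch] if ch in UNICODE_TO_PDF_DOC else ord(ch)
        for ch in text
    )


binary = bytes.fromhex("038051")
as_json = json.dumps(pdf_doc_to_text(binary), ensure_ascii=False)
round_trip = text_to_pdf_doc(json.loads(as_json))
```

The round trip succeeds here only because the sketch assumes the string came from PDF doc encoding. As noted above, the JSON alone does not say whether that assumption is right.
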
| 2477 | + | |
| 2478 | +.. _json.considerations: | |
| 2479 | + | |
| 2480 | +JSON: Special Considerations | |
| 2481 | +---------------------------- | |
| 2482 | + | |
| 2483 | +For the most part, the built-in JSON help tells you everything you need | |
| 2484 | +to know about the JSON format, but there are a few non-obvious things to | |
| 2485 | +be aware of: | |
| 2486 | + | |
| 2487 | +- While qpdf guarantees that keys present in the help will be present | |
| 2488 | + in the output, those fields may be null or empty if the information | |
| 2489 | + is not known or absent in the file. Also, if you specify | |
| 2490 | + @1@option@1@--json-keys@2@option@2@, the keys that are not listed | |
| 2491 | + will be excluded entirely except for those that | |
| 2492 | + @1@option@1@--json-help@2@option@2@ says are always present. | |
| 2493 | + | |
| 2494 | +- In a few places, there are keys with names containing | |
| 2495 | + ``pageposfrom1``. The values of these keys are null or an integer. If | |
| 2496 | + an integer, they point to a page index within the file numbering from | |
| 2497 | + 1. Note that JSON indexes from 0, and you would also use 0-based | |
| 2498 | + indexing using the API. However, 1-based indexing is easier in this | |
| 2499 | + case because the command-line syntax for specifying page ranges is | |
| 2500 | + 1-based. If you were going to write a program that looked through the | |
| 2501 | + JSON for information about specific pages and then use the | |
| 2502 | + command-line to extract those pages, 1-based indexing is easier. | |
| 2503 | + Besides, it's more convenient to subtract 1 in a program written in a | |
| 2504 | + real programming language than it is to add 1 in shell code. | |
| 2505 | + | |
| 2506 | +- The image information included in the ``page`` section of the JSON | |
| 2507 | + output includes the key "``filterable``". Note that the value of this | |
| 2508 | + field may depend on the @1@option@1@--decode-level@2@option@2@ that | |
| 2509 | + you invoke qpdf with. The JSON output includes a top-level key | |
| 2510 | + "``parameters``" that indicates the decode level used for computing | |
| 2511 | + whether a stream was filterable. For example, jpeg images will be | |
| 2512 | + shown as not filterable by default, but they will be shown as | |
| 2513 | + filterable if you run @1@command@1@qpdf --json | |
| 2514 | + --decode-level=all@2@command@2@. | |
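
As an illustration of the ``pageposfrom1`` convention, the following Python sketch collects every such value from parsed JSON and converts it to a 0-based index. The function names and the sample structure used to exercise them are hypothetical. Only the rule that these keys hold null or a 1-based integer comes from the text above.

```python
def find_page_positions(node, found=None):
    """Collect (key, value) pairs for every key containing 'pageposfrom1'."""
    if found is None:
        found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if "pageposfrom1" in key:
                found.append((key, value))
            else:
                find_page_positions(value, found)
    elif isinstance(node, list):
        for item in node:
            find_page_positions(item, found)
    return found


def to_zero_based(position):
    """Convert a 1-based page position to a 0-based index, keeping null."""
    return None if position is None else position - 1


# Hypothetical fragment shaped like nested JSON output.
sample = {
    "items": [
        {"destpageposfrom1": 3, "kids": [{"destpageposfrom1": None, "kids": []}]}
    ]
}
positions = [value for _, value in find_page_positions(sample)]
indexes = [to_zero_based(p) for p in positions]
```

The unconverted values in `positions` are the ones to pass back to the command line, since page ranges there are 1-based.
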
| 2515 | + | |
| 2516 | +.. _ref.design: | |
| 2517 | + | |
| 2518 | +Design and Library Notes | |
| 2519 | +======================== | |
| 2520 | + | |
| 2521 | +.. _ref.design.intro: | |
| 2522 | + | |
| 2523 | +Introduction | |
| 2524 | +------------ | |
| 2525 | + | |
| 2526 | +This section was written prior to the implementation of the qpdf package | |
| 2527 | +and was subsequently modified to reflect the implementation. In some | |
| 2528 | +cases, for purposes of explanation, it may differ slightly from the | |
| 2529 | +actual implementation. As always, the source code and test suite are | |
| 2530 | +authoritative. Even if there are some errors, this document should serve | |
| 2531 | +as a road map to understanding how this code works. | |
| 2532 | + | |
| 2533 | +In general, one should adhere strictly to a specification when writing | |
| 2534 | +but be liberal in reading. This way, the product of our software will be | |
| 2535 | +accepted by the widest range of other programs, and we will accept the | |
| 2536 | +widest range of input files. This library attempts to conform to that | |
| 2537 | +philosophy whenever possible but also aims to provide strict checking | |
| 2538 | +for people who want to validate PDF files. If you don't want to see | |
| 2539 | +warnings and are trying to write something that is tolerant, you can | |
| 2540 | +call ``setSuppressWarnings(true)``. If you want to fail on the first | |
| 2541 | +error, you can call ``setAttemptRecovery(false)``. The default behavior | |
| 2542 | +is to generate warnings for recoverable problems. Note that recovery | |
| 2543 | +will not always produce the desired results even if it is able to get | |
| 2544 | +through the file. Unlike most other PDF software, which produces generic | |
| 2545 | +warnings such as "This file is damaged," qpdf generally issues a | |
| 2546 | +detailed error message that would be most useful to a PDF developer. | |
| 2547 | +This is by design as there seems to be a shortage of PDF validation | |
| 2548 | +tools out there. This was, in fact, one of the major motivations behind | |
| 2549 | +the initial creation of qpdf. | |
| 2550 | + | |
| 2551 | +.. _ref.design-goals: | |
| 2552 | + | |
| 2553 | +Design Goals | |
| 2554 | +------------ | |
| 2555 | + | |
| 2556 | +The QPDF package includes support for reading and rewriting PDF files. | |
| 2557 | +It aims to hide from the user details involving object locations, | |
| 2558 | +modified (appended) PDF files, the directness/indirectness of objects, | |
| 2559 | +and stream filters including encryption. It does not aim to hide | |
| 2560 | +knowledge of the object hierarchy or content stream contents. Put | |
| 2561 | +another way, a user of the qpdf library is expected to have knowledge | |
| 2562 | +about how PDF files work, but is not expected to have to keep track of | |
| 2563 | +bookkeeping details such as file positions. | |
| 2564 | + | |
| 2565 | +A user of the library never has to care whether an object is direct or | |
| 2566 | +indirect, though it is possible to determine whether an object is direct | |
| 2567 | +or not if this information is needed. All access to objects deals with | |
| 2568 | +this transparently. All memory management details are also handled by | |
| 2569 | +the library. | |
| 2570 | + | |
| 2571 | +The ``PointerHolder`` object is used internally by the library to deal | |
| 2572 | +with memory management. This is basically a smart pointer object very | |
| 2573 | +similar in spirit to C++11's ``std::shared_ptr`` object, but predating | |
| 2574 | +it by several years. This library also makes use of a technique for | |
| 2575 | +giving fine-grained access to methods in one class to other classes by | |
| 2576 | +using public subclasses with friends and only private members that in | |
| 2577 | +turn call private methods of the containing class. See | |
| 2578 | +``QPDFObjectHandle::Factory`` as an example. | |
| 2579 | + | |
| 2580 | +The top-level qpdf class is ``QPDF``. A ``QPDF`` object represents a PDF | |
| 2581 | +file. The library provides methods for both accessing and mutating PDF | |
| 2582 | +files. | |
| 2583 | + | |
| 2584 | +The primary class for interacting with PDF objects is | |
| 2585 | +``QPDFObjectHandle``. Instances of this class can be passed around by | |
| 2586 | +value, copied, stored in containers, etc. with very low overhead. | |
| 2587 | +Instances of ``QPDFObjectHandle`` created by reading from a file will | |
| 2588 | +always contain a reference back to the ``QPDF`` object from which they | |
| 2589 | +were created. A ``QPDFObjectHandle`` may be direct or indirect. If | |
| 2590 | +indirect, the ``QPDFObject`` the ``PointerHolder`` initially points to | |
| 2591 | +is a null pointer. In this case, the first attempt to access the | |
| 2592 | +underlying ``QPDFObject`` will result in the ``QPDFObject`` being | |
| 2593 | +resolved via a call to the referenced ``QPDF`` instance. This makes it | |
| 2594 | +essentially impossible to make coding errors in which certain things | |
| 2595 | +will work for some PDF files and not for others based on which objects | |
| 2596 | +are direct and which objects are indirect. | |
| 2597 | + | |
| 2598 | +Instances of ``QPDFObjectHandle`` can be directly created and modified | |
| 2599 | +using static factory methods in the ``QPDFObjectHandle`` class. There | |
| 2600 | +are factory methods for each type of object as well as a convenience | |
| 2601 | +method ``QPDFObjectHandle::parse`` that creates an object from a string | |
| 2602 | +representation of the object. Existing instances of ``QPDFObjectHandle`` | |
| 2603 | +can also be modified in several ways. See comments in | |
| 2604 | +``QPDFObjectHandle.hh`` for details. | |
| 2605 | + | |
| 2606 | +An instance of ``QPDF`` is constructed by using the class's default | |
| 2607 | +constructor. If desired, the ``QPDF`` object may be configured with | |
| 2608 | +various methods that change its default behavior. Then the | |
| 2609 | +``QPDF::processFile()`` method is passed the name of a PDF file, which | |
| 2610 | +permanently associates the file with that QPDF object. A password may | |
| 2611 | +also be given for access to password-protected files. QPDF does not | |
| 2612 | +enforce encryption parameters and will treat user and owner passwords | |
| 2613 | +equivalently. Either password may be used to access an encrypted file. | |
| 2614 | +``QPDF`` will allow recovery of a user password given an owner password. | |
| 2615 | +The input PDF file must be seekable. (Output files written by | |
| 2616 | +``QPDFWriter`` need not be seekable, even when creating linearized | |
| 2617 | +files.) During construction, ``QPDF`` validates the PDF file's header, | |
| 2618 | +and then reads the cross reference tables and trailer dictionaries. The | |
| 2619 | +``QPDF`` class keeps only the first trailer dictionary though it does | |
| 2620 | +read all of them so it can check the ``/Prev`` key. ``QPDF`` class users | |
| 2621 | +may request the root object and the trailer dictionary specifically. The | |
| 2622 | +cross reference table is kept private. Objects may then be requested by | |
| 2623 | +number or by walking the object tree. | |
| 2624 | + | |
| 2625 | +When a PDF file has a cross-reference stream instead of a | |
| 2626 | +cross-reference table and trailer, requesting the document's trailer | |
| 2627 | +dictionary returns the stream dictionary from the cross-reference stream | |
| 2628 | +instead. | |
| 2629 | + | |
| 2630 | +There are some convenience routines for very common operations such as | |
| 2631 | +walking the page tree and returning a vector of all page objects. For | |
| 2632 | +full details, please see the header files | |
| 2633 | +``QPDF.hh`` and | |
| 2634 | +``QPDFObjectHandle.hh``. There are also some | |
| 2635 | +additional helper classes that provide higher level API functions for | |
| 2636 | +certain document constructions. These are discussed in `Helper | |
| 2637 | +Classes <#ref.helper-classes>`__. | |
| 2638 | + | |
| 2639 | +.. _ref.helper-classes: | |
| 2640 | + | |
| 2641 | +Helper Classes | |
| 2642 | +-------------- | |
| 2643 | + | |
| 2644 | +QPDF version 8.1 introduced the concept of helper classes. Helper | |
| 2645 | +classes are intended to contain higher level APIs that allow developers | |
| 2646 | +to work with certain document constructs at an abstraction level above | |
| 2647 | +that of ``QPDFObjectHandle`` while staying true to qpdf's philosophy of | |
| 2648 | +not hiding document structure from the developer. As with qpdf in | |
| 2649 | +general, the goal is to take away some of the more tedious bookkeeping | |
| 2650 | +aspects of working with PDF files, not to remove the need for the | |
| 2651 | +developer to understand how the PDF construction in question works. The | |
| 2652 | +driving factor behind the creation of helper classes was to allow the | |
| 2653 | +evolution of higher level interfaces in qpdf without polluting the | |
| 2654 | +interfaces of the main top-level classes ``QPDF`` and | |
| 2655 | +``QPDFObjectHandle``. | |
| 2656 | + | |
| 2657 | +There are two kinds of helper classes: *document* helpers and *object* | |
| 2658 | +helpers. Document helpers are constructed with a reference to a ``QPDF`` | |
| 2659 | +object and provide methods for working with structures that are at the | |
| 2660 | +document level. Object helpers are constructed with an instance of a | |
| 2661 | +``QPDFObjectHandle`` and provide methods for working with specific types | |
| 2662 | +of objects. | |
| 2663 | + | |
| 2664 | +Examples of document helpers include ``QPDFPageDocumentHelper``, which | |
| 2665 | +contains methods for operating on the document's page trees, such as | |
| 2666 | +enumerating all pages of a document and adding and removing pages; and | |
| 2667 | +``QPDFAcroFormDocumentHelper``, which contains document-level methods | |
| 2668 | +related to interactive forms, such as enumerating form fields and | |
| 2669 | +creating mappings between form fields and annotations. | |
| 2670 | + | |
| 2671 | +Examples of object helpers include ``QPDFPageObjectHelper`` for | |
| 2672 | +performing operations on pages such as page rotation and some operations | |
| 2673 | +on content streams, ``QPDFFormFieldObjectHelper`` for performing | |
| 2674 | +operations related to interactive form fields, and | |
| 2675 | +``QPDFAnnotationObjectHelper`` for working with annotations. | |
| 2676 | + | |
| 2677 | +It is always possible to retrieve the underlying ``QPDF`` reference from | |
| 2678 | +a document helper and the underlying ``QPDFObjectHandle`` reference from | |
| 2679 | +an object helper. Helpers are designed to be helpers, not wrappers. The | |
| 2680 | +intention is that, in general, it is safe to freely intermix operations | |
| 2681 | +that use helpers with operations that use the underlying objects. | |
| 2682 | +Document and object helpers do not attempt to provide a complete | |
| 2683 | +interface for working with the things they are helping with, nor do they | |
| 2684 | +attempt to encapsulate underlying structures. They just provide a few | |
| 2685 | +methods to help with error-prone, repetitive, or complex tasks. In some | |
| 2686 | +cases, a helper object may cache some information that is expensive to | |
| 2687 | +gather. In such cases, the helper classes are implemented so that their | |
| 2688 | +own methods keep the cache consistent, and the header file will provide | |
| 2689 | +a method to invalidate the cache and a description of what kinds of | |
| 2690 | +operations would make the cache invalid. If in doubt, you can always | |
| 2691 | +discard a helper class and create a new one with the same underlying | |
| 2692 | +objects, which will ensure that you have discarded any stale | |
| 2693 | +information. | |
| 2694 | + | |
| 2695 | +By convention, document helpers are called | |
| 2696 | +``QPDFSomethingDocumentHelper`` and are derived from | |
| 2697 | +``QPDFDocumentHelper``, and object helpers are called | |
| 2698 | +``QPDFSomethingObjectHelper`` and are derived from ``QPDFObjectHelper``. | |
| 2699 | +For details on specific helpers, please see their header files. You can | |
| 2700 | +find them by looking at | |
| 2701 | +``include/qpdf/QPDF*DocumentHelper.hh`` and | |
| 2702 | +``include/qpdf/QPDF*ObjectHelper.hh``. | |
| 2703 | + | |
| 2704 | +In order to avoid creation of circular dependencies, the following | |
| 2705 | +general guidelines are followed with helper classes: | |
| 2706 | + | |
| 2707 | +- Core class interfaces do not know about helper classes. For example, | |
| 2708 | + no methods of ``QPDF`` or ``QPDFObjectHandle`` will include helper | |
| 2709 | + classes in their interfaces. | |
| 2710 | + | |
| 2711 | +- Object helpers will usually not use document helpers in their | |
| 2712 | + interfaces. This is because it is much more useful for document | |
| 2713 | + helpers to have methods that return object helpers. Most operations | |
| 2714 | + in PDF files start at the document level and go from there to the | |
| 2715 | + object level rather than the other way around. It can sometimes be | |
| 2716 | + useful to map back from object-level structures to document-level | |
| 2717 | + structures. If there is a desire to do this, it will generally be | |
| 2718 | + provided by a method in the document helper class. | |
| 2719 | + | |
| 2720 | +- Most of the time, object helpers don't know about other object | |
| 2721 | + helpers. However, in some cases, one type of object may be a | |
| 2722 | + container for another type of object, in which case it may make sense | |
| 2723 | + for the outer object to know about the inner object. For example, | |
| 2724 | + there are methods in the ``QPDFPageObjectHelper`` that know | |
| 2725 | + ``QPDFAnnotationObjectHelper`` because references to annotations are | |
| 2726 | + contained in page dictionaries. | |
| 2727 | + | |
| 2728 | +- Any helper or core library class may use helpers in their | |
| 2729 | + implementations. | |
| 2730 | + | |
| 2731 | +Prior to qpdf version 8.1, higher level interfaces were added as | |
| 2732 | +"convenience functions" in either ``QPDF`` or ``QPDFObjectHandle``. For | |
| 2733 | +compatibility, older convenience functions for operating with pages will | |
| 2734 | +remain in those classes even as alternatives are provided in helper | |
| 2735 | +classes. Going forward, new higher level interfaces will be provided | |
| 2736 | +using helper classes. | |
| 2737 | + | |
| 2738 | +.. _ref.implementation-notes: | |
| 2739 | + | |
| 2740 | +Implementation Notes | |
| 2741 | +-------------------- | |
| 2742 | + | |
| 2743 | +This section contains a few notes about QPDF's internal implementation, | |
| 2744 | +particularly around what it does when it first processes a file. This | |
| 2745 | +section is a bit of a simplification of what it actually does, but it | |
| 2746 | +could serve as a starting point to someone trying to understand the | |
| 2747 | +implementation. There is nothing in this section that you need to know | |
| 2748 | +to use the qpdf library. | |
| 2749 | + | |
| 2750 | +``QPDFObject`` is the basic PDF Object class. It is an abstract base | |
| 2751 | +class from which are derived classes for each type of PDF object. | |
| 2752 | +Clients do not interact with Objects directly but instead interact with | |
| 2753 | +``QPDFObjectHandle``. | |
| 2754 | + | |
| 2755 | +When the ``QPDF`` class creates a new object, it dynamically allocates | |
| 2756 | +the appropriate type of ``QPDFObject`` and immediately hands the pointer | |
| 2757 | +to an instance of ``QPDFObjectHandle``. The parser reads a token from | |
| 2758 | +the current file position. If the token is neither a dictionary nor an | |
| 2759 | +array opener, an object is immediately constructed from the single token | |
| 2760 | +and the parser returns. Otherwise, the parser iterates in a special mode | |
| 2761 | +in which it accumulates objects until it finds a balancing closer. | |
| 2762 | +During this process, the "``R``" keyword is recognized and an indirect | |
| 2763 | +``QPDFObjectHandle`` may be constructed. | |
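The accumulation step and the handling of the ``R`` keyword can be sketched as follows. This is a simplified illustration under stated assumptions, not qpdf's parser: tokens are assumed to be whitespace-separated, only the body of an array is handled, and ``Item``, ``is_integer``, and ``parse_array_body`` are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical, simplified object model: an item is an integer, an
// indirect reference (object id + generation), or any other token.
struct Item
{
    enum Kind { Integer, Reference, Other };
    Kind kind;
    int id;           // integer value, or object id for a reference
    int gen;          // generation, used only by references
    std::string text; // raw token for integers and other tokens
};

static bool
is_integer(std::string const& token)
{
    if (token.empty()) {
        return false;
    }
    for (char c : token) {
        if (!std::isdigit(static_cast<unsigned char>(c))) {
            return false;
        }
    }
    return true;
}

// Accumulate the items that follow an already consumed "[" until the
// balancing "]". When "R" is seen and the last two items are integers,
// they are popped and replaced by a single indirect reference.
std::vector<Item>
parse_array_body(std::string const& input)
{
    std::istringstream in(input);
    std::vector<Item> stack;
    std::string token;
    while (in >> token && token != "]") {
        if (is_integer(token)) {
            stack.push_back(Item{Item::Integer, std::stoi(token), 0, token});
        } else if (
            token == "R" && stack.size() >= 2 &&
            stack[stack.size() - 1].kind == Item::Integer &&
            stack[stack.size() - 2].kind == Item::Integer) {
            int gen = stack.back().id;
            stack.pop_back();
            int id = stack.back().id;
            stack.pop_back();
            stack.push_back(Item{Item::Reference, id, gen, ""});
        } else {
            stack.push_back(Item{Item::Other, 0, 0, token});
        }
    }
    return stack;
}
```

For the input ``1 12 0 R /Name ]``, the sketch yields three items: the integer 1, a reference to object 12 generation 0, and the token ``/Name``.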
| 2764 | + | |
| 2765 | +The ``QPDF::resolve()`` method, which is used to resolve an indirect | |
| 2766 | +object, may be invoked from the ``QPDFObjectHandle`` class. It first | |
| 2767 | +checks a cache to see whether this object has already been read. If not, | |
| 2768 | +it reads the object from the PDF file and caches it. It then returns the | |
| 2769 | +resulting ``QPDFObjectHandle``. The calling object handle then replaces | |
| 2770 | +its ``PointerHolder<QPDFObject>`` with the one from the newly returned | |
| 2771 | +``QPDFObjectHandle``. In this way, only a single copy of any direct | |
| 2772 | +object need exist and clients can access objects transparently without | |
| 2773 | +knowing or caring whether they are direct or indirect objects. | |
| 2774 | +Additionally, no object is ever read from the file more than once. That | |
| 2775 | +means that only the portions of the PDF file that are actually needed | |
| 2776 | +are ever read from the input file, thus allowing the qpdf package to | |
| 2777 | +take advantage of this important design goal of PDF files. | |
| 2778 | + | |
| 2779 | +If the requested object is inside of an object stream, the object stream | |
| 2780 | +itself is first read into memory. Then the tokenizer reads objects from | |
| 2781 | +the memory stream based on the offset information stored in the stream. | |
| 2782 | +Those individual objects are cached, after which the temporary buffer | |
| 2783 | +holding the object stream contents is discarded. In this way, the first | |
| 2784 | +time an object in an object stream is requested, all objects in the | |
| 2785 | +stream are cached. | |
| 2786 | + | |
| 2787 | +The following example should clarify how ``QPDF`` processes a simple | |
| 2788 | +file. | |
| 2789 | + | |
| 2790 | +- Client constructs ``QPDF`` ``pdf`` and calls | |
| 2791 | + ``pdf.processFile("a.pdf");``. | |
| 2792 | + | |
| 2793 | +- The ``QPDF`` class checks the beginning of | |
| 2794 | + ``a.pdf`` for a PDF header. It then reads the | |
| 2795 | + cross reference table mentioned at the end of the file, ensuring that | |
| 2796 | + it is looking before the last ``%%EOF``. After reaching the ``trailer`` | |
| 2797 | + keyword, it invokes the parser. | |
| 2798 | + | |
| 2799 | +- The parser sees "``<<``", so it calls itself recursively in | |
| 2800 | + dictionary creation mode. | |
| 2801 | + | |
| 2802 | +- In dictionary creation mode, the parser keeps accumulating objects | |
| 2803 | + until it encounters "``>>``". Each object that is read is pushed onto | |
| 2804 | + a stack. If "``R``" is read, the last two objects on the stack are | |
| 2805 | + inspected. If they are integers, they are popped off the stack and | |
| 2806 | + their values are used to construct an indirect object handle which is | |
| 2807 | + then pushed onto the stack. When "``>>``" is finally read, the stack | |
| 2808 | + is converted into a ``QPDF_Dictionary`` which is placed in a | |
| 2809 | + ``QPDFObjectHandle`` and returned. | |
| 2810 | + | |
| 2811 | +- The resulting dictionary is saved as the trailer dictionary. | |
| 2812 | + | |
| 2813 | +- The ``/Prev`` key is looked up. If present, ``QPDF`` seeks to that | |
| 2814 | + point and repeats except that the new trailer dictionary is not | |
| 2815 | + saved. If ``/Prev`` is not present, the initial parsing process is | |
| 2816 | + complete. | |
| 2817 | + | |
| 2818 | + If there is an encryption dictionary, the document's encryption | |
| 2819 | + parameters are initialized. | |
| 2820 | + | |
| 2821 | +- The client requests the root object. The ``QPDF`` class gets the value of | |
| 2822 | + the ``/Root`` key from the trailer dictionary and returns it. It is an unresolved | |
| 2823 | + indirect ``QPDFObjectHandle``. | |
| 2824 | + | |
| 2825 | +- The client requests the ``/Pages`` key from the root | |
| 2826 | + ``QPDFObjectHandle``. The ``QPDFObjectHandle`` notices that it is | |
| 2827 | + indirect so it asks ``QPDF`` to resolve it. ``QPDF`` looks in the | |
| 2828 | + object cache for an object with the root dictionary's object ID and | |
| 2829 | + generation number. Upon not seeing it, it checks the cross reference | |
| 2830 | + table, gets the offset, and reads the object present at that offset. | |
| 2831 | + It stores the result in the object cache and returns the cached | |
| 2832 | + result. The calling ``QPDFObjectHandle`` replaces its object pointer | |
| 2833 | + with the one from the resolved ``QPDFObjectHandle``, verifies that it | |
| 2834 | + is a valid dictionary object, and returns the (unresolved indirect) | |
| 2835 | + ``QPDFObjectHandle`` to the top of the Pages hierarchy. | |
| 2836 | + | |
| 2837 | + As the client continues to request objects, the same process is | |
| 2838 | + followed for each new requested object. | |
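The resolve-and-cache behavior in the walk-through above can be sketched as follows. This is a minimal sketch, not qpdf's implementation: ``MiniObject``, ``MiniDoc``, and ``MiniHandle`` are hypothetical stand-ins for ``QPDFObject``, ``QPDF``, and ``QPDFObjectHandle``, ``std::shared_ptr`` stands in for ``PointerHolder``, and reading from the file is replaced by building a string.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a parsed PDF object.
struct MiniObject
{
    std::string value;
};

class MiniDoc
{
  public:
    // Return the object with the given id and generation, reading it at
    // most once. A real implementation would look up the offset in the
    // cross reference table and parse the file at that offset.
    std::shared_ptr<MiniObject>
    resolve(int id, int gen)
    {
        auto key = std::make_pair(id, gen);
        auto found = cache_.find(key);
        if (found != cache_.end()) {
            return found->second;
        }
        ++reads_;
        auto obj = std::make_shared<MiniObject>();
        obj->value = "object " + std::to_string(id) + " " + std::to_string(gen);
        cache_[key] = obj;
        return obj;
    }

    // Number of simulated reads from the input file.
    int
    reads() const
    {
        return reads_;
    }

  private:
    std::map<std::pair<int, int>, std::shared_ptr<MiniObject>> cache_;
    int reads_ = 0;
};

class MiniHandle
{
  public:
    MiniHandle(MiniDoc& doc, int id, int gen) :
        doc_(&doc),
        id_(id),
        gen_(gen)
    {
    }

    // The pointer starts out null. The first access resolves it through
    // the owning document; later accesses reuse the stored pointer.
    std::string const&
    value()
    {
        if (!obj_) {
            obj_ = doc_->resolve(id_, gen_);
        }
        return obj_->value;
    }

  private:
    MiniDoc* doc_;
    int id_;
    int gen_;
    std::shared_ptr<MiniObject> obj_;
};
```

Two handles that refer to the same object ID and generation end up sharing one cached object, and the simulated file is read only once.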
| 2839 | + | |
| 2840 | +.. _ref.casting: | |
| 2841 | + | |
| 2842 | +Casting Policy | |
| 2843 | +-------------- | |
| 2844 | + | |
| 2845 | +This section describes the casting policy followed by qpdf's | |
| 2846 | +implementation. This is of no concern to qpdf's end users and largely of no | |
| 2847 | +concern to people writing code that uses qpdf, but it could be of | |
| 2848 | +interest to people who are porting qpdf to a new platform or who are | |
| 2849 | +making modifications to the code. | |
| 2850 | + | |
| 2851 | +The C++ code in qpdf is free of old-style casts except where unavoidable | |
| 2852 | +(e.g. where the old-style cast is in a macro provided by a third-party | |
| 2853 | +header file). When there is a need for a cast, it is handled, in order | |
| 2854 | +of preference, by rewriting the code to avoid the need for a cast, | |
| 2855 | +calling ``const_cast``, calling ``static_cast``, calling | |
| 2856 | +``reinterpret_cast``, or calling some combination of the above. As a | |
| 2857 | +last resort, a compiler-specific ``#pragma`` may be used to suppress a | |
| 2858 | +warning that we don't want to fix. Examples may include suppressing | |
| 2859 | +warnings about the use of old-style casts in code that is shared between | |
| 2860 | +C and C++ code. | |
| 2861 | + | |
| 2862 | +The ``QIntC`` namespace, provided by | |
| 2863 | +``include/qpdf/QIntC.hh``, implements safe | |
| 2864 | +functions for converting between integer types. These functions do range | |
| 2865 | +checking and throw a ``std::range_error``, which is a subclass of | |
| 2866 | +``std::runtime_error``, if conversion from one integer type to another | |
| 2867 | +results in loss of information. There are many cases in which we have to | |
| 2868 | +move between different integer types because of incompatible integer | |
| 2869 | +types used in interoperable interfaces. Some are unavoidable, such as | |
| 2870 | +moving between sizes and offsets, and others are there because of old | |
| 2871 | +code that is too entrenched to be fixable without breaking source | |
| 2872 | +compatibility and causing pain for users. QPDF is compiled with extra | |
| 2873 | +warnings to detect conversions with potential data loss, and all such | |
| 2874 | +cases should be fixed by either using a function from ``QIntC`` or a | |
| 2875 | +``static_cast``. | |
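A range-checked conversion of the kind described above might look like the following sketch. It is written in the spirit of ``QIntC`` but is not its actual code: ``checked_int_cast`` and ``conversion_throws`` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <type_traits>

// Hypothetical helper in the spirit of QIntC (not its actual code):
// convert between integer types, throwing std::range_error when the
// value cannot be represented in the target type.
template <typename To, typename From>
To
checked_int_cast(From value)
{
    static_assert(
        std::is_integral<To>::value && std::is_integral<From>::value,
        "checked_int_cast works on integer types only");
    if (value < 0) {
        // Negative values never fit in an unsigned type. For a signed
        // target, compare against its minimum in the widest signed type.
        if (!std::is_signed<To>::value ||
            static_cast<std::intmax_t>(value) <
                static_cast<std::intmax_t>(std::numeric_limits<To>::min())) {
            throw std::range_error("integer out of range for target type");
        }
    } else if (
        static_cast<std::uintmax_t>(value) >
        static_cast<std::uintmax_t>(std::numeric_limits<To>::max())) {
        // Non-negative values are compared in the widest unsigned type.
        throw std::range_error("integer out of range for target type");
    }
    return static_cast<To>(value);
}

// Small helper for demonstrations: report whether a conversion throws.
template <typename To, typename From>
bool
conversion_throws(From value)
{
    try {
        checked_int_cast<To>(value);
    } catch (std::range_error const&) {
        return true;
    }
    return false;
}
```

With this sketch, converting 255 to ``unsigned char`` succeeds, while converting 256 to ``unsigned char`` or -1 to ``unsigned int`` throws instead of silently wrapping.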
| 2876 | + | |
| 2877 | +When the intention is just to switch the type because of exchanging data | |
| 2878 | +between incompatible interfaces, use ``QIntC``. This is the usual case. | |
| 2879 | +However, there are some cases in which we are explicitly intending to | |
| 2880 | +use the exact same bit pattern with a different type. This is most | |
| 2881 | +common when switching between signed and unsigned characters. A lot of | |
| 2882 | +qpdf's code uses unsigned characters internally, but ``std::string`` | |
| 2883 | +uses ``char``, which is signed on many platforms. Using | |
| 2884 | +``QIntC::to_char`` would be wrong for converting from unsigned to signed | |
| 2885 | +characters because a negative ``char`` value and the corresponding | |
| 2886 | +``unsigned char`` value greater than 127 *mean the same thing*. There are also | |
| 2887 | +cases in which we use ``static_cast`` when working with bit fields where | |
| 2888 | +we are not representing a numerical value but rather a bunch of bits | |
| 2889 | +packed together in some integer type. Also note that ``size_t`` and | |
| 2890 | +``long`` both typically differ between 32-bit and 64-bit environments, | |
| 2891 | +so sometimes an explicit cast may not be needed to avoid warnings on one | |
| 2892 | +platform but may be needed on another. A conversion with ``QIntC`` | |
| 2893 | +should always be used when the types are different even if the | |
| 2894 | +underlying size is the same. QPDF's CI build builds on 32-bit and 64-bit | |
| 2895 | +platforms, and the test suite is very thorough, so it is hard to make | |
| 2896 | +any of the potential errors here without being caught in build or test. | |
| 2897 | + | |
| 2898 | +Non-const ``unsigned char*`` is used in the ``Pipeline`` interface. The | |
| 2899 | +pipeline interface has a ``write`` call that uses ``unsigned char*`` | |
| 2900 | +without a ``const`` qualifier. The main reason for this is | |
| 2901 | +to support pipelines that make calls to third-party libraries, such as | |
| 2902 | +zlib, that don't include ``const`` in their interfaces. Unfortunately, | |
| 2903 | +there are many places in the code where it is desirable to have | |
| 2904 | +``const char*`` with pipelines. None of the pipeline implementations in qpdf | |
| 2905 | +currently modify the data passed to write, and doing so would be counter | |
| 2906 | +to the intent of ``Pipeline``, but there is nothing in the code to | |
| 2907 | +prevent this from being done. There are places in the code where | |
| 2908 | +``const_cast`` is used to remove the const-ness of pointers going into | |
| 2909 | +``Pipeline``\ s. This could theoretically be unsafe, but there is | |
| 2910 | +adequate testing to assert that it is safe and will remain safe in | |
| 2911 | +qpdf's code. | |
| 2912 | + | |
| 2913 | +.. _ref.encryption: | |
| 2914 | + | |
| 2915 | +Encryption | |
| 2916 | +---------- | |
| 2917 | + | |
| 2918 | +Encryption is supported transparently by qpdf. When opening a PDF file, | |
| 2919 | +if an encryption dictionary exists, the ``QPDF`` object processes this | |
| 2920 | +dictionary using the password (if any) provided. The primary decryption | |
| 2921 | +key is computed and cached. No further access is made to the encryption | |
| 2922 | +dictionary after that time. When an object is read from a file, the | |
| 2923 | +object ID and generation of the object in which it is contained are | |
| 2924 | +always known. Using this information along with the stored encryption | |
| 2925 | +key, all stream and string objects are transparently decrypted. Raw | |
| 2926 | +encrypted objects are never stored in memory. This way, nothing in the | |
| 2927 | +library ever has to know or care whether it is reading an encrypted | |
| 2928 | +file. | |
| 2929 | + | |
| 2930 | +An interface is also provided for writing encrypted streams and strings | |
| 2931 | +given an encryption key. This is used by ``QPDFWriter`` when it rewrites | |
| 2932 | +encrypted files. | |
| 2933 | + | |
| 2934 | +When copying encrypted files, unless otherwise directed, qpdf will | |
| 2935 | +preserve any encryption in force in the original file. qpdf can do this | |
| 2936 | +with either the user or the owner password. There is no difference in | |
| 2937 | +capability based on which password is used. When 40-bit or 128-bit | |
| 2938 | +encryption keys are used, the user password can be recovered with the | |
| 2939 | +owner password. With 256-bit keys, the user and owner passwords are used | |
| 2940 | +independently to encrypt the actual encryption key, so while either can | |
| 2941 | +be used, the owner password can no longer be used to recover the user | |
| 2942 | +password. | |
| 2943 | + | |
| 2944 | +Starting with version 4.0.0, qpdf can read files that are not encrypted | |
| 2945 | +but that contain encrypted attachments, but it cannot write such files. | |
| 2946 | +qpdf also requires the password to be specified in order to open the | |
| 2947 | +file, not just to extract attachments, since once the file is open, all | |
| 2948 | +decryption is handled transparently. When copying files like this while | |
| 2949 | +preserving encryption, qpdf will apply the file's encryption to | |
| 2950 | +everything in the file, not just to the attachments. When decrypting the | |
| 2951 | +file, qpdf will decrypt the attachments. In general, when copying PDF | |
| 2952 | +files with multiple encryption formats, qpdf will choose the newest | |
| 2953 | +format. The only exception to this is that clear-text metadata will be | |
| 2954 | +preserved as clear-text if it is that way in the original file. | |
| 2955 | + | |
| 2956 | +One point of confusion some people have about encrypted PDF files is | |
| 2957 | +that encryption is not the same as password protection. Password | |
| 2958 | +protected files are always encrypted, but it is also possible to create | |
| 2959 | +encrypted files that do not have passwords. Internally, such files use | |
| 2960 | +the empty string as a password, and most readers try the empty string | |
| 2961 | +first to see if it works and prompt for a password only if the empty | |
| 2962 | +string doesn't work. Normally such files have an empty user password and | |
| 2963 | +a non-empty owner password. In that way, if the file is opened by an | |
| 2964 | +ordinary reader without specification of password, the restrictions | |
| 2965 | +specified in the encryption dictionary can be enforced. Most users | |
| 2966 | +wouldn't even realize such a file was encrypted. Since qpdf always | |
| 2967 | +ignores the restrictions (except for the purpose of reporting what they | |
| 2968 | +are), qpdf doesn't care which password you use. QPDF will allow you to | |
| 2969 | +create PDF files with non-empty user passwords and empty owner | |
| 2970 | +passwords. Some readers will require a password when you open these | |
| 2971 | +files, and others will open the files without a password and not enforce | |
| 2972 | +restrictions. Having a non-empty user password and an empty owner | |
| 2973 | +password doesn't really make sense because it would mean that opening | |
| 2974 | +the file with the user password would be more restrictive than not | |
| 2975 | +supplying a password at all. QPDF also allows you to create PDF files | |
| 2976 | +with the same password as both the user and owner password. Some readers | |
| 2977 | +will not ever allow such files to be accessed without restrictions | |
| 2978 | +because they never try the password as the owner password if it works as | |
| 2979 | +the user password. Nonetheless, one of the powerful aspects of qpdf is | |
| 2980 | +that it allows you to finely specify the way encrypted files are | |
| 2981 | +created, even if the results are not useful to some readers. One use | |
| 2982 | +case for this would be for testing a PDF reader to ensure that it | |
| 2983 | +handles odd configurations of input files. | |
| 2984 | + | |
| 2985 | +.. _ref.random-numbers: | |
| 2986 | + | |
| 2987 | +Random Number Generation | |
| 2988 | +------------------------ | |
| 2989 | + | |
| 2990 | +QPDF generates random numbers to support generation of encrypted data. | |
| 2991 | +Starting in qpdf 10.0.0, qpdf uses the crypto provider as its source of | |
| 2992 | +random numbers. Older versions used the OS-provided source of secure | |
| 2993 | +random numbers or, if allowed at build time, insecure random numbers | |
| 2994 | +from stdlib. Starting with version 5.1.0, you can disable use of | |
| 2995 | +OS-provided secure random numbers at build time. This is especially | |
| 2996 | +useful on Windows if you want to avoid a dependency on Microsoft's | |
| 2997 | +cryptography API. You can also supply your own random data provider. For | |
| 2998 | +details on how to do this, please refer to the top-level README.md file | |
| 2999 | +in the source distribution and to comments in | |
| 3000 | +``QUtil.hh``. | |
| 3001 | + | |
| 3002 | +.. _ref.adding-and-remove-pages: | |
| 3003 | + | |
| 3004 | +Adding and Removing Pages | |
| 3005 | +------------------------- | |
| 3006 | + | |
| 3007 | +While qpdf's API has supported adding and modifying objects for some | |
| 3008 | +time, version 3.0 introduces specific methods for adding and removing | |
| 3009 | +pages. These are largely convenience routines that handle two tricky | |
| 3010 | +issues: pushing inheritable resources from the ``/Pages`` tree down to | |
| 3011 | +individual pages and manipulation of the ``/Pages`` tree itself. For | |
| 3012 | +details, see ``addPage`` and surrounding methods in | |
| 3013 | +``QPDF.hh``. | |
| 3014 | + | |
| 3015 | +.. _ref.reserved-objects: | |
| 3016 | + | |
| 3017 | +Reserving Object Numbers | |
| 3018 | +------------------------ | |
| 3019 | + | |
| 3020 | +Version 3.0 of qpdf introduced the concept of reserved objects. These | |
| 3021 | +are seldom needed for ordinary operations, but there are cases in which | |
| 3022 | +you may want to add a series of indirect objects with references to each | |
| 3023 | +other to a ``QPDF`` object. This causes a problem because you can't | |
| 3024 | +determine the object ID that a new indirect object will have until you | |
| 3025 | +add it to the ``QPDF`` object with ``QPDF::makeIndirectObject``. The | |
| 3026 | +only way to add two mutually referential objects to a ``QPDF`` object | |
| 3027 | +prior to version 3.0 would be to add the new objects first and then make | |
| 3028 | +them refer to each other after adding them. Now it is possible to create | |
| 3029 | +a *reserved object* using | |
| 3030 | +``QPDFObjectHandle::newReserved``. This is an indirect object that stays | |
| 3031 | +"unresolved" even if it is queried for its type. So now, if you want to | |
| 3032 | +create a set of mutually referential objects, you can create | |
| 3033 | +reservations for each one of them and use those reservations to | |
| 3034 | +construct the references. When finished, you can call | |
| 3035 | +``QPDF::replaceReserved`` to replace the reserved objects with the real | |
| 3036 | +ones. This functionality will never be needed by most applications, but | |
| 3037 | +it is used internally by QPDF when copying objects from other PDF files, | |
| 3038 | +as discussed in `Copying Objects From Other PDF | |
| 3039 | +Files <#ref.foreign-objects>`__. For an example of how to use reserved | |
| 3040 | +objects, search for ``newReserved`` in | |
| 3041 | +``test_driver.cc`` in qpdf's sources. | |
| 3042 | + | |
| 3043 | +.. _ref.foreign-objects: | |
| 3044 | + | |
| 3045 | +Copying Objects From Other PDF Files | |
| 3046 | +------------------------------------ | |
| 3047 | + | |
| 3048 | +Version 3.0 of qpdf introduced the ability to copy objects into a | |
| 3049 | +``QPDF`` object from a different ``QPDF`` object, which we refer to as | |
| 3050 | +*foreign objects*. This allows arbitrary | |
| 3051 | +merging of PDF files. The "from" ``QPDF`` object must remain valid after | |
| 3052 | +the copy as discussed in the note below. The | |
| 3053 | +``qpdf`` command-line tool provides limited | |
| 3054 | +support for basic page selection, including merging in pages from other | |
| 3055 | +files, but the library's API makes it possible to implement arbitrarily | |
| 3056 | +complex merging operations. The main method for copying foreign objects | |
| 3057 | +is ``QPDF::copyForeignObject``. This takes an indirect object from | |
| 3058 | +another ``QPDF`` and copies it recursively into this object while | |
| 3059 | +preserving all object structure, including circular references. This | |
| 3060 | +means you can add a direct object that you create from scratch to a | |
| 3061 | +``QPDF`` object with ``QPDF::makeIndirectObject``, and you can add an | |
| 3062 | +indirect object from another file with ``QPDF::copyForeignObject``. The | |
| 3063 | +fact that ``QPDF::makeIndirectObject`` does not automatically detect a | |
| 3064 | +foreign object and copy it is an explicit design decision. Copying a | |
| 3065 | +foreign object seems like a sufficiently significant thing to do that it | |
| 3066 | +should be done explicitly. | |
| 3067 | + | |
| 3068 | +The other way to copy foreign objects is by passing a page from one | |
| 3069 | +``QPDF`` to another by calling ``QPDF::addPage``. In contrast to | |
| 3070 | +``QPDF::makeIndirectObject``, this method automatically distinguishes | |
| 3071 | +between indirect objects in the current file, foreign objects, and | |
| 3072 | +direct objects. | |
| 3073 | + | |
| 3074 | +Please note: when you copy objects from one ``QPDF`` to another, the | |
| 3075 | +source ``QPDF`` object must remain valid until you have finished with | |
| 3076 | +the destination object. This is because the original object is still | |
| 3077 | +used to retrieve any referenced stream data from the copied object. | |
| 3078 | + | |
| 3079 | +.. _ref.rewriting: | |
| 3080 | + | |
| 3081 | +Writing PDF Files | |
| 3082 | +----------------- | |
| 3083 | + | |
| 3084 | +The qpdf library supports file writing of ``QPDF`` objects to PDF files | |
| 3085 | +through the ``QPDFWriter`` class. The ``QPDFWriter`` class has two | |
| 3086 | +writing modes: one for non-linearized files, and one for linearized | |
| 3087 | +files. See `Linearization <#ref.linearization>`__ for a description of | |
| 3088 | +how linearization is implemented. This section describes how we write | |
| 3089 | +non-linearized files, including the creation of QDF files (see `QDF | |
| 3090 | +Mode <#ref.qdf>`__). | |
| 3091 | + | |
| 3092 | +This outline was written prior to implementation and is not exactly | |
| 3093 | +accurate, but it provides a correct "notional" idea of how writing | |
| 3094 | +works. Look at the code in ``QPDFWriter`` for exact details. | |
| 3095 | + | |
| 3096 | +- Initialize state: | |
| 3097 | + | |
| 3098 | + - next object number = 1 | |
| 3099 | + | |
| 3100 | + - object queue = empty | |
| 3101 | + | |
| 3102 | + - renumber table: old object id/generation to new id/0 = empty | |
| 3103 | + | |
| 3104 | + - xref table: new id -> offset = empty | |
| 3105 | + | |
| 3106 | +- Create a QPDF object from a file. | |
| 3107 | + | |
| 3108 | +- Write header for new PDF file. | |
| 3109 | + | |
| 3110 | +- Request the trailer dictionary. | |
| 3111 | + | |
| 3112 | +- For each value that is an indirect object, grab the next object | |
| 3113 | + number (via an operation that returns and increments the number). Map | |
| 3114 | + object to new number in renumber table. Push object onto queue. | |
| 3115 | + | |
| 3116 | +- While there are more objects on the queue: | |
| 3117 | + | |
| 3118 | + - Pop queue. | |
| 3119 | + | |
| 3120 | + - Look up object's new number *n* in the renumbering table. | |
| 3121 | + | |
| 3122 | + - Store current offset into xref table. | |
| 3123 | + | |
| 3124 | + - Write :samp:`{n} 0 obj`. | |
| 3125 | + | |
| 3126 | + - If object is null, whether direct or indirect, write out null, | |
| 3127 | + thus eliminating unresolvable indirect object references. | |
| 3128 | + | |
| 3129 | + - If the object is a stream, write stream contents, piped | |
| 3130 | + through any filters as required, to a memory buffer. Use this | |
| 3131 | + buffer to determine the stream length. | |
| 3132 | + | |
| 3133 | + - If object is not a stream, array, or dictionary, write out its | |
| 3134 | + contents. | |
| 3135 | + | |
| 3136 | + - If object is an array or dictionary (including stream), traverse | |
| 3137 | + its elements (for array) or values (for dictionaries), handling | |
| 3138 | + recursive dictionaries and arrays, looking for indirect objects. | |
| 3139 | + When an indirect object is found, if it is not resolvable, ignore. | |
| 3140 | + (This case is handled when writing it out.) Otherwise, look it up | |
| 3141 | + in the renumbering table. If not found, grab the next available | |
| 3142 | + object number, assign to the referenced object in the renumbering | |
| 3143 | + table, and push the referenced object onto the queue. As a special | |
| 3144 | + case, when writing out a stream dictionary, replace length, | |
| 3145 | + filters, and decode parameters as required. | |
| 3146 | + | |
| 3147 | + Write out dictionary or array, replacing any unresolvable indirect | |
| 3148 | + object references with null (the PDF spec says that a reference to a | |
| 3149 | + non-existent object is legal and resolves to null) and any | |
| 3150 | + resolvable ones with references to the renumbered objects. | |
| 3151 | + | |
| 3152 | + - If the object is a stream, write ``stream\n``, the stream contents | |
| 3153 | + (from the memory buffer), and ``\nendstream\n``. | |
| 3154 | + | |
| 3155 | + - When done, write ``endobj``. | |
| 3156 | + | |
| 3157 | +Once we have finished the queue, all referenced objects will have been | |
| 3158 | +written out and all deleted objects or unreferenced objects will have | |
| 3159 | +been skipped. The new cross-reference table will contain an offset for | |
| 3160 | +every new object number from 1 up to the number of objects written. This | |
| 3161 | +can be used to write out a new xref table. Finally we can write out the | |
| 3162 | +trailer dictionary with appropriately computed /ID (see spec, 8.3, File | |
| 3163 | +Identifiers), the cross reference table offset, and ``%%EOF``. | |
| 3164 | + | |
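The queue-driven renumbering in the outline above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not ``QPDFWriter``'s actual code: objects are modeled as a dictionary from an old integer id to the list of ids it references (real qpdf keys on id/generation pairs), ids missing from the dictionary stand in for unresolvable references, and the names ``renumber``, ``objects``, and ``trailer_refs`` are hypothetical.

```python
from collections import deque


def renumber(objects, trailer_refs):
    """Assign new object numbers in the order objects are first reached.

    objects: dict mapping an old object id to the list of old ids it
             references (a toy stand-in for a PDF object graph).
    trailer_refs: old ids referenced directly from the trailer dictionary.
    Returns (renumber_table, write_order).
    """
    next_number = 1
    renumber_table = {}   # old id -> new id
    queue = deque()

    def enqueue(old_id):
        nonlocal next_number
        if old_id not in objects:
            return        # unresolvable reference: it would be written as null
        if old_id not in renumber_table:
            renumber_table[old_id] = next_number
            next_number += 1
            queue.append(old_id)

    for ref in trailer_refs:
        enqueue(ref)

    write_order = []
    while queue:
        old_id = queue.popleft()
        write_order.append(renumber_table[old_id])
        for ref in objects[old_id]:
            enqueue(ref)  # circular references are harmless: already numbered
    return renumber_table, write_order
```

Objects that are never reached from the trailer never enter the queue, which is how unreferenced objects end up being skipped.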
| 3165 | +.. _ref.filtered-streams: | |
| 3166 | + | |
| 3167 | +Filtered Streams | |
| 3168 | +---------------- | |
| 3169 | + | |
| 3170 | +Support for streams is implemented through the ``Pipeline`` interface | |
| 3171 | +which was designed for this package. | |
| 3172 | + | |
| 3173 | +When reading streams, create a series of ``Pipeline`` objects. The | |
| 3174 | +``Pipeline`` abstract base class requires implementation of ``write()`` and | |
| 3175 | +``finish()`` and provides an implementation of ``getNext()``. Each | |
| 3176 | +pipeline object, upon receiving data, does whatever it is going to do | |
| 3177 | +and then writes the data (possibly modified) to its successor. | |
| 3178 | +Alternatively, a pipeline may be an end-of-the-line pipeline that does | |
| 3179 | +something like store its output to a file or a memory buffer ignoring a | |
| 3180 | +successor. For additional details, look at | |
| 3181 | +:file:`Pipeline.hh`. | |
| 3182 | + | |
| 3183 | +``QPDF`` can read raw or filtered streams. When reading a filtered | |
| 3184 | +stream, the ``QPDF`` class creates a ``Pipeline`` object for each | |
| 3185 | +appropriate filter and chains them together. The last filter | |
| 3186 | +should write to whatever type of output is required. The ``QPDF`` class | |
| 3187 | +has an interface to write raw or filtered stream contents to a given | |
| 3188 | +pipeline. | |
| 3189 | + | |
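To make the chaining idea concrete, here is a minimal Python sketch of a two-stage chain: a decompressing stage that forwards to an end-of-the-line buffer. The class names ``Inflate`` and ``Buffer`` and the helper ``read_filtered`` are illustrative stand-ins, not qpdf's C++ classes; only the ``write()``/``finish()``/``getNext()`` shape is taken from the description above.

```python
import zlib


class Pipeline:
    """Base class: subclasses implement write() and finish()."""

    def __init__(self, next_pipeline=None):
        self._next = next_pipeline

    def get_next(self):
        return self._next

    def write(self, data):
        raise NotImplementedError

    def finish(self):
        raise NotImplementedError


class Inflate(Pipeline):
    """Decompresses zlib data and passes the result to its successor."""

    def __init__(self, next_pipeline):
        super().__init__(next_pipeline)
        self._z = zlib.decompressobj()

    def write(self, data):
        self.get_next().write(self._z.decompress(data))

    def finish(self):
        self.get_next().write(self._z.flush())
        self.get_next().finish()


class Buffer(Pipeline):
    """End-of-the-line pipeline: stores everything in memory."""

    def __init__(self):
        super().__init__(None)
        self._chunks = []
        self.data = None

    def write(self, data):
        self._chunks.append(data)

    def finish(self):
        self.data = b"".join(self._chunks)


def read_filtered(raw, chunk_size=4):
    """Feed raw stream bytes through the chain in small chunks."""
    sink = Buffer()
    head = Inflate(sink)
    for i in range(0, len(raw), chunk_size):
        head.write(raw[i:i + chunk_size])
    head.finish()
    return sink.data
```

Because each stage only knows its successor, supporting another filter means inserting one more object into the chain rather than changing the reader.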
| 3190 | +.. _ref.object-accessors: | |
| 3191 | + | |
| 3192 | +Object Accessor Methods | |
| 3193 | +----------------------- | |
| 3194 | + | |
| 3195 | +.. This section is referenced in QPDFObjectHandle.hh | |
| 3196 | + | |
| 3197 | +For general information about how to access instances of | |
| 3198 | +``QPDFObjectHandle``, please see the comments in | |
| 3199 | +:file:`QPDFObjectHandle.hh`. Search for "Accessor | |
| 3200 | +methods". This section provides a more in-depth discussion of the | |
| 3201 | +behavior and the rationale for the behavior. | |
| 3202 | + | |
| 3203 | +*Why were type errors made into warnings?* When type checks were | |
| 3204 | +introduced into qpdf in the early days, it was expected that type errors | |
| 3205 | +would only occur as a result of programmer error. However, in practice, | |
| 3206 | +type errors would occur with malformed PDF files because of assumptions | |
| 3207 | +made in code, including code within the qpdf library and code written by | |
| 3208 | +library users. The most common case would be chaining calls to | |
| 3209 | +``getKey()`` to access keys deep within a dictionary. In many cases, | |
| 3210 | +qpdf would be able to recover from these situations, but the old | |
| 3211 | +behavior often resulted in crashes rather than graceful recovery. For | |
| 3212 | +this reason, the errors were changed to warnings. | |
| 3213 | + | |
| 3214 | +*Why even warn about type errors when the user can't usually do anything | |
| 3215 | +about them?* Type warnings are extremely valuable during development. | |
| 3216 | +Since it's impossible to catch at compile time things like typos in | |
| 3217 | +dictionary key names or logic errors around what the structure of a PDF | |
| 3218 | +file might be, the presence of type warnings can save lots of developer | |
| 3219 | +time. They have also proven useful in exposing issues in qpdf itself | |
| 3220 | +that would have otherwise gone undetected. | |
| 3221 | + | |
| 3222 | +*Can there be a type-safe ``QPDFObjectHandle``?* It would be great if | |
| 3223 | +``QPDFObjectHandle`` could be more strongly typed so that you'd have to | |
| 3224 | +check that something was of a particular type before calling | |
| 3225 | +type-specific accessor methods. However, implementing this at this stage | |
| 3226 | +of the library's history would be quite difficult, and it would make | |
| 3227 | +the common pattern of drilling into an object no longer work. While it | |
| 3228 | +would be possible to have a parallel interface, it would create a lot of | |
| 3229 | +extra code. If qpdf were written in a language like Rust, an interface | |
| 3230 | +like this would make a lot of sense, but, for a variety of reasons, the | |
| 3231 | +qpdf API is consistent with other APIs of its time, relying on exception | |
| 3232 | +handling to catch errors. The underlying PDF objects are inherently not | |
| 3233 | +type-safe. Forcing stronger type safety in ``QPDFObjectHandle`` would | |
| 3234 | +ultimately cause a lot more code to have to be written and would likely | |
| 3235 | +make software that uses qpdf more brittle, and even so, checks would | |
| 3236 | +have to occur at runtime. | |
| 3237 | + | |
| 3238 | +*Why do type errors sometimes raise exceptions?* The way warnings work | |
| 3239 | +in qpdf requires a ``QPDF`` object to be associated with an object | |
| 3240 | +handle for a warning to be issued. It would be nice if this could be | |
| 3241 | +fixed, but it would require major changes to the API. Rather than | |
| 3242 | +throwing away these conditions, we convert them to exceptions. It's not | |
| 3243 | +that bad though. Since any object handle that was read from a file has | |
| 3244 | +an associated ``QPDF`` object, it would only be type errors on objects | |
| 3245 | +that were created explicitly that would cause exceptions, and in that | |
| 3246 | +case, type errors are much more likely to be the result of a coding | |
| 3247 | +error than invalid input. | |
| 3248 | + | |
| 3249 | +*Why does the behavior of a type exception differ between the C and C++ | |
| 3250 | +API?* There is no way to throw and catch exceptions in C short of | |
| 3251 | +something like ``setjmp`` and ``longjmp``, and that approach is not | |
| 3252 | +portable across language barriers. Since the C API is often used from | |
| 3253 | +other languages, it's important to keep things as simple as possible. | |
| 3254 | +Starting in qpdf 10.5, exceptions that used to crash code using the C | |
| 3255 | +API will be written to stderr by default, and it is possible to register | |
| 3256 | +an error handler. There's no reason that the error handler can't | |
| 3257 | +simulate exception handling in some way, such as by using ``setjmp`` and | |
| 3258 | +``longjmp`` or by setting some variable that can be checked after | |
| 3259 | +library calls are made. In retrospect, it might have been better if the | |
| 3260 | +C API object handle methods returned error codes like the other methods | |
| 3261 | +and set return values in passed-in pointers, but this would complicate | |
| 3262 | +both the implementation and the use of the library for a case that is | |
| 3263 | +actually quite rare and largely avoidable. | |
| 3264 | + | |
| 3265 | +.. _ref.linearization: | |
| 3266 | + | |
| 3267 | +Linearization | |
| 3268 | +============= | |
| 3269 | + | |
| 3270 | +This chapter describes how ``QPDF`` and ``QPDFWriter`` implement | |
| 3271 | +creation and processing of linearized PDFs. | |
| 3272 | + | |
| 3273 | +.. _ref.linearization-strategy: | |
| 3274 | + | |
| 3275 | +Basic Strategy for Linearization | |
| 3276 | +-------------------------------- | |
| 3277 | + | |
| 3278 | +To avoid the incestuous problem of having the qpdf library validate its | |
| 3279 | +own linearized files, we have a special linearized file checking mode | |
| 3280 | +which can be invoked via :command:`qpdf | |
| 3281 | +--check-linearization` (or :command:`qpdf | |
| 3282 | +--check`). This mode reads the linearization parameter | |
| 3283 | +dictionary and the hint streams and validates that object ordering, | |
| 3284 | +parameters, and hint stream contents are correct. The validation code | |
| 3285 | +was first tested against linearized files created by external tools | |
| 3286 | +(Acrobat and pdlin) and then used to validate files created by | |
| 3287 | +``QPDFWriter`` itself. | |
| 3288 | + | |
| 3289 | +.. _ref.linearized.preparation: | |
| 3290 | + | |
| 3291 | +Preparing For Linearization | |
| 3292 | +--------------------------- | |
| 3293 | + | |
| 3294 | +Before creating a linearized PDF file from any other PDF file, the PDF | |
| 3295 | +file must be altered such that all page attributes are propagated down | |
| 3296 | +to the page level (and not inherited from parents in the ``/Pages`` | |
| 3297 | +tree). We also have to know which objects refer to which other objects, | |
| 3298 | +being concerned with page boundaries and a few other cases. We refer to | |
| 3299 | +this part of preparing the PDF file as | |
| 3300 | +:dfn:`optimization`, discussed in | |
| 3301 | +`Optimization <#ref.optimization>`__. Note that, in this context, the | |
| 3302 | +term :dfn:`optimization` is a qpdf term, and the | |
| 3303 | +term :dfn:`linearization` is a term from the PDF | |
| 3304 | +specification. Do not be confused by the fact that many applications | |
| 3305 | +refer to linearization as optimization or web optimization. | |
| 3306 | + | |
| 3307 | +When creating linearized PDF files from optimized PDF files, there are | |
| 3308 | +really only a few issues that need to be dealt with: | |
| 3309 | + | |
| 3310 | +- Creation of hints tables | |
| 3311 | + | |
| 3312 | +- Placing objects in the correct order | |
| 3313 | + | |
| 3314 | +- Filling in offsets and byte sizes | |
| 3315 | + | |
| 3316 | +.. _ref.optimization: | |
| 3317 | + | |
| 3318 | +Optimization | |
| 3319 | +------------ | |
| 3320 | + | |
| 3321 | +In order to perform various operations such as linearization and | |
| 3322 | +splitting files into pages, it is necessary to know which objects are | |
| 3323 | +referenced by which pages, page thumbnails, and root and trailer | |
| 3324 | +dictionary keys. It is also necessary to ensure that all page-level | |
| 3325 | +attributes appear directly at the page level and are not inherited from | |
| 3326 | +parents in the pages tree. | |
| 3327 | + | |
| 3328 | +We refer to the process of enforcing these constraints as | |
| 3329 | +:dfn:`optimization`. As mentioned above, note | |
| 3330 | +that some applications refer to linearization as optimization. Although | |
| 3331 | +this optimization was initially motivated by the need to create | |
| 3332 | +linearized files, we are using these terms separately. | |
| 3333 | + | |
| 3334 | +PDF file optimization is implemented in the | |
| 3335 | +:file:`QPDF_optimization.cc` source file. That file | |
| 3336 | +is richly commented and serves as the primary reference for the | |
| 3337 | +optimization process. | |
| 3338 | + | |
| 3339 | +After optimization has been completed, the private member variables | |
| 3340 | +``obj_user_to_objects`` and ``object_to_obj_users`` in ``QPDF`` have | |
| 3341 | +been populated. Any object that has more than one value in the | |
| 3342 | +``object_to_obj_users`` table is shared. Any object that has exactly one | |
| 3343 | +value in the ``object_to_obj_users`` table is private. To find all the | |
| 3344 | +private objects in a page or a trailer or root dictionary key, one | |
| 3345 | +merely has to make this determination for each element in the | |
| 3346 | +``obj_user_to_objects`` table for the given page or key. | |
| 3347 | + | |
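The shared/private test described above amounts to a lookup in two inverse tables. The sketch below uses plain Python dictionaries keyed by made-up user labels such as ``"page:1"``; the real tables are private C++ members of ``QPDF``, so the function names and key format here are assumptions made purely for illustration.

```python
def invert(obj_user_to_objects):
    """Build object -> set of users from user -> set of objects."""
    object_to_obj_users = {}
    for user, objs in obj_user_to_objects.items():
        for obj in objs:
            object_to_obj_users.setdefault(obj, set()).add(user)
    return object_to_obj_users


def private_objects(obj_user_to_objects, object_to_obj_users, user):
    """Objects reachable from `user` that no other user references."""
    return {
        obj
        for obj in obj_user_to_objects[user]
        if len(object_to_obj_users[obj]) == 1
    }
```

For example, if page 1 uses objects 5 and 6 and page 2 uses objects 6 and 7, object 6 is shared, object 5 is private to page 1, and object 7 is private to page 2.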
| 3348 | +Note that pages and thumbnails have different object user types, so the | |
| 3349 | +above test on a page will not include objects that are referenced only | |
| 3350 | +by the page's thumbnail dictionary. | |
| 3351 | + | |
| 3352 | +.. _ref.linearization.writing: | |
| 3353 | + | |
| 3354 | +Writing Linearized Files | |
| 3355 | +------------------------ | |
| 3356 | + | |
| 3357 | +We will create files with only primary hint streams. We will never write | |
| 3358 | +overflow hint streams. (As of PDF version 1.4, Acrobat doesn't either, | |
| 3359 | +and they are never necessary.) The hint streams contain offset | |
| 3360 | +information to objects that point to where they would be if the hint | |
| 3361 | +stream were not present. This means that we have to calculate all object | |
| 3362 | +positions before we can generate and write the hint table. This means | |
| 3363 | +that we have to generate the file in two passes. To make this reliable, | |
| 3364 | +``QPDFWriter`` in linearization mode invokes exactly the same code twice | |
| 3365 | +to write the file to a pipeline. | |
| 3366 | + | |
| 3367 | +In the first pass, the target pipeline is a count pipeline chained to a | |
| 3368 | +discard pipeline. The count pipeline simply passes its data through to | |
| 3369 | +the next pipeline in the chain but can return the number of bytes passed | |
| 3370 | +through it at any intermediate point. The discard pipeline is an end of | |
| 3371 | +line pipeline that just throws its data away. The hint stream is not | |
| 3372 | +written and dummy values with adequate padding are stored in the first | |
| 3373 | +cross reference table, linearization parameter dictionary, and /Prev key | |
| 3374 | +of the first trailer dictionary. All the offset, length, object | |
| 3375 | +renumbering information, and anything else we need for the second pass | |
| 3376 | +is stored. | |
| 3377 | + | |
| 3378 | +At the end of the first pass, this information is passed to the ``QPDF`` | |
| 3379 | +class which constructs a compressed hint stream in a memory buffer and | |
| 3380 | +returns it. ``QPDFWriter`` uses this information to write a complete | |
| 3381 | +hint stream object into a memory buffer. At this point, the length of | |
| 3382 | +the hint stream is known. | |
| 3383 | + | |
| 3384 | +In the second pass, the end of the pipeline chain is a regular file | |
| 3385 | +instead of a discard pipeline, and we have known values for all the | |
| 3386 | +offsets and lengths that we didn't have in the first pass. We have to | |
| 3387 | +adjust offsets that appear after the start of the hint stream by the | |
| 3388 | +length of the hint stream, which is known. Anything that is of variable | |
| 3389 | +length is padded, with the padding code surrounding any writing code | |
| 3390 | +that differs in the two passes. This ensures that changes to the way | |
| 3391 | +things are represented never result in offsets that were gathered | |
| 3392 | +during the first pass becoming incorrect for the second pass. | |
| 3393 | + | |
| 3394 | +Using this strategy, we can write linearized files to a non-seekable | |
| 3395 | +output stream with only a single pass to disk or wherever the output is | |
| 3396 | +going. | |
| 3397 | + | |
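The two-pass idea can be illustrated with a toy file format in Python. Everything here is an assumption made for the sketch (the ``%TOY-1.0`` header, the fixed-width offset table, the class and function names); it is not how ``QPDFWriter`` lays out a PDF. What it does show is the same code running twice, a counting stage in front of a discard sink on the first pass, fixed-width padding so both passes emit the same number of bytes, and offsets shifted by the hint length on the second pass.

```python
PAD = 10  # fixed width for every stored offset, so both passes emit the same bytes


class Discard:
    """End-of-the-line pipeline that throws its data away."""

    def write(self, data):
        pass


class Collect:
    """End-of-the-line pipeline standing in for the real output file."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


class Count:
    """Passes data through and remembers how many bytes it has seen."""

    def __init__(self, next_pipeline):
        self.next_pipeline = next_pipeline
        self.count = 0

    def write(self, data):
        self.count += len(data)
        self.next_pipeline.write(data)


def write_pass(sink, bodies, hint=b"", offsets=None):
    """Write the toy file once; return the offset at which each body started."""
    out = Count(sink)
    out.write(b"%TOY-1.0\n")
    # Offset table: dummy zeros in pass 1, real values shifted by the hint
    # length in pass 2. Padding keeps the table the same size in both passes.
    for i in range(len(bodies)):
        value = 0 if offsets is None else offsets[i] + len(hint)
        out.write(str(value).zfill(PAD).encode() + b"\n")
    out.write(hint)  # empty in pass 1
    recorded = []
    for body in bodies:
        recorded.append(out.count)
        out.write(body)
    return recorded
```

A first call with ``Discard()`` yields the offsets as they would be without the hint data; a second call with the real sink, the hint bytes, and those offsets produces the final output in a single sequential write.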
| 3398 | +.. _ref.linearization-data: | |
| 3399 | + | |
| 3400 | +Calculating Linearization Data | |
| 3401 | +------------------------------ | |
| 3402 | + | |
| 3403 | +Once a file is optimized, we have information about which objects access | |
| 3404 | +which other objects. We can then process these tables to decide which | |
| 3405 | +part (as described in "Linearized PDF Document Structure" in the PDF | |
| 3406 | +specification) each object is contained within. This tells us the exact | |
| 3407 | +order in which objects are written. The ``QPDFWriter`` class asks for | |
| 3408 | +this information and enqueues objects for writing in the proper order. | |
| 3409 | +It also turns on a check that causes an exception to be thrown if an | |
| 3410 | +object is encountered that has not already been queued. (This could | |
| 3411 | +happen only if there were a bug in the traversal code used to calculate | |
| 3412 | +the linearization data.) | |
| 3413 | + | |
| 3414 | +.. _ref.linearization-issues: | |
| 3415 | + | |
| 3416 | +Known Issues with Linearization | |
| 3417 | +------------------------------- | |
| 3418 | + | |
| 3419 | +There are a handful of known issues with this linearization code. These | |
| 3420 | +issues do not appear to impact the behavior of linearized files which | |
| 3421 | +still work as intended: it is possible for a web browser to begin to | |
| 3422 | +display them before they are fully downloaded. In fact, it seems that | |
| 3423 | +various other programs that create linearized files have many of these | |
| 3424 | +same issues. These items make reference to terminology used in the | |
| 3425 | +linearization appendix of the PDF specification. | |
| 3426 | + | |
| 3427 | +- Thread Dictionary information keys appear in part 4 with the rest of | |
| 3428 | + Threads instead of in part 9. Objects in part 9 are not grouped | |
| 3429 | + together functionally. | |
| 3430 | + | |
| 3431 | +- We are not calculating numerators for shared object positions within | |
| 3432 | + content streams or interleaving them within content streams. | |
| 3433 | + | |
| 3434 | +- We generate only page offset, shared object, and outline hint tables. | |
| 3435 | + It would be relatively easy to add some additional tables. We gather | |
| 3436 | + most of the information needed to create thumbnail hint tables. There | |
| 3437 | + are comments in the code about this. | |
| 3438 | + | |
| 3439 | +.. _ref.linearization-debugging: | |
| 3440 | + | |
| 3441 | +Debugging Note | |
| 3442 | +-------------- | |
| 3443 | + | |
| 3444 | +The :command:`qpdf --show-linearization` command can show | |
| 3445 | +the complete contents of linearization hint streams. To look at the raw | |
| 3446 | +data, you can extract the filtered contents of the linearization hint | |
| 3447 | +tables using :command:`qpdf --show-object=n | |
| 3448 | +--filtered-stream-data`. Then, to convert this into a bit | |
| 3449 | +stream (since linearization tables are bit streams written without | |
| 3450 | +regard to byte boundaries), you can pipe the resulting data through the | |
| 3451 | +following perl code: | |
| 3452 | + | |
| 3453 | +:: | |
| 3454 | + | |
| 3455 | + use bytes; | |
| 3456 | + binmode STDIN; | |
| 3457 | + undef $/; | |
| 3458 | + my $a = <STDIN>; | |
| 3459 | + my @ch = split(//, $a); | |
| 3460 | + map { printf("%08b", ord($_)) } @ch; | |
| 3461 | + print "\n"; | |
| 3462 | + | |
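For readers who prefer Python, the same conversion can be sketched as follows. The function name ``to_bit_string`` is just a label chosen for this example.

```python
def to_bit_string(data: bytes) -> str:
    """Render each byte as eight binary digits, most significant bit first."""
    return "".join(format(byte, "08b") for byte in data)
```

To use it as a filter, read the bytes from ``sys.stdin.buffer`` and print the result of ``to_bit_string``.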
| 3463 | +.. _ref.object-and-xref-streams: | |
| 3464 | + | |
| 3465 | +Object and Cross-Reference Streams | |
| 3466 | +================================== | |
| 3467 | + | |
| 3468 | +This chapter provides information about the implementation of object | |
| 3469 | +stream and cross-reference stream support in qpdf. | |
| 3470 | + | |
| 3471 | +.. _ref.object-streams: | |
| 3472 | + | |
| 3473 | +Object Streams | |
| 3474 | +-------------- | |
| 3475 | + | |
| 3476 | +Object streams can contain any regular object except the following: | |
| 3477 | + | |
| 3478 | +- stream objects | |
| 3479 | + | |
| 3480 | +- objects with generation > 0 | |
| 3481 | + | |
| 3482 | +- the encryption dictionary | |
| 3483 | + | |
| 3484 | +- objects containing the /Length of another stream | |
| 3485 | + | |
| 3486 | +In addition, Adobe Reader (at least as of version 8.0.0) appears to not | |
| 3487 | +be able to handle having the document catalog appear in an object stream | |
| 3488 | +if the file is encrypted, though this is not specifically disallowed by | |
| 3489 | +the specification. | |
| 3490 | + | |
| 3491 | +There are additional restrictions for linearized files. See | |
| 3492 | +`Implications for Linearized | |
| 3493 | +Files <#ref.object-streams-linearization>`__\ for details. | |
| 3494 | + | |
| 3495 | +The PDF specification refers to objects in object streams as "compressed | |
| 3496 | +objects" regardless of whether the object stream is compressed. | |
| 3497 | + | |
| 3498 | +The generation number of every object in an object stream must be zero. | |
| 3499 | +It is possible to delete and replace an object in an object stream with | |
| 3500 | +a regular object. | |
| 3501 | + | |
| 3502 | +The object stream dictionary has the following keys: | |
| 3503 | + | |
| 3504 | +- ``/N``: number of objects | |
| 3505 | + | |
| 3506 | +- ``/First``: byte offset of first object | |
| 3507 | + | |
| 3508 | +- ``/Extends``: indirect reference to stream that this extends | |
| 3509 | + | |
| 3510 | +Stream collections are formed with ``/Extends``. They must form a | |
| 3511 | +directed acyclic graph. These can be used for semantic information and | |
| 3512 | +are not meaningful to the PDF document's syntactic structure. Although | |
| 3513 | +qpdf preserves stream collections, it never generates them and doesn't | |
| 3514 | +make use of this information in any way. | |
| 3515 | + | |
| 3516 | +The specification recommends limiting the number of objects in an object | |
| 3517 | +stream for efficiency in reading and decoding. Acrobat 6 uses no more | |
| 3518 | +than 100 objects per object stream for linearized files and no more than 200 | |
| 3519 | +objects per stream for non-linearized files. ``QPDFWriter``, in object | |
| 3520 | +stream generation mode, never puts more than 100 objects in an object | |
| 3521 | +stream. | |
| 3522 | + | |
| 3523 | +Object stream contents consist of *N* pairs of integers, each of which | |
| 3524 | +is the object number and the byte offset of the object relative to the | |
| 3525 | +first object in the stream, followed by the objects themselves, | |
| 3526 | +concatenated. | |
| 3527 | + | |
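As an illustration of this layout, the following Python sketch splits already-decoded object stream data into its objects using ``/N`` and ``/First``. It assumes the offsets appear in ascending order and does no error handling; the function name ``split_object_stream`` is hypothetical and this is not how qpdf itself parses object streams.

```python
def split_object_stream(data: bytes, n: int, first: int):
    """Return {object number: raw object bytes} for a decoded object stream.

    data:  the decoded stream contents
    n:     value of /N (number of objects)
    first: value of /First (byte offset of the first object)
    """
    # The header before /First holds N "object-number offset" integer pairs.
    tokens = data[:first].split()
    numbers = [int(tok) for tok in tokens[:2 * n]]
    pairs = list(zip(numbers[0::2], numbers[1::2]))

    result = {}
    for i, (obj_num, offset) in enumerate(pairs):
        start = first + offset  # offsets are relative to the first object
        end = first + pairs[i + 1][1] if i + 1 < n else len(data)
        result[obj_num] = data[start:end].strip()
    return result
```

For example, with the header ``11 0 12 6`` and ``/First`` equal to 10, object 11 starts at byte 10 and object 12 starts at byte 16.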
| 3528 | +.. _ref.xref-streams: | |
| 3529 | + | |
| 3530 | +Cross-Reference Streams | |
| 3531 | +----------------------- | |
| 3532 | + | |
| 3533 | +For non-hybrid files, the value following ``startxref`` is the byte | |
| 3534 | +offset to the xref stream rather than the word ``xref``. | |
| 3535 | + | |
| 3536 | +For hybrid files (files containing both xref tables and cross-reference | |
| 3537 | +streams), the xref table's trailer dictionary contains the key | |
| 3538 | +``/XRefStm`` whose value is the byte offset to a cross-reference stream | |
| 3539 | +that supplements the xref table. A PDF 1.5-compliant application should | |
| 3540 | +read the xref table first. Then it should replace any object that it has | |
| 3541 | +already seen with any defined in the xref stream. Then it should follow | |
| 3542 | +any ``/Prev`` pointer in the original xref table's trailer dictionary. | |
| 3543 | +The specification is not clear about what should be done, if anything, | |
| 3544 | +with a ``/Prev`` pointer in the xref stream referenced by an xref table. | |
| 3545 | +The ``QPDF`` class ignores it, which is probably reasonable since, if | |
| 3546 | +this case were to appear for any sensible PDF file, the previous xref | |
| 3547 | +table would probably have a corresponding ``/XRefStm`` pointer of its | |
| 3548 | +own. For example, if a hybrid file were appended, the appended section | |
| 3549 | +would have its own xref table and ``/XRefStm``. The appended xref table | |
| 3550 | +would point to the previous xref table which would point to the | |
| 3551 | +``/XRefStm``, meaning that the new ``/XRefStm`` doesn't have to point to | |
| 3552 | +it. | |
| 3553 | + | |
| 3554 | +Since xref streams must be read very early, they may not be encrypted, | |
| 3555 | +and they may not contain indirect objects for keys required to read them, | |
| 3556 | +which are these: | |
| 3557 | + | |
| 3558 | +- ``/Type``: value ``/XRef`` | |
| 3559 | + | |
| 3560 | +- ``/Size``: value *n+1*: where *n* is the highest object number (same as | |
| 3561 | + ``/Size`` in the trailer dictionary) | |
| 3562 | + | |
| 3563 | +- ``/Index`` (optional): value | |
| 3564 | + :samp:`[{n} {count} ...]` used to determine | |
| 3565 | + which objects' information is stored in this stream. The default is | |
| 3566 | + ``[0 /Size]``. | |
| 3567 | + | |
| 3568 | +- ``/Prev``: value *offset*: byte | |
| 3569 | + offset of previous xref stream (same as ``/Prev`` in the trailer | |
| 3570 | + dictionary) | |
| 3571 | + | |
| 3572 | +- ``/W [...]``: sizes of each field in the xref table | |
| 3573 | + | |
| 3574 | +The other fields in the xref stream, which may be indirect if desired, | |
| 3575 | +are the union of those from the xref table's trailer dictionary. | |
| 3576 | + | |
| 3577 | +.. _ref.xref-stream-data: | |
| 3578 | + | |
| 3579 | +Cross-Reference Stream Data | |
| 3580 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
| 3581 | + | |
| 3582 | +The stream data is binary and encoded in big-endian byte order. Entries | |
| 3583 | +are concatenated, and each entry has a length equal to the total of the | |
| 3584 | +entries in ``/W`` above. Each entry consists of one or more fields, the | |
| 3585 | +first of which is the type of the field. The number of bytes for each | |
| 3586 | +field is given by ``/W`` above. A 0 in ``/W`` indicates that the field | |
| 3587 | +is omitted and has the default value. The default value for the field | |
| 3588 | +type is "``1``". All other default values are "``0``". | |
| 3589 | + | |
| 3590 | +PDF 1.5 has three field types: | |
| 3591 | + | |
| 3592 | +- 0: for free objects. Format: ``0 obj | |
| 3593 | + next-generation``, same as the free table in a traditional | |
| 3594 | + cross-reference table | |
| 3595 | + | |
| 3596 | +- 1: regular non-compressed object. Format: ``1 offset | |
| 3597 | + generation`` | |
| 3598 | + | |
| 3599 | +- 2: for objects in object streams. Format: ``2 | |
| 3600 | + object-stream-number index``, the number of the object stream | |
| 3601 | + containing the object and the index within the object stream of the | |
| 3602 | + object. | |
| 3603 | + | |
| 3604 | +It seems standard to have the first entry in the table be ``0 0 0`` | |
| 3605 | +instead of ``0 0 ffff`` if there are no deleted objects. | |
| 3606 | + | |
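The entry layout described in this section can be decoded with a few lines of Python. This is a sketch under the stated rules only (big-endian fields, widths taken from ``/W``, a default of 1 for an omitted type field and 0 for other omitted fields); the function name ``decode_xref_entries`` is hypothetical, and the code ignores ``/Index`` and assumes the stream data has already been decoded.

```python
def decode_xref_entries(data: bytes, w):
    """Decode cross-reference stream data into (type, field2, field3) tuples.

    data: decoded stream data
    w:    the three field widths from /W, e.g. [1, 2, 1]
    """
    entry_len = sum(w)
    defaults = (1, 0, 0)  # omitted type defaults to 1, other fields to 0
    entries = []
    for pos in range(0, len(data) - entry_len + 1, entry_len):
        fields = []
        offset = pos
        for width, default in zip(w, defaults):
            if width == 0:
                fields.append(default)
            else:
                chunk = data[offset:offset + width]
                fields.append(int.from_bytes(chunk, "big"))
            offset += width
        entries.append(tuple(fields))
    return entries
```

With ``/W [1 2 1]``, the bytes ``01 01 00 00`` decode to a type 1 entry at offset 256 with generation 0.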
| 3607 | +.. _ref.object-streams-linearization: | |
| 3608 | + | |
| 3609 | +Implications for Linearized Files | |
| 3610 | +--------------------------------- | |
| 3611 | + | |
| 3612 | +For linearized files, the linearization dictionary, document catalog, | |
| 3613 | +and page objects may not be contained in object streams. | |
| 3614 | + | |
| 3615 | +Objects stored within object streams are given the highest range of | |
| 3616 | +object numbers within the main and first-page cross-reference sections. | |
| 3617 | + | |
| 3618 | +It is okay to use cross-reference streams in place of regular xref | |
| 3619 | +tables. There are no special considerations. | |
| 3620 | + | |
| 3621 | +Hint data refers to object streams themselves, not the objects in the | |
| 3622 | +streams. Shared object references should also be made to the object | |
| 3623 | +streams. There are no references in any hint tables to the object numbers | |
| 3624 | +of compressed objects (objects within object streams). | |
| 3625 | + | |
| 3626 | +When numbering objects, all shared objects within both the first and | |
| 3627 | +second halves of the linearized files must be numbered consecutively | |
| 3628 | +after all normal uncompressed objects in that half. | |
| 3629 | + | |
| 3630 | +.. _ref.object-stream-implementation: | |
| 3631 | + | |
| 3632 | +Implementation Notes | |
| 3633 | +-------------------- | |
| 3634 | + | |
| 3635 | +There are three modes for writing object streams: | |
| 3636 | +@1@option@1@disable@2@option@2@, @1@option@1@preserve@2@option@2@, and | |
| 3637 | +@1@option@1@generate@2@option@2@. In disable mode, we do not generate | |
| 3638 | +any object streams, and we also generate an xref table rather than xref | |
| 3639 | +streams. This can be used to generate PDF files that are viewable with | |
| 3640 | +older readers. In preserve mode, we write object streams such that | |
| 3641 | +written object streams contain the same objects and ``/Extends`` | |
| 3642 | +relationships as in the original file. This is equivalent to disable if the | |
| 3643 | +file has no object streams. In generate, we create object streams | |
| 3644 | +ourselves by grouping objects that are allowed in object streams | |
| 3645 | +together in sets of no more than 100 objects. We also ensure that the | |
| 3646 | +PDF version is at least 1.5 in generate mode, but we preserve the | |
| 3647 | +version header in the other modes. The default is | |
| 3648 | +@1@option@1@preserve@2@option@2@. | |
| 3649 | + | |
| 3650 | +We do not support creation of hybrid files. When we write files, even in | |
| 3651 | +preserve mode, we will lose any xref tables and merge any appended | |
| 3652 | +sections. | |
| 3653 | + | |
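The grouping step of generate mode described above can be sketched as follows. This is a simplified illustration, not qpdf's actual code: ``group_for_object_streams`` is a hypothetical name, the input is assumed to already contain only objects that are allowed in object streams, and the real library may distribute objects among streams differently while still respecting the limit of 100.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: split the IDs of eligible objects into consecutive
// groups, each holding at most max_per_stream objects.
inline std::vector<std::vector<int>>
group_for_object_streams(
    std::vector<int> const& eligible, std::size_t max_per_stream = 100)
{
    assert(max_per_stream > 0);
    std::vector<std::vector<int>> groups;
    for (std::size_t start = 0; start < eligible.size();
         start += max_per_stream) {
        std::size_t const end =
            std::min(start + max_per_stream, eligible.size());
        groups.emplace_back(
            eligible.begin() + static_cast<std::ptrdiff_t>(start),
            eligible.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return groups;
}
```

With 250 eligible objects and the default limit, this yields three groups of 100, 100, and 50 objects.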
| 3654 | +.. _ref.release-notes: | |
| 3655 | + | |
| 3656 | +Release Notes | |
| 3657 | +============= | |
| 3658 | + | |
| 3659 | +For a detailed list of changes, please see the file | |
| 3660 | +@1@filename@1@ChangeLog@2@filename@2@ in the source distribution. | |
| 3661 | + | |
| 3662 | +10.5.0: XXX Month dd, YYYY | |
| 3663 | + - Library Enhancements | |
| 3664 | + | |
| 3665 | + - Since qpdf version 8, using object accessor methods on an | |
| 3666 | + instance of ``QPDFObjectHandle`` may create warnings if the | |
| 3667 | + object is not of the expected type. These warnings now have an | |
| 3668 | + error code of ``qpdf_e_object`` instead of | |
| 3669 | + ``qpdf_e_damaged_pdf``. Also, comments have been added to | |
| 3670 | + @1@filename@1@QPDFObjectHandle.hh@2@filename@2@ to explain in | |
| 3671 | + more detail what the behavior is. See `Object Accessor | |
| 3672 | + Methods <#ref.object-accessors>`__ for a more in-depth | |
| 3673 | + discussion. | |
| 3674 | + | |
| 3675 | + - Overhaul error handling for the object handle functions in the | |
| 3676 | + C API. See comments in the "Object handling" section of | |
| 3677 | + @1@filename@1@include/qpdf/qpdf-c.h@2@filename@2@ for details. | |
| 3678 | + In particular, exceptions thrown by the underlying C++ code | |
| 3679 | + when calling object accessors are caught and converted into | |
| 3680 | + errors. The errors can be trapped by registering an error | |
| 3681 | + handler with ``qpdf_register_oh_error_handler`` or will be | |
| 3682 | + written to stderr if no handler is registered. | |
| 3683 | + | |
| 3684 | + - Add ``qpdf_get_last_string_length`` to the C API to get the | |
| 3685 | + length of the last string that was returned. This is needed to | |
| 3686 | + handle strings that contain embedded null characters. | |
| 3687 | + | |
| 3688 | + - Add ``qpdf_oh_is_initialized`` and | |
| 3689 | + ``qpdf_oh_new_uninitialized`` to the C API to make it possible | |
| 3690 | + to work with uninitialized objects. | |
| 3691 | + | |
| 3692 | + - Add ``qpdf_oh_new_object`` to the C API. This allows you to | |
| 3693 | + clone an object handle. | |
| 3694 | + | |
| 3695 | + - Add ``qpdf_get_object_by_id``, ``qpdf_make_indirect_object``, | |
| 3696 | + and ``qpdf_replace_object``, exposing the corresponding methods | |
| 3697 | + in ``QPDF`` and ``QPDFObjectHandle``. | |
| 3698 | + | |
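The reason a separate length query such as ``qpdf_get_last_string_length`` is needed can be shown without the library. The sketch below does not call the qpdf C API; ``from_c_string`` and ``from_buffer`` are hypothetical helpers that contrast treating a buffer as a null-terminated C string with copying it using an explicit length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Copies up to the first null byte, as any code relying on strlen would.
inline std::string
from_c_string(char const* data)
{
    assert(data != nullptr);
    return std::string(data, std::strlen(data));
}

// Copies exactly length bytes, so embedded null bytes are preserved.
inline std::string
from_buffer(char const* data, std::size_t length)
{
    assert(data != nullptr);
    return std::string(data, length);
}
```

Given the five bytes ``a b \0 c d``, ``from_c_string`` produces a two-byte string while ``from_buffer`` with a length of 5 keeps all five bytes. In a real program the length would come from the C API rather than being known in advance.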
| 3699 | +10.4.0: November 16, 2021 | |
| 3700 | + - Handling of Weak Cryptography Algorithms | |
| 3701 | + | |
| 3702 | + - From the qpdf CLI, the | |
| 3703 | + @1@option@1@--allow-weak-crypto@2@option@2@ is now required to | |
| 3704 | + suppress a warning when explicitly creating PDF files using RC4 | |
| 3705 | + encryption. While qpdf will always retain the ability to read | |
| 3706 | + and write such files, doing so will require explicit | |
| 3707 | + acknowledgment moving forward. For qpdf 10.4, this change only | |
| 3708 | + affects the command-line tool. Starting in qpdf 11, there will | |
| 3709 | + be small API changes to require explicit acknowledgment in | |
| 3710 | + those cases as well. For additional information, see `Weak | |
| 3711 | + Cryptography <#ref.weak-crypto>`__. | |
| 3712 | + | |
| 3713 | + - Bug Fixes | |
| 3714 | + | |
| 3715 | + - Fix potential bounds error when handling shell completion that | |
| 3716 | + could occur when given bogus input. | |
| 3717 | + | |
| 3718 | + - Properly handle overlay/underlay on completely empty pages | |
| 3719 | + (with no resource dictionary). | |
| 3720 | + | |
| 3721 | + - Fix crash that could occur under certain conditions when using | |
| 3722 | + @1@option@1@--pages@2@option@2@ with files that had form | |
| 3723 | + fields. | |
| 3724 | + | |
| 3725 | + - Library Enhancements | |
| 3726 | + | |
| 3727 | + - Make ``QPDF::findPage`` functions public. | |
| 3728 | + | |
| 3729 | + - Add methods to ``Pl_Flate`` to be able to receive warnings on | |
| 3730 | + certain recoverable conditions. | |
| 3731 | + | |
| 3732 | + - Add an extra check to the library to detect when foreign | |
| 3733 | + objects are inserted directly (instead of using | |
| 3734 | + ``QPDF::copyForeignObject``) at the time of insertion rather | |
| 3735 | + than when the file is written. Catching the error sooner makes | |
| 3736 | + it much easier to locate the incorrect code. | |
| 3737 | + | |
| 3738 | + - CLI Enhancements | |
| 3739 | + | |
| 3740 | + - Improve diagnostics around parsing | |
| 3741 | + @1@option@1@--pages@2@option@2@ command-line options | |
| 3742 | + | |
| 3743 | + - Packaging Changes | |
| 3744 | + | |
| 3745 | + - The Windows binary distribution is now built with crypto | |
| 3746 | + provided by OpenSSL 3.0. | |
| 3747 | + | |
| 3748 | +10.3.2: May 8, 2021 | |
| 3749 | + - Bug Fixes | |
| 3750 | + | |
| 3751 | + - When generating a file while preserving object streams, | |
| 3752 | + unreferenced objects are correctly removed unless | |
| 3753 | + @1@option@1@--preserve-unreferenced@2@option@2@ is specified. | |
| 3754 | + | |
| 3755 | + - Library Enhancements | |
| 3756 | + | |
| 3757 | + - When adding a page that already exists, make a shallow copy | |
| 3758 | + instead of throwing an exception. This makes the library | |
| 3759 | + behavior consistent with the CLI behavior. See | |
| 3760 | + @1@filename@1@ChangeLog@2@filename@2@ for additional notes. | |
| 3761 | + | |
| 3762 | +10.3.1: March 11, 2021 | |
| 3763 | + - Bug Fixes | |
| 3764 | + | |
| 3765 | + - Form field copying failed on files where /DR was a direct | |
| 3766 | + object in the document-level form dictionary. | |
| 3767 | + | |
| 3768 | +10.3.0: March 4, 2021 | |
| 3769 | + - Bug Fixes | |
| 3770 | + | |
| 3771 | + - The code for handling form fields when copying pages from | |
| 3772 | + 10.2.0 was not quite right and didn't work in a number of | |
| 3773 | + situations, such as when the same page was copied multiple | |
| 3774 | + times or when there were conflicting resource or field names | |
| 3775 | + across multiple copies. The 10.3.0 code has been much more | |
| 3776 | + thoroughly tested with more complex cases and with a multitude | |
| 3777 | + of readers and should be much closer to correct. The 10.2.0 | |
| 3778 | + code worked well enough for page splitting or for copying pages | |
| 3779 | + with form fields into documents that didn't already have them | |
| 3780 | + but was still not quite correct in handling of field-level | |
| 3781 | + resources. | |
| 3782 | + | |
| 3783 | + - When ``QPDF::replaceObject`` or ``QPDF::swapObjects`` is | |
| 3784 | + called, existing ``QPDFObjectHandle`` instances no longer point | |
| 3785 | + to the old objects. The next time they are accessed, they | |
| 3786 | + automatically notice the change to the underlying object and | |
| 3787 | + update themselves. This resolves a very longstanding source of | |
| 3788 | + confusion, albeit in a very rarely used method call. | |
| 3789 | + | |
| 3790 | + - Fix form field handling code to look for default appearances, | |
| 3791 | + quadding, and default resources in the right places. The code | |
| 3792 | + was not looking for things in the document-level interactive | |
| 3793 | + form dictionary that it was supposed to be finding there. This | |
| 3794 | + required adding a few new methods to | |
| 3795 | + ``QPDFFormFieldObjectHelper``. | |
| 3796 | + | |
| 3797 | + - Library Enhancements | |
| 3798 | + | |
| 3799 | + - Reworked the code that handles copying annotations and form | |
| 3800 | + fields during page operations. There were additional methods | |
| 3801 | +         added to the public API from 10.2.0 and one deprecation of a | |
| 3802 | + method added in 10.2.0. The majority of the API changes are in | |
| 3803 | + methods most people would never call and that will hopefully be | |
| 3804 | + superseded by higher-level interfaces for handling page copies. | |
| 3805 | + Please see the @1@filename@1@ChangeLog@2@filename@2@ file for | |
| 3806 | + details. | |
| 3807 | + | |
| 3808 | + - The method ``QPDF::numWarnings`` was added so that you can tell | |
| 3809 | + whether any warnings happened during a specific block of code. | |
| 3810 | + | |
| 3811 | +10.2.0: February 23, 2021 | |
| 3812 | + - CLI Behavior Changes | |
| 3813 | + | |
| 3814 | + - Operations that work on combining pages are much better about | |
| 3815 | + protecting form fields. In particular, | |
| 3816 | + @1@option@1@--split-pages@2@option@2@ and | |
| 3817 | +         @1@option@1@--pages@2@option@2@ now preserve interactive form | |
| 3818 | + functionality by copying the relevant form field information | |
| 3819 | + from the original files. Additionally, if you use | |
| 3820 | + @1@option@1@--pages@2@option@2@ to select only some pages from | |
| 3821 | + the original input file, unused form fields are removed, which | |
| 3822 | + prevents lots of unused annotations from being retained. | |
| 3823 | + | |
| 3824 | + - By default, @1@command@1@qpdf@2@command@2@ no longer allows | |
| 3825 | + creation of encrypted PDF files whose user password is | |
| 3826 | + non-empty and owner password is empty when a 256-bit key is in | |
| 3827 | + use. The @1@option@1@--allow-insecure@2@option@2@ option, | |
| 3828 | + specified inside the @1@option@1@--encrypt@2@option@2@ options, | |
| 3829 | + allows creation of such files. Behavior changes in the CLI are | |
| 3830 | + avoided when possible, but an exception was made here because | |
| 3831 | + this is security-related. qpdf must always allow creation of | |
| 3832 | + weird files for testing purposes, but it should not default to | |
| 3833 | + letting users unknowingly create insecure files. | |
| 3834 | + | |
| 3835 | + - Library Behavior Changes | |
| 3836 | + | |
| 3837 | + - Note: the changes in this section cause differences in output | |
| 3838 | + in some cases. These differences change the syntax of the PDF | |
| 3839 | + but do not change the semantics (meaning). I make a strong | |
| 3840 | + effort to avoid gratuitous changes in qpdf's output so that | |
| 3841 | + qpdf changes don't break people's tests. In this case, the | |
| 3842 | + changes significantly improve the readability of the generated | |
| 3843 | + PDF and don't affect any output that's generated by simple | |
| 3844 | + transformation. If you are annoyed by having to update test | |
| 3845 | + files, please rest assured that changes like this have been and | |
| 3846 | + will continue to be rare events. | |
| 3847 | + | |
| 3848 | + - ``QPDFObjectHandle::newUnicodeString`` now uses whichever of | |
| 3849 | +         ASCII, PDFDocEncoding, or UTF-16 is sufficient to encode all | |
| 3850 | + the characters in the string. This reduces needless encoding in | |
| 3851 | + UTF-16 of strings that can be encoded in ASCII. This change may | |
| 3852 | + cause qpdf to generate different output than before when form | |
| 3853 | + field values are set using ``QPDFFormFieldObjectHelper`` but | |
| 3854 | + does not change the meaning of the output. | |
| 3855 | + | |
| 3856 | + - The code that places form XObjects and also the code that | |
| 3857 | + flattens rotations trim trailing zeroes from real numbers that | |
| 3858 | + they calculate. This causes slight (but semantically | |
| 3859 | + equivalent) differences in generated appearance streams and | |
| 3860 | + form XObject invocations in overlay/underlay code or in user | |
| 3861 | + code that calls the methods that place form XObjects on a page. | |
| 3862 | + | |
| 3863 | + - CLI Enhancements | |
| 3864 | + | |
| 3865 | + - Add new command line options for listing, saving, adding, | |
| 3866 | +         removing, and copying file attachments. See `Embedded | |
| 3867 | + Files/Attachments Options <#ref.attachments>`__ for details. | |
| 3868 | + | |
| 3869 | + - Page splitting and merging operations, as well as | |
| 3870 | + @1@option@1@--flatten-rotation@2@option@2@, are better behaved | |
| 3871 | + with respect to annotations and interactive form fields. In | |
| 3872 | + most cases, interactive form field functionality and proper | |
| 3873 | + formatting and functionality of annotations is preserved by | |
| 3874 | + these operations. There are still some cases that aren't | |
| 3875 | + perfect, such as when functionality of annotations depends on | |
| 3876 | + document-level data that qpdf doesn't yet understand or when | |
| 3877 | + there are problems with referential integrity among form fields | |
| 3878 | + and annotations (e.g., when a single form field object or its | |
| 3879 | + associated annotations are shared across multiple pages, a case | |
| 3880 | + that is out of spec but that works in most viewers anyway). | |
| 3881 | + | |
| 3882 | + - The option | |
| 3883 | + @1@option@1@--password-file=@1@replaceable@1@filename@2@replaceable@2@@2@option@2@ | |
| 3884 | + can now be used to read the decryption password from a file. | |
| 3885 | + You can use ``-`` as the file name to read the password from | |
| 3886 | + standard input. This is an easier/more obvious way to read | |
| 3887 | + passwords from files or standard input than using | |
| 3888 | + @1@option@1@@file@2@option@2@ for this purpose. | |
| 3889 | + | |
| 3890 | + - Add some information about attachments to the json output, and | |
| 3891 | +         add ``attachments`` as an additional json key. The | |
| 3892 | + information included here is limited to the preferred name and | |
| 3893 | + content stream and a reference to the file spec object. This is | |
| 3894 | + enough detail for clients to avoid the hassle of navigating a | |
| 3895 | + name tree and provides what is needed for basic enumeration and | |
| 3896 | + extraction of attachments. More detailed information can be | |
| 3897 | + obtained by following the reference to the file spec object. | |
| 3898 | + | |
| 3899 | + - Add numeric option to @1@option@1@--collate@2@option@2@. If | |
| 3900 | + @1@option@1@--collate=@1@replaceable@1@n@2@replaceable@2@@2@option@2@ | |
| 3901 | + is given, take pages in groups of | |
| 3902 | + @1@replaceable@1@n@2@replaceable@2@ from the given files. | |
| 3903 | + | |
| 3904 | + - It is now valid to provide @1@option@1@--rotate=0@2@option@2@ | |
| 3905 | + to clear rotation from a page. | |
| 3906 | + | |
| 3907 | + - Library Enhancements | |
| 3908 | + | |
| 3909 | + - This release includes numerous additions to the API. Not all | |
| 3910 | + changes are listed here. Please see the | |
| 3911 | + @1@filename@1@ChangeLog@2@filename@2@ file in the source | |
| 3912 | + distribution for a comprehensive list. Highlights appear below. | |
| 3913 | + | |
| 3914 | + - Add ``QPDFObjectHandle::ditems()`` and | |
| 3915 | + ``QPDFObjectHandle::aitems()`` that enable C++-style iteration, | |
| 3916 | + including range-for iteration, over dictionary and array | |
| 3917 | + QPDFObjectHandles. See comments in | |
| 3918 | + @1@filename@1@include/qpdf/QPDFObjectHandle.hh@2@filename@2@ | |
| 3919 | + and | |
| 3920 | + @1@filename@1@examples/pdf-name-number-tree.cc@2@filename@2@ | |
| 3921 | + for details. | |
| 3922 | + | |
| 3923 | + - Add ``QPDFObjectHandle::copyStream`` for making a copy of a | |
| 3924 | + stream within the same ``QPDF`` instance. | |
| 3925 | + | |
| 3926 | + - Add new helper classes for supporting file attachments, also | |
| 3927 | + known as embedded files. New classes are | |
| 3928 | + ``QPDFEmbeddedFileDocumentHelper``, | |
| 3929 | + ``QPDFFileSpecObjectHelper``, and ``QPDFEFStreamObjectHelper``. | |
| 3930 | + See their respective headers for details and | |
| 3931 | + @1@filename@1@examples/pdf-attach-file.cc@2@filename@2@ for an | |
| 3932 | + example. | |
| 3933 | + | |
| 3934 | + - Add a version of ``QPDFObjectHandle::parse`` that takes a | |
| 3935 | + ``QPDF`` pointer as context so that it can parse strings | |
| 3936 | + containing indirect object references. This is illustrated in | |
| 3937 | + @1@filename@1@examples/pdf-attach-file.cc@2@filename@2@. | |
| 3938 | + | |
| 3939 | + - Re-implement ``QPDFNameTreeObjectHelper`` and | |
| 3940 | + ``QPDFNumberTreeObjectHelper`` to be more efficient, add an | |
| 3941 | + iterator-based API, give them the capability to repair broken | |
| 3942 | + trees, and create methods for modifying the trees. With this | |
| 3943 | + change, qpdf has a robust read/write implementation of name and | |
| 3944 | + number trees. | |
| 3945 | + | |
| 3946 | + - Add new versions of ``QPDFObjectHandle::replaceStreamData`` | |
| 3947 | + that take ``std::function`` objects for cases when you need | |
| 3948 | + something between a static string and a full-fledged | |
| 3949 | + StreamDataProvider. Using this with ``QUtil::file_provider`` is | |
| 3950 | + a very easy way to create a stream from the contents of a file. | |
| 3951 | + | |
| 3952 | + - The ``QPDFMatrix`` class, formerly a private, internal class, | |
| 3953 | + has been added to the public API. See | |
| 3954 | + @1@filename@1@include/qpdf/QPDFMatrix.hh@2@filename@2@ for | |
| 3955 | + details. This class is for working with transformation | |
| 3956 | + matrices. Some methods in ``QPDFPageObjectHelper`` make use of | |
| 3957 | + this to make information about transformation matrices | |
| 3958 | + available. For an example, see | |
| 3959 | + @1@filename@1@examples/pdf-overlay-page.cc@2@filename@2@. | |
| 3960 | + | |
| 3961 | + - Several new methods were added to | |
| 3962 | + ``QPDFAcroFormDocumentHelper`` for adding, removing, getting | |
| 3963 | + information about, and enumerating form fields. | |
| 3964 | + | |
| 3965 | + - Add method | |
| 3966 | + ``QPDFAcroFormDocumentHelper::transformAnnotations``, which | |
| 3967 | + applies a transformation to each annotation on a page. | |
| 3968 | + | |
| 3969 | + - Add ``QPDFPageObjectHelper::copyAnnotations``, which copies | |
| 3970 | + annotations and, if applicable, associated form fields, from | |
| 3971 | + one page to another, possibly transforming the rectangles. | |
| 3972 | + | |
| 3973 | + - Build Changes | |
| 3974 | + | |
| 3975 | + - A C++-14 compiler is now required to build qpdf. There is no | |
| 3976 | + intention to require anything newer than that for a while. | |
| 3977 | + C++-14 includes modest enhancements to C++-11 and appears to be | |
| 3978 | + supported about as widely as C++-11. | |
| 3979 | + | |
| 3980 | + - Bug Fixes | |
| 3981 | + | |
| 3982 | + - The @1@option@1@--flatten-rotation@2@option@2@ option applies | |
| 3983 | + transformations to any annotations that may be on the page. | |
| 3984 | + | |
| 3985 | + - If a form XObject lacks a resources dictionary, consider any | |
| 3986 | + names in that form XObject to be referenced from the containing | |
| 3987 | + page. This is compliant with older PDF versions. Also detect if | |
| 3988 | + any form XObjects have any unresolved names and, if so, don't | |
| 3989 | + remove unreferenced resources from them or from the page that | |
| 3990 | + contains them. Unfortunately this has the side effect of | |
| 3991 | + preventing removal of unreferenced resources in some cases | |
| 3992 | + where names appear that don't refer to resources, such as with | |
| 3993 | + tagged PDF. This is a bit of a corner case that is not likely | |
| 3994 | +         to cause a significant problem in practice, and the only side | |
| 3995 | + effect would be lack of removal of shared resources. A future | |
| 3996 | + version of qpdf may be more sophisticated in its detection of | |
| 3997 | + names that refer to resources. | |
| 3998 | + | |
| 3999 | + - Properly handle strings if they appear in inline image | |
| 4000 | + dictionaries while externalizing inline images. | |
| 4001 | + | |
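The numeric ``--collate=n`` option from the 10.2.0 CLI enhancements can be sketched as a round-robin over the input files. This is an illustration only: ``collate_pages`` and ``PageRef`` are hypothetical names, and the sketch assumes that once a file runs out of pages it is skipped while the remaining files continue to contribute groups of ``n``.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// (file index, zero-based page index within that file)
using PageRef = std::pair<std::size_t, std::size_t>;

// Take up to n pages from each file in turn until every file is exhausted.
inline std::vector<PageRef>
collate_pages(std::vector<std::size_t> const& page_counts, std::size_t n)
{
    assert(n >= 1);
    std::vector<PageRef> result;
    std::vector<std::size_t> next(page_counts.size(), 0);
    bool took_any = true;
    while (took_any) {
        took_any = false;
        for (std::size_t f = 0; f < page_counts.size(); ++f) {
            std::size_t const end = std::min(next[f] + n, page_counts[f]);
            for (; next[f] < end; ++next[f]) {
                result.emplace_back(f, next[f]);
                took_any = true;
            }
        }
    }
    return result;
}
```

With two files of 3 and 5 pages and ``n`` equal to 2, the result takes pages 0 and 1 of each file, then page 2 of the first file and pages 2 and 3 of the second, then the last page of the second file.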
| 4002 | +10.1.0: January 5, 2021 | |
| 4003 | + - CLI Enhancements | |
| 4004 | + | |
| 4005 | + - Add @1@option@1@--flatten-rotation@2@option@2@ command-line | |
| 4006 | + option, which causes all pages that are rotated using | |
| 4007 | + parameters in the page's dictionary to instead be identically | |
| 4008 | + rotated in the page's contents. The change is not user-visible | |
| 4009 | + for compliant PDF readers but can be used to work around broken | |
| 4010 | + PDF applications that don't properly handle page rotation. | |
| 4011 | + | |
| 4012 | + - Library Enhancements | |
| 4013 | + | |
| 4014 | + - Support for user-provided (pluggable, modular) stream filters. | |
| 4015 | + It is now possible to derive a class from ``QPDFStreamFilter`` | |
| 4016 | + and register it with ``QPDF`` so that regular library methods, | |
| 4017 | + including those used by ``QPDFWriter``, can decode streams with | |
| 4018 | + filters not directly supported by the library. The example | |
| 4019 | + @1@filename@1@examples/pdf-custom-filter.cc@2@filename@2@ | |
| 4020 | + illustrates how to use this capability. | |
| 4021 | + | |
| 4022 | + - Add methods to ``QPDFPageObjectHelper`` to iterate through | |
| 4023 | + XObjects on a page or form XObjects, possibly recursing into | |
| 4024 | +         nested form XObjects: ``forEachXObject``, ``forEachImage``, | |
| 4025 | + ``forEachFormXObject``. | |
| 4026 | + | |
| 4027 | + - Enhance several methods in ``QPDFPageObjectHelper`` to work | |
| 4028 | + with form XObjects as well as pages, as noted in comments. See | |
| 4029 | + @1@filename@1@ChangeLog@2@filename@2@ for a full list. | |
| 4030 | + | |
| 4031 | + - Rename some functions in ``QPDFPageObjectHelper``, while | |
| 4032 | + keeping old names for compatibility: | |
| 4033 | + | |
| 4034 | + - ``getPageImages`` to ``getImages`` | |
| 4035 | + | |
| 4036 | + - ``filterPageContents`` to ``filterContents`` | |
| 4037 | + | |
| 4038 | + - ``pipePageContents`` to ``pipeContents`` | |
| 4039 | + | |
| 4040 | + - ``parsePageContents`` to ``parseContents`` | |
| 4041 | + | |
| 4042 | + - Add method ``QPDFPageObjectHelper::getFormXObjects`` to return | |
| 4043 | + a map of form XObjects directly on a page or form XObject | |
| 4044 | + | |
| 4045 | + - Add new helper methods to ``QPDFObjectHandle``: | |
| 4046 | + ``isFormXObject``, ``isImage`` | |
| 4047 | + | |
| 4048 | +       - Add the optional ``allow_streams`` parameter to | |
| 4049 | + ``QPDFObjectHandle::makeDirect``. When | |
| 4050 | + ``QPDFObjectHandle::makeDirect`` is called in this way, it | |
| 4051 | + preserves references to streams rather than throwing an | |
| 4052 | + exception. | |
| 4053 | + | |
| 4054 | + - Add ``QPDFObjectHandle::setFilterOnWrite`` method. Calling this | |
| 4055 | + on a stream prevents ``QPDFWriter`` from attempting to | |
| 4056 | + uncompress, recompress, or otherwise filter a stream even if it | |
| 4057 | + could. Developers can use this to protect streams that are | |
| 4058 | +         optimized or should be protected from ``QPDFWriter``'s default | |
| 4059 | + behavior for any other reason. | |
| 4060 | + | |
| 4061 | + - Add ``ostream`` ``<<`` operator for ``QPDFObjGen``. This is | |
| 4062 | + useful to have for debugging. | |
| 4063 | + | |
| 4064 | + - Add method ``QPDFPageObjectHelper::flattenRotation``, which | |
| 4065 | + replaces a page's ``/Rotate`` keyword by rotating the page | |
| 4066 | + within the content stream and altering the page's bounding | |
| 4067 | + boxes so the rendering is the same. This can be used to work | |
| 4068 | + around buggy PDF readers that can't properly handle page | |
| 4069 | + rotation. | |
| 4070 | + | |
| 4071 | + - C API Enhancements | |
| 4072 | + | |
| 4073 | + - Add several new functions to the C API for working with | |
| 4074 | + objects. These are wrappers around many of the methods in | |
| 4075 | + ``QPDFObjectHandle``. Their inclusion adds considerable new | |
| 4076 | + capability to the C API. | |
| 4077 | + | |
| 4078 | + - Add ``qpdf_register_progress_reporter`` to the C API, | |
| 4079 | + corresponding to ``QPDFWriter::registerProgressReporter``. | |
| 4080 | + | |
| 4081 | + - Performance Enhancements | |
| 4082 | + | |
| 4083 | + - Improve steps ``QPDFWriter`` takes to prepare a ``QPDF`` object | |
| 4084 | + for writing, resulting in about an 8% improvement in write | |
| 4085 | + performance while allowing indirect objects to appear in | |
| 4086 | + ``/DecodeParms``. | |
| 4087 | + | |
| 4088 | + - When extracting pages, the @1@command@1@qpdf@2@command@2@ CLI | |
| 4089 | + only removes unreferenced resources from the pages that are | |
| 4090 | + being kept, resulting in a significant performance improvement | |
| 4091 | + when extracting small numbers of pages from large, complex | |
| 4092 | + documents. | |
| 4093 | + | |
| 4094 | + - Bug Fixes | |
| 4095 | + | |
| 4096 | + - ``QPDFPageObjectHelper::externalizeInlineImages`` was not | |
| 4097 | + externalizing images referenced from form XObjects that | |
| 4098 | + appeared on the page. | |
| 4099 | + | |
| 4100 | + - ``QPDFObjectHandle::filterPageContents`` was broken for pages | |
| 4101 | + with multiple content streams. | |
| 4102 | + | |
| 4103 | + - Tweak zsh completion code to behave a little better with | |
| 4104 | + respect to path completion. | |
| 4105 | + | |
| 4106 | +10.0.4: November 21, 2020 | |
| 4107 | + - Bug Fixes | |
| 4108 | + | |
| 4109 | + - Fix a handful of integer overflows. This includes cases found | |
| 4110 | + by fuzzing as well as having qpdf not do range checking on | |
| 4111 | + unused values in the xref stream. | |
| 4112 | + | |
| 4113 | +10.0.3: October 31, 2020 | |
| 4114 | + - Bug Fixes | |
| 4115 | + | |
| 4116 | + - The fix to the bug involving copying streams with indirect | |
| 4117 | + filters was incorrect and introduced a new, more serious bug. | |
| 4118 | + The original bug has been fixed correctly, as has the bug | |
| 4119 | + introduced in 10.0.2. | |
| 4120 | + | |
| 4121 | +10.0.2: October 27, 2020 | |
| 4122 | + - Bug Fixes | |
| 4123 | + | |
| 4124 | + - When concatenating content streams, as with | |
| 4125 | + @1@option@1@--coalesce-contents@2@option@2@, there were cases | |
| 4126 | + in which qpdf would merge two lexical tokens together, creating | |
| 4127 | + invalid results. A newline is now inserted between merged | |
| 4128 | + content streams if one is not already present. | |
| 4129 | + | |
| 4130 | + - Fix an internal error that could occur when copying foreign | |
| 4131 | + streams whose stream data had been replaced using a stream data | |
| 4132 | + provider if those streams had indirect filters or decode | |
| 4133 | + parameters. This is a rare corner case. | |
| 4134 | + | |
| 4135 | + - Ensure that the caller's locale settings do not change the | |
| 4136 | + results of numeric conversions performed internally by the qpdf | |
| 4137 | + library. Note that the problem here could only be caused when | |
| 4138 | + the qpdf library was used programmatically. Using the qpdf CLI | |
| 4139 | + already ignored the user's locale for numeric conversion. | |
| 4140 | + | |
| 4141 | + - Fix several instances in which warnings were not suppressed in | |
| 4142 | + spite of @1@option@1@--no-warn@2@option@2@ and/or errors or | |
| 4143 | + warnings were written to standard output rather than standard | |
| 4144 | + error. | |
| 4145 | + | |
| 4146 | + - Fixed a memory leak that could occur under specific | |
| 4147 | + circumstances when | |
| 4148 | + @1@option@1@--object-streams=generate@2@option@2@ was used. | |
| 4149 | + | |
| 4150 | + - Fix various integer overflows and similar conditions found by | |
| 4151 | + the OSS-Fuzz project. | |
| 4152 | + | |
| 4153 | + - Enhancements | |
| 4154 | + | |
| 4155 | + - New option @1@option@1@--warning-exit-0@2@option@2@ causes qpdf | |
| 4156 | + to exit with a status of ``0`` rather than ``3`` if there are | |
| 4157 | + warnings but no errors. Combine with | |
| 4158 | + @1@option@1@--no-warn@2@option@2@ to completely ignore | |
| 4159 | + warnings. | |
| 4160 | + | |
| 4161 | + - Performance improvements have been made to | |
| 4162 | + ``QPDF::processMemoryFile``. | |
| 4163 | + | |
| 4164 | + - The OpenSSL crypto provider produces more detailed error | |
| 4165 | + messages. | |
| 4166 | + | |
| 4167 | + - Build Changes | |
| 4168 | + | |
| 4169 | + - The option @1@option@1@--disable-rpath@2@option@2@ is now | |
| 4170 | + supported by qpdf's @1@command@1@./configure@2@command@2@ | |
| 4171 | + script. Some distributions' packaging standards recommended the | |
| 4172 | + use of this option. | |
| 4173 | + | |
| 4174 | + - Selection of a printf format string for ``long | |
| 4175 | + long`` has been moved from ``ifdefs`` to an autoconf | |
| 4176 | + test. If you are using your own build system, you will need to | |
| 4177 | + provide a value for ``LL_FMT`` in | |
| 4178 | + @1@filename@1@libqpdf/qpdf/qpdf-config.h@2@filename@2@, which | |
| 4179 | + would typically be ``"%lld"`` or, for some Windows compilers, | |
| 4180 | + ``"%I64d"``. | |
| 4181 | + | |
| 4182 | + - Several improvements were made to build-time configuration of | |
| 4183 | + the OpenSSL crypto provider. | |
| 4184 | + | |
| 4185 | + - A nearly stand-alone Linux binary zip file is now included with | |
| 4186 | + the qpdf release. This is built on an older (but supported) | |
| 4187 | + Ubuntu LTS release, but would work on most reasonably recent | |
| 4188 | + Linux distributions. It contains only the executables and | |
| 4189 | + required shared libraries that would not be present on a | |
| 4190 | + minimal system. It can be used for including qpdf in a minimal | |
| 4191 | + environment, such as a docker container. The zip file is also | |
| 4192 | + known to work as a layer in AWS Lambda. | |
| 4193 | + | |
| 4194 | + - QPDF's automated build has been migrated from Azure Pipelines | |
| 4195 | + to GitHub Actions. | |
| 4196 | + | |
| 4197 | + - Windows-specific Changes | |
| 4198 | + | |
| 4199 | + - The Windows executables distributed with qpdf releases now use | |
| 4200 | + the OpenSSL crypto provider by default. The native crypto | |
| 4201 | + provider is also compiled in and can be selected at runtime | |
| 4202 | + with the ``QPDF_CRYPTO_PROVIDER`` environment variable. | |
| 4203 | + | |
| 4204 | + - Improvements have been made to how a cryptographic provider is | |
| 4205 | +         obtained in the native Windows crypto implementation. However, | |
| 4206 | +         this is mostly shadowed by OpenSSL being used by default. | |
| 4207 | + | |
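The 10.0.2 fix for concatenating content streams can be sketched as follows. This is a minimal illustration of the rule stated in the bug-fix entry (insert a newline between merged streams if one is not already present), not the library's implementation; ``coalesce_contents`` is a hypothetical name.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Join content streams, adding a newline between two streams only when the
// text accumulated so far does not already end with one. Without the
// separator, the last token of one stream could fuse with the first token
// of the next.
inline std::string
coalesce_contents(std::vector<std::string> const& streams)
{
    std::string result;
    for (auto const& s : streams) {
        if (!result.empty() && result.back() != '\n') {
            result += '\n';
        }
        result += s;
    }
    return result;
}
```

Joining ``q 1 w``, ``0 0 m`` followed by a newline, and ``S Q`` gives three lines with exactly one newline between each pair of streams.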
| 4208 | +10.0.1: April 9, 2020 | |
| 4209 | + - Bug Fixes | |
| 4210 | + | |
| 4211 | + - 10.0.0 introduced a bug in which calling | |
| 4212 | + ``QPDFObjectHandle::getStreamData`` on a stream that can't be | |
| 4213 | + filtered was returning the raw data instead of throwing an | |
| 4214 | + exception. This is now fixed. | |
| 4215 | + | |
| 4216 | + - Fix a bug that was preventing qpdf from linking with some | |
| 4217 | + versions of clang on some platforms. | |
| 4218 | + | |
| 4219 | + - Enhancements | |
| 4220 | + | |
| 4221 | + - Improve the @1@filename@1@pdf-invert-images@2@filename@2@ | |
| 4222 | + example to avoid having to load all the images into RAM at the | |
| 4223 | + same time. | |
| 4224 | + | |
| 4225 | +10.0.0: April 6, 2020 | |
| 4226 | + - Performance Enhancements | |
| 4227 | + | |
| 4228 | + - The qpdf library and executable should run much faster in this | |
| 4229 | + version than in the last several releases. Several internal | |
| 4230 | + library optimizations have been made, and there has been | |
| 4231 | + improved behavior on page splitting as well. This version of | |
| 4232 | + qpdf should outperform any of the 8.x or 9.x versions. | |
| 4233 | + | |
| 4234 | + - Incompatible API (source-level) Changes (minor) | |
| 4235 | + | |
| 4236 | + - The ``QUtil::srandom`` method was removed. It didn't do | |
| 4237 | + anything unless insecure random numbers were compiled in, and | |
| 4238 | + they have been off by default for a long time. If you were | |
| 4239 | + calling it, just remove the call since it wasn't doing anything | |
| 4240 | + anyway. | |
| 4241 | + | |
| 4242 | + - Build/Packaging Changes | |
| 4243 | + | |
| 4244 | + Add an ``openssl`` crypto provider, which is implemented with | |
| 4245 | + OpenSSL and also works with BoringSSL. Thanks to Dean Scarff | |
| 4246 | + for this contribution. If you maintain qpdf for a distribution, | |
| 4247 | + pay special attention to make sure that you are including | |
| 4248 | + support for the crypto providers you want. Package maintainers | |
| 4249 | + will have to weigh the advantages of allowing users to pick a | |
| 4250 | + crypto provider at runtime against the disadvantages of adding | |
| 4251 | + more dependencies to qpdf. | |
| 4252 | + | |
| 4253 | + Allow qpdf to be built on stripped-down systems whose C/C++ | |
| 4254 | + libraries lack the ``wchar_t`` type. Search for ``wchar_t`` in | |
| 4255 | + qpdf's README.md for details. This should be very rare, but it | |
| 4256 | + is known to be helpful in some embedded environments. | |
| 4257 | + | |
| 4258 | + - CLI Enhancements | |
| 4259 | + | |
| 4260 | + - Add ``objectinfo`` key to the JSON output. This will be a place | |
| 4261 | + to put computed metadata or other information about PDF objects | |
| 4262 | + that are not immediately evident in other ways or that seem | |
| 4263 | + useful for some other reason. In this version, information is | |
| 4264 | + provided about each object indicating whether it is a stream | |
| 4265 | + and, if so, what its length and filters are. Without this, it | |
| 4266 | + was not possible to tell conclusively from the JSON output | |
| 4267 | + alone whether or not an object was a stream. Run | |
| 4268 | + @1@command@1@qpdf --json-help@2@command@2@ for details. | |
| 4269 | + | |
| 4270 | + - Add new option | |
| 4271 | + @1@option@1@--remove-unreferenced-resources@2@option@2@ which | |
| 4272 | + takes ``auto``, ``yes``, or ``no`` as arguments. The new | |
| 4273 | + ``auto`` mode, which is the default, performs a fast heuristic | |
| 4274 | + over a PDF file when splitting pages to determine whether the | |
| 4275 | + expensive process of finding and removing unreferenced | |
| 4276 | + resources is likely to be of benefit. For most files, this new | |
| 4277 | + default will result in a significant performance improvement | |
| 4278 | + for splitting pages. See `Advanced Transformation | |
| 4279 | + Options <#ref.advanced-transformation>`__ for a more detailed | |
| 4280 | + discussion. | |
| 4281 | + | |
| 4282 | + - The @1@option@1@--preserve-unreferenced-resources@2@option@2@ | |
| 4283 | + is now just a synonym for | |
| 4284 | + @1@option@1@--remove-unreferenced-resources=no@2@option@2@. | |
| 4285 | + | |
| 4286 | + - If the ``QPDF_EXECUTABLE`` environment variable is set when | |
| 4287 | + invoking @1@command@1@qpdf --completion-bash@2@command@2@ or | |
| 4288 | + @1@command@1@qpdf --completion-zsh@2@command@2@, the completion | |
| 4289 | + command that it outputs will refer to qpdf using the value of | |
| 4290 | + that variable rather than what @1@command@1@qpdf@2@command@2@ | |
| 4291 | + determines its executable path to be. This can be useful when | |
| 4292 | + wrapping @1@command@1@qpdf@2@command@2@ with a script, working | |
| 4293 | + with a version in the source tree, using an AppImage, or other | |
| 4294 | + situations where there is some indirection. | |
| 4295 | + | |
| 4296 | + - Library Enhancements | |
| 4297 | + | |
| 4298 | + - Random number generation is now delegated to the crypto | |
| 4299 | + provider. The old behavior is still used by the native crypto | |
| 4300 | + provider. It is still possible to provide your own random | |
| 4301 | + number generator. | |
| 4302 | + | |
| 4303 | + - Add a new version of | |
| 4304 | + ``QPDFObjectHandle::StreamDataProvider::provideStreamData`` | |
| 4305 | + that accepts the ``suppress_warnings`` and ``will_retry`` | |
| 4306 | + options and allows a success code to be returned. This makes it | |
| 4307 | + possible to implement a ``StreamDataProvider`` that calls | |
| 4308 | + ``pipeStreamData`` on another stream and to pass the response | |
| 4309 | + back to the caller, which enables better error handling on | |
| 4310 | + those proxied streams. | |
| 4311 | + | |
| 4312 | + - Update ``QPDFObjectHandle::pipeStreamData`` to return an | |
| 4313 | + overall success code that goes beyond whether or not filtered | |
| 4314 | + data was written successfully. This allows better error | |
| 4315 | + handling of cases that were not filtering errors. You have to | |
| 4316 | + call this explicitly. Methods in previously existing APIs have | |
| 4317 | + the same semantics as before. | |
| 4318 | + | |
| 4319 | + - The ``QPDFPageObjectHelper::placeFormXObject`` method now | |
| 4320 | + allows separate control over whether it should be willing to | |
| 4321 | + shrink or expand objects to fit them better into the | |
| 4322 | + destination rectangle. The previous behavior was that shrinking | |
| 4323 | + was allowed but expansion was not. The previous behavior is | |
| 4324 | + still the default. | |
| 4325 | + | |
| 4326 | + - When calling the C API, any non-zero value passed to a boolean | |
| 4327 | + parameter is treated as ``TRUE``. Previously only the value | |
| 4328 | + ``1`` was accepted. This makes the C API behave more like most | |
| 4329 | + C interfaces and is known to improve compatibility with some | |
| 4330 | + Windows environments that dynamically load the DLL and call | |
| 4331 | + functions from it. | |
| 4332 | + | |
| 4333 | + - Add ``QPDFObjectHandle::unsafeShallowCopy`` for copying only | |
| 4334 | + top-level dictionary keys or array items. This is unsafe | |
| 4335 | + because it creates a situation in which changing a lower-level | |
| 4336 | + item in one object may also change it in another object, but | |
| 4337 | + for cases in which you *know* you are only inserting or | |
| 4338 | + replacing top-level items, it is much faster than | |
| 4339 | + ``QPDFObjectHandle::shallowCopy``. | |
| 4340 | + | |
| 4341 | + Add ``QPDFObjectHandle::filterAsContents``, which filters a | |
| 4342 | + stream's data as a content stream. This is useful for parsing | |
| 4343 | + the contents for form XObjects in the same way as parsing page | |
| 4344 | + content streams. | |
| 4345 | + | |
| 4346 | + - Bug Fixes | |
| 4347 | + | |
| 4348 | + - When detecting and removing unreferenced resources during page | |
| 4349 | + splitting, traverse into form XObjects and handle their | |
| 4350 | + resources dictionaries as well. | |
| 4351 | + | |
| 4352 | + The same error recovery is applied to streams in files other | |
| 4353 | + than the primary input file when merging or splitting pages. | |
| 4354 | + | |
| 4355 | +9.1.1: January 26, 2020 | |
| 4356 | + - Build/Packaging Changes | |
| 4357 | + | |
| 4358 | + - The fix-qdf program was converted from perl to C++. As such, | |
| 4359 | + qpdf no longer has a runtime dependency on perl. | |
| 4360 | + | |
| 4361 | + - Library Enhancements | |
| 4362 | + | |
| 4363 | + - Added new helper routine ``QUtil::call_main_from_wmain`` which | |
| 4364 | + converts ``wchar_t`` arguments to UTF-8 encoded strings. This | |
| 4365 | + is useful for qpdf because library methods expect file names to | |
| 4366 | + be UTF-8 encoded, even on Windows. | |
| 4367 | + | |
| 4368 | + - Added new ``QUtil::read_lines_from_file`` methods that take | |
| 4369 | + ``FILE*`` arguments and that allow preservation of end-of-line | |
| 4370 | + characters. This also fixes a bug where | |
| 4371 | + ``QUtil::read_lines_from_file`` wouldn't work properly with | |
| 4372 | + Unicode filenames. | |
| 4373 | + | |
| 4374 | + - CLI Enhancements | |
| 4375 | + | |
| 4376 | + - Added options @1@option@1@--is-encrypted@2@option@2@ and | |
| 4377 | + @1@option@1@--requires-password@2@option@2@ for testing whether | |
| 4378 | + a file is encrypted or requires a password other than the | |
| 4379 | + supplied (or empty) password. These communicate via exit | |
| 4380 | + status, making them useful for shell scripts. They also work on | |
| 4381 | + encrypted files with unknown passwords. | |
| 4382 | + | |
| 4383 | + - Added ``encrypt`` key to JSON options. With the exception of | |
| 4384 | + the reconstructed user password for older encryption formats, | |
| 4385 | + this provides the same information as | |
| 4386 | + @1@option@1@--show-encryption@2@option@2@ but in a consistent, | |
| 4387 | + parseable format. See output of @1@command@1@qpdf | |
| 4388 | + --json-help@2@command@2@ for details. | |
| 4389 | + | |
| 4390 | + - Bug Fixes | |
| 4391 | + | |
| 4392 | + - In QDF mode, be sure not to write more than one XRef stream to | |
| 4393 | + a file, even when | |
| 4394 | + @1@option@1@--preserve-unreferenced@2@option@2@ is used. | |
| 4395 | + @1@command@1@fix-qdf@2@command@2@ assumes that there is only | |
| 4396 | + one XRef stream, and that it appears at the end of the file. | |
| 4397 | + | |
| 4398 | + - When externalizing inline images, properly handle images whose | |
| 4399 | + color space is a reference to an object in the page's resource | |
| 4400 | + dictionary. | |
| 4401 | + | |
| 4402 | + - Windows-specific fix for acquiring crypt context with a new | |
| 4403 | + keyset. | |
| 4404 | + | |
| 4405 | +9.1.0: November 17, 2019 | |
| 4406 | + - Build Changes | |
| 4407 | + | |
| 4408 | + A C++11 compiler is now required to build qpdf. | |
| 4409 | + | |
| 4410 | + - A new crypto provider that uses gnutls for crypto functions is | |
| 4411 | + now available and can be enabled at build time. See `Crypto | |
| 4412 | + Providers <#ref.crypto>`__ for more information about crypto | |
| 4413 | + providers and `Build Support For Crypto | |
| 4414 | + Providers <#ref.crypto.build>`__ for specific information about | |
| 4415 | + the build. | |
| 4416 | + | |
| 4417 | + - Library Enhancements | |
| 4418 | + | |
| 4419 | + - Incorporate contribution from Masamichi Hosoda to properly | |
| 4420 | + handle signature dictionaries by not including them in object | |
| 4421 | + streams, formatting the ``/Contents`` key as a hexadecimal | |
| 4422 | + string, and excluding the ``/Contents`` key from encryption and | |
| 4423 | + decryption. | |
| 4424 | + | |
| 4425 | + - Incorporate contribution from Masamichi Hosoda to provide new | |
| 4426 | + API calls for getting file-level information about input and | |
| 4427 | + output files, enabling certain operations on the files at the | |
| 4428 | + file level rather than the object level. New methods include | |
| 4429 | + ``QPDF::getXRefTable()``, | |
| 4430 | + ``QPDFObjectHandle::getParsedOffset()``, | |
| 4431 | + ``QPDFWriter::getRenumberedObjGen(QPDFObjGen)``, and | |
| 4432 | + ``QPDFWriter::getWrittenXRefTable()``. | |
| 4433 | + | |
| 4434 | + - Support build-time and runtime selectable crypto providers. | |
| 4435 | + This includes the addition of new classes | |
| 4436 | + ``QPDFCryptoProvider`` and ``QPDFCryptoImpl`` and the | |
| 4437 | + recognition of the ``QPDF_CRYPTO_PROVIDER`` environment | |
| 4438 | + variable. Crypto providers are described in depth in `Crypto | |
| 4439 | + Providers <#ref.crypto>`__. | |
| 4440 | + | |
| 4441 | + - CLI Enhancements | |
| 4442 | + | |
| 4443 | + - Addition of the @1@option@1@--show-crypto@2@option@2@ option in | |
| 4444 | + support of selectable crypto providers, as described in `Crypto | |
| 4445 | + Providers <#ref.crypto>`__. | |
| 4446 | + | |
| 4447 | + - Allow ``:even`` or ``:odd`` to be appended to numeric ranges | |
| 4448 | + for specification of the even or odd pages from among the pages | |
| 4449 | + specified in the range. | |
| 4450 | + | |
| 4451 | + - Fix shell wildcard expansion behavior (``*`` and ``?``) of the | |
| 4452 | + @1@command@1@qpdf.exe@2@command@2@ as built by MSVC. | |
| 4453 | + | |
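The ``:even`` and ``:odd`` suffixes described in the 9.1.0 CLI enhancements can be pictured with a small sketch. It is not qpdf's implementation: ``select_parity`` is a hypothetical helper, and it assumes that parity is judged by position within the expanded range (first, second, third entry) rather than by the page number itself, which is one reading of "from among the pages specified in the range".

```python
def select_parity(pages, parity):
    """Return every other page from an already expanded page range.

    Assumption (not confirmed by the release note): "odd" keeps the
    1st, 3rd, 5th... entries of the range and "even" keeps the 2nd,
    4th, 6th..., so position in the range decides, not the page number.
    """
    if parity not in ("odd", "even"):
        raise ValueError("parity must be 'odd' or 'even'")
    # Position 1 in the range is list index 0.
    start = 0 if parity == "odd" else 1
    return pages[start::2]
```

Under that assumption, a range that expands to pages 3 through 7 would keep pages 3, 5, and 7 for ``:odd`` and pages 4 and 6 for ``:even``.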
| 4454 | +9.0.2: October 12, 2019 | |
| 4455 | + - Bug Fix | |
| 4456 | + | |
| 4457 | + - Fix the name of the temporary file used by | |
| 4458 | + @1@option@1@--replace-input@2@option@2@ so that it doesn't | |
| 4459 | + require path splitting and works with paths that include | |
| 4460 | + directories. | |
| 4461 | + | |
| 4462 | +9.0.1: September 20, 2019 | |
| 4463 | + - Bug Fixes/Enhancements | |
| 4464 | + | |
| 4465 | + - Fix some build and test issues on big-endian systems and | |
| 4466 | + compilers with characters that are unsigned by default. The | |
| 4467 | + problems were in build and test only. There were no actual bugs | |
| 4468 | + in the qpdf library itself relating to endianness or unsigned | |
| 4469 | + characters. | |
| 4470 | + | |
| 4471 | + - When a dictionary has a duplicated key, report this with a | |
| 4472 | + warning. The behavior of the library in this case is unchanged, | |
| 4473 | + but the error condition is no longer silently ignored. | |
| 4474 | + | |
| 4475 | + - When a form field's display rectangle is erroneously specified | |
| 4476 | + with inverted coordinates, detect and correct this situation. | |
| 4477 | + This prevents some form fields from being flipped when flattening | |
| 4478 | + annotations on files with this condition. | |
| 4479 | + | |
| 4480 | +9.0.0: August 31, 2019 | |
| 4481 | + - Incompatible API (source-level) Changes (minor) | |
| 4482 | + | |
| 4483 | + - The method ``QUtil::strcasecmp`` has been renamed to | |
| 4484 | + ``QUtil::str_compare_nocase``. This incompatible change is | |
| 4485 | + necessary to enable qpdf to build on platforms that define | |
| 4486 | + ``strcasecmp`` as a macro. | |
| 4487 | + | |
| 4488 | + - The ``QPDF::copyForeignObject`` method had an overloaded | |
| 4489 | + version that took a boolean parameter that was not used. If you | |
| 4490 | + were using this version, just omit the extra parameter. | |
| 4491 | + | |
| 4492 | + There was a version of ``QPDFTokenizer::expectInlineImage`` that | |
| 4493 | + took no arguments. This version has been removed since it | |
| 4494 | + caused the tokenizer to return incorrect inline images. A new | |
| 4495 | + version was added some time ago that produces correct output. | |
| 4496 | + This is a very low level method that doesn't make sense to call | |
| 4497 | + outside of qpdf's lexical engine. There are higher level | |
| 4498 | + methods for tokenizing content streams. | |
| 4499 | + | |
| 4500 | + - Change ``QPDFOutlineDocumentHelper::getTopLevelOutlines`` and | |
| 4501 | + ``QPDFOutlineObjectHelper::getKids`` to return a | |
| 4502 | + ``std::vector`` instead of a ``std::list`` of | |
| 4503 | + ``QPDFOutlineObjectHelper`` objects. | |
| 4504 | + | |
| 4505 | + - Remove method ``QPDFTokenizer::allowPoundAnywhereInName``. This | |
| 4506 | + function would allow creation of name tokens whose value would | |
| 4507 | + change when unparsed, which is never the correct behavior. | |
| 4508 | + | |
| 4509 | + - CLI Enhancements | |
| 4510 | + | |
| 4511 | + - The @1@option@1@--replace-input@2@option@2@ option may be given | |
| 4512 | + in place of an output file name. This causes qpdf to overwrite | |
| 4513 | + the input file with the output. See the description of | |
| 4514 | + @1@option@1@--replace-input@2@option@2@ in `Basic | |
| 4515 | + Options <#ref.basic-options>`__ for more details. | |
| 4516 | + | |
| 4517 | + The @1@option@1@--recompress-flate@2@option@2@ option instructs | |
| 4518 | + @1@command@1@qpdf@2@command@2@ to recompress streams that are | |
| 4519 | + already compressed with ``/FlateDecode``. Useful with | |
| 4520 | + @1@option@1@--compression-level@2@option@2@. | |
| 4521 | + | |
| 4522 | + - The | |
| 4523 | + @1@option@1@--compression-level=@1@replaceable@1@level@2@replaceable@2@@2@option@2@ | |
| 4524 | + option sets the zlib compression level used for any streams | |
| 4525 | + compressed by ``/FlateDecode``. Most effective when combined with | |
| 4526 | + @1@option@1@--recompress-flate@2@option@2@. | |
| 4527 | + | |
| 4528 | + - Library Enhancements | |
| 4529 | + | |
| 4530 | + - A new namespace ``QIntC``, provided by | |
| 4531 | + @1@filename@1@qpdf/QIntC.hh@2@filename@2@, provides safe | |
| 4532 | + conversion methods between different integer types. These | |
| 4533 | + conversion methods do range checking to ensure that the cast | |
| 4534 | + can be performed with no loss of information. Every use of | |
| 4535 | + ``static_cast`` in the library was inspected to see if it could | |
| 4536 | + use one of these safe converters instead. See `Casting | |
| 4537 | + Policy <#ref.casting>`__ for additional details. | |
| 4538 | + | |
| 4539 | + - Method ``QPDF::anyWarnings`` tells whether there have been any | |
| 4540 | + warnings without clearing the list of warnings. | |
| 4541 | + | |
| 4542 | + - Method ``QPDF::closeInputSource`` closes or otherwise releases | |
| 4543 | + the input source. This enables the input file to be deleted or | |
| 4544 | + renamed. | |
| 4545 | + | |
| 4546 | + - New methods have been added to ``QUtil`` for converting back | |
| 4547 | + and forth between strings and unsigned integers: | |
| 4548 | + ``uint_to_string``, ``uint_to_string_base``, | |
| 4549 | + ``string_to_uint``, and ``string_to_ull``. | |
| 4550 | + | |
| 4551 | + - New methods have been added to ``QPDFObjectHandle`` that return | |
| 4552 | + the value of ``Integer`` objects as ``int`` or ``unsigned int`` | |
| 4553 | + with range checking and sensible fallback values, and a new | |
| 4554 | + method was added to return an unsigned value. This makes it | |
| 4555 | + easier to write code that is safe from unintentional data loss. | |
| 4556 | + Functions: ``getUIntValue``, ``getIntValueAsInt``, | |
| 4557 | + ``getUIntValueAsUInt``. | |
| 4558 | + | |
| 4559 | + - When parsing content streams with | |
| 4560 | + ``QPDFObjectHandle::ParserCallbacks``, in place of the method | |
| 4561 | + ``handleObject(QPDFObjectHandle)``, the developer may override | |
| 4562 | + ``handleObject(QPDFObjectHandle, size_t offset, | |
| 4563 | + size_t length)``. If this method is defined, it will | |
| 4564 | + be invoked with the object along with its offset and length | |
| 4565 | + within the overall contents being parsed. Intervening spaces | |
| 4566 | + and comments are not included in offset and length. | |
| 4567 | + Additionally, a new method ``contentSize(size_t)`` may be | |
| 4568 | + implemented. If present, it will be called prior to the first | |
| 4569 | + call to ``handleObject`` with the total size in bytes of the | |
| 4570 | + combined contents. | |
| 4571 | + | |
| 4572 | + - New methods ``QPDF::userPasswordMatched`` and | |
| 4573 | + ``QPDF::ownerPasswordMatched`` have been added to enable a | |
| 4574 | + caller to determine whether the supplied password was the user | |
| 4575 | + password, the owner password, or both. This information is also | |
| 4576 | + displayed by @1@command@1@qpdf --show-encryption@2@command@2@ | |
| 4577 | + and @1@command@1@qpdf --check@2@command@2@. | |
| 4578 | + | |
| 4579 | + - Static method ``Pl_Flate::setCompressionLevel`` can be called | |
| 4580 | + to set the zlib compression level globally used by all | |
| 4581 | + instances of ``Pl_Flate`` in deflate mode. | |
| 4582 | + | |
| 4583 | + - The method ``QPDFWriter::setRecompressFlate`` can be called to | |
| 4584 | + tell ``QPDFWriter`` to uncompress and recompress streams | |
| 4585 | + already compressed with ``/FlateDecode``. | |
| 4586 | + | |
| 4587 | + - The underlying implementation of QPDF arrays has been enhanced | |
| 4588 | + to be much more memory efficient when dealing with arrays with | |
| 4589 | + lots of nulls. This enables qpdf to use drastically less memory | |
| 4590 | + for certain types of files. | |
| 4591 | + | |
| 4592 | + - When traversing the pages tree, if nodes are encountered with | |
| 4593 | + invalid types, the types are fixed, and a warning is issued. | |
| 4594 | + | |
| 4595 | + - A new helper method ``QUtil::read_file_into_memory`` was added. | |
| 4596 | + | |
| 4597 | + - All conditions previously reported by | |
| 4598 | + ``QPDF::checkLinearization()`` as errors are now presented as | |
| 4599 | + warnings. | |
| 4600 | + | |
| 4601 | + - Name tokens containing the ``#`` character not preceded by two | |
| 4602 | + hexadecimal digits, which is invalid in PDF 1.2 and above, are | |
| 4603 | + properly handled by the library: a warning is generated, and | |
| 4604 | + the name token is properly preserved, even if invalid, in the | |
| 4605 | + output. See @1@filename@1@ChangeLog@2@filename@2@ for a more | |
| 4606 | + complete description of this change. | |
| 4607 | + | |
| 4608 | + - Bug Fixes | |
| 4609 | + | |
| 4610 | + - A small handful of memory issues, assertion failures, and | |
| 4611 | + unhandled exceptions that could occur on badly mangled input | |
| 4612 | + files have been fixed. Most of these problems were found by | |
| 4613 | + Google's OSS-Fuzz project. | |
| 4614 | + | |
| 4615 | + - When @1@command@1@qpdf --check@2@command@2@ or | |
| 4616 | + @1@command@1@qpdf --check-linearization@2@command@2@ encounters | |
| 4617 | + a file with linearization warnings but not errors, it now | |
| 4618 | + properly exits with exit code 3 instead of 2. | |
| 4619 | + | |
| 4620 | + - The @1@option@1@--completion-bash@2@option@2@ and | |
| 4621 | + @1@option@1@--completion-zsh@2@option@2@ options now work | |
| 4622 | + properly when qpdf is invoked as an AppImage. | |
| 4623 | + | |
| 4624 | + - Calling ``QPDFWriter::set*EncryptionParameters`` on a | |
| 4625 | + ``QPDFWriter`` object whose output filename has not yet been | |
| 4626 | + set no longer produces a segmentation fault. | |
| 4627 | + | |
| 4628 | + - When reading encrypted files, follow the spec more closely | |
| 4629 | + regarding encryption key length. This allows qpdf to open | |
| 4630 | + encrypted files in most cases when they have invalid or missing | |
| 4631 | + ``/Length`` keys in the encryption dictionary. | |
| 4632 | + | |
| 4633 | + - Build Changes | |
| 4634 | + | |
| 4635 | + - On platforms that support it, qpdf now builds with | |
| 4636 | + @1@option@1@-fvisibility=hidden@2@option@2@. If you build qpdf | |
| 4637 | + with your own build system, this is now safe to use. This | |
| 4638 | + prevents methods that are not part of the public API from being | |
| 4639 | + exported by the shared library, and makes qpdf's ELF shared | |
| 4640 | + libraries (used on Linux, MacOS, and most other UNIX flavors) | |
| 4641 | + behave more like the Windows DLL. Since the DLL already behaves | |
| 4642 | + in much this way, it is unlikely that there are any methods | |
| 4643 | + that were accidentally not exported. However, with ELF shared | |
| 4644 | + libraries, typeinfo for some classes has to be explicitly | |
| 4645 | + exported. If there are problems in dynamically linked code | |
| 4646 | + catching exceptions or subclassing, this could be the reason. | |
| 4647 | + If you see this, please report a bug at | |
| 4648 | + https://github.com/qpdf/qpdf/issues/. | |
| 4649 | + | |
| 4650 | + - QPDF is now compiled with integer conversion and sign | |
| 4651 | + conversion warnings enabled. Numerous changes were made to the | |
| 4652 | + library to make this safe. | |
| 4653 | + | |
| 4654 | + - QPDF's @1@command@1@make install@2@command@2@ target explicitly | |
| 4655 | + specifies the mode to use when installing files instead of | |
| 4656 | + relying on the user's umask. It was previously doing this for some | |
| 4657 | + files but not others. | |
| 4658 | + | |
| 4659 | + - If @1@command@1@pkg-config@2@command@2@ is available, use it to | |
| 4660 | + locate @1@filename@1@libjpeg@2@filename@2@ and | |
| 4661 | + @1@filename@1@zlib@2@filename@2@ dependencies, falling back on | |
| 4662 | + old behavior if unsuccessful. | |
| 4663 | + | |
| 4664 | + - Other Notes | |
| 4665 | + | |
| 4666 | + - QPDF has been fully integrated into `Google's OSS-Fuzz | |
| 4667 | + project <https://github.com/google/oss-fuzz>`__. This project | |
| 4668 | + exercises code with randomly mutated inputs and is great for | |
| 4669 | + discovering hidden crashes and security issues. | |
| 4670 | + Several bugs found by oss-fuzz have already been fixed in qpdf. | |
| 4671 | + | |
| 4672 | +8.4.2: May 18, 2019 | |
| 4673 | + This release has just one change: correction of a buffer overrun in | |
| 4674 | + the Windows code used to open files. Windows users should take this | |
| 4675 | + update. There are no code changes that affect non-Windows releases. | |
| 4676 | + | |
| 4677 | +8.4.1: April 27, 2019 | |
| 4678 | + - Enhancements | |
| 4679 | + | |
| 4680 | + - When @1@command@1@qpdf --version@2@command@2@ is run, it will | |
| 4681 | + detect if the qpdf CLI was built with a different version of | |
| 4682 | + qpdf than the library, which may indicate a problem with the | |
| 4683 | + installation. | |
| 4684 | + | |
| 4685 | + - New option @1@option@1@--remove-page-labels@2@option@2@ will | |
| 4686 | + remove page labels before generating output. This used to | |
| 4687 | + happen if you ran @1@command@1@qpdf --empty --pages .. | |
| 4688 | + --@2@command@2@, but the behavior changed in qpdf 8.3.0. This | |
| 4689 | + option enables people who were relying on the old behavior to | |
| 4690 | + get it again. | |
| 4691 | + | |
| 4692 | + - New option | |
| 4693 | + @1@option@1@--keep-files-open-threshold=@1@replaceable@1@count@2@replaceable@2@@2@option@2@ | |
| 4694 | + can be used to override the number of files that qpdf will use to | |
| 4695 | + trigger the behavior of not keeping all files open when merging | |
| 4696 | + files. This may be necessary if your system allows fewer than | |
| 4697 | + the default value of 200 files to be open at the same time. | |
| 4698 | + | |
| 4699 | + - Bug Fixes | |
| 4700 | + | |
| 4701 | + - Handle Unicode characters in filenames on Windows. The changes | |
| 4702 | + to support Unicode on the CLI in Windows broke Unicode | |
| 4703 | + filenames for Windows. | |
| 4704 | + | |
| 4705 | + - Slightly tighten logic that determines whether an object is a | |
| 4706 | + page. This should resolve problems in some rare files where | |
| 4707 | + some non-page objects were passing qpdf's test for whether | |
| 4708 | + something was a page, thus causing them to be erroneously lost | |
| 4709 | + during page splitting operations. | |
| 4710 | + | |
| 4711 | + - Revert change that included preservation of outlines | |
| 4712 | + (bookmarks) in @1@option@1@--split-pages@2@option@2@. The way | |
| 4713 | + it was implemented in 8.3.0 and 8.4.0 caused a very significant | |
| 4714 | + degradation of performance for splitting certain files. A | |
| 4715 | + future release of qpdf may re-introduce the behavior in a more | |
| 4716 | + performant and also more correct fashion. | |
| 4717 | + | |
| 4718 | + - In JSON mode, add missing leading 0 to decimal values between | |
| 4719 | + -1 and 1 even if not present in the input. The JSON | |
| 4720 | + specification requires the leading 0. The PDF specification | |
| 4721 | + does not. | |
| 4722 | + | |
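The JSON fix in 8.4.1 comes down to a small textual normalization. The sketch below is illustrative only and is not taken from qpdf: ``json_number`` is a hypothetical helper that handles just the missing leading zero, and it assumes the number arrives as text exactly as it was written in the PDF.

```python
def json_number(pdf_number: str) -> str:
    """Add the leading 0 that JSON requires to a PDF real such as ".5".

    Hypothetical helper for illustration; only the leading-zero case
    from the release note is handled here.
    """
    if pdf_number.startswith("."):
        # ".5" -> "0.5"
        return "0" + pdf_number
    if pdf_number.startswith("-."):
        # "-.5" -> "-0.5"
        return "-0" + pdf_number[1:]
    # Anything else is returned unchanged.
    return pdf_number
```

Other PDF spellings that JSON rejects, such as a leading ``+`` or a trailing ``.``, are outside this sketch.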
| 4723 | +8.4.0: February 1, 2019 | |
| 4724 | + - Command-line Enhancements | |
| 4725 | + | |
| 4726 | + - *Non-compatible CLI change:* The qpdf command-line tool | |
| 4727 | + interprets passwords given on the command line differently from | |
| 4728 | + previous releases when the passwords contain non-ASCII | |
| 4729 | + characters. In some cases, the behavior differs from previous | |
| 4730 | + releases. For a discussion of the current behavior, please see | |
| 4731 | + `Unicode Passwords <#ref.unicode-passwords>`__. The | |
| 4732 | + incompatibilities are as follows: | |
| 4733 | + | |
| 4734 | + - On Windows, qpdf now receives all command-line options as | |
| 4735 | + Unicode strings if it can figure out the appropriate | |
| 4736 | + compile/link options. This is enabled at least for MSVC and | |
| 4737 | + mingw builds. That means that if non-ASCII strings are | |
| 4738 | + passed to the qpdf CLI in Windows, qpdf will now correctly | |
| 4739 | + receive them. In the past, they would have either been | |
| 4740 | + encoded as Windows code page 1252 (also known as "Windows | |
| 4741 | + ANSI") or as something unintelligible. In almost all cases, | |
| 4742 | + qpdf is able to properly interpret Unicode arguments now, | |
| 4743 | + whereas in the past, it would almost never interpret them | |
| 4744 | + properly. The result is that non-ASCII passwords given to | |
| 4745 | + the qpdf CLI on Windows now have a much greater chance of | |
| 4746 | + creating PDF files that can be opened by a variety of | |
| 4747 | + readers. In the past, files encrypted from the Windows CLI | |
| 4748 | + using non-ASCII passwords would usually not be readable by | |
| 4749 | + most viewers. Note that the current version of qpdf is | |
| 4750 | + able to decrypt files that it previously created using the | |
| 4751 | + previously supplied password. | |
| 4752 | + | |
| 4753 | + - The PDF specification requires passwords to be encoded as | |
| 4754 | + UTF-8 for 256-bit encryption and with PDF Doc encoding for | |
| 4755 | + 40-bit or 128-bit encryption. Older versions of qpdf left it | |
| 4756 | + up to the user to provide passwords with the correct | |
| 4757 | + encoding. The qpdf CLI now detects when a password is given | |
| 4758 | + with UTF-8 encoding and automatically transcodes it to what | |
| 4759 | + the PDF spec requires. While this is almost always the | |
| 4760 | + correct behavior, it is possible to override the behavior if | |
| 4761 | + there is some reason to do so. This is discussed in more | |
| 4762 | + depth in `Unicode Passwords <#ref.unicode-passwords>`__. | |
| 4763 | + | |
| 4764 | + - New options | |
| 4765 | + @1@option@1@--externalize-inline-images@2@option@2@, | |
| 4766 | + @1@option@1@--ii-min-bytes@2@option@2@, and | |
| 4767 | + @1@option@1@--keep-inline-images@2@option@2@ control qpdf's | |
| 4768 | + handling of inline images and possible conversion of them to | |
| 4769 | + regular images. By default, | |
| 4770 | + @1@option@1@--optimize-images@2@option@2@ now also applies to | |
| 4771 | + inline images. These options are discussed in `Advanced | |
| 4772 | + Transformation Options <#ref.advanced-transformation>`__. | |
| 4773 | + | |
| 4774 | + - Add options @1@option@1@--overlay@2@option@2@ and | |
| 4775 | + @1@option@1@--underlay@2@option@2@ for overlaying or | |
| 4776 | + underlaying pages of other files onto output pages. See | |
| 4777 | + `Overlay and Underlay Options <#ref.overlay-underlay>`__ for | |
| 4778 | + details. | |
| 4779 | + | |
| 4780 | + - When opening an encrypted file with a password, if the | |
| 4781 | + specified password doesn't work and the password contains any | |
| 4782 | + non-ASCII characters, qpdf will try a number of alternative | |
| 4783 | + passwords to try to compensate for possible character encoding | |
| 4784 | + errors. This behavior can be suppressed with the | |
| 4785 | + @1@option@1@--suppress-password-recovery@2@option@2@ option. | |
| 4786 | + See `Unicode Passwords <#ref.unicode-passwords>`__ for a full | |
| 4787 | + discussion. | |
| 4788 | + | |
| 4789 | + - Add the @1@option@1@--password-mode@2@option@2@ option to | |
| 4790 | + fine-tune how qpdf interprets password arguments, especially | |
| 4791 | + when they contain non-ASCII characters. See `Unicode | |
| 4792 | + Passwords <#ref.unicode-passwords>`__ for more information. | |
| 4793 | + | |
| 4794 | + - In the @1@option@1@--pages@2@option@2@ option, it is now | |
| 4795 | + possible to copy the same page more than once from the same | |
| 4796 | + file without using the previous workaround of specifying two | |
| 4797 | + different paths to the same file. | |
| 4798 | + | |
| 4799 | + - In the @1@option@1@--pages@2@option@2@ option, allow use of "." | |
| 4800 | + as a shortcut for the primary input file. That way, you can do | |
| 4801 | + @1@command@1@qpdf in.pdf --pages . 1-2 -- out.pdf@2@command@2@ | |
| 4802 | + instead of having to repeat @1@filename@1@in.pdf@2@filename@2@ | |
| 4803 | + in the command. | |
| 4804 | + | |
| 4805 | + - When encrypting with 128-bit and 256-bit encryption, new | |
| 4806 | + encryption options @1@option@1@--assemble@2@option@2@, | |
| 4807 | + @1@option@1@--annotate@2@option@2@, | |
| 4808 | + @1@option@1@--form@2@option@2@, and | |
| 4809 | + @1@option@1@--modify-other@2@option@2@ allow finer-grained | |
| 4810 | + control over permissions. Before, the | |
| 4811 | + @1@option@1@--modify@2@option@2@ option only configured certain | |
| 4812 | + predefined groups of permissions. | |
| 4813 | + | |
| 4814 | + - Bug Fixes and Enhancements | |
| 4815 | + | |
| 4816 | + - *Potential data-loss bug:* Versions of qpdf between 8.1.0 and | |
| 4817 | + 8.3.0 had a bug that could cause page splitting and merging | |
| 4818 | + operations to drop some font or image resources if the PDF | |
| 4819 | + file's internal structure shared these resource lists across | |
| 4820 | + pages and if some but not all of the pages in the output did | |
| 4821 | + not reference all the fonts and images. Using the | |
| 4822 | + @1@option@1@--preserve-unreferenced-resources@2@option@2@ | |
| 4823 | + option would work around the incorrect behavior. This bug was | |
| 4824 | + the result of a typo in the code and a deficiency in the test | |
| 4825 | + suite. The case that triggered the error was known, just not | |
| 4826 | + handled properly. This case is now exercised in qpdf's test | |
| 4827 | + suite and properly handled. | |
| 4828 | + | |
| 4829 | + - When optimizing images, detect and refuse to optimize images | |
| 4830 | + that can't be converted to JPEG because of bit depth or color | |
| 4831 | + space. | |
| 4832 | + | |
| 4833 | + - Linearization and page manipulation APIs now detect and recover | |
| 4834 | + from files that have duplicate Page objects in the pages tree. | |
| 4835 | + | |
| 4836 | + - When the older option | |
| 4837 | + @1@option@1@--stream-data=compress@2@option@2@ was used with object | |
| 4838 | + streams, object streams and xref streams were not compressed; this is now fixed. | |
| 4839 | + | |
| 4840 | + - When the tokenizer returns inline image tokens, delimiters | |
| 4841 | + following ``ID`` and ``EI`` operators are no longer excluded. | |
| 4842 | + This makes it possible to reliably extract the actual image | |
| 4843 | + data. | |
| 4844 | + | |
| 4845 | + - Library Enhancements | |
| 4846 | + | |
| 4847 | + - Add method ``QPDFPageObjectHelper::externalizeInlineImages`` to | |
| 4848 | + convert inline images to regular images. | |
| 4849 | + | |
| 4850 | + - Add method ``QUtil::possible_repaired_encodings()`` to generate | |
| 4851 | + a list of strings that represent other ways the given string | |
| 4852 | + could have been encoded. This is the method the QPDF CLI uses | |
| 4853 | + to generate the strings it tries when recovering incorrectly | |
| 4854 | + encoded Unicode passwords. | |
| 4855 | + | |
| 4856 | + - Add new versions of | |
| 4857 | + ``QPDFWriter::setR{3,4,5,6}EncryptionParameters`` that allow | |
| 4858 | + more granular setting of permissions bits. See | |
| 4859 | + @1@filename@1@QPDFWriter.hh@2@filename@2@ for details. | |
| 4860 | + | |
| 4861 | + - Add new versions of the transcoders from UTF-8 to single-byte | |
| 4862 | + coding systems in ``QUtil`` that report success or failure | |
| 4863 | + rather than just substituting a specified unknown character. | |
| 4864 | + | |
| 4865 | + - Add method ``QUtil::analyze_encoding()`` to determine whether a | |
| 4866 | + string has high-bit characters and whether it appears to be UTF-16 or | |
| 4867 | + valid UTF-8 encoding. | |
| 4868 | + | |
| 4869 | + - Add new method ``QPDFPageObjectHelper::shallowCopyPage()`` to | |
| 4870 | + copy a new page that is a "shallow copy" of a page. The | |
| 4871 | + resulting object is an indirect object ready to be passed to | |
| 4872 | + ``QPDFPageDocumentHelper::addPage()`` for either the original | |
| 4873 | + ``QPDF`` object or a different one. This is what the | |
| 4874 | + @1@command@1@qpdf@2@command@2@ command-line tool uses to copy | |
| 4875 | + the same page multiple times from the same file during | |
| 4876 | + splitting and merging operations. | |
| 4877 | + | |
| 4878 | + - Add method ``QPDF::getUniqueId()``, which returns a unique | |
| 4879 | + identifier for the given QPDF object. The identifier will be | |
| 4880 | + unique across the life of the application. The returned value | |
| 4881 | + can be safely used as a map key. | |
| 4882 | + | |
| 4883 | + - Add method ``QPDF::setImmediateCopyFrom``. This further | |
| 4884 | + enhances qpdf's ability to allow a ``QPDF`` object from which | |
| 4885 | + objects are being copied to go out of scope before the | |
| 4886 | + destination object is written. If you call this method on a | |
| 4887 | + ``QPDF`` instance, objects copied *from* this instance will be | |
| 4888 | + copied immediately instead of lazily. This option uses more | |
| 4889 | + memory but allows the source object to go out of scope before | |
| 4890 | + the destination object is written in all cases. See comments in | |
| 4891 | + @1@filename@1@QPDF.hh@2@filename@2@ for details. | |
| 4892 | + | |
| 4893 | + - Add method ``QPDFPageObjectHelper::getAttribute`` for | |
| 4894 | + retrieving an attribute from the page dictionary taking | |
| 4895 | + inheritance into consideration, and optionally making a copy if | |
| 4896 | + your intention is to modify the attribute. | |
| 4897 | + | |
| 4898 | + - Fix long-standing limitation of | |
| 4899 | + ``QPDFPageObjectHelper::getPageImages`` so that it now properly | |
| 4900 | + reports images from inherited resources dictionaries, | |
| 4901 | + eliminating the need to call | |
| 4902 | + ``QPDFPageDocumentHelper::pushInheritedAttributesToPage`` in | |
| 4903 | + this case. | |
| 4904 | + | |
| 4905 | + - Add method ``QPDFObjectHandle::getUniqueResourceName`` for | |
| 4906 | + finding an unused name in a resource dictionary. | |
| 4907 | + | |
| 4908 | + - Add method ``QPDFPageObjectHelper::getFormXObjectForPage`` for | |
| 4909 | + generating a form XObject equivalent to a page. The resulting | |
| 4910 | + object can be used in the same file or copied to another file | |
| 4911 | + with ``copyForeignObject``. This can be useful for implementing | |
| 4912 | + underlay, overlay, n-up, thumbnails, or any other functionality | |
| 4913 | + requiring replication of pages in other contexts. | |
| 4914 | + | |
| 4915 | + - Add method ``QPDFPageObjectHelper::placeFormXObject`` for | |
| 4916 | + generating content stream text that places a given form XObject | |
| 4917 | + on a page, centered and fit within a specified rectangle. This | |
| 4918 | + method takes care of computing the proper transformation matrix | |
| 4919 | + and may optionally compensate for rotation or scaling of the | |
| 4920 | + destination page. | |
| 4921 | + | |
| 4922 | + - Build Improvements | |
| 4923 | + | |
| 4924 | + - Add new configure option | |
| 4925 | + @1@option@1@--enable-avoid-windows-handle@2@option@2@, which | |
| 4926 | + causes the preprocessor symbol ``AVOID_WINDOWS_HANDLE`` to be | |
| 4927 | + defined. When defined, qpdf will avoid referencing the Windows | |
| 4928 | + ``HANDLE`` type, which is disallowed with certain versions of | |
| 4929 | + the Windows SDK. | |
| 4930 | + | |
| 4931 | + - For Windows builds, attempt to determine what options, if any, | |
| 4932 | + have to be passed to the compiler and linker to enable use of | |
| 4933 | + ``wmain``. This causes the preprocessor symbol | |
| 4934 | + ``WINDOWS_WMAIN`` to be defined. If you do your own builds with | |
| 4935 | + other compilers, you can define this symbol to cause ``wmain`` | |
| 4936 | + to be used. This is needed to allow the Windows | |
| 4937 | + @1@command@1@qpdf@2@command@2@ command to receive Unicode | |
| 4938 | + command-line options. | |
| 4939 | + | |
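The encoding checks described for ``QUtil::analyze_encoding()`` above can be illustrated with a short sketch. This is not qpdf's implementation. The Python function below is a hypothetical stand-in, and it assumes that "appears to be UTF-16" means the string starts with the big-endian byte order mark ``FE FF``.

```python
def analyze_encoding(data):
    """Illustrative sketch only; not qpdf's QUtil::analyze_encoding().

    Returns (has_8bit, is_utf16, is_valid_utf8) for a byte string.
    Assumption: UTF-16 is recognized by a leading big-endian BOM.
    """
    # Any byte with the high bit set means the string is not plain ASCII.
    has_8bit = any(byte >= 0x80 for byte in data)
    is_utf16 = data.startswith(b"\xfe\xff")
    is_valid_utf8 = False
    if has_8bit and not is_utf16:
        try:
            data.decode("utf-8")
            is_valid_utf8 = True
        except UnicodeDecodeError:
            pass
    return has_8bit, is_utf16, is_valid_utf8
```

A pure ASCII password reports no high-bit characters, so no transcoding decision is needed. A string that has high-bit characters but is neither UTF-16 nor valid UTF-8 is the ambiguous case in which alternative encodings may be worth trying.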
| 4940 | +8.3.0: January 7, 2019 | |
| 4941 | + - Command-line Enhancements | |
| 4942 | + | |
| 4943 | + - Shell completion: you can now use eval @1@command@1@$(qpdf | |
| 4944 | + --completion-bash)@2@command@2@ and eval @1@command@1@$(qpdf | |
| 4945 | + --completion-zsh)@2@command@2@ to enable shell completion for | |
| 4946 | + bash and zsh. | |
| 4947 | + | |
| 4948 | + - Page numbers (also known as page labels) are now preserved when | |
| 4949 | + merging and splitting files with the | |
| 4950 | + @1@option@1@--pages@2@option@2@ and | |
| 4951 | + @1@option@1@--split-pages@2@option@2@ options. | |
| 4952 | + | |
| 4953 | + - Bookmarks are partially preserved when splitting pages with the | |
| 4954 | + @1@option@1@--split-pages@2@option@2@ option. Specifically, the | |
| 4955 | + outlines dictionary and some supporting metadata are copied | |
| 4956 | + into the split files. The result is that all bookmarks from the | |
| 4957 | + original file appear, those that point to pages that are | |
| 4958 | + preserved work, and those that point to pages that are not | |
| 4959 | + preserved don't do anything. This is an interim step toward | |
| 4960 | + proper support for bookmarks in splitting and merging | |
| 4961 | + operations. | |
| 4962 | + | |
| 4963 | + - Page collation: add new option | |
| 4964 | + @1@option@1@--collate@2@option@2@. When specified, the | |
| 4965 | + semantics of @1@option@1@--pages@2@option@2@ change from | |
| 4966 | + concatenation to collation. See `Page Selection | |
| 4967 | + Options <#ref.page-selection>`__ for examples and discussion. | |
| 4968 | + | |
| 4969 | + - Generation of information in JSON format, primarily to | |
| 4970 | + facilitate use of qpdf from languages other than C++. Add new | |
| 4971 | + options @1@option@1@--json@2@option@2@, | |
| 4972 | + @1@option@1@--json-key@2@option@2@, and | |
| 4973 | + @1@option@1@--json-object@2@option@2@ to generate a JSON | |
| 4974 | + representation of the PDF file. Run @1@command@1@qpdf | |
| 4975 | + --json-help@2@command@2@ to get a description of the JSON | |
| 4976 | + format. For more information, see `QPDF JSON <#ref.json>`__. | |
| 4977 | + | |
| 4978 | + - The @1@option@1@--generate-appearances@2@option@2@ flag will | |
| 4979 | + cause qpdf to generate appearances for form fields if the PDF | |
| 4980 | + file indicates that form field appearances are out of date. | |
| 4981 | + This can happen when PDF forms are filled in by a program that | |
| 4982 | + doesn't know how to regenerate the appearances of the filled-in | |
| 4983 | + fields. | |
| 4984 | + | |
| 4985 | + - The @1@option@1@--flatten-annotations@2@option@2@ flag can be | |
| 4986 | + used to *flatten* annotations, including form fields. | |
| 4987 | + Ordinarily, annotations are drawn separately from the page. | |
| 4988 | + Flattening annotations is the process of combining their | |
| 4989 | + appearances into the page's contents. You might want to do this | |
| 4990 | + if you are going to rotate or combine pages using a tool that | |
| 4991 | + doesn't understand annotations. You may also want to use | |
| 4992 | + @1@option@1@--generate-appearances@2@option@2@ when using this | |
| 4993 | + flag since annotations for outdated form fields are not | |
| 4994 | + flattened as that would cause loss of information. | |
| 4995 | + | |
| 4996 | + - The @1@option@1@--optimize-images@2@option@2@ flag tells qpdf | |
| 4997 | + to recompress every image using DCT (JPEG) compression as | |
| 4998 | + long as the image is not already compressed with lossy | |
| 4999 | + compression and recompressing the image reduces its size. The | |
| 5000 | + additional options @1@option@1@--oi-min-width@2@option@2@, | |
| 5001 | + @1@option@1@--oi-min-height@2@option@2@, and | |
| 5002 | + @1@option@1@--oi-min-area@2@option@2@ prevent recompression of | |
| 5003 | + images whose width, height, or pixel area (width × height) are | |
| 5004 | + below a specified threshold. | |
| 5005 | + | |
| 5006 | + - The @1@option@1@--show-object@2@option@2@ option can now be | |
| 5007 | + given as @1@option@1@--show-object=trailer@2@option@2@ to show | |
| 5008 | + the trailer dictionary. | |
| 5009 | + | |
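The difference between concatenation and collation mentioned for @1@option@1@--collate@2@option@2@ above can be sketched as follows. This is an illustration and not qpdf's code. It assumes that collation takes one page from each input in turn and skips inputs that have run out of pages. The function names and page labels are made up for the example.

```python
from itertools import zip_longest

_MISSING = object()  # sentinel for inputs that have run out of pages


def concatenate(files):
    """All pages of the first input, then all pages of the next, and so on."""
    return [page for pages in files for page in pages]


def collate(files):
    """One page from each input in turn (assumed --collate semantics)."""
    out = []
    for group in zip_longest(*files, fillvalue=_MISSING):
        out.extend(page for page in group if page is not _MISSING)
    return out


odd = ["odd1", "odd2", "odd3"]
even = ["even1", "even2"]
print(concatenate([odd, even]))
print(collate([odd, even]))
```

A typical use is recombining a document whose odd and even pages were scanned into separate files.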
| 5010 | + - Bug Fixes and Enhancements | |
| 5011 | + | |
| 5012 | + - QPDF now automatically detects and recovers from dangling | |
| 5013 | + references. If a PDF file contained an indirect reference to a | |
| 5014 | + non-existent object (which is valid), then adding a new object | |
| 5015 | + to the file could cause the new object to take the | |
| 5016 | + object ID of the dangling reference, thereby causing the | |
| 5017 | + dangling reference to point to the new object. This case is now | |
| 5018 | + prevented. | |
| 5019 | + | |
| 5020 | + - Fixes to form field setting code: strings are always written in | |
| 5021 | + UTF-16 format, and checkboxes and radio buttons are handled | |
| 5022 | + properly with respect to synchronization of values and | |
| 5023 | + appearance states. | |
| 5024 | + | |
| 5025 | + - The ``QPDF::checkLinearization()`` method no longer causes the program | |
| 5026 | + to crash when it detects problems with linearization data. | |
| 5027 | + Instead, it issues a normal warning or error. | |
| 5028 | + | |
| 5029 | + - Ordinarily qpdf treats an argument of the form | |
| 5030 | + @1@option@1@@file@2@option@2@ to mean that command-line options | |
| 5031 | + should be read from @1@filename@1@file@2@filename@2@. Now, if | |
| 5032 | + @1@filename@1@file@2@filename@2@ does not exist but | |
| 5033 | + @1@filename@1@@file@2@filename@2@ does, qpdf will treat | |
| 5034 | + @1@filename@1@@file@2@filename@2@ as a regular option. This | |
| 5035 | + makes it possible to work more easily with PDF files whose | |
| 5036 | + names happen to start with the ``@`` character. | |
| 5037 | + | |
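The rule above for arguments that start with ``@`` can be written as a small decision function. This is a sketch of the rule as stated in these notes, not qpdf's argument parser. ``classify_argument`` and its ``exists`` parameter are hypothetical names, and ``exists`` is injectable only so that the rule can be exercised without touching the file system.

```python
import os


def classify_argument(arg, exists=os.path.exists):
    """Return ("options-file", path) or ("regular", arg).

    Sketch of the documented rule: "@file" normally means "read options
    from file", unless "file" is absent and a file literally named
    "@file" is present.
    """
    if arg.startswith("@") and len(arg) > 1:
        path = arg[1:]
        if not exists(path) and exists(arg):
            return ("regular", arg)
        return ("options-file", path)
    return ("regular", arg)
```

When both ``file`` and ``@file`` exist, the sketch keeps the ordinary interpretation and reads options from ``file``.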
| 5038 | + - Library Enhancements | |
| 5039 | + | |
| 5040 | + - Remove the restriction in most cases that the source QPDF | |
| 5041 | + object used in a ``QPDF::copyForeignObject`` call has to stick | |
| 5042 | + around until the destination QPDF is written. The exceptional | |
| 5043 | + case is when the source stream gets its data using a | |
| 5044 | + ``QPDFObjectHandle::StreamDataProvider``. For a more in-depth | |
| 5045 | + discussion, see comments around ``copyForeignObject`` in | |
| 5046 | + @1@filename@1@QPDF.hh@2@filename@2@. | |
| 5047 | + | |
| 5048 | + - Add new method ``QPDFWriter::getFinalVersion()``, which returns | |
| 5049 | + the PDF version that will ultimately be written to the final | |
| 5050 | + file. See comments in @1@filename@1@QPDFWriter.hh@2@filename@2@ | |
| 5051 | + for some restrictions on its use. | |
| 5052 | + | |
| 5053 | + - Add several methods for transcoding strings to some of the | |
| 5054 | + character sets used in PDF files: ``QUtil::utf8_to_ascii``, | |
| 5055 | + ``QUtil::utf8_to_win_ansi``, ``QUtil::utf8_to_mac_roman``, and | |
| 5056 | + ``QUtil::utf8_to_utf16``. For the single-byte encodings that | |
| 5057 | + support only a limited character set, these methods replace | |
| 5058 | + unsupported characters with a specified substitute. | |
| 5059 | + | |
| 5060 | + - Add new methods to ``QPDFAnnotationObjectHelper`` and | |
| 5061 | + ``QPDFFormFieldObjectHelper`` for querying flags and | |
| 5062 | + interpretation of different field types. Define constants in | |
| 5063 | + @1@filename@1@qpdf/Constants.h@2@filename@2@ to help with | |
| 5064 | + interpretation of flag values. | |
| 5065 | + | |
| 5066 | + - Add new methods | |
| 5067 | + ``QPDFAcroFormDocumentHelper::generateAppearancesIfNeeded`` and | |
| 5068 | + ``QPDFFormFieldObjectHelper::generateAppearance`` for | |
| 5069 | + generating appearance streams. See discussion in | |
| 5070 | + @1@filename@1@QPDFFormFieldObjectHelper.hh@2@filename@2@ for | |
| 5071 | + limitations. | |
| 5072 | + | |
| 5073 | + - Add two new helper functions for dealing with resource | |
| 5074 | + dictionaries: ``QPDFObjectHandle::getResourceNames()`` returns | |
| 5075 | + a list of all second-level keys, which correspond to the names | |
| 5076 | + of resources, and ``QPDFObjectHandle::mergeResources()`` merges | |
| 5077 | + two resources dictionaries as long as they have non-conflicting | |
| 5078 | + keys. These methods are useful for certain types of objects | |
| 5079 | + that resolve resources from multiple places, such as form | |
| 5080 | + fields. | |
| 5081 | + | |
| 5082 | + - Add methods ``QPDFPageDocumentHelper::flattenAnnotations()`` | |
| 5083 | + and | |
| 5084 | + ``QPDFAnnotationObjectHelper::getPageContentForAppearance()`` | |
| 5085 | + for handling low-level details of annotation flattening. | |
| 5086 | + | |
| 5087 | + - Add new helper classes: ``QPDFOutlineDocumentHelper``, | |
| 5088 | + ``QPDFOutlineObjectHelper``, ``QPDFPageLabelDocumentHelper``, | |
| 5089 | + ``QPDFNameTreeObjectHelper``, and | |
| 5090 | + ``QPDFNumberTreeObjectHelper``. | |
| 5091 | + | |
| 5092 | + - Add method ``QPDFObjectHandle::getJSON()`` that returns a JSON | |
| 5093 | + representation of the object. Call ``serialize()`` on the | |
| 5094 | + result to convert it to a string. | |
| 5095 | + | |
| 5096 | + - Add a simple JSON serializer. This is not a complete or | |
| 5097 | + general-purpose JSON library. It allows assembly and | |
| 5098 | + serialization of JSON structures with some restrictions, which | |
| 5099 | + are described in the header file. This is the serializer used | |
| 5100 | + by qpdf's new JSON representation. | |
| 5101 | + | |
| 5102 | + - Add new ``QPDFObjectHandle::Matrix`` class along with a few | |
| 5103 | + convenience methods for dealing with six-element numerical | |
| 5104 | + arrays as matrices. | |
| 5105 | + | |
| 5106 | + - Add new method ``QPDFObjectHandle::wrapInArray``, which returns | |
| 5107 | + the object itself if it is an array, or an array containing the | |
| 5108 | + object otherwise. This is a common construct in PDF. This | |
| 5109 | + method prevents you from having to explicitly test whether | |
| 5110 | + something is a single element or an array. | |
| 5111 | + | |
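The "single element or array" construct that ``QPDFObjectHandle::wrapInArray`` addresses can be shown with a language-neutral sketch. The Python function below is a hypothetical analogue, not the qpdf API. It models PDF arrays as Python lists and any other object as a plain value.

```python
def wrap_in_array(obj):
    """Return obj unchanged if it is already a list, else a one-element list."""
    return obj if isinstance(obj, list) else [obj]


# A page's /Contents entry may be a single stream or an array of streams.
# Wrapping lets calling code iterate the same way in both cases.
for contents in ("stream-1", ["stream-1", "stream-2"]):
    for stream in wrap_in_array(contents):
        print(stream)
```

The calling code no longer needs a type test before looping.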
| 5112 | + - Build Improvements | |
| 5113 | + | |
| 5114 | + - It is no longer necessary to run | |
| 5115 | + @1@command@1@autogen.sh@2@command@2@ to build from a pristine | |
| 5116 | + checkout. Automatically generated files are now committed so | |
| 5117 | + that it is possible to build on platforms without autoconf | |
| 5118 | + directly from a clean checkout of the repository. The | |
| 5119 | + @1@command@1@configure@2@command@2@ script detects if the files | |
| 5120 | + are out of date when it also determines that the tools are | |
| 5121 | + present to regenerate them. | |
| 5122 | + | |
| 5123 | + - Pull requests and the master branch are now built automatically | |
| 5124 | + in `Azure | |
| 5125 | + Pipelines <https://dev.azure.com/qpdf/qpdf/_build>`__, which is | |
| 5126 | + free for open source projects. The build includes Linux, mac, | |
| 5127 | + Windows 32-bit and 64-bit with mingw and MSVC, and an AppImage | |
| 5128 | + build. Official qpdf releases are now built with Azure | |
| 5129 | + Pipelines. | |
| 5130 | + | |
| 5131 | + - Notes for Packagers | |
| 5132 | + | |
| 5133 | + - A new section has been added to the documentation with notes | |
| 5134 | + for packagers. Please see `Notes for | |
| 5135 | + Packagers <#ref.packaging>`__. | |
| 5136 | + | |
| 5137 | + - The qpdf build detects out-of-date automatically generated files. If | |
| 5138 | + your packaging system automatically refreshes libtool or | |
| 5139 | + autoconf files, it could cause this check to fail. To avoid | |
| 5140 | + this problem, pass | |
| 5141 | + @1@option@1@--disable-check-autofiles@2@option@2@ to | |
| 5142 | + @1@command@1@configure@2@command@2@. | |
| 5143 | + | |
| 5144 | + - If you would like to have qpdf completion enabled | |
| 5145 | + automatically, you can install completion files in the | |
| 5146 | + distribution's default location. You can find sample completion | |
| 5147 | + files to install in the @1@filename@1@completions@2@filename@2@ | |
| 5148 | + directory. | |
| 5149 | + | |
| 5150 | +8.2.1: August 18, 2018 | |
| 5151 | + - Command-line Enhancements | |
| 5152 | + | |
| 5153 | + - Add | |
| 5154 | + @1@option@1@--keep-files-open=@1@replaceable@1@[yn]@2@replaceable@2@@2@option@2@ | |
| 5155 | + to override default determination of whether to keep files open | |
| 5156 | + when merging. Please see the discussion of | |
| 5157 | + @1@option@1@--keep-files-open@2@option@2@ in `Basic | |
| 5158 | + Options <#ref.basic-options>`__ for additional details. | |
| 5159 | + | |
| 5160 | +8.2.0: August 16, 2018 | |
| 5161 | + - Command-line Enhancements | |
| 5162 | + | |
| 5163 | + - Add @1@option@1@--no-warn@2@option@2@ option to suppress | |
| 5164 | + issuing warning messages. If there are any conditions that | |
| 5165 | + would have caused warnings to be issued, the exit status is | |
| 5166 | + still 3. | |
| 5167 | + | |
| 5168 | + - Bug Fixes and Optimizations | |
| 5169 | + | |
| 5170 | + - Performance fix: optimize page merging operation to avoid | |
| 5171 | + unnecessary open/close calls on files being merged. This solves | |
| 5172 | + a dramatic slow-down that was observed when merging certain | |
| 5173 | + types of files. | |
| 5174 | + | |
| 5175 | + - Optimize how memory was used for the TIFF predictor, | |
| 5176 | + drastically improving performance and memory usage for files | |
| 5177 | + containing high-resolution images compressed with Flate using | |
| 5178 | + the TIFF predictor. | |
| 5179 | + | |
| 5180 | + - Bug fix: end of line characters were not properly handled | |
| 5181 | + inside strings in some cases. | |
| 5182 | + | |
| 5183 | + - Bug fix: using @1@option@1@--progress@2@option@2@ on very small | |
| 5184 | + files could cause an infinite loop. | |
| 5185 | + | |
| 5186 | + - API enhancements | |
| 5187 | + | |
| 5188 | + - Add new class ``QPDFSystemError``, derived from | |
| 5189 | + ``std::runtime_error``, which is now thrown by | |
| 5190 | + ``QUtil::throw_system_error``. This enables the triggering | |
| 5191 | + ``errno`` value to be retrieved. | |
| 5192 | + | |
| 5193 | + - Add ``ClosedFileInputSource::stayOpen`` method, enabling a | |
| 5194 | + ``ClosedFileInputSource`` to stay open during manually | |
| 5195 | + indicated periods of high activity, thus reducing the overhead | |
| 5196 | + of frequent open/close operations. | |
| 5197 | + | |
| 5198 | + - Build Changes | |
| 5199 | + | |
| 5200 | + - For the mingw builds, change the name of the DLL import library | |
| 5201 | + from @1@filename@1@libqpdf.a@2@filename@2@ to | |
| 5202 | + @1@filename@1@libqpdf.dll.a@2@filename@2@ to more accurately | |
| 5203 | + reflect that it is an import library rather than a static | |
| 5204 | + library. This potentially clears the way for supporting a | |
| 5205 | + static library in the future, though presently, the qpdf | |
| 5206 | + Windows build only builds the DLL and executables. | |
| 5207 | + | |
| 5208 | +8.1.0: June 23, 2018 | |
| 5209 | + - Usability Improvements | |
| 5210 | + | |
| 5211 | + - When splitting files, qpdf detects fonts and images that the | |
| 5212 | + document metadata claims are referenced from a page but are not | |
| 5213 | + actually referenced and omits them from the output file. This | |
| 5214 | + change can cause a significant reduction in the size of split | |
| 5215 | + PDF files for files created by some software packages. In some | |
| 5216 | + cases, it can also make page splitting slower. Prior versions | |
| 5217 | + of qpdf would believe the document metadata and sometimes | |
| 5218 | + include all the images from all the other pages even though the | |
| 5219 | + pages were no longer present. In the unlikely event that the | |
| 5220 | + old behavior should be desired, or if you have a case where | |
| 5221 | + page splitting is very slow, the old behavior (and speed) can | |
| 5222 | + be enabled by specifying | |
| 5223 | + @1@option@1@--preserve-unreferenced-resources@2@option@2@. For | |
| 5224 | + additional details, please see `Advanced Transformation | |
| 5225 | + Options <#ref.advanced-transformation>`__. | |
| 5226 | + | |
| 5227 | + - When merging multiple PDF files, qpdf no longer leaves all the | |
| 5228 | + files open. This makes it possible to merge numbers of files | |
| 5229 | + that may exceed the operating system's limit for the maximum | |
| 5230 | + number of open files. | |
| 5231 | + | |
| 5232 | + - The @1@option@1@--rotate@2@option@2@ option's syntax has been | |
| 5233 | + extended to make the page range optional. If you specify | |
| 5234 | + @1@option@1@--rotate=@1@replaceable@1@angle@2@replaceable@2@@2@option@2@ | |
| 5235 | + without specifying a page range, the rotation will be applied | |
| 5236 | + to all pages. This can be especially useful for adjusting a PDF | |
| 5237 | + created from a multi-page document that was scanned upside | |
| 5238 | + down. | |
| 5239 | + | |
| 5240 | + - When merging multiple files, the | |
| 5241 | + @1@option@1@--verbose@2@option@2@ option now prints information | |
| 5242 | + about each file as it operates on that file. | |
| 5243 | + | |
| 5244 | + - When the @1@option@1@--progress@2@option@2@ option is | |
| 5245 | + specified, qpdf will print a running indicator of its best | |
| 5246 | + guess at how far through the writing process it is. Note that, | |
| 5247 | + as with all progress meters, it's an approximation. This option | |
| 5248 | + is implemented in a way that makes it useful for software that | |
| 5249 | + uses the qpdf library; see API Enhancements below. | |
| 5250 | + | |
| 5251 | + - Bug Fixes | |
| 5252 | + | |
| 5253 | + - Properly decrypt files that use revision 3 of the standard | |
| 5254 | + security handler but use 40-bit keys (even though revision 3 | |
| 5255 | + supports 128-bit keys). | |
| 5256 | + | |
| 5257 | + - Limit depth of nested data structures to prevent crashes from | |
| 5258 | + certain types of malformed (malicious) PDFs. | |
| 5259 | + | |
| 5260 | + - In "newline before endstream" mode, insert the required extra | |
| 5261 | + newline before the ``endstream`` at the end of object streams. | |
| 5262 | + This one case was previously omitted. | |
| 5263 | + | |
| 5264 | + - API Enhancements | |
| 5265 | + | |
| 5266 | + - The first round of higher level "helper" interfaces has been | |
| 5267 | + introduced. These are designed to provide a more convenient way | |
| 5268 | + of interacting with certain document features than using | |
| 5269 | + ``QPDFObjectHandle`` directly. For details on helpers, see | |
| 5270 | + `Helper Classes <#ref.helper-classes>`__. Specific additional | |
| 5271 | + interfaces are described below. | |
| 5272 | + | |
| 5273 | + - Add two new document helper classes: ``QPDFPageDocumentHelper`` | |
| 5274 | + for working with pages, and ``QPDFAcroFormDocumentHelper`` for | |
| 5275 | + working with interactive forms. No old methods have been | |
| 5276 | + removed, but ``QPDFPageDocumentHelper`` is now the preferred | |
| 5277 | + way to perform operations on pages rather than calling the old | |
| 5278 | + methods in ``QPDFObjectHandle`` and ``QPDF`` directly. Comments | |
| 5279 | + in the header files direct you to the new interfaces. Please | |
| 5280 | + see the header files and @1@filename@1@ChangeLog@2@filename@2@ | |
| 5281 | + for additional details. | |
| 5282 | + | |
| 5283 | + - Add three new object helper classes: ``QPDFPageObjectHelper`` for | |
| 5284 | + pages, ``QPDFFormFieldObjectHelper`` for interactive form | |
| 5285 | + fields, and ``QPDFAnnotationObjectHelper`` for annotations. All | |
| 5286 | + three classes are fairly sparse at the moment, but they have | |
| 5287 | + some useful, basic functionality. | |
| 5288 | + | |
| 5289 | + - A new example program | |
| 5290 | + @1@filename@1@examples/pdf-set-form-values.cc@2@filename@2@ has | |
| 5291 | + been added that illustrates use of the new document and object | |
| 5292 | + helpers. | |
| 5293 | + | |
| 5294 | + - The method ``QPDFWriter::registerProgressReporter`` has been | |
| 5295 | + added. This method allows you to register a function that is | |
| 5296 | + called by ``QPDFWriter`` to report its estimate of how far, as a | |
| 5297 | + percentage, it is through writing its output. Client programs can | |
| 5298 | + use this to implement reasonably accurate progress meters. The | |
| 5299 | + @1@command@1@qpdf@2@command@2@ command line tool uses this to | |
| 5300 | + implement its @1@option@1@--progress@2@option@2@ option. | |
| 5301 | + | |
| 5302 | + - New methods ``QPDFObjectHandle::newUnicodeString`` and | |
| 5303 | + ``QPDFObject::unparseBinary`` have been added to allow for more | |
| 5304 | + convenient creation of strings that are explicitly encoded | |
| 5305 | + using big-endian UTF-16. This is useful for creating strings | |
| 5306 | + that appear outside of content streams, such as labels, form | |
| 5307 | + fields, outlines, document metadata, etc. | |
| 5308 | + | |
| 5309 | + - A new class ``QPDFObjectHandle::Rectangle`` has been added to | |
| 5310 | + ease working with PDF rectangles, which are just arrays of four | |
| 5311 | + numeric values. | |
| 5312 | + | |
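To make "explicitly encoded using big-endian UTF-16" concrete, the following sketch builds the bytes of such a string. It is an illustration and not the code behind ``QPDFObjectHandle::newUnicodeString``. The helper name is hypothetical, and the sketch relies on the PDF convention that UTF-16 text strings begin with the byte order mark ``FE FF``.

```python
def pdf_utf16be_string(text):
    """Encode text as big-endian UTF-16 with a leading byte order mark."""
    return b"\xfe\xff" + text.encode("utf-16-be")


print(pdf_utf16be_string("A").hex())
print(pdf_utf16be_string("\u03c0").hex())
```

Characters outside the Basic Multilingual Plane are encoded as surrogate pairs by Python's ``utf-16-be`` codec, so they occupy four bytes after the mark.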
| 5313 | +8.0.2: March 6, 2018 | |
| 5314 | + - When a loop is detected while following cross reference streams or | |
| 5315 | + tables, treat this as damage instead of silently ignoring the | |
| 5316 | + previous table. This prevents loss of otherwise recoverable data | |
| 5317 | + in some damaged files. | |
| 5318 | + | |
| 5319 | + - Properly handle pages with no contents. | |
| 5320 | + | |
| 5321 | +8.0.1: March 4, 2018 | |
| 5322 | + - Disregard data check errors when uncompressing ``/FlateDecode`` | |
| 5323 | + streams. This is consistent with most other PDF readers and allows | |
| 5324 | + qpdf to recover data from another class of malformed PDF files. | |
| 5325 | + | |
| 5326 | + - On the command line when specifying page ranges, support preceding | |
| 5327 | + a page number by "r" to indicate that it should be counted from | |
| 5328 | + the end. For example, the range ``r3-r1`` would indicate the last | |
| 5329 | + three pages of a document. | |
| 5330 | + | |
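The "r" prefix in page ranges described above can be resolved with simple arithmetic: page ``rN`` in a document of ``P`` pages is page ``P + 1 - N``. The sketch below is not qpdf's range parser. The function names are hypothetical, and the sketch handles only a single page or a single ascending ``first-last`` range.

```python
def resolve_page(token, page_count):
    """Turn "7" or "r2" into a 1-based page number."""
    if token.startswith("r"):
        # rN counts from the end: r1 is the last page.
        return page_count + 1 - int(token[1:])
    return int(token)


def resolve_range(spec, page_count):
    """Expand one page or one ascending "first-last" range (simplified)."""
    first, sep, last = spec.partition("-")
    start = resolve_page(first, page_count)
    end = resolve_page(last, page_count) if sep else start
    return list(range(start, end + 1))


print(resolve_range("r3-r1", 10))
```

For a 10-page document, ``r3`` resolves to page 8 and ``r1`` to page 10, so ``r3-r1`` selects the last three pages, as in the example in the notes above.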
| 5331 | +8.0.0: February 25, 2018 | |
| 5332 | + - Packaging and Distribution Changes | |
| 5333 | + | |
| 5334 | + - QPDF is now distributed as an | |
| 5335 | + `AppImage <https://appimage.org/>`__ in addition to all the | |
| 5336 | + other ways it is distributed. The AppImage can be found in the | |
| 5337 | + download area with the other packages. Thanks to Kurt Pfeifle | |
| 5338 | + and Simon Peter for their contributions. | |
| 5339 | + | |
| 5340 | + - Bug Fixes | |
| 5341 | + | |
| 5342 | + - ``QPDFObjectHandle::getUTF8Val`` now properly treats | |
| 5343 | + non-Unicode strings as encoded with PDF Doc Encoding. | |
| 5344 | + | |
| 5345 | + - Improvements to handling of objects in PDF files that are not | |
| 5346 | + of the expected type. In most cases, qpdf will be able to warn | |
| 5347 | + for such cases rather than fail with an exception. Previous | |
| 5348 | + versions of qpdf would sometimes fail with errors such as | |
| 5349 | + "operation for dictionary object attempted on object of wrong | |
| 5350 | + type". This situation should be mostly or entirely eliminated | |
| 5351 | + now. | |
| 5352 | + | |
| 5353 | + - Enhancements to the @1@command@1@qpdf@2@command@2@ Command-line | |
| 5354 | + Tool. All new options listed here are documented in more detail in | |
| 5355 | + `Running QPDF <#ref.using>`__. | |
| 5356 | + | |
| 5357 | + - The option | |
| 5358 | + @1@option@1@--linearize-pass1=@1@replaceable@1@file@2@replaceable@2@@2@option@2@ | |
| 5359 | + has been added for debugging qpdf's linearization code. | |
| 5360 | + | |
| 5361 | + - The option @1@option@1@--coalesce-contents@2@option@2@ can be | |
| 5362 | + used to combine content streams of a page whose contents are an | |
| 5363 | + array of streams into a single stream. | |
| 5364 | + | |
| 5365 | + - API Enhancements. All new API calls are documented in their | |
| 5366 | + respective classes' header files. There are no non-compatible | |
| 5367 | + changes to the API. | |
| 5368 | + | |
| 5369 | + - Add function ``qpdf_check_pdf`` to the C API. This function | |
| 5370 | + does basic checking that is a subset of what @1@command@1@qpdf | |
| 5371 | + --check@2@command@2@ performs. | |
| 5372 | + | |
| 5373 | + - Major enhancements to the lexical layer of qpdf. For a complete | |
| 5374 | + list of enhancements, please refer to the | |
| 5375 | + @1@filename@1@ChangeLog@2@filename@2@ file. Most of the changes | |
| 5376 | + result in improvements to qpdf's ability to handle erroneous | |
| 5377 | + files. It is also possible for programs to handle whitespace, | |
| 5378 | + comments, and inline images as tokens. | |
| 5379 | + | |
| 5380 | + - New API for working with PDF content streams at a lexical | |
| 5381 | + level. The new class ``QPDFObjectHandle::TokenFilter`` allows | |
| 5382 | + the developer to provide token handlers. Token filters can be | |
| 5383 | + used with several different methods in ``QPDFObjectHandle`` as | |
| 5384 | + well as with a lower-level interface. See comments in | |
| 5385 | + @1@filename@1@QPDFObjectHandle.hh@2@filename@2@ as well as the | |
| 5386 | + new examples | |
| 5387 | + @1@filename@1@examples/pdf-filter-tokens.cc@2@filename@2@ and | |
| 5388 | + @1@filename@1@examples/pdf-count-strings.cc@2@filename@2@ for | |
| 5389 | + details. | |
| 5390 | + | |
| 5391 | +7.1.1: February 4, 2018 | |
| 5392 | + - Bug fix: files whose /ID fields were other than 16 bytes long can | |
| 5393 | + now be properly linearized. | |
| 5394 | + | |
| 5395 | + - A few compile and link issues have been corrected for some | |
| 5396 | + platforms. | |
| 5397 | + | |
| 5398 | +7.1.0: January 14, 2018 | |
| 5399 | + - PDF files contain streams that may be compressed with various | |
| 5400 | + compression algorithms which, in some cases, may be enhanced by | |
| 5401 | + various predictor functions. Previously only the PNG up predictor | |
| 5402 | + was supported. In this version, all the PNG predictors as well as | |
| 5403 | + the TIFF predictor are supported. This increases the range of | |
| 5404 | + files that qpdf is able to handle. | |
| 5405 | + | |
| 5406 | + - QPDF now allows a raw encryption key to be specified in place of a | |
| 5407 | + password when opening encrypted files, and will optionally display | |
| 5408 | + the encryption key used by a file. This is a non-standard | |
| 5409 | + operation, but it can be useful in certain situations. Please see | |
| 5410 | + the discussion of @1@option@1@--password-is-hex-key@2@option@2@ in | |
| 5411 | + `Basic Options <#ref.basic-options>`__ or the comments around | |
| 5412 | + ``QPDF::setPasswordIsHexKey`` in | |
| 5413 | + @1@filename@1@QPDF.hh@2@filename@2@ for additional details. | |
| 5414 | + | |
| 5415 | + - Bug fix: numbers ending with a trailing decimal point are now | |
| 5416 | + properly recognized as numbers. | |
| 5417 | + | |
| 5418 | + - Bug fix: when building qpdf from source on some platforms | |
| 5419 | + (especially MacOS), the build could get confused by older versions | |
| 5420 | + of qpdf installed on the system. This has been corrected. | |
| 5421 | + | |
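For readers unfamiliar with predictor functions, here is a minimal Python sketch of undoing the PNG "up" predictor, the one that was supported before 7.1.0. It is an illustration, not qpdf's code: `png_up_decode` is a hypothetical name, and the sketch assumes every row starts with the PNG filter-type byte 2 (Up) followed by `columns` data bytes.

```python
def png_up_decode(data: bytes, columns: int) -> bytes:
    """Undo the PNG Up predictor on already-decompressed stream data.

    Each row is one filter-type byte followed by `columns` bytes. For the
    Up filter (type 2), each stored byte is the difference from the byte
    directly above it in the previous row, modulo 256.
    Hypothetical helper; other filter types are rejected.
    """
    row_length = columns + 1
    if len(data) % row_length != 0:
        raise ValueError("data is not a whole number of rows")
    previous = bytes(columns)  # the row above the first row is all zeros
    out = bytearray()
    for start in range(0, len(data), row_length):
        row = data[start:start + row_length]
        if row[0] != 2:
            raise ValueError("this sketch only handles the Up filter")
        current = bytes((row[1 + i] + previous[i]) & 0xFF for i in range(columns))
        out += current
        previous = current
    return bytes(out)
```

Supporting the remaining PNG predictors and the TIFF predictor amounts to adding the other reconstruction rules alongside this one.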
| 5422 | +7.0.0: September 15, 2017 | |
| 5423 | + - Packaging and Distribution Changes | |
| 5424 | + | |
| 5425 | + - QPDF's primary license is now `version 2.0 of the Apache | |
| 5426 | + License <http://www.apache.org/licenses/LICENSE-2.0>`__ rather | |
| 5427 | + than version 2.0 of the Artistic License. You may still, at | |
| 5428 | + your option, consider qpdf to be licensed with version 2.0 of | |
| 5429 | + the Artistic License. | |
| 5430 | + | |
| 5431 | + - QPDF no longer has a dependency on the PCRE (Perl-Compatible | |
| 5432 | + Regular Expression) library. QPDF now has an added dependency | |
| 5433 | + on the JPEG library. | |
| 5434 | + | |
| 5435 | + - Bug Fixes | |
| 5436 | + | |
| 5437 | + - This release contains many bug fixes for various infinite | |
| 5438 | + loops, memory leaks, and other memory errors that could be | |
| 5439 | + encountered with specially crafted or otherwise erroneous PDF | |
| 5440 | + files. | |
| 5441 | + | |
| 5442 | + - New Features | |
| 5443 | + | |
| 5444 | + - QPDF now supports reading and writing streams encoded with JPEG | |
| 5445 | + or RunLength encoding. Library API enhancements and | |
| 5446 | + command-line options have been added to control this behavior. | |
| 5447 | + See command-line options | |
| 5448 | + @1@option@1@--compress-streams@2@option@2@ and | |
| 5449 | + @1@option@1@--decode-level@2@option@2@ and methods | |
| 5450 | + ``QPDFWriter::setCompressStreams`` and | |
| 5451 | + ``QPDFWriter::setDecodeLevel``. | |
| 5452 | + | |
| 5453 | + - QPDF is much better at recovering from broken files. In most | |
| 5454 | + cases, qpdf will skip invalid objects and will preserve broken | |
| 5455 | + stream data by not attempting to filter broken streams. QPDF is | |
| 5456 | + now able to recover or at least not crash on dozens of broken | |
| 5457 | + test files I have received over the past few years. | |
| 5458 | + | |
| 5459 | + - Page rotation is now supported and accessible from both the | |
| 5460 | + library and the command line. | |
| 5461 | + | |
| 5462 | + - ``QPDFWriter`` supports writing files in a way that preserves | |
| 5463 | + PCLm compliance in support of driverless printing. This is very | |
| 5464 | + specialized and is only useful to applications that already | |
| 5465 | + know how to create PCLm files. | |
| 5466 | + | |
| 5467 | + - Enhancements to the @1@command@1@qpdf@2@command@2@ Command-line | |
| 5468 | + Tool. All new options listed here are documented in more detail in | |
| 5469 | + `Running QPDF <#ref.using>`__. | |
| 5470 | + | |
| 5471 | + - Command-line arguments can now be read from files or standard | |
| 5472 | + input using ``@file`` or ``@-`` syntax. Please see `Basic | |
| 5473 | + Invocation <#ref.invocation>`__. | |
| 5474 | + | |
| 5475 | + - @1@option@1@--rotate@2@option@2@: request page rotation | |
| 5476 | + | |
| 5477 | + - @1@option@1@--newline-before-endstream@2@option@2@: ensure that | |
| 5478 | + a newline appears before every ``endstream`` keyword in the | |
| 5479 | + file; used to prevent qpdf from breaking PDF/A compliance on | |
| 5480 | + already compliant files. | |
| 5481 | + | |
| 5482 | + - @1@option@1@--preserve-unreferenced@2@option@2@: preserve | |
| 5483 | + unreferenced objects in the input PDF | |
| 5484 | + | |
| 5485 | + - @1@option@1@--split-pages@2@option@2@: break output into chunks | |
| 5486 | + with fixed numbers of pages | |
| 5487 | + | |
| 5488 | + - @1@option@1@--verbose@2@option@2@: print the name of each | |
| 5489 | + output file that is created | |
| 5490 | + | |
| 5491 | + - @1@option@1@--compress-streams@2@option@2@ and | |
| 5492 | + @1@option@1@--decode-level@2@option@2@ replace | |
| 5493 | + @1@option@1@--stream-data@2@option@2@ for improving granularity | |
| 5494 | + of controlling compression and decompression of stream data. | |
| 5495 | + The @1@option@1@--stream-data@2@option@2@ option will remain | |
| 5496 | + available. | |
| 5497 | + | |
| 5498 | + - When running @1@command@1@qpdf --check@2@command@2@ with other | |
| 5499 | + options, checks are always run first. This enables qpdf to | |
| 5500 | + perform its full recovery logic before outputting other | |
| 5501 | + information. This can be especially useful when manually | |
| 5502 | + recovering broken files, looking at qpdf's regenerated cross | |
| 5503 | + reference table, or other similar operations. | |
| 5504 | + | |
| 5505 | + - Process @1@command@1@--pages@2@command@2@ earlier so that other | |
| 5506 | + options like @1@option@1@--show-pages@2@option@2@ or | |
| 5507 | + @1@option@1@--split-pages@2@option@2@ can operate on the file | |
| 5508 | + after page splitting/merging has occurred. | |
| 5509 | + | |
| 5510 | + - API Changes. All new API calls are documented in their respective | |
| 5511 | + classes' header files. | |
| 5512 | + | |
| 5513 | + - ``QPDFObjectHandle::rotatePage``: apply rotation to a page | |
| 5514 | + object | |
| 5515 | + | |
| 5516 | + - ``QPDFWriter::setNewlineBeforeEndstream``: force newline to | |
| 5517 | + appear before ``endstream`` | |
| 5518 | + | |
| 5519 | + - ``QPDFWriter::setPreserveUnreferencedObjects``: preserve | |
| 5520 | + unreferenced objects that appear in the input PDF. The default | |
| 5521 | + behavior is to discard them. | |
| 5522 | + | |
| 5523 | + - New ``Pipeline`` types ``Pl_RunLength`` and ``Pl_DCT`` are | |
| 5524 | + available for developers who wish to produce or consume | |
| 5525 | + RunLength or DCT stream data directly. The | |
| 5526 | + @1@filename@1@examples/pdf-create.cc@2@filename@2@ example | |
| 5527 | + illustrates their use. | |
| 5528 | + | |
| 5529 | + - ``QPDFWriter::setCompressStreams`` and | |
| 5530 | + ``QPDFWriter::setDecodeLevel`` methods control handling of | |
| 5531 | + different types of stream compression. | |
| 5532 | + | |
| 5533 | + - Add new C API functions ``qpdf_set_compress_streams``, | |
| 5534 | + ``qpdf_set_decode_level``, | |
| 5535 | + ``qpdf_set_preserve_unreferenced_objects``, and | |
| 5536 | + ``qpdf_set_newline_before_endstream`` corresponding to the new | |
| 5537 | + ``QPDFWriter`` methods. | |
| 5538 | + | |
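To give a sense of the RunLength encoding mentioned under New Features, here is a small Python decoder sketch. It follows the run-length scheme defined for PDF's RunLengthDecode filter but is not qpdf's ``Pl_RunLength`` implementation; `run_length_decode` is a hypothetical name.

```python
def run_length_decode(data: bytes) -> bytes:
    """Decode data in the format used by PDF's RunLengthDecode filter.

    A length byte of 0 to 127 means "copy the next length + 1 bytes".
    A length byte of 129 to 255 means "repeat the next byte 257 - length
    times". A length byte of 128 marks end of data.
    Hypothetical helper for illustration only.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        length = data[i]
        i += 1
        if length == 128:
            break
        if length < 128:
            count = length + 1
            out += data[i:i + count]
            i += count
        else:
            out += data[i:i + 1] * (257 - length)
            i += 1
    return bytes(out)


if __name__ == "__main__":
    # Three literal bytes "ABC", then "X" repeated three times, then EOD.
    print(run_length_decode(bytes([2, 65, 66, 67, 254, 88, 128])))
```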
| 5539 | +6.0.0: November 10, 2015 | |
| 5540 | + - Implement @1@option@1@--deterministic-id@2@option@2@ command-line | |
| 5541 | + option and ``QPDFWriter::setDeterministicID`` as well as C API | |
| 5542 | + function ``qpdf_set_deterministic_ID`` for generating a | |
| 5543 | + deterministic ID for non-encrypted files. When this option is | |
| 5544 | + selected, the ID of the file depends on the contents of the output | |
| 5545 | + file, and not on transient items such as the timestamp or output | |
| 5546 | + file name. | |
| 5547 | + | |
| 5548 | + - Make qpdf more tolerant of files whose xref table entries are not | |
| 5549 | + the correct length. | |
| 5550 | + | |
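The difference between a deterministic ID and one based on transient items can be sketched in Python as follows. This is only an illustration of the idea: the function names are hypothetical, and the use of truncated SHA-256 is an assumption made for the sketch, not a description of how qpdf computes IDs.

```python
import hashlib


def deterministic_id(output_bytes: bytes) -> bytes:
    """Derive a 16-byte ID from the output contents alone (illustrative)."""
    return hashlib.sha256(output_bytes).digest()[:16]


def transient_id(output_name: str, timestamp: float) -> bytes:
    """Derive a 16-byte ID from transient items (illustrative).

    Two runs over identical input produce different IDs because the
    timestamp, and possibly the output file name, differ.
    """
    seed = f"{timestamp}:{output_name}".encode("utf-8")
    return hashlib.sha256(seed).digest()[:16]
```

With the first approach, writing the same output twice yields the same ID, which is useful when comparing generated files byte for byte.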
| 5551 | +5.1.3: May 24, 2015 | |
| 5552 | + - Bug fix: fix-qdf was not properly handling files that contained | |
| 5553 | + object streams with more than 255 objects in them. | |
| 5554 | + | |
| 5555 | + - Bug fix: qpdf was not properly initializing Microsoft's secure | |
| 5556 | + crypto provider on fresh Windows installations that had not had | |
| 5557 | + any keys created yet. | |
| 5558 | + | |
| 5559 | + - Fix a few errors found by Gynvael Coldwind and Mateusz Jurczyk of | |
| 5560 | + the Google Security Team. Please see the ChangeLog for details. | |
| 5561 | + | |
| 5562 | + - Properly handle pages that have no contents at all. There were | |
| 5563 | + many cases in which qpdf handled this fine, but a few methods | |
| 5564 | + blindly obtained page contents without handling the possibility that | |
| 5565 | + there were no contents. | |
| 5566 | + | |
| 5567 | + - Make qpdf more robust for a few more kinds of problems that may | |
| 5568 | + occur in invalid PDF files. | |
| 5569 | + | |
| 5570 | +5.1.2: June 7, 2014 | |
| 5571 | + - Bug fix: linearizing files could create a corrupted output file | |
| 5572 | + under extremely unlikely file size circumstances. See ChangeLog | |
| 5573 | + for details. The odds of getting hit by this are very low, though | |
| 5574 | + one person did. | |
| 5575 | + | |
| 5576 | + - Bug fix: qpdf would fail to write files that had streams with | |
| 5577 | + decode parameters referencing other streams. | |
| 5578 | + | |
| 5579 | + - New example program: @1@command@1@pdf-split-pages@2@command@2@: | |
| 5580 | + efficiently split PDF files into individual pages. The example | |
| 5581 | + program does this more efficiently than using @1@command@1@qpdf | |
| 5582 | + --pages@2@command@2@ to do it. | |
| 5583 | + | |
| 5584 | + - Packaging fix: Visual C++ binaries did not support Windows XP. | |
| 5585 | + This has been rectified by updating the compilers used to generate | |
| 5586 | + the release binaries. | |
| 5587 | + | |
| 5588 | +5.1.1: January 14, 2014 | |
| 5589 | + - Performance fix: copying foreign objects could be very slow with | |
| 5590 | + certain types of files. This was most likely to be visible during | |
| 5591 | + page splitting and was due to traversing the same objects multiple | |
| 5592 | + times in some cases. | |
| 5593 | + | |
| 5594 | +5.1.0: December 17, 2013 | |
| 5595 | + - Added runtime option (``QUtil::setRandomDataProvider``) to supply | |
| 5596 | + your own random data provider. You can use this if you want to | |
| 5597 | + avoid using the OS-provided secure random number generation | |
| 5598 | + facility or stdlib's less secure version. See comments in | |
| 5599 | + include/qpdf/QUtil.hh for details. | |
| 5600 | + | |
| 5601 | + - Fixed image comparison tests to not create 12-bit-per-pixel images | |
| 5602 | + since some versions of tiffcmp have bugs in comparing them in some | |
| 5603 | + cases. This increases the disk space required by the image | |
| 5604 | + comparison tests, which are off by default anyway. | |
| 5605 | + | |
| 5606 | + - Introduce a number of small fixes for compilation on the latest | |
| 5607 | + clang in MacOS and the latest Visual C++ in Windows. | |
| 5608 | + | |
| 5609 | + - Be able to handle broken files that end the xref table header with | |
| 5610 | + a space instead of a newline. | |
| 5611 | + | |
| 5612 | +5.0.1: October 18, 2013 | |
| 5613 | + - Thanks to a detailed review by Florian Weimer and the Red Hat | |
| 5614 | + Product Security Team, this release includes a number of | |
| 5615 | + non-user-visible security hardening changes. Please see the | |
| 5616 | + ChangeLog file in the source distribution for the complete list. | |
| 5617 | + | |
| 5618 | + - When available, operating system-specific secure random number | |
| 5619 | + generation is used for generating initialization vectors and other | |
| 5620 | + random values used during encryption or file creation. For the | |
| 5621 | + Windows build, this results in an added dependency on Microsoft's | |
| 5622 | + cryptography API. To disable the OS-specific cryptography and use | |
| 5623 | + the old version, pass the | |
| 5624 | + @1@option@1@--enable-insecure-random@2@option@2@ option to | |
| 5625 | + @1@command@1@./configure@2@command@2@. | |
| 5626 | + | |
| 5627 | + - The @1@command@1@qpdf@2@command@2@ command-line tool now issues a | |
| 5628 | + warning when @1@option@1@--accessibility=n@2@option@2@ is specified | |
| 5629 | + for newer encryption versions stating that the option is ignored. | |
| 5630 | + qpdf, per the spec, has always ignored this flag, but it | |
| 5631 | + previously did so silently. This warning is issued only by the | |
| 5632 | + command-line tool, not by the library. The library's handling of | |
| 5633 | + this flag is unchanged. | |
| 5634 | + | |
| 5635 | +5.0.0: July 10, 2013 | |
| 5636 | + - Bug fix: previous versions of qpdf would lose objects with | |
| 5637 | + generation != 0 when generating object streams. Fixing this | |
| 5638 | + required changes to the public API. | |
| 5639 | + | |
| 5640 | + - Removed methods from public API that were only supposed to be | |
| 5641 | + called by QPDFWriter and couldn't realistically be called anywhere | |
| 5642 | + else. See ChangeLog for details. | |
| 5643 | + | |
| 5644 | + - New ``QPDFObjGen`` class added to represent an object | |
| 5645 | + ID/generation pair. ``QPDFObjectHandle::getObjGen()`` is now | |
| 5646 | + preferred over ``QPDFObjectHandle::getObjectID()`` and | |
| 5647 | + ``QPDFObjectHandle::getGeneration()`` as it makes it less likely | |
| 5648 | + for people to accidentally write code that ignores the generation | |
| 5649 | + number. See @1@filename@1@QPDF.hh@2@filename@2@ and | |
| 5650 | + @1@filename@1@QPDFObjectHandle.hh@2@filename@2@ for additional | |
| 5651 | + notes. | |
| 5652 | + | |
| 5653 | + - Add @1@option@1@--show-npages@2@option@2@ command-line option to | |
| 5654 | + the @1@command@1@qpdf@2@command@2@ command to show the number of | |
| 5655 | + pages in a file. | |
| 5656 | + | |
| 5657 | + - Allow omission of the page range within | |
| 5658 | + @1@option@1@--pages@2@option@2@ for the | |
| 5659 | + @1@command@1@qpdf@2@command@2@ command. When omitted, the page | |
| 5660 | + range is implicitly taken to be all the pages in the file. | |
| 5661 | + | |
| 5662 | + - Various enhancements were made to support different types of | |
| 5663 | + broken files or broken readers. Details can be found in | |
| 5664 | + @1@filename@1@ChangeLog@2@filename@2@. | |
| 5665 | + | |
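The motivation for an object ID/generation pair can be shown with a short Python sketch. It is a generic illustration, not qpdf code: `ObjGen` here is a plain named tuple, and both lookup functions are hypothetical. The point is that code which looks objects up by ID alone, implicitly assuming generation 0, misses objects whose generation is not 0.

```python
from typing import NamedTuple


class ObjGen(NamedTuple):
    """An object ID together with its generation number (illustrative)."""
    obj: int
    gen: int


def lookup_ignoring_generation(table, obj_id):
    """Buggy pattern: assumes every object has generation 0."""
    return table.get(ObjGen(obj_id, 0))


def lookup(table, og):
    """Preferred pattern: the key always carries the generation."""
    return table.get(og)


if __name__ == "__main__":
    table = {ObjGen(7, 1): "reused object", ObjGen(8, 0): "ordinary object"}
    print(lookup_ignoring_generation(table, 7))  # the object is missed
    print(lookup(table, ObjGen(7, 1)))           # the object is found
```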
| 5666 | +4.1.0: April 14, 2013 | |
| 5667 | + - Note to people including qpdf in distributions: the | |
| 5668 | + @1@filename@1@.la@2@filename@2@ files generated by libtool are now | |
| 5669 | + installed by qpdf's @1@command@1@make install@2@command@2@ target. | |
| 5670 | + Before, they were not installed. This means that if your | |
| 5671 | + distribution does not want to include | |
| 5672 | + @1@filename@1@.la@2@filename@2@ files, you must remove them as | |
| 5673 | + part of your packaging process. | |
| 5674 | + | |
| 5675 | + - Major enhancement: API enhancements have been made to support | |
| 5676 | + parsing of content streams. This enhancement includes the | |
| 5677 | + following changes: | |
| 5678 | + | |
| 5679 | + - ``QPDFObjectHandle::parseContentStream`` method parses objects | |
| 5680 | + in a content stream and calls handlers in a callback class. The | |
| 5681 | + example | |
| 5682 | + @1@filename@1@examples/pdf-parse-content.cc@2@filename@2@ | |
| 5683 | + illustrates how this may be used. | |
| 5684 | + | |
| 5685 | + - ``QPDFObjectHandle`` can now represent operators and inline | |
| 5686 | + images, object types that may only appear in content streams. | |
| 5687 | + | |
| 5688 | + - Method ``QPDFObjectHandle::getTypeCode()`` returns an | |
| 5689 | + enumerated type value representing the underlying object type. | |
| 5690 | + Method ``QPDFObjectHandle::getTypeName()`` returns a text | |
| 5691 | + string describing the name of the type of a | |
| 5692 | + ``QPDFObjectHandle`` object. These methods can be used for more | |
| 5693 | + efficient parsing and debugging/diagnostic messages. | |
| 5694 | + | |
| 5695 | + - @1@command@1@qpdf --check@2@command@2@ now parses all pages' | |
| 5696 | + content streams in addition to doing other checks. While there are | |
| 5697 | + still many types of errors that cannot be detected, syntactic | |
| 5698 | + errors in content streams will now be reported. | |
| 5699 | + | |
| 5700 | + - Minor compilation enhancements have been made to facilitate | |
| 5701 | + support for a broader range of compilers and compiler | |
| 5702 | + versions. | |
| 5703 | + | |
| 5704 | + - Warning flags have been moved into a separate variable in | |
| 5705 | + @1@filename@1@autoconf.mk@2@filename@2@. | |
| 5706 | + | |
| 5707 | + - The configure flag @1@option@1@--enable-werror@2@option@2@ works | |
| 5708 | + for Microsoft compilers. | |
| 5709 | + | |
| 5710 | + - All MSVC CRT security warnings have been resolved. | |
| 5711 | + | |
| 5712 | + - All C-style casts in C++ code have been replaced by C++ casts, | |
| 5713 | + and many casts that had been included to suppress higher | |
| 5714 | + warning levels for some compilers have been removed, primarily | |
| 5715 | + for clarity. Places where integer type coercion occurs have | |
| 5716 | + been scrutinized. A new casting policy has been documented in | |
| 5717 | + the manual. This is of concern mainly to people porting qpdf to | |
| 5718 | + new platforms or compilers. It is not visible to programmers | |
| 5719 | + writing code that uses the library. | |
| 5720 | + | |
| 5721 | + - Some internal limits have been removed in code that converts | |
| 5722 | + numbers to strings. This is largely invisible to users, but it | |
| 5723 | + does trigger a bug in some older versions of mingw-w64's C++ | |
| 5724 | + library. See @1@filename@1@README-windows.md@2@filename@2@ in | |
| 5725 | + the source distribution if you think this may affect you. The | |
| 5726 | + copy of the DLL distributed with qpdf's binary distribution is | |
| 5727 | + not affected by this problem. | |
| 5728 | + | |
| 5729 | + - The RPM spec file previously included with qpdf has been removed. | |
| 5730 | + This is because virtually all Linux distributions include qpdf now | |
| 5731 | + that it is a dependency of CUPS filters. | |
| 5732 | + | |
| 5733 | + - A few bug fixes are included: | |
| 5734 | + | |
| 5735 | + - Overridden compressed objects are properly handled. Before, | |
| 5736 | + there were certain constructs that could cause qpdf to see old | |
| 5737 | + versions of some objects. The most usual manifestation of this | |
| 5738 | + was loss of filled in form values for certain files. | |
| 5739 | + | |
| 5740 | + - Installation no longer uses GNU/Linux-specific versions of some | |
| 5741 | + commands, so @1@command@1@make install@2@command@2@ works on | |
| 5742 | + Solaris with native tools. | |
| 5743 | + | |
| 5744 | + - The 64-bit mingw Windows binary package no longer includes a | |
| 5745 | + 32-bit DLL. | |
| 5746 | + | |
| 5747 | +4.0.1: January 17, 2013 | |
| 5748 | + - Fix detection of binary attachments in test suite to avoid false | |
| 5749 | + test failures on some platforms. | |
| 5750 | + | |
| 5751 | + - Add clarifying comment in @1@filename@1@QPDF.hh@2@filename@2@ to | |
| 5752 | + methods that return the user password explaining that it is no | |
| 5753 | + longer possible with newer encryption formats to recover the user | |
| 5754 | + password knowing the owner password. In earlier encryption | |
| 5755 | + formats, the user password was encrypted in the file using the | |
| 5756 | + owner password. In newer encryption formats, a separate encryption | |
| 5757 | + key is used on the file, and that key is independently encrypted | |
| 5758 | + using both the user password and the owner password. | |
| 5759 | + | |
| 5760 | +4.0.0: December 31, 2012 | |
| 5761 | + - Major enhancement: support has been added for newer encryption | |
| 5762 | + schemes supported by version X of Adobe Acrobat. This includes use | |
| 5763 | + of 127-character passwords, 256-bit encryption keys, and the | |
| 5764 | + encryption scheme specified in ISO 32000-2, the PDF 2.0 | |
| 5765 | + specification. This scheme can be chosen from the command line by | |
| 5766 | + specifying use of 256-bit keys. qpdf also supports the deprecated | |
| 5767 | + encryption method used by Acrobat IX. This encryption style has | |
| 5768 | + known security weaknesses and should not be used in practice. | |
| 5769 | + However, such files exist "in the wild," so support for this | |
| 5770 | + scheme is still useful. New methods | |
| 5771 | + ``QPDFWriter::setR6EncryptionParameters`` (for the PDF 2.0 scheme) | |
| 5772 | + and ``QPDFWriter::setR5EncryptionParameters`` (for the deprecated | |
| 5773 | + scheme) have been added to enable these new encryption schemes. | |
| 5774 | + Corresponding functions have been added to the C API as well. | |
| 5775 | + | |
| 5776 | + - Full support for Adobe extension levels in PDF version | |
| 5777 | + information. Starting with PDF version 1.7, corresponding to ISO | |
| 5778 | + 32000, Adobe adds new functionality by increasing the extension | |
| 5779 | + level rather than increasing the version. This support includes | |
| 5780 | + addition of the ``QPDF::getExtensionLevel`` method for retrieving | |
| 5781 | + the document's extension level, addition of versions of | |
| 5782 | + ``QPDFWriter::setMinimumPDFVersion`` and | |
| 5783 | + ``QPDFWriter::forcePDFVersion`` that accept an extension level, | |
| 5784 | + and extended syntax for specifying forced and minimum versions on | |
| 5785 | + the command line as described in `Advanced Transformation | |
| 5786 | + Options <#ref.advanced-transformation>`__. Corresponding functions | |
| 5787 | + have been added to the C API as well. | |
| 5788 | + | |
| 5789 | + - Minor fixes to prevent qpdf from referencing objects in the file | |
| 5790 | + that are not referenced in the file's overall structure. Most | |
| 5791 | + files don't have any such objects, but some files contain | |
| 5792 | + unreferenced objects with errors, so these fixes prevent qpdf from | |
| 5793 | + needlessly rejecting or complaining about such objects. | |
| 5794 | + | |
| 5795 | + - Add new generalized methods for reading and writing files from/to | |
| 5796 | + programmer-defined sources. The method | |
| 5797 | + ``QPDF::processInputSource`` allows the programmer to use any | |
| 5798 | + input source for the input file, and | |
| 5799 | + ``QPDFWriter::setOutputPipeline`` allows the programmer to write | |
| 5800 | + the output file through any pipeline. These methods would make it | |
| 5801 | + possible to perform any number of specialized operations, such as | |
| 5802 | + accessing external storage systems, creating bindings for qpdf in | |
| 5803 | + other programming languages that have their own I/O systems, etc. | |
| 5804 | + | |
| 5805 | + - Add new method ``QPDF::getEncryptionKey`` for retrieving the | |
| 5806 | + underlying encryption key used in the file. | |
| 5807 | + | |
| 5808 | + - This release includes a small handful of non-compatible API | |
| 5809 | + changes. While effort is made to avoid such changes, all the | |
| 5810 | + non-compatible API changes in this version were to parts of the | |
| 5811 | + API that would likely never be used outside the library itself. In | |
| 5812 | + all cases, the altered methods or structures either were parts of | |
| 5813 | + the ``QPDF`` class that were public to enable them to be called from | |
| 5814 | + ``QPDFWriter`` or were part of validation code that was | |
| 5815 | + over-zealous in reporting problems in parts of the file that would | |
| 5816 | + not ordinarily be referenced. In no case did any of the removed | |
| 5817 | + methods do anything worse than falsely report error conditions in | |
| 5818 | + files that were broken in ways that didn't matter. The following | |
| 5819 | + public parts of the ``QPDF`` class were changed in a | |
| 5820 | + non-compatible way: | |
| 5821 | + | |
| 5822 | + - Updated nested ``QPDF::EncryptionData`` class to add fields | |
| 5823 | + needed by the newer encryption formats; member variables were | |
| 5824 | + changed to private so that future changes will not require | |
| 5825 | + breaking backward compatibility. | |
| 5826 | + | |
| 5827 | + - Added additional parameters to ``compute_data_key``, which is | |
| 5828 | + used by ``QPDFWriter`` to compute the encryption key used to | |
| 5829 | + encrypt a specific object. | |
| 5830 | + | |
| 5831 | + - Removed the method ``flattenScalarReferences``. This method was | |
| 5832 | + previously used prior to writing a new PDF file, but it has the | |
| 5833 | + undesired side effect of causing qpdf to read objects in the | |
| 5834 | + file that were not referenced. Some otherwise valid files have | |
| 5835 | + unreferenced objects with errors in them, so this could cause | |
| 5836 | + qpdf to reject files that would be accepted by virtually all | |
| 5837 | + other PDF readers. In fact, qpdf relied on only a very small | |
| 5838 | + part of what flattenScalarReferences did, so only this part has | |
| 5839 | + been preserved, and it is now done directly inside | |
| 5840 | + ``QPDFWriter``. | |
| 5841 | + | |
| 5842 | + - Removed the method ``decodeStreams``. This method was used by | |
| 5843 | + the @1@option@1@--check@2@option@2@ option of the | |
| 5844 | + @1@command@1@qpdf@2@command@2@ command-line tool to force all | |
| 5845 | + streams in the file to be decoded, but it also suffered from | |
| 5846 | + the problem of opening otherwise unreferenced streams and thus | |
| 5847 | + could report false positives. The | |
| 5848 | + @1@option@1@--check@2@option@2@ option now causes qpdf to go | |
| 5849 | + through all the motions of writing a new file based on the | |
| 5850 | + original one, so it will always reference and check exactly | |
| 5851 | + those parts of a file that any ordinary viewer would check. | |
| 5852 | + | |
| 5853 | + - Removed the method ``trimTrailerForWrite``. This method was | |
| 5854 | + used by ``QPDFWriter`` to modify the original QPDF object by | |
| 5855 | + removing fields from the trailer dictionary that wouldn't apply | |
| 5856 | + to the newly written file. This functionality, though generally | |
| 5857 | + harmless, was a poor implementation and has been replaced by | |
| 5858 | + having QPDFWriter filter these out when copying the trailer | |
| 5859 | + rather than modifying the original QPDF object. (Note that qpdf | |
| 5860 | + never modifies the original file itself.) | |
| 5861 | + | |
| 5862 | + - Allow the PDF header to appear anywhere in the first 1024 bytes of | |
| 5863 | + the file. This is consistent with what other readers do. | |
| 5864 | + | |
| 5865 | + - Fix the @1@command@1@pkg-config@2@command@2@ files to list zlib | |
| 5866 | + and pcre in ``Requires.private`` to better support static linking | |
| 5867 | + using @1@command@1@pkg-config@2@command@2@. | |
| 5868 | + | |
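The relaxed header rule in 4.0.0 can be sketched in a few lines of Python. This is an illustration rather than qpdf's code: `find_pdf_header` is a hypothetical name, and the sketch assumes the whole `%PDF-` marker must lie within the first 1024 bytes.

```python
def find_pdf_header(data: bytes, window: int = 1024):
    """Return the offset of "%PDF-" within the first `window` bytes, or None.

    Hypothetical helper; a real reader would also parse the version that
    follows the marker.
    """
    position = data[:window].find(b"%PDF-")
    return position if position >= 0 else None


if __name__ == "__main__":
    # Leading junk before the header is tolerated.
    print(find_pdf_header(b"junk\n%PDF-1.7\n"))
```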
| 5869 | +3.0.2: September 6, 2012 | |
| 5870 | + - Bug fix: ``QPDFWriter::setOutputMemory`` did not work when not | |
| 5871 | + used with ``QPDFWriter::setStaticID``, which made it pretty much | |
| 5872 | + useless. This has been fixed. | |
| 5873 | + | |
| 5874 | + - New API call ``QPDFWriter::setExtraHeaderText`` inserts additional | |
| 5875 | + text near the header of the PDF file. The intended use case is to | |
| 5876 | + insert comments that may be consumed by a downstream application, | |
| 5877 | + though other use cases may exist. | |
| 5878 | + | |
| 5879 | +3.0.1: August 11, 2012 | |
| 5880 | + - Version 3.0.0 included addition of files for | |
| 5881 | + @1@command@1@pkg-config@2@command@2@, but this was not mentioned | |
| 5882 | + in the release notes. The release notes for 3.0.0 were updated to | |
| 5883 | + mention this. | |
| 5884 | + | |
| 5885 | + - Bug fix: if an object stream ended with a scalar object not | |
| 5886 | + followed by space, qpdf would incorrectly report that it | |
| 5887 | + encountered a premature EOF. This bug has been in qpdf since | |
| 5888 | + version 2.0. | |
| 5889 | + | |
| 5890 | +3.0.0: August 2, 2012 | |
| 5891 | + - Acknowledgment: I would like to express gratitude for the | |
| 5892 | + contributions of Tobias Hoffmann toward the release of qpdf | |
| 5893 | + version 3.0. He is responsible for most of the implementation and | |
| 5894 | + design of the new API for manipulating pages, and contributed code | |
| 5895 | + and ideas for many of the improvements made in version 3.0. | |
| 5896 | + Without his work, this release would certainly not have happened | |
| 5897 | + as soon as it did, if at all. | |
| 5898 | + | |
| 5899 | + - *Non-compatible API change:* The version of | |
| 5900 | + ``QPDFObjectHandle::replaceStreamData`` that uses a | |
| 5901 | + ``StreamDataProvider`` no longer requires (or accepts) a | |
| 5902 | + ``length`` parameter. See | |
| 5903 | + `appendix_title <#ref.upgrading-to-3.0>`__ for an explanation. | |
| 5904 | + While care is taken to avoid non-compatible API changes in | |
| 5905 | + general, an exception was made this time because the new interface | |
| 5906 | + offers an opportunity to significantly simplify calling code. | |
| 5907 | + | |
| 5908 | + - Support has been added for large files. The test suite verifies | |
| 5909 | + support for files larger than 4 gigabytes, and manual testing has | |
| 5910 | + verified support for files larger than 10 gigabytes. Large file | |
| 5911 | + support is available for both 32-bit and 64-bit platforms as long | |
| 5912 | + as the compiler and underlying platforms support it. | |
| 5913 | + | |
| 5914 | + - Support for page selection (splitting and merging PDF files) has | |
| 5915 | + been added to the @1@command@1@qpdf@2@command@2@ command-line | |
| 5916 | + tool. See `Page Selection Options <#ref.page-selection>`__. | |
| 5917 | + | |
| 5918 | + - Options have been added to the @1@command@1@qpdf@2@command@2@ | |
| 5919 | + command-line tool for copying encryption parameters from another | |
| 5920 | + file. See `Basic Options <#ref.basic-options>`__. | |
| 5921 | + | |
| 5922 | + - New methods have been added to the ``QPDF`` object for adding and | |
| 5923 | + removing pages. See `Adding and Removing | |
| 5924 | + Pages <#ref.adding-and-remove-pages>`__. | |
| 5925 | + | |
| 5926 | + - New methods have been added to the ``QPDF`` object for copying | |
| 5927 | + objects from other PDF files. See `Copying Objects From Other PDF | |
| 5928 | + Files <#ref.foreign-objects>`__. | |
| 5929 | + | |
| 5930 | + - A new method ``QPDFObjectHandle::parse`` has been added for | |
| 5931 | + constructing ``QPDFObjectHandle`` objects from a string | |
| 5932 | + description. | |
| 5933 | + | |
| 5934 | + - Methods have been added to ``QPDFWriter`` to allow writing to an | |
| 5935 | + already open stdio ``FILE*`` in addition to writing to standard | |
| 5936 | + output or a named file. Methods have been added to ``QPDF`` to be | |
| 5937 | + able to process a file from an already open stdio ``FILE*``. This | |
| 5938 | + makes it possible to read and write PDF from secure temporary | |
| 5939 | + files that have been unlinked prior to being fully read or | |
| 5940 | + written. | |
| 5941 | + | |
| 5942 | + - The ``QPDF::emptyPDF`` method can be used to allow creation of PDF files | |
| 5943 | + from scratch. The example | |
| 5944 | + @1@filename@1@examples/pdf-create.cc@2@filename@2@ illustrates how | |
| 5945 | + it can be used. | |
| 5946 | + | |
| 5947 | + - Several methods that take ``PointerHolder<Buffer>`` can now also | |
| 5948 | + accept ``std::string`` arguments. | |
| 5949 | + | |
| 5950 | + - Many new convenience methods have been added to the library, most | |
| 5951 | + in ``QPDFObjectHandle``. See @1@filename@1@ChangeLog@2@filename@2@ | |
| 5952 | + for a full list. | |
| 5953 | + | |
| 5954 | + - When building on a platform that supports ELF shared libraries | |
| 5955 | + (such as Linux), symbol versions are enabled by default. They can | |
| 5956 | + be disabled by passing | |
| 5957 | + @1@option@1@--disable-ld-version-script@2@option@2@ to | |
| 5958 | + @1@command@1@./configure@2@command@2@. | |
| 5959 | + | |
| 5960 | + - The file @1@filename@1@libqpdf.pc@2@filename@2@ is now installed | |
| 5961 | + to support @1@command@1@pkg-config@2@command@2@. | |
| 5962 | + | |
| 5963 | + - Image comparison tests are off by default now since they are not | |
| 5964 | + needed to verify a correct build or port of qpdf. They are needed | |
| 5965 | + only when changing the actual PDF output generated by qpdf. You | |
| 5966 | + should enable them if you are making deep changes to qpdf itself. | |
| 5967 | + See @1@filename@1@README.md@2@filename@2@ for details. | |
| 5968 | + | |
| 5969 | + - Large file tests are off by default but can be turned on with | |
| 5970 | + @1@command@1@./configure@2@command@2@ or by setting an environment | |
| 5971 | + variable before running the test suite. See | |
| 5972 | + @1@filename@1@README.md@2@filename@2@ for details. | |
| 5973 | + | |
| 5974 | + - When qpdf's test suite fails, failures are not printed to the | |
| 5975 | + terminal anymore by default. Instead, find them in | |
| 5976 | + @1@filename@1@build/qtest.log@2@filename@2@. For packagers who are | |
| 5977 | + building with an autobuilder, you can add the | |
| 5978 | + @1@option@1@--enable-show-failed-test-output@2@option@2@ option to | |
| 5979 | + @1@command@1@./configure@2@command@2@ to restore the old behavior. | |
| 5980 | + | |
| 5981 | +2.3.1: December 28, 2011 | |
| 5982 | + - Fix thread-safety problem resulting from non-thread-safe use of | |
| 5983 | + the PCRE library. | |
| 5984 | + | |
| 5985 | + - Made a few minor documentation fixes. | |
| 5986 | + | |
| 5987 | + - Add workaround for a bug that appears in some versions of | |
| 5988 | + ghostscript to the test suite. | |
| 5989 | + | |
| 5990 | + - Fix minor build issue for Visual C++ 2010. | |
| 5991 | + | |
| 5992 | +2.3.0: August 11, 2011 | |
| 5993 | + - Bug fix: when preserving existing encryption on encrypted files | |
| 5994 | + with cleartext metadata, older qpdf versions would generate | |
| 5995 | + password-protected files with no valid password. This operation | |
| 5996 | + now works. This bug only affected files created by copying | |
| 5997 | + existing encryption parameters; explicit encryption with | |
| 5998 | + specification of cleartext metadata worked before and continues to | |
| 5999 | + work. | |
| 6000 | + | |
| 6001 | + - Enhance ``QPDFWriter`` with a new constructor that allows you to | |
| 6002 | + delay the specification of the output file. When using this | |
| 6003 | + constructor, you may now call ``QPDFWriter::setOutputFilename`` to | |
| 6004 | + specify the output file, or you may use | |
| 6005 | + ``QPDFWriter::setOutputMemory`` to cause ``QPDFWriter`` to write | |
| 6006 | + the resulting PDF file to a memory buffer. You may then use | |
| 6007 | + ``QPDFWriter::getBuffer`` to retrieve the memory buffer. | |
| 6008 | + | |
| 6009 | + - Add new API call ``QPDF::replaceObject`` for replacing objects by | |
| 6010 | + object ID. | |
| 6011 | + | |
| 6012 | + - Add new API call ``QPDF::swapObjects`` for swapping two objects by | |
| 6013 | + object ID. | |
| 6014 | + | |
| 6015 | + - Add ``QPDFObjectHandle::getDictAsMap`` and | |
| 6016 | + ``QPDFObjectHandle::getArrayAsVector`` to allow retrieval of | |
| 6017 | + dictionary objects as maps and array objects as vectors. | |
| 6018 | + | |
| 6019 | + - Add functions ``qpdf_get_info_key`` and ``qpdf_set_info_key`` to | |
| 6020 | + the C API for manipulating string fields of the document's | |
| 6021 | + ``/Info`` dictionary. | |
| 6022 | + | |
| 6023 | + - Add functions ``qpdf_init_write_memory``, | |
| 6024 | + ``qpdf_get_buffer_length``, and ``qpdf_get_buffer`` to the C API | |
| 6025 | + for writing PDF files to a memory buffer instead of a file. | |
| 6026 | + | |
| 6027 | +2.2.4: June 25, 2011 | |
| 6028 | + - Fix installation and compilation issues; no functionality changes. | |
| 6029 | + | |
| 6030 | +2.2.3: April 30, 2011 | |
| 6031 | + - Handle some damaged streams with incorrect characters following | |
| 6032 | + the stream keyword. | |
| 6033 | + | |
| 6034 | + - Improve handling of inline images when normalizing content | |
| 6035 | + streams. | |
| 6036 | + | |
| 6037 | + - Enhance error recovery to properly handle files that use object 0 | |
| 6038 | + as a regular object, which is specifically disallowed by the spec. | |
| 6039 | + | |
| 6040 | +2.2.2: October 4, 2010 | |
| 6041 | + - Add new function ``qpdf_read_memory`` to the C API to call | |
| 6042 | + ``QPDF::processMemoryFile``. This was an omission in qpdf 2.2.1. | |
| 6043 | + | |
| 6044 | +2.2.1: October 1, 2010 | |
| 6045 | + - Add new method ``QPDF::setOutputStreams`` to replace ``std::cout`` | |
| 6046 | + and ``std::cerr`` with other streams for generation of diagnostic | |
| 6047 | + messages and error messages. This can be useful for GUIs or other | |
| 6048 | + applications that want to capture any output generated by the | |
| 6049 | + library to present to the user in some other way. Note that QPDF | |
| 6050 | + does not write to ``std::cout`` (or the specified output stream) | |
| 6051 | + except where explicitly mentioned in | |
| 6052 | + @1@filename@1@QPDF.hh@2@filename@2@, and that the only use of the | |
| 6053 | + error stream is for warnings. Note also that output of warnings is | |
| 6054 | + suppressed when ``setSuppressWarnings(true)`` is called. | |
| 6055 | + | |
| 6056 | + - Add new method ``QPDF::processMemoryFile`` for operating on PDF | |
| 6057 | + files that are loaded into memory rather than in a file on disk. | |
| 6058 | + | |
| 6059 | + - Give a warning but otherwise ignore empty PDF objects by treating | |
| 6060 | + them as null. Empty objects are not permitted by the PDF | |
| 6061 | + specification but have been known to appear in some actual PDF | |
| 6062 | + files. | |
| 6063 | + | |
| 6064 | + - Handle inline image filter abbreviations when they appear as stream | |
| 6065 | + filter abbreviations. The PDF specification does not allow use of | |
| 6066 | + stream filter abbreviations in this way, but Adobe Reader and some | |
| 6067 | + other PDF readers accept them since they sometimes appear | |
| 6068 | + incorrectly in actual PDF files. | |
| 6069 | + | |
| 6070 | + - Implement miscellaneous enhancements to ``PointerHolder`` and | |
| 6071 | + ``Buffer`` to support other changes. | |
| 6072 | + | |
| 6073 | +2.2.0: August 14, 2010 | |
| 6074 | + - Add new methods to ``QPDFObjectHandle`` (``newStream`` and | |
| 6075 | + ``replaceStreamData``) for creating new streams and replacing | |
| 6076 | + stream data. This makes it possible to perform a wide range of | |
| 6077 | + operations that were not previously possible. | |
| 6078 | + | |
| 6079 | + - Add new helper method in ``QPDFObjectHandle`` | |
| 6080 | + (``addPageContents``) for appending or prepending new content | |
| 6081 | + streams to a page. This method makes it possible to manipulate | |
| 6082 | + content streams without having to be concerned whether a page's | |
| 6083 | + contents are a single stream or an array of streams. | |
| 6084 | + | |
| 6085 | + - Add new method in ``QPDFObjectHandle``: ``replaceOrRemoveKey``, | |
| 6086 | + which replaces a dictionary key with a given value unless the | |
| 6087 | + value is null, in which case it removes the key instead. | |
| 6088 | + | |
| 6089 | + - Add new method in ``QPDFObjectHandle``: ``getRawStreamData``, | |
| 6090 | + which returns the raw (unfiltered) stream data into a buffer. This | |
| 6091 | + complements the ``getStreamData`` method, which returns the | |
| 6092 | + filtered (uncompressed) stream data and can only be used when the | |
| 6093 | + stream's data is filterable. | |
| 6094 | + | |
| 6095 | + - Provide two new examples: | |
| 6096 | + @1@command@1@pdf-double-page-size@2@command@2@ and | |
| 6097 | + @1@command@1@pdf-invert-images@2@command@2@ that illustrate the | |
| 6098 | + newly added interfaces. | |
| 6099 | + | |
| 6100 | + - Fix a memory leak that would cause loss of a few bytes for every | |
| 6101 | + object involved in a cycle of object references. Thanks to Jian Ma | |
| 6102 | + for calling my attention to the leak. | |
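The ``replaceOrRemoveKey`` behavior described in the list above can be illustrated without the library. In this sketch a ``std::map`` stands in for a PDF dictionary and an empty ``std::optional`` stands in for a null object; the names ``Dictionary`` and ``replace_or_remove_key`` are hypothetical and are not qpdf's API.

```cpp
#include <map>
#include <optional>
#include <string>

// Stand-in for a PDF dictionary: keys and values are plain strings here.
using Dictionary = std::map<std::string, std::string>;

// Hypothetical illustration of the described semantics: set the key to the
// given value, unless the value is "null" (empty optional), in which case
// the key is removed instead.
void replace_or_remove_key(Dictionary& dict,
                           const std::string& key,
                           const std::optional<std::string>& value)
{
    if (value.has_value()) {
        dict[key] = *value; // replace an existing entry or add a new one
    } else {
        dict.erase(key);    // null value: drop the key (no-op if absent)
    }
}
```

The point of such a helper is that calling code can forward a possibly-null value without branching on it first.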
| 6103 | + | |
| 6104 | +2.1.5: April 25, 2010 | |
| 6105 | + - Remove restriction of file identifier strings to 16 bytes. This | |
| 6106 | + unnecessary restriction was preventing qpdf from being able to | |
| 6107 | + encrypt or decrypt files with identifier strings that were not | |
| 6108 | + exactly 16 bytes long. The specification imposes no such | |
| 6109 | + restriction. | |
| 6110 | + | |
| 6111 | +2.1.4: April 18, 2010 | |
| 6112 | + - Apply the same padding calculation fix from version 2.1.2 to the | |
| 6113 | + main cross reference stream as well. | |
| 6114 | + | |
| 6115 | + - Since @1@command@1@qpdf --check@2@command@2@ only performs limited | |
| 6116 | + checks, clarify the output to make it clear that there still may | |
| 6117 | + be errors that qpdf can't check. This should make it less | |
| 6118 | + surprising to people when another PDF reader is unable to read a | |
| 6119 | + file that qpdf thinks is okay. | |
| 6120 | + | |
| 6121 | +2.1.3: March 27, 2010 | |
| 6122 | + - Fix bug that could cause a failure when rewriting PDF files that | |
| 6123 | + contain object streams with unreferenced objects that in turn | |
| 6124 | + reference indirect scalars. | |
| 6125 | + | |
| 6126 | + - Don't complain about (invalid) AES streams that aren't a multiple | |
| 6127 | + of 16 bytes. Instead, pad them before decrypting. | |
| 6128 | + | |
| 6129 | +2.1.2: January 24, 2010 | |
| 6130 | + - Fix bug in padding around first half cross reference stream in | |
| 6131 | + linearized files. The bug could cause an assertion failure when | |
| 6132 | + linearizing certain unlucky files. | |
| 6133 | + | |
| 6134 | +2.1.1: December 14, 2009 | |
| 6135 | + - No changes in functionality; insert missing include in an internal | |
| 6136 | + library header file to support gcc 4.4, and update test suite to | |
| 6137 | + ignore broken Adobe Reader installations. | |
| 6138 | + | |
| 6139 | +2.1: October 30, 2009 | |
| 6140 | + - This is the first version of qpdf to include Windows support. On | |
| 6141 | + Windows, it is possible to build a DLL. Additionally, a partial | |
| 6142 | + C-language API has been introduced, which makes it possible to | |
| 6143 | + call qpdf functions from non-C++ environments. I am very grateful | |
| 6144 | + to Žarko Gajić (http://zarko-gajic.iz.hr/) for tirelessly testing | |
| 6145 | + numerous pre-release versions of this DLL and providing many | |
| 6146 | + excellent suggestions on improving the interface. | |
| 6147 | + | |
| 6148 | + For programming to the C interface, please see the header file | |
| 6149 | + @1@filename@1@qpdf/qpdf-c.h@2@filename@2@ and the example | |
| 6150 | + @1@filename@1@examples/pdf-linearize.c@2@filename@2@. | |
| 6151 | + | |
| 6152 | + - Žarko Gajić has written a Delphi wrapper for qpdf, which can be | |
| 6153 | + downloaded from qpdf's download site. Žarko's Delphi wrapper is | |
| 6154 | + released with the same licensing terms as qpdf itself and comes | |
| 6155 | + with this disclaimer: "Delphi wrapper unit | |
| 6156 | + @1@filename@1@qpdf.pas@2@filename@2@ created by Žarko Gajić | |
| 6157 | + (http://zarko-gajic.iz.hr/). Use at your own risk and for whatever | |
| 6158 | + purpose you want. No support is provided. Sample code is | |
| 6159 | + provided." | |
| 6160 | + | |
| 6161 | + - Support has been added for AES encryption and crypt filters. | |
| 6162 | + Although qpdf does not presently support files that use PKI-based | |
| 6163 | + encryption, with the addition of AES and crypt filters, qpdf is | |
| 6164 | + now able to open most encrypted files created with newer | |
| 6165 | + versions of Acrobat or other PDF creation software. Note that I | |
| 6166 | + have not been able to get very many files encrypted in this way, | |
| 6167 | + so it's possible there could still be some cases that qpdf can't | |
| 6168 | + handle. Please report them if you find them. | |
| 6169 | + | |
| 6170 | + - Many error messages have been improved to include more information | |
| 6171 | + in hopes of making qpdf a more useful tool for PDF experts to use | |
| 6172 | + in manually recovering damaged PDF files. | |
| 6173 | + | |
| 6174 | + - Attempt to avoid compressing metadata streams if possible. This is | |
| 6175 | + consistent with other PDF creation applications. | |
| 6176 | + | |
| 6177 | + - Provide new command-line options for AES encryption, cleartext | |
| 6178 | + metadata, and setting the minimum and forced PDF versions of | |
| 6179 | + output files. | |
| 6180 | + | |
| 6181 | + - Add additional methods to the ``QPDF`` object for querying the | |
| 6182 | + document's permissions. Although qpdf does not enforce these | |
| 6183 | + permissions, it does make them available so that applications that | |
| 6184 | + use qpdf can enforce permissions. | |
| 6185 | + | |
| 6186 | + - The @1@option@1@--check@2@option@2@ option to | |
| 6187 | + @1@command@1@qpdf@2@command@2@ has been extended to include some | |
| 6188 | + additional information. | |
| 6189 | + | |
| 6190 | + - There have been a handful of non-compatible API changes. For | |
| 6191 | + details, see `appendix_title <#ref.upgrading-to-2.1>`__. | |
| 6192 | + | |
| 6193 | +2.0.6: May 3, 2009 | |
| 6194 | + - Do not attempt to uncompress streams that have decode parameters | |
| 6195 | + we don't recognize. Earlier versions of qpdf would have rejected | |
| 6196 | + files with such streams. | |
| 6197 | + | |
| 6198 | +2.0.5: March 10, 2009 | |
| 6199 | + - Improve error handling in the LZW decoder, and fix a small error | |
| 6200 | + introduced in the previous version with regard to handling full | |
| 6201 | + tables. The LZW decoder has been more strongly verified in this | |
| 6202 | + release. | |
| 6203 | + | |
| 6204 | +2.0.4: February 21, 2009 | |
| 6205 | + - Include proper support for LZW streams encoded without the "early | |
| 6206 | + code change" flag. Special thanks to Atom Smasher who reported the | |
| 6207 | + problem and provided an input file compressed in this way, which I | |
| 6208 | + did not previously have. | |
| 6209 | + | |
| 6210 | + - Implement some improvements to file recovery logic. | |
| 6211 | + | |
| 6212 | +2.0.3: February 15, 2009 | |
| 6213 | + - Compile cleanly with gcc 4.4. | |
| 6214 | + | |
| 6215 | + - Handle strings encoded as UTF-16BE properly. | |
| 6216 | + | |
| 6217 | +2.0.2: June 30, 2008 | |
| 6218 | + - Update test suite to work properly with a | |
| 6219 | + non-@1@command@1@bash@2@command@2@ | |
| 6220 | + @1@filename@1@/bin/sh@2@filename@2@ and with Perl 5.10. No changes | |
| 6221 | + were made to the actual qpdf source code itself for this release. | |
| 6222 | + | |
| 6223 | +2.0.1: May 6, 2008 | |
| 6224 | + - No changes in functionality or interface. This release includes | |
| 6225 | + fixes to the source code so that qpdf compiles properly and passes | |
| 6226 | + its test suite on a broader range of platforms. See | |
| 6227 | + @1@filename@1@ChangeLog@2@filename@2@ in the source distribution | |
| 6228 | + for details. | |
| 6229 | + | |
| 6230 | +2.0: April 29, 2008 | |
| 6231 | + - First public release. | |
| 6232 | + | |
| 6233 | +.. _ref.upgrading-to-2.1: | |
| 6234 | + | |
| 6235 | +Upgrading from 2.0 to 2.1 | |
| 6236 | +========================= | |
| 6237 | + | |
| 6238 | +Although, as a general rule, we like to avoid introducing source-level | |
| 6239 | +incompatibilities in qpdf's interface, there were a few non-compatible | |
| 6240 | +changes made in this version. A considerable amount of source code that | |
| 6241 | +uses qpdf will probably compile without any changes, but in some cases, | |
| 6242 | +you may have to update your code. The changes are enumerated here. There | |
| 6243 | +are also some new interfaces; for those, please refer to the header | |
| 6244 | +files. | |
| 6245 | + | |
| 6246 | +- QPDF's exception handling mechanism now uses ``std::logic_error`` for | |
| 6247 | + internal errors and ``std::runtime_error`` for runtime errors in | |
| 6248 | + favor of the now removed ``QEXC`` classes used in previous versions. | |
| 6249 | + The ``QEXC`` exception classes predated the addition of the | |
| 6250 | + @1@filename@1@<stdexcept>@2@filename@2@ header file to the C++ | |
| 6251 | + standard library. Most of the exceptions thrown by the qpdf library | |
| 6252 | + itself are still of type ``QPDFExc`` which is now derived from | |
| 6253 | + ``std::runtime_error``. Programs that caught an instance of | |
| 6254 | + ``std::exception`` and displayed it by calling the ``what()`` method | |
| 6255 | + will not need to be changed. | |
| 6256 | + | |
| 6257 | +- The ``QPDFExc`` class now internally represents various fields of the | |
| 6258 | + error condition and provides interfaces for querying them. Among the | |
| 6259 | + fields is a numeric error code that can help applications act | |
| 6260 | + differently on (a small number of) different error conditions. See | |
| 6261 | + @1@filename@1@QPDFExc.hh@2@filename@2@ for details. | |
| 6262 | + | |
| 6263 | +- Warnings can be retrieved from qpdf as instances of ``QPDFExc`` | |
| 6264 | + instead of strings. | |
| 6265 | + | |
| 6266 | +- The nested ``QPDF::EncryptionData`` class's constructor takes an | |
| 6267 | + additional argument. This class is primarily intended to be used by | |
| 6268 | + ``QPDFWriter``. There's not really anything useful an end-user | |
| 6269 | + application could do with it. It probably shouldn't really be part of | |
| 6270 | + the public interface to begin with. Likewise, some of the methods for | |
| 6271 | + computing internal encryption dictionary parameters have changed to | |
| 6272 | + support ``/R=4`` encryption. | |
| 6273 | + | |
| 6274 | +- The method ``QPDF::getUserPassword`` has been removed since it didn't | |
| 6275 | + do what people would think it did. There are now two new methods: | |
| 6276 | + ``QPDF::getPaddedUserPassword`` and ``QPDF::getTrimmedUserPassword``. | |
| 6277 | + The first one does what the old ``QPDF::getUserPassword`` method used | |
| 6278 | + to do, which is to return the password with possible binary padding | |
| 6279 | + as specified by the PDF specification. The second one returns a | |
| 6280 | + human-readable password string. | |
| 6281 | + | |
| 6282 | +- The enumerated types that used to be nested in ``QPDFWriter`` have | |
| 6283 | + moved to top-level enumerated types and are now defined in the file | |
| 6284 | + @1@filename@1@qpdf/Constants.h@2@filename@2@. This enables them to be | |
| 6285 | + shared by both the C and C++ interfaces. | |
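The first item in the list above notes that code catching ``std::exception`` and calling ``what()`` needs no changes. A minimal sketch of that pattern follows. ``ExampleQPDFExc`` and ``failing_operation`` are hypothetical stand-ins so that the example compiles without the library; the only property taken from the text is that the exception type derives from ``std::runtime_error``.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the library's exception class. Per the notes
// above, the real QPDFExc derives from std::runtime_error as of 2.1.
class ExampleQPDFExc : public std::runtime_error
{
  public:
    explicit ExampleQPDFExc(const std::string& message)
        : std::runtime_error(message)
    {
    }
};

// Hypothetical operation that fails the way a library call might.
void failing_operation()
{
    throw ExampleQPDFExc("example: simulated failure");
}

// Catching the std::exception base class and reporting what() works no
// matter which concrete exception type was thrown.
std::string describe_failure()
{
    try {
        failing_operation();
    } catch (const std::exception& e) {
        return e.what();
    }
    return "no error";
}
```

Code that needs the additional fields mentioned in the second item would catch the more specific type first; the accessors for those fields are not shown here because the text does not name them.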
| 6286 | + | |
| 6287 | +.. _ref.upgrading-to-3.0: | |
| 6288 | + | |
| 6289 | +Upgrading to 3.0 | |
| 6290 | +================ | |
| 6291 | + | |
| 6292 | +For the most part, the API for qpdf version 3.0 is backward compatible | |
| 6293 | +with versions 2.1 and later. There are two exceptions: | |
| 6294 | + | |
| 6295 | +- The method ``QPDFObjectHandle::replaceStreamData`` that uses a | |
| 6296 | + ``StreamDataProvider`` to provide the stream data no longer takes a | |
| 6297 | + ``length`` parameter. While it would have been easy enough to keep | |
| 6298 | + the parameter for backward compatibility, in this case, the parameter | |
| 6299 | + was removed since this provides the user an opportunity to simplify | |
| 6300 | + the calling code. This method was introduced in version 2.2. At the | |
| 6301 | + time, the ``length`` parameter was required in order to ensure that | |
| 6302 | + calls to the stream data provider returned the same length for a | |
| 6303 | + specific stream every time they were invoked. In particular, the | |
| 6304 | + linearization code depends on this. Instead, qpdf 3.0 and newer check | |
| 6305 | + for that constraint explicitly. The first time the stream data | |
| 6306 | + provider is called for a specific stream, the actual length is saved, | |
| 6307 | + and subsequent calls are required to return the same number of bytes. | |
| 6308 | + This means the calling code no longer has to compute the length in | |
| 6309 | + advance, which can be a significant simplification. If your code | |
| 6310 | + fails to compile because of the extra argument and you don't want to | |
| 6311 | + make other changes to your code, just omit the argument. | |
| 6312 | + | |
| 6313 | +- Many methods take ``long long`` instead of other integer types. Most | |
| 6314 | + if not all existing code should compile fine with this change since | |
| 6315 | + such parameters had always previously been smaller types. This change | |
| 6316 | + was required to support files larger than two gigabytes in size. | |
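The length constraint described in the first item above (the first call records the length, later calls must return the same number of bytes) can be sketched as a small wrapper. This is an illustration under stated assumptions, not qpdf's code: ``LengthCheckedProvider`` is a hypothetical class, the provider is modeled as a callable returning a ``std::string``, and the choice of ``std::logic_error`` for a mismatch is an assumption.

```cpp
#include <cstddef>
#include <functional>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical wrapper (not part of qpdf) that enforces a consistent length
// across repeated calls to a stream data provider.
class LengthCheckedProvider
{
  public:
    explicit LengthCheckedProvider(std::function<std::string()> provider)
        : provider_(std::move(provider))
    {
    }

    std::string provide()
    {
        std::string data = provider_();
        if (!expected_length_) {
            // First call: remember how many bytes were produced.
            expected_length_ = data.size();
        } else if (*expected_length_ != data.size()) {
            // Later calls must match the recorded length.
            throw std::logic_error(
                "stream data provider returned a different length");
        }
        return data;
    }

  private:
    std::function<std::string()> provider_;
    std::optional<std::size_t> expected_length_;
};
```

With a check of this kind in place, the caller never has to compute the length up front, which is the simplification the text describes.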
| 6317 | + | |
| 6318 | +.. _ref.upgrading-to-4.0: | |
| 6319 | + | |
| 6320 | +Upgrading to 4.0 | |
| 6321 | +================ | |
| 6322 | + | |
| 6323 | +While version 4.0 includes a few non-compatible API changes, it is very | |
| 6324 | +unlikely that anyone's code would have used any of those parts of the | |
| 6325 | +API since they generally required information that would only be | |
| 6326 | +available inside the library. In the unlikely event that you should run | |
| 6327 | +into trouble, please see the ChangeLog. See also | |
| 6328 | +`appendix_title <#ref.release-notes>`__ for a complete list of the | |
| 6329 | +non-compatible API changes made in this version. | |
| 6330 | + | |
| 6331 | + | |
| 6332 | + | |
| 9 | 6333 | Indices and tables |
| 10 | 6334 | ================== |
| 11 | 6335 | ... | ... |