Commit 910a373a79f885cba1023fa69aa0c679e4ae0601
1 parent a6c4b293
Clean up the Design and Library Notes chapter of the manual
Showing 1 changed file with 195 additions and 207 deletions
manual/design.rst
| ... | ... | @@ -8,50 +8,53 @@ Design and Library Notes |
| 8 | 8 | Introduction |
| 9 | 9 | ------------ |
| 10 | 10 | |
| 11 | -This section was written prior to the implementation of the qpdf package | |
| 12 | -and was subsequently modified to reflect the implementation. In some | |
| 13 | -cases, for purposes of explanation, it may differ slightly from the | |
| 14 | -actual implementation. As always, the source code and test suite are | |
| 15 | -authoritative. Even if there are some errors, this document should serve | |
| 16 | -as a road map to understanding how this code works. | |
| 11 | +This section was written prior to the implementation of the qpdf | |
| 12 | +library and was subsequently modified to reflect the implementation. | |
| 13 | +In some cases, for purposes of explanation, it may differ slightly | |
| 14 | +from the actual implementation. As always, the source code and test | |
| 15 | +suite are authoritative. Even if there are some errors, this document | |
| 16 | +should serve as a road map to understanding how this code works. | |
| 17 | 17 | |
| 18 | 18 | In general, one should adhere strictly to a specification when writing |
| 19 | -but be liberal in reading. This way, the product of our software will be | |
| 20 | -accepted by the widest range of other programs, and we will accept the | |
| 21 | -widest range of input files. This library attempts to conform to that | |
| 22 | -philosophy whenever possible but also aims to provide strict checking | |
| 23 | -for people who want to validate PDF files. If you don't want to see | |
| 24 | -warnings and are trying to write something that is tolerant, you can | |
| 25 | -call ``setSuppressWarnings(true)``. If you want to fail on the first | |
| 26 | -error, you can call ``setAttemptRecovery(false)``. The default behavior | |
| 27 | -is to generating warnings for recoverable problems. Note that recovery | |
| 28 | -will not always produce the desired results even if it is able to get | |
| 29 | -through the file. Unlike most other PDF files that produce generic | |
| 30 | -warnings such as "This file is damaged,", qpdf generally issues a | |
| 31 | -detailed error message that would be most useful to a PDF developer. | |
| 19 | +but be liberal in reading. This way, the product of our software will | |
| 20 | +be accepted by the widest range of other programs, and we will accept | |
| 21 | +the widest range of input files. This library attempts to conform to | |
| 22 | +that philosophy whenever possible but also aims to provide strict | |
| 23 | +checking for people who want to validate PDF files. If you don't want | |
| 24 | +to see warnings and are trying to write something that is tolerant, | |
| 25 | +you can call ``setSuppressWarnings(true)``. If you want to fail on the | |
| 26 | +first error, you can call ``setAttemptRecovery(false)``. The default | |
| 27 | +behavior is to generate warnings for recoverable problems. Note that | |
| 28 | +recovery will not always produce the desired results even if it is | |
| 29 | +able to get through the file. Unlike most other PDF tools that produce | |
| 30 | +generic warnings such as "This file is damaged," qpdf generally issues | |
| 31 | +a detailed error message that would be most useful to a PDF developer. | |
| 32 | 32 | This is by design as there seems to be a shortage of PDF validation |
| 33 | -tools out there. This was, in fact, one of the major motivations behind | |
| 34 | -the initial creation of qpdf. | |
| 33 | +tools out there. This was, in fact, one of the major motivations | |
| 34 | +behind the initial creation of qpdf. That said, qpdf is not a strict | |
| 35 | +PDF checker. There are many ways in which a PDF file can be out of | |
| 36 | +conformance to the spec that qpdf doesn't notice or report. | |
| 35 | 37 | |
| 36 | 38 | .. _design-goals: |
| 37 | 39 | |
| 38 | 40 | Design Goals |
| 39 | 41 | ------------ |
| 40 | 42 | |
| 41 | -The QPDF package includes support for reading and rewriting PDF files. | |
| 43 | +The qpdf library includes support for reading and rewriting PDF files. | |
| 42 | 44 | It aims to hide from the user details involving object locations, |
| 43 | -modified (appended) PDF files, the directness/indirectness of objects, | |
| 44 | -and stream filters including encryption. It does not aim to hide | |
| 45 | -knowledge of the object hierarchy or content stream contents. Put | |
| 46 | -another way, a user of the qpdf library is expected to have knowledge | |
| 47 | -about how PDF files work, but is not expected to have to keep track of | |
| 48 | -bookkeeping details such as file positions. | |
| 49 | - | |
| 50 | -A user of the library never has to care whether an object is direct or | |
| 51 | -indirect, though it is possible to determine whether an object is direct | |
| 52 | -or not if this information is needed. All access to objects deals with | |
| 53 | -this transparently. All memory management details are also handled by | |
| 54 | -the library. | |
| 45 | +modified (appended) PDF files, use of object streams, and stream | |
| 46 | +filters including encryption. It does not aim to hide knowledge of the | |
| 47 | +object hierarchy or content stream contents. Put another way, a user | |
| 48 | +of the qpdf library is expected to have knowledge about how PDF files | |
| 49 | +work, but is not expected to have to keep track of bookkeeping details | |
| 50 | +such as file positions. | |
| 51 | + | |
| 52 | +When accessing objects, a user of the library never has to care | |
| 53 | +whether an object is direct or indirect as all access to objects deals | |
| 54 | +with this transparently. All memory management details are also | |
| 55 | +handled by the library. When modifying objects, it is possible to | |
| 56 | +determine whether an object is indirect and to make copies of the | |
| 57 | +object if needed. | |
| 55 | 58 | |
| 56 | 59 | Memory is managed mostly with ``std::shared_ptr`` objects to minimize |
| 57 | 60 | explicit memory handling. This library also makes use of a technique |
| ... | ... | @@ -85,29 +88,32 @@ objects to indirect objects and vice versa. |
| 85 | 88 | Instances of ``QPDFObjectHandle`` can be directly created and modified |
| 86 | 89 | using static factory methods in the ``QPDFObjectHandle`` class. There |
| 87 | 90 | are factory methods for each type of object as well as a convenience |
| 88 | -method ``QPDFObjectHandle::parse`` that creates an object from a string | |
| 89 | -representation of the object. Existing instances of ``QPDFObjectHandle`` | |
| 90 | -can also be modified in several ways. See comments in | |
| 91 | -:file:`QPDFObjectHandle.hh` for details. | |
| 91 | +method ``QPDFObjectHandle::parse`` that creates an object from a | |
| 92 | +string representation of the object. The ``_qpdf`` user-defined string | |
| 93 | +literal is also available, making it possible to create instances of | |
| 94 | +``QPDFObjectHandle`` with ``"(pdf-syntax)"_qpdf``. Existing instances | |
| 95 | +of ``QPDFObjectHandle`` can also be modified in several ways. See | |
| 96 | +comments in :file:`QPDFObjectHandle.hh` for details. | |
| 92 | 97 | |
| 93 | 98 | An instance of ``QPDF`` is constructed by using the class's default |
| 94 | -constructor. If desired, the ``QPDF`` object may be configured with | |
| 95 | -various methods that change its default behavior. Then the | |
| 96 | -``QPDF::processFile()`` method is passed the name of a PDF file, which | |
| 97 | -permanently associates the file with that QPDF object. A password may | |
| 98 | -also be given for access to password-protected files. QPDF does not | |
| 99 | -enforce encryption parameters and will treat user and owner passwords | |
| 100 | -equivalently. Either password may be used to access an encrypted file. | |
| 101 | -``QPDF`` will allow recovery of a user password given an owner password. | |
| 102 | -The input PDF file must be seekable. (Output files written by | |
| 103 | -``QPDFWriter`` need not be seekable, even when creating linearized | |
| 104 | -files.) During construction, ``QPDF`` validates the PDF file's header, | |
| 105 | -and then reads the cross reference tables and trailer dictionaries. The | |
| 106 | -``QPDF`` class keeps only the first trailer dictionary though it does | |
| 107 | -read all of them so it can check the ``/Prev`` key. ``QPDF`` class users | |
| 108 | -may request the root object and the trailer dictionary specifically. The | |
| 109 | -cross reference table is kept private. Objects may then be requested by | |
| 110 | -number or by walking the object tree. | |
| 99 | +constructor or with ``QPDF::create()``. If desired, the ``QPDF`` | |
| 100 | +object may be configured with various methods that change its default | |
| 101 | +behavior. Then the ``QPDF::processFile`` method is passed the name of | |
| 102 | +a PDF file, which permanently associates the file with that ``QPDF`` | |
| 103 | +object. A password may also be given for access to password-protected | |
| 104 | +files. ``QPDF`` does not enforce encryption parameters and will treat | |
| 105 | +user and owner passwords equivalently. Either password may be used to | |
| 106 | +access an encrypted file. ``QPDF`` will allow recovery of a user | |
| 107 | +password given an owner password. The input PDF file must be seekable. | |
| 108 | +Output files written by ``QPDFWriter`` need not be seekable, even when | |
| 109 | +creating linearized files. During construction, ``QPDF`` validates the | |
| 110 | +PDF file's header, and then reads the cross reference tables and | |
| 111 | +trailer dictionaries. The ``QPDF`` class keeps only the first trailer | |
| 112 | +dictionary though it does read all of them so it can check the | |
| 113 | +``/Prev`` key. ``QPDF`` class users may request the root object and | |
| 114 | +the trailer dictionary specifically. The cross reference table is kept | |
| 115 | +private. Objects may then be requested by number or by walking the | |
| 116 | +object tree. | |
| 111 | 117 | |
| 112 | 118 | When a PDF file has a cross-reference stream instead of a |
| 113 | 119 | cross-reference table and trailer, requesting the document's trailer |
| ... | ... | @@ -240,13 +246,14 @@ the ``QPDFObjectHandle`` type to hold onto objects and to abstract |
| 240 | 246 | away in most cases whether the object is direct or indirect. |
| 241 | 247 | |
| 242 | 248 | Internally, ``QPDFObjectHandle`` holds onto a shared pointer to the |
| 243 | -underlying object value. When a direct object is created, the | |
| 244 | -``QPDFObjectHandle`` that holds it is not associated with a ``QPDF`` | |
| 245 | -object. When an indirect object reference is created, it starts off in | |
| 246 | -an *unresolved* state and must be associated with a ``QPDF`` object, | |
| 247 | -which is considered its *owner*. To access the actual value of the | |
| 248 | -object, the object must be *resolved*. This happens automatically when | |
| 249 | -the the object is accessed in any way. | |
| 249 | +underlying object value. When a direct object is created | |
| 250 | +programmatically by client code (rather than being read from the | |
| 251 | +file), the ``QPDFObjectHandle`` that holds it is not associated with a | |
| 252 | +``QPDF`` object. When an indirect object reference is created, it | |
| 253 | +starts off in an *unresolved* state and must be associated with a | |
| 254 | +``QPDF`` object, which is considered its *owner*. To access the actual | |
| 255 | +value of the object, the object must be *resolved*. This happens | |
| 256 | +automatically when the object is accessed in any way. | |
| 250 | 257 | |
| 251 | 258 | To resolve an object, qpdf checks its object cache. If not found in |
| 252 | 259 | the cache, it attempts to read the object from the input source |
| ... | ... | @@ -286,18 +293,20 @@ file. |
| 286 | 293 | it is looking before the last ``%%EOF``. After getting to ``trailer`` |
| 287 | 294 | keyword, it invokes the parser. |
| 288 | 295 | |
| 289 | -- The parser sees ``<<``, so it calls itself recursively in | |
| 290 | - dictionary creation mode. | |
| 296 | +- The parser sees ``<<``, so it changes state and starts accumulating | |
| 297 | + the keys and values of the dictionary. | |
| 291 | 298 | |
| 292 | 299 | - In dictionary creation mode, the parser keeps accumulating objects |
| 293 | 300 | until it encounters ``>>``. Each object that is read is pushed onto |
| 294 | 301 | a stack. If ``R`` is read, the last two objects on the stack are |
| 295 | 302 | inspected. If they are integers, they are popped off the stack and |
| 296 | - their values are used to construct an indirect object handle which | |
| 297 | - is then pushed onto the stack. When ``>>`` is finally read, the | |
| 298 | - stack is converted into a ``QPDF_Dictionary`` (not directly | |
| 299 | - accessible through the API) which is placed in a | |
| 300 | - ``QPDFObjectHandle`` and returned. | |
| 303 | + their values are used to obtain an indirect object handle from the | |
| 304 | + ``QPDF`` class. The ``QPDF`` class consults its cache, and if | |
| 305 | + necessary, inserts a new unresolved object, and returns an object | |
| 306 | + handle pointing to the cache entry, which is then pushed onto the | |
| 307 | + stack. When ``>>`` is finally read, the stack is converted into a | |
| 308 | + ``QPDF_Dictionary`` (not directly accessible through the API) which | |
| 309 | + is placed in a ``QPDFObjectHandle`` and returned. | |
| 301 | 310 | |
| 302 | 311 | - The resulting dictionary is saved as the trailer dictionary. |
| 303 | 312 | |
| ... | ... | @@ -309,23 +318,21 @@ file. |
| 309 | 318 | - If there is an encryption dictionary, the document's encryption |
| 310 | 319 | parameters are initialized. |
| 311 | 320 | |
| 312 | -- The client requests root object. The ``QPDF`` class gets the value of | |
| 313 | - root key from trailer dictionary and returns it. It is an unresolved | |
| 314 | - indirect ``QPDFObjectHandle``. | |
| 321 | +- The client requests the root object. ``QPDF`` gets the value of the | |
| 322 | + ``/Root`` key from the trailer dictionary and returns it. It is an | |
| 323 | + unresolved indirect ``QPDFObjectHandle``. | |
| 315 | 324 | |
| 316 | 325 | - The client requests the ``/Pages`` key from the root |
| 317 | - ``QPDFObjectHandle``. The ``QPDFObjectHandle`` notices that it is | |
| 318 | - indirect so it asks ``QPDF`` to resolve it. ``QPDF`` looks in the | |
| 319 | - object cache for an object with the root dictionary's object ID and | |
| 320 | - generation number. Upon not seeing it, it checks the cross reference | |
| 321 | - table, gets the offset, and reads the object present at that offset. | |
| 322 | - It stores the result in the object cache. The cache entry's value is | |
| 323 | - replaced by the actual value, which causes any previously unresolved | |
| 324 | - ``QPDFObjectHandle`` objects that that pointed there to now have a | |
| 325 | - shared copy of the actual object. Modifications through any such | |
| 326 | - ``QPDFObjectHandle`` will be reflected in all of them. As the client | |
| 327 | - continues to request objects, the same process is followed for each | |
| 328 | - new requested object. | |
| 326 | + ``QPDFObjectHandle``. The ``QPDFObjectHandle`` notices that it is an | |
| 327 | + unresolved indirect object, so it asks ``QPDF`` to resolve it. | |
| 328 | + ``QPDF`` checks the cross reference table, gets the offset, and | |
| 329 | + reads the object present at that offset. The object cache entry's | |
| 330 | + ``unresolved`` value is replaced by the actual value, which causes | |
| 331 | + any previously unresolved ``QPDFObjectHandle`` objects that pointed | |
| 332 | + there to now have a shared copy of the actual object. Modifications | |
| 333 | + through any such ``QPDFObjectHandle`` will be reflected in all of | |
| 334 | + them. As the client continues to request objects, the same process | |
| 335 | + is followed for each new requested object. | |
| 329 | 336 | |
| 330 | 337 | .. _object_internals: |
| 331 | 338 | |
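The resolution steps described in the hunk above can be pictured with a small toy model. This is only a sketch under stated assumptions: ``LazyValue``, ``LazyObject``, ``LazyHandle``, and ``LazyDocument`` are hypothetical stand-ins invented for illustration, not qpdf's real classes, and the "file" is just a map from object number to text. It shows two handles sharing one cache entry, the entry's unresolved value being replaced on first access, and a modification through one handle being visible through the other.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy types; none of these are part of the qpdf API.
struct LazyValue {
    bool resolved = false;  // false plays the role of the "unresolved" type
    std::string data;
};

// One cache entry. Handles share the entry, so swapping the value inside it
// is seen by every handle that points at the entry.
struct LazyObject {
    std::shared_ptr<LazyValue> value = std::make_shared<LazyValue>();
};

class LazyDocument;

class LazyHandle {
public:
    LazyHandle(LazyDocument* owner, int id, std::shared_ptr<LazyObject> obj)
        : owner_(owner), id_(id), obj_(std::move(obj)) {}
    bool isResolved() const { return obj_->value->resolved; }
    const std::string& get();          // resolves on first access
    void set(const std::string& s);    // resolves, then modifies shared value
private:
    void resolve();
    LazyDocument* owner_;
    int id_;
    std::shared_ptr<LazyObject> obj_;
};

class LazyDocument {
public:
    explicit LazyDocument(std::map<int, std::string> file)
        : file_(std::move(file)) {}
    // Consult the cache; insert a new unresolved entry if necessary.
    LazyHandle getObject(int id) {
        auto& entry = cache_[id];
        if (!entry) entry = std::make_shared<LazyObject>();
        return LazyHandle(this, id, entry);
    }
    // Stand-in for "check the cross reference table and read at the offset".
    std::shared_ptr<LazyValue> read(int id) {
        ++reads_;
        auto v = std::make_shared<LazyValue>();
        v->resolved = true;
        v->data = file_.at(id);
        return v;
    }
    int reads() const { return reads_; }
private:
    std::map<int, std::string> file_;
    std::map<int, std::shared_ptr<LazyObject>> cache_;
    int reads_ = 0;
};

void LazyHandle::resolve() {
    if (!obj_->value->resolved) obj_->value = owner_->read(id_);
}
const std::string& LazyHandle::get() { resolve(); return obj_->value->data; }
void LazyHandle::set(const std::string& s) { resolve(); obj_->value->data = s; }

// Usage: two handles to object 1; a change made through b is seen through a.
std::string demoSharedResolution() {
    LazyDocument doc({{1, "<< /Type /Catalog >>"}});
    LazyHandle a = doc.getObject(1);
    LazyHandle b = doc.getObject(1);
    a.get();                            // first access resolves the entry
    assert(b.isResolved());             // b shares the same cache entry
    b.set("<< /Type /Changed >>");
    assert(doc.reads() == 1);           // the object was read only once
    return a.get();
}
```

The extra level of indirection (handle to entry to value) is what lets the value be swapped without touching any handle; the real library's layering may differ in detail.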
| ... | ... | @@ -339,11 +346,12 @@ Object Internals |
| 339 | 346 | ~~~~~~~~~~~~~~~~ |
| 340 | 347 | |
| 341 | 348 | The ``QPDF`` object has an object cache which contains a shared |
| 342 | -pointer to each object that was read from the file. Changes can be | |
| 343 | -made to any of those objects through ``QPDFObjectHandle`` methods. Any | |
| 344 | -such changes are visible to all ``QPDFObjectHandle`` instances that | |
| 345 | -point to the same object. When a ``QPDF`` object is written by | |
| 346 | -``QPDFWriter`` or serialized to JSON, any changes are reflected. | |
| 349 | +pointer to each object that was read from the file or added as an | |
| 350 | +indirect object. Changes can be made to any of those objects through | |
| 351 | +``QPDFObjectHandle`` methods. Any such changes are visible to all | |
| 352 | +``QPDFObjectHandle`` instances that point to the same object. When a | |
| 353 | +``QPDF`` object is written by ``QPDFWriter`` or serialized to JSON, | |
| 354 | +any changes are reflected. | |
| 347 | 355 | |
| 348 | 356 | Objects in qpdf 11 and Newer |
| 349 | 357 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| ... | ... | @@ -356,30 +364,32 @@ reference to that object has a copy of that shared pointer. Each |
| 356 | 364 | is an implementation for each of the basic object types (array, |
| 357 | 365 | dictionary, null, boolean, string, number, etc.) as well as a few |
| 358 | 366 | special ones including ``uninitialized``, ``unresolved``, |
| 359 | -``reserved``, and ``destroyed``. When an object is first referenced, | |
| 367 | +``reserved``, and ``destroyed``. When an object is first created, | |
| 360 | 368 | its underlying ``QPDFValue`` has type ``unresolved``. When the object |
| 361 | -is first resolved, the ``QPDFObject`` in the cache has its internal | |
| 369 | +is first accessed, the ``QPDFObject`` in the cache has its internal | |
| 362 | 370 | ``QPDFValue`` replaced with the object as read from the file. Since it |
| 363 | 371 | is the ``QPDFObject`` object that is shared by all referencing |
| 364 | 372 | ``QPDFObjectHandle`` objects as well as by the owning ``QPDF`` object, |
| 365 | 373 | this ensures that any future changes to the object, including |
| 366 | -replacing the object with a completely different one, will be | |
| 374 | +replacing the object with a completely different one by calling | |
| 375 | +``QPDF::replaceObject`` or ``QPDF::swapObjects``, will be | |
| 367 | 376 | reflected across all ``QPDFObjectHandle`` objects that reference it. |
| 368 | 377 | |
| 369 | 378 | A ``QPDFValue`` that originated from a PDF input source maintains a |
| 370 | 379 | pointer to the ``QPDF`` object that read it (its *owner*). When that |
| 371 | -``QPDF`` object is destroyed, it disconnects all reachable from it by | |
| 372 | -clearing their owner. For indirect objects (all objects in the object | |
| 373 | -cache), it also replaces the object's value with an object of type | |
| 374 | -``destroyed``. This means that, if there are still any referencing | |
| 375 | -``QPDFObjectHandle`` objects floating around, requesting their owning | |
| 376 | -``QPDF`` will return a null pointer rather than a pointer to a | |
| 377 | -``QPDF`` object that is either invalid or points to something else, | |
| 378 | -and any attempt to access an indirect object that is associated with a | |
| 379 | -destroyed ``QPDF`` object will throw an exception. This operation also | |
| 380 | -has the effect of breaking any circular references (which are common | |
| 381 | -and, in some cases, required by the PDF specification), thus | |
| 382 | -preventing memory leaks when ``QPDF`` objects are destroyed. | |
| 380 | +``QPDF`` object is destroyed, it disconnects all objects reachable | |
| 381 | +from it by clearing their owner. For indirect objects (all objects in | |
| 382 | +the object cache), it also replaces the object's value with an object | |
| 383 | +of type ``destroyed``. This means that, if there are still any | |
| 384 | +referencing ``QPDFObjectHandle`` objects floating around, requesting | |
| 385 | +their owning ``QPDF`` will return a null pointer rather than a pointer | |
| 386 | +to a ``QPDF`` object that is either invalid or points to something | |
| 387 | +else, and any attempt to access an indirect object that is associated | |
| 388 | +with a destroyed ``QPDF`` object will throw an exception. This | |
| 389 | +operation also has the effect of breaking any circular references | |
| 390 | +(which are common and, in some cases, required by the PDF | |
| 391 | +specification), thus preventing memory leaks when ``QPDF`` objects are | |
| 392 | +destroyed. | |
| 383 | 393 | |
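The ownership behavior described in the paragraph above can be sketched with another toy model. The names ``OwnedObject``, ``OwnedHandle``, and ``OwnerDoc`` are hypothetical and are not qpdf classes; the sketch only assumes what the text states: on destruction the owner clears each cached object's owner pointer, marks it destroyed, and thereby breaks circular references, while surviving handles see a null owner and get an exception on access.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

class OwnerDoc;

// Hypothetical cache entry shared by the document and by handles.
struct OwnedObject {
    OwnerDoc* owner = nullptr;
    bool destroyed = false;
    std::string data;
    std::vector<std::shared_ptr<OwnedObject>> refs;  // may form cycles
};

class OwnedHandle {
public:
    explicit OwnedHandle(std::shared_ptr<OwnedObject> obj)
        : obj_(std::move(obj)) {}
    OwnerDoc* owner() const { return obj_->owner; }  // null after destruction
    const std::string& data() const {
        if (obj_->destroyed) {
            throw std::logic_error("owning document was destroyed");
        }
        return obj_->data;
    }
private:
    std::shared_ptr<OwnedObject> obj_;
};

class OwnerDoc {
public:
    ~OwnerDoc() {
        for (auto& o : cache_) {
            o->owner = nullptr;   // disconnect from this document
            o->destroyed = true;  // plays the role of the "destroyed" type
            o->data.clear();
            o->refs.clear();      // break circular references
        }
    }
    OwnedHandle add(const std::string& data) {
        auto o = std::make_shared<OwnedObject>();
        o->owner = this;
        o->data = data;
        cache_.push_back(o);
        return OwnedHandle(o);
    }
    void link(std::size_t from, std::size_t to) {
        cache_.at(from)->refs.push_back(cache_.at(to));
    }
    std::weak_ptr<OwnedObject> watch(std::size_t i) const {
        return cache_.at(i);
    }
private:
    std::vector<std::shared_ptr<OwnedObject>> cache_;
};

// Usage: a handle outlives its document; a page/parent cycle is still freed.
bool demoDestroyed() {
    std::weak_ptr<OwnedObject> watched;
    std::unique_ptr<OwnedHandle> survivor;
    {
        OwnerDoc doc;
        survivor = std::make_unique<OwnedHandle>(doc.add("page"));
        doc.add("parent");
        doc.link(0, 1);  // page -> parent
        doc.link(1, 0);  // parent -> page (circular)
        watched = doc.watch(1);
    }  // doc is destroyed here
    bool threw = false;
    try {
        survivor->data();
    } catch (const std::logic_error&) {
        threw = true;
    }
    return survivor->owner() == nullptr && threw && watched.expired();
}
```

Without the ``refs.clear()`` step, the two objects would keep each other alive through their shared pointers, which is the leak the paragraph says this operation prevents.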
| 384 | 394 | Objects prior to qpdf 11 |
| 385 | 395 | ~~~~~~~~~~~~~~~~~~~~~~~~ |
| ... | ... | @@ -478,22 +488,6 @@ and 64-bit platforms, and the test suite is very thorough, so it is |
| 478 | 488 | hard to make any of the potential errors here without being caught in |
| 479 | 489 | build or test. |
| 480 | 490 | |
| 481 | -Non-const ``unsigned char*`` is used in the ``Pipeline`` interface. The | |
| 482 | -pipeline interface has a ``write`` call that uses ``unsigned char*`` | |
| 483 | -without a ``const`` qualifier. The main reason for this is | |
| 484 | -to support pipelines that make calls to third-party libraries, such as | |
| 485 | -zlib, that don't include ``const`` in their interfaces. Unfortunately, | |
| 486 | -there are many places in the code where it is desirable to have | |
| 487 | -``const char*`` with pipelines. None of the pipeline implementations | |
| 488 | -in qpdf | |
| 489 | -currently modify the data passed to write, and doing so would be counter | |
| 490 | -to the intent of ``Pipeline``, but there is nothing in the code to | |
| 491 | -prevent this from being done. There are places in the code where | |
| 492 | -``const_cast`` is used to remove the const-ness of pointers going into | |
| 493 | -``Pipeline``\ s. This could theoretically be unsafe, but there is | |
| 494 | -adequate testing to assert that it is safe and will remain safe in | |
| 495 | -qpdf's code. | |
| 496 | - | |
| 497 | 491 | .. _encryption: |
| 498 | 492 | |
| 499 | 493 | Encryption |
| ... | ... | @@ -516,14 +510,14 @@ given an encryption key. This is used by ``QPDFWriter`` when it rewrites |
| 516 | 510 | encrypted files. |
| 517 | 511 | |
| 518 | 512 | When copying encrypted files, unless otherwise directed, qpdf will |
| 519 | -preserve any encryption in force in the original file. qpdf can do this | |
| 520 | -with either the user or the owner password. There is no difference in | |
| 521 | -capability based on which password is used. When 40 or 128 bit | |
| 522 | -encryption keys are used, the user password can be recovered with the | |
| 523 | -owner password. With 256 keys, the user and owner passwords are used | |
| 524 | -independently to encrypt the actual encryption key, so while either can | |
| 525 | -be used, the owner password can no longer be used to recover the user | |
| 526 | -password. | |
| 513 | +preserve any encryption in effect in the original file. qpdf can do | |
| 514 | +this with either the user or the owner password. There is no | |
| 515 | +difference in capability based on which password is used. When 40- or | |
| 516 | +128-bit encryption keys are used, the user password can be recovered | |
| 517 | +with the owner password. With 256-bit keys, the user and owner passwords | |
| 518 | +are used independently to encrypt the actual encryption key, so while | |
| 519 | +either can be used, the owner password can no longer be used to | |
| 520 | +recover the user password. | |
| 527 | 521 | |
| 528 | 522 | Starting with version 4.0.0, qpdf can read files that are not encrypted |
| 529 | 523 | but that contain encrypted attachments, but it cannot write such files. |
| ... | ... | @@ -538,33 +532,37 @@ format. The only exception to this is that clear-text metadata will be |
| 538 | 532 | preserved as clear-text if it is that way in the original file. |
| 539 | 533 | |
| 540 | 534 | One point of confusion some people have about encrypted PDF files is |
| 541 | -that encryption is not the same as password protection. Password | |
| 542 | -protected files are always encrypted, but it is also possible to create | |
| 543 | -encrypted files that do not have passwords. Internally, such files use | |
| 544 | -the empty string as a password, and most readers try the empty string | |
| 545 | -first to see if it works and prompt for a password only if the empty | |
| 546 | -string doesn't work. Normally such files have an empty user password and | |
| 547 | -a non-empty owner password. In that way, if the file is opened by an | |
| 548 | -ordinary reader without specification of password, the restrictions | |
| 549 | -specified in the encryption dictionary can be enforced. Most users | |
| 550 | -wouldn't even realize such a file was encrypted. Since qpdf always | |
| 551 | -ignores the restrictions (except for the purpose of reporting what they | |
| 552 | -are), qpdf doesn't care which password you use. QPDF will allow you to | |
| 553 | -create PDF files with non-empty user passwords and empty owner | |
| 554 | -passwords. Some readers will require a password when you open these | |
| 555 | -files, and others will open the files without a password and not enforce | |
| 556 | -restrictions. Having a non-empty user password and an empty owner | |
| 557 | -password doesn't really make sense because it would mean that opening | |
| 558 | -the file with the user password would be more restrictive than not | |
| 559 | -supplying a password at all. QPDF also allows you to create PDF files | |
| 560 | -with the same password as both the user and owner password. Some readers | |
| 561 | -will not ever allow such files to be accessed without restrictions | |
| 562 | -because they never try the password as the owner password if it works as | |
| 563 | -the user password. Nonetheless, one of the powerful aspects of qpdf is | |
| 564 | -that it allows you to finely specify the way encrypted files are | |
| 565 | -created, even if the results are not useful to some readers. One use | |
| 566 | -case for this would be for testing a PDF reader to ensure that it | |
| 567 | -handles odd configurations of input files. | |
| 535 | +that encryption is not the same as password protection. | |
| 536 | +Password-protected files are always encrypted, but it is also possible | |
| 537 | +to create encrypted files that do not have passwords. Internally, such | |
| 538 | +files use the empty string as a password, and most readers try the | |
| 539 | +empty string first to see if it works and prompt for a password only | |
| 540 | +if the empty string doesn't work. Normally such files have an empty | |
| 541 | +user password and a non-empty owner password. In that way, if the file | |
| 542 | +is opened by an ordinary reader without specification of password, the | |
| 543 | +restrictions specified in the encryption dictionary can be enforced. | |
| 544 | +Most users wouldn't even realize such a file was encrypted. Since qpdf | |
| 545 | +always ignores the restrictions (except for the purpose of reporting | |
| 546 | +what they are), qpdf doesn't care which password you use. qpdf will | |
| 547 | +allow you to create PDF files with non-empty user passwords and empty | |
| 548 | +owner passwords. Some readers will require a password when you open | |
| 549 | +these files, and others will open the files without a password and not | |
| 550 | +enforce restrictions. Having a non-empty user password and an empty | |
| 551 | +owner password doesn't really make sense because it would mean that | |
| 552 | +opening the file with the user password would be more restrictive than | |
| 553 | +not supplying a password at all. qpdf also allows you to create PDF | |
| 554 | +files with the same password as both the user and owner password. Some | |
| 555 | +readers will not ever allow such files to be accessed without | |
| 556 | +restrictions because they never try the password as the owner password | |
| 557 | +if it works as the user password. Nonetheless, one of the powerful | |
| 558 | +aspects of qpdf is that it allows you to finely specify the way | |
| 559 | +encrypted files are created, even if the results are not useful to | |
| 560 | +some readers. One use case for this would be for testing a PDF reader | |
| 561 | +to ensure that it handles odd configurations of input files. If you | |
| 562 | +attempt to create an encrypted file that is not secure, qpdf will warn | |
| 563 | +you and require you to explicitly state your intention to create an | |
| 564 | +insecure file. So while qpdf can create insecure files, it won't let | |
| 565 | +you do it by mistake. | |
| 568 | 566 | |
| 569 | 567 | .. _random-numbers: |
| 570 | 568 | |
| ... | ... | @@ -630,23 +628,21 @@ Copying Objects From Other PDF Files |
| 630 | 628 | |
| 631 | 629 | Version 3.0 of qpdf introduced the ability to copy objects into a |
| 632 | 630 | ``QPDF`` object from a different ``QPDF`` object, which we refer to as |
| 633 | -*foreign objects*. This allows arbitrary | |
| 634 | -merging of PDF files. The "from" ``QPDF`` object must remain valid after | |
| 635 | -the copy as discussed in the note below. The | |
| 636 | -:command:`qpdf` command-line tool provides limited | |
| 637 | -support for basic page selection, including merging in pages from other | |
| 638 | -files, but the library's API makes it possible to implement arbitrarily | |
| 639 | -complex merging operations. The main method for copying foreign objects | |
| 640 | -is ``QPDF::copyForeignObject``. This takes an indirect object from | |
| 631 | +*foreign objects*. This allows arbitrary merging of PDF files. The | |
| 632 | +:command:`qpdf` command-line tool provides limited support for basic | |
| 633 | +page selection, including merging in pages from other files, but the | |
| 634 | +library's API makes it possible to implement arbitrarily complex | |
| 635 | +merging operations. The main method for copying foreign objects is | |
| 636 | +``QPDF::copyForeignObject``. This takes an indirect object from | |
| 641 | 637 | another ``QPDF`` and copies it recursively into this object while |
| 642 | 638 | preserving all object structure, including circular references. This |
| 643 | 639 | means you can add a direct object that you create from scratch to a |
| 644 | 640 | ``QPDF`` object with ``QPDF::makeIndirectObject``, and you can add an |
| 645 | -indirect object from another file with ``QPDF::copyForeignObject``. The | |
| 646 | -fact that ``QPDF::makeIndirectObject`` does not automatically detect a | |
| 647 | -foreign object and copy it is an explicit design decision. Copying a | |
| 648 | -foreign object seems like a sufficiently significant thing to do that it | |
| 649 | -should be done explicitly. | |
| 641 | +indirect object from another file with ``QPDF::copyForeignObject``. | |
| 642 | +The fact that ``QPDF::makeIndirectObject`` does not automatically | |
| 643 | +detect a foreign object and copy it is an explicit design decision. | |
| 644 | +Copying a foreign object seems like a sufficiently significant thing | |
| 645 | +to do that it should be done explicitly. | |
| 650 | 646 | |
| 651 | 647 | The other way to copy foreign objects is by passing a page from one |
| 652 | 648 | ``QPDF`` to another by calling ``QPDF::addPage``. In contrast to |
| ... | ... | @@ -654,26 +650,30 @@ The other way to copy foreign objects is by passing a page from one |
| 654 | 650 | between indirect objects in the current file, foreign objects, and |
| 655 | 651 | direct objects. |
| 656 | 652 | |
| 657 | -Please note: when you copy objects from one ``QPDF`` to another, the | |
| 658 | -source ``QPDF`` object must remain valid until you have finished with | |
| 659 | -the destination object. This is because the original object is still | |
| 660 | -used to retrieve any referenced stream data from the copied object. | |
| 653 | +When you copy objects from one ``QPDF`` to another, the input source | |
| 654 | +of the original file must remain valid until you have finished with the | |
| 655 | +destination object. This is because the input source is still used | |
| 656 | +to retrieve any referenced stream data from the copied object. If | |
| 657 | +needed, there are methods to force the data to be copied. See comments | |
| 658 | +near the declaration of ``copyForeignObject`` in | |
| 659 | +:file:`include/qpdf/QPDF.hh` for details. | |
| 661 | 660 | |
| 662 | 661 | .. _rewriting: |
| 663 | 662 | |
| 664 | 663 | Writing PDF Files |
| 665 | 664 | ----------------- |
| 666 | 665 | |
| 667 | -The qpdf library supports file writing of ``QPDF`` objects to PDF files | |
| 668 | -through the ``QPDFWriter`` class. The ``QPDFWriter`` class has two | |
| 669 | -writing modes: one for non-linearized files, and one for linearized | |
| 670 | -files. See :ref:`linearization` for a description of | |
| 666 | +The qpdf library supports file writing of ``QPDF`` objects to PDF | |
| 667 | +files through the ``QPDFWriter`` class. The ``QPDFWriter`` class has | |
| 668 | +two writing modes: one for non-linearized files, and one for | |
| 669 | +linearized files. See :ref:`linearization` for a description of | |
| 671 | 670 | how linearization is implemented. This section describes how we write |
| 672 | -non-linearized files including the creation of QDF files (see :ref:`qdf`. | |
| 671 | +non-linearized files including the creation of QDF files (see | |
| 672 | +:ref:`qdf`). | |
| 673 | 673 | |
| 674 | 674 | This outline was written prior to implementation and is not exactly |
| 675 | -accurate, but it provides a correct "notional" idea of how writing | |
| 676 | -works. Look at the code in ``QPDFWriter`` for exact details. | |
| 675 | +accurate, but it portrays the essence of how writing works. Look at | |
| 676 | +the code in ``QPDFWriter`` for exact details. | |
| 677 | 677 | |
| 678 | 678 | - Initialize state: |
| 679 | 679 | |
| ... | ... | @@ -685,7 +685,7 @@ works. Look at the code in ``QPDFWriter`` for exact details. |
| 685 | 685 | |
| 686 | 686 | - xref table: new id -> offset = empty |
| 687 | 687 | |
| 688 | -- Create a QPDF object from a file. | |
| 688 | +- Create a ``QPDF`` object from a file. | |
| 689 | 689 | |
| 690 | 690 | - Write header for new PDF file. |
| 691 | 691 | |
| ... | ... | @@ -750,7 +750,7 @@ Filtered Streams |
| 750 | 750 | ---------------- |
| 751 | 751 | |
| 752 | 752 | Support for streams is implemented through the ``Pipeline`` interface |
| 753 | -which was designed for this package. | |
| 753 | +which was designed for this library. | |
| 754 | 754 | |
| 755 | 755 | When reading streams, create a series of ``Pipeline`` objects. The |
| 756 | 756 | ``Pipeline`` abstract base requires implementation of ``write()`` and |
| ... | ... | @@ -802,32 +802,20 @@ file might be, the presence of type warnings can save lots of developer |
| 802 | 802 | time. They have also proven useful in exposing issues in qpdf itself |
| 803 | 803 | that would have otherwise gone undetected. |
| 804 | 804 | |
| 805 | -*Can there be a type-safe ``QPDFObjectHandle``?* It would be great if | |
| 806 | -``QPDFObjectHandle`` could be more strongly typed so that you'd have to | |
| 807 | -have check that something was of a particular type before calling | |
| 808 | -type-specific accessor methods. However, implementing this at this stage | |
| 809 | -of the library's history would be quite difficult, and it would make a | |
| 810 | -the common pattern of drilling into an object no longer work. While it | |
| 811 | -would be possible to have a parallel interface, it would create a lot of | |
| 812 | -extra code. If qpdf were written in a language like rust, an interface | |
| 813 | -like this would make a lot of sense, but, for a variety of reasons, the | |
| 814 | -qpdf API is consistent with other APIs of its time, relying on exception | |
| 815 | -handling to catch errors. The underlying PDF objects are inherently not | |
| 816 | -type-safe. Forcing stronger type safety in ``QPDFObjectHandle`` would | |
| 817 | -ultimately cause a lot more code to have to be written and would like | |
| 818 | -make software that uses qpdf more brittle, and even so, checks would | |
| 819 | -have to occur at runtime. | |
| 820 | - | |
| 821 | -*Why do type errors sometimes raise exceptions?* The way warnings work | |
| 822 | -in qpdf requires a ``QPDF`` object to be associated with an object | |
| 823 | -handle for a warning to be issued. It would be nice if this could be | |
| 824 | -fixed, but it would require major changes to the API. Rather than | |
| 825 | -throwing away these conditions, we convert them to exceptions. It's not | |
| 826 | -that bad though. Since any object handle that was read from a file has | |
| 827 | -an associated ``QPDF`` object, it would only be type errors on objects | |
| 828 | -that were created explicitly that would cause exceptions, and in that | |
| 829 | -case, type errors are much more likely to be the result of a coding | |
| 830 | -error than invalid input. | |
| 805 | +*Can there be a type-safe* ``QPDFObjectHandle``? At the time of the | |
| 806 | +release of qpdf 11, there is active work being done toward the goal of | |
| 807 | +creating a way to work with PDF objects that is more type-safe and | |
| 808 | +closer in feel to the current C++ standard library. It is hoped that | |
| 809 | +this work will make it easier to write bindings to qpdf in modern | |
| 810 | +languages like `Rust <https://www.rust-lang.org/>`__. If this happens, | |
| 811 | +it will likely be by providing an alternative to ``QPDFObjectHandle`` | |
| 812 | +that provides a separate path to the underlying object. Details are | |
| 813 | +still being worked out. Fundamentally, PDF objects are not strongly | |
| 814 | +typed. They are similar to ``JSON`` objects or to objects in dynamic | |
| 815 | +languages like `Python <https://python.org/>`__: there are certain | |
| 816 | +things you can only do to objects of a given type, but you can replace | |
| 817 | +an object of one type with an object of another. Because of this, | |
| 818 | +there will always be some checks that will happen at runtime. | |
| 831 | 819 | |
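The point that some checks must always happen at runtime can be illustrated with a minimal sketch of a dynamically typed value. This is not the interface the qpdf project is working on, and it is not ``QPDFObjectHandle``: ``Value``, ``is_integer``, and ``as_integer`` are hypothetical names, and the sketch assumes C++17 for ``std::variant``. A value can be replaced with one of a different type at any time, so a typed accessor has to check what the value currently holds.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical dynamically typed value; not qpdf's QPDFObjectHandle.
using Value = std::variant<std::monostate, bool, long long, std::string>;

// Type query: lets the caller check before using a typed accessor.
bool is_integer(Value const& v)
{
    return std::holds_alternative<long long>(v);
}

// Typed accessor: the check cannot be done at compile time because
// the held type can change whenever the value is reassigned.
long long as_integer(Value const& v)
{
    if (auto p = std::get_if<long long>(&v)) {
        return *p;
    }
    throw std::logic_error("value is not an integer");
}
```

A caller would typically test with ``is_integer`` first and treat the exception from ``as_integer`` as a sign of a coding error.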
| 832 | 820 | *Why does the behavior of a type exception differ between the C and C++ |
| 833 | 821 | API?* There is no way to throw and catch exceptions in C short of | ... | ... |