Commit 910a373a79f885cba1023fa69aa0c679e4ae0601

Authored by Jay Berkenbilt
1 parent a6c4b293

Clean up the Design and Library Notes chapter of the manual

Showing 1 changed file with 195 additions and 207 deletions
manual/design.rst
@@ -8,50 +8,53 @@ Design and Library Notes @@ -8,50 +8,53 @@ Design and Library Notes
8 Introduction 8 Introduction
9 ------------ 9 ------------
10 10
11 -This section was written prior to the implementation of the qpdf package  
12 -and was subsequently modified to reflect the implementation. In some  
13 -cases, for purposes of explanation, it may differ slightly from the  
14 -actual implementation. As always, the source code and test suite are  
15 -authoritative. Even if there are some errors, this document should serve  
16 -as a road map to understanding how this code works. 11 +This section was written prior to the implementation of the qpdf
  12 +library and was subsequently modified to reflect the implementation.
  13 +In some cases, for purposes of explanation, it may differ slightly
  14 +from the actual implementation. As always, the source code and test
  15 +suite are authoritative. Even if there are some errors, this document
  16 +should serve as a road map to understanding how this code works.
17 17
18 In general, one should adhere strictly to a specification when writing 18 In general, one should adhere strictly to a specification when writing
19 -but be liberal in reading. This way, the product of our software will be  
20 -accepted by the widest range of other programs, and we will accept the  
21 -widest range of input files. This library attempts to conform to that  
22 -philosophy whenever possible but also aims to provide strict checking  
23 -for people who want to validate PDF files. If you don't want to see  
24 -warnings and are trying to write something that is tolerant, you can  
25 -call ``setSuppressWarnings(true)``. If you want to fail on the first  
26 -error, you can call ``setAttemptRecovery(false)``. The default behavior  
27 -is to generating warnings for recoverable problems. Note that recovery  
28 -will not always produce the desired results even if it is able to get  
29 -through the file. Unlike most other PDF files that produce generic  
30 -warnings such as "This file is damaged,", qpdf generally issues a  
31 -detailed error message that would be most useful to a PDF developer. 19 +but be liberal in reading. This way, the product of our software will
  20 +be accepted by the widest range of other programs, and we will accept
  21 +the widest range of input files. This library attempts to conform to
  22 +that philosophy whenever possible but also aims to provide strict
  23 +checking for people who want to validate PDF files. If you don't want
  24 +to see warnings and are trying to write something that is tolerant,
  25 +you can call ``setSuppressWarnings(true)``. If you want to fail on the
  26 +first error, you can call ``setAttemptRecovery(false)``. The default
  27 +behavior is to generate warnings for recoverable problems. Note that
  28 +recovery will not always produce the desired results even if it is
  29 +able to get through the file. Unlike most other PDF tools that produce
  30 +generic warnings such as "This file is damaged," qpdf generally issues
  31 +a detailed error message that would be most useful to a PDF developer.
32 This is by design as there seems to be a shortage of PDF validation 32 This is by design as there seems to be a shortage of PDF validation
33 -tools out there. This was, in fact, one of the major motivations behind  
34 -the initial creation of qpdf. 33 +tools out there. This was, in fact, one of the major motivations
  34 +behind the initial creation of qpdf. That said, qpdf is not a strict
  35 +PDF checker. There are many ways in which a PDF file can be out of
  36 +conformance to the spec that qpdf doesn't notice or report.
35 37
36 .. _design-goals: 38 .. _design-goals:
37 39
38 Design Goals 40 Design Goals
39 ------------ 41 ------------
40 42
41 -The QPDF package includes support for reading and rewriting PDF files. 43 +The qpdf library includes support for reading and rewriting PDF files.
42 It aims to hide from the user details involving object locations, 44 It aims to hide from the user details involving object locations,
43 -modified (appended) PDF files, the directness/indirectness of objects,  
44 -and stream filters including encryption. It does not aim to hide  
45 -knowledge of the object hierarchy or content stream contents. Put  
46 -another way, a user of the qpdf library is expected to have knowledge  
47 -about how PDF files work, but is not expected to have to keep track of  
48 -bookkeeping details such as file positions.  
49 -  
50 -A user of the library never has to care whether an object is direct or  
51 -indirect, though it is possible to determine whether an object is direct  
52 -or not if this information is needed. All access to objects deals with  
53 -this transparently. All memory management details are also handled by  
54 -the library. 45 +modified (appended) PDF files, use of object streams, and stream
  46 +filters including encryption. It does not aim to hide knowledge of the
  47 +object hierarchy or content stream contents. Put another way, a user
  48 +of the qpdf library is expected to have knowledge about how PDF files
  49 +work, but is not expected to have to keep track of bookkeeping details
  50 +such as file positions.
  51 +
  52 +When accessing objects, a user of the library never has to care
  53 +whether an object is direct or indirect as all access to objects deals
  54 +with this transparently. All memory management details are also
  55 +handled by the library. When modifying objects, it is possible to
  56 +determine whether an object is indirect and to make copies of the
  57 +object if needed.
55 58
56 Memory is managed mostly with ``std::shared_ptr`` objects to minimize 59 Memory is managed mostly with ``std::shared_ptr`` objects to minimize
57 explicit memory handling. This library also makes use of a technique 60 explicit memory handling. This library also makes use of a technique
@@ -85,29 +88,32 @@ objects to indirect objects and vice versa. @@ -85,29 +88,32 @@ objects to indirect objects and vice versa.
85 Instances of ``QPDFObjectHandle`` can be directly created and modified 88 Instances of ``QPDFObjectHandle`` can be directly created and modified
86 using static factory methods in the ``QPDFObjectHandle`` class. There 89 using static factory methods in the ``QPDFObjectHandle`` class. There
87 are factory methods for each type of object as well as a convenience 90 are factory methods for each type of object as well as a convenience
88 -method ``QPDFObjectHandle::parse`` that creates an object from a string  
89 -representation of the object. Existing instances of ``QPDFObjectHandle``  
90 -can also be modified in several ways. See comments in  
91 -:file:`QPDFObjectHandle.hh` for details. 91 +method ``QPDFObjectHandle::parse`` that creates an object from a
  92 +string representation of the object. The ``_qpdf`` user-defined string
  93 +literal is also available, making it possible to create instances of
  94 +``QPDFObjectHandle`` with ``"(pdf-syntax)"_qpdf``. Existing instances
  95 +of ``QPDFObjectHandle`` can also be modified in several ways. See
  96 +comments in :file:`QPDFObjectHandle.hh` for details.
92 97
93 An instance of ``QPDF`` is constructed by using the class's default 98 An instance of ``QPDF`` is constructed by using the class's default
94 -constructor. If desired, the ``QPDF`` object may be configured with  
95 -various methods that change its default behavior. Then the  
96 -``QPDF::processFile()`` method is passed the name of a PDF file, which  
97 -permanently associates the file with that QPDF object. A password may  
98 -also be given for access to password-protected files. QPDF does not  
99 -enforce encryption parameters and will treat user and owner passwords  
100 -equivalently. Either password may be used to access an encrypted file.  
101 -``QPDF`` will allow recovery of a user password given an owner password.  
102 -The input PDF file must be seekable. (Output files written by  
103 -``QPDFWriter`` need not be seekable, even when creating linearized  
104 -files.) During construction, ``QPDF`` validates the PDF file's header,  
105 -and then reads the cross reference tables and trailer dictionaries. The  
106 -``QPDF`` class keeps only the first trailer dictionary though it does  
107 -read all of them so it can check the ``/Prev`` key. ``QPDF`` class users  
108 -may request the root object and the trailer dictionary specifically. The  
109 -cross reference table is kept private. Objects may then be requested by  
110 -number or by walking the object tree. 99 +constructor or with ``QPDF::create()``. If desired, the ``QPDF``
  100 +object may be configured with various methods that change its default
  101 +behavior. Then the ``QPDF::processFile`` method is passed the name of
  102 +a PDF file, which permanently associates the file with that ``QPDF``
  103 +object. A password may also be given for access to password-protected
  104 +files. ``QPDF`` does not enforce encryption parameters and will treat
  105 +user and owner passwords equivalently. Either password may be used to
  106 +access an encrypted file. ``QPDF`` will allow recovery of a user
  107 +password given an owner password. The input PDF file must be seekable.
  108 +Output files written by ``QPDFWriter`` need not be seekable, even when
  109 +creating linearized files. During construction, ``QPDF`` validates the
  110 +PDF file's header, and then reads the cross reference tables and
  111 +trailer dictionaries. The ``QPDF`` class keeps only the first trailer
  112 +dictionary though it does read all of them so it can check the
  113 +``/Prev`` key. ``QPDF`` class users may request the root object and
  114 +the trailer dictionary specifically. The cross reference table is kept
  115 +private. Objects may then be requested by number or by walking the
  116 +object tree.
111 117
112 When a PDF file has a cross-reference stream instead of a 118 When a PDF file has a cross-reference stream instead of a
113 cross-reference table and trailer, requesting the document's trailer 119 cross-reference table and trailer, requesting the document's trailer
@@ -240,13 +246,14 @@ the ``QPDFObjectHandle`` type to hold onto objects and to abstract @@ -240,13 +246,14 @@ the ``QPDFObjectHandle`` type to hold onto objects and to abstract
240 away in most cases whether the object is direct or indirect. 246 away in most cases whether the object is direct or indirect.
241 247
242 Internally, ``QPDFObjectHandle`` holds onto a shared pointer to the 248 Internally, ``QPDFObjectHandle`` holds onto a shared pointer to the
243 -underlying object value. When a direct object is created, the  
244 -``QPDFObjectHandle`` that holds it is not associated with a ``QPDF``  
245 -object. When an indirect object reference is created, it starts off in  
246 -an *unresolved* state and must be associated with a ``QPDF`` object,  
247 -which is considered its *owner*. To access the actual value of the  
248 -object, the object must be *resolved*. This happens automatically when  
249 -the the object is accessed in any way. 249 +underlying object value. When a direct object is created
  250 +programmatically by client code (rather than being read from the
  251 +file), the ``QPDFObjectHandle`` that holds it is not associated with a
  252 +``QPDF`` object. When an indirect object reference is created, it
  253 +starts off in an *unresolved* state and must be associated with a
  254 +``QPDF`` object, which is considered its *owner*. To access the actual
  255 +value of the object, the object must be *resolved*. This happens
  256 +automatically when the object is accessed in any way.
250 257
251 To resolve an object, qpdf checks its object cache. If not found in 258 To resolve an object, qpdf checks its object cache. If not found in
252 the cache, it attempts to read the object from the input source 259 the cache, it attempts to read the object from the input source
@@ -286,18 +293,20 @@ file. @@ -286,18 +293,20 @@ file.
286 it is looking before the last ``%%EOF``. After getting to ``trailer`` 293 it is looking before the last ``%%EOF``. After getting to ``trailer``
287 keyword, it invokes the parser. 294 keyword, it invokes the parser.
288 295
289 -- The parser sees ``<<``, so it calls itself recursively in  
290 - dictionary creation mode. 296 +- The parser sees ``<<``, so it changes state and starts accumulating
  297 + the keys and values of the dictionary.
291 298
292 - In dictionary creation mode, the parser keeps accumulating objects 299 - In dictionary creation mode, the parser keeps accumulating objects
293 until it encounters ``>>``. Each object that is read is pushed onto 300 until it encounters ``>>``. Each object that is read is pushed onto
294 a stack. If ``R`` is read, the last two objects on the stack are 301 a stack. If ``R`` is read, the last two objects on the stack are
295 inspected. If they are integers, they are popped off the stack and 302 inspected. If they are integers, they are popped off the stack and
296 - their values are used to construct an indirect object handle which  
297 - is then pushed onto the stack. When ``>>`` is finally read, the  
298 - stack is converted into a ``QPDF_Dictionary`` (not directly  
299 - accessible through the API) which is placed in a  
300 - ``QPDFObjectHandle`` and returned. 303 + their values are used to obtain an indirect object handle from the
  304 + ``QPDF`` class. The ``QPDF`` class consults its cache, and if
  305 + necessary, inserts a new unresolved object, and returns an object
  306 + handle pointing to the cache entry, which is then pushed onto the
  307 + stack. When ``>>`` is finally read, the stack is converted into a
  308 + ``QPDF_Dictionary`` (not directly accessible through the API) which
  309 + is placed in a ``QPDFObjectHandle`` and returned.
301 310
302 - The resulting dictionary is saved as the trailer dictionary. 311 - The resulting dictionary is saved as the trailer dictionary.
303 312
@@ -309,23 +318,21 @@ file. @@ -309,23 +318,21 @@ file.
309 - If there is an encryption dictionary, the document's encryption 318 - If there is an encryption dictionary, the document's encryption
310 parameters are initialized. 319 parameters are initialized.
311 320
312 -- The client requests root object. The ``QPDF`` class gets the value of  
313 - root key from trailer dictionary and returns it. It is an unresolved  
314 - indirect ``QPDFObjectHandle``. 321 +- The client requests the root object by getting the value of the
  322 + ``/Root`` key from the trailer dictionary. The result is an
  323 + unresolved indirect ``QPDFObjectHandle``.
315 324
316 - The client requests the ``/Pages`` key from root 325 - The client requests the ``/Pages`` key from root
317 - ``QPDFObjectHandle``. The ``QPDFObjectHandle`` notices that it is  
318 - indirect so it asks ``QPDF`` to resolve it. ``QPDF`` looks in the  
319 - object cache for an object with the root dictionary's object ID and  
320 - generation number. Upon not seeing it, it checks the cross reference  
321 - table, gets the offset, and reads the object present at that offset.  
322 - It stores the result in the object cache. The cache entry's value is  
323 - replaced by the actual value, which causes any previously unresolved  
324 - ``QPDFObjectHandle`` objects that that pointed there to now have a  
325 - shared copy of the actual object. Modifications through any such  
326 - ``QPDFObjectHandle`` will be reflected in all of them. As the client  
327 - continues to request objects, the same process is followed for each  
328 - new requested object. 326 + ``QPDFObjectHandle``. The ``QPDFObjectHandle`` notices that it is an
  327 + unresolved indirect object, so it asks ``QPDF`` to resolve it.
  328 + ``QPDF`` checks the cross reference table, gets the offset, and
  329 + reads the object present at that offset. The object cache entry's
  330 + ``unresolved`` value is replaced by the actual value, which causes
  331 + any previously unresolved ``QPDFObjectHandle`` objects that pointed
  332 + there to now have a shared copy of the actual object. Modifications
  333 + through any such ``QPDFObjectHandle`` will be reflected in all of
  334 + them. As the client continues to request objects, the same process
  335 + is followed for each new requested object.
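The resolution steps described in the list above can be sketched with a small stand-alone model. This is an illustration only, not qpdf's code: ``ToyObject``, ``ToyDocument``, ``ObjGen``, and the ``loader`` callback are hypothetical names, and the loader merely stands in for "look up the offset in the cross reference table and parse the object there". The point it tries to show is that the cache entry is filled in place, so every holder of the shared pointer sees the resolved value.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a cached object; not a qpdf class.
struct ToyObject {
    bool resolved = false;  // false models the "unresolved" state
    std::string value;      // meaningful only once resolved
};

using ObjGen = std::pair<int, int>;  // (object number, generation)

class ToyDocument {
public:
    // The loader models reading an object from the input source.
    explicit ToyDocument(std::function<std::string(ObjGen)> loader)
        : loader_(std::move(loader)) {
        assert(loader_ != nullptr);
    }

    // Return the shared cache entry, inserting an unresolved
    // placeholder if this object has not been seen before.
    std::shared_ptr<ToyObject> getObject(ObjGen og) {
        auto& entry = cache_[og];
        if (!entry) {
            entry = std::make_shared<ToyObject>();
        }
        return entry;
    }

    // Resolve on first access by updating the cache entry in place.
    const std::string& resolve(ObjGen og) {
        auto obj = getObject(og);
        if (!obj->resolved) {
            obj->value = loader_(og);
            obj->resolved = true;
            ++loads_;
        }
        return obj->value;
    }

    // Number of times the loader was actually invoked.
    int loads() const { return loads_; }

private:
    std::function<std::string(ObjGen)> loader_;
    std::map<ObjGen, std::shared_ptr<ToyObject>> cache_;
    int loads_ = 0;
};
```

In this sketch, two calls to ``getObject`` with the same object number and generation return the same pointer, and a later ``resolve`` is visible through both, which mirrors the sharing behavior described above under the stated assumptions.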
329 336
330 .. _object_internals: 337 .. _object_internals:
331 338
@@ -339,11 +346,12 @@ Object Internals @@ -339,11 +346,12 @@ Object Internals
339 ~~~~~~~~~~~~~~~~ 346 ~~~~~~~~~~~~~~~~
340 347
341 The ``QPDF`` object has an object cache which contains a shared 348 The ``QPDF`` object has an object cache which contains a shared
342 -pointer to each object that was read from the file. Changes can be  
343 -made to any of those objects through ``QPDFObjectHandle`` methods. Any  
344 -such changes are visible to all ``QPDFObjectHandle`` instances that  
345 -point to the same object. When a ``QPDF`` object is written by  
346 -``QPDFWriter`` or serialized to JSON, any changes are reflected. 349 +pointer to each object that was read from the file or added as an
  350 +indirect object. Changes can be made to any of those objects through
  351 +``QPDFObjectHandle`` methods. Any such changes are visible to all
  352 +``QPDFObjectHandle`` instances that point to the same object. When a
  353 +``QPDF`` object is written by ``QPDFWriter`` or serialized to JSON,
  354 +any changes are reflected.
347 355
348 Objects in qpdf 11 and Newer 356 Objects in qpdf 11 and Newer
349 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 357 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -356,30 +364,32 @@ reference to that object has a copy of that shared pointer. Each @@ -356,30 +364,32 @@ reference to that object has a copy of that shared pointer. Each
356 is an implementation for each of the basic object types (array, 364 is an implementation for each of the basic object types (array,
357 dictionary, null, boolean, string, number, etc.) as well as a few 365 dictionary, null, boolean, string, number, etc.) as well as a few
358 special ones including ``uninitialized``, ``unresolved``, 366 special ones including ``uninitialized``, ``unresolved``,
359 -``reserved``, and ``destroyed``. When an object is first referenced, 367 +``reserved``, and ``destroyed``. When an object is first created,
360 its underlying ``QPDFValue`` has type ``unresolved``. When the object 368 its underlying ``QPDFValue`` has type ``unresolved``. When the object
361 -is first resolved, the ``QPDFObject`` in the cache has its internal 369 +is first accessed, the ``QPDFObject`` in the cache has its internal
362 ``QPDFValue`` replaced with the object as read from the file. Since it 370 ``QPDFValue`` replaced with the object as read from the file. Since it
363 is the ``QPDFObject`` object that is shared by all referencing 371 is the ``QPDFObject`` object that is shared by all referencing
364 ``QPDFObjectHandle`` objects as well as by the owning ``QPDF`` object, 372 ``QPDFObjectHandle`` objects as well as by the owning ``QPDF`` object,
365 this ensures that any future changes to the object, including 373 this ensures that any future changes to the object, including
366 -replacing the object with a completely different one, will be 374 +replacing the object with a completely different one by calling
  375 +``QPDF::replaceObject`` or ``QPDF::swapObjects``, will be
367 reflected across all ``QPDFObjectHandle`` objects that reference it. 376 reflected across all ``QPDFObjectHandle`` objects that reference it.
368 377
369 A ``QPDFValue`` that originated from a PDF input source maintains a 378 A ``QPDFValue`` that originated from a PDF input source maintains a
370 pointer to the ``QPDF`` object that read it (its *owner*). When that 379 pointer to the ``QPDF`` object that read it (its *owner*). When that
371 -``QPDF`` object is destroyed, it disconnects all reachable from it by  
372 -clearing their owner. For indirect objects (all objects in the object  
373 -cache), it also replaces the object's value with an object of type  
374 -``destroyed``. This means that, if there are still any referencing  
375 -``QPDFObjectHandle`` objects floating around, requesting their owning  
376 -``QPDF`` will return a null pointer rather than a pointer to a  
377 -``QPDF`` object that is either invalid or points to something else,  
378 -and any attempt to access an indirect object that is associated with a  
379 -destroyed ``QPDF`` object will throw an exception. This operation also  
380 -has the effect of breaking any circular references (which are common  
381 -and, in some cases, required by the PDF specification), thus  
382 -preventing memory leaks when ``QPDF`` objects are destroyed. 380 +``QPDF`` object is destroyed, it disconnects all objects reachable
  381 +from it by clearing their owner. For indirect objects (all objects in
  382 +the object cache), it also replaces the object's value with an object
  383 +of type ``destroyed``. This means that, if there are still any
  384 +referencing ``QPDFObjectHandle`` objects floating around, requesting
  385 +their owning ``QPDF`` will return a null pointer rather than a pointer
  386 +to a ``QPDF`` object that is either invalid or points to something
  387 +else, and any attempt to access an indirect object that is associated
  388 +with a destroyed ``QPDF`` object will throw an exception. This
  389 +operation also has the effect of breaking any circular references
  390 +(which are common and, in some cases, required by the PDF
  391 +specification), thus preventing memory leaks when ``QPDF`` objects are
  392 +destroyed.
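The ownership behavior described above can be approximated with a toy model. The sketch below is not qpdf's implementation; ``ToyDoc``, ``ToyObject``, and their members are hypothetical names chosen for illustration. It assumes a document that, on destruction, clears each cached object's owner, marks it as destroyed, and drops its outgoing references so that reference cycles no longer keep objects alive.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

class ToyDoc;  // hypothetical owner type, defined below

// Hypothetical stand-in for an indirect object; not a qpdf class.
struct ToyObject {
    enum class Kind { value, destroyed };

    Kind kind = Kind::value;
    ToyDoc* owner = nullptr;                        // cleared when the owner dies
    std::string data;
    std::vector<std::shared_ptr<ToyObject>> refs;   // may form cycles

    // Accessing an object whose owner is gone throws.
    const std::string& get() const {
        if (kind == Kind::destroyed) {
            throw std::logic_error("object belongs to a destroyed document");
        }
        return data;
    }
};

class ToyDoc {
public:
    // Create an object owned by this document and remember it in the cache.
    std::shared_ptr<ToyObject> make(std::string data) {
        auto obj = std::make_shared<ToyObject>();
        obj->owner = this;
        obj->data = std::move(data);
        cache_.push_back(obj);
        return obj;
    }

    // Disconnect every cached object: clear the owner, mark it destroyed,
    // and drop its outgoing references, which breaks any cycles.
    ~ToyDoc() {
        for (auto& obj : cache_) {
            obj->owner = nullptr;
            obj->kind = ToyObject::Kind::destroyed;
            obj->data.clear();
            obj->refs.clear();
        }
    }

private:
    std::vector<std::shared_ptr<ToyObject>> cache_;
};
```

With this model, a handle that outlives its ``ToyDoc`` reports a null owner and throws on access, and objects that referred to each other in a cycle are freed once the document is gone.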
383 393
384 Objects prior to qpdf 11 394 Objects prior to qpdf 11
385 ~~~~~~~~~~~~~~~~~~~~~~~~ 395 ~~~~~~~~~~~~~~~~~~~~~~~~
@@ -478,22 +488,6 @@ and 64-bit platforms, and the test suite is very thorough, so it is @@ -478,22 +488,6 @@ and 64-bit platforms, and the test suite is very thorough, so it is
478 hard to make any of the potential errors here without being caught in 488 hard to make any of the potential errors here without being caught in
479 build or test. 489 build or test.
480 490
481 -Non-const ``unsigned char*`` is used in the ``Pipeline`` interface. The  
482 -pipeline interface has a ``write`` call that uses ``unsigned char*``  
483 -without a ``const`` qualifier. The main reason for this is  
484 -to support pipelines that make calls to third-party libraries, such as  
485 -zlib, that don't include ``const`` in their interfaces. Unfortunately,  
486 -there are many places in the code where it is desirable to have  
487 -``const char*`` with pipelines. None of the pipeline implementations  
488 -in qpdf  
489 -currently modify the data passed to write, and doing so would be counter  
490 -to the intent of ``Pipeline``, but there is nothing in the code to  
491 -prevent this from being done. There are places in the code where  
492 -``const_cast`` is used to remove the const-ness of pointers going into  
493 -``Pipeline``\ s. This could theoretically be unsafe, but there is  
494 -adequate testing to assert that it is safe and will remain safe in  
495 -qpdf's code.  
496 -  
497 .. _encryption: 491 .. _encryption:
498 492
499 Encryption 493 Encryption
@@ -516,14 +510,14 @@ given an encryption key. This is used by ``QPDFWriter`` when it rewrites @@ -516,14 +510,14 @@ given an encryption key. This is used by ``QPDFWriter`` when it rewrites
516 encrypted files. 510 encrypted files.
517 511
518 When copying encrypted files, unless otherwise directed, qpdf will 512 When copying encrypted files, unless otherwise directed, qpdf will
519 -preserve any encryption in force in the original file. qpdf can do this  
520 -with either the user or the owner password. There is no difference in  
521 -capability based on which password is used. When 40 or 128 bit  
522 -encryption keys are used, the user password can be recovered with the  
523 -owner password. With 256 keys, the user and owner passwords are used  
524 -independently to encrypt the actual encryption key, so while either can  
525 -be used, the owner password can no longer be used to recover the user  
526 -password. 513 +preserve any encryption in effect in the original file. qpdf can do
  514 +this with either the user or the owner password. There is no
  515 +difference in capability based on which password is used. When 40 or
  516 +128 bit encryption keys are used, the user password can be recovered
  517 +with the owner password. With 256-bit keys, the user and owner passwords
  518 +are used independently to encrypt the actual encryption key, so while
  519 +either can be used, the owner password can no longer be used to
  520 +recover the user password.
527 521
528 Starting with version 4.0.0, qpdf can read files that are not encrypted 522 Starting with version 4.0.0, qpdf can read files that are not encrypted
529 but that contain encrypted attachments, but it cannot write such files. 523 but that contain encrypted attachments, but it cannot write such files.
@@ -538,33 +532,37 @@ format. The only exception to this is that clear-text metadata will be @@ -538,33 +532,37 @@ format. The only exception to this is that clear-text metadata will be
538 preserved as clear-text if it is that way in the original file. 532 preserved as clear-text if it is that way in the original file.
539 533
540 One point of confusion some people have about encrypted PDF files is 534 One point of confusion some people have about encrypted PDF files is
541 -that encryption is not the same as password protection. Password  
542 -protected files are always encrypted, but it is also possible to create  
543 -encrypted files that do not have passwords. Internally, such files use  
544 -the empty string as a password, and most readers try the empty string  
545 -first to see if it works and prompt for a password only if the empty  
546 -string doesn't work. Normally such files have an empty user password and  
547 -a non-empty owner password. In that way, if the file is opened by an  
548 -ordinary reader without specification of password, the restrictions  
549 -specified in the encryption dictionary can be enforced. Most users  
550 -wouldn't even realize such a file was encrypted. Since qpdf always  
551 -ignores the restrictions (except for the purpose of reporting what they  
552 -are), qpdf doesn't care which password you use. QPDF will allow you to  
553 -create PDF files with non-empty user passwords and empty owner  
554 -passwords. Some readers will require a password when you open these  
555 -files, and others will open the files without a password and not enforce  
556 -restrictions. Having a non-empty user password and an empty owner  
557 -password doesn't really make sense because it would mean that opening  
558 -the file with the user password would be more restrictive than not  
559 -supplying a password at all. QPDF also allows you to create PDF files  
560 -with the same password as both the user and owner password. Some readers  
561 -will not ever allow such files to be accessed without restrictions  
562 -because they never try the password as the owner password if it works as  
563 -the user password. Nonetheless, one of the powerful aspects of qpdf is  
564 -that it allows you to finely specify the way encrypted files are  
565 -created, even if the results are not useful to some readers. One use  
566 -case for this would be for testing a PDF reader to ensure that it  
567 -handles odd configurations of input files. 535 +that encryption is not the same as password protection.
  536 +Password-protected files are always encrypted, but it is also possible
  537 +to create encrypted files that do not have passwords. Internally, such
  538 +files use the empty string as a password, and most readers try the
  539 +empty string first to see if it works and prompt for a password only
  540 +if the empty string doesn't work. Normally such files have an empty
  541 +user password and a non-empty owner password. In that way, if the file
  542 +is opened by an ordinary reader without specification of password, the
  543 +restrictions specified in the encryption dictionary can be enforced.
  544 +Most users wouldn't even realize such a file was encrypted. Since qpdf
  545 +always ignores the restrictions (except for the purpose of reporting
  546 +what they are), qpdf doesn't care which password you use. QPDF will
  547 +allow you to create PDF files with non-empty user passwords and empty
  548 +owner passwords. Some readers will require a password when you open
  549 +these files, and others will open the files without a password and not
  550 +enforce restrictions. Having a non-empty user password and an empty
  551 +owner password doesn't really make sense because it would mean that
  552 +opening the file with the user password would be more restrictive than
  553 +not supplying a password at all. QPDF also allows you to create PDF
  554 +files with the same password as both the user and owner password. Some
  555 +readers will not ever allow such files to be accessed without
  556 +restrictions because they never try the password as the owner password
  557 +if it works as the user password. Nonetheless, one of the powerful
  558 +aspects of qpdf is that it allows you to finely specify the way
  559 +encrypted files are created, even if the results are not useful to
  560 +some readers. One use case for this would be for testing a PDF reader
  561 +to ensure that it handles odd configurations of input files. If you
  562 +attempt to create an encrypted file that is not secure, qpdf will warn
  563 +you and require you to explicitly state your intention to create an
  564 +insecure file. So while qpdf can create insecure files, it won't let
  565 +you do it by mistake.
568 566
569 .. _random-numbers: 567 .. _random-numbers:
570 568
@@ -630,23 +628,21 @@ Copying Objects From Other PDF Files @@ -630,23 +628,21 @@ Copying Objects From Other PDF Files
630 628
631 Version 3.0 of qpdf introduced the ability to copy objects into a 629 Version 3.0 of qpdf introduced the ability to copy objects into a
632 ``QPDF`` object from a different ``QPDF`` object, which we refer to as 630 ``QPDF`` object from a different ``QPDF`` object, which we refer to as
633 -*foreign objects*. This allows arbitrary  
634 -merging of PDF files. The "from" ``QPDF`` object must remain valid after  
635 -the copy as discussed in the note below. The  
636 -:command:`qpdf` command-line tool provides limited  
637 -support for basic page selection, including merging in pages from other  
638 -files, but the library's API makes it possible to implement arbitrarily  
639 -complex merging operations. The main method for copying foreign objects  
640 -is ``QPDF::copyForeignObject``. This takes an indirect object from 631 +*foreign objects*. This allows arbitrary merging of PDF files. The
  632 +:command:`qpdf` command-line tool provides limited support for basic
  633 +page selection, including merging in pages from other files, but the
  634 +library's API makes it possible to implement arbitrarily complex
  635 +merging operations. The main method for copying foreign objects is
  636 +``QPDF::copyForeignObject``. This takes an indirect object from
641 another ``QPDF`` and copies it recursively into this object while 637 another ``QPDF`` and copies it recursively into this object while
642 preserving all object structure, including circular references. This 638 preserving all object structure, including circular references. This
643 means you can add a direct object that you create from scratch to a 639 means you can add a direct object that you create from scratch to a
644 ``QPDF`` object with ``QPDF::makeIndirectObject``, and you can add an 640 ``QPDF`` object with ``QPDF::makeIndirectObject``, and you can add an
645 -indirect object from another file with ``QPDF::copyForeignObject``. The  
646 -fact that ``QPDF::makeIndirectObject`` does not automatically detect a  
647 -foreign object and copy it is an explicit design decision. Copying a  
648 -foreign object seems like a sufficiently significant thing to do that it  
649 -should be done explicitly. 641 +indirect object from another file with ``QPDF::copyForeignObject``.
  642 +The fact that ``QPDF::makeIndirectObject`` does not automatically
  643 +detect a foreign object and copy it is an explicit design decision.
  644 +Copying a foreign object seems like a sufficiently significant thing
  645 +to do that it should be done explicitly.
650 646
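A minimal sketch of the idea behind recursive copying that preserves circular references, as described in the paragraph above. This is a toy model, not the qpdf API: `ToyObject`, `ToyFile`, and `copyForeign` are hypothetical names introduced only for illustration. It assumes a copy routine can stay cycle-safe by recording each foreign object in a map before descending into its references.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

// Hypothetical toy model (not qpdf): an object that refers to other objects.
struct ToyObject {
    int value = 0;
    std::vector<ToyObject*> refs; // non-owning references; may form cycles
};

// Hypothetical owner of objects, loosely analogous to one open file.
struct ToyFile {
    std::vector<std::unique_ptr<ToyObject>> objects;

    ToyObject* make(int value) {
        objects.push_back(std::make_unique<ToyObject>());
        objects.back()->value = value;
        return objects.back().get();
    }
};

// Recursively copy `foreign` into `dest`. The map remembers which foreign
// objects already have a local copy, so a circular reference resolves to the
// existing copy instead of recursing forever.
ToyObject* copyForeign(ToyFile& dest, ToyObject const* foreign,
                       std::map<ToyObject const*, ToyObject*>& copied)
{
    assert(foreign != nullptr);
    auto it = copied.find(foreign);
    if (it != copied.end()) {
        return it->second; // already copied: reuse it
    }
    ToyObject* local = dest.make(foreign->value);
    copied[foreign] = local; // register before recursing to break cycles
    for (ToyObject const* ref : foreign->refs) {
        local->refs.push_back(copyForeign(dest, ref, copied));
    }
    return local;
}
```

In this sketch the copy is requested through an explicit call, which mirrors the design decision stated above that copying a foreign object should be done explicitly rather than detected automatically.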
651 The other way to copy foreign objects is by passing a page from one 647 The other way to copy foreign objects is by passing a page from one
652 ``QPDF`` to another by calling ``QPDF::addPage``. In contrast to 648 ``QPDF`` to another by calling ``QPDF::addPage``. In contrast to
@@ -654,26 +650,30 @@ The other way to copy foreign objects is by passing a page from one @@ -654,26 +650,30 @@ The other way to copy foreign objects is by passing a page from one
654 between indirect objects in the current file, foreign objects, and 650 between indirect objects in the current file, foreign objects, and
655 direct objects. 651 direct objects.
656 652
657 -Please note: when you copy objects from one ``QPDF`` to another, the  
658 -source ``QPDF`` object must remain valid until you have finished with  
659 -the destination object. This is because the original object is still  
660 -used to retrieve any referenced stream data from the copied object. 653 +When you copy objects from one ``QPDF`` to another, the input source
  654 +of the original file must remain valid until you have finished with the
  655 +destination object. This is because the input source is still used
  656 +to retrieve any referenced stream data from the copied object. If
  657 +needed, there are methods to force the data to be copied. See comments
  658 +near the declaration of ``copyForeignObject`` in
  659 +:file:`include/qpdf/QPDF.hh` for details.
661 660
662 .. _rewriting: 661 .. _rewriting:
663 662
664 Writing PDF Files 663 Writing PDF Files
665 ----------------- 664 -----------------
666 665
667 -The qpdf library supports file writing of ``QPDF`` objects to PDF files  
668 -through the ``QPDFWriter`` class. The ``QPDFWriter`` class has two  
669 -writing modes: one for non-linearized files, and one for linearized  
670 -files. See :ref:`linearization` for a description of 666 +The qpdf library supports file writing of ``QPDF`` objects to PDF
  667 +files through the ``QPDFWriter`` class. The ``QPDFWriter`` class has
  668 +two writing modes: one for non-linearized files, and one for
  669 +linearized files. See :ref:`linearization` for a description of how
671 linearization is implemented. This section describes how we write 670 linearization is implemented. This section describes how we write
672 -non-linearized files including the creation of QDF files (see :ref:`qdf`. 671 +non-linearized files including the creation of QDF files (see
  672 +:ref:`qdf`).
673 673
674 This outline was written prior to implementation and is not exactly 674 This outline was written prior to implementation and is not exactly
675 -accurate, but it provides a correct "notional" idea of how writing  
676 -works. Look at the code in ``QPDFWriter`` for exact details. 675 +accurate, but it portrays the essence of how writing works. Look at
  676 +the code in ``QPDFWriter`` for exact details.
677 677
678 - Initialize state: 678 - Initialize state:
679 679
@@ -685,7 +685,7 @@ works. Look at the code in ``QPDFWriter`` for exact details. @@ -685,7 +685,7 @@ works. Look at the code in ``QPDFWriter`` for exact details.
685 685
686 - xref table: new id -> offset = empty 686 - xref table: new id -> offset = empty
687 687
688 -- Create a QPDF object from a file. 688 +- Create a ``QPDF`` object from a file.
689 689
690 - Write header for new PDF file. 690 - Write header for new PDF file.
691 691
@@ -750,7 +750,7 @@ Filtered Streams @@ -750,7 +750,7 @@ Filtered Streams
750 ---------------- 750 ----------------
751 751
752 Support for streams is implemented through the ``Pipeline`` interface 752 Support for streams is implemented through the ``Pipeline`` interface
753 -which was designed for this package. 753 +which was designed for this library.
754 754
755 When reading streams, create a series of ``Pipeline`` objects. The 755 When reading streams, create a series of ``Pipeline`` objects. The
756 ``Pipeline`` abstract base requires implementation of ``write()`` and 756 ``Pipeline`` abstract base requires implementation of ``write()`` and
@@ -802,32 +802,20 @@ file might be, the presence of type warnings can save lots of developer @@ -802,32 +802,20 @@ file might be, the presence of type warnings can save lots of developer
802 time. They have also proven useful in exposing issues in qpdf itself 802 time. They have also proven useful in exposing issues in qpdf itself
803 that would have otherwise gone undetected. 803 that would have otherwise gone undetected.
804 804
805 -*Can there be a type-safe ``QPDFObjectHandle``?* It would be great if  
806 -``QPDFObjectHandle`` could be more strongly typed so that you'd have to  
807 -have check that something was of a particular type before calling  
808 -type-specific accessor methods. However, implementing this at this stage  
809 -of the library's history would be quite difficult, and it would make a  
810 -the common pattern of drilling into an object no longer work. While it  
811 -would be possible to have a parallel interface, it would create a lot of  
812 -extra code. If qpdf were written in a language like rust, an interface  
813 -like this would make a lot of sense, but, for a variety of reasons, the  
814 -qpdf API is consistent with other APIs of its time, relying on exception  
815 -handling to catch errors. The underlying PDF objects are inherently not  
816 -type-safe. Forcing stronger type safety in ``QPDFObjectHandle`` would  
817 -ultimately cause a lot more code to have to be written and would like  
818 -make software that uses qpdf more brittle, and even so, checks would  
819 -have to occur at runtime.  
820 -  
821 -*Why do type errors sometimes raise exceptions?* The way warnings work  
822 -in qpdf requires a ``QPDF`` object to be associated with an object  
823 -handle for a warning to be issued. It would be nice if this could be  
824 -fixed, but it would require major changes to the API. Rather than  
825 -throwing away these conditions, we convert them to exceptions. It's not  
826 -that bad though. Since any object handle that was read from a file has  
827 -an associated ``QPDF`` object, it would only be type errors on objects  
828 -that were created explicitly that would cause exceptions, and in that  
829 -case, type errors are much more likely to be the result of a coding  
830 -error than invalid input. 805 +*Can there be a type-safe* ``QPDFObjectHandle``? At the time of the
  806 +release of qpdf 11, there is active work being done toward the goal of
  807 +creating a way to work with PDF objects that is more type-safe and
  808 +closer in feel to the current C++ standard library. It is hoped that
  809 +this work will make it easier to write bindings to qpdf in modern
  810 +languages like `Rust <https://www.rust-lang.org/>`__. If this happens,
  811 +it will likely be by providing an alternative to ``QPDFObjectHandle``
  812 +that provides a separate path to the underlying object. Details are
  813 +still being worked out. Fundamentally, PDF objects are not strongly
  814 +typed. They are similar to ``JSON`` objects or to objects in dynamic
  815 +languages like `Python <https://python.org/>`__: there are certain
  816 +things you can only do to objects of a given type, but you can replace
  817 +an object of one type with an object of another. Because of this,
  818 +there will always be some checks that will happen at runtime.
831 819
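A minimal sketch of why some checks remain at runtime when values are dynamically typed, as the paragraph above argues. This is not ``QPDFObjectHandle`` or any qpdf interface: `ToyValue` and `getInt` are hypothetical names, and the use of `std::variant` (C++17) is an assumption chosen only to illustrate the point.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <variant>

// Hypothetical dynamically typed value (not qpdf): it may hold nothing,
// a boolean, an integer, or a string, and may be reassigned to any of them.
using ToyValue = std::variant<std::monostate, bool, long long, std::string>;

// Type-specific accessor. The compiler cannot know which alternative is
// held, so the check has to happen when the call is made.
long long getInt(ToyValue const& v)
{
    if (auto const* p = std::get_if<long long>(&v)) {
        return *p;
    }
    throw std::runtime_error("toy value is not an integer");
}
```

Because a `ToyValue` holding an integer can later be replaced with one holding a string, a call to `getInt` that succeeded earlier can fail later, which is the kind of situation that only a runtime check can catch.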
832 *Why does the behavior of a type exception differ between the C and C++ 820 *Why does the behavior of a type exception differ between the C and C++
833 API?* There is no way to throw and catch exceptions in C short of 821 API?* There is no way to throw and catch exceptions in C short of