Commit 0675a3f61a465f282eba8e1f54bdda3920257959 (1 parent: cc889507)

Decide not to allow stream data providers to modify dictionary

2 changed files with 68 additions and 12 deletions
TODO

```diff
@@ -29,11 +29,6 @@ Candidates for upcoming release
 * big page even with --remove-unreferenced-resources=yes, even with --empty
 * optimize image failure because of colorspace
 
-* Make it possible for StreamDataProvider to modify the stream
-  dictionary in addition to the stream data so it can calculate things
-  about the dictionary at runtime. Will require a small change to
-  QPDFWriter.
-
 * Take flattenRotation code from pdf-split and do something with it,
   maybe adding it to the library. Once there, call it from pdf-split
   and bump up the required version of qpdf.
```
```diff
@@ -558,3 +553,49 @@ I find it useful to make reference to them in this list
    filtering and tokenizer rewrite and should be done in a manner that
    takes advantage of the other lexical features. This sanitizer
    should also clear metadata and replace images.
+
+ * Here are some notes about having stream data providers modify
+   stream dictionaries. I had wanted to add this functionality to make
+   it more efficient to create stream data providers that may
+   dynamically decide what kind of filters to use and that may end up
+   modifying the dictionary conditionally depending on the original
+   stream data. Ultimately I decided not to implement this feature.
+   This paragraph describes why.
+
+   * When writing, the way objects are placed into the queue for
+     writing strongly precludes creation of any new indirect objects,
+     or even changing which indirect objects are referenced from which
+     other objects, because we sometimes write as we are traversing
+     and enqueuing objects. For non-linearized files, there is a risk
+     that an indirect object that used to be referenced would no
+     longer be referenced, and whether it was already written to the
+     output file would be based on an accident of where it was
+     encountered when traversing the object structure. For linearized
+     files, the situation is considerably worse. We decide which
+     section of the file to write an object to based on a mapping of
+     which objects are used by which other objects. Changing this
+     mapping could cause an object to appear in the wrong section, to
+     be written even though it is unreferenced, or to be entirely
+     omitted since, during linearization, we don't enqueue new objects
+     as we traverse for writing.
+
+   * There are several places in QPDFWriter that query a stream's
+     dictionary in order to prepare for writing or to make decisions
+     about certain aspects of the writing process. If the stream data
+     provider has the chance to modify the dictionary, every piece of
+     code that gets stream data would have to be aware of this. This
+     would potentially include end user code. For example, any code
+     that called getDict() on a stream before installing a stream data
+     provider and expected that dictionary to be valid would
+     potentially be broken. As implemented right now, you must perform
+     any modifications on the dictionary in advance and provide
+     /Filter and /DecodeParms at the time you installed the stream
+     data provider. This means that some computations would have to be
+     done more than once, but for linearized files, stream data
+     providers are already called more than once. If the work done by
+     a stream data provider is especially expensive, it can implement
+     its own cache.
+
+   The implementation of pluggable stream filters includes an example
+   that illustrates how a program might handle making decisions about
+   filters and decode parameters based on the input data.
```
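The note above says that any dictionary changes, including /Filter and /DecodeParms, must be made before the stream data provider is installed. The following self-contained C++ sketch models that workflow without linking against qpdf. `Dict`, `PreparedProvider`, `prepareStream`, and `rleEncode` are hypothetical names, and PDF run-length encoding (the RunLengthDecode scheme) is only a stand-in for whatever data-dependent filter choice a real program would make; this is not the pluggable-filter example the note refers to.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Encode bytes using the PDF RunLengthDecode scheme: a length byte
// 0..127 is followed by 1..128 literal bytes; a length byte 129..255
// is followed by one byte repeated (257 - length) times; 128 is EOD.
std::string rleEncode(const std::string& in)
{
    std::string out;
    std::size_t i = 0;
    const std::size_t n = in.size();
    while (i < n) {
        // Measure the run of identical bytes starting at i (max 128).
        std::size_t run = 1;
        while (i + run < n && run < 128 && in[i + run] == in[i]) {
            ++run;
        }
        if (run >= 2) {
            out.push_back(static_cast<char>(257 - run));
            out.push_back(in[i]);
            i += run;
        } else {
            // Collect literal bytes until a repeated pair begins or
            // the 128-byte limit is reached.
            const std::size_t start = i;
            std::size_t len = 0;
            while (i < n && len < 128 && !(i + 1 < n && in[i + 1] == in[i])) {
                ++i;
                ++len;
            }
            out.push_back(static_cast<char>(len - 1));
            out.append(in, start, len);
        }
    }
    out.push_back(static_cast<char>(128)); // end-of-data marker
    return out;
}

// Hypothetical stand-in for a stream dictionary: key -> PDF value text.
using Dict = std::map<std::string, std::string>;

// Hypothetical provider: it only emits bytes prepared before it was
// installed and never touches the dictionary itself.
struct PreparedProvider {
    std::string bytes;
    std::string provide() const { return bytes; }
};

// Inspect the original data and settle the dictionary entries *before*
// the provider exists, so later readers of the dictionary see final values.
PreparedProvider prepareStream(const std::string& raw, Dict& dict)
{
    dict.erase("/DecodeParms"); // RunLengthDecode takes no parameters
    std::string encoded = rleEncode(raw);
    if (encoded.size() < raw.size()) {
        dict["/Filter"] = "/RunLengthDecode";
        return PreparedProvider{encoded};
    }
    dict.erase("/Filter"); // not worth compressing; store raw bytes
    return PreparedProvider{raw};
}
```

In qpdf itself, the corresponding step would be passing the chosen filter and decode parameters when the provider is installed (for example through `replaceStreamData`), so QPDFWriter only ever sees a dictionary that no longer changes. Because the encoded bytes are computed once here, repeated calls to `provide()` also return identical output.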
include/qpdf/QPDFObjectHandle.hh

```diff
@@ -70,13 +70,28 @@ class QPDFObjectHandle
         // QPDFWriter may, in some cases, add compression, but if it
         // does, it will update the filters as needed. Every call to
         // provideStreamData for a given stream must write the same
-        // data. The object ID and generation passed to this method
-        // are those that belong to the stream on behalf of which the
-        // provider is called. They may be ignored or used by the
-        // implementation for indexing or other purposes. This
-        // information is made available just to make it more
-        // convenient to use a single StreamDataProvider object to
-        // provide data for multiple streams.
+        // data. Note that, when writing linearized files, qpdf will
+        // call your provideStreamData twice, and if it generates
+        // different output, you risk generating invalid output or
+        // having qpdf throw an exception. The object ID and
+        // generation passed to this method are those that belong to
+        // the stream on behalf of which the provider is called. They
+        // may be ignored or used by the implementation for indexing
+        // or other purposes. This information is made available just
+        // to make it more convenient to use a single
+        // StreamDataProvider object to provide data for multiple
+        // streams.
+
+        // A few things to keep in mind:
+        //
+        // * Stream data providers must not modify any objects since
+        //   they may be called after some parts of the file have
+        //   already been written.
+        //
+        // * Since qpdf may call provideStreamData multiple times when
+        //   writing linearized files, if the work done by your stream
+        //   data provider is slow or computationally intensive, you
+        //   might want to implement your own cache.
 
         // Prior to qpdf 10.0.0, it was not possible to handle errors
         // the way pipeStreamData does or to pass back success.
```
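The header comment warns that provideStreamData may be called twice when writing linearized files and suggests a cache when the provider's work is expensive. The self-contained C++ sketch below illustrates that idea without depending on qpdf. `Sink` is a hypothetical, simplified stand-in for the Pipeline that qpdf passes to provideStreamData, and `CachingProvider` and its generator callback are likewise hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for qpdf's Pipeline: it collects
// written bytes and records whether finish() was called.
class Sink {
  public:
    void write(const std::string& data) { buf_ += data; }
    void finish() { finished_ = true; }
    const std::string& data() const { return buf_; }
    bool finished() const { return finished_; }

  private:
    std::string buf_;
    bool finished_ = false;
};

// Hypothetical provider that caches generated data per (objid, generation).
// Repeated calls for the same stream do the expensive work once and always
// emit identical bytes. It mutates only its own cache, never PDF objects.
class CachingProvider {
  public:
    using Generator = std::function<std::string(int, int)>;

    explicit CachingProvider(Generator gen) : gen_(std::move(gen)) {}

    void provideStreamData(int objid, int generation, Sink& sink)
    {
        auto key = std::make_pair(objid, generation);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++computations_; // the expensive path runs only on a cache miss
            it = cache_.emplace(key, gen_(objid, generation)).first;
        }
        sink.write(it->second);
        sink.finish();
    }

    int computations() const { return computations_; }

  private:
    Generator gen_;
    std::map<std::pair<int, int>, std::string> cache_;
    int computations_ = 0;
};
```

Keying the cache by object ID and generation lets a single provider serve several streams, which is the use the header comment gives for those parameters. Caching whole outputs trades memory for time; for very large streams, a program might prefer to cache only the expensive intermediate result and regenerate the bytes from it.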