Commit 0675a3f61a465f282eba8e1f54bdda3920257959

Authored by Jay Berkenbilt
1 parent cc889507

Decide not to allow stream data providers to modify dictionary

@@ -29,11 +29,6 @@ Candidates for upcoming release
  * big page even with --remove-unreferenced-resources=yes, even with --empty
  * optimize image failure because of colorspace
 
- * Make it possible for StreamDataProvider to modify the stream
-   dictionary in addition to the stream data so it can calculate things
-   about the dictionary at runtime. Will require a small change to
-   QPDFWriter.
-
  * Take flattenRotation code from pdf-split and do something with it,
    maybe adding it to the library. Once there, call it from pdf-split
    and bump up the required version of qpdf.
@@ -558,3 +553,49 @@ I find it useful to make reference to them in this list
    filtering and tokenizer rewrite and should be done in a manner that
    takes advantage of the other lexical features. This sanitizer
    should also clear metadata and replace images.
+
+ * Here are some notes about having stream data providers modify
+   stream dictionaries. I had wanted to add this functionality to make
+   it more efficient to create stream data providers that may
+   dynamically decide what kind of filters to use and that may end up
+   modifying the dictionary conditionally depending on the original
+   stream data. Ultimately I decided not to implement this feature.
+   This paragraph describes why.
+
+   * When writing, the way objects are placed into the queue for
+     writing strongly precludes creation of any new indirect objects,
+     or even changing which indirect objects are referenced from which
+     other objects, because we sometimes write as we are traversing
+     and enqueuing objects. For non-linearized files, there is a risk
+     that an indirect object that used to be referenced would no
+     longer be referenced, and whether it was already written to the
+     output file would be based on an accident of where it was
+     encountered when traversing the object structure. For linearized
+     files, the situation is considerably worse. We decide which
+     section of the file to write an object to based on a mapping of
+     which objects are used by which other objects. Changing this
+     mapping could cause an object to appear in the wrong section, to
+     be written even though it is unreferenced, or to be entirely
+     omitted since, during linearization, we don't enqueue new objects
+     as we traverse for writing.
+
+   * There are several places in QPDFWriter that query a stream's
+     dictionary in order to prepare for writing or to make decisions
+     about certain aspects of the writing process. If the stream data
+     provider has the chance to modify the dictionary, every piece of
+     code that gets stream data would have to be aware of this. This
+     would potentially include end user code. For example, any code
+     that called getDict() on a stream before installing a stream data
+     provider and expected that dictionary to be valid would
+     potentially be broken. As implemented right now, you must perform
+     any modifications on the dictionary in advance and provide
+     /Filter and /DecodeParms at the time you install the stream
+     data provider. This means that some computations would have to be
+     done more than once, but for linearized files, stream data
+     providers are already called more than once. If the work done by
+     a stream data provider is especially expensive, it can implement
+     its own cache.
+
+   The implementation of pluggable stream filters includes an example
+   that illustrates how a program might handle making decisions about
+   filters and decode parameters based on the input data.
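
The "decide in advance, then cache" approach described in the notes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not qpdf code: StreamPlan, toyRunLengthEncode, planStream, PlanCache, and the "/ToyRLE" label are all hypothetical names invented here, and the toy encoding is not the PDF /RunLengthDecode format. The point is only that the filter decision and the exact output bytes are computed once, before any provider would be installed, and that repeated requests for the same (object ID, generation) return identical data.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for the decisions made before a provider is
// installed. These names are illustrative and not part of qpdf.
struct StreamPlan
{
    std::string filter; // empty means "store unfiltered"
    std::string data;   // exact bytes to emit on every call
};

// Toy run-length encoding: (count, byte) pairs with count in 1..255.
// This is NOT the PDF /RunLengthDecode format; it only stands in for
// "some filter whose benefit depends on the input data".
std::string toyRunLengthEncode(std::string const& in)
{
    std::string out;
    std::size_t i = 0;
    while (i < in.size()) {
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255) {
            ++run;
        }
        out.push_back(static_cast<char>(run));
        out.push_back(in[i]);
        i += run;
    }
    return out;
}

// Decide everything up front: use the filter only if it shrinks the data.
StreamPlan planStream(std::string const& raw)
{
    std::string encoded = toyRunLengthEncode(raw);
    if (encoded.size() < raw.size()) {
        return StreamPlan{"/ToyRLE", encoded};
    }
    return StreamPlan{"", raw};
}

// Cache keyed by (object ID, generation) so that repeated requests
// reuse the first result instead of recomputing it.
class PlanCache
{
  public:
    StreamPlan const& get(int objid, int generation, std::string const& raw)
    {
        auto key = std::make_pair(objid, generation);
        auto it = plans.find(key);
        if (it == plans.end()) {
            ++computations;
            it = plans.emplace(key, planStream(raw)).first;
        }
        return it->second;
    }

    int computations = 0; // how many times planStream actually ran

  private:
    std::map<std::pair<int, int>, StreamPlan> plans;
};
```

In a real program, the plan's filter would be what gets recorded in the stream dictionary when the provider is installed, and the cached bytes would be what the provider writes each time it is called.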
include/qpdf/QPDFObjectHandle.hh
@@ -70,13 +70,28 @@ class QPDFObjectHandle
     // QPDFWriter may, in some cases, add compression, but if it
     // does, it will update the filters as needed. Every call to
     // provideStreamData for a given stream must write the same
-    // data. The object ID and generation passed to this method
-    // are those that belong to the stream on behalf of which the
-    // provider is called. They may be ignored or used by the
-    // implementation for indexing or other purposes. This
-    // information is made available just to make it more
-    // convenient to use a single StreamDataProvider object to
-    // provide data for multiple streams.
+    // data. Note that, when writing linearized files, qpdf will
+    // call your provideStreamData twice, and if it generates
+    // different output, you risk generating invalid output or
+    // having qpdf throw an exception. The object ID and
+    // generation passed to this method are those that belong to
+    // the stream on behalf of which the provider is called. They
+    // may be ignored or used by the implementation for indexing
+    // or other purposes. This information is made available just
+    // to make it more convenient to use a single
+    // StreamDataProvider object to provide data for multiple
+    // streams.
+
+    // A few things to keep in mind:
+    //
+    // * Stream data providers must not modify any objects since
+    //   they may be called after some parts of the file have
+    //   already been written.
+    //
+    // * Since qpdf may call provideStreamData multiple times when
+    //   writing linearized files, if the work done by your stream
+    //   data provider is slow or computationally intensive, you
+    //   might want to implement your own cache.
 
     // Prior to qpdf 10.0.0, it was not possible to handle errors
     // the way pipeStreamData does or to pass back success.
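
The requirement that every call for a given stream writes the same data can be checked in isolation before a provider is ever handed to a writer. The sketch below is an assumption-laden illustration, not the qpdf API: ToyProvider, StableProvider, DriftingProvider, and providesSameDataTwice are hypothetical names, and the interface is deliberately simplified to append bytes to a std::string. It imitates the two calls made during a linearized write and compares the results.

```cpp
#include <cassert>
#include <string>

// Hypothetical interface loosely modeled on the idea of a stream data
// provider. It is not the qpdf class and does not share its signature.
class ToyProvider
{
  public:
    virtual ~ToyProvider() = default;
    // Append the stream's bytes for (objid, generation) to out.
    virtual void provide(int objid, int generation, std::string& out) = 0;
};

// Output depends only on the arguments, so repeated calls agree.
class StableProvider : public ToyProvider
{
  public:
    void provide(int objid, int generation, std::string& out) override
    {
        out += "stream " + std::to_string(objid) + " " +
            std::to_string(generation);
    }
};

// Output depends on hidden mutable state, so repeated calls differ.
// This is the kind of provider the comment above warns about.
class DriftingProvider : public ToyProvider
{
  public:
    void provide(int, int, std::string& out) override
    {
        out += "call " + std::to_string(++calls);
    }

  private:
    int calls = 0;
};

// Call the provider twice for the same stream, as a two-pass write
// would, and report whether both calls produced identical bytes.
bool providesSameDataTwice(ToyProvider& p, int objid, int generation)
{
    std::string first;
    std::string second;
    p.provide(objid, generation, first);
    p.provide(objid, generation, second);
    return first == second;
}
```

A check like this fits naturally in a unit test for a provider: StableProvider passes and DriftingProvider fails, which is the situation that could lead to invalid output or an exception when writing linearized files.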