Commit fddbcab0e7cc5978802251696055efa64667f637

Authored by Jay Berkenbilt
1 parent fbbb0ee0

Mostly don't require original QPDF for copyForeignObject (fixes #219)

The original QPDF is only required now when the source
QPDFObjectHandle is a stream that gets its stream data from a
QPDFObjectHandle::StreamDataProvider.
ChangeLog
  1 +2019-01-06 Jay Berkenbilt <ejb@ql.org>
  2 +
  3 + * Remove the restriction in most cases that the source QPDF used
  4 + in a copyForeignObject call has to stick around until the
  5 + destination QPDF is written. The exceptional case is when the
  6 + source stream gets is data using a
  7 + QPDFObjectHandle::StreamDataProvider. For a more in-depth
  8 + discussion, see comments around copyForeignObject in QPDF.hh.
  9 + Fixes #219.
  10 +
1 11 2019-01-05 Jay Berkenbilt <ejb@ql.org>
2 12  
3 13 * When generating appearances, if the font uses one of the
... ...
... ... @@ -3,19 +3,6 @@ Soon
3 3  
4 4 * Set up OSS-Fuzz (Google). See starred email in qpdf label.
5 5  
6   - * Issue #219, not requiring QPDF reference of copied object to be
7   - kept around. Comments are in the issue. Most likely we should take
8   - all the encryption parameters and push them into a separate class
9   - within QPDF. We can convert the encryption computation methods to
10   - methods of that class and then hang onto that along with the input
11   - stream for foreign streams. That should handle the encrypted file
12   - case. For the stream data provider case, we don't have to do
13   - anything special. In that case, it's definitely up to the user
14   - to make sure the stream data provider and everything it uses stay
15   - in scope. The user can use a PointerHolder<QPDF> if they want as
16   - long as the original QPDF object is allocated this way. That would
17   - be a good case for the test suite.
18   -
19 6 * Figure out how to render Gajić correctly in the PDF version of the
20 7 qpdf manual.
21 8  
... ... @@ -333,3 +320,42 @@ I find it useful to make reference to them in this list
333 320 of all that code to make the checks more robust. In particular,
334 321 it's hard to look at the code and quickly determine what is a true
335 322 logic error and what could happen because of malformed user input.
  323 +
  324 + * There are a few known limitations of copying foreign streams. These
  325 + are somewhat discussed in github issue 219. They are most likely
  326 + not worth fixing.
  327 +
  328 + * The original pipeStreamData in QPDF_Stream has various logic for
  329 + reporting warnings and letting the caller retry. This logic is
  330 + not implemented for stream data providers. When copying foreign
  331 + streams, qpdf uses a stream data provider
  332 + (QPDF::CopiedStreamDataProvider) to read the stream data from the
  333 + original file. While a warning is issued for that case, there is
  334 + no way to actually propagate failure information back through
  335 + because StreamDataProvider::provideStreamData doesn't take the
  336 + suppress_warnings or will_retry options, and adding them would
  337 + break source compatibility.
  338 +
  339 + * When copying streams that use StreamDataProvider provider, the
  340 + original QPDF object must still be kept around. This is the case
  341 + of where QPDF q1 has a stream for which replaceStreamData was
  342 + called to provide a StreamDataProvider, and then that stream from
  343 + q1 was subsequently copied to q2. In that case, q1 must be kept
  344 + around until q2 is written. That is a pretty unusual case as,
  345 + most of the time, people could just attach the stream data
  346 + provider directly to the q2. That said, it actually appears in
  347 + qpdf itself. qpdf allows you to combine the --pages and --split
  348 + options. In that case, a merged file, which uses
  349 + StreamDataProvider to access the original stream, is subsequently
  350 + copied to the final split output files. Solving this restriction
  351 + is difficult and would probably require breaking source
  352 + compatibility because QPDF_Stream doesn't have enough information
  353 + to know whether the original QPDF object is needed by the
  354 + stream's data provider. Also, the interface is designed to pass
  355 + the object ID and generation of the object whose data is being
  356 + provided so the provider can use it for lookup or any other
  357 + purpose. As such, copying a stream data provider to a new stream
  358 + potentially changes the meaning of the parameters passed to the
  359 + provider. This happens with CopiedStreamDataProvider, which uses
  360 + the object ID and generation of the new stream to locate the
  361 + parameters of the old stream.
... ...
include/qpdf/QPDF.hh
... ... @@ -237,19 +237,26 @@ class QPDF
237 237 replaceReserved(QPDFObjectHandle reserved,
238 238 QPDFObjectHandle replacement);
239 239  
240   - // Copy an object from another QPDF to this one. Please note that
241   - // the QPDF object containing the object being copied must stick
242   - // around because it is still used to retrieve any stream data
243   - // referenced by the copied objects.
  240 + // Copy an object from another QPDF to this one. Starting with
  241 + // qpdf version 8.3.0, it is no longer necessary to keep the
  242 + // original QPDF around after the call to copyForeignObject as
  243 + // long as the source of any copied stream data is still
  244 + // available. Usually this means you just have to keep the input
  245 + // file around, not the QPDF object. The exception to this is if
  246 + // you copy a stream that gets its data from a
  247 + // QPDFObjectHandle::StreamDataProvider. In this case only, the
  248 + // original stream's QPDF object must stick around because the
  249 + // QPDF object is itself the source of the original stream data.
  250 + // For a more in-depth discussion, please see the TODO file.
244 251 //
245   - // The return value is an indirect reference to the copied object
246   - // in this file. This method is intended to be used to copy
247   - // non-page objects and will not copy page objects. To copy page
248   - // objects, pass the foreign page object directly to addPage (or
249   - // addPageAt). If you copy objects that contain references to
250   - // pages, you should copy the pages first using addPage(At).
251   - // Otherwise references to the pages that have not been copied
252   - // will be replaced with nulls.
  252 + // The return value of this method is an indirect reference to the
  253 + // copied object in this file. This method is intended to be used
  254 + // to copy non-page objects and will not copy page objects. To
  255 + // copy page objects, pass the foreign page object directly to
  256 + // addPage (or addPageAt). If you copy objects that contain
  257 + // references to pages, you should copy the pages first using
  258 + // addPage(At). Otherwise references to the pages that have not
  259 + // been copied will be replaced with nulls.
253 260  
254 261 // When copying objects with this method, object structure will be
255 262 // preserved, so all indirectly referenced indirect objects will
... ... @@ -641,9 +648,59 @@ class QPDF
641 648 std::set<QPDFObjGen> visiting;
642 649 };
643 650  
  651 + class EncryptionParameters
  652 + {
  653 + friend class QPDF;
  654 + public:
  655 + EncryptionParameters();
  656 +
  657 + private:
  658 + bool encrypted;
  659 + bool encryption_initialized;
  660 + int encryption_V;
  661 + int encryption_R;
  662 + bool encrypt_metadata;
  663 + std::map<std::string, encryption_method_e> crypt_filters;
  664 + encryption_method_e cf_stream;
  665 + encryption_method_e cf_string;
  666 + encryption_method_e cf_file;
  667 + std::string provided_password;
  668 + std::string user_password;
  669 + std::string encryption_key;
  670 + std::string cached_object_encryption_key;
  671 + int cached_key_objid;
  672 + int cached_key_generation;
  673 + };
  674 +
  675 + class ForeignStreamData
  676 + {
  677 + friend class QPDF;
  678 + public:
  679 + ForeignStreamData(
  680 + PointerHolder<EncryptionParameters> encp,
  681 + PointerHolder<InputSource> file,
  682 + int foreign_objid,
  683 + int foreign_generation,
  684 + qpdf_offset_t offset,
  685 + size_t length,
  686 + bool is_attachment_stream,
  687 + QPDFObjectHandle local_dict);
  688 +
  689 + private:
  690 + PointerHolder<EncryptionParameters> encp;
  691 + PointerHolder<InputSource> file;
  692 + int foreign_objid;
  693 + int foreign_generation;
  694 + qpdf_offset_t offset;
  695 + size_t length;
  696 + bool is_attachment_stream;
  697 + QPDFObjectHandle local_dict;
  698 + };
  699 +
644 700 class CopiedStreamDataProvider: public QPDFObjectHandle::StreamDataProvider
645 701 {
646 702 public:
  703 + CopiedStreamDataProvider(QPDF& destination_qpdf);
647 704 virtual ~CopiedStreamDataProvider()
648 705 {
649 706 }
... ... @@ -651,9 +708,14 @@ class QPDF
651 708 Pipeline* pipeline);
652 709 void registerForeignStream(QPDFObjGen const& local_og,
653 710 QPDFObjectHandle foreign_stream);
  711 + void registerForeignStream(QPDFObjGen const& local_og,
  712 + PointerHolder<ForeignStreamData>);
654 713  
655 714 private:
  715 + QPDF& destination_qpdf;
656 716 std::map<QPDFObjGen, QPDFObjectHandle> foreign_streams;
  717 + std::map<QPDFObjGen,
  718 + PointerHolder<ForeignStreamData> > foreign_stream_data;
657 719 };
658 720  
659 721 class StringDecrypter: public QPDFObjectHandle::StringDecrypter
... ... @@ -692,30 +754,6 @@ class QPDF
692 754 };
693 755 friend class ResolveRecorder;
694 756  
695   - class EncryptionParameters
696   - {
697   - friend class QPDF;
698   - public:
699   - EncryptionParameters();
700   -
701   - private:
702   - bool encrypted;
703   - bool encryption_initialized;
704   - int encryption_V;
705   - int encryption_R;
706   - bool encrypt_metadata;
707   - std::map<std::string, encryption_method_e> crypt_filters;
708   - encryption_method_e cf_stream;
709   - encryption_method_e cf_string;
710   - encryption_method_e cf_file;
711   - std::string provided_password;
712   - std::string user_password;
713   - std::string encryption_key;
714   - std::string cached_object_encryption_key;
715   - int cached_key_objid;
716   - int cached_key_generation;
717   - };
718   -
719 757 void parse(char const* password);
720 758 void warn(QPDFExc const& e);
721 759 void setTrailer(QPDFObjectHandle obj);
... ... @@ -759,6 +797,11 @@ class QPDF
759 797 Pipeline* pipeline,
760 798 bool suppress_warnings,
761 799 bool will_retry);
  800 + bool pipeForeignStreamData(
  801 + PointerHolder<ForeignStreamData>,
  802 + Pipeline*,
  803 + unsigned long encode_flags,
  804 + qpdf_stream_decode_level_e decode_level);
762 805 static bool pipeStreamData(PointerHolder<QPDF::EncryptionParameters> encp,
763 806 PointerHolder<InputSource> file,
764 807 QPDF& qpdf_for_warning,
... ...
libqpdf/QPDF.cc
... ... @@ -18,6 +18,7 @@
18 18 #include <qpdf/QPDFExc.hh>
19 19 #include <qpdf/QPDF_Null.hh>
20 20 #include <qpdf/QPDF_Dictionary.hh>
  21 +#include <qpdf/QPDF_Stream.hh>
21 22  
22 23 std::string QPDF::qpdf_version = "8.2.1";
23 24  
... ... @@ -39,13 +40,50 @@ static char const* EMPTY_PDF =
39 40 "110\n"
40 41 "%%EOF\n";
41 42  
  43 +QPDF::ForeignStreamData::ForeignStreamData(
  44 + PointerHolder<EncryptionParameters> encp,
  45 + PointerHolder<InputSource> file,
  46 + int foreign_objid,
  47 + int foreign_generation,
  48 + qpdf_offset_t offset,
  49 + size_t length,
  50 + bool is_attachment_stream,
  51 + QPDFObjectHandle local_dict)
  52 + :
  53 + encp(encp),
  54 + file(file),
  55 + foreign_objid(foreign_objid),
  56 + foreign_generation(foreign_generation),
  57 + offset(offset),
  58 + length(length),
  59 + is_attachment_stream(is_attachment_stream),
  60 + local_dict(local_dict)
  61 +{
  62 +}
  63 +
  64 +QPDF::CopiedStreamDataProvider::CopiedStreamDataProvider(
  65 + QPDF& destination_qpdf) :
  66 + destination_qpdf(destination_qpdf)
  67 +{
  68 +}
  69 +
42 70 void
43 71 QPDF::CopiedStreamDataProvider::provideStreamData(
44 72 int objid, int generation, Pipeline* pipeline)
45 73 {
46   - QPDFObjectHandle foreign_stream =
47   - this->foreign_streams[QPDFObjGen(objid, generation)];
48   - foreign_stream.pipeStreamData(pipeline, 0, qpdf_dl_none);
  74 + PointerHolder<ForeignStreamData> foreign_data =
  75 + this->foreign_stream_data[QPDFObjGen(objid, generation)];
  76 + if (foreign_data.getPointer())
  77 + {
  78 + destination_qpdf.pipeForeignStreamData(
  79 + foreign_data, pipeline, 0, qpdf_dl_none);
  80 + }
  81 + else
  82 + {
  83 + QPDFObjectHandle foreign_stream =
  84 + this->foreign_streams[QPDFObjGen(objid, generation)];
  85 + foreign_stream.pipeStreamData(pipeline, 0, qpdf_dl_none);
  86 + }
49 87 }
50 88  
51 89 void
... ... @@ -55,6 +93,14 @@ QPDF::CopiedStreamDataProvider::registerForeignStream(
55 93 this->foreign_streams[local_og] = foreign_stream;
56 94 }
57 95  
  96 +void
  97 +QPDF::CopiedStreamDataProvider::registerForeignStream(
  98 + QPDFObjGen const& local_og,
  99 + PointerHolder<ForeignStreamData> foreign_stream)
  100 +{
  101 + this->foreign_stream_data[local_og] = foreign_stream;
  102 +}
  103 +
58 104 QPDF::StringDecrypter::StringDecrypter(QPDF* qpdf, int objid, int gen) :
59 105 qpdf(qpdf),
60 106 objid(objid),
... ... @@ -2307,15 +2353,67 @@ QPDF::replaceForeignIndirectObjects(
2307 2353 if (this->m->copied_stream_data_provider == 0)
2308 2354 {
2309 2355 this->m->copied_stream_data_provider =
2310   - new CopiedStreamDataProvider();
  2356 + new CopiedStreamDataProvider(*this);
2311 2357 this->m->copied_streams = this->m->copied_stream_data_provider;
2312 2358 }
2313 2359 QPDFObjGen local_og(result.getObjGen());
2314   - this->m->copied_stream_data_provider->registerForeignStream(
2315   - local_og, foreign);
2316   - result.replaceStreamData(this->m->copied_streams,
2317   - dict.getKey("/Filter"),
2318   - dict.getKey("/DecodeParms"));
  2360 + // Copy information from the foreign stream so we can pipe its
  2361 + // data later without keeping the original QPDF object around.
  2362 + QPDF* foreign_stream_qpdf = foreign.getOwningQPDF();
  2363 + if (! foreign_stream_qpdf)
  2364 + {
  2365 + throw std::logic_error("unable to retrieve owning qpdf"
  2366 + " from foreign stream");
  2367 + }
  2368 + QPDF_Stream* stream =
  2369 + dynamic_cast<QPDF_Stream*>(
  2370 + QPDFObjectHandle::ObjAccessor::getObject(
  2371 + foreign).getPointer());
  2372 + if (! stream)
  2373 + {
  2374 + throw std::logic_error("unable to retrieve underlying"
  2375 + " stream object from foreign stream");
  2376 + }
  2377 + PointerHolder<Buffer> stream_buffer =
  2378 + stream->getStreamDataBuffer();
  2379 + PointerHolder<QPDFObjectHandle::StreamDataProvider> stream_provider =
  2380 + stream->getStreamDataProvider();
  2381 + if (stream_buffer.getPointer())
  2382 + {
  2383 +// QTC::TC("qpdf", "QPDF copy foreign stream with buffer");
  2384 + result.replaceStreamData(stream_buffer,
  2385 + dict.getKey("/Filter"),
  2386 + dict.getKey("/DecodeParms"));
  2387 + }
  2388 + else if (stream_provider.getPointer())
  2389 + {
  2390 + // In this case, the remote stream's QPDF must stay in scope.
  2391 +// QTC::TC("qpdf", "QPDF copy foreign stream with provider");
  2392 + this->m->copied_stream_data_provider->registerForeignStream(
  2393 + local_og, foreign);
  2394 + result.replaceStreamData(this->m->copied_streams,
  2395 + dict.getKey("/Filter"),
  2396 + dict.getKey("/DecodeParms"));
  2397 + }
  2398 + else
  2399 + {
  2400 + PointerHolder<ForeignStreamData> foreign_stream_data =
  2401 + new ForeignStreamData(
  2402 + foreign_stream_qpdf->m->encp,
  2403 + foreign_stream_qpdf->m->file,
  2404 + foreign.getObjectID(),
  2405 + foreign.getGeneration(),
  2406 + stream->getOffset(),
  2407 + stream->getLength(),
  2408 + (foreign_stream_qpdf->m->attachment_streams.count(
  2409 + foreign.getObjGen()) > 0),
  2410 + dict);
  2411 + this->m->copied_stream_data_provider->registerForeignStream(
  2412 + local_og, foreign_stream_data);
  2413 + result.replaceStreamData(this->m->copied_streams,
  2414 + dict.getKey("/Filter"),
  2415 + dict.getKey("/DecodeParms"));
  2416 + }
2319 2417 }
2320 2418 else
2321 2419 {
... ... @@ -2613,6 +2711,25 @@ QPDF::pipeStreamData(int objid, int generation,
2613 2711 pipeline, suppress_warnings, will_retry);
2614 2712 }
2615 2713  
  2714 +bool
  2715 +QPDF::pipeForeignStreamData(
  2716 + PointerHolder<ForeignStreamData> foreign,
  2717 + Pipeline* pipeline,
  2718 + unsigned long encode_flags,
  2719 + qpdf_stream_decode_level_e decode_level)
  2720 +{
  2721 + if (foreign->encp->encrypted)
  2722 + {
  2723 + QTC::TC("qpdf", "QPDF pipe foreign encrypted stream");
  2724 + }
  2725 + return pipeStreamData(
  2726 + foreign->encp, foreign->file, *this,
  2727 + foreign->foreign_objid, foreign->foreign_generation,
  2728 + foreign->offset, foreign->length,
  2729 + foreign->local_dict, foreign->is_attachment_stream,
  2730 + pipeline, false, false);
  2731 +}
  2732 +
2616 2733 void
2617 2734 QPDF::findAttachmentStreams()
2618 2735 {
... ...
libqpdf/QPDF_Stream.cc
... ... @@ -133,6 +133,30 @@ QPDF_Stream::isDataModified() const
133 133 return (! this->token_filters.empty());
134 134 }
135 135  
  136 +qpdf_offset_t
  137 +QPDF_Stream::getOffset() const
  138 +{
  139 + return this->offset;
  140 +}
  141 +
  142 +size_t
  143 +QPDF_Stream::getLength() const
  144 +{
  145 + return this->length;
  146 +}
  147 +
  148 +PointerHolder<Buffer>
  149 +QPDF_Stream::getStreamDataBuffer() const
  150 +{
  151 + return this->stream_data;
  152 +}
  153 +
  154 +PointerHolder<QPDFObjectHandle::StreamDataProvider>
  155 +QPDF_Stream::getStreamDataProvider() const
  156 +{
  157 + return this->stream_provider;
  158 +}
  159 +
136 160 PointerHolder<Buffer>
137 161 QPDF_Stream::getStreamData(qpdf_stream_decode_level_e decode_level)
138 162 {
... ...
libqpdf/qpdf/QPDF_Stream.hh
... ... @@ -24,6 +24,12 @@ class QPDF_Stream: public QPDFObject
24 24 QPDFObjectHandle getDict() const;
25 25 bool isDataModified() const;
26 26  
  27 + // Methods to help QPDF copy foreign streams
  28 + qpdf_offset_t getOffset() const;
  29 + size_t getLength() const;
  30 + PointerHolder<Buffer> getStreamDataBuffer() const;
  31 + PointerHolder<QPDFObjectHandle::StreamDataProvider> getStreamDataProvider() const;
  32 +
27 33 // See comments in QPDFObjectHandle.hh for these methods.
28 34 bool pipeStreamData(Pipeline*,
29 35 unsigned long encode_flags,
... ...
qpdf/qpdf.testcov
... ... @@ -407,3 +407,4 @@ qpdf image optimize no shrink 0
407 407 qpdf image optimize too small 0
408 408 QPDFFormFieldObjectHelper WinAnsi 0
409 409 QPDF_encryption attachment stream 0
  410 +QPDF pipe foreign encrypted stream 0
... ...