Commit fddbcab0e7cc5978802251696055efa64667f637

Authored by Jay Berkenbilt
1 parent fbbb0ee0

Mostly don't require original QPDF for copyForeignObject (fixes #219)

The original QPDF is only required now when the source
QPDFObjectHandle is a stream that gets its stream data from a
QPDFObjectHandle::StreamDataProvider.
ChangeLog
  1 +2019-01-06 Jay Berkenbilt <ejb@ql.org>
  2 +
  3 + * Remove the restriction in most cases that the source QPDF used
  4 + in a copyForeignObject call has to stick around until the
  5 + destination QPDF is written. The exceptional case is when the
  6 + source stream gets is data using a
  7 + QPDFObjectHandle::StreamDataProvider. For a more in-depth
  8 + discussion, see comments around copyForeignObject in QPDF.hh.
  9 + Fixes #219.
  10 +
1 2019-01-05 Jay Berkenbilt <ejb@ql.org> 11 2019-01-05 Jay Berkenbilt <ejb@ql.org>
2 12
3 * When generating appearances, if the font uses one of the 13 * When generating appearances, if the font uses one of the
@@ -3,19 +3,6 @@ Soon @@ -3,19 +3,6 @@ Soon
3 3
4 * Set up OSS-Fuzz (Google). See starred email in qpdf label. 4 * Set up OSS-Fuzz (Google). See starred email in qpdf label.
5 5
6 - * Issue #219, not requiring QPDF reference of copied object to be  
7 - kept around. Comments are in the issue. Most likely we should take  
8 - all the encryption parameters and push them into a separate class  
9 - within QPDF. We can convert the encryption computation methods to  
10 - methods of that class and then hang onto that along with the input  
11 - stream for foreign streams. That should handle the encrypted file  
12 - case. For the stream data provider case, we don't have to do  
13 - anything special. In that case, it's definitely up to the user  
14 - to make sure the stream data provider and everything it uses stay  
15 - in scope. The user can use a PointerHolder<QPDF> if they want as  
16 - long as the original QPDF object is allocated this way. That would  
17 - be a good case for the test suite.  
18 -  
19 * Figure out how to render Gajić correctly in the PDF version of the 6 * Figure out how to render Gajić correctly in the PDF version of the
20 qpdf manual. 7 qpdf manual.
21 8
@@ -333,3 +320,42 @@ I find it useful to make reference to them in this list @@ -333,3 +320,42 @@ I find it useful to make reference to them in this list
333 of all that code to make the checks more robust. In particular, 320 of all that code to make the checks more robust. In particular,
334 it's hard to look at the code and quickly determine what is a true 321 it's hard to look at the code and quickly determine what is a true
335 logic error and what could happen because of malformed user input. 322 logic error and what could happen because of malformed user input.
  323 +
  324 + * There are a few known limitations of copying foreign streams. These
  325 + are somewhat discussed in github issue 219. They are most likely
  326 + not worth fixing.
  327 +
  328 + * The original pipeStreamData in QPDF_Stream has various logic for
  329 + reporting warnings and letting the caller retry. This logic is
  330 + not implemented for stream data providers. When copying foreign
  331 + streams, qpdf uses a stream data provider
  332 + (QPDF::CopiedStreamDataProvider) to read the stream data from the
  333 + original file. While a warning is issued for that case, there is
  334 + no way to actually propagate failure information back through
  335 + because StreamDataProvider::provideStreamData doesn't take the
  336 + suppress_warnings or will_retry options, and adding them would
  337 + break source compatibility.
  338 +
  339 + * When copying streams that use StreamDataProvider provider, the
  340 + original QPDF object must still be kept around. This is the case
  341 + of where QPDF q1 has a stream for which replaceStreamData was
  342 + called to provide a StreamDataProvider, and then that stream from
  343 + q1 was subsequently copied to q2. In that case, q1 must be kept
  344 + around until q2 is written. That is a pretty unusual case as,
  345 + most of the time, people could just attach the stream data
  346 + provider directly to the q2. That said, it actually appears in
  347 + qpdf itself. qpdf allows you to combine the --pages and --split
  348 + options. In that case, a merged file, which uses
  349 + StreamDataProvider to access the original stream, is subsequently
  350 + copied to the final split output files. Solving this restriction
  351 + is difficult and would probably require breaking source
  352 + compatibility because QPDF_Stream doesn't have enough information
  353 + to know whether the original QPDF object is needed by the
  354 + stream's data provider. Also, the interface is designed to pass
  355 + the object ID and generation of the object whose data is being
  356 + provided so the provider can use it for lookup or any other
  357 + purpose. As such, copying a stream data provider to a new stream
  358 + potentially changes the meaning of the parameters passed to the
  359 + provider. This happens with CopiedStreamDataProvider, which uses
  360 + the object ID and generation of the new stream to locate the
  361 + parameters of the old stream.
include/qpdf/QPDF.hh
@@ -237,19 +237,26 @@ class QPDF @@ -237,19 +237,26 @@ class QPDF
237 replaceReserved(QPDFObjectHandle reserved, 237 replaceReserved(QPDFObjectHandle reserved,
238 QPDFObjectHandle replacement); 238 QPDFObjectHandle replacement);
239 239
240 - // Copy an object from another QPDF to this one. Please note that  
241 - // the QPDF object containing the object being copied must stick  
242 - // around because it is still used to retrieve any stream data  
243 - // referenced by the copied objects. 240 + // Copy an object from another QPDF to this one. Starting with
  241 + // qpdf version 8.3.0, it is no longer necessary to keep the
  242 + // original QPDF around after the call to copyForeignObject as
  243 + // long as the source of any copied stream data is still
  244 + // available. Usually this means you just have to keep the input
  245 + // file around, not the QPDF object. The exception to this is if
  246 + // you copy a stream that gets its data from a
  247 + // QPDFObjectHandle::StreamDataProvider. In this case only, the
  248 + // original stream's QPDF object must stick around because the
  249 + // QPDF object is itself the source of the original stream data.
  250 + // For a more in-depth discussion, please see the TODO file.
244 // 251 //
245 - // The return value is an indirect reference to the copied object  
246 - // in this file. This method is intended to be used to copy  
247 - // non-page objects and will not copy page objects. To copy page  
248 - // objects, pass the foreign page object directly to addPage (or  
249 - // addPageAt). If you copy objects that contain references to  
250 - // pages, you should copy the pages first using addPage(At).  
251 - // Otherwise references to the pages that have not been copied  
252 - // will be replaced with nulls. 252 + // The return value of this method is an indirect reference to the
  253 + // copied object in this file. This method is intended to be used
  254 + // to copy non-page objects and will not copy page objects. To
  255 + // copy page objects, pass the foreign page object directly to
  256 + // addPage (or addPageAt). If you copy objects that contain
  257 + // references to pages, you should copy the pages first using
  258 + // addPage(At). Otherwise references to the pages that have not
  259 + // been copied will be replaced with nulls.
253 260
254 // When copying objects with this method, object structure will be 261 // When copying objects with this method, object structure will be
255 // preserved, so all indirectly referenced indirect objects will 262 // preserved, so all indirectly referenced indirect objects will
@@ -641,9 +648,59 @@ class QPDF @@ -641,9 +648,59 @@ class QPDF
641 std::set<QPDFObjGen> visiting; 648 std::set<QPDFObjGen> visiting;
642 }; 649 };
643 650
  651 + class EncryptionParameters
  652 + {
  653 + friend class QPDF;
  654 + public:
  655 + EncryptionParameters();
  656 +
  657 + private:
  658 + bool encrypted;
  659 + bool encryption_initialized;
  660 + int encryption_V;
  661 + int encryption_R;
  662 + bool encrypt_metadata;
  663 + std::map<std::string, encryption_method_e> crypt_filters;
  664 + encryption_method_e cf_stream;
  665 + encryption_method_e cf_string;
  666 + encryption_method_e cf_file;
  667 + std::string provided_password;
  668 + std::string user_password;
  669 + std::string encryption_key;
  670 + std::string cached_object_encryption_key;
  671 + int cached_key_objid;
  672 + int cached_key_generation;
  673 + };
  674 +
  675 + class ForeignStreamData
  676 + {
  677 + friend class QPDF;
  678 + public:
  679 + ForeignStreamData(
  680 + PointerHolder<EncryptionParameters> encp,
  681 + PointerHolder<InputSource> file,
  682 + int foreign_objid,
  683 + int foreign_generation,
  684 + qpdf_offset_t offset,
  685 + size_t length,
  686 + bool is_attachment_stream,
  687 + QPDFObjectHandle local_dict);
  688 +
  689 + private:
  690 + PointerHolder<EncryptionParameters> encp;
  691 + PointerHolder<InputSource> file;
  692 + int foreign_objid;
  693 + int foreign_generation;
  694 + qpdf_offset_t offset;
  695 + size_t length;
  696 + bool is_attachment_stream;
  697 + QPDFObjectHandle local_dict;
  698 + };
  699 +
644 class CopiedStreamDataProvider: public QPDFObjectHandle::StreamDataProvider 700 class CopiedStreamDataProvider: public QPDFObjectHandle::StreamDataProvider
645 { 701 {
646 public: 702 public:
  703 + CopiedStreamDataProvider(QPDF& destination_qpdf);
647 virtual ~CopiedStreamDataProvider() 704 virtual ~CopiedStreamDataProvider()
648 { 705 {
649 } 706 }
@@ -651,9 +708,14 @@ class QPDF @@ -651,9 +708,14 @@ class QPDF
651 Pipeline* pipeline); 708 Pipeline* pipeline);
652 void registerForeignStream(QPDFObjGen const& local_og, 709 void registerForeignStream(QPDFObjGen const& local_og,
653 QPDFObjectHandle foreign_stream); 710 QPDFObjectHandle foreign_stream);
  711 + void registerForeignStream(QPDFObjGen const& local_og,
  712 + PointerHolder<ForeignStreamData>);
654 713
655 private: 714 private:
  715 + QPDF& destination_qpdf;
656 std::map<QPDFObjGen, QPDFObjectHandle> foreign_streams; 716 std::map<QPDFObjGen, QPDFObjectHandle> foreign_streams;
  717 + std::map<QPDFObjGen,
  718 + PointerHolder<ForeignStreamData> > foreign_stream_data;
657 }; 719 };
658 720
659 class StringDecrypter: public QPDFObjectHandle::StringDecrypter 721 class StringDecrypter: public QPDFObjectHandle::StringDecrypter
@@ -692,30 +754,6 @@ class QPDF @@ -692,30 +754,6 @@ class QPDF
692 }; 754 };
693 friend class ResolveRecorder; 755 friend class ResolveRecorder;
694 756
695 - class EncryptionParameters  
696 - {  
697 - friend class QPDF;  
698 - public:  
699 - EncryptionParameters();  
700 -  
701 - private:  
702 - bool encrypted;  
703 - bool encryption_initialized;  
704 - int encryption_V;  
705 - int encryption_R;  
706 - bool encrypt_metadata;  
707 - std::map<std::string, encryption_method_e> crypt_filters;  
708 - encryption_method_e cf_stream;  
709 - encryption_method_e cf_string;  
710 - encryption_method_e cf_file;  
711 - std::string provided_password;  
712 - std::string user_password;  
713 - std::string encryption_key;  
714 - std::string cached_object_encryption_key;  
715 - int cached_key_objid;  
716 - int cached_key_generation;  
717 - };  
718 -  
719 void parse(char const* password); 757 void parse(char const* password);
720 void warn(QPDFExc const& e); 758 void warn(QPDFExc const& e);
721 void setTrailer(QPDFObjectHandle obj); 759 void setTrailer(QPDFObjectHandle obj);
@@ -759,6 +797,11 @@ class QPDF @@ -759,6 +797,11 @@ class QPDF
759 Pipeline* pipeline, 797 Pipeline* pipeline,
760 bool suppress_warnings, 798 bool suppress_warnings,
761 bool will_retry); 799 bool will_retry);
  800 + bool pipeForeignStreamData(
  801 + PointerHolder<ForeignStreamData>,
  802 + Pipeline*,
  803 + unsigned long encode_flags,
  804 + qpdf_stream_decode_level_e decode_level);
762 static bool pipeStreamData(PointerHolder<QPDF::EncryptionParameters> encp, 805 static bool pipeStreamData(PointerHolder<QPDF::EncryptionParameters> encp,
763 PointerHolder<InputSource> file, 806 PointerHolder<InputSource> file,
764 QPDF& qpdf_for_warning, 807 QPDF& qpdf_for_warning,
libqpdf/QPDF.cc
@@ -18,6 +18,7 @@ @@ -18,6 +18,7 @@
18 #include <qpdf/QPDFExc.hh> 18 #include <qpdf/QPDFExc.hh>
19 #include <qpdf/QPDF_Null.hh> 19 #include <qpdf/QPDF_Null.hh>
20 #include <qpdf/QPDF_Dictionary.hh> 20 #include <qpdf/QPDF_Dictionary.hh>
  21 +#include <qpdf/QPDF_Stream.hh>
21 22
22 std::string QPDF::qpdf_version = "8.2.1"; 23 std::string QPDF::qpdf_version = "8.2.1";
23 24
@@ -39,13 +40,50 @@ static char const* EMPTY_PDF = @@ -39,13 +40,50 @@ static char const* EMPTY_PDF =
39 "110\n" 40 "110\n"
40 "%%EOF\n"; 41 "%%EOF\n";
41 42
  43 +QPDF::ForeignStreamData::ForeignStreamData(
  44 + PointerHolder<EncryptionParameters> encp,
  45 + PointerHolder<InputSource> file,
  46 + int foreign_objid,
  47 + int foreign_generation,
  48 + qpdf_offset_t offset,
  49 + size_t length,
  50 + bool is_attachment_stream,
  51 + QPDFObjectHandle local_dict)
  52 + :
  53 + encp(encp),
  54 + file(file),
  55 + foreign_objid(foreign_objid),
  56 + foreign_generation(foreign_generation),
  57 + offset(offset),
  58 + length(length),
  59 + is_attachment_stream(is_attachment_stream),
  60 + local_dict(local_dict)
  61 +{
  62 +}
  63 +
  64 +QPDF::CopiedStreamDataProvider::CopiedStreamDataProvider(
  65 + QPDF& destination_qpdf) :
  66 + destination_qpdf(destination_qpdf)
  67 +{
  68 +}
  69 +
42 void 70 void
43 QPDF::CopiedStreamDataProvider::provideStreamData( 71 QPDF::CopiedStreamDataProvider::provideStreamData(
44 int objid, int generation, Pipeline* pipeline) 72 int objid, int generation, Pipeline* pipeline)
45 { 73 {
46 - QPDFObjectHandle foreign_stream =  
47 - this->foreign_streams[QPDFObjGen(objid, generation)];  
48 - foreign_stream.pipeStreamData(pipeline, 0, qpdf_dl_none); 74 + PointerHolder<ForeignStreamData> foreign_data =
  75 + this->foreign_stream_data[QPDFObjGen(objid, generation)];
  76 + if (foreign_data.getPointer())
  77 + {
  78 + destination_qpdf.pipeForeignStreamData(
  79 + foreign_data, pipeline, 0, qpdf_dl_none);
  80 + }
  81 + else
  82 + {
  83 + QPDFObjectHandle foreign_stream =
  84 + this->foreign_streams[QPDFObjGen(objid, generation)];
  85 + foreign_stream.pipeStreamData(pipeline, 0, qpdf_dl_none);
  86 + }
49 } 87 }
50 88
51 void 89 void
@@ -55,6 +93,14 @@ QPDF::CopiedStreamDataProvider::registerForeignStream( @@ -55,6 +93,14 @@ QPDF::CopiedStreamDataProvider::registerForeignStream(
55 this->foreign_streams[local_og] = foreign_stream; 93 this->foreign_streams[local_og] = foreign_stream;
56 } 94 }
57 95
  96 +void
  97 +QPDF::CopiedStreamDataProvider::registerForeignStream(
  98 + QPDFObjGen const& local_og,
  99 + PointerHolder<ForeignStreamData> foreign_stream)
  100 +{
  101 + this->foreign_stream_data[local_og] = foreign_stream;
  102 +}
  103 +
58 QPDF::StringDecrypter::StringDecrypter(QPDF* qpdf, int objid, int gen) : 104 QPDF::StringDecrypter::StringDecrypter(QPDF* qpdf, int objid, int gen) :
59 qpdf(qpdf), 105 qpdf(qpdf),
60 objid(objid), 106 objid(objid),
@@ -2307,15 +2353,67 @@ QPDF::replaceForeignIndirectObjects( @@ -2307,15 +2353,67 @@ QPDF::replaceForeignIndirectObjects(
2307 if (this->m->copied_stream_data_provider == 0) 2353 if (this->m->copied_stream_data_provider == 0)
2308 { 2354 {
2309 this->m->copied_stream_data_provider = 2355 this->m->copied_stream_data_provider =
2310 - new CopiedStreamDataProvider(); 2356 + new CopiedStreamDataProvider(*this);
2311 this->m->copied_streams = this->m->copied_stream_data_provider; 2357 this->m->copied_streams = this->m->copied_stream_data_provider;
2312 } 2358 }
2313 QPDFObjGen local_og(result.getObjGen()); 2359 QPDFObjGen local_og(result.getObjGen());
2314 - this->m->copied_stream_data_provider->registerForeignStream(  
2315 - local_og, foreign);  
2316 - result.replaceStreamData(this->m->copied_streams,  
2317 - dict.getKey("/Filter"),  
2318 - dict.getKey("/DecodeParms")); 2360 + // Copy information from the foreign stream so we can pipe its
  2361 + // data later without keeping the original QPDF object around.
  2362 + QPDF* foreign_stream_qpdf = foreign.getOwningQPDF();
  2363 + if (! foreign_stream_qpdf)
  2364 + {
  2365 + throw std::logic_error("unable to retrieve owning qpdf"
  2366 + " from foreign stream");
  2367 + }
  2368 + QPDF_Stream* stream =
  2369 + dynamic_cast<QPDF_Stream*>(
  2370 + QPDFObjectHandle::ObjAccessor::getObject(
  2371 + foreign).getPointer());
  2372 + if (! stream)
  2373 + {
  2374 + throw std::logic_error("unable to retrieve underlying"
  2375 + " stream object from foreign stream");
  2376 + }
  2377 + PointerHolder<Buffer> stream_buffer =
  2378 + stream->getStreamDataBuffer();
  2379 + PointerHolder<QPDFObjectHandle::StreamDataProvider> stream_provider =
  2380 + stream->getStreamDataProvider();
  2381 + if (stream_buffer.getPointer())
  2382 + {
  2383 +// QTC::TC("qpdf", "QPDF copy foreign stream with buffer");
  2384 + result.replaceStreamData(stream_buffer,
  2385 + dict.getKey("/Filter"),
  2386 + dict.getKey("/DecodeParms"));
  2387 + }
  2388 + else if (stream_provider.getPointer())
  2389 + {
  2390 + // In this case, the remote stream's QPDF must stay in scope.
  2391 +// QTC::TC("qpdf", "QPDF copy foreign stream with provider");
  2392 + this->m->copied_stream_data_provider->registerForeignStream(
  2393 + local_og, foreign);
  2394 + result.replaceStreamData(this->m->copied_streams,
  2395 + dict.getKey("/Filter"),
  2396 + dict.getKey("/DecodeParms"));
  2397 + }
  2398 + else
  2399 + {
  2400 + PointerHolder<ForeignStreamData> foreign_stream_data =
  2401 + new ForeignStreamData(
  2402 + foreign_stream_qpdf->m->encp,
  2403 + foreign_stream_qpdf->m->file,
  2404 + foreign.getObjectID(),
  2405 + foreign.getGeneration(),
  2406 + stream->getOffset(),
  2407 + stream->getLength(),
  2408 + (foreign_stream_qpdf->m->attachment_streams.count(
  2409 + foreign.getObjGen()) > 0),
  2410 + dict);
  2411 + this->m->copied_stream_data_provider->registerForeignStream(
  2412 + local_og, foreign_stream_data);
  2413 + result.replaceStreamData(this->m->copied_streams,
  2414 + dict.getKey("/Filter"),
  2415 + dict.getKey("/DecodeParms"));
  2416 + }
2319 } 2417 }
2320 else 2418 else
2321 { 2419 {
@@ -2613,6 +2711,25 @@ QPDF::pipeStreamData(int objid, int generation, @@ -2613,6 +2711,25 @@ QPDF::pipeStreamData(int objid, int generation,
2613 pipeline, suppress_warnings, will_retry); 2711 pipeline, suppress_warnings, will_retry);
2614 } 2712 }
2615 2713
  2714 +bool
  2715 +QPDF::pipeForeignStreamData(
  2716 + PointerHolder<ForeignStreamData> foreign,
  2717 + Pipeline* pipeline,
  2718 + unsigned long encode_flags,
  2719 + qpdf_stream_decode_level_e decode_level)
  2720 +{
  2721 + if (foreign->encp->encrypted)
  2722 + {
  2723 + QTC::TC("qpdf", "QPDF pipe foreign encrypted stream");
  2724 + }
  2725 + return pipeStreamData(
  2726 + foreign->encp, foreign->file, *this,
  2727 + foreign->foreign_objid, foreign->foreign_generation,
  2728 + foreign->offset, foreign->length,
  2729 + foreign->local_dict, foreign->is_attachment_stream,
  2730 + pipeline, false, false);
  2731 +}
  2732 +
2616 void 2733 void
2617 QPDF::findAttachmentStreams() 2734 QPDF::findAttachmentStreams()
2618 { 2735 {
libqpdf/QPDF_Stream.cc
@@ -133,6 +133,30 @@ QPDF_Stream::isDataModified() const @@ -133,6 +133,30 @@ QPDF_Stream::isDataModified() const
133 return (! this->token_filters.empty()); 133 return (! this->token_filters.empty());
134 } 134 }
135 135
  136 +qpdf_offset_t
  137 +QPDF_Stream::getOffset() const
  138 +{
  139 + return this->offset;
  140 +}
  141 +
  142 +size_t
  143 +QPDF_Stream::getLength() const
  144 +{
  145 + return this->length;
  146 +}
  147 +
  148 +PointerHolder<Buffer>
  149 +QPDF_Stream::getStreamDataBuffer() const
  150 +{
  151 + return this->stream_data;
  152 +}
  153 +
  154 +PointerHolder<QPDFObjectHandle::StreamDataProvider>
  155 +QPDF_Stream::getStreamDataProvider() const
  156 +{
  157 + return this->stream_provider;
  158 +}
  159 +
136 PointerHolder<Buffer> 160 PointerHolder<Buffer>
137 QPDF_Stream::getStreamData(qpdf_stream_decode_level_e decode_level) 161 QPDF_Stream::getStreamData(qpdf_stream_decode_level_e decode_level)
138 { 162 {
libqpdf/qpdf/QPDF_Stream.hh
@@ -24,6 +24,12 @@ class QPDF_Stream: public QPDFObject @@ -24,6 +24,12 @@ class QPDF_Stream: public QPDFObject
24 QPDFObjectHandle getDict() const; 24 QPDFObjectHandle getDict() const;
25 bool isDataModified() const; 25 bool isDataModified() const;
26 26
  27 + // Methods to help QPDF copy foreign streams
  28 + qpdf_offset_t getOffset() const;
  29 + size_t getLength() const;
  30 + PointerHolder<Buffer> getStreamDataBuffer() const;
  31 + PointerHolder<QPDFObjectHandle::StreamDataProvider> getStreamDataProvider() const;
  32 +
27 // See comments in QPDFObjectHandle.hh for these methods. 33 // See comments in QPDFObjectHandle.hh for these methods.
28 bool pipeStreamData(Pipeline*, 34 bool pipeStreamData(Pipeline*,
29 unsigned long encode_flags, 35 unsigned long encode_flags,
qpdf/qpdf.testcov
@@ -407,3 +407,4 @@ qpdf image optimize no shrink 0 @@ -407,3 +407,4 @@ qpdf image optimize no shrink 0
407 qpdf image optimize too small 0 407 qpdf image optimize too small 0
408 QPDFFormFieldObjectHelper WinAnsi 0 408 QPDFFormFieldObjectHelper WinAnsi 0
409 QPDF_encryption attachment stream 0 409 QPDF_encryption attachment stream 0
  410 +QPDF pipe foreign encrypted stream 0