-
Reject objects containing arrays or dictionaries with more than 5000 elements. We are by definition dealing with damaged files, and such objects are extremely likely to be invalid or malicious.
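A minimal sketch of such a check. `Node`, `max_container_size` and `exceeds_limit` are hypothetical names used for illustration only; they are not qpdf's actual object classes or parser internals.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a parsed object; not qpdf's real object type.
struct Node
{
    std::vector<Node> children; // elements of an array or dictionary
};

// Limit taken from the commit message; the name is illustrative.
constexpr std::size_t max_container_size = 5000;

// Return true if the object itself, or any container nested inside it,
// has more than max_container_size elements.
bool
exceeds_limit(Node const& n)
{
    if (n.children.size() > max_container_size) {
        return true;
    }
    for (auto const& child: n.children) {
        if (exceeds_limit(child)) {
            return true;
        }
    }
    return false;
}
```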
-
... containing objects with no white-space between them. To enforce the rule that objects end at the start-offset of the next object, each object is parsed in its own object stream. To facilitate this, a new private-API input source, is::OffsetBuffer, has been added. It contains only the object but reports offsets relative to the start of the object stream. It is adapted from OffsetInputSource by changing the direction of the offset, endowing it with its own BufferInputSource, and stripping out checks duplicated in BufferInputSource. Fixes the expected failure in the test case added in #1266.
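The idea can be sketched as follows. `OffsetView` is a hypothetical class written for illustration; it does not reproduce the real is::OffsetBuffer interface. It holds only one object's bytes, so a read cannot run into the next object, and it reports positions relative to the enclosing object stream.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical illustration; not the real is::OffsetBuffer interface.
class OffsetView
{
  public:
    // data: the bytes of a single object
    // global_offset: where that object starts in the enclosing stream
    OffsetView(std::string_view data, std::size_t global_offset) :
        data_(data),
        global_offset_(global_offset)
    {
    }

    // Read up to n bytes; never reads past the end of this object.
    std::string
    read(std::size_t n)
    {
        auto chunk = data_.substr(pos_, n);
        pos_ += chunk.size();
        return std::string(chunk);
    }

    // Current position, expressed relative to the enclosing stream.
    std::size_t
    tell() const
    {
        return global_offset_ + pos_;
    }

    bool
    at_end() const
    {
        return pos_ == data_.size();
    }

  private:
    std::string_view data_;
    std::size_t global_offset_;
    std::size_t pos_ = 0;
};
```

For a stream such as `<</A 1>>[1 2]`, a view over the second object would be built from the five bytes starting at offset 8, so a parser reading from it stops at the end of `[1 2]` while any reported offsets still refer to the whole stream.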
-
Add static parse methods. Make all external access to QPDFParser through static methods. Make all non-static methods including constructors private.
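The pattern looks roughly like the sketch below. `TokenCounter` is a hypothetical toy class, not QPDFParser; it only shows how a static entry point can construct a short-lived private instance so that callers never hold a parser object.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

// Hypothetical toy parser illustrating the access pattern only.
class TokenCounter
{
  public:
    // The only external entry point: build a temporary instance and run it.
    static std::size_t
    parse(std::string const& input)
    {
        return TokenCounter(input).run();
    }

  private:
    // Constructor and working methods are private.
    explicit TokenCounter(std::string const& input) :
        in_(input)
    {
    }

    // Count whitespace-separated tokens.
    std::size_t
    run()
    {
        std::size_t n = 0;
        std::string tok;
        while (in_ >> tok) {
            ++n;
        }
        return n;
    }

    std::istringstream in_;
};
```

Keeping construction private means parser state cannot outlive a single call or be reused between calls.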
-
Only build strings when needed.
-
Avoid creating new identical descriptions for each content stream token.
-
Also remove some shared pointers and use std::string instead of Pl_Buffer in Pl_QPDFTokenizer.
-
Remove remaining QPDFTokenizer private methods. Remove QPDFTokenizer privileged access to Tokenizer.
-
#1349 introduced a limit on the maximum size of arrays and dictionaries contained in objects that generate errors during parsing, and #1354 reduced that limit to 5000 objects. However, the limit was only imposed once a further error was encountered. Stop adding objects to containers once the limit is reached. Fixes oss-fuzz issue 398060137.
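A minimal sketch of capping a container at insertion time rather than checking it later. `CappedContainer` and its members are hypothetical, and the exact boundary condition (whether the 5000th element is accepted) is an assumption, not taken from the qpdf source.

```cpp
#include <cstddef>
#include <vector>

// Limit taken from the commit message; the name is illustrative.
constexpr std::size_t max_container_size = 5000;

// Hypothetical container that refuses new items once the limit is reached.
struct CappedContainer
{
    std::vector<int> items;
    bool limit_hit = false;

    // Add an item unless the container is already full.
    // Returns false (and records the fact) when the item is dropped.
    bool
    add(int value)
    {
        if (items.size() >= max_container_size) {
            limit_hit = true;
            return false;
        }
        items.push_back(value);
        return true;
    }
};
```

With this shape, memory use is bounded as soon as the limit is reached, independently of whether a later error is ever reported.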
-
This improves indentation of long strings. This commit also fixes some trailing whitespace in ChangeLog.
-
Currently, QPDFParser gives up attempting to parse an object if 5 near-consecutive bad tokens are encountered. Add a limit of a total of 15 bad tokens in a single object before giving up.
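The two limits can be sketched with a small counter class. `BadTokenTracker` is hypothetical, and the reading of "near-consecutive" used here (a run of bad tokens is reset only after five good tokens in a row) is an assumption made for the example, not a description of QPDFParser's actual bookkeeping.

```cpp
// Hypothetical tracker; names and the reset rule are assumptions.
class BadTokenTracker
{
  public:
    // Record a successfully parsed token.
    void
    good_token()
    {
        ++good_since_bad_;
    }

    // Record a bad token; returns true if the parser should give up.
    bool
    bad_token()
    {
        if (++total_bad_ >= max_total_bad) {
            return true; // overall limit for this object
        }
        if (good_since_bad_ >= good_run_to_reset) {
            run_bad_ = 1; // enough good tokens in between: start a new run
        } else {
            ++run_bad_;
        }
        good_since_bad_ = 0;
        return run_bad_ >= max_run_bad; // near-consecutive limit
    }

  private:
    static constexpr int max_run_bad = 5;        // from the commit message
    static constexpr int max_total_bad = 15;     // from the commit message
    static constexpr int good_run_to_reset = 5;  // assumption
    int total_bad_ = 0;
    int run_bad_ = 0;
    int good_since_bad_ = 0;
};
```

The total limit matters for input that alternates bad tokens with just enough good ones to keep resetting the run counter: without it, such an object would never trip the near-consecutive check.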
-
Create unresolved objects only for objects in the xref table (except during parsing of the xref table). Do not add indirect nulls into the object cache as the result of a cache miss during a call to getObject except during parsing or creation/updating from JSON. To support this behaviour, add new private methods getObjectForParser and getObjectForJSON. As a result of this change, dangling references are treated as direct nulls rather than indirect nulls.
-
Prepare for treating indirect references differently depending on whether we are parsing a PDF file (in which case references to objects not in the xref table are null even if they are in the object cache) or parsing from user code (in which case an indirect reference can refer to a user-created object).
-
Also, don't search for /Contents name unless the result is used.
-
The new method is temporarily an (almost) complete copy of parse, which is temporarily (almost) unchanged.
-
Also, name the type QPDFValue::Description.
-
Set parsed offset at the same time as setting description.
-
Part of #729
-
Part of #729
-
Part of #729