-
Given that the PDF spec requires the xref table to contain entries for all object ids <= the maximum id present in a PDF document, max_size is a qpdf implementation limit on legitimate object ids.
-
Refactor Pl_QPDFTokenizer
-
Create unresolved objects only for objects in the xref table (except during parsing of the xref table). Do not add indirect nulls into the object cache as the result of a cache miss during a call to getObject except during parsing or creation/updating from JSON. To support this behaviour, add new private methods getObjectForParser and getObjectForJSON. As a result of this change, dangling references are treated as direct nulls rather than indirect nulls.
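The lookup rules above can be illustrated with a heavily simplified model. This is a sketch only: the `Obj` and `ObjectTable` types, the `getObjectForParser` signature, and the `cacheSize` helper are hypothetical stand-ins, not qpdf's actual classes. It shows an ordinary lookup creating unresolved placeholders only for ids in the xref table and returning an uncached direct null for dangling references, while the parser-side lookup inserts an indirect null on a miss.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <utility>

// Hypothetical, heavily simplified model; not qpdf's real object types.
struct Obj {
    enum Kind { Null, Unresolved, Resolved } kind;
    bool indirect;  // true if the object lives in the object cache
};

class ObjectTable {
  public:
    explicit ObjectTable(std::set<int> xref_ids) : xref_(std::move(xref_ids)) {}

    // Ordinary lookup: never grows the cache with placeholder nulls.
    std::shared_ptr<Obj> getObject(int id) {
        auto it = cache_.find(id);
        if (it != cache_.end()) return it->second;
        if (xref_.count(id)) {
            // Object exists in the file: cache an unresolved placeholder
            // that can be read lazily later.
            auto o = std::make_shared<Obj>(Obj{Obj::Unresolved, true});
            cache_.emplace(id, o);
            return o;
        }
        // Dangling reference: a direct null that is NOT added to the cache.
        return std::make_shared<Obj>(Obj{Obj::Null, false});
    }

    // Hypothetical parser/JSON lookup: a miss inserts an indirect object so
    // that a later definition of the same id can replace it.
    std::shared_ptr<Obj> getObjectForParser(int id) {
        auto it = cache_.find(id);
        if (it != cache_.end()) return it->second;
        auto kind = xref_.count(id) ? Obj::Unresolved : Obj::Null;
        auto o = std::make_shared<Obj>(Obj{kind, true});
        cache_.emplace(id, o);
        return o;
    }

    std::size_t cacheSize() const { return cache_.size(); }

  private:
    std::set<int> xref_;                          // ids present in the xref table
    std::map<int, std::shared_ptr<Obj>> cache_;   // the object cache
};
```

The design point is that only the parser path may record a placeholder for an id the file never defines; user lookups of such ids leave the cache untouched.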
-
Prepare for treating indirect references differently depending on whether we are parsing a PDF file (in which case references to objects not in the xref table are null even if they are in the object cache) or parsing from user code (in which case an indirect reference can refer to a user-created object).
-
Buffer output locally. Add qpdf_fuzzer test case.
-
Avoid unnecessary rescanning of lines and repositioning of input file. Limit max size of tokens.
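A token size limit combined with a single forward pass might look roughly like the sketch below. It assumes whitespace-delimited tokens in an in-memory buffer and a hypothetical limit `kMaxTokenSize`; qpdf's real tokenizer handles full PDF lexical rules and uses its own limit.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <string_view>

// Hypothetical limit chosen for illustration; not qpdf's actual value.
constexpr std::size_t kMaxTokenSize = 16;

// Reads the next whitespace-delimited token starting at pos and advances pos.
// The input is scanned once, front to back, with no rewinding. Throws
// std::length_error as soon as the token would exceed max_size characters.
std::string nextToken(std::string_view in, std::size_t& pos,
                      std::size_t max_size = kMaxTokenSize) {
    // Skip leading whitespace.
    while (pos < in.size() && std::isspace(static_cast<unsigned char>(in[pos]))) {
        ++pos;
    }
    std::size_t start = pos;
    while (pos < in.size() && !std::isspace(static_cast<unsigned char>(in[pos]))) {
        if (pos - start >= max_size) {
            throw std::length_error("token exceeds maximum size");
        }
        ++pos;
    }
    return std::string(in.substr(start, pos - start));
}
```

Failing early bounds both memory use and time on hostile input, which matters for fuzzing.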
-
Remove unnecessary use of shared pointers and avoid unnecessary string creation.
-
Avoid writing each space char individually.
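The difference between writing spaces one at a time and in blocks can be sketched as follows. `Sink` is a hypothetical stand-in for an output pipeline that counts calls to `write`; the 64-byte block of blanks is an arbitrary illustrative size.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical output sink standing in for a pipeline; counts write calls.
struct Sink {
    std::string data;
    std::size_t calls = 0;
    void write(const char* buf, std::size_t len) {
        data.append(buf, len);
        ++calls;
    }
};

// Before: one write call per space character.
void writeSpacesOneByOne(Sink& s, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) s.write(" ", 1);
}

// After: copy from a fixed block of blanks, one call per 64 spaces.
void writeSpaces(Sink& s, std::size_t n) {
    static const std::string blanks(64, ' ');
    while (n > 0) {
        std::size_t chunk = std::min(n, blanks.size());
        s.write(blanks.data(), chunk);
        n -= chunk;
    }
}
```

For indentation-heavy output, the per-call overhead of the pipeline then scales with the number of 64-space chunks rather than the number of spaces.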
-
Wrap-around is intentional and generates false positives
-
In FUTURE make various QPDFObjectHandle methods const
-
Adjust fuzzer warning and memory limits
-
Fix QPDFOutlineDocumentHelper::resolveNamedDest (fixes #1238)
-
Throw damagedFile if max_warnings is exceeded. Change qpdf_fuzzer warnings limit to 500.
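A warning limit that aborts processing once exceeded could be modelled as in this sketch. `DamagedFile` and `WarningCollector` are hypothetical names used for illustration; qpdf's own damagedFile helper and warning storage differ.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception type standing in for a "damaged file" error.
struct DamagedFile : std::runtime_error {
    using std::runtime_error::runtime_error;
};

class WarningCollector {
  public:
    explicit WarningCollector(std::size_t max_warnings) : max_(max_warnings) {}

    // Records a warning and throws once more than max_ warnings have been seen.
    void warn(const std::string& msg) {
        warnings_.push_back(msg);
        if (warnings_.size() > max_) {
            throw DamagedFile("too many warnings: file is too badly damaged");
        }
    }

    std::size_t count() const { return warnings_.size(); }

  private:
    std::size_t max_;
    std::vector<std::string> warnings_;
};
```

With a limit of 500, the 500th warning is still accepted and the 501st aborts processing.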
-
Handle case where named destination is a dictionary with /D entry. Test case is hand-edited outlines-with-old-root-dests.pdf with modified object 107.
-
Run getAllPages as sanity check and throw an exception if too many warnings are generated or no pages are found.
-
Try a limit of 50MB. For very large limits processing time before damage is encountered may exceed oss-fuzz limits. Add further test cases.
-
If reconstruct_xref generates more than 1000 warnings, give up because the file is so severely damaged that there is very little point in continuing.
-
Reject non-dictionary Page and Pages objects. Also add additional qpdf_fuzzer test cases.
-
If throw_on_corrupt is set, use a custom implementation of libjpeg's emit_message procedure to throw an exception when the first corrupt data warning is encountered.
-
Check that xref table is not empty after recovery. Empty xref tables disable other sanity checks.
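Why an empty table disables other checks can be shown with a sketch. `XrefEntry`, `XrefTable`, and both functions are hypothetical simplifications. A per-entry check written with `std::all_of` is vacuously true for an empty table, so an explicit emptiness check after recovery is needed to reject the file.

```cpp
#include <algorithm>
#include <map>
#include <stdexcept>

// Hypothetical simplified xref entry and table keyed by object id.
struct XrefEntry {
    long long offset;
    int generation;
};
using XrefTable = std::map<int, XrefEntry>;

// A typical per-entry sanity check. Note: returns true for an empty table.
bool allOffsetsInRange(const XrefTable& xref, long long file_size) {
    return std::all_of(xref.begin(), xref.end(), [&](const auto& kv) {
        return kv.second.offset >= 0 && kv.second.offset < file_size;
    });
}

// Run after xref reconstruction so the vacuous case cannot slip through.
void checkRecoveredXref(const XrefTable& xref) {
    if (xref.empty()) {
        throw std::runtime_error("no objects found while recovering xref table");
    }
}
```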