-
... to remove the /Root /StructTreeRoot and /MarkInfo entries.
-
... containing objects with no white-space between them. To enforce the rule that objects end at the start offset of the next object, each object is parsed in its own object stream. To facilitate this, a new private-API input source, is::OffsetBuffer, has been added; it contains only the object but reports offsets relative to the start of the object stream. It is adapted from OffsetInputSource by reversing the direction of the offset, giving it its own BufferInputSource, and stripping out checks duplicated in BufferInputSource. Fixes the expected failure in the test case added in #1266.
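The offset-shifting idea can be sketched as a small stand-alone class. This is a hypothetical simplification (OffsetBufferSketch is an invented name; it is neither qpdf's is::OffsetBuffer nor an InputSource subclass): it holds only one object's bytes, adds the object's start offset to every position it reports, and refuses to read past the end of the object, so parsing cannot run into the next object.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>

// Hypothetical sketch, not qpdf's is::OffsetBuffer: holds the bytes of a single
// object but reports positions relative to the start of the object stream.
class OffsetBufferSketch
{
  public:
    OffsetBufferSketch(std::string object_bytes, std::int64_t global_offset) :
        data_(std::move(object_bytes)),
        global_offset_(global_offset)
    {
    }

    // Reported position = local position shifted by the object's start offset.
    std::int64_t tell() const
    {
        return global_offset_ + static_cast<std::int64_t>(pos_);
    }

    // Seek to an offset expressed relative to the object stream; fails if the
    // target lies outside this object's bytes.
    bool seek(std::int64_t offset)
    {
        std::int64_t local = offset - global_offset_;
        if (local < 0 || local > static_cast<std::int64_t>(data_.size())) {
            return false;
        }
        pos_ = static_cast<std::size_t>(local);
        return true;
    }

    // Read one character; returns false at the end of this object's bytes.
    bool read(char& ch)
    {
        if (pos_ >= data_.size()) {
            return false;
        }
        ch = data_[pos_++];
        return true;
    }

  private:
    std::string data_;
    std::int64_t global_offset_;
    std::size_t pos_{0};
};
```

Keeping the bytes in a private buffer, rather than wrapping the whole stream, is what makes "the object ends where the next one starts" enforceable: there is simply nothing after the end to read.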
-
This was due to the use of last_object_description, which is not set for the object stream itself. Also, modify the messages introduced in #1391 and #1392 to report the supposed offsets of the objects.
-
If startxref cannot be found in the last 1024 bytes, try finding it in the whole file and check whether it is valid.
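A minimal sketch of the two-step search, assuming the whole file is held in a std::string and that "valid" simply means the number after startxref is an offset inside the file (qpdf's real validity check may well be stricter). Both function names are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Parse the offset following the "startxref" keyword at `pos` and accept it
// only if it points inside the file (simplified validity check).
std::optional<std::size_t> offsetAfterStartxref(const std::string& file, std::size_t pos)
{
    std::size_t i = pos + 9; // length of "startxref"
    while (i < file.size() && std::isspace(static_cast<unsigned char>(file[i]))) {
        ++i;
    }
    const std::size_t start = i;
    std::size_t value = 0;
    while (i < file.size() && std::isdigit(static_cast<unsigned char>(file[i]))) {
        value = value * 10 + static_cast<std::size_t>(file[i] - '0');
        if (value > file.size()) {
            value = file.size(); // clamp; such offsets are rejected below
        }
        ++i;
    }
    if (i == start || value >= file.size()) {
        return std::nullopt;
    }
    return value;
}

// Look for startxref in the last `tail` bytes first; if it is not there, walk
// backwards through the whole file and accept the first valid occurrence.
std::optional<std::size_t> findStartxref(const std::string& file, std::size_t tail = 1024)
{
    const std::string key = "startxref";
    const std::size_t tail_start = file.size() > tail ? file.size() - tail : 0;
    std::size_t pos = file.rfind(key);
    if (pos == std::string::npos) {
        return std::nullopt; // no startxref anywhere
    }
    if (pos >= tail_start) {
        return offsetAfterStartxref(file, pos); // usual case
    }
    while (true) {
        if (auto offset = offsetAfterStartxref(file, pos)) {
            return offset;
        }
        if (pos == 0) {
            break;
        }
        pos = file.rfind(key, pos - 1);
        if (pos == std::string::npos) {
            break;
        }
    }
    return std::nullopt;
}
```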
-
Split reconstruction into three passes - scanning of input for objects and trailers, insertion of objects into the xref table, and loading the trailer. This allows insertion to take place in the usual reverse order and removes the need for a separate insertReconstructedXrefEntry method. It also allows trailers to be tried from most recent to oldest. Ignore any found trailers without a /Root entry.
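Passes 2 and 3 could be organised roughly as follows. The sketch assumes pass 1 (the scan) has already produced lists of objects and trailers in file order; all struct and function names are hypothetical, and the xref map is keyed by object number only, as a simplification. Inserting in reverse order without overwriting existing entries means the last definition of each object in the file wins, matching the usual newest-first processing of xref sections.

```cpp
#include <map>
#include <optional>
#include <vector>

// Hypothetical output of pass 1: everything found while scanning, in file order.
struct ScannedObject
{
    int id;
    int gen;
    long long offset;
};
struct ScannedTrailer
{
    long long offset;
    bool has_root;
};
struct ScanResult
{
    std::vector<ScannedObject> objects;
    std::vector<ScannedTrailer> trailers;
};

// Pass 2: insert newest-first; emplace never overwrites an existing entry, so
// the last definition of each object in the file is the one that is kept.
std::map<int, ScannedObject> insertObjects(const ScanResult& scan)
{
    std::map<int, ScannedObject> xref;
    for (auto it = scan.objects.rbegin(); it != scan.objects.rend(); ++it) {
        xref.emplace(it->id, *it);
    }
    return xref;
}

// Pass 3: try trailers from most recent to oldest, ignoring any without /Root.
std::optional<ScannedTrailer> pickTrailer(const ScanResult& scan)
{
    for (auto it = scan.trailers.rbegin(); it != scan.trailers.rend(); ++it) {
        if (it->has_root) {
            return *it;
        }
    }
    return std::nullopt;
}
```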
-
The original test file contains multiple entries with id 0 and offset 0. One entry has been modified so that its id is valid (6). Object streams with invalid offsets are a source of unreproducible oss-fuzz time-outs.
-
Also add debugging information so we can save time if the value of $^O used in GitHub Actions changes again.
-
Reduce the container size for which a single bad token will cause a failure from 100,000 to 5,000. Count missing dictionary keys as errors.
-
This requires a special build option.
-
When recovering XRef streams, start with the stream with the largest /Size rather than the largest offset. Also, if reconstruction fails to find a trailer with a valid /Root entry, search for a root object.
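The new selection rule can be expressed as a hypothetical sketch: candidate xref streams are compared by /Size first, and the offset is used only as a tie-breaker. The tie-breaking rule and the names are assumptions; the commit only states that the largest /Size now takes precedence over the largest offset.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Hypothetical record for an xref stream found during recovery.
struct XRefStreamCandidate
{
    long long offset;
    int size; // value of the stream dictionary's /Size entry
};

// Prefer the largest /Size; among equal sizes, prefer the later offset (assumed).
std::optional<XRefStreamCandidate> pickXRefStream(const std::vector<XRefStreamCandidate>& candidates)
{
    if (candidates.empty()) {
        return std::nullopt;
    }
    return *std::max_element(
        candidates.begin(),
        candidates.end(),
        [](const XRefStreamCandidate& a, const XRefStreamCandidate& b) {
            return a.size != b.size ? a.size < b.size : a.offset < b.offset;
        });
}
```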
-
Change QPDFWriter stream_decode_level default to qpdf_dl_generalized (fixes #1286)
-
Exercise stream containing objects with no white-space between them.
-
Also, fix the disabling of preserve_encryption so that it ignores stream_decode_level, but disable preserve_encryption if compress_streams is false. Fixes #1286.
-
without filtering
-
Test fixing /P entry.
-
Fix QPDF::getAllPagesInternal warning
-
Provide correct obj_gen and offset.
-
Provide correct obj_gen.
-
This reverts commit ff2a78f579ebdd06b417e34260a17dba06e71137, reversing changes made to 8f54319f7a6514110f4b05cbbf1cb1c9fc8cb6a0.
-
This reverts commit 0e92cf6bf399249c603c3d0212e898fd29e71fcd, reversing changes made to 7d34b89a69e8e89c098dd373442f7df809c28eff.
-
Ghostscript 10.0.2 failed to handle the files changed in this commit, but ghostscript 10.0.4 handles them fine, as do earlier versions. These files all have a hybrid xref: a file with an xref table, appended with a section that has an xref stream. They all have /PageLabels pointing to 107 0 R in the original file, where 107 is higher than the highest object number. The spec says that such a reference should be treated as null, which results in /PageLabels being null, which in turn causes errors in that version of ghostscript. While ghostscript 10.0.2 may be handling the file incorrectly, the file does something that's not really kosher, and it's easier to fix the files, which had not been changed since the very first open source release of qpdf, than to try to work around the issue. This was discovered when the GitHub Actions runner was bumped to Ubuntu 24.04, which contains the buggy version of ghostscript. I was not able to find a specific ghostscript issue that addressed this, but the problem went away in either 10.0.3 or 10.0.4. Commenting out /PageLabels without changing offsets was a pragmatic move to avoid having to regenerate the xref tables manually. I just had to manually edit the binary xref stream to change the offset of one item (the new object 1), which I put at the end to avoid breaking other things.
-
Add new commands --remove-metadata and --remove-info
-
Optimistically read subsection headers without reading individual object entries, assuming that each entry is 20 bytes long as per the PDF spec. If problems are encountered, fall back to calling bad_subsections.
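Because every cross-reference table entry is exactly 20 bytes, the next subsection header can be located by arithmetic instead of by parsing entries. In this hypothetical sketch any surprise makes the scan return std::nullopt, at which point a caller would fall back to the slower path (bad_subsections in qpdf). The function names, the Subsection struct, and the simplified handling of whitespace at the end of header lines are assumptions; `pos` must not exceed the string length.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical description of one subsection: "first count" plus entry start.
struct Subsection
{
    long long first;
    long long count;
    std::size_t entries_offset;
};

// Parse an unsigned integer at pos, advancing pos; nullopt if none (or huge).
std::optional<long long> readInt(const std::string& s, std::size_t& pos)
{
    const std::size_t start = pos;
    long long value = 0;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
        if (value > 1000000000LL) {
            return std::nullopt; // implausibly large; treat as a problem
        }
        value = value * 10 + (s[pos] - '0');
        ++pos;
    }
    if (pos == start) {
        return std::nullopt;
    }
    return value;
}

// Locate all subsection headers up to "trailer", assuming 20-byte entries.
std::optional<std::vector<Subsection>> scanSubsections(const std::string& s, std::size_t pos)
{
    std::vector<Subsection> result;
    while (true) {
        if (s.compare(pos, 7, "trailer") == 0) {
            return result;
        }
        auto first = readInt(s, pos);
        if (!first || pos >= s.size() || s[pos] != ' ') {
            return std::nullopt; // fall back to the slow path
        }
        ++pos;
        auto count = readInt(s, pos);
        if (!count) {
            return std::nullopt;
        }
        // Skip the end of the header line (space, CR and/or LF).
        while (pos < s.size() && (s[pos] == ' ' || s[pos] == '\r' || s[pos] == '\n')) {
            ++pos;
        }
        result.push_back({*first, *count, pos});
        // Jump over the entries without reading them.
        const std::size_t skip = static_cast<std::size_t>(*count) * 20;
        if (skip > s.size() - pos) {
            return std::nullopt;
        }
        pos += skip;
    }
}
```

A single malformed entry (for example one that is 19 bytes long) shifts every later computed position, so the next "header" fails to parse and the caller falls back, which is the behaviour the commit describes.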
-
Temporarily disable 3 specific-bugs tests. Remove 'xref size mismatch' test.
-
Split reconstruction into two passes - scanning of input for objects and insertion of objects into the xref table. This allows insertion to take place in the usual reverse order and removes the need for a separate insert_reconstructed method.
-
Calculate all subsections before reading subsection entries. Duplicates some warnings for the time being.
-
Also, when recovering trailer from xref streams, pick the last valid trailer encountered rather than the first.
-
Change first xref stream dictionary to point to an invalid root in order to detect failure to recover the last valid trailer.
-
Ensure that the recovered stream end is not part of a different object. Test file is bad24.pdf with stream 4 'endstream' corrupted.
-
Create unresolved objects only for objects in the xref table (except during parsing of the xref table). Do not add indirect nulls into the object cache as the result of a cache miss during a call to getObject except during parsing or creation/updating from JSON. To support this behaviour, add new private methods getObjectForParser and getObjectForJSON. As a result of this change, dangling references are treated as direct nulls rather than indirect nulls.
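The difference between ordinary lookups and parser lookups can be modelled with a toy cache. Everything here is hypothetical: lookup and lookupForParser only loosely mirror the roles of getObject and getObjectForParser and are not qpdf's signatures. Only xref-table objects get unresolved placeholders, an ordinary cache miss returns a fresh direct null without touching the cache, and only the parser-style lookup inserts an indirect null.

```cpp
#include <map>
#include <memory>
#include <set>

// Hypothetical object states for the sketch.
enum class Kind { Unresolved, Null };

struct ObjectSketch
{
    Kind kind;
};

class ObjectCacheSketch
{
  public:
    // Unresolved placeholders are created only for objects in the xref table.
    explicit ObjectCacheSketch(const std::set<int>& xref_ids)
    {
        for (int id : xref_ids) {
            cache_[id] = std::make_shared<ObjectSketch>(ObjectSketch{Kind::Unresolved});
        }
    }

    // Ordinary lookup: a miss yields a new direct null; the cache is unchanged.
    std::shared_ptr<ObjectSketch> lookup(int id) const
    {
        auto it = cache_.find(id);
        if (it != cache_.end()) {
            return it->second;
        }
        return std::make_shared<ObjectSketch>(ObjectSketch{Kind::Null});
    }

    // Parser lookup: a miss inserts an indirect null that later references share.
    std::shared_ptr<ObjectSketch> lookupForParser(int id)
    {
        auto& slot = cache_[id];
        if (!slot) {
            slot = std::make_shared<ObjectSketch>(ObjectSketch{Kind::Null});
        }
        return slot;
    }

    bool isCached(int id) const { return cache_.count(id) != 0; }

  private:
    std::map<int, std::shared_ptr<ObjectSketch>> cache_;
};
```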
-
Handle case where named destination is a dictionary with /D entry. Test case is hand-edited outlines-with-old-root-dests.pdf with modified object 107.
-
If reconstruct_xref generates more than 1000 warnings, give up, because the file is so severely damaged that there is very little point continuing.
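A minimal sketch of the give-up rule, assuming all warnings issued during reconstruction pass through a single counter. The class name, the exception type, and the message are hypothetical; the limit of 1000 is taken from the commit message.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical warning collector used while reconstructing an xref table.
class ReconstructionWarnings
{
  public:
    explicit ReconstructionWarnings(std::size_t limit = 1000) :
        limit_(limit)
    {
    }

    // Record a warning; once the count exceeds the limit, stop reconstruction.
    void warn(std::string message)
    {
        messages_.push_back(std::move(message));
        if (messages_.size() > limit_) {
            throw std::runtime_error("too many warnings during xref reconstruction");
        }
    }

    std::size_t count() const { return messages_.size(); }

  private:
    std::size_t limit_;
    std::vector<std::string> messages_;
};
```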
-
Check that xref table is not empty after recovery. Empty xref tables disable other sanity checks.
-
Previous test case was lost in #1221. Test file was created from object-stream.pdf by adding a reference to itself into object stream 1 0.
-
Ensure objects with impossibly large ids are ignored.