-
This reverts commit ff2a78f579ebdd06b417e34260a17dba06e71137, reversing changes made to 8f54319f7a6514110f4b05cbbf1cb1c9fc8cb6a0.
-
This reverts commit 0e92cf6bf399249c603c3d0212e898fd29e71fcd, reversing changes made to 7d34b89a69e8e89c098dd373442f7df809c28eff.
-
Ghostscript 10.0.2 failed to handle the files changed in this commit, but ghostscript 10.0.4 handles them fine, as do earlier versions. These files all use a hybrid xref: an xref table in the original file, followed by an appended section that has an xref stream. In the original file, /PageLabels points to 107 0 R, but 107 is higher than the highest object number in the file. The spec says that a reference to a nonexistent object should be treated as null, so this amounts to /PageLabels null, which causes errors in that version of ghostscript. While ghostscript 10.0.2 may be handling the file incorrectly, the file does something that's not really kosher. It's easier to fix the files, which had not been changed since the very first open source release of qpdf, than to try to work around the issue. This was discovered when the GitHub Actions runner was bumped to Ubuntu 24.04, which ships the buggy version of ghostscript. I was not able to find a specific ghostscript issue that addressed this, but the problem went away in either 10.0.3 or 10.0.4. Commenting out /PageLabels without changing offsets was a pragmatic move to avoid having to regenerate the xref tables manually. I just had to manually edit the binary xref stream to change the offset of one item (the new object 1), which I put at the end to avoid breaking other things.
-
Why did this ever work? Hard to say. Perhaps a library we linked against was setting `int _dowildcard = -1;` somewhere and no longer is. Apparently, linking with CRT_glob.o has been the way to enable wildcard expansion of command-line arguments for a very long time, and we've just been lucky that it worked all this time.
-
Add new command-line options --remove-metadata and --remove-info
-
Optimistically read subsection headers without reading individual object entries, assuming that each entry is 20 bytes long as per the PDF spec. If problems are encountered, fall back to calling bad_subsections.
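The optimistic approach can be sketched as follows. This is a simplified illustration, not qpdf's code: the names find_subsections, read_uint, and Subsection are hypothetical, and the input is assumed to be the raw bytes of a classic xref table held in a std::string. Each header is located by jumping over count * 20 bytes of entries without parsing them. If anything fails to line up, the function returns nullopt so that a caller could fall back to a slower path such as bad_subsections.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// One xref subsection: a "first count" header followed by count 20-byte entries.
struct Subsection {
    long long first;             // first object number in the subsection
    long long count;             // number of entries in the subsection
    std::size_t entries_offset;  // offset of the first 20-byte entry
};

// Reads an unsigned decimal integer at pos and advances pos past it.
// Returns nullopt if there is no digit or the number is implausibly long.
static std::optional<long long> read_uint(const std::string& s, std::size_t& pos)
{
    std::size_t start = pos;
    long long value = 0;
    while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
        if (pos - start >= 18) {
            return std::nullopt;
        }
        value = value * 10 + (s[pos] - '0');
        ++pos;
    }
    if (pos == start) {
        return std::nullopt;
    }
    return value;
}

// Starting just after the "xref" keyword, finds every subsection header by
// skipping entries instead of reading them. Returns nullopt on any mismatch
// so the caller can fall back to an entry-by-entry parse.
std::optional<std::vector<Subsection>> find_subsections(const std::string& s, std::size_t pos)
{
    std::vector<Subsection> result;
    while (true) {
        while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) {
            ++pos;
        }
        if (s.compare(pos, 7, "trailer") == 0) {
            return result;  // reached the end of the xref table
        }
        auto first = read_uint(s, pos);
        if (!first || pos >= s.size() || s[pos] != ' ') {
            return std::nullopt;
        }
        ++pos;
        auto count = read_uint(s, pos);
        if (!count) {
            return std::nullopt;
        }
        while (pos < s.size() && s[pos] == ' ') {
            ++pos;  // tolerate trailing spaces after the header
        }
        std::size_t eol_start = pos;
        if (pos < s.size() && s[pos] == '\r') {
            ++pos;
        }
        if (pos < s.size() && s[pos] == '\n') {
            ++pos;
        }
        if (pos == eol_start) {
            return std::nullopt;  // the header must end with an end-of-line marker
        }
        // Reject counts whose entries would run past the end of the buffer.
        if (static_cast<unsigned long long>(*count) > (s.size() - pos) / 20) {
            return std::nullopt;
        }
        result.push_back({*first, *count, pos});
        pos += static_cast<std::size_t>(*count) * 20;
    }
}
```

Any deviation, for example 19-byte entries, shows up as a header that fails to parse where the next one was expected. That is what makes the fast path safe to try first.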
-
Temporarily disable 3 specific-bugs tests. Remove 'xref size mismatch' test.
-
Split reconstruction into two passes: scanning of input for objects, and insertion of objects into the xref table. This allows insertion to take place in the usual reverse order and removes the need for a separate insert_reconstructed method.
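The two-pass idea can be sketched like this. It is a simplified illustration under stated assumptions, not qpdf's implementation: scan_for_objects, build_xref, and FoundObject are hypothetical names, the scan uses a line-based regular expression rather than a real PDF tokenizer, and the xref is keyed by object number only. Insertion uses "first insertion wins" semantics via std::map::emplace. Walking the scan results backwards therefore lets the last definition in the file take precedence, presumably the same rule that applies when xref sections are read from the newest update backwards.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <regex>
#include <string>
#include <vector>

// An object header found while scanning a damaged file.
struct FoundObject {
    int id;
    int gen;
    std::size_t offset;  // offset of the object number in the input
};

// Pass 1: scan the input line by line for "<id> <gen> obj" headers and
// record them in file order. Nothing is inserted into the xref yet.
std::vector<FoundObject> scan_for_objects(const std::string& data)
{
    // Digit limits keep std::stoi in range; headers with longer numbers are skipped.
    static const std::regex header(R"(^\s*(\d{1,9})\s+(\d{1,5})\s+obj\b)");
    std::vector<FoundObject> found;
    std::size_t line_start = 0;
    while (line_start < data.size()) {
        std::size_t line_end = data.find_first_of("\r\n", line_start);
        if (line_end == std::string::npos) {
            line_end = data.size();
        }
        const std::string line = data.substr(line_start, line_end - line_start);
        std::smatch m;
        if (std::regex_search(line, m, header)) {
            found.push_back({std::stoi(m[1].str()),
                             std::stoi(m[2].str()),
                             line_start + static_cast<std::size_t>(m.position(1))});
        }
        line_start = line_end + 1;
    }
    return found;
}

// Pass 2: insert in reverse file order. emplace never overwrites an existing
// key, so the first insertion wins; here that is the last definition in the file.
std::map<int, FoundObject> build_xref(const std::vector<FoundObject>& found)
{
    std::map<int, FoundObject> xref;
    for (auto it = found.rbegin(); it != found.rend(); ++it) {
        xref.emplace(it->id, *it);
    }
    return xref;
}
```

Because pass 2 uses the same "don't overwrite" rule as ordinary insertion, no separate insertion routine is needed for reconstructed objects.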
-
Calculate all subsections before reading subsection entries. Duplicates some warnings for the time being.
-
Also, when recovering trailer from xref streams, pick the last valid trailer encountered rather than the first.
-
Change first xref stream dictionary to point to an invalid root in order to detect failure to recover the last valid trailer.
-
Ensure that the recovered stream end is not part of a different object. The test file is bad24.pdf with the 'endstream' of stream 4 corrupted.
-
Create unresolved objects only for objects in the xref table (except during parsing of the xref table). Do not add indirect nulls into the object cache as the result of a cache miss during a call to getObject, except during parsing or creation/updating from JSON. To support this behaviour, add new private methods getObjectForParser and getObjectForJSON. As a result of this change, dangling references are treated as direct nulls rather than indirect nulls.
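The difference between the two lookup paths can be illustrated with a toy object table. This is a hypothetical sketch, not qpdf's QPDF class. ObjectTable, Object, and cacheSize are invented for illustration, and the method names only mirror the ones in the commit message. A normal lookup caches an unresolved object only when the id is in the xref and returns an uncached direct null otherwise. The parser lookup is assumed to always cache an indirect placeholder, since the parser needs one. A getObjectForJSON variant would presumably follow the parser's behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <utility>

// Hypothetical, heavily simplified stand-in for a PDF object handle.
struct Object {
    enum class Kind { Null, Unresolved };
    Kind kind;
    bool indirect;  // true for an indirect object, false for a direct value
    int id;         // object number; 0 for direct objects
};

// Hypothetical object table: xref_ holds the ids listed in the xref table,
// cache_ holds the objects handed out so far.
class ObjectTable
{
  public:
    explicit ObjectTable(std::set<int> xref_ids) :
        xref_(std::move(xref_ids))
    {
    }

    // Normal lookup: only ids present in the xref get a cached, unresolved
    // object. A dangling reference becomes a direct null and is not cached.
    Object getObject(int id)
    {
        if (auto it = cache_.find(id); it != cache_.end()) {
            return it->second;
        }
        if (xref_.count(id) != 0) {
            return cache_.emplace(id, Object{Object::Kind::Unresolved, true, id}).first->second;
        }
        return Object{Object::Kind::Null, false, 0};
    }

    // Parser lookup: a cache miss always records an indirect placeholder,
    // which is an indirect null if the id is missing from the xref.
    Object getObjectForParser(int id)
    {
        if (auto it = cache_.find(id); it != cache_.end()) {
            return it->second;
        }
        auto kind = xref_.count(id) != 0 ? Object::Kind::Unresolved : Object::Kind::Null;
        return cache_.emplace(id, Object{kind, true, id}).first->second;
    }

    std::size_t cacheSize() const
    {
        return cache_.size();
    }

  private:
    std::set<int> xref_;
    std::map<int, Object> cache_;
};
```

For example, with an xref containing objects 1 and 2, getObject(107) yields a direct null and leaves the cache unchanged, while getObjectForParser(107) caches an indirect null.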
-
Handle case where named destination is a dictionary with /D entry. Test case is hand-edited outlines-with-old-root-dests.pdf with modified object 107.
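In the PDF spec, the value of a named destination is either the destination array itself or a dictionary whose /D entry holds that array. The sketch below shows the dispatch using a deliberately tiny, hypothetical object model. Array, Dict, DestValue, and explicit_destination are illustrative names, not qpdf API. Real dictionaries can hold values other than arrays, which this model ignores.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical minimal model: a destination array is a list of tokens such
// as {"3 0 R", "/Fit"}, and a dictionary maps keys to destination arrays.
using Array = std::vector<std::string>;
using Dict = std::map<std::string, Array>;
using DestValue = std::variant<Array, Dict>;

// Returns the explicit destination array, accepting either form.
std::optional<Array> explicit_destination(const DestValue& value)
{
    if (const auto* arr = std::get_if<Array>(&value)) {
        return *arr;  // the value is already a destination array
    }
    const auto& dict = std::get<Dict>(value);
    if (auto it = dict.find("/D"); it != dict.end()) {
        return it->second;  // dictionary form: the array lives under /D
    }
    return std::nullopt;  // dictionary without /D: no usable destination
}
```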
-
If reconstruct_xref generates more than 1000 warnings, give up, because the file is so severely damaged that there is very little point in continuing.
-
Check that xref table is not empty after recovery. Empty xref tables disable other sanity checks.
-
Previous test case was lost in #1221. The test file was created from object-stream.pdf by adding to object stream 1 0 a reference to itself.
-
Ensure objects with impossibly large ids are ignored.
-
Also add new fuzz test case.
-
A file that has Widget annotations that can't be mapped back to form fields would cause qpdf to crash when generating JSON.
-
Code failed to allow for QPDF::getCompressibleObjSet deleting objects from the object cache in case of multiple entries for the same object id. Add fuzz test case 68668.
-
Also, test writing JSON v1 files and files with deeply nested containers.
-
The code accepted values other than /Yes but still used /Yes as the checked value instead of obeying the normal appearance dictionary.
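The correct behaviour is to take the "on" state from the check box's normal appearance (/AP /N) subdictionary. That subdictionary has an /Off key and one other key naming the on state, which is often /Yes but need not be. The sketch below models the subdictionary as a std::map from state names to stream references. AppearanceDict, checked_value, and state_for are hypothetical helpers, not qpdf API. The /Yes fallback when no on state exists is an assumption of this sketch.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical model of a check box's normal appearance subdictionary
// (/AP /N): appearance state names mapped to their stream references.
using AppearanceDict = std::map<std::string, std::string>;

// The on state is whichever key is not /Off; it need not be /Yes.
std::optional<std::string> checked_value(const AppearanceDict& normal)
{
    for (const auto& entry : normal) {
        if (entry.first != "/Off") {
            return entry.first;
        }
    }
    return std::nullopt;
}

// Name to store in /V and /AS for the requested state.
std::string state_for(const AppearanceDict& normal, bool checked)
{
    if (!checked) {
        return "/Off";
    }
    // Falling back to /Yes when no on state exists is an assumption of this sketch.
    return checked_value(normal).value_or("/Yes");
}
```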