-
Do not include the trailing EI, and handle cases where EI is not preceded by a delimiter. Such cases have been seen in the wild.
-
Add a version of expectInlineImage that takes an input source and searches for EI. This is in preparation for improving the way EI is found. This commit just refactors the code without changing the functionality and adds tests to make sure the old and new code behave identically.
-
The inline image token erroneously included the delimiter that followed EI. The ObjectHandle created from it was correct.
-
Add getAttribute for handling inheritable page attributes, and fix getPageImages and annotation flattening code to use it.
-
Instead of directly putting the contents of the annotation appearance streams into the page's content stream, add commands to render the form xobjects directly. This is a more robust way to do it than the original solution as it works properly with patterns and avoids problems with resource name clashes between the pages and the form xobjects.
-
Wrap an object in an array if it is not already an array.
-
If parsing content streams is treated as a warning, there is no way for a caller to know if a parsing operation has failed. This is very dangerous and will likely result in data loss when token filters are parser callbacks are in use.
-
It's not really a shallow copy. It just doesn't cross indirect object boundaries. The old implementation had a bug that would cause multiple shallow copies of the same object to share memory, which was not the intention.
-
Provide a convenient way of accessing rectangles.
-
This fixes CVE-2018-9918.
-
Remove calls to assertPageObject(). All cases in the library that called assertPageObject() work fine if you don't call assertPageObject() because nothing assumes anything that was being checked by that call. Removing the calls enables more files to be successfully processed.
-
Give objects descriptions and context so it is possible to issue warnings instead of fatal errors for attempts to access objects of the wrong type.
-
As in other cases, this is to enable adding new member variables in the future without breaking ABI compatibility.
-
Expose Pl_QPDFTokenizer, and have it do more of the work of managing the token filter's pipeline.
-
Implement a TokenFilter class and refactor Pl_QPDFTokenizer to use a TokenFilter class called ContentNormalizer. Pl_QPDFTokenizer is now a general filter that passes data through a TokenFilter.
-
Remove a redundant method that was equal to another one with additional arguments. This breaks binary compatibility, but there are other ABI breaking changes in the upcoming release, so now is the time to do it.
-
Tweak the message so that we inform the user that we are mitigating data loss.
-
This commit adds several API methods that enable control over which types of filters QPDF will attempt to decode. It also adds support for /RunLengthDecode and /DCTDecode filters for both encoding and decoding.
-
When parsing content streams, allow content to be split arbitrarily across stream boundaries.
-
When requested, QPDFWriter will do more aggress prechecking of streams to make sure it can actually succeed in decoding them before attempting to do so. This will allow preservation of raw data even when the raw data is corrupted relative to the specified filters.
-
QPDFObjectHandle::parseInternal now issues warnings instead of throwing exceptions for all error conditions that it finds (except internal logic errors) and has stronger recovery for things like invalid tokens and malformed dictionaries. This should improve qpdf's ability to recover from a wide range of broken files that currently cause it to fail.
-
This is CVE-2017-9208. The QPDF library uses object ID 0 internally as a sentinel to represent a direct object, but prior to this fix, was not blocking handling of 0 0 obj or 0 0 R as a special case. Creating an object in the file with 0 0 obj could cause various infinite loops. The PDF spec doesn't allow for object 0. Having qpdf handle object 0 might be a better fix, but changing all the places in the code that assumes objid == 0 means direct would be risky.
-
This is CVE-2017-9210. The description string for an error message included unparsing an object, which is too complex of a thing to try to do while throwing an exception. There was only one example of this in the entire codebase, so it is not a pervasive problem. Fixing this eliminated one class of infinite loop errors.
-
When checking two objects preceding R while parsing, ensure that the objects are direct. This avoids stuff like 1 0 obj containing 1 0 R 0 R from causing an infinite loop in object resolution.
-
The spec allows /Contents to be omitted for pages that are blank, but QPDFObjectHandle::getPageContents() was throwing an exception in this case.
-
For std::string and std::vector, replace operator[] with at. This was done using an automated process. See README.hardening for details.