-
Some files in the test suite trigger antivirus warnings. These are not infected files with malicious intent. They are test files to ensure that qpdf does not crash when it encounters the files. This change enables those files to be obfuscated in the source repository so that checking out qpdf from version control or extracting the source code doesn't trigger antivirus warnings.
-
CR, CRLF, and LF are all supposed to be treated as LF; only one EOL is to be ignored after backslash.
-
Turns out you can keep adding zero to a number over and over again and it just doesn't get any bigger. Who would have known?
-
If we are unable to filter a page's content streams, don't attempt to remove objects from the page's resource dictionary. Also provide a command line option to suppress resource removal in case we ever need this as a workaround for some bug or broken PDF files.
-
If parsing content streams is treated as a warning, there is no way for a caller to know if a parsing operation has failed. This is very dangerous and will likely result in data loss when token filters are parser callbacks are in use.
-
It's not really a shallow copy. It just doesn't cross indirect object boundaries. The old implementation had a bug that would cause multiple shallow copies of the same object to share memory, which was not the intention.
-
This fixes CVE-2018-9918.
-
Remove calls to assertPageObject(). All cases in the library that called assertPageObject() work fine if you don't call assertPageObject() because nothing assumes anything that was being checked by that call. Removing the calls enables more files to be successfully processed.
-
Prior to this fix, if there was a loop detected in following /Prev pointers in xref streams/tables, it would cause qpdf to lose data. Note that this condition causes many PDF readers to hang or fail.
-
The QPDF_String::getUTF8Val() method was not treating strings that weren't explicitly Unicode as PDF Doc Encoded. This only affects characters in the range 0x80 through 0xa0.
-
Too many test cases were "miscellaneous".
-
Give objects descriptions and context so it is possible to issue warnings instead of fatal errors for attempts to access objects of the wrong type.
-
Sometimes it's an offset in an object stream or a content stream, so file position is confusing in some cases.
-
Implement a TokenFilter class and refactor Pl_QPDFTokenizer to use a TokenFilter class called ContentNormalizer. Pl_QPDFTokenizer is now a general filter that passes data through a TokenFilter.
-
Adding a trailing newline in content normalization damages files whose contents are split across streams in the middle of tokens. Let QPDFWriter add the newline with the indicator to ignore the newline, which it already does. This changes the way some qdf files look.
-
Significant enhancements to the lexer to improve EOF handling and to support comments and spaces as tokens. Various other minor issues were fixed as well.
-
This tokenizes outer parts of the file, page content streams, and object streams. It is for exercising the tokenizer in isolation and is being introduced before reworking the lexical layer of qpdf.
-
This is useful only for debugging the linearization code.