Commit 3d0de5b92471a280bc8baf467159561d79428ebc

Authored by Jay Berkenbilt
1 parent 98174373

Fixes to ChangeLog and manual for 10.0.0 changes

ChangeLog
... ... @@ -8,6 +8,12 @@
8 8 recovery when objects are copied from other files and when
9 9 "immediate copy from" is enabled.
10 10  
  11 + * When copying foreign streams with immediateCopyFrom set, the
  12 + same type of recovery from streams with filtering errors is
  13 + performed as when dealing with streams in the original input. This
  14 + could happen, for example, if you are using the --pages option to
  15 + take pages from another file and that file has errors in it.
  16 +
11 17 * Add a new version of QPDFObjectHandle::pipeStreamData whose
12 18 return value indicates overall success or failure rather than
13 19 whether or not filtering was attempted. It should have always
... ... @@ -36,6 +42,12 @@
36 42 --preserve-unreferenced-resources is now a synonym for
37 43 --remove-unreferenced-resources=no.
38 44  
  45 + * Use std::atomic for unique ID generation internally within the
  46 + library. This eliminates the already extremely low chance of a
  47 + collision, improves thread safety, and removes a dependency on a
  48 + random number generator. Thanks to Dean Scarff for the
  49 + contribution.
  50 +
39 51 2020-04-03 Jay Berkenbilt <ejb@ql.org>
40 52  
41 53 * Allow qpdf to be built on systems without wchar_t. All "normal"
... ... @@ -50,6 +62,10 @@
50 62 maximally fill the destination rectangle. Prior to this change,
51 63 placeFormXObject might shrink it but would never expand it.
52 64  
  65 + * When calling the C API, accept any non-zero value as TRUE rather
  66 + than just 1. This appears to resolve issues on Windows when
  67 + calling some versions of the DLL directly from other languages.
  68 +
53 69 2020-04-02 Jay Berkenbilt <ejb@ql.org>
54 70  
55 71 * Add method QPDFObjectHandle::unsafeShallowCopy for copying only
... ...
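The ChangeLog entry above on std::atomic describes replacing random-number-based unique IDs with an atomic counter. The following is only a minimal sketch of that general technique, not the code from this commit or from the contribution it credits. The function name `next_unique_id` and the choice of `unsigned long long` are assumptions made for illustration.

```cpp
#include <atomic>

// Hypothetical helper (not qpdf's actual code): hands out IDs from a
// process-wide counter instead of drawing them from a random number
// generator.
unsigned long long next_unique_id()
{
    // A function-local static is initialized exactly once, even when
    // several threads call this function at the same time.
    static std::atomic<unsigned long long> counter{0};
    // fetch_add atomically increments the counter and returns the
    // previous value, so no two callers can observe the same result.
    // Adding 1 makes the first ID 1 rather than 0.
    return counter.fetch_add(1) + 1;
}
```

Because each call receives a distinct counter value, two IDs cannot collide until the counter wraps around, and no lock or random source is involved.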
manual/qpdf-manual.xml
... ... @@ -1945,20 +1945,50 @@ outfile.pdf&lt;/option&gt;
1945 1945 </listitem>
1946 1946 </varlistentry>
1947 1947 <varlistentry>
  1948 + <term><option>--remove-unreferenced-resources=<replaceable>option</replaceable></option></term>
  1949 + <listitem>
  1950 + <para>
  1951 + The <replaceable>option</replaceable> may be
  1952 + <literal>auto</literal>, <literal>yes</literal>, or
  1953 + <literal>no</literal>. The default is <literal>auto</literal>.
  1954 + </para>
  1955 + <para>
  1956 + Starting with qpdf 8.1, when splitting pages, qpdf is able to
  1957 + attempt to remove images and fonts that are not used by a page
  1958 + even if they are referenced in the page's resources
  1959 + dictionary. When shared resources are in use, this behavior
  1960 + can greatly reduce the file sizes of split pages, but the
  1961 + analysis is very slow. In versions from 8.1 through 9.1.1,
  1962 + qpdf did this analysis by default. Starting in qpdf 10.0.0, if
  1963 + <literal>auto</literal> is used, qpdf does a quick analysis of
  1964 + the file to determine whether the file is likely to have
  1965 + unreferenced objects on pages, a pattern that frequently
  1966 + occurs when resource dictionaries are shared across multiple
  1967 + pages and rarely occurs otherwise. If it discovers this
  1968 + pattern, then it will attempt to remove unreferenced
  1969 + resources. Usually this means you get the slower splitting
  1970 + speed only when it's actually going to create smaller files.
  1971 + You can suppress removal of unreferenced resources altogether
  1972 + by specifying <literal>no</literal> or force it to do the full
  1973 + algorithm by specifying <literal>yes</literal>.
  1974 + </para>
  1975 + <para>
  1976 + Other than cases in which you don't care about file size and
  1977 + care a lot about runtime, there are few reasons to use this
  1978 + option, especially now that <literal>auto</literal> mode is
  1979 + supported. One reason to use this is if you suspect that qpdf
  1980 + is removing resources it shouldn't be removing. If you
  1981 + encounter that case, please report it as a bug at <ulink
  1982 + url="https://github.com/qpdf/qpdf/issues/">https://github.com/qpdf/qpdf/issues/</ulink>.
  1983 + </para>
  1984 + </listitem>
  1985 + </varlistentry>
  1986 + <varlistentry>
1948 1987 <term><option>--preserve-unreferenced-resources</option></term>
1949 1988 <listitem>
1950 1989 <para>
1951   - Starting with qpdf 8.1, when splitting pages, qpdf ordinarily
1952   - attempts to remove images and fonts that are not used by a
1953   - page even if they are referenced in the page's resources
1954   - dictionary. This option suppresses that behavior. There are
1955   - few reasons to use this option. One reason to use this is if
1956   - you suspect that qpdf is removing resources it shouldn't be
1957   - removing. If you encounter that case, please report it as a
1958   - bug. Another reason is that the new behavior can be much
1959   - slower for files that include a very large number of images or
1960   - other XObjects on a page. In that case, using this option will
1961   - return qpdf to the old behavior and speed.
  1990 + This is a synonym for
  1991 + <option>--remove-unreferenced-resources=no</option>.
1962 1992 </para>
1963 1993 <para>
1964 1994 See also <option>--preserve-unreferenced</option>, which does
... ... @@ -4700,6 +4730,239 @@ print &quot;\n&quot;;
4700 4730 <filename>ChangeLog</filename> in the source distribution.
4701 4731 </para>
4702 4732 <variablelist>
  4733 +<!--
  4734 + <varlistentry>
  4735 + <term>x.y.z: Month dd, YYYY</term>
  4736 + <listitem>
  4737 + <itemizedlist>
  4738 + <listitem>
  4739 + <para>
  4740 + Category
  4741 + </para>
  4742 + <itemizedlist>
  4743 + <listitem>
  4744 + <para>
  4745 + Item
  4746 + </para>
  4747 + </listitem>
  4748 + <listitem>
  4749 + <para>
  4750 + Item
  4751 + </para>
  4752 + </listitem>
  4753 + </itemizedlist>
  4754 + </listitem>
  4755 + <listitem>
  4756 + <para>
  4757 + Category
  4758 + </para>
  4759 + <itemizedlist>
  4760 + <listitem>
  4761 + <para>
  4762 + Item
  4763 + </para>
  4764 + </listitem>
  4765 + <listitem>
  4766 + <para>
  4767 + Item
  4768 + </para>
  4769 + </listitem>
  4770 + </itemizedlist>
  4771 + </listitem>
  4772 + </itemizedlist>
  4773 + </listitem>
  4774 + </varlistentry>
  4775 +-->
  4776 + <varlistentry>
  4777 + <term>10.0.0: April 6, 2020</term>
  4778 + <listitem>
  4779 + <itemizedlist>
  4780 + <listitem>
  4781 + <para>
  4782 + Performance Enhancements
  4783 + </para>
  4784 + <itemizedlist>
  4785 + <listitem>
  4786 + <para>
  4787 + The qpdf library and executable should run much faster in
  4788 + this version than in the last several releases. Several
  4789 + internal library optimizations have been made, and there has
  4790 + been improved behavior on page splitting as well. This
  4791 + version of qpdf should outperform any of the 8.x or 9.x
  4792 + versions.
  4793 + </para>
  4794 + </listitem>
  4795 + </itemizedlist>
  4796 + </listitem>
  4797 + <listitem>
  4798 + <para>
  4799 + CLI Enhancements
  4800 + </para>
  4801 + <itemizedlist>
  4802 + <listitem>
  4803 + <para>
  4804 + Add <literal>objectinfo</literal> key to the JSON output.
  4805 + This will be a place to put computed metadata or other
  4806 + information about PDF objects that is not immediately
  4807 + evident in other ways or that seems useful for some other
  4808 + reason. In this version, information is provided about each
  4809 + object indicating whether it is a stream and, if so, what
  4810 + its length and filters are. Without this, it was not
  4811 + possible to tell conclusively from the JSON output alone
  4812 + whether or not an object was a stream. Run <command>qpdf
  4813 + --json-help</command> for details.
  4814 + </para>
  4815 + </listitem>
  4816 + <listitem>
  4817 + <para>
  4818 + Add new option
  4819 + <option>--remove-unreferenced-resources</option> which takes
  4820 + <literal>auto</literal>, <literal>yes</literal>, or
  4821 + <literal>no</literal> as arguments. The new
  4822 + <literal>auto</literal> mode, which is the default, performs
  4823 + a fast heuristic over a PDF file when splitting pages to
  4824 + determine whether the expensive process of finding and
  4825 + removing unreferenced resources is likely to be of benefit.
  4826 + For most files, this new default will result in a
  4827 + significant performance improvement for splitting pages. See
  4828 + <xref linkend="ref.advanced-transformation"/> for a more
  4829 + detailed discussion.
  4830 + </para>
  4831 + </listitem>
  4832 + <listitem>
  4833 + <para>
  4834 + The <option>--preserve-unreferenced-resources</option> is
  4835 + now just a synonym for
  4836 + <option>--remove-unreferenced-resources=no</option>.
  4837 + </para>
  4838 + </listitem>
  4839 + <listitem>
  4840 + <para>
  4841 + If the <literal>QPDF_EXECUTABLE</literal> environment
  4842 + variable is set when invoking <command>qpdf
  4843 + --bash-completion</command> or <command>qpdf
  4844 + --zsh-completion</command>, the completion command that it
  4845 + outputs will refer to qpdf using the value of that variable
  4846 + rather than what <command>qpdf</command> determines its
  4847 + executable path to be. This can be useful when wrapping
  4848 + <command>qpdf</command> with a script, working with a
  4849 + version in the source tree, using an AppImage, or other
  4850 + situations where there is some indirection.
  4851 + </para>
  4852 + </listitem>
  4853 + </itemizedlist>
  4854 + </listitem>
  4855 + <listitem>
  4856 + <para>
  4857 + Library Enhancements
  4858 + </para>
  4859 + <itemizedlist>
  4860 + <listitem>
  4861 + <para>
  4862 + Add a new version of
  4863 + <function>QPDFObjectHandle::StreamDataProvider::provideStreamData</function>
  4864 + that accepts the <function>suppress_warnings</function> and
  4865 + <function>will_retry</function> options and allows a success
  4866 + code to be returned. This makes it possible to implement a
  4867 + <classname>StreamDataProvider</classname> that calls
  4868 + <function>pipeStreamData</function> on another stream and to
  4869 + pass the response back to the caller, which enables better
  4870 + error handling on those proxied streams.
  4871 + </para>
  4872 + </listitem>
  4873 + <listitem>
  4874 + <para>
  4875 + Update <function>QPDFObjectHandle::pipeStreamData</function>
  4876 + to return an overall success code that goes beyond whether
  4877 + or not filtered data was written successfully. This allows
  4878 + better error handling of cases that were not filtering
  4879 + errors. You have to call this explicitly. Methods in
  4880 + previously existing APIs have the same semantics as before.
  4881 + </para>
  4882 + </listitem>
  4883 + <listitem>
  4884 + <para>
  4885 + The
  4886 + <function>QPDFPageObjectHelper::placeFormXObject</function>
  4887 + method now allows separate control over whether it should be
  4888 + willing to shrink or expand objects to fit them better into
  4889 + the destination rectangle. The previous behavior was that
  4890 + shrinking was allowed but expansion was not. The previous
  4891 + behavior is still the default.
  4892 + </para>
  4893 + </listitem>
  4894 + <listitem>
  4895 + <para>
  4896 + When calling the C API, any non-zero value passed to a
  4897 + boolean parameter is treated as <literal>TRUE</literal>.
  4898 + Previously only the value <literal>1</literal> was accepted.
  4899 + This makes the C API behave more like most C interfaces and
  4900 + is known to improve compatibility with some Windows
  4901 + environments that dynamically load the DLL and call
  4902 + functions from it.
  4903 + </para>
  4904 + </listitem>
  4905 + <listitem>
  4906 + <para>
  4907 + Add <function>QPDFObjectHandle::unsafeShallowCopy</function>
  4908 + for copying only top-level dictionary keys or array items.
  4909 + This is unsafe because it creates a situation in which
  4910 + changing a lower-level item in one object may also change it
  4911 + in another object, but for cases in which you
  4912 + <emphasis>know</emphasis> you are only inserting or
  4913 + replacing top-level items, it is much faster than
  4914 + <function>QPDFObjectHandle::shallowCopy</function>.
  4915 + </para>
  4916 + </listitem>
  4917 + <listitem>
  4918 + <para>
  4919 + Add <function>QPDFObjectHandle::filterAsContents</function>,
  4920 + which filters a stream's data as a content stream. This is
  4921 + useful for parsing the contents for form XObjects in the
  4922 + same way as parsing page content streams.
  4923 + </para>
  4924 + </listitem>
  4925 + </itemizedlist>
  4926 + </listitem>
  4927 + <listitem>
  4928 + <para>
  4929 + Bug Fixes
  4930 + </para>
  4931 + <itemizedlist>
  4932 + <listitem>
  4933 + <para>
  4934 + When detecting and removing unreferenced resources during
  4935 + page splitting, traverse into form XObjects and handle their
  4936 + resources dictionaries as well.
  4937 + </para>
  4938 + </listitem>
  4939 + <listitem>
  4940 + <para>
  4941 + When merging or splitting pages, the same error recovery is now
  4942 + applied to streams from files other than the primary input file.
  4943 + </para>
  4944 + </listitem>
  4945 + </itemizedlist>
  4946 + </listitem>
  4947 + <listitem>
  4948 + <para>
  4949 + Build Changes
  4950 + </para>
  4951 + <itemizedlist>
  4952 + <listitem>
  4953 + <para>
  4954 + Allow qpdf to be built on stripped-down systems whose C/C++
  4955 + libraries lack the <classname>wchar_t</classname> type.
  4956 + Search for <classname>wchar_t</classname> in qpdf's
  4957 + README.md for details. This should be very rare, but it is
  4958 + known to be helpful in some embedded environments.
  4959 + </para>
  4960 + </listitem>
  4961 + </itemizedlist>
  4962 + </listitem>
  4963 + </itemizedlist>
  4964 + </listitem>
  4965 + </varlistentry>
4703 4966 <varlistentry>
4704 4967 <term>9.1.1: January 26, 2020</term>
4705 4968 <listitem>
... ... @@ -4804,8 +5067,6 @@ print &quot;\n&quot;;
4804 5067 </itemizedlist>
4805 5068 </listitem>
4806 5069 </varlistentry>
4807   - </variablelist>
4808   - <variablelist>
4809 5070 <varlistentry>
4810 5071 <term>9.1.0: November 17, 2019</term>
4811 5072 <listitem>
... ... @@ -4905,8 +5166,6 @@ print &quot;\n&quot;;
4905 5166 </itemizedlist>
4906 5167 </listitem>
4907 5168 </varlistentry>
4908   - </variablelist>
4909   - <variablelist>
4910 5169 <varlistentry>
4911 5170 <term>9.0.2: October 12, 2019</term>
4912 5171 <listitem>
... ... @@ -5272,7 +5531,7 @@ print &quot;\n&quot;;
5272 5531 in dynamically linked code catching exceptions or
5273 5532 subclassing, this could be the reason. If you see this,
5274 5533 please report a bug at <ulink
5275   - url="https://github.com/qpdf/qpdf/issues/">pikepdf</ulink>.
  5534 + url="https://github.com/qpdf/qpdf/issues/">https://github.com/qpdf/qpdf/issues/</ulink>.
5276 5535 </para>
5277 5536 </listitem>
5278 5537 <listitem>
... ...
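The 10.0.0 release note in the manual diff above says that the C API now treats any non-zero value passed to a boolean parameter as TRUE, where previously only 1 was accepted. The sketch below contrasts the two interpretations. It is an illustration under stated assumptions, not the library's implementation: `EXAMPLE_BOOL`, `strict_to_bool`, and `lenient_to_bool` are hypothetical names, and the sketch assumes only that the boolean parameter arrives as a plain `int`.

```cpp
// Hypothetical stand-in for an int-based boolean type in a C API.
typedef int EXAMPLE_BOOL;

// Old-style interpretation: only the exact value 1 counts as true.
inline bool strict_to_bool(EXAMPLE_BOOL v)
{
    return v == 1;
}

// New-style interpretation: any non-zero value counts as true,
// which matches ordinary C truthiness.
inline bool lenient_to_bool(EXAMPLE_BOOL v)
{
    return v != 0;
}
```

A caller whose language represents true as -1, or as some other non-zero value, is read as false by the strict version and as true by the lenient one. That matches the kind of cross-language DLL problem the release note describes.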
qpdf/qpdf.cc
... ... @@ -1483,10 +1483,10 @@ ArgParser::argHelp()
1483 1483 << "--normalize-content=[yn] enables or disables normalization of content streams\n"
1484 1484 << "--object-streams=mode controls handling of object streams\n"
1485 1485 << "--preserve-unreferenced preserve unreferenced objects\n"
1486   - << "--preserve-unreferenced-resources\n"
1487   - << " synonym for --remove-unreferenced-resources=no\n"
1488 1486 << "--remove-unreferenced-resources={auto,yes,no}\n"
1489 1487 << " whether to remove unreferenced page resources\n"
  1488 + << "--preserve-unreferenced-resources\n"
  1489 + << " synonym for --remove-unreferenced-resources=no\n"
1490 1490 << "--newline-before-endstream always put a newline before endstream\n"
1491 1491 << "--coalesce-contents force all pages' content to be a single stream\n"
1492 1492 << "--flatten-annotations=option\n"
... ...