Commit 419949574df4525c61ffe060ad1c63daf66e806c
1 parent
0b05111d
Add information about helper classes to the documentation
Showing 1 changed file with 239 additions and 92 deletions
manual/qpdf-manual.xml
| @@ -1751,53 +1751,54 @@ outfile.pdf</option> | @@ -1751,53 +1751,54 @@ outfile.pdf</option> | ||
| 1751 | </para> | 1751 | </para> |
| 1752 | <para> | 1752 | <para> |
| 1753 | In general, one should adhere strictly to a specification when | 1753 | In general, one should adhere strictly to a specification when |
| 1754 | - writing but be liberal in reading. This way, the product of our | 1754 | + writing but be liberal in reading. This way, the product of our |
| 1755 | software will be accepted by the widest range of other programs, | 1755 | software will be accepted by the widest range of other programs, |
| 1756 | - and we will accept the widest range of input files. This library | 1756 | + and we will accept the widest range of input files. This library |
| 1757 | attempts to conform to that philosophy whenever possible but also | 1757 | attempts to conform to that philosophy whenever possible but also |
| 1758 | aims to provide strict checking for people who want to validate | 1758 | aims to provide strict checking for people who want to validate |
| 1759 | - PDF files. If you don't want to see warnings and are trying to | 1759 | + PDF files. If you don't want to see warnings and are trying to |
| 1760 | write something that is tolerant, you can call | 1760 | write something that is tolerant, you can call |
| 1761 | - <literal>setSuppressWarnings(true)</literal>. If you want to fail | 1761 | + <literal>setSuppressWarnings(true)</literal>. If you want to fail |
| 1762 | on the first error, you can call | 1762 | on the first error, you can call |
| 1763 | - <literal>setAttemptRecovery(false)</literal>. The default | ||
| 1764 | - behavior is to generating warnings for recoverable problems. Note | ||
| 1765 | - that recovery will not always produce the desired results even if | ||
| 1766 | - it is able to get through the file. Unlike most other PDF files | ||
| 1767 | - that produce generic warnings such as “This file is | 1763 | + <literal>setAttemptRecovery(false)</literal>. The default behavior |
| 1764 | + is to generate warnings for recoverable problems. Note that | ||
| 1765 | + recovery will not always produce the desired results even if it is | ||
| 1766 | + able to get through the file. Unlike most other PDF files that | ||
| 1767 | + produce generic warnings such as “This file is | ||
| 1768 | damaged,” qpdf generally issues a detailed error message | 1768 | damaged,” qpdf generally issues a detailed error message |
| 1769 | - that would be most useful to a PDF developer. This is by design | ||
| 1770 | - as there seems to be a shortage of PDF validation tools out | ||
| 1771 | - there. (This was, in fact, one of the major motivations behind | ||
| 1772 | - the initial creation of qpdf.) | 1769 | + that would be most useful to a PDF developer. This is by design as |
| 1770 | + there seems to be a shortage of PDF validation tools out there. | ||
| 1771 | + This was, in fact, one of the major motivations behind the initial | ||
| 1772 | + creation of qpdf. | ||
| 1773 | </para> | 1773 | </para> |
| 1774 | </sect1> | 1774 | </sect1> |
| 1775 | <sect1 id="ref.design-goals"> | 1775 | <sect1 id="ref.design-goals"> |
| 1776 | <title>Design Goals</title> | 1776 | <title>Design Goals</title> |
| 1777 | <para> | 1777 | <para> |
| 1778 | The QPDF package includes support for reading and rewriting PDF | 1778 | The QPDF package includes support for reading and rewriting PDF |
| 1779 | - files. It aims to hide from the user details involving object | 1779 | + files. It aims to hide from the user details involving object |
| 1780 | locations, modified (appended) PDF files, the | 1780 | locations, modified (appended) PDF files, the |
| 1781 | directness/indirectness of objects, and stream filters including | 1781 | directness/indirectness of objects, and stream filters including |
| 1782 | - encryption. It does not aim to hide knowledge of the object | ||
| 1783 | - hierarchy or content stream contents. Put another way, a user of | 1782 | + encryption. It does not aim to hide knowledge of the object |
| 1783 | + hierarchy or content stream contents. Put another way, a user of | ||
| 1784 | the qpdf library is expected to have knowledge about how PDF files | 1784 | the qpdf library is expected to have knowledge about how PDF files |
| 1785 | work, but is not expected to have to keep track of bookkeeping | 1785 | work, but is not expected to have to keep track of bookkeeping |
| 1786 | details such as file positions. | 1786 | details such as file positions. |
| 1787 | </para> | 1787 | </para> |
| 1788 | <para> | 1788 | <para> |
| 1789 | A user of the library never has to care whether an object is | 1789 | A user of the library never has to care whether an object is |
| 1790 | - direct or indirect. All access to objects deals with this | ||
| 1791 | - transparently. All memory management details are also handled by | ||
| 1792 | - the library. | 1790 | + direct or indirect, though it is possible to determine whether an |
| 1791 | + object is direct or not if this information is needed. All access | ||
| 1792 | + to objects deals with this transparently. All memory management | ||
| 1793 | + details are also handled by the library. | ||
| 1793 | </para> | 1794 | </para> |
| 1794 | <para> | 1795 | <para> |
| 1795 | The <classname>PointerHolder</classname> object is used internally | 1796 | The <classname>PointerHolder</classname> object is used internally |
| 1796 | - by the library to deal with memory management. This is basically | ||
| 1797 | - a smart pointer object very similar in spirit to the Boost | ||
| 1798 | - library's <classname>shared_ptr</classname> object, but predating | ||
| 1799 | - it by several years. This library also makes use of a technique | ||
| 1800 | - for giving fine-grained access to methods in one class to other | 1797 | + by the library to deal with memory management. This is basically a |
| 1798 | + smart pointer object very similar in spirit to C++11's | ||
| 1799 | + <classname>std::shared_ptr</classname> object, but predating it by | ||
| 1800 | + several years. This library also makes use of a technique for | ||
| 1801 | + giving fine-grained access to methods in one class to other | ||
| 1801 | classes by using public subclasses with friends and only private | 1802 | classes by using public subclasses with friends and only private |
| 1802 | members that in turn call private methods of the containing class. | 1803 | members that in turn call private methods of the containing class. |
| 1803 | See <classname>QPDFObjectHandle::Factory</classname> as an | 1804 | See <classname>QPDFObjectHandle::Factory</classname> as an |
| @@ -1810,29 +1811,20 @@ outfile.pdf</option> | @@ -1810,29 +1811,20 @@ outfile.pdf</option> | ||
| 1810 | files. | 1811 | files. |
| 1811 | </para> | 1812 | </para> |
| 1812 | <para> | 1813 | <para> |
| 1813 | - <classname>QPDFObject</classname> is the basic PDF Object class. | ||
| 1814 | - It is an abstract base class from which are derived classes for | ||
| 1815 | - each type of PDF object. Clients do not interact with Objects | ||
| 1816 | - directly but instead interact with | ||
| 1817 | - <classname>QPDFObjectHandle</classname>. | ||
| 1818 | - </para> | ||
| 1819 | - <para> | ||
| 1820 | - <classname>QPDFObjectHandle</classname> contains | ||
| 1821 | - <classname>PointerHolder<QPDFObject></classname> and | ||
| 1822 | - includes accessor methods that are type-safe proxies to the | ||
| 1823 | - methods of the derived object classes as well as methods for | ||
| 1824 | - querying object types. They can be passed around by value, | ||
| 1825 | - copied, stored in containers, etc. with very low overhead. | ||
| 1826 | - Instances of <classname>QPDFObjectHandle</classname> always | ||
| 1827 | - contain a reference back to the <classname>QPDF</classname> object | ||
| 1828 | - from which they were created. A | 1814 | + The primary class for interacting with PDF objects is |
| 1815 | + <classname>QPDFObjectHandle</classname>. Instances of this class | ||
| 1816 | + can be passed around by value, copied, stored in containers, etc. | ||
| 1817 | + with very low overhead. Instances of | ||
| 1818 | + <classname>QPDFObjectHandle</classname> created by reading from a | ||
| 1819 | + file will always contain a reference back to the | ||
| 1820 | + <classname>QPDF</classname> object from which they were created. A | ||
| 1829 | <classname>QPDFObjectHandle</classname> may be direct or indirect. | 1821 | <classname>QPDFObjectHandle</classname> may be direct or indirect. |
| 1830 | If indirect, the <classname>QPDFObject</classname> the | 1822 | If indirect, the <classname>QPDFObject</classname> the |
| 1831 | <classname>PointerHolder</classname> initially points to is a null | 1823 | <classname>PointerHolder</classname> initially points to is a null |
| 1832 | - pointer. In this case, the first attempt to access the underlying | 1824 | + pointer. In this case, the first attempt to access the underlying |
| 1833 | <classname>QPDFObject</classname> will result in the | 1825 | <classname>QPDFObject</classname> will result in the |
| 1834 | <classname>QPDFObject</classname> being resolved via a call to the | 1826 | <classname>QPDFObject</classname> being resolved via a call to the |
| 1835 | - referenced <classname>QPDF</classname> instance. This makes it | 1827 | + referenced <classname>QPDF</classname> instance. This makes it |
| 1836 | essentially impossible to make coding errors in which certain | 1828 | essentially impossible to make coding errors in which certain |
| 1837 | things will work for some PDF files and not for others based on | 1829 | things will work for some PDF files and not for others based on |
| 1838 | which objects are direct and which objects are indirect. | 1830 | which objects are direct and which objects are indirect. |
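Note on the paragraph above: the lazy resolution it describes can be sketched in a few lines of C++. This is only an illustration under stated assumptions, not qpdf's implementation. `Object`, `Document`, and `Handle` are hypothetical stand-ins for `QPDFObject`, `QPDF`, and `QPDFObjectHandle`, and `std::shared_ptr` is swapped in for `PointerHolder`. The point is that direct and indirect handles share one accessor, so calling code never branches on which kind it holds.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins; these are NOT qpdf's real classes.
struct Object {
    std::string value;
};

class Document {
public:
    // Register the "on-disk" contents of an indirect object.
    void define(int id, std::string const& value) { file_[id] = value; }

    // Resolve an object id, reading it at most once and caching the result.
    std::shared_ptr<Object> resolve(int id) {
        auto it = cache_.find(id);
        if (it != cache_.end()) {
            return it->second;
        }
        ++reads_;
        auto obj = std::make_shared<Object>(Object{file_.at(id)});
        cache_[id] = obj;
        return obj;
    }

    int reads() const { return reads_; }

private:
    std::map<int, std::string> file_;
    std::map<int, std::shared_ptr<Object>> cache_;
    int reads_ = 0;
};

class Handle {
public:
    // Direct handle: holds its object from the start.
    static Handle direct(std::string const& value) {
        Handle h;
        h.obj_ = std::make_shared<Object>(Object{value});
        return h;
    }

    // Indirect handle: starts with a null pointer plus a back-reference.
    static Handle indirect(Document& doc, int id) {
        Handle h;
        h.doc_ = &doc;
        h.id_ = id;
        return h;
    }

    bool isIndirect() const { return doc_ != nullptr; }

    // Same accessor for both kinds; indirect handles resolve on first use.
    std::string const& value() {
        if (!obj_) {
            obj_ = doc_->resolve(id_);
        }
        return obj_->value;
    }

private:
    Handle() = default;
    std::shared_ptr<Object> obj_;
    Document* doc_ = nullptr;
    int id_ = 0;
};

// Usage: two indirect handles to object 7 trigger a single read.
int exampleReadCount() {
    Document doc;
    doc.define(7, "page");
    Handle a = Handle::indirect(doc, 7);
    Handle b = Handle::indirect(doc, 7);
    assert(doc.reads() == 0);  // nothing is read until first access
    a.value();
    b.value();
    return doc.reads();
}
```

Because the null check lives inside the accessor, code written against a direct object keeps working unchanged when a different file stores the same object indirectly.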
| @@ -1849,48 +1841,6 @@ outfile.pdf</option> | @@ -1849,48 +1841,6 @@ outfile.pdf</option> | ||
| 1849 | <filename>QPDFObjectHandle.hh</filename> for details. | 1841 | <filename>QPDFObjectHandle.hh</filename> for details. |
| 1850 | </para> | 1842 | </para> |
| 1851 | <para> | 1843 | <para> |
| 1852 | - When the <classname>QPDF</classname> class creates a new object, | ||
| 1853 | - it dynamically allocates the appropriate type of | ||
| 1854 | - <classname>QPDFObject</classname> and immediately hands the | ||
| 1855 | - pointer to an instance of <classname>QPDFObjectHandle</classname>. | ||
| 1856 | - The parser reads a token from the current file position. If the | ||
| 1857 | - token is a not either a dictionary or array opener, an object is | ||
| 1858 | - immediately constructed from the single token and the parser | ||
| 1859 | - returns. Otherwise, the parser is invoked recursively in a | ||
| 1860 | - special mode in which it accumulates objects until it finds a | ||
| 1861 | - balancing closer. During this process, the | ||
| 1862 | - “<literal>R</literal>” keyword is recognized and an | ||
| 1863 | - indirect <classname>QPDFObjectHandle</classname> may be | ||
| 1864 | - constructed. | ||
| 1865 | - </para> | ||
| 1866 | - <para> | ||
| 1867 | - The <function>QPDF::resolve()</function> method, which is used to | ||
| 1868 | - resolve an indirect object, may be invoked from the | ||
| 1869 | - <classname>QPDFObjectHandle</classname> class. It first checks a | ||
| 1870 | - cache to see whether this object has already been read. If not, | ||
| 1871 | - it reads the object from the PDF file and caches it. It the | ||
| 1872 | - returns the resulting <classname>QPDFObjectHandle</classname>. | ||
| 1873 | - The calling object handle then replaces its | ||
| 1874 | - <classname>PointerHolder<QDFObject></classname> with the one | ||
| 1875 | - from the newly returned <classname>QPDFObjectHandle</classname>. | ||
| 1876 | - In this way, only a single copy of any direct object need exist | ||
| 1877 | - and clients can access objects transparently without knowing | ||
| 1878 | - caring whether they are direct or indirect objects. Additionally, | ||
| 1879 | - no object is ever read from the file more than once. That means | ||
| 1880 | - that only the portions of the PDF file that are actually needed | ||
| 1881 | - are ever read from the input file, thus allowing the qpdf package | ||
| 1882 | - to take advantage of this important design goal of PDF files. | ||
| 1883 | - </para> | ||
| 1884 | - <para> | ||
| 1885 | - If the requested object is inside of an object stream, the object | ||
| 1886 | - stream itself is first read into memory. Then the tokenizer reads | ||
| 1887 | - objects from the memory stream based on the offset information | ||
| 1888 | - stored in the stream. Those individual objects are cached, after | ||
| 1889 | - which the temporary buffer holding the object stream contents are | ||
| 1890 | - discarded. In this way, the first time an object in an object | ||
| 1891 | - stream is requested, all objects in the stream are cached. | ||
| 1892 | - </para> | ||
| 1893 | - <para> | ||
| 1894 | An instance of <classname>QPDF</classname> is constructed by using | 1844 | An instance of <classname>QPDF</classname> is constructed by using |
| 1895 | the class's default constructor. If desired, the | 1845 | the class's default constructor. If desired, the |
| 1896 | <classname>QPDF</classname> object may be configured with various | 1846 | <classname>QPDF</classname> object may be configured with various |
| @@ -1934,8 +1884,206 @@ outfile.pdf</option> | @@ -1934,8 +1884,206 @@ outfile.pdf</option> | ||
| 1934 | <para> | 1884 | <para> |
| 1935 | There are some convenience routines for very common operations | 1885 | There are some convenience routines for very common operations |
| 1936 | such as walking the page tree and returning a vector of all page | 1886 | such as walking the page tree and returning a vector of all page |
| 1937 | - objects. For full details, please see the header file | ||
| 1938 | - <filename>QPDF.hh</filename>. | 1887 | + objects. For full details, please see the header files |
| 1888 | + <filename>QPDF.hh</filename> and | ||
| 1889 | + <filename>QPDFObjectHandle.hh</filename>. There are also some | ||
| 1890 | + additional helper classes that provide higher level API functions | ||
| 1891 | + for certain document constructions. These are discussed in <xref | ||
| 1892 | + linkend="ref.helper-classes"/>. | ||
| 1893 | + </para> | ||
| 1894 | + </sect1> | ||
| 1895 | + <sect1 id="ref.helper-classes"> | ||
| 1896 | + <title>Helper Classes</title> | ||
| 1897 | + <para> | ||
| 1898 | + QPDF version 8.1 introduced the concept of helper classes. Helper | ||
| 1899 | + classes are intended to contain higher level APIs that allow | ||
| 1900 | + developers to work with certain document constructs at an | ||
| 1901 | + abstraction level above that of | ||
| 1902 | + <classname>QPDFObjectHandle</classname> while staying true to | ||
| 1903 | + qpdf's philosophy of not hiding document structure from the | ||
| 1904 | + developer. As with qpdf in general, the goal is to take away some of | ||
| 1905 | + the more tedious bookkeeping aspects of working with PDF files, | ||
| 1906 | + not to remove the need for the developer to understand how the PDF | ||
| 1907 | + construction in question works. The driving factor behind the | ||
| 1908 | + creation of helper classes was to allow the evolution of higher | ||
| 1909 | + level interfaces in qpdf without polluting the interfaces of the | ||
| 1910 | + main top-level classes <classname>QPDF</classname> and | ||
| 1911 | + <classname>QPDFObjectHandle</classname>. | ||
| 1912 | + </para> | ||
| 1913 | + <para> | ||
| 1914 | + There are two kinds of helper classes: | ||
| 1915 | + <emphasis>document</emphasis> helpers and | ||
| 1916 | + <emphasis>object</emphasis> helpers. Document helpers are | ||
| 1917 | + constructed with a reference to a <classname>QPDF</classname> | ||
| 1918 | + object and provide methods for working with structures that are at | ||
| 1919 | + the document level. Object helpers are constructed with an | ||
| 1920 | + instance of a <classname>QPDFObjectHandle</classname> and provide | ||
| 1921 | + methods for working with specific types of objects. | ||
| 1922 | + </para> | ||
| 1923 | + <para> | ||
| 1924 | + Examples of document helpers include | ||
| 1925 | + <classname>QPDFPageDocumentHelper</classname>, which contains | ||
| 1926 | + methods for operating on the document's page trees, such as | ||
| 1927 | + enumerating all pages of a document and adding and removing pages; | ||
| 1928 | + and <classname>QPDFAcroFormDocumentHelper</classname>, which | ||
| 1929 | + contains document-level methods related to interactive forms, such | ||
| 1930 | + as enumerating form fields and creating mappings between form | ||
| 1931 | + fields and annotations. | ||
| 1932 | + </para> | ||
| 1933 | + <para> | ||
| 1934 | + Examples of object helpers include | ||
| 1935 | + <classname>QPDFPageObjectHelper</classname> for performing | ||
| 1936 | + operations on pages such as page rotation and some operations on | ||
| 1937 | + content streams, <classname>QPDFFormFieldObjectHelper</classname> | ||
| 1938 | + for performing operations related to interactive form fields, and | ||
| 1939 | + <classname>QPDFAnnotationObjectHelper</classname> for working with | ||
| 1940 | + annotations. | ||
| 1941 | + </para> | ||
| 1942 | + <para> | ||
| 1943 | + It is always possible to retrieve the underlying | ||
| 1944 | + <classname>QPDF</classname> reference from a document helper and | ||
| 1945 | + the underlying <classname>QPDFObjectHandle</classname> reference | ||
| 1946 | + from an object helper. Helpers are designed to be helpers, not | ||
| 1947 | + wrappers. The intention is that, in general, it is safe to freely | ||
| 1948 | + intermix operations that use helpers with operations that use the | ||
| 1949 | + underlying objects. Document and object helpers do not attempt to | ||
| 1950 | + provide a complete interface for working with the things they are | ||
| 1951 | + helping with, nor do they attempt to encapsulate underlying | ||
| 1952 | + structures. They just provide a few methods to help with | ||
| 1953 | + error-prone, repetitive, or complex tasks. In some cases, a helper | ||
| 1954 | + object may cache some information that is expensive to gather. In | ||
| 1955 | + such cases, the helper classes are implemented so that their own | ||
| 1956 | + methods keep the cache consistent, and the header file will | ||
| 1957 | + provide a method to invalidate the cache and a description of what | ||
| 1958 | + kinds of operations would make the cache invalid. If in doubt, you | ||
| 1959 | + can always discard a helper class and create a new one with the | ||
| 1960 | + same underlying objects, which will ensure that you have discarded | ||
| 1961 | + any stale information. | ||
| 1962 | + </para> | ||
| 1963 | + <para> | ||
| 1964 | + By convention, document helpers are called | ||
| 1965 | + <classname>QPDFSomethingDocumentHelper</classname> and are derived | ||
| 1966 | + from <classname>QPDFDocumentHelper</classname>, and object helpers | ||
| 1967 | + are called <classname>QPDFSomethingObjectHelper</classname> and | ||
| 1968 | + are derived from <classname>QPDFObjectHelper</classname>. For | ||
| 1969 | + details on specific helpers, please see their header files. You | ||
| 1970 | + can find them by looking at | ||
| 1971 | + <filename>include/qpdf/QPDF*DocumentHelper.hh</filename> and | ||
| 1972 | + <filename>include/qpdf/QPDF*ObjectHelper.hh</filename>. | ||
| 1973 | + </para> | ||
| 1974 | + <para> | ||
| 1975 | + In order to avoid creation of circular dependencies, the following | ||
| 1976 | + general guidelines are followed with helper classes: | ||
| 1977 | + <itemizedlist> | ||
| 1978 | + <listitem> | ||
| 1979 | + <para> | ||
| 1980 | + Core class interfaces do not know about helper classes. For | ||
| 1981 | + example, no methods of <classname>QPDF</classname> or | ||
| 1982 | + <classname>QPDFObjectHandle</classname> will include helper | ||
| 1983 | + classes in their interfaces. | ||
| 1984 | + </para> | ||
| 1985 | + </listitem> | ||
| 1986 | + <listitem> | ||
| 1987 | + <para> | ||
| 1988 | + Interfaces of object helpers will usually not use document | ||
| 1989 | + helpers in their interfaces. This is because it is much more | ||
| 1990 | + useful for document helpers to have methods that return object | ||
| 1991 | + helpers. Most operations in PDF files start at the document | ||
| 1992 | + level and go from there to the object level rather than the | ||
| 1993 | + other way around. It can sometimes be useful to map back from | ||
| 1994 | + object-level structures to document-level structures. If there | ||
| 1995 | + is a desire to do this, it will generally be provided by a | ||
| 1996 | + method in the document helper class. | ||
| 1997 | + </para> | ||
| 1998 | + </listitem> | ||
| 1999 | + <listitem> | ||
| 2000 | + <para> | ||
| 2001 | + Most of the time, object helpers don't know about other object | ||
| 2002 | + helpers. However, in some cases, one type of object may be a | ||
| 2003 | + container for another type of object, in which case it may make | ||
| 2004 | + sense for the outer object to know about the inner object. For | ||
| 2005 | + example, there are methods in the | ||
| 2006 | + <classname>QPDFPageObjectHelper</classname> that know | ||
| 2007 | + <classname>QPDFAnnotationObjectHelper</classname> because | ||
| 2008 | + references to annotations are contained in page dictionaries. | ||
| 2009 | + </para> | ||
| 2010 | + </listitem> | ||
| 2011 | + <listitem> | ||
| 2012 | + <para> | ||
| 2013 | + Any helper or core library class may use helpers in their | ||
| 2014 | + implementations. | ||
| 2015 | + </para> | ||
| 2016 | + </listitem> | ||
| 2017 | + </itemizedlist> | ||
| 2018 | + </para> | ||
| 2019 | + <para> | ||
| 2020 | + Prior to qpdf version 8.1, higher level interfaces were added as | ||
| 2021 | + “convenience functions” in either | ||
| 2022 | + <classname>QPDF</classname> or | ||
| 2023 | + <classname>QPDFObjectHandle</classname>. For compatibility, older | ||
| 2024 | + convenience functions for operating with pages will remain in | ||
| 2025 | + those classes even as alternatives are provided in helper classes. | ||
| 2026 | + Going forward, new higher level interfaces will be provided using | ||
| 2027 | + helper classes. | ||
| 2028 | + </para> | ||
| 2029 | + </sect1> | ||
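Note on the new Helper Classes section: the "helpers, not wrappers" idea can be sketched as follows. This is a minimal illustration with hypothetical types, not qpdf's API. `Doc`, `Dict`, `Handle`, `PageObjectHelper`, `PageDocumentHelper`, and every method name here are assumptions made for the example; the real interfaces are in the `QPDF*DocumentHelper.hh` and `QPDF*ObjectHelper.hh` headers. The sketch shows a document helper built from a document reference, an object helper built from a handle, and helper calls mixed freely with direct access to the underlying object.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for a core document and object handle.
// A handle is a cheap, copyable reference to a shared dictionary.
using Dict = std::map<std::string, int>;
using Handle = std::shared_ptr<Dict>;

struct Doc {
    std::vector<Handle> pages;
};

// Object helper: built from a handle, adds one convenience operation.
class PageObjectHelper {
public:
    explicit PageObjectHelper(Handle h) : h_(std::move(h)) {}

    // The underlying handle is always retrievable.
    Handle getObjectHandle() const { return h_; }

    // Rotate relative to the current value, normalized to [0, 360).
    void rotate(int degrees) {
        int current = h_->count("Rotate") ? (*h_)["Rotate"] : 0;
        (*h_)["Rotate"] = ((current + degrees) % 360 + 360) % 360;
    }

private:
    Handle h_;
};

// Document helper: built from a document reference, returns object helpers.
class PageDocumentHelper {
public:
    explicit PageDocumentHelper(Doc& doc) : doc_(doc) {}

    // The underlying document is always retrievable.
    Doc& getDoc() const { return doc_; }

    std::vector<PageObjectHelper> getAllPages() const {
        std::vector<PageObjectHelper> result;
        for (auto const& page : doc_.pages) {
            result.emplace_back(page);
        }
        return result;
    }

private:
    Doc& doc_;
};

// Usage: mix a helper call with direct access to the same object.
int exampleMixedAccess() {
    Doc doc;
    doc.pages.push_back(std::make_shared<Dict>());
    PageDocumentHelper dh(doc);
    PageObjectHelper page = dh.getAllPages().at(0);
    page.rotate(90);                             // via the helper
    (*page.getObjectHandle())["Rotate"] += 180;  // directly on the handle
    return (*doc.pages[0])["Rotate"];
}
```

The object helper here has no knowledge of the document helper, while the document helper returns object helpers, which matches the dependency direction given in the guidelines list.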
| 2030 | + <sect1 id="ref.implementation-notes"> | ||
| 2031 | + <title>Implementation Notes</title> | ||
| 2032 | + <para> | ||
| 2033 | + This section contains a few notes about QPDF's internal | ||
| 2034 | + implementation, particularly around what it does when it first | ||
| 2035 | + processes a file. This section is a bit of a simplification of | ||
| 2036 | + what it actually does, but it could serve as a starting point to | ||
| 2037 | + someone trying to understand the implementation. There is nothing | ||
| 2038 | + in this section that you need to know to use the qpdf library. | ||
| 2039 | + </para> | ||
| 2040 | + <para> | ||
| 2041 | + <classname>QPDFObject</classname> is the basic PDF Object class. | ||
| 2042 | + It is an abstract base class from which are derived classes for | ||
| 2043 | + each type of PDF object. Clients do not interact with Objects | ||
| 2044 | + directly but instead interact with | ||
| 2045 | + <classname>QPDFObjectHandle</classname>. | ||
| 2046 | + </para> | ||
| 2047 | + <para> | ||
| 2048 | + When the <classname>QPDF</classname> class creates a new object, | ||
| 2049 | + it dynamically allocates the appropriate type of | ||
| 2050 | + <classname>QPDFObject</classname> and immediately hands the | ||
| 2051 | + pointer to an instance of <classname>QPDFObjectHandle</classname>. | ||
| 2052 | + The parser reads a token from the current file position. If the | ||
| 2053 | + token is neither a dictionary nor an array opener, an object is | ||
| 2054 | + immediately constructed from the single token and the parser | ||
| 2055 | + returns. Otherwise, the parser iterates in a special mode in which | ||
| 2056 | + it accumulates objects until it finds a balancing closer. During | ||
| 2057 | + this process, the “<literal>R</literal>” keyword is | ||
| 2058 | + recognized and an indirect <classname>QPDFObjectHandle</classname> | ||
| 2059 | + may be constructed. | ||
| 2060 | + </para> | ||
| 2061 | + <para> | ||
| 2062 | + The <function>QPDF::resolve()</function> method, which is used to | ||
| 2063 | + resolve an indirect object, may be invoked from the | ||
| 2064 | + <classname>QPDFObjectHandle</classname> class. It first checks a | ||
| 2065 | + cache to see whether this object has already been read. If not, | ||
| 2066 | + it reads the object from the PDF file and caches it. It then | ||
| 2067 | + returns the resulting <classname>QPDFObjectHandle</classname>. | ||
| 2068 | + The calling object handle then replaces its | ||
| 2069 | + <classname>PointerHolder<QPDFObject></classname> with the one | ||
| 2070 | + from the newly returned <classname>QPDFObjectHandle</classname>. | ||
| 2071 | + In this way, only a single copy of any direct object need exist | ||
| 2072 | + and clients can access objects transparently without knowing | ||
| 2073 | + or caring whether they are direct or indirect objects. Additionally, | ||
| 2074 | + no object is ever read from the file more than once. That means | ||
| 2075 | + that only the portions of the PDF file that are actually needed | ||
| 2076 | + are ever read from the input file, thus allowing the qpdf package | ||
| 2077 | + to take advantage of this important design goal of PDF files. | ||
| 2078 | + </para> | ||
| 2079 | + <para> | ||
| 2080 | + If the requested object is inside of an object stream, the object | ||
| 2081 | + stream itself is first read into memory. Then the tokenizer reads | ||
| 2082 | + objects from the memory stream based on the offset information | ||
| 2083 | + stored in the stream. Those individual objects are cached, after | ||
| 2084 | + which the temporary buffer holding the object stream contents is | ||
| 2085 | + discarded. In this way, the first time an object in an object | ||
| 2086 | + stream is requested, all objects in the stream are cached. | ||
| 1939 | </para> | 2087 | </para> |
| 1940 | <para> | 2088 | <para> |
| 1941 | The following example should clarify how | 2089 | The following example should clarify how |
| @@ -1951,12 +2099,11 @@ outfile.pdf</option> | @@ -1951,12 +2099,11 @@ outfile.pdf</option> | ||
| 1951 | <listitem> | 2099 | <listitem> |
| 1952 | <para> | 2100 | <para> |
| 1953 | The <classname>QPDF</classname> class checks the beginning of | 2101 | The <classname>QPDF</classname> class checks the beginning of |
| 1954 | - <filename>a.pdf</filename> for | ||
| 1955 | - <literal>%!PDF-1.[0-9]+</literal>. It then reads the cross | ||
| 1956 | - reference table mentioned at the end of the file, ensuring that | ||
| 1957 | - it is looking before the last <literal>%%EOF</literal>. After | ||
| 1958 | - getting to <literal>trailer</literal> keyword, it invokes the | ||
| 1959 | - parser. | 2102 | + <filename>a.pdf</filename> for a PDF header. It then reads the |
| 2103 | + cross reference table mentioned at the end of the file, | ||
| 2104 | + ensuring that it is looking before the last | ||
| 2105 | + <literal>%%EOF</literal>. After getting to the | ||
| 2106 | + <literal>trailer</literal> keyword, it invokes the parser. | ||
| 1960 | </para> | 2107 | </para> |
| 1961 | </listitem> | 2108 | </listitem> |
| 1962 | <listitem> | 2109 | <listitem> |