Commit 5d63730b9347a755d2906f7a929db9dba71ea37f
1 parent 12d065c7
Clean up documentation
Showing 6 changed files with 153 additions and 114 deletions
TODO
| @@ -9,7 +9,9 @@ Before Release: | @@ -9,7 +9,9 @@ Before Release: | ||
| 9 | * Release qtest with updates to qtest-driver and copy back into qpdf | 9 | * Release qtest with updates to qtest-driver and copy back into qpdf |
| 10 | 10 | ||
| 11 | Next: | 11 | Next: |
| 12 | -* JSON v2 fixes | 12 | +* Support json v2 in the C API. At a minimum, write_json, |
| 13 | + create_from_json, and update_from_json need to be there and should | ||
| 14 | + take the same kinds of functions as the C API for logger. | ||
| 13 | 15 | ||
| 14 | Pending changes: | 16 | Pending changes: |
| 15 | 17 | ||
| @@ -65,19 +67,6 @@ direct objects, which are always "resolved" in QPDFObjectHandle. | @@ -65,19 +67,6 @@ direct objects, which are always "resolved" in QPDFObjectHandle. | ||
| 65 | 67 | ||
| 66 | Soon: Break ground on "Document-level work" | 68 | Soon: Break ground on "Document-level work" |
| 67 | 69 | ||
| 68 | - | ||
| 69 | -JSON v2 fixes | ||
| 70 | -============= | ||
| 71 | - | ||
| 72 | -* Support json v2 in the C API. At a minimum, write_json, | ||
| 73 | - create_from_json, and update_from_json need to be there and should | ||
| 74 | - take the same kinds of functions as the C API for logger. | ||
| 75 | - | ||
| 76 | -* Address json.rst comment from m-holger: "The discussion of stream | ||
| 77 | - objects is very wordy. Would a table similar to the style of the PDF | ||
| 78 | - spec be easier to use?" | ||
| 79 | - | ||
| 80 | - | ||
| 81 | Possible future JSON enhancements | 70 | Possible future JSON enhancements |
| 82 | ================================= | 71 | ================================= |
| 83 | 72 |
job.sums
| @@ -8,10 +8,10 @@ include/qpdf/auto_job_c_pages.hh b3cc0f21029f6d89efa043dcdbfa183cb59325b6506001c | @@ -8,10 +8,10 @@ include/qpdf/auto_job_c_pages.hh b3cc0f21029f6d89efa043dcdbfa183cb59325b6506001c | ||
| 8 | include/qpdf/auto_job_c_uo.hh ae21b69a1efa9333050f4833d465f6daff87e5b38e5106e49bbef5d4132e4ed1 | 8 | include/qpdf/auto_job_c_uo.hh ae21b69a1efa9333050f4833d465f6daff87e5b38e5106e49bbef5d4132e4ed1 |
| 9 | job.yml f9564f18b08a45d17328af43652645771d3498471820c858b8c9013a193e1412 | 9 | job.yml f9564f18b08a45d17328af43652645771d3498471820c858b8c9013a193e1412 |
| 10 | libqpdf/qpdf/auto_job_decl.hh 7844eba58edffb9494b19e8eca6fd59a24d6e152ca606c3b07da569f753df2da | 10 | libqpdf/qpdf/auto_job_decl.hh 7844eba58edffb9494b19e8eca6fd59a24d6e152ca606c3b07da569f753df2da |
| 11 | -libqpdf/qpdf/auto_job_help.hh 700d7600b34588169c80f3e325e39e592e2f5c1af1cdac16614150ff38424b40 | 11 | +libqpdf/qpdf/auto_job_help.hh 53306e4aef8aaca641c0087bc9e064ada1c44a94b826c0bcac7b4eb0c8c41fd5 |
| 12 | libqpdf/qpdf/auto_job_init.hh fd1635a5ad6ba16b7ae008467145560a59a5ecfd10d29c5ef7cd0d8347747cd2 | 12 | libqpdf/qpdf/auto_job_init.hh fd1635a5ad6ba16b7ae008467145560a59a5ecfd10d29c5ef7cd0d8347747cd2 |
| 13 | libqpdf/qpdf/auto_job_json_decl.hh 06caa46eaf71db8a50c046f91866baa8087745a9474319fb7c86d92634cc8297 | 13 | libqpdf/qpdf/auto_job_json_decl.hh 06caa46eaf71db8a50c046f91866baa8087745a9474319fb7c86d92634cc8297 |
| 14 | libqpdf/qpdf/auto_job_json_init.hh 59545578a2e47c660ff98516ed53f06638be75eb4658e2a09d32cc08e0cb7268 | 14 | libqpdf/qpdf/auto_job_json_init.hh 59545578a2e47c660ff98516ed53f06638be75eb4658e2a09d32cc08e0cb7268 |
| 15 | libqpdf/qpdf/auto_job_schema.hh 5352ef1be1ad7cc6f4f36dab88f2937d278e6bd3a0e2d46259794dc226c8ba6b | 15 | libqpdf/qpdf/auto_job_schema.hh 5352ef1be1ad7cc6f4f36dab88f2937d278e6bd3a0e2d46259794dc226c8ba6b |
| 16 | manual/_ext/qpdf.py 6add6321666031d55ed4aedf7c00e5662bba856dfcd66ccb526563bffefbb580 | 16 | manual/_ext/qpdf.py 6add6321666031d55ed4aedf7c00e5662bba856dfcd66ccb526563bffefbb580 |
| 17 | -manual/cli.rst bbce4cfb662a96c8df0c8563f8065844b77aca7b4ec6385955546b9a455d9953 | 17 | +manual/cli.rst 41ee93f23f46160fe9eaf7c99fd2ab3bd2e0f6792a341a35bdac1a41cb853ed5 |
libqpdf/qpdf/auto_job_help.hh
| @@ -813,7 +813,8 @@ ap.addOptionHelp("--json-key", "json", "limit which keys are in JSON output", R" | @@ -813,7 +813,8 @@ ap.addOptionHelp("--json-key", "json", "limit which keys are in JSON output", R" | ||
| 813 | 813 | ||
| 814 | This option is repeatable. If given, only the specified | 814 | This option is repeatable. If given, only the specified |
| 815 | top-level keys will be included in the JSON output. Otherwise, | 815 | top-level keys will be included in the JSON output. Otherwise, |
| 816 | -all keys will be included. | 816 | +all keys will be included. With --json-output, when not given, |
| 817 | +only the "qpdf" key will appear in the output. | ||
| 817 | )"); | 818 | )"); |
| 818 | ap.addOptionHelp("--json-object", "json", "limit which objects are in JSON", R"(--json-object={trailer|obj[,gen]} | 819 | ap.addOptionHelp("--json-object", "json", "limit which objects are in JSON", R"(--json-object={trailer|obj[,gen]} |
| 819 | 820 |
manual/cli.rst
| @@ -913,7 +913,7 @@ Related Options | @@ -913,7 +913,7 @@ Related Options | ||
| 913 | qpdf will recompress streams with generalized filters using flate | 913 | qpdf will recompress streams with generalized filters using flate |
| 914 | compression, effectively eliminating LZW and ASCII-based filters. | 914 | compression, effectively eliminating LZW and ASCII-based filters. |
| 915 | This is usually desirable behavior but can be disabled with | 915 | This is usually desirable behavior but can be disabled with |
| 916 | - ``--decode-level=none``. Note that ``--decode-level=node`` is the | 916 | + ``--decode-level=none``. Note that ``--decode-level=none`` is the |
| 917 | default when :qpdf:ref:`--json-output` is specified, but it can be | 917 | default when :qpdf:ref:`--json-output` is specified, but it can be |
| 918 | overridden in that case as well. | 918 | overridden in that case as well. |
| 919 | 919 | ||
| @@ -3197,7 +3197,8 @@ Related Options | @@ -3197,7 +3197,8 @@ Related Options | ||
| 3197 | Starting with qpdf 11, when this option is specified, an output | 3197 | Starting with qpdf 11, when this option is specified, an output |
| 3198 | file is optional (for backward compatibility) and defaults to | 3198 | file is optional (for backward compatibility) and defaults to |
| 3199 | standard output. You may specify an output file to write the JSON | 3199 | standard output. You may specify an output file to write the JSON |
| 3200 | - to a file rather than standard output. | 3200 | + to a file rather than standard output. (Example: ``qpdf --json |
| 3201 | + in.pdf out.json``) | ||
| 3201 | 3202 | ||
| 3202 | Stream data is only included if :qpdf:ref:`--json-output` is | 3203 | Stream data is only included if :qpdf:ref:`--json-output` is |
| 3203 | specified or if a value other than ``none`` is passed to | 3204 | specified or if a value other than ``none`` is passed to |
| @@ -3225,14 +3226,16 @@ Related Options | @@ -3225,14 +3226,16 @@ Related Options | ||
| 3225 | 3226 | ||
| 3226 | This option is repeatable. If given, only the specified | 3227 | This option is repeatable. If given, only the specified |
| 3227 | top-level keys will be included in the JSON output. Otherwise, | 3228 | top-level keys will be included in the JSON output. Otherwise, |
| 3228 | - all keys will be included. | 3229 | + all keys will be included. With --json-output, when not given, |
| 3230 | + only the "qpdf" key will appear in the output. | ||
| 3229 | 3231 | ||
| 3230 | This option is repeatable. If given, only the specified top-level | 3232 | This option is repeatable. If given, only the specified top-level |
| 3231 | keys will be included in the JSON output. Otherwise, all keys will | 3233 | keys will be included in the JSON output. Otherwise, all keys will |
| 3232 | - be included. ``version`` and ``parameters`` will always appear in | ||
| 3233 | - the output. If not given, all keys will be included, unless | 3234 | + be included. If not given, all keys will be included, unless |
| 3234 | :qpdf:ref:`--json-output` was specified, in which case, only the | 3235 | :qpdf:ref:`--json-output` was specified, in which case, only the |
| 3235 | - ``"qpdf"`` key will be included by default. | 3236 | + ``"qpdf"`` key will be included by default. If |
| 3237 | + :qpdf:ref:`--json-output` was not given, the ``version`` and | ||
| 3238 | + ``parameters`` keys will always appear in the output. | ||
| 3236 | 3239 | ||
| 3237 | .. qpdf:option:: --json-object={trailer|obj[,gen]} | 3240 | .. qpdf:option:: --json-object={trailer|obj[,gen]} |
| 3238 | 3241 | ||
| @@ -3311,8 +3314,8 @@ Related Options | @@ -3311,8 +3314,8 @@ Related Options | ||
| 3311 | output, but you can add additional keys with | 3314 | output, but you can add additional keys with |
| 3312 | :qpdf:ref:`--json-key`. | 3315 | :qpdf:ref:`--json-key`. |
| 3313 | 3316 | ||
| 3314 | - - Excludes the ``"version"`` and ``"parameters"`` keys from the | ||
| 3315 | - JSON output. | 3317 | + - The ``"version"`` and ``"parameters"`` keys will be excluded from |
| 3318 | + the JSON output. | ||
| 3316 | 3319 | ||
| 3317 | If you want to look at the contents of streams easily as you would | 3320 | If you want to look at the contents of streams easily as you would |
| 3318 | in QDF mode (see :ref:`qdf`), you can use | 3321 | in QDF mode (see :ref:`qdf`), you can use |
manual/conf.py
| @@ -35,6 +35,7 @@ latex_elements = { | @@ -35,6 +35,7 @@ latex_elements = { | ||
| 35 | 'preamble': r''' | 35 | 'preamble': r''' |
| 36 | \sphinxDUC{2264}{$\leq$} | 36 | \sphinxDUC{2264}{$\leq$} |
| 37 | \sphinxDUC{2265}{$\geq$} | 37 | \sphinxDUC{2265}{$\geq$} |
| 38 | +\sphinxDUC{03C0}{$\pi$} | ||
| 38 | ''', | 39 | ''', |
| 39 | } | 40 | } |
| 40 | highlight_language = 'none' | 41 | highlight_language = 'none' |
manual/json.rst
| @@ -24,28 +24,33 @@ represents the contents of a PDF file. This is distinct from the | @@ -24,28 +24,33 @@ represents the contents of a PDF file. This is distinct from the | ||
| 24 | interacting with qpdf the way the command-line tool does. For | 24 | interacting with qpdf the way the command-line tool does. For |
| 25 | information about that, see :ref:`qpdf-job`. | 25 | information about that, see :ref:`qpdf-job`. |
| 26 | 26 | ||
| 27 | -The qpdf JSON format is specific to qpdf. With JSON version 2, the | ||
| 28 | -:qpdf:ref:`--json` command-line flag causes creation of a JSON | ||
| 29 | -representation of all the objects in a PDF file. This includes an | ||
| 30 | -unambiguous representation of the PDF object structure and also | ||
| 31 | -provides JSON-formatted summaries of other information about the file. | ||
| 32 | -This functionality is built into ``QPDFJob`` and can be accessed from | ||
| 33 | -the ``qpdf`` command-line tool or from the ``QPDFJob`` C or C++ API. | ||
| 34 | - | ||
| 35 | -By default, stream data is omitted, but it can be included by | ||
| 36 | -specifying the :qpdf:ref:`--json-stream-data` option. With stream data | ||
| 37 | -included, the generated JSON file completely represents a PDF file. | ||
| 38 | -You can think of this as using JSON as an *alternative syntax* for | ||
| 39 | -representing a PDF file. Using qpdf JSON, it is possible to convert a | ||
| 40 | -PDF file to JSON, manipulate the structure or contents of the objects | ||
| 41 | -at a low level, and convert the results back to a PDF file. This | ||
| 42 | -functionality can be accessed from the command-line with the | ||
| 43 | -:qpdf:ref:`--json-input`, and :qpdf:ref:`--update-from-json` flags, or | ||
| 44 | -from the API using the ``QPDF::writeJSON``, ``QPDF::createFromJSON``, | ||
| 45 | -and ``QPDF::updateFromJSON`` methods. The :qpdf:ref:`--json-output` | ||
| 46 | -flag changes a handful of defaults so that the resulting JSON is as | ||
| 47 | -close as possible to the original input and is ready for being | ||
| 48 | -converted back to PDF. | 27 | +The qpdf JSON format is specific to qpdf. The :qpdf:ref:`--json` |
| 28 | +command-line flag causes creation of a JSON representation of the objects | ||
| 29 | +in a PDF file along with JSON-formatted summaries of other information | ||
| 30 | +about the file. This functionality is built into ``QPDFJob`` and can | ||
| 31 | +be accessed from the ``qpdf`` command-line tool or from the | ||
| 32 | +``QPDFJob`` C or C++ API. | ||
| 33 | + | ||
| 34 | +Starting with qpdf JSON version 2, from qpdf 11.0.0, the JSON output | ||
| 35 | +includes an unambiguous and complete representation of the PDF objects | ||
| 36 | +and header. The information without the JSON-formatted summaries of | ||
| 37 | +other information is also available using the ``QPDF::writeJSON`` | ||
| 38 | +method. | ||
| 39 | + | ||
| 40 | +By default, stream data is omitted from the JSON data, but it can be | ||
| 41 | +included by specifying the :qpdf:ref:`--json-stream-data` option. With | ||
| 42 | +stream data included, the generated JSON file completely represents a | ||
| 43 | +PDF file. You can think of this as using JSON as an *alternative | ||
| 44 | +syntax* for representing a PDF file. Using qpdf JSON, it is possible | ||
| 45 | +to convert a PDF file to JSON, manipulate the structure or contents of | ||
| 46 | +the objects at a low level, and convert the results back to a PDF | ||
| 47 | +file. This functionality can be accessed from the command-line with | ||
| 48 | +the :qpdf:ref:`--json-input`, and :qpdf:ref:`--update-from-json` | ||
| 49 | +flags, or from the API using the ``QPDF::createFromJSON``, and | ||
| 50 | +``QPDF::updateFromJSON`` methods. The :qpdf:ref:`--json-output` flag | ||
| 51 | +changes a handful of defaults so that the resulting JSON is as close | ||
| 52 | +as possible to the original input and is ready for being converted | ||
| 53 | +back to PDF. | ||
| 49 | 54 | ||
| 50 | .. _json-terminology: | 55 | .. _json-terminology: |
| 51 | 56 | ||
| @@ -71,7 +76,8 @@ This manual is not entirely consistent about its use of *dictionary* | @@ -71,7 +76,8 @@ This manual is not entirely consistent about its use of *dictionary* | ||
| 71 | vs. *object* because sometimes one term or another is clearer in | 76 | vs. *object* because sometimes one term or another is clearer in |
| 72 | context. Just be aware of the ambiguity when reading the manual. We | 77 | context. Just be aware of the ambiguity when reading the manual. We |
| 73 | frequently use the term *dictionary* to refer to a JSON object because | 78 | frequently use the term *dictionary* to refer to a JSON object because |
| 74 | -of the consistency with PDF terminology. | 79 | +of the consistency with PDF terminology, particularly when referring to |
| 80 | +a dictionary that contains information about PDF objects. | ||
| 75 | 81 | ||
| 76 | .. _what-qpdf-json-is-not: | 82 | .. _what-qpdf-json-is-not: |
| 77 | 83 | ||
| @@ -121,12 +127,14 @@ qpdf JSON Object Representation | @@ -121,12 +127,14 @@ qpdf JSON Object Representation | ||
| 121 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 127 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
| 122 | 128 | ||
| 123 | This section describes the representation of PDF objects in qpdf JSON | 129 | This section describes the representation of PDF objects in qpdf JSON |
| 124 | -version 2. PDF objects are represented within the ``"qpdf"`` entry of | ||
| 125 | -a qpdf JSON file. The ``"qpdf"`` entry is a two-element array. The | ||
| 126 | -first element is a dictionary containing header-like information about | ||
| 127 | -the file such as the PDF version. The second element is a dictionary | ||
| 128 | -containing all the objects in the PDF file. We refer to this as the | ||
| 129 | -*objects dictionary*. | 130 | +version 2. An example appears in :ref:`json.example`. |
| 131 | + | ||
| 132 | +PDF objects are represented within the ``"qpdf"`` entry of a qpdf JSON | ||
| 133 | +file. The ``"qpdf"`` entry is a two-element array. The first element | ||
| 134 | +is a dictionary containing header-like information about the file such | ||
| 135 | +as the PDF version. The second element is a dictionary containing all | ||
| 136 | +the objects in the PDF file. We refer to this as the *objects | ||
| 137 | +dictionary*. | ||
| 130 | 138 | ||
| 131 | The first element contains the following keys: | 139 | The first element contains the following keys: |
| 132 | 140 | ||
| @@ -136,17 +144,19 @@ The first element contains the following keys: | @@ -136,17 +144,19 @@ The first element contains the following keys: | ||
| 136 | - ``"pdfversion"`` -- a string containing PDF version as indicated in | 144 | - ``"pdfversion"`` -- a string containing PDF version as indicated in |
| 137 | the PDF header (e.g. ``"1.7"``, ``"2.0"``) | 145 | the PDF header (e.g. ``"1.7"``, ``"2.0"``) |
| 138 | 146 | ||
| 139 | -- ``pushedinheritedpageresources`` -- a boolean indicating whether | ||
| 140 | - the library pushed inherited resources down to the page level. | ||
| 141 | - Certain library calls cause this to happen, and qpdf needs to know | ||
| 142 | - when reading a JSON file back in whether it should do this as it may | ||
| 143 | - cause certain objects to be renumbered. | 147 | +- ``pushedinheritedpageresources`` -- a boolean indicating whether the |
| 148 | + library pushed inherited resources down to the page level. Certain | ||
| 149 | + library calls cause this to happen, and qpdf needs to know when | ||
| 150 | + reading a JSON file back in whether it should do this as it may | ||
| 151 | + cause certain objects to be renumbered. This field is ignored when | ||
| 152 | + :qpdf:ref:`--update-from-json` was not given. | ||
| 144 | 153 | ||
| 145 | - ``calledgetallpages`` -- a boolean indicating whether | 154 | - ``calledgetallpages`` -- a boolean indicating whether |
| 146 | ``getAllPages`` was called prior to writing the JSON output. This | 155 | ``getAllPages`` was called prior to writing the JSON output. This |
| 147 | method causes page tree repair to occur, which may renumber some | 156 | method causes page tree repair to occur, which may renumber some |
| 148 | objects (in very rare cases of corrupted page trees), so qpdf needs | 157 | objects (in very rare cases of corrupted page trees), so qpdf needs |
| 149 | - to know this information when reading a JSON file back in. | 158 | + to know this information when reading a JSON file back in. This |
| 159 | + field is ignored when :qpdf:ref:`--update-from-json` was not given. | ||
| 150 | 160 | ||
| 151 | - ``"maxobjectid"`` -- a number indicating the object ID of the | 161 | - ``"maxobjectid"`` -- a number indicating the object ID of the |
| 152 | highest numbered object in the file. This is provided to make it | 162 | highest numbered object in the file. This is provided to make it |
| @@ -162,12 +172,12 @@ The first element contains the following keys: | @@ -162,12 +172,12 @@ The first element contains the following keys: | ||
| 162 | if objects are removed from a PDF file.) | 172 | if objects are removed from a PDF file.) |
| 163 | 173 | ||
| 164 | The second element is the objects dictionary. Each key in the objects | 174 | The second element is the objects dictionary. Each key in the objects |
| 165 | -dictionary is either ``"trailer"`` or a string of the form ``"obj:O G | ||
| 166 | -R"`` where ``O`` and ``G`` are the object and generation numbers and | ||
| 167 | -``R`` is the literal string ``R``. This is the PDF syntax for the | ||
| 168 | -indirect object reference prepended by ``obj:``. The value, | ||
| 169 | -representing the object itself, is a JSON object whose structure is | ||
| 170 | -described below. | 175 | +dictionary is either ``"trailer"`` or a string of the form |
| 176 | +:samp:`"obj:{O} {G} R"` where :samp:`{O}` and :samp:`{G}` are the | ||
| 177 | +object and generation numbers and ``R`` is the literal string ``R``. | ||
| 178 | +This is the PDF syntax for the indirect object reference prepended by | ||
| 179 | +``obj:``. The value, representing the object itself, is a JSON object | ||
| 180 | +whose structure is described below. | ||
| 171 | 181 | ||
| 172 | Top-level Stream Objects | 182 | Top-level Stream Objects |
| 173 | Stream objects are represented as a JSON object with the single key | 183 | Stream objects are represented as a JSON object with the single key |
| @@ -234,11 +244,11 @@ Object Values | @@ -234,11 +244,11 @@ Object Values | ||
| 234 | JSON as ``"/text/plain"`` and in PDF as ``"/text#2fplain"``. | 244 | JSON as ``"/text/plain"`` and in PDF as ``"/text#2fplain"``. |
| 235 | 245 | ||
| 236 | - Indirect object references are represented as JSON strings that | 246 | - Indirect object references are represented as JSON strings that |
| 237 | - look like a PDF indirect object reference and have the form ``"O G | ||
| 238 | - R"`` where ``O`` and ``G`` are the object and generation numbers | ||
| 239 | - and ``R`` is the literal string ``R``. For example, ``"3 0 R"`` | ||
| 240 | - would represent a reference to the object with object ID 3 and | ||
| 241 | - generation 0. | 247 | + look like a PDF indirect object reference and have the form |
| 248 | + :samp:`"{O} {G} R"` where :samp:`{O}` and :samp:`{G}` are the | ||
| 249 | + object and generation numbers and ``R`` is the literal string | ||
| 250 | + ``R``. For example, ``"3 0 R"`` would represent a reference to the | ||
| 251 | + object with object ID 3 and generation 0. | ||
| 242 | 252 | ||
| 243 | - PDF strings are represented as JSON strings in one of two ways: | 253 | - PDF strings are represented as JSON strings in one of two ways: |
| 244 | 254 | ||
| @@ -288,11 +298,11 @@ Object Values | @@ -288,11 +298,11 @@ Object Values | ||
| 288 | 298 | ||
| 289 | Note that writing JSON output is done by ``QPDF``, not ``QPDFWriter``. | 299 | Note that writing JSON output is done by ``QPDF``, not ``QPDFWriter``. |
| 290 | As such, none of the things ``QPDFWriter`` does apply. This includes | 300 | As such, none of the things ``QPDFWriter`` does apply. This includes |
| 291 | -recompression of streams, renumbering of objects, anything to do with | ||
| 292 | -object streams (which are not represented by qpdf JSON at all since | ||
| 293 | -they are PDF syntax, not semantics), encryption, decryption, | ||
| 294 | -linearization, QDF mode, etc. See :ref:`rewriting` for a more in-depth | ||
| 295 | -discussion. | 301 | +recompression of streams, renumbering of objects, removal of |
| 302 | +unreferenced objects, anything to do with object streams (which are | ||
| 303 | +not represented by qpdf JSON at all since they are PDF syntax, not | ||
| 304 | +semantics), encryption, decryption, linearization, QDF mode, etc. See | ||
| 305 | +:ref:`rewriting` for a more in-depth discussion. | ||
| 296 | 306 | ||
| 297 | .. _json.example: | 307 | .. _json.example: |
| 298 | 308 | ||
| @@ -311,36 +321,55 @@ qpdf JSON format. | @@ -311,36 +321,55 @@ qpdf JSON format. | ||
| 311 | "pdfversion": "1.3", | 321 | "pdfversion": "1.3", |
| 312 | "pushedinheritedpageresources": false, | 322 | "pushedinheritedpageresources": false, |
| 313 | "calledgetallpages": false, | 323 | "calledgetallpages": false, |
| 314 | - "maxobjectid": 5 | 324 | + "maxobjectid": 6 |
| 315 | }, | 325 | }, |
| 316 | { | 326 | { |
| 317 | "obj:1 0 R": { | 327 | "obj:1 0 R": { |
| 318 | "value": { | 328 | "value": { |
| 319 | - "/Pages": "2 0 R", | 329 | + "/Pages": "3 0 R", |
| 320 | "/Type": "/Catalog" | 330 | "/Type": "/Catalog" |
| 321 | } | 331 | } |
| 322 | }, | 332 | }, |
| 323 | "obj:2 0 R": { | 333 | "obj:2 0 R": { |
| 324 | "value": { | 334 | "value": { |
| 335 | + "/Author": "u:Digits of π", | ||
| 336 | + "/CreationDate": "u:D:20220731155308-05'00'", | ||
| 337 | + "/Creator": "u:A person typing in Emacs", | ||
| 338 | + "/Keywords": "u:potato, example", | ||
| 339 | + "/ModDate": "u:D:20220731155308-05'00'", | ||
| 340 | + "/Producer": "u:qpdf", | ||
| 341 | + "/Subject": "u:Example", | ||
| 342 | + "/Title": "u:Something potato-related" | ||
| 343 | + } | ||
| 344 | + }, | ||
| 345 | + "obj:3 0 R": { | ||
| 346 | + "value": { | ||
| 325 | "/Count": 1, | 347 | "/Count": 1, |
| 326 | - "/Kids": [ "3 0 R" ], | 348 | + "/Kids": [ |
| 349 | + "4 0 R" | ||
| 350 | + ], | ||
| 327 | "/Type": "/Pages" | 351 | "/Type": "/Pages" |
| 328 | } | 352 | } |
| 329 | }, | 353 | }, |
| 330 | - "obj:3 0 R": { | 354 | + "obj:4 0 R": { |
| 331 | "value": { | 355 | "value": { |
| 332 | - "/Contents": "4 0 R", | ||
| 333 | - "/MediaBox": [ 0, 0, 612, 792 ], | ||
| 334 | - "/Parent": "2 0 R", | 356 | + "/Contents": "5 0 R", |
| 357 | + "/MediaBox": [ | ||
| 358 | + 0, | ||
| 359 | + 0, | ||
| 360 | + 612, | ||
| 361 | + 792 | ||
| 362 | + ], | ||
| 363 | + "/Parent": "3 0 R", | ||
| 335 | "/Resources": { | 364 | "/Resources": { |
| 336 | "/Font": { | 365 | "/Font": { |
| 337 | - "/F1": "5 0 R" | 366 | + "/F1": "6 0 R" |
| 338 | } | 367 | } |
| 339 | }, | 368 | }, |
| 340 | "/Type": "/Page" | 369 | "/Type": "/Page" |
| 341 | } | 370 | } |
| 342 | }, | 371 | }, |
| 343 | - "obj:4 0 R": { | 372 | + "obj:5 0 R": { |
| 344 | "stream": { | 373 | "stream": { |
| 345 | "data": "eJxzCuFSUNB3M1QwMlEISQOyzY2AyEAhJAXI1gjIL0ksyddUCMnicg3hAgDLAQnI", | 374 | "data": "eJxzCuFSUNB3M1QwMlEISQOyzY2AyEAhJAXI1gjIL0ksyddUCMnicg3hAgDLAQnI", |
| 346 | "dict": { | 375 | "dict": { |
| @@ -348,7 +377,7 @@ qpdf JSON format. | @@ -348,7 +377,7 @@ qpdf JSON format. | ||
| 348 | } | 377 | } |
| 349 | } | 378 | } |
| 350 | }, | 379 | }, |
| 351 | - "obj:5 0 R": { | 380 | + "obj:6 0 R": { |
| 352 | "value": { | 381 | "value": { |
| 353 | "/BaseFont": "/Helvetica", | 382 | "/BaseFont": "/Helvetica", |
| 354 | "/Encoding": "/WinAnsiEncoding", | 383 | "/Encoding": "/WinAnsiEncoding", |
| @@ -360,10 +389,11 @@ qpdf JSON format. | @@ -360,10 +389,11 @@ qpdf JSON format. | ||
| 360 | "value": { | 389 | "value": { |
| 361 | "/ID": [ | 390 | "/ID": [ |
| 362 | "b:98b5a26966fba4d3a769b715b2558da6", | 391 | "b:98b5a26966fba4d3a769b715b2558da6", |
| 363 | - "b:98b5a26966fba4d3a769b715b2558da6" | 392 | + "b:6bea23330e0b9ff0ddb47b6757fb002e" |
| 364 | ], | 393 | ], |
| 394 | + "/Info": "2 0 R", | ||
| 365 | "/Root": "1 0 R", | 395 | "/Root": "1 0 R", |
| 366 | - "/Size": 6 | 396 | + "/Size": 7 |
| 367 | } | 397 | } |
| 368 | } | 398 | } |
| 369 | } | 399 | } |
| @@ -410,9 +440,6 @@ Here are some important things to know about qpdf JSON input. | @@ -410,9 +440,6 @@ Here are some important things to know about qpdf JSON input. | ||
| 410 | - ``"maxobjectid"`` is ignored, so it is not necessary to update it | 440 | - ``"maxobjectid"`` is ignored, so it is not necessary to update it |
| 411 | when adding new objects. | 441 | when adding new objects. |
| 412 | 442 | ||
| 413 | - - ``"calledgetallpages"`` and ``"pushedinheritedpageresources"`` are | ||
| 414 | - treated as false if omitted. | ||
| 415 | - | ||
| 416 | - ``"/Length"`` is ignored in all stream dictionaries. qpdf doesn't | 443 | - ``"/Length"`` is ignored in all stream dictionaries. qpdf doesn't |
| 417 | put it there when it creates JSON output, and it is not necessary | 444 | put it there when it creates JSON output, and it is not necessary |
| 418 | to add it. | 445 | to add it. |
| @@ -420,16 +447,24 @@ Here are some important things to know about qpdf JSON input. | @@ -420,16 +447,24 @@ Here are some important things to know about qpdf JSON input. | ||
| 420 | - ``"/Size"`` is ignored if it appears in a trailer dictionary as | 447 | - ``"/Size"`` is ignored if it appears in a trailer dictionary as |
| 421 | that is always recomputed by ``QPDFWriter``. | 448 | that is always recomputed by ``QPDFWriter``. |
| 422 | 449 | ||
| 423 | - - Unknown keys at the to top level of the file, within ``objects``, | ||
| 424 | - at the top level of each individual object (inside the object that | ||
| 425 | - has the ``"value"`` or ``"stream"`` key) and directly within | ||
| 426 | - ``"stream"`` are ignored for future compatibility. This includes | ||
| 427 | - other top-level keys generated by ``qpdf`` itself (such as | ||
| 428 | - ``"pages"``). As such, those keys don't have to be consistent with | ||
| 429 | - the ``"qpdf"`` key if modifying a JSON file for conversion back to | ||
| 430 | - PDF. If you wish to store application-specific metadata, you can | ||
| 431 | - do so by adding a key whose name starts with ``x-``. qpdf is | ||
| 432 | - guaranteed not to add any of its own keys that starts with ``x-``. | 450 | + - Unknown keys at the top level of the file, within ``"qpdf"``, and |
| 451 | + at the top level of each individual PDF object (inside the | ||
| 452 | + dictionary that has the ``"value"`` or ``"stream"`` key) and | ||
| 453 | + directly within ``"stream"`` are ignored for future compatibility. | ||
| 454 | + This includes other top-level keys generated by ``qpdf`` itself | ||
| 455 | + (such as ``"pages"``). As such, those keys don't have to be | ||
| 456 | + consistent with the ``"qpdf"`` key if modifying a JSON file for | ||
| 457 | + conversion back to PDF. If you wish to store application-specific | ||
| 458 | + metadata, you can do so by adding a key whose name starts with | ||
| 459 | + ``x-``. qpdf is guaranteed not to add any of its own keys that | ||
| 460 | + starts with ``x-``. Note that any ``"version"`` key at the top | ||
| 461 | + level is ignored. The JSON version is obtained from the | ||
| 462 | + ``"jsonversion"`` key of the first element of the ``"qpdf"`` | ||
| 463 | + field. | ||
| 464 | + | ||
| 465 | +- The values of ``"calledgetallpages"`` and | ||
| 466 | + ``"pushedinheritedpageresources"`` are ignored when creating a file. | ||
| 467 | + When updating a file, they are treated as ``false`` if omitted. | ||
| 433 | 468 | ||
| 434 | - When qpdf reads a PDF file, the internal object numbers are always | 469 | - When qpdf reads a PDF file, the internal object numbers are always |
| 435 | preserved. However, when qpdf writes a file using ``QPDFWriter``, | 470 | preserved. However, when qpdf writes a file using ``QPDFWriter``, |
| @@ -465,14 +500,14 @@ Here are some important things to know about qpdf JSON input. | @@ -465,14 +500,14 @@ Here are some important things to know about qpdf JSON input. | ||
| 465 | ``QPDF::updateFromJSON``), existing objects are updated in place. | 500 | ``QPDF::updateFromJSON``), existing objects are updated in place. |
| 466 | This has the following implications: | 501 | This has the following implications: |
| 467 | 502 | ||
| 468 | - - You may omit both ``"data"`` and ``"datafile"`` if the object you | ||
| 469 | - are updating is already a stream. In that case the original stream | 503 | + - If the object you are updating is a stream, you may omit both |
| 504 | + ``"data"`` and ``"datafile"``. In that case the original stream | ||
| 470 | data is preserved. You must always provide a stream dictionary, | 505 | data is preserved. You must always provide a stream dictionary, |
| 471 | but it may be empty. Note that an empty stream dictionary will | 506 | but it may be empty. Note that an empty stream dictionary will |
| 472 | clear the old dictionary. There is no way to indicate that an old | 507 | clear the old dictionary. There is no way to indicate that an old |
| 473 | stream dictionary should be left alone, so if your intention is to | 508 | stream dictionary should be left alone, so if your intention is to |
| 474 | - replace the stream data and preserve the dictionary, the | ||
| 475 | - original dictionary must appear in the JSON file. | 509 | + replace the stream data and preserve the dictionary, the original |
| 510 | + dictionary must appear in the JSON file. | ||
| 476 | 511 | ||
| 477 | - You can change one object type to another object type including | 512 | - You can change one object type to another object type including |
| 478 | replacing a stream with a non-stream or a non-stream with a | 513 | replacing a stream with a non-stream or a non-stream with a |
| @@ -577,11 +612,14 @@ Compatibility | @@ -577,11 +612,14 @@ Compatibility | ||
| 577 | change would be any change that involves removal of a key, a change | 612 | change would be any change that involves removal of a key, a change |
| 578 | to the format of data pointed to by a key, or a semantic change | 613 | to the format of data pointed to by a key, or a semantic change |
| 579 | that requires a different interpretation of a previously existing | 614 | that requires a different interpretation of a previously existing |
| 580 | - key. | 615 | + key. Note that, starting with version 2, the JSON version also |
| 616 | + appears in the ``"jsonversion"`` field of the first element of | ||
| 617 | + the ``"qpdf"`` field. | ||
| 581 | 618 | ||
| 582 | - With a specific qpdf JSON version, future versions of qpdf are free | ||
| 583 | - to add additional keys but not to remove keys or change the type of | ||
| 584 | - object that a key points to. | 619 | + Within a specific qpdf JSON version, future versions of qpdf are |
| 620 | + free to add additional keys but not to remove keys or change the | ||
| 621 | + type of object that a key points to. That means that consumers of | ||
| 622 | + qpdf JSON should ignore keys they don't know about. | ||
| 585 | 623 | ||
| 586 | Documentation | 624 | Documentation |
| 587 | The :command:`qpdf` command can be invoked with the | 625 | The :command:`qpdf` command can be invoked with the |
| @@ -634,7 +672,13 @@ Directness and Simplicity | @@ -634,7 +672,13 @@ Directness and Simplicity | ||
| 634 | functions in that it allows you to look at certain aspects of the | 672 | functions in that it allows you to look at certain aspects of the |
| 635 | PDF file without having to understand all the nuances of the PDF | 673 | PDF file without having to understand all the nuances of the PDF |
| 636 | specification, while the raw objects allow you to mine the PDF for | 674 | specification, while the raw objects allow you to mine the PDF for |
| 637 | - anything that the higher-level interfaces are lacking. | 675 | + anything that the higher-level interfaces are lacking. It is |
| 676 | + especially useful to create a JSON file with the ``"pages"`` and | ||
| 677 | + ``"qpdf"`` keys and to use the ``"pages"`` information to find a | ||
| 678 | + page rather than navigating the pages tree manually. This can be | ||
| 679 | + done safely, and changes can be made to the objects dictionary without | ||
| 680 | + worrying about keeping ``"pages"`` up to date since it is ignored | ||
| 681 | + when reading the file back in. | ||
| 638 | 682 | ||
| 639 | .. _json.considerations: | 683 | .. _json.considerations: |
| 640 | 684 | ||
| @@ -741,10 +785,11 @@ version 2. | @@ -741,10 +785,11 @@ version 2. | ||
| 741 | dictionary within ``"objects"``, and the PDF version was not | 785 | dictionary within ``"objects"``, and the PDF version was not |
| 742 | captured at all. | 786 | captured at all. |
| 743 | 787 | ||
| 744 | -- Within the objects dictionary, keys are now ``"obj:O G R"`` where | ||
| 745 | - ``O`` and ``G`` are the object and generation number. ``"trailer"`` | ||
| 746 | - remains the key for the trailer dictionary. In v1, the ``obj:`` | ||
| 747 | - prefix was not present. The rationale for this change is as follows: | 788 | +- Within the objects dictionary, keys are now :samp:`"obj:{O} {G} R"` |
| 789 | + where :samp:`{O}` and :samp:`{G}` are the object and generation | ||
| 790 | + number. ``"trailer"`` remains the key for the trailer dictionary. In | ||
| 791 | + v1, the ``obj:`` prefix was not present. The rationale for this | ||
| 792 | + change is as follows: | ||
| 748 | 793 | ||
| 749 | - Having a unique prefix (``obj:``) makes it much easier to search | 794 | - Having a unique prefix (``obj:``) makes it much easier to search |
| 750 | in the JSON file for the definition of an object | 795 | in the JSON file for the definition of an object |
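
The layout described in the manual/json.rst hunks above (a two-element ``"qpdf"`` array, object keys of the form ``"obj:O G R"``, and unknown keys being ignored) can be illustrated with a small consumer. This is only a sketch under stated assumptions: the sample document is a trimmed, hand-written imitation of the manual's example rather than real qpdf output, and the names `SAMPLE`, `OBJ_KEY`, and `summarize` are hypothetical helpers, not part of qpdf.

```python
import json
import re

# Hand-written, trimmed sample modelled on the example in manual/json.rst.
# It is NOT real qpdf output; the stream dictionary is left empty on purpose.
SAMPLE = """
{
  "version": 2,
  "x-app-note": "application-specific key; a consumer should ignore it",
  "qpdf": [
    {"jsonversion": 2, "pdfversion": "1.3", "maxobjectid": 6},
    {
      "obj:1 0 R": {"value": {"/Pages": "3 0 R", "/Type": "/Catalog"}},
      "obj:5 0 R": {"stream": {"dict": {}}},
      "trailer": {"value": {"/Root": "1 0 R", "/Size": 7}}
    }
  ]
}
"""

# Object keys are the PDF indirect reference syntax prefixed by "obj:".
OBJ_KEY = re.compile(r"^obj:(\d+) (\d+) R$")


def summarize(doc):
    """Return the JSON version and the kind of each object in a qpdf JSON v2 document."""
    # The "qpdf" entry is a two-element array: header dictionary, objects dictionary.
    header, objects = doc["qpdf"]
    result = {
        # The version comes from "jsonversion" in the header, not the top-level "version".
        "jsonversion": header.get("jsonversion"),
        "has_trailer": "trailer" in objects,
        "objects": {},
    }
    for key, entry in objects.items():
        match = OBJ_KEY.match(key)
        if match is None:
            continue  # "trailer" or a key this sketch does not know about
        kind = "stream" if "stream" in entry else "value"
        result["objects"][(int(match.group(1)), int(match.group(2)))] = kind
    return result


summary = summarize(json.loads(SAMPLE))
print(summary)
```

The sketch skips keys it does not recognize instead of raising an error, which follows the compatibility note in the diff that consumers of qpdf JSON should ignore keys they don't know about.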