Commit 0bd908b550603a6bcc399a825a170a1263378b22
1 parent b7bbf12e
Update documentation for qpdf JSON v2
Showing 14 changed files with 903 additions and 419 deletions
TODO
| ... | ... | @@ -2,14 +2,13 @@ |
| 2 | 2 | Next |
| 3 | 3 | ==== |
| 4 | 4 | |
| 5 | +Before Release: | |
| 6 | + | |
| 5 | 7 | * At next release, hide release-qpdf-10.6.3.0cmake* versions at readthedocs |
| 6 | 8 | * Stay on top of https://github.com/pikepdf/pikepdf/pull/315 |
| 7 | 9 | * Release qtest with updates to qtest-driver and copy back into qpdf |
| 8 | 10 | |
| 9 | -In order: | |
| 10 | -* json v2 | |
| 11 | - | |
| 12 | -Other (do in any order): | |
| 11 | +Pending changes: | |
| 13 | 12 | |
| 14 | 13 | * Good C API for json v2 |
| 15 | 14 | * QPDFPagesTree -- avoid ever flattening the pages tree. |
| ... | ... | @@ -50,180 +49,10 @@ Other (do in any order): |
| 50 | 49 | * Rework tests so that nothing is written into the source directory. |
| 51 | 50 | Ideally then the entire build could be done with a read-only |
| 52 | 51 | source tree. |
| 52 | +* Consider adding fuzzer code for JSON | |
| 53 | 53 | |
| 54 | 54 | Soon: Break ground on "Document-level work" |
| 55 | 55 | |
| 56 | -Output JSON v2 | |
| 57 | -============== | |
| 58 | - | |
| 59 | -Remaining work: | |
| 60 | - | |
| 61 | -* Make sure all the information from informational options is | |
| 62 | - available in the json output. | |
| 63 | - | |
| 64 | - * --check: add but maybe not by default? | |
| 65 | - | |
| 66 | - * --show-linearization: add but maybe not by default? Also figure | |
| 67 | - out whether warnings reported for some of the PDF specs (1.7) are | |
| 68 | - qpdf problems. This may not be worth adding in the first | |
| 69 | - increment. | |
| 70 | - | |
| 71 | - * --show-xref: add | |
| 72 | - | |
| 73 | -* Consider having --check, --show-encryption, etc., just select the | |
| 74 | - right keys when in json mode. I don't think I want check on by | |
| 75 | - default, so that might be different. | |
| 76 | - | |
| 77 | -* Consider having warnings be included in the json in a "warnings" key | |
| 78 | - in json mode. | |
| 79 | - | |
| 80 | -Notes for documentation: | |
| 81 | - | |
| 82 | -* Find all mentions of json in the manual and update. | |
| 83 | - | |
| 84 | -* Document typo fix in encrypt in release notes along with any other | |
| 85 | - non-compatible json 2 changes. Scrutinize all the output to decide | |
| 86 | - what should change. | |
| 87 | - | |
| 88 | -* Keys other than "qpdf-v2" are ignored so people can stash their own | |
| 89 | - stuff. Unknown keys are ignored at other places for future | |
| 90 | - compatibility. Readers of qpdf json should continue to ignore keys | |
| 91 | - they don't recognize. | |
| 92 | - | |
| 93 | -* Change: names are written in canonical form with a leading slash | |
| 94 | - just as they are treated in the code. In v1, they were written in | |
| 95 | - PDF syntax in the json file. Example: /text#2fplain in pdf will be | |
| 96 | - written as /text/plain in json v2 and as /text#2fplain in json v1. | |
| 97 | - | |
| 98 | -* Document changes to strings, objects, streams, object keys. | |
| 99 | - | |
| 100 | -* CLI: --json-input, --json-output[=version], --update-from-json. With | |
| 101 | - --json-input, the input file is a JSON file instead of a PDF file. | |
| 102 | - It must be complete, meaning that a PDF version must be given, all | |
| 103 | - streams must have exactly one of data or datafile, and a trailer | |
| 104 | - dictionary must be present, even if empty. | |
| 105 | - | |
| 106 | - With --update-from-json, the JSON file updates objects in place. If | |
| 107 | - updating an old stream, if stream data is omitted, the data remains | |
| 108 | - untouched. The dictionary is always required. Remember that | |
| 109 | - QPDFWriter does not preserve object numbers, though --json-output | |
| 110 | - does. Therefore, if you want to update a PDF with a JSON, the input | |
| 111 | - to --update-from-json must be the same PDF as the one that | |
| 112 | - --json-output was run on previously. Otherwise, object numbers won't | |
| 113 | - match. Show this with an example. When updating, | |
| 114 | - | |
| 115 | -* Certain fields are ignored when reading the JSON. This includes | |
| 116 | - maxobjectid, any computed fields in trailer (such as /Size), and all | |
| 117 | - /Length keys in stream dictionaries. There is no need for the user | |
| 118 | - to correct, remove, or otherwise worry about any values those keys | |
| 119 | - might have. The maxobjectid field is present in the original output | |
| 120 | - to assist with adding new objects to the file. | |
| 121 | - | |
| 122 | -* JSON strings within PDF objects: | |
| 123 | - | |
| 124 | - * "n n R" is an indirect object | |
| 125 | - | |
| 126 | - * "/Name" is a name in canonical form with a leading slash (like | |
| 127 | - "/text/plain"), not PDF syntax (like "/text#2fplain"). | |
| 128 | - | |
| 129 | - * "b:hex-digits" is a binary string ("b:feff03c0"). Hex digits may be | |
| 130 | - mixed case. There must be an even number of digits. | |
| 131 | - | |
| 132 | - * "u:utf-8" is a UTF-8 encoded string ("u:π", "u:\u03c0"). UTF-16 | |
| 133 | - surrogate pairs are allowed. These are all equivalent: "u:🥔", | |
| 134 | - "u:\ud83e\udd54", "b:FEFFD83EDD54", "b:efbbbff09fa594". | |
| 135 | - | |
| 136 | - * Both "b:" and "u:" are valid representations of the empty string. | |
| 137 | - | |
| 138 | - * Anything else is an error | |
| 139 | - | |
| 140 | -* Document use of --json-input and --json-output together to show | |
| 141 | - preservation of object numbers. Draw attention to "original object | |
| 142 | - ID" comments in qdf as another way to show it. | |
| 143 | - | |
| 144 | -* Document top-level keys of "qpdf-v2" ("pdfversion", "objects", | |
| 145 | - "maxobjectid") noting that "maxobjectid" is ignored when reading. | |
| 146 | - | |
| 147 | -* Stream data: "data" is base64-encoded stream data. "datafile" is the | |
| 148 | - path to a file (relative path recommended but not required) | |
| 149 | - containing the binary data. As with any PDF representation, the data | |
| 150 | - must be consistent with the filters. --decode-level is honored by | |
| 151 | - --json-output. | |
| 152 | - | |
| 153 | -* Other changes from v1: | |
| 154 | - | |
| 155 | - * in "objects", keys are "obj:o g R" or "trailer" | |
| 156 | - | |
| 157 | - * Non-stream objects are dictionaries with a "value" key whose value | |
| 158 | - is the object. Stream objects are dictionaries with a "stream" key | |
| 159 | - whose value is {"dict": stream-dictionary}. The "/Length" key is | |
| 160 | - omitted from the stream dictionary. | |
| 161 | - | |
| 162 | - * "objectinfo" is gone as it is now possible to tell a stream from a | |
| 163 | - non-stream directly. To get stream data, use the --json-output | |
| 164 | - option. Note about how "pages" may cause the pages tree to be | |
| 165 | - corrected. | |
| 166 | - | |
| 167 | -For non-streams: | |
| 168 | - | |
| 169 | - "obj:o g R": { | |
| 170 | - "value": ... | |
| 171 | - } | |
| 172 | - | |
| 173 | -For streams: | |
| 174 | - | |
| 175 | - "obj:o g R": { | |
| 176 | - "stream": { | |
| 177 | - "dict": { ... stream dictionary ... }, | |
| 178 | - "data": "base64-encoded data", | |
| 179 | - "datafile": "path to base64-encoded data" | |
| 180 | - } | |
| 181 | - } | |
| 182 | - | |
| 183 | -Rationale of "obj:o g R" is that indirect object references are just | |
| 184 | -"o g R", and so code that wants to resolve one can do so easily by | |
| 185 | -just prepending "obj:" and not having to parse or split the string. | |
| 186 | -Having a prefix rather than making the key just "o g R" makes it much | |
| 187 | -easier to search in the JSON for the definition of an object. | |
| 188 | - | |
| 189 | -CLI: | |
| 190 | - | |
| 191 | -Example workflow: | |
| 192 | -* qpdf in.pdf --json-output pdf.json | |
| 193 | -* edit pdf.json | |
| 194 | -* qpdf --json-input pdf.json out.pdf | |
| 195 | - | |
| 196 | -* qpdf in.pdf --json-output pdf.json | |
| 197 | -* edit pdf.json keeping only objects that need to be changed | |
| 198 | -* qpdf in.pdf --update-from-json=pdf.json out.pdf | |
| 199 | - | |
| 200 | -To modify a single object: | |
| 201 | - | |
| 202 | -* qpdf in.pdf --json-output pdf.json --json-object=o,g | |
| 203 | -* edit pdf.json | |
| 204 | -* qpdf in.pdf --update-from-json=pdf.json out.pdf | |
| 205 | - | |
| 206 | -Historical note: you can't create a PDF from v1 json because | |
| 207 | - | |
| 208 | -* The PDF version header is not recorded | |
| 209 | - | |
| 210 | -* Strings cannot be unambiguously encoded/decoded | |
| 211 | - | |
| 212 | - * Can't tell string from name from indirect object | |
| 213 | - | |
| 214 | - * Strings are treated as PDF doc encoding and output as UTF-8, which | |
| 215 | - doesn't work since multiple PDF doc code points are undefined and | |
| 216 | - is absurd for binary strings | |
| 217 | - | |
| 218 | -* There is no representation of stream data | |
| 219 | - | |
| 220 | -* You can't tell a stream from a dictionary except by looking in both | |
| 221 | - "object" and "objectinfo". | |
| 222 | - | |
| 223 | -* Using "n n R" as a key in "objects" and "objectinfo" makes it hard | |
| 224 | - to search for things when viewing the JSON file in an editor. | |
| 225 | - | |
| 226 | - | |
| 227 | 56 | QPDFPagesTree |
| 228 | 57 | ============= |
| 229 | 58 | |
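The notes removed in the hunk above list the forms a JSON string may take inside a PDF object in qpdf JSON v2 ("n n R", "/Name", "b:hex-digits", "u:utf-8") and explain that an object's key is its reference with "obj:" prepended. The following is a minimal sketch of those rules, not qpdf's implementation; the names StrKind, classify, and object_key are hypothetical, and the checks are deliberately simplified.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical classification of JSON strings found inside PDF objects.
enum class StrKind { IndirectRef, Name, Binary, Unicode, Invalid };

// True if s has the exact form "<digits> <digits> R".
static bool is_indirect_ref(std::string const& s)
{
    std::size_t i = 0;
    auto digits = [&]() {
        std::size_t start = i;
        while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i]))) {
            ++i;
        }
        return i > start;
    };
    if (!digits()) {
        return false;
    }
    if (i >= s.size() || s[i++] != ' ') {
        return false;
    }
    if (!digits()) {
        return false;
    }
    return s.compare(i, std::string::npos, " R") == 0;
}

StrKind classify(std::string const& s)
{
    if (is_indirect_ref(s)) {
        return StrKind::IndirectRef;
    }
    if (!s.empty() && s[0] == '/') {
        return StrKind::Name; // canonical form with a leading slash
    }
    if (s.compare(0, 2, "u:") == 0) {
        return StrKind::Unicode; // "u:" alone is the empty string
    }
    if (s.compare(0, 2, "b:") == 0) {
        // Hex digits may be mixed case; there must be an even number.
        std::string hex = s.substr(2);
        if (hex.size() % 2 != 0) {
            return StrKind::Invalid;
        }
        for (unsigned char c : hex) {
            if (!std::isxdigit(c)) {
                return StrKind::Invalid;
            }
        }
        return StrKind::Binary; // "b:" alone is the empty string
    }
    return StrKind::Invalid; // anything else is an error
}

// Key under "objects" for an indirect reference such as "3 0 R".
std::string object_key(std::string const& ref)
{
    return "obj:" + ref;
}
```

Because the key is just the reference with a fixed prefix, resolving a reference needs no parsing or splitting of the string.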
| ... | ... | @@ -256,6 +85,28 @@ sure /Count and /Parent are correct. |
| 256 | 85 | refs/attic/QPDFPagesTree-old -- original, abandoned branch -- clean up |
| 257 | 86 | when done. |
| 258 | 87 | |
| 88 | +Possible future JSON enhancements | |
| 89 | +================================= | |
| 90 | + | |
| 91 | +* Add to JSON output the information available from a few additional | |
| 92 | + informational options: | |
| 93 | + | |
| 94 | + * --check: add but maybe not by default? | |
| 95 | + | |
| 96 | + * --show-linearization: add but maybe not by default? Also figure | |
| 97 | + out whether warnings reported for some of the PDF specs (1.7) are | |
| 98 | + qpdf problems. This may not be worth adding in the first | |
| 99 | + increment. | |
| 100 | + | |
| 101 | + * --show-xref: add | |
| 102 | + | |
| 103 | +* Consider having --check, --show-encryption, etc., just select the | |
| 104 | + right keys when in json mode. I don't think I want check on by | |
| 105 | + default, so that might be different. | |
| 106 | + | |
| 107 | +* Consider having warnings be included in the json in a "warnings" key | |
| 108 | + in json mode. | |
| 109 | + | |
| 259 | 110 | QPDFJob |
| 260 | 111 | ======= |
| 261 | 112 | ... | ... |
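The notes removed from TODO above state that JSON v2 writes names in canonical form, so /text#2fplain in PDF syntax appears as /text/plain. As an illustration only, the conversion can be sketched as below; canonical_name is a hypothetical helper, not a qpdf function, and it assumes that a '#' not followed by two hex digits is copied through unchanged.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: decode #xx escapes in a PDF-syntax name.
std::string canonical_name(std::string const& pdf_name)
{
    std::string out;
    for (std::size_t i = 0; i < pdf_name.size(); ++i) {
        char c = pdf_name[i];
        bool is_escape = c == '#' && i + 2 < pdf_name.size() &&
            std::isxdigit(static_cast<unsigned char>(pdf_name[i + 1])) &&
            std::isxdigit(static_cast<unsigned char>(pdf_name[i + 2]));
        if (is_escape) {
            // Two hex digits encode one byte of the name.
            out += static_cast<char>(
                std::stoi(pdf_name.substr(i + 1, 2), nullptr, 16));
            i += 2;
        } else {
            out += c;
        }
    }
    return out;
}
```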
cSpell.json
include/qpdf/QPDF.hh
| ... | ... | @@ -112,8 +112,11 @@ class QPDF |
| 112 | 112 | |
| 113 | 113 | // Create a PDF from an input source that contains JSON as written |
| 114 | 114 | // by writeJSON (or qpdf --json-output, version 2 or higher). The |
| 115 | - // JSON must be a complete representation of a PDF. See "QPDF JSON | |
| 116 | - // Format" in the manual for details. | |
| 115 | + // JSON must be a complete representation of a PDF. See "qpdf | |
| 116 | + // JSON" in the manual for details. The input JSON may be | |
| 117 | + // arbitrarily large. QPDF does not load stream data into memory | |
| 118 | + // for more than one stream at a time, even if the stream data is | |
| 119 | + // specified inline. | |
| 117 | 120 | QPDF_DLL |
| 118 | 121 | void createFromJSON(std::string const& json_file); |
| 119 | 122 | QPDF_DLL |
| ... | ... | @@ -122,24 +125,40 @@ class QPDF |
| 122 | 125 | // Update a PDF from an input source that contains JSON in the |
| 123 | 126 | // same format as is written by writeJSON (or qpdf --json-output, |
| 124 | 127 | // version 2 or higher). Objects in the PDF and not in the JSON |
| 125 | - // are not modified. See "QPDF JSON Format" in the manual for | |
| 126 | - // details. | |
| 128 | + // are not modified. See "qpdf JSON" in the manual for details. As | |
| 129 | + // with createFromJSON, the input JSON may be arbitrarily large. | |
| 127 | 130 | QPDF_DLL |
| 128 | 131 | void updateFromJSON(std::string const& json_file); |
| 129 | 132 | QPDF_DLL |
| 130 | 133 | void updateFromJSON(std::shared_ptr<InputSource>); |
| 131 | 134 | |
| 132 | - // Write qpdf json format. The only supported version is 2. If | |
| 133 | - // wanted_objects is empty, write all objects. Otherwise, write | |
| 134 | - // only objects whose keys are in wanted_objects. Keys may be | |
| 135 | - // either "trailer" or of the form "obj:n n R". Invalid keys are | |
| 136 | - // ignored. | |
| 135 | + // Write qpdf json format to the pipeline "p". The only supported | |
| 136 | + // version is 2. The finish() method is called on the pipeline at | |
| 137 | + // the end. The decode_level parameter controls which streams are | |
| 138 | + // uncompressed in the JSON. Use qpdf_dl_none to preserve all | |
| 139 | + // stream data exactly as it appears in the input. The possible | |
| 140 | + // values for json_stream_data can be found in qpdf/Constants.h | |
| 141 | + // and correspond to the --json-stream-data command-line argument. | |
| 142 | + // If json_stream_data is qpdf_sj_file, file_prefix must be | |
| 143 | + // specified. Each stream will be written to a file whose path is | |
| 144 | + // constructed by appending "-nnn" to file_prefix, where "nnn" is | |
| 145 | + // the object number (not zero-filled). If wanted_objects is | |
| 146 | + // empty, write all objects. Otherwise, write only objects whose | |
| 147 | + // keys are in wanted_objects. Keys may be either "trailer" or of | |
| 148 | + // the form "obj:n n R". Invalid keys are ignored. This | |
| 149 | + // corresponds to the --json-object command-line argument. | |
| 150 | + // | |
| 151 | + // QPDF is efficient with regard to memory when writing, allowing | |
| 152 | + // you to write arbitrarily large PDF files to a pipeline. You can | |
| 153 | + // use a pipeline like Pl_Buffer or Pl_String to capture the JSON | |
| 154 | + // output in memory, but do so with caution as this will allocate | |
| 155 | + // enough memory to hold the entire PDF file. | |
| 137 | 156 | QPDF_DLL |
| 138 | 157 | void writeJSON( |
| 139 | 158 | int version, |
| 140 | - Pipeline*, | |
| 141 | - qpdf_stream_decode_level_e, | |
| 142 | - qpdf_json_stream_data_e, | |
| 159 | + Pipeline* p, | |
| 160 | + qpdf_stream_decode_level_e decode_level, | |
| 161 | + qpdf_json_stream_data_e json_stream_data, | |
| 143 | 162 | std::string const& file_prefix, |
| 144 | 163 | std::set<std::string> wanted_objects); |
| 145 | 164 | ... | ... |
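The writeJSON comment above describes two conventions: stream data files are named by appending "-nnn" (the object number, not zero-filled) to file_prefix, and keys in wanted_objects are either "trailer" or of the form "obj:n n R". The sketch below restates those two rules in standalone code. Both function names are hypothetical, and the regular expression is an assumption about what counts as a well-formed key; qpdf's own check may differ.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: path of the file holding the data for the stream
// in object objid, i.e. file_prefix followed by "-nnn".
std::string stream_data_path(std::string const& file_prefix, int objid)
{
    return file_prefix + "-" + std::to_string(objid);
}

// Hypothetical helper: true for "trailer" or "obj:n n R".
bool is_wanted_key(std::string const& key)
{
    static const std::regex obj_key("obj:[0-9]+ [0-9]+ R");
    return key == "trailer" || std::regex_match(key, obj_key);
}
```

A caller could filter user input with is_wanted_key before building the wanted_objects set, though the header comment says writeJSON ignores invalid keys anyway.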
job.sums
| ... | ... | @@ -8,10 +8,10 @@ include/qpdf/auto_job_c_pages.hh b3cc0f21029f6d89efa043dcdbfa183cb59325b6506001c |
| 8 | 8 | include/qpdf/auto_job_c_uo.hh ae21b69a1efa9333050f4833d465f6daff87e5b38e5106e49bbef5d4132e4ed1 |
| 9 | 9 | job.yml 3b2b3c6f92b48f6c76109711cbfdd74669fa31a80cd17379548b09f8e76be05d |
| 10 | 10 | libqpdf/qpdf/auto_job_decl.hh 74df4d7fdbdf51ecd0d58ce1e9844bb5525b9adac5a45f7c9a787ecdda2868df |
| 11 | -libqpdf/qpdf/auto_job_help.hh c1cc99f6fe17285ee5e40730f6280e37d17da1a5f408086ce34e01af121df7ad | |
| 11 | +libqpdf/qpdf/auto_job_help.hh 3aaae4cde004e5314d3ac6d554da575e40209c0f0611f6a308957986f9c7967b | |
| 12 | 12 | libqpdf/qpdf/auto_job_init.hh 7ea8e0641dc26fdfba6e283e14dbbff0c016654e174cdace8054f8bef53750fd |
| 13 | 13 | libqpdf/qpdf/auto_job_json_decl.hh 06caa46eaf71db8a50c046f91866baa8087745a9474319fb7c86d92634cc8297 |
| 14 | 14 | libqpdf/qpdf/auto_job_json_init.hh 5f6b53e3c81d4b54ce5c4cf9c3f52d0c02f987c53bf8841c0280367bad23e335 |
| 15 | 15 | libqpdf/qpdf/auto_job_schema.hh 9d543cd4a43eafffc2c4b8a6fee29e399c271c52cb6f7d417ae5497b3c1127dc |
| 16 | 16 | manual/_ext/qpdf.py 6add6321666031d55ed4aedf7c00e5662bba856dfcd66ccb526563bffefbb580 |
| 17 | -manual/cli.rst 82ead389c03bbf5e0498bd0571a11dc06544d591f4e4454c00322e3473fc556d | |
| 17 | +manual/cli.rst e3f4331befa17450e0d0fff87569722a5aab42ea619ef64f0a3a04e1f99ed65c | ... | ... |
libqpdf/QPDF_json.cc
libqpdf/qpdf/auto_job_help.hh
| ... | ... | @@ -70,6 +70,9 @@ ap.addOptionHelp("--copyright", "help", "show copyright information", R"(Display |
| 70 | 70 | ap.addOptionHelp("--show-crypto", "help", "show available crypto providers", R"(Show a list of available crypto providers, one per line. The |
| 71 | 71 | default provider is shown first. |
| 72 | 72 | )"); |
| 73 | +ap.addOptionHelp("--job-json-help", "help", "show format of job JSON", R"(Describe the format of the QPDFJob JSON input used by | |
| 74 | +--job-json-file. | |
| 75 | +)"); | |
| 73 | 76 | ap.addHelpTopic("general", "general options", R"(General options control qpdf's behavior in ways that are not |
| 74 | 77 | directly related to the operation it is performing. |
| 75 | 78 | )"); |
| ... | ... | @@ -87,11 +90,11 @@ ap.addOptionHelp("--verbose", "general", "print additional information", R"(Outp |
| 87 | 90 | doing, including information about files created and operations |
| 88 | 91 | performed. |
| 89 | 92 | )"); |
| 90 | -ap.addOptionHelp("--progress", "general", "show progress when writing", R"(Indicate progress when writing files. | |
| 91 | -)"); | |
| 92 | 93 | } |
| 93 | 94 | static void add_help_2(QPDFArgParser& ap) |
| 94 | 95 | { |
| 96 | +ap.addOptionHelp("--progress", "general", "show progress when writing", R"(Indicate progress when writing files. | |
| 97 | +)"); | |
| 95 | 98 | ap.addOptionHelp("--no-warn", "general", "suppress printing of warning messages", R"(Suppress printing of warning messages. If warnings were |
| 96 | 99 | encountered, qpdf still exits with exit status 3. |
| 97 | 100 | Use --warning-exit-0 with --no-warn to completely ignore |
| ... | ... | @@ -172,12 +175,12 @@ companion tool "fix-qdf" can be used to repair hand-edited QDF |
| 172 | 175 | files. QDF is a feature specific to the qpdf tool. Please see |
| 173 | 176 | the "QDF Mode" chapter in the manual. |
| 174 | 177 | )"); |
| 175 | -ap.addOptionHelp("--no-original-object-ids", "transformation", "omit original object IDs in qdf", R"(Omit comments in a QDF file indicating the object ID an object | |
| 176 | -had in the original file. | |
| 177 | -)"); | |
| 178 | 178 | } |
| 179 | 179 | static void add_help_3(QPDFArgParser& ap) |
| 180 | 180 | { |
| 181 | +ap.addOptionHelp("--no-original-object-ids", "transformation", "omit original object IDs in qdf", R"(Omit comments in a QDF file indicating the object ID an object | |
| 182 | +had in the original file. | |
| 183 | +)"); | |
| 181 | 184 | ap.addOptionHelp("--compress-streams", "transformation", "compress uncompressed streams", R"(--compress-streams=[y|n] |
| 182 | 185 | |
| 183 | 186 | Setting --compress-streams=n prevents qpdf from compressing |
| ... | ... | @@ -188,9 +191,11 @@ ap.addOptionHelp("--decode-level", "transformation", "control which streams to u |
| 188 | 191 | |
| 189 | 192 | When uncompressing streams, control which types of compression |
| 190 | 193 | schemes should be uncompressed: |
| 191 | -- none: don't uncompress anything. This is the default with --json-output. | |
| 194 | +- none: don't uncompress anything. This is the default with | |
| 195 | + --json-output. | |
| 192 | 196 | - generalized: uncompress streams compressed with a |
| 193 | - general-purpose compression algorithm. This is the default. | |
| 197 | + general-purpose compression algorithm. This is the default | |
| 198 | + except when --json-output is given. | |
| 194 | 199 | - specialized: in addition to generalized, also uncompress |
| 195 | 200 | streams compressed with a special-purpose but non-lossy |
| 196 | 201 | compression scheme |
| ... | ... | @@ -290,13 +295,13 @@ from the resulting set, not based on the original page numbers. |
| 290 | 295 | ap.addHelpTopic("modification", "change parts of the PDF", R"(Modification options make systematic changes to certain parts of |
| 291 | 296 | the PDF, causing the PDF to render differently from the original. |
| 292 | 297 | )"); |
| 298 | +} | |
| 299 | +static void add_help_4(QPDFArgParser& ap) | |
| 300 | +{ | |
| 293 | 301 | ap.addOptionHelp("--pages", "modification", "begin page selection", R"(--pages file [--password=password] [page-range] [...] -- |
| 294 | 302 | |
| 295 | 303 | Run qpdf --help=page-selection for details. |
| 296 | 304 | )"); |
| 297 | -} | |
| 298 | -static void add_help_4(QPDFArgParser& ap) | |
| 299 | -{ | |
| 300 | 305 | ap.addOptionHelp("--collate", "modification", "collate with --pages", R"(--collate[=n] |
| 301 | 306 | |
| 302 | 307 | Collate rather than concatenate pages specified with --pages. |
| ... | ... | @@ -460,14 +465,14 @@ ap.addOptionHelp("--assemble", "encryption", "restrict document assembly", R"(-- |
| 460 | 465 | Enable/disable document assembly (rotation and reordering of |
| 461 | 466 | pages). This option is not available with 40-bit encryption. |
| 462 | 467 | )"); |
| 468 | +} | |
| 469 | +static void add_help_5(QPDFArgParser& ap) | |
| 470 | +{ | |
| 463 | 471 | ap.addOptionHelp("--extract", "encryption", "restrict text/graphic extraction", R"(--extract=[y|n] |
| 464 | 472 | |
| 465 | 473 | Enable/disable text/graphic extraction for purposes other than |
| 466 | 474 | accessibility. |
| 467 | 475 | )"); |
| 468 | -} | |
| 469 | -static void add_help_5(QPDFArgParser& ap) | |
| 470 | -{ | |
| 471 | 476 | ap.addOptionHelp("--form", "encryption", "restrict form filling", R"(--form=[y|n] |
| 472 | 477 | |
| 473 | 478 | Enable/disable whether filling form fields is allowed even if |
| ... | ... | @@ -638,6 +643,9 @@ ap.addOptionHelp("--remove-attachment", "attachments", "remove an embedded file" |
| 638 | 643 | Remove an embedded file using its key. Get the key with |
| 639 | 644 | --list-attachments. |
| 640 | 645 | )"); |
| 646 | +} | |
| 647 | +static void add_help_6(QPDFArgParser& ap) | |
| 648 | +{ | |
| 641 | 649 | ap.addHelpTopic("pdf-dates", "PDF date format", R"(When a date is required, the date should conform to the PDF date |
| 642 | 650 | format specification, which is "D:yyyymmddhhmmssz" where "z" is |
| 643 | 651 | either literally upper case "Z" for UTC or a timezone offset in |
| ... | ... | @@ -650,9 +658,6 @@ Examples: |
| 650 | 658 | - D:20210207161528-05'00' February 7, 2021 at 4:15:28 p.m. |
| 651 | 659 | - D:20210207211528Z February 7, 2021 at 21:15:28 UTC |
| 652 | 660 | )"); |
| 653 | -} | |
| 654 | -static void add_help_6(QPDFArgParser& ap) | |
| 655 | -{ | |
| 656 | 661 | ap.addHelpTopic("add-attachment", "attach (embed) files", R"(The options listed below appear between --add-attachment and its |
| 657 | 662 | terminating "--". |
| 658 | 663 | )"); |
| ... | ... | @@ -747,14 +752,14 @@ the linearization hint tables are correct. |
| 747 | 752 | )"); |
| 748 | 753 | ap.addOptionHelp("--show-linearization", "inspection", "show linearization hint tables", R"(Check and display all data in the linearization hint tables. |
| 749 | 754 | )"); |
| 755 | +} | |
| 756 | +static void add_help_7(QPDFArgParser& ap) | |
| 757 | +{ | |
| 750 | 758 | ap.addOptionHelp("--show-xref", "inspection", "show cross reference data", R"(Show the contents of the cross-reference table or stream (object |
| 751 | 759 | locations in the file) in a human-readable form. This is |
| 752 | 760 | especially useful for files with cross-reference streams, which |
| 753 | 761 | are stored in a binary format. |
| 754 | 762 | )"); |
| 755 | -} | |
| 756 | -static void add_help_7(QPDFArgParser& ap) | |
| 757 | -{ | |
| 758 | 763 | ap.addOptionHelp("--show-object", "inspection", "show contents of an object", R"(--show-object={trailer|obj[,gen]} |
| 759 | 764 | |
| 760 | 765 | Show the contents of the given object. This is especially useful |
| ... | ... | @@ -814,21 +819,20 @@ This option is repeatable. If given, only specified objects will |
| 814 | 819 | be shown in the "objects" key of the JSON output. Otherwise, all |
| 815 | 820 | objects will be shown. |
| 816 | 821 | )"); |
| 817 | -ap.addOptionHelp("--job-json-help", "json", "show format of job JSON", R"(Describe the format of the QPDFJob JSON input used by | |
| 818 | ---job-json-file. | |
| 819 | -)"); | |
| 820 | 822 | ap.addOptionHelp("--json-stream-data", "json", "how to handle streams in json output", R"(--json-stream-data={none|inline|file} |
| 821 | 823 | |
| 822 | -Control whether streams in json output should be omitted, | |
| 823 | -written inline (base64-encoded) or written to a file. If "file" | |
| 824 | -is chosen, the file will be the name of the input file appended | |
| 825 | -with -nnn where nnn is the object number. The prefix can be | |
| 826 | -overridden with --json-stream-prefix. | |
| 824 | +When used with --json-output, this option controls whether | |
| 825 | +streams in json output should be omitted, written inline | |
| 826 | +(base64-encoded) or written to a file. If "file" is chosen, the | |
| 827 | +file will be the name of the output file appended with -nnn where | |
| 828 | +nnn is the object number. The prefix can be overridden with | |
| 829 | +--json-stream-prefix. | |
| 827 | 830 | )"); |
| 828 | 831 | ap.addOptionHelp("--json-stream-prefix", "json", "prefix for json stream data files", R"(--json-stream-prefix=file-prefix |
| 829 | 832 | |
| 830 | -When --json-stream-data=file is given, override the input file | |
| 831 | -name as the prefix for stream data files. Whatever is given here | |
| 833 | +When used with --json-output, --json-stream-data=file-prefix | |
| 834 | +sets the prefix for stream data files, overriding the default, | |
| 835 | +which is to use the output file name. Whatever is given here | |
| 832 | 836 | will be appended with -nnn to create the name of the file that |
| 833 | 837 | will contain the data for the stream stream in object nnn. |
| 834 | 838 | )"); |
| ... | ... | @@ -836,19 +840,19 @@ ap.addOptionHelp("--json-output", "json", "serialize to JSON", R"(--json-output[ |
| 836 | 840 | |
| 837 | 841 | The output file will be qpdf JSON format at the given version. |
| 838 | 842 | "version" may be a specific version or "latest" (the default). |
| 839 | -Version 1 is not supported. See also --json-stream-data, | |
| 843 | +The only supported version is 2. See also --json-stream-data, | |
| 840 | 844 | --json-stream-prefix, and --decode-level. |
| 841 | 845 | )"); |
| 842 | 846 | ap.addOptionHelp("--json-input", "json", "input file is qpdf JSON", R"(Treat the input file as a JSON file in qpdf JSON format as |
| 843 | -written by qpdf --json-output. See the "QPDF JSON Format" | |
| 847 | +written by qpdf --json-output. See the "qpdf JSON Format" | |
| 844 | 848 | section of the manual for information about how to use this |
| 845 | 849 | option. |
| 846 | 850 | )"); |
| 847 | 851 | ap.addOptionHelp("--update-from-json", "json", "update a PDF from qpdf JSON", R"(--update-from-json=qpdf-json-file |
| 848 | 852 | |
| 849 | -Update a PDF file from a JSON file. Please see the "QPDF JSON | |
| 850 | -Format" section of the manual for information about how to use | |
| 851 | -this option. | |
| 853 | +Update a PDF file from a JSON file. Please see the "qpdf JSON" | |
| 854 | +chapter of the manual for information about how to use this | |
| 855 | +option. | |
| 852 | 856 | )"); |
| 853 | 857 | } |
| 854 | 858 | static void add_help_8(QPDFArgParser& ap) | ... | ... |
manual/cli.rst
| ... | ... | @@ -171,7 +171,9 @@ Related Options |
| 171 | 171 | equivalent command-line arguments were supplied. It can be repeated |
| 172 | 172 | and mixed freely with other options. Run ``qpdf`` with |
| 173 | 173 | :qpdf:ref:`--job-json-help` for a description of the job JSON input |
| 174 | - file format. For more information, see :ref:`qpdf-job`. | |
| 174 | + file format. For more information, see :ref:`qpdf-job`. Note that | |
| 175 | + this is unrelated to :qpdf:ref:`--json` but may be combined with | |
| 176 | + it. | |
| 175 | 177 | |
| 176 | 178 | .. _exit-status: |
| 177 | 179 | |
| ... | ... | @@ -341,6 +343,17 @@ Related Options |
| 341 | 343 | itself. The default provider is always listed first. See |
| 342 | 344 | :ref:`crypto` for more information about crypto providers. |
| 343 | 345 | |
| 346 | +.. qpdf:option:: --job-json-help | |
| 347 | + | |
| 348 | + .. help: show format of job JSON | |
| 349 | + | |
| 350 | + Describe the format of the QPDFJob JSON input used by | |
| 351 | + --job-json-file. | |
| 352 | + | |
| 353 | + Describe the format of the QPDFJob JSON input used by | |
| 354 | + :qpdf:ref:`--job-json-file`. For more information about QPDFJob, | |
| 355 | + see :ref:`qpdf-job`. | |
| 356 | + | |
| 344 | 357 | .. _general-options: |
| 345 | 358 | |
| 346 | 359 | General Options |
| ... | ... | @@ -852,9 +865,11 @@ Related Options |
| 852 | 865 | |
| 853 | 866 | When uncompressing streams, control which types of compression |
| 854 | 867 | schemes should be uncompressed: |
| 855 | - - none: don't uncompress anything. This is the default with --json-output. | |
| 868 | + - none: don't uncompress anything. This is the default with | |
| 869 | + --json-output. | |
| 856 | 870 | - generalized: uncompress streams compressed with a |
| 857 | - general-purpose compression algorithm. This is the default. | |
| 871 | + general-purpose compression algorithm. This is the default | |
| 872 | + except when --json-output is given. | |
| 858 | 873 | - specialized: in addition to generalized, also uncompress |
| 859 | 874 | streams compressed with a special-purpose but non-lossy |
| 860 | 875 | compression scheme |
| ... | ... | @@ -875,7 +890,8 @@ Related Options |
| 875 | 890 | ``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define |
| 876 | 891 | generalized filters as those to be used for general-purpose |
| 877 | 892 | compression or encoding, as opposed to filters specifically |
| 878 | - designed for image data. This is the default. | |
| 893 | + designed for image data. This is the default except when | |
| 894 | + :qpdf:ref:`--json-output` is given. | |
| 879 | 895 | |
| 880 | 896 | - :samp:`specialized`: in addition to generalized, decode streams |
| 881 | 897 | with supported non-lossy specialized filters; currently this is |
| ... | ... | @@ -3126,8 +3142,9 @@ Related Options |
| 3126 | 3142 | is usually but not always equal to the file name and is needed by |
| 3127 | 3143 | some of the other options. See also :ref:`attachments`. Note that |
| 3128 | 3144 | this option displays dates in PDF timestamp syntax. When attachment |
| 3129 | - information is included in json output (see :ref:`--json`), dates | |
| 3130 | - are shown in ISO-8601 format. | |
| 3145 | + information is included in json output in the ``"attachments"`` key | |
| 3146 | + (see :ref:`--json`), dates are shown (just within that object) in | |
| 3147 | + ISO-8601 format. | |
| 3131 | 3148 | |
| 3132 | 3149 | .. qpdf:option:: --show-attachment=key |
| 3133 | 3150 | |
| ... | ... | @@ -3169,14 +3186,11 @@ Related Options |
| 3169 | 3186 | |
| 3170 | 3187 | Generate a JSON representation of the file. This is described in |
| 3171 | 3188 | depth in :ref:`json`. The version parameter can be used to specify |
| 3172 | - which version of the qpdf JSON format should be output. The only | |
| 3173 | - supported value is ``1``, but it's possible that a new JSON output | |
| 3174 | - version will be added in a future version. You can also specify | |
| 3175 | - ``latest`` to use the latest JSON version. For backward | |
| 3176 | - compatibility, the default value will remain ``1`` until qpdf | |
| 3177 | - version 11, after which point it will become ``latest``. In all | |
| 3178 | - case, you can tell what version of the JSON output you have from | |
| 3179 | - the ``"version"`` key in the output. Use the | |
| 3189 | + which version of the qpdf JSON format should be output. The version | |
| 3190 | + number may be a number or ``latest``. The default is ``latest``. As of | |
| 3191 | + qpdf 11, the latest version is ``2``. If you have code that reads | |
| 3192 | + qpdf JSON output, you can tell what version of the JSON output you | |
| 3193 | + have from the ``"version"`` key in the output. Use the | |
| 3180 | 3194 | :qpdf:ref:`--json-help` option to get a description of the JSON |
| 3181 | 3195 | object. |
| 3182 | 3196 | |
| ... | ... | @@ -3189,11 +3203,11 @@ Related Options |
| 3189 | 3203 | containing descriptive text. |
| 3190 | 3204 | |
| 3191 | 3205 | Describe the format of the JSON output by writing to standard |
| 3192 | - output a JSON object with the same structure with the same keys as | |
| 3193 | - the JSON generated by qpdf. In the output written by | |
| 3194 | - ``--json-help``, each key's value is a description of the key. The | |
| 3195 | - specific contract guaranteed by qpdf in its JSON representation is | |
| 3196 | - explained in more detail in the :ref:`json`. | |
| 3206 | + output a JSON object with the same structure as the JSON generated | |
| 3207 | + by qpdf. In the output written by ``--json-help``, each key's value | |
| 3208 | + is a description of the key. The specific contract guaranteed by | |
| 3209 | + qpdf in its JSON representation is explained in more detail in the | |
| 3210 | + :ref:`json`. | |
| 3197 | 3211 | |
| 3198 | 3212 | .. qpdf:option:: --json-key=key |
| 3199 | 3213 | |
| ... | ... | @@ -3216,53 +3230,50 @@ Related Options |
| 3216 | 3230 | be shown in the "objects" key of the JSON output. Otherwise, all |
| 3217 | 3231 | objects will be shown. |
| 3218 | 3232 | |
| 3219 | - This option is repeatable. If given, only specified objects will | |
| 3220 | - be shown in the "``objects``" key of the JSON output. Otherwise, all | |
| 3221 | - objects will be shown. | |
| 3222 | - | |
| 3223 | -.. qpdf:option:: --job-json-help | |
| 3224 | - | |
| 3225 | - .. help: show format of job JSON | |
| 3226 | - | |
| 3227 | - Describe the format of the QPDFJob JSON input used by | |
| 3228 | - --job-json-file. | |
| 3229 | - | |
| 3230 | - Describe the format of the QPDFJob JSON input used by | |
| 3231 | - :qpdf:ref:`--job-json-file`. For more information about QPDFJob, | |
| 3232 | - see :ref:`qpdf-job`. | |
| 3233 | + This option is repeatable. If given, only specified objects will be | |
| 3234 | + shown in the ``"objects"`` key of the JSON output. Otherwise, all | |
| 3235 | + objects will be shown. For qpdf JSON version 1, this also affects | |
| 3236 | + the ``"objectinfo"`` key, which is not present in version 2. This | |
| 3237 | + option may be used with :qpdf:ref:`--json` and also with | |
| 3238 | + :qpdf:ref:`--json-output`. | |
| 3233 | 3239 | |
| 3234 | 3240 | .. qpdf:option:: --json-stream-data={none|inline|file} |
| 3235 | 3241 | |
| 3236 | 3242 | .. help: how to handle streams in json output |
| 3237 | 3243 | |
| 3238 | - Control whether streams in json output should be omitted, | |
| 3239 | - written inline (base64-encoded) or written to a file. If "file" | |
| 3240 | - is chosen, the file will be the name of the input file appended | |
| 3241 | - with -nnn where nnn is the object number. The prefix can be | |
| 3242 | - overridden with --json-stream-prefix. | |
| 3243 | - | |
| 3244 | - Control whether streams in json output should be omitted, written | |
| 3245 | - inline (base64-encoded) or written to a file. If ``file`` is | |
| 3246 | - chosen, the file will be the name of the input file appended with | |
| 3247 | - :samp:`-{nnn}` where :samp:`{nnn}` is the object number. The prefix | |
| 3248 | - can be overridden with :qpdf:ref:`--json-stream-prefix`. This | |
| 3249 | - option only applies when used with :qpdf:ref:`--json-output`. | |
| 3244 | + When used with --json-output, this option controls whether | |
| 3245 | + streams in json output should be omitted, written inline | |
| 3246 | + (base64-encoded) or written to a file. If "file" is chosen, the | |
| 3247 | + file will be the name of the output file appended with -nnn where | |
| 3248 | + nnn is the object number. The prefix can be overridden with | |
| 3249 | + --json-stream-prefix. | |
| 3250 | + | |
| 3251 | + When used with :qpdf:ref:`--json-output`, this option controls | |
| 3252 | + whether streams in JSON output should be omitted, written inline | |
| 3253 | + (base64-encoded) or written to a file. If ``file`` is chosen, the | |
| 3254 | + file will be the name of the output file appended with | |
| 3255 | + :samp:`-{nnn}` where :samp:`{nnn}` is the object number. The stream | |
| 3256 | + data file prefix can be overridden with | |
| 3257 | + :qpdf:ref:`--json-stream-prefix`. This option only applies when | |
| 3258 | + used with :qpdf:ref:`--json-output`. | |
| 3250 | 3259 | |
| 3251 | 3260 | .. qpdf:option:: --json-stream-prefix=file-prefix |
| 3252 | 3261 | |
| 3253 | 3262 | .. help: prefix for json stream data files |
| 3254 | 3263 | |
| 3255 | - When --json-stream-data=file is given, override the input file | |
| 3256 | - name as the prefix for stream data files. Whatever is given here | |
| 3264 | + When used with --json-output, --json-stream-prefix=file-prefix | |
| 3265 | + sets the prefix for stream data files, overriding the default, | |
| 3266 | + which is to use the output file name. Whatever is given here | |
| 3257 | 3267 | will be appended with -nnn to create the name of the file that |
| 3258 | 3268 | will contain the data for the stream in object nnn. |
| 3259 | 3269 | |
| 3260 | - When :qpdf:ref:`--json-stream-data` is given with the value | |
| 3261 | - ``file``, override the input file name as the prefix for stream | |
| 3262 | - data files. Whatever is given here will be appended with | |
| 3263 | - :samp:`-{nnn}` to create the name of the file that will contain the | |
| 3264 | - data for the stream stream in object :samp:`{nnn}`. This | |
| 3265 | - option only applies when used with :qpdf:ref:`--json-output`. | |
| 3270 | + When used with :qpdf:ref:`--json-output`, | |
| 3271 | + ``--json-stream-prefix=file-prefix`` sets the prefix for stream data | |
| 3272 | + files, overriding the default, which is to use the output file | |
| 3273 | + name. Whatever is given here will be appended with :samp:`-{nnn}` | |
| 3274 | + to create the name of the file that will contain the data for the | |
| 3275 | + stream in object :samp:`{nnn}`. This option only applies | |
| 3276 | + when used with :qpdf:ref:`--json-output`. | |
| 3266 | 3277 | |
| 3267 | 3278 | .. qpdf:option:: --json-output[=version] |
| 3268 | 3279 | |
| ... | ... | @@ -3270,44 +3281,45 @@ Related Options |
| 3270 | 3281 | |
| 3271 | 3282 | The output file will be qpdf JSON format at the given version. |
| 3272 | 3283 | "version" may be a specific version or "latest" (the default). |
| 3273 | - Version 1 is not supported. See also --json-stream-data, | |
| 3284 | + The only supported version is 2. See also --json-stream-data, | |
| 3274 | 3285 | --json-stream-prefix, and --decode-level. |
| 3275 | 3286 | |
| 3276 | - The output file will be qpdf JSON format at the given version. | |
| 3277 | - ``version`` may be a specific version or ``latest`` (the default). | |
| 3278 | - Version 1 is not supported. See also :qpdf:ref:`--json-stream-data` | |
| 3279 | - and :qpdf:ref:`--json-stream-prefix`. The default decode level is | |
| 3280 | - ``none``, but you can override it with :qpdf:ref:`--decode-level`. | |
| 3281 | - If you want to look at the contents of streams easily as you would | |
| 3282 | - in QDF mode (see :ref:`qdf`), you can use | |
| 3283 | - ``--decode-level=generalized`` and ``--json-stream-data=file`` for | |
| 3284 | - a convenient way to do that. | |
| 3287 | + The output file, instead of being a PDF file, will be a JSON file | |
| 3288 | + in qpdf JSON format at the given version. ``version`` may be a | |
| 3289 | + specific version or ``latest`` (the default). The only supported | |
| 3290 | + version is 2. See also :qpdf:ref:`--json-stream-data` and | |
| 3291 | + :qpdf:ref:`--json-stream-prefix`. When this option is specified, | |
| 3292 | + the default decode level for stream data is ``none``, but you can | |
| 3293 | + override it with :qpdf:ref:`--decode-level`. If you want to look at | |
| 3294 | + the contents of streams easily as you would in QDF mode (see | |
| 3295 | + :ref:`qdf`), you can use ``--decode-level=generalized`` and | |
| 3296 | + ``--json-stream-data=file`` for a convenient way to do that. | |
| 3285 | 3297 | |
| 3286 | 3298 | .. qpdf:option:: --json-input |
| 3287 | 3299 | |
| 3288 | 3300 | .. help: input file is qpdf JSON |
| 3289 | 3301 | |
| 3290 | 3302 | Treat the input file as a JSON file in qpdf JSON format as |
| 3291 | - written by qpdf --json-output. See the "QPDF JSON Format" | |
| 3303 | + written by qpdf --json-output. See the "qpdf JSON Format" | |
| 3292 | 3304 | section of the manual for information about how to use this |
| 3293 | 3305 | option. |
| 3294 | 3306 | |
| 3295 | 3307 | Treat the input file as a JSON file in qpdf JSON format as written |
| 3296 | 3308 | by ``qpdf --json-output``. The input file must be complete and |
| 3297 | 3309 | include all stream data. For information about converting between |
| 3298 | - PDF and JSON, please see :ref:`qpdf-json`. | |
| 3310 | + PDF and JSON, please see :ref:`json`. | |
| 3299 | 3311 | |
| 3300 | 3312 | .. qpdf:option:: --update-from-json=qpdf-json-file |
| 3301 | 3313 | |
| 3302 | 3314 | .. help: update a PDF from qpdf JSON |
| 3303 | 3315 | |
| 3304 | - Update a PDF file from a JSON file. Please see the "QPDF JSON | |
| 3305 | - Format" section of the manual for information about how to use | |
| 3306 | - this option. | |
| 3316 | + Update a PDF file from a JSON file. Please see the "qpdf JSON" | |
| 3317 | + chapter of the manual for information about how to use this | |
| 3318 | + option. | |
| 3307 | 3319 | |
| 3308 | - This option updates a PDF file from a qpdf JSON file. For a | |
| 3309 | - information about how to use this option, please see | |
| 3310 | - :ref:`qpdf-json`. | |
| 3320 | + This option updates a PDF file from the specified qpdf JSON file. | |
| 3321 | + For information about how to use this option, please see | |
| 3322 | + :ref:`json`. | |
| 3311 | 3323 | |
| 3312 | 3324 | .. _test-options: |
| 3313 | 3325 | |
| ... | ... | @@ -3420,7 +3432,7 @@ Related Options |
| 3420 | 3432 | |
| 3421 | 3433 | This is used by qpdf's test suite to check consistency between the |
| 3422 | 3434 | output of ``qpdf --json`` and the output of ``qpdf --json-help``. |
| 3423 | - This option causes an extra copy of the generated json to appear in | |
| 3435 | + This option causes an extra copy of the generated JSON to appear in | |
| 3424 | 3436 | memory and is therefore unsuitable for use with large files. This |
| 3425 | 3437 | is why it's also not on by default. |
| 3426 | 3438 | ... | ... |
manual/design.rst
| ... | ... | @@ -242,7 +242,7 @@ the current file position. If the token is a not either a dictionary or |
| 242 | 242 | array opener, an object is immediately constructed from the single token |
| 243 | 243 | and the parser returns. Otherwise, the parser iterates in a special mode |
| 244 | 244 | in which it accumulates objects until it finds a balancing closer. |
| 245 | -During this process, the "``R``" keyword is recognized and an indirect | |
| 245 | +During this process, the ``R`` keyword is recognized and an indirect | |
| 246 | 246 | ``QPDFObjectHandle`` may be constructed. |
| 247 | 247 | |
| 248 | 248 | The ``QPDF::resolve()`` method, which is used to resolve an indirect |
| ... | ... | @@ -280,15 +280,15 @@ file. |
| 280 | 280 | it is looking before the last ``%%EOF``. After getting to ``trailer`` |
| 281 | 281 | keyword, it invokes the parser. |
| 282 | 282 | |
| 283 | -- The parser sees "``<<``", so it calls itself recursively in | |
| 283 | +- The parser sees ``<<``, so it calls itself recursively in | |
| 284 | 284 | dictionary creation mode. |
| 285 | 285 | |
| 286 | 286 | - In dictionary creation mode, the parser keeps accumulating objects |
| 287 | - until it encounters "``>>``". Each object that is read is pushed onto | |
| 288 | - a stack. If "``R``" is read, the last two objects on the stack are | |
| 287 | + until it encounters ``>>``. Each object that is read is pushed onto | |
| 288 | + a stack. If ``R`` is read, the last two objects on the stack are | |
| 289 | 289 | inspected. If they are integers, they are popped off the stack and |
| 290 | 290 | their values are used to construct an indirect object handle which is |
| 291 | - then pushed onto the stack. When "``>>``" is finally read, the stack | |
| 291 | + then pushed onto the stack. When ``>>`` is finally read, the stack | |
| 292 | 292 | is converted into a ``QPDF_Dictionary`` which is placed in a |
| 293 | 293 | ``QPDFObjectHandle`` and returned. |
| 294 | 294 | ... | ... |
manual/json.rst
| 1 | +.. cSpell:ignore moddifyannotations | |
| 2 | +.. cSpell:ignore feff | |
| 3 | + | |
| 1 | 4 | .. _json: |
| 2 | 5 | |
| 3 | -QPDF JSON | |
| 6 | +qpdf JSON | |
| 4 | 7 | ========= |
| 5 | 8 | |
| 6 | 9 | .. _json-overview: |
| ... | ... | @@ -8,27 +11,540 @@ QPDF JSON |
| 8 | 11 | Overview |
| 9 | 12 | -------- |
| 10 | 13 | |
| 11 | -Beginning with qpdf version 8.3.0, the :command:`qpdf` | |
| 12 | -command-line program can produce a JSON representation of the | |
| 13 | -non-content data in a PDF file. It includes a dump in JSON format of all | |
| 14 | -objects in the PDF file excluding the content of streams. This JSON | |
| 15 | -representation makes it very easy to look in detail at the structure of | |
| 16 | -a given PDF file, and it also provides a great way to work with PDF | |
| 17 | -files programmatically from the command-line in languages that can't | |
| 18 | -call or link with the qpdf library directly. Note that stream data can | |
| 19 | -be extracted from PDF files using other qpdf command-line options. | |
| 14 | +Beginning with qpdf version 11.0.0, the qpdf library and command-line | |
| 15 | +program can produce a JSON representation of a PDF file. qpdf | |
| 16 | +version 11 introduces JSON format version 2. Prior to qpdf 11, | |
| 17 | +versions 8.3.0 onward had a more limited JSON representation | |
| 18 | +accessible only from the command-line. For details on what changed, | |
| 19 | +see :ref:`json-v2-changes`. The rest of this chapter documents qpdf | |
| 20 | +JSON version 2. | |
| 21 | + | |
| 22 | +Please note: this chapter discusses *qpdf JSON format*, which | |
| 23 | +represents the contents of a PDF file. This is distinct from the | |
| 24 | +*QPDFJob JSON format* which provides a higher-level interface | |
| 25 | +for interacting with qpdf the way the command-line tool does. For | |
| 26 | +information about that, see :ref:`qpdf-job`. | |
| 27 | + | |
| 28 | +The qpdf JSON format is specific to qpdf. There are two ways to use | |
| 29 | +qpdf JSON: | |
| 30 | + | |
| 31 | +- The :qpdf:ref:`--json` command-line flag causes creation of a JSON | |
| 32 | + representation of all the objects in a PDF file, excluding stream | |
| 33 | + data. This includes an unambiguous representation of the PDF object | |
| 34 | + structure and also provides JSON-formatted summaries of other | |
| 35 | + information about the file. This functionality is built into | |
| 36 | + ``QPDFJob`` and can be accessed from the ``qpdf`` command-line tool | |
| 37 | + or from the ``QPDFJob`` C or C++ API. | |
| 38 | + | |
| 39 | +- qpdf can create a JSON file that completely represents a PDF file. | |
| 40 | + You can think of this as using JSON as an *alternative syntax* for | |
| 41 | + representing a PDF file. Using qpdf JSON, it is possible to | |
| 42 | + convert a PDF file to JSON, manipulate the structure or contents of | |
| 43 | + the objects at a low level, and convert the results back to a PDF | |
| 44 | + file. This functionality can be accessed from the command-line with | |
| 45 | + the :qpdf:ref:`--json-output`, :qpdf:ref:`--json-input`, and | |
| 46 | + :qpdf:ref:`--update-from-json` flags, or from the API using the | |
| 47 | + ``QPDF::writeJSON``, ``QPDF::createFromJSON``, and | |
| 48 | + ``QPDF::updateFromJSON`` methods. | |
| 49 | + | |
| 50 | +.. _json-terminology: | |
| 51 | + | |
| 52 | +JSON Terminology | |
| 53 | +---------------- | |
| 54 | + | |
| 55 | +Notes about terminology: | |
| 56 | + | |
| 57 | +- In JavaScript and JSON, that thing that has keys and values is | |
| 58 | + typically called an *object*. | |
| 59 | + | |
| 60 | +- In PDF, that thing that has keys and values is typically called a | |
| 61 | + *dictionary*. An *object* is a PDF object such as integer, real, | |
| 62 | + boolean, null, string, array, dictionary, or stream. | |
| 63 | + | |
| 64 | +- Some languages that use JSON call an *object* a *dictionary*, a | |
| 65 | + *map*, or a *hash*. | |
| 66 | + | |
| 67 | +- Sometimes, it's called an *object* if it has fixed keys and a | |
| 68 | + *dictionary* if it has variable keys. | |
| 69 | + | |
| 70 | +This manual is not entirely consistent about its use of *dictionary* | |
| 71 | +vs. *object* because sometimes one term or another is clearer in | |
| 72 | +context. Just be aware of the ambiguity when reading the manual. We | |
| 73 | +frequently use the term *dictionary* to refer to a JSON object because | |
| 74 | +of the consistency with PDF terminology. | |
| 75 | + | |
| 76 | +.. _what-qpdf-json-is-not: | |
| 77 | + | |
| 78 | +What qpdf JSON is not | |
| 79 | +--------------------- | |
| 80 | + | |
| 81 | +Please note that qpdf JSON offers a convenient syntax for manipulating | |
| 82 | +PDF files at a low level using JSON syntax. JSON syntax is much easier | |
| 83 | +to work with than native PDF syntax, and there are good JSON libraries | |
| 84 | +in virtually every commonly used programming language. Working with | |
| 85 | +PDF objects in JSON removes the need to worry about stream lengths, | |
| 86 | +cross reference tables, and PDF-specific representations of Unicode or | |
| 87 | +binary strings that appear outside of content streams. It does not | |
| 88 | +eliminate the need to understand the semantic structure of PDF files. | |
| 89 | +Working with qpdf JSON still requires familiarity with the PDF | |
| 90 | +specification. | |
| 91 | + | |
| 92 | +In particular, qpdf JSON *does not* provide any of the following | |
| 93 | +capabilities: | |
| 94 | + | |
| 95 | +- Text extraction. While you could use qpdf JSON syntax to navigate to | |
| 96 | + a page's content streams and font structures, text within pages is | |
| 97 | + still encoded using PDF syntax within content streams, and there is | |
| 98 | + no assistance for text extraction. | |
| 99 | + | |
| 100 | +- Reflowing text, document structure. qpdf JSON does not add any new | |
| 101 | + information or insight into the content of PDF files. If you have a | |
| 102 | + PDF file that lacks any structural information, qpdf JSON won't help | |
| 103 | + you solve any of those problems. | |
| 104 | + | |
| 105 | +This is what we mean when we say that JSON provides an *alternative | |
| 106 | +syntax* for working with PDF data. Semantically, it is identical to | |
| 107 | +native PDF. | |
| 20 | 108 | |
| 21 | 109 | .. _qpdf-json: |
| 22 | 110 | |
| 23 | -QPDF JSON Format | |
| 111 | +qpdf JSON Format | |
| 24 | 112 | ---------------- |
| 25 | 113 | |
| 26 | -XXX Write this. | |
| 114 | +This section describes how qpdf represents PDF objects in JSON format. | |
| 115 | +It also describes how to work with qpdf JSON to create or | |
| 116 | +modify PDF files. | |
| 117 | + | |
| 118 | +.. _json.objects: | |
| 119 | + | |
| 120 | +qpdf JSON Object Representation | |
| 121 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
| 122 | + | |
| 123 | +This section describes the representation of PDF objects in qpdf JSON | |
| 124 | +version 2. PDF objects are represented within the ``"objects"`` | |
| 125 | +dictionary of a qpdf JSON file. This is true both for PDF serialized | |
| 126 | +to JSON (:qpdf:ref:`--json-output`, ``QPDF::writeJSON``) and for objects as | |
| 127 | +they appear in the output of ``qpdf`` with the :qpdf:ref:`--json` | |
| 128 | +option. | |
| 129 | + | |
| 130 | +Each key in the ``"objects"`` dictionary is either ``"trailer"`` or a | |
| 131 | +string of the form ``"obj:O G R"`` where ``O`` and ``G`` are the | |
| 132 | +object and generation numbers and ``R`` is the literal string ``R``. | |
| 133 | +This is the PDF syntax for the indirect object reference prepended by | |
| 134 | +``obj:``. The value, representing the object itself, is a JSON object | |
| 135 | +whose structure is described below. | |
| 136 | + | |
| 137 | +Top-level Stream Objects | |
| 138 | + Stream objects are represented as a JSON object with the single key | |
| 139 | + ``"stream"``. The stream object has a key called ``"dict"`` whose | |
| 140 | + value is the stream dictionary as an object value (described below) | |
| 141 | + with the ``"/Length"`` key omitted. Other keys are determined by the | |
| 142 | + value for json stream data (:qpdf:ref:`--json-stream-data`, or a | |
| 143 | + parameter of type ``qpdf_json_stream_data_e``) as follows: | |
| 144 | + | |
| 145 | + - ``none``: stream data is not represented; no other keys are | |
| 146 | + present | |
| 147 | + | |
| 148 | + - ``inline``: the stream data appears as a base64-encoded string as | |
| 149 | + the value of the ``"data"`` key | |
| 150 | + | |
| 151 | + - ``file``: the stream data is written to a file, and the path to | |
| 152 | + the file is stored in the ``"datafile"`` key. A relative path is | |
| 153 | + interpreted as relative to the current directory when qpdf is | |
| 154 | + invoked. | |
| 155 | + | |
| 156 | + Keys other than ``"dict"``, ``"data"``, and ``"datafile"`` are | |
| 157 | + ignored. This is primarily for future compatibility in case a newer | |
| 158 | + version of qpdf includes additional information. | |
| 159 | + | |
| 160 | + As with the native PDF representation, the stream data must be | |
| 161 | + consistent with whatever filters and decode parameters are specified | |
| 162 | + in the stream dictionary. | |
| 163 | + | |
| 164 | +Top-level Non-stream Objects | |
| 165 | + Non-stream objects are represented as a dictionary with the single | |
| 166 | + key ``"value"``. Other keys are ignored for future compatibility. | |
| 167 | + The value's structure is described in "Object Values" below. | |
| 168 | + | |
| 169 | + Note: in files that use object streams, the trailer "dictionary" is | |
| 170 | + actually a stream, but in the JSON representation, the value of the | |
| 171 | + ``"trailer"`` key is always written as a dictionary (with a | |
| 172 | + ``"value"`` key like other non-stream objects). There will also be a | |
| 173 | + stream object whose key is the object ID of the cross-reference | |
| 174 | + stream, even though this stream will generally be unreferenced. This | |
| 175 | + makes it possible to assume ``"trailer"`` points to a dictionary | |
| 176 | + without having to consider whether the file uses object streams or | |
| 177 | + not. It is also consistent with how ``QPDF::getTrailer`` behaves in | |
| 178 | + the C++ API. | |
| 179 | + | |
| 180 | +Object Values | |
| 181 | + Within ``"value"`` or ``"stream"."dict"``, PDF objects are | |
| 182 | + represented as follows: | |
| 183 | + | |
| 184 | + - Objects of type Boolean or null are represented as JSON objects of | |
| 185 | + the same type. | |
| 186 | + | |
| 187 | + - Objects that are numeric are represented as numeric in the JSON | |
| 188 | + without regard to precision. Internally, qpdf stores numeric | |
| 189 | + values as strings, so qpdf will preserve arbitrary precision | |
| 190 | + numerical values when reading and writing JSON. It is likely that | |
| 191 | + other JSON readers and writers will have implementation-dependent | |
| 192 | + ways of handling numerical values that are out of range. | |
| 193 | + | |
| 194 | + - Name objects are represented as JSON strings that start with ``/`` | |
| 195 | + and are followed by the PDF name in canonical form with all PDF | |
| 196 | + syntax resolved. For example, the name whose canonical form (per | |
| 197 | + the PDF specification) is ``text/plain`` would be represented in | |
| 198 | + JSON as ``"/text/plain"`` and in PDF as ``"/text#2fplain"``. | |
| 199 | + | |
| 200 | + - Indirect object references are represented as JSON strings that | |
| 201 | + look like a PDF indirect object reference and have the form ``"O G | |
| 202 | + R"`` where ``O`` and ``G`` are the object and generation numbers | |
| 203 | + and ``R`` is the literal string ``R``. For example, ``"3 0 R"`` | |
| 204 | + would represent a reference to the object with object ID 3 and | |
| 205 | + generation 0. | |
| 206 | + | |
| 207 | + - PDF strings are represented as JSON strings in one of two ways: | |
| 208 | + | |
| 209 | + - ``"u:utf8-encoded-string"``: this format is used when the PDF | |
| 210 | + string can be unambiguously represented as a Unicode string and | |
| 211 | + contains no unprintable characters. This is the case whether the | |
| 212 | + input string is encoded as UTF-16, UTF-8 (as allowed by PDF | |
| 213 | + 2.0), or PDF doc encoding. Strings are only represented this way | |
| 214 | + if they can be encoded without loss of information. | |
| 215 | + | |
| 216 | + - ``"b:hex-string"``: this format is used to represent any binary | |
| 217 | + string value that can't be represented as a Unicode string. | |
| 218 | + ``hex-string`` must have an even number of characters that range | |
| 219 | + from ``a`` through ``f``, ``A`` through ``F``, or ``0`` through | |
| 220 | + ``9``. | |
| 221 | + | |
| 222 | + qpdf writes empty strings as ``"u:"``, but both ``"b:"`` and | |
| 223 | + ``"u:"`` are valid representations of the empty string. | |
| 224 | + | |
| 225 | + There is full support for UTF-16 surrogate pairs. Binary strings | |
| 226 | + encoded with ``"b:..."`` are the internal PDF representations. | |
| 227 | + As such, the following are equivalent: | |
| 228 | + | |
| 229 | + - ``"u:\ud83e\udd54"`` -- representation of U+1F954 as a surrogate | |
| 230 | + pair in JSON syntax | |
| 231 | + | |
| 232 | + - ``"b:FEFFD83EDD54"`` -- representation of U+1F954 as the bytes | |
| 233 | + of a UTF-16 string in PDF syntax with the leading ``FEFF`` | |
| 234 | + indicating UTF-16 | |
| 235 | + | |
| 236 | + - ``"b:efbbbff09fa594"`` -- representation of U+1F954 as the | |
| 237 | + bytes of a UTF-8 string in PDF syntax (as allowed by PDF 2.0) | |
| 238 | + with the leading ``EF``, ``BB``, ``BF`` sequence (which is just | |
| 239 | + UTF-8 encoding of ``FEFF``). | |
| 240 | + | |
| 241 | + - A JSON string whose contents are ``u:`` followed by the UTF-8 | |
| 242 | + representation of U+1F954. This is the potato emoji. | |
| 243 | + Unfortunately, I am not able to render it in the PDF version | |
| 244 | + of this manual. | |
| 245 | + | |
| 246 | + - PDF arrays are represented as JSON arrays of objects as described | |
| 247 | + above | |
| 248 | + | |
| 249 | + - PDF dictionaries are represented as JSON objects whose keys are | |
| 250 | + the string representations of names and whose values are | |
| 251 | + representations of PDF objects. | |
| 252 | + | |
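The string rules above can be sketched in a few lines of Python. This is only an illustration and not part of qpdf: the helper name ``decode_qpdf_string`` is hypothetical, and the sketch assumes that a ``b:`` string without a UTF-16 or UTF-8 marker should simply be handed back as raw bytes (it makes no attempt to interpret PDF doc encoding).

```python
import json


def decode_qpdf_string(value: str):
    """Classify a qpdf JSON string value (hypothetical helper, not a qpdf API).

    Returns ("text", str) when the value can be read as Unicode text and
    ("binary", bytes) otherwise.
    """
    if value.startswith("u:"):
        # The JSON parser has already combined any surrogate pairs.
        return "text", value[2:]
    if value.startswith("b:"):
        # bytes.fromhex accepts upper- and lower-case digits and rejects
        # an odd number of hex digits with ValueError.
        raw = bytes.fromhex(value[2:])
        if raw.startswith(b"\xfe\xff"):
            # Leading FEFF marks a UTF-16 (big-endian) PDF string.
            return "text", raw[2:].decode("utf-16-be")
        if raw.startswith(b"\xef\xbb\xbf"):
            # Leading EF BB BF marks a UTF-8 PDF string (PDF 2.0).
            return "text", raw[3:].decode("utf-8")
        # Assumption: anything else is treated as opaque binary data.
        return "binary", raw
    raise ValueError("not a qpdf JSON string value: %r" % value)


# The three equivalent spellings of U+1F954 listed above.
EQUIVALENT_FORMS = [
    json.loads('"u:\\ud83e\\udd54"'),
    "b:FEFFD83EDD54",
    "b:efbbbff09fa594",
]
```

Passing each entry of ``EQUIVALENT_FORMS`` through the helper should yield the same text value, which is a quick way to convince yourself that the three spellings are interchangeable.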
| 253 | +.. _json.output: | |
| 254 | + | |
| 255 | +qpdf JSON Output | |
| 256 | +~~~~~~~~~~~~~~~~ | |
| 257 | + | |
| 258 | +The format of the JSON written by qpdf's :qpdf:ref:`--json-output` | |
| 259 | +flag or the ``QPDF::writeJSON`` API call is a JSON object consisting | |
| 260 | +of a single key: ``"qpdf-v2"``. Any other top-level keys are ignored. | |
| 261 | +While unknown keys in other places are ignored for future | |
| 262 | +compatibility, in this case, ignoring other top-level keys is an | |
| 263 | +explicit decision to allow users to include other keys for their own | |
| 264 | +use. No new top-level keys will be added in JSON version 2. | |
| 265 | + | |
| 266 | +The ``"qpdf-v2"`` key points to a JSON object with the following keys: | |
| 267 | + | |
| 268 | +- ``"pdfversion"`` -- a string containing PDF version as indicated in | |
| 269 | + the PDF header (e.g. ``"1.7"``, ``"2.0"``) | |
| 270 | + | |
| 271 | +- ``"maxobjectid"`` -- a number indicating the object ID of the | |
| 272 | + highest numbered object in the file. This is provided to make it | |
| 273 | + easier for software that wants to add new objects to the file as you | |
| 274 | + can safely start with one above that number when creating new | |
| 275 | + objects. Note that the value of ``"maxobjectid"`` may be higher than | |
| 276 | + the actual maximum object that appears in the input PDF since it | |
| 277 | + takes into consideration any dangling indirect object references | |
| 278 | + from the original file. This prevents you from unwittingly creating | |
| 279 | + an object that doesn't exist but that is referenced, which may have | |
| 280 | + unintended side effects. (The PDF specification explicitly allows | |
| 281 | + dangling references and says to treat them as nulls. This can happen | |
| 282 | + if objects are removed from a PDF file.) | |
| 283 | + | |
| 284 | +- ``"objects"`` -- the actual PDF objects as described in | |
| 285 | + :ref:`json.objects`. | |
| 286 | + | |
| 287 | +Note that writing JSON output is done by ``QPDF``, not ``QPDFWriter``. | |
| 288 | +As such, none of the things ``QPDFWriter`` does apply. This includes | |
| 289 | +recompression of streams, renumbering of objects, anything to do with | |
| 290 | +object streams (which are not represented by qpdf JSON at all since | |
| 291 | +they are PDF syntax, not semantics), encryption, decryption, | |
| 292 | +linearization, QDF mode, etc. | |
| 293 | + | |
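As an illustration of how ``"maxobjectid"`` might be used, the Python sketch below adds a new non-stream object to an already-loaded qpdf JSON document. The function name ``add_object`` is hypothetical, and generation 0 for new objects is an assumption made for the example. The sketch also looks at the keys already present in ``"objects"`` so that repeated calls do not collide with objects added by hand.

```python
def add_object(doc: dict, value) -> str:
    """Add a non-stream object and return its reference string.

    Hypothetical helper: `doc` is a parsed qpdf JSON file (version 2).
    """
    qpdf = doc["qpdf-v2"]
    objects = qpdf["objects"]
    # Object numbers already used as keys of the form "obj:O G R".
    used = [int(key[4:].split()[0]) for key in objects if key.startswith("obj:")]
    # Start one above the highest known ID so that dangling references
    # in the original file are not accidentally satisfied.
    next_id = max([qpdf.get("maxobjectid", 0), *used]) + 1
    objects["obj:%d 0 R" % next_id] = {"value": value}
    # Keeping "maxobjectid" current is a convenience for later calls.
    qpdf["maxobjectid"] = next_id
    return "%d 0 R" % next_id
```

The returned string (for example ``"6 0 R"``) can then be stored wherever an indirect reference to the new object is needed.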
| 294 | +.. _json.example: | |
| 295 | + | |
| 296 | +qpdf JSON Example | |
| 297 | +~~~~~~~~~~~~~~~~~ | |
| 298 | + | |
| 299 | +The JSON below shows an example of a simple PDF file represented in | |
| 300 | +qpdf JSON format. | |
| 301 | + | |
| 302 | +.. code-block:: json | |
| 303 | + | |
| 304 | + { | |
| 305 | + "qpdf-v2": { | |
| 306 | + "pdfversion": "1.3", | |
| 307 | + "maxobjectid": 5, | |
| 308 | + "objects": { | |
| 309 | + "obj:1 0 R": { | |
| 310 | + "value": { | |
| 311 | + "/Pages": "2 0 R", | |
| 312 | + "/Type": "/Catalog" | |
| 313 | + } | |
| 314 | + }, | |
| 315 | + "obj:2 0 R": { | |
| 316 | + "value": { | |
| 317 | + "/Count": 1, | |
| 318 | + "/Kids": [ "3 0 R" ], | |
| 319 | + "/Type": "/Pages" | |
| 320 | + } | |
| 321 | + }, | |
| 322 | + "obj:3 0 R": { | |
| 323 | + "value": { | |
| 324 | + "/Contents": "4 0 R", | |
| 325 | + "/MediaBox": [ 0, 0, 612, 792 ], | |
| 326 | + "/Parent": "2 0 R", | |
| 327 | + "/Resources": { | |
| 328 | + "/Font": { | |
| 329 | + "/F1": "5 0 R" | |
| 330 | + } | |
| 331 | + }, | |
| 332 | + "/Type": "/Page" | |
| 333 | + } | |
| 334 | + }, | |
| 335 | + "obj:4 0 R": { | |
| 336 | + "stream": { | |
| 337 | + "data": "eJxzCuFSUNB3M1QwMlEISQOyzY2AyEAhJAXI1gjIL0ksyddUCMnicg3hAgDLAQnI", | |
| 338 | + "dict": { | |
| 339 | + "/Filter": "/FlateDecode" | |
| 340 | + } | |
| 341 | + } | |
| 342 | + }, | |
| 343 | + "obj:5 0 R": { | |
| 344 | + "value": { | |
| 345 | + "/BaseFont": "/Helvetica", | |
| 346 | + "/Encoding": "/WinAnsiEncoding", | |
| 347 | + "/Subtype": "/Type1", | |
| 348 | + "/Type": "/Font" | |
| 349 | + } | |
| 350 | + }, | |
| 351 | + "trailer": { | |
| 352 | + "value": { | |
| 353 | + "/ID": [ | |
| 354 | + "b:98b5a26966fba4d3a769b715b2558da6", | |
| 355 | + "b:98b5a26966fba4d3a769b715b2558da6" | |
| 356 | + ], | |
| 357 | + "/Root": "1 0 R", | |
| 358 | + "/Size": 6 | |
| 359 | + } | |
| 360 | + } | |
| 361 | + } | |
| 362 | + } | |
| 363 | + } | |
| 364 | + | |
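A file like the one above can be explored with any JSON library. The following Python sketch is an illustration only: the helper names are hypothetical, and it assumes a well-formed page tree with no cycles. It follows ``"/Root"`` and ``"/Pages"`` from the trailer to the page objects and returns the raw (still filtered) bytes of a stream written with ``inline`` stream data.

```python
import base64


def value_of(objects: dict, ref: str):
    """Return the value of the non-stream object named by a reference
    such as "2 0 R", or None if the reference is dangling."""
    entry = objects.get("obj:" + ref)
    return None if entry is None else entry.get("value")


def iter_pages(objects: dict, node_ref: str):
    """Yield (reference, page dictionary) pairs below a page tree node."""
    node = value_of(objects, node_ref)
    if node is None:
        return
    if node.get("/Type") == "/Pages":
        for kid in node.get("/Kids", []):
            yield from iter_pages(objects, kid)
    else:
        yield node_ref, node


def pages_of(doc: dict):
    """List the pages of a parsed qpdf JSON (version 2) document."""
    objects = doc["qpdf-v2"]["objects"]
    root = value_of(objects, objects["trailer"]["value"]["/Root"])
    return list(iter_pages(objects, root["/Pages"]))


def inline_stream_data(objects: dict, ref: str):
    """Return the bytes of an inline stream, or None if no data is present.

    The bytes are still encoded according to the stream's "/Filter".
    """
    stream = objects["obj:" + ref]["stream"]
    data = stream.get("data")
    return None if data is None else base64.b64decode(data)
```

For the example file, ``pages_of`` would be expected to report the single page ``"3 0 R"``, and the bytes returned for ``"4 0 R"`` would still need to be inflated because the stream dictionary specifies ``/FlateDecode``.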
| 365 | +.. _json.input: | |
| 366 | + | |
| 367 | +qpdf JSON Input | |
| 368 | +~~~~~~~~~~~~~~~ | |
| 369 | + | |
| 370 | +Output in the JSON output format described in :ref:`json.output` can | |
| 371 | +be used in two different ways: | |
| 372 | + | |
| 373 | +- By using the :qpdf:ref:`--json-input` flag or calling | |
| 374 | + ``QPDF::createFromJSON`` in place of ``QPDF::processFile``, a qpdf | |
| 375 | + JSON file can be used in place of a PDF file as the input to qpdf. | |
| 376 | + | |
| 377 | +- By using the :qpdf:ref:`--update-from-json` flag or calling | |
| 378 | + ``QPDF::updateFromJSON`` on an initialized ``QPDF`` object, a qpdf | |
| 379 | + JSON file can be used to apply changes to an existing ``QPDF`` | |
| 380 | + object. That ``QPDF`` object can have come from any source including | |
| 381 | + a PDF file, a qpdf JSON file, or the result of any other process | |
| 382 | + that results in a valid, initialized ``QPDF`` object. | |
| 383 | + | |
| 384 | +Here are some important things to know about qpdf JSON input. | |
| 385 | + | |
| 386 | +- When a qpdf JSON file is used as the primary input file, it must be | |
| 387 | + complete. This means | |
| 388 | + | |
| 389 | + - A PDF version number must be specified with the ``"pdfversion"`` | |
| 390 | + key | |
| 391 | + | |
| 392 | + - Stream data must be present for all streams | |
| 393 | + | |
| 394 | + - The trailer dictionary must be present, though only the | |
| 395 | + ``"/Root"`` key is required. | |
| 396 | + | |
| 397 | +- Certain fields from the input are ignored whether creating or | |
| 398 | + updating from a JSON file: | |
| 399 | + | |
| 400 | + - ``"maxobjectid"`` is ignored, so it is not necessary to update it | |
| 401 | + when adding new objects. | |
| 402 | + | |
| 403 | + - ``"/Length"`` is ignored in all stream dictionaries. qpdf doesn't | |
| 404 | + put it there when it creates JSON output, and it is not necessary | |
| 405 | + to add it. | |
| 406 | + | |
| 407 | + - ``"/Size"`` is ignored if it appears in a trailer dictionary as | |
| 408 | + that is always recomputed by ``QPDFWriter``. | |
| 409 | + | |
| 410 | + - Unknown keys at the top level of the file, within ``objects``, | |
| 411 | + at the top level of each individual object (inside the object that | |
| 412 | + has the ``"value"`` or ``"stream"`` key) and directly within | |
| 413 | + ``"stream"`` are ignored for future compatibility. You should | |
| 414 | + avoid putting your own values in those places if you wish to avoid | |
| 415 | + risking that your JSON files will not work in future versions of | |
| 416 | + qpdf. The exception to this advice is at the top level of the | |
| 417 | + overall file where it is explicitly supported for you to add your | |
| 418 | + own keys. For example, you could add your own metadata at the top | |
| 419 | + level, and qpdf will ignore it. Note that extra top-level keys are | |
| 420 | + not preserved when qpdf reads your JSON file. | |
| 421 | + | |
| 422 | +- When qpdf reads a PDF file, the internal object numbers are always | |
| 423 | + preserved. However, when qpdf writes a file using ``QPDFWriter``, | |
| 424 | + ``QPDFWriter`` does its own numbering and, in general, does not | |
| 425 | + preserve input object numbers. That means that a qpdf JSON file that | |
| 426 | + is used to update an existing PDF must have object numbers that | |
| 427 | + match the input file it is modifying. In practical terms, this means | |
| 428 | + that you can't use a JSON file created from one PDF file to modify | |
| 429 | + the *output of running qpdf on that file*. | |
| 430 | + | |
| 431 | + To put this more concretely, the following is valid: | |
| 432 | + | |
| 433 | + :: | |
| 434 | + | |
| 435 | + qpdf --json-output in.pdf pdf.json | |
| 436 | + # edit pdf.json | |
| 437 | + qpdf in.pdf out.pdf --update-from-json=pdf.json | |
| 438 | + | |
| 439 | + The following will not produce predictable results because | |
| 440 | + ``out.pdf`` won't have the same object numbers as ``pdf.json`` and | |
| 441 | + ``in.pdf``. | |
| 442 | + | |
| 443 | + :: | |
| 444 | + | |
| 445 | + qpdf --json-output in.pdf pdf.json | |
| 446 | + # edit pdf.json | |
| 447 | + qpdf in.pdf out.pdf --update-from-json=pdf.json | |
| 448 | + # edit pdf.json again | |
| 449 | + # Don't do this | |
| 450 | + qpdf out.pdf out2.pdf --update-from-json=pdf.json | |
| 451 | + | |
| 452 | +- When updating from a JSON file (:qpdf:ref:`--update-from-json`, | |
| 453 | + ``QPDF::updateFromJSON``), existing objects are updated in place. | |
| 454 | + This has the following implications: | |
| 455 | + | |
| 456 | + - You may omit both ``"data"`` and ``"datafile"`` if the object you | |
| 457 | + are updating is already a stream. In that case the original stream | |
| 458 | + data is preserved. You must always provide a stream dictionary, | |
| 459 | + but it may be empty. Note that an empty stream dictionary will | |
| 460 | + clear the old dictionary. There is no way to indicate that an old | |
| 461 | + stream dictionary should be left alone, so if your intention is to | |
| 462 | + replace the stream data and preserve the dictionary, the | |
| 463 | + original dictionary must appear in the JSON file. | |
| 464 | + | |
| 465 | + - You can change one object type to another object type including | |
| 466 | + replacing a stream with a non-stream or a non-stream with a | |
| 467 | + stream. If you replace a non-stream with a stream, you must | |
| 468 | + provide data for the stream. | |
| 469 | + | |
| 470 | + - Objects that you do not wish to modify can be omitted from the | |
| 471 | + JSON. That includes the trailer. That means you can use a qpdf | |
| 472 | + JSON file that was written with :qpdf:ref:`--json-object` so | |
| 473 | + that it includes only the objects you intend to modify and | |
| 474 | + nothing else. | |
| 475 | + | |
| 476 | + - You can omit the ``"pdfversion"`` key. The input PDF version will | |
| 477 | + be preserved. | |
| 478 | + | |
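Note on the input rules above: the three completeness requirements for a primary input file can be checked before handing a file to qpdf. This is a rough Python sketch under stated assumptions, not what qpdf itself does. It takes the dictionary that directly holds ``"pdfversion"`` and ``"objects"`` (how that dictionary is nested in the full file is not assumed here), and the function name ``completeness_problems`` is hypothetical.

```python
def completeness_problems(qpdf_json):
    """Return a list of reasons the JSON could not serve as a primary input."""
    problems = []

    # Rule 1: a PDF version number must be given.
    if "pdfversion" not in qpdf_json:
        problems.append('missing "pdfversion"')

    # Rule 2: every stream needs data, either inline or in an external file.
    objects = qpdf_json.get("objects", {})
    for key, entry in objects.items():
        stream = entry.get("stream")
        if stream is not None and "data" not in stream and "datafile" not in stream:
            problems.append(f"{key}: stream has no data")

    # Rule 3: the trailer must be present, and only "/Root" is required in it.
    trailer = objects.get("trailer", {}).get("value", {})
    if "/Root" not in trailer:
        problems.append('trailer is missing or has no "/Root"')

    return problems


# Hypothetical inputs for illustration.
complete = {
    "pdfversion": "1.3",
    "objects": {
        "obj:1 0 R": {"value": {"/Type": "/Catalog"}},
        "obj:4 0 R": {"stream": {"data": "", "dict": {}}},
        "trailer": {"value": {"/Root": "1 0 R"}},
    },
}
incomplete = {
    "objects": {
        "obj:4 0 R": {"stream": {"dict": {}}},
    },
}
```

When the file is used with ``--update-from-json`` instead, none of these checks apply, since the version, stream data, and trailer may all be omitted.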
| 479 | +.. _json.workflow-cli: | |
| 480 | + | |
| 481 | +qpdf JSON Workflow: CLI | |
| 482 | +~~~~~~~~~~~~~~~~~~~~~~~ | |
| 483 | + | |
| 484 | +This section includes a few examples of using qpdf JSON. | |
| 485 | + | |
| 486 | +- Convert a PDF file to JSON format, edit the JSON, and convert back | |
| 487 | + to PDF. This is an alternative to using QDF mode (see :ref:`qdf`) to | |
| 488 | + modify PDF files in a text editor. Each method has its own | |
| 489 | + advantages and disadvantages. | |
| 490 | + | |
| 491 | + :: | |
| 492 | + | |
| 493 | + qpdf --json-output in.pdf pdf.json | |
| 494 | + # edit pdf.json | |
| 495 | + qpdf --json-input pdf.json out.pdf | |
| 496 | + | |
| 497 | +- Extract only a specific object into a JSON file, modify the object | |
| 498 | + in JSON, and use the modified object to update the original PDF. In | |
| 499 | + this case, we're editing object 4, whatever that may happen to be. | |
| 500 | + You would have to know through some other means which object you | |
| 501 | + wanted to edit, such as by looking at other JSON output or using a | |
| 502 | + tool (possibly but not necessarily qpdf) to identify the object. | |
| 503 | + | |
| 504 | + :: | |
| 505 | + | |
| 506 | + qpdf --json-output in.pdf pdf.json --json-object=4,0 | |
| 507 | + # edit pdf.json | |
| 508 | + qpdf in.pdf --update-from-json=pdf.json out.pdf | |
| 509 | + | |
| 510 | + Rather than using :qpdf:ref:`--json-object` as in the above example, | |
| 511 | + you could edit the JSON file to remove the objects you didn't need. | |
| 512 | + You could also just leave them there, though the update process | |
| 513 | + would be slower. | |
| 514 | + | |
| 515 | + You could also add new objects to a file by adding them to | |
| 516 | + ``pdf.json``. Just be sure the object number doesn't conflict with | |
| 517 | + an existing object. The ``"maxobjectid"`` field in the original | |
| 518 | + output can help with this. You don't have to update it if you add | |
| 519 | + objects as it is ignored when the file is read back in. | |
| 520 | + | |
| 521 | +- Use :qpdf:ref:`--json-input` and :qpdf:ref:`--json-output` together | |
| 522 | + to demonstrate preservation of object numbers. In this example, | |
| 523 | + ``a.json`` and ``b.json`` will have the same objects and object | |
| 524 | + numbers. The files may not be identical since strings may be | |
| 525 | + normalized, fields may appear in a different order, etc. However, | |
| 526 | + ``b.json`` and ``c.json`` are probably identical. | |
| 527 | + | |
| 528 | + :: | |
| 529 | + | |
| 530 | + qpdf --json-output in.pdf a.json | |
| 531 | + qpdf --json-input --json-output a.json b.json | |
| 532 | + qpdf --json-input --json-output b.json c.json | |
| 533 | + | |
| 534 | + | |
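Note on adding objects, as described in the second workflow: one way to avoid a conflicting object number is to scan the existing keys and take the next number. The Python sketch below assumes keys have the ``obj:O G R`` form shown in this diff and that new objects use generation 0. The helper name ``next_free_object_key`` is hypothetical.

```python
import re

# Keys in "objects" look like "obj:12 0 R"; "trailer" does not match.
OBJ_KEY = re.compile(r"^obj:(\d+) (\d+) R$")


def next_free_object_key(objects):
    """Return an "obj:N 0 R" key whose object number is not yet in use."""
    used = [int(m.group(1)) for m in map(OBJ_KEY.match, objects) if m]
    return f"obj:{max(used, default=0) + 1} 0 R"


# Hypothetical "objects" dictionary with five objects and a trailer.
objects = {f"obj:{n} 0 R": {"value": {}} for n in range(1, 6)}
objects["trailer"] = {"value": {"/Root": "1 0 R"}}
new_key = next_free_object_key(objects)
```

This only sees the objects present in the JSON file. If the file was written with ``--json-object`` and holds a subset, the ``"maxobjectid"`` value from the original output is the safer starting point.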
| 535 | +.. _json.workflow-api: | |
| 536 | + | |
| 537 | +qpdf JSON Workflow: API | |
| 538 | +~~~~~~~~~~~~~~~~~~~~~~~ | |
| 539 | + | |
| 540 | +Everything that can be done using the qpdf CLI can be done using the | |
| 541 | +C++ API. See comments in :file:`QPDF.hh` for ``writeJSON``, | |
| 542 | +``createFromJSON``, and ``updateFromJSON`` for details. | |
| 27 | 543 | |
| 28 | 544 | .. _json-guarantees: |
| 29 | 545 | |
| 30 | -JSON Guarantees | |
| 31 | ---------------- | |
| 546 | +JSON Compatibility Guarantees | |
| 547 | +----------------------------- | |
| 32 | 548 | |
| 33 | 549 | The qpdf JSON representation includes a JSON serialization of the raw |
| 34 | 550 | objects in the PDF file as well as some computed information in a more |
| ... | ... | @@ -37,24 +553,23 @@ format. These guarantees are designed to simplify the experience of a |
| 37 | 553 | developer working with the JSON format. |
| 38 | 554 | |
| 39 | 555 | Compatibility |
| 40 | - The top-level JSON object output is a dictionary. The JSON output | |
| 41 | - contains various nested dictionaries and arrays. With the exception | |
| 42 | - of dictionaries that are populated by the fields of objects from the | |
| 43 | - file, all instances of a dictionary are guaranteed to have exactly | |
| 44 | - the same keys. Future versions of qpdf are free to add additional | |
| 45 | - keys but not to remove keys or change the type of object that a key | |
| 46 | - points to. The qpdf program validates this guarantee, and in the | |
| 47 | - unlikely event that a bug in qpdf should cause it to generate data | |
| 48 | - that doesn't conform to this rule, it will ask you to file a bug | |
| 49 | - report. | |
| 50 | - | |
| 51 | - The top-level JSON structure contains a "``version``" key whose value | |
| 52 | - is simple integer. The value of the ``version`` key will be | |
| 556 | + The top-level JSON object is a dictionary (JSON "object"). The JSON | |
| 557 | + output contains various nested dictionaries and arrays. With the | |
| 558 | + exception of dictionaries that are populated by the fields of | |
| 559 | + PDF objects from the file, all instances of a dictionary are | |
| 560 | + guaranteed to have exactly the same keys. | |
| 561 | + | |
| 562 | + The top-level JSON structure contains a ``"version"`` key whose | |
| 563 | + value is a simple integer. The value of the ``version`` key will be | |
| 53 | 564 | incremented if a non-compatible change is made. A non-compatible |
| 54 | 565 | change would be any change that involves removal of a key, a change |
| 55 | - to the format of data pointed to by a key, or a semantic change that | |
| 56 | - requires a different interpretation of a previously existing key. A | |
| 57 | - strong effort will be made to avoid breaking compatibility. | |
| 566 | + to the format of data pointed to by a key, or a semantic change | |
| 567 | + that requires a different interpretation of a previously existing | |
| 568 | + key. | |
| 569 | + | |
| 570 | + With a specific qpdf JSON version, future versions of qpdf are free | |
| 571 | + to add additional keys but not to remove keys or change the type of | |
| 572 | + object that a key points to. | |
| 58 | 573 | |
| 59 | 574 | Documentation |
| 60 | 575 | The :command:`qpdf` command can be invoked with the |
| ... | ... | @@ -66,28 +581,29 @@ Documentation |
| 66 | 581 | |
| 67 | 582 | - A dictionary in the help output means that the corresponding |
| 68 | 583 | location in the actual JSON output is also a dictionary with |
| 69 | - exactly the same keys; that is, no keys present in help are absent | |
| 70 | - in the real output, and no keys will be present in the real output | |
| 71 | - that are not in help. As a special case, if the dictionary has a | |
| 72 | - single key whose name starts with ``<`` and ends with ``>``, it | |
| 73 | - means that the JSON output is a dictionary that can have any keys, | |
| 74 | - each of which conforms to the value of the special key. This is | |
| 75 | - used for cases in which the keys of the dictionary are things like | |
| 76 | - object IDs. | |
| 584 | + exactly the same keys; that is, no keys present in help are | |
| 585 | + absent in the real output, and no keys will be present in the | |
| 586 | + real output that are not in help. It is possible for a key to be | |
| 587 | + present and have a value that is explicitly ``null``. As a | |
| 588 | + special case, if the dictionary has a single key whose name | |
| 589 | + starts with ``<`` and ends with ``>``, it means that the JSON | |
| 590 | + output is a dictionary that can have any value as a key. This is | |
| 591 | + used for cases in which the keys of the dictionary are things | |
| 592 | + like object IDs. | |
| 77 | 593 | |
| 78 | 594 | - A string in the help output is a description of the item that |
| 79 | 595 | appears in the corresponding location of the actual output. The |
| 80 | - corresponding output can have any format. | |
| 596 | + corresponding output can have any value including ``null``. | |
| 81 | 597 | |
| 82 | 598 | - An array in the help output always contains a single element. It |
| 83 | 599 | indicates that the corresponding location in the actual output is |
| 84 | - also an array, and that each element of the array has whatever | |
| 85 | - format is implied by the single element of the help output's | |
| 86 | - array. | |
| 600 | + an array of any length, and that each element of the array has | |
| 601 | + whatever format is implied by the single element of the help | |
| 602 | + output's array. | |
| 87 | 603 | |
| 88 | - For example, the help output indicates includes a "``pagelabels``" | |
| 604 | + For example, the help output includes a ``"pagelabels"`` | |
| 89 | 605 | key whose value is an array of one element. That element is a |
| 90 | - dictionary with keys "``index``" and "``label``". In addition to | |
| 606 | + dictionary with keys ``"index"`` and ``"label"``. In addition to | |
| 91 | 607 | describing the meaning of those keys, this tells you that the actual |
| 92 | 608 | JSON output will contain a ``pagelabels`` array, each of whose |
| 93 | 609 | elements is a dictionary that contains an ``index`` key, a ``label`` |
| ... | ... | @@ -95,56 +611,13 @@ Documentation |
| 95 | 611 | |
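Note on the documentation guarantee: the three rules (dictionary, string, array) amount to a small recursive check of actual output against the help output. The Python sketch below is one possible reading, not qpdf's validator. It assumes that values under a ``<...>`` key must conform to that key's template (this follows the wording removed in this hunk), and that ``null`` is accepted only where the help shows a string. The function name and the description strings in the sample schema are hypothetical.

```python
def conforms(schema, actual):
    """Check actual JSON data against a help-style schema."""
    if isinstance(schema, str):
        # A string is only a description; any value, including None, is fine.
        return True
    if isinstance(schema, list):
        # One template element describes every element of an array of any length.
        return isinstance(actual, list) and all(
            conforms(schema[0], item) for item in actual
        )
    if isinstance(schema, dict):
        if not isinstance(actual, dict):
            return False
        keys = list(schema)
        if len(keys) == 1 and keys[0].startswith("<") and keys[0].endswith(">"):
            # Special case: arbitrary keys, each value checked against the template.
            return all(conforms(schema[keys[0]], v) for v in actual.values())
        # Otherwise the keys must match exactly.
        return set(actual) == set(keys) and all(
            conforms(schema[k], actual[k]) for k in keys
        )
    return False


# Hypothetical help fragment modeled on the "pagelabels" example.
schema = {"pagelabels": [{"index": "description", "label": "description"}]}
by_object = {"<object id>": {"value": "description"}}
```

With this reading, ``{"pagelabels": [{"index": 0}]}`` is rejected because the ``label`` key is absent, while an empty ``pagelabels`` array is accepted.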
| 96 | 612 | Directness and Simplicity |
| 97 | 613 | The JSON output contains the value of every object in the file, but |
| 98 | - it also contains some processed data. This is analogous to how qpdf's | |
| 99 | - library interface works. The processed data is similar to the helper | |
| 100 | - functions in that it allows you to look at certain aspects of the PDF | |
| 101 | - file without having to understand all the nuances of the PDF | |
| 614 | + it also contains some summary data. This is analogous to how qpdf's | |
| 615 | + library interface works. The summary data is similar to the helper | |
| 616 | + functions in that it allows you to look at certain aspects of the | |
| 617 | + PDF file without having to understand all the nuances of the PDF | |
| 102 | 618 | specification, while the raw objects allow you to mine the PDF for |
| 103 | 619 | anything that the higher-level interfaces are lacking. |
| 104 | 620 | |
| 105 | -.. _json.limitations: | |
| 106 | - | |
| 107 | -Limitations of JSON Representation | |
| 108 | ----------------------------------- | |
| 109 | - | |
| 110 | -There are a few limitations to be aware of with the JSON structure: | |
| 111 | - | |
| 112 | -- Strings, names, and indirect object references in the original PDF | |
| 113 | - file are all converted to strings in the JSON representation. In the | |
| 114 | - case of a "normal" PDF file, you can tell the difference because a | |
| 115 | - name starts with a slash (``/``), and an indirect object reference | |
| 116 | - looks like ``n n R``, but if there were to be a string that looked | |
| 117 | - like a name or indirect object reference, there would be no way to | |
| 118 | - tell this from the JSON output. Note that there are certain cases | |
| 119 | - where you know for sure what something is, such as knowing that | |
| 120 | - dictionary keys in objects are always names and that certain things | |
| 121 | - in the higher-level computed data are known to contain indirect | |
| 122 | - object references. | |
| 123 | - | |
| 124 | -- The JSON format doesn't support binary data very well. Mostly the | |
| 125 | - details are not important, but they are presented here for | |
| 126 | - information. When qpdf outputs a string in the JSON representation, | |
| 127 | - it converts the string to UTF-8, assuming usual PDF string semantics. | |
| 128 | - Specifically, if the original string is UTF-16, it is converted to | |
| 129 | - UTF-8. Otherwise, it is assumed to have PDF doc encoding, and is | |
| 130 | - converted to UTF-8 with that assumption. This causes strange things | |
| 131 | - to happen to binary strings. For example, if you had the binary | |
| 132 | - string ``<038051>``, this would be output to the JSON as ``\u0003โขQ`` | |
| 133 | - because ``03`` is not a printable character and ``80`` is the bullet | |
| 134 | - character in PDF doc encoding and is mapped to the Unicode value | |
| 135 | - ``2022``. Since ``51`` is ``Q``, it is output as is. If you wanted to | |
| 136 | - convert back from here to a binary string, would have to recognize | |
| 137 | - Unicode values whose code points are higher than ``0xFF`` and map | |
| 138 | - those back to their corresponding PDF doc encoding characters. There | |
| 139 | - is no way to tell the difference between a Unicode string that was | |
| 140 | - originally encoded as UTF-16 or one that was converted from PDF doc | |
| 141 | - encoding. In other words, it's best if you don't try to use the JSON | |
| 142 | - format to extract binary strings from the PDF file, but if you really | |
| 143 | - had to, it could be done. Note that qpdf's | |
| 144 | - :qpdf:ref:`--show-object` option does not have this | |
| 145 | - limitation and will reveal the string as encoded in the original | |
| 146 | - file. | |
| 147 | - | |
| 148 | 621 | .. _json.considerations: |
| 149 | 622 | |
| 150 | 623 | JSON: Special Considerations |
| ... | ... | @@ -157,12 +630,15 @@ be aware of: |
| 157 | 630 | - If a PDF file has certain types of errors in its pages tree (such as |
| 158 | 631 | page objects that are direct or multiple pages sharing the same |
| 159 | 632 | object ID), qpdf will automatically repair the pages tree. If you |
| 160 | - specify ``"objects"`` and/or ``"objectinfo"`` without any other | |
| 161 | - keys, you will see the original pages tree without any corrections. | |
| 162 | - If you specify any of keys that require page tree traversal (for | |
| 163 | - example, ``"pages"``, ``"outlines"``, or ``"pagelabel"``), then | |
| 164 | - ``"objects"`` and ``"objectinfo"`` will show the repaired page tree | |
| 165 | - so that object references will be consistent throughout the file. | |
| 633 | + specify ``"objects"`` (and, with qpdf JSON version 1, also | |
| 634 | + ``"objectinfo"``) without any other keys, you will see the original | |
| 635 | + pages tree without any corrections. If you specify any of the keys that | |
| 636 | + require page tree traversal (for example, ``"pages"``, | |
| 637 | + ``"outlines"``, or ``"pagelabel"``), then ``"objects"`` (and | |
| 638 | + ``"objectinfo"``) will show the repaired page tree so that object | |
| 639 | + references will be consistent throughout the file. This is not an | |
| 640 | + issue with :qpdf:ref:`--json-output`, which doesn't repair the pages | |
| 641 | + tree. | |
| 166 | 642 | |
| 167 | 643 | - While qpdf guarantees that keys present in the help will be present |
| 168 | 644 | in the output, those fields may be null or empty if the information |
| ... | ... | @@ -177,22 +653,128 @@ be aware of: |
| 177 | 653 | 1. Note that JSON indexes from 0, and you would also use 0-based |
| 178 | 654 | indexing using the API. However, 1-based indexing is easier in this |
| 179 | 655 | case because the command-line syntax for specifying page ranges is |
| 180 | - 1-based. If you were going to write a program that looked through the | |
| 181 | - JSON for information about specific pages and then use the | |
| 656 | + 1-based. If you were going to write a program that looked through | |
| 657 | + the JSON for information about specific pages and then use the | |
| 182 | 658 | command-line to extract those pages, 1-based indexing is easier. |
| 183 | - Besides, it's more convenient to subtract 1 from a program in a real | |
| 184 | - programming language than it is to add 1 from shell code. | |
| 659 | + Besides, it's more convenient to subtract 1 in a real programming | |
| 660 | + language than it is to add 1 in shell code. | |
| 185 | 661 | |
| 186 | 662 | - The image information included in the ``page`` section of the JSON |
| 187 | - output includes the key "``filterable``". Note that the value of this | |
| 188 | - field may depend on the :qpdf:ref:`--decode-level` that | |
| 189 | - you invoke qpdf with. The JSON output includes a top-level key | |
| 190 | - "``parameters``" that indicates the decode level used for computing | |
| 191 | - whether a stream was filterable. For example, jpeg images will be | |
| 192 | - shown as not filterable by default, but they will be shown as | |
| 193 | - filterable if you run :command:`qpdf --json | |
| 663 | + output includes the key ``"filterable"``. Note that the value of | |
| 664 | + this field may depend on the :qpdf:ref:`--decode-level` that you | |
| 665 | + invoke qpdf with. The JSON output includes a top-level key | |
| 666 | + ``"parameters"`` that indicates the decode level that was used for | |
| 667 | + computing whether a stream was filterable. For example, jpeg images | |
| 668 | + will be shown as not filterable by default, but they will be shown | |
| 669 | + as filterable if you run :command:`qpdf --json | |
| 194 | 670 | --decode-level=all`. |
| 195 | 671 | |
| 196 | 672 | - The ``encrypt`` key's values will be populated for non-encrypted |
| 197 | 673 | files. Some values will be null, and others will have values that |
| 198 | 674 | apply to unencrypted files. |
| 675 | + | |
| 676 | +- The qpdf library itself never loads an entire PDF into memory. This | |
| 677 | + remains true for PDF files represented in JSON format. In general, | |
| 678 | + qpdf will hold the entire object structure in memory once a file has | |
| 679 | + been fully read (objects are loaded into memory lazily but stay | |
| 680 | + there once loaded), but it will never have more than two copies of a | |
| 681 | + stream in memory at once. That said, if you ask qpdf to write JSON | |
| 682 | + to memory, it will do so, so be careful about this if you are | |
| 683 | + working with very large PDF files. There is nothing in the qpdf | |
| 684 | + library itself that prevents working with PDF files much larger than | |
| 685 | + available system memory. qpdf can both read and write such files in | |
| 686 | + JSON format. If you need to work with a PDF file's JSON | |
| 687 | + representation in memory, it is recommended that you use either | |
| 688 | + ``none`` or ``file`` as the argument to | |
| 689 | + :qpdf:ref:`--json-stream-data`, or if using the API, use | |
| 690 | + ``qpdf_sj_none`` or ``qpdf_sj_file`` as the JSON stream data value. | |
| 691 | + If using ``none``, you can use other means to obtain the stream | |
| 692 | + data. | |
| 693 | + | |
| 694 | +.. _json-v2-changes: | |
| 695 | + | |
| 696 | +Changes from JSON v1 to v2 | |
| 697 | +-------------------------- | |
| 698 | + | |
| 699 | +The following changes were made to qpdf's JSON output format for | |
| 700 | +version 2. | |
| 701 | + | |
| 702 | +- The representation of objects has changed. For details, see | |
| 703 | + :ref:`json.objects`. | |
| 704 | + | |
| 705 | + - The representation of strings is now unambiguous for all strings. | |
| 706 | + Strings are prefixed with either ``u:`` for Unicode strings or | |
| 707 | + ``b:`` for byte strings. | |
| 708 | + | |
| 709 | + - Names are shown in qpdf's canonical form rather than in PDF | |
| 710 | + syntax. (Example: the PDF-syntax name ``/text#2fplain`` appeared | |
| 711 | + as ``"/text#2fplain"`` in v1 but appears as ``"/text/plain"`` in | |
| 712 | + v2.) | |
| 713 | + | |
| 714 | + - The top-level representation of an object in ``"objects"`` is a | |
| 715 | + dictionary containing either a ``"value"`` key or a ``"stream"`` | |
| 716 | + key, making it possible to distinguish streams from other objects. | |
| 717 | + | |
| 718 | +- The ``"objectinfo"`` key has been removed in favor of a | |
| 719 | + representation in ``"objects"`` that differentiates between a stream | |
| 720 | + and other kinds of objects. In v1, it was not possible to tell a | |
| 721 | + stream from a dictionary within ``"objects"``. | |
| 722 | + | |
| 723 | +- Within the ``"objects"`` dictionary, keys are now ``"obj:O G R"`` | |
| 724 | + where ``O`` and ``G`` are the object and generation number. | |
| 725 | + ``"trailer"`` remains the key for the trailer dictionary. In v1, the | |
| 726 | + ``obj:`` prefix was not present. The rationale for this change is as | |
| 727 | + follows: | |
| 728 | + | |
| 729 | + - Having a unique prefix (``obj:``) makes it much easier to search | |
| 730 | + in the JSON file for the definition of an object | |
| 731 | + | |
| 732 | + - Having the key still contain ``O G R`` makes it much easier to | |
| 733 | + construct the key from an indirect reference. You just have to | |
| 734 | + prepend ``obj:``. There is no need to parse the indirect object | |
| 735 | + reference. | |
| 736 | + | |
| 737 | +- In the ``"encrypt"`` object, the ``"modifyannotations"`` key was | |
| 738 | + misspelled as ``"moddifyannotations"`` in v1. This has been | |
| 739 | + corrected. | |
| 740 | + | |
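Note on the v2 changes listed above: both the ``obj:`` key rule and the string prefixes are easy to apply from a script. The Python sketch below assumes that the payload after ``b:`` is hexadecimal, which is inferred from the ``/ID`` values in the example output earlier in this diff rather than stated in this section. The helper names ``object_key`` and ``decode_string`` are hypothetical.

```python
def object_key(reference):
    """Turn an indirect reference such as "2 0 R" into its "objects" key."""
    # No parsing of the reference is needed; the prefix is simply prepended.
    return "obj:" + reference


def decode_string(value):
    """Decode a v2 string: "u:" gives text, "b:" gives raw bytes."""
    if value.startswith("u:"):
        return value[2:]
    if value.startswith("b:"):
        # Assumption: byte strings are written as hexadecimal digits.
        return bytes.fromhex(value[2:])
    raise ValueError("not a qpdf JSON v2 string: " + repr(value))


# Hypothetical lookup: follow /Pages from the catalog to the pages object.
objects = {
    "obj:1 0 R": {"value": {"/Pages": "2 0 R", "/Type": "/Catalog"}},
    "obj:2 0 R": {"value": {"/Count": 1, "/Type": "/Pages"}},
}
pages = objects[object_key(objects["obj:1 0 R"]["value"]["/Pages"])]
file_id = decode_string("b:98b5a26966fba4d3a769b715b2558da6")
```

A value without either prefix is a name, an indirect reference, or some other kind of object, so the sketch raises an error rather than guessing.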
| 741 | +Motivation for qpdf JSON version 2 | |
| 742 | +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | |
| 743 | + | |
| 744 | +qpdf JSON version 2 was created to make it possible to manipulate PDF | |
| 745 | +files using JSON syntax instead of native PDF syntax. This makes it | |
| 746 | +possible to make low-level updates to PDF files from just about any | |
| 747 | +programming language or even to do so from the command-line using | |
| 748 | +tools like ``jq`` or any editor that's capable of working with JSON | |
| 749 | +files. There were several limitations of JSON format version 1 that | |
| 750 | +made this impossible: | |
| 751 | + | |
| 752 | +- Strings, names, and indirect object references in the original PDF | |
| 753 | + file were all converted to strings in the JSON representation. For | |
| 754 | + casual human inspection, this was fine, but in the general case, | |
| 755 | + there was no way to tell the difference between a string that looked | |
| 756 | + like a name or indirect object reference and an actual name or | |
| 757 | + indirect object reference. | |
| 758 | + | |
| 759 | +- PDF strings were not unambiguously represented in the JSON format. | |
| 760 | + The way qpdf JSON v1 represented a string was to try to convert the | |
| 761 | + string to UTF-8. This was done by assuming a string that was not | |
| 762 | + explicitly marked as Unicode was encoded in PDF doc encoding. The | |
| 763 | + problem is that there is not a perfect bidirectional mapping between | |
| 764 | + Unicode and PDF doc encoding, so if a binary string happened to | |
| 765 | + contain characters that couldn't be bidirectionally mapped, there | |
| 766 | + would be no way to get back to the original PDF string. Even when | |
| 767 | + possible, trying to map from the JSON representation of a binary | |
| 768 | + string back to the original string required knowledge of the mapping | |
| 769 | + between PDF doc encoding and Unicode. | |
| 770 | + | |
| 771 | +- There was no representation of stream data. If you wanted to extract | |
| 772 | + stream data, you could use :qpdf:ref:`--show-object`, so this wasn't | |
| 773 | + that important for inspection, but it was a blocker for being able | |
| 774 | + to go from JSON back to PDF. qpdf JSON version 2 allows stream data | |
| 775 | + to be included inline as base64-encoded data. There is also an | |
| 776 | + option to write all stream data to external files, which makes it | |
| 777 | + possible to work with very large PDF files in JSON format even with | |
| 778 | + tools that try to read the entire JSON structure into memory. | |
| 779 | + | |
| 780 | +- The PDF version from the PDF header was not represented in qpdf JSON v1. | ... | ... |
manual/library.rst
| ... | ... | @@ -70,12 +70,14 @@ Python |
| 70 | 70 | qpdf's capabilities with other functionality provided by Python's |
| 71 | 71 | rich standard library and available modules. |
| 72 | 72 | |
| 73 | -Other Languages | |
| 74 | - Starting with version 8.3.0, the :command:`qpdf` | |
| 75 | - command-line tool can produce a JSON representation of the PDF file's | |
| 76 | - non-content data. This can facilitate interacting programmatically | |
| 77 | - with PDF files through qpdf's command line interface. For more | |
| 78 | - information, please see :ref:`json`. | |
| 73 | +Other Languages Starting with version 11.0.0, the :command:`qpdf` | |
| 74 | + command-line tool can produce an unambiguous JSON representation of | |
| 75 | + a PDF file and can also create or update PDF files using this JSON | |
| 76 | + representation. qpdf versions from 8.3.0 through 10.6.3 had a more | |
| 77 | + limited JSON output format. The qpdf JSON format makes it possible | |
| 78 | + to inspect and modify the structure of a PDF file down to the | |
| 79 | + object level from the command-line or from any language that can | |
| 80 | + handle JSON data. Please see :ref:`json` for details. | |
| 79 | 81 | |
| 80 | 82 | Wrappers |
| 81 | 83 | The `qpdf Wiki <https://github.com/qpdf/qpdf/wiki>`__ contains a | ... | ... |
manual/object-streams.rst
| ... | ... | @@ -122,7 +122,7 @@ entries in ``/W`` above. Each entry consists of one or more fields, the |
| 122 | 122 | first of which is the type of the field. The number of bytes for each |
| 123 | 123 | field is given by ``/W`` above. A 0 in ``/W`` indicates that the field |
| 124 | 124 | is omitted and has the default value. The default value for the field |
| 125 | -type is "``1``". All other default values are "``0``". | |
| 125 | +type is ``1``. All other default values are ``0``. | |
| 126 | 126 | |
| 127 | 127 | PDF 1.5 has three field types: |
| 128 | 128 | ... | ... |
manual/qdf.rst
| ... | ... | @@ -28,6 +28,13 @@ able to restore edited files to a correct state. The |
| 28 | 28 | arguments. It reads a possibly edited QDF file from standard input and |
| 29 | 29 | writes a repaired file to standard output. |
| 30 | 30 | |
| 31 | +For another way to work with PDF files in an editor, see :ref:`json`. | |
| 32 | +Using qpdf JSON format allows you to edit the PDF file semantically | |
| 33 | +without having to be concerned about PDF syntax. However, QDF files | |
| 34 | +are actually valid PDF files, so the feedback cycle may be faster if | |
| 35 | +previewing with a PDF reader. Also, since QDF files are valid PDF, you | |
| 36 | +can experiment with all aspects of the PDF file, including syntax. | |
| 37 | + | |
| 31 | 38 | The following attributes characterize a QDF file: |
| 32 | 39 | |
| 33 | 40 | - All objects appear in numerical order in the PDF file, including when | ... | ... |
manual/qpdf-job.rst
| ... | ... | @@ -27,6 +27,10 @@ executable is available from inside the C++ library using the |
| 27 | 27 | |
| 28 | 28 | - Use from the C API with ``qpdfjob_run_from_json`` from :file:`qpdfjob-c.h` |
| 29 | 29 | |
| 30 | + - Note: this is unrelated to :qpdf:ref:`--json` but can be combined | |
| 31 | + with it. For more information on qpdf JSON (vs. QPDFJob JSON), see | |
| 32 | + :ref:`json`. | |
| 33 | + | |
| 30 | 34 | - The ``QPDFJob`` C++ API |
| 31 | 35 | |
| 32 | 36 | If you can understand how to use the :command:`qpdf` CLI, you can | ... | ... |
manual/release-notes.rst
| ... | ... | @@ -60,7 +60,8 @@ For a detailed list of changes, please see the file |
| 60 | 60 | - CLI: breaking changes |
| 61 | 61 | |
| 62 | 62 | - The default json output version when :qpdf:ref:`--json` is |
| 63 | - specified has been changed from ``1`` to ``latest``. | |
| 63 | + specified has been changed from ``1`` to ``latest``, which is | |
| 64 | + now ``2``. | |
| 64 | 65 | |
| 65 | 66 | - The :qpdf:ref:`--allow-weak-crypto` flag is now mandatory when |
| 66 | 67 | explicitly creating files with weak cryptographic algorithms. |
| ... | ... | @@ -100,7 +101,7 @@ For a detailed list of changes, please see the file |
| 100 | 101 | |
| 101 | 102 | - ``qpdf --list-attachments --verbose`` include some additional |
| 102 | 103 | information about attachments. Additional information about |
| 103 | - attachments is also included in the ``attachments`` json key | |
| 104 | + attachments is also included in the ``attachments`` JSON key | |
| 104 | 105 | with ``--json``. |
| 105 | 106 | |
| 106 | 107 | - For encrypted files, ``qpdf --json`` reveals the user password |
| ... | ... | @@ -647,8 +648,8 @@ For a detailed list of changes, please see the file |
| 647 | 648 | passwords from files or standard input than using |
| 648 | 649 | :samp:`@file` for this purpose. |
| 649 | 650 | |
| 650 | - - Add some information about attachments to the json output, and | |
| 651 | - added ``attachments`` as an additional json key. The | |
| 651 | + - Add some information about attachments to the JSON output, and | |
| 652 | + added ``attachments`` as an additional JSON key. The | |
| 652 | 653 | information included here is limited to the preferred name and |
| 653 | 654 | content stream and a reference to the file spec object. This is |
| 654 | 655 | enough detail for clients to avoid the hassle of navigating a | ... | ... |