Commit 0bd908b550603a6bcc399a825a170a1263378b22

Authored by Jay Berkenbilt
1 parent b7bbf12e

Update documentation for qpdf JSON v2

@@ -2,14 +2,13 @@ @@ -2,14 +2,13 @@
2 Next 2 Next
3 ==== 3 ====
4 4
  5 +Before Release:
  6 +
5 * At next release, hide release-qpdf-10.6.3.0cmake* versions at readthedocs 7 * At next release, hide release-qpdf-10.6.3.0cmake* versions at readthedocs
6 * Stay on top of https://github.com/pikepdf/pikepdf/pull/315 8 * Stay on top of https://github.com/pikepdf/pikepdf/pull/315
7 * Release qtest with updates to qtest-driver and copy back into qpdf 9 * Release qtest with updates to qtest-driver and copy back into qpdf
8 10
9 -In order:  
10 -* json v2  
11 -  
12 -Other (do in any order): 11 +Pending changes:
13 12
14 * Good C API for json v2 13 * Good C API for json v2
15 * QPDFPagesTree -- avoid ever flattening the pages tree. 14 * QPDFPagesTree -- avoid ever flattening the pages tree.
@@ -50,180 +49,10 @@ Other (do in any order): @@ -50,180 +49,10 @@ Other (do in any order):
50 * Rework tests so that nothing is written into the source directory. 49 * Rework tests so that nothing is written into the source directory.
51 Ideally then the entire build could be done with a read-only 50 Ideally then the entire build could be done with a read-only
52 source tree. 51 source tree.
  52 +* Consider adding fuzzer code for JSON
53 53
54 Soon: Break ground on "Document-level work" 54 Soon: Break ground on "Document-level work"
55 55
56 -Output JSON v2  
57 -==============  
58 -  
59 -Remaining work:  
60 -  
61 -* Make sure all the information from informational options is  
62 - available in the json output.  
63 -  
64 - * --check: add but maybe not by default?  
65 -  
66 - * --show-linearization: add but maybe not by default? Also figure  
67 - out whether warnings reported for some of the PDF specs (1.7) are  
68 - qpdf problems. This may not be worth adding in the first  
69 - increment.  
70 -  
71 - * --show-xref: add  
72 -  
73 -* Consider having --check, --show-encryption, etc., just select the  
74 - right keys when in json mode. I don't think I want check on by  
75 - default, so that might be different.  
76 -  
77 -* Consider having warnings be included in the json in a "warnings" key  
78 - in json mode.  
79 -  
80 -Notes for documentation:  
81 -  
82 -* Find all mentions of json in the manual and update.  
83 -  
84 -* Document typo fix in encrypt in release notes along with any other  
85 - non-compatible json 2 changes. Scrutinize all the output to decide  
86 - what should change.  
87 -  
88 -* Keys other than "qpdf-v2" are ignored so people can stash their own  
89 - stuff. Unknown keys are ignored at other places for future  
90 - compatibility. Readers of qpdf json should continue to ignore keys  
91 - they don't recognize.  
92 -  
93 -* Change: names are written in canonical form with a leading slash  
94 - just as they are treated in the code. In v1, they were written in  
95 - PDF syntax in the json file. Example: /text#2fplain in pdf will be  
96 - written as /text/plain in json v2 and as /text#2fplain in json v1.  
97 -  
98 -* Document changes to strings, objects, streams, object keys.  
99 -  
100 -* CLI: --json-input, --json-output[=version], --update-from-json. With  
101 - --json-input, the input file is a JSON file instead of a PDF file.  
102 - It must be complete, meaning that a PDF version must be given, all  
103 - streams must have exactly one of data or datafile, and a trailer  
104 - dictionary must be present, even if empty.  
105 -  
106 - With --update-from-json, the JSON file updates objects in place. If  
107 - updating an old stream, if stream data is omitted, the data remains  
108 - untouched. The dictionary is always required. Remember that  
109 - QPDFWriter does not preserve object numbers, though --json-output  
110 - does. Therefore, if you want to update a PDF with a JSON, the input  
111 - to --update-from-json must be the same PDF as the one that  
112 - --json-output was run on previously. Otherwise, object numbers won't  
113 - match. Show this with an example. When updating,  
114 -  
115 -* Certain fields are ignored when reading the JSON. This includes  
116 - maxobjectid, any computed fields in trailer (such as /Size), and all  
117 - /Length keys in stream dictionaries. There is no need for the user  
118 - to correct, remove, or otherwise worry about any values those keys  
119 - might have. The maxobjectid field is present in the original output  
120 - to assist with adding new objects to the file.  
121 -  
122 -* JSON strings within PDF objects:  
123 -  
124 - * "n n R" is an indirect object  
125 -  
126 - * "/Name" is a name in canonical form with a leading slash (like  
127 - "/text/plain"), not PDF syntax (like "/text#2fplain").  
128 -  
129 - * "b:hex-digits" is a binary string ("b:feff03c0"). Hex digits may be  
130 - mixed case. There must be an even number of digits.  
131 -  
132 - * "u:utf-8" is a UTF-8 encoded string ("u:π", "u:\u03c0"). UTF-16  
133 - surrogate pairs are allowed. These are all equivalent: "u:🥔",  
134 - "u:\ud83e\udd54", "b:FEFFD83EDD54", "b:efbbbff09fa594".  
135 -  
136 - * Both "b:" and "u:" are valid representations of the empty string.  
137 -  
138 - * Anything else is an error  
139 -  
140 -* Document use of --json-input and --json-output together to show  
141 - preservation of object numbers. Draw attention to "original object  
142 - ID" comments in qdf as another way to show it.  
143 -  
144 -* Document top-level keys of "qpdf-v2" ("pdfversion", "objects",  
145 - "maxobjectid") noting that "maxobjectid" is ignored when reading.  
146 -  
147 -* Stream data: "data" is base64-encoded stream data. "datafile" is the  
148 - path to a file (relative path recommended but not required)  
149 - containing the binary data. As with any PDF representation, the data  
150 - must be consistent with the filters. --decode-level is honored by  
151 - --json-output.  
152 -  
153 -* Other changes from v1:  
154 -  
155 - * in "objects", keys are "obj:o g R" or "trailer"  
156 -  
157 - * Non-stream objects are dictionaries with a "value" key whose value  
158 - is the object. Stream objects are dictionaries with a "stream" key  
159 - whose value is {"dict": stream-dictionary}. The "/Length" key is  
160 - omitted from the stream dictionary.  
161 -  
162 - * "objectinfo" is gone as it is now possible to tell a stream from a  
163 - non-stream directly. To get stream data, use the --json-output  
164 - option. Note about how "pages" may cause the pages tree to be  
165 - corrected.  
166 -  
167 -For non-streams:  
168 -  
169 - "obj:o g R": {  
170 - "value": ...  
171 - }  
172 -  
173 -For streams:  
174 -  
175 - "obj:o g R": {  
176 - "stream": {  
177 - "dict": { ... stream dictionary ... },  
178 - "data": "base64-encoded data",  
179 - "datafile": "path to base64-encoded data"  
180 - }  
181 - }  
182 -  
183 -Rationale of "obj:o g R" is that indirect object references are just  
184 -"o g R", and so code that wants to resolve one can do so easily by  
185 -just prepending "obj:" and not having to parse or split the string.  
186 -Having a prefix rather than making the key just "o g R" makes it much  
187 -easier to search in the JSON for the definition of an object.  
188 -  
189 -CLI:  
190 -  
191 -Example workflow:  
192 -* qpdf in.pdf --json-output pdf.json  
193 -* edit pdf.json  
194 -* qpdf --json-input pdf.json out.pdf  
195 -  
196 -* qpdf in.pdf --json-output pdf.json  
197 -* edit pdf.json keeping only objects that need to be changed  
198 -* qpdf in.pdf --update-from-json=pdf.json out.pdf  
199 -  
200 -To modify a single object:  
201 -  
202 -* qpdf in.pdf --json-output pdf.json --json-object=o,g  
203 -* edit pdf.json  
204 -* qpdf in.pdf --update-from-json=pdf.json out.pdf  
205 -  
206 -Historical note: you can't create a PDF from v1 json because  
207 -  
208 -* The PDF version header is not recorded  
209 -  
210 -* Strings cannot be unambiguously encoded/decoded  
211 -  
212 - * Can't tell string from name from indirect object  
213 -  
214 - * Strings are treated as PDF doc encoding and output as UTF-8, which  
215 - doesn't work since multiple PDF doc code points are undefined and  
216 - is absurd for binary strings  
217 -  
218 -* There is no representation of stream data  
219 -  
220 -* You can't tell a stream from a dictionary except by looking in both  
221 - "object" and "objectinfo".  
222 -  
223 -* Using "n n R" as a key in "objects" and "objectinfo" makes it hard  
224 - to search for things when viewing the JSON file in an editor.  
225 -  
226 -  
227 QPDFPagesTree 56 QPDFPagesTree
228 ============= 57 =============
229 58
@@ -256,6 +85,28 @@ sure /Count and /Parent are correct. @@ -256,6 +85,28 @@ sure /Count and /Parent are correct.
256 refs/attic/QPDFPagesTree-old -- original, abandoned branch -- clean up 85 refs/attic/QPDFPagesTree-old -- original, abandoned branch -- clean up
257 when done. 86 when done.
258 87
  88 +Possible future JSON enhancements
  89 +=================================
  90 +
  91 +* Add to JSON output the information available from a few additional
  92 + informational options:
  93 +
  94 + * --check: add but maybe not by default?
  95 +
  96 + * --show-linearization: add but maybe not by default? Also figure
  97 + out whether warnings reported for some of the PDF specs (1.7) are
  98 + qpdf problems. This may not be worth adding in the first
  99 + increment.
  100 +
  101 + * --show-xref: add
  102 +
  103 +* Consider having --check, --show-encryption, etc., just select the
  104 + right keys when in json mode. I don't think I want check on by
  105 + default, so that might be different.
  106 +
  107 +* Consider having warnings be included in the json in a "warnings" key
  108 + in json mode.
  109 +
259 QPDFJob 110 QPDFJob
260 ======= 111 =======
261 112
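The notes removed from the TODO file in this hunk describe how JSON strings inside PDF objects are interpreted in qpdf JSON v2 and why object keys carry an "obj:" prefix. The following is a minimal sketch of those rules, not qpdf's implementation: `JsonStringKind`, `classify`, and `object_key` are hypothetical names introduced only for illustration.

```cpp
#include <cctype>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical categories (not part of the qpdf API) for JSON strings that
// appear inside PDF objects, following the notes in the removed TODO text.
enum class JsonStringKind { IndirectRef, Name, Binary, Unicode, Invalid };

// Classify a JSON string: "n n R" is an indirect object, "/Name" is a name,
// "b:hex-digits" is a binary string, "u:utf-8" is a UTF-8 string, and
// anything else is an error.
JsonStringKind classify(std::string const& s)
{
    static std::regex const ref_re("^[0-9]+ [0-9]+ R$");
    if (std::regex_match(s, ref_re)) {
        return JsonStringKind::IndirectRef;
    }
    if (!s.empty() && s[0] == '/') {
        return JsonStringKind::Name;
    }
    if (s.compare(0, 2, "u:") == 0) {
        return JsonStringKind::Unicode;
    }
    if (s.compare(0, 2, "b:") == 0) {
        // Hex digits may be mixed case, and there must be an even number.
        if ((s.size() - 2) % 2 != 0) {
            return JsonStringKind::Invalid;
        }
        for (std::size_t i = 2; i < s.size(); ++i) {
            if (!std::isxdigit(static_cast<unsigned char>(s[i]))) {
                return JsonStringKind::Invalid;
            }
        }
        return JsonStringKind::Binary;
    }
    return JsonStringKind::Invalid;
}

// Resolve an indirect reference such as "12 0 R" to its key in "objects"
// by prepending "obj:", without parsing or splitting the string.
std::string object_key(std::string const& ref)
{
    return "obj:" + ref;
}
```

Under these assumptions, `classify("b:")` and `classify("u:")` both accept the empty string, matching the note that either prefix is a valid representation of it.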
cSpell.json
@@ -271,6 +271,7 @@ @@ -271,6 +271,7 @@
271 "mkinstalldirs", 271 "mkinstalldirs",
272 "mklink", 272 "mklink",
273 "moddate", 273 "moddate",
  274 + "modifyannotations",
274 "monoseq", 275 "monoseq",
275 "msvc", 276 "msvc",
276 "msvcrt", 277 "msvcrt",
include/qpdf/QPDF.hh
@@ -112,8 +112,11 @@ class QPDF @@ -112,8 +112,11 @@ class QPDF
112 112
113 // Create a PDF from an input source that contains JSON as written 113 // Create a PDF from an input source that contains JSON as written
114 // by writeJSON (or qpdf --json-output, version 2 or higher). The 114 // by writeJSON (or qpdf --json-output, version 2 or higher). The
115 - // JSON must be a complete representation of a PDF. See "QPDF JSON  
116 - // Format" in the manual for details. 115 + // JSON must be a complete representation of a PDF. See "qpdf
  116 + // JSON" in the manual for details. The input JSON may be
  117 + // arbitrarily large. QPDF does not load stream data into memory
  118 + // for more than one stream at a time, even if the stream data is
  119 + // specified inline.
117 QPDF_DLL 120 QPDF_DLL
118 void createFromJSON(std::string const& json_file); 121 void createFromJSON(std::string const& json_file);
119 QPDF_DLL 122 QPDF_DLL
@@ -122,24 +125,40 @@ class QPDF @@ -122,24 +125,40 @@ class QPDF
122 // Update a PDF from an input source that contains JSON in the 125 // Update a PDF from an input source that contains JSON in the
123 // same format as is written by writeJSON (or qpdf --json-output, 126 // same format as is written by writeJSON (or qpdf --json-output,
124 // version 2 or higher). Objects in the PDF and not in the JSON 127 // version 2 or higher). Objects in the PDF and not in the JSON
125 - // are not modified. See "QPDF JSON Format" in the manual for  
126 - // details. 128 + // are not modified. See "qpdf JSON" in the manual for details. As
  129 + // with createFromJSON, the input JSON may be arbitrarily large.
127 QPDF_DLL 130 QPDF_DLL
128 void updateFromJSON(std::string const& json_file); 131 void updateFromJSON(std::string const& json_file);
129 QPDF_DLL 132 QPDF_DLL
130 void updateFromJSON(std::shared_ptr<InputSource>); 133 void updateFromJSON(std::shared_ptr<InputSource>);
131 134
132 - // Write qpdf json format. The only supported version is 2. If  
133 - // wanted_objects is empty, write all objects. Otherwise, write  
134 - // only objects whose keys are in wanted_objects. Keys may be  
135 - // either "trailer" or of the form "obj:n n R". Invalid keys are  
136 - // ignored. 135 + // Write qpdf json format to the pipeline "p". The only supported
  136 + // version is 2. The finish() method is called on the pipeline at
  137 + // the end. The decode_level parameter controls which streams are
  138 + // uncompressed in the JSON. Use qpdf_dl_none to preserve all
  139 + // stream data exactly as it appears in the input. The possible
  140 + // values for json_stream_data can be found in qpdf/Constants.h
  141 + // and correspond to the --json-stream-data command-line argument.
  142 + // If json_stream_data is qpdf_sj_file, file_prefix must be
  143 + // specified. Each stream will be written to a file whose path is
  144 + // constructed by appending "-nnn" to file_prefix, where "nnn" is
  145 + // the object number (not zero-filled). If wanted_objects is
  146 + // empty, write all objects. Otherwise, write only objects whose
  147 + // keys are in wanted_objects. Keys may be either "trailer" or of
  148 + // the form "obj:n n R". Invalid keys are ignored. This
  149 + // corresponds to the --json-object command-line argument.
  150 + //
  151 + // QPDF is efficient with regard to memory when writing, allowing
  152 + // you to write arbitrarily large PDF files to a pipeline. You can
  153 + // use a pipeline like Pl_Buffer or Pl_String to capture the JSON
  154 + // output in memory, but do so with caution as this will allocate
  155 + // enough memory to hold the entire PDF file.
137 QPDF_DLL 156 QPDF_DLL
138 void writeJSON( 157 void writeJSON(
139 int version, 158 int version,
140 - Pipeline*,  
141 - qpdf_stream_decode_level_e,  
142 - qpdf_json_stream_data_e, 159 + Pipeline* p,
  160 + qpdf_stream_decode_level_e decode_level,
  161 + qpdf_json_stream_data_e json_stream_data,
143 std::string const& file_prefix, 162 std::string const& file_prefix,
144 std::set<std::string> wanted_objects); 163 std::set<std::string> wanted_objects);
145 164
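The new writeJSON comment describes two naming conventions: stream data files are named by appending "-nnn" (the object number, not zero-filled) to file_prefix, and wanted_objects keys are either "trailer" or of the form "obj:n n R". As a rough sketch of those two conventions only, the helpers below build such strings; `stream_data_path` and `wanted_object_key` are hypothetical names and are not part of the qpdf library.

```cpp
#include <string>

// Hypothetical helper (not a qpdf function): the path described for a
// stream's data file when json_stream_data is qpdf_sj_file.
std::string stream_data_path(std::string const& file_prefix, int object_number)
{
    // std::to_string does not zero-fill, which matches the "-nnn" description.
    return file_prefix + "-" + std::to_string(object_number);
}

// Hypothetical helper (not a qpdf function): a key of the form "obj:n n R"
// suitable for the wanted_objects set.
std::string wanted_object_key(int object_number, int generation)
{
    return "obj:" + std::to_string(object_number) + " " +
        std::to_string(generation) + " R";
}
```

For example, a prefix of "out.json" and object 7 would give "out.json-7", and object 7 generation 0 would give the key "obj:7 0 R".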
job.sums
@@ -8,10 +8,10 @@ include/qpdf/auto_job_c_pages.hh b3cc0f21029f6d89efa043dcdbfa183cb59325b6506001c @@ -8,10 +8,10 @@ include/qpdf/auto_job_c_pages.hh b3cc0f21029f6d89efa043dcdbfa183cb59325b6506001c
8 include/qpdf/auto_job_c_uo.hh ae21b69a1efa9333050f4833d465f6daff87e5b38e5106e49bbef5d4132e4ed1 8 include/qpdf/auto_job_c_uo.hh ae21b69a1efa9333050f4833d465f6daff87e5b38e5106e49bbef5d4132e4ed1
9 job.yml 3b2b3c6f92b48f6c76109711cbfdd74669fa31a80cd17379548b09f8e76be05d 9 job.yml 3b2b3c6f92b48f6c76109711cbfdd74669fa31a80cd17379548b09f8e76be05d
10 libqpdf/qpdf/auto_job_decl.hh 74df4d7fdbdf51ecd0d58ce1e9844bb5525b9adac5a45f7c9a787ecdda2868df 10 libqpdf/qpdf/auto_job_decl.hh 74df4d7fdbdf51ecd0d58ce1e9844bb5525b9adac5a45f7c9a787ecdda2868df
11 -libqpdf/qpdf/auto_job_help.hh c1cc99f6fe17285ee5e40730f6280e37d17da1a5f408086ce34e01af121df7ad 11 +libqpdf/qpdf/auto_job_help.hh 3aaae4cde004e5314d3ac6d554da575e40209c0f0611f6a308957986f9c7967b
12 libqpdf/qpdf/auto_job_init.hh 7ea8e0641dc26fdfba6e283e14dbbff0c016654e174cdace8054f8bef53750fd 12 libqpdf/qpdf/auto_job_init.hh 7ea8e0641dc26fdfba6e283e14dbbff0c016654e174cdace8054f8bef53750fd
13 libqpdf/qpdf/auto_job_json_decl.hh 06caa46eaf71db8a50c046f91866baa8087745a9474319fb7c86d92634cc8297 13 libqpdf/qpdf/auto_job_json_decl.hh 06caa46eaf71db8a50c046f91866baa8087745a9474319fb7c86d92634cc8297
14 libqpdf/qpdf/auto_job_json_init.hh 5f6b53e3c81d4b54ce5c4cf9c3f52d0c02f987c53bf8841c0280367bad23e335 14 libqpdf/qpdf/auto_job_json_init.hh 5f6b53e3c81d4b54ce5c4cf9c3f52d0c02f987c53bf8841c0280367bad23e335
15 libqpdf/qpdf/auto_job_schema.hh 9d543cd4a43eafffc2c4b8a6fee29e399c271c52cb6f7d417ae5497b3c1127dc 15 libqpdf/qpdf/auto_job_schema.hh 9d543cd4a43eafffc2c4b8a6fee29e399c271c52cb6f7d417ae5497b3c1127dc
16 manual/_ext/qpdf.py 6add6321666031d55ed4aedf7c00e5662bba856dfcd66ccb526563bffefbb580 16 manual/_ext/qpdf.py 6add6321666031d55ed4aedf7c00e5662bba856dfcd66ccb526563bffefbb580
17 -manual/cli.rst 82ead389c03bbf5e0498bd0571a11dc06544d591f4e4454c00322e3473fc556d 17 +manual/cli.rst e3f4331befa17450e0d0fff87569722a5aab42ea619ef64f0a3a04e1f99ed65c
libqpdf/QPDF_json.cc
@@ -817,4 +817,5 @@ QPDF::writeJSON( @@ -817,4 +817,5 @@ QPDF::writeJSON(
817 JSON::writeDictionaryClose(p, first_qpdf, 1); 817 JSON::writeDictionaryClose(p, first_qpdf, 1);
818 JSON::writeDictionaryClose(p, first, 0); 818 JSON::writeDictionaryClose(p, first, 0);
819 *p << "\n"; 819 *p << "\n";
  820 + p->finish();
820 } 821 }
libqpdf/qpdf/auto_job_help.hh
@@ -70,6 +70,9 @@ ap.addOptionHelp("--copyright", "help", "show copyright information", R"(Display @@ -70,6 +70,9 @@ ap.addOptionHelp("--copyright", "help", "show copyright information", R"(Display
70 ap.addOptionHelp("--show-crypto", "help", "show available crypto providers", R"(Show a list of available crypto providers, one per line. The 70 ap.addOptionHelp("--show-crypto", "help", "show available crypto providers", R"(Show a list of available crypto providers, one per line. The
71 default provider is shown first. 71 default provider is shown first.
72 )"); 72 )");
  73 +ap.addOptionHelp("--job-json-help", "help", "show format of job JSON", R"(Describe the format of the QPDFJob JSON input used by
  74 +--job-json-file.
  75 +)");
73 ap.addHelpTopic("general", "general options", R"(General options control qpdf's behavior in ways that are not 76 ap.addHelpTopic("general", "general options", R"(General options control qpdf's behavior in ways that are not
74 directly related to the operation it is performing. 77 directly related to the operation it is performing.
75 )"); 78 )");
@@ -87,11 +90,11 @@ ap.addOptionHelp("--verbose", "general", "print additional information", R"(Outp @@ -87,11 +90,11 @@ ap.addOptionHelp("--verbose", "general", "print additional information", R"(Outp
87 doing, including information about files created and operations 90 doing, including information about files created and operations
88 performed. 91 performed.
89 )"); 92 )");
90 -ap.addOptionHelp("--progress", "general", "show progress when writing", R"(Indicate progress when writing files.  
91 -)");  
92 } 93 }
93 static void add_help_2(QPDFArgParser& ap) 94 static void add_help_2(QPDFArgParser& ap)
94 { 95 {
  96 +ap.addOptionHelp("--progress", "general", "show progress when writing", R"(Indicate progress when writing files.
  97 +)");
95 ap.addOptionHelp("--no-warn", "general", "suppress printing of warning messages", R"(Suppress printing of warning messages. If warnings were 98 ap.addOptionHelp("--no-warn", "general", "suppress printing of warning messages", R"(Suppress printing of warning messages. If warnings were
96 encountered, qpdf still exits with exit status 3. 99 encountered, qpdf still exits with exit status 3.
97 Use --warning-exit-0 with --no-warn to completely ignore 100 Use --warning-exit-0 with --no-warn to completely ignore
@@ -172,12 +175,12 @@ companion tool "fix-qdf" can be used to repair hand-edited QDF @@ -172,12 +175,12 @@ companion tool "fix-qdf" can be used to repair hand-edited QDF
172 files. QDF is a feature specific to the qpdf tool. Please see 175 files. QDF is a feature specific to the qpdf tool. Please see
173 the "QDF Mode" chapter in the manual. 176 the "QDF Mode" chapter in the manual.
174 )"); 177 )");
175 -ap.addOptionHelp("--no-original-object-ids", "transformation", "omit original object IDs in qdf", R"(Omit comments in a QDF file indicating the object ID an object  
176 -had in the original file.  
177 -)");  
178 } 178 }
179 static void add_help_3(QPDFArgParser& ap) 179 static void add_help_3(QPDFArgParser& ap)
180 { 180 {
  181 +ap.addOptionHelp("--no-original-object-ids", "transformation", "omit original object IDs in qdf", R"(Omit comments in a QDF file indicating the object ID an object
  182 +had in the original file.
  183 +)");
181 ap.addOptionHelp("--compress-streams", "transformation", "compress uncompressed streams", R"(--compress-streams=[y|n] 184 ap.addOptionHelp("--compress-streams", "transformation", "compress uncompressed streams", R"(--compress-streams=[y|n]
182 185
183 Setting --compress-streams=n prevents qpdf from compressing 186 Setting --compress-streams=n prevents qpdf from compressing
@@ -188,9 +191,11 @@ ap.addOptionHelp("--decode-level", "transformation", "control which streams to u @@ -188,9 +191,11 @@ ap.addOptionHelp("--decode-level", "transformation", "control which streams to u
188 191
189 When uncompressing streams, control which types of compression 192 When uncompressing streams, control which types of compression
190 schemes should be uncompressed: 193 schemes should be uncompressed:
191 -- none: don't uncompress anything. This is the default with --json-output. 194 +- none: don't uncompress anything. This is the default with
  195 + --json-output.
192 - generalized: uncompress streams compressed with a 196 - generalized: uncompress streams compressed with a
193 - general-purpose compression algorithm. This is the default. 197 + general-purpose compression algorithm. This is the default
  198 + except when --json-output is given.
194 - specialized: in addition to generalized, also uncompress 199 - specialized: in addition to generalized, also uncompress
195 streams compressed with a special-purpose but non-lossy 200 streams compressed with a special-purpose but non-lossy
196 compression scheme 201 compression scheme
@@ -290,13 +295,13 @@ from the resulting set, not based on the original page numbers. @@ -290,13 +295,13 @@ from the resulting set, not based on the original page numbers.
290 ap.addHelpTopic("modification", "change parts of the PDF", R"(Modification options make systematic changes to certain parts of 295 ap.addHelpTopic("modification", "change parts of the PDF", R"(Modification options make systematic changes to certain parts of
291 the PDF, causing the PDF to render differently from the original. 296 the PDF, causing the PDF to render differently from the original.
292 )"); 297 )");
  298 +}
  299 +static void add_help_4(QPDFArgParser& ap)
  300 +{
293 ap.addOptionHelp("--pages", "modification", "begin page selection", R"(--pages file [--password=password] [page-range] [...] -- 301 ap.addOptionHelp("--pages", "modification", "begin page selection", R"(--pages file [--password=password] [page-range] [...] --
294 302
295 Run qpdf --help=page-selection for details. 303 Run qpdf --help=page-selection for details.
296 )"); 304 )");
297 -}  
298 -static void add_help_4(QPDFArgParser& ap)  
299 -{  
300 ap.addOptionHelp("--collate", "modification", "collate with --pages", R"(--collate[=n] 305 ap.addOptionHelp("--collate", "modification", "collate with --pages", R"(--collate[=n]
301 306
302 Collate rather than concatenate pages specified with --pages. 307 Collate rather than concatenate pages specified with --pages.
@@ -460,14 +465,14 @@ ap.addOptionHelp("--assemble", "encryption", "restrict document assembly", R"(-- @@ -460,14 +465,14 @@ ap.addOptionHelp("--assemble", "encryption", "restrict document assembly", R"(--
460 Enable/disable document assembly (rotation and reordering of 465 Enable/disable document assembly (rotation and reordering of
461 pages). This option is not available with 40-bit encryption. 466 pages). This option is not available with 40-bit encryption.
462 )"); 467 )");
  468 +}
  469 +static void add_help_5(QPDFArgParser& ap)
  470 +{
463 ap.addOptionHelp("--extract", "encryption", "restrict text/graphic extraction", R"(--extract=[y|n] 471 ap.addOptionHelp("--extract", "encryption", "restrict text/graphic extraction", R"(--extract=[y|n]
464 472
465 Enable/disable text/graphic extraction for purposes other than 473 Enable/disable text/graphic extraction for purposes other than
466 accessibility. 474 accessibility.
467 )"); 475 )");
468 -}  
469 -static void add_help_5(QPDFArgParser& ap)  
470 -{  
471 ap.addOptionHelp("--form", "encryption", "restrict form filling", R"(--form=[y|n] 476 ap.addOptionHelp("--form", "encryption", "restrict form filling", R"(--form=[y|n]
472 477
473 Enable/disable whether filling form fields is allowed even if 478 Enable/disable whether filling form fields is allowed even if
@@ -638,6 +643,9 @@ ap.addOptionHelp("--remove-attachment", "attachments", "remove an embedded file" @@ -638,6 +643,9 @@ ap.addOptionHelp("--remove-attachment", "attachments", "remove an embedded file"
638 Remove an embedded file using its key. Get the key with 643 Remove an embedded file using its key. Get the key with
639 --list-attachments. 644 --list-attachments.
640 )"); 645 )");
  646 +}
  647 +static void add_help_6(QPDFArgParser& ap)
  648 +{
641 ap.addHelpTopic("pdf-dates", "PDF date format", R"(When a date is required, the date should conform to the PDF date 649 ap.addHelpTopic("pdf-dates", "PDF date format", R"(When a date is required, the date should conform to the PDF date
642 format specification, which is "D:yyyymmddhhmmssz" where "z" is 650 format specification, which is "D:yyyymmddhhmmssz" where "z" is
643 either literally upper case "Z" for UTC or a timezone offset in 651 either literally upper case "Z" for UTC or a timezone offset in
@@ -650,9 +658,6 @@ Examples: @@ -650,9 +658,6 @@ Examples:
650 - D:20210207161528-05'00' February 7, 2021 at 4:15:28 p.m. 658 - D:20210207161528-05'00' February 7, 2021 at 4:15:28 p.m.
651 - D:20210207211528Z February 7, 2021 at 21:15:28 UTC 659 - D:20210207211528Z February 7, 2021 at 21:15:28 UTC
652 )"); 660 )");
653 -}  
654 -static void add_help_6(QPDFArgParser& ap)  
655 -{  
656 ap.addHelpTopic("add-attachment", "attach (embed) files", R"(The options listed below appear between --add-attachment and its 661 ap.addHelpTopic("add-attachment", "attach (embed) files", R"(The options listed below appear between --add-attachment and its
657 terminating "--". 662 terminating "--".
658 )"); 663 )");
@@ -747,14 +752,14 @@ the linearization hint tables are correct. @@ -747,14 +752,14 @@ the linearization hint tables are correct.
747 )"); 752 )");
748 ap.addOptionHelp("--show-linearization", "inspection", "show linearization hint tables", R"(Check and display all data in the linearization hint tables. 753 ap.addOptionHelp("--show-linearization", "inspection", "show linearization hint tables", R"(Check and display all data in the linearization hint tables.
749 )"); 754 )");
  755 +}
  756 +static void add_help_7(QPDFArgParser& ap)
  757 +{
750 ap.addOptionHelp("--show-xref", "inspection", "show cross reference data", R"(Show the contents of the cross-reference table or stream (object 758 ap.addOptionHelp("--show-xref", "inspection", "show cross reference data", R"(Show the contents of the cross-reference table or stream (object
751 locations in the file) in a human-readable form. This is 759 locations in the file) in a human-readable form. This is
752 especially useful for files with cross-reference streams, which 760 especially useful for files with cross-reference streams, which
753 are stored in a binary format. 761 are stored in a binary format.
754 )"); 762 )");
755 -}  
756 -static void add_help_7(QPDFArgParser& ap)  
757 -{  
758 ap.addOptionHelp("--show-object", "inspection", "show contents of an object", R"(--show-object={trailer|obj[,gen]} 763 ap.addOptionHelp("--show-object", "inspection", "show contents of an object", R"(--show-object={trailer|obj[,gen]}
759 764
760 Show the contents of the given object. This is especially useful 765 Show the contents of the given object. This is especially useful
@@ -814,21 +819,20 @@ This option is repeatable. If given, only specified objects will @@ -814,21 +819,20 @@ This option is repeatable. If given, only specified objects will
814 be shown in the "objects" key of the JSON output. Otherwise, all 819 be shown in the "objects" key of the JSON output. Otherwise, all
815 objects will be shown. 820 objects will be shown.
816 )"); 821 )");
817 -ap.addOptionHelp("--job-json-help", "json", "show format of job JSON", R"(Describe the format of the QPDFJob JSON input used by  
818 ---job-json-file.  
819 -)");  
820 ap.addOptionHelp("--json-stream-data", "json", "how to handle streams in json output", R"(--json-stream-data={none|inline|file} 822 ap.addOptionHelp("--json-stream-data", "json", "how to handle streams in json output", R"(--json-stream-data={none|inline|file}
821 823
822 -Control whether streams in json output should be omitted,  
823 -written inline (base64-encoded) or written to a file. If "file"  
824 -is chosen, the file will be the name of the input file appended  
825 -with -nnn where nnn is the object number. The prefix can be  
826 -overridden with --json-stream-prefix. 824 +When used with --json-output, this option controls whether
  825 +streams in json output should be omitted, written inline
  826 +(base64-encoded) or written to a file. If "file" is chosen, the
  827 +file will be the name of the output file appended with -nnn where
  828 +nnn is the object number. The prefix can be overridden with
  829 +--json-stream-prefix.
827 )"); 830 )");
828 ap.addOptionHelp("--json-stream-prefix", "json", "prefix for json stream data files", R"(--json-stream-prefix=file-prefix 831 ap.addOptionHelp("--json-stream-prefix", "json", "prefix for json stream data files", R"(--json-stream-prefix=file-prefix
829 832
830 -When --json-stream-data=file is given, override the input file  
831 -name as the prefix for stream data files. Whatever is given here 833 +When used with --json-output, --json-stream-data=file-prefix
  834 +sets the prefix for stream data files, overriding the default,
  835 +which is to use the output file name. Whatever is given here
832 will be appended with -nnn to create the name of the file that 836 will be appended with -nnn to create the name of the file that
833 will contain the data for the stream stream in object nnn. 837 will contain the data for the stream stream in object nnn.
834 )"); 838 )");
@@ -836,19 +840,19 @@ ap.addOptionHelp("--json-output", "json", "serialize to JSON", R"(--json-output[ @@ -836,19 +840,19 @@ ap.addOptionHelp("--json-output", "json", "serialize to JSON", R"(--json-output[
836 840
837 The output file will be qpdf JSON format at the given version. 841 The output file will be qpdf JSON format at the given version.
838 "version" may be a specific version or "latest" (the default). 842 "version" may be a specific version or "latest" (the default).
839 -Version 1 is not supported. See also --json-stream-data, 843 +The only supported version is 2. See also --json-stream-data,
840 --json-stream-prefix, and --decode-level. 844 --json-stream-prefix, and --decode-level.
841 )"); 845 )");
842 ap.addOptionHelp("--json-input", "json", "input file is qpdf JSON", R"(Treat the input file as a JSON file in qpdf JSON format as 846 ap.addOptionHelp("--json-input", "json", "input file is qpdf JSON", R"(Treat the input file as a JSON file in qpdf JSON format as
843 -written by qpdf --json-output. See the "QPDF JSON Format" 847 +written by qpdf --json-output. See the "qpdf JSON Format"
844 section of the manual for information about how to use this 848 section of the manual for information about how to use this
845 option. 849 option.
846 )"); 850 )");
847 ap.addOptionHelp("--update-from-json", "json", "update a PDF from qpdf JSON", R"(--update-from-json=qpdf-json-file 851 ap.addOptionHelp("--update-from-json", "json", "update a PDF from qpdf JSON", R"(--update-from-json=qpdf-json-file
848 852
849 -Update a PDF file from a JSON file. Please see the "QPDF JSON  
850 -Format" section of the manual for information about how to use  
851 -this option. 853 +Update a PDF file from a JSON file. Please see the "qpdf JSON"
  854 +chapter of the manual for information about how to use this
  855 +option.
852 )"); 856 )");
853 } 857 }
854 static void add_help_8(QPDFArgParser& ap) 858 static void add_help_8(QPDFArgParser& ap)
manual/cli.rst
@@ -171,7 +171,9 @@ Related Options @@ -171,7 +171,9 @@ Related Options
171 equivalent command-line arguments were supplied. It can be repeated 171 equivalent command-line arguments were supplied. It can be repeated
172 and mixed freely with other options. Run ``qpdf`` with 172 and mixed freely with other options. Run ``qpdf`` with
173 :qpdf:ref:`--job-json-help` for a description of the job JSON input 173 :qpdf:ref:`--job-json-help` for a description of the job JSON input
174 - file format. For more information, see :ref:`qpdf-job`. 174 + file format. For more information, see :ref:`qpdf-job`. Note that
  175 + this is unrelated to :qpdf:ref:`--json` but may be combined with
  176 + it.
175 177
176 .. _exit-status: 178 .. _exit-status:
177 179
@@ -341,6 +343,17 @@ Related Options @@ -341,6 +343,17 @@ Related Options
341 itself. The default provider is always listed first. See 343 itself. The default provider is always listed first. See
342 :ref:`crypto` for more information about crypto providers. 344 :ref:`crypto` for more information about crypto providers.
343 345
  346 +.. qpdf:option:: --job-json-help
  347 +
  348 + .. help: show format of job JSON
  349 +
  350 + Describe the format of the QPDFJob JSON input used by
  351 + --job-json-file.
  352 +
  353 + Describe the format of the QPDFJob JSON input used by
  354 + :qpdf:ref:`--job-json-file`. For more information about QPDFJob,
  355 + see :ref:`qpdf-job`.
  356 +
344 .. _general-options: 357 .. _general-options:
345 358
346 General Options 359 General Options
@@ -852,9 +865,11 @@ Related Options @@ -852,9 +865,11 @@ Related Options
852 865
853 When uncompressing streams, control which types of compression 866 When uncompressing streams, control which types of compression
854 schemes should be uncompressed: 867 schemes should be uncompressed:
855 - - none: don't uncompress anything. This is the default with --json-output. 868 + - none: don't uncompress anything. This is the default with
  869 + --json-output.
856 - generalized: uncompress streams compressed with a 870 - generalized: uncompress streams compressed with a
857 - general-purpose compression algorithm. This is the default. 871 + general-purpose compression algorithm. This is the default
  872 + except when --json-output is given.
858 - specialized: in addition to generalized, also uncompress 873 - specialized: in addition to generalized, also uncompress
859 streams compressed with a special-purpose but non-lossy 874 streams compressed with a special-purpose but non-lossy
860 compression scheme 875 compression scheme
@@ -875,7 +890,8 @@ Related Options @@ -875,7 +890,8 @@ Related Options
875 ``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define 890 ``/ASCII85Decode``, and ``/ASCIIHexDecode``. We define
876 generalized filters as those to be used for general-purpose 891 generalized filters as those to be used for general-purpose
877 compression or encoding, as opposed to filters specifically 892 compression or encoding, as opposed to filters specifically
878 - designed for image data. This is the default. 893 + designed for image data. This is the default except when
  894 + :qpdf:ref:`--json-output` is given.
879 895
880 - :samp:`specialized`: in addition to generalized, decode streams 896 - :samp:`specialized`: in addition to generalized, decode streams
881 with supported non-lossy specialized filters; currently this is 897 with supported non-lossy specialized filters; currently this is
@@ -3126,8 +3142,9 @@ Related Options @@ -3126,8 +3142,9 @@ Related Options
3126 is usually but not always equal to the file name and is needed by 3142 is usually but not always equal to the file name and is needed by
3127 some of the other options. See also :ref:`attachments`. Note that 3143 some of the other options. See also :ref:`attachments`. Note that
3128 this option displays dates in PDF timestamp syntax. When attachment 3144 this option displays dates in PDF timestamp syntax. When attachment
3129 - information is included in json output (see :ref:`--json`), dates  
3130 - are shown in ISO-8601 format. 3145 + information is included in json output in the ``"attachments"`` key
  3146 + (see :ref:`--json`), dates are shown (just within that object) in
  3147 + ISO-8601 format.
3131 3148
3132 .. qpdf:option:: --show-attachment=key 3149 .. qpdf:option:: --show-attachment=key
3133 3150
@@ -3169,14 +3186,11 @@ Related Options @@ -3169,14 +3186,11 @@ Related Options
3169 3186
3170 Generate a JSON representation of the file. This is described in 3187 Generate a JSON representation of the file. This is described in
3171 depth in :ref:`json`. The version parameter can be used to specify 3188 depth in :ref:`json`. The version parameter can be used to specify
3172 - which version of the qpdf JSON format should be output. The only  
3173 - supported value is ``1``, but it's possible that a new JSON output  
3174 - version will be added in a future version. You can also specify  
3175 - ``latest`` to use the latest JSON version. For backward  
3176 - compatibility, the default value will remain ``1`` until qpdf  
3177 - version 11, after which point it will become ``latest``. In all  
3178 - case, you can tell what version of the JSON output you have from  
3179 - the ``"version"`` key in the output. Use the 3189 + which version of the qpdf JSON format should be output. The version
  3190 + parameter may be a number or ``latest``. The default is ``latest``. As of
  3191 + qpdf 11, the latest version is ``2``. If you have code that reads
  3192 + qpdf JSON output, you can tell what version of the JSON output you
  3193 + have from the ``"version"`` key in the output. Use the
3180 :qpdf:ref:`--json-help` option to get a description of the JSON 3194 :qpdf:ref:`--json-help` option to get a description of the JSON
3181 object. 3195 object.
3182 3196
@@ -3189,11 +3203,11 @@ Related Options @@ -3189,11 +3203,11 @@ Related Options
3189 containing descriptive text. 3203 containing descriptive text.
3190 3204
3191 Describe the format of the JSON output by writing to standard 3205 Describe the format of the JSON output by writing to standard
3192 - output a JSON object with the same structure with the same keys as  
3193 - the JSON generated by qpdf. In the output written by  
3194 - ``--json-help``, each key's value is a description of the key. The  
3195 - specific contract guaranteed by qpdf in its JSON representation is  
3196 - explained in more detail in the :ref:`json`. 3206 + output a JSON object with the same structure as the JSON generated
  3207 + by qpdf. In the output written by ``--json-help``, each key's value
  3208 + is a description of the key. The specific contract guaranteed by
  3209 + qpdf in its JSON representation is explained in more detail in the
  3210 + :ref:`json`.
3197 3211
3198 .. qpdf:option:: --json-key=key 3212 .. qpdf:option:: --json-key=key
3199 3213
@@ -3216,53 +3230,50 @@ Related Options @@ -3216,53 +3230,50 @@ Related Options
3216 be shown in the "objects" key of the JSON output. Otherwise, all 3230 be shown in the "objects" key of the JSON output. Otherwise, all
3217 objects will be shown. 3231 objects will be shown.
3218 3232
3219 - This option is repeatable. If given, only specified objects will  
3220 - be shown in the "``objects``" key of the JSON output. Otherwise, all  
3221 - objects will be shown.  
3222 -  
3223 -.. qpdf:option:: --job-json-help  
3224 -  
3225 - .. help: show format of job JSON  
3226 -  
3227 - Describe the format of the QPDFJob JSON input used by  
3228 - --job-json-file.  
3229 -  
3230 - Describe the format of the QPDFJob JSON input used by  
3231 - :qpdf:ref:`--job-json-file`. For more information about QPDFJob,  
3232 - see :ref:`qpdf-job`. 3233 + This option is repeatable. If given, only specified objects will be
  3234 + shown in the ``"objects"`` key of the JSON output. Otherwise, all
  3235 + objects will be shown. For qpdf JSON version 1, this also affects
  3236 + the ``"objectinfo"`` key, which is not present in version 2. This
  3237 + option may be used with :qpdf:ref:`--json` and also with
  3238 + :qpdf:ref:`--json-output`.
3233 3239
3234 .. qpdf:option:: --json-stream-data={none|inline|file} 3240 .. qpdf:option:: --json-stream-data={none|inline|file}
3235 3241
3236 .. help: how to handle streams in json output 3242 .. help: how to handle streams in json output
3237 3243
3238 - Control whether streams in json output should be omitted,  
3239 - written inline (base64-encoded) or written to a file. If "file"  
3240 - is chosen, the file will be the name of the input file appended  
3241 - with -nnn where nnn is the object number. The prefix can be  
3242 - overridden with --json-stream-prefix.  
3243 -  
3244 - Control whether streams in json output should be omitted, written  
3245 - inline (base64-encoded) or written to a file. If ``file`` is  
3246 - chosen, the file will be the name of the input file appended with  
3247 - :samp:`-{nnn}` where :samp:`{nnn}` is the object number. The prefix  
3248 - can be overridden with :qpdf:ref:`--json-stream-prefix`. This  
3249 - option only applies when used with :qpdf:ref:`--json-output`. 3244 + When used with --json-output, this option controls whether
  3245 + streams in json output should be omitted, written inline
  3246 + (base64-encoded) or written to a file. If "file" is chosen, the
  3247 + file will be the name of the output file appended with -nnn where
  3248 + nnn is the object number. The prefix can be overridden with
  3249 + --json-stream-prefix.
  3250 +
  3251 + When used with :qpdf:ref:`--json-output`, this option controls
  3252 + whether streams in JSON output should be omitted, written inline
  3253 + (base64-encoded) or written to a file. If ``file`` is chosen, the
  3254 + file will be the name of the output file appended with
  3255 + :samp:`-{nnn}` where :samp:`{nnn}` is the object number. The stream
  3256 + data file prefix can be overridden with
  3257 + :qpdf:ref:`--json-stream-prefix`. This option only applies when
  3258 + used with :qpdf:ref:`--json-output`.
3250 3259
3251 .. qpdf:option:: --json-stream-prefix=file-prefix 3260 .. qpdf:option:: --json-stream-prefix=file-prefix
3252 3261
3253 .. help: prefix for json stream data files 3262 .. help: prefix for json stream data files
3254 3263
3255 - When --json-stream-data=file is given, override the input file  
3256 - name as the prefix for stream data files. Whatever is given here 3264 + When used with --json-output, --json-stream-prefix=file-prefix
  3265 + sets the prefix for stream data files, overriding the default,
  3266 + which is to use the output file name. Whatever is given here
3257 will be appended with -nnn to create the name of the file that 3267 will be appended with -nnn to create the name of the file that
3258 will contain the data for the stream in object nnn. 3268 will contain the data for the stream in object nnn.
3259 3269
3260 - When :qpdf:ref:`--json-stream-data` is given with the value  
3261 - ``file``, override the input file name as the prefix for stream  
3262 - data files. Whatever is given here will be appended with  
3263 - :samp:`-{nnn}` to create the name of the file that will contain the  
3264 - data for the stream stream in object :samp:`{nnn}`. This  
3265 - option only applies when used with :qpdf:ref:`--json-output`. 3270 + When used with :qpdf:ref:`--json-output`,
  3271 + ``--json-stream-prefix=file-prefix`` sets the prefix for stream data
  3272 + files, overriding the default, which is to use the output file
  3273 + name. Whatever is given here will be appended with :samp:`-{nnn}`
  3274 + to create the name of the file that will contain the data for the
  3275 + stream in object :samp:`{nnn}`. This option only applies
  3276 + when used with :qpdf:ref:`--json-output`.
3266 3277
3267 .. qpdf:option:: --json-output[=version] 3278 .. qpdf:option:: --json-output[=version]
3268 3279
@@ -3270,44 +3281,45 @@ Related Options @@ -3270,44 +3281,45 @@ Related Options
3270 3281
3271 The output file will be qpdf JSON format at the given version. 3282 The output file will be qpdf JSON format at the given version.
3272 "version" may be a specific version or "latest" (the default). 3283 "version" may be a specific version or "latest" (the default).
3273 - Version 1 is not supported. See also --json-stream-data, 3284 + The only supported version is 2. See also --json-stream-data,
3274 --json-stream-prefix, and --decode-level. 3285 --json-stream-prefix, and --decode-level.
3275 3286
3276 - The output file will be qpdf JSON format at the given version.  
3277 - ``version`` may be a specific version or ``latest`` (the default).  
3278 - Version 1 is not supported. See also :qpdf:ref:`--json-stream-data`  
3279 - and :qpdf:ref:`--json-stream-prefix`. The default decode level is  
3280 - ``none``, but you can override it with :qpdf:ref:`--decode-level`.  
3281 - If you want to look at the contents of streams easily as you would  
3282 - in QDF mode (see :ref:`qdf`), you can use  
3283 - ``--decode-level=generalized`` and ``--json-stream-data=file`` for  
3284 - a convenient way to do that. 3287 + The output file, instead of being a PDF file, will be a JSON file
  3288 + in qpdf JSON format at the given version. ``version`` may be a
  3289 + specific version or ``latest`` (the default). The only supported
  3290 + version is 2. See also :qpdf:ref:`--json-stream-data` and
  3291 + :qpdf:ref:`--json-stream-prefix`. When this option is specified,
  3292 + the default decode level for stream data is ``none``, but you can
  3293 + override it with :qpdf:ref:`--decode-level`. If you want to look at
  3294 + the contents of streams easily as you would in QDF mode (see
  3295 + :ref:`qdf`), you can use ``--decode-level=generalized`` and
  3296 + ``--json-stream-data=file`` for a convenient way to do that.
3285 3297
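The QDF-like workflow described for ``--json-output`` can be sketched as a small helper that builds the command line. This is only an illustration: the helper name ``json_output_command`` and the file names are hypothetical, and the flags are the ones named in the option text above.

```python
import shlex


def json_output_command(infile, outfile, stream_prefix=None):
    """Build a qpdf argv that writes qpdf JSON with decoded stream files.

    Hypothetical helper: it only assembles arguments and does not run qpdf.
    """
    argv = [
        "qpdf",
        infile,
        "--json-output",
        "--decode-level=generalized",   # override the default of "none"
        "--json-stream-data=file",      # one file per stream, suffixed -nnn
    ]
    if stream_prefix is not None:
        # Without this flag, the output file name is used as the prefix.
        argv.append(f"--json-stream-prefix={stream_prefix}")
    argv.append(outfile)
    return argv


if __name__ == "__main__":
    print(shlex.join(json_output_command("in.pdf", "out.json")))
```

The resulting list can be passed to ``subprocess.run`` on a system where qpdf is installed.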
3286 .. qpdf:option:: --json-input 3298 .. qpdf:option:: --json-input
3287 3299
3288 .. help: input file is qpdf JSON 3300 .. help: input file is qpdf JSON
3289 3301
3290 Treat the input file as a JSON file in qpdf JSON format as 3302 Treat the input file as a JSON file in qpdf JSON format as
3291 - written by qpdf --json-output. See the "QPDF JSON Format" 3303 + written by qpdf --json-output. See the "qpdf JSON Format"
3292 section of the manual for information about how to use this 3304 section of the manual for information about how to use this
3293 option. 3305 option.
3294 3306
3295 Treat the input file as a JSON file in qpdf JSON format as written 3307 Treat the input file as a JSON file in qpdf JSON format as written
3296 by ``qpdf --json-output``. The input file must be complete and 3308 by ``qpdf --json-output``. The input file must be complete and
3297 include all stream data. For information about converting between 3309 include all stream data. For information about converting between
3298 - PDF and JSON, please see :ref:`qpdf-json`. 3310 + PDF and JSON, please see :ref:`json`.
3299 3311
3300 .. qpdf:option:: --update-from-json=qpdf-json-file 3312 .. qpdf:option:: --update-from-json=qpdf-json-file
3301 3313
3302 .. help: update a PDF from qpdf JSON 3314 .. help: update a PDF from qpdf JSON
3303 3315
3304 - Update a PDF file from a JSON file. Please see the "QPDF JSON  
3305 - Format" section of the manual for information about how to use  
3306 - this option. 3316 + Update a PDF file from a JSON file. Please see the "qpdf JSON"
  3317 + chapter of the manual for information about how to use this
  3318 + option.
3307 3319
3308 - This option updates a PDF file from a qpdf JSON file. For a  
3309 - information about how to use this option, please see  
3310 - :ref:`qpdf-json`. 3320 + This option updates a PDF file from the specified qpdf JSON file.
  3321 + For information about how to use this option, please see
  3322 + :ref:`json`.
3311 3323
3312 .. _test-options: 3324 .. _test-options:
3313 3325
@@ -3420,7 +3432,7 @@ Related Options @@ -3420,7 +3432,7 @@ Related Options
3420 3432
3421 This is used by qpdf's test suite to check consistency between the 3433 This is used by qpdf's test suite to check consistency between the
3422 output of ``qpdf --json`` and the output of ``qpdf --json-help``. 3434 output of ``qpdf --json`` and the output of ``qpdf --json-help``.
3423 - This option causes an extra copy of the generated json to appear in 3435 + This option causes an extra copy of the generated JSON to appear in
3424 memory and is therefore unsuitable for use with large files. This 3436 memory and is therefore unsuitable for use with large files. This
3425 is why it's also not on by default. 3437 is why it's also not on by default.
3426 3438
manual/design.rst
@@ -242,7 +242,7 @@ the current file position. If the token is a not either a dictionary or @@ -242,7 +242,7 @@ the current file position. If the token is a not either a dictionary or
242 array opener, an object is immediately constructed from the single token 242 array opener, an object is immediately constructed from the single token
243 and the parser returns. Otherwise, the parser iterates in a special mode 243 and the parser returns. Otherwise, the parser iterates in a special mode
244 in which it accumulates objects until it finds a balancing closer. 244 in which it accumulates objects until it finds a balancing closer.
245 -During this process, the "``R``" keyword is recognized and an indirect 245 +During this process, the ``R`` keyword is recognized and an indirect
246 ``QPDFObjectHandle`` may be constructed. 246 ``QPDFObjectHandle`` may be constructed.
247 247
248 The ``QPDF::resolve()`` method, which is used to resolve an indirect 248 The ``QPDF::resolve()`` method, which is used to resolve an indirect
@@ -280,15 +280,15 @@ file. @@ -280,15 +280,15 @@ file.
280 it is looking before the last ``%%EOF``. After getting to ``trailer`` 280 it is looking before the last ``%%EOF``. After getting to ``trailer``
281 keyword, it invokes the parser. 281 keyword, it invokes the parser.
282 282
283 -- The parser sees "``<<``", so it calls itself recursively in 283 +- The parser sees ``<<``, so it calls itself recursively in
284 dictionary creation mode. 284 dictionary creation mode.
285 285
286 - In dictionary creation mode, the parser keeps accumulating objects 286 - In dictionary creation mode, the parser keeps accumulating objects
287 - until it encounters "``>>``". Each object that is read is pushed onto  
288 - a stack. If "``R``" is read, the last two objects on the stack are 287 + until it encounters ``>>``. Each object that is read is pushed onto
  288 + a stack. If ``R`` is read, the last two objects on the stack are
289 inspected. If they are integers, they are popped off the stack and 289 inspected. If they are integers, they are popped off the stack and
290 their values are used to construct an indirect object handle which is 290 their values are used to construct an indirect object handle which is
291 - then pushed onto the stack. When "``>>``" is finally read, the stack 291 + then pushed onto the stack. When ``>>`` is finally read, the stack
292 is converted into a ``QPDF_Dictionary`` which is placed in a 292 is converted into a ``QPDF_Dictionary`` which is placed in a
293 ``QPDFObjectHandle`` and returned. 293 ``QPDFObjectHandle`` and returned.
294 294
manual/json.rst
  1 +.. cSpell:ignore moddifyannotations
  2 +.. cSpell:ignore feff
  3 +
1 .. _json: 4 .. _json:
2 5
3 -QPDF JSON 6 +qpdf JSON
4 ========= 7 =========
5 8
6 .. _json-overview: 9 .. _json-overview:
@@ -8,27 +11,540 @@ QPDF JSON @@ -8,27 +11,540 @@ QPDF JSON
8 Overview 11 Overview
9 -------- 12 --------
10 13
11 -Beginning with qpdf version 8.3.0, the :command:`qpdf`  
12 -command-line program can produce a JSON representation of the  
13 -non-content data in a PDF file. It includes a dump in JSON format of all  
14 -objects in the PDF file excluding the content of streams. This JSON  
15 -representation makes it very easy to look in detail at the structure of  
16 -a given PDF file, and it also provides a great way to work with PDF  
17 -files programmatically from the command-line in languages that can't  
18 -call or link with the qpdf library directly. Note that stream data can  
19 -be extracted from PDF files using other qpdf command-line options. 14 +Beginning with qpdf version 11.0.0, the qpdf library and command-line
  15 +program can produce a JSON representation of the data in a PDF file. qpdf
  16 +version 11 introduces JSON format version 2. Prior to qpdf 11,
  17 +versions 8.3.0 onward had a more limited JSON representation
  18 +accessible only from the command-line. For details on what changed,
  19 +see :ref:`json-v2-changes`. The rest of this chapter documents qpdf
  20 +JSON version 2.
  21 +
  22 +Please note: this chapter discusses *qpdf JSON format*, which
  23 +represents the contents of a PDF file. This is distinct from the
  24 +*QPDFJob JSON format* which provides a higher-level interface
  25 +for interacting with qpdf the way the command-line tool does. For
  26 +information about that, see :ref:`qpdf-job`.
  27 +
  28 +The qpdf JSON format is specific to qpdf. There are two ways to use
  29 +qpdf JSON:
  30 +
  31 +- The :qpdf:ref:`--json` command-line flag causes creation of a JSON
  32 + representation of all the objects in a PDF file, excluding stream
  33 + data. This includes an unambiguous representation of the PDF object
  34 + structure and also provides JSON-formatted summaries of other
  35 + information about the file. This functionality is built into
  36 + ``QPDFJob`` and can be accessed from the ``qpdf`` command-line tool
  37 + or from the ``QPDFJob`` C or C++ API.
  38 +
  39 +- qpdf can create a JSON file that completely represents a PDF file.
  40 + You can think of this as using JSON as an *alternative syntax* for
  41 + representing a PDF file. Using qpdf JSON, it is possible to
  42 + convert a PDF file to JSON, manipulate the structure or contents of
  43 + the objects at a low level, and convert the results back to a PDF
  44 + file. This functionality can be accessed from the command-line with
  45 + the :qpdf:ref:`--json-output`, :qpdf:ref:`--json-input`, and
  46 + :qpdf:ref:`--update-from-json` flags, or from the API using the
  47 + ``QPDF::writeJSON``, ``QPDF::createFromJSON``, and
  48 + ``QPDF::updateFromJSON`` methods.
  49 +
  50 +.. _json-terminology:
  51 +
  52 +JSON Terminology
  53 +----------------
  54 +
  55 +Notes about terminology:
  56 +
  57 +- In JavaScript and JSON, that thing that has keys and values is
  58 + typically called an *object*.
  59 +
  60 +- In PDF, that thing that has keys and values is typically called a
  61 + *dictionary*. An *object* is a PDF object such as integer, real,
  62 + boolean, null, string, array, dictionary, or stream.
  63 +
  64 +- Some languages that use JSON call an *object* a *dictionary*, a
  65 + *map*, or a *hash*.
  66 +
  67 +- Sometimes, it's called an *object* if it has fixed keys and a
  68 + *dictionary* if it has variable keys.
  69 +
  70 +This manual is not entirely consistent about its use of *dictionary*
  71 +vs. *object* because sometimes one term or another is clearer in
  72 +context. Just be aware of the ambiguity when reading the manual. We
  73 +frequently use the term *dictionary* to refer to a JSON object because
  74 +of the consistency with PDF terminology.
  75 +
  76 +.. _what-qpdf-json-is-not:
  77 +
  78 +What qpdf JSON is not
  79 +---------------------
  80 +
  81 +Please note that qpdf JSON offers a convenient syntax for manipulating
  82 +PDF files at a low level using JSON syntax. JSON syntax is much easier
  83 +to work with than native PDF syntax, and there are good JSON libraries
  84 +in virtually every commonly used programming language. Working with
  85 +PDF objects in JSON removes the need to worry about stream lengths,
  86 +cross reference tables, and PDF-specific representations of Unicode or
  87 +binary strings that appear outside of content streams. It does not
  88 +eliminate the need to understand the semantic structure of PDF files.
  89 +Working with qpdf JSON still requires familiarity with the PDF
  90 +specification.
  91 +
  92 +In particular, qpdf JSON *does not* provide any of the following
  93 +capabilities:
  94 +
  95 +- Text extraction. While you could use qpdf JSON syntax to navigate to
  96 + a page's content streams and font structures, text within pages is
  97 + still encoded using PDF syntax within content streams, and there is
  98 + no assistance for text extraction.
  99 +
  100 +- Reflowing text, document structure. qpdf JSON does not add any new
  101 + information or insight into the content of PDF files. If you have a
  102 + PDF file that lacks any structural information, qpdf JSON won't help
  103 + you solve any of those problems.
  104 +
  105 +This is what we mean when we say that JSON provides an *alternative
  106 +syntax* for working with PDF data. Semantically, it is identical to
  107 +native PDF.
20 108
21 .. _qpdf-json: 109 .. _qpdf-json:
22 110
23 -QPDF JSON Format 111 +qpdf JSON Format
24 ---------------- 112 ----------------
25 113
26 -XXX Write this. 114 +This section describes how qpdf represents PDF objects in JSON format.
  115 +It also describes how to work with qpdf JSON to create or
  116 +modify PDF files.
  117 +
  118 +.. _json.objects:
  119 +
  120 +qpdf JSON Object Representation
  121 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  122 +
  123 +This section describes the representation of PDF objects in qpdf JSON
  124 +version 2. PDF objects are represented within the ``"objects"``
  125 +dictionary of a qpdf JSON file. This is true both for PDF serialized
  126 +to JSON (:qpdf:ref:`--json-output`, ``QPDF::writeJSON``) and objects as
  127 +they appear in the output of ``qpdf`` with the :qpdf:ref:`--json`
  128 +option.
  129 +
  130 +Each key in the ``"objects"`` dictionary is either ``"trailer"`` or a
  131 +string of the form ``"obj:O G R"`` where ``O`` and ``G`` are the
  132 +object and generation numbers and ``R`` is the literal string ``R``.
  133 +This is the PDF syntax for the indirect object reference prepended by
  134 +``obj:``. The value, representing the object itself, is a JSON object
  135 +whose structure is described below.
  136 +
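As a rough illustration of the key format just described, the following sketch parses and builds keys of the ``"objects"`` dictionary. The function names are hypothetical and are not part of qpdf; only the ``"trailer"`` and ``"obj:O G R"`` forms come from the text above.

```python
import re

# Keys are either "trailer" or "obj:O G R" (object number, generation, literal R).
_OBJ_KEY = re.compile(r"obj:(\d+) (\d+) R")


def parse_object_key(key):
    """Return None for "trailer", otherwise an (object, generation) tuple."""
    if key == "trailer":
        return None
    match = _OBJ_KEY.fullmatch(key)
    if match is None:
        raise ValueError(f"not a qpdf JSON object key: {key!r}")
    return int(match.group(1)), int(match.group(2))


def make_object_key(obj, gen=0):
    """Build the key under which an indirect object would be stored."""
    return f"obj:{obj} {gen} R"
```

Note that a bare reference such as ``"3 0 R"`` is rejected here because keys carry the ``obj:`` prefix.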
  137 +Top-level Stream Objects
  138 + Stream objects are represented as a JSON object with the single key
  139 + ``"stream"``. The stream object has a key called ``"dict"`` whose
  140 + value is the stream dictionary as an object value (described below)
  141 + with the ``"/Length"`` key omitted. Other keys are determined by the
  142 + value for json stream data (:qpdf:ref:`--json-stream-data`, or a
  143 + parameter of type ``qpdf_json_stream_data_e``) as follows:
  144 +
  145 + - ``none``: stream data is not represented; no other keys are
  146 + present
  147 +
  148 + - ``inline``: the stream data appears as a base64-encoded string as
  149 + the value of the ``"data"`` key
  150 +
  151 + - ``file``: the stream data is written to a file, and the path to
  152 + the file is stored in the ``"datafile"`` key. A relative path is
  153 + interpreted as relative to the current directory when qpdf is
  154 + invoked.
  155 +
  156 + Keys other than ``"dict"``, ``"data"``, and ``"datafile"`` are
  157 + ignored. This is primarily for future compatibility in case a newer
  158 + version of qpdf includes additional information.
  159 +
  160 + As with the native PDF representation, the stream data must be
  161 + consistent with whatever filters and decode parameters are specified
  162 + in the stream dictionary.
  163 +
  164 +Top-level Non-stream Objects
  165 + Non-stream objects are represented as a dictionary with the single
  166 + key ``"value"``. Other keys are ignored for future compatibility.
  167 + The value's structure is described in "Object Values" below.
  168 +
  169 + Note: in files that use object streams, the trailer "dictionary" is
  170 + actually a stream, but in the JSON representation, the value of the
  171 + ``"trailer"`` key is always written as a dictionary (with a
  172 + ``"value"`` key like other non-stream objects). There will also be a
  173 + stream object whose key is the object ID of the cross-reference
  174 + stream, even though this stream will generally be unreferenced. This
  175 + makes it possible to assume ``"trailer"`` points to a dictionary
  176 + without having to consider whether the file uses object streams or
  177 + not. It is also consistent with how ``QPDF::getTrailer`` behaves in
  178 + the C++ API.
  179 +
  180 +Object Values
  181 + Within ``"value"`` or ``"stream"."dict"``, PDF objects are
  182 + represented as follows:
  183 +
  184 + - Objects of type Boolean or null are represented as JSON values of
  185 + the same type.
  186 +
  187 + - Objects that are numeric are represented as numeric in the JSON
  188 + without regard to precision. Internally, qpdf stores numeric
  189 + values as strings, so qpdf will preserve arbitrary precision
  190 + numerical values when reading and writing JSON. It is likely that
  191 + other JSON readers and writers will have implementation-dependent
  192 + ways of handling numerical values that are out of range.
  193 +
  194 + - Name objects are represented as JSON strings that start with ``/``
  195 + and are followed by the PDF name in canonical form with all PDF
  196 + syntax resolved. For example, the name whose canonical form (per
  197 + the PDF specification) is ``text/plain`` would be represented in
  198 + JSON as ``"/text/plain"`` and in PDF as ``/text#2fplain``.
  199 +
  200 + - Indirect object references are represented as JSON strings that
  201 + look like a PDF indirect object reference and have the form ``"O G
  202 + R"`` where ``O`` and ``G`` are the object and generation numbers
  203 + and ``R`` is the literal string ``R``. For example, ``"3 0 R"``
  204 + would represent a reference to the object with object ID 3 and
  205 + generation 0.
  206 +
  207 + - PDF strings are represented as JSON strings in one of two ways:
  208 +
  209 + - ``"u:utf8-encoded-string"``: this format is used when the PDF
  210 + string can be unambiguously represented as a Unicode string and
  211 + contains no unprintable characters. This is the case whether the
  212 + input string is encoded as UTF-16, UTF-8 (as allowed by PDF
  213 + 2.0), or PDF doc encoding. Strings are only represented this way
  214 + if they can be encoded without loss of information.
  215 +
  216 + - ``"b:hex-string"``: this format is used to represent any binary
  217 + string value that can't be represented as a Unicode string.
  218 + ``hex-string`` must have an even number of characters that range
  219 + from ``a`` through ``f``, ``A`` through ``F``, or ``0`` through
  220 + ``9``.
  221 +
  222 + qpdf writes empty strings as ``"u:"``, but both ``"b:"`` and
  223 + ``"u:"`` are valid representations of the empty string.
  224 +
  225 + There is full support for UTF-16 surrogate pairs. Binary strings
  226 + encoded with ``"b:..."`` are the internal PDF representations.
  227 + As such, the following are equivalent:
  228 +
  229 + - ``"u:\ud83e\udd54"`` -- representation of U+1F954 as a surrogate
  230 + pair in JSON syntax
  231 +
  232 + - ``"b:FEFFD83EDD54"`` -- representation of U+1F954 as the bytes
  233 + of a UTF-16 string in PDF syntax with the leading ``FEFF``
  234 + indicating UTF-16
  235 +
  236 + - ``"b:efbbbff09fa594"`` -- representation of U+1F954 as the
  237 + bytes of a UTF-8 string in PDF syntax (as allowed by PDF 2.0)
  238 + with the leading ``EF``, ``BB``, ``BF`` sequence (which is just
  239 + the UTF-8 encoding of ``FEFF``).
  240 +
  241 + - A JSON string whose contents are ``u:`` followed by the UTF-8
  242 + representation of U+1F954. This is the potato emoji.
  243 + Unfortunately, I am not able to render it in the PDF version
  244 + of this manual.
  245 +
  246 + - PDF arrays are represented as JSON arrays of objects as described
  247 + above.
  248 +
  249 + - PDF dictionaries are represented as JSON objects whose keys are
  250 + the string representations of names and whose values are
  251 + representations of PDF objects.
  252 +
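As a rough illustration of the two string forms described above, the following Python sketch decodes a qpdf JSON string value. The helper name ``decode_qpdf_string`` is hypothetical. Returning raw bytes for a ``b:`` value that has no Unicode marker is a simplification made for this example, not something qpdf prescribes.

```python
import codecs
import json


def decode_qpdf_string(value):
    """Decode a qpdf JSON string value ("u:..." or "b:...").

    Hypothetical helper: returns str when the value is known to be
    Unicode text, otherwise the raw bytes of the PDF string.
    """
    if value.startswith("u:"):
        # Already Unicode text; the JSON parser has combined any
        # surrogate pairs by the time we see the value.
        return value[2:]
    if value.startswith("b:"):
        raw = bytes.fromhex(value[2:])  # accepts upper- and lowercase hex
        if raw.startswith(codecs.BOM_UTF16_BE):  # leading FE FF
            return raw[2:].decode("utf-16-be")
        if raw.startswith(codecs.BOM_UTF8):  # leading EF BB BF
            return raw[3:].decode("utf-8")
        return raw  # simplification: leave other binary strings as bytes
    raise ValueError("not a qpdf JSON string: %r" % (value,))


# The equivalent spellings of U+1F954 listed above decode to the same text.
potato = json.loads(r'"u:\ud83e\udd54"')
same = {
    decode_qpdf_string(potato),
    decode_qpdf_string("b:FEFFD83EDD54"),
    decode_qpdf_string("b:efbbbff09fa594"),
}
```

After running this, ``same`` should contain exactly one element, since all three inputs describe the same string.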
  253 +.. _json.output:
  254 +
  255 +qpdf JSON Output
  256 +~~~~~~~~~~~~~~~~
  257 +
  258 +The format of the JSON written by qpdf's :qpdf:ref:`--json-output`
  259 +flag or the ``QPDF::writeJSON`` API call is a JSON object consisting
  260 +of a single key: ``"qpdf-v2"``. Any other top-level keys are ignored.
  261 +While unknown keys in other places are ignored for future
  262 +compatibility, in this case, ignoring other top-level keys is an
  263 +explicit decision to allow users to include other keys for their own
  264 +use. No new top-level keys will be added in JSON version 2.
  265 +
  266 +The ``"qpdf-v2"`` key points to a JSON object with the following keys:
  267 +
  268 +- ``"pdfversion"`` -- a string containing the PDF version as indicated in
  269 + the PDF header (e.g. ``"1.7"``, ``"2.0"``)
  270 +
  271 +- ``"maxobjectid"`` -- a number indicating the object ID of the
  272 + highest numbered object in the file. This is provided to make it
  273 + easier for software that wants to add new objects to the file as you
  274 + can safely start with one above that number when creating new
  275 + objects. Note that the value of ``"maxobjectid"`` may be higher than
  276 + the actual maximum object that appears in the input PDF since it
  277 + takes into consideration any dangling indirect object references
  278 + from the original file. This prevents you from unwittingly creating
  279 + an object that doesn't exist but that is referenced, which may have
  280 + unintended side effects. (The PDF specification explicitly allows
  281 + dangling references and says to treat them as nulls. This can happen
  282 + if objects are removed from a PDF file.)
  283 +
  284 +- ``"objects"`` -- the actual PDF objects as described in
  285 + :ref:`json.objects`.
  286 +
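The use of ``"maxobjectid"`` described above could look roughly like the following Python sketch, which adds a non-stream object to a parsed qpdf JSON document. The helper name ``add_object`` is hypothetical, and giving the new object generation 0 is an assumption made for the example.

```python
def add_object(doc, value):
    """Add a non-stream object under the next free object ID.

    Hypothetical helper operating on a parsed qpdf JSON document
    (the dict that has the top-level "qpdf-v2" key).
    """
    qpdf = doc["qpdf-v2"]
    new_id = qpdf["maxobjectid"] + 1  # one above the highest known ID
    # Assumption: new objects get generation 0.
    qpdf["objects"]["obj:%d 0 R" % new_id] = {"value": value}
    # Keep the field current so repeated calls keep allocating fresh IDs.
    qpdf["maxobjectid"] = new_id
    return "%d 0 R" % new_id


doc = {"qpdf-v2": {"pdfversion": "1.3", "maxobjectid": 5, "objects": {}}}
ref = add_object(doc, {"/Type": "/Annot"})
```

The returned string has the indirect reference form, so it can be stored directly as a value in another object's dictionary.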
  287 +Note that writing JSON output is done by ``QPDF``, not ``QPDFWriter``.
  288 +As such, none of the things ``QPDFWriter`` does apply. This includes
  289 +recompression of streams, renumbering of objects, anything to do with
  290 +object streams (which are not represented by qpdf JSON at all since
  291 +they are PDF syntax, not semantics), encryption, decryption,
  292 +linearization, QDF mode, etc.
  293 +
  294 +.. _json.example:
  295 +
  296 +qpdf JSON Example
  297 +~~~~~~~~~~~~~~~~~
  298 +
  299 +The JSON below shows an example of a simple PDF file represented in
  300 +qpdf JSON format.
  301 +
  302 +.. code-block:: json
  303 +
  304 + {
  305 + "qpdf-v2": {
  306 + "pdfversion": "1.3",
  307 + "maxobjectid": 5,
  308 + "objects": {
  309 + "obj:1 0 R": {
  310 + "value": {
  311 + "/Pages": "2 0 R",
  312 + "/Type": "/Catalog"
  313 + }
  314 + },
  315 + "obj:2 0 R": {
  316 + "value": {
  317 + "/Count": 1,
  318 + "/Kids": [ "3 0 R" ],
  319 + "/Type": "/Pages"
  320 + }
  321 + },
  322 + "obj:3 0 R": {
  323 + "value": {
  324 + "/Contents": "4 0 R",
  325 + "/MediaBox": [ 0, 0, 612, 792 ],
  326 + "/Parent": "2 0 R",
  327 + "/Resources": {
  328 + "/Font": {
  329 + "/F1": "5 0 R"
  330 + }
  331 + },
  332 + "/Type": "/Page"
  333 + }
  334 + },
  335 + "obj:4 0 R": {
  336 + "stream": {
  337 + "data": "eJxzCuFSUNB3M1QwMlEISQOyzY2AyEAhJAXI1gjIL0ksyddUCMnicg3hAgDLAQnI",
  338 + "dict": {
  339 + "/Filter": "/FlateDecode"
  340 + }
  341 + }
  342 + },
  343 + "obj:5 0 R": {
  344 + "value": {
  345 + "/BaseFont": "/Helvetica",
  346 + "/Encoding": "/WinAnsiEncoding",
  347 + "/Subtype": "/Type1",
  348 + "/Type": "/Font"
  349 + }
  350 + },
  351 + "trailer": {
  352 + "value": {
  353 + "/ID": [
  354 + "b:98b5a26966fba4d3a769b715b2558da6",
  355 + "b:98b5a26966fba4d3a769b715b2558da6"
  356 + ],
  357 + "/Root": "1 0 R",
  358 + "/Size": 6
  359 + }
  360 + }
  361 + }
  362 + }
  363 + }
  364 +
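To read the content of a stream such as ``"obj:4 0 R"`` in the example, a consumer might proceed along the lines of this Python sketch. It assumes that ``"data"`` holds base64-encoded bytes exactly as stored in the file, so that the filters named in ``"dict"`` still have to be applied. The helper ``stream_bytes`` is hypothetical and handles only a single ``/FlateDecode`` filter.

```python
import base64
import zlib


def stream_bytes(obj):
    """Return the decoded bytes of one stream entry from "objects".

    Assumes inline base64 "data" and handles only a single
    /FlateDecode filter; anything else is returned as stored.
    """
    stream = obj["stream"]
    raw = base64.b64decode(stream["data"])
    if stream["dict"].get("/Filter") == "/FlateDecode":
        raw = zlib.decompress(raw)
    return raw


# Round trip with made-up content, shaped like "obj:4 0 R" above.
content = b"BT /F1 24 Tf 72 720 Td (Potato) Tj ET\n"
example = {
    "stream": {
        "data": base64.b64encode(zlib.compress(content)).decode("ascii"),
        "dict": {"/Filter": "/FlateDecode"},
    }
}
decoded = stream_bytes(example)
```

The content used here is invented for the round trip; it is not claimed to be what the example file's stream contains.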
  365 +.. _json.input:
  366 +
  367 +qpdf JSON Input
  368 +~~~~~~~~~~~~~~~
  369 +
  370 +Output in the JSON output format described in :ref:`json.output` can
  371 +be used in two different ways:
  372 +
  373 +- By using the :qpdf:ref:`--json-input` flag or calling
  374 + ``QPDF::createFromJSON`` in place of ``QPDF::processFile``, a qpdf
  375 + JSON file can be used in place of a PDF file as the input to qpdf.
  376 +
  377 +- By using the :qpdf:ref:`--update-from-json` flag or calling
  378 + ``QPDF::updateFromJSON`` on an initialized ``QPDF`` object, a qpdf
  379 + JSON file can be used to apply changes to an existing ``QPDF``
  380 + object. That ``QPDF`` object can have come from any source including
  381 + a PDF file, a qpdf JSON file, or the result of any other process
  382 + that results in a valid, initialized ``QPDF`` object.
  383 +
  384 +Here are some important things to know about qpdf JSON input.
  385 +
  386 +- When a qpdf JSON file is used as the primary input file, it must be
  387 + complete. This means:
  388 +
  389 + - A PDF version number must be specified with the ``"pdfversion"``
  390 + key
  391 +
  392 + - Stream data must be present for all streams
  393 +
  394 + - The trailer dictionary must be present, though only the
  395 + ``"/Root"`` key is required.
  396 +
  397 +- Certain fields from the input are ignored whether creating or
  398 + updating from a JSON file:
  399 +
  400 + - ``"maxobjectid"`` is ignored, so it is not necessary to update it
  401 + when adding new objects.
  402 +
  403 + - ``"/Length"`` is ignored in all stream dictionaries. qpdf doesn't
  404 + put it there when it creates JSON output, and it is not necessary
  405 + to add it.
  406 +
  407 + - ``"/Size"`` is ignored if it appears in a trailer dictionary as
  408 + that is always recomputed by ``QPDFWriter``.
  409 +
  410 + - Unknown keys at the top level of the file, within ``objects``,
  411 + at the top level of each individual object (inside the object that
  412 + has the ``"value"`` or ``"stream"`` key) and directly within
  413 + ``"stream"`` are ignored for future compatibility. You should
  414 + avoid putting your own values in those places if you wish to avoid
  415 + risking that your JSON files will not work in future versions of
  416 + qpdf. The exception to this advice is at the top level of the
  417 + overall file where it is explicitly supported for you to add your
  418 + own keys. For example, you could add your own metadata at the top
  419 + level, and qpdf will ignore it. Note that extra top-level keys are
  420 + not preserved when qpdf reads your JSON file.
  421 +
  422 +- When qpdf reads a PDF file, the internal object numbers are always
  423 + preserved. However, when qpdf writes a file using ``QPDFWriter``,
  424 + ``QPDFWriter`` does its own numbering and, in general, does not
  425 + preserve input object numbers. That means that a qpdf JSON file that
  426 + is used to update an existing PDF must have object numbers that
  427 + match the input file it is modifying. In practical terms, this means
  428 + that you can't use a JSON file created from one PDF file to modify
  429 + the *output of running qpdf on that file*.
  430 +
  431 + To put this more concretely, the following is valid:
  432 +
  433 + ::
  434 +
  435 + qpdf --json-output in.pdf pdf.json
  436 + # edit pdf.json
  437 + qpdf in.pdf out.pdf --update-from-json=pdf.json
  438 +
  439 + The following will not produce predictable results because
  440 + ``out.pdf`` won't have the same object numbers as ``pdf.json`` and
  441 + ``in.pdf``.
  442 +
  443 + ::
  444 +
  445 + qpdf --json-output in.pdf pdf.json
  446 + # edit pdf.json
  447 + qpdf in.pdf out.pdf --update-from-json=pdf.json
  448 + # edit pdf.json again
  449 + # Don't do this
  450 + qpdf out.pdf out2.pdf --update-from-json=pdf.json
  451 +
  452 +- When updating from a JSON file (:qpdf:ref:`--update-from-json`,
  453 + ``QPDF::updateFromJSON``), existing objects are updated in place.
  454 + This has the following implications:
  455 +
  456 + - You may omit both ``"data"`` and ``"datafile"`` if the object you
  457 + are updating is already a stream. In that case the original stream
  458 + data is preserved. You must always provide a stream dictionary,
  459 + but it may be empty. Note that an empty stream dictionary will
  460 + clear the old dictionary. There is no way to indicate that an old
  461 + stream dictionary should be left alone, so if your intention is to
  462 + replace the stream data and preserve the dictionary, the
  463 + original dictionary must appear in the JSON file.
  464 +
  465 + - You can change one object type to another object type including
  466 + replacing a stream with a non-stream or a non-stream with a
  467 + stream. If you replace a non-stream with a stream, you must
  468 + provide data for the stream.
  469 +
  470 + - Objects that you do not wish to modify can be omitted from the
  471 + JSON. That includes the trailer. That means you can use a qpdf
  472 + JSON file that was written using
  473 + :qpdf:ref:`--json-object` so that it includes only the objects you
  474 + intend to modify.
  475 +
  476 + - You can omit the ``"pdfversion"`` key. The input PDF version will
  477 + be preserved.
  478 +
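Because omitted objects, the trailer, and ``"pdfversion"`` may all be left out when updating, an update file can be quite small. The following Python sketch builds one from a full qpdf JSON document. The helper name ``make_update`` is hypothetical, and the page size written into the copy is only sample data.

```python
import copy
import json


def make_update(doc, keys):
    """Build an update document holding only the named objects.

    Hypothetical helper; ``keys`` are entries of "objects" such as
    "obj:3 0 R". "pdfversion", "maxobjectid", and the trailer are
    left out on purpose.
    """
    objects = doc["qpdf-v2"]["objects"]
    # Deep copy so that editing the update leaves the source untouched.
    chosen = {key: copy.deepcopy(objects[key]) for key in keys}
    return {"qpdf-v2": {"objects": chosen}}


full = {
    "qpdf-v2": {
        "pdfversion": "1.3",
        "maxobjectid": 5,
        "objects": {
            "obj:3 0 R": {
                "value": {"/MediaBox": [0, 0, 612, 792], "/Type": "/Page"}
            },
            "obj:5 0 R": {"value": {"/Type": "/Font"}},
            "trailer": {"value": {"/Root": "1 0 R"}},
        },
    }
}
update = make_update(full, ["obj:3 0 R"])
update["qpdf-v2"]["objects"]["obj:3 0 R"]["value"]["/MediaBox"] = [0, 0, 595, 842]
text = json.dumps(update, indent=2)  # what would be saved as the update file
```

The object keys in the update must still match the object numbers of the file being updated, as explained above.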
  479 +.. _json.workflow-cli:
  480 +
  481 +qpdf JSON Workflow: CLI
  482 +~~~~~~~~~~~~~~~~~~~~~~~
  483 +
  484 +This section includes a few examples of using qpdf JSON.
  485 +
  486 +- Convert a PDF file to JSON format, edit the JSON, and convert back
  487 + to PDF. This is an alternative to using QDF mode (see :ref:`qdf`) to
  488 + modify PDF files in a text editor. Each method has its own
  489 + advantages and disadvantages.
  490 +
  491 + ::
  492 +
  493 + qpdf --json-output in.pdf pdf.json
  494 + # edit pdf.json
  495 + qpdf --json-input pdf.json out.pdf
  496 +
  497 +- Extract only a specific object into a JSON file, modify the object
  498 + in JSON, and use the modified object to update the original PDF. In
  499 + this case, we're editing object 4, whatever that may happen to be.
  500 + You would have to know through some other means which object you
  501 + wanted to edit, such as by looking at other JSON output or using a
  502 + tool (possibly but not necessarily qpdf) to identify the object.
  503 +
  504 + ::
  505 +
  506 + qpdf --json-output in.pdf pdf.json --json-object=4,0
  507 + # edit pdf.json
  508 + qpdf in.pdf --update-from-json=pdf.json out.pdf
  509 +
  510 + Rather than using :qpdf:ref:`--json-object` as in the above example,
  511 + you could edit the JSON file to remove the objects you didn't need.
  512 + You could also just leave them there, though the update process
  513 + would be slower.
  514 +
  515 + You could also add new objects to a file by adding them to
  516 + ``pdf.json``. Just be sure the object number doesn't conflict with
  517 + an existing object. The ``"maxobjectid"`` field in the original
  518 + output can help with this. You don't have to update it if you add
  519 + objects as it is ignored when the file is read back in.
  520 +
  521 +- Use :qpdf:ref:`--json-input` and :qpdf:ref:`--json-output` together
  522 + to demonstrate preservation of object numbers. In this example,
  523 + ``a.json`` and ``b.json`` will have the same objects and object
  524 + numbers. The files may not be identical since strings may be
  525 + normalized, fields may appear in a different order, etc. However,
  526 + ``b.json`` and ``c.json`` are probably identical.
  527 +
  528 + ::
  529 +
  530 + qpdf --json-output in.pdf a.json
  531 + qpdf --json-input --json-output a.json b.json
  532 + qpdf --json-input --json-output b.json c.json
  533 +
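One way to check the claim that two files have the same objects and object numbers, without expecting the files to be byte-for-byte identical, is to compare the keys under ``"objects"``. This Python sketch uses two tiny inline documents in place of ``a.json`` and ``b.json``; the helper name ``diff_object_ids`` is hypothetical.

```python
import json


def diff_object_ids(first, second):
    """Return (only_in_first, only_in_second) for the "objects" keys."""
    a = set(first["qpdf-v2"]["objects"])
    b = set(second["qpdf-v2"]["objects"])
    return sorted(a - b), sorted(b - a)


# Same objects, different key order: still counts as the same IDs.
a_doc = json.loads(
    '{"qpdf-v2": {"objects": {"obj:1 0 R": {"value": 1},'
    ' "trailer": {"value": {}}}}}'
)
b_doc = json.loads(
    '{"qpdf-v2": {"objects": {"trailer": {"value": {}},'
    ' "obj:1 0 R": {"value": 1}}}}'
)
only_a, only_b = diff_object_ids(a_doc, b_doc)
```

With real files, the two documents would come from ``json.load`` on the opened files instead of the inline strings.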
  534 +
  535 +.. _json.workflow-api:
  536 +
  537 +qpdf JSON Workflow: API
  538 +~~~~~~~~~~~~~~~~~~~~~~~
  539 +
  540 +Everything that can be done using the qpdf CLI can be done using the
  541 +C++ API. See comments in :file:`QPDF.hh` for ``writeJSON``,
  542 +``createFromJSON``, and ``updateFromJSON`` for details.
27 543
28 .. _json-guarantees: 544 .. _json-guarantees:
29 545
30 -JSON Guarantees  
31 ---------------- 546 +JSON Compatibility Guarantees
  547 +-----------------------------
32 548
33 The qpdf JSON representation includes a JSON serialization of the raw 549 The qpdf JSON representation includes a JSON serialization of the raw
34 objects in the PDF file as well as some computed information in a more 550 objects in the PDF file as well as some computed information in a more
@@ -37,24 +553,23 @@ format. These guarantees are designed to simplify the experience of a @@ -37,24 +553,23 @@ format. These guarantees are designed to simplify the experience of a
37 developer working with the JSON format. 553 developer working with the JSON format.
38 554
39 Compatibility 555 Compatibility
40 - The top-level JSON object output is a dictionary. The JSON output  
41 - contains various nested dictionaries and arrays. With the exception  
42 - of dictionaries that are populated by the fields of objects from the  
43 - file, all instances of a dictionary are guaranteed to have exactly  
44 - the same keys. Future versions of qpdf are free to add additional  
45 - keys but not to remove keys or change the type of object that a key  
46 - points to. The qpdf program validates this guarantee, and in the  
47 - unlikely event that a bug in qpdf should cause it to generate data  
48 - that doesn't conform to this rule, it will ask you to file a bug  
49 - report.  
50 -  
51 - The top-level JSON structure contains a "``version``" key whose value  
52 - is simple integer. The value of the ``version`` key will be 556 + The top-level JSON object is a dictionary (JSON "object"). The JSON
  557 + output contains various nested dictionaries and arrays. With the
  558 + exception of dictionaries that are populated by the fields of
  559 + PDF objects from the file, all instances of a dictionary are
  560 + guaranteed to have exactly the same keys.
  561 +
  562 + The top-level JSON structure contains a ``"version"`` key whose
  563 + value is a simple integer. The value of the ``version`` key will be
53 incremented if a non-compatible change is made. A non-compatible 564 incremented if a non-compatible change is made. A non-compatible
54 change would be any change that involves removal of a key, a change 565 change would be any change that involves removal of a key, a change
55 - to the format of data pointed to by a key, or a semantic change that  
56 - requires a different interpretation of a previously existing key. A  
57 - strong effort will be made to avoid breaking compatibility. 566 + to the format of data pointed to by a key, or a semantic change
  567 + that requires a different interpretation of a previously existing
  568 + key.
  569 +
  570 + Within a specific qpdf JSON version, future versions of qpdf are free
  571 + to add additional keys but not to remove keys or change the type of
  572 + object that a key points to.
58 573
59 Documentation 574 Documentation
60 The :command:`qpdf` command can be invoked with the 575 The :command:`qpdf` command can be invoked with the
@@ -66,28 +581,29 @@ Documentation @@ -66,28 +581,29 @@ Documentation
66 581
67 - A dictionary in the help output means that the corresponding 582 - A dictionary in the help output means that the corresponding
68 location in the actual JSON output is also a dictionary with 583 location in the actual JSON output is also a dictionary with
69 - exactly the same keys; that is, no keys present in help are absent  
70 - in the real output, and no keys will be present in the real output  
71 - that are not in help. As a special case, if the dictionary has a  
72 - single key whose name starts with ``<`` and ends with ``>``, it  
73 - means that the JSON output is a dictionary that can have any keys,  
74 - each of which conforms to the value of the special key. This is  
75 - used for cases in which the keys of the dictionary are things like  
76 - object IDs. 584 + exactly the same keys; that is, no keys present in help are
  585 + absent in the real output, and no keys will be present in the
  586 + real output that are not in help. It is possible for a key to be
  587 + present and have a value that is explicitly ``null``. As a
  588 + special case, if the dictionary has a single key whose name
  589 + starts with ``<`` and ends with ``>``, it means that the JSON
  590 + output is a dictionary that can have any value as a key. This is
  591 + used for cases in which the keys of the dictionary are things
  592 + like object IDs.
77 593
78 - A string in the help output is a description of the item that 594 - A string in the help output is a description of the item that
79 appears in the corresponding location of the actual output. The 595 appears in the corresponding location of the actual output. The
80 - corresponding output can have any format. 596 + corresponding output can have any value including ``null``.
81 597
82 - An array in the help output always contains a single element. It 598 - An array in the help output always contains a single element. It
83 indicates that the corresponding location in the actual output is 599 indicates that the corresponding location in the actual output is
84 - also an array, and that each element of the array has whatever  
85 - format is implied by the single element of the help output's  
86 - array. 600 + an array of any length, and that each element of the array has
  601 + whatever format is implied by the single element of the help
  602 + output's array.
87 603
88 - For example, the help output indicates includes a "``pagelabels``" 604 + For example, the help output includes a ``"pagelabels"``
89 key whose value is an array of one element. That element is a 605 key whose value is an array of one element. That element is a
90 - dictionary with keys "``index``" and "``label``". In addition to 606 + dictionary with keys ``"index"`` and ``"label"``. In addition to
91 describing the meaning of those keys, this tells you that the actual 607 describing the meaning of those keys, this tells you that the actual
92 JSON output will contain a ``pagelabels`` array, each of whose 608 JSON output will contain a ``pagelabels`` array, each of whose
93 elements is a dictionary that contains an ``index`` key, a ``label`` 609 elements is a dictionary that contains an ``index`` key, a ``label``
@@ -95,56 +611,13 @@ Documentation @@ -95,56 +611,13 @@ Documentation
95 611
96 Directness and Simplicity 612 Directness and Simplicity
97 The JSON output contains the value of every object in the file, but 613 The JSON output contains the value of every object in the file, but
98 - it also contains some processed data. This is analogous to how qpdf's  
99 - library interface works. The processed data is similar to the helper  
100 - functions in that it allows you to look at certain aspects of the PDF  
101 - file without having to understand all the nuances of the PDF 614 + it also contains some summary data. This is analogous to how qpdf's
  615 + library interface works. The summary data is similar to the helper
  616 + functions in that it allows you to look at certain aspects of the
  617 + PDF file without having to understand all the nuances of the PDF
102 specification, while the raw objects allow you to mine the PDF for 618 specification, while the raw objects allow you to mine the PDF for
103 anything that the higher-level interfaces are lacking. 619 anything that the higher-level interfaces are lacking.
104 620
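The rules under "Documentation" for reading the help output can be expressed as a small recursive check. The Python sketch below is one possible reading of those rules, not qpdf's own validation code. The name ``conforms`` is hypothetical, ``null`` is accepted at any position as a lenient assumption, and values under a ``<...>`` key are checked against that key's value.

```python
def conforms(help_node, value):
    """Check a piece of JSON output against the matching help node."""
    if value is None:
        return True  # lenient assumption: null is allowed anywhere
    if isinstance(help_node, str):
        return True  # a description: the output may have any value
    if isinstance(help_node, list):
        # One-element array in help: every output element must match it.
        return isinstance(value, list) and all(
            conforms(help_node[0], item) for item in value
        )
    if isinstance(help_node, dict):
        if not isinstance(value, dict):
            return False
        keys = list(help_node)
        if len(keys) == 1 and keys[0].startswith("<") and keys[0].endswith(">"):
            # Special case: arbitrary keys, each value matching the pattern.
            return all(conforms(help_node[keys[0]], v) for v in value.values())
        # Otherwise the key sets must be identical.
        return set(value) == set(keys) and all(
            conforms(help_node[k], value[k]) for k in keys
        )
    return False


help_schema = {"pagelabels": [{"index": "description", "label": "description"}]}
good = {"pagelabels": [{"index": 0, "label": {"/S": "/D"}}]}
bad = {"pagelabels": [{"index": 0}]}  # missing the "label" key
results = (conforms(help_schema, good), conforms(help_schema, bad))
```

The description strings in ``help_schema`` are placeholders rather than the text qpdf prints.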
105 -.. _json.limitations:  
106 -  
107 -Limitations of JSON Representation  
108 -----------------------------------  
109 -  
110 -There are a few limitations to be aware of with the JSON structure:  
111 -  
112 -- Strings, names, and indirect object references in the original PDF  
113 - file are all converted to strings in the JSON representation. In the  
114 - case of a "normal" PDF file, you can tell the difference because a  
115 - name starts with a slash (``/``), and an indirect object reference  
116 - looks like ``n n R``, but if there were to be a string that looked  
117 - like a name or indirect object reference, there would be no way to  
118 - tell this from the JSON output. Note that there are certain cases  
119 - where you know for sure what something is, such as knowing that  
120 - dictionary keys in objects are always names and that certain things  
121 - in the higher-level computed data are known to contain indirect  
122 - object references.  
123 -  
124 -- The JSON format doesn't support binary data very well. Mostly the  
125 - details are not important, but they are presented here for  
126 - information. When qpdf outputs a string in the JSON representation,  
127 - it converts the string to UTF-8, assuming usual PDF string semantics.  
128 - Specifically, if the original string is UTF-16, it is converted to  
129 - UTF-8. Otherwise, it is assumed to have PDF doc encoding, and is  
130 - converted to UTF-8 with that assumption. This causes strange things  
131 - to happen to binary strings. For example, if you had the binary  
132 - string ``<038051>``, this would be output to the JSON as ``\u0003โ€ขQ``  
133 - because ``03`` is not a printable character and ``80`` is the bullet  
134 - character in PDF doc encoding and is mapped to the Unicode value  
135 - ``2022``. Since ``51`` is ``Q``, it is output as is. If you wanted to  
136 - convert back from here to a binary string, would have to recognize  
137 - Unicode values whose code points are higher than ``0xFF`` and map  
138 - those back to their corresponding PDF doc encoding characters. There  
139 - is no way to tell the difference between a Unicode string that was  
140 - originally encoded as UTF-16 or one that was converted from PDF doc  
141 - encoding. In other words, it's best if you don't try to use the JSON  
142 - format to extract binary strings from the PDF file, but if you really  
143 - had to, it could be done. Note that qpdf's  
144 - :qpdf:ref:`--show-object` option does not have this  
145 - limitation and will reveal the string as encoded in the original  
146 - file.  
147 -  
148 .. _json.considerations: 621 .. _json.considerations:
149 622
150 JSON: Special Considerations 623 JSON: Special Considerations
@@ -157,12 +630,15 @@ be aware of: @@ -157,12 +630,15 @@ be aware of:
157 - If a PDF file has certain types of errors in its pages tree (such as 630 - If a PDF file has certain types of errors in its pages tree (such as
158 page objects that are direct or multiple pages sharing the same 631 page objects that are direct or multiple pages sharing the same
159 object ID), qpdf will automatically repair the pages tree. If you 632 object ID), qpdf will automatically repair the pages tree. If you
160 - specify ``"objects"`` and/or ``"objectinfo"`` without any other  
161 - keys, you will see the original pages tree without any corrections.  
162 - If you specify any of keys that require page tree traversal (for  
163 - example, ``"pages"``, ``"outlines"``, or ``"pagelabel"``), then  
164 - ``"objects"`` and ``"objectinfo"`` will show the repaired page tree  
165 - so that object references will be consistent throughout the file. 633 + specify ``"objects"`` (and, with qpdf JSON version 1, also
  634 + ``"objectinfo"``) without any other keys, you will see the original
  635 + pages tree without any corrections. If you specify any of the keys that
  636 + require page tree traversal (for example, ``"pages"``,
  637 + ``"outlines"``, or ``"pagelabels"``), then ``"objects"`` (and
  638 + ``"objectinfo"``) will show the repaired page tree so that object
  639 + references will be consistent throughout the file. This is not an
  640 + issue with :qpdf:ref:`--json-output`, which doesn't repair the pages
  641 + tree.
166 642
167 - While qpdf guarantees that keys present in the help will be present 643 - While qpdf guarantees that keys present in the help will be present
168 in the output, those fields may be null or empty if the information 644 in the output, those fields may be null or empty if the information
@@ -177,22 +653,128 @@ be aware of: @@ -177,22 +653,128 @@ be aware of:
177 1. Note that JSON indexes from 0, and you would also use 0-based 653 1. Note that JSON indexes from 0, and you would also use 0-based
178 indexing using the API. However, 1-based indexing is easier in this 654 indexing using the API. However, 1-based indexing is easier in this
179 case because the command-line syntax for specifying page ranges is 655 case because the command-line syntax for specifying page ranges is
180 - 1-based. If you were going to write a program that looked through the  
181 - JSON for information about specific pages and then use the 656 + 1-based. If you were going to write a program that looked through
  657 + the JSON for information about specific pages and then use the
182 command-line to extract those pages, 1-based indexing is easier. 658 command-line to extract those pages, 1-based indexing is easier.
183 - Besides, it's more convenient to subtract 1 from a program in a real  
184 - programming language than it is to add 1 from shell code. 659 + Besides, it's more convenient to subtract 1 in a real programming
  660 + language than it is to add 1 in shell code.
185 661
186 - The image information included in the ``page`` section of the JSON 662 - The image information included in the ``page`` section of the JSON
187 - output includes the key "``filterable``". Note that the value of this  
188 - field may depend on the :qpdf:ref:`--decode-level` that  
189 - you invoke qpdf with. The JSON output includes a top-level key  
190 - "``parameters``" that indicates the decode level used for computing  
191 - whether a stream was filterable. For example, jpeg images will be  
192 - shown as not filterable by default, but they will be shown as  
193 - filterable if you run :command:`qpdf --json 663 + output includes the key ``"filterable"``. Note that the value of
  664 + this field may depend on the :qpdf:ref:`--decode-level` that you
  665 + invoke qpdf with. The JSON output includes a top-level key
  666 + ``"parameters"`` that indicates the decode level that was used for
  667 + computing whether a stream was filterable. For example, jpeg images
  668 + will be shown as not filterable by default, but they will be shown
  669 + as filterable if you run :command:`qpdf --json
194 --decode-level=all`. 670 --decode-level=all`.
195 671
196 - The ``encrypt`` key's values will be populated for non-encrypted 672 - The ``encrypt`` key's values will be populated for non-encrypted
197 files. Some values will be null, and others will have values that 673 files. Some values will be null, and others will have values that
198 apply to unencrypted files. 674 apply to unencrypted files.
  675 +
  676 +- The qpdf library itself never loads an entire PDF into memory. This
  677 + remains true for PDF files represented in JSON format. In general,
  678 + qpdf will hold the entire object structure in memory once a file has
  679 + been fully read (objects are loaded into memory lazily but stay
  680 + there once loaded), but it will never have more than two copies of a
  681 + stream in memory at once. That said, if you ask qpdf to write JSON
  682 + to memory, it will do so, so be careful about this if you are
  683 + working with very large PDF files. There is nothing in the qpdf
  684 + library itself that prevents working with PDF files much larger than
  685 + available system memory. qpdf can both read and write such files in
  686 + JSON format. If you need to work with a PDF file's JSON
  687 + representation in memory, it is recommended that you use either
  688 + ``none`` or ``file`` as the argument to
  689 + :qpdf:ref:`--json-stream-data`, or if using the API, use
  690 + ``qpdf_sj_none`` or ``qpdf_sj_file`` as the JSON stream data value.
  691 + If using ``none``, you can use other means to obtain the stream
  692 + data.
  693 +
  694 +.. _json-v2-changes:
  695 +
  696 +Changes from JSON v1 to v2
  697 +--------------------------
  698 +
  699 +The following changes were made to qpdf's JSON output format for
  700 +version 2.
  701 +
  702 +- The representation of objects has changed. For details, see
  703 + :ref:`json.objects`.
  704 +
  705 + - The representation of strings is now unambiguous for all strings.
  706 + Strings are prefixed with either ``u:`` for Unicode strings or
  707 + ``b:`` for byte strings.
  708 +
  709 + - Names are shown in qpdf's canonical form rather than in PDF
  710 + syntax. (Example: the PDF-syntax name ``/text#2fplain`` appeared
  711 + as ``"/text#2fplain"`` in v1 but appears as ``"/text/plain"`` in
  712 + v2.)
  713 +
  714 + - The top-level representation of an object in ``"objects"`` is a
  715 + dictionary containing either a ``"value"`` key or a ``"stream"``
  716 + key, making it possible to distinguish streams from other objects.
  717 +
  718 +- The ``"objectinfo"`` key has been removed in favor of a
  719 + representation in ``"objects"`` that differentiates between a stream
  720 + and other kinds of objects. In v1, it was not possible to tell a
  721 + stream from a dictionary within ``"objects"``.
  722 +
  723 +- Within the ``"objects"`` dictionary, keys are now ``"obj:O G R"``
  724 + where ``O`` and ``G`` are the object and generation numbers.
  725 + ``"trailer"`` remains the key for the trailer dictionary. In v1, the
  726 + ``obj:`` prefix was not present. The rationale for this change is as
  727 + follows:
  728 +
  729 + - Having a unique prefix (``obj:``) makes it much easier to search
  730 + in the JSON file for the definition of an object
  731 +
  732 + - Having the key still contain ``O G R`` makes it much easier to
  733 + construct the key from an indirect reference. You just have to
  734 + prepend ``obj:``. There is no need to parse the indirect object
  735 + reference.
  736 +
  737 +- In the ``"encrypt"`` object, the ``"modifyannotations"`` key was
  738 + misspelled as ``"moddifyannotations"`` in v1. This has been
  739 + corrected.
  740 +
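The point about building an object key by prepending ``obj:`` to an indirect reference can be seen in a few lines of Python. This sketch walks from the trailer to the first page of a cut-down document modeled on the example in this chapter; the helper name ``resolve`` is hypothetical.

```python
def resolve(doc, ref):
    """Look up an indirect reference such as "3 0 R" in "objects"."""
    # No parsing of the reference is needed: just prepend "obj:".
    return doc["qpdf-v2"]["objects"]["obj:" + ref]


doc = {
    "qpdf-v2": {
        "objects": {
            "obj:1 0 R": {"value": {"/Pages": "2 0 R", "/Type": "/Catalog"}},
            "obj:2 0 R": {
                "value": {"/Count": 1, "/Kids": ["3 0 R"], "/Type": "/Pages"}
            },
            "obj:3 0 R": {"value": {"/Parent": "2 0 R", "/Type": "/Page"}},
            "trailer": {"value": {"/Root": "1 0 R"}},
        }
    }
}

trailer = doc["qpdf-v2"]["objects"]["trailer"]["value"]
root = resolve(doc, trailer["/Root"])
pages = resolve(doc, root["value"]["/Pages"])
first_page = resolve(doc, pages["value"]["/Kids"][0])
```

Each resolved entry is a dictionary with either a ``"value"`` or a ``"stream"`` key, so the caller can tell streams from other objects at the same step.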
  741 +Motivation for qpdf JSON version 2
  742 +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  743 +
  744 +qpdf JSON version 2 was created to make it possible to manipulate PDF
  745 +files using JSON syntax instead of native PDF syntax. This makes it
  746 +possible to make low-level updates to PDF files from just about any
  747 +programming language or even to do so from the command-line using
  748 +tools like ``jq`` or any editor that's capable of working with JSON
  749 +files. There were several limitations of JSON format version 1 that
  750 +made this impossible:
  751 +
  752 +- Strings, names, and indirect object references in the original PDF
  753 + file were all converted to strings in the JSON representation. For
  754 + casual human inspection, this was fine, but in the general case,
  755 + there was no way to tell the difference between a string that looked
  756 + like a name or indirect object reference and an actual name or
  757 + indirect object reference.
  758 +
  759 +- PDF strings were not unambiguously represented in the JSON format.
  760 + The way qpdf JSON v1 represented a string was to try to convert the
  761 + string to UTF-8. This was done by assuming a string that was not
  762 + explicitly marked as Unicode was encoded in PDF doc encoding. The
  763 + problem is that there is not a perfect bidirectional mapping between
  764 + Unicode and PDF doc encoding, so if a binary string happened to
  765 + contain characters that couldn't be bidirectionally mapped, there
  766 + would be no way to get back to the original PDF string. Even when
  767 + possible, trying to map from the JSON representation of a binary
  768 + string back to the original string required knowledge of the mapping
  769 + between PDF doc encoding and Unicode.
  770 +
  771 +- There was no representation of stream data. If you wanted to extract
  772 + stream data, you could use :qpdf:ref:`--show-object`, so this wasn't
  773 + that important for inspection, but it was a blocker for being able
  774 + to go from JSON back to PDF. qpdf JSON version 2 allows stream data
  775 + to be included inline as base64-encoded data. There is also an
  776 + option to write all stream data to external files, which makes it
  777 + possible to work with very large PDF files in JSON format even with
  778 + tools that try to read the entire JSON structure into memory.
  779 +
  780 +- The PDF version from the PDF header was not represented in qpdf JSON v1.
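
As a rough illustration of two points from the added documentation (object keys are formed by prepending ``obj:`` to an indirect reference, and stream data can be carried inline as base64), here is a minimal Python sketch. The document layout, the field names ``value``, ``stream``, ``data``, and ``dict``, and the helper ``object_key`` are assumptions made for illustration. Only the ``obj:O G R`` key form and the base64 encoding come from the text above.

```python
import base64
import json


def object_key(indirect_ref: str) -> str:
    """Build the JSON key for an indirect reference such as "4 0 R".

    No parsing of the reference is needed: the key is the reference
    itself with "obj:" prepended.
    """
    return "obj:" + indirect_ref


# Binary stream bytes that are not valid UTF-8 and so could not be
# stored directly in a JSON string.
raw = bytes([0x00, 0xFF, 0x80, 0x41])

# Hypothetical, simplified document. The nesting and field names here
# are assumptions for illustration, not the exact qpdf layout.
doc = {
    "obj:3 0 R": {"value": {"/Contents": "4 0 R"}},
    "obj:4 0 R": {
        "stream": {
            "data": base64.b64encode(raw).decode("ascii"),
            "dict": {},
        }
    },
}

# Round-trip through JSON text, as an external tool or editor would.
doc = json.loads(json.dumps(doc))

# Follow the indirect reference by constructing the key directly.
ref = doc["obj:3 0 R"]["value"]["/Contents"]
stream = doc[object_key(ref)]["stream"]

# Decoding the base64 text recovers the original bytes exactly.
decoded = base64.b64decode(stream["data"])
```

Because the key is built by simple string concatenation, any tool that can handle JSON strings can follow references, and the base64 step keeps arbitrary binary data intact through a text-only format.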
manual/library.rst
@@ -70,12 +70,14 @@ Python @@ -70,12 +70,14 @@ Python
70 qpdf's capabilities with other functionality provided by Python's 70 qpdf's capabilities with other functionality provided by Python's
71 rich standard library and available modules. 71 rich standard library and available modules.
72 72
73 -Other Languages  
74 - Starting with version 8.3.0, the :command:`qpdf`  
75 - command-line tool can produce a JSON representation of the PDF file's  
76 - non-content data. This can facilitate interacting programmatically  
77 - with PDF files through qpdf's command line interface. For more  
78 - information, please see :ref:`json`. 73 +Other Languages Starting with version 11.0.0, the :command:`qpdf`
  74 + command-line tool can produce an unambiguous JSON representation of
  75 + a PDF file and can also create or update PDF files using this JSON
  76 + representation. qpdf versions from 8.3.0 through 10.6.3 had a more
  77 + limited JSON output format. The qpdf JSON format makes it possible
  78 + to inspect and modify the structure of a PDF file down to the
  79 + object level from the command-line or from any language that can
  80 + handle JSON data. Please see :ref:`json` for details.
79 81
80 Wrappers 82 Wrappers
81 The `qpdf Wiki <https://github.com/qpdf/qpdf/wiki>`__ contains a 83 The `qpdf Wiki <https://github.com/qpdf/qpdf/wiki>`__ contains a
manual/object-streams.rst
@@ -122,7 +122,7 @@ entries in ``/W`` above. Each entry consists of one or more fields, the @@ -122,7 +122,7 @@ entries in ``/W`` above. Each entry consists of one or more fields, the
122 first of which is the type of the field. The number of bytes for each 122 first of which is the type of the field. The number of bytes for each
123 field is given by ``/W`` above. A 0 in ``/W`` indicates that the field 123 field is given by ``/W`` above. A 0 in ``/W`` indicates that the field
124 is omitted and has the default value. The default value for the field 124 is omitted and has the default value. The default value for the field
125 -type is "``1``". All other default values are "``0``". 125 +type is ``1``. All other default values are ``0``.
126 126
127 PDF 1.5 has three field types: 127 PDF 1.5 has three field types:
128 128
manual/qdf.rst
@@ -28,6 +28,13 @@ able to restore edited files to a correct state. The @@ -28,6 +28,13 @@ able to restore edited files to a correct state. The
28 arguments. It reads a possibly edited QDF file from standard input and 28 arguments. It reads a possibly edited QDF file from standard input and
29 writes a repaired file to standard output. 29 writes a repaired file to standard output.
30 30
  31 +For another way to work with PDF files in an editor, see :ref:`json`.
  32 +Using qpdf JSON format allows you to edit the PDF file semantically
  33 +without having to be concerned about PDF syntax. However, QDF files
  34 +are actually valid PDF files, so the feedback cycle may be faster if
  35 +previewing with a PDF reader. Also, since QDF files are valid PDF, you
  36 +can experiment with all aspects of the PDF file, including syntax.
  37 +
31 The following attributes characterize a QDF file: 38 The following attributes characterize a QDF file:
32 39
33 - All objects appear in numerical order in the PDF file, including when 40 - All objects appear in numerical order in the PDF file, including when
manual/qpdf-job.rst
@@ -27,6 +27,10 @@ executable is available from inside the C++ library using the @@ -27,6 +27,10 @@ executable is available from inside the C++ library using the
27 27
28 - Use from the C API with ``qpdfjob_run_from_json`` from :file:`qpdfjob-c.h` 28 - Use from the C API with ``qpdfjob_run_from_json`` from :file:`qpdfjob-c.h`
29 29
  30 + - Note: this is unrelated to :qpdf:ref:`--json` but can be combined
  31 + with it. For more information on qpdf JSON (vs. QPDFJob JSON), see
  32 + :ref:`json`.
  33 +
30 - The ``QPDFJob`` C++ API 34 - The ``QPDFJob`` C++ API
31 35
32 If you can understand how to use the :command:`qpdf` CLI, you can 36 If you can understand how to use the :command:`qpdf` CLI, you can
manual/release-notes.rst
@@ -60,7 +60,8 @@ For a detailed list of changes, please see the file @@ -60,7 +60,8 @@ For a detailed list of changes, please see the file
60 - CLI: breaking changes 60 - CLI: breaking changes
61 61
62 - The default json output version when :qpdf:ref:`--json` is 62 - The default json output version when :qpdf:ref:`--json` is
63 - specified has been changed from ``1`` to ``latest``. 63 + specified has been changed from ``1`` to ``latest``, which is
  64 + now ``2``.
64 65
65 - The :qpdf:ref:`--allow-weak-crypto` flag is now mandatory when 66 - The :qpdf:ref:`--allow-weak-crypto` flag is now mandatory when
66 explicitly creating files with weak cryptographic algorithms. 67 explicitly creating files with weak cryptographic algorithms.
@@ -100,7 +101,7 @@ For a detailed list of changes, please see the file @@ -100,7 +101,7 @@ For a detailed list of changes, please see the file
100 101
101 - ``qpdf --list-attachments --verbose`` includes some additional 102 - ``qpdf --list-attachments --verbose`` includes some additional
102 information about attachments. Additional information about 103 information about attachments. Additional information about
103 - attachments is also included in the ``attachments`` json key 104 + attachments is also included in the ``attachments`` JSON key
104 with ``--json``. 105 with ``--json``.
105 106
106 - For encrypted files, ``qpdf --json`` reveals the user password 107 - For encrypted files, ``qpdf --json`` reveals the user password
@@ -647,8 +648,8 @@ For a detailed list of changes, please see the file @@ -647,8 +648,8 @@ For a detailed list of changes, please see the file
647 passwords from files or standard input than using 648 passwords from files or standard input than using
648 :samp:`@file` for this purpose. 649 :samp:`@file` for this purpose.
649 650
650 - - Add some information about attachments to the json output, and  
651 - added ``attachments`` as an additional json key. The 651 + - Add some information about attachments to the JSON output, and
  652 + added ``attachments`` as an additional JSON key. The
652 information included here is limited to the preferred name and 653 information included here is limited to the preferred name and
653 content stream and a reference to the file spec object. This is 654 content stream and a reference to the file spec object. This is
654 enough detail for clients to avoid the hassle of navigating a 655 enough detail for clients to avoid the hassle of navigating a