Commit ed6130036c65124cb236e709201330404a4b1d72

Authored by Jay Berkenbilt
1 parent 9a0e9a1a

TODO: solidify work for JSON to PDF

Showing 2 changed files with 56 additions and 14 deletions
... ... @@ -18,7 +18,9 @@ Other (do in any order):
18 18 * See if I can change all output and error messages issued by the
19 19 library, when context is available, to have a pipeline rather than a
20 20 FILE* or std::ostream. This makes it possible for people to capture
21   - output more flexibly.
  21 + output more flexibly. We could also add a generic pipeline that
  22 + takes std::function<void(char const*, size_t)> or even a
  23 + void(*)(char const*, unsigned long) for the C API.
22 24 * Make job JSON accept a single element and treat as an array of one
23 25 when an array is expected. This allows for making things repeatable
24 26 in the future without breaking compatibility and is needed for the
... ... @@ -62,31 +64,59 @@ General things to remember:
62 64 when present in the schema. It's reasonable for people to check for
63 65 presence of a key. Most languages make this easy to do.
64 66  
  67 +* Document typo fix in encrypt in release notes along with any other
  68 + non-compatible json 2 changes. Scrutinize all the output to decide
  69 + what should change.
  70 +
65 71 * When we get to full serialization, add json serialization
66 72 performance test.
67 73  
68 74 * Add json to the large file tests.
69 75  
70   -* We could consider arguments like --replace-object that would take a
71   - JSON representation of the object and could include indirect
72   - references, etc. We could also add --delete object.
73   -
74 76 * Object representation tests
75 77 * "b:cf80", "b:CF80", "u:π", "u:\u03c0"
76 78 * "b:d83edd54", "u:🥔", "u:\ud83e\udd54"
77 79  
78 80 JSON to PDF:
79 81  
80   -When reading a JSON string, any string that doesn't follow the above rules
81   -is an error. Just use newUnicodeString on "u:" strings. For "b:"
82   -strings, decode the bytes with hex_decode and use newString.
  82 +Have --create-from-json and --update-from-json. With
  83 +--create-from-json, the json file must be complete, meaning all stream
  84 +data, the trailer, and the PDF version must be present. In
  85 +--update-from-json, an object explicitly set to null (not "value":
  86 +null) is deleted. For streams with no stream data, the dictionary is
  87 +updated but the data is left untouched. Other things that are omitted
  88 +are left alone. Make sure to document that, when writing a PDF file from
  89 +QPDF, there is no expectation of object numbers being preserved. As
  90 +such, --update-from-json can only be used to update the exact file
  91 +that the json was created from. You can put multiple objects in the
  92 +update file, but you can't use a json from one file to update the
  93 +output of a previous update since the object numbers will have
  94 +changed. Note that, when creating from a JSON, object numbers are
  95 +preserved in the resulting QPDF object but still modified by
  96 +QPDFWriter for the output. This would be visible by combining
  97 +--to-json and --create-from-json. Also using --qdf with
  98 +--create-from-json would show original object IDs in comments. It will
  99 +be important to capture this in the documentation.
  100 +
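The update rules in the paragraph above can be sketched in isolation. This is a minimal model, not qpdf code: `ObjectTable`, `UpdateTable`, and `apply_update` are hypothetical names, objects are reduced to opaque strings keyed by object ID, and the stream case (dictionary updated, data left untouched) is not modeled.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-ins: object ID -> opaque serialized object value.
using ObjectTable = std::map<int, std::string>;
// In an update, std::nullopt models an object explicitly set to null.
using UpdateTable = std::map<int, std::optional<std::string>>;

// Apply the update semantics described in the notes:
//   - an object set to null is deleted,
//   - an object with a value is replaced (or added),
//   - an object omitted from the update is left alone.
ObjectTable apply_update(ObjectTable objects, UpdateTable const& update)
{
    for (auto const& [id, value] : update) {
        if (value.has_value()) {
            objects[id] = *value;
        } else {
            objects.erase(id);
        }
    }
    return objects;
}
```

Because the update is keyed by object ID, the sketch also shows why an update file only makes sense against the exact file it was generated from: once the IDs are renumbered on output, the keys no longer refer to the same objects.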
  101 +When reading a JSON string, any string that doesn't look like a name
  102 +or indirect object or start with "b:" or "u:" should be considered an
  103 +error. Just use newUnicodeString on "u:" strings. For "b:" strings,
  104 +decode the bytes with hex_decode and use newString.
83 105  
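As a rough illustration of the string rule above, the following sketch handles only the "u:" and "b:" forms and rejects everything else. It is an assumption-laden stand-in: `decode_json_string`, `DecodedString`, and `hex_value` are hypothetical names, the hex decoding is written out by hand rather than calling qpdf's hex_decode, and the name and indirect-object forms mentioned in the notes are not covered.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

enum class StringKind { unicode, binary };

struct DecodedString {
    StringKind kind;
    std::string bytes; // UTF-8 text for "u:", raw bytes for "b:"
};

// Convert one hex digit to its numeric value; throw on anything else.
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    throw std::runtime_error("invalid hex digit");
}

// Hypothetical helper: classify a JSON string by prefix and decode it.
DecodedString decode_json_string(std::string const& s)
{
    if (s.rfind("u:", 0) == 0) {
        // The remainder is taken as UTF-8 text unchanged.
        return {StringKind::unicode, s.substr(2)};
    }
    if (s.rfind("b:", 0) == 0) {
        std::string hex = s.substr(2);
        if (hex.size() % 2 != 0) {
            throw std::runtime_error("odd number of hex digits");
        }
        std::string bytes;
        for (std::size_t i = 0; i < hex.size(); i += 2) {
            int byte = hex_value(hex[i]) * 16 + hex_value(hex[i + 1]);
            bytes.push_back(static_cast<char>(byte));
        }
        return {StringKind::binary, bytes};
    }
    // Neither prefix: treated as an error in this sketch.
    throw std::runtime_error("string must start with \"u:\" or \"b:\"");
}
```

With this helper, "b:cf80" and "b:CF80" decode to the same two bytes, 0xCF 0x80, which is the UTF-8 encoding of π, matching the first object representation test pair listed earlier.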
84 106 For going back from JSON to PDF, we can have
85   -QPDF::fromJSON(std::shared_ptr<InputSource> which will have logic
86   -similar to copyForeignObject. Note that this InputSource is not going
87   -to be this->file. We have to keep it separately.
  107 +QPDF::createFromJSON(std::shared_ptr<InputSource>)
  108 +which will have logic similar to copyForeignObject. Note that this
  109 +InputSource is not going to be this->file. We have to keep it
  110 +separately. There's also non-static QPDF::updateFromJSON. Both
  111 +createFromJSON and updateFromJSON will call the same internal method
  112 +with different options. That method will use a reactor that is a
  113 +private QPDF class that just proxies to private QPDF methods.
  114 +
  115 +Test case: combine --create-from-json and --to-json to show preservation of
  116 +object numbers. QPDFWriter won't show that, although --qdf with the
  117 +original object ID comments would.
88 118  
89   -The backing input source is this memory block:
  119 +The backing input source for createFromJSON is this memory block:
90 120  
91 121 ```
92 122 %PDF-1.3
... ... @@ -116,7 +146,9 @@ startxref
116 146 For streams, have a stream data provider that, for inline streams,
117 147 does a base64 from the file offsets and for file-based streams, reads
118 148 the file. For the inline case, we have to keep the json InputSource
119   -around. Otherwise, we don't. It is an error if there is no stream data.
  149 +around. Otherwise, we don't. It is an error if there is no stream
  150 +data. For files, we can have a stream data provider that just reads
  151 +the file. Remember QUtil::file_provider.
120 152  
121 153 Documentation:
122 154  
... ... @@ -125,6 +157,7 @@ Serialized PDF:
125 157 The JSON output will have a "qpdf" key containing
126 158 * jsonversion
127 159 * pdfversion
  160 +* maxobjectid
128 161 * objects
129 162  
130 163 The "qpdf" key replaces "objects" and "objectinfo" in v1 JSON.
... ... @@ -175,7 +208,11 @@ CLI:
175 208 Example workflow:
176 209 * qpdf in.pdf --to-json > pdf.json
177 210 * edit pdf.json
178   -* qpdf --from-json=pdf.json out.pdf
  211 +* qpdf --create-from-json=pdf.json out.pdf
  212 +
  213 +* qpdf in.pdf --to-json > pdf.json
  214 +* edit pdf.json keeping only objects that need to be changed
  215 +* qpdf in.pdf --update-from-json=pdf.json out.pdf
179 216  
180 217 Update --json option in cli.rst to mention v2 and update json.rst.
181 218  
... ...
cSpell.json
... ... @@ -79,6 +79,7 @@
79 79 "ctest",
80 80 "cxxflags",
81 81 "cygwin",
  82 + "datafile",
82 83 "dbuild",
83 84 "dcmake",
84 85 "dctdecode",
... ... @@ -216,6 +217,7 @@
216 217 "jsample",
217 218 "jsamprow",
218 219 "jsimd",
  220 + "jsonversion",
219 221 "jstr",
220 222 "jurczyk",
221 223 "kgdl",
... ... @@ -262,6 +264,7 @@
262 264 "masamichi",
263 265 "mateusz",
264 266 "maxdepth",
  267 + "maxobjectid",
265 268 "mdash",
266 269 "mindepth",
267 270 "mkdir",
... ... @@ -344,6 +347,7 @@
344 347 "pcre",
345 348 "pdflatex",
346 349 "pdfs",
  350 + "pdfversion",
347 351 "pdlin",
348 352 "pfeifle",
349 353 "pikepdf",
... ... @@ -434,6 +438,7 @@
434 438 "rpath",
435 439 "rstream",
436 440 "runlength",
  441 + "runpath",
437 442 "runtest",
438 443 "sahil",
439 444 "samp",
... ...