Commit ed6130036c65124cb236e709201330404a4b1d72
1 parent 9a0e9a1a
TODO: solidify work for JSON to PDF
Showing 2 changed files with 56 additions and 14 deletions
TODO
| ... | ... | @@ -18,7 +18,9 @@ Other (do in any order): |
| 18 | 18 | * See if I can change all output and error messages issued by the |
| 19 | 19 | library, when context is available, to have a pipeline rather than a |
| 20 | 20 | FILE* or std::ostream. This makes it possible for people to capture |
| 21 | - output more flexibly. | |
| 21 | + output more flexibly. We could also add a generic pipeline that | |
| 22 | + takes std::function<void(char const*, size_t)> or even a | |
| 23 | + void(*)(char const*, unsigned long) for the C API. | |
| 22 | 24 | * Make job JSON accept a single element and treat as an array of one |
| 23 | 25 | when an array is expected. This allows for making things repeatable |
| 24 | 26 | in the future without breaking compatibility and is needed for the |
| ... | ... | @@ -62,31 +64,59 @@ General things to remember: |
| 62 | 64 | when present in the schema. It's reasonable for people to check for |
| 63 | 65 | presence of a key. Most languages make this easy to do. |
| 64 | 66 | |
| 67 | +* Document typo fix in encrypt in release notes along with any other | |
| 68 | + non-compatible json 2 changes. Scrutinize all the output to decide | |
| 69 | + what should change. | |
| 70 | + | |
| 65 | 71 | * When we get to full serialization, add json serialization |
| 66 | 72 | performance test. |
| 67 | 73 | |
| 68 | 74 | * Add json to the large file tests. |
| 69 | 75 | |
| 70 | -* We could consider arguments like --replace-object that would take a | |
| 71 | - JSON representation of the object and could include indirect | |
| 72 | - references, etc. We could also add --delete object. | |
| 73 | - | |
| 74 | 76 | * Object representation tests |
| 75 | 77 | * "b:cf80", "b:CF80", "u:π", "u:\u03c0" |
| 76 | 78 | * "b:d83edd54", "u:🥔", "u:\ud83e\udd54" |
| 77 | 79 | |
| 78 | 80 | JSON to PDF: |
| 79 | 81 | |
| 80 | -When reading a JSON string, any string that doesn't follow the above rules | |
| 81 | -is an error. Just use newUnicodeString on "u:" strings. For "b:" | |
| 82 | -strings, decode the bytes with hex_decode and use newString. | |
| 82 | +Have --create-from-json and --update-from-json. With | |
| 83 | +--create-from-json, the json file must be complete, meaning all stream | |
| 84 | +data, the trailer, and the PDF version must be present. In | |
| 85 | +--update-from-json, an object explicitly set to null (not "value": | |
| 86 | +null) is deleted. For streams with no stream data, the dictionary is | |
| 87 | +updated but the data is left untouched. Other things that are omitted | |
| 88 | +are left alone. Make sure to document that, when writing a PDF file from | |
| 89 | +QPDF, there is no expectation of object numbers being preserved. As | |
| 90 | +such, --update-from-json can only be used to update the exact file | |
| 91 | +that the json was created from. You can put multiple objects in the | |
| 92 | +update file, but you can't use a json from one file to update the | |
| 93 | +output of a previous update since the object numbers will have | |
| 94 | +changed. Note that, when creating from a JSON, object numbers are | |
| 95 | +preserved in the resulting QPDF object but still modified by | |
| 96 | +QPDFWriter for the output. This would be visible by combining | |
| 97 | +--to-json and --create-from-json. Also using --qdf with | |
| 98 | +--create-from-json would show original object IDs in comments. It will | |
| 99 | +be important to capture this in the documentation. | |
| 100 | + | |
| 101 | +When reading a JSON string, any string that doesn't look like a name | |
| 102 | +or indirect object or start with "b:" or "u:" should be considered an | |
| 103 | +error. Just use newUnicodeString on "u:" strings. For "b:" strings, | |
| 104 | +decode the bytes with hex_decode and use newString. | |
| 83 | 105 | |
| 84 | 106 | For going back from JSON to PDF, we can have |
| 85 | -QPDF::fromJSON(std::shared_ptr<InputSource> which will have logic | |
| 86 | -similar to copyForeignObject. Note that this InputSource is not going | |
| 87 | -to be this->file. We have to keep it separately. | |
| 107 | +QPDF::createFromJSON(std::shared_ptr<InputSource>) | |
| 108 | +which will have logic similar to copyForeignObject. Note that this | |
| 109 | +InputSource is not going to be this->file. We have to keep it | |
| 110 | +separately. There's also non-static QPDF::updateFromJSON. Both | |
| 111 | +createFromJSON and updateFromJSON will call the same internal method | |
| 112 | +with different options. That method will use a reactor that is a | |
| 113 | +private QPDF class that just proxies to private QPDF methods. | |
| 114 | + | |
| 115 | +Test case: combine --create-from-json and --to-json to show preservation of | |
| 116 | +object numbers. QPDFWriter won't show that although --qdf with the | |
| 117 | +original object ID comments would. | |
| 88 | 118 | |
| 89 | -The backing input source is this memory block: | |
| 119 | +The backing input source for createFromJSON is this memory block: | |
| 90 | 120 | |
| 91 | 121 | ``` |
| 92 | 122 | %PDF-1.3 |
| ... | ... | @@ -116,7 +146,9 @@ startxref |
| 116 | 146 | For streams, have a stream data provider that, for inline streams, |
| 117 | 147 | does a base64 decode from the file offsets and for file-based streams, reads |
| 118 | 148 | the file. For the inline case, we have to keep the json InputSource |
| 119 | -around. Otherwise, we don't. It is an error if there is no stream data. | |
| 149 | +around. Otherwise, we don't. It is an error if there is no stream | |
| 150 | +data. For files, we can have a stream data provider that just reads | |
| 151 | +the file. Remember QUtil::file_provider. | |
| 120 | 152 | |
| 121 | 153 | Documentation: |
| 122 | 154 | |
| ... | ... | @@ -125,6 +157,7 @@ Serialized PDF: |
| 125 | 157 | The JSON output will have a "qpdf" key containing |
| 126 | 158 | * jsonversion |
| 127 | 159 | * pdfversion |
| 160 | +* maxobjectid | |
| 128 | 161 | * objects |
| 129 | 162 | |
| 130 | 163 | The "qpdf" key replaces "objects" and "objectinfo" in v1 JSON. |
| ... | ... | @@ -175,7 +208,11 @@ CLI: |
| 175 | 208 | Example workflow: |
| 176 | 209 | * qpdf in.pdf --to-json > pdf.json |
| 177 | 210 | * edit pdf.json |
| 178 | -* qpdf --from-json=pdf.json out.pdf | |
| 211 | +* qpdf --create-from-json=pdf.json out.pdf | |
| 212 | + | |
| 213 | +* qpdf in.pdf --to-json > pdf.json | |
| 214 | +* edit pdf.json keeping only objects that need to be changed | |
| 215 | +* qpdf in.pdf --update-from-json=pdf.json out.pdf | |
| 179 | 216 | |
| 180 | 217 | Update --json option in cli.rst to mention v2 and update json.rst. |
| 181 | 218 | ... | ... |
cSpell.json
| ... | ... | @@ -79,6 +79,7 @@ |
| 79 | 79 | "ctest", |
| 80 | 80 | "cxxflags", |
| 81 | 81 | "cygwin", |
| 82 | + "datafile", | |
| 82 | 83 | "dbuild", |
| 83 | 84 | "dcmake", |
| 84 | 85 | "dctdecode", |
| ... | ... | @@ -216,6 +217,7 @@ |
| 216 | 217 | "jsample", |
| 217 | 218 | "jsamprow", |
| 218 | 219 | "jsimd", |
| 220 | + "jsonversion", | |
| 219 | 221 | "jstr", |
| 220 | 222 | "jurczyk", |
| 221 | 223 | "kgdl", |
| ... | ... | @@ -262,6 +264,7 @@ |
| 262 | 264 | "masamichi", |
| 263 | 265 | "mateusz", |
| 264 | 266 | "maxdepth", |
| 267 | + "maxobjectid", | |
| 265 | 268 | "mdash", |
| 266 | 269 | "mindepth", |
| 267 | 270 | "mkdir", |
| ... | ... | @@ -344,6 +347,7 @@ |
| 344 | 347 | "pcre", |
| 345 | 348 | "pdflatex", |
| 346 | 349 | "pdfs", |
| 350 | + "pdfversion", | |
| 347 | 351 | "pdlin", |
| 348 | 352 | "pfeifle", |
| 349 | 353 | "pikepdf", |
| ... | ... | @@ -434,6 +438,7 @@ |
| 434 | 438 | "rpath", |
| 435 | 439 | "rstream", |
| 436 | 440 | "runlength", |
| 441 | + "runpath", | |
| 437 | 442 | "runtest", |
| 438 | 443 | "sahil", |
| 439 | 444 | "samp", | ... | ... |