Commit 7882b85b0691d6a669cb0b2656f1e4c7438c552b

Authored by Jay Berkenbilt
1 parent 3c4d2bfb

TODO: more JSON notes

Showing 1 changed file with 109 additions and 3 deletions
... ... @@ -39,6 +39,108 @@ Soon: Break ground on "Document-level work"
39 39 Output JSON v2
40 40 ==============
41 41  
  42 +----
  43 +notes from 5/2:
  44 +
  45 +Need new pipelines:
  46 +* Pl_OStream(std::ostream) with semantics like Pl_StdioFile
  47 +* Pl_String to std::string with semantics like Pl_Buffer
  48 +* Pl_Base64
  49 +
  50 +New Pipeline methods:
  51 +* writeString(std::string const&)
  52 +* writeCString(char*)
  53 +* writeChars(char*, size_t)
  54 +
  55 +* Consider templated operator<< which could specialize for char* and
  56 + std::string and could use std::ostringstream otherwise
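
The convenience methods and the templated operator<< described in the notes above could look roughly like the following. This is only a sketch: `SketchPipeline` and `StringSink` are hypothetical stand-ins, not qpdf's actual `Pipeline` or the proposed `Pl_String`, and the `write` signature is an assumption made so the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sstream>
#include <string>

// Hypothetical stand-in for a pipeline base class (not qpdf's Pipeline).
class SketchPipeline
{
  public:
    virtual ~SketchPipeline() = default;
    virtual void write(char const* data, std::size_t len) = 0;

    // Convenience wrappers named after the methods listed in the notes.
    void writeChars(char const* data, std::size_t len) { write(data, len); }
    void writeCString(char const* cstr)
    {
        assert(cstr != nullptr);
        write(cstr, std::strlen(cstr));
    }
    void writeString(std::string const& str) { write(str.data(), str.size()); }
};

// Generic fallback: format the value through std::ostringstream.
template <typename T>
SketchPipeline&
operator<<(SketchPipeline& p, T const& value)
{
    std::ostringstream s;
    s << value;
    p.writeString(s.str());
    return p;
}

// Overloads for the string types avoid the ostringstream round trip.
inline SketchPipeline&
operator<<(SketchPipeline& p, std::string const& str)
{
    p.writeString(str);
    return p;
}

inline SketchPipeline&
operator<<(SketchPipeline& p, char const* cstr)
{
    p.writeCString(cstr);
    return p;
}

// Collects everything written into a std::string, roughly the role the
// notes assign to Pl_String.
class StringSink: public SketchPipeline
{
  public:
    void write(char const* data, std::size_t len) override { buf.append(data, len); }
    std::string const& str() const { return buf; }

  private:
    std::string buf;
};
```

Plain overloads are used here instead of explicit template specializations because overload resolution prefers a non-template function when the conversions are otherwise equally good, which keeps string literals on the fast path.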
  57 +
  58 +See if I can change all output and error messages issued by the
  59 +library, when context is available, to have a pipeline rather than a
  60 +FILE* or std::ostream. This makes it possible for people to capture
  61 +output more flexibly.
  62 +
  63 +JSON: rather than unparse() -> string, there should be a write method
  64 +that takes a pipeline and a depth. Then rewrite all the unparse
  65 +methods to use it. This makes incremental write possible as well as
  66 +writing arbitrarily large amounts of output.
  67 +
  68 +JSON::parse should work from an InputSource. BufferInputSource can
  69 +already start with a std::string.
  70 +
  71 +Have a json blob defined by a function that takes a pipeline and
  72 +writes data to the pipeline. Its writer should create a Pl_Base64 ->
  73 +Pl_Concatenate in front of the pipeline passed to write and call the
  74 +function with that.
  75 +
  76 +Add methods needed to do incremental writes. Basically we need to
  77 +expose functionality of the array and dictionary unparse methods. Maybe
  78 +we can have a DictionaryWriter and an ArrayWriter that deal with the
  79 +first/depth logic and have writeElement or writeEntry(key, value)
  80 +methods.
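
One possible shape for the writers mentioned above, with the first/depth bookkeeping kept in a shared base class. This is a sketch under several assumptions: `ContainerWriter` is a made-up name, output goes to a `std::ostream` rather than a qpdf pipeline, values are passed as already-serialized JSON text, and keys are assumed to need no escaping.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical base class: emits separators and indentation, and tracks
// whether the next element is the first one in the container.
class ContainerWriter
{
  public:
    ContainerWriter(std::ostream& out, int depth, char open, char close) :
        out(out),
        depth(depth),
        close(close)
    {
        assert(depth >= 0);
        out << open;
    }

    // Close the container; an empty container is written as "[]" or "{}".
    void
    finish()
    {
        if (!first) {
            out << "\n" << std::string(2 * depth, ' ');
        }
        out << close;
    }

  protected:
    // Write the comma (if needed), the newline, and the indentation that
    // precede each element.
    void
    startElement()
    {
        out << (first ? "\n" : ",\n") << std::string(2 * (depth + 1), ' ');
        first = false;
    }

    std::ostream& out;

  private:
    int depth;
    char close;
    bool first{true};
};

class ArrayWriter: public ContainerWriter
{
  public:
    ArrayWriter(std::ostream& out, int depth) :
        ContainerWriter(out, depth, '[', ']')
    {
    }

    // json is assumed to be already-serialized JSON text.
    void
    writeElement(std::string const& json)
    {
        startElement();
        out << json;
    }
};

class DictionaryWriter: public ContainerWriter
{
  public:
    DictionaryWriter(std::ostream& out, int depth) :
        ContainerWriter(out, depth, '{', '}')
    {
    }

    // key is assumed to need no escaping in this sketch.
    void
    writeEntry(std::string const& key, std::string const& json)
    {
        startElement();
        out << '"' << key << "\": " << json;
    }
};
```

Because each element is written as soon as it is handed to the writer, the caller never has to hold the whole container in memory.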
  81 +
  82 +For json output, do not unparse to string. Use the writers instead.
  83 +Write incrementally. This changes ordering only, but we should be able
  84 +to manually update the test output for those cases. Objects should be
  85 +written in numerical order, not lexically sorted. It probably makes
  86 +sense to put the trailer at the end since that's where it is in a
  87 +regular PDF.
  88 +
  89 +When we get to full serialization, add a json serialization performance
  90 +test.
  91 +
  92 +Some if not all of the json output functionality for v2 should move
  93 +into QPDF proper rather than living in QPDFJob. There can be a
  94 +top-level QPDF method that takes a pipeline and writes the JSON
  95 +serialization to it.
  96 +
  97 +Decide what the API/CLI will be for serializing to v2. Will it just be
  98 +part of --json or will it be its own separate thing? Probably we
  99 +should make it so that a serialized PDF is different but uses the same
  100 +object format as regular json mode.
  101 +
  102 +For going back from JSON to PDF, a separate utility will be needed.
  103 +It's not practical for QPDFObjectHandle to be able to read JSON
  104 +because of the special handling that is required for indirect objects,
  105 +and QPDF can't just accept JSON because the way InputSource is used is
  106 +completely different. Instead, we will need a separate utility that has
  107 +logic similar to what copyForeignObject does. It will go something
  108 +like this:
  109 +
  110 +* Create an empty QPDF (not emptyPDF, one with no objects in it at
  111 + all). This works:
  112 +
  113 +```
  114 +%PDF-1.3
  115 +xref
  116 +0 1
  117 +0000000000 65535 f
  118 +trailer << /Size 1 >>
  119 +startxref
  120 +9
  121 +%%EOF
  122 +```
  123 +
  124 +For each object:
  125 +
  126 +* Walk through the object detecting any indirect objects. For each one
  127 + that is not already known, reserve the object. We can also validate
  128 + but we should try to do the best we can with invalid JSON so people
  129 + can get good error messages.
  130 +* Construct a QPDFObjectHandle from the JSON
  131 +* If the object is the trailer, update the trailer
  132 +* Else if the object doesn't exist, reserve it
  133 +* If the object is reserved, call replaceReserved()
  134 +* Else the object already exists; this is an error.
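
The per-object steps above can be sketched against a toy model. Nothing here is qpdf API: `JsonObject`, `Slot`, `Document`, and `addObject` are hypothetical, object IDs are plain strings, and values are opaque text. The point is only to show the reserve-then-replace flow and the duplicate-object error.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Toy model only: none of these types are qpdf classes.
struct JsonObject
{
    std::string key;               // "trailer" or an object ID such as "3 0"
    std::string value;             // payload, treated as opaque here
    std::vector<std::string> refs; // IDs of the indirect objects it references
};

struct Slot
{
    bool reserved{true};
    std::string value;
};

struct Document
{
    std::map<std::string, Slot> objects;
    std::string trailer;
};

void
addObject(Document& doc, JsonObject const& obj)
{
    assert(!obj.key.empty());
    // Reserve every referenced object that is not already known.
    // emplace leaves an existing entry untouched.
    for (auto const& ref: obj.refs) {
        doc.objects.emplace(ref, Slot{});
    }
    if (obj.key == "trailer") {
        doc.trailer = obj.value;
        return;
    }
    // Reserve the object itself if it does not exist yet.
    Slot& slot = doc.objects.emplace(obj.key, Slot{}).first->second;
    if (!slot.reserved) {
        throw std::runtime_error("object already exists: " + obj.key);
    }
    // Counterpart of replaceReserved(): fill in the reserved slot.
    slot.reserved = false;
    slot.value = obj.value;
}
```

Reserving referenced objects before constructing the value is what lets objects arrive in any order, including with forward and circular references.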
  135 +
  136 +This can almost be done through public API. I think all we need is the
  137 +ability to create a reserved object with a specific object ID.
  138 +
  139 +The choices for json_key (job.yml) will be different for v1 and v2.
  140 +That information is already duplicated in multiple places.
  141 +
  142 +----
  143 +
42 144 Remember typo: search for "Typo" in QPDFJob::doJSONEncrypt.
43 145  
44 146 Remember to test interaction between generators and schemas.
... ... @@ -173,21 +275,25 @@ JSON:
173 275 object. No dictionary merges or anything like that are performed.
174 276 It will call replaceObject.
175 277  
176   -Within .qpdf.objects, the key is "obj:o,g" or "obj:trailer", and the
  278 +Within .qpdf.objects, the key is "obj:o g R" or "obj:trailer", and the
177 279 value is a dictionary with exactly one of "value" or "stream" as its
178 280 single key.
179 281  
  282 +Rationale of "obj:o g R" is that indirect object references are just
  283 +"o g R", and so code that wants to resolve one can do so easily by
  284 +just prepending "obj:" and not having to parse or split the string.
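
To illustrate the rationale, here is a minimal sketch of resolving a reference by prepending "obj:". `ObjectTable` and `resolve` are hypothetical names, and a `std::map` of strings stands in for the `.qpdf.objects` dictionary.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the ".qpdf.objects" dictionary: key -> JSON text.
using ObjectTable = std::map<std::string, std::string>;

// Resolve an indirect reference such as "3 0 R" by prepending "obj:".
// No parsing or splitting of the reference string is needed.
// Returns nullptr when the table has no such object.
std::string const*
resolve(ObjectTable const& objects, std::string const& ref)
{
    assert(!ref.empty());
    auto it = objects.find("obj:" + ref);
    return it == objects.end() ? nullptr : &it->second;
}
```

With the earlier "obj:o,g" form, the same lookup would have required extracting the object and generation numbers from "o g R" and reassembling them.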
  285 +
180 286 For non-streams:
181 287  
182 288 {
183   - "obj:o,g": {
  289 + "obj:o g R": {
184 290 "value": ...
185 291 }
186 292 }
187 293  
188 294 For streams:
189 295  
190   - "obj:o,g": {
  296 + "obj:o g R": {
191 297 "stream": {
192 298 "dict": { ... stream dictionary ... },
193 299 "filterable": bool,
... ...