Commit 62d47bff523ec6b64161651b33bb1563b6a80776

Authored by Jay Berkenbilt
1 parent 05460d40

TODO: notes on QPDFPagesTree

Showing 1 changed file with 33 additions and 9 deletions
... ... @@ -11,6 +11,7 @@ In order:
11 11  
12 12 Other (do in any order):
13 13  
  14 +* QPDFPagesTree -- avoid ever flattening the pages tree.
14 15 * Check about runpath in the linux-bin distribution. I think the
15 16 appimage build specifically is setting the runpath, which is
16 17 actually desirable in this case. Make sure to understand and
... ... @@ -56,17 +57,8 @@ Output JSON v2
56 57  
57 58 Some of this documentation has drifted from the actual implementation.
58 59  
59   -Make sure pages tree repair generates warnings.
60   -
61 60 * Document that /Length is ignored in stream dictionary replacements
62 61  
63   -Try to never flatten pages tree. Make sure we do something reasonable
64   -with pages tree repair. The problem is that if pages tree repair is
65   -done as a side effect of running --json, the qpdf part of the json may
66   -contain object numbers that aren't there. Maybe we need to indicate
67   -whether pages tree repair has been done in the json, but this would
68   -have to be known early in parsing, which is a problem.
69   -
70 62 General things to remember:
71 63  
72 64 * Make sure all the information from --check and other informational
... ... @@ -240,6 +232,38 @@ Additionally, using "n n R" as a key in "objects" and "objectinfo"
240 232 messes up searching for things.
241 233  
242 234  
  235 +QPDFPagesTree
  236 +=============
  237 +
  238 +Partial work is on the qpdf-pages-tree branch. QPDFPagesTree is mostly
  239 +implemented and mostly tested. There are not enough cases of different
  240 +kinds of operations (pclm, linearize, json, etc.) with non-flat pages
  241 +trees. Insertion is not implemented.
  242 +
  243 +Page tree repair is silent (no warnings) and has a comment saying that
  244 +we don't need warnings, but I think we should have warnings now that
  245 +we have json v2. The reason is that page tree repair will change
  246 +object numbers, and it's useful to know that.
  247 +
  248 +I'm thinking we will want to keep a pages cache for efficient
  249 +insertion. There's no reason we can't keep a vector of page objects up
  250 +to date and just do a traversal the first time we do getAllPages just
  251 +like we do now. The difference is that we would not flatten the pages
  252 +tree. It would be useful to go through QPDF_pages and reimplement
  253 +everything without calling flattenPagesTree. Then we can remove
  254 +flattenPagesTree, which is private.
  255 +
  256 +In its current state, QPDFPagesTree does not proactively fix /Type or
  257 +correct page objects that are used multiple times. You have to
  258 +traverse the pages tree to trigger this operation. It would be nice if
  259 +we would do that somewhere but not do it more often than necessary so
  260 +isPagesObject and isPageObject are reliable and can be made more
  261 +reliable. Maybe add a validate or repair function? It should also make
  262 +sure /Count and /Parent are correct.
  263 +
  264 +refs/attic/QPDFPagesTree-old -- original, abandoned branch -- clean up
  265 +when done.
  266 +
243 267 QPDFJob
244 268 =======
245 269  
... ...