Commit 62d47bff523ec6b64161651b33bb1563b6a80776

Authored by Jay Berkenbilt
1 parent 05460d40

TODO: notes on QPDFPagesTree

Showing 1 changed file with 33 additions and 9 deletions
@@ -11,6 +11,7 @@ In order:
11 11
12 12 Other (do in any order):
13 13
  14 +* QPDFPagesTree -- avoid ever flattening the pages tree.
14 15 * Check about runpath in the linux-bin distribution. I think the
15 16 appimage build specifically is setting the runpath, which is
16 17 actually desirable in this case. Make sure to understand and
@@ -56,17 +57,8 @@ Output JSON v2
56 57
57 58 Some of this documentation has drifted from the actual implementation.
58 59
59 -Make sure pages tree repair generates warnings.  
60 -  
61 60 * Document that /Length is ignored in stream dictionary replacements
62 61
63 -Try to never flatten pages tree. Make sure we do something reasonable  
64 -with pages tree repair. The problem is that if pages tree repair is  
65 -done as a side effect of running --json, the qpdf part of the json may  
66 -contain object numbers that aren't there. Maybe we need to indicate  
67 -whether pages tree repair has been done in the json, but this would  
68 -have to be known early in parsing, which is a problem.  
69 -  
70 62 General things to remember:
71 63
72 64 * Make sure all the information from --check and other informational
@@ -240,6 +232,38 @@ Additionally, using "n n R" as a key in "objects" and "objectinfo"
240 232 messes up searching for things.
241 233
242 234
  235 +QPDFPagesTree
  236 +=============
  237 +
  238 +Partial work is on qpdf-pages-tree branch. QPDFPagesTree is mostly
  239 +implemented and mostly tested. There are not enough cases of different
  240 +kinds of operations (pclm, linearize, json, etc.) with non-flat pages
  241 +trees. Insertion is not implemented.
  242 +
  243 +Page tree repair is silent (no warnings) and has a comment saying that
  244 +we don't need warnings, but I think we should have warnings now that
  245 +we have json v2. The reason is that page tree repair will change
  246 +object numbers, and it's useful to know that.
  247 +
  248 +I'm thinking we will want to keep a pages cache for efficient
  249 +insertion. There's no reason we can't keep a vector of page objects up
  250 +to date and just do a traversal the first time we do getAllPages just
  251 +like we do now. The difference is that we would not flatten the pages
  252 +tree. It would be useful to go through QPDF_pages and reimplement
  253 +everything without calling flattenPagesTree. Then we can remove
  254 +flattenPagesTree, which is private.
  255 +
  256 +In its current state, QPDFPagesTree does not proactively fix /Type or
  257 +correct page objects that are used multiple times. You have to
  258 +traverse the pages tree to trigger this operation. It would be nice if
  259 +we would do that somewhere but not do it more often than necessary so
  260 +isPagesObject and isPageObject are reliable and can be made more
  261 +reliable. Maybe add a validate or repair function? It should also make
  262 +sure /Count and /Parent are correct.
  263 +
  264 +refs/attic/QPDFPagesTree-old -- original, abandoned branch -- clean up
  265 +when done.
  266 +
243 267 QPDFJob
244 268 =======
245 269