Commit 25aff0bd52b0382b9349c81aaabc2fde51528923

Authored by Jay Berkenbilt
1 parent 8a32515a

TODO: abandon (again) and update notes about QPDFPagesTree

Showing 2 changed files with 47 additions and 35 deletions
... ... @@ -9,7 +9,10 @@ Before Release:
9 9 * Release qtest with updates to qtest-driver and copy back into qpdf
10 10  
11 11 Next:
12   -* QPDFPagesTree -- avoid ever flattening the pages tree.
  12 +* QPDF -- track whether the pages tree was modified (whether
  13 + getAllPages was ever called. If so, consider generating a non-flat
  14 + pages tree before creating output to better handle files with lots
  15 + of pages.
13 16 * JSON v2 fixes
14 17  
15 18 Pending changes:
... ... @@ -45,39 +48,6 @@ Pending changes:
45 48 Soon: Break ground on "Document-level work"
46 49  
47 50  
48   -QPDFPagesTree
49   -=============
50   -
51   -Partial work is on qpdf-pages-tree branch. QPDFPageTree is mostly
52   -implemented and mostly tested. There are not enough cases of different
53   -kinds of operations (pclm, linearize, json, etc.) with non-flat pages
54   -trees. Insertion is not implemented.
55   -
56   -Page tree repair is silent (no warnings) and has a comment saying that
57   -we don't need warnings, but I think we should have warnings now that
58   -we have json v2. The reason is that page tree repair will change
59   -object numbers, and it's useful to know that.
60   -
61   -I'm thinking we will want to keep a pages cache for efficient
62   -insertion. There's no reason we can't keep a vector of page objects up
63   -to date and just do a traversal the first time we do getAllPages just
64   -like we do now. The difference is that we would not flatten the pages
65   -tree. It would be useful to go through QPDF_pages and reimplement
66   -everything without calling flattenPagesTree. Then we can remove
67   -flattenPagesTree, which is private.
68   -
69   -In its current state, QPDFPagesTree does not proactively fix /Type or
70   -correct page objects that are used multiple times. You have to
71   -traverse the pages tree to trigger this operation. It would be nice if
72   -we would do that somewhere but not do it more often than necessary so
73   -isPagesObject and isPageObject are reliable and can be made more
74   -reliable. Maybe add a validate or repair function? It should also make
75   -sure /Count and /Parent are correct.
76   -
77   -refs/attic/QPDFPagesTree-old -- original, abandoned branch -- clean up
78   -when done.
79   -
80   -
81 51 JSON v2 fixes
82 52 =============
83 53  
... ... @@ -676,7 +646,47 @@ A few important lessons (in README-maintainer)
676 646 possible.
677 647  
678 648 Also, it turns out that PointerHolder is more performant than
679   -std::shared_ptr.
  649 +std::shared_ptr. (This was true at the time but subsequent
  650 +implementations of std::shared_ptr became much more efficient.)
  651 +
  652 +QPDFPagesTree
  653 +=============
  654 +
  655 +On a few occasions, I have considered implementing a QPDFPagesTree
  656 +object that would allow the document's original page tree structure to
  657 +be preserved. See comments at the top QPDF_pages.cc for why this was
  658 +abandoned.
  659 +
  660 +Partial work is in refs/attic/QPDFPagesTree. QPDFPageTree is mostly
  661 +implemented and mostly tested. There are not enough cases of different
  662 +kinds of operations (pclm, linearize, json, etc.) with non-flat pages
  663 +trees. Insertion is not implemented. Insertion is potentially complex
  664 +because of the issue of inherited objects. We will have to call
  665 +pushInheritedAttributesToPage before adding any pages to the pages
  666 +tree. The test suite is failing on that branch.
  667 +
  668 +Some parts of page tree repair are silent (no warnings). All page tree
  669 +repair should warn. The reason is that page tree repair will change
  670 +object numbers, and knowing that is important when working with JSON
  671 +output.
  672 +
  673 +If we were to do this, we would still need keep a pages cache for
  674 +efficient insertion. There's no reason we can't keep a vector of page
  675 +objects up to date and just do a traversal the first time we do
  676 +getAllPages just like we do now. The difference is that we would not
  677 +flatten the pages tree. It would be useful to go through QPDF_pages
  678 +and reimplement everything without calling flattenPagesTree. Then we
  679 +can remove flattenPagesTree, which is private. That said, with the
  680 +addition of creating non-flat pages trees, there is really no reason
  681 +not to flatten the pages tree for internal use.
  682 +
  683 +In its current state, QPDFPagesTree does not proactively fix /Type or
  684 +correct page objects that are used multiple times. You have to
  685 +traverse the pages tree to trigger this operation. It would be nice if
  686 +we would do that somewhere but not do it more often than necessary so
  687 +isPagesObject and isPageObject are reliable and can be made more
  688 +reliable. Maybe add a validate or repair function? It should also make
  689 +sure /Count and /Parent are correct.
680 690  
681 691 Rejected Ideas
682 692 ==============
... ...
libqpdf/QPDF_pages.cc
... ... @@ -21,6 +21,8 @@
21 21 // reference is returned, the current code is fine and does not need
22 22 // to be replaced. A partial implementation of QPDFPagesTree is in
23 23 // github in attic in case there is ever a reason to resurrect it.
  24 +// There are additional notes in README-maintainer, which also refers
  25 +// to this comment.
24 26  
25 27 // The goal of this code is to ensure that the all_pages vector, which
26 28 // users may have a reference to, and the pageobj_to_pages_pos map,
... ...