Commit 25aff0bd52b0382b9349c81aaabc2fde51528923
1 parent
8a32515a
TODO: abandon (again) and update notes about QPDFPagesTree
Showing
2 changed files
with
47 additions
and
35 deletions
TODO
| ... | ... | @@ -9,7 +9,10 @@ Before Release: |
| 9 | 9 | * Release qtest with updates to qtest-driver and copy back into qpdf |
| 10 | 10 | |
| 11 | 11 | Next: |
| 12 | -* QPDFPagesTree -- avoid ever flattening the pages tree. | |
| 12 | +* QPDF -- track whether the pages tree was modified (whether | |
| 13 | + getAllPages was ever called. If so, consider generating a non-flat | |
| 14 | + pages tree before creating output to better handle files with lots | |
| 15 | + of pages. | |
| 13 | 16 | * JSON v2 fixes |
| 14 | 17 | |
| 15 | 18 | Pending changes: |
| ... | ... | @@ -45,39 +48,6 @@ Pending changes: |
| 45 | 48 | Soon: Break ground on "Document-level work" |
| 46 | 49 | |
| 47 | 50 | |
| 48 | -QPDFPagesTree | |
| 49 | -============= | |
| 50 | - | |
| 51 | -Partial work is on qpdf-pages-tree branch. QPDFPageTree is mostly | |
| 52 | -implemented and mostly tested. There are not enough cases of different | |
| 53 | -kinds of operations (pclm, linearize, json, etc.) with non-flat pages | |
| 54 | -trees. Insertion is not implemented. | |
| 55 | - | |
| 56 | -Page tree repair is silent (no warnings) and has a comment saying that | |
| 57 | -we don't need warnings, but I think we should have warnings now that | |
| 58 | -we have json v2. The reason is that page tree repair will change | |
| 59 | -object numbers, and it's useful to know that. | |
| 60 | - | |
| 61 | -I'm thinking we will want to keep a pages cache for efficient | |
| 62 | -insertion. There's no reason we can't keep a vector of page objects up | |
| 63 | -to date and just do a traversal the first time we do getAllPages just | |
| 64 | -like we do now. The difference is that we would not flatten the pages | |
| 65 | -tree. It would be useful to go through QPDF_pages and reimplement | |
| 66 | -everything without calling flattenPagesTree. Then we can remove | |
| 67 | -flattenPagesTree, which is private. | |
| 68 | - | |
| 69 | -In its current state, QPDFPagesTree does not proactively fix /Type or | |
| 70 | -correct page objects that are used multiple times. You have to | |
| 71 | -traverse the pages tree to trigger this operation. It would be nice if | |
| 72 | -we would do that somewhere but not do it more often than necessary so | |
| 73 | -isPagesObject and isPageObject are reliable and can be made more | |
| 74 | -reliable. Maybe add a validate or repair function? It should also make | |
| 75 | -sure /Count and /Parent are correct. | |
| 76 | - | |
| 77 | -refs/attic/QPDFPagesTree-old -- original, abandoned branch -- clean up | |
| 78 | -when done. | |
| 79 | - | |
| 80 | - | |
| 81 | 51 | JSON v2 fixes |
| 82 | 52 | ============= |
| 83 | 53 | |
| ... | ... | @@ -676,7 +646,47 @@ A few important lessons (in README-maintainer) |
| 676 | 646 | possible. |
| 677 | 647 | |
| 678 | 648 | Also, it turns out that PointerHolder is more performant than |
| 679 | -std::shared_ptr. | |
| 649 | +std::shared_ptr. (This was true at the time but subsequent | |
| 650 | +implementations of std::shared_ptr became much more efficient.) | |
| 651 | + | |
| 652 | +QPDFPagesTree | |
| 653 | +============= | |
| 654 | + | |
| 655 | +On a few occasions, I have considered implementing a QPDFPagesTree | |
| 656 | +object that would allow the document's original page tree structure to | |
| 657 | +be preserved. See comments at the top QPDF_pages.cc for why this was | |
| 658 | +abandoned. | |
| 659 | + | |
| 660 | +Partial work is in refs/attic/QPDFPagesTree. QPDFPageTree is mostly | |
| 661 | +implemented and mostly tested. There are not enough cases of different | |
| 662 | +kinds of operations (pclm, linearize, json, etc.) with non-flat pages | |
| 663 | +trees. Insertion is not implemented. Insertion is potentially complex | |
| 664 | +because of the issue of inherited objects. We will have to call | |
| 665 | +pushInheritedAttributesToPage before adding any pages to the pages | |
| 666 | +tree. The test suite is failing on that branch. | |
| 667 | + | |
| 668 | +Some parts of page tree repair are silent (no warnings). All page tree | |
| 669 | +repair should warn. The reason is that page tree repair will change | |
| 670 | +object numbers, and knowing that is important when working with JSON | |
| 671 | +output. | |
| 672 | + | |
| 673 | +If we were to do this, we would still need keep a pages cache for | |
| 674 | +efficient insertion. There's no reason we can't keep a vector of page | |
| 675 | +objects up to date and just do a traversal the first time we do | |
| 676 | +getAllPages just like we do now. The difference is that we would not | |
| 677 | +flatten the pages tree. It would be useful to go through QPDF_pages | |
| 678 | +and reimplement everything without calling flattenPagesTree. Then we | |
| 679 | +can remove flattenPagesTree, which is private. That said, with the | |
| 680 | +addition of creating non-flat pages trees, there is really no reason | |
| 681 | +not to flatten the pages tree for internal use. | |
| 682 | + | |
| 683 | +In its current state, QPDFPagesTree does not proactively fix /Type or | |
| 684 | +correct page objects that are used multiple times. You have to | |
| 685 | +traverse the pages tree to trigger this operation. It would be nice if | |
| 686 | +we would do that somewhere but not do it more often than necessary so | |
| 687 | +isPagesObject and isPageObject are reliable and can be made more | |
| 688 | +reliable. Maybe add a validate or repair function? It should also make | |
| 689 | +sure /Count and /Parent are correct. | |
| 680 | 690 | |
| 681 | 691 | Rejected Ideas |
| 682 | 692 | ============== | ... | ... |
libqpdf/QPDF_pages.cc
| ... | ... | @@ -21,6 +21,8 @@ |
| 21 | 21 | // reference is returned, the current code is fine and does not need |
| 22 | 22 | // to be replaced. A partial implementation of QPDFPagesTree is in |
| 23 | 23 | // github in attic in case there is ever a reason to resurrect it. |
| 24 | +// There are additional notes in README-maintainer, which also refers | |
| 25 | +// to this comment. | |
| 24 | 26 | |
| 25 | 27 | // The goal of this code is to ensure that the all_pages vector, which |
| 26 | 28 | // users may have a reference to, and the pageobj_to_pages_pos map, | ... | ... |