Commit 5f0ce88f96ba016b958e85896aa66c830b09d8a5

Authored by Jay Berkenbilt
1 parent 9bc3c5a0

Add new ideas to TODO-pages.md

Showing 1 changed file with 130 additions and 2 deletions
TODO-pages.md
... ... @@ -2,9 +2,137 @@
2 2  
3 3 This file contains plans and notes regarding implementation of the "pages epic." The pages epic consists of the following features:
4 4 * Proper handling of document-level features when splitting and merging documents
5   -* More flexible aways of selecting pages from one or more documents
6   -* More flexible ways of organizing pages, such as n-up, booklet generation ("signatures", as in what `psbook` does), scaling, and more control over overlay and underlay regarding scale and position
7 5 * Insertion of blank pages
  6 +* More flexible ways of
  7 +  * selecting pages from one or more documents
  8 +  * composing pages out of other pages
  9 +    * underlay and overlay with control over position, transformation, and bounding box selection
  10 +  * organizing pages
  11 +    * n-up
  12 +    * booklet generation ("signatures", as in what `psbook` does)
  13 +* Possibly others pending analysis of open issues and public discussion
  14 +
  15 +# Architectural Thoughts
  16 +
  17 +I want to encapsulate various aspects of the logic into interfaces that can be implemented by developers to add their own logic. It should be easy to contribute these. Here are some rough ideas.
  18 +
  19 +A page group is just a group of pages.
  20 +
  21 +* PageSelector -- creates page groups from other page groups
  22 +* PageTransformer -- selects a part of a page and possibly transforms it; applies to all pages of a group. Based on the page dictionary; does not look at the content stream
  23 +* PageFilter -- apply arbitrary code to a page; may access the content stream
  24 +* PageAssembler -- combines pages from groups into new groups whose pages are each assembled from corresponding pages of the input groups
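
A minimal Python sketch of how the four interfaces listed above might fit together. Everything here is hypothetical and for illustration only: the `Page` class, the method names, and the `Reverse` and `Rotate` implementations are assumptions, not part of qpdf's API.

```python
from dataclasses import dataclass, replace
from typing import List, Protocol, Sequence


@dataclass(frozen=True)
class Page:
    """Hypothetical stand-in for a PDF page; not a qpdf type."""
    label: str
    rotation: int = 0  # degrees, like /Rotate in the page dictionary


PageGroup = List[Page]  # "A page group is just a group of pages."


class PageSelector(Protocol):
    """Creates page groups from other page groups."""
    def select(self, groups: Sequence[PageGroup]) -> List[PageGroup]: ...


class PageTransformer(Protocol):
    """Works from page-level data only; never reads the content stream."""
    def transform(self, page: Page) -> Page: ...


class PageFilter(Protocol):
    """Arbitrary code applied to a page; may access the content stream."""
    def apply(self, page: Page) -> Page: ...


class PageAssembler(Protocol):
    """Builds new groups from corresponding pages of the input groups."""
    def assemble(self, groups: Sequence[PageGroup]) -> List[PageGroup]: ...


class Reverse:
    """Example selector: each input group in reverse order."""
    def select(self, groups: Sequence[PageGroup]) -> List[PageGroup]:
        return [list(reversed(g)) for g in groups]


class Rotate:
    """Example transformer: add a rotation to a page."""
    def __init__(self, degrees: int) -> None:
        self.degrees = degrees

    def transform(self, page: Page) -> Page:
        return replace(page, rotation=(page.rotation + self.degrees) % 360)


def transform_group(group: PageGroup, transformer: PageTransformer) -> PageGroup:
    """A transformer applies to all pages of a group."""
    return [transformer.transform(p) for p in group]


group = [Page("1"), Page("2"), Page("3")]
result = transform_group(Reverse().select([group])[0], Rotate(90))
```

Because selectors take groups and return groups, the output of one can feed another selector, a transformer, or an assembler, which is the kind of arbitrary composition the notes ask for.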
  25 +
  26 +These should be able to be composed in arbitrary ways. There should be a natural API for doing this, and there should be some specification, probably based on JSON, that can be provided on the command line or embedded in the job JSON format. I have been considering whether a Lisp-like S-expression syntax may be less cumbersome to work with. I'll have to decide whether to support this or some other syntax in addition to a JSON representation.
  27 +
  28 +There also needs to be something to represent how document-level structures relate to this. I'm not sure exactly how this should work, but we need things like
  29 +* what to do with page labels, especially when assembling pages from other pages
  30 +* whether to preserve destinations (outlines, links, etc.), particularly when pages are duplicated
  31 +  * If A refers to B and there is more than one copy of B, how do you decide which copies of A link to which copies of B?
  32 +* what to do with pages that belong to more than one group, e.g., what happens if you used document structure or outlines to form page groups and a group boundary lies in the middle of the page
  33 +
  34 +Maybe page groups can have arbitrary, user-defined tags so we can specify that links should only point to other pages with the same value of some tag. We can probably support many-to-one links if the source is duplicated.
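
One way the tag idea could work, as a rough Python sketch. The `PageCopy` class, the `resolve_link` helper, and the `"copy"` tag name are all hypothetical; the sketch assumes each output page remembers which source page it came from and carries a dictionary of tags.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PageCopy:
    """Hypothetical record for one page in the output document."""
    source_page: int  # page number in the original document
    tags: Dict[str, str] = field(default_factory=dict)


def resolve_link(
    source: PageCopy, target_page: int, output: List[PageCopy], tag: str
) -> Optional[int]:
    """Return the output index of the copy of target_page whose value for
    `tag` matches the source's value, or None if there is no such copy."""
    wanted = source.tags.get(tag)
    for index, candidate in enumerate(output):
        if candidate.source_page == target_page and candidate.tags.get(tag) == wanted:
            return index
    return None


# Pages 1 and 2 are duplicated; page 1 links to page 2 in the original.
output = [
    PageCopy(1, {"copy": "first"}),
    PageCopy(2, {"copy": "first"}),
    PageCopy(1, {"copy": "second"}),
    PageCopy(2, {"copy": "second"}),
]
first_target = resolve_link(output[0], 2, output, "copy")
second_target = resolve_link(output[2], 2, output, "copy")
```

With this rule each copy of page 1 links to the copy of page 2 that shares its tag value. What to do when no copy matches is left open here, as it is in the notes.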
  35 +
  36 +We probably need to hold onto the concept of the primary input file. If there is a primary input file, there may need to be a way to specify what gets preserved from it. The behavior of qpdf prior to all of this is to preserve all document-level constructs from the primary input file and to try to preserve page labels from other input files when combining pages.
  37 +
  38 +Here are some examples.
  39 +
  40 +* PageSelector
  41 +  * all pages from an input file
  42 +  * pages from a group using a NumericRange
  43 +  * concatenate groups
  44 +  * pages from a group in reverse order
  45 +  * a group repeated as often as necessary until a specified number of pages is reached
  46 +  * a group padded with blank pages to create a multiple of n pages
  47 +  * odd or even pages from a group
  48 +  * every nth page from a group
  49 +  * pages interleaved from multiple groups
  50 +  * the left-front (left-back, right-front, right-back) pages of a booklet with signatures of n pages
  51 +  * all pages reachable from a section of the outline hierarchy or something based on threads or other structure
  52 +  * selection based on page labels
  53 +* PageTransformer
  54 +  * clip to media box (trim box, crop box, etc.)
  55 +  * clip to specific absolute or relative size
  56 +  * scale
  57 +  * translate
  58 +  * rotate
  59 +  * apply transformation matrix
  60 +* PageFilter
  61 +  * optimize images
  62 +  * flatten annotations
  63 +* PageAssembler
  64 +  * Overlay/underlay all pages from one group onto corresponding pages from another group
  65 +    * Control placement based on properties of all the groups, so higher order than a stand-alone transformer
  66 +    * Examples
  67 +      * Scale the smaller page up to the size of the larger page
  68 +      * Center the smaller page horizontally and bottom-align the trim boxes
  69 +  * Generalized overlay/underlay allowing n pages in a given order with transformations.
  70 +  * n-up -- application of generalized overlay/underlay
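
Several of the selector examples above reduce to simple list operations once a page group is modeled as a list. The following Python sketch is illustrative only: the function names are made up, pages are numbered from 1 so "odd pages" means the 1st, 3rd, 5th and so on, blank pages are represented by `None`, and "repeated until a specified number of pages is reached" is assumed to truncate the final repetition.

```python
from itertools import chain, zip_longest
from typing import List, Optional, TypeVar

T = TypeVar("T")


def odd_pages(group: List[T]) -> List[T]:
    """Pages 1, 3, 5, ... (1-based numbering)."""
    return group[0::2]


def even_pages(group: List[T]) -> List[T]:
    """Pages 2, 4, 6, ... (1-based numbering)."""
    return group[1::2]


def every_nth(group: List[T], n: int, start: int = 1) -> List[T]:
    """Every nth page, beginning with page `start` (1-based)."""
    return group[start - 1 :: n]


def pad_to_multiple(group: List[T], n: int) -> List[Optional[T]]:
    """Append blank pages (None) until the length is a multiple of n."""
    return list(group) + [None] * (-len(group) % n)


def repeat_to_length(group: List[T], count: int) -> List[T]:
    """Repeat the group until exactly `count` pages are produced."""
    if not group:
        raise ValueError("cannot repeat an empty group")
    repetitions = -(-count // len(group))  # ceiling division
    return (list(group) * repetitions)[:count]


def interleave(*groups: List[T]) -> List[T]:
    """Take one page from each group in turn; shorter groups run out early."""
    missing = object()
    merged = chain.from_iterable(zip_longest(*groups, fillvalue=missing))
    return [page for page in merged if page is not missing]


pages = [1, 2, 3, 4, 5]
padded = pad_to_multiple(pages, 4)
repeated = repeat_to_length([1, 2, 3], 7)
merged = interleave(odd_pages(pages), even_pages(pages))
```

Interleaving the odd and even pages of a group gives back the original order, which is a handy sanity check when composing selectors.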
  71 +
  72 +It should be possible to represent all of the existing qpdf operations using the above framework. It would be good to re-implement all of them in terms of this framework to exercise it. We will have to look through all the command-line arguments to confirm this. We also need to make sure that suggestions from issues can be implemented, or at least supported by adding new selectors.
  73 +
  74 +Here are a few bits of scratch work. The top-level call is a selector. This doesn't capture everything. Implementing this would be tedious and challenging. It could be done using JSON arrays, but it would be clunky. This feels over-designed and possibly in conflict with QPDFJob.
  75 +
  76 +```
  77 +(concat
  78 +  (primary-input)
  79 +  (file "file2.pdf")
  80 +  (page-range (file "file3.pdf") "1-4,5-8")
  81 +)
  82 +
  83 +(with
  84 +  ("a"
  85 +    (concat
  86 +      (primary-input)
  87 +      (file "file2.pdf")
  88 +      (page-range (file "file3.pdf") "1-4,5-8")
  89 +    )
  90 +  )
  91 +  (concat
  92 +    (even-pages (from "a"))
  93 +    (reverse (odd-pages (from "a")))
  94 +  )
  95 +)
  96 +
  97 +(with
  98 +  ("a"
  99 +    (concat
  100 +      (primary-input)
  101 +      (file "file2.pdf")
  102 +      (page-range (file "file3.pdf") "1-4,5-8")
  103 +    )
  104 +    "b-even"
  105 +    (even-pages (from "a"))
  106 +    "b-odd"
  107 +    (reverse (odd-pages (from "a")))
  108 +  )
  109 +  (stack
  110 +    (repeat-range (from "a") "z")
  111 +    (pad-end (from "b"))
  112 +  )
  113 +)
  114 +```
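
Reading this kind of S-expression takes little code. As a rough illustration, the Python sketch below turns one expression into nested lists, treating bare symbols and quoted strings alike as strings. The `parse` function and its error messages are hypothetical, and characters the tokenizer does not recognize are silently skipped, which a real parser would need to reject.

```python
import json
import re
from typing import List, Union

Node = Union[str, List["Node"]]

# Tokens: parentheses, double-quoted strings, or bare symbols.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


def parse(text: str) -> Node:
    """Parse a single S-expression into nested Python lists of strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read() -> Node:
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        token = tokens[pos]
        pos += 1
        if token == "(":
            items: List[Node] = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing closing parenthesis")
            pos += 1  # consume ")"
            return items
        if token == ")":
            raise ValueError("unexpected closing parenthesis")
        if token.startswith('"'):
            return json.loads(token)  # reuse JSON string unescaping
        return token

    node = read()
    if pos != len(tokens):
        raise ValueError("trailing input after expression")
    return node


tree = parse('(page-range (file "file3.pdf") "1-4,5-8")')
```

Here `tree` is `["page-range", ["file", "file3.pdf"], "1-4,5-8"]`, so the S-expression syntax could be a thin front end over an array-based representation.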
  115 +
  116 +Easier to parse but yuck:
  117 +```json
  118 +["with",
  119 +  ["a",
  120 +    ["concat",
  121 +      ["primary-input"],
  122 +      ["file", "file2.pdf"],
  123 +      ["page-range", ["file", "file3.pdf"], "1-4,5-8"]
  124 +    ],
  125 +    "b-even",
  126 +    ["even-pages", ["from", "a"]],
  127 +    "b-odd",
  128 +    ["reverse", ["odd-pages", ["from", "a"]]]
  129 +  ],
  130 +  ["stack",
  131 +    ["repeat-range", ["from", "a"], "z"],
  132 +    ["pad-end", ["from", "b"]]
  133 +  ]
  134 +]
  135 +```
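
To get a feel for how the array form might be evaluated, here is a small Python sketch of an interpreter for a subset of the operators. All semantics are assumptions: input files are modeled as lists of page names, `with` takes a flat list of name/expression pairs followed by a body, later bindings may refer to earlier ones, and `page-range` accepts only simple 1-based ranges such as `"1-4,7"`. `stack`, `repeat-range`, and `pad-end` are left out because the notes do not define them.

```python
from typing import Dict, List, Optional

PageGroup = List[str]


def parse_range(spec: str, count: int) -> List[int]:
    """Expand a simple 1-based inclusive range such as "1-2,5"."""
    numbers: List[int] = []
    for part in spec.split(","):
        first, _, last = part.partition("-")
        low = int(first)
        high = int(last) if last else low
        numbers.extend(range(low, high + 1))
    if any(n < 1 or n > count for n in numbers):
        raise ValueError(f"range {spec!r} is outside 1-{count}")
    return numbers


def evaluate(
    expr: list,
    files: Dict[str, PageGroup],
    primary: str,
    env: Optional[Dict[str, PageGroup]] = None,
) -> PageGroup:
    """Evaluate one array-form expression to a page group."""
    env = dict(env or {})  # copy so bindings stay local to this call
    op, *args = expr
    if op == "primary-input":
        return list(files[primary])
    if op == "file":
        return list(files[args[0]])
    if op == "from":
        return list(env[args[0]])
    if op == "concat":
        return [p for a in args for p in evaluate(a, files, primary, env)]
    if op == "reverse":
        return evaluate(args[0], files, primary, env)[::-1]
    if op == "odd-pages":
        return evaluate(args[0], files, primary, env)[0::2]
    if op == "even-pages":
        return evaluate(args[0], files, primary, env)[1::2]
    if op == "page-range":
        group = evaluate(args[0], files, primary, env)
        return [group[n - 1] for n in parse_range(args[1], len(group))]
    if op == "with":
        bindings, body = args
        for name, value in zip(bindings[0::2], bindings[1::2]):
            env[name] = evaluate(value, files, primary, env)
        return evaluate(body, files, primary, env)
    raise ValueError(f"unknown operator: {op}")


files = {
    "in.pdf": ["p1", "p2"],
    "file2.pdf": ["q1"],
    "file3.pdf": ["r1", "r2", "r3"],
}
expr = ["with",
        ["a", ["concat",
               ["primary-input"],
               ["file", "file2.pdf"],
               ["page-range", ["file", "file3.pdf"], "1-2"]]],
        ["concat",
         ["even-pages", ["from", "a"]],
         ["reverse", ["odd-pages", ["from", "a"]]]]]
result = evaluate(expr, files, primary="in.pdf")
```

In this example `a` is `p1 p2 q1 r1 r2`, so the result is its even pages followed by its odd pages in reverse order.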
8 136  
9 137 # To-do list
10 138  
... ...