Commit e52b026db4a7f23b79f21b4563d2b02d3b87fde4
1 parent
379fc7e5
Major rework of TODO-pages.md
This is converging into something that will be possible to do.
Showing 1 changed file with 322 additions and 471 deletions
TODO-pages.md
| 1 | 1 | # Pages |
| 2 | 2 | |
| 3 | -**THIS IS A WORK IN PROGRESS. THE ACTUAL IMPLEMENTATION MAY NOT LOOK ANYTHING LIKE THIS. When this | |
| 4 | -gets to the stage where it is starting to congeal into an actual plan, I will remove this disclaimer | |
| 5 | -and open a discussion ticket in GitHub to work out details.** | |
| 3 | +**This is a work in progress, but it's getting close. When this gets to the stage where it is | |
| 4 | +starting to congeal into an actual plan, I will remove this disclaimer and open a discussion ticket | |
| 5 | +in GitHub to work out details.** | |
| 6 | 6 | |
| 7 | 7 | This document describes a project known as the _pages epic_. The goal of the pages epic is to enable |
| 8 | 8 | qpdf to properly preserve all functionality associated with a page as pages are copied from one PDF |
| 9 | -to another (or back to the same PDF). | |
| 9 | +to another (or back to the same PDF). A secondary goal is to add more flexibility to the ways in | |
| 10 | +which documents can be split and combined (flexible assembly). | |
| 10 | 11 | |
| 11 | 12 | Terminology: |
| 12 | 13 | * _Page-level data_: information that is contained within objects reachable from the page dictionary |
| ... | ... | @@ -14,30 +15,33 @@ Terminology: |
| 14 | 15 | * _Document-level data_: information that is reachable from the document catalog (`/Root`) that is |
| 15 | 16 | not reachable from a page dictionary as well as the `/Info` dictionary |
| 16 | 17 | |
| 17 | -Some document-level data references specific pages by page object ID, such as outlines or | |
| 18 | -interactive forms. Some document-level data doesn't reference any pages, such as embedded files or | |
| 19 | -optional content (layers). Some document-level data contains information that pertains to a specific | |
| 20 | -page but does not reference the page, such as page labels (explicit page numbers). Some page-level | |
| 21 | -data may sometimes depend on document-level data. For example, a _named destination_ depends on the | |
| 22 | -document-level _names tree_. | |
| 18 | +PDF uses document-level data in a variety of ways. There is some document-level data that has each | |
| 19 | +of the following properties, among others: | |
| 20 | +* References pages by object ID (outlines, interactive forms) | |
| 21 | +* Doesn't reference any pages (embedded files) | |
| 22 | +* Doesn't reference any pages but influences page rendering (optional content/layers) | |
| 23 | +* Doesn't reference any pages but contains information about pages (page labels) | |
| 24 | +* Contains information used by pages (named destinations) | |
| 23 | 25 | |
| 24 | 26 | As long as qpdf has had the ability to copy pages from one PDF to another, it has had robust |
| 25 | 27 | handling of page-level data. Prior to the implementation of the pages epic, with the exception of |
| 26 | -page labels, qpdf has ignored document-level data during page copy operations. Specifically, when | |
| 27 | -qpdf creates a new PDF file from existing PDF files, it always starts with a specific PDF, known as | |
| 28 | -the _primary input_. The primary input may be the built-in _empty PDF_. With the exception of page | |
| 29 | -labels, document-level constructs that appear in the primary input are preserved, and document-level | |
| 30 | -constructs from the other PDF files are ignored. The exception to this is page labels. With page | |
| 31 | -labels, qpdf always ensures that any given page has the same label in the final output as it had in | |
| 32 | -whichever input file it originated from, which is usually (but not always) the desired behavior. | |
| 28 | +page labels and form fields, qpdf has ignored document-level data during page copy operations. | |
| 29 | +Specifically, when qpdf creates a new PDF file from existing PDF files, it always starts with a | |
| 30 | +specific PDF, known as the _primary input_. The primary input may be a file or the built-in _empty | |
| 31 | +PDF_. With the exception of page labels and form fields, document-level constructs that appear in | |
| 32 | +the primary input are preserved, and document-level constructs from the other PDF files are ignored. | |
| 33 | +With page labels, qpdf always ensures that any given page has the same label in the final output as | |
| 34 | +it had in whichever input file it originated from, which is usually (but not always) the desired | |
| 35 | +behavior. With form fields, qpdf has awareness and ensures that all form fields remain operational. | |
| 36 | +The goal is to extend this document-level-awareness to other document-level constructs. | |
| 33 | 37 | |
| 34 | 38 | Here are several examples of problems in qpdf prior to the implementation of the pages epic: |
| 35 | 39 | * If two files with optional content (layers) are merged, all layers in all but the primary input |
| 36 | 40 | will be visible in the combined file. |
| 37 | 41 | * If two files with file attachments are merged, attachments will be retained on the primary input |
| 38 | 42 | but dropped on the others. (qpdf has other ways to copy attachments from one file to another.) |
| 39 | -* If two files with hyperlinks are merged, any hyperlink from other than primary input whose | |
| 40 | - destination is a named destination will become non-functional. | |
| 43 | +* If two files with hyperlinks are merged, any hyperlink from other than the primary input becomes | |
| 44 | + non-functional. | |
| 41 | 45 | * If two files with outlines are merged, the outlines from the original file will appear in their |
| 42 | 46 | entirety, including outlines that point to pages that are no longer there, and outlines will be |
| 43 | 47 | lost from all files except the primary input. |
| ... | ... | @@ -55,27 +59,32 @@ arbitrary combinations of input and output files. The command-line allows only t |
| 55 | 59 | |
| 56 | 60 | The pages epic consists of two broad categories of work: |
| 57 | 61 | * Proper handling of document-level features when splitting and merging documents |
| 58 | -* Greatly increased flexibility in the ways in which pages can be selected from the various input | |
| 59 | - files and combined for the output file. This includes creation of blank pages. | |
| 62 | +* Flexible assembly: greatly increased flexibility in the ways in which pages can be selected from | |
| 63 | + the various input files and combined for the output file. This includes creation of blank pages | |
| 64 | + and composition of pages (n-up or other ways of combining multiple input pages into one output | |
| 65 | + page) | |
| 60 | 66 | |
| 61 | 67 | Here are some examples of things that will become possible: |
| 62 | 68 | |
| 63 | 69 | * Stacking arbitrary pages on top of each other with full control over transformation and cropping, |
| 64 | 70 | including being able to access information about the various bounding boxes associated with the |
| 65 | - pages | |
| 71 | + pages (generalization of underlay/overlay) | |
| 66 | 72 | * Inserting blank pages |
| 67 | 73 | * Doing n-up page layouts |
| 74 | +* Creating single very long or wide pages with output from other pages | |
| 68 | 75 | * Re-ordering pages for printing booklets (also called signatures or printer spreads) |
| 69 | 76 | * Selecting pages based on the outline hierarchy, tags, or article threads |
| 70 | 77 | * Keeping only and all relevant parts of the outline hierarchies from all input files |
| 71 | -* Creating single very long or wide pages with output from other pages | |
| 72 | 78 | |
| 73 | 79 | The rest of this document describes the details of how these features will work and what needs |
| 74 | 80 | to be done to make them possible to build. |
| 75 | 81 | |
| 76 | -# QPDFJob Summary | |
| 82 | +# Architectural Thoughts | |
| 83 | + | |
| 84 | +Open question: if I do all the complex logic in `QPDFJob`, what are the implications for pikepdf or | |
| 85 | +other wrappers? This will need to be discussed in the discussion ticket. | |
| 77 | 86 | |
| 78 | -`QPDFJob` goes through the following stages: | |
| 87 | +Prior to implementation of the pages epic, `QPDFJob` goes through the following stages: | |
| 79 | 88 | |
| 80 | 89 | * create QPDF |
| 81 | 90 | * update from JSON |
| ... | ... | @@ -113,176 +122,299 @@ to be done to make them possible to build. |
| 113 | 122 | * Remove unreferenced resources if needed |
| 114 | 123 | * Preserve form fields and page labels |
| 115 | 124 | |
| 116 | -# Architectural Thoughts | |
| 117 | - | |
| 118 | -XXX WORK IN: Dump `QPDFAssembler`. Instead, these are enhancements to `QPDFJob`. Don't try to | |
| 119 | -generalize this too much. There are actually only a few things we need to add to `QPDFJob`. Go | |
| 120 | -through and flesh out the list, but roughly: | |
| 121 | - | |
| 125 | +Broadly, the above has to be modified in the following ways: | |
| 122 | 126 | * From the C++ API, make it possible to use an arbitrary QPDF as an input rather than having to |
| 123 | 127 | start with a file. That makes it possible to do arbitrary work on the PDF prior to submitting it. |
| 124 | -* Allow specification of n blank pages of a given size, e.g. `--blank=5@612x792`. Maybe we can | |
| 125 | - support standard paper sizes, inches, centimeters, or sizes relative to other pages. | |
| 128 | +* Allow creation of blank pages as an additional input source | |
| 126 | 129 | * Generalize underlay/overlay |
| 127 | - * Maybe we can do it by adding flags and allowing them to be repeated | |
| 128 | - * Maybe we need a new syntax, like pstops, but with the ability to specify anchors and proportions | |
| 129 | - based on various boxes | |
| 130 | - * Maybe we need something like `--stack` | |
| 131 | - * It needs to be possible to stack arbitrary pages with arbitrary transformations and to have the | |
| 132 | - transformations be a function of the source or destination page; the rectangle mapping idea | |
| 133 | - discussed elsewhere may be a good basis | |
| 130 | + * Enable controlling placement | |
| 131 | + * Make repeatable | |
| 132 | +* Add additional reordering options | |
| 133 | + * We don't need to provide hooks for this. If someone is going to code a hook, they can just | |
| 134 | + compute the page ordering directly. | |
| 134 | 135 | * Have a page composition phase after the overlay/underlay stage |
| 135 | 136 | * Allow n-up, left-to-right (can reverse page order to get rtl), top-to-bottom, or modular |
| 136 | 137 | composition like pstops |
| 137 | - * Possible hook for page composition to allow custom compositions | |
| 138 | -* A few additional split options | |
| 139 | - | |
| 140 | -Then, we need to make the existing logic handle other document-level structures, preferably in a way | |
| 141 | -that requires less duplication between split and merge. Maybe we can add a flag to disregard | |
| 142 | -document-level structures for speed, but I think being able to turn them on and off individually is | |
| 143 | -overkill, especially since people who are that sophisticated can tweak with JSON or just do it in | |
| 144 | -code. | |
| 145 | - | |
| 146 | -The challenge will be to come up with command-line syntax to do most things from the CLI and to | |
| 147 | -make the C++ API flexible enough for users to insert their own bits in key places, just as we can | |
| 148 | -now grab the QPDF before the write phase. This approach eliminates all the function stuff. We just | |
| 149 | -have to make sure we can support all these features and have a relatively easy way to add new ones | |
| 150 | -or to let developers extend. The documentation will have to explain the flow of QPDFJob so people | |
| 151 | -can know where to apply hooks. | |
| 152 | - | |
| 153 | ----------- | |
| 154 | - | |
| 155 | -Create a new top-level class called `QPDFAssembler` that will be used to perform page-level | |
| 156 | -operations. Its implementation will use existing APIs, and it will add many new APIs. It should be | |
| 157 | -possible to perform all existing page splitting and merging operations using `QPDFAssembler` without | |
| 158 | -having to worry about details such as copying annotations, remapping destinations, and adjusting | |
| 159 | -document-level data. | |
| 160 | - | |
| 161 | -Early strategy: keep `QPDFAssembler` private to the library, and start with a pure C++ API (no JSON | |
| 162 | -support). Migrate splitting and merging from `QPDFJob` into `QPDFAssembler`, then build in | |
| 163 | -document-level support. Also work the difference between normal write and split, which are two | |
| 164 | -separate ways to write output files. | |
| 165 | - | |
| 166 | -One of the main responsibilities of `QPDFAssembler` will be to remap destinations as data from a | |
| 167 | -page is moved or copied. For example, if an outline has a destination that points to a particular | |
| 168 | -rectangle on page 5 of the second file, and we end up dropping a portion of that page into an n-up | |
| 169 | -configuration on a specific output page, we will have to keep track of enough information to replace | |
| 170 | -the destination with a new one that points to the new physical location of the same material. For | |
| 171 | -another example, consider a case in which the left side of page 3 of the primary input ends up as | |
| 172 | -page 5 of the output and the right side of page 3 ends up as page 6. We would have to map | |
| 173 | -destinations from a single source page to different destination pages based on which part of the | |
| 174 | -page it was on. If part of the rectangle points to one page and part to another, what do we do? I | |
| 175 | -suggest we go with the top/center of the rectangle. | |
| 176 | - | |
| 177 | -A destination consists of a QPDF, page object, and rectangle in user coordinates. When | |
| 178 | -`QPDFAssembler` copies a page or converts it to a form XObject, possibly with transformations | |
| 179 | -applied, it will have to be able to map a destination to the same triple (QPDF, page object, | |
| 180 | -rectangle) on all pages that contain data from the original page. When writing the final output, any | |
| 181 | -destination that no longer points anywhere should be dropped, and any destination that points to | |
| 182 | -multiple places will need to be handled according to some specification. | |
| 138 | +* Add additional ways to select pages besides range (e.g. based on outlines) | |
| 139 | +* Add additional ways to specify boundaries for splitting | |
| 140 | +* Enhance existing logic to handle other document-level structures, preferably in a way that | |
| 141 | + requires less duplication between split and merge. | |
| 142 | + * We don't need to turn on and off most types of document constructs individually. People can | |
| 143 | + preprocess using the API or qpdf JSON if they want fine-grained control. | |
| 144 | + * For things like attachments and outlines, we can add additional flags. | |
| 145 | + | |
| 146 | +## Flexible Assembly | |
| 147 | + | |
| 148 | +This section discusses modifications to the command-line syntax to make it easier to add flexibility | |
| 149 | +going forward without breaking backward compatibility. The main thrust will be to create | |
| 150 | +non-positional alternatives to some things that currently use positional arguments (`--pages`, | |
| 151 | +`--overlay`, `--underlay`), as was done for `--encrypt` in 11.7.0, to make it possible to add | |
| 152 | +additional flags. | |
| 153 | + | |
| 154 | +In several cases, we allow specification of transformations or placements. In this context: | |
| 155 | +* The origin is always lower-left corner. | |
| 156 | +* A _dimension_ may be absolute or relative. | |
| 157 | + * An _absolute dimension_ is `{n}` (in points), `{n}in` (inches), or `{n}cm` (centimeters). | |
| 158 | + * A _relative dimension_ is expressed in terms of the corresponding dimension of one of a page's | |
| 159 | + boxes. Which dimension is determined by context. | |
| 160 | + * `{n}{M|C|B|T|A}` is `{n}` times the corresponding dimension of the media, crop, bleed, trim, | |
| 161 | + or art box. Example: `0.5M` would be half the width or height of the media box. | |
| 162 | + * `{n}+{M|C|B|T|A}` is `{n}` plus the corresponding dimension. Example: `-0.5in+T` is half an | |
| 163 | + inch (36 points) less than the width or height of the trim box. | |
| 164 | +* A _size_ is | |
| 165 | + * `{w}x{h}`, where `{w}` and `{h}` are dimensions | |
| 166 | + * `letter|a4` (potentially add other page sizes) | |
| 167 | +* A _position_ is `{x}x{y}` where `{x}` and `{y}` are dimensions offset from the origin | |
| 168 | +* A _rectangle_ is `{llx},{lly},{urx},{ury}` (lower|upper left|right x|y) with `llx` < `urx` and | |
| 169 | + `lly` < `ury` | |
| 170 | + * Examples: | |
| 171 | + * `0.1M,0.1M,0.9M,0.9M` is a box whose llx is 10% of the media box width, lly is 10% of the | |
| 172 | + height, urx is 90% of the width, and ury is 90% of the height | |
| 173 | + * `0,0,612,792` is a box whose size is that of a US Letter page. | |
| 174 | + * A rectangle may also be just one of `M|C|B|T|A` to refer to a page's media, crop, bleed, trim, | |
| 175 | + or art box. | |
| 176 | + | |
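The dimension syntax above can be sketched as a small resolver. This is only an illustration of the proposed grammar, not qpdf code: the function name `resolve_dimension` and the `box_dims` mapping (box letter to the width or height that applies in context, in points) are assumptions made for the example.

```python
import re

# Points per unit for absolute dimensions: bare numbers are already points.
POINTS_PER_UNIT = {"": 1.0, "in": 72.0, "cm": 72.0 / 2.54}

_NUM = r"[+-]?(?:\d+\.?\d*|\.\d+)"
_ABS = re.compile(rf"^({_NUM})(in|cm)?$")                    # {n}, {n}in, {n}cm
_REL_MUL = re.compile(rf"^({_NUM})([MCBTA])$")               # {n}{M|C|B|T|A}
_REL_ADD = re.compile(rf"^({_NUM}(?:in|cm)?)\+([MCBTA])$")   # {n}+{M|C|B|T|A}


def resolve_dimension(spec, box_dims=None):
    """Resolve a dimension string to points.

    box_dims is a hypothetical mapping such as {"M": 612.0, "T": 600.0}
    holding the relevant dimension (width or height) of each page box.
    """
    box_dims = box_dims or {}
    m = _ABS.match(spec)
    if m:
        return float(m.group(1)) * POINTS_PER_UNIT[m.group(2) or ""]
    m = _REL_MUL.match(spec)
    if m:
        return float(m.group(1)) * box_dims[m.group(2)]
    m = _REL_ADD.match(spec)
    if m:
        # The left side is itself an absolute dimension, e.g. "-0.5in".
        return resolve_dimension(m.group(1)) + box_dims[m.group(2)]
    raise ValueError(f"invalid dimension: {spec!r}")
```

With a 612-point media box dimension, `resolve_dimension("0.5M", {"M": 612.0})` gives half of it, and `-0.5in+T` subtracts 36 points from the trim box dimension, matching the examples in the list.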
| 177 | +Tweak `--pages` similarly to `--encrypt`. As an alternative to `--pages file [--password=p] range | |
| 178 | +--`, support `--pages --file=x --password=y --range=z --`. This allows for a more flexible syntax. | |
| 179 | +If `--file` appears, positional arguments are disallowed. The same applies to `--overlay` and | |
| 180 | +`--underlay`. | |
| 181 | + | |
| 182 | +``` | |
| 183 | +OLD: qpdf 2.pdf --pages 1.pdf --password=x . 3.pdf 1-z -- out.pdf | |
| 184 | +NEW: qpdf 2.pdf --pages --file=1.pdf --password=x --file=. --file=3.pdf --range=1-z -- out.pdf | |
| 185 | +``` | |
| 186 | + | |
| 187 | +This makes it possible to add additional flags to do things like control how document-level features | |
| 188 | +are handled, specify placement options, etc. Given the above framework, it would be possible to add | |
| 189 | +additional features incrementally, without breaking compatibility, such as selecting or splitting | |
| 190 | +pages based on tags, article threads, or outlines. | |
| 191 | + | |
| 192 | +It's tempting to allow assemblies to be nested, but this gets very complicated. From the C++ API, we | |
| 193 | +could modify QPDFJob to allow the use of any QPDF as an input, but supporting this from the CLI is hard | |
| 194 | +because of the way JSON/arg parsing is set up. If people need to do that, they can just create | |
| 195 | +intermediate files. | |
| 196 | + | |
| 197 | +Proposed CLI enhancements: | |
| 198 | + | |
| 199 | +``` | |
| 200 | +# --pages: inputs | |
| 201 | +--file=x [ --password=x ] | |
| 202 | +--blank=n [ --size={size} [ --size-from-page=n ] ] # see below | |
| 203 | +# modifiers refer to most recent input | |
| 204 | +--range=... | |
| 205 | +--with-attachments={none|all|referenced} # default = referenced | |
| 206 | +--with-outlines={none|all|referenced} # default = referenced | |
| 207 | +--... # future options to select pages based on outlines, article threads, tags, etc. | |
| 208 | +# placement (matrix transformation -- see notes below) | |
| 209 | +--rotate=[+-]angle[:page-range] # existing | |
| 210 | +--scale=x,y[:page-range] | |
| 211 | +--translate=dx,dy[:page-range] # dx and dy are dimensions | |
| 212 | +--flip={h|v}[:page-range] | |
| 213 | +--transform=a,b,c,d,e,f[:page-range] | |
| 214 | +--set-box={M|C|B|T|A}=rect[:page-range] # change a bounding box | |
| 215 | +# stacking -- make --underlay and --overlay repeatable | |
| 216 | +--{underlay|overlay} ... -- | |
| 217 | +--file=x [ --password=x ] | |
| 218 | +--from, --to, --repeat # same as current --overlay, --underlay | |
| 219 | +--from-rect={rect} # default = T -- see notes | |
| 220 | +--to-rect={rect} # default = M -- see notes | |
| 221 | +# composition -- a new QPDFJob stage between stacking and transformation | |
| 222 | +--compose=... # see notes | |
| 223 | +--n-up={2,4,6,9,16} | |
| 224 | +--concat={h|v} # concatenate all pages to a single big page | |
| 225 | +# reordering | |
| 226 | +--collate=a,b,c # exists | |
| 227 | +--booklet=... # re-order pages for book signatures like psbook -- see notes | |
| 228 | +# split | |
| 229 | +--split-pages=n # existing | |
| 230 | +--split-after=a,b,c # split after each named page | |
| 231 | +--... # future options to split based on outlines, article threads, tags, etc. | |
| 232 | +# post-processing (with transformations like optimize images) | |
| 233 | +--set-page-labels ... # See issue #939 | |
| 234 | +``` | |
| 235 | + | |
| 236 | +Notes: | |
| 237 | +* For `--blank`, `--size` specifies the size of the blank page. If any relative dimensions are used, | |
| 238 | + `--size-from-page=n` must be used to specify the page (from n in the overall input) that relative | |
| 239 | + dimensions should be taken from. It is an error to specify a relative size based on another blank | |
| 240 | + page. (Let's not complicate things by doing a graph traversal to find an eventual absolute page. | |
| 241 | + Just disallow specifying a blank page relative to another blank page.) | |
| 242 | +* For stacking, the default is to map the source page's trim box onto the destination page's | |
| 243 | + mediabox. This is a weird default, but it's there for compatibility. The `--from-rect` and | |
| 244 | + `--to-rect` may be used to map an arbitrary region of the over/underlay file into an arbitrary | |
| 245 | + region of a page. With the defaults, an overlay or underlay page will be stretched or shrunk if | |
| 246 | + pages are of variable size. Absolute rectangles can be used to avoid this. If a rectangle uses | |
| 247 | + relative dimensions, they are relative to the page that has the rectangle. You can't create a | |
| 248 | + `--to-rect` relative to the size of the from page or vice versa. If you need to do this, use | |
| 249 | + external logic to compute the rectangles and then use absolute rectangles. | |
| 250 | +* `--compose`: XXX | |
| 251 | +* `--booklet`: XXX | |
| 252 | +* The `--set-page-labels` option would be done at the very end and is actually not blocked by | |
| 253 | + anything else here. It can be done near removing page labels in `handleTransformations`. | |
| 254 | +* I'm not sure what impact composition should have on page labels. Most likely, we should drop page | |
| 255 | + labels on composition. If someone wants them, they can use `--set-page-labels`. | |
| 256 | + | |
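The stacking note above describes mapping a source rectangle onto a destination rectangle, stretching or shrinking as needed. A minimal sketch of that mapping as a PDF-style transformation matrix `(a, b, c, d, e, f)` follows. The helper names are hypothetical, and both rectangles are assumed to be already resolved to absolute `(llx, lly, urx, ury)` values in points.

```python
def rect_to_rect_matrix(from_rect, to_rect):
    """Return (a, b, c, d, e, f) mapping from_rect onto to_rect.

    Scales each axis independently, so the content is stretched or shrunk
    when the two rectangles have different aspect ratios.
    """
    fllx, flly, furx, fury = from_rect
    tllx, tlly, turx, tury = to_rect
    if not (fllx < furx and flly < fury and tllx < turx and tlly < tury):
        raise ValueError("rectangles need llx < urx and lly < ury")
    sx = (turx - tllx) / (furx - fllx)
    sy = (tury - tlly) / (fury - flly)
    # Translate so the lower-left corners coincide after scaling.
    return (sx, 0.0, 0.0, sy, tllx - sx * fllx, tlly - sy * flly)


def apply_matrix(matrix, x, y):
    """Apply a PDF transformation matrix to a point."""
    a, b, c, d, e, f = matrix
    return (a * x + c * y + e, b * x + d * y + f)
```

Because the scale factors are computed per axis, using absolute rectangles of equal aspect ratio is what avoids distortion, as the note suggests.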
| 257 | +### Compose, Booklet | |
| 258 | + | |
| 259 | +This section needs to be fleshed out. It is probably lower priority than document-level work. | |
| 260 | + | |
| 261 | +Here are some ideas from pstops. The following is an excerpt from the pstops manual page. Maybe we | |
| 262 | +can come up with something similar using our enhanced rectangle syntax. | |
| 263 | + | |
| 264 | +This section contains some sample re-arrangements. To put two pages on one sheet (of A4 paper), | |
| 265 | +the pagespec to use is: | |
| 266 | +``` | |
| 267 | +2:0L@.7(21cm,0)+1L@.7(21cm,14.85cm) | |
| 268 | +``` | |
| 269 | +To select all of the odd pages in reverse order, use: | |
| 270 | +``` | |
| 271 | +2:-0 | |
| 272 | +``` | |
| 273 | +To re-arrange pages for printing 2-up booklets, use | |
| 274 | +``` | |
| 275 | +4:-3L@.7(21cm,0)+0L@.7(21cm,14.85cm) | |
| 276 | +``` | |
| 277 | +for the front sides, and | |
| 278 | +``` | |
| 279 | +4:1L@.7(21cm,0)+-2L@.7(21cm,14.85cm) | |
| 280 | +``` | |
| 281 | +for the reverse sides (or join them with a comma for duplex printing). | |
| 282 | + | |
| 283 | +From issue #493 | |
| 284 | +``` | |
| 285 | + pdf2ps infile.pdf infile.ps | |
| 286 | + ps2ps -pa4 "2:0R(4.5cm,26.85cm)+1R(4.5cm,14.85cm)" infile.ps outfile.ps | |
| 287 | + ps2pdf outfile.ps outfile.pdf | |
| 288 | + ``` | |
| 289 | + | |
| 290 | +Notes on signatures (psbook). For a signature of size 3, we have the following assuming a 2-up | |
| 291 | +configuration that is printed double-sided so that, when the whole stack is placed face-up and | |
| 292 | +folded in half, page 1 is on top. | |
| 293 | +* front: 6,7, back: 8,5 | |
| 294 | +* front: 4,9, back: 10,3 | |
| 295 | +* front: 2,11, back: 12,1 | |
| 296 | + | |
| 297 | +This is the same as duplex 2-up with pages in order 6, 7, 8, 5, 4, 9, 10, 3, 2, 11, 12, 1 | |
| 298 | + | |
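The signature ordering above follows a simple pattern, sketched here for a signature of `sheets` sheets (4 pages per sheet). The function name is an assumption for illustration; it generalizes the listed n=3 case and is not taken from psbook or qpdf.

```python
def signature_order(sheets):
    """Page order for one signature, innermost sheet first.

    Each sheet contributes front-left, front-right, back-left, back-right,
    so the result can be fed to a duplex 2-up layout.
    """
    order = []
    for k in range(sheets):
        order += [
            2 * sheets - 2 * k,      # front, left
            2 * sheets + 1 + 2 * k,  # front, right
            2 * sheets + 2 + 2 * k,  # back, left
            2 * sheets - 1 - 2 * k,  # back, right
        ]
    return order
```

For `sheets=3` this reproduces the order 6, 7, 8, 5, 4, 9, 10, 3, 2, 11, 12, 1 given above.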
| 299 | +n-up: | |
| 300 | +* For 2-up, calculate new w and h such that w/h maintains a fixed ratio and w and h are the largest | |
| 301 | + values that can fit within 1/2 the page with specified margins. | |
| 302 | +* Can support 1, 2, 4, 6, 9, 16. 2 and 6 require rotation. The others don't. Will probably need to | |
| 303 | + change getFormXObjectForPage to handle other boxes than trim box. | |
| 304 | +* Maybe define n-up as a scale and rotate followed by fitting the result into a specified rectangle. I | |
| 305 | + might already have this logic in QPDFAnnotationObjectHelper::getPageContentForAppearance. | |
| 306 | + | |
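The n-up sizing rule above (largest w and h that keep the aspect ratio and fit the cell with margins) can be sketched as follows. The function names, the uniform margin on all four sides, and the choice to split a portrait sheet into two stacked cells for 2-up are assumptions made for the example.

```python
def fit_scale(page_w, page_h, cell_w, cell_h, margin=0.0):
    """Largest uniform scale at which a page fits in a cell with margins."""
    avail_w = cell_w - 2 * margin
    avail_h = cell_h - 2 * margin
    if avail_w <= 0 or avail_h <= 0:
        raise ValueError("margins leave no room in the cell")
    # A single factor for both axes keeps w/h fixed.
    return min(avail_w / page_w, avail_h / page_h)


def two_up_scale(sheet_w, sheet_h, page_w, page_h, margin=0.0):
    """Scale for 2-up on a portrait sheet: two stacked cells, page rotated 90 degrees."""
    cell_w, cell_h = sheet_w, sheet_h / 2
    # Rotation swaps the page's width and height before fitting.
    return fit_scale(page_h, page_w, cell_w, cell_h, margin)
```

For 4-up on a US Letter sheet, each cell is 306x396 points and `fit_scale(612, 792, 306, 396)` yields 0.5 with no rotation needed. The 2-up case needs the rotation to make reasonable use of the half-sheet cell.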
| 307 | +## Destinations | |
| 308 | + | |
| 309 | +We will have to keep track of destinations that point to a page when the page is moved or copied. | |
| 310 | +For example, if an outline has a destination that points to a particular rectangle on page 5 of the | |
| 311 | +second file, and we end up dropping a portion of that page into an n-up configuration on a specific | |
| 312 | +output page, we will have to keep track of enough information to replace the destination with a new | |
| 313 | +one that points to the new physical location of the same material. For another example, consider a | |
| 314 | +case in which the left side of page 3 of the primary input ends up as page 5 of the output and the | |
| 315 | +right side of page 3 ends up as page 6. We would have to map destinations from a single source page | |
| 316 | +to different destination pages based on which part of the page it was on. If part of the rectangle | |
| 317 | +points to one page and part to another, what do we do? I suggest we go with the top/center of the | |
| 318 | +rectangle. | |
| 319 | + | |
| 320 | +A destination consists of a QPDF, page object, and rectangle in user coordinates. When `QPDFJob` | |
| 321 | +copies a page or converts it to a form XObject, possibly with transformations applied, it will have | |
| 322 | +to be able to map a destination to the same triple (QPDF, page object, rectangle) on all pages that | |
| 323 | +contain data from the original page. When writing the final output, any destination that no longer | |
| 324 | +points anywhere should be dropped, and any destination that points to multiple places will need to | |
| 325 | +be handled according to some specification. | |
| 183 | 326 | |
| 184 | 327 | Whenever we create any new thing from a page, we create _derived page data_. Examples of derived |
| 185 | -page data would include a copy of the page and a form XObject created from a page. `QPDFAssembler` | |
| 186 | -will have to keep a mapping from any source page to all of its derived objects along with any | |
| 187 | -transformations or clipping. When a derived page data object is placed on a final page, that | |
| 188 | -information can be combined with the position and any transformations onto the final page to be able | |
| 189 | -to map any destination to a new one or to determine that it points outside of the visible area. | |
| 190 | - | |
| 191 | -If a source page is copied multiple times, then if exactly one copy is explicitly marked as the | |
| 192 | -target, that becomes the target. Otherwise, the first derived object to be placed becomes the | |
| 193 | -target. | |
| 194 | - | |
| 195 | -## Overall Structure | |
| 196 | - | |
| 197 | -A single instance of `QPDFAssembler` creates a single assembly job. `QPDFJob` can create one | |
| 198 | -assembly job but does other things, such as setting writer options, inspection operations, etc. An | |
| 199 | -assembly job consists of the following: | |
| 200 | -* Global document-level data handling information | |
| 201 | - * Mode | |
| 202 | - * intelligent: try to combine everything using latest capabilities of qpdf; this is the default | |
| 203 | - * legacy: document-level features are kept from primary input; this is for compatibility and can | |
| 204 | - be selected from the CLI | |
| 205 | -* Input sources | |
| 206 | - * File/password | |
| 207 | - * Whether to keep attachments: yes, no, if-all-pages (default) | |
| 208 | - * Empty | |
| 209 | -* Output mode | |
| 210 | - * Single file | |
| 211 | - * Split -- this must include definitions of the split groups | |
| 212 | -* Description of the output in terms of the input sources and some series of transformations | |
| 213 | - | |
| 214 | -## Cases to support | |
| 215 | - | |
| 216 | -Here is a list of cases that need to be expressible. | |
| 217 | - | |
| 218 | -* Create output by concatenating pages from page groups where each page group is pages specified by | |
| 219 | - a numeric range. This is what `--pages` does now. | |
| 220 | -* Collation, including different sized groups. | |
| 221 | -* Overlay/underlay, generalized to support a stack consisting of various underlays, the base page, | |
| 222 | - and various overlays, with flexibility around positioning. It should be natural to express | |
| 223 | - exactly what underlay and overlay do now. | |
| 224 | -* Split into groups of fixed size (what `--split-pages` does) with the ability to define split | |
| 225 | - groups based on other things, like outlines, article threads, and document structure | |
| 226 | -* Examples from the manual: | |
| 227 | - * `qpdf in.pdf --pages . a.pdf b.pdf:even -- out.pdf` | |
| 228 | - * `qpdf --empty --pages a.pdf b.pdf --password=x z-1 c.pdf 3,6` | |
| 229 | - * `qpdf --collate odd.pdf --pages . even.pdf -- all.pdf` | |
| 230 | - * `qpdf --collate --empty --pages odd.pdf even.pdf -- all.pdf` | |
| 231 | - * `qpdf --collate --empty --pages a.pdf 1-5 b.pdf 6-4 c.pdf r1 -- out.pdf` | |
| 232 | - * `qpdf --collate=2 --empty --pages a.pdf 1-5 b.pdf 6-4 c.pdf r1 -- out.pdf` | |
| 233 | - * `qpdf file2.pdf --pages file1.pdf 1-5 . 15-11 -- outfile.pdf` | |
| 234 | - * | |
| 235 | - ``` | |
| 236 | - qpdf --empty --copy-encryption=encrypted.pdf \ | |
| 237 | - --encryption-file-password=pass \ | |
| 238 | - --pages encrypted.pdf --password=pass 1 \ | |
| 239 | - ./encrypted.pdf --password=pass 1 -- \ | |
| 240 | - outfile.pdf | |
| 241 | - ``` | |
| 242 | - * `qpdf --collate=2,6 a.pdf --pages . b.pdf -- all.pdf` | |
| 243 | - * Take A 1-2, B 1-6, A 3-4, C 7-12, A 5-6, B 13-18, ... | |
| 244 | -* Ideas from pstops. The following is an excerpt from the pstops manual page. | |
| 245 | - | |
| 246 | - This section contains some sample re-arrangements. To put two pages on one sheet (of A4 paper), | |
| 247 | - the pagespec to use is: | |
| 248 | - ``` | |
| 249 | - 2:0L@.7(21cm,0)+1L@.7(21cm,14.85cm) | |
| 250 | - ``` | |
| 251 | - To select all of the odd pages in reverse order, use: | |
| 252 | - ``` | |
| 253 | - 2:โ0 | |
| 254 | - ``` | |
| 255 | - To re-arrange pages for printing 2-up booklets, use | |
| 256 | - ``` | |
| 257 | - 4:-3L@.7(21cm,0)+0L@.7(21cm,14.85cm) | |
| 258 | - ``` | |
| 259 | - for the front sides, and | |
| 260 | - ``` | |
| 261 | - 4:1L@.7(21cm,0)+-2L@.7(21cm,14.85cm) | |
| 262 | - ``` | |
| 263 | - for the reverse sides (or join them with a comma for duplex printing). | |
| 264 | -* From #493 | |
| 265 | - ``` | |
| 266 | - pdf2ps infile.pdf infile.ps | |
| 267 | - ps2ps -pa4 "2:0R(4.5cm,26.85cm)+1R(4.5cm,14.85cm)" infile.ps outfile.ps | |
| 268 | - ps2pdf outfile.ps outfile.pdf | |
| 269 | - ``` | |
| 270 | -* Like psbook. Signature size n: | |
| 271 | - * take groups of 4n | |
| 272 | - * shown for n=3 in order such that, if printed so that the front of the first page is on top, the | |
| 273 | - whole stack can be folded in half. | |
| 274 | - * front: 6,7, back: 8,5 | |
| 275 | - * front: 4,9, back: 10,3 | |
| 276 | - * front: 2,11, back: 12,1 | |
| 277 | - | |
| 278 | - This is the same as duplex 2-up with pages in order 6, 7, 8, 5, 4, 9, 10, 3, 2, 11, 12, 1 | |
| 279 | -* n-up: | |
| 280 | - * For 2-up, calculate new w and h such that w/h maintains a fixed ratio and w and h are the | |
| 281 | - largest values that can fit within 1/2 the page with specified margins. | |
| 282 | - * Can support 1, 2, 4, 6, 9, 16. 2 and 6 require rotation. The others don't. Will probably need to | |
| 283 | - change getFormXObjectForPage to handle other boxes than trim box. | |
| 284 | - * Maybe define n-up as a scale and rotate followed by fitting the result into a specified rectangle. | |
| 285 | - I might already have this logic in QPDFAnnotationObjectHelper::getPageContentForAppearance. | |
| 328 | +page data would include a copy of the page and a form XObject created from a page. We will have to | |
| 329 | +keep a mapping from any source page to all of its derived objects along with any transformations or | |
| 330 | +clipping. When a derived page data object is placed on a final page, that information can be | |
| 331 | +combined with the position and any transformations onto the final page to be able to map any | |
| 332 | +destination to a new one or to determine that it points outside of the visible area. There is | |
| 333 | +already code in placeFormXObject and the code that places appearance streams that deals with these | |
| 334 | +kinds of mappings. | |
| 335 | + | |
| 336 | +What do we do if a source page is copied multiple times? I think we will have to just make the new | |
| 337 | +destination point to the first place that the target appears with precedence going to the original | |
| 338 | +location. If we can detect this, we can give a warning. | |
| 339 | + | |
| 340 | +# Document-level Behavior | |
| 341 | + | |
| 342 | +Both merging and splitting contain logic, sometimes duplicated, to handle page labels, form fields, | |
| 343 | +and annotations. We will need to build logic for other things. This section is a rough breakdown of | |
| 344 | +the different things in the document catalog (plus the info dictionary, which is referenced from the | |
| 345 | +trailer) and how we may have to handle them. We will need to implement various ObjectHelper and | |
| 346 | +DocumentHelper classes. | |
| 347 | + | |
| 348 | +7.7.2 contains the list of all keys in the document catalog. | |
| 349 | + | |
| 350 | +Document-level structures to merge: | |
| 351 | +* Extensions | |
| 352 | + * Must be combination of Extensions from all input files | |
| 353 | +* PageLabels | |
| 354 | + * Ensure each page has its original label | |
| 355 | + * Allow post-processing | |
| 356 | +* Names -- see below | |
| 357 | + * Combine per tree | |
| 358 | + * May require disambiguation | |
| 359 | + * Page: TemplateInstantiated | |
| 360 | +* Dests | |
| 361 | + * Keep referenced destinations across all files | |
| 362 | + * May need to disambiguate or "flatten" or convert to named dests with the names tree | |
| 363 | +* Outlines | |
| 364 | +* Threads (easy) | |
| 365 | + * Page: B | |
| 366 | +* AcroForm | |
| 367 | +* StructTreeRoot | |
| 368 | + * Page: StructParents | |
| 369 | +* MarkInfo (see 14.7 - Logical Structure, 14.8 Tagged PDF) | |
| 370 | +* SpiderInfo | |
| 371 | + * Page: ID | |
| 372 | +* OutputIntents | |
| 373 | + * Page: OutputIntents | |
| 374 | +* PieceInfo | |
| 375 | + * Page: PieceInfo | |
| 376 | +* OCProperties | |
| 377 | +* Requirements | |
| 378 | +* AF (file specification dictionaries) | |
| 379 | + * Page: AF | |
| 380 | +* DPartRoot | |
| 381 | + * Page: DPart | |
| 382 | +* Version | |
| 383 | + * Maximum | |
| 384 | + | |
| 385 | +Things that stay with the first document that has one and/or will not be supported | |
| 386 | +* AA (Additional Actions) | |
| 387 | + * Would be possible to combine and let the first contributor win, but it probably wouldn't usually | |
| 388 | + be what we want. | |
| 389 | +* Info (not part of document catalog) | |
| 390 | +* ViewerPreferences | |
| 391 | +* PageLayout | |
| 392 | +* PageMode | |
| 393 | +* OpenAction | |
| 394 | +* URI | |
| 395 | +* Metadata | |
| 396 | +* Lang | |
| 397 | +* NeedsRendering | |
| 398 | +* Collection | |
| 399 | +* Perms | |
| 400 | +* Legal | |
| 401 | +* DSS | |
| 402 | + | |
| 403 | +Name dictionary (7.7.4) | |
| 404 | +* Dests | |
| 405 | +* AP (appearance streams) | |
| 406 | +* JavaScript | |
| 407 | +* Pages (named pages) | |
| 408 | +* Templates | |
| 409 | + * Combine across all documents | |
| 410 | + * Page: TemplateInstantiated points to a named page | |
| 411 | +* IDS | |
| 412 | +* URLS | |
| 413 | +* EmbeddedFiles | |
| 414 | +* AlternatePresentations | |
| 415 | +* Renditions | |
| 416 | + | |
| 417 | +Most of chapter 12 applies. See Document-level navigation (12.3). | |
| 286 | 418 | |
| 287 | 419 | # Feature to Issue Mapping |
| 288 | 420 | |
| ... | ... | @@ -292,6 +424,8 @@ Last checked: 2023-12-29 |
| 292 | 424 | gh search issues label:pages --repo qpdf/qpdf --limit 200 --state=open |
| 293 | 425 | ``` |
| 294 | 426 | |
| 427 | +* Allow an existing `QPDF` to be an input to a merge operation when using the QPDFJob C++ API | |
| 428 | + * Issues: none | |
| 295 | 429 | * Generate a mapping from source to destination for all destinations |
| 296 | 430 | * Issues: #1077 |
| 297 | 431 | * Notes: |
| ... | ... | @@ -328,7 +462,7 @@ gh search issues label:pages --repo qpdf/qpdf --limit 200 --state=open |
| 328 | 462 | * This looks complicated. It may not be possible to do this fully in the first increment, but |
| 329 | 463 | we have to keep it in mind and warn if we can't and we see /SD in an action. |
| 330 | 464 | * #490 has some good analysis |
| 331 | -* Assign page labels | |
| 465 | +* Assign page labels (renumber pages) | |
| 332 | 466 | * Issues: #939 |
| 333 | 467 | * Notes: |
| 334 | 468 | * #939 has a good proposal |
| ... | ... | @@ -381,286 +515,3 @@ gh search issues label:pages --repo qpdf/qpdf --limit 200 --state=open |
| 381 | 515 | * There is some helpful discussion in #343 including |
| 382 | 516 | * Preserving open/closed status |
| 383 | 517 | * Preserving javascript actions |
| 384 | - | |
| 385 | -# XXX OLD NOTES | |
| 386 | - | |
| 387 | -I want to encapsulate various aspects of the logic into interfaces that can be implemented by | |
| 388 | -developers to add their own logic. It should be easy to contribute these. Here are some rough ideas. | |
| 389 | - | |
| 390 | -A source is an input file, the output of another operation, or a blank page. In the API, it can be | |
| 391 | -any QPDF object. | |
| 392 | - | |
| 393 | -A page group is just a group of pages. | |
| 394 | - | |
| 395 | -* PageSelector -- creates page groups from other page groups | |
| 396 | -* PageTransformer -- selects a part of a page and possibly transforms it; applies to all pages of a | |
| 397 | - group. Based on the page dictionary; does not look at the content stream | |
| 398 | -* PageFilter -- apply arbitrary code to a page; may access the content stream | |
| 399 | -* PageAssembler -- combines pages from groups into new groups whose pages are each assembled from | |
| 400 | - corresponding pages of the input groups | |
| 401 | - | |
| 402 | -These should be able to be composed in arbitrary ways. There should be a natural API for doing this, | |
| 403 | -and there should be some specification, probably based on JSON, that can be provided on the | |
| 404 | -command line or embedded in the job JSON format. I have been considering whether a lisp-like | |
| 405 | -S-expression syntax may be less cumbersome to work with. I'll have to decide whether to support this | |
| 406 | -or some other syntax in addition to a JSON representation. | |
| 407 | - | |
| 408 | -There also needs to be something to represent how document-level structures relate to this. I'm not | |
| 409 | -sure exactly how this should work, but we need things like | |
| 410 | -* what to do with page labels, especially when assembling pages from other pages | |
| 411 | -* whether to preserve destinations (outlines, links, etc.), particularly when pages are duplicated | |
| 412 | - * If A refers to B and there is more than one copy of B, how do you decide which copies of A link | |
| 413 | - to which copies of B? | |
| 414 | -* what to do with pages that belong to more than one group, e.g., what happens if you used document | |
| 415 | - structure or outlines to form page groups and a group boundary lies in the middle of the page | |
| 416 | - | |
| 417 | -Maybe page groups can have arbitrary, user-defined tags so we can specify that links should only | |
| 418 | -point to other pages with the same value of some tag. We can probably allow many-to-one links if the | |
| 419 | -source is duplicated. | |
| 420 | - | |
| 421 | -We probably need to hold onto the concept of the primary input file. If there is a primary input | |
| 422 | -file, there may need to be a way to specify what gets preserved from it. The behavior of qpdf prior to | |
| 423 | -all of this is to preserve all document-level constructs from the primary input file and to try to | |
| 424 | -preserve page labels from other input files when combining pages. | |
| 425 | - | |
| 426 | -Here are some examples. | |
| 427 | - | |
| 428 | -* PageSelector | |
| 429 | - * all pages from an input file | |
| 430 | - * pages from a group using a NumericRange | |
| 431 | - * concatenate groups | |
| 432 | - * pages from a group in reverse order | |
| 433 | - * a group repeated as often as necessary until a specified number of pages is reached | |
| 434 | - * a group padded with blank pages to create a multiple of n pages | |
| 435 | - * odd or even pages from a group | |
| 436 | - * every nth page from a group | |
| 437 | - * pages interleaved from multiple groups | |
| 438 | - * the left-front (left-back, right-front, right-back) pages of a booklet with signatures of n | |
| 439 | - pages | |
| 440 | - * all pages reachable from a section of the outline hierarchy or something based on threads or | |
| 441 | - other structure | |
| 442 | - * selection based on page labels | |
| 443 | -* PageTransformer | |
| 444 | - * clip to media box (trim box, crop box, etc.) | |
| 445 | - * clip to specific absolute or relative size | |
| 446 | - * scale | |
| 447 | - * translate | |
| 448 | - * rotate | |
| 449 | - * apply transformation matrix | |
| 450 | -* PageFilter | |
| 451 | - * optimize images | |
| 452 | - * flatten annotations | |
| 453 | -* PageAssembler | |
| 454 | - * Overlay/underlay all pages from one group onto corresponding pages from another group | |
| 455 | - * Control placement based on properties of all the groups, so higher order than a stand-alone | |
| 456 | - transformer | |
| 457 | - * Examples | |
| 458 | - * Scale the smaller page up to the size of the larger page | |
| 459 | - * Center the smaller page horizontally and bottom-align the trim boxes | |
| 460 | - * Generalized overlay/underlay allowing n pages in a given order with transformations. | |
| 461 | - * n-up -- application of generalized overlay/underlay | |
| 462 | - * make one long page with an arbitrary number of pages one after the other (#546) | |
| 463 | - | |
| 464 | -It should be possible to represent all of the existing qpdf operations using the above framework. It | |
| 465 | -would be good to re-implement all of them in terms of this framework to exercise it. We will have to | |
| 466 | -look through all the command-line arguments and make sure. Of course also make sure suggestions from | |
| 467 | -issues can be implemented or at least supported by adding new selectors. | |
| 468 | - | |
| 469 | -Here are a few bits of scratch work. The top-level call is a selector. This doesn't capture | |
| 470 | -everything. Implementing this would be tedious and challenging. It could be done using JSON arrays, | |
| 471 | -but it would be clunky. This feels over-designed and possibly in conflict with QPDFJob. | |
| 472 | - | |
| 473 | -``` | |
| 474 | -(concat | |
| 475 | - (primary-input) | |
| 476 | - (file "file2.pdf") | |
| 477 | - (page-range (file "file3.pdf") "1-4,5-8") | |
| 478 | -) | |
| 479 | - | |
| 480 | -(with | |
| 481 | - ("a" | |
| 482 | - (concat | |
| 483 | - (primary-input) | |
| 484 | - (file "file2.pdf") | |
| 485 | - (page-range (file "file3.pdf") "1-4,5-8") | |
| 486 | - ) | |
| 487 | - ) | |
| 488 | - (concat | |
| 489 | - (even-pages (from "a")) | |
| 490 | - (reverse (odd-pages (from "a"))) | |
| 491 | - ) | |
| 492 | -) | |
| 493 | - | |
| 494 | -(with | |
| 495 | - ("a" | |
| 496 | - (concat | |
| 497 | - (primary-input) | |
| 498 | - (file "file2.pdf") | |
| 499 | - (page-range (file "file3.pdf") "1-4,5-8") | |
| 500 | - ) | |
| 501 | - "b-even" | |
| 502 | - (even-pages (from "a")) | |
| 503 | - "b-odd" | |
| 504 | - (reverse (odd-pages (from "a"))) | |
| 505 | - ) | |
| 506 | - (stack | |
| 507 | - (repeat-range (from "a") "z") | |
| 508 | - (pad-end (from "b")) | |
| 509 | - ) | |
| 510 | -) | |
| 511 | -``` | |
| 512 | - | |
| 513 | -```json | |
| 514 | - | |
| 515 | -``` | |
| 516 | - | |
| 517 | -# Supporting Document-level Features | |
| 518 | - | |
| 519 | -qpdf needs full support for document-level features like article threads, outlines, etc. There is no | |
| 520 | -support for some things and partial support for others. See notes below for a comprehensive list. | |
| 521 | - | |
| 522 | -Most likely, this will be done by creating DocumentHelper and ObjectHelper classes. | |
| 523 | - | |
| 524 | -It will be necessary not only to read information about these structures from a single PDF file as | |
| 525 | -the existing document helpers do but also to reconstruct or update these based on modifications to | |
| 526 | -the pages in a file. I'm not sure how to do that, but one idea would be to allow a document helper | |
| 527 | -to register a callback with QPDFPageDocumentHelper that notifies it when a page is added or removed. | |
| 528 | -This may be able to take other parameters such as a document helper from a foreign file. | |
| 529 | - | |
| 530 | -Since these operations can be expensive, there will need to be a way to opt in and out. The default | |
| 531 | -(to be clearly documented) should be that all supported document-level constructs are preserved. | |
| 532 | -That way, as new features are added, changes to the output of previous operations to include | |
| 533 | -information that was previously omitted will not constitute a non-backward compatible change that | |
| 534 | -requires a major version bump. This will be a default for the API when using the higher-level page | |
| 535 | -assembly API (below) as well as the CLI. | |
| 536 | - | |
| 537 | -There will also need to be some kind of support for features that are document-level and not tied to | |
| 538 | -any pages, such as (sometimes) embedded files. When splitting/merging files, there needs to be a way | |
| 539 | -to specify what should happen with those things. Perhaps the default here should be that these are | |
| 540 | -preserved from files from which all pages are selected. For some things, like viewer preferences, it | |
| 541 | -may make sense to take them from the first file. | |
| 542 | - | |
| 543 | -# Page Assembly (page selection) | |
| 544 | - | |
| 545 | -In addition to the existing numeric ranges of page numbers, page selection could be driven by | |
| 546 | -document-level features like the outlines hierarchy or article threads. There have been a lot of | |
| 547 | -suggestions about this in various tickets. There will need to be some kind of page manipulation | |
| 548 | -class with configuration options. I'm thinking something similar to QPDFJob, where you construct a | |
| 549 | -class and then call a bunch of methods to configure it, including the ability to configure with | |
| 550 | -JSON. Several suggestions have been made in issues, which I will go through and distill into a list. | |
| 551 | -Off hand, some ideas include being able to split based on explicit chunks and being able to do all | |
| 552 | -pages except a list of pages. | |
| 553 | - | |
| 554 | -For CLI, I'm probably going to have it take a JSON blob or JSON file on the CLI rather than having | |
| 555 | -some absurd way of doing it with arguments (people have a lot of trouble with --pages as it is). See | |
| 556 | -TODO for a feature on command-line/job JSON support for JSON specification arguments. | |
| 557 | - | |
| 558 | -There are some other things, like allowing n-up and generalizing overlay/underlay to allow different | |
| 559 | -placement and scaling options, that I think may also be in scope. | |
| 560 | - | |
| 561 | -# Scaling/Transforming Pages | |
| 562 | - | |
| 563 | -* Keep in mind that destinations, such as links and outlines, may need to be adjusted when a page is | |
| 564 | - scaled or otherwise transformed. | |
| 565 | - | |
| 566 | -# Notes | |
| 567 | - | |
| 568 | -PDF document structure | |
| 569 | - | |
| 570 | -The trailer contains the catalog and the Info dictionary. We probably need to do something | |
| 571 | -intelligent with the info dictionary. | |
| 572 | - | |
| 573 | -7.7.2 contains the list of all keys in the document catalog. | |
| 574 | - | |
| 575 | -Document-level structures to merge: | |
| 576 | -* Extensions | |
| 577 | - * Must be combination of Extensions from all input files | |
| 578 | -* PageLabels | |
| 579 | - * Ensure each page has its original label | |
| 580 | - * Allow post-processing | |
| 581 | -* Names -- see below | |
| 582 | - * Combine per tree | |
| 583 | - * May require disambiguation | |
| 584 | - * Page: TemplateInstantiated | |
| 585 | -* Dests | |
| 586 | - * Keep referenced destinations across all files | |
| 587 | - * May need to disambiguate or "flatten" or convert to named dests with the names tree | |
| 588 | -* Outlines | |
| 589 | -* Threads (easy) | |
| 590 | - * Page: B | |
| 591 | -* AcroForm | |
| 592 | -* StructTreeRoot | |
| 593 | - * Page: StructParents | |
| 594 | -* MarkInfo (see 14.7 - Logical Structure, 14.8 Tagged PDF) | |
| 595 | -* SpiderInfo | |
| 596 | - * Page: ID | |
| 597 | -* OutputIntents | |
| 598 | - * Page: OutputIntents | |
| 599 | -* PieceInfo | |
| 600 | - * Page: PieceInfo | |
| 601 | -* OCProperties | |
| 602 | -* Requirements | |
| 603 | -* AF (file specification dictionaries) | |
| 604 | - * Page: AF | |
| 605 | -* DPartRoot | |
| 606 | - * Page: DPart | |
| 607 | -* Version | |
| 608 | - * Maximum | |
| 609 | - | |
| 610 | -Things that stay with the first document that has one and/or will not be supported | |
| 611 | -* AA (Additional Actions) | |
| 612 | - * Would be possible to combine and let the first contributor win, but it probably wouldn't usually | |
| 613 | - be what we want. | |
| 614 | -* Info (not part of document catalog) | |
| 615 | -* ViewerPreferences | |
| 616 | -* PageLayout | |
| 617 | -* PageMode | |
| 618 | -* OpenAction | |
| 619 | -* URI | |
| 620 | -* Metadata | |
| 621 | -* Lang | |
| 622 | -* NeedsRendering | |
| 623 | -* Collection | |
| 624 | -* Perms | |
| 625 | -* Legal | |
| 626 | -* DSS | |
| 627 | - | |
| 628 | -Name dictionary (7.7.4) | |
| 629 | -* Dests | |
| 630 | -* AP (appearance streams) | |
| 631 | -* JavaScript | |
| 632 | -* Pages (named pages) | |
| 633 | -* Templates | |
| 634 | - * Combine across all documents | |
| 635 | - * Page: TemplateInstantiated points to a named page | |
| 636 | -* IDS | |
| 637 | -* URLS | |
| 638 | -* EmbeddedFiles | |
| 639 | -* AlternatePresentations | |
| 640 | -* Renditions | |
| 641 | - | |
| 642 | -Most of chapter 12 applies. | |
| 643 | - | |
| 644 | -Document-level navigation (12.3) | |
| 645 | - | |
| 646 | -QPDF will need a global way to reference a page. This will most likely be in the form of the QPDF | |
| 647 | -uuid and a QPDFObjectHandle to the page. If this can just be a QPDFObjectHandle, that would be | |
| 648 | -better. I need to make sure we can meaningfully interact with QPDFObjectHandle objects from multiple | |
| 649 | -QPDFs in a safe fashion. Figure out how this works with immediateCopyFrom, etc. Better to avoid this | |
| 650 | -whole thing and make sure that we just keep all the document-level stuff specific to a PDF, but we | |
| 651 | -will need to have some internal representation that can be used to reconstruct the document-level | |
| 652 | -dictionaries when writing. Making this work with structures (structure destinations) will require | |
| 653 | -more indirection. | |
| 654 | - | |
| 655 | -I imagine that there will be some internal representation of what document-level things come along | |
| 656 | -for the ride when we take a page from a document. I wonder whether this needs to change the way | |
| 657 | -linearization works. | |
| 658 | - | |
| 659 | -There should be different ways to specify collections of pages. The existing one, which is using a | |
| 660 | -numeric range, is just one. Other ideas include things related to document structure (all pages in | |
| 661 | -an article thread, all pages in an outline hierarchy), page labels, book binding (Is that called | |
| 662 | -folio? There's an issue for it.), subgroups, or any number of things. | |
| 663 | - | |
| 664 | -We will need to be able to start with document-level objects to get page groups and also to start | |
| 665 | -with pages and reconstruct document level objects. For example, it should be possible to reconstruct | |
| 666 | -article threads to omit beads that don't belong to any of the pages. Likewise with outlines. | ... | ... |