Commit 9ae7bdea966102f9621b22192747a891078e7470

Authored by m-holger
1 parent f13947de

Reflow TODO.md to line length 100

Showing 1 changed file with 485 additions and 655 deletions
@@ -21,105 +21,85 @@ Contents
21 Always 21 Always
22 ====== 22 ======
23 23
24 -* Evaluate issues tagged with `next` and `bug`. Remember to check  
25 - discussions and pull requests in addition to regular issues.  
26 -* When close to release, make sure external-libs is building and  
27 - follow instructions in ../external-libs/README 24 +* Evaluate issues tagged with `next` and `bug`. Remember to check discussions and pull requests in
  25 + addition to regular issues.
  26 +* When close to release, make sure external-libs is building and follow instructions in
  27 + ../external-libs/README
28 28
29 Next 29 Next
30 ==== 30 ====
31 31
32 -* Fix #874 -- make args in --encrypt to match the json and make  
33 - positional fill in the gaps 32 +* Fix #874 -- make args in --encrypt to match the json and make positional fill in the gaps
34 * Maybe fix #553 -- use file times for attachments 33 * Maybe fix #553 -- use file times for attachments
35 * std::string_view transition -- work being done by m-holger 34 * std::string_view transition -- work being done by m-holger
36 -* Break ground on "Document-level work" -- TODO-pages.md lives on a  
37 - separate branch.  
38 -* Standard for CLI and Job JSON support for JSON-based command-line  
39 - arguments. Come up with a standard way of supporting command-line  
40 - arguments that take JSON specifications of things so that  
41 - * there is a predictable way to indicate whether an argument is a  
42 - file or a JSON blob  
43 - * with QPDFJob JSON, make sure it is possible to directly include  
44 - the JSON rather than having to stringify a JSON blob  
45 - * One option might be to prepend file:// to a filename or otherwise  
46 - to take a JSON blob. We could have that as a particular type of  
47 - argument that would behave properly for both job JSON and CLI.  
48 - 35 +* Break ground on "Document-level work" -- TODO-pages.md lives on a separate branch.
  36 +* Standard for CLI and Job JSON support for JSON-based command-line arguments. Come up with a
  37 + standard way of supporting command-line arguments that take JSON specifications of things so that
  38 + * there is a predictable way to indicate whether an argument is a file or a JSON blob
  39 + * with QPDFJob JSON, make sure it is possible to directly include the JSON rather than having to
  40 + stringify a JSON blob
  41 + * One option might be to prepend file:// to a filename or otherwise to take a JSON blob. We could
  42 + have that as a particular type of argument that would behave properly for both job JSON and CLI.
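Reviewer sketch for the `file://` idea in the item above: one way an argument type could decide between "file name" and "inline JSON blob" for both the CLI and job JSON. This is only an illustration under stated assumptions. `JsonArgKind`, `JsonArg`, and `classifyJsonArg` are hypothetical names that do not exist in qpdf, and the rule "anything without the prefix is a JSON blob" is just one possible reading of the TODO item.

```cpp
#include <string>
#include <string_view>

// Hypothetical illustration only; none of these names exist in qpdf.
enum class JsonArgKind { file, inline_json };

struct JsonArg
{
    JsonArgKind kind;
    std::string value; // file name with the prefix removed, or the JSON text itself
};

// Treat an argument that starts with "file://" as a file name and anything
// else as a literal JSON blob.
inline JsonArg
classifyJsonArg(std::string_view arg)
{
    constexpr std::string_view prefix = "file://";
    // substr clamps the count, so this is safe for arguments shorter than the prefix.
    if (arg.substr(0, prefix.size()) == prefix) {
        return {JsonArgKind::file, std::string(arg.substr(prefix.size()))};
    }
    return {JsonArgKind::inline_json, std::string(arg)};
}
```

Because the decision depends only on the argument text, the same helper could in principle back a CLI option and a job JSON field, which is the "behave properly for both" property the item asks for.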
49 43
50 Possible future JSON enhancements 44 Possible future JSON enhancements
51 ================================= 45 =================================
52 46
53 -* Consider not including unreferenced objects and trimming the trailer  
54 - in the same way that QPDFWriter does (except don't remove `/ID`).  
55 - This means excluding the linearization dictionary and hint stream,  
56 - the encryption dictionary, all keys from trailer that are removed by  
57 - QPDFWriter::getTrimmedTrailer except `/ID`, any object streams, and  
58 - the xref stream as long as all those objects are unreferenced. (They  
59 - always should be, but there could be some bizarre case of someone  
60 - creating a PDF file that has an indirect reference to one of those,  
61 - in which case we need to preserve it.) If this is done, make  
62 - `--preserve-unreferenced` preserve unreference objects and also  
63 - those extra keys. Search for "linear" and "trailer" in json.rst to  
64 - update the various places in the documentation that discuss this.  
65 - Also update the help for --json and --preserve-unreferenced.  
66 -  
67 -* Add to JSON output the information available from a few additional  
68 - informational options: 47 +* Consider not including unreferenced objects and trimming the trailer in the same way that
  48 + QPDFWriter does (except don't remove `/ID`). This means excluding the linearization dictionary and
  49 + hint stream, the encryption dictionary, all keys from trailer that are removed by QPDFWriter::
  50 + getTrimmedTrailer except `/ID`, any object streams, and the xref stream as long as all those
  51 + objects are unreferenced. (They always should be, but there could be some bizarre case of someone
  52 + creating a PDF file that has an indirect reference to one of those, in which case we need to
  53 + preserve it.) If this is done, make
  54 + `--preserve-unreferenced` preserve unreference objects and also those extra keys. Search for "
  55 + linear" and "trailer" in json.rst to update the various places in the documentation that discuss
  56 + this. Also update the help for --json and --preserve-unreferenced.
  57 +
  58 +* Add to JSON output the information available from a few additional informational options:
69 59
70 * --check: add but maybe not by default? 60 * --check: add but maybe not by default?
71 61
72 - * --show-linearization: add but maybe not by default? Also figure  
73 - out whether warnings reported for some of the PDF specs (1.7) are  
74 - qpdf problems. This may not be worth adding in the first 62 + * --show-linearization: add but maybe not by default? Also figure out whether warnings reported
  63 + for some of the PDF specs (1.7) are qpdf problems. This may not be worth adding in the first
75 increment. 64 increment.
76 65
77 * --show-xref: add 66 * --show-xref: add
78 67
79 -* Consider having --check, --show-encryption, etc., just select the  
80 - right keys when in json mode. I don't think I want check on by  
81 - default, so that might be different. 68 +* Consider having --check, --show-encryption, etc., just select the right keys when in json mode. I
  69 + don't think I want check on by default, so that might be different.
82 70
83 -* Consider having warnings be included in the json in a "warnings" key  
84 - in json mode. 71 +* Consider having warnings be included in the json in a "warnings" key in json mode.
85 72
86 QPDFJob 73 QPDFJob
87 ======= 74 =======
88 75
89 -Here are some ideas for QPDFJob that didn't make it into 10.6. Not all  
90 -of these are necessarily good -- just things to consider.  
91 -  
92 -* How do we chain jobs? The idea would be that the input and/or output  
93 - of a QPDFJob could be a QPDF object rather than a file. For input,  
94 - it's pretty easy. For output, none of the output-specific options  
95 - (encrypt, compress-streams, objects-streams, etc.) would have any  
96 - affect, so we would have to treat this like inspect for error  
97 - checking. The QPDF object in the state where it's ready to be sent  
98 - off to QPDFWriter would be used as the input to the next QPDFJob.  
99 - For the job json, I think we can have the output be an identifier  
100 - that can be used as the input for another QPDFJob. For a json file,  
101 - we could the top level detect if it's an array with the convention  
102 - that exactly one has an output, or we could have a subkey with other  
103 - job definitions or something. Ideally, any input  
104 - (copy-attachments-from, pages, etc.) could use a QPDF object. It  
105 - wouldn't surprise me if this exposes bugs in qpdf around foreign  
106 - streams as this has been a relatively fragile area before. 76 +Here are some ideas for QPDFJob that didn't make it into 10.6. Not all of these are necessarily
  77 +good -- just things to consider.
  78 +
  79 +* How do we chain jobs? The idea would be that the input and/or output of a QPDFJob could be a QPDF
  80 + object rather than a file. For input, it's pretty easy. For output, none of the output-specific
  81 + options
  82 + (encrypt, compress-streams, objects-streams, etc.) would have any affect, so we would have to
  83 + treat this like inspect for error checking. The QPDF object in the state where it's ready to be
  84 + sent off to QPDFWriter would be used as the input to the next QPDFJob. For the job json, I think
  85 + we can have the output be an identifier that can be used as the input for another QPDFJob. For a
  86 + json file, we could the top level detect if it's an array with the convention that exactly one has
  87 + an output, or we could have a subkey with other job definitions or something. Ideally, any input
  88 + (copy-attachments-from, pages, etc.) could use a QPDF object. It wouldn't surprise me if this
  89 + exposes bugs in qpdf around foreign streams as this has been a relatively fragile area before.
107 90
108 Documentation 91 Documentation
109 ============= 92 =============
110 93
111 * Do a full pass through the documentation. 94 * Do a full pass through the documentation.
112 95
113 - * Make sure `qpdf` is consistent. Use QPDF when just referring to  
114 - the package. 96 + * Make sure `qpdf` is consistent. Use QPDF when just referring to the package.
115 * Make sure markup is consistent 97 * Make sure markup is consistent
116 * Autogenerate where possible 98 * Autogenerate where possible
117 - * Consider which parts might be good candidates for moving to the  
118 - wiki. 99 + * Consider which parts might be good candidates for moving to the wiki.
119 100
120 -* Commit 'Manual - enable line wrapping in table cells' from  
121 - Mon Jan 17 12:22:35 2022 +0000 enables table cell wrapping. See if  
122 - this can be incorporated directly into sphinx_rtd_theme and the 101 +* Commit 'Manual - enable line wrapping in table cells' from Mon Jan 17 12:22:35 2022 +0000 enables
  102 + table cell wrapping. See if this can be incorporated directly into sphinx_rtd_theme and the
123 workaround can be removed. 103 workaround can be removed.
124 104
125 * When possible, update the debian package to include docs again. See 105 * When possible, update the debian package to include docs again. See
@@ -130,76 +110,62 @@ Document-level work
130 110
131 * Ideas here may by superseded by #593. 111 * Ideas here may by superseded by #593.
132 112
133 -* QPDFPageCopier -- object for moving pages around within files or  
134 - between files and performing various transformations. Reread/rewrite 113 +* QPDFPageCopier -- object for moving pages around within files or between files and performing
  114 + various transformations. Reread/rewrite
135 _page-selection in the manual if needed. 115 _page-selection in the manual if needed.
136 116
137 * Handle all the stuff of pages and split-pages 117 * Handle all the stuff of pages and split-pages
138 * Do n-up, booklet, collation 118 * Do n-up, booklet, collation
139 * Look through cli and see what else...flatten-*? 119 * Look through cli and see what else...flatten-*?
140 - * See comments in QPDFPageDocumentHelper.hh for addPage -- search  
141 - for "a future version". 120 + * See comments in QPDFPageDocumentHelper.hh for addPage -- search for "a future version".
142 * Make it efficient for bulk operations 121 * Make it efficient for bulk operations
143 * Make certain doc-level features selectable 122 * Make certain doc-level features selectable
144 - * qpdf.cc should do all its page operations, including  
145 - overlay/underlay, splitting, and merging, using this 123 + * qpdf.cc should do all its page operations, including overlay/underlay, splitting, and merging,
  124 + using this
146 * There should also be example code 125 * There should also be example code
147 126
148 -* After doc-level checks are in, call --check on the output files in  
149 - the "Copy Annotations" tests. 127 +* After doc-level checks are in, call --check on the output files in the "Copy Annotations" tests.
150 128
151 -* Document-level checks. For example, for forms, make sure all form  
152 - fields point to an annotation on exactly one page as well as that  
153 - all widget annotations are associated with a form field. Hook this  
154 - into QPDFPageCopier as well as the doc helpers. Make sure it is  
155 - called from --check. 129 +* Document-level checks. For example, for forms, make sure all form fields point to an annotation on
  130 + exactly one page as well as that all widget annotations are associated with a form field. Hook
  131 + this into QPDFPageCopier as well as the doc helpers. Make sure it is called from --check.
156 132
157 * See also issues tagged with "pages". Include closed issues. 133 * See also issues tagged with "pages". Include closed issues.
158 134
159 -* Add flags to CLI to select which document-level options to  
160 - preserve or not preserve. We will probably need a pair of mutually  
161 - exclusive, repeatable options with a way to specify all, none, only  
162 - {x,y}, or all but {x,y}. 135 +* Add flags to CLI to select which document-level options to preserve or not preserve. We will
  136 + probably need a pair of mutually exclusive, repeatable options with a way to specify all, none,
  137 + only {x,y}, or all but {x,y}.
163 138
164 -* If a page contains a reference a file attachment annotation, when  
165 - that page is copied, if the file attachment appears in the top-level  
166 - EmbeddedFiles tree, that entry should be preserved in the  
167 - destination file. Otherwise, we probably will require the use of  
168 - --copy-attachments-from to preserve these. What will the strategy be  
169 - for deduplicating in the automatic case? 139 +* If a page contains a reference a file attachment annotation, when that page is copied, if the file
  140 + attachment appears in the top-level EmbeddedFiles tree, that entry should be preserved in the
  141 + destination file. Otherwise, we probably will require the use of --copy-attachments-from to
  142 + preserve these. What will the strategy be for deduplicating in the automatic case?
170 143
171 Text Appearance Streams 144 Text Appearance Streams
172 ======================= 145 =======================
173 146
174 -This is a list of known issues with text appearance streams and things  
175 -we might do about it.  
176 -  
177 -* For variable text, the spec says to pull any resources from /DR that  
178 - are referenced in /DA but if the resource dictionary already has  
179 - that resource, just use the one that's there. The current code looks  
180 - only for /Tf and adds it if needed. We might want to instead merge  
181 - /DR with resources and then remove anything that's unreferenced. We  
182 - have all the code required for that in ResourceFinder except  
183 - TfFinder also gets the font size, which ResourceFinder doesn't do.  
184 -  
185 -* There are things we are missing because we don't look at font  
186 - metrics. The code from TextBuilder (work) has almost everything in  
187 - it that is required. Once we have knowledge of character widths, we  
188 - can support quadding and multiline text fields (/Ff 4096), and we  
189 - can potentially squeeze text to fit into a field. For multiline,  
190 - first squeeze vertically down to the font height, then squeeze  
191 - horizontally with Tz. For single line, squeeze horizontally with Tz.  
192 - If we use Tz, issue a warning.  
193 -  
194 -* When mapping characters to widths, we will need to care about  
195 - character encoding. For built-in fonts, we can create a map from  
196 - Unicode code point to width and then go from the font's encoding to  
197 - unicode to the width. See misc/character-encoding/ (not on github)  
198 - and font metric information for the 14 standard fonts in my local  
199 - pdf-spec directory.  
200 -  
201 -* Once we know about character widths, we can correctly support  
202 - auto-sized variable text fields (0 Tf). If this is fixed, search for 147 +This is a list of known issues with text appearance streams and things we might do about it.
  148 +
  149 +* For variable text, the spec says to pull any resources from /DR that are referenced in /DA but if
  150 + the resource dictionary already has that resource, just use the one that's there. The current code
  151 + looks only for /Tf and adds it if needed. We might want to instead merge /DR with resources and
  152 + then remove anything that's unreferenced. We have all the code required for that in ResourceFinder
  153 + except TfFinder also gets the font size, which ResourceFinder doesn't do.
  154 +
  155 +* There are things we are missing because we don't look at font metrics. The code from TextBuilder (
  156 + work) has almost everything in it that is required. Once we have knowledge of character widths, we
  157 + can support quadding and multiline text fields (/Ff 4096), and we can potentially squeeze text to
  158 + fit into a field. For multiline, first squeeze vertically down to the font height, then squeeze
  159 + horizontally with Tz. For single line, squeeze horizontally with Tz. If we use Tz, issue a
  160 + warning.
  161 +
  162 +* When mapping characters to widths, we will need to care about character encoding. For built-in
  163 + fonts, we can create a map from Unicode code point to width and then go from the font's encoding
  164 + to unicode to the width. See misc/character-encoding/ (not on github)
  165 + and font metric information for the 14 standard fonts in my local pdf-spec directory.
  166 +
  167 +* Once we know about character widths, we can correctly support auto-sized variable text fields (0
  168 + Tf). If this is fixed, search for
203 "auto-sized" in cli.rst. 169 "auto-sized" in cli.rst.
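Reviewer sketch for the character-width items above: once a map from Unicode code point to glyph width is available, the natural width of a string and the `Tz` percentage needed to squeeze it into a field follow from simple arithmetic. The helper names `naturalWidth` and `horizontalScale` are hypothetical and not part of qpdf. Widths are assumed to be in thousandths of a text space unit, as in PDF font metrics. Character spacing and word spacing are ignored, and a code point missing from the map is treated as zero width, which is a simplifying assumption.

```cpp
#include <map>
#include <string>

// Hypothetical helper, not a qpdf API. The width map is supplied by the
// caller; widths are in 1/1000 of a text space unit per glyph.
inline double
naturalWidth(
    std::u32string const& text, std::map<char32_t, int> const& widths, double font_size)
{
    double total = 0;
    for (char32_t cp: text) {
        auto it = widths.find(cp);
        // Assumption: an unmapped code point contributes no width.
        total += (it == widths.end()) ? 0 : it->second;
    }
    return total * font_size / 1000.0;
}

// Horizontal scaling percentage to use with the Tz operator so that text of
// the given natural width fits within field_width. 100 means no squeezing.
inline double
horizontalScale(double natural_width, double field_width)
{
    if (natural_width <= 0 || natural_width <= field_width) {
        return 100.0;
    }
    return 100.0 * field_width / natural_width;
}
```

A caller would emit `Tz` only when the result is below 100, which is also the point at which the TODO suggests issuing a warning.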
204 170
205 Fuzz Errors 171 Fuzz Errors
@@ -215,367 +181,297 @@ External Libraries
215 181
216 Current state (10.0.2): 182 Current state (10.0.2):
217 183
218 -* qpdf/external-libs repository builds external-libs on a schedule.  
219 - It detects and downloads the latest versions of zlib, jpeg, and  
220 - openssl and creates source and binary distribution zip files in an  
221 - artifact called "distribution". 184 +* qpdf/external-libs repository builds external-libs on a schedule. It detects and downloads the
  185 + latest versions of zlib, jpeg, and openssl and creates source and binary distribution zip files in
  186 + an artifact called "distribution".
222 187
223 -* Releases in qpdf/external-libs are made manually. They contain  
224 - qpdf-external-libs-{bin,src}.zip. 188 +* Releases in qpdf/external-libs are made manually. They contain qpdf-external-libs-{bin,src}.zip.
225 189
226 -* The qpdf build finds the latest non-prerelease release and downloads  
227 - the qpdf-external-libs-*.zip files from the releases in the setup  
228 - stage. 190 +* The qpdf build finds the latest non-prerelease release and downloads the qpdf-external-libs-*.zip
  191 + files from the releases in the setup stage.
229 192
230 -* To upgrade to a new version of external-libs, create a new release  
231 - of qpdf/external-libs (see README-maintainer in external-libs) from  
232 - the distribution artifact of the most recent successful build after  
233 - ensuring that it works. 193 +* To upgrade to a new version of external-libs, create a new release of qpdf/external-libs (see
  194 + README-maintainer in external-libs) from the distribution artifact of the most recent successful
  195 + build after ensuring that it works.
234 196
235 Desired state: 197 Desired state:
236 198
237 -* The qpdf/external-libs repository should create release candidates.  
238 - Ideally, every scheduled run would make its zip files available. A  
239 - personal access token with actions:read scope for the  
240 - qpdf/external-libs repository is required to download the artifact  
241 - from an action run, and qpdf/qpdf's secrets.GITHUB_TOKEN doesn't  
242 - have this access. We could create a service account for this  
243 - purpose. As an alternative, we could have a draft release in  
244 - qpdf/external-libs that the qpdf/external-libs build could update  
245 - with each candidate. It may also be possible to solve this by  
246 - developing a simple GitHub app.  
247 -  
248 -* Scheduled runs of the qpdf build in the qpdf/qpdf repository (not a  
249 - fork or pull request) could download external-libs from the release  
250 - candidate area instead of the latest stable release. Pushes to the  
251 - build branch should still use the latest release so it always  
252 - matches the main branch.  
253 -  
254 -* Periodically, we would create a release of external-libs from the  
255 - release candidate zip files. This could be done safely because we  
256 - know the latest qpdf works with it. This could be done at least  
257 - before every release of qpdf, but potentially it could be done at  
258 - other times, such as when a new dependency version is available or  
259 - after some period of time. 199 +* The qpdf/external-libs repository should create release candidates. Ideally, every scheduled run
  200 + would make its zip files available. A personal access token with actions:read scope for the
  201 + qpdf/external-libs repository is required to download the artifact from an action run, and
  202 + qpdf/qpdf's secrets.GITHUB_TOKEN doesn't have this access. We could create a service account for
  203 + this purpose. As an alternative, we could have a draft release in qpdf/external-libs that the
  204 + qpdf/external-libs build could update with each candidate. It may also be possible to solve this
  205 + by developing a simple GitHub app.
  206 +
  207 +* Scheduled runs of the qpdf build in the qpdf/qpdf repository (not a fork or pull request) could
  208 + download external-libs from the release candidate area instead of the latest stable release.
  209 + Pushes to the build branch should still use the latest release so it always matches the main
  210 + branch.
  211 +
  212 +* Periodically, we would create a release of external-libs from the release candidate zip files.
  213 + This could be done safely because we know the latest qpdf works with it. This could be done at
  214 + least before every release of qpdf, but potentially it could be done at other times, such as when
  215 + a new dependency version is available or after some period of time.
260 216
261 Other notes: 217 Other notes:
262 218
263 -* The external-libs branch in qpdf/qpdf was never documented. We might  
264 - be able to get away with deleting it. 219 +* The external-libs branch in qpdf/qpdf was never documented. We might be able to get away with
  220 + deleting it.
265 221
266 -* See README-maintainer in qpdf/external-libs for information on  
267 - creating a release. This could be at least partially scripted in a  
268 - way that works for the qpdf/qpdf repository as well since they are  
269 - very similar. 222 +* See README-maintainer in qpdf/external-libs for information on creating a release. This could be
  223 + at least partially scripted in a way that works for the qpdf/qpdf repository as well since they
  224 + are very similar.
270 225
271 ABI Changes 226 ABI Changes
272 =========== 227 ===========
273 228
274 -This is a list of changes to make next time there is an ABI change.  
275 -Comments appear in the code prefixed by "ABI". 229 +This is a list of changes to make next time there is an ABI change. Comments appear in the code
  230 +prefixed by "ABI".
276 231
277 Always: 232 Always:
278 * Search for ABI in source and header files 233 * Search for ABI in source and header files
279 * Search for "[[deprecated" to find deprecated APIs that can be removed 234 * Search for "[[deprecated" to find deprecated APIs that can be removed
280 * Search for issues, pull requests, and discussions with the "abi" label 235 * Search for issues, pull requests, and discussions with the "abi" label
281 -* Check discussion "qpdf X planning" where X is the next major  
282 - version. This should be tagged `abi` 236 +* Check discussion "qpdf X planning" where X is the next major version. This should be tagged `abi`
283 237
284 For qpdf 12, see https://github.com/qpdf/qpdf/discussions/785 238 For qpdf 12, see https://github.com/qpdf/qpdf/discussions/785
285 239
286 C++ Version Changes 240 C++ Version Changes
287 =================== 241 ===================
288 242
289 -Use  
290 -// C++NN: ...  
291 -to mark places in the code that should be updated when we require at  
292 -least that version of C++. 243 +Use // C++NN: ... to mark places in the code that should be updated when we require at least that
  244 +version of C++.
293 245
294 Page splitting/merging 246 Page splitting/merging
295 ====================== 247 ======================
296 248
297 - * Update page splitting and merging to handle document-level  
298 - constructs with page impact such as interactive forms and article  
299 - threading. Check keys in the document catalog for others, such as  
300 - outlines, page labels, thumbnails, and zones. For threads,  
301 - Subramanyam provided a test file; see ../misc/article-threads.pdf.  
302 - Email Q-Count: 431864 from 2009-11-03.  
303 -  
304 - * bookmarks (outlines) 12.3.3  
305 - * support bookmarks when merging  
306 - * prune bookmarks that don't point to a surviving page when merging  
307 - or splitting  
308 - * make sure conflicting named destinations work possibly test by  
309 - including the same file by two paths in a merge  
310 - * see also comments in issue 343  
311 -  
312 - Note: original implementation of bookmark preservation for split  
313 - pages caused a very high performance hit. The problem was  
314 - introduced in 313ba081265f69ac9a0324f9fe87087c72918191 and reverted  
315 - in the commit that adds this paragraph. The revert includes marking  
316 - a few tests cases as $td->EXPECT_FAILURE. When properly coded, the  
317 - test cases will need to be adjusted to only include the parts of  
318 - the outlines that are actually copied. The tests in question are  
319 - "split page with outlines". When implementing properly, ensure that  
320 - the performance is not adversely affected by timing split-pages on  
321 - a large file with complex outlines such as the PDF specification.  
322 -  
323 - When pruning outlines, keep all outlines in the hierarchy that are  
324 - above an outline for a page we care about. If one of the ancestor  
325 - outlines points to a non-existent page, clear its dest. If an  
326 - outline does not have any children that point to pages in the  
327 - document, just omit it.  
328 -  
329 - Possible strategy:  
330 - * resolve all named destinations to explicit destinations  
331 - * concatenate top-level outlines  
332 - * prune outlines whose dests don't point to a valid page  
333 - * recompute all /Count fields  
334 -  
335 - Test files  
336 - * page-labels-and-outlines.pdf: old file with both page labels and  
337 - outlines. All destinations are explicit destinations. Each page  
338 - has Potato and a number. All titles are feline names.  
339 - * outlines-with-actions.pdf: mixture of explicit destinations,  
340 - named destinations, goto actions with explicit destinations, and  
341 - goto actions with named destinations; uses /Dests key in names  
342 - dictionary. Each page has Salad and a number. All titles are  
343 - silly words. One destination is an indirect object.  
344 - * outlines-with-old-root-dests.pdf: like outlines-with-actions  
345 - except it uses the PDF-1.1 /Dests dictionary for named  
346 - destinations, and each page has Soup and a number. Also pages are  
347 - numbered with upper-case Roman numerals starting with 0. All  
348 - titles are silly words preceded by a bullet.  
349 -  
350 - If outline handling is significantly improved, see  
351 - ../misc/bad-outlines/bad-outlines.pdf and email:  
352 - https://mail.google.com/mail/u/0/#search/rfc822msgid%3A02aa01d3d013%249f766990%24de633cb0%24%40mono.hr)  
353 -  
354 - * Form fields: should be similar to outlines. 249 +* Update page splitting and merging to handle document-level constructs with page impact such as
  250 + interactive forms and article threading. Check keys in the document catalog for others, such as
  251 + outlines, page labels, thumbnails, and zones. For threads, Subramanyam provided a test file; see
  252 + ../misc/article-threads.pdf. Email Q-Count: 431864 from 2009-11-03.
  253 +
  254 +* bookmarks (outlines) 12.3.3
  255 + * support bookmarks when merging
  256 + * prune bookmarks that don't point to a surviving page when merging or splitting
  257 + * make sure conflicting named destinations work possibly test by including the same file by two
  258 + paths in a merge
  259 + * see also comments in issue 343
  260 +
  261 + Note: original implementation of bookmark preservation for split pages caused a very high
  262 + performance hit. The problem was introduced in 313ba081265f69ac9a0324f9fe87087c72918191 and
  263 + reverted in the commit that adds this paragraph. The revert includes marking a few tests cases as
  264 + $td->EXPECT_FAILURE. When properly coded, the test cases will need to be adjusted to only include
  265 + the parts of the outlines that are actually copied. The tests in question are
  266 + "split page with outlines". When implementing properly, ensure that the performance is not
  267 + adversely affected by timing split-pages on a large file with complex outlines such as the PDF
  268 + specification.
  269 +
  270 + When pruning outlines, keep all outlines in the hierarchy that are above an outline for a page we
  271 + care about. If one of the ancestor outlines points to a non-existent page, clear its dest. If an
  272 + outline does not have any children that point to pages in the document, just omit it.
  273 +
  274 + Possible strategy:
  275 + * resolve all named destinations to explicit destinations
  276 + * concatenate top-level outlines
  277 + * prune outlines whose dests don't point to a valid page
  278 + * recompute all /Count fields
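Reviewer sketch of the pruning and recount steps of the strategy above, on a simplified in-memory tree rather than real PDF objects. `Outline` and `prune` are hypothetical names, not qpdf classes. The sketch assumes named destinations were already resolved to page numbers, treats every item as open so that the count is simply the number of surviving descendants (real `/Count` values are negative for closed items), and leaves the concatenation of top-level outlines out.

```cpp
#include <optional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical in-memory model of an outline item; not a qpdf class.
struct Outline
{
    std::optional<int> dest_page; // page of the explicit destination, if any
    std::vector<Outline> kids;
    int count{0}; // surviving descendants, with all items treated as open
};

// Prune the subtree rooted at o against the set of surviving pages.
// Returns false if o itself should be omitted.
inline bool
prune(Outline& o, std::set<int> const& surviving)
{
    std::vector<Outline> kept;
    int descendants = 0;
    for (auto& kid: o.kids) {
        if (prune(kid, surviving)) {
            descendants += 1 + kid.count;
            kept.push_back(std::move(kid));
        }
    }
    o.kids = std::move(kept);
    o.count = descendants;

    bool valid = o.dest_page && surviving.count(*o.dest_page) > 0;
    if (!valid) {
        // An item kept only as an ancestor has its destination cleared.
        o.dest_page.reset();
    }
    // Keep the item if it points to a surviving page or has a surviving descendant.
    return valid || !o.kids.empty();
}
```

Each item is visited once, so the cost is linear in the number of outline items, which is relevant given the performance problem described in the note above.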
  279 +
  280 + Test files
  281 + * page-labels-and-outlines.pdf: old file with both page labels and outlines. All destinations are
  282 + explicit destinations. Each page has Potato and a number. All titles are feline names.
  283 + * outlines-with-actions.pdf: mixture of explicit destinations, named destinations, goto actions
  284 + with explicit destinations, and goto actions with named destinations; uses /Dests key in names
  285 + dictionary. Each page has Salad and a number. All titles are silly words. One destination is an
  286 + indirect object.
  287 + * outlines-with-old-root-dests.pdf: like outlines-with-actions except it uses the PDF-1.1 /Dests
  288 + dictionary for named destinations, and each page has Soup and a number. Also pages are numbered
  289 + with upper-case Roman numerals starting with 0. All titles are silly words preceded by a bullet.
  290 +
  291 + If outline handling is significantly improved, see ../misc/bad-outlines/bad-outlines.pdf and
  292 + email:
  293 + https://mail.google.com/mail/u/0/#search/rfc822msgid%3A02aa01d3d013%249f766990%24de633cb0%24%40mono.hr)
  294 +
  295 +* Form fields: should be similar to outlines.
355 296
356 Analytics 297 Analytics
357 ========= 298 =========
358 299
359 -Consider features that make it easier to detect certain patterns in  
360 -PDF files. The information below could be computed using an external  
361 -program that reads the existing json, but if it's useful enough, we  
362 -could add it directly to the json output. 300 +Consider features that make it easier to detect certain patterns in PDF files. The information below
  301 +could be computed using an external program that reads the existing json, but if it's useful enough,
  302 +we could add it directly to the json output.
363 303
364 - * Add to "pages" in the json:  
365 - * "inheritsresources": bool; whether there are any inherited  
366 - attributes from ancestor page tree nodes  
367 - * "sharedresources": a list of indirect objects that are  
368 - "/Resources" dictionaries or "XObject" resource dictionary subkeys  
369 - of either the page itself or of any form XObject referenced by the  
370 - page. 304 +* Add to "pages" in the json:
  305 + * "inheritsresources": bool; whether there are any inherited attributes from ancestor page tree
  306 + nodes
  307 + * "sharedresources": a list of indirect objects that are
  308 + "/Resources" dictionaries or "XObject" resource dictionary subkeys of either the page itself or
  309 + of any form XObject referenced by the page.
371 310
372 - * Add to "objectinfo" in json: "directpagerefcount": the number of  
373 - pages that directly reference this object (i.e., you can find an  
374 - indirect reference to the object in the page dictionary without  
375 - traversing over any indirect objects) 311 +* Add to "objectinfo" in json: "directpagerefcount": the number of pages that directly reference
  312 + this object (i.e., you can find an indirect reference to the object in the page dictionary without
  313 + traversing over any indirect objects)
376 314
377 General 315 General
378 ======= 316 =======
379 317
380 -NOTE: Some items in this list refer to files in my personal home  
381 -directory or that are otherwise not publicly accessible. This includes  
382 -things sent to me by email that are specifically not public. Even so,  
383 -I find it useful to make reference to them in this list. 318 +NOTE: Some items in this list refer to files in my personal home directory or that are otherwise not
  319 +publicly accessible. This includes things sent to me by email that are specifically not public. Even
  320 +so, I find it useful to make reference to them in this list.
384 321
385 * Consider enabling code scanning on GitHub. 322 * Consider enabling code scanning on GitHub.
386 323
387 -* Add an option --ignore-encryption to ignore encryption information  
388 - and treat encrypted files as if they weren't encrypted. This should  
389 - make it possible to solve #598 (--show-encryption without a  
390 - password). We'll need to make sure we don't try to filter any  
391 - streams in this mode. Ideally we should be able to combine this with  
392 - --json so we can look at the raw encrypted strings and streams if we  
393 - want to, though be sure to document that the resulting JSON won't be  
394 - convertible back to a valid PDF. Since providing the password may  
395 - reveal additional details, --show-encryption could potentially retry  
396 - with this option if the first time doesn't work. Then, with the file  
397 - open, we can read the encryption dictionary normally. If this is  
398 - done, search for "raw, encrypted" in json.rst.  
399 -  
400 -* In libtests, separate executables that need the object library  
401 - from those that strictly use public API. Move as many of the test  
402 - drivers from the qpdf directory into the latter category as long  
403 - as doing so isn't too troublesome from a coverage standpoint.  
404 -  
405 -* Consider generating a non-flat pages tree before creating output to  
406 - better handle files with lots of pages. If there are more than 256  
407 - pages, add a second layer with the second layer nodes having no more  
408 - than 256 nodes and being as evenly sizes as possible. Don't worry  
409 - about the case of more than 65,536 pages. If the top node has more  
410 - than 256 children, we'll live with it. This is only safe if all  
411 - intermediate page nodes have only /Kids, /Parent, /Type, and /Count. 324 +* Add an option --ignore-encryption to ignore encryption information and treat encrypted files as if
  325 + they weren't encrypted. This should make it possible to solve #598 (--show-encryption without a
  326 + password). We'll need to make sure we don't try to filter any streams in this mode. Ideally we
  327 + should be able to combine this with --json so we can look at the raw encrypted strings and streams
  328 + if we want to, though be sure to document that the resulting JSON won't be convertible back to a
  329 + valid PDF. Since providing the password may reveal additional details, --show-encryption could
  330 + potentially retry with this option if the first time doesn't work. Then, with the file open, we
  331 + can read the encryption dictionary normally. If this is done, search for "raw, encrypted" in
  332 + json.rst.
  333 +
  334 +* In libtests, separate executables that need the object library from those that strictly use public
  335 + API. Move as many of the test drivers from the qpdf directory into the latter category as long as
  336 + doing so isn't too troublesome from a coverage standpoint.
  337 +
  338 +* Consider generating a non-flat pages tree before creating output to better handle files with lots
  339 + of pages. If there are more than 256 pages, add a second layer with the second layer nodes having
  341 + no more than 256 nodes and being as evenly sized as possible. Don't worry about the case of more
  341 + than 65,536 pages. If the top node has more than 256 children, we'll live with it. This is only
  342 + safe if all intermediate page nodes have only /Kids, /Parent, /Type, and /Count.
412 343
413 * Look at https://bestpractices.coreinfrastructure.org/en 344 * Look at https://bestpractices.coreinfrastructure.org/en
414 345
415 * Consider adding fuzzer code for JSON 346 * Consider adding fuzzer code for JSON
416 347
417 -* Rework tests so that nothing is written into the source directory.  
418 - Ideally then the entire build could be done with a read-only  
419 - source tree.  
420 -  
421 -* Large file tests fail with linux32 before and after cmake. This was  
422 - first noticed after 10.6.3. I don't think it's worth fixing.  
423 -  
424 -* Consider updating the fuzzer with code that exercises  
425 - copyAnnotations, file attachments, and name and number trees. Check  
426 - fuzzer coverage.  
427 -  
428 -* Add code for creation of a file attachment annotation. It should  
429 - also be possible to create a widget annotation and a form field.  
430 - Update the pdf-attach-file.cc example with new APIs when ready.  
431 -  
432 -* Flattening of form XObjects seems like something that would be  
433 - useful in the library. We are seeing more cases of completely valid  
434 - PDF files with form XObjects that cause problems in other software.  
435 - Flattening of form XObjects could be a useful way to work around  
436 - those issues or to prepare files for additional processing, making  
437 - it possible for users of the qpdf library to not be concerned about  
438 - form XObjects. This could be done recursively; i.e., we could have a  
439 - method to embed a form XObject into whatever contains it, whether  
440 - that is a form XObject or a page. This would require more  
441 - significant interpretation of the content stream. We would need a  
442 - test file in which the placement of the form XObject has to be in  
443 - the right place, e.g., the form XObject partially obscures earlier  
444 - code and is partially obscured by later code. Keys in the resource  
445 - dictionary may need to be changed -- create test cases with lots of  
446 - duplicated/overlapping keys.  
447 -  
448 -* Part of closed_file_input_source.cc is disabled on Windows because  
449 - of odd failures. It might be worth investigating so we can fully  
450 - exercise this in the test suite. That said, ClosedFileInputSource  
451 - is exercised elsewhere in qpdf's test suite, so this is not that  
452 - pressing.  
453 -  
454 -* If possible, consider adding CCITT3, CCITT4, or any other easy  
455 - filters. For some reference code that we probably can't use but may  
456 - be handy anyway, see 348 +* Rework tests so that nothing is written into the source directory. Ideally then the entire build
  349 + could be done with a read-only source tree.
  350 +
  351 +* Large file tests fail with linux32 before and after cmake. This was first noticed after 10.6.3. I
  352 + don't think it's worth fixing.
  353 +
  354 +* Consider updating the fuzzer with code that exercises copyAnnotations, file attachments, and name
  355 + and number trees. Check fuzzer coverage.
  356 +
  357 +* Add code for creation of a file attachment annotation. It should also be possible to create a
  358 + widget annotation and a form field. Update the pdf-attach-file.cc example with new APIs when
  359 + ready.
  360 +
  361 +* Flattening of form XObjects seems like something that would be useful in the library. We are
  362 + seeing more cases of completely valid PDF files with form XObjects that cause problems in other
  363 + software. Flattening of form XObjects could be a useful way to work around those issues or to
  364 + prepare files for additional processing, making it possible for users of the qpdf library to not
  365 + be concerned about form XObjects. This could be done recursively; i.e., we could have a method to
  366 + embed a form XObject into whatever contains it, whether that is a form XObject or a page. This
  367 + would require more significant interpretation of the content stream. We would need a test file in
  368 + which the placement of the form XObject has to be in the right place, e.g., the form XObject
  369 + partially obscures earlier code and is partially obscured by later code. Keys in the resource
  370 + dictionary may need to be changed -- create test cases with lots of duplicated/overlapping keys.
  371 +
  372 +* Part of closed_file_input_source.cc is disabled on Windows because of odd failures. It might be
  373 + worth investigating so we can fully exercise this in the test suite. That said,
  374 + ClosedFileInputSource is exercised elsewhere in qpdf's test suite, so this is not that pressing.
  375 +
  376 +* If possible, consider adding CCITT3, CCITT4, or any other easy filters. For some reference code
  377 + that we probably can't use but may be handy anyway, see
457 http://partners.adobe.com/public/developer/ps/sdk/index_archive.html 378 http://partners.adobe.com/public/developer/ps/sdk/index_archive.html
458 379
459 * If possible, support the following types of broken files: 380 * If possible, support the following types of broken files:
460 381
461 - - Files that have no whitespace token after "endobj" such that  
462 - endobj collides with the start of the next object 382 + - Files that have no whitespace token after "endobj" such that endobj collides with the start of
  383 + the next object
463 384
464 - - See ../misc/broken-files 385 + - See ../misc/broken-files
465 386
466 - - See ../misc/bad-files-issue-476. This directory contains a  
467 - snapshot of the google doc and linked PDF files from issue #476.  
468 - Please see the issue for details. 387 + - See ../misc/bad-files-issue-476. This directory contains a snapshot of the google doc and linked
  388 + PDF files from issue #476. Please see the issue for details.
469 389
470 * Additional form features 390 * Additional form features
471 - * set value from CLI? Specify title, and provide way to  
472 - disambiguate, probably by giving objgen of field 391 + * set value from CLI? Specify title, and provide way to disambiguate, probably by giving objgen of
  392 + field
473 393
474 * Pl_TIFFPredictor is pretty slow. 394 * Pl_TIFFPredictor is pretty slow.
475 395
476 -* Support for handling file names with Unicode characters in Windows  
477 - is incomplete. qpdf seems to support them okay from a functionality  
478 - standpoint, and the right thing happens if you pass in UTF-8  
479 - encoded filenames to QPDF library routines in Windows (they are  
480 - converted internally to wchar_t*), but file names are encoded in  
481 - UTF-8 on output, which doesn't produce nice error messages or  
482 - output on Windows in some cases.  
483 -  
484 -* If we ever wanted to do anything more with character encoding, see  
485 - ../misc/character-encoding/, which includes machine-readable dump  
486 - of table D.2 in the ISO-32000 PDF spec. This shows the mapping  
487 - between Unicode, StandardEncoding, WinAnsiEncoding,  
488 - MacRomanEncoding, and PDFDocEncoding.  
489 -  
490 -* Some test cases on bad files fail because qpdf is unable to find  
491 - the root dictionary when it fails to read the trailer. Recovery  
492 - could find the root dictionary and even the info dictionary in  
493 - other ways. In particular, issue-202.pdf can be opened by evince,  
494 - and there's no real reason that qpdf couldn't be made to be able to  
495 - recover that file as well.  
496 -  
497 -* Audit every place where qpdf allocates memory to see whether there  
498 - are cases where malicious inputs could cause qpdf to attempt to  
499 - grab very large amounts of memory. Certainly there are cases like  
500 - this, such as if a very highly compressed, very large image stream  
501 - is requested in a buffer. Hopefully normal input to output  
502 - filtering doesn't ever try to do this. QPDFWriter should be checked  
503 - carefully too. See also bugs/private/from-email-663916/ 396 +* Support for handling file names with Unicode characters in Windows is incomplete. qpdf seems to
  397 + support them okay from a functionality standpoint, and the right thing happens if you pass in
  398 + UTF-8 encoded filenames to QPDF library routines in Windows (they are converted internally to
  399 + wchar_t*), but file names are encoded in UTF-8 on output, which doesn't produce nice error
  400 + messages or output on Windows in some cases.
  401 +
  402 +* If we ever wanted to do anything more with character encoding, see ../misc/character-encoding/,
  403 + which includes a machine-readable dump of table D.2 in the ISO-32000 PDF spec. This shows the
  404 + mapping between Unicode, StandardEncoding, WinAnsiEncoding, MacRomanEncoding, and PDFDocEncoding.
  405 +
  406 +* Some test cases on bad files fail because qpdf is unable to find the root dictionary when it fails
  407 + to read the trailer. Recovery could find the root dictionary and even the info dictionary in other
  408 + ways. In particular, issue-202.pdf can be opened by evince, and there's no real reason that qpdf
  409 + couldn't be made to be able to recover that file as well.
  410 +
  411 +* Audit every place where qpdf allocates memory to see whether there are cases where malicious
  412 + inputs could cause qpdf to attempt to grab very large amounts of memory. Certainly there are cases
  413 + like this, such as if a very highly compressed, very large image stream is requested in a buffer.
  414 + Hopefully normal input to output filtering doesn't ever try to do this. QPDFWriter should be
  415 + checked carefully too. See also bugs/private/from-email-663916/
504 416
505 * Interactive form modification: 417 * Interactive form modification:
506 - https://github.com/qpdf/qpdf/issues/213 contains a good discussion  
507 - of some ideas for adding methods to modify annotations and form  
508 - fields if we want to make it easier to support modifications to  
509 - interactive forms. Some of the ideas have been implemented, and  
510 - some of the probably never will be implemented, but it's worth a  
511 - read if there is an intention to work on this. In the issue, search  
512 - for "Regarding write functionality", and read that comment and the 418 + https://github.com/qpdf/qpdf/issues/213 contains a good discussion of some ideas for adding
  419 + methods to modify annotations and form fields if we want to make it easier to support
  420 + modifications to interactive forms. Some of the ideas have been implemented, and some of them
  421 + probably never will be implemented, but it's worth a read if there is an intention to work on
  422 + this. In the issue, search for "Regarding write functionality", and read that comment and the
513 responses to it. 423 responses to it.
514 424
515 * Look at ~/Q/pdf-collection/forms-from-appian/ 425 * Look at ~/Q/pdf-collection/forms-from-appian/
516 426
517 -* When decrypting files with /R=6, hash_V5 is called more than once  
518 - with the same inputs. Caching the results or refactoring to reduce  
519 - the number of identical calls could improve performance for 427 +* When decrypting files with /R=6, hash_V5 is called more than once with the same inputs. Caching
  428 + the results or refactoring to reduce the number of identical calls could improve performance for
520 workloads that involve processing large numbers of small files. 429 workloads that involve processing large numbers of small files.
521 430
522 -* Consider adding a method to balance the pages tree. It would call  
523 - pushInheritedAttributesToPage, construct a pages tree from scratch,  
524 - and replace the /Pages key of the root dictionary with the new  
525 - tree.  
526 -  
527 -* Study what's required to support savable forms that can be saved by  
528 - Adobe Reader. Does this require actually signing the document with  
529 - an Adobe private key? Search for "Digital signatures" in the PDF  
530 - spec, and look at ~/Q/pdf-collection/form-with-full-save.pdf, which  
531 - came from Adobe's example site. See also  
532 - ../misc/digital-sign-from-trueroad/ and  
533 - ../misc/digital-signatures/digitally-signed-pdf-xfa.pdf. If digital  
534 - signatures are implemented, update the docs on crypto providers,  
535 - which mention that this may happen in the future.  
536 -  
537 -* Qpdf does not honor /EFF when adding new file attachments. When it  
538 - encrypts, it never generates streams with explicit crypt filters.  
539 - Prior to 10.2, there was an incorrect attempt to treat /EFF as a  
540 - default value for decrypting file attachment streams, but it is not  
541 - supposed to mean that. Instead, it is intended for conforming  
542 - writers to obey this when adding new attachments. Qpdf is not a  
543 - conforming writer in that respect.  
544 -  
545 -* The whole xref handling code in the QPDF object allows the same  
546 - object with more than one generation to coexist, but a lot of logic  
547 - assumes this isn't the case. Anything that creates mappings only  
548 - with the object number and not the generation is this way,  
549 - including most of the interaction between QPDFWriter and QPDF. If  
550 - we wanted to allow the same object with more than one generation to  
551 - coexist, which I'm not sure is allowed, we could fix this by  
552 - changing xref_table. Alternatively, we could detect and disallow  
553 - that case. In fact, it appears that Adobe reader and other PDF  
554 - viewing software silently ignores objects of this type, so this is  
555 - probably not a big deal.  
556 -  
557 -* From a suggestion in bug 3152169, consider having an option to  
558 - re-encode inline images with an ASCII encoding.  
559 -  
560 -* From github issue 2, provide more in-depth output for examining  
561 - hint stream contents. Consider adding on option to provide a  
562 - human-readable dump of linearization hint tables. This should  
563 - include improving the 'overflow reading bit stream' message as  
564 - reported in issue #2. There are multiple calls to stopOnError in  
565 - the linearization checking code. Ideally, these should not  
566 - terminate checking. It would require re-acquiring an understanding  
567 - of all that code to make the checks more robust. In particular,  
568 - it's hard to look at the code and quickly determine what is a true  
569 - logic error and what could happen because of malformed user input.  
570 - See also ../misc/linearization-errors.  
571 -  
572 -* If I ever decide to make appearance stream-generation aware of  
573 - fonts or font metrics, see email from Tobias with Message-ID 431 +* Consider adding a method to balance the pages tree. It would call pushInheritedAttributesToPage,
  432 + construct a pages tree from scratch, and replace the /Pages key of the root dictionary with the
  433 + new tree.
  434 +
  435 +* Study what's required to support savable forms that can be saved by Adobe Reader. Does this
  436 + require actually signing the document with an Adobe private key? Search for "Digital signatures"
  437 + in the PDF spec, and look at ~/Q/pdf-collection/form-with-full-save.pdf, which came from Adobe's
  438 + example site. See also ../misc/digital-sign-from-trueroad/ and
  439 + ../misc/digital-signatures/digitally-signed-pdf-xfa.pdf. If digital signatures are implemented,
  440 + update the docs on crypto providers, which mention that this may happen in the future.
  441 +
  442 +* Qpdf does not honor /EFF when adding new file attachments. When it encrypts, it never generates
  443 + streams with explicit crypt filters. Prior to 10.2, there was an incorrect attempt to treat /EFF
  444 + as a default value for decrypting file attachment streams, but it is not supposed to mean that.
  445 + Instead, it is intended for conforming writers to obey this when adding new attachments. Qpdf is
  446 + not a conforming writer in that respect.
  447 +
  448 +* The whole xref handling code in the QPDF object allows the same object with more than one
  449 + generation to coexist, but a lot of logic assumes this isn't the case. Anything that creates
  450 + mappings only with the object number and not the generation is this way, including most of the
  451 + interaction between QPDFWriter and QPDF. If we wanted to allow the same object with more than one
  452 + generation to coexist, which I'm not sure is allowed, we could fix this by changing xref_table.
  453 + Alternatively, we could detect and disallow that case. In fact, it appears that Adobe reader and
  454 + other PDF viewing software silently ignores objects of this type, so this is probably not a big
  455 + deal.
  456 +
  457 +* From a suggestion in bug 3152169, consider having an option to re-encode inline images with an
  458 + ASCII encoding.
  459 +
  460 +* From github issue 2, provide more in-depth output for examining hint stream contents. Consider
  461 + adding an option to provide a human-readable dump of linearization hint tables. This should
  462 + include improving the 'overflow reading bit stream' message as reported in issue #2. There are
  463 + multiple calls to stopOnError in the linearization checking code. Ideally, these should not
  464 + terminate checking. It would require re-acquiring an understanding of all that code to make the
  465 + checks more robust. In particular, it's hard to look at the code and quickly determine what is a
  466 + true logic error and what could happen because of malformed user input. See also
  467 + ../misc/linearization-errors.
  468 +
  469 +* If I ever decide to make appearance stream-generation aware of fonts or font metrics, see email
  470 + from Tobias with Message-ID
574 <5C3C9C6C.8000102@thax.hardliners.org> dated 2019-01-14. 471 <5C3C9C6C.8000102@thax.hardliners.org> dated 2019-01-14.
575 472
576 -* Look at places in the code where object traversal is being done and,  
577 - where possible, try to avoid it entirely or at least avoid ever  
578 - traversing the same objects multiple times. 473 +* Look at places in the code where object traversal is being done and, where possible, try to avoid
  474 + it entirely or at least avoid ever traversing the same objects multiple times.
579 475
580 ---------------------------------------------------------------------- 476 ----------------------------------------------------------------------
581 477
@@ -588,281 +484,215 @@ I find it useful to make reference to them in this list.
588 Performance 484 Performance
589 =========== 485 ===========
590 486
591 -As described in https://github.com/qpdf/qpdf/issues/401, there was  
592 -great performance degradation between qpdf 7.1.1 and 9.1.1. Doing a  
593 -bisect between dac65a21fb4fa5f871e31c314280b75adde89a6c and  
594 -release-qpdf-7.1.1, I found several commits that damaged performance.  
595 -I fixed some of them to improve performance by about 70% (as measured  
596 -by saying that old times were 170% of new times). The remaining  
597 -commits that broke performance either can't be correct because they  
598 -would re-introduce an old bug or aren't worth correcting because of  
599 -the high value they offer relative to a relatively low penalty. For  
600 -historical reference, here are the commits. The numbers are the time  
601 -in seconds on the machine I happened to be using of splitting the  
602 -first 100 pages of PDF32000_2008.pdf 20 times and taking an average  
603 -duration. 487 +As described in https://github.com/qpdf/qpdf/issues/401, there was great performance degradation
  488 +between qpdf 7.1.1 and 9.1.1. Doing a bisect between dac65a21fb4fa5f871e31c314280b75adde89a6c and
  489 +release-qpdf-7.1.1, I found several commits that damaged performance. I fixed some of them to
  490 +improve performance by about 70% (as measured by saying that old times were 170% of new times). The
  491 +remaining commits that broke performance either can't be corrected because they would re-introduce an
  492 +old bug or aren't worth correcting because of the high value they offer relative to a relatively low
  493 +penalty. For historical reference, here are the commits. The numbers are the time in seconds on the
  494 +machine I happened to be using of splitting the first 100 pages of PDF32000_2008.pdf 20 times and
  495 +taking an average duration.
604 496
605 Commits that broke performance: 497 Commits that broke performance:
606 498
607 -* d0e99f195a987c483bbb6c5449cf39bee34e08a1 -- object description and  
608 - context: 0.39 -> 0.45  
609 -* a01359189b32c60c2d55b039f7aefd6c3ce0ebde (minus 313ba08) -- fix  
610 - dangling references: 0.55 -> 0.6 499 +* d0e99f195a987c483bbb6c5449cf39bee34e08a1 -- object description and context: 0.39 -> 0.45
  500 +* a01359189b32c60c2d55b039f7aefd6c3ce0ebde (minus 313ba08) -- fix dangling references: 0.55 -> 0.6
611 * e5f504b6c5dc34337cc0b316b4a7b1fca7e614b1 -- sparse array: 0.6 -> 0.62 501 * e5f504b6c5dc34337cc0b316b4a7b1fca7e614b1 -- sparse array: 0.6 -> 0.62
612 502
613 Other intermediate steps that were previously fixed: 503 Other intermediate steps that were previously fixed:
614 504
615 -* 313ba081265f69ac9a0324f9fe87087c72918191 -- copy outlines into  
616 - split: 0.55 -> 4.0 505 +* 313ba081265f69ac9a0324f9fe87087c72918191 -- copy outlines into split: 0.55 -> 4.0
617 * a01359189b32c60c2d55b039f7aefd6c3ce0ebde -- fix dangling references: 506 * a01359189b32c60c2d55b039f7aefd6c3ce0ebde -- fix dangling references:
618 4.0 -> 9.0 507 4.0 -> 9.0
619 508
620 This commit fixed the awful problem introduced in 313ba081: 509 This commit fixed the awful problem introduced in 313ba081:
621 510
622 -* a5a016cdd26a8e5c99e5f019bc30d1bdf6c050a2 -- revert outline  
623 - preservation: 9.0 -> 0.6 511 +* a5a016cdd26a8e5c99e5f019bc30d1bdf6c050a2 -- revert outline preservation: 9.0 -> 0.6
624 512
625 -Note that the fix dangling references commit had a much worse impact  
626 -prior to removing the outline preservation, so I also measured its  
627 -impact in isolation. 513 +Note that the fix dangling references commit had a much worse impact prior to removing the outline
  514 +preservation, so I also measured its impact in isolation.
628 515
629 A few important lessons (in README-maintainer) 516 A few important lessons (in README-maintainer)
630 517
631 -* Indirection through PointerHolder<Members> is expensive, and should  
632 - not be used for things that are created and destroyed frequently  
633 - such as QPDFObjectHandle and QPDFObject.  
634 -* Traversal of objects is expensive and should be avoided where  
635 - possible. 518 +* Indirection through PointerHolder<Members> is expensive, and should not be used for things that
  519 + are created and destroyed frequently such as QPDFObjectHandle and QPDFObject.
  520 +* Traversal of objects is expensive and should be avoided where possible.
636 521
637 -Also, it turns out that PointerHolder is more performant than  
638 -std::shared_ptr. (This was true at the time but subsequent  
639 -implementations of std::shared_ptr became much more efficient.) 522 +Also, it turns out that PointerHolder is more performant than std::shared_ptr. (This was true at the
  523 +time but subsequent implementations of std::shared_ptr became much more efficient.)
640 524
641 QPDFPagesTree 525 QPDFPagesTree
642 ============= 526 =============
643 527
644 -On a few occasions, I have considered implementing a QPDFPagesTree  
645 -object that would allow the document's original page tree structure to  
646 -be preserved. See comments at the top QPDF_pages.cc for why this was  
647 -abandoned.  
648 -  
649 -Partial work is in refs/attic/QPDFPagesTree. QPDFPageTree is mostly  
650 -implemented and mostly tested. There are not enough cases of different  
651 -kinds of operations (pclm, linearize, json, etc.) with non-flat pages  
652 -trees. Insertion is not implemented. Insertion is potentially complex  
653 -because of the issue of inherited objects. We will have to call  
654 -pushInheritedAttributesToPage before adding any pages to the pages  
655 -tree. The test suite is failing on that branch.  
656 -  
657 -Some parts of page tree repair are silent (no warnings). All page tree  
658 -repair should warn. The reason is that page tree repair will change  
659 -object numbers, and knowing that is important when working with JSON  
660 -output.  
661 -  
662 -If we were to do this, we would still need keep a pages cache for  
663 -efficient insertion. There's no reason we can't keep a vector of page  
664 -objects up to date and just do a traversal the first time we do  
665 -getAllPages just like we do now. The difference is that we would not  
666 -flatten the pages tree. It would be useful to go through QPDF_pages  
667 -and reimplement everything without calling flattenPagesTree. Then we  
668 -can remove flattenPagesTree, which is private. That said, with the  
669 -addition of creating non-flat pages trees, there is really no reason  
670 -not to flatten the pages tree for internal use.  
671 -  
672 -In its current state, QPDFPagesTree does not proactively fix /Type or  
673 -correct page objects that are used multiple times. You have to  
674 -traverse the pages tree to trigger this operation. It would be nice if  
675 -we would do that somewhere but not do it more often than necessary so  
676 -isPagesObject and isPageObject are reliable and can be made more  
677 -reliable. Maybe add a validate or repair function? It should also make  
678 -sure /Count and /Parent are correct. 528 +On a few occasions, I have considered implementing a QPDFPagesTree object that would allow the
  529 +document's original page tree structure to be preserved. See comments at the top of QPDF_pages.cc for
  530 +why this was abandoned.
  531 +
  532 +Partial work is in refs/attic/QPDFPagesTree. QPDFPagesTree is mostly implemented and mostly tested.
  533 +There are not enough cases of different kinds of operations (pclm, linearize, json, etc.) with
  534 +non-flat pages trees. Insertion is not implemented. Insertion is potentially complex because of the
  535 +issue of inherited objects. We will have to call pushInheritedAttributesToPage before adding any
  536 +pages to the pages tree. The test suite is failing on that branch.
  537 +
  538 +Some parts of page tree repair are silent (no warnings). All page tree repair should warn. The
  539 +reason is that page tree repair will change object numbers, and knowing that is important when
  540 +working with JSON output.
  541 +
  542 +If we were to do this, we would still need to keep a pages cache for efficient insertion. There's no
  543 +reason we can't keep a vector of page objects up to date and just do a traversal the first time we
  544 +do getAllPages just like we do now. The difference is that we would not flatten the pages tree. It
  545 +would be useful to go through QPDF_pages and reimplement everything without calling
  546 +flattenPagesTree. Then we can remove flattenPagesTree, which is private. That said, with the
  547 +addition of creating non-flat pages trees, there is really no reason not to flatten the pages tree
  548 +for internal use.
  549 +
  550 +In its current state, QPDFPagesTree does not proactively fix /Type or correct page objects that are
  551 +used multiple times. You have to traverse the pages tree to trigger this operation. It would be nice
  552 +if we would do that somewhere but not do it more often than necessary so isPagesObject and
  553 +isPageObject are reliable and can be made more reliable. Maybe add a validate or repair function? It
  554 +should also make sure /Count and /Parent are correct.
679 555
680 Rejected Ideas 556 Rejected Ideas
681 ============== 557 ==============
682 558
683 -* Investigate whether there is a way to automate the memory checker  
684 - tests for Windows.  
685 -  
686 -* Provide support in QPDFWriter for writing incremental updates.  
687 - Provide support in qpdf for preserving incremental updates. The  
688 - goal should be that QDF mode should be fully functional for files  
689 - with incremental updates including fix_qdf.  
690 -  
691 - Note that there's nothing that says an indirect object in one  
692 - update can't refer to an object that doesn't appear until a later  
693 - update. This means that QPDF has to treat indirect null objects  
694 - differently from how it does now. QPDF drops indirect null objects  
695 - that appear as members of arrays or dictionaries. For arrays, it's  
696 - handled in QPDFWriter where we make indirect nulls direct. This is  
697 - in a single if block, and nothing else in the code cares about it.  
698 - We could just remove that if block and not break anything except a  
699 - few test cases that exercise the current behavior. For  
700 - dictionaries, it's more complicated. In this case,  
701 - QPDF_Dictionary::getKeys() ignores all keys with null values, and  
702 - hasKey() returns false for keys that have null values. We would  
703 - probably want to make QPDF_Dictionary able to handle the special  
704 - case of keys that are indirect nulls and basically never have it  
705 - drop any keys that are indirect objects.  
706 -  
707 - If we make a change to have qpdf preserve indirect references to  
708 - null objects, we have to note this in ChangeLog and in the release  
709 - notes since this will change output files. We did this before when  
710 - we stopped flattening scalar references, so this is probably not a  
711 - big deal. We also have to make sure that the testing for this  
712 - handles non-trivial cases of the targets of indirect nulls being  
713 - replaced by real objects in an update. I'm not sure how this plays  
714 - with linearization, if at all. For cases where incremental updates  
715 - are not being preserved as incremental updates and where the data  
716 - is being folded in (as is always the case with qpdf now), none of  
717 - this should make any difference in the actual semantics of the  
718 - files.  
719 -  
720 -* The second xref stream for linearized files has to be padded only  
721 - because we need file_size as computed in pass 1 to be accurate. If  
722 - we were not allowing writing to a pipe, we could seek back to the  
723 - beginning and fill in the value of /L in the linearization  
724 - dictionary as an optimization to alleviate the need for this  
725 - padding. Doing so would require us to pad the /L value  
726 - individually and also to save the file descriptor and determine  
727 - whether it's seekable. This is probably not worth bothering with.  
728 -  
729 -* Based on an idea suggested by user "Atom Smasher", consider  
730 - providing some mechanism to recover earlier versions of a file  
731 - embedded prior to appended sections.  
732 -  
733 -* Consider creating a sanitizer to make it easier for people to send  
734 - broken files. Now that we have json mode, this is probably no  
735 - longer worth doing. Here is the previous idea, possibly implemented  
736 - by making it possible to run the lexer (tokenizer) over a whole  
737 - file. Make it possible to replace all strings in a file lexically  
738 - even on badly broken files. Ideally this should work files that are  
739 - lacking xref, have broken links, duplicated dictionary keys, syntax  
740 - errors, etc., and ideally it should work with encrypted files if  
741 - possible. This should go through the streams and strings and  
742 - replace them with fixed or random characters, preferably, but not  
743 - necessarily, in a manner that works with fonts. One possibility  
744 - would be to detect whether a string contains characters with normal  
745 - encoding, and if so, use 0x41. If the string uses character maps,  
746 - use 0x01. The output should otherwise be unrelated to the input.  
747 - This could be built after the filtering and tokenizer rewrite and  
748 - should be done in a manner that takes advantage of the other  
749 - lexical features. This sanitizer should also clear metadata and  
750 - replace images. If I ever do this, the file from issue #494 would  
751 - be a great one to look at.  
752 -  
753 -* Here are some notes about having stream data providers modify  
754 - stream dictionaries. I had wanted to add this functionality to make  
755 - it more efficient to create stream data providers that may  
756 - dynamically decide what kind of filters to use and that may end up  
757 - modifying the dictionary conditionally depending on the original  
758 - stream data. Ultimately I decided not to implement this feature.  
759 - This paragraph describes why.  
760 -  
761 - * When writing, the way objects are placed into the queue for  
762 - writing strongly precludes creation of any new indirect objects,  
763 - or even changing which indirect objects are referenced from which  
764 - other objects, because we sometimes write as we are traversing  
765 - and enqueuing objects. For non-linearized files, there is a risk  
766 - that an indirect object that used to be referenced would no  
767 - longer be referenced, and whether it was already written to the  
768 - output file would be based on an accident of where it was  
769 - encountered when traversing the object structure. For linearized  
770 - files, the situation is considerably worse. We decide which  
771 - section of the file to write an object to based on a mapping of  
772 - which objects are used by which other objects. Changing this  
773 - mapping could cause an object to appear in the wrong section, to  
774 - be written even though it is unreferenced, or to be entirely  
775 - omitted since, during linearization, we don't enqueue new objects  
776 - as we traverse for writing.  
777 -  
778 - * There are several places in QPDFWriter that query a stream's  
779 - dictionary in order to prepare for writing or to make decisions  
780 - about certain aspects of the writing process. If the stream data  
781 - provider has the chance to modify the dictionary, every piece of  
782 - code that gets stream data would have to be aware of this. This  
783 - would potentially include end user code. For example, any code  
784 - that called getDict() on a stream before installing a stream data  
785 - provider and expected that dictionary to be valid would  
786 - potentially be broken. As implemented right now, you must perform  
787 - any modifications on the dictionary in advance and provided  
788 - /Filter and /DecodeParms at the time you installed the stream  
789 - data provider. This means that some computations would have to be  
790 - done more than once, but for linearized files, stream data  
791 - providers are already called more than once. If the work done by  
792 - a stream data provider is especially expensive, it can implement 559 +* Investigate whether there is a way to automate the memory checker tests for Windows.
  560 +
  561 +* Provide support in QPDFWriter for writing incremental updates. Provide support in qpdf for
  562 + preserving incremental updates. The goal should be that QDF mode should be fully functional for
  563 + files with incremental updates including fix_qdf.
  564 +
  565 + Note that there's nothing that says an indirect object in one update can't refer to an object that
  566 + doesn't appear until a later update. This means that QPDF has to treat indirect null objects
  567 + differently from how it does now. QPDF drops indirect null objects that appear as members of
  568 + arrays or dictionaries. For arrays, it's handled in QPDFWriter where we make indirect nulls
  569 + direct. This is in a single if block, and nothing else in the code cares about it. We could just
  570 + remove that if block and not break anything except a few test cases that exercise the current
  571 + behavior. For dictionaries, it's more complicated. In this case, QPDF_Dictionary::getKeys()
  572 + ignores all keys with null values, and hasKey() returns false for keys that have null values. We
  573 + would probably want to make QPDF_Dictionary able to handle the special case of keys that are
  574 + indirect nulls and basically never have it drop any keys that are indirect objects.
  575 +
  576 + If we make a change to have qpdf preserve indirect references to null objects, we have to note
  577 + this in ChangeLog and in the release notes since this will change output files. We did this before
  578 + when we stopped flattening scalar references, so this is probably not a big deal. We also have to
  579 + make sure that the testing for this handles non-trivial cases of the targets of indirect nulls
  580 + being replaced by real objects in an update. I'm not sure how this plays with linearization, if at
  581 + all. For cases where incremental updates are not being preserved as incremental updates and where
  582 + the data is being folded in (as is always the case with qpdf now), none of this should make any
  583 + difference in the actual semantics of the files.
  584 +
  585 +* The second xref stream for linearized files has to be padded only because we need file_size as
  586 + computed in pass 1 to be accurate. If we were not allowing writing to a pipe, we could seek back
  587 + to the beginning and fill in the value of /L in the linearization dictionary as an optimization to
  588 + alleviate the need for this padding. Doing so would require us to pad the /L value individually
  589 + and also to save the file descriptor and determine whether it's seekable. This is probably not
  590 + worth bothering with.
  591 +
  592 +* Based on an idea suggested by user "Atom Smasher", consider providing some mechanism to recover
  593 + earlier versions of a file embedded prior to appended sections.
  594 +
  595 +* Consider creating a sanitizer to make it easier for people to send broken files. Now that we have
  596 + json mode, this is probably no longer worth doing. Here is the previous idea, possibly implemented
  597 + by making it possible to run the lexer (tokenizer) over a whole file. Make it possible to replace
  598 + all strings in a file lexically even on badly broken files. Ideally this should work on files that
  599 + are lacking xref, have broken links, duplicated dictionary keys, syntax errors, etc., and ideally
  600 + it should work with encrypted files if possible. This should go through the streams and strings
  601 + and replace them with fixed or random characters, preferably, but not necessarily, in a manner
  602 + that works with fonts. One possibility would be to detect whether a string contains characters
  603 + with normal encoding, and if so, use 0x41. If the string uses character maps, use 0x01. The output
  604 + should otherwise be unrelated to the input. This could be built after the filtering and tokenizer
  605 + rewrite and should be done in a manner that takes advantage of the other lexical features. This
  606 + sanitizer should also clear metadata and replace images. If I ever do this, the file from issue
  607 + #494 would be a great one to look at.
  608 +
  609 +* Here are some notes about having stream data providers modify stream dictionaries. I had wanted to
  610 + add this functionality to make it more efficient to create stream data providers that may
  611 + dynamically decide what kind of filters to use and that may end up modifying the dictionary
  612 + conditionally depending on the original stream data. Ultimately I decided not to implement this
  613 + feature. This paragraph describes why.
  614 +
  615 + * When writing, the way objects are placed into the queue for writing strongly precludes creation
  616 + of any new indirect objects, or even changing which indirect objects are referenced from which
  617 + other objects, because we sometimes write as we are traversing and enqueuing objects. For
  618 + non-linearized files, there is a risk that an indirect object that used to be referenced would
  619 + no longer be referenced, and whether it was already written to the output file would be based on
  620 + an accident of where it was encountered when traversing the object structure. For linearized
  621 + files, the situation is considerably worse. We decide which section of the file to write an
  622 + object to based on a mapping of which objects are used by which other objects. Changing this
  623 + mapping could cause an object to appear in the wrong section, to be written even though it is
  624 + unreferenced, or to be entirely omitted since, during linearization, we don't enqueue new
  625 + objects as we traverse for writing.
  626 +
  627 + * There are several places in QPDFWriter that query a stream's dictionary in order to prepare for
  628 + writing or to make decisions about certain aspects of the writing process. If the stream data
  629 + provider has the chance to modify the dictionary, every piece of code that gets stream data
  630 + would have to be aware of this. This would potentially include end user code. For example, any
  631 + code that called getDict() on a stream before installing a stream data provider and expected
  632 + that dictionary to be valid would potentially be broken. As implemented right now, you must
  633 + perform any modifications on the dictionary in advance and provide /Filter and /DecodeParms at
  634 + the time you install the stream data provider. This means that some computations would have to
  635 + be done more than once, but for linearized files, stream data providers are already called more
  636 + than once. If the work done by a stream data provider is especially expensive, it can implement
793 its own cache. 637 its own cache.
794 638
795 - The example examples/pdf-custom-filter.cc demonstrates the use of  
796 - custom stream filters. This includes a custom pipeline, a custom  
797 - stream filter, as well as modification of a stream's dictionary to  
798 - include creation of a new stream that is referenced from  
799 - /DecodeParms.  
800 -  
801 -* Removal of raw QPDF* from the API. Discussions in #747 and #754.  
802 - This is a summary of the arguments I put forth in #754. The idea was  
803 - to make QPDF::QPDF() private and require all QPDF objects to be  
804 - shared pointers created with QPDF::create(). This would enable us to  
805 - have QPDFObjectHandle::getOwningQPDF() return a std::weak_ptr<QPDF>.  
806 - Prior to #726 (QPDFObject/QPDFValue split, released in qpdf 11.0.0),  
807 - getOwningQPDF() could return an invalid pointer if the owning QPDF  
808 - disappeared, but this is no longer the case, which removes the main 639 + The example examples/pdf-custom-filter.cc demonstrates the use of custom stream filters. This
  640 + includes a custom pipeline, a custom stream filter, as well as modification of a stream's
  641 + dictionary to include creation of a new stream that is referenced from /DecodeParms.
  642 +
  643 +* Removal of raw QPDF* from the API. Discussions in #747 and #754. This is a summary of the
  644 + arguments I put forth in #754. The idea was to make QPDF::QPDF() private and require all QPDF
  645 + objects to be shared pointers created with QPDF::create(). This would enable us to have
  646 + QPDFObjectHandle::getOwningQPDF() return a std::weak_ptr<QPDF>. Prior to #726
  647 + (QPDFObject/QPDFValue split, released in qpdf 11.0.0), getOwningQPDF() could return an invalid
  648 + pointer if the owning QPDF disappeared, but this is no longer the case, which removes the main
809 motivation. QPDF 11 added QPDF::create() anyway though. 649 motivation. QPDF 11 added QPDF::create() anyway though.
810 650
811 - Removing raw QPDF* would look something like this. Note that you  
812 - can't use std::make_shared<T> unless T has a public constructor. 651 + Removing raw QPDF* would look something like this. Note that you can't use std::make_shared<T>
  652 + unless T has a public constructor.
813 653
814 QPDF_POINTER_TRANSITION = 0 -- no warnings around calling the QPDF constructor 654 QPDF_POINTER_TRANSITION = 0 -- no warnings around calling the QPDF constructor
815 - QPDF_POINTER_TRANSITION = 1 -- calls to QPDF() are deprecated, but QPDF is still available so code can be backward compatible and use std::make_shared<QPDF>  
816 - QPDF_POINTER_TRANSITION = 2 -- the QPDF constructor is private; all calls to std::make_shared<QPDF> have to be replaced with QPDF::create  
817 -  
818 - If we were to do this, we'd have to look at each use of QPDF* in the  
819 - interface and decide whether to use a std::shared_ptr or a  
820 - std::weak_ptr. The answer would almost always be to use a  
821 - std::weak_ptr, which means we'd have to take the extra step of  
822 - calling lock(), and it means there would be lots of code changes  
823 - cause people would have to pass weak pointers instead of raw  
824 - pointers around, and those have to be constructed and locked.  
825 - Passing std::shared_ptr around leaves the possibility of creating  
826 - circular references. It seems to be too much trouble in the library  
827 - and too much toil for library users to be worth the small benefit of  
828 - not having to call resetObjGen in QPDF's destructor. 655 + QPDF_POINTER_TRANSITION = 1 -- calls to QPDF() are deprecated, but QPDF is still available so code
  656 + can be backward compatible and use std::make_shared<QPDF>
  657 + QPDF_POINTER_TRANSITION = 2 -- the QPDF constructor is private; all calls to
  658 + std::make_shared<QPDF> have to be replaced with QPDF::create
  659 +
  660 + If we were to do this, we'd have to look at each use of QPDF* in the interface and decide whether
  661 + to use a std::shared_ptr or a std::weak_ptr. The answer would almost always be to use a
  662 + std::weak_ptr, which means we'd have to take the extra step of calling lock(), and it means there would
  663 + be lots of code changes because people would have to pass weak pointers instead of raw pointers
  664 + around, and those have to be constructed and locked. Passing std::shared_ptr around leaves the
  665 + possibility of creating circular references. It seems to be too much trouble in the library and
  666 + too much toil for library users to be worth the small benefit of not having to call resetObjGen in
  667 + QPDF's destructor.
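
  The trade-off described above can be illustrated with a small standalone sketch. It does not use
  the qpdf API: `Doc` and `Handle` are hypothetical stand-ins for QPDF and QPDFObjectHandle, and
  `ownerName()` is invented for illustration. It shows why a private constructor rules out
  std::make_shared and why every access through a std::weak_ptr needs a lock() call.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a document class; this is not the real QPDF API.
class Doc
{
  public:
    // Factory in the spirit of QPDF::create(): the only way to obtain a Doc.
    static std::shared_ptr<Doc>
    create(std::string name)
    {
        // std::make_shared<Doc>(...) would not compile here because the
        // constructor is private, so construct with new and wrap the result.
        return std::shared_ptr<Doc>(new Doc(std::move(name)));
    }

    std::string const&
    name() const
    {
        return name_;
    }

  private:
    explicit Doc(std::string name) :
        name_(std::move(name))
    {
    }

    std::string name_;
};

// Hypothetical handle that refers to its owner without keeping it alive.
class Handle
{
  public:
    explicit Handle(std::shared_ptr<Doc> const& owner) :
        owner_(owner)
    {
    }

    // Callers must lock() on every access and cope with an expired owner.
    std::string
    ownerName() const
    {
        if (auto doc = owner_.lock()) {
            return doc->name();
        }
        return "(owner destroyed)";
    }

  private:
    std::weak_ptr<Doc> owner_;
};
```

  In this sketch, a Handle built from `Doc::create("a.pdf")` reports the owner's name while the
  shared pointer is alive and falls back to the placeholder string once it is reset. The extra
  lock() at each call site is the kind of toil the paragraph above refers to.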
829 668
830 * Fix Multiple Direct Object Parent Issue 669 * Fix Multiple Direct Object Parent Issue
831 670
832 - This idea was rejected because it would be complicated to implement  
833 - and would likely have a high performance cost to fix what is not  
834 - really that big of a problem in practice.  
835 -  
836 - It is possible for a QPDFObjectHandle for a direct object to be  
837 - contained inside of multiple QPDFObjectHandle objects or even  
838 - replicated across multiple QPDF objects. This creates a potentially  
839 - confusing and unintentional aliasing of direct objects. There are  
840 - known cases in the qpdf library where this happens including page  
841 - splitting and merging (particularly with page labels, and possibly  
842 - with other cases), and also with unsafeShallowCopy. Disallowing this  
843 - would incur a significant performance penalty and is probably not  
844 - worth doing. If we were to do it, here are some ideas.  
845 -  
846 - * Add std::weak_ptr<QPDFObject> parent to QPDFObject. When adding a  
847 - direct object to an array or dictionary, set its parent. When  
848 - removing it, clear the parent pointer. The parent pointer would  
849 - always be null for indirect objects, so the parent pointer, which  
850 - would reside in QPDFObject, would have to be managed by  
851 - QPDFObjectHandle. This is because QPDFObject can't tell the 671 + This idea was rejected because it would be complicated to implement and would likely have a high
  672 + performance cost to fix what is not really that big of a problem in practice.
  673 +
  674 + It is possible for a QPDFObjectHandle for a direct object to be contained inside of multiple
  675 + QPDFObjectHandle objects or even replicated across multiple QPDF objects. This creates a
  676 + potentially confusing and unintentional aliasing of direct objects. There are known cases in the
  677 + qpdf library where this happens including page splitting and merging (particularly with page
  678 + labels, and possibly with other cases), and also with unsafeShallowCopy. Disallowing this would
  679 + incur a significant performance penalty and is probably not worth doing. If we were to do it, here
  680 + are some ideas.
  681 +
  682 + * Add std::weak_ptr<QPDFObject> parent to QPDFObject. When adding a direct object to an array or
  683 + dictionary, set its parent. When removing it, clear the parent pointer. The parent pointer would
  684 + always be null for indirect objects, so the parent pointer, which would reside in QPDFObject,
  685 + would have to be managed by QPDFObjectHandle. This is because QPDFObject can't tell the
852 difference between a resolved indirect object and a direct object. 686 difference between a resolved indirect object and a direct object.
853 687
854 - * Phase 1: When a direct object that already has a parent is added  
855 - to a dictionary or array, issue a warning. There would need to be  
856 - unsafe add methods used by unsafeShallowCopy. These would add but  
857 - not modify the parent pointer.  
858 -  
859 - * Phase 2: In the next major release, make the multiple parent case  
860 - an error. Require people to create a copy. The unsafe operations  
861 - would still have to be permitted.  
862 -  
863 - This approach would allow an object to be moved from one object to  
864 - another by removing it, which returns the now orphaned object, and  
865 - then inserting it somewhere else. It also doesn't break the pattern  
866 - of adding a direct object to something and subsequently mutating it.  
867 - It just prevents the same object from being added to more than one  
868 - thing. 688 + * Phase 1: When a direct object that already has a parent is added to a dictionary or array, issue
  689 + a warning. There would need to be unsafe add methods used by unsafeShallowCopy. These would add
  690 + but not modify the parent pointer.
  691 +
  692 + * Phase 2: In the next major release, make the multiple parent case an error. Require people to
  693 + create a copy. The unsafe operations would still have to be permitted.
  694 +
  695 + This approach would allow an object to be moved from one object to another by removing it, which
  696 + returns the now orphaned object, and then inserting it somewhere else. It also doesn't break the
  697 + pattern of adding a direct object to something and subsequently mutating it. It just prevents the
  698 + same object from being added to more than one thing.
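
  A minimal sketch of the parent-pointer idea, assuming the Phase 2 behavior (a second parent is an
  error). `Node`, `add`, `addUnsafe`, and `remove` are hypothetical names chosen for illustration;
  they are not part of qpdf, and a real implementation would have to live in QPDFObjectHandle as
  noted above.

```cpp
#include <algorithm>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a direct object; nodes must be owned by a
// std::shared_ptr (for example via std::make_shared<Node>()).
struct Node: std::enable_shared_from_this<Node>
{
    std::weak_ptr<Node> parent;               // empty while the node is orphaned
    std::vector<std::shared_ptr<Node>> items; // direct children

    // Phase 2 behavior: refuse a child that already has a live parent.
    void
    add(std::shared_ptr<Node> const& child)
    {
        if (!child->parent.expired()) {
            throw std::logic_error("direct object already has a parent");
        }
        child->parent = weak_from_this();
        items.push_back(child);
    }

    // Unsafe variant for shallow copies: adds without touching the parent.
    void
    addUnsafe(std::shared_ptr<Node> const& child)
    {
        items.push_back(child);
    }

    // Remove a child and return it; clear its parent only if this node was
    // the recorded parent. Returns nullptr if the child is not present.
    std::shared_ptr<Node>
    remove(std::shared_ptr<Node> const& child)
    {
        auto it = std::find(items.begin(), items.end(), child);
        if (it == items.end()) {
            return nullptr;
        }
        std::shared_ptr<Node> orphan = *it;
        items.erase(it);
        if (orphan->parent.lock().get() == this) {
            orphan->parent.reset();
        }
        return orphan;
    }
};
```

  With this shape, moving an object is `b->add(a->remove(c))`, while adding `c` to `b` without
  removing it from `a` first throws. Mutating `c` after it has been added still works because the
  container holds the same shared object.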