Commit 512a518dd9327147444d4207cc395bff967d1079
1 parent f34af6b8
Update TODO
Showing 1 changed file with 40 additions and 1 deletion
TODO
| 1 | 1 | Soon |
| 2 | 2 | ==== |
| 3 | 3 | |
| 4 | + * Take changes on encryption-keys branch and make them usable. | |
| 5 | + Replace the hex encoding and decoding piece, and come up with a | |
| 6 | + more robust way of specifying the key. | |
| 7 | + | |
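The encryption-keys item above calls for replacing the hex encoding/decoding piece with something more robust. A minimal sketch of a validating hex decoder (the name `hex_decode_key` is hypothetical, not qpdf's actual API):

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Decode a hex-encoded key string into raw bytes, rejecting odd
// lengths and non-hex characters instead of silently producing
// garbage. Illustrative only; not qpdf's actual interface.
std::string hex_decode_key(std::string const& hex)
{
    if (hex.length() % 2 != 0) {
        throw std::runtime_error("hex key must have an even number of digits");
    }
    std::string result;
    for (std::string::size_type i = 0; i < hex.length(); i += 2) {
        std::string byte_str = hex.substr(i, 2);
        char* end = nullptr;
        long value = std::strtol(byte_str.c_str(), &end, 16);
        if (end != byte_str.c_str() + 2) {
            throw std::runtime_error("invalid hex digit in key");
        }
        result.append(1, static_cast<char>(value));
    }
    return result;
}
```

A more robust way of specifying the key could then build on this by also accepting a key file or passphrase-derived key, with the decoder as the common validation point.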
| 4 | 8 | * Consider whether there should be a mode in which QPDFObjectHandle |
| 5 | 9 | returns nulls for operations on the wrong type instead of asserting |
| 6 | 10 | the type. The way things are wired up now, this would have to be a |
| ... | ... | @@ -19,7 +23,7 @@ Soon |
| 19 | 23 | |
| 20 | 24 | * Support user-pluggable stream filters. This would enable external |
| 21 | 25 | code to provide interpretation for filters that are missing from |
| 22 | - qpdf. Make it possible for user-provided fitlers to override | |
| 26 | + qpdf. Make it possible for user-provided filters to override | |
| 23 | 27 | built-in filters. Make sure that the pluggable filters can be |
| 24 | 28 | prioritized so that we can poll all registered filters to see |
| 25 | 29 | whether they are capable of filtering a particular stream. |
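The pluggable-filter item above could be served by a small priority-ordered registry that polls each registered filter. A sketch under assumed names (`StreamFilter`, `FilterRegistry`, and `IdentityFilter` are illustrative, not qpdf's real classes):

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Abstract interface a user-provided filter would implement.
class StreamFilter
{
  public:
    virtual ~StreamFilter() = default;
    virtual bool canDecode(std::string const& filter_name) const = 0;
    virtual std::string decode(std::string const& data) const = 0;
};

// Registry that polls filters in priority order; a user filter
// registered with a higher priority than a built-in overrides it.
class FilterRegistry
{
  public:
    void registerFilter(int priority, std::shared_ptr<StreamFilter> f)
    {
        filters.emplace_back(priority, f);
        // Higher priority first; stable sort preserves registration
        // order among equal priorities.
        std::stable_sort(
            filters.begin(), filters.end(),
            [](auto const& a, auto const& b) { return a.first > b.first; });
    }

    // Return the first registered filter that claims this stream.
    std::shared_ptr<StreamFilter> find(std::string const& filter_name) const
    {
        for (auto const& entry : filters) {
            if (entry.second->canDecode(filter_name)) {
                return entry.second;
            }
        }
        return nullptr;
    }

  private:
    std::vector<std::pair<int, std::shared_ptr<StreamFilter>>> filters;
};

// Trivial example filter for demonstration.
class IdentityFilter: public StreamFilter
{
  public:
    bool canDecode(std::string const& name) const override
    {
        return name == "/Identity";
    }
    std::string decode(std::string const& data) const override
    {
        return data;
    }
};
```

Polling every registered filter, rather than keying on filter name alone, keeps the door open for filters that inspect decode parameters before claiming a stream.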
| ... | ... | @@ -37,6 +41,41 @@ Soon |
| 37 | 41 | - See ../misc/broken-files |
| 38 | 42 | |
| 39 | 43 | |
| 44 | +Lexical | |
| 45 | +======= | |
| 46 | + | |
| 47 | +Consider rewriting the tokenizer. These are rough ideas at this point. | |
| 48 | +I may or may not do this as described. | |
| 49 | + | |
| 50 | + * Use flex. Generate the lexer sources from ./autogen.sh and | |
| 51 | + include them in the source package, but do not commit them. | |
| 52 | + | |
| 53 | + * Make it possible to run the lexer (tokenizer) over a whole file | |
| 54 | + such that the following things would be possible: | |
| 55 | + | |
| 56 | + * Rewrite fix-qdf in C++ so that there is no longer a runtime perl | |
| 57 | + dependency | |
| 58 | + | |
| 59 | + * Create a way to filter content streams that could be used to | |
| 60 | + preserve the content stream exactly including spaces but also to | |
| 61 | + do things like replace everything between a detected set of | |
| 62 | + markers. This is to support form flattening. Ideally, it should | |
| 63 | + be possible to use this programmatically on broken files. | |
| 64 | + | |
| 65 | + * Make it possible to replace all strings in a file lexically even | |
| 66 | + on badly broken files. Ideally this should work on files that | |
| 67 | + are lacking xref, have broken links, etc., and it should work | |
| 68 | + with encrypted files if possible. This should go through the | |
| 69 | + streams and strings and replace them with fixed or random | |
| 70 | + characters, preferably, but not necessarily, in a manner that | |
| 71 | + works with fonts. One possibility would be to detect whether a | |
| 72 | + string contains characters with normal encoding, and if so, use | |
| 73 | + 0x41. If the string uses character maps, use 0x01. The output | |
| 74 | + should otherwise be unrelated to the input. This could be built | |
| 75 | + after the filtering and tokenizer rewrite and should be done in a | |
| 76 | + manner that takes advantage of the other lexical features. This | |
| 77 | + sanitizer should also clear metadata and replace images. | |
| 78 | + | |
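The replacement rule described in the sanitizer item above (0x41 for normally encoded strings, 0x01 for strings using character maps) can be sketched directly. The name `scrub_string` and the printable-range heuristic for "normal encoding" are assumptions for illustration:

```cpp
#include <string>

// Replace a string's bytes with a fixed filler so the output is
// unrelated to the input: 'A' (0x41) if every byte is printable
// ASCII (treated here as "normal encoding"), 0x01 otherwise (taken
// as a sign of character-map encoding). Length is preserved.
std::string scrub_string(std::string const& s)
{
    bool normal = true;
    for (unsigned char c : s) {
        if (c < 0x20 || c > 0x7e) {
            normal = false;
            break;
        }
    }
    char replacement = normal ? '\x41' : '\x01';
    return std::string(s.length(), replacement);
}
```

In the full sanitizer this per-string rule would run inside the lexical pass described above, alongside clearing metadata and replacing images.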
| 40 | 79 | General |
| 41 | 80 | ======= |
| 42 | 81 | ... | ... |