Commit 512a518dd9327147444d4207cc395bff967d1079
1 parent f34af6b8
Update TODO
Showing 1 changed file with 40 additions and 1 deletion
TODO
| 1 | Soon | 1 | Soon |
| 2 | ==== | 2 | ==== |
| 3 | 3 | ||
| 4 | + * Take changes on encryption-keys branch and make them usable. | ||
| 5 | + Replace the hex encoding and decoding piece, and come up with a | ||
| 6 | + more robust way of specifying the key. | ||
| 7 | + | ||
| 4 | * Consider whether there should be a mode in which QPDFObjectHandle | 8 | * Consider whether there should be a mode in which QPDFObjectHandle |
| 5 | returns nulls for operations on the wrong type instead of asserting | 9 | returns nulls for operations on the wrong type instead of asserting |
| 6 | the type. The way things are wired up now, this would have to be a | 10 | the type. The way things are wired up now, this would have to be a |
| @@ -19,7 +23,7 @@ Soon | @@ -19,7 +23,7 @@ Soon | ||
| 19 | 23 | ||
| 20 | * Support user-pluggable stream filters. This would enable external | 24 | * Support user-pluggable stream filters. This would enable external |
| 21 | code to provide interpretation for filters that are missing from | 25 | code to provide interpretation for filters that are missing from |
| 22 | - qpdf. Make it possible for user-provided fitlers to override | 26 | + qpdf. Make it possible for user-provided filters to override |
| 23 | built-in filters. Make sure that the pluggable filters can be | 27 | built-in filters. Make sure that the pluggable filters can be |
| 24 | prioritized so that we can poll all registered filters to see | 28 | prioritized so that we can poll all registered filters to see |
| 25 | whether they are capable of filtering a particular stream. | 29 | whether they are capable of filtering a particular stream. |
| @@ -37,6 +41,41 @@ Soon | @@ -37,6 +41,41 @@ Soon | ||
| 37 | - See ../misc/broken-files | 41 | - See ../misc/broken-files |
| 38 | 42 | ||
| 39 | 43 | ||
| 44 | +Lexical | ||
| 45 | +======= | ||
| 46 | + | ||
| 47 | +Consider rewriting the tokenizer. These are rough ideas at this point. | ||
| 48 | +I may or may not do this as described. | ||
| 49 | + | ||
| 50 | + * Use flex. Generate the lexer sources from ./autogen.sh and include | ||
| 51 | + them in the source package, but do not commit them. | ||
| 52 | + | ||
| 53 | + * Make it possible to run the lexer (tokenizer) over a whole file | ||
| 54 | + such that the following things would be possible: | ||
| 55 | + | ||
| 56 | + * Rewrite fix-qdf in C++ so that there is no longer a runtime perl | ||
| 57 | + dependency | ||
| 58 | + | ||
| 59 | + * Create a way to filter content streams that could be used to | ||
| 60 | + preserve the content stream exactly including spaces but also to | ||
| 61 | + do things like replace everything between a detected set of | ||
| 62 | + markers. This is to support form flattening. Ideally, it should | ||
| 63 | + be possible to use this programmatically on broken files. | ||
| 64 | + | ||
| 65 | + * Make it possible to replace all strings in a file lexically even | ||
| 66 | + on badly broken files. Ideally this should work on files that are | ||
| 67 | + lacking xref, have broken links, etc., and ideally it should work | ||
| 68 | + with encrypted files if possible. This should go through the | ||
| 69 | + streams and strings and replace them with fixed or random | ||
| 70 | + characters, preferably, but not necessarily, in a manner that | ||
| 71 | + works with fonts. One possibility would be to detect whether a | ||
| 72 | + string contains characters with normal encoding, and if so, use | ||
| 73 | + 0x41. If the string uses character maps, use 0x01. The output | ||
| 74 | + should otherwise be unrelated to the input. This could be built | ||
| 75 | + after the filtering and tokenizer rewrite and should be done in a | ||
| 76 | + manner that takes advantage of the other lexical features. This | ||
| 77 | + sanitizer should also clear metadata and replace images. | ||
| 78 | + | ||
| 40 | General | 79 | General |
| 41 | ======= | 80 | ======= |
| 42 | 81 |
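
The string-replacement idea in the Lexical section of the diff above (use 0x41 for strings with normal encoding, 0x01 for strings that use character maps) could be sketched roughly as follows. This is only an illustration, not qpdf code: the function names `looks_normally_encoded` and `sanitize_string` are hypothetical, and the "printable ASCII or whitespace" test is an assumed stand-in for whatever detection the tokenizer rewrite would actually use.

```cpp
#include <algorithm>
#include <string>

// Hypothetical heuristic (an assumption, not taken from qpdf): treat a
// string as "normally encoded" when every byte is printable ASCII or
// common whitespace. An empty string counts as normally encoded.
bool looks_normally_encoded(std::string const& s)
{
    return std::all_of(s.begin(), s.end(), [](unsigned char c) {
        return (c >= 0x20 && c <= 0x7e) ||
            c == '\t' || c == '\n' || c == '\r';
    });
}

// Replace every byte with a fixed filler so that the output has the
// same length as the input but is otherwise unrelated to it:
// 0x41 for normally encoded strings, 0x01 otherwise.
std::string sanitize_string(std::string const& s)
{
    char const fill = looks_normally_encoded(s) ? '\x41' : '\x01';
    return std::string(s.size(), fill);
}
```

Under these assumptions, `sanitize_string("Hello")` yields `AAAAA`, while a string containing bytes outside the printable range is filled with 0x01 bytes of the same length. Keeping the length unchanged is a design choice of the sketch; the TODO entry itself only requires that the output be unrelated to the input.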