Commit 512a518dd9327147444d4207cc395bff967d1079

Authored by Jay Berkenbilt
1 parent f34af6b8

Update TODO

Showing 1 changed file with 40 additions and 1 deletions
1 1 Soon
2 2 ====
3 3  
  4 + * Take changes on encryption-keys branch and make them usable.
  5 + Replace the hex encoding and decoding piece, and come up with a
  6 + more robust way of specifying the key.
  7 +
4 8 * Consider whether there should be a mode in which QPDFObjectHandle
5 9 returns nulls for operations on the wrong type instead of asserting
6 10 the type. The way things are wired up now, this would have to be a
... ... @@ -19,7 +23,7 @@ Soon
19 23  
20 24 * Support user-pluggable stream filters. This would enable external
21 25 code to provide interpretation for filters that are missing from
22   - qpdf. Make it possible for user-provided fitlers to override
  26 + qpdf. Make it possible for user-provided filters to override
23 27 built-in filters. Make sure that the pluggable filters can be
24 28 prioritized so that we can poll all registered filters to see
25 29 whether they are capable of filtering a particular stream.
... ... @@ -37,6 +41,41 @@ Soon
37 41 - See ../misc/broken-files
38 42  
39 43  
  44 +Lexical
  45 +=======
  46 +
  47 +Consider rewriting the tokenizer. These are rough ideas at this point.
  48 +I may or may not do this as described.
  49 +
  50 + * Use flex. Generate the lexer sources from ./autogen.sh and include them in the
  51 + source package, but do not commit them.
  52 +
  53 + * Make it possible to run the lexer (tokenizer) over a whole file
  54 + such that the following things would be possible:
  55 +
  56 + * Rewrite fix-qdf in C++ so that there is no longer a runtime perl
  57 + dependency
  58 +
  59 + * Create a way to filter content streams that could be used to
  60 + preserve the content stream exactly including spaces but also to
  61 + do things like replace everything between a detected set of
  62 + markers. This is to support form flattening. Ideally, it should
  63 + be possible to use this programmatically on broken files.
  64 +
  65 + * Make it possible to replace all strings in a file lexically even
  66 + on badly broken files. Ideally this should work on files that are
  67 + lacking xref, have broken links, etc., and ideally it should work
  68 + with encrypted files if possible. This should go through the
  69 + streams and strings and replace them with fixed or random
  70 + characters, preferably, but not necessarily, in a manner that
  71 + works with fonts. One possibility would be to detect whether a
  72 + string contains characters with normal encoding, and if so, use
  73 + 0x41. If the string uses character maps, use 0x01. The output
  74 + should otherwise be unrelated to the input. This could be built
  75 + after the filtering and tokenizer rewrite and should be done in a
  76 + manner that takes advantage of the other lexical features. This
  77 + sanitizer should also clear metadata and replace images.
  78 +
40 79 General
41 80 =======
42 81  
... ...
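
The string-replacement idea in the new "Lexical" section could be sketched roughly as follows. This is only an illustration of the 0x41 / 0x01 rule from the TODO text, not code from qpdf. The function names `looks_normally_encoded` and `sanitize_string` are hypothetical, and the detection heuristic (every byte is printable ASCII) is an assumption, since the TODO does not say how "normal encoding" would be detected.

```cpp
#include <algorithm>
#include <string>

// Hypothetical heuristic (an assumption, not from the TODO): treat a
// string as "normally encoded" when every byte is printable ASCII.
// An empty string counts as normally encoded.
bool looks_normally_encoded(std::string const& s)
{
    return std::all_of(s.begin(), s.end(), [](unsigned char ch) {
        return ch >= 0x20 && ch <= 0x7e;
    });
}

// Replace every byte with a fixed filler so the output has the same
// length as the input but is otherwise unrelated to it: 0x41 for
// normally encoded strings, 0x01 for anything else (for example,
// strings that presumably go through a character map).
std::string sanitize_string(std::string const& s)
{
    char const fill = looks_normally_encoded(s) ? '\x41' : '\x01';
    return std::string(s.length(), fill);
}
```

Keeping the length unchanged is a design choice of this sketch; the TODO only requires that the output be unrelated to the input. A real sanitizer would also have to handle streams, metadata, and images, as the TODO notes.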