Commit 4f103c6182df491ac2c6cec39a19ab2eb9032f06

Authored by Jay Berkenbilt
1 parent 8ed3e8c7

TODO note about sanitizer

Showing 1 changed file with 13 additions and 11 deletions
@@ -491,17 +491,19 @@ I find it useful to make reference to them in this list.
491 491 by making it possible to run the lexer (tokenizer) over a whole
492 492 file. Make it possible to replace all strings in a file lexically
493 493 even on badly broken files. Ideally this should work on files that are
494   - lacking xref, have broken links, etc., and ideally it should work
495   - with encrypted files if possible. This should go through the
496   - streams and strings and replace them with fixed or random
497   - characters, preferably, but not necessarily, in a manner that works
498   - with fonts. One possibility would be to detect whether a string
499   - contains characters with normal encoding, and if so, use 0x41. If
500   - the string uses character maps, use 0x01. The output should
501   - otherwise be unrelated to the input. This could be built after the
502   - filtering and tokenizer rewrite and should be done in a manner that
503   - takes advantage of the other lexical features. This sanitizer
504   - should also clear metadata and replace images.
  494 + lacking xref, have broken links, duplicated dictionary keys, syntax
  495 + errors, etc., and ideally it should work with encrypted files if
  496 + possible. This should go through the streams and strings and
  497 + replace them with fixed or random characters, preferably, but not
  498 + necessarily, in a manner that works with fonts. One possibility
  499 + would be to detect whether a string contains characters with normal
  500 + encoding, and if so, use 0x41. If the string uses character maps,
  501 + use 0x01. The output should otherwise be unrelated to the input.
  502 + This could be built after the filtering and tokenizer rewrite and
  503 + should be done in a manner that takes advantage of the other
  504 + lexical features. This sanitizer should also clear metadata and
  505 + replace images. If I ever do this, the file from issue #494 would
  506 + be a great one to look at.
505 507  
506 508 * Here are some notes about having stream data providers modify
507 509 stream dictionaries. I had wanted to add this functionality to make
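The sanitizer note changed in this commit describes replacing string contents lexically, without relying on xref tables or object structure. A minimal sketch of that idea follows, written in Python for brevity rather than in qpdf's own C++. The function name `sanitize_strings` and its `fill` parameter are hypothetical and not part of qpdf. The sketch assumes the input is plain object syntax: it does not decrypt anything, does not detect fonts or character maps (the caller picks 0x41 or 0x01), and does not skip or rewrite stream data, all of which the note asks for. Replacement lengths are approximate because each backslash escape is counted as one character.

```python
def sanitize_strings(data: bytes, fill: int = 0x41) -> bytes:
    """Replace the contents of PDF literal and hex strings with a fill byte.

    Hypothetical sketch: purely lexical, so it never consults xref tables
    or object boundaries. `fill` must not be '(', ')' or a backslash.
    """
    hex_digits = b"0123456789abcdefABCDEF"
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        c = data[i]
        if c == 0x25:  # '%': copy a comment unchanged up to end of line
            j = i
            while j < n and data[j] not in b"\r\n":
                j += 1
            out += data[i:j]
            i = j
        elif c == 0x28:  # '(': literal string with nested parentheses
            depth, j, count = 1, i + 1, 0
            while j < n:
                b = data[j]
                if b == 0x5C:  # backslash: skip the escaped byte
                    count += 1
                    j += 2
                    continue
                if b == 0x28:
                    depth += 1
                elif b == 0x29:
                    depth -= 1
                    if depth == 0:
                        break
                count += 1
                j += 1
            # An unterminated string is closed here rather than rejected.
            out += b"(" + bytes([fill]) * count + b")"
            i = j + 1
        elif c == 0x3C:  # '<': either '<<' (dictionary) or a hex string
            if i + 1 < n and data[i + 1] == 0x3C:
                out += b"<<"
                i += 2
            else:
                j = data.find(b">", i + 1)
                if j == -1:
                    j = n
                digits = sum(1 for b in data[i + 1:j] if b in hex_digits)
                out += b"<" + (b"%02X" % fill) * ((digits + 1) // 2) + b">"
                i = j + 1
        else:
            out.append(c)
            i += 1
    return bytes(out)


if __name__ == "__main__":
    print(sanitize_strings(b"<< /Title (Secret (draft)) /ID <48656C6C6F> >>"))
```

Because the scan keeps no state beyond the current token, it tolerates duplicated dictionary keys and missing xref data. A real implementation would also have to recognize `stream` ... `endstream` sections so that binary data is not misread as string delimiters.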