Commit 4f103c6182df491ac2c6cec39a19ab2eb9032f06

Authored by Jay Berkenbilt
1 parent 8ed3e8c7

TODO note about sanitizer

Showing 1 changed file with 13 additions and 11 deletions
@@ -491,17 +491,19 @@ I find it useful to make reference to them in this list.
  by making it possible to run the lexer (tokenizer) over a whole
  file. Make it possible to replace all strings in a file lexically
  even on badly broken files. Ideally this should work files that are
- lacking xref, have broken links, etc., and ideally it should work
- with encrypted files if possible. This should go through the
- streams and strings and replace them with fixed or random
- characters, preferably, but not necessarily, in a manner that works
- with fonts. One possibility would be to detect whether a string
- contains characters with normal encoding, and if so, use 0x41. If
- the string uses character maps, use 0x01. The output should
- otherwise be unrelated to the input. This could be built after the
- filtering and tokenizer rewrite and should be done in a manner that
- takes advantage of the other lexical features. This sanitizer
- should also clear metadata and replace images.
+ lacking xref, have broken links, duplicated dictionary keys, syntax
+ errors, etc., and ideally it should work with encrypted files if
+ possible. This should go through the streams and strings and
+ replace them with fixed or random characters, preferably, but not
+ necessarily, in a manner that works with fonts. One possibility
+ would be to detect whether a string contains characters with normal
+ encoding, and if so, use 0x41. If the string uses character maps,
+ use 0x01. The output should otherwise be unrelated to the input.
+ This could be built after the filtering and tokenizer rewrite and
+ should be done in a manner that takes advantage of the other
+ lexical features. This sanitizer should also clear metadata and
+ replace images. If I ever do this, the file from issue #494 would
+ be a great one to look at.

  * Here are some notes about having stream data providers modify
  stream dictionaries. I had wanted to add this functionality to make