Commit 4f103c6182df491ac2c6cec39a19ab2eb9032f06
1 parent 8ed3e8c7
TODO note about sanitizer
Showing 1 changed file with 13 additions and 11 deletions
TODO
| ... | ... | @@ -491,17 +491,19 @@ I find it useful to make reference to them in this list. |
| 491 | 491 | by making it possible to run the lexer (tokenizer) over a whole |
| 492 | 492 | file. Make it possible to replace all strings in a file lexically |
| 493 | 493 | even on badly broken files. Ideally this should work on files that are |
| 494 | - lacking xref, have broken links, etc., and ideally it should work | |
| 495 | - with encrypted files if possible. This should go through the | |
| 496 | - streams and strings and replace them with fixed or random | |
| 497 | - characters, preferably, but not necessarily, in a manner that works | |
| 498 | - with fonts. One possibility would be to detect whether a string | |
| 499 | - contains characters with normal encoding, and if so, use 0x41. If | |
| 500 | - the string uses character maps, use 0x01. The output should | |
| 501 | - otherwise be unrelated to the input. This could be built after the | |
| 502 | - filtering and tokenizer rewrite and should be done in a manner that | |
| 503 | - takes advantage of the other lexical features. This sanitizer | |
| 504 | - should also clear metadata and replace images. | |
| 494 | + lacking xref, have broken links, duplicated dictionary keys, syntax | |
| 495 | + errors, etc., and ideally it should work with encrypted files if | |
| 496 | + possible. This should go through the streams and strings and | |
| 497 | + replace them with fixed or random characters, preferably, but not | |
| 498 | + necessarily, in a manner that works with fonts. One possibility | |
| 499 | + would be to detect whether a string contains characters with normal | |
| 500 | + encoding, and if so, use 0x41. If the string uses character maps, | |
| 501 | + use 0x01. The output should otherwise be unrelated to the input. | |
| 502 | + This could be built after the filtering and tokenizer rewrite and | |
| 503 | + should be done in a manner that takes advantage of the other | |
| 504 | + lexical features. This sanitizer should also clear metadata and | |
| 505 | + replace images. If I ever do this, the file from issue #494 would | |
| 506 | + be a great one to look at. | |
| 505 | 507 | |
| 506 | 508 | * Here are some notes about having stream data providers modify |
| 507 | 509 | stream dictionaries. I had wanted to add this functionality to make | ... | ... |