Commit 4f103c6182df491ac2c6cec39a19ab2eb9032f06
1 parent 8ed3e8c7
TODO note about sanitizer
Showing 1 changed file with 13 additions and 11 deletions
TODO
| @@ -491,17 +491,19 @@ I find it useful to make reference to them in this list. | @@ -491,17 +491,19 @@ I find it useful to make reference to them in this list. | ||
| 491 | by making it possible to run the lexer (tokenizer) over a whole | 491 | by making it possible to run the lexer (tokenizer) over a whole |
| 492 | file. Make it possible to replace all strings in a file lexically | 492 | file. Make it possible to replace all strings in a file lexically |
| 493 | even on badly broken files. Ideally this should work files that are | 493 | even on badly broken files. Ideally this should work files that are |
| 494 | - lacking xref, have broken links, etc., and ideally it should work | 494 | + lacking xref, have broken links, duplicated dictionary keys, syntax |
| 495 | - with encrypted files if possible. This should go through the | 495 | + errors, etc., and ideally it should work with encrypted files if |
| 496 | - streams and strings and replace them with fixed or random | 496 | + possible. This should go through the streams and strings and |
| 497 | - characters, preferably, but not necessarily, in a manner that works | 497 | + replace them with fixed or random characters, preferably, but not |
| 498 | - with fonts. One possibility would be to detect whether a string | 498 | + necessarily, in a manner that works with fonts. One possibility |
| 499 | - contains characters with normal encoding, and if so, use 0x41. If | 499 | + would be to detect whether a string contains characters with normal |
| 500 | - the string uses character maps, use 0x01. The output should | 500 | + encoding, and if so, use 0x41. If the string uses character maps, |
| 501 | - otherwise be unrelated to the input. This could be built after the | 501 | + use 0x01. The output should otherwise be unrelated to the input. |
| 502 | - filtering and tokenizer rewrite and should be done in a manner that | 502 | + This could be built after the filtering and tokenizer rewrite and |
| 503 | - takes advantage of the other lexical features. This sanitizer | 503 | + should be done in a manner that takes advantage of the other |
| 504 | - should also clear metadata and replace images. | 504 | + lexical features. This sanitizer should also clear metadata and |
| | | 505 | + replace images. If I ever do this, the file from issue #494 would |
| | | 506 | + be a great one to look at. |
| 505 | | 507 | |
| 506 | * Here are some notes about having stream data providers modify | 508 | * Here are some notes about having stream data providers modify |
| 507 | stream dictionaries. I had wanted to add this functionality to make | 509 | stream dictionaries. I had wanted to add this functionality to make |
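
The string-replacement idea in the sanitizer entry above (use 0x41 for strings in a normal encoding, 0x01 for strings that go through character maps) could be sketched roughly as follows. This is only an illustration in Python, not code from the project. The function names are hypothetical, and the "printable ASCII means normal encoding" test is an assumed stand-in for real font inspection. The sketch only handles literal strings of the form `(...)` and ignores hex strings, comments, stream data, metadata, and images.

```python
def replacement_byte(content: bytes) -> int:
    """Pick the filler byte for one string's content.

    Assumed heuristic (not from the TODO): content made only of
    printable ASCII is treated as "normal encoding" and gets 0x41;
    anything else is treated as character-map text and gets 0x01.
    """
    if all(0x20 <= b <= 0x7E for b in content):
        return 0x41
    return 0x01


def sanitize_literal_strings(data: bytes) -> bytes:
    """Replace the content of every PDF literal string "(...)" in data.

    Purely lexical: no xref table or object structure is consulted, so
    it can run over a damaged file. Nested parentheses and backslash
    escapes are tracked so the end of each string is found correctly.
    An escape counts as one content byte, so output length is only
    approximately the decoded length (octal escapes are not decoded).
    """
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        if data[i] != 0x28:  # not '(' : copy the byte through unchanged
            out.append(data[i])
            i += 1
            continue

        depth = 1
        j = i + 1
        content = bytearray()
        closed = False
        while j < n:
            c = data[j]
            if c == 0x5C and j + 1 < n:  # backslash: take next byte as content
                content.append(data[j + 1])
                j += 2
                continue
            if c == 0x28:  # nested '('
                depth += 1
            elif c == 0x29:  # ')'
                depth -= 1
                if depth == 0:
                    closed = True
                    break
            content.append(c)
            j += 1

        filler = bytes([replacement_byte(content)]) * len(content)
        out.append(0x28)
        out.extend(filler)
        if closed:
            out.append(0x29)
            i = j + 1
        else:
            # Unterminated string in a broken file: everything up to the
            # end of input was consumed as content, so stop here.
            i = n
    return bytes(out)
```

For example, `sanitize_literal_strings(b"BT (Hello) Tj ET")` keeps the operators and delimiters but replaces the five content bytes with `A`. A real sanitizer would also have to skip comments and stream bodies, where an unbalanced parenthesis does not start a string.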