Commit 49f4600dd6feae74079ad3a3678f6a390bb4e3a1
1 parent: 0ae19c37
TODO: Move lexical stuff and add detail
Showing 1 changed file with 18 additions and 23 deletions
TODO
@@ -59,29 +59,6 @@ C++-11
 time.
 
 
-Lexical
-=======
-
- * Make it possible to run the lexer (tokenizer) over a whole file
-   such that the following things would be possible:
-
-   * Rewrite fix-qdf in C++ so that there is no longer a runtime perl
-     dependency
-
-   * Make it possible to replace all strings in a file lexically even
-     on badly broken files. Ideally this should work files that are
-     lacking xref, have broken links, etc., and ideally it should work
-     with encrypted files if possible. This should go through the
-     streams and strings and replace them with fixed or random
-     characters, preferably, but not necessarily, in a manner that
-     works with fonts. One possibility would be to detect whether a
-     string contains characters with normal encoding, and if so, use
-     0x41. If the string uses character maps, use 0x01. The output
-     should otherwise be unrelated to the input. This could be built
-     after the filtering and tokenizer rewrite and should be done in a
-     manner that takes advantage of the other lexical features. This
-     sanitizer should also clear metadata and replace images.
-
 Page splitting/merging
 ======================
 
@@ -407,3 +384,21 @@ I find it useful to make reference to them in this list
 * If I ever decide to make appearance stream-generation aware of
   fonts or font metrics, see email from Tobias with Message-ID
   <5C3C9C6C.8000102@thax.hardliners.org> dated 2019-01-14.
+
+ * Consider creating a sanitizer to make it easier for people to send
+   broken files. Now that we have json mode, this is probably no
+   longer worth doing. Here is the previous idea, possibly implemented
+   by making it possible to run the lexer (tokenizer) over a whole
+   file. Make it possible to replace all strings in a file lexically
+   even on badly broken files. Ideally this should work on files that
+   are lacking xref, have broken links, etc., and ideally it should
+   work with encrypted files if possible. This should go through the
+   streams and strings and replace them with fixed or random
+   characters, preferably, but not necessarily, in a manner that works
+   with fonts. One possibility would be to detect whether a string
+   contains characters with normal encoding, and if so, use 0x41. If
+   the string uses character maps, use 0x01. The output should
+   otherwise be unrelated to the input. This could be built after the
+   filtering and tokenizer rewrite and should be done in a manner that
+   takes advantage of the other lexical features. This sanitizer
+   should also clear metadata and replace images.