Commit 49f4600dd6feae74079ad3a3678f6a390bb4e3a1
Parent: 0ae19c37

    TODO: Move lexical stuff and add detail

Showing 1 changed file (TODO) with 18 additions and 23 deletions.
@@ -59,29 +59,6 @@ C++-11
 time.


-Lexical
-=======
-
- * Make it possible to run the lexer (tokenizer) over a whole file
-   such that the following things would be possible:
-
-   * Rewrite fix-qdf in C++ so that there is no longer a runtime perl
-     dependency
-
-   * Make it possible to replace all strings in a file lexically even
-     on badly broken files. Ideally this should work files that are
-     lacking xref, have broken links, etc., and ideally it should work
-     with encrypted files if possible. This should go through the
-     streams and strings and replace them with fixed or random
-     characters, preferably, but not necessarily, in a manner that
-     works with fonts. One possibility would be to detect whether a
-     string contains characters with normal encoding, and if so, use
-     0x41. If the string uses character maps, use 0x01. The output
-     should otherwise be unrelated to the input. This could be built
-     after the filtering and tokenizer rewrite and should be done in a
-     manner that takes advantage of the other lexical features. This
-     sanitizer should also clear metadata and replace images.
-
 Page splitting/merging
 ======================

@@ -407,3 +384,21 @@ I find it useful to make reference to them in this list
  * If I ever decide to make appearance stream-generation aware of
    fonts or font metrics, see email from Tobias with Message-ID
    <5C3C9C6C.8000102@thax.hardliners.org> dated 2019-01-14.
+
+ * Consider creating a sanitizer to make it easier for people to send
+   broken files. Now that we have json mode, this is probably no
+   longer worth doing. Here is the previous idea, possibly implemented
+   by making it possible to run the lexer (tokenizer) over a whole
+   file. Make it possible to replace all strings in a file lexically
+   even on badly broken files. Ideally this should work on files that
+   are lacking xref, have broken links, etc., and ideally it should
+   work with encrypted files if possible. This should go through the
+   streams and strings and replace them with fixed or random
+   characters, preferably, but not necessarily, in a manner that works
+   with fonts. One possibility would be to detect whether a string
+   contains characters with normal encoding, and if so, use 0x41. If
+   the string uses character maps, use 0x01. The output should
+   otherwise be unrelated to the input. This could be built after the
+   filtering and tokenizer rewrite and should be done in a manner that
+   takes advantage of the other lexical features. This sanitizer
+   should also clear metadata and replace images.