-
The code made a copy of most of the input before shortening it, and it did so for every control word (of which there are often many), which made it very slow. Avoiding that copy sped up processing of my 11MB test file by a factor of 10.
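
For illustration, a minimal sketch of this kind of slowdown and one way around it. This is not the actual rtfobj code: the `CONTROL_WORD` pattern and both function names are hypothetical. The first variant re-slices the buffer after every match, the second keeps one buffer and advances an offset.

```python
import re

# Hypothetical control-word pattern, for illustration only.
CONTROL_WORD = re.compile(rb"\\([a-z]+)(-?\d+)? ?")


def words_by_slicing(data):
    """Slow variant: each step copies the remaining input."""
    found = []
    while data:
        match = CONTROL_WORD.match(data)
        if match:
            found.append(match.group(1))
            data = data[match.end():]   # copies the rest of the buffer
        else:
            data = data[1:]             # copies the rest of the buffer again
    return found


def words_by_offset(data):
    """Faster variant: keep one buffer and advance an offset instead."""
    found = []
    pos = 0
    while pos < len(data):
        match = CONTROL_WORD.match(data, pos)  # match at pos, no copy needed
        if match:
            found.append(match.group(1))
            pos = match.end()
        else:
            pos += 1
    return found
```

Both variants return the same control words; only the amount of copying differs, which matters most for large inputs with many control words.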
-
rtfobj: remove check for uppercased RTF magic
-
Fix some PEP8-related linter complaints.
-
Word does not accept files whose magic is not fully lowercase.
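
A minimal sketch of a case-sensitive magic check, assuming the magic is the byte string `{\rtf`. The names `RTF_MAGIC` and `has_rtf_magic` are hypothetical and not taken from rtfobj.

```python
# Assumed magic bytes, for illustration only.
RTF_MAGIC = b"{\\rtf"


def has_rtf_magic(data):
    """Return True only if data starts with the lowercase RTF magic."""
    # No .lower() here: an uppercased magic is deliberately not accepted.
    return data.startswith(RTF_MAGIC)
```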
-
Olevba skip rtf
-
convert byte concat to bytearray
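
A rough sketch of the difference, with hypothetical function names that are not taken from olevba. Repeated `+=` on `bytes` generally builds a new object each time, while a `bytearray` is extended in place.

```python
def join_with_bytes(chunks):
    """Repeated bytes concatenation: may copy the whole result each time."""
    result = b""
    for chunk in chunks:
        result += chunk
    return result


def join_with_bytearray(chunks):
    """Same result, but the bytearray is extended in place."""
    result = bytearray()
    for chunk in chunks:
        result += chunk
    return bytes(result)  # convert once at the end
```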
-
aligned olevba with olevba3
-
PR to fix decalage2#251
-
Oleobj more samples
-
Also: some minor clean-up (forgotten import, add docu, re-wrap docu, remove excess whitespace)
-
Also: 2 minor changes
-
Oleobj used to read all data from the file at the start. It was recently changed to work on streams to save memory, while keeping compatibility with pre-read data. That compatibility was broken for office2007+ files (zipped xml); this restores it for those file types as well.
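
One way to support both modes for zipped files, sketched under assumptions: `open_zip` is a hypothetical helper, not the oleobj API. It relies only on the fact that `zipfile.ZipFile` accepts either a path or a file-like object, so pre-read data can be wrapped in `io.BytesIO`.

```python
import io
import zipfile


def open_zip(filename=None, data=None):
    """Open a zip container from a file name or from pre-read bytes."""
    if data is not None:
        source = io.BytesIO(data)   # pre-read mode: wrap the bytes in a stream
    elif filename is not None:
        source = filename           # stream mode: let zipfile read from disk
    else:
        raise ValueError("need either filename or data")
    return zipfile.ZipFile(source)
```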
-
The pre-read test found a bug in oleobj for zipped-xml files. Will fix with next commit.
-
oleobj for office2007
-
Forgot to set this back to False after testing
-
Tried to find a way to allow relative imports but gave up (for now)
-
Want to discourage people from working on ppt_parser, since that would increase the amount of code that has to be reproduced in ppt_record_parser before it can replace ppt_parser
-
The regular expression \w behaves differently in Python 2 (matches only ASCII) and Python 3 (matches all Unicode word characters). Clarify that we only want ASCII in sanitized filenames.
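
A small sketch of the idea, not the actual oletools code: `sanitize_filename` and its exact character class are hypothetical. Spelling out the allowed characters instead of using `\w` gives the same result on both Python versions.

```python
import re

# Explicit ASCII character class instead of \w, so the behavior does not
# depend on the Python version. The allowed set is an assumption.
UNSAFE_CHARS = re.compile(r"[^a-zA-Z0-9_.\-]")


def sanitize_filename(name, replacement="_"):
    """Replace every character outside the ASCII allow-list."""
    return UNSAFE_CHARS.sub(replacement, name)
```

On Python 3, a pattern such as `[^\w.\-]` would leave an accented letter untouched, whereas the explicit class above replaces it.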