-
PR to fix decalage2#251
-
Oleobj more samples
-
Also: some minor clean-up (forgotten import, added docs, re-wrapped docs, removed excess whitespace)
-
Also: 2 minor changes
-
Oleobj used to read all data from the file at the start. This recently changed to working on streams to save memory, while compatibility with pre-read data was maintained. That compatibility was broken for Office 2007+ files (zipped XML). This commit re-establishes it for these file types as well.
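A minimal sketch of how a caller-facing helper can accept either a file name or pre-read data and hand back a stream in both cases. The name open_input and its parameters are hypothetical and not taken from oleobj.

```python
import io


def open_input(filename=None, data=None):
    """Return a readable binary stream for a file name or pre-read bytes.

    Hypothetical helper for illustration, not oleobj's actual API.
    """
    if data is not None:
        # Pre-read case: wrap the bytes so later code can treat it as a stream.
        return io.BytesIO(data)
    if filename is not None:
        # Streaming case: the file is read lazily by whoever consumes the handle.
        return open(filename, 'rb')
    raise ValueError('need either filename or data')
```

Downstream code then only ever sees a file-like object, so the zipped-XML path and the OLE path can share the same handling.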
-
The pre-read test found a bug in oleobj for zipped-xml files. Will fix with next commit.
-
oleobj for office2007
-
Forgot to set this back to False after testing
-
Experimented with ways to allow relative imports but gave up (for now)
-
Want to discourage people from working on ppt_parser, since that would increase the amount of code that has to be reproduced in ppt_record_parser before it can replace ppt_parser
-
Regular expression \w behaves differently in Python 2 (matches only ASCII by default) and Python 3 (matches all Unicode word characters by default). Clarify that we only want ASCII in sanitized filenames.
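One way to get the same result on both Python versions is to spell out the allowed characters instead of relying on \w. The function name sanitize_filename and the replacement character are assumptions for this sketch, not necessarily what oleobj uses.

```python
import re


def sanitize_filename(name, replacement='_'):
    """Replace everything except ASCII letters, digits, '_', '.' and '-'.

    Hypothetical helper: the explicit character class behaves the same
    on Python 2 and Python 3, unlike \\w.
    """
    return re.sub(r'[^a-zA-Z0-9_.-]', replacement, name)
```

On Python 3, re.sub(r'\W', '_', name) would leave characters such as an accented e untouched, which is the behaviour this commit avoids.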
-
Strangest thing: this change was necessary for unit-testing oleobj. Without it, running python3.3 -m unittest tests.oleobj.test_basic resulted in: AttributeError: 'module' object has no attribute 'oleobj'. That was a rather unhelpful error message.
-
OleFileIO requires a complete seek() implementation and checks for a closed attribute. Also added some commented-out debug print statements to ZipSubFile
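A rough sketch of what "complete seek() plus closed attribute" can look like for a zip member. The class SeekableZipMember is hypothetical and only illustrates the idea. It is not the actual ZipSubFile code. It handles all three whence modes and rewinds by reopening the member.

```python
import io
import zipfile


class SeekableZipMember(object):
    """Hypothetical read-only wrapper around one member of a zip archive."""

    def __init__(self, container, name):
        self.container = container
        self.name = name
        self.size = container.getinfo(name).file_size
        self.handle = container.open(name, 'r')
        self.pos = 0
        self.closed = False      # attribute that callers may check

    def read(self, size=-1):
        data = self.handle.read(size)
        self.pos += len(data)
        return data

    def tell(self):
        return self.pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            target = offset
        elif whence == io.SEEK_CUR:
            target = self.pos + offset
        elif whence == io.SEEK_END:
            target = self.size + offset
        else:
            raise ValueError('invalid whence: {0}'.format(whence))
        if target < 0:
            raise ValueError('negative seek position')
        if target < self.pos:
            # Going backwards: reopen the member and start from the beginning.
            self.handle.close()
            self.handle = self.container.open(self.name, 'r')
            self.pos = 0
        # Going forwards: read and discard until the target is reached.
        remaining = target - self.pos
        while remaining > 0:
            chunk = self.handle.read(min(remaining, 4096))
            if not chunk:
                break            # hit end of member
            remaining -= len(chunk)
            self.pos += len(chunk)
        return self.pos

    def close(self):
        self.handle.close()
        self.closed = True
```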
-
Also remove 1 exception from output and add a comment
-
This makes compatibility with py3 easier, but requires us to guess an encoding. Should work fine for European-generated files, but could produce strange results for Asian files.
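To illustrate the trade-off, here is a small sketch of decoding with a guessed codec. The choice of cp1252 and the name guess_decode are assumptions made for this example. The commit itself does not say which encoding is guessed.

```python
def guess_decode(raw, encoding='cp1252'):
    """Decode bytes with a guessed encoding, never raising on bad bytes.

    Hypothetical helper: cp1252 is only an assumed default here.
    """
    # errors='replace' turns undecodable bytes into U+FFFD instead of failing.
    return raw.decode(encoding, 'replace')
```

Bytes written with a Western European codec come out as intended, while bytes written with, for example, Shift_JIS decode without error but into unrelated characters.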
-
Most changes are just whitespace, line-break, or case changes. But:
- this did find an actual error (variable exc was used before creation)
- moved imports up between license and changelog (although I would prefer them in their original place)
- removed the _ansi_ from read_*_ansi_string
- moved logging constants from main to global scope
-
Tell the caller of the script roughly what happened during the call via the return code. Also: check whether the given file arguments exist and return a non-zero exit code if not, and remove the print of the non-existent __doc__
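A minimal sketch of the pattern described here, with made-up return codes and messages. The function name main, the codes 1 and 2, and the message texts are illustrative assumptions, not oleobj's actual values.

```python
import os
import sys


def main(argv=None):
    """Return 0 on success, non-zero if arguments are missing or invalid."""
    args = sys.argv[1:] if argv is None else argv
    if not args:
        sys.stderr.write('no input files given\n')
        return 2
    # Check all file arguments up front so the caller gets a clear failure.
    missing = [name for name in args if not os.path.isfile(name)]
    for name in missing:
        sys.stderr.write('file not found: {0}\n'.format(name))
    if missing:
        return 1
    # ... actual processing of the files would go here ...
    return 0
```

A script entry point would then call sys.exit(main()), so that shell callers can test the exit status.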
-
This way we do not have to keep a whole big office file in memory. (Olefile might do that anyway, but then we have one copy less.) Also merged subfunction process_native_stream back into process_file (harder to read, but makes more sense for exception handling)
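The memory argument can be sketched as a chunked copy, where only one chunk is held in memory at a time. The name copy_stream and the chunk size are assumptions for this illustration and do not come from oleobj.

```python
import io


def copy_stream(source, target, chunk_size=4096):
    """Copy source to target chunk by chunk and return the byte count.

    Hypothetical helper: only chunk_size bytes are in memory at once.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            break                # end of stream reached
        target.write(chunk)
        total += len(chunk)
    return total
```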