-
The pre-read test found a bug in oleobj for zipped-xml files. Will fix with next commit.
-
oleobj for office2007
-
Forgot to set this back to False after testing
-
Tried several ways to allow relative imports but gave up (for now)
-
Want to discourage people from working on ppt_parser, since that would increase the amount of code that has to be reproduced in ppt_record_parser before it can replace ppt_parser
-
Regular expression \w behaves differently in Python 2 (matches only ASCII) and Python 3 (matches all Unicode word characters). Clarify that we only want ASCII in sanitized filenames.
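A minimal sketch of the idea, not the actual oleobj code: spelling out the allowed characters as an explicit ASCII class avoids relying on \w at all, so the result is the same under both interpreters. The helper name ascii_only_filename and the replacement character are assumptions made for illustration.

```python
import re

# Explicit ASCII character class: unlike \w, its meaning does not depend
# on the Python version or on whether the pattern is str or bytes.
NON_ASCII_FILENAME_CHARS = re.compile(r'[^a-zA-Z0-9_.\-]')


def ascii_only_filename(name, replacement='_'):
    """Hypothetical helper: replace everything outside the ASCII whitelist."""
    return NON_ASCII_FILENAME_CHARS.sub(replacement, name)
```

In Python 3 the re.ASCII flag would be another way to restrict \w to ASCII, but that flag does not exist in Python 2, so the explicit class is the more portable choice.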
-
Strangest thing: this change was necessary for unit testing oleobj. Without it, running python3.3 -m unittest tests.oleobj.test_basic resulted in: AttributeError: 'module' object has no attribute 'oleobj'. That was a rather unhelpful error message.
-
OleFileIO requires a complete seek() and checks for a closed attribute. Also added some commented-out debug print commands to ZipSubFile
-
Also remove 1 exception from output and add a comment
-
This makes compatibility with py3 easier, but requires us to guess an encoding. Should work fine for European-generated files, could produce strange results for Asian files.
-
Most changes are just whitespace, line-break or case changes. But this did find an actual error (variable exc was used before creation); moved imports up between license and changelog (although I would prefer them in their original place); removed the _ansi_ from read_*_ansi_string; moved logging constants from main to global scope.
-
Tell the caller of the script roughly what happened in the call. Also: check whether the given file arguments exist and return a non-zero exit code if they do not; remove the print of the non-existent __doc__
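Roughly, the argument check could look like the sketch below. The function name check_and_run, the constant names and the concrete exit-code values are assumptions for illustration; the actual values used by the script are not stated here.

```python
import os
import sys

RETURN_OK = 0          # hypothetical value
RETURN_ERR_ARGS = 2    # hypothetical value for "bad command-line arguments"


def check_and_run(filenames):
    """Return an exit code that tells the caller roughly what happened."""
    missing = [name for name in filenames if not os.path.isfile(name)]
    if missing:
        for name in missing:
            sys.stderr.write('File not found: %s\n' % name)
        return RETURN_ERR_ARGS
    # ... the actual processing of each file would go here ...
    return RETURN_OK
```

A script would typically end with sys.exit(check_and_run(sys.argv[1:])) so that the shell sees the code.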
-
This way we do not have to keep a whole big office file in memory. (Olefile might do that, anyway, but then we have one copy less.) Also merge subfunction process_native_stream back into process_file (harder to read but makes more sense for exception handling)
-
Both can now be parsed from either a bytes array or a stream
-
This is more efficient and simplifies generalization to using byte-streams instead of byte arrays as data input.
-
This way, oleobj can now handle office 2007+ types (docx, xlsx, pptx, and derivatives). Since this adds another loop level to process_file, created a separate function for the innermost part of the code (the actual dumping).
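The extra loop level comes from the fact that office 2007+ files are zip archives whose members may themselves be OLE files. A minimal sketch of that outer loop, assuming a hypothetical helper find_ole_members that is not part of oleobj and that only checks the well-known OLE magic bytes at the start of each member:

```python
import zipfile

# Magic bytes at the start of every OLE2 compound file
OLE_MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'


def find_ole_members(zip_path_or_file):
    """Yield names of zip members that start with the OLE magic bytes."""
    with zipfile.ZipFile(zip_path_or_file) as archive:
        for info in archive.infolist():
            # Read only the first few bytes; the member is not fully unpacked
            with archive.open(info) as member:
                if member.read(len(OLE_MAGIC)) == OLE_MAGIC:
                    yield info.filename
```

Each name found this way would then be handed to the inner function that does the actual dumping.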
-
This was not easy to do if we want to avoid having the complete embedded file in uncompressed form in memory. Had to create a stream around an iterable, kind of fun :-)
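For illustration, a stream around an iterable can be sketched as below. This is a simplified, read-only Python 3 version under the assumption that the iterable yields bytes chunks (for example, pieces of decompressed data); the class name IterableStream is hypothetical and this is not the implementation used in oleobj, which additionally needs seek() support for OleFileIO.

```python
import io


class IterableStream(io.RawIOBase):
    """Read-only raw stream that pulls its data from an iterable of bytes."""

    def __init__(self, chunks):
        super().__init__()
        self._chunks = iter(chunks)
        self._leftover = b''   # part of the current chunk not yet handed out

    def readable(self):
        return True

    def readinto(self, buffer):
        # Fetch the next non-empty chunk; returning 0 signals end of stream
        while not self._leftover:
            try:
                self._leftover = next(self._chunks)
            except StopIteration:
                return 0
        count = min(len(buffer), len(self._leftover))
        buffer[:count] = self._leftover[:count]
        self._leftover = self._leftover[count:]
        return count
```

Wrapping it as io.BufferedReader(IterableStream(chunks)) gives the usual read(n) behaviour while only one chunk plus the reader's buffer is held in memory at a time.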