Peter M. Groen / oletools

17 Jan, 2018

40 commits

ppt_parser: add warning that this might be replaced ...

Want to discourage people from working on ppt_parser, since that would
increase the amount of code that has to be reproduced in ppt_record_parser
before it can replace ppt_parser.

authored

2018-01-17 15:43:38 +0100


unittests: run pylint and pep8 on oleobj test
cb072e36

Christian Herdtweck authored
2018-01-17 15:07:33 +0100
unittests: add more samples to oleobj test
46920be6

Christian Herdtweck authored
2018-01-17 15:07:33 +0100

oleobj: make sane filenames always ascii-only ...

ccbe0b23

Regular expression \w behaves differently in Python2 (matches only ascii)
and Python3 (matches all unicode word characters). Clarify that we only
want ascii in sanitized filenames.

authored

2018-01-17 15:07:33 +0100
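A minimal sketch of the idea behind this commit, not the actual oleobj code: the helper name `sanitize_ascii` and the exact set of allowed characters are assumptions made for illustration. Spelling out the character class avoids relying on `\w`, so the result is the same on Python 2 and Python 3.

```python
import re

# Explicit ASCII character class: unlike \w, this matches the same
# characters on Python 2 and Python 3.
NOT_ALLOWED = re.compile(r'[^a-zA-Z0-9_.-]')


def sanitize_ascii(filename, replacement='_'):
    """Replace every character outside a small ASCII whitelist.

    Hypothetical helper for illustration only.
    """
    return NOT_ALLOWED.sub(replacement, filename)


if __name__ == '__main__':
    print(sanitize_ascii(u'r\xe9sum\xe9 2018.doc'))
```

On Python 3, passing `re.ASCII` to a pattern that uses `\w` would be another way to get ASCII-only matching for str patterns.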


oleobj: unify closing of ole stream in error case
2c0f8847

Christian Herdtweck authored
2018-01-17 15:07:33 +0100
oleobj: improve logging slightly
d2920ad4

Christian Herdtweck authored
2018-01-17 15:07:33 +0100

ppt_record_parser: ensure import is relative ...

5d7a6445

Strangest thing: this change was necessary for unit-testing oleobj. Without
it, running python3.3 -m unittest tests.oleobj.test_basic resulted in:
AttributeError: 'module' object has no attribute 'oleobj'. That was a
rather unhelpful error message.

authored

2018-01-17 15:07:33 +0100


oleobj: use ZipSubFile to allow OleFileIO to seek()
a5036230

Christian Herdtweck authored
2018-01-17 15:07:33 +0100
oleobj: use absolute import to make py3-compatible
ea58877a

Christian Herdtweck authored
2018-01-17 15:07:33 +0100
unittest: create unittests for ooxml.ZipSubFile
56b79d1e

Christian Herdtweck authored
2018-01-17 15:07:33 +0100
ooxml: re-implement complete seek(); add attr closed ...
d4eb585e
```
OleFileIO requires a complete seek() and checks for closed attribute.

Also added some commented debug print commands to ZipSubFile
```
Christian Herdtweck authored
2018-01-17 15:07:33 +0100
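To illustrate what a "complete seek()" plus a `closed` attribute can look like for a zip member, here is a rough sketch. It is not the real ZipSubFile: the class name `SeekableZipMember`, the chunk size, and the clamping of positions to the member size are all assumptions. Seeking backwards is emulated by re-opening the member and skipping forward.

```python
import io
import zipfile


class SeekableZipMember(object):
    """Read-only, seekable view of one zip member (illustrative only)."""

    CHUNK = 4096    # bytes to discard per step when skipping forward

    def __init__(self, container, name):
        self.name = name
        self.zipper = zipfile.ZipFile(container)
        self.size = self.zipper.getinfo(name).file_size
        self.handle = self.zipper.open(name)
        self.pos = 0
        self.closed = False     # attribute that callers may check

    def read(self, size=-1):
        data = self.handle.read(size)
        self.pos += len(data)
        return data

    def tell(self):
        return self.pos

    def seek(self, offset, whence=io.SEEK_SET):
        # translate all three modes into an absolute target position
        if whence == io.SEEK_SET:
            target = offset
        elif whence == io.SEEK_CUR:
            target = self.pos + offset
        elif whence == io.SEEK_END:
            target = self.size + offset
        else:
            raise ValueError('invalid whence: %r' % (whence,))
        # simplification of this sketch: clamp to the valid range
        target = max(0, min(target, self.size))
        if target < self.pos:
            # cannot step backwards in compressed data, so start over
            self.handle.close()
            self.handle = self.zipper.open(self.name)
            self.pos = 0
        while self.pos < target:
            # skip forward by reading and discarding data
            if not self.read(min(self.CHUNK, target - self.pos)):
                break
        return self.pos

    def close(self):
        self.handle.close()
        self.zipper.close()
        self.closed = True
```

The obvious cost of this design is that a backward seek re-reads the member from its start, which trades speed for not holding the uncompressed data in memory.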
oleobj: fix logging --> log and make it lazy where possible ...
a7d1050e
```
Also remove 1 exception from output and add a comment
```
Christian Herdtweck authored
2018-01-17 15:07:33 +0100
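"Lazy" logging here presumably means passing format arguments to the logger instead of formatting the message up front. The sketch below shows the difference with the standard `logging` module; the logger name and the `Expensive` class are made up for the example.

```python
import logging

log = logging.getLogger('lazy_logging_example')   # hypothetical logger name


class Expensive(object):
    """Counts how often it is converted to a string."""

    def __init__(self):
        self.calls = 0

    def __str__(self):
        self.calls += 1
        return 'expensive value'


def demo():
    """Return the number of str() calls after an eager and a lazy debug call."""
    log.setLevel(logging.WARNING)       # debug messages are disabled
    value = Expensive()
    log.debug('eager: %s' % value)      # string is built although never logged
    eager_calls = value.calls
    log.debug('lazy: %s', value)        # formatting is skipped entirely
    return eager_calls, value.calls


if __name__ == '__main__':
    print(demo())
```

With the lazy form, the cost of building the message is only paid when the record is actually emitted.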
unittest: add 3 tests with 6 samples for oleobj
1ee956aa

Christian Herdtweck authored
2018-01-17 15:07:33 +0100
oleobj: accept custom command line args for testing
9977c523

Christian Herdtweck authored
2018-01-17 15:07:33 +0100
oleobj: upgrade from optparse to argparse
3f009e76

Christian Herdtweck authored
2018-01-17 15:07:33 +0100
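The two commits above (custom command line args for testing, and the switch to argparse) fit together roughly as sketched below. The function name `parse_args` and the options shown are hypothetical and are not oleobj's real command line interface.

```python
import argparse


def parse_args(cmd_line_args=None):
    """Parse arguments; pass a list to bypass sys.argv (e.g. in unit tests)."""
    parser = argparse.ArgumentParser(description='illustrative parser')
    # hypothetical options, chosen only for this example
    parser.add_argument('-d', '--output-dir', default='.',
                        help='directory to write extracted files to')
    parser.add_argument('input', nargs='*', help='files to process')
    # cmd_line_args=None makes argparse fall back to sys.argv[1:]
    return parser.parse_args(cmd_line_args)


if __name__ == '__main__':
    print(parse_args(['-d', 'out', 'sample.doc']))
```

A unit test can then call the parser (or a `main` built on it) with an explicit list, without touching `sys.argv`.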

oleobj: encode filenames/paths to unicode ...

471b141f

This makes compatibility with py3 easier, but requires us to guess an
encoding. Should work fine for European-generated files, but could produce
strange results for Asian files.

authored

2018-01-17 15:07:33 +0100
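Guessing an encoding could look like the sketch below. The function name `guess_decode` and the list of candidate encodings are assumptions for illustration; the commit does not say which encodings oleobj actually tries.

```python
def guess_decode(raw, encodings=('utf-8', 'cp1252')):
    """Decode bytes to unicode by trying a few encodings in order.

    Hypothetical helper; the candidate encodings are an assumption.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a code point, so this cannot fail
    return raw.decode('latin-1')


if __name__ == '__main__':
    print(repr(guess_decode(b'caf\xe9.doc')))
```

This also shows the weakness mentioned in the commit message: bytes written in, say, Shift-JIS are often valid cp1252 as well, so they decode without error but into the wrong characters.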


oleobj: make pylint and pep8 happier ...

670d7075

Most changes are just whitespace, line-break or case changes. But this also:
- found an actual error (variable exc was used before creation)
- moved imports up between license and changelog (although I would prefer
them in their original place)
- removed the _ansi_ from read_*_ansi_string
- moved logging constants from main to global scope

authored

2018-01-17 15:07:33 +0100


oleobj: vary shell status code if dumped or not / error occurred ...
1665aeea
```
Tell caller of script roughly what happened in call.

Also: check whether the given file arguments exist and return a non-zero exit
status if they do not; remove print of non-existent __doc__
```
Christian Herdtweck authored
2018-01-17 15:07:33 +0100
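A sketch of how a script can report "roughly what happened" through its exit status. The constant names, their numeric values and the `dump_func` callback are invented for this example; the commit message does not list the real codes.

```python
import os
import sys

# hypothetical status codes, not necessarily the ones oleobj uses
RETURN_NO_DUMP = 0      # ran fine, nothing was dumped
RETURN_DID_DUMP = 1     # ran fine, at least one object was dumped
RETURN_ERR_ARGS = 2     # a given file argument does not exist


def main(filenames, dump_func):
    """Process files and return a status code describing the outcome.

    dump_func(name) is a stand-in for the real work; it should return
    True if something was dumped from that file.
    """
    if any(not os.path.isfile(name) for name in filenames):
        return RETURN_ERR_ARGS
    did_dump = False
    for name in filenames:
        if dump_func(name):
            did_dump = True
    return RETURN_DID_DUMP if did_dump else RETURN_NO_DUMP


if __name__ == '__main__':
    # the return value of main becomes the shell exit status
    sys.exit(main(sys.argv[1:], lambda name: False))
```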
oleobj: remember whether OleNativeStream data is stream/link
7680eb11

Christian Herdtweck authored
2018-01-17 15:07:33 +0100

oleobj: parse and dump from stream ...

c8a4b6a9

This way we do not have to keep a whole big office file in memory.
(Olefile might do that, anyway, but then we have one copy less.)

Also merge subfunction process_native_stream back into process_file
(harder to read but makes more sense for exception handling)

authored

2018-01-17 15:07:33 +0100


oleobj: parse OleNativeStream and OleObject from stream ...
aa95f26a
```
Both can now be parsed from a byte array or from a stream
```
Christian Herdtweck authored
2018-01-17 15:07:33 +0100
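One simple way to accept both kinds of input is to wrap bytes in an in-memory stream and then only ever read from a stream. This is a sketch under that assumption; the helper names `as_stream` and `read_uint32` are chosen for illustration and need not match oleobj's own functions.

```python
import io
import struct


def as_stream(data):
    """Return a readable stream for either bytes or an existing stream."""
    if isinstance(data, (bytes, bytearray)):
        return io.BytesIO(data)
    return data     # assume it already offers read()


def read_uint32(data):
    """Read one little-endian unsigned 32-bit integer from bytes or a stream."""
    raw = as_stream(data).read(4)
    if len(raw) != 4:
        raise ValueError('not enough data for a uint32')
    return struct.unpack('<L', raw)[0]


if __name__ == '__main__':
    print(read_uint32(b'\x01\x00\x00\x00'))
    print(read_uint32(io.BytesIO(b'\x02\x00\x00\x00')))
```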
oleobj: change data parsing to change index rather than data ...
dad20c2c
```
This is more efficient and simplifies generalization to using byte-streams
instead of byte arrays as data input.
```
Christian Herdtweck authored
2018-01-17 15:07:33 +0100
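The approach described here, advancing an index instead of slicing the data after every read, can be sketched as follows. The function names and the record layout in the usage part are hypothetical.

```python
import struct


def read_uint32_at(data, index):
    """Return (value, new_index) without copying or slicing data."""
    value = struct.unpack_from('<L', data, index)[0]
    return value, index + 4


def read_zero_terminated_at(data, index):
    """Return (string_bytes, new_index) for a NUL-terminated string."""
    end = data.index(b'\x00', index)
    return data[index:end], end + 1


if __name__ == '__main__':
    # made-up record: uint32, zero-terminated name, uint32
    record = struct.pack('<L', 7) + b'name\x00' + struct.pack('<L', 42)
    index = 0
    first, index = read_uint32_at(record, index)
    name, index = read_zero_terminated_at(record, index)
    second, index = read_uint32_at(record, index)
    print(first, name, second, index)
```

Because each function only returns a position, the same calling pattern carries over to streams, where the "index" is simply the stream position.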

oleobj: generalize "opening" of ole files to allow for other types ...

2b3f8d3e

This way, oleobj can now handle Office 2007+ types (docx, xlsx, pptx, and
derivatives).

Since this adds another loop level to process_file, moved the innermost
part of the code (the actual dumping) into its own function.

authored

2018-01-17 15:07:30 +0100
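Office 2007+ files are zip archives, so one plausible way to generalize "opening" is to look inside the archive for members that are themselves OLE files. The sketch below does this by checking for the OLE2 magic bytes; the function name `find_ole_candidates` is an assumption, and the real oleobj logic may select members differently.

```python
import zipfile

# signature at the start of every OLE2 compound file
OLE_MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'


def find_ole_candidates(container):
    """Yield names of zip members that start with the OLE2 magic bytes.

    container may be a file name or a seekable file-like object.
    """
    with zipfile.ZipFile(container) as zipper:
        for name in zipper.namelist():
            with zipper.open(name) as member:
                if member.read(len(OLE_MAGIC)) == OLE_MAGIC:
                    yield name
```

A caller would first test the input with `zipfile.is_zipfile` and fall back to treating it as a plain OLE file otherwise.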


oleobj: add options -v and -i for compatibility with ripOLE
cc142ee3

Christian Herdtweck authored
2018-01-17 15:05:16 +0100
xls_parser: fix "wrong" variable name
becb96f7

Christian Herdtweck authored
2018-01-17 15:00:18 +0100
record_base: ensure streams are closed in iter_streams
217d6114

Christian Herdtweck authored
2018-01-17 15:00:18 +0100
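A generator can guarantee that each stream it hands out is closed again by wrapping the `yield` in `try`/`finally`. The following is only a sketch of that pattern with in-memory streams; the real iter_streams in record_base works on OLE streams and may be structured differently.

```python
import io


def iter_streams(chunks):
    """Yield one stream per chunk and close it once the caller moves on."""
    for chunk in chunks:
        stream = io.BytesIO(chunk)
        try:
            yield stream
        finally:
            # runs when the caller asks for the next stream, or when the
            # generator itself is closed
            stream.close()


if __name__ == '__main__':
    for stream in iter_streams([b'first', b'second']):
        print(stream.read())
```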
ppt_record_parser: pylint, pep8; fix history, add todo
de9f5e91

Christian Herdtweck authored
2018-01-17 15:00:18 +0100

ppt_record_parser: provide OleFileIO from embedded files ...

79564711

This was not easy to do if we want to avoid having the complete embedded file
in uncompressed form in memory. Had to create a stream around an iterable,
kind of fun :-)

authored

2018-01-17 15:00:18 +0100
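"A stream around an iterable" can be built on `io.RawIOBase` by implementing `readinto`. The class below is a generic sketch of that pattern, not the code from ppt_record_parser; the name `IterableStream` is made up. In the scenario described above, the iterable would produce decompressed chunks one at a time, so that only one chunk is held in memory.

```python
import io


class IterableStream(io.RawIOBase):
    """Read-only raw stream that pulls its data from an iterable of bytes."""

    def __init__(self, iterable):
        super(IterableStream, self).__init__()
        self.iterator = iter(iterable)
        self.leftover = b''     # part of the current chunk not yet consumed

    def readable(self):
        return True

    def readinto(self, buf):
        # fetch chunks until we have data; empty chunks are skipped
        while not self.leftover:
            try:
                self.leftover = next(self.iterator)
            except StopIteration:
                return 0        # signals end of stream
        count = min(len(buf), len(self.leftover))
        buf[:count] = self.leftover[:count]
        self.leftover = self.leftover[count:]
        return count


if __name__ == '__main__':
    stream = io.BufferedReader(IterableStream([b'abc', b'', b'defg']))
    print(stream.read(2))
    print(stream.read())
```

Wrapping the raw stream in `io.BufferedReader` gives the usual `read(n)` behaviour; such a stream is forward-only, so seeking would need extra work.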


record_base: simplify bugfixing by offering more verbosity
a93b2109

Christian Herdtweck authored
2018-01-17 15:00:18 +0100
unittest: create tests for ppt_record_parser.is_ppt
faeb2aed

Christian Herdtweck authored
2018-01-17 15:00:18 +0100
ppt_record_parser: create function is_ppt
5609051f

Christian Herdtweck authored
2018-01-17 15:00:17 +0100
ppt_record_parser: move constants to top of file
8dc4854d

Christian Herdtweck authored
2018-01-17 15:00:17 +0100
record_base: make pylint and pep8 happier
8be66d11

Christian Herdtweck authored
2018-01-17 15:00:17 +0100
record_base: provide stream type constants from olefile
97035144

Christian Herdtweck authored
2018-01-17 15:00:17 +0100
xls_parser: close stream after xlsb-parsing; update stream constructor
989ead6c

Christian Herdtweck authored
2018-01-17 15:00:17 +0100
record_base: offer an OleRecordStream.close
cbbbfa23

Christian Herdtweck authored
2018-01-17 15:00:17 +0100
xls_parser: fixup forgot rename parse-->finish_constructing
ef014417

Christian Herdtweck authored
2018-01-17 15:00:17 +0100
ppt_record_parser: find and decompress embedded ole streams
acfb36b3

Christian Herdtweck authored
2018-01-17 15:00:17 +0100
record_base: rename parse --> finish_constructing, more docu
38418c29

Christian Herdtweck authored
2018-01-17 15:00:17 +0100

ppt records: compensate wrong size in CurrentUserAtom ...

e90e0e5a

This compensates for an inconsistency that is probably just an error in
some ppt versions. The size attribute of the CurrentUserAtom "forgets"
about the optional Unicode user name, which then leaves strange data
behind the record (where nothing should be).

authored

2018-01-17 15:00:16 +0100
