Peter M. Groen / oletools

17 Jan, 2018

33 commits

oleobj: upgrade from optparse to argparse
3f009e76

Christian Herdtweck authored
2018-01-17 15:07:33 +0100

oleobj: encode filenames/paths to unicode

This makes compatibility with py3 easier, but requires us to guess an
encoding. Should work fine for European-generated files, but could produce
strange results for Asian files.

authored

2018-01-17 15:07:33 +0100

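The commit above does not say which encoding is guessed. As a rough illustration only, assuming a strict UTF-8 attempt with a fall-back to the Western European code page cp1252 (both choices, and the helper name `decode_filename`, are assumptions), the conversion could look like this in Python 3:

```python
def decode_filename(raw):
    """Hypothetical helper: turn a filename read from an OLE object into str."""
    if isinstance(raw, str):
        return raw  # already text, nothing to guess
    try:
        # strict UTF-8 rejects most byte sequences produced by legacy code pages
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # guess: Western European ANSI code page; bytes undefined there become
        # U+FFFD, and names from Asian code pages will come out garbled
        return raw.decode('cp1252', errors='replace')
```

The fall-back is exactly where the "strange results from Asian files" come from: a Shift-JIS or GBK name decodes without error but yields the wrong characters.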

oleobj: make pylint and pep8 happier

670d7075

Most changes are just whitespace, line-break or case changes. But:
- this did find an actual error (variable exc was used before creation)
- moved imports up between license and changelog (although I would prefer
them in their original place)
- removed the _ansi_ from read_*_ansi_string
- moved logging constants from main to global scope

authored

2018-01-17 15:07:33 +0100


oleobj: vary shell status code if dumped or not / error occurred
1665aeea
```
Tell the caller of the script roughly what happened during the call.

Also: check whether the given file arguments exist and return a non-zero
exit code if they do not, and remove the print of the non-existent __doc__
```
Christian Herdtweck authored
2018-01-17 15:07:33 +0100
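The commit message does not list the actual codes oleobj returns. A minimal sketch of the idea, with hypothetical constant values and a placeholder `dump_objects` standing in for the real processing (argparse is used here for brevity):

```python
import argparse
import os
import sys

# Hypothetical exit codes; the values oleobj actually uses may differ.
RETURN_NO_DUMP = 0      # ran fine, nothing was dumped
RETURN_DID_DUMP = 1     # ran fine, at least one object was dumped
RETURN_ERR_ARGS = 2     # bad arguments, e.g. a file that does not exist
RETURN_ERR_PROCESS = 3  # an error occurred while processing a file


def dump_objects(path):
    """Placeholder for the real work; returns True if something was dumped."""
    return False


def main(argv=None):
    parser = argparse.ArgumentParser(description='dump embedded objects (sketch)')
    parser.add_argument('files', nargs='+', help='files to process')
    args = parser.parse_args(argv)

    # check all file arguments up front and fail with a non-zero code
    missing = [path for path in args.files if not os.path.isfile(path)]
    if missing:
        for path in missing:
            print('File not found: ' + path, file=sys.stderr)
        return RETURN_ERR_ARGS

    did_dump = False
    for path in args.files:
        try:
            did_dump = dump_objects(path) or did_dump
        except Exception as exc:
            print('Error processing {}: {}'.format(path, exc), file=sys.stderr)
            return RETURN_ERR_PROCESS
    return RETURN_DID_DUMP if did_dump else RETURN_NO_DUMP


if __name__ == '__main__':
    sys.exit(main())
```

A calling script can then distinguish "nothing found" from "something dumped" from "failed" via `$?` without parsing the text output.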
oleobj: remember whether OleNativeStream data is stream/link
7680eb11

Christian Herdtweck authored
2018-01-17 15:07:33 +0100

oleobj: parse and dump from stream

c8a4b6a9

This way we do not have to keep a whole big office file in memory.
(Olefile might do that anyway, but then we have one copy fewer.)

Also merge subfunction process_native_stream back into process_file
(harder to read, but makes more sense for exception handling)

authored

2018-01-17 15:07:33 +0100


oleobj: parse OleNativeStream and OleObject from stream
aa95f26a
```
Can now parse both from a bytes array or a stream
```
Christian Herdtweck authored
2018-01-17 15:07:33 +0100
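One common way to accept both a bytes array and a stream, as the two commits above describe, is to wrap raw bytes in `io.BytesIO` so the parser only ever sees a file-like object. The helper names below are hypothetical, and the single length-prefixed field is only a stand-in for the real OleNativeStream layout:

```python
import io
import struct


def as_stream(data):
    """Hypothetical helper: accept bytes/bytearray or an already-open binary stream."""
    if isinstance(data, (bytes, bytearray)):
        return io.BytesIO(data)
    return data  # assume it already has a read() method


def read_exact(stream, size):
    """Read exactly size bytes or fail loudly on truncated input."""
    chunk = stream.read(size)
    if len(chunk) != size:
        raise ValueError('unexpected end of data')
    return chunk


def parse_sized_blob(data):
    """Parse a little-endian uint32 length followed by that many bytes."""
    stream = as_stream(data)
    (size,) = struct.unpack('<I', read_exact(stream, 4))
    return read_exact(stream, size)
```

Because only `read()` is used, the same code works on an olefile stream, which is what lets the whole office file stay out of memory.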
oleobj: change data parsing to change index rather than data
dad20c2c
```
This is more efficient and simplifies generalization to using byte-streams
instead of byte arrays as data input.
```
Christian Herdtweck authored
2018-01-17 15:07:33 +0100
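The difference between advancing an index and re-slicing the data can be sketched as follows. The helper names are hypothetical; the point is that `data = data[4:]` copies the remainder after every field, whereas returning a new index only moves an integer:

```python
import struct


def read_uint32(data, index):
    """Return the little-endian uint32 at data[index:] and the index just after it."""
    (value,) = struct.unpack_from('<I', data, index)
    return value, index + 4


def read_zero_terminated(data, index):
    """Return the bytes up to the next NUL and the index just after the NUL."""
    end = data.index(b'\x00', index)
    return data[index:end], end + 1


# parse: uint32 size, then a NUL-terminated name; data itself is never copied
data = struct.pack('<I', 7) + b'name.txt\x00' + b'rest'
index = 0
size, index = read_uint32(data, index)
name, index = read_zero_terminated(data, index)
```

Each reader has the shape "(data, position) in, (value, new position) out", which maps directly onto a stream where the position is kept by the stream itself.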

oleobj: generalize "opening" of ole files to allow for other types

2b3f8d3e

This way, oleobj can now handle office 2007+ types (docx, xlsx, pptx, and
derivatives).

Since this adds another loop level into process_file, created a separate
function for the inner-most code part (the actual dumping).

authored

2018-01-17 15:07:30 +0100


oleobj: add options -v and -i for compatibility with ripOLE
cc142ee3

Christian Herdtweck authored
2018-01-17 15:05:16 +0100
xls_parser: fix "wrong" variable name
becb96f7

Christian Herdtweck authored
2018-01-17 15:00:18 +0100
record_base: ensure streams are closed in iter_streams
217d6114

Christian Herdtweck authored
2018-01-17 15:00:18 +0100
ppt_record_parser: pylint, pep8; fix history, add todo
de9f5e91

Christian Herdtweck authored
2018-01-17 15:00:18 +0100

ppt_record_parser: provide OleFileIO from embedded files

79564711

This was not easy to do if we want to avoid having the complete embedded file
in memory in uncompressed form. Had to create a stream around an iterable,
kind of fun :-)

authored

2018-01-17 15:00:18 +0100

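A "stream around an iterable" can be sketched with `io.RawIOBase`: implement `readinto()` by handing out bytes from the current chunk and pulling the next chunk when it runs dry, then wrap the result in `io.BufferedReader`. This is a minimal sketch, not ppt_record_parser's code; the class and generator names are hypothetical, and zlib is assumed as the compression of the embedded data:

```python
import io
import zlib


class IterStream(io.RawIOBase):
    """Hypothetical read-only stream fed by an iterable of bytes chunks."""

    def __init__(self, chunks):
        self._iter = iter(chunks)
        self._leftover = b''

    def readable(self):
        return True

    def readinto(self, buf):
        if len(buf) == 0:
            return 0
        # fetch chunks until we have data (skips empty chunks) or hit the end
        while not self._leftover:
            try:
                self._leftover = next(self._iter)
            except StopIteration:
                return 0  # signals EOF
        n = min(len(buf), len(self._leftover))
        buf[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n


def decompress_chunks(compressed, chunk_size=4096):
    """Yield decompressed data piece by piece instead of all at once."""
    decomp = zlib.decompressobj()
    for start in range(0, len(compressed), chunk_size):
        out = decomp.decompress(compressed[start:start + chunk_size])
        if out:
            yield out
    tail = decomp.flush()
    if tail:
        yield tail


# usage: only the current chunk is held in memory, not the whole payload
payload = b'embedded OLE data ' * 1000
reader = io.BufferedReader(IterStream(decompress_chunks(zlib.compress(payload), 64)))
first = reader.read(8)
```

Note that OleFileIO needs to seek within its input, so in practice the stream may still have to be buffered or made seekable before it is handed over; this sketch only covers the forward-reading part.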

record_base: simplify bugfixing by offering more verbosity
a93b2109

Christian Herdtweck authored
2018-01-17 15:00:18 +0100
unittest: create tests for ppt_record_parser.is_ppt
faeb2aed

Christian Herdtweck authored
2018-01-17 15:00:18 +0100
ppt_record_parser: create function is_ppt
5609051f

Christian Herdtweck authored
2018-01-17 15:00:17 +0100
ppt_record_parser: move constants to top of file
8dc4854d

Christian Herdtweck authored
2018-01-17 15:00:17 +0100
record_base: make pylint and pep8 happier
8be66d11

Christian Herdtweck authored
2018-01-17 15:00:17 +0100
record_base: provide stream type constants from olefile
97035144

Christian Herdtweck authored
2018-01-17 15:00:17 +0100
xls_parser: close stream after xlsb-parsing; update stream constructor
989ead6c

Christian Herdtweck authored
2018-01-17 15:00:17 +0100
record_base: offer a OleRecordStream.close
cbbbfa23

Christian Herdtweck authored
2018-01-17 15:00:17 +0100
xls_parser: fixup forgot rename parse-->finish_constructing
ef014417

Christian Herdtweck authored
2018-01-17 15:00:17 +0100
ppt_record_parser: find and decompress embedded ole streams
acfb36b3

Christian Herdtweck authored
2018-01-17 15:00:17 +0100
record_base: rename parse --> finish_constructing, more docu
38418c29

Christian Herdtweck authored
2018-01-17 15:00:17 +0100

ppt records: compensate wrong size in CurrentUserAtom

e90e0e5a

This compensates for an inconsistency that is probably just an error in
some ppt versions: the size attribute of the CurrentUserAtom "forgets"
about the optional unicode user name, which then creates strange data
behind the record (where nothing should be).

authored

2018-01-17 15:00:16 +0100


record_base: make compatible with container substreams
470d0806

Christian Herdtweck authored
2018-01-17 15:00:16 +0100
record_base: ignore [Document]SummaryInformation streams
5c9b328c

Christian Herdtweck authored
2018-01-17 15:00:16 +0100
xls_parser: rename type to rec_type to make pylint happier
97990227

Christian Herdtweck authored
2018-01-17 15:00:16 +0100
msodde: use new method name (get-->iter)_streams
f1c708ac

Christian Herdtweck authored
2018-01-17 15:00:16 +0100

ppt_parser: create new alternative based on records

730c5088

So far, the ppt_parser is rather stupid: it does not understand the structure
of the streams but just looks for a certain byte sequence anywhere in the
stream (search_* methods).

There was another attempt to understand and parse the stream structure,
but that failed (parse_* methods).

Encouraged by xls_parser, which also parses the data as a series of
records, tried the same with ppt files, and it works nicely so far. Might
be able to replace ppt_parser soon.

authored

2018-01-17 15:00:16 +0100


ooxml: implement skipping data in ZipSubFile
3781f711

Christian Herdtweck authored
2018-01-17 15:00:16 +0100
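A deflate-compressed zip member cannot jump to an arbitrary offset without decompressing everything before it, so a forward "seek" on such a file object is typically emulated by reading and discarding in bounded chunks. The sketch below shows that technique under this assumption; `skip_forward` is a hypothetical name, not ZipSubFile's method:

```python
import io


def skip_forward(handle, n_bytes, chunk_size=4096):
    """Hypothetical helper: advance a forward-only stream by reading and discarding.

    Returns the number of bytes actually skipped (fewer if EOF came first).
    """
    skipped = 0
    while skipped < n_bytes:
        # never read more than chunk_size at once to keep memory use bounded
        chunk = handle.read(min(chunk_size, n_bytes - skipped))
        if not chunk:
            break  # end of data reached early
        skipped += len(chunk)
    return skipped


# usage on an in-memory stand-in for a zip member
data = io.BytesIO(b'0123456789')
skipped = skip_forward(data, 4, chunk_size=3)
```

Returning the count rather than raising lets the caller decide whether hitting the end early is an error.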
xls_parser: move code to new record_base for re-use with ppt files
d397edb5
```
Parsing through records seems to make sense. Try to repeat the same with
ppt files next. To avoid copy-and-paste, move code used by both to the
common base record_base.py
```
Christian Herdtweck authored
2018-01-17 15:00:16 +0100
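Both formats store a stream as a sequence of type/length records, which is what makes a shared base plausible: only the header layout differs. A rough sketch under that assumption (function and constant names are hypothetical, not record_base's API). The header layouts follow the BIFF record header (2-byte type, 2-byte size) and the PowerPoint RecordHeader (2 bytes of version/instance, 2-byte type, 4-byte length), both little-endian:

```python
import io
import struct


def iter_records(stream, header_fmt, split_header):
    """Yield (record type, payload) pairs from a stream of type-length records.

    header_fmt:   struct format of the fixed-size record header
    split_header: maps the unpacked header tuple to (rec_type, payload_size)
    """
    header_size = struct.calcsize(header_fmt)
    while True:
        header = stream.read(header_size)
        if len(header) < header_size:
            return  # end of stream (or trailing bytes shorter than a header)
        rec_type, size = split_header(struct.unpack(header_fmt, header))
        payload = stream.read(size)
        if len(payload) < size:
            raise ValueError('record of type {:#x} is truncated'.format(rec_type))
        yield rec_type, payload


# BIFF (xls): record type, payload size
XLS_HEADER = ('<HH', lambda h: (h[0], h[1]))
# PowerPoint: version/instance word (ignored here), record type, payload length
PPT_HEADER = ('<HHI', lambda h: (h[1], h[2]))
```

A wrong size field, as in the CurrentUserAtom case above, throws such a loop off for every following record, which is why it needs special handling.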

11 Jan, 2018

1 commit

fixed issue #242 (apply unquote to fldSimple tags)
27dc5360

decalage2 authored
2018-01-11 11:43:26 +0100

09 Jan, 2018

1 commit

Merge pull request #241 from christian-intra2net/dde-in-csv
95ca88d2
```
Dde in csv
```
Philippe Lagadec authored
2018-01-09 23:03:14 +0100

05 Jan, 2018

5 commits

msodde: update doc, history and version
874a5105

Christian Herdtweck authored
2018-01-05 10:48:55 +0100
unittests: make pylint and pep8 a bit happier
3977c68c
```
They actually found a few \ in strings I had overlooked
```
Christian Herdtweck authored
2018-01-05 10:44:10 +0100
unittest: add simple csv file and test it
59a85138

Christian Herdtweck authored
2018-01-05 10:27:38 +0100
unittest: add .csv to list of files to be ignored
4ac29b53
```
Replace #print(...) with DEBUG_FLAG and conditional print(...)
```
Christian Herdtweck authored
2018-01-05 10:27:38 +0100
msodde: Wrap sys.stdout into unicode-encoder only in py2
63546685
```
This is not necessary in python3
```
Christian Herdtweck authored
2018-01-05 10:27:38 +0100
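The usual pattern behind a py2-only wrapper is a version check around `codecs.getwriter`. A minimal sketch, assuming UTF-8 as the target encoding and using a hypothetical function name (msodde's actual code may differ):

```python
import codecs
import sys


def wrap_stdout_for_unicode(stream=None):
    """Return a stream that accepts unicode text, wrapping only on Python 2."""
    if stream is None:
        stream = sys.stdout
    if sys.version_info[0] >= 3:
        return stream  # py3 text streams already encode str themselves
    # Python 2: sys.stdout expects byte strings, so encode unicode as UTF-8
    return codecs.getwriter('utf-8')(stream)
```

On Python 3, wrapping anyway would be harmful: the codecs writer emits bytes, which a text-mode `sys.stdout` rejects with a TypeError.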