Commit 0eba3885306add74959e251612e285bd523aabc4
1 parent
e8b75aad
updated OleFileIO_PL to v0.26
Showing
4 changed files
with
487 additions
and
90 deletions
oletools/oleid.py
| ... | ... | @@ -47,6 +47,7 @@ __version__ = '0.01' |
| 47 | 47 | |
| 48 | 48 | #------------------------------------------------------------------------------ |
| 49 | 49 | # TODO: |
| 50 | +# + extract relevant metadata: codepage, author, application, timestamps, etc | |
| 50 | 51 | # - detect RTF and OpenXML |
| 51 | 52 | # - fragmentation |
| 52 | 53 | # - OLE package | ... | ... |
oletools/thirdparty/OleFileIO_PL/LICENSE.txt
| ... | ... | @@ -4,7 +4,7 @@ LICENSE for the OleFileIO_PL module: |
| 4 | 4 | OleFileIO_PL is an improved version of the OleFileIO module from the |
| 5 | 5 | Python Imaging Library (PIL). |
| 6 | 6 | |
| 7 | -OleFileIO_PL changes are Copyright (c) 2005-2012 by Philippe Lagadec | |
| 7 | +OleFileIO_PL changes are Copyright (c) 2005-2013 by Philippe Lagadec | |
| 8 | 8 | |
| 9 | 9 | The Python Imaging Library (PIL) is |
| 10 | 10 | Copyright (c) 1997-2005 by Secret Labs AB | ... | ... |
oletools/thirdparty/OleFileIO_PL/OleFileIO_PL.py
| ... | ... | @@ -6,7 +6,7 @@ OleFileIO_PL: |
| 6 | 6 | Microsoft Compound Document File Format), such as Microsoft Office |
| 7 | 7 | documents, Image Composer and FlashPix files, Outlook messages, ... |
| 8 | 8 | |
| 9 | -version 0.24 2012-09-18 Philippe Lagadec - http://www.decalage.info | |
| 9 | +version 0.26 2013-07-24 Philippe Lagadec - http://www.decalage.info | |
| 10 | 10 | |
| 11 | 11 | Project website: http://www.decalage.info/python/olefileio |
| 12 | 12 | |
| ... | ... | @@ -16,23 +16,23 @@ See: http://www.pythonware.com/products/pil/index.htm |
| 16 | 16 | The Python Imaging Library (PIL) is |
| 17 | 17 | Copyright (c) 1997-2005 by Secret Labs AB |
| 18 | 18 | Copyright (c) 1995-2005 by Fredrik Lundh |
| 19 | -OleFileIO_PL changes are Copyright (c) 2005-2012 by Philippe Lagadec | |
| 19 | +OleFileIO_PL changes are Copyright (c) 2005-2013 by Philippe Lagadec | |
| 20 | 20 | |
| 21 | 21 | See source code and LICENSE.txt for information on usage and redistribution. |
| 22 | 22 | |
| 23 | 23 | WARNING: THIS IS (STILL) WORK IN PROGRESS. |
| 24 | 24 | """ |
| 25 | 25 | |
| 26 | -__author__ = "Fredrik Lundh (Secret Labs AB), Philippe Lagadec" | |
| 27 | -__date__ = "2012-09-18" | |
| 28 | -__version__ = '0.24' | |
| 26 | +__author__ = "Philippe Lagadec, Fredrik Lundh (Secret Labs AB)" | |
| 27 | +__date__ = "2013-07-24" | |
| 28 | +__version__ = '0.26' | |
| 29 | 29 | |
| 30 | 30 | #--- LICENSE ------------------------------------------------------------------ |
| 31 | 31 | |
| 32 | 32 | # OleFileIO_PL is an improved version of the OleFileIO module from the |
| 33 | 33 | # Python Imaging Library (PIL). |
| 34 | 34 | |
| 35 | -# OleFileIO_PL changes are Copyright (c) 2005-2012 by Philippe Lagadec | |
| 35 | +# OleFileIO_PL changes are Copyright (c) 2005-2013 by Philippe Lagadec | |
| 36 | 36 | # |
| 37 | 37 | # The Python Imaging Library (PIL) is |
| 38 | 38 | # Copyright (c) 1997-2005 by Secret Labs AB |
| ... | ... | @@ -110,7 +110,27 @@ __version__ = '0.24' |
| 110 | 110 | # (https://bitbucket.org/decalage/olefileio_pl/issue/7) |
| 111 | 111 | # - added close method to OleFileIO (fixed issue #2) |
| 112 | 112 | # 2012-07-25 v0.23 PL: - added support for file-like objects (patch by mete0r_kr) |
| 113 | - | |
| 113 | +# 2013-05-05 v0.24 PL: - getproperties: added conversion from filetime to python | |
| 114 | +# datetime | |
| 115 | +# - main: displays properties with date format | |
| 116 | +# - new class OleMetadata to parse standard properties | |
| 117 | +# - added get_metadata method | |
| 118 | +# 2013-05-07 v0.24 PL: - a few improvements in OleMetadata | |
| 119 | +# 2013-05-24 v0.25 PL: - getproperties: option to not convert some timestamps | |
| 120 | +# - OleMetaData: total_edit_time is now a number of seconds, | |
| 121 | +# not a timestamp | |
| 122 | +# - getproperties: added support for VT_BOOL, VT_INT, V_UINT | |
| 123 | +# - getproperties: filter out null chars from strings | |
| 124 | +# - getproperties: raise non-fatal defects instead of | |
| 125 | +# exceptions when properties cannot be parsed properly | |
| 126 | +# 2013-05-27 PL: - getproperties: improved exception handling | |
| 127 | +# - _raise_defect: added option to set exception type | |
| 128 | +# - all non-fatal issues are now recorded, and displayed | |
| 129 | +# when run as a script | |
| 130 | +# 2013-07-11 v0.26 PL: - added methods to get modification and creation times | |
| 131 | +# of a directory entry or a storage/stream | |
| 132 | +# - fixed parsing of direntry timestamps | |
| 133 | +# 2013-07-24 PL: - new options in listdir to list storages and/or streams | |
| 114 | 134 | |
| 115 | 135 | #----------------------------------------------------------------------------- |
| 116 | 136 | # TODO (for version 1.0): |
| ... | ... | @@ -132,11 +152,10 @@ __version__ = '0.24' |
| 132 | 152 | # - improve docstrings to show more sample uses |
| 133 | 153 | # - see also original notes and FIXME below |
| 134 | 154 | # - remove all obsolete FIXMEs |
| 155 | +# - OleMetadata: fix version attrib according to | |
| 156 | +# http://msdn.microsoft.com/en-us/library/dd945671%28v=office.12%29.aspx | |
| 135 | 157 | |
| 136 | 158 | # IDEAS: |
| 137 | -# - allow _raise_defect to raise different exceptions, not only IOError | |
| 138 | -# - provide a class with named attributes to get well-known properties of | |
| 139 | -# MS Office documents (title, author, ...) ? | |
| 140 | 159 | # - in OleFileIO._open and _OleStream, use size=None instead of 0x7FFFFFFF for |
| 141 | 160 | # streams with unknown size |
| 142 | 161 | # - use arrays of int instead of long integers for FAT/MiniFAT, to improve |
| ... | ... | @@ -199,7 +218,7 @@ __version__ = '0.24' |
| 199 | 218 | |
| 200 | 219 | #------------------------------------------------------------------------------ |
| 201 | 220 | |
| 202 | -import string, StringIO, struct, array, os.path, sys | |
| 221 | +import string, StringIO, struct, array, os.path, sys, datetime | |
| 203 | 222 | |
| 204 | 223 | #[PL] Define explicitly the public API to avoid private objects in pydoc: |
| 205 | 224 | __all__ = ['OleFileIO', 'isOleFile'] |
| ... | ... | @@ -421,10 +440,172 @@ except NameError: |
| 421 | 440 | return filter(ord, s) |
| 422 | 441 | |
| 423 | 442 | |
| 443 | +def filetime2datetime(filetime): | |
| 444 | + """ | |
| 445 | + convert FILETIME (64 bits int) to Python datetime.datetime | |
| 446 | + """ | |
| 447 | + # TODO: manage exception when microseconds is too large | |
| 448 | + # inspired from http://code.activestate.com/recipes/511425-filetime-to-datetime/ | |
| 449 | + _FILETIME_null_date = datetime.datetime(1601, 1, 1, 0, 0, 0) | |
| 450 | + #debug('timedelta days=%d' % (filetime/(10*1000000*3600*24))) | |
| 451 | + return _FILETIME_null_date + datetime.timedelta(microseconds=filetime/10) | |
| 452 | + | |
| 424 | 453 | |
| 425 | 454 | |
| 426 | 455 | #=== CLASSES ================================================================== |
| 427 | 456 | |
| 457 | +class OleMetadata: | |
| 458 | + """ | |
| 459 | + class to parse and store metadata from standard properties of OLE files. | |
| 460 | + | |
| 461 | + Available attributes: | |
| 462 | + codepage, title, subject, author, keywords, comments, template, | |
| 463 | + last_saved_by, revision_number, total_edit_time, last_printed, create_time, | |
| 464 | + last_saved_time, num_pages, num_words, num_chars, thumbnail, | |
| 465 | + creating_application, security, codepage_doc, category, presentation_target, | |
| 466 | + bytes, lines, paragraphs, slides, notes, hidden_slides, mm_clips, | |
| 467 | + scale_crop, heading_pairs, titles_of_parts, manager, company, links_dirty, | |
| 468 | + chars_with_spaces, unused, shared_doc, link_base, hlinks, hlinks_changed, | |
| 469 | + version, dig_sig, content_type, content_status, language, doc_version | |
| 470 | + | |
| 471 | + Note: an attribute is set to None when not present in the properties of the | |
| 472 | + OLE file. | |
| 473 | + | |
| 474 | + References for SummaryInformation stream: | |
| 475 | + - http://msdn.microsoft.com/en-us/library/dd942545.aspx | |
| 476 | + - http://msdn.microsoft.com/en-us/library/dd925819%28v=office.12%29.aspx | |
| 477 | + - http://msdn.microsoft.com/en-us/library/windows/desktop/aa380376%28v=vs.85%29.aspx | |
| 478 | + - http://msdn.microsoft.com/en-us/library/aa372045.aspx | |
| 479 | + - http://sedna-soft.de/summary-information-stream/ | |
| 480 | + - http://poi.apache.org/apidocs/org/apache/poi/hpsf/SummaryInformation.html | |
| 481 | + | |
| 482 | + References for DocumentSummaryInformation stream: | |
| 483 | + - http://msdn.microsoft.com/en-us/library/dd945671%28v=office.12%29.aspx | |
| 484 | + - http://msdn.microsoft.com/en-us/library/windows/desktop/aa380374%28v=vs.85%29.aspx | |
| 485 | + - http://poi.apache.org/apidocs/org/apache/poi/hpsf/DocumentSummaryInformation.html | |
| 486 | + | |
| 487 | + new in version 0.25 | |
| 488 | + """ | |
| 489 | + | |
| 490 | + # attribute names for SummaryInformation stream properties: | |
| 491 | + # (ordered by property id, starting at 1) | |
| 492 | + SUMMARY_ATTRIBS = ['codepage', 'title', 'subject', 'author', 'keywords', 'comments', | |
| 493 | + 'template', 'last_saved_by', 'revision_number', 'total_edit_time', | |
| 494 | + 'last_printed', 'create_time', 'last_saved_time', 'num_pages', | |
| 495 | + 'num_words', 'num_chars', 'thumbnail', 'creating_application', | |
| 496 | + 'security'] | |
| 497 | + | |
| 498 | + # attribute names for DocumentSummaryInformation stream properties: | |
| 499 | + # (ordered by property id, starting at 1) | |
| 500 | + DOCSUM_ATTRIBS = ['codepage_doc', 'category', 'presentation_target', 'bytes', 'lines', 'paragraphs', | |
| 501 | + 'slides', 'notes', 'hidden_slides', 'mm_clips', | |
| 502 | + 'scale_crop', 'heading_pairs', 'titles_of_parts', 'manager', | |
| 503 | + 'company', 'links_dirty', 'chars_with_spaces', 'unused', 'shared_doc', | |
| 504 | + 'link_base', 'hlinks', 'hlinks_changed', 'version', 'dig_sig', | |
| 505 | + 'content_type', 'content_status', 'language', 'doc_version'] | |
| 506 | + | |
| 507 | + def __init__(self): | |
| 508 | + """ | |
| 509 | + Constructor for OleMetadata | |
| 510 | + All attributes are set to None by default | |
| 511 | + """ | |
| 512 | + # properties from SummaryInformation stream | |
| 513 | + self.codepage = None | |
| 514 | + self.title = None | |
| 515 | + self.subject = None | |
| 516 | + self.author = None | |
| 517 | + self.keywords = None | |
| 518 | + self.comments = None | |
| 519 | + self.template = None | |
| 520 | + self.last_saved_by = None | |
| 521 | + self.revision_number = None | |
| 522 | + self.total_edit_time = None | |
| 523 | + self.last_printed = None | |
| 524 | + self.create_time = None | |
| 525 | + self.last_saved_time = None | |
| 526 | + self.num_pages = None | |
| 527 | + self.num_words = None | |
| 528 | + self.num_chars = None | |
| 529 | + self.thumbnail = None | |
| 530 | + self.creating_application = None | |
| 531 | + self.security = None | |
| 532 | + # properties from DocumentSummaryInformation stream | |
| 533 | + self.codepage_doc = None | |
| 534 | + self.category = None | |
| 535 | + self.presentation_target = None | |
| 536 | + self.bytes = None | |
| 537 | + self.lines = None | |
| 538 | + self.paragraphs = None | |
| 539 | + self.slides = None | |
| 540 | + self.notes = None | |
| 541 | + self.hidden_slides = None | |
| 542 | + self.mm_clips = None | |
| 543 | + self.scale_crop = None | |
| 544 | + self.heading_pairs = None | |
| 545 | + self.titles_of_parts = None | |
| 546 | + self.manager = None | |
| 547 | + self.company = None | |
| 548 | + self.links_dirty = None | |
| 549 | + self.chars_with_spaces = None | |
| 550 | + self.unused = None | |
| 551 | + self.shared_doc = None | |
| 552 | + self.link_base = None | |
| 553 | + self.hlinks = None | |
| 554 | + self.hlinks_changed = None | |
| 555 | + self.version = None | |
| 556 | + self.dig_sig = None | |
| 557 | + self.content_type = None | |
| 558 | + self.content_status = None | |
| 559 | + self.language = None | |
| 560 | + self.doc_version = None | |
| 561 | + | |
| 562 | + | |
| 563 | + def parse_properties(self, olefile): | |
| 564 | + """ | |
| 565 | + Parse standard properties of an OLE file, from the streams | |
| 566 | + "\x05SummaryInformation" and "\x05DocumentSummaryInformation", | |
| 567 | + if present. | |
| 568 | + Properties are converted to strings, integers or python datetime objects. | |
| 569 | + If a property is not present, its value is set to None. | |
| 570 | + """ | |
| 571 | + # first set all attributes to None: | |
| 572 | + for attrib in (self.SUMMARY_ATTRIBS + self.DOCSUM_ATTRIBS): | |
| 573 | + setattr(self, attrib, None) | |
| 574 | + if olefile.exists("\x05SummaryInformation"): | |
| 575 | + # get properties from the stream: | |
| 576 | + # (converting timestamps to python datetime, except total_edit_time, | |
| 577 | + # which is property #10) | |
| 578 | + props = olefile.getproperties("\x05SummaryInformation", | |
| 579 | + convert_time=True, no_conversion=[10]) | |
| 580 | + # store them into this object's attributes: | |
| 581 | + for i in range(len(self.SUMMARY_ATTRIBS)): | |
| 582 | + # ids for standards properties start at 0x01, until 0x13 | |
| 583 | + value = props.get(i+1, None) | |
| 584 | + setattr(self, self.SUMMARY_ATTRIBS[i], value) | |
| 585 | + if olefile.exists("\x05DocumentSummaryInformation"): | |
| 586 | + # get properties from the stream: | |
| 587 | + props = olefile.getproperties("\x05DocumentSummaryInformation", | |
| 588 | + convert_time=True) | |
| 589 | + # store them into this object's attributes: | |
| 590 | + for i in range(len(self.DOCSUM_ATTRIBS)): | |
| 591 | + # ids for standards properties start at 0x01, until 0x13 | |
| 592 | + value = props.get(i+1, None) | |
| 593 | + setattr(self, self.DOCSUM_ATTRIBS[i], value) | |
| 594 | + | |
| 595 | + def dump(self): | |
| 596 | + """ | |
| 597 | + Dump all metadata, for debugging purposes. | |
| 598 | + """ | |
| 599 | + print 'Properties from SummaryInformation stream:' | |
| 600 | + for prop in self.SUMMARY_ATTRIBS: | |
| 601 | + value = getattr(self, prop) | |
| 602 | + print '- %s: %s' % (prop, repr(value)) | |
| 603 | + print 'Properties from DocumentSummaryInformation stream:' | |
| 604 | + for prop in self.DOCSUM_ATTRIBS: | |
| 605 | + value = getattr(self, prop) | |
| 606 | + print '- %s: %s' % (prop, repr(value)) | |
| 607 | + | |
| 608 | + | |
| 428 | 609 | #--- _OleStream --------------------------------------------------------------- |
| 429 | 610 | |
| 430 | 611 | class _OleStream(StringIO.StringIO): |
| ... | ... | @@ -566,7 +747,8 @@ class _OleDirectoryEntry: |
| 566 | 747 | #[PL] parsing code moved from OleFileIO.loaddirectory |
| 567 | 748 | |
| 568 | 749 | # struct to parse directory entries: |
| 569 | - # <: little-endian byte order | |
| 750 | + # <: little-endian byte order, standard sizes | |
| 751 | + # (note: this should guarantee that Q returns a 64 bits int) | |
| 570 | 752 | # 64s: string containing entry name in unicode (max 31 chars) + null char |
| 571 | 753 | # H: uint16, number of bytes used in name buffer, including null = (len+1)*2 |
| 572 | 754 | # B: uint8, dir entry type (between 0 and 5) |
| ... | ... | @@ -576,13 +758,13 @@ class _OleDirectoryEntry: |
| 576 | 758 | # I: uint32, index of child root node if it is a storage, else NOSTREAM |
| 577 | 759 | # 16s: CLSID, unique identifier (only used if it is a storage) |
| 578 | 760 | # I: uint32, user flags |
| 579 | - # 8s: uint64, creation timestamp or zero | |
| 580 | - # 8s: uint64, modification timestamp or zero | |
| 761 | + # Q (was 8s): uint64, creation timestamp or zero | |
| 762 | + # Q (was 8s): uint64, modification timestamp or zero | |
| 581 | 763 | # I: uint32, SID of first sector if stream or ministream, SID of 1st sector |
| 582 | 764 | # of stream containing ministreams if root entry, 0 otherwise |
| 583 | 765 | # I: uint32, total stream size in bytes if stream (low 32 bits), 0 otherwise |
| 584 | 766 | # I: uint32, total stream size in bytes if stream (high 32 bits), 0 otherwise |
| 585 | - STRUCT_DIRENTRY = '<64sHBBIII16sI8s8sIII' | |
| 767 | + STRUCT_DIRENTRY = '<64sHBBIII16sIQQIII' | |
| 586 | 768 | # size of a directory entry: 128 bytes |
| 587 | 769 | DIRENTRY_SIZE = 128 |
| 588 | 770 | assert struct.calcsize(STRUCT_DIRENTRY) == DIRENTRY_SIZE |
| ... | ... | @@ -772,6 +954,34 @@ class _OleDirectoryEntry: |
| 772 | 954 | kid.dump(tab + 2) |
| 773 | 955 | |
| 774 | 956 | |
| 957 | + def getmtime(self): | |
| 958 | + """ | |
| 959 | + Return modification time of a directory entry. | |
| 960 | + | |
| 961 | + return: None if modification time is null, a python datetime object | |
| 962 | + otherwise (UTC timezone) | |
| 963 | + | |
| 964 | + new in version 0.26 | |
| 965 | + """ | |
| 966 | + if self.modifyTime == 0: | |
| 967 | + return None | |
| 968 | + return filetime2datetime(self.modifyTime) | |
| 969 | + | |
| 970 | + | |
| 971 | + def getctime(self): | |
| 972 | + """ | |
| 973 | + Return creation time of a directory entry. | |
| 974 | + | |
| 975 | + return: None if modification time is null, a python datetime object | |
| 976 | + otherwise (UTC timezone) | |
| 977 | + | |
| 978 | + new in version 0.26 | |
| 979 | + """ | |
| 980 | + if self.createTime == 0: | |
| 981 | + return None | |
| 982 | + return filetime2datetime(self.createTime) | |
| 983 | + | |
| 984 | + | |
| 775 | 985 | #--- OleFileIO ---------------------------------------------------------------- |
| 776 | 986 | |
| 777 | 987 | class OleFileIO: |
| ... | ... | @@ -812,12 +1022,16 @@ class OleFileIO: |
| 812 | 1022 | (use DEFECT_FATAL for a typical application, DEFECT_INCORRECT for a |
| 813 | 1023 | security-oriented application, see source code for details) |
| 814 | 1024 | """ |
| 1025 | + # minimal level for defects to be raised as exceptions: | |
| 815 | 1026 | self._raise_defects_level = raise_defects |
| 1027 | + # list of defects/issues not raised as exceptions: | |
| 1028 | + # tuples of (exception type, message) | |
| 1029 | + self.parsing_issues = [] | |
| 816 | 1030 | if filename: |
| 817 | 1031 | self.open(filename) |
| 818 | 1032 | |
| 819 | 1033 | |
| 820 | - def _raise_defect(self, defect_level, message): | |
| 1034 | + def _raise_defect(self, defect_level, message, exception_type=IOError): | |
| 821 | 1035 | """ |
| 822 | 1036 | This method should be called for any defect found during file parsing. |
| 823 | 1037 | It may raise an IOError exception according to the minimal level chosen |
| ... | ... | @@ -829,10 +1043,14 @@ class OleFileIO: |
| 829 | 1043 | DEFECT_INCORRECT : an error according to specifications, but parsing can go on |
| 830 | 1044 | DEFECT_FATAL : an error which cannot be ignored, parsing is impossible |
| 831 | 1045 | message: string describing the defect, used with raised exception. |
| 1046 | + exception_type: exception class to be raised, IOError by default | |
| 832 | 1047 | """ |
| 833 | 1048 | # added by [PL] |
| 834 | 1049 | if defect_level >= self._raise_defects_level: |
| 835 | - raise IOError, message | |
| 1050 | + raise exception_type, message | |
| 1051 | + else: | |
| 1052 | + # just record the issue, no exception raised: | |
| 1053 | + self.parsing_issues.append((exception_type, message)) | |
| 836 | 1054 | |
| 837 | 1055 | |
| 838 | 1056 | def open(self, filename): |
| ... | ... | @@ -1378,27 +1596,42 @@ class OleFileIO: |
| 1378 | 1596 | self.sectorsize, self.fat, self._filesize) |
| 1379 | 1597 | |
| 1380 | 1598 | |
| 1381 | - def _list(self, files, prefix, node): | |
| 1599 | + def _list(self, files, prefix, node, streams=True, storages=False): | |
| 1382 | 1600 | """ |
| 1383 | 1601 | (listdir helper) |
| 1384 | 1602 | files: list of files to fill in |
| 1385 | 1603 | prefix: current location in storage tree (list of names) |
| 1386 | 1604 | node: current node (_OleDirectoryEntry object) |
| 1605 | + streams: bool, include streams if True (True by default) - new in v0.26 | |
| 1606 | + storages: bool, include storages if True (False by default) - new in v0.26 | |
| 1607 | + (note: the root storage is never included) | |
| 1387 | 1608 | """ |
| 1388 | 1609 | prefix = prefix + [node.name] |
| 1389 | 1610 | for entry in node.kids: |
| 1390 | 1611 | if entry.kids: |
| 1391 | - self._list(files, prefix, entry) | |
| 1612 | + # this is a storage | |
| 1613 | + if storages: | |
| 1614 | + # add it to the list | |
| 1615 | + files.append(prefix[1:] + [entry.name]) | |
| 1616 | + # check its kids | |
| 1617 | + self._list(files, prefix, entry, streams, storages) | |
| 1392 | 1618 | else: |
| 1393 | - files.append(prefix[1:] + [entry.name]) | |
| 1619 | + # this is a stream | |
| 1620 | + if streams: | |
| 1621 | + # add it to the list | |
| 1622 | + files.append(prefix[1:] + [entry.name]) | |
| 1394 | 1623 | |
| 1395 | 1624 | |
| 1396 | - def listdir(self): | |
| 1625 | + def listdir(self, streams=True, storages=False): | |
| 1397 | 1626 | """ |
| 1398 | 1627 | Return a list of streams stored in this file |
| 1628 | + | |
| 1629 | + streams: bool, include streams if True (True by default) - new in v0.26 | |
| 1630 | + storages: bool, include storages if True (False by default) - new in v0.26 | |
| 1631 | + (note: the root storage is never included) | |
| 1399 | 1632 | """ |
| 1400 | 1633 | files = [] |
| 1401 | - self._list(files, [], self.root) | |
| 1634 | + self._list(files, [], self.root, streams, storages) | |
| 1402 | 1635 | return files |
| 1403 | 1636 | |
| 1404 | 1637 | |
| ... | ... | @@ -1470,6 +1703,38 @@ class OleFileIO: |
| 1470 | 1703 | return False |
| 1471 | 1704 | |
| 1472 | 1705 | |
| 1706 | + def getmtime(self, filename): | |
| 1707 | + """ | |
| 1708 | + Return modification time of a stream/storage. | |
| 1709 | + | |
| 1710 | + filename: path of stream/storage in storage tree. (see openstream for | |
| 1711 | + syntax) | |
| 1712 | + return: None if modification time is null, a python datetime object | |
| 1713 | + otherwise (UTC timezone) | |
| 1714 | + | |
| 1715 | + new in version 0.26 | |
| 1716 | + """ | |
| 1717 | + sid = self._find(filename) | |
| 1718 | + entry = self.direntries[sid] | |
| 1719 | + return entry.getmtime() | |
| 1720 | + | |
| 1721 | + | |
| 1722 | + def getctime(self, filename): | |
| 1723 | + """ | |
| 1724 | + Return creation time of a stream/storage. | |
| 1725 | + | |
| 1726 | + filename: path of stream/storage in storage tree. (see openstream for | |
| 1727 | + syntax) | |
| 1728 | + return: None if creation time is null, a python datetime object | |
| 1729 | + otherwise (UTC timezone) | |
| 1730 | + | |
| 1731 | + new in version 0.26 | |
| 1732 | + """ | |
| 1733 | + sid = self._find(filename) | |
| 1734 | + entry = self.direntries[sid] | |
| 1735 | + return entry.getctime() | |
| 1736 | + | |
| 1737 | + | |
| 1473 | 1738 | def exists(self, filename): |
| 1474 | 1739 | """ |
| 1475 | 1740 | Test if given filename exists as a stream or a storage in the OLE |
| ... | ... | @@ -1509,83 +1774,167 @@ class OleFileIO: |
| 1509 | 1774 | return self.root.name |
| 1510 | 1775 | |
| 1511 | 1776 | |
| 1512 | - def getproperties(self, filename): | |
| 1777 | + def getproperties(self, filename, convert_time=False, no_conversion=None): | |
| 1513 | 1778 | """ |
| 1514 | 1779 | Return properties described in substream. |
| 1515 | 1780 | |
| 1516 | 1781 | filename: path of stream in storage tree (see openstream for syntax) |
| 1782 | + convert_time: bool, if True timestamps will be converted to Python datetime | |
| 1783 | + no_conversion: None or list of int, timestamps not to be converted | |
| 1784 | + (for example total editing time is not a real timestamp) | |
| 1517 | 1785 | return: a dictionary of values indexed by id (integer) |
| 1518 | 1786 | """ |
| 1787 | + # make sure no_conversion is a list, just to simplify code below: | |
| 1788 | + if no_conversion == None: | |
| 1789 | + no_conversion = [] | |
| 1790 | + # stream path as a string to report exceptions: | |
| 1791 | + streampath = filename | |
| 1792 | + if not isinstance(streampath, str): | |
| 1793 | + streampath = '/'.join(streampath) | |
| 1794 | + | |
| 1519 | 1795 | fp = self.openstream(filename) |
| 1520 | 1796 | |
| 1521 | 1797 | data = {} |
| 1522 | 1798 | |
| 1523 | - # header | |
| 1524 | - s = fp.read(28) | |
| 1525 | - clsid = _clsid(s[8:24]) | |
| 1526 | - | |
| 1527 | - # format id | |
| 1528 | - s = fp.read(20) | |
| 1529 | - fmtid = _clsid(s[:16]) | |
| 1530 | - fp.seek(i32(s, 16)) | |
| 1531 | - | |
| 1532 | - # get section | |
| 1533 | - s = "****" + fp.read(i32(fp.read(4))-4) | |
| 1534 | - | |
| 1535 | - for i in range(i32(s, 4)): | |
| 1536 | - | |
| 1537 | - id = i32(s, 8+i*8) | |
| 1538 | - offset = i32(s, 12+i*8) | |
| 1539 | - type = i32(s, offset) | |
| 1540 | - | |
| 1541 | - debug ('property id=%d: type=%d offset=%X' % (id, type, offset)) | |
| 1542 | - | |
| 1543 | - # test for common types first (should perhaps use | |
| 1544 | - # a dictionary instead?) | |
| 1545 | - | |
| 1546 | - if type == VT_I2: | |
| 1547 | - value = i16(s, offset+4) | |
| 1548 | - if value >= 32768: | |
| 1549 | - value = value - 65536 | |
| 1550 | - elif type == VT_UI2: | |
| 1551 | - value = i16(s, offset+4) | |
| 1552 | - elif type in (VT_I4, VT_ERROR): | |
| 1553 | - value = i32(s, offset+4) | |
| 1554 | - elif type == VT_UI4: | |
| 1555 | - value = i32(s, offset+4) # FIXME | |
| 1556 | - elif type in (VT_BSTR, VT_LPSTR): | |
| 1557 | - count = i32(s, offset+4) | |
| 1558 | - value = s[offset+8:offset+8+count-1] | |
| 1559 | - elif type == VT_BLOB: | |
| 1560 | - count = i32(s, offset+4) | |
| 1561 | - value = s[offset+8:offset+8+count] | |
| 1562 | - elif type == VT_LPWSTR: | |
| 1563 | - count = i32(s, offset+4) | |
| 1564 | - value = _unicode(s[offset+8:offset+8+count*2]) | |
| 1565 | - elif type == VT_FILETIME: | |
| 1566 | - value = long(i32(s, offset+4)) + (long(i32(s, offset+8))<<32) | |
| 1567 | - # FIXME: this is a 64-bit int: "number of 100ns periods | |
| 1568 | - # since Jan 1,1601". Should map this to Python time | |
| 1569 | - value = value / 10000000L # seconds | |
| 1570 | - elif type == VT_UI1: | |
| 1571 | - value = ord(s[offset+4]) | |
| 1572 | - elif type == VT_CLSID: | |
| 1573 | - value = _clsid(s[offset+4:offset+20]) | |
| 1574 | - elif type == VT_CF: | |
| 1575 | - count = i32(s, offset+4) | |
| 1576 | - value = s[offset+8:offset+8+count] | |
| 1577 | - else: | |
| 1578 | - value = None # everything else yields "None" | |
| 1799 | + try: | |
| 1800 | + # header | |
| 1801 | + s = fp.read(28) | |
| 1802 | + clsid = _clsid(s[8:24]) | |
| 1803 | + | |
| 1804 | + # format id | |
| 1805 | + s = fp.read(20) | |
| 1806 | + fmtid = _clsid(s[:16]) | |
| 1807 | + fp.seek(i32(s, 16)) | |
| 1808 | + | |
| 1809 | + # get section | |
| 1810 | + s = "****" + fp.read(i32(fp.read(4))-4) | |
| 1811 | + # number of properties: | |
| 1812 | + num_props = i32(s, 4) | |
| 1813 | + except: | |
| 1814 | + # catch exception while parsing property header, and only raise | |
| 1815 | + # a DEFECT_INCORRECT then return an empty dict, because this is not | |
| 1816 | + # a fatal error when parsing the whole file | |
| 1817 | + exctype, excvalue = sys.exc_info()[:2] | |
| 1818 | + msg = 'Error while parsing properties header in stream %s: %s' % ( | |
| 1819 | + repr(streampath), excvalue) | |
| 1820 | + self._raise_defect(DEFECT_INCORRECT, msg, exctype) | |
| 1821 | + return data | |
| 1822 | + | |
| 1823 | + for i in range(num_props): | |
| 1824 | + try: | |
| 1825 | + id = 0 # just in case of an exception | |
| 1826 | + id = i32(s, 8+i*8) | |
| 1827 | + offset = i32(s, 12+i*8) | |
| 1828 | + type = i32(s, offset) | |
| 1829 | + | |
| 1830 | + debug ('property id=%d: type=%d offset=%X' % (id, type, offset)) | |
| 1831 | + | |
| 1832 | + # test for common types first (should perhaps use | |
| 1833 | + # a dictionary instead?) | |
| 1834 | + | |
| 1835 | + if type == VT_I2: # 16-bit signed integer | |
| 1836 | + value = i16(s, offset+4) | |
| 1837 | + if value >= 32768: | |
| 1838 | + value = value - 65536 | |
| 1839 | + elif type == VT_UI2: # 2-byte unsigned integer | |
| 1840 | + value = i16(s, offset+4) | |
| 1841 | + elif type in (VT_I4, VT_INT, VT_ERROR): | |
| 1842 | + # VT_I4: 32-bit signed integer | |
| 1843 | + # VT_ERROR: HRESULT, similar to 32-bit signed integer, | |
| 1844 | + # see http://msdn.microsoft.com/en-us/library/cc230330.aspx | |
| 1845 | + value = i32(s, offset+4) | |
| 1846 | + elif type in (VT_UI4, VT_UINT): # 4-byte unsigned integer | |
| 1847 | + value = i32(s, offset+4) # FIXME | |
| 1848 | + elif type in (VT_BSTR, VT_LPSTR): | |
| 1849 | + # CodePageString, see http://msdn.microsoft.com/en-us/library/dd942354.aspx | |
| 1850 | + # size is a 32 bits integer, including the null terminator, and | |
| 1851 | + # possibly trailing or embedded null chars | |
| 1852 | + #TODO: if codepage is unicode, the string should be converted as such | |
| 1853 | + count = i32(s, offset+4) | |
| 1854 | + value = s[offset+8:offset+8+count-1] | |
| 1855 | + # remove all null chars: | |
| 1856 | + value = value.replace('\x00', '') | |
| 1857 | + elif type == VT_BLOB: | |
| 1858 | + # binary large object (BLOB) | |
| 1859 | + # see http://msdn.microsoft.com/en-us/library/dd942282.aspx | |
| 1860 | + count = i32(s, offset+4) | |
| 1861 | + value = s[offset+8:offset+8+count] | |
| 1862 | + elif type == VT_LPWSTR: | |
| 1863 | + # UnicodeString | |
| 1864 | + # see http://msdn.microsoft.com/en-us/library/dd942313.aspx | |
| 1865 | + # "the string should NOT contain embedded or additional trailing | |
| 1866 | + # null characters." | |
| 1867 | + count = i32(s, offset+4) | |
| 1868 | + value = _unicode(s[offset+8:offset+8+count*2]) | |
| 1869 | + elif type == VT_FILETIME: | |
| 1870 | + value = long(i32(s, offset+4)) + (long(i32(s, offset+8))<<32) | |
| 1871 | + # FILETIME is a 64-bit int: "number of 100ns periods | |
| 1872 | + # since Jan 1,1601". | |
| 1873 | + if convert_time and id not in no_conversion: | |
| 1874 | + debug('Converting property #%d to python datetime, value=%d=%fs' | |
| 1875 | + %(id, value, float(value)/10000000L)) | |
| 1876 | + # convert FILETIME to Python datetime.datetime | |
| 1877 | + # inspired from http://code.activestate.com/recipes/511425-filetime-to-datetime/ | |
| 1878 | + _FILETIME_null_date = datetime.datetime(1601, 1, 1, 0, 0, 0) | |
| 1879 | + debug('timedelta days=%d' % (value/(10*1000000*3600*24))) | |
| 1880 | + value = _FILETIME_null_date + datetime.timedelta(microseconds=value/10) | |
| 1881 | + else: | |
| 1882 | + # legacy code kept for backward compatibility: returns a | |
| 1883 | + # number of seconds since Jan 1,1601 | |
| 1884 | + value = value / 10000000L # seconds | |
| 1885 | + elif type == VT_UI1: # 1-byte unsigned integer | |
| 1886 | + value = ord(s[offset+4]) | |
| 1887 | + elif type == VT_CLSID: | |
| 1888 | + value = _clsid(s[offset+4:offset+20]) | |
| 1889 | + elif type == VT_CF: | |
| 1890 | + # PropertyIdentifier or ClipboardData?? | |
| 1891 | + # see http://msdn.microsoft.com/en-us/library/dd941945.aspx | |
| 1892 | + count = i32(s, offset+4) | |
| 1893 | + value = s[offset+8:offset+8+count] | |
| 1894 | + elif type == VT_BOOL: | |
| 1895 | + # VARIANT_BOOL, 16 bits bool, 0x0000=Fals, 0xFFFF=True | |
| 1896 | + # see http://msdn.microsoft.com/en-us/library/cc237864.aspx | |
| 1897 | + value = bool(i16(s, offset+4)) | |
| 1898 | + else: | |
| 1899 | + value = None # everything else yields "None" | |
| 1900 | + debug ('property id=%d: type=%d not implemented in parser yet' % (id, type)) | |
| 1579 | 1901 | |
| 1580 | - # FIXME: add support for VT_VECTOR | |
| 1902 | + # missing: VT_EMPTY, VT_NULL, VT_R4, VT_R8, VT_CY, VT_DATE, | |
| 1903 | + # VT_DECIMAL, VT_I1, VT_I8, VT_UI8, | |
| 1904 | + # see http://msdn.microsoft.com/en-us/library/dd942033.aspx | |
| 1581 | 1905 | |
| 1582 | - #print "%08x" % id, repr(value), | |
| 1583 | - #print "(%s)" % VT[i32(s, offset) & 0xFFF] | |
| 1906 | + # FIXME: add support for VT_VECTOR | |
| 1907 | + # VT_VECTOR is a 32 uint giving the number of items, followed by | |
| 1908 | + # the items in sequence. The VT_VECTOR value is combined with the | |
| 1909 | + # type of items, e.g. VT_VECTOR|VT_BSTR | |
| 1910 | + # see http://msdn.microsoft.com/en-us/library/dd942011.aspx | |
| 1584 | 1911 | |
| 1585 | - data[id] = value | |
| 1912 | + #print "%08x" % id, repr(value), | |
| 1913 | + #print "(%s)" % VT[i32(s, offset) & 0xFFF] | |
| 1914 | + | |
| 1915 | + data[id] = value | |
| 1916 | + except: | |
| 1917 | + # catch exception while parsing each property, and only raise | |
| 1918 | + # a DEFECT_INCORRECT, because parsing can go on | |
| 1919 | + exctype, excvalue = sys.exc_info()[:2] | |
| 1920 | + msg = 'Error while parsing property id %d in stream %s: %s' % ( | |
| 1921 | + id, repr(streampath), excvalue) | |
| 1922 | + self._raise_defect(DEFECT_INCORRECT, msg, exctype) | |
| 1586 | 1923 | |
| 1587 | 1924 | return data |
| 1588 | 1925 | |
| 1926 | + def get_metadata(self): | |
| 1927 | + """ | |
| 1928 | + Parse standard properties streams, return an OleMetadata object | |
| 1929 | + containing all the available metadata. | |
| 1930 | + (also stored in the metadata attribute of the OleFileIO object) | |
| 1931 | + | |
| 1932 | + new in version 0.25 | |
| 1933 | + """ | |
| 1934 | + self.metadata = OleMetadata() | |
| 1935 | + self.metadata.parse_properties(self) | |
| 1936 | + return self.metadata | |
| 1937 | + | |
| 1589 | 1938 | # |
| 1590 | 1939 | # -------------------------------------------------------------------- |
| 1591 | 1940 | # This script can be used to dump the directory of any OLE2 structured |
| ... | ... | @@ -1622,7 +1971,7 @@ Options: |
| 1622 | 1971 | check_streams = True |
| 1623 | 1972 | continue |
| 1624 | 1973 | |
| 1625 | - ole = OleFileIO(filename, raise_defects=DEFECT_INCORRECT) | |
| 1974 | + ole = OleFileIO(filename)#, raise_defects=DEFECT_INCORRECT) | |
| 1626 | 1975 | print "-" * 68 |
| 1627 | 1976 | print filename |
| 1628 | 1977 | print "-" * 68 |
| ... | ... | @@ -1630,7 +1979,7 @@ Options: |
| 1630 | 1979 | for streamname in ole.listdir(): |
| 1631 | 1980 | if streamname[-1][0] == "\005": |
| 1632 | 1981 | print streamname, ": properties" |
| 1633 | - props = ole.getproperties(streamname) | |
| 1982 | + props = ole.getproperties(streamname, convert_time=True) | |
| 1634 | 1983 | props = props.items() |
| 1635 | 1984 | props.sort() |
| 1636 | 1985 | for k, v in props: |
| ... | ... | @@ -1661,6 +2010,23 @@ Options: |
| 1661 | 2010 | print 'NOT a stream : type=%d' % st_type |
| 1662 | 2011 | print '' |
| 1663 | 2012 | |
| 2013 | +## for streamname in ole.listdir(): | |
| 2014 | +## # print name using repr() to convert binary chars to \xNN: | |
| 2015 | +## print '-', repr('/'.join(streamname)),'-', | |
| 2016 | +## print ole.getmtime(streamname) | |
| 2017 | +## print '' | |
| 2018 | + | |
| 2019 | + print 'Modification/Creation times of all directory entries:' | |
| 2020 | + for entry in ole.direntries: | |
| 2021 | + if entry is not None: | |
| 2022 | + print '- %s: mtime=%s ctime=%s' % (entry.name, | |
| 2023 | + entry.getmtime(), entry.getctime()) | |
| 2024 | + print '' | |
| 2025 | + | |
| 2026 | + # parse and display metadata: | |
| 2027 | + meta = ole.get_metadata() | |
| 2028 | + meta.dump() | |
| 2029 | + print '' | |
| 1664 | 2030 | #[PL] Test a few new methods: |
| 1665 | 2031 | root = ole.get_rootentry_name() |
| 1666 | 2032 | print 'Root entry name: "%s"' % root |
| ... | ... | @@ -1670,5 +2036,13 @@ Options: |
| 1670 | 2036 | print "size :", ole.get_size('worddocument') |
| 1671 | 2037 | if ole.exists('macros/vba'): |
| 1672 | 2038 | print "This document may contain VBA macros." |
| 2039 | + | |
| 2040 | + # print parsing issues: | |
| 2041 | + print '\nNon-fatal issues raised during parsing:' | |
| 2042 | + if ole.parsing_issues: | |
| 2043 | + for exctype, msg in ole.parsing_issues: | |
| 2044 | + print '- %s: %s' % (exctype.__name__, msg) | |
| 2045 | + else: | |
| 2046 | + print 'None' | |
| 1673 | 2047 | ## except IOError, v: |
| 1674 | 2048 | ## print "***", "cannot read", file, "-", v | ... | ... |
oletools/thirdparty/OleFileIO_PL/README.txt
| ... | ... | @@ -12,7 +12,8 @@ This is an improved version of the OleFileIO module from |
| 12 | 12 | `PIL <http://www.pythonware.com/products/pil/index.htm>`_, the excellent |
| 13 | 13 | Python Imaging Library, created and maintained by Fredrik Lundh. The API |
| 14 | 14 | is still compatible with PIL, but I have improved the internal |
| 15 | -implementation significantly, with bugfixes and a more robust design. | |
| 15 | +implementation significantly, with new features, bugfixes and a more | |
| 16 | +robust design. | |
| 16 | 17 | |
| 17 | 18 | As far as I know, this module is now the most complete and robust Python |
| 18 | 19 | implementation to read MS OLE2 files, portable on several operating |
| ... | ... | @@ -20,18 +21,31 @@ systems. (please tell me if you know other similar Python modules) |
| 20 | 21 | |
| 21 | 22 | WARNING: THIS IS (STILL) WORK IN PROGRESS. |
| 22 | 23 | |
| 23 | -Main improvements over PIL version: | |
| 24 | ------------------------------------ | |
| 24 | +Main improvements over PIL version of OleFileIO: | |
| 25 | +------------------------------------------------ | |
| 25 | 26 | |
| 26 | 27 | - Better compatibility with Python 2.4 up to 2.7 |
| 27 | 28 | - Support for files larger than 6.8MB |
| 28 | 29 | - Robust: many checks to detect malformed files |
| 29 | 30 | - Improved API |
| 31 | +- New features: metadata extraction, stream/storage timestamps | |
| 30 | 32 | - Added setup.py and install.bat to ease installation |
| 31 | 33 | |
| 32 | 34 | News |
| 33 | 35 | ---- |
| 34 | 36 | |
| 37 | +- 2013-07-24 v0.26: added methods to parse stream/storage timestamps, | |
| 38 | + improved listdir to include storages, fixed parsing of direntry | |
| 39 | + timestamps | |
| 40 | +- 2013-05-27 v0.25: improved metadata extraction, properties parsing | |
| 41 | + and exception handling, fixed `issue | |
| 42 | + #12 <https://bitbucket.org/decalage/olefileio_pl/issue/12/error-when-converting-timestamps-in-ole>`_ | |
| 43 | +- 2013-05-07 v0.24: new features to extract metadata (get\_metadata | |
| 44 | + method and OleMetadata class), improved getproperties to convert | |
| 45 | + timestamps to Python datetime | |
| 46 | +- 2012-10-09: published | |
| 47 | + `python-oletools <http://www.decalage.info/python/oletools>`_, a | |
| 48 | + package of analysis tools based on OleFileIO\_PL | |
| 35 | 49 | - 2012-09-11 v0.23: added support for file-like objects, fixed `issue |
| 36 | 50 | #8 <https://bitbucket.org/decalage/olefileio_pl/issue/8/bug-with-file-object>`_ |
| 37 | 51 | - 2012-02-17 v0.22: fixed issues #7 (bug in getproperties) and #2 |
| ... | ... | @@ -87,6 +101,14 @@ Here are a few examples: |
| 87 | 101 | f.write(data) |
| 88 | 102 | f.close() |
| 89 | 103 | |
| 104 | + # Extract metadata (new in v0.24) - see source code for all attributes: | |
| 105 | + meta = ole.get_metadata() | |
| 106 | + print 'Author:', meta.author | |
| 107 | + print 'Title:', meta.title | |
| 108 | + print 'Creation date:', meta.create_time | |
| 109 | + # print all metadata: | |
| 110 | + meta.dump() | |
| 111 | + | |
| 90 | 112 | # Close the OLE file: |
| 91 | 113 | ole.close() |
| 92 | 114 | |
| ... | ... | @@ -144,7 +166,7 @@ License |
| 144 | 166 | |
| 145 | 167 | OleFileIO\_PL is open-source. |
| 146 | 168 | |
| 147 | -OleFileIO\_PL changes are Copyright (c) 2005-2012 by Philippe Lagadec. | |
| 169 | +OleFileIO\_PL changes are Copyright (c) 2005-2013 by Philippe Lagadec. | |
| 148 | 170 | |
| 149 | 171 | The Python Imaging Library (PIL) is |
| 150 | 172 | ... | ... |