Commit f5b8a4fd7d1dd96294197d76a2e802016c37f9be

Authored by Philippe Lagadec
1 parent 0617289f

updated olefile to latest v0.43

oletools/thirdparty/olefile/LICENSE.txt
1 1 LICENSE for the olefile package:
2 2  
3   -olefile (formerly OleFileIO_PL) is copyright (c) 2005-2015 Philippe Lagadec
  3 +olefile (formerly OleFileIO_PL) is copyright (c) 2005-2016 Philippe Lagadec
4 4 (http://www.decalage.info)
5 5  
6 6 All rights reserved.
... ...
oletools/thirdparty/olefile/README.html
1   -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
2   -<html xmlns="http://www.w3.org/1999/xhtml">
3   -<head>
4   - <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
5   - <meta http-equiv="Content-Style-Type" content="text/css" />
6   - <meta name="generator" content="pandoc" />
7   - <title></title>
8   -</head>
9   -<body>
10   -<h1 id="olefile-formerly-olefileio_pl">olefile (formerly OleFileIO_PL)</h1>
11   -<p><a href="http://www.decalage.info/olefile">olefile</a> is a Python package to parse, read and write <a href="http://en.wikipedia.org/wiki/Compound_File_Binary_Format">Microsoft OLE2 files</a> (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office 97-2003 documents, vbaProject.bin in MS Office 2007+ files, Image Composer and FlashPix files, Outlook messages, StickyNotes, several Microscopy file formats, McAfee antivirus quarantine files, etc.</p>
12   -<p><strong>Quick links:</strong> <a href="http://www.decalage.info/olefile">Home page</a> - <a href="https://bitbucket.org/decalage/olefileio_pl/wiki/Install">Download/Install</a> - <a href="https://bitbucket.org/decalage/olefileio_pl/wiki">Documentation</a> - <a href="https://bitbucket.org/decalage/olefileio_pl/issues?status=new&amp;status=open">Report Issues/Suggestions/Questions</a> - <a href="http://decalage.info/contact">Contact the author</a> - <a href="https://bitbucket.org/decalage/olefileio_pl">Repository</a> - <a href="https://twitter.com/decalage2">Updates on Twitter</a></p>
13   -<h2 id="news">News</h2>
14   -<p>Follow all updates and news on Twitter: <a href="https://twitter.com/decalage2"><code class="url">https://twitter.com/decalage2</code></a></p>
15   -<ul>
16   -<li><strong>2015-01-25 v0.42</strong>: improved handling of special characters in stream/storage names on Python 2.x (using UTF-8 instead of Latin-1), fixed bug in listdir with empty storages.</li>
17   -<li>2014-11-25 v0.41: OleFileIO.open and isOleFile now support OLE files stored in byte strings, fixed installer for python 3, added support for Jython (Niko Ehrenfeuchter)</li>
18   -<li>2014-10-01 v0.40: renamed OleFileIO_PL to olefile, added initial write support for streams &gt;4K, updated doc and license, improved the setup script.</li>
19   -<li>2014-07-27 v0.31: fixed support for large files with 4K sectors, thanks to Niko Ehrenfeuchter, Martijn Berger and Dave Jones. Added test scripts from Pillow (by hugovk). Fixed setup for Python 3 (Martin Panter)</li>
20   -<li>2014-02-04 v0.30: now compatible with Python 3.x, thanks to Martin Panter who did most of the hard work.</li>
21   -<li>2013-07-24 v0.26: added methods to parse stream/storage timestamps, improved listdir to include storages, fixed parsing of direntry timestamps</li>
22   -<li>2013-05-27 v0.25: improved metadata extraction, properties parsing and exception handling, fixed <a href="https://bitbucket.org/decalage/olefileio_pl/issue/12/error-when-converting-timestamps-in-ole">issue #12</a></li>
23   -<li>2013-05-07 v0.24: new features to extract metadata (get_metadata method and OleMetadata class), improved getproperties to convert timestamps to Python datetime</li>
24   -<li>2012-10-09: published <a href="http://www.decalage.info/python/oletools">python-oletools</a>, a package of analysis tools based on OleFileIO_PL</li>
25   -<li>2012-09-11 v0.23: added support for file-like objects, fixed <a href="https://bitbucket.org/decalage/olefileio_pl/issue/8/bug-with-file-object">issue #8</a></li>
26   -<li>2012-02-17 v0.22: fixed issues #7 (bug in getproperties) and #2 (added close method)</li>
27   -<li>2011-10-20: code hosted on bitbucket to ease contributions and bug tracking</li>
28   -<li>2010-01-24 v0.21: fixed support for big-endian CPUs, such as PowerPC Macs.</li>
29   -<li>2009-12-11 v0.20: small bugfix in OleFileIO.open when filename is not plain str.</li>
30   -<li>2009-12-10 v0.19: fixed support for 64 bits platforms (thanks to Ben G. and Martijn for reporting the bug)</li>
31   -<li>see changelog in source code for more info.</li>
32   -</ul>
33   -<h2 id="downloadinstall">Download/Install</h2>
34   -<p>If you have pip or setuptools installed (pip is included in Python 2.7.9+), you may simply run <strong>pip install olefile</strong> or <strong>easy_install olefile</strong> for the first installation.</p>
35   -<p>To update olefile, run <strong>pip install -U olefile</strong>.</p>
36   -<p>Otherwise, see https://bitbucket.org/decalage/olefileio_pl/wiki/Install</p>
37   -<h2 id="features">Features</h2>
38   -<ul>
39   -<li>Parse, read and write any OLE file such as Microsoft Office 97-2003 legacy document formats (Word .doc, Excel .xls, PowerPoint .ppt, Visio .vsd, Project .mpp), Image Composer and FlashPix files, Outlook messages, StickyNotes, Zeiss AxioVision ZVI files, Olympus FluoView OIB files, etc</li>
40   -<li>List all the streams and storages contained in an OLE file</li>
41   -<li>Open streams as files</li>
42   -<li>Parse and read property streams, containing metadata of the file</li>
43   -<li>Portable, pure Python module, no dependency</li>
44   -</ul>
45   -<p>olefile can be used as an independent package or with PIL/Pillow.</p>
46   -<p>olefile is mostly meant for developers. If you are looking for tools to analyze OLE files or to extract data (especially for security purposes such as malware analysis and forensics), then please also check my <a href="http://www.decalage.info/python/oletools">python-oletools</a>, which are built upon olefile and provide a higher-level interface.</p>
47   -<h2 id="history">History</h2>
48   -<p>olefile is based on the OleFileIO module from <a href="http://www.pythonware.com/products/pil/index.htm">PIL</a>, the excellent Python Imaging Library, created and maintained by Fredrik Lundh. The olefile API is still compatible with PIL, but since 2005 I have improved the internal implementation significantly, with new features, bugfixes and a more robust design. From 2005 to 2014 the project was called OleFileIO_PL, and in 2014 I changed its name to olefile to celebrate its 9 years and its new write features.</p>
49   -<p>As far as I know, olefile is the most complete and robust Python implementation to read MS OLE2 files, portable on several operating systems. (please tell me if you know other similar Python modules)</p>
50   -<p>Since 2014 olefile/OleFileIO_PL has been integrated into <a href="http://python-imaging.github.io/">Pillow</a>, the friendly fork of PIL. olefile will continue to be improved as a separate project, and new versions will be merged into Pillow regularly.</p>
51   -<h2 id="main-improvements-over-the-original-version-of-olefileio-in-pil">Main improvements over the original version of OleFileIO in PIL:</h2>
52   -<ul>
53   -<li>Compatible with Python 3.x and 2.6+</li>
54   -<li>Many bug fixes</li>
55   -<li>Support for files larger than 6.8MB</li>
56   -<li>Support for 64 bits platforms and big-endian CPUs</li>
57   -<li>Robust: many checks to detect malformed files</li>
58   -<li>Runtime option to choose if malformed files should be parsed or raise exceptions</li>
59   -<li>Improved API</li>
60   -<li>Metadata extraction, stream/storage timestamps (e.g. for document forensics)</li>
61   -<li>Can open file-like objects</li>
62   -<li>Added setup.py and install.bat to ease installation</li>
63   -<li>More convenient slash-based syntax for stream paths</li>
64   -<li>Write features</li>
65   -</ul>
66   -<h2 id="documentation">Documentation</h2>
67   -<p>Please see the <a href="https://bitbucket.org/decalage/olefileio_pl/wiki">online documentation</a> for more information, especially the <a href="https://bitbucket.org/decalage/olefileio_pl/wiki/OLE_Overview">OLE overview</a> and the <a href="https://bitbucket.org/decalage/olefileio_pl/wiki/API">API page</a> which describe how to use olefile in Python applications. A copy of the same documentation is also provided in the doc subfolder of the olefile package.</p>
68   -<h2 id="real-life-examples">Real-life examples</h2>
69   -<p>A real-life example: <a href="http://blog.gregback.net/2011/03/using-remnux-for-forensic-puzzle-6/">using OleFileIO_PL for malware analysis and forensics</a>.</p>
70   -<p>See also <a href="https://computer-forensics.sans.org/community/papers/gcfa/grow-forensic-tools-taxonomy-python-libraries-helpful-forensic-analysis_6879">this paper</a> about python tools for forensics, which features olefile.</p>
71   -<h2 id="license">License</h2>
72   -<p>olefile (formerly OleFileIO_PL) is copyright (c) 2005-2015 Philippe Lagadec (<a href="http://www.decalage.info">http://www.decalage.info</a>)</p>
73   -<p>All rights reserved.</p>
74   -<p>Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:</p>
75   -<ul>
76   -<li>Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.</li>
77   -<li>Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.</li>
78   -</ul>
79   -<p>THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS &quot;AS IS&quot; AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.</p>
80   -<hr />
81   -<p>olefile is based on source code from the OleFileIO module of the Python Imaging Library (PIL) published by Fredrik Lundh under the following license:</p>
82   -<p>The Python Imaging Library (PIL) is</p>
83   -<ul>
84   -<li>Copyright (c) 1997-2005 by Secret Labs AB</li>
85   -<li>Copyright (c) 1995-2005 by Fredrik Lundh</li>
86   -</ul>
87   -<p>By obtaining, using, and/or copying this software and/or its associated documentation, you agree that you have read, understood, and will comply with the following terms and conditions:</p>
88   -<p>Permission to use, copy, modify, and distribute this software and its associated documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appears in all copies, and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Secret Labs AB or the author not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission.</p>
89   -<p>SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.</p>
90   -</body>
91   -</html>
  1 +<h1 id="olefile-formerly-olefileio_pl">olefile (formerly OleFileIO_PL)</h1>
  2 +<p><a href="http://www.decalage.info/olefile">olefile</a> is a Python package to parse, read and write <a href="http://en.wikipedia.org/wiki/Compound_File_Binary_Format">Microsoft OLE2 files</a> (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office 97-2003 documents, vbaProject.bin in MS Office 2007+ files, Image Composer and FlashPix files, Outlook messages, StickyNotes, several Microscopy file formats, McAfee antivirus quarantine files, etc.</p>
  3 +<p><strong>Quick links:</strong> <a href="http://www.decalage.info/olefile">Home page</a> - <a href="https://bitbucket.org/decalage/olefileio_pl/wiki/Install">Download/Install</a> - <a href="https://bitbucket.org/decalage/olefileio_pl/wiki">Documentation</a> - <a href="https://bitbucket.org/decalage/olefileio_pl/issues?status=new&amp;status=open">Report Issues/Suggestions/Questions</a> - <a href="http://decalage.info/contact">Contact the author</a> - <a href="https://bitbucket.org/decalage/olefileio_pl">Repository</a> - <a href="https://twitter.com/decalage2">Updates on Twitter</a></p>
  4 +<h2 id="news">News</h2>
  5 +<p>Follow all updates and news on Twitter: <a href="https://twitter.com/decalage2">https://twitter.com/decalage2</a></p>
  6 +<ul>
  7 +<li><strong>2016-02-02 v0.43</strong>: fixed issues <a href="https://bitbucket.org/decalage/olefileio_pl/issues/26/variable-referenced-before-assignment">#26</a> and <a href="https://bitbucket.org/decalage/olefileio_pl/issues/27/incomplete-ole-stream-incorrect-ole-fat">#27</a>, better handling of malformed files, use python logging.</li>
  8 +<li>2015-01-25 v0.42: improved handling of special characters in stream/storage names on Python 2.x (using UTF-8 instead of Latin-1), fixed bug in listdir with empty storages.</li>
  9 +<li>2014-11-25 v0.41: OleFileIO.open and isOleFile now support OLE files stored in byte strings, fixed installer for python 3, added support for Jython (Niko Ehrenfeuchter)</li>
  10 +<li>2014-10-01 v0.40: renamed OleFileIO_PL to olefile, added initial write support for streams &gt;4K, updated doc and license, improved the setup script.</li>
  11 +<li>2014-07-27 v0.31: fixed support for large files with 4K sectors, thanks to Niko Ehrenfeuchter, Martijn Berger and Dave Jones. Added test scripts from Pillow (by hugovk). Fixed setup for Python 3 (Martin Panter)</li>
  12 +<li>2014-02-04 v0.30: now compatible with Python 3.x, thanks to Martin Panter who did most of the hard work.</li>
  13 +<li>2013-07-24 v0.26: added methods to parse stream/storage timestamps, improved listdir to include storages, fixed parsing of direntry timestamps</li>
  14 +<li>2013-05-27 v0.25: improved metadata extraction, properties parsing and exception handling, fixed <a href="https://bitbucket.org/decalage/olefileio_pl/issue/12/error-when-converting-timestamps-in-ole">issue #12</a></li>
  15 +<li>2013-05-07 v0.24: new features to extract metadata (get_metadata method and OleMetadata class), improved getproperties to convert timestamps to Python datetime</li>
  16 +<li>2012-10-09: published <a href="http://www.decalage.info/python/oletools">python-oletools</a>, a package of analysis tools based on OleFileIO_PL</li>
  17 +<li>2012-09-11 v0.23: added support for file-like objects, fixed <a href="https://bitbucket.org/decalage/olefileio_pl/issue/8/bug-with-file-object">issue #8</a></li>
  18 +<li>2012-02-17 v0.22: fixed issues #7 (bug in getproperties) and #2 (added close method)</li>
  19 +<li>2011-10-20: code hosted on bitbucket to ease contributions and bug tracking</li>
  20 +<li>2010-01-24 v0.21: fixed support for big-endian CPUs, such as PowerPC Macs.</li>
  21 +<li>2009-12-11 v0.20: small bugfix in OleFileIO.open when filename is not plain str.</li>
  22 +<li>2009-12-10 v0.19: fixed support for 64 bits platforms (thanks to Ben G. and Martijn for reporting the bug)</li>
  23 +<li>see changelog in source code for more info.</li>
  24 +</ul>
  25 +<h2 id="downloadinstall">Download/Install</h2>
  26 +<p>If you have pip or setuptools installed (pip is included in Python 2.7.9+), you may simply run <strong>pip install olefile</strong> or <strong>easy_install olefile</strong> for the first installation.</p>
  27 +<p>To update olefile, run <strong>pip install -U olefile</strong>.</p>
  28 +<p>Otherwise, see https://bitbucket.org/decalage/olefileio_pl/wiki/Install</p>
  29 +<h2 id="features">Features</h2>
  30 +<ul>
  31 +<li>Parse, read and write any OLE file such as Microsoft Office 97-2003 legacy document formats (Word .doc, Excel .xls, PowerPoint .ppt, Visio .vsd, Project .mpp), Image Composer and FlashPix files, Outlook messages, StickyNotes, Zeiss AxioVision ZVI files, Olympus FluoView OIB files, etc</li>
  32 +<li>List all the streams and storages contained in an OLE file</li>
  33 +<li>Open streams as files</li>
  34 +<li>Parse and read property streams, containing metadata of the file</li>
  35 +<li>Portable, pure Python module, no dependency</li>
  36 +</ul>
  37 +<p>olefile can be used as an independent package or with PIL/Pillow.</p>
  38 +<p>olefile is mostly meant for developers. If you are looking for tools to analyze OLE files or to extract data (especially for security purposes such as malware analysis and forensics), then please also check my <a href="http://www.decalage.info/python/oletools">python-oletools</a>, which are built upon olefile and provide a higher-level interface.</p>
  39 +<h2 id="history">History</h2>
  40 +<p>olefile is based on the OleFileIO module from <a href="http://www.pythonware.com/products/pil/index.htm">PIL</a>, the excellent Python Imaging Library, created and maintained by Fredrik Lundh. The olefile API is still compatible with PIL, but since 2005 I have improved the internal implementation significantly, with new features, bugfixes and a more robust design. From 2005 to 2014 the project was called OleFileIO_PL, and in 2014 I changed its name to olefile to celebrate its 9 years and its new write features.</p>
  41 +<p>As far as I know, olefile is the most complete and robust Python implementation to read MS OLE2 files, portable on several operating systems. (please tell me if you know other similar Python modules)</p>
  42 +<p>Since 2014 olefile/OleFileIO_PL has been integrated into <a href="http://python-imaging.github.io/">Pillow</a>, the friendly fork of PIL. olefile will continue to be improved as a separate project, and new versions will be merged into Pillow regularly.</p>
  43 +<h2 id="main-improvements-over-the-original-version-of-olefileio-in-pil">Main improvements over the original version of OleFileIO in PIL:</h2>
  44 +<ul>
  45 +<li>Compatible with Python 3.x and 2.6+</li>
  46 +<li>Many bug fixes</li>
  47 +<li>Support for files larger than 6.8MB</li>
  48 +<li>Support for 64 bits platforms and big-endian CPUs</li>
  49 +<li>Robust: many checks to detect malformed files</li>
  50 +<li>Runtime option to choose if malformed files should be parsed or raise exceptions</li>
  51 +<li>Improved API</li>
  52 +<li>Metadata extraction, stream/storage timestamps (e.g. for document forensics)</li>
  53 +<li>Can open file-like objects</li>
  54 +<li>Added setup.py and install.bat to ease installation</li>
  55 +<li>More convenient slash-based syntax for stream paths</li>
  56 +<li>Write features</li>
  57 +</ul>
  58 +<h2 id="documentation">Documentation</h2>
  59 +<p>Please see the <a href="https://bitbucket.org/decalage/olefileio_pl/wiki">online documentation</a> for more information, especially the <a href="https://bitbucket.org/decalage/olefileio_pl/wiki/OLE_Overview">OLE overview</a> and the <a href="https://bitbucket.org/decalage/olefileio_pl/wiki/API">API page</a> which describe how to use olefile in Python applications. A copy of the same documentation is also provided in the doc subfolder of the olefile package.</p>
  60 +<h2 id="real-life-examples">Real-life examples</h2>
  61 +<p>A real-life example: <a href="http://blog.gregback.net/2011/03/using-remnux-for-forensic-puzzle-6/">using OleFileIO_PL for malware analysis and forensics</a>.</p>
  62 +<p>See also <a href="https://computer-forensics.sans.org/community/papers/gcfa/grow-forensic-tools-taxonomy-python-libraries-helpful-forensic-analysis_6879">this paper</a> about python tools for forensics, which features olefile.</p>
  63 +<h2 id="license">License</h2>
  64 +<p>olefile (formerly OleFileIO_PL) is copyright (c) 2005-2016 Philippe Lagadec (<a href="http://www.decalage.info">http://www.decalage.info</a>)</p>
  65 +<p>All rights reserved.</p>
  66 +<p>Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:</p>
  67 +<ul>
  68 +<li>Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.</li>
  69 +<li>Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.</li>
  70 +</ul>
  71 +<p>THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS &quot;AS IS&quot; AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.</p>
  72 +<hr />
  73 +<p>olefile is based on source code from the OleFileIO module of the Python Imaging Library (PIL) published by Fredrik Lundh under the following license:</p>
  74 +<p>The Python Imaging Library (PIL) is</p>
  75 +<ul>
  76 +<li>Copyright (c) 1997-2005 by Secret Labs AB</li>
  77 +<li>Copyright (c) 1995-2005 by Fredrik Lundh</li>
  78 +</ul>
  79 +<p>By obtaining, using, and/or copying this software and/or its associated documentation, you agree that you have read, understood, and will comply with the following terms and conditions:</p>
  80 +<p>Permission to use, copy, modify, and distribute this software and its associated documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appears in all copies, and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Secret Labs AB or the author not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission.</p>
  81 +<p>SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.</p>
... ...
oletools/thirdparty/olefile/README.rst
1   -olefile (formerly OleFileIO\_PL)
2   -================================
3   -
4   -`olefile <http://www.decalage.info/olefile>`_ is a Python package to
5   -parse, read and write `Microsoft OLE2
6   -files <http://en.wikipedia.org/wiki/Compound_File_Binary_Format>`_ (also
7   -called Structured Storage, Compound File Binary Format or Compound
8   -Document File Format), such as Microsoft Office 97-2003 documents,
9   -vbaProject.bin in MS Office 2007+ files, Image Composer and FlashPix
10   -files, Outlook messages, StickyNotes, several Microscopy file formats,
11   -McAfee antivirus quarantine files, etc.
12   -
13   -**Quick links:** `Home page <http://www.decalage.info/olefile>`_ -
14   -`Download/Install <https://bitbucket.org/decalage/olefileio_pl/wiki/Install>`_
15   -- `Documentation <https://bitbucket.org/decalage/olefileio_pl/wiki>`_ -
16   -`Report
17   -Issues/Suggestions/Questions <https://bitbucket.org/decalage/olefileio_pl/issues?status=new&status=open>`_
18   -- `Contact the author <http://decalage.info/contact>`_ -
19   -`Repository <https://bitbucket.org/decalage/olefileio_pl>`_ - `Updates
20   -on Twitter <https://twitter.com/decalage2>`_
21   -
22   -News
23   -----
24   -
25   -Follow all updates and news on Twitter: https://twitter.com/decalage2
26   -
27   -- **2015-01-25 v0.42**: improved handling of special characters in
28   - stream/storage names on Python 2.x (using UTF-8 instead of Latin-1),
29   - fixed bug in listdir with empty storages.
30   -- 2014-11-25 v0.41: OleFileIO.open and isOleFile now support OLE files
31   - stored in byte strings, fixed installer for python 3, added support
32   - for Jython (Niko Ehrenfeuchter)
33   -- 2014-10-01 v0.40: renamed OleFileIO\_PL to olefile, added initial
34   - write support for streams >4K, updated doc and license, improved the
35   - setup script.
36   -- 2014-07-27 v0.31: fixed support for large files with 4K sectors,
37   - thanks to Niko Ehrenfeuchter, Martijn Berger and Dave Jones. Added
38   - test scripts from Pillow (by hugovk). Fixed setup for Python 3
39   - (Martin Panter)
40   -- 2014-02-04 v0.30: now compatible with Python 3.x, thanks to Martin
41   - Panter who did most of the hard work.
42   -- 2013-07-24 v0.26: added methods to parse stream/storage timestamps,
43   - improved listdir to include storages, fixed parsing of direntry
44   - timestamps
45   -- 2013-05-27 v0.25: improved metadata extraction, properties parsing
46   - and exception handling, fixed `issue
47   - #12 <https://bitbucket.org/decalage/olefileio_pl/issue/12/error-when-converting-timestamps-in-ole>`_
48   -- 2013-05-07 v0.24: new features to extract metadata (get\_metadata
49   - method and OleMetadata class), improved getproperties to convert
50   - timestamps to Python datetime
51   -- 2012-10-09: published
52   - `python-oletools <http://www.decalage.info/python/oletools>`_, a
53   - package of analysis tools based on OleFileIO\_PL
54   -- 2012-09-11 v0.23: added support for file-like objects, fixed `issue
55   - #8 <https://bitbucket.org/decalage/olefileio_pl/issue/8/bug-with-file-object>`_
56   -- 2012-02-17 v0.22: fixed issues #7 (bug in getproperties) and #2
57   - (added close method)
58   -- 2011-10-20: code hosted on bitbucket to ease contributions and bug
59   - tracking
60   -- 2010-01-24 v0.21: fixed support for big-endian CPUs, such as PowerPC
61   - Macs.
62   -- 2009-12-11 v0.20: small bugfix in OleFileIO.open when filename is not
63   - plain str.
64   -- 2009-12-10 v0.19: fixed support for 64 bits platforms (thanks to Ben
65   - G. and Martijn for reporting the bug)
66   -- see changelog in source code for more info.
67   -
68   -Download/Install
69   -----------------
70   -
71   -If you have pip or setuptools installed (pip is included in Python
72   -2.7.9+), you may simply run **pip install olefile** or **easy\_install
73   -olefile** for the first installation.
74   -
75   -To update olefile, run **pip install -U olefile**.
76   -
77   -Otherwise, see https://bitbucket.org/decalage/olefileio\_pl/wiki/Install
78   -
79   -Features
80   ---------
81   -
82   -- Parse, read and write any OLE file such as Microsoft Office 97-2003
83   - legacy document formats (Word .doc, Excel .xls, PowerPoint .ppt,
84   - Visio .vsd, Project .mpp), Image Composer and FlashPix files, Outlook
85   - messages, StickyNotes, Zeiss AxioVision ZVI files, Olympus FluoView
86   - OIB files, etc
87   -- List all the streams and storages contained in an OLE file
88   -- Open streams as files
89   -- Parse and read property streams, containing metadata of the file
90   -- Portable, pure Python module, no dependency
91   -
92   -olefile can be used as an independent package or with PIL/Pillow.
93   -
94   -olefile is mostly meant for developers. If you are looking for tools to
95   -analyze OLE files or to extract data (especially for security purposes
96   -such as malware analysis and forensics), then please also check my
97   -`python-oletools <http://www.decalage.info/python/oletools>`_, which are
98   -built upon olefile and provide a higher-level interface.
99   -
100   -History
101   --------
102   -
103   -olefile is based on the OleFileIO module from
104   -`PIL <http://www.pythonware.com/products/pil/index.htm>`_, the excellent
105   -Python Imaging Library, created and maintained by Fredrik Lundh. The
106   -olefile API is still compatible with PIL, but since 2005 I have improved
107   -the internal implementation significantly, with new features, bugfixes
108   -and a more robust design. From 2005 to 2014 the project was called
109   -OleFileIO\_PL, and in 2014 I changed its name to olefile to celebrate
110   -its 9 years and its new write features.
111   -
112   -As far as I know, olefile is the most complete and robust Python
113   -implementation to read MS OLE2 files, portable on several operating
114   -systems. (please tell me if you know other similar Python modules)
115   -
116   -Since 2014 olefile/OleFileIO\_PL has been integrated into
117   -`Pillow <http://python-imaging.github.io/>`_, the friendly fork of PIL.
118   -olefile will continue to be improved as a separate project, and new
119   -versions will be merged into Pillow regularly.
120   -
121   -Main improvements over the original version of OleFileIO in PIL:
122   -----------------------------------------------------------------
123   -
124   -- Compatible with Python 3.x and 2.6+
125   -- Many bug fixes
126   -- Support for files larger than 6.8MB
127   -- Support for 64 bits platforms and big-endian CPUs
128   -- Robust: many checks to detect malformed files
129   -- Runtime option to choose if malformed files should be parsed or raise
130   - exceptions
131   -- Improved API
132   -- Metadata extraction, stream/storage timestamps (e.g. for document
133   - forensics)
134   -- Can open file-like objects
135   -- Added setup.py and install.bat to ease installation
136   -- More convenient slash-based syntax for stream paths
137   -- Write features
138   -
139   -Documentation
140   --------------
141   -
142   -Please see the `online
143   -documentation <https://bitbucket.org/decalage/olefileio_pl/wiki>`_ for
144   -more information, especially the `OLE
145   -overview <https://bitbucket.org/decalage/olefileio_pl/wiki/OLE_Overview>`_
146   -and the `API
147   -page <https://bitbucket.org/decalage/olefileio_pl/wiki/API>`_ which
148   -describe how to use olefile in Python applications. A copy of the same
149   -documentation is also provided in the doc subfolder of the olefile
150   -package.
151   -
152   -Real-life examples
153   -------------------
154   -
155   -A real-life example: `using OleFileIO\_PL for malware analysis and
156   -forensics <http://blog.gregback.net/2011/03/using-remnux-for-forensic-puzzle-6/>`_.
157   -
158   -See also `this
159   -paper <https://computer-forensics.sans.org/community/papers/gcfa/grow-forensic-tools-taxonomy-python-libraries-helpful-forensic-analysis_6879>`_
160   -about python tools for forensics, which features olefile.
161   -
162   -License
163   --------
164   -
165   -olefile (formerly OleFileIO\_PL) is copyright (c) 2005-2015 Philippe
166   -Lagadec (`http://www.decalage.info <http://www.decalage.info>`_)
167   -
168   -All rights reserved.
169   -
170   -Redistribution and use in source and binary forms, with or without
171   -modification, are permitted provided that the following conditions are
172   -met:
173   -
174   -- Redistributions of source code must retain the above copyright
175   - notice, this list of conditions and the following disclaimer.
176   -- Redistributions in binary form must reproduce the above copyright
177   - notice, this list of conditions and the following disclaimer in the
178   - documentation and/or other materials provided with the distribution.
179   -
180   -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
181   -IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
182   -TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
183   -PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
184   -HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
185   -SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
186   -TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
187   -PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
188   -LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
189   -NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
190   -SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
191   -
192   ---------------
193   -
194   -olefile is based on source code from the OleFileIO module of the Python
195   -Imaging Library (PIL) published by Fredrik Lundh under the following
196   -license:
197   -
198   -The Python Imaging Library (PIL) is
199   -
200   -- Copyright (c) 1997-2005 by Secret Labs AB
201   -- Copyright (c) 1995-2005 by Fredrik Lundh
202   -
203   -By obtaining, using, and/or copying this software and/or its associated
204   -documentation, you agree that you have read, understood, and will comply
205   -with the following terms and conditions:
206   -
207   -Permission to use, copy, modify, and distribute this software and its
208   -associated documentation for any purpose and without fee is hereby
209   -granted, provided that the above copyright notice appears in all copies,
210   -and that both that copyright notice and this permission notice appear in
211   -supporting documentation, and that the name of Secret Labs AB or the
212   -author not be used in advertising or publicity pertaining to
213   -distribution of the software without specific, written prior permission.
214   -
215   -SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO
216   -THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
217   -FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR
218   -ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
219   -RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
220   -CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
221   -CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  1 +olefile (formerly OleFileIO\_PL)
  2 +================================
  3 +
  4 +`olefile <http://www.decalage.info/olefile>`__ is a Python package to
  5 +parse, read and write `Microsoft OLE2
  6 +files <http://en.wikipedia.org/wiki/Compound_File_Binary_Format>`__
  7 +(also called Structured Storage, Compound File Binary Format or Compound
  8 +Document File Format), such as Microsoft Office 97-2003 documents,
  9 +vbaProject.bin in MS Office 2007+ files, Image Composer and FlashPix
  10 +files, Outlook messages, StickyNotes, several Microscopy file formats,
  11 +McAfee antivirus quarantine files, etc.
  12 +
  13 +**Quick links:** `Home page <http://www.decalage.info/olefile>`__ -
  14 +`Download/Install <https://bitbucket.org/decalage/olefileio_pl/wiki/Install>`__
  15 +- `Documentation <https://bitbucket.org/decalage/olefileio_pl/wiki>`__ -
  16 +`Report
  17 +Issues/Suggestions/Questions <https://bitbucket.org/decalage/olefileio_pl/issues?status=new&status=open>`__
  18 +- `Contact the author <http://decalage.info/contact>`__ -
  19 +`Repository <https://bitbucket.org/decalage/olefileio_pl>`__ - `Updates
  20 +on Twitter <https://twitter.com/decalage2>`__
  21 +
  22 +News
  23 +----
  24 +
  25 +Follow all updates and news on Twitter: https://twitter.com/decalage2
  26 +
  27 +- **2016-02-02 v0.43**: fixed issues
  28 + `#26 <https://bitbucket.org/decalage/olefileio_pl/issues/26/variable-referenced-before-assignment>`__
  29 + and
  30 + `#27 <https://bitbucket.org/decalage/olefileio_pl/issues/27/incomplete-ole-stream-incorrect-ole-fat>`__,
  31 + better handling of malformed files, use python logging.
  32 +- 2015-01-25 v0.42: improved handling of special characters in
  33 + stream/storage names on Python 2.x (using UTF-8 instead of Latin-1),
  34 + fixed bug in listdir with empty storages.
  35 +- 2014-11-25 v0.41: OleFileIO.open and isOleFile now support OLE files
  36 + stored in byte strings, fixed installer for python 3, added support
  37 + for Jython (Niko Ehrenfeuchter)
  38 +- 2014-10-01 v0.40: renamed OleFileIO\_PL to olefile, added initial
  39 + write support for streams >4K, updated doc and license, improved the
  40 + setup script.
  41 +- 2014-07-27 v0.31: fixed support for large files with 4K sectors,
  42 + thanks to Niko Ehrenfeuchter, Martijn Berger and Dave Jones. Added
  43 + test scripts from Pillow (by hugovk). Fixed setup for Python 3
  44 + (Martin Panter)
  45 +- 2014-02-04 v0.30: now compatible with Python 3.x, thanks to Martin
  46 + Panter who did most of the hard work.
  47 +- 2013-07-24 v0.26: added methods to parse stream/storage timestamps,
  48 + improved listdir to include storages, fixed parsing of direntry
  49 + timestamps
  50 +- 2013-05-27 v0.25: improved metadata extraction, properties parsing
  51 + and exception handling, fixed `issue
  52 + #12 <https://bitbucket.org/decalage/olefileio_pl/issue/12/error-when-converting-timestamps-in-ole>`__
  53 +- 2013-05-07 v0.24: new features to extract metadata (get\_metadata
  54 + method and OleMetadata class), improved getproperties to convert
  55 + timestamps to Python datetime
  56 +- 2012-10-09: published
  57 + `python-oletools <http://www.decalage.info/python/oletools>`__, a
  58 + package of analysis tools based on OleFileIO\_PL
  59 +- 2012-09-11 v0.23: added support for file-like objects, fixed `issue
  60 + #8 <https://bitbucket.org/decalage/olefileio_pl/issue/8/bug-with-file-object>`__
  61 +- 2012-02-17 v0.22: fixed issues #7 (bug in getproperties) and #2
  62 + (added close method)
  63 +- 2011-10-20: code hosted on bitbucket to ease contributions and bug
  64 + tracking
  65 +- 2010-01-24 v0.21: fixed support for big-endian CPUs, such as PowerPC
  66 + Macs.
  67 +- 2009-12-11 v0.20: small bugfix in OleFileIO.open when filename is not
  68 + plain str.
  69 +- 2009-12-10 v0.19: fixed support for 64 bits platforms (thanks to Ben
  70 + G. and Martijn for reporting the bug)
  71 +- see changelog in source code for more info.
  72 +
  73 +Download/Install
  74 +----------------
  75 +
  76 +If you have pip or setuptools installed (pip is included in Python
  77 +2.7.9+), you may simply run **pip install olefile** or **easy\_install
  78 +olefile** for the first installation.
  79 +
  80 +To update olefile, run **pip install -U olefile**.
  81 +
  82 +Otherwise, see https://bitbucket.org/decalage/olefileio\_pl/wiki/Install
  83 +
  84 +Features
  85 +--------
  86 +
  87 +- Parse, read and write any OLE file such as Microsoft Office 97-2003
  88 + legacy document formats (Word .doc, Excel .xls, PowerPoint .ppt,
  89 + Visio .vsd, Project .mpp), Image Composer and FlashPix files, Outlook
  90 + messages, StickyNotes, Zeiss AxioVision ZVI files, Olympus FluoView
  91 + OIB files, etc
  92 +- List all the streams and storages contained in an OLE file
  93 +- Open streams as files
  94 +- Parse and read property streams, containing metadata of the file
  95 +- Portable, pure Python module, no dependency
  96 +
  97 +olefile can be used as an independent package or with PIL/Pillow.
  98 +
  99 +olefile is mostly meant for developers. If you are looking for tools to
  100 +analyze OLE files or to extract data (especially for security purposes
  101 +such as malware analysis and forensics), then please also check my
  102 +`python-oletools <http://www.decalage.info/python/oletools>`__, which
  103 +are built upon olefile and provide a higher-level interface.
  104 +
  105 +History
  106 +-------
  107 +
  108 +olefile is based on the OleFileIO module from
  109 +`PIL <http://www.pythonware.com/products/pil/index.htm>`__, the
  110 +excellent Python Imaging Library, created and maintained by Fredrik
  111 +Lundh. The olefile API is still compatible with PIL, but since 2005 I
  112 +have improved the internal implementation significantly, with new
  113 +features, bugfixes and a more robust design. From 2005 to 2014 the
  114 +project was called OleFileIO\_PL, and in 2014 I changed its name to
  115 +olefile to celebrate its 9 years and its new write features.
  116 +
  117 +As far as I know, olefile is the most complete and robust Python
  118 +implementation to read MS OLE2 files, portable on several operating
  119 +systems. (please tell me if you know other similar Python modules)
  120 +
  121 +Since 2014 olefile/OleFileIO\_PL has been integrated into
  122 +`Pillow <http://python-imaging.github.io/>`__, the friendly fork of PIL.
  123 +olefile will continue to be improved as a separate project, and new
  124 +versions will be merged into Pillow regularly.
  125 +
  126 +Main improvements over the original version of OleFileIO in PIL:
  127 +----------------------------------------------------------------
  128 +
  129 +- Compatible with Python 3.x and 2.6+
  130 +- Many bug fixes
  131 +- Support for files larger than 6.8MB
  132 +- Support for 64 bits platforms and big-endian CPUs
  133 +- Robust: many checks to detect malformed files
  134 +- Runtime option to choose if malformed files should be parsed or raise
  135 + exceptions
  136 +- Improved API
  137 +- Metadata extraction, stream/storage timestamps (e.g. for document
  138 + forensics)
  139 +- Can open file-like objects
  140 +- Added setup.py and install.bat to ease installation
  141 +- More convenient slash-based syntax for stream paths
  142 +- Write features
  143 +
  144 +Documentation
  145 +-------------
  146 +
  147 +Please see the `online
  148 +documentation <https://bitbucket.org/decalage/olefileio_pl/wiki>`__ for
  149 +more information, especially the `OLE
  150 +overview <https://bitbucket.org/decalage/olefileio_pl/wiki/OLE_Overview>`__
  151 +and the `API
  152 +page <https://bitbucket.org/decalage/olefileio_pl/wiki/API>`__ which
  153 +describe how to use olefile in Python applications. A copy of the same
  154 +documentation is also provided in the doc subfolder of the olefile
  155 +package.
  156 +
  157 +Real-life examples
  158 +------------------
  159 +
  160 +A real-life example: `using OleFileIO\_PL for malware analysis and
  161 +forensics <http://blog.gregback.net/2011/03/using-remnux-for-forensic-puzzle-6/>`__.
  162 +
  163 +See also `this
  164 +paper <https://computer-forensics.sans.org/community/papers/gcfa/grow-forensic-tools-taxonomy-python-libraries-helpful-forensic-analysis_6879>`__
  165 +about python tools for forensics, which features olefile.
  166 +
  167 +License
  168 +-------
  169 +
  170 +olefile (formerly OleFileIO\_PL) is copyright (c) 2005-2016 Philippe
  171 +Lagadec (http://www.decalage.info)
  172 +
  173 +All rights reserved.
  174 +
  175 +Redistribution and use in source and binary forms, with or without
  176 +modification, are permitted provided that the following conditions are
  177 +met:
  178 +
  179 +- Redistributions of source code must retain the above copyright
  180 + notice, this list of conditions and the following disclaimer.
  181 +- Redistributions in binary form must reproduce the above copyright
  182 + notice, this list of conditions and the following disclaimer in the
  183 + documentation and/or other materials provided with the distribution.
  184 +
  185 +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
  186 +IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
  187 +TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
  188 +PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
  189 +HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
  190 +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
  191 +TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
  192 +PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
  193 +LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
  194 +NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
  195 +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  196 +
  197 +--------------
  198 +
  199 +olefile is based on source code from the OleFileIO module of the Python
  200 +Imaging Library (PIL) published by Fredrik Lundh under the following
  201 +license:
  202 +
  203 +The Python Imaging Library (PIL) is
  204 +
  205 +- Copyright (c) 1997-2005 by Secret Labs AB
  206 +- Copyright (c) 1995-2005 by Fredrik Lundh
  207 +
  208 +By obtaining, using, and/or copying this software and/or its associated
  209 +documentation, you agree that you have read, understood, and will comply
  210 +with the following terms and conditions:
  211 +
  212 +Permission to use, copy, modify, and distribute this software and its
  213 +associated documentation for any purpose and without fee is hereby
  214 +granted, provided that the above copyright notice appears in all copies,
  215 +and that both that copyright notice and this permission notice appear in
  216 +supporting documentation, and that the name of Secret Labs AB or the
  217 +author not be used in advertising or publicity pertaining to
  218 +distribution of the software without specific, written prior permission.
  219 +
  220 +SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO
  221 +THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
  222 +FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR
  223 +ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
  224 +RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
  225 +CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
  226 +CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
... ...
oletools/thirdparty/olefile/olefile.py
1 1 #!/usr/bin/env python
2 2  
3   -# olefile (formerly OleFileIO_PL) version 0.43 2015-04-17
  3 +# olefile (formerly OleFileIO_PL)
4 4 #
5 5 # Module to read/write Microsoft OLE2 files (also called Structured Storage or
6 6 # Microsoft Compound Document File Format), such as Microsoft Office 97-2003
... ... @@ -9,7 +9,7 @@
9 9 #
10 10 # Project website: http://www.decalage.info/olefile
11 11 #
12   -# olefile is copyright (c) 2005-2015 Philippe Lagadec (http://www.decalage.info)
  12 +# olefile is copyright (c) 2005-2016 Philippe Lagadec (http://www.decalage.info)
13 13 #
14 14 # olefile is based on the OleFileIO module from the PIL library v1.1.6
15 15 # See: http://www.pythonware.com/products/pil/index.htm
... ... @@ -29,12 +29,12 @@ from __future__ import print_function # This version of olefile requires Pytho
29 29  
30 30  
31 31 __author__ = "Philippe Lagadec"
32   -__date__ = "2015-04-17"
33   -__version__ = '0.43'
  32 +__date__ = "2016-02-02"
  33 +__version__ = '0.44'
34 34  
35 35 #--- LICENSE ------------------------------------------------------------------
36 36  
37   -# olefile (formerly OleFileIO_PL) is copyright (c) 2005-2015 Philippe Lagadec
  37 +# olefile (formerly OleFileIO_PL) is copyright (c) 2005-2016 Philippe Lagadec
38 38 # (http://www.decalage.info)
39 39 #
40 40 # All rights reserved.
... ... @@ -182,6 +182,14 @@ __version__ = '0.43'
182 182 # - added path_encoding option to override the default
183 183 # - fixed a bug in _list when a storage is empty
184 184 # 2015-04-17 v0.43 PL: - slight changes in _OleDirectoryEntry
  185 +# 2015-10-19 - fixed issue #26 in OleFileIO.getproperties
  186 +# (using id and type as local variable names)
  187 +# 2015-10-29 - replaced debug() with proper logging
  188 +# - use optparse to handle command line options
  189 +# - improved attribute names in OleFileIO class
  190 +# 2015-11-05 - fixed issue #27 by correcting the MiniFAT sector
  191 +# cutoff size if invalid.
  192 +# 2016-02-02 - logging is disabled by default
185 193  
186 194 #-----------------------------------------------------------------------------
187 195 # TODO (for version 1.0):
... ... @@ -257,7 +265,7 @@ __version__ = '0.43'
257 265  
258 266 import io
259 267 import sys
260   -import struct, array, os.path, datetime
  268 +import struct, array, os.path, datetime, logging
261 269  
262 270 #=== COMPATIBILITY WORKAROUNDS ================================================
263 271  
... ... @@ -327,30 +335,46 @@ else:
327 335 DEFAULT_PATH_ENCODING = None
328 336  
329 337  
330   -#=== DEBUGGING ===============================================================
  338 +# === LOGGING =================================================================
331 339  
332   -#TODO: replace this by proper logging
333   -
334   -#[PL] DEBUG display mode: False by default, use set_debug_mode() or "-d" on
335   -# command line to change it.
336   -DEBUG_MODE = False
337   -def debug_print(msg):
338   - print(msg)
339   -def debug_pass(msg):
340   - pass
341   -debug = debug_pass
  340 +class NullHandler(logging.Handler):
  341 + """
  342 + Log Handler without output, to avoid printing messages if logging is not
  343 + configured by the main application.
  344 + Python 2.7 has logging.NullHandler, but this is necessary for 2.6:
  345 + see https://docs.python.org/2.6/library/logging.html#configuring-logging-for-a-library
  346 + """
  347 + def emit(self, record):
  348 + pass
342 349  
343   -def set_debug_mode(debug_mode):
  350 +def get_logger(name, level=logging.CRITICAL+1):
344 351 """
345   - Set debug mode on or off, to control display of debugging messages.
346   - :param mode: True or False
  352 + Create a suitable logger object for this module.
  353 + The goal is not to change settings of the root logger, to avoid getting
  354 + other modules' logs on the screen.
  355 + If a logger exists with same name, reuse it. (Else it would have duplicate
  356 + handlers and messages would be doubled.)
  357 + The level is set to CRITICAL+1 by default, to avoid any logging.
347 358 """
348   - global DEBUG_MODE, debug
349   - DEBUG_MODE = debug_mode
350   - if debug_mode:
351   - debug = debug_print
352   - else:
353   - debug = debug_pass
  359 + # First, test if there is already a logger with the same name, else it
  360 + # will generate duplicate messages (due to duplicate handlers):
  361 + if name in logging.Logger.manager.loggerDict:
  362 + #NOTE: another less intrusive but more "hackish" solution would be to
  363 + # use getLogger then test if its effective level is not default.
  364 + logger = logging.getLogger(name)
  365 + # make sure level is OK:
  366 + logger.setLevel(level)
  367 + return logger
  368 + # get a new logger:
  369 + logger = logging.getLogger(name)
  370 + # only add a NullHandler for this logger, it is up to the application
  371 + # to configure its own logging:
  372 + logger.addHandler(NullHandler())
  373 + logger.setLevel(level)
  374 + return logger
  375 +
  376 +# a global logger object used for debugging:
  377 +log = get_logger('olefile')
354 378  
355 379  
356 380 #=== CONSTANTS ===============================================================
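Note on the hunk above: with the logger level set to CRITICAL+1 and only a NullHandler attached, the library stays silent until the calling application opts in. A minimal sketch of that opt-in, using only the standard `logging` module; `enable_library_logging` is a hypothetical helper written for this note, not part of olefile, and the logger is recreated here so the snippet runs on its own:

```python
import io
import logging

# Stand-in for the library-side logger: silent unless the application opts in.
# The name 'olefile' mirrors the get_logger('olefile') call in the hunk above.
lib_log = logging.getLogger('olefile')
lib_log.addHandler(logging.NullHandler())   # available in Python 2.7+ and 3.x
lib_log.setLevel(logging.CRITICAL + 1)      # above every standard level


def enable_library_logging(stream, level=logging.DEBUG):
    """Hypothetical application-side helper: attach a handler, lower the level."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter('%(levelname)s %(name)s: %(message)s'))
    lib_log.addHandler(handler)
    lib_log.setLevel(level)
    return handler


buffer = io.StringIO()
lib_log.debug('dropped: level is still CRITICAL+1')
enable_library_logging(buffer)
lib_log.debug('File size: %d', 1536)
captured = buffer.getvalue()
```

Only the second message reaches `buffer`. The first is discarded by the level check before any handler is consulted.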
... ... @@ -518,7 +542,7 @@ def filetime2datetime(filetime):
518 542 # TODO: manage exception when microseconds is too large
519 543 # inspired from http://code.activestate.com/recipes/511425-filetime-to-datetime/
520 544 _FILETIME_null_date = datetime.datetime(1601, 1, 1, 0, 0, 0)
521   - #debug('timedelta days=%d' % (filetime//(10*1000000*3600*24)))
  545 + #log.debug('timedelta days=%d' % (filetime//(10*1000000*3600*24)))
522 546 return _FILETIME_null_date + datetime.timedelta(microseconds=filetime//10)
523 547  
524 548  
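Note on the hunk above: a FILETIME value counts 100-nanosecond intervals since 1601-01-01, which is why the context lines divide by 10 to get microseconds. A small worked example of the same arithmetic; the function and constant names are this note's own, chosen to avoid clashing with the module:

```python
import datetime

# Start of the FILETIME calendar, as in the context lines above.
FILETIME_EPOCH = datetime.datetime(1601, 1, 1, 0, 0, 0)


def filetime_to_datetime(filetime):
    """Convert a FILETIME (100 ns ticks since 1601-01-01) to a naive datetime."""
    return FILETIME_EPOCH + datetime.timedelta(microseconds=filetime // 10)


# 11644473600 seconds separate 1601-01-01 from 1970-01-01,
# so this tick count corresponds to the Unix epoch.
unix_epoch = filetime_to_datetime(116444736000000000)
```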
... ... @@ -695,6 +719,7 @@ class _OleStream(io.BytesIO):
695 719  
696 720 - size: actual size of data stream, after it was opened.
697 721 """
  722 + #TODO: use _raise_defect instead of exceptions
698 723  
699 724 # FIXME: should store the list of sects obtained by following
700 725 # the fat chain, and load new sectors on demand instead of
... ... @@ -713,8 +738,8 @@ class _OleStream(io.BytesIO):
713 738 :param filesize: size of OLE file (for debugging)
714 739 :returns: a BytesIO instance containing the OLE stream
715 740 """
716   - debug('_OleStream.__init__:')
717   - debug(' sect=%d (%X), size=%d, offset=%d, sectorsize=%d, len(fat)=%d, fp=%s'
  741 + log.debug('_OleStream.__init__:')
  742 + log.debug(' sect=%d (%X), size=%d, offset=%d, sectorsize=%d, len(fat)=%d, fp=%s'
718 743 %(sect,sect,size,offset,sectorsize,len(fat), repr(fp)))
719 744 #[PL] To detect malformed documents with FAT loops, we compute the
720 745 # expected number of sectors in the stream:
... ... @@ -726,9 +751,9 @@ class _OleStream(io.BytesIO):
726 751 size = len(fat)*sectorsize
727 752 # and we keep a record that size was unknown:
728 753 unknown_size = True
729   - debug(' stream with UNKNOWN SIZE')
  754 + log.debug(' stream with UNKNOWN SIZE')
730 755 nb_sectors = (size + (sectorsize-1)) // sectorsize
731   - debug('nb_sectors = %d' % nb_sectors)
  756 + log.debug('nb_sectors = %d' % nb_sectors)
732 757 # This number should (at least) be less than the total number of
733 758 # sectors in the given FAT:
734 759 if nb_sectors > len(fat):
... ... @@ -739,7 +764,7 @@ class _OleStream(io.BytesIO):
739 764 data = []
740 765 # if size is zero, then first sector index should be ENDOFCHAIN:
741 766 if size == 0 and sect != ENDOFCHAIN:
742   - debug('size == 0 and sect != ENDOFCHAIN:')
  767 + log.debug('size == 0 and sect != ENDOFCHAIN:')
743 768 raise IOError('incorrect OLE sector index for empty stream')
744 769 #[PL] A fixed-length for loop is used instead of an undefined while
745 770 # loop to avoid DoS attacks:
... ... @@ -750,24 +775,24 @@ class _OleStream(io.BytesIO):
750 775 break
751 776 else:
752 777 # else this means that the stream is smaller than declared:
753   - debug('sect=ENDOFCHAIN before expected size')
  778 + log.debug('sect=ENDOFCHAIN before expected size')
754 779 raise IOError('incomplete OLE stream')
755 780 # sector index should be within FAT:
756 781 if sect<0 or sect>=len(fat):
757   - debug('sect=%d (%X) / len(fat)=%d' % (sect, sect, len(fat)))
758   - debug('i=%d / nb_sectors=%d' %(i, nb_sectors))
  782 + log.debug('sect=%d (%X) / len(fat)=%d' % (sect, sect, len(fat)))
  783 + log.debug('i=%d / nb_sectors=%d' %(i, nb_sectors))
759 784 ## tmp_data = b"".join(data)
760 785 ## f = open('test_debug.bin', 'wb')
761 786 ## f.write(tmp_data)
762 787 ## f.close()
763   -## debug('data read so far: %d bytes' % len(tmp_data))
  788 +## log.debug('data read so far: %d bytes' % len(tmp_data))
764 789 raise IOError('incorrect OLE FAT, sector index out of range')
765 790 #TODO: merge this code with OleFileIO.getsect() ?
766 791 #TODO: check if this works with 4K sectors:
767 792 try:
768 793 fp.seek(offset + sectorsize * sect)
769 794 except:
770   - debug('sect=%d, seek=%d, filesize=%d' %
  795 + log.debug('sect=%d, seek=%d, filesize=%d' %
771 796 (sect, offset+sectorsize*sect, filesize))
772 797 raise IOError('OLE sector index out of range')
773 798 sector_data = fp.read(sectorsize)
... ... @@ -776,9 +801,9 @@ class _OleStream(io.BytesIO):
776 801 # complete sector (of 512 or 4K), so we may read less than
777 802 # sectorsize.
778 803 if len(sector_data)!=sectorsize and sect!=(len(fat)-1):
779   - debug('sect=%d / len(fat)=%d, seek=%d / filesize=%d, len read=%d' %
  804 + log.debug('sect=%d / len(fat)=%d, seek=%d / filesize=%d, len read=%d' %
780 805 (sect, len(fat), offset+sectorsize*sect, filesize, len(sector_data)))
781   - debug('seek+len(read)=%d' % (offset+sectorsize*sect+len(sector_data)))
  806 + log.debug('seek+len(read)=%d' % (offset+sectorsize*sect+len(sector_data)))
782 807 raise IOError('incomplete OLE sector')
783 808 data.append(sector_data)
784 809 # jump to next sector in the FAT:
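Note on the `_OleStream` hunks above: the expected sector count is a ceiling division of the stream size by the sector size, and the chain is walked with a fixed-length loop so that a FAT containing a cycle cannot keep the parser spinning. A simplified sketch of that idea follows. It only collects sector indexes, it does not read data or handle the unknown-size case, and `follow_chain` is a hypothetical name rather than olefile's API:

```python
ENDOFCHAIN = 0xFFFFFFFE  # end-of-chain marker in an OLE FAT


def follow_chain(fat, first_sector, size, sector_size=512):
    """Return the sector indexes of a stream of `size` bytes.

    Simplified illustration: raises IOError on truncated or out-of-range chains.
    """
    nb_sectors = (size + (sector_size - 1)) // sector_size  # ceiling division
    if nb_sectors > len(fat):
        raise IOError('stream larger than the FAT allows')
    chain = []
    sect = first_sector
    for _ in range(nb_sectors):  # fixed upper bound instead of an open while loop
        if sect == ENDOFCHAIN:
            raise IOError('incomplete OLE stream')
        if sect < 0 or sect >= len(fat):
            raise IOError('incorrect OLE FAT, sector index out of range')
        chain.append(sect)
        sect = fat[sect]  # jump to the next sector in the FAT
    return chain


good_chain = follow_chain([1, 2, ENDOFCHAIN], 0, 1536)

try:
    follow_chain([ENDOFCHAIN, ENDOFCHAIN], 0, 1024)
    truncated_error = None
except IOError as exc:
    truncated_error = str(exc)
```

Because the loop runs at most `nb_sectors` times, a cyclic FAT such as `[1, 0]` ends after a bounded number of steps instead of looping forever.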
... ... @@ -802,7 +827,8 @@ class _OleStream(io.BytesIO):
802 827 self.size = len(data)
803 828 else:
804 829 # read data is less than expected:
805   - debug('len(data)=%d, size=%d' % (len(data), size))
  830 + log.debug('len(data)=%d, size=%d' % (len(data), size))
  831 + # TODO: provide details in exception message
806 832 raise IOError('OLE stream size is less than declared')
807 833 # when all data is read in memory, BytesIO constructor is called
808 834 io.BytesIO.__init__(self, data)
... ... @@ -888,7 +914,7 @@ class _OleDirectoryEntry:
888 914 olefile._raise_defect(DEFECT_INCORRECT, 'duplicate OLE root entry')
889 915 if sid == 0 and self.entry_type != STGTY_ROOT:
890 916 olefile._raise_defect(DEFECT_INCORRECT, 'incorrect OLE root entry')
891   - #debug (struct.unpack(fmt_entry, entry[:len_entry]))
  917 + #log.debug(struct.unpack(fmt_entry, entry[:len_entry]))
892 918 # name should be at most 31 unicode characters + null character,
893 919 # so 64 bytes in total (31*2 + 2):
894 920 if self.namelength>64:
... ... @@ -903,10 +929,10 @@ class _OleDirectoryEntry:
903 929 # name is converted from UTF-16LE to the path encoding specified in the OleFileIO:
904 930 self.name = olefile._decode_utf16_str(self.name_utf16)
905 931  
906   - debug('DirEntry SID=%d: %s' % (self.sid, repr(self.name)))
907   - debug(' - type: %d' % self.entry_type)
908   - debug(' - sect: %d' % self.isectStart)
909   - debug(' - SID left: %d, right: %d, child: %d' % (self.sid_left,
  932 + log.debug('DirEntry SID=%d: %s' % (self.sid, repr(self.name)))
  933 + log.debug(' - type: %d' % self.entry_type)
  934 + log.debug(' - sect: %Xh' % self.isectStart)
  935 + log.debug(' - SID left: %d, right: %d, child: %d' % (self.sid_left,
910 936 self.sid_right, self.sid_child))
911 937  
912 938 # sizeHigh is only used for 4K sectors, it should be zero for 512 bytes
... ... @@ -914,13 +940,14 @@ class _OleDirectoryEntry:
914 940 # or some other value so it cannot be raised as a defect in general:
915 941 if olefile.sectorsize == 512:
916 942 if self.sizeHigh != 0 and self.sizeHigh != 0xFFFFFFFF:
917   - debug('sectorsize=%d, sizeLow=%d, sizeHigh=%d (%X)' %
  943 + log.debug('sectorsize=%d, sizeLow=%d, sizeHigh=%d (%X)' %
918 944 (olefile.sectorsize, self.sizeLow, self.sizeHigh, self.sizeHigh))
919 945 olefile._raise_defect(DEFECT_UNSURE, 'incorrect OLE stream size')
920 946 self.size = self.sizeLow
921 947 else:
922 948 self.size = self.sizeLow + (long(self.sizeHigh)<<32)
923   - debug(' - size: %d (sizeLow=%d, sizeHigh=%d)' % (self.size, self.sizeLow, self.sizeHigh))
  949 + log.debug(' - size: %d (sizeLow=%d, sizeHigh=%d)' % (self.size, self.sizeLow, self.sizeHigh))
  950 +
924 951 self.clsid = _clsid(clsid)
925 952 # a storage should have a null size, BUT some implementations such as
926 953 # Word 8 for Mac seem to allow non-null values => Potential defect:
... ... @@ -945,7 +972,7 @@ class _OleDirectoryEntry:
945 972 Note that this method builds a tree of all subentries, so it should
946 973 only be called for the root object once.
947 974 """
948   - debug('build_storage_tree: SID=%d - %s - sid_child=%d'
  975 + log.debug('build_storage_tree: SID=%d - %s - sid_child=%d'
949 976 % (self.sid, repr(self.name), self.sid_child))
950 977 if self.sid_child != NOSTREAM:
951 978 # if child SID is not NOSTREAM, then this entry is a storage.
... ... @@ -980,7 +1007,7 @@ class _OleDirectoryEntry:
980 1007 self.olefile._raise_defect(DEFECT_FATAL, 'OLE DirEntry index out of range')
981 1008 # get child direntry:
982 1009 child = self.olefile._load_direntry(child_sid) #direntries[child_sid]
983   - debug('append_kids: child_sid=%d - %s - sid_left=%d, sid_right=%d, sid_child=%d'
  1010 + log.debug('append_kids: child_sid=%d - %s - sid_left=%d, sid_right=%d, sid_child=%d'
984 1011 % (child.sid, repr(child.name), child.sid_left, child.sid_right, child.sid_child))
985 1012 # the directory entries are organized as a red-black tree.
986 1013 # (cf. Wikipedia for details)
... ... @@ -1121,14 +1148,13 @@ class OleFileIO:
1121 1148 :param write_mode: bool, if True the file is opened in read/write mode instead
1122 1149 of read-only by default.
1123 1150  
1124   - :param debug: bool, set debug mode
  1151 + :param debug: bool, set debug mode (deprecated, not used anymore)
1125 1152  
1126 1153 :param path_encoding: None or str, name of the codec to use for path
1127 1154 names (streams and storages), or None for Unicode.
1128 1155 Unicode by default on Python 3+, UTF-8 on Python 2.x.
1129 1156 (new in olefile 0.42, was hardcoded to Latin-1 until olefile v0.41)
1130 1157 """
1131   - set_debug_mode(debug)
1132 1158 # minimal level for defects to be raised as exceptions:
1133 1159 self._raise_defects_level = raise_defects
1134 1160 # list of defects/issues not raised as exceptions:
... ... @@ -1160,10 +1186,12 @@ class OleFileIO:
1160 1186 """
1161 1187 # added by [PL]
1162 1188 if defect_level >= self._raise_defects_level:
  1189 + log.error(message)
1163 1190 raise exception_type(message)
1164 1191 else:
1165 1192 # just record the issue, no exception raised:
1166 1193 self.parsing_issues.append((exception_type, message))
  1194 + log.warning(message)
1167 1195  
1168 1196  
1169 1197 def _decode_utf16_str(self, utf16_str, errors='replace'):
... ... @@ -1235,6 +1263,7 @@ class OleFileIO:
1235 1263 finally:
1236 1264 self.fp.seek(0)
1237 1265 self._filesize = filesize
  1266 + log.debug('File size: %d' % self._filesize)
1238 1267  
1239 1268 # lists of streams in FAT and MiniFAT, to detect duplicate references
1240 1269 # (list of indexes of first sectors of each stream)
... ... @@ -1244,6 +1273,7 @@ class OleFileIO:
1244 1273 header = self.fp.read(512)
1245 1274  
1246 1275 if len(header) != 512 or header[:8] != MAGIC:
  1276 + log.debug('Magic = %r instead of %r' % (header[:8], MAGIC))
1247 1277 self._raise_defect(DEFECT_FATAL, "not an OLE2 structured storage file")
1248 1278  
1249 1279 # [PL] header structure according to AAF specifications:
... ... @@ -1285,120 +1315,125 @@ class OleFileIO:
1285 1315 # '<' indicates little-endian byte ordering for Intel (cf. struct module help)
1286 1316 fmt_header = '<8s16sHHHHHHLLLLLLLLLL'
1287 1317 header_size = struct.calcsize(fmt_header)
1288   - debug( "fmt_header size = %d, +FAT = %d" % (header_size, header_size + 109*4) )
  1318 + log.debug( "fmt_header size = %d, +FAT = %d" % (header_size, header_size + 109*4) )
1289 1319 header1 = header[:header_size]
1290 1320 (
1291   - self.Sig,
1292   - self.clsid,
1293   - self.MinorVersion,
1294   - self.DllVersion,
1295   - self.ByteOrder,
1296   - self.SectorShift,
1297   - self.MiniSectorShift,
1298   - self.Reserved, self.Reserved1,
1299   - self.csectDir,
1300   - self.csectFat,
1301   - self.sectDirStart,
1302   - self.signature,
1303   - self.MiniSectorCutoff,
1304   - self.MiniFatStart,
1305   - self.csectMiniFat,
1306   - self.sectDifStart,
1307   - self.csectDif
  1321 + self.header_signature,
  1322 + self.header_clsid,
  1323 + self.minor_version,
  1324 + self.dll_version,
  1325 + self.byte_order,
  1326 + self.sector_shift,
  1327 + self.mini_sector_shift,
  1328 + self.reserved1,
  1329 + self.reserved2,
  1330 + self.num_dir_sectors,
  1331 + self.num_fat_sectors,
  1332 + self.first_dir_sector,
  1333 + self.transaction_signature_number,
  1334 + self.mini_stream_cutoff_size,
  1335 + self.first_mini_fat_sector,
  1336 + self.num_mini_fat_sectors,
  1337 + self.first_difat_sector,
  1338 + self.num_difat_sectors
1308 1339 ) = struct.unpack(fmt_header, header1)
1309   - debug( struct.unpack(fmt_header, header1))
  1340 + log.debug( struct.unpack(fmt_header, header1))
1310 1341  
1311   - if self.Sig != MAGIC:
  1342 + if self.header_signature != MAGIC:
1312 1343 # OLE signature should always be present
1313 1344 self._raise_defect(DEFECT_FATAL, "incorrect OLE signature")
1314   - if self.clsid != bytearray(16):
  1345 + if self.header_clsid != bytearray(16):
1315 1346 # according to AAF specs, CLSID should always be zero
1316 1347 self._raise_defect(DEFECT_INCORRECT, "incorrect CLSID in OLE header")
1317   - debug( "MinorVersion = %d" % self.MinorVersion )
1318   - debug( "DllVersion = %d" % self.DllVersion )
1319   - if self.DllVersion not in [3, 4]:
  1348 + log.debug( "Minor Version = %d" % self.minor_version )
  1349 + log.debug( "DLL Version = %d (expected: 3 or 4)" % self.dll_version )
  1350 + if self.dll_version not in [3, 4]:
1320 1351 # version 3: usual format, 512 bytes per sector
1321 1352 # version 4: large format, 4K per sector
1322 1353 self._raise_defect(DEFECT_INCORRECT, "incorrect DllVersion in OLE header")
1323   - debug( "ByteOrder = %X" % self.ByteOrder )
1324   - if self.ByteOrder != 0xFFFE:
  1354 + log.debug( "Byte Order = %X (expected: FFFE)" % self.byte_order )
  1355 + if self.byte_order != 0xFFFE:
1325 1356 # For now only common little-endian documents are handled correctly
1326 1357 self._raise_defect(DEFECT_FATAL, "incorrect ByteOrder in OLE header")
1327 1358 # TODO: add big-endian support for documents created on Mac ?
1328 1359 # But according to [MS-CFB] v20140502, ByteOrder MUST be 0xFFFE.
1329   - self.SectorSize = 2**self.SectorShift
1330   - debug( "SectorSize = %d" % self.SectorSize )
1331   - if self.SectorSize not in [512, 4096]:
1332   - self._raise_defect(DEFECT_INCORRECT, "incorrect SectorSize in OLE header")
1333   - if (self.DllVersion==3 and self.SectorSize!=512) \
1334   - or (self.DllVersion==4 and self.SectorSize!=4096):
1335   - self._raise_defect(DEFECT_INCORRECT, "SectorSize does not match DllVersion in OLE header")
1336   - self.MiniSectorSize = 2**self.MiniSectorShift
1337   - debug( "MiniSectorSize = %d" % self.MiniSectorSize )
1338   - if self.MiniSectorSize not in [64]:
1339   - self._raise_defect(DEFECT_INCORRECT, "incorrect MiniSectorSize in OLE header")
1340   - if self.Reserved != 0 or self.Reserved1 != 0:
  1360 + self.sector_size = 2**self.sector_shift
  1361 + log.debug( "Sector Size = %d bytes (expected: 512 or 4096)" % self.sector_size )
  1362 + if self.sector_size not in [512, 4096]:
  1363 + self._raise_defect(DEFECT_INCORRECT, "incorrect sector_size in OLE header")
  1364 + if (self.dll_version==3 and self.sector_size!=512) \
  1365 + or (self.dll_version==4 and self.sector_size!=4096):
  1366 + self._raise_defect(DEFECT_INCORRECT, "sector_size does not match DllVersion in OLE header")
  1367 + self.mini_sector_size = 2**self.mini_sector_shift
  1368 + log.debug( "MiniFAT Sector Size = %d bytes (expected: 64)" % self.mini_sector_size )
  1369 + if self.mini_sector_size not in [64]:
  1370 + self._raise_defect(DEFECT_INCORRECT, "incorrect mini_sector_size in OLE header")
  1371 + if self.reserved1 != 0 or self.reserved2 != 0:
1341 1372 self._raise_defect(DEFECT_INCORRECT, "incorrect OLE header (non-null reserved bytes)")
1342   - debug( "csectDir = %d" % self.csectDir )
  1373 + log.debug( "Number of directory sectors = %d" % self.num_dir_sectors )
1343 1374 # Number of directory sectors (only allowed if DllVersion != 3)
1344   - if self.SectorSize==512 and self.csectDir!=0:
1345   - self._raise_defect(DEFECT_INCORRECT, "incorrect csectDir in OLE header")
1346   - debug( "csectFat = %d" % self.csectFat )
1347   - # csectFat = number of FAT sectors in the file
1348   - debug( "sectDirStart = %X" % self.sectDirStart )
1349   - # sectDirStart = 1st sector containing the directory
1350   - debug( "signature = %d" % self.signature )
  1375 + if self.sector_size==512 and self.num_dir_sectors!=0:
  1376 + self._raise_defect(DEFECT_INCORRECT, "incorrect number of directory sectors in OLE header")
  1377 + log.debug( "num_fat_sectors = %d" % self.num_fat_sectors )
  1378 + # num_fat_sectors = number of FAT sectors in the file
  1379 + log.debug( "first_dir_sector = %X" % self.first_dir_sector )
  1380 + # first_dir_sector = 1st sector containing the directory
  1381 + log.debug( "transaction_signature_number = %d" % self.transaction_signature_number )
1351 1382 # Signature should be zero, BUT some implementations do not follow this
1352 1383 # rule => only a potential defect:
1353 1384 # (according to MS-CFB, may be != 0 for applications supporting file
1354 1385 # transactions)
1355   - if self.signature != 0:
1356   - self._raise_defect(DEFECT_POTENTIAL, "incorrect OLE header (signature>0)")
1357   - debug( "MiniSectorCutoff = %d" % self.MiniSectorCutoff )
  1386 + if self.transaction_signature_number != 0:
  1387 + self._raise_defect(DEFECT_POTENTIAL, "incorrect OLE header (transaction_signature_number>0)")
  1388 + log.debug( "mini_stream_cutoff_size = 0x%X (expected: 0x1000)" % self.mini_stream_cutoff_size )
1358 1389 # MS-CFB: This integer field MUST be set to 0x00001000. This field
1359 1390 # specifies the maximum size of a user-defined data stream allocated
1360 1391 # from the mini FAT and mini stream, and that cutoff is 4096 bytes.
1361 1392 # Any user-defined data stream larger than or equal to this cutoff size
1362 1393 # must be allocated as normal sectors from the FAT.
1363   - if self.MiniSectorCutoff != 0x1000:
1364   - self._raise_defect(DEFECT_INCORRECT, "incorrect MiniSectorCutoff in OLE header")
1365   - debug( "MiniFatStart = %X" % self.MiniFatStart )
1366   - debug( "csectMiniFat = %d" % self.csectMiniFat )
1367   - debug( "sectDifStart = %X" % self.sectDifStart )
1368   - debug( "csectDif = %d" % self.csectDif )
  1394 + if self.mini_stream_cutoff_size != 0x1000:
  1395 + self._raise_defect(DEFECT_INCORRECT, "incorrect mini_stream_cutoff_size in OLE header")
  1396 + # if no exception is raised, the cutoff size is fixed to 0x1000
  1397 + log.warning('Fixing the mini_stream_cutoff_size to 4096 (mandatory value) instead of %d' %
  1398 + self.mini_stream_cutoff_size)
  1399 + self.mini_stream_cutoff_size = 0x1000
  1400 + log.debug( "first_mini_fat_sector = %Xh" % self.first_mini_fat_sector )
  1401 + log.debug( "num_mini_fat_sectors = %d" % self.num_mini_fat_sectors )
  1402 + log.debug( "first_difat_sector = %Xh" % self.first_difat_sector )
  1403 + log.debug( "num_difat_sectors = %d" % self.num_difat_sectors )
1369 1404  
1370 1405 # calculate the number of sectors in the file
1371 1406 # (-1 because header doesn't count)
1372   - self.nb_sect = ( (filesize + self.SectorSize-1) // self.SectorSize) - 1
1373   - debug( "Number of sectors in the file: %d" % self.nb_sect )
  1407 + self.nb_sect = ( (filesize + self.sector_size-1) // self.sector_size) - 1
  1408 + log.debug( "Number of sectors in the file: %d" % self.nb_sect )
1374 1409 #TODO: change this test, because an OLE file MAY contain other data
1375 1410 # after the last sector.
1376 1411  
1377 1412 # file clsid
1378   - self.clsid = _clsid(header[8:24])
  1413 + self.header_clsid = _clsid(header[8:24])
1379 1414  
1380 1415 #TODO: remove redundant attributes, and fix the code which uses them?
1381   - self.sectorsize = self.SectorSize #1 << i16(header, 30)
1382   - self.minisectorsize = self.MiniSectorSize #1 << i16(header, 32)
1383   - self.minisectorcutoff = self.MiniSectorCutoff # i32(header, 56)
  1416 + self.sectorsize = self.sector_size #1 << i16(header, 30)
  1417 + self.minisectorsize = self.mini_sector_size #1 << i16(header, 32)
  1418 + self.minisectorcutoff = self.mini_stream_cutoff_size # i32(header, 56)
1384 1419  
1385 1420 # check known streams for duplicate references (these are always in FAT,
1386 1421 # never in MiniFAT):
1387   - self._check_duplicate_stream(self.sectDirStart)
  1422 + self._check_duplicate_stream(self.first_dir_sector)
1388 1423 # check MiniFAT only if it is not empty:
1389   - if self.csectMiniFat:
1390   - self._check_duplicate_stream(self.MiniFatStart)
  1424 + if self.num_mini_fat_sectors:
  1425 + self._check_duplicate_stream(self.first_mini_fat_sector)
1391 1426 # check DIFAT only if it is not empty:
1392   - if self.csectDif:
1393   - self._check_duplicate_stream(self.sectDifStart)
  1427 + if self.num_difat_sectors:
  1428 + self._check_duplicate_stream(self.first_difat_sector)
1394 1429  
1395 1430 # Load file allocation tables
1396 1431 self.loadfat(header)
1397 1432 # Load directory. This sets both the direntries list (ordered by sid)
1398 1433 # and the root (ordered by hierarchy) members.
1399   - self.loaddirectory(self.sectDirStart)#i32(header, 48))
  1434 + self.loaddirectory(self.first_dir_sector)#i32(header, 48))
1400 1435 self.ministream = None
1401   - self.minifatsect = self.MiniFatStart #i32(header, 60)
  1436 + self.minifatsect = self.first_mini_fat_sector #i32(header, 60)
1402 1437  
1403 1438  
1404 1439 def close(self):
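Note on the hunk above: the renamed attributes map one-to-one onto the 18 fields of `fmt_header`. The following is a minimal standalone sketch of that unpacking, not olefile's own code. The format string, field names, and expected values (DLL version 3, byte order 0xFFFE, sector shift 9, cutoff 0x1000) come from the diff; the `OleHeader` namedtuple, the `parse_header` helper, and the synthetic header bytes are hypothetical and exist only for illustration.

```python
import struct
from collections import namedtuple

# OLE2 magic bytes and the header layout used in the diff ('<' = little-endian).
MAGIC = b'\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1'
FMT_HEADER = '<8s16sHHHHHHLLLLLLLLLL'

# Hypothetical container; field names follow the renamed attributes in the diff.
OleHeader = namedtuple('OleHeader', [
    'header_signature', 'header_clsid', 'minor_version', 'dll_version',
    'byte_order', 'sector_shift', 'mini_sector_shift', 'reserved1',
    'reserved2', 'num_dir_sectors', 'num_fat_sectors', 'first_dir_sector',
    'transaction_signature_number', 'mini_stream_cutoff_size',
    'first_mini_fat_sector', 'num_mini_fat_sectors', 'first_difat_sector',
    'num_difat_sectors',
])


def parse_header(data):
    """Unpack the fixed part of an OLE2 header (hypothetical helper)."""
    size = struct.calcsize(FMT_HEADER)
    if len(data) < size:
        raise ValueError('header too short')
    hdr = OleHeader._make(struct.unpack(FMT_HEADER, data[:size]))
    if hdr.header_signature != MAGIC:
        raise ValueError('not an OLE2 structured storage file')
    return hdr


# Synthetic header for a version-3 file (assumed values, for illustration only).
raw = struct.pack(
    FMT_HEADER, MAGIC, b'\x00' * 16,
    0x3E, 3, 0xFFFE, 9, 6, 0,             # six 16-bit fields
    0, 0, 1, 0, 0, 0x1000,                # reserved2 .. mini_stream_cutoff_size
    0xFFFFFFFE, 0, 0xFFFFFFFE, 0,         # MiniFAT and DIFAT fields
)
header = parse_header(raw)
sector_size = 2 ** header.sector_shift
mini_sector_size = 2 ** header.mini_sector_shift
```

The fixed part of the header is 76 bytes, which is why `loadfat` later reads the first 109 FAT sector indexes from `header[76:512]`.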
... ... @@ -1418,10 +1453,10 @@ class OleFileIO:
1418 1453 :param minifat: bool, if True, stream is located in the MiniFAT, else in the FAT
1419 1454 """
1420 1455 if minifat:
1421   - debug('_check_duplicate_stream: sect=%d in MiniFAT' % first_sect)
  1456 + log.debug('_check_duplicate_stream: sect=%Xh in MiniFAT' % first_sect)
1422 1457 used_streams = self._used_streams_minifat
1423 1458 else:
1424   - debug('_check_duplicate_stream: sect=%d in FAT' % first_sect)
  1459 + log.debug('_check_duplicate_stream: sect=%Xh in FAT' % first_sect)
1425 1460 # some values can be safely ignored (not a real stream):
1426 1461 if first_sect in (DIFSECT,FATSECT,ENDOFCHAIN,FREESECT):
1427 1462 return
... ... @@ -1435,10 +1470,9 @@ class OleFileIO:
1435 1470  
1436 1471  
1437 1472 def dumpfat(self, fat, firstindex=0):
1438   - "Displays a part of FAT in human-readable form for debugging purpose"
1439   - # [PL] added only for debug
1440   - if not DEBUG_MODE:
1441   - return
  1473 + """
  1474 + Display a part of FAT in human-readable form for debugging purposes
  1475 + """
1442 1476 # dictionary to convert special FAT values in human-readable strings
1443 1477 VPL = 8 # values per line (8+1 * 8+1 = 81)
1444 1478 fatnames = {
... ... @@ -1455,7 +1489,7 @@ class OleFileIO:
1455 1489 print()
1456 1490 for l in range(nlines):
1457 1491 index = l*VPL
1458   - print("%8X:" % (firstindex+index), end=" ")
  1492 + print("%6X:" % (firstindex+index), end=" ")
1459 1493 for i in range(index, index+VPL):
1460 1494 if i>=nbsect:
1461 1495 break
... ... @@ -1473,9 +1507,9 @@ class OleFileIO:
1473 1507  
1474 1508  
1475 1509 def dumpsect(self, sector, firstindex=0):
1476   - "Displays a sector in a human-readable form, for debugging purpose."
1477   - if not DEBUG_MODE:
1478   - return
  1510 + """
  1511 + Display a sector in a human-readable form, for debugging purposes
  1512 + """
1479 1513 VPL=8 # number of values per line (8+1 * 8+1 = 81)
1480 1514 tab = array.array(UINT32, sector)
1481 1515 if sys.byteorder == 'big':
... ... @@ -1488,7 +1522,7 @@ class OleFileIO:
1488 1522 print()
1489 1523 for l in range(nlines):
1490 1524 index = l*VPL
1491   - print("%8X:" % (firstindex+index), end=" ")
  1525 + print("%6X:" % (firstindex+index), end=" ")
1492 1526 for i in range(index, index+VPL):
1493 1527 if i>=nbsect:
1494 1528 break
... ... @@ -1523,14 +1557,18 @@ class OleFileIO:
1523 1557 else:
1524 1558 # if it's a raw sector, it is parsed in an array
1525 1559 fat1 = self.sect2array(sect)
1526   - self.dumpsect(sect)
  1560 + # Display the sector contents only if the logging level is debug:
  1561 + if log.isEnabledFor(logging.DEBUG):
  1562 + self.dumpsect(sect)
1527 1563 # The FAT is a sector chain starting at the first index of itself.
  1564 + # initialize isect, just in case:
  1565 + isect = None
1528 1566 for isect in fat1:
1529 1567 isect = isect & 0xFFFFFFFF # JYTHON-WORKAROUND
1530   - debug("isect = %X" % isect)
  1568 + log.debug("isect = %X" % isect)
1531 1569 if isect == ENDOFCHAIN or isect == FREESECT:
1532 1570 # the end of the sector chain has been reached
1533   - debug("found end of sector chain")
  1571 + log.debug("found end of sector chain")
1534 1572 break
1535 1573 # read the FAT sector
1536 1574 s = self.getsect(isect)
... ... @@ -1551,7 +1589,7 @@ class OleFileIO:
1551 1589 # Additional sectors are described by DIF blocks
1552 1590  
1553 1591 sect = header[76:512]
1554   - debug( "len(sect)=%d, so %d integers" % (len(sect), len(sect)//4) )
  1592 + log.debug( "len(sect)=%d, so %d integers" % (len(sect), len(sect)//4) )
1555 1593 #fat = []
1556 1594 # [PL] FAT is an array of 32 bits unsigned ints, it's more effective
1557 1595 # to use an array than a list in Python.
... ... @@ -1567,53 +1605,57 @@ class OleFileIO:
1567 1605 ## s = self.getsect(ix)
1568 1606 ## #fat = fat + [i32(s, i) for i in range(0, len(s), 4)]
1569 1607 ## fat = fat + array.array(UINT32, s)
1570   - if self.csectDif != 0:
  1608 + if self.num_difat_sectors != 0:
1571 1609 # [PL] There's a DIFAT because file is larger than 6.8MB
1572 1610 # some checks just in case:
1573   - if self.csectFat <= 109:
  1611 + if self.num_fat_sectors <= 109:
1574 1612 # there must be at least 109 blocks in header and the rest in
1575 1613 # DIFAT, so number of sectors must be >109.
1576 1614 self._raise_defect(DEFECT_INCORRECT, 'incorrect DIFAT, not enough sectors')
1577   - if self.sectDifStart >= self.nb_sect:
  1615 + if self.first_difat_sector >= self.nb_sect:
1578 1616 # initial DIFAT block index must be valid
1579 1617 self._raise_defect(DEFECT_FATAL, 'incorrect DIFAT, first index out of range')
1580   - debug( "DIFAT analysis..." )
  1618 + log.debug( "DIFAT analysis..." )
1581 1619 # We compute the necessary number of DIFAT sectors :
1582 1620 # Number of pointers per DIFAT sector = (sectorsize/4)-1
1583 1621 # (-1 because the last pointer is the next DIFAT sector number)
1584 1622 nb_difat_sectors = (self.sectorsize//4)-1
1585 1623 # (if 512 bytes: each DIFAT sector = 127 pointers + 1 towards next DIFAT sector)
1586   - nb_difat = (self.csectFat-109 + nb_difat_sectors-1)//nb_difat_sectors
1587   - debug( "nb_difat = %d" % nb_difat )
1588   - if self.csectDif != nb_difat:
  1624 + nb_difat = (self.num_fat_sectors-109 + nb_difat_sectors-1)//nb_difat_sectors
  1625 + log.debug( "nb_difat = %d" % nb_difat )
  1626 + if self.num_difat_sectors != nb_difat:
1589 1627 raise IOError('incorrect DIFAT')
1590   - isect_difat = self.sectDifStart
  1628 + isect_difat = self.first_difat_sector
1591 1629 for i in iterrange(nb_difat):
1592   - debug( "DIFAT block %d, sector %X" % (i, isect_difat) )
  1630 + log.debug( "DIFAT block %d, sector %X" % (i, isect_difat) )
1593 1631 #TODO: check if corresponding FAT SID = DIFSECT
1594 1632 sector_difat = self.getsect(isect_difat)
1595 1633 difat = self.sect2array(sector_difat)
1596   - self.dumpsect(sector_difat)
  1634 + # Display the sector contents only if the logging level is debug:
  1635 + if log.isEnabledFor(logging.DEBUG):
  1636 + self.dumpsect(sector_difat)
1597 1637 self.loadfat_sect(difat[:nb_difat_sectors])
1598 1638 # last DIFAT pointer is next DIFAT sector:
1599 1639 isect_difat = difat[nb_difat_sectors]
1600   - debug( "next DIFAT sector: %X" % isect_difat )
  1640 + log.debug( "next DIFAT sector: %X" % isect_difat )
1601 1641 # checks:
1602 1642 if isect_difat not in [ENDOFCHAIN, FREESECT]:
1603 1643 # last DIFAT pointer value must be ENDOFCHAIN or FREESECT
1604 1644 raise IOError('incorrect end of DIFAT')
1605   -## if len(self.fat) != self.csectFat:
1606   -## # FAT should contain csectFat blocks
1607   -## print("FAT length: %d instead of %d" % (len(self.fat), self.csectFat))
  1645 +## if len(self.fat) != self.num_fat_sectors:
  1646 +## # FAT should contain num_fat_sectors blocks
  1647 +## print("FAT length: %d instead of %d" % (len(self.fat), self.num_fat_sectors))
1608 1648 ## raise IOError('incorrect DIFAT')
1609 1649 # since FAT is read from fixed-size sectors, it may contain more values
1610 1650 # than the actual number of sectors in the file.
1611 1651 # Keep only the relevant sector indexes:
1612 1652 if len(self.fat) > self.nb_sect:
1613   - debug('len(fat)=%d, shrunk to nb_sect=%d' % (len(self.fat), self.nb_sect))
  1653 + log.debug('len(fat)=%d, shrunk to nb_sect=%d' % (len(self.fat), self.nb_sect))
1614 1654 self.fat = self.fat[:self.nb_sect]
1615   - debug('\nFAT:')
1616   - self.dumpfat(self.fat)
  1655 + # Display the FAT contents only if the logging level is debug:
  1656 + if log.isEnabledFor(logging.DEBUG):
  1657 + log.debug('\nFAT:')
  1658 + self.dumpfat(self.fat)
1617 1659  
1618 1660  
1619 1661 def loadminifat(self):
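Note on the hunk above: the sector-count and DIFAT-count formulas are both ceiling divisions. The sketch below restates them as free functions, assuming the same arithmetic as the diff. The function names `count_sectors` and `count_difat_sectors` are hypothetical and do not exist in olefile.

```python
HEADER_FAT_ENTRIES = 109  # FAT sector indexes stored directly in the header


def count_sectors(filesize, sector_size):
    """Number of sectors in the file; the header sector is not counted."""
    return ((filesize + sector_size - 1) // sector_size) - 1


def count_difat_sectors(num_fat_sectors, sector_size):
    """Number of DIFAT sectors needed for the FAT sectors beyond the first 109."""
    if num_fat_sectors <= HEADER_FAT_ENTRIES:
        return 0
    # Each DIFAT sector holds (sector_size / 4) 32-bit pointers; the last one
    # points to the next DIFAT sector, so it cannot reference a FAT sector.
    pointers_per_sector = (sector_size // 4) - 1
    extra = num_fat_sectors - HEADER_FAT_ENTRIES
    return (extra + pointers_per_sector - 1) // pointers_per_sector


sectors_in_1536_bytes = count_sectors(1536, 512)
difat_for_236_fat_sectors = count_difat_sectors(236, 512)
difat_for_237_fat_sectors = count_difat_sectors(237, 512)
```

With 512-byte sectors each DIFAT sector contributes 127 usable pointers, so 109 + 127 = 236 FAT sectors still fit in a single DIFAT sector and the 237th requires a second one.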
... ... @@ -1626,15 +1668,15 @@ class OleFileIO:
1626 1668 # 1) Stream size is calculated according to the number of sectors
1627 1669 # declared in the OLE header. This allocated stream may be more than
1628 1670 # needed to store the actual sector indexes.
1629   - # (self.csectMiniFat is the number of sectors of size self.SectorSize)
1630   - stream_size = self.csectMiniFat * self.SectorSize
  1671 + # (self.num_mini_fat_sectors is the number of sectors of size self.sector_size)
  1672 + stream_size = self.num_mini_fat_sectors * self.sector_size
1631 1673 # 2) Actually used size is calculated by dividing the MiniStream size
1632 1674 # (given by root entry size) by the size of mini sectors, *4 for
1633 1675 # 32 bits indexes:
1634   - nb_minisectors = (self.root.size + self.MiniSectorSize-1) // self.MiniSectorSize
  1676 + nb_minisectors = (self.root.size + self.mini_sector_size-1) // self.mini_sector_size
1635 1677 used_size = nb_minisectors * 4
1636   - debug('loadminifat(): minifatsect=%d, nb FAT sectors=%d, used_size=%d, stream_size=%d, nb MiniSectors=%d' %
1637   - (self.minifatsect, self.csectMiniFat, used_size, stream_size, nb_minisectors))
  1678 + log.debug('loadminifat(): minifatsect=%d, nb FAT sectors=%d, used_size=%d, stream_size=%d, nb MiniSectors=%d' %
  1679 + (self.minifatsect, self.num_mini_fat_sectors, used_size, stream_size, nb_minisectors))
1638 1680 if used_size > stream_size:
1639 1681 # This is not really a problem, but may indicate a wrong implementation:
1640 1682 self._raise_defect(DEFECT_INCORRECT, 'OLE MiniStream is larger than MiniFAT')
... ... @@ -1644,11 +1686,13 @@ class OleFileIO:
1644 1686 #self.minifat = [i32(s, i) for i in range(0, len(s), 4)]
1645 1687 self.minifat = self.sect2array(s)
1646 1688 # Then shrink the array to used size, to avoid indexes out of MiniStream:
1647   - debug('MiniFAT shrunk from %d to %d sectors' % (len(self.minifat), nb_minisectors))
  1689 + log.debug('MiniFAT shrunk from %d to %d sectors' % (len(self.minifat), nb_minisectors))
1648 1690 self.minifat = self.minifat[:nb_minisectors]
1649   - debug('loadminifat(): len=%d' % len(self.minifat))
1650   - debug('\nMiniFAT:')
1651   - self.dumpfat(self.minifat)
  1691 + log.debug('loadminifat(): len=%d' % len(self.minifat))
  1692 + # Display the FAT contents only if the logging level is debug:
  1693 + if log.isEnabledFor(logging.DEBUG):
  1694 + log.debug('\nMiniFAT:')
  1695 + self.dumpfat(self.minifat)
1652 1696  
1653 1697 def getsect(self, sect):
1654 1698 """
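Note on the hunks above: the `DEBUG_MODE` early returns inside `dumpfat`/`dumpsect` are replaced by a guard at the call site, `log.isEnabledFor(logging.DEBUG)`. A minimal sketch of that pattern follows. The logger name, `dump_table`, and `load_table` are hypothetical stand-ins, and the dump writes to a list instead of printing so the effect can be observed.

```python
import logging

# Library-style logger: silent unless the application configures a handler.
log = logging.getLogger("dump_guard_sketch")
log.addHandler(logging.NullHandler())

dump_calls = []  # records each dump so the guard's effect is visible


def dump_table(table):
    """Stand-in for an expensive dump such as dumpfat() or dumpsect()."""
    dump_calls.append(" ".join("%X" % value for value in table))


def load_table(table):
    # Build and emit the dump only when debug logging is enabled.
    if log.isEnabledFor(logging.DEBUG):
        log.debug("table contents:")
        dump_table(table)
    return len(table)


log.setLevel(logging.WARNING)
load_table([0, 1, 0xFFFFFFFE])
calls_at_warning = len(dump_calls)

log.setLevel(logging.DEBUG)
load_table([0, 1, 0xFFFFFFFE])
calls_at_debug = len(dump_calls)
```

Guarding at the call site keeps the dump helpers usable on their own, for instance from an interactive session, while normal parsing skips the formatting cost.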
... ... @@ -1671,12 +1715,12 @@ class OleFileIO:
1671 1715 try:
1672 1716 self.fp.seek(self.sectorsize * (sect+1))
1673 1717 except:
1674   - debug('getsect(): sect=%X, seek=%d, filesize=%d' %
  1718 + log.debug('getsect(): sect=%X, seek=%d, filesize=%d' %
1675 1719 (sect, self.sectorsize*(sect+1), self._filesize))
1676 1720 self._raise_defect(DEFECT_FATAL, 'OLE sector index out of range')
1677 1721 sector = self.fp.read(self.sectorsize)
1678 1722 if len(sector) != self.sectorsize:
1679   - debug('getsect(): sect=%X, read=%d, sectorsize=%d' %
  1723 + log.debug('getsect(): sect=%X, read=%d, sectorsize=%d' %
1680 1724 (sect, len(sector), self.sectorsize))
1681 1725 self._raise_defect(DEFECT_FATAL, 'incomplete OLE sector')
1682 1726 return sector
... ... @@ -1698,7 +1742,7 @@ class OleFileIO:
1698 1742 try:
1699 1743 self.fp.seek(self.sectorsize * (sect+1))
1700 1744 except:
1701   - debug('write_sect(): sect=%X, seek=%d, filesize=%d' %
  1745 + log.debug('write_sect(): sect=%X, seek=%d, filesize=%d' %
1702 1746 (sect, self.sectorsize*(sect+1), self._filesize))
1703 1747 self._raise_defect(DEFECT_FATAL, 'OLE sector index out of range')
1704 1748 if len(data) < self.sectorsize:
... ... @@ -1725,7 +1769,7 @@ class OleFileIO:
1725 1769 #[PL] to detect malformed documents and avoid DoS attacks, the maximum
1726 1770 # number of directory entries can be calculated:
1727 1771 max_entries = self.directory_fp.size // 128
1728   - debug('loaddirectory: size=%d, max_entries=%d' %
  1772 + log.debug('loaddirectory: size=%d, max_entries=%d' %
1729 1773 (self.directory_fp.size, max_entries))
1730 1774  
1731 1775 # Create list of directory entries
... ... @@ -1741,6 +1785,10 @@ class OleFileIO:
1741 1785 root_entry = self._load_direntry(0)
1742 1786 # Root entry is the first entry:
1743 1787 self.root = self.direntries[0]
  1788 + # TODO: read ALL directory entries (ignore bad entries?)
  1789 + # TODO: adapt build_storage_tree to avoid duplicate reads
  1790 + # for i in range(1, max_entries):
  1791 + # self._load_direntry(i)
1744 1792 # read and build all storage trees, starting from the root:
1745 1793 self.root.build_storage_tree()
1746 1794  
... ... @@ -1788,9 +1836,9 @@ class OleFileIO:
1788 1836 :param force_FAT: if False (default), stream will be opened in FAT or MiniFAT
1789 1837 according to size. If True, it will always be opened in FAT.
1790 1838 """
1791   - debug('OleFileIO.open(): sect=%d, size=%d, force_FAT=%s' %
  1839 + log.debug('OleFileIO.open(): sect=%Xh, size=%d, force_FAT=%s' %
1792 1840 (start, size, str(force_FAT)))
1793   - # stream size is compared to the MiniSectorCutoff threshold:
  1841 + # stream size is compared to the mini_stream_cutoff_size threshold:
1794 1842 if size < self.minisectorcutoff and not force_FAT:
1795 1843 # ministream object
1796 1844 if not self.ministream:
... ... @@ -1799,7 +1847,7 @@ class OleFileIO:
1799 1847 # The first sector index of the miniFAT stream is stored in the
1800 1848 # root directory entry:
1801 1849 size_ministream = self.root.size
1802   - debug('Opening MiniStream: sect=%d, size=%d' %
  1850 + log.debug('Opening MiniStream: sect=%Xh, size=%d' %
1803 1851 (self.root.isectStart, size_ministream))
1804 1852 self.ministream = self._open(self.root.isectStart,
1805 1853 size_ministream, force_FAT=True)
... ... @@ -1940,12 +1988,12 @@ class OleFileIO:
1940 1988 sect = entry.isectStart
1941 1989 # number of sectors to write
1942 1990 nb_sectors = (size + (self.sectorsize-1)) // self.sectorsize
1943   - debug('nb_sectors = %d' % nb_sectors)
  1991 + log.debug('nb_sectors = %d' % nb_sectors)
1944 1992 for i in range(nb_sectors):
1945 1993 ## try:
1946 1994 ## self.fp.seek(offset + self.sectorsize * sect)
1947 1995 ## except:
1948   -## debug('sect=%d, seek=%d' %
  1996 +## log.debug('sect=%d, seek=%d' %
1949 1997 ## (sect, offset+self.sectorsize*sect))
1950 1998 ## raise IOError('OLE sector index out of range')
1951 1999 # extract one sector from data, the last one being smaller:
... ... @@ -1956,7 +2004,7 @@ class OleFileIO:
1956 2004 else:
1957 2005 data_sector = data [i*self.sectorsize:]
1958 2006 #TODO: comment this if it works
1959   - debug('write_stream: size=%d sectorsize=%d data_sector=%d size%%sectorsize=%d'
  2007 + log.debug('write_stream: size=%d sectorsize=%d data_sector=%Xh size%%sectorsize=%d'
1960 2008 % (size, self.sectorsize, len(data_sector), size % self.sectorsize))
1961 2009 assert(len(data_sector) % self.sectorsize==size % self.sectorsize)
1962 2010 self.write_sect(sect, data_sector)
... ... @@ -2113,31 +2161,31 @@ class OleFileIO:
2113 2161 return data
2114 2162  
2115 2163 for i in range(num_props):
  2164 + property_id = 0 # just in case of an exception
2116 2165 try:
2117   - id = 0 # just in case of an exception
2118   - id = i32(s, 8+i*8)
  2166 + property_id = i32(s, 8+i*8)
2119 2167 offset = i32(s, 12+i*8)
2120   - type = i32(s, offset)
  2168 + property_type = i32(s, offset)
2121 2169  
2122   - debug ('property id=%d: type=%d offset=%X' % (id, type, offset))
  2170 + log.debug('property id=%d: type=%d offset=%X' % (property_id, property_type, offset))
2123 2171  
2124 2172 # test for common types first (should perhaps use
2125 2173 # a dictionary instead?)
2126 2174  
2127   - if type == VT_I2: # 16-bit signed integer
  2175 + if property_type == VT_I2: # 16-bit signed integer
2128 2176 value = i16(s, offset+4)
2129 2177 if value >= 32768:
2130 2178 value = value - 65536
2131   - elif type == VT_UI2: # 2-byte unsigned integer
  2179 + elif property_type == VT_UI2: # 2-byte unsigned integer
2132 2180 value = i16(s, offset+4)
2133   - elif type in (VT_I4, VT_INT, VT_ERROR):
  2181 + elif property_type in (VT_I4, VT_INT, VT_ERROR):
2134 2182 # VT_I4: 32-bit signed integer
2135 2183 # VT_ERROR: HRESULT, similar to 32-bit signed integer,
2136 2184 # see http://msdn.microsoft.com/en-us/library/cc230330.aspx
2137 2185 value = i32(s, offset+4)
2138   - elif type in (VT_UI4, VT_UINT): # 4-byte unsigned integer
  2186 + elif property_type in (VT_UI4, VT_UINT): # 4-byte unsigned integer
2139 2187 value = i32(s, offset+4) # FIXME
2140   - elif type in (VT_BSTR, VT_LPSTR):
  2188 + elif property_type in (VT_BSTR, VT_LPSTR):
2141 2189 # CodePageString, see http://msdn.microsoft.com/en-us/library/dd942354.aspx
2142 2190 # size is a 32 bits integer, including the null terminator, and
2143 2191 # possibly trailing or embedded null chars
... ... @@ -2146,50 +2194,50 @@ class OleFileIO:
2146 2194 value = s[offset+8:offset+8+count-1]
2147 2195 # remove all null chars:
2148 2196 value = value.replace(b'\x00', b'')
2149   - elif type == VT_BLOB:
  2197 + elif property_type == VT_BLOB:
2150 2198 # binary large object (BLOB)
2151 2199 # see http://msdn.microsoft.com/en-us/library/dd942282.aspx
2152 2200 count = i32(s, offset+4)
2153 2201 value = s[offset+8:offset+8+count]
2154   - elif type == VT_LPWSTR:
  2202 + elif property_type == VT_LPWSTR:
2155 2203 # UnicodeString
2156 2204 # see http://msdn.microsoft.com/en-us/library/dd942313.aspx
2157 2205 # "the string should NOT contain embedded or additional trailing
2158 2206 # null characters."
2159 2207 count = i32(s, offset+4)
2160 2208 value = self._decode_utf16_str(s[offset+8:offset+8+count*2])
2161   - elif type == VT_FILETIME:
  2209 + elif property_type == VT_FILETIME:
2162 2210 value = long(i32(s, offset+4)) + (long(i32(s, offset+8))<<32)
2163 2211 # FILETIME is a 64-bit int: "number of 100ns periods
2164 2212 # since Jan 1,1601".
2165   - if convert_time and id not in no_conversion:
2166   - debug('Converting property #%d to python datetime, value=%d=%fs'
2167   - %(id, value, float(value)/10000000))
  2213 + if convert_time and property_id not in no_conversion:
  2214 + log.debug('Converting property #%d to python datetime, value=%d=%fs'
  2215 + %(property_id, value, float(value)/10000000))
2168 2216 # convert FILETIME to Python datetime.datetime
2169 2217 # inspired from http://code.activestate.com/recipes/511425-filetime-to-datetime/
2170 2218 _FILETIME_null_date = datetime.datetime(1601, 1, 1, 0, 0, 0)
2171   - debug('timedelta days=%d' % (value//(10*1000000*3600*24)))
  2219 + log.debug('timedelta days=%d' % (value//(10*1000000*3600*24)))
2172 2220 value = _FILETIME_null_date + datetime.timedelta(microseconds=value//10)
2173 2221 else:
2174 2222 # legacy code kept for backward compatibility: returns a
2175 2223 # number of seconds since Jan 1,1601
2176 2224 value = value // 10000000 # seconds
2177   - elif type == VT_UI1: # 1-byte unsigned integer
  2225 + elif property_type == VT_UI1: # 1-byte unsigned integer
2178 2226 value = i8(s[offset+4])
2179   - elif type == VT_CLSID:
  2227 + elif property_type == VT_CLSID:
2180 2228 value = _clsid(s[offset+4:offset+20])
2181   - elif type == VT_CF:
  2229 + elif property_type == VT_CF:
2182 2230 # PropertyIdentifier or ClipboardData??
2183 2231 # see http://msdn.microsoft.com/en-us/library/dd941945.aspx
2184 2232 count = i32(s, offset+4)
2185 2233 value = s[offset+8:offset+8+count]
2186   - elif type == VT_BOOL:
  2234 + elif property_type == VT_BOOL:
2187 2235 # VARIANT_BOOL, 16 bits bool, 0x0000=False, 0xFFFF=True
2188 2236 # see http://msdn.microsoft.com/en-us/library/cc237864.aspx
2189 2237 value = bool(i16(s, offset+4))
2190 2238 else:
2191 2239 value = None # everything else yields "None"
2192   - debug ('property id=%d: type=%d not implemented in parser yet' % (id, type))
  2240 + log.debug('property id=%d: type=%d not implemented in parser yet' % (property_id, property_type))
2193 2241  
2194 2242 # missing: VT_EMPTY, VT_NULL, VT_R4, VT_R8, VT_CY, VT_DATE,
2195 2243 # VT_DECIMAL, VT_I1, VT_I8, VT_UI8,
... ... @@ -2201,15 +2249,15 @@ class OleFileIO:
2201 2249 # type of items, e.g. VT_VECTOR|VT_BSTR
2202 2250 # see http://msdn.microsoft.com/en-us/library/dd942011.aspx
2203 2251  
2204   - #print("%08x" % id, repr(value), end=" ")
  2252 + #print("%08x" % property_id, repr(value), end=" ")
2205 2253 #print("(%s)" % VT[i32(s, offset) & 0xFFF])
2206 2254  
2207   - data[id] = value
  2255 + data[property_id] = value
2208 2256 except BaseException as exc:
2209 2257 # catch exception while parsing each property, and only raise
2210 2258 # a DEFECT_INCORRECT, because parsing can go on
2211 2259 msg = 'Error while parsing property id %d in stream %s: %s' % (
2212   - id, repr(streampath), exc)
  2260 + property_id, repr(streampath), exc)
2213 2261 self._raise_defect(DEFECT_INCORRECT, msg, type(exc))
2214 2262  
2215 2263 return data
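Note on the `VT_FILETIME` branch above: the value is assembled from two 32-bit halves and counts 100 ns periods since 1601-01-01. The sketch below isolates that conversion, assuming the same arithmetic as the diff. The helper name `filetime_to_datetime` is hypothetical.

```python
import datetime

FILETIME_NULL_DATE = datetime.datetime(1601, 1, 1, 0, 0, 0)


def filetime_to_datetime(low, high):
    """Convert a FILETIME given as two 32-bit halves to a naive datetime."""
    value = low + (high << 32)  # 64-bit count of 100 ns periods
    # timedelta has microsecond resolution, so drop the last decimal digit.
    return FILETIME_NULL_DATE + datetime.timedelta(microseconds=value // 10)


# The Unix epoch expressed as a FILETIME, split into its two halves.
UNIX_EPOCH_AS_FILETIME = 116444736000000000
low_half = UNIX_EPOCH_AS_FILETIME & 0xFFFFFFFF
high_half = UNIX_EPOCH_AS_FILETIME >> 32
converted = filetime_to_datetime(low_half, high_half)
```

The legacy path in the diff (`value // 10000000`) returns whole seconds since 1601 instead, which is why properties listed in `no_conversion` keep an integer value.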
... ... @@ -2233,38 +2281,47 @@ class OleFileIO:
2233 2281  
2234 2282 if __name__ == "__main__":
2235 2283  
2236   - import sys
2237   -
2238   - # [PL] display quick usage info if launched from command-line
2239   - if len(sys.argv) <= 1:
2240   - print('olefile version %s %s - %s' % (__version__, __date__, __author__))
2241   - print(
2242   -"""
2243   -Launched from the command line, this script parses OLE files and prints info.
2244   -
2245   -Usage: olefile.py [-d] [-c] <file> [file2 ...]
  2284 + import sys, optparse
  2285 +
  2286 + DEFAULT_LOG_LEVEL = "warning" # Default log level
  2287 + LOG_LEVELS = {
  2288 + 'debug': logging.DEBUG,
  2289 + 'info': logging.INFO,
  2290 + 'warning': logging.WARNING,
  2291 + 'error': logging.ERROR,
  2292 + 'critical': logging.CRITICAL
  2293 + }
  2294 +
  2295 + usage = 'usage: %prog [options] <filename> [filename2 ...]'
  2296 + parser = optparse.OptionParser(usage=usage)
  2297 + parser.add_option("-c", action="store_true", dest="check_streams",
  2298 + help='check all streams (for debugging purposes)')
  2299 + parser.add_option("-d", action="store_true", dest="debug_mode",
  2300 + help='debug mode, shortcut for -l debug (displays a lot of debug information, for developers only)')
  2301 + parser.add_option('-l', '--loglevel', dest="loglevel", action="store", default=DEFAULT_LOG_LEVEL,
  2302 + help="logging level debug/info/warning/error/critical (default=%default)")
  2303 +
  2304 + (options, args) = parser.parse_args()
  2305 +
  2306 + print('olefile version %s %s - http://www.decalage.info/en/olefile\n' % (__version__, __date__))
  2307 +
  2308 + # Print help if no arguments are passed
  2309 + if len(args) == 0:
  2310 + print(__doc__)
  2311 + parser.print_help()
  2312 + sys.exit()
2246 2313  
2247   -Options:
2248   --d : debug mode (displays a lot of debug information, for developers only)
2249   --c : check all streams (for debugging purposes)
  2314 + if options.debug_mode:
  2315 + options.loglevel = 'debug'
2250 2316  
2251   -For more information, see http://www.decalage.info/olefile
2252   -""")
2253   - sys.exit()
  2317 + # setup logging to the console
  2318 + logging.basicConfig(level=LOG_LEVELS[options.loglevel], format='%(levelname)-8s %(message)s')
2254 2319  
2255   - check_streams = False
2256   - for filename in sys.argv[1:]:
2257   -## try:
2258   - # OPTIONS:
2259   - if filename == '-d':
2260   - # option to switch debug mode on:
2261   - set_debug_mode(True)
2262   - continue
2263   - if filename == '-c':
2264   - # option to switch check streams mode on:
2265   - check_streams = True
2266   - continue
  2320 + # also set the same log level for the module's logger to enable it:
  2321 + log.setLevel(LOG_LEVELS[options.loglevel])
2267 2322  
  2323 + for filename in args:
  2324 + try:
2268 2325 ole = OleFileIO(filename)#, raise_defects=DEFECT_INCORRECT)
2269 2326 print("-" * 68)
2270 2327 print(filename)
... ... @@ -2272,24 +2329,27 @@ For more information, see http://www.decalage.info/olefile
2272 2329 ole.dumpdirectory()
2273 2330 for streamname in ole.listdir():
2274 2331 if streamname[-1][0] == "\005":
2275   - print(streamname, ": properties")
2276   - props = ole.getproperties(streamname, convert_time=True)
2277   - props = sorted(props.items())
2278   - for k, v in props:
2279   - #[PL]: avoid to display too large or binary values:
2280   - if isinstance(v, (basestring, bytes)):
2281   - if len(v) > 50:
2282   - v = v[:50]
2283   - if isinstance(v, bytes):
2284   - # quick and dirty binary check:
2285   - for c in (1,2,3,4,5,6,7,11,12,14,15,16,17,18,19,20,
2286   - 21,22,23,24,25,26,27,28,29,30,31):
2287   - if c in bytearray(v):
2288   - v = '(binary data)'
2289   - break
2290   - print(" ", k, v)
2291   -
2292   - if check_streams:
  2332 + print("%r: properties" % streamname)
  2333 + try:
  2334 + props = ole.getproperties(streamname, convert_time=True)
  2335 + props = sorted(props.items())
  2336 + for k, v in props:
  2337 + #[PL]: avoid to display too large or binary values:
  2338 + if isinstance(v, (basestring, bytes)):
  2339 + if len(v) > 50:
  2340 + v = v[:50]
  2341 + if isinstance(v, bytes):
  2342 + # quick and dirty binary check:
  2343 + for c in (1,2,3,4,5,6,7,11,12,14,15,16,17,18,19,20,
  2344 + 21,22,23,24,25,26,27,28,29,30,31):
  2345 + if c in bytearray(v):
  2346 + v = '(binary data)'
  2347 + break
  2348 + print(" ", k, v)
  2349 + except:
  2350 + log.exception('Error while parsing property stream %r' % streamname)
  2351 +
  2352 + if options.check_streams:
2293 2353 # Read all streams to check if there are errors:
2294 2354 print('\nChecking streams...')
2295 2355 for streamname in ole.listdir():
... ... @@ -2318,8 +2378,11 @@ For more information, see http://www.decalage.info/olefile
2318 2378 print()
2319 2379  
2320 2380 # parse and display metadata:
2321   - meta = ole.get_metadata()
2322   - meta.dump()
  2381 + try:
  2382 + meta = ole.get_metadata()
  2383 + meta.dump()
  2384 + except:
  2385 + log.exception('Error while parsing metadata')
2323 2386 print()
2324 2387 #[PL] Test a few new methods:
2325 2388 root = ole.get_rootentry_name()
... ... @@ -2338,7 +2401,7 @@ For more information, see http://www.decalage.info/olefile
2338 2401 print('- %s: %s' % (exctype.__name__, msg))
2339 2402 else:
2340 2403 print('None')
2341   -## except IOError as v:
2342   -## print("***", "cannot read", file, "-", v)
  2404 + except:
  2405 + log.exception('Error while parsing file %r' % filename)
2343 2406  
2344 2407 # this code was developed while listening to The Wedding Present "Sea Monsters"
... ...
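
Note on the hunks above: the new `__main__` block swaps the hand-rolled `sys.argv` loop for `optparse` and maps the `-l` string to a `logging` constant, with `-d` acting as a shortcut for `-l debug`. A minimal standalone sketch of that pattern follows. The helper names `build_parser` and `resolve_log_level` and the file name `sample.doc` are assumptions made for illustration and do not appear in olefile.

```python
import logging
import optparse

# Same mapping as in the diff: option string -> logging constant.
LOG_LEVELS = {
    'debug': logging.DEBUG,
    'info': logging.INFO,
    'warning': logging.WARNING,
    'error': logging.ERROR,
    'critical': logging.CRITICAL,
}


def build_parser():
    """Hypothetical helper: declares the same three options as the diff."""
    parser = optparse.OptionParser(
        usage='usage: %prog [options] <filename> [filename2 ...]')
    parser.add_option('-c', action='store_true', dest='check_streams',
                      help='check all streams')
    parser.add_option('-d', action='store_true', dest='debug_mode',
                      help='debug mode, shortcut for -l debug')
    parser.add_option('-l', '--loglevel', dest='loglevel', action='store',
                      default='warning',
                      help='logging level (default=%default)')
    return parser


def resolve_log_level(options):
    """Hypothetical helper: -d takes precedence over -l, as in the diff."""
    name = 'debug' if options.debug_mode else options.loglevel
    return LOG_LEVELS[name]


# Parse an explicit argument list instead of sys.argv so the sketch
# can be run anywhere; 'sample.doc' is a placeholder file name.
options, args = build_parser().parse_args(['-d', '-c', 'sample.doc'])
level = resolve_log_level(options)
logging.basicConfig(level=level, format='%(levelname)-8s %(message)s')
```

As in the diff, a `-l` value outside the five listed names would raise `KeyError` at the `LOG_LEVELS` lookup. Declaring the option with `type='choice'` and a `choices` list is one way to have `optparse` reject such values earlier, though the commit does not do this.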
oletools/thirdparty/olefile/olefile2.py
... ... @@ -1166,33 +1166,33 @@ class OleFileIO:
1166 1166 self._raise_defect(DEFECT_FATAL, "incorrect ByteOrder in OLE header")
1167 1167 # TODO: add big-endian support for documents created on Mac ?
1168 1168 self.SectorSize = 2**self.SectorShift
1169   - debug( "SectorSize = %d" % self.SectorSize )
  1169 + debug( "sector_size = %d" % self.SectorSize )
1170 1170 if self.SectorSize not in [512, 4096]:
1171   - self._raise_defect(DEFECT_INCORRECT, "incorrect SectorSize in OLE header")
  1171 + self._raise_defect(DEFECT_INCORRECT, "incorrect sector_size in OLE header")
1172 1172 if (self.DllVersion==3 and self.SectorSize!=512) \
1173 1173 or (self.DllVersion==4 and self.SectorSize!=4096):
1174   - self._raise_defect(DEFECT_INCORRECT, "SectorSize does not match DllVersion in OLE header")
  1174 + self._raise_defect(DEFECT_INCORRECT, "sector_size does not match DllVersion in OLE header")
1175 1175 self.MiniSectorSize = 2**self.MiniSectorShift
1176   - debug( "MiniSectorSize = %d" % self.MiniSectorSize )
  1176 + debug( "mini_sector_size = %d" % self.MiniSectorSize )
1177 1177 if self.MiniSectorSize not in [64]:
1178   - self._raise_defect(DEFECT_INCORRECT, "incorrect MiniSectorSize in OLE header")
  1178 + self._raise_defect(DEFECT_INCORRECT, "incorrect mini_sector_size in OLE header")
1179 1179 if self.Reserved != 0 or self.Reserved1 != 0:
1180 1180 self._raise_defect(DEFECT_INCORRECT, "incorrect OLE header (non-null reserved bytes)")
1181 1181 debug( "csectDir = %d" % self.csectDir )
1182 1182 if self.SectorSize==512 and self.csectDir!=0:
1183 1183 self._raise_defect(DEFECT_INCORRECT, "incorrect csectDir in OLE header")
1184   - debug( "csectFat = %d" % self.csectFat )
1185   - debug( "sectDirStart = %X" % self.sectDirStart )
1186   - debug( "signature = %d" % self.signature )
  1184 + debug( "num_fat_sectors = %d" % self.csectFat )
  1185 + debug( "first_dir_sector = %X" % self.sectDirStart )
  1186 + debug( "transaction_signature_number = %d" % self.signature )
1187 1187 # Signature should be zero, BUT some implementations do not follow this
1188 1188 # rule => only a potential defect:
1189 1189 if self.signature != 0:
1190   - self._raise_defect(DEFECT_POTENTIAL, "incorrect OLE header (signature>0)")
1191   - debug( "MiniSectorCutoff = %d" % self.MiniSectorCutoff )
1192   - debug( "MiniFatStart = %X" % self.MiniFatStart )
1193   - debug( "csectMiniFat = %d" % self.csectMiniFat )
1194   - debug( "sectDifStart = %X" % self.sectDifStart )
1195   - debug( "csectDif = %d" % self.csectDif )
  1190 + self._raise_defect(DEFECT_POTENTIAL, "incorrect OLE header (transaction_signature_number>0)")
  1191 + debug( "mini_stream_cutoff_size = %d" % self.MiniSectorCutoff )
  1192 + debug( "first_mini_fat_sector = %X" % self.MiniFatStart )
  1193 + debug( "num_mini_fat_sectors = %d" % self.csectMiniFat )
  1194 + debug( "first_difat_sector = %X" % self.sectDifStart )
  1195 + debug( "num_difat_sectors = %d" % self.csectDif )
1196 1196  
1197 1197 # calculate the number of sectors in the file
1198 1198 # (-1 because header doesn't count)
... ... @@ -1414,9 +1414,9 @@ class OleFileIO:
1414 1414 if isect_difat not in [ENDOFCHAIN, FREESECT]:
1415 1415 # last DIFAT pointer value must be ENDOFCHAIN or FREESECT
1416 1416 raise IOError, 'incorrect end of DIFAT'
1417   -## if len(self.fat) != self.csectFat:
1418   -## # FAT should contain csectFat blocks
1419   -## print "FAT length: %d instead of %d" % (len(self.fat), self.csectFat)
  1417 +## if len(self.fat) != self.num_fat_sectors:
  1418 +## # FAT should contain num_fat_sectors blocks
  1419 +## print "FAT length: %d instead of %d" % (len(self.fat), self.num_fat_sectors)
1420 1420 ## raise IOError, 'incorrect DIFAT'
1421 1421 # since FAT is read from fixed-size sectors, it may contain more values
1422 1422 # than the actual number of sectors in the file.
... ... @@ -1438,7 +1438,7 @@ class OleFileIO:
1438 1438 # 1) Stream size is calculated according to the number of sectors
1439 1439 # declared in the OLE header. This allocated stream may be more than
1440 1440 # needed to store the actual sector indexes.
1441   - # (self.csectMiniFat is the number of sectors of size self.SectorSize)
  1441 + # (self.num_mini_fat_sectors is the number of sectors of size self.sector_size)
1442 1442 stream_size = self.csectMiniFat * self.SectorSize
1443 1443 # 2) Actually used size is calculated by dividing the MiniStream size
1444 1444 # (given by root entry size) by the size of mini sectors, *4 for
... ... @@ -1565,7 +1565,7 @@ class OleFileIO:
1565 1565 """
1566 1566 debug('OleFileIO.open(): sect=%d, size=%d, force_FAT=%s' %
1567 1567 (start, size, str(force_FAT)))
1568   - # stream size is compared to the MiniSectorCutoff threshold:
  1568 + # stream size is compared to the mini_stream_cutoff_size threshold:
1569 1569 if size < self.minisectorcutoff and not force_FAT:
1570 1570 # ministream object
1571 1571 if not self.ministream:
... ...