Commit f5b8a4fd7d1dd96294197d76a2e802016c37f9be

Authored by Philippe Lagadec
1 parent 0617289f

updated olefile to latest v0.43

oletools/thirdparty/olefile/LICENSE.txt
1 LICENSE for the olefile package:
2
3 -olefile (formerly OleFileIO_PL) is copyright (c) 2005-2015 Philippe Lagadec
3 +olefile (formerly OleFileIO_PL) is copyright (c) 2005-2016 Philippe Lagadec
4 (http://www.decalage.info)
5
6 All rights reserved.
oletools/thirdparty/olefile/README.html
1 -<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">  
2 -<html xmlns="http://www.w3.org/1999/xhtml">  
3 -<head>  
4 - <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />  
5 - <meta http-equiv="Content-Style-Type" content="text/css" />  
6 - <meta name="generator" content="pandoc" />  
7 - <title></title>  
8 -</head>  
9 -<body>  
10 -<h1 id="olefile-formerly-olefileio_pl">olefile (formerly OleFileIO_PL)</h1>  
11 -<p><a href="http://www.decalage.info/olefile">olefile</a> is a Python package to parse, read and write <a href="http://en.wikipedia.org/wiki/Compound_File_Binary_Format">Microsoft OLE2 files</a> (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office 97-2003 documents, vbaProject.bin in MS Office 2007+ files, Image Composer and FlashPix files, Outlook messages, StickyNotes, several Microscopy file formats, McAfee antivirus quarantine files, etc.</p>  
12 -<p><strong>Quick links:</strong> <a href="http://www.decalage.info/olefile">Home page</a> - <a href="https://bitbucket.org/decalage/olefileio_pl/wiki/Install">Download/Install</a> - <a href="https://bitbucket.org/decalage/olefileio_pl/wiki">Documentation</a> - <a href="https://bitbucket.org/decalage/olefileio_pl/issues?status=new&amp;status=open">Report Issues/Suggestions/Questions</a> - <a href="http://decalage.info/contact">Contact the author</a> - <a href="https://bitbucket.org/decalage/olefileio_pl">Repository</a> - <a href="https://twitter.com/decalage2">Updates on Twitter</a></p>  
13 -<h2 id="news">News</h2>  
14 -<p>Follow all updates and news on Twitter: <a href="https://twitter.com/decalage2"><code class="url">https://twitter.com/decalage2</code></a></p>  
15 -<ul>  
16 -<li><strong>2015-01-25 v0.42</strong>: improved handling of special characters in stream/storage names on Python 2.x (using UTF-8 instead of Latin-1), fixed bug in listdir with empty storages.</li>  
17 -<li>2014-11-25 v0.41: OleFileIO.open and isOleFile now support OLE files stored in byte strings, fixed installer for python 3, added support for Jython (Niko Ehrenfeuchter)</li>  
18 -<li>2014-10-01 v0.40: renamed OleFileIO_PL to olefile, added initial write support for streams &gt;4K, updated doc and license, improved the setup script.</li>  
19 -<li>2014-07-27 v0.31: fixed support for large files with 4K sectors, thanks to Niko Ehrenfeuchter, Martijn Berger and Dave Jones. Added test scripts from Pillow (by hugovk). Fixed setup for Python 3 (Martin Panter)</li>  
20 -<li>2014-02-04 v0.30: now compatible with Python 3.x, thanks to Martin Panter who did most of the hard work.</li>  
21 -<li>2013-07-24 v0.26: added methods to parse stream/storage timestamps, improved listdir to include storages, fixed parsing of direntry timestamps</li>  
22 -<li>2013-05-27 v0.25: improved metadata extraction, properties parsing and exception handling, fixed <a href="https://bitbucket.org/decalage/olefileio_pl/issue/12/error-when-converting-timestamps-in-ole">issue #12</a></li>  
23 -<li>2013-05-07 v0.24: new features to extract metadata (get_metadata method and OleMetadata class), improved getproperties to convert timestamps to Python datetime</li>  
24 -<li>2012-10-09: published <a href="http://www.decalage.info/python/oletools">python-oletools</a>, a package of analysis tools based on OleFileIO_PL</li>  
25 -<li>2012-09-11 v0.23: added support for file-like objects, fixed <a href="https://bitbucket.org/decalage/olefileio_pl/issue/8/bug-with-file-object">issue #8</a></li>  
26 -<li>2012-02-17 v0.22: fixed issues #7 (bug in getproperties) and #2 (added close method)</li>  
27 -<li>2011-10-20: code hosted on bitbucket to ease contributions and bug tracking</li>  
28 -<li>2010-01-24 v0.21: fixed support for big-endian CPUs, such as PowerPC Macs.</li>  
29 -<li>2009-12-11 v0.20: small bugfix in OleFileIO.open when filename is not plain str.</li>  
30 -<li>2009-12-10 v0.19: fixed support for 64 bits platforms (thanks to Ben G. and Martijn for reporting the bug)</li>  
31 -<li>see changelog in source code for more info.</li>  
32 -</ul>  
33 -<h2 id="downloadinstall">Download/Install</h2>  
34 -<p>If you have pip or setuptools installed (pip is included in Python 2.7.9+), you may simply run <strong>pip install olefile</strong> or <strong>easy_install olefile</strong> for the first installation.</p>  
35 -<p>To update olefile, run <strong>pip install -U olefile</strong>.</p>  
36 -<p>Otherwise, see https://bitbucket.org/decalage/olefileio_pl/wiki/Install</p>  
37 -<h2 id="features">Features</h2>  
38 -<ul>  
39 -<li>Parse, read and write any OLE file such as Microsoft Office 97-2003 legacy document formats (Word .doc, Excel .xls, PowerPoint .ppt, Visio .vsd, Project .mpp), Image Composer and FlashPix files, Outlook messages, StickyNotes, Zeiss AxioVision ZVI files, Olympus FluoView OIB files, etc</li>  
40 -<li>List all the streams and storages contained in an OLE file</li>  
41 -<li>Open streams as files</li>  
42 -<li>Parse and read property streams, containing metadata of the file</li>  
43 -<li>Portable, pure Python module, no dependency</li>  
44 -</ul>  
45 -<p>olefile can be used as an independent package or with PIL/Pillow.</p>  
46 -<p>olefile is mostly meant for developers. If you are looking for tools to analyze OLE files or to extract data (especially for security purposes such as malware analysis and forensics), then please also check my <a href="http://www.decalage.info/python/oletools">python-oletools</a>, which are built upon olefile and provide a higher-level interface.</p>  
47 -<h2 id="history">History</h2>  
48 -<p>olefile is based on the OleFileIO module from <a href="http://www.pythonware.com/products/pil/index.htm">PIL</a>, the excellent Python Imaging Library, created and maintained by Fredrik Lundh. The olefile API is still compatible with PIL, but since 2005 I have improved the internal implementation significantly, with new features, bugfixes and a more robust design. From 2005 to 2014 the project was called OleFileIO_PL, and in 2014 I changed its name to olefile to celebrate its 9 years and its new write features.</p>  
49 -<p>As far as I know, olefile is the most complete and robust Python implementation to read MS OLE2 files, portable on several operating systems. (please tell me if you know other similar Python modules)</p>  
50 -<p>Since 2014 olefile/OleFileIO_PL has been integrated into <a href="http://python-imaging.github.io/">Pillow</a>, the friendly fork of PIL. olefile will continue to be improved as a separate project, and new versions will be merged into Pillow regularly.</p>  
51 -<h2 id="main-improvements-over-the-original-version-of-olefileio-in-pil">Main improvements over the original version of OleFileIO in PIL:</h2>  
52 -<ul>  
53 -<li>Compatible with Python 3.x and 2.6+</li>  
54 -<li>Many bug fixes</li>  
55 -<li>Support for files larger than 6.8MB</li>  
56 -<li>Support for 64 bits platforms and big-endian CPUs</li>  
57 -<li>Robust: many checks to detect malformed files</li>  
58 -<li>Runtime option to choose if malformed files should be parsed or raise exceptions</li>  
59 -<li>Improved API</li>  
60 -<li>Metadata extraction, stream/storage timestamps (e.g. for document forensics)</li>  
61 -<li>Can open file-like objects</li>  
62 -<li>Added setup.py and install.bat to ease installation</li>  
63 -<li>More convenient slash-based syntax for stream paths</li>  
64 -<li>Write features</li>  
65 -</ul>  
66 -<h2 id="documentation">Documentation</h2>  
67 -<p>Please see the <a href="https://bitbucket.org/decalage/olefileio_pl/wiki">online documentation</a> for more information, especially the <a href="https://bitbucket.org/decalage/olefileio_pl/wiki/OLE_Overview">OLE overview</a> and the <a href="https://bitbucket.org/decalage/olefileio_pl/wiki/API">API page</a> which describe how to use olefile in Python applications. A copy of the same documentation is also provided in the doc subfolder of the olefile package.</p>  
68 -<h2 id="real-life-examples">Real-life examples</h2>  
69 -<p>A real-life example: <a href="http://blog.gregback.net/2011/03/using-remnux-for-forensic-puzzle-6/">using OleFileIO_PL for malware analysis and forensics</a>.</p>  
70 -<p>See also <a href="https://computer-forensics.sans.org/community/papers/gcfa/grow-forensic-tools-taxonomy-python-libraries-helpful-forensic-analysis_6879">this paper</a> about python tools for forensics, which features olefile.</p>  
71 -<h2 id="license">License</h2>  
72 -<p>olefile (formerly OleFileIO_PL) is copyright (c) 2005-2015 Philippe Lagadec (<a href="http://www.decalage.info">http://www.decalage.info</a>)</p>  
73 -<p>All rights reserved.</p>  
74 -<p>Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:</p>  
75 -<ul>  
76 -<li>Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.</li>  
77 -<li>Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.</li>  
78 -</ul>  
79 -<p>THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS &quot;AS IS&quot; AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.</p>  
80 -<hr />  
81 -<p>olefile is based on source code from the OleFileIO module of the Python Imaging Library (PIL) published by Fredrik Lundh under the following license:</p>  
82 -<p>The Python Imaging Library (PIL) is</p>  
83 -<ul>  
84 -<li>Copyright (c) 1997-2005 by Secret Labs AB</li>  
85 -<li>Copyright (c) 1995-2005 by Fredrik Lundh</li>  
86 -</ul>  
87 -<p>By obtaining, using, and/or copying this software and/or its associated documentation, you agree that you have read, understood, and will comply with the following terms and conditions:</p>  
88 -<p>Permission to use, copy, modify, and distribute this software and its associated documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appears in all copies, and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Secret Labs AB or the author not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission.</p>  
89 -<p>SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.</p>  
90 -</body>  
91 -</html>
1 +<h1 id="olefile-formerly-olefileio_pl">olefile (formerly OleFileIO_PL)</h1>
  2 +<p><a href="http://www.decalage.info/olefile">olefile</a> is a Python package to parse, read and write <a href="http://en.wikipedia.org/wiki/Compound_File_Binary_Format">Microsoft OLE2 files</a> (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office 97-2003 documents, vbaProject.bin in MS Office 2007+ files, Image Composer and FlashPix files, Outlook messages, StickyNotes, several Microscopy file formats, McAfee antivirus quarantine files, etc.</p>
  3 +<p><strong>Quick links:</strong> <a href="http://www.decalage.info/olefile">Home page</a> - <a href="https://bitbucket.org/decalage/olefileio_pl/wiki/Install">Download/Install</a> - <a href="https://bitbucket.org/decalage/olefileio_pl/wiki">Documentation</a> - <a href="https://bitbucket.org/decalage/olefileio_pl/issues?status=new&amp;status=open">Report Issues/Suggestions/Questions</a> - <a href="http://decalage.info/contact">Contact the author</a> - <a href="https://bitbucket.org/decalage/olefileio_pl">Repository</a> - <a href="https://twitter.com/decalage2">Updates on Twitter</a></p>
  4 +<h2 id="news">News</h2>
  5 +<p>Follow all updates and news on Twitter: <a href="https://twitter.com/decalage2">https://twitter.com/decalage2</a></p>
  6 +<ul>
  7 +<li><strong>2016-02-02 v0.43</strong>: fixed issues <a href="https://bitbucket.org/decalage/olefileio_pl/issues/26/variable-referenced-before-assignment">#26</a> and <a href="https://bitbucket.org/decalage/olefileio_pl/issues/27/incomplete-ole-stream-incorrect-ole-fat">#27</a>, better handling of malformed files, use python logging.</li>
  8 +<li>2015-01-25 v0.42: improved handling of special characters in stream/storage names on Python 2.x (using UTF-8 instead of Latin-1), fixed bug in listdir with empty storages.</li>
  9 +<li>2014-11-25 v0.41: OleFileIO.open and isOleFile now support OLE files stored in byte strings, fixed installer for python 3, added support for Jython (Niko Ehrenfeuchter)</li>
  10 +<li>2014-10-01 v0.40: renamed OleFileIO_PL to olefile, added initial write support for streams &gt;4K, updated doc and license, improved the setup script.</li>
  11 +<li>2014-07-27 v0.31: fixed support for large files with 4K sectors, thanks to Niko Ehrenfeuchter, Martijn Berger and Dave Jones. Added test scripts from Pillow (by hugovk). Fixed setup for Python 3 (Martin Panter)</li>
  12 +<li>2014-02-04 v0.30: now compatible with Python 3.x, thanks to Martin Panter who did most of the hard work.</li>
  13 +<li>2013-07-24 v0.26: added methods to parse stream/storage timestamps, improved listdir to include storages, fixed parsing of direntry timestamps</li>
  14 +<li>2013-05-27 v0.25: improved metadata extraction, properties parsing and exception handling, fixed <a href="https://bitbucket.org/decalage/olefileio_pl/issue/12/error-when-converting-timestamps-in-ole">issue #12</a></li>
  15 +<li>2013-05-07 v0.24: new features to extract metadata (get_metadata method and OleMetadata class), improved getproperties to convert timestamps to Python datetime</li>
  16 +<li>2012-10-09: published <a href="http://www.decalage.info/python/oletools">python-oletools</a>, a package of analysis tools based on OleFileIO_PL</li>
  17 +<li>2012-09-11 v0.23: added support for file-like objects, fixed <a href="https://bitbucket.org/decalage/olefileio_pl/issue/8/bug-with-file-object">issue #8</a></li>
  18 +<li>2012-02-17 v0.22: fixed issues #7 (bug in getproperties) and #2 (added close method)</li>
  19 +<li>2011-10-20: code hosted on bitbucket to ease contributions and bug tracking</li>
  20 +<li>2010-01-24 v0.21: fixed support for big-endian CPUs, such as PowerPC Macs.</li>
  21 +<li>2009-12-11 v0.20: small bugfix in OleFileIO.open when filename is not plain str.</li>
  22 +<li>2009-12-10 v0.19: fixed support for 64 bits platforms (thanks to Ben G. and Martijn for reporting the bug)</li>
  23 +<li>see changelog in source code for more info.</li>
  24 +</ul>
  25 +<h2 id="downloadinstall">Download/Install</h2>
  26 +<p>If you have pip or setuptools installed (pip is included in Python 2.7.9+), you may simply run <strong>pip install olefile</strong> or <strong>easy_install olefile</strong> for the first installation.</p>
  27 +<p>To update olefile, run <strong>pip install -U olefile</strong>.</p>
  28 +<p>Otherwise, see https://bitbucket.org/decalage/olefileio_pl/wiki/Install</p>
  29 +<h2 id="features">Features</h2>
  30 +<ul>
  31 +<li>Parse, read and write any OLE file such as Microsoft Office 97-2003 legacy document formats (Word .doc, Excel .xls, PowerPoint .ppt, Visio .vsd, Project .mpp), Image Composer and FlashPix files, Outlook messages, StickyNotes, Zeiss AxioVision ZVI files, Olympus FluoView OIB files, etc</li>
  32 +<li>List all the streams and storages contained in an OLE file</li>
  33 +<li>Open streams as files</li>
  34 +<li>Parse and read property streams, containing metadata of the file</li>
  35 +<li>Portable, pure Python module, no dependency</li>
  36 +</ul>
  37 +<p>olefile can be used as an independent package or with PIL/Pillow.</p>
  38 +<p>olefile is mostly meant for developers. If you are looking for tools to analyze OLE files or to extract data (especially for security purposes such as malware analysis and forensics), then please also check my <a href="http://www.decalage.info/python/oletools">python-oletools</a>, which are built upon olefile and provide a higher-level interface.</p>
  39 +<h2 id="history">History</h2>
  40 +<p>olefile is based on the OleFileIO module from <a href="http://www.pythonware.com/products/pil/index.htm">PIL</a>, the excellent Python Imaging Library, created and maintained by Fredrik Lundh. The olefile API is still compatible with PIL, but since 2005 I have improved the internal implementation significantly, with new features, bugfixes and a more robust design. From 2005 to 2014 the project was called OleFileIO_PL, and in 2014 I changed its name to olefile to celebrate its 9 years and its new write features.</p>
  41 +<p>As far as I know, olefile is the most complete and robust Python implementation to read MS OLE2 files, portable on several operating systems. (please tell me if you know other similar Python modules)</p>
  42 +<p>Since 2014 olefile/OleFileIO_PL has been integrated into <a href="http://python-imaging.github.io/">Pillow</a>, the friendly fork of PIL. olefile will continue to be improved as a separate project, and new versions will be merged into Pillow regularly.</p>
  43 +<h2 id="main-improvements-over-the-original-version-of-olefileio-in-pil">Main improvements over the original version of OleFileIO in PIL:</h2>
  44 +<ul>
  45 +<li>Compatible with Python 3.x and 2.6+</li>
  46 +<li>Many bug fixes</li>
  47 +<li>Support for files larger than 6.8MB</li>
  48 +<li>Support for 64 bits platforms and big-endian CPUs</li>
  49 +<li>Robust: many checks to detect malformed files</li>
  50 +<li>Runtime option to choose if malformed files should be parsed or raise exceptions</li>
  51 +<li>Improved API</li>
  52 +<li>Metadata extraction, stream/storage timestamps (e.g. for document forensics)</li>
  53 +<li>Can open file-like objects</li>
  54 +<li>Added setup.py and install.bat to ease installation</li>
  55 +<li>More convenient slash-based syntax for stream paths</li>
  56 +<li>Write features</li>
  57 +</ul>
  58 +<h2 id="documentation">Documentation</h2>
  59 +<p>Please see the <a href="https://bitbucket.org/decalage/olefileio_pl/wiki">online documentation</a> for more information, especially the <a href="https://bitbucket.org/decalage/olefileio_pl/wiki/OLE_Overview">OLE overview</a> and the <a href="https://bitbucket.org/decalage/olefileio_pl/wiki/API">API page</a> which describe how to use olefile in Python applications. A copy of the same documentation is also provided in the doc subfolder of the olefile package.</p>
  60 +<h2 id="real-life-examples">Real-life examples</h2>
  61 +<p>A real-life example: <a href="http://blog.gregback.net/2011/03/using-remnux-for-forensic-puzzle-6/">using OleFileIO_PL for malware analysis and forensics</a>.</p>
  62 +<p>See also <a href="https://computer-forensics.sans.org/community/papers/gcfa/grow-forensic-tools-taxonomy-python-libraries-helpful-forensic-analysis_6879">this paper</a> about python tools for forensics, which features olefile.</p>
  63 +<h2 id="license">License</h2>
  64 +<p>olefile (formerly OleFileIO_PL) is copyright (c) 2005-2016 Philippe Lagadec (<a href="http://www.decalage.info">http://www.decalage.info</a>)</p>
  65 +<p>All rights reserved.</p>
  66 +<p>Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:</p>
  67 +<ul>
  68 +<li>Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.</li>
  69 +<li>Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.</li>
  70 +</ul>
  71 +<p>THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS &quot;AS IS&quot; AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.</p>
  72 +<hr />
  73 +<p>olefile is based on source code from the OleFileIO module of the Python Imaging Library (PIL) published by Fredrik Lundh under the following license:</p>
  74 +<p>The Python Imaging Library (PIL) is</p>
  75 +<ul>
  76 +<li>Copyright (c) 1997-2005 by Secret Labs AB</li>
  77 +<li>Copyright (c) 1995-2005 by Fredrik Lundh</li>
  78 +</ul>
  79 +<p>By obtaining, using, and/or copying this software and/or its associated documentation, you agree that you have read, understood, and will comply with the following terms and conditions:</p>
  80 +<p>Permission to use, copy, modify, and distribute this software and its associated documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appears in all copies, and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Secret Labs AB or the author not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission.</p>
  81 +<p>SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.</p>
oletools/thirdparty/olefile/README.rst
1 -olefile (formerly OleFileIO\_PL)  
2 -================================  
3 -  
4 -`olefile <http://www.decalage.info/olefile>`_ is a Python package to  
5 -parse, read and write `Microsoft OLE2  
6 -files <http://en.wikipedia.org/wiki/Compound_File_Binary_Format>`_ (also  
7 -called Structured Storage, Compound File Binary Format or Compound  
8 -Document File Format), such as Microsoft Office 97-2003 documents,  
9 -vbaProject.bin in MS Office 2007+ files, Image Composer and FlashPix  
10 -files, Outlook messages, StickyNotes, several Microscopy file formats,  
11 -McAfee antivirus quarantine files, etc.  
12 -  
13 -**Quick links:** `Home page <http://www.decalage.info/olefile>`_ -  
14 -`Download/Install <https://bitbucket.org/decalage/olefileio_pl/wiki/Install>`_  
15 -- `Documentation <https://bitbucket.org/decalage/olefileio_pl/wiki>`_ -  
16 -`Report  
17 -Issues/Suggestions/Questions <https://bitbucket.org/decalage/olefileio_pl/issues?status=new&status=open>`_  
18 -- `Contact the author <http://decalage.info/contact>`_ -  
19 -`Repository <https://bitbucket.org/decalage/olefileio_pl>`_ - `Updates  
20 -on Twitter <https://twitter.com/decalage2>`_  
21 -  
22 -News  
23 -----  
24 -  
25 -Follow all updates and news on Twitter: https://twitter.com/decalage2  
26 -  
27 -- **2015-01-25 v0.42**: improved handling of special characters in  
28 - stream/storage names on Python 2.x (using UTF-8 instead of Latin-1),  
29 - fixed bug in listdir with empty storages.  
30 -- 2014-11-25 v0.41: OleFileIO.open and isOleFile now support OLE files  
31 - stored in byte strings, fixed installer for python 3, added support  
32 - for Jython (Niko Ehrenfeuchter)  
33 -- 2014-10-01 v0.40: renamed OleFileIO\_PL to olefile, added initial  
34 - write support for streams >4K, updated doc and license, improved the  
35 - setup script.  
36 -- 2014-07-27 v0.31: fixed support for large files with 4K sectors,  
37 - thanks to Niko Ehrenfeuchter, Martijn Berger and Dave Jones. Added  
38 - test scripts from Pillow (by hugovk). Fixed setup for Python 3  
39 - (Martin Panter)  
40 -- 2014-02-04 v0.30: now compatible with Python 3.x, thanks to Martin  
41 - Panter who did most of the hard work.  
42 -- 2013-07-24 v0.26: added methods to parse stream/storage timestamps,  
43 - improved listdir to include storages, fixed parsing of direntry  
44 - timestamps  
45 -- 2013-05-27 v0.25: improved metadata extraction, properties parsing  
46 - and exception handling, fixed `issue  
47 - #12 <https://bitbucket.org/decalage/olefileio_pl/issue/12/error-when-converting-timestamps-in-ole>`_  
48 -- 2013-05-07 v0.24: new features to extract metadata (get\_metadata  
49 - method and OleMetadata class), improved getproperties to convert  
50 - timestamps to Python datetime  
51 -- 2012-10-09: published  
52 - `python-oletools <http://www.decalage.info/python/oletools>`_, a  
53 - package of analysis tools based on OleFileIO\_PL  
54 -- 2012-09-11 v0.23: added support for file-like objects, fixed `issue  
55 - #8 <https://bitbucket.org/decalage/olefileio_pl/issue/8/bug-with-file-object>`_  
56 -- 2012-02-17 v0.22: fixed issues #7 (bug in getproperties) and #2  
57 - (added close method)  
58 -- 2011-10-20: code hosted on bitbucket to ease contributions and bug  
59 - tracking  
60 -- 2010-01-24 v0.21: fixed support for big-endian CPUs, such as PowerPC  
61 - Macs.  
62 -- 2009-12-11 v0.20: small bugfix in OleFileIO.open when filename is not  
63 - plain str.  
64 -- 2009-12-10 v0.19: fixed support for 64 bits platforms (thanks to Ben  
65 - G. and Martijn for reporting the bug)  
66 -- see changelog in source code for more info.  
67 -  
68 -Download/Install  
69 -----------------  
70 -  
71 -If you have pip or setuptools installed (pip is included in Python  
72 -2.7.9+), you may simply run **pip install olefile** or **easy\_install  
73 -olefile** for the first installation.  
74 -  
75 -To update olefile, run **pip install -U olefile**.  
76 -  
77 -Otherwise, see https://bitbucket.org/decalage/olefileio\_pl/wiki/Install  
78 -  
79 -Features  
80 ---------  
81 -  
82 -- Parse, read and write any OLE file such as Microsoft Office 97-2003  
83 - legacy document formats (Word .doc, Excel .xls, PowerPoint .ppt,  
84 - Visio .vsd, Project .mpp), Image Composer and FlashPix files, Outlook  
85 - messages, StickyNotes, Zeiss AxioVision ZVI files, Olympus FluoView  
86 - OIB files, etc  
87 -- List all the streams and storages contained in an OLE file  
88 -- Open streams as files  
89 -- Parse and read property streams, containing metadata of the file  
90 -- Portable, pure Python module, no dependency  
91 -  
92 -olefile can be used as an independent package or with PIL/Pillow.  
93 -  
94 -olefile is mostly meant for developers. If you are looking for tools to  
95 -analyze OLE files or to extract data (especially for security purposes  
96 -such as malware analysis and forensics), then please also check my  
97 -`python-oletools <http://www.decalage.info/python/oletools>`_, which are  
98 -built upon olefile and provide a higher-level interface.  
99 -  
100 -History  
101 --------  
102 -  
103 -olefile is based on the OleFileIO module from  
104 -`PIL <http://www.pythonware.com/products/pil/index.htm>`_, the excellent  
105 -Python Imaging Library, created and maintained by Fredrik Lundh. The  
106 -olefile API is still compatible with PIL, but since 2005 I have improved  
107 -the internal implementation significantly, with new features, bugfixes  
108 -and a more robust design. From 2005 to 2014 the project was called  
109 -OleFileIO\_PL, and in 2014 I changed its name to olefile to celebrate  
110 -its 9 years and its new write features.  
111 -  
112 -As far as I know, olefile is the most complete and robust Python  
113 -implementation to read MS OLE2 files, portable on several operating  
114 -systems. (please tell me if you know other similar Python modules)  
115 -  
116 -Since 2014 olefile/OleFileIO\_PL has been integrated into  
117 -`Pillow <http://python-imaging.github.io/>`_, the friendly fork of PIL.  
118 -olefile will continue to be improved as a separate project, and new  
119 -versions will be merged into Pillow regularly.  
120 -  
121 -Main improvements over the original version of OleFileIO in PIL:  
122 -----------------------------------------------------------------  
123 -  
124 -- Compatible with Python 3.x and 2.6+  
125 -- Many bug fixes  
126 -- Support for files larger than 6.8MB  
127 -- Support for 64 bits platforms and big-endian CPUs  
128 -- Robust: many checks to detect malformed files  
129 -- Runtime option to choose if malformed files should be parsed or raise  
130 - exceptions  
131 -- Improved API  
132 -- Metadata extraction, stream/storage timestamps (e.g. for document  
133 - forensics)  
134 -- Can open file-like objects  
135 -- Added setup.py and install.bat to ease installation  
136 -- More convenient slash-based syntax for stream paths  
137 -- Write features  
138 -  
139 -Documentation  
140 --------------  
141 -  
142 -Please see the `online  
143 -documentation <https://bitbucket.org/decalage/olefileio_pl/wiki>`_ for  
144 -more information, especially the `OLE  
145 -overview <https://bitbucket.org/decalage/olefileio_pl/wiki/OLE_Overview>`_  
146 -and the `API  
147 -page <https://bitbucket.org/decalage/olefileio_pl/wiki/API>`_ which  
148 -describe how to use olefile in Python applications. A copy of the same  
149 -documentation is also provided in the doc subfolder of the olefile  
150 -package.  
151 -  
152 -Real-life examples  
153 -------------------  
154 -  
155 -A real-life example: `using OleFileIO\_PL for malware analysis and  
156 -forensics <http://blog.gregback.net/2011/03/using-remnux-for-forensic-puzzle-6/>`_.  
157 -  
158 -See also `this  
159 -paper <https://computer-forensics.sans.org/community/papers/gcfa/grow-forensic-tools-taxonomy-python-libraries-helpful-forensic-analysis_6879>`_  
160 -about python tools for forensics, which features olefile.  
161 -  
162 -License  
163 --------  
164 -  
165 -olefile (formerly OleFileIO\_PL) is copyright (c) 2005-2015 Philippe  
166 -Lagadec (`http://www.decalage.info <http://www.decalage.info>`_)  
167 -  
168 -All rights reserved.  
169 -  
170 -Redistribution and use in source and binary forms, with or without  
171 -modification, are permitted provided that the following conditions are  
172 -met:  
173 -  
174 -- Redistributions of source code must retain the above copyright  
175 - notice, this list of conditions and the following disclaimer.  
176 -- Redistributions in binary form must reproduce the above copyright  
177 - notice, this list of conditions and the following disclaimer in the  
178 - documentation and/or other materials provided with the distribution.  
179 -  
180 -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS  
181 -IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED  
182 -TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A  
183 -PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT  
184 -HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,  
185 -SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED  
186 -TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR  
187 -PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF  
188 -LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING  
189 -NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS  
190 -SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.  
191 -  
192 ---------------  
193 -  
194 -olefile is based on source code from the OleFileIO module of the Python  
195 -Imaging Library (PIL) published by Fredrik Lundh under the following  
196 -license:  
197 -  
198 -The Python Imaging Library (PIL) is  
199 -  
200 -- Copyright (c) 1997-2005 by Secret Labs AB  
201 -- Copyright (c) 1995-2005 by Fredrik Lundh  
202 -  
203 -By obtaining, using, and/or copying this software and/or its associated  
204 -documentation, you agree that you have read, understood, and will comply  
205 -with the following terms and conditions:  
206 -  
207 -Permission to use, copy, modify, and distribute this software and its  
208 -associated documentation for any purpose and without fee is hereby  
209 -granted, provided that the above copyright notice appears in all copies,  
210 -and that both that copyright notice and this permission notice appear in  
211 -supporting documentation, and that the name of Secret Labs AB or the  
212 -author not be used in advertising or publicity pertaining to  
213 -distribution of the software without specific, written prior permission.  
214 -  
215 -SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO  
216 -THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND  
217 -FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR  
218 -ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER  
219 -RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF  
220 -CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN  
221 -CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.  
  1 +olefile (formerly OleFileIO\_PL)
  2 +================================
  3 +
  4 +`olefile <http://www.decalage.info/olefile>`__ is a Python package to
  5 +parse, read and write `Microsoft OLE2
  6 +files <http://en.wikipedia.org/wiki/Compound_File_Binary_Format>`__
  7 +(also called Structured Storage, Compound File Binary Format or Compound
  8 +Document File Format), such as Microsoft Office 97-2003 documents,
  9 +vbaProject.bin in MS Office 2007+ files, Image Composer and FlashPix
  10 +files, Outlook messages, StickyNotes, several Microscopy file formats,
  11 +McAfee antivirus quarantine files, etc.
  12 +
  13 +**Quick links:** `Home page <http://www.decalage.info/olefile>`__ -
  14 +`Download/Install <https://bitbucket.org/decalage/olefileio_pl/wiki/Install>`__
  15 +- `Documentation <https://bitbucket.org/decalage/olefileio_pl/wiki>`__ -
  16 +`Report
  17 +Issues/Suggestions/Questions <https://bitbucket.org/decalage/olefileio_pl/issues?status=new&status=open>`__
  18 +- `Contact the author <http://decalage.info/contact>`__ -
  19 +`Repository <https://bitbucket.org/decalage/olefileio_pl>`__ - `Updates
  20 +on Twitter <https://twitter.com/decalage2>`__
  21 +
  22 +News
  23 +----
  24 +
  25 +Follow all updates and news on Twitter: https://twitter.com/decalage2
  26 +
  27 +- **2016-02-02 v0.43**: fixed issues
  28 + `#26 <https://bitbucket.org/decalage/olefileio_pl/issues/26/variable-referenced-before-assignment>`__
  29 + and
  30 + `#27 <https://bitbucket.org/decalage/olefileio_pl/issues/27/incomplete-ole-stream-incorrect-ole-fat>`__,
  31 + better handling of malformed files, use python logging.
  32 +- 2015-01-25 v0.42: improved handling of special characters in
  33 + stream/storage names on Python 2.x (using UTF-8 instead of Latin-1),
  34 + fixed bug in listdir with empty storages.
  35 +- 2014-11-25 v0.41: OleFileIO.open and isOleFile now support OLE files
  36 + stored in byte strings, fixed installer for python 3, added support
  37 + for Jython (Niko Ehrenfeuchter)
  38 +- 2014-10-01 v0.40: renamed OleFileIO\_PL to olefile, added initial
  39 + write support for streams >4K, updated doc and license, improved the
  40 + setup script.
  41 +- 2014-07-27 v0.31: fixed support for large files with 4K sectors,
  42 + thanks to Niko Ehrenfeuchter, Martijn Berger and Dave Jones. Added
  43 + test scripts from Pillow (by hugovk). Fixed setup for Python 3
  44 + (Martin Panter)
  45 +- 2014-02-04 v0.30: now compatible with Python 3.x, thanks to Martin
  46 + Panter who did most of the hard work.
  47 +- 2013-07-24 v0.26: added methods to parse stream/storage timestamps,
  48 + improved listdir to include storages, fixed parsing of direntry
  49 + timestamps
  50 +- 2013-05-27 v0.25: improved metadata extraction, properties parsing
  51 + and exception handling, fixed `issue
  52 + #12 <https://bitbucket.org/decalage/olefileio_pl/issue/12/error-when-converting-timestamps-in-ole>`__
  53 +- 2013-05-07 v0.24: new features to extract metadata (get\_metadata
  54 + method and OleMetadata class), improved getproperties to convert
  55 + timestamps to Python datetime
  56 +- 2012-10-09: published
  57 + `python-oletools <http://www.decalage.info/python/oletools>`__, a
  58 + package of analysis tools based on OleFileIO\_PL
  59 +- 2012-09-11 v0.23: added support for file-like objects, fixed `issue
  60 + #8 <https://bitbucket.org/decalage/olefileio_pl/issue/8/bug-with-file-object>`__
  61 +- 2012-02-17 v0.22: fixed issues #7 (bug in getproperties) and #2
  62 + (added close method)
  63 +- 2011-10-20: code hosted on bitbucket to ease contributions and bug
  64 + tracking
  65 +- 2010-01-24 v0.21: fixed support for big-endian CPUs, such as PowerPC
  66 + Macs.
  67 +- 2009-12-11 v0.20: small bugfix in OleFileIO.open when filename is not
  68 + plain str.
  69 +- 2009-12-10 v0.19: fixed support for 64 bits platforms (thanks to Ben
  70 + G. and Martijn for reporting the bug)
  71 +- see changelog in source code for more info.
  72 +
  73 +Download/Install
  74 +----------------
  75 +
  76 +If you have pip or setuptools installed (pip is included in Python
  77 +2.7.9+), you may simply run **pip install olefile** or **easy\_install
  78 +olefile** for the first installation.
  79 +
  80 +To update olefile, run **pip install -U olefile**.
  81 +
  82 +Otherwise, see https://bitbucket.org/decalage/olefileio\_pl/wiki/Install
  83 +
  84 +Features
  85 +--------
  86 +
  87 +- Parse, read and write any OLE file such as Microsoft Office 97-2003
  88 + legacy document formats (Word .doc, Excel .xls, PowerPoint .ppt,
  89 + Visio .vsd, Project .mpp), Image Composer and FlashPix files, Outlook
  90 + messages, StickyNotes, Zeiss AxioVision ZVI files, Olympus FluoView
  91 + OIB files, etc
  92 +- List all the streams and storages contained in an OLE file
  93 +- Open streams as files
  94 +- Parse and read property streams, containing metadata of the file
  95 +- Portable, pure Python module, no dependency
  96 +
  97 +olefile can be used as an independent package or with PIL/Pillow.
  98 +
  99 +olefile is mostly meant for developers. If you are looking for tools to
  100 +analyze OLE files or to extract data (especially for security purposes
  101 +such as malware analysis and forensics), then please also check my
  102 +`python-oletools <http://www.decalage.info/python/oletools>`__, which
  103 +are built upon olefile and provide a higher-level interface.
  104 +
  105 +History
  106 +-------
  107 +
  108 +olefile is based on the OleFileIO module from
  109 +`PIL <http://www.pythonware.com/products/pil/index.htm>`__, the
  110 +excellent Python Imaging Library, created and maintained by Fredrik
  111 +Lundh. The olefile API is still compatible with PIL, but since 2005 I
  112 +have improved the internal implementation significantly, with new
  113 +features, bugfixes and a more robust design. From 2005 to 2014 the
  114 +project was called OleFileIO\_PL, and in 2014 I changed its name to
  115 +olefile to celebrate its 9 years and its new write features.
  116 +
  117 +As far as I know, olefile is the most complete and robust Python
  118 +implementation to read MS OLE2 files, portable on several operating
  119 +systems. (please tell me if you know other similar Python modules)
  120 +
  121 +Since 2014 olefile/OleFileIO\_PL has been integrated into
  122 +`Pillow <http://python-imaging.github.io/>`__, the friendly fork of PIL.
  123 +olefile will continue to be improved as a separate project, and new
  124 +versions will be merged into Pillow regularly.
  125 +
  126 +Main improvements over the original version of OleFileIO in PIL:
  127 +----------------------------------------------------------------
  128 +
  129 +- Compatible with Python 3.x and 2.6+
  130 +- Many bug fixes
  131 +- Support for files larger than 6.8MB
  132 +- Support for 64 bits platforms and big-endian CPUs
  133 +- Robust: many checks to detect malformed files
  134 +- Runtime option to choose if malformed files should be parsed or raise
  135 + exceptions
  136 +- Improved API
  137 +- Metadata extraction, stream/storage timestamps (e.g. for document
  138 + forensics)
  139 +- Can open file-like objects
  140 +- Added setup.py and install.bat to ease installation
  141 +- More convenient slash-based syntax for stream paths
  142 +- Write features
  143 +
  144 +Documentation
  145 +-------------
  146 +
  147 +Please see the `online
  148 +documentation <https://bitbucket.org/decalage/olefileio_pl/wiki>`__ for
  149 +more information, especially the `OLE
  150 +overview <https://bitbucket.org/decalage/olefileio_pl/wiki/OLE_Overview>`__
  151 +and the `API
  152 +page <https://bitbucket.org/decalage/olefileio_pl/wiki/API>`__ which
  153 +describe how to use olefile in Python applications. A copy of the same
  154 +documentation is also provided in the doc subfolder of the olefile
  155 +package.
  156 +
  157 +Real-life examples
  158 +------------------
  159 +
  160 +A real-life example: `using OleFileIO\_PL for malware analysis and
  161 +forensics <http://blog.gregback.net/2011/03/using-remnux-for-forensic-puzzle-6/>`__.
  162 +
  163 +See also `this
  164 +paper <https://computer-forensics.sans.org/community/papers/gcfa/grow-forensic-tools-taxonomy-python-libraries-helpful-forensic-analysis_6879>`__
  165 +about python tools for forensics, which features olefile.
  166 +
  167 +License
  168 +-------
  169 +
  170 +olefile (formerly OleFileIO\_PL) is copyright (c) 2005-2016 Philippe
  171 +Lagadec (http://www.decalage.info)
  172 +
  173 +All rights reserved.
  174 +
  175 +Redistribution and use in source and binary forms, with or without
  176 +modification, are permitted provided that the following conditions are
  177 +met:
  178 +
  179 +- Redistributions of source code must retain the above copyright
  180 + notice, this list of conditions and the following disclaimer.
  181 +- Redistributions in binary form must reproduce the above copyright
  182 + notice, this list of conditions and the following disclaimer in the
  183 + documentation and/or other materials provided with the distribution.
  184 +
  185 +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
  186 +IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
  187 +TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
  188 +PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
  189 +HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
  190 +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
  191 +TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
  192 +PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
  193 +LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
  194 +NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
  195 +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  196 +
  197 +--------------
  198 +
  199 +olefile is based on source code from the OleFileIO module of the Python
  200 +Imaging Library (PIL) published by Fredrik Lundh under the following
  201 +license:
  202 +
  203 +The Python Imaging Library (PIL) is
  204 +
  205 +- Copyright (c) 1997-2005 by Secret Labs AB
  206 +- Copyright (c) 1995-2005 by Fredrik Lundh
  207 +
  208 +By obtaining, using, and/or copying this software and/or its associated
  209 +documentation, you agree that you have read, understood, and will comply
  210 +with the following terms and conditions:
  211 +
  212 +Permission to use, copy, modify, and distribute this software and its
  213 +associated documentation for any purpose and without fee is hereby
  214 +granted, provided that the above copyright notice appears in all copies,
  215 +and that both that copyright notice and this permission notice appear in
  216 +supporting documentation, and that the name of Secret Labs AB or the
  217 +author not be used in advertising or publicity pertaining to
  218 +distribution of the software without specific, written prior permission.
  219 +
  220 +SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO
  221 +THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
  222 +FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR
  223 +ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
  224 +RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
  225 +CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
  226 +CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
oletools/thirdparty/olefile/olefile.py
1 #!/usr/bin/env python 1 #!/usr/bin/env python
2 2
3 -# olefile (formerly OleFileIO_PL) version 0.43 2015-04-17 3 +# olefile (formerly OleFileIO_PL)
4 # 4 #
5 # Module to read/write Microsoft OLE2 files (also called Structured Storage or 5 # Module to read/write Microsoft OLE2 files (also called Structured Storage or
6 # Microsoft Compound Document File Format), such as Microsoft Office 97-2003 6 # Microsoft Compound Document File Format), such as Microsoft Office 97-2003
@@ -9,7 +9,7 @@ @@ -9,7 +9,7 @@
9 # 9 #
10 # Project website: http://www.decalage.info/olefile 10 # Project website: http://www.decalage.info/olefile
11 # 11 #
12 -# olefile is copyright (c) 2005-2015 Philippe Lagadec (http://www.decalage.info) 12 +# olefile is copyright (c) 2005-2016 Philippe Lagadec (http://www.decalage.info)
13 # 13 #
14 # olefile is based on the OleFileIO module from the PIL library v1.1.6 14 # olefile is based on the OleFileIO module from the PIL library v1.1.6
15 # See: http://www.pythonware.com/products/pil/index.htm 15 # See: http://www.pythonware.com/products/pil/index.htm
@@ -29,12 +29,12 @@ from __future__ import print_function # This version of olefile requires Pytho @@ -29,12 +29,12 @@ from __future__ import print_function # This version of olefile requires Pytho
29 29
30 30
31 __author__ = "Philippe Lagadec" 31 __author__ = "Philippe Lagadec"
32 -__date__ = "2015-04-17"  
33 -__version__ = '0.43' 32 +__date__ = "2016-02-02"
  33 +__version__ = '0.43'
34 34
35 #--- LICENSE ------------------------------------------------------------------ 35 #--- LICENSE ------------------------------------------------------------------
36 36
37 -# olefile (formerly OleFileIO_PL) is copyright (c) 2005-2015 Philippe Lagadec 37 +# olefile (formerly OleFileIO_PL) is copyright (c) 2005-2016 Philippe Lagadec
38 # (http://www.decalage.info) 38 # (http://www.decalage.info)
39 # 39 #
40 # All rights reserved. 40 # All rights reserved.
@@ -182,6 +182,14 @@ __version__ = '0.43' @@ -182,6 +182,14 @@ __version__ = '0.43'
182 # - added path_encoding option to override the default 182 # - added path_encoding option to override the default
183 # - fixed a bug in _list when a storage is empty 183 # - fixed a bug in _list when a storage is empty
184 # 2015-04-17 v0.43 PL: - slight changes in _OleDirectoryEntry 184 # 2015-04-17 v0.43 PL: - slight changes in _OleDirectoryEntry
  185 +# 2015-10-19 - fixed issue #26 in OleFileIO.getproperties
  186 +# (using id and type as local variable names)
  187 +# 2015-10-29 - replaced debug() with proper logging
  188 +# - use optparse to handle command line options
  189 +# - improved attribute names in OleFileIO class
  190 +# 2015-11-05 - fixed issue #27 by correcting the MiniFAT sector
  191 +# cutoff size if invalid.
  192 +# 2016-02-02 - logging is disabled by default
185 193
186 #----------------------------------------------------------------------------- 194 #-----------------------------------------------------------------------------
187 # TODO (for version 1.0): 195 # TODO (for version 1.0):
@@ -257,7 +265,7 @@ __version__ = '0.43' @@ -257,7 +265,7 @@ __version__ = '0.43'
257 265
258 import io 266 import io
259 import sys 267 import sys
260 -import struct, array, os.path, datetime 268 +import struct, array, os.path, datetime, logging
261 269
262 #=== COMPATIBILITY WORKAROUNDS ================================================ 270 #=== COMPATIBILITY WORKAROUNDS ================================================
263 271
@@ -327,30 +335,46 @@ else: @@ -327,30 +335,46 @@ else:
327 DEFAULT_PATH_ENCODING = None 335 DEFAULT_PATH_ENCODING = None
328 336
329 337
330 -#=== DEBUGGING =============================================================== 338 +# === LOGGING =================================================================
331 339
332 -#TODO: replace this by proper logging  
333 -  
334 -#[PL] DEBUG display mode: False by default, use set_debug_mode() or "-d" on  
335 -# command line to change it.  
336 -DEBUG_MODE = False  
337 -def debug_print(msg):  
338 - print(msg)  
339 -def debug_pass(msg):  
340 - pass  
341 -debug = debug_pass 340 +class NullHandler(logging.Handler):
  341 + """
  342 + Log Handler without output, to avoid printing messages if logging is not
  343 + configured by the main application.
  344 + Python 2.7 has logging.NullHandler, but this is necessary for 2.6:
  345 + see https://docs.python.org/2.6/library/logging.html#configuring-logging-for-a-library
  346 + """
  347 + def emit(self, record):
  348 + pass
342 349
343 -def set_debug_mode(debug_mode): 350 +def get_logger(name, level=logging.CRITICAL+1):
344 """ 351 """
345 - Set debug mode on or off, to control display of debugging messages.  
346 - :param mode: True or False 352 + Create a suitable logger object for this module.
  353 + The goal is not to change settings of the root logger, to avoid getting
  354 + other modules' logs on the screen.
  355 + If a logger exists with same name, reuse it. (Else it would have duplicate
  356 + handlers and messages would be doubled.)
  357 + The level is set to CRITICAL+1 by default, to avoid any logging.
347 """ 358 """
348 - global DEBUG_MODE, debug  
349 - DEBUG_MODE = debug_mode  
350 - if debug_mode:  
351 - debug = debug_print  
352 - else:  
353 - debug = debug_pass 359 + # First, test if there is already a logger with the same name, else it
  360 + # will generate duplicate messages (due to duplicate handlers):
  361 + if name in logging.Logger.manager.loggerDict:
  362 + #NOTE: another less intrusive but more "hackish" solution would be to
  363 + # use getLogger then test if its effective level is not default.
  364 + logger = logging.getLogger(name)
  365 + # make sure level is OK:
  366 + logger.setLevel(level)
  367 + return logger
  368 + # get a new logger:
  369 + logger = logging.getLogger(name)
  370 + # only add a NullHandler for this logger, it is up to the application
  371 + # to configure its own logging:
  372 + logger.addHandler(NullHandler())
  373 + logger.setLevel(level)
  374 + return logger
  375 +
  376 +# a global logger object used for debugging:
  377 +log = get_logger('olefile')
354 378
355 379
356 #=== CONSTANTS =============================================================== 380 #=== CONSTANTS ===============================================================
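The logging hunk above replaces the old `debug()` switch with a logger that stays silent until the application opts in. The sketch below shows that pattern in isolation, using only the standard library. The names `get_quiet_logger`, `demo_lib` and `ListHandler` are placeholders for illustration and are not part of olefile. It uses `logging.NullHandler` from Python 2.7+, rather than the custom class the diff defines for Python 2.6.

```python
import logging


def get_quiet_logger(name, level=logging.CRITICAL + 1):
    """Hypothetical helper: return a logger that emits nothing by default."""
    logger = logging.getLogger(name)
    # Attach a NullHandler only once, so repeated calls do not stack handlers.
    if not any(isinstance(h, logging.NullHandler) for h in logger.handlers):
        logger.addHandler(logging.NullHandler())
    # CRITICAL+1 is above every standard level, so all records are dropped.
    logger.setLevel(level)
    return logger


log = get_quiet_logger('demo_lib')  # 'demo_lib' is a placeholder name

# Silent by default: even CRITICAL records are below the configured level.
assert not log.isEnabledFor(logging.CRITICAL)

# An application that wants the messages lowers the level and adds a handler.
records = []


class ListHandler(logging.Handler):
    """Collect formatted messages in a list instead of printing them."""

    def emit(self, record):
        records.append(record.getMessage())


log.setLevel(logging.DEBUG)
log.addHandler(ListHandler())
log.debug('nb_sectors = %d', 3)
```

Leaving handler and level choices to the application means that importing the library never prints anything unexpectedly.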
@@ -518,7 +542,7 @@ def filetime2datetime(filetime): @@ -518,7 +542,7 @@ def filetime2datetime(filetime):
518 # TODO: manage exception when microseconds is too large 542 # TODO: manage exception when microseconds is too large
519 # inspired from http://code.activestate.com/recipes/511425-filetime-to-datetime/ 543 # inspired from http://code.activestate.com/recipes/511425-filetime-to-datetime/
520 _FILETIME_null_date = datetime.datetime(1601, 1, 1, 0, 0, 0) 544 _FILETIME_null_date = datetime.datetime(1601, 1, 1, 0, 0, 0)
521 - #debug('timedelta days=%d' % (filetime//(10*1000000*3600*24))) 545 + #log.debug('timedelta days=%d' % (filetime//(10*1000000*3600*24)))
522 return _FILETIME_null_date + datetime.timedelta(microseconds=filetime//10) 546 return _FILETIME_null_date + datetime.timedelta(microseconds=filetime//10)
523 547
524 548
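As a worked check of the conversion in `filetime2datetime`, the sketch below re-implements the same arithmetic under a placeholder name, `filetime_to_datetime`, and applies it to the FILETIME value of the Unix epoch. A FILETIME counts 100-nanosecond intervals since 1601-01-01, so integer division by 10 gives microseconds.

```python
import datetime


def filetime_to_datetime(filetime):
    """Convert a FILETIME (100 ns ticks since 1601-01-01) to a datetime."""
    filetime_null_date = datetime.datetime(1601, 1, 1, 0, 0, 0)
    # 10 ticks of 100 ns make one microsecond.
    return filetime_null_date + datetime.timedelta(microseconds=filetime // 10)


# 11644473600 seconds separate 1601-01-01 from 1970-01-01.
unix_epoch_filetime = 11644473600 * 10**7
print(filetime_to_datetime(unix_epoch_filetime))  # 1970-01-01 00:00:00
```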
@@ -695,6 +719,7 @@ class _OleStream(io.BytesIO): @@ -695,6 +719,7 @@ class _OleStream(io.BytesIO):
695 719
696 - size: actual size of data stream, after it was opened. 720 - size: actual size of data stream, after it was opened.
697 """ 721 """
  722 + #TODO: use _raise_defect instead of exceptions
698 723
699 # FIXME: should store the list of sects obtained by following 724 # FIXME: should store the list of sects obtained by following
700 # the fat chain, and load new sectors on demand instead of 725 # the fat chain, and load new sectors on demand instead of
@@ -713,8 +738,8 @@ class _OleStream(io.BytesIO): @@ -713,8 +738,8 @@ class _OleStream(io.BytesIO):
713 :param filesize: size of OLE file (for debugging) 738 :param filesize: size of OLE file (for debugging)
714 :returns: a BytesIO instance containing the OLE stream 739 :returns: a BytesIO instance containing the OLE stream
715 """ 740 """
716 - debug('_OleStream.__init__:')  
717 - debug(' sect=%d (%X), size=%d, offset=%d, sectorsize=%d, len(fat)=%d, fp=%s' 741 + log.debug('_OleStream.__init__:')
  742 + log.debug(' sect=%d (%X), size=%d, offset=%d, sectorsize=%d, len(fat)=%d, fp=%s'
718 %(sect,sect,size,offset,sectorsize,len(fat), repr(fp))) 743 %(sect,sect,size,offset,sectorsize,len(fat), repr(fp)))
719 #[PL] To detect malformed documents with FAT loops, we compute the 744 #[PL] To detect malformed documents with FAT loops, we compute the
720 # expected number of sectors in the stream: 745 # expected number of sectors in the stream:
@@ -726,9 +751,9 @@ class _OleStream(io.BytesIO): @@ -726,9 +751,9 @@ class _OleStream(io.BytesIO):
726 size = len(fat)*sectorsize 751 size = len(fat)*sectorsize
727 # and we keep a record that size was unknown: 752 # and we keep a record that size was unknown:
728 unknown_size = True 753 unknown_size = True
729 - debug(' stream with UNKNOWN SIZE') 754 + log.debug(' stream with UNKNOWN SIZE')
730 nb_sectors = (size + (sectorsize-1)) // sectorsize 755 nb_sectors = (size + (sectorsize-1)) // sectorsize
731 - debug('nb_sectors = %d' % nb_sectors) 756 + log.debug('nb_sectors = %d' % nb_sectors)
732 # This number should (at least) be less than the total number of 757 # This number should (at least) be less than the total number of
733 # sectors in the given FAT: 758 # sectors in the given FAT:
734 if nb_sectors > len(fat): 759 if nb_sectors > len(fat):
@@ -739,7 +764,7 @@ class _OleStream(io.BytesIO): @@ -739,7 +764,7 @@ class _OleStream(io.BytesIO):
739 data = [] 764 data = []
740 # if size is zero, then first sector index should be ENDOFCHAIN: 765 # if size is zero, then first sector index should be ENDOFCHAIN:
741 if size == 0 and sect != ENDOFCHAIN: 766 if size == 0 and sect != ENDOFCHAIN:
742 - debug('size == 0 and sect != ENDOFCHAIN:') 767 + log.debug('size == 0 and sect != ENDOFCHAIN:')
743 raise IOError('incorrect OLE sector index for empty stream') 768 raise IOError('incorrect OLE sector index for empty stream')
744 #[PL] A fixed-length for loop is used instead of an undefined while 769 #[PL] A fixed-length for loop is used instead of an undefined while
745 # loop to avoid DoS attacks: 770 # loop to avoid DoS attacks:
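The comments in the hunks above describe two safeguards: deriving the expected sector count from the declared size, and walking the FAT with a fixed-length loop so that a cyclic chain cannot run forever. The sketch below is a simplified illustration of that idea and is not the code from olefile.py. `walk_fat_chain` is a hypothetical name, it collects sector indexes instead of reading data, and the `ENDOFCHAIN` value is assumed from the OLE2 format.

```python
ENDOFCHAIN = 0xFFFFFFFE  # assumed end-of-chain marker of the OLE2 format


def walk_fat_chain(fat, first_sect, size, sectorsize):
    """Return the sector indexes of a stream, bounded by its declared size."""
    # Round up: a stream of `size` bytes needs this many sectors.
    nb_sectors = (size + (sectorsize - 1)) // sectorsize
    # A stream cannot use more sectors than the FAT describes.
    if nb_sectors > len(fat):
        raise IOError('malformed OLE document, stream too large')
    chain = []
    sect = first_sect
    # Fixed-length loop: a FAT that points back to itself cannot loop forever.
    for _ in range(nb_sectors):
        if sect == ENDOFCHAIN:
            raise IOError('incomplete OLE stream')
        if sect < 0 or sect >= len(fat):
            raise IOError('incorrect OLE FAT, sector index out of range')
        chain.append(sect)
        sect = fat[sect]
    return chain


# A well-formed chain: sector 0 -> 1 -> 2 -> end of chain.
print(walk_fat_chain([1, 2, ENDOFCHAIN], 0, 1200, 512))

# A cyclic FAT (0 -> 1 -> 0 -> ...) still terminates after two iterations.
print(walk_fat_chain([1, 0], 0, 1024, 512))
```

Because the iteration count comes from the declared size and is capped by the FAT length, a crafted file can at worst trigger an exception.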
@@ -750,24 +775,24 @@ class _OleStream(io.BytesIO): @@ -750,24 +775,24 @@ class _OleStream(io.BytesIO):
750 break 775 break
751 else: 776 else:
752 # else this means that the stream is smaller than declared: 777 # else this means that the stream is smaller than declared:
753 - debug('sect=ENDOFCHAIN before expected size') 778 + log.debug('sect=ENDOFCHAIN before expected size')
754 raise IOError('incomplete OLE stream') 779 raise IOError('incomplete OLE stream')
755 # sector index should be within FAT: 780 # sector index should be within FAT:
756 if sect<0 or sect>=len(fat): 781 if sect<0 or sect>=len(fat):
757 - debug('sect=%d (%X) / len(fat)=%d' % (sect, sect, len(fat)))  
758 - debug('i=%d / nb_sectors=%d' %(i, nb_sectors)) 782 + log.debug('sect=%d (%X) / len(fat)=%d' % (sect, sect, len(fat)))
  783 + log.debug('i=%d / nb_sectors=%d' %(i, nb_sectors))
759 ## tmp_data = b"".join(data) 784 ## tmp_data = b"".join(data)
760 ## f = open('test_debug.bin', 'wb') 785 ## f = open('test_debug.bin', 'wb')
761 ## f.write(tmp_data) 786 ## f.write(tmp_data)
762 ## f.close() 787 ## f.close()
763 -## debug('data read so far: %d bytes' % len(tmp_data)) 788 +## log.debug('data read so far: %d bytes' % len(tmp_data))
764 raise IOError('incorrect OLE FAT, sector index out of range') 789 raise IOError('incorrect OLE FAT, sector index out of range')
765 #TODO: merge this code with OleFileIO.getsect() ? 790 #TODO: merge this code with OleFileIO.getsect() ?
766 #TODO: check if this works with 4K sectors: 791 #TODO: check if this works with 4K sectors:
767 try: 792 try:
768 fp.seek(offset + sectorsize * sect) 793 fp.seek(offset + sectorsize * sect)
769 except: 794 except:
770 - debug('sect=%d, seek=%d, filesize=%d' % 795 + log.debug('sect=%d, seek=%d, filesize=%d' %
771 (sect, offset+sectorsize*sect, filesize)) 796 (sect, offset+sectorsize*sect, filesize))
772 raise IOError('OLE sector index out of range') 797 raise IOError('OLE sector index out of range')
773 sector_data = fp.read(sectorsize) 798 sector_data = fp.read(sectorsize)
@@ -776,9 +801,9 @@ class _OleStream(io.BytesIO): @@ -776,9 +801,9 @@ class _OleStream(io.BytesIO):
776 # complete sector (of 512 or 4K), so we may read less than 801 # complete sector (of 512 or 4K), so we may read less than
777 # sectorsize. 802 # sectorsize.
778 if len(sector_data)!=sectorsize and sect!=(len(fat)-1): 803 if len(sector_data)!=sectorsize and sect!=(len(fat)-1):
779 - debug('sect=%d / len(fat)=%d, seek=%d / filesize=%d, len read=%d' % 804 + log.debug('sect=%d / len(fat)=%d, seek=%d / filesize=%d, len read=%d' %
780 (sect, len(fat), offset+sectorsize*sect, filesize, len(sector_data))) 805 (sect, len(fat), offset+sectorsize*sect, filesize, len(sector_data)))
781 - debug('seek+len(read)=%d' % (offset+sectorsize*sect+len(sector_data))) 806 + log.debug('seek+len(read)=%d' % (offset+sectorsize*sect+len(sector_data)))
782 raise IOError('incomplete OLE sector') 807 raise IOError('incomplete OLE sector')
783 data.append(sector_data) 808 data.append(sector_data)
784 # jump to next sector in the FAT: 809 # jump to next sector in the FAT:
@@ -802,7 +827,8 @@ class _OleStream(io.BytesIO): @@ -802,7 +827,8 @@ class _OleStream(io.BytesIO):
802 self.size = len(data) 827 self.size = len(data)
803 else: 828 else:
804 # read data is less than expected: 829 # read data is less than expected:
805 - debug('len(data)=%d, size=%d' % (len(data), size)) 830 + log.debug('len(data)=%d, size=%d' % (len(data), size))
  831 + # TODO: provide details in exception message
806 raise IOError('OLE stream size is less than declared') 832 raise IOError('OLE stream size is less than declared')
807 # when all data is read in memory, BytesIO constructor is called 833 # when all data is read in memory, BytesIO constructor is called
808 io.BytesIO.__init__(self, data) 834 io.BytesIO.__init__(self, data)
@@ -888,7 +914,7 @@ class _OleDirectoryEntry: @@ -888,7 +914,7 @@ class _OleDirectoryEntry:
888 olefile._raise_defect(DEFECT_INCORRECT, 'duplicate OLE root entry') 914 olefile._raise_defect(DEFECT_INCORRECT, 'duplicate OLE root entry')
889 if sid == 0 and self.entry_type != STGTY_ROOT: 915 if sid == 0 and self.entry_type != STGTY_ROOT:
890 olefile._raise_defect(DEFECT_INCORRECT, 'incorrect OLE root entry') 916 olefile._raise_defect(DEFECT_INCORRECT, 'incorrect OLE root entry')
891 - #debug (struct.unpack(fmt_entry, entry[:len_entry])) 917 + #log.debug(struct.unpack(fmt_entry, entry[:len_entry]))
892 # name should be at most 31 unicode characters + null character, 918 # name should be at most 31 unicode characters + null character,
893 # so 64 bytes in total (31*2 + 2): 919 # so 64 bytes in total (31*2 + 2):
894 if self.namelength>64: 920 if self.namelength>64:
@@ -903,10 +929,10 @@ class _OleDirectoryEntry: @@ -903,10 +929,10 @@ class _OleDirectoryEntry:
903 # name is converted from UTF-16LE to the path encoding specified in the OleFileIO: 929 # name is converted from UTF-16LE to the path encoding specified in the OleFileIO:
904 self.name = olefile._decode_utf16_str(self.name_utf16) 930 self.name = olefile._decode_utf16_str(self.name_utf16)
905 931
906 - debug('DirEntry SID=%d: %s' % (self.sid, repr(self.name)))  
907 - debug(' - type: %d' % self.entry_type)  
908 - debug(' - sect: %d' % self.isectStart)  
909 - debug(' - SID left: %d, right: %d, child: %d' % (self.sid_left, 932 + log.debug('DirEntry SID=%d: %s' % (self.sid, repr(self.name)))
  933 + log.debug(' - type: %d' % self.entry_type)
  934 + log.debug(' - sect: %Xh' % self.isectStart)
  935 + log.debug(' - SID left: %d, right: %d, child: %d' % (self.sid_left,
910 self.sid_right, self.sid_child)) 936 self.sid_right, self.sid_child))
911 937
912 # sizeHigh is only used for 4K sectors, it should be zero for 512 bytes 938 # sizeHigh is only used for 4K sectors, it should be zero for 512 bytes
@@ -914,13 +940,14 @@ class _OleDirectoryEntry: @@ -914,13 +940,14 @@ class _OleDirectoryEntry:
914 # or some other value so it cannot be raised as a defect in general: 940 # or some other value so it cannot be raised as a defect in general:
915 if olefile.sectorsize == 512: 941 if olefile.sectorsize == 512:
916 if self.sizeHigh != 0 and self.sizeHigh != 0xFFFFFFFF: 942 if self.sizeHigh != 0 and self.sizeHigh != 0xFFFFFFFF:
917 - debug('sectorsize=%d, sizeLow=%d, sizeHigh=%d (%X)' % 943 + log.debug('sectorsize=%d, sizeLow=%d, sizeHigh=%d (%X)' %
918 (olefile.sectorsize, self.sizeLow, self.sizeHigh, self.sizeHigh)) 944 (olefile.sectorsize, self.sizeLow, self.sizeHigh, self.sizeHigh))
919 olefile._raise_defect(DEFECT_UNSURE, 'incorrect OLE stream size') 945 olefile._raise_defect(DEFECT_UNSURE, 'incorrect OLE stream size')
920 self.size = self.sizeLow 946 self.size = self.sizeLow
921 else: 947 else:
922 self.size = self.sizeLow + (long(self.sizeHigh)<<32) 948 self.size = self.sizeLow + (long(self.sizeHigh)<<32)
923 - debug(' - size: %d (sizeLow=%d, sizeHigh=%d)' % (self.size, self.sizeLow, self.sizeHigh)) 949 + log.debug(' - size: %d (sizeLow=%d, sizeHigh=%d)' % (self.size, self.sizeLow, self.sizeHigh))
  950 +
924 self.clsid = _clsid(clsid) 951 self.clsid = _clsid(clsid)
925 # a storage should have a null size, BUT some implementations such as 952 # a storage should have a null size, BUT some implementations such as
926 # Word 8 for Mac seem to allow non-null values => Potential defect: 953 # Word 8 for Mac seem to allow non-null values => Potential defect:
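Reviewer note: the size logic in this hunk can be read in isolation. Below is a minimal sketch of how the two 32-bit directory fields combine into one stream size, assuming the same rule as the diff (the high word is ignored for 512-byte sectors). The function name `stream_size` is hypothetical and not part of olefile.

```python
def stream_size(size_low, size_high, sector_size):
    """Return the stream size in bytes from the two 32-bit directory fields.

    Hypothetical helper mirroring the rule shown in the hunk above.
    """
    if sector_size == 512:
        # With 512-byte sectors only the low 32 bits are meaningful;
        # the high word may hold 0, 0xFFFFFFFF or other values, so it is ignored.
        return size_low
    # With 4096-byte sectors the two fields form a single 64-bit value.
    return size_low + (size_high << 32)
```

In Python 3 the `long()` call from the original is unnecessary, since `int` already has arbitrary precision.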
@@ -945,7 +972,7 @@ class _OleDirectoryEntry: @@ -945,7 +972,7 @@ class _OleDirectoryEntry:
945 Note that this method builds a tree of all subentries, so it should 972 Note that this method builds a tree of all subentries, so it should
946 only be called for the root object once. 973 only be called for the root object once.
947 """ 974 """
948 - debug('build_storage_tree: SID=%d - %s - sid_child=%d' 975 + log.debug('build_storage_tree: SID=%d - %s - sid_child=%d'
949 % (self.sid, repr(self.name), self.sid_child)) 976 % (self.sid, repr(self.name), self.sid_child))
950 if self.sid_child != NOSTREAM: 977 if self.sid_child != NOSTREAM:
951 # if child SID is not NOSTREAM, then this entry is a storage. 978 # if child SID is not NOSTREAM, then this entry is a storage.
@@ -980,7 +1007,7 @@ class _OleDirectoryEntry: @@ -980,7 +1007,7 @@ class _OleDirectoryEntry:
980 self.olefile._raise_defect(DEFECT_FATAL, 'OLE DirEntry index out of range') 1007 self.olefile._raise_defect(DEFECT_FATAL, 'OLE DirEntry index out of range')
981 # get child direntry: 1008 # get child direntry:
982 child = self.olefile._load_direntry(child_sid) #direntries[child_sid] 1009 child = self.olefile._load_direntry(child_sid) #direntries[child_sid]
983 - debug('append_kids: child_sid=%d - %s - sid_left=%d, sid_right=%d, sid_child=%d' 1010 + log.debug('append_kids: child_sid=%d - %s - sid_left=%d, sid_right=%d, sid_child=%d'
984 % (child.sid, repr(child.name), child.sid_left, child.sid_right, child.sid_child)) 1011 % (child.sid, repr(child.name), child.sid_left, child.sid_right, child.sid_child))
985 # the directory entries are organized as a red-black tree. 1012 # the directory entries are organized as a red-black tree.
986 # (cf. Wikipedia for details) 1013 # (cf. Wikipedia for details)
@@ -1121,14 +1148,13 @@ class OleFileIO: @@ -1121,14 +1148,13 @@ class OleFileIO:
1121 :param write_mode: bool, if True the file is opened in read/write mode instead 1148 :param write_mode: bool, if True the file is opened in read/write mode instead
1122 of read-only by default. 1149 of read-only by default.
1123 1150
1124 - :param debug: bool, set debug mode 1151 + :param debug: bool, set debug mode (deprecated, not used anymore)
1125 1152
1126 :param path_encoding: None or str, name of the codec to use for path 1153 :param path_encoding: None or str, name of the codec to use for path
1127 names (streams and storages), or None for Unicode. 1154 names (streams and storages), or None for Unicode.
1128 Unicode by default on Python 3+, UTF-8 on Python 2.x. 1155 Unicode by default on Python 3+, UTF-8 on Python 2.x.
1129 (new in olefile 0.42, was hardcoded to Latin-1 until olefile v0.41) 1156 (new in olefile 0.42, was hardcoded to Latin-1 until olefile v0.41)
1130 """ 1157 """
1131 - set_debug_mode(debug)  
1132 # minimal level for defects to be raised as exceptions: 1158 # minimal level for defects to be raised as exceptions:
1133 self._raise_defects_level = raise_defects 1159 self._raise_defects_level = raise_defects
1134 # list of defects/issues not raised as exceptions: 1160 # list of defects/issues not raised as exceptions:
@@ -1160,10 +1186,12 @@ class OleFileIO: @@ -1160,10 +1186,12 @@ class OleFileIO:
1160 """ 1186 """
1161 # added by [PL] 1187 # added by [PL]
1162 if defect_level >= self._raise_defects_level: 1188 if defect_level >= self._raise_defects_level:
  1189 + log.error(message)
1163 raise exception_type(message) 1190 raise exception_type(message)
1164 else: 1191 else:
1165 # just record the issue, no exception raised: 1192 # just record the issue, no exception raised:
1166 self.parsing_issues.append((exception_type, message)) 1193 self.parsing_issues.append((exception_type, message))
  1194 + log.warning(message)
1167 1195
1168 1196
1169 def _decode_utf16_str(self, utf16_str, errors='replace'): 1197 def _decode_utf16_str(self, utf16_str, errors='replace'):
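Reviewer note: the `_raise_defect` change above pairs each outcome with a log call. The sketch below reproduces that pattern outside the class, assuming a standard `logging` logger. The logger name, the `DefectTracker` class and the numeric defect levels are illustrative assumptions, not olefile's actual definitions.

```python
import logging

# Hypothetical logger; a NullHandler keeps it silent unless the caller configures logging.
log = logging.getLogger("olefile_sketch")
log.addHandler(logging.NullHandler())

# Illustrative severity levels (assumed values, ordered from least to most severe).
DEFECT_UNSURE, DEFECT_POTENTIAL, DEFECT_INCORRECT, DEFECT_FATAL = 10, 20, 30, 40


class DefectTracker:
    """Raise defects at or above a threshold, record and log the others."""

    def __init__(self, raise_defects_level=DEFECT_FATAL):
        self.raise_defects_level = raise_defects_level
        self.parsing_issues = []

    def raise_defect(self, defect_level, message, exception_type=IOError):
        if defect_level >= self.raise_defects_level:
            # Severe enough: log as an error, then raise.
            log.error(message)
            raise exception_type(message)
        # Otherwise just record the issue and log a warning.
        self.parsing_issues.append((exception_type, message))
        log.warning(message)
```

With this shape, a caller that never configures logging sees no output, while the recorded issues remain available in `parsing_issues`.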
@@ -1235,6 +1263,7 @@ class OleFileIO: @@ -1235,6 +1263,7 @@ class OleFileIO:
1235 finally: 1263 finally:
1236 self.fp.seek(0) 1264 self.fp.seek(0)
1237 self._filesize = filesize 1265 self._filesize = filesize
  1266 + log.debug('File size: %d' % self._filesize)
1238 1267
1239 # lists of streams in FAT and MiniFAT, to detect duplicate references 1268 # lists of streams in FAT and MiniFAT, to detect duplicate references
1240 # (list of indexes of first sectors of each stream) 1269 # (list of indexes of first sectors of each stream)
@@ -1244,6 +1273,7 @@ class OleFileIO: @@ -1244,6 +1273,7 @@ class OleFileIO:
1244 header = self.fp.read(512) 1273 header = self.fp.read(512)
1245 1274
1246 if len(header) != 512 or header[:8] != MAGIC: 1275 if len(header) != 512 or header[:8] != MAGIC:
  1276 + log.debug('Magic = %r instead of %r' % (header[:8], MAGIC))
1247 self._raise_defect(DEFECT_FATAL, "not an OLE2 structured storage file") 1277 self._raise_defect(DEFECT_FATAL, "not an OLE2 structured storage file")
1248 1278
1249 # [PL] header structure according to AAF specifications: 1279 # [PL] header structure according to AAF specifications:
@@ -1285,120 +1315,125 @@ class OleFileIO: @@ -1285,120 +1315,125 @@ class OleFileIO:
1285 # '<' indicates little-endian byte ordering for Intel (cf. struct module help) 1315 # '<' indicates little-endian byte ordering for Intel (cf. struct module help)
1286 fmt_header = '<8s16sHHHHHHLLLLLLLLLL' 1316 fmt_header = '<8s16sHHHHHHLLLLLLLLLL'
1287 header_size = struct.calcsize(fmt_header) 1317 header_size = struct.calcsize(fmt_header)
1288 - debug( "fmt_header size = %d, +FAT = %d" % (header_size, header_size + 109*4) ) 1318 + log.debug( "fmt_header size = %d, +FAT = %d" % (header_size, header_size + 109*4) )
1289 header1 = header[:header_size] 1319 header1 = header[:header_size]
1290 ( 1320 (
1291 - self.Sig,  
1292 - self.clsid,  
1293 - self.MinorVersion,  
1294 - self.DllVersion,  
1295 - self.ByteOrder,  
1296 - self.SectorShift,  
1297 - self.MiniSectorShift,  
1298 - self.Reserved, self.Reserved1,  
1299 - self.csectDir,  
1300 - self.csectFat,  
1301 - self.sectDirStart,  
1302 - self.signature,  
1303 - self.MiniSectorCutoff,  
1304 - self.MiniFatStart,  
1305 - self.csectMiniFat,  
1306 - self.sectDifStart,  
1307 - self.csectDif 1321 + self.header_signature,
  1322 + self.header_clsid,
  1323 + self.minor_version,
  1324 + self.dll_version,
  1325 + self.byte_order,
  1326 + self.sector_shift,
  1327 + self.mini_sector_shift,
  1328 + self.reserved1,
  1329 + self.reserved2,
  1330 + self.num_dir_sectors,
  1331 + self.num_fat_sectors,
  1332 + self.first_dir_sector,
  1333 + self.transaction_signature_number,
  1334 + self.mini_stream_cutoff_size,
  1335 + self.first_mini_fat_sector,
  1336 + self.num_mini_fat_sectors,
  1337 + self.first_difat_sector,
  1338 + self.num_difat_sectors
1308 ) = struct.unpack(fmt_header, header1) 1339 ) = struct.unpack(fmt_header, header1)
1309 - debug( struct.unpack(fmt_header, header1)) 1340 + log.debug( struct.unpack(fmt_header, header1))
1310 1341
1311 - if self.Sig != MAGIC: 1342 + if self.header_signature != MAGIC:
1312 # OLE signature should always be present 1343 # OLE signature should always be present
1313 self._raise_defect(DEFECT_FATAL, "incorrect OLE signature") 1344 self._raise_defect(DEFECT_FATAL, "incorrect OLE signature")
1314 - if self.clsid != bytearray(16): 1345 + if self.header_clsid != bytearray(16):
1315 # according to AAF specs, CLSID should always be zero 1346 # according to AAF specs, CLSID should always be zero
1316 self._raise_defect(DEFECT_INCORRECT, "incorrect CLSID in OLE header") 1347 self._raise_defect(DEFECT_INCORRECT, "incorrect CLSID in OLE header")
1317 - debug( "MinorVersion = %d" % self.MinorVersion )  
1318 - debug( "DllVersion = %d" % self.DllVersion )  
1319 - if self.DllVersion not in [3, 4]: 1348 + log.debug( "Minor Version = %d" % self.minor_version )
  1349 + log.debug( "DLL Version = %d (expected: 3 or 4)" % self.dll_version )
  1350 + if self.dll_version not in [3, 4]:
1320 # version 3: usual format, 512 bytes per sector 1351 # version 3: usual format, 512 bytes per sector
1321 # version 4: large format, 4K per sector 1352 # version 4: large format, 4K per sector
1322 self._raise_defect(DEFECT_INCORRECT, "incorrect DllVersion in OLE header") 1353 self._raise_defect(DEFECT_INCORRECT, "incorrect DllVersion in OLE header")
1323 - debug( "ByteOrder = %X" % self.ByteOrder )  
1324 - if self.ByteOrder != 0xFFFE: 1354 + log.debug( "Byte Order = %X (expected: FFFE)" % self.byte_order )
  1355 + if self.byte_order != 0xFFFE:
1325 # For now only common little-endian documents are handled correctly 1356 # For now only common little-endian documents are handled correctly
1326 self._raise_defect(DEFECT_FATAL, "incorrect ByteOrder in OLE header") 1357 self._raise_defect(DEFECT_FATAL, "incorrect ByteOrder in OLE header")
1327 # TODO: add big-endian support for documents created on Mac ? 1358 # TODO: add big-endian support for documents created on Mac ?
1328 # But according to [MS-CFB] ? v20140502, ByteOrder MUST be 0xFFFE. 1359 # But according to [MS-CFB] ? v20140502, ByteOrder MUST be 0xFFFE.
1329 - self.SectorSize = 2**self.SectorShift  
1330 - debug( "SectorSize = %d" % self.SectorSize )  
1331 - if self.SectorSize not in [512, 4096]:  
1332 - self._raise_defect(DEFECT_INCORRECT, "incorrect SectorSize in OLE header")  
1333 - if (self.DllVersion==3 and self.SectorSize!=512) \  
1334 - or (self.DllVersion==4 and self.SectorSize!=4096):  
1335 - self._raise_defect(DEFECT_INCORRECT, "SectorSize does not match DllVersion in OLE header")  
1336 - self.MiniSectorSize = 2**self.MiniSectorShift  
1337 - debug( "MiniSectorSize = %d" % self.MiniSectorSize )  
1338 - if self.MiniSectorSize not in [64]:  
1339 - self._raise_defect(DEFECT_INCORRECT, "incorrect MiniSectorSize in OLE header")  
1340 - if self.Reserved != 0 or self.Reserved1 != 0: 1360 + self.sector_size = 2**self.sector_shift
  1361 + log.debug( "Sector Size = %d bytes (expected: 512 or 4096)" % self.sector_size )
  1362 + if self.sector_size not in [512, 4096]:
  1363 + self._raise_defect(DEFECT_INCORRECT, "incorrect sector_size in OLE header")
  1364 + if (self.dll_version==3 and self.sector_size!=512) \
  1365 + or (self.dll_version==4 and self.sector_size!=4096):
  1366 + self._raise_defect(DEFECT_INCORRECT, "sector_size does not match DllVersion in OLE header")
  1367 + self.mini_sector_size = 2**self.mini_sector_shift
  1368 + log.debug( "MiniFAT Sector Size = %d bytes (expected: 64)" % self.mini_sector_size )
  1369 + if self.mini_sector_size not in [64]:
  1370 + self._raise_defect(DEFECT_INCORRECT, "incorrect mini_sector_size in OLE header")
  1371 + if self.reserved1 != 0 or self.reserved2 != 0:
1341 self._raise_defect(DEFECT_INCORRECT, "incorrect OLE header (non-null reserved bytes)") 1372 self._raise_defect(DEFECT_INCORRECT, "incorrect OLE header (non-null reserved bytes)")
1342 - debug( "csectDir = %d" % self.csectDir ) 1373 + log.debug( "Number of directory sectors = %d" % self.num_dir_sectors )
1343 # Number of directory sectors (only allowed if DllVersion != 3) 1374 # Number of directory sectors (only allowed if DllVersion != 3)
1344 - if self.SectorSize==512 and self.csectDir!=0:  
1345 - self._raise_defect(DEFECT_INCORRECT, "incorrect csectDir in OLE header")  
1346 - debug( "csectFat = %d" % self.csectFat )  
1347 - # csectFat = number of FAT sectors in the file  
1348 - debug( "sectDirStart = %X" % self.sectDirStart )  
1349 - # sectDirStart = 1st sector containing the directory  
1350 - debug( "signature = %d" % self.signature ) 1375 + if self.sector_size==512 and self.num_dir_sectors!=0:
  1376 + self._raise_defect(DEFECT_INCORRECT, "incorrect number of directory sectors in OLE header")
  1377 + log.debug( "num_fat_sectors = %d" % self.num_fat_sectors )
  1378 + # num_fat_sectors = number of FAT sectors in the file
  1379 + log.debug( "first_dir_sector = %X" % self.first_dir_sector )
  1380 + # first_dir_sector = 1st sector containing the directory
  1381 + log.debug( "transaction_signature_number = %d" % self.transaction_signature_number )
1351 # Signature should be zero, BUT some implementations do not follow this 1382 # Signature should be zero, BUT some implementations do not follow this
1352 # rule => only a potential defect: 1383 # rule => only a potential defect:
1353 # (according to MS-CFB, may be != 0 for applications supporting file 1384 # (according to MS-CFB, may be != 0 for applications supporting file
1354 # transactions) 1385 # transactions)
1355 - if self.signature != 0:  
1356 - self._raise_defect(DEFECT_POTENTIAL, "incorrect OLE header (signature>0)")  
1357 - debug( "MiniSectorCutoff = %d" % self.MiniSectorCutoff ) 1386 + if self.transaction_signature_number != 0:
  1387 + self._raise_defect(DEFECT_POTENTIAL, "incorrect OLE header (transaction_signature_number>0)")
  1388 + log.debug( "mini_stream_cutoff_size = 0x%X (expected: 0x1000)" % self.mini_stream_cutoff_size )
1358 # MS-CFB: This integer field MUST be set to 0x00001000. This field 1389 # MS-CFB: This integer field MUST be set to 0x00001000. This field
1359 # specifies the maximum size of a user-defined data stream allocated 1390 # specifies the maximum size of a user-defined data stream allocated
1360 # from the mini FAT and mini stream, and that cutoff is 4096 bytes. 1391 # from the mini FAT and mini stream, and that cutoff is 4096 bytes.
1361 # Any user-defined data stream larger than or equal to this cutoff size 1392 # Any user-defined data stream larger than or equal to this cutoff size
1362 # must be allocated as normal sectors from the FAT. 1393 # must be allocated as normal sectors from the FAT.
1363 - if self.MiniSectorCutoff != 0x1000:  
1364 - self._raise_defect(DEFECT_INCORRECT, "incorrect MiniSectorCutoff in OLE header")  
1365 - debug( "MiniFatStart = %X" % self.MiniFatStart )  
1366 - debug( "csectMiniFat = %d" % self.csectMiniFat )  
1367 - debug( "sectDifStart = %X" % self.sectDifStart )  
1368 - debug( "csectDif = %d" % self.csectDif ) 1394 + if self.mini_stream_cutoff_size != 0x1000:
  1395 + self._raise_defect(DEFECT_INCORRECT, "incorrect mini_stream_cutoff_size in OLE header")
  1396 + # if no exception is raised, the cutoff size is fixed to 0x1000
  1397 + log.warning('Fixing the mini_stream_cutoff_size to 4096 (mandatory value) instead of %d' %
  1398 + self.mini_stream_cutoff_size)
  1399 + self.mini_stream_cutoff_size = 0x1000
  1400 + log.debug( "first_mini_fat_sector = %Xh" % self.first_mini_fat_sector )
  1401 + log.debug( "num_mini_fat_sectors = %d" % self.num_mini_fat_sectors )
  1402 + log.debug( "first_difat_sector = %Xh" % self.first_difat_sector )
  1403 + log.debug( "num_difat_sectors = %d" % self.num_difat_sectors )
1369 1404
1370 # calculate the number of sectors in the file 1405 # calculate the number of sectors in the file
1371 # (-1 because header doesn't count) 1406 # (-1 because header doesn't count)
1372 - self.nb_sect = ( (filesize + self.SectorSize-1) // self.SectorSize) - 1  
1373 - debug( "Number of sectors in the file: %d" % self.nb_sect ) 1407 + self.nb_sect = ( (filesize + self.sector_size-1) // self.sector_size) - 1
  1408 + log.debug( "Number of sectors in the file: %d" % self.nb_sect )
1374 #TODO: change this test, because an OLE file MAY contain other data 1409 #TODO: change this test, because an OLE file MAY contain other data
1375 # after the last sector. 1410 # after the last sector.
1376 1411
1377 # file clsid 1412 # file clsid
1378 - self.clsid = _clsid(header[8:24]) 1413 + self.header_clsid = _clsid(header[8:24])
1379 1414
1380 #TODO: remove redundant attributes, and fix the code which uses them? 1415 #TODO: remove redundant attributes, and fix the code which uses them?
1381 - self.sectorsize = self.SectorSize #1 << i16(header, 30)  
1382 - self.minisectorsize = self.MiniSectorSize #1 << i16(header, 32)  
1383 - self.minisectorcutoff = self.MiniSectorCutoff # i32(header, 56) 1416 + self.sectorsize = self.sector_size #1 << i16(header, 30)
  1417 + self.minisectorsize = self.mini_sector_size #1 << i16(header, 32)
  1418 + self.minisectorcutoff = self.mini_stream_cutoff_size # i32(header, 56)
1384 1419
1385 # check known streams for duplicate references (these are always in FAT, 1420 # check known streams for duplicate references (these are always in FAT,
1386 # never in MiniFAT): 1421 # never in MiniFAT):
1387 - self._check_duplicate_stream(self.sectDirStart) 1422 + self._check_duplicate_stream(self.first_dir_sector)
1388 # check MiniFAT only if it is not empty: 1423 # check MiniFAT only if it is not empty:
1389 - if self.csectMiniFat:  
1390 - self._check_duplicate_stream(self.MiniFatStart) 1424 + if self.num_mini_fat_sectors:
  1425 + self._check_duplicate_stream(self.first_mini_fat_sector)
1391 # check DIFAT only if it is not empty: 1426 # check DIFAT only if it is not empty:
1392 - if self.csectDif:  
1393 - self._check_duplicate_stream(self.sectDifStart) 1427 + if self.num_difat_sectors:
  1428 + self._check_duplicate_stream(self.first_difat_sector)
1394 1429
1395 # Load file allocation tables 1430 # Load file allocation tables
1396 self.loadfat(header) 1431 self.loadfat(header)
1397 # Load directory. This sets both the direntries list (ordered by sid) 1432 # Load directory. This sets both the direntries list (ordered by sid)
1398 # and the root (ordered by hierarchy) members. 1433 # and the root (ordered by hierarchy) members.
1399 - self.loaddirectory(self.sectDirStart)#i32(header, 48)) 1434 + self.loaddirectory(self.first_dir_sector)#i32(header, 48))
1400 self.ministream = None 1435 self.ministream = None
1401 - self.minifatsect = self.MiniFatStart #i32(header, 60) 1436 + self.minifatsect = self.first_mini_fat_sector #i32(header, 60)
1402 1437
1403 1438
1404 def close(self): 1439 def close(self):
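Reviewer note: the renamed header attributes are easier to check against the format string when unpacked into a dictionary. The following is a rough standalone sketch, assuming the same `fmt_header` string and the new field names from this hunk. `parse_header` and `HEADER_FIELDS` are hypothetical names, and none of the defect checks from the diff are reproduced.

```python
import struct

MAGIC = b'\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1'
# '<' selects little-endian byte order with standard sizes and no padding.
FMT_HEADER = '<8s16sHHHHHHLLLLLLLLLL'

# Field names in the order used by the new code in this commit.
HEADER_FIELDS = (
    'header_signature', 'header_clsid', 'minor_version', 'dll_version',
    'byte_order', 'sector_shift', 'mini_sector_shift', 'reserved1',
    'reserved2', 'num_dir_sectors', 'num_fat_sectors', 'first_dir_sector',
    'transaction_signature_number', 'mini_stream_cutoff_size',
    'first_mini_fat_sector', 'num_mini_fat_sectors',
    'first_difat_sector', 'num_difat_sectors',
)


def parse_header(header):
    """Unpack the fixed part of an OLE header into a dict (no validation)."""
    header_size = struct.calcsize(FMT_HEADER)
    values = struct.unpack(FMT_HEADER, header[:header_size])
    fields = dict(zip(HEADER_FIELDS, values))
    # Sector sizes are stored as powers of two.
    fields['sector_size'] = 2 ** fields['sector_shift']
    fields['mini_sector_size'] = 2 ** fields['mini_sector_shift']
    return fields
```

The fixed part is 76 bytes, and the remaining 109 * 4 = 436 bytes of the 512-byte header hold the first FAT sector indexes, which matches the `header[76:512]` slice used later in `loadfat`.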
@@ -1418,10 +1453,10 @@ class OleFileIO: @@ -1418,10 +1453,10 @@ class OleFileIO:
1418 :param minifat: bool, if True, stream is located in the MiniFAT, else in the FAT 1453 :param minifat: bool, if True, stream is located in the MiniFAT, else in the FAT
1419 """ 1454 """
1420 if minifat: 1455 if minifat:
1421 - debug('_check_duplicate_stream: sect=%d in MiniFAT' % first_sect) 1456 + log.debug('_check_duplicate_stream: sect=%Xh in MiniFAT' % first_sect)
1422 used_streams = self._used_streams_minifat 1457 used_streams = self._used_streams_minifat
1423 else: 1458 else:
1424 - debug('_check_duplicate_stream: sect=%d in FAT' % first_sect) 1459 + log.debug('_check_duplicate_stream: sect=%Xh in FAT' % first_sect)
1425 # some values can be safely ignored (not a real stream): 1460 # some values can be safely ignored (not a real stream):
1426 if first_sect in (DIFSECT,FATSECT,ENDOFCHAIN,FREESECT): 1461 if first_sect in (DIFSECT,FATSECT,ENDOFCHAIN,FREESECT):
1427 return 1462 return
@@ -1435,10 +1470,9 @@ class OleFileIO: @@ -1435,10 +1470,9 @@ class OleFileIO:
1435 1470
1436 1471
1437 def dumpfat(self, fat, firstindex=0): 1472 def dumpfat(self, fat, firstindex=0):
1438 - "Displays a part of FAT in human-readable form for debugging purpose"  
1439 - # [PL] added only for debug  
1440 - if not DEBUG_MODE:  
1441 - return 1473 + """
  1474 + Display a part of FAT in human-readable form for debugging purposes
  1475 + """
1442 # dictionary to convert special FAT values in human-readable strings 1476 # dictionary to convert special FAT values in human-readable strings
1443 VPL = 8 # values per line (8+1 * 8+1 = 81) 1477 VPL = 8 # values per line (8+1 * 8+1 = 81)
1444 fatnames = { 1478 fatnames = {
@@ -1455,7 +1489,7 @@ class OleFileIO: @@ -1455,7 +1489,7 @@ class OleFileIO:
1455 print() 1489 print()
1456 for l in range(nlines): 1490 for l in range(nlines):
1457 index = l*VPL 1491 index = l*VPL
1458 - print("%8X:" % (firstindex+index), end=" ") 1492 + print("%6X:" % (firstindex+index), end=" ")
1459 for i in range(index, index+VPL): 1493 for i in range(index, index+VPL):
1460 if i>=nbsect: 1494 if i>=nbsect:
1461 break 1495 break
@@ -1473,9 +1507,9 @@ class OleFileIO: @@ -1473,9 +1507,9 @@ class OleFileIO:
1473 1507
1474 1508
1475 def dumpsect(self, sector, firstindex=0): 1509 def dumpsect(self, sector, firstindex=0):
1476 - "Displays a sector in a human-readable form, for debugging purpose."  
1477 - if not DEBUG_MODE:  
1478 - return 1510 + """
  1511 + Display a sector in a human-readable form, for debugging purposes
  1512 + """
1479 VPL=8 # number of values per line (8+1 * 8+1 = 81) 1513 VPL=8 # number of values per line (8+1 * 8+1 = 81)
1480 tab = array.array(UINT32, sector) 1514 tab = array.array(UINT32, sector)
1481 if sys.byteorder == 'big': 1515 if sys.byteorder == 'big':
@@ -1488,7 +1522,7 @@ class OleFileIO: @@ -1488,7 +1522,7 @@ class OleFileIO:
1488 print() 1522 print()
1489 for l in range(nlines): 1523 for l in range(nlines):
1490 index = l*VPL 1524 index = l*VPL
1491 - print("%8X:" % (firstindex+index), end=" ") 1525 + print("%6X:" % (firstindex+index), end=" ")
1492 for i in range(index, index+VPL): 1526 for i in range(index, index+VPL):
1493 if i>=nbsect: 1527 if i>=nbsect:
1494 break 1528 break
@@ -1523,14 +1557,18 @@ class OleFileIO: @@ -1523,14 +1557,18 @@ class OleFileIO:
1523 else: 1557 else:
1524 # if it's a raw sector, it is parsed in an array 1558 # if it's a raw sector, it is parsed in an array
1525 fat1 = self.sect2array(sect) 1559 fat1 = self.sect2array(sect)
1526 - self.dumpsect(sect) 1560 + # Display the sector contents only if the logging level is debug:
  1561 + if log.isEnabledFor(logging.DEBUG):
  1562 + self.dumpsect(sect)
1527 # The FAT is a sector chain starting at the first index of itself. 1563 # The FAT is a sector chain starting at the first index of itself.
  1564 + # initialize isect, just in case:
  1565 + isect = None
1528 for isect in fat1: 1566 for isect in fat1:
1529 isect = isect & 0xFFFFFFFF # JYTHON-WORKAROUND 1567 isect = isect & 0xFFFFFFFF # JYTHON-WORKAROUND
1530 - debug("isect = %X" % isect) 1568 + log.debug("isect = %X" % isect)
1531 if isect == ENDOFCHAIN or isect == FREESECT: 1569 if isect == ENDOFCHAIN or isect == FREESECT:
1532 # the end of the sector chain has been reached 1570 # the end of the sector chain has been reached
1533 - debug("found end of sector chain") 1571 + log.debug("found end of sector chain")
1534 break 1572 break
1535 # read the FAT sector 1573 # read the FAT sector
1536 s = self.getsect(isect) 1574 s = self.getsect(isect)
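Reviewer note: replacing the `DEBUG_MODE` early return with a `log.isEnabledFor(logging.DEBUG)` guard at the call site is the usual way to skip expensive dump routines. A minimal sketch of the idiom follows, using only the standard `logging` module. The logger name and the `dump_if_debug` helper are hypothetical.

```python
import logging

# Hypothetical logger, silent by default.
log = logging.getLogger("olefile_sketch.dump")
log.addHandler(logging.NullHandler())


def dump_if_debug(data, dump):
    """Call the (possibly expensive) dump function only at DEBUG level.

    Returns True if the dump was performed, False otherwise.
    """
    if log.isEnabledFor(logging.DEBUG):
        dump(data)
        return True
    return False
```

The guard matters because routines such as `dumpsect` print directly instead of going through the logger, so the logger's level would not otherwise suppress them.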
@@ -1551,7 +1589,7 @@ class OleFileIO: @@ -1551,7 +1589,7 @@ class OleFileIO:
1551 # Additional sectors are described by DIF blocks 1589 # Additional sectors are described by DIF blocks
1552 1590
1553 sect = header[76:512] 1591 sect = header[76:512]
1554 - debug( "len(sect)=%d, so %d integers" % (len(sect), len(sect)//4) ) 1592 + log.debug( "len(sect)=%d, so %d integers" % (len(sect), len(sect)//4) )
1555 #fat = [] 1593 #fat = []
1556 # [PL] FAT is an array of 32 bits unsigned ints, it's more effective 1594 # [PL] FAT is an array of 32 bits unsigned ints, it's more effective
1557 # to use an array than a list in Python. 1595 # to use an array than a list in Python.
@@ -1567,53 +1605,57 @@ class OleFileIO: @@ -1567,53 +1605,57 @@ class OleFileIO:
1567 ## s = self.getsect(ix) 1605 ## s = self.getsect(ix)
1568 ## #fat = fat + [i32(s, i) for i in range(0, len(s), 4)] 1606 ## #fat = fat + [i32(s, i) for i in range(0, len(s), 4)]
1569 ## fat = fat + array.array(UINT32, s) 1607 ## fat = fat + array.array(UINT32, s)
1570 - if self.csectDif != 0: 1608 + if self.num_difat_sectors != 0:
1571 # [PL] There's a DIFAT because file is larger than 6.8MB 1609 # [PL] There's a DIFAT because file is larger than 6.8MB
1572 # some checks just in case: 1610 # some checks just in case:
1573 - if self.csectFat <= 109: 1611 + if self.num_fat_sectors <= 109:
1574 # there must be at least 109 blocks in header and the rest in 1612 # there must be at least 109 blocks in header and the rest in
1575 # DIFAT, so number of sectors must be >109. 1613 # DIFAT, so number of sectors must be >109.
1576 self._raise_defect(DEFECT_INCORRECT, 'incorrect DIFAT, not enough sectors') 1614 self._raise_defect(DEFECT_INCORRECT, 'incorrect DIFAT, not enough sectors')
1577 - if self.sectDifStart >= self.nb_sect: 1615 + if self.first_difat_sector >= self.nb_sect:
1578 # initial DIFAT block index must be valid 1616 # initial DIFAT block index must be valid
1579 self._raise_defect(DEFECT_FATAL, 'incorrect DIFAT, first index out of range') 1617 self._raise_defect(DEFECT_FATAL, 'incorrect DIFAT, first index out of range')
1580 - debug( "DIFAT analysis..." ) 1618 + log.debug( "DIFAT analysis..." )
1581 # We compute the necessary number of DIFAT sectors : 1619 # We compute the necessary number of DIFAT sectors :
1582 # Number of pointers per DIFAT sector = (sectorsize/4)-1 1620 # Number of pointers per DIFAT sector = (sectorsize/4)-1
1583 # (-1 because the last pointer is the next DIFAT sector number) 1621 # (-1 because the last pointer is the next DIFAT sector number)
1584 nb_difat_sectors = (self.sectorsize//4)-1 1622 nb_difat_sectors = (self.sectorsize//4)-1
1585 # (if 512 bytes: each DIFAT sector = 127 pointers + 1 towards next DIFAT sector) 1623 # (if 512 bytes: each DIFAT sector = 127 pointers + 1 towards next DIFAT sector)
1586 - nb_difat = (self.csectFat-109 + nb_difat_sectors-1)//nb_difat_sectors  
1587 - debug( "nb_difat = %d" % nb_difat )  
1588 - if self.csectDif != nb_difat: 1624 + nb_difat = (self.num_fat_sectors-109 + nb_difat_sectors-1)//nb_difat_sectors
  1625 + log.debug( "nb_difat = %d" % nb_difat )
  1626 + if self.num_difat_sectors != nb_difat:
1589 raise IOError('incorrect DIFAT') 1627 raise IOError('incorrect DIFAT')
1590 - isect_difat = self.sectDifStart 1628 + isect_difat = self.first_difat_sector
1591 for i in iterrange(nb_difat): 1629 for i in iterrange(nb_difat):
1592 - debug( "DIFAT block %d, sector %X" % (i, isect_difat) ) 1630 + log.debug( "DIFAT block %d, sector %X" % (i, isect_difat) )
1593 #TODO: check if corresponding FAT SID = DIFSECT 1631 #TODO: check if corresponding FAT SID = DIFSECT
1594 sector_difat = self.getsect(isect_difat) 1632 sector_difat = self.getsect(isect_difat)
1595 difat = self.sect2array(sector_difat) 1633 difat = self.sect2array(sector_difat)
1596 - self.dumpsect(sector_difat) 1634 + # Display the sector contents only if the logging level is debug:
  1635 + if log.isEnabledFor(logging.DEBUG):
  1636 + self.dumpsect(sector_difat)
1597 self.loadfat_sect(difat[:nb_difat_sectors]) 1637 self.loadfat_sect(difat[:nb_difat_sectors])
1598 # last DIFAT pointer is next DIFAT sector: 1638 # last DIFAT pointer is next DIFAT sector:
1599 isect_difat = difat[nb_difat_sectors] 1639 isect_difat = difat[nb_difat_sectors]
1600 - debug( "next DIFAT sector: %X" % isect_difat ) 1640 + log.debug( "next DIFAT sector: %X" % isect_difat )
1601 # checks: 1641 # checks:
1602 if isect_difat not in [ENDOFCHAIN, FREESECT]: 1642 if isect_difat not in [ENDOFCHAIN, FREESECT]:
1603 # last DIFAT pointer value must be ENDOFCHAIN or FREESECT 1643 # last DIFAT pointer value must be ENDOFCHAIN or FREESECT
1604 raise IOError('incorrect end of DIFAT') 1644 raise IOError('incorrect end of DIFAT')
1605 -## if len(self.fat) != self.csectFat:  
1606 -## # FAT should contain csectFat blocks  
1607 -## print("FAT length: %d instead of %d" % (len(self.fat), self.csectFat)) 1645 +## if len(self.fat) != self.num_fat_sectors:
  1646 +## # FAT should contain num_fat_sectors blocks
  1647 +## print("FAT length: %d instead of %d" % (len(self.fat), self.num_fat_sectors))
1608 ## raise IOError('incorrect DIFAT') 1648 ## raise IOError('incorrect DIFAT')
1609 # since FAT is read from fixed-size sectors, it may contain more values 1649 # since FAT is read from fixed-size sectors, it may contain more values
1610 # than the actual number of sectors in the file. 1650 # than the actual number of sectors in the file.
1611 # Keep only the relevant sector indexes: 1651 # Keep only the relevant sector indexes:
1612 if len(self.fat) > self.nb_sect: 1652 if len(self.fat) > self.nb_sect:
1613 - debug('len(fat)=%d, shrunk to nb_sect=%d' % (len(self.fat), self.nb_sect)) 1653 + log.debug('len(fat)=%d, shrunk to nb_sect=%d' % (len(self.fat), self.nb_sect))
1614 self.fat = self.fat[:self.nb_sect] 1654 self.fat = self.fat[:self.nb_sect]
1615 - debug('\nFAT:')  
1616 - self.dumpfat(self.fat) 1655 + # Display the FAT contents only if the logging level is debug:
  1656 + if log.isEnabledFor(logging.DEBUG):
  1657 + log.debug('\nFAT:')
  1658 + self.dumpfat(self.fat)
1617 1659
1618 1660
1619 def loadminifat(self): 1661 def loadminifat(self):
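Reviewer note: the DIFAT consistency check in `loadfat` relies on a ceiling division. The sketch below isolates that arithmetic, assuming 109 FAT pointers in the header and one pointer per DIFAT sector reserved for the chain link, as stated in the comments of the hunk. The function name is hypothetical.

```python
def expected_difat_sectors(num_fat_sectors, sector_size):
    """Number of DIFAT sectors needed to reference all FAT sectors."""
    # Each DIFAT sector holds sector_size/4 32-bit pointers; the last one
    # points to the next DIFAT sector, so it is not available for FAT sectors.
    pointers_per_difat = sector_size // 4 - 1
    # The first 109 FAT sector pointers live in the file header itself.
    extra = num_fat_sectors - 109
    if extra <= 0:
        return 0
    # Ceiling division without floating point.
    return (extra + pointers_per_difat - 1) // pointers_per_difat
```

For 512-byte sectors this gives 127 usable pointers per DIFAT sector, so 110 to 236 FAT sectors need one DIFAT sector and 237 need two.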
@@ -1626,15 +1668,15 @@ class OleFileIO: @@ -1626,15 +1668,15 @@ class OleFileIO:
1626 # 1) Stream size is calculated according to the number of sectors 1668 # 1) Stream size is calculated according to the number of sectors
1627 # declared in the OLE header. This allocated stream may be more than 1669 # declared in the OLE header. This allocated stream may be more than
1628 # needed to store the actual sector indexes. 1670 # needed to store the actual sector indexes.
1629 - # (self.csectMiniFat is the number of sectors of size self.SectorSize)  
1630 - stream_size = self.csectMiniFat * self.SectorSize 1671 + # (self.num_mini_fat_sectors is the number of sectors of size self.sector_size)
  1672 + stream_size = self.num_mini_fat_sectors * self.sector_size
1631 # 2) Actually used size is calculated by dividing the MiniStream size 1673 # 2) Actually used size is calculated by dividing the MiniStream size
1632 # (given by root entry size) by the size of mini sectors, *4 for 1674 # (given by root entry size) by the size of mini sectors, *4 for
1633 # 32 bits indexes: 1675 # 32 bits indexes:
1634 - nb_minisectors = (self.root.size + self.MiniSectorSize-1) // self.MiniSectorSize 1676 + nb_minisectors = (self.root.size + self.mini_sector_size-1) // self.mini_sector_size
1635 used_size = nb_minisectors * 4 1677 used_size = nb_minisectors * 4
1636 - debug('loadminifat(): minifatsect=%d, nb FAT sectors=%d, used_size=%d, stream_size=%d, nb MiniSectors=%d' %  
1637 - (self.minifatsect, self.csectMiniFat, used_size, stream_size, nb_minisectors)) 1678 + log.debug('loadminifat(): minifatsect=%d, nb FAT sectors=%d, used_size=%d, stream_size=%d, nb MiniSectors=%d' %
  1679 + (self.minifatsect, self.num_mini_fat_sectors, used_size, stream_size, nb_minisectors))
1638 if used_size > stream_size: 1680 if used_size > stream_size:
1639 # This is not really a problem, but may indicate a wrong implementation: 1681 # This is not really a problem, but may indicate a wrong implementation:
1640 self._raise_defect(DEFECT_INCORRECT, 'OLE MiniStream is larger than MiniFAT') 1682 self._raise_defect(DEFECT_INCORRECT, 'OLE MiniStream is larger than MiniFAT')
@@ -1644,11 +1686,13 @@ class OleFileIO: @@ -1644,11 +1686,13 @@ class OleFileIO:
1644 #self.minifat = [i32(s, i) for i in range(0, len(s), 4)] 1686 #self.minifat = [i32(s, i) for i in range(0, len(s), 4)]
1645 self.minifat = self.sect2array(s) 1687 self.minifat = self.sect2array(s)
1646 # Then shrink the array to used size, to avoid indexes out of MiniStream: 1688 # Then shrink the array to used size, to avoid indexes out of MiniStream:
1647 - debug('MiniFAT shrunk from %d to %d sectors' % (len(self.minifat), nb_minisectors)) 1689 + log.debug('MiniFAT shrunk from %d to %d sectors' % (len(self.minifat), nb_minisectors))
1648 self.minifat = self.minifat[:nb_minisectors] 1690 self.minifat = self.minifat[:nb_minisectors]
1649 - debug('loadminifat(): len=%d' % len(self.minifat))  
1650 - debug('\nMiniFAT:')  
1651 - self.dumpfat(self.minifat) 1691 + log.debug('loadminifat(): len=%d' % len(self.minifat))
  1692 + # Display the FAT contents only if the logging level is debug:
  1693 + if log.isEnabledFor(logging.DEBUG):
  1694 + log.debug('\nMiniFAT:')
  1695 + self.dumpfat(self.minifat)
1652 1696
1653 def getsect(self, sect): 1697 def getsect(self, sect):
1654 """ 1698 """
@@ -1671,12 +1715,12 @@ class OleFileIO: @@ -1671,12 +1715,12 @@ class OleFileIO:
1671 try: 1715 try:
1672 self.fp.seek(self.sectorsize * (sect+1)) 1716 self.fp.seek(self.sectorsize * (sect+1))
1673 except: 1717 except:
1674 - debug('getsect(): sect=%X, seek=%d, filesize=%d' % 1718 + log.debug('getsect(): sect=%X, seek=%d, filesize=%d' %
1675 (sect, self.sectorsize*(sect+1), self._filesize)) 1719 (sect, self.sectorsize*(sect+1), self._filesize))
1676 self._raise_defect(DEFECT_FATAL, 'OLE sector index out of range') 1720 self._raise_defect(DEFECT_FATAL, 'OLE sector index out of range')
1677 sector = self.fp.read(self.sectorsize) 1721 sector = self.fp.read(self.sectorsize)
1678 if len(sector) != self.sectorsize: 1722 if len(sector) != self.sectorsize:
1679 - debug('getsect(): sect=%X, read=%d, sectorsize=%d' % 1723 + log.debug('getsect(): sect=%X, read=%d, sectorsize=%d' %
1680 (sect, len(sector), self.sectorsize)) 1724 (sect, len(sector), self.sectorsize))
1681 self._raise_defect(DEFECT_FATAL, 'incomplete OLE sector') 1725 self._raise_defect(DEFECT_FATAL, 'incomplete OLE sector')
1682 return sector 1726 return sector
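Review note: `getsect()` seeks to `sectorsize * (sect+1)` because the OLE header occupies the space before sector 0. The sketch below reproduces that offset rule on an in-memory file; `read_sector` is a hypothetical stand-in, it assumes a 512-byte header and sector size, and it raises a plain `IOError` where olefile goes through `_raise_defect`.

```python
import io


def read_sector(fp, sect, sector_size=512):
    """Read sector number `sect` from a file-like object (illustrative only)."""
    # +1 skips the header, which is assumed to take one sector-sized block.
    fp.seek(sector_size * (sect + 1))
    sector = fp.read(sector_size)
    if len(sector) != sector_size:
        # Short read: the index points past the end of the file.
        raise IOError('incomplete OLE sector')
    return sector


# Fake file: one header block followed by two sectors.
data = b'H' * 512 + b'A' * 512 + b'B' * 512

if __name__ == "__main__":
    fp = io.BytesIO(data)
    print(read_sector(fp, 0)[:4], read_sector(fp, 1)[:4])
```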
@@ -1698,7 +1742,7 @@ class OleFileIO: @@ -1698,7 +1742,7 @@ class OleFileIO:
1698 try: 1742 try:
1699 self.fp.seek(self.sectorsize * (sect+1)) 1743 self.fp.seek(self.sectorsize * (sect+1))
1700 except: 1744 except:
1701 - debug('write_sect(): sect=%X, seek=%d, filesize=%d' % 1745 + log.debug('write_sect(): sect=%X, seek=%d, filesize=%d' %
1702 (sect, self.sectorsize*(sect+1), self._filesize)) 1746 (sect, self.sectorsize*(sect+1), self._filesize))
1703 self._raise_defect(DEFECT_FATAL, 'OLE sector index out of range') 1747 self._raise_defect(DEFECT_FATAL, 'OLE sector index out of range')
1704 if len(data) < self.sectorsize: 1748 if len(data) < self.sectorsize:
@@ -1725,7 +1769,7 @@ class OleFileIO: @@ -1725,7 +1769,7 @@ class OleFileIO:
1725 #[PL] to detect malformed documents and avoid DoS attacks, the maximum 1769 #[PL] to detect malformed documents and avoid DoS attacks, the maximum
1726 # number of directory entries can be calculated: 1770 # number of directory entries can be calculated:
1727 max_entries = self.directory_fp.size // 128 1771 max_entries = self.directory_fp.size // 128
1728 - debug('loaddirectory: size=%d, max_entries=%d' % 1772 + log.debug('loaddirectory: size=%d, max_entries=%d' %
1729 (self.directory_fp.size, max_entries)) 1773 (self.directory_fp.size, max_entries))
1730 1774
1731 # Create list of directory entries 1775 # Create list of directory entries
@@ -1741,6 +1785,10 @@ class OleFileIO: @@ -1741,6 +1785,10 @@ class OleFileIO:
1741 root_entry = self._load_direntry(0) 1785 root_entry = self._load_direntry(0)
1742 # Root entry is the first entry: 1786 # Root entry is the first entry:
1743 self.root = self.direntries[0] 1787 self.root = self.direntries[0]
  1788 + # TODO: read ALL directory entries (ignore bad entries?)
  1789 + # TODO: adapt build_storage_tree to avoid duplicate reads
  1790 + # for i in range(1, max_entries):
  1791 + # self._load_direntry(i)
1744 # read and build all storage trees, starting from the root: 1792 # read and build all storage trees, starting from the root:
1745 self.root.build_storage_tree() 1793 self.root.build_storage_tree()
1746 1794
@@ -1788,9 +1836,9 @@ class OleFileIO: @@ -1788,9 +1836,9 @@ class OleFileIO:
1788 :param force_FAT: if False (default), stream will be opened in FAT or MiniFAT 1836 :param force_FAT: if False (default), stream will be opened in FAT or MiniFAT
1789 according to size. If True, it will always be opened in FAT. 1837 according to size. If True, it will always be opened in FAT.
1790 """ 1838 """
1791 - debug('OleFileIO.open(): sect=%d, size=%d, force_FAT=%s' % 1839 + log.debug('OleFileIO.open(): sect=%Xh, size=%d, force_FAT=%s' %
1792 (start, size, str(force_FAT))) 1840 (start, size, str(force_FAT)))
1793 - # stream size is compared to the MiniSectorCutoff threshold: 1841 + # stream size is compared to the mini_stream_cutoff_size threshold:
1794 if size < self.minisectorcutoff and not force_FAT: 1842 if size < self.minisectorcutoff and not force_FAT:
1795 # ministream object 1843 # ministream object
1796 if not self.ministream: 1844 if not self.ministream:
@@ -1799,7 +1847,7 @@ class OleFileIO: @@ -1799,7 +1847,7 @@ class OleFileIO:
1799 # The first sector index of the miniFAT stream is stored in the 1847 # The first sector index of the miniFAT stream is stored in the
1800 # root directory entry: 1848 # root directory entry:
1801 size_ministream = self.root.size 1849 size_ministream = self.root.size
1802 - debug('Opening MiniStream: sect=%d, size=%d' % 1850 + log.debug('Opening MiniStream: sect=%Xh, size=%d' %
1803 (self.root.isectStart, size_ministream)) 1851 (self.root.isectStart, size_ministream))
1804 self.ministream = self._open(self.root.isectStart, 1852 self.ministream = self._open(self.root.isectStart,
1805 size_ministream, force_FAT=True) 1853 size_ministream, force_FAT=True)
@@ -1940,12 +1988,12 @@ class OleFileIO: @@ -1940,12 +1988,12 @@ class OleFileIO:
1940 sect = entry.isectStart 1988 sect = entry.isectStart
1941 # number of sectors to write 1989 # number of sectors to write
1942 nb_sectors = (size + (self.sectorsize-1)) // self.sectorsize 1990 nb_sectors = (size + (self.sectorsize-1)) // self.sectorsize
1943 - debug('nb_sectors = %d' % nb_sectors) 1991 + log.debug('nb_sectors = %d' % nb_sectors)
1944 for i in range(nb_sectors): 1992 for i in range(nb_sectors):
1945 ## try: 1993 ## try:
1946 ## self.fp.seek(offset + self.sectorsize * sect) 1994 ## self.fp.seek(offset + self.sectorsize * sect)
1947 ## except: 1995 ## except:
1948 -## debug('sect=%d, seek=%d' % 1996 +## log.debug('sect=%d, seek=%d' %
1949 ## (sect, offset+self.sectorsize*sect)) 1997 ## (sect, offset+self.sectorsize*sect))
1950 ## raise IOError('OLE sector index out of range') 1998 ## raise IOError('OLE sector index out of range')
1951 # extract one sector from data, the last one being smaller: 1999 # extract one sector from data, the last one being smaller:
@@ -1956,7 +2004,7 @@ class OleFileIO: @@ -1956,7 +2004,7 @@ class OleFileIO:
1956 else: 2004 else:
1957 data_sector = data [i*self.sectorsize:] 2005 data_sector = data [i*self.sectorsize:]
1958 #TODO: comment this if it works 2006 #TODO: comment this if it works
1959 - debug('write_stream: size=%d sectorsize=%d data_sector=%d size%%sectorsize=%d' 2007 + log.debug('write_stream: size=%d sectorsize=%d data_sector=%Xh size%%sectorsize=%d'
1960 % (size, self.sectorsize, len(data_sector), size % self.sectorsize)) 2008 % (size, self.sectorsize, len(data_sector), size % self.sectorsize))
1961 assert(len(data_sector) % self.sectorsize==size % self.sectorsize) 2009 assert(len(data_sector) % self.sectorsize==size % self.sectorsize)
1962 self.write_sect(sect, data_sector) 2010 self.write_sect(sect, data_sector)
@@ -2113,31 +2161,31 @@ class OleFileIO: @@ -2113,31 +2161,31 @@ class OleFileIO:
2113 return data 2161 return data
2114 2162
2115 for i in range(num_props): 2163 for i in range(num_props):
  2164 + property_id = 0 # just in case of an exception
2116 try: 2165 try:
2117 - id = 0 # just in case of an exception  
2118 - id = i32(s, 8+i*8) 2166 + property_id = i32(s, 8+i*8)
2119 offset = i32(s, 12+i*8) 2167 offset = i32(s, 12+i*8)
2120 - type = i32(s, offset) 2168 + property_type = i32(s, offset)
2121 2169
2122 - debug ('property id=%d: type=%d offset=%X' % (id, type, offset)) 2170 + log.debug('property id=%d: type=%d offset=%X' % (property_id, property_type, offset))
2123 2171
2124 # test for common types first (should perhaps use 2172 # test for common types first (should perhaps use
2125 # a dictionary instead?) 2173 # a dictionary instead?)
2126 2174
2127 - if type == VT_I2: # 16-bit signed integer 2175 + if property_type == VT_I2: # 16-bit signed integer
2128 value = i16(s, offset+4) 2176 value = i16(s, offset+4)
2129 if value >= 32768: 2177 if value >= 32768:
2130 value = value - 65536 2178 value = value - 65536
2131 - elif type == VT_UI2: # 2-byte unsigned integer 2179 + elif property_type == VT_UI2: # 2-byte unsigned integer
2132 value = i16(s, offset+4) 2180 value = i16(s, offset+4)
2133 - elif type in (VT_I4, VT_INT, VT_ERROR): 2181 + elif property_type in (VT_I4, VT_INT, VT_ERROR):
2134 # VT_I4: 32-bit signed integer 2182 # VT_I4: 32-bit signed integer
2135 # VT_ERROR: HRESULT, similar to 32-bit signed integer, 2183 # VT_ERROR: HRESULT, similar to 32-bit signed integer,
2136 # see http://msdn.microsoft.com/en-us/library/cc230330.aspx 2184 # see http://msdn.microsoft.com/en-us/library/cc230330.aspx
2137 value = i32(s, offset+4) 2185 value = i32(s, offset+4)
2138 - elif type in (VT_UI4, VT_UINT): # 4-byte unsigned integer 2186 + elif property_type in (VT_UI4, VT_UINT): # 4-byte unsigned integer
2139 value = i32(s, offset+4) # FIXME 2187 value = i32(s, offset+4) # FIXME
2140 - elif type in (VT_BSTR, VT_LPSTR): 2188 + elif property_type in (VT_BSTR, VT_LPSTR):
2141 # CodePageString, see http://msdn.microsoft.com/en-us/library/dd942354.aspx 2189 # CodePageString, see http://msdn.microsoft.com/en-us/library/dd942354.aspx
2142 # size is a 32 bits integer, including the null terminator, and 2190 # size is a 32 bits integer, including the null terminator, and
2143 # possibly trailing or embedded null chars 2191 # possibly trailing or embedded null chars
@@ -2146,50 +2194,50 @@ class OleFileIO: @@ -2146,50 +2194,50 @@ class OleFileIO:
2146 value = s[offset+8:offset+8+count-1] 2194 value = s[offset+8:offset+8+count-1]
2147 # remove all null chars: 2195 # remove all null chars:
2148 value = value.replace(b'\x00', b'') 2196 value = value.replace(b'\x00', b'')
2149 - elif type == VT_BLOB: 2197 + elif property_type == VT_BLOB:
2150 # binary large object (BLOB) 2198 # binary large object (BLOB)
2151 # see http://msdn.microsoft.com/en-us/library/dd942282.aspx 2199 # see http://msdn.microsoft.com/en-us/library/dd942282.aspx
2152 count = i32(s, offset+4) 2200 count = i32(s, offset+4)
2153 value = s[offset+8:offset+8+count] 2201 value = s[offset+8:offset+8+count]
2154 - elif type == VT_LPWSTR: 2202 + elif property_type == VT_LPWSTR:
2155 # UnicodeString 2203 # UnicodeString
2156 # see http://msdn.microsoft.com/en-us/library/dd942313.aspx 2204 # see http://msdn.microsoft.com/en-us/library/dd942313.aspx
2157 # "the string should NOT contain embedded or additional trailing 2205 # "the string should NOT contain embedded or additional trailing
2158 # null characters." 2206 # null characters."
2159 count = i32(s, offset+4) 2207 count = i32(s, offset+4)
2160 value = self._decode_utf16_str(s[offset+8:offset+8+count*2]) 2208 value = self._decode_utf16_str(s[offset+8:offset+8+count*2])
2161 - elif type == VT_FILETIME: 2209 + elif property_type == VT_FILETIME:
2162 value = long(i32(s, offset+4)) + (long(i32(s, offset+8))<<32) 2210 value = long(i32(s, offset+4)) + (long(i32(s, offset+8))<<32)
2163 # FILETIME is a 64-bit int: "number of 100ns periods 2211 # FILETIME is a 64-bit int: "number of 100ns periods
2164 # since Jan 1,1601". 2212 # since Jan 1,1601".
2165 - if convert_time and id not in no_conversion:  
2166 - debug('Converting property #%d to python datetime, value=%d=%fs'  
2167 - %(id, value, float(value)/10000000)) 2213 + if convert_time and property_id not in no_conversion:
  2214 + log.debug('Converting property #%d to python datetime, value=%d=%fs'
  2215 + %(property_id, value, float(value)/10000000))
2168 # convert FILETIME to Python datetime.datetime 2216 # convert FILETIME to Python datetime.datetime
2169 # inspired from http://code.activestate.com/recipes/511425-filetime-to-datetime/ 2217 # inspired from http://code.activestate.com/recipes/511425-filetime-to-datetime/
2170 _FILETIME_null_date = datetime.datetime(1601, 1, 1, 0, 0, 0) 2218 _FILETIME_null_date = datetime.datetime(1601, 1, 1, 0, 0, 0)
2171 - debug('timedelta days=%d' % (value//(10*1000000*3600*24))) 2219 + log.debug('timedelta days=%d' % (value//(10*1000000*3600*24)))
2172 value = _FILETIME_null_date + datetime.timedelta(microseconds=value//10) 2220 value = _FILETIME_null_date + datetime.timedelta(microseconds=value//10)
2173 else: 2221 else:
2174 # legacy code kept for backward compatibility: returns a 2222 # legacy code kept for backward compatibility: returns a
2175 # number of seconds since Jan 1,1601 2223 # number of seconds since Jan 1,1601
2176 value = value // 10000000 # seconds 2224 value = value // 10000000 # seconds
2177 - elif type == VT_UI1: # 1-byte unsigned integer 2225 + elif property_type == VT_UI1: # 1-byte unsigned integer
2178 value = i8(s[offset+4]) 2226 value = i8(s[offset+4])
2179 - elif type == VT_CLSID: 2227 + elif property_type == VT_CLSID:
2180 value = _clsid(s[offset+4:offset+20]) 2228 value = _clsid(s[offset+4:offset+20])
2181 - elif type == VT_CF: 2229 + elif property_type == VT_CF:
2182 # PropertyIdentifier or ClipboardData?? 2230 # PropertyIdentifier or ClipboardData??
2183 # see http://msdn.microsoft.com/en-us/library/dd941945.aspx 2231 # see http://msdn.microsoft.com/en-us/library/dd941945.aspx
2184 count = i32(s, offset+4) 2232 count = i32(s, offset+4)
2185 value = s[offset+8:offset+8+count] 2233 value = s[offset+8:offset+8+count]
2186 - elif type == VT_BOOL: 2234 + elif property_type == VT_BOOL:
2187 # VARIANT_BOOL, 16 bits bool, 0x0000=False, 0xFFFF=True 2235 # VARIANT_BOOL, 16 bits bool, 0x0000=False, 0xFFFF=True
2188 # see http://msdn.microsoft.com/en-us/library/cc237864.aspx 2236 # see http://msdn.microsoft.com/en-us/library/cc237864.aspx
2189 value = bool(i16(s, offset+4)) 2237 value = bool(i16(s, offset+4))
2190 else: 2238 else:
2191 value = None # everything else yields "None" 2239 value = None # everything else yields "None"
2192 - debug ('property id=%d: type=%d not implemented in parser yet' % (id, type)) 2240 + log.debug('property id=%d: type=%d not implemented in parser yet' % (property_id, property_type))
2193 2241
2194 # missing: VT_EMPTY, VT_NULL, VT_R4, VT_R8, VT_CY, VT_DATE, 2242 # missing: VT_EMPTY, VT_NULL, VT_R4, VT_R8, VT_CY, VT_DATE,
2195 # VT_DECIMAL, VT_I1, VT_I8, VT_UI8, 2243 # VT_DECIMAL, VT_I1, VT_I8, VT_UI8,
@@ -2201,15 +2249,15 @@ class OleFileIO: @@ -2201,15 +2249,15 @@ class OleFileIO:
2201 # type of items, e.g. VT_VECTOR|VT_BSTR 2249 # type of items, e.g. VT_VECTOR|VT_BSTR
2202 # see http://msdn.microsoft.com/en-us/library/dd942011.aspx 2250 # see http://msdn.microsoft.com/en-us/library/dd942011.aspx
2203 2251
2204 - #print("%08x" % id, repr(value), end=" ") 2252 + #print("%08x" % property_id, repr(value), end=" ")
2205 #print("(%s)" % VT[i32(s, offset) & 0xFFF]) 2253 #print("(%s)" % VT[i32(s, offset) & 0xFFF])
2206 2254
2207 - data[id] = value 2255 + data[property_id] = value
2208 except BaseException as exc: 2256 except BaseException as exc:
2209 # catch exception while parsing each property, and only raise 2257 # catch exception while parsing each property, and only raise
2210 # a DEFECT_INCORRECT, because parsing can go on 2258 # a DEFECT_INCORRECT, because parsing can go on
2211 msg = 'Error while parsing property id %d in stream %s: %s' % ( 2259 msg = 'Error while parsing property id %d in stream %s: %s' % (
2212 - id, repr(streampath), exc) 2260 + property_id, repr(streampath), exc)
2213 self._raise_defect(DEFECT_INCORRECT, msg, type(exc)) 2261 self._raise_defect(DEFECT_INCORRECT, msg, type(exc))
2214 2262
2215 return data 2263 return data
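Review note: the `VT_FILETIME` branch above combines two 32-bit words into a 64-bit count of 100 ns intervals since 1601-01-01, then converts it to a `datetime`. A self-contained sketch of the same conversion is given below; the names `filetime_to_datetime` and `FILETIME_EPOCH` are hypothetical, and both words are assumed to be unsigned.

```python
import datetime

# FILETIME counts 100 ns intervals from this date.
FILETIME_EPOCH = datetime.datetime(1601, 1, 1, 0, 0, 0)


def filetime_to_datetime(low, high):
    """Convert a FILETIME given as two unsigned 32-bit words (illustrative)."""
    # Reassemble the 64-bit value: low word first, high word shifted by 32 bits.
    value = low + (high << 32)
    # 10 intervals of 100 ns make one microsecond.
    return FILETIME_EPOCH + datetime.timedelta(microseconds=value // 10)


if __name__ == "__main__":
    unix_epoch = 116444736000000000  # FILETIME value of 1970-01-01 00:00:00
    print(filetime_to_datetime(unix_epoch & 0xFFFFFFFF, unix_epoch >> 32))
```

The legacy path in the diff (`value // 10000000`) instead returns whole seconds since 1601, which is why `convert_time` changes the type of the result.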
@@ -2233,38 +2281,47 @@ class OleFileIO: @@ -2233,38 +2281,47 @@ class OleFileIO:
2233 2281
2234 if __name__ == "__main__": 2282 if __name__ == "__main__":
2235 2283
2236 - import sys  
2237 -  
2238 - # [PL] display quick usage info if launched from command-line  
2239 - if len(sys.argv) <= 1:  
2240 - print('olefile version %s %s - %s' % (__version__, __date__, __author__))  
2241 - print(  
2242 -"""  
2243 -Launched from the command line, this script parses OLE files and prints info.  
2244 -  
2245 -Usage: olefile.py [-d] [-c] <file> [file2 ...] 2284 + import sys, optparse
  2285 +
  2286 + DEFAULT_LOG_LEVEL = "warning" # Default log level
  2287 + LOG_LEVELS = {
  2288 + 'debug': logging.DEBUG,
  2289 + 'info': logging.INFO,
  2290 + 'warning': logging.WARNING,
  2291 + 'error': logging.ERROR,
  2292 + 'critical': logging.CRITICAL
  2293 + }
  2294 +
  2295 + usage = 'usage: %prog [options] <filename> [filename2 ...]'
  2296 + parser = optparse.OptionParser(usage=usage)
  2297 + parser.add_option("-c", action="store_true", dest="check_streams",
  2298 + help='check all streams (for debugging purposes)')
  2299 + parser.add_option("-d", action="store_true", dest="debug_mode",
  2300 + help='debug mode, shortcut for -l debug (displays a lot of debug information, for developers only)')
  2301 + parser.add_option('-l', '--loglevel', dest="loglevel", action="store", default=DEFAULT_LOG_LEVEL,
  2302 + help="logging level debug/info/warning/error/critical (default=%default)")
  2303 +
  2304 + (options, args) = parser.parse_args()
  2305 +
  2306 + print('olefile version %s %s - http://www.decalage.info/en/olefile\n' % (__version__, __date__))
  2307 +
  2308 + # Print help if no arguments are passed
  2309 + if len(args) == 0:
  2310 + print(__doc__)
  2311 + parser.print_help()
  2312 + sys.exit()
2246 2313
2247 -Options:  
2248 --d : debug mode (displays a lot of debug information, for developers only)  
2249 --c : check all streams (for debugging purposes) 2314 + if options.debug_mode:
  2315 + options.loglevel = 'debug'
2250 2316
2251 -For more information, see http://www.decalage.info/olefile  
2252 -""")  
2253 - sys.exit() 2317 + # setup logging to the console
  2318 + logging.basicConfig(level=LOG_LEVELS[options.loglevel], format='%(levelname)-8s %(message)s')
2254 2319
2255 - check_streams = False  
2256 - for filename in sys.argv[1:]:  
2257 -## try:  
2258 - # OPTIONS:  
2259 - if filename == '-d':  
2260 - # option to switch debug mode on:  
2261 - set_debug_mode(True)  
2262 - continue  
2263 - if filename == '-c':  
2264 - # option to switch check streams mode on:  
2265 - check_streams = True  
2266 - continue 2320 + # also set the same log level for the module's logger to enable it:
  2321 + log.setLevel(LOG_LEVELS[options.loglevel])
2267 2322
  2323 + for filename in args:
  2324 + try:
2268 ole = OleFileIO(filename)#, raise_defects=DEFECT_INCORRECT) 2325 ole = OleFileIO(filename)#, raise_defects=DEFECT_INCORRECT)
2269 print("-" * 68) 2326 print("-" * 68)
2270 print(filename) 2327 print(filename)
@@ -2272,24 +2329,27 @@ For more information, see http://www.decalage.info/olefile @@ -2272,24 +2329,27 @@ For more information, see http://www.decalage.info/olefile
2272 ole.dumpdirectory() 2329 ole.dumpdirectory()
2273 for streamname in ole.listdir(): 2330 for streamname in ole.listdir():
2274 if streamname[-1][0] == "\005": 2331 if streamname[-1][0] == "\005":
2275 - print(streamname, ": properties")  
2276 - props = ole.getproperties(streamname, convert_time=True)  
2277 - props = sorted(props.items())  
2278 - for k, v in props:  
2279 - #[PL]: avoid to display too large or binary values:  
2280 - if isinstance(v, (basestring, bytes)):  
2281 - if len(v) > 50:  
2282 - v = v[:50]  
2283 - if isinstance(v, bytes):  
2284 - # quick and dirty binary check:  
2285 - for c in (1,2,3,4,5,6,7,11,12,14,15,16,17,18,19,20,  
2286 - 21,22,23,24,25,26,27,28,29,30,31):  
2287 - if c in bytearray(v):  
2288 - v = '(binary data)'  
2289 - break  
2290 - print(" ", k, v)  
2291 -  
2292 - if check_streams: 2332 + print("%r: properties" % streamname)
  2333 + try:
  2334 + props = ole.getproperties(streamname, convert_time=True)
  2335 + props = sorted(props.items())
  2336 + for k, v in props:
  2337 + #[PL]: avoid to display too large or binary values:
  2338 + if isinstance(v, (basestring, bytes)):
  2339 + if len(v) > 50:
  2340 + v = v[:50]
  2341 + if isinstance(v, bytes):
  2342 + # quick and dirty binary check:
  2343 + for c in (1,2,3,4,5,6,7,11,12,14,15,16,17,18,19,20,
  2344 + 21,22,23,24,25,26,27,28,29,30,31):
  2345 + if c in bytearray(v):
  2346 + v = '(binary data)'
  2347 + break
  2348 + print(" ", k, v)
  2349 + except:
  2350 + log.exception('Error while parsing property stream %r' % streamname)
  2351 +
  2352 + if options.check_streams:
2293 # Read all streams to check if there are errors: 2353 # Read all streams to check if there are errors:
2294 print('\nChecking streams...') 2354 print('\nChecking streams...')
2295 for streamname in ole.listdir(): 2355 for streamname in ole.listdir():
@@ -2318,8 +2378,11 @@ For more information, see http://www.decalage.info/olefile @@ -2318,8 +2378,11 @@ For more information, see http://www.decalage.info/olefile
2318 print() 2378 print()
2319 2379
2320 # parse and display metadata: 2380 # parse and display metadata:
2321 - meta = ole.get_metadata()  
2322 - meta.dump() 2381 + try:
  2382 + meta = ole.get_metadata()
  2383 + meta.dump()
  2384 + except:
  2385 + log.exception('Error while parsing metadata')
2323 print() 2386 print()
2324 #[PL] Test a few new methods: 2387 #[PL] Test a few new methods:
2325 root = ole.get_rootentry_name() 2388 root = ole.get_rootentry_name()
@@ -2338,7 +2401,7 @@ For more information, see http://www.decalage.info/olefile @@ -2338,7 +2401,7 @@ For more information, see http://www.decalage.info/olefile
2338 print('- %s: %s' % (exctype.__name__, msg)) 2401 print('- %s: %s' % (exctype.__name__, msg))
2339 else: 2402 else:
2340 print('None') 2403 print('None')
2341 -## except IOError as v:  
2342 -## print("***", "cannot read", file, "-", v) 2404 + except:
  2405 + log.exception('Error while parsing file %r' % filename)
2343 2406
2344 # this code was developed while listening to The Wedding Present "Sea Monsters" 2407 # this code was developed while listening to The Wedding Present "Sea Monsters"
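Review note: the new command-line block replaces the hand-rolled `-d`/`-c` parsing with `optparse` and maps a level name to a `logging` constant, with `-d` acting as a shortcut for `-l debug`. The sketch below isolates that option handling so it can be exercised without an OLE file; `parse_cli` is a hypothetical wrapper, not a function of olefile.

```python
import logging
import optparse

LOG_LEVELS = {
    'debug': logging.DEBUG,
    'info': logging.INFO,
    'warning': logging.WARNING,
    'error': logging.ERROR,
    'critical': logging.CRITICAL,
}


def parse_cli(argv):
    """Return (log_level, filenames) for a list of arguments (illustrative)."""
    parser = optparse.OptionParser(
        usage='usage: %prog [options] <filename> [filename2 ...]')
    parser.add_option('-d', action='store_true', dest='debug_mode',
                      default=False, help='shortcut for -l debug')
    parser.add_option('-l', '--loglevel', dest='loglevel', action='store',
                      default='warning', help='debug/info/warning/error/critical')
    options, args = parser.parse_args(argv)
    # -d overrides whatever was given with -l, as in the diff.
    if options.debug_mode:
        options.loglevel = 'debug'
    # An unknown level name raises KeyError here; the diff has the same behaviour.
    return LOG_LEVELS[options.loglevel], args


if __name__ == "__main__":
    level, files = parse_cli(['-l', 'info', 'sample.doc'])
    logging.basicConfig(level=level, format='%(levelname)-8s %(message)s')
    print(level, files)
```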
oletools/thirdparty/olefile/olefile2.py
@@ -1166,33 +1166,33 @@ class OleFileIO: @@ -1166,33 +1166,33 @@ class OleFileIO:
1166 self._raise_defect(DEFECT_FATAL, "incorrect ByteOrder in OLE header") 1166 self._raise_defect(DEFECT_FATAL, "incorrect ByteOrder in OLE header")
1167 # TODO: add big-endian support for documents created on Mac ? 1167 # TODO: add big-endian support for documents created on Mac ?
1168 self.SectorSize = 2**self.SectorShift 1168 self.SectorSize = 2**self.SectorShift
1169 - debug( "SectorSize = %d" % self.SectorSize ) 1169 + debug( "sector_size = %d" % self.SectorSize )
1170 if self.SectorSize not in [512, 4096]: 1170 if self.SectorSize not in [512, 4096]:
1171 - self._raise_defect(DEFECT_INCORRECT, "incorrect SectorSize in OLE header") 1171 + self._raise_defect(DEFECT_INCORRECT, "incorrect sector_size in OLE header")
1172 if (self.DllVersion==3 and self.SectorSize!=512) \ 1172 if (self.DllVersion==3 and self.SectorSize!=512) \
1173 or (self.DllVersion==4 and self.SectorSize!=4096): 1173 or (self.DllVersion==4 and self.SectorSize!=4096):
1174 - self._raise_defect(DEFECT_INCORRECT, "SectorSize does not match DllVersion in OLE header") 1174 + self._raise_defect(DEFECT_INCORRECT, "sector_size does not match DllVersion in OLE header")
1175 self.MiniSectorSize = 2**self.MiniSectorShift 1175 self.MiniSectorSize = 2**self.MiniSectorShift
1176 - debug( "MiniSectorSize = %d" % self.MiniSectorSize ) 1176 + debug( "mini_sector_size = %d" % self.MiniSectorSize )
1177 if self.MiniSectorSize not in [64]: 1177 if self.MiniSectorSize not in [64]:
1178 - self._raise_defect(DEFECT_INCORRECT, "incorrect MiniSectorSize in OLE header") 1178 + self._raise_defect(DEFECT_INCORRECT, "incorrect mini_sector_size in OLE header")
1179 if self.Reserved != 0 or self.Reserved1 != 0: 1179 if self.Reserved != 0 or self.Reserved1 != 0:
1180 self._raise_defect(DEFECT_INCORRECT, "incorrect OLE header (non-null reserved bytes)") 1180 self._raise_defect(DEFECT_INCORRECT, "incorrect OLE header (non-null reserved bytes)")
1181 debug( "csectDir = %d" % self.csectDir ) 1181 debug( "csectDir = %d" % self.csectDir )
1182 if self.SectorSize==512 and self.csectDir!=0: 1182 if self.SectorSize==512 and self.csectDir!=0:
1183 self._raise_defect(DEFECT_INCORRECT, "incorrect csectDir in OLE header") 1183 self._raise_defect(DEFECT_INCORRECT, "incorrect csectDir in OLE header")
1184 - debug( "csectFat = %d" % self.csectFat )  
1185 - debug( "sectDirStart = %X" % self.sectDirStart )  
1186 - debug( "signature = %d" % self.signature ) 1184 + debug( "num_fat_sectors = %d" % self.csectFat )
  1185 + debug( "first_dir_sector = %X" % self.sectDirStart )
  1186 + debug( "transaction_signature_number = %d" % self.signature )
1187 # Signature should be zero, BUT some implementations do not follow this 1187 # Signature should be zero, BUT some implementations do not follow this
1188 # rule => only a potential defect: 1188 # rule => only a potential defect:
1189 if self.signature != 0: 1189 if self.signature != 0:
1190 - self._raise_defect(DEFECT_POTENTIAL, "incorrect OLE header (signature>0)")  
1191 - debug( "MiniSectorCutoff = %d" % self.MiniSectorCutoff )  
1192 - debug( "MiniFatStart = %X" % self.MiniFatStart )  
1193 - debug( "csectMiniFat = %d" % self.csectMiniFat )  
1194 - debug( "sectDifStart = %X" % self.sectDifStart )  
1195 - debug( "csectDif = %d" % self.csectDif ) 1190 + self._raise_defect(DEFECT_POTENTIAL, "incorrect OLE header (transaction_signature_number>0)")
  1191 + debug( "mini_stream_cutoff_size = %d" % self.MiniSectorCutoff )
  1192 + debug( "first_mini_fat_sector = %X" % self.MiniFatStart )
  1193 + debug( "num_mini_fat_sectors = %d" % self.csectMiniFat )
  1194 + debug( "first_difat_sector = %X" % self.sectDifStart )
  1195 + debug( "num_difat_sectors = %d" % self.csectDif )
1196 1196
1197 # calculate the number of sectors in the file 1197 # calculate the number of sectors in the file
1198 # (-1 because header doesn't count) 1198 # (-1 because header doesn't count)
@@ -1414,9 +1414,9 @@ class OleFileIO: @@ -1414,9 +1414,9 @@ class OleFileIO:
1414 if isect_difat not in [ENDOFCHAIN, FREESECT]: 1414 if isect_difat not in [ENDOFCHAIN, FREESECT]:
1415 # last DIFAT pointer value must be ENDOFCHAIN or FREESECT 1415 # last DIFAT pointer value must be ENDOFCHAIN or FREESECT
1416 raise IOError, 'incorrect end of DIFAT' 1416 raise IOError, 'incorrect end of DIFAT'
1417 -## if len(self.fat) != self.csectFat:  
1418 -## # FAT should contain csectFat blocks  
1419 -## print "FAT length: %d instead of %d" % (len(self.fat), self.csectFat) 1417 +## if len(self.fat) != self.num_fat_sectors:
  1418 +## # FAT should contain num_fat_sectors blocks
  1419 +## print "FAT length: %d instead of %d" % (len(self.fat), self.num_fat_sectors)
1420 ## raise IOError, 'incorrect DIFAT' 1420 ## raise IOError, 'incorrect DIFAT'
1421 # since FAT is read from fixed-size sectors, it may contain more values 1421 # since FAT is read from fixed-size sectors, it may contain more values
1422 # than the actual number of sectors in the file. 1422 # than the actual number of sectors in the file.
@@ -1438,7 +1438,7 @@ class OleFileIO: @@ -1438,7 +1438,7 @@ class OleFileIO:
1438 # 1) Stream size is calculated according to the number of sectors 1438 # 1) Stream size is calculated according to the number of sectors
1439 # declared in the OLE header. This allocated stream may be more than 1439 # declared in the OLE header. This allocated stream may be more than
1440 # needed to store the actual sector indexes. 1440 # needed to store the actual sector indexes.
1441 - # (self.csectMiniFat is the number of sectors of size self.SectorSize) 1441 + # (self.num_mini_fat_sectors is the number of sectors of size self.sector_size)
1442 stream_size = self.csectMiniFat * self.SectorSize 1442 stream_size = self.csectMiniFat * self.SectorSize
1443 # 2) Actually used size is calculated by dividing the MiniStream size 1443 # 2) Actually used size is calculated by dividing the MiniStream size
1444 # (given by root entry size) by the size of mini sectors, *4 for 1444 # (given by root entry size) by the size of mini sectors, *4 for
@@ -1565,7 +1565,7 @@ class OleFileIO: @@ -1565,7 +1565,7 @@ class OleFileIO:
1565 """ 1565 """
1566 debug('OleFileIO.open(): sect=%d, size=%d, force_FAT=%s' % 1566 debug('OleFileIO.open(): sect=%d, size=%d, force_FAT=%s' %
1567 (start, size, str(force_FAT))) 1567 (start, size, str(force_FAT)))
1568 - # stream size is compared to the MiniSectorCutoff threshold: 1568 + # stream size is compared to the mini_stream_cutoff_size threshold:
1569 if size < self.minisectorcutoff and not force_FAT: 1569 if size < self.minisectorcutoff and not force_FAT:
1570 # ministream object 1570 # ministream object
1571 if not self.ministream: 1571 if not self.ministream: