Commit efc4692280c273dcda2e5e6f821ed222e0b9e0db

Authored by Philippe Lagadec
1 parent cb1288ff

Replaced OleFileIO_PL by olefile 0.41
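
Note for callers: the README added in this commit states that v0.40 renamed OleFileIO_PL to olefile. For code that still imports the old name, a minimal sketch of a fallback import might look like the following. The helper name `import_ole_module` and the default candidate list are hypothetical and are not part of this commit; the actual import path used inside oletools may differ.

```python
import importlib


def import_ole_module(candidates=("olefile", "OleFileIO_PL")):
    """Return the first importable module from candidates, or None.

    The default names are assumptions for illustration: the new package
    name is tried first, then the legacy name as a fallback.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This candidate is not installed; try the next one.
            continue
    return None


ole_module = import_ole_module()
if ole_module is None:
    print("neither olefile nor OleFileIO_PL is installed")
else:
    print("using", ole_module.__name__)
```

Trying the new name first means callers pick up olefile wherever it is available and only fall back to the legacy module on systems that have not been updated.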

oletools/thirdparty/OleFileIO_PL/LICENSE.txt deleted
1   -LICENSE for the OleFileIO_PL module:
2   -
3   -
4   -OleFileIO_PL is an improved version of the OleFileIO module from the
5   -Python Imaging Library (PIL).
6   -
7   -OleFileIO_PL changes are Copyright (c) 2005-2013 by Philippe Lagadec
8   -
9   -The Python Imaging Library (PIL) is
10   - Copyright (c) 1997-2005 by Secret Labs AB
11   - Copyright (c) 1995-2005 by Fredrik Lundh
12   -
13   -By obtaining, using, and/or copying this software and/or its associated
14   -documentation, you agree that you have read, understood, and will comply with
15   -the following terms and conditions:
16   -
17   -Permission to use, copy, modify, and distribute this software and its
18   -associated documentation for any purpose and without fee is hereby granted,
19   -provided that the above copyright notice appears in all copies, and that both
20   -that copyright notice and this permission notice appear in supporting
21   -documentation, and that the name of Secret Labs AB or the author(s) not be used
22   -in advertising or publicity pertaining to distribution of the software
23   -without specific, written prior permission.
24   -
25   -SECRET LABS AB AND THE AUTHORS DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS
26   -SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
27   -IN NO EVENT SHALL SECRET LABS AB OR THE AUTHORS BE LIABLE FOR ANY SPECIAL,
28   -INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
29   -LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
30   -OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
31   -PERFORMANCE OF THIS SOFTWARE.
oletools/thirdparty/OleFileIO_PL/README.txt deleted
1   -OleFileIO\_PL
2   -=============
3   -
4   -`OleFileIO\_PL <http://www.decalage.info/python/olefileio>`_ is a Python
5   -module to read `Microsoft OLE2 files (also called Structured Storage,
6   -Compound File Binary Format or Compound Document File
7   -Format) <http://en.wikipedia.org/wiki/Compound_File_Binary_Format>`_,
8   -such as Microsoft Office documents, Image Composer and FlashPix files,
9   -Outlook messages, ...
10   -
11   -This is an improved version of the OleFileIO module from
12   -`PIL <http://www.pythonware.com/products/pil/index.htm>`_, the excellent
13   -Python Imaging Library, created and maintained by Fredrik Lundh. The API
14   -is still compatible with PIL, but I have improved the internal
15   -implementation significantly, with new features, bugfixes and a more
16   -robust design.
17   -
18   -As far as I know, this module is now the most complete and robust Python
19   -implementation to read MS OLE2 files, portable on several operating
20   -systems. (please tell me if you know other similar Python modules)
21   -
22   -WARNING: THIS IS (STILL) WORK IN PROGRESS.
23   -
24   -Main improvements over PIL version of OleFileIO:
25   -------------------------------------------------
26   -
27   -- Better compatibility with Python 2.4 up to 2.7
28   -- Support for files larger than 6.8MB
29   -- Robust: many checks to detect malformed files
30   -- Improved API
31   -- New features: metadata extraction, stream/storage timestamps
32   -- Added setup.py and install.bat to ease installation
33   -
34   -News
35   -----
36   -
37   -- 2013-07-24 v0.26: added methods to parse stream/storage timestamps,
38   - improved listdir to include storages, fixed parsing of direntry
39   - timestamps
40   -- 2013-05-27 v0.25: improved metadata extraction, properties parsing
41   - and exception handling, fixed `issue
42   - #12 <https://bitbucket.org/decalage/olefileio_pl/issue/12/error-when-converting-timestamps-in-ole>`_
43   -- 2013-05-07 v0.24: new features to extract metadata (get\_metadata
44   - method and OleMetadata class), improved getproperties to convert
45   - timestamps to Python datetime
46   -- 2012-10-09: published
47   - `python-oletools <http://www.decalage.info/python/oletools>`_, a
48   - package of analysis tools based on OleFileIO\_PL
49   -- 2012-09-11 v0.23: added support for file-like objects, fixed `issue
50   - #8 <https://bitbucket.org/decalage/olefileio_pl/issue/8/bug-with-file-object>`_
51   -- 2012-02-17 v0.22: fixed issues #7 (bug in getproperties) and #2
52   - (added close method)
53   -- 2011-10-20: code hosted on bitbucket to ease contributions and bug
54   - tracking
55   -- 2010-01-24 v0.21: fixed support for big-endian CPUs, such as PowerPC
56   - Macs.
57   -- 2009-12-11 v0.20: small bugfix in OleFileIO.open when filename is not
58   - plain str.
59   -- 2009-12-10 v0.19: fixed support for 64 bits platforms (thanks to Ben
60   - G. and Martijn for reporting the bug)
61   -- see changelog in source code for more info.
62   -
63   -Download:
64   ----------
65   -
66   -The archive is available on `the project
67   -page <https://bitbucket.org/decalage/olefileio_pl/downloads>`_.
68   -
69   -How to use this module:
70   ------------------------
71   -
72   -See sample code at the end of the module, and also docstrings.
73   -
74   -Here are a few examples:
75   -
76   -::
77   -
78   - import OleFileIO_PL
79   -
80   - # Test if a file is an OLE container:
81   - assert OleFileIO_PL.isOleFile('myfile.doc')
82   -
83   - # Open an OLE file from disk:
84   - ole = OleFileIO_PL.OleFileIO('myfile.doc')
85   -
86   - # Get list of streams:
87   - print ole.listdir()
88   -
89   - # Test if known streams/storages exist:
90   - if ole.exists('worddocument'):
91   -     print "This is a Word document."
92   -     print "size :", ole.get_size('worddocument')
93   - if ole.exists('macros/vba'):
94   -     print "This document seems to contain VBA macros."
95   -
96   - # Extract the "Pictures" stream from a PPT file:
97   - if ole.exists('Pictures'):
98   -     pics = ole.openstream('Pictures')
99   -     data = pics.read()
100   -     f = open('Pictures.bin', 'wb')
101   -     f.write(data)
102   -     f.close()
103   -
104   - # Extract metadata (new in v0.24) - see source code for all attributes:
105   - meta = ole.get_metadata()
106   - print 'Author:', meta.author
107   - print 'Title:', meta.title
108   - print 'Creation date:', meta.create_time
109   - # print all metadata:
110   - meta.dump()
111   -
112   - # Close the OLE file:
113   - ole.close()
114   -
115   - # Work with a file-like object (e.g. StringIO) instead of a file on disk:
116   - data = open('myfile.doc', 'rb').read()
117   - f = StringIO.StringIO(data)
118   - ole = OleFileIO_PL.OleFileIO(f)
119   - print ole.listdir()
120   - ole.close()
121   -
122   -It can also be used as a script from the command-line to display the
123   -structure of an OLE file, for example:
124   -
125   -::
126   -
127   - OleFileIO_PL.py myfile.doc
128   -
129   -A real-life example: `using OleFileIO\_PL for malware analysis and
130   -forensics <http://blog.gregback.net/2011/03/using-remnux-for-forensic-puzzle-6/>`_.
131   -
132   -How to contribute:
133   -------------------
134   -
135   -The code is available in `a Mercurial repository on
136   -bitbucket <https://bitbucket.org/decalage/olefileio_pl>`_. You may use
137   -it to submit enhancements or to report any issue.
138   -
139   -If you would like to help us improve this module, or simply provide
140   -feedback, you may also send an e-mail to decalage(at)laposte.net. You
141   -can help in many ways:
142   -
143   -- test this module on different platforms / Python versions
144   -- find and report bugs
145   -- improve documentation, code samples, docstrings
146   -- write unittest test cases
147   -- provide tricky malformed files
148   -
149   -How to report bugs:
150   --------------------
151   -
152   -To report a bug, for example a normal file which is not parsed
153   -correctly, please use the `issue reporting
154   -page <https://bitbucket.org/decalage/olefileio_pl/issues?status=new&status=open>`_,
155   -or send an e-mail with an attachment containing the debugging output of
156   -OleFileIO\_PL.
157   -
158   -For this, launch the following command :
159   -
160   -::
161   -
162   - OleFileIO_PL.py -d -c file >debug.txt
163   -
164   -License
165   --------
166   -
167   -OleFileIO\_PL is open-source.
168   -
169   -OleFileIO\_PL changes are Copyright (c) 2005-2013 by Philippe Lagadec.
170   -
171   -The Python Imaging Library (PIL) is
172   -
173   -- Copyright (c) 1997-2005 by Secret Labs AB
174   -
175   -- Copyright (c) 1995-2005 by Fredrik Lundh
176   -
177   -By obtaining, using, and/or copying this software and/or its associated
178   -documentation, you agree that you have read, understood, and will comply
179   -with the following terms and conditions:
180   -
181   -Permission to use, copy, modify, and distribute this software and its
182   -associated documentation for any purpose and without fee is hereby
183   -granted, provided that the above copyright notice appears in all copies,
184   -and that both that copyright notice and this permission notice appear in
185   -supporting documentation, and that the name of Secret Labs AB or the
186   -author not be used in advertising or publicity pertaining to
187   -distribution of the software without specific, written prior permission.
188   -
189   -SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO
190   -THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
191   -FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR
192   -ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
193   -RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
194   -CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
195   -CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
oletools/thirdparty/OleFileIO_PL/__init__.py deleted
oletools/thirdparty/olefile/CONTRIBUTORS.txt 0 → 100644
  1 +CONTRIBUTORS for the olefile project
  2 +====================================
  3 +
  4 +This is a non-exhaustive list of all the people who helped me improve the
  5 +olefile project (formerly OleFileIO_PL), in approximative chronological order.
  6 +Please contact me if I forgot to mention your name.
  7 +
  8 +A big thank you to all of them:
  9 +
  10 +- Niko Ehrenfeuchter: added support for Jython
  11 +- Niko Ehrenfeuchter, Martijn Berger and Dave Jones: helped fix 4K sector support
  12 +- Martin Panter: conversion to Python 3.x/2.6+
  13 +- mete0r_kr: added support for file-like objects
  14 +- chuckleberryfinn: fixed bug in getproperties
  15 +- Martijn, Ben G.: bug report for 64 bits platforms
  16 +- Philippe Lagadec: main author and maintainer since 2005
  17 +- and of course Fredrik Lundh: original author of OleFileIO from 1995 to 2005
oletools/thirdparty/olefile/LICENSE.txt 0 → 100644
  1 +LICENSE for the olefile package:
  2 +
  3 +olefile (formerly OleFileIO_PL) is copyright (c) 2005-2014 Philippe Lagadec
  4 +(http://www.decalage.info)
  5 +
  6 +All rights reserved.
  7 +
  8 +Redistribution and use in source and binary forms, with or without modification,
  9 +are permitted provided that the following conditions are met:
  10 +
  11 + * Redistributions of source code must retain the above copyright notice, this
  12 + list of conditions and the following disclaimer.
  13 + * Redistributions in binary form must reproduce the above copyright notice,
  14 + this list of conditions and the following disclaimer in the documentation
  15 + and/or other materials provided with the distribution.
  16 +
  17 +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
  18 +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  19 +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
  20 +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
  21 +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  22 +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  23 +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  24 +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
  25 +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  26 +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  27 +
  28 +
  29 +----------
  30 +
  31 +olefile is based on source code from the OleFileIO module of the Python
  32 +Imaging Library (PIL) published by Fredrik Lundh under the following license:
  33 +
  34 +The Python Imaging Library (PIL) is
  35 +- Copyright (c) 1997-2005 by Secret Labs AB
  36 +- Copyright (c) 1995-2005 by Fredrik Lundh
  37 +
  38 +By obtaining, using, and/or copying this software and/or its associated
  39 +documentation, you agree that you have read, understood, and will comply with
  40 +the following terms and conditions:
  41 +
  42 +Permission to use, copy, modify, and distribute this software and its
  43 +associated documentation for any purpose and without fee is hereby granted,
  44 +provided that the above copyright notice appears in all copies, and that both
  45 +that copyright notice and this permission notice appear in supporting
  46 +documentation, and that the name of Secret Labs AB or the author not be used
  47 +in advertising or publicity pertaining to distribution of the software without
  48 +specific, written prior permission.
  49 +
  50 +SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS
  51 +SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN
  52 +NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR ANY SPECIAL,
  53 +INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
  54 +LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
  55 +OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
  56 +PERFORMANCE OF THIS SOFTWARE.
oletools/thirdparty/olefile/README.html 0 → 100644
  1 +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  2 +<html xmlns="http://www.w3.org/1999/xhtml">
  3 +<head>
  4 + <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  5 + <meta http-equiv="Content-Style-Type" content="text/css" />
  6 + <meta name="generator" content="pandoc" />
  7 + <title></title>
  8 +</head>
  9 +<body>
  10 +<h1 id="olefile-formerly-olefileio_pl">olefile (formerly OleFileIO_PL)</h1>
  11 +<p><a href="http://www.decalage.info/python/olefileio">olefile</a> is a Python package to parse, read and write <a href="http://en.wikipedia.org/wiki/Compound_File_Binary_Format">Microsoft OLE2 files (also called Structured Storage, Compound File Binary Format or Compound Document File Format)</a>, such as Microsoft Office 97-2003 documents, Image Composer and FlashPix files, Outlook messages, StickyNotes, several Microscopy file formats, McAfee antivirus quarantine files, etc.</p>
  12 +<p><strong>Quick links:</strong> <a href="http://www.decalage.info/olefile">Home page</a> - <a href="https://bitbucket.org/decalage/olefileio_pl/wiki/Install">Download/Install</a> - <a href="https://bitbucket.org/decalage/olefileio_pl/wiki">Documentation</a> - <a href="https://bitbucket.org/decalage/olefileio_pl/issues?status=new&amp;status=open">Report Issues/Suggestions/Questions</a> - <a href="http://decalage.info/contact">Contact the author</a> - <a href="https://bitbucket.org/decalage/olefileio_pl">Repository</a> - <a href="https://twitter.com/decalage2">Updates on Twitter</a></p>
  13 +<p>olefile is based on the OleFileIO module from <a href="http://www.pythonware.com/products/pil/index.htm">PIL</a>, the excellent Python Imaging Library, created and maintained by Fredrik Lundh. The olefile API is still compatible with PIL, but since 2005 I have improved the internal implementation significantly, with new features, bugfixes and a more robust design. From 2005 to 2014 the project was called OleFileIO_PL, and in 2014 I changed its name to olefile to celebrate its 9 years and its new write features.</p>
  14 +<p>As far as I know, this module is the most complete and robust Python implementation to read MS OLE2 files, portable on several operating systems. (please tell me if you know other similar Python modules)</p>
  15 +<p>Since 2014 olefile/OleFileIO_PL has been integrated into <a href="http://python-imaging.github.io/">Pillow</a>, the friendly fork of PIL. olefile will continue to be improved as a separate project, and new versions will be merged into Pillow regularly.</p>
  16 +<p>olefile can be used as an independent module or with PIL/Pillow.</p>
  17 +<p>olefile is mostly meant for developers. If you are looking for tools to analyze OLE files or to extract data (especially for security purposes such as malware analysis and forensics), then please also check my <a href="http://www.decalage.info/python/oletools">python-oletools</a>, which are built upon olefile and provide a higher-level interface.</p>
  18 +<h2 id="news">News</h2>
  19 +<p>Follow all updates and news on Twitter: <a href="https://twitter.com/decalage2"><code class="url">https://twitter.com/decalage2</code></a></p>
  20 +<ul>
  21 +<li><strong>2014-11-25 v0.41</strong>: OleFileIO.open and isOleFile now support OLE files stored in byte strings, fixed installer for python 3, added support for Jython (Niko Ehrenfeuchter)</li>
  22 +<li>2014-10-01 v0.40: renamed OleFileIO_PL to olefile, added initial write support for streams &gt;4K, updated doc and license, improved the setup script.</li>
  23 +<li>2014-07-27 v0.31: fixed support for large files with 4K sectors, thanks to Niko Ehrenfeuchter, Martijn Berger and Dave Jones. Added test scripts from Pillow (by hugovk). Fixed setup for Python 3 (Martin Panter)</li>
  24 +<li>2014-02-04 v0.30: now compatible with Python 3.x, thanks to Martin Panter who did most of the hard work.</li>
  25 +<li>2013-07-24 v0.26: added methods to parse stream/storage timestamps, improved listdir to include storages, fixed parsing of direntry timestamps</li>
  26 +<li>2013-05-27 v0.25: improved metadata extraction, properties parsing and exception handling, fixed <a href="https://bitbucket.org/decalage/olefileio_pl/issue/12/error-when-converting-timestamps-in-ole">issue #12</a></li>
  27 +<li>2013-05-07 v0.24: new features to extract metadata (get_metadata method and OleMetadata class), improved getproperties to convert timestamps to Python datetime</li>
  28 +<li>2012-10-09: published <a href="http://www.decalage.info/python/oletools">python-oletools</a>, a package of analysis tools based on OleFileIO_PL</li>
  29 +<li>2012-09-11 v0.23: added support for file-like objects, fixed <a href="https://bitbucket.org/decalage/olefileio_pl/issue/8/bug-with-file-object">issue #8</a></li>
  30 +<li>2012-02-17 v0.22: fixed issues #7 (bug in getproperties) and #2 (added close method)</li>
  31 +<li>2011-10-20: code hosted on bitbucket to ease contributions and bug tracking</li>
  32 +<li>2010-01-24 v0.21: fixed support for big-endian CPUs, such as PowerPC Macs.</li>
  33 +<li>2009-12-11 v0.20: small bugfix in OleFileIO.open when filename is not plain str.</li>
  34 +<li>2009-12-10 v0.19: fixed support for 64 bits platforms (thanks to Ben G. and Martijn for reporting the bug)</li>
  35 +<li>see changelog in source code for more info.</li>
  36 +</ul>
  37 +<h2 id="downloadinstall">Download/Install</h2>
  38 +<p>If you have pip or setuptools installed, you may simply run &quot;<strong>pip install olefile</strong>&quot; or &quot;<strong>easy_install olefile</strong>&quot;. Otherwise, see https://bitbucket.org/decalage/olefileio_pl/wiki/Install</p>
  39 +<h2 id="features">Features</h2>
  40 +<ul>
  41 +<li>Parse, read and write any OLE file such as Microsoft Office 97-2003 legacy document formats (Word .doc, Excel .xls, PowerPoint .ppt, Visio .vsd, Project .mpp), Image Composer and FlashPix files, Outlook messages, StickyNotes, Zeiss AxioVision ZVI files, Olympus FluoView OIB files, etc</li>
  42 +<li>List all the streams and storages contained in an OLE file</li>
  43 +<li>Open streams as files</li>
  44 +<li>Parse and read property streams, containing metadata of the file</li>
  45 +<li>Portable, pure Python module, no dependency</li>
  46 +</ul>
  47 +<h2 id="main-improvements-over-the-original-version-of-olefileio-in-pil">Main improvements over the original version of OleFileIO in PIL:</h2>
  48 +<ul>
  49 +<li>Compatible with Python 3.x and 2.6+</li>
  50 +<li>Many bug fixes</li>
  51 +<li>Support for files larger than 6.8MB</li>
  52 +<li>Support for 64 bits platforms and big-endian CPUs</li>
  53 +<li>Robust: many checks to detect malformed files</li>
  54 +<li>Runtime option to choose if malformed files should be parsed or raise exceptions</li>
  55 +<li>Improved API</li>
  56 +<li>Metadata extraction, stream/storage timestamps (e.g. for document forensics)</li>
  57 +<li>Can open file-like objects</li>
  58 +<li>Added setup.py and install.bat to ease installation</li>
  59 +<li>More convenient slash-based syntax for stream paths</li>
  60 +<li>Write features</li>
  61 +</ul>
  62 +<h2 id="documentation">Documentation</h2>
  63 +<p>Please see the <a href="https://bitbucket.org/decalage/olefileio_pl/wiki">online documentation</a> for more information, especially the <a href="https://bitbucket.org/decalage/olefileio_pl/wiki/OLE_Overview">OLE overview</a> and the <a href="https://bitbucket.org/decalage/olefileio_pl/wiki/API">API page</a> which describe how to use olefile in Python applications. A copy of the same documentation is also provided in the doc subfolder of the olefile package.</p>
  64 +<h2 id="real-life-examples">Real-life examples</h2>
  65 +<p>A real-life example: <a href="http://blog.gregback.net/2011/03/using-remnux-for-forensic-puzzle-6/">using OleFileIO_PL for malware analysis and forensics</a>.</p>
  66 +<p>See also <a href="https://computer-forensics.sans.org/community/papers/gcfa/grow-forensic-tools-taxonomy-python-libraries-helpful-forensic-analysis_6879">this paper</a> about python tools for forensics, which features olefile.</p>
  67 +<h2 id="license">License</h2>
  68 +<p>olefile (formerly OleFileIO_PL) is copyright (c) 2005-2014 Philippe Lagadec (<a href="http://www.decalage.info">http://www.decalage.info</a>)</p>
  69 +<p>All rights reserved.</p>
  70 +<p>Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:</p>
  71 +<ul>
  72 +<li>Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.</li>
  73 +<li>Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.</li>
  74 +</ul>
  75 +<p>THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS &quot;AS IS&quot; AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.</p>
  76 +<hr />
  77 +<p>olefile is based on source code from the OleFileIO module of the Python Imaging Library (PIL) published by Fredrik Lundh under the following license:</p>
  78 +<p>The Python Imaging Library (PIL) is</p>
  79 +<ul>
  80 +<li>Copyright (c) 1997-2005 by Secret Labs AB</li>
  81 +<li>Copyright (c) 1995-2005 by Fredrik Lundh</li>
  82 +</ul>
  83 +<p>By obtaining, using, and/or copying this software and/or its associated documentation, you agree that you have read, understood, and will comply with the following terms and conditions:</p>
  84 +<p>Permission to use, copy, modify, and distribute this software and its associated documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appears in all copies, and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Secret Labs AB or the author not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission.</p>
  85 +<p>SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.</p>
  86 +</body>
  87 +</html>
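
The v0.41 news entry above mentions support for OLE files stored in byte strings. As a rough illustration of what such a check involves, and not olefile's actual implementation, a Python 3 sketch could compare the first eight bytes against the Compound File Binary Format signature. The names `OLE_MAGIC` and `looks_like_ole` are hypothetical.

```python
# Signature found at the start of every OLE2 / Compound File Binary file.
OLE_MAGIC = b"\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1"


def looks_like_ole(data):
    """Return True if data is a bytes-like object starting with the OLE2 signature.

    Hypothetical helper for illustration only: it checks the signature and
    nothing else, so a True result does not mean the file is well-formed.
    """
    if not isinstance(data, (bytes, bytearray)):
        return False
    return bytes(data[:len(OLE_MAGIC)]) == OLE_MAGIC


# Usage: a fake in-memory "file" that starts with the signature.
sample = OLE_MAGIC + b"\x00" * 504
print(looks_like_ole(sample))
print(looks_like_ole(b"PK\x03\x04"))
```

A signature test like this is only a cheap first filter; parsing the header and sector tables is still needed to tell a valid container from a truncated or malformed one.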
oletools/thirdparty/olefile/README.rst 0 → 100644
  1 +olefile (formerly OleFileIO\_PL)
  2 +================================
  3 +
  4 +`olefile <http://www.decalage.info/python/olefileio>`_ is a Python
  5 +package to parse, read and write `Microsoft OLE2 files (also called
  6 +Structured Storage, Compound File Binary Format or Compound Document
  7 +File Format)
  8 +<http://en.wikipedia.org/wiki/Compound_File_Binary_Format>`_, such as
  9 +Microsoft Office 97-2003 documents, Image Composer and FlashPix files,
  10 +Outlook messages, StickyNotes, several Microscopy file formats, McAfee
  11 +antivirus quarantine files, etc.
  12 +
  13 +**Quick links:** `Home page <http://www.decalage.info/olefile>`_ -
  14 +`Download/Install <https://bitbucket.org/decalage/olefileio_pl/wiki/Install>`_
  15 +- `Documentation <https://bitbucket.org/decalage/olefileio_pl/wiki>`_ -
  16 +`Report
  17 +Issues/Suggestions/Questions <https://bitbucket.org/decalage/olefileio_pl/issues?status=new&status=open>`_
  18 +- `Contact the author <http://decalage.info/contact>`_ -
  19 +`Repository <https://bitbucket.org/decalage/olefileio_pl>`_ - `Updates
  20 +on Twitter <https://twitter.com/decalage2>`_
  21 +
  22 +olefile is based on the OleFileIO module from
  23 +`PIL <http://www.pythonware.com/products/pil/index.htm>`_, the excellent
  24 +Python Imaging Library, created and maintained by Fredrik Lundh. The
  25 +olefile API is still compatible with PIL, but since 2005 I have improved
  26 +the internal implementation significantly, with new features, bugfixes
  27 +and a more robust design. From 2005 to 2014 the project was called
  28 +OleFileIO\_PL, and in 2014 I changed its name to olefile to celebrate
  29 +its 9 years and its new write features.
  30 +
  31 +As far as I know, this module is the most complete and robust Python
  32 +implementation to read MS OLE2 files, portable on several operating
  33 +systems. (please tell me if you know other similar Python modules)
  34 +
  35 +Since 2014 olefile/OleFileIO\_PL has been integrated into
  36 +`Pillow <http://python-imaging.github.io/>`_, the friendly fork of PIL.
  37 +olefile will continue to be improved as a separate project, and new
  38 +versions will be merged into Pillow regularly.
  39 +
  40 +olefile can be used as an independent module or with PIL/Pillow.
  41 +
  42 +olefile is mostly meant for developers. If you are looking for tools to
  43 +analyze OLE files or to extract data (especially for security purposes
  44 +such as malware analysis and forensics), then please also check my
  45 +`python-oletools <http://www.decalage.info/python/oletools>`_, which are
  46 +built upon olefile and provide a higher-level interface.
  47 +
  48 +News
  49 +----
  50 +
  51 +Follow all updates and news on Twitter: https://twitter.com/decalage2
  52 +
  53 +- **2014-11-25 v0.41**: OleFileIO.open and isOleFile now support OLE
  54 + files stored in byte strings, fixed installer for python 3, added
  55 + support for Jython (Niko Ehrenfeuchter)
  56 +- 2014-10-01 v0.40: renamed OleFileIO\_PL to olefile, added initial
  57 + write support for streams >4K, updated doc and license, improved the
  58 + setup script.
  59 +- 2014-07-27 v0.31: fixed support for large files with 4K sectors,
  60 + thanks to Niko Ehrenfeuchter, Martijn Berger and Dave Jones. Added
  61 + test scripts from Pillow (by hugovk). Fixed setup for Python 3
  62 + (Martin Panter)
  63 +- 2014-02-04 v0.30: now compatible with Python 3.x, thanks to Martin
  64 + Panter who did most of the hard work.
  65 +- 2013-07-24 v0.26: added methods to parse stream/storage timestamps,
  66 + improved listdir to include storages, fixed parsing of direntry
  67 + timestamps
  68 +- 2013-05-27 v0.25: improved metadata extraction, properties parsing
  69 + and exception handling, fixed `issue #12
  70 + <https://bitbucket.org/decalage/olefileio_pl/issue/12/error-when-converting-timestamps-in-ole>`_
  71 +- 2013-05-07 v0.24: new features to extract metadata (get\_metadata
  72 + method and OleMetadata class), improved getproperties to convert
  73 + timestamps to Python datetime
  74 +- 2012-10-09: published
  75 + `python-oletools <http://www.decalage.info/python/oletools>`_, a
  76 + package of analysis tools based on OleFileIO\_PL
  77 +- 2012-09-11 v0.23: added support for file-like objects, fixed `issue
  78 + #8
  79 + <https://bitbucket.org/decalage/olefileio_pl/issue/8/bug-with-file-object>`_
  80 +- 2012-02-17 v0.22: fixed issues #7 (bug in getproperties) and #2
  81 + (added close method)
  82 +- 2011-10-20: code hosted on bitbucket to ease contributions and bug
  83 + tracking
  84 +- 2010-01-24 v0.21: fixed support for big-endian CPUs, such as PowerPC
  85 + Macs.
  86 +- 2009-12-11 v0.20: small bugfix in OleFileIO.open when filename is not
  87 + plain str.
  88 +- 2009-12-10 v0.19: fixed support for 64 bits platforms (thanks to Ben
  89 + G. and Martijn for reporting the bug)
  90 +- see changelog in source code for more info.
  91 +
  92 +Download/Install
  93 +----------------
  94 +
  95 +If you have pip or setuptools installed, you may simply run "**pip
  96 +install olefile**\ " or "**easy\_install olefile**\ ". Otherwise, see
  97 +https://bitbucket.org/decalage/olefileio\_pl/wiki/Install
  98 +
  99 +Features
  100 +--------
  101 +
  102 +- Parse, read and write any OLE file such as Microsoft Office 97-2003
  103 + legacy document formats (Word .doc, Excel .xls, PowerPoint .ppt,
  104 + Visio .vsd, Project .mpp), Image Composer and FlashPix files, Outlook
  105 + messages, StickyNotes, Zeiss AxioVision ZVI files, Olympus FluoView
  106 + OIB files, etc
  107 +- List all the streams and storages contained in an OLE file
  108 +- Open streams as files
  109 +- Parse and read property streams, containing metadata of the file
  110 +- Portable, pure Python module, no dependency
  111 +
  112 +Main improvements over the original version of OleFileIO in PIL:
  113 +----------------------------------------------------------------
  114 +
  115 +- Compatible with Python 3.x and 2.6+
  116 +- Many bug fixes
  117 +- Support for files larger than 6.8MB
  118 +- Support for 64 bits platforms and big-endian CPUs
  119 +- Robust: many checks to detect malformed files
  120 +- Runtime option to choose if malformed files should be parsed or raise
  121 + exceptions
  122 +- Improved API
  123 +- Metadata extraction, stream/storage timestamps (e.g. for document
  124 + forensics)
  125 +- Can open file-like objects
  126 +- Added setup.py and install.bat to ease installation
  127 +- More convenient slash-based syntax for stream paths
  128 +- Write features
  129 +
  130 +Documentation
  131 +-------------
  132 +
  133 +Please see the `online
  134 +documentation <https://bitbucket.org/decalage/olefileio_pl/wiki>`_ for
  135 +more information, especially the `OLE
  136 +overview <https://bitbucket.org/decalage/olefileio_pl/wiki/OLE_Overview>`_
  137 +and the `API
  138 +page <https://bitbucket.org/decalage/olefileio_pl/wiki/API>`_ which describe
  139 +how to use olefile in Python applications. A copy of the same
  140 +documentation is also provided in the doc subfolder of the olefile
  141 +package.
  142 +
  143 +Real-life examples
  144 +------------------
  145 +
  146 +A real-life example: `using OleFileIO\_PL for malware analysis and
  147 +forensics
  148 +<http://blog.gregback.net/2011/03/using-remnux-for-forensic-puzzle-6/>`_.
  149 +
  150 +See also `this
  151 +paper <https://computer-forensics.sans.org/community/papers/gcfa/grow-forensic-tools-taxonomy-python-libraries-helpful-forensic-analysis_6879>`_
  152 +about Python tools for forensics, which features olefile.
  153 +
  154 +License
  155 +-------
  156 +
  157 +olefile (formerly OleFileIO\_PL) is copyright (c) 2005-2014 Philippe
  158 +Lagadec (`http://www.decalage.info <http://www.decalage.info>`_)
  159 +
  160 +All rights reserved.
  161 +
  162 +Redistribution and use in source and binary forms, with or without
  163 +modification, are permitted provided that the following conditions are
  164 +met:
  165 +
  166 +- Redistributions of source code must retain the above copyright
  167 + notice, this list of conditions and the following disclaimer.
  168 +- Redistributions in binary form must reproduce the above copyright
  169 + notice, this list of conditions and the following disclaimer in the
  170 + documentation and/or other materials provided with the distribution.
  171 +
  172 +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS
  173 +IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
  174 +TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
  175 +PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
  176 +HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
  177 +SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED
  178 +TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
  179 +PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
  180 +LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
  181 +NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
  182 +SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  183 +
  184 +--------------
  185 +
  186 +olefile is based on source code from the OleFileIO module of the Python
  187 +Imaging Library (PIL) published by Fredrik Lundh under the following
  188 +license:
  189 +
  190 +The Python Imaging Library (PIL) is
  191 +
  192 +- Copyright (c) 1997-2005 by Secret Labs AB
  193 +- Copyright (c) 1995-2005 by Fredrik Lundh
  194 +
  195 +By obtaining, using, and/or copying this software and/or its associated
  196 +documentation, you agree that you have read, understood, and will comply
  197 +with the following terms and conditions:
  198 +
  199 +Permission to use, copy, modify, and distribute this software and its
  200 +associated documentation for any purpose and without fee is hereby
  201 +granted, provided that the above copyright notice appears in all copies,
  202 +and that both that copyright notice and this permission notice appear in
  203 +supporting documentation, and that the name of Secret Labs AB or the
  204 +author not be used in advertising or publicity pertaining to
  205 +distribution of the software without specific, written prior permission.
  206 +
  207 +SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO
  208 +THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
  209 +FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR
  210 +ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER
  211 +RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
  212 +CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
  213 +CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
... ...
oletools/thirdparty/olefile/__init__.py 0 → 100644
  1 +#!/usr/local/bin/python
  2 +# -*- coding: latin-1 -*-
  3 +"""
  4 +olefile (formerly OleFileIO_PL)
  5 +
  6 +Module to read/write Microsoft OLE2 files (also called Structured Storage or
  7 +Microsoft Compound Document File Format), such as Microsoft Office 97-2003
  8 +documents, Image Composer and FlashPix files, Outlook messages, ...
  9 +This version is compatible with Python 2.6+ and 3.x
  10 +
  11 +Project website: http://www.decalage.info/olefile
  12 +
  13 +olefile is copyright (c) 2005-2014 Philippe Lagadec (http://www.decalage.info)
  14 +
  15 +olefile is based on the OleFileIO module from the PIL library v1.1.6
  16 +See: http://www.pythonware.com/products/pil/index.htm
  17 +
  18 +The Python Imaging Library (PIL) is
  19 + Copyright (c) 1997-2005 by Secret Labs AB
  20 + Copyright (c) 1995-2005 by Fredrik Lundh
  21 +
  22 +See source code and LICENSE.txt for information on usage and redistribution.
  23 +"""
  24 +
  25 +try:
  26 +    # first try to import olefile for Python 2.6+/3.x
  27 +    from .olefile import *
  28 +    # import metadata not covered by *:
  29 +    from .olefile import __version__, __author__, __date__
  30 +
  31 +except:
  32 +    # if it fails, fall back to the old version olefile2 for Python 2.5 or older:
  33 +    from .olefile2 import *
  34 +    # import metadata not covered by *:
  35 +    from .olefile2 import __doc__, __version__, __author__, __date__
... ...
oletools/thirdparty/olefile/doc/API.html 0 → 100644
  1 +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  2 +<html xmlns="http://www.w3.org/1999/xhtml">
  3 +<head>
  4 + <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  5 + <meta http-equiv="Content-Style-Type" content="text/css" />
  6 + <meta name="generator" content="pandoc" />
  7 + <title></title>
  8 +</head>
  9 +<body>
  10 +<h1 id="how-to-use-olefile---api">How to use olefile - API</h1>
  11 +<p>This page is part of the documentation for <a href="https://bitbucket.org/decalage/olefileio_pl/wiki">olefile</a>. It explains how to use all its features to parse and write OLE files. For more information about OLE files, see <a href="OLE_Overview.html">OLE_Overview</a>.</p>
  12 +<p>olefile can be used as an independent module or with PIL/Pillow. The main functions and methods are explained below.</p>
  13 +<p>For more information, see also the file <strong>olefile.html</strong>, sample code at the end of the module itself, and docstrings within the code.</p>
  14 +<h2 id="import-olefile">Import olefile</h2>
  15 +<p>When the olefile package has been installed, it can be imported in Python applications with this statement:</p>
  16 +<pre><code>import olefile</code></pre>
  17 +<p>Before v0.40, olefile was named OleFileIO_PL. To maintain backward compatibility with older applications and samples, a simple script is also installed so that the following statement imports olefile as OleFileIO_PL:</p>
  18 +<pre><code>import OleFileIO_PL</code></pre>
  19 +<p>As of version 0.30, the code has been changed to be compatible with Python 3.x. As a consequence, compatibility with Python 2.5 or older is no longer provided. However, a copy of OleFileIO_PL v0.26 (with some backported enhancements) is available as olefile2.py. When importing the olefile package, it falls back automatically to olefile2 if running on Python 2.5 or older. This is implemented in olefile/__init__.py. (new in v0.40)</p>
  20 +<p>If you think olefile should stay compatible with Python 2.5 or older, please <a href="http://decalage.info/contact">contact me</a>.</p>
  21 +<h2 id="test-if-a-file-is-an-ole-container">Test if a file is an OLE container</h2>
  22 +<p>Use <strong>isOleFile</strong> to check if the first bytes of the file contain the Magic for OLE files, before opening it. isOleFile returns True if it is an OLE file, False otherwise (new in v0.16).</p>
  23 +<pre><code>assert olefile.isOleFile(&#39;myfile.doc&#39;)</code></pre>
  24 +<p>The argument of isOleFile can be (new in v0.41):</p>
  25 +<ul>
  26 +<li>the path of the file to open on disk (bytes or unicode string smaller than 1536 bytes),</li>
  27 +<li>or a bytes string containing the file in memory (bytes string longer than 1535 bytes),</li>
  28 +<li>or a file-like object (with read and seek methods).</li>
  29 +</ul>
  30 +<h2 id="open-an-ole-file-from-disk">Open an OLE file from disk</h2>
  31 +<p>Create an <strong>OleFileIO</strong> object with the file path as parameter:</p>
  32 +<pre><code>ole = olefile.OleFileIO(&#39;myfile.doc&#39;)</code></pre>
  33 +<h2 id="open-an-ole-file-from-a-bytes-string">Open an OLE file from a bytes string</h2>
  34 +<p>This is useful if the file is already stored in memory as a bytes string.</p>
  35 +<pre><code>ole = olefile.OleFileIO(s)</code></pre>
  36 +<p>Note: olefile checks the size of the string provided as argument to determine if it is a file path or the content of an OLE file. An OLE file cannot be smaller than 1536 bytes. If the string is larger than 1535 bytes, then it is expected to contain an OLE file, otherwise it is expected to be a file path.</p>
  37 +<p>(new in v0.41)</p>
  38 +<h2 id="open-an-ole-file-from-a-file-like-object">Open an OLE file from a file-like object</h2>
  39 +<p>This is useful if the file is not on disk but only available as a file-like object (with read, seek and tell methods).</p>
  40 +<pre><code>ole = olefile.OleFileIO(f)</code></pre>
  41 +<p>If the file-like object does not have seek or tell methods, the easiest solution is to read the file entirely in a bytes string before parsing:</p>
  42 +<pre><code>data = f.read()
  43 +ole = olefile.OleFileIO(data)</code></pre>
  44 +<h2 id="how-to-handle-malformed-ole-files">How to handle malformed OLE files</h2>
  45 +<p>By default, the parser is configured to be as robust and permissive as possible, so that most malformed OLE files can still be parsed. Only fatal errors will raise an exception. It is possible to tell the parser to be stricter, raising exceptions for files that do not fully conform to the OLE specifications, using the raise_defects option (new in v0.14):</p>
  46 +<pre><code>ole = olefile.OleFileIO(&#39;myfile.doc&#39;, raise_defects=olefile.DEFECT_INCORRECT)</code></pre>
  47 +<p>When the parsing is done, the non-fatal issues detected are available as a list in the parsing_issues attribute of the OleFileIO object (new in v0.25):</p>
  48 +<pre><code>print(&#39;Non-fatal issues raised during parsing:&#39;)
  49 +if ole.parsing_issues:
  50 +    for exctype, msg in ole.parsing_issues:
  51 +        print(&#39;- %s: %s&#39; % (exctype.__name__, msg))
  52 +else:
  53 +    print(&#39;None&#39;)</code></pre>
  54 +<h2 id="open-an-ole-file-in-write-mode">Open an OLE file in write mode</h2>
  55 +<p>Before using the write features, the OLE file must be opened in read/write mode:</p>
  56 +<pre><code>ole = olefile.OleFileIO(&#39;test.doc&#39;, write_mode=True)</code></pre>
  57 +<p>(new in v0.40)</p>
  58 +<p>The code for write features is new and it has not been thoroughly tested yet. See <a href="https://bitbucket.org/decalage/olefileio_pl/issue/6/improve-olefileio_pl-to-write-ole-files">issue #6</a> for the roadmap and the implementation status. If you encounter any issue, please send me your <a href="http://www.decalage.info/en/contact">feedback</a> or <a href="https://bitbucket.org/decalage/olefileio_pl/issues?status=new&amp;status=open">report issues</a>.</p>
  59 +<h2 id="syntax-for-stream-and-storage-path">Syntax for stream and storage path</h2>
  60 +<p>Two different syntaxes are allowed for methods that need or return the path of streams and storages:</p>
  61 +<ol style="list-style-type: decimal">
  62 +<li><p>Either a <strong>list of strings</strong> including all the storages from the root up to the stream/storage name. For example a stream called &quot;WordDocument&quot; at the root will have ['WordDocument'] as full path. A stream called &quot;ThisDocument&quot; located in the storage &quot;Macros/VBA&quot; will be ['Macros', 'VBA', 'ThisDocument']. This is the original syntax from PIL. While hard to read and not very convenient, this syntax works in all cases.</p></li>
  63 +<li><p>Or a <strong>single string with slashes</strong> to separate storage and stream names (similar to the Unix path syntax). The previous examples would be 'WordDocument' and 'Macros/VBA/ThisDocument'. This syntax is easier, but may fail if a stream or storage name contains a slash. (new in v0.15)</p></li>
  64 +</ol>
  65 +<p>Both are case-insensitive.</p>
  66 +<p>Switching between the two is easy:</p>
  67 +<pre><code>slash_path = &#39;/&#39;.join(list_path)
  68 +list_path = slash_path.split(&#39;/&#39;)</code></pre>
  69 +<h2 id="get-the-list-of-streams">Get the list of streams</h2>
  70 +<p>listdir() returns a list of all the streams contained in the OLE file, including those stored in storages. Each stream is listed itself as a list, as described above.</p>
  71 +<pre><code>print(ole.listdir())</code></pre>
  72 +<p>Sample result:</p>
  73 +<pre><code>[[&#39;\x01CompObj&#39;], [&#39;\x05DocumentSummaryInformation&#39;], [&#39;\x05SummaryInformation&#39;]
  74 +, [&#39;1Table&#39;], [&#39;Macros&#39;, &#39;PROJECT&#39;], [&#39;Macros&#39;, &#39;PROJECTwm&#39;], [&#39;Macros&#39;, &#39;VBA&#39;,
  75 +&#39;Module1&#39;], [&#39;Macros&#39;, &#39;VBA&#39;, &#39;ThisDocument&#39;], [&#39;Macros&#39;, &#39;VBA&#39;, &#39;_VBA_PROJECT&#39;]
  76 +, [&#39;Macros&#39;, &#39;VBA&#39;, &#39;dir&#39;], [&#39;ObjectPool&#39;], [&#39;WordDocument&#39;]]</code></pre>
  77 +<p>As an option it is possible to choose if storages should also be listed, with or without streams (new in v0.26):</p>
  78 +<pre><code>ole.listdir(streams=False, storages=True)</code></pre>
  79 +<h2 id="test-if-known-streamsstorages-exist">Test if known streams/storages exist:</h2>
  80 +<p>exists(path) checks if a given stream or storage exists in the OLE file (new in v0.16). The provided path is case-insensitive.</p>
  81 +<pre><code>if ole.exists(&#39;worddocument&#39;):
  82 +    print(&quot;This is a Word document.&quot;)
  83 +    if ole.exists(&#39;macros/vba&#39;):
  84 +        print(&quot;This document seems to contain VBA macros.&quot;)</code></pre>
  85 +<h2 id="read-data-from-a-stream">Read data from a stream</h2>
  86 +<p>openstream(path) opens a stream as a file-like object. The provided path is case-insensitive.</p>
  87 +<p>The following example extracts the &quot;Pictures&quot; stream from a PPT file:</p>
  88 +<pre><code>pics = ole.openstream(&#39;Pictures&#39;)
  89 +data = pics.read()</code></pre>
  90 +<h2 id="get-information-about-a-streamstorage">Get information about a stream/storage</h2>
  91 +<p>Several methods can provide the size, type and timestamps of a given stream/storage:</p>
  92 +<p>get_size(path) returns the size of a stream in bytes (new in v0.16):</p>
  93 +<pre><code>s = ole.get_size(&#39;WordDocument&#39;)</code></pre>
  94 +<p>get_type(path) returns the type of a stream/storage, as one of the following constants: STGTY_STREAM for a stream, STGTY_STORAGE for a storage, STGTY_ROOT for the root entry, and False for a non existing path (new in v0.15).</p>
  95 +<pre><code>t = ole.get_type(&#39;WordDocument&#39;)</code></pre>
  96 +<p>get_ctime(path) and get_mtime(path) return the creation and modification timestamps of a stream/storage, as a Python datetime object with UTC timezone. Please note that these timestamps are only present if the application that created the OLE file explicitly stored them, which is rarely the case. When not present, these methods return None (new in v0.26).</p>
  97 +<pre><code>c = ole.get_ctime(&#39;WordDocument&#39;)
  98 +m = ole.get_mtime(&#39;WordDocument&#39;)</code></pre>
  99 +<p>The root storage is a special case: You can get its creation and modification timestamps using the OleFileIO.root attribute (new in v0.26):</p>
  100 +<pre><code>c = ole.root.getctime()
  101 +m = ole.root.getmtime()</code></pre>
  102 +<p>Note: all these methods are case-insensitive.</p>
  103 +<h2 id="overwriting-a-sector">Overwriting a sector</h2>
  104 +<p>The write_sect method can overwrite any sector of the file. If the provided data is smaller than the sector size (normally 512 bytes, sometimes 4KB), data is padded with null characters. (new in v0.40)</p>
  105 +<p>Here is an example:</p>
  106 +<pre><code>ole.write_sect(0x17, b&#39;TEST&#39;)</code></pre>
  107 +<p>Note: following the <a href="http://msdn.microsoft.com/en-us/library/dd942138.aspx">MS-CFB specifications</a>, sector 0 is actually the second sector of the file. You may use -1 as index to write the first sector.</p>
  108 +<h2 id="overwriting-a-stream">Overwriting a stream</h2>
  109 +<p>The write_stream method can overwrite an existing stream in the file. The new stream data must be the exact same size as the existing one. For now, write_stream can only write streams of 4KB or larger (stored in the main FAT).</p>
  110 +<p>For example, you may change text in a MS Word document:</p>
  111 +<pre><code>ole = olefile.OleFileIO(&#39;test.doc&#39;, write_mode=True)
  112 +data = ole.openstream(&#39;WordDocument&#39;).read()
  113 +data = data.replace(b&#39;foo&#39;, b&#39;bar&#39;)
  114 +ole.write_stream(&#39;WordDocument&#39;, data)
  115 +ole.close()</code></pre>
  116 +<p>(new in v0.40)</p>
  117 +<h2 id="extract-metadata">Extract metadata</h2>
  118 +<p>get_metadata() will check if standard property streams exist, parse all the properties they contain, and return an OleMetadata object with the found properties as attributes (new in v0.24).</p>
  119 +<pre><code>meta = ole.get_metadata()
  120 +print(&#39;Author:&#39;, meta.author)
  121 +print(&#39;Title:&#39;, meta.title)
  122 +print(&#39;Creation date:&#39;, meta.create_time)
  123 +# print all metadata:
  124 +meta.dump()</code></pre>
  125 +<p>Available attributes include:</p>
  126 +<pre><code>codepage, title, subject, author, keywords, comments, template,
  127 +last_saved_by, revision_number, total_edit_time, last_printed, create_time,
  128 +last_saved_time, num_pages, num_words, num_chars, thumbnail,
  129 +creating_application, security, codepage_doc, category, presentation_target,
  130 +bytes, lines, paragraphs, slides, notes, hidden_slides, mm_clips,
  131 +scale_crop, heading_pairs, titles_of_parts, manager, company, links_dirty,
  132 +chars_with_spaces, unused, shared_doc, link_base, hlinks, hlinks_changed,
  133 +version, dig_sig, content_type, content_status, language, doc_version</code></pre>
  134 +<p>See the source code of the OleMetadata class for more information.</p>
  135 +<h2 id="parse-a-property-stream">Parse a property stream</h2>
  136 +<p>getproperties(path) can be used to parse any property stream that is not handled by get_metadata. It returns a dictionary indexed by integers. Each integer is the index of the property, pointing to its value. For example, in the standard property stream '\x05SummaryInformation', the document title is property #2, and the subject is #3.</p>
  137 +<pre><code>p = ole.getproperties(&#39;specialprops&#39;)</code></pre>
  138 +<p>By default, as in the original PIL version, timestamp properties are converted into a number of seconds since Jan 1, 1601. With the option convert_time, you can obtain more convenient Python datetime objects (UTC timezone). If some time properties should not be converted (such as total editing time in '\x05SummaryInformation'), the list of indexes can be passed as no_conversion (new in v0.25):</p>
  139 +<pre><code>p = ole.getproperties(&#39;specialprops&#39;, convert_time=True, no_conversion=[10])</code></pre>
  140 +<h2 id="close-the-ole-file">Close the OLE file</h2>
  141 +<p>Unless your application is a simple script that terminates after processing an OLE file, do not forget to close each OleFileIO object after parsing to close the file on disk. (new in v0.22)</p>
  142 +<pre><code>ole.close()</code></pre>
  143 +<h2 id="use-olefile-as-a-script-for-testingdebugging">Use olefile as a script for testing/debugging</h2>
  144 +<p>olefile can also be used as a script from the command-line to display the structure of an OLE file and its metadata, for example:</p>
  145 +<pre><code>olefile.py myfile.doc</code></pre>
  146 +<p>You can use the option -c to check that all streams can be read fully, and -d to generate very verbose debugging information.</p>
  147 +<hr />
  148 +<h2 id="olefile-documentation">olefile documentation</h2>
  149 +<ul>
  150 +<li><a href="Home.html">Home</a></li>
  151 +<li><a href="License.html">License</a></li>
  152 +<li><a href="Install.html">Install</a></li>
  153 +<li><a href="Contribute.html">Contribute</a>, Suggest Improvements or Report Issues</li>
  154 +<li><a href="OLE_Overview.html">OLE_Overview</a></li>
  155 +<li><a href="API.html">API</a> and Usage</li>
  156 +</ul>
  157 +</body>
  158 +</html>
... ...
oletools/thirdparty/olefile/doc/API.md 0 → 100644
  1 +How to use olefile - API
  2 +========================
  3 +
  4 +This page is part of the documentation for [olefile](https://bitbucket.org/decalage/olefileio_pl/wiki). It explains
  5 +how to use all its features to parse and write OLE files. For more information about OLE files, see [[OLE_Overview]].
  6 +
  7 +olefile can be used as an independent module or with PIL/Pillow. The main functions and methods are explained below.
  8 +
  9 +For more information, see also the file **olefile.html**, sample code at the end of the module itself, and docstrings within the code.
  10 +
  11 +
  12 +
  13 +Import olefile
  14 +--------------
  15 +
  16 +When the olefile package has been installed, it can be imported in Python applications with this statement:
  17 +
  18 + :::python
  19 + import olefile
  20 +
  21 +Before v0.40, olefile was named OleFileIO_PL. To maintain backward compatibility with older applications and samples, a
  22 +simple script is also installed so that the following statement imports olefile as OleFileIO_PL:
  23 +
  24 + :::python
  25 + import OleFileIO_PL
  26 +
  27 +As of version 0.30, the code has been changed to be compatible with Python 3.x. As a consequence, compatibility with
  28 +Python 2.5 or older is not provided anymore. However, a copy of OleFileIO_PL v0.26 (with some backported enhancements)
  29 +is available as olefile2.py. When importing the olefile package, it falls back automatically to olefile2 if running on
  30 +Python 2.5 or older. This is implemented in olefile/__init__.py. (new in v0.40)
  31 +
  32 +If you think olefile should stay compatible with Python 2.5 or older, please [contact me](http://decalage.info/contact).
  33 +
  34 +
  35 +## Test if a file is an OLE container
  36 +
  37 +Use **isOleFile** to check if the first bytes of the file contain the Magic for OLE files, before opening it. isOleFile
  38 +returns True if it is an OLE file, False otherwise (new in v0.16).
  39 +
  40 + :::python
  41 + assert olefile.isOleFile('myfile.doc')
  42 +
  43 +The argument of isOleFile can be (new in v0.41):
  44 +
  45 +- the path of the file to open on disk (bytes or unicode string smaller than 1536 bytes),
  46 +- or a bytes string containing the file in memory (bytes string longer than 1535 bytes),
  47 +- or a file-like object (with read and seek methods).
  48 +
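As a rough illustration of what such a check involves, the sketch below compares the first eight bytes of an in-memory bytes string with the OLE2 signature defined in the MS-CFB specification. The helper name `has_ole_magic` and the sample data are hypothetical and not part of olefile; in real code, `olefile.isOleFile` should be preferred.

```python
# Signature found at the very start of every OLE2 / Compound File.
OLE_MAGIC = b'\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1'


def has_ole_magic(data):
    """Hypothetical helper: True if `data` (bytes) starts with the OLE2 signature."""
    return data[:len(OLE_MAGIC)] == OLE_MAGIC


# 1536 bytes carrying only the signature: passes the magic test,
# although it is not a valid OLE file.
fake_ole = OLE_MAGIC + b'\x00' * 1528
# Starts like a ZIP archive instead.
not_ole = b'PK\x03\x04' + b'\x00' * 1532

print(has_ole_magic(fake_ole), has_ole_magic(not_ole))
```

A matching signature only says that the data looks like an OLE container; the file can still be malformed further on.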
  49 +## Open an OLE file from disk
  50 +
  51 +Create an **OleFileIO** object with the file path as parameter:
  52 +
  53 + :::python
  54 + ole = olefile.OleFileIO('myfile.doc')
  55 +
  56 +## Open an OLE file from a bytes string
  57 +
  58 +This is useful if the file is already stored in memory as a bytes string.
  59 +
  60 + :::python
  61 + ole = olefile.OleFileIO(s)
  62 +
  63 +Note: olefile checks the size of the string provided as argument to determine if it is a file path or the content of an
  64 +OLE file. An OLE file cannot be smaller than 1536 bytes. If the string is larger than 1535 bytes, then it is expected to
  65 +contain an OLE file, otherwise it is expected to be a file path.
  66 +
  67 +(new in v0.41)
  68 +
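The rule described in the note can be sketched as follows. This is only an illustration of the documented behaviour, not olefile's actual code: the function `classify_argument` and its return strings are made up for the example.

```python
import io

MINIMAL_OLEFILE_SIZE = 1536  # smallest possible OLE file, per the note above


def classify_argument(arg):
    """Hypothetical helper: guess how the argument would be interpreted."""
    if hasattr(arg, 'read'):
        return 'file-like object'
    if isinstance(arg, bytes) and len(arg) >= MINIMAL_OLEFILE_SIZE:
        return 'file content'
    return 'file path'


print(classify_argument('myfile.doc'))       # short string: treated as a path
print(classify_argument(b'\x00' * 2048))     # long bytes string: treated as content
print(classify_argument(io.BytesIO(b'')))    # object with a read method
```

One consequence of this design is that a truncated file held in memory (shorter than 1536 bytes) would be mistaken for a path, so it may be worth checking the length before passing such data.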
  69 +
  70 +## Open an OLE file from a file-like object
  71 +
  72 +This is useful if the file is not on disk but only available as a file-like object (with read, seek and tell methods).
  73 +
  74 + :::python
  75 + ole = olefile.OleFileIO(f)
  76 +
  77 +If the file-like object does not have seek or tell methods, the easiest solution is to read the file entirely in
  78 +a bytes string before parsing:
  79 +
  80 + :::python
  81 + data = f.read()
  82 + ole = olefile.OleFileIO(data)
  83 +
  84 +
  85 +## How to handle malformed OLE files
  86 +
  87 +By default, the parser is configured to be as robust and permissive as possible, so that most malformed OLE files can still be parsed. Only fatal errors will raise an exception. It is possible to tell the parser to be stricter, raising exceptions for files that do not fully conform to the OLE specifications, using the raise_defects option (new in v0.14):
  88 +
  89 + :::python
  90 + ole = olefile.OleFileIO('myfile.doc', raise_defects=olefile.DEFECT_INCORRECT)
  91 +
  92 +When the parsing is done, the non-fatal issues detected are available as a list in the parsing_issues attribute of the OleFileIO object (new in v0.25):
  93 +
  94 +    :::python
  95 +    print('Non-fatal issues raised during parsing:')
  96 +    if ole.parsing_issues:
  97 +        for exctype, msg in ole.parsing_issues:
  98 +            print('- %s: %s' % (exctype.__name__, msg))
  99 +    else:
  100 +        print('None')
  101 +
  102 +
  103 +## Open an OLE file in write mode
  104 +
  105 +Before using the write features, the OLE file must be opened in read/write mode:
  106 +
  107 + :::python
  108 + ole = olefile.OleFileIO('test.doc', write_mode=True)
  109 +
  110 +(new in v0.40)
  111 +
  112 +The code for write features is new and it has not been thoroughly tested yet. See [issue #6](https://bitbucket.org/decalage/olefileio_pl/issue/6/improve-olefileio_pl-to-write-ole-files) for the roadmap and the implementation status. If you encounter any issue, please send me your [feedback](http://www.decalage.info/en/contact) or [report issues](https://bitbucket.org/decalage/olefileio_pl/issues?status=new&status=open).
  113 +
  114 +
  115 +## Syntax for stream and storage path
  116 +
  117 +Two different syntaxes are allowed for methods that need or return the path of streams and storages:
  118 +
  119 +1) Either a **list of strings** including all the storages from the root up to the stream/storage name. For example a stream called "WordDocument" at the root will have ['WordDocument'] as full path. A stream called "ThisDocument" located in the storage "Macros/VBA" will be ['Macros', 'VBA', 'ThisDocument']. This is the original syntax from PIL. While hard to read and not very convenient, this syntax works in all cases.
  120 +
  121 +2) Or a **single string with slashes** to separate storage and stream names (similar to the Unix path syntax). The previous examples would be 'WordDocument' and 'Macros/VBA/ThisDocument'. This syntax is easier, but may fail if a stream or storage name contains a slash. (new in v0.15)
  122 +
  123 +Both are case-insensitive.
  124 +
  125 +Switching between the two is easy:
  126 +
  127 + :::python
  128 + slash_path = '/'.join(list_path)
  129 + list_path = slash_path.split('/')
  130 +
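The sketch below wraps the two conversions in small helpers and shows the caveat mentioned for the slash syntax. The helper names `to_list_path` and `to_slash_path`, and the storage name `Data`, are hypothetical and only serve the illustration.

```python
def to_list_path(path):
    """Hypothetical helper: accept either syntax, return the list form."""
    if isinstance(path, str):
        return path.split('/')
    return list(path)


def to_slash_path(path):
    """Hypothetical helper: accept either syntax, return the slash form."""
    if isinstance(path, str):
        return path
    return '/'.join(path)


print(to_list_path('Macros/VBA/ThisDocument'))
print(to_slash_path(['Macros', 'VBA', 'ThisDocument']))

# Caveat: a name that itself contains a slash does not survive the round trip.
tricky = ['Data', 'a/b']
print(to_list_path(to_slash_path(tricky)))
```

When stream names come from untrusted files, keeping paths in list form avoids this ambiguity.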
  131 +
  132 +## Get the list of streams
  133 +
  134 +listdir() returns a list of all the streams contained in the OLE file, including those stored in storages. Each stream is listed itself as a list, as described above.
  135 +
  136 + :::python
  137 + print(ole.listdir())
  138 +
  139 +Sample result:
  140 +
  141 + :::python
  142 + [['\x01CompObj'], ['\x05DocumentSummaryInformation'], ['\x05SummaryInformation']
  143 + , ['1Table'], ['Macros', 'PROJECT'], ['Macros', 'PROJECTwm'], ['Macros', 'VBA',
  144 + 'Module1'], ['Macros', 'VBA', 'ThisDocument'], ['Macros', 'VBA', '_VBA_PROJECT']
  145 + , ['Macros', 'VBA', 'dir'], ['ObjectPool'], ['WordDocument']]
  146 +
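Because each entry is a plain list of strings, the result can be filtered with ordinary list operations. The sketch below reuses the sample result above as literal data (no OLE file is opened) and keeps only the streams stored under Macros/VBA; the variable names are chosen for the example.

```python
# Sample listdir() result copied from above, used here as plain data.
entries = [
    ['\x01CompObj'], ['\x05DocumentSummaryInformation'],
    ['\x05SummaryInformation'], ['1Table'],
    ['Macros', 'PROJECT'], ['Macros', 'PROJECTwm'],
    ['Macros', 'VBA', 'Module1'], ['Macros', 'VBA', 'ThisDocument'],
    ['Macros', 'VBA', '_VBA_PROJECT'], ['Macros', 'VBA', 'dir'],
    ['ObjectPool'], ['WordDocument'],
]

# Compare in lower case, since stream and storage names are case-insensitive.
vba_streams = [
    '/'.join(entry) for entry in entries
    if [part.lower() for part in entry[:2]] == ['macros', 'vba']
]
print(vba_streams)
```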
  147 +As an option it is possible to choose if storages should also be listed, with or without streams (new in v0.26):
  148 +
  149 + :::python
  150 + ole.listdir(streams=False, storages=True)
  151 +
  152 +
  153 +## Test if known streams/storages exist:
  154 +
  155 +exists(path) checks if a given stream or storage exists in the OLE file (new in v0.16). The provided path is case-insensitive.
  156 +
  157 +    :::python
  158 +    if ole.exists('worddocument'):
  159 +        print("This is a Word document.")
  160 +        if ole.exists('macros/vba'):
  161 +            print("This document seems to contain VBA macros.")
  162 +
  163 +
  164 +## Read data from a stream
  165 +
  166 +openstream(path) opens a stream as a file-like object. The provided path is case-insensitive.
  167 +
  168 +The following example extracts the "Pictures" stream from a PPT file:
  169 +
  170 + :::python
  171 + pics = ole.openstream('Pictures')
  172 + data = pics.read()
  173 +
  174 +
  175 +## Get information about a stream/storage
  176 +
  177 +Several methods can provide the size, type and timestamps of a given stream/storage:
  178 +
  179 +get_size(path) returns the size of a stream in bytes (new in v0.16):
  180 +
  181 + :::python
  182 + s = ole.get_size('WordDocument')
  183 +
  184 +get_type(path) returns the type of a stream/storage, as one of the following constants: STGTY\_STREAM for a stream, STGTY\_STORAGE for a storage, STGTY\_ROOT for the root entry, and False for a non existing path (new in v0.15).
  185 +
  186 + :::python
  187 + t = ole.get_type('WordDocument')
  188 +
  189 +get\_ctime(path) and get\_mtime(path) return the creation and modification timestamps of a stream/storage, as a Python datetime object with UTC timezone. Please note that these timestamps are only present if the application that created the OLE file explicitly stored them, which is rarely the case. When not present, these methods return None (new in v0.26).
  190 +
  191 + :::python
  192 + c = ole.get_ctime('WordDocument')
  193 + m = ole.get_mtime('WordDocument')
  194 +
  195 +The root storage is a special case: You can get its creation and modification timestamps using the OleFileIO.root attribute (new in v0.26):
  196 +
  197 + :::python
  198 + c = ole.root.getctime()
  199 + m = ole.root.getmtime()
  200 +
  201 +Note: all these methods are case-insensitive.
  202 +
  203 +## Overwriting a sector
  204 +
  205 +The write_sect method can overwrite any sector of the file. If the provided data is smaller than the sector size (normally 512 bytes, sometimes 4KB), data is padded with null characters. (new in v0.40)
  206 +
  207 +Here is an example:
  208 +
  209 + :::python
  210 + ole.write_sect(0x17, b'TEST')
  211 +
  212 +Note: following the [MS-CFB specifications](http://msdn.microsoft.com/en-us/library/dd942138.aspx), sector 0 is actually the second sector of the file. You may use -1 as index to write the first sector.
  213 +
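To make the sector numbering concrete, the sketch below computes where a sector would start in the file and how short data would be padded. It assumes that the header occupies exactly one sector-sized block at the start of the file; the helper names `sector_offset` and `pad_sector` are hypothetical and not part of olefile.

```python
def sector_offset(index, sector_size=512):
    """Hypothetical helper: byte offset of a sector, where -1 is the header block."""
    return (index + 1) * sector_size


def pad_sector(data, sector_size=512):
    """Hypothetical helper: pad data with null bytes up to one full sector."""
    if len(data) > sector_size:
        raise ValueError('data is larger than one sector')
    return data.ljust(sector_size, b'\x00')


print(sector_offset(-1))     # first sector of the file (the header)
print(sector_offset(0))      # sector 0 starts right after the header
print(sector_offset(0x17))   # sector used in the example above
print(len(pad_sector(b'TEST')))
```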
  214 +
  215 +## Overwriting a stream
  216 +
  217 +The write_stream method can overwrite an existing stream in the file. The new stream data must be the exact same size as the existing one. For now, write_stream can only write streams of 4KB or larger (stored in the main FAT).
  218 +
  219 +For example, you may change text in a MS Word document:
  220 +
  221 + :::python
  222 + ole = olefile.OleFileIO('test.doc', write_mode=True)
  223 + data = ole.openstream('WordDocument').read()
  224 + data = data.replace(b'foo', b'bar')
  225 + ole.write_stream('WordDocument', data)
  226 + ole.close()
  227 +
  228 +(new in v0.40)
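Because the new stream data must be exactly the same size as the existing data, it can help to check the replacement before writing. The following is a minimal sketch using only bytes operations; `replace_same_size` is a hypothetical helper, not part of olefile.

```python
def replace_same_size(data, old, new):
    """Replace `old` with `new` in `data`, refusing size-changing edits.

    Hypothetical helper: it only guarantees that the result has the
    same length as the input, which is what a same-size overwrite needs.
    """
    if len(old) != len(new):
        raise ValueError("old and new must have the same length")
    return data.replace(old, new)


original = b"some foo text"
patched = replace_same_size(original, b"foo", b"bar")
print(len(original) == len(patched))
```

The result of such a helper could then be passed as the data argument in the write_stream example above.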
  229 +
  230 +
  231 +
  232 +## Extract metadata
  233 +
  234 +get_metadata() will check if standard property streams exist, parse all the properties they contain, and return an OleMetadata object with the found properties as attributes (new in v0.24).
  235 +
  236 + :::python
  237 + meta = ole.get_metadata()
  238 + print('Author:', meta.author)
  239 + print('Title:', meta.title)
  240 + print('Creation date:', meta.create_time)
  241 + # print all metadata:
  242 + meta.dump()
  243 +
  244 +Available attributes include:
  245 +
  246 + :::text
  247 + codepage, title, subject, author, keywords, comments, template,
  248 + last_saved_by, revision_number, total_edit_time, last_printed, create_time,
  249 + last_saved_time, num_pages, num_words, num_chars, thumbnail,
  250 + creating_application, security, codepage_doc, category, presentation_target,
  251 + bytes, lines, paragraphs, slides, notes, hidden_slides, mm_clips,
  252 + scale_crop, heading_pairs, titles_of_parts, manager, company, links_dirty,
  253 + chars_with_spaces, unused, shared_doc, link_base, hlinks, hlinks_changed,
  254 + version, dig_sig, content_type, content_status, language, doc_version
  255 +
  256 +See the source code of the OleMetadata class for more information.
  257 +
  258 +
  259 +## Parse a property stream
  260 +
  261 +getproperties(path) can be used to parse any property stream that is not handled by get\_metadata. It returns a dictionary indexed by integers. Each integer is the index of the property, pointing to its value. For example, in the standard property stream '\x05SummaryInformation', the document title is property #2, and the subject is #3.
  262 +
  263 + :::python
  264 + p = ole.getproperties('specialprops')
  265 +
  266 +By default, as in the original PIL version, timestamp properties are converted into a number of seconds since Jan 1, 1601. With the option convert\_time, you can obtain more convenient Python datetime objects (UTC timezone). If some time properties should not be converted (such as total editing time in '\x05SummaryInformation'), the list of indexes can be passed as no_conversion (new in v0.25):
  267 +
  268 + :::python
  269 + p = ole.getproperties('specialprops', convert_time=True, no_conversion=[10])
  270 +
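For the default (unconverted) case, a value expressed as seconds since Jan 1, 1601 can be turned into a datetime by hand. This is a minimal sketch under that assumption; `seconds_since_1601_to_datetime` is a hypothetical name, and the result is a naive datetime meant to be read as UTC.

```python
import datetime

# Epoch used by OLE timestamps (assumed to be UTC)
OLE_EPOCH = datetime.datetime(1601, 1, 1)


def seconds_since_1601_to_datetime(seconds):
    """Convert a count of seconds since 1601-01-01 into a datetime."""
    return OLE_EPOCH + datetime.timedelta(seconds=seconds)


# 11644473600 seconds separate 1601-01-01 from 1970-01-01
print(seconds_since_1601_to_datetime(11644473600))
```

A duration such as total editing time is not a date, which is why it should be left unconverted.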
  271 +
  272 +## Close the OLE file
  273 +
  274 +Unless your application is a simple script that terminates after processing an OLE file, do not forget to close each OleFileIO object after parsing to close the file on disk. (new in v0.22)
  275 +
  276 + :::python
  277 + ole.close()
  278 +
  279 +## Use olefile as a script for testing/debugging
  280 +
  281 +olefile can also be used as a script from the command-line to display the structure of an OLE file and its metadata, for example:
  282 +
  283 + :::text
  284 + olefile.py myfile.doc
  285 +
  286 +You can use the option -c to check that all streams can be read fully, and -d to generate very verbose debugging information.
  287 +
  288 +--------------------------------------------------------------------------
  289 +
  290 +olefile documentation
  291 +---------------------
  292 +
  293 +- [[Home]]
  294 +- [[License]]
  295 +- [[Install]]
  296 +- [[Contribute]], Suggest Improvements or Report Issues
  297 +- [[OLE_Overview]]
  298 +- [[API]] and Usage
... ...
oletools/thirdparty/olefile/doc/Contribute.html 0 → 100644
  1 +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  2 +<html xmlns="http://www.w3.org/1999/xhtml">
  3 +<head>
  4 + <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  5 + <meta http-equiv="Content-Style-Type" content="text/css" />
  6 + <meta name="generator" content="pandoc" />
  7 + <title></title>
  8 +</head>
  9 +<body>
  10 +<h1 id="how-to-suggest-improvements-report-issues-or-contribute">How to Suggest Improvements, Report Issues or Contribute</h1>
  11 +<p>This is a personal open-source project, developed in my spare time. Any contribution, suggestion, feedback or bug report is welcome.</p>
  12 +<p>To <strong>suggest improvements, report a bug or any issue</strong>, please use the <a href="https://bitbucket.org/decalage/olefileio_pl/issues?status=new&amp;status=open">issue reporting page</a>, providing all the information and files to reproduce the problem.</p>
  13 +<p>If possible, please attach the debugging output of olefile. To generate it, run the following command:</p>
  14 +<pre><code> olefile.py -d -c file &gt;debug.txt </code></pre>
  15 +<p>You may also <a href="http://decalage.info/contact">contact the author</a> directly to <strong>provide feedback</strong>.</p>
  16 +<p>The code is available in <a href="https://bitbucket.org/decalage/olefileio_pl">a Mercurial repository on Bitbucket</a>. You may use it to <strong>submit enhancements</strong> using forks and pull requests.</p>
  17 +<hr />
  18 +<h2 id="olefile-documentation">olefile documentation</h2>
  19 +<ul>
  20 +<li><a href="Home.html">Home</a></li>
  21 +<li><a href="License.html">License</a></li>
  22 +<li><a href="Install.html">Install</a></li>
  23 +<li><a href="Contribute.html">Contribute</a>, Suggest Improvements or Report Issues</li>
  24 +<li><a href="OLE_Overview.html">OLE_Overview</a></li>
  25 +<li><a href="API.html">API</a> and Usage</li>
  26 +</ul>
  27 +</body>
  28 +</html>
... ...
oletools/thirdparty/olefile/doc/Contribute.md 0 → 100644
  1 +How to Suggest Improvements, Report Issues or Contribute
  2 +========================================================
  3 +
  4 +This is a personal open-source project, developed in my spare time. Any contribution, suggestion, feedback or bug report is welcome.
  5 +
  6 +To **suggest improvements, report a bug or any issue**, please use the [issue reporting page](https://bitbucket.org/decalage/olefileio_pl/issues?status=new&status=open), providing all the information and files to reproduce the problem.
  7 +
  8 +If possible, please attach the debugging output of olefile. To generate it, run the following command:
  9 +
  10 + :::text
  11 + olefile.py -d -c file >debug.txt
  12 +
  13 +
  14 +You may also [contact the author](http://decalage.info/contact) directly to **provide feedback**.
  15 +
  16 +The code is available in [a Mercurial repository on Bitbucket](https://bitbucket.org/decalage/olefileio_pl). You may use it to **submit enhancements** using forks and pull requests.
  17 +
  18 +--------------------------------------------------------------------------
  19 +
  20 +olefile documentation
  21 +---------------------
  22 +
  23 +- [[Home]]
  24 +- [[License]]
  25 +- [[Install]]
  26 +- [[Contribute]], Suggest Improvements or Report Issues
  27 +- [[OLE_Overview]]
  28 +- [[API]] and Usage
... ...
oletools/thirdparty/olefile/doc/Home.html 0 → 100644
  1 +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  2 +<html xmlns="http://www.w3.org/1999/xhtml">
  3 +<head>
  4 + <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  5 + <meta http-equiv="Content-Style-Type" content="text/css" />
  6 + <meta name="generator" content="pandoc" />
  7 + <title></title>
  8 +</head>
  9 +<body>
  10 +<h1 id="olefile-v0.41-documentation">olefile v0.41 documentation</h1>
  11 +<p>This is the home page of the documentation for olefile. The latest version can be found <a href="https://bitbucket.org/decalage/olefileio_pl/wiki">online</a>, otherwise a copy is provided in the doc subfolder of the package.</p>
  12 +<p><a href="http://www.decalage.info/olefile">olefile</a> is a Python package to parse, read and write <a href="http://en.wikipedia.org/wiki/Compound_File_Binary_Format">Microsoft OLE2 files (also called Structured Storage, Compound File Binary Format or Compound Document File Format)</a>, such as Microsoft Office 97-2003 documents, Image Composer and FlashPix files, Outlook messages, StickyNotes, several Microscopy file formats, McAfee antivirus quarantine files, etc.</p>
  13 +<p><strong>Quick links:</strong> <a href="http://www.decalage.info/olefile">Home page</a> - <a href="https://bitbucket.org/decalage/olefileio_pl/wiki/Install">Download/Install</a> - <a href="https://bitbucket.org/decalage/olefileio_pl/wiki">Documentation</a> - <a href="https://bitbucket.org/decalage/olefileio_pl/issues?status=new&amp;status=open">Report Issues/Suggestions/Questions</a> - <a href="http://decalage.info/contact">Contact the author</a> - <a href="https://bitbucket.org/decalage/olefileio_pl">Repository</a> - <a href="https://twitter.com/decalage2">Updates on Twitter</a></p>
  14 +<h2 id="history">History</h2>
  15 +<p>olefile is based on the OleFileIO module from <a href="http://www.pythonware.com/products/pil/index.htm">PIL</a>, the excellent Python Imaging Library, created and maintained by Fredrik Lundh. The olefile API is still compatible with PIL, but since 2005 I have improved the internal implementation significantly, with new features, bugfixes and a more robust design. From 2005 to 2014 the project was called OleFileIO_PL, and in 2014 I changed its name to olefile to celebrate its 9 years and its new write features.</p>
  16 +<p>As far as I know, this module is the most complete and robust Python implementation to read MS OLE2 files, portable on several operating systems. (please tell me if you know other similar Python modules)</p>
  17 +<p>Since 2014 olefile/OleFileIO_PL has been integrated into <a href="http://python-imaging.github.io/">Pillow</a>, the friendly fork of PIL. olefile will continue to be improved as a separate project, and new versions will be merged into Pillow regularly.</p>
  18 +<p>olefile can be used as an independent module or with PIL/Pillow.</p>
  19 +<p>olefile is mostly meant for developers. If you are looking for tools to analyze OLE files or to extract data (especially for security purposes such as malware analysis and forensics), then please also check my <a href="http://www.decalage.info/python/oletools">python-oletools</a>, which are built upon olefile and provide a higher-level interface.</p>
  20 +<h2 id="features">Features</h2>
  21 +<ul>
  22 +<li>Parse, read and write any OLE file such as Microsoft Office 97-2003 legacy document formats (Word .doc, Excel .xls, PowerPoint .ppt, Visio .vsd, Project .mpp), Image Composer and FlashPix files, Outlook messages, StickyNotes, Zeiss AxioVision ZVI files, Olympus FluoView OIB files, etc</li>
  23 +<li>List all the streams and storages contained in an OLE file</li>
  24 +<li>Open streams as files</li>
  25 +<li>Parse and read property streams, containing metadata of the file</li>
  26 +<li>Portable, pure Python module, no dependency</li>
  27 +</ul>
  28 +<h2 id="main-improvements-over-the-original-version-of-olefileio-in-pil">Main improvements over the original version of OleFileIO in PIL:</h2>
  29 +<ul>
  30 +<li>Compatible with Python 3.x and 2.6+</li>
  31 +<li>Many bug fixes</li>
  32 +<li>Support for files larger than 6.8MB</li>
  33 +<li>Support for 64-bit platforms and big-endian CPUs</li>
  34 +<li>Robust: many checks to detect malformed files</li>
  35 +<li>Runtime option to choose if malformed files should be parsed or raise exceptions</li>
  36 +<li>Improved API</li>
  37 +<li>Metadata extraction, stream/storage timestamps (e.g. for document forensics)</li>
  38 +<li>Can open file-like objects</li>
  39 +<li>Added setup.py and install.bat to ease installation</li>
  40 +<li>More convenient slash-based syntax for stream paths</li>
  41 +<li>Write features</li>
  42 +</ul>
  43 +<hr />
  44 +<h2 id="olefile-documentation">olefile documentation</h2>
  45 +<ul>
  46 +<li><a href="Home.html">Home</a></li>
  47 +<li><a href="License.html">License</a></li>
  48 +<li><a href="Install.html">Install</a></li>
  49 +<li><a href="Contribute.html">Contribute</a>, Suggest Improvements or Report Issues</li>
  50 +<li><a href="OLE_Overview.html">OLE_Overview</a></li>
  51 +<li><a href="API.html">API</a> and Usage</li>
  52 +</ul>
  53 +</body>
  54 +</html>
... ...
oletools/thirdparty/olefile/doc/Home.md 0 → 100644
  1 +olefile v0.41 documentation
  2 +===========================
  3 +
  4 +This is the home page of the documentation for olefile. The latest version can be found [online](https://bitbucket.org/decalage/olefileio_pl/wiki), otherwise a copy is provided in the doc subfolder of the package.
  5 +
  6 +[olefile](http://www.decalage.info/olefile) is a Python package to parse, read and write [Microsoft OLE2 files (also called Structured Storage, Compound File Binary Format or Compound Document File Format)](http://en.wikipedia.org/wiki/Compound_File_Binary_Format), such as Microsoft Office 97-2003 documents, Image Composer and FlashPix files, Outlook messages, StickyNotes, several Microscopy file formats, McAfee antivirus quarantine files, etc.
  7 +
  8 +
  9 +**Quick links:** [Home page](http://www.decalage.info/olefile) - [Download/Install](https://bitbucket.org/decalage/olefileio_pl/wiki/Install) - [Documentation](https://bitbucket.org/decalage/olefileio_pl/wiki) - [Report Issues/Suggestions/Questions](https://bitbucket.org/decalage/olefileio_pl/issues?status=new&status=open) - [Contact the author](http://decalage.info/contact) - [Repository](https://bitbucket.org/decalage/olefileio_pl) - [Updates on Twitter](https://twitter.com/decalage2)
  10 +
  11 +History
  12 +-------
  13 +
  14 +olefile is based on the OleFileIO module from [PIL](http://www.pythonware.com/products/pil/index.htm), the excellent Python Imaging Library, created and maintained by Fredrik Lundh. The olefile API is still compatible with PIL, but since 2005 I have improved the internal implementation significantly, with new features, bugfixes and a more robust design. From 2005 to 2014 the project was called OleFileIO_PL, and in 2014 I changed its name to olefile to celebrate its 9 years and its new write features.
  15 +
  16 +As far as I know, this module is the most complete and robust Python implementation to read MS OLE2 files, portable on several operating systems. (please tell me if you know other similar Python modules)
  17 +
  18 +Since 2014 olefile/OleFileIO_PL has been integrated into [Pillow](http://python-imaging.github.io/), the friendly fork of PIL. olefile will continue to be improved as a separate project, and new versions will be merged into Pillow regularly.
  19 +
  20 +olefile can be used as an independent module or with PIL/Pillow.
  21 +
  22 +olefile is mostly meant for developers. If you are looking for tools to analyze OLE files or to extract data (especially for security purposes such as malware analysis and forensics), then please also check my [python-oletools](http://www.decalage.info/python/oletools), which are built upon olefile and provide a higher-level interface.
  23 +
  24 +Features
  25 +--------
  26 +
  27 +- Parse, read and write any OLE file such as Microsoft Office 97-2003 legacy document formats (Word .doc, Excel .xls, PowerPoint .ppt, Visio .vsd, Project .mpp), Image Composer and FlashPix files, Outlook messages, StickyNotes, Zeiss AxioVision ZVI files, Olympus FluoView OIB files, etc
  28 +- List all the streams and storages contained in an OLE file
  29 +- Open streams as files
  30 +- Parse and read property streams, containing metadata of the file
  31 +- Portable, pure Python module, no dependency
  32 +
  33 +
  34 +Main improvements over the original version of OleFileIO in PIL:
  35 +----------------------------------------------------------------
  36 +
  37 +- Compatible with Python 3.x and 2.6+
  38 +- Many bug fixes
  39 +- Support for files larger than 6.8MB
  40 +- Support for 64-bit platforms and big-endian CPUs
  41 +- Robust: many checks to detect malformed files
  42 +- Runtime option to choose if malformed files should be parsed or raise exceptions
  43 +- Improved API
  44 +- Metadata extraction, stream/storage timestamps (e.g. for document forensics)
  45 +- Can open file-like objects
  46 +- Added setup.py and install.bat to ease installation
  47 +- More convenient slash-based syntax for stream paths
  48 +- Write features
  49 +
  50 +
  51 +--------------------------------------------------------------------------
  52 +
  53 +olefile documentation
  54 +---------------------
  55 +
  56 +- [[Home]]
  57 +- [[License]]
  58 +- [[Install]]
  59 +- [[Contribute]], Suggest Improvements or Report Issues
  60 +- [[OLE_Overview]]
  61 +- [[API]] and Usage
... ...
oletools/thirdparty/olefile/doc/Install.html 0 → 100644
  1 +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  2 +<html xmlns="http://www.w3.org/1999/xhtml">
  3 +<head>
  4 + <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  5 + <meta http-equiv="Content-Style-Type" content="text/css" />
  6 + <meta name="generator" content="pandoc" />
  7 + <title></title>
  8 +</head>
  9 +<body>
  10 +<h1 id="how-to-download-and-install-olefile">How to Download and Install olefile</h1>
  11 +<h2 id="pre-requisites">Pre-requisites</h2>
  12 +<p>olefile requires Python 2.6, 2.7 or 3.x.</p>
  13 +<p>For Python 2.5 and older, olefile falls back to an older version (based on OleFileIO_PL 0.26) which might not contain all the enhancements implemented in olefile.</p>
  14 +<h2 id="download-and-install">Download and Install</h2>
  15 +<p>To use olefile with other Python applications or your own scripts, the simplest solution is to run &quot;<strong>pip install olefile</strong>&quot; or &quot;<strong>easy_install olefile</strong>&quot; to download and install the package in one go.</p>
  16 +<p>Otherwise you may download/extract the <a href="https://bitbucket.org/decalage/olefileio_pl/downloads">zip archive</a> in a temporary directory and run &quot;<strong>python setup.py install</strong>&quot;.</p>
  17 +<p>On Windows you may simply double-click on <strong>install.bat</strong>.</p>
  18 +<hr />
  19 +<h2 id="olefile-documentation">olefile documentation</h2>
  20 +<ul>
  21 +<li><a href="Home.html">Home</a></li>
  22 +<li><a href="License.html">License</a></li>
  23 +<li><a href="Install.html">Install</a></li>
  24 +<li><a href="Contribute.html">Contribute</a>, Suggest Improvements or Report Issues</li>
  25 +<li><a href="OLE_Overview.html">OLE_Overview</a></li>
  26 +<li><a href="API.html">API</a> and Usage</li>
  27 +</ul>
  28 +</body>
  29 +</html>
... ...
oletools/thirdparty/olefile/doc/Install.md 0 → 100644
  1 +How to Download and Install olefile
  2 +===================================
  3 +
  4 +Pre-requisites
  5 +--------------
  6 +
  7 +olefile requires Python 2.6, 2.7 or 3.x.
  8 +
  9 +For Python 2.5 and older, olefile falls back to an older version (based on OleFileIO_PL 0.26) which might not contain all the enhancements implemented in olefile.
  10 +
  11 +
  12 +Download and Install
  13 +--------------------
  14 +
  15 +To use olefile with other Python applications or your own scripts, the simplest solution is to run "**pip install olefile**" or "**easy_install olefile**" to download and install the package in one go.
  16 +
  17 +Otherwise you may download/extract the [zip archive](https://bitbucket.org/decalage/olefileio_pl/downloads) in a temporary directory and run "**python setup.py install**".
  18 +
  19 +On Windows you may simply double-click on **install.bat**.
  20 +
  21 +--------------------------------------------------------------------------
  22 +
  23 +olefile documentation
  24 +---------------------
  25 +
  26 +- [[Home]]
  27 +- [[License]]
  28 +- [[Install]]
  29 +- [[Contribute]], Suggest Improvements or Report Issues
  30 +- [[OLE_Overview]]
  31 +- [[API]] and Usage
... ...
oletools/thirdparty/olefile/doc/License.html 0 → 100644
  1 +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  2 +<html xmlns="http://www.w3.org/1999/xhtml">
  3 +<head>
  4 + <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  5 + <meta http-equiv="Content-Style-Type" content="text/css" />
  6 + <meta name="generator" content="pandoc" />
  7 + <title></title>
  8 +</head>
  9 +<body>
  10 +<h1 id="license-for-olefile">License for olefile</h1>
  11 +<p>olefile (formerly OleFileIO_PL) is copyright (c) 2005-2014 Philippe Lagadec (<a href="http://www.decalage.info">http://www.decalage.info</a>)</p>
  12 +<p>All rights reserved.</p>
  13 +<p>Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:</p>
  14 +<ul>
  15 +<li>Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.</li>
  16 +<li>Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.</li>
  17 +</ul>
  18 +<p>THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS &quot;AS IS&quot; AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.</p>
  19 +<hr />
  20 +<p>olefile is based on source code from the OleFileIO module of the Python Imaging Library (PIL) published by Fredrik Lundh under the following license:</p>
  21 +<p>The Python Imaging Library (PIL) is</p>
  22 +<ul>
  23 +<li>Copyright (c) 1997-2005 by Secret Labs AB</li>
  24 +<li>Copyright (c) 1995-2005 by Fredrik Lundh</li>
  25 +</ul>
  26 +<p>By obtaining, using, and/or copying this software and/or its associated documentation, you agree that you have read, understood, and will comply with the following terms and conditions:</p>
  27 +<p>Permission to use, copy, modify, and distribute this software and its associated documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appears in all copies, and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Secret Labs AB or the author not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission.</p>
  28 +<p>SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.</p>
  29 +<hr />
  30 +<h2 id="olefile-documentation">olefile documentation</h2>
  31 +<ul>
  32 +<li><a href="Home.html">Home</a></li>
  33 +<li><a href="License.html">License</a></li>
  34 +<li><a href="Install.html">Install</a></li>
  35 +<li><a href="Contribute.html">Contribute</a>, Suggest Improvements or Report Issues</li>
  36 +<li><a href="OLE_Overview.html">OLE_Overview</a></li>
  37 +<li><a href="API.html">API</a> and Usage</li>
  38 +</ul>
  39 +</body>
  40 +</html>
... ...
oletools/thirdparty/olefile/doc/License.md 0 → 100644
  1 +License for olefile
  2 +===================
  3 +
  4 +olefile (formerly OleFileIO_PL) is copyright (c) 2005-2014 Philippe Lagadec ([http://www.decalage.info](http://www.decalage.info))
  5 +
  6 +All rights reserved.
  7 +
  8 +Redistribution and use in source and binary forms, with or without modification,
  9 +are permitted provided that the following conditions are met:
  10 +
  11 + * Redistributions of source code must retain the above copyright notice, this
  12 + list of conditions and the following disclaimer.
  13 + * Redistributions in binary form must reproduce the above copyright notice,
  14 + this list of conditions and the following disclaimer in the documentation
  15 + and/or other materials provided with the distribution.
  16 +
  17 +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
  18 +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  19 +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
  20 +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
  21 +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  22 +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  23 +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  24 +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
  25 +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  26 +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  27 +
  28 +
  29 +----------
  30 +
  31 +olefile is based on source code from the OleFileIO module of the Python Imaging Library (PIL) published by Fredrik Lundh under the following license:
  32 +
  33 +The Python Imaging Library (PIL) is
  34 +
  35 +- Copyright (c) 1997-2005 by Secret Labs AB
  36 +- Copyright (c) 1995-2005 by Fredrik Lundh
  37 +
  38 +By obtaining, using, and/or copying this software and/or its associated documentation, you agree that you have read, understood, and will comply with the following terms and conditions:
  39 +
  40 +Permission to use, copy, modify, and distribute this software and its associated documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appears in all copies, and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Secret Labs AB or the author not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission.
  41 +
  42 +SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  43 +
  44 +--------------------------------------------------------------------------
  45 +
  46 +olefile documentation
  47 +---------------------
  48 +
  49 +- [[Home]]
  50 +- [[License]]
  51 +- [[Install]]
  52 +- [[Contribute]], Suggest Improvements or Report Issues
  53 +- [[OLE_Overview]]
  54 +- [[API]] and Usage
... ...
oletools/thirdparty/olefile/doc/OLE_Overview.html 0 → 100644
  1 +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
  2 +<html xmlns="http://www.w3.org/1999/xhtml">
  3 +<head>
  4 + <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
  5 + <meta http-equiv="Content-Style-Type" content="text/css" />
  6 + <meta name="generator" content="pandoc" />
  7 + <title></title>
  8 +</head>
  9 +<body>
  10 +<h1 id="about-the-structure-of-ole-files">About the structure of OLE files</h1>
  11 +<p>This page is part of the documentation for <a href="https://bitbucket.org/decalage/olefileio_pl/wiki">olefile</a>. It provides a brief overview of the structure of <a href="http://en.wikipedia.org/wiki/Compound_File_Binary_Format">Microsoft OLE2 files (also called Structured Storage, Compound File Binary Format or Compound Document File Format)</a>, such as Microsoft Office 97-2003 documents, Image Composer and FlashPix files, Outlook messages, StickyNotes, several Microscopy file formats, McAfee antivirus quarantine files, etc.</p>
  12 +<p>An OLE file can be seen as a mini file system or a Zip archive: It contains <strong>streams</strong> of data that look like files embedded within the OLE file. Each stream has a name. For example, the main stream of a MS Word document containing its text is named &quot;WordDocument&quot;.</p>
  13 +<p>An OLE file can also contain <strong>storages</strong>. A storage is a folder that contains streams or other storages. For example, a MS Word document with VBA macros has a storage called &quot;Macros&quot;.</p>
  14 +<p>Special streams can contain <strong>properties</strong>. A property is a specific value that can be used to store information such as the metadata of a document (title, author, creation date, etc). Property stream names usually start with the character '\x05'.</p>
  15 +<p>For example, a typical MS Word document may look like this:</p>
  16 +<div class="figure">
  17 +<img src="OLE_VBA_sample.png" /><p class="caption"></p>
  18 +</div>
  19 +<p>Go to the <a href="API.html">API</a> page to see how to use all olefile features to parse OLE files.</p>
  20 +<hr />
  21 +<h2 id="olefile-documentation">olefile documentation</h2>
  22 +<ul>
  23 +<li><a href="Home.html">Home</a></li>
  24 +<li><a href="License.html">License</a></li>
  25 +<li><a href="Install.html">Install</a></li>
  26 +<li><a href="Contribute.html">Contribute</a>, Suggest Improvements or Report Issues</li>
  27 +<li><a href="OLE_Overview.html">OLE_Overview</a></li>
  28 +<li><a href="API.html">API</a> and Usage</li>
  29 +</ul>
  30 +</body>
  31 +</html>
... ...
oletools/thirdparty/olefile/doc/OLE_Overview.md 0 → 100644
  1 +About the structure of OLE files
  2 +================================
  3 +
  4 +This page is part of the documentation for [olefile](https://bitbucket.org/decalage/olefileio_pl/wiki). It provides a brief overview of the structure of [Microsoft OLE2 files (also called Structured Storage, Compound File Binary Format or Compound Document File Format)](http://en.wikipedia.org/wiki/Compound_File_Binary_Format), such as Microsoft Office 97-2003 documents, Image Composer and FlashPix files, Outlook messages, StickyNotes, several Microscopy file formats, McAfee antivirus quarantine files, etc.
  5 +
  6 +An OLE file can be seen as a mini file system or a Zip archive: It contains **streams** of data that look like files embedded within the OLE file. Each stream has a name. For example, the main stream of a MS Word document containing its text is named "WordDocument".
  7 +
  8 +An OLE file can also contain **storages**. A storage is a folder that contains streams or other storages. For example, a MS Word document with VBA macros has a storage called "Macros".
  9 +
  10 +Special streams can contain **properties**. A property is a specific value that can be used to store information such as the metadata of a document (title, author, creation date, etc). Property stream names usually start with the character '\x05'.
  11 +
  12 +For example, a typical MS Word document may look like this:
  13 +
  14 +![](OLE_VBA_sample.png)
  15 +
  16 +Go to the [[API]] page to see how to use all olefile features to parse OLE files.
  17 +
  18 +
  19 +--------------------------------------------------------------------------
  20 +
  21 +olefile documentation
  22 +---------------------
  23 +
  24 +- [[Home]]
  25 +- [[License]]
  26 +- [[Install]]
  27 +- [[Contribute]], Suggest Improvements or Report Issues
  28 +- [[OLE_Overview]]
  29 +- [[API]] and Usage
... ...
oletools/thirdparty/olefile/doc/OLE_VBA_sample.png 0 → 100644

3.48 KB

oletools/thirdparty/olefile/olefile.html 0 → 100644
No preview for this file type
oletools/thirdparty/olefile/olefile.py 0 → 100644
  1 +#!/usr/bin/env python
  2 +
  3 +# olefile (formerly OleFileIO_PL) version 0.41 2014-11-25
  4 +#
  5 +# Module to read/write Microsoft OLE2 files (also called Structured Storage or
  6 +# Microsoft Compound Document File Format), such as Microsoft Office 97-2003
  7 +# documents, Image Composer and FlashPix files, Outlook messages, ...
  8 +# This version is compatible with Python 2.6+ and 3.x
  9 +#
  10 +# Project website: http://www.decalage.info/olefile
  11 +#
  12 +# olefile is copyright (c) 2005-2014 Philippe Lagadec (http://www.decalage.info)
  13 +#
  14 +# olefile is based on the OleFileIO module from the PIL library v1.1.6
  15 +# See: http://www.pythonware.com/products/pil/index.htm
  16 +#
  17 +# The Python Imaging Library (PIL) is
  18 +# Copyright (c) 1997-2005 by Secret Labs AB
  19 +# Copyright (c) 1995-2005 by Fredrik Lundh
  20 +#
  21 +# See source code and LICENSE.txt for information on usage and redistribution.
  22 +
  23 +
  24 +# Since OleFileIO_PL v0.30, only Python 2.6+ and 3.x are supported
  25 +# This import enables print() as a function rather than a keyword
  26 +# (main requirement to be compatible with Python 3.x)
  27 +# The comment on the line below should be printed on Python 2.5 or older:
  28 +from __future__ import print_function # This version of olefile requires Python 2.6+ or 3.x.
  29 +
  30 +
  31 +__author__ = "Philippe Lagadec"
  32 +__date__ = "2014-11-25"
  33 +__version__ = '0.41'
  34 +
  35 +#--- LICENSE ------------------------------------------------------------------
  36 +
  37 +# olefile (formerly OleFileIO_PL) is copyright (c) 2005-2014 Philippe Lagadec
  38 +# (http://www.decalage.info)
  39 +#
  40 +# All rights reserved.
  41 +#
  42 +# Redistribution and use in source and binary forms, with or without modification,
  43 +# are permitted provided that the following conditions are met:
  44 +#
  45 +# * Redistributions of source code must retain the above copyright notice, this
  46 +# list of conditions and the following disclaimer.
  47 +# * Redistributions in binary form must reproduce the above copyright notice,
  48 +# this list of conditions and the following disclaimer in the documentation
  49 +# and/or other materials provided with the distribution.
  50 +#
  51 +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
  52 +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  53 +# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
  54 +# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
  55 +# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  56 +# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  57 +# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  58 +# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
  59 +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  60 +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  61 +
  62 +# ----------
  63 +# PIL License:
  64 +#
  65 +# olefile is based on source code from the OleFileIO module of the Python
  66 +# Imaging Library (PIL) published by Fredrik Lundh under the following license:
  67 +
  68 +# The Python Imaging Library (PIL) is
  69 +# Copyright (c) 1997-2005 by Secret Labs AB
  70 +# Copyright (c) 1995-2005 by Fredrik Lundh
  71 +#
  72 +# By obtaining, using, and/or copying this software and/or its associated
  73 +# documentation, you agree that you have read, understood, and will comply with
  74 +# the following terms and conditions:
  75 +#
  76 +# Permission to use, copy, modify, and distribute this software and its
  77 +# associated documentation for any purpose and without fee is hereby granted,
  78 +# provided that the above copyright notice appears in all copies, and that both
  79 +# that copyright notice and this permission notice appear in supporting
  80 +# documentation, and that the name of Secret Labs AB or the author(s) not be used
  81 +# in advertising or publicity pertaining to distribution of the software
  82 +# without specific, written prior permission.
  83 +#
  84 +# SECRET LABS AB AND THE AUTHORS DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS
  85 +# SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS.
  86 +# IN NO EVENT SHALL SECRET LABS AB OR THE AUTHORS BE LIABLE FOR ANY SPECIAL,
  87 +# INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM
  88 +# LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR
  89 +# OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR
  90 +# PERFORMANCE OF THIS SOFTWARE.
  91 +
  92 +#-----------------------------------------------------------------------------
  93 +# CHANGELOG: (only olefile/OleFileIO_PL changes compared to PIL 1.1.6)
  94 +# 2005-05-11 v0.10 PL: - a few fixes for Python 2.4 compatibility
  95 +# (all changes flagged with [PL])
  96 +# 2006-02-22 v0.11 PL: - a few fixes for some Office 2003 documents which raise
  97 +# exceptions in _OleStream.__init__()
  98 +# 2006-06-09 v0.12 PL: - fixes for files above 6.8MB (DIFAT in loadfat)
  99 +# - added some constants
  100 +# - added header values checks
  101 +# - added some docstrings
  102 +# - getsect: bugfix in case sectors >512 bytes
  103 +# - getsect: added conformity checks
  104 +# - DEBUG_MODE constant to activate debug display
  105 +# 2007-09-04 v0.13 PL: - improved/translated (lots of) comments
  106 +# - updated license
  107 +# - converted tabs to 4 spaces
  108 +# 2007-11-19 v0.14 PL: - added OleFileIO._raise_defect() to adapt sensitivity
  109 +# - improved _unicode() to use Python 2.x unicode support
  110 +# - fixed bug in _OleDirectoryEntry
  111 +# 2007-11-25 v0.15 PL: - added safety checks to detect FAT loops
  112 +# - fixed _OleStream which didn't check stream size
  113 +# - added/improved many docstrings and comments
  114 +# - moved helper functions _unicode and _clsid out of
  115 +# OleFileIO class
  116 +# - improved OleFileIO._find() to add Unix path syntax
  117 +# - OleFileIO._find() is now case-insensitive
  118 +# - added get_type() and get_rootentry_name()
  119 +# - rewritten loaddirectory and _OleDirectoryEntry
  120 +# 2007-11-27 v0.16 PL: - added _OleDirectoryEntry.kids_dict
  121 +# - added detection of duplicate filenames in storages
  122 +# - added detection of duplicate references to streams
  123 +# - added get_size() and exists() to _OleDirectoryEntry
  124 +# - added isOleFile to check header before parsing
  125 +# - added __all__ list to control public keywords in pydoc
  126 +# 2007-12-04 v0.17 PL: - added _load_direntry to fix a bug in loaddirectory
  127 +# - improved _unicode(), added workarounds for Python <2.3
  128 +# - added set_debug_mode and -d option to set debug mode
  129 +# - fixed bugs in OleFileIO.open and _OleDirectoryEntry
  130 +# - added safety check in main for large or binary
  131 +# properties
  132 +# - allow size>0 for storages for some implementations
  133 +# 2007-12-05 v0.18 PL: - fixed several bugs in handling of FAT, MiniFAT and
  134 +# streams
  135 +# - added option '-c' in main to check all streams
  136 +# 2009-12-10 v0.19 PL: - bugfix for 32 bit arrays on 64 bits platforms
  137 +# (thanks to Ben G. and Martijn for reporting the bug)
  138 +# 2009-12-11 v0.20 PL: - bugfix in OleFileIO.open when filename is not plain str
  139 +# 2010-01-22 v0.21 PL: - added support for big-endian CPUs such as PowerPC Macs
  140 +# 2012-02-16 v0.22 PL: - fixed bug in getproperties, patch by chuckleberryfinn
  141 +# (https://bitbucket.org/decalage/olefileio_pl/issue/7)
  142 +# - added close method to OleFileIO (fixed issue #2)
  143 +# 2012-07-25 v0.23 PL: - added support for file-like objects (patch by mete0r_kr)
  144 +# 2013-05-05 v0.24 PL: - getproperties: added conversion from filetime to python
  145 +# datetime
  146 +# - main: displays properties with date format
  147 +# - new class OleMetadata to parse standard properties
  148 +# - added get_metadata method
  149 +# 2013-05-07 v0.24 PL: - a few improvements in OleMetadata
  150 +# 2013-05-24 v0.25 PL: - getproperties: option to not convert some timestamps
  151 +# - OleMetaData: total_edit_time is now a number of seconds,
  152 +# not a timestamp
  153 +# - getproperties: added support for VT_BOOL, VT_INT, VT_UINT
  154 +# - getproperties: filter out null chars from strings
  155 +# - getproperties: raise non-fatal defects instead of
  156 +# exceptions when properties cannot be parsed properly
  157 +# 2013-05-27 PL: - getproperties: improved exception handling
  158 +# - _raise_defect: added option to set exception type
  159 +# - all non-fatal issues are now recorded, and displayed
  160 +# when run as a script
  161 +# 2013-07-11 v0.26 PL: - added methods to get modification and creation times
  162 +# of a directory entry or a storage/stream
  163 +# - fixed parsing of direntry timestamps
  164 +# 2013-07-24 PL: - new options in listdir to list storages and/or streams
  165 +# 2014-02-04 v0.30 PL: - upgraded code to support Python 3.x by Martin Panter
  166 +# - several fixes for Python 2.6 (xrange, MAGIC)
  167 +# - reused i32 from Pillow's _binary
  168 +# 2014-07-18 v0.31 - preliminary support for 4K sectors
  169 +# 2014-07-27 v0.31 PL: - a few improvements in OleFileIO.open (header parsing)
  170 +# - Fixed loadfat for large files with 4K sectors (issue #3)
  171 +# 2014-07-30 v0.32 PL: - added write_sect to write sectors to disk
  172 +# - added write_mode option to OleFileIO.__init__ and open
  173 +# 2014-07-31 PL: - fixed padding in write_sect for Python 3, added checks
  174 +# - added write_stream to write a stream to disk
  175 +# 2014-09-26 v0.40 PL: - renamed OleFileIO_PL to olefile
  176 +# 2014-11-09 NE: - added support for Jython (Niko Ehrenfeuchter)
  177 +# 2014-11-13 v0.41 PL: - improved isOleFile and OleFileIO.open to support OLE
  178 +# data in a string buffer and file-like objects.
  179 +# 2014-11-21 PL: - updated comments according to Pillow's commits
  180 +
  181 +#-----------------------------------------------------------------------------
  182 +# TODO (for version 1.0):
  183 +# + get rid of print statements, to simplify Python 2.x and 3.x support
  184 +# + add is_stream and is_storage
  185 +# + remove leading and trailing slashes where a path is used
  186 +# + add functions path_list2str and path_str2list
  187 +# + fix how all the methods handle unicode str and/or bytes as arguments
  188 +# + add path attrib to _OleDirEntry, set it once and for all in init or
  189 +# append_kids (then listdir/_list can be simplified)
  190 +# - TESTS with Linux, MacOSX, Python 1.5.2, various files, PIL, ...
  191 +# - add underscore to each private method, to avoid their display in
  192 +# pydoc/epydoc documentation - Remove it for classes to be documented
  193 +# - replace all raised exceptions with _raise_defect (at least in OleFileIO)
  194 +# - merge code from _OleStream and OleFileIO.getsect to read sectors
  195 +# (maybe add a class for FAT and MiniFAT ?)
  196 +# - add method to check all streams (follow sectors chains without storing all
  197 +# stream in memory, and report anomalies)
  198 +# - use _OleDirectoryEntry.kids_dict to improve _find and _list ?
  199 +# - fix Unicode names handling (find some way to stay compatible with Py1.5.2)
  200 +# => if possible avoid converting names to Latin-1
  201 +# - review DIFAT code: fix handling of DIFSECT blocks in FAT (not stop)
  202 +# - rewrite OleFileIO.getproperties
  203 +# - improve docstrings to show more sample uses
  204 +# - see also original notes and FIXME below
  205 +# - remove all obsolete FIXMEs
  206 +# - OleMetadata: fix version attrib according to
  207 +# http://msdn.microsoft.com/en-us/library/dd945671%28v=office.12%29.aspx
  208 +
  209 +# IDEAS:
  210 +# - in OleFileIO._open and _OleStream, use size=None instead of 0x7FFFFFFF for
  211 +# streams with unknown size
  212 +# - use arrays of int instead of long integers for FAT/MiniFAT, to improve
  213 +# performance and reduce memory usage ? (possible issue with values >2^31)
  214 +# - provide tests with unittest (may need write support to create samples)
  215 +# - move all debug code (and maybe dump methods) to a separate module, with
  216 +# a class which inherits OleFileIO ?
  217 +# - fix docstrings to follow epydoc format
  218 +# - add support for big endian byte order ?
  219 +# - create a simple OLE explorer with wxPython
  220 +
  221 +# FUTURE EVOLUTIONS to add write support:
  222 +# see issue #6 on Bitbucket:
  223 +# https://bitbucket.org/decalage/olefileio_pl/issue/6/improve-olefileio_pl-to-write-ole-files
  224 +
  225 +#-----------------------------------------------------------------------------
  226 +# NOTES from PIL 1.1.6:
  227 +
  228 +# History:
  229 +# 1997-01-20 fl Created
  230 +# 1997-01-22 fl Fixed 64-bit portability quirk
  231 +# 2003-09-09 fl Fixed typo in OleFileIO.loadfat (noted by Daniel Haertle)
  232 +# 2004-02-29 fl Changed long hex constants to signed integers
  233 +#
  234 +# Notes:
  235 +# FIXME: sort out sign problem (eliminate long hex constants)
  236 +# FIXME: change filename to use "a/b/c" instead of ["a", "b", "c"]
  237 +# FIXME: provide a glob mechanism function (using fnmatchcase)
  238 +#
  239 +# Literature:
  240 +#
  241 +# "FlashPix Format Specification, Appendix A", Kodak and Microsoft,
  242 +# September 1996.
  243 +#
  244 +# Quotes:
  245 +#
  246 +# "If this document and functionality of the Software conflict,
  247 +# the actual functionality of the Software represents the correct
  248 +# functionality" -- Microsoft, in the OLE format specification
  249 +
  250 +#------------------------------------------------------------------------------
  251 +
  252 +
  253 +import io
  254 +import sys
  255 +import struct, array, os.path, datetime
  256 +
  257 +#=== COMPATIBILITY WORKAROUNDS ================================================
  258 +
  259 +#[PL] Define explicitly the public API to avoid private objects in pydoc:
  260 +#TODO: add more
  261 +# __all__ = ['OleFileIO', 'isOleFile', 'MAGIC']
  262 +
  263 +# For Python 3.x, need to redefine long as int:
  264 +if str is not bytes:
  265 +    long = int
  266 +
  267 +# Use a lazy range iterator on both Python 2 and 3.x:
  268 +try:
  269 +    # on Python 2 we need xrange:
  270 +    iterrange = xrange
  271 +except NameError:
  272 +    # no xrange, for Python 3 it was renamed as range:
  273 +    iterrange = range
  274 +
  275 +#[PL] workaround to fix an issue with array item size on 64 bits systems:
  276 +if array.array('L').itemsize == 4:
  277 +    # on 32 bits platforms, long integers in an array are 32 bits:
  278 +    UINT32 = 'L'
  279 +elif array.array('I').itemsize == 4:
  280 +    # on 64 bits platforms, integers in an array are 32 bits:
  281 +    UINT32 = 'I'
  282 +elif array.array('i').itemsize == 4:
  283 +    # On 64 bit Jython, signed integers ('i') are the only way to store our 32
  284 +    # bit values in an array in a *somewhat* reasonable way, as the otherwise
  285 +    # perfectly suited 'H' (unsigned int, 32 bits) results in a completely
  286 +    # unusable behaviour. This is most likely caused by the fact that Java
  287 +    # doesn't have unsigned values, and thus Jython's "array" implementation,
  288 +    # which is based on "jarray", doesn't have them either.
  289 +    # NOTE: to trick Jython into converting the values it would normally
  290 +    # interpret as "signed" into "unsigned", a binary-and operation with
  291 +    # 0xFFFFFFFF can be used. This way it is possible to use the same comparing
  292 +    # operations on all platforms / implementations. The corresponding code
  293 +    # lines are flagged with a 'JYTHON-WORKAROUND' tag below.
  294 +    UINT32 = 'i'
  295 +else:
  296 +    raise ValueError('Need to fix a bug with 32 bit arrays, please contact author...')
  297 +
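The masking trick described in the Jython note can be sketched in isolation. This is a minimal illustration using only the standard `array` module; the helper name `as_uint32` and the sample FAT values are assumptions made for the example, not part of olefile.

```python
import array

ENDOFCHAIN = 0xFFFFFFFE  # same value as the sector ID constant defined below

def as_uint32(value):
    """Hypothetical helper: reinterpret a signed 32-bit value as unsigned."""
    return value & 0xFFFFFFFF

# A signed 32-bit array stores the end-of-chain marker as -2.
fat = array.array('i', [3, -2])

for entry in fat:
    # After masking, comparisons against the unsigned constants work again.
    print(entry, as_uint32(entry) == ENDOFCHAIN)
```

Masking is a no-op for values that are already non-negative, so the same comparison code can run on platforms where the array holds unsigned items.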
  298 +
  299 +#[PL] These workarounds were inspired from the Path module
  300 +# (see http://www.jorendorff.com/articles/python/path/)
  301 +#TODO: test with old Python versions
  302 +
  303 +# Pre-2.3 workaround for basestring.
  304 +try:
  305 +    basestring
  306 +except NameError:
  307 +    try:
  308 +        # is Unicode supported (Python >2.0 or >1.6 ?)
  309 +        basestring = (str, unicode)
  310 +    except NameError:
  311 +        basestring = str
  312 +
  313 +#[PL] Experimental setting: if True, OLE filenames will be kept in Unicode
  314 +# if False (default PIL behaviour), all filenames are converted to Latin-1.
  315 +KEEP_UNICODE_NAMES = False
  316 +
  317 +#=== DEBUGGING ===============================================================
  318 +
  319 +#TODO: replace this by proper logging
  320 +
  321 +#[PL] DEBUG display mode: False by default, use set_debug_mode() or "-d" on
  322 +# command line to change it.
  323 +DEBUG_MODE = False
  324 +def debug_print(msg):
  325 +    print(msg)
  326 +def debug_pass(msg):
  327 +    pass
  328 +debug = debug_pass
  329 +
  330 +def set_debug_mode(debug_mode):
  331 +    """
  332 +    Set debug mode on or off, to control display of debugging messages.
  333 +    :param debug_mode: True or False
  334 +    """
  335 +    global DEBUG_MODE, debug
  336 +    DEBUG_MODE = debug_mode
  337 +    if debug_mode:
  338 +        debug = debug_print
  339 +    else:
  340 +        debug = debug_pass
  341 +
  342 +
  343 +#=== CONSTANTS ===============================================================
  344 +
  345 +# magic bytes that should be at the beginning of every OLE file:
  346 +MAGIC = b'\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1'
  347 +
  348 +#[PL]: added constants for Sector IDs (from AAF specifications)
  349 +MAXREGSECT = 0xFFFFFFFA # (-6) maximum SECT
  350 +DIFSECT = 0xFFFFFFFC # (-4) denotes a DIFAT sector in a FAT
  351 +FATSECT = 0xFFFFFFFD # (-3) denotes a FAT sector in a FAT
  352 +ENDOFCHAIN = 0xFFFFFFFE # (-2) end of a virtual stream chain
  353 +FREESECT = 0xFFFFFFFF # (-1) unallocated sector
  354 +
  355 +#[PL]: added constants for Directory Entry IDs (from AAF specifications)
  356 +MAXREGSID = 0xFFFFFFFA # (-6) maximum directory entry ID
  357 +NOSTREAM = 0xFFFFFFFF # (-1) unallocated directory entry
  358 +
  359 +#[PL] object types in storage (from AAF specifications)
  360 +STGTY_EMPTY = 0 # empty directory entry (according to OpenOffice.org doc)
  361 +STGTY_STORAGE = 1 # element is a storage object
  362 +STGTY_STREAM = 2 # element is a stream object
  363 +STGTY_LOCKBYTES = 3 # element is an ILockBytes object
  364 +STGTY_PROPERTY = 4 # element is an IPropertyStorage object
  365 +STGTY_ROOT = 5 # element is a root storage
  366 +
  367 +
  368 +#
  369 +# --------------------------------------------------------------------
  370 +# property types
  371 +
  372 +VT_EMPTY=0; VT_NULL=1; VT_I2=2; VT_I4=3; VT_R4=4; VT_R8=5; VT_CY=6;
  373 +VT_DATE=7; VT_BSTR=8; VT_DISPATCH=9; VT_ERROR=10; VT_BOOL=11;
  374 +VT_VARIANT=12; VT_UNKNOWN=13; VT_DECIMAL=14; VT_I1=16; VT_UI1=17;
  375 +VT_UI2=18; VT_UI4=19; VT_I8=20; VT_UI8=21; VT_INT=22; VT_UINT=23;
  376 +VT_VOID=24; VT_HRESULT=25; VT_PTR=26; VT_SAFEARRAY=27; VT_CARRAY=28;
  377 +VT_USERDEFINED=29; VT_LPSTR=30; VT_LPWSTR=31; VT_FILETIME=64;
  378 +VT_BLOB=65; VT_STREAM=66; VT_STORAGE=67; VT_STREAMED_OBJECT=68;
  379 +VT_STORED_OBJECT=69; VT_BLOB_OBJECT=70; VT_CF=71; VT_CLSID=72;
  380 +VT_VECTOR=0x1000;
  381 +
  382 +# map property type id to name (for debugging purposes)
  383 +
  384 +VT = {}
  385 +for keyword, var in list(vars().items()):
  386 +    if keyword[:3] == "VT_":
  387 +        VT[var] = keyword
  388 +
  389 +#
  390 +# --------------------------------------------------------------------
  391 +# Some common document types (root.clsid fields)
  392 +
  393 +WORD_CLSID = "00020900-0000-0000-C000-000000000046"
  394 +#TODO: check Excel, PPT, ...
  395 +
  396 +#[PL]: Defect levels to classify parsing errors - see OleFileIO._raise_defect()
  397 +DEFECT_UNSURE = 10 # a case which looks weird, but not sure it's a defect
  398 +DEFECT_POTENTIAL = 20 # a potential defect
  399 +DEFECT_INCORRECT = 30 # an error according to specifications, but parsing
  400 + # can go on
  401 +DEFECT_FATAL = 40 # an error which cannot be ignored, parsing is
  402 + # impossible
  403 +
  404 +# Minimal size of an empty OLE file, with 512-bytes sectors = 1536 bytes
  405 +# (this is used in isOleFile and OleFileIO.open)
  406 +MINIMAL_OLEFILE_SIZE = 1536
  407 +
  408 +#[PL] add useful constants to __all__:
  409 +# for key in list(vars().keys()):
  410 +# if key.startswith('STGTY_') or key.startswith('DEFECT_'):
  411 +# __all__.append(key)
  412 +
  413 +
  414 +#=== FUNCTIONS ===============================================================
  415 +
  416 +def isOleFile(filename):
  417 +    """
  418 +    Test if a file is an OLE container (according to the magic bytes in its header).
  419 +
  420 +    :param filename: string-like or file-like object, OLE file to parse
  421 +
  422 +        - if filename is a string smaller than 1536 bytes, it is the path
  423 +          of the file to open. (bytes or unicode string)
  424 +        - if filename is a string longer than 1535 bytes, it is parsed
  425 +          as the content of an OLE file in memory. (bytes type only)
  426 +        - if filename is a file-like object (with read and seek methods),
  427 +          it is parsed as-is.
  428 +
  429 +    :returns: True if OLE, False otherwise.
  430 +    """
  431 +    # check if filename is a string-like or file-like object:
  432 +    if hasattr(filename, 'read'):
  433 +        # file-like object: use it directly
  434 +        header = filename.read(len(MAGIC))
  435 +        # just in case, seek back to start of file:
  436 +        filename.seek(0)
  437 +    elif isinstance(filename, bytes) and len(filename) >= MINIMAL_OLEFILE_SIZE:
  438 +        # filename is a bytes string containing the OLE file to be parsed:
  439 +        header = filename[:len(MAGIC)]
  440 +    else:
  441 +        with open(filename, 'rb') as f:  # string-like object: filename of file on disk
  442 +            header = f.read(len(MAGIC))  # the file is closed on leaving the block
  443 +    if header == MAGIC:
  444 +        return True
  445 +    else:
  446 +        return False
  447 +
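The header test performed by isOleFile can be tried without any file on disk. The following is a minimal, self-contained sketch using only the standard library. The helper name `looks_like_ole` is hypothetical and not part of olefile, and unlike isOleFile it does not accept a path, only in-memory bytes or a file-like object.

```python
import io

# Magic bytes found at the start of every OLE2 container (same value as MAGIC).
OLE_MAGIC = b'\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1'

def looks_like_ole(data):
    """Hypothetical helper: True if data (bytes or file-like) starts with the OLE magic."""
    if hasattr(data, 'read'):
        header = data.read(len(OLE_MAGIC))
        data.seek(0)  # rewind so that a later parser starts at offset 0
    else:
        header = data[:len(OLE_MAGIC)]
    return header == OLE_MAGIC

# 1536 bytes with a valid signature only; this is not a parseable document.
fake_ole = OLE_MAGIC + b'\x00' * 1528
zip_like = io.BytesIO(b'PK\x03\x04' + b'\x00' * 16)

print(looks_like_ole(fake_ole))
print(looks_like_ole(zip_like))
```

A matching signature only shows that the data claims to be an OLE container; the header and FAT checks done when the file is actually opened are still needed to detect malformed files.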
  448 +
  449 +if bytes is str:
  450 +    # version for Python 2.x
  451 +    def i8(c):
  452 +        return ord(c)
  453 +else:
  454 +    # version for Python 3.x
  455 +    def i8(c):
  456 +        return c if c.__class__ is int else c[0]
  457 +
  458 +
  459 +#TODO: replace i16 and i32 with more readable struct.unpack equivalent?
  460 +
  461 +def i16(c, o = 0):
  462 +    """
  463 +    Converts a 2-byte (16 bits) little-endian string to an unsigned integer.
  464 +
  465 +    :param c: string containing bytes to convert
  466 +    :param o: offset of bytes to convert in string
  467 +    """
  468 +    return i8(c[o]) | (i8(c[o+1])<<8)
  469 +
  470 +
  471 +def i32(c, o = 0):
  472 +    """
  473 +    Converts a 4-byte (32 bits) little-endian string to an unsigned integer.
  474 +
  475 +    :param c: string containing bytes to convert
  476 +    :param o: offset of bytes to convert in string
  477 +    """
  478 +##    return int(ord(c[o])+(ord(c[o+1])<<8)+(ord(c[o+2])<<16)+(ord(c[o+3])<<24))
  479 +##    # [PL]: added int() because "<<" gives long int since Python 2.4
  480 +    # copied from Pillow's _binary:
  481 +    return i8(c[o]) | (i8(c[o+1])<<8) | (i8(c[o+2])<<16) | (i8(c[o+3])<<24)
  482 +
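The TODO above i16 suggests a struct-based equivalent. A possible sketch is shown here, assuming little-endian unsigned values as in the functions above. The names `i16_struct` and `i32_struct` are illustrative and do not exist in olefile.

```python
import struct

def i16_struct(c, o=0):
    """Little-endian unsigned 16-bit integer read from c at offset o."""
    return struct.unpack_from('<H', c, o)[0]

def i32_struct(c, o=0):
    """Little-endian unsigned 32-bit integer read from c at offset o."""
    return struct.unpack_from('<I', c, o)[0]

# Four bytes of ENDOFCHAIN (0xFFFFFFFE) followed by the 16-bit value 9.
data = b'\xFE\xFF\xFF\xFF\x09\x00'

print(hex(i32_struct(data)))
print(i16_struct(data, 4))
```

The explicit `<` prefix fixes the byte order, so the result does not depend on the CPU, which is the same property the shift-based versions have.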
  483 +
  484 +def _clsid(clsid):
  485 +    """
  486 +    Converts a CLSID to a human-readable string.
  487 +
  488 +    :param clsid: string of length 16.
  489 +    """
  490 +    assert len(clsid) == 16
  491 +    # if clsid is only made of null bytes, return an empty string:
  492 +    # (PL: why not simply return the string with zeroes?)
  493 +    if not clsid.strip(b"\0"):
  494 +        return ""
  495 +    return (("%08X-%04X-%04X-%02X%02X-" + "%02X" * 6) %
  496 +            ((i32(clsid, 0), i16(clsid, 4), i16(clsid, 6)) +
  497 +            tuple(map(i8, clsid[8:16]))))
  498 +
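The CLSID layout handled by _clsid (three little-endian fields followed by eight raw bytes) is the same mixed-endian layout that the standard `uuid` module reads through its `bytes_le` argument. The sketch below is an illustrative equivalent, not olefile code; `clsid_to_str` is a hypothetical name, and the sample bytes were built by hand from the WORD_CLSID constant defined earlier.

```python
import uuid

def clsid_to_str(raw):
    """Hypothetical equivalent of _clsid() built on uuid.UUID."""
    if len(raw) != 16:
        raise ValueError('a CLSID is exactly 16 bytes')
    # Keep the behaviour of _clsid: an all-zero CLSID gives an empty string.
    if not raw.strip(b'\0'):
        return ''
    return str(uuid.UUID(bytes_le=raw)).upper()

# On-disk bytes corresponding to WORD_CLSID.
word_raw = b'\x00\x09\x02\x00\x00\x00\x00\x00\xC0\x00\x00\x00\x00\x00\x00\x46'

print(clsid_to_str(word_raw))
print(repr(clsid_to_str(b'\x00' * 16)))
```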
  499 +
  500 +
  501 +# UNICODE support:
  502 +# (necessary to handle storages/streams names which use Unicode)
  503 +
  504 +def _unicode(s, errors='replace'):
  505 +    """
  506 +    Decode a UTF-16LE name; return unicode on Python 3 or if KEEP_UNICODE_NAMES, else Latin-1 bytes.
  507 +
  508 +    :param s: UTF-16LE unicode string to convert to Latin-1
  509 +    :param errors: 'replace', 'ignore' or 'strict'.
  510 +    """
  511 +    #TODO: test if OleFileIO works with Unicode strings, instead of
  512 +    # converting to Latin-1.
  513 +    try:
  514 +        # First the string is converted to plain Unicode:
  515 +        # (assuming it is encoded as UTF-16 little-endian)
  516 +        u = s.decode('UTF-16LE', errors)
  517 +        if bytes is not str or KEEP_UNICODE_NAMES:
  518 +            return u
  519 +        else:
  520 +            # Second the unicode string is converted to Latin-1
  521 +            return u.encode('latin_1', errors)
  522 +    except Exception:
  523 +        # there was an error during Unicode to Latin-1 conversion:
  524 +        raise IOError('incorrect Unicode name')
  525 +
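The two conversion steps in _unicode can be illustrated with plain standard-library calls. This is a sketch under the assumption that names are stored as UTF-16LE, as the comments above state; `decode_ole_name` and its `keep_unicode` flag are hypothetical stand-ins for the function and the KEEP_UNICODE_NAMES setting.

```python
def decode_ole_name(raw, keep_unicode=True):
    """Hypothetical helper mirroring the two steps of _unicode()."""
    # Step 1: UTF-16 little-endian bytes to a unicode string.
    text = raw.decode('UTF-16LE', 'replace')
    if keep_unicode:
        return text
    # Step 2 (legacy PIL behaviour): unicode to Latin-1 bytes;
    # characters outside Latin-1 are replaced by '?'.
    return text.encode('latin_1', 'replace')

stream_name = u'WordDocument'.encode('UTF-16LE')
cjk_name = u'\u4e2d'.encode('UTF-16LE')

print(decode_ole_name(stream_name))
print(decode_ole_name(cjk_name, keep_unicode=False))
```

The second call shows why converting to Latin-1 loses information for names outside Western European character sets.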
  526 +
  527 +def filetime2datetime(filetime):
  528 +    """
  529 +    convert FILETIME (64 bits int, 100-nanosecond intervals since 1601-01-01) to Python datetime.datetime
  530 +    """
  531 +    # TODO: manage exception when microseconds is too large
  532 +    # inspired from http://code.activestate.com/recipes/511425-filetime-to-datetime/
  533 +    _FILETIME_null_date = datetime.datetime(1601, 1, 1, 0, 0, 0)
  534 +    #debug('timedelta days=%d' % (filetime//(10*1000000*3600*24)))
  535 +    return _FILETIME_null_date + datetime.timedelta(microseconds=filetime//10)
  536 +
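A worked example of the FILETIME arithmetic, as a standalone sketch: the function body repeats the computation above under the illustrative name `filetime_to_datetime`, and the constant is the widely used FILETIME value of the Unix epoch.

```python
import datetime

FILETIME_EPOCH = datetime.datetime(1601, 1, 1)

def filetime_to_datetime(filetime):
    # FILETIME counts 100-nanosecond ticks, so 10 ticks make one microsecond.
    return FILETIME_EPOCH + datetime.timedelta(microseconds=filetime // 10)

# 1970-01-01 00:00:00 expressed as a FILETIME value.
UNIX_EPOCH_AS_FILETIME = 116444736000000000

print(filetime_to_datetime(UNIX_EPOCH_AS_FILETIME))
# One second later is 10,000,000 ticks further on.
print(filetime_to_datetime(UNIX_EPOCH_AS_FILETIME + 10 * 1000 * 1000))
```

Very large values can exceed the range supported by `datetime` and raise OverflowError, which is the case the TODO in the function refers to.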
  537 +
  538 +
  539 +#=== CLASSES ==================================================================
  540 +
  541 +class OleMetadata:
  542 +    """
  543 +    class to parse and store metadata from standard properties of OLE files.
  544 +
  545 +    Available attributes:
  546 +    codepage, title, subject, author, keywords, comments, template,
  547 +    last_saved_by, revision_number, total_edit_time, last_printed, create_time,
  548 +    last_saved_time, num_pages, num_words, num_chars, thumbnail,
  549 +    creating_application, security, codepage_doc, category, presentation_target,
  550 +    bytes, lines, paragraphs, slides, notes, hidden_slides, mm_clips,
  551 +    scale_crop, heading_pairs, titles_of_parts, manager, company, links_dirty,
  552 +    chars_with_spaces, unused, shared_doc, link_base, hlinks, hlinks_changed,
  553 +    version, dig_sig, content_type, content_status, language, doc_version
  554 +
  555 +    Note: an attribute is set to None when not present in the properties of the
  556 +    OLE file.
  557 +
  558 +    References for SummaryInformation stream:
  559 +    - http://msdn.microsoft.com/en-us/library/dd942545.aspx
  560 +    - http://msdn.microsoft.com/en-us/library/dd925819%28v=office.12%29.aspx
  561 +    - http://msdn.microsoft.com/en-us/library/windows/desktop/aa380376%28v=vs.85%29.aspx
  562 +    - http://msdn.microsoft.com/en-us/library/aa372045.aspx
  563 +    - http://sedna-soft.de/summary-information-stream/
  564 +    - http://poi.apache.org/apidocs/org/apache/poi/hpsf/SummaryInformation.html
  565 +
  566 +    References for DocumentSummaryInformation stream:
  567 +    - http://msdn.microsoft.com/en-us/library/dd945671%28v=office.12%29.aspx
  568 +    - http://msdn.microsoft.com/en-us/library/windows/desktop/aa380374%28v=vs.85%29.aspx
  569 +    - http://poi.apache.org/apidocs/org/apache/poi/hpsf/DocumentSummaryInformation.html
  570 +
  571 +    new in version 0.25
  572 +    """
  573 +
  574 +    # attribute names for SummaryInformation stream properties:
  575 +    # (ordered by property id, starting at 1)
  576 +    SUMMARY_ATTRIBS = ['codepage', 'title', 'subject', 'author', 'keywords', 'comments',
  577 +        'template', 'last_saved_by', 'revision_number', 'total_edit_time',
  578 +        'last_printed', 'create_time', 'last_saved_time', 'num_pages',
  579 +        'num_words', 'num_chars', 'thumbnail', 'creating_application',
  580 +        'security']
  581 +
  582 +    # attribute names for DocumentSummaryInformation stream properties:
  583 +    # (ordered by property id, starting at 1)
  584 +    DOCSUM_ATTRIBS = ['codepage_doc', 'category', 'presentation_target', 'bytes', 'lines', 'paragraphs',
  585 +        'slides', 'notes', 'hidden_slides', 'mm_clips',
  586 +        'scale_crop', 'heading_pairs', 'titles_of_parts', 'manager',
  587 +        'company', 'links_dirty', 'chars_with_spaces', 'unused', 'shared_doc',
  588 +        'link_base', 'hlinks', 'hlinks_changed', 'version', 'dig_sig',
  589 +        'content_type', 'content_status', 'language', 'doc_version']
  590 +
  591 +    def __init__(self):
  592 +        """
  593 +        Constructor for OleMetadata
  594 +        All attributes are set to None by default
  595 +        """
  596 +        # properties from SummaryInformation stream
  597 +        self.codepage = None
  598 +        self.title = None
  599 +        self.subject = None
  600 +        self.author = None
  601 +        self.keywords = None
  602 +        self.comments = None
  603 +        self.template = None
  604 +        self.last_saved_by = None
  605 +        self.revision_number = None
  606 +        self.total_edit_time = None
  607 +        self.last_printed = None
  608 +        self.create_time = None
  609 +        self.last_saved_time = None
  610 +        self.num_pages = None
  611 +        self.num_words = None
  612 +        self.num_chars = None
  613 +        self.thumbnail = None
  614 +        self.creating_application = None
  615 +        self.security = None
  616 +        # properties from DocumentSummaryInformation stream
  617 +        self.codepage_doc = None
  618 +        self.category = None
  619 +        self.presentation_target = None
  620 +        self.bytes = None
  621 +        self.lines = None
  622 +        self.paragraphs = None
  623 +        self.slides = None
  624 +        self.notes = None
  625 +        self.hidden_slides = None
  626 +        self.mm_clips = None
  627 +        self.scale_crop = None
  628 +        self.heading_pairs = None
  629 +        self.titles_of_parts = None
  630 +        self.manager = None
  631 +        self.company = None
  632 +        self.links_dirty = None
  633 +        self.chars_with_spaces = None
  634 +        self.unused = None
  635 +        self.shared_doc = None
  636 +        self.link_base = None
  637 +        self.hlinks = None
  638 +        self.hlinks_changed = None
  639 +        self.version = None
  640 +        self.dig_sig = None
  641 +        self.content_type = None
  642 +        self.content_status = None
  643 +        self.language = None
  644 +        self.doc_version = None
  645 +
  646 +
  647 +    def parse_properties(self, olefile):
  648 +        """
  649 +        Parse standard properties of an OLE file, from the streams
  650 +        "\x05SummaryInformation" and "\x05DocumentSummaryInformation",
  651 +        if present.
  652 +        Properties are converted to strings, integers or Python datetime objects.
  653 +        If a property is not present, its value is set to None.
  654 +        """
  655 +        # first set all attributes to None:
  656 +        for attrib in (self.SUMMARY_ATTRIBS + self.DOCSUM_ATTRIBS):
  657 +            setattr(self, attrib, None)
  658 +        if olefile.exists("\x05SummaryInformation"):
  659 +            # get properties from the stream:
  660 +            # (converting timestamps to Python datetime, except total_edit_time,
  661 +            # which is property #10)
  662 +            props = olefile.getproperties("\x05SummaryInformation",
  663 +                convert_time=True, no_conversion=[10])
  664 +            # store them into this object's attributes:
  665 +            for i in range(len(self.SUMMARY_ATTRIBS)):
  666 +                # ids for standard properties start at 0x01 and go up to 0x13
  667 +                value = props.get(i+1, None)
  668 +                setattr(self, self.SUMMARY_ATTRIBS[i], value)
  669 +        if olefile.exists("\x05DocumentSummaryInformation"):
  670 +            # get properties from the stream:
  671 +            props = olefile.getproperties("\x05DocumentSummaryInformation",
  672 +                convert_time=True)
  673 +            # store them into this object's attributes:
  674 +            for i in range(len(self.DOCSUM_ATTRIBS)):
  675 +                # ids for standard properties start at 0x01 and go up to 0x1C
  676 +                value = props.get(i+1, None)
  677 +                setattr(self, self.DOCSUM_ATTRIBS[i], value)
  678 +
  679 +    def dump(self):
  680 +        """
  681 +        Dump all metadata, for debugging purposes.
  682 +        """
  683 +        print('Properties from SummaryInformation stream:')
  684 +        for prop in self.SUMMARY_ATTRIBS:
  685 +            value = getattr(self, prop)
  686 +            print('- %s: %s' % (prop, repr(value)))
  687 +        print('Properties from DocumentSummaryInformation stream:')
  688 +        for prop in self.DOCSUM_ATTRIBS:
  689 +            value = getattr(self, prop)
  690 +            print('- %s: %s' % (prop, repr(value)))
  691 +
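Review note: parse_properties() relies on the position of each name in the attribute list matching its property id minus one. A minimal sketch of that mapping follows, assuming a shortened, hypothetical attribute list (`ATTRIBS`) and a plain dict in place of the result of `getproperties()`; `Meta` and `apply_properties` are illustrative names, not part of the module.

```python
# Hypothetical, shortened attribute list; the real SUMMARY_ATTRIBS is longer.
ATTRIBS = ['codepage', 'title', 'subject', 'author']


class Meta(object):
    """Empty holder object standing in for the metadata class."""
    pass


def apply_properties(target, attribs, props):
    """Copy props (a dict keyed by property id) onto target, by position."""
    for i, name in enumerate(attribs):
        # property ids start at 1, list indexes start at 0
        setattr(target, name, props.get(i + 1, None))
    return target


# Properties 2 and 3 are absent, so the matching attributes become None.
meta = apply_properties(Meta(), ATTRIBS, {1: 1252, 4: 'Alice'})
print(meta.codepage, meta.title, meta.subject, meta.author)
```

Missing ids fall back to None, which matches the docstring's statement that absent properties are set to None.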
  692 +
  693 +#--- _OleStream ---------------------------------------------------------------
  694 +
  695 +class _OleStream(io.BytesIO):
  696 + """
  697 + OLE2 Stream
  698 +
  699 + Returns a read-only file object which can be used to read
  700 + the contents of an OLE stream (instance of the BytesIO class).
  701 + To open a stream, use the openstream method in the OleFile class.
  702 +
  703 + This class can be used with either ordinary streams,
  704 + or ministreams, depending on the offset, sectorsize, and
  705 + fat table arguments.
  706 +
  707 + Attributes:
  708 +
  709 + - size: actual size of data stream, after it was opened.
  710 + """
  711 +
  712 + # FIXME: should store the list of sects obtained by following
  713 + # the fat chain, and load new sectors on demand instead of
  714 + # loading it all in one go.
  715 +
  716 + def __init__(self, fp, sect, size, offset, sectorsize, fat, filesize):
  717 + """
  718 + Constructor for _OleStream class.
  719 +
  720 + :param fp: file object, the OLE container or the MiniFAT stream
  721 + :param sect: sector index of first sector in the stream
  722 + :param size: total size of the stream
  723 + :param offset: offset in bytes for the first FAT or MiniFAT sector
  724 + :param sectorsize: size of one sector
  725 + :param fat: array/list of sector indexes (FAT or MiniFAT)
  726 + :param filesize: size of OLE file (for debugging)
  727 + :returns: a BytesIO instance containing the OLE stream
  728 + """
  729 + debug('_OleStream.__init__:')
  730 + debug(' sect=%d (%X), size=%d, offset=%d, sectorsize=%d, len(fat)=%d, fp=%s'
  731 + %(sect,sect,size,offset,sectorsize,len(fat), repr(fp)))
  732 + #[PL] To detect malformed documents with FAT loops, we compute the
  733 + # expected number of sectors in the stream:
  734 + unknown_size = False
  735 + if size==0x7FFFFFFF:
  736 + # this is the case when called from OleFileIO._open(), and stream
  737 + # size is not known in advance (for example when reading the
  738 + # Directory stream). Then we can only guess maximum size:
  739 + size = len(fat)*sectorsize
  740 + # and we keep a record that size was unknown:
  741 + unknown_size = True
  742 + debug(' stream with UNKNOWN SIZE')
  743 + nb_sectors = (size + (sectorsize-1)) // sectorsize
  744 + debug('nb_sectors = %d' % nb_sectors)
  745 + # This number should not exceed the total number of
  746 + # sectors in the given FAT:
  747 + if nb_sectors > len(fat):
  748 + raise IOError('malformed OLE document, stream too large')
  749 + # optimization(?): data is first a list of bytes strings, and join() is
  750 + # called at the end to concatenate them into one bytes string.
  751 + # (this may not be really useful with recent Python versions)
  752 + data = []
  753 + # if size is zero, then first sector index should be ENDOFCHAIN:
  754 + if size == 0 and sect != ENDOFCHAIN:
  755 + debug('size == 0 and sect != ENDOFCHAIN:')
  756 + raise IOError('incorrect OLE sector index for empty stream')
  757 + #[PL] A fixed-length for loop is used instead of an unbounded while
  758 + # loop to avoid DoS attacks:
  759 + for i in range(nb_sectors):
  760 + # Sector index may be ENDOFCHAIN, but only if size was unknown
  761 + if sect == ENDOFCHAIN:
  762 + if unknown_size:
  763 + break
  764 + else:
  765 + # else this means that the stream is smaller than declared:
  766 + debug('sect=ENDOFCHAIN before expected size')
  767 + raise IOError('incomplete OLE stream')
  768 + # sector index should be within FAT:
  769 + if sect<0 or sect>=len(fat):
  770 + debug('sect=%d (%X) / len(fat)=%d' % (sect, sect, len(fat)))
  771 + debug('i=%d / nb_sectors=%d' %(i, nb_sectors))
  772 +## tmp_data = b"".join(data)
  773 +## f = open('test_debug.bin', 'wb')
  774 +## f.write(tmp_data)
  775 +## f.close()
  776 +## debug('data read so far: %d bytes' % len(tmp_data))
  777 + raise IOError('incorrect OLE FAT, sector index out of range')
  778 + #TODO: merge this code with OleFileIO.getsect() ?
  779 + #TODO: check if this works with 4K sectors:
  780 + try:
  781 + fp.seek(offset + sectorsize * sect)
  782 + except:
  783 + debug('sect=%d, seek=%d, filesize=%d' %
  784 + (sect, offset+sectorsize*sect, filesize))
  785 + raise IOError('OLE sector index out of range')
  786 + sector_data = fp.read(sectorsize)
  787 + # [PL] check if there was enough data:
  788 + # Note: if sector is the last of the file, sometimes it is not a
  789 + # complete sector (of 512 or 4K), so we may read less than
  790 + # sectorsize.
  791 + if len(sector_data)!=sectorsize and sect!=(len(fat)-1):
  792 + debug('sect=%d / len(fat)=%d, seek=%d / filesize=%d, len read=%d' %
  793 + (sect, len(fat), offset+sectorsize*sect, filesize, len(sector_data)))
  794 + debug('seek+len(read)=%d' % (offset+sectorsize*sect+len(sector_data)))
  795 + raise IOError('incomplete OLE sector')
  796 + data.append(sector_data)
  797 + # jump to next sector in the FAT:
  798 + try:
  799 + sect = fat[sect] & 0xFFFFFFFF # JYTHON-WORKAROUND
  800 + except IndexError:
  801 + # [PL] if pointer is out of the FAT an exception is raised
  802 + raise IOError('incorrect OLE FAT, sector index out of range')
  803 + #[PL] Last sector should be an "end of chain" marker:
  804 + if sect != ENDOFCHAIN:
  805 + raise IOError('incorrect last sector index in OLE stream')
  806 + data = b"".join(data)
  807 + # Data is truncated to the actual stream size:
  808 + if len(data) >= size:
  809 + data = data[:size]
  810 + # actual stream size is stored for future use:
  811 + self.size = size
  812 + elif unknown_size:
  813 + # actual stream size was not known, now we know the size of read
  814 + # data:
  815 + self.size = len(data)
  816 + else:
  817 + # read data is less than expected:
  818 + debug('len(data)=%d, size=%d' % (len(data), size))
  819 + raise IOError('OLE stream size is less than declared')
  820 + # when all data is read in memory, BytesIO constructor is called
  821 + io.BytesIO.__init__(self, data)
  822 + # Then the _OleStream object can be used as a read-only file object.
  823 +
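Review note: the constructor above bounds the FAT walk with a fixed-length loop so that a looping chain cannot hang the parser. The idea can be sketched in isolation as below. This is a simplified illustration, not the module's code: `follow_chain` is a hypothetical helper that only collects sector indexes and reads no data.

```python
ENDOFCHAIN = 0xFFFFFFFE  # end-of-chain marker used in OLE FATs


def follow_chain(fat, first_sect, max_sectors):
    """Return the sector indexes of a chain, visiting at most max_sectors."""
    chain = []
    sect = first_sect
    # Bounded loop: a malformed FAT containing a cycle cannot spin forever.
    for _ in range(max_sectors):
        if sect == ENDOFCHAIN:
            break
        if sect < 0 or sect >= len(fat):
            raise IOError('sector index out of range')
        chain.append(sect)
        sect = fat[sect]
    if sect != ENDOFCHAIN:
        # Either a loop, or a chain longer than the allowed maximum.
        raise IOError('chain does not end within the allowed length')
    return chain


# Sectors 0 -> 1 -> 2 form a valid chain; sector 3 points to itself.
fat = [1, 2, ENDOFCHAIN, 3]
good = follow_chain(fat, 0, len(fat))
try:
    follow_chain(fat, 3, len(fat))
    loop_detected = False
except IOError:
    loop_detected = True
print(good, loop_detected)
```

Using `len(fat)` as the bound mirrors the "unknown size" case above, where the maximum stream size is guessed from the FAT length.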
  824 +
  825 +#--- _OleDirectoryEntry -------------------------------------------------------
  826 +
  827 +class _OleDirectoryEntry:
  828 +
  829 + """
  830 + OLE2 Directory Entry
  831 + """
  832 + #[PL] parsing code moved from OleFileIO.loaddirectory
  833 +
  834 + # struct to parse directory entries:
  835 + # <: little-endian byte order, standard sizes
  836 + # (note: this should guarantee that Q returns a 64-bit int)
  837 + # 64s: string containing entry name in unicode (max 31 chars) + null char
  838 + # H: uint16, number of bytes used in name buffer, including null = (len+1)*2
  839 + # B: uint8, dir entry type (between 0 and 5)
  840 + # B: uint8, color: 0=black, 1=red
  841 + # I: uint32, index of left child node in the red-black tree, NOSTREAM if none
  842 + # I: uint32, index of right child node in the red-black tree, NOSTREAM if none
  843 + # I: uint32, index of child root node if it is a storage, else NOSTREAM
  844 + # 16s: CLSID, unique identifier (only used if it is a storage)
  845 + # I: uint32, user flags
  846 + # Q (was 8s): uint64, creation timestamp or zero
  847 + # Q (was 8s): uint64, modification timestamp or zero
  848 + # I: uint32, SID of first sector if stream or ministream, SID of 1st sector
  849 + # of stream containing ministreams if root entry, 0 otherwise
  850 + # I: uint32, total stream size in bytes if stream (low 32 bits), 0 otherwise
  851 + # I: uint32, total stream size in bytes if stream (high 32 bits), 0 otherwise
  852 + STRUCT_DIRENTRY = '<64sHBBIII16sIQQIII'
  853 + # size of a directory entry: 128 bytes
  854 + DIRENTRY_SIZE = 128
  855 + assert struct.calcsize(STRUCT_DIRENTRY) == DIRENTRY_SIZE
  856 +
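Review note: the struct format documented above can be checked with a round trip. The sketch below packs a made-up root entry and unpacks it again; all field values are illustrative assumptions, and only the format string comes from the code above.

```python
import struct

STRUCT_DIRENTRY = '<64sHBBIII16sIQQIII'
NOSTREAM = 0xFFFFFFFF

# Entry name stored as UTF-16-LE plus a terminating null character.
name = 'Root Entry'.encode('utf-16-le') + b'\x00\x00'

entry = struct.pack(
    STRUCT_DIRENTRY,
    name,                          # 64s: struct pads the buffer with nulls
    len(name),                     # H: bytes used in name buffer, incl. null
    5,                             # B: entry type (5 = root)
    1,                             # B: color
    NOSTREAM, NOSTREAM, NOSTREAM,  # I I I: left, right, child
    b'\x00' * 16,                  # 16s: CLSID
    0,                             # I: user flags
    0, 0,                          # Q Q: creation / modification timestamps
    0,                             # I: first sector
    0, 0,                          # I I: size (low, high)
)

fields = struct.unpack(STRUCT_DIRENTRY, entry)
raw_name, namelength = fields[0], fields[1]
# Drop the trailing null character (2 bytes) before decoding.
decoded = raw_name[:namelength - 2].decode('utf-16-le')
print(len(entry), namelength, decoded)
```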
  857 +
  858 + def __init__(self, entry, sid, olefile):
  859 + """
  860 + Constructor for an _OleDirectoryEntry object.
  861 + Parses a 128-byte entry from the OLE Directory stream.
  862 +
  863 + :param entry : string (must be 128 bytes long)
  864 + :param sid : index of this directory entry in the OLE file directory
  865 + :param olefile: OleFileIO containing this directory entry
  866 + """
  867 + self.sid = sid
  868 + # ref to olefile is stored for future use
  869 + self.olefile = olefile
  870 + # kids is a list of children entries, if this entry is a storage:
  871 + # (list of _OleDirectoryEntry objects)
  872 + self.kids = []
  873 + # kids_dict is a dictionary of children entries, indexed by their
  874 + # name in lowercase: used to quickly find an entry, and to detect
  875 + # duplicates
  876 + self.kids_dict = {}
  877 + # flag used to detect if the entry is referenced more than once in
  878 + # directory:
  879 + self.used = False
  880 + # decode DirEntry
  881 + (
  882 + name,
  883 + namelength,
  884 + self.entry_type,
  885 + self.color,
  886 + self.sid_left,
  887 + self.sid_right,
  888 + self.sid_child,
  889 + clsid,
  890 + self.dwUserFlags,
  891 + self.createTime,
  892 + self.modifyTime,
  893 + self.isectStart,
  894 + sizeLow,
  895 + sizeHigh
  896 + ) = struct.unpack(_OleDirectoryEntry.STRUCT_DIRENTRY, entry)
  897 + if self.entry_type not in [STGTY_ROOT, STGTY_STORAGE, STGTY_STREAM, STGTY_EMPTY]:
  898 + olefile._raise_defect(DEFECT_INCORRECT, 'unhandled OLE storage type')
  899 + # only first directory entry can (and should) be root:
  900 + if self.entry_type == STGTY_ROOT and sid != 0:
  901 + olefile._raise_defect(DEFECT_INCORRECT, 'duplicate OLE root entry')
  902 + if sid == 0 and self.entry_type != STGTY_ROOT:
  903 + olefile._raise_defect(DEFECT_INCORRECT, 'incorrect OLE root entry')
  904 + #debug (struct.unpack(fmt_entry, entry[:len_entry]))
  905 + # name should be at most 31 unicode characters + null character,
  906 + # so 64 bytes in total (31*2 + 2):
  907 + if namelength>64:
  908 + olefile._raise_defect(DEFECT_INCORRECT, 'incorrect DirEntry name length')
  909 + # if exception not raised, namelength is set to the maximum value:
  910 + namelength = 64
  911 + # only characters without ending null char are kept:
  912 + name = name[:(namelength-2)]
  913 + # name is converted from unicode to Latin-1:
  914 + self.name = _unicode(name)
  915 +
  916 + debug('DirEntry SID=%d: %s' % (self.sid, repr(self.name)))
  917 + debug(' - type: %d' % self.entry_type)
  918 + debug(' - sect: %d' % self.isectStart)
  919 + debug(' - SID left: %d, right: %d, child: %d' % (self.sid_left,
  920 + self.sid_right, self.sid_child))
  921 +
  922 + # sizeHigh is only used for 4K sectors, it should be zero for 512-byte
  923 + # sectors, BUT apparently some implementations set it as 0xFFFFFFFF, 1
  924 + # or some other value so it cannot be raised as a defect in general:
  925 + if olefile.sectorsize == 512:
  926 + if sizeHigh != 0 and sizeHigh != 0xFFFFFFFF:
  927 + debug('sectorsize=%d, sizeLow=%d, sizeHigh=%d (%X)' %
  928 + (olefile.sectorsize, sizeLow, sizeHigh, sizeHigh))
  929 + olefile._raise_defect(DEFECT_UNSURE, 'incorrect OLE stream size')
  930 + self.size = sizeLow
  931 + else:
  932 + self.size = sizeLow + (long(sizeHigh)<<32)
  933 + debug(' - size: %d (sizeLow=%d, sizeHigh=%d)' % (self.size, sizeLow, sizeHigh))
  934 +
  935 + self.clsid = _clsid(clsid)
  936 + # a storage should have a null size, BUT some implementations such as
  937 + # Word 8 for Mac seem to allow non-null values => Potential defect:
  938 + if self.entry_type == STGTY_STORAGE and self.size != 0:
  939 + olefile._raise_defect(DEFECT_POTENTIAL, 'OLE storage with size>0')
  940 + # check if stream is not already referenced elsewhere:
  941 + if self.entry_type in (STGTY_ROOT, STGTY_STREAM) and self.size>0:
  942 + if self.size < olefile.minisectorcutoff \
  943 + and self.entry_type==STGTY_STREAM: # only streams can be in MiniFAT
  944 + # ministream object
  945 + minifat = True
  946 + else:
  947 + minifat = False
  948 + olefile._check_duplicate_stream(self.isectStart, minifat)
  949 +
  950 +
  951 +
  952 + def build_storage_tree(self):
  953 + """
  954 + Read and build the red-black tree attached to this _OleDirectoryEntry
  955 + object, if it is a storage.
  956 + Note that this method builds a tree of all subentries, so it should
  957 + only be called for the root object once.
  958 + """
  959 + debug('build_storage_tree: SID=%d - %s - sid_child=%d'
  960 + % (self.sid, repr(self.name), self.sid_child))
  961 + if self.sid_child != NOSTREAM:
  962 + # if child SID is not NOSTREAM, then this entry is a storage.
  963 + # Let's walk through the tree of children to fill the kids list:
  964 + self.append_kids(self.sid_child)
  965 +
  966 + # Note from OpenOffice documentation: the safest way is to
  967 + # recreate the tree because some implementations may store broken
  968 + # red-black trees...
  969 +
  970 + # in the OLE file, entries are sorted on (length, name).
  971 + # for convenience, we sort them on name instead:
  972 + # (see rich comparison methods in this class)
  973 + self.kids.sort()
  974 +
  975 +
  976 + def append_kids(self, child_sid):
  977 + """
  978 + Walk through red-black tree of children of this directory entry to add
  979 + all of them to the kids list. (recursive method)
  980 +
  981 + :param child_sid : index of child directory entry to use, or NOSTREAM if
  982 + there is none, in which case the method returns immediately.
  983 + """
  984 + #[PL] this method was added to use simple recursion instead of a complex
  985 + # algorithm.
  986 + # if this is not a storage or a leaf of the tree, nothing to do:
  987 + if child_sid == NOSTREAM:
  988 + return
  989 + # check if child SID is in the proper range:
  990 + if child_sid<0 or child_sid>=len(self.olefile.direntries):
  991 + self.olefile._raise_defect(DEFECT_FATAL, 'OLE DirEntry index out of range')
  992 + # get child direntry:
  993 + child = self.olefile._load_direntry(child_sid) #direntries[child_sid]
  994 + debug('append_kids: child_sid=%d - %s - sid_left=%d, sid_right=%d, sid_child=%d'
  995 + % (child.sid, repr(child.name), child.sid_left, child.sid_right, child.sid_child))
  996 + # the directory entries are organized as a red-black tree.
  997 + # (cf. Wikipedia for details)
  998 + # First walk through left side of the tree:
  999 + self.append_kids(child.sid_left)
  1000 + # Check if its name is not already used (case-insensitive):
  1001 + name_lower = child.name.lower()
  1002 + if name_lower in self.kids_dict:
  1003 + self.olefile._raise_defect(DEFECT_INCORRECT,
  1004 + "Duplicate filename in OLE storage")
  1005 + # Then the child_sid _OleDirectoryEntry object is appended to the
  1006 + # kids list and dictionary:
  1007 + self.kids.append(child)
  1008 + self.kids_dict[name_lower] = child
  1009 + # Check if kid was not already referenced in a storage:
  1010 + if child.used:
  1011 + self.olefile._raise_defect(DEFECT_INCORRECT,
  1012 + 'OLE Entry referenced more than once')
  1013 + child.used = True
  1014 + # Finally walk through right side of the tree:
  1015 + self.append_kids(child.sid_right)
  1016 + # Afterwards build kid's own tree if it's also a storage:
  1017 + child.build_storage_tree()
  1018 +
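Review note: append_kids() is an in-order walk (left subtree, node, right subtree) over the sibling tree. A minimal sketch of that walk on a toy directory follows. The `ENTRIES` table and the `collect` helper are hypothetical; unlike the method above, this sketch marks a node as seen before recursing, which is one possible way to also stop on cyclic references.

```python
NOSTREAM = 0xFFFFFFFF

# sid -> (name, sid_left, sid_right); hypothetical toy directory
ENTRIES = {
    0: ('b', 1, 2),
    1: ('a', NOSTREAM, NOSTREAM),
    2: ('c', NOSTREAM, NOSTREAM),
}


def collect(sid, entries, out, seen):
    """Append entry names to out, in order: left subtree, node, right subtree."""
    if sid == NOSTREAM:
        return
    if sid not in entries:
        raise IOError('DirEntry index out of range')
    if sid in seen:
        raise IOError('entry referenced more than once')
    seen.add(sid)
    name, left, right = entries[sid]
    collect(left, entries, out, seen)
    out.append(name)
    collect(right, entries, out, seen)


names = []
collect(0, ENTRIES, names, set())
print(names)
```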
  1019 +
  1020 + def __eq__(self, other):
  1021 + "Compare entries by name"
  1022 + return self.name == other.name
  1023 +
  1024 + def __lt__(self, other):
  1025 + "Compare entries by name"
  1026 + return self.name < other.name
  1027 +
  1028 + def __ne__(self, other):
  1029 + return not self.__eq__(other)
  1030 +
  1031 + def __le__(self, other):
  1032 + return self.__eq__(other) or self.__lt__(other)
  1033 +
  1034 + # Reflected __lt__() and __le__() will be used for __gt__() and __ge__()
  1035 +
  1036 + #TODO: replace by the same function as MS implementation ?
  1037 + # (order by name length first, then case-insensitive order)
  1038 +
  1039 +
  1040 + def dump(self, tab = 0):
  1041 + "Dump this entry, and all its subentries (for debug purposes only)"
  1042 + TYPES = ["(invalid)", "(storage)", "(stream)", "(lockbytes)",
  1043 + "(property)", "(root)"]
  1044 + print(" "*tab + repr(self.name), TYPES[self.entry_type], end=' ')
  1045 + if self.entry_type in (STGTY_STREAM, STGTY_ROOT):
  1046 + print(self.size, "bytes", end=' ')
  1047 + print()
  1048 + if self.entry_type in (STGTY_STORAGE, STGTY_ROOT) and self.clsid:
  1049 + print(" "*tab + "{%s}" % self.clsid)
  1050 +
  1051 + for kid in self.kids:
  1052 + kid.dump(tab + 2)
  1053 +
  1054 +
  1055 + def getmtime(self):
  1056 + """
  1057 + Return modification time of a directory entry.
  1058 +
  1059 + :returns: None if modification time is null, a python datetime object
  1060 + otherwise (UTC timezone)
  1061 +
  1062 + new in version 0.26
  1063 + """
  1064 + if self.modifyTime == 0:
  1065 + return None
  1066 + return filetime2datetime(self.modifyTime)
  1067 +
  1068 +
  1069 + def getctime(self):
  1070 + """
  1071 + Return creation time of a directory entry.
  1072 +
  1073 + :returns: None if creation time is null, a python datetime object
  1074 + otherwise (UTC timezone)
  1075 +
  1076 + new in version 0.26
  1077 + """
  1078 + if self.createTime == 0:
  1079 + return None
  1080 + return filetime2datetime(self.createTime)
  1081 +
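Review note: getmtime() and getctime() delegate to filetime2datetime(), which is defined elsewhere in the module and not shown in this part of the diff. As a rough sketch of what such a conversion involves, assuming the stored value is a Windows FILETIME (a count of 100-nanosecond intervals since 1601-01-01 UTC), a hypothetical stand-in could look like this:

```python
import datetime

# FILETIME epoch: 1601-01-01 00:00:00 (UTC)
FILETIME_EPOCH = datetime.datetime(1601, 1, 1, 0, 0, 0)


def filetime_to_datetime(filetime):
    """Convert a FILETIME integer to a naive datetime (UTC), or None if zero.

    Hypothetical helper for illustration; not the module's own function.
    """
    if filetime == 0:
        return None
    # 10 intervals of 100 ns make one microsecond.
    return FILETIME_EPOCH + datetime.timedelta(microseconds=filetime // 10)


unix_epoch = filetime_to_datetime(116444736000000000)
print(unix_epoch, filetime_to_datetime(0))
```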
  1082 +
  1083 +#--- OleFileIO ----------------------------------------------------------------
  1084 +
  1085 +class OleFileIO:
  1086 + """
  1087 + OLE container object
  1088 +
  1089 + This class encapsulates the interface to an OLE 2 structured
  1090 + storage file. Use the listdir and openstream methods to
  1091 + access the contents of this file.
  1092 +
  1093 + Object names are given as a list of strings, one for each subentry
  1094 + level. The root entry should be omitted. For example, the following
  1095 + code extracts all image streams from a Microsoft Image Composer file::
  1096 +
  1097 + ole = OleFileIO("fan.mic")
  1098 +
  1099 + for entry in ole.listdir():
  1100 + if entry[1:2] == ["Image"]:
  1101 + fin = ole.openstream(entry)
  1102 + fout = open(entry[0], "wb")
  1103 + while True:
  1104 + s = fin.read(8192)
  1105 + if not s:
  1106 + break
  1107 + fout.write(s)
  1108 +
  1109 + You can use the viewer application provided with the Python Imaging
  1110 + Library to view the resulting files (which happen to be standard
  1111 + TIFF files).
  1112 + """
  1113 +
  1114 + def __init__(self, filename=None, raise_defects=DEFECT_FATAL,
  1115 + write_mode=False, debug=False):
  1116 + """
  1117 + Constructor for the OleFileIO class.
  1118 +
  1119 + :param filename: file to open.
  1120 +
  1121 + - if filename is a string smaller than 1536 bytes, it is the path
  1122 + of the file to open. (bytes or unicode string)
  1123 + - if filename is a string longer than 1535 bytes, it is parsed
  1124 + as the content of an OLE file in memory. (bytes type only)
  1125 + - if filename is a file-like object (with read, seek and tell methods),
  1126 + it is parsed as-is.
  1127 +
  1128 + :param raise_defects: minimal level for defects to be raised as exceptions.
  1129 + (use DEFECT_FATAL for a typical application, DEFECT_INCORRECT for a
  1130 + security-oriented application, see source code for details)
  1131 +
  1132 + :param write_mode: bool, if True the file is opened in read/write mode instead
  1133 + of read-only by default.
  1134 +
  1135 + :param debug: bool, set debug mode
  1136 + """
  1137 + set_debug_mode(debug)
  1138 + # minimal level for defects to be raised as exceptions:
  1139 + self._raise_defects_level = raise_defects
  1140 + # list of defects/issues not raised as exceptions:
  1141 + # tuples of (exception type, message)
  1142 + self.parsing_issues = []
  1143 + self.write_mode = write_mode
  1144 + self._filesize = None
  1145 + self.fp = None
  1146 + if filename:
  1147 + self.open(filename, write_mode=write_mode)
  1148 +
  1149 +
  1150 + def _raise_defect(self, defect_level, message, exception_type=IOError):
  1151 + """
  1152 + This method should be called for any defect found during file parsing.
  1153 + It may raise an IOError exception according to the minimal level chosen
  1154 + for the OleFileIO object.
  1155 +
  1156 + :param defect_level: defect level, possible values are:
  1157 +
  1158 + - DEFECT_UNSURE : a case which looks weird, but not sure it's a defect
  1159 + - DEFECT_POTENTIAL : a potential defect
  1160 + - DEFECT_INCORRECT : an error according to specifications, but parsing can go on
  1161 + - DEFECT_FATAL : an error which cannot be ignored, parsing is impossible
  1162 +
  1163 + :param message: string describing the defect, used with raised exception.
  1164 + :param exception_type: exception class to be raised, IOError by default
  1165 + """
  1166 + # added by [PL]
  1167 + if defect_level >= self._raise_defects_level:
  1168 + raise exception_type(message)
  1169 + else:
  1170 + # just record the issue, no exception raised:
  1171 + self.parsing_issues.append((exception_type, message))
  1172 +
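Review note: the threshold logic of _raise_defect() can be illustrated on its own. In the sketch below the numeric values of the DEFECT_* constants are assumptions (only their ordering matters here), and `DefectPolicy` is a hypothetical stand-in for the relevant part of OleFileIO.

```python
# Assumed numeric levels; only their relative order matters in this sketch.
DEFECT_UNSURE, DEFECT_POTENTIAL, DEFECT_INCORRECT, DEFECT_FATAL = 10, 20, 30, 40


class DefectPolicy(object):
    """Raise defects at or above raise_level, record the others."""

    def __init__(self, raise_level=DEFECT_FATAL):
        self.raise_level = raise_level
        self.parsing_issues = []

    def report(self, level, message, exception_type=IOError):
        if level >= self.raise_level:
            raise exception_type(message)
        # below the threshold: just record the issue
        self.parsing_issues.append((exception_type, message))


# Typical application: only fatal defects raise.
lenient = DefectPolicy()
lenient.report(DEFECT_INCORRECT, 'incorrect CLSID in OLE header')

# Security-oriented application: specification violations raise too.
strict = DefectPolicy(raise_level=DEFECT_INCORRECT)
try:
    strict.report(DEFECT_INCORRECT, 'incorrect CLSID in OLE header')
    raised = False
except IOError:
    raised = True
print(lenient.parsing_issues, raised)
```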
  1173 +
  1174 + def open(self, filename, write_mode=False):
  1175 + """
  1176 + Open an OLE2 file in read-only or read/write mode.
  1177 + Read and parse the header, FAT and directory.
  1178 +
  1179 + :param filename: string-like or file-like object, OLE file to parse
  1180 +
  1181 + - if filename is a string smaller than 1536 bytes, it is the path
  1182 + of the file to open. (bytes or unicode string)
  1183 + - if filename is a string longer than 1535 bytes, it is parsed
  1184 + as the content of an OLE file in memory. (bytes type only)
  1185 + - if filename is a file-like object (with read, seek and tell methods),
  1186 + it is parsed as-is.
  1187 +
  1188 + :param write_mode: bool, if True the file is opened in read/write mode instead
  1189 + of read-only by default. (ignored if filename is not a path)
  1190 + """
  1191 + self.write_mode = write_mode
  1192 + #[PL] check if filename is a string-like or file-like object:
  1193 + # (it is better to check for a read() method)
  1194 + if hasattr(filename, 'read'):
  1195 + #TODO: also check seek and tell methods?
  1196 + # file-like object: use it directly
  1197 + self.fp = filename
  1198 + elif isinstance(filename, bytes) and len(filename) >= MINIMAL_OLEFILE_SIZE:
  1199 + # filename is a bytes string containing the OLE file to be parsed:
  1200 + # convert it to BytesIO
  1201 + self.fp = io.BytesIO(filename)
  1202 + else:
  1203 + # string-like object: filename of file on disk
  1204 + if self.write_mode:
  1205 + # open file in mode 'read with update, binary'
  1206 + # According to https://docs.python.org/2/library/functions.html#open
  1207 + # 'w' would truncate the file, 'a' may only append on some Unixes
  1208 + mode = 'r+b'
  1209 + else:
  1210 + # read-only mode by default
  1211 + mode = 'rb'
  1212 + self.fp = open(filename, mode)
  1213 + # obtain the filesize by using seek and tell, which should work on most
  1214 + # file-like objects:
  1215 + #TODO: do it above, using getsize with filename when possible?
  1216 + #TODO: fix code to fail with clear exception when filesize cannot be obtained
  1217 + filesize=0
  1218 + self.fp.seek(0, os.SEEK_END)
  1219 + try:
  1220 + filesize = self.fp.tell()
  1221 + finally:
  1222 + self.fp.seek(0)
  1223 + self._filesize = filesize
  1224 +
  1225 + # lists of streams in FAT and MiniFAT, to detect duplicate references
  1226 + # (list of indexes of first sectors of each stream)
  1227 + self._used_streams_fat = []
  1228 + self._used_streams_minifat = []
  1229 +
  1230 + header = self.fp.read(512)
  1231 +
  1232 + if len(header) != 512 or header[:8] != MAGIC:
  1233 + self._raise_defect(DEFECT_FATAL, "not an OLE2 structured storage file")
  1234 +
  1235 + # [PL] header structure according to AAF specifications:
  1236 + ##Header
  1237 + ##struct StructuredStorageHeader { // [offset from start (bytes), length (bytes)]
  1238 + ##BYTE _abSig[8]; // [00H,08] {0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1,
  1239 + ## // 0x1a, 0xe1} for current version
  1240 + ##CLSID _clsid; // [08H,16] reserved must be zero (WriteClassStg/
  1241 + ## // GetClassFile uses root directory class id)
  1242 + ##USHORT _uMinorVersion; // [18H,02] minor version of the format: 33 is
  1243 + ## // written by reference implementation
  1244 + ##USHORT _uDllVersion; // [1AH,02] major version of the dll/format: 3 for
  1245 + ## // 512-byte sectors, 4 for 4 KB sectors
  1246 + ##USHORT _uByteOrder; // [1CH,02] 0xFFFE: indicates Intel byte-ordering
  1247 + ##USHORT _uSectorShift; // [1EH,02] size of sectors in power-of-two;
  1248 + ## // typically 9 indicating 512-byte sectors
  1249 + ##USHORT _uMiniSectorShift; // [20H,02] size of mini-sectors in power-of-two;
  1250 + ## // typically 6 indicating 64-byte mini-sectors
  1251 + ##USHORT _usReserved; // [22H,02] reserved, must be zero
  1252 + ##ULONG _ulReserved1; // [24H,04] reserved, must be zero
  1253 + ##FSINDEX _csectDir; // [28H,04] must be zero for 512-byte sectors,
  1254 + ## // number of SECTs in directory chain for 4 KB
  1255 + ## // sectors
  1256 + ##FSINDEX _csectFat; // [2CH,04] number of SECTs in the FAT chain
  1257 + ##SECT _sectDirStart; // [30H,04] first SECT in the directory chain
  1258 + ##DFSIGNATURE _signature; // [34H,04] signature used for transactions; must
  1259 + ## // be zero. The reference implementation
  1260 + ## // does not support transactions
  1261 + ##ULONG _ulMiniSectorCutoff; // [38H,04] maximum size for a mini stream;
  1262 + ## // typically 4096 bytes
  1263 + ##SECT _sectMiniFatStart; // [3CH,04] first SECT in the MiniFAT chain
  1264 + ##FSINDEX _csectMiniFat; // [40H,04] number of SECTs in the MiniFAT chain
  1265 + ##SECT _sectDifStart; // [44H,04] first SECT in the DIFAT chain
  1266 + ##FSINDEX _csectDif; // [48H,04] number of SECTs in the DIFAT chain
  1267 + ##SECT _sectFat[109]; // [4CH,436] the SECTs of first 109 FAT sectors
  1268 + ##};
  1269 +
  1270 + # [PL] header decoding:
  1271 + # '<' indicates little-endian byte ordering for Intel (cf. struct module help)
  1272 + fmt_header = '<8s16sHHHHHHLLLLLLLLLL'
  1273 + header_size = struct.calcsize(fmt_header)
  1274 + debug( "fmt_header size = %d, +FAT = %d" % (header_size, header_size + 109*4) )
  1275 + header1 = header[:header_size]
  1276 + (
  1277 + self.Sig,
  1278 + self.clsid,
  1279 + self.MinorVersion,
  1280 + self.DllVersion,
  1281 + self.ByteOrder,
  1282 + self.SectorShift,
  1283 + self.MiniSectorShift,
  1284 + self.Reserved, self.Reserved1,
  1285 + self.csectDir,
  1286 + self.csectFat,
  1287 + self.sectDirStart,
  1288 + self.signature,
  1289 + self.MiniSectorCutoff,
  1290 + self.MiniFatStart,
  1291 + self.csectMiniFat,
  1292 + self.sectDifStart,
  1293 + self.csectDif
  1294 + ) = struct.unpack(fmt_header, header1)
  1295 + debug( struct.unpack(fmt_header, header1))
  1296 +
  1297 + if self.Sig != MAGIC:
  1298 + # OLE signature should always be present
  1299 + self._raise_defect(DEFECT_FATAL, "incorrect OLE signature")
  1300 + if self.clsid != bytearray(16):
  1301 + # according to AAF specs, CLSID should always be zero
  1302 + self._raise_defect(DEFECT_INCORRECT, "incorrect CLSID in OLE header")
  1303 + debug( "MinorVersion = %d" % self.MinorVersion )
  1304 + debug( "DllVersion = %d" % self.DllVersion )
  1305 + if self.DllVersion not in [3, 4]:
  1306 + # version 3: usual format, 512 bytes per sector
  1307 + # version 4: large format, 4K per sector
  1308 + self._raise_defect(DEFECT_INCORRECT, "incorrect DllVersion in OLE header")
  1309 + debug( "ByteOrder = %X" % self.ByteOrder )
  1310 + if self.ByteOrder != 0xFFFE:
  1311 + # For now only common little-endian documents are handled correctly
  1312 + self._raise_defect(DEFECT_FATAL, "incorrect ByteOrder in OLE header")
  1313 + # TODO: add big-endian support for documents created on Mac ?
  1314 + # But according to [MS-CFB] (v20140502), ByteOrder MUST be 0xFFFE.
  1315 + self.SectorSize = 2**self.SectorShift
  1316 + debug( "SectorSize = %d" % self.SectorSize )
  1317 + if self.SectorSize not in [512, 4096]:
  1318 + self._raise_defect(DEFECT_INCORRECT, "incorrect SectorSize in OLE header")
  1319 + if (self.DllVersion==3 and self.SectorSize!=512) \
  1320 + or (self.DllVersion==4 and self.SectorSize!=4096):
  1321 + self._raise_defect(DEFECT_INCORRECT, "SectorSize does not match DllVersion in OLE header")
  1322 + self.MiniSectorSize = 2**self.MiniSectorShift
  1323 + debug( "MiniSectorSize = %d" % self.MiniSectorSize )
  1324 + if self.MiniSectorSize not in [64]:
  1325 + self._raise_defect(DEFECT_INCORRECT, "incorrect MiniSectorSize in OLE header")
  1326 + if self.Reserved != 0 or self.Reserved1 != 0:
  1327 + self._raise_defect(DEFECT_INCORRECT, "incorrect OLE header (non-null reserved bytes)")
  1328 + debug( "csectDir = %d" % self.csectDir )
  1329 + # Number of directory sectors (only allowed if DllVersion != 3)
  1330 + if self.SectorSize==512 and self.csectDir!=0:
  1331 + self._raise_defect(DEFECT_INCORRECT, "incorrect csectDir in OLE header")
  1332 + debug( "csectFat = %d" % self.csectFat )
  1333 + # csectFat = number of FAT sectors in the file
  1334 + debug( "sectDirStart = %X" % self.sectDirStart )
  1335 + # sectDirStart = 1st sector containing the directory
  1336 + debug( "signature = %d" % self.signature )
  1337 + # Signature should be zero, BUT some implementations do not follow this
  1338 + # rule => only a potential defect:
  1339 + # (according to MS-CFB, may be != 0 for applications supporting file
  1340 + # transactions)
  1341 + if self.signature != 0:
  1342 + self._raise_defect(DEFECT_POTENTIAL, "incorrect OLE header (signature>0)")
  1343 + debug( "MiniSectorCutoff = %d" % self.MiniSectorCutoff )
  1344 + # MS-CFB: This integer field MUST be set to 0x00001000. This field
  1345 + # specifies the maximum size of a user-defined data stream allocated
  1346 + # from the mini FAT and mini stream, and that cutoff is 4096 bytes.
  1347 + # Any user-defined data stream larger than or equal to this cutoff size
  1348 + # must be allocated as normal sectors from the FAT.
  1349 + if self.MiniSectorCutoff != 0x1000:
  1350 + self._raise_defect(DEFECT_INCORRECT, "incorrect MiniSectorCutoff in OLE header")
  1351 + debug( "MiniFatStart = %X" % self.MiniFatStart )
  1352 + debug( "csectMiniFat = %d" % self.csectMiniFat )
  1353 + debug( "sectDifStart = %X" % self.sectDifStart )
  1354 + debug( "csectDif = %d" % self.csectDif )
  1355 +
  1356 + # calculate the number of sectors in the file
  1357 + # (-1 because header doesn't count)
  1358 + self.nb_sect = ( (filesize + self.SectorSize-1) // self.SectorSize) - 1
  1359 + debug( "Number of sectors in the file: %d" % self.nb_sect )
  1360 + #TODO: change this test, because an OLE file MAY contain other data
  1361 + # after the last sector.
  1362 +
  1363 + # file clsid
  1364 + self.clsid = _clsid(header[8:24])
  1365 +
  1366 + #TODO: remove redundant attributes, and fix the code which uses them?
  1367 + self.sectorsize = self.SectorSize #1 << i16(header, 30)
  1368 + self.minisectorsize = self.MiniSectorSize #1 << i16(header, 32)
  1369 + self.minisectorcutoff = self.MiniSectorCutoff # i32(header, 56)
  1370 +
  1371 + # check known streams for duplicate references (these are always in FAT,
  1372 + # never in MiniFAT):
  1373 + self._check_duplicate_stream(self.sectDirStart)
  1374 + # check MiniFAT only if it is not empty:
  1375 + if self.csectMiniFat:
  1376 + self._check_duplicate_stream(self.MiniFatStart)
  1377 + # check DIFAT only if it is not empty:
  1378 + if self.csectDif:
  1379 + self._check_duplicate_stream(self.sectDifStart)
  1380 +
  1381 + # Load file allocation tables
  1382 + self.loadfat(header)
  1383 + # Load directory. This sets both the direntries list (ordered by sid)
  1384 + # and the root (ordered by hierarchy) members.
  1385 + self.loaddirectory(self.sectDirStart)#i32(header, 48))
  1386 + self.ministream = None
  1387 + self.minifatsect = self.MiniFatStart #i32(header, 60)
  1388 +
  1389 +
  1390 + def close(self):
  1391 + """
  1392 + close the OLE file, to release the file object
  1393 + """
  1394 + self.fp.close()
  1395 +
  1396 +
  1397 + def _check_duplicate_stream(self, first_sect, minifat=False):
  1398 + """
  1399 + Checks if a stream has not been already referenced elsewhere.
  1400 + This method should only be called once for each known stream, and only
  1401 + if stream size is not null.
  1402 +
  1403 + :param first_sect: int, index of first sector of the stream in FAT
  1404 + :param minifat: bool, if True, stream is located in the MiniFAT, else in the FAT
  1405 + """
  1406 + if minifat:
  1407 + debug('_check_duplicate_stream: sect=%d in MiniFAT' % first_sect)
  1408 + used_streams = self._used_streams_minifat
  1409 + else:
  1410 + debug('_check_duplicate_stream: sect=%d in FAT' % first_sect)
  1411 + # some values can be safely ignored (not a real stream):
  1412 + if first_sect in (DIFSECT,FATSECT,ENDOFCHAIN,FREESECT):
  1413 + return
  1414 + used_streams = self._used_streams_fat
  1415 + #TODO: would it be more efficient using a dict or hash values, instead
  1416 + # of a list of long ?
  1417 + if first_sect in used_streams:
  1418 + self._raise_defect(DEFECT_INCORRECT, 'Stream referenced twice')
  1419 + else:
  1420 + used_streams.append(first_sect)
  1421 +
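Review note: the TODO in `_check_duplicate_stream` asks whether a hash-based container would be more efficient than a list. The following is a minimal sketch of that idea for the FAT case only, assuming a plain `set` stands in for `_used_streams_fat`. The helper name `check_duplicate` and the use of `ValueError` are illustrative assumptions, not part of olefile.

```python
# Special FAT values defined by MS-CFB; they never start a real stream.
DIFSECT = 0xFFFFFFFC
FATSECT = 0xFFFFFFFD
ENDOFCHAIN = 0xFFFFFFFE
FREESECT = 0xFFFFFFFF


def check_duplicate(first_sect, used_streams):
    """Record first_sect in used_streams (a set); raise if already seen.

    Hypothetical stand-in for _check_duplicate_stream: membership tests on
    a set take constant time on average, versus linear time for a list.
    """
    if first_sect in (DIFSECT, FATSECT, ENDOFCHAIN, FREESECT):
        return  # not a real stream, nothing to record
    if first_sect in used_streams:
        raise ValueError("Stream referenced twice")
    used_streams.add(first_sect)
```

Whether the change is worth making depends on how many streams a typical file contains; for small directories the list is probably adequate.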
  1422 +
  1423 + def dumpfat(self, fat, firstindex=0):
  1424 + "Displays a part of the FAT in human-readable form, for debugging purposes."
  1425 + # [PL] added only for debug
  1426 + if not DEBUG_MODE:
  1427 + return
  1428 + # dictionary to convert special FAT values in human-readable strings
  1429 + VPL = 8 # values per line (8+1 * 8+1 = 81)
  1430 + fatnames = {
  1431 + FREESECT: "..free..",
  1432 + ENDOFCHAIN: "[ END. ]",
  1433 + FATSECT: "FATSECT ",
  1434 + DIFSECT: "DIFSECT "
  1435 + }
  1436 + nbsect = len(fat)
  1437 + nlines = (nbsect+VPL-1)//VPL
  1438 + print("index", end=" ")
  1439 + for i in range(VPL):
  1440 + print("%8X" % i, end=" ")
  1441 + print()
  1442 + for l in range(nlines):
  1443 + index = l*VPL
  1444 + print("%8X:" % (firstindex+index), end=" ")
  1445 + for i in range(index, index+VPL):
  1446 + if i>=nbsect:
  1447 + break
  1448 + sect = fat[i]
  1449 + aux = sect & 0xFFFFFFFF # JYTHON-WORKAROUND
  1450 + if aux in fatnames:
  1451 + name = fatnames[aux]
  1452 + else:
  1453 + if sect == i+1:
  1454 + name = " --->"
  1455 + else:
  1456 + name = "%8X" % sect
  1457 + print(name, end=" ")
  1458 + print()
  1459 +
  1460 +
  1461 + def dumpsect(self, sector, firstindex=0):
  1462 + "Displays a sector in a human-readable form, for debugging purpose."
  1463 + if not DEBUG_MODE:
  1464 + return
  1465 + VPL=8 # number of values per line (8+1 * 8+1 = 81)
  1466 + tab = array.array(UINT32, sector)
  1467 + if sys.byteorder == 'big':
  1468 + tab.byteswap()
  1469 + nbsect = len(tab)
  1470 + nlines = (nbsect+VPL-1)//VPL
  1471 + print("index", end=" ")
  1472 + for i in range(VPL):
  1473 + print("%8X" % i, end=" ")
  1474 + print()
  1475 + for l in range(nlines):
  1476 + index = l*VPL
  1477 + print("%8X:" % (firstindex+index), end=" ")
  1478 + for i in range(index, index+VPL):
  1479 + if i>=nbsect:
  1480 + break
  1481 + sect = tab[i]
  1482 + name = "%8X" % sect
  1483 + print(name, end=" ")
  1484 + print()
  1485 +
  1486 + def sect2array(self, sect):
  1487 + """
  1488 + convert a sector to an array of 32 bits unsigned integers,
  1489 + swapping bytes on big endian CPUs such as PowerPC (old Macs)
  1490 + """
  1491 + a = array.array(UINT32, sect)
  1492 + # if CPU is big endian, swap bytes:
  1493 + if sys.byteorder == 'big':
  1494 + a.byteswap()
  1495 + return a
  1496 +
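Review note: `sect2array` relies on the module-level `UINT32` typecode, which is defined outside this hunk. A self-contained sketch of the same conversion, assuming Python 3 and a hypothetical helper `uint32_typecode` that picks whichever `array` typecode is 4 bytes wide on the current platform:

```python
import array
import struct
import sys


def uint32_typecode():
    """Return an array typecode whose items are exactly 4 bytes (hypothetical helper)."""
    for code in ("I", "L"):
        if array.array(code).itemsize == 4:
            return code
    raise ValueError("no 4-byte unsigned typecode on this platform")


def sect_to_uint32(raw):
    """Parse raw sector bytes as little-endian unsigned 32-bit integers."""
    a = array.array(uint32_typecode(), raw)
    # array uses native byte order; OLE files are little-endian,
    # so swap on big-endian CPUs.
    if sys.byteorder == "big":
        a.byteswap()
    return a


# Usage: four little-endian values, including two special FAT markers.
raw = struct.pack("<4I", 1, 2, 0xFFFFFFFE, 0xFFFFFFFF)
values = list(sect_to_uint32(raw))
```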
  1497 +
  1498 + def loadfat_sect(self, sect):
  1499 + """
  1500 + Adds the indexes of the given sector to the FAT
  1501 +
  1502 + :param sect: string containing the first FAT sector, or array of long integers
  1503 + :returns: index of last FAT sector.
  1504 + """
  1505 + # a FAT sector is an array of ulong integers.
  1506 + if isinstance(sect, array.array):
  1507 + # if sect is already an array it is directly used
  1508 + fat1 = sect
  1509 + else:
  1510 + # if it's a raw sector, it is parsed in an array
  1511 + fat1 = self.sect2array(sect)
  1512 + self.dumpsect(sect)
  1513 + # The FAT is a sector chain starting at the first index of itself.
  1514 + for isect in fat1:
  1515 + isect = isect & 0xFFFFFFFF # JYTHON-WORKAROUND
  1516 + debug("isect = %X" % isect)
  1517 + if isect == ENDOFCHAIN or isect == FREESECT:
  1518 + # the end of the sector chain has been reached
  1519 + debug("found end of sector chain")
  1520 + break
  1521 + # read the FAT sector
  1522 + s = self.getsect(isect)
  1523 + # parse it as an array of 32 bits integers, and add it to the
  1524 + # global FAT array
  1525 + nextfat = self.sect2array(s)
  1526 + self.fat = self.fat + nextfat
  1527 + return isect
  1528 +
  1529 +
  1530 + def loadfat(self, header):
  1531 + """
  1532 + Load the FAT table.
  1533 + """
  1534 + # The 1st sector of the file contains sector numbers for the first 109
  1535 + # FAT sectors, right after the header which is 76 bytes long.
  1536 + # (always 109, whatever the sector size: 512 bytes = 76+4*109)
  1537 + # Additional sectors are described by DIF blocks
  1538 +
  1539 + sect = header[76:512]
  1540 + debug( "len(sect)=%d, so %d integers" % (len(sect), len(sect)//4) )
  1541 + #fat = []
  1542 + # [PL] FAT is an array of 32 bits unsigned ints, it's more effective
  1543 + # to use an array than a list in Python.
  1544 + # It's initialized as empty first:
  1545 + self.fat = array.array(UINT32)
  1546 + self.loadfat_sect(sect)
  1547 + #self.dumpfat(self.fat)
  1548 +## for i in range(0, len(sect), 4):
  1549 +## ix = i32(sect, i)
  1550 +## #[PL] if ix == -2 or ix == -1: # ix == 0xFFFFFFFE or ix == 0xFFFFFFFF:
  1551 +## if ix == 0xFFFFFFFE or ix == 0xFFFFFFFF:
  1552 +## break
  1553 +## s = self.getsect(ix)
  1554 +## #fat = fat + [i32(s, i) for i in range(0, len(s), 4)]
  1555 +## fat = fat + array.array(UINT32, s)
  1556 + if self.csectDif != 0:
  1557 + # [PL] There's a DIFAT because file is larger than 6.8MB
  1558 + # some checks just in case:
  1559 + if self.csectFat <= 109:
  1560 + # there must be at least 109 blocks in header and the rest in
  1561 + # DIFAT, so number of sectors must be >109.
  1562 + self._raise_defect(DEFECT_INCORRECT, 'incorrect DIFAT, not enough sectors')
  1563 + if self.sectDifStart >= self.nb_sect:
  1564 + # initial DIFAT block index must be valid
  1565 + self._raise_defect(DEFECT_FATAL, 'incorrect DIFAT, first index out of range')
  1566 + debug( "DIFAT analysis..." )
  1567 + # We compute the necessary number of DIFAT sectors :
  1568 + # Number of pointers per DIFAT sector = (sectorsize/4)-1
  1569 + # (-1 because the last pointer is the next DIFAT sector number)
  1570 + nb_difat_sectors = (self.sectorsize//4)-1
  1571 + # (if 512 bytes: each DIFAT sector = 127 pointers + 1 towards next DIFAT sector)
  1572 + nb_difat = (self.csectFat-109 + nb_difat_sectors-1)//nb_difat_sectors
  1573 + debug( "nb_difat = %d" % nb_difat )
  1574 + if self.csectDif != nb_difat:
  1575 + raise IOError('incorrect DIFAT')
  1576 + isect_difat = self.sectDifStart
  1577 + for i in iterrange(nb_difat):
  1578 + debug( "DIFAT block %d, sector %X" % (i, isect_difat) )
  1579 + #TODO: check if corresponding FAT SID = DIFSECT
  1580 + sector_difat = self.getsect(isect_difat)
  1581 + difat = self.sect2array(sector_difat)
  1582 + self.dumpsect(sector_difat)
  1583 + self.loadfat_sect(difat[:nb_difat_sectors])
  1584 + # last DIFAT pointer is next DIFAT sector:
  1585 + isect_difat = difat[nb_difat_sectors]
  1586 + debug( "next DIFAT sector: %X" % isect_difat )
  1587 + # checks:
  1588 + if isect_difat not in [ENDOFCHAIN, FREESECT]:
  1589 + # last DIFAT pointer value must be ENDOFCHAIN or FREESECT
  1590 + raise IOError('incorrect end of DIFAT')
  1591 +## if len(self.fat) != self.csectFat:
  1592 +## # FAT should contain csectFat blocks
  1593 +## print("FAT length: %d instead of %d" % (len(self.fat), self.csectFat))
  1594 +## raise IOError('incorrect DIFAT')
  1595 + # since FAT is read from fixed-size sectors, it may contain more values
  1596 + # than the actual number of sectors in the file.
  1597 + # Keep only the relevant sector indexes:
  1598 + if len(self.fat) > self.nb_sect:
  1599 + debug('len(fat)=%d, shrunk to nb_sect=%d' % (len(self.fat), self.nb_sect))
  1600 + self.fat = self.fat[:self.nb_sect]
  1601 + debug('\nFAT:')
  1602 + self.dumpfat(self.fat)
  1603 +
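Review note: the DIFAT sanity check in `loadfat` hinges on one piece of arithmetic. The header holds the first 109 FAT sector indexes, and each DIFAT sector holds `sectorsize // 4 - 1` more, because its last slot points to the next DIFAT sector. A small sketch of that calculation (the function name `difat_sectors_needed` is an assumption for illustration):

```python
def difat_sectors_needed(csect_fat, sector_size=512, header_slots=109):
    """Number of DIFAT sectors required to index csect_fat FAT sectors."""
    # Last 32-bit slot of each DIFAT sector is the link to the next one.
    pointers_per_sector = sector_size // 4 - 1
    extra = max(csect_fat - header_slots, 0)
    # Ceiling division without floating point.
    return (extra + pointers_per_sector - 1) // pointers_per_sector
```

With 512-byte sectors this gives 0 DIFAT sectors up to 109 FAT sectors, 1 for 110 through 236, and 2 from 237 onward, which is the value `loadfat` compares against `csectDif`.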
  1604 +
  1605 + def loadminifat(self):
  1606 + """
  1607 + Load the MiniFAT table.
  1608 + """
  1609 + # MiniFAT is stored in a standard sub-stream, pointed to by a header
  1610 + # field.
  1611 + # NOTE: there are two sizes to take into account for this stream:
  1612 + # 1) Stream size is calculated according to the number of sectors
  1613 + # declared in the OLE header. This allocated stream may be more than
  1614 + # needed to store the actual sector indexes.
  1615 + # (self.csectMiniFat is the number of sectors of size self.SectorSize)
  1616 + stream_size = self.csectMiniFat * self.SectorSize
  1617 + # 2) Actually used size is calculated by dividing the MiniStream size
  1618 + # (given by root entry size) by the size of mini sectors, *4 for
  1619 + # 32 bits indexes:
  1620 + nb_minisectors = (self.root.size + self.MiniSectorSize-1) // self.MiniSectorSize
  1621 + used_size = nb_minisectors * 4
  1622 + debug('loadminifat(): minifatsect=%d, nb FAT sectors=%d, used_size=%d, stream_size=%d, nb MiniSectors=%d' %
  1623 + (self.minifatsect, self.csectMiniFat, used_size, stream_size, nb_minisectors))
  1624 + if used_size > stream_size:
  1625 + # This is not really a problem, but may indicate a wrong implementation:
  1626 + self._raise_defect(DEFECT_INCORRECT, 'OLE MiniStream is larger than MiniFAT')
  1627 + # In any case, first read stream_size:
  1628 + s = self._open(self.minifatsect, stream_size, force_FAT=True).read()
  1629 + #[PL] Old code replaced by an array:
  1630 + #self.minifat = [i32(s, i) for i in range(0, len(s), 4)]
  1631 + self.minifat = self.sect2array(s)
  1632 + # Then shrink the array to used size, to avoid indexes out of MiniStream:
  1633 + debug('MiniFAT shrunk from %d to %d sectors' % (len(self.minifat), nb_minisectors))
  1634 + self.minifat = self.minifat[:nb_minisectors]
  1635 + debug('loadminifat(): len=%d' % len(self.minifat))
  1636 + debug('\nMiniFAT:')
  1637 + self.dumpfat(self.minifat)
  1638 +
  1639 + def getsect(self, sect):
  1640 + """
  1641 + Read given sector from file on disk.
  1642 +
  1643 + :param sect: int, sector index
  1644 + :returns: a string containing the sector data.
  1645 + """
  1646 + # From [MS-CFB]: A sector number can be converted into a byte offset
  1647 + # into the file by using the following formula:
  1648 + # (sector number + 1) x Sector Size.
  1649 + # This implies that sector #0 of the file begins at byte offset Sector
  1650 + # Size, not at 0.
  1651 +
  1652 + # [PL] the original code in PIL was wrong when sectors are 4KB instead of
  1653 + # 512 bytes:
  1654 + #self.fp.seek(512 + self.sectorsize * sect)
  1655 + #[PL]: added safety checks:
  1656 + #print("getsect(%X)" % sect)
  1657 + try:
  1658 + self.fp.seek(self.sectorsize * (sect+1))
  1659 + except:
  1660 + debug('getsect(): sect=%X, seek=%d, filesize=%d' %
  1661 + (sect, self.sectorsize*(sect+1), self._filesize))
  1662 + self._raise_defect(DEFECT_FATAL, 'OLE sector index out of range')
  1663 + sector = self.fp.read(self.sectorsize)
  1664 + if len(sector) != self.sectorsize:
  1665 + debug('getsect(): sect=%X, read=%d, sectorsize=%d' %
  1666 + (sect, len(sector), self.sectorsize))
  1667 + self._raise_defect(DEFECT_FATAL, 'incomplete OLE sector')
  1668 + return sector
  1669 +
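Review note: the comment in `getsect` explains why the original PIL seek was wrong for 4 KB sectors. A short sketch comparing the two formulas (both function names are illustrative, not part of olefile):

```python
def sector_offset(sect, sector_size=512):
    """Byte offset of sector number sect, per MS-CFB: (sect + 1) * sector size."""
    return (sect + 1) * sector_size


def legacy_pil_offset(sect, sector_size=512):
    """Formula quoted from the old PIL code: assumes a fixed 512-byte header."""
    return 512 + sector_size * sect
```

The two agree for 512-byte sectors and diverge for 4096-byte sectors, where sector 0 starts at offset 4096 rather than 512.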
  1670 +
  1671 + def write_sect(self, sect, data, padding=b'\x00'):
  1672 + """
  1673 + Write given sector to file on disk.
  1674 +
  1675 + :param sect: int, sector index
  1676 + :param data: bytes, sector data
  1677 + :param padding: single byte, padding character if data < sector size
  1678 + """
  1679 + if not isinstance(data, bytes):
  1680 + raise TypeError("write_sect: data must be a bytes string")
  1681 + if not isinstance(padding, bytes) or len(padding)!=1:
  1682 + raise TypeError("write_sect: padding must be a bytes string of 1 char")
  1683 + #TODO: we could allow padding=None for no padding at all
  1684 + try:
  1685 + self.fp.seek(self.sectorsize * (sect+1))
  1686 + except:
  1687 + debug('write_sect(): sect=%X, seek=%d, filesize=%d' %
  1688 + (sect, self.sectorsize*(sect+1), self._filesize))
  1689 + self._raise_defect(DEFECT_FATAL, 'OLE sector index out of range')
  1690 + if len(data) < self.sectorsize:
  1691 + # add padding
  1692 + data += padding * (self.sectorsize - len(data))
  1693 + elif len(data) > self.sectorsize:
  1694 + raise ValueError("Data is larger than sector size")
  1695 + self.fp.write(data)
  1696 +
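Review note: the intended size handling in `write_sect`, as described by its docstring and error message, is to pad short data and reject data longer than one sector. A standalone sketch of that behaviour, assuming a hypothetical helper `pad_sector` that returns the bytes to write:

```python
def pad_sector(data, sector_size, padding=b"\x00"):
    """Return data padded to exactly sector_size bytes.

    Raises ValueError if data does not fit in one sector.
    """
    if not isinstance(data, bytes):
        raise TypeError("data must be a bytes string")
    if not isinstance(padding, bytes) or len(padding) != 1:
        raise TypeError("padding must be a bytes string of 1 char")
    if len(data) > sector_size:
        raise ValueError("Data is larger than sector size")
    return data + padding * (sector_size - len(data))
```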
  1697 +
  1698 + def loaddirectory(self, sect):
  1699 + """
  1700 + Load the directory.
  1701 +
  1702 + :param sect: sector index of directory stream.
  1703 + """
  1704 + # The directory is stored in a standard
  1705 + # substream, independent of its size.
  1706 +
  1707 + # open directory stream as a read-only file:
  1708 + # (stream size is not known in advance)
  1709 + self.directory_fp = self._open(sect)
  1710 +
  1711 + #[PL] to detect malformed documents and avoid DoS attacks, the maximum
  1712 + # number of directory entries can be calculated:
  1713 + max_entries = self.directory_fp.size // 128
  1714 + debug('loaddirectory: size=%d, max_entries=%d' %
  1715 + (self.directory_fp.size, max_entries))
  1716 +
  1717 + # Create list of directory entries
  1718 + #self.direntries = []
  1719 + # We start with a list of "None" object
  1720 + self.direntries = [None] * max_entries
  1721 +## for sid in iterrange(max_entries):
  1722 +## entry = fp.read(128)
  1723 +## if not entry:
  1724 +## break
  1725 +## self.direntries.append(_OleDirectoryEntry(entry, sid, self))
  1726 + # load root entry:
  1727 + root_entry = self._load_direntry(0)
  1728 + # Root entry is the first entry:
  1729 + self.root = self.direntries[0]
  1730 + # read and build all storage trees, starting from the root:
  1731 + self.root.build_storage_tree()
  1732 +
  1733 +
  1734 + def _load_direntry (self, sid):
  1735 + """
  1736 + Load a directory entry from the directory.
  1737 + This method should only be called once for each storage/stream when
  1738 + loading the directory.
  1739 +
  1740 + :param sid: index of storage/stream in the directory.
  1741 + :returns: a _OleDirectoryEntry object
  1742 +
  1743 + :exception IOError: if the entry has already been referenced.
  1744 + """
  1745 + # check if SID is OK:
  1746 + if sid<0 or sid>=len(self.direntries):
  1747 + self._raise_defect(DEFECT_FATAL, "OLE directory index out of range")
  1748 + # check if entry was already referenced:
  1749 + if self.direntries[sid] is not None:
  1750 + self._raise_defect(DEFECT_INCORRECT,
  1751 + "double reference for OLE stream/storage")
  1752 + # if exception not raised, return the object
  1753 + return self.direntries[sid]
  1754 + self.directory_fp.seek(sid * 128)
  1755 + entry = self.directory_fp.read(128)
  1756 + self.direntries[sid] = _OleDirectoryEntry(entry, sid, self)
  1757 + return self.direntries[sid]
  1758 +
  1759 +
  1760 + def dumpdirectory(self):
  1761 + """
  1762 + Dump directory (for debugging only)
  1763 + """
  1764 + self.root.dump()
  1765 +
  1766 +
  1767 + def _open(self, start, size = 0x7FFFFFFF, force_FAT=False):
  1768 + """
  1769 + Open a stream, either in FAT or MiniFAT according to its size.
  1770 + (openstream helper)
  1771 +
  1772 + :param start: index of first sector
  1773 + :param size: size of stream (or nothing if size is unknown)
  1774 + :param force_FAT: if False (default), stream will be opened in FAT or MiniFAT
  1775 + according to size. If True, it will always be opened in FAT.
  1776 + """
  1777 + debug('OleFileIO.open(): sect=%d, size=%d, force_FAT=%s' %
  1778 + (start, size, str(force_FAT)))
  1779 + # stream size is compared to the MiniSectorCutoff threshold:
  1780 + if size < self.minisectorcutoff and not force_FAT:
  1781 + # ministream object
  1782 + if not self.ministream:
  1783 + # load MiniFAT if it wasn't already done:
  1784 + self.loadminifat()
  1785 + # The first sector index of the MiniStream is stored in the
  1786 + # root directory entry:
  1787 + size_ministream = self.root.size
  1788 + debug('Opening MiniStream: sect=%d, size=%d' %
  1789 + (self.root.isectStart, size_ministream))
  1790 + self.ministream = self._open(self.root.isectStart,
  1791 + size_ministream, force_FAT=True)
  1792 + return _OleStream(fp=self.ministream, sect=start, size=size,
  1793 + offset=0, sectorsize=self.minisectorsize,
  1794 + fat=self.minifat, filesize=self.ministream.size)
  1795 + else:
  1796 + # standard stream
  1797 + return _OleStream(fp=self.fp, sect=start, size=size,
  1798 + offset=self.sectorsize,
  1799 + sectorsize=self.sectorsize, fat=self.fat,
  1800 + filesize=self._filesize)
  1801 +
  1802 +
  1803 + def _list(self, files, prefix, node, streams=True, storages=False):
  1804 + """
  1805 + listdir helper
  1806 +
  1807 + :param files: list of files to fill in
  1808 + :param prefix: current location in storage tree (list of names)
  1809 + :param node: current node (_OleDirectoryEntry object)
  1810 + :param streams: bool, include streams if True (True by default) - new in v0.26
  1811 + :param storages: bool, include storages if True (False by default) - new in v0.26
  1812 + (note: the root storage is never included)
  1813 + """
  1814 + prefix = prefix + [node.name]
  1815 + for entry in node.kids:
  1816 + if entry.kids:
  1817 + # this is a storage
  1818 + if storages:
  1819 + # add it to the list
  1820 + files.append(prefix[1:] + [entry.name])
  1821 + # check its kids
  1822 + self._list(files, prefix, entry, streams, storages)
  1823 + else:
  1824 + # this is a stream
  1825 + if streams:
  1826 + # add it to the list
  1827 + files.append(prefix[1:] + [entry.name])
  1828 +
  1829 +
  1830 + def listdir(self, streams=True, storages=False):
  1831 + """
  1832 + Return a list of streams and/or storages stored in this file
  1833 +
  1834 + :param streams: bool, include streams if True (True by default) - new in v0.26
  1835 + :param storages: bool, include storages if True (False by default) - new in v0.26
  1836 + (note: the root storage is never included)
  1837 + :returns: list of stream and/or storage paths
  1838 + """
  1839 + files = []
  1840 + self._list(files, [], self.root, streams, storages)
  1841 + return files
  1842 +
  1843 +
  1844 + def _find(self, filename):
  1845 + """
  1846 + Returns directory entry of given filename. (openstream helper)
  1847 + Note: this method is case-insensitive.
  1848 +
  1849 + :param filename: path of stream in storage tree (except root entry), either:
  1850 +
  1851 + - a string using Unix path syntax, for example:
  1852 + 'storage_1/storage_1.2/stream'
  1853 + - or a list of storage filenames, path to the desired stream/storage.
  1854 + Example: ['storage_1', 'storage_1.2', 'stream']
  1855 +
  1856 + :returns: sid of requested filename
  1857 + :exception IOError: if file not found
  1858 + """
  1859 +
  1860 + # if filename is a string instead of a list, split it on slashes to
  1861 + # convert to a list:
  1862 + if isinstance(filename, basestring):
  1863 + filename = filename.split('/')
  1864 + # walk across storage tree, following given path:
  1865 + node = self.root
  1866 + for name in filename:
  1867 + for kid in node.kids:
  1868 + if kid.name.lower() == name.lower():
  1869 + break
  1870 + else:
  1871 + raise IOError("file not found")
  1872 + node = kid
  1873 + return node.sid
  1874 +
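Review note: `_find` walks the storage tree with a `for`/`else` loop, where the `else` branch runs only when no child matched. A self-contained sketch of the same lookup, assuming a hypothetical `Node` class that carries only the `name`, `sid` and `kids` attributes used here:

```python
class Node:
    """Minimal stand-in for a directory entry (illustrative only)."""

    def __init__(self, name, sid, kids=()):
        self.name = name
        self.sid = sid
        self.kids = list(kids)


def find_sid(root, path):
    """Return the sid of path, matching names case-insensitively."""
    if isinstance(path, str):
        path = path.split("/")
    node = root
    for name in path:
        for kid in node.kids:
            if kid.name.lower() == name.lower():
                break
        else:
            # inner loop finished without break: no child has this name
            raise IOError("file not found")
        node = kid
    return node.sid
```

For example, with a root holding a `Macros` storage that contains a `VBA` storage with a `dir` stream, `find_sid(root, "macros/vba/DIR")` resolves to that stream's sid.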
  1875 +
  1876 + def openstream(self, filename):
  1877 + """
  1878 + Open a stream as a read-only file object (BytesIO).
  1879 + Note: filename is case-insensitive.
  1880 +
  1881 + :param filename: path of stream in storage tree (except root entry), either:
  1882 +
  1883 + - a string using Unix path syntax, for example:
  1884 + 'storage_1/storage_1.2/stream'
  1885 + - or a list of storage filenames, path to the desired stream/storage.
  1886 + Example: ['storage_1', 'storage_1.2', 'stream']
  1887 +
  1888 + :returns: file object (read-only)
  1889 + :exception IOError: if filename not found, or if this is not a stream.
  1890 + """
  1891 + sid = self._find(filename)
  1892 + entry = self.direntries[sid]
  1893 + if entry.entry_type != STGTY_STREAM:
  1894 + raise IOError("this file is not a stream")
  1895 + return self._open(entry.isectStart, entry.size)
  1896 +
  1897 +
  1898 + def write_stream(self, stream_name, data):
  1899 + """
  1900 + Write a stream to disk. For now, it is only possible to replace an
  1901 + existing stream with data of the same size.
  1902 +
  1903 + :param stream_name: path of stream in storage tree (except root entry), either:
  1904 +
  1905 + - a string using Unix path syntax, for example:
  1906 + 'storage_1/storage_1.2/stream'
  1907 + - or a list of storage filenames, path to the desired stream/storage.
  1908 + Example: ['storage_1', 'storage_1.2', 'stream']
  1909 +
  1910 + :param data: bytes, data to be written, must be the same size as the original
  1911 + stream.
  1912 + """
  1913 + if not isinstance(data, bytes):
  1914 + raise TypeError("write_stream: data must be a bytes string")
  1915 + sid = self._find(stream_name)
  1916 + entry = self.direntries[sid]
  1917 + if entry.entry_type != STGTY_STREAM:
  1918 + raise IOError("this is not a stream")
  1919 + size = entry.size
  1920 + if size != len(data):
  1921 + raise ValueError("write_stream: data must be the same size as the existing stream")
  1922 + if size < self.minisectorcutoff:
  1923 + raise NotImplementedError("Writing a stream in MiniFAT is not implemented yet")
  1924 + sect = entry.isectStart
  1925 + # number of sectors to write
  1926 + nb_sectors = (size + (self.sectorsize-1)) // self.sectorsize
  1927 + debug('nb_sectors = %d' % nb_sectors)
  1928 + for i in range(nb_sectors):
  1929 +## try:
  1930 +## self.fp.seek(offset + self.sectorsize * sect)
  1931 +## except:
  1932 +## debug('sect=%d, seek=%d' %
  1933 +## (sect, offset+self.sectorsize*sect))
  1934 +## raise IOError('OLE sector index out of range')
  1935 + # extract one sector from data, the last one being smaller:
  1936 + if i<(nb_sectors-1):
  1937 + data_sector = data [i*self.sectorsize : (i+1)*self.sectorsize]
  1938 + #TODO: comment this if it works
  1939 + assert(len(data_sector)==self.sectorsize)
  1940 + else:
  1941 + data_sector = data [i*self.sectorsize:]
  1942 + #TODO: comment this if it works
  1943 + debug('write_stream: size=%d sectorsize=%d data_sector=%d size%%sectorsize=%d'
  1944 + % (size, self.sectorsize, len(data_sector), size % self.sectorsize))
  1945 + assert(len(data_sector) % self.sectorsize==size % self.sectorsize)
  1946 + self.write_sect(sect, data_sector)
  1947 +## self.fp.write(data_sector)
  1948 + # jump to next sector in the FAT:
  1949 + try:
  1950 + sect = self.fat[sect]
  1951 + except IndexError:
  1952 + # [PL] if pointer is out of the FAT an exception is raised
  1953 + raise IOError('incorrect OLE FAT, sector index out of range')
  1954 + #[PL] Last sector should be a "end of chain" marker:
  1955 + if sect != ENDOFCHAIN:
  1956 + raise IOError('incorrect last sector index in OLE stream')
  1957 +
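Review note: `write_stream` slices the new data into sector-sized pieces, the last of which may be shorter and is padded later by `write_sect`. A sketch of just the slicing step (the function name `split_into_sectors` is an assumption for illustration):

```python
def split_into_sectors(data, sector_size):
    """Split data into consecutive chunks of at most sector_size bytes."""
    # Ceiling division gives the number of sectors needed.
    nb_sectors = (len(data) + sector_size - 1) // sector_size
    return [data[i * sector_size:(i + 1) * sector_size]
            for i in range(nb_sectors)]
```

Only the final chunk can be shorter than `sector_size`, and its length equals `len(data) % sector_size` whenever that remainder is non-zero.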
  1958 +
  1959 + def get_type(self, filename):
  1960 + """
  1961 + Test if given filename exists as a stream or a storage in the OLE
  1962 + container, and return its type.
  1963 +
  1964 + :param filename: path of stream in storage tree. (see openstream for syntax)
  1965 + :returns: False if object does not exist, its entry type (>0) otherwise:
  1966 +
  1967 + - STGTY_STREAM: a stream
  1968 + - STGTY_STORAGE: a storage
  1969 + - STGTY_ROOT: the root entry
  1970 + """
  1971 + try:
  1972 + sid = self._find(filename)
  1973 + entry = self.direntries[sid]
  1974 + return entry.entry_type
  1975 + except:
  1976 + return False
  1977 +
  1978 +
  1979 + def getmtime(self, filename):
  1980 + """
  1981 + Return modification time of a stream/storage.
  1982 +
  1983 + :param filename: path of stream/storage in storage tree. (see openstream for
  1984 + syntax)
  1985 + :returns: None if modification time is null, a python datetime object
  1986 + otherwise (UTC timezone)
  1987 +
  1988 + new in version 0.26
  1989 + """
  1990 + sid = self._find(filename)
  1991 + entry = self.direntries[sid]
  1992 + return entry.getmtime()
  1993 +
  1994 +
  1995 + def getctime(self, filename):
  1996 + """
  1997 + Return creation time of a stream/storage.
  1998 +
  1999 + :param filename: path of stream/storage in storage tree. (see openstream for
  2000 + syntax)
  2001 + :returns: None if creation time is null, a python datetime object
  2002 + otherwise (UTC timezone)
  2003 +
  2004 + new in version 0.26
  2005 + """
  2006 + sid = self._find(filename)
  2007 + entry = self.direntries[sid]
  2008 + return entry.getctime()
  2009 +
  2010 +
  2011 + def exists(self, filename):
  2012 + """
  2013 + Test if given filename exists as a stream or a storage in the OLE
  2014 + container.
  2015 + Note: filename is case-insensitive.
  2016 +
  2017 + :param filename: path of stream in storage tree. (see openstream for syntax)
  2018 + :returns: True if object exists, else False.
  2019 + """
  2020 + try:
  2021 + sid = self._find(filename)
  2022 + return True
  2023 + except:
  2024 + return False
  2025 +
  2026 +
  2027 + def get_size(self, filename):
  2028 + """
  2029 + Return size of a stream in the OLE container, in bytes.
  2030 +
  2031 + :param filename: path of stream in storage tree (see openstream for syntax)
  2032 + :returns: size in bytes (long integer)
  2033 + :exception IOError: if file not found
  2034 + :exception TypeError: if this is not a stream.
  2035 + """
  2036 + sid = self._find(filename)
  2037 + entry = self.direntries[sid]
  2038 + if entry.entry_type != STGTY_STREAM:
  2039 + #TODO: Should it return zero instead of raising an exception ?
  2040 + raise TypeError('object is not an OLE stream')
  2041 + return entry.size
  2042 +
  2043 +
  2044 + def get_rootentry_name(self):
  2045 + """
  2046 + Return root entry name. Should usually be 'Root Entry' or 'R' in most
  2047 + implementations.
  2048 + """
  2049 + return self.root.name
  2050 +
  2051 +
  2052 + def getproperties(self, filename, convert_time=False, no_conversion=None):
  2053 + """
  2054 + Return properties described in substream.
  2055 +
  2056 + :param filename: path of stream in storage tree (see openstream for syntax)
  2057 + :param convert_time: bool, if True timestamps will be converted to Python datetime
  2058 + :param no_conversion: None or list of int, timestamps not to be converted
  2059 + (for example total editing time is not a real timestamp)
  2060 +
  2061 + :returns: a dictionary of values indexed by id (integer)
  2062 + """
  2063 + # make sure no_conversion is a list, just to simplify code below:
  2064 + if no_conversion is None:
  2065 + no_conversion = []
  2066 + # stream path as a string to report exceptions:
  2067 + streampath = filename
  2068 + if not isinstance(streampath, str):
  2069 + streampath = '/'.join(streampath)
  2070 +
  2071 + fp = self.openstream(filename)
  2072 +
  2073 + data = {}
  2074 +
  2075 + try:
  2076 + # header
  2077 + s = fp.read(28)
  2078 + clsid = _clsid(s[8:24])
  2079 +
  2080 + # format id
  2081 + s = fp.read(20)
  2082 + fmtid = _clsid(s[:16])
  2083 + fp.seek(i32(s, 16))
  2084 +
  2085 + # get section
  2086 + s = b"****" + fp.read(i32(fp.read(4))-4)
  2087 + # number of properties:
  2088 + num_props = i32(s, 4)
  2089 + except BaseException as exc:
  2090 + # catch exception while parsing property header, and only raise
  2091 + # a DEFECT_INCORRECT then return an empty dict, because this is not
  2092 + # a fatal error when parsing the whole file
  2093 + msg = 'Error while parsing properties header in stream %s: %s' % (
  2094 + repr(streampath), exc)
  2095 + self._raise_defect(DEFECT_INCORRECT, msg, type(exc))
  2096 + return data
  2097 +
  2098 + for i in range(num_props):
  2099 + try:
  2100 + id = 0 # just in case of an exception
  2101 + id = i32(s, 8+i*8)
  2102 + offset = i32(s, 12+i*8)
  2103 + type = i32(s, offset)
  2104 +
  2105 + debug ('property id=%d: type=%d offset=%X' % (id, type, offset))
  2106 +
  2107 + # test for common types first (should perhaps use
  2108 + # a dictionary instead?)
  2109 +
  2110 + if type == VT_I2: # 16-bit signed integer
  2111 + value = i16(s, offset+4)
  2112 + if value >= 32768:
  2113 + value = value - 65536
  2114 + elif type == VT_UI2: # 2-byte unsigned integer
  2115 + value = i16(s, offset+4)
  2116 + elif type in (VT_I4, VT_INT, VT_ERROR):
  2117 + # VT_I4: 32-bit signed integer
  2118 + # VT_ERROR: HRESULT, similar to 32-bit signed integer,
  2119 + # see http://msdn.microsoft.com/en-us/library/cc230330.aspx
  2120 + value = i32(s, offset+4)
  2121 + elif type in (VT_UI4, VT_UINT): # 4-byte unsigned integer
  2122 + value = i32(s, offset+4) # FIXME
  2123 + elif type in (VT_BSTR, VT_LPSTR):
  2124 + # CodePageString, see http://msdn.microsoft.com/en-us/library/dd942354.aspx
  2125 + # size is a 32 bits integer, including the null terminator, and
  2126 + # possibly trailing or embedded null chars
  2127 + #TODO: if codepage is unicode, the string should be converted as such
  2128 + count = i32(s, offset+4)
  2129 + value = s[offset+8:offset+8+count-1]
  2130 + # remove all null chars:
  2131 + value = value.replace(b'\x00', b'')
  2132 + elif type == VT_BLOB:
  2133 + # binary large object (BLOB)
  2134 + # see http://msdn.microsoft.com/en-us/library/dd942282.aspx
  2135 + count = i32(s, offset+4)
  2136 + value = s[offset+8:offset+8+count]
  2137 + elif type == VT_LPWSTR:
  2138 + # UnicodeString
  2139 + # see http://msdn.microsoft.com/en-us/library/dd942313.aspx
  2140 + # "the string should NOT contain embedded or additional trailing
  2141 + # null characters."
  2142 + count = i32(s, offset+4)
  2143 + value = _unicode(s[offset+8:offset+8+count*2])
  2144 + elif type == VT_FILETIME:
  2145 + value = long(i32(s, offset+4)) + (long(i32(s, offset+8))<<32)
  2146 + # FILETIME is a 64-bit int: "number of 100ns periods
  2147 + # since Jan 1,1601".
  2148 + if convert_time and id not in no_conversion:
  2149 + debug('Converting property #%d to python datetime, value=%d=%fs'
  2150 + %(id, value, float(value)/10000000))
  2151 + # convert FILETIME to Python datetime.datetime
  2152 + # inspired from http://code.activestate.com/recipes/511425-filetime-to-datetime/
  2153 + _FILETIME_null_date = datetime.datetime(1601, 1, 1, 0, 0, 0)
  2154 + debug('timedelta days=%d' % (value//(10*1000000*3600*24)))
  2155 + value = _FILETIME_null_date + datetime.timedelta(microseconds=value//10)
  2156 + else:
  2157 + # legacy code kept for backward compatibility: returns a
  2158 + # number of seconds since Jan 1,1601
  2159 + value = value // 10000000 # seconds
  2160 + elif type == VT_UI1: # 1-byte unsigned integer
  2161 + value = i8(s[offset+4])
  2162 + elif type == VT_CLSID:
  2163 + value = _clsid(s[offset+4:offset+20])
  2164 + elif type == VT_CF:
  2165 + # PropertyIdentifier or ClipboardData??
  2166 + # see http://msdn.microsoft.com/en-us/library/dd941945.aspx
  2167 + count = i32(s, offset+4)
  2168 + value = s[offset+8:offset+8+count]
  2169 + elif type == VT_BOOL:
  2170 + # VARIANT_BOOL, 16 bits bool, 0x0000=False, 0xFFFF=True
  2171 + # see http://msdn.microsoft.com/en-us/library/cc237864.aspx
  2172 + value = bool(i16(s, offset+4))
  2173 + else:
  2174 + value = None # everything else yields "None"
  2175 + debug ('property id=%d: type=%d not implemented in parser yet' % (id, type))
  2176 +
  2177 + # missing: VT_EMPTY, VT_NULL, VT_R4, VT_R8, VT_CY, VT_DATE,
  2178 + # VT_DECIMAL, VT_I1, VT_I8, VT_UI8,
  2179 + # see http://msdn.microsoft.com/en-us/library/dd942033.aspx
  2180 +
  2181 + # FIXME: add support for VT_VECTOR
  2182 + # A VT_VECTOR value starts with a 32-bit uint giving the number of items,
  2183 + # followed by the items in sequence. The VT_VECTOR flag is combined with the
  2184 + # type of items, e.g. VT_VECTOR|VT_BSTR
  2185 + # see http://msdn.microsoft.com/en-us/library/dd942011.aspx
  2186 +
  2187 + #print("%08x" % id, repr(value), end=" ")
  2188 + #print("(%s)" % VT[i32(s, offset) & 0xFFF])
  2189 +
  2190 + data[id] = value
  2191 + except BaseException as exc:
  2192 + # catch exception while parsing each property, and only raise
  2193 + # a DEFECT_INCORRECT, because parsing can go on
  2194 + msg = 'Error while parsing property id %d in stream %s: %s' % (
  2195 + id, repr(streampath), exc)
  2196 + self._raise_defect(DEFECT_INCORRECT, msg, type(exc))
  2197 +
  2198 + return data
  2199 +
  2200 + def get_metadata(self):
  2201 + """
  2202 + Parse standard properties streams, return an OleMetadata object
  2203 + containing all the available metadata.
  2204 + (also stored in the metadata attribute of the OleFileIO object)
  2205 +
  2206 + new in version 0.25
  2207 + """
  2208 + self.metadata = OleMetadata()
  2209 + self.metadata.parse_properties(self)
  2210 + return self.metadata
  2211 +
  2212 +#
  2213 +# --------------------------------------------------------------------
  2214 +# This script can be used to dump the directory of any OLE2 structured
  2215 +# storage file.
  2216 +
  2217 +if __name__ == "__main__":
  2218 +
  2219 + import sys
  2220 +
  2221 + # [PL] display quick usage info if launched from command-line
  2222 + if len(sys.argv) <= 1:
  2223 + print('olefile version %s %s - %s' % (__version__, __date__, __author__))
  2224 + print(
  2225 +"""
  2226 +Launched from the command line, this script parses OLE files and prints info.
  2227 +
  2228 +Usage: olefile.py [-d] [-c] <file> [file2 ...]
  2229 +
  2230 +Options:
  2231 +-d : debug mode (displays a lot of debug information, for developers only)
  2232 +-c : check all streams (for debugging purposes)
  2233 +
  2234 +For more information, see http://www.decalage.info/olefile
  2235 +""")
  2236 + sys.exit()
  2237 +
  2238 + check_streams = False
  2239 + for filename in sys.argv[1:]:
  2240 +## try:
  2241 + # OPTIONS:
  2242 + if filename == '-d':
  2243 + # option to switch debug mode on:
  2244 + set_debug_mode(True)
  2245 + continue
  2246 + if filename == '-c':
  2247 + # option to switch check streams mode on:
  2248 + check_streams = True
  2249 + continue
  2250 +
  2251 + ole = OleFileIO(filename)#, raise_defects=DEFECT_INCORRECT)
  2252 + print("-" * 68)
  2253 + print(filename)
  2254 + print("-" * 68)
  2255 + ole.dumpdirectory()
  2256 + for streamname in ole.listdir():
  2257 + if streamname[-1][0] == "\005":
  2258 + print(streamname, ": properties")
  2259 + props = ole.getproperties(streamname, convert_time=True)
  2260 + props = sorted(props.items())
  2261 + for k, v in props:
  2262 + #[PL]: avoid displaying values that are too large or binary:
  2263 + if isinstance(v, (basestring, bytes)):
  2264 + if len(v) > 50:
  2265 + v = v[:50]
  2266 + if isinstance(v, bytes):
  2267 + # quick and dirty binary check:
  2268 + for c in (1,2,3,4,5,6,7,11,12,14,15,16,17,18,19,20,
  2269 + 21,22,23,24,25,26,27,28,29,30,31):
  2270 + if c in bytearray(v):
  2271 + v = '(binary data)'
  2272 + break
  2273 + print(" ", k, v)
  2274 +
  2275 + if check_streams:
  2276 + # Read all streams to check if there are errors:
  2277 + print('\nChecking streams...')
  2278 + for streamname in ole.listdir():
  2279 + # print name using repr() to convert binary chars to \xNN:
  2280 + print('-', repr('/'.join(streamname)),'-', end=' ')
  2281 + st_type = ole.get_type(streamname)
  2282 + if st_type == STGTY_STREAM:
  2283 + print('size %d' % ole.get_size(streamname))
  2284 + # just try to read stream in memory:
  2285 + ole.openstream(streamname)
  2286 + else:
  2287 + print('NOT a stream : type=%d' % st_type)
  2288 + print()
  2289 +
  2290 +## for streamname in ole.listdir():
  2291 +## # print name using repr() to convert binary chars to \xNN:
  2292 +## print('-', repr('/'.join(streamname)),'-', end=' ')
  2293 +## print(ole.getmtime(streamname))
  2294 +## print()
  2295 +
  2296 + print('Modification/Creation times of all directory entries:')
  2297 + for entry in ole.direntries:
  2298 + if entry is not None:
  2299 + print('- %s: mtime=%s ctime=%s' % (entry.name,
  2300 + entry.getmtime(), entry.getctime()))
  2301 + print()
  2302 +
  2303 + # parse and display metadata:
  2304 + meta = ole.get_metadata()
  2305 + meta.dump()
  2306 + print()
  2307 + #[PL] Test a few new methods:
  2308 + root = ole.get_rootentry_name()
  2309 + print('Root entry name: "%s"' % root)
  2310 + if ole.exists('worddocument'):
  2311 + print("This is a Word document.")
  2312 + print("type of stream 'WordDocument':", ole.get_type('worddocument'))
  2313 + print("size :", ole.get_size('worddocument'))
  2314 + if ole.exists('macros/vba'):
  2315 + print("This document may contain VBA macros.")
  2316 +
  2317 + # print parsing issues:
  2318 + print('\nNon-fatal issues raised during parsing:')
  2319 + if ole.parsing_issues:
  2320 + for exctype, msg in ole.parsing_issues:
  2321 + print('- %s: %s' % (exctype.__name__, msg))
  2322 + else:
  2323 + print('None')
  2324 +## except IOError as v:
  2325 +## print("***", "cannot read", file, "-", v)
  2326 +
  2327 +# this code was developed while listening to The Wedding Present "Sea Monsters"
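
Note on the VT_FILETIME branch above: the conversion can be illustrated in isolation. This is a minimal sketch, not the code from olefile itself. The helper names `filetime_to_datetime` and `filetime_to_seconds` are hypothetical, and the sketch assumes the two 32-bit halves of the FILETIME have already been read as unsigned integers.

```python
import datetime

# FILETIME epoch used by the parser above: Jan 1, 1601, 00:00:00
FILETIME_EPOCH = datetime.datetime(1601, 1, 1, 0, 0, 0)


def filetime_to_datetime(low, high):
    """Hypothetical helper: convert a FILETIME given as two 32-bit halves.

    low  -- least significant 32 bits (unsigned)
    high -- most significant 32 bits (unsigned)
    """
    ticks = low + (high << 32)  # number of 100 ns periods since the epoch
    # 10 ticks of 100 ns make one microsecond
    return FILETIME_EPOCH + datetime.timedelta(microseconds=ticks // 10)


def filetime_to_seconds(low, high):
    """Hypothetical helper: legacy behaviour, whole seconds since Jan 1, 1601."""
    ticks = low + (high << 32)
    return ticks // 10000000


if __name__ == "__main__":
    # 116444736000000000 ticks separate Jan 1, 1601 from Jan 1, 1970
    unix_epoch_ticks = 116444736000000000
    low = unix_epoch_ticks & 0xFFFFFFFF
    high = unix_epoch_ticks >> 32
    print(filetime_to_datetime(low, high))
    print(filetime_to_seconds(low, high))
```

The two helpers mirror the two branches of the parser: a datetime when conversion is requested, and a plain number of seconds otherwise.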
... ...
oletools/thirdparty/olefile/olefile2.html 0 → 100644
  1 +
  2 +<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
  3 +<html><head><title>Python: module olefile2</title>
  4 +</head><body bgcolor="#f0f0f8">
  5 +
  6 +<table width="100%" cellspacing=0 cellpadding=2 border=0 summary="heading">
  7 +<tr bgcolor="#7799ee">
  8 +<td valign=bottom>&nbsp;<br>
  9 +<font color="#ffffff" face="helvetica, arial">&nbsp;<br><big><big><strong>olefile2</strong></big></big> (version 0.40py2, 2014-10-01)</font></td
  10 +><td align=right valign=bottom
  11 +><font color="#ffffff" face="helvetica, arial"><a href=".">index</a><br><a href="file:./olefile2.py">.\olefile2.py</a></font></td></tr></table>
  12 + <p><tt>olefile2&nbsp;(formerly&nbsp;OleFileIO_PL2)&nbsp;version&nbsp;0.40py2&nbsp;2014-10-01<br>
  13 +&nbsp;<br>
  14 +Module&nbsp;to&nbsp;read&nbsp;Microsoft&nbsp;OLE2&nbsp;files&nbsp;(also&nbsp;called&nbsp;Structured&nbsp;Storage&nbsp;or<br>
  15 +Microsoft&nbsp;Compound&nbsp;Document&nbsp;File&nbsp;Format),&nbsp;such&nbsp;as&nbsp;Microsoft&nbsp;Office<br>
  16 +documents,&nbsp;Image&nbsp;Composer&nbsp;and&nbsp;FlashPix&nbsp;files,&nbsp;Outlook&nbsp;messages,&nbsp;...<br>
  17 +&nbsp;<br>
  18 +IMPORTANT&nbsp;NOTE:&nbsp;olefile2&nbsp;is&nbsp;an&nbsp;old&nbsp;version&nbsp;of&nbsp;olefile&nbsp;meant&nbsp;to&nbsp;be&nbsp;used<br>
  19 +as&nbsp;fallback&nbsp;for&nbsp;Python&nbsp;2.5&nbsp;and&nbsp;older.&nbsp;For&nbsp;Python&nbsp;2.6,&nbsp;2.7&nbsp;and&nbsp;3.x,&nbsp;please&nbsp;use<br>
  20 +olefile&nbsp;which&nbsp;is&nbsp;more&nbsp;up-to-date.&nbsp;The&nbsp;improvements&nbsp;in&nbsp;olefile&nbsp;might<br>
  21 +not&nbsp;always&nbsp;be&nbsp;backported&nbsp;to&nbsp;olefile2.<br>
  22 +&nbsp;<br>
  23 +Project&nbsp;website:&nbsp;<a href="http://www.decalage.info/python/olefileio">http://www.decalage.info/python/olefileio</a><br>
  24 +&nbsp;<br>
  25 +olefile2&nbsp;is&nbsp;copyright&nbsp;(c)&nbsp;2005-2014&nbsp;Philippe&nbsp;Lagadec&nbsp;(<a href="http://www.decalage.info">http://www.decalage.info</a>)<br>
  26 +&nbsp;<br>
  27 +olefile2&nbsp;is&nbsp;based&nbsp;on&nbsp;the&nbsp;<a href="#OleFileIO">OleFileIO</a>&nbsp;module&nbsp;from&nbsp;the&nbsp;PIL&nbsp;library&nbsp;v1.1.6<br>
  28 +See:&nbsp;<a href="http://www.pythonware.com/products/pil/index.htm">http://www.pythonware.com/products/pil/index.htm</a><br>
  29 +&nbsp;<br>
  30 +The&nbsp;Python&nbsp;Imaging&nbsp;Library&nbsp;(PIL)&nbsp;is<br>
  31 +&nbsp;&nbsp;&nbsp;&nbsp;Copyright&nbsp;(c)&nbsp;1997-2005&nbsp;by&nbsp;Secret&nbsp;Labs&nbsp;AB<br>
  32 +&nbsp;&nbsp;&nbsp;&nbsp;Copyright&nbsp;(c)&nbsp;1995-2005&nbsp;by&nbsp;Fredrik&nbsp;Lundh<br>
  33 +&nbsp;<br>
  34 +See&nbsp;source&nbsp;code&nbsp;and&nbsp;LICENSE.txt&nbsp;for&nbsp;information&nbsp;on&nbsp;usage&nbsp;and&nbsp;redistribution.</tt></p>
  35 +<p>
  36 +<table width="100%" cellspacing=0 cellpadding=2 border=0 summary="section">
  37 +<tr bgcolor="#aa55cc">
  38 +<td colspan=3 valign=bottom>&nbsp;<br>
  39 +<font color="#ffffff" face="helvetica, arial"><big><strong>Modules</strong></big></font></td></tr>
  40 +
  41 +<tr><td bgcolor="#aa55cc"><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</tt></td><td>&nbsp;</td>
  42 +<td width="100%"><table width="100%" summary="list"><tr><td width="25%" valign=top><a href="StringIO.html">StringIO</a><br>
  43 +<a href="array.html">array</a><br>
  44 +</td><td width="25%" valign=top><a href="datetime.html">datetime</a><br>
  45 +<a href="os.html">os</a><br>
  46 +</td><td width="25%" valign=top><a href="string.html">string</a><br>
  47 +<a href="struct.html">struct</a><br>
  48 +</td><td width="25%" valign=top><a href="sys.html">sys</a><br>
  49 +</td></tr></table></td></tr></table><p>
  50 +<table width="100%" cellspacing=0 cellpadding=2 border=0 summary="section">
  51 +<tr bgcolor="#ee77aa">
  52 +<td colspan=3 valign=bottom>&nbsp;<br>
  53 +<font color="#ffffff" face="helvetica, arial"><big><strong>Classes</strong></big></font></td></tr>
  54 +
  55 +<tr><td bgcolor="#ee77aa"><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</tt></td><td>&nbsp;</td>
  56 +<td width="100%"><dl>
  57 +<dt><font face="helvetica, arial"><a href="olefile2.html#OleFileIO">OleFileIO</a>
  58 +</font></dt></dl>
  59 + <p>
  60 +<table width="100%" cellspacing=0 cellpadding=2 border=0 summary="section">
  61 +<tr bgcolor="#ffc8d8">
  62 +<td colspan=3 valign=bottom>&nbsp;<br>
  63 +<font color="#000000" face="helvetica, arial"><a name="OleFileIO">class <strong>OleFileIO</strong></a></font></td></tr>
  64 +
  65 +<tr bgcolor="#ffc8d8"><td rowspan=2><tt>&nbsp;&nbsp;&nbsp;</tt></td>
  66 +<td colspan=2><tt>OLE&nbsp;container&nbsp;object<br>
  67 +&nbsp;<br>
  68 +This&nbsp;class&nbsp;encapsulates&nbsp;the&nbsp;interface&nbsp;to&nbsp;an&nbsp;OLE&nbsp;2&nbsp;structured<br>
  69 +storage&nbsp;file.&nbsp;&nbsp;Use&nbsp;the&nbsp;{@link&nbsp;listdir}&nbsp;and&nbsp;{@link&nbsp;openstream}&nbsp;methods&nbsp;to<br>
  70 +access&nbsp;the&nbsp;contents&nbsp;of&nbsp;this&nbsp;file.<br>
  71 +&nbsp;<br>
  72 +Object&nbsp;names&nbsp;are&nbsp;given&nbsp;as&nbsp;a&nbsp;list&nbsp;of&nbsp;strings,&nbsp;one&nbsp;for&nbsp;each&nbsp;subentry<br>
  73 +level.&nbsp;&nbsp;The&nbsp;root&nbsp;entry&nbsp;should&nbsp;be&nbsp;omitted.&nbsp;&nbsp;For&nbsp;example,&nbsp;the&nbsp;following<br>
  74 +code&nbsp;extracts&nbsp;all&nbsp;image&nbsp;streams&nbsp;from&nbsp;a&nbsp;Microsoft&nbsp;Image&nbsp;Composer&nbsp;file:<br>
  75 +&nbsp;<br>
  76 +&nbsp;&nbsp;&nbsp;&nbsp;ole&nbsp;=&nbsp;<a href="#OleFileIO">OleFileIO</a>("fan.mic")<br>
  77 +&nbsp;<br>
  78 +&nbsp;&nbsp;&nbsp;&nbsp;for&nbsp;entry&nbsp;in&nbsp;ole.<a href="#OleFileIO-listdir">listdir</a>():<br>
  79 +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if&nbsp;entry[1:2]&nbsp;==&nbsp;["Image"]:<br>
  80 +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;fin&nbsp;=&nbsp;ole.<a href="#OleFileIO-openstream">openstream</a>(entry)<br>
  81 +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;fout&nbsp;=&nbsp;<a href="#OleFileIO-open">open</a>(entry[0],&nbsp;"wb")<br>
  82 +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;while&nbsp;True:<br>
  83 +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;s&nbsp;=&nbsp;fin.read(8192)<br>
  84 +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;if&nbsp;not&nbsp;s:<br>
  85 +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;break<br>
  86 +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;fout.write(s)<br>
  87 +&nbsp;<br>
  88 +You&nbsp;can&nbsp;use&nbsp;the&nbsp;viewer&nbsp;application&nbsp;provided&nbsp;with&nbsp;the&nbsp;Python&nbsp;Imaging<br>
  89 +Library&nbsp;to&nbsp;view&nbsp;the&nbsp;resulting&nbsp;files&nbsp;(which&nbsp;happen&nbsp;to&nbsp;be&nbsp;standard<br>
  90 +TIFF&nbsp;files).<br>&nbsp;</tt></td></tr>
  91 +<tr><td>&nbsp;</td>
  92 +<td width="100%">Methods defined here:<br>
  93 +<dl><dt><a name="OleFileIO-__init__"><strong>__init__</strong></a>(self, filename<font color="#909090">=None</font>, raise_defects<font color="#909090">=40</font>)</dt><dd><tt>Constructor&nbsp;for&nbsp;<a href="#OleFileIO">OleFileIO</a>&nbsp;class.<br>
  94 +&nbsp;<br>
  95 +filename:&nbsp;file&nbsp;to&nbsp;open.<br>
  96 +raise_defects:&nbsp;minimal&nbsp;level&nbsp;for&nbsp;defects&nbsp;to&nbsp;be&nbsp;raised&nbsp;as&nbsp;exceptions.<br>
  97 +(use&nbsp;DEFECT_FATAL&nbsp;for&nbsp;a&nbsp;typical&nbsp;application,&nbsp;DEFECT_INCORRECT&nbsp;for&nbsp;a<br>
  98 +security-oriented&nbsp;application,&nbsp;see&nbsp;source&nbsp;code&nbsp;for&nbsp;details)</tt></dd></dl>
  99 +
  100 +<dl><dt><a name="OleFileIO-close"><strong>close</strong></a>(self)</dt><dd><tt>close&nbsp;the&nbsp;OLE&nbsp;file,&nbsp;to&nbsp;release&nbsp;the&nbsp;file&nbsp;object</tt></dd></dl>
  101 +
  102 +<dl><dt><a name="OleFileIO-dumpdirectory"><strong>dumpdirectory</strong></a>(self)</dt><dd><tt>Dump&nbsp;directory&nbsp;(for&nbsp;debugging&nbsp;only)</tt></dd></dl>
  103 +
  104 +<dl><dt><a name="OleFileIO-dumpfat"><strong>dumpfat</strong></a>(self, fat, firstindex<font color="#909090">=0</font>)</dt><dd><tt>Displays&nbsp;a&nbsp;part&nbsp;of&nbsp;FAT&nbsp;in&nbsp;human-readable&nbsp;form&nbsp;for&nbsp;debugging&nbsp;purpose</tt></dd></dl>
  105 +
  106 +<dl><dt><a name="OleFileIO-dumpsect"><strong>dumpsect</strong></a>(self, sector, firstindex<font color="#909090">=0</font>)</dt><dd><tt>Displays&nbsp;a&nbsp;sector&nbsp;in&nbsp;a&nbsp;human-readable&nbsp;form,&nbsp;for&nbsp;debugging&nbsp;purpose.</tt></dd></dl>
  107 +
  108 +<dl><dt><a name="OleFileIO-exists"><strong>exists</strong></a>(self, filename)</dt><dd><tt>Test&nbsp;if&nbsp;given&nbsp;filename&nbsp;exists&nbsp;as&nbsp;a&nbsp;stream&nbsp;or&nbsp;a&nbsp;storage&nbsp;in&nbsp;the&nbsp;OLE<br>
  109 +container.<br>
  110 +&nbsp;<br>
  111 +filename:&nbsp;path&nbsp;of&nbsp;stream&nbsp;in&nbsp;storage&nbsp;tree.&nbsp;(see&nbsp;openstream&nbsp;for&nbsp;syntax)<br>
  112 +return:&nbsp;True&nbsp;if&nbsp;object&nbsp;exists,&nbsp;else&nbsp;False.</tt></dd></dl>
  113 +
  114 +<dl><dt><a name="OleFileIO-get_metadata"><strong>get_metadata</strong></a>(self)</dt><dd><tt>Parse&nbsp;standard&nbsp;properties&nbsp;streams,&nbsp;return&nbsp;an&nbsp;OleMetadata&nbsp;object<br>
  115 +containing&nbsp;all&nbsp;the&nbsp;available&nbsp;metadata.<br>
  116 +(also&nbsp;stored&nbsp;in&nbsp;the&nbsp;metadata&nbsp;attribute&nbsp;of&nbsp;the&nbsp;<a href="#OleFileIO">OleFileIO</a>&nbsp;object)<br>
  117 +&nbsp;<br>
  118 +new&nbsp;in&nbsp;version&nbsp;0.25</tt></dd></dl>
  119 +
  120 +<dl><dt><a name="OleFileIO-get_rootentry_name"><strong>get_rootentry_name</strong></a>(self)</dt><dd><tt>Return&nbsp;root&nbsp;entry&nbsp;name.&nbsp;Should&nbsp;usually&nbsp;be&nbsp;'Root&nbsp;Entry'&nbsp;or&nbsp;'R'&nbsp;in&nbsp;most<br>
  121 +implementations.</tt></dd></dl>
  122 +
  123 +<dl><dt><a name="OleFileIO-get_size"><strong>get_size</strong></a>(self, filename)</dt><dd><tt>Return&nbsp;size&nbsp;of&nbsp;a&nbsp;stream&nbsp;in&nbsp;the&nbsp;OLE&nbsp;container,&nbsp;in&nbsp;bytes.<br>
  124 +&nbsp;<br>
  125 +filename:&nbsp;path&nbsp;of&nbsp;stream&nbsp;in&nbsp;storage&nbsp;tree&nbsp;(see&nbsp;openstream&nbsp;for&nbsp;syntax)<br>
  126 +return:&nbsp;size&nbsp;in&nbsp;bytes&nbsp;(long&nbsp;integer)<br>
  127 +raise:&nbsp;IOError&nbsp;if&nbsp;file&nbsp;not&nbsp;found,&nbsp;TypeError&nbsp;if&nbsp;this&nbsp;is&nbsp;not&nbsp;a&nbsp;stream.</tt></dd></dl>
  128 +
  129 +<dl><dt><a name="OleFileIO-get_type"><strong>get_type</strong></a>(self, filename)</dt><dd><tt>Test&nbsp;if&nbsp;given&nbsp;filename&nbsp;exists&nbsp;as&nbsp;a&nbsp;stream&nbsp;or&nbsp;a&nbsp;storage&nbsp;in&nbsp;the&nbsp;OLE<br>
  130 +container,&nbsp;and&nbsp;return&nbsp;its&nbsp;type.<br>
  131 +&nbsp;<br>
  132 +filename:&nbsp;path&nbsp;of&nbsp;stream&nbsp;in&nbsp;storage&nbsp;tree.&nbsp;(see&nbsp;openstream&nbsp;for&nbsp;syntax)<br>
  133 +return:&nbsp;False&nbsp;if&nbsp;object&nbsp;does&nbsp;not&nbsp;exist,&nbsp;its&nbsp;entry&nbsp;type&nbsp;(&gt;0)&nbsp;otherwise:<br>
  134 +&nbsp;&nbsp;&nbsp;&nbsp;-&nbsp;STGTY_STREAM:&nbsp;a&nbsp;stream<br>
  135 +&nbsp;&nbsp;&nbsp;&nbsp;-&nbsp;STGTY_STORAGE:&nbsp;a&nbsp;storage<br>
  136 +&nbsp;&nbsp;&nbsp;&nbsp;-&nbsp;STGTY_ROOT:&nbsp;the&nbsp;root&nbsp;entry</tt></dd></dl>
  137 +
  138 +<dl><dt><a name="OleFileIO-getctime"><strong>getctime</strong></a>(self, filename)</dt><dd><tt>Return&nbsp;creation&nbsp;time&nbsp;of&nbsp;a&nbsp;stream/storage.<br>
  139 +&nbsp;<br>
  140 +filename:&nbsp;path&nbsp;of&nbsp;stream/storage&nbsp;in&nbsp;storage&nbsp;tree.&nbsp;(see&nbsp;openstream&nbsp;for<br>
  141 +syntax)<br>
  142 +return:&nbsp;None&nbsp;if&nbsp;creation&nbsp;time&nbsp;is&nbsp;null,&nbsp;a&nbsp;python&nbsp;datetime&nbsp;object<br>
  143 +otherwise&nbsp;(UTC&nbsp;timezone)<br>
  144 +&nbsp;<br>
  145 +new&nbsp;in&nbsp;version&nbsp;0.26</tt></dd></dl>
  146 +
  147 +<dl><dt><a name="OleFileIO-getmtime"><strong>getmtime</strong></a>(self, filename)</dt><dd><tt>Return&nbsp;modification&nbsp;time&nbsp;of&nbsp;a&nbsp;stream/storage.<br>
  148 +&nbsp;<br>
  149 +filename:&nbsp;path&nbsp;of&nbsp;stream/storage&nbsp;in&nbsp;storage&nbsp;tree.&nbsp;(see&nbsp;openstream&nbsp;for<br>
  150 +syntax)<br>
  151 +return:&nbsp;None&nbsp;if&nbsp;modification&nbsp;time&nbsp;is&nbsp;null,&nbsp;a&nbsp;python&nbsp;datetime&nbsp;object<br>
  152 +otherwise&nbsp;(UTC&nbsp;timezone)<br>
  153 +&nbsp;<br>
  154 +new&nbsp;in&nbsp;version&nbsp;0.26</tt></dd></dl>
  155 +
  156 +<dl><dt><a name="OleFileIO-getproperties"><strong>getproperties</strong></a>(self, filename, convert_time<font color="#909090">=False</font>, no_conversion<font color="#909090">=None</font>)</dt><dd><tt>Return&nbsp;properties&nbsp;described&nbsp;in&nbsp;substream.<br>
  157 +&nbsp;<br>
  158 +filename:&nbsp;path&nbsp;of&nbsp;stream&nbsp;in&nbsp;storage&nbsp;tree&nbsp;(see&nbsp;openstream&nbsp;for&nbsp;syntax)<br>
  159 +convert_time:&nbsp;bool,&nbsp;if&nbsp;True&nbsp;timestamps&nbsp;will&nbsp;be&nbsp;converted&nbsp;to&nbsp;Python&nbsp;datetime<br>
  160 +no_conversion:&nbsp;None&nbsp;or&nbsp;list&nbsp;of&nbsp;int,&nbsp;timestamps&nbsp;not&nbsp;to&nbsp;be&nbsp;converted<br>
  161 +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;(for&nbsp;example&nbsp;total&nbsp;editing&nbsp;time&nbsp;is&nbsp;not&nbsp;a&nbsp;real&nbsp;timestamp)<br>
  162 +return:&nbsp;a&nbsp;dictionary&nbsp;of&nbsp;values&nbsp;indexed&nbsp;by&nbsp;id&nbsp;(integer)</tt></dd></dl>
  163 +
  164 +<dl><dt><a name="OleFileIO-getsect"><strong>getsect</strong></a>(self, sect)</dt><dd><tt>Read&nbsp;given&nbsp;sector&nbsp;from&nbsp;file&nbsp;on&nbsp;disk.<br>
  165 +sect:&nbsp;sector&nbsp;index<br>
  166 +returns&nbsp;a&nbsp;string&nbsp;containing&nbsp;the&nbsp;sector&nbsp;data.</tt></dd></dl>
  167 +
  168 +<dl><dt><a name="OleFileIO-listdir"><strong>listdir</strong></a>(self, streams<font color="#909090">=True</font>, storages<font color="#909090">=False</font>)</dt><dd><tt>Return&nbsp;a&nbsp;list&nbsp;of&nbsp;streams&nbsp;stored&nbsp;in&nbsp;this&nbsp;file<br>
  169 +&nbsp;<br>
  170 +streams:&nbsp;bool,&nbsp;include&nbsp;streams&nbsp;if&nbsp;True&nbsp;(True&nbsp;by&nbsp;default)&nbsp;-&nbsp;new&nbsp;in&nbsp;v0.26<br>
  171 +storages:&nbsp;bool,&nbsp;include&nbsp;storages&nbsp;if&nbsp;True&nbsp;(False&nbsp;by&nbsp;default)&nbsp;-&nbsp;new&nbsp;in&nbsp;v0.26<br>
  172 +(note:&nbsp;the&nbsp;root&nbsp;storage&nbsp;is&nbsp;never&nbsp;included)</tt></dd></dl>
  173 +
  174 +<dl><dt><a name="OleFileIO-loaddirectory"><strong>loaddirectory</strong></a>(self, sect)</dt><dd><tt>Load&nbsp;the&nbsp;directory.<br>
  175 +sect:&nbsp;sector&nbsp;index&nbsp;of&nbsp;directory&nbsp;stream.</tt></dd></dl>
  176 +
  177 +<dl><dt><a name="OleFileIO-loadfat"><strong>loadfat</strong></a>(self, header)</dt><dd><tt>Load&nbsp;the&nbsp;FAT&nbsp;table.</tt></dd></dl>
  178 +
  179 +<dl><dt><a name="OleFileIO-loadfat_sect"><strong>loadfat_sect</strong></a>(self, sect)</dt><dd><tt>Adds&nbsp;the&nbsp;indexes&nbsp;of&nbsp;the&nbsp;given&nbsp;sector&nbsp;to&nbsp;the&nbsp;FAT<br>
  180 +sect:&nbsp;string&nbsp;containing&nbsp;the&nbsp;first&nbsp;FAT&nbsp;sector,&nbsp;or&nbsp;array&nbsp;of&nbsp;long&nbsp;integers<br>
  181 +return:&nbsp;index&nbsp;of&nbsp;last&nbsp;FAT&nbsp;sector.</tt></dd></dl>
  182 +
  183 +<dl><dt><a name="OleFileIO-loadminifat"><strong>loadminifat</strong></a>(self)</dt><dd><tt>Load&nbsp;the&nbsp;MiniFAT&nbsp;table.</tt></dd></dl>
  184 +
  185 +<dl><dt><a name="OleFileIO-open"><strong>open</strong></a>(self, filename)</dt><dd><tt>Open&nbsp;an&nbsp;OLE2&nbsp;file.<br>
  186 +Reads&nbsp;the&nbsp;header,&nbsp;FAT&nbsp;and&nbsp;directory.<br>
  187 +&nbsp;<br>
  188 +filename:&nbsp;string-like&nbsp;or&nbsp;file-like&nbsp;object</tt></dd></dl>
  189 +
  190 +<dl><dt><a name="OleFileIO-openstream"><strong>openstream</strong></a>(self, filename)</dt><dd><tt>Open&nbsp;a&nbsp;stream&nbsp;as&nbsp;a&nbsp;read-only&nbsp;file&nbsp;object&nbsp;(StringIO).<br>
  191 +&nbsp;<br>
  192 +filename:&nbsp;path&nbsp;of&nbsp;stream&nbsp;in&nbsp;storage&nbsp;tree&nbsp;(except&nbsp;root&nbsp;entry),&nbsp;either:<br>
  193 +&nbsp;&nbsp;&nbsp;&nbsp;-&nbsp;a&nbsp;string&nbsp;using&nbsp;Unix&nbsp;path&nbsp;syntax,&nbsp;for&nbsp;example:<br>
  194 +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;'storage_1/storage_1.2/stream'<br>
  195 +&nbsp;&nbsp;&nbsp;&nbsp;-&nbsp;a&nbsp;list&nbsp;of&nbsp;storage&nbsp;filenames,&nbsp;path&nbsp;to&nbsp;the&nbsp;desired&nbsp;stream/storage.<br>
  196 +&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Example:&nbsp;['storage_1',&nbsp;'storage_1.2',&nbsp;'stream']<br>
  197 +return:&nbsp;file&nbsp;object&nbsp;(read-only)<br>
  198 +raise&nbsp;IOError&nbsp;if&nbsp;filename&nbsp;not&nbsp;found,&nbsp;or&nbsp;if&nbsp;this&nbsp;is&nbsp;not&nbsp;a&nbsp;stream.</tt></dd></dl>
  199 +
  200 +<dl><dt><a name="OleFileIO-sect2array"><strong>sect2array</strong></a>(self, sect)</dt><dd><tt>convert&nbsp;a&nbsp;sector&nbsp;to&nbsp;an&nbsp;array&nbsp;of&nbsp;32&nbsp;bits&nbsp;unsigned&nbsp;integers,<br>
  201 +swapping&nbsp;bytes&nbsp;on&nbsp;big&nbsp;endian&nbsp;CPUs&nbsp;such&nbsp;as&nbsp;PowerPC&nbsp;(old&nbsp;Macs)</tt></dd></dl>
  202 +
  203 +</td></tr></table></td></tr></table><p>
  204 +<table width="100%" cellspacing=0 cellpadding=2 border=0 summary="section">
  205 +<tr bgcolor="#eeaa77">
  206 +<td colspan=3 valign=bottom>&nbsp;<br>
  207 +<font color="#ffffff" face="helvetica, arial"><big><strong>Functions</strong></big></font></td></tr>
  208 +
  209 +<tr><td bgcolor="#eeaa77"><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</tt></td><td>&nbsp;</td>
  210 +<td width="100%"><dl><dt><a name="-isOleFile"><strong>isOleFile</strong></a>(filename)</dt><dd><tt>Test&nbsp;if&nbsp;file&nbsp;is&nbsp;an&nbsp;OLE&nbsp;container&nbsp;(according&nbsp;to&nbsp;its&nbsp;header).<br>
  211 +filename:&nbsp;file&nbsp;name&nbsp;or&nbsp;path&nbsp;(str,&nbsp;unicode)<br>
  212 +return:&nbsp;True&nbsp;if&nbsp;OLE,&nbsp;False&nbsp;otherwise.</tt></dd></dl>
  213 +</td></tr></table><p>
  214 +<table width="100%" cellspacing=0 cellpadding=2 border=0 summary="section">
  215 +<tr bgcolor="#55aa55">
  216 +<td colspan=3 valign=bottom>&nbsp;<br>
  217 +<font color="#ffffff" face="helvetica, arial"><big><strong>Data</strong></big></font></td></tr>
  218 +
  219 +<tr><td bgcolor="#55aa55"><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</tt></td><td>&nbsp;</td>
  220 +<td width="100%"><strong>DEFECT_FATAL</strong> = 40<br>
  221 +<strong>DEFECT_INCORRECT</strong> = 30<br>
  222 +<strong>DEFECT_POTENTIAL</strong> = 20<br>
  223 +<strong>DEFECT_UNSURE</strong> = 10<br>
  224 +<strong>STGTY_EMPTY</strong> = 0<br>
  225 +<strong>STGTY_LOCKBYTES</strong> = 3<br>
  226 +<strong>STGTY_PROPERTY</strong> = 4<br>
  227 +<strong>STGTY_ROOT</strong> = 5<br>
  228 +<strong>STGTY_STORAGE</strong> = 1<br>
  229 +<strong>STGTY_STREAM</strong> = 2<br>
  230 +<strong>__all__</strong> = ['OleFileIO', 'isOleFile', 'DEFECT_UNSURE', 'STGTY_STREAM', 'DEFECT_FATAL', 'STGTY_EMPTY', 'STGTY_LOCKBYTES', 'STGTY_STORAGE', 'STGTY_PROPERTY', 'DEFECT_INCORRECT', 'DEFECT_POTENTIAL', 'STGTY_ROOT']<br>
  231 +<strong>__author__</strong> = 'Philippe Lagadec'<br>
  232 +<strong>__date__</strong> = '2014-10-01'<br>
  233 +<strong>__version__</strong> = '0.40py2'</td></tr></table><p>
  234 +<table width="100%" cellspacing=0 cellpadding=2 border=0 summary="section">
  235 +<tr bgcolor="#7799ee">
  236 +<td colspan=3 valign=bottom>&nbsp;<br>
  237 +<font color="#ffffff" face="helvetica, arial"><big><strong>Author</strong></big></font></td></tr>
  238 +
  239 +<tr><td bgcolor="#7799ee"><tt>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</tt></td><td>&nbsp;</td>
  240 +<td width="100%">Philippe&nbsp;Lagadec</td></tr></table>
  241 +</body></html>
0 242 \ No newline at end of file
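
Note on the openstream documentation above: a stream path may be given either as a Unix-style string or as a list of names. The sketch below shows one way such input could be normalized into a list. The helper name `split_stream_path` is hypothetical and this is an illustration under that assumption, not the library's actual implementation.

```python
def split_stream_path(filename):
    """Hypothetical helper: return a stream path as a list of names.

    filename -- either a string using Unix path syntax, for example
                'storage_1/storage_1.2/stream', or a list of names such as
                ['storage_1', 'storage_1.2', 'stream'].
    """
    if isinstance(filename, str):
        # Unix path syntax: one component per storage level
        return filename.split('/')
    # already a sequence of names: return a copy as a list
    return list(filename)


if __name__ == "__main__":
    print(split_stream_path('storage_1/storage_1.2/stream'))
    print(split_stream_path(['storage_1', 'storage_1.2', 'stream']))
```

Both calls produce the same list, so code that walks the storage tree only has to handle one representation.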
... ...
oletools/thirdparty/OleFileIO_PL/OleFileIO_PL.py renamed to oletools/thirdparty/olefile/olefile2.py
1 1 #!/usr/local/bin/python
2 2 # -*- coding: latin-1 -*-
3 3 """
4   -OleFileIO_PL:
5   - Module to read Microsoft OLE2 files (also called Structured Storage or
6   - Microsoft Compound Document File Format), such as Microsoft Office
7   - documents, Image Composer and FlashPix files, Outlook messages, ...
  4 +olefile2 (formerly OleFileIO_PL2) version 0.40py2 2014-10-01
8 5  
9   -version 0.26 2013-07-24 Philippe Lagadec - http://www.decalage.info
  6 +Module to read Microsoft OLE2 files (also called Structured Storage or
  7 +Microsoft Compound Document File Format), such as Microsoft Office
  8 +documents, Image Composer and FlashPix files, Outlook messages, ...
  9 +
  10 +IMPORTANT NOTE: olefile2 is an old version of olefile meant to be used
  11 +as fallback for Python 2.5 and older. For Python 2.6, 2.7 and 3.x, please use
  12 +olefile which is more up-to-date. The improvements in olefile might
  13 +not always be backported to olefile2.
10 14  
11 15 Project website: http://www.decalage.info/python/olefileio
12 16  
13   -Improved version of the OleFileIO module from PIL library v1.1.6
  17 +olefile2 is copyright (c) 2005-2014 Philippe Lagadec (http://www.decalage.info)
  18 +
  19 +olefile2 is based on the OleFileIO module from the PIL library v1.1.6
14 20 See: http://www.pythonware.com/products/pil/index.htm
15 21  
16 22 The Python Imaging Library (PIL) is
17 23 Copyright (c) 1997-2005 by Secret Labs AB
18 24 Copyright (c) 1995-2005 by Fredrik Lundh
19   -OleFileIO_PL changes are Copyright (c) 2005-2013 by Philippe Lagadec
20 25  
21 26 See source code and LICENSE.txt for information on usage and redistribution.
22   -
23   -WARNING: THIS IS (STILL) WORK IN PROGRESS.
24 27 """
25 28  
26   -__author__ = "Philippe Lagadec, Fredrik Lundh (Secret Labs AB)"
27   -__date__ = "2013-07-24"
28   -__version__ = '0.26'
  29 +__author__ = "Philippe Lagadec"
  30 +__date__ = "2014-10-01"
  31 +__version__ = '0.40py2'
29 32  
30 33 #--- LICENSE ------------------------------------------------------------------
31 34  
32   -# OleFileIO_PL is an improved version of the OleFileIO module from the
33   -# Python Imaging Library (PIL).
34   -
35   -# OleFileIO_PL changes are Copyright (c) 2005-2013 by Philippe Lagadec
  35 +# olefile (formerly OleFileIO_PL) is copyright (c) 2005-2014 Philippe Lagadec
  36 +# (http://www.decalage.info)
  37 +#
  38 +# All rights reserved.
  39 +#
  40 +# Redistribution and use in source and binary forms, with or without modification,
  41 +# are permitted provided that the following conditions are met:
36 42 #
  43 +# * Redistributions of source code must retain the above copyright notice, this
  44 +# list of conditions and the following disclaimer.
  45 +# * Redistributions in binary form must reproduce the above copyright notice,
  46 +# this list of conditions and the following disclaimer in the documentation
  47 +# and/or other materials provided with the distribution.
  48 +#
  49 +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
  50 +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  51 +# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
  52 +# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
  53 +# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  54 +# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  55 +# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  56 +# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
  57 +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  58 +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  59 +
  60 +# ----------
  61 +# PIL License:
  62 +#
  63 +# olefile is based on source code from the OleFileIO module of the Python
  64 +# Imaging Library (PIL) published by Fredrik Lundh under the following license:
  65 +
37 66 # The Python Imaging Library (PIL) is
38 67 # Copyright (c) 1997-2005 by Secret Labs AB
39 68 # Copyright (c) 1995-2005 by Fredrik Lundh
... ... @@ -59,7 +88,7 @@ __version__ = '0.26'
59 88 # PERFORMANCE OF THIS SOFTWARE.
60 89  
61 90 #-----------------------------------------------------------------------------
62   -# CHANGELOG: (only OleFileIO_PL changes compared to PIL 1.1.6)
  91 +# CHANGELOG: (only olefile/OleFileIO_PL changes compared to PIL 1.1.6)
63 92 # 2005-05-11 v0.10 PL: - a few fixes for Python 2.4 compatibility
64 93 # (all changes flagged with [PL])
65 94 # 2006-02-22 v0.11 PL: - a few fixes for some Office 2003 documents which raise
... ... @@ -131,51 +160,13 @@ __version__ = '0.26'
131 160 # of a directory entry or a storage/stream
132 161 # - fixed parsing of direntry timestamps
133 162 # 2013-07-24 PL: - new options in listdir to list storages and/or streams
  163 +# 2014-07-18 v0.31 - preliminary support for 4K sectors
  164 +# 2014-09-26 v0.40 PL: - renamed OleFileIO_PL to olefile
134 165  
135 166 #-----------------------------------------------------------------------------
136   -# TODO (for version 1.0):
137   -# + add path attrib to _OleDirEntry, set it once and for all in init or
138   -# append_kids (then listdir/_list can be simplified)
139   -# - TESTS with Linux, MacOSX, Python 1.5.2, various files, PIL, ...
140   -# - add underscore to each private method, to avoid their display in
141   -# pydoc/epydoc documentation - Remove it for classes to be documented
142   -# - replace all raised exceptions with _raise_defect (at least in OleFileIO)
143   -# - merge code from _OleStream and OleFileIO.getsect to read sectors
144   -# (maybe add a class for FAT and MiniFAT ?)
145   -# - add method to check all streams (follow sectors chains without storing all
146   -# stream in memory, and report anomalies)
147   -# - use _OleDirectoryEntry.kids_dict to improve _find and _list ?
148   -# - fix Unicode names handling (find some way to stay compatible with Py1.5.2)
149   -# => if possible avoid converting names to Latin-1
150   -# - review DIFAT code: fix handling of DIFSECT blocks in FAT (not stop)
151   -# - rewrite OleFileIO.getproperties
152   -# - improve docstrings to show more sample uses
153   -# - see also original notes and FIXME below
154   -# - remove all obsolete FIXMEs
155   -# - OleMetadata: fix version attrib according to
156   -# http://msdn.microsoft.com/en-us/library/dd945671%28v=office.12%29.aspx
157   -
158   -# IDEAS:
159   -# - in OleFileIO._open and _OleStream, use size=None instead of 0x7FFFFFFF for
160   -# streams with unknown size
161   -# - use arrays of int instead of long integers for FAT/MiniFAT, to improve
162   -# performance and reduce memory usage ? (possible issue with values >2^31)
163   -# - provide tests with unittest (may need write support to create samples)
164   -# - move all debug code (and maybe dump methods) to a separate module, with
165   -# a class which inherits OleFileIO ?
166   -# - fix docstrings to follow epydoc format
167   -# - add support for 4K sectors ?
168   -# - add support for big endian byte order ?
169   -# - create a simple OLE explorer with wxPython
170   -
171   -# FUTURE EVOLUTIONS to add write support:
172   -# 1) add ability to write a stream back on disk from StringIO (same size, no
173   -# change in FAT/MiniFAT).
174   -# 2) rename a stream/storage if it doesn't change the RB tree
175   -# 3) use rbtree module to update the red-black tree + any rename
176   -# 4) remove a stream/storage: free sectors in FAT/MiniFAT
177   -# 5) allocate new sectors in FAT/MiniFAT
178   -# 6) create new storage/stream
  167 +# TODO:
  168 +# + check if running on Python 2.6+, if so issue warning to use olefile
  169 +
179 170 #-----------------------------------------------------------------------------
180 171  
181 172 #
... ... @@ -1002,7 +993,7 @@ class OleFileIO:
1002 993 if entry[1:2] == "Image":
1003 994 fin = ole.openstream(entry)
1004 995 fout = open(entry[0:1], "wb")
1005   - while 1:
  996 + while True:
1006 997 s = fin.read(8192)
1007 998 if not s:
1008 999 break
... ... @@ -1587,13 +1578,15 @@ class OleFileIO:
1587 1578 (self.root.isectStart, size_ministream))
1588 1579 self.ministream = self._open(self.root.isectStart,
1589 1580 size_ministream, force_FAT=True)
1590   - return _OleStream(self.ministream, start, size, 0,
1591   - self.minisectorsize, self.minifat,
1592   - self.ministream.size)
  1581 + return _OleStream(fp=self.ministream, sect=start, size=size,
  1582 + offset=0, sectorsize=self.minisectorsize,
  1583 + fat=self.minifat, filesize=self.ministream.size)
1593 1584 else:
1594 1585 # standard stream
1595   - return _OleStream(self.fp, start, size, 512,
1596   - self.sectorsize, self.fat, self._filesize)
  1586 + return _OleStream(fp=self.fp, sect=start, size=size,
  1587 + offset=self.sectorsize,
  1588 + sectorsize=self.sectorsize, fat=self.fat,
  1589 + filesize=self._filesize)
1597 1590  
1598 1591  
1599 1592 def _list(self, files, prefix, node, streams=True, storages=False):
... ... @@ -1950,7 +1943,7 @@ if __name__ == "__main__":
1950 1943 print """
1951 1944 Launched from command line, this script parses OLE files and prints info.
1952 1945  
1953   -Usage: OleFileIO_PL.py [-d] [-c] <file> [file2 ...]
  1946 +Usage: olefile2.py [-d] [-c] <file> [file2 ...]
1954 1947  
1955 1948 Options:
1956 1949 -d : debug mode (display a lot of debug information, for developers only)
... ... @@ -2046,3 +2039,5 @@ Options:
2046 2039 print 'None'
2047 2040 ## except IOError, v:
2048 2041 ## print "***", "cannot read", file, "-", v
  2042 +
  2043 +# this code was developed while listening to The Wedding Present "Sea Monsters"
... ...