Commit 677d9ad57da9670ed58cd2d824b5baf7ac7c5c64

Authored by kirk-sayre-work
2 parents be2898f5 c03e948d

Merge remote-tracking branch 'upstream/master'

Showing 93 changed files with 4772 additions and 5845 deletions
.travis.yml
@@ -17,5 +17,8 @@ matrix:
17 17 - python: pypy
18 18 - python: pypy3
19 19  
  20 +install:
  21 + - pip install msoffcrypto-tool
  22 +
20 23 script:
21 24 - python setup.py test
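The Travis CI configuration now installs msoffcrypto-tool before running the tests. According to the v0.54.2 news below, olevba and msodde use it to handle encrypted documents. Code that treats a package like this as optional often checks whether it can be imported before using it. The sketch below shows one way to do that with the standard library. The helper name `has_module` is hypothetical and not part of oletools, and the sketch assumes msoffcrypto-tool is imported under the module name `msoffcrypto`.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be imported, without importing it."""
    # find_spec returns None when no finder can locate the module
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # msoffcrypto-tool is assumed to be importable as "msoffcrypto"
    if has_module("msoffcrypto"):
        print("msoffcrypto-tool available: encrypted documents can be decrypted")
    else:
        print("msoffcrypto-tool missing: try 'pip install msoffcrypto-tool'")
```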
INSTALL.txt
1   -How to Download and Install python-oletools
2   -===========================================
  1 +How to Download and Install oletools
  2 +====================================
3 3  
4 4 Pre-requisites
5 5 --------------
6 6  
7   -The recommended Python version to run oletools is Python 2.7.
8   -Python 2.6 is also supported, but as it is not tested as often as 2.7, some features
9   -might not work as expected.
10   -
11   -Since v0.50, oletools can also run with Python 3.x. As this is quite new, please
12   -report any issue you may encounter.
13   -
  7 +The recommended Python version to run oletools is the latest **Python 3.x** (3.7 for now).
  8 +Python 2.7 is still supported, but as it will become end of life in 2020 (see https://pythonclock.org/), it is highly
  9 +recommended to switch to Python 3 now.
14 10  
15 11 Recommended way to Download+Install/Update oletools: pip
16 12 --------------------------------------------------------
@@ -23,7 +19,11 @@ system, either upgrade Python or see https://pip.pypa.io/en/stable/installing/
23 19 To download and install/update the latest release version of oletools,
24 20 run the following command in a shell:
25 21  
  22 +```text
26 23 sudo -H pip install -U oletools
  24 +```
  25 +
  26 +Replace `pip` by `pip3` or `pip2` to install on a specific Python version.
27 27  
28 28 **Important**: Since version 0.50, pip will automatically create convenient command-line scripts
29 29 in /usr/local/bin to run all the oletools from any directory.
@@ -33,7 +33,19 @@ in /usr/local/bin to run all the oletools from any directory.
33 33 To download and install/update the latest release version of oletools,
34 34 run the following command in a cmd window:
35 35  
  36 +```text
36 37 pip install -U oletools
  38 +```
  39 +
  40 +Replace `pip` by `pip3` or `pip2` to install on a specific Python version.
  41 +
  42 +**Note**: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip
  43 +and install for all users. If that is not possible, you may also install only for the current user
  44 +by adding the `--user` option:
  45 +
  46 +```text
  47 +pip3 install -U --user oletools
  48 +```
37 49  
38 50 **Important**: Since version 0.50, pip will automatically create convenient command-line scripts
39 51 to run all the oletools from any directory: olevba, mraptor, oleid, rtfobj, etc.
@@ -47,18 +59,33 @@ you may also use pip:
47 59  
48 60 ### Linux, Mac OSX, Unix
49 61  
  62 +```text
50 63 sudo -H pip install -U https://github.com/decalage2/oletools/archive/master.zip
  64 +```
  65 +
  66 +Replace `pip` by `pip3` or `pip2` to install on a specific Python version.
51 67  
52 68 ### Windows
53 69  
  70 +```text
54 71 pip install -U https://github.com/decalage2/oletools/archive/master.zip
  72 +```
  73 +
  74 +Replace `pip` by `pip3` or `pip2` to install on a specific Python version.
  75 +
  76 +**Note**: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip
  77 +and install for all users. If that is not possible, you may also install only for the current user
  78 +by adding the `--user` option:
55 79  
  80 +```text
  81 +pip3 install -U --user https://github.com/decalage2/oletools/archive/master.zip
  82 +```
56 83  
57 84 How to install offline - Computer without Internet access
58 85 ---------------------------------------------------------
59 86  
60 87 First, download the oletools archive on a computer with Internet access:
61   -* Latest stable version: from https://github.com/decalage2/oletools/releases
  88 +* Latest stable version: from https://pypi.org/project/oletools/ or https://github.com/decalage2/oletools/releases
62 89 * Development version: https://github.com/decalage2/oletools/archive/master.zip
63 90  
64 91 Copy the archive file to the target computer.
@@ -66,11 +93,15 @@ Copy the archive file to the target computer.
66 93 On Linux, Mac OSX, Unix, run the following command using the filename of the
67 94 archive that you downloaded:
68 95  
  96 +```text
69 97 sudo -H pip install -U oletools.zip
  98 +```
70 99  
71 100 On Windows:
72 101  
  102 +```text
73 103 pip install -U oletools.zip
  104 +```
74 105  
75 106  
76 107 Old school install using setup.py
@@ -88,9 +119,12 @@ Then extract the archive, open a shell and go to the oletools directory.
88 119  
89 120 ### Linux, Mac OSX, Unix
90 121  
  122 +```text
91 123 sudo -H python setup.py install
  124 +```
92 125  
93 126 ### Windows:
94 127  
  128 +```text
95 129 python setup.py install
96   -
  130 +```
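INSTALL.txt now recommends the latest Python 3.x and warns that Python 2.7 reaches end of life in 2020. A script could report this at startup. The sketch below is hypothetical, because nothing here claims that oletools itself does this. It only compares `sys.version_info` against the recommendation stated in the text, and the helper name `python_version_advice` is illustrative.

```python
import sys


def python_version_advice(version_info=sys.version_info):
    """Return a short message based on the Python version (hypothetical helper)."""
    major, minor = version_info[0], version_info[1]
    if major >= 3:
        return "OK: running on Python %d.%d (Python 3 is recommended)" % (major, minor)
    if (major, minor) == (2, 7):
        return "Warning: Python 2.7 is end of life in 2020, please switch to Python 3"
    # INSTALL.txt only mentions Python 3.x and 2.7 as supported
    return "Unsupported: Python %d.%d is not listed as supported in INSTALL.txt" % (
        major, minor)


if __name__ == "__main__":
    print(python_version_advice())
```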
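The new install instructions suggest replacing `pip` with `pip3` or `pip2` to target a specific Python version. They also suggest adding `--user` when a shell or cmd window with administrator rights is not available.

A related approach is to run pip as a module of the interpreter you want (`python -m pip`), so that the package is always installed for that interpreter. This is standard pip usage, but INSTALL.txt does not describe it. The helper below only builds the argument list, and its name and parameters are illustrative.

```python
import subprocess
import sys


def pip_install_command(target="oletools", user=False, python=sys.executable):
    """Build a 'python -m pip install -U ...' argument list (illustrative helper)."""
    cmd = [python, "-m", "pip", "install", "-U"]
    if user:
        # Install into the per-user site-packages instead of the system location
        cmd.append("--user")
    # target may be a PyPI name, a URL to a zip archive, or a local archive file
    cmd.append(target)
    return cmd


if __name__ == "__main__":
    # Development version from GitHub, installed for the current user only
    cmd = pip_install_command(
        "https://github.com/decalage2/oletools/archive/master.zip", user=True)
    print(" ".join(cmd))
    # Uncomment to actually run the installation:
    # subprocess.check_call(cmd)
```

The same helper covers the offline case, for example `pip_install_command("oletools.zip")`.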
LICENSE.md 0 → 100644
  1 +This license applies to the python-oletools package, apart from the thirdparty folder which contains third-party files
  2 +published with their own license.
  3 +
  4 +The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (http://www.decalage.info)
  5 +
  6 +All rights reserved.
  7 +
  8 +Redistribution and use in source and binary forms, with or without modification,
  9 +are permitted provided that the following conditions are met:
  10 +
  11 + * Redistributions of source code must retain the above copyright notice, this
  12 + list of conditions and the following disclaimer.
  13 + * Redistributions in binary form must reproduce the above copyright notice,
  14 + this list of conditions and the following disclaimer in the documentation
  15 + and/or other materials provided with the distribution.
  16 +
  17 +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
  18 +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  19 +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
  20 +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
  21 +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  22 +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  23 +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  24 +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
  25 +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  26 +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  27 +
  28 +
  29 +----------
  30 +
  31 +olevba contains modified source code from the officeparser project, published
  32 +under the following MIT License (MIT):
  33 +
  34 +officeparser is copyright (c) 2014 John William Davison
  35 +
  36 +Permission is hereby granted, free of charge, to any person obtaining a copy
  37 +of this software and associated documentation files (the "Software"), to deal
  38 +in the Software without restriction, including without limitation the rights
  39 +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
  40 +copies of the Software, and to permit persons to whom the Software is
  41 +furnished to do so, subject to the following conditions:
  42 +
  43 +The above copyright notice and this permission notice shall be included in all
  44 +copies or substantial portions of the Software.
  45 +
  46 +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  47 +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  48 +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
  49 +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  50 +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
  51 +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  52 +SOFTWARE.
MANIFEST.in 0 → 100644
  1 +include install.bat
  2 +include INSTALL.txt
  3 +include README.md
  4 +include requirements.txt
  5 +include oletools/README.rst
  6 +include oletools/README.html
  7 +include oletools/LICENSE.txt
  8 +include oletools/DocVarDump.vba
  9 +recursive-include oletools/thirdparty *.*
  10 +recursive-include cheatsheet *.*
  11 +global-exclude *.pyc
  12 +
  13 +recursive-include tests *.py
  14 +graft tests/test-data
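The new MANIFEST.in lists the extra files that go into the source distribution. Its commands are applied line by line in order, so the position of `global-exclude *.pyc` relative to the later `recursive-include` and `graft` lines matters.

The sketch below imitates four of the commands used here by applying `fnmatch` to a plain list of POSIX-style paths. It is a simplified approximation and not the setuptools implementation. It starts from an empty file set, ignores setuptools' default files, and its `*` also matches `/`.

```python
import fnmatch
import posixpath

# A subset of the rules from the MANIFEST.in above, in file order
RULES = [
    ("include", "README.md"),
    ("include", "requirements.txt"),
    ("global-exclude", "*.pyc"),
    ("recursive-include", "tests", "*.py"),
    ("graft", "tests/test-data"),
]


def apply_manifest(paths, rules=RULES):
    """Approximate MANIFEST.in processing over POSIX-style relative paths."""
    selected = set()
    for rule in rules:
        command = rule[0]
        if command == "include":
            # include PATTERN: match against the path relative to the project root
            selected.update(p for p in paths if fnmatch.fnmatchcase(p, rule[1]))
        elif command == "global-exclude":
            # global-exclude PATTERN: drop matching file names selected so far
            selected = {p for p in selected
                        if not fnmatch.fnmatchcase(posixpath.basename(p), rule[1])}
        elif command == "recursive-include":
            # recursive-include DIR PATTERN: matching file names anywhere below DIR
            prefix = rule[1] + "/"
            selected.update(p for p in paths if p.startswith(prefix)
                            and fnmatch.fnmatchcase(posixpath.basename(p), rule[2]))
        elif command == "graft":
            # graft DIR: every file below DIR
            prefix = rule[1] + "/"
            selected.update(p for p in paths if p.startswith(prefix))
    return sorted(selected)


if __name__ == "__main__":
    files = ["README.md", "setup.py", "tests/test_olevba.py",
             "tests/test_olevba.pyc", "tests/test-data/sample.doc"]
    print(apply_manifest(files))
```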
README.md
@@ -26,7 +26,25 @@ Note: python-oletools is not related to OLETools published by BeCubed Software.
26 26 News
27 27 ----
28 28  
29   -- **2018-05-30 v0.53**:
  29 +- **2019-05-22 v0.54.2**:
  30 + - bugfix release: fixed several issues related to encrypted documents
  31 + and XLM/XLF Excel 4 macros
  32 + - msoffcrypto-tool is now installed by default to handle encrypted documents
  33 + - olevba and msodde now handle documents encrypted with common passwords such
  34 + as 123, 1234, 4321, 12345, 123456, VelvetSweatShop automatically.
  35 +- **2019-04-04 v0.54**:
  36 + - olevba, msodde: added support for encrypted MS Office files
  37 + - olevba: added detection and extraction of XLM/XLF Excel 4 macros (thanks to plugin_biff from Didier Stevens' oledump)
  38 + - olevba, mraptor: added detection of VBA running Excel 4 macros
  39 + - olevba: detect and display special characters such as backspace
  40 + - olevba: colorized output showing suspicious keywords in the VBA code
  41 + - olevba, mraptor: full Python 3 compatibility, no separate olevba3/mraptor3 anymore
  42 + - olevba: improved handling of code pages and unicode
  43 + - olevba: fixed a false-positive in VBA macro detection
  44 + - rtfobj: improved OLE Package handling, improved Equation object detection
  45 + - oleobj: added detection of external links to objects in OpenXML
  46 + - replaced third party packages by PyPI dependencies
  47 +- 2018-05-30 v0.53:
30 48 - olevba and mraptor can now parse Word/PowerPoint 2007+ pure XML files (aka Flat OPC format)
31 49 - improved support for VBA forms in olevba (oleform)
32 50 - rtfobj now displays the CLSID of OLE objects, which is the best way to identify them. Known-bad CLSIDs such as MS Equation Editor are highlighted in red.
@@ -75,26 +93,38 @@ Projects using oletools:
75 93 ------------------------
76 94  
77 95 oletools are used by a number of projects and online malware analysis services,
78   -including [Viper](http://viper.li/), [REMnux](https://remnux.org/),
  96 +including
  97 +[ACE](https://github.com/IntegralDefense/ACE),
  98 +[Anlyz.io](https://sandbox.anlyz.io/),
  99 +[AssemblyLine](https://www.cse-cst.gc.ca/en/assemblyline),
  100 +[CAPE](https://github.com/ctxis/CAPE),
  101 +[Cuckoo Sandbox](https://github.com/cuckoosandbox/cuckoo),
  102 +[DARKSURGEON](https://github.com/cryps1s/DARKSURGEON),
  103 +[Deepviz](https://sandbox.deepviz.com/),
  104 +[dridex.malwareconfig.com](https://dridex.malwareconfig.com),
79 105 [FAME](https://certsocietegenerale.github.io/fame/),
  106 +[FLARE-VM](https://github.com/fireeye/flare-vm),
80 107 [Hybrid-analysis.com](https://www.hybrid-analysis.com/),
81 108 [Joe Sandbox](https://www.document-analyzer.net/),
82   -[Deepviz](https://sandbox.deepviz.com/),
83 109 [Laika BOSS](https://github.com/lmco/laikaboss),
84   -[Cuckoo Sandbox](https://github.com/cuckoosandbox/cuckoo),
85   -[Anlyz.io](https://sandbox.anlyz.io/),
86   -[ViperMonkey](https://github.com/decalage2/ViperMonkey),
87   -[pcodedmp](https://github.com/bontchev/pcodedmp),
88   -[dridex.malwareconfig.com](https://dridex.malwareconfig.com),
89   -[Snake](https://github.com/countercept/snake),
90   -[DARKSURGEON](https://github.com/cryps1s/DARKSURGEON),
91   -[CAPE](https://github.com/ctxis/CAPE),
92   -[AssemblyLine](https://www.cse-cst.gc.ca/en/assemblyline),
  110 +[MacroMilter](https://github.com/sbidy/MacroMilter),
93 111 [malshare.io](https://malshare.io),
94   -[Malware Repository Framework (MRF)](https://www.adlice.com/download/mrf/),
95 112 [malware-repo](https://github.com/Tigzy/malware-repo),
96   -[Vba2Graph](https://github.com/MalwareCantFly/Vba2Graph),
  113 +[Malware Repository Framework (MRF)](https://www.adlice.com/download/mrf/),
  114 +[olefy](https://github.com/HeinleinSupport/olefy),
  115 +[PeekabooAV](https://github.com/scVENUS/PeekabooAV),
  116 +[pcodedmp](https://github.com/bontchev/pcodedmp),
  117 +[PyCIRCLean](https://github.com/CIRCL/PyCIRCLean),
  118 +[REMnux](https://remnux.org/),
  119 +[Snake](https://github.com/countercept/snake),
  120 +[SNDBOX](https://app.sndbox.com),
97 121 [Strelka](https://github.com/target/strelka),
  122 +[stoQ](https://stoq.punchcyber.com/),
  123 +[TheHive/Cortex](https://github.com/TheHive-Project/Cortex-Analyzers),
  124 +[Vba2Graph](https://github.com/MalwareCantFly/Vba2Graph),
  125 +[Viper](http://viper.li/),
  126 +[ViperMonkey](https://github.com/decalage2/ViperMonkey),
  127 +[YOMI](https://yomi.yoroi.company),
98 128 and probably [VirusTotal](https://www.virustotal.com).
99 129 And quite a few [other projects on GitHub](https://github.com/search?q=oletools&type=Repositories).
100 130 (Please [contact me]((http://decalage.info/contact)) if you have or know
@@ -149,7 +179,7 @@ License
149 179 This license applies to the python-oletools package, apart from the thirdparty folder which contains third-party files
150 180 published with their own license.
151 181  
152   -The python-oletools package is copyright (c) 2012-2018 Philippe Lagadec (http://www.decalage.info)
  182 +The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (http://www.decalage.info)
153 183  
154 184 All rights reserved.
155 185  
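The v0.54.2 news states that olevba and msodde now automatically handle documents encrypted with common passwords such as 123, 1234, 4321, 12345, 123456 and VelvetSweatShop. The general idea can be sketched as a loop over candidate passwords that calls a pluggable decryption callback.

This is not oletools' actual code, and `try_passwords` and `decrypt_with` are hypothetical names. In practice the callback might wrap msoffcrypto-tool's `OfficeFile.load_key(password=...)` and `decrypt(...)`. That wiring is an assumption and is left out here so that the sketch runs without a sample file.

```python
# Candidate passwords taken from the v0.54.2 release note
COMMON_PASSWORDS = ["123", "1234", "4321", "12345", "123456", "VelvetSweatShop"]


def try_passwords(decrypt_with, passwords=COMMON_PASSWORDS):
    """Return (password, result) for the first password accepted by decrypt_with.

    decrypt_with(password) should return the decrypted data, or raise an
    exception when the password is wrong. Returns (None, None) if all fail.
    """
    for password in passwords:
        try:
            return password, decrypt_with(password)
        except Exception:
            # Wrong password (or another decryption error): try the next one
            continue
    return None, None


if __name__ == "__main__":
    # Stand-in for a real decryptor that only accepts "1234"
    def fake_decrypt(password):
        if password != "1234":
            raise ValueError("wrong password")
        return b"decrypted bytes"

    print(try_passwords(fake_decrypt))  # ('1234', b'decrypted bytes')
```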
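Another v0.54 item says that olevba now detects and displays special characters such as backspace. A raw backspace printed to a terminal moves the cursor back, so the characters that follow can visually overwrite part of the code. The sketch below shows one generic way to make such characters visible by replacing control characters with `\xNN` escapes. It illustrates the idea and is not olevba's implementation, and `make_visible` is a hypothetical name.

```python
def make_visible(text, keep="\t\n\r"):
    """Replace ASCII control characters (except those in keep) with \\xNN escapes."""
    out = []
    for ch in text:
        code = ord(ch)
        # C0 control characters (0x00-0x1F) and DEL (0x7F) are not printable
        if (code < 0x20 or code == 0x7F) and ch not in keep:
            out.append("\\x%02x" % code)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    # "\x08" is backspace: printed raw, "evil" would overwrite part of the line
    vba_line = 'MsgBox "hello"\x08\x08\x08\x08evil'
    print(make_visible(vba_line))  # MsgBox "hello"\x08\x08\x08\x08evil
```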
oletools/LICENSE.txt
1   -LICENSE for the python-oletools package:
2   -
3   -This license applies to the python-oletools package, apart from the thirdparty
4   -folder which contains third-party files published with their own license.
5   -
6   -The python-oletools package is copyright (c) 2012-2018 Philippe Lagadec (http://www.decalage.info)
7   -
8   -All rights reserved.
9   -
10   -Redistribution and use in source and binary forms, with or without modification,
11   -are permitted provided that the following conditions are met:
12   -
13   - * Redistributions of source code must retain the above copyright notice, this
14   - list of conditions and the following disclaimer.
15   - * Redistributions in binary form must reproduce the above copyright notice,
16   - this list of conditions and the following disclaimer in the documentation
17   - and/or other materials provided with the distribution.
18   -
19   -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
20   -ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
21   -WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
22   -DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
23   -FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
24   -DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
25   -SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
26   -CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
27   -OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
28   -OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
29   -
30   -
31   -----------
32   -
33   -olevba contains modified source code from the officeparser project, published
34   -under the following MIT License (MIT):
35   -
36   -officeparser is copyright (c) 2014 John William Davison
37   -
38   -Permission is hereby granted, free of charge, to any person obtaining a copy
39   -of this software and associated documentation files (the "Software"), to deal
40   -in the Software without restriction, including without limitation the rights
41   -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
42   -copies of the Software, and to permit persons to whom the Software is
43   -furnished to do so, subject to the following conditions:
44   -
45   -The above copyright notice and this permission notice shall be included in all
46   -copies or substantial portions of the Software.
47   -
48   -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
49   -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
50   -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
51   -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
52   -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
53   -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
54   -SOFTWARE.
  1 +LICENSE for the python-oletools package:
  2 +
  3 +This license applies to the python-oletools package, apart from the thirdparty
  4 +folder which contains third-party files published with their own license.
  5 +
  6 +The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (http://www.decalage.info)
  7 +
  8 +All rights reserved.
  9 +
  10 +Redistribution and use in source and binary forms, with or without modification,
  11 +are permitted provided that the following conditions are met:
  12 +
  13 + * Redistributions of source code must retain the above copyright notice, this
  14 + list of conditions and the following disclaimer.
  15 + * Redistributions in binary form must reproduce the above copyright notice,
  16 + this list of conditions and the following disclaimer in the documentation
  17 + and/or other materials provided with the distribution.
  18 +
  19 +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
  20 +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  21 +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
  22 +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
  23 +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  24 +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  25 +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  26 +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
  27 +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  28 +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  29 +
  30 +
  31 +----------
  32 +
  33 +olevba contains modified source code from the officeparser project, published
  34 +under the following MIT License (MIT):
  35 +
  36 +officeparser is copyright (c) 2014 John William Davison
  37 +
  38 +Permission is hereby granted, free of charge, to any person obtaining a copy
  39 +of this software and associated documentation files (the "Software"), to deal
  40 +in the Software without restriction, including without limitation the rights
  41 +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
  42 +copies of the Software, and to permit persons to whom the Software is
  43 +furnished to do so, subject to the following conditions:
  44 +
  45 +The above copyright notice and this permission notice shall be included in all
  46 +copies or substantial portions of the Software.
  47 +
  48 +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  49 +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  50 +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
  51 +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  52 +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
  53 +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  54 +SOFTWARE.
oletools/README.html
@@ -17,13 +17,33 @@
17 17 </head>
18 18 <body>
19 19 <h1 id="python-oletools">python-oletools</h1>
20   -<p><a href="https://pypi.org/project/oletools/"><img src="https://img.shields.io/pypi/v/oletools.svg" alt="PyPI" /></a> <a href="https://travis-ci.org/decalage2/oletools"><img src="https://travis-ci.org/decalage2/oletools.svg?branch=master" alt="Build Status" /></a></p>
  20 +<p><a href="https://pypi.org/project/oletools/"><img src="https://img.shields.io/pypi/v/oletools.svg" alt="PyPI" /></a> <a href="https://travis-ci.org/decalage2/oletools"><img src="https://travis-ci.org/decalage2/oletools.svg?branch=master" alt="Build Status" /></a> <a href="https://saythanks.io/to/decalage2"><img src="https://img.shields.io/badge/Say%20Thanks-!-1EAEDB.svg" alt="Say Thanks!" /></a></p>
21 21 <p><a href="http://www.decalage.info/python/oletools">oletools</a> is a package of python tools to analyze <a href="http://en.wikipedia.org/wiki/Compound_File_Binary_Format">Microsoft OLE2 files</a> (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office documents or Outlook messages, mainly for malware analysis, forensics and debugging. It is based on the <a href="http://www.decalage.info/olefile">olefile</a> parser. See <a href="http://www.decalage.info/python/oletools" class="uri">http://www.decalage.info/python/oletools</a> for more info.</p>
22 22 <p><strong>Quick links:</strong> <a href="http://www.decalage.info/python/oletools">Home page</a> - <a href="https://github.com/decalage2/oletools/wiki/Install">Download/Install</a> - <a href="https://github.com/decalage2/oletools/wiki">Documentation</a> - <a href="https://github.com/decalage2/oletools/issues">Report Issues/Suggestions/Questions</a> - <a href="http://decalage.info/contact">Contact the Author</a> - <a href="https://github.com/decalage2/oletools">Repository</a> - <a href="https://twitter.com/decalage2">Updates on Twitter</a> <a href="https://github.com/decalage2/oletools/blob/master/cheatsheet/oletools_cheatsheet.pdf">Cheatsheet</a></p>
23 23 <p>Note: python-oletools is not related to OLETools published by BeCubed Software.</p>
24 24 <h2 id="news">News</h2>
25 25 <ul>
26   -<li><strong>2018-05-30 v0.53</strong>:
  26 +<li><strong>2019-05-22 v0.54.2</strong>:
  27 +<ul>
  28 +<li>bugfix release: fixed several issues related to encrypted documents and XLM/XLF Excel 4 macros</li>
  29 +<li>msoffcrypto-tool is now installed by default to handle encrypted documents</li>
  30 +<li>olevba and msodde now handle documents encrypted with common passwords such as 123, 1234, 4321, 12345, 123456, VelvetSweatShop automatically.</li>
  31 +</ul></li>
  32 +<li><strong>2019-04-04 v0.54</strong>:
  33 +<ul>
  34 +<li>olevba, msodde: added support for encrypted MS Office files</li>
  35 +<li>olevba: added detection and extraction of XLM/XLF Excel 4 macros (thanks to plugin_biff from Didier Stevens' oledump)</li>
  36 +<li>olevba, mraptor: added detection of VBA running Excel 4 macros</li>
  37 +<li>olevba: detect and display special characters such as backspace</li>
  38 +<li>olevba: colorized output showing suspicious keywords in the VBA code</li>
  39 +<li>olevba, mraptor: full Python 3 compatibility, no separate olevba3/mraptor3 anymore</li>
  40 +<li>olevba: improved handling of code pages and unicode</li>
  41 +<li>olevba: fixed a false-positive in VBA macro detection</li>
  42 +<li>rtfobj: improved OLE Package handling, improved Equation object detection</li>
  43 +<li>oleobj: added detection of external links to objects in OpenXML</li>
  44 +<li>replaced third party packages by PyPI dependencies</li>
  45 +</ul></li>
  46 +<li>2018-05-30 v0.53:
27 47 <ul>
28 48 <li>olevba and mraptor can now parse Word/PowerPoint 2007+ pure XML files (aka Flat OPC format)</li>
29 49 <li>improved support for VBA forms in olevba (oleform)</li>
@@ -66,7 +86,7 @@
66 86 <li><a href="https://github.com/decalage2/oletools/wiki/olemap">olemap</a>: to display a map of all the sectors in an OLE file.</li>
67 87 </ul>
68 88 <h2 id="projects-using-oletools">Projects using oletools:</h2>
69   -<p>oletools are used by a number of projects and online malware analysis services, including <a href="http://viper.li/">Viper</a>, <a href="https://remnux.org/">REMnux</a>, <a href="https://certsocietegenerale.github.io/fame/">FAME</a>, <a href="https://www.hybrid-analysis.com/">Hybrid-analysis.com</a>, <a href="https://www.document-analyzer.net/">Joe Sandbox</a>, <a href="https://sandbox.deepviz.com/">Deepviz</a>, <a href="https://github.com/lmco/laikaboss">Laika BOSS</a>, <a href="https://github.com/cuckoosandbox/cuckoo">Cuckoo Sandbox</a>, <a href="https://sandbox.anlyz.io/">Anlyz.io</a>, <a href="https://github.com/decalage2/ViperMonkey">ViperMonkey</a>, <a href="https://github.com/bontchev/pcodedmp">pcodedmp</a>, <a href="https://dridex.malwareconfig.com">dridex.malwareconfig.com</a>, <a href="https://github.com/countercept/snake">Snake</a>, <a href="https://github.com/cryps1s/DARKSURGEON">DARKSURGEON</a>, and probably <a href="https://www.virustotal.com">VirusTotal</a>. (Please <a href="(http://decalage.info/contact)">contact me</a> if you have or know a project using oletools)</p>
  89 +<p>oletools are used by a number of projects and online malware analysis services, including <a href="http://viper.li/">Viper</a>, <a href="https://remnux.org/">REMnux</a>, <a href="https://github.com/fireeye/flare-vm">FLARE-VM</a>, <a href="https://certsocietegenerale.github.io/fame/">FAME</a>, <a href="https://www.hybrid-analysis.com/">Hybrid-analysis.com</a>, <a href="https://www.document-analyzer.net/">Joe Sandbox</a>, <a href="https://sandbox.deepviz.com/">Deepviz</a>, <a href="https://github.com/lmco/laikaboss">Laika BOSS</a>, <a href="https://github.com/cuckoosandbox/cuckoo">Cuckoo Sandbox</a>, <a href="https://sandbox.anlyz.io/">Anlyz.io</a>, <a href="https://github.com/decalage2/ViperMonkey">ViperMonkey</a>, <a href="https://github.com/bontchev/pcodedmp">pcodedmp</a>, <a href="https://dridex.malwareconfig.com">dridex.malwareconfig.com</a>, <a href="https://github.com/countercept/snake">Snake</a>, <a href="https://github.com/cryps1s/DARKSURGEON">DARKSURGEON</a>, <a href="https://github.com/ctxis/CAPE">CAPE</a>, <a href="https://www.cse-cst.gc.ca/en/assemblyline">AssemblyLine</a>, <a href="https://malshare.io">malshare.io</a>, <a href="https://www.adlice.com/download/mrf/">Malware Repository Framework (MRF)</a>, <a href="https://github.com/Tigzy/malware-repo">malware-repo</a>, <a href="https://github.com/MalwareCantFly/Vba2Graph">Vba2Graph</a>, <a href="https://github.com/target/strelka">Strelka</a>, <a href="https://stoq.punchcyber.com/">stoQ</a>, <a href="https://yomi.yoroi.company">YOMI</a>, and probably <a href="https://www.virustotal.com">VirusTotal</a>. And quite a few <a href="https://github.com/search?q=oletools&amp;type=Repositories">other projects on GitHub</a>. (Please <a href="(http://decalage.info/contact)">contact me</a> if you have or know a project using oletools)</p>
70 90 <h2 id="download-and-install">Download and Install:</h2>
71 91 <p>The recommended way to download and install/update the <strong>latest stable release</strong> of oletools is to use <a href="https://pip.pypa.io/en/stable/installing/">pip</a>:</p>
72 92 <ul>
@@ -89,7 +109,7 @@
89 109 <p>The code is available in <a href="https://github.com/decalage2/oletools">a GitHub repository</a>. You may use it to submit enhancements using forks and pull requests.</p>
90 110 <h2 id="license">License</h2>
91 111 <p>This license applies to the python-oletools package, apart from the thirdparty folder which contains third-party files published with their own license.</p>
92   -<p>The python-oletools package is copyright (c) 2012-2018 Philippe Lagadec (http://www.decalage.info)</p>
  112 +<p>The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (http://www.decalage.info)</p>
93 113 <p>All rights reserved.</p>
94 114 <p>Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:</p>
95 115 <ul>
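The v0.54 news also mentions colorized output that highlights suspicious keywords in the VBA code. On an ANSI-capable terminal, a minimal way to do this is to wrap each matching keyword in escape codes, as sketched below. The keyword list is a small sample chosen for this example and is not olevba's detection list. The helper name `highlight` is hypothetical.

```python
import re

RED = "\x1b[31m"
RESET = "\x1b[0m"

# Illustrative sample only, not olevba's actual keyword tables
SUSPICIOUS = ["AutoOpen", "CreateObject", "Shell"]


def highlight(code, keywords=SUSPICIOUS, start=RED, end=RESET):
    """Wrap whole-word, case-insensitive keyword matches in color codes."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE)
    # m.group(1) keeps the original spelling found in the code
    return pattern.sub(lambda m: start + m.group(1) + end, code)


if __name__ == "__main__":
    print(highlight('Sub AutoOpen()\n  Shell "cmd.exe /c calc"\nEnd Sub'))
```

The `\b` word boundaries keep the sketch from flagging longer identifiers that merely start with a keyword, such as `ShellExecute`.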
oletools/README.rst
1 1 python-oletools
2 2 ===============
3 3  
4   -|PyPI| |Build Status|
  4 +|PyPI| |Build Status| |Say Thanks!|
5 5  
6 6 `oletools <http://www.decalage.info/python/oletools>`__ is a package of
7 7 python tools to analyze `Microsoft OLE2
@@ -29,7 +29,35 @@ Software.
29 29 News
30 30 ----
31 31  
32   -- **2018-05-30 v0.53**:
  32 +- **2019-05-22 v0.54.2**:
  33 +
  34 + - bugfix release: fixed several issues related to encrypted
  35 + documents and XLM/XLF Excel 4 macros
  36 + - msoffcrypto-tool is now installed by default to handle encrypted
  37 + documents
  38 + - olevba and msodde now handle documents encrypted with common
  39 + passwords such as 123, 1234, 4321, 12345, 123456, VelvetSweatShop
  40 + automatically.
  41 +
  42 +- **2019-04-04 v0.54**:
  43 +
  44 + - olevba, msodde: added support for encrypted MS Office files
  45 + - olevba: added detection and extraction of XLM/XLF Excel 4 macros
  46 + (thanks to plugin_biff from Didier Stevens' oledump)
  47 + - olevba, mraptor: added detection of VBA running Excel 4 macros
  48 + - olevba: detect and display special characters such as backspace
  49 + - olevba: colorized output showing suspicious keywords in the VBA
  50 + code
  51 + - olevba, mraptor: full Python 3 compatibility, no separate
  52 + olevba3/mraptor3 anymore
  53 + - olevba: improved handling of code pages and unicode
  54 + - olevba: fixed a false-positive in VBA macro detection
  55 + - rtfobj: improved OLE Package handling, improved Equation object
  56 + detection
  57 + - oleobj: added detection of external links to objects in OpenXML
  58 + - replaced third party packages by PyPI dependencies
  59 +
  60 +- 2018-05-30 v0.53:
33 61  
34 62 - olevba and mraptor can now parse Word/PowerPoint 2007+ pure XML
35 63 files (aka Flat OPC format)
@@ -115,6 +143,7 @@ Projects using oletools:
115 143 oletools are used by a number of projects and online malware analysis
116 144 services, including `Viper <http://viper.li/>`__,
117 145 `REMnux <https://remnux.org/>`__,
  146 +`FLARE-VM <https://github.com/fireeye/flare-vm>`__,
118 147 `FAME <https://certsocietegenerale.github.io/fame/>`__,
119 148 `Hybrid-analysis.com <https://www.hybrid-analysis.com/>`__, `Joe
120 149 Sandbox <https://www.document-analyzer.net/>`__,
@@ -126,10 +155,21 @@ Sandbox <https://github.com/cuckoosandbox/cuckoo>`__,
126 155 `pcodedmp <https://github.com/bontchev/pcodedmp>`__,
127 156 `dridex.malwareconfig.com <https://dridex.malwareconfig.com>`__,
128 157 `Snake <https://github.com/countercept/snake>`__,
129   -`DARKSURGEON <https://github.com/cryps1s/DARKSURGEON>`__, and probably
130   -`VirusTotal <https://www.virustotal.com>`__. (Please `contact
131   -me <(http://decalage.info/contact)>`__ if you have or know a project
132   -using oletools)
  158 +`DARKSURGEON <https://github.com/cryps1s/DARKSURGEON>`__,
  159 +`CAPE <https://github.com/ctxis/CAPE>`__,
  160 +`AssemblyLine <https://www.cse-cst.gc.ca/en/assemblyline>`__,
  161 +`malshare.io <https://malshare.io>`__, `Malware Repository Framework
  162 +(MRF) <https://www.adlice.com/download/mrf/>`__,
  163 +`malware-repo <https://github.com/Tigzy/malware-repo>`__,
  164 +`Vba2Graph <https://github.com/MalwareCantFly/Vba2Graph>`__,
  165 +`Strelka <https://github.com/target/strelka>`__,
  166 +`stoQ <https://stoq.punchcyber.com/>`__,
  167 +`YOMI <https://yomi.yoroi.company>`__, and probably
  168 +`VirusTotal <https://www.virustotal.com>`__. And quite a few `other
  169 +projects on
  170 +GitHub <https://github.com/search?q=oletools&type=Repositories>`__.
  171 +(Please `contact me <(http://decalage.info/contact)>`__ if you have or
  172 +know a project using oletools)
133 173  
134 174 Download and Install:
135 175 ---------------------
... ... @@ -186,7 +226,7 @@ This license applies to the python-oletools package, apart from the
186 226 thirdparty folder which contains third-party files published with their
187 227 own license.
188 228  
189   -The python-oletools package is copyright (c) 2012-2018 Philippe Lagadec
  229 +The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec
190 230 (http://www.decalage.info)
191 231  
192 232 All rights reserved.
... ... @@ -243,3 +283,5 @@ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
243 283 :target: https://pypi.org/project/oletools/
244 284 .. |Build Status| image:: https://travis-ci.org/decalage2/oletools.svg?branch=master
245 285 :target: https://travis-ci.org/decalage2/oletools
  286 +.. |Say Thanks!| image:: https://img.shields.io/badge/Say%20Thanks-!-1EAEDB.svg
  287 + :target: https://saythanks.io/to/decalage2
... ...
oletools/common/clsid.py
... ... @@ -12,7 +12,7 @@ http://www.decalage.info/python/oletools
12 12  
13 13 #=== LICENSE ==================================================================
14 14  
15   -# oletools are copyright (c) 2018 Philippe Lagadec (http://www.decalage.info)
  15 +# oletools are copyright (c) 2018-2019 Philippe Lagadec (http://www.decalage.info)
16 16 # All rights reserved.
17 17 #
18 18 # Redistribution and use in source and binary forms, with or without modification,
... ... @@ -43,7 +43,7 @@ http://www.decalage.info/python/oletools
43 43 # 2018-04-18 PL: - added known-bad CLSIDs from Cuckoo sandbox (issue #290)
44 44 # 2018-05-08 PL: - added more CLSIDs (issues #299, #304), merged and sorted
45 45  
46   -__version__ = '0.54dev3'
  46 +__version__ = '0.54'
47 47  
48 48  
49 49 # REFERENCES:
... ... @@ -137,9 +137,23 @@ KNOWN_CLSIDS = {
137 137 '85131630-480C-11D2-B1F9-00C04F86C324': 'scrrun.dll - JS File Host Encode Object (ProgID: JSFile.HostEncode)',
138 138 '85131631-480C-11D2-B1F9-00C04F86C324': 'scrrun.dll - VBS File Host Encode Object (ProgID: VBSFile.HostEncode)',
139 139 '8627E73B-B5AA-4643-A3B0-570EDA17E3E7': 'UmOutlookAddin.ButtonBar (potential exploit document CVE-2016-0042 / MS16-014)',
  140 + '88D969E5-F192-11D4-A65F-0040963251E5': 'Msxml2.DOMDocument.5.0',
  141 + '88D969E9-F192-11D4-A65F-0040963251E5': 'Msxml2.DSOControl.5.0',
  142 + '88D969E6-F192-11D4-A65F-0040963251E5': 'Msxml2.FreeThreadedDOMDocument.5.0',
  143 + '88D969F5-F192-11D4-A65F-0040963251E5': 'Msxml2.MXDigitalSignature.5.0',
  144 + '88D969F0-F192-11D4-A65F-0040963251E5': 'Msxml2.MXHTMLWriter.5.0',
  145 + '88D969F1-F192-11D4-A65F-0040963251E5': 'Msxml2.MXNamespaceManager.5.0',
  146 + '88D969EF-F192-11D4-A65F-0040963251E5': 'Msxml2.MXXMLWriter.5.0',
  147 + '88D969EE-F192-11D4-A65F-0040963251E5': 'Msxml2.SAXAttributes.5.0',
  148 + '88D969EC-8B8B-4C3D-859E-AF6CD158BE0F': 'Msxml2.SAXXMLReader.5.0',
  149 + '88D969EB-F192-11D4-A65F-0040963251E5': 'Msxml2.ServerXMLHTTP.5.0',
  150 + '88D969EA-F192-11D4-A65F-0040963251E5': 'Msxml2.XMLHTTP.5.0',
  151 + '88D969E7-F192-11D4-A65F-0040963251E5': 'Msxml2.XMLSchemaCache.5.0',
  152 + '88D969E8-F192-11D4-A65F-0040963251E5': 'Msxml2.XSLTemplate.5.0',
140 153 '8E75D913-3D21-11D2-85C4-080009A0C626': 'AutoCAD 2004-2006 Document',
141 154 '9181DC5F-E07D-418A-ACA6-8EEA1ECB8E9E': 'MSCOMCTL.TreeCtrl (may trigger CVE-2012-0158)',
142 155 '975797FC-4E2A-11D0-B702-00C04FD8DBF7': 'Loads ELSEXT.DLL (Known Related to CVE-2015-6128)',
  156 + '978C9E23-D4B0-11CE-BF2D-00AA003F40D0': 'Microsoft Forms 2.0 Label (Forms.Label.1)',
143 157 '996BF5E0-8044-4650-ADEB-0B013914E99C': 'MSCOMCTL.ListViewCtrl (may trigger CVE-2012-0158)',
144 158 'A08A033D-1A75-4AB6-A166-EAD02F547959': 'otkloadr WRAssembly Object (can be used to bypass ASLR after triggering an exploit)',
145 159 'B54F3741-5B07-11CF-A4B0-00AA004A55E8': 'vbscript.dll - VB Script Language (ProgID: VBS, VBScript)',
... ...
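The `KNOWN_CLSIDS` table extended above is keyed by upper-case, brace-less CLSID strings. Below is a minimal sketch of a lookup against such a table. `describe_clsid` is a hypothetical helper, not part of oletools, and the two entries are copied from this diff. Stripping braces and upper-casing the input is an assumption about the forms in which a caller might receive a CLSID.

```python
# Hypothetical lookup helper, not the oletools API.
# Two entries copied from the KNOWN_CLSIDS additions in this diff.
KNOWN_CLSIDS = {
    '88D969EA-F192-11D4-A65F-0040963251E5': 'Msxml2.XMLHTTP.5.0',
    '978C9E23-D4B0-11CE-BF2D-00AA003F40D0': 'Microsoft Forms 2.0 Label (Forms.Label.1)',
}


def describe_clsid(clsid):
    """Return a description for a CLSID string, or 'unknown CLSID'."""
    # Assumption: callers may pass '{...}' or lower-case forms, so strip
    # whitespace and braces, then upper-case to match the table's key format.
    key = clsid.strip().strip('{}').upper()
    return KNOWN_CLSIDS.get(key, 'unknown CLSID')


print(describe_clsid('{88d969ea-f192-11d4-a65f-0040963251e5}'))  # Msxml2.XMLHTTP.5.0
```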
oletools/common/codepages.py 0 → 100644
  1 +"""
  2 +codepages.py
  3 +
  4 +codepages is a python module to map code pages (numbers) to Python codecs,
  5 +in order to decode bytes to unicode.
  6 +It also provides the name/description of code pages.
  7 +
  8 +Author: Philippe Lagadec - http://www.decalage.info
  9 +License: BSD, see source code or documentation
  10 +
  11 +codepages is part of the python-oletools package:
  12 +http://www.decalage.info/python/oletools
  13 +"""
  14 +
  15 +# === LICENSE ==================================================================
  16 +
  17 +# codepages is copyright (c) 2018-2019 Philippe Lagadec (http://www.decalage.info)
  18 +# All rights reserved.
  19 +#
  20 +# Redistribution and use in source and binary forms, with or without modification,
  21 +# are permitted provided that the following conditions are met:
  22 +#
  23 +# * Redistributions of source code must retain the above copyright notice, this
  24 +# list of conditions and the following disclaimer.
  25 +# * Redistributions in binary form must reproduce the above copyright notice,
  26 +# this list of conditions and the following disclaimer in the documentation
  27 +# and/or other materials provided with the distribution.
  28 +#
  29 +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
  30 +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  31 +# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
  32 +# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
  33 +# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  34 +# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  35 +# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  36 +# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
  37 +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  38 +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  39 +
  40 +
  41 +# -----------------------------------------------------------------------------
  42 +# CHANGELOG:
  43 +# 2018-12-13 v0.54 PL: - first version
  44 +# 2019-01-30 PL: - added a few code pages from xlrd
  45 +
  46 +__version__ = '0.54'
  47 +
  48 +# -----------------------------------------------------------------------------
  49 +# TODO:
  50 +# TODO: check also http://www.aivosto.com/articles/charsets-codepages.html
  51 +# TODO: https://en.wikipedia.org/wiki/Code_page
  52 +
  53 +# -----------------------------------------------------------------------------
  54 +# REFERENCES:
  55 +# - https://docs.microsoft.com/en-gb/windows/desktop/Intl/code-page-identifiers
  56 +
  57 +
  58 +# --- IMPORTS -----------------------------------------------------------------
  59 +
  60 +import codecs
  61 +
  62 +# === CONSTANTS ===============================================================
  63 +
  64 +# Code page names from https://docs.microsoft.com/en-gb/windows/desktop/Intl/code-page-identifiers
  65 +# Retrieved on the 2018-12-13
  66 +# How it was converted to Python:
  67 +# 1) copy the table data (3 columns) from browser into Excel
  68 +# 2) use the following formula to concatenate 1st and 3rd columns: =A1 & ": " & "'" & C1 & "',"
  69 +# 3) copy from Excel into Python
  70 +
  71 +CODEPAGE_NAME = {
  72 + 37: 'IBM EBCDIC US-Canada',
  73 + 437: 'OEM United States',
  74 + 500: 'IBM EBCDIC International',
  75 + 708: 'Arabic (ASMO 708)',
  76 + 709: 'Arabic (ASMO-449+, BCON V4)',
  77 + 710: 'Arabic - Transparent Arabic',
  78 + 720: 'Arabic (Transparent ASMO); Arabic (DOS)',
  79 + 737: 'OEM Greek (formerly 437G); Greek (DOS)',
  80 + 775: 'OEM Baltic; Baltic (DOS)',
  81 + 850: 'OEM Multilingual Latin 1; Western European (DOS)',
  82 + 852: 'OEM Latin 2; Central European (DOS)',
  83 + 855: 'OEM Cyrillic (primarily Russian)',
  84 + 857: 'OEM Turkish; Turkish (DOS)',
  85 + 858: 'OEM Multilingual Latin 1 + Euro symbol',
  86 + 860: 'OEM Portuguese; Portuguese (DOS)',
  87 + 861: 'OEM Icelandic; Icelandic (DOS)',
  88 + 862: 'OEM Hebrew; Hebrew (DOS)',
  89 + 863: 'OEM French Canadian; French Canadian (DOS)',
  90 + 864: 'OEM Arabic; Arabic (864)',
  91 + 865: 'OEM Nordic; Nordic (DOS)',
  92 + 866: 'OEM Russian; Cyrillic (DOS)',
  93 + 869: 'OEM Modern Greek; Greek, Modern (DOS)',
  94 + 870: 'IBM EBCDIC Multilingual/ROECE (Latin 2); IBM EBCDIC Multilingual Latin 2',
  95 + 874: 'ANSI/OEM Thai (ISO 8859-11); Thai (Windows)',
  96 + 875: 'IBM EBCDIC Greek Modern',
  97 + 932: 'ANSI/OEM Japanese; Japanese (Shift-JIS)',
  98 + 936: 'ANSI/OEM Simplified Chinese (PRC, Singapore); Chinese Simplified (GB2312)',
  99 + 949: 'ANSI/OEM Korean (Unified Hangul Code)',
  100 + 950: 'ANSI/OEM Traditional Chinese (Taiwan; Hong Kong SAR, PRC); Chinese Traditional (Big5)',
  101 + 1026: 'IBM EBCDIC Turkish (Latin 5)',
  102 + 1047: 'IBM EBCDIC Latin 1/Open System',
  103 + 1140: 'IBM EBCDIC US-Canada (037 + Euro symbol); IBM EBCDIC (US-Canada-Euro)',
  104 + 1141: 'IBM EBCDIC Germany (20273 + Euro symbol); IBM EBCDIC (Germany-Euro)',
  105 + 1142: 'IBM EBCDIC Denmark-Norway (20277 + Euro symbol); IBM EBCDIC (Denmark-Norway-Euro)',
  106 + 1143: 'IBM EBCDIC Finland-Sweden (20278 + Euro symbol); IBM EBCDIC (Finland-Sweden-Euro)',
  107 + 1144: 'IBM EBCDIC Italy (20280 + Euro symbol); IBM EBCDIC (Italy-Euro)',
  108 + 1145: 'IBM EBCDIC Latin America-Spain (20284 + Euro symbol); IBM EBCDIC (Spain-Euro)',
  109 + 1146: 'IBM EBCDIC United Kingdom (20285 + Euro symbol); IBM EBCDIC (UK-Euro)',
  110 + 1147: 'IBM EBCDIC France (20297 + Euro symbol); IBM EBCDIC (France-Euro)',
  111 + 1148: 'IBM EBCDIC International (500 + Euro symbol); IBM EBCDIC (International-Euro)',
  112 + 1149: 'IBM EBCDIC Icelandic (20871 + Euro symbol); IBM EBCDIC (Icelandic-Euro)',
  113 + 1200: 'Unicode UTF-16, little endian byte order (BMP of ISO 10646); available only to managed applications',
  114 + 1201: 'Unicode UTF-16, big endian byte order; available only to managed applications',
  115 + 1250: 'ANSI Central European; Central European (Windows)',
  116 + 1251: 'ANSI Cyrillic; Cyrillic (Windows)',
  117 + 1252: 'ANSI Latin 1; Western European (Windows)',
  118 + 1253: 'ANSI Greek; Greek (Windows)',
  119 + 1254: 'ANSI Turkish; Turkish (Windows)',
  120 + 1255: 'ANSI Hebrew; Hebrew (Windows)',
  121 + 1256: 'ANSI Arabic; Arabic (Windows)',
  122 + 1257: 'ANSI Baltic; Baltic (Windows)',
  123 + 1258: 'ANSI/OEM Vietnamese; Vietnamese (Windows)',
  124 + 1361: 'Korean (Johab)',
  125 + 10000: 'MAC Roman; Western European (Mac)',
  126 + 10001: 'Japanese (Mac)',
  127 + 10002: 'MAC Traditional Chinese (Big5); Chinese Traditional (Mac)',
  128 + 10003: 'Korean (Mac)',
  129 + 10004: 'Arabic (Mac)',
  130 + 10005: 'Hebrew (Mac)',
  131 + 10006: 'Greek (Mac)',
  132 + 10007: 'Cyrillic (Mac)',
  133 + 10008: 'MAC Simplified Chinese (GB 2312); Chinese Simplified (Mac)',
  134 + 10010: 'Romanian (Mac)',
  135 + 10017: 'Ukrainian (Mac)',
  136 + 10021: 'Thai (Mac)',
  137 + 10029: 'MAC Latin 2; Central European (Mac)',
  138 + 10079: 'Icelandic (Mac)',
  139 + 10081: 'Turkish (Mac)',
  140 + 10082: 'Croatian (Mac)',
  141 + 12000: 'Unicode UTF-32, little endian byte order; available only to managed applications',
  142 + 12001: 'Unicode UTF-32, big endian byte order; available only to managed applications',
  143 + 20000: 'CNS Taiwan; Chinese Traditional (CNS)',
  144 + 20001: 'TCA Taiwan',
  145 + 20002: 'Eten Taiwan; Chinese Traditional (Eten)',
  146 + 20003: 'IBM5550 Taiwan',
  147 + 20004: 'TeleText Taiwan',
  148 + 20005: 'Wang Taiwan',
  149 + 20105: 'IA5 (IRV International Alphabet No. 5, 7-bit); Western European (IA5)',
  150 + 20106: 'IA5 German (7-bit)',
  151 + 20107: 'IA5 Swedish (7-bit)',
  152 + 20108: 'IA5 Norwegian (7-bit)',
  153 + 20127: 'US-ASCII (7-bit)',
  154 + 20261: 'T.61',
  155 + 20269: 'ISO 6937 Non-Spacing Accent',
  156 + 20273: 'IBM EBCDIC Germany',
  157 + 20277: 'IBM EBCDIC Denmark-Norway',
  158 + 20278: 'IBM EBCDIC Finland-Sweden',
  159 + 20280: 'IBM EBCDIC Italy',
  160 + 20284: 'IBM EBCDIC Latin America-Spain',
  161 + 20285: 'IBM EBCDIC United Kingdom',
  162 + 20290: 'IBM EBCDIC Japanese Katakana Extended',
  163 + 20297: 'IBM EBCDIC France',
  164 + 20420: 'IBM EBCDIC Arabic',
  165 + 20423: 'IBM EBCDIC Greek',
  166 + 20424: 'IBM EBCDIC Hebrew',
  167 + 20833: 'IBM EBCDIC Korean Extended',
  168 + 20838: 'IBM EBCDIC Thai',
  169 + 20866: 'Russian (KOI8-R); Cyrillic (KOI8-R)',
  170 + 20871: 'IBM EBCDIC Icelandic',
  171 + 20880: 'IBM EBCDIC Cyrillic Russian',
  172 + 20905: 'IBM EBCDIC Turkish',
  173 + 20924: 'IBM EBCDIC Latin 1/Open System (1047 + Euro symbol)',
  174 + 20932: 'Japanese (JIS 0208-1990 and 0212-1990)',
  175 + 20936: 'Simplified Chinese (GB2312); Chinese Simplified (GB2312-80)',
  176 + 20949: 'Korean Wansung',
  177 + 21025: 'IBM EBCDIC Cyrillic Serbian-Bulgarian',
  178 + 21027: '(deprecated)',
  179 + 21866: 'Ukrainian (KOI8-U); Cyrillic (KOI8-U)',
  180 + 28591: 'ISO 8859-1 Latin 1; Western European (ISO)',
  181 + 28592: 'ISO 8859-2 Central European; Central European (ISO)',
  182 + 28593: 'ISO 8859-3 Latin 3',
  183 + 28594: 'ISO 8859-4 Baltic',
  184 + 28595: 'ISO 8859-5 Cyrillic',
  185 + 28596: 'ISO 8859-6 Arabic',
  186 + 28597: 'ISO 8859-7 Greek',
  187 + 28598: 'ISO 8859-8 Hebrew; Hebrew (ISO-Visual)',
  188 + 28599: 'ISO 8859-9 Turkish',
  189 + 28603: 'ISO 8859-13 Estonian',
  190 + 28605: 'ISO 8859-15 Latin 9',
  191 + 29001: 'Europa 3',
  192 + 38598: 'ISO 8859-8 Hebrew; Hebrew (ISO-Logical)',
  193 + 50220: 'ISO 2022 Japanese with no halfwidth Katakana; Japanese (JIS)',
  194 + 50221: 'ISO 2022 Japanese with halfwidth Katakana; Japanese (JIS-Allow 1 byte Kana)',
  195 + 50222: 'ISO 2022 Japanese JIS X 0201-1989; Japanese (JIS-Allow 1 byte Kana - SO/SI)',
  196 + 50225: 'ISO 2022 Korean',
  197 + 50227: 'ISO 2022 Simplified Chinese; Chinese Simplified (ISO 2022)',
  198 + 50229: 'ISO 2022 Traditional Chinese',
  199 + 50930: 'EBCDIC Japanese (Katakana) Extended',
  200 + 50931: 'EBCDIC US-Canada and Japanese',
  201 + 50933: 'EBCDIC Korean Extended and Korean',
  202 + 50935: 'EBCDIC Simplified Chinese Extended and Simplified Chinese',
  203 + 50936: 'EBCDIC Simplified Chinese',
  204 + 50937: 'EBCDIC US-Canada and Traditional Chinese',
  205 + 50939: 'EBCDIC Japanese (Latin) Extended and Japanese',
  206 + 51932: 'EUC Japanese',
  207 + 51936: 'EUC Simplified Chinese; Chinese Simplified (EUC)',
  208 + 51949: 'EUC Korean',
  209 + 51950: 'EUC Traditional Chinese',
  210 + 52936: 'HZ-GB2312 Simplified Chinese; Chinese Simplified (HZ)',
  211 + 54936: 'Windows XP and later: GB18030 Simplified Chinese (4 byte); Chinese Simplified (GB18030)',
  212 + 57002: 'ISCII Devanagari',
  213 + 57003: 'ISCII Bangla',
  214 + 57004: 'ISCII Tamil',
  215 + 57005: 'ISCII Telugu',
  216 + 57006: 'ISCII Assamese',
  217 + 57007: 'ISCII Odia',
  218 + 57008: 'ISCII Kannada',
  219 + 57009: 'ISCII Malayalam',
  220 + 57010: 'ISCII Gujarati',
  221 + 57011: 'ISCII Punjabi',
  222 + 65000: 'Unicode (UTF-7)',
  223 + 65001: 'Unicode (UTF-8)',
  224 +}
  225 +
  226 +
  227 +# Mapping from codepages to Python codecs, when 'cpXXX' does not work
  228 +# (inspired by http://stackoverflow.com/questions/1592925/decoding-mac-os-text-in-python)
  229 +CODEPAGE_TO_CODEC = {
  230 + 37: 'cp037',
  231 + 708: 'arabic', # not found: Arabic (ASMO 708) => arabic = iso-8859-6
  232 + 709: 'arabic', # not found: Arabic (ASMO-449+, BCON V4) => arabic = iso-8859-6
  233 + 710: 'arabic', # not found: Arabic - Transparent Arabic => arabic = iso-8859-6
  234 + 870: 'latin2', # IBM EBCDIC Multilingual/ROECE (Latin 2); IBM EBCDIC Multilingual Latin 2
  235 + 1047: 'latin1', # IBM EBCDIC Latin 1/Open System
  236 + 1141: 'cp273', # IBM EBCDIC Germany (20273 + Euro symbol); IBM EBCDIC (Germany-Euro)
  237 + 1200: 'utf_16_le', # Unicode UTF-16, little endian byte order (BMP of ISO 10646); available only to managed applications
  238 + 1201: 'utf_16_be', # Unicode UTF-16, big endian byte order; available only to managed applications
  239 +
  240 + 10000: 'mac-roman',
  241 + 10001: 'shiftjis', # not found: 'mac-shift-jis',
  242 + 10002: 'big5', # not found: 'mac-big5',
  243 + 10003: 'ascii', # nothing appropriate found: 'mac-hangul',
  244 + 10004: 'mac-arabic',
  245 + 10005: 'hebrew', # not found: 'mac-hebrew',
  246 + 10006: 'mac-greek',
  247 + #10007: 'ascii', # nothing appropriate found: 'mac-russian',
  248 + 10007: 'mac_cyrillic', # guess (from xlrd)
  249 + 10008: 'gb2312', # not found: 'mac-gb2312',
  250 + 10021: 'thai', # not found: mac-thai',
  251 + #10029: 'maccentraleurope', # not found: 'mac-east europe',
  252 + 10029: 'mac_latin2', # guess (from xlrd)
  253 + 10079: 'mac_iceland', # guess (from xlrd)
  254 + 10081: 'mac-turkish',
  255 +
  256 + 12000: 'utf_32_le', # Unicode UTF-32, little endian byte order
  257 + 12001: 'utf_32_be', # Unicode UTF-32, big endian byte order
  258 +
  259 + 20127: 'ascii',
  260 +
  261 + 28591: 'latin1',
  262 + 28592: 'iso8859_2',
  263 + 28593: 'iso8859_3',
  264 + 28594: 'iso8859_4',
  265 + 28595: 'iso8859_5',
  266 + 28596: 'iso8859_6',
  267 + 28597: 'iso8859_7',
  268 + 28598: 'iso8859_8',
  269 + 28599: 'iso8859_9',
  270 + 28603: 'iso8859_13',
  271 + 28605: 'iso8859_15',
  272 +
  273 + 32768: 'mac_roman', # from xlrd
  274 + 32769: 'cp1252', # from xlrd
  275 + 38598: 'iso8859_8',
  276 +
  277 + 65000: 'utf7',
  278 + 65001: 'utf8',
  279 +}
  280 +
  281 +
  282 +# === FUNCTIONS ==============================================================
  283 +
  284 +def codepage2codec(codepage):
  285 + """
  286 + convert a codepage number to a Python codec.
  287 + If the corresponding codec cannot be found, returns "utf8" by default.
  288 +
  289 + :param codepage: int, code page number
  290 + :return: str, Python codec name
  291 + """
  292 + if codepage in CODEPAGE_TO_CODEC:
  293 + codec = CODEPAGE_TO_CODEC[codepage]
  294 + else:
  295 + codec = 'cp%d' % codepage
  296 + try:
  297 + codecs.lookup(codec)
  298 + except LookupError:
  299 + #log.error('Codec not found for code page %d, using UTF-8 as fallback.' % codepage)
  300 + codec = 'utf8'
  301 + return codec
  302 +
  303 +
  304 +def get_codepage_name(codepage):
  305 + """
  306 + return the name of a codepage based on its number
  307 + :param codepage: int, codepage number
  308 + :return: str, codepage name
  309 + """
  310 + return CODEPAGE_NAME.get(codepage, 'Unknown code page')
  311 +
  312 +
  313 +# === MAIN: TESTS ============================================================
  314 +
  315 +if __name__ == '__main__':
  316 + for cp in sorted(CODEPAGE_NAME.keys()):
  317 + print('Code Page: %d => codec: %s - %s' % (cp, codepage2codec(cp), CODEPAGE_NAME[cp]))
0 318 \ No newline at end of file
... ...
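`codepage2codec` consults `CODEPAGE_TO_CODEC` first, falls back to the name `'cp%d'`, and returns `'utf8'` when no codec can be found. Below is a standalone sketch of that flow, using a three-entry excerpt of the table so it runs without oletools. This sketch validates every candidate name with `codecs.lookup`. The helper `decode_with_codepage` and its `errors='replace'` policy are hypothetical additions, not part of the module.

```python
import codecs

# Excerpt of CODEPAGE_TO_CODEC from the diff; other numbers fall through to 'cpNNN'.
CODEPAGE_TO_CODEC = {
    1200: 'utf_16_le',
    10000: 'mac-roman',
    65001: 'utf8',
}


def codepage2codec(codepage):
    """Map a code page number to a Python codec name, defaulting to 'utf8'."""
    codec = CODEPAGE_TO_CODEC.get(codepage, 'cp%d' % codepage)
    try:
        codecs.lookup(codec)  # raises LookupError for unknown codec names
    except LookupError:
        codec = 'utf8'
    return codec


def decode_with_codepage(data, codepage):
    """Hypothetical helper: decode bytes using the codec for `codepage`."""
    # errors='replace' is a choice of this sketch, so undecodable bytes never raise
    return data.decode(codepage2codec(codepage), errors='replace')


print(codepage2codec(1252), codepage2codec(12345))  # cp1252 utf8
print(decode_with_codepage(b'caf\x8e', 10000))       # café (0x8E is e-acute in Mac Roman)
```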
oletools/common/errors.py
... ... @@ -4,10 +4,42 @@ Errors used in several tools to avoid duplication
4 4 .. codeauthor:: Intra2net AG <info@intra2net.com>
5 5 """
6 6  
7   -class FileIsEncryptedError(ValueError):
  7 +class CryptoErrorBase(ValueError):
  8 + """Base class for crypto-based exceptions."""
  9 + pass
  10 +
  11 +
  12 +class CryptoLibNotImported(CryptoErrorBase, ImportError):
  13 + """Exception thrown if msoffcrypto is needed but could not be imported."""
  14 +
  15 + def __init__(self):
  16 + super(CryptoLibNotImported, self).__init__(
  17 + 'msoffcrypto-tool is not installed. Please run "pip install msoffcrypto-tool" or see https://github.com/nolze/msoffcrypto-tool')
  18 +
  19 +
  20 +class UnsupportedEncryptionError(CryptoErrorBase):
8 21 """Exception thrown if file is encrypted and cannot deal with it."""
9   - # see also: same class in olevba[3] and record_base
10 22 def __init__(self, filename=None):
11   - super(FileIsEncryptedError, self).__init__(
  23 + super(UnsupportedEncryptionError, self).__init__(
12 24 'Office file {}is encrypted, not yet supported'
13 25 .format('' if filename is None else filename + ' '))
  26 +
  27 +
  28 +class WrongEncryptionPassword(CryptoErrorBase):
  29 + """Exception thrown if encryption could be handled but the passwords are wrong."""
  30 + def __init__(self, filename=None):
  31 + super(WrongEncryptionPassword, self).__init__(
  32 + 'Given passwords could not decrypt office file{}, use option -p to specify the password'
  33 + .format('' if filename is None else ' ' + filename))
  34 +
  35 +
  36 +class MaxCryptoNestingReached(CryptoErrorBase):
  37 + """
  38 + Exception thrown if decryption is too deeply layered.
  39 +
  40 + (...or the decryption code runs into an infinite loop)
  41 + """
  42 + def __init__(self, n_layers, filename=None):
  43 + super(MaxCryptoNestingReached, self).__init__(
  44 + 'Encountered more than {} layers of encryption for office file{}'
  45 + .format(n_layers, '' if filename is None else ' ' + filename))
... ...
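All the new exceptions derive from `CryptoErrorBase`, which is itself a `ValueError`, so a caller can handle every decryption failure with a single `except` clause. The sketch below re-declares two of the classes with the same messages. It simulates nested encryption as a list of per-layer passwords. `unwrap` and `analyze` are hypothetical. They only mimic the nesting-depth and wrong-password checks described in `crypto.py`; nothing is actually decrypted.

```python
MAX_NESTING_DEPTH = 10  # same value as crypto.MAX_NESTING_DEPTH in this diff


class CryptoErrorBase(ValueError):
    """Base class for crypto-based exceptions."""


class WrongEncryptionPassword(CryptoErrorBase):
    def __init__(self, filename=None):
        super(WrongEncryptionPassword, self).__init__(
            'Given passwords could not decrypt office file{}, use option -p '
            'to specify the password'
            .format('' if filename is None else ' ' + filename))


class MaxCryptoNestingReached(CryptoErrorBase):
    def __init__(self, n_layers, filename=None):
        super(MaxCryptoNestingReached, self).__init__(
            'Encountered more than {} layers of encryption for office file{}'
            .format(n_layers, '' if filename is None else ' ' + filename))


def unwrap(layers, passwords, filename, nesting=0):
    """Simulated decryption: `layers` lists each layer's password, outermost first."""
    if not layers:
        return nesting  # fully decrypted: number of layers removed
    if nesting >= MAX_NESTING_DEPTH:
        raise MaxCryptoNestingReached(nesting, filename)
    if layers[0] not in passwords:
        raise WrongEncryptionPassword(filename)
    return unwrap(layers[1:], passwords, filename, nesting + 1)


def analyze(layers, passwords, filename='sample.xls'):
    try:
        return 'decrypted {} layer(s)'.format(unwrap(layers, passwords, filename))
    except CryptoErrorBase as exc:  # one handler covers every crypto failure
        return 'crypto error: {}'.format(exc)


print(analyze(['secret'], ['secret']))  # decrypted 1 layer(s)
print(analyze(['secret'], ['guess']))
```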
oletools/common/log_helper/_json_formatter.py
... ... @@ -13,8 +13,13 @@ class JsonFormatter(logging.Formatter):
13 13 Since we don't buffer messages, we always prepend messages with a comma to make
14 14 the output JSON-compatible. The only exception is when printing the first line,
15 15 so we need to keep track of it.
  16 +
  17 + We assume that all input comes from the OletoolsLoggerAdapter which
  18 + ensures that there is a `type` field in the record. Otherwise we would
  19 + have to add a try-except around the access to `record.type`.
16 20 """
17   - json_dict = dict(msg=record.msg, level=record.levelname)
  21 + json_dict = dict(msg=record.msg.replace('\n', ' '), level=record.levelname)
  22 + json_dict['type'] = record.type
18 23 formatted_message = ' ' + json.dumps(json_dict)
19 24  
20 25 if self._is_first_line:
... ...
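With this change, every JSON record carries a `type` field, and newlines in `msg` are flattened to spaces. Below is a standalone sketch of a formatter with that behavior. The class name is hypothetical. The comma prefix on every record after the first is inferred from the docstring, not copied from the elided part of the code. Instead of relying on the adapter to set `type`, the sketch uses `getattr` with a `'msg'` default, a variant of the try-except the docstring mentions.

```python
import json
import logging


class SketchJsonFormatter(logging.Formatter):
    """Hypothetical formatter: one JSON object per record, comma-separated."""

    def __init__(self):
        super(SketchJsonFormatter, self).__init__()
        self._is_first_line = True

    def format(self, record):
        json_dict = dict(msg=record.msg.replace('\n', ' '),
                         level=record.levelname)
        # Defensive default in case the record did not come via the adapter
        json_dict['type'] = getattr(record, 'type', 'msg')
        formatted_message = ' ' + json.dumps(json_dict)
        if self._is_first_line:
            self._is_first_line = False
            return formatted_message
        # Every later record is prefixed with a comma, so the output can sit inside a JSON list
        return ',' + formatted_message


record = logging.LogRecord('demo', logging.WARNING, 'demo.py', 1,
                           'line one\nline two', None, None)
record.type = 'olevba'
print(SketchJsonFormatter().format(record))
```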
oletools/common/log_helper/_logger_adapter.py
... ... @@ -8,18 +8,45 @@ class OletoolsLoggerAdapter(logging.LoggerAdapter):
8 8 """
9 9 _json_enabled = None
10 10  
11   - def print_str(self, message):
  11 + def print_str(self, message, **kwargs):
12 12 """
13 13 This function replaces normal print() calls so we can format them as JSON
14 14 when needed or just print them right away otherwise.
15 15 """
16 16 if self._json_enabled and self._json_enabled():
17 17 # Messages from this function should always be printed,
18   - # so when using JSON we log using the same level that set
19   - self.log(_root_logger_wrapper.level(), message)
  18 + # so when using JSON we log at the same level that was set.
  19 + # Additional information in kwargs is added to LogRecord
  20 + self.log(_root_logger_wrapper.level(), message, extra=kwargs)
20 21 else:
21 22 print(message)
22 23  
  24 + def log(self, lvl, msg, *args, **kwargs):
  25 + """
  26 + Run :py:meth:`process` on kwargs, then forward to actual logger.
  27 +
  28 + This is based on the logging cookbook, section "Using LoggerAdapter to
  29 + impart contextual information".
  30 + """
  31 + msg, kwargs = self.process(msg, kwargs)
  32 + self.logger.log(lvl, msg, *args, **kwargs)
  33 +
  34 + def process(self, msg, kwargs):
  35 + """
  36 + Ensure `kwargs['extra']['type']` exists, init with given arg `type`.
  37 +
  38 + The `type` field will be added to the :py:class:`logging.LogRecord` and
  39 + is used by the :py:class:`JsonFormatter`.
  40 + """
  41 + if 'extra' not in kwargs:
  42 + kwargs['extra'] = {}
  43 + if 'type' in kwargs:
  44 + kwargs['extra']['type'] = kwargs['type']
  45 + del kwargs['type'] # downstream loggers cannot deal with this
  46 + if 'type' not in kwargs['extra']:
  47 + kwargs['extra']['type'] = 'msg' # type will be added to LogRecord
  48 + return msg, kwargs
  49 +
23 50 def set_json_enabled_function(self, json_enabled):
24 51 """
25 52 Set a function to be called to check whether JSON output is enabled.
... ...
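The `process` override moves an optional `type=` keyword into `extra`, so that `type` becomes an attribute of the `LogRecord`; when no `type` is given, it defaults to `'msg'`. Below is a standalone sketch of the same idea on top of the standard `logging.LoggerAdapter`. `TypedAdapter` and `ListHandler` are hypothetical names. Unlike the diff, the sketch does not override `log()`. It relies on the standard Python 3 `LoggerAdapter.log()`, which already calls `process()` before forwarding to the logger.

```python
import logging


class TypedAdapter(logging.LoggerAdapter):
    """Hypothetical adapter: guarantees every record has a `type` attribute."""

    def process(self, msg, kwargs):
        extra = kwargs.setdefault('extra', {})
        if 'type' in kwargs:
            # Logger.log() rejects unknown keywords, so move `type` into extra
            extra['type'] = kwargs.pop('type')
        extra.setdefault('type', 'msg')
        return msg, kwargs


class ListHandler(logging.Handler):
    """Collects records in memory so the result can be inspected."""

    def __init__(self):
        super(ListHandler, self).__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger('typed_adapter_demo')
logger.setLevel(logging.DEBUG)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)

adapter = TypedAdapter(logger, {})
adapter.info('plain message')
adapter.info('analysis result', type='olevba')
print([r.type for r in handler.records])  # ['msg', 'olevba']
```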
oletools/crypto.py 0 → 100644
  1 +#!/usr/bin/env python
  2 +"""
  3 +crypto.py
  4 +
  5 +Module to be used by other scripts and modules in oletools, that provides
  6 +information on encryption in OLE files.
  7 +
  8 +Uses :py:mod:`msoffcrypto-tool` to decrypt if it is available. Otherwise
  9 +decryption will fail with an ImportError.
  10 +
  11 +Encryption/Write-Protection can be realized in many different ways. They range
  12 +from setting a single flag in an otherwise unprotected file to embedding a
  13 +regular file (e.g. xlsx) in an EncryptedStream inside an OLE file. That means
  14 +that (1) lots of bad things are accessible even if no encryption password
  15 +is known, and (2) even basic attributes like the file type can change by
  16 +decryption. Therefore I suggest the following general routine to deal with
  17 +potentially encrypted files::
  18 +
  19 + def script_main_function(input_file, passwords, crypto_nesting=0, args=None):
  20 + '''Wrapper around main function to deal with encrypted files.'''
  21 + initial_stuff(input_file, args)
  22 + result = None
  23 + try:
  24 + result = do_your_thing_assuming_no_encryption(input_file)
  25 + if not crypto.is_encrypted(input_file):
  26 + return result
  27 + except Exception:
  28 + if not crypto.is_encrypted(input_file):
  29 + raise
  30 + # we reach this point only if file is encrypted
  31 + # check if this is an encrypted file in an encrypted file in an ...
  32 + if crypto_nesting >= crypto.MAX_NESTING_DEPTH:
  33 + raise crypto.MaxCryptoNestingReached(crypto_nesting, input_file)
  34 + decrypted_file = None
  35 + try:
  36 + decrypted_file = crypto.decrypt(input_file, passwords)
  37 + if decrypted_file is None:
  38 + raise crypto.WrongEncryptionPassword(input_file)
  39 + # might still be encrypted, so call this again recursively
  40 + return script_main_function(decrypted_file, passwords,
  41 + crypto_nesting+1, args)
  42 + except Exception:
  43 + raise
  44 + finally: # clean up
  45 + try: # (maybe file was not yet created)
  46 + os.unlink(decrypted_file)
  47 + except Exception:
  48 + pass
  49 +
  50 +(Realized e.g. in :py:mod:`oletools.msodde`).
  51 +That means that caller code needs another wrapper around its main function. I
  52 +did try it another way first (transparent on-demand decryption), but for the
  53 +above reasons I believe this is the better way. Also, non-top-level-code can
  54 +just assume that it works on unencrypted data and fail with an exception if
  55 +encrypted data makes its work impossible. No need to check `if is_encrypted()`
  56 +at the start of functions.
  57 +
  58 +.. seealso:: [MS-OFFCRYPTO]
  59 +.. seealso:: https://github.com/nolze/msoffcrypto-tool
  60 +
  61 +crypto is part of the python-oletools package:
  62 +http://www.decalage.info/python/oletools
  63 +"""
  64 +
  65 +# === LICENSE =================================================================
  66 +
  67 +# crypto is copyright (c) 2014-2019 Philippe Lagadec (http://www.decalage.info)
  68 +# All rights reserved.
  69 +#
  70 +# Redistribution and use in source and binary forms, with or without
  71 +# modification, are permitted provided that the following conditions are met:
  72 +#
  73 +# * Redistributions of source code must retain the above copyright notice,
  74 +# this list of conditions and the following disclaimer.
  75 +# * Redistributions in binary form must reproduce the above copyright notice,
  76 +# this list of conditions and the following disclaimer in the documentation
  77 +# and/or other materials provided with the distribution.
  78 +#
  79 +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  80 +# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  81 +# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  82 +# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
  83 +# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  84 +# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  85 +# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  86 +# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  87 +# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  88 +# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
  89 +# POSSIBILITY OF SUCH DAMAGE.
  90 +
  91 +# -----------------------------------------------------------------------------
  92 +# CHANGELOG:
  93 +# 2019-02-14 v0.01 CH: - first version with encryption check from oleid
  94 +# 2019-04-01 v0.54 PL: - fixed bug in is_encrypted_ole
  95 +# 2019-05-23 PL: - added DEFAULT_PASSWORDS list
  96 +
  97 +__version__ = '0.54.2'
  98 +
  99 +import sys
  100 +import struct
  101 +import os
  102 +from os.path import splitext, isfile
  103 +from tempfile import mkstemp
  104 +import zipfile
  105 +import logging
  106 +
  107 +from olefile import OleFileIO
  108 +
  109 +try:
  110 + import msoffcrypto
  111 +except ImportError:
  112 + msoffcrypto = None
  113 +
  114 +# IMPORTANT: it should be possible to run oletools directly as scripts
  115 +# in any directory without installing them with pip or setup.py.
  116 +# In that case, relative imports are NOT usable.
  117 +# And to enable Python 2+3 compatibility, we need to use absolute imports,
  118 +# so we add the oletools parent folder to sys.path (absolute+normalized path):
  119 +_thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__)))
  120 +_parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
  121 +if _parent_dir not in sys.path:
  122 + sys.path.insert(0, _parent_dir)
  123 +
  124 +from oletools.common.errors import CryptoErrorBase, WrongEncryptionPassword, \
  125 + UnsupportedEncryptionError, MaxCryptoNestingReached, CryptoLibNotImported
  126 +from oletools.common.log_helper import log_helper
  127 +
  128 +
  129 +#: if there is an encrypted file embedded in an encrypted file,
  130 +#: how deep down do we go
  131 +MAX_NESTING_DEPTH = 10
  132 +
  133 +# === LOGGING =================================================================
  134 +
  135 +# TODO: use log_helper instead
  136 +
  137 +def get_logger(name, level=logging.CRITICAL+1):
  138 + """
  139 + Create a suitable logger object for this module.
  140 + The goal is not to change settings of the root logger, to avoid getting
  141 + other modules' logs on the screen.
  142 + If a logger exists with same name, reuse it. (Else it would have duplicate
  143 + handlers and messages would be doubled.)
  144 + The level is set to CRITICAL+1 by default, to avoid any logging.
  145 + """
  146 + # First, test if there is already a logger with the same name, else it
  147 + # will generate duplicate messages (due to duplicate handlers):
  148 + if name in logging.Logger.manager.loggerDict:
  149 + # NOTE: another less intrusive but more "hackish" solution would be to
  150 + # use getLogger then test if its effective level is not default.
  151 + logger = logging.getLogger(name)
  152 + # make sure level is OK:
  153 + logger.setLevel(level)
  154 + return logger
  155 + # get a new logger:
  156 + logger = logging.getLogger(name)
  157 + # only add a NullHandler for this logger, it is up to the application
  158 + # to configure its own logging:
  159 + logger.addHandler(logging.NullHandler())
  160 + logger.setLevel(level)
  161 + return logger
  162 +
  163 +# a global logger object used for debugging:
  164 +log = get_logger('crypto')
  165 +
  166 +def enable_logging():
  167 + """
  168 + Enable logging for this module (disabled by default).
  169 + This will set the module-specific logger level to NOTSET, which
  170 + means the main application controls the actual logging level.
  171 + """
  172 + log.setLevel(logging.NOTSET)
  173 +
  174 +
  175 +def is_encrypted(some_file):
  176 + """
  177 + Determine whether document contains encrypted content.
  178 +
  179 + This should return False for documents that are just write-protected or
  180 + signed or finalized. It should return True if ANY content of the file is
  181 + encrypted and can therefore not be analyzed by other oletools modules
  182 + without being given a password.
  183 +
  184 + Exception: there are ways to write-protect an Office document by embedding
  185 + it as an encrypted stream with a hard-coded standard password into an
  186 + otherwise empty OLE file. From an Office user's point of view, this is not
  187 + encryption, but in terms of file structure it is, so we return `True` for
  188 + these.
  189 +
  190 + This should not raise exceptions needlessly.
  191 +
  192 + This implementation is rather simple: it returns True if the file contains
  193 + streams with typical encryption names (cf. [MS-OFFCRYPTO]). It does not
  194 + test whether these streams actually contain data or whether the OLE file
  195 + structure contains the necessary references to these. It also checks the
  196 + "well-known property" PIDSI_DOC_SECURITY if the SummaryInformation stream
  197 + is accessible (cf. [MS-OLEPS] 2.25.1).
  198 +
  199 + :param some_file: File name or an opened OleFileIO
  200 + :type some_file: :py:class:`olefile.OleFileIO` or `str`
  201 + :returns: True if (and only if) the file contains encrypted content
  202 + """
  203 + log.debug('is_encrypted')
  204 +
  205 + # ask msoffcrypto if possible
  206 + if check_msoffcrypto():
  207 + log.debug('Checking for encryption using msoffcrypto')
  208 + file_handle = None
  209 + file_pos = None
  210 + try:
  211 + if isinstance(some_file, OleFileIO):
  212 + # TODO: hacky, replace once msoffcrypto-tool accepts OleFileIO
  213 + file_handle = some_file.fp
  214 + file_pos = file_handle.tell()
  215 + file_handle.seek(0)
  216 + else:
  217 + file_handle = open(some_file, 'rb')
  218 +
  219 + return msoffcrypto.OfficeFile(file_handle).is_encrypted()
  220 +
  221 + except Exception as exc:
  222 + log.warning('msoffcrypto failed to interpret file {} or determine '
  223 + 'whether it is encrypted: {}'
  224 + .format(getattr(file_handle, 'name', some_file), exc))
  225 +
  226 + finally:
  227 + try:
  228 + if file_pos is not None: # input was OleFileIO
  229 + file_handle.seek(file_pos)
  230 + elif file_handle is not None: # input was file name
  231 + file_handle.close()
  232 + except Exception as exc:
  233 + log.warning('Ignoring error during clean up: {}'.format(exc))
  234 +
  235 + # if that failed, try ourselves with older and less accurate code
  236 + try:
  237 + if isinstance(some_file, OleFileIO):
  238 + return _is_encrypted_ole(some_file)
  239 + if zipfile.is_zipfile(some_file):
  240 + return _is_encrypted_zip(some_file)
  241 + # otherwise assume it is the name of an ole file
  242 + with OleFileIO(some_file) as ole:
  243 + return _is_encrypted_ole(ole)
  244 + except Exception as exc:
  245 + log.warning('Failed to check {} for encryption ({}); assume it is not '
  246 + 'encrypted.'.format(some_file, exc))
  247 +
  248 + return False
  249 +
  250 +
  251 +def _is_encrypted_zip(filename):
  252 + """Specialization of :py:func:`is_encrypted` for zip-based files."""
  253 + log.debug('Checking for encryption in zip file')
  254 + # TODO: distinguish OpenXML from normal zip files
  255 + # try to decrypt a few bytes from first entry
  256 + with zipfile.ZipFile(filename, 'r') as zipper:
  257 + first_entry = zipper.infolist()[0]
  258 + try:
  259 + with zipper.open(first_entry, 'r') as reader:
  260 + reader.read(min(16, first_entry.file_size))
  261 + return False
  262 + except RuntimeError as rt_err:
  263 + return 'crypt' in str(rt_err)
  264 +
  265 +
  266 +def _is_encrypted_ole(ole):
  267 + """Specialization of :py:func:`is_encrypted` for ole files."""
  268 + log.debug('Checking for encryption in OLE file')
  269 + # check well known property for password protection
  270 + # (this field may be missing for PowerPoint 2000, for example)
  271 + # TODO: check whether password protection always implies encryption. Could
  272 + # write-protection or signing with password trigger this as well?
  273 + if ole.exists("\x05SummaryInformation"):
  274 + suminfo_data = ole.getproperties("\x05SummaryInformation")
  275 + if 0x13 in suminfo_data and (suminfo_data[0x13] & 1):
  276 + return True
  277 +
  278 + # check a few stream names
  279 + # TODO: check whether these actually contain data and whether other
  280 + # necessary properties exist / are set
  281 + if ole.exists('EncryptionInfo'):
  282 + log.debug('found stream EncryptionInfo')
  283 + return True
  284 + # or an encrypted ppt file
  285 + if ole.exists('EncryptedSummary') and \
  286 + not ole.exists('SummaryInformation'):
  287 + return True
  288 +
  289 + # Word-specific old encryption:
  290 + if ole.exists('WordDocument'):
  291 + # check for Word-specific encryption flag:
  292 + stream = None
  293 + try:
  294 + stream = ole.openstream(["WordDocument"])
  295 + # skip the first 10 bytes of the FIB header
  296 + stream.read(10)
  297 + # read the 16-bit flags word (little-endian):
  298 + temp16 = struct.unpack("<H", stream.read(2))[0]
  299 + f_encrypted = (temp16 & 0x0100) >> 8
  300 + if f_encrypted:
  301 + return True
  302 + finally:
  303 + if stream is not None:
  304 + stream.close()
  305 +
  306 + # no indication of encryption
  307 + return False
  308 +
  309 +
  310 +#: one way to achieve "write protection" in office files is to encrypt the file
  311 +#: using this password
  312 +WRITE_PROTECT_ENCRYPTION_PASSWORD = 'VelvetSweatshop'
  313 +
  314 +#: list of common passwords used by malware, tried by default
  315 +DEFAULT_PASSWORDS = [WRITE_PROTECT_ENCRYPTION_PASSWORD, '123', '1234', '12345', '123456', '4321']
  316 +
  317 +
  318 +def _check_msoffcrypto():
  319 + """Raise a :py:class:`CryptoLibNotImported` if msoffcrypto not imported."""
  320 + if msoffcrypto is None:
  321 + raise CryptoLibNotImported()
  322 +
  323 +
  324 +def check_msoffcrypto():
  325 + """Return `True` iff :py:mod:`msoffcrypto` could be imported."""
  326 + return msoffcrypto is not None
  327 +
  328 +
  329 +def decrypt(filename, passwords=None, **temp_file_args):
  330 + """
  331 + Try to decrypt an encrypted file.
  332 +
  333 + This function tries to decrypt the given file using a given set of
  334 + passwords. If no password is given, tries :py:data:`DEFAULT_PASSWORDS`
  335 + (the write-protection password plus a few common ones). Creates a file with
  336 + decrypted data and returns its name; if decryption fails, returns None.
  337 +
  338 + :param str filename: path to an OLE file on disk
  339 + :param passwords: list/set/tuple/... of passwords or a single password or
  340 + None
  341 + :type passwords: iterable or str or None
  342 + :param temp_file_args: arguments for :py:func:`tempfile.mkstemp` e.g.,
  343 + `dir` or `prefix`. `suffix` defaults to the
  344 + suffix of input `filename`, `prefix` defaults to
  345 + `oletools-decrypt-`; `text` is always forced to False
  346 + :returns: name of the decrypted temporary file (type str) or `None`
  347 + :raises: :py:class:`ImportError` if :py:mod:`msoffcrypto` (package msoffcrypto-tool) is not installed
  348 + :raises: :py:class:`ValueError` if the given file is not encrypted
  349 + """
  350 + _check_msoffcrypto()
  351 +
  352 + # normalize passwords so we always have a list/tuple
  353 + if isinstance(passwords, str):
  354 + passwords = (passwords, )
  355 + elif not passwords:
  356 + passwords = DEFAULT_PASSWORDS
  357 +
  358 + # check temp file args
  359 + if 'prefix' not in temp_file_args:
  360 + temp_file_args['prefix'] = 'oletools-decrypt-'
  361 + if 'suffix' not in temp_file_args:
  362 + temp_file_args['suffix'] = splitext(filename)[1]
  363 + temp_file_args['text'] = False
  364 +
  365 + decrypt_file = None
  366 + with open(filename, 'rb') as reader:
  367 + try:
  368 + crypto_file = msoffcrypto.OfficeFile(reader)
  369 + except Exception as exc: # e.g. ppt, not yet supported by msoffcrypto
  370 + if 'Unrecognized file format' in str(exc):
  371 + log.debug('Caught exception', exc_info=True)
  372 +
  373 + # raise different exception without stack trace of original exc
  374 + if sys.version_info.major == 2:
  375 + raise UnsupportedEncryptionError(filename)
  376 + else:
  377 + # this is a syntax error in python 2, so wrap it in exec()
  378 + exec('raise UnsupportedEncryptionError(filename) from None')
  379 + else:
  380 + raise
  381 + if not crypto_file.is_encrypted():
  382 + raise ValueError('Given input file {} is not encrypted!'
  383 + .format(filename))
  384 +
  385 + for password in passwords:
  386 + log.debug('Trying to decrypt with password {!r}'.format(password))
  387 + write_descriptor = None
  388 + write_handle = None
  389 + decrypt_file = None
  390 + try:
  391 + crypto_file.load_key(password=password)
  392 +
  393 + # create temp file
  394 + write_descriptor, decrypt_file = mkstemp(**temp_file_args)
  395 + write_handle = os.fdopen(write_descriptor, 'wb')
  396 + write_descriptor = None # is now handled via write_handle
  397 + crypto_file.decrypt(write_handle)
  398 +
  399 + # decryption was successful; clean up and return
  400 + write_handle.close()
  401 + write_handle = None
  402 + break
  403 + except Exception:
  404 + log.debug('Failed to decrypt', exc_info=True)
  405 +
  406 + # error-clean up: close everything and del temp file
  407 + if write_handle:
  408 + write_handle.close()
  409 + elif write_descriptor:
  410 + os.close(write_descriptor)
  411 + if decrypt_file and isfile(decrypt_file):
  412 + os.unlink(decrypt_file)
  413 + decrypt_file = None
  414 + if decrypt_file is None: # all passwords were tried without success
  415 + log.debug('All passwords failed')
  416 + return decrypt_file
... ...
oletools/doc/Home.html
... ... @@ -16,7 +16,7 @@
16 16 <![endif]-->
17 17 </head>
18 18 <body>
19   -<h1 id="python-oletools-v0.53-documentation">python-oletools v0.53 documentation</h1>
  19 +<h1 id="python-oletools-v0.54-documentation">python-oletools v0.54 documentation</h1>
20 20 <p>This is the home page of the documentation for python-oletools. The latest version can be found <a href="https://github.com/decalage2/oletools/wiki">online</a>, otherwise a copy is provided in the doc subfolder of the package.</p>
21 21 <p><a href="http://www.decalage.info/python/oletools">python-oletools</a> is a package of python tools to analyze <a href="http://en.wikipedia.org/wiki/Compound_File_Binary_Format">Microsoft OLE2 files</a> (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office documents or Outlook messages, mainly for malware analysis, forensics and debugging. It is based on the <a href="http://www.decalage.info/olefile">olefile</a> parser. See <a href="http://www.decalage.info/python/oletools" class="uri">http://www.decalage.info/python/oletools</a> for more info.</p>
22 22 <p><strong>Quick links:</strong> <a href="http://www.decalage.info/python/oletools">Home page</a> - <a href="https://github.com/decalage2/oletools/wiki/Install">Download/Install</a> - <a href="https://github.com/decalage2/oletools/wiki">Documentation</a> - <a href="https://github.com/decalage2/oletools/issues">Report Issues/Suggestions/Questions</a> - <a href="http://decalage.info/contact">Contact the Author</a> - <a href="https://github.com/decalage2/oletools">Repository</a> - <a href="https://twitter.com/decalage2">Updates on Twitter</a></p>
... ...
oletools/doc/Home.md
1   -python-oletools v0.53 documentation
  1 +python-oletools v0.54 documentation
2 2 ===================================
3 3  
4 4 This is the home page of the documentation for python-oletools. The latest version can be found
... ...
oletools/doc/Install.html
... ... @@ -16,28 +16,35 @@
16 16 <![endif]-->
17 17 </head>
18 18 <body>
19   -<h1 id="how-to-download-and-install-python-oletools">How to Download and Install python-oletools</h1>
  19 +<h1 id="how-to-download-and-install-oletools">How to Download and Install oletools</h1>
20 20 <h2 id="pre-requisites">Pre-requisites</h2>
21   -<p>The recommended Python version to run oletools is <strong>Python 2.7</strong>. Python 2.6 is also supported, but as it is not tested as often as 2.7, some features might not work as expected.</p>
22   -<p>Since oletools v0.50, thanks to contributions by <span class="citation" data-cites="Sebdraven">[@Sebdraven]</span>(https://twitter.com/Sebdraven), most tools can also run with <strong>Python 3.x</strong>. As this is quite new, please <a href="(https://github.com/decalage2/oletools/issues)">report any issue</a> you may encounter.</p>
  21 +<p>The recommended Python version to run oletools is the latest <strong>Python 3.x</strong> (3.7 for now). Python 2.7 is still supported, but since it reaches end of life in 2020 (see https://pythonclock.org/), switching to Python 3 now is highly recommended.</p>
23 22 <h2 id="recommended-way-to-downloadinstallupdate-oletools-pip">Recommended way to Download+Install/Update oletools: pip</h2>
24 23 <p>Pip is included with Python since version 2.7.9 and 3.4. If it is not installed on your system, either upgrade Python or see https://pip.pypa.io/en/stable/installing/</p>
25 24 <h3 id="linux-mac-osx-unix">Linux, Mac OSX, Unix</h3>
26 25 <p>To download and install/update the latest release version of oletools, run the following command in a shell:</p>
27 26 <pre class="text"><code>sudo -H pip install -U oletools</code></pre>
  27 +<p>Replace <code>pip</code> with <code>pip3</code> or <code>pip2</code> to install for a specific Python version.</p>
28 28 <p><strong>Important</strong>: Since version 0.50, pip will automatically create convenient command-line scripts in /usr/local/bin to run all the oletools from any directory.</p>
29 29 <h3 id="windows">Windows</h3>
30 30 <p>To download and install/update the latest release version of oletools, run the following command in a cmd window:</p>
31 31 <pre class="text"><code>pip install -U oletools</code></pre>
  32 +<p>Replace <code>pip</code> with <code>pip3</code> or <code>pip2</code> to install for a specific Python version.</p>
  33 +<p><strong>Note</strong>: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip and install for all users. If that is not possible, you may also install only for the current user by adding the <code>--user</code> option:</p>
  34 +<pre class="text"><code>pip3 install -U --user oletools</code></pre>
32 35 <p><strong>Important</strong>: Since version 0.50, pip will automatically create convenient command-line scripts to run all the oletools from any directory: olevba, mraptor, oleid, rtfobj, etc.</p>
33 36 <h2 id="how-to-install-the-latest-development-version">How to install the latest development version</h2>
34 37 <p>If you want to benefit from the latest improvements in the development version, you may also use pip:</p>
35 38 <h3 id="linux-mac-osx-unix-1">Linux, Mac OSX, Unix</h3>
36 39 <pre class="text"><code>sudo -H pip install -U https://github.com/decalage2/oletools/archive/master.zip</code></pre>
  40 +<p>Replace <code>pip</code> with <code>pip3</code> or <code>pip2</code> to install for a specific Python version.</p>
37 41 <h3 id="windows-1">Windows</h3>
38 42 <pre class="text"><code>pip install -U https://github.com/decalage2/oletools/archive/master.zip</code></pre>
  43 +<p>Replace <code>pip</code> with <code>pip3</code> or <code>pip2</code> to install for a specific Python version.</p>
  44 +<p><strong>Note</strong>: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip and install for all users. If that is not possible, you may also install only for the current user by adding the <code>--user</code> option:</p>
  45 +<pre class="text"><code>pip3 install -U --user https://github.com/decalage2/oletools/archive/master.zip</code></pre>
39 46 <h2 id="how-to-install-offline---computer-without-internet-access">How to install offline - Computer without Internet access</h2>
40   -<p>First, download the oletools archive on a computer with Internet access: * Latest stable version: from https://github.com/decalage2/oletools/releases * Development version: https://github.com/decalage2/oletools/archive/master.zip</p>
  47 +<p>First, download the oletools archive on a computer with Internet access:</p><ul><li>Latest stable version: from https://pypi.org/project/oletools/ or https://github.com/decalage2/oletools/releases</li><li>Development version: https://github.com/decalage2/oletools/archive/master.zip</li></ul>
41 48 <p>Copy the archive file to the target computer.</p>
42 49 <p>On Linux, Mac OSX, Unix, run the following command using the filename of the archive that you downloaded:</p>
43 50 <pre class="text"><code>sudo -H pip install -U oletools.zip</code></pre>
... ...
oletools/doc/Install.md
1   -How to Download and Install python-oletools
2   -===========================================
  1 +How to Download and Install oletools
  2 +====================================
3 3  
4 4 Pre-requisites
5 5 --------------
6 6  
7   -The recommended Python version to run oletools is **Python 2.7**.
8   -Python 2.6 is also supported, but as it is not tested as often as 2.7, some features
9   -might not work as expected.
10   -
11   -Since oletools v0.50, thanks to contributions by [@Sebdraven](https://twitter.com/Sebdraven),
12   -most tools can also run with **Python 3.x**. As this is quite new, please
13   -[report any issue]((https://github.com/decalage2/oletools/issues)) you may encounter.
14   -
15   -
  7 +The recommended Python version to run oletools is the latest **Python 3.x** (3.7 for now).
  8 +Python 2.7 is still supported, but since it reaches end of life in 2020 (see https://pythonclock.org/), switching
  9 +to Python 3 now is highly recommended.
16 10  
17 11 Recommended way to Download+Install/Update oletools: pip
18 12 --------------------------------------------------------
... ... @@ -29,6 +23,8 @@ run the following command in a shell:
29 23 sudo -H pip install -U oletools
30 24 ```
31 25  
  26 +Replace `pip` with `pip3` or `pip2` to install for a specific Python version.
  27 +
32 28 **Important**: Since version 0.50, pip will automatically create convenient command-line scripts
33 29 in /usr/local/bin to run all the oletools from any directory.
34 30  
... ... @@ -41,6 +37,16 @@ run the following command in a cmd window:
41 37 pip install -U oletools
42 38 ```
43 39  
  40 +Replace `pip` with `pip3` or `pip2` to install for a specific Python version.
  41 +
  42 +**Note**: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip
  43 +and install for all users. If that is not possible, you may also install only for the current user
  44 +by adding the `--user` option:
  45 +
  46 +```text
  47 +pip3 install -U --user oletools
  48 +```
  49 +
44 50 **Important**: Since version 0.50, pip will automatically create convenient command-line scripts
45 51 to run all the oletools from any directory: olevba, mraptor, oleid, rtfobj, etc.
46 52  
... ... @@ -57,17 +63,29 @@ you may also use pip:
57 63 sudo -H pip install -U https://github.com/decalage2/oletools/archive/master.zip
58 64 ```
59 65  
  66 +Replace `pip` with `pip3` or `pip2` to install for a specific Python version.
  67 +
60 68 ### Windows
61 69  
62 70 ```text
63 71 pip install -U https://github.com/decalage2/oletools/archive/master.zip
64 72 ```
65 73  
  74 +Replace `pip` with `pip3` or `pip2` to install for a specific Python version.
  75 +
  76 +**Note**: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip
  77 +and install for all users. If that is not possible, you may also install only for the current user
  78 +by adding the `--user` option:
  79 +
  80 +```text
  81 +pip3 install -U --user https://github.com/decalage2/oletools/archive/master.zip
  82 +```
  83 +
66 84 How to install offline - Computer without Internet access
67 85 ---------------------------------------------------------
68 86  
69 87 First, download the oletools archive on a computer with Internet access:
70   -* Latest stable version: from https://github.com/decalage2/oletools/releases
  88 +* Latest stable version: from https://pypi.org/project/oletools/ or https://github.com/decalage2/oletools/releases
71 89 * Development version: https://github.com/decalage2/oletools/archive/master.zip
72 90  
73 91 Copy the archive file to the target computer.
... ...
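Replacing `pip` with `pip3` or `pip2` is a way of choosing which Python interpreter receives the package. As a hedged alternative, running pip as a module of one specific interpreter (`python -m pip`) makes that choice explicit. The sketch below only builds the command list, using the `-U` and `--user` options from the commands above; `pip_install_command` is a hypothetical helper, and the resulting list can be passed to `subprocess.check_call()` to actually run it.

```python
import sys


def pip_install_command(target='oletools', user=False, python=sys.executable):
    """Build a pip command bound to one specific Python interpreter.

    Hypothetical helper: `<python> -m pip` runs the pip that belongs to
    that interpreter, so there is no need to guess between pip, pip2 and pip3.
    """
    cmd = [python, '-m', 'pip', 'install', '-U']
    if user:
        # per-user install, for when Administrator/root rights are unavailable
        cmd.append('--user')
    # a package name, or an archive URL such as the master.zip link above
    cmd.append(target)
    return cmd


if __name__ == '__main__':
    # prints the command only; pass it to subprocess.check_call() to install
    print(' '.join(pip_install_command(user=True)))
```

Defaulting `python` to `sys.executable` targets the interpreter running the script, which is usually the one the user intends to install into.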
oletools/doc/License.html
... ... @@ -18,7 +18,7 @@
18 18 <body>
19 19 <h1 id="license-for-python-oletools">License for python-oletools</h1>
20 20 <p>This license applies to the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package, apart from the thirdparty folder which contains third-party files published with their own license.</p>
21   -<p>The python-oletools package is copyright (c) 2012-2018 Philippe Lagadec (<a href="http://www.decalage.info" class="uri">http://www.decalage.info</a>)</p>
  21 +<p>The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (<a href="http://www.decalage.info" class="uri">http://www.decalage.info</a>)</p>
22 22 <p>All rights reserved.</p>
23 23 <p>Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:</p>
24 24 <ul>
... ...
oletools/doc/License.md
... ... @@ -4,7 +4,7 @@ License for python-oletools
4 4 This license applies to the [python-oletools](http://www.decalage.info/python/oletools) package, apart from the
5 5 thirdparty folder which contains third-party files published with their own license.
6 6  
7   -The python-oletools package is copyright (c) 2012-2018 Philippe Lagadec ([http://www.decalage.info](http://www.decalage.info))
  7 +The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec ([http://www.decalage.info](http://www.decalage.info))
8 8  
9 9 All rights reserved.
10 10  
... ...
oletools/doc/mraptor.html
... ... @@ -24,7 +24,7 @@
24 24 <p>mraptor can be used either as a command-line tool, or as a python module from your own applications.</p>
25 25 <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p>
26 26 <h2 id="usage">Usage</h2>
27   -<pre class="text"><code>Usage: mraptor.py [options] &lt;filename&gt; [filename2 ...]
  27 +<pre class="text"><code>Usage: mraptor [options] &lt;filename&gt; [filename2 ...]
28 28  
29 29 Options:
30 30 -h, --help show this help message and exit
... ... @@ -49,15 +49,15 @@ An exit code is returned based on the analysis result:
49 49 - 20: SUSPICIOUS</code></pre>
50 50 <h3 id="examples">Examples</h3>
51 51 <p>Scan a single file:</p>
52   -<pre class="text"><code>mraptor.py file.doc</code></pre>
  52 +<pre class="text"><code>mraptor file.doc</code></pre>
53 53 <p>Scan a single file, stored in a Zip archive with password “infected”:</p>
54   -<pre class="text"><code>mraptor.py malicious_file.xls.zip -z infected</code></pre>
  54 +<pre class="text"><code>mraptor malicious_file.xls.zip -z infected</code></pre>
55 55 <p>Scan a collection of files stored in a folder:</p>
56   -<pre class="text"><code>mraptor.py &quot;MalwareZoo/VBA/*&quot;</code></pre>
  56 +<pre class="text"><code>mraptor &quot;MalwareZoo/VBA/*&quot;</code></pre>
57 57 <p><strong>Important</strong>: on Linux/MacOSX, always add double quotes around a file name when you use wildcards such as <code>*</code> and <code>?</code>. Otherwise, the shell may replace the argument with the actual list of files matching the wildcards before starting the script.</p>
58 58 <p><img src="mraptor1.png" /></p>
59 59 <h2 id="python-3-support---mraptor3">Python 3 support - mraptor3</h2>
60   -<p>As of v0.50, mraptor has been ported to Python 3 thanks to <span class="citation" data-cites="sebdraven">@sebdraven</span>. However, the differences between Python 2 and 3 are significant and for now there is a separate version of mraptor named mraptor3 to be used with Python 3.</p>
  60 +<p>Since v0.54, mraptor is fully compatible with both Python 2 and 3. There is no need to use mraptor3 anymore; however, it is still included for backward compatibility.</p>
61 61 <hr />
62 62 <h2 id="how-to-use-mraptor-in-python-applications">How to use mraptor in Python applications</h2>
63 63 <p>TODO</p>
... ...
oletools/doc/mraptor.md
... ... @@ -24,7 +24,7 @@ It is part of the [python-oletools](http://www.decalage.info/python/oletools) pa
24 24 ## Usage
25 25  
26 26 ```text
27   -Usage: mraptor.py [options] <filename> [filename2 ...]
  27 +Usage: mraptor [options] <filename> [filename2 ...]
28 28  
29 29 Options:
30 30 -h, --help show this help message and exit
... ... @@ -54,19 +54,19 @@ An exit code is returned based on the analysis result:
54 54 Scan a single file:
55 55  
56 56 ```text
57   -mraptor.py file.doc
  57 +mraptor file.doc
58 58 ```
59 59  
60 60 Scan a single file, stored in a Zip archive with password "infected":
61 61  
62 62 ```text
63   -mraptor.py malicious_file.xls.zip -z infected
  63 +mraptor malicious_file.xls.zip -z infected
64 64 ```
65 65  
66 66 Scan a collection of files stored in a folder:
67 67  
68 68 ```text
69   -mraptor.py "MalwareZoo/VBA/*"
  69 +mraptor "MalwareZoo/VBA/*"
70 70 ```
71 71  
72 72 **Important**: on Linux/MacOSX, always add double quotes around a file name when you use
... ... @@ -77,10 +77,8 @@ list of files matching the wildcards before starting the script.
77 77  
78 78 ## Python 3 support - mraptor3
79 79  
80   -As of v0.50, mraptor has been ported to Python 3 thanks to @sebdraven.
81   -However, the differences between Python 2 and 3 are significant and for now
82   -there is a separate version of mraptor named mraptor3 to be used with
83   -Python 3.
  80 +Since v0.54, mraptor is fully compatible with both Python 2 and 3.
  81 +There is no need to use mraptor3 anymore; however, it is still included for backward compatibility.
84 82  
85 83  
86 84 --------------------------------------------------------------------------
... ...
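The advice to quote wildcards concerns the shell. When mraptor is started from Python with an argument list, no shell is involved, so a pattern such as `MalwareZoo/VBA/*` reaches mraptor unexpanded, just like the quoted form. A minimal sketch, assuming the `mraptor` console script is on the PATH and a single file is scanned; it only recognizes exit code 20 (SUSPICIOUS), the one code shown in this excerpt. Both function names are hypothetical.

```python
import subprocess

#: exit code documented for a SUSPICIOUS result
MRAPTOR_SUSPICIOUS = 20


def mraptor_command(pattern, zip_password=None):
    """Build an mraptor command line (hypothetical helper).

    The pattern is a single list element, so no shell ever expands it.
    """
    cmd = ['mraptor', pattern]
    if zip_password is not None:
        # -z: password of a zip archive, as in "mraptor file.zip -z infected"
        cmd += ['-z', zip_password]
    return cmd


def is_suspicious(pattern, zip_password=None, run=subprocess.run):
    """Run mraptor and report whether it exited with the SUSPICIOUS code.

    `run` defaults to subprocess.run and can be replaced, e.g. in tests.
    """
    result = run(mraptor_command(pattern, zip_password))
    return result.returncode == MRAPTOR_SUSPICIOUS
```

For example, `is_suspicious('malicious_file.xls.zip', 'infected')` mirrors the second command-line example above.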
oletools/doc/olebrowse.html
... ... @@ -26,7 +26,7 @@
26 26 <p>And for Python 3:</p>
27 27 <pre><code>sudo apt-get install python3-tk</code></pre>
28 28 <h2 id="usage">Usage</h2>
29   -<pre><code>olebrowse.py [file]</code></pre>
  29 +<pre><code>olebrowse [file]</code></pre>
30 30 <p>If you provide a file it will be opened, else a dialog will allow you to browse folders to open a file. Then if it is a valid OLE file, the list of data streams will be displayed. You can select a stream, and then either view its content in a builtin hexadecimal viewer, or save it to a file for further analysis.</p>
31 31 <h2 id="screenshots">Screenshots</h2>
32 32 <p>Main menu, showing all streams in the OLE file:</p>
... ...
oletools/doc/olebrowse.md
... ... @@ -30,9 +30,9 @@ sudo apt-get install python3-tk
30 30  
31 31 Usage
32 32 -----
33   -
34   - olebrowse.py [file]
35   -
  33 +```
  34 +olebrowse [file]
  35 +```
36 36 If you provide a file it will be opened, else a dialog will allow you to browse
37 37 folders to open a file. Then if it is a valid OLE file, the list of data streams
38 38 will be displayed. You can select a stream, and then either view its content
... ...
oletools/doc/oledir.html
... ... @@ -21,10 +21,21 @@
21 21 <p>It can be used either as a command-line tool, or as a python module from your own applications.</p>
22 22 <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p>
23 23 <h2 id="usage">Usage</h2>
24   -<pre class="text"><code>Usage: oledir.py &lt;filename&gt;</code></pre>
  24 +<pre class="text"><code>Usage: oledir [options] &lt;filename&gt; [filename2 ...]
  25 +
  26 +Options:
  27 + -h, --help show this help message and exit
  28 + -r find files recursively in subdirectories.
  29 + -z ZIP_PASSWORD, --zip=ZIP_PASSWORD
  30 + if the file is a zip archive, open all files from it,
  31 + using the provided password (requires Python 2.6+)
  32 + -f ZIP_FNAME, --zipfname=ZIP_FNAME
  33 + if the file is a zip archive, file(s) to be opened
  34 + within the zip. Wildcards * and ? are supported.
  35 + (default:*)</code></pre>
25 36 <h3 id="examples">Examples</h3>
26 37 <p>Scan a single file:</p>
27   -<pre class="text"><code>oledir.py file.doc</code></pre>
  38 +<pre class="text"><code>oledir file.doc</code></pre>
28 39 <p><img src="oledir.png" /></p>
29 40 <hr />
30 41 <h2 id="how-to-use-oledir-in-python-applications">How to use oledir in Python applications</h2>
... ...
oletools/doc/oledir.md
... ... @@ -11,7 +11,18 @@ It is part of the [python-oletools](http://www.decalage.info/python/oletools) pa
11 11 ## Usage
12 12  
13 13 ```text
14   -Usage: oledir.py <filename>
  14 +Usage: oledir [options] <filename> [filename2 ...]
  15 +
  16 +Options:
  17 + -h, --help show this help message and exit
  18 + -r find files recursively in subdirectories.
  19 + -z ZIP_PASSWORD, --zip=ZIP_PASSWORD
  20 + if the file is a zip archive, open all files from it,
  21 + using the provided password (requires Python 2.6+)
  22 + -f ZIP_FNAME, --zipfname=ZIP_FNAME
  23 + if the file is a zip archive, file(s) to be opened
  24 + within the zip. Wildcards * and ? are supported.
  25 + (default:*)
15 26 ```
16 27  
17 28 ### Examples
... ... @@ -19,7 +30,7 @@ Usage: oledir.py &lt;filename&gt;
19 30 Scan a single file:
20 31  
21 32 ```text
22   -oledir.py file.doc
  33 +oledir file.doc
23 34 ```
24 35  
25 36 ![](oledir.png)
... ...
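The `-f ZIP_FNAME` option above selects which members of a zip archive are opened, with `*` and `?` wildcards and `*` as the default. The sketch below illustrates that kind of selection with the standard `fnmatch` module; `select_zip_members` is a hypothetical helper, not oledir's actual implementation. Note that `fnmatch.fnmatch` is case-insensitive on Windows.

```python
import fnmatch
import io
import zipfile


def select_zip_members(zip_bytes, pattern='*'):
    """Return names of archive members matching a wildcard pattern.

    Hypothetical helper mirroring the described -f/--zipfname semantics
    (* and ? wildcards, default '*'); directories are skipped.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [info.filename for info in archive.infolist()
                if not info.is_dir() and fnmatch.fnmatch(info.filename, pattern)]


# usage: build a small in-memory archive and filter its members
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('sample1.doc', b'dummy')
    archive.writestr('sample2.xls', b'dummy')
    archive.writestr('readme.txt', b'dummy')
demo = buffer.getvalue()

print(select_zip_members(demo, '*.doc'))  # ['sample1.doc']
print(select_zip_members(demo))           # all three member names
```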
oletools/doc/oleid.html
... ... @@ -107,10 +107,10 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni
107 107 <li>CSV output</li>
108 108 </ul>
109 109 <h2 id="usage">Usage</h2>
110   -<pre class="text"><code>oleid.py &lt;file&gt;</code></pre>
  110 +<pre class="text"><code>oleid &lt;file&gt;</code></pre>
111 111 <h3 id="example">Example</h3>
112 112 <p>Analyzing a Word document containing a Flash object and VBA macros:</p>
113   -<pre class="text"><code>C:\oletools&gt;oleid.py word_flash_vba.doc
  113 +<pre class="text"><code>C:\oletools&gt;oleid word_flash_vba.doc
114 114  
115 115 Filename: word_flash_vba.doc
116 116 +-------------------------------+-----------------------+
... ...
oletools/doc/oleid.md
... ... @@ -32,7 +32,7 @@ Planned improvements:
32 32 ## Usage
33 33  
34 34 ```text
35   -oleid.py <file>
  35 +oleid <file>
36 36 ```
37 37  
38 38 ### Example
... ... @@ -40,7 +40,7 @@ oleid.py &lt;file&gt;
40 40 Analyzing a Word document containing a Flash object and VBA macros:
41 41  
42 42 ```text
43   -C:\oletools>oleid.py word_flash_vba.doc
  43 +C:\oletools>oleid word_flash_vba.doc
44 44  
45 45 Filename: word_flash_vba.doc
46 46 +-------------------------------+-----------------------+
... ...
oletools/doc/olemap.html
... ... @@ -21,10 +21,10 @@
21 21 <p>It can be used either as a command-line tool, or as a python module from your own applications.</p>
22 22 <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p>
23 23 <h2 id="usage">Usage</h2>
24   -<pre class="text"><code>Usage: olemap.py &lt;filename&gt;</code></pre>
  24 +<pre class="text"><code>Usage: olemap &lt;filename&gt;</code></pre>
25 25 <h3 id="examples">Examples</h3>
26 26 <p>Scan a single file:</p>
27   -<pre class="text"><code>olemap.py file.doc</code></pre>
  27 +<pre class="text"><code>olemap file.doc</code></pre>
28 28 <p><img src="olemap1.png" /></p>
29 29 <p><img src="olemap2.png" /></p>
30 30 <hr />
... ...
oletools/doc/olemap.md
... ... @@ -10,7 +10,7 @@ It is part of the [python-oletools](http://www.decalage.info/python/oletools) pa
10 10 ## Usage
11 11  
12 12 ```text
13   -Usage: olemap.py <filename>
  13 +Usage: olemap <filename>
14 14 ```
15 15  
16 16 ### Examples
... ... @@ -18,7 +18,7 @@ Usage: olemap.py &lt;filename&gt;
18 18 Scan a single file:
19 19  
20 20 ```text
21   -olemap.py file.doc
  21 +olemap file.doc
22 22 ```
23 23  
24 24 ![](olemap1.png)
... ...
oletools/doc/olemeta.html
... ... @@ -20,7 +20,7 @@
20 20 <p>olemeta is a script to parse OLE files such as MS Office documents (e.g. Word, Excel), to extract all standard properties present in the OLE file.</p>
21 21 <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p>
22 22 <h2 id="usage">Usage</h2>
23   -<pre class="text"><code>olemeta.py &lt;file&gt;</code></pre>
  23 +<pre class="text"><code>olemeta &lt;file&gt;</code></pre>
24 24 <h3 id="example">Example</h3>
25 25 <p><img src="olemeta1.png" /></p>
26 26 <h2 id="how-to-use-olemeta-in-python-applications">How to use olemeta in Python applications</h2>
... ...
oletools/doc/olemeta.md
... ... @@ -9,7 +9,7 @@ It is part of the [python-oletools](http://www.decalage.info/python/oletools) pa
9 9 ## Usage
10 10  
11 11 ```text
12   -olemeta.py <file>
  12 +olemeta <file>
13 13 ```
14 14  
15 15 ### Example
... ...
oletools/doc/oletimes.html
... ... @@ -20,10 +20,10 @@
20 20 <p>oletimes is a script to parse OLE files such as MS Office documents (e.g. Word, Excel), to extract creation and modification times of all streams and storages in the OLE file.</p>
21 21 <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p>
22 22 <h2 id="usage">Usage</h2>
23   -<pre class="text"><code>oletimes.py &lt;file&gt;</code></pre>
  23 +<pre class="text"><code>oletimes &lt;file&gt;</code></pre>
24 24 <h3 id="example">Example</h3>
25 25 <p>Checking the malware sample <a href="https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/">DIAN_caso-5415.doc</a>:</p>
26   -<pre class="text"><code>&gt;oletimes.py DIAN_caso-5415.doc
  26 +<pre class="text"><code>&gt;oletimes DIAN_caso-5415.doc
27 27  
28 28 +----------------------------+---------------------+---------------------+
29 29 | Stream/Storage name | Modification Time | Creation Time |
... ...
oletools/doc/oletimes.md
... ... @@ -10,7 +10,7 @@ It is part of the [python-oletools](http://www.decalage.info/python/oletools) pa
10 10 ## Usage
11 11  
12 12 ```text
13   -oletimes.py <file>
  13 +oletimes <file>
14 14 ```
15 15  
16 16 ### Example
... ... @@ -18,7 +18,7 @@ oletimes.py <file>
18 18 Checking the malware sample [DIAN_caso-5415.doc](https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/):
19 19  
20 20 ```text
21   ->oletimes.py DIAN_caso-5415.doc
  21 +>oletimes DIAN_caso-5415.doc
22 22  
23 23 +----------------------------+---------------------+---------------------+
24 24 | Stream/Storage name | Modification Time | Creation Time |
... ...
oletools/doc/olevba.html
... ... @@ -127,56 +127,65 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni
127 127 <li>olevba scans the macro source code and the deobfuscated strings to find suspicious keywords, auto-executable macros and potential IOCs (URLs, IP addresses, e-mail addresses, executable filenames, etc).</li>
128 128 </ol>
129 129 <h2 id="usage">Usage</h2>
130   -<pre class="text"><code>Usage: olevba.py [options] &lt;filename&gt; [filename2 ...]
131   -
  130 +<pre class="text"><code>Usage: olevba [options] &lt;filename&gt; [filename2 ...]
  131 +
132 132 Options:
133 133 -h, --help show this help message and exit
134 134 -r find files recursively in subdirectories.
135 135 -z ZIP_PASSWORD, --zip=ZIP_PASSWORD
136 136 if the file is a zip archive, open all files from it,
137   - using the provided password (requires Python 2.6+)
  137 + using the provided password.
  138 + -p PASSWORD, --password=PASSWORD
  139 + if encrypted office files are encountered, try
  140 + decryption with this password. May be repeated.
138 141 -f ZIP_FNAME, --zipfname=ZIP_FNAME
139 142 if the file is a zip archive, file(s) to be opened
140 143 within the zip. Wildcards * and ? are supported.
141 144 (default:*)
142   - -t, --triage triage mode, display results as a summary table
143   - (default for multiple files)
144   - -d, --detailed detailed mode, display full results (default for
145   - single file)
146 145 -a, --analysis display only analysis results, not the macro source
147 146 code
148 147 -c, --code display only VBA source code, do not analyze it
149   - -i INPUT, --input=INPUT
150   - input file containing VBA source code to be analyzed
151   - (no parsing)
152 148 --decode display all the obfuscated strings with their decoded
153 149 content (Hex, Base64, StrReverse, Dridex, VBA).
154 150 --attr display the attribute lines at the beginning of VBA
155 151 source code
156 152 --reveal display the macro source code after replacing all the
157   - obfuscated strings by their decoded content.</code></pre>
  153 + obfuscated strings by their decoded content.
  154 + -l LOGLEVEL, --loglevel=LOGLEVEL
  155 + logging level debug/info/warning/error/critical
  156 + (default=warning)
  157 + --deobf Attempt to deobfuscate VBA expressions (slow)
  158 + --relaxed Do not raise errors if opening of substream fails
  159 +
  160 + Output mode (mutually exclusive):
  161 + -t, --triage triage mode, display results as a summary table
  162 + (default for multiple files)
  163 + -d, --detailed detailed mode, display full results (default for
  164 + single file)
  165 + -j, --json json mode, detailed in json format (never default)</code></pre>
  166 +<p><strong>New in v0.54:</strong> the -p option can now be used to decrypt encrypted documents using the provided password(s).</p>
158 167 <h3 id="examples">Examples</h3>
159 168 <p>Scan a single file:</p>
160   -<pre class="text"><code>olevba.py file.doc</code></pre>
  169 +<pre class="text"><code>olevba file.doc</code></pre>
161 170 <p>Scan a single file, stored in a Zip archive with password “infected”:</p>
162   -<pre class="text"><code>olevba.py malicious_file.xls.zip -z infected</code></pre>
  171 +<pre class="text"><code>olevba malicious_file.xls.zip -z infected</code></pre>
163 172 <p>Scan a single file, showing all obfuscated strings decoded:</p>
164   -<pre class="text"><code>olevba.py file.doc --decode</code></pre>
  173 +<pre class="text"><code>olevba file.doc --decode</code></pre>
165 174 <p>Scan a single file, showing the macro source code with VBA strings deobfuscated:</p>
166   -<pre class="text"><code>olevba.py file.doc --reveal</code></pre>
  175 +<pre class="text"><code>olevba file.doc --reveal</code></pre>
167 176 <p>Scan VBA source code extracted into a text file:</p>
168   -<pre class="text"><code>olevba.py source_code.vba</code></pre>
  177 +<pre class="text"><code>olevba source_code.vba</code></pre>
169 178 <p>Scan a collection of files stored in a folder:</p>
170   -<pre class="text"><code>olevba.py &quot;MalwareZoo/VBA/*&quot;</code></pre>
  179 +<pre class="text"><code>olevba &quot;MalwareZoo/VBA/*&quot;</code></pre>
171 180 <p>NOTE: On Linux, MacOSX and other Unix variants, it is required to add double quotes around wildcards. Otherwise, they will be expanded by the shell instead of olevba.</p>
172 181 <p>Scan all .doc and .xls files, recursively in all subfolders:</p>
173   -<pre class="text"><code>olevba.py &quot;MalwareZoo/VBA/*.doc&quot; &quot;MalwareZoo/VBA/*.xls&quot; -r</code></pre>
  182 +<pre class="text"><code>olevba &quot;MalwareZoo/VBA/*.doc&quot; &quot;MalwareZoo/VBA/*.xls&quot; -r</code></pre>
174 183 <p>Scan all .doc files within all .zip files with password, recursively:</p>
175   -<pre class="text"><code>olevba.py &quot;MalwareZoo/VBA/*.zip&quot; -r -z infected -f &quot;*.doc&quot;</code></pre>
  184 +<pre class="text"><code>olevba &quot;MalwareZoo/VBA/*.zip&quot; -r -z infected -f &quot;*.doc&quot;</code></pre>
176 185 <h3 id="detailed-analysis-mode-default-for-single-file">Detailed analysis mode (default for single file)</h3>
177 186 <p>When a single file is scanned, or when using the option -d, all details of the analysis are displayed.</p>
178 187 <p>For example, checking the malware sample <a href="https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/">DIAN_caso-5415.doc</a>:</p>
179   -<pre class="text"><code>&gt;olevba.py c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip -z infected
  188 +<pre class="text"><code>&gt;olevba c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip -z infected
180 189 ===============================================================================
181 190 FILE: DIAN_caso-5415.doc.malware in c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip
182 191 Type: OLE
... ... @@ -246,7 +255,7 @@ ANALYSIS:
246 255 <li><strong>V</strong>: VBA string expressions (potential obfuscation)</li>
247 256 </ul>
248 257 <p>Here is an example:</p>
249   -<pre class="text"><code>c:\&gt;olevba.py \MalwareZoo\VBA\samples\*
  258 +<pre class="text"><code>c:\&gt;olevba \MalwareZoo\VBA\samples\*
250 259 Flags Filename
251 260 ----------- -----------------------------------------------------------------
252 261 OLE:MASI--- \MalwareZoo\VBA\samples\DIAN_caso-5415.doc.malware
... ... @@ -266,7 +275,7 @@ OpX:MASI--- \MalwareZoo\VBA\samples\RottenKitten.xlsb.malware
266 275 OLE:MASI-B- \MalwareZoo\VBA\samples\ROVNIX.doc.malware
267 276 OLE:MA----- \MalwareZoo\VBA\samples\Word within Word macro auto.doc</code></pre>
268 277 <h2 id="python-3-support---olevba3">Python 3 support - olevba3</h2>
269   -<p>As of v0.50, olevba has been ported to Python 3 thanks to <span class="citation" data-cites="sebdraven">@sebdraven</span>. However, the differences between Python 2 and 3 are significant and for now there is a separate version of olevba named olevba3 to be used with Python 3.</p>
  278 +<p>Since v0.54, olevba is fully compatible with both Python 2 and 3. There is no need to use olevba3 anymore, however it is still present for backward compatibility.</p>
270 279 <hr />
271 280 <h2 id="how-to-use-olevba-in-python-applications">How to use olevba in Python applications</h2>
272 281 <p>olevba may be used to open a MS Office file, detect if it contains VBA macros, extract and analyze the VBA source code from your own python applications.</p>
... ...
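The olevba usage text above lists `-z ZIP_PASSWORD` to open files inside a password-protected Zip archive and `-f ZIP_FNAME` to select members with `*` and `?` wildcards. The following is a minimal standard-library sketch of that selection logic, not olevba's actual implementation; the helper name `iter_zip_members` is hypothetical. Note that Python's `zipfile` can only decrypt legacy ZipCrypto-encrypted members, not AES-encrypted ones.

```python
import fnmatch
import zipfile


def iter_zip_members(zip_path, pattern="*", password=None):
    """Yield (member_name, data) for each file in the archive matching pattern.

    Hypothetical helper illustrating the idea behind olevba's -z/-f options.
    `zip_path` may be a path or a binary file-like object.
    """
    # zipfile expects the password as bytes.
    pwd = password.encode("utf-8") if password is not None else None
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip directory entries
            # fnmatch supports * and ? (and [seq]); here * also matches "/",
            # so "*.doc" selects documents in nested folders too.
            if fnmatch.fnmatch(info.filename, pattern):
                yield info.filename, zf.read(info, pwd=pwd)
```

For example, `iter_zip_members("malicious_file.xls.zip", "*.xls", "infected")` would yield only the `.xls` members, decrypted with the password "infected".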
oletools/doc/olevba.md
... ... @@ -67,85 +67,95 @@ and potential IOCs (URLs, IP addresses, e-mail addresses, executable filenames,
67 67 ## Usage
68 68  
69 69 ```text
70   -Usage: olevba.py [options] <filename> [filename2 ...]
71   -
  70 +Usage: olevba [options] <filename> [filename2 ...]
  71 +
72 72 Options:
73 73 -h, --help show this help message and exit
74 74 -r find files recursively in subdirectories.
75 75 -z ZIP_PASSWORD, --zip=ZIP_PASSWORD
76 76 if the file is a zip archive, open all files from it,
77   - using the provided password (requires Python 2.6+)
  77 + using the provided password.
  78 + -p PASSWORD, --password=PASSWORD
  79 + if encrypted office files are encountered, try
  80 + decryption with this password. May be repeated.
78 81 -f ZIP_FNAME, --zipfname=ZIP_FNAME
79 82 if the file is a zip archive, file(s) to be opened
80 83 within the zip. Wildcards * and ? are supported.
81 84 (default:*)
82   - -t, --triage triage mode, display results as a summary table
83   - (default for multiple files)
84   - -d, --detailed detailed mode, display full results (default for
85   - single file)
86 85 -a, --analysis display only analysis results, not the macro source
87 86 code
88 87 -c, --code display only VBA source code, do not analyze it
89   - -i INPUT, --input=INPUT
90   - input file containing VBA source code to be analyzed
91   - (no parsing)
92 88 --decode display all the obfuscated strings with their decoded
93 89 content (Hex, Base64, StrReverse, Dridex, VBA).
94 90 --attr display the attribute lines at the beginning of VBA
95 91 source code
96 92 --reveal display the macro source code after replacing all the
97 93 obfuscated strings by their decoded content.
  94 + -l LOGLEVEL, --loglevel=LOGLEVEL
  95 + logging level debug/info/warning/error/critical
  96 + (default=warning)
  97 + --deobf Attempt to deobfuscate VBA expressions (slow)
  98 + --relaxed Do not raise errors if opening of substream fails
  99 +
  100 + Output mode (mutually exclusive):
  101 + -t, --triage triage mode, display results as a summary table
  102 + (default for multiple files)
  103 + -d, --detailed detailed mode, display full results (default for
  104 + single file)
  105 + -j, --json json mode, detailed in json format (never default)
98 106 ```
99 107  
  108 +**New in v0.54:** the -p option can now be used to decrypt encrypted documents using the provided password(s).
  109 +
100 110 ### Examples
101 111  
102 112 Scan a single file:
103 113  
104 114 ```text
105   -olevba.py file.doc
  115 +olevba file.doc
106 116 ```
107 117  
108 118 Scan a single file, stored in a Zip archive with password "infected":
109 119  
110 120 ```text
111   -olevba.py malicious_file.xls.zip -z infected
  121 +olevba malicious_file.xls.zip -z infected
112 122 ```
113 123  
114 124 Scan a single file, showing all obfuscated strings decoded:
115 125  
116 126 ```text
117   -olevba.py file.doc --decode
  127 +olevba file.doc --decode
118 128 ```
119 129  
120 130 Scan a single file, showing the macro source code with VBA strings deobfuscated:
121 131  
122 132 ```text
123   -olevba.py file.doc --reveal
  133 +olevba file.doc --reveal
124 134 ```
125 135  
126 136 Scan VBA source code extracted into a text file:
127 137  
128 138 ```text
129   -olevba.py source_code.vba
  139 +olevba source_code.vba
130 140 ```
131 141  
132 142 Scan a collection of files stored in a folder:
133 143  
134 144 ```text
135   -olevba.py "MalwareZoo/VBA/*"
  145 +olevba "MalwareZoo/VBA/*"
136 146 ```
137 147 NOTE: On Linux, MacOSX and other Unix variants, it is required to add double quotes around wildcards. Otherwise, they will be expanded by the shell instead of olevba.
138 148  
139 149 Scan all .doc and .xls files, recursively in all subfolders:
140 150  
141 151 ```text
142   -olevba.py "MalwareZoo/VBA/*.doc" "MalwareZoo/VBA/*.xls" -r
  152 +olevba "MalwareZoo/VBA/*.doc" "MalwareZoo/VBA/*.xls" -r
143 153 ```
144 154  
145 155 Scan all .doc files within all .zip files with password, recursively:
146 156  
147 157 ```text
148   -olevba.py "MalwareZoo/VBA/*.zip" -r -z infected -f "*.doc"
  158 +olevba "MalwareZoo/VBA/*.zip" -r -z infected -f "*.doc"
149 159 ```
150 160  
151 161  
... ... @@ -156,7 +166,7 @@ When a single file is scanned, or when using the option -d, all details of the a
156 166 For example, checking the malware sample [DIAN_caso-5415.doc](https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/):
157 167  
158 168 ```text
159   ->olevba.py c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip -z infected
  169 +>olevba c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip -z infected
160 170 ===============================================================================
161 171 FILE: DIAN_caso-5415.doc.malware in c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip
162 172 Type: OLE
... ... @@ -233,7 +243,7 @@ The following flags show the results of the analysis:
233 243 Here is an example:
234 244  
235 245 ```text
236   -c:\>olevba.py \MalwareZoo\VBA\samples\*
  246 +c:\>olevba \MalwareZoo\VBA\samples\*
237 247 Flags Filename
238 248 ----------- -----------------------------------------------------------------
239 249 OLE:MASI--- \MalwareZoo\VBA\samples\DIAN_caso-5415.doc.malware
... ... @@ -256,10 +266,9 @@ OLE:MA----- \MalwareZoo\VBA\samples\Word within Word macro auto.doc
256 266  
257 267 ## Python 3 support - olevba3
258 268  
259   -As of v0.50, olevba has been ported to Python 3 thanks to @sebdraven.
260   -However, the differences between Python 2 and 3 are significant and for now
261   -there is a separate version of olevba named olevba3 to be used with
262   -Python 3.
  269 +Since v0.54, olevba is fully compatible with both Python 2 and 3.
  270 +There is no need to use olevba3 anymore, however it is still present for backward compatibility.
  271 +
263 272  
264 273 --------------------------------------------------------------------------
265 274  
... ...
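The note about quoting wildcards on Unix-like systems matters because, once quoted, a pattern such as `"MalwareZoo/VBA/*.doc"` reaches the tool unexpanded, and the tool must expand it itself (oletools bundles an `oletools.thirdparty.xglob` module for this, imported by the old mraptor3.py further down). The following is a minimal standard-library sketch of such expansion with an optional recursive mode similar in spirit to `-r`; `expand_patterns` is a hypothetical name, and its semantics are an assumption, not xglob's exact behaviour.

```python
import fnmatch
import glob
import os


def expand_patterns(patterns, recursive=False):
    """Expand shell-style wildcards in the tool rather than in the shell.

    Hypothetical sketch. Without `recursive`, each pattern goes through glob.
    With `recursive`, the directory part is walked and the basename wildcard
    is applied in every subdirectory. Assumes wildcards appear only in the
    last path component.
    """
    results = []
    for pattern in patterns:
        if not recursive:
            results.extend(sorted(glob.glob(pattern)))
            continue
        base_dir, name_pattern = os.path.split(pattern)
        # os.walk is top-down, so files in base_dir come before subfolders.
        for dirpath, _dirnames, filenames in os.walk(base_dir or "."):
            for filename in sorted(filenames):
                if fnmatch.fnmatch(filename, name_pattern):
                    results.append(os.path.join(dirpath, filename))
    return results
```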
oletools/doc/pyxswf.html
... ... @@ -24,7 +24,7 @@
24 24 <p>It can also extract Flash objects from RTF documents, by parsing embedded objects encoded in hexadecimal format (-f option).</p>
25 25 <p>For this, simply add the -o option to work on OLE streams rather than raw files, or the -f option to work on RTF files.</p>
26 26 <h2 id="usage">Usage</h2>
27   -<pre class="text"><code>Usage: pyxswf.py [options] &lt;file.bad&gt;
  27 +<pre class="text"><code>Usage: pyxswf [options] &lt;file.bad&gt;
28 28  
29 29 Options:
30 30 -o, --ole Parse an OLE file (e.g. Word, Excel) to look for SWF
... ... @@ -46,18 +46,18 @@ Options:
46 46 contain SWFs. Must provide path in quotes
47 47 -c, --compress Compresses the SWF using Zlib</code></pre>
48 48 <h3 id="example-1---detecting-and-extracting-a-swf-file-from-a-word-document-on-windows">Example 1 - detecting and extracting a SWF file from a Word document on Windows:</h3>
49   -<pre class="text"><code>C:\oletools&gt;pyxswf.py -o word_flash.doc
  49 +<pre class="text"><code>C:\oletools&gt;pyxswf -o word_flash.doc
50 50 OLE stream: &#39;Contents&#39;
51 51 [SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents
52 52 [ADDR] SWF 1 at 0x8 - FWS Header
53 53  
54   -C:\oletools&gt;pyxswf.py -xo word_flash.doc
  54 +C:\oletools&gt;pyxswf -xo word_flash.doc
55 55 OLE stream: &#39;Contents&#39;
56 56 [SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents
57 57 [ADDR] SWF 1 at 0x8 - FWS Header
58 58 [FILE] Carved SWF MD5: 2498e9c0701dc0e461ab4358f9102bc5.swf</code></pre>
59 59 <h3 id="example-2---detecting-and-extracting-a-swf-file-from-a-rtf-document-on-windows">Example 2 - detecting and extracting a SWF file from a RTF document on Windows:</h3>
60   -<pre class="text"><code>C:\oletools&gt;pyxswf.py -xf &quot;rtf_flash.rtf&quot;
  60 +<pre class="text"><code>C:\oletools&gt;pyxswf -xf &quot;rtf_flash.rtf&quot;
61 61 RTF embedded object size 1498557 at index 000036DD
62 62 [SUMMARY] 1 SWF(s) in MD5:46a110548007e04f4043785ac4184558:RTF_embedded_object_0
63 63 00036DD
... ...
oletools/doc/pyxswf.md
... ... @@ -21,7 +21,7 @@ For this, simply add the -o option to work on OLE streams rather than raw files,
21 21 ## Usage
22 22  
23 23 ```text
24   -Usage: pyxswf.py [options] <file.bad>
  24 +Usage: pyxswf [options] <file.bad>
25 25  
26 26 Options:
27 27 -o, --ole Parse an OLE file (e.g. Word, Excel) to look for SWF
... ... @@ -47,12 +47,12 @@ Options:
47 47 ### Example 1 - detecting and extracting a SWF file from a Word document on Windows:
48 48  
49 49 ```text
50   -C:\oletools>pyxswf.py -o word_flash.doc
  50 +C:\oletools>pyxswf -o word_flash.doc
51 51 OLE stream: 'Contents'
52 52 [SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents
53 53 [ADDR] SWF 1 at 0x8 - FWS Header
54 54  
55   -C:\oletools>pyxswf.py -xo word_flash.doc
  55 +C:\oletools>pyxswf -xo word_flash.doc
56 56 OLE stream: 'Contents'
57 57 [SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents
58 58 [ADDR] SWF 1 at 0x8 - FWS Header
... ... @@ -62,7 +62,7 @@ OLE stream: 'Contents'
62 62 ### Example 2 - detecting and extracting a SWF file from a RTF document on Windows:
63 63  
64 64 ```text
65   -C:\oletools>pyxswf.py -xf "rtf_flash.rtf"
  65 +C:\oletools>pyxswf -xf "rtf_flash.rtf"
66 66 RTF embedded object size 1498557 at index 000036DD
67 67 [SUMMARY] 1 SWF(s) in MD5:46a110548007e04f4043785ac4184558:RTF_embedded_object_0
68 68 00036DD
... ...
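The pyxswf examples report hits such as `SWF 1 at 0x8 - FWS Header`. An SWF file starts with a 3-byte signature (`FWS` uncompressed, `CWS` zlib-compressed, `ZWS` LZMA-compressed), a 1-byte version and a 4-byte little-endian file length. The following is a minimal sketch of locating such candidate headers in a byte buffer; it is not pyxswf's implementation, and `find_swf_headers` is a hypothetical name. Every 3-byte signature match is reported, so false positives are expected; a real scanner would add sanity checks on the version and length.

```python
import struct

# Signature -> compression type, per the SWF header layout.
SWF_SIGNATURES = {b"FWS": "uncompressed", b"CWS": "zlib", b"ZWS": "LZMA"}


def find_swf_headers(data):
    """Return sorted (offset, signature, version, declared_length) tuples."""
    hits = []
    for sig in SWF_SIGNATURES:
        start = 0
        while True:
            off = data.find(sig, start)
            # Stop when no match is left or the 8-byte header would be truncated.
            if off == -1 or off + 8 > len(data):
                break
            version = data[off + 3]
            (length,) = struct.unpack_from("<I", data, off + 4)
            hits.append((off, sig.decode("ascii"), version, length))
            start = off + 1
    return sorted(hits)
```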
oletools/ezhexviewer.py
... ... @@ -16,7 +16,7 @@ Usage in a python application:
16 16  
17 17 ezhexviewer project website: http://www.decalage.info/python/ezhexviewer
18 18  
19   -ezhexviewer is copyright (c) 2012-2017, Philippe Lagadec (http://www.decalage.info)
  19 +ezhexviewer is copyright (c) 2012-2019, Philippe Lagadec (http://www.decalage.info)
20 20 All rights reserved.
21 21  
22 22 Redistribution and use in source and binary forms, with or without modification,
... ... @@ -50,7 +50,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
50 50 # 2017-04-26 PL: - fixed absolute imports (issue #141)
51 51 # 2018-09-15 v0.54 PL: - easygui is now a dependency
52 52  
53   -__version__ = '0.54dev1'
  53 +__version__ = '0.54'
54 54  
55 55 #-----------------------------------------------------------------------------
56 56 # TODO:
... ...
oletools/mraptor.py
... ... @@ -23,7 +23,7 @@ http://www.decalage.info/python/oletools
23 23  
24 24 # === LICENSE ==================================================================
25 25  
26   -# MacroRaptor is copyright (c) 2016-2018 Philippe Lagadec (http://www.decalage.info)
  26 +# MacroRaptor is copyright (c) 2016-2019 Philippe Lagadec (http://www.decalage.info)
27 27 # All rights reserved.
28 28 #
29 29 # Redistribution and use in source and binary forms, with or without modification,
... ... @@ -58,8 +58,9 @@ http://www.decalage.info/python/oletools
58 58 # 2016-12-21 v0.51 PL: - added more ActiveX macro triggers
59 59 # 2017-03-08 PL: - fixed absolute imports
60 60 # 2018-05-25 v0.53 PL: - added Word/PowerPoint 2007+ XML (aka Flat OPC) issue #283
  61 +# 2019-04-04 v0.54 PL: - added ExecuteExcel4Macro, ShellExecuteA, XLM keywords
61 62  
62   -__version__ = '0.53'
  63 +__version__ = '0.54'
63 64  
64 65 #------------------------------------------------------------------------------
65 66 # TODO:
... ... @@ -119,20 +120,21 @@ re_autoexec = re.compile(r'(?i)\b(?:Auto(?:Exec|_?Open|_?Close|Exit|New)' +
119 120 r'|DocumentComplete|DownloadBegin|DownloadComplete|FileDownload' +
120 121 r'|NavigateComplete2|NavigateError|ProgressChange|PropertyChange' +
121 122 r'|SetSecureLockIcon|StatusTextChange|TitleChange|MouseMove' +
122   - r'|MouseEnter|MouseLeave|))\b')
  123 + r'|MouseEnter|MouseLeave))|Auto_Ope\b')
  124 +# TODO: "Auto_Ope" is temporarily here because of a bug in plugin_biff, which misses the last byte in "Auto_Open"...
123 125  
124 126 # MS-VBAL 5.4.5.1 Open Statement:
125 127 RE_OPEN_WRITE = r'(?:\bOpen\b[^\n]+\b(?:Write|Append|Binary|Output|Random)\b)'
126 128  
127 129 re_write = re.compile(r'(?i)\b(?:FileCopy|CopyFile|Kill|CreateTextFile|'
128   - + r'VirtualAlloc|RtlMoveMemory|URLDownloadToFileA?|AltStartupPath|'
  130 + + r'VirtualAlloc|RtlMoveMemory|URLDownloadToFileA?|AltStartupPath|WriteProcessMemory|'
129 131 + r'ADODB\.Stream|WriteText|SaveToFile|SaveAs|SaveAsRTF|FileSaveAs|MkDir|RmDir|SaveSetting|SetAttr)\b|' + RE_OPEN_WRITE)
130 132  
131 133 # MS-VBAL 5.2.3.5 External Procedure Declaration
132 134 RE_DECLARE_LIB = r'(?:\bDeclare\b[^\n]+\bLib\b)'
133 135  
134 136 re_execute = re.compile(r'(?i)\b(?:Shell|CreateObject|GetObject|SendKeys|'
135   - + r'MacScript|FollowHyperlink|CreateThread|ShellExecute)\b|' + RE_DECLARE_LIB)
  137 + + r'MacScript|FollowHyperlink|CreateThread|ShellExecuteA?|ExecuteExcel4Macro|EXEC|REGISTER)\b|' + RE_DECLARE_LIB)
136 138  
137 139  
138 140 # === CLASSES =================================================================
... ...
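To see what the v0.54 keyword additions in mraptor.py change in practice, the sketch below rebuilds `re_write` and `re_execute` exactly as they appear in the hunk above, plus the v0.53 `re_execute` for comparison, and runs them on a few hypothetical VBA lines. `first_match` is a hypothetical helper mirroring how MacroRaptor stores `m.group()` in its `*_match` attributes; it is not part of mraptor.

```python
import re

# Patterns copied from the v0.54 mraptor.py hunk.
RE_OPEN_WRITE = r'(?:\bOpen\b[^\n]+\b(?:Write|Append|Binary|Output|Random)\b)'
RE_DECLARE_LIB = r'(?:\bDeclare\b[^\n]+\bLib\b)'

re_write = re.compile(r'(?i)\b(?:FileCopy|CopyFile|Kill|CreateTextFile|'
    + r'VirtualAlloc|RtlMoveMemory|URLDownloadToFileA?|AltStartupPath|WriteProcessMemory|'
    + r'ADODB\.Stream|WriteText|SaveToFile|SaveAs|SaveAsRTF|FileSaveAs|MkDir|RmDir|SaveSetting|SetAttr)\b|' + RE_OPEN_WRITE)

re_execute = re.compile(r'(?i)\b(?:Shell|CreateObject|GetObject|SendKeys|'
    + r'MacScript|FollowHyperlink|CreateThread|ShellExecuteA?|ExecuteExcel4Macro|EXEC|REGISTER)\b|' + RE_DECLARE_LIB)

# The v0.53 execute pattern, from the removed line of the same hunk.
re_execute_053 = re.compile(r'(?i)\b(?:Shell|CreateObject|GetObject|SendKeys|'
    + r'MacScript|FollowHyperlink|CreateThread|ShellExecute)\b|' + RE_DECLARE_LIB)


def first_match(regex, vba_code):
    """Return the first matched keyword, or None if the pattern does not match."""
    m = regex.search(vba_code)
    return m.group() if m else None
```

With the v0.53 pattern, a direct call such as `ShellExecuteA 0, "open", ...` is missed because `\b` does not match between `ShellExecute` and `A`; only a `Declare ... Lib` line would have been flagged.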
oletools/mraptor3.py
1 1 #!/usr/bin/env python
2   -"""
3   -mraptor.py - MacroRaptor
4 2  
5   -MacroRaptor is a script to parse OLE and OpenXML files such as MS Office
6   -documents (e.g. Word, Excel), to detect malicious macros.
  3 +# mraptor3 is a stub that redirects to mraptor.py, for backwards compatibility
7 4  
8   -Supported formats:
9   -- Word 97-2003 (.doc, .dot), Word 2007+ (.docm, .dotm)
10   -- Excel 97-2003 (.xls), Excel 2007+ (.xlsm, .xlsb)
11   -- PowerPoint 97-2003 (.ppt), PowerPoint 2007+ (.pptm, .ppsm)
12   -- Word/PowerPoint 2007+ XML (aka Flat OPC)
13   -- Word 2003 XML (.xml)
14   -- Word/Excel Single File Web Page / MHTML (.mht)
15   -- Publisher (.pub)
  5 +import sys, os, warnings
16 6  
17   -Author: Philippe Lagadec - http://www.decalage.info
18   -License: BSD, see source code or documentation
19   -
20   -MacroRaptor is part of the python-oletools package:
21   -http://www.decalage.info/python/oletools
22   -"""
23   -
24   -# === LICENSE ==================================================================
25   -
26   -# MacroRaptor is copyright (c) 2016-2018 Philippe Lagadec (http://www.decalage.info)
27   -# All rights reserved.
28   -#
29   -# Redistribution and use in source and binary forms, with or without modification,
30   -# are permitted provided that the following conditions are met:
31   -#
32   -# * Redistributions of source code must retain the above copyright notice, this
33   -# list of conditions and the following disclaimer.
34   -# * Redistributions in binary form must reproduce the above copyright notice,
35   -# this list of conditions and the following disclaimer in the documentation
36   -# and/or other materials provided with the distribution.
37   -#
38   -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
39   -# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
40   -# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
41   -# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
42   -# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
43   -# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
44   -# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
45   -# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
46   -# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
47   -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
48   -
49   -#------------------------------------------------------------------------------
50   -# CHANGELOG:
51   -# 2016-02-23 v0.01 PL: - first version
52   -# 2016-02-29 v0.02 PL: - added Workbook_Activate, FileSaveAs
53   -# 2016-03-04 v0.03 PL: - returns an exit code based on the overall result
54   -# 2016-03-08 v0.04 PL: - collapse long lines before analysis
55   -# 2016-07-19 v0.50 SL: - converted to Python 3
56   -# 2016-08-26 PL: - changed imports for Python 3
57   -# 2017-04-26 v0.51 PL: - fixed absolute imports (issue #141)
58   -# 2017-06-29 PL: - synced with mraptor.py 0.51
59   -# 2018-05-25 v0.53 PL: - added Word/PowerPoint 2007+ XML (aka Flat OPC) issue #283
60   -
61   -__version__ = '0.53'
62   -
63   -#------------------------------------------------------------------------------
64   -# TODO:
65   -
66   -
67   -#--- IMPORTS ------------------------------------------------------------------
68   -
69   -import sys, os, logging, optparse, re
  7 +warnings.warn('mraptor3 is deprecated, mraptor should be used instead.', DeprecationWarning)
70 8  
71 9 # IMPORTANT: it should be possible to run oletools directly as scripts
72 10 # in any directory without installing them with pip or setup.py.
... ... @@ -74,280 +12,12 @@ import sys, os, logging, optparse, re
74 12 # And to enable Python 2+3 compatibility, we need to use absolute imports,
75 13 # so we add the oletools parent folder to sys.path (absolute+normalized path):
76 14 _thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__)))
77   -# print('_thismodule_dir = %r' % _thismodule_dir)
78 15 _parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
79   -# print('_parent_dir = %r' % _thirdparty_dir)
80   -if not _parent_dir in sys.path:
  16 +if _parent_dir not in sys.path:
81 17 sys.path.insert(0, _parent_dir)
82 18  
83   -from oletools.thirdparty.xglob import xglob
84   -from oletools.thirdparty.tablestream import tablestream
85   -
86   -# import the python 3 version of olevba
87   -from oletools import olevba3 as olevba
88   -from oletools.olevba3 import TYPE2TAG
89   -
90   -# === LOGGING =================================================================
91   -
92   -# a global logger object used for debugging:
93   -log = olevba.get_logger('mraptor')
94   -
95   -
96   -#--- CONSTANTS ----------------------------------------------------------------
97   -
98   -# URL and message to report issues:
99   -# TODO: make it a common variable for all oletools
100   -URL_ISSUES = 'https://github.com/decalage2/oletools/issues'
101   -MSG_ISSUES = 'Please report this issue on %s' % URL_ISSUES
102   -
103   -# 'AutoExec', 'AutoOpen', 'Auto_Open', 'AutoClose', 'Auto_Close', 'AutoNew', 'AutoExit',
104   -# 'Document_Open', 'DocumentOpen',
105   -# 'Document_Close', 'DocumentBeforeClose', 'Document_BeforeClose',
106   -# 'DocumentChange','Document_New',
107   -# 'NewDocument'
108   -# 'Workbook_Open', 'Workbook_Close',
109   -# *_Painted such as InkPicture1_Painted
110   -# *_GotFocus|LostFocus|MouseHover for other ActiveX objects
111   -# reference: http://www.greyhathacker.net/?p=948
112   -
113   -# TODO: check if line also contains Sub or Function
114   -re_autoexec = re.compile(r'(?i)\b(?:Auto(?:Exec|_?Open|_?Close|Exit|New)' +
115   - r'|Document(?:_?Open|_Close|_?BeforeClose|Change|_New)' +
116   - r'|NewDocument|Workbook(?:_Open|_Activate|_Close)' +
117   - r'|\w+_(?:Painted|Painting|GotFocus|LostFocus|MouseHover' +
118   - r'|Layout|Click|Change|Resize|BeforeNavigate2|BeforeScriptExecute' +
119   - r'|DocumentComplete|DownloadBegin|DownloadComplete|FileDownload' +
120   - r'|NavigateComplete2|NavigateError|ProgressChange|PropertyChange' +
121   - r'|SetSecureLockIcon|StatusTextChange|TitleChange|MouseMove' +
122   - r'|MouseEnter|MouseLeave|))\b')
123   -
124   -# MS-VBAL 5.4.5.1 Open Statement:
125   -RE_OPEN_WRITE = r'(?:\bOpen\b[^\n]+\b(?:Write|Append|Binary|Output|Random)\b)'
126   -
127   -re_write = re.compile(r'(?i)\b(?:FileCopy|CopyFile|Kill|CreateTextFile|'
128   - + r'VirtualAlloc|RtlMoveMemory|URLDownloadToFileA?|AltStartupPath|'
129   - + r'ADODB\.Stream|WriteText|SaveToFile|SaveAs|SaveAsRTF|FileSaveAs|MkDir|RmDir|SaveSetting|SetAttr)\b|' + RE_OPEN_WRITE)
130   -
131   -# MS-VBAL 5.2.3.5 External Procedure Declaration
132   -RE_DECLARE_LIB = r'(?:\bDeclare\b[^\n]+\bLib\b)'
133   -
134   -re_execute = re.compile(r'(?i)\b(?:Shell|CreateObject|GetObject|SendKeys|'
135   - + r'MacScript|FollowHyperlink|CreateThread|ShellExecute)\b|' + RE_DECLARE_LIB)
136   -
137   -
138   -# === CLASSES =================================================================
139   -
140   -class Result_NoMacro(object):
141   - exit_code = 0
142   - color = 'green'
143   - name = 'No Macro'
144   -
145   -
146   -class Result_NotMSOffice(object):
147   - exit_code = 1
148   - color = 'green'
149   - name = 'Not MS Office'
150   -
151   -
152   -class Result_MacroOK(object):
153   - exit_code = 2
154   - color = 'cyan'
155   - name = 'Macro OK'
156   -
157   -
158   -class Result_Error(object):
159   - exit_code = 10
160   - color = 'yellow'
161   - name = 'ERROR'
162   -
163   -
164   -class Result_Suspicious(object):
165   - exit_code = 20
166   - color = 'red'
167   - name = 'SUSPICIOUS'
168   -
169   -
170   -class MacroRaptor(object):
171   - """
172   - class to scan VBA macro code to detect if it is malicious
173   - """
174   - def __init__(self, vba_code):
175   - """
176   - MacroRaptor constructor
177   - :param vba_code: string containing the VBA macro code
178   - """
179   - # collapse long lines first
180   - self.vba_code = olevba.vba_collapse_long_lines(vba_code)
181   - self.autoexec = False
182   - self.write = False
183   - self.execute = False
184   - self.flags = ''
185   - self.suspicious = False
186   - self.autoexec_match = None
187   - self.write_match = None
188   - self.execute_match = None
189   - self.matches = []
190   -
191   - def scan(self):
192   - """
193   - Scan the VBA macro code to detect if it is malicious
194   - :return:
195   - """
196   - m = re_autoexec.search(self.vba_code)
197   - if m is not None:
198   - self.autoexec = True
199   - self.autoexec_match = m.group()
200   - self.matches.append(m.group())
201   - m = re_write.search(self.vba_code)
202   - if m is not None:
203   - self.write = True
204   - self.write_match = m.group()
205   - self.matches.append(m.group())
206   - m = re_execute.search(self.vba_code)
207   - if m is not None:
208   - self.execute = True
209   - self.execute_match = m.group()
210   - self.matches.append(m.group())
211   - if self.autoexec and (self.execute or self.write):
212   - self.suspicious = True
213   -
214   - def get_flags(self):
215   - flags = ''
216   - flags += 'A' if self.autoexec else '-'
217   - flags += 'W' if self.write else '-'
218   - flags += 'X' if self.execute else '-'
219   - return flags
220   -
221   -
222   -# === MAIN ====================================================================
223   -
224   -def main():
225   - """
226   - Main function, called when olevba is run from the command line
227   - """
228   - global log
229   - DEFAULT_LOG_LEVEL = "warning" # Default log level
230   - LOG_LEVELS = {
231   - 'debug': logging.DEBUG,
232   - 'info': logging.INFO,
233   - 'warning': logging.WARNING,
234   - 'error': logging.ERROR,
235   - 'critical': logging.CRITICAL
236   - }
237   -
238   - usage = 'usage: %prog [options] <filename> [filename2 ...]'
239   - parser = optparse.OptionParser(usage=usage)
240   - parser.add_option("-r", action="store_true", dest="recursive",
241   - help='find files recursively in subdirectories.')
242   - parser.add_option("-z", "--zip", dest='zip_password', type='str', default=None,
243   - help='if the file is a zip archive, open all files from it, using the provided password (requires Python 2.6+)')
244   - parser.add_option("-f", "--zipfname", dest='zip_fname', type='str', default='*',
245   - help='if the file is a zip archive, file(s) to be opened within the zip. Wildcards * and ? are supported. (default:*)')
246   - parser.add_option('-l', '--loglevel', dest="loglevel", action="store", default=DEFAULT_LOG_LEVEL,
247   - help="logging level debug/info/warning/error/critical (default=%default)")
248   - parser.add_option("-m", '--matches', action="store_true", dest="show_matches",
249   - help='Show matched strings.')
250   -
251   - # TODO: add logfile option
252   -
253   - (options, args) = parser.parse_args()
254   -
255   - # Print help if no arguments are passed
256   - if len(args) == 0:
257   - print('MacroRaptor %s - http://decalage.info/python/oletools' % __version__)
258   - print('This is work in progress, please report issues at %s' % URL_ISSUES)
259   - print(__doc__)
260   - parser.print_help()
261   - print('\nAn exit code is returned based on the analysis result:')
262   - for result in (Result_NoMacro, Result_NotMSOffice, Result_MacroOK, Result_Error, Result_Suspicious):
263   - print(' - %d: %s' % (result.exit_code, result.name))
264   - sys.exit()
265   -
266   - # print banner with version
267   - print('MacroRaptor %s - http://decalage.info/python/oletools' % __version__)
268   - print('This is work in progress, please report issues at %s' % URL_ISSUES)
269   -
270   - logging.basicConfig(level=LOG_LEVELS[options.loglevel], format='%(levelname)-8s %(message)s')
271   - # enable logging in the modules:
272   - log.setLevel(logging.NOTSET)
273   -
274   - t = tablestream.TableStream(style=tablestream.TableStyleSlim,
275   - header_row=['Result', 'Flags', 'Type', 'File'],
276   - column_width=[10, 5, 4, 56])
277   -
278   - exitcode = -1
279   - global_result = None
280   - # TODO: handle errors in xglob, to continue processing the next files
281   - for container, filename, data in xglob.iter_files(args, recursive=options.recursive,
282   - zip_password=options.zip_password, zip_fname=options.zip_fname):
283   - # ignore directory names stored in zip files:
284   - if container and filename.endswith('/'):
285   - continue
286   - full_name = '%s in %s' % (filename, container) if container else filename
287   - # try:
288   - # # Open the file
289   - # if data is None:
290   - # data = open(filename, 'rb').read()
291   - # except:
292   - # log.exception('Error when opening file %r' % full_name)
293   - # continue
294   - if isinstance(data, Exception):
295   - result = Result_Error
296   - t.write_row([result.name, '', '', full_name],
297   - colors=[result.color, None, None, None])
298   - t.write_row(['', '', '', str(data)],
299   - colors=[None, None, None, result.color])
300   - else:
301   - filetype = '???'
302   - try:
303   - vba_parser = olevba.VBA_Parser(filename=filename, data=data, container=container)
304   - filetype = TYPE2TAG[vba_parser.type]
305   - except Exception as e:
306   - # log.error('Error when parsing VBA macros from file %r' % full_name)
307   - # TODO: distinguish actual errors from non-MSOffice files
308   - result = Result_Error
309   - t.write_row([result.name, '', filetype, full_name],
310   - colors=[result.color, None, None, None])
311   - t.write_row(['', '', '', str(e)],
312   - colors=[None, None, None, result.color])
313   - continue
314   - if vba_parser.detect_vba_macros():
315   - vba_code_all_modules = ''
316   - try:
317   - for (subfilename, stream_path, vba_filename, vba_code) in vba_parser.extract_all_macros():
318   - vba_code_all_modules += vba_code.decode('utf-8','replace') + '\n'
319   - except Exception as e:
320   - # log.error('Error when parsing VBA macros from file %r' % full_name)
321   - result = Result_Error
322   - t.write_row([result.name, '', TYPE2TAG[vba_parser.type], full_name],
323   - colors=[result.color, None, None, None])
324   - t.write_row(['', '', '', str(e)],
325   - colors=[None, None, None, result.color])
326   - continue
327   - mraptor = MacroRaptor(vba_code_all_modules)
328   - mraptor.scan()
329   - if mraptor.suspicious:
330   - result = Result_Suspicious
331   - else:
332   - result = Result_MacroOK
333   - t.write_row([result.name, mraptor.get_flags(), filetype, full_name],
334   - colors=[result.color, None, None, None])
335   - if mraptor.matches and options.show_matches:
336   - t.write_row(['', '', '', 'Matches: %r' % mraptor.matches])
337   - else:
338   - result = Result_NoMacro
339   - t.write_row([result.name, '', filetype, full_name],
340   - colors=[result.color, None, None, None])
341   - if result.exit_code > exitcode:
342   - global_result = result
343   - exitcode = result.exit_code
344   -
345   - print('')
346   - print('Flags: A=AutoExec, W=Write, X=Execute')
347   - print('Exit code: %d - %s' % (exitcode, global_result.name))
348   - sys.exit(exitcode)
  19 +from oletools.mraptor import *
  20 +from oletools.mraptor import __doc__, __version__
349 21  
350 22 if __name__ == '__main__':
351 23 main()
352   -
353   -# Soundtrack: "Dark Child" by Marlon Williams
... ...
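The removed `MacroRaptor.scan()` logic above runs three independent regex searches, turns them into the A/W/X flags returned by `get_flags()`, and marks the code suspicious only when auto-execution coincides with a write or an execute primitive. Below is a minimal sketch of that decision rule. The three patterns are simplified, hypothetical stand-ins, not mraptor's actual `re_autoexec`, `re_write` and `re_execute`, which are defined elsewhere in the module.

```python
import re

# Hypothetical, deliberately simplified keyword patterns (NOT mraptor's real ones)
RE_AUTOEXEC = re.compile(r'\b(AutoOpen|Document_Open|Workbook_Open)\b', re.IGNORECASE)
RE_WRITE = re.compile(r'\b(Put|SaveToFile|Write)\b', re.IGNORECASE)
RE_EXECUTE = re.compile(r'\b(Shell|CreateObject)\b', re.IGNORECASE)


def scan_flags(vba_code):
    """Return (flags, suspicious) for a VBA source string.

    flags is a 3-character string with 'A', 'W', 'X' or '-' in each
    position, in the same order as MacroRaptor.get_flags().
    """
    autoexec = RE_AUTOEXEC.search(vba_code) is not None
    write = RE_WRITE.search(vba_code) is not None
    execute = RE_EXECUTE.search(vba_code) is not None
    flags = (('A' if autoexec else '-') +
             ('W' if write else '-') +
             ('X' if execute else '-'))
    # same rule as scan(): auto-exec AND (execute OR write)
    suspicious = autoexec and (execute or write)
    return flags, suspicious


print(scan_flags('Sub AutoOpen()\n Shell "calc.exe"\nEnd Sub'))  # ('A-X', True)
print(scan_flags('Sub Helper()\n MsgBox "hi"\nEnd Sub'))        # ('---', False)
```

An auto-exec macro alone (flags `A--`) is not flagged, which is why mraptor reports such files as "Macro OK" rather than "SUSPICIOUS".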
oletools/mraptor_milter.py
... ... @@ -98,18 +98,7 @@ from oletools import olevba, mraptor
98 98  
99 99 from Milter.utils import parse_addr
100 100  
101   -if sys.version_info[0] <= 2:
102   - # Python 2.x
103   - if sys.version_info[1] <= 6:
104   - # Python 2.6
105   - # use is_zipfile backported from Python 2.7:
106   - from oletools.thirdparty.zipfile27 import is_zipfile
107   - else:
108   - # Python 2.7
109   - from zipfile import is_zipfile
110   -else:
111   - # Python 3.x+
112   - from zipfile import is_zipfile
  101 +from zipfile import is_zipfile
113 102  
114 103  
115 104  
... ...
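The Python 2.6 branch could be dropped because, since Python 2.7, `zipfile.is_zipfile()` accepts a file-like object as well as a path, which is what the backported `zipfile27` copy provided. Here is a minimal sketch of checking in-memory attachment bytes this way. The helper name `looks_like_zip` is hypothetical and is not part of mraptor_milter.

```python
import io
import zipfile


def looks_like_zip(data):
    """Return True if the given bytes look like a zip archive.

    is_zipfile() accepts a file-like object (Python 2.7+ and 3.x), so the
    bytes can be wrapped in BytesIO instead of being written to disk.
    """
    return zipfile.is_zipfile(io.BytesIO(data))


# build a tiny zip archive in memory to demonstrate
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('word/document.xml', '<w:document/>')

print(looks_like_zip(buf.getvalue()))  # True
print(looks_like_zip(b'not a zip'))    # False
```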
oletools/msodde.py
... ... @@ -11,7 +11,6 @@ Supported formats:
11 11 - RTF
12 12 - CSV (exported from / imported into Excel)
13 13 - XML (exported from Word 2003, Word 2007+, Excel 2003, (Excel 2007+?)
14   -- raises an error if run with files encrypted using MS Crypto API RC4
15 14  
16 15 Author: Philippe Lagadec - http://www.decalage.info
17 16 License: BSD, see source code or documentation
... ... @@ -22,7 +21,7 @@ http://www.decalage.info/python/oletools
22 21  
23 22 # === LICENSE =================================================================
24 23  
25   -# msodde is copyright (c) 2017-2018 Philippe Lagadec (http://www.decalage.info)
  24 +# msodde is copyright (c) 2017-2019 Philippe Lagadec (http://www.decalage.info)
26 25 # All rights reserved.
27 26 #
28 27 # Redistribution and use in source and binary forms, with or without
... ... @@ -52,19 +51,30 @@ from __future__ import print_function
52 51  
53 52 import argparse
54 53 import os
55   -from os.path import abspath, dirname
56 54 import sys
57 55 import re
58 56 import csv
59 57  
60 58 import olefile
61 59  
  60 +# IMPORTANT: it should be possible to run oletools directly as scripts
  61 +# in any directory without installing them with pip or setup.py.
  62 +# In that case, relative imports are NOT usable.
  63 +# And to enable Python 2+3 compatibility, we need to use absolute imports,
  64 +# so we add the oletools parent folder to sys.path (absolute+normalized path):
  65 +_thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__)))
  66 +# print('_thismodule_dir = %r' % _thismodule_dir)
  67 +_parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
  68 +# print('_parent_dir = %r' % _parent_dir)
  69 +if _parent_dir not in sys.path:
  70 + sys.path.insert(0, _parent_dir)
  71 +
62 72 from oletools import ooxml
63 73 from oletools import xls_parser
64 74 from oletools import rtfobj
65   -from oletools import oleid
  75 +from oletools.ppt_record_parser import is_ppt
  76 +from oletools import crypto
66 77 from oletools.common.log_helper import log_helper
67   -from oletools.common.errors import FileIsEncryptedError
68 78  
69 79 # -----------------------------------------------------------------------------
70 80 # CHANGELOG:
... ... @@ -88,8 +98,11 @@ from oletools.common.errors import FileIsEncryptedError
88 98 # 2018-03-21 CH: - added detection for various CSV formulas (issue #259)
89 99 # 2018-09-11 v0.54 PL: - olefile is now a dependency
90 100 # 2018-10-25 CH: - detect encryption and raise error if detected
  101 +# 2019-03-25 CH: - added decryption of password-protected files
  102 +# 2019-07-17 v0.55 CH: - fixed issue #267, unicode error on Python 2
  103 +
91 104  
92   -__version__ = '0.54dev4'
  105 +__version__ = '0.55.dev3'
93 106  
94 107 # -----------------------------------------------------------------------------
95 108 # TODO: field codes can be in headers/footers/comments - parse these
... ... @@ -305,6 +318,9 @@ def process_args(cmd_line_args=None):
305 318 default=DEFAULT_LOG_LEVEL,
306 319 help="logging level debug/info/warning/error/critical "
307 320 "(default=%(default)s)")
  321 + parser.add_argument("-p", "--password", type=str, action='append',
  322 + help='if encrypted office files are encountered, try '
  323 + 'decryption with this password. May be repeated.')
308 324 filter_group = parser.add_argument_group(
309 325 title='Filter which OpenXML field commands are returned',
310 326 description='Only applies to OpenXML (e.g. docx) and rtf, not to OLE '
... ... @@ -348,14 +364,13 @@ def process_doc_field(data):
348 364 """ check if field instructions start with DDE
349 365  
350 366 expects unicode input, returns unicode output (empty if not dde) """
351   - logger.debug('processing field {0}'.format(data))
  367 + logger.debug(u'processing field {0}'.format(data))
352 368  
353 369 if data.lstrip().lower().startswith(u'dde'):
354 370 return data
355   - elif data.lstrip().lower().startswith(u'\x00d\x00d\x00e\x00'):
  371 + if data.lstrip().lower().startswith(u'\x00d\x00d\x00e\x00'):
356 372 return data
357   - else:
358   - return u''
  373 + return u''
359 374  
360 375  
361 376 OLE_FIELD_START = 0x13
... ... @@ -379,7 +394,7 @@ def process_doc_stream(stream):
379 394 while True:
380 395 idx += 1
381 396 char = stream.read(1) # loop over every single byte
382   - if len(char) == 0:
  397 + if len(char) == 0: # pylint: disable=len-as-condition
383 398 break
384 399 else:
385 400 char = ord(char)
... ... @@ -417,7 +432,7 @@ def process_doc_stream(stream):
417 432 pass
418 433 elif len(field_contents) > OLE_FIELD_MAX_SIZE:
419 434 logger.debug('field exceeds max size of {0}. Ignore rest'
420   - .format(OLE_FIELD_MAX_SIZE))
  435 + .format(OLE_FIELD_MAX_SIZE))
421 436 max_size_exceeded = True
422 437  
423 438 # appending a raw byte to a unicode string here. Not clean but
... ... @@ -437,7 +452,7 @@ def process_doc_stream(stream):
437 452 logger.debug('big field was not a field after all')
438 453  
439 454 logger.debug('Checked {0} characters, found {1} fields'
440   - .format(idx, len(result_parts)))
  455 + .format(idx, len(result_parts)))
441 456  
442 457 return result_parts
443 458  
... ... @@ -462,11 +477,10 @@ def process_doc(ole):
462 477 direntry = ole._load_direntry(sid)
463 478 is_stream = direntry.entry_type == olefile.STGTY_STREAM
464 479 logger.debug('direntry {:2d} {}: {}'
465   - .format(sid, '[orphan]' if is_orphan else direntry.name,
466   - 'is stream of size {}'.format(direntry.size)
467   - if is_stream else
468   - 'no stream ({})'
469   - .format(direntry.entry_type)))
  480 + .format(sid, '[orphan]' if is_orphan else direntry.name,
  481 + 'is stream of size {}'.format(direntry.size)
  482 + if is_stream else
  483 + 'no stream ({})'.format(direntry.entry_type)))
470 484 if is_stream:
471 485 new_parts = process_doc_stream(
472 486 ole._open(direntry.isectStart, direntry.size))
... ... @@ -480,17 +494,23 @@ def process_xls(filepath):
480 494 """ find dde links in excel ole file """
481 495  
482 496 result = []
483   - for stream in xls_parser.XlsFile(filepath).iter_streams():
484   - if not isinstance(stream, xls_parser.WorkbookStream):
485   - continue
486   - for record in stream.iter_records():
487   - if not isinstance(record, xls_parser.XlsRecordSupBook):
  497 + xls_file = None
  498 + try:
  499 + xls_file = xls_parser.XlsFile(filepath)
  500 + for stream in xls_file.iter_streams():
  501 + if not isinstance(stream, xls_parser.WorkbookStream):
488 502 continue
489   - if record.support_link_type in (
490   - xls_parser.XlsRecordSupBook.LINK_TYPE_OLE_DDE,
491   - xls_parser.XlsRecordSupBook.LINK_TYPE_EXTERNAL):
492   - result.append(record.virt_path.replace(u'\u0003', u' '))
493   - return u'\n'.join(result)
  503 + for record in stream.iter_records():
  504 + if not isinstance(record, xls_parser.XlsRecordSupBook):
  505 + continue
  506 + if record.support_link_type in (
  507 + xls_parser.XlsRecordSupBook.LINK_TYPE_OLE_DDE,
  508 + xls_parser.XlsRecordSupBook.LINK_TYPE_EXTERNAL):
  509 + result.append(record.virt_path.replace(u'\u0003', u' '))
  510 + return u'\n'.join(result)
  511 + finally:
  512 + if xls_file is not None:
  513 + xls_file.close()
494 514  
495 515  
496 516 def process_docx(filepath, field_filter_mode=None):
... ... @@ -525,7 +545,8 @@ def process_docx(filepath, field_filter_mode=None):
525 545 else:
526 546 elem = curr_elem
527 547 if elem is None:
528   - raise BadOOXML(filepath, 'Got "None"-Element from iter_xml')
  548 + raise ooxml.BadOOXML(filepath,
  549 + 'Got "None"-Element from iter_xml')
529 550  
530 551 # check if FLDCHARTYPE and whether "begin" or "end" tag
531 552 attrib_type = elem.attrib.get(ATTR_W_FLDCHARTYPE[0]) or \
... ... @@ -535,7 +556,7 @@ def process_docx(filepath, field_filter_mode=None):
535 556 level += 1
536 557 if attrib_type == "end":
537 558 level -= 1
538   - if level == 0 or level == -1: # edge-case; level gets -1
  559 + if level in (0, -1): # edge-case; level gets -1
539 560 all_fields.append(ddetext)
540 561 ddetext = u''
541 562 level = 0 # reset edge-case
... ... @@ -564,6 +585,7 @@ def process_docx(filepath, field_filter_mode=None):
564 585  
565 586  
566 587 def unquote(field):
  588 + """TODO: document what exactly is happening here..."""
567 589 if "QUOTE" not in field or NO_QUOTES:
568 590 return field
569 591 # split into components
... ... @@ -605,8 +627,8 @@ def field_is_blacklisted(contents):
605 627 index = FIELD_BLACKLIST_CMDS.index(words[0].lower())
606 628 except ValueError: # first word is no blacklisted command
607 629 return False
608   - logger.debug('trying to match "{0}" to blacklist command {1}'
609   - .format(contents, FIELD_BLACKLIST[index]))
  630 + logger.debug(u'trying to match "{0}" to blacklist command {1}'
  631 + .format(contents, FIELD_BLACKLIST[index]))
610 632 _, nargs_required, nargs_optional, sw_with_arg, sw_solo, sw_format \
611 633 = FIELD_BLACKLIST[index]
612 634  
... ... @@ -617,12 +639,13 @@ def field_is_blacklisted(contents):
617 639 break
618 640 nargs += 1
619 641 if nargs < nargs_required:
620   - logger.debug('too few args: found {0}, but need at least {1} in "{2}"'
621   - .format(nargs, nargs_required, contents))
  642 + logger.debug(u'too few args: found {0}, but need at least {1} in "{2}"'
  643 + .format(nargs, nargs_required, contents))
622 644 return False
623   - elif nargs > nargs_required + nargs_optional:
624   - logger.debug('too many args: found {0}, but need at most {1}+{2} in "{3}"'
625   - .format(nargs, nargs_required, nargs_optional, contents))
  645 + if nargs > nargs_required + nargs_optional:
  646 + logger.debug(u'too many args: found {0}, but need at most {1}+{2} in '
  647 + u'"{3}"'
  648 + .format(nargs, nargs_required, nargs_optional, contents))
626 649 return False
627 650  
628 651 # check switches
... ... @@ -631,15 +654,15 @@ def field_is_blacklisted(contents):
631 654 for word in words[1+nargs:]:
632 655 if expect_arg: # this is an argument for the last switch
633 656 if arg_choices and (word not in arg_choices):
634   - logger.debug('Found invalid switch argument "{0}" in "{1}"'
635   - .format(word, contents))
  657 + logger.debug(u'Found invalid switch argument "{0}" in "{1}"'
  658 + .format(word, contents))
636 659 return False
637 660 expect_arg = False
638 661 arg_choices = [] # in general, do not enforce choices
639 662 continue # "no further questions, your honor"
640 663 elif not FIELD_SWITCH_REGEX.match(word):
641   - logger.debug('expected switch, found "{0}" in "{1}"'
642   - .format(word, contents))
  664 + logger.debug(u'expected switch, found "{0}" in "{1}"'
  665 + .format(word, contents))
643 666 return False
644 667 # we want a switch and we got a valid one
645 668 switch = word[1]
... ... @@ -660,8 +683,8 @@ def field_is_blacklisted(contents):
660 683 if 'numeric' in sw_format:
661 684 arg_choices = [] # too many choices to list them here
662 685 else:
663   - logger.debug('unexpected switch {0} in "{1}"'
664   - .format(switch, contents))
  686 + logger.debug(u'unexpected switch {0} in "{1}"'
  687 + .format(switch, contents))
665 688 return False
666 689  
667 690 # if nothing went wrong so far, the contents seem to match the blacklist
... ... @@ -676,7 +699,7 @@ def process_xlsx(filepath):
676 699 tag = elem.tag.lower()
677 700 if tag == 'ddelink' or tag.endswith('}ddelink'):
678 701 # we have found a dde link. Try to get more info about it
679   - link_info = ['DDE-Link']
  702 + link_info = []
680 703 if 'ddeService' in elem.attrib:
681 704 link_info.append(elem.attrib['ddeService'])
682 705 if 'ddeTopic' in elem.attrib:
... ... @@ -687,16 +710,15 @@ def process_xlsx(filepath):
687 710 for subfile, content_type, handle in parser.iter_non_xml():
688 711 try:
689 712 logger.info('Parsing non-xml subfile {0} with content type {1}'
690   - .format(subfile, content_type))
  713 + .format(subfile, content_type))
691 714 for record in xls_parser.parse_xlsb_part(handle, content_type,
692 715 subfile):
693 716 logger.debug('{0}: {1}'.format(subfile, record))
694 717 if isinstance(record, xls_parser.XlsbBeginSupBook) and \
695 718 record.link_type == \
696 719 xls_parser.XlsbBeginSupBook.LINK_TYPE_DDE:
697   - dde_links.append('DDE-Link ' + record.string1 + ' ' +
698   - record.string2)
699   - except Exception:
  720 + dde_links.append(record.string1 + ' ' + record.string2)
  721 + except Exception as exc:
700 722 if content_type.startswith('application/vnd.ms-excel.') or \
701 723 content_type.startswith('application/vnd.ms-office.'): # pylint: disable=bad-indentation
702 724 # should really be able to parse these either as xml or records
... ... @@ -727,7 +749,8 @@ class RtfFieldParser(rtfobj.RtfParser):
727 749  
728 750 def open_destination(self, destination):
729 751 if destination.cword == b'fldinst':
730   - logger.debug('*** Start field data at index %Xh' % destination.start)
  752 + logger.debug('*** Start field data at index %Xh'
  753 + % destination.start)
731 754  
732 755 def close_destination(self, destination):
733 756 if destination.cword == b'fldinst':
... ... @@ -758,7 +781,7 @@ def process_rtf(file_handle, field_filter_mode=None):
758 781 all_fields = [field.decode('ascii') for field in rtfparser.fields]
759 782 # apply field command filter
760 783 logger.debug('found {1} fields, filtering with mode "{0}"'
761   - .format(field_filter_mode, len(all_fields)))
  784 + .format(field_filter_mode, len(all_fields)))
762 785 if field_filter_mode in (FIELD_FILTER_ALL, None):
763 786 clean_fields = all_fields
764 787 elif field_filter_mode == FIELD_FILTER_DDE:
... ... @@ -815,11 +838,12 @@ def process_csv(filepath):
815 838 results, _ = process_csv_dialect(file_handle, delim)
816 839 except csv.Error: # e.g. sniffing fails
817 840 logger.debug('failed to csv-parse with delimiter {0!r}'
818   - .format(delim))
  841 + .format(delim))
819 842  
820 843 if is_small and not results:
821 844 # try whole file as single cell, since sniffing fails in this case
822   - logger.debug('last attempt: take whole file as single unquoted cell')
  845 + logger.debug('last attempt: take whole file as single unquoted '
  846 + 'cell')
823 847 file_handle.seek(0)
824 848 match = CSV_DDE_FORMAT.match(file_handle.read(CSV_SMALL_THRESH))
825 849 if match:
... ... @@ -836,8 +860,8 @@ def process_csv_dialect(file_handle, delimiters):
836 860 delimiters=delimiters)
837 861 dialect.strict = False # microsoft is never strict
838 862 logger.debug('sniffed csv dialect with delimiter {0!r} '
839   - 'and quote char {1!r}'
840   - .format(dialect.delimiter, dialect.quotechar))
  863 + 'and quote char {1!r}'
  864 + .format(dialect.delimiter, dialect.quotechar))
841 865  
842 866 # rewind file handle to start
843 867 file_handle.seek(0)
... ... @@ -877,7 +901,7 @@ def process_excel_xml(filepath):
877 901 break
878 902 if formula is None:
879 903 continue
880   - logger.debug('found cell with formula {0}'.format(formula))
  904 + logger.debug(u'found cell with formula {0}'.format(formula))
881 905 match = re.match(XML_DDE_FORMAT, formula)
882 906 if match:
883 907 dde_links.append(u' '.join(match.groups()[:2]))
... ... @@ -891,19 +915,11 @@ def process_file(filepath, field_filter_mode=None):
891 915 if xls_parser.is_xls(filepath):
892 916 logger.debug('Process file as excel 2003 (xls)')
893 917 return process_xls(filepath)
894   -
895   - # encrypted files also look like ole, even if office 2007+ (xml-based)
896   - # so check for encryption, first
897   - ole = olefile.OleFileIO(filepath, path_encoding=None)
898   - oid = oleid.OleID(ole)
899   - if oid.check_encrypted().value:
900   - log.debug('is encrypted - raise error')
901   - raise FileIsEncryptedError(filepath)
902   - elif oid.check_powerpoint().value:
903   - log.debug('is ppt - cannot have DDE')
  918 + if is_ppt(filepath):
  919 + logger.debug('is ppt - cannot have DDE')
904 920 return u''
905   - else:
906   - logger.debug('Process file as word 2003 (doc)')
  921 + logger.debug('Process file as word 2003 (doc)')
  922 + with olefile.OleFileIO(filepath, path_encoding=None) as ole:
907 923 return process_doc(ole)
908 924  
909 925 with open(filepath, 'rb') as file_handle:
... ... @@ -921,22 +937,77 @@ def process_file(filepath, field_filter_mode=None):
921 937 if doctype == ooxml.DOCTYPE_EXCEL:
922 938 logger.debug('Process file as excel 2007+ (xlsx)')
923 939 return process_xlsx(filepath)
924   - elif doctype in (ooxml.DOCTYPE_EXCEL_XML, ooxml.DOCTYPE_EXCEL_XML2003):
  940 + if doctype in (ooxml.DOCTYPE_EXCEL_XML, ooxml.DOCTYPE_EXCEL_XML2003):
925 941 logger.debug('Process file as xml from excel 2003/2007+')
926 942 return process_excel_xml(filepath)
927   - elif doctype in (ooxml.DOCTYPE_WORD_XML, ooxml.DOCTYPE_WORD_XML2003):
  943 + if doctype in (ooxml.DOCTYPE_WORD_XML, ooxml.DOCTYPE_WORD_XML2003):
928 944 logger.debug('Process file as xml from word 2003/2007+')
929 945 return process_docx(filepath)
930   - elif doctype is None:
  946 + if doctype is None:
931 947 logger.debug('Process file as csv')
932 948 return process_csv(filepath)
933   - else: # could be docx; if not: this is the old default code path
934   - logger.debug('Process file as word 2007+ (docx)')
935   - return process_docx(filepath, field_filter_mode)
  949 + # could be docx; if not: this is the old default code path
  950 + logger.debug('Process file as word 2007+ (docx)')
  951 + return process_docx(filepath, field_filter_mode)
936 952  
937 953  
938 954 # === MAIN =================================================================
939 955  
  956 +
  957 +def process_maybe_encrypted(filepath, passwords=None, crypto_nesting=0,
  958 + **kwargs):
  959 + """
  960 + Process a file that might be encrypted.
  961 +
  962 + Calls :py:func:`process_file` and if that fails tries to decrypt and
  963 + process the result. Based on recommendation in module doc string of
  964 + :py:mod:`oletools.crypto`.
  965 +
  966 + :param str filepath: path to file on disc.
  967 + :param passwords: list of passwords (str) to try for decryption or None
  968 + :param int crypto_nesting: How many decryption layers were already used to
  969 + get the given file.
  970 + :param kwargs: same as :py:func:`process_file`
  971 + :returns: same as :py:func:`process_file`
  972 + """
  973 + result = u''
  974 + try:
  975 + result = process_file(filepath, **kwargs)
  976 + if not crypto.is_encrypted(filepath):
  977 + return result
  978 + except Exception:
  979 + logger.debug('Ignoring exception:', exc_info=True)
  980 + if not crypto.is_encrypted(filepath):
  981 + raise
  982 +
  983 + # we reach this point only if file is encrypted
  984 + # check if this is an encrypted file in an encrypted file in an ...
  985 + if crypto_nesting >= crypto.MAX_NESTING_DEPTH:
  986 + raise crypto.MaxCryptoNestingReached(crypto_nesting, filepath)
  987 +
  988 + decrypted_file = None
  989 + if passwords is None:
  990 + passwords = crypto.DEFAULT_PASSWORDS
  991 + else:
  992 + passwords = list(passwords) + crypto.DEFAULT_PASSWORDS
  993 + try:
  994 + logger.debug('Trying to decrypt file')
  995 + decrypted_file = crypto.decrypt(filepath, passwords)
  996 + if not decrypted_file:
  997 + logger.error('Decrypt failed, run with debug output to get details')
  998 + raise crypto.WrongEncryptionPassword(filepath)
  999 + logger.info('Analyze decrypted file')
  1000 + result = process_maybe_encrypted(decrypted_file, passwords,
  1001 + crypto_nesting+1, **kwargs)
  1002 + finally: # clean up
  1003 + try: # (maybe file was not yet created)
  1004 + os.unlink(decrypted_file)
  1005 + except Exception:
  1006 + logger.debug('Ignoring exception closing decrypted file:',
  1007 + exc_info=True)
  1008 + return result
  1009 +
  1010 +
940 1011 def main(cmd_line_args=None):
941 1012 """ Main function, called if this file is called as a script
942 1013  
... ... @@ -961,13 +1032,16 @@ def main(cmd_line_args=None):
961 1032 text = ''
962 1033 return_code = 1
963 1034 try:
964   - text = process_file(args.filepath, args.field_filter_mode)
  1035 + text = process_maybe_encrypted(
  1036 + args.filepath, args.password,
  1037 + field_filter_mode=args.field_filter_mode)
965 1038 return_code = 0
966 1039 except Exception as exc:
967   - logger.exception(exc.message)
  1040 + logger.exception(str(exc))
968 1041  
969 1042 logger.print_str('DDE Links:')
970   - logger.print_str(text)
  1043 + for link in text.splitlines():
  1044 + logger.print_str(link, type='dde-link')
971 1045  
972 1046 log_helper.end_logging()
973 1047  
... ...
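`process_maybe_encrypted()`, added to msodde above, follows a try-then-decrypt pattern. It processes the file normally and, only if the file is encrypted, decrypts it with the user passwords plus a default list. It then recurses on the decrypted copy with a nesting counter and always deletes the temporary file. The sketch below reproduces that control flow with injected callables so it runs without oletools. `is_encrypted`, `decrypt` and `process` are hypothetical stand-ins for `crypto.is_encrypted`, `crypto.decrypt` and `process_file`, the toy "encryption" format is invented for the demo, and both constants are assumed values.

```python
import os
import tempfile

MAX_NESTING_DEPTH = 10                   # assumed value (stand-in for crypto.MAX_NESTING_DEPTH)
DEFAULT_PASSWORDS = ['VelvetSweatshop']  # assumed default list (Excel's well-known default)


def process_maybe_encrypted(path, process, is_encrypted, decrypt,
                            passwords=None, nesting=0):
    """Process `path`; if it is encrypted, decrypt it and recurse."""
    try:
        result = process(path)
        if not is_encrypted(path):
            return result
    except Exception:
        if not is_encrypted(path):
            raise  # a genuine error, not caused by encryption
    if nesting >= MAX_NESTING_DEPTH:
        raise RuntimeError('too many nested encryption layers')
    passwords = list(passwords or []) + DEFAULT_PASSWORDS
    decrypted = None
    try:
        decrypted = decrypt(path, passwords)
        if not decrypted:
            raise ValueError('no password worked for %s' % path)
        return process_maybe_encrypted(decrypted, process, is_encrypted,
                                       decrypt, passwords, nesting + 1)
    finally:
        if decrypted:
            os.unlink(decrypted)  # always remove the temporary plaintext


# Toy format (hypothetical): a file is "encrypted" if it starts with b'ENC:<pwd>:'
def toy_is_encrypted(path):
    with open(path, 'rb') as f:
        return f.read().startswith(b'ENC:')


def toy_decrypt(path, passwords):
    with open(path, 'rb') as f:
        _, pwd, payload = f.read().split(b':', 2)
    if pwd.decode() not in passwords:
        return None
    fd, out = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as f:
        f.write(payload)
    return out


def toy_process(path):
    with open(path, 'rb') as f:
        data = f.read()
    if data.startswith(b'ENC:'):
        raise ValueError('cannot parse encrypted data')
    return data.decode()


fd, src = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'ENC:a:ENC:b:hello')  # two nested encryption layers
print(process_maybe_encrypted(src, toy_process, toy_is_encrypted,
                              toy_decrypt, passwords=['a', 'b']))  # hello
os.unlink(src)
```

Passing the callables in keeps the retry, nesting and cleanup logic testable on its own. The real function calls the oletools modules directly.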
oletools/olebrowse.py
... ... @@ -12,7 +12,7 @@ olebrowse project website: http://www.decalage.info/python/olebrowse
12 12 olebrowse is part of the python-oletools package:
13 13 http://www.decalage.info/python/oletools
14 14  
15   -olebrowse is copyright (c) 2012-2017, Philippe Lagadec (http://www.decalage.info)
  15 +olebrowse is copyright (c) 2012-2019, Philippe Lagadec (http://www.decalage.info)
16 16 All rights reserved.
17 17  
18 18 Redistribution and use in source and binary forms, with or without modification,
... ... @@ -43,7 +43,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
43 43 # 2017-04-26 v0.51 PL: - fixed absolute imports (issue #141)
44 44 # 2018-09-11 v0.54 PL: - olefile is now a dependency
45 45  
46   -__version__ = '0.54dev1'
  46 +__version__ = '0.54'
47 47  
48 48 #------------------------------------------------------------------------------
49 49 # TODO:
... ...
oletools/oledir.py
... ... @@ -14,7 +14,7 @@ http://www.decalage.info/python/oletools
14 14  
15 15 #=== LICENSE ==================================================================
16 16  
17   -# oledir is copyright (c) 2015-2018 Philippe Lagadec (http://www.decalage.info)
  17 +# oledir is copyright (c) 2015-2019 Philippe Lagadec (http://www.decalage.info)
18 18 # All rights reserved.
19 19 #
20 20 # Redistribution and use in source and binary forms, with or without modification,
... ... @@ -53,7 +53,7 @@ from __future__ import print_function
53 53 # 2018-08-28 v0.54 PL: - olefile is now a dependency
54 54 # 2018-10-06 - colorclass is now a dependency
55 55  
56   -__version__ = '0.54dev1'
  56 +__version__ = '0.54'
57 57  
58 58 #------------------------------------------------------------------------------
59 59 # TODO:
... ...
oletools/oleform.py
1 1 #!/usr/bin/env python
2 2  
  3 +# REFERENCES:
  4 +# - MS-OFORMS: https://msdn.microsoft.com/en-us/library/office/cc313125%28v=office.12%29.aspx?f=255&MSPPError=-2147217396
  5 +
3 6 # CHANGELOG:
4 7 # 2018-02-19 v0.53 PL: - fixed issue #260, removed long integer literals
5 8  
... ...
oletools/oleid.py
... ... @@ -17,7 +17,7 @@ http://www.decalage.info/python/oletools
17 17  
18 18 #=== LICENSE =================================================================
19 19  
20   -# oleid is copyright (c) 2012-2018, Philippe Lagadec (http://www.decalage.info)
  20 +# oleid is copyright (c) 2012-2019, Philippe Lagadec (http://www.decalage.info)
21 21 # All rights reserved.
22 22 #
23 23 # Redistribution and use in source and binary forms, with or without
... ... @@ -59,7 +59,7 @@ from __future__ import print_function
59 59 # 2018-10-19 CH: - accept olefile as well as filename, return Indicators,
60 60 # improve encryption detection for ppt
61 61  
62   -__version__ = '0.54dev4'
  62 +__version__ = '0.54'
63 63  
64 64  
65 65 #------------------------------------------------------------------------------
... ... @@ -80,22 +80,26 @@ __version__ = '0.54dev4'
80 80  
81 81 #=== IMPORTS =================================================================
82 82  
83   -import argparse, sys, re, zlib, struct
  83 +import argparse, sys, re, zlib, struct, os
84 84 from os.path import dirname, abspath
85 85  
86   -# little hack to allow absolute imports even if oletools is not installed
87   -# (required to run oletools directly as scripts in any directory).
88   -try:
89   - from oletools.thirdparty import prettytable
90   -except ImportError:
91   - PARENT_DIR = dirname(dirname(abspath(__file__)))
92   - if PARENT_DIR not in sys.path:
93   - sys.path.insert(0, PARENT_DIR)
94   - del PARENT_DIR
95   - from oletools.thirdparty import prettytable
96   -
97 86 import olefile
98 87  
  88 +# IMPORTANT: it should be possible to run oletools directly as scripts
  89 +# in any directory without installing them with pip or setup.py.
  90 +# In that case, relative imports are NOT usable.
  91 +# And to enable Python 2+3 compatibility, we need to use absolute imports,
  92 +# so we add the oletools parent folder to sys.path (absolute+normalized path):
  93 +_thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__)))
  94 +# print('_thismodule_dir = %r' % _thismodule_dir)
  95 +_parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
  96 +# print('_parent_dir = %r' % _parent_dir)
  97 +if _parent_dir not in sys.path:
  98 + sys.path.insert(0, _parent_dir)
  99 +
  100 +from oletools.thirdparty.prettytable import prettytable
  101 +from oletools import crypto
  102 +
99 103  
100 104  
101 105 #=== FUNCTIONS ===============================================================
... ... @@ -279,20 +283,7 @@ class OleID(object):
279 283 self.indicators.append(encrypted)
280 284 if not self.ole:
281 285 return None
282   - # check if bit 1 of security field = 1:
283   - # (this field may be missing for Powerpoint2000, for example)
284   - if self.suminfo_data is None:
285   - self.check_properties()
286   - if 0x13 in self.suminfo_data:
287   - if self.suminfo_data[0x13] & 1:
288   - encrypted.value = True
289   - # check if this is an OpenXML encrypted file
290   - elif self.ole.exists('EncryptionInfo'):
291   - encrypted.value = True
292   - # or an encrypted ppt file
293   - if self.ole.exists('EncryptedSummary') and \
294   - not self.ole.exists('SummaryInformation'):
295   - encrypted.value = True
  286 + encrypted.value = crypto.is_encrypted(self.ole)
296 287 return encrypted
297 288  
298 289 def check_word(self):
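This hunk replaces the inline encryption heuristics with a call to crypto.is_encrypted, whose implementation is not part of this excerpt. For reference, a sketch of what the removed lines checked, rewritten as a standalone function. looks_encrypted and FakeOle are hypothetical names, and like the removed code, the sketch only consults the EncryptionInfo stream when the security property (0x13) is absent:

```python
def looks_encrypted(ole, suminfo=None):
    """Heuristics removed from OleID.check_encrypted (sketch).

    ole: any object with an exists(stream_name) method, e.g. olefile.OleFileIO
    suminfo: dict of SummaryInformation properties (id -> value), or None
    """
    security = (suminfo or {}).get(0x13)
    if security is not None:
        # bit 1 of the security property means "password protected"
        if security & 1:
            return True
    elif ole.exists('EncryptionInfo'):
        # OpenXML file encrypted and stored inside an OLE container
        return True
    # encrypted PowerPoint: EncryptedSummary replaces SummaryInformation
    if ole.exists('EncryptedSummary') and not ole.exists('SummaryInformation'):
        return True
    return False


class FakeOle(object):
    """Hypothetical stand-in for olefile.OleFileIO exposing only exists()."""

    def __init__(self, streams):
        self.streams = set(streams)

    def exists(self, name):
        return name in self.streams
```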
... ... @@ -316,27 +307,7 @@ class OleID(object):
316 307 return None, None
317 308 if self.ole.exists('WordDocument'):
318 309 word.value = True
319   - # check for Word-specific encryption flag:
320   - stream = None
321   - try:
322   - stream = self.ole.openstream(["WordDocument"])
323   - # pass header 10 bytes
324   - stream.read(10)
325   - # read flag structure:
326   - temp16 = struct.unpack("H", stream.read(2))[0]
327   - f_encrypted = (temp16 & 0x0100) >> 8
328   - if f_encrypted:
329   - # correct encrypted indicator if present or add one
330   - encrypt_ind = self.get_indicator('encrypted')
331   - if encrypt_ind:
332   - encrypt_ind.value = True
333   - else:
334   - self.indicators.append('encrypted', True, name='Encrypted')
335   - except Exception:
336   - raise
337   - finally:
338   - if stream is not None:
339   - stream.close()
  310 +
340 311 # check for VBA macros:
341 312 if self.ole.exists('Macros'):
342 313 macros.value = True
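The removed check_word code read the Word-specific fEncrypted flag from the start of the WordDocument stream: it skipped the first 10 bytes of the FIB and tested bit 0x0100 of the next 16-bit field. A sketch of the same test on raw bytes, assuming the caller has already read at least 12 bytes of the stream (word_fib_encrypted is a hypothetical helper). It uses an explicit little-endian '<H' format, whereas the removed code used native byte order ('H'):

```python
import struct


def word_fib_encrypted(word_document_header):
    """Return True if the fEncrypted bit is set in a WordDocument stream header."""
    if len(word_document_header) < 12:
        raise ValueError('need at least 12 bytes of the WordDocument stream')
    # skip wIdent, nFib, unused, lid, pnNext (5 x 16 bits = 10 bytes),
    # then read the 16-bit flags field that holds fEncrypted (0x0100)
    (flags,) = struct.unpack_from('<H', word_document_header, 10)
    return bool(flags & 0x0100)
```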
... ...
oletools/olemap.py
... ... @@ -13,7 +13,7 @@ http://www.decalage.info/python/oletools
13 13  
14 14 #=== LICENSE ==================================================================
15 15  
16   -# olemap is copyright (c) 2015-2018 Philippe Lagadec (http://www.decalage.info)
  16 +# olemap is copyright (c) 2015-2019 Philippe Lagadec (http://www.decalage.info)
17 17 # All rights reserved.
18 18 #
19 19 # Redistribution and use in source and binary forms, with or without modification,
... ... @@ -52,8 +52,9 @@ http://www.decalage.info/python/oletools
52 52 # 2017-03-23 PL: - only display the header by default
53 53 # - added option --exdata to display extra data in hex
54 54 # 2018-08-28 v0.54 PL: - olefile is now a dependency
  55 +# 2019-07-10 v0.55 PL: - fixed display of OLE header CLSID (issue #394)
55 56  
56   -__version__ = '0.54dev1'
  57 +__version__ = '0.55.dev3'
57 58  
58 59 #------------------------------------------------------------------------------
59 60 # TODO:
... ... @@ -121,7 +122,7 @@ def show_header(ole, extra_data=False):
121 122 print("OLE HEADER:")
122 123 t = tablestream.TableStream([24, 16, 79-(4+24+16)], header_row=['Attribute', 'Value', 'Description'])
123 124 t.write_row(['OLE Signature (hex)', binascii.b2a_hex(ole.header_signature).upper(), 'Should be D0CF11E0A1B11AE1'])
124   - t.write_row(['Header CLSID (hex)', binascii.b2a_hex(ole.header_clsid).upper(), 'Should be 0'])
  125 + t.write_row(['Header CLSID', ole.header_clsid, 'Should be empty (0)'])
125 126 t.write_row(['Minor Version', '%04X' % ole.minor_version, 'Should be 003E'])
126 127 t.write_row(['Major Version', '%04X' % ole.dll_version, 'Should be 3 or 4'])
127 128 t.write_row(['Byte Order', '%04X' % ole.byte_order, 'Should be FFFE (little endian)'])
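The CLSID row now prints ole.header_clsid directly with the hint 'Should be empty (0)', which suggests olefile already returns this field as a formatted string rather than 16 raw bytes. Assuming the usual Windows GUID layout (first three fields little-endian), that formatting can be sketched with the standard uuid module; format_clsid is a hypothetical helper, not olefile's API:

```python
import uuid


def format_clsid(raw):
    """Format a 16-byte CLSID as stored in an OLE header (sketch)."""
    if len(raw) != 16:
        raise ValueError('CLSID must be 16 bytes')
    # an all-zero CLSID is shown as an empty string
    if raw == b'\x00' * 16:
        return ''
    # bytes_le: Data1, Data2 and Data3 are stored little-endian
    return str(uuid.UUID(bytes_le=raw)).upper()
```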
... ...
oletools/olemeta.py
... ... @@ -15,7 +15,7 @@ http://www.decalage.info/python/oletools
15 15  
16 16 #=== LICENSE =================================================================
17 17  
18   -# olemeta is copyright (c) 2013-2018, Philippe Lagadec (http://www.decalage.info)
  18 +# olemeta is copyright (c) 2013-2019, Philippe Lagadec (http://www.decalage.info)
19 19 # All rights reserved.
20 20 #
21 21 # Redistribution and use in source and binary forms, with or without modification,
... ... @@ -51,7 +51,7 @@ http://www.decalage.info/python/oletools
51 51 # 2017-05-04 PL: - added optparse and xglob (issue #141)
52 52 # 2018-09-11 v0.54 PL: - olefile is now a dependency
53 53  
54   -__version__ = '0.54dev1'
  54 +__version__ = '0.54'
55 55  
56 56 #------------------------------------------------------------------------------
57 57 # TODO:
... ...
oletools/oleobj.py
... ... @@ -14,7 +14,7 @@ http://www.decalage.info/python/oletools
14 14  
15 15 # === LICENSE =================================================================
16 16  
17   -# oleobj is copyright (c) 2015-2018 Philippe Lagadec (http://www.decalage.info)
  17 +# oleobj is copyright (c) 2015-2019 Philippe Lagadec (http://www.decalage.info)
18 18 # All rights reserved.
19 19 #
20 20 # Redistribution and use in source and binary forms, with or without
... ... @@ -89,7 +89,7 @@ from oletools.ooxml import XmlParser
89 89 # 2018-09-11 v0.54 PL: - olefile is now a dependency
90 90 # 2018-10-30 SA: - added detection of external links (PR #317)
91 91  
92   -__version__ = '0.54dev4'
  92 +__version__ = '0.54'
93 93  
94 94 # -----------------------------------------------------------------------------
95 95 # TODO:
... ... @@ -526,29 +526,35 @@ def find_ole_in_ppt(filename):
526 526 can contain the actual embedded file we are looking for (caller will check
527 527 for these).
528 528 """
529   - for stream in PptFile(filename).iter_streams():
530   - for record_idx, record in enumerate(stream.iter_records()):
531   - if isinstance(record, PptRecordExOleVbaActiveXAtom):
532   - ole = None
533   - try:
534   - data_start = next(record.iter_uncompressed())
535   - if data_start[:len(olefile.MAGIC)] != olefile.MAGIC:
536   - continue # could be an ActiveX control or VBA Storage
537   -
538   - # otherwise, this should be an OLE object
539   - log.debug('Found record with embedded ole object in ppt '
540   - '(stream "{0}", record no {1})'
541   - .format(stream.name, record_idx))
542   - ole = record.get_data_as_olefile()
543   - yield ole
544   - except IOError:
545   - log.warning('Error reading data from {0} stream or '
546   - 'interpreting it as OLE object'
547   - .format(stream.name))
548   - log.debug('', exc_info=True)
549   - finally:
550   - if ole is not None:
551   - ole.close()
  529 + ppt_file = None
  530 + try:
  531 + ppt_file = PptFile(filename)
  532 + for stream in ppt_file.iter_streams():
  533 + for record_idx, record in enumerate(stream.iter_records()):
  534 + if isinstance(record, PptRecordExOleVbaActiveXAtom):
  535 + ole = None
  536 + try:
  537 + data_start = next(record.iter_uncompressed())
  538 + if data_start[:len(olefile.MAGIC)] != olefile.MAGIC:
  539 + continue # could be ActiveX control / VBA Storage
  540 +
  541 + # otherwise, this should be an OLE object
  542 + log.debug('Found record with embedded ole object in '
  543 + 'ppt (stream "{0}", record no {1})'
  544 + .format(stream.name, record_idx))
  545 + ole = record.get_data_as_olefile()
  546 + yield ole
  547 + except IOError:
  548 + log.warning('Error reading data from {0} stream or '
  549 + 'interpreting it as OLE object'
  550 + .format(stream.name))
  551 + log.debug('', exc_info=True)
  552 + finally:
  553 + if ole is not None:
  554 + ole.close()
  555 + finally:
  556 + if ppt_file is not None:
  557 + ppt_file.close()
552 558  
553 559  
554 560 class FakeFile(io.RawIOBase):
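find_ole_in_ppt is a generator, so wrapping the PptFile in try/finally ensures the file is closed when iteration ends normally, when an exception escapes, and when the caller stops early (closing a suspended generator raises GeneratorExit at the yield, which runs the finally clause). A minimal sketch of that pattern; Resource and iter_from_resource are hypothetical stand-ins for PptFile and find_ole_in_ppt:

```python
class Resource(object):
    """Hypothetical stand-in for PptFile: records whether close() was called."""

    def __init__(self, items):
        self.items = list(items)
        self.closed = False

    def iter_items(self):
        return iter(self.items)

    def close(self):
        self.closed = True


def iter_from_resource(resource_factory):
    """Yield items from a resource, always closing it afterwards."""
    resource = None
    try:
        resource = resource_factory()
        for item in resource.iter_items():
            yield item
    finally:
        # runs on exhaustion, on exceptions, and on generator.close()
        if resource is not None:
            resource.close()
```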
... ... @@ -750,13 +756,13 @@ def process_file(filename, data, output_dir=None):
750 756  
751 757 xml_parser = None
752 758 if is_zipfile(filename):
753   - log.info('file is a OOXML file, looking for relationships with external links')
  759 + log.info('file could be an OOXML file, looking for relationships with '
  760 + 'external links')
754 761 xml_parser = XmlParser(filename)
755 762 for relationship, target in find_external_relationships(xml_parser):
756 763 did_dump = True
757 764 print("Found relationship '%s' with external link %s" % (relationship, target))
758 765  
759   -
760 766 # look for ole files inside file (e.g. unzip docx)
761 767 # have to finish work on every ole stream inside iteration, since handles
762 768 # are closed in find_ole
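The log message now says the file "could be" an OOXML file because is_zipfile only proves the input is a ZIP archive. A stricter check could look for the [Content_Types].xml part that OPC packages such as .docx contain. A sketch using only the standard zipfile module; looks_like_ooxml is hypothetical and is not what oleobj does:

```python
import zipfile


def looks_like_ooxml(path):
    """Return True if path is a ZIP archive containing [Content_Types].xml."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as zf:
        return '[Content_Types].xml' in zf.namelist()
```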
... ... @@ -765,9 +771,9 @@ def process_file(filename, data, output_dir=None):
765 771 continue
766 772  
767 773 for path_parts in ole.listdir():
  774 + stream_path = '/'.join(path_parts)
  775 + log.debug('Checking stream %r', stream_path)
768 776 if path_parts[-1] == '\x01Ole10Native':
769   - stream_path = '/'.join(path_parts)
770   - log.debug('Checking stream %r', stream_path)
771 777 stream = None
772 778 try:
773 779 stream = ole.openstream(path_parts)
... ...
oletools/oletimes.py
... ... @@ -16,7 +16,7 @@ http://www.decalage.info/python/oletools
16 16  
17 17 #=== LICENSE =================================================================
18 18  
19   -# oletimes is copyright (c) 2013-2017, Philippe Lagadec (http://www.decalage.info)
  19 +# oletimes is copyright (c) 2013-2019, Philippe Lagadec (http://www.decalage.info)
20 20 # All rights reserved.
21 21 #
22 22 # Redistribution and use in source and binary forms, with or without modification,
... ... @@ -52,7 +52,7 @@ http://www.decalage.info/python/oletools
52 52 # 2017-05-04 PL: - added optparse and xglob (issue #141)
53 53 # 2018-09-11 v0.54 PL: - olefile is now a dependency
54 54  
55   -__version__ = '0.54dev1'
  55 +__version__ = '0.54'
56 56  
57 57 #------------------------------------------------------------------------------
58 58 # TODO:
... ...
oletools/olevba.py
... ... @@ -7,14 +7,14 @@ olevba is a script to parse OLE and OpenXML files such as MS Office documents
7 7 and analyze malicious macros.
8 8  
9 9 Supported formats:
10   -- Word 97-2003 (.doc, .dot), Word 2007+ (.docm, .dotm)
11   -- Excel 97-2003 (.xls), Excel 2007+ (.xlsm, .xlsb)
12   -- PowerPoint 97-2003 (.ppt), PowerPoint 2007+ (.pptm, .ppsm)
13   -- Word/PowerPoint 2007+ XML (aka Flat OPC)
14   -- Word 2003 XML (.xml)
15   -- Word/Excel Single File Web Page / MHTML (.mht)
16   -- Publisher (.pub)
17   -- raises an error if run with files encrypted using MS Crypto API RC4
  10 + - Word 97-2003 (.doc, .dot), Word 2007+ (.docm, .dotm)
  11 + - Excel 97-2003 (.xls), Excel 2007+ (.xlsm, .xlsb)
  12 + - PowerPoint 97-2003 (.ppt), PowerPoint 2007+ (.pptm, .ppsm)
  13 + - Word/PowerPoint 2007+ XML (aka Flat OPC)
  14 + - Word 2003 XML (.xml)
  15 + - Word/Excel Single File Web Page / MHTML (.mht)
  16 + - Publisher (.pub)
  17 + - raises an error if run with files encrypted using MS Crypto API RC4
18 18  
19 19 Author: Philippe Lagadec - http://www.decalage.info
20 20 License: BSD, see source code or documentation
... ... @@ -28,7 +28,7 @@ https://github.com/unixfreak0037/officeparser
28 28  
29 29 # === LICENSE ==================================================================
30 30  
31   -# olevba is copyright (c) 2014-2018 Philippe Lagadec (http://www.decalage.info)
  31 +# olevba is copyright (c) 2014-2019 Philippe Lagadec (http://www.decalage.info)
32 32 # All rights reserved.
33 33 #
34 34 # Redistribution and use in source and binary forms, with or without modification,
... ... @@ -210,8 +210,16 @@ from __future__ import print_function
210 210 # 2018-09-11 v0.54 PL: - olefile is now a dependency
211 211 # 2018-10-08 PL: - replace backspace before printing to console (issue #358)
212 212 # 2018-10-25 CH: - detect encryption and raise error if detected
  213 +# 2018-12-03 PL: - uses tablestream (+colors) instead of prettytable
  214 +# 2018-12-06 PL: - colorize the suspicious keywords found in VBA code
  215 +# 2019-01-01 PL: - removed support for Python 2.6
  216 +# 2019-03-18 PL: - added XLM/XLF macros detection for Excel OLE files
  217 +# 2019-03-25 CH: - added decryption of password-protected files
  218 +# 2019-04-09 PL: - decompress_stream accepts bytes (issue #422)
  219 +# 2019-05-23 v0.55 PL: - added option --pcode to call pcodedmp and display P-code
  220 +# 2019-06-05 PL: - added VBA stomping detection
213 221  
214   -__version__ = '0.54dev4'
  222 +__version__ = '0.55.dev3'
215 223  
216 224 #------------------------------------------------------------------------------
217 225 # TODO:
... ... @@ -236,23 +244,20 @@ __version__ = '0.54dev4'
236 244 # - extract_macros: use combined struct.unpack instead of many calls
237 245 # - all except clauses should target specific exceptions
238 246  
239   -#------------------------------------------------------------------------------
  247 +# ------------------------------------------------------------------------------
240 248 # REFERENCES:
241 249 # - [MS-OVBA]: Microsoft Office VBA File Format Structure
242 250 # http://msdn.microsoft.com/en-us/library/office/cc313094%28v=office.12%29.aspx
243 251 # - officeparser: https://github.com/unixfreak0037/officeparser
244 252  
245 253  
246   -#--- IMPORTS ------------------------------------------------------------------
  254 +# --- IMPORTS ------------------------------------------------------------------
247 255  
248 256 import sys
249 257 import os
250 258 import logging
251 259 import struct
252   -try:
253   - from cStringIO import StringIO
254   -except ImportError:
255   - from io import StringIO
  260 +from io import BytesIO, StringIO
256 261 import math
257 262 import zipfile
258 263 import re
... ... @@ -261,7 +266,7 @@ import binascii
261 266 import base64
262 267 import zlib
263 268 import email # for MHTML parsing
264   -import string # for printable
  269 +import string # for printable
265 270 import json # for json output mode (argument --json)
266 271  
267 272 # import lxml or ElementTree for XML parsing:
... ... @@ -297,11 +302,11 @@ _thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__)))
297 302 # print('_thismodule_dir = %r' % _thismodule_dir)
298 303 _parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
299 304 # print('_parent_dir = %r' % _parent_dir)
300   -if not _parent_dir in sys.path:
  305 +if _parent_dir not in sys.path:
301 306 sys.path.insert(0, _parent_dir)
302 307  
303 308 import olefile
304   -from oletools.thirdparty.prettytable import prettytable
  309 +from oletools.thirdparty.tablestream import tablestream
305 310 from oletools.thirdparty.xglob import xglob, PathNotFoundException
306 311 from pyparsing import \
307 312 CaselessKeyword, CaselessLiteral, Combine, Forward, Literal, \
... ... @@ -311,9 +316,8 @@ from pyparsing import \
311 316 from oletools import ppt_parser
312 317 from oletools import oleform
313 318 from oletools import rtfobj
314   -from oletools import oleid
315   -from oletools.common.errors import FileIsEncryptedError
316   -
  319 +from oletools import crypto
  320 +from oletools.common import codepages
317 321  
318 322 # monkeypatch email to fix issue #32:
319 323 # allow header lines without ":"
... ... @@ -324,30 +328,77 @@ email.feedparser.headerRE = re.compile(r'^(From |[\041-\071\073-\176]{1,}:?|[\t
324 328  
325 329 if sys.version_info[0] <= 2:
326 330 # Python 2.x
327   - if sys.version_info[1] <= 6:
328   - # Python 2.6
329   - # use is_zipfile backported from Python 2.7:
330   - from thirdparty.zipfile27 import is_zipfile
331   - else:
332   - # Python 2.7
333   - from zipfile import is_zipfile
  331 + PYTHON2 = True
  332 + # to use ord on bytes/bytearray items the same way in Python 2+3
  333 + # on Python 2, just use the normal ord() because items are bytes
  334 + byte_ord = ord
  335 + #: Default string encoding for the olevba API
  336 + DEFAULT_API_ENCODING = 'utf8' # on Python 2: UTF-8 (bytes)
334 337 else:
335 338 # Python 3.x+
336   - from zipfile import is_zipfile
  339 + PYTHON2 = False
  340 +
  341 + # to use ord on bytes/bytearray items the same way in Python 2+3
  342 + # on Python 3, items are int, so just return the item
  343 + def byte_ord(x):
  344 + return x
337 345 # xrange is now called range:
338 346 xrange = range
  347 + # unichr does not exist anymore, only chr:
  348 + unichr = chr
  349 + # json2ascii also needs "unicode":
  350 + unicode = str
  351 + from functools import reduce
  352 + #: Default string encoding for the olevba API
  353 + DEFAULT_API_ENCODING = None # on Python 3: None (unicode)
  354 + # Python 3.0 - 3.4 support:
  355 + # From https://gist.github.com/ynkdir/867347/c5e188a4886bc2dd71876c7e069a7b00b6c16c61
  356 + if sys.version_info < (3, 5):
  357 + import codecs
  358 + _backslashreplace_errors = codecs.lookup_error("backslashreplace")
  359 +
  360 + def backslashreplace_errors(exc):
  361 + if isinstance(exc, UnicodeDecodeError):
  362 + u = "".join("\\x{0:02x}".format(c) for c in exc.object[exc.start:exc.end])
  363 + return u, exc.end
  364 + return _backslashreplace_errors(exc)
  365 +
  366 + codecs.register_error("backslashreplace", backslashreplace_errors)
  367 +
  368 +
  369 +def unicode2str(unicode_string):
  370 + """
  371 + convert a unicode string to a native str:
  372 + - on Python 3, it returns the same string
  373 + - on Python 2, the string is encoded with UTF-8 to a bytes str
  374 + :param unicode_string: unicode string to be converted
  375 + :return: the string converted to str
  376 + :rtype: str
  377 + """
  378 + if PYTHON2:
  379 + return unicode_string.encode('utf8', errors='replace')
  380 + else:
  381 + return unicode_string
339 382  
340   -# === LOGGING =================================================================
341 383  
342   -class NullHandler(logging.Handler):
  384 +def bytes2str(bytes_string, encoding='utf8'):
343 385 """
344   - Log Handler without output, to avoid printing messages if logging is not
345   - configured by the main application.
346   - Python 2.7 has logging.NullHandler, but this is necessary for 2.6:
347   - see https://docs.python.org/2.6/library/logging.html#configuring-logging-for-a-library
  386 + convert a bytes string to a native str:
  387 + - on Python 2, it returns the same string (bytes=str)
  388 + - on Python 3, the string is decoded using the provided encoding
  389 + (UTF-8 by default) to a unicode str
  390 + :param bytes_string: bytes string to be converted
  391 + :param encoding: codec to be used for decoding
  392 + :return: the string converted to str
  393 + :rtype: str
348 394 """
349   - def emit(self, record):
350   - pass
  395 + if PYTHON2:
  396 + return bytes_string
  397 + else:
  398 return bytes_string.decode(encoding, errors='replace')
  399 +
  400 +
  401 +# === LOGGING =================================================================
351 402  
352 403 def get_logger(name, level=logging.CRITICAL+1):
353 404 """
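byte_ord, unicode2str and bytes2str hide the Python 2/3 differences between str, bytes and unicode. On Python 3 alone they reduce to very little; a sketch under that assumption (the names mirror the helpers in this hunk, but the bodies are simplified). This bytes2str passes its encoding argument through to decode(), and the escaped variant uses the 'backslashreplace' decode error handler, which is built in from Python 3.5 and is what the registered shim emulates on Python 3.0 to 3.4:

```python
def byte_ord(x):
    """Integer value of one item taken from bytes/bytearray (or a 1-char str)."""
    # On Python 3, indexing bytes already yields an int.
    return x if isinstance(x, int) else ord(x)


def bytes2str(bytes_string, encoding='utf8'):
    """Decode bytes to str with the given codec; bad bytes become U+FFFD."""
    return bytes_string.decode(encoding, errors='replace')


def bytes2str_escaped(bytes_string, encoding='utf8'):
    """Decode bytes to str; bad bytes are kept visible as \\xNN escapes."""
    return bytes_string.decode(encoding, errors='backslashreplace')


def unicode2str(unicode_string):
    """On Python 3, str is already unicode, so this is the identity."""
    return unicode_string
```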
... ... @@ -361,7 +412,7 @@ def get_logger(name, level=logging.CRITICAL+1):
361 412 # First, test if there is already a logger with the same name, else it
362 413 # will generate duplicate messages (due to duplicate handlers):
363 414 if name in logging.Logger.manager.loggerDict:
364   - #NOTE: another less intrusive but more "hackish" solution would be to
  415 + # NOTE: another less intrusive but more "hackish" solution would be to
365 416 # use getLogger then test if its effective level is not default.
366 417 logger = logging.getLogger(name)
367 418 # make sure level is OK:
... ... @@ -371,7 +422,7 @@ def get_logger(name, level=logging.CRITICAL+1):
371 422 logger = logging.getLogger(name)
372 423 # only add a NullHandler for this logger, it is up to the application
373 424 # to configure its own logging:
374   - logger.addHandler(NullHandler())
  425 + logger.addHandler(logging.NullHandler())
375 426 logger.setLevel(level)
376 427 return logger
377 428  
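logging.NullHandler is available in the standard library since Python 2.7 and 3.1, so the custom NullHandler class could be dropped together with Python 2.6 support. A simplified sketch of the get_logger idea, a library logger that stays silent until the application configures logging. Unlike the function in this hunk, it checks for an existing NullHandler instead of consulting logging.Logger.manager.loggerDict:

```python
import logging


def get_logger(name, level=logging.CRITICAL + 1):
    """Return a library logger that is silent until the app configures logging."""
    logger = logging.getLogger(name)
    # avoid stacking a second NullHandler if called twice with the same name
    if not any(isinstance(h, logging.NullHandler) for h in logger.handlers):
        logger.addHandler(logging.NullHandler())
    # CRITICAL + 1 disables every standard level by default
    logger.setLevel(level)
    return logger
```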
... ... @@ -388,6 +439,7 @@ def enable_logging():
388 439 log.setLevel(logging.NOTSET)
389 440 # Also enable logging in the ppt_parser module:
390 441 ppt_parser.enable_logging()
  442 + crypto.enable_logging()
391 443  
392 444  
393 445  
... ... @@ -564,7 +616,8 @@ AUTOEXEC_KEYWORDS = {
564 616  
565 617 # MS Excel:
566 618 'Runs when the Excel Workbook is opened':
567   - ('Auto_Open', 'Workbook_Open', 'Workbook_Activate'),
  619 + ('Auto_Open', 'Workbook_Open', 'Workbook_Activate', 'Auto_Ope'),
  620 + # TODO: "Auto_Ope" is temporarily here because of a bug in plugin_biff, which misses the last byte in "Auto_Open"...
568 621 'Runs when the Excel Workbook is closed':
569 622 ('Auto_Close', 'Workbook_Close'),
570 623  
... ... @@ -600,9 +653,10 @@ SUSPICIOUS_KEYWORDS = {
600 653 ('CreateTextFile', 'ADODB.Stream', 'WriteText', 'SaveToFile'),
601 654 #CreateTextFile: http://msdn.microsoft.com/en-us/library/office/gg264617%28v=office.15%29.aspx
602 655 #ADODB.Stream sample: http://pastebin.com/Z4TMyuq6
  656 + # ShellExecute: https://twitter.com/StanHacked/status/1075088449768693762
603 657 'May run an executable file or a system command':
604 658 ('Shell', 'vbNormal', 'vbNormalFocus', 'vbHide', 'vbMinimizedFocus', 'vbMaximizedFocus', 'vbNormalNoFocus',
605   - 'vbMinimizedNoFocus', 'WScript.Shell', 'Run', 'ShellExecute'),
  659 + 'vbMinimizedNoFocus', 'WScript.Shell', 'Run', 'ShellExecute', 'ShellExecuteA', 'shell32'),
606 660 # MacScript: see https://msdn.microsoft.com/en-us/library/office/gg264812.aspx
607 661 'May run an executable file or a system command on a Mac':
608 662 ('MacScript',),
... ... @@ -620,6 +674,8 @@ SUSPICIOUS_KEYWORDS = {
620 674 'invoke-command', 'scriptblock', 'Invoke-Expression', 'AuthorizationManager'),
621 675 'May run an executable file or a system command using PowerShell':
622 676 ('Start-Process',),
  677 + 'May run an executable file or a system command using Excel 4 Macros (XLM/XLF)':
  678 + ('EXEC',),
623 679 'May hide the application':
624 680 ('Application.Visible', 'ShowWindow', 'SW_HIDE'),
625 681 'May create a directory':
... ... @@ -635,6 +691,8 @@ SUSPICIOUS_KEYWORDS = {
635 691 ('New-Object',),
636 692 'May run an application (if combined with CreateObject)':
637 693 ('Shell.Application',),
  694 + 'May run an Excel 4 Macro (aka XLM/XLF)':
  695 + ('ExecuteExcel4Macro',),
638 696 'May enumerate application windows (if combined with Shell.Application object)':
639 697 ('Windows', 'FindWindow'),
640 698 'May run code from a DLL':
... ... @@ -643,9 +701,12 @@ SUSPICIOUS_KEYWORDS = {
643 701 'May run code from a library on a Mac':
644 702 #TODO: regex to find declare+lib on same line - see mraptor
645 703 ('libc.dylib', 'dylib'),
  704 + 'May run code from a DLL using Excel 4 Macros (XLM/XLF)':
  705 + ('REGISTER',),
646 706 'May inject code into another process':
647   - ('CreateThread', 'VirtualAlloc', # (issue #9) suggested by Davy Douhine - used by MSF payload
648   - 'VirtualAllocEx', 'RtlMoveMemory',
  707 + ('CreateThread', 'CreateUserThread', 'VirtualAlloc', # (issue #9) suggested by Davy Douhine - used by MSF payload
  708 + 'VirtualAllocEx', 'RtlMoveMemory', 'WriteProcessMemory',
  709 + 'SetContextThread', 'QueueApcThread', 'WriteVirtualMemory', 'VirtualProtect'
649 710 ),
650 711 'May run a shellcode in memory':
651 712 ('EnumSystemLanguageGroupsW?', # Used by Hancitor in Oct 2016
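The new entries extend the keyword tables (for example ShellExecuteA, shell32, WriteProcessMemory, and the XLM functions EXEC and REGISTER). How olevba matches these keywords is not shown in this diff. As a hypothetical illustration only, a case-insensitive whole-word scan over a small excerpt of the table could look like this (SUSPICIOUS and scan_keywords are made-up names):

```python
import re

# Small excerpt of the table above: description -> keywords
SUSPICIOUS = {
    'May run an executable file or a system command':
        ('Shell', 'WScript.Shell', 'ShellExecute', 'ShellExecuteA', 'shell32'),
    'May run an Excel 4 Macro (aka XLM/XLF)':
        ('ExecuteExcel4Macro',),
    'May inject code into another process':
        ('CreateThread', 'VirtualAlloc', 'WriteProcessMemory'),
}


def scan_keywords(vba_code, table=SUSPICIOUS):
    """Return sorted (keyword, description) pairs found as whole words."""
    hits = set()
    for description, keywords in table.items():
        for keyword in keywords:
            # case-insensitive whole-word match; re.escape handles the dots
            pattern = r'(?i)\b' + re.escape(keyword) + r'\b'
            if re.search(pattern, vba_code):
                hits.add((keyword, description))
    return sorted(hits)
```

Whole-word matching keeps "Shell" from firing on "ShellExecuteA", at the cost of missing keywords glued to other identifiers.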
... ... @@ -777,7 +838,8 @@ re_dridex_string = re.compile(r'"[0-9A-Za-z]{20,}"')
777 838 re_nothex_check = re.compile(r'[G-Zg-z]')
778 839  
779 840 # regex to extract printable strings (at least 5 chars) from VBA Forms:
780   -re_printable_string = re.compile(r'[\t\r\n\x20-\xFF]{5,}')
  841 +# (must be bytes for Python 3)
  842 +re_printable_string = re.compile(b'[\\t\\r\\n\\x20-\\xFF]{5,}')
781 843  
782 844  
783 845 # === PARTIAL VBA GRAMMAR ====================================================
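In Python 3, a str pattern cannot be applied to bytes (re raises TypeError), which is why re_printable_string is now compiled from a bytes literal. A sketch showing the same pattern used on raw form data; printable_strings is a hypothetical helper. Note that the class \x20-\xFF also accepts bytes 0x7F to 0xFF, so "printable" is loose here:

```python
import re

# Same pattern as above: runs of at least 5 bytes that are TAB, CR, LF or 0x20-0xFF
re_printable_string = re.compile(b'[\\t\\r\\n\\x20-\\xFF]{5,}')


def printable_strings(data):
    """Return the printable byte runs found in a VBA form stream (bytes in, bytes out)."""
    return re_printable_string.findall(data)
```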
... ... @@ -918,10 +980,13 @@ vba_chr = Suppress(
918 980 def vba_chr_tostr(t):
919 981 try:
920 982 i = t[0]
921   - # normal, non-unicode character:
922 983 if i>=0 and i<=255:
  984 + # normal, non-unicode character:
  985 + # TODO: check if it needs to be converted to bytes for Python 3
923 986 return VbaExpressionString(chr(i))
924 987 else:
  988 + # unicode character
  989 + # Note: this distinction is only needed for Python 2
925 990 return VbaExpressionString(unichr(i).encode('utf-8', 'backslashreplace'))
926 991 except ValueError:
927 992 log.exception('ERROR: incorrect parameter value for chr(): %r' % i)
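vba_chr_tostr maps Chr() arguments 0 to 255 with chr() and larger values with unichr(), a split that only matters on Python 2. A Python 3 sketch; vba_chr is hypothetical, and the optional codepage parameter is an assumption added because VBA's Chr() maps 0 to 255 through the ANSI code page (on a cp1252 system, Chr(128) is the euro sign), whereas chr() treats those values as Unicode code points:

```python
def vba_chr(code, codepage=None):
    """Evaluate a constant VBA Chr()/ChrW() call to a Python 3 str (sketch)."""
    if not 0 <= code <= 0x10FFFF:
        raise ValueError('incorrect parameter value for chr(): %r' % code)
    if codepage is not None and code <= 0xFF:
        # single byte decoded with the given code page (assumption, see above)
        return bytes([code]).decode(codepage, errors='replace')
    # Python 3 chr() covers 0-255 and wider Unicode code points alike
    return chr(code)
```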
... ... @@ -1188,8 +1253,9 @@ def decompress_stream(compressed_container):
1188 1253 """
1189 1254 Decompress a stream according to MS-OVBA section 2.4.1
1190 1255  
1191   - compressed_container: string compressed according to the MS-OVBA 2.4.1.3.6 Compression algorithm
1192   - return the decompressed container as a string (bytes)
  1256 :param compressed_container: bytearray or bytes compressed according to the MS-OVBA 2.4.1.3.6 Compression algorithm
  1257 + :return: the decompressed container as a bytes string
  1258 + :rtype: bytes
1193 1259 """
1194 1260 # 2.4.1.2 State Variables
1195 1261  
... ... @@ -1211,10 +1277,14 @@ def decompress_stream(compressed_container):
1211 1277 # DecompressedChunkStart: The location of the first byte of the DecompressedChunk (section 2.4.1.1.3) within the
1212 1278 # DecompressedBuffer (section 2.4.1.1.2).
1213 1279  
1214   - decompressed_container = '' # result
  1280 + # Check the input is a bytearray, otherwise convert it (assuming it's bytes):
  1281 + if not isinstance(compressed_container, bytearray):
  1282 + compressed_container = bytearray(compressed_container)
  1283 + # raise TypeError('decompress_stream requires a bytearray as input')
  1284 + decompressed_container = bytearray() # result
1215 1285 compressed_current = 0
1216 1286  
1217   - sig_byte = ord(compressed_container[compressed_current])
  1287 + sig_byte = compressed_container[compressed_current]
1218 1288 if sig_byte != 0x01:
1219 1289 raise ValueError('invalid signature byte {0:02X}'.format(sig_byte))
1220 1290  
... ... @@ -1260,7 +1330,7 @@ def decompress_stream(compressed_container):
1260 1330 # MS-OVBA 2.4.1.3.3 Decompressing a RawChunk
1261 1331 # uncompressed chunk: read the next 4096 bytes as-is
1262 1332 #TODO: check if there are at least 4096 bytes left
1263   - decompressed_container += compressed_container[compressed_current:compressed_current + 4096]
  1333 decompressed_container.extend(compressed_container[compressed_current:compressed_current + 4096])
1264 1334 compressed_current += 4096
1265 1335 else:
1266 1336 # MS-OVBA 2.4.1.3.2 Decompressing a CompressedChunk
... ... @@ -1271,7 +1341,7 @@ def decompress_stream(compressed_container):
1271 1341 # log.debug('compressed_current = %d / compressed_end = %d' % (compressed_current, compressed_end))
1272 1342 # FlagByte: 8 bits indicating if the following 8 tokens are either literal (1 byte of plain text) or
1273 1343 # copy tokens (reference to a previous literal token)
1274   - flag_byte = ord(compressed_container[compressed_current])
  1344 + flag_byte = compressed_container[compressed_current]
1275 1345 compressed_current += 1
1276 1346 for bit_index in xrange(0, 8):
1277 1347 # log.debug('bit_index=%d / compressed_current=%d / compressed_end=%d' % (bit_index, compressed_current, compressed_end))
... ... @@ -1283,7 +1353,7 @@ def decompress_stream(compressed_container):
1283 1353 #log.debug('bit_index=%d: flag_bit=%d' % (bit_index, flag_bit))
1284 1354 if flag_bit == 0: # LiteralToken
1285 1355 # copy one byte directly to output
1286   - decompressed_container += compressed_container[compressed_current]
  1356 decompressed_container.append(compressed_container[compressed_current])
1287 1357 compressed_current += 1
1288 1358 else: # CopyToken
1289 1359 # MS-OVBA 2.4.1.3.19.2 Unpack CopyToken
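decompress_stream now works on a bytearray and returns bytes. The MS-OVBA 2.4.1 algorithm it implements can be sketched end to end as follows; decompress_ovba is a hypothetical name, and the CopyToken bit-count computation (not visible in this excerpt) follows the specification's CopyToken Help routine rather than olevba's code. A RawChunk slice must be passed to extend() directly: wrapping it in a list makes bytearray.extend raise TypeError.

```python
import struct


def decompress_ovba(data):
    """Decompress a CompressedContainer per MS-OVBA 2.4.1 (sketch)."""
    data = bytearray(data)
    if not data or data[0] != 0x01:
        raise ValueError('invalid signature byte')
    out = bytearray()
    pos = 1
    while pos < len(data):
        # CompressedChunkHeader: 16-bit little-endian
        (header,) = struct.unpack_from('<H', data, pos)
        chunk_size = (header & 0x0FFF) + 3
        if (header >> 12) & 0x07 != 0b011:
            raise ValueError('invalid chunk signature')
        compressed = header & 0x8000
        chunk_end = min(len(data), pos + chunk_size)
        pos += 2
        if not compressed:
            # RawChunk: 4096 bytes copied as-is
            out.extend(data[pos:pos + 4096])
            pos += 4096
            continue
        chunk_start = len(out)  # DecompressedChunkStart
        while pos < chunk_end:
            flag_byte = data[pos]
            pos += 1
            for bit in range(8):
                if pos >= chunk_end:
                    break
                if not (flag_byte >> bit) & 1:
                    # LiteralToken: one byte of plain text
                    out.append(data[pos])
                    pos += 1
                    continue
                # CopyToken: offset in the high bits, length in the low bits
                (token,) = struct.unpack_from('<H', data, pos)
                pos += 2
                difference = len(out) - chunk_start
                bit_count = 4  # smallest n >= 4 with 2**n >= difference
                while (1 << bit_count) < difference:
                    bit_count += 1
                length = (token & (0xFFFF >> bit_count)) + 3
                offset = (token >> (16 - bit_count)) + 1
                src = len(out) - offset
                # byte-by-byte copy: source and destination may overlap
                for i in range(src, src + length):
                    out.append(out[i])
    return bytes(out)
```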
... ... @@ -1299,520 +1369,664 @@ def decompress_stream(compressed_container):
1299 1369 #log.debug('offset=%d length=%d' % (offset, length))
1300 1370 copy_source = len(decompressed_container) - offset
1301 1371 for index in xrange(copy_source, copy_source + length):
1302   - decompressed_container += decompressed_container[index]
  1372 decompressed_container.append(decompressed_container[index])
1303 1373 compressed_current += 2
1304   - return decompressed_container
  1374 + return bytes(decompressed_container)
1305 1375  
1306 1376  
1307   -def _extract_vba(ole, vba_root, project_path, dir_path, relaxed=False):
  1377 +class VBA_Module(object):
1308 1378 """
1309   - Extract VBA macros from an OleFileIO object.
1310   - Internal function, do not call directly.
1311   -
1312   - vba_root: path to the VBA root storage, containing the VBA storage and the PROJECT stream
1313   - vba_project: path to the PROJECT stream
1314   - :param relaxed: If True, only create info/debug log entry if data is not as expected
1315   - (e.g. opening substream fails); if False, raise an error in this case
1316   - This is a generator, yielding (stream path, VBA filename, VBA source code) for each VBA code stream
  1379 + Class to parse a VBA module from an OLE file, and to store all the corresponding
  1380 + metadata and VBA source code.
1317 1381 """
1318   - # Open the PROJECT stream:
1319   - project = ole.openstream(project_path)
1320   - log.debug('relaxed is %s' % relaxed)
1321   -
1322   - # sample content of the PROJECT stream:
1323   -
1324   - ## ID="{5312AC8A-349D-4950-BDD0-49BE3C4DD0F0}"
1325   - ## Document=ThisDocument/&H00000000
1326   - ## Module=NewMacros
1327   - ## Name="Project"
1328   - ## HelpContextID="0"
1329   - ## VersionCompatible32="393222000"
1330   - ## CMG="F1F301E705E705E705E705"
1331   - ## DPB="8F8D7FE3831F2020202020"
1332   - ## GC="2D2FDD81E51EE61EE6E1"
1333   - ##
1334   - ## [Host Extender Info]
1335   - ## &H00000001={3832D640-CF90-11CF-8E43-00A0C911005A};VBE;&H00000000
1336   - ## &H00000002={000209F2-0000-0000-C000-000000000046};Word8.0;&H00000000
1337   - ##
1338   - ## [Workspace]
1339   - ## ThisDocument=22, 29, 339, 477, Z
1340   - ## NewMacros=-4, 42, 832, 510, C
1341   -
1342   - code_modules = {}
1343   -
1344   - for line in project:
1345   - line = line.strip()
1346   - if '=' in line:
1347   - # split line at the 1st equal sign:
1348   - name, value = line.split('=', 1)
1349   - # looking for code modules
1350   - # add the code module as a key in the dictionary
1351   - # the value will be the extension needed later
1352   - # The value is converted to lowercase, to allow case-insensitive matching (issue #3)
1353   - value = value.lower()
1354   - if name == 'Document':
1355   - # split value at the 1st slash, keep 1st part:
1356   - value = value.split('/', 1)[0]
1357   - code_modules[value] = CLASS_EXTENSION
1358   - elif name == 'Module':
1359   - code_modules[value] = MODULE_EXTENSION
1360   - elif name == 'Class':
1361   - code_modules[value] = CLASS_EXTENSION
1362   - elif name == 'BaseClass':
1363   - code_modules[value] = FORM_EXTENSION
1364   -
1365   - # read data from dir stream (compressed)
1366   - dir_compressed = ole.openstream(dir_path).read()
1367   -
1368   - def check_value(name, expected, value):
1369   - if expected != value:
1370   - if relaxed:
1371   - log.error("invalid value for {0} expected {1:04X} got {2:04X}"
1372   - .format(name, expected, value))
1373   - else:
1374   - raise UnexpectedDataError(dir_path, name, expected, value)
1375   -
1376   - dir_stream = StringIO(decompress_stream(dir_compressed))
1377   -
1378   - # PROJECTSYSKIND Record
1379   - projectsyskind_id = struct.unpack("<H", dir_stream.read(2))[0]
1380   - check_value('PROJECTSYSKIND_Id', 0x0001, projectsyskind_id)
1381   - projectsyskind_size = struct.unpack("<L", dir_stream.read(4))[0]
1382   - check_value('PROJECTSYSKIND_Size', 0x0004, projectsyskind_size)
1383   - projectsyskind_syskind = struct.unpack("<L", dir_stream.read(4))[0]
1384   - if projectsyskind_syskind == 0x00:
1385   - log.debug("16-bit Windows")
1386   - elif projectsyskind_syskind == 0x01:
1387   - log.debug("32-bit Windows")
1388   - elif projectsyskind_syskind == 0x02:
1389   - log.debug("Macintosh")
1390   - elif projectsyskind_syskind == 0x03:
1391   - log.debug("64-bit Windows")
1392   - else:
1393   - log.error("invalid PROJECTSYSKIND_SysKind {0:04X}".format(projectsyskind_syskind))
1394   -
1395   - # PROJECTLCID Record
1396   - projectlcid_id = struct.unpack("<H", dir_stream.read(2))[0]
1397   - check_value('PROJECTLCID_Id', 0x0002, projectlcid_id)
1398   - projectlcid_size = struct.unpack("<L", dir_stream.read(4))[0]
1399   - check_value('PROJECTLCID_Size', 0x0004, projectlcid_size)
1400   - projectlcid_lcid = struct.unpack("<L", dir_stream.read(4))[0]
1401   - check_value('PROJECTLCID_Lcid', 0x409, projectlcid_lcid)
1402   -
1403   - # PROJECTLCIDINVOKE Record
1404   - projectlcidinvoke_id = struct.unpack("<H", dir_stream.read(2))[0]
1405   - check_value('PROJECTLCIDINVOKE_Id', 0x0014, projectlcidinvoke_id)
1406   - projectlcidinvoke_size = struct.unpack("<L", dir_stream.read(4))[0]
1407   - check_value('PROJECTLCIDINVOKE_Size', 0x0004, projectlcidinvoke_size)
1408   - projectlcidinvoke_lcidinvoke = struct.unpack("<L", dir_stream.read(4))[0]
1409   - check_value('PROJECTLCIDINVOKE_LcidInvoke', 0x409, projectlcidinvoke_lcidinvoke)
1410   -
1411   - # PROJECTCODEPAGE Record
1412   - projectcodepage_id = struct.unpack("<H", dir_stream.read(2))[0]
1413   - check_value('PROJECTCODEPAGE_Id', 0x0003, projectcodepage_id)
1414   - projectcodepage_size = struct.unpack("<L", dir_stream.read(4))[0]
1415   - check_value('PROJECTCODEPAGE_Size', 0x0002, projectcodepage_size)
1416   - projectcodepage_codepage = struct.unpack("<H", dir_stream.read(2))[0]
1417   -
1418   - # PROJECTNAME Record
1419   - projectname_id = struct.unpack("<H", dir_stream.read(2))[0]
1420   - check_value('PROJECTNAME_Id', 0x0004, projectname_id)
1421   - projectname_sizeof_projectname = struct.unpack("<L", dir_stream.read(4))[0]
1422   - if projectname_sizeof_projectname < 1 or projectname_sizeof_projectname > 128:
1423   - log.error("PROJECTNAME_SizeOfProjectName value not in range: {0}".format(projectname_sizeof_projectname))
1424   - projectname_projectname = dir_stream.read(projectname_sizeof_projectname)
1425   - unused = projectname_projectname
1426   -
1427   - # PROJECTDOCSTRING Record
1428   - projectdocstring_id = struct.unpack("<H", dir_stream.read(2))[0]
1429   - check_value('PROJECTDOCSTRING_Id', 0x0005, projectdocstring_id)
1430   - projectdocstring_sizeof_docstring = struct.unpack("<L", dir_stream.read(4))[0]
1431   - if projectdocstring_sizeof_docstring > 2000:
1432   - log.error(
1433   - "PROJECTDOCSTRING_SizeOfDocString value not in range: {0}".format(projectdocstring_sizeof_docstring))
1434   - projectdocstring_docstring = dir_stream.read(projectdocstring_sizeof_docstring)
1435   - projectdocstring_reserved = struct.unpack("<H", dir_stream.read(2))[0]
1436   - check_value('PROJECTDOCSTRING_Reserved', 0x0040, projectdocstring_reserved)
1437   - projectdocstring_sizeof_docstring_unicode = struct.unpack("<L", dir_stream.read(4))[0]
1438   - if projectdocstring_sizeof_docstring_unicode % 2 != 0:
1439   - log.error("PROJECTDOCSTRING_SizeOfDocStringUnicode is not even")
1440   - projectdocstring_docstring_unicode = dir_stream.read(projectdocstring_sizeof_docstring_unicode)
1441   - unused = projectdocstring_docstring
1442   - unused = projectdocstring_docstring_unicode
1443   -
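PROJECTDOCSTRING stores the same text twice: an MBCS string with its size, then a Reserved marker (0x0040) and a UTF-16LE copy whose byte size must be even. PROJECTCONSTANTS repeats this shape with Id 0x000C and Reserved 0x003C. A hypothetical helper for that pair, raising where the code above logs an error:

```python
import io
import struct


def read_dual_string_record(stream, expected_id, expected_reserved, max_size=None):
    """Read Id/Size/MBCS string followed by Reserved/Size/UTF-16LE string.

    Returns (mbcs_bytes, unicode_text); raises ValueError on bad fields.
    """
    record_id, size = struct.unpack("<HL", stream.read(6))
    if record_id != expected_id:
        raise ValueError("expected Id {0:04X}, got {1:04X}".format(expected_id, record_id))
    if max_size is not None and size > max_size:
        raise ValueError("size {0} exceeds {1}".format(size, max_size))
    mbcs = stream.read(size)
    reserved, size_unicode = struct.unpack("<HL", stream.read(6))
    if reserved != expected_reserved:
        raise ValueError("expected Reserved {0:04X}, got {1:04X}".format(expected_reserved, reserved))
    if size_unicode % 2 != 0:
        raise ValueError("UTF-16 size is not even")
    return mbcs, stream.read(size_unicode).decode("utf-16-le")
```

PROJECTHELPFILEPATH has the same outer shape, but its second copy (HelpFile2) is another MBCS string that must equal HelpFile1, so it does not fit this helper as written.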
1444   - # PROJECTHELPFILEPATH Record - MS-OVBA 2.3.4.2.1.7
1445   - projecthelpfilepath_id = struct.unpack("<H", dir_stream.read(2))[0]
1446   - check_value('PROJECTHELPFILEPATH_Id', 0x0006, projecthelpfilepath_id)
1447   - projecthelpfilepath_sizeof_helpfile1 = struct.unpack("<L", dir_stream.read(4))[0]
1448   - if projecthelpfilepath_sizeof_helpfile1 > 260:
1449   - log.error(
1450   - "PROJECTHELPFILEPATH_SizeOfHelpFile1 value not in range: {0}".format(projecthelpfilepath_sizeof_helpfile1))
1451   - projecthelpfilepath_helpfile1 = dir_stream.read(projecthelpfilepath_sizeof_helpfile1)
1452   - projecthelpfilepath_reserved = struct.unpack("<H", dir_stream.read(2))[0]
1453   - check_value('PROJECTHELPFILEPATH_Reserved', 0x003D, projecthelpfilepath_reserved)
1454   - projecthelpfilepath_sizeof_helpfile2 = struct.unpack("<L", dir_stream.read(4))[0]
1455   - if projecthelpfilepath_sizeof_helpfile2 != projecthelpfilepath_sizeof_helpfile1:
1456   - log.error("PROJECTHELPFILEPATH_SizeOfHelpFile1 does not equal PROJECTHELPFILEPATH_SizeOfHelpFile2")
1457   - projecthelpfilepath_helpfile2 = dir_stream.read(projecthelpfilepath_sizeof_helpfile2)
1458   - if projecthelpfilepath_helpfile2 != projecthelpfilepath_helpfile1:
1459   - log.error("PROJECTHELPFILEPATH_HelpFile1 does not equal PROJECTHELPFILEPATH_HelpFile2")
1460   -
1461   - # PROJECTHELPCONTEXT Record
1462   - projecthelpcontext_id = struct.unpack("<H", dir_stream.read(2))[0]
1463   - check_value('PROJECTHELPCONTEXT_Id', 0x0007, projecthelpcontext_id)
1464   - projecthelpcontext_size = struct.unpack("<L", dir_stream.read(4))[0]
1465   - check_value('PROJECTHELPCONTEXT_Size', 0x0004, projecthelpcontext_size)
1466   - projecthelpcontext_helpcontext = struct.unpack("<L", dir_stream.read(4))[0]
1467   - unused = projecthelpcontext_helpcontext
1468   -
1469   - # PROJECTLIBFLAGS Record
1470   - projectlibflags_id = struct.unpack("<H", dir_stream.read(2))[0]
1471   - check_value('PROJECTLIBFLAGS_Id', 0x0008, projectlibflags_id)
1472   - projectlibflags_size = struct.unpack("<L", dir_stream.read(4))[0]
1473   - check_value('PROJECTLIBFLAGS_Size', 0x0004, projectlibflags_size)
1474   - projectlibflags_projectlibflags = struct.unpack("<L", dir_stream.read(4))[0]
1475   - check_value('PROJECTLIBFLAGS_ProjectLibFlags', 0x0000, projectlibflags_projectlibflags)
1476   -
1477   - # PROJECTVERSION Record
1478   - projectversion_id = struct.unpack("<H", dir_stream.read(2))[0]
1479   - check_value('PROJECTVERSION_Id', 0x0009, projectversion_id)
1480   - projectversion_reserved = struct.unpack("<L", dir_stream.read(4))[0]
1481   - check_value('PROJECTVERSION_Reserved', 0x0004, projectversion_reserved)
1482   - projectversion_versionmajor = struct.unpack("<L", dir_stream.read(4))[0]
1483   - projectversion_versionminor = struct.unpack("<H", dir_stream.read(2))[0]
1484   - unused = projectversion_versionmajor
1485   - unused = projectversion_versionminor
1486   -
1487   - # PROJECTCONSTANTS Record
1488   - projectconstants_id = struct.unpack("<H", dir_stream.read(2))[0]
1489   - check_value('PROJECTCONSTANTS_Id', 0x000C, projectconstants_id)
1490   - projectconstants_sizeof_constants = struct.unpack("<L", dir_stream.read(4))[0]
1491   - if projectconstants_sizeof_constants > 1015:
1492   - log.error(
1493   - "PROJECTCONSTANTS_SizeOfConstants value not in range: {0}".format(projectconstants_sizeof_constants))
1494   - projectconstants_constants = dir_stream.read(projectconstants_sizeof_constants)
1495   - projectconstants_reserved = struct.unpack("<H", dir_stream.read(2))[0]
1496   - check_value('PROJECTCONSTANTS_Reserved', 0x003C, projectconstants_reserved)
1497   - projectconstants_sizeof_constants_unicode = struct.unpack("<L", dir_stream.read(4))[0]
1498   - if projectconstants_sizeof_constants_unicode % 2 != 0:
1499   - log.error("PROJECTCONSTANTS_SizeOfConstantsUnicode is not even")
1500   - projectconstants_constants_unicode = dir_stream.read(projectconstants_sizeof_constants_unicode)
1501   - unused = projectconstants_constants
1502   - unused = projectconstants_constants_unicode
1503   -
1504   - # array of REFERENCE records
1505   - check = None
1506   - while True:
1507   - check = struct.unpack("<H", dir_stream.read(2))[0]
1508   - log.debug("reference type = {0:04X}".format(check))
1509   - if check == 0x000F:
1510   - break
1511   -
1512   - if check == 0x0016:
1513   - # REFERENCENAME
1514   - reference_id = check
1515   - reference_sizeof_name = struct.unpack("<L", dir_stream.read(4))[0]
1516   - reference_name = dir_stream.read(reference_sizeof_name)
1517   - reference_reserved = struct.unpack("<H", dir_stream.read(2))[0]
1518   - # According to [MS-OVBA] 2.3.4.2.2.2 REFERENCENAME Record:
1519   - # "Reserved (2 bytes): MUST be 0x003E. MUST be ignored."
1520   - # So let's ignore it, otherwise it crashes on some files (issue #132)
1521   - # PR #135 by @c1fe:
1522   - # contrary to the specification I think that the unicode name
1523   - # is optional. if reference_reserved is not 0x003E I think it
1524   - # is actually the start of another REFERENCE record
1525   - # at least when projectsyskind_syskind == 0x02 (Macintosh)
1526   - if reference_reserved == 0x003E:
1527   - #if reference_reserved not in (0x003E, 0x000D):
1528   - # raise UnexpectedDataError(dir_path, 'REFERENCE_Reserved',
1529   - # 0x0003E, reference_reserved)
1530   - reference_sizeof_name_unicode = struct.unpack("<L", dir_stream.read(4))[0]
1531   - reference_name_unicode = dir_stream.read(reference_sizeof_name_unicode)
1532   - unused = reference_id
1533   - unused = reference_name
1534   - unused = reference_name_unicode
1535   - continue
1536   - else:
1537   - check = reference_reserved
1538   - log.debug("reference type = {0:04X}".format(check))
1539   -
1540   - if check == 0x0033:
1541   - # REFERENCEORIGINAL (followed by REFERENCECONTROL)
1542   - referenceoriginal_id = check
1543   - referenceoriginal_sizeof_libidoriginal = struct.unpack("<L", dir_stream.read(4))[0]
1544   - referenceoriginal_libidoriginal = dir_stream.read(referenceoriginal_sizeof_libidoriginal)
1545   - unused = referenceoriginal_id
1546   - unused = referenceoriginal_libidoriginal
1547   - continue
1548   -
1549   - if check == 0x002F:
1550   - # REFERENCECONTROL
1551   - referencecontrol_id = check
1552   - referencecontrol_sizetwiddled = struct.unpack("<L", dir_stream.read(4))[0] # ignore
1553   - referencecontrol_sizeof_libidtwiddled = struct.unpack("<L", dir_stream.read(4))[0]
1554   - referencecontrol_libidtwiddled = dir_stream.read(referencecontrol_sizeof_libidtwiddled)
1555   - referencecontrol_reserved1 = struct.unpack("<L", dir_stream.read(4))[0] # ignore
1556   - check_value('REFERENCECONTROL_Reserved1', 0x0000, referencecontrol_reserved1)
1557   - referencecontrol_reserved2 = struct.unpack("<H", dir_stream.read(2))[0] # ignore
1558   - check_value('REFERENCECONTROL_Reserved2', 0x0000, referencecontrol_reserved2)
1559   - unused = referencecontrol_id
1560   - unused = referencecontrol_sizetwiddled
1561   - unused = referencecontrol_libidtwiddled
1562   - # optional field
1563   - check2 = struct.unpack("<H", dir_stream.read(2))[0]
1564   - if check2 == 0x0016:
1565   - referencecontrol_namerecordextended_id = check
1566   - referencecontrol_namerecordextended_sizeof_name = struct.unpack("<L", dir_stream.read(4))[0]
1567   - referencecontrol_namerecordextended_name = dir_stream.read(
1568   - referencecontrol_namerecordextended_sizeof_name)
1569   - referencecontrol_namerecordextended_reserved = struct.unpack("<H", dir_stream.read(2))[0]
1570   - if referencecontrol_namerecordextended_reserved == 0x003E:
1571   - referencecontrol_namerecordextended_sizeof_name_unicode = struct.unpack("<L", dir_stream.read(4))[0]
1572   - referencecontrol_namerecordextended_name_unicode = dir_stream.read(
1573   - referencecontrol_namerecordextended_sizeof_name_unicode)
1574   - referencecontrol_reserved3 = struct.unpack("<H", dir_stream.read(2))[0]
1575   - unused = referencecontrol_namerecordextended_id
1576   - unused = referencecontrol_namerecordextended_name
1577   - unused = referencecontrol_namerecordextended_name_unicode
1578   - else:
1579   - referencecontrol_reserved3 = referencecontrol_namerecordextended_reserved
1580   - else:
1581   - referencecontrol_reserved3 = check2
1582   -
1583   - check_value('REFERENCECONTROL_Reserved3', 0x0030, referencecontrol_reserved3)
1584   - referencecontrol_sizeextended = struct.unpack("<L", dir_stream.read(4))[0]
1585   - referencecontrol_sizeof_libidextended = struct.unpack("<L", dir_stream.read(4))[0]
1586   - referencecontrol_libidextended = dir_stream.read(referencecontrol_sizeof_libidextended)
1587   - referencecontrol_reserved4 = struct.unpack("<L", dir_stream.read(4))[0]
1588   - referencecontrol_reserved5 = struct.unpack("<H", dir_stream.read(2))[0]
1589   - referencecontrol_originaltypelib = dir_stream.read(16)
1590   - referencecontrol_cookie = struct.unpack("<L", dir_stream.read(4))[0]
1591   - unused = referencecontrol_sizeextended
1592   - unused = referencecontrol_libidextended
1593   - unused = referencecontrol_reserved4
1594   - unused = referencecontrol_reserved5
1595   - unused = referencecontrol_originaltypelib
1596   - unused = referencecontrol_cookie
1597   - continue
1598   -
1599   - if check == 0x000D:
1600   - # REFERENCEREGISTERED
1601   - referenceregistered_id = check
1602   - referenceregistered_size = struct.unpack("<L", dir_stream.read(4))[0]
1603   - referenceregistered_sizeof_libid = struct.unpack("<L", dir_stream.read(4))[0]
1604   - referenceregistered_libid = dir_stream.read(referenceregistered_sizeof_libid)
1605   - referenceregistered_reserved1 = struct.unpack("<L", dir_stream.read(4))[0]
1606   - check_value('REFERENCEREGISTERED_Reserved1', 0x0000, referenceregistered_reserved1)
1607   - referenceregistered_reserved2 = struct.unpack("<H", dir_stream.read(2))[0]
1608   - check_value('REFERENCEREGISTERED_Reserved2', 0x0000, referenceregistered_reserved2)
1609   - unused = referenceregistered_id
1610   - unused = referenceregistered_size
1611   - unused = referenceregistered_libid
1612   - continue
1613 1382  
1614   - if check == 0x000E:
1615   - # REFERENCEPROJECT
1616   - referenceproject_id = check
1617   - referenceproject_size = struct.unpack("<L", dir_stream.read(4))[0]
1618   - referenceproject_sizeof_libidabsolute = struct.unpack("<L", dir_stream.read(4))[0]
1619   - referenceproject_libidabsolute = dir_stream.read(referenceproject_sizeof_libidabsolute)
1620   - referenceproject_sizeof_libidrelative = struct.unpack("<L", dir_stream.read(4))[0]
1621   - referenceproject_libidrelative = dir_stream.read(referenceproject_sizeof_libidrelative)
1622   - referenceproject_majorversion = struct.unpack("<L", dir_stream.read(4))[0]
1623   - referenceproject_minorversion = struct.unpack("<H", dir_stream.read(2))[0]
1624   - unused = referenceproject_id
1625   - unused = referenceproject_size
1626   - unused = referenceproject_libidabsolute
1627   - unused = referenceproject_libidrelative
1628   - unused = referenceproject_majorversion
1629   - unused = referenceproject_minorversion
1630   - continue
  1383 + def __init__(self, project, dir_stream, module_index):
  1384 + """
  1385 + Parse a VBA Module record from the dir stream of a VBA project.
  1386 + Reference: MS-OVBA 2.3.4.2.3.2 MODULE Record
1631 1387  
1632   - log.error('invalid or unknown check Id {0:04X}'.format(check))
1633   - # raise an exception instead of stopping abruptly (issue #180)
1634   - raise UnexpectedDataError(dir_path, 'reference type', (0x0F, 0x16, 0x33, 0x2F, 0x0D, 0x0E), check)
1635   - #sys.exit(0)
1636   -
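The REFERENCE loop reads record Ids until it meets 0x000F (the PROJECTMODULES Id). One subtlety noted above (PR #135): REFERENCENAME may lack its unicode name, in which case the two bytes read as Reserved are actually the Id of the next record. The sketch below handles that by rewinding two bytes instead of reassigning `check`. It is hypothetical and deliberately partial. It relies on the Size field of REFERENCEREGISTERED and REFERENCEPROJECT covering the rest of the record, and it raises for REFERENCECONTROL and REFERENCEORIGINAL.

```python
import io
import struct


def _u16(stream):
    return struct.unpack("<H", stream.read(2))[0]


def _u32(stream):
    return struct.unpack("<L", stream.read(4))[0]


def collect_references(stream):
    """Walk the REFERENCE array until the PROJECTMODULES Id (0x000F).

    Returns (names, libids). The stream must be seekable (e.g. io.BytesIO).
    """
    names, libids = [], []
    while True:
        record_id = _u16(stream)
        if record_id == 0x000F:  # PROJECTMODULES: end of the REFERENCE array
            return names, libids
        if record_id == 0x0016:  # REFERENCENAME
            names.append(stream.read(_u32(stream)))
            if _u16(stream) == 0x003E:
                stream.read(_u32(stream))  # skip NameUnicode
            else:
                # no unicode name (PR #135): those 2 bytes were the next record Id
                stream.seek(-2, io.SEEK_CUR)
        elif record_id in (0x000D, 0x000E):  # REFERENCEREGISTERED / REFERENCEPROJECT
            # Size covers the rest of the record, which starts with SizeOfLibid + Libid
            # (LibidAbsolute for REFERENCEPROJECT)
            body = io.BytesIO(stream.read(_u32(stream)))
            libids.append(body.read(_u32(body)))
        else:
            raise ValueError("unsupported reference type {0:04X}".format(record_id))
```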
1637   - projectmodules_id = check #struct.unpack("<H", dir_stream.read(2))[0]
1638   - check_value('PROJECTMODULES_Id', 0x000F, projectmodules_id)
1639   - projectmodules_size = struct.unpack("<L", dir_stream.read(4))[0]
1640   - check_value('PROJECTMODULES_Size', 0x0002, projectmodules_size)
1641   - projectmodules_count = struct.unpack("<H", dir_stream.read(2))[0]
1642   - projectmodules_projectcookierecord_id = struct.unpack("<H", dir_stream.read(2))[0]
1643   - check_value('PROJECTMODULES_ProjectCookieRecord_Id', 0x0013, projectmodules_projectcookierecord_id)
1644   - projectmodules_projectcookierecord_size = struct.unpack("<L", dir_stream.read(4))[0]
1645   - check_value('PROJECTMODULES_ProjectCookieRecord_Size', 0x0002, projectmodules_projectcookierecord_size)
1646   - projectmodules_projectcookierecord_cookie = struct.unpack("<H", dir_stream.read(2))[0]
1647   - unused = projectmodules_projectcookierecord_cookie
1648   -
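After the 0x000F Id, PROJECTMODULES has a fixed 14-byte tail: Size (u32, must be 2), Count (u16), then a PROJECTCOOKIE record with Id 0x0013, Size 2 and an ignored u16 cookie. A minimal sketch that reads it in one `struct.unpack` call; `read_projectmodules_header` is a hypothetical name:

```python
import io
import struct


def read_projectmodules_header(stream):
    """Read PROJECTMODULES after its Id (0x000F) and return the module count."""
    size, count, cookie_id, cookie_size, _cookie = struct.unpack("<LHHLH", stream.read(14))
    if size != 2 or cookie_id != 0x0013 or cookie_size != 2:
        raise ValueError("malformed PROJECTMODULES record")
    return count
```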
1649   - # short function to simplify unicode text output
1650   - uni_out = lambda unicode_text: unicode_text.encode('utf-8', 'replace')
1651   -
1652   - log.debug("parsing {0} modules".format(projectmodules_count))
1653   - for projectmodule_index in xrange(0, projectmodules_count):
  1388 + :param VBA_Project project: corresponding VBA project
  1389 + :param olefile.OleStream dir_stream: file object containing the module record
  1390 + :param int module_index: index of the module in the VBA project list
  1391 + """
  1392 + #: reference to the VBA project for later use (VBA_Project)
  1393 + self.project = project
  1394 + #: VBA module name (unicode str)
  1395 + self.name = None
  1396 + #: VBA module name as a native str (utf8 bytes on py2, str on py3)
  1397 + self.name_str = None
  1398 + #: VBA module name, unicode copy (unicode str)
  1399 + self._name_unicode = None
  1400 + #: Stream name containing the VBA module (unicode str)
  1401 + self.streamname = None
  1402 + #: Stream name containing the VBA module as a native str (utf8 bytes on py2, str on py3)
  1403 + self.streamname_str = None
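  1404 + self._streamname_unicode = None  # stream name from MODULESTREAMNAME StreamNameUnicode (unicode str)
  1405 + self.docstring = None  # module description from MODULEDOCSTRING (unicode str)
  1406 + self._docstring_unicode = None  # module description from MODULEDOCSTRING DocStringUnicode
  1407 + self.textoffset = None  # offset of the compressed source code in the module stream (MODULEOFFSET)
  1408 + self.type = None  # MODULETYPE Id: 0x0021 procedural, 0x0022 document/class/designer module
  1409 + self.readonly = False  # True if a MODULEREADONLY record is present
  1410 + self.private = False  # True if a MODULEPRIVATE record is present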
  1411 + #: VBA source code in bytes format, using the original code page from the VBA project
  1412 + self.code_raw = None
  1413 + #: VBA source code in unicode format (unicode for Python2, str for Python 3)
  1414 + self.code = None
  1415 + #: VBA source code in native str format (str encoded with UTF-8 for Python 2, str for Python 3)
  1416 + self.code_str = None
  1417 + #: VBA module file name including an extension based on the module type such as bas, cls, frm (unicode str)
  1418 + self.filename = None
  1419 + #: VBA module file name in native str format (str)
  1420 + self.filename_str = None
  1421 + self.code_path = None  # OLE path of the module stream that was opened (unicode str)
1654 1422 try:
1655   - modulename_id = struct.unpack("<H", dir_stream.read(2))[0]
1656   - check_value('MODULENAME_Id', 0x0019, modulename_id)
1657   - modulename_sizeof_modulename = struct.unpack("<L", dir_stream.read(4))[0]
1658   - modulename_modulename = dir_stream.read(modulename_sizeof_modulename)
1659   - # TODO: preset variables to avoid "referenced before assignment" errors
1660   - modulename_unicode_modulename_unicode = ''
  1423 + # 2.3.4.2.3.2.1 MODULENAME Record
  1424 + # Specifies a VBA identifier as the name of the containing MODULE Record
  1425 + _id = struct.unpack("<H", dir_stream.read(2))[0]
  1426 + project.check_value('MODULENAME_Id', 0x0019, _id)
  1427 + size = struct.unpack("<L", dir_stream.read(4))[0]
  1428 + modulename_bytes = dir_stream.read(size)
  1429 + # Module name always stored as Unicode:
  1430 + self.name = project.decode_bytes(modulename_bytes)
  1431 + self.name_str = unicode2str(self.name)
1661 1432 # account for optional sections
  1433 + # TODO: shouldn't this be a loop? (check MS-OVBA)
1662 1434 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1663 1435 if section_id == 0x0047:
1664   - modulename_unicode_id = section_id
1665   - modulename_unicode_sizeof_modulename_unicode = struct.unpack("<L", dir_stream.read(4))[0]
1666   - modulename_unicode_modulename_unicode = dir_stream.read(
1667   - modulename_unicode_sizeof_modulename_unicode).decode('UTF-16LE', 'replace')
1668   - # just guessing that this is the same encoding as used in OleFileIO
1669   - unused = modulename_unicode_id
  1436 + # 2.3.4.2.3.2.2 MODULENAMEUNICODE Record
  1437 + # Specifies a VBA identifier as the name of the containing MODULE Record (section 2.3.4.2.3.2).
  1438 + # MUST contain the UTF-16 encoding of MODULENAME Record
  1439 + size = struct.unpack("<L", dir_stream.read(4))[0]
  1440 + self._name_unicode = dir_stream.read(size).decode('UTF-16LE', 'replace')
1670 1441 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1671 1442 if section_id == 0x001A:
1672   - modulestreamname_id = section_id
1673   - modulestreamname_sizeof_streamname = struct.unpack("<L", dir_stream.read(4))[0]
1674   - modulestreamname_streamname = dir_stream.read(modulestreamname_sizeof_streamname)
1675   - modulestreamname_reserved = struct.unpack("<H", dir_stream.read(2))[0]
1676   - check_value('MODULESTREAMNAME_Reserved', 0x0032, modulestreamname_reserved)
1677   - modulestreamname_sizeof_streamname_unicode = struct.unpack("<L", dir_stream.read(4))[0]
1678   - modulestreamname_streamname_unicode = dir_stream.read(
1679   - modulestreamname_sizeof_streamname_unicode).decode('UTF-16LE', 'replace')
1680   - # just guessing that this is the same encoding as used in OleFileIO
1681   - unused = modulestreamname_id
  1443 + # 2.3.4.2.3.2.3 MODULESTREAMNAME Record
  1444 + # Specifies the stream name of the ModuleStream (section 2.3.4.3) in the VBA Storage (section 2.3.4)
  1445 + # corresponding to the containing MODULE Record
  1446 + size = struct.unpack("<L", dir_stream.read(4))[0]
  1447 + streamname_bytes = dir_stream.read(size)
  1448 + # Store it as Unicode:
  1449 + self.streamname = project.decode_bytes(streamname_bytes)
  1450 + self.streamname_str = unicode2str(self.streamname)
  1451 + reserved = struct.unpack("<H", dir_stream.read(2))[0]
  1452 + project.check_value('MODULESTREAMNAME_Reserved', 0x0032, reserved)
  1453 + size = struct.unpack("<L", dir_stream.read(4))[0]
  1454 + self._streamname_unicode = dir_stream.read(size).decode('UTF-16LE', 'replace')
1682 1455 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1683 1456 if section_id == 0x001C:
1684   - moduledocstring_id = section_id
1685   - check_value('MODULEDOCSTRING_Id', 0x001C, moduledocstring_id)
1686   - moduledocstring_sizeof_docstring = struct.unpack("<L", dir_stream.read(4))[0]
1687   - moduledocstring_docstring = dir_stream.read(moduledocstring_sizeof_docstring)
1688   - moduledocstring_reserved = struct.unpack("<H", dir_stream.read(2))[0]
1689   - check_value('MODULEDOCSTRING_Reserved', 0x0048, moduledocstring_reserved)
1690   - moduledocstring_sizeof_docstring_unicode = struct.unpack("<L", dir_stream.read(4))[0]
1691   - moduledocstring_docstring_unicode = dir_stream.read(moduledocstring_sizeof_docstring_unicode)
1692   - unused = moduledocstring_docstring
1693   - unused = moduledocstring_docstring_unicode
  1457 + # 2.3.4.2.3.2.4 MODULEDOCSTRING Record
  1458 + # Specifies the description for the containing MODULE Record
  1459 + size = struct.unpack("<L", dir_stream.read(4))[0]
  1460 + docstring_bytes = dir_stream.read(size)
  1461 + self.docstring = project.decode_bytes(docstring_bytes)
  1462 + reserved = struct.unpack("<H", dir_stream.read(2))[0]
  1463 + project.check_value('MODULEDOCSTRING_Reserved', 0x0048, reserved)
  1464 + size = struct.unpack("<L", dir_stream.read(4))[0]
  1465 + self._docstring_unicode = dir_stream.read(size)
1694 1466 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1695 1467 if section_id == 0x0031:
1696   - moduleoffset_id = section_id
1697   - check_value('MODULEOFFSET_Id', 0x0031, moduleoffset_id)
1698   - moduleoffset_size = struct.unpack("<L", dir_stream.read(4))[0]
1699   - check_value('MODULEOFFSET_Size', 0x0004, moduleoffset_size)
1700   - moduleoffset_textoffset = struct.unpack("<L", dir_stream.read(4))[0]
  1468 + # 2.3.4.2.3.2.5 MODULEOFFSET Record
  1469 + # Specifies the location of the source code within the ModuleStream (section 2.3.4.3)
  1470 + # that corresponds to the containing MODULE Record
  1471 + size = struct.unpack("<L", dir_stream.read(4))[0]
  1472 + project.check_value('MODULEOFFSET_Size', 0x0004, size)
  1473 + self.textoffset = struct.unpack("<L", dir_stream.read(4))[0]
1701 1474 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1702 1475 if section_id == 0x001E:
1703   - modulehelpcontext_id = section_id
1704   - check_value('MODULEHELPCONTEXT_Id', 0x001E, modulehelpcontext_id)
  1476 + # 2.3.4.2.3.2.6 MODULEHELPCONTEXT Record
  1477 + # Specifies the Help topic identifier for the containing MODULE Record
1705 1478 modulehelpcontext_size = struct.unpack("<L", dir_stream.read(4))[0]
1706   - check_value('MODULEHELPCONTEXT_Size', 0x0004, modulehelpcontext_size)
1707   - modulehelpcontext_helpcontext = struct.unpack("<L", dir_stream.read(4))[0]
1708   - unused = modulehelpcontext_helpcontext
  1479 + project.check_value('MODULEHELPCONTEXT_Size', 0x0004, modulehelpcontext_size)
  1480 + # HelpContext (4 bytes): An unsigned integer that specifies the Help topic identifier
  1481 + # in the Help file specified by PROJECTHELPFILEPATH Record
  1482 + helpcontext = struct.unpack("<L", dir_stream.read(4))[0]
1709 1483 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1710 1484 if section_id == 0x002C:
1711   - modulecookie_id = section_id
1712   - check_value('MODULECOOKIE_Id', 0x002C, modulecookie_id)
1713   - modulecookie_size = struct.unpack("<L", dir_stream.read(4))[0]
1714   - check_value('MODULECOOKIE_Size', 0x0002, modulecookie_size)
1715   - modulecookie_cookie = struct.unpack("<H", dir_stream.read(2))[0]
1716   - unused = modulecookie_cookie
  1485 + # 2.3.4.2.3.2.7 MODULECOOKIE Record
  1486 + # Specifies ignored data.
  1487 + size = struct.unpack("<L", dir_stream.read(4))[0]
  1488 + project.check_value('MODULECOOKIE_Size', 0x0002, size)
  1489 + cookie = struct.unpack("<H", dir_stream.read(2))[0]
1717 1490 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1718 1491 if section_id == 0x0021 or section_id == 0x0022:
1719   - moduletype_id = section_id
1720   - moduletype_reserved = struct.unpack("<L", dir_stream.read(4))[0]
1721   - unused = moduletype_id
1722   - unused = moduletype_reserved
  1492 + # 2.3.4.2.3.2.8 MODULETYPE Record
  1493 + # Specifies whether the containing MODULE Record (section 2.3.4.2.3.2) is a procedural module,
  1494 + # document module, class module, or designer module.
  1495 + # Id (2 bytes): An unsigned integer that specifies the identifier for this record.
  1496 + # MUST be 0x0021 when the containing MODULE Record (section 2.3.4.2.3.2) is a procedural module.
  1497 + # MUST be 0x0022 when the containing MODULE Record (section 2.3.4.2.3.2) is a document module,
  1498 + # class module, or designer module.
  1499 + self.type = section_id
  1500 + reserved = struct.unpack("<L", dir_stream.read(4))[0]
1723 1501 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1724 1502 if section_id == 0x0025:
1725   - modulereadonly_id = section_id
1726   - check_value('MODULEREADONLY_Id', 0x0025, modulereadonly_id)
1727   - modulereadonly_reserved = struct.unpack("<L", dir_stream.read(4))[0]
1728   - check_value('MODULEREADONLY_Reserved', 0x0000, modulereadonly_reserved)
  1503 + # 2.3.4.2.3.2.9 MODULEREADONLY Record
  1504 + # Specifies that the containing MODULE Record (section 2.3.4.2.3.2) is read-only.
  1505 + self.readonly = True
  1506 + reserved = struct.unpack("<L", dir_stream.read(4))[0]
  1507 + project.check_value('MODULEREADONLY_Reserved', 0x0000, reserved)
1729 1508 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1730 1509 if section_id == 0x0028:
1731   - moduleprivate_id = section_id
1732   - check_value('MODULEPRIVATE_Id', 0x0028, moduleprivate_id)
1733   - moduleprivate_reserved = struct.unpack("<L", dir_stream.read(4))[0]
1734   - check_value('MODULEPRIVATE_Reserved', 0x0000, moduleprivate_reserved)
  1510 + # 2.3.4.2.3.2.10 MODULEPRIVATE Record
  1511 + # Specifies that the containing MODULE Record (section 2.3.4.2.3.2) is only usable from within
  1512 + # the current VBA project.
  1513 + self.private = True
  1514 + reserved = struct.unpack("<L", dir_stream.read(4))[0]
  1515 + project.check_value('MODULEPRIVATE_Reserved', 0x0000, reserved)
1735 1516 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1736 1517 if section_id == 0x002B: # TERMINATOR
1737   - module_reserved = struct.unpack("<L", dir_stream.read(4))[0]
1738   - check_value('MODULE_Reserved', 0x0000, module_reserved)
  1518 + # Terminator (2 bytes): An unsigned integer that specifies the end of this record. MUST be 0x002B.
  1519 + # Reserved (4 bytes): MUST be 0x00000000. MUST be ignored.
  1520 + reserved = struct.unpack("<L", dir_stream.read(4))[0]
  1521 + project.check_value('MODULE_Reserved', 0x0000, reserved)
1739 1522 section_id = None
1740 1523 if section_id is not None:
1741 1524 log.warning('unknown or invalid module section id {0:04X}'.format(section_id))
1742   -
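The TODO in the new code asks whether the chain of optional sections should be a loop. One way to write it is a loop that dispatches on each section Id until the 0x002B terminator. The sketch below is hypothetical: it returns a plain dict instead of setting attributes on a `VBA_Module`, skips the Reserved values without checking them, and accepts sections in any order, which is more lenient than the fixed order in [MS-OVBA] 2.3.4.2.3.2.

```python
import io
import struct


def parse_module_record(stream):
    """Parse one MODULE record, from MODULENAME to the 0x002B terminator."""
    def u16():
        return struct.unpack("<H", stream.read(2))[0]

    def u32():
        return struct.unpack("<L", stream.read(4))[0]

    if u16() != 0x0019:
        raise ValueError("MODULE record must start with MODULENAME (0x0019)")
    module = {"name": stream.read(u32()), "readonly": False, "private": False}
    while True:
        section_id = u16()
        if section_id == 0x002B:  # terminator, followed by 4 reserved bytes
            u32()
            return module
        if section_id == 0x0047:  # MODULENAMEUNICODE
            module["name_unicode"] = stream.read(u32()).decode("utf-16-le", "replace")
        elif section_id == 0x001A:  # MODULESTREAMNAME: MBCS name, Reserved 0x0032, UTF-16 name
            module["streamname"] = stream.read(u32())
            u16()
            stream.read(u32())
        elif section_id == 0x001C:  # MODULEDOCSTRING: MBCS text, Reserved 0x0048, UTF-16 text
            module["docstring"] = stream.read(u32())
            u16()
            stream.read(u32())
        elif section_id == 0x0031:  # MODULEOFFSET: Size (4) then TextOffset
            u32()
            module["textoffset"] = u32()
        elif section_id == 0x001E:  # MODULEHELPCONTEXT: Size (4) then HelpContext
            u32()
            u32()
        elif section_id == 0x002C:  # MODULECOOKIE: Size (2) then Cookie
            u32()
            u16()
        elif section_id in (0x0021, 0x0022):  # MODULETYPE
            module["type"] = section_id
            u32()
        elif section_id == 0x0025:  # MODULEREADONLY
            module["readonly"] = True
            u32()
        elif section_id == 0x0028:  # MODULEPRIVATE
            module["private"] = True
            u32()
        else:
            raise ValueError("unknown module section id {0:04X}".format(section_id))
```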
1743   - log.debug('Project CodePage = %d' % projectcodepage_codepage)
1744   - if projectcodepage_codepage in MAC_CODEPAGES:
1745   - vba_codec = MAC_CODEPAGES[projectcodepage_codepage]
1746   - else:
1747   - vba_codec = 'cp%d' % projectcodepage_codepage
1748   - log.debug("ModuleName = {0}".format(modulename_modulename))
1749   - log.debug("ModuleNameUnicode = {0}".format(uni_out(modulename_unicode_modulename_unicode)))
1750   - log.debug("StreamName = {0}".format(modulestreamname_streamname))
1751   - try:
1752   - streamname_unicode = modulestreamname_streamname.decode(vba_codec)
1753   - except UnicodeError as ue:
1754   - log.debug('failed to decode stream name {0!r} with codec {1}'
1755   - .format(uni_out(streamname_unicode), vba_codec))
1756   - streamname_unicode = modulestreamname_streamname.decode(vba_codec, errors='replace')
1757   - log.debug("StreamName.decode('%s') = %s" % (vba_codec, uni_out(streamname_unicode)))
1758   - log.debug("StreamNameUnicode = {0}".format(uni_out(modulestreamname_streamname_unicode)))
1759   - log.debug("TextOffset = {0}".format(moduleoffset_textoffset))
1760   -
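The removed lines chose a codec from the project code page: a Mac code page table first, otherwise `'cp%d'`. The new code delegates decoding to `project.decode_bytes`. A hypothetical sketch of such a helper follows. The `MAC_CODEPAGES` entries are an assumed subset, and the Latin-1 fallback for code pages Python has no codec for is a design choice of this sketch, not olevba's behavior.

```python
# Hypothetical subset of a Mac code page table; olevba keeps its own
# MAC_CODEPAGES dict, whose contents may differ.
MAC_CODEPAGES = {
    10000: "mac-roman",
    10006: "mac-greek",
    10007: "mac-cyrillic",
}


def codepage_to_codec(codepage):
    """Return a Python codec name for a VBA project code page."""
    if codepage in MAC_CODEPAGES:
        return MAC_CODEPAGES[codepage]
    return "cp%d" % codepage


def decode_bytes(data, codepage):
    """Decode bytes with the project code page without raising on bad input."""
    codec = codepage_to_codec(codepage)
    try:
        return data.decode(codec)
    except UnicodeDecodeError:
        # keep going, replacing undecodable bytes with U+FFFD
        return data.decode(codec, "replace")
    except LookupError:
        # assumed fallback: Python has no codec for this code page
        return data.decode("latin-1")
```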
  1525 +
  1526 + log.debug("Module Name = {0}".format(self.name_str))
  1527 + # log.debug("Module Name Unicode = {0}".format(self._name_unicode))
  1528 + log.debug("Stream Name = {0}".format(self.streamname_str))
  1529 + # log.debug("Stream Name Unicode = {0}".format(self._streamname_unicode))
  1530 + log.debug("TextOffset = {0}".format(self.textoffset))
  1531 +
1761 1532 code_data = None
1762   - try_names = streamname_unicode, \
1763   - modulename_unicode_modulename_unicode, \
1764   - modulestreamname_streamname_unicode
  1533 + # let's try the different names we have, just in case some are missing:
  1534 + try_names = (self.streamname, self._streamname_unicode, self.name, self._name_unicode)
1765 1535 for stream_name in try_names:
1766 1536 # TODO: if olefile._find were less private, could replace this
1767 1537 # try-except with calls to it
1768   - try:
1769   - code_path = vba_root + u'VBA/' + stream_name
1770   - log.debug('opening VBA code stream %s' % uni_out(code_path))
1771   - code_data = ole.openstream(code_path).read()
1772   - break
1773   - except IOError as ioe:
1774   - log.debug('failed to open stream VBA/%r (%r), try other name'
1775   - % (uni_out(stream_name), ioe))
1776   -
  1538 + if stream_name is not None:
  1539 + try:
  1540 + self.code_path = project.vba_root + u'VBA/' + stream_name
  1541 + log.debug('opening VBA code stream %s' % self.code_path)
  1542 + code_data = project.ole.openstream(self.code_path).read()
  1543 + break
  1544 + except IOError as ioe:
  1545 + log.debug('failed to open stream VBA/%r (%r), try other name'
  1546 + % (stream_name, ioe))
  1547 +
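The loop above tries each known name for the module stream in turn (decoded stream name, its unicode copy, then the module names), skipping names that were never set. A minimal sketch of the same strategy with an injected opener. `open_first_stream` and the `open_stream` callable are hypothetical stand-ins for `project.ole.openstream(...).read()`.

```python
def open_first_stream(open_stream, vba_root, candidates):
    """Return (path, data) for the first candidate stream that opens, else (None, None)."""
    for name in candidates:
        if name is None:  # the record that would have provided this name was absent
            continue
        path = vba_root + u"VBA/" + name
        try:
            return path, open_stream(path)
        except IOError:
            continue  # try the next candidate name
    return None, None
```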
1777 1548 if code_data is None:
1778 1549 log.info("Could not open stream %d of %d ('VBA/' + one of %r)!"
1779   - % (projectmodule_index, projectmodules_count,
1780   - '/'.join("'" + uni_out(stream_name) + "'"
1781   - for stream_name in try_names)))
1782   - if relaxed:
1783   - continue # ... with next submodule
  1550 + % (module_index, project.modules_count,
  1551 + '/'.join("'" + stream_name + "'"
  1552 + for stream_name in try_names if stream_name is not None)))
  1553 + if project.relaxed:
  1554 + return # ... continue with next submodule
1784 1555 else:
1785   - raise SubstreamOpenError('[BASE]', 'VBA/' +
1786   - uni_out(modulename_unicode_modulename_unicode))
1787   -
  1556 + raise SubstreamOpenError('[BASE]', 'VBA/' + self.name)
  1557 +
1788 1558 log.debug("length of code_data = {0}".format(len(code_data)))
1789   - log.debug("offset of code_data = {0}".format(moduleoffset_textoffset))
1790   - code_data = code_data[moduleoffset_textoffset:]
  1559 + log.debug("offset of code_data = {0}".format(self.textoffset))
  1560 + code_data = code_data[self.textoffset:]
1791 1561 if len(code_data) > 0:
1792   - code_data = decompress_stream(code_data)
  1562 + code_data = decompress_stream(bytearray(code_data))
  1563 + # store the raw code encoded as bytes with the project's code page:
  1564 + self.code_raw = code_data
  1565 + # decode it to unicode:
  1566 + self.code = project.decode_bytes(code_data)
  1567 + # also store a native str version:
  1568 + self.code_str = unicode2str(self.code)
1793 1569 # case-insensitive search in the code_modules dict to find the file extension:
1794   - filext = code_modules.get(modulename_modulename.lower(), 'bin')
1795   - filename = '{0}.{1}'.format(modulename_modulename, filext)
1796   - #TODO: also yield the codepage so that callers can decode it properly
1797   - yield (code_path, filename, code_data)
1798   - # print '-'*79
1799   - # print filename
1800   - # print ''
1801   - # print code_data
1802   - # print ''
1803   - log.debug('extracted file {0}'.format(filename))
  1570 + filext = self.project.module_ext.get(self.name.lower(), 'vba')
  1571 + self.filename = u'{0}.{1}'.format(self.name, filext)
  1572 + self.filename_str = unicode2str(self.filename)
  1573 + log.debug('extracted file {0}'.format(self.filename_str))
1804 1574 else:
1805   - log.warning("module stream {0} has code data length 0".format(modulestreamname_streamname))
  1575 + log.warning("module stream {0} has code data length 0".format(self.streamname_str))
1806 1576 except (UnexpectedDataError, SubstreamOpenError):
1807 1577 raise
1808 1578 except Exception as exc:
1809   - log.info('Error parsing module {0} of {1} in _extract_vba:'
1810   - .format(projectmodule_index, projectmodules_count),
  1579 + log.info('Error parsing module {0} of {1}:'
  1580 + .format(module_index, project.modules_count),
1811 1581 exc_info=True)
1812   - if not relaxed:
  1582 + if not project.relaxed:
1813 1583 raise
1814   - _ = unused # make pylint happy: now variable "unused" is being used ;-)
1815   - return
  1584 +
  1585 +
  1586 +class VBA_Project(object):
  1587 + """
  1588 + Class to parse a VBA project from an OLE file, and to store all the corresponding
  1589 + metadata and VBA modules.
  1590 + """
  1591 +
  1592 + def __init__(self, ole, vba_root, project_path, dir_path, relaxed=False):
  1593 + """
  1594 + Parse the dir stream of a VBA project stored in an OleFileIO object.
  1595 +
  1596 + :param vba_root: path to the VBA root storage, containing the VBA storage and the PROJECT stream
  1597 + :param project_path: path to the PROJECT stream
  1598 + :param relaxed: If True, only create info/debug log entry if data is not as expected
  1599 + (e.g. opening substream fails); if False, raise an error in this case
  1600 + """
  1601 + self.ole = ole
  1602 + self.vba_root = vba_root
  1603 + self.project_path = project_path
  1604 + self.dir_path = dir_path
  1605 + self.relaxed = relaxed
  1606 + #: VBA modules contained in the project (list of VBA_Module objects)
  1607 + self.modules = []
  1608 + #: file extension for each VBA module
  1609 + self.module_ext = {}
  1610 + log.debug('Parsing the dir stream from %r' % dir_path)
  1611 + # read data from dir stream (compressed)
  1612 + dir_compressed = ole.openstream(dir_path).read()
  1613 + # decompress it:
  1614 + dir_stream = BytesIO(decompress_stream(bytearray(dir_compressed)))
  1615 + # store reference for later use:
  1616 + self.dir_stream = dir_stream
  1617 +
  1618 + # reference: MS-VBAL 2.3.4.2 dir Stream: Version Independent Project Information
  1619 +
  1620 + # PROJECTSYSKIND Record
  1621 + # Specifies the platform for which the VBA project is created.
  1622 + projectsyskind_id = struct.unpack("<H", dir_stream.read(2))[0]
  1623 + self.check_value('PROJECTSYSKIND_Id', 0x0001, projectsyskind_id)
  1624 + projectsyskind_size = struct.unpack("<L", dir_stream.read(4))[0]
  1625 + self.check_value('PROJECTSYSKIND_Size', 0x0004, projectsyskind_size)
  1626 + self.syskind = struct.unpack("<L", dir_stream.read(4))[0]
  1627 + SYSKIND_NAME = {
  1628 + 0x00: "16-bit Windows",
  1629 + 0x01: "32-bit Windows",
  1630 + 0x02: "Macintosh",
  1631 + 0x03: "64-bit Windows"
  1632 + }
  1633 + self.syskind_name = SYSKIND_NAME.get(self.syskind, 'Unknown')
  1634 + log.debug("PROJECTSYSKIND_SysKind: %d - %s" % (self.syskind, self.syskind_name))
  1635 + if self.syskind not in SYSKIND_NAME:
  1636 + log.error("invalid PROJECTSYSKIND_SysKind {0:04X}".format(self.syskind))
  1637 +
  1638 + # PROJECTLCID Record
  1639 + # Specifies the VBA project's LCID.
  1640 + projectlcid_id = struct.unpack("<H", dir_stream.read(2))[0]
  1641 + self.check_value('PROJECTLCID_Id', 0x0002, projectlcid_id)
  1642 + projectlcid_size = struct.unpack("<L", dir_stream.read(4))[0]
  1643 + self.check_value('PROJECTLCID_Size', 0x0004, projectlcid_size)
  1644 + # Lcid (4 bytes): An unsigned integer that specifies the LCID value for the VBA project. MUST be 0x00000409.
  1645 + self.lcid = struct.unpack("<L", dir_stream.read(4))[0]
  1646 + self.check_value('PROJECTLCID_Lcid', 0x409, self.lcid)
  1647 +
  1648 + # PROJECTLCIDINVOKE Record
  1649 + # Specifies an LCID value used for Invoke calls on an Automation server as specified in [MS-OAUT] section 3.1.4.4.
  1650 + projectlcidinvoke_id = struct.unpack("<H", dir_stream.read(2))[0]
  1651 + self.check_value('PROJECTLCIDINVOKE_Id', 0x0014, projectlcidinvoke_id)
  1652 + projectlcidinvoke_size = struct.unpack("<L", dir_stream.read(4))[0]
  1653 + self.check_value('PROJECTLCIDINVOKE_Size', 0x0004, projectlcidinvoke_size)
  1654 + # LcidInvoke (4 bytes): An unsigned integer that specifies the LCID value used for Invoke calls. MUST be 0x00000409.
  1655 + self.lcidinvoke = struct.unpack("<L", dir_stream.read(4))[0]
  1656 + self.check_value('PROJECTLCIDINVOKE_LcidInvoke', 0x409, self.lcidinvoke)
  1657 +
  1658 + # PROJECTCODEPAGE Record
  1659 + # Specifies the VBA project's code page.
  1660 + projectcodepage_id = struct.unpack("<H", dir_stream.read(2))[0]
  1661 + self.check_value('PROJECTCODEPAGE_Id', 0x0003, projectcodepage_id)
  1662 + projectcodepage_size = struct.unpack("<L", dir_stream.read(4))[0]
  1663 + self.check_value('PROJECTCODEPAGE_Size', 0x0002, projectcodepage_size)
  1664 + self.codepage = struct.unpack("<H", dir_stream.read(2))[0]
  1665 + self.codepage_name = codepages.get_codepage_name(self.codepage)
  1666 + log.debug('Project Code Page: %r - %s' % (self.codepage, self.codepage_name))
  1667 + self.codec = codepages.codepage2codec(self.codepage)
  1668 + log.debug('Python codec corresponding to code page %d: %s' % (self.codepage, self.codec))
  1669 +
  1670 +
  1671 + # PROJECTNAME Record
  1672 + # Specifies a unique VBA identifier as the name of the VBA project.
  1673 + projectname_id = struct.unpack("<H", dir_stream.read(2))[0]
  1674 + self.check_value('PROJECTNAME_Id', 0x0004, projectname_id)
  1675 + sizeof_projectname = struct.unpack("<L", dir_stream.read(4))[0]
  1676 + log.debug('Project name size: %d bytes' % sizeof_projectname)
  1677 + if sizeof_projectname < 1 or sizeof_projectname > 128:
  1678 + # TODO: raise an actual error? What is MS Office's behaviour?
  1679 + log.error("PROJECTNAME_SizeOfProjectName value not in range [1-128]: {0}".format(sizeof_projectname))
  1680 + projectname_bytes = dir_stream.read(sizeof_projectname)
  1681 + self.projectname = self.decode_bytes(projectname_bytes)
  1682 +
  1683 +
  1684 + # PROJECTDOCSTRING Record
  1685 + # Specifies the description for the VBA project.
  1686 + projectdocstring_id = struct.unpack("<H", dir_stream.read(2))[0]
  1687 + self.check_value('PROJECTDOCSTRING_Id', 0x0005, projectdocstring_id)
  1688 + projectdocstring_sizeof_docstring = struct.unpack("<L", dir_stream.read(4))[0]
  1689 + if projectdocstring_sizeof_docstring > 2000:
  1690 + log.error(
  1691 + "PROJECTDOCSTRING_SizeOfDocString value not in range: {0}".format(projectdocstring_sizeof_docstring))
  1692 + # DocString (variable): An array of SizeOfDocString bytes that specifies the description for the VBA project.
  1693 + # MUST contain MBCS characters encoded using the code page specified in PROJECTCODEPAGE (section 2.3.4.2.1.4).
  1694 + # MUST NOT contain null characters.
  1695 + docstring_bytes = dir_stream.read(projectdocstring_sizeof_docstring)
  1696 + self.docstring = self.decode_bytes(docstring_bytes)
  1697 + projectdocstring_reserved = struct.unpack("<H", dir_stream.read(2))[0]
  1698 + self.check_value('PROJECTDOCSTRING_Reserved', 0x0040, projectdocstring_reserved)
  1699 + projectdocstring_sizeof_docstring_unicode = struct.unpack("<L", dir_stream.read(4))[0]
  1700 + if projectdocstring_sizeof_docstring_unicode % 2 != 0:
  1701 + log.error("PROJECTDOCSTRING_SizeOfDocStringUnicode is not even")
  1702 + # DocStringUnicode (variable): An array of SizeOfDocStringUnicode bytes that specifies the description for the
  1703 + # VBA project. MUST contain UTF-16 characters. MUST NOT contain null characters.
  1704 + # MUST contain the UTF-16 encoding of DocString.
  1705 + docstring_unicode_bytes = dir_stream.read(projectdocstring_sizeof_docstring_unicode)
  1706 + self.docstring_unicode = docstring_unicode_bytes.decode('utf16', errors='replace')
  1707 +
  1708 + # PROJECTHELPFILEPATH Record - MS-OVBA 2.3.4.2.1.7
  1709 + projecthelpfilepath_id = struct.unpack("<H", dir_stream.read(2))[0]
  1710 + self.check_value('PROJECTHELPFILEPATH_Id', 0x0006, projecthelpfilepath_id)
  1711 + projecthelpfilepath_sizeof_helpfile1 = struct.unpack("<L", dir_stream.read(4))[0]
  1712 + if projecthelpfilepath_sizeof_helpfile1 > 260:
  1713 + log.error(
  1714 + "PROJECTHELPFILEPATH_SizeOfHelpFile1 value not in range: {0}".format(projecthelpfilepath_sizeof_helpfile1))
  1715 + projecthelpfilepath_helpfile1 = dir_stream.read(projecthelpfilepath_sizeof_helpfile1)
  1716 + projecthelpfilepath_reserved = struct.unpack("<H", dir_stream.read(2))[0]
  1717 + self.check_value('PROJECTHELPFILEPATH_Reserved', 0x003D, projecthelpfilepath_reserved)
  1718 + projecthelpfilepath_sizeof_helpfile2 = struct.unpack("<L", dir_stream.read(4))[0]
  1719 + if projecthelpfilepath_sizeof_helpfile2 != projecthelpfilepath_sizeof_helpfile1:
  1720 + log.error("PROJECTHELPFILEPATH_SizeOfHelpFile1 does not equal PROJECTHELPFILEPATH_SizeOfHelpFile2")
  1721 + projecthelpfilepath_helpfile2 = dir_stream.read(projecthelpfilepath_sizeof_helpfile2)
  1722 + if projecthelpfilepath_helpfile2 != projecthelpfilepath_helpfile1:
  1723 + log.error("PROJECTHELPFILEPATH_HelpFile1 does not equal PROJECTHELPFILEPATH_HelpFile2")
  1724 +
  1725 + # PROJECTHELPCONTEXT Record
  1726 + projecthelpcontext_id = struct.unpack("<H", dir_stream.read(2))[0]
  1727 + self.check_value('PROJECTHELPCONTEXT_Id', 0x0007, projecthelpcontext_id)
  1728 + projecthelpcontext_size = struct.unpack("<L", dir_stream.read(4))[0]
  1729 + self.check_value('PROJECTHELPCONTEXT_Size', 0x0004, projecthelpcontext_size)
  1730 + projecthelpcontext_helpcontext = struct.unpack("<L", dir_stream.read(4))[0]
  1731 + unused = projecthelpcontext_helpcontext
  1732 +
  1733 + # PROJECTLIBFLAGS Record
  1734 + projectlibflags_id = struct.unpack("<H", dir_stream.read(2))[0]
  1735 + self.check_value('PROJECTLIBFLAGS_Id', 0x0008, projectlibflags_id)
  1736 + projectlibflags_size = struct.unpack("<L", dir_stream.read(4))[0]
  1737 + self.check_value('PROJECTLIBFLAGS_Size', 0x0004, projectlibflags_size)
  1738 + projectlibflags_projectlibflags = struct.unpack("<L", dir_stream.read(4))[0]
  1739 + self.check_value('PROJECTLIBFLAGS_ProjectLibFlags', 0x0000, projectlibflags_projectlibflags)
  1740 +
  1741 + # PROJECTVERSION Record
  1742 + projectversion_id = struct.unpack("<H", dir_stream.read(2))[0]
  1743 + self.check_value('PROJECTVERSION_Id', 0x0009, projectversion_id)
  1744 + projectversion_reserved = struct.unpack("<L", dir_stream.read(4))[0]
  1745 + self.check_value('PROJECTVERSION_Reserved', 0x0004, projectversion_reserved)
  1746 + projectversion_versionmajor = struct.unpack("<L", dir_stream.read(4))[0]
  1747 + projectversion_versionminor = struct.unpack("<H", dir_stream.read(2))[0]
  1748 + unused = projectversion_versionmajor
  1749 + unused = projectversion_versionminor
  1750 +
  1751 + # PROJECTCONSTANTS Record
  1752 + projectconstants_id = struct.unpack("<H", dir_stream.read(2))[0]
  1753 + self.check_value('PROJECTCONSTANTS_Id', 0x000C, projectconstants_id)
  1754 + projectconstants_sizeof_constants = struct.unpack("<L", dir_stream.read(4))[0]
  1755 + if projectconstants_sizeof_constants > 1015:
  1756 + log.error(
  1757 + "PROJECTCONSTANTS_SizeOfConstants value not in range: {0}".format(projectconstants_sizeof_constants))
  1758 + projectconstants_constants = dir_stream.read(projectconstants_sizeof_constants)
  1759 + projectconstants_reserved = struct.unpack("<H", dir_stream.read(2))[0]
  1760 + self.check_value('PROJECTCONSTANTS_Reserved', 0x003C, projectconstants_reserved)
  1761 + projectconstants_sizeof_constants_unicode = struct.unpack("<L", dir_stream.read(4))[0]
  1762 + if projectconstants_sizeof_constants_unicode % 2 != 0:
  1763 + log.error("PROJECTCONSTANTS_SizeOfConstantsUnicode is not even")
  1764 + projectconstants_constants_unicode = dir_stream.read(projectconstants_sizeof_constants_unicode)
  1765 + unused = projectconstants_constants
  1766 + unused = projectconstants_constants_unicode
  1767 +
  1768 + # array of REFERENCE records
  1769 + # Specifies a reference to an Automation type library or VBA project.
  1770 + check = None
  1771 + while True:
  1772 + check = struct.unpack("<H", dir_stream.read(2))[0]
  1773 + log.debug("reference type = {0:04X}".format(check))
  1774 + if check == 0x000F:
  1775 + break
  1776 +
  1777 + if check == 0x0016:
  1778 + # REFERENCENAME
  1779 + # Specifies the name of a referenced VBA project or Automation type library.
  1780 + reference_id = check
  1781 + reference_sizeof_name = struct.unpack("<L", dir_stream.read(4))[0]
  1782 + reference_name = dir_stream.read(reference_sizeof_name)
  1783 + log.debug('REFERENCE name: %s' % unicode2str(self.decode_bytes(reference_name)))
  1784 + reference_reserved = struct.unpack("<H", dir_stream.read(2))[0]
  1785 + # According to [MS-OVBA] 2.3.4.2.2.2 REFERENCENAME Record:
  1786 + # "Reserved (2 bytes): MUST be 0x003E. MUST be ignored."
  1787 + # So let's ignore it, otherwise it crashes on some files (issue #132)
  1788 + # PR #135 by @c1fe:
  1789 + # contrary to the specification I think that the unicode name
  1790 + # is optional. if reference_reserved is not 0x003E I think it
  1791 + # is actually the start of another REFERENCE record
  1792 + # at least when projectsyskind_syskind == 0x02 (Macintosh)
  1793 + if reference_reserved == 0x003E:
  1794 + #if reference_reserved not in (0x003E, 0x000D):
  1795 + # raise UnexpectedDataError(dir_path, 'REFERENCE_Reserved',
  1796 + # 0x0003E, reference_reserved)
  1797 + reference_sizeof_name_unicode = struct.unpack("<L", dir_stream.read(4))[0]
  1798 + reference_name_unicode = dir_stream.read(reference_sizeof_name_unicode)
  1799 + unused = reference_id
  1800 + unused = reference_name
  1801 + unused = reference_name_unicode
  1802 + continue
  1803 + else:
  1804 + check = reference_reserved
  1805 + log.debug("reference type = {0:04X}".format(check))
  1806 +
  1807 + if check == 0x0033:
  1808 + # REFERENCEORIGINAL (followed by REFERENCECONTROL)
  1809 + # Specifies the identifier of the Automation type library the containing REFERENCECONTROL's
  1810 + # (section 2.3.4.2.2.3) twiddled type library was generated from.
  1811 + referenceoriginal_id = check
  1812 + referenceoriginal_sizeof_libidoriginal = struct.unpack("<L", dir_stream.read(4))[0]
  1813 + referenceoriginal_libidoriginal = dir_stream.read(referenceoriginal_sizeof_libidoriginal)
  1814 + log.debug('REFERENCE original lib id: %s' % unicode2str(self.decode_bytes(referenceoriginal_libidoriginal)))
  1815 + unused = referenceoriginal_id
  1816 + unused = referenceoriginal_libidoriginal
  1817 + continue
  1818 +
  1819 + if check == 0x002F:
  1820 + # REFERENCECONTROL
  1821 + # Specifies a reference to a twiddled type library and its extended type library.
  1822 + referencecontrol_id = check
  1823 + referencecontrol_sizetwiddled = struct.unpack("<L", dir_stream.read(4))[0] # ignore
  1824 + referencecontrol_sizeof_libidtwiddled = struct.unpack("<L", dir_stream.read(4))[0]
  1825 + referencecontrol_libidtwiddled = dir_stream.read(referencecontrol_sizeof_libidtwiddled)
  1826 + log.debug('REFERENCE control twiddled lib id: %s' % unicode2str(self.decode_bytes(referencecontrol_libidtwiddled)))
  1827 + referencecontrol_reserved1 = struct.unpack("<L", dir_stream.read(4))[0] # ignore
  1828 + self.check_value('REFERENCECONTROL_Reserved1', 0x0000, referencecontrol_reserved1)
  1829 + referencecontrol_reserved2 = struct.unpack("<H", dir_stream.read(2))[0] # ignore
  1830 + self.check_value('REFERENCECONTROL_Reserved2', 0x0000, referencecontrol_reserved2)
  1831 + unused = referencecontrol_id
  1832 + unused = referencecontrol_sizetwiddled
  1833 + unused = referencecontrol_libidtwiddled
  1834 + # optional field
  1835 + check2 = struct.unpack("<H", dir_stream.read(2))[0]
  1836 + if check2 == 0x0016:
  1837 + referencecontrol_namerecordextended_id = check
  1838 + referencecontrol_namerecordextended_sizeof_name = struct.unpack("<L", dir_stream.read(4))[0]
  1839 + referencecontrol_namerecordextended_name = dir_stream.read(
  1840 + referencecontrol_namerecordextended_sizeof_name)
  1841 + log.debug('REFERENCE control name record extended: %s' % unicode2str(
  1842 + self.decode_bytes(referencecontrol_namerecordextended_name)))
  1843 + referencecontrol_namerecordextended_reserved = struct.unpack("<H", dir_stream.read(2))[0]
  1844 + if referencecontrol_namerecordextended_reserved == 0x003E:
  1845 + referencecontrol_namerecordextended_sizeof_name_unicode = struct.unpack("<L", dir_stream.read(4))[0]
  1846 + referencecontrol_namerecordextended_name_unicode = dir_stream.read(
  1847 + referencecontrol_namerecordextended_sizeof_name_unicode)
  1848 + referencecontrol_reserved3 = struct.unpack("<H", dir_stream.read(2))[0]
  1849 + unused = referencecontrol_namerecordextended_id
  1850 + unused = referencecontrol_namerecordextended_name
  1851 + unused = referencecontrol_namerecordextended_name_unicode
  1852 + else:
  1853 + referencecontrol_reserved3 = referencecontrol_namerecordextended_reserved
  1854 + else:
  1855 + referencecontrol_reserved3 = check2
  1856 +
  1857 + self.check_value('REFERENCECONTROL_Reserved3', 0x0030, referencecontrol_reserved3)
  1858 + referencecontrol_sizeextended = struct.unpack("<L", dir_stream.read(4))[0]
  1859 + referencecontrol_sizeof_libidextended = struct.unpack("<L", dir_stream.read(4))[0]
  1860 + referencecontrol_libidextended = dir_stream.read(referencecontrol_sizeof_libidextended)
  1861 + referencecontrol_reserved4 = struct.unpack("<L", dir_stream.read(4))[0]
  1862 + referencecontrol_reserved5 = struct.unpack("<H", dir_stream.read(2))[0]
  1863 + referencecontrol_originaltypelib = dir_stream.read(16)
  1864 + referencecontrol_cookie = struct.unpack("<L", dir_stream.read(4))[0]
  1865 + unused = referencecontrol_sizeextended
  1866 + unused = referencecontrol_libidextended
  1867 + unused = referencecontrol_reserved4
  1868 + unused = referencecontrol_reserved5
  1869 + unused = referencecontrol_originaltypelib
  1870 + unused = referencecontrol_cookie
  1871 + continue
  1872 +
  1873 + if check == 0x000D:
  1874 + # REFERENCEREGISTERED
  1875 + # Specifies a reference to an Automation type library.
  1876 + referenceregistered_id = check
  1877 + referenceregistered_size = struct.unpack("<L", dir_stream.read(4))[0]
  1878 + referenceregistered_sizeof_libid = struct.unpack("<L", dir_stream.read(4))[0]
  1879 + referenceregistered_libid = dir_stream.read(referenceregistered_sizeof_libid)
  1880 + log.debug('REFERENCE registered lib id: %s' % unicode2str(self.decode_bytes(referenceregistered_libid)))
  1881 + referenceregistered_reserved1 = struct.unpack("<L", dir_stream.read(4))[0]
  1882 + self.check_value('REFERENCEREGISTERED_Reserved1', 0x0000, referenceregistered_reserved1)
  1883 + referenceregistered_reserved2 = struct.unpack("<H", dir_stream.read(2))[0]
  1884 + self.check_value('REFERENCEREGISTERED_Reserved2', 0x0000, referenceregistered_reserved2)
  1885 + unused = referenceregistered_id
  1886 + unused = referenceregistered_size
  1887 + unused = referenceregistered_libid
  1888 + continue
  1889 +
  1890 + if check == 0x000E:
  1891 + # REFERENCEPROJECT
  1892 + # Specifies a reference to an external VBA project.
  1893 + referenceproject_id = check
  1894 + referenceproject_size = struct.unpack("<L", dir_stream.read(4))[0]
  1895 + referenceproject_sizeof_libidabsolute = struct.unpack("<L", dir_stream.read(4))[0]
  1896 + referenceproject_libidabsolute = dir_stream.read(referenceproject_sizeof_libidabsolute)
  1897 + log.debug('REFERENCE project lib id absolute: %s' % unicode2str(self.decode_bytes(referenceproject_libidabsolute)))
  1898 + referenceproject_sizeof_libidrelative = struct.unpack("<L", dir_stream.read(4))[0]
  1899 + referenceproject_libidrelative = dir_stream.read(referenceproject_sizeof_libidrelative)
  1900 + log.debug('REFERENCE project lib id relative: %s' % unicode2str(self.decode_bytes(referenceproject_libidrelative)))
  1901 + referenceproject_majorversion = struct.unpack("<L", dir_stream.read(4))[0]
  1902 + referenceproject_minorversion = struct.unpack("<H", dir_stream.read(2))[0]
  1903 + unused = referenceproject_id
  1904 + unused = referenceproject_size
  1905 + unused = referenceproject_libidabsolute
  1906 + unused = referenceproject_libidrelative
  1907 + unused = referenceproject_majorversion
  1908 + unused = referenceproject_minorversion
  1909 + continue
  1910 +
  1911 + log.error('invalid or unknown check Id {0:04X}'.format(check))
  1912 + # raise an exception instead of stopping abruptly (issue #180)
  1913 + raise UnexpectedDataError(dir_path, 'reference type', (0x0F, 0x16, 0x33, 0x2F, 0x0D, 0x0E), check)
  1914 + #sys.exit(0)
  1915 +
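The REFERENCE array parsed above is a sequence of Id-tagged records that ends when the PROJECTMODULES Id (0x000F) is read. The following is a minimal Python 3 sketch of that loop, assuming only REFERENCENAME (0x0016) and REFERENCEREGISTERED (0x000D) records occur. The helper names and the sample bytes are hypothetical. The `pending` variable mimics the PR #135 workaround: when NameUnicode is missing, the value read as Reserved is really the next record Id. Strings are decoded as latin_1 for simplicity, whereas the real parser uses the project code page.

```python
import struct
from io import BytesIO


def read_u16(stream):
    # little-endian unsigned 16-bit integer
    return struct.unpack("<H", stream.read(2))[0]


def read_u32(stream):
    # little-endian unsigned 32-bit integer
    return struct.unpack("<L", stream.read(4))[0]


def read_references(stream):
    """Return (kind, text) tuples, stopping at the 0x000F terminator."""
    refs = []
    pending = None  # record Id already consumed while probing an optional field
    while True:
        record_id = read_u16(stream) if pending is None else pending
        pending = None
        if record_id == 0x000F:  # PROJECTMODULES Id: end of the REFERENCE array
            return refs
        if record_id == 0x0016:  # REFERENCENAME
            name = stream.read(read_u32(stream))
            refs.append(("name", name.decode("latin_1")))
            reserved = read_u16(stream)
            if reserved == 0x003E:
                stream.read(read_u32(stream))  # skip the optional NameUnicode
            else:
                pending = reserved  # no NameUnicode: this is the next record Id
            continue
        if record_id == 0x000D:  # REFERENCEREGISTERED
            read_u32(stream)  # Size field, not needed here
            libid = stream.read(read_u32(stream))
            refs.append(("registered", libid.decode("latin_1")))
            stream.read(6)  # Reserved1 (4 bytes) + Reserved2 (2 bytes)
            continue
        raise ValueError("unsupported reference record Id {0:04X}".format(record_id))


# Made-up sample: REFERENCENAME with NameUnicode, REFERENCEREGISTERED, terminator
libid = b"*\\G{00020430-0000-0000-C000-000000000046}#2.0#0"
sample = (
    struct.pack("<HL", 0x0016, 6) + b"stdole"
    + struct.pack("<HL", 0x003E, 12) + "stdole".encode("utf-16-le")
    + struct.pack("<HLL", 0x000D, 4 + len(libid) + 6, len(libid)) + libid
    + struct.pack("<LH", 0, 0)
    + struct.pack("<H", 0x000F)
)
refs = read_references(BytesIO(sample))
print(refs)
```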
  1916 + def check_value(self, name, expected, value):
  1917 + if expected != value:
  1918 + if self.relaxed:
  1919 + log.error("invalid value for {0} expected {1:04X} got {2:04X}"
  1920 + .format(name, expected, value))
  1921 + else:
  1922 + raise UnexpectedDataError(self.dir_path, name, expected, value)
  1923 +
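Most records at the start of the dir stream share one layout: a 2-byte Id, a 4-byte Size and a fixed-width value, with Id and Size validated through `check_value`. The sketch below factors that pattern into a helper. `read_fixed_record` and the local `UnexpectedDataError` are hypothetical stand-ins, not olevba's API. Like the parser above, it reads the value with its fixed width rather than trusting Size.

```python
import struct
from io import BytesIO


class UnexpectedDataError(Exception):
    """Hypothetical stand-in for olevba's exception of the same name."""


def check_value(name, expected, value, relaxed=False):
    # Same policy as VBA_Project.check_value: log in relaxed mode, raise otherwise.
    if expected != value:
        if relaxed:
            print("invalid value for {0} expected {1:04X} got {2:04X}".format(name, expected, value))
        else:
            raise UnexpectedDataError(name, expected, value)


def read_fixed_record(stream, name, expected_id, expected_size, fmt, relaxed=False):
    """Read an Id (uint16) / Size (uint32) / value record, e.g. PROJECTSYSKIND."""
    record_id = struct.unpack("<H", stream.read(2))[0]
    check_value(name + "_Id", expected_id, record_id, relaxed)
    size = struct.unpack("<L", stream.read(4))[0]
    check_value(name + "_Size", expected_size, size, relaxed)
    # read the value with its fixed width, as the parser above does
    return struct.unpack(fmt, stream.read(struct.calcsize(fmt)))[0]


# PROJECTSYSKIND (0x0001, SysKind 1 = 32-bit Windows), then PROJECTCODEPAGE (0x0003, 1252)
data = struct.pack("<HLL", 0x0001, 4, 0x01) + struct.pack("<HLH", 0x0003, 2, 1252)
stream = BytesIO(data)
syskind = read_fixed_record(stream, "PROJECTSYSKIND", 0x0001, 4, "<L")
codepage = read_fixed_record(stream, "PROJECTCODEPAGE", 0x0003, 2, "<H")

# A wrong Id is only logged in relaxed mode, but raises in strict mode
relaxed_value = read_fixed_record(BytesIO(struct.pack("<HLL", 0x0002, 4, 7)),
                                  "PROJECTSYSKIND", 0x0001, 4, "<L", relaxed=True)
try:
    read_fixed_record(BytesIO(struct.pack("<HLL", 0x0002, 4, 7)),
                      "PROJECTSYSKIND", 0x0001, 4, "<L")
    strict_ok = True
except UnexpectedDataError:
    strict_ok = False
```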
  1924 + def parse_project_stream(self):
  1925 + """
  1926 + Parse the PROJECT stream from the VBA project
  1927 + :return: None; fills self.module_ext with the file extension of each module
  1928 + """
  1929 + # Open the PROJECT stream:
  1930 + # reference: [MS-OVBA] 2.3.1 PROJECT Stream
  1931 + project_stream = self.ole.openstream(self.project_path)
  1932 +
  1933 + # sample content of the PROJECT stream:
  1934 +
  1935 + ## ID="{5312AC8A-349D-4950-BDD0-49BE3C4DD0F0}"
  1936 + ## Document=ThisDocument/&H00000000
  1937 + ## Module=NewMacros
  1938 + ## Name="Project"
  1939 + ## HelpContextID="0"
  1940 + ## VersionCompatible32="393222000"
  1941 + ## CMG="F1F301E705E705E705E705"
  1942 + ## DPB="8F8D7FE3831F2020202020"
  1943 + ## GC="2D2FDD81E51EE61EE6E1"
  1944 + ##
  1945 + ## [Host Extender Info]
  1946 + ## &H00000001={3832D640-CF90-11CF-8E43-00A0C911005A};VBE;&H00000000
  1947 + ## &H00000002={000209F2-0000-0000-C000-000000000046};Word8.0;&H00000000
  1948 + ##
  1949 + ## [Workspace]
  1950 + ## ThisDocument=22, 29, 339, 477, Z
  1951 + ## NewMacros=-4, 42, 832, 510, C
  1952 +
  1953 + self.module_ext = {}
  1954 +
  1955 + for line in project_stream:
  1956 + line = self.decode_bytes(line)
  1957 + log.debug('PROJECT: %r' % line)
  1958 + line = line.strip()
  1959 + if '=' in line:
  1960 + # split line at the 1st equal sign:
  1961 + name, value = line.split('=', 1)
  1962 + # looking for code modules
  1963 + # add the code module as a key in the dictionary
  1964 + # the value will be the extension needed later
  1965 + # The value is converted to lowercase, to allow case-insensitive matching (issue #3)
  1966 + value = value.lower()
  1967 + if name == 'Document':
  1968 + # split value at the 1st slash, keep 1st part:
  1969 + value = value.split('/', 1)[0]
  1970 + self.module_ext[value] = CLASS_EXTENSION
  1971 + elif name == 'Module':
  1972 + self.module_ext[value] = MODULE_EXTENSION
  1973 + elif name == 'Class':
  1974 + self.module_ext[value] = CLASS_EXTENSION
  1975 + elif name == 'BaseClass':
  1976 + self.module_ext[value] = FORM_EXTENSION
  1977 +
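`parse_project_stream` keeps only the `Document`, `Module`, `Class` and `BaseClass` entries, keyed by lowercased module name, and `VBA_Module` later falls back to `vba` for names it cannot find. The Python 3 sketch below applies the same mapping to the sample PROJECT stream quoted in the comment above. The extension values (`cls`, `bas`, `frm`) are assumptions, since the constants are defined elsewhere in olevba, and the function names are hypothetical.

```python
# Assumed values for the constants referenced by parse_project_stream
CLASS_EXTENSION = "cls"
MODULE_EXTENSION = "bas"
FORM_EXTENSION = "frm"

EXT_BY_KEY = {"Module": MODULE_EXTENSION, "Class": CLASS_EXTENSION,
              "BaseClass": FORM_EXTENSION, "Document": CLASS_EXTENSION}


def parse_module_ext(lines):
    """Map lowercased module names to file extensions from PROJECT stream lines."""
    module_ext = {}
    for line in lines:
        line = line.strip()
        if "=" not in line:
            continue
        name, value = line.split("=", 1)  # split at the first equal sign only
        if name not in EXT_BY_KEY:
            continue  # ID=, Name=, [Workspace] entries, etc.
        value = value.lower()
        if name == "Document":
            value = value.split("/", 1)[0]  # "thisdocument/&h00000000" -> "thisdocument"
        module_ext[value] = EXT_BY_KEY[name]
    return module_ext


def module_filename(module_name, module_ext):
    # case-insensitive lookup with the same "vba" fallback as VBA_Module
    return "{0}.{1}".format(module_name, module_ext.get(module_name.lower(), "vba"))


project_lines = [
    'ID="{5312AC8A-349D-4950-BDD0-49BE3C4DD0F0}"',
    "Document=ThisDocument/&H00000000",
    "Module=NewMacros",
    'Name="Project"',
    "[Workspace]",
    "ThisDocument=22, 29, 339, 477, Z",
]
module_ext = parse_module_ext(project_lines)
```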
  1978 + def parse_modules(self):
  1979 + dir_stream = self.dir_stream
  1980 + # projectmodules_id has already been read by the previous loop = 0x000F
  1981 + # projectmodules_id = check #struct.unpack("<H", dir_stream.read(2))[0]
  1982 + # self.check_value('PROJECTMODULES_Id', 0x000F, projectmodules_id)
  1983 + projectmodules_size = struct.unpack("<L", dir_stream.read(4))[0]
  1984 + self.check_value('PROJECTMODULES_Size', 0x0002, projectmodules_size)
  1985 + self.modules_count = struct.unpack("<H", dir_stream.read(2))[0]
  1986 + _id = struct.unpack("<H", dir_stream.read(2))[0]
  1987 + self.check_value('PROJECTMODULES_ProjectCookieRecord_Id', 0x0013, _id)
  1988 + size = struct.unpack("<L", dir_stream.read(4))[0]
  1989 + self.check_value('PROJECTMODULES_ProjectCookieRecord_Size', 0x0002, size)
  1990 + projectcookierecord_cookie = struct.unpack("<H", dir_stream.read(2))[0]
  1991 + unused = projectcookierecord_cookie
  1992 +
  1993 + log.debug("parsing {0} modules".format(self.modules_count))
  1994 + for module_index in xrange(0, self.modules_count):
  1995 + module = VBA_Module(self, self.dir_stream, module_index=module_index)
  1996 + self.modules.append(module)
  1997 + yield (module.code_path, module.filename_str, module.code_str)
  1998 + _ = unused # make pylint happy: now variable "unused" is being used ;-)
  1999 + return
  2000 +
  2001 + def decode_bytes(self, bytes_string, errors='replace'):
  2002 + """
  2003 + Decode a bytes string to a unicode string, using the project code page
  2004 + :param bytes_string: bytes, bytes string to be decoded
  2005 + :param errors: str, mode to handle unicode conversion errors
  2006 + :return: str/unicode, decoded string
  2007 + """
  2008 + return bytes_string.decode(self.codec, errors=errors)
  2009 +
  2010 +
  2011 +
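`decode_bytes` decodes every MBCS string with the Python codec derived from PROJECTCODEPAGE, using `errors='replace'` so that malformed bytes never abort parsing. The sketch below shows why the code page matters: the same bytes yield different text under code pages 1252 and 1251. `codepage_to_codec` is a hypothetical, much simplified stand-in for `codepages.codepage2codec`, and its latin_1 fallback is an assumption.

```python
import codecs


def codepage_to_codec(codepage):
    """Hypothetical simplified mapping from a Windows code page to a Python codec."""
    special = {65001: "utf_8", 10000: "mac_roman"}
    name = special.get(codepage, "cp{0}".format(codepage))
    try:
        codecs.lookup(name)
    except LookupError:
        name = "latin_1"  # assumption: fall back to a codec that accepts any byte
    return name


def decode_bytes(data, codepage, errors="replace"):
    return data.decode(codepage_to_codec(codepage), errors=errors)


name_bytes = b"\xe9t\xe9"
western = decode_bytes(name_bytes, 1252)   # Western European
cyrillic = decode_bytes(name_bytes, 1251)  # Cyrillic
```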
  2012 +def _extract_vba(ole, vba_root, project_path, dir_path, relaxed=False):
  2013 + """
  2014 + Extract VBA macros from an OleFileIO object.
  2015 + Internal function, do not call directly.
  2016 +
  2017 + :param vba_root: path to the VBA root storage, containing the VBA storage and the PROJECT stream
  2018 + :param project_path: path to the PROJECT stream
  2019 + :param relaxed: If True, only create info/debug log entry if data is not as expected
  2020 + (e.g. opening substream fails); if False, raise an error in this case
  2021 + This is a generator, yielding (stream path, VBA filename, VBA source code) for each VBA code stream
  2022 + """
  2023 + log.debug('relaxed is %s' % relaxed)
  2024 +
  2025 + project = VBA_Project(ole, vba_root, project_path, dir_path, relaxed=relaxed)
  2026 + project.parse_project_stream()
  2027 +
  2028 + for code_path, filename, code_data in project.parse_modules():
  2029 + yield (code_path, filename, code_data)
1816 2030  
1817 2031  
1818 2032 def vba_collapse_long_lines(vba_code):
... ... @@ -1824,9 +2038,13 @@ def vba_collapse_long_lines(vba_code):
1824 2038 :return: str, VBA module code with long lines collapsed
1825 2039 """
1826 2040 # TODO: use a regex instead, to allow whitespaces after the underscore?
1827   - vba_code = vba_code.replace(' _\r\n', ' ')
1828   - vba_code = vba_code.replace(' _\r', ' ')
1829   - vba_code = vba_code.replace(' _\n', ' ')
  2041 + try:
  2042 + vba_code = vba_code.replace(' _\r\n', ' ')
  2043 + vba_code = vba_code.replace(' _\r', ' ')
  2044 + vba_code = vba_code.replace(' _\n', ' ')
  2045 + except:
  2046 + log.exception('type(vba_code)=%s' % type(vba_code))
  2047 + raise
1830 2048 return vba_code
1831 2049  
1832 2050  
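The TODO above suggests a regex that tolerates whitespace after the underscore. One possible sketch, a hypothetical alternative that is not part of this commit, matches ` _`, then optional spaces or tabs, then CRLF, CR or LF. It replaces the whole continuation with a single space, as the `str.replace` calls do.

```python
import re

# " _" + optional trailing blanks + one line break (CRLF tried first)
LINE_CONTINUATION = re.compile(r" _[ \t]*(?:\r\n|\r|\n)")


def vba_collapse_long_lines_re(vba_code):
    return LINE_CONTINUATION.sub(" ", vba_code)


collapsed = vba_collapse_long_lines_re('x = "a" & _  \r\n"b"')
```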
... ... @@ -1875,7 +2093,7 @@ def detect_autoexec(vba_code, obfuscation=None):
1875 2093 for keyword in keywords:
1876 2094 #TODO: if keyword is already a compiled regex, use it as-is
1877 2095 # search using regex to detect word boundaries:
1878   - match = re.search(r'(?i)\b' + keyword + r'\b', vba_code)
  2096 + match = re.search(r'(?i)\b' + re.escape(keyword) + r'\b', vba_code)
1879 2097 if match:
1880 2098 #if keyword.lower() in vba_code:
1881 2099 found_keyword = match.group()
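Escaping matters because keywords are literal text, not regex patterns. Without `re.escape`, a `.` in a keyword matches any character, and a trailing backslash or an unbalanced parenthesis makes the pattern invalid. The sketch below reproduces the search used above (`find_keyword` is a hypothetical name). Note that `\b` only matches at a word/non-word transition, so a keyword that starts or ends with a non-word character such as `(` may not match even when escaped.

```python
import re


def find_keyword(keyword, vba_code):
    """Case-insensitive whole-word search for a literal keyword."""
    match = re.search(r"(?i)\b" + re.escape(keyword) + r"\b", vba_code)
    return match.group() if match else None


code = 'Set sh = CreateObject("ShellXApplication")'
# without escaping, "." matches the "X" and produces a false positive
unescaped = re.search(r"(?i)\b" + "Shell.Application" + r"\b", code)
escaped = find_keyword("Shell.Application", code)
```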
... ... @@ -1901,7 +2119,8 @@ def detect_suspicious(vba_code, obfuscation=None):
1901 2119 for description, keywords in SUSPICIOUS_KEYWORDS.items():
1902 2120 for keyword in keywords:
1903 2121 # search using regex to detect word boundaries:
1904   - match = re.search(r'(?i)\b' + keyword + r'\b', vba_code)
  2122 + # note: each keyword must be escaped if it contains special chars such as '\'
  2123 + match = re.search(r'(?i)\b' + re.escape(keyword) + r'\b', vba_code)
1905 2124 if match:
1906 2125 #if keyword.lower() in vba_code:
1907 2126 found_keyword = match.group()
... ... @@ -1909,7 +2128,9 @@ def detect_suspicious(vba_code, obfuscation=None):
1909 2128 for description, keywords in SUSPICIOUS_KEYWORDS_NOREGEX.items():
1910 2129 for keyword in keywords:
1911 2130 if keyword.lower() in vba_code:
1912   - results.append((keyword, description + obf_text))
  2131 + # only report backspace chars found in plain VBA code, not in deobfuscated strings:
  2132 + if not (keyword == '\b' and obfuscation is not None):
  2133 + results.append((keyword, description + obf_text))
1913 2134 return results
1914 2135  
1915 2136  
... ... @@ -1947,7 +2168,7 @@ def detect_hex_strings(vba_code):
1947 2168 for match in re_hex_string.finditer(vba_code):
1948 2169 value = match.group()
1949 2170 if value not in found:
1950   - decoded = binascii.unhexlify(value)
  2171 + decoded = bytes2str(binascii.unhexlify(value))
1951 2172 results.append((value, decoded))
1952 2173 found.add(value)
1953 2174 return results
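`detect_hex_strings` now converts the unhexlified bytes to a native `str`, so the results print and serialize cleanly on Python 3. Below is a self-contained sketch of the same loop. The regex (at least four hex-encoded bytes) and the UTF-8 `errors='replace'` decoding in `bytes_to_text` are assumptions standing in for `re_hex_string` and `bytes2str`, whose definitions are not shown here.

```python
import binascii
import re

RE_HEX_STRING = re.compile(r"(?:[0-9A-Fa-f]{2}){4,}")  # assumed pattern


def bytes_to_text(data):
    # hypothetical stand-in for bytes2str: never fails on invalid UTF-8
    return data.decode("utf-8", errors="replace")


def find_hex_strings(vba_code):
    results, found = [], set()
    for match in RE_HEX_STRING.finditer(vba_code):
        value = match.group()
        if value not in found:  # report each distinct value once
            decoded = bytes_to_text(binascii.unhexlify(value))
            results.append((value, decoded))
            found.add(value)
    return results


hits = find_hex_strings('a = "636D642E657865": b = "636D642E657865"')
```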
... ... @@ -1972,7 +2193,7 @@ def detect_base64_strings(vba_code):
1972 2193 # only keep new values and not in the whitelist:
1973 2194 if value not in found and value.lower() not in BASE64_WHITELIST:
1974 2195 try:
1975   - decoded = base64.b64decode(value)
  2196 + decoded = bytes2str(base64.b64decode(value))
1976 2197 results.append((value, decoded))
1977 2198 found.add(value)
1978 2199 except (TypeError, ValueError) as exc:
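The `except (TypeError, ValueError)` clause covers both Python versions. On Python 2, `base64.b64decode` raises `TypeError` for bad padding. On Python 3, it raises `binascii.Error`, which is a subclass of `ValueError`. The following is a Python 3 sketch of that behaviour; the `try_b64decode` name and the UTF-8 decoding are assumptions.

```python
import base64
import binascii


def try_b64decode(value):
    """Return the decoded text, or None if value is not valid Base64."""
    try:
        return base64.b64decode(value).decode("utf-8", errors="replace")
    except (TypeError, ValueError):  # binascii.Error is a ValueError subclass
        return None


decoded = try_b64decode("Y21kLmV4ZQ==")
invalid = try_b64decode("abc")  # incorrect padding
error_is_value_error = issubclass(binascii.Error, ValueError)
```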
... ... @@ -2000,7 +2221,7 @@ def detect_dridex_strings(vba_code):
2000 2221 continue
2001 2222 if value not in found:
2002 2223 try:
2003   - decoded = DridexUrlDecode(value)
  2224 + decoded = bytes2str(DridexUrlDecode(value))
2004 2225 results.append((value, decoded))
2005 2226 found.add(value)
2006 2227 except Exception as exc:
... ... @@ -2047,7 +2268,8 @@ def detect_vba_strings(vba_code):
2047 2268  
2048 2269  
2049 2270 def json2ascii(json_obj, encoding='utf8', errors='replace'):
2050   - """ ensure there is no unicode in json and all strings are safe to decode
  2271 + """
  2272 + ensure there is no unicode in json and all strings are safe to decode
2051 2273  
2052 2274 works recursively, decodes and re-encodes every string to/from unicode
2053 2275 to ensure there will be no trouble in loading the dumped json output
... ... @@ -2057,20 +2279,32 @@ def json2ascii(json_obj, encoding='utf8', errors='replace'):
2057 2279 elif isinstance(json_obj, (bool, int, float)):
2058 2280 pass
2059 2281 elif isinstance(json_obj, str):
2060   - # de-code and re-encode
2061   - dencoded = json_obj.decode(encoding, errors).encode(encoding, errors)
2062   - if dencoded != json_obj:
2063   - log.debug('json2ascii: replaced: {0} (len {1})'
2064   - .format(json_obj, len(json_obj)))
2065   - log.debug('json2ascii: with: {0} (len {1})'
2066   - .format(dencoded, len(dencoded)))
2067   - return dencoded
2068   - elif isinstance(json_obj, unicode):
2069   - log.debug('json2ascii: encode unicode: {0}'
2070   - .format(json_obj.encode(encoding, errors)))
  2282 + if PYTHON2:
  2283 + # de-code and re-encode
  2284 + dencoded = json_obj.decode(encoding, errors).encode(encoding, errors)
  2285 + if dencoded != json_obj:
  2286 + log.debug('json2ascii: replaced: {0} (len {1})'
  2287 + .format(json_obj, len(json_obj)))
  2288 + log.debug('json2ascii: with: {0} (len {1})'
  2289 + .format(dencoded, len(dencoded)))
  2290 + return dencoded
  2291 + else:
  2292 + # on Python 3, just keep Unicode strings as-is:
  2293 + return json_obj
  2294 + elif isinstance(json_obj, unicode) and PYTHON2:
  2295 + # On Python 2, encode unicode to bytes:
  2296 + json_obj_bytes = json_obj.encode(encoding, errors)
  2297 + log.debug('json2ascii: encode unicode: {0}'.format(json_obj_bytes))
  2298 + # cannot put original into logger
  2299 + # print 'original: ' json_obj
  2300 + return json_obj_bytes
  2301 + elif isinstance(json_obj, bytes) and not PYTHON2:
  2302 + # On Python 3, decode bytes to unicode str
  2303 + json_obj_str = json_obj.decode(encoding, errors)
  2304 + log.debug('json2ascii: decode bytes: {0}'.format(json_obj_str))
2071 2305 # cannot put original into logger
2072 2306 # print 'original: ' json_obj
2073   - return json_obj.encode(encoding, errors)
  2307 + return json_obj_str
2074 2308 elif isinstance(json_obj, dict):
2075 2309 for key in json_obj:
2076 2310 json_obj[key] = json2ascii(json_obj[key])
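On Python 3, the new branches leave `str` values unchanged and decode any `bytes` before `json.dumps` sees them, because the `json` module cannot serialize bytes. The following is a Python-3-only sketch of the same idea. It differs from `json2ascii` in two ways. First, it returns a new object instead of updating dicts in place. Second, its handling of lists is an assumption, because that part of the function lies outside this diff. `json_safe_sketch` is a hypothetical name.

```python
import json


def json_safe_sketch(obj, encoding='utf8', errors='replace'):
    """Hypothetical Python-3 sketch: make every string in obj JSON-serializable."""
    if obj is None or isinstance(obj, (bool, int, float, str)):
        # str is already unicode on Python 3: keep it as-is
        return obj
    if isinstance(obj, bytes):
        # bytes are not JSON-serializable: decode, replacing invalid bytes
        return obj.decode(encoding, errors)
    if isinstance(obj, dict):
        return {key: json_safe_sketch(value, encoding, errors)
                for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [json_safe_sketch(item, encoding, errors) for item in obj]
    raise TypeError('unsupported type: %r' % type(obj))


report = {'code': b'Shell "calc"', 'junk': b'\xff', 'count': 2}
print(json.dumps(json_safe_sketch(report)))
```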
... ... @@ -2096,7 +2330,6 @@ def print_json(json_dict=None, _json_is_first=False, _json_is_last=False,
2096 2330 :param bool _json_is_last: set to True only for very last entry to complete
2097 2331 the top-level json-list
2098 2332 """
2099   -
2100 2333 if json_dict and json_parts:
2101 2334 raise ValueError('Invalid json argument: want either single dict or '
2102 2335 'key=value parts but got both')
... ... @@ -2177,7 +2410,7 @@ class VBA_Scanner(object):
2177 2410 # StrReverse after hex decoding:
2178 2411 self.code_hex_rev += '\n' + decoded[::-1]
2179 2412 # StrReverse before hex decoding:
2180   - self.code_rev_hex += '\n' + binascii.unhexlify(encoded[::-1])
  2413 + self.code_rev_hex += '\n' + bytes2str(binascii.unhexlify(encoded[::-1]))
2181 2414 #example: https://malwr.com/analysis/NmFlMGI4YTY1YzYyNDkwNTg1ZTBiZmY5OGI3YjlhYzU/
2182 2415 #TODO: also append the full code reversed if StrReverse? (risk of false positives?)
2183 2416 # Detect Base64-encoded strings
... ... @@ -2287,7 +2520,7 @@ def scan_vba(vba_code, include_decoded_strings, deobfuscate=False):
2287 2520 :param include_decoded_strings: bool, if True all encoded strings will be included with their decoded content.
2288 2521 :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
2289 2522 :return: list of tuples (type, keyword, description)
2290   - (type = 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String' or 'Dridex String')
  2523 + with type = 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String' or 'Dridex String'
2291 2524 """
2292 2525 return VBA_Scanner(vba_code).scan(include_decoded_strings, deobfuscate)
2293 2526  
... ... @@ -2297,44 +2530,38 @@ def scan_vba(vba_code, include_decoded_strings, deobfuscate=False):
2297 2530 class VBA_Parser(object):
2298 2531 """
2299 2532 Class to parse MS Office files, to detect VBA macros and extract VBA source code
2300   - Supported file formats:
2301   - - Word 97-2003 (.doc, .dot)
2302   - - Word 2007+ (.docm, .dotm)
2303   - - Word 2003 XML (.xml)
2304   - - Word MHT - Single File Web Page / MHTML (.mht)
2305   - - Excel 97-2003 (.xls)
2306   - - Excel 2007+ (.xlsm, .xlsb)
2307   - - PowerPoint 97-2003 (.ppt)
2308   - - PowerPoint 2007+ (.pptm, .ppsm)
2309 2533 """
2310 2534  
2311   - def __init__(self, filename, data=None, container=None, relaxed=False):
  2535 + def __init__(self, filename, data=None, container=None, relaxed=False, encoding=DEFAULT_API_ENCODING):
2312 2536 """
2313 2537 Constructor for VBA_Parser
2314 2538  
2315   - :param filename: filename or path of file to parse, or file-like object
  2539 + :param str filename: filename or path of file to parse, or file-like object
2316 2540  
2317   - :param data: None or bytes str, if None the file will be read from disk (or from the file-like object).
2318   - If data is provided as a bytes string, it will be parsed as the content of the file in memory,
2319   - and not read from disk. Note: files must be read in binary mode, i.e. open(f, 'rb').
  2541 + :param bytes data: None or bytes str, if None the file will be read from disk (or from the file-like object).
  2542 + If data is provided as a bytes string, it will be parsed as the content of the file in memory,
  2543 + and not read from disk. Note: files must be read in binary mode, i.e. open(f, 'rb').
2320 2544  
2321   - :param container: str, path and filename of container if the file is within
2322   - a zip archive, None otherwise.
  2545 + :param str container: path and filename of container if the file is within
  2546 + a zip archive, None otherwise.
2323 2547  
2324   - :param relaxed: if True, treat mal-formed documents and missing streams more like MS office:
2325   - do nothing; if False (default), raise errors in these cases
  2548 + :param bool relaxed: if True, treat mal-formed documents and missing streams more like MS office:
  2549 + do nothing; if False (default), raise errors in these cases
2326 2550  
2327   - raises a FileOpenError if all attemps to interpret the data header failed
  2551 + :param str encoding: encoding for VBA source code and strings.
  2552 + Default: UTF-8 bytes strings on Python 2, unicode strings on Python 3 (None)
  2553 +
  2554 + raises a FileOpenError if all attempts to interpret the data header failed.
2328 2555 """
2329   - #TODO: filename should only be a string, data should be used for the file-like object
2330   - #TODO: filename should be mandatory, optional data is a string or file-like object
2331   - #TODO: also support olefile and zipfile as input
  2556 + # TODO: filename should only be a string, data should be used for the file-like object
  2557 + # TODO: filename should be mandatory, optional data is a string or file-like object
  2558 + # TODO: also support olefile and zipfile as input
2332 2559 if data is None:
2333 2560 # open file from disk:
2334 2561 _file = filename
2335 2562 else:
2336 2563 # file already read in memory, make it a file-like object for zipfile:
2337   - _file = StringIO(data)
  2564 + _file = BytesIO(data)
2338 2565 #self.file = _file
2339 2566 self.ole_file = None
2340 2567 self.ole_subfiles = []
... ... @@ -2359,6 +2586,13 @@ class VBA_Parser(object):
2359 2586 self.nb_base64strings = 0
2360 2587 self.nb_dridexstrings = 0
2361 2588 self.nb_vbastrings = 0
  2589 + #: Encoding for VBA source code and strings returned by all methods
  2590 + self.encoding = encoding
  2591 + self.xlm_macros = []
  2592 + #: Output from pcodedmp, disassembly of the VBA P-code
  2593 + self.pcodedmp_output = None
  2594 + #: Flag set to True/False if VBA stomping detected
  2595 + self.vba_stomping_detected = None
2362 2596  
2363 2597 # if filename is None:
2364 2598 # if isinstance(_file, basestring):
... ... @@ -2372,15 +2606,9 @@ class VBA_Parser(object):
2372 2606 # This looks like an OLE file
2373 2607 self.open_ole(_file)
2374 2608  
2375   - # check whether file is encrypted (need to do this before try ppt)
2376   - log.debug('Check encryption of ole file')
2377   - crypt_indicator = oleid.OleID(self.ole_file).check_encrypted()
2378   - if crypt_indicator.value:
2379   - raise FileIsEncryptedError(filename)
2380   -
2381 2609 # if this worked, try whether it is a ppt file (special ole file)
2382 2610 self.open_ppt()
2383   - if self.type is None and is_zipfile(_file):
  2611 + if self.type is None and zipfile.is_zipfile(_file):
2384 2612 # Zip file, which may be an OpenXML document
2385 2613 self.open_openxml(_file)
2386 2614 if self.type is None:
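Switching from `StringIO` to `BytesIO` matters on Python 3. File contents are bytes, and `zipfile` needs a binary file-like object. `zipfile.is_zipfile` accepts either a path or such an object, so the in-memory case can reuse the same detection code. The following standard-library illustration shows this; `looks_like_zip` is a hypothetical helper, not part of olevba.

```python
import io
import zipfile


def looks_like_zip(data):
    """Hypothetical helper: True if in-memory bytes look like a zip (e.g. OpenXML)."""
    # zipfile needs a *binary* file-like object, hence BytesIO rather than StringIO
    return zipfile.is_zipfile(io.BytesIO(data))


# Build a tiny zip archive in memory to test against:
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('word/vbaProject.bin', b'dummy')

print(looks_like_zip(buf.getvalue()))              # True
print(looks_like_zip(b'\xd0\xcf\x11\xe0not a zip'))  # False
```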
... ... @@ -2600,12 +2828,12 @@ class VBA_Parser(object):
2600 2828 try:
2601 2829 # parse the MIME content
2602 2830 # remove any leading whitespace or newline (workaround for issue in email package)
2603   - stripped_data = data.lstrip('\r\n\t ')
  2831 + stripped_data = data.lstrip(b'\r\n\t ')
2604 2832 # strip any junk from the beginning of the file
2605 2833 # (issue #31 fix by Greg C - gdigreg)
2606 2834 # TODO: improve keywords to avoid false positives
2607   - mime_offset = stripped_data.find('MIME')
2608   - content_offset = stripped_data.find('Content')
  2835 + mime_offset = stripped_data.find(b'MIME')
  2836 + content_offset = stripped_data.find(b'Content')
2609 2837 # if "MIME" is found, and located before "Content":
2610 2838 if -1 < mime_offset <= content_offset:
2611 2839 stripped_data = stripped_data[mime_offset:]
... ... @@ -2614,7 +2842,11 @@ class VBA_Parser(object):
2614 2842 elif content_offset > -1:
2615 2843 stripped_data = stripped_data[content_offset:]
2616 2844 # TODO: quick and dirty fix: insert a standard line with MIME-Version header?
2617   - mhtml = email.message_from_string(stripped_data)
  2845 + if PYTHON2:
  2846 + mhtml = email.message_from_string(stripped_data)
  2847 + else:
  2848 + # on Python 3, need to use message_from_bytes instead:
  2849 + mhtml = email.message_from_bytes(stripped_data)
2618 2850 # find all the attached files:
2619 2851 for part in mhtml.walk():
2620 2852 content_type = part.get_content_type() # always returns a value
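The MHTML hunk keeps the data as bytes end to end. The strip and search markers become `b'...'` literals, and on Python 3 the parsing goes through `email.message_from_bytes` instead of `message_from_string`. The following Python-3-only sketch reproduces that flow on a small made-up sample. `parse_mhtml_sketch` is a hypothetical name, and the junk-skipping logic copies the offsets check from the hunk.

```python
import email


def parse_mhtml_sketch(data):
    """Hypothetical sketch: parse raw MHTML bytes, skipping junk before the headers."""
    stripped = data.lstrip(b'\r\n\t ')
    mime_offset = stripped.find(b'MIME')
    content_offset = stripped.find(b'Content')
    # start at "MIME" if it comes first, otherwise at "Content"
    if -1 < mime_offset <= content_offset:
        stripped = stripped[mime_offset:]
    elif content_offset > -1:
        stripped = stripped[content_offset:]
    # bytes input on Python 3 requires message_from_bytes
    return email.message_from_bytes(stripped)


sample = (b'junk\r\nMIME-Version: 1.0\r\n'
          b'Content-Type: multipart/related; boundary="XX"\r\n\r\n'
          b'--XX\r\nContent-Type: text/plain\r\n\r\nhello\r\n--XX--\r\n')
msg = parse_mhtml_sketch(sample)
print([part.get_content_type() for part in msg.walk()])
# ['multipart/related', 'text/plain']
```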
... ... @@ -2627,7 +2859,7 @@ class VBA_Parser(object):
2627 2859 # using the ActiveMime/MSO format (zlib-compressed), and Base64 encoded.
2628 2860 # decompress the zlib data starting at offset 0x32, which is the OLE container:
2629 2861 # check ActiveMime header:
2630   - if isinstance(part_data, str) and is_mso_file(part_data):
  2862 + if isinstance(part_data, bytes) and is_mso_file(part_data):
2631 2863 log.debug('Found ActiveMime header, decompressing MSO container')
2632 2864 try:
2633 2865 ole_data = mso_file_extract(part_data)
... ... @@ -2697,7 +2929,9 @@ class VBA_Parser(object):
2697 2929 """
2698 2930 log.info('Opening text file %s' % self.filename)
2699 2931 # directly store the source code:
2700   - self.vba_code_all_modules = data
  2932 + # On Python 2, store it as a raw bytes string
  2933 + # On Python 3, convert it to unicode assuming it was encoded with UTF-8
  2934 + self.vba_code_all_modules = bytes2str(data)
2701 2935 self.contains_macros = True
2702 2936 # set type only if parsing succeeds
2703 2937 self.type = TYPE_TEXT
... ... @@ -2853,7 +3087,7 @@ class VBA_Parser(object):
2853 3087 log.debug('%r...[much more data]...%r' % (data[:100], data[-50:]))
2854 3088 else:
2855 3089 log.debug(repr(data))
2856   - if 'Attribut\x00' in data:
  3090 + if b'Attribut\x00' in data:
2857 3091 log.debug('Found VBA compressed code')
2858 3092 self.contains_macros = True
2859 3093 except IOError as exc:
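The one-character change from `'Attribut\x00'` to `b'Attribut\x00'` is not cosmetic. On Python 3, a `str` needle tested against a `bytes` haystack raises `TypeError` instead of returning `False`. The following sketch shows both behaviours. The stream content is a made-up sample, and `has_compressed_vba_sketch` is a hypothetical name.

```python
def has_compressed_vba_sketch(data):
    """Hypothetical helper: True if raw stream bytes contain the marker olevba checks."""
    # The needle must be bytes as well, to match the bytes stream data
    return b'Attribut\x00' in data


stream = b'\x01\x9a\xb1\x00Attribut\x00e VB_Name'
print(has_compressed_vba_sketch(stream))  # True

try:
    'Attribut\x00' in stream  # str needle, bytes haystack
except TypeError as exc:
    print('str needle on bytes fails:', exc)
```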
... ... @@ -2862,8 +3096,44 @@ class VBA_Parser(object):
2862 3096 log.debug('Trace:', exc_trace=True)
2863 3097 else:
2864 3098 raise SubstreamOpenError(self.filename, d.name, exc)
  3099 + if self.detect_xlm_macros():
  3100 + self.contains_macros = True
2865 3101 return self.contains_macros
2866 3102  
  3103 + def detect_xlm_macros(self):
  3104 + from oletools.thirdparty.oledump.plugin_biff import cBIFF
  3105 + self.xlm_macros = []
  3106 + if self.ole_file is None:
  3107 + return False
  3108 + for excel_stream in ('Workbook', 'Book'):
  3109 + if self.ole_file.exists(excel_stream):
  3110 + log.debug('Found Excel stream %r' % excel_stream)
  3111 + data = self.ole_file.openstream(excel_stream).read()
  3112 + log.debug('Running BIFF plugin from oledump')
  3113 + try:
  3114 + biff_plugin = cBIFF(name=[excel_stream], stream=data, options='-x')
  3115 + self.xlm_macros = biff_plugin.Analyze()
  3116 + if len(self.xlm_macros)>0:
  3117 + log.debug('Found XLM macros')
  3118 + return True
  3119 + except Exception:
  3120 + log.exception('Error when running oledump.plugin_biff, please report to %s' % URL_OLEVBA_ISSUES)
  3121 + return False
  3122 +
  3123 +
  3124 + def encode_string(self, unicode_str):
  3125 + """
  3126 + Encode a unicode string to bytes or str, using the specified encoding
  3127 + for the VBA_Parser. By default, it will be bytes/UTF-8 on Python 2, and
  3128 + a normal unicode string on Python 3.
  3129 + :param str unicode_str: string to be encoded
  3130 + :return: encoded string
  3131 + """
  3132 + if self.encoding is None:
  3133 + return unicode_str
  3134 + else:
  3135 + return unicode_str.encode(self.encoding, errors='replace')
  3136 +
2867 3137 def extract_macros(self):
2868 3138 """
2869 3139 Extract and decompress source code for each VBA macro found in the file
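The new `encode_string` method returns the unicode string unchanged when `encoding` is `None`, which the constructor docstring describes as the Python 3 default. Otherwise it encodes the string with `errors='replace'`. The same logic is shown below, wrapped in a hypothetical minimal class so it runs on its own; `EncodingSketch` is not an oletools class.

```python
class EncodingSketch(object):
    """Hypothetical stand-in mirroring VBA_Parser.encode_string() from this diff."""

    def __init__(self, encoding=None):
        # None means: hand back unicode str unchanged
        self.encoding = encoding

    def encode_string(self, unicode_str):
        if self.encoding is None:
            return unicode_str
        # unencodable characters become '?' instead of raising
        return unicode_str.encode(self.encoding, errors='replace')


print(EncodingSketch().encode_string(u'Shell "calc"'))       # Shell "calc"
print(EncodingSketch('utf8').encode_string(u'caf\u00e9'))   # b'caf\xc3\xa9'
print(EncodingSketch('ascii').encode_string(u'caf\u00e9'))  # b'caf?'
```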
... ... @@ -2920,18 +3190,33 @@ class VBA_Parser(object):
2920 3190 # read data
2921 3191 log.debug('Reading data from stream %r' % d.name)
2922 3192 data = ole._open(d.isectStart, d.size).read()
2923   - for match in re.finditer(r'\x00Attribut[^e]', data, flags=re.IGNORECASE):
  3193 + for match in re.finditer(b'\\x00Attribut[^e]', data, flags=re.IGNORECASE):
2924 3194 start = match.start() - 3
2925 3195 log.debug('Found VBA compressed code at index %X' % start)
2926 3196 compressed_code = data[start:]
2927 3197 try:
2928   - vba_code = decompress_stream(compressed_code)
  3198 + vba_code = decompress_stream(bytearray(compressed_code))
  3199 + # TODO vba_code = self.encode_string(vba_code)
2929 3200 yield (self.filename, d.name, d.name, vba_code)
2930 3201 except Exception as exc:
2931 3202 # display the exception with full stack trace for debugging
2932 3203 log.debug('Error processing stream %r in file %r (%s)' % (d.name, self.filename, exc))
2933 3204 log.debug('Traceback:', exc_info=True)
2934 3205 # do not raise the error, as it is unlikely to be a compressed macro stream
  3206 + if self.xlm_macros:
  3207 + vba_code = ''
  3208 + for line in self.xlm_macros:
  3209 + vba_code += "' " + line + '\n'
  3210 + yield ('xlm_macro', 'xlm_macro', 'xlm_macro.txt', vba_code)
  3211 + # Analyse the VBA P-code to detect VBA stomping:
  3212 + # If stomping is detected, add a fake VBA module with the P-code as source comments
  3213 + # so that VBA_Scanner can find keywords and IOCs in it
  3214 + if self.detect_vba_stomping():
  3215 + vba_code = ''
  3216 + for line in self.pcodedmp_output.splitlines():
  3217 + vba_code += "' " + line + '\n'
  3218 + yield ('VBA P-code', 'VBA P-code', 'VBA_P-code.txt', vba_code)
  3219 +
2935 3220  
2936 3221 def extract_all_macros(self):
2937 3222 """
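Both new blocks in `extract_macros` (XLM macros and the P-code dump) use the same trick. Each line of non-VBA text is prefixed with `' `, the VBA comment marker, and the result is yielded as an extra pseudo-module, so the existing keyword and IOC scanner can search it. The following is a small sketch of that transformation. `as_vba_comments` is a hypothetical helper, and the P-code sample is made up, shaped like the lines `detect_vba_stomping` parses.

```python
def as_vba_comments(lines):
    """Hypothetical helper: turn text lines into a pseudo VBA module of comments."""
    # "' " makes each line a VBA comment: inert as code, but still searchable
    return ''.join("' " + line + '\n' for line in lines)


# Made-up sample shaped like a P-code disassembly:
pcode_dump = 'Module1\n\tLitStr 0x0004 "calc"\n\tArgsCall Shell 0x0001'
fake_module = as_vba_comments(pcode_dump.splitlines())
print(fake_module)
```

Building the string with `''.join` is equivalent to the `+=` loop in the diff, but avoids repeated string copies on large dumps.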
... ... @@ -2953,6 +3238,8 @@ class VBA_Parser(object):
2953 3238 """
2954 3239 runs extract_macros and analyze the source code of all VBA macros
2955 3240 found in the file.
  3241 + All results are stored in self.analysis_results.
  3242 + If called more than once, simply returns the previous results.
2956 3243 """
2957 3244 if self.detect_vba_macros():
2958 3245 # if the analysis was already done, avoid doing it twice:
... ... @@ -2969,6 +3256,13 @@ class VBA_Parser(object):
2969 3256 # Analyze the whole code at once:
2970 3257 scanner = VBA_Scanner(self.vba_code_all_modules)
2971 3258 self.analysis_results = scanner.scan(show_decoded_strings, deobfuscate)
  3259 + if self.detect_vba_stomping():
  3260 + log.debug('adding VBA stomping to suspicious keywords')
  3261 + keyword = 'VBA Stomping'
  3262 + description = 'VBA Stomping was detected: the VBA source code and P-code are different, '\
  3263 + 'this may have been used to hide malicious code'
  3264 + scanner.suspicious_keywords.append((keyword, description))
  3265 + scanner.results.append(('Suspicious', keyword, description))
2972 3266 autoexec, suspicious, iocs, hexstrings, base64strings, dridex, vbastrings = scanner.scan_summary()
2973 3267 self.nb_autoexec += autoexec
2974 3268 self.nb_suspicious += suspicious
... ... @@ -3080,11 +3374,12 @@ class VBA_Parser(object):
3080 3374 """
3081 3375 Extract printable strings from each VBA Form found in the file
3082 3376  
3083   - Iterator: yields (filename, stream_path, vba_filename, vba_code) for each VBA macro found
  3377 + Iterator: yields (filename, stream_path, form_string) for each printable string found in forms
3084 3378 If the file is OLE, filename is the path of the file.
3085 3379 If the file is OpenXML, filename is the path of the OLE subfile containing VBA macros
3086 3380 within the zip archive, e.g. word/vbaProject.bin.
3087 3381 If the file is PPT, result is as for OpenXML but filename is useless
  3382 + Note: form_string is a raw bytes string on Python 2, a unicode str on Python 3
3088 3383 """
3089 3384 if self.ole_file is None:
3090 3385 # This may be either an OpenXML/PPT or a text file:
... ... @@ -3107,7 +3402,13 @@ class VBA_Parser(object):
3107 3402 # Extract printable strings from the form object stream "o":
3108 3403 for m in re_printable_string.finditer(form_data):
3109 3404 log.debug('Printable string found in form: %r' % m.group())
3110   - yield (self.filename, '/'.join(o_stream), m.group())
  3405 + # On Python 3, convert bytes string to unicode str:
  3406 + if PYTHON2:
  3407 + found_str = m.group()
  3408 + else:
  3409 + found_str = m.group().decode('utf8', errors='replace')
  3410 + if found_str != 'Tahoma':
  3411 + yield (self.filename, '/'.join(o_stream), found_str)
3111 3412  
3112 3413 def extract_form_strings_extended(self):
3113 3414 if self.ole_file is None:
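In the form-string hunk, the regex still runs over the raw bytes of the form stream, and only each match is decoded on Python 3. The diff also drops the string `Tahoma`, presumably because it is a default font name that only adds noise. The following is a Python-3 sketch of that filter. The printable-string pattern (runs of five or more printable ASCII bytes) is an assumption; olevba's own `re_printable_string` may be defined differently. The sample form bytes are made up.

```python
import re

# Hypothetical pattern: at least 5 consecutive printable ASCII bytes
RE_PRINTABLE_SKETCH = re.compile(rb'[\x20-\x7e]{5,}')


def form_strings_sketch(form_data):
    """Hypothetical sketch: yield decoded printable strings from raw form bytes."""
    for match in RE_PRINTABLE_SKETCH.finditer(form_data):
        found = match.group().decode('utf8', errors='replace')
        if found != 'Tahoma':  # skipped as in the diff
            yield found


data = b'\x00\x01Tahoma\x00\x02http://example.com/payload\x00\x03ab\x00'
print(list(form_strings_sketch(data)))  # ['http://example.com/payload']
```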
... ... @@ -3128,6 +3429,136 @@ class VBA_Parser(object):
3128 3429 for variable in oleform.extract_OleFormVariables(ole, form_storage):
3129 3430 yield (self.filename, '/'.join(form_storage), variable)
3130 3431  
  3432 + def extract_pcode(self):
  3433 + """
  3434 + Extract and disassemble the VBA P-code, using pcodedmp
  3435 +
  3436 + :return: VBA P-code disassembly
  3437 + :rtype: str
  3438 + """
  3439 + # only run it once:
  3440 + if self.pcodedmp_output is None:
  3441 + log.debug('Calling pcodedmp to extract and disassemble the VBA P-code')
  3442 + # import pcodedmp here to avoid circular imports:
  3443 + try:
  3444 + from pcodedmp import pcodedmp
  3445 + except Exception as e:
  3446 + # This may happen with PyPy, because pcodedmp imports win_unicode_console...
  3447 + # TODO: this is a workaround, we just ignore P-code
  3448 + # TODO: here we just use log.info, because the word "error" in the output makes some of the tests fail...
  3449 + log.info('Exception when importing pcodedmp: {}'.format(e))
  3450 + self.pcodedmp_output = ''
  3451 + return ''
  3452 + # logging is disabled after importing pcodedmp, need to re-enable it
  3453 + # This is because pcodedmp imports olevba again :-/
  3454 + # TODO: here it works only if logging was enabled, need to change pcodedmp!
  3455 + enable_logging()
  3456 + # pcodedmp prints all its output to sys.stdout, so we need to capture it so that
  3457 + # we can process the results later on.
  3458 + # save sys.stdout, then modify it to capture pcodedmp's output:
  3459 + # stdout = sys.stdout
  3460 + if PYTHON2:
  3461 + # on Python 2, console output is bytes
  3462 + output = BytesIO()
  3463 + else:
  3464 + # on Python 3, console output is unicode
  3465 + output = StringIO()
  3466 + # sys.stdout = output
  3467 + # we need to fake an argparser for those two args used by pcodedmp:
  3468 + class args:
  3469 + disasmOnly = True
  3470 + verbose = False
  3471 + try:
  3472 + # TODO: handle files in memory too
  3473 + log.debug('before pcodedmp')
  3474 + pcodedmp.processFile(self.filename, args, output_file=output)
  3475 + log.debug('after pcodedmp')
  3476 + except Exception as e:
  3477 + # print('Error while running pcodedmp: {}'.format(e), file=sys.stderr, flush=True)
  3478 + # set sys.stdout back to its original value
  3479 + # sys.stdout = stdout
  3480 + log.exception('Error while running pcodedmp')
  3481 + # finally:
  3482 + # # set sys.stdout back to its original value
  3483 + # sys.stdout = stdout
  3484 + self.pcodedmp_output = output.getvalue()
  3485 + # print(self.pcodedmp_output)
  3486 + # log.debug(self.pcodedmp_output)
  3487 + return self.pcodedmp_output
  3488 +
  3489 + def detect_vba_stomping(self):
  3490 + """
  3491 + Detect VBA stomping, by comparing the keywords present in the P-code and
  3492 + in the VBA source code.
  3493 +
  3494 + :return: True if VBA stomping detected, False otherwise
  3495 + :rtype: bool
  3496 + """
  3497 + # only run it once:
  3498 + if self.vba_stomping_detected is None:
  3499 + log.debug('Analysing the P-code to detect VBA stomping')
  3500 + self.extract_pcode()
  3501 + # print('pcodedmp OK')
  3502 + log.debug('pcodedmp OK')
  3503 + # process the output to extract keywords, to detect VBA stomping
  3504 + keywords = set()
  3505 + for line in self.pcodedmp_output.splitlines():
  3506 + if line.startswith('\t'):
  3507 + log.debug('P-code: ' + line.strip())
  3508 + tokens = line.split(None, 1)
  3509 + mnemonic = tokens[0]
  3510 + args = ''
  3511 + if len(tokens) == 2:
  3512 + args = tokens[1].strip()
  3513 + # log.debug(repr([mnemonic, args]))
  3514 + # if mnemonic in ('VarDefn',):
  3515 + # # just add the rest of the line
  3516 + # keywords.add(args)
  3517 + # if mnemonic == 'FuncDefn':
  3518 + # # function definition: just strip parentheses
  3519 + # funcdefn = args.strip('()')
  3520 + # keywords.add(funcdefn)
  3521 + if mnemonic in ('ArgsCall', 'ArgsLd', 'St', 'Ld', 'MemSt', 'Label'):
  3522 + # add 1st argument:
  3523 + name = args.split(None, 1)[0]
  3524 + # sometimes pcodedmp reports names like "id_FFFF", which are not
  3525 + # directly present in the VBA source code
  3526 + # (for example "Me" in VBA appears as id_FFFF in P-code)
  3527 + if not name.startswith('id_'):
  3528 + keywords.add(name)
  3529 + if mnemonic == 'LitStr':
  3530 + # re_string = re.compile(r'\"([^\"]|\"\")*\"')
  3531 + # for match in re_string.finditer(line):
  3532 + # print('\t' + match.group())
  3533 + # the string is the 2nd argument:
  3534 + s = args.split(None, 1)[1]
  3535 + # tricky issue: when a string contains double quotes inside,
  3536 + # pcodedmp returns a single ", whereas in the VBA source code
  3537 + # it is always a double "".
  3538 + # We have to remove the " around the strings, then double the remaining ",
  3539 + # and put back the " around:
  3540 + if len(s)>=2:
  3541 + assert(s[0]=='"' and s[-1]=='"')
  3542 + s = s[1:-1]
  3543 + s = s.replace('"', '""')
  3544 + s = '"' + s + '"'
  3545 + keywords.add(s)
  3546 + log.debug('Keywords extracted from P-code: ' + repr(sorted(keywords)))
  3547 + self.vba_stomping_detected = False
  3548 + # TODO: add a method to get all VBA code as one string
  3549 + vba_code_all_modules = ''
  3550 + for (_, _, _, vba_code) in self.extract_all_macros():
  3551 + vba_code_all_modules += vba_code + '\n'
  3552 + for keyword in keywords:
  3553 + if keyword not in vba_code_all_modules:
  3554 + log.debug('Keyword {!r} not found in VBA code'.format(keyword))
  3555 + log.debug('VBA STOMPING DETECTED!')
  3556 + self.vba_stomping_detected = True
  3557 + break
  3558 + if not self.vba_stomping_detected:
  3559 + log.debug('No VBA stomping detected.')
  3560 + return self.vba_stomping_detected
  3561 +
3131 3562 def close(self):
3132 3563 """
3133 3564 Close all the open files. This method must be called after usage, if
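`detect_vba_stomping` is a heuristic. Every identifier and string literal that pcodedmp reports in the P-code should also appear verbatim in the decompressed VBA source, so a single missing keyword is taken as a sign that the source was replaced. The following sketch reimplements only that comparison on plain strings, so it runs without pcodedmp. The function names are hypothetical, and the sample disassembly is made up, shaped after the `\t<mnemonic> <args>` lines the method parses.

```python
SYMBOL_MNEMONICS = ('ArgsCall', 'ArgsLd', 'St', 'Ld', 'MemSt', 'Label')


def pcode_keywords_sketch(pcode_text):
    """Hypothetical sketch: collect identifiers and string literals from P-code lines."""
    keywords = set()
    for line in pcode_text.splitlines():
        if not line.startswith('\t'):
            continue  # only indented lines hold instructions
        tokens = line.split(None, 1)
        if not tokens:
            continue
        mnemonic = tokens[0]
        args = tokens[1].strip() if len(tokens) == 2 else ''
        if mnemonic in SYMBOL_MNEMONICS and args:
            name = args.split(None, 1)[0]
            # names like id_FFFF have no literal counterpart in the source
            if not name.startswith('id_'):
                keywords.add(name)
        elif mnemonic == 'LitStr':
            parts = args.split(None, 1)
            if len(parts) == 2 and len(parts[1]) >= 2:
                # P-code shows "a"b" where the VBA source has "a""b": re-double quotes
                inner = parts[1][1:-1].replace('"', '""')
                keywords.add('"' + inner + '"')
    return keywords


def stomping_suspected_sketch(pcode_text, vba_source):
    """True if any keyword found in the P-code is missing from the VBA source."""
    return any(kw not in vba_source for kw in pcode_keywords_sketch(pcode_text))


pcode = '\tLitStr 0x0004 "calc"\n\tArgsCall Shell 0x0001\n'
print(stomping_suspected_sketch(pcode, 'Sub AutoOpen()\n    Shell "calc"\nEnd Sub'))  # False
print(stomping_suspected_sketch(pcode, 'Sub AutoOpen()\n    MsgBox "hi"\nEnd Sub'))   # True
```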
... ... @@ -3156,11 +3587,11 @@ class VBA_Parser_CLI(VBA_Parser):
3156 3587 super(VBA_Parser_CLI, self).__init__(*args, **kwargs)
3157 3588  
3158 3589  
3159   - def print_analysis(self, show_decoded_strings=False, deobfuscate=False):
  3590 + def run_analysis(self, show_decoded_strings=False, deobfuscate=False):
3160 3591 """
3161   - Analyze the provided VBA code, and print the results in a table
  3592 + Analyze the provided VBA code, without printing the results (yet)
  3593 + All results are stored in self.analysis_results.
3162 3594  
3163   - :param vba_code: str, VBA source code to be analyzed
3164 3595 :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content.
3165 3596 :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
3166 3597 :return: None
... ... @@ -3169,21 +3600,37 @@ class VBA_Parser_CLI(VBA_Parser):
3169 3600 if sys.stdout.isatty():
3170 3601 print('Analysis...\r', end='')
3171 3602 sys.stdout.flush()
3172   - results = self.analyze_macros(show_decoded_strings, deobfuscate)
  3603 + self.analyze_macros(show_decoded_strings, deobfuscate)
  3604 +
  3605 +
  3606 + def print_analysis(self, show_decoded_strings=False, deobfuscate=False):
  3607 + """
  3608 + print the analysis results in a table
  3609 +
  3610 + :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content.
  3611 + :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
  3612 + :return: None
  3613 + """
  3614 + results = self.analysis_results
3173 3615 if results:
3174   - t = prettytable.PrettyTable(('Type', 'Keyword', 'Description'))
3175   - t.align = 'l'
3176   - t.max_width['Type'] = 10
3177   - t.max_width['Keyword'] = 20
3178   - t.max_width['Description'] = 39
  3616 + t = tablestream.TableStream(column_width=(10, 20, 45),
  3617 + header_row=('Type', 'Keyword', 'Description'))
  3618 + COLOR_TYPE = {
  3619 + 'AutoExec': 'yellow',
  3620 + 'Suspicious': 'red',
  3621 + 'IOC': 'cyan',
  3622 + }
3179 3623 for kw_type, keyword, description in results:
3180 3624 # handle non printable strings:
3181 3625 if not is_printable(keyword):
3182 3626 keyword = repr(keyword)
3183 3627 if not is_printable(description):
3184 3628 description = repr(description)
3185   - t.add_row((kw_type, keyword, description))
3186   - print(t)
  3629 + color_type = COLOR_TYPE.get(kw_type, None)
  3630 + t.write_row((kw_type, keyword, description), colors=(color_type, None, None))
  3631 + t.close()
  3632 + if self.vba_stomping_detected:
  3633 + print('VBA Stomping detection is experimental: please report any false positive/negative at https://github.com/decalage2/oletools/issues')
3187 3634 else:
3188 3635 print('No suspicious keyword or IOC found.')
3189 3636  
... ... @@ -3204,10 +3651,29 @@ class VBA_Parser_CLI(VBA_Parser):
3204 3651 return [dict(type=kw_type, keyword=keyword, description=description)
3205 3652 for kw_type, keyword, description in self.analyze_macros(show_decoded_strings, deobfuscate)]
3206 3653  
  3654 + def colorize_keywords(self, vba_code):
  3655 + """
  3656 + Colorize keywords found during the VBA code analysis
  3657 + :param vba_code: str, VBA code to be colorized
  3658 + :return: str, VBA code including color tags for Colorclass
  3659 + """
  3660 + results = self.analysis_results
  3661 + if results:
  3662 + COLOR_TYPE = {
  3663 + 'AutoExec': 'yellow',
  3664 + 'Suspicious': 'red',
  3665 + 'IOC': 'cyan',
  3666 + }
  3667 + for kw_type, keyword, description in results:
  3668 + color_type = COLOR_TYPE.get(kw_type, None)
  3669 + if color_type:
  3670 + vba_code = vba_code.replace(keyword, '{auto%s}%s{/%s}' % (color_type, keyword, color_type))
  3671 + return vba_code
  3672 +
3207 3673 def process_file(self, show_decoded_strings=False,
3208 3674 display_code=True, hide_attributes=True,
3209 3675 vba_code_only=False, show_deobfuscated_code=False,
3210   - deobfuscate=False):
  3676 + deobfuscate=False, pcode=False):
3211 3677 """
3212 3678 Process a single file
3213 3679  
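`colorize_keywords` only builds a string containing colorclass markup such as `{autored}...{/red}`. Rendering to ANSI colours happens later, when the diff passes the result to `colorclass.Color`. The following sketch reproduces the tagging step with plain `str.replace`. The result tuples are a made-up sample, and the function name is hypothetical.

```python
COLOR_TYPE = {'AutoExec': 'yellow', 'Suspicious': 'red', 'IOC': 'cyan'}


def colorize_keywords_sketch(vba_code, results):
    """Hypothetical sketch: wrap AutoExec/Suspicious/IOC keywords in colorclass tags."""
    for kw_type, keyword, _description in results:
        color = COLOR_TYPE.get(kw_type)
        if color:
            vba_code = vba_code.replace(keyword, '{auto%s}%s{/%s}' % (color, keyword, color))
    return vba_code


results = [('AutoExec', 'AutoOpen', 'made-up description'),
           ('Suspicious', 'Shell', 'made-up description'),
           ('Hex String', '41', 'A')]  # no colour for this type
print(colorize_keywords_sketch('Sub AutoOpen()\n    Shell "calc"\nEnd Sub', results))
```

Because `str.replace` matches substrings, a short keyword can also hit longer identifiers, or text inside tags inserted earlier (for example a keyword `red`). A word-boundary regex would be stricter, at the cost of some speed.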
... ... @@ -3219,6 +3685,7 @@ class VBA_Parser_CLI(VBA_Parser):
3219 3685 otherwise each module is analyzed separately (old behaviour)
3220 3686 :param hide_attributes: bool, if True the first lines starting with "Attribute VB" are hidden (default)
3221 3687 :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
  3688 + :param bool pcode: if True, call pcodedmp to disassemble P-code and display it
3222 3689 """
3223 3690 #TODO: replace print by writing to a provided output file (sys.stdout by default)
3224 3691 # fix conflicting parameters:
... ... @@ -3234,6 +3701,8 @@ class VBA_Parser_CLI(VBA_Parser):
3234 3701 #TODO: handle olefile errors, when an OLE file is malformed
3235 3702 print('Type: %s'% self.type)
3236 3703 if self.detect_vba_macros():
  3704 + # run analysis before displaying VBA code, in order to colorize found keywords
  3705 + self.run_analysis(show_decoded_strings=show_decoded_strings, deobfuscate=deobfuscate)
3237 3706 #print 'Contains VBA Macros:'
3238 3707 for (subfilename, stream_path, vba_filename, vba_code) in self.extract_all_macros():
3239 3708 if hide_attributes:
... ... @@ -3251,21 +3720,30 @@ class VBA_Parser_CLI(VBA_Parser):
3251 3720 print('(empty macro)')
3252 3721 else:
3253 3722 # check if the VBA code contains special characters such as backspace (issue #358)
3254   - if b'\x08' in vba_code_filtered:
  3723 + if '\x08' in vba_code_filtered:
3255 3724 log.warning('The VBA code contains special characters such as backspace, that may be used for obfuscation.')
3256 3725 if sys.stdout.isatty():
3257 3726 # if the standard output is the console, we'll display colors
3258 3727 backspace = colorclass.Color('{autored}\\x08{/red}')
3259 3728 else:
3260   - backspace = b'\x08'
  3729 + backspace = '\x08'
3261 3730 # replace backspace by "\x08" for display
3262   - vba_code_filtered = vba_code_filtered.replace(b'\x08', backspace)
  3731 + vba_code_filtered = vba_code_filtered.replace('\x08', backspace)
  3732 + try:
  3733 + # Colorize the interesting keywords in the output:
  3734 + # (unless the output is redirected to a file)
  3735 + if sys.stdout.isatty():
  3736 + vba_code_filtered = colorclass.Color(self.colorize_keywords(vba_code_filtered))
  3737 + except UnicodeError:
  3738 + # TODO better handling of Unicode
  3739 + log.error('Unicode conversion to be fixed before colorizing the output')
3263 3740 print(vba_code_filtered)
3264 3741 for (subfilename, stream_path, form_string) in self.extract_form_strings():
3265   - print('-' * 79)
3266   - print('VBA FORM STRING IN %r - OLE stream: %r' % (subfilename, stream_path))
3267   - print('- ' * 39)
3268   - print(form_string)
  3742 + if form_string is not None:
  3743 + print('-' * 79)
  3744 + print('VBA FORM STRING IN %r - OLE stream: %r' % (subfilename, stream_path))
  3745 + print('- ' * 39)
  3746 + print(form_string)
3269 3747 try:
3270 3748 for (subfilename, stream_path, form_variables) in self.extract_form_strings_extended():
3271 3749 if form_variables is not None:
... ... @@ -3277,6 +3755,11 @@ class VBA_Parser_CLI(VBA_Parser):
3277 3755 # display the exception with full stack trace for debugging
3278 3756 log.info('Error parsing form: %s' % exc)
3279 3757 log.debug('Traceback:', exc_info=True)
  3758 + if pcode:
  3759 + print('-' * 79)
  3760 + print('P-CODE disassembly:')
  3761 + pcode = self.extract_pcode()
  3762 + print(pcode)
3280 3763  
3281 3764 if not vba_code_only:
3282 3765 # analyse the code from all modules at once:
... ... @@ -3398,16 +3881,6 @@ class VBA_Parser_CLI(VBA_Parser):
3398 3881  
3399 3882 line = '%-12s %s' % (flags, self.filename)
3400 3883 print(line)
3401   -
3402   - # old table display:
3403   - # macros = autoexec = suspicious = iocs = hexstrings = 'no'
3404   - # if nb_macros: macros = 'YES:%d' % nb_macros
3405   - # if nb_autoexec: autoexec = 'YES:%d' % nb_autoexec
3406   - # if nb_suspicious: suspicious = 'YES:%d' % nb_suspicious
3407   - # if nb_iocs: iocs = 'YES:%d' % nb_iocs
3408   - # if nb_hexstrings: hexstrings = 'YES:%d' % nb_hexstrings
3409   - # # 2nd line = info
3410   - # print '%-8s %-7s %-7s %-7s %-7s %-7s' % (self.type, macros, autoexec, suspicious, iocs, hexstrings)
3411 3884 except Exception as exc:
3412 3885 # display the exception with full stack trace for debugging only
3413 3886 log.debug('Error processing file %s (%s)' % (self.filename, exc),
... ... @@ -3415,20 +3888,6 @@ class VBA_Parser_CLI(VBA_Parser):
3415 3888 raise ProcessingError(self.filename, exc)
3416 3889  
3417 3890  
3418   - # t = prettytable.PrettyTable(('filename', 'type', 'macros', 'autoexec', 'suspicious', 'ioc', 'hexstrings'),
3419   - # header=False, border=False)
3420   - # t.align = 'l'
3421   - # t.max_width['filename'] = 30
3422   - # t.max_width['type'] = 10
3423   - # t.max_width['macros'] = 6
3424   - # t.max_width['autoexec'] = 6
3425   - # t.max_width['suspicious'] = 6
3426   - # t.max_width['ioc'] = 6
3427   - # t.max_width['hexstrings'] = 6
3428   - # t.add_row((filename, ftype, macros, autoexec, suspicious, iocs, hexstrings))
3429   - # print t
3430   -
3431   -
3432 3891 #=== MAIN =====================================================================
3433 3892  
3434 3893 def parse_args(cmd_line_args=None):
... ... @@ -3452,7 +3911,11 @@ def parse_args(cmd_line_args=None):
3452 3911 parser.add_option("-r", action="store_true", dest="recursive",
3453 3912 help='find files recursively in subdirectories.')
3454 3913 parser.add_option("-z", "--zip", dest='zip_password', type='str', default=None,
3455   - help='if the file is a zip archive, open all files from it, using the provided password (requires Python 2.6+)')
  3914 + help='if the file is a zip archive, open all files from it, using the provided password.')
  3915 + parser.add_option("-p", "--password", type='str', action='append',
  3916 + default=[],
  3917 + help='if encrypted office files are encountered, try '
  3918 + 'decryption with this password. May be repeated.')
3456 3919 parser.add_option("-f", "--zipfname", dest='zip_fname', type='str', default='*',
3457 3920 help='if the file is a zip archive, file(s) to be opened within the zip. Wildcards * and ? are supported. (default:*)')
3458 3921 # output mode; could make this even simpler with add_option(type='choice') but that would make
... ... @@ -3484,12 +3947,17 @@ def parse_args(cmd_line_args=None):
3484 3947 help="Attempt to deobfuscate VBA expressions (slow)")
3485 3948 parser.add_option('--relaxed', dest="relaxed", action="store_true", default=False,
3486 3949 help="Do not raise errors if opening of substream fails")
  3950 + parser.add_option('--pcode', dest="pcode", action="store_true", default=False,
  3951 + help="Disassemble and display the P-code (using pcodedmp)")
3487 3952  
3488 3953 (options, args) = parser.parse_args(cmd_line_args)
3489 3954  
3490 3955 # Print help if no arguments are passed
3491 3956 if len(args) == 0:
3492   - print('olevba %s - http://decalage.info/python/oletools' % __version__)
  3957 + # print banner with version
  3958 + python_version = '%d.%d.%d' % sys.version_info[0:3]
  3959 + print('olevba %s on Python %s - http://decalage.info/python/oletools' %
  3960 + (__version__, python_version))
3493 3961 print(__doc__)
3494 3962 parser.print_help()
3495 3963 sys.exit(RETURN_WRONG_ARGS)
... ... @@ -3499,6 +3967,112 @@ def parse_args(cmd_line_args=None):
3499 3967 return options, args
3500 3968  
3501 3969  
  3970 +def process_file(filename, data, container, options, crypto_nesting=0):
  3971 + """
  3972 + Part of main function that processes a single file.
  3973 +
  3974 + This handles exceptions and encryption.
  3975 +
  3976 + Returns a single code summarizing the status of processing of this file
  3977 + """
  3978 + try:
  3979 + # Open the file
  3980 + vba_parser = VBA_Parser_CLI(filename, data=data, container=container,
  3981 + relaxed=options.relaxed)
  3982 +
  3983 + if options.output_mode == 'detailed':
  3984 + # fully detailed output
  3985 + vba_parser.process_file(show_decoded_strings=options.show_decoded_strings,
  3986 + display_code=options.display_code,
  3987 + hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,
  3988 + show_deobfuscated_code=options.show_deobfuscated_code,
  3989 + deobfuscate=options.deobfuscate, pcode=options.pcode)
  3990 + elif options.output_mode == 'triage':
  3991 + # summarized output for triage:
  3992 + vba_parser.process_file_triage(show_decoded_strings=options.show_decoded_strings,
  3993 + deobfuscate=options.deobfuscate)
  3994 + elif options.output_mode == 'json':
  3995 + print_json(
  3996 + vba_parser.process_file_json(show_decoded_strings=options.show_decoded_strings,
  3997 + display_code=options.display_code,
  3998 + hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,
  3999 + show_deobfuscated_code=options.show_deobfuscated_code,
  4000 + deobfuscate=options.deobfuscate))
  4001 + else: # (should be impossible)
  4002 + raise ValueError('unexpected output mode: "{0}"!'.format(options.output_mode))
  4003 +
  4004 + # even if processing succeeds, file might still be encrypted
  4005 + log.debug('Checking for encryption (normal)')
  4006 + if not crypto.is_encrypted(filename):
  4007 + log.debug('no encryption detected')
  4008 + return RETURN_OK
  4009 + except Exception as exc:
  4010 + log.debug('Checking for encryption (after exception)')
  4011 + if crypto.is_encrypted(filename):
  4012 + pass # deal with this below
  4013 + else:
  4014 + if isinstance(exc, (SubstreamOpenError, UnexpectedDataError)):
  4015 + if options.output_mode in ('triage', 'unspecified'):
  4016 + print('%-12s %s - Error opening substream or unexpected ' \
  4017 + 'content' % ('?', filename))
  4018 + elif options.output_mode == 'json':
  4019 + print_json(file=filename, type='error',
  4020 + error=type(exc).__name__, message=str(exc))
  4021 + else:
  4022 + log.exception('Error opening substream or unexpected '
  4023 + 'content in %s' % filename)
  4024 + return RETURN_OPEN_ERROR
  4025 + elif isinstance(exc, FileOpenError):
  4026 + if options.output_mode in ('triage', 'unspecified'):
  4027 + print('%-12s %s - File format not supported' % ('?', filename))
  4028 + elif options.output_mode == 'json':
  4029 + print_json(file=filename, type='error',
  4030 + error=type(exc).__name__, message=str(exc))
  4031 + else:
  4032 + log.exception('Failed to open %s -- probably not supported!' % filename)
  4033 + return RETURN_OPEN_ERROR
  4034 + elif isinstance(exc, ProcessingError):
  4035 + if options.output_mode in ('triage', 'unspecified'):
  4036 + print('%-12s %s - %s' % ('!ERROR', filename, exc.orig_exc))
  4037 + elif options.output_mode == 'json':
  4038 + print_json(file=filename, type='error',
  4039 + error=type(exc).__name__,
  4040 + message=str(exc.orig_exc))
  4041 + else:
  4042 + log.exception('Error processing file %s (%s)!'
  4043 + % (filename, exc.orig_exc))
  4044 + return RETURN_PARSE_ERROR
  4045 + else:
  4046 + raise # let caller deal with this
  4047 +
  4048 + # we reach this point only if file is encrypted
  4049 + # check if this is an encrypted file in an encrypted file in an ...
  4050 + if crypto_nesting >= crypto.MAX_NESTING_DEPTH:
  4051 + raise crypto.MaxCryptoNestingReached(crypto_nesting, filename)
  4052 +
  4053 + decrypted_file = None
  4054 + try:
  4055 + log.debug('Checking encryption passwords {}'.format(options.password))
  4056 + passwords = options.password + crypto.DEFAULT_PASSWORDS
  4057 + decrypted_file = crypto.decrypt(filename, passwords)
  4058 + if not decrypted_file:
  4059 + log.error('Decrypt failed, run with debug output to get details')
  4060 + raise crypto.WrongEncryptionPassword(filename)
  4061 + log.info('Working on decrypted file')
  4062 + return process_file(decrypted_file, data, container or filename,
  4063 + options, crypto_nesting+1)
  4064 + except Exception:
  4065 + raise
  4066 + finally: # clean up
  4067 + try:
  4068 + log.debug('Removing crypt temp file {}'.format(decrypted_file))
  4069 + os.unlink(decrypted_file)
  4070 + except Exception: # e.g. file does not exist or is None
  4071 + pass
  4072 + # no idea what to return now
  4073 + raise Exception('Programming error -- should never have reached this!')
  4074 +
  4075 +
3502 4076 def main(cmd_line_args=None):
3503 4077 """
3504 4078 Main function, called when olevba is run from the command line
... ... @@ -3517,52 +4091,60 @@ def main(cmd_line_args=None):
3517 4091 url='http://decalage.info/python/oletools',
3518 4092 type='MetaInformation', _json_is_first=True)
3519 4093 else:
3520   - print('olevba %s - http://decalage.info/python/oletools' % __version__)
  4094 + # print banner with version
  4095 + python_version = '%d.%d.%d' % sys.version_info[0:3]
  4096 + print('olevba %s on Python %s - http://decalage.info/python/oletools' %
  4097 + (__version__, python_version))
3521 4098  
3522 4099 logging.basicConfig(level=options.loglevel, format='%(levelname)-8s %(message)s')
3523 4100 # enable logging in the modules:
3524 4101 enable_logging()
3525 4102  
3526   - # Old display with number of items detected:
3527   - # print '%-8s %-7s %-7s %-7s %-7s %-7s' % ('Type', 'Macros', 'AutoEx', 'Susp.', 'IOCs', 'HexStr')
3528   - # print '%-8s %-7s %-7s %-7s %-7s %-7s' % ('-'*8, '-'*7, '-'*7, '-'*7, '-'*7, '-'*7)
3529   -
3530 4103 # with the option --reveal, make sure --deobf is also enabled:
3531 4104 if options.show_deobfuscated_code and not options.deobfuscate:
3532   - log.info('set --deobf because --reveal was set')
  4105 + log.debug('set --deobf because --reveal was set')
3533 4106 options.deobfuscate = True
3534 4107 if options.output_mode == 'triage' and options.show_deobfuscated_code:
3535   - log.info('ignoring option --reveal in triage output mode')
  4108 + log.debug('ignoring option --reveal in triage output mode')
  4109 +
  4110 + # gather info on all files that must be processed
  4111 + # ignore directory names stored in zip files:
  4112 + all_input_info = tuple((container, filename, data) for
  4113 + container, filename, data in xglob.iter_files(
  4114 + args, recursive=options.recursive,
  4115 + zip_password=options.zip_password,
  4116 + zip_fname=options.zip_fname)
  4117 + if not (container and filename.endswith('/')))
  4118 +
  4119 + # specify output mode if options -t, -d and -j were not specified
  4120 + if options.output_mode == 'unspecified':
  4121 + if len(all_input_info) == 1:
  4122 + options.output_mode = 'detailed'
  4123 + else:
  4124 + options.output_mode = 'triage'
3536 4125  
3537   - # Column headers (do not know how many files there will be yet, so if no output_mode
3538   - # was specified, we will print triage for first file --> need these headers)
3539   - if options.output_mode in ('triage', 'unspecified'):
  4126 + # Column headers for triage mode
  4127 + if options.output_mode == 'triage':
3540 4128 print('%-12s %-65s' % ('Flags', 'Filename'))
3541 4129 print('%-12s %-65s' % ('-' * 11, '-' * 65))
3542 4130  
3543 4131 previous_container = None
3544 4132 count = 0
3545 4133 container = filename = data = None
3546   - vba_parser = None
3547 4134 return_code = RETURN_OK
3548 4135 try:
3549   - for container, filename, data in xglob.iter_files(args, recursive=options.recursive,
3550   - zip_password=options.zip_password, zip_fname=options.zip_fname):
3551   - # ignore directory names stored in zip files:
3552   - if container and filename.endswith('/'):
3553   - continue
3554   -
  4136 + for container, filename, data in all_input_info:
3555 4137 # handle errors from xglob
3556 4138 if isinstance(data, Exception):
3557 4139 if isinstance(data, PathNotFoundException):
3558   - if options.output_mode in ('triage', 'unspecified'):
  4140 + if options.output_mode == 'triage':
3559 4141 print('%-12s %s - File not found' % ('?', filename))
3560 4142 elif options.output_mode != 'json':
3561 4143 log.error('Given path %r does not exist!' % filename)
3562 4144 return_code = RETURN_FILE_NOT_FOUND if return_code == 0 \
3563 4145 else RETURN_SEVERAL_ERRS
3564 4146 else:
3565   - if options.output_mode in ('triage', 'unspecified'):
  4147 + if options.output_mode == 'triage':
3566 4148 print('%-12s %s - Failed to read from zip file %s' % ('?', filename, container))
3567 4149 elif options.output_mode != 'json':
3568 4150 log.error('Exception opening/reading %r from zip file %r: %s'
... ... @@ -3574,107 +4156,42 @@ def main(cmd_line_args=None):
3574 4156 error=type(data).__name__, message=str(data))
3575 4157 continue
3576 4158  
3577   - try:
3578   - # close the previous file if analyzing several:
3579   - # (this must be done here to avoid closing the file if there is only 1,
3580   - # to fix issue #219)
3581   - if vba_parser is not None:
3582   - vba_parser.close()
3583   - # Open the file
3584   - vba_parser = VBA_Parser_CLI(filename, data=data, container=container,
3585   - relaxed=options.relaxed)
3586   -
3587   - if options.output_mode == 'detailed':
3588   - # fully detailed output
3589   - vba_parser.process_file(show_decoded_strings=options.show_decoded_strings,
3590   - display_code=options.display_code,
3591   - hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,
3592   - show_deobfuscated_code=options.show_deobfuscated_code,
3593   - deobfuscate=options.deobfuscate)
3594   - elif options.output_mode in ('triage', 'unspecified'):
3595   - # print container name when it changes:
3596   - if container != previous_container:
3597   - if container is not None:
3598   - print('\nFiles in %s:' % container)
3599   - previous_container = container
3600   - # summarized output for triage:
3601   - vba_parser.process_file_triage(show_decoded_strings=options.show_decoded_strings,
3602   - deobfuscate=options.deobfuscate)
3603   - elif options.output_mode == 'json':
3604   - print_json(
3605   - vba_parser.process_file_json(show_decoded_strings=options.show_decoded_strings,
3606   - display_code=options.display_code,
3607   - hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,
3608   - show_deobfuscated_code=options.show_deobfuscated_code,
3609   - deobfuscate=options.deobfuscate))
3610   - else: # (should be impossible)
3611   - raise ValueError('unexpected output mode: "{0}"!'.format(options.output_mode))
3612   - count += 1
3613   -
3614   - except (SubstreamOpenError, UnexpectedDataError) as exc:
3615   - if options.output_mode in ('triage', 'unspecified'):
3616   - print('%-12s %s - Error opening substream or unexpected ' \
3617   - 'content' % ('?', filename))
3618   - elif options.output_mode == 'json':
3619   - print_json(file=filename, type='error',
3620   - error=type(exc).__name__, message=str(exc))
3621   - else:
3622   - log.exception('Error opening substream or unexpected '
3623   - 'content in %s' % filename)
3624   - return_code = RETURN_OPEN_ERROR if return_code == 0 \
3625   - else RETURN_SEVERAL_ERRS
3626   - except FileOpenError as exc:
3627   - if options.output_mode in ('triage', 'unspecified'):
3628   - print('%-12s %s - File format not supported' % ('?', filename))
3629   - elif options.output_mode == 'json':
3630   - print_json(file=filename, type='error',
3631   - error=type(exc).__name__, message=str(exc))
3632   - else:
3633   - log.exception('Failed to open %s -- probably not supported!' % filename)
3634   - return_code = RETURN_OPEN_ERROR if return_code == 0 \
3635   - else RETURN_SEVERAL_ERRS
3636   - except ProcessingError as exc:
3637   - if options.output_mode in ('triage', 'unspecified'):
3638   - print('%-12s %s - %s' % ('!ERROR', filename, exc.orig_exc))
3639   - elif options.output_mode == 'json':
3640   - print_json(file=filename, type='error',
3641   - error=type(exc).__name__,
3642   - message=str(exc.orig_exc))
3643   - else:
3644   - log.exception('Error processing file %s (%s)!'
3645   - % (filename, exc.orig_exc))
3646   - return_code = RETURN_PARSE_ERROR if return_code == 0 \
3647   - else RETURN_SEVERAL_ERRS
3648   - except FileIsEncryptedError as exc:
3649   - if options.output_mode in ('triage', 'unspecified'):
3650   - print('%-12s %s - File is encrypted' % ('!ERROR', filename))
3651   - elif options.output_mode == 'json':
3652   - print_json(file=filename, type='error',
3653   - error=type(exc).__name__, message=str(exc))
3654   - else:
3655   - log.exception('File %s is encrypted!' % (filename))
3656   - return_code = RETURN_ENCRYPTED if return_code == 0 \
3657   - else RETURN_SEVERAL_ERRS
3658   - # Here we do not close the vba_parser, because process_file may need it below.
  4159 + if options.output_mode == 'triage':
  4160 + # print container name when it changes:
  4161 + if container != previous_container:
  4162 + if container is not None:
  4163 + print('\nFiles in %s:' % container)
  4164 + previous_container = container
  4165 +
  4166 + # process the file, handling errors and encryption
  4167 + curr_return_code = process_file(filename, data, container, options)
  4168 + count += 1
  4169 +
  4170 + # adjust overall return code
  4171 + if curr_return_code == RETURN_OK:
  4172 + continue # do not modify overall return code
  4173 + if return_code == RETURN_OK:
  4174 + return_code = curr_return_code # first error return code
  4175 + else:
  4176 + return_code = RETURN_SEVERAL_ERRS # several errors
3659 4177  
3660 4178 if options.output_mode == 'triage':
3661 4179 print('\n(Flags: OpX=OpenXML, XML=Word2003XML, FlX=FlatOPC XML, MHT=MHTML, TXT=Text, M=Macros, ' \
3662 4180 'A=Auto-executable, S=Suspicious keywords, I=IOCs, H=Hex strings, ' \
3663 4181 'B=Base64 strings, D=Dridex strings, V=VBA strings, ?=Unknown)\n')
3664 4182  
3665   - if count == 1 and options.output_mode == 'unspecified':
3666   - # if options -t, -d and -j were not specified and it's a single file, print details:
3667   - vba_parser.process_file(show_decoded_strings=options.show_decoded_strings,
3668   - display_code=options.display_code,
3669   - hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,
3670   - show_deobfuscated_code=options.show_deobfuscated_code,
3671   - deobfuscate=options.deobfuscate)
3672   -
3673 4183 if options.output_mode == 'json':
3674 4184 # print last json entry (a last one without a comma) and closing ]
3675 4185 print_json(type='MetaInformation', return_code=return_code,
3676 4186 n_processed=count, _json_is_last=True)
3677 4187  
  4188 + except crypto.CryptoErrorBase as exc:
  4189 + log.exception('Problems with encryption in main: {}'.format(exc),
  4190 + exc_info=True)
  4191 + if return_code == RETURN_OK:
  4192 + return_code = RETURN_ENCRYPTED
  4193 + else:
  4194 + return_code = RETURN_SEVERAL_ERRS
3678 4195 except Exception as exc:
3679 4196 # some unexpected error, maybe some of the types caught in except clauses
3680 4197 # above were not sufficient. This is very bad, so log complete trace at exception level
... ...
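The reworked loop in `main()` folds the status code that `process_file()` returns for each input into one overall exit code. A file that succeeds leaves the result unchanged, the first error is reported with its own code, and any further error collapses the result to `RETURN_SEVERAL_ERRS`. The `except crypto.CryptoErrorBase` branch is meant to follow the same rule. Below is a minimal sketch of that rule. It assumes a hypothetical helper, `combine_return_codes`, and uses placeholder `RETURN_*` values, because olevba's real constants are not shown in this diff.

```python
# Placeholder status codes for illustration only; olevba defines its own
# RETURN_* constants, whose exact values are not shown in this diff.
RETURN_OK = 0
RETURN_OPEN_ERROR = 5
RETURN_PARSE_ERROR = 6
RETURN_SEVERAL_ERRS = 7


def combine_return_codes(codes):
    """Fold per-file status codes into one overall code (hypothetical helper).

    Mirrors the loop in main(): successes leave the result unchanged, the
    first error code is kept as-is, and any further error collapses the
    result to RETURN_SEVERAL_ERRS.
    """
    overall = RETURN_OK
    for code in codes:
        if code == RETURN_OK:
            continue  # a successful file never changes the overall code
        if overall == RETURN_OK:
            overall = code  # first error: report it specifically
        else:
            overall = RETURN_SEVERAL_ERRS  # two or more errors
    return overall


if __name__ == '__main__':
    print(combine_return_codes([RETURN_OK, RETURN_OK]))
    print(combine_return_codes([RETURN_OK, RETURN_PARSE_ERROR]))
    print(combine_return_codes([RETURN_OPEN_ERROR, RETURN_OK, RETURN_PARSE_ERROR]))
```

With the sketch, a caller can distinguish "exactly one kind of failure" from "several failures" using only the process exit status. Under this rule, two failures with the same code still count as several errors.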
oletools/olevba3.py
1 1 #!/usr/bin/env python
2   -"""
3   -olevba3.py
4 2  
5   -olevba is a script to parse OLE and OpenXML files such as MS Office documents
6   -(e.g. Word, Excel), to extract VBA Macro code in clear text, deobfuscate
7   -and analyze malicious macros.
  3 +# olevba3 is a stub that redirects to olevba.py, for backwards compatibility
8 4  
9   -olevba3 is the version of olevba that runs on Python 3.x.
  5 +import sys, os, warnings
10 6  
11   -Supported formats:
12   -- Word 97-2003 (.doc, .dot), Word 2007+ (.docm, .dotm)
13   -- Excel 97-2003 (.xls), Excel 2007+ (.xlsm, .xlsb)
14   -- PowerPoint 97-2003 (.ppt), PowerPoint 2007+ (.pptm, .ppsm)
15   -- Word/PowerPoint 2007+ XML (aka Flat OPC)
16   -- Word 2003 XML (.xml)
17   -- Word/Excel Single File Web Page / MHTML (.mht)
18   -- Publisher (.pub)
19   -- raises an error if run with files encrypted using MS Crypto API RC4
20   -
21   -Author: Philippe Lagadec - http://www.decalage.info
22   -License: BSD, see source code or documentation
23   -
24   -olevba is part of the python-oletools package:
25   -http://www.decalage.info/python/oletools
26   -
27   -olevba is based on source code from officeparser by John William Davison
28   -https://github.com/unixfreak0037/officeparser
29   -"""
30   -
31   -# === LICENSE ==================================================================
32   -
33   -# olevba is copyright (c) 2014-2018 Philippe Lagadec (http://www.decalage.info)
34   -# All rights reserved.
35   -#
36   -# Redistribution and use in source and binary forms, with or without modification,
37   -# are permitted provided that the following conditions are met:
38   -#
39   -# * Redistributions of source code must retain the above copyright notice, this
40   -# list of conditions and the following disclaimer.
41   -# * Redistributions in binary form must reproduce the above copyright notice,
42   -# this list of conditions and the following disclaimer in the documentation
43   -# and/or other materials provided with the distribution.
44   -#
45   -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
46   -# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
47   -# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
48   -# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
49   -# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
50   -# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
51   -# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
52   -# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
53   -# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
54   -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
55   -
56   -
57   -# olevba contains modified source code from the officeparser project, published
58   -# under the following MIT License (MIT):
59   -#
60   -# officeparser is copyright (c) 2014 John William Davison
61   -#
62   -# Permission is hereby granted, free of charge, to any person obtaining a copy
63   -# of this software and associated documentation files (the "Software"), to deal
64   -# in the Software without restriction, including without limitation the rights
65   -# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
66   -# copies of the Software, and to permit persons to whom the Software is
67   -# furnished to do so, subject to the following conditions:
68   -#
69   -# The above copyright notice and this permission notice shall be included in all
70   -# copies or substantial portions of the Software.
71   -#
72   -# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
73   -# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
74   -# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
75   -# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
76   -# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
77   -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
78   -# SOFTWARE.
79   -
80   -from __future__ import print_function
81   -
82   -
83   -#------------------------------------------------------------------------------
84   -# CHANGELOG:
85   -# 2014-08-05 v0.01 PL: - first version based on officeparser code
86   -# 2014-08-14 v0.02 PL: - fixed bugs in code, added license from officeparser
87   -# 2014-08-15 PL: - fixed incorrect value check in projecthelpfilepath Record
88   -# 2014-08-15 v0.03 PL: - refactored extract_macros to support OpenXML formats
89   -# and to find the VBA project root anywhere in the file
90   -# 2014-11-29 v0.04 PL: - use olefile instead of OleFileIO_PL
91   -# 2014-12-05 v0.05 PL: - refactored most functions into a class, new API
92   -# - added detect_vba_macros
93   -# 2014-12-10 v0.06 PL: - hide first lines with VB attributes
94   -# - detect auto-executable macros
95   -# - ignore empty macros
96   -# 2014-12-14 v0.07 PL: - detect_autoexec() is now case-insensitive
97   -# 2014-12-15 v0.08 PL: - improved display for empty macros
98   -# - added pattern extraction
99   -# 2014-12-25 v0.09 PL: - added suspicious keywords detection
100   -# 2014-12-27 v0.10 PL: - added OptionParser, main and process_file
101   -# - uses xglob to scan several files with wildcards
102   -# - option -r to recurse subdirectories
103   -# - option -z to scan files in password-protected zips
104   -# 2015-01-02 v0.11 PL: - improved filter_vba to detect colons
105   -# 2015-01-03 v0.12 PL: - fixed detect_patterns to detect all patterns
106   -# - process_file: improved display, shows container file
107   -# - improved list of executable file extensions
108   -# 2015-01-04 v0.13 PL: - added several suspicious keywords, improved display
109   -# 2015-01-08 v0.14 PL: - added hex strings detection and decoding
110   -# - fixed issue #2, decoding VBA stream names using
111   -# specified codepage and unicode stream names
112   -# 2015-01-11 v0.15 PL: - added new triage mode, options -t and -d
113   -# 2015-01-16 v0.16 PL: - fix for issue #3 (exception when module name="text")
114   -# - added several suspicious keywords
115   -# - added option -i to analyze VBA source code directly
116   -# 2015-01-17 v0.17 PL: - removed .com from the list of executable extensions
117   -# - added scan_vba to run all detection algorithms
118   -# - decoded hex strings are now also scanned + reversed
119   -# 2015-01-23 v0.18 PL: - fixed issue #3, case-insensitive search in code_modules
120   -# 2015-01-24 v0.19 PL: - improved the detection of IOCs obfuscated with hex
121   -# strings and StrReverse
122   -# 2015-01-26 v0.20 PL: - added option --hex to show all hex strings decoded
123   -# 2015-01-29 v0.21 PL: - added Dridex obfuscation decoding
124   -# - improved display, shows obfuscation name
125   -# 2015-02-01 v0.22 PL: - fixed issue #4: regex for URL, e-mail and exe filename
126   -# - added Base64 obfuscation decoding (contribution from
127   -# @JamesHabben)
128   -# 2015-02-03 v0.23 PL: - triage now uses VBA_Scanner results, shows Base64 and
129   -# Dridex strings
130   -# - exception handling in detect_base64_strings
131   -# 2015-02-07 v0.24 PL: - renamed option --hex to --decode, fixed display
132   -# - display exceptions with stack trace
133   -# - added several suspicious keywords
134   -# - improved Base64 detection and decoding
135   -# - fixed triage mode not to scan attrib lines
136   -# 2015-03-04 v0.25 PL: - added support for Word 2003 XML
137   -# 2015-03-22 v0.26 PL: - added suspicious keywords for sandboxing and
138   -# virtualisation detection
139   -# 2015-05-06 v0.27 PL: - added support for MHTML files with VBA macros
140   -# (issue #10 reported by Greg from SpamStopsHere)
141   -# 2015-05-24 v0.28 PL: - improved support for MHTML files with modified header
142   -# (issue #11 reported by Thomas Chopitea)
143   -# 2015-05-26 v0.29 PL: - improved MSO files parsing, taking into account
144   -# various data offsets (issue #12)
145   -# - improved detection of MSO files, avoiding incorrect
146   -# parsing errors (issue #7)
147   -# 2015-05-29 v0.30 PL: - added suspicious keywords suggested by @ozhermit,
148   -# Davy Douhine (issue #9), issue #13
149   -# 2015-06-16 v0.31 PL: - added generic VBA expression deobfuscation (chr,asc,etc)
150   -# 2015-06-19 PL: - added options -a, -c, --each, --attr
151   -# 2015-06-21 v0.32 PL: - always display decoded strings which are printable
152   -# - fix VBA_Scanner.scan to return raw strings, not repr()
153   -# 2015-07-09 v0.40 PL: - removed usage of sys.stderr which causes issues
154   -# 2015-07-12 PL: - added Hex function decoding to VBA Parser
155   -# 2015-07-13 PL: - added Base64 function decoding to VBA Parser
156   -# 2015-09-06 PL: - improved VBA_Parser, refactored the main functions
157   -# 2015-09-13 PL: - moved main functions to a class VBA_Parser_CLI
158   -# - fixed issue when analysis was done twice
159   -# 2015-09-15 PL: - remove duplicate IOCs from results
160   -# 2015-09-16 PL: - join long VBA lines ending with underscore before scan
161   -# - disabled unused option --each
162   -# 2015-09-22 v0.41 PL: - added new option --reveal
163   -# - added suspicious strings for PowerShell.exe options
164   -# 2015-10-09 v0.42 PL: - VBA_Parser: split each format into a separate method
165   -# 2015-10-10 PL: - added support for text files with VBA source code
166   -# 2015-11-17 PL: - fixed bug with --decode option
167   -# 2015-12-16 PL: - fixed bug in main (no options input anymore)
168   -# - improved logging, added -l option
169   -# 2016-01-31 PL: - fixed issue #31 in VBA_Parser.open_mht
170   -# - fixed issue #32 by monkeypatching email.feedparser
171   -# 2016-02-07 PL: - KeyboardInterrupt is now raised properly
172   -# 2016-02-20 v0.43 PL: - fixed issue #34 in the VBA parser and vba_chr
173   -# 2016-02-29 PL: - added Workbook_Activate to suspicious keywords
174   -# 2016-03-08 v0.44 PL: - added VBA Form strings extraction and analysis
175   -# 2016-03-04 v0.45 CH: - added JSON output (by Christian Herdtweck)
176   -# 2016-03-16 CH: - added option --no-deobfuscate (temporary)
177   -# 2016-04-19 v0.46 PL: - new option --deobf instead of --no-deobfuscate
178   -# - updated suspicious keywords
179   -# 2016-05-04 v0.47 PL: - look for VBA code in any stream including orphans
180   -# 2016-04-28 CH: - return an exit code depending on the results
181   -# - improved error and exception handling
182   -# - improved JSON output
183   -# 2016-05-12 CH: - added support for PowerPoint 97-2003 files
184   -# 2016-06-06 CH: - improved handling of unicode VBA module names
185   -# 2016-06-07 CH: - added option --relaxed, stricter parsing by default
186   -# 2016-06-12 v0.50 PL: - fixed small bugs in VBA parsing code
187   -# 2016-07-01 PL: - fixed issue #58 with format() to support Python 2.6
188   -# 2016-07-29 CH: - fixed several bugs including #73 (Mac Roman encoding)
189   -# 2016-08-31 PL: - added autoexec keyword InkPicture_Painted
190   -# - detect_autoexec now returns the exact keyword found
191   -# 2016-09-05 PL: - added autoexec keywords for MS Publisher (.pub)
192   -# 2016-09-06 PL: - fixed issue #20, is_zipfile on Python 2.6
193   -# 2016-09-12 PL: - enabled packrat to improve pyparsing performance
194   -# 2016-10-25 PL: - fixed raise and print statements for Python 3
195   -# 2016-11-03 v0.51 PL: - added EnumDateFormats and EnumSystemLanguageGroupsW
196   -# 2017-02-07 PL: - temporary fix for issue #132
197   -# - added keywords for Mac-specific macros (issue #130)
198   -# 2017-03-08 PL: - fixed absolute imports
199   -# 2017-03-16 PL: - fixed issues #148 and #149 for option --reveal
200   -# 2017-05-19 PL: - added enable_logging to fix issue #154
201   -# 2017-05-31 c1fe: - PR #135 fixing issue #132 for some Mac files
202   -# 2017-06-08 PL: - fixed issue #122 Chr() with negative numbers
203   -# 2017-06-15 PL: - deobfuscation line by line to handle large files
204   -# 2017-07-11 v0.52 PL: - raise exception instead of sys.exit (issue #180)
205   -# 2018-03-19 PL: - removed pyparsing from the thirdparty subfolder
206   -# 2018-05-13 v0.53 PL: - added support for Word/PowerPoint 2007+ XML (FlatOPC)
207   -# (issue #283)
208   -# 2018-06-11 v0.53.1 MHW: - fixed #320: chr instead of unichr on python 3
209   -# 2018-06-12 MHW: - fixed #322: import reduce from functools
210   -# 2018-09-11 v0.54 PL: - olefile is now a dependency
211   -# 2018-10-25 CH: - detect encryption and raise error if detected
212   -
213   -__version__ = '0.54dev4'
214   -
215   -#------------------------------------------------------------------------------
216   -# TODO:
217   -# + setup logging (common with other oletools)
218   -# + add xor bruteforcing like bbharvest
219   -# + options -a and -c should imply -d
220   -
221   -# TODO later:
222   -# + performance improvement: instead of searching each keyword separately,
223   -# first split vba code into a list of words (per line), then check each
224   -# word against a dict. (or put vba words into a set/dict?)
225   -# + for regex, maybe combine them into a single re with named groups?
226   -# + add Yara support, include sample rules? plugins like balbuzard?
227   -# + add balbuzard support
228   -# + output to file (replace print by file.write, sys.stdout by default)
229   -# + look for VBA in embedded documents (e.g. Excel in Word)
230   -# + support SRP streams (see Lenny's article + links and sample)
231   -# - python 3.x support
232   -# - check VBA macros in Visio, Access, Project, etc
233   -# - extract_macros: convert to a class, split long function into smaller methods
234   -# - extract_macros: read bytes from stream file objects instead of strings
235   -# - extract_macros: use combined struct.unpack instead of many calls
236   -# - all except clauses should target specific exceptions
237   -
238   -#------------------------------------------------------------------------------
239   -# REFERENCES:
240   -# - [MS-OVBA]: Microsoft Office VBA File Format Structure
241   -# http://msdn.microsoft.com/en-us/library/office/cc313094%28v=office.12%29.aspx
242   -# - officeparser: https://github.com/unixfreak0037/officeparser
243   -
244   -
245   -#--- IMPORTS ------------------------------------------------------------------
246   -
247   -import sys
248   -import os
249   -import logging
250   -import struct
251   -from io import StringIO, BytesIO
252   -import math
253   -import zipfile
254   -import re
255   -import optparse
256   -import binascii
257   -import base64
258   -import zlib
259   -import email # for MHTML parsing
260   -import string # for printable
261   -import json # for json output mode (argument --json)
262   -from functools import reduce
263   -
264   -# import lxml or ElementTree for XML parsing:
265   -try:
266   - # lxml: best performance for XML processing
267   - import lxml.etree as ET
268   -except ImportError:
269   - try:
270   - # Python 2.5+: batteries included
271   - import xml.etree.cElementTree as ET
272   - except ImportError:
273   - try:
274   - # Python <2.5: standalone ElementTree install
275   - import elementtree.cElementTree as ET
276   - except ImportError:
277   - raise ImportError("lxml or ElementTree are not installed, " \
278   - + "see http://codespeak.net/lxml " \
279   - + "or http://effbot.org/zone/element-index.htm")
  7 +warnings.warn('olevba3 is deprecated, olevba should be used instead.', DeprecationWarning)
280 8  
281 9 # IMPORTANT: it should be possible to run oletools directly as scripts
282 10 # in any directory without installing them with pip or setup.py.
... ... @@ -284,3374 +12,13 @@ except ImportError:
284 12 # And to enable Python 2+3 compatibility, we need to use absolute imports,
285 13 # so we add the oletools parent folder to sys.path (absolute+normalized path):
286 14 _thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__)))
287   -# print('_thismodule_dir = %r' % _thismodule_dir)
288 15 _parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
289   -# print('_parent_dir = %r' % _parent_dir)
290   -if not _parent_dir in sys.path:
  16 +if _parent_dir not in sys.path:
291 17 sys.path.insert(0, _parent_dir)
292 18  
293   -import olefile
294   -from oletools.thirdparty.prettytable import prettytable
295   -from oletools.thirdparty.xglob import xglob, PathNotFoundException
296   -from pyparsing import \
297   - CaselessKeyword, CaselessLiteral, Combine, Forward, Literal, \
298   - Optional, QuotedString,Regex, Suppress, Word, WordStart, \
299   - alphanums, alphas, hexnums,nums, opAssoc, srange, \
300   - infixNotation, ParserElement
301   -import oletools.ppt_parser as ppt_parser
302   -from oletools import rtfobj
303   -from oletools import oleid
304   -from oletools.common.errors import FileIsEncryptedError
305   -
306   -# monkeypatch email to fix issue #32:
307   -# allow header lines without ":"
308   -import email.feedparser
309   -email.feedparser.headerRE = re.compile(r'^(From |[\041-\071\073-\176]{1,}:?|[\t ])')
310   -
311   -# === PYTHON 2+3 SUPPORT ======================================================
312   -
313   -if sys.version_info[0] <= 2:
314   - # Python 2.x
315   - if sys.version_info[1] <= 6:
316   - # Python 2.6
317   - # use is_zipfile backported from Python 2.7:
318   - from thirdparty.zipfile27 import is_zipfile
319   - else:
320   - # Python 2.7
321   - from zipfile import is_zipfile
322   -else:
323   - # Python 3.x+
324   - from zipfile import is_zipfile
325   - # xrange is now called range:
326   - xrange = range
327   -
328   -
329   -# === PYTHON 3.0 - 3.4 SUPPORT ======================================================
330   -
331   -# From https://gist.github.com/ynkdir/867347/c5e188a4886bc2dd71876c7e069a7b00b6c16c61
332   -
333   -if sys.version_info >= (3, 0) and sys.version_info < (3, 5):
334   - import codecs
335   -
336   - _backslashreplace_errors = codecs.lookup_error("backslashreplace")
337   -
338   - def backslashreplace_errors(exc):
339   - if isinstance(exc, UnicodeDecodeError):
340   - u = "".join("\\x{0:02x}".format(c) for c in exc.object[exc.start:exc.end])
341   - return (u, exc.end)
342   - return _backslashreplace_errors(exc)
343   -
344   - codecs.register_error("backslashreplace", backslashreplace_errors)
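The shim above exists because Python 3.0 to 3.4 accept the "backslashreplace" error handler only when encoding, not when decoding. The sketch below shows the same `codecs.register_error` mechanism. It registers the handler under a hypothetical name, 'hexescape_demo', so the built-in handler is left untouched. On Python 3.5+, the built-in 'backslashreplace' gives the same result when decoding.

```python
import codecs


def hex_escape_decode_errors(exc):
    """Error handler: replace undecodable bytes with \\xNN escapes."""
    if isinstance(exc, UnicodeDecodeError):
        bad_bytes = bytearray(exc.object[exc.start:exc.end])
        replacement = ''.join('\\x{0:02x}'.format(b) for b in bad_bytes)
        # Resume decoding right after the offending bytes.
        return (replacement, exc.end)
    # Encoding/translation errors are not handled by this sketch.
    raise exc


# 'hexescape_demo' is a hypothetical name chosen so the built-in
# 'backslashreplace' handler is not replaced.
codecs.register_error('hexescape_demo', hex_escape_decode_errors)

print(b'Attribute\xff'.decode('ascii', 'hexescape_demo'))  # prints: Attribute\xff
```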
345   -
346   -
347   -# === LOGGING =================================================================
348   -
349   -class NullHandler(logging.Handler):
350   - """
351   - Log Handler without output, to avoid printing messages if logging is not
352   - configured by the main application.
353   - Python 2.7 has logging.NullHandler, but this is necessary for 2.6:
354   - see https://docs.python.org/2.6/library/logging.html#configuring-logging-for-a-library
355   - """
356   - def emit(self, record):
357   - pass
358   -
359   -def get_logger(name, level=logging.CRITICAL+1):
360   - """
361   - Create a suitable logger object for this module.
362   - The goal is not to change settings of the root logger, to avoid getting
363   - other modules' logs on the screen.
364   - If a logger exists with same name, reuse it. (Else it would have duplicate
365   - handlers and messages would be doubled.)
366   - The level is set to CRITICAL+1 by default, to avoid any logging.
367   - """
368   - # First, test if there is already a logger with the same name, else it
369   - # will generate duplicate messages (due to duplicate handlers):
370   - if name in logging.Logger.manager.loggerDict:
371   - #NOTE: another less intrusive but more "hackish" solution would be to
372   - # use getLogger then test if its effective level is not default.
373   - logger = logging.getLogger(name)
374   - # make sure level is OK:
375   - logger.setLevel(level)
376   - return logger
377   - # get a new logger:
378   - logger = logging.getLogger(name)
379   - # only add a NullHandler for this logger, it is up to the application
380   - # to configure its own logging:
381   - logger.addHandler(NullHandler())
382   - logger.setLevel(level)
383   - return logger
384   -
385   -# a global logger object used for debugging:
386   -log = get_logger('olevba')
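`get_logger()` above attaches a do-nothing handler, so the library stays silent unless the application configures logging. The custom NullHandler class is only needed for Python 2.6. Here is a minimal sketch of the same pattern using the standard `logging.NullHandler` (Python 2.7+). The helper name is hypothetical.

```python
import logging


def get_library_logger(name, level=logging.CRITICAL + 1):
    """Return a library logger that stays silent until the application opts in.

    Hypothetical helper modelled on get_logger() above, but using the
    standard logging.NullHandler instead of a custom class.
    """
    logger = logging.getLogger(name)
    # getLogger() returns the same object for the same name, so only add the
    # NullHandler once; otherwise repeated calls would stack handlers.
    if not any(isinstance(h, logging.NullHandler) for h in logger.handlers):
        logger.addHandler(logging.NullHandler())
    # CRITICAL+1 is above every standard level, so nothing is emitted by default.
    logger.setLevel(level)
    return logger


log = get_library_logger('olevba_demo')
log.critical('suppressed: the logger level is above CRITICAL')
# The application opts in by lowering the level, as enable_logging() does above:
log.setLevel(logging.NOTSET)
```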
387   -
388   -
389   -def enable_logging():
390   - """
391   - Enable logging for this module (disabled by default).
392   - This will set the module-specific logger level to NOTSET, which
393   - means the main application controls the actual logging level.
394   - """
395   - log.setLevel(logging.NOTSET)
396   - # Also enable logging in the ppt_parser module:
397   - ppt_parser.enable_logging()
398   -
399   -
400   -
401   -#=== EXCEPTIONS ==============================================================
402   -
403   -class OlevbaBaseException(Exception):
404   - """ Base class for exceptions produced here for simpler except clauses """
405   - def __init__(self, msg, filename=None, orig_exc=None, **kwargs):
406   - if orig_exc:
407   - super(OlevbaBaseException, self).__init__(msg +
408   - ' ({0})'.format(orig_exc),
409   - **kwargs)
410   - else:
411   - super(OlevbaBaseException, self).__init__(msg, **kwargs)
412   - self.msg = msg
413   - self.filename = filename
414   - self.orig_exc = orig_exc
415   -
416   -
417   -class FileOpenError(OlevbaBaseException):
418   - """ raised by VBA_Parser constructor if all open_... attempts failed
419   -
420   - probably means the file type is not supported
421   - """
422   -
423   - def __init__(self, filename, orig_exc=None):
424   - super(FileOpenError, self).__init__(
425   - 'Failed to open file %s' % filename, filename, orig_exc)
426   -
427   -
428   -class ProcessingError(OlevbaBaseException):
429   - """ raised by VBA_Parser.process_file* functions """
430   -
431   - def __init__(self, filename, orig_exc):
432   - super(ProcessingError, self).__init__(
433   - 'Error processing file %s' % filename, filename, orig_exc)
434   -
435   -
436   -class MsoExtractionError(RuntimeError, OlevbaBaseException):
437   - """ raised by mso_file_extract if parsing MSO/ActiveMIME data failed """
438   -
439   - def __init__(self, msg):
440   - RuntimeError.__init__(self, msg)
441   - OlevbaBaseException.__init__(self, msg)
442   -
443   -
444   -class SubstreamOpenError(FileOpenError):
445   - """ special kind of FileOpenError: file is a substream of original file """
446   -
447   - def __init__(self, filename, subfilename, orig_exc=None):
448   - super(SubstreamOpenError, self).__init__(
449   - str(filename) + '/' + str(subfilename), orig_exc)
450   - self.filename = filename # overwrite setting in OlevbaBaseException
451   - self.subfilename = subfilename
452   -
453   -
454   -class UnexpectedDataError(OlevbaBaseException):
455   - """ raised when parsing is strict (=not relaxed) and data is unexpected """
456   -
457   - def __init__(self, stream_path, variable, expected, value):
458   - if isinstance(expected, int):
459   - es = '{0:04X}'.format(expected)
460   - elif isinstance(expected, tuple):
461   - es = ','.join('{0:04X}'.format(e) for e in expected)
462   - es = '({0})'.format(es)
463   - else:
464   - raise ValueError('Unknown type encountered: {0}'.format(type(expected)))
465   - super(UnexpectedDataError, self).__init__(
466   - 'Unexpected value in {0} for variable {1}: '
467   - 'expected {2} but found {3:04X}!'
468   - .format(stream_path, variable, es, value))
469   - self.stream_path = stream_path
470   - self.variable = variable
471   - self.expected = expected
472   - self.value = value
473   -
474   -#--- CONSTANTS ----------------------------------------------------------------
475   -
476   -# return codes
477   -RETURN_OK = 0
478   -RETURN_WARNINGS = 1 # (reserved, not used yet)
479   -RETURN_WRONG_ARGS = 2 # (fixed, built into optparse)
480   -RETURN_FILE_NOT_FOUND = 3
481   -RETURN_XGLOB_ERR = 4
482   -RETURN_OPEN_ERROR = 5
483   -RETURN_PARSE_ERROR = 6
484   -RETURN_SEVERAL_ERRS = 7
485   -RETURN_UNEXPECTED = 8
486   -RETURN_ENCRYPTED = 9
487   -
488   -# MAC codepages (from http://stackoverflow.com/questions/1592925/decoding-mac-os-text-in-python)
489   -MAC_CODEPAGES = {
490   - 10000: 'mac-roman',
491   - 10001: 'shiftjis', # not found: 'mac-shift-jis',
492   - 10003: 'ascii', # nothing appropriate found: 'mac-hangul',
493   - 10008: 'gb2312', # not found: 'mac-gb2312',
494   - 10002: 'big5', # not found: 'mac-big5',
495   - 10005: 'hebrew', # not found: 'mac-hebrew',
496   - 10004: 'mac-arabic',
497   - 10006: 'mac-greek',
498   - 10081: 'mac-turkish',
499   - 10021: 'thai', # not found: mac-thai',
500   - 10029: 'maccentraleurope', # not found: 'mac-east europe',
501   - 10007: 'ascii', # nothing appropriate found: 'mac-russian',
502   -}
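MAC_CODEPAGES maps Mac codepage numbers to Python codec names. The sketch below shows one way such a table might be used. It is a hypothetical helper, not olevba's own decoding code. It looks the name up with `codecs.lookup()`, tries `cpNNNN` for numbers that are not in the table, and falls back to Latin-1 when no registered codec matches. The Latin-1 fallback is an assumption.

```python
import codecs


def decode_with_codepage(data, codepage, codepage_names):
    """Decode bytes using a Windows/Mac codepage number (hypothetical helper).

    codepage_names maps codepage numbers to Python codec names, in the same
    shape as MAC_CODEPAGES above; other numbers are tried as 'cpNNNN'.
    """
    name = codepage_names.get(codepage, 'cp%d' % codepage)
    try:
        codecs.lookup(name)
    except LookupError:
        # Assumption: fall back to latin-1, which can decode any byte value.
        name = 'latin-1'
    return data.decode(name, 'replace')


print(decode_with_codepage(b'caf\x8e', 10000, {10000: 'mac-roman'}))  # café
print(decode_with_codepage(b'caf\xe9', 1252, {}))  # café (via cp1252)
```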
503   -
504   -# URL and message to report issues:
505   -URL_OLEVBA_ISSUES = 'https://github.com/decalage2/oletools/issues'
506   -MSG_OLEVBA_ISSUES = 'Please report this issue on %s' % URL_OLEVBA_ISSUES
507   -
508   -# Container types:
509   -TYPE_OLE = 'OLE'
510   -TYPE_OpenXML = 'OpenXML'
511   -TYPE_FlatOPC_XML = 'FlatOPC_XML'
512   -TYPE_Word2003_XML = 'Word2003_XML'
513   -TYPE_MHTML = 'MHTML'
514   -TYPE_TEXT = 'Text'
515   -TYPE_PPT = 'PPT'
516   -
517   -# short tag to display file types in triage mode:
518   -TYPE2TAG = {
519   - TYPE_OLE: 'OLE:',
520   - TYPE_OpenXML: 'OpX:',
521   - TYPE_FlatOPC_XML: 'FlX:',
522   - TYPE_Word2003_XML: 'XML:',
523   - TYPE_MHTML: 'MHT:',
524   - TYPE_TEXT: 'TXT:',
525   - TYPE_PPT: 'PPT',
526   -}
527   -
528   -
529   -# MSO files ActiveMime header magic
530   -MSO_ACTIVEMIME_HEADER = b'ActiveMime'
531   -
532   -MODULE_EXTENSION = "bas"
533   -CLASS_EXTENSION = "cls"
534   -FORM_EXTENSION = "frm"
535   -
536   -# Namespaces and tags for Word2003 XML parsing:
537   -NS_W = '{http://schemas.microsoft.com/office/word/2003/wordml}'
538   -# the tag <w:binData w:name="editdata.mso"> contains the VBA macro code:
539   -TAG_BINDATA = NS_W + 'binData'
540   -ATTR_NAME = NS_W + 'name'
541   -
542   -# Namespaces and tags for Word/PowerPoint 2007+ XML parsing:
543   -# root: <pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">
544   -NS_XMLPACKAGE = '{http://schemas.microsoft.com/office/2006/xmlPackage}'
545   -TAG_PACKAGE = NS_XMLPACKAGE + 'package'
546   -# the tag <pkg:part> includes <pkg:binaryData> that contains the VBA macro code in Base64:
547   -# <pkg:part pkg:name="/word/vbaProject.bin" pkg:contentType="application/vnd.ms-office.vbaProject"><pkg:binaryData>
548   -TAG_PKGPART = NS_XMLPACKAGE + 'part'
549   -ATTR_PKG_NAME = NS_XMLPACKAGE + 'name'
550   -ATTR_PKG_CONTENTTYPE = NS_XMLPACKAGE + 'contentType'
551   -CTYPE_VBAPROJECT = "application/vnd.ms-office.vbaProject"
552   -TAG_PKGBINDATA = NS_XMLPACKAGE + 'binaryData'
553   -
554   -# Keywords to detect auto-executable macros
555   -AUTOEXEC_KEYWORDS = {
556   - # MS Word:
557   - 'Runs when the Word document is opened':
558   - ('AutoExec', 'AutoOpen', 'DocumentOpen'),
559   - 'Runs when the Word document is closed':
560   - ('AutoExit', 'AutoClose', 'Document_Close', 'DocumentBeforeClose'),
561   - 'Runs when the Word document is modified':
562   - ('DocumentChange',),
563   - 'Runs when a new Word document is created':
564   - ('AutoNew', 'Document_New', 'NewDocument'),
565   -
566   - # MS Word and Publisher:
567   - 'Runs when the Word or Publisher document is opened':
568   - ('Document_Open',),
569   - 'Runs when the Publisher document is closed':
570   - ('Document_BeforeClose',),
571   -
572   - # MS Excel:
573   - 'Runs when the Excel Workbook is opened':
574   - ('Auto_Open', 'Workbook_Open', 'Workbook_Activate'),
575   - 'Runs when the Excel Workbook is closed':
576   - ('Auto_Close', 'Workbook_Close'),
577   -
578   - # any MS Office application:
579   - 'Runs when the file is opened (using InkPicture ActiveX object)':
580   - # ref:https://twitter.com/joe4security/status/770691099988025345
581   - (r'\w+_Painted',),
582   - 'Runs when the file is opened and ActiveX objects trigger events':
583   - (r'\w+_(?:GotFocus|LostFocus|MouseHover)',),
584   -}
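AUTOEXEC_KEYWORDS mixes plain names with regular expressions such as r'\w+_Painted'. One way to match both kinds uniformly is to treat every entry as a regex, wrap it in word boundaries and search case-insensitively, as this hypothetical helper does. With this approach, a plain entry containing a dot (e.g. 'Application.Visible' in SUSPICIOUS_KEYWORDS below) would match any character at that position. A stricter version could apply `re.escape()` to entries that are meant literally.

```python
import re


def find_keywords(vba_code, keyword_table):
    r"""Return (matched_text, description) pairs found in vba_code.

    Hypothetical helper: each entry is treated as a regular expression,
    wrapped in \b word boundaries and matched case-insensitively, so that
    entries such as r'\w+_Painted' work alongside plain names like 'AutoOpen'.
    """
    results = []
    for description, keywords in keyword_table.items():
        for keyword in keywords:
            match = re.search(r'(?i)\b' + keyword + r'\b', vba_code)
            if match:
                # report the text actually matched, e.g. 'InkPicture1_Painted'
                results.append((match.group(), description))
    return results


code = 'Sub AutoOpen()\n  MsgBox 1\nEnd Sub\nSub InkPicture1_Painted()\nEnd Sub'
table = {'Runs on open': ('AutoOpen', 'Workbook_Open'),
         'Runs via InkPicture': (r'\w+_Painted',)}
print(find_keywords(code, table))
```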
585   -
586   -# Suspicious Keywords that may be used by malware
587   -# See VBA language reference: http://msdn.microsoft.com/en-us/library/office/jj692818%28v=office.15%29.aspx
588   -SUSPICIOUS_KEYWORDS = {
589   - #TODO: use regex to support variable whitespaces
590   - 'May read system environment variables':
591   - ('Environ',),
592   - 'May open a file':
593   - ('Open',),
594   - 'May write to a file (if combined with Open)':
595   - #TODO: regex to find Open+Write on same line
596   - ('Write', 'Put', 'Output', 'Print #'),
597   - 'May read or write a binary file (if combined with Open)':
598   - #TODO: regex to find Open+Binary on same line
599   - ('Binary',),
600   - 'May copy a file':
601   - ('FileCopy', 'CopyFile'),
602   - #FileCopy: http://msdn.microsoft.com/en-us/library/office/gg264390%28v=office.15%29.aspx
603   - #CopyFile: http://msdn.microsoft.com/en-us/library/office/gg264089%28v=office.15%29.aspx
604   - 'May delete a file':
605   - ('Kill',),
606   - 'May create a text file':
607   - ('CreateTextFile', 'ADODB.Stream', 'WriteText', 'SaveToFile'),
608   - #CreateTextFile: http://msdn.microsoft.com/en-us/library/office/gg264617%28v=office.15%29.aspx
609   - #ADODB.Stream sample: http://pastebin.com/Z4TMyuq6
610   - 'May run an executable file or a system command':
611   - ('Shell', 'vbNormal', 'vbNormalFocus', 'vbHide', 'vbMinimizedFocus', 'vbMaximizedFocus', 'vbNormalNoFocus',
612   - 'vbMinimizedNoFocus', 'WScript.Shell', 'Run', 'ShellExecute'),
613   - # MacScript: see https://msdn.microsoft.com/en-us/library/office/gg264812.aspx
614   - 'May run an executable file or a system command on a Mac':
615   - ('MacScript',),
616   - 'May run an executable file or a system command on a Mac (if combined with libc.dylib)':
617   - ('system', 'popen', r'exec[lv][ep]?'),
618   - #Shell: http://msdn.microsoft.com/en-us/library/office/gg278437%28v=office.15%29.aspx
619   - #WScript.Shell+Run sample: http://pastebin.com/Z4TMyuq6
620   - 'May run PowerShell commands':
621   - #sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/
622   - #also: https://bitbucket.org/decalage/oletools/issues/14/olevba-library-update-ioc
623   - # ref: https://blog.netspi.com/15-ways-to-bypass-the-powershell-execution-policy/
624   - # TODO: add support for keywords starting with a non-alpha character, such as "-noexit"
625   - # TODO: '-command', '-EncodedCommand', '-scriptblock'
626   - ('PowerShell', 'noexit', 'ExecutionPolicy', 'noprofile', 'command', 'EncodedCommand',
627   - 'invoke-command', 'scriptblock', 'Invoke-Expression', 'AuthorizationManager'),
628   - 'May run an executable file or a system command using PowerShell':
629   - ('Start-Process',),
630   - 'May hide the application':
631   - ('Application.Visible', 'ShowWindow', 'SW_HIDE'),
632   - 'May create a directory':
633   - ('MkDir',),
634   - 'May save the current workbook':
635   - ('ActiveWorkbook.SaveAs',),
636   - 'May change which directory contains files to open at startup':
637   - #TODO: confirm the actual effect
638   - ('Application.AltStartupPath',),
639   - 'May create an OLE object':
640   - ('CreateObject',),
641   - 'May create an OLE object using PowerShell':
642   - ('New-Object',),
643   - 'May run an application (if combined with CreateObject)':
644   - ('Shell.Application',),
645   - 'May enumerate application windows (if combined with Shell.Application object)':
646   - ('Windows', 'FindWindow'),
647   - 'May run code from a DLL':
648   - #TODO: regex to find declare+lib on same line - see mraptor
649   - ('Lib',),
650   - 'May run code from a library on a Mac':
651   - #TODO: regex to find declare+lib on same line - see mraptor
652   - ('libc.dylib', 'dylib'),
653   - 'May inject code into another process':
654   - ('CreateThread', 'VirtualAlloc', # (issue #9) suggested by Davy Douhine - used by MSF payload
655   - 'VirtualAllocEx', 'RtlMoveMemory',
656   - ),
657   - 'May run a shellcode in memory':
658   - ('EnumSystemLanguageGroupsW?', # Used by Hancitor in Oct 2016
659   - 'EnumDateFormats(?:W|(?:Ex){1,2})?'), # see https://msdn.microsoft.com/en-us/library/windows/desktop/dd317810(v=vs.85).aspx
660   - 'May download files from the Internet':
661   - #TODO: regex to find urlmon+URLDownloadToFileA on same line
662   - ('URLDownloadToFileA', 'Msxml2.XMLHTTP', 'Microsoft.XMLHTTP',
663   - 'MSXML2.ServerXMLHTTP', # suggested in issue #13
664   - 'User-Agent', # sample from @ozhermit: http://pastebin.com/MPc3iV6z
665   - ),
666   - 'May download files from the Internet using PowerShell':
667   - #sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/
668   - ('Net.WebClient', 'DownloadFile', 'DownloadString'),
669   - 'May control another application by simulating user keystrokes':
670   - ('SendKeys', 'AppActivate'),
671   - #SendKeys: http://msdn.microsoft.com/en-us/library/office/gg278655%28v=office.15%29.aspx
672   - 'May attempt to obfuscate malicious function calls':
673   - ('CallByName',),
674   - #CallByName: http://msdn.microsoft.com/en-us/library/office/gg278760%28v=office.15%29.aspx
675   - 'May attempt to obfuscate specific strings (use option --deobf to deobfuscate)':
676   - #TODO: regex to find several Chr*, not just one
677   - ('Chr', 'ChrB', 'ChrW', 'StrReverse', 'Xor'),
678   - #Chr: http://msdn.microsoft.com/en-us/library/office/gg264465%28v=office.15%29.aspx
679   - 'May read or write registry keys':
680   - #sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/
681   - ('RegOpenKeyExA', 'RegOpenKeyEx', 'RegCloseKey'),
682   - 'May read registry keys':
683   - #sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/
684   - ('RegQueryValueExA', 'RegQueryValueEx',
685   - 'RegRead', #with Wscript.Shell
686   - ),
687   - 'May detect virtualization':
688   - # sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/
689   - (r'SYSTEM\ControlSet001\Services\Disk\Enum', 'VIRTUAL', 'VMWARE', 'VBOX'),
690   - 'May detect Anubis Sandbox':
691   - # sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/
692   - # NOTES: this sample also checks App.EXEName but that seems to be a bug, it works in VB6 but not in VBA
693   - # ref: http://www.syssec-project.eu/m/page-media/3/disarm-raid11.pdf
694   - ('GetVolumeInformationA', 'GetVolumeInformation', # with kernel32.dll
695   - '1824245000', r'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProductId',
696   - '76487-337-8429955-22614', 'andy', 'sample', r'C:\exec\exec.exe', 'popupkiller'
697   - ),
698   - 'May detect Sandboxie':
699   - # sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/
700   - # ref: http://www.cplusplus.com/forum/windows/96874/
701   - ('SbieDll.dll', 'SandboxieControlWndClass'),
702   - 'May detect Sunbelt Sandbox':
703   - # ref: http://www.cplusplus.com/forum/windows/96874/
704   - (r'C:\file.exe',),
705   - 'May detect Norman Sandbox':
706   - # ref: http://www.cplusplus.com/forum/windows/96874/
707   - ('currentuser',),
708   - 'May detect CW Sandbox':
709   - # ref: http://www.cplusplus.com/forum/windows/96874/
710   - ('Schmidti',),
711   - 'May detect WinJail Sandbox':
712   - # ref: http://www.cplusplus.com/forum/windows/96874/
713   - ('Afx:400000:0',),
714   -}
715   -
716   -# Regular Expression for a URL:
717   -# http://en.wikipedia.org/wiki/Uniform_resource_locator
718   -# http://www.w3.org/Addressing/URL/uri-spec.html
719   -#TODO: also support username:password@server
720   -#TODO: other protocols (file, gopher, wais, ...?)
721   -SCHEME = r'\b(?:http|ftp)s?'
722   -# see http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains
723   -TLD = r'(?:xn--[a-zA-Z0-9]{4,20}|[a-zA-Z]{2,20})'
724   -DNS_NAME = r'(?:[a-zA-Z0-9\-\.]+\.' + TLD + ')'
725   -#TODO: IPv6 - see https://www.debuggex.com/
726   -# A literal numeric IPv6 address may be given, but must be enclosed in [ ] e.g. [db8:0cec::99:123a]
727   -NUMBER_0_255 = r'(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])'
728   -IPv4 = r'(?:' + NUMBER_0_255 + r'\.){3}' + NUMBER_0_255
729   -# IPv4 must come before the DNS name because it is more specific
730   -SERVER = r'(?:' + IPv4 + '|' + DNS_NAME + ')'
731   -PORT = r'(?:\:[0-9]{1,5})?'
732   -SERVER_PORT = SERVER + PORT
733   -URL_PATH = r'(?:/[a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~]*)?' # [^\.\,\)\(\s"]
734   -URL_RE = SCHEME + r'\://' + SERVER_PORT + URL_PATH
735   -re_url = re.compile(URL_RE)
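NUMBER_0_255 spells out the valid octet values (250-255, 200-249, 100-199, 10-99, 0-9), so IPv4 only accepts dotted quads whose octets are in range. The self-contained check below copies those two building blocks. The `is_ipv4()` helper is hypothetical.

```python
import re

# Same building blocks as NUMBER_0_255 and IPv4 above, copied so the
# snippet runs on its own.
NUMBER_0_255 = r'(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])'
IPV4 = r'(?:' + NUMBER_0_255 + r'\.){3}' + NUMBER_0_255


def is_ipv4(text):
    # fullmatch (Python 3.4+) anchors the pattern at both ends
    return re.fullmatch(IPV4, text) is not None


for candidate in ('192.168.0.1', '256.1.1.1', '10.0.0'):
    print(candidate, is_ipv4(candidate))
```

The pattern has no word boundaries, so an unanchored search can still report a partial address. For example, `re.search(IPV4, '256.1.1.1')` finds '56.1.1.1'.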
736   -
737   -
738   -# Patterns to be extracted (IP addresses, URLs, etc)
739   -# From patterns.py in balbuzard
740   -RE_PATTERNS = (
741   - ('URL', re.compile(URL_RE)),
742   - ('IPv4 address', re.compile(IPv4)),
743   - # TODO: add IPv6
744   - ('E-mail address', re.compile(r'(?i)\b[A-Z0-9._%+-]+@' + SERVER + r'\b')),
745   - # ('Domain name', re.compile(r'(?=^.{1,254}$)(^(?:(?!\d+\.|-)[a-zA-Z0-9_\-]{1,63}(?<!-)\.?)+(?:[a-zA-Z]{2,})$)')),
746   - # Executable file name with known extensions (except .com which is present in many URLs, and .application):
747   - ("Executable file name", re.compile(
748   - r"(?i)\b\w+\.(EXE|PIF|GADGET|MSI|MSP|MSC|VBS|VBE|VB|JSE|JS|WSF|WSC|WSH|WS|BAT|CMD|DLL|SCR|HTA|CPL|CLASS|JAR|PS1XML|PS1|PS2XML|PS2|PSC1|PSC2|SCF|LNK|INF|REG)\b")),
749   - # Sources: http://www.howtogeek.com/137270/50-file-extensions-that-are-potentially-dangerous-on-windows/
750   - # TODO: https://support.office.com/en-us/article/Blocked-attachments-in-Outlook-3811cddc-17c3-4279-a30c-060ba0207372#__attachment_file_types
751   - # TODO: add win & unix file paths
752   - #('Hex string', re.compile(r'(?:[0-9A-Fa-f]{2}){4,}')),
753   -)
754   -
755   -# regex to detect strings encoded in hexadecimal
756   -re_hex_string = re.compile(r'(?:[0-9A-Fa-f]{2}){4,}')
757   -
758   -# regex to detect strings encoded in base64
759   -#re_base64_string = re.compile(r'"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"')
760   -# better version from balbuzard, less false positives:
761   -# (plain version without double quotes, used also below in quoted_base64_string)
762   -BASE64_RE = r'(?:[A-Za-z0-9+/]{4}){1,}(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)?'
763   -re_base64_string = re.compile('"' + BASE64_RE + '"')
764   -# white list of common strings matching the base64 regex, but which are not base64 strings (all lowercase):
765   -BASE64_WHITELIST = set(['thisdocument', 'thisworkbook', 'test', 'temp', 'http', 'open', 'exit'])
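The final group of BASE64_RE constrains the character just before the padding.

- With one '=', the last quantum carries 16 bits in three characters, so the last character's two low bits must be zero (A, E, I, ..., 8).
- With '==', it carries 8 bits in two characters, so the last character must be A, Q, g or w.

Together with requiring at least one full group, this is presumably part of why the balbuzard version yields fewer false positives. Below is a sketch of a hypothetical helper that applies the regex and the whitelist, then tries to decode each candidate:

```python
import base64
import binascii
import re

# Copied from BASE64_RE / BASE64_WHITELIST above so the snippet runs alone.
BASE64_RE = r'(?:[A-Za-z0-9+/]{4}){1,}(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)?'
re_base64_string = re.compile('"' + BASE64_RE + '"')
BASE64_WHITELIST = set(['thisdocument', 'thisworkbook', 'test', 'temp', 'http', 'open', 'exit'])


def find_base64_strings(vba_code):
    """Return (encoded, decoded) pairs for quoted base64-looking strings.

    Hypothetical helper; the whitelist skips common words that happen to match.
    """
    results = []
    for match in re_base64_string.finditer(vba_code):
        encoded = match.group()[1:-1]  # strip the double quotes
        if encoded.lower() in BASE64_WHITELIST:
            continue
        try:
            decoded = base64.b64decode(encoded)
        except (binascii.Error, TypeError):
            continue
        results.append((encoded, decoded))
    return results


print(find_base64_strings('x = "test" & "SGVsbG8gV29ybGQ="'))
```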
766   -
767   -# regex to detect strings encoded with a specific Dridex algorithm
768   -# (see https://github.com/JamesHabben/MalwareStuff)
769   -re_dridex_string = re.compile(r'"[0-9A-Za-z]{20,}"')
770   -# regex to check that it is not just a hex string:
771   -re_nothex_check = re.compile(r'[G-Zg-z]')
772   -
773   -# regex to extract printable strings (at least 5 chars) from VBA Forms:
774   -re_printable_string = re.compile(b'[\\t\\r\\n\\x20-\\xFF]{5,}')
775   -
776   -
777   -# === PARTIAL VBA GRAMMAR ====================================================
778   -
779   -# REFERENCES:
780   -# - [MS-VBAL]: VBA Language Specification
781   -# https://msdn.microsoft.com/en-us/library/dd361851.aspx
782   -# - pyparsing: http://pyparsing.wikispaces.com/
783   -
784   -# TODO: set whitespaces according to VBA
785   -# TODO: merge extended lines before parsing
786   -
787   -# Enable PackRat for better performance:
788   -# (see https://pythonhosted.org/pyparsing/pyparsing.ParserElement-class.html#enablePackrat)
789   -ParserElement.enablePackrat()
790   -
791   -# VBA identifier chars (from MS-VBAL 3.3.5)
792   -vba_identifier_chars = alphanums + '_'
793   -
794   -class VbaExpressionString(str):
795   - """
796   - Class identical to str, used to distinguish plain strings from strings
797   - obfuscated using VBA expressions (Chr, StrReverse, etc)
798   - Usage: each VBA expression parse action should convert strings to
799   - VbaExpressionString.
800   - Then isinstance(s, VbaExpressionString) is True only for VBA expressions.
801   - (see detect_vba_strings)
802   - """
803   - # TODO: use Unicode everywhere instead of str
804   - pass
805   -
806   -
807   -# --- NUMBER TOKENS ----------------------------------------------------------
808   -
809   -# 3.3.2 Number Tokens
810   -# INTEGER = integer-literal ["%" / "&" / "^"]
811   -# integer-literal = decimal-literal / octal-literal / hex-literal
812   -# decimal-literal = 1*decimal-digit
813   -# octal-literal = "&" [%x004F / %x006F] 1*octal-digit
814   -# ; & or &o or &O
815   -# hex-literal = "&" (%x0048 / %x0068) 1*hex-digit
816   -# ; &h or &H
817   -# octal-digit = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7"
818   -# decimal-digit = octal-digit / "8" / "9"
819   -# hex-digit = decimal-digit / %x0041-0046 / %x0061-0066 ;A-F / a-f
820   -
821   -# NOTE: here Combine() is required to avoid spaces between elements
822   -# NOTE: here WordStart is necessary to avoid matching a number preceded by
823   -# letters or underscore (e.g. "VBT1" or "ABC_34"), when using scanString
824   -decimal_literal = Combine(Optional('-') + WordStart(vba_identifier_chars) + Word(nums)
825   - + Suppress(Optional(Word('%&^', exact=1))))
826   -decimal_literal.setParseAction(lambda t: int(t[0]))
827   -
828   -octal_literal = Combine(Suppress(Literal('&') + Optional((CaselessLiteral('o')))) + Word(srange('[0-7]'))
829   - + Suppress(Optional(Word('%&^', exact=1))))
830   -octal_literal.setParseAction(lambda t: int(t[0], base=8))
831   -
832   -hex_literal = Combine(Suppress(CaselessLiteral('&h')) + Word(srange('[0-9a-fA-F]'))
833   - + Suppress(Optional(Word('%&^', exact=1))))
834   -hex_literal.setParseAction(lambda t: int(t[0], base=16))
835   -
836   -integer = decimal_literal | octal_literal | hex_literal
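The grammar above admits three integer forms, each with an optional type suffix %, & or ^:

- decimal (65)
- octal (&O101, or just &101)
- hexadecimal (&H41)

As a sketch outside pyparsing, a hypothetical `parse_vba_integer()` can implement the same rules with one regular expression. Unlike decimal_literal above, it does not accept a leading minus sign.

```python
import re

# Hypothetical helper (not part of olevba): one alternative per literal form
# from MS-VBAL 3.3.2, followed by an optional, ignored type suffix.
_VBA_INT_RE = re.compile(
    r'^(?:&[hH](?P<hex>[0-9a-fA-F]+)'
    r'|&[oO]?(?P<oct>[0-7]+)'
    r'|(?P<dec>[0-9]+))[%&^]?$')


def parse_vba_integer(token):
    m = _VBA_INT_RE.match(token.strip())
    if m is None:
        raise ValueError('not a VBA integer literal: %r' % token)
    if m.group('hex') is not None:
        return int(m.group('hex'), 16)
    if m.group('oct') is not None:
        return int(m.group('oct'), 8)
    return int(m.group('dec'))


print([parse_vba_integer(t) for t in ('65', '&O101', '&101', '&H41')])  # [65, 65, 65, 65]
```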
837   -
838   -
839   -# --- QUOTED STRINGS ---------------------------------------------------------
840   -
841   -# 3.3.4 String Tokens
842   -# STRING = double-quote *string-character (double-quote / line-continuation / LINE-END)
843   -# double-quote = %x0022 ; "
844   -# string-character = NO-LINE-CONTINUATION ((double-quote double-quote) termination-character)
845   -
846   -quoted_string = QuotedString('"', escQuote='""')
847   -quoted_string.setParseAction(lambda t: str(t[0]))
848   -
849   -
850   -#--- VBA Expressions ---------------------------------------------------------
851   -
852   -# See MS-VBAL 5.6 Expressions
853   -
854   -# need to pre-declare using Forward() because it is recursive
855   -# VBA string expression and integer expression
856   -vba_expr_str = Forward()
857   -vba_expr_int = Forward()
858   -
859   -# --- CHR --------------------------------------------------------------------
860   -
861   -# MS-VBAL 6.1.2.11.1.4 Chr / Chr$
862   -# Function Chr(CharCode As Long) As Variant
863   -# Function Chr$(CharCode As Long) As String
864   -# Parameter Description
865   -# CharCode Long whose value is a code point.
866   -# Returns a String data value consisting of a single character containing the character whose code
867   -# point is the data value of the argument.
868   -# - If the argument is not in the range 0 to 255, Error Number 5 ("Invalid procedure call or
869   -# argument") is raised unless the implementation supports a character set with a larger code point
870   -# range.
871   -# - If the argument value is in the range of 0 to 127, it is interpreted as a 7-bit ASCII code point.
872   -# - If the argument value is in the range of 128 to 255, the code point interpretation of the value is
873   -# implementation defined.
874   -# - Chr$ has the same runtime semantics as Chr, however the declared type of its function result is
875   -# String rather than Variant.
876   -
877   -# 6.1.2.11.1.5 ChrB / ChrB$
878   -# Function ChrB(CharCode As Long) As Variant
879   -# Function ChrB$(CharCode As Long) As String
880   -# CharCode Long whose value is a code point.
881   -# Returns a String data value consisting of a single byte character whose code point value is the
882   -# data value of the argument.
883   -# - If the argument is not in the range 0 to 255, Error Number 6 ("Overflow") is raised.
884   -# - ChrB$ has the same runtime semantics as ChrB however the declared type of its function result
885   -# is String rather than Variant.
886   -# - Note: the ChrB function is used with byte data contained in a String. Instead of returning a
887   -# character, which may be one or two bytes, ChrB always returns a single byte. The ChrW function
888   -# returns a String containing the Unicode character except on platforms where Unicode is not
889   -# supported, in which case, the behavior is identical to the Chr function.
890   -
891   -# 6.1.2.11.1.6 ChrW/ ChrW$
892   -# Function ChrW(CharCode As Long) As Variant
893   -# Function ChrW$(CharCode As Long) As String
894   -# CharCode Long whose value is a code point.
895   -# Returns a String data value consisting of a single character containing the character whose code
896   -# point is the data value of the argument.
897   -# - If the argument is not in the range -32,767 to 65,535 then Error Number 5 ("Invalid procedure
898   -# call or argument") is raised.
899   -# - If the argument is a negative value it is treated as if it was the value: CharCode + 65,536.
900   -# - If the implementation uses 16-bit Unicode code points, the argument data value is interpreted as a
901   -# 16-bit Unicode code point.
902   -# - If the implementation does not support Unicode, ChrW has the same semantics as Chr.
903   -# - ChrW$ has the same runtime semantics as ChrW, however the declared type of its function result
904   -# is String rather than Variant.
905   -
906   -# Chr, Chr$, ChrB, ChrW(int) => char
907   -vba_chr = Suppress(
908   - Combine(WordStart(vba_identifier_chars) + CaselessLiteral('Chr')
909   - + Optional(CaselessLiteral('B') | CaselessLiteral('W')) + Optional('$'))
910   - + '(') + vba_expr_int + Suppress(')')
911   -
912   -def vba_chr_tostr(t):
913   - try:
914   - i = t[0]
915   - # normal, non-unicode character:
916   - if i>=0 and i<=255:
917   - return VbaExpressionString(chr(i))
918   - else:
919   - return VbaExpressionString(chr(i).encode('utf-8', 'backslashreplace'))
920   - except ValueError:
921   - log.exception('ERROR: incorrect parameter value for chr(): %r' % i)
922   - return VbaExpressionString('Chr(%r)' % i)
923   -
924   -vba_chr.setParseAction(vba_chr_tostr)
925   -
926   -
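The ChrB and ChrW rules quoted above can be sketched in Python 3 as follows. This is not olevba code. `VbaRuntimeError`, `vba_chrb` and `vba_chrw` are hypothetical names, ChrB is modelled as returning a one-byte `bytes` value, and the platform is assumed to support 16-bit Unicode code points for ChrW.

```python
class VbaRuntimeError(Exception):
    """Hypothetical stand-in for a VBA runtime error carrying its Err.Number."""

    def __init__(self, number, description):
        super().__init__('Error %d (%s)' % (number, description))
        self.number = number


def vba_chrb(char_code):
    """ChrB: a single byte whose value is char_code (0 to 255)."""
    if not 0 <= char_code <= 255:
        raise VbaRuntimeError(6, 'Overflow')
    return bytes([char_code])


def vba_chrw(char_code):
    """ChrW: a single 16-bit Unicode character; negative values wrap around."""
    if not -32767 <= char_code <= 65535:
        raise VbaRuntimeError(5, 'Invalid procedure call or argument')
    if char_code < 0:
        char_code += 65536  # e.g. ChrW(-1) is treated as ChrW(65535)
    return chr(char_code)
```

For example, `vba_chrw(-1)` returns `'\uffff'`, while `vba_chrb(256)` raises error 6 ("Overflow").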
927   -# --- ASC --------------------------------------------------------------------
928   -
929   -# Asc(char) => int
930   -#TODO: see MS-VBAL 6.1.2.11.1.1 page 240 => AscB, AscW
931   -vba_asc = Suppress(CaselessKeyword('Asc') + '(') + vba_expr_str + Suppress(')')
932   -vba_asc.setParseAction(lambda t: ord(t[0]))
933   -
934   -
935   -# --- VAL --------------------------------------------------------------------
936   -
937   -# Val(string) => int
938   -# TODO: make sure the behavior of VBA's val is fully covered
939   -vba_val = Suppress(CaselessKeyword('Val') + '(') + vba_expr_str + Suppress(')')
940   -vba_val.setParseAction(lambda t: int(t[0].strip()))
941   -
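The TODO above notes that `int(t[0].strip())` only covers the simplest case: it raises `ValueError` on input such as `'12abc'`, whereas VBA's Val stops at the first character it cannot read as part of a number and strips blanks, tabs and linefeeds from the whole argument. A minimal sketch of that behaviour for decimal integers only. `vba_val_int` is a hypothetical helper, and decimals, exponents and the &H/&O radix prefixes that Val also accepts are deliberately left out.

```python
import re

# Optional sign followed by decimal digits, matched at the start of the string
_LEADING_INT_RE = re.compile(r'[+-]?[0-9]+')


def vba_val_int(text):
    """Integer-only approximation of VBA Val(); returns 0 if no number is found."""
    # Val ignores blanks, tabs and linefeeds anywhere in its argument
    cleaned = re.sub(r'[ \t\n]', '', text)
    match = _LEADING_INT_RE.match(cleaned)
    return int(match.group()) if match else 0
```

With this sketch, `vba_val_int(' 1615 198th Street N.E.')` gives 1615198, `vba_val_int('12abc')` gives 12 and `vba_val_int('abc')` gives 0.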
942   -
943   -# --- StrReverse() --------------------------------------------------------------------
944   -
945   -# StrReverse(string) => string
946   -strReverse = Suppress(CaselessKeyword('StrReverse') + '(') + vba_expr_str + Suppress(')')
947   -strReverse.setParseAction(lambda t: VbaExpressionString(str(t[0])[::-1]))
948   -
949   -
950   -# --- ENVIRON() --------------------------------------------------------------------
951   -
952   -# Environ("name") => just translated to "%name%", that is enough for malware analysis
953   -environ = Suppress(CaselessKeyword('Environ') + '(') + vba_expr_str + Suppress(')')
954   -environ.setParseAction(lambda t: VbaExpressionString('%%%s%%' % t[0]))
955   -
956   -
957   -# --- IDENTIFIER -------------------------------------------------------------
958   -
959   -#TODO: see MS-VBAL 3.3.5 page 33
960   -# 3.3.5 Identifier Tokens
961   -# Latin-identifier = first-Latin-identifier-character *subsequent-Latin-identifier-character
962   -# first-Latin-identifier-character = (%x0041-005A / %x0061-007A) ; A-Z / a-z
963   -# subsequent-Latin-identifier-character = first-Latin-identifier-character / DIGIT / %x5F ; underscore
964   -latin_identifier = Word(initChars=alphas, bodyChars=alphanums + '_')
965   -
966   -# --- HEX FUNCTION -----------------------------------------------------------
967   -
968   -# match any custom function name with a hex string as argument:
969   -# TODO: accept vba_expr_str_item as argument, check if it is a hex or base64 string at runtime
970   -
971   -# quoted string of at least two hexadecimal numbers of two digits:
972   -quoted_hex_string = Suppress('"') + Combine(Word(hexnums, exact=2) * (2, None)) + Suppress('"')
973   -quoted_hex_string.setParseAction(lambda t: str(t[0]))
974   -
975   -hex_function_call = Suppress(latin_identifier) + Suppress('(') + \
976   - quoted_hex_string('hex_string') + Suppress(')')
977   -hex_function_call.setParseAction(lambda t: VbaExpressionString(binascii.a2b_hex(t.hex_string)))
978   -
979   -
980   -# --- BASE64 FUNCTION -----------------------------------------------------------
981   -
982   -# match any custom function name with a Base64 string as argument:
983   -# TODO: accept vba_expr_str_item as argument, check if it is a hex or base64 string at runtime
984   -
985   -# quoted string matching the Base64 regular expression BASE64_RE:
986   -quoted_base64_string = Suppress('"') + Regex(BASE64_RE) + Suppress('"')
987   -quoted_base64_string.setParseAction(lambda t: str(t[0]))
988   -
989   -base64_function_call = Suppress(latin_identifier) + Suppress('(') + \
990   - quoted_base64_string('base64_string') + Suppress(')')
991   -base64_function_call.setParseAction(lambda t: VbaExpressionString(binascii.a2b_base64(t.base64_string)))
992   -
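Both TODOs above suggest accepting any string expression and deciding at runtime whether it holds hex or Base64. A minimal sketch of such a check using only the standard library. `decode_hex_or_base64` is a hypothetical helper, and its hex pattern simply mirrors `quoted_hex_string` (at least two two-digit hex numbers). Hex is tried first because a string such as `41424344` is also valid Base64.

```python
import base64
import binascii
import re

# At least two two-digit hexadecimal numbers, like quoted_hex_string
_HEX_RE = re.compile(r'(?:[0-9A-Fa-f]{2}){2,}')


def decode_hex_or_base64(arg):
    """Return the decoded bytes if arg is hex or strict Base64, else None."""
    if _HEX_RE.fullmatch(arg):
        return binascii.a2b_hex(arg)
    try:
        # validate=True rejects characters outside the Base64 alphabet
        return base64.b64decode(arg, validate=True)
    except binascii.Error:
        return None
```

Using `validate=True` keeps arbitrary text from being silently decoded, since the lenient default discards non-alphabet characters.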
993   -
994   -# ---STRING EXPRESSION -------------------------------------------------------
995   -
996   -def concat_strings_list(tokens):
997   - """
998   - parse action to concatenate strings in a VBA expression with operators '+' or '&'
999   - """
1000   - # extract argument from the tokens:
1001   - # expected to be a tuple containing a list of strings such as [a,'&',b,'&',c,...]
1002   - strings = tokens[0][::2]
1003   - return VbaExpressionString(''.join(strings))
1004   -
1005   -
1006   -vba_expr_str_item = (vba_chr | strReverse | environ | quoted_string | hex_function_call | base64_function_call)
1007   -
1008   -vba_expr_str <<= infixNotation(vba_expr_str_item,
1009   - [
1010   - ("+", 2, opAssoc.LEFT, concat_strings_list),
1011   - ("&", 2, opAssoc.LEFT, concat_strings_list),
1012   - ])
1013   -
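With `infixNotation`, the parse action of a binary left-associative level receives the whole chain as one group, with operands at even indexes and operators at odd indexes. That is why `concat_strings_list` keeps `tokens[0][::2]`. The same slicing in isolation, on a plain list standing in for that group (`concat_group` is a hypothetical helper):

```python
def concat_group(group):
    """Join the operands of an infix group such as ['a', '&', 'b', '&', 'c']."""
    operands = group[::2]  # drop the '&' / '+' operators at odd indexes
    return ''.join(operands)


print(concat_group(['WScr', '&', 'ipt.', '&', 'Shell']))
```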
1014   -
1015   -# --- INTEGER EXPRESSION -------------------------------------------------------
1016   -
1017   -def sum_ints_list(tokens):
1018   - """
1019   - parse action to sum integers in a VBA expression with operator '+'
1020   - """
1021   - # extract argument from the tokens:
1022   - # expected to be a tuple containing a list of integers such as [a,'+',b,'+',c,...]
1023   - integers = tokens[0][::2]
1024   - return sum(integers)
1025   -
1026   -
1027   -def subtract_ints_list(tokens):
1028   - """
1029   - parse action to subtract integers in a VBA expression with operator '-'
1030   - """
1031   - # extract argument from the tokens:
1032   - # expected to be a tuple containing a list of integers such as [a,'-',b,'-',c,...]
1033   - integers = tokens[0][::2]
1034   - return reduce(lambda x,y:x-y, integers)
1035   -
1036   -
1037   -def multiply_ints_list(tokens):
1038   - """
1039   - parse action to multiply integers in a VBA expression with operator '*'
1040   - """
1041   - # extract argument from the tokens:
1042   - # expected to be a tuple containing a list of integers such as [a,'*',b,'*',c,...]
1043   - integers = tokens[0][::2]
1044   - return reduce(lambda x,y:x*y, integers)
1045   -
1046   -
1047   -def divide_ints_list(tokens):
1048   - """
1049   - parse action to divide integers in a VBA expression with operator '/'
1050   - """
1051   - # extract argument from the tokens:
1052   - # expected to be a tuple containing a list of integers such as [a,'/',b,'/',c,...]
1053   - integers = tokens[0][::2]
1054   - return reduce(lambda x,y:x/y, integers)
1055   -
1056   -
1057   -vba_expr_int_item = (vba_asc | vba_val | integer)
1058   -
1059   -# operators associativity:
1060   -# https://en.wikipedia.org/wiki/Operator_associativity
1061   -
1062   -vba_expr_int <<= infixNotation(vba_expr_int_item,
1063   - [
1064   - ("*", 2, opAssoc.LEFT, multiply_ints_list),
1065   - ("/", 2, opAssoc.LEFT, divide_ints_list),
1066   - ("-", 2, opAssoc.LEFT, subtract_ints_list),
1067   - ("+", 2, opAssoc.LEFT, sum_ints_list),
1068   - ])
1069   -
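In pyparsing's `infixNotation`, each entry of the operator list is its own precedence level, and earlier entries bind tighter. Listing `"*"` and `"/"` as two separate levels therefore makes `8 / 2 * 4` group as `8 / (2 * 4)`, whereas VBA evaluates `*` and `/` at the same level from left to right. A minimal sketch of a same-level left fold over such a group. `fold_left` is a hypothetical helper, and `/` is mapped to true division because VBA's `/` returns a floating-point result.

```python
import operator

# VBA operators mapped to Python functions; '/' is floating-point division in VBA
_OPERATORS = {
    '*': operator.mul,
    '/': operator.truediv,
    '+': operator.add,
    '-': operator.sub,
}


def fold_left(group):
    """Evaluate [a, op, b, op, c, ...] strictly from left to right."""
    result = group[0]
    for op, operand in zip(group[1::2], group[2::2]):
        result = _OPERATORS[op](result, operand)
    return result


# '*' and '/' on one level: 8 / 2 * 4 == (8 / 2) * 4
print(fold_left([8, '/', 2, '*', 4]))
# '*' on its own, tighter level groups the same text as 8 / (2 * 4)
print(fold_left([8, '/', fold_left([2, '*', 4])]))
```

In a pyparsing grammar, one way to get this would be a single level such as `(oneOf('* /'), 2, opAssoc.LEFT, lambda t: fold_left(t[0]))`.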
1070   -
1071   -# see detect_vba_strings for the deobfuscation code using this grammar
1072   -
1073   -# === MSO/ActiveMime files parsing ===========================================
1074   -
1075   -def is_mso_file(data):
1076   - """
1077   - Check if the provided data is the content of a MSO/ActiveMime file, such as
1078   - the ones created by Outlook in some cases, or Word/Excel when saving a
1079   - file with the MHTML format or the Word 2003 XML format.
1080   - This function only checks the ActiveMime magic at the beginning of data.
1081   - :param data: bytes string, MSO/ActiveMime file content
1082   - :return: bool, True if the file is MSO, False otherwise
1083   - """
1084   - return data.startswith(MSO_ACTIVEMIME_HEADER)
1085   -
1086   -
1087   -# regex to find zlib block headers, starting with byte 0x78 = 'x'
1088   -re_zlib_header = re.compile(b'x')
1089   -
1090   -
1091   -def mso_file_extract(data):
1092   - """
1093   - Extract the data stored into a MSO/ActiveMime file, such as
1094   - the ones created by Outlook in some cases, or Word/Excel when saving a
1095   - file with the MHTML format or the Word 2003 XML format.
1096   -
1097   - :param data: bytes string, MSO/ActiveMime file content
1098   - :return: bytes string, extracted data (uncompressed)
1099   -
1100   - raise a MsoExtractionError if the data cannot be extracted
1101   - """
1102   - # check the magic:
1103   - assert is_mso_file(data)
1104   -
1105   - # In all the samples seen so far, Word always uses an offset of 0x32,
1106   - # and Excel 0x22A. But we read the offset from the header to be more
1107   - # generic.
1108   - offsets = [0x32, 0x22A]
1109   -
1110   - # First, attempt to get the compressed data offset from the header
1111   - # According to my tests, it should be an unsigned 16-bit integer
1112   - # at offset 0x1E (little endian), plus 46:
1113   - try:
1114   - offset = struct.unpack_from('<H', data, offset=0x1E)[0] + 46
1115   - log.debug('Parsing MSO file: data offset = 0x%X' % offset)
1116   - offsets.insert(0, offset) # insert at beginning of offsets
1117   - except struct.error as exc:
1118   - log.info('Unable to parse MSO/ActiveMime file header (%s)' % exc)
1119   - log.debug('Trace:', exc_info=True)
1120   - raise MsoExtractionError('Unable to parse MSO/ActiveMime file header')
1121   - # now try offsets
1122   - for start in offsets:
1123   - try:
1124   - log.debug('Attempting zlib decompression from MSO file offset 0x%X' % start)
1125   - extracted_data = zlib.decompress(data[start:])
1126   - return extracted_data
1127   - except zlib.error as exc:
1128   - log.info('zlib decompression failed for offset %s (%s)'
1129   - % (start, exc))
1130   - log.debug('Trace:', exc_info=True)
1131   - # None of the guessed offsets worked, let's try brute-forcing by looking
1132   - # for potential zlib-compressed blocks starting with 0x78:
1133   - log.debug('Looking for potential zlib-compressed blocks in MSO file')
1134   - for match in re_zlib_header.finditer(data):
1135   - start = match.start()
1136   - try:
1137   - log.debug('Attempting zlib decompression from MSO file offset 0x%X' % start)
1138   - extracted_data = zlib.decompress(data[start:])
1139   - return extracted_data
1140   - except zlib.error as exc:
1141   - log.info('zlib decompression failed (%s)' % exc)
1142   - log.debug('Trace:', exc_info=True)
1143   - raise MsoExtractionError('Unable to decompress data from a MSO/ActiveMime file')
1144   -
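The fallback loop above treats every 0x78 byte as a possible start of a zlib stream. Below is a compact, standalone sketch of the same strategy for Python 3, where `data` is `bytes` and the regex therefore needs a bytes pattern. `find_zlib_payload` is a hypothetical helper, and the test blob is synthetic, not a real ActiveMime file.

```python
import re
import zlib

# zlib streams written with the default window size start with 0x78 ('x')
_ZLIB_HEADER_RE = re.compile(b'\x78')


def find_zlib_payload(data, candidate_offsets=()):
    """Try the candidate offsets first, then every 0x78 byte in data.

    Returns the first successfully decompressed payload, or None.
    """
    scanned = [match.start() for match in _ZLIB_HEADER_RE.finditer(data)]
    for start in list(candidate_offsets) + scanned:
        try:
            return zlib.decompress(data[start:])
        except zlib.error:
            continue
    return None
```

Because `zlib.decompress` verifies the Adler-32 checksum, a false start at a random 0x78 byte is very unlikely to be reported as a success.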
1145   -
1146   -#--- FUNCTIONS ----------------------------------------------------------------
1147   -
1148   -# set of printable characters, for is_printable
1149   -_PRINTABLE_SET = set(string.printable)
1150   -
1151   -def is_printable(s):
1152   - """
1153   - returns True if string s only contains printable ASCII characters
1154   - (i.e. contained in string.printable)
1155   - This is similar to Python 3's str.isprintable (but whitespace such as tab and newline counts as printable here), for Python 2.x.
1156   - :param s: str
1157   - :return: bool
1158   - """
1159   - # inspired from http://stackoverflow.com/questions/3636928/test-if-a-python-string-is-printable
1160   - # check if the set of chars from s is contained into the set of printable chars:
1161   - return set(s).issubset(_PRINTABLE_SET)
1162   -
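The set-based check above and Python 3's `str.isprintable` are close but not identical. `string.printable` includes whitespace such as tab and newline, which `isprintable` rejects, and it is ASCII-only, whereas `isprintable` accepts printable non-ASCII characters. A short Python 3 comparison, reusing the same definition:

```python
import string

_PRINTABLE_SET = set(string.printable)


def is_printable(s):
    """True if every character of s is in string.printable (ASCII only)."""
    return set(s).issubset(_PRINTABLE_SET)


for sample in ['Sub AutoOpen()', 'line1\nline2', 'caf\u00e9']:
    print(repr(sample), is_printable(sample), sample.isprintable())
```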
1163   -
1164   -def copytoken_help(decompressed_current, decompressed_chunk_start):
1165   - """
1166   - compute bit masks to decode a CopyToken according to MS-OVBA 2.4.1.3.19.1 CopyToken Help
1167   -
1168   - decompressed_current: number of decompressed bytes so far, i.e. len(decompressed_container)
1169   - decompressed_chunk_start: offset of the current chunk in the decompressed container
1170   - return length_mask, offset_mask, bit_count, maximum_length
1171   - """
1172   - difference = decompressed_current - decompressed_chunk_start
1173   - bit_count = int(math.ceil(math.log(difference, 2)))
1174   - bit_count = max([bit_count, 4])
1175   - length_mask = 0xFFFF >> bit_count
1176   - offset_mask = ~length_mask
1177   - maximum_length = (0xFFFF >> bit_count) + 3
1178   - return length_mask, offset_mask, bit_count, maximum_length
1179   -
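Below is an integer-only sketch of the same CopyToken Help computation, plus the CopyToken unpacking from MS-OVBA 2.4.1.3.19.2 that `decompress_stream` performs inline. For `difference >= 1`, `(difference - 1).bit_length()` equals `ceil(log2(difference))`, so no floating-point logarithm is needed. The offset mask is kept to 16 bits instead of using Python's `~`, which yields a negative number. Both helper names are hypothetical.

```python
def copytoken_help_int(decompressed_current, decompressed_chunk_start):
    """Return (length_mask, offset_mask, bit_count, maximum_length)."""
    difference = decompressed_current - decompressed_chunk_start
    # for difference >= 1, (difference - 1).bit_length() == ceil(log2(difference))
    bit_count = max((difference - 1).bit_length(), 4)
    length_mask = 0xFFFF >> bit_count
    offset_mask = 0xFFFF ^ length_mask  # 16-bit complement of length_mask
    maximum_length = length_mask + 3
    return length_mask, offset_mask, bit_count, maximum_length


def unpack_copy_token(copy_token, decompressed_current, decompressed_chunk_start):
    """Return (offset, length) encoded by a 16-bit CopyToken."""
    length_mask, offset_mask, bit_count, _ = copytoken_help_int(
        decompressed_current, decompressed_chunk_start)
    length = (copy_token & length_mask) + 3
    offset = ((copy_token & offset_mask) >> (16 - bit_count)) + 1
    return offset, length
```

For instance, after 3 decompressed bytes in a chunk, `bit_count` is 4, so token `0x2003` means "copy 6 bytes starting 3 bytes back".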
1180   -
1181   -def decompress_stream(compressed_container):
1182   - """
1183   - Decompress a stream according to MS-OVBA section 2.4.1
1184   -
1185   - compressed_container: string compressed according to the MS-OVBA 2.4.1.3.6 Compression algorithm
1186   - return the decompressed container as a string (bytes)
1187   - """
1188   - # 2.4.1.2 State Variables
1189   -
1190   - # The following state is maintained for the CompressedContainer (section 2.4.1.1.1):
1191   - # CompressedRecordEnd: The location of the byte after the last byte in the CompressedContainer (section 2.4.1.1.1).
1192   - # CompressedCurrent: The location of the next byte in the CompressedContainer (section 2.4.1.1.1) to be read by
1193   - # decompression or to be written by compression.
1194   -
1195   - # The following state is maintained for the current CompressedChunk (section 2.4.1.1.4):
1196   - # CompressedChunkStart: The location of the first byte of the CompressedChunk (section 2.4.1.1.4) within the
1197   - # CompressedContainer (section 2.4.1.1.1).
1198   -
1199   - # The following state is maintained for a DecompressedBuffer (section 2.4.1.1.2):
1200   - # DecompressedCurrent: The location of the next byte in the DecompressedBuffer (section 2.4.1.1.2) to be written by
1201   - # decompression or to be read by compression.
1202   - # DecompressedBufferEnd: The location of the byte after the last byte in the DecompressedBuffer (section 2.4.1.1.2).
1203   -
1204   - # The following state is maintained for the current DecompressedChunk (section 2.4.1.1.3):
1205   - # DecompressedChunkStart: The location of the first byte of the DecompressedChunk (section 2.4.1.1.3) within the
1206   - # DecompressedBuffer (section 2.4.1.1.2).
1207   -
1208   - decompressed_container = bytearray() # result
1209   - compressed_current = 0
1210   -
1211   - sig_byte = compressed_container[compressed_current]
1212   - if sig_byte != 0x01:
1213   - raise ValueError('invalid signature byte {0:02X}'.format(sig_byte))
1214   -
1215   - compressed_current += 1
1216   -
1217   - #NOTE: the definition of CompressedRecordEnd is ambiguous. Here we assume that
1218   - # CompressedRecordEnd = len(compressed_container)
1219   - while compressed_current < len(compressed_container):
1220   - # 2.4.1.1.5
1221   - compressed_chunk_start = compressed_current
1222   - # chunk header = first 16 bits
1223   - compressed_chunk_header = \
1224   - struct.unpack("<H", compressed_container[compressed_chunk_start:compressed_chunk_start + 2])[0]
1225   - # chunk size = 12 first bits of header + 3
1226   - chunk_size = (compressed_chunk_header & 0x0FFF) + 3
1227   - # chunk signature = 3 next bits - should always be 0b011
1228   - chunk_signature = (compressed_chunk_header >> 12) & 0x07
1229   - if chunk_signature != 0b011:
1230   - raise ValueError('Invalid CompressedChunkSignature in VBA compressed stream')
1231   - # chunk flag = next bit - 1 == compressed, 0 == uncompressed
1232   - chunk_flag = (compressed_chunk_header >> 15) & 0x01
1233   - log.debug("chunk size = {0}, compressed flag = {1}".format(chunk_size, chunk_flag))
1234   -
1235   - #MS-OVBA 2.4.1.3.12: the maximum size of a chunk including its header is 4098 bytes (header 2 + data 4096)
1236   - # The minimum size is 3 bytes
1237   - # NOTE: there seems to be a typo in MS-OVBA, the check should be with 4098, not 4095 (which is the max value
1238   - # in chunk header before adding 3).
1239   - # Also the first test is not useful: chunk_size is a 12-bit value (at most 4095) plus 3, so it cannot exceed 4098.
1240   - if chunk_flag == 1 and chunk_size > 4098:
1241   - raise ValueError('CompressedChunkSize > 4098 but CompressedChunkFlag == 1')
1242   - if chunk_flag == 0 and chunk_size != 4098:
1243   - raise ValueError('CompressedChunkSize != 4098 but CompressedChunkFlag == 0')
1244   -
1245   - # check if chunk_size goes beyond the compressed data, instead of silently cutting it:
1246   - #TODO: raise an exception?
1247   - if compressed_chunk_start + chunk_size > len(compressed_container):
1248   - log.warning('Chunk size is larger than remaining compressed data')
1249   - compressed_end = min([len(compressed_container), compressed_chunk_start + chunk_size])
1250   - # read after chunk header:
1251   - compressed_current = compressed_chunk_start + 2
1252   -
1253   - if chunk_flag == 0:
1254   - # MS-OVBA 2.4.1.3.3 Decompressing a RawChunk
1255   - # uncompressed chunk: read the next 4096 bytes as-is
1256   - #TODO: check if there are at least 4096 bytes left
1257   - decompressed_container.extend(compressed_container[compressed_current:compressed_current + 4096])
1258   - compressed_current += 4096
1259   - else:
1260   - # MS-OVBA 2.4.1.3.2 Decompressing a CompressedChunk
1261   - # compressed chunk
1262   - decompressed_chunk_start = len(decompressed_container)
1263   - while compressed_current < compressed_end:
1264   - # MS-OVBA 2.4.1.3.4 Decompressing a TokenSequence
1265   - # log.debug('compressed_current = %d / compressed_end = %d' % (compressed_current, compressed_end))
1266   - # FlagByte: 8 bits indicating if the following 8 tokens are either literal (1 byte of plain text) or
1267   - # copy tokens (reference to a previous literal token)
1268   - flag_byte = compressed_container[compressed_current]
1269   - compressed_current += 1
1270   - for bit_index in range(0, 8):
1271   - # log.debug('bit_index=%d / compressed_current=%d / compressed_end=%d' % (bit_index, compressed_current, compressed_end))
1272   - if compressed_current >= compressed_end:
1273   - break
1274   - # MS-OVBA 2.4.1.3.5 Decompressing a Token
1275   - # MS-OVBA 2.4.1.3.17 Extract FlagBit
1276   - flag_bit = (flag_byte >> bit_index) & 1
1277   - #log.debug('bit_index=%d: flag_bit=%d' % (bit_index, flag_bit))
1278   - if flag_bit == 0: # LiteralToken
1279   - # copy one byte directly to output
1280   - decompressed_container.extend([compressed_container[compressed_current]])
1281   - compressed_current += 1
1282   - else: # CopyToken
1283   - # MS-OVBA 2.4.1.3.19.2 Unpack CopyToken
1284   - copy_token = \
1285   - struct.unpack("<H", compressed_container[compressed_current:compressed_current + 2])[0]
1286   - #TODO: check this
1287   - length_mask, offset_mask, bit_count, _ = copytoken_help(
1288   - len(decompressed_container), decompressed_chunk_start)
1289   - length = (copy_token & length_mask) + 3
1290   - temp1 = copy_token & offset_mask
1291   - temp2 = 16 - bit_count
1292   - offset = (temp1 >> temp2) + 1
1293   - #log.debug('offset=%d length=%d' % (offset, length))
1294   - copy_source = len(decompressed_container) - offset
1295   - for index in range(copy_source, copy_source + length):
1296   - decompressed_container.extend([decompressed_container[index]])
1297   - compressed_current += 2
1298   - return bytes(decompressed_container)
1299   -
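To make the chunk and token layout concrete, here is a compact standalone sketch of MS-OVBA 2.4.1 decompression for Python 3 `bytes` input, together with hand-built containers. `decompress_ovba` is a hypothetical name. It appends a raw (uncompressed) chunk with `extend` on the bytes slice itself, because extending a `bytearray` with a list holding one `bytes` object would raise `TypeError`. It also rejects copy tokens that point before the start of the current chunk.

```python
import struct


def decompress_ovba(data):
    """Decompress an MS-OVBA CompressedContainer (bytes) and return bytes."""
    if not data or data[0] != 0x01:
        raise ValueError('invalid signature byte')
    out = bytearray()
    pos = 1
    while pos < len(data):
        chunk_start = pos
        header = struct.unpack_from('<H', data, chunk_start)[0]
        chunk_size = (header & 0x0FFF) + 3  # total size, including the 2-byte header
        if (header >> 12) & 0x07 != 0b011:
            raise ValueError('invalid CompressedChunkSignature')
        is_compressed = (header >> 15) & 0x01
        chunk_end = min(len(data), chunk_start + chunk_size)
        pos = chunk_start + 2
        if not is_compressed:
            # RawChunk: 4096 bytes copied as-is
            out.extend(data[pos:pos + 4096])
            pos += 4096
            continue
        out_chunk_start = len(out)
        while pos < chunk_end:
            flag_byte = data[pos]  # one flag bit per following token
            pos += 1
            for bit_index in range(8):
                if pos >= chunk_end:
                    break
                if not (flag_byte >> bit_index) & 1:
                    out.append(data[pos])  # LiteralToken: copy one byte
                    pos += 1
                    continue
                # CopyToken: 16-bit (offset, length) pair
                token = struct.unpack_from('<H', data, pos)[0]
                pos += 2
                difference = len(out) - out_chunk_start
                bit_count = max((difference - 1).bit_length(), 4)
                length = (token & (0xFFFF >> bit_count)) + 3
                offset = (token >> (16 - bit_count)) + 1
                source = len(out) - offset
                if source < out_chunk_start:
                    raise ValueError('CopyToken points before the chunk start')
                for index in range(source, source + length):
                    out.append(out[index])  # byte by byte: source may overlap output
    return bytes(out)


print(decompress_ovba(b'\x01\x05\xb0\x08abc\x03\x20'))
```

In the example container, `b'\x05\xb0'` is the chunk header 0xB005 (size field 5, so 8 bytes including the header; signature 0b011; compressed flag 1). Flag byte 0x08 marks three literals `a`, `b`, `c` followed by CopyToken 0x2003, which copies 6 bytes from 3 bytes back and yields `abcabcabc`.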
1300   -
1301   -def _extract_vba(ole, vba_root, project_path, dir_path, relaxed=False):
1302   - """
1303   - Extract VBA macros from an OleFileIO object.
1304   - Internal function, do not call directly.
1305   -
1306   - vba_root: path to the VBA root storage, containing the VBA storage and the PROJECT stream
1307   - project_path: path to the PROJECT stream; dir_path: path to the dir stream
1308   - :param relaxed: If True, only create info/debug log entry if data is not as expected
1309   - (e.g. opening substream fails); if False, raise an error in this case
1310   - This is a generator, yielding (stream path, VBA filename, VBA source code) for each VBA code stream
1311   - """
1312   - # Open the PROJECT stream:
1313   - project = ole.openstream(project_path)
1314   - log.debug('relaxed is %s' % relaxed)
1315   -
1316   - # sample content of the PROJECT stream:
1317   -
1318   - ## ID="{5312AC8A-349D-4950-BDD0-49BE3C4DD0F0}"
1319   - ## Document=ThisDocument/&H00000000
1320   - ## Module=NewMacros
1321   - ## Name="Project"
1322   - ## HelpContextID="0"
1323   - ## VersionCompatible32="393222000"
1324   - ## CMG="F1F301E705E705E705E705"
1325   - ## DPB="8F8D7FE3831F2020202020"
1326   - ## GC="2D2FDD81E51EE61EE6E1"
1327   - ##
1328   - ## [Host Extender Info]
1329   - ## &H00000001={3832D640-CF90-11CF-8E43-00A0C911005A};VBE;&H00000000
1330   - ## &H00000002={000209F2-0000-0000-C000-000000000046};Word8.0;&H00000000
1331   - ##
1332   - ## [Workspace]
1333   - ## ThisDocument=22, 29, 339, 477, Z
1334   - ## NewMacros=-4, 42, 832, 510, C
1335   -
1336   - code_modules = {}
1337   -
1338   - for line in project:
1339   - line = line.strip().decode('utf-8','ignore')
1340   - if '=' in line:
1341   - # split line at the 1st equal sign:
1342   - name, value = line.split('=', 1)
1343   - # looking for code modules
1344   - # add the code module as a key in the dictionary
1345   - # the value will be the extension needed later
1346   - # The value is converted to lowercase, to allow case-insensitive matching (issue #3)
1347   - value = value.lower()
1348   - if name == 'Document':
1349   - # split value at the 1st slash, keep 1st part:
1350   - value = value.split('/', 1)[0]
1351   - code_modules[value] = CLASS_EXTENSION
1352   - elif name == 'Module':
1353   - code_modules[value] = MODULE_EXTENSION
1354   - elif name == 'Class':
1355   - code_modules[value] = CLASS_EXTENSION
1356   - elif name == 'BaseClass':
1357   - code_modules[value] = FORM_EXTENSION
1358   -
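Below is a standalone sketch of the PROJECT stream scan above, run on the sample content quoted in the comments. The extension strings are assumptions for illustration, standing in for `MODULE_EXTENSION`, `CLASS_EXTENSION` and `FORM_EXTENSION`, and `parse_project_modules` is a hypothetical name. The `[Workspace]` lines also contain `=`, but their keys are module names rather than `Document`, `Module`, `Class` or `BaseClass`, so they are skipped.

```python
# Assumed extensions, for illustration only
MODULE_EXT, CLASS_EXT, FORM_EXT = 'bas', 'cls', 'frm'

_EXTENSION_BY_KEY = {
    'Document': CLASS_EXT,
    'Module': MODULE_EXT,
    'Class': CLASS_EXT,
    'BaseClass': FORM_EXT,
}


def parse_project_modules(project_bytes):
    """Map lowercase module names to file extensions from a PROJECT stream."""
    code_modules = {}
    for raw_line in project_bytes.splitlines():
        line = raw_line.strip().decode('utf-8', 'ignore')
        if '=' not in line:
            continue
        name, value = line.split('=', 1)  # split at the first '=' only
        if name not in _EXTENSION_BY_KEY:
            continue
        value = value.lower()  # case-insensitive module names
        if name == 'Document':
            value = value.split('/', 1)[0]  # drop the "/&H00000000" suffix
        code_modules[value] = _EXTENSION_BY_KEY[name]
    return code_modules


sample = (b'ID="{5312AC8A-349D-4950-BDD0-49BE3C4DD0F0}"\r\n'
          b'Document=ThisDocument/&H00000000\r\n'
          b'Module=NewMacros\r\n'
          b'Name="Project"\r\n'
          b'\r\n'
          b'[Workspace]\r\n'
          b'ThisDocument=22, 29, 339, 477, Z\r\n'
          b'NewMacros=-4, 42, 832, 510, C\r\n')
print(parse_project_modules(sample))
```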
1359   - # read data from dir stream (compressed)
1360   - dir_compressed = ole.openstream(dir_path).read()
1361   -
1362   - def check_value(name, expected, value):
1363   - if expected != value:
1364   - if relaxed:
1365   - log.error("invalid value for {0} expected {1:04X} got {2:04X}"
1366   - .format(name, expected, value))
1367   - else:
1368   - raise UnexpectedDataError(dir_path, name, expected, value)
1369   -
1370   - dir_stream = BytesIO(decompress_stream(dir_compressed))
1371   -
1372   - # PROJECTSYSKIND Record
1373   - projectsyskind_id = struct.unpack("<H", dir_stream.read(2))[0]
1374   - check_value('PROJECTSYSKIND_Id', 0x0001, projectsyskind_id)
1375   - projectsyskind_size = struct.unpack("<L", dir_stream.read(4))[0]
1376   - check_value('PROJECTSYSKIND_Size', 0x0004, projectsyskind_size)
1377   - projectsyskind_syskind = struct.unpack("<L", dir_stream.read(4))[0]
1378   - if projectsyskind_syskind == 0x00:
1379   - log.debug("16-bit Windows")
1380   - elif projectsyskind_syskind == 0x01:
1381   - log.debug("32-bit Windows")
1382   - elif projectsyskind_syskind == 0x02:
1383   - log.debug("Macintosh")
1384   - elif projectsyskind_syskind == 0x03:
1385   - log.debug("64-bit Windows")
1386   - else:
1387   - log.error("invalid PROJECTSYSKIND_SysKind {0:04X}".format(projectsyskind_syskind))
1388   -
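PROJECTSYSKIND, PROJECTLCID, PROJECTLCIDINVOKE, PROJECTHELPCONTEXT and PROJECTLIBFLAGS share one layout: a 16-bit Id, a 32-bit Size equal to 4, then a 32-bit value, all little-endian. Below is a minimal sketch of a reader for such records. `read_u32_record` and `SYSKIND_NAMES` are hypothetical names, and unlike `check_value` the sketch always raises and has no relaxed mode.

```python
import struct
from io import BytesIO

SYSKIND_NAMES = {
    0x00: '16-bit Windows',
    0x01: '32-bit Windows',
    0x02: 'Macintosh',
    0x03: '64-bit Windows',
}


def read_u32_record(stream, expected_id):
    """Read an Id (u16) / Size (u32) / Value (u32) record and return Value."""
    record_id, size = struct.unpack('<HL', stream.read(6))
    if record_id != expected_id or size != 4:
        raise ValueError('unexpected record: Id=0x%04X Size=%d' % (record_id, size))
    return struct.unpack('<L', stream.read(4))[0]


# PROJECTSYSKIND (Id 0x0001) followed by PROJECTLCID (Id 0x0002)
dir_stream = BytesIO(struct.pack('<HLL', 0x0001, 4, 0x03) +
                     struct.pack('<HLL', 0x0002, 4, 0x409))
print(SYSKIND_NAMES[read_u32_record(dir_stream, 0x0001)])
print(hex(read_u32_record(dir_stream, 0x0002)))
```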
1389   - # PROJECTLCID Record
1390   - projectlcid_id = struct.unpack("<H", dir_stream.read(2))[0]
1391   - check_value('PROJECTLCID_Id', 0x0002, projectlcid_id)
1392   - projectlcid_size = struct.unpack("<L", dir_stream.read(4))[0]
1393   - check_value('PROJECTLCID_Size', 0x0004, projectlcid_size)
1394   - projectlcid_lcid = struct.unpack("<L", dir_stream.read(4))[0]
1395   - check_value('PROJECTLCID_Lcid', 0x409, projectlcid_lcid)
1396   -
1397   - # PROJECTLCIDINVOKE Record
1398   - projectlcidinvoke_id = struct.unpack("<H", dir_stream.read(2))[0]
1399   - check_value('PROJECTLCIDINVOKE_Id', 0x0014, projectlcidinvoke_id)
1400   - projectlcidinvoke_size = struct.unpack("<L", dir_stream.read(4))[0]
1401   - check_value('PROJECTLCIDINVOKE_Size', 0x0004, projectlcidinvoke_size)
1402   - projectlcidinvoke_lcidinvoke = struct.unpack("<L", dir_stream.read(4))[0]
1403   - check_value('PROJECTLCIDINVOKE_LcidInvoke', 0x409, projectlcidinvoke_lcidinvoke)
1404   -
1405   - # PROJECTCODEPAGE Record
1406   - projectcodepage_id = struct.unpack("<H", dir_stream.read(2))[0]
1407   - check_value('PROJECTCODEPAGE_Id', 0x0003, projectcodepage_id)
1408   - projectcodepage_size = struct.unpack("<L", dir_stream.read(4))[0]
1409   - check_value('PROJECTCODEPAGE_Size', 0x0002, projectcodepage_size)
1410   - projectcodepage_codepage = struct.unpack("<H", dir_stream.read(2))[0]
1411   -
1412   - # PROJECTNAME Record
1413   - projectname_id = struct.unpack("<H", dir_stream.read(2))[0]
1414   - check_value('PROJECTNAME_Id', 0x0004, projectname_id)
1415   - projectname_sizeof_projectname = struct.unpack("<L", dir_stream.read(4))[0]
1416   - if projectname_sizeof_projectname < 1 or projectname_sizeof_projectname > 128:
1417   - log.error("PROJECTNAME_SizeOfProjectName value not in range: {0}".format(projectname_sizeof_projectname))
1418   - projectname_projectname = dir_stream.read(projectname_sizeof_projectname)
1419   - unused = projectname_projectname
1420   -
1421   - # PROJECTDOCSTRING Record
1422   - projectdocstring_id = struct.unpack("<H", dir_stream.read(2))[0]
1423   - check_value('PROJECTDOCSTRING_Id', 0x0005, projectdocstring_id)
1424   - projectdocstring_sizeof_docstring = struct.unpack("<L", dir_stream.read(4))[0]
1425   - if projectdocstring_sizeof_docstring > 2000:
1426   - log.error(
1427   - "PROJECTDOCSTRING_SizeOfDocString value not in range: {0}".format(projectdocstring_sizeof_docstring))
1428   - projectdocstring_docstring = dir_stream.read(projectdocstring_sizeof_docstring)
1429   - projectdocstring_reserved = struct.unpack("<H", dir_stream.read(2))[0]
1430   - check_value('PROJECTDOCSTRING_Reserved', 0x0040, projectdocstring_reserved)
1431   - projectdocstring_sizeof_docstring_unicode = struct.unpack("<L", dir_stream.read(4))[0]
1432   - if projectdocstring_sizeof_docstring_unicode % 2 != 0:
1433   - log.error("PROJECTDOCSTRING_SizeOfDocStringUnicode is not even")
1434   - projectdocstring_docstring_unicode = dir_stream.read(projectdocstring_sizeof_docstring_unicode)
1435   - unused = projectdocstring_docstring
1436   - unused = projectdocstring_docstring_unicode
1437   -
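PROJECTDOCSTRING, like PROJECTCONSTANTS further down, stores its text twice. The first copy is a length-prefixed MBCS string. The second follows a Reserved marker (0x0040 here) as a length-prefixed UTF-16LE copy whose size must be even. Below is a sketch of reading such a record from a decompressed dir stream. `read_docstring_record` is a hypothetical name, and the MBCS part is returned undecoded because its encoding depends on PROJECTCODEPAGE.

```python
import struct
from io import BytesIO


def read_docstring_record(stream):
    """Return (docstring_mbcs_bytes, docstring_unicode) from a PROJECTDOCSTRING record."""
    record_id, size = struct.unpack('<HL', stream.read(6))
    if record_id != 0x0005:
        raise ValueError('not a PROJECTDOCSTRING record: 0x%04X' % record_id)
    docstring = stream.read(size)  # MBCS, encoding given by PROJECTCODEPAGE
    reserved, size_unicode = struct.unpack('<HL', stream.read(6))
    if reserved != 0x0040:
        raise ValueError('unexpected Reserved value: 0x%04X' % reserved)
    if size_unicode % 2 != 0:
        raise ValueError('SizeOfDocStringUnicode is not even')
    docstring_unicode = stream.read(size_unicode).decode('utf-16-le')
    return docstring, docstring_unicode


text = 'Macro project'
record = (struct.pack('<HL', 0x0005, len(text)) + text.encode('ascii') +
          struct.pack('<HL', 0x0040, 2 * len(text)) + text.encode('utf-16-le'))
print(read_docstring_record(BytesIO(record)))
```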
1438   - # PROJECTHELPFILEPATH Record - MS-OVBA 2.3.4.2.1.7
1439   - projecthelpfilepath_id = struct.unpack("<H", dir_stream.read(2))[0]
1440   - check_value('PROJECTHELPFILEPATH_Id', 0x0006, projecthelpfilepath_id)
1441   - projecthelpfilepath_sizeof_helpfile1 = struct.unpack("<L", dir_stream.read(4))[0]
1442   - if projecthelpfilepath_sizeof_helpfile1 > 260:
1443   - log.error(
1444   - "PROJECTHELPFILEPATH_SizeOfHelpFile1 value not in range: {0}".format(projecthelpfilepath_sizeof_helpfile1))
1445   - projecthelpfilepath_helpfile1 = dir_stream.read(projecthelpfilepath_sizeof_helpfile1)
1446   - projecthelpfilepath_reserved = struct.unpack("<H", dir_stream.read(2))[0]
1447   - check_value('PROJECTHELPFILEPATH_Reserved', 0x003D, projecthelpfilepath_reserved)
1448   - projecthelpfilepath_sizeof_helpfile2 = struct.unpack("<L", dir_stream.read(4))[0]
1449   - if projecthelpfilepath_sizeof_helpfile2 != projecthelpfilepath_sizeof_helpfile1:
1450   - log.error("PROJECTHELPFILEPATH_SizeOfHelpFile1 does not equal PROJECTHELPFILEPATH_SizeOfHelpFile2")
1451   - projecthelpfilepath_helpfile2 = dir_stream.read(projecthelpfilepath_sizeof_helpfile2)
1452   - if projecthelpfilepath_helpfile2 != projecthelpfilepath_helpfile1:
1453   - log.error("PROJECTHELPFILEPATH_HelpFile1 does not equal PROJECTHELPFILEPATH_HelpFile2")
1454   -
1455   - # PROJECTHELPCONTEXT Record
1456   - projecthelpcontext_id = struct.unpack("<H", dir_stream.read(2))[0]
1457   - check_value('PROJECTHELPCONTEXT_Id', 0x0007, projecthelpcontext_id)
1458   - projecthelpcontext_size = struct.unpack("<L", dir_stream.read(4))[0]
1459   - check_value('PROJECTHELPCONTEXT_Size', 0x0004, projecthelpcontext_size)
1460   - projecthelpcontext_helpcontext = struct.unpack("<L", dir_stream.read(4))[0]
1461   - unused = projecthelpcontext_helpcontext
1462   -
1463   - # PROJECTLIBFLAGS Record
1464   - projectlibflags_id = struct.unpack("<H", dir_stream.read(2))[0]
1465   - check_value('PROJECTLIBFLAGS_Id', 0x0008, projectlibflags_id)
1466   - projectlibflags_size = struct.unpack("<L", dir_stream.read(4))[0]
1467   - check_value('PROJECTLIBFLAGS_Size', 0x0004, projectlibflags_size)
1468   - projectlibflags_projectlibflags = struct.unpack("<L", dir_stream.read(4))[0]
1469   - check_value('PROJECTLIBFLAGS_ProjectLibFlags', 0x0000, projectlibflags_projectlibflags)
1470   -
1471   - # PROJECTVERSION Record
1472   - projectversion_id = struct.unpack("<H", dir_stream.read(2))[0]
1473   - check_value('PROJECTVERSION_Id', 0x0009, projectversion_id)
1474   - projectversion_reserved = struct.unpack("<L", dir_stream.read(4))[0]
1475   - check_value('PROJECTVERSION_Reserved', 0x0004, projectversion_reserved)
1476   - projectversion_versionmajor = struct.unpack("<L", dir_stream.read(4))[0]
1477   - projectversion_versionminor = struct.unpack("<H", dir_stream.read(2))[0]
1478   - unused = projectversion_versionmajor
1479   - unused = projectversion_versionminor
1480   -
1481   - # PROJECTCONSTANTS Record
1482   - projectconstants_id = struct.unpack("<H", dir_stream.read(2))[0]
1483   - check_value('PROJECTCONSTANTS_Id', 0x000C, projectconstants_id)
1484   - projectconstants_sizeof_constants = struct.unpack("<L", dir_stream.read(4))[0]
1485   - if projectconstants_sizeof_constants > 1015:
1486   - log.error(
1487   - "PROJECTCONSTANTS_SizeOfConstants value not in range: {0}".format(projectconstants_sizeof_constants))
1488   - projectconstants_constants = dir_stream.read(projectconstants_sizeof_constants)
1489   - projectconstants_reserved = struct.unpack("<H", dir_stream.read(2))[0]
1490   - check_value('PROJECTCONSTANTS_Reserved', 0x003C, projectconstants_reserved)
1491   - projectconstants_sizeof_constants_unicode = struct.unpack("<L", dir_stream.read(4))[0]
1492   - if projectconstants_sizeof_constants_unicode % 2 != 0:
1493   - log.error("PROJECTCONSTANTS_SizeOfConstantsUnicode is not even")
1494   - projectconstants_constants_unicode = dir_stream.read(projectconstants_sizeof_constants_unicode)
1495   - unused = projectconstants_constants
1496   - unused = projectconstants_constants_unicode
1497   -
1498   - # array of REFERENCE records
1499   - check = None
1500   - while True:
1501   - check = struct.unpack("<H", dir_stream.read(2))[0]
1502   - log.debug("reference type = {0:04X}".format(check))
1503   - if check == 0x000F:
1504   - break
1505   -
1506   - if check == 0x0016:
1507   - # REFERENCENAME
1508   - reference_id = check
1509   - reference_sizeof_name = struct.unpack("<L", dir_stream.read(4))[0]
1510   - reference_name = dir_stream.read(reference_sizeof_name)
1511   - reference_reserved = struct.unpack("<H", dir_stream.read(2))[0]
1512   - # According to [MS-OVBA] 2.3.4.2.2.2 REFERENCENAME Record:
1513   - # "Reserved (2 bytes): MUST be 0x003E. MUST be ignored."
1514   - # So let's ignore it, otherwise it crashes on some files (issue #132)
1515   - # PR #135 by @c1fe:
1516   - # contrary to the specification I think that the unicode name
1517   - # is optional. if reference_reserved is not 0x003E I think it
1518   - # is actually the start of another REFERENCE record
1519   - # at least when projectsyskind_syskind == 0x02 (Macintosh)
1520   - if reference_reserved == 0x003E:
1521   - #if reference_reserved not in (0x003E, 0x000D):
1522   - # raise UnexpectedDataError(dir_path, 'REFERENCE_Reserved',
1523   - # 0x0003E, reference_reserved)
1524   - reference_sizeof_name_unicode = struct.unpack("<L", dir_stream.read(4))[0]
1525   - reference_name_unicode = dir_stream.read(reference_sizeof_name_unicode)
1526   - unused = reference_id
1527   - unused = reference_name
1528   - unused = reference_name_unicode
1529   - continue
1530   - else:
1531   - check = reference_reserved
1532   - log.debug("reference type = {0:04X}".format(check))
1533   -
1534   - if check == 0x0033:
1535   - # REFERENCEORIGINAL (followed by REFERENCECONTROL)
1536   - referenceoriginal_id = check
1537   - referenceoriginal_sizeof_libidoriginal = struct.unpack("<L", dir_stream.read(4))[0]
1538   - referenceoriginal_libidoriginal = dir_stream.read(referenceoriginal_sizeof_libidoriginal)
1539   - unused = referenceoriginal_id
1540   - unused = referenceoriginal_libidoriginal
1541   - continue
1542   -
1543   - if check == 0x002F:
1544   - # REFERENCECONTROL
1545   - referencecontrol_id = check
1546   - referencecontrol_sizetwiddled = struct.unpack("<L", dir_stream.read(4))[0] # ignore
1547   - referencecontrol_sizeof_libidtwiddled = struct.unpack("<L", dir_stream.read(4))[0]
1548   - referencecontrol_libidtwiddled = dir_stream.read(referencecontrol_sizeof_libidtwiddled)
1549   - referencecontrol_reserved1 = struct.unpack("<L", dir_stream.read(4))[0] # ignore
1550   - check_value('REFERENCECONTROL_Reserved1', 0x0000, referencecontrol_reserved1)
1551   - referencecontrol_reserved2 = struct.unpack("<H", dir_stream.read(2))[0] # ignore
1552   - check_value('REFERENCECONTROL_Reserved2', 0x0000, referencecontrol_reserved2)
1553   - unused = referencecontrol_id
1554   - unused = referencecontrol_sizetwiddled
1555   - unused = referencecontrol_libidtwiddled
1556   - # optional field
1557   - check2 = struct.unpack("<H", dir_stream.read(2))[0]
1558   - if check2 == 0x0016:
1559   - referencecontrol_namerecordextended_id = check
1560   - referencecontrol_namerecordextended_sizeof_name = struct.unpack("<L", dir_stream.read(4))[0]
1561   - referencecontrol_namerecordextended_name = dir_stream.read(
1562   - referencecontrol_namerecordextended_sizeof_name)
1563   - referencecontrol_namerecordextended_reserved = struct.unpack("<H", dir_stream.read(2))[0]
1564   - if referencecontrol_namerecordextended_reserved == 0x003E:
1565   - referencecontrol_namerecordextended_sizeof_name_unicode = struct.unpack("<L", dir_stream.read(4))[0]
1566   - referencecontrol_namerecordextended_name_unicode = dir_stream.read(
1567   - referencecontrol_namerecordextended_sizeof_name_unicode)
1568   - referencecontrol_reserved3 = struct.unpack("<H", dir_stream.read(2))[0]
1569   - unused = referencecontrol_namerecordextended_id
1570   - unused = referencecontrol_namerecordextended_name
1571   - unused = referencecontrol_namerecordextended_name_unicode
1572   - else:
1573   - referencecontrol_reserved3 = referencecontrol_namerecordextended_reserved
1574   - else:
1575   - referencecontrol_reserved3 = check2
1576   -
1577   - check_value('REFERENCECONTROL_Reserved3', 0x0030, referencecontrol_reserved3)
1578   - referencecontrol_sizeextended = struct.unpack("<L", dir_stream.read(4))[0]
1579   - referencecontrol_sizeof_libidextended = struct.unpack("<L", dir_stream.read(4))[0]
1580   - referencecontrol_libidextended = dir_stream.read(referencecontrol_sizeof_libidextended)
1581   - referencecontrol_reserved4 = struct.unpack("<L", dir_stream.read(4))[0]
1582   - referencecontrol_reserved5 = struct.unpack("<H", dir_stream.read(2))[0]
1583   - referencecontrol_originaltypelib = dir_stream.read(16)
1584   - referencecontrol_cookie = struct.unpack("<L", dir_stream.read(4))[0]
1585   - unused = referencecontrol_sizeextended
1586   - unused = referencecontrol_libidextended
1587   - unused = referencecontrol_reserved4
1588   - unused = referencecontrol_reserved5
1589   - unused = referencecontrol_originaltypelib
1590   - unused = referencecontrol_cookie
1591   - continue
1592   -
1593   - if check == 0x000D:
1594   - # REFERENCEREGISTERED
1595   - referenceregistered_id = check
1596   - referenceregistered_size = struct.unpack("<L", dir_stream.read(4))[0]
1597   - referenceregistered_sizeof_libid = struct.unpack("<L", dir_stream.read(4))[0]
1598   - referenceregistered_libid = dir_stream.read(referenceregistered_sizeof_libid)
1599   - referenceregistered_reserved1 = struct.unpack("<L", dir_stream.read(4))[0]
1600   - check_value('REFERENCEREGISTERED_Reserved1', 0x0000, referenceregistered_reserved1)
1601   - referenceregistered_reserved2 = struct.unpack("<H", dir_stream.read(2))[0]
1602   - check_value('REFERENCEREGISTERED_Reserved2', 0x0000, referenceregistered_reserved2)
1603   - unused = referenceregistered_id
1604   - unused = referenceregistered_size
1605   - unused = referenceregistered_libid
1606   - continue
1607   -
1608   - if check == 0x000E:
1609   - # REFERENCEPROJECT
1610   - referenceproject_id = check
1611   - referenceproject_size = struct.unpack("<L", dir_stream.read(4))[0]
1612   - referenceproject_sizeof_libidabsolute = struct.unpack("<L", dir_stream.read(4))[0]
1613   - referenceproject_libidabsolute = dir_stream.read(referenceproject_sizeof_libidabsolute)
1614   - referenceproject_sizeof_libidrelative = struct.unpack("<L", dir_stream.read(4))[0]
1615   - referenceproject_libidrelative = dir_stream.read(referenceproject_sizeof_libidrelative)
1616   - referenceproject_majorversion = struct.unpack("<L", dir_stream.read(4))[0]
1617   - referenceproject_minorversion = struct.unpack("<H", dir_stream.read(2))[0]
1618   - unused = referenceproject_id
1619   - unused = referenceproject_size
1620   - unused = referenceproject_libidabsolute
1621   - unused = referenceproject_libidrelative
1622   - unused = referenceproject_majorversion
1623   - unused = referenceproject_minorversion
1624   - continue
1625   -
1626   - log.error('invalid or unknown check Id {0:04X}'.format(check))
1627   - # raise an exception instead of stopping abruptly (issue #180)
1628   - raise UnexpectedDataError(dir_path, 'reference type', (0x0F, 0x16, 0x33, 0x2F, 0x0D, 0x0E), check)
1629   - #sys.exit(0)
1630   -
1631   - projectmodules_id = check #struct.unpack("<H", dir_stream.read(2))[0]
1632   - check_value('PROJECTMODULES_Id', 0x000F, projectmodules_id)
1633   - projectmodules_size = struct.unpack("<L", dir_stream.read(4))[0]
1634   - check_value('PROJECTMODULES_Size', 0x0002, projectmodules_size)
1635   - projectmodules_count = struct.unpack("<H", dir_stream.read(2))[0]
1636   - projectmodules_projectcookierecord_id = struct.unpack("<H", dir_stream.read(2))[0]
1637   - check_value('PROJECTMODULES_ProjectCookieRecord_Id', 0x0013, projectmodules_projectcookierecord_id)
1638   - projectmodules_projectcookierecord_size = struct.unpack("<L", dir_stream.read(4))[0]
1639   - check_value('PROJECTMODULES_ProjectCookieRecord_Size', 0x0002, projectmodules_projectcookierecord_size)
1640   - projectmodules_projectcookierecord_cookie = struct.unpack("<H", dir_stream.read(2))[0]
1641   - unused = projectmodules_projectcookierecord_cookie
1642   -
1643   - # short function to simplify unicode text output
1644   - uni_out = lambda unicode_text: unicode_text.encode('utf-8', 'replace')
1645   -
1646   - log.debug("parsing {0} modules".format(projectmodules_count))
1647   - for projectmodule_index in range(0, projectmodules_count):
1648   - try:
1649   - modulename_id = struct.unpack("<H", dir_stream.read(2))[0]
1650   - check_value('MODULENAME_Id', 0x0019, modulename_id)
1651   - modulename_sizeof_modulename = struct.unpack("<L", dir_stream.read(4))[0]
1652   - modulename_modulename = dir_stream.read(modulename_sizeof_modulename).decode('utf-8', 'backslashreplace')
1653   - # TODO: preset variables to avoid "referenced before assignment" errors
1654   - modulename_unicode_modulename_unicode = ''
1655   - # account for optional sections
1656   - section_id = struct.unpack("<H", dir_stream.read(2))[0]
1657   - if section_id == 0x0047:
1658   - modulename_unicode_id = section_id
1659   - modulename_unicode_sizeof_modulename_unicode = struct.unpack("<L", dir_stream.read(4))[0]
1660   - modulename_unicode_modulename_unicode = dir_stream.read(
1661   - modulename_unicode_sizeof_modulename_unicode).decode('UTF-16LE', 'replace')
1662   - # just guessing that this is the same encoding as used in OleFileIO
1663   - unused = modulename_unicode_id
1664   - section_id = struct.unpack("<H", dir_stream.read(2))[0]
1665   - if section_id == 0x001A:
1666   - modulestreamname_id = section_id
1667   - modulestreamname_sizeof_streamname = struct.unpack("<L", dir_stream.read(4))[0]
1668   - modulestreamname_streamname = dir_stream.read(modulestreamname_sizeof_streamname)
1669   - modulestreamname_reserved = struct.unpack("<H", dir_stream.read(2))[0]
1670   - check_value('MODULESTREAMNAME_Reserved', 0x0032, modulestreamname_reserved)
1671   - modulestreamname_sizeof_streamname_unicode = struct.unpack("<L", dir_stream.read(4))[0]
1672   - modulestreamname_streamname_unicode = dir_stream.read(
1673   - modulestreamname_sizeof_streamname_unicode).decode('UTF-16LE', 'replace')
1674   - # just guessing that this is the same encoding as used in OleFileIO
1675   - unused = modulestreamname_id
1676   - section_id = struct.unpack("<H", dir_stream.read(2))[0]
1677   - if section_id == 0x001C:
1678   - moduledocstring_id = section_id
1679   - check_value('MODULEDOCSTRING_Id', 0x001C, moduledocstring_id)
1680   - moduledocstring_sizeof_docstring = struct.unpack("<L", dir_stream.read(4))[0]
1681   - moduledocstring_docstring = dir_stream.read(moduledocstring_sizeof_docstring)
1682   - moduledocstring_reserved = struct.unpack("<H", dir_stream.read(2))[0]
1683   - check_value('MODULEDOCSTRING_Reserved', 0x0048, moduledocstring_reserved)
1684   - moduledocstring_sizeof_docstring_unicode = struct.unpack("<L", dir_stream.read(4))[0]
1685   - moduledocstring_docstring_unicode = dir_stream.read(moduledocstring_sizeof_docstring_unicode)
1686   - unused = moduledocstring_docstring
1687   - unused = moduledocstring_docstring_unicode
1688   - section_id = struct.unpack("<H", dir_stream.read(2))[0]
1689   - if section_id == 0x0031:
1690   - moduleoffset_id = section_id
1691   - check_value('MODULEOFFSET_Id', 0x0031, moduleoffset_id)
1692   - moduleoffset_size = struct.unpack("<L", dir_stream.read(4))[0]
1693   - check_value('MODULEOFFSET_Size', 0x0004, moduleoffset_size)
1694   - moduleoffset_textoffset = struct.unpack("<L", dir_stream.read(4))[0]
1695   - section_id = struct.unpack("<H", dir_stream.read(2))[0]
1696   - if section_id == 0x001E:
1697   - modulehelpcontext_id = section_id
1698   - check_value('MODULEHELPCONTEXT_Id', 0x001E, modulehelpcontext_id)
1699   - modulehelpcontext_size = struct.unpack("<L", dir_stream.read(4))[0]
1700   - check_value('MODULEHELPCONTEXT_Size', 0x0004, modulehelpcontext_size)
1701   - modulehelpcontext_helpcontext = struct.unpack("<L", dir_stream.read(4))[0]
1702   - unused = modulehelpcontext_helpcontext
1703   - section_id = struct.unpack("<H", dir_stream.read(2))[0]
1704   - if section_id == 0x002C:
1705   - modulecookie_id = section_id
1706   - check_value('MODULECOOKIE_Id', 0x002C, modulecookie_id)
1707   - modulecookie_size = struct.unpack("<L", dir_stream.read(4))[0]
1708   - check_value('MODULECOOKIE_Size', 0x0002, modulecookie_size)
1709   - modulecookie_cookie = struct.unpack("<H", dir_stream.read(2))[0]
1710   - unused = modulecookie_cookie
1711   - section_id = struct.unpack("<H", dir_stream.read(2))[0]
1712   - if section_id == 0x0021 or section_id == 0x0022:
1713   - moduletype_id = section_id
1714   - moduletype_reserved = struct.unpack("<L", dir_stream.read(4))[0]
1715   - unused = moduletype_id
1716   - unused = moduletype_reserved
1717   - section_id = struct.unpack("<H", dir_stream.read(2))[0]
1718   - if section_id == 0x0025:
1719   - modulereadonly_id = section_id
1720   - check_value('MODULEREADONLY_Id', 0x0025, modulereadonly_id)
1721   - modulereadonly_reserved = struct.unpack("<L", dir_stream.read(4))[0]
1722   - check_value('MODULEREADONLY_Reserved', 0x0000, modulereadonly_reserved)
1723   - section_id = struct.unpack("<H", dir_stream.read(2))[0]
1724   - if section_id == 0x0028:
1725   - moduleprivate_id = section_id
1726   - check_value('MODULEPRIVATE_Id', 0x0028, moduleprivate_id)
1727   - moduleprivate_reserved = struct.unpack("<L", dir_stream.read(4))[0]
1728   - check_value('MODULEPRIVATE_Reserved', 0x0000, moduleprivate_reserved)
1729   - section_id = struct.unpack("<H", dir_stream.read(2))[0]
1730   - if section_id == 0x002B: # TERMINATOR
1731   - module_reserved = struct.unpack("<L", dir_stream.read(4))[0]
1732   - check_value('MODULE_Reserved', 0x0000, module_reserved)
1733   - section_id = None
1734   - if section_id != None:
1735   - log.warning('unknown or invalid module section id {0:04X}'.format(section_id))
1736   -
1737   - log.debug('Project CodePage = %d' % projectcodepage_codepage)
1738   - if projectcodepage_codepage in MAC_CODEPAGES:
1739   - vba_codec = MAC_CODEPAGES[projectcodepage_codepage]
1740   - else:
1741   - vba_codec = 'cp%d' % projectcodepage_codepage
1742   - log.debug("ModuleName = {0}".format(modulename_modulename))
1743   - log.debug("ModuleNameUnicode = {0}".format(uni_out(modulename_unicode_modulename_unicode)))
1744   - log.debug("StreamName = {0}".format(modulestreamname_streamname))
1745   - try:
1746   - streamname_unicode = modulestreamname_streamname.decode(vba_codec)
1747   - except UnicodeError as ue:
1748   - log.debug('failed to decode stream name {0!r} with codec {1}'
1749   - .format(uni_out(streamname_unicode), vba_codec))
1750   - streamname_unicode = modulestreamname_streamname.decode(vba_codec, errors='replace')
1751   - log.debug("StreamName.decode('%s') = %s" % (vba_codec, uni_out(streamname_unicode)))
1752   - log.debug("StreamNameUnicode = {0}".format(uni_out(modulestreamname_streamname_unicode)))
1753   - log.debug("TextOffset = {0}".format(moduleoffset_textoffset))
1754   -
1755   - code_data = None
1756   - try_names = streamname_unicode, \
1757   - modulename_unicode_modulename_unicode, \
1758   - modulestreamname_streamname_unicode
1759   - for stream_name in try_names:
1760   - # TODO: if olefile._find were less private, could replace this
1761   - # try-except with calls to it
1762   - try:
1763   - code_path = vba_root + u'VBA/' + stream_name
1764   - log.debug('opening VBA code stream %s' % uni_out(code_path))
1765   - code_data = ole.openstream(code_path).read()
1766   - break
1767   - except IOError as ioe:
1768   - log.debug('failed to open stream VBA/%r (%r), try other name'
1769   - % (uni_out(stream_name), ioe))
1770   -
1771   - if code_data is None:
1772   - log.info("Could not open stream %d of %d ('VBA/' + one of %r)!"
1773   - % (projectmodule_index, projectmodules_count,
1774   - '/'.join("'" + uni_out(stream_name) + "'"
1775   - for stream_name in try_names)))
1776   - if relaxed:
1777   - continue # ... with next submodule
1778   - else:
1779   - raise SubstreamOpenError('[BASE]', 'VBA/' +
1780   - uni_out(modulename_unicode_modulename_unicode))
1781   -
1782   - log.debug("length of code_data = {0}".format(len(code_data)))
1783   - log.debug("offset of code_data = {0}".format(moduleoffset_textoffset))
1784   - code_data = code_data[moduleoffset_textoffset:]
1785   - if len(code_data) > 0:
1786   - code_data = decompress_stream(code_data)
1787   - # case-insensitive search in the code_modules dict to find the file extension:
1788   - filext = code_modules.get(modulename_modulename.lower(), 'bin')
1789   - filename = '{0}.{1}'.format(modulename_modulename, filext)
1790   - #TODO: also yield the codepage so that callers can decode it properly
1791   - yield (code_path, filename, code_data)
1792   - # print '-'*79
1793   - # print filename
1794   - # print ''
1795   - # print code_data
1796   - # print ''
1797   - log.debug('extracted file {0}'.format(filename))
1798   - else:
1799   - log.warning("module stream {0} has code data length 0".format(modulestreamname_streamname))
1800   - except (UnexpectedDataError, SubstreamOpenError):
1801   - raise
1802   - except Exception as exc:
1803   - log.info('Error parsing module {0} of {1} in _extract_vba:'
1804   - .format(projectmodule_index, projectmodules_count),
1805   - exc_info=True)
1806   - if not relaxed:
1807   - raise
1808   - _ = unused # make pylint happy: now variable "unused" is being used ;-)
1809   - return
1810   -
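The REFERENCE loop above reads the Reserved field of a REFERENCENAME record. Following PR #135, any value other than 0x003E is treated as the Id of the next REFERENCE record rather than as an error. Below is a minimal standalone sketch of that heuristic. It assumes an in-memory stream positioned just after the 0x0016 Id. `read_reference_name` and `build_reference_name` are hypothetical names used only for illustration; they are not olevba functions.

```python
import struct
from io import BytesIO

def read_reference_name(stream):
    """Parse a REFERENCENAME record after its 0x0016 Id has been read.

    Returns (name, name_unicode, next_id). When the Reserved field is not
    0x003E, name_unicode is None and the two bytes just read are returned
    as next_id, to be handled as the Id of the following REFERENCE record.
    """
    size = struct.unpack("<L", stream.read(4))[0]
    name = stream.read(size)
    reserved = struct.unpack("<H", stream.read(2))[0]
    if reserved != 0x003E:
        # Unicode part absent: these 2 bytes start the next record.
        return name, None, reserved
    size_unicode = struct.unpack("<L", stream.read(4))[0]
    return name, stream.read(size_unicode), None

def build_reference_name(name, with_unicode, next_id):
    """Hypothetical test helper: serialize a REFERENCENAME body plus next Id."""
    data = struct.pack("<L", len(name)) + name
    if with_unicode:
        name_unicode = name.decode("ascii").encode("utf-16-le")
        data += struct.pack("<HL", 0x003E, len(name_unicode)) + name_unicode
    return data + struct.pack("<H", next_id)

# Record with the Unicode part: the next Id (0x000F) is still unread.
stream = BytesIO(build_reference_name(b"stdole", True, 0x000F))
print(read_reference_name(stream), hex(struct.unpack("<H", stream.read(2))[0]))
# Record without it: the "Reserved" bytes are really the next Id (0x000D).
print(read_reference_name(BytesIO(build_reference_name(b"stdole", False, 0x000D))))
```

Returning the consumed bytes as `next_id` plays the same role as the `check = reference_reserved` fall-through in the loop above.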
1811   -
1812   -def vba_collapse_long_lines(vba_code):
1813   - """
1814   - Parse a VBA module code to detect continuation line characters (underscore) and
1815   - collapse split lines. Continuation line characters are replaced by spaces.
1816   -
1817   - :param vba_code: str, VBA module code
1818   - :return: str, VBA module code with long lines collapsed
1819   - """
1820   - # TODO: use a regex instead, to allow whitespaces after the underscore?
1821   - vba_code = vba_code.replace(' _\r\n', ' ')
1822   - vba_code = vba_code.replace(' _\r', ' ')
1823   - vba_code = vba_code.replace(' _\n', ' ')
1824   - return vba_code
1825   -
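The TODO in vba_collapse_long_lines suggests using a regex so that whitespace after the underscore is tolerated. The chained `str.replace` calls above miss `" _  \r\n"`. Below is a minimal sketch of that variant. `RE_CONTINUATION` and `collapse_long_lines_re` are hypothetical names, not part of olevba.

```python
import re

# Hypothetical pattern: a space, an underscore, optional trailing spaces/tabs,
# then any newline style (CRLF is tried first, then CR, then LF).
RE_CONTINUATION = re.compile(r' _[ \t]*(?:\r\n|\r|\n)')

def collapse_long_lines_re(vba_code):
    """Join VBA continuation lines, replacing each ' _' + newline by a space."""
    return RE_CONTINUATION.sub(' ', vba_code)

print(repr(collapse_long_lines_re('x = 1 + _  \r\n2')))  # 'x = 1 + 2'
```

An underscore inside an identifier such as `my_var` is not affected, because the pattern requires a space before the underscore and a line break after it.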
1826   -
1827   -def filter_vba(vba_code):
1828   - """
1829   - Filter VBA source code to remove the first lines starting with "Attribute VB_",
1830   - which are automatically added by MS Office and not displayed in the VBA Editor.
1831   - This should only be used when displaying source code for human analysis.
1832   -
1833   - Note: lines are not filtered if they contain a colon, because it could be
1834   - used to hide malicious instructions.
1835   -
1836   - :param vba_code: str, VBA source code
1837   - :return: str, filtered VBA source code
1838   - """
1839   - vba_lines = vba_code.splitlines()
1840   - start = 0
1841   - for line in vba_lines:
1842   - if line.startswith("Attribute VB_") and not ':' in line:
1843   - start += 1
1844   - else:
1845   - break
1846   - #TODO: also remove empty lines?
1847   - vba = '\n'.join(vba_lines[start:])
1848   - return vba
1849   -
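The colon rule in filter_vba is easiest to see on a small input. Below, the same logic is copied into a standalone function so the example runs on its own. The sample module text is made up for illustration. Filtering stops at the first `Attribute VB_` line containing a colon, so an instruction appended after `:` stays visible.

```python
def filter_vba(vba_code):
    # Same logic as filter_vba above, copied so the example is self-contained.
    lines = vba_code.splitlines()
    start = 0
    for line in lines:
        if line.startswith("Attribute VB_") and ':' not in line:
            start += 1
        else:
            break
    return '\n'.join(lines[start:])

code = ('Attribute VB_Name = "Module1"\r\n'
        'Attribute VB_Base = "x": Shell "calc.exe"\r\n'
        'Sub Test()\r\nEnd Sub')
print(filter_vba(code))
```

Note that `splitlines` plus `'\n'.join` also normalizes CRLF line endings to LF in the output.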
1850   -
1851   -def detect_autoexec(vba_code, obfuscation=None):
1852   - """
1853   - Detect if the VBA code contains keywords corresponding to macros running
1854   - automatically when triggered by specific actions (e.g. when a document is
1855   - opened or closed).
1856   -
1857   - :param vba_code: str, VBA source code
1858   - :param obfuscation: None or str, name of obfuscation to be added to description
1859   - :return: list of str tuples (keyword, description)
1860   - """
1861   - #TODO: merge code with detect_suspicious
1862   - # case-insensitive search
1863   - #vba_code = vba_code.lower()
1864   - results = []
1865   - obf_text = ''
1866   - if obfuscation:
1867   - obf_text = ' (obfuscation: %s)' % obfuscation
1868   - for description, keywords in AUTOEXEC_KEYWORDS.items():
1869   - for keyword in keywords:
1870   - #TODO: if keyword is already a compiled regex, use it as-is
1871   - # search using regex to detect word boundaries:
1872   - match = re.search(r'(?i)\b' + keyword + r'\b', vba_code)
1873   - if match:
1874   - #if keyword.lower() in vba_code:
1875   - found_keyword = match.group()
1876   - results.append((found_keyword, description + obf_text))
1877   - return results
1878   -
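detect_autoexec wraps each keyword in `\b` word boundaries and uses `(?i)` for a case-insensitive search. `match.group()` reports the casing found in the source. The sketch below uses a hypothetical one-entry table standing in for AUTOEXEC_KEYWORDS, whose real contents are not shown here. As in the code above, keywords are not passed through `re.escape`, so they behave as regex fragments.

```python
import re

# Hypothetical subset; the real AUTOEXEC_KEYWORDS table is larger.
AUTOEXEC_KEYWORDS = {
    'Runs when the Word document is opened': ('AutoOpen', 'Document_Open'),
}

def detect_autoexec(vba_code, obfuscation=None):
    results = []
    obf_text = ' (obfuscation: %s)' % obfuscation if obfuscation else ''
    for description, keywords in AUTOEXEC_KEYWORDS.items():
        for keyword in keywords:
            # \b on both sides: the keyword must be a whole word.
            match = re.search(r'(?i)\b' + keyword + r'\b', vba_code)
            if match:
                # match.group() keeps the casing used in the VBA source.
                results.append((match.group(), description + obf_text))
    return results

print(detect_autoexec('Sub autoopen()\nEnd Sub'))
print(detect_autoexec('Sub MyAutoOpenX()'))  # no whole-word match
```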
1879   -
1880   -def detect_suspicious(vba_code, obfuscation=None):
1881   - """
1882   - Detect if the VBA code contains suspicious keywords corresponding to
1883   - potential malware behaviour.
1884   -
1885   - :param vba_code: str, VBA source code
1886   - :param obfuscation: None or str, name of obfuscation to be added to description
1887   - :return: list of str tuples (keyword, description)
1888   - """
1889   - # case-insensitive search
1890   - #vba_code = vba_code.lower()
1891   - results = []
1892   - obf_text = ''
1893   - if obfuscation:
1894   - obf_text = ' (obfuscation: %s)' % obfuscation
1895   - for description, keywords in SUSPICIOUS_KEYWORDS.items():
1896   - for keyword in keywords:
1897   - # search using regex to detect word boundaries:
1898   - match = re.search(r'(?i)\b' + re.escape(keyword) + r'\b', vba_code)
1899   - if match:
1900   - #if keyword.lower() in vba_code:
1901   - found_keyword = match.group()
1902   - results.append((found_keyword, description + obf_text))
1903   - return results
1904   -
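Unlike detect_autoexec, detect_suspicious passes each keyword through `re.escape`, so characters such as `.` are matched literally. The hypothetical helper `whole_word` below, which is not an olevba function, compares both behaviours. It also shows a general regex fact worth keeping in mind for such tables: if a keyword ends with a non-word character like `$`, the trailing `\b` only matches when a word character follows it.

```python
import re

def whole_word(keyword, text, escape=True):
    # Same search shape as detect_suspicious, with escaping made optional.
    pattern = re.escape(keyword) if escape else keyword
    return re.search(r'(?i)\b' + pattern + r'\b', text) is not None

text = 'Set o = CreateObject("WScriptXShell")'
print(whole_word('WScript.Shell', text, escape=False))  # '.' matches 'X'
print(whole_word('WScript.Shell', text))                # literal '.' required
print(whole_word('Chr$', 'x = Chr$(65)'))               # no \b between '$' and '('
```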
1905   -
1906   -def detect_patterns(vba_code, obfuscation=None):
1907   - """
1908   - Detect if the VBA code contains specific patterns such as IP addresses,
1909   - URLs, e-mail addresses, executable file names, etc.
1910   -
1911   - :param vba_code: str, VBA source code
1912   - :return: list of str tuples (pattern type, value)
1913   - """
1914   - results = []
1915   - found = set()
1916   - obf_text = ''
1917   - if obfuscation:
1918   - obf_text = ' (obfuscation: %s)' % obfuscation
1919   - for pattern_type, pattern_re in RE_PATTERNS:
1920   - for match in pattern_re.finditer(vba_code):
1921   - value = match.group()
1922   - if value not in found:
1923   - results.append((pattern_type + obf_text, value))
1924   - found.add(value)
1925   - return results
1926   -
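detect_patterns reports each matched value only once, and the `found` set is shared across all pattern types. A value matched by two patterns is therefore reported only under the first one. Below is a minimal sketch with two deliberately simple, hypothetical patterns standing in for RE_PATTERNS, which is not shown in this diff.

```python
import re

# Hypothetical, simplified patterns; olevba's RE_PATTERNS differ.
RE_PATTERNS = (
    ('URL', re.compile(r'https?://[^\s"\']+', re.I)),
    ('Executable file name', re.compile(r'\b\w+\.exe\b', re.I)),
)

def detect_patterns(vba_code, obfuscation=None):
    results = []
    found = set()  # values already reported, across all pattern types
    obf_text = ' (obfuscation: %s)' % obfuscation if obfuscation else ''
    for pattern_type, pattern_re in RE_PATTERNS:
        for match in pattern_re.finditer(vba_code):
            value = match.group()
            if value not in found:
                results.append((pattern_type + obf_text, value))
                found.add(value)
    return results

code = 'u = "http://x.test/a.exe": u2 = "http://x.test/a.exe": f = "a.exe"'
print(detect_patterns(code))
```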
1927   -
1928   -def detect_hex_strings(vba_code):
1929   - """
1930   - Detect if the VBA code contains strings encoded in hexadecimal.
1931   -
1932   - :param vba_code: str, VBA source code
1933   - :return: list of str tuples (encoded string, decoded string)
1934   - """
1935   - results = []
1936   - found = set()
1937   - for match in re_hex_string.finditer(vba_code):
1938   - value = match.group()
1939   - if value not in found:
1940   - decoded = binascii.unhexlify(value)
1941   - results.append((value, decoded.decode('utf-8', 'backslashreplace')))
1942   - found.add(value)
1943   - return results
1944   -
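detect_hex_strings decodes each new hex run with `binascii.unhexlify` and turns the bytes into text using `backslashreplace`, so bytes that are not valid UTF-8 remain visible as `\xNN` escapes. The pattern `re_hex_string` is not shown in this diff. The one below is a hypothetical stand-in matching runs of at least four hex byte pairs. Because it matches whole pairs, `unhexlify` always receives an even number of digits.

```python
import binascii
import re

# Hypothetical pattern: 4 or more hex byte pairs (8+ hex digits).
re_hex_string = re.compile(r'(?:[0-9A-Fa-f]{2}){4,}')

def detect_hex_strings(vba_code):
    results = []
    found = set()
    for match in re_hex_string.finditer(vba_code):
        value = match.group()
        if value not in found:
            decoded = binascii.unhexlify(value)
            # backslashreplace keeps invalid UTF-8 bytes as \xNN escapes
            results.append((value, decoded.decode('utf-8', 'backslashreplace')))
            found.add(value)
    return results

print(detect_hex_strings('s = "48656C6C6F2E657865"'))
print(detect_hex_strings('t = "FF414243"'))
```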
1945   -
1946   -def detect_base64_strings(vba_code):
1947   - """
1948   - Detect if the VBA code contains strings encoded in base64.
1949   -
1950   - :param vba_code: str, VBA source code
1951   - :return: list of str tuples (encoded string, decoded string)
1952   - """
1953   - #TODO: avoid matching simple hex strings as base64?
1954   - results = []
1955   - found = set()
1956   - for match in re_base64_string.finditer(vba_code):
1957   - # extract the base64 string without quotes:
1958   - value = match.group().strip('"')
1959   - # check it is not just a hex string:
1960   - if not re_nothex_check.search(value):
1961   - continue
1962   - # only keep new values and not in the whitelist:
1963   - if value not in found and value.lower() not in BASE64_WHITELIST:
1964   - try:
1965   - decoded = base64.b64decode(value)
1966   - results.append((value, decoded.decode('utf-8','replace')))
1967   - found.add(value)
1968   - except (TypeError, ValueError) as exc:
1969   - log.debug('Failed to base64-decode (%s)' % exc)
1970   - # if an exception occurs, it is likely not a base64-encoded string
1971   - return results
1972   -
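detect_base64_strings skips literals made only of hex digits, since a quoted hex string such as `"DEADBEEF"` is also syntactically valid base64. It then decodes the remaining literals. The sketch below uses hypothetical stand-ins for `re_base64_string` and `re_nothex_check`, whose real definitions are not in this diff, and omits BASE64_WHITELIST. On Python 3, `base64.b64decode` raises `binascii.Error`, a subclass of `ValueError`, so the `except` clause mirrors the one above.

```python
import base64
import re

# Hypothetical patterns: a quoted, correctly padded base64 literal, and a
# check for at least one character that cannot appear in a hex string.
re_base64_string = re.compile(
    r'"(?:[A-Za-z0-9+/]{4})+(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"')
re_nothex_check = re.compile(r'[G-Zg-z=+/]')

def detect_base64_strings(vba_code):
    results = []
    found = set()
    for match in re_base64_string.finditer(vba_code):
        value = match.group().strip('"')
        # A literal made only of hex digits is left to detect_hex_strings.
        if not re_nothex_check.search(value):
            continue
        if value not in found:
            try:
                decoded = base64.b64decode(value)
                results.append((value, decoded.decode('utf-8', 'replace')))
                found.add(value)
            except (TypeError, ValueError):
                pass  # not valid base64 after all: ignore it
    return results

print(detect_base64_strings('a = "SGVsbG8gV29ybGQ=": b = "DEADBEEF"'))
```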
1973   -
1974   -def detect_dridex_strings(vba_code):
1975   - """
1976   - Detect if the VBA code contains strings obfuscated with a specific algorithm found in Dridex samples.
1977   -
1978   - :param vba_code: str, VBA source code
1979   - :return: list of str tuples (encoded string, decoded string)
1980   - """
1981   - # TODO: move this at the beginning of script
1982   - from oletools.thirdparty.DridexUrlDecoder.DridexUrlDecoder import DridexUrlDecode
1983   -
1984   - results = []
1985   - found = set()
1986   - for match in re_dridex_string.finditer(vba_code):
1987   - value = match.group()[1:-1]
1988   - # check it is not just a hex string:
1989   - if not re_nothex_check.search(value):
1990   - continue
1991   - if value not in found:
1992   - try:
1993   - decoded = DridexUrlDecode(value)
1994   - results.append((value, decoded))
1995   - found.add(value)
1996   - except Exception as exc:
1997   - log.debug('Failed to Dridex-decode (%s)' % exc)
1998   - # if an exception occurs, it is likely not a dridex-encoded string
1999   - return results
2000   -
2001   -
2002   -def detect_vba_strings(vba_code):
2003   - """
2004   - Detect if the VBA code contains strings obfuscated with VBA expressions
2005   - using keywords such as Chr, Asc, Val, StrReverse, etc.
2006   -
2007   - :param vba_code: str, VBA source code
2008   - :return: list of str tuples (encoded string, decoded string)
2009   - """
2010   - # TODO: handle exceptions
2011   - results = []
2012   - found = set()
2013   - # IMPORTANT: to extract the actual VBA expressions found in the code,
2014   - # we must expand tabs to have the same string as pyparsing.
2015   - # Otherwise, start and end offsets are incorrect.
2016   - vba_code = vba_code.expandtabs()
2017   - # Split the VBA code line by line to avoid MemoryError on large scripts:
2018   - for vba_line in vba_code.splitlines():
2019   - for tokens, start, end in vba_expr_str.scanString(vba_line):
2020   - encoded = vba_line[start:end]
2021   - decoded = tokens[0]
2022   - if isinstance(decoded, VbaExpressionString):
2023   - # This is a VBA expression, not a simple string
2024   - # print 'VBA EXPRESSION: encoded=%r => decoded=%r' % (encoded, decoded)
2025   - # remove parentheses and quotes from original string:
2026   - # if encoded.startswith('(') and encoded.endswith(')'):
2027   - # encoded = encoded[1:-1]
2028   - # if encoded.startswith('"') and encoded.endswith('"'):
2029   - # encoded = encoded[1:-1]
2030   - # avoid duplicates and simple strings:
2031   - if encoded not in found and decoded != encoded:
2032   - results.append((encoded, decoded))
2033   - found.add(encoded)
2034   - # else:
2035   - # print 'VBA STRING: encoded=%r => decoded=%r' % (encoded, decoded)
2036   - return results
2037   -
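detect_vba_strings relies on a pyparsing grammar (`vba_expr_str`) that is not shown in this diff. It calls `expandtabs` first because pyparsing's `scanString` expands tabs in its input by default, so the reported start and end offsets refer to the tab-expanded text. The sketch below is a much simpler regex-based stand-in, not the olevba grammar. It decodes only chains of `Chr(...)` calls joined with `&`. The goal is to show the (encoded, decoded) pairs the function produces. `detect_chr_strings` and both patterns are hypothetical.

```python
import re

# Hypothetical stand-in for vba_expr_str: only Chr/ChrB/ChrW (optionally
# with $) calls on integer literals, chained with '&'.
RE_CHR_CHAIN = re.compile(
    r'(?i)\bChr[BW]?\$?\(\s*\d+\s*\)(?:\s*&\s*Chr[BW]?\$?\(\s*\d+\s*\))*')
RE_CHR_ARG = re.compile(r'\(\s*(\d+)\s*\)')

def detect_chr_strings(vba_code):
    results = []
    found = set()
    # Line by line, as detect_vba_strings does to limit memory use.
    for line in vba_code.splitlines():
        for match in RE_CHR_CHAIN.finditer(line):
            encoded = match.group()
            decoded = ''.join(chr(int(n)) for n in RE_CHR_ARG.findall(encoded))
            if encoded not in found:
                results.append((encoded, decoded))
                found.add(encoded)
    return results

print(detect_chr_strings('x = Chr(72) & Chr(105)\ny = ChrW$(65)'))
```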
2038   -
2039   -def json2ascii(json_obj, encoding='utf8', errors='replace'):
2040   - """ ensure there is no unicode in json and all strings are safe to decode
2041   -
2042   - works recursively, decodes and re-encodes every string to/from unicode
2043   - to ensure there will be no trouble in loading the dumped json output
2044   - """
2045   - if json_obj is None:
2046   - pass
2047   - elif isinstance(json_obj, (bool, int, float)):
2048   - pass
2049   - elif isinstance(json_obj, str):
2050   - # de-code and re-encode
2051   - dencoded = json_obj
2052   - if dencoded != json_obj:
2053   - log.debug('json2ascii: replaced: {0} (len {1})'
2054   - .format(json_obj, len(json_obj)))
2055   - log.debug('json2ascii: with: {0} (len {1})'
2056   - .format(dencoded, len(dencoded)))
2057   - return dencoded
2058   - elif isinstance(json_obj, bytes):
2059   - log.debug('json2ascii: encode unicode: {0}'
2060   - .format(json_obj.decode(encoding, errors)))
2061   - # cannot put original into logger
2062   - # print 'original: ' json_obj
2063   - return json_obj.decode(encoding, errors)
2064   - elif isinstance(json_obj, dict):
2065   - for key in json_obj:
2066   - json_obj[key] = json2ascii(json_obj[key])
2067   - elif isinstance(json_obj, (list,tuple)):
2068   - for item in json_obj:
2069   - item = json2ascii(item)
2070   - else:
2071   - log.debug('unexpected type in json2ascii: {0} -- leave as is'
2072   - .format(type(json_obj)))
2073   - return json_obj
2074   -
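In json2ascii above, the list/tuple branch assigns the result to the loop variable `item`. That only rebinds the name, so `bytes` elements stored directly in a list are returned undecoded. Dicts nested in a list are still fixed, because the dict branch mutates them in place. Likewise, the `str` branch compares the string with itself, so its debug logging never fires. Below is a minimal sketch of a variant that rebuilds containers instead. `to_text` is a hypothetical name, not part of olevba.

```python
def to_text(obj, encoding='utf8', errors='replace'):
    """Recursively decode bytes to str inside dicts, lists and tuples."""
    if isinstance(obj, bytes):
        return obj.decode(encoding, errors)
    if isinstance(obj, dict):
        return {key: to_text(value, encoding, errors) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Build a new container; rebinding a loop variable would not
        # change the original list.
        return type(obj)(to_text(item, encoding, errors) for item in obj)
    return obj

items = [b'caf\xc3\xa9']
for item in items:
    item = item.decode('utf8')  # rebinds the name only
print(items)                    # [b'caf\xc3\xa9']: unchanged
print(to_text({'k': [b'caf\xc3\xa9', (b'\xff', 1)]}))
```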
2075   -
2076   -def print_json(json_dict=None, _json_is_first=False, _json_is_last=False,
2077   - **json_parts):
2078   - """ line-wise print of json.dumps(json2ascii(..)) with options and indent+1
2079   -
2080   - can use in two ways:
2081   - (1) print_json(some_dict)
2082   - (2) print_json(key1=value1, key2=value2, ...)
2083   -
2084   - :param bool _json_is_first: set to True only for very first entry to complete
2085   - the top-level json-list
2086   - :param bool _json_is_last: set to True only for very last entry to complete
2087   - the top-level json-list
2088   - """
2089   - if json_dict and json_parts:
2090   - raise ValueError('Invalid json argument: want either single dict or '
2091   - 'key=value parts but got both)')
2092   - elif (json_dict is not None) and (not isinstance(json_dict, dict)):
2093   - raise ValueError('Invalid json argument: want either single dict or '
2094   - 'key=value parts but got {0} instead of dict)'
2095   - .format(type(json_dict)))
2096   - if json_parts:
2097   - json_dict = json_parts
2098   -
2099   - if _json_is_first:
2100   - print('[')
2101   -
2102   - lines = json.dumps(json2ascii(json_dict), check_circular=False,
2103   - indent=4, ensure_ascii=False).splitlines()
2104   - for line in lines[:-1]:
2105   - print(' {0}'.format(line))
2106   - if _json_is_last:
2107   - print(' {0}'.format(lines[-1])) # print last line without comma
2108   - print(']')
2109   - else:
2110   - print(' {0},'.format(lines[-1])) # print last line with comma
2111   -
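print_json streams a top-level JSON list one element at a time. The first call opens the list with `[`. Every element except the last gets a trailing comma, and the last call closes the list with `]`. The concatenated output is therefore valid JSON even though no single call sees the whole list. Below is a minimal sketch of that pattern, using a hypothetical `print_json_item` without the json2ascii step. It captures the printed text and parses it back.

```python
import contextlib
import io
import json

def print_json_item(obj, is_first=False, is_last=False):
    """Print one element of a top-level JSON list, line by line."""
    if is_first:
        print('[')
    lines = json.dumps(obj, indent=4, ensure_ascii=False).splitlines()
    for line in lines[:-1]:
        print('    ' + line)
    # Only the last element is printed without a trailing comma.
    print('    ' + lines[-1] + ('' if is_last else ','))
    if is_last:
        print(']')

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    print_json_item({'a': 1}, is_first=True)
    print_json_item({'b': [2, 3]}, is_last=True)
out = buf.getvalue()
print(out)
```

The caller must know which element is last. A single-element list is printed with both flags set.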
2112   -
2113   -class VBA_Scanner(object):
2114   - """
2115   - Class to scan the source code of a VBA module to find obfuscated strings,
2116   - suspicious keywords, IOCs, auto-executable macros, etc.
2117   - """
2118   -
2119   - def __init__(self, vba_code):
2120   - """
2121   - VBA_Scanner constructor
2122   -
2123   - :param vba_code: str, VBA source code to be analyzed
2124   - """
2125   - if isinstance(vba_code, bytes):
2126   - vba_code = vba_code.decode('utf-8', 'backslashreplace')
2127   - # join long lines ending with " _":
2128   - self.code = vba_collapse_long_lines(vba_code)
2129   - self.code_hex = ''
2130   - self.code_hex_rev = ''
2131   - self.code_rev_hex = ''
2132   - self.code_base64 = ''
2133   - self.code_dridex = ''
2134   - self.code_vba = ''
2135   - self.strReverse = None
2136   - # results = None before scanning, then a list of tuples after scanning
2137   - self.results = None
2138   - self.autoexec_keywords = None
2139   - self.suspicious_keywords = None
2140   - self.iocs = None
2141   - self.hex_strings = None
2142   - self.base64_strings = None
2143   - self.dridex_strings = None
2144   - self.vba_strings = None
2145   -
2146   -
2147   - def scan(self, include_decoded_strings=False, deobfuscate=False):
2148   - """
2149   - Analyze the provided VBA code to detect suspicious keywords,
2150   - auto-executable macros, IOC patterns, obfuscation patterns
2151   - such as hex-encoded strings.
2152   -
2153   - :param include_decoded_strings: bool, if True, all encoded strings will be included with their decoded content.
2154   - :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
2155   - :return: list of tuples (type, keyword, description)
2156   - (type = 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String' or 'Dridex String')
2157   - """
2158   - # First, detect and extract hex-encoded strings:
2159   - self.hex_strings = detect_hex_strings(self.code)
2160   - # detect if the code contains StrReverse:
2161   - self.strReverse = False
2162   - if 'strreverse' in self.code.lower(): self.strReverse = True
2163   - # Then append the decoded strings to the VBA code, to detect obfuscated IOCs and keywords:
2164   - for encoded, decoded in self.hex_strings:
2165   - self.code_hex += '\n' + decoded
2166   - # if the code contains "StrReverse", also append the hex strings in reverse order:
2167   - if self.strReverse:
2168   - # StrReverse after hex decoding:
2169   - self.code_hex_rev += '\n' + decoded[::-1]
2170   - # StrReverse before hex decoding:
2171   - self.code_rev_hex += '\n' + str(binascii.unhexlify(encoded[::-1]))
2172   - #example: https://malwr.com/analysis/NmFlMGI4YTY1YzYyNDkwNTg1ZTBiZmY5OGI3YjlhYzU/
2173   - #TODO: also append the full code reversed if StrReverse? (risk of false positives?)
2174   - # Detect Base64-encoded strings
2175   - self.base64_strings = detect_base64_strings(self.code)
2176   - for encoded, decoded in self.base64_strings:
2177   - self.code_base64 += '\n' + decoded
2178   - # Detect Dridex-encoded strings
2179   - self.dridex_strings = detect_dridex_strings(self.code)
2180   - for encoded, decoded in self.dridex_strings:
2181   - self.code_dridex += '\n' + decoded
2182   - # Detect obfuscated strings in VBA expressions
2183   - if deobfuscate:
2184   - self.vba_strings = detect_vba_strings(self.code)
2185   - else:
2186   - self.vba_strings = []
2187   - for encoded, decoded in self.vba_strings:
2188   - self.code_vba += '\n' + decoded
2189   - results = []
2190   - self.autoexec_keywords = []
2191   - self.suspicious_keywords = []
2192   - self.iocs = []
2193   -
2194   - for code, obfuscation in (
2195   - (self.code, None),
2196   - (self.code_hex, 'Hex'),
2197   - (self.code_hex_rev, 'Hex+StrReverse'),
2198   - (self.code_rev_hex, 'StrReverse+Hex'),
2199   - (self.code_base64, 'Base64'),
2200   - (self.code_dridex, 'Dridex'),
2201   - (self.code_vba, 'VBA expression'),
2202   - ):
2203   - if isinstance(code, bytes):
2204   - code = code.decode('utf-8', 'backslashreplace')
2205   - self.autoexec_keywords += detect_autoexec(code, obfuscation)
2206   - self.suspicious_keywords += detect_suspicious(code, obfuscation)
2207   - self.iocs += detect_patterns(code, obfuscation)
2208   -
2209   - # If hex-encoded strings were discovered, add an item to suspicious keywords:
2210   - if self.hex_strings:
2211   - self.suspicious_keywords.append(('Hex Strings',
2212   - 'Hex-encoded strings were detected, may be used to obfuscate strings (option --decode to see all)'))
2213   - if self.base64_strings:
2214   - self.suspicious_keywords.append(('Base64 Strings',
2215   - 'Base64-encoded strings were detected, may be used to obfuscate strings (option --decode to see all)'))
2216   - if self.dridex_strings:
2217   - self.suspicious_keywords.append(('Dridex Strings',
2218   - 'Dridex-encoded strings were detected, may be used to obfuscate strings (option --decode to see all)'))
2219   - if self.vba_strings:
2220   - self.suspicious_keywords.append(('VBA obfuscated Strings',
2221   - 'VBA string expressions were detected, may be used to obfuscate strings (option --decode to see all)'))
2222   - # use a set to avoid duplicate keywords
2223   - keyword_set = set()
2224   - for keyword, description in self.autoexec_keywords:
2225   - if keyword not in keyword_set:
2226   - results.append(('AutoExec', keyword, description))
2227   - keyword_set.add(keyword)
2228   - keyword_set = set()
2229   - for keyword, description in self.suspicious_keywords:
2230   - if keyword not in keyword_set:
2231   - results.append(('Suspicious', keyword, description))
2232   - keyword_set.add(keyword)
2233   - keyword_set = set()
2234   - for pattern_type, value in self.iocs:
2235   - if value not in keyword_set:
2236   - results.append(('IOC', value, pattern_type))
2237   - keyword_set.add(value)
2238   -
2239   - # include decoded strings only if they are printable or if --decode option:
2240   - for encoded, decoded in self.hex_strings:
2241   - if include_decoded_strings or is_printable(decoded):
2242   - results.append(('Hex String', decoded, encoded))
2243   - for encoded, decoded in self.base64_strings:
2244   - if include_decoded_strings or is_printable(decoded):
2245   - results.append(('Base64 String', decoded, encoded))
2246   - for encoded, decoded in self.dridex_strings:
2247   - if include_decoded_strings or is_printable(decoded):
2248   - results.append(('Dridex string', decoded, encoded))
2249   - for encoded, decoded in self.vba_strings:
2250   - if include_decoded_strings or is_printable(decoded):
2251   - results.append(('VBA string', decoded, encoded))
2252   - self.results = results
2253   - return results
2254   -
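When the VBA code contains `StrReverse`, `scan()` appends two extra variants of every hex string. `code_hex_rev` holds the decoded bytes reversed, and `code_rev_hex` holds the result of reversing the hex text *before* decoding it. The standalone sketch below reproduces that distinction with the standard `binascii` module. The helper name `hex_variants` is hypothetical and not part of olevba. It also returns bytes, whereas olevba appends text to its code buffers.

```python
import binascii


def hex_variants(encoded):
    """Return (decoded, hex_then_reverse, reverse_then_hex) for a hex string.

    Hypothetical helper illustrating the two StrReverse orders that
    VBA_Scanner.scan() considers; not part of olevba.
    """
    decoded = binascii.unhexlify(encoded)
    # StrReverse applied after hex decoding: reverse the decoded bytes
    hex_then_reverse = decoded[::-1]
    # StrReverse applied before hex decoding: reverse the hex digits themselves
    reverse_then_hex = binascii.unhexlify(encoded[::-1])
    return decoded, hex_then_reverse, reverse_then_hex


# "exe." hex-encoded; reversing after decoding yields ".exe"
print(hex_variants('6578652E')[1])  # b'.exe'
# "http" is "68747470" in hex; the reversed hex text "07474786" decodes back to it
print(hex_variants('07474786')[2])  # b'http'
```

Only one of the two orders usually yields something meaningful for a given sample. Scanning both variants lets the keyword and IOC detectors catch either obfuscation style.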
2255   - def scan_summary(self):
2256   - """
2257   - Analyze the provided VBA code to detect suspicious keywords,
2258   - auto-executable macros, IOC patterns, obfuscation patterns
2259   - such as hex-encoded strings.
2260   -
2261   - :return: tuple with the number of items found for each category:
2262   - (autoexec, suspicious, IOCs, hex, base64, dridex, vba)
2263   - """
2264   - # avoid scanning the same code twice:
2265   - if self.results is None:
2266   - self.scan()
2267   - return (len(self.autoexec_keywords), len(self.suspicious_keywords),
2268   - len(self.iocs), len(self.hex_strings), len(self.base64_strings),
2269   - len(self.dridex_strings), len(self.vba_strings))
2270   -
2271   -
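The loops at the end of `scan()` deduplicate results with a fresh `keyword_set` per category. A keyword found both in the plain code and in a decoded string is therefore reported once, in first-seen order. The same word can still appear under two different categories. A generic sketch of that pattern follows. The function name `dedupe_by_key` and the sample data are hypothetical.

```python
def dedupe_by_key(items, category):
    """Yield (category, key, info) once per distinct key, keeping first-seen order.

    Hypothetical helper mirroring the keyword_set loops in VBA_Scanner.scan().
    """
    seen = set()
    for key, info in items:
        if key not in seen:
            seen.add(key)
            yield (category, key, info)


# Sample (key, description) pairs, e.g. collected from several decoded variants
suspicious = [('Shell', 'example description'),
              ('Shell', 'example description'),
              ('Chr', 'another description')]
print(list(dedupe_by_key(suspicious, 'Suspicious')))
```

Note that the IOC loop in `scan()` stores its pairs as `(pattern_type, value)` and deduplicates on the value. With this helper you would swap each pair before passing it in.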
2272   -def scan_vba(vba_code, include_decoded_strings, deobfuscate=False):
2273   - """
2274   - Analyze the provided VBA code to detect suspicious keywords,
2275   - auto-executable macros, IOC patterns, obfuscation patterns
2276   - such as hex-encoded strings.
2277   - (shortcut for VBA_Scanner(vba_code).scan())
2278   -
2279   - :param vba_code: str, VBA source code to be analyzed
2280   - :param include_decoded_strings: bool, if True all encoded strings will be included with their decoded content.
2281   - :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
2282   - :return: list of tuples (type, keyword, description)
2283   - (type = 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String', 'Dridex string' or 'VBA string')
2284   - """
2285   - return VBA_Scanner(vba_code).scan(include_decoded_strings, deobfuscate)
2286   -
2287   -
2288   -#=== CLASSES =================================================================
2289   -
2290   -class VBA_Parser(object):
2291   - """
2292   - Class to parse MS Office files, to detect VBA macros and extract VBA source code
2293   - Supported file formats:
2294   - - Word 97-2003 (.doc, .dot)
2295   - - Word 2007+ (.docm, .dotm)
2296   - - Word 2003 XML (.xml)
2297   - - Word MHT - Single File Web Page / MHTML (.mht)
2298   - - Excel 97-2003 (.xls)
2299   - - Excel 2007+ (.xlsm, .xlsb)
2300   - - PowerPoint 97-2003 (.ppt)
2301   - - PowerPoint 2007+ (.pptm, .ppsm)
2302   - """
2303   -
2304   - def __init__(self, filename, data=None, container=None, relaxed=False):
2305   - """
2306   - Constructor for VBA_Parser
2307   -
2308   - :param filename: filename or path of file to parse, or file-like object
2309   -
2310   - :param data: None or bytes str, if None the file will be read from disk (or from the file-like object).
2311   - If data is provided as a bytes string, it will be parsed as the content of the file in memory,
2312   - and not read from disk. Note: files must be read in binary mode, i.e. open(f, 'rb').
2313   -
2314   - :param container: str, path and filename of container if the file is within
2315   - a zip archive, None otherwise.
2316   -
2317   - :param relaxed: if True, treat malformed documents and missing streams more like MS Office:
2318   - do nothing; if False (default), raise errors in these cases
2319   -
2320   - raises a FileOpenError if all attempts to interpret the data header fail, or a FileIsEncryptedError if the OLE file is encrypted
2321   - """
2322   - #TODO: filename should only be a string, data should be used for the file-like object
2323   - #TODO: filename should be mandatory, optional data is a string or file-like object
2324   - #TODO: also support olefile and zipfile as input
2325   - if data is None:
2326   - # open file from disk:
2327   - _file = filename
2328   - else:
2329   - # file already read in memory, make it a file-like object for zipfile:
2330   - _file = BytesIO(data)
2331   - #self.file = _file
2332   - self.ole_file = None
2333   - self.ole_subfiles = []
2334   - self.filename = filename
2335   - self.container = container
2336   - self.relaxed = relaxed
2337   - self.type = None
2338   - self.vba_projects = None
2339   - self.vba_forms = None
2340   - self.contains_macros = None # will be set to True or False by detect_macros
2341   - self.vba_code_all_modules = None # to store the source code of all modules
2342   - # list of tuples for each module: (subfilename, stream_path, vba_filename, vba_code)
2343   - self.modules = None
2344   - # Analysis results: list of tuples (type, keyword, description) - See VBA_Scanner
2345   - self.analysis_results = None
2346   - # statistics for the scan summary and flags
2347   - self.nb_macros = 0
2348   - self.nb_autoexec = 0
2349   - self.nb_suspicious = 0
2350   - self.nb_iocs = 0
2351   - self.nb_hexstrings = 0
2352   - self.nb_base64strings = 0
2353   - self.nb_dridexstrings = 0
2354   - self.nb_vbastrings = 0
2355   -
2356   - # if filename is None:
2357   - # if isinstance(_file, basestring):
2358   - # if len(_file) < olefile.MINIMAL_OLEFILE_SIZE:
2359   - # self.filename = _file
2360   - # else:
2361   - # self.filename = '<file in bytes string>'
2362   - # else:
2363   - # self.filename = '<file-like object>'
2364   - if olefile.isOleFile(_file):
2365   - # This looks like an OLE file
2366   - self.open_ole(_file)
2367   -
2368   - # check whether file is encrypted (need to do this before trying PPT)
2369   - log.debug('Check encryption of ole file')
2370   - crypt_indicator = oleid.OleID(self.ole_file).check_encrypted()
2371   - if crypt_indicator.value:
2372   - raise FileIsEncryptedError(filename)
2373   -
2374   - # if this worked, try whether it is a ppt file (special ole file)
2375   - self.open_ppt()
2376   - if self.type is None and is_zipfile(_file):
2377   - # Zip file, which may be an OpenXML document
2378   - self.open_openxml(_file)
2379   - if self.type is None:
2380   - # read file from disk, check if it is a Word 2003 XML file (WordProcessingML), Excel 2003 XML,
2381   - # or a plain text file containing VBA code
2382   - if data is None:
2383   - with open(filename, 'rb') as file_handle:
2384   - data = file_handle.read()
2385   - # check if it is a Word 2003 XML file (WordProcessingML): must contain the namespace
2386   - if b'http://schemas.microsoft.com/office/word/2003/wordml' in data:
2387   - self.open_word2003xml(data)
2388   - # check if it is a Word/PowerPoint 2007+ XML file (Flat OPC): must contain the namespace
2389   - if b'http://schemas.microsoft.com/office/2006/xmlPackage' in data:
2390   - self.open_flatopc(data)
2391   - # store a lowercase version for the next tests:
2392   - data_lowercase = data.lower()
2393   - # check if it is a MHT file (MIME HTML, Word or Excel saved as "Single File Web Page"):
2394   - # According to my tests, these files usually start with "MIME-Version: 1.0" on the 1st line
2395   - # BUT Word accepts a blank line or other MIME headers inserted before,
2396   - # and even whitespaces in between "MIME", "-", "Version" and ":". The version number is ignored.
2397   - # And the line is case insensitive.
2398   - # so we'll just check the presence of mime, version and multipart anywhere:
2399   - if self.type is None and b'mime' in data_lowercase and b'version' in data_lowercase \
2400   - and b'multipart' in data_lowercase:
2401   - self.open_mht(data)
2402   - #TODO: handle exceptions
2403   - #TODO: Excel 2003 XML
2404   - # Check whether this is RTF
2405   - if rtfobj.is_rtf(data, treat_str_as_data=True):
2406   - # Ignore RTF since it contains no macros, and the methods here will not find macros
2407   - # in embedded objects. Run rtfobj and repeat on its output.
2408   - msg = '%s is RTF, need to run rtfobj.py and find VBA Macros in its output.' % self.filename
2409   - log.info(msg)
2410   - raise FileOpenError(msg)
2411   - # Check if this is a plain text VBA or VBScript file:
2412   - # To avoid scanning binary files, we simply check for NUL bytes:
2413   - if self.type is None and b'\x00' not in data:
2414   - self.open_text(data)
2415   - if self.type is None:
2416   - # At this stage, could not match a known format:
2417   - msg = '%s is not a supported file type, cannot extract VBA Macros.' % self.filename
2418   - log.info(msg)
2419   - raise FileOpenError(msg)
2420   -
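The constructor above tries the formats in a fixed order: OLE, then ZIP/OpenXML, then Word 2003 XML and Flat OPC namespaces, then MHT keywords, then RTF (rejected), and finally plain text when no NUL byte is present. The sketch below condenses that order into a signature-only classifier using the standard library. It is a hypothetical simplification. The real constructor also parses each candidate and only sets `self.type` when parsing succeeds. The `{\rt` prefix test is an assumption about what `rtfobj.is_rtf` checks.

```python
import io
import zipfile

# OLE2 / Compound File Binary signature (same value as olefile.MAGIC)
OLE_MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'
NS_WORD2003 = b'http://schemas.microsoft.com/office/word/2003/wordml'
NS_FLATOPC = b'http://schemas.microsoft.com/office/2006/xmlPackage'
# Assumption: simplified RTF detection on the "{\rt" prefix
RTF_PREFIX = b'{\\rt'


def guess_container(data):
    """Classify raw bytes roughly in the order VBA_Parser.__init__ checks them.

    Hypothetical, simplified sketch: looks only at signatures and keywords.
    """
    if data.startswith(OLE_MAGIC):
        return 'ole'
    if zipfile.is_zipfile(io.BytesIO(data)):
        return 'openxml'
    if NS_WORD2003 in data:
        return 'word2003xml'
    if NS_FLATOPC in data:
        return 'flatopc'
    lowered = data.lower()
    if b'mime' in lowered and b'version' in lowered and b'multipart' in lowered:
        return 'mht'
    if data.startswith(RTF_PREFIX):
        return 'rtf'  # olevba raises FileOpenError and suggests rtfobj instead
    if b'\x00' not in data:
        return 'text'
    return 'unsupported'


print(guess_container(b'Sub AutoOpen()\r\nEnd Sub\r\n'))  # text
```

The final NUL-byte test is deliberately cheap. Almost any binary format contains NUL bytes, while VBA or VBScript source normally does not.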
2421   - def open_ole(self, _file):
2422   - """
2423   - Open an OLE file
2424   - :param _file: filename or file contents in a file object
2425   - :return: nothing
2426   - """
2427   - log.info('Opening OLE file %s' % self.filename)
2428   - try:
2429   - # Open and parse the OLE file, using unicode for path names:
2430   - self.ole_file = olefile.OleFileIO(_file, path_encoding=None)
2431   - # set type only if parsing succeeds
2432   - self.type = TYPE_OLE
2433   - except (IOError, TypeError, ValueError) as exc:
2434   - # TODO: handle OLE parsing exceptions
2435   - log.info('Failed OLE parsing for file %r (%s)' % (self.filename, exc))
2436   - log.debug('Trace:', exc_info=True)
2437   -
2438   -
2439   - def open_openxml(self, _file):
2440   - """
2441   - Open an OpenXML file
2442   - :param _file: filename or file contents in a file object
2443   - :return: nothing
2444   - """
2445   - # This looks like a zip file, need to look for vbaProject.bin inside
2446   - # It can be any OLE file inside the archive
2447   - #...because vbaProject.bin can be renamed:
2448   - # see http://www.decalage.info/files/JCV07_Lagadec_OpenDocument_OpenXML_v4_decalage.pdf#page=18
2449   - log.info('Opening ZIP/OpenXML file %s' % self.filename)
2450   - try:
2451   - z = zipfile.ZipFile(_file)
2452   - #TODO: check if this is actually an OpenXML file
2453   - #TODO: if the zip file is encrypted, suggest to use the -z option, or try '-z infected' automatically
2454   - # check each file within the zip if it is an OLE file, by reading its magic:
2455   - for subfile in z.namelist():
2456   - with z.open(subfile) as file_handle:
2457   - magic = file_handle.read(len(olefile.MAGIC))
2458   - if magic == olefile.MAGIC:
2459   - log.debug('Opening OLE file %s within zip' % subfile)
2460   - with z.open(subfile) as file_handle:
2461   - ole_data = file_handle.read()
2462   - try:
2463   - self.ole_subfiles.append(
2464   - VBA_Parser(filename=subfile, data=ole_data,
2465   - relaxed=self.relaxed))
2466   - except OlevbaBaseException as exc:
2467   - if self.relaxed:
2468   - log.info('%s is not a valid OLE file (%s)' % (subfile, exc))
2469   - log.debug('Trace:', exc_info=True)
2470   - continue
2471   - else:
2472   - raise SubstreamOpenError(self.filename, subfile,
2473   - exc)
2474   - z.close()
2475   - # set type only if parsing succeeds
2476   - self.type = TYPE_OpenXML
2477   - except OlevbaBaseException as exc:
2478   - if self.relaxed:
2479   - log.info('Error {0} caught in Zip/OpenXML parsing for file {1}'
2480   - .format(exc, self.filename))
2481   - log.debug('Trace:', exc_info=True)
2482   - else:
2483   - raise
2484   - except (RuntimeError, zipfile.BadZipfile, zipfile.LargeZipFile, IOError) as exc:
2485   - # TODO: handle parsing exceptions
2486   - log.info('Failed Zip/OpenXML parsing for file %r (%s)'
2487   - % (self.filename, exc))
2488   - log.debug('Trace:', exc_info=True)
2489   -
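Because `vbaProject.bin` can be renamed, `open_openxml` identifies candidate members by content (the OLE magic) rather than by path. Here is a minimal standard-library sketch of that check, run on an in-memory archive. The function name `ole_members` and the sample member names are hypothetical.

```python
import io
import zipfile

# OLE2 / Compound File Binary signature (same value as olefile.MAGIC)
OLE_MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'


def ole_members(zip_bytes):
    """Return names of zip members whose content starts with the OLE magic."""
    found = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        for name in z.namelist():
            # read only the first bytes of each member, like open_openxml does
            with z.open(name) as member:
                if member.read(len(OLE_MAGIC)) == OLE_MAGIC:
                    found.append(name)
    return found


# Build a small archive with a fake OLE member stored under a renamed path
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.writestr('[Content_Types].xml', '<Types/>')
    z.writestr('word/renamed.bin', OLE_MAGIC + b'\x00' * 504)
print(ole_members(buf.getvalue()))  # ['word/renamed.bin']
```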
2490   - def open_word2003xml(self, data):
2491   - """
2492   - Open a Word 2003 XML file
2493   - :param data: file contents in a string or bytes
2494   - :return: nothing
2495   - """
2496   - log.info('Opening Word 2003 XML file %s' % self.filename)
2497   - try:
2498   - # parse the XML content
2499   - # TODO: handle XML parsing exceptions
2500   - et = ET.fromstring(data)
2501   - # find all the binData elements:
2502   - for bindata in et.iter(TAG_BINDATA):
2503   - # the binData content is an OLE container for the VBA project, compressed
2504   - # using the ActiveMime/MSO format (zlib-compressed), and Base64 encoded.
2505   - # get the filename:
2506   - fname = bindata.get(ATTR_NAME, 'noname.mso')
2507   - # decode the base64 activemime
2508   - mso_data = binascii.a2b_base64(bindata.text)
2509   - if is_mso_file(mso_data):
2510   - # decompress the zlib data stored in the MSO file, which is the OLE container:
2511   - # TODO: handle different offsets => separate function
2512   - try:
2513   - ole_data = mso_file_extract(mso_data)
2514   - self.ole_subfiles.append(
2515   - VBA_Parser(filename=fname, data=ole_data,
2516   - relaxed=self.relaxed))
2517   - except OlevbaBaseException as exc:
2518   - if self.relaxed:
2519   - log.info('Error parsing subfile {0}: {1}'
2520   - .format(fname, exc))
2521   - log.debug('Trace:', exc_info=True)
2522   - else:
2523   - raise SubstreamOpenError(self.filename, fname, exc)
2524   - else:
2525   - log.info('%s is not a valid MSO file' % fname)
2526   - # set type only if parsing succeeds
2527   - self.type = TYPE_Word2003_XML
2528   - except OlevbaBaseException as exc:
2529   - if self.relaxed:
2530   - log.info('Failed XML parsing for file %r (%s)' % (self.filename, exc))
2531   - log.debug('Trace:', exc_info=True)
2532   - else:
2533   - raise
2534   - except Exception as exc:
2535   - # TODO: differentiate exceptions for each parsing stage
2536   - # (but ET may come from different libraries, with no well-documented exceptions in the API)
2537   - # found: XMLSyntaxError
2538   - log.info('Failed XML parsing for file %r (%s)' % (self.filename, exc))
2539   - log.debug('Trace:', exc_info=True)
2540   -
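In a Word 2003 XML file, each `binData` element holds a Base64-encoded ActiveMime/MSO blob. `open_word2003xml` decodes that blob before decompressing it. The sketch below covers only the XML and Base64 steps, using `xml.etree.ElementTree`. It redefines `TAG_BINDATA` and `ATTR_NAME` on the assumption that they denote `w:binData` and `w:name` in the WordprocessingML 2003 namespace. The MSO/zlib decompression done by `mso_file_extract` is omitted, and `iter_bindata` is a hypothetical name.

```python
import binascii
import xml.etree.ElementTree as ET

# Assumed equivalents of olevba's constants for the Word 2003 XML namespace
NS_W = 'http://schemas.microsoft.com/office/word/2003/wordml'
TAG_BINDATA = '{%s}binData' % NS_W
ATTR_NAME = '{%s}name' % NS_W


def iter_bindata(xml_bytes):
    """Yield (name, decoded_bytes) for every w:binData element."""
    root = ET.fromstring(xml_bytes)
    for bindata in root.iter(TAG_BINDATA):
        name = bindata.get(ATTR_NAME, 'noname.mso')
        # a2b_base64 skips the line breaks that usually wrap the Base64 text
        yield name, binascii.a2b_base64(bindata.text)


sample = (b'<w:wordDocument xmlns:w="' + NS_W.encode() + b'">'
          b'<w:binData w:name="sample.mso">QWN0aXZl\nTWltZQ==</w:binData>'
          b'</w:wordDocument>')
print(list(iter_bindata(sample)))  # [('sample.mso', b'ActiveMime')]
```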
2541   - def open_flatopc(self, data):
2542   - """
2543   - Open a Word or PowerPoint 2007+ XML file, aka "Flat OPC"
2544   - :param data: file contents in a string or bytes
2545   - :return: nothing
2546   - """
2547   - log.info('Opening Flat OPC Word/PowerPoint XML file %s' % self.filename)
2548   - try:
2549   - # parse the XML content
2550   - # TODO: handle XML parsing exceptions
2551   - et = ET.fromstring(data)
2552   - # TODO: check root node namespace and tag
2553   - # find all the pkg:part elements:
2554   - for pkgpart in et.iter(TAG_PKGPART):
2555   - fname = pkgpart.get(ATTR_PKG_NAME, 'unknown')
2556   - content_type = pkgpart.get(ATTR_PKG_CONTENTTYPE, 'unknown')
2557   - if content_type == CTYPE_VBAPROJECT:
2558   - for bindata in pkgpart.iterfind(TAG_PKGBINDATA):
2559   - try:
2560   - ole_data = binascii.a2b_base64(bindata.text)
2561   - self.ole_subfiles.append(
2562   - VBA_Parser(filename=fname, data=ole_data,
2563   - relaxed=self.relaxed))
2564   - except OlevbaBaseException as exc:
2565   - if self.relaxed:
2566   - log.info('Error parsing subfile {0}: {1}'
2567   - .format(fname, exc))
2568   - log.debug('Trace:', exc_info=True)
2569   - else:
2570   - raise SubstreamOpenError(self.filename, fname, exc)
2571   - # set type only if parsing succeeds
2572   - self.type = TYPE_FlatOPC_XML
2573   - except OlevbaBaseException as exc:
2574   - if self.relaxed:
2575   - log.info('Failed XML parsing for file %r (%s)' % (self.filename, exc))
2576   - log.debug('Trace:', exc_info=True)
2577   - else:
2578   - raise
2579   - except Exception as exc:
2580   - # TODO: differentiate exceptions for each parsing stage
2581   - # (but ET may come from different libraries, with no well-documented exceptions in the API)
2582   - # found: XMLSyntaxError
2583   - log.info('Failed XML parsing for file %r (%s)' % (self.filename, exc))
2584   - log.debug('Trace:', exc_info=True)
2585   -
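A Flat OPC file stores every package part as a `pkg:part` element, and the VBA project is the part whose content type marks it as a vbaProject. `open_flatopc` filters on that content type and Base64-decodes the `pkg:binaryData` child. The sketch below mirrors that logic with the standard library. The constant names, and the assumed values `application/vnd.ms-office.vbaProject` and `pkg:binaryData`, stand in for olevba's `CTYPE_VBAPROJECT` and `TAG_PKGBINDATA`, so treat them as assumptions. `iter_vba_parts` is hypothetical.

```python
import binascii
import xml.etree.ElementTree as ET

# Assumed equivalents of olevba's Flat OPC constants
NS_PKG = 'http://schemas.microsoft.com/office/2006/xmlPackage'
TAG_PART = '{%s}part' % NS_PKG
TAG_BINARY = '{%s}binaryData' % NS_PKG
ATTR_NAME = '{%s}name' % NS_PKG
ATTR_CTYPE = '{%s}contentType' % NS_PKG
CTYPE_VBAPROJECT = 'application/vnd.ms-office.vbaProject'


def iter_vba_parts(xml_bytes):
    """Yield (part_name, ole_bytes) for each VBA project part."""
    root = ET.fromstring(xml_bytes)
    for part in root.iter(TAG_PART):
        if part.get(ATTR_CTYPE) == CTYPE_VBAPROJECT:
            for binary in part.iterfind(TAG_BINARY):
                yield part.get(ATTR_NAME, 'unknown'), binascii.a2b_base64(binary.text)


sample = (b'<pkg:package xmlns:pkg="' + NS_PKG.encode() + b'">'
          b'<pkg:part pkg:name="/word/document.xml" pkg:contentType="application/xml">'
          b'<pkg:xmlData/></pkg:part>'
          b'<pkg:part pkg:name="/word/vbaProject.bin" pkg:contentType="' + CTYPE_VBAPROJECT.encode() + b'">'
          b'<pkg:binaryData>0M8R4KGxGuE=</pkg:binaryData></pkg:part>'
          b'</pkg:package>')
# The decoded bytes are the 8-byte OLE signature used in the sample
print(list(iter_vba_parts(sample)))
```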
2586   - def open_mht(self, data):
2587   - """
2588   - Open a MHTML file
2589   - :param data: file contents in a string or bytes
2590   - :return: nothing
2591   - """
2592   - log.info('Opening MHTML file %s' % self.filename)
2593   - try:
2594   - if isinstance(data,bytes):
2595   - data = data.decode('utf8', 'backslashreplace')
2596   - # parse the MIME content
2597   - # remove any leading whitespace or newline (workaround for issue in email package)
2598   - stripped_data = data.lstrip('\r\n\t ')
2599   - # strip any junk from the beginning of the file
2600   - # (issue #31 fix by Greg C - gdigreg)
2601   - # TODO: improve keywords to avoid false positives
2602   - mime_offset = stripped_data.find('MIME')
2603   - content_offset = stripped_data.find('Content')
2604   - # if "MIME" is found, and located before "Content":
2605   - if -1 < mime_offset <= content_offset:
2606   - stripped_data = stripped_data[mime_offset:]
2607   - # else if "Content" is found, and before "MIME"
2608   - # TODO: can it work without "MIME" at all?
2609   - elif content_offset > -1:
2610   - stripped_data = stripped_data[content_offset:]
2611   - # TODO: quick and dirty fix: insert a standard line with MIME-Version header?
2612   - mhtml = email.message_from_string(stripped_data)
2613   - # find all the attached files:
2614   - for part in mhtml.walk():
2615   - content_type = part.get_content_type() # always returns a value
2616   - fname = part.get_filename(None) # returns None if it fails
2617   - # TODO: get content-location if no filename
2618   - log.debug('MHTML part: filename=%r, content-type=%r' % (fname, content_type))
2619   - part_data = part.get_payload(decode=True)
2620   - # VBA macros are stored in a binary file named "editdata.mso".
2621   - # the data content is an OLE container for the VBA project, compressed
2622   - # using the ActiveMime/MSO format (zlib-compressed), and Base64 encoded.
2623   - # decompress the zlib data starting at offset 0x32, which is the OLE container:
2624   - # check ActiveMime header:
2625   -
2626   - if isinstance(part_data, (str, bytes)) and is_mso_file(part_data):
2627   - log.debug('Found ActiveMime header, decompressing MSO container')
2628   - try:
2629   - ole_data = mso_file_extract(part_data)
2630   -
2631   - # TODO: check if it is actually an OLE file
2632   - # TODO: get the MSO filename from content_location?
2633   - self.ole_subfiles.append(
2634   - VBA_Parser(filename=fname, data=ole_data,
2635   - relaxed=self.relaxed))
2636   - except OlevbaBaseException as exc:
2637   - if self.relaxed:
2638   - log.info('%s does not contain a valid OLE file (%s)'
2639   - % (fname, exc))
2640   - log.debug('Trace:', exc_info=True)
2641   - # TODO: bug here - need to split in smaller functions/classes?
2642   - else:
2643   - raise SubstreamOpenError(self.filename, fname, exc)
2644   - else:
2645   - log.debug('type(part_data) = %s' % type(part_data))
2646   - try:
2647   - log.debug('part_data[0:20] = %r' % part_data[0:20])
2648   - except TypeError as err:
2649   - log.debug('part_data has no __getitem__')
2650   - # set type only if parsing succeeds
2651   - self.type = TYPE_MHTML
2652   - except OlevbaBaseException:
2653   - raise
2654   - except Exception:
2655   - log.info('Failed MIME parsing for file %r - %s'
2656   - % (self.filename, MSG_OLEVBA_ISSUES))
2657   - log.debug('Trace:', exc_info=True)
2658   -
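`open_mht` trims leading junk so that `email.message_from_string` sees the MIME headers first. It cuts at "MIME" when that keyword appears before "Content", and otherwise at "Content". The sketch below isolates that offset logic and then walks the parsed parts. `parse_mht` is a hypothetical name, and the sample message is made up. Its single part decodes to the `ActiveMime` header that real MSO payloads start with.

```python
import email


def parse_mht(text):
    """Strip leading junk before the MIME headers, then parse the message.

    Hypothetical sketch of the offset logic in VBA_Parser.open_mht.
    """
    stripped = text.lstrip('\r\n\t ')
    mime_offset = stripped.find('MIME')
    content_offset = stripped.find('Content')
    if -1 < mime_offset <= content_offset:
        stripped = stripped[mime_offset:]
    elif content_offset > -1:
        stripped = stripped[content_offset:]
    return email.message_from_string(stripped)


sample = ('junk before headers\r\n'
          'MIME-Version: 1.0\r\n'
          'Content-Type: multipart/related; boundary="XYZ"\r\n'
          '\r\n'
          '--XYZ\r\n'
          'Content-Type: application/x-mso\r\n'
          'Content-Transfer-Encoding: base64\r\n'
          '\r\n'
          'QWN0aXZlTWltZQ==\r\n'
          '--XYZ--\r\n')
msg = parse_mht(sample)
for part in msg.walk():
    if not part.is_multipart():
        print(part.get_content_type(), part.get_payload(decode=True))
```

Without the trimming, the first junk line would not parse as a header. The message would then likely fall back to a single `text/plain` body, and the attached MSO part would be missed.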
2659   - def open_ppt(self):
2660   - """ try to interpret self.ole_file as PowerPoint 97-2003 using PptParser
2661   -
2662   - Although self.ole_file is a valid olefile.OleFileIO, we set
2663   - self.ole_file = None in here and instead set self.ole_subfiles to the
2664   - VBA ole streams found within the main ole file. That makes most of the
2665   - code below treat this like an OpenXML file and only look at the
2666   - ole_subfiles (except find_vba_* which needs to explicitly check for
2667   - self.type)
2668   - """
2669   -
2670   - log.info('Check whether OLE file is PPT')
2671   - try:
2672   - ppt = ppt_parser.PptParser(self.ole_file, fast_fail=True)
2673   - for vba_data in ppt.iter_vba_data():
2674   - self.ole_subfiles.append(VBA_Parser(None, vba_data,
2675   - container='PptParser'))
2676   - log.info('File is PPT')
2677   - self.ole_file.close() # just in case
2678   - self.ole_file = None # required to make other methods look at ole_subfiles
2679   - self.type = TYPE_PPT
2680   - except Exception as exc:
2681   - if self.container == 'PptParser':
2682   - # this is a subfile of a PPT, so it is expected not to be a PPT itself
2683   - log.debug('PPT subfile is not a PPT file')
2684   - else:
2685   - log.debug("File appears not to be a ppt file (%s)" % exc)
2686   -
2687   -
2688   - def open_text(self, data):
2689   - """
2690   - Open a text file containing VBA or VBScript source code
2691   - :param data: file contents in a string or bytes
2692   - :return: nothing
2693   - """
2694   - log.info('Opening text file %s' % self.filename)
2695   - # directly store the source code:
2696   - if isinstance(data, bytes):
2697   - data = data.decode('utf8', 'backslashreplace')
2698   - self.vba_code_all_modules = data
2699   - self.contains_macros = True
2700   - # set type only if parsing succeeds
2701   - self.type = TYPE_TEXT
2702   -
2703   -
2704   - def find_vba_projects(self):
2705   - """
2706   - Finds all the VBA projects stored in an OLE file.
2707   -
2708   - Return None if the file is not OLE but OpenXML.
2709   - Return a list of tuples (vba_root, project_path, dir_path) for each VBA project.
2710   - vba_root is the path of the root OLE storage containing the VBA project,
2711   - including a trailing slash unless it is the root of the OLE file.
2712   - project_path is the path of the OLE stream named "PROJECT" within the VBA project.
2713   - dir_path is the path of the OLE stream named "VBA/dir" within the VBA project.
2714   -
2715   - If this function returns an empty list for one of the supported formats
2716   - (i.e. Word, Excel, PowerPoint), then the file does not contain VBA macros.
2717   -
2718   - :return: None if OpenXML file, list of tuples (vba_root, project_path, dir_path)
2719   - for each VBA project found if OLE file
2720   - """
2721   - log.debug('VBA_Parser.find_vba_projects')
2722   -
2723   - # if the file is not OLE but OpenXML, return None:
2724   - if self.ole_file is None and self.type != TYPE_PPT:
2725   - return None
2726   -
2727   - # if this method has already been called, return previous result:
2728   - if self.vba_projects is not None:
2729   - return self.vba_projects
2730   -
2731   - # if this is a ppt file (PowerPoint 97-2003):
2732   - # self.ole_file is None but the ole_subfiles do contain vba_projects
2733   - # (like for OpenXML files).
2734   - if self.type == TYPE_PPT:
2735   - # TODO: so far, this function is never called for PPT files, but
2736   - # if that happens, the information about which OLE file contains
2737   - # which storage is lost!
2738   - log.warning('Returned info is not complete for PPT types!')
2739   - self.vba_projects = []
2740   - for subfile in self.ole_subfiles:
2741   - self.vba_projects.extend(subfile.find_vba_projects())
2742   - return self.vba_projects
2743   -
2744   - # Find the VBA project root (different in MS Word, Excel, etc):
2745   - # - Word 97-2003: Macros
2746   - # - Excel 97-2003: _VBA_PROJECT_CUR
2747   - # - PowerPoint 97-2003: PptParser has identified ole_subfiles
2748   - # - Word 2007+: word/vbaProject.bin in zip archive, then the VBA project is the root of vbaProject.bin.
2749   - # - Excel 2007+: xl/vbaProject.bin in zip archive, then same as Word
2750   - # - PowerPoint 2007+: ppt/vbaProject.bin in zip archive, then same as Word
2751   - # - Visio 2007: not supported yet (different file structure)
2752   -
2753   - # According to MS-OVBA section 2.2.1:
2754   - # - the VBA project root storage MUST contain a VBA storage and a PROJECT stream
2755   - # - The root/VBA storage MUST contain a _VBA_PROJECT stream and a dir stream
2756   - # - all names are case-insensitive
2757   -
2758   - def check_vba_stream(ole, vba_root, stream_path):
2759   - full_path = vba_root + stream_path
2760   - if ole.exists(full_path) and ole.get_type(full_path) == olefile.STGTY_STREAM:
2761   - log.debug('Found %s stream: %s' % (stream_path, full_path))
2762   - return full_path
2763   - else:
2764   - log.debug('Missing %s stream, this is not a valid VBA project structure' % stream_path)
2765   - return False
2766   -
2767   - # start with an empty list:
2768   - self.vba_projects = []
2769   - # Look for any storage containing those storage/streams:
2770   - ole = self.ole_file
2771   - for storage in ole.listdir(streams=False, storages=True):
2772   - log.debug('Checking storage %r' % storage)
2773   - # Look for a storage ending with "VBA":
2774   - if storage[-1].upper() == 'VBA':
2775   - log.debug('Found VBA storage: %s' % ('/'.join(storage)))
2776   - vba_root = '/'.join(storage[:-1])
2777   - # Add a trailing slash to vba_root, unless it is the root of the OLE file:
2778   - # (used later to append all the child streams/storages)
2779   - if vba_root != '':
2780   - vba_root += '/'
2781   - log.debug('Checking vba_root="%s"' % vba_root)
2782   -
2783   - # Check if the VBA root storage also contains a PROJECT stream:
2784   - project_path = check_vba_stream(ole, vba_root, 'PROJECT')
2785   - if not project_path: continue
2786   - # Check if the VBA root storage also contains a VBA/_VBA_PROJECT stream:
2787   - vba_project_path = check_vba_stream(ole, vba_root, 'VBA/_VBA_PROJECT')
2788   - if not vba_project_path: continue
2789   - # Check if the VBA root storage also contains a VBA/dir stream:
2790   - dir_path = check_vba_stream(ole, vba_root, 'VBA/dir')
2791   - if not dir_path: continue
2792   - # Now we are pretty sure it is a VBA project structure
2793   - log.debug('VBA root storage: "%s"' % vba_root)
2794   - # append the results to the list as a tuple for later use:
2795   - self.vba_projects.append((vba_root, project_path, dir_path))
2796   - return self.vba_projects
2797   -
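`find_vba_projects` treats any storage whose last path element is `VBA` as a candidate. It accepts the candidate only if the parent (`vba_root`) also holds `PROJECT`, `VBA/_VBA_PROJECT` and `VBA/dir`, with names compared case-insensitively as MS-OVBA requires. The sketch below replays that rule on plain Python lists instead of an `olefile.OleFileIO` object. The function name and its `storages`/`streams` inputs are hypothetical stand-ins for `ole.listdir(streams=False, storages=True)` and `ole.exists()`.

```python
def find_vba_roots(storages, streams):
    """Return (vba_root, project_path, dir_path) for each valid VBA project.

    storages: list of storage paths, each a list of names.
    streams: iterable of '/'-joined stream paths.
    Hypothetical stand-in for the olefile-based checks in find_vba_projects.
    """
    known = {path.lower() for path in streams}  # MS-OVBA names are case-insensitive
    projects = []
    for storage in storages:
        if storage[-1].upper() != 'VBA':
            continue
        vba_root = '/'.join(storage[:-1])
        if vba_root:
            vba_root += '/'  # trailing slash unless it is the OLE root
        project, vba_project, dir_ = (vba_root + name for name in
                                      ('PROJECT', 'VBA/_VBA_PROJECT', 'VBA/dir'))
        if all(path.lower() in known for path in (project, vba_project, dir_)):
            projects.append((vba_root, project, dir_))
    return projects


# Word 97-2003 layout: the VBA project lives under the "Macros" storage
print(find_vba_roots([['Macros'], ['Macros', 'VBA'], ['ObjectPool']],
                     ['Macros/PROJECT', 'Macros/VBA/_VBA_PROJECT',
                      'Macros/VBA/dir', 'WordDocument']))
# [('Macros/', 'Macros/PROJECT', 'Macros/VBA/dir')]
```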
2798   - def detect_vba_macros(self):
2799   - """
2800   - Detect the potential presence of VBA macros in the file, by checking
2801   - if it contains VBA projects. Both OLE and OpenXML files are supported.
2802   -
2803   - Important: for now, results are accurate only for Word, Excel and PowerPoint
2804   -
2805   - Note: this method does NOT attempt to check the actual presence or validity
2806   - of VBA macro source code, so there might be false positives.
2807   - It may also detect VBA macros in files embedded within the main file,
2808   - for example an Excel workbook with macros embedded into a Word
2809   - document without macros may be detected, without distinction.
2810   -
2811   - :return: bool, True if at least one VBA project has been found, False otherwise
2812   - """
2813   - #TODO: return None or raise exception if format not supported
2814   - #TODO: return the number of VBA projects found instead of True/False?
2815   - # if this method was already called, return the previous result:
2816   - if self.contains_macros is not None:
2817   - return self.contains_macros
2818   - # if OpenXML/PPT, check all the OLE subfiles:
2819   - if self.ole_file is None:
2820   - for ole_subfile in self.ole_subfiles:
2821   - if ole_subfile.detect_vba_macros():
2822   - self.contains_macros = True
2823   - return True
2824   - # otherwise, no macro found:
2825   - self.contains_macros = False
2826   - return False
2827   - # otherwise it's an OLE file, find VBA projects:
2828   - vba_projects = self.find_vba_projects()
2829   - if len(vba_projects) == 0:
2830   - self.contains_macros = False
2831   - else:
2832   - self.contains_macros = True
2833   - # Also look for VBA code in any stream including orphans
2834   - # (happens in some malformed files)
2835   - ole = self.ole_file
2836   - for sid in xrange(len(ole.direntries)):
2837   - # check each directory entry, including orphans:
2838   - log.debug('Checking DirEntry #%d' % sid)
2839   - d = ole.direntries[sid]
2840   - if d is None:
2841   - # this direntry is not part of the tree: either unused or an orphan
2842   - d = ole._load_direntry(sid)
2843   - log.debug('This DirEntry is an orphan or unused')
2844   - if d.entry_type == olefile.STGTY_STREAM:
2845   - # read data
2846   - log.debug('Reading data from stream %r - size: %d bytes' % (d.name, d.size))
2847   - try:
2848   - data = ole._open(d.isectStart, d.size).read()
2849   - log.debug('Read %d bytes' % len(data))
2850   - if len(data) > 200:
2851   - log.debug('%r...[much more data]...%r' % (data[:100], data[-50:]))
2852   - else:
2853   - log.debug(repr(data))
2854   - if 'Attribut\x00' in data.decode('utf-8', 'ignore'):
2855   - log.debug('Found VBA compressed code')
2856   - self.contains_macros = True
2857   - except IOError as exc:
2858   - if self.relaxed:
2859   - log.info('Error when reading OLE Stream %r' % d.name)
2860   - log.debug('Trace:', exc_info=True)
2861   - else:
2862   - raise SubstreamOpenError(self.filename, d.name, exc)
2863   - return self.contains_macros
2864   -
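The orphan-stream checks rely on the layout of an MS-OVBA compressed container, as we read section 2.4.1 of the spec. The container starts with a signature byte `0x01` and a 2-byte chunk header, followed by a flag byte and up to eight tokens. Module source normally begins with `Attribute VB_Name`, so the first eight tokens are the literals `Attribut`, preceded by a `0x00` flag byte. The byte right after them is the next flag byte, not `e`, which keeps plain-text occurrences of "Attribute" out of the match. Backing up 3 bytes from the `\x00` lands on the signature. The sketch below applies that idea to raw bytes. The extra check that the byte found is really `0x01` is our addition. The function name and the sample bytes are hypothetical.

```python
import re

# 0x00 flag byte + eight literal tokens "Attribut" + a following byte that is not "e"
VBA_MARKER = re.compile(rb'\x00Attribut[^e]', re.IGNORECASE)


def find_compressed_vba(data):
    """Return offsets where an MS-OVBA compressed container likely starts."""
    offsets = []
    for match in VBA_MARKER.finditer(data):
        # skip back over the 2-byte chunk header and the 0x01 signature byte
        start = match.start() - 3
        if start >= 0 and data[start] == 0x01:  # added sanity check on the signature
            offsets.append(start)
    return offsets


# Made-up stream: 4 junk bytes, signature, chunk header, then the token sequence
sample = b'JUNK\x01\x19\xb0\x00Attribut\x00e VB_Name'
print(find_compressed_vba(sample))  # [4]
```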
2865   - def extract_macros(self):
2866   - """
2867   - Extract and decompress source code for each VBA macro found in the file
2868   -
2869   - Iterator: yields (filename, stream_path, vba_filename, vba_code) for each VBA macro found
2870   - If the file is OLE, filename is the path of the file.
2871   - If the file is OpenXML, filename is the path of the OLE subfile containing VBA macros
2872   - within the zip archive, e.g. word/vbaProject.bin.
2873   - If the file is PPT, the result is as for OpenXML, but filename is None
2874   - """
2875   - log.debug('extract_macros:')
2876   - if self.ole_file is None:
2877   - # This may be either an OpenXML/PPT or a text file:
2878   - if self.type == TYPE_TEXT:
2879   - # This is a text file, yield the full code:
2880   - yield (self.filename, '', self.filename, self.vba_code_all_modules)
2881   - else:
2882   - # OpenXML/PPT: recursively yield results from each OLE subfile:
2883   - for ole_subfile in self.ole_subfiles:
2884   - for results in ole_subfile.extract_macros():
2885   - yield results
2886   - else:
2887   - # This is an OLE file:
2888   - self.find_vba_projects()
2889   - # set of stream ids
2890   - vba_stream_ids = set()
2891   - for vba_root, project_path, dir_path in self.vba_projects:
2892   - # extract all VBA macros from that VBA root storage:
2893   - # The function _extract_vba may fail on some files (issue #132)
2894   - try:
2895   - for stream_path, vba_filename, vba_code in \
2896   - _extract_vba(self.ole_file, vba_root, project_path,
2897   - dir_path, self.relaxed):
2898   - # store direntry ids in a set:
2899   - vba_stream_ids.add(self.ole_file._find(stream_path))
2900   - yield (self.filename, stream_path, vba_filename, vba_code)
2901   - except Exception as e:
2902   - log.exception('Error in _extract_vba')
2903   - # Also look for VBA code in any stream including orphans
2904   - # (happens in some malformed files)
2905   - ole = self.ole_file
2906   - for sid in xrange(len(ole.direntries)):
2907   - # check if id is already done above:
2908   - log.debug('Checking DirEntry #%d' % sid)
2909   - if sid in vba_stream_ids:
2910   - log.debug('Already extracted')
2911   - continue
2912   - d = ole.direntries[sid]
2913   - if d is None:
2914   - # this direntry is not part of the tree: either unused or an orphan
2915   - d = ole._load_direntry(sid)
2916   - log.debug('This DirEntry is an orphan or unused')
2917   - if d.entry_type == olefile.STGTY_STREAM:
2918   - # read data
2919   - log.debug('Reading data from stream %r' % d.name)
2920   - data = ole._open(d.isectStart, d.size).read()
2921   - for match in re.finditer(b'\\x00Attribut[^e]', data, flags=re.IGNORECASE):
2922   - start = match.start() - 3
2923   - log.debug('Found VBA compressed code at index %X' % start)
2924   - compressed_code = data[start:]
2925   - try:
2926   - vba_code = decompress_stream(compressed_code)
2927   - yield (self.filename, d.name, d.name, vba_code)
2928   - except Exception as exc:
2929   - # display the exception with full stack trace for debugging
2930   - log.debug('Error processing stream %r in file %r (%s)' % (d.name, self.filename, exc))
2931   - log.debug('Traceback:', exc_info=True)
2932   - # do not raise the error, as it is unlikely to be a compressed macro stream
2933   -
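The orphan-stream scan in extract_macros looks for `\x00Attribut` followed by any byte other than `e`, then backs up 3 bytes. A plausible reading, based on the MS-OVBA compression format: a compressed container starts with a signature byte (0x01) and a 2-byte chunk header, followed by a flag byte of 0x00 announcing 8 literal bytes ("Attribut"). The "e" of "Attribute" only appears after the next flag byte, which is why uncompressed text does not match. The standalone sketch below isolates that offset logic. The function name `find_compressed_vba_offsets` is hypothetical, and the `start >= 0` guard is an addition that olevba itself does not have.

```python
import re

# Same pattern as in extract_macros: a zero flag byte, the 8 literal bytes
# "Attribut", then any byte except "e" (in compressed data, the "e" of
# "Attribute" only comes after the next flag byte).
COMPRESSED_ATTRIBUT = re.compile(b'\\x00Attribut[^e]', flags=re.IGNORECASE)


def find_compressed_vba_offsets(data):
    """Return candidate start offsets of compressed VBA code in data.

    Hypothetical helper: each offset points 3 bytes before the zero flag
    byte, i.e. at the assumed signature byte (0x01) and 2-byte chunk header.
    """
    offsets = []
    for match in COMPRESSED_ATTRIBUT.finditer(data):
        start = match.start() - 3
        # Guard added in this sketch: olevba does not check this, and a
        # negative start would make data[start:] slice from the buffer end.
        if start >= 0:
            offsets.append(start)
    return offsets


# Synthetic buffer: 4 junk bytes, signature 0x01, a made-up chunk header,
# flag byte 0x00 + literals "Attribut", next flag byte + "e VB_Nam".
sample = b'junk' + b'\x01\x19\xb0' + b'\x00Attribut' + b'\x00e VB_Nam'
print(find_compressed_vba_offsets(sample))  # [4]
```

Each returned offset would then be passed to decompress_stream(data[start:]), exactly as the loop above does; failures are expected for false positives, which is why the original only logs them at debug level.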
2934   - def extract_all_macros(self):
2935   - """
2936   - Extract and decompress source code for each VBA macro found in the file
2937   - by calling extract_macros(), store the results as a list of tuples
2938   - (filename, stream_path, vba_filename, vba_code) in self.modules.
2939   - See extract_macros for details.
2940   - """
2941   - if self.modules is None:
2942   - self.modules = []
2943   - for (subfilename, stream_path, vba_filename, vba_code) in self.extract_macros():
2944   - self.modules.append((subfilename, stream_path, vba_filename, vba_code))
2945   - self.nb_macros = len(self.modules)
2946   - return self.modules
2947   -
2948   -
2949   -
2950   - def analyze_macros(self, show_decoded_strings=False, deobfuscate=False):
2951   - """
2952   - Run extract_macros and analyze the source code of all VBA macros
2953   - found in the file, together with the strings found in VBA forms.
2954   - """
2955   - if self.detect_vba_macros():
2956   - # if the analysis was already done, avoid doing it twice:
2957   - if self.analysis_results is not None:
2958   - return self.analysis_results
2959   - # variable to merge source code from all modules:
2960   - if self.vba_code_all_modules is None:
2961   - self.vba_code_all_modules = ''
2962   - for (_, _, _, vba_code) in self.extract_all_macros():
2963   - #TODO: filter code? (each module)
2964   - if isinstance(vba_code, bytes):
2965   - vba_code = vba_code.decode('utf-8', 'ignore')
2966   - self.vba_code_all_modules += vba_code + '\n'
2967   - for (_, _, form_string) in self.extract_form_strings():
2968   - self.vba_code_all_modules += form_string.decode('utf-8', 'ignore') + '\n'
2969   - # Analyze the whole code at once:
2970   - scanner = VBA_Scanner(self.vba_code_all_modules)
2971   - self.analysis_results = scanner.scan(show_decoded_strings, deobfuscate)
2972   - autoexec, suspicious, iocs, hexstrings, base64strings, dridex, vbastrings = scanner.scan_summary()
2973   - self.nb_autoexec += autoexec
2974   - self.nb_suspicious += suspicious
2975   - self.nb_iocs += iocs
2976   - self.nb_hexstrings += hexstrings
2977   - self.nb_base64strings += base64strings
2978   - self.nb_dridexstrings += dridex
2979   - self.nb_vbastrings += vbastrings
2980   -
2981   - return self.analysis_results
2982   -
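analyze_macros() concatenates the code of every module, then the printable strings of every VBA form, into self.vba_code_all_modules before running a single VBA_Scanner pass over the whole project. The sketch below reproduces only that merging step. `merge_vba_sources` is a hypothetical helper that takes the tuples directly (instead of calling extract_all_macros() and extract_form_strings()) and joins a list once rather than using repeated `+=`.

```python
def merge_vba_sources(modules, form_strings):
    """Merge module code and form strings into one text for a single scan.

    modules: (filename, stream_path, vba_filename, vba_code) tuples
    form_strings: (filename, stream_path, form_string) tuples
    """
    parts = []
    for (_, _, _, vba_code) in modules:
        # module code may come as bytes or str: normalize to str
        if isinstance(vba_code, bytes):
            vba_code = vba_code.decode('utf-8', 'ignore')
        parts.append(vba_code + '\n')
    for (_, _, form_string) in form_strings:
        # form strings are bytes extracted from the "o" streams
        parts.append(form_string.decode('utf-8', 'ignore') + '\n')
    return ''.join(parts)


merged = merge_vba_sources(
    [('f.doc', 'Macros/VBA/Module1', 'Module1', b'Sub A()\r\nEnd Sub'),
     ('f.doc', 'Macros/VBA/Module2', 'Module2', 'Sub B()')],
    [('f.doc', 'Macros/UserForm1/o', b'http://example.com')])
print(repr(merged))
```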
2983   -
2984   - def reveal(self):
2985   - # we only want printable strings:
2986   - analysis = self.analyze_macros(show_decoded_strings=False)
2987   - # to avoid replacing short strings contained into longer strings, we sort the analysis results
2988   - # based on the length of the encoded string, in reverse order:
2989   - analysis = sorted(analysis, key=lambda type_decoded_encoded: len(type_decoded_encoded[2]), reverse=True)
2990   - # normally now self.vba_code_all_modules contains source code from all modules
2991   - # Need to collapse long lines:
2992   - deobf_code = vba_collapse_long_lines(self.vba_code_all_modules)
2993   - deobf_code = filter_vba(deobf_code)
2994   - for kw_type, decoded, encoded in analysis:
2995   - if kw_type == 'VBA string':
2996   - #print '%3d occurrences: %r => %r' % (deobf_code.count(encoded), encoded, decoded)
2997   - # need to add double quotes around the decoded strings
2998   - # after escaping double-quotes as double-double-quotes for VBA:
2999   - decoded = decoded.replace('"', '""')
3000   - decoded = '"%s"' % decoded
3001   - # if the encoded string is enclosed in parentheses,
3002   - # keep them in the decoded version:
3003   - if encoded.startswith('(') and encoded.endswith(')'):
3004   - decoded = '(%s)' % decoded
3005   - deobf_code = deobf_code.replace(encoded, decoded)
3006   - # # TODO: there is a bug somewhere which creates double returns '\r\r'
3007   - # deobf_code = deobf_code.replace('\r\r', '\r')
3008   - return deobf_code
3009   - #TODO: re-run the analysis several times if hex or base64 strings are revealed
3010   -
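The replacement loop in reveal() can be illustrated in isolation. This is a simplified sketch: `reveal_strings` is a hypothetical helper that takes the `(kw_type, decoded, encoded)` tuples directly instead of calling analyze_macros(), and it skips vba_collapse_long_lines() and filter_vba().

```python
def reveal_strings(code, analysis):
    """Replace each obfuscated VBA string expression by its decoded literal.

    analysis: iterable of (kw_type, decoded, encoded) tuples, as in reveal().
    """
    # Longest encoded expressions first, so that a short expression which is
    # part of a longer one does not break the longer replacement.
    ordered = sorted(analysis, key=lambda item: len(item[2]), reverse=True)
    for kw_type, decoded, encoded in ordered:
        if kw_type != 'VBA string':
            continue
        # VBA escapes a double quote inside a string literal as "".
        literal = '"%s"' % decoded.replace('"', '""')
        # Keep the enclosing parentheses of the original expression.
        if encoded.startswith('(') and encoded.endswith(')'):
            literal = '(%s)' % literal
        code = code.replace(encoded, literal)
    return code


code = 'x = Chr(65) & Chr(66)\ny = Chr(65)'
analysis = [('VBA string', 'A', 'Chr(65)'),
            ('VBA string', 'AB', 'Chr(65) & Chr(66)')]
print(reveal_strings(code, analysis))
# x = "AB"
# y = "A"
```

Without the sort, `Chr(65)` would be replaced first everywhere, leaving `x = "A" & Chr(66)` and the longer expression would never be found.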
3011   -
3012   - def find_vba_forms(self):
3013   - """
3014   - Finds all the VBA forms stored in an OLE file.
3015   -
3016   - Return None if the file is not OLE but OpenXML.
3017   - Return a list of storage paths, one for each VBA form found.
3018   - Each storage path is a list of storage names, as returned by
3019   - olefile's listdir(), pointing to a storage that contains both
3020   - a stream named "o" and a stream named "f" (see MS-OFORMS
3021   - section 2.1.2).
3022   -
3023   - If this function returns an empty list for one of the supported formats
3024   - (i.e. Word, Excel, Powerpoint), then the file does not contain VBA forms.
3025   -
3026   - :return: None if OpenXML file, list of storage paths (lists of str)
3027   - for each VBA form found if OLE file
3028   - """
3029   - log.debug('VBA_Parser.find_vba_forms')
3030   -
3031   - # if the file is not OLE but OpenXML, return None:
3032   - if self.ole_file is None and self.type != TYPE_PPT:
3033   - return None
3034   -
3035   - # if this method has already been called, return previous result:
3036   - # if self.vba_projects is not None:
3037   - # return self.vba_projects
3038   -
3039   - # According to MS-OFORMS section 2.1.2 Control Streams:
3040   - # - A parent control, that is, a control that can contain embedded controls,
3041   - # MUST be persisted as a storage that contains multiple streams.
3042   - # - All parent controls MUST contain a FormControl. The FormControl
3043   - # properties are persisted to a stream (1) as specified in section 2.1.1.2.
3044   - # The name of this stream (1) MUST be "f".
3045   - # - Embedded controls that cannot themselves contain other embedded
3046   - # controls are persisted sequentially as FormEmbeddedActiveXControls
3047   - # to a stream (1) contained in the same storage as the parent control.
3048   - # The name of this stream (1) MUST be "o".
3049   - # - all names are case-insensitive
3050   -
3051   - if self.type == TYPE_PPT:
3052   - # TODO: so far, this function is never called for PPT files, but
3053   - # if that happens, the information about which OLE file contains
3054   - # which storage is lost!
3055   - ole_files = self.ole_subfiles
3056   - log.warning('Returned info is not complete for PPT types!')
3057   - else:
3058   - ole_files = [self.ole_file, ]
3059   -
3060   - # start with an empty list:
3061   - self.vba_forms = []
3062   -
3063   - # Loop over OLE files
3064   - for ole in ole_files:
3065   - # Look for any storage containing those storage/streams:
3066   - for storage in ole.listdir(streams=False, storages=True):
3067   - log.debug('Checking storage %r' % storage)
3068   - # Look for two streams named 'o' and 'f':
3069   - o_stream = storage + ['o']
3070   - f_stream = storage + ['f']
3071   - log.debug('Checking if streams %r and %r exist' % (f_stream, o_stream))
3072   - if ole.exists(o_stream) and ole.get_type(o_stream) == olefile.STGTY_STREAM \
3073   - and ole.exists(f_stream) and ole.get_type(f_stream) == olefile.STGTY_STREAM:
3074   - form_path = '/'.join(storage)
3075   - log.debug('Found VBA Form: %r' % form_path)
3076   - self.vba_forms.append(storage)
3077   - return self.vba_forms
3078   -
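The detection rule used by find_vba_forms (a storage holding both an "o" and an "f" stream) can be sketched without olefile. `find_form_storages` is a hypothetical helper that groups stream paths by parent storage; it assumes the paths have the list-of-names shape returned by olefile's listdir(), and it ignores the root storage because find_vba_forms only iterates over sub-storages.

```python
def find_form_storages(stream_paths):
    """Return the storages that contain both an "o" and an "f" stream.

    stream_paths: iterable of stream paths, each a list of names such as
    ['Macros', 'UserForm1', 'o']. Names are compared case-insensitively,
    following the MS-OFORMS notes above.
    """
    children = {}
    for path in stream_paths:
        parent = tuple(path[:-1])
        children.setdefault(parent, set()).add(path[-1].lower())
    # skip the root storage (empty parent), like find_vba_forms does
    return sorted(parent for parent, names in children.items()
                  if parent and {'o', 'f'} <= names)


paths = [['Macros', 'UserForm1', 'f'], ['Macros', 'UserForm1', 'o'],
         ['Macros', 'UserForm1', '\x03VBFrame'], ['Macros', 'VBA', 'dir'],
         ['Macros', 'UserForm2', 'F'], ['Macros', 'UserForm2', 'O'],
         ['WordDocument']]
print(find_form_storages(paths))
```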
3079   - def extract_form_strings(self):
3080   - """
3081   - Extract printable strings from each VBA Form found in the file
3082   -
3083   - Iterator: yields (filename, stream_path, form_string) for each printable string found in a VBA form (form_string is bytes)
3084   - If the file is OLE, filename is the path of the file; stream_path is the path of the form's "o" stream.
3085   - If the file is OpenXML, filename is the path of the OLE subfile containing VBA forms
3086   - within the zip archive, e.g. word/vbaProject.bin.
3087   - If the file is PPT, result is as for OpenXML but filename is useless
3088   - """
3089   - if self.ole_file is None:
3090   - # This may be either an OpenXML/PPT or a text file:
3091   - if self.type == TYPE_TEXT:
3092   - # This is a text file, return no results:
3093   - return
3094   - else:
3095   - # OpenXML/PPT: recursively yield results from each OLE subfile:
3096   - for ole_subfile in self.ole_subfiles:
3097   - for results in ole_subfile.extract_form_strings():
3098   - yield results
3099   - else:
3100   - # This is an OLE file:
3101   - self.find_vba_forms()
3102   - ole = self.ole_file
3103   - for form_storage in self.vba_forms:
3104   - o_stream = form_storage + ['o']
3105   - log.debug('Opening form object stream %r' % '/'.join(o_stream))
3106   - form_data = ole.openstream(o_stream).read()
3107   - # Extract printable strings from the form object stream "o":
3108   - for m in re_printable_string.finditer(form_data):
3109   - log.debug('Printable string found in form: %r' % m.group())
3110   - yield (self.filename, '/'.join(o_stream), m.group())
3111   -
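extract_form_strings() relies on re_printable_string, which is defined outside this excerpt. The sketch below uses a hypothetical stand-in pattern (runs of at least 4 printable ASCII bytes); the real definition in olevba may differ. It shows why the yielded strings are bytes, which is why the callers decode them with `.decode('utf-8', 'ignore')`.

```python
import re

# Hypothetical stand-in for olevba's re_printable_string (its exact
# definition may differ): runs of at least 4 printable ASCII bytes.
PRINTABLE_RUN = re.compile(b'[\\x20-\\x7e]{4,}')


def printable_strings(form_data):
    """Return the printable byte strings found in a form "o" stream."""
    return [match.group() for match in PRINTABLE_RUN.finditer(form_data)]


form_data = b'\x00\x10Label1\x00\x02ab\x00cmd.exe /c calc\xff'
print(printable_strings(form_data))  # [b'Label1', b'cmd.exe /c calc']
```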
3112   -
3113   - def close(self):
3114   - """
3115   - Close all the open files. This method should be called after usage,
3116   - especially if the application opens many files.
3117   - """
3118   - if self.ole_file is None:
3119   - if self.ole_subfiles is not None:
3120   - for ole_subfile in self.ole_subfiles:
3121   - ole_subfile.close()
3122   - else:
3123   - self.ole_file.close()
3124   -
3125   -
3126   -
3127   -class VBA_Parser_CLI(VBA_Parser):
3128   - """
3129   - VBA parser and analyzer, adding methods for the command line interface
3130   - of olevba. (see VBA_Parser)
3131   - """
3132   -
3133   - def __init__(self, *args, **kwargs):
3134   - """
3135   - Constructor for VBA_Parser_CLI.
3136   - Calls __init__ from VBA_Parser with all arguments --> see doc there
3137   - """
3138   - super(VBA_Parser_CLI, self).__init__(*args, **kwargs)
3139   -
3140   -
3141   - def print_analysis(self, show_decoded_strings=False, deobfuscate=False):
3142   - """
3143   - Analyze the VBA code found in the file, and print the results in a table
3144   -
3145   - :param show_decoded_strings: bool, if True hex-encoded strings will be displayed
3146   - with their decoded content.
3147   - :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
3148   - :return: None
3149   - """
3150   - # print a waiting message only if the output is not redirected to a file:
3151   - if sys.stdout.isatty():
3152   - print('Analysis...\r', end='')
3153   - sys.stdout.flush()
3154   - results = self.analyze_macros(show_decoded_strings, deobfuscate)
3155   - if results:
3156   - t = prettytable.PrettyTable(('Type', 'Keyword', 'Description'))
3157   - t.align = 'l'
3158   - t.max_width['Type'] = 10
3159   - t.max_width['Keyword'] = 20
3160   - t.max_width['Description'] = 39
3161   - for kw_type, keyword, description in results:
3162   - # handle non printable strings:
3163   - if not is_printable(keyword):
3164   - keyword = repr(keyword)
3165   - if not is_printable(description):
3166   - description = repr(description)
3167   - t.add_row((kw_type, keyword, description))
3168   - print(t)
3169   - else:
3170   - print('No suspicious keyword or IOC found.')
3171   -
3172   - def print_analysis_json(self, show_decoded_strings=False, deobfuscate=False):
3173   - """
3174   - Analyze the VBA code found in the file, and return the results in a JSON-serializable format
3175   -
3176   - :param show_decoded_strings: bool, if True hex-encoded strings will be displayed
3177   - with their decoded content.
3178   - :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
3179   -
3180   - :return: list of dicts with the keys 'type', 'keyword' and 'description'
3181   - """
3182   - # print a waiting message only if the output is not redirected to a file:
3183   - if sys.stdout.isatty():
3184   - print('Analysis...\r', end='')
3185   - sys.stdout.flush()
3186   - return [dict(type=kw_type, keyword=keyword, description=description)
3187   - for kw_type, keyword, description in self.analyze_macros(show_decoded_strings, deobfuscate)]
3188   -
3189   - def process_file(self, show_decoded_strings=False,
3190   - display_code=True, hide_attributes=True,
3191   - vba_code_only=False, show_deobfuscated_code=False,
3192   - deobfuscate=False):
3193   - """
3194   - Process a single file and print the detailed results
3195   -
3196   - :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content.
3197   - :param display_code: bool, if False VBA source code is not displayed (default True)
3198   - :param hide_attributes: bool, if True the first lines starting with "Attribute VB" are hidden (default)
3199   - :param vba_code_only: bool, if True only display the VBA source code, without analysis
3200   - (this forces display_code to True)
3201   - :param show_deobfuscated_code: bool, if True also display the source code with all
3202   - obfuscated strings replaced by their decoded content (see reveal)
3203   - :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
3204   - """
3205   - #TODO: replace print by writing to a provided output file (sys.stdout by default)
3206   - # fix conflicting parameters:
3207   - if vba_code_only and not display_code:
3208   - display_code = True
3209   - if self.container:
3210   - display_filename = '%s in %s' % (self.filename, self.container)
3211   - else:
3212   - display_filename = self.filename
3213   - print('=' * 79)
3214   - print('FILE: %s' % display_filename)
3215   - try:
3216   - #TODO: handle olefile errors, when an OLE file is malformed
3217   - print('Type: %s' % self.type)
3218   - if self.detect_vba_macros():
3219   - #print 'Contains VBA Macros:'
3220   - for (subfilename, stream_path, vba_filename, vba_code) in self.extract_all_macros():
3221   - if isinstance(vba_code, bytes):
3222   - vba_code = vba_code.decode('utf-8', 'backslashreplace')
3223   - if hide_attributes:
3224   - # hide attribute lines:
3225   - vba_code_filtered = filter_vba(vba_code)
3226   - else:
3227   - vba_code_filtered = vba_code
3228   - print('-' * 79)
3229   - print('VBA MACRO %s ' % vba_filename)
3230   - print('in file: %s - OLE stream: %s' % (subfilename, repr(stream_path)))
3231   - if display_code:
3232   - print('- ' * 39)
3233   - # detect empty macros:
3234   - if vba_code_filtered.strip() == '':
3235   - print('(empty macro)')
3236   - else:
3237   - print(vba_code_filtered)
3238   - for (subfilename, stream_path, form_string) in self.extract_form_strings():
3239   - print('-' * 79)
3240   - print('VBA FORM STRING IN %r - OLE stream: %r' % (subfilename, stream_path))
3241   - print('- ' * 39)
3242   - print(form_string.decode('utf-8', 'ignore'))
3243   - if not vba_code_only:
3244   - # analyse the code from all modules at once:
3245   - self.print_analysis(show_decoded_strings, deobfuscate)
3246   - if show_deobfuscated_code:
3247   - print('MACRO SOURCE CODE WITH DEOBFUSCATED VBA STRINGS (EXPERIMENTAL):\n\n')
3248   - print(self.reveal())
3249   - else:
3250   - print('No VBA macros found.')
3251   - except OlevbaBaseException:
3252   - raise
3253   - except Exception as exc:
3254   - # display the exception with full stack trace for debugging
3255   - log.info('Error processing file %s (%s)' % (self.filename, exc))
3256   - log.debug('Traceback:', exc_info=True)
3257   - raise ProcessingError(self.filename, exc)
3258   - print('')
3259   -
3260   -
3261   - def process_file_json(self, show_decoded_strings=False,
3262   - display_code=True, hide_attributes=True,
3263   - vba_code_only=False, show_deobfuscated_code=False,
3264   - deobfuscate=False):
3265   - """
3266   - Process a single file and return the results as a JSON-serializable dict
3267   -
3268   - every "show" or "print" here is to be translated as "add to json"
3269   -
3270   - :param show_decoded_strings: bool, if True hex-encoded strings will be included with their decoded content.
3271   - :param display_code: bool, if False VBA source code is not included (default True)
3272   - :param hide_attributes: bool, if True the first lines starting with "Attribute VB" are hidden (default)
3273   - :param vba_code_only: bool, if True only the VBA source code is included, without analysis
3274   - (this forces display_code to True)
3275   - :param show_deobfuscated_code: bool, if True the source code with all obfuscated strings replaced
3276   - by their decoded content is stored in result['code_deobfuscated'] (see reveal)
3277   - :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
3278   - """
3279   - #TODO: fix conflicting parameters (?)
3280   -
3281   - if vba_code_only and not display_code:
3282   - display_code = True
3283   -
3284   - result = {}
3285   -
3286   - if self.container:
3287   - result['container'] = self.container
3288   - else:
3289   - result['container'] = None
3290   - result['file'] = self.filename
3291   - result['json_conversion_successful'] = False
3292   - result['analysis'] = None
3293   - result['code_deobfuscated'] = None
3294   - result['do_deobfuscate'] = deobfuscate
3295   -
3296   - try:
3297   - #TODO: handle olefile errors, when an OLE file is malformed
3298   - result['type'] = self.type
3299   - macros = []
3300   - if self.detect_vba_macros():
3301   - for (subfilename, stream_path, vba_filename, vba_code) in self.extract_all_macros():
3302   - curr_macro = {}
3303   - if isinstance(vba_code, bytes):
3304   - vba_code = vba_code.decode('utf-8', 'backslashreplace')
3305   -
3306   - if hide_attributes:
3307   - # hide attribute lines:
3308   - vba_code_filtered = filter_vba(vba_code)
3309   - else:
3310   - vba_code_filtered = vba_code
3311   -
3312   - curr_macro['vba_filename'] = vba_filename
3313   - curr_macro['subfilename'] = subfilename
3314   - curr_macro['ole_stream'] = stream_path
3315   - if display_code:
3316   - curr_macro['code'] = vba_code_filtered.strip()
3317   - else:
3318   - curr_macro['code'] = None
3319   - macros.append(curr_macro)
3320   - if not vba_code_only:
3321   - # analyse the code from all modules at once:
3322   - result['analysis'] = self.print_analysis_json(show_decoded_strings,
3323   - deobfuscate)
3324   - if show_deobfuscated_code:
3325   - result['code_deobfuscated'] = self.reveal()
3326   - result['macros'] = macros
3327   - result['json_conversion_successful'] = True
3328   - except Exception as exc:
3329   - # display the exception with full stack trace for debugging
3330   - log.info('Error processing file %s (%s)' % (self.filename, exc))
3331   - log.debug('Traceback:', exc_info=True)
3332   - raise ProcessingError(self.filename, exc)
3333   -
3334   - return result
3335   -
3336   -
3337   - def process_file_triage(self, show_decoded_strings=False, deobfuscate=False):
3338   - """
3339   - Process a file in triage mode, showing only summary results on one line.
3340   - """
3341   - #TODO: replace print by writing to a provided output file (sys.stdout by default)
3342   - try:
3343   - #TODO: handle olefile errors, when an OLE file is malformed
3344   - if self.detect_vba_macros():
3345   - # print a waiting message only if the output is not redirected to a file:
3346   - if sys.stdout.isatty():
3347   - print('Analysis...\r', end='')
3348   - sys.stdout.flush()
3349   - self.analyze_macros(show_decoded_strings=show_decoded_strings,
3350   - deobfuscate=deobfuscate)
3351   - flags = TYPE2TAG[self.type]
3352   - macros = autoexec = suspicious = iocs = hexstrings = base64obf = dridex = vba_obf = '-'
3353   - if self.contains_macros: macros = 'M'
3354   - if self.nb_autoexec: autoexec = 'A'
3355   - if self.nb_suspicious: suspicious = 'S'
3356   - if self.nb_iocs: iocs = 'I'
3357   - if self.nb_hexstrings: hexstrings = 'H'
3358   - if self.nb_base64strings: base64obf = 'B'
3359   - if self.nb_dridexstrings: dridex = 'D'
3360   - if self.nb_vbastrings: vba_obf = 'V'
3361   - flags += '%s%s%s%s%s%s%s%s' % (macros, autoexec, suspicious, iocs, hexstrings,
3362   - base64obf, dridex, vba_obf)
3363   -
3364   - line = '%-12s %s' % (flags, self.filename)
3365   - print(line)
3366   -
3367   - # old table display:
3368   - # macros = autoexec = suspicious = iocs = hexstrings = 'no'
3369   - # if nb_macros: macros = 'YES:%d' % nb_macros
3370   - # if nb_autoexec: autoexec = 'YES:%d' % nb_autoexec
3371   - # if nb_suspicious: suspicious = 'YES:%d' % nb_suspicious
3372   - # if nb_iocs: iocs = 'YES:%d' % nb_iocs
3373   - # if nb_hexstrings: hexstrings = 'YES:%d' % nb_hexstrings
3374   - # # 2nd line = info
3375   - # print '%-8s %-7s %-7s %-7s %-7s %-7s' % (self.type, macros, autoexec, suspicious, iocs, hexstrings)
3376   - except Exception as exc:
3377   - # display the exception with full stack trace for debugging only
3378   - log.debug('Error processing file %s (%s)' % (self.filename, exc),
3379   - exc_info=True)
3380   - raise ProcessingError(self.filename, exc)
3381   -
3382   -
3383   - # t = prettytable.PrettyTable(('filename', 'type', 'macros', 'autoexec', 'suspicious', 'ioc', 'hexstrings'),
3384   - # header=False, border=False)
3385   - # t.align = 'l'
3386   - # t.max_width['filename'] = 30
3387   - # t.max_width['type'] = 10
3388   - # t.max_width['macros'] = 6
3389   - # t.max_width['autoexec'] = 6
3390   - # t.max_width['suspicious'] = 6
3391   - # t.max_width['ioc'] = 6
3392   - # t.max_width['hexstrings'] = 6
3393   - # t.add_row((filename, ftype, macros, autoexec, suspicious, iocs, hexstrings))
3394   - # print t
3395   -
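In triage mode, the flags column is a type tag from TYPE2TAG followed by one letter per indicator, or `-` when the corresponding counter is zero. A standalone sketch of that encoding; `triage_flags` is hypothetical, and `'OLE:'` is only an assumed example of a TYPE2TAG value.

```python
# Letters in the order used by process_file_triage:
# Macros, Autoexec, Suspicious, IOCs, Hex strings, Base64, Dridex, VBA strings
FLAG_LETTERS = 'MASIHBDV'


def triage_flags(type_tag, contains_macros, nb_autoexec, nb_suspicious,
                 nb_iocs, nb_hexstrings, nb_base64strings,
                 nb_dridexstrings, nb_vbastrings):
    """Build the flags column: one letter per non-zero indicator, else '-'."""
    values = (contains_macros, nb_autoexec, nb_suspicious, nb_iocs,
              nb_hexstrings, nb_base64strings, nb_dridexstrings, nb_vbastrings)
    return type_tag + ''.join(letter if value else '-'
                              for letter, value in zip(FLAG_LETTERS, values))


flags = triage_flags('OLE:', True, 1, 3, 0, 0, 2, 0, 0)
print('%-12s %s' % (flags, 'sample.doc'))  # OLE:MAS--B-- sample.doc
```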
3396   -
3397   -#=== MAIN =====================================================================
3398   -
3399   -def parse_args(cmd_line_args=None):
3400   - """ Parse command line arguments (the given ones or, by default, sys.argv) """
3401   -
3402   - DEFAULT_LOG_LEVEL = "warning" # Default log level
3403   - LOG_LEVELS = {
3404   - 'debug': logging.DEBUG,
3405   - 'info': logging.INFO,
3406   - 'warning': logging.WARNING,
3407   - 'error': logging.ERROR,
3408   - 'critical': logging.CRITICAL
3409   - }
3410   -
3411   - usage = 'usage: olevba [options] <filename> [filename2 ...]'
3412   - parser = optparse.OptionParser(usage=usage)
3413   - # parser.add_option('-o', '--outfile', dest='outfile',
3414   - # help='output file')
3415   - # parser.add_option('-c', '--csv', dest='csv',
3416   - # help='export results to a CSV file')
3417   - parser.add_option("-r", action="store_true", dest="recursive",
3418   - help='find files recursively in subdirectories.')
3419   - parser.add_option("-z", "--zip", dest='zip_password', type='str', default=None,
3420   - help='if the file is a zip archive, open all files from it, using the provided password (requires Python 2.6+)')
3421   - parser.add_option("-f", "--zipfname", dest='zip_fname', type='str', default='*',
3422   - help='if the file is a zip archive, file(s) to be opened within the zip. Wildcards * and ? are supported. (default:*)')
3423   - # output mode; could make this even simpler with add_option(type='choice') but that would make
3424   - # cmd line interface incompatible...
3425   - modes = optparse.OptionGroup(parser, title='Output mode (mutually exclusive)')
3426   - modes.add_option("-t", '--triage', action="store_const", dest="output_mode",
3427   - const='triage', default='unspecified',
3428   - help='triage mode, display results as a summary table (default for multiple files)')
3429   - modes.add_option("-d", '--detailed', action="store_const", dest="output_mode",
3430   - const='detailed', default='unspecified',
3431   - help='detailed mode, display full results (default for single file)')
3432   - modes.add_option("-j", '--json', action="store_const", dest="output_mode",
3433   - const='json', default='unspecified',
3434   - help='json mode, detailed in json format (never default)')
3435   - parser.add_option_group(modes)
3436   - parser.add_option("-a", '--analysis', action="store_false", dest="display_code", default=True,
3437   - help='display only analysis results, not the macro source code')
3438   - parser.add_option("-c", '--code', action="store_true", dest="vba_code_only", default=False,
3439   - help='display only VBA source code, do not analyze it')
3440   - parser.add_option("--decode", action="store_true", dest="show_decoded_strings",
3441   - help='display all the obfuscated strings with their decoded content (Hex, Base64, StrReverse, Dridex, VBA).')
3442   - parser.add_option("--attr", action="store_false", dest="hide_attributes", default=True,
3443   - help='display the attribute lines at the beginning of VBA source code')
3444   - parser.add_option("--reveal", action="store_true", dest="show_deobfuscated_code",
3445   - help='display the macro source code after replacing all the obfuscated strings by their decoded content.')
3446   - parser.add_option('-l', '--loglevel', dest="loglevel", action="store", default=DEFAULT_LOG_LEVEL,
3447   - help="logging level debug/info/warning/error/critical (default=%default)")
3448   - parser.add_option('--deobf', dest="deobfuscate", action="store_true", default=False,
3449   - help="Attempt to deobfuscate VBA expressions (slow)")
3450   - parser.add_option('--relaxed', dest="relaxed", action="store_true", default=False,
3451   - help="Do not raise errors if opening of substream fails")
3452   -
3453   - (options, args) = parser.parse_args(cmd_line_args)
3454   -
3455   - # Print help if no arguments are passed
3456   - if len(args) == 0:
3457   - print('olevba %s - http://decalage.info/python/oletools' % __version__)
3458   - print(__doc__)
3459   - parser.print_help()
3460   - sys.exit(RETURN_WRONG_ARGS)
3461   -
3462   - options.loglevel = LOG_LEVELS[options.loglevel]
3463   -
3464   - return options, args
3465   -
3466   -
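parse_args() converts the log level with `LOG_LEVELS[options.loglevel]`, so an unknown level such as `-l verbose` ends in a KeyError traceback. One possible alternative is optparse's `type='choice'`, which makes OptionParser reject invalid values with a usage error and exit status 2. The sketch below shows this on a hypothetical reduced parser, `parse_loglevel_args`, that keeps only two of olevba's options; it is not how olevba currently behaves.

```python
import logging
import optparse

LOG_LEVELS = {
    'debug': logging.DEBUG,
    'info': logging.INFO,
    'warning': logging.WARNING,
    'error': logging.ERROR,
    'critical': logging.CRITICAL,
}


def parse_loglevel_args(cmd_line_args):
    """Parse -j/--json and -l/--loglevel only (hypothetical reduced parser)."""
    parser = optparse.OptionParser(
        usage='usage: olevba [options] <filename> [filename2 ...]')
    parser.add_option('-j', '--json', action='store_const', dest='output_mode',
                      const='json', default='unspecified')
    # type='choice' makes optparse itself reject unknown levels
    parser.add_option('-l', '--loglevel', dest='loglevel', type='choice',
                      choices=sorted(LOG_LEVELS), default='warning')
    options, args = parser.parse_args(cmd_line_args)
    options.loglevel = LOG_LEVELS[options.loglevel]
    return options, args


options, args = parse_loglevel_args(['-l', 'debug', '-j', 'sample.doc'])
print(options.output_mode, options.loglevel, args)  # json 10 ['sample.doc']
```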
3467   -def main(cmd_line_args=None):
3468   - """
3469   - Main function, called when olevba is run from the command line
3470   -
3471   - Optional argument: command line arguments to be forwarded to OptionParser
3472   - in parse_args. By default (cmd_line_args=None), sys.argv is used. This option
3473   - was mainly added for unit testing.
3474   - """
3475   -
3476   - options, args = parse_args(cmd_line_args)
3477   -
3478   - # provide info about tool and its version
3479   - if options.output_mode == 'json':
3480   - # print first json entry with meta info and opening '['
3481   - print_json(script_name='olevba', version=__version__,
3482   - url='http://decalage.info/python/oletools',
3483   - type='MetaInformation', _json_is_first=True)
3484   - else:
3485   - print('olevba3 %s - http://decalage.info/python/oletools' % __version__)
3486   -
3487   - logging.basicConfig(level=options.loglevel, format='%(levelname)-8s %(message)s')
3488   - # enable logging in the modules:
3489   - enable_logging()
3490   -
3491   - # Old display with number of items detected:
3492   - # print '%-8s %-7s %-7s %-7s %-7s %-7s' % ('Type', 'Macros', 'AutoEx', 'Susp.', 'IOCs', 'HexStr')
3493   - # print '%-8s %-7s %-7s %-7s %-7s %-7s' % ('-'*8, '-'*7, '-'*7, '-'*7, '-'*7, '-'*7)
3494   -
3495   - # with the option --reveal, make sure --deobf is also enabled:
3496   - if options.show_deobfuscated_code and not options.deobfuscate:
3497   - log.info('set --deobf because --reveal was set')
3498   - options.deobfuscate = True
3499   - if options.output_mode == 'triage' and options.show_deobfuscated_code:
3500   - log.info('ignoring option --reveal in triage output mode')
3501   -
3502   - # Column headers (do not know how many files there will be yet, so if no output_mode
3503   - # was specified, we will print triage for first file --> need these headers)
3504   - if options.output_mode in ('triage', 'unspecified'):
3505   - print('%-12s %-65s' % ('Flags', 'Filename'))
3506   - print('%-12s %-65s' % ('-' * 11, '-' * 65))
3507   -
3508   - previous_container = None
3509   - count = 0
3510   - container = filename = data = None
3511   - vba_parser = None
3512   - return_code = RETURN_OK
3513   - try:
3514   - for container, filename, data in xglob.iter_files(args, recursive=options.recursive,
3515   - zip_password=options.zip_password, zip_fname=options.zip_fname):
3516   - # ignore directory names stored in zip files:
3517   - if container and filename.endswith('/'):
3518   - continue
3519   -
3520   - # handle errors from xglob
3521   - if isinstance(data, Exception):
3522   - if isinstance(data, PathNotFoundException):
3523   - if options.output_mode in ('triage', 'unspecified'):
3524   - print('%-12s %s - File not found' % ('?', filename))
3525   - elif options.output_mode != 'json':
3526   - log.error('Given path %r does not exist!' % filename)
3527   - return_code = RETURN_FILE_NOT_FOUND if return_code == 0 \
3528   - else RETURN_SEVERAL_ERRS
3529   - else:
3530   - if options.output_mode in ('triage', 'unspecified'):
3531   - print('%-12s %s - Failed to read from zip file %s' % ('?', filename, container))
3532   - elif options.output_mode != 'json':
3533   - log.error('Exception opening/reading %r from zip file %r: %s'
3534   - % (filename, container, data))
3535   - return_code = RETURN_XGLOB_ERR if return_code == 0 \
3536   - else RETURN_SEVERAL_ERRS
3537   - if options.output_mode == 'json':
3538   - print_json(file=filename, type='error',
3539   - error=type(data).__name__, message=str(data))
3540   - continue
3541   -
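Every error branch in main() updates return_code with the same rule: the first error sets its own code, and any further error (even a repeat of the same kind) switches to RETURN_SEVERAL_ERRS. A minimal sketch of that rule; `update_return_code` is hypothetical and the numeric values are placeholders, since the real constants are defined elsewhere in olevba. Only RETURN_OK = 0 is implied by main(), which tests `return_code == 0`.

```python
# Placeholder values for this sketch; olevba defines its own constants.
RETURN_OK = 0
RETURN_FILE_NOT_FOUND = 3
RETURN_OPEN_ERROR = 5
RETURN_SEVERAL_ERRS = 7


def update_return_code(return_code, error_code):
    """Keep the first error's code; any later error means several errors."""
    return error_code if return_code == RETURN_OK else RETURN_SEVERAL_ERRS


code = RETURN_OK
code = update_return_code(code, RETURN_FILE_NOT_FOUND)
print(code)  # 3
code = update_return_code(code, RETURN_OPEN_ERROR)
print(code)  # 7
```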
3542   - try:
3543   - # Open the file
3544   - vba_parser = VBA_Parser_CLI(filename, data=data, container=container,
3545   - relaxed=options.relaxed)
3546   -
3547   - if options.output_mode == 'detailed':
3548   - # fully detailed output
3549   - vba_parser.process_file(show_decoded_strings=options.show_decoded_strings,
3550   - display_code=options.display_code,
3551   - hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,
3552   - show_deobfuscated_code=options.show_deobfuscated_code,
3553   - deobfuscate=options.deobfuscate)
3554   - elif options.output_mode in ('triage', 'unspecified'):
3555   - # print container name when it changes:
3556   - if container != previous_container:
3557   - if container is not None:
3558   - print('\nFiles in %s:' % container)
3559   - previous_container = container
3560   - # summarized output for triage:
3561   - vba_parser.process_file_triage(show_decoded_strings=options.show_decoded_strings,
3562   - deobfuscate=options.deobfuscate)
3563   - elif options.output_mode == 'json':
3564   - print_json(
3565   - vba_parser.process_file_json(show_decoded_strings=options.show_decoded_strings,
3566   - display_code=options.display_code,
3567   - hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,
3568   - show_deobfuscated_code=options.show_deobfuscated_code,
3569   - deobfuscate=options.deobfuscate))
3570   - else: # (should be impossible)
3571   - raise ValueError('unexpected output mode: "{0}"!'.format(options.output_mode))
3572   - count += 1
3573   -
3574   - except (SubstreamOpenError, UnexpectedDataError) as exc:
3575   - if options.output_mode in ('triage', 'unspecified'):
3576   - print('%-12s %s - Error opening substream or uenxpected ' \
3577   - 'content' % ('?', filename))
3578   - elif options.output_mode == 'json':
3579   - print_json(file=filename, type='error',
3580   - error=type(exc).__name__, message=str(exc))
3581   - else:
3582   - log.exception('Error opening substream or unexpected '
3583   - 'content in %s' % filename)
3584   - return_code = RETURN_OPEN_ERROR if return_code == 0 \
3585   - else RETURN_SEVERAL_ERRS
3586   - except FileOpenError as exc:
3587   - if options.output_mode in ('triage', 'unspecified'):
3588   - print('%-12s %s - File format not supported' % ('?', filename))
3589   - elif options.output_mode == 'json':
3590   - print_json(file=filename, type='error',
3591   - error=type(exc).__name__, message=str(exc))
3592   - else:
3593   - log.exception('Failed to open %s -- probably not supported!' % filename)
3594   - return_code = RETURN_OPEN_ERROR if return_code == 0 \
3595   - else RETURN_SEVERAL_ERRS
3596   - except ProcessingError as exc:
3597   - if options.output_mode in ('triage', 'unspecified'):
3598   - print('%-12s %s - %s' % ('!ERROR', filename, exc.orig_exc))
3599   - elif options.output_mode == 'json':
3600   - print_json(file=filename, type='error',
3601   - error=type(exc).__name__,
3602   - message=str(exc.orig_exc))
3603   - else:
3604   - log.exception('Error processing file %s (%s)!'
3605   - % (filename, exc.orig_exc))
3606   - return_code = RETURN_PARSE_ERROR if return_code == 0 \
3607   - else RETURN_SEVERAL_ERRS
3608   - except FileIsEncryptedError as exc:
3609   - if options.output_mode in ('triage', 'unspecified'):
3610   - print('%-12s %s - File is encrypted' % ('!ERROR', filename))
3611   - elif options.output_mode == 'json':
3612   - print_json(file=filename, type='error',
3613   - error=type(exc).__name__, message=str(exc))
3614   - else:
3615   - log.exception('File %s is encrypted!' % (filename))
3616   - return_code = RETURN_ENCRYPTED if return_code == 0 \
3617   - else RETURN_SEVERAL_ERRS
3618   - # Here we do not close the vba_parser, because process_file may need it below.
3619   -
3620   - finally:
3621   - if vba_parser is not None:
3622   - vba_parser.close()
3623   -
3624   - if options.output_mode == 'triage':
3625   - print('\n(Flags: OpX=OpenXML, XML=Word2003XML, FlX=FlatOPC XML, MHT=MHTML, TXT=Text, M=Macros, ' \
3626   - 'A=Auto-executable, S=Suspicious keywords, I=IOCs, H=Hex strings, ' \
3627   - 'B=Base64 strings, D=Dridex strings, V=VBA strings, ?=Unknown)\n')
3628   -
3629   - if count == 1 and options.output_mode == 'unspecified':
3630   - # if options -t, -d and -j were not specified and it's a single file, print details:
3631   - vba_parser.process_file(show_decoded_strings=options.show_decoded_strings,
3632   - display_code=options.display_code,
3633   - hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,
3634   - show_deobfuscated_code=options.show_deobfuscated_code,
3635   - deobfuscate=options.deobfuscate)
3636   -
3637   - if options.output_mode == 'json':
3638   - # print last json entry (a last one without a comma) and closing ]
3639   - print_json(type='MetaInformation', return_code=return_code,
3640   - n_processed=count, _json_is_last=True)
3641   -
3642   - except Exception as exc:
3643   - # some unexpected error, maybe some of the types caught in except clauses
3644   - # above were not sufficient. This is very bad, so log complete trace at exception level
3645   - # and do not care about output mode
3646   - log.exception('Unhandled exception in main: %s' % exc, exc_info=True)
3647   - return_code = RETURN_UNEXPECTED # even if there were others before -- this is more important
3648   - # TODO: print msg with URL to report issues (except in JSON mode)
3649   -
3650   - # done. exit
3651   - log.debug('will exit now with code %s' % return_code)
3652   - sys.exit(return_code)
  19 +from oletools.olevba import *
  20 +from oletools.olevba import __doc__, __version__
3653 21  
3654 22 if __name__ == '__main__':
3655 23 main()
3656 24  
3657   -# This was coded while listening to "Dust" from I Love You But I've Chosen Darkness
... ...
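The removed `main()` loop in olevba accumulates one exit code across many files. The first error is reported with its own code, any later error collapses to `RETURN_SEVERAL_ERRS`, and an unhandled exception overrides everything with `RETURN_UNEXPECTED`. The following is a minimal standalone sketch of that rule. The numeric constant values are assumed to mirror olevba's `RETURN_*` constants, and `final_return_code` is a hypothetical helper, not part of oletools:

```python
# Assumed to mirror olevba's RETURN_* constants (values for illustration only)
RETURN_OK = 0
RETURN_FILE_NOT_FOUND = 3
RETURN_OPEN_ERROR = 5
RETURN_SEVERAL_ERRS = 7
RETURN_UNEXPECTED = 8


def final_return_code(file_errors, unexpected=False):
    """Combine per-file outcomes the way the olevba main loop does.

    file_errors: iterable of RETURN_* codes, or None for a file that was
    processed without error (hypothetical helper for illustration).
    """
    return_code = RETURN_OK
    for err in file_errors:
        if err is None:
            continue  # file processed fine, nothing to record
        # the first error keeps its own code, any further error means "several"
        return_code = err if return_code == RETURN_OK else RETURN_SEVERAL_ERRS
    if unexpected:
        # an unhandled exception wins over anything recorded before
        return_code = RETURN_UNEXPECTED
    return return_code
```

Note that two errors of the same kind also yield `RETURN_SEVERAL_ERRS`, because the condition only checks whether a code was already set.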
oletools/ooxml.py
... ... @@ -16,11 +16,11 @@ TODO: "xml2003" == "flatopc"?
16 16 """
17 17  
18 18 import sys
19   -from oletools.common.log_helper import log_helper
20 19 from zipfile import ZipFile, BadZipfile, is_zipfile
21 20 from os.path import splitext
22 21 import io
23 22 import re
  23 +from oletools.common.log_helper import log_helper
24 24  
25 25 # import lxml or ElementTree for XML parsing:
26 26 try:
... ... @@ -107,16 +107,14 @@ def debug_str(elem):
107 107 text = u', '.join(parts)
108 108 if len(text) > 150:
109 109 return text[:147] + u'...]'
110   - else:
111   - return text + u']'
  110 + return text + u']'
112 111  
113 112  
114 113 def isstr(some_var):
115 114 """ version-independent test for isinstance(some_var, (str, unicode)) """
116 115 if sys.version_info.major == 2:
117 116 return isinstance(some_var, basestring) # true for str and unicode
118   - else:
119   - return isinstance(some_var, str) # there is no unicode
  117 + return isinstance(some_var, str) # there is no unicode
120 118  
121 119  
122 120 ###############################################################################
... ... @@ -136,23 +134,29 @@ def get_type(filename):
136 134 prog_id = match.groups()[0]
137 135 if prog_id == WORD_XML_PROG_ID:
138 136 return DOCTYPE_WORD_XML
139   - elif prog_id == EXCEL_XML_PROG_ID:
  137 + if prog_id == EXCEL_XML_PROG_ID:
140 138 return DOCTYPE_EXCEL_XML
141   - else:
142   - return DOCTYPE_NONE
  139 + return DOCTYPE_NONE
143 140  
144 141 is_doc = False
145 142 is_xls = False
146 143 is_ppt = False
147   - for _, elem, _ in parser.iter_xml(FILE_CONTENT_TYPES):
148   - logger.debug(u' ' + debug_str(elem))
149   - try:
150   - content_type = elem.attrib['ContentType']
151   - except KeyError: # ContentType not an attr
152   - continue
153   - is_xls |= content_type.startswith(CONTENT_TYPES_EXCEL)
154   - is_doc |= content_type.startswith(CONTENT_TYPES_WORD)
155   - is_ppt |= content_type.startswith(CONTENT_TYPES_PPT)
  144 + try:
  145 + for _, elem, _ in parser.iter_xml(FILE_CONTENT_TYPES):
  146 + logger.debug(u' ' + debug_str(elem))
  147 + try:
  148 + content_type = elem.attrib['ContentType']
  149 + except KeyError: # ContentType not an attr
  150 + continue
  151 + is_xls |= content_type.startswith(CONTENT_TYPES_EXCEL)
  152 + is_doc |= content_type.startswith(CONTENT_TYPES_WORD)
  153 + is_ppt |= content_type.startswith(CONTENT_TYPES_PPT)
  154 + except BadOOXML as oo_err:
  155 + if oo_err.more_info.startswith('invalid subfile') and \
  156 + FILE_CONTENT_TYPES in oo_err.more_info:
  157 + # no FILE_CONTENT_TYPES in zip, so probably no ms office xml.
  158 + return DOCTYPE_NONE
  159 + raise
156 160  
157 161 if is_doc and not is_xls and not is_ppt:
158 162 return DOCTYPE_WORD
... ... @@ -162,9 +166,8 @@ def get_type(filename):
162 166 return DOCTYPE_POWERPOINT
163 167 if not is_doc and not is_xls and not is_ppt:
164 168 return DOCTYPE_NONE
165   - else:
166   - logger.warning('Encountered contradictory content types')
167   - return DOCTYPE_MIXED
  169 + logger.warning('Encountered contradictory content types')
  170 + return DOCTYPE_MIXED
168 171  
169 172  
170 173 def is_ooxml(filename):
... ... @@ -177,6 +180,7 @@ def is_ooxml(filename):
177 180 return False
178 181 if doctype == DOCTYPE_NONE:
179 182 return False
  183 + return True
180 184  
181 185  
182 186 ###############################################################################
... ... @@ -216,6 +220,7 @@ class ZipSubFile(object):
216 220 See also (and maybe could some day merge with):
217 221 ppt_record_parser.IterStream; also: oleobj.FakeFile
218 222 """
  223 + CHUNK_SIZE = 4096
219 224  
220 225 def __init__(self, container, filename, mode='r', size=None):
221 226 """ remember all necessary vars but do not open yet """
... ... @@ -253,7 +258,7 @@ class ZipSubFile(object):
253 258 # print('ZipSubFile: opened; size={}'.format(self.size))
254 259 return self
255 260  
256   - def write(self, *args, **kwargs): # pylint: disable=unused-argument,no-self-use
  261 + def write(self, *args, **kwargs):
257 262 """ write is not allowed """
258 263 raise IOError('writing not implemented')
259 264  
... ... @@ -311,10 +316,9 @@ class ZipSubFile(object):
311 316 """ helper for seek: skip forward by given amount using read() """
312 317 # print('ZipSubFile: seek by skipping {} bytes starting at {}'
313 318 # .format(self.pos, to_skip))
314   - CHUNK_SIZE = 4096
315   - n_chunks, leftover = divmod(to_skip, CHUNK_SIZE)
  319 + n_chunks, leftover = divmod(to_skip, self.CHUNK_SIZE)
316 320 for _ in range(n_chunks):
317   - self.read(CHUNK_SIZE) # just read and discard
  321 + self.read(self.CHUNK_SIZE) # just read and discard
318 322 self.read(leftover)
319 323 # print('ZipSubFile: seek by skipping done, pos now {}'
320 324 # .format(self.pos))
... ... @@ -417,8 +421,7 @@ class XmlParser(object):
417 421 if match:
418 422 self._is_single_xml = True
419 423 return True
420   - if not match:
421   - raise BadOOXML(self.filename, 'is no zip and has no prog_id')
  424 + raise BadOOXML(self.filename, 'is no zip and has no prog_id')
422 425  
423 426 def iter_files(self, args=None):
424 427 """ Find files in zip or just give single xml file """
... ... @@ -433,17 +436,14 @@ class XmlParser(object):
433 436 subfiles = None
434 437 try:
435 438 zipper = ZipFile(self.filename)
436   - try:
437   - _ = zipper.getinfo(FILE_CONTENT_TYPES)
438   - except KeyError:
439   - raise BadOOXML(self.filename,
440   - 'No content type information')
441 439 if not args:
442 440 subfiles = zipper.namelist()
443 441 elif isstr(args):
444 442 subfiles = [args, ]
445 443 else:
446   - subfiles = tuple(args) # make a copy in case orig changes
  444 + # make a copy in case original args are modified
  445 + # Not sure whether this really is needed...
  446 + subfiles = tuple(arg for arg in args)
447 447  
448 448 for subfile in subfiles:
449 449 with zipper.open(subfile, 'r') as handle:
... ... @@ -451,10 +451,12 @@ class XmlParser(object):
451 451 if not args:
452 452 self.did_iter_all = True
453 453 except KeyError as orig_err:
  454 + # Note: do not change text of this message without adjusting
  455 + # conditions in except handlers
454 456 raise BadOOXML(self.filename,
455 457 'invalid subfile: ' + str(orig_err))
456 458 except BadZipfile:
457   - raise BadOOXML(self.filename, 'neither zip nor xml')
  459 + raise BadOOXML(self.filename, 'not in zip format')
458 460 finally:
459 461 if zipper:
460 462 zipper.close()
... ... @@ -503,7 +505,7 @@ class XmlParser(object):
503 505 if event == 'start':
504 506 if elem.tag in want_tags:
505 507 logger.debug('remember start of tag {0} at {1}'
506   - .format(elem.tag, depth))
  508 + .format(elem.tag, depth))
507 509 inside_tags.append((elem.tag, depth))
508 510 depth += 1
509 511 continue
... ... @@ -519,18 +521,18 @@ class XmlParser(object):
519 521 inside_tags.pop()
520 522 else:
521 523 logger.error('found end for wanted tag {0} '
522   - 'but last start tag {1} does not'
523   - ' match'.format(curr_tag,
524   - inside_tags[-1]))
  524 + 'but last start tag {1} does not'
  525 + ' match'.format(curr_tag,
  526 + inside_tags[-1]))
525 527 # try to recover: close all deeper tags
526 528 while inside_tags and \
527 529 inside_tags[-1][1] >= depth:
528 530 logger.debug('recover: pop {0}'
529   - .format(inside_tags[-1]))
  531 + .format(inside_tags[-1]))
530 532 inside_tags.pop()
531 533 except IndexError: # no inside_tag[-1]
532 534 logger.error('found end of {0} at depth {1} but '
533   - 'no start event')
  535 + 'no start event')
534 536 # yield element
535 537 if is_wanted or not want_tags:
536 538 yield subfile, elem, depth
... ... @@ -544,7 +546,7 @@ class XmlParser(object):
544 546 except ET.ParseError as err:
545 547 self.subfiles_no_xml.add(subfile)
546 548 if subfile is None: # this is no zip subfile but single xml
547   - raise BadOOXML(self.filename, 'is neither zip nor xml')
  549 + raise BadOOXML(self.filename, 'content is not valid XML')
548 550 elif subfile.endswith('.xml'):
549 551 log = logger.warning
550 552 else:
... ... @@ -568,21 +570,30 @@ class XmlParser(object):
568 570  
569 571 defaults = []
570 572 files = []
571   - for _, elem, _ in self.iter_xml(FILE_CONTENT_TYPES):
572   - if elem.tag.endswith('Default'):
573   - extension = elem.attrib['Extension']
574   - if extension.startswith('.'):
575   - extension = extension[1:]
576   - defaults.append((extension, elem.attrib['ContentType']))
577   - logger.debug('found content type for extension {0[0]}: {0[1]}'
578   - .format(defaults[-1]))
579   - elif elem.tag.endswith('Override'):
580   - subfile = elem.attrib['PartName']
581   - if subfile.startswith('/'):
582   - subfile = subfile[1:]
583   - files.append((subfile, elem.attrib['ContentType']))
584   - logger.debug('found content type for subfile {0[0]}: {0[1]}'
585   - .format(files[-1]))
  573 + try:
  574 + for _, elem, _ in self.iter_xml(FILE_CONTENT_TYPES):
  575 + if elem.tag.endswith('Default'):
  576 + extension = elem.attrib['Extension']
  577 + if extension.startswith('.'):
  578 + extension = extension[1:]
  579 + defaults.append((extension, elem.attrib['ContentType']))
  580 + logger.debug('found content type for extension {0[0]}: '
  581 + '{0[1]}'.format(defaults[-1]))
  582 + elif elem.tag.endswith('Override'):
  583 + subfile = elem.attrib['PartName']
  584 + if subfile.startswith('/'):
  585 + subfile = subfile[1:]
  586 + files.append((subfile, elem.attrib['ContentType']))
  587 + logger.debug('found content type for subfile {0[0]}: '
  588 + '{0[1]}'.format(files[-1]))
  589 + except BadOOXML as oo_err:
  590 + if oo_err.more_info.startswith('invalid subfile') and \
  591 + FILE_CONTENT_TYPES in oo_err.more_info:
  592 + # no FILE_CONTENT_TYPES in zip, so probably no ms office xml.
  593 + # Maybe OpenDocument format? In any case, try to analyze.
  594 + pass
  595 + else:
  596 + raise
586 597 return dict(files), dict(defaults)
587 598  
588 599 def iter_non_xml(self):
... ... @@ -599,7 +610,7 @@ class XmlParser(object):
599 610 """
600 611 if not self.did_iter_all:
601 612 logger.warning('Did not iterate through complete file. '
602   - 'Should run iter_xml() without args, first.')
  613 + 'Should run iter_xml() without args, first.')
603 614 if not self.subfiles_no_xml:
604 615 return
605 616  
... ... @@ -631,7 +642,7 @@ def test():
631 642  
632 643 see module doc for more info
633 644 """
634   - log_helper.enable_logging(False, logger.DEBUG)
  645 + log_helper.enable_logging(False, 'debug')
635 646 if len(sys.argv) != 2:
636 647 print(u'To test this code, give me a single file as arg')
637 648 return 2
... ...
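With this change, a zip without `[Content_Types].xml` is no longer rejected up front. `get_type` now returns `DOCTYPE_NONE`, and `get_content_types_for_subfiles` returns empty dicts so that analysis can continue, for example on OpenDocument files. The sketch below reproduces that classification idea with only the standard library. The content-type prefixes are assumptions chosen for illustration, because the real `CONTENT_TYPES_WORD`/`_EXCEL`/`_PPT` tuples are not shown in this diff, and `guess_ooxml_type` is a hypothetical name:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

FILE_CONTENT_TYPES = '[Content_Types].xml'

# Assumed prefixes for illustration; ooxml.py defines its own tuples
PREFIXES = {
    'word': 'application/vnd.openxmlformats-officedocument.wordprocessingml.',
    'excel': 'application/vnd.openxmlformats-officedocument.spreadsheetml.',
    'powerpoint': 'application/vnd.openxmlformats-officedocument.presentationml.',
}


def guess_ooxml_type(zip_source):
    """Return 'word', 'excel', 'powerpoint', 'mixed' or 'none'."""
    with zipfile.ZipFile(zip_source) as zipper:
        if FILE_CONTENT_TYPES not in zipper.namelist():
            # no content types: probably not MS Office OOXML (maybe ODF)
            return 'none'
        root = ET.fromstring(zipper.read(FILE_CONTENT_TYPES))
    found = set()
    for elem in root.iter():
        content_type = elem.attrib.get('ContentType')
        if content_type is None:
            continue  # element carries no ContentType attribute
        for kind, prefix in PREFIXES.items():
            if content_type.startswith(prefix):
                found.add(kind)
    if not found:
        return 'none'
    if len(found) > 1:
        return 'mixed'  # contradictory content types
    return found.pop()
```

Unlike `get_type`, this sketch does not handle the single-file Word/Excel 2003 XML case.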
oletools/ppt_parser.py
... ... @@ -43,7 +43,7 @@ file structure and will replace this module some time soon!
43 43 # 2017-04-23 v0.51 PL: - fixed absolute imports and issue #101
44 44 # 2018-09-11 v0.54 PL: - olefile is now a dependency
45 45  
46   -__version__ = '0.54dev1'
  46 +__version__ = '0.54'
47 47  
48 48  
49 49 # --- IMPORTS ------------------------------------------------------------------
... ...
oletools/ppt_record_parser.py
... ... @@ -63,7 +63,6 @@ except ImportError:
63 63 sys.path.insert(0, PARENT_DIR)
64 64 del PARENT_DIR
65 65 from oletools import record_base
66   -from oletools.common.errors import FileIsEncryptedError
67 66  
68 67  
69 68 # types of relevant records (there are much more than listed here)
... ... @@ -109,10 +108,11 @@ RECORD_TYPES = dict([
109 108 ])
110 109  
111 110  
112   -# record types where version is not 0x0 or 0xf
  111 +# record types where version is not 0x0 or 0x1 or 0xf
113 112 VERSION_EXCEPTIONS = dict([
114 113 (0x0400, 2), # rt_vbainfoatom
115 114 (0x03ef, 2), # rt_slideatom
  115 + (0xe9c7, 7), # tests/test-data/encrypted/encrypted.ppt, not investigated
116 116 ])
117 117  
118 118  
... ... @@ -149,6 +149,10 @@ def is_ppt(filename):
149 149 Param filename can be anything that OleFileIO constructor accepts: name of
150 150 file or file data or data stream.
151 151  
  152 + Will not try to decrypt the file not even try to determine whether it is
  153 + encrypted. If the file is encrypted will either raise an error or just
  154 + return `False`.
  155 +
152 156 see also: oleid.OleID.check_powerpoint
153 157 """
154 158 have_current_user = False
... ... @@ -170,7 +174,7 @@ def is_ppt(filename):
170 174 for record in stream.iter_records():
171 175 if record.type == 0x0ff5: # UserEditAtom
172 176 have_user_edit = True
173   - elif record.type == 0x1772: # PersisDirectoryAtom
  177 + elif record.type == 0x1772: # PersistDirectoryAtom
174 178 have_persist_dir = True
175 179 elif record.type == 0x03e8: # DocumentContainer
176 180 have_document_container = True
... ... @@ -181,13 +185,12 @@ def is_ppt(filename):
181 185 return True
182 186 else: # ignore other streams/storages since they are optional
183 187 continue
184   - except FileIsEncryptedError:
185   - assert ppt_file is not None, \
186   - 'Encryption error should not be raised from just opening OLE file.'
187   - # just rely on stream names, copied from oleid
188   - return ppt_file.exists('PowerPoint Document')
189   - except Exception:
190   - pass
  188 + except Exception as exc:
  189 + logging.debug('Ignoring exception in is_ppt, assume is not ppt',
  190 + exc_info=True)
  191 + finally:
  192 + if ppt_file is not None:
  193 + ppt_file.close()
191 194 return False
192 195  
193 196  
... ...
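The updated comment says record versions are normally 0x0, 0x1 or 0xf, and `VERSION_EXCEPTIONS` lists the known record types that use a different version. Below is a hedged sketch of how such a table might be applied. It assumes the 8-byte PowerPoint record header layout from the MS-PPT specification: a little-endian 16-bit field holding a 4-bit version and 12-bit instance, then a 16-bit type and a 32-bit length. `parse_record_head` and `version_is_plausible` are hypothetical helpers, not the functions in `ppt_record_parser.py`:

```python
import struct

# copied from the diff: record type -> expected (non-standard) version
VERSION_EXCEPTIONS = {0x0400: 2, 0x03ef: 2, 0xe9c7: 7}
REGULAR_VERSIONS = (0x0, 0x1, 0xf)


def parse_record_head(head):
    """Split an 8-byte record header into (version, instance, type, size).

    Assumed layout: uint16 (low 4 bits version, high 12 bits instance),
    uint16 record type, uint32 record length, all little-endian.
    """
    ver_inst, rec_type, rec_size = struct.unpack('<HHL', head)
    version = ver_inst & 0x000F
    instance = ver_inst >> 4
    return version, instance, rec_type, rec_size


def version_is_plausible(version, rec_type):
    """Check a record version against the regular values or the exceptions."""
    if rec_type in VERSION_EXCEPTIONS:
        return version == VERSION_EXCEPTIONS[rec_type]
    return version in REGULAR_VERSIONS
```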
oletools/pyxswf.py
... ... @@ -25,7 +25,7 @@ http://www.decalage.info/python/oletools
25 25  
26 26 #=== LICENSE =================================================================
27 27  
28   -# pyxswf is copyright (c) 2012-2016, Philippe Lagadec (http://www.decalage.info)
  28 +# pyxswf is copyright (c) 2012-2019, Philippe Lagadec (http://www.decalage.info)
29 29 # All rights reserved.
30 30 #
31 31 # Redistribution and use in source and binary forms, with or without modification,
... ... @@ -59,7 +59,7 @@ http://www.decalage.info/python/oletools
59 59 # 2016-11-01 PL: - replaced StringIO by BytesIO for Python 3
60 60 # 2018-09-11 v0.54 PL: - olefile is now a dependency
61 61  
62   -__version__ = '0.54dev1'
  62 +__version__ = '0.54'
63 63  
64 64 #------------------------------------------------------------------------------
65 65 # TODO:
... ...
oletools/record_base.py
... ... @@ -8,7 +8,10 @@ This is the case for xls and ppt, so classes are bases for xls_parser.py and
8 8 ppt_record_parser.py .
9 9 """
10 10  
11   -# === LICENSE =================================================================
  11 +# === LICENSE ==================================================================
  12 +
  13 +# record_base is copyright (c) 2014-2019 Philippe Lagadec (http://www.decalage.info)
  14 +# All rights reserved.
12 15 #
13 16 # Redistribution and use in source and binary forms, with or without
14 17 # modification, are permitted provided that the following conditions are met:
... ... @@ -37,8 +40,10 @@ from __future__ import print_function
37 40 # CHANGELOG:
38 41 # 2017-11-30 v0.01 CH: - first version based on xls_parser
39 42 # 2018-09-11 v0.54 PL: - olefile is now a dependency
  43 +# 2019-01-30 PL: - fixed import to avoid mixing installed oletools
  44 +# and dev version
40 45  
41   -__version__ = '0.54dev1'
  46 +__version__ = '0.54'
42 47  
43 48 # -----------------------------------------------------------------------------
44 49 # TODO:
... ... @@ -63,16 +68,12 @@ import logging
63 68  
64 69 import olefile
65 70  
66   -try:
67   - from oletools.common.errors import FileIsEncryptedError
68   -except ImportError:
69   - # little hack to allow absolute imports even if oletools is not installed.
70   - PARENT_DIR = os.path.normpath(os.path.dirname(os.path.dirname(
71   - os.path.abspath(__file__))))
72   - if PARENT_DIR not in sys.path:
73   - sys.path.insert(0, PARENT_DIR)
74   - del PARENT_DIR
75   - from oletools.common.errors import FileIsEncryptedError
  71 +# little hack to allow absolute imports even if oletools is not installed.
  72 +PARENT_DIR = os.path.normpath(os.path.dirname(os.path.dirname(
  73 + os.path.abspath(__file__))))
  74 +if PARENT_DIR not in sys.path:
  75 + sys.path.insert(0, PARENT_DIR)
  76 +del PARENT_DIR
76 77 from oletools import oleid
77 78  
78 79  
... ... @@ -125,10 +126,9 @@ class OleRecordFile(olefile.OleFileIO):
125 126 """
126 127  
127 128 def open(self, filename, *args, **kwargs):
128   - """Call OleFileIO.open, raise error if is encrypted."""
  129 + """Call OleFileIO.open."""
129 130 #super(OleRecordFile, self).open(filename, *args, **kwargs)
130 131 OleFileIO.open(self, filename, *args, **kwargs)
131   - self.is_encrypted = oleid.OleID(self).check_encrypted().value
132 132  
133 133 @classmethod
134 134 def stream_class_for_name(cls, stream_name):
... ... @@ -161,8 +161,7 @@ class OleRecordFile(olefile.OleFileIO):
161 161 stream = clz(self._open(direntry.isectStart, direntry.size),
162 162 direntry.size,
163 163 None if is_orphan else direntry.name,
164   - direntry.entry_type,
165   - self.is_encrypted)
  164 + direntry.entry_type)
166 165 yield stream
167 166 stream.close()
168 167  
... ... @@ -175,14 +174,13 @@ class OleRecordStream(object):
175 174 abstract base class
176 175 """
177 176  
178   - def __init__(self, stream, size, name, stream_type, is_encrypted=False):
  177 + def __init__(self, stream, size, name, stream_type):
179 178 self.stream = stream
180 179 self.size = size
181 180 self.name = name
182 181 if stream_type not in ENTRY_TYPE2STR:
183 182 raise ValueError('Unknown stream type: {0}'.format(stream_type))
184 183 self.stream_type = stream_type
185   - self.is_encrypted = is_encrypted
186 184  
187 185 def read_record_head(self):
188 186 """ read first few bytes of record to determine size and type
... ... @@ -211,9 +209,6 @@ class OleRecordStream(object):
211 209  
212 210 Stream must be positioned at start of records (e.g. start of stream).
213 211 """
214   - if self.is_encrypted:
215   - raise FileIsEncryptedError()
216   -
217 212 while True:
218 213 # unpacking as in olevba._extract_vba
219 214 pos = self.stream.tell()
... ...
oletools/rtfobj.py
... ... @@ -17,7 +17,7 @@ http://www.decalage.info/python/oletools
17 17  
18 18 #=== LICENSE =================================================================
19 19  
20   -# rtfobj is copyright (c) 2012-2018, Philippe Lagadec (http://www.decalage.info)
  20 +# rtfobj is copyright (c) 2012-2019, Philippe Lagadec (http://www.decalage.info)
21 21 # All rights reserved.
22 22 #
23 23 # Redistribution and use in source and binary forms, with or without modification,
... ... @@ -88,8 +88,10 @@ http://www.decalage.info/python/oletools
88 88 # 2018-05-31 v0.53.1 PP: - fixed issue #316: whitespace after \bin on Python 3
89 89 # 2018-06-22 v0.53.2 PL: - fixed issue #327: added "\pnaiu" & "\pnaiud"
90 90 # 2018-09-11 v0.54 PL: - olefile is now a dependency
  91 +# 2019-07-08 v0.55 MM: - added URL carver for CVE-2017-0199 (Equation Editor) PR #460
  92 +# - added SCT to the list of executable file extensions PR #461
91 93  
92   -__version__ = '0.54dev1'
  94 +__version__ = '0.55.dev3'
93 95  
94 96 # ------------------------------------------------------------------------------
95 97 # TODO:
... ... @@ -103,7 +105,7 @@ __version__ = '0.54dev1'
103 105  
104 106 # === IMPORTS =================================================================
105 107  
106   -import re, os, sys, binascii, logging, optparse
  108 +import re, os, sys, binascii, logging, optparse, hashlib
107 109 import os.path
108 110 from time import time
109 111  
... ... @@ -268,7 +270,7 @@ re_delim_hexblock = re.compile(DELIMITER + PATTERN)
268 270  
269 271 # TODO: use a frozenset instead of a regex?
270 272 re_executable_extensions = re.compile(
271   - r"(?i)\.(EXE|COM|PIF|GADGET|MSI|MSP|MSC|VBS|VBE|VB|JSE|JS|WSF|WSC|WSH|WS|BAT|CMD|DLL|SCR|HTA|CPL|CLASS|JAR|PS1XML|PS1|PS2XML|PS2|PSC1|PSC2|SCF|LNK|INF|REG)\b")
  273 + r"(?i)\.(BAT|CLASS|CMD|CPL|DLL|EXECOM|GADGET|HTA|INF|JAR|JS|JSE|LNK|MSC|MSI|MSP|PIF|PS1|PS1XML|PS2|PS2XML|PSC1|PSC2|REG|SCF|SCR|SCT|VB|VBE|VBS|WS|WSC|WSF|WSH)\b")
272 274  
273 275 # Destination Control Words, according to MS RTF Specifications v1.9.1:
274 276 DESTINATION_CONTROL_WORDS = frozenset((
... ... @@ -678,6 +680,7 @@ class RtfObjParser(RtfParser):
678 680 rtfobj.hexdata = hexdata
679 681 object_data = binascii.unhexlify(hexdata)
680 682 rtfobj.rawdata = object_data
  683 + rtfobj.rawdata_md5 = hashlib.md5(object_data).hexdigest()
681 684 # TODO: check if all hex data is extracted properly
682 685  
683 686 obj = oleobj.OleObject()
... ... @@ -687,6 +690,7 @@ class RtfObjParser(RtfParser):
687 690 rtfobj.class_name = obj.class_name
688 691 rtfobj.oledata_size = obj.data_size
689 692 rtfobj.oledata = obj.data
  693 + rtfobj.oledata_md5 = hashlib.md5(obj.data).hexdigest()
690 694 rtfobj.is_ole = True
691 695 if obj.class_name.lower() == b'package':
692 696 opkg = oleobj.OleNativeStream(bindata=obj.data,
... ... @@ -695,6 +699,7 @@ class RtfObjParser(RtfParser):
695 699 rtfobj.src_path = opkg.src_path
696 700 rtfobj.temp_path = opkg.temp_path
697 701 rtfobj.olepkgdata = opkg.data
  702 + rtfobj.olepkgdata_md5 = hashlib.md5(opkg.data).hexdigest()
698 703 rtfobj.is_package = True
699 704 else:
700 705 if olefile.isOleFile(obj.data):
... ... @@ -878,15 +883,23 @@ def process_file(container, filename, data, output_dir=None, save_object=False):
878 883 ole_column += '\nFilename: %r' % rtfobj.filename
879 884 ole_column += '\nSource path: %r' % rtfobj.src_path
880 885 ole_column += '\nTemp path = %r' % rtfobj.temp_path
  886 + ole_column += '\nMD5 = %r' % rtfobj.olepkgdata_md5
881 887 ole_color = 'yellow'
882 888 # check if the file extension is executable:
883   - _, ext = os.path.splitext(rtfobj.filename)
884   - log.debug('File extension: %r' % ext)
885   - if re_executable_extensions.match(ext):
  889 +
  890 + _, temp_ext = os.path.splitext(rtfobj.temp_path)
  891 + log.debug('Temp path extension: %r' % temp_ext)
  892 + _, file_ext = os.path.splitext(rtfobj.filename)
  893 + log.debug('File extension: %r' % file_ext)
  894 +
  895 + if temp_ext != file_ext:
  896 + ole_column += "\nMODIFIED FILE EXTENSION"
  897 +
  898 + if re_executable_extensions.match(temp_ext) or re_executable_extensions.match(file_ext):
886 899 ole_color = 'red'
887 900 ole_column += '\nEXECUTABLE FILE'
888   - # else:
889   - # pkg_column = 'Not an OLE Package'
  901 + else:
  902 + ole_column += '\nMD5 = %r' % rtfobj.oledata_md5
890 903 if rtfobj.clsid is not None:
891 904 ole_column += '\nCLSID: %s' % rtfobj.clsid
892 905 ole_column += '\n%s' % rtfobj.clsid_desc
... ... @@ -896,7 +909,28 @@ def process_file(container, filename, data, output_dir=None, save_object=False):
896 909 # http://www.kb.cert.org/vuls/id/921560
897 910 if rtfobj.class_name == b'OLE2Link':
898 911 ole_color = 'red'
899   - ole_column += '\nPossibly an exploit for the OLE2Link vulnerability (VU#921560, CVE-2017-0199)'
  912 + ole_column += '\nPossibly an exploit for the OLE2Link vulnerability (VU#921560, CVE-2017-0199)\n'
  913 + # https://bitbucket.org/snippets/Alexander_Hanel/7Adpp
  914 + found_list = re.findall(r'[a-fA-F0-9\x0D\x0A]{128,}',data)
  915 + urls = []
  916 + for item in found_list:
  917 + try:
  918 + temp = item.replace("\x0D\x0A","").decode("hex")
  919 + except:
  920 + continue
  921 + pat = re.compile(r'(?:[\x20-\x7E][\x00]){3,}')
  922 + words = [w.decode('utf-16le') for w in pat.findall(temp)]
  923 + for w in words:
  924 + if "http" in w:
  925 + urls.append(w)
  926 + urls = sorted(set(urls))
  927 + if urls:
  928 + ole_column += 'URL extracted: ' + ', '.join(urls)
  929 + # Detect Equation Editor exploit
  930 + # https://www.kb.cert.org/vuls/id/421280/
  931 + elif rtfobj.class_name.lower() == b'equation.3':
  932 + ole_color = 'red'
  933 + ole_column += '\nPossibly an exploit for the Equation Editor vulnerability (VU#421280, CVE-2017-11882)'
900 934 else:
901 935 ole_column = 'Not a well-formed OLE object'
902 936 tstream.write_row((
... ... @@ -930,6 +964,7 @@ def process_file(container, filename, data, output_dir=None, save_object=False):
930 964 else:
931 965 fname = '%s_object_%08X.noname' % (fname_prefix, rtfobj.start)
932 966 print(' saving to file %s' % fname)
  967 + print(' md5 %s' % rtfobj.olepkgdata_md5)
933 968 open(fname, 'wb').write(rtfobj.olepkgdata)
934 969 # When format_id=TYPE_LINKED, oledata_size=None
935 970 elif rtfobj.is_ole and rtfobj.oledata_size is not None:
... ... @@ -947,11 +982,13 @@ def process_file(container, filename, data, output_dir=None, save_object=False):
947 982 ext = 'bin'
948 983 fname = '%s_object_%08X.%s' % (fname_prefix, rtfobj.start, ext)
949 984 print(' saving to file %s' % fname)
  985 + print(' md5 %s' % rtfobj.oledata_md5)
950 986 open(fname, 'wb').write(rtfobj.oledata)
951 987 else:
952 988 print('Saving raw data in object #%d:' % i)
953 989 fname = '%s_object_%08X.raw' % (fname_prefix, rtfobj.start)
954 990 print(' saving object to file %s' % fname)
  991 + print(' md5 %s' % rtfobj.rawdata_md5)
955 992 open(fname, 'wb').write(rtfobj.rawdata)
956 993  
957 994  
... ... @@ -1035,4 +1072,3 @@ if __name__ == '__main__':
1035 1072 main()
1036 1073  
1037 1074 # This code was developed while listening to The Mary Onettes "Lost"
1038   -
... ...
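The URL carver added for OLE2Link objects searches for long hex runs, decodes them, and keeps UTF-16LE strings that contain "http". As committed, it relies on `str.decode("hex")`, which exists only in Python 2. Below is a minimal Python 3 sketch of the same idea using `binascii.unhexlify` on bytes. It is an adaptation for illustration, not the committed code, and `carve_urls` is a hypothetical name. One deliberate difference: it strips every CR and LF byte individually instead of only `\r\n` pairs.

```python
import binascii
import re

# hex digits possibly interrupted by line breaks, at least 128 characters long
HEX_RUN = re.compile(rb'[a-fA-F0-9\r\n]{128,}')
# at least 3 printable ASCII characters encoded as UTF-16LE
UTF16_WORD = re.compile(rb'(?:[\x20-\x7E]\x00){3,}')


def carve_urls(data):
    """Return a sorted list of UTF-16LE strings containing 'http' (Python 3 sketch)."""
    urls = set()
    for run in HEX_RUN.findall(data):
        hex_str = run.replace(b'\r', b'').replace(b'\n', b'')
        try:
            raw = binascii.unhexlify(hex_str)
        except binascii.Error:
            continue  # odd number of hex digits: not decodable, skip
        for word in UTF16_WORD.findall(raw):
            text = word.decode('utf-16le')
            if 'http' in text:
                urls.add(text)
    return sorted(urls)
```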
oletools/thirdparty/oledump/__init__.py 0 → 100644
oletools/thirdparty/oledump/plugin_biff.py 0 → 100644
  1 +#!/usr/bin/env python
  2 +
  3 +__description__ = 'BIFF plugin for oledump.py'
  4 +__author__ = 'Didier Stevens'
  5 +__version__ = '0.0.5'
  6 +__date__ = '2019/03/06'
  7 +
  8 +# Slightly modified version by Philippe Lagadec to be imported into olevba
  9 +
  10 +"""
  11 +
  12 +Source code put in public domain by Didier Stevens, no Copyright
  13 +https://DidierStevens.com
  14 +Use at your own risk
  15 +
  16 +History:
  17 + 2014/11/15: start
  18 + 2014/11/21: changed interface: added options; added options -a (asciidump) and -s (strings)
  19 + 2017/12/10: 0.0.2 added optparse & option -o
  20 + 2017/12/12: added option -f
  21 + 2017/12/13: added 0x support for option -f
  22 + 2018/10/24: 0.0.3 started coding Excel 4.0 macro support
  23 + 2018/10/25: continue
  24 + 2018/10/26: continue
  25 + 2019/01/05: 0.0.4 added option -x
  26 + 2019/03/06: 0.0.5 enhanced parsing of formula expressions
  27 +
  28 +Todo:
  29 +"""
  30 +
  31 +import struct
  32 +import re
  33 +import optparse
  34 +import binascii
  35 +import sys
  36 +
  37 +# from olevba:
  38 +
  39 +if sys.version_info[0] <= 2:
  40 + # Python 2.x
  41 + PYTHON2 = True
  42 +else:
  43 + # Python 3.x+
  44 + PYTHON2 = False
  45 +
  46 +def unicode2str(unicode_string):
  47 + """
  48 + convert a unicode string to a native str:
  49 + - on Python 3, it returns the same string
  50 + - on Python 2, the string is encoded with UTF-8 to a bytes str
  51 + :param unicode_string: unicode string to be converted
  52 + :return: the string converted to str
  53 + :rtype: str
  54 + """
  55 + if PYTHON2:
  56 + return unicode_string.encode('utf8', errors='replace')
  57 + else:
  58 + return unicode_string
  59 +
  60 +
  61 +def bytes2str(bytes_string, encoding='utf8'):
  62 + """
  63 + convert a bytes string to a native str:
  64 + - on Python 2, it returns the same string (bytes=str)
  65 + - on Python 3, the string is decoded using the provided encoding
  66 + (UTF-8 by default) to a unicode str
  67 + :param bytes_string: bytes string to be converted
  68 + :param encoding: codec to be used for decoding
  69 + :return: the string converted to str
  70 + :rtype: str
  71 + """
  72 + if PYTHON2:
  73 + return bytes_string
  74 + else:
  75 + return bytes_string.decode(encoding, errors='replace')
  76 +
  77 +
  78 +dTokens = {
  79 +0x01: 'ptgExp',
  80 +0x02: 'ptgTbl',
  81 +0x03: 'ptgAdd',
  82 +0x04: 'ptgSub',
  83 +0x05: 'ptgMul',
  84 +0x06: 'ptgDiv',
  85 +0x07: 'ptgPower',
  86 +0x08: 'ptgConcat',
  87 +0x09: 'ptgLT',
  88 +0x0A: 'ptgLE',
  89 +0x0B: 'ptgEQ',
  90 +0x0C: 'ptgGE',
  91 +0x0D: 'ptgGT',
  92 +0x0E: 'ptgNE',
  93 +0x0F: 'ptgIsect',
  94 +0x10: 'ptgUnion',
  95 +0x11: 'ptgRange',
  96 +0x12: 'ptgUplus',
  97 +0x13: 'ptgUminus',
  98 +0x14: 'ptgPercent',
  99 +0x15: 'ptgParen',
  100 +0x16: 'ptgMissArg',
  101 +0x17: 'ptgStr',
  102 +0x19: 'ptgAttr',
  103 +0x1A: 'ptgSheet',
  104 +0x1B: 'ptgEndSheet',
  105 +0x1C: 'ptgErr',
  106 +0x1D: 'ptgBool',
  107 +0x1E: 'ptgInt',
  108 +0x1F: 'ptgNum',
  109 +0x20: 'ptgArray',
  110 +0x21: 'ptgFunc',
  111 +0x22: 'ptgFuncVar',
  112 +0x23: 'ptgName',
  113 +0x24: 'ptgRef',
  114 +0x25: 'ptgArea',
  115 +0x26: 'ptgMemArea',
  116 +0x27: 'ptgMemErr',
  117 +0x28: 'ptgMemNoMem',
  118 +0x29: 'ptgMemFunc',
  119 +0x2A: 'ptgRefErr',
  120 +0x2B: 'ptgAreaErr',
  121 +0x2C: 'ptgRefN',
  122 +0x2D: 'ptgAreaN',
  123 +0x2E: 'ptgMemAreaN',
  124 +0x2F: 'ptgMemNoMemN',
  125 +0x39: 'ptgNameX',
  126 +0x3A: 'ptgRef3d',
  127 +0x3B: 'ptgArea3d',
  128 +0x3C: 'ptgRefErr3d',
  129 +0x3D: 'ptgAreaErr3d',
  130 +0x40: 'ptgArrayV',
  131 +0x41: 'ptgFuncV',
  132 +0x42: 'ptgFuncVarV',
  133 +0x43: 'ptgNameV',
  134 +0x44: 'ptgRefV',
  135 +0x45: 'ptgAreaV',
  136 +0x46: 'ptgMemAreaV',
  137 +0x47: 'ptgMemErrV',
  138 +0x48: 'ptgMemNoMemV',
  139 +0x49: 'ptgMemFuncV',
  140 +0x4A: 'ptgRefErrV',
  141 +0x4B: 'ptgAreaErrV',
  142 +0x4C: 'ptgRefNV',
  143 +0x4D: 'ptgAreaNV',
  144 +0x4E: 'ptgMemAreaNV',
  145 +0x4F: 'ptgMemNoMemNV',
  146 +0x58: 'ptgFuncCEV',
  147 +0x59: 'ptgNameXV',
  148 +0x5A: 'ptgRef3dV',
  149 +0x5B: 'ptgArea3dV',
  150 +0x5C: 'ptgRefErr3dV',
  151 +0x5D: 'ptgAreaErr3dV',
  152 +0x60: 'ptgArrayA',
  153 +0x61: 'ptgFuncA',
  154 +0x62: 'ptgFuncVarA',
  155 +0x63: 'ptgNameA',
  156 +0x64: 'ptgRefA',
  157 +0x65: 'ptgAreaA',
  158 +0x66: 'ptgMemAreaA',
  159 +0x67: 'ptgMemErrA',
  160 +0x68: 'ptgMemNoMemA',
  161 +0x69: 'ptgMemFuncA',
  162 +0x6A: 'ptgRefErrA',
  163 +0x6B: 'ptgAreaErrA',
  164 +0x6C: 'ptgRefNA',
  165 +0x6D: 'ptgAreaNA',
  166 +0x6E: 'ptgMemAreaNA',
  167 +0x6F: 'ptgMemNoMemNA',
  168 +0x78: 'ptgFuncCEA',
  169 +0x79: 'ptgNameXA',
  170 +0x7A: 'ptgRef3dA',
  171 +0x7B: 'ptgArea3dA',
  172 +0x7C: 'ptgRefErr3dA',
  173 +0x7D: 'ptgAreaErr3dA',
  174 +}
  175 +
  176 +#https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/00b5dd7d-51ca-4938-b7b7-483fe0e5933b
  177 +dFunctions = {
  178 +0x0000: 'COUNT',
  179 +0x0001: 'IF',
  180 +0x0002: 'ISNA',
  181 +0x0003: 'ISERROR',
  182 +0x0004: 'SUM',
  183 +0x0005: 'AVERAGE',
  184 +0x0006: 'MIN',
  185 +0x0007: 'MAX',
  186 +0x0008: 'ROW',
  187 +0x0009: 'COLUMN',
  188 +0x000A: 'NA',
  189 +0x000B: 'NPV',
  190 +0x000C: 'STDEV',
  191 +0x000D: 'DOLLAR',
  192 +0x000E: 'FIXED',
  193 +0x000F: 'SIN',
  194 +0x0010: 'COS',
  195 +0x0011: 'TAN',
  196 +0x0012: 'ATAN',
  197 +0x0013: 'PI',
  198 +0x0014: 'SQRT',
  199 +0x0015: 'EXP',
  200 +0x0016: 'LN',
  201 +0x0017: 'LOG10',
  202 +0x0018: 'ABS',
  203 +0x0019: 'INT',
  204 +0x001A: 'SIGN',
  205 +0x001B: 'ROUND',
  206 +0x001C: 'LOOKUP',
  207 +0x001D: 'INDEX',
  208 +0x001E: 'REPT',
  209 +0x001F: 'MID',
  210 +0x0020: 'LEN',
  211 +0x0021: 'VALUE',
  212 +0x0022: 'TRUE',
  213 +0x0023: 'FALSE',
  214 +0x0024: 'AND',
  215 +0x0025: 'OR',
  216 +0x0026: 'NOT',
  217 +0x0027: 'MOD',
  218 +0x0028: 'DCOUNT',
  219 +0x0029: 'DSUM',
  220 +0x002A: 'DAVERAGE',
  221 +0x002B: 'DMIN',
  222 +0x002C: 'DMAX',
  223 +0x002D: 'DSTDEV',
  224 +0x002E: 'VAR',
  225 +0x002F: 'DVAR',
  226 +0x0030: 'TEXT',
  227 +0x0031: 'LINEST',
  228 +0x0032: 'TREND',
  229 +0x0033: 'LOGEST',
  230 +0x0034: 'GROWTH',
  231 +0x0035: 'GOTO',
  232 +0x0036: 'HALT',
  233 +0x0037: 'RETURN',
  234 +0x0038: 'PV',
  235 +0x0039: 'FV',
  236 +0x003A: 'NPER',
  237 +0x003B: 'PMT',
  238 +0x003C: 'RATE',
  239 +0x003D: 'MIRR',
  240 +0x003E: 'IRR',
  241 +0x003F: 'RAND',
  242 +0x0040: 'MATCH',
  243 +0x0041: 'DATE',
  244 +0x0042: 'TIME',
  245 +0x0043: 'DAY',
  246 +0x0044: 'MONTH',
  247 +0x0045: 'YEAR',
  248 +0x0046: 'WEEKDAY',
  249 +0x0047: 'HOUR',
  250 +0x0048: 'MINUTE',
  251 +0x0049: 'SECOND',
  252 +0x004A: 'NOW',
  253 +0x004B: 'AREAS',
  254 +0x004C: 'ROWS',
  255 +0x004D: 'COLUMNS',
  256 +0x004E: 'OFFSET',
  257 +0x004F: 'ABSREF',
  258 +0x0050: 'RELREF',
  259 +0x0051: 'ARGUMENT',
  260 +0x0052: 'SEARCH',
  261 +0x0053: 'TRANSPOSE',
  262 +0x0054: 'ERROR',
  263 +0x0055: 'STEP',
  264 +0x0056: 'TYPE',
  265 +0x0057: 'ECHO',
  266 +0x0058: 'SET.NAME',
  267 +0x0059: 'CALLER',
  268 +0x005A: 'DEREF',
  269 +0x005B: 'WINDOWS',
  270 +0x005C: 'SERIES',
  271 +0x005D: 'DOCUMENTS',
  272 +0x005E: 'ACTIVE.CELL',
  273 +0x005F: 'SELECTION',
  274 +0x0060: 'RESULT',
  275 +0x0061: 'ATAN2',
  276 +0x0062: 'ASIN',
  277 +0x0063: 'ACOS',
  278 +0x0064: 'CHOOSE',
  279 +0x0065: 'HLOOKUP',
  280 +0x0066: 'VLOOKUP',
  281 +0x0067: 'LINKS',
  282 +0x0068: 'INPUT',
  283 +0x0069: 'ISREF',
  284 +0x006A: 'GET.FORMULA',
  285 +0x006B: 'GET.NAME',
  286 +0x006C: 'SET.VALUE',
  287 +0x006D: 'LOG',
  288 +0x006E: 'EXEC',
  289 +0x006F: 'CHAR',
  290 +0x0070: 'LOWER',
  291 +0x0071: 'UPPER',
  292 +0x0072: 'PROPER',
  293 +0x0073: 'LEFT',
  294 +0x0074: 'RIGHT',
  295 +0x0075: 'EXACT',
  296 +0x0076: 'TRIM',
  297 +0x0077: 'REPLACE',
  298 +0x0078: 'SUBSTITUTE',
  299 +0x0079: 'CODE',
  300 +0x007A: 'NAMES',
  301 +0x007B: 'DIRECTORY',
  302 +0x007C: 'FIND',
  303 +0x007D: 'CELL',
  304 +0x007E: 'ISERR',
  305 +0x007F: 'ISTEXT',
  306 +0x0080: 'ISNUMBER',
  307 +0x0081: 'ISBLANK',
  308 +0x0082: 'T',
  309 +0x0083: 'N',
  310 +0x0084: 'FOPEN',
  311 +0x0085: 'FCLOSE',
  312 +0x0086: 'FSIZE',
  313 +0x0087: 'FREADLN',
  314 +0x0088: 'FREAD',
  315 +0x0089: 'FWRITELN',
  316 +0x008A: 'FWRITE',
  317 +0x008B: 'FPOS',
  318 +0x008C: 'DATEVALUE',
  319 +0x008D: 'TIMEVALUE',
  320 +0x008E: 'SLN',
  321 +0x008F: 'SYD',
  322 +0x0090: 'DDB',
  323 +0x0091: 'GET.DEF',
  324 +0x0092: 'REFTEXT',
  325 +0x0093: 'TEXTREF',
  326 +0x0094: 'INDIRECT',
  327 +0x0095: 'REGISTER',
  328 +0x0096: 'CALL',
  329 +0x0097: 'ADD.BAR',
  330 +0x0098: 'ADD.MENU',
  331 +0x0099: 'ADD.COMMAND',
  332 +0x009A: 'ENABLE.COMMAND',
  333 +0x009B: 'CHECK.COMMAND',
  334 +0x009C: 'RENAME.COMMAND',
  335 +0x009D: 'SHOW.BAR',
  336 +0x009E: 'DELETE.MENU',
  337 +0x009F: 'DELETE.COMMAND',
  338 +0x00A0: 'GET.CHART.ITEM',
  339 +0x00A1: 'DIALOG.BOX',
  340 +0x00A2: 'CLEAN',
  341 +0x00A3: 'MDETERM',
  342 +0x00A4: 'MINVERSE',
  343 +0x00A5: 'MMULT',
  344 +0x00A6: 'FILES',
  345 +0x00A7: 'IPMT',
  346 +0x00A8: 'PPMT',
  347 +0x00A9: 'COUNTA',
  348 +0x00AA: 'CANCEL.KEY',
  349 +0x00AB: 'FOR',
  350 +0x00AC: 'WHILE',
  351 +0x00AD: 'BREAK',
  352 +0x00AE: 'NEXT',
  353 +0x00AF: 'INITIATE',
  354 +0x00B0: 'REQUEST',
  355 +0x00B1: 'POKE',
  356 +0x00B2: 'EXECUTE',
  357 +0x00B3: 'TERMINATE',
  358 +0x00B4: 'RESTART',
  359 +0x00B5: 'HELP',
  360 +0x00B6: 'GET.BAR',
  361 +0x00B7: 'PRODUCT',
  362 +0x00B8: 'FACT',
  363 +0x00B9: 'GET.CELL',
  364 +0x00BA: 'GET.WORKSPACE',
  365 +0x00BB: 'GET.WINDOW',
  366 +0x00BC: 'GET.DOCUMENT',
  367 +0x00BD: 'DPRODUCT',
  368 +0x00BE: 'ISNONTEXT',
  369 +0x00BF: 'GET.NOTE',
  370 +0x00C0: 'NOTE',
  371 +0x00C1: 'STDEVP',
  372 +0x00C2: 'VARP',
  373 +0x00C3: 'DSTDEVP',
  374 +0x00C4: 'DVARP',
  375 +0x00C5: 'TRUNC',
  376 +0x00C6: 'ISLOGICAL',
  377 +0x00C7: 'DCOUNTA',
  378 +0x00C8: 'DELETE.BAR',
  379 +0x00C9: 'UNREGISTER',
  380 +0x00CC: 'USDOLLAR',
  381 +0x00CD: 'FINDB',
  382 +0x00CE: 'SEARCHB',
  383 +0x00CF: 'REPLACEB',
  384 +0x00D0: 'LEFTB',
  385 +0x00D1: 'RIGHTB',
  386 +0x00D2: 'MIDB',
  387 +0x00D3: 'LENB',
  388 +0x00D4: 'ROUNDUP',
  389 +0x00D5: 'ROUNDDOWN',
  390 +0x00D6: 'ASC',
  391 +0x00D7: 'DBCS',
  392 +0x00D8: 'RANK',
  393 +0x00DB: 'ADDRESS',
  394 +0x00DC: 'DAYS360',
  395 +0x00DD: 'TODAY',
  396 +0x00DE: 'VDB',
  397 +0x00DF: 'ELSE',
  398 +0x00E0: 'ELSE.IF',
  399 +0x00E1: 'END.IF',
  400 +0x00E2: 'FOR.CELL',
  401 +0x00E3: 'MEDIAN',
  402 +0x00E4: 'SUMPRODUCT',
  403 +0x00E5: 'SINH',
  404 +0x00E6: 'COSH',
  405 +0x00E7: 'TANH',
  406 +0x00E8: 'ASINH',
  407 +0x00E9: 'ACOSH',
  408 +0x00EA: 'ATANH',
  409 +0x00EB: 'DGET',
  410 +0x00EC: 'CREATE.OBJECT',
  411 +0x00ED: 'VOLATILE',
  412 +0x00EE: 'LAST.ERROR',
  413 +0x00EF: 'CUSTOM.UNDO',
  414 +0x00F0: 'CUSTOM.REPEAT',
  415 +0x00F1: 'FORMULA.CONVERT',
  416 +0x00F2: 'GET.LINK.INFO',
  417 +0x00F3: 'TEXT.BOX',
  418 +0x00F4: 'INFO',
  419 +0x00F5: 'GROUP',
  420 +0x00F6: 'GET.OBJECT',
  421 +0x00F7: 'DB',
  422 +0x00F8: 'PAUSE',
  423 +0x00FB: 'RESUME',
  424 +0x00FC: 'FREQUENCY',
  425 +0x00FD: 'ADD.TOOLBAR',
  426 +0x00FE: 'DELETE.TOOLBAR',
  427 +0x00FF: 'User Defined Function',
  428 +0x0100: 'RESET.TOOLBAR',
  429 +0x0101: 'EVALUATE',
  430 +0x0102: 'GET.TOOLBAR',
  431 +0x0103: 'GET.TOOL',
  432 +0x0104: 'SPELLING.CHECK',
  433 +0x0105: 'ERROR.TYPE',
  434 +0x0106: 'APP.TITLE',
  435 +0x0107: 'WINDOW.TITLE',
  436 +0x0108: 'SAVE.TOOLBAR',
  437 +0x0109: 'ENABLE.TOOL',
  438 +0x010A: 'PRESS.TOOL',
  439 +0x010B: 'REGISTER.ID',
  440 +0x010C: 'GET.WORKBOOK',
  441 +0x010D: 'AVEDEV',
  442 +0x010E: 'BETADIST',
  443 +0x010F: 'GAMMALN',
  444 +0x0110: 'BETAINV',
  445 +0x0111: 'BINOMDIST',
  446 +0x0112: 'CHIDIST',
  447 +0x0113: 'CHIINV',
  448 +0x0114: 'COMBIN',
  449 +0x0115: 'CONFIDENCE',
  450 +0x0116: 'CRITBINOM',
  451 +0x0117: 'EVEN',
  452 +0x0118: 'EXPONDIST',
  453 +0x0119: 'FDIST',
  454 +0x011A: 'FINV',
  455 +0x011B: 'FISHER',
  456 +0x011C: 'FISHERINV',
  457 +0x011D: 'FLOOR',
  458 +0x011E: 'GAMMADIST',
  459 +0x011F: 'GAMMAINV',
  460 +0x0120: 'CEILING',
  461 +0x0121: 'HYPGEOMDIST',
  462 +0x0122: 'LOGNORMDIST',
  463 +0x0123: 'LOGINV',
  464 +0x0124: 'NEGBINOMDIST',
  465 +0x0125: 'NORMDIST',
  466 +0x0126: 'NORMSDIST',
  467 +0x0127: 'NORMINV',
  468 +0x0128: 'NORMSINV',
  469 +0x0129: 'STANDARDIZE',
  470 +0x012A: 'ODD',
  471 +0x012B: 'PERMUT',
  472 +0x012C: 'POISSON',
  473 +0x012D: 'TDIST',
  474 +0x012E: 'WEIBULL',
  475 +0x012F: 'SUMXMY2',
  476 +0x0130: 'SUMX2MY2',
  477 +0x0131: 'SUMX2PY2',
  478 +0x0132: 'CHITEST',
  479 +0x0133: 'CORREL',
  480 +0x0134: 'COVAR',
  481 +0x0135: 'FORECAST',
  482 +0x0136: 'FTEST',
  483 +0x0137: 'INTERCEPT',
  484 +0x0138: 'PEARSON',
  485 +0x0139: 'RSQ',
  486 +0x013A: 'STEYX',
  487 +0x013B: 'SLOPE',
  488 +0x013C: 'TTEST',
  489 +0x013D: 'PROB',
  490 +0x013E: 'DEVSQ',
  491 +0x013F: 'GEOMEAN',
  492 +0x0140: 'HARMEAN',
  493 +0x0141: 'SUMSQ',
  494 +0x0142: 'KURT',
  495 +0x0143: 'SKEW',
  496 +0x0144: 'ZTEST',
  497 +0x0145: 'LARGE',
  498 +0x0146: 'SMALL',
  499 +0x0147: 'QUARTILE',
  500 +0x0148: 'PERCENTILE',
  501 +0x0149: 'PERCENTRANK',
  502 +0x014A: 'MODE',
  503 +0x014B: 'TRIMMEAN',
  504 +0x014C: 'TINV',
  505 +0x014E: 'MOVIE.COMMAND',
  506 +0x014F: 'GET.MOVIE',
  507 +0x0150: 'CONCATENATE',
  508 +0x0151: 'POWER',
  509 +0x0152: 'PIVOT.ADD.DATA',
  510 +0x0153: 'GET.PIVOT.TABLE',
  511 +0x0154: 'GET.PIVOT.FIELD',
  512 +0x0155: 'GET.PIVOT.ITEM',
  513 +0x0156: 'RADIANS',
  514 +0x0157: 'DEGREES',
  515 +0x0158: 'SUBTOTAL',
  516 +0x0159: 'SUMIF',
  517 +0x015A: 'COUNTIF',
  518 +0x015B: 'COUNTBLANK',
  519 +0x015C: 'SCENARIO.GET',
  520 +0x015D: 'OPTIONS.LISTS.GET',
  521 +0x015E: 'ISPMT',
  522 +0x015F: 'DATEDIF',
  523 +0x0160: 'DATESTRING',
  524 +0x0161: 'NUMBERSTRING',
  525 +0x0162: 'ROMAN',
  526 +0x0163: 'OPEN.DIALOG',
  527 +0x0164: 'SAVE.DIALOG',
  528 +0x0165: 'VIEW.GET',
  529 +0x0166: 'GETPIVOTDATA',
  530 +0x0167: 'HYPERLINK',
  531 +0x0168: 'PHONETIC',
  532 +0x0169: 'AVERAGEA',
  533 +0x016A: 'MAXA',
  534 +0x016B: 'MINA',
  535 +0x016C: 'STDEVPA',
  536 +0x016D: 'VARPA',
  537 +0x016E: 'STDEVA',
  538 +0x016F: 'VARA',
  539 +0x0170: 'BAHTTEXT',
  540 +0x0171: 'THAIDAYOFWEEK',
  541 +0x0172: 'THAIDIGIT',
  542 +0x0173: 'THAIMONTHOFYEAR',
  543 +0x0174: 'THAINUMSOUND',
  544 +0x0175: 'THAINUMSTRING',
  545 +0x0176: 'THAISTRINGLENGTH',
  546 +0x0177: 'ISTHAIDIGIT',
  547 +0x0178: 'ROUNDBAHTDOWN',
  548 +0x0179: 'ROUNDBAHTUP',
  549 +0x017A: 'THAIYEAR',
  550 +0x017B: 'RTD',
  551 +
  552 +0x8076: 'ALERT',
  553 +}
  554 +
  555 +dOpcodes = {
  556 + 0x06: 'FORMULA : Cell Formula',
  557 + 0x0A: 'EOF : End of File',
  558 + 0x0C: 'CALCCOUNT : Iteration Count',
  559 + 0x0D: 'CALCMODE : Calculation Mode',
  560 + 0x0E: 'PRECISION : Precision',
  561 + 0x0F: 'REFMODE : Reference Mode',
  562 + 0x10: 'DELTA : Iteration Increment',
  563 + 0x11: 'ITERATION : Iteration Mode',
  564 + 0x12: 'PROTECT : Protection Flag',
  565 + 0x13: 'PASSWORD : Protection Password',
  566 + 0x14: 'HEADER : Print Header on Each Page',
  567 + 0x15: 'FOOTER : Print Footer on Each Page',
  568 + 0x16: 'EXTERNCOUNT : Number of External References',
  569 + 0x17: 'EXTERNSHEET : External Reference',
  570 + 0x18: 'LABEL : Cell Value, String Constant',
  571 + 0x19: 'WINDOWPROTECT : Windows Are Protected',
  572 + 0x1A: 'VERTICALPAGEBREAKS : Explicit Column Page Breaks',
  573 + 0x1B: 'HORIZONTALPAGEBREAKS : Explicit Row Page Breaks',
  574 + 0x1C: 'NOTE : Comment Associated with a Cell',
  575 + 0x1D: 'SELECTION : Current Selection',
  576 + 0x22: '1904 : 1904 Date System',
  577 + 0x26: 'LEFTMARGIN : Left Margin Measurement',
  578 + 0x27: 'RIGHTMARGIN : Right Margin Measurement',
  579 + 0x28: 'TOPMARGIN : Top Margin Measurement',
  580 + 0x29: 'BOTTOMMARGIN : Bottom Margin Measurement',
  581 + 0x2A: 'PRINTHEADERS : Print Row/Column Labels',
  582 + 0x2B: 'PRINTGRIDLINES : Print Gridlines Flag',
  583 + 0x2F: 'FILEPASS : File Is Password-Protected',
  584 + 0x3C: 'CONTINUE : Continues Long Records',
  585 + 0x3D: 'WINDOW1 : Window Information',
  586 + 0x40: 'BACKUP : Save Backup Version of the File',
  587 + 0x41: 'PANE : Number of Panes and Their Position',
  588 + 0x42: 'CODENAME : VBE Object Name',
  589 + 0x42: 'CODEPAGE : Default Code Page',
  590 + 0x4D: 'PLS : Environment-Specific Print Record',
  591 + 0x50: 'DCON : Data Consolidation Information',
  592 + 0x51: 'DCONREF : Data Consolidation References',
  593 + 0x52: 'DCONNAME : Data Consolidation Named References',
  594 + 0x55: 'DEFCOLWIDTH : Default Width for Columns',
  595 + 0x59: 'XCT : CRN Record Count',
  596 + 0x5A: 'CRN : Nonresident Operands',
  597 + 0x5B: 'FILESHARING : File-Sharing Information',
  598 + 0x5C: 'WRITEACCESS : Write Access User Name',
  599 + 0x5D: 'OBJ : Describes a Graphic Object',
  600 + 0x5E: 'UNCALCED : Recalculation Status',
  601 + 0x5F: 'SAVERECALC : Recalculate Before Save',
  602 + 0x60: 'TEMPLATE : Workbook Is a Template',
  603 + 0x63: 'OBJPROTECT : Objects Are Protected',
  604 + 0x7D: 'COLINFO : Column Formatting Information',
  605 + 0x7E: 'RK : Cell Value, RK Number',
  606 + 0x7F: 'IMDATA : Image Data',
  607 + 0x80: 'GUTS : Size of Row and Column Gutters',
  608 + 0x81: 'WSBOOL : Additional Workspace Information',
  609 + 0x82: 'GRIDSET : State Change of Gridlines Option',
  610 + 0x83: 'HCENTER : Center Between Horizontal Margins',
  611 + 0x84: 'VCENTER : Center Between Vertical Margins',
  612 + 0x85: 'BOUNDSHEET : Sheet Information',
  613 + 0x86: 'WRITEPROT : Workbook Is Write-Protected',
  614 + 0x87: 'ADDIN : Workbook Is an Add-in Macro',
  615 + 0x88: 'EDG : Edition Globals',
  616 + 0x89: 'PUB : Publisher',
  617 + 0x8C: 'COUNTRY : Default Country and WIN.INI Country',
  618 + 0x8D: 'HIDEOBJ : Object Display Options',
  619 + 0x90: 'SORT : Sorting Options',
  620 + 0x91: 'SUB : Subscriber',
  621 + 0x92: 'PALETTE : Color Palette Definition',
  622 + 0x94: 'LHRECORD : .WK? File Conversion Information',
  623 + 0x95: 'LHNGRAPH : Named Graph Information',
  624 + 0x96: 'SOUND : Sound Note',
  625 + 0x98: 'LPR : Sheet Was Printed Using LINE.PRINT()',
  626 + 0x99: 'STANDARDWIDTH : Standard Column Width',
  627 + 0x9A: 'FNGROUPNAME : Function Group Name',
  628 + 0x9B: 'FILTERMODE : Sheet Contains Filtered List',
  629 + 0x9C: 'FNGROUPCOUNT : Built-in Function Group Count',
  630 + 0x9D: 'AUTOFILTERINFO : Drop-Down Arrow Count',
  631 + 0x9E: 'AUTOFILTER : AutoFilter Data',
  632 + 0xA0: 'SCL : Window Zoom Magnification',
  633 + 0xA1: 'SETUP : Page Setup',
  634 + 0xA9: 'COORDLIST : Polygon Object Vertex Coordinates',
  635 + 0xAB: 'GCW : Global Column-Width Flags',
  636 + 0xAE: 'SCENMAN : Scenario Output Data',
  637 + 0xAF: 'SCENARIO : Scenario Data',
  638 + 0xB0: 'SXVIEW : View Definition',
  639 + 0xB1: 'SXVD : View Fields',
  640 + 0xB2: 'SXVI : View Item',
  641 + 0xB4: 'SXIVD : Row/Column Field IDs',
  642 + 0xB5: 'SXLI : Line Item Array',
  643 + 0xB6: 'SXPI : Page Item',
  644 + 0xB8: 'DOCROUTE : Routing Slip Information',
  645 + 0xB9: 'RECIPNAME : Recipient Name',
  646 + 0xBC: 'SHRFMLA : Shared Formula',
  647 + 0xBD: 'MULRK : Multiple RK Cells',
  648 + 0xBE: 'MULBLANK : Multiple Blank Cells',
  649 + 0xC1: 'MMS : ADDMENU / DELMENU Record Group Count',
  650 + 0xC2: 'ADDMENU : Menu Addition',
  651 + 0xC3: 'DELMENU : Menu Deletion',
  652 + 0xC5: 'SXDI : Data Item',
  653 + 0xC6: 'SXDB : PivotTable Cache Data',
  654 + 0xCD: 'SXSTRING : String',
  655 + 0xD0: 'SXTBL : Multiple Consolidation Source Info',
  656 + 0xD1: 'SXTBRGIITM : Page Item Name Count',
  657 + 0xD2: 'SXTBPG : Page Item Indexes',
  658 + 0xD3: 'OBPROJ : Visual Basic Project',
  659 + 0xD5: 'SXIDSTM : Stream ID',
  660 + 0xD6: 'RSTRING : Cell with Character Formatting',
  661 + 0xD7: 'DBCELL : Stream Offsets',
  662 + 0xDA: 'BOOKBOOL : Workbook Option Flag',
  663 + 0xDC: 'PARAMQRY : Query Parameters',
  664 + 0xDC: 'SXEXT : External Source Information',
  665 + 0xDD: 'SCENPROTECT : Scenario Protection',
  666 + 0xDE: 'OLESIZE : Size of OLE Object',
  667 + 0xDF: 'UDDESC : Description String for Chart Autoformat',
  668 + 0xE0: 'XF : Extended Format',
  669 + 0xE1: 'INTERFACEHDR : Beginning of User Interface Records',
  670 + 0xE2: 'INTERFACEEND : End of User Interface Records',
  671 + 0xE3: 'SXVS : View Source',
  672 + 0xE5: 'MERGECELLS : Merged Cells',
  673 + 0xEA: 'TABIDCONF : Sheet Tab ID of Conflict History',
  674 + 0xEB: 'MSODRAWINGGROUP : Microsoft Office Drawing Group',
  675 + 0xEC: 'MSODRAWING : Microsoft Office Drawing',
  676 + 0xED: 'MSODRAWINGSELECTION : Microsoft Office Drawing Selection',
  677 + 0xF0: 'SXRULE : PivotTable Rule Data',
  678 + 0xF1: 'SXEX : PivotTable View Extended Information',
  679 + 0xF2: 'SXFILT : PivotTable Rule Filter',
  680 + 0xF4: 'SXDXF : Pivot Table Formatting',
  681 + 0xF5: 'SXITM : Pivot Table Item Indexes',
  682 + 0xF6: 'SXNAME : PivotTable Name',
  683 + 0xF7: 'SXSELECT : PivotTable Selection Information',
  684 + 0xF8: 'SXPAIR : PivotTable Name Pair',
  685 + 0xF9: 'SXFMLA : Pivot Table Parsed Expression',
  686 + 0xFB: 'SXFORMAT : PivotTable Format Record',
  687 + 0xFC: 'SST : Shared String Table',
  688 + 0xFD: 'LABELSST : Cell Value, String Constant/ SST',
  689 + 0xFF: 'EXTSST : Extended Shared String Table',
  690 + 0x100: 'SXVDEX : Extended PivotTable View Fields',
  691 + 0x103: 'SXFORMULA : PivotTable Formula Record',
  692 + 0x122: 'SXDBEX : PivotTable Cache Data',
  693 + 0x13D: 'TABID : Sheet Tab Index Array',
  694 + 0x160: 'USESELFS : Natural Language Formulas Flag',
  695 + 0x161: 'DSF : Double Stream File',
  696 + 0x162: 'XL5MODIFY : Flag for DSF',
  697 + 0x1A5: 'FILESHARING2 : File-Sharing Information for Shared Lists',
  698 + 0x1A9: 'USERBVIEW : Workbook Custom View Settings',
  699 + 0x1AA: 'USERSVIEWBEGIN : Custom View Settings',
  700 + 0x1AB: 'USERSVIEWEND : End of Custom View Records',
  701 + 0x1AD: 'QSI : External Data Range',
  702 + 0x1AE: 'SUPBOOK : Supporting Workbook',
  703 + 0x1AF: 'PROT4REV : Shared Workbook Protection Flag',
  704 + 0x1B0: 'CONDFMT : Conditional Formatting Range Information',
  705 + 0x1B1: 'CF : Conditional Formatting Conditions',
  706 + 0x1B2: 'DVAL : Data Validation Information',
  707 + 0x1B5: 'DCONBIN : Data Consolidation Information',
  708 + 0x1B6: 'TXO : Text Object',
  709 + 0x1B7: 'REFRESHALL : Refresh Flag',
  710 + 0x1B8: 'HLINK : Hyperlink',
  711 + 0x1BB: 'SXFDBTYPE : SQL Datatype Identifier',
  712 + 0x1BC: 'PROT4REVPASS : Shared Workbook Protection Password',
  713 + 0x1BE: 'DV : Data Validation Criteria',
  714 + 0x1C0: 'EXCEL9FILE : Excel 9 File',
  715 + 0x1C1: 'RECALCID : Recalc Information',
  716 + 0x200: 'DIMENSIONS : Cell Table Size',
  717 + 0x201: 'BLANK : Cell Value, Blank Cell',
  718 + 0x203: 'NUMBER : Cell Value, Floating-Point Number',
  719 + 0x204: 'LABEL : Cell Value, String Constant',
  720 + 0x205: 'BOOLERR : Cell Value, Boolean or Error',
  721 + 0x207: 'STRING : String Value of a Formula',
  722 + 0x208: 'ROW : Describes a Row',
  723 + 0x20B: 'INDEX : Index Record',
  724 + 0x218: 'NAME : Defined Name',
  725 + 0x221: 'ARRAY : Array-Entered Formula',
  726 + 0x223: 'EXTERNNAME : Externally Referenced Name',
  727 + 0x225: 'DEFAULTROWHEIGHT : Default Row Height',
  728 + 0x231: 'FONT : Font Description',
  729 + 0x236: 'TABLE : Data Table',
  730 + 0x23E: 'WINDOW2 : Sheet Window Information',
  731 + 0x293: 'STYLE : Style Information',
  732 + 0x406: 'FORMULA : Cell Formula',
  733 + 0x41E: 'FORMAT : Number Format',
  734 + 0x800: 'HLINKTOOLTIP : Hyperlink Tooltip',
  735 + 0x801: 'WEBPUB : Web Publish Item',
  736 + 0x802: 'QSISXTAG : PivotTable and Query Table Extensions',
  737 + 0x803: 'DBQUERYEXT : Database Query Extensions',
  738 + 0x804: 'EXTSTRING : FRT String',
  739 + 0x805: 'TXTQUERY : Text Query Information',
  740 + 0x806: 'QSIR : Query Table Formatting',
  741 + 0x807: 'QSIF : Query Table Field Formatting',
  742 + 0x809: 'BOF : Beginning of File',
  743 + 0x80A: 'OLEDBCONN : OLE Database Connection',
  744 + 0x80B: 'WOPT : Web Options',
  745 + 0x80C: 'SXVIEWEX : Pivot Table OLAP Extensions',
  746 + 0x80D: 'SXTH : PivotTable OLAP Hierarchy',
  747 + 0x80E: 'SXPIEX : OLAP Page Item Extensions',
  748 + 0x80F: 'SXVDTEX : View Dimension OLAP Extensions',
  749 + 0x810: 'SXVIEWEX9 : Pivot Table Extensions',
  750 + 0x812: 'CONTINUEFRT : Continued FRT',
  751 + 0x813: 'REALTIMEDATA : Real-Time Data (RTD)',
  752 + 0x862: 'SHEETEXT : Extra Sheet Info',
  753 + 0x863: 'BOOKEXT : Extra Book Info',
  754 + 0x864: 'SXADDL : Pivot Table Additional Info',
  755 + 0x865: 'CRASHRECERR : Crash Recovery Error',
  756 + 0x866: 'HFPicture : Header / Footer Picture',
  757 + 0x867: 'FEATHEADR : Shared Feature Header',
  758 + 0x868: 'FEAT : Shared Feature Record',
  759 + 0x86A: 'DATALABEXT : Chart Data Label Extension',
  760 + 0x86B: 'DATALABEXTCONTENTS : Chart Data Label Extension Contents',
  761 + 0x86C: 'CELLWATCH : Cell Watch',
  762 + 0x86d: 'FEATINFO : Shared Feature Info Record',
  763 + 0x871: 'FEATHEADR11 : Shared Feature Header 11',
  764 + 0x872: 'FEAT11 : Shared Feature 11 Record',
  765 + 0x873: 'FEATINFO11 : Shared Feature Info 11 Record',
  766 + 0x874: 'DROPDOWNOBJIDS : Drop Down Object',
  767 + 0x875: 'CONTINUEFRT11 : Continue FRT 11',
  768 + 0x876: 'DCONN : Data Connection',
  769 + 0x877: 'LIST12 : Extra Table Data Introduced in Excel 2007',
  770 + 0x878: 'FEAT12 : Shared Feature 12 Record',
  771 + 0x879: 'CONDFMT12 : Conditional Formatting Range Information 12',
  772 + 0x87A: 'CF12 : Conditional Formatting Condition 12',
  773 + 0x87B: 'CFEX : Conditional Formatting Extension',
  774 + 0x87C: 'XFCRC : XF Extensions Checksum',
  775 + 0x87D: 'XFEXT : XF Extension',
  776 + 0x87E: 'EZFILTER12 : AutoFilter Data Introduced in Excel 2007',
  777 + 0x87F: 'CONTINUEFRT12 : Continue FRT 12',
  778 + 0x881: 'SXADDL12 : Additional Workbook Connections Information',
  779 + 0x884: 'MDTINFO : Information about a Metadata Type',
  780 + 0x885: 'MDXSTR : MDX Metadata String',
  781 + 0x886: 'MDXTUPLE : Tuple MDX Metadata',
  782 + 0x887: 'MDXSET : Set MDX Metadata',
  783 + 0x888: 'MDXPROP : Member Property MDX Metadata',
  784 + 0x889: 'MDXKPI : Key Performance Indicator MDX Metadata',
  785 + 0x88A: 'MDTB : Block of Metadata Records',
  786 + 0x88B: 'PLV : Page Layout View Settings in Excel 2007',
  787 + 0x88C: 'COMPAT12 : Compatibility Checker 12',
  788 + 0x88D: 'DXF : Differential XF',
  789 + 0x88E: 'TABLESTYLES : Table Styles',
  790 + 0x88F: 'TABLESTYLE : Table Style',
  791 + 0x890: 'TABLESTYLEELEMENT : Table Style Element',
  792 + 0x892: 'STYLEEXT : Named Cell Style Extension',
  793 + 0x893: 'NAMEPUBLISH : Publish To Excel Server Data for Name',
  794 + 0x894: 'NAMECMT : Name Comment',
  795 + 0x895: 'SORTDATA12 : Sort Data 12',
  796 + 0x896: 'THEME : Theme',
  797 + 0x897: 'GUIDTYPELIB : VB Project Typelib GUID',
  798 + 0x898: 'FNGRP12 : Function Group',
  799 + 0x899: 'NAMEFNGRP12 : Extra Function Group',
  800 + 0x89A: 'MTRSETTINGS : Multi-Threaded Calculation Settings',
  801 + 0x89B: 'COMPRESSPICTURES : Automatic Picture Compression Mode',
  802 + 0x89C: 'HEADERFOOTER : Header Footer',
  803 + 0x8A3: 'FORCEFULLCALCULATION : Force Full Calculation Settings',
  804 + 0x8c1: 'LISTOBJ : List Object',
  805 + 0x8c2: 'LISTFIELD : List Field',
  806 + 0x8c3: 'LISTDV : List Data Validation',
  807 + 0x8c4: 'LISTCONDFMT : List Conditional Formatting',
  808 + 0x8c5: 'LISTCF : List Cell Formatting',
  809 + 0x8c6: 'FMQRY : Filemaker queries',
  810 + 0x8c7: 'FMSQRY : File maker queries',
  811 + 0x8c8: 'PLV : Page Layout View in Mac Excel 11',
  812 + 0x8c9: 'LNEXT : Extension information for borders in Mac Office 11',
  813 + 0x8ca: 'MKREXT : Extension information for markers in Mac Office 11'
  814 +}
  815 +
  816 +
  817 +# CIC: Call If Callable
  818 +def CIC(expression):
  819 + if callable(expression):
  820 + return expression()
  821 + else:
  822 + return expression
  823 +
  824 +
  825 +# IFF: IF Function
  826 +def IFF(expression, valueTrue, valueFalse):
  827 + if expression:
  828 + return CIC(valueTrue)
  829 + else:
  830 + return CIC(valueFalse)
  831 +
  832 +
  833 +def CombineHexASCII(hexDump, asciiDump, length):
  834 + if hexDump == '':
  835 + return ''
  836 + return hexDump + ' ' + (' ' * (3 * (length - len(asciiDump)))) + asciiDump
  837 +
  838 +def HexASCII(data, length=16):
  839 + result = []
  840 + if len(data) > 0:
  841 + hexDump = ''
  842 + asciiDump = ''
  843 + for i, b in enumerate(data):
  844 + if i % length == 0:
  845 + if hexDump != '':
  846 + result.append(CombineHexASCII(hexDump, asciiDump, length))
  847 + hexDump = '%08X:' % i
  848 + asciiDump = ''
  849 + hexDump += ' %02X' % ord(b)
  850 + asciiDump += IFF(ord(b) >= 32, b, '.')
  851 + result.append(CombineHexASCII(hexDump, asciiDump, length))
  852 + return result
  853 +
  854 +def StringsASCII(data):
  855 + """
  856 + Extract a list of plain ASCII strings of 4+ chars found in data.
  857 + :param data: bytearray or bytes
  858 + :return: list of str (converted to unicode on Python 3)
  859 + """
  860 + # list of bytes strings:
  861 + bytes_strings = re.findall(b'[^\x00-\x08\x0A-\x1F\x7F-\xFF]{4,}', bytes(data))
  862 + return [bytes2str(bs) for bs in bytes_strings]
  863 +
  864 +def StringsUNICODE(data):
  865 + """
  866 + Extract a list of Unicode strings (made of 4+ plain ASCII characters only) found in data.
  867 + :param data: bytearray or bytes
  868 + :return: list of str (converted to unicode on Python 3)
  869 + """
  870 + # list of bytes strings:
  871 + # TODO: check if the null byte should be before or after the ascii byte
  872 + bytes_strings = [foundunicodestring.replace(b'\x00', b'') for foundunicodestring, dummy in re.findall(b'(([^\x00-\x08\x0A-\x1F\x7F-\xFF]\x00){4,})', bytes(data))]
  873 + return [bytes2str(bs) for bs in bytes_strings]
  874 +
  875 +def Strings(data, encodings='sL'):
  876 + """
  877 +
  878 + :param data bytearray: bytearray, data to be scanned for strings
  879 + :param encodings:
  880 + :return: dict with key = 's' or 'L', values = list of str
  881 + """
  882 + dStrings = {}
  883 + for encoding in encodings:
  884 + if encoding == 's':
  885 + dStrings[encoding] = StringsASCII(data)
  886 + elif encoding == 'L':
  887 + dStrings[encoding] = StringsUNICODE(data)
  888 + return dStrings
  889 +
  890 +def ContainsWord(word, expression):
  891 + return struct.pack('<H', word) in expression
  892 +
  893 +# https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/6e5eed10-5b77-43d6-8dd0-37345f8654ad
  894 +def ParseLoc(expression):
  895 + """
  896 +
  897 + :param expression bytearray: bytearray, data to be parsed
  898 + :return:
  899 + :rtype: str
  900 + """
  901 + formatcodes = 'HH'
  902 + formatsize = struct.calcsize(formatcodes)
  903 + row, column = struct.unpack(formatcodes, expression[0:formatsize])
  904 + rowRelative = column & 0x8000
  905 + colRelative = column & 0x4000
  906 + column = column & 0x3FFF
  907 + if rowRelative:
  908 + rowindicator = '~'
  909 + else:
  910 + rowindicator = ''
  911 + row += 1
  912 + if colRelative:
  913 + colindicator = '~'
  914 + else:
  915 + colindicator = ''
  916 + column += 1
  917 + return 'R%s%dC%s%d' % (rowindicator, row, colindicator, column)
  918 +
  919 +def ParseExpression(expression):
  920 + '''
  921 + Parse an expression into a human readable string.
  922 +
  923 + :param expression bytearray: bytearray, expression data to be parsed
  924 + :return: str, parsed expression as a string (bytes on Python 2, unicode on python 3)
  925 + :rtype: str
  926 + '''
  927 + result = ''
  928 + while len(expression) > 0:
  929 + ptgid = expression[0] # int
  930 + expression = expression[1:] # bytearray
  931 + if ptgid in dTokens:
  932 + result += dTokens[ptgid] + ' '
  933 + if ptgid == 0x17: # ptgStr
  934 + length = expression[0] # int
  935 + expression = expression[1:]
  936 + if expression[0] == 0: # probably BIFF8 -> UNICODE (compressed)
  937 + expression = expression[1:]
  938 + result += '"%s" ' % bytes2str(expression[:length])
  939 + expression = expression[length:]
  940 + elif ptgid == 0x19: # ptgAttr
  941 + grbit = expression[0] # int
  942 + expression = expression[1:]
  943 + if grbit & 0x04:
  944 + result += 'CHOOSE '
  945 + break
  946 + else:
  947 + expression = expression[2:]
  948 + elif ptgid == 0x16 or ptgid == 0x0e: # 0x0E: 'ptgNE', 0x16: 'ptgMissArg'
  949 + pass
  950 + elif ptgid == 0x1e: # ptgInt
  951 + result += '%d ' % (expression[0] + expression[1] * 0x100)
  952 + expression = expression[2:]
  953 + elif ptgid == 0x41: # ptgFuncV
  954 + functionid = expression[0] + expression[1] * 0x100
  955 + result += '%s (0x%04x) ' % (dFunctions.get(functionid, '*UNKNOWN FUNCTION*'), functionid)
  956 + expression = expression[2:]
  957 + elif ptgid == 0x22 or ptgid == 0x42: # 0x22: 'ptgFuncVar', 0x42: 'ptgFuncVarV'
  958 + functionid = expression[1] + expression[2] * 0x100
  959 + result += 'args %d func %s (0x%04x) ' % (expression[0], dFunctions.get(functionid, '*UNKNOWN FUNCTION*'), functionid)
  960 + expression = expression[3:]
  961 + elif ptgid == 0x23: # ptgName
  962 + result += '%04x ' % (expression[0] + expression[1] * 0x100)
  963 + # TODO: looks like we're skipping quite a few bytes
  964 + expression = expression[14:]
  965 + elif ptgid == 0x1f: # ptgNum
  966 + result += 'FLOAT '
  967 + # TODO: looks like we're skipping quite a few bytes
  968 + expression = expression[8:]
  969 + elif ptgid == 0x26: # ptgMemArea
  970 + expression = expression[4:] # skipping 4 bytes
  971 + expression = expression[expression[0] + expression[1] * 0x100:]
  972 + result += 'REFERENCE-EXPRESSION '
  973 + elif ptgid == 0x01: # ptgExp
  974 + formatcodes = 'HH'
  975 + formatsize = struct.calcsize(formatcodes)
  976 + row, column = struct.unpack(formatcodes, expression[0:formatsize])
  977 + expression = expression[formatsize:]
  978 + result += 'R%dC%d ' % (row + 1, column + 1)
  979 + elif ptgid == 0x24 or ptgid == 0x44: # 0x24: 'ptgRef', 0x44: 'ptgRefV'
  980 + result += '%s ' % ParseLoc(expression)
  981 + expression = expression[4:]
  982 + elif ptgid == 0x3A or ptgid == 0x5A: # 0x3A: 'ptgRef3d', 0x5A: 'ptgRef3dV'
  983 + result += '%s ' % ParseLoc(expression[2:])
  984 + expression = expression[6:]
  985 + else:
  986 + break
  987 + else:
  988 + result += '*UNKNOWN TOKEN* '
  989 + break
  990 + if len(expression) == 0:
  991 + return result
  992 + else:
  993 + # 0x006E: 'EXEC', 0x0095: 'REGISTER'
  994 + functions = [dFunctions[functionid] for functionid in [0x6E, 0x95] if ContainsWord(functionid, expression)]
  995 + if functions != []:
  996 + message = ' Could contain following functions: ' + ','.join(functions) + ' -'
  997 + else:
  998 + message = ''
  999 + return result + ' *INCOMPLETE FORMULA PARSING*' + message + ' Remaining, unparsed expression: ' + repr(expression)
  1000 +
  1001 +
  1002 +class cBIFF(object): # cPluginParent):
  1003 + macroOnly = False
  1004 + name = 'BIFF plugin'
  1005 +
  1006 + def __init__(self, name, stream, options):
  1007 + self.streamname = name
  1008 + self.stream = stream
  1009 + self.options = options
  1010 + self.ran = False
  1011 +
  1012 + def Analyze(self):
  1013 + result = []
  1014 + macros4Found = False
  1015 + if self.streamname in [['Workbook'], ['Book']]:
  1016 + self.ran = True
  1017 + # use a bytearray to have Python 2+3 compatibility with the same code (no need for ord())
  1018 + stream = bytearray(self.stream)
  1019 +
  1020 + oParser = optparse.OptionParser()
  1021 + oParser.add_option('-s', '--strings', action='store_true', default=False, help='Dump strings')
  1022 + oParser.add_option('-a', '--hexascii', action='store_true', default=False, help='Dump hex ascii')
  1023 + oParser.add_option('-x', '--xlm', action='store_true', default=False, help='Select all records relevant for Excel 4.0 macros')
  1024 + oParser.add_option('-o', '--opcode', type=str, default='', help='Opcode to filter for')
  1025 + oParser.add_option('-f', '--find', type=str, default='', help='Content to search for')
  1026 + (options, args) = oParser.parse_args(self.options.split(' '))
  1027 +
  1028 + if options.find.startswith('0x'):
  1029 + options.find = binascii.a2b_hex(options.find[2:])
  1030 +
  1031 + while len(stream)>0:
  1032 + formatcodes = 'HH'
  1033 + formatsize = struct.calcsize(formatcodes)
  1034 + # print('formatsize=%d' % formatsize)
  1035 + opcode, length = struct.unpack(formatcodes, stream[0:formatsize])
  1036 + # print('opcode=%d length=%d len(stream)=%d' % (opcode, length, len(stream)))
  1037 + stream = stream[formatsize:]
  1038 + data = stream[:length]
  1039 + stream = stream[length:]
  1040 +
  1041 + if opcode in dOpcodes:
  1042 + opcodename = dOpcodes[opcode]
  1043 + else:
  1044 + opcodename = ''
  1045 + line = '%04x %6d %s' % (opcode, length, opcodename)
  1046 + # print(line)
  1047 +
  1048 + # FORMULA record
  1049 + if opcode == 0x06 and len(data) >= 21:
  1050 + formatcodes = 'HH'
  1051 + formatsize = struct.calcsize(formatcodes)
  1052 + row, column = struct.unpack(formatcodes, data[0:formatsize])
  1053 + formatcodes = 'H'
  1054 + formatsize = struct.calcsize(formatcodes)
  1055 + length = struct.unpack(formatcodes, data[20:20 + formatsize])[0]
  1056 + expression = data[22:]
  1057 + line += ' - R%dC%d len=%d %s' % (row + 1, column + 1, length, ParseExpression(expression))
  1058 + # print(line)
  1059 +
  1060 + # NAME (Lbl) record #a# difference BIFF4 and BIFF5+
  1061 + if opcode == 0x18 and len(data) >= 16:
  1062 + if data[0] & 0x20:
  1063 + dBuildInNames = {1: 'Auto_Open', 2: 'Auto_Close'}
  1064 + code = data[14]
  1065 + if code == 0: #a# hack with BIFF8 Unicode
  1066 + code = data[15]
  1067 + line += ' - build-in-name %d %s' % (code, dBuildInNames.get(code, '?'))
  1068 + else:
  1069 + pass
  1070 + line += ' - %s' % bytes2str(data[14:14+data[3]])
  1071 + # print(line)
  1072 +
  1073 + # BOUNDSHEET record
  1074 + if opcode == 0x85 and len(data) >= 6:
  1075 + dSheetType = {0: 'worksheet or dialog sheet', 1: 'Excel 4.0 macro sheet', 2: 'chart', 6: 'Visual Basic module'}
  1076 + if data[5] == 1:
  1077 + macros4Found = True
  1078 + dSheetState = {0: 'visible', 1: 'hidden', 2: 'very hidden'}
  1079 + line += ' - %s, %s' % (dSheetType.get(data[5], '%02x' % data[5]), dSheetState.get(data[4], '%02x' % data[4]))
  1080 + # print(line)
  1081 +
  1082 + # STRING record
  1083 + if opcode == 0x207 and len(data) >= 4:
  1084 + values = list(Strings(data[3:]).values())
  1085 + strings = ''
  1086 + if values[0] != []:
  1087 + strings += ' '.join(values[0])
  1088 + if values[1] != []:
  1089 + if strings != '':
  1090 + strings += ' '
  1091 + strings += ' '.join(values[1])
  1092 + line += ' - %s' % strings
  1093 + # print(line)
  1094 +
  1095 + if options.find == '' and options.opcode == '' and not options.xlm or options.opcode != '' and options.opcode.lower() in line.lower() or options.find != '' and options.find in data or options.xlm and opcode in [0x06, 0x18, 0x85, 0x207]:
  1096 + result.append(line)
  1097 +
  1098 + if options.hexascii:
  1099 + result.extend(' ' + foundstring for foundstring in HexASCII(data, 8))
  1100 + elif options.strings:
  1101 + dEncodings = {'s': 'ASCII', 'L': 'UNICODE'}
  1102 + for encoding, strings in Strings(data).items():
  1103 + if len(strings) > 0:
  1104 + result.append(' ' + dEncodings[encoding] + ':')
  1105 + result.extend(' ' + foundstring for foundstring in strings)
  1106 +
  1107 + if options.xlm and not macros4Found:
  1108 + result = []
  1109 +
  1110 + return result
  1111 +
  1112 +# AddPlugin(cBIFF)
... ...
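Below is a minimal, standalone sketch of the record walk that `cBIFF.Analyze` performs. Each BIFF record is a 4-byte header (a 16-bit opcode and a 16-bit payload length) followed by the payload. BOUNDSHEET (0x85) payloads carry the sheet visibility at offset 4 and the sheet type at offset 5. The helper names `iter_biff_records` and `describe_boundsheet` are hypothetical and are not part of oletools. One assumption to flag: BIFF fields are little-endian, so the sketch uses the explicit `'<HH'` format. The plugin's `'HH'` uses the platform's native byte order, which is the same on x86 and most ARM systems. Unlike the plugin loop, the sketch also stops quietly on a truncated trailing header instead of raising `struct.error`.

```python
import struct

# Sheet type and visibility labels, matching the BOUNDSHEET (0x85)
# decoding done by the plugin above.
SHEET_TYPES = {0: 'worksheet or dialog sheet', 1: 'Excel 4.0 macro sheet',
               2: 'chart', 6: 'Visual Basic module'}
SHEET_STATES = {0: 'visible', 1: 'hidden', 2: 'very hidden'}

# Record header: little-endian uint16 opcode, little-endian uint16 length.
RECORD_HEADER = struct.Struct('<HH')


def iter_biff_records(stream):
    """Yield (opcode, payload) for each BIFF record in a Workbook stream."""
    stream = bytearray(stream)  # indexing yields ints on Python 2 and 3
    offset = 0
    # Stop when fewer than 4 bytes remain (no complete header left).
    while offset + RECORD_HEADER.size <= len(stream):
        opcode, length = RECORD_HEADER.unpack_from(stream, offset)
        offset += RECORD_HEADER.size
        yield opcode, stream[offset:offset + length]
        offset += length


def describe_boundsheet(data):
    """Return (sheet type, visibility) for a BOUNDSHEET payload."""
    return (SHEET_TYPES.get(data[5], '%02x' % data[5]),
            SHEET_STATES.get(data[4], '%02x' % data[4]))
```

For example, a stream holding one BOUNDSHEET record with state byte 2 and type byte 1 is reported as `('Excel 4.0 macro sheet', 'very hidden')`. A type byte of 1 is also what sets `macros4Found` in the plugin.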
oletools/thirdparty/tablestream/tablestream.py
... ... @@ -55,8 +55,9 @@ from __future__ import print_function
55 55 # 2016-08-28 v0.07 PL: - support for both Python 2.6+ and 3.x
56 56 # - all cells are converted to unicode
57 57 # 2018-09-22 v0.08 PL: - removed mention to oletools' thirdparty folder
  58 +# 2019-03-27 v0.09 PL: - slight fix, TableStyleSlim inherits from TableStyle
58 59  
59   -__version__ = '0.08'
  60 +__version__ = '0.09'
60 61  
61 62 #------------------------------------------------------------------------------
62 63 # TODO:
... ... @@ -174,7 +175,7 @@ class TableStyle(object):
174 175 bottom_right = u'+'
175 176  
176 177  
177   -class TableStyleSlim(object):
  178 +class TableStyleSlim(TableStyle):
178 179 """
179 180 Style for a TableStream.
180 181 Example:
... ...
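The tablestream 0.09 change makes `TableStyleSlim` a subclass of `TableStyle`. The sketch below uses hypothetical, stripped-down classes. Only `bottom_right` appears in the diff, and the other attribute names are assumptions. It shows in general what the inheritance provides: any attribute the subclass does not redefine is looked up on the parent, and `issubclass`/`isinstance` checks against `TableStyle` succeed.

```python
class TableStyle(object):
    # Base style: every border character is defined here.
    top_left = u'+'
    horizontal = u'-'
    bottom_right = u'+'


class UnrelatedSlimStyle(object):
    # Hypothetical standalone class: it must redefine every attribute itself.
    horizontal = u'-'


class TableStyleSlim(TableStyle):
    # Subclass: override only what differs; the rest is inherited.
    top_left = u''


def border_chars(style):
    """Collect the border characters a renderer might look up (None if absent)."""
    return tuple(getattr(style, name, None)
                 for name in ('top_left', 'horizontal', 'bottom_right'))
```

Here `border_chars(TableStyleSlim)` picks up `bottom_right` from the parent. The unrelated class simply lacks it.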
oletools/thirdparty/xglob/xglob.py
1   -#! /usr/bin/env python2
2   -"""
3   -xglob
4   -
5   -xglob is a python package to list files matching wildcards (*, ?, []),
6   -extending the functionality of the glob module from the standard python
7   -library (https://docs.python.org/2/library/glob.html).
8   -
9   -Main features:
10   -- recursive file listing (including subfolders)
11   -- file listing within Zip archives
12   -- helper function to open files specified as arguments, supporting files
13   - within zip archives encrypted with a password
14   -
15   -Author: Philippe Lagadec - http://www.decalage.info
16   -License: BSD, see source code or documentation
17   -
18   -For more info and updates: http://www.decalage.info/xglob
19   -"""
20   -
21   -# LICENSE:
22   -#
23   -# xglob is copyright (c) 2013-2016, Philippe Lagadec (http://www.decalage.info)
24   -# All rights reserved.
25   -#
26   -# Redistribution and use in source and binary forms, with or without modification,
27   -# are permitted provided that the following conditions are met:
28   -#
29   -# * Redistributions of source code must retain the above copyright notice, this
30   -# list of conditions and the following disclaimer.
31   -# * Redistributions in binary form must reproduce the above copyright notice,
32   -# this list of conditions and the following disclaimer in the documentation
33   -# and/or other materials provided with the distribution.
34   -#
35   -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
36   -# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
37   -# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
38   -# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
39   -# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
40   -# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
41   -# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
42   -# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
43   -# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
44   -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
45   -
46   -
47   -#------------------------------------------------------------------------------
48   -# CHANGELOG:
49   -# 2013-12-04 v0.01 PL: - scan several files from command line args
50   -# 2014-01-14 v0.02 PL: - added riglob, ziglob
51   -# 2014-12-26 v0.03 PL: - moved code from balbuzard into a separate package
52   -# 2015-01-03 v0.04 PL: - fixed issues in iter_files + yield container name
53   -# 2016-02-24 v0.05 PL: - do not stop on exceptions, return them as data
54   -# - fixed issue when using wildcards with empty path
55   -# 2016-04-28 v0.06 CH: - improved handling of non-existing files
56   -# (by Christian Herdtweck)
57   -
58   -__version__ = '0.06'
59   -
60   -
61   -#=== IMPORTS =================================================================
62   -
63   -import os, fnmatch, glob, zipfile
64   -
65   -#=== EXCEPTIONS ==============================================================
66   -
67   -class PathNotFoundException(Exception):
68   - """ raised if given a fixed file/dir (not a glob) that does not exist """
69   - def __init__(self, path):
70   - super(PathNotFoundException, self).__init__(
71   - 'Given path does not exist: %r' % path)
72   -
73   -
74   -#=== FUNCTIONS ===============================================================
75   -
76   -# recursive glob function to find files in any subfolder:
77   -# inspired by http://stackoverflow.com/questions/14798220/how-can-i-search-sub-folders-using-glob-glob-module-in-python
78   -def rglob (path, pattern='*.*'):
79   - """
80   - Recursive glob:
81   - similar to glob.glob, but finds files recursively in all subfolders of path.
82   - path: root directory where to search files
83   - pattern: pattern for filenames, using wildcards, e.g. *.txt
84   - """
85   - #TODO: more compatible API with glob: use single param, split path from pattern
86   - return [os.path.join(dirpath, f)
87   - for dirpath, dirnames, files in os.walk(path)
88   - for f in fnmatch.filter(files, pattern)]
89   -
90   -
91   -def riglob (pathname):
92   - """
93   - Recursive iglob:
94   - similar to glob.iglob, but finds files recursively in all subfolders of path.
95   - pathname: root directory where to search files followed by pattern for
96   - filenames, using wildcards, e.g. *.txt
97   - """
98   - path, filespec = os.path.split(pathname)
99   - # fix path if empty:
100   - if path == '':
101   - path = '.'
102   - # print 'riglob: path=%r, filespec=%r' % (path, filespec)
103   - for dirpath, dirnames, files in os.walk(path):
104   - for f in fnmatch.filter(files, filespec):
105   - yield os.path.join(dirpath, f)
106   -
107   -
108   -def ziglob (zipfileobj, pathname):
109   - """
110   - iglob in a zip:
111   - similar to glob.iglob, but finds files within a zip archive.
112   - - zipfileobj: zipfile.ZipFile object
113   - - pathname: root directory where to search files followed by pattern for
114   - filenames, using wildcards, e.g. *.txt
115   - """
116   - files = zipfileobj.namelist()
117   - #for f in files: print f
118   - for f in fnmatch.filter(files, pathname):
119   - yield f
120   -
121   -
122   -def iter_files(files, recursive=False, zip_password=None, zip_fname='*'):
123   - """
124   - Open each file provided as argument:
125   - - files is a list of arguments
126   - - if zip_password is None, each file is listed without reading its content.
127   - Wilcards are supported.
128   - - if not, then each file is opened as a zip archive with the provided password
129   - - then files matching zip_fname are opened from the zip archive
130   -
131   - Iterator: yields (container, filename, data) for each file. If zip_password is None, then
132   - only the filename is returned, container and data=None. Otherwise container is the
133   - filename of the container (zip file), and data is the file content (or an exception).
134   - If a given filename is not a glob and does not exist, the triplet
135   - (None, filename, PathNotFoundException) is yielded. (Globs matching nothing
136   - do not trigger exceptions)
137   - """
138   - #TODO: catch exceptions and yield them for the caller (no file found, file is not zip, wrong password, etc)
139   - #TODO: use logging instead of printing
140   - #TODO: split in two simpler functions, the caller knows if it's a zip or not
141   - # print 'iter_files: files=%r, recursive=%s' % (files, recursive)
142   - # choose recursive or non-recursive iglob:
143   - if recursive:
144   - iglob = riglob
145   - else:
146   - iglob = glob.iglob
147   - for filespec in files:
148   - if not is_glob(filespec) and not os.path.exists(filespec):
149   - yield None, filespec, PathNotFoundException(filespec)
150   - continue
151   - for filename in iglob(filespec):
152   - if zip_password is not None:
153   - # Each file is expected to be a zip archive:
154   - #print 'Opening zip archive %s with provided password' % filename
155   - z = zipfile.ZipFile(filename, 'r')
156   - #print 'Looking for file(s) matching "%s"' % zip_fname
157   - for subfilename in ziglob(z, zip_fname):
158   - #print 'Opening file in zip archive:', filename
159   - try:
160   - data = z.read(subfilename, zip_password)
161   - yield filename, subfilename, data
162   - except Exception as e:
163   - yield filename, subfilename, e
164   - z.close()
165   - else:
166   - # normal file
167   - # do not read the file content, just yield the filename
168   - yield None, filename, None
169   - #print 'Opening file', filename
170   - #data = open(filename, 'rb').read()
171   - #yield None, filename, data
172   -
173   -
174   -def is_glob(filespec):
175   - """ determine if given file specification is a single file name or a glob
176   -
177   - python's glob and fnmatch can only interpret ?, *, [list], and [ra-nge],
178   - (and combinations: hex_*_[A-Fabcdef0-9]).
179   - The special chars *?[-] can only be escaped using []
180   - --> file_name is not a glob
181   - --> file?name is a glob
182   - --> file* is a glob
183   - --> file[-._]name is a glob
184   - --> file[?]name is not a glob (matches literal "file?name")
185   - --> file[*]name is not a glob (matches literal "file*name")
186   - --> file[-]name is not a glob (matches literal "file-name")
187   - --> file-name is not a glob
188   -
189   - Also, obviously incorrect globs are treated as non-globs
190   - --> file[name is not a glob (matches literal "file[name")
191   - --> file]-[name is treated as a glob
192   - (it is not a valid glob but detecting errors like this requires
193   - sophisticated regular expression matching)
194   -
195   - Python's glob also works with globs in directory-part of path
196   - --> dir-part of path is analyzed just like filename-part
197   - --> thirdparty/*/xglob.py is a (valid) glob
198   -
199   - TODO: create a correct regexp to test for validity of ranges
200   - """
201   -
202   - # remove escaped special chars
203   - cleaned = filespec.replace('[*]', '').replace('[?]', '') \
204   - .replace('[[]', '').replace('[]]', '').replace('[-]', '')
205   -
206   - # check if special chars remain
207   - return '*' in cleaned or '?' in cleaned or \
208   - ('[' in cleaned and ']' in cleaned)
  1 +#! /usr/bin/env python2
  2 +"""
  3 +xglob
  4 +
  5 +xglob is a python package to list files matching wildcards (*, ?, []),
  6 +extending the functionality of the glob module from the standard python
  7 +library (https://docs.python.org/2/library/glob.html).
  8 +
  9 +Main features:
  10 +- recursive file listing (including subfolders)
  11 +- file listing within Zip archives
  12 +- helper function to open files specified as arguments, supporting files
  13 + within zip archives encrypted with a password
  14 +
  15 +Author: Philippe Lagadec - http://www.decalage.info
  16 +License: BSD, see source code or documentation
  17 +
  18 +For more info and updates: http://www.decalage.info/xglob
  19 +"""
  20 +
  21 +# LICENSE:
  22 +#
  23 +# xglob is copyright (c) 2013-2018, Philippe Lagadec (http://www.decalage.info)
  24 +# All rights reserved.
  25 +#
  26 +# Redistribution and use in source and binary forms, with or without modification,
  27 +# are permitted provided that the following conditions are met:
  28 +#
  29 +# * Redistributions of source code must retain the above copyright notice, this
  30 +# list of conditions and the following disclaimer.
  31 +# * Redistributions in binary form must reproduce the above copyright notice,
  32 +# this list of conditions and the following disclaimer in the documentation
  33 +# and/or other materials provided with the distribution.
  34 +#
  35 +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
  36 +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  37 +# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
  38 +# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
  39 +# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  40 +# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  41 +# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  42 +# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
  43 +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  44 +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  45 +
  46 +
  47 +#------------------------------------------------------------------------------
  48 +# CHANGELOG:
  49 +# 2013-12-04 v0.01 PL: - scan several files from command line args
  50 +# 2014-01-14 v0.02 PL: - added riglob, ziglob
  51 +# 2014-12-26 v0.03 PL: - moved code from balbuzard into a separate package
  52 +# 2015-01-03 v0.04 PL: - fixed issues in iter_files + yield container name
  53 +# 2016-02-24 v0.05 PL: - do not stop on exceptions, return them as data
  54 +# - fixed issue when using wildcards with empty path
  55 +# 2016-04-28 v0.06 CH: - improved handling of non-existing files
  56 +# (by Christian Herdtweck)
  57 +# 2018-12-08 v0.07 PL: - fixed issue #373, zip password must be bytes
  58 +
  59 +__version__ = '0.07'
  60 +
  61 +
  62 +#=== IMPORTS =================================================================
  63 +
  64 +import os, fnmatch, glob, zipfile
  65 +
  66 +#=== EXCEPTIONS ==============================================================
  67 +
  68 +class PathNotFoundException(Exception):
  69 + """ raised if given a fixed file/dir (not a glob) that does not exist """
  70 + def __init__(self, path):
  71 + super(PathNotFoundException, self).__init__(
  72 + 'Given path does not exist: %r' % path)
  73 +
  74 +
  75 +#=== FUNCTIONS ===============================================================
  76 +
  77 +# recursive glob function to find files in any subfolder:
  78 +# inspired by http://stackoverflow.com/questions/14798220/how-can-i-search-sub-folders-using-glob-glob-module-in-python
  79 +def rglob (path, pattern='*.*'):
  80 + """
  81 + Recursive glob:
  82 + similar to glob.glob, but finds files recursively in all subfolders of path.
  83 + path: root directory where to search files
  84 + pattern: pattern for filenames, using wildcards, e.g. *.txt
  85 + """
  86 + #TODO: more compatible API with glob: use single param, split path from pattern
  87 + return [os.path.join(dirpath, f)
  88 + for dirpath, dirnames, files in os.walk(path)
  89 + for f in fnmatch.filter(files, pattern)]
  90 +
  91 +
  92 +def riglob (pathname):
  93 + """
  94 + Recursive iglob:
  95 + similar to glob.iglob, but finds files recursively in all subfolders of path.
  96 + pathname: root directory where to search files followed by pattern for
  97 + filenames, using wildcards, e.g. *.txt
  98 + """
  99 + path, filespec = os.path.split(pathname)
  100 + # fix path if empty:
  101 + if path == '':
  102 + path = '.'
  103 + # print 'riglob: path=%r, filespec=%r' % (path, filespec)
  104 + for dirpath, dirnames, files in os.walk(path):
  105 + for f in fnmatch.filter(files, filespec):
  106 + yield os.path.join(dirpath, f)
  107 +
  108 +
  109 +def ziglob (zipfileobj, pathname):
  110 + """
  111 + iglob in a zip:
  112 + similar to glob.iglob, but finds files within a zip archive.
  113 + - zipfileobj: zipfile.ZipFile object
  114 + - pathname: root directory where to search files followed by pattern for
  115 + filenames, using wildcards, e.g. *.txt
  116 + """
  117 + files = zipfileobj.namelist()
  118 + #for f in files: print f
  119 + for f in fnmatch.filter(files, pathname):
  120 + yield f
  121 +
  122 +
  123 +def iter_files(files, recursive=False, zip_password=None, zip_fname='*'):
  124 + """
  125 + Open each file provided as argument:
  126 + - files is a list of arguments
  127 + - if zip_password is None, each file is listed without reading its content.
  128 + Wildcards are supported.
  129 + - if not, then each file is opened as a zip archive with the provided password
  130 + - then files matching zip_fname are opened from the zip archive
  131 +
  132 + Iterator: yields (container, filename, data) for each file. If zip_password is None, then
  133 + only the filename is returned, container and data=None. Otherwise container is the
  134 + filename of the container (zip file), and data is the file content (or an exception).
  135 + If a given filename is not a glob and does not exist, the triplet
  136 + (None, filename, PathNotFoundException) is yielded. (Globs matching nothing
  137 + do not trigger exceptions)
  138 + """
  139 + #TODO: catch exceptions and yield them for the caller (no file found, file is not zip, wrong password, etc)
  140 + #TODO: use logging instead of printing
  141 + #TODO: split in two simpler functions, the caller knows if it's a zip or not
  142 + # print 'iter_files: files=%r, recursive=%s' % (files, recursive)
  143 + # choose recursive or non-recursive iglob:
  144 + if recursive:
  145 + iglob = riglob
  146 + else:
  147 + iglob = glob.iglob
  148 + for filespec in files:
  149 + if not is_glob(filespec) and not os.path.exists(filespec):
  150 + yield None, filespec, PathNotFoundException(filespec)
  151 + continue
  152 + for filename in iglob(filespec):
  153 + if zip_password is not None:
  154 + # Each file is expected to be a zip archive:
  155 + # The zip password must be bytes, not unicode/str:
  156 + if not isinstance(zip_password, bytes):
  157 + zip_password = zip_password.encode('utf8')
  158 + # print('Opening zip archive %s with provided password' % filename)
  159 + # print('zip password: %r' % zip_password)
  160 + # print(type(zip_password))
  161 + z = zipfile.ZipFile(filename, 'r')
  162 + #print 'Looking for file(s) matching "%s"' % zip_fname
  163 + for subfilename in ziglob(z, zip_fname):
  164 + #print 'Opening file in zip archive:', filename
  165 + try:
  166 + data = z.read(subfilename, zip_password)
  167 + yield filename, subfilename, data
  168 + except Exception as e:
  169 + yield filename, subfilename, e
  170 + z.close()
  171 + else:
  172 + # normal file
  173 + # do not read the file content, just yield the filename
  174 + yield None, filename, None
  175 + #print 'Opening file', filename
  176 + #data = open(filename, 'rb').read()
  177 + #yield None, filename, data
  178 +
  179 +
  180 +def is_glob(filespec):
  181 + """ determine if given file specification is a single file name or a glob
  182 +
  183 + python's glob and fnmatch can only interpret ?, *, [list], and [ra-nge],
  184 + (and combinations: hex_*_[A-Fabcdef0-9]).
  185 + The special chars *?[-] can only be escaped using []
  186 + --> file_name is not a glob
  187 + --> file?name is a glob
  188 + --> file* is a glob
  189 + --> file[-._]name is a glob
  190 + --> file[?]name is not a glob (matches literal "file?name")
  191 + --> file[*]name is not a glob (matches literal "file*name")
  192 + --> file[-]name is not a glob (matches literal "file-name")
  193 + --> file-name is not a glob
  194 +
  195 + Also, obviously incorrect globs are treated as non-globs
  196 + --> file[name is not a glob (matches literal "file[name")
  197 + --> file]-[name is treated as a glob
  198 + (it is not a valid glob but detecting errors like this requires
  199 + sophisticated regular expression matching)
  200 +
  201 + Python's glob also works with globs in directory-part of path
  202 + --> dir-part of path is analyzed just like filename-part
  203 + --> thirdparty/*/xglob.py is a (valid) glob
  204 +
  205 + TODO: create a correct regexp to test for validity of ranges
  206 + """
  207 +
  208 + # remove escaped special chars
  209 + cleaned = filespec.replace('[*]', '').replace('[?]', '') \
  210 + .replace('[[]', '').replace('[]]', '').replace('[-]', '')
  211 +
  212 + # check if special chars remain
  213 + return '*' in cleaned or '?' in cleaned or \
  214 + ('[' in cleaned and ']' in cleaned)
... ...
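The xglob 0.07 change (issue #373) coerces the zip password to bytes before it reaches `ZipFile.read()`, which on recent Python 3 versions rejects a `str` password with a `TypeError`. A minimal sketch of that coercion as a standalone helper follows. `to_zip_password` is a hypothetical name. It relies on `str.encode()`, which works for text on both Python 2 and 3. By contrast, `bytes(text, encoding=...)` is only accepted by Python 3, because on Python 2 `bytes` is an alias of `str`.

```python
def to_zip_password(password, encoding='utf8'):
    """Return a zip password as bytes, leaving None and bytes unchanged."""
    if password is None or isinstance(password, bytes):
        return password
    # Text password: encode it explicitly (portable across Python 2 and 3).
    return password.encode(encoding)
```

A caller would pass the result straight to `ZipFile.read(name, pwd)`. Keeping `None` as-is preserves the "no password" meaning that `iter_files` uses to decide whether to open each file as a zip archive.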
oletools/thirdparty/zipfile27/LICENSE.txt deleted
1   -Python 2.7 license
2   -
3   -This is the official license for the Python 2.7 release:
4   -
5   -A. HISTORY OF THE SOFTWARE
6   -==========================
7   -
8   -Python was created in the early 1990s by Guido van Rossum at Stichting
9   -Mathematisch Centrum (CWI, see http://www.cwi.nl) in the Netherlands
10   -as a successor of a language called ABC. Guido remains Python's
11   -principal author, although it includes many contributions from others.
12   -
13   -In 1995, Guido continued his work on Python at the Corporation for
14   -National Research Initiatives (CNRI, see http://www.cnri.reston.va.us)
15   -in Reston, Virginia where he released several versions of the
16   -software.
17   -
18   -In May 2000, Guido and the Python core development team moved to
19   -BeOpen.com to form the BeOpen PythonLabs team. In October of the same
20   -year, the PythonLabs team moved to Digital Creations (now Zope
21   -Corporation, see http://www.zope.com). In 2001, the Python Software
22   -Foundation (PSF, see http://www.python.org/psf/) was formed, a
23   -non-profit organization created specifically to own Python-related
24   -Intellectual Property. Zope Corporation is a sponsoring member of
25   -the PSF.
26   -
27   -All Python releases are Open Source (see http://www.opensource.org for
28   -the Open Source Definition). Historically, most, but not all, Python
29   -releases have also been GPL-compatible; the table below summarizes
30   -the various releases.
31   -
32   - Release Derived Year Owner GPL-
33   - from compatible? (1)
34   -
35   - 0.9.0 thru 1.2 1991-1995 CWI yes
36   - 1.3 thru 1.5.2 1.2 1995-1999 CNRI yes
37   - 1.6 1.5.2 2000 CNRI no
38   - 2.0 1.6 2000 BeOpen.com no
39   - 1.6.1 1.6 2001 CNRI yes (2)
40   - 2.1 2.0+1.6.1 2001 PSF no
41   - 2.0.1 2.0+1.6.1 2001 PSF yes
42   - 2.1.1 2.1+2.0.1 2001 PSF yes
43   - 2.2 2.1.1 2001 PSF yes
44   - 2.1.2 2.1.1 2002 PSF yes
45   - 2.1.3 2.1.2 2002 PSF yes
46   - 2.2.1 2.2 2002 PSF yes
47   - 2.2.2 2.2.1 2002 PSF yes
48   - 2.2.3 2.2.2 2003 PSF yes
49   - 2.3 2.2.2 2002-2003 PSF yes
50   - 2.3.1 2.3 2002-2003 PSF yes
51   - 2.3.2 2.3.1 2002-2003 PSF yes
52   - 2.3.3 2.3.2 2002-2003 PSF yes
53   - 2.3.4 2.3.3 2004 PSF yes
54   - 2.3.5 2.3.4 2005 PSF yes
55   - 2.4 2.3 2004 PSF yes
56   - 2.4.1 2.4 2005 PSF yes
57   - 2.4.2 2.4.1 2005 PSF yes
58   - 2.4.3 2.4.2 2006 PSF yes
59   - 2.5 2.4 2006 PSF yes
60   - 2.7 2.6 2010 PSF yes
61   -
62   -Footnotes:
63   -
64   -(1) GPL-compatible doesn't mean that we're distributing Python under
65   - the GPL. All Python licenses, unlike the GPL, let you distribute
66   - a modified version without making your changes open source. The
67   - GPL-compatible licenses make it possible to combine Python with
68   - other software that is released under the GPL; the others don't.
69   -
70   -(2) According to Richard Stallman, 1.6.1 is not GPL-compatible,
71   - because its license has a choice of law clause. According to
72   - CNRI, however, Stallman's lawyer has told CNRI's lawyer that 1.6.1
73   - is "not incompatible" with the GPL.
74   -
75   -Thanks to the many outside volunteers who have worked under Guido's
76   -direction to make these releases possible.
77   -
78   -
79   -B. TERMS AND CONDITIONS FOR ACCESSING OR OTHERWISE USING PYTHON
80   -===============================================================
81   -
82   -PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2
83   ---------------------------------------------
84   -
85   -1. This LICENSE AGREEMENT is between the Python Software Foundation
86   -("PSF"), and the Individual or Organization ("Licensee") accessing and
87   -otherwise using this software ("Python") in source or binary form and
88   -its associated documentation.
89   -
90   -2. Subject to the terms and conditions of this License Agreement, PSF
91   -hereby grants Licensee a nonexclusive, royalty-free, world-wide
92   -license to reproduce, analyze, test, perform and/or display publicly,
93   -prepare derivative works, distribute, and otherwise use Python
94   -alone or in any derivative version, provided, however, that PSF's
95   -License Agreement and PSF's notice of copyright, i.e., "Copyright (c)
96   -2001, 2002, 2003, 2004, 2005, 2006 Python Software Foundation; All Rights
97   -Reserved" are retained in Python alone or in any derivative version
98   -prepared by Licensee.
99   -
100   -3. In the event Licensee prepares a derivative work that is based on
101   -or incorporates Python or any part thereof, and wants to make
102   -the derivative work available to others as provided herein, then
103   -Licensee hereby agrees to include in any such work a brief summary of
104   -the changes made to Python.
105   -
106   -4. PSF is making Python available to Licensee on an "AS IS"
107   -basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
108   -IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND
109   -DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
110   -FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON WILL NOT
111   -INFRINGE ANY THIRD PARTY RIGHTS.
112   -
113   -5. PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON
114   -FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS
115   -A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON,
116   -OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
117   -
118   -6. This License Agreement will automatically terminate upon a material
119   -breach of its terms and conditions.
120   -
121   -7. Nothing in this License Agreement shall be deemed to create any
122   -relationship of agency, partnership, or joint venture between PSF and
123   -Licensee. This License Agreement does not grant permission to use PSF
124   -trademarks or trade name in a trademark sense to endorse or promote
125   -products or services of Licensee, or any third party.
126   -
127   -8. By copying, installing or otherwise using Python, Licensee
128   -agrees to be bound by the terms and conditions of this License
129   -Agreement.
130   -
131   -
132   -BEOPEN.COM LICENSE AGREEMENT FOR PYTHON 2.0
133   --------------------------------------------
134   -
135   -BEOPEN PYTHON OPEN SOURCE LICENSE AGREEMENT VERSION 1
136   -
137   -1. This LICENSE AGREEMENT is between BeOpen.com ("BeOpen"), having an
138   -office at 160 Saratoga Avenue, Santa Clara, CA 95051, and the
139   -Individual or Organization ("Licensee") accessing and otherwise using
140   -this software in source or binary form and its associated
141   -documentation ("the Software").
142   -
143   -2. Subject to the terms and conditions of this BeOpen Python License
144   -Agreement, BeOpen hereby grants Licensee a non-exclusive,
145   -royalty-free, world-wide license to reproduce, analyze, test, perform
146   -and/or display publicly, prepare derivative works, distribute, and
147   -otherwise use the Software alone or in any derivative version,
148   -provided, however, that the BeOpen Python License is retained in the
149   -Software, alone or in any derivative version prepared by Licensee.
150   -
151   -3. BeOpen is making the Software available to Licensee on an "AS IS"
152   -basis. BEOPEN MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
153   -IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, BEOPEN MAKES NO AND
154   -DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
155   -FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE WILL NOT
156   -INFRINGE ANY THIRD PARTY RIGHTS.
157   -
158   -4. BEOPEN SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF THE
159   -SOFTWARE FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS
160   -AS A RESULT OF USING, MODIFYING OR DISTRIBUTING THE SOFTWARE, OR ANY
161   -DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
162   -
163   -5. This License Agreement will automatically terminate upon a material
164   -breach of its terms and conditions.
165   -
166   -6. This License Agreement shall be governed by and interpreted in all
167   -respects by the law of the State of California, excluding conflict of
168   -law provisions. Nothing in this License Agreement shall be deemed to
169   -create any relationship of agency, partnership, or joint venture
170   -between BeOpen and Licensee. This License Agreement does not grant
171   -permission to use BeOpen trademarks or trade names in a trademark
172   -sense to endorse or promote products or services of Licensee, or any
173   -third party. As an exception, the "BeOpen Python" logos available at
174   -http://www.pythonlabs.com/logos.html may be used according to the
175   -permissions granted on that web page.
176   -
177   -7. By copying, installing or otherwise using the software, Licensee
178   -agrees to be bound by the terms and conditions of this License
179   -Agreement.
180   -
181   -
182   -CNRI LICENSE AGREEMENT FOR PYTHON 1.6.1
183   ----------------------------------------
184   -
185   -1. This LICENSE AGREEMENT is between the Corporation for National
186   -Research Initiatives, having an office at 1895 Preston White Drive,
187   -Reston, VA 20191 ("CNRI"), and the Individual or Organization
188   -("Licensee") accessing and otherwise using Python 1.6.1 software in
189   -source or binary form and its associated documentation.
190   -
191   -2. Subject to the terms and conditions of this License Agreement, CNRI
192   -hereby grants Licensee a nonexclusive, royalty-free, world-wide
193   -license to reproduce, analyze, test, perform and/or display publicly,
194   -prepare derivative works, distribute, and otherwise use Python 1.6.1
195   -alone or in any derivative version, provided, however, that CNRI's
196   -License Agreement and CNRI's notice of copyright, i.e., "Copyright (c)
197   -1995-2001 Corporation for National Research Initiatives; All Rights
198   -Reserved" are retained in Python 1.6.1 alone or in any derivative
199   -version prepared by Licensee. Alternately, in lieu of CNRI's License
200   -Agreement, Licensee may substitute the following text (omitting the
201   -quotes): "Python 1.6.1 is made available subject to the terms and
202   -conditions in CNRI's License Agreement. This Agreement together with
203   -Python 1.6.1 may be located on the Internet using the following
204   -unique, persistent identifier (known as a handle): 1895.22/1013. This
205   -Agreement may also be obtained from a proxy server on the Internet
206   -using the following URL: http://hdl.handle.net/1895.22/1013".
207   -
208   -3. In the event Licensee prepares a derivative work that is based on
209   -or incorporates Python 1.6.1 or any part thereof, and wants to make
210   -the derivative work available to others as provided herein, then
211   -Licensee hereby agrees to include in any such work a brief summary of
212   -the changes made to Python 1.6.1.
213   -
214   -4. CNRI is making Python 1.6.1 available to Licensee on an "AS IS"
215   -basis. CNRI MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR
216   -IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, CNRI MAKES NO AND
217   -DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS
218   -FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON 1.6.1 WILL NOT
219   -INFRINGE ANY THIRD PARTY RIGHTS.
220   -
221   -5. CNRI SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON
222   -1.6.1 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS
223   -A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 1.6.1,
224   -OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.
225   -
226   -6. This License Agreement will automatically terminate upon a material
227   -breach of its terms and conditions.
228   -
229   -7. This License Agreement shall be governed by the federal
230   -intellectual property law of the United States, including without
231   -limitation the federal copyright law, and, to the extent such
232   -U.S. federal law does not apply, by the law of the Commonwealth of
233   -Virginia, excluding Virginia's conflict of law provisions.
234   -Notwithstanding the foregoing, with regard to derivative works based
235   -on Python 1.6.1 that incorporate non-separable material that was
236   -previously distributed under the GNU General Public License (GPL), the
237   -law of the Commonwealth of Virginia shall govern this License
238   -Agreement only as to issues arising under or with respect to
239   -Paragraphs 4, 5, and 7 of this License Agreement. Nothing in this
240   -License Agreement shall be deemed to create any relationship of
241   -agency, partnership, or joint venture between CNRI and Licensee. This
242   -License Agreement does not grant permission to use CNRI trademarks or
243   -trade name in a trademark sense to endorse or promote products or
244   -services of Licensee, or any third party.
245   -
246   -8. By clicking on the "ACCEPT" button where indicated, or by copying,
247   -installing or otherwise using Python 1.6.1, Licensee agrees to be
248   -bound by the terms and conditions of this License Agreement.
249   -
250   - ACCEPT
251   -
252   -
253   -CWI LICENSE AGREEMENT FOR PYTHON 0.9.0 THROUGH 1.2
254   ---------------------------------------------------
255   -
256   -Copyright (c) 1991 - 1995, Stichting Mathematisch Centrum Amsterdam,
257   -The Netherlands. All rights reserved.
258   -
259   -Permission to use, copy, modify, and distribute this software and its
260   -documentation for any purpose and without fee is hereby granted,
261   -provided that the above copyright notice appear in all copies and that
262   -both that copyright notice and this permission notice appear in
263   -supporting documentation, and that the name of Stichting Mathematisch
264   -Centrum or CWI not be used in advertising or publicity pertaining to
265   -distribution of the software without specific, written prior
266   -permission.
267   -
268   -STICHTING MATHEMATISCH CENTRUM DISCLAIMS ALL WARRANTIES WITH REGARD TO
269   -THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
270   -FITNESS, IN NO EVENT SHALL STICHTING MATHEMATISCH CENTRUM BE LIABLE
271   -FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
272   -WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
273   -ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT
274   -OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
275   -
oletools/thirdparty/zipfile27/__init__.py deleted
1   -# Excerpt from the zipfile module from Python 2.7, to enable is_zipfile
2   -# to check any file object (e.g. in memory), for Python 2.6.
3   -# is_zipfile in Python 2.6 can only check files on disk.
4   -
5   -# This code from Python 2.7 was not modified.
6   -
7   -# 2016-09-06 v0.01 PL: - first version
8   -
9   -
10   -from zipfile import _EndRecData
11   -
12   -def _check_zipfile(fp):
13   - try:
14   - if _EndRecData(fp):
15   - return True # file has correct magic number
16   - except IOError:
17   - pass
18   - return False
19   -
20   -def is_zipfile(filename):
21   - """Quickly see if a file is a ZIP file by checking the magic number.
22   -
23   - The filename argument may be a file or file-like object too.
24   - """
25   - result = False
26   - try:
27   - if hasattr(filename, "read"):
28   - result = _check_zipfile(fp=filename)
29   - else:
30   - with open(filename, "rb") as fp:
31   - result = _check_zipfile(fp)
32   - except IOError:
33   - pass
34   - return result
35   -
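The backport deleted above existed only because `is_zipfile` in Python 2.6 could check files on disk and nothing else. Since Python 2.7, and in every Python 3 version, the standard `zipfile.is_zipfile` also accepts file-like objects. That is presumably why the package can go now that INSTALL.txt no longer lists Python 2.6 as supported. Here is a minimal sketch, using only the standard library, that checks in-memory buffers:

```python
import io
import zipfile


def make_zip_in_memory():
    """Build a tiny ZIP archive in a BytesIO buffer (never touches disk)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zf:
        zf.writestr('hello.txt', 'hello')
    # ZipFile does not close a caller-supplied file object, so buf stays usable
    return buf


# is_zipfile() accepts a file-like object directly (Python 2.7+ / 3.x)
print(zipfile.is_zipfile(make_zip_in_memory()))            # True
print(zipfile.is_zipfile(io.BytesIO(b'plain text data')))  # False
```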
oletools/xls_parser.py
... ... @@ -5,7 +5,7 @@ Read storages, (sub-)streams, records from xls file
5 5 #
6 6 # === LICENSE ==================================================================
7 7  
8   -# xls_parser is copyright (c) 2014-2018 Philippe Lagadec (http://www.decalage.info)
  8 +# xls_parser is copyright (c) 2014-2019 Philippe Lagadec (http://www.decalage.info)
9 9 # All rights reserved.
10 10 #
11 11 # Redistribution and use in source and binary forms, with or without modification,
... ... @@ -33,8 +33,10 @@ Read storages, (sub-)streams, records from xls file
33 33 # 2017-11-02 v0.1 CH: - first version
34 34 # 2017-11-02 v0.2 CH: - move some code to record_base.py
35 35 # (to avoid copy-and-paste in ppt_parser.py)
  36 +# 2019-01-30 v0.54 PL: - fixed import to avoid mixing installed oletools
  37 +# and dev version
36 38  
37   -__version__ = '0.2'
  39 +__version__ = '0.54'
38 40  
39 41 # -----------------------------------------------------------------------------
40 42 # TODO:
... ... @@ -56,17 +58,14 @@ import os.path
56 58 from struct import unpack
57 59 import logging
58 60  
59   -try:
60   - from oletools import record_base
61   -except ImportError:
62   - # little hack to allow absolute imports even if oletools is not installed.
63   - # Copied from olevba.py
64   - PARENT_DIR = os.path.normpath(os.path.dirname(os.path.dirname(
65   - os.path.abspath(__file__))))
66   - if PARENT_DIR not in sys.path:
67   - sys.path.insert(0, PARENT_DIR)
68   - del PARENT_DIR
69   - from oletools import record_base
  61 +# little hack to allow absolute imports even if oletools is not installed.
  62 +# Copied from olevba.py
  63 +PARENT_DIR = os.path.normpath(os.path.dirname(os.path.dirname(
  64 + os.path.abspath(__file__))))
  65 +if PARENT_DIR not in sys.path:
  66 + sys.path.insert(0, PARENT_DIR)
  67 +del PARENT_DIR
  68 +from oletools import record_base
70 69  
71 70  
72 71 # === PYTHON 2+3 SUPPORT ======================================================
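The v0.54 changelog entry above explains this hunk. The `sys.path` hack no longer runs only after an `ImportError`. It now runs unconditionally, so a development checkout takes precedence over an installed oletools. The sketch below wraps the same logic in a reusable function. The name `ensure_parent_on_path` is hypothetical and not part of oletools. Like the original, it does not move an entry that is already present further down `sys.path`.

```python
import os.path
import sys


def ensure_parent_on_path(module_file):
    """Put the directory two levels above module_file at the front of sys.path.

    For a file like <root>/oletools/xls_parser.py this yields <root>, so that
    `from oletools import ...` resolves to this checkout first.
    """
    parent_dir = os.path.normpath(os.path.dirname(os.path.dirname(
        os.path.abspath(module_file))))
    if parent_dir not in sys.path:
        sys.path.insert(0, parent_dir)
    return parent_dir


root = ensure_parent_on_path(os.path.join('checkout', 'oletools', 'xls_parser.py'))
print(os.path.basename(root))  # checkout
```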
... ... @@ -89,12 +88,18 @@ def is_xls(filename):
89 88 substream.
90 89 See also: oleid.OleID.check_excel
91 90 """
  91 + xls_file = None
92 92 try:
93   - for stream in XlsFile(filename).iter_streams():
  93 + xls_file = XlsFile(filename)
  94 + for stream in xls_file.iter_streams():
94 95 if isinstance(stream, WorkbookStream):
95 96 return True
96 97 except Exception:
97   - pass
  98 + logging.debug('Ignoring exception in is_xls, assume is not xls',
  99 + exc_info=True)
  100 + finally:
  101 + if xls_file is not None:
  102 + xls_file.close()
98 103 return False
99 104  
100 105  
... ... @@ -102,7 +107,7 @@ def read_unicode(data, start_idx, n_chars):
102 107 """ read a unicode string from a XLUnicodeStringNoCch structure """
103 108 # first bit 0x0 --> only low-bytes are saved, all high bytes are 0
104 109 # first bit 0x1 --> 2 bytes per character
105   - low_bytes_only = (ord(data[start_idx]) == 0)
  110 + low_bytes_only = (ord(data[start_idx:start_idx+1]) == 0)
106 111 if low_bytes_only:
107 112 end_idx = start_idx + 1 + n_chars
108 113 return data[start_idx+1:end_idx].decode('ascii'), end_idx
... ... @@ -350,6 +355,7 @@ class XlsRecordSupBook(XlsRecord):
350 355 LINK_TYPE_EXTERNAL = 'external workbook'
351 356  
352 357 def finish_constructing(self, _):
  358 + """Finish constructing this record; called at end of constructor."""
353 359 # set defaults
354 360 self.ctab = None
355 361 self.cch = None
... ...
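The one-line change in `read_unicode` fixes a Python 3 incompatibility. On Python 3, indexing `bytes` returns an `int`, so `ord(data[i])` raises `TypeError`. A one-byte slice, by contrast, is `bytes` on Python 3 and `str` on Python 2, and `ord()` accepts it on both. Below is a sketch of the whole function, written for illustration and not copied from oletools. The diff shows only the single-byte branch. The two-byte branch here assumes UTF-16LE, based on the "2 bytes per character" comment.

```python
def read_unicode_sketch(data, start_idx, n_chars):
    """Decode an XLUnicodeStringNoCch-like field; return (text, end_idx).

    Hypothetical reimplementation for illustration, not the oletools code.
    """
    # data[start_idx] would be an int on Python 3; a 1-byte slice works on 2 and 3
    low_bytes_only = (ord(data[start_idx:start_idx + 1]) == 0)
    if low_bytes_only:
        # flag 0x0: one byte per character, high bytes all zero
        end_idx = start_idx + 1 + n_chars
        return data[start_idx + 1:end_idx].decode('ascii'), end_idx
    # flag 0x1: two bytes per character (assumed UTF-16LE)
    end_idx = start_idx + 1 + 2 * n_chars
    return data[start_idx + 1:end_idx].decode('utf-16-le'), end_idx


print(read_unicode_sketch(b'\x00abc', 0, 3))         # ('abc', 4)
print(read_unicode_sketch(b'\x01a\x00b\x00', 0, 2))  # ('ab', 5)
```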
requirements.txt
1 1 pyparsing>=2.2.0
2   -olefile>=0.45
  2 +olefile>=0.46
  3 +easygui
  4 +colorclass
  5 +msoffcrypto-tool
  6 +pcodedmp>=1.2.5
3 7 \ No newline at end of file
... ...
setup.py
... ... @@ -28,6 +28,9 @@ to install this package.
28 28 # 2018-09-15 PL: - easygui is now a dependency
29 29 # 2018-09-22 PL: - colorclass is now a dependency
30 30 # 2018-10-27 PL: - fixed issue #359 (bug when importing log_helper)
  31 +# 2019-02-26 CH: - add optional dependency msoffcrypto for decryption
  32 +# 2019-05-22 PL: - 'msoffcrypto-tool' is now a required dependency
  33 +# 2019-05-23 v0.55 PL: - added pcodedmp as dependency
31 34  
32 35 #--- TODO ---------------------------------------------------------------------
33 36  
... ... @@ -47,7 +50,7 @@ import os, fnmatch
47 50 #--- METADATA -----------------------------------------------------------------
48 51  
49 52 name = "oletools"
50   -version = '0.54dev4'
  53 +version = '0.55.dev3'
51 54 desc = "Python tools to analyze security characteristics of MS Office and OLE files (also called Structured Storage, Compound File Binary Format or Compound Document File Format), for Malware Analysis and Incident Response #DFIR"
52 55 long_desc = open('oletools/README.rst').read()
53 56 author = "Philippe Lagadec"
... ... @@ -73,6 +76,7 @@ classifiers=[
73 76 "Programming Language :: Python :: 3.4",
74 77 "Programming Language :: Python :: 3.5",
75 78 "Programming Language :: Python :: 3.6",
  79 + "Programming Language :: Python :: 3.7",
76 80 "Topic :: Security",
77 81 "Topic :: Software Development :: Libraries :: Python Modules",
78 82 ]
... ... @@ -89,7 +93,7 @@ packages=[
89 93 'oletools.thirdparty.xglob',
90 94 'oletools.thirdparty.DridexUrlDecoder',
91 95 'oletools.thirdparty.tablestream',
92   - 'oletools.thirdparty.zipfile27',
  96 + 'oletools.thirdparty.oledump',
93 97 ]
94 98 ##setupdir = '.'
95 99 ##package_dir={'': setupdir}
... ... @@ -177,9 +181,6 @@ package_data={
177 181 'oletools.thirdparty.DridexUrlDecoder': [
178 182 'LICENSE.txt',
179 183 ],
180   - 'oletools.thirdparty.zipfile27': [
181   - 'LICENSE.txt',
182   - ],
183 184 # 'oletools.thirdparty.tablestream': [
184 185 # 'LICENSE', 'README',
185 186 # ],
... ... @@ -305,11 +306,11 @@ def main():
305 306 author_email=author_email,
306 307 url=url,
307 308 license=license,
308   -## package_dir=package_dir,
  309 + # package_dir=package_dir,
309 310 packages=packages,
310 311 package_data = package_data,
311 312 download_url=download_url,
312   -# data_files=data_files,
  313 + # data_files=data_files,
313 314 entry_points=entry_points,
314 315 test_suite="tests",
315 316 # scripts=scripts,
... ... @@ -318,6 +319,8 @@ def main():
318 319 "olefile>=0.46",
319 320 "easygui",
320 321 'colorclass',
  322 + 'msoffcrypto-tool',
  323 + 'pcodedmp>=1.2.5',
321 324 ],
322 325 )
323 326  
... ...
tests/common/log_helper/log_helper_test_imported.py
... ... @@ -11,6 +11,8 @@ INFO_MESSAGE = 'imported: info log'
11 11 WARNING_MESSAGE = 'imported: warning log'
12 12 ERROR_MESSAGE = 'imported: error log'
13 13 CRITICAL_MESSAGE = 'imported: critical log'
  14 +RESULT_MESSAGE = 'imported: result log'
  15 +RESULT_TYPE = 'imported: result'
14 16  
15 17 logger = log_helper.get_or_create_silent_logger('test_imported', logging.ERROR)
16 18  
... ... @@ -21,3 +23,4 @@ def log():
21 23 logger.warning(WARNING_MESSAGE)
22 24 logger.error(ERROR_MESSAGE)
23 25 logger.critical(CRITICAL_MESSAGE)
  26 + logger.info(RESULT_MESSAGE, type=RESULT_TYPE)
... ...
tests/common/log_helper/log_helper_test_main.py
... ... @@ -9,6 +9,8 @@ INFO_MESSAGE = 'main: info log'
9 9 WARNING_MESSAGE = 'main: warning log'
10 10 ERROR_MESSAGE = 'main: error log'
11 11 CRITICAL_MESSAGE = 'main: critical log'
  12 +RESULT_MESSAGE = 'main: result log'
  13 +RESULT_TYPE = 'main: result'
12 14  
13 15 logger = log_helper.get_or_create_silent_logger('test_main')
14 16  
... ... @@ -32,12 +34,16 @@ def init_logging_and_log(args):
32 34 level = args[-1]
33 35 use_json = 'as-json' in args
34 36 throw = 'throw' in args
  37 + percent_autoformat = '%-autoformat' in args
35 38  
36 39 if 'enable' in args:
37 40 log_helper.enable_logging(use_json, level, stream=sys.stdout)
38 41  
39 42 _log()
40 43  
  44 + if percent_autoformat:
  45 + logger.info('The %s is %d.', 'answer', 47)
  46 +
41 47 if throw:
42 48 raise Exception('An exception occurred before ending the logging')
43 49  
... ... @@ -50,6 +56,7 @@ def _log():
50 56 logger.warning(WARNING_MESSAGE)
51 57 logger.error(ERROR_MESSAGE)
52 58 logger.critical(CRITICAL_MESSAGE)
  59 + logger.info(RESULT_MESSAGE, type=RESULT_TYPE)
53 60 log_helper_test_imported.log()
54 61  
55 62  
... ...
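The new `%-autoformat` case checks that positional arguments are merged into the logged message. This is the same lazy behaviour the standard `logging` module has. The sketch below uses only the standard library, not oletools' `log_helper`. A `LogRecord` stores the template and the arguments separately, and `getMessage()` applies `msg % args` only when the record is rendered.

```python
import logging

# Build a record directly, as a logger would for
# logger.info('The %s is %d.', 'answer', 47)
record = logging.LogRecord(
    name='demo', level=logging.INFO, pathname='demo.py', lineno=0,
    msg='The %s is %d.', args=('answer', 47), exc_info=None)

print(record.msg)           # The %s is %d.
print(record.getMessage())  # The answer is 47.
```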
tests/common/log_helper/test_log_helper.py
... ... @@ -13,9 +13,11 @@ from tests.common.log_helper import log_helper_test_main
13 13 from tests.common.log_helper import log_helper_test_imported
14 14 from os.path import dirname, join, relpath, abspath
15 15  
  16 +from tests.test_utils import PROJECT_ROOT
  17 +
16 18 # this is the common base of "tests" and "oletools" dirs
17   -ROOT_DIRECTORY = abspath(join(__file__, '..', '..', '..', '..'))
18   -TEST_FILE = relpath(join(dirname(__file__), 'log_helper_test_main.py'), ROOT_DIRECTORY)
  19 +TEST_FILE = relpath(join(dirname(abspath(__file__)), 'log_helper_test_main.py'),
  20 + PROJECT_ROOT)
19 21 PYTHON_EXECUTABLE = sys.executable
20 22  
21 23 MAIN_LOG_MESSAGES = [
... ... @@ -59,6 +61,62 @@ class TestLogHelper(unittest.TestCase):
59 61 log_helper_test_imported.CRITICAL_MESSAGE
60 62 ])
61 63  
  64 + def test_logs_type_ignored(self):
  65 + """Run test script with logging enabled at info level. Want no type."""
  66 + output = self._run_test(['enable', 'info'])
  67 +
  68 + expect = '\n'.join([
  69 + 'INFO ' + log_helper_test_main.INFO_MESSAGE,
  70 + 'WARNING ' + log_helper_test_main.WARNING_MESSAGE,
  71 + 'ERROR ' + log_helper_test_main.ERROR_MESSAGE,
  72 + 'CRITICAL ' + log_helper_test_main.CRITICAL_MESSAGE,
  73 + 'INFO ' + log_helper_test_main.RESULT_MESSAGE,
  74 + 'INFO ' + log_helper_test_imported.INFO_MESSAGE,
  75 + 'WARNING ' + log_helper_test_imported.WARNING_MESSAGE,
  76 + 'ERROR ' + log_helper_test_imported.ERROR_MESSAGE,
  77 + 'CRITICAL ' + log_helper_test_imported.CRITICAL_MESSAGE,
  78 + 'INFO ' + log_helper_test_imported.RESULT_MESSAGE,
  79 + ])
  80 + self.assertEqual(output, expect)
  81 +
  82 + def test_logs_type_in_json(self):
  83 + """Check type field is contained in json log."""
  84 + output = self._run_test(['enable', 'as-json', 'info'])
  85 +
  86 + # convert to json preserving order of output
  87 + jout = json.loads(output)
  88 +
  89 + jexpect = [
  90 + dict(type='msg', level='INFO',
  91 + msg=log_helper_test_main.INFO_MESSAGE),
  92 + dict(type='msg', level='WARNING',
  93 + msg=log_helper_test_main.WARNING_MESSAGE),
  94 + dict(type='msg', level='ERROR',
  95 + msg=log_helper_test_main.ERROR_MESSAGE),
  96 + dict(type='msg', level='CRITICAL',
  97 + msg=log_helper_test_main.CRITICAL_MESSAGE),
  98 + # this is the important entry (has a different "type" field):
  99 + dict(type=log_helper_test_main.RESULT_TYPE, level='INFO',
  100 + msg=log_helper_test_main.RESULT_MESSAGE),
  101 + dict(type='msg', level='INFO',
  102 + msg=log_helper_test_imported.INFO_MESSAGE),
  103 + dict(type='msg', level='WARNING',
  104 + msg=log_helper_test_imported.WARNING_MESSAGE),
  105 + dict(type='msg', level='ERROR',
  106 + msg=log_helper_test_imported.ERROR_MESSAGE),
  107 + dict(type='msg', level='CRITICAL',
  108 + msg=log_helper_test_imported.CRITICAL_MESSAGE),
  109 + # ... and this:
  110 + dict(type=log_helper_test_imported.RESULT_TYPE, level='INFO',
  111 + msg=log_helper_test_imported.RESULT_MESSAGE),
  112 + ]
  113 + self.assertEqual(jout, jexpect)
  114 +
  115 + def test_percent_autoformat(self):
  116 + """Test that auto-formatting of log strings with `%` works."""
  117 + output = self._run_test(['enable', '%-autoformat', 'info'])
  118 + self.assertIn('The answer is 47.', output)
  119 +
62 120 def test_json_correct_on_exceptions(self):
63 121 """
64 122 Test that even on unhandled exceptions our JSON is always correct
... ... @@ -72,10 +130,10 @@ class TestLogHelper(unittest.TestCase):
72 130 def _assert_json_messages(self, output, messages):
73 131 try:
74 132 json_data = json.loads(output)
75   - self.assertEquals(len(json_data), len(messages))
  133 + self.assertEqual(len(json_data), len(messages))
76 134  
77 135 for i in range(len(messages)):
78   - self.assertEquals(messages[i], json_data[i]['msg'])
  136 + self.assertEqual(messages[i], json_data[i]['msg'])
79 137 except ValueError:
80 138 self.fail('Invalid json:\n' + output)
81 139  
... ... @@ -90,9 +148,9 @@ class TestLogHelper(unittest.TestCase):
90 148 child = subprocess.Popen(
91 149 [PYTHON_EXECUTABLE, TEST_FILE] + args,
92 150 shell=False,
93   - env={'PYTHONPATH': ROOT_DIRECTORY},
  151 + env={'PYTHONPATH': PROJECT_ROOT},
94 152 universal_newlines=True,
95   - cwd=ROOT_DIRECTORY,
  153 + cwd=PROJECT_ROOT,
96 154 stdin=None,
97 155 stdout=subprocess.PIPE,
98 156 stderr=subprocess.PIPE
... ... @@ -102,7 +160,7 @@ class TestLogHelper(unittest.TestCase):
102 160 if not isinstance(output, str):
103 161 output = output.decode('utf-8')
104 162  
105   - self.assertEquals(child.returncode == 0, should_succeed)
  163 + self.assertEqual(child.returncode == 0, should_succeed)
106 164  
107 165 return output.strip()
108 166  
... ...
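The new JSON test expects every entry to carry a `type` field. It defaults to `'msg'` and can be overridden per call with `type=...`, a keyword that oletools' own `log_helper` loggers accept (see `log_helper_test_main.py` above). Standard `logging` loggers reject a `type` keyword. The sketch below therefore substitutes the stdlib `extra=` mechanism and a hypothetical formatter to get the same per-entry shape. It writes one JSON object per line rather than one JSON array. It is not how `log_helper` is implemented.

```python
import io
import json
import logging


class JsonTypeFormatter(logging.Formatter):
    """Hypothetical formatter: one JSON object per record, with a 'type' field."""

    def format(self, record):
        return json.dumps({
            'type': getattr(record, 'type', 'msg'),  # default type is 'msg'
            'level': record.levelname,
            'msg': record.getMessage(),  # also applies %-style args
        })


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonTypeFormatter())
logger = logging.getLogger('json_type_demo')
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output away from the root logger
logger.addHandler(handler)

logger.info('main: info log')
logger.info('main: result log', extra={'type': 'main: result'})

entries = [json.loads(line) for line in stream.getvalue().splitlines()]
print(entries[1]['type'])  # main: result
```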
tests/msodde/test_basic.py
... ... @@ -9,11 +9,16 @@ Ensure that
9 9 from __future__ import print_function
10 10  
11 11 import unittest
12   -from oletools import msodde
13   -from tests.test_utils import DATA_BASE_DIR as BASE_DIR
  12 +import sys
14 13 import os
15   -from os.path import join
  14 +from os.path import join, basename
16 15 from traceback import print_exc
  16 +import json
  17 +from collections import OrderedDict
  18 +from oletools import msodde
  19 +from oletools.crypto import \
  20 + WrongEncryptionPassword, CryptoLibNotImported, check_msoffcrypto
  21 +from tests.test_utils import call_and_capture, DATA_BASE_DIR as BASE_DIR
17 22  
18 23  
19 24 class TestReturnCode(unittest.TestCase):
... ... @@ -46,15 +51,21 @@ class TestReturnCode(unittest.TestCase):
46 51  
47 52 def test_invalid_none(self):
48 53 """ check that no file argument leads to non-zero exit status """
49   - self.do_test_validity('', True)
  54 + if sys.hexversion > 0x03030000: # version 3.3 and higher
  55 + # different errors probably depending on whether msoffcrypto is
  56 + # available or not
  57 + expect_error = (AttributeError, FileNotFoundError)
  58 + else:
  59 + expect_error = (AttributeError, IOError)
  60 + self.do_test_validity('', expect_error)
50 61  
51 62 def test_invalid_empty(self):
52 63 """ check that empty file argument leads to non-zero exit status """
53   - self.do_test_validity(join(BASE_DIR, 'basic/empty'), True)
  64 + self.do_test_validity(join(BASE_DIR, 'basic/empty'), Exception)
54 65  
55 66 def test_invalid_text(self):
56 67 """ check that text file argument leads to non-zero exit status """
57   - self.do_test_validity(join(BASE_DIR, 'basic/text'), True)
  68 + self.do_test_validity(join(BASE_DIR, 'basic/text'), Exception)
58 69  
59 70 def test_encrypted(self):
60 71 """
... ... @@ -64,28 +75,56 @@ class TestReturnCode(unittest.TestCase):
64 75 Encryption) is tested.
65 76 """
66 77 CRYPT_DIR = join(BASE_DIR, 'encrypted')
67   - ADD_ARGS = '', '-j', '-d', '-f', '-a'
  78 + have_crypto = check_msoffcrypto()
68 79 for filename in os.listdir(CRYPT_DIR):
69   - full_name = join(CRYPT_DIR, filename)
70   - for args in ADD_ARGS:
71   - self.do_test_validity(args + ' ' + full_name, True)
72   -
73   - def do_test_validity(self, args, expect_error=False):
74   - """ helper for test_valid_doc[x] """
75   - have_exception = False
  80 + if have_crypto and 'standardpassword' in filename:
  81 + # these are automagically decrypted
  82 + self.do_test_validity(join(CRYPT_DIR, filename))
  83 + elif have_crypto:
  84 + self.do_test_validity(join(CRYPT_DIR, filename),
  85 + WrongEncryptionPassword)
  86 + else:
  87 + self.do_test_validity(join(CRYPT_DIR, filename),
  88 + CryptoLibNotImported)
  89 +
  90 + def do_test_validity(self, filename, expect_error=None):
  91 + """ helper for test_[in]valid_* """
  92 + found_error = None
  93 + # DEBUG: print('Testing file {}'.format(filename))
76 94 try:
77   - msodde.process_file(args, msodde.FIELD_FILTER_BLACKLIST)
78   - except Exception:
79   - have_exception = True
80   - print_exc()
81   - except SystemExit as exc: # sys.exit() was called
82   - have_exception = True
83   - if exc.code is None:
84   - have_exception = False
85   -
86   - self.assertEqual(expect_error, have_exception,
87   - msg='Args={0}, expect={1}, exc={2}'
88   - .format(args, expect_error, have_exception))
  95 + msodde.process_maybe_encrypted(filename,
  96 + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
  97 + except Exception as exc:
  98 + found_error = exc
  99 + # DEBUG: print_exc()
  100 +
  101 + if expect_error and not found_error:
  102 + self.fail('Expected {} but msodde finished without errors for {}'
  103 + .format(expect_error, filename))
  104 + elif not expect_error and found_error:
  105 + self.fail('Unexpected error {} from msodde for {}'
  106 + .format(found_error, filename))
  107 + elif expect_error and not isinstance(found_error, expect_error):
  108 + self.fail('Wrong kind of error {} from msodde for {}, expected {}'
  109 + .format(type(found_error), filename, expect_error))
  110 +
  111 +
  112 +@unittest.skipIf(not check_msoffcrypto(),
  113 + 'Module msoffcrypto not installed for {}'
  114 + .format(basename(sys.executable)))
  115 +class TestErrorOutput(unittest.TestCase):
  116 + """msodde does not specify error by return code but text output."""
  117 +
  118 + def test_crypt_output(self):
  119 + """Check for helpful error message when failing to decrypt."""
  120 + for suffix in 'doc', 'docm', 'docx', 'ppt', 'pptm', 'pptx', 'xls', \
  121 + 'xlsb', 'xlsm', 'xlsx':
  122 + example_file = join(BASE_DIR, 'encrypted', 'encrypted.' + suffix)
  123 + output, ret_code = call_and_capture('msodde', [example_file, ],
  124 + accept_nonzero_exit=True)
  125 + self.assertEqual(ret_code, 1)
  126 + self.assertIn('passwords could not decrypt office file', output,
  127 + msg='Unexpected output: {}'.format(output.strip()))
89 128  
90 129  
91 130 class TestDdeLinks(unittest.TestCase):
... ... @@ -100,33 +139,37 @@ class TestDdeLinks(unittest.TestCase):
100 139 def test_with_dde(self):
101 140 """ check that dde links appear on stdout """
102 141 filename = 'dde-test-from-office2003.doc'
103   - output = msodde.process_file(
104   - join(BASE_DIR, 'msodde', filename), msodde.FIELD_FILTER_BLACKLIST)
  142 + output = msodde.process_maybe_encrypted(
  143 + join(BASE_DIR, 'msodde', filename),
  144 + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
105 145 self.assertNotEqual(len(self.get_dde_from_output(output)), 0,
106 146 msg='Found no dde links in output of ' + filename)
107 147  
108 148 def test_no_dde(self):
109 149 """ check that no dde links appear on stdout """
110 150 filename = 'harmless-clean.doc'
111   - output = msodde.process_file(
112   - join(BASE_DIR, 'msodde', filename), msodde.FIELD_FILTER_BLACKLIST)
  151 + output = msodde.process_maybe_encrypted(
  152 + join(BASE_DIR, 'msodde', filename),
  153 + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
113 154 self.assertEqual(len(self.get_dde_from_output(output)), 0,
114 155 msg='Found dde links in output of ' + filename)
115 156  
116 157 def test_with_dde_utf16le(self):
117 158 """ check that dde links appear on stdout """
118 159 filename = 'dde-test-from-office2013-utf_16le-korean.doc'
119   - output = msodde.process_file(
120   - join(BASE_DIR, 'msodde', filename), msodde.FIELD_FILTER_BLACKLIST)
  160 + output = msodde.process_maybe_encrypted(
  161 + join(BASE_DIR, 'msodde', filename),
  162 + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
121 163 self.assertNotEqual(len(self.get_dde_from_output(output)), 0,
122 164 msg='Found no dde links in output of ' + filename)
123 165  
124 166 def test_excel(self):
125 167 """ check that dde links are found in excel 2007+ files """
126   - expect = ['DDE-Link cmd /c calc.exe', ]
  168 + expect = ['cmd /c calc.exe', ]
127 169 for extn in 'xlsx', 'xlsm', 'xlsb':
128   - output = msodde.process_file(
129   - join(BASE_DIR, 'msodde', 'dde-test.' + extn), msodde.FIELD_FILTER_BLACKLIST)
  170 + output = msodde.process_maybe_encrypted(
  171 + join(BASE_DIR, 'msodde', 'dde-test.' + extn),
  172 + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
130 173  
131 174 self.assertEqual(expect, self.get_dde_from_output(output),
132 175 msg='unexpected output for dde-test.{0}: {1}'
... ... @@ -136,8 +179,9 @@ class TestDdeLinks(unittest.TestCase):
136 179 """ check that dde in xml from word / excel is found """
137 180 for name_part in 'excel2003', 'word2003', 'word2007':
138 181 filename = 'dde-in-' + name_part + '.xml'
139   - output = msodde.process_file(
140   - join(BASE_DIR, 'msodde', filename), msodde.FIELD_FILTER_BLACKLIST)
  182 + output = msodde.process_maybe_encrypted(
  183 + join(BASE_DIR, 'msodde', filename),
  184 + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
141 185 links = self.get_dde_from_output(output)
142 186 self.assertEqual(len(links), 1, 'found {0} dde-links in {1}'
143 187 .format(len(links), filename))
... ... @@ -149,15 +193,17 @@ class TestDdeLinks(unittest.TestCase):
149 193 def test_clean_rtf_blacklist(self):
150 194 """ find a lot of hyperlinks in rtf spec """
151 195 filename = 'RTF-Spec-1.7.rtf'
152   - output = msodde.process_file(
153   - join(BASE_DIR, 'msodde', filename), msodde.FIELD_FILTER_BLACKLIST)
  196 + output = msodde.process_maybe_encrypted(
  197 + join(BASE_DIR, 'msodde', filename),
  198 + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
154 199 self.assertEqual(len(self.get_dde_from_output(output)), 1413)
155 200  
156 201 def test_clean_rtf_ddeonly(self):
157 202 """ find no dde links in rtf spec """
158 203 filename = 'RTF-Spec-1.7.rtf'
159   - output = msodde.process_file(
160   - join(BASE_DIR, 'msodde', filename), msodde.FIELD_FILTER_DDE)
  204 + output = msodde.process_maybe_encrypted(
  205 + join(BASE_DIR, 'msodde', filename),
  206 + field_filter_mode=msodde.FIELD_FILTER_DDE)
161 207 self.assertEqual(len(self.get_dde_from_output(output)), 0,
162 208 msg='Found dde links in output of ' + filename)
163 209  
... ...
tests/msodde/test_crypto.py 0 → 100644
  1 +"""Check decryption of files from msodde works."""
  2 +
  3 +import sys
  4 +import unittest
  5 +from os.path import basename, join as pjoin
  6 +
  7 +from tests.test_utils import DATA_BASE_DIR, call_and_capture
  8 +
  9 +from oletools import crypto
  10 +
  11 +
  12 +@unittest.skipIf(not crypto.check_msoffcrypto(),
  13 + 'Module msoffcrypto not installed for {}'
  14 + .format(basename(sys.executable)))
  15 +class MsoddeCryptoTest(unittest.TestCase):
  16 + """Test integration of decryption in msodde."""
  17 +
  18 + def test_standard_password(self):
  19 + """Check dde-link is found in xls, xlsx, xlsm and xlsb sample files."""
  20 + for suffix in 'xls', 'xlsx', 'xlsm', 'xlsb':
  21 + example_file = pjoin(DATA_BASE_DIR, 'encrypted',
  22 + 'dde-test-encrypt-standardpassword.' + suffix)
  23 + output, _ = call_and_capture('msodde', [example_file, ])
  24 + self.assertIn('\nDDE Links:\ncmd /c calc.exe\n', output,
  25 + msg='Unexpected output {!r} for {}'
  26 + .format(output, suffix))
  27 +
  28 + # TODO: add more, in particular a sample with a "proper" password
  29 +
  30 +
  31 +if __name__ == '__main__':
  32 + unittest.main()
... ...
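The class-level `skipIf` above relies on oletools' `crypto.check_msoffcrypto()`. As a hedged sketch of the same pattern using only the standard library, the hypothetical helper `module_available` below checks whether a top-level module can be found without importing it. It illustrates the idea and is not the oletools implementation; the test class name is also made up.

```python
import importlib.util
import sys
import unittest
from os.path import basename


def module_available(name):
    """Return True if top-level module `name` can be found (hypothetical helper)."""
    # find_spec returns None for a top-level module that is not installed
    return importlib.util.find_spec(name) is not None


@unittest.skipIf(not module_available('msoffcrypto'),
                 'Module msoffcrypto not installed for {}'
                 .format(basename(sys.executable)))
class OptionalDependencyTest(unittest.TestCase):
    """All tests in this class are skipped when msoffcrypto is missing."""

    def test_stdlib_is_available(self):
        self.assertTrue(module_available('json'))
```

Checking at class level keeps the skip reason in one place; run it with `python -m unittest` as usual.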
tests/oleid/test_basic.py
... ... @@ -20,7 +20,7 @@ class TestOleIDBasic(unittest.TestCase):
20 20 """Run all files in test-data through oleid and compare to known output"""
21 21 # this relies on order of indicators being constant, could relax that
22 22 # Also requires that files have the correct suffixes (no rtf in doc)
23   - NON_OLE_SUFFIXES = ('.xml', '.csv', '.rtf', '')
  23 + NON_OLE_SUFFIXES = ('.xml', '.csv', '.rtf', '', '.odt', '.ods', '.odp')
24 24 NON_OLE_VALUES = (False, )
25 25 WORD = b'Microsoft Office Word'
26 26 PPT = b'Microsoft Office PowerPoint'
... ... @@ -121,6 +121,33 @@ class TestOleIDBasic(unittest.TestCase):
121 121 'msodde/harmless-clean.docx': (False,),
122 122 'oleform/oleform-PR314.docm': (False,),
123 123 'basic/encrypted.docx': CRYPT,
  124 + 'oleobj/external_link/sample_with_external_link_to_doc.docx': (False,),
  125 + 'oleobj/external_link/sample_with_external_link_to_doc.xlsb': (False,),
  126 + 'oleobj/external_link/sample_with_external_link_to_doc.dotm': (False,),
  127 + 'oleobj/external_link/sample_with_external_link_to_doc.xlsm': (False,),
  128 + 'oleobj/external_link/sample_with_external_link_to_doc.pptx': (False,),
  129 + 'oleobj/external_link/sample_with_external_link_to_doc.dotx': (False,),
  130 + 'oleobj/external_link/sample_with_external_link_to_doc.docm': (False,),
  131 + 'oleobj/external_link/sample_with_external_link_to_doc.potm': (False,),
  132 + 'oleobj/external_link/sample_with_external_link_to_doc.xlsx': (False,),
  133 + 'oleobj/external_link/sample_with_external_link_to_doc.potx': (False,),
  134 + 'oleobj/external_link/sample_with_external_link_to_doc.ppsm': (False,),
  135 + 'oleobj/external_link/sample_with_external_link_to_doc.pptm': (False,),
  136 + 'oleobj/external_link/sample_with_external_link_to_doc.ppsx': (False,),
  137 + 'encrypted/autostart-encrypt-standardpassword.xlsm':
  138 + (True, False, 'unknown', True, False, False, False, False, False, False, 0),
  139 + 'encrypted/autostart-encrypt-standardpassword.xls':
  140 + (True, True, EXCEL, True, False, True, True, False, False, False, 0),
  141 + 'encrypted/dde-test-encrypt-standardpassword.xlsx':
  142 + (True, False, 'unknown', True, False, False, False, False, False, False, 0),
  143 + 'encrypted/dde-test-encrypt-standardpassword.xlsm':
  144 + (True, False, 'unknown', True, False, False, False, False, False, False, 0),
  145 + 'encrypted/autostart-encrypt-standardpassword.xlsb':
  146 + (True, False, 'unknown', True, False, False, False, False, False, False, 0),
  147 + 'encrypted/dde-test-encrypt-standardpassword.xls':
  148 + (True, True, EXCEL, True, False, False, True, False, False, False, 0),
  149 + 'encrypted/dde-test-encrypt-standardpassword.xlsb':
  150 + (True, False, 'unknown', True, False, False, False, False, False, False, 0),
124 151 }
125 152  
126 153 indicator_names = []
... ... @@ -148,7 +175,8 @@ class TestOleIDBasic(unittest.TestCase):
148 175 OLE_VALUES[name]))
149 176 except KeyError:
150 177 print('Should add oleid output for {} to {} ({})'
151   - .format(name, __name__, values[3:]))
  178 + .format(name, __name__, values))
  179 +
152 180  
153 181 # just in case somebody calls this file as a script
154 182 if __name__ == '__main__':
... ...
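`NON_OLE_SUFFIXES` now also lists the OpenDocument extensions, next to the empty string for sample files without any extension. The hunk does not show how the tuple is consulted; assuming the suffix is taken with `os.path.splitext`, a minimal sketch of the check (`is_non_ole_sample` is a hypothetical name) could look like this:

```python
from os.path import splitext

NON_OLE_SUFFIXES = ('.xml', '.csv', '.rtf', '', '.odt', '.ods', '.odp')


def is_non_ole_sample(filename):
    """Return True if the extension marks the sample as non-OLE (hypothetical)."""
    # splitext('basic/empty') gives ('basic/empty', ''), so '' matches
    # files without an extension
    return splitext(filename)[1] in NON_OLE_SUFFIXES
```

Note that `splitext` is case-sensitive, so this sketch relies on the test data using lower-case suffixes.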
tests/oleobj/test_basic.py
... ... @@ -8,7 +8,7 @@ from hashlib import md5
8 8 from glob import glob
9 9  
10 10 # Directory with test data, independent of current working directory
11   -from tests.test_utils import DATA_BASE_DIR
  11 +from tests.test_utils import DATA_BASE_DIR, call_and_capture
12 12 from oletools import oleobj
13 13  
14 14  
... ... @@ -41,8 +41,10 @@ SAMPLES += tuple(
41 41 'ab8c65e4c0fc51739aa66ca5888265b4')
42 42 for extn in ('xls', 'xlsx', 'xlsb', 'xlsm', 'xla', 'xlam', 'xlt', 'xltm',
43 43 'xltx', 'ppt', 'pptx', 'pptm', 'pps', 'ppsx', 'ppsm', 'pot',
44   - 'potx', 'potm')
  44 + 'potx', 'potm', 'ods', 'odp')
45 45 )
  46 +SAMPLES += (('embedded-simple-2007.odt', 'simple-text-file.txt',
  47 + 'bd5c063a5a43f67b3c50dc7b0f1195af'), )
46 48  
47 49  
48 50 def calc_md5(filename):
... ... @@ -79,10 +81,6 @@ class TestOleObj(unittest.TestCase):
79 81 """ fixture start: create temp dir """
80 82 self.temp_dir = mkdtemp(prefix='oletools-oleobj-')
81 83 self.did_fail = False
82   - if DEBUG:
83   - import logging
84   - logging.basicConfig(level=logging.DEBUG if DEBUG else logging.INFO)
85   - oleobj.log.setLevel(logging.NOTSET)
86 84  
87 85 def tearDown(self):
88 86 """ fixture end: remove temp dir """
... ... @@ -99,7 +97,8 @@ class TestOleObj(unittest.TestCase):
99 97 """
100 98 test that oleobj can be called with -i and -v
101 99  
102   - this is the way that amavisd calls oleobj, thinking it is ripOLE
  100 + This is how ripOLE was often called (e.g. by amavisd-new);
  101 + ensure oleobj is a compatible replacement.
103 102 """
104 103 self.do_test_md5(['-d', self.temp_dir, '-v', '-i'])
105 104  
... ... @@ -110,35 +109,52 @@ class TestOleObj(unittest.TestCase):
110 109 'embedded-simple-2007.xml',
111 110 'embedded-simple-2007-as2003.xml'):
112 111 full_name = join(DATA_BASE_DIR, 'oleobj', sample_name)
113   - ret_val = oleobj.main(args + [full_name, ])
  112 + output, ret_val = call_and_capture('oleobj', args + [full_name, ],
  113 + accept_nonzero_exit=True)
114 114 if glob(self.temp_dir + 'ole-object-*'):
115   - self.fail('found embedded data in {0}'.format(sample_name))
116   - self.assertEqual(ret_val, oleobj.RETURN_NO_DUMP)
  115 + self.fail('found embedded data in {0}. Output:\n{1}'
  116 + .format(sample_name, output))
  117 + self.assertEqual(ret_val, oleobj.RETURN_NO_DUMP,
  118 + msg='Wrong return value {} for {}. Output:\n{}'
  119 + .format(ret_val, sample_name, output))
117 120  
118   - def do_test_md5(self, args, test_fun=oleobj.main):
  121 + def do_test_md5(self, args, test_fun=None, only_run_every=1):
119 122 """ helper for test_md5 and test_md5_args """
120   - # name of sample, extension of embedded file, md5 hash of embedded file
121 123 data_dir = join(DATA_BASE_DIR, 'oleobj')
122   - for sample_name, embedded_name, expect_hash in SAMPLES:
123   - ret_val = test_fun(args + [join(data_dir, sample_name), ])
124   - self.assertEqual(ret_val, oleobj.RETURN_DID_DUMP)
  124 +
  125 + # name of sample, extension of embedded file, md5 hash of embedded file
  126 + for sample_index, (sample_name, embedded_name, expect_hash) \
  127 + in enumerate(SAMPLES):
  128 + if sample_index % only_run_every != 0:
  129 + continue
  130 + args_with_path = args + [join(data_dir, sample_name), ]
  131 + if test_fun is None:
  132 + output, ret_val = call_and_capture('oleobj', args_with_path,
  133 + accept_nonzero_exit=True)
  134 + else:
  135 + ret_val = test_fun(args_with_path)
  136 + output = '[output: see above]'
  137 + self.assertEqual(ret_val, oleobj.RETURN_DID_DUMP,
  138 + msg='Wrong return value {} for {}. Output:\n{}'
  139 + .format(ret_val, sample_name, output))
125 140 expect_name = join(self.temp_dir,
126 141 sample_name + '_' + embedded_name)
127 142 if not isfile(expect_name):
128 143 self.did_fail = True
129   - self.fail('{0} not created from {1}'.format(expect_name,
130   - sample_name))
  144 + self.fail('{0} not created from {1}. Output:\n{2}'
  145 + .format(expect_name, sample_name, output))
131 146 continue
132 147 md5_hash = calc_md5(expect_name)
133 148 if md5_hash != expect_hash:
134 149 self.did_fail = True
135   - self.fail('Wrong md5 {0} of {1} from {2}'
136   - .format(md5_hash, expect_name, sample_name))
  150 + self.fail('Wrong md5 {0} of {1} from {2}. Output:\n{3}'
  151 + .format(md5_hash, expect_name, sample_name, output))
137 152 continue
138 153  
139 154 def test_non_streamed(self):
140 155 """ Ensure old oleobj behaviour still works: pre-read whole file """
141   - return self.do_test_md5(['-d', self.temp_dir], test_fun=preread_file)
  156 + return self.do_test_md5(['-d', self.temp_dir], test_fun=preread_file,
  157 + only_run_every=4)
142 158  
143 159  
144 160 # just in case somebody calls this file as a script
... ...
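`do_test_md5` gained an `only_run_every` parameter so that the slower `test_non_streamed` variant checks only every fourth sample. The selection is a plain modulo on the sample index; `select_every` below is a hypothetical stand-alone version of that loop, shown on made-up sample names:

```python
def select_every(samples, only_run_every=1):
    """Yield every n-th item of `samples`, starting with the first (hypothetical)."""
    for sample_index, sample in enumerate(samples):
        # same condition as in do_test_md5: skip indices not divisible by n
        if sample_index % only_run_every != 0:
            continue
        yield sample
```

For example, `list(select_every(['a', 'b', 'c', 'd', 'e', 'f'], 4))` gives `['a', 'e']`, while the default of 1 keeps every sample.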
tests/oleobj/test_external_links.py
... ... @@ -6,7 +6,7 @@ import os
6 6 from os import path
7 7  
8 8 # Directory with test data, independent of current working directory
9   -from tests.test_utils import DATA_BASE_DIR
  9 +from tests.test_utils import DATA_BASE_DIR, call_and_capture
10 10 from oletools import oleobj
11 11  
12 12 BASE_DIR = path.join(DATA_BASE_DIR, 'oleobj', 'external_link')
... ... @@ -22,8 +22,11 @@ class TestExternalLinks(unittest.TestCase):
22 22 for filename in filenames:
23 23 file_path = path.join(dirpath, filename)
24 24  
25   - ret_val = oleobj.main([file_path])
26   - self.assertEqual(ret_val, oleobj.RETURN_DID_DUMP)
  25 + output, ret_val = call_and_capture('oleobj', [file_path, ],
  26 + accept_nonzero_exit=True)
  27 + self.assertEqual(ret_val, oleobj.RETURN_DID_DUMP,
  28 + msg='Wrong return value {} for {}. Output:\n{}'
  29 + .format(ret_val, filename, output))
27 30  
28 31  
29 32 # just in case somebody calls this file as a script
... ...
tests/olevba/test_basic.py
... ... @@ -3,21 +3,71 @@ Test basic functionality of olevba[3]
3 3 """
4 4  
5 5 import unittest
6   -import sys
7   -if sys.version_info.major <= 2:
8   - from oletools import olevba
9   -else:
10   - from oletools import olevba3 as olevba
11 6 import os
12 7 from os.path import join
  8 +import re
13 9  
14 10 # Directory with test data, independent of current working directory
15   -from tests.test_utils import DATA_BASE_DIR
  11 +from tests.test_utils import DATA_BASE_DIR, call_and_capture
16 12  
17 13  
18 14 class TestOlevbaBasic(unittest.TestCase):
19 15 """Tests olevba basic functionality"""
20 16  
  17 + def test_text_behaviour(self):
  18 + """Test behaviour of olevba when presented with a pure text file."""
  19 + self.do_test_behaviour('text')
  20 +
  21 + def test_empty_behaviour(self):
  22 + """Test behaviour of olevba when presented with an empty file."""
  23 + self.do_test_behaviour('empty')
  24 +
  25 + def do_test_behaviour(self, filename):
  26 + """Helper for test_{text,empty}_behaviour."""
  27 + input_file = join(DATA_BASE_DIR, 'basic', filename)
  28 + output, _ = call_and_capture('olevba', args=(input_file, ))
  29 +
  30 + # check output
  31 + self.assertTrue(re.search(r'^Type:\s+Text\s*$', output, re.MULTILINE),
  32 + msg='"Type: Text" not found in output:\n' + output)
  33 + self.assertTrue(re.search(r'^No suspicious .+ found.$', output,
  34 + re.MULTILINE),
  35 + msg='"No suspicious...found" not found in output:\n' + \
  36 + output)
  37 + self.assertNotIn('error', output.lower())
  38 +
  39 + # check warnings
  40 + for line in output.splitlines():
  41 + if line.startswith('WARNING ') and 'encrypted' in line:
  42 + continue # encryption warnings are ok
  43 + elif 'warn' in line.lower():
  44 + self.fail('Found "warn" in output line: "{}"'
  45 + .format(line.rstrip()))
  46 + self.assertIn('not encrypted', output)
  47 +
  48 + def test_rtf_behaviour(self):
  49 + """Test behaviour of olevba when presented with an rtf file."""
  50 + input_file = join(DATA_BASE_DIR, 'msodde', 'RTF-Spec-1.7.rtf')
  51 + output, ret_code = call_and_capture('olevba', args=(input_file, ),
  52 + accept_nonzero_exit=True)
  53 +
  54 + # check that return code is olevba.RETURN_OPEN_ERROR
  55 + self.assertEqual(ret_code, 5)
  56 +
  57 + # check output:
  58 + self.assertIn('FileOpenError', output)
  59 + self.assertIn('is RTF', output)
  60 + self.assertIn('rtfobj.py', output)
  61 + self.assertIn('not encrypted', output)
  62 +
  63 + # check warnings
  64 + for line in output.splitlines():
  65 + if line.startswith('WARNING ') and 'encrypted' in line:
  66 + continue # encryption warnings are ok
  67 + elif 'warn' in line.lower():
  68 + self.fail('Found "warn" in output line: "{}"'
  69 + .format(line.rstrip()))
  70 +
21 71 def test_crypt_return(self):
22 72 """
23 73 Tests that encrypted files give a certain return code.
... ... @@ -28,15 +78,23 @@ class TestOlevbaBasic(unittest.TestCase):
28 78 CRYPT_DIR = join(DATA_BASE_DIR, 'encrypted')
29 79 CRYPT_RETURN_CODE = 9
30 80 ADD_ARGS = [], ['-d', ], ['-a', ], ['-j', ], ['-t', ]
  81 + EXCEPTIONS = ['autostart-encrypt-standardpassword.xls', # These ...
  82 + 'autostart-encrypt-standardpassword.xlsm', # files ...
  83 + 'autostart-encrypt-standardpassword.xlsb', # are ...
  84 + 'dde-test-encrypt-standardpassword.xls', # automati...
  85 + 'dde-test-encrypt-standardpassword.xlsx', # ...cally...
  86 + 'dde-test-encrypt-standardpassword.xlsm', # decrypted.
  87 + 'dde-test-encrypt-standardpassword.xlsb']
31 88 for filename in os.listdir(CRYPT_DIR):
  89 + if filename in EXCEPTIONS:
  90 + continue
32 91 full_name = join(CRYPT_DIR, filename)
33 92 for args in ADD_ARGS:
34   - try:
35   - ret_code = olevba.main(args + [full_name, ])
36   - except SystemExit as se:
37   - ret_code = se.code or 0 # se.code can be None
  93 + _, ret_code = call_and_capture('olevba',
  94 + args=[full_name, ] + args,
  95 + accept_nonzero_exit=True)
38 96 self.assertEqual(ret_code, CRYPT_RETURN_CODE,
39   - msg='Wrong return code {} for args {}'
  97 + msg='Wrong return code {} for args {}'\
40 98 .format(ret_code, args + [filename, ]))
41 99  
42 100  
... ...
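Both `do_test_behaviour` and `test_rtf_behaviour` scan olevba's output with the same loop: lines starting with `WARNING ` that mention encryption are tolerated, and any other line containing "warn" fails the test. A hedged refactoring sketch as a single helper (`find_unexpected_warning` is a hypothetical name, not part of the test suite):

```python
def find_unexpected_warning(output):
    """Return the first line with an unexpected warning, or None (hypothetical)."""
    for line in output.splitlines():
        if line.startswith('WARNING ') and 'encrypted' in line:
            continue  # encryption warnings are ok
        if 'warn' in line.lower():
            return line.rstrip()
    return None
```

Inside a test this would become `self.assertIsNone(find_unexpected_warning(output))`, which also reports the offending line on failure.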
tests/olevba/test_crypto.py 0 → 100644
  1 +"""Check decryption of files from olevba works."""
  2 +
  3 +import sys
  4 +import unittest
  5 +from os.path import basename, join as pjoin
  6 +import json
  7 +from collections import OrderedDict
  8 +
  9 +from tests.test_utils import DATA_BASE_DIR, call_and_capture
  10 +
  11 +from oletools import crypto
  12 +
  13 +
  14 +@unittest.skipIf(not crypto.check_msoffcrypto(),
  15 + 'Module msoffcrypto not installed for {}'
  16 + .format(basename(sys.executable)))
  17 +class OlevbaCryptoWriteProtectTest(unittest.TestCase):
  18 + """
  19 + Test documents that are 'write-protected' through encryption.
  20 +
  21 + Excel has a way to 'write-protect' documents by encrypting them with a
  22 + hard-coded standard password. When looking at the file-structure you see
  23 + an OLE-file with streams `EncryptedPackage`, `StrongEncryptionSpace`, and
  24 + `EncryptionInfo`. Contained in the first is the actual file. When opening
  25 + such a file in Excel, it is decrypted without the user noticing.
  26 +
  27 + Olevba should detect such encryption, try to decrypt with the standard
  28 + password and look for VBA code in the decrypted file.
  29 +
  30 + All these tests are skipped if the module `msoffcrypto-tool` is not
  31 + installed.
  32 + """
  33 + def test_autostart(self):
  34 + """Check that autostart macro is found in xls[mb] sample file."""
  35 + for suffix in 'xlsm', 'xlsb':
  36 + example_file = pjoin(
  37 + DATA_BASE_DIR, 'encrypted',
  38 + 'autostart-encrypt-standardpassword.' + suffix)
  39 + output, _ = call_and_capture('olevba', args=('-j', example_file),
  40 + exclude_stderr=True)
  41 + data = json.loads(output, object_pairs_hook=OrderedDict)
  42 + # debug: json.dump(data, sys.stdout, indent=4)
  43 + self.assertEqual(len(data), 4)
  44 + self.assertIn('script_name', data[0])
  45 + self.assertIn('version', data[0])
  46 + self.assertEqual(data[0]['type'], 'MetaInformation')
  47 + self.assertIn('return_code', data[-1])
  48 + self.assertEqual(data[-1]['type'], 'MetaInformation')
  49 + self.assertEqual(data[1]['container'], None)
  50 + self.assertEqual(data[1]['file'], example_file)
  51 + self.assertEqual(data[1]['analysis'], None)
  52 + self.assertEqual(data[1]['macros'], [])
  53 + self.assertEqual(data[1]['type'], 'OLE')
  54 + self.assertEqual(data[2]['container'], example_file)
  55 + self.assertNotEqual(data[2]['file'], example_file)
  56 + self.assertEqual(data[2]['type'], "OpenXML")
  57 + analysis = data[2]['analysis']
  58 + self.assertEqual(analysis[0]['type'], 'AutoExec')
  59 + self.assertEqual(analysis[0]['keyword'], 'Auto_Open')
  60 + macros = data[2]['macros']
  61 + self.assertEqual(macros[0]['vba_filename'], 'Modul1.bas')
  62 + self.assertIn('Sub Auto_Open()', macros[0]['code'])
  63 +
  64 +
  65 +if __name__ == '__main__':
  66 + unittest.main()
... ...
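`test_autostart` parses olevba's `-j` output: a JSON list whose first and last entries are `MetaInformation` records and whose middle entries describe the outer encrypted file and the decrypted one. The sketch below runs the same kind of structural checks against a small hand-written sample that only mimics the keys used in the test; it is not real olevba output, and `summarize` is a hypothetical helper.

```python
import json
from collections import OrderedDict

# Hand-written sample mimicking the keys checked in test_autostart
SAMPLE = '''[
  {"type": "MetaInformation", "script_name": "olevba", "version": "x"},
  {"type": "OLE", "file": "outer.xlsm", "container": null,
   "analysis": null, "macros": []},
  {"type": "OpenXML", "file": "decrypted.xlsm", "container": "outer.xlsm",
   "analysis": [{"type": "AutoExec", "keyword": "Auto_Open"}],
   "macros": [{"vba_filename": "Modul1.bas", "code": "Sub Auto_Open()"}]},
  {"type": "MetaInformation", "return_code": 0}
]'''


def summarize(output):
    """Return (file entry types, AutoExec keywords) from olevba-like JSON."""
    # object_pairs_hook keeps key order, as in test_autostart
    data = json.loads(output, object_pairs_hook=OrderedDict)
    if data[0]['type'] != 'MetaInformation' or data[-1]['type'] != 'MetaInformation':
        raise ValueError('expected MetaInformation at both ends')
    types = [entry['type'] for entry in data[1:-1]]
    keywords = [item['keyword']
                for entry in data[1:-1]
                for item in (entry['analysis'] or [])  # analysis may be null
                if item['type'] == 'AutoExec']
    return types, keywords
```

Here `summarize(SAMPLE)` returns `(['OLE', 'OpenXML'], ['Auto_Open'])`: the outer OLE container has no analysis, and the macro is only found in the decrypted file.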
tests/ooxml/test_basic.py
... ... @@ -33,6 +33,8 @@ class TestOOXML(unittest.TestCase):
33 33 pptx=ooxml.DOCTYPE_POWERPOINT, pptm=ooxml.DOCTYPE_POWERPOINT,
34 34 ppsx=ooxml.DOCTYPE_POWERPOINT, ppsm=ooxml.DOCTYPE_POWERPOINT,
35 35 potx=ooxml.DOCTYPE_POWERPOINT, potm=ooxml.DOCTYPE_POWERPOINT,
  36 + ods=ooxml.DOCTYPE_NONE, odt=ooxml.DOCTYPE_NONE,
  37 + odp=ooxml.DOCTYPE_NONE,
36 38 )
37 39  
38 40 # files that are neither OLE nor xml:
... ...
tests/ooxml/test_zip_sub_file.py
... ... @@ -144,15 +144,15 @@ class TestZipSubFile(unittest.TestCase):
144 144 self.subfile.seek(0, os.SEEK_END)
145 145 self.compare.seek(0, os.SEEK_END)
146 146  
147   - self.assertEquals(self.compare.read(10), self.subfile.read(10))
148   - self.assertEquals(self.compare.tell(), self.subfile.tell())
  147 + self.assertEqual(self.compare.read(10), self.subfile.read(10))
  148 + self.assertEqual(self.compare.tell(), self.subfile.tell())
149 149  
150 150 self.subfile.seek(0)
151 151 self.compare.seek(0)
152 152 self.subfile.seek(len(FILE_CONTENTS) - 1)
153 153 self.compare.seek(len(FILE_CONTENTS) - 1)
154   - self.assertEquals(self.compare.read(10), self.subfile.read(10))
155   - self.assertEquals(self.compare.tell(), self.subfile.tell())
  154 + self.assertEqual(self.compare.read(10), self.subfile.read(10))
  155 + self.assertEqual(self.compare.tell(), self.subfile.tell())
156 156  
157 157 def test_error_seek(self):
158 158 """ test correct behaviour if seek beyond end (no exception) """
... ...
tests/ppt_parser/test_basic.py
... ... @@ -16,7 +16,7 @@ class TestBasic(unittest.TestCase):
16 16  
17 17 def test_is_ppt(self):
18 18 """ test ppt_record_parser.is_ppt(filename) """
19   - exceptions = []
  19 + exceptions = ['encrypted.ppt', ] # actually is ppt but embedded
20 20 for base_dir, _, files in os.walk(DATA_BASE_DIR):
21 21 for filename in files:
22 22 if filename in exceptions:
... ...
tests/test-data/encrypted/autostart-encrypt-standardpassword.xls 0 → 100644
tests/test-data/encrypted/autostart-encrypt-standardpassword.xlsb 0 → 100644
tests/test-data/encrypted/autostart-encrypt-standardpassword.xlsm 0 → 100644
tests/test-data/encrypted/dde-test-encrypt-standardpassword.xls 0 → 100644
tests/test-data/encrypted/dde-test-encrypt-standardpassword.xlsb 0 → 100644
tests/test-data/encrypted/dde-test-encrypt-standardpassword.xlsm 0 → 100644
tests/test-data/encrypted/dde-test-encrypt-standardpassword.xlsx 0 → 100644
tests/test-data/oleobj/embedded-simple-2007.odp 0 → 100644
tests/test-data/oleobj/embedded-simple-2007.ods 0 → 100644
tests/test-data/oleobj/embedded-simple-2007.odt 0 → 100644
tests/test_utils/__init__.py
1   -from os.path import dirname, join
2   -
3   -# Directory with test data, independent of current working directory
4   -DATA_BASE_DIR = join(dirname(dirname(__file__)), 'test-data')
  1 +from .utils import *
... ...
tests/test_utils/utils.py 0 → 100644
  1 +#!/usr/bin/env python3
  2 +
  3 +"""Utils generally useful for unittests."""
  4 +
  5 +import sys
  6 +import os
  7 +from os.path import dirname, join, abspath
  8 +from subprocess import check_output, PIPE, STDOUT, CalledProcessError
  9 +
  10 +
  11 +# Base dir of project, contains subdirs "tests" and "oletools" and README.md
  12 +PROJECT_ROOT = dirname(dirname(dirname(abspath(__file__))))
  13 +
  14 +# Directory with test data, independent of current working directory
  15 +DATA_BASE_DIR = join(PROJECT_ROOT, 'tests', 'test-data')
  16 +
  17 +# Directory with source code
  18 +SOURCE_BASE_DIR = join(PROJECT_ROOT, 'oletools')
  19 +
  20 +
  21 +def call_and_capture(module, args=None, accept_nonzero_exit=False,
  22 + exclude_stderr=False):
  23 + """
  24 + Run module as script, capturing and returning output and return code.
  25 +
  26 + This is the best way to capture a module's stdout and stderr; trying to
  27 + modify sys.stdout/sys.stderr to StringIO-Buffers frequently causes trouble.
  28 +
  29 + Only drawback so far: stdout and stderr are merged into one (which is
  30 + what users see on their shell as well). When testing for json-compatible
  31 + output you should set `exclude_stderr` to `True`, since warnings and log
  32 + messages go to stderr and unforeseen ones (e.g. issued by pypy) would mess up your json.
  33 +
  34 + :param str module: name of module to test, e.g. `olevba`
  35 + :param args: arguments for module's main function
  36 + :param bool fail_nonzero: Raise error if command returns non-0 return code
  37 + :param bool exclude_stderr: Exclude output to `sys.stderr` from output
  38 + (e.g. if parsing output through json)
  39 + :returns: ret_code, output
  40 + :rtype: int, str
  41 + """
  42 + # create a PYTHONPATH environment var to prefer our current code
  43 + env = os.environ.copy()
  44 + try:
  45 + env['PYTHONPATH'] = SOURCE_BASE_DIR + os.pathsep + \
  46 + os.environ['PYTHONPATH']
  47 + except KeyError:
  48 + env['PYTHONPATH'] = SOURCE_BASE_DIR
  49 +
  50 + # hack: in python2 output encoding (sys.stdout.encoding) was None
  51 + # although sys.getdefaultencoding() and sys.getfilesystemencoding were ok
  52 + # TODO: maybe can remove this once branch
  53 + # "encoding-for-non-unicode-environments" is merged
  54 + if 'PYTHONIOENCODING' not in env:
  55 + env['PYTHONIOENCODING'] = 'utf8'
  56 +
  57 + # ensure args is a tuple
  58 + my_args = tuple(args) if args else ()
  59 +
  60 + ret_code = -1
  61 + try:
  62 + output = check_output((sys.executable, '-m', module) + my_args,
  63 + universal_newlines=True, env=env,
  64 + stderr=PIPE if exclude_stderr else STDOUT)
  65 + ret_code = 0
  66 +
  67 + except CalledProcessError as err:
  68 + if accept_nonzero_exit:
  69 + ret_code = err.returncode
  70 + output = err.output
  71 + else:
  72 + print(err.output)
  73 + raise
  74 +
  75 + return output, ret_code
... ...
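`call_and_capture` uses `subprocess.check_output` and recovers output and exit code from `CalledProcessError`. On Python 3.5+, `subprocess.run` returns both directly. The hypothetical `run_module` below is a sketch of the same contract (module run via `-m`, stderr optionally merged into the output, `PYTHONIOENCODING` defaulting to utf8), not a drop-in replacement: it omits the `PYTHONPATH` prepending and would not run on Python 2.

```python
import os
import sys
from subprocess import run, PIPE, STDOUT, CalledProcessError


def run_module(module, args=None, accept_nonzero_exit=False,
               exclude_stderr=False):
    """Run `python -m module args`; return (output, ret_code). Sketch only."""
    env = os.environ.copy()
    env.setdefault('PYTHONIOENCODING', 'utf8')  # same default as the original
    cmd = (sys.executable, '-m', module) + (tuple(args) if args else ())
    # stderr is either discarded into its own pipe or merged into stdout
    proc = run(cmd, stdout=PIPE,
               stderr=PIPE if exclude_stderr else STDOUT,
               universal_newlines=True, env=env)
    if proc.returncode != 0 and not accept_nonzero_exit:
        raise CalledProcessError(proc.returncode, cmd, output=proc.stdout)
    return proc.stdout, proc.returncode
```

For instance, `run_module('this')` returns the Zen of Python with exit code 0, and `run_module('no_such_module_xyz', accept_nonzero_exit=True)` returns Python's "No module named" message with exit code 1 instead of raising.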