Commit 677d9ad57da9670ed58cd2d824b5baf7ac7c5c64

Authored by kirk-sayre-work
2 parents be2898f5 c03e948d

Merge remote-tracking branch 'upstream/master'

Showing 93 changed files with 4772 additions and 5845 deletions
.travis.yml
@@ -17,5 +17,8 @@ matrix:
17 - python: pypy 17 - python: pypy
18 - python: pypy3 18 - python: pypy3
19 19
  20 +install:
  21 + - pip install msoffcrypto-tool
  22 +
20 script: 23 script:
21 - python setup.py test 24 - python setup.py test
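The CI change installs msoffcrypto-tool before the test run, so the encrypted-document code paths can import it. Below is a minimal sketch of how a script could check for that dependency before trying to decrypt anything. It assumes only that the package's import name is `msoffcrypto`. The helper name `has_msoffcrypto` is hypothetical and is not part of oletools.

```python
import importlib.util


def has_msoffcrypto():
    """Return True if msoffcrypto-tool (import name: msoffcrypto) can be imported."""
    # find_spec returns None for a missing top-level module instead of raising.
    return importlib.util.find_spec("msoffcrypto") is not None


if __name__ == "__main__":
    if has_msoffcrypto():
        print("msoffcrypto-tool found: encrypted documents can be decrypted")
    else:
        print("msoffcrypto-tool missing: pip install msoffcrypto-tool")
```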
INSTALL.txt
1 -How to Download and Install python-oletools  
2 -=========================================== 1 +How to Download and Install oletools
  2 +====================================
3 3
4 Pre-requisites 4 Pre-requisites
5 -------------- 5 --------------
6 6
7 -The recommended Python version to run oletools is Python 2.7.  
8 -Python 2.6 is also supported, but as it is not tested as often as 2.7, some features  
9 -might not work as expected.  
10 -  
11 -Since v0.50, oletools can also run with Python 3.x. As this is quite new, please  
12 -report any issue you may encounter.  
13 - 7 +The recommended Python version to run oletools is the latest **Python 3.x** (3.7 for now).
  8 +Python 2.7 is still supported, but as it will become end of life in 2020 (see https://pythonclock.org/), it is highly
  9 +recommended to switch to Python 3 now.
14 10
15 Recommended way to Download+Install/Update oletools: pip 11 Recommended way to Download+Install/Update oletools: pip
16 -------------------------------------------------------- 12 --------------------------------------------------------
@@ -23,7 +19,11 @@ system, either upgrade Python or see https://pip.pypa.io/en/stable/installing/
23 To download and install/update the latest release version of oletools, 19 To download and install/update the latest release version of oletools,
24 run the following command in a shell: 20 run the following command in a shell:
25 21
  22 +```text
26 sudo -H pip install -U oletools 23 sudo -H pip install -U oletools
  24 +```
  25 +
  26 +Replace `pip` by `pip3` or `pip2` to install on a specific Python version.
27 27
28 **Important**: Since version 0.50, pip will automatically create convenient command-line scripts 28 **Important**: Since version 0.50, pip will automatically create convenient command-line scripts
29 in /usr/local/bin to run all the oletools from any directory. 29 in /usr/local/bin to run all the oletools from any directory.
@@ -33,7 +33,19 @@ in /usr/local/bin to run all the oletools from any directory.
33 To download and install/update the latest release version of oletools, 33 To download and install/update the latest release version of oletools,
34 run the following command in a cmd window: 34 run the following command in a cmd window:
35 35
  36 +```text
36 pip install -U oletools 37 pip install -U oletools
  38 +```
  39 +
  40 +Replace `pip` by `pip3` or `pip2` to install on a specific Python version.
  41 +
  42 +**Note**: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip
  43 +and install for all users. If that is not possible, you may also install only for the current user
  44 +by adding the `--user` option:
  45 +
  46 +```text
  47 +pip3 install -U --user oletools
  48 +```
37 49
38 **Important**: Since version 0.50, pip will automatically create convenient command-line scripts 50 **Important**: Since version 0.50, pip will automatically create convenient command-line scripts
39 to run all the oletools from any directory: olevba, mraptor, oleid, rtfobj, etc. 51 to run all the oletools from any directory: olevba, mraptor, oleid, rtfobj, etc.
@@ -47,18 +59,33 @@ you may also use pip:
47 59
48 ### Linux, Mac OSX, Unix 60 ### Linux, Mac OSX, Unix
49 61
  62 +```text
50 sudo -H pip install -U https://github.com/decalage2/oletools/archive/master.zip 63 sudo -H pip install -U https://github.com/decalage2/oletools/archive/master.zip
  64 +```
  65 +
  66 +Replace `pip` by `pip3` or `pip2` to install on a specific Python version.
51 67
52 ### Windows 68 ### Windows
53 69
  70 +```text
54 pip install -U https://github.com/decalage2/oletools/archive/master.zip 71 pip install -U https://github.com/decalage2/oletools/archive/master.zip
  72 +```
  73 +
  74 +Replace `pip` by `pip3` or `pip2` to install on a specific Python version.
  75 +
  76 +**Note**: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip
  77 +and install for all users. If that is not possible, you may also install only for the current user
  78 +by adding the `--user` option:
55 79
  80 +```text
  81 +pip3 install -U --user https://github.com/decalage2/oletools/archive/master.zip
  82 +```
56 83
57 How to install offline - Computer without Internet access 84 How to install offline - Computer without Internet access
58 --------------------------------------------------------- 85 ---------------------------------------------------------
59 86
60 First, download the oletools archive on a computer with Internet access: 87 First, download the oletools archive on a computer with Internet access:
61 -* Latest stable version: from https://github.com/decalage2/oletools/releases 88 +* Latest stable version: from https://pypi.org/project/oletools/ or https://github.com/decalage2/oletools/releases
62 * Development version: https://github.com/decalage2/oletools/archive/master.zip 89 * Development version: https://github.com/decalage2/oletools/archive/master.zip
63 90
64 Copy the archive file to the target computer. 91 Copy the archive file to the target computer.
@@ -66,11 +93,15 @@ Copy the archive file to the target computer.
66 On Linux, Mac OSX, Unix, run the following command using the filename of the 93 On Linux, Mac OSX, Unix, run the following command using the filename of the
67 archive that you downloaded: 94 archive that you downloaded:
68 95
  96 +```text
69 sudo -H pip install -U oletools.zip 97 sudo -H pip install -U oletools.zip
  98 +```
70 99
71 On Windows: 100 On Windows:
72 101
  102 +```text
73 pip install -U oletools.zip 103 pip install -U oletools.zip
  104 +```
74 105
75 106
76 Old school install using setup.py 107 Old school install using setup.py
@@ -88,9 +119,12 @@ Then extract the archive, open a shell and go to the oletools directory.
88 119
89 ### Linux, Mac OSX, Unix 120 ### Linux, Mac OSX, Unix
90 121
  122 +```text
91 sudo -H python setup.py install 123 sudo -H python setup.py install
  124 +```
92 125
93 ### Windows: 126 ### Windows:
94 127
  128 +```text
95 python setup.py install 129 python setup.py install
96 - 130 +```
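The updated INSTALL.txt tells users to swap `pip` for `pip3` or `pip2` to target a specific Python version, and to add `--user` when they lack Administrator rights. The sketch below builds the same command in a different way. It runs pip through a chosen interpreter (`python -m pip`), which selects the Python version without depending on which `pip*` launcher is on the PATH. The helper `pip_install_command` is hypothetical. It only builds the argument list and does not install anything.

```python
import sys


def pip_install_command(target="oletools", user=False, python=sys.executable):
    """Return the argument list for '<python> -m pip install -U [--user] <target>'."""
    # Running pip via a given interpreter selects the Python version,
    # like choosing between the pip, pip2 and pip3 launchers.
    cmd = [python, "-m", "pip", "install", "-U"]
    if user:
        # Install for the current user only; no Administrator/root rights needed.
        cmd.append("--user")
    cmd.append(target)
    return cmd


if __name__ == "__main__":
    dev_zip = "https://github.com/decalage2/oletools/archive/master.zip"
    # Print rather than execute; pass the list to subprocess.check_call() to run it.
    print(" ".join(pip_install_command(dev_zip, user=True)))
```

Returning a list rather than a shell string means the command can be passed straight to `subprocess` without quoting issues on either Windows or Unix.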
LICENSE.md 0 → 100644
  1 +This license applies to the python-oletools package, apart from the thirdparty folder which contains third-party files
  2 +published with their own license.
  3 +
  4 +The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (http://www.decalage.info)
  5 +
  6 +All rights reserved.
  7 +
  8 +Redistribution and use in source and binary forms, with or without modification,
  9 +are permitted provided that the following conditions are met:
  10 +
  11 + * Redistributions of source code must retain the above copyright notice, this
  12 + list of conditions and the following disclaimer.
  13 + * Redistributions in binary form must reproduce the above copyright notice,
  14 + this list of conditions and the following disclaimer in the documentation
  15 + and/or other materials provided with the distribution.
  16 +
  17 +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
  18 +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  19 +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
  20 +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
  21 +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  22 +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  23 +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  24 +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
  25 +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  26 +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  27 +
  28 +
  29 +----------
  30 +
  31 +olevba contains modified source code from the officeparser project, published
  32 +under the following MIT License (MIT):
  33 +
  34 +officeparser is copyright (c) 2014 John William Davison
  35 +
  36 +Permission is hereby granted, free of charge, to any person obtaining a copy
  37 +of this software and associated documentation files (the "Software"), to deal
  38 +in the Software without restriction, including without limitation the rights
  39 +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
  40 +copies of the Software, and to permit persons to whom the Software is
  41 +furnished to do so, subject to the following conditions:
  42 +
  43 +The above copyright notice and this permission notice shall be included in all
  44 +copies or substantial portions of the Software.
  45 +
  46 +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  47 +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  48 +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
  49 +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  50 +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
  51 +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  52 +SOFTWARE.
MANIFEST.in 0 → 100644
  1 +include install.bat
  2 +include INSTALL.txt
  3 +include README.md
  4 +include requirements.txt
  5 +include oletools/README.rst
  6 +include oletools/README.html
  7 +include oletools/LICENSE.txt
  8 +include oletools/DocVarDump.vba
  9 +recursive-include oletools/thirdparty *.*
  10 +recursive-include cheatsheet *.*
  11 +global-exclude *.pyc
  12 +
  13 +recursive-include tests *.py
  14 +graft tests/test-data
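The new MANIFEST.in decides which extra files end up in the source distribution. Some files are included explicitly, whole directories are pulled in with `recursive-include`/`graft`, and compiled `*.pyc` files are dropped with `global-exclude`. Below is a minimal sketch for checking a built sdist against a few of those rules. It assumes the archive is a `.tar.gz` whose members sit under one top-level directory, as sdists usually are. The helper `check_sdist` and its default file list are illustrative and are not part of oletools.

```python
import tarfile


def check_sdist(archive_path, required=("INSTALL.txt", "README.md", "requirements.txt")):
    """Return (missing, pyc_files) for a .tar.gz source distribution."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # Strip the top-level '<name>-<version>/' prefix from each member name.
        names = [m.name.split("/", 1)[-1] for m in tar.getmembers()]
    missing = [f for f in required if f not in names]
    # Anything left here means 'global-exclude *.pyc' did not take effect.
    pyc_files = [n for n in names if n.endswith(".pyc")]
    return missing, pyc_files
```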
README.md
@@ -26,7 +26,25 @@ Note: python-oletools is not related to OLETools published by BeCubed Software.
26 News 26 News
27 ---- 27 ----
28 28
29 -- **2018-05-30 v0.53**: 29 +- **2019-05-22 v0.54.2**:
  30 + - bugfix release: fixed several issues related to encrypted documents
  31 + and XLM/XLF Excel 4 macros
  32 + - msoffcrypto-tool is now installed by default to handle encrypted documents
  33 + - olevba and msodde now handle documents encrypted with common passwords such
  34 + as 123, 1234, 4321, 12345, 123456, VelvetSweatShop automatically.
  35 +- **2019-04-04 v0.54**:
  36 + - olevba, msodde: added support for encrypted MS Office files
  37 + - olevba: added detection and extraction of XLM/XLF Excel 4 macros (thanks to plugin_biff from Didier Stevens' oledump)
  38 + - olevba, mraptor: added detection of VBA running Excel 4 macros
  39 + - olevba: detect and display special characters such as backspace
  40 + - olevba: colorized output showing suspicious keywords in the VBA code
  41 + - olevba, mraptor: full Python 3 compatibility, no separate olevba3/mraptor3 anymore
  42 + - olevba: improved handling of code pages and unicode
  43 + - olevba: fixed a false-positive in VBA macro detection
  44 + - rtfobj: improved OLE Package handling, improved Equation object detection
  45 + - oleobj: added detection of external links to objects in OpenXML
  46 + - replaced third party packages by PyPI dependencies
  47 +- 2018-05-30 v0.53:
30 - olevba and mraptor can now parse Word/PowerPoint 2007+ pure XML files (aka Flat OPC format) 48 - olevba and mraptor can now parse Word/PowerPoint 2007+ pure XML files (aka Flat OPC format)
31 - improved support for VBA forms in olevba (oleform) 49 - improved support for VBA forms in olevba (oleform)
32 - rtfobj now displays the CLSID of OLE objects, which is the best way to identify them. Known-bad CLSIDs such as MS Equation Editor are highlighted in red. 50 - rtfobj now displays the CLSID of OLE objects, which is the best way to identify them. Known-bad CLSIDs such as MS Equation Editor are highlighted in red.
@@ -75,26 +93,38 @@ Projects using oletools:
75 ------------------------ 93 ------------------------
76 94
77 oletools are used by a number of projects and online malware analysis services, 95 oletools are used by a number of projects and online malware analysis services,
78 -including [Viper](http://viper.li/), [REMnux](https://remnux.org/), 96 +including
  97 +[ACE](https://github.com/IntegralDefense/ACE),
  98 +[Anlyz.io](https://sandbox.anlyz.io/),
  99 +[AssemblyLine](https://www.cse-cst.gc.ca/en/assemblyline),
  100 +[CAPE](https://github.com/ctxis/CAPE),
  101 +[Cuckoo Sandbox](https://github.com/cuckoosandbox/cuckoo),
  102 +[DARKSURGEON](https://github.com/cryps1s/DARKSURGEON),
  103 +[Deepviz](https://sandbox.deepviz.com/),
  104 +[dridex.malwareconfig.com](https://dridex.malwareconfig.com),
79 [FAME](https://certsocietegenerale.github.io/fame/), 105 [FAME](https://certsocietegenerale.github.io/fame/),
  106 +[FLARE-VM](https://github.com/fireeye/flare-vm),
80 [Hybrid-analysis.com](https://www.hybrid-analysis.com/), 107 [Hybrid-analysis.com](https://www.hybrid-analysis.com/),
81 [Joe Sandbox](https://www.document-analyzer.net/), 108 [Joe Sandbox](https://www.document-analyzer.net/),
82 -[Deepviz](https://sandbox.deepviz.com/),  
83 [Laika BOSS](https://github.com/lmco/laikaboss), 109 [Laika BOSS](https://github.com/lmco/laikaboss),
84 -[Cuckoo Sandbox](https://github.com/cuckoosandbox/cuckoo),  
85 -[Anlyz.io](https://sandbox.anlyz.io/),  
86 -[ViperMonkey](https://github.com/decalage2/ViperMonkey),  
87 -[pcodedmp](https://github.com/bontchev/pcodedmp),  
88 -[dridex.malwareconfig.com](https://dridex.malwareconfig.com),  
89 -[Snake](https://github.com/countercept/snake),  
90 -[DARKSURGEON](https://github.com/cryps1s/DARKSURGEON),  
91 -[CAPE](https://github.com/ctxis/CAPE),  
92 -[AssemblyLine](https://www.cse-cst.gc.ca/en/assemblyline), 110 +[MacroMilter](https://github.com/sbidy/MacroMilter),
93 [malshare.io](https://malshare.io), 111 [malshare.io](https://malshare.io),
94 -[Malware Repository Framework (MRF)](https://www.adlice.com/download/mrf/),  
95 [malware-repo](https://github.com/Tigzy/malware-repo), 112 [malware-repo](https://github.com/Tigzy/malware-repo),
96 -[Vba2Graph](https://github.com/MalwareCantFly/Vba2Graph), 113 +[Malware Repository Framework (MRF)](https://www.adlice.com/download/mrf/),
  114 +[olefy](https://github.com/HeinleinSupport/olefy),
  115 +[PeekabooAV](https://github.com/scVENUS/PeekabooAV),
  116 +[pcodedmp](https://github.com/bontchev/pcodedmp),
  117 +[PyCIRCLean](https://github.com/CIRCL/PyCIRCLean),
  118 +[REMnux](https://remnux.org/),
  119 +[Snake](https://github.com/countercept/snake),
  120 +[SNDBOX](https://app.sndbox.com),
97 [Strelka](https://github.com/target/strelka), 121 [Strelka](https://github.com/target/strelka),
  122 +[stoQ](https://stoq.punchcyber.com/),
  123 +[TheHive/Cortex](https://github.com/TheHive-Project/Cortex-Analyzers),
  124 +[Vba2Graph](https://github.com/MalwareCantFly/Vba2Graph),
  125 +[Viper](http://viper.li/),
  126 +[ViperMonkey](https://github.com/decalage2/ViperMonkey),
  127 +[YOMI](https://yomi.yoroi.company),
98 and probably [VirusTotal](https://www.virustotal.com). 128 and probably [VirusTotal](https://www.virustotal.com).
99 And quite a few [other projects on GitHub](https://github.com/search?q=oletools&type=Repositories). 129 And quite a few [other projects on GitHub](https://github.com/search?q=oletools&type=Repositories).
100 (Please [contact me]((http://decalage.info/contact)) if you have or know 130 (Please [contact me]((http://decalage.info/contact)) if you have or know
@@ -149,7 +179,7 @@ License
149 This license applies to the python-oletools package, apart from the thirdparty folder which contains third-party files 179 This license applies to the python-oletools package, apart from the thirdparty folder which contains third-party files
150 published with their own license. 180 published with their own license.
151 181
152 -The python-oletools package is copyright (c) 2012-2018 Philippe Lagadec (http://www.decalage.info) 182 +The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (http://www.decalage.info)
153 183
154 All rights reserved. 184 All rights reserved.
155 185
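According to the v0.54.2 notes, olevba and msodde now handle documents encrypted with common passwords (123, 1234, 4321, 12345, 123456, VelvetSweatShop) automatically. The sketch below illustrates that general idea. It is not oletools' actual implementation, and `try_passwords` and `decrypt_with_msoffcrypto` are hypothetical helpers. The msoffcrypto-tool calls (`OfficeFile`, `load_key(password=...)`, `decrypt(outfile)`) follow that library's documented usage. The sketch assumes a wrong password causes msoffcrypto to raise an exception, which may vary with the file format and library version.

```python
# Candidate passwords, in the order listed in the v0.54.2 release notes.
COMMON_PASSWORDS = ["123", "1234", "4321", "12345", "123456", "VelvetSweatShop"]


def try_passwords(attempt, passwords=COMMON_PASSWORDS):
    """Call attempt(password) for each candidate; return (password, result) on success.

    attempt is any callable that raises when the password is wrong.
    Returns (None, None) if no candidate works.
    """
    for password in passwords:
        try:
            return password, attempt(password)
        except Exception:
            continue  # wrong password (or unreadable file): try the next one
    return None, None


def decrypt_with_msoffcrypto(path, out_path):
    """Example attempt built on msoffcrypto-tool; out_path is rewritten on each try."""
    import msoffcrypto  # imported lazily so try_passwords works without it

    def attempt(password):
        with open(path, "rb") as src, open(out_path, "wb") as dst:
            office_file = msoffcrypto.OfficeFile(src)
            office_file.load_key(password=password)
            office_file.decrypt(dst)
        return out_path

    return try_passwords(attempt)
```

Keeping the password loop separate from the decryption call means the same loop can be reused, or tested, with any decryption backend.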
oletools/LICENSE.txt
1 -LICENSE for the python-oletools package:  
2 -  
3 -This license applies to the python-oletools package, apart from the thirdparty  
4 -folder which contains third-party files published with their own license.  
5 -  
6 -The python-oletools package is copyright (c) 2012-2018 Philippe Lagadec (http://www.decalage.info)  
7 -  
8 -All rights reserved.  
9 -  
10 -Redistribution and use in source and binary forms, with or without modification,  
11 -are permitted provided that the following conditions are met:  
12 -  
13 - * Redistributions of source code must retain the above copyright notice, this  
14 - list of conditions and the following disclaimer.  
15 - * Redistributions in binary form must reproduce the above copyright notice,  
16 - this list of conditions and the following disclaimer in the documentation  
17 - and/or other materials provided with the distribution.  
18 -  
19 -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND  
20 -ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED  
21 -WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE  
22 -DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE  
23 -FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL  
24 -DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR  
25 -SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER  
26 -CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,  
27 -OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE  
28 -OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.  
29 -  
30 -  
31 -----------  
32 -  
33 -olevba contains modified source code from the officeparser project, published  
34 -under the following MIT License (MIT):  
35 -  
36 -officeparser is copyright (c) 2014 John William Davison  
37 -  
38 -Permission is hereby granted, free of charge, to any person obtaining a copy  
39 -of this software and associated documentation files (the "Software"), to deal  
40 -in the Software without restriction, including without limitation the rights  
41 -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell  
42 -copies of the Software, and to permit persons to whom the Software is  
43 -furnished to do so, subject to the following conditions:  
44 -  
45 -The above copyright notice and this permission notice shall be included in all  
46 -copies or substantial portions of the Software.  
47 -  
48 -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR  
49 -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,  
50 -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE  
51 -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER  
52 -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,  
53 -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE  
54 -SOFTWARE. 1 +LICENSE for the python-oletools package:
  2 +
  3 +This license applies to the python-oletools package, apart from the thirdparty
  4 +folder which contains third-party files published with their own license.
  5 +
  6 +The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (http://www.decalage.info)
  7 +
  8 +All rights reserved.
  9 +
  10 +Redistribution and use in source and binary forms, with or without modification,
  11 +are permitted provided that the following conditions are met:
  12 +
  13 + * Redistributions of source code must retain the above copyright notice, this
  14 + list of conditions and the following disclaimer.
  15 + * Redistributions in binary form must reproduce the above copyright notice,
  16 + this list of conditions and the following disclaimer in the documentation
  17 + and/or other materials provided with the distribution.
  18 +
  19 +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
  20 +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  21 +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
  22 +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
  23 +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  24 +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  25 +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  26 +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
  27 +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  28 +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  29 +
  30 +
  31 +----------
  32 +
  33 +olevba contains modified source code from the officeparser project, published
  34 +under the following MIT License (MIT):
  35 +
  36 +officeparser is copyright (c) 2014 John William Davison
  37 +
  38 +Permission is hereby granted, free of charge, to any person obtaining a copy
  39 +of this software and associated documentation files (the "Software"), to deal
  40 +in the Software without restriction, including without limitation the rights
  41 +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
  42 +copies of the Software, and to permit persons to whom the Software is
  43 +furnished to do so, subject to the following conditions:
  44 +
  45 +The above copyright notice and this permission notice shall be included in all
  46 +copies or substantial portions of the Software.
  47 +
  48 +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
  49 +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
  50 +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
  51 +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
  52 +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
  53 +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
  54 +SOFTWARE.
oletools/README.html
@@ -17,13 +17,33 @@
17 </head> 17 </head>
18 <body> 18 <body>
19 <h1 id="python-oletools">python-oletools</h1> 19 <h1 id="python-oletools">python-oletools</h1>
20 -<p><a href="https://pypi.org/project/oletools/"><img src="https://img.shields.io/pypi/v/oletools.svg" alt="PyPI" /></a> <a href="https://travis-ci.org/decalage2/oletools"><img src="https://travis-ci.org/decalage2/oletools.svg?branch=master" alt="Build Status" /></a></p> 20 +<p><a href="https://pypi.org/project/oletools/"><img src="https://img.shields.io/pypi/v/oletools.svg" alt="PyPI" /></a> <a href="https://travis-ci.org/decalage2/oletools"><img src="https://travis-ci.org/decalage2/oletools.svg?branch=master" alt="Build Status" /></a> <a href="https://saythanks.io/to/decalage2"><img src="https://img.shields.io/badge/Say%20Thanks-!-1EAEDB.svg" alt="Say Thanks!" /></a></p>
21 <p><a href="http://www.decalage.info/python/oletools">oletools</a> is a package of python tools to analyze <a href="http://en.wikipedia.org/wiki/Compound_File_Binary_Format">Microsoft OLE2 files</a> (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office documents or Outlook messages, mainly for malware analysis, forensics and debugging. It is based on the <a href="http://www.decalage.info/olefile">olefile</a> parser. See <a href="http://www.decalage.info/python/oletools" class="uri">http://www.decalage.info/python/oletools</a> for more info.</p> 21 <p><a href="http://www.decalage.info/python/oletools">oletools</a> is a package of python tools to analyze <a href="http://en.wikipedia.org/wiki/Compound_File_Binary_Format">Microsoft OLE2 files</a> (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office documents or Outlook messages, mainly for malware analysis, forensics and debugging. It is based on the <a href="http://www.decalage.info/olefile">olefile</a> parser. See <a href="http://www.decalage.info/python/oletools" class="uri">http://www.decalage.info/python/oletools</a> for more info.</p>
22 <p><strong>Quick links:</strong> <a href="http://www.decalage.info/python/oletools">Home page</a> - <a href="https://github.com/decalage2/oletools/wiki/Install">Download/Install</a> - <a href="https://github.com/decalage2/oletools/wiki">Documentation</a> - <a href="https://github.com/decalage2/oletools/issues">Report Issues/Suggestions/Questions</a> - <a href="http://decalage.info/contact">Contact the Author</a> - <a href="https://github.com/decalage2/oletools">Repository</a> - <a href="https://twitter.com/decalage2">Updates on Twitter</a> <a href="https://github.com/decalage2/oletools/blob/master/cheatsheet/oletools_cheatsheet.pdf">Cheatsheet</a></p> 22 <p><strong>Quick links:</strong> <a href="http://www.decalage.info/python/oletools">Home page</a> - <a href="https://github.com/decalage2/oletools/wiki/Install">Download/Install</a> - <a href="https://github.com/decalage2/oletools/wiki">Documentation</a> - <a href="https://github.com/decalage2/oletools/issues">Report Issues/Suggestions/Questions</a> - <a href="http://decalage.info/contact">Contact the Author</a> - <a href="https://github.com/decalage2/oletools">Repository</a> - <a href="https://twitter.com/decalage2">Updates on Twitter</a> <a href="https://github.com/decalage2/oletools/blob/master/cheatsheet/oletools_cheatsheet.pdf">Cheatsheet</a></p>
23 <p>Note: python-oletools is not related to OLETools published by BeCubed Software.</p> 23 <p>Note: python-oletools is not related to OLETools published by BeCubed Software.</p>
24 <h2 id="news">News</h2> 24 <h2 id="news">News</h2>
25 <ul> 25 <ul>
26 -<li><strong>2018-05-30 v0.53</strong>: 26 +<li><strong>2019-05-22 v0.54.2</strong>:
  27 +<ul>
  28 +<li>bugfix release: fixed several issues related to encrypted documents and XLM/XLF Excel 4 macros</li>
  29 +<li>msoffcrypto-tool is now installed by default to handle encrypted documents</li>
  30 +<li>olevba and msodde now handle documents encrypted with common passwords such as 123, 1234, 4321, 12345, 123456, VelvetSweatShop automatically.</li>
  31 +</ul></li>
  32 +<li><strong>2019-04-04 v0.54</strong>:
  33 +<ul>
  34 +<li>olevba, msodde: added support for encrypted MS Office files</li>
  35 +<li>olevba: added detection and extraction of XLM/XLF Excel 4 macros (thanks to plugin_biff from Didier Stevens' oledump)</li>
  36 +<li>olevba, mraptor: added detection of VBA running Excel 4 macros</li>
  37 +<li>olevba: detect and display special characters such as backspace</li>
  38 +<li>olevba: colorized output showing suspicious keywords in the VBA code</li>
  39 +<li>olevba, mraptor: full Python 3 compatibility, no separate olevba3/mraptor3 anymore</li>
  40 +<li>olevba: improved handling of code pages and unicode</li>
  41 +<li>olevba: fixed a false-positive in VBA macro detection</li>
  42 +<li>rtfobj: improved OLE Package handling, improved Equation object detection</li>
  43 +<li>oleobj: added detection of external links to objects in OpenXML</li>
  44 +<li>replaced third party packages by PyPI dependencies</li>
  45 +</ul></li>
  46 +<li>2018-05-30 v0.53:
27 <ul> 47 <ul>
28 <li>olevba and mraptor can now parse Word/PowerPoint 2007+ pure XML files (aka Flat OPC format)</li> 48 <li>olevba and mraptor can now parse Word/PowerPoint 2007+ pure XML files (aka Flat OPC format)</li>
29 <li>improved support for VBA forms in olevba (oleform)</li> 49 <li>improved support for VBA forms in olevba (oleform)</li>
@@ -66,7 +86,7 @@
66 <li><a href="https://github.com/decalage2/oletools/wiki/olemap">olemap</a>: to display a map of all the sectors in an OLE file.</li> 86 <li><a href="https://github.com/decalage2/oletools/wiki/olemap">olemap</a>: to display a map of all the sectors in an OLE file.</li>
67 </ul> 87 </ul>
68 <h2 id="projects-using-oletools">Projects using oletools:</h2> 88 <h2 id="projects-using-oletools">Projects using oletools:</h2>
69 -<p>oletools are used by a number of projects and online malware analysis services, including <a href="http://viper.li/">Viper</a>, <a href="https://remnux.org/">REMnux</a>, <a href="https://certsocietegenerale.github.io/fame/">FAME</a>, <a href="https://www.hybrid-analysis.com/">Hybrid-analysis.com</a>, <a href="https://www.document-analyzer.net/">Joe Sandbox</a>, <a href="https://sandbox.deepviz.com/">Deepviz</a>, <a href="https://github.com/lmco/laikaboss">Laika BOSS</a>, <a href="https://github.com/cuckoosandbox/cuckoo">Cuckoo Sandbox</a>, <a href="https://sandbox.anlyz.io/">Anlyz.io</a>, <a href="https://github.com/decalage2/ViperMonkey">ViperMonkey</a>, <a href="https://github.com/bontchev/pcodedmp">pcodedmp</a>, <a href="https://dridex.malwareconfig.com">dridex.malwareconfig.com</a>, <a href="https://github.com/countercept/snake">Snake</a>, <a href="https://github.com/cryps1s/DARKSURGEON">DARKSURGEON</a>, and probably <a href="https://www.virustotal.com">VirusTotal</a>. (Please <a href="(http://decalage.info/contact)">contact me</a> if you have or know a project using oletools)</p> 89 +<p>oletools are used by a number of projects and online malware analysis services, including <a href="http://viper.li/">Viper</a>, <a href="https://remnux.org/">REMnux</a>, <a href="https://github.com/fireeye/flare-vm">FLARE-VM</a>, <a href="https://certsocietegenerale.github.io/fame/">FAME</a>, <a href="https://www.hybrid-analysis.com/">Hybrid-analysis.com</a>, <a href="https://www.document-analyzer.net/">Joe Sandbox</a>, <a href="https://sandbox.deepviz.com/">Deepviz</a>, <a href="https://github.com/lmco/laikaboss">Laika BOSS</a>, <a href="https://github.com/cuckoosandbox/cuckoo">Cuckoo Sandbox</a>, <a href="https://sandbox.anlyz.io/">Anlyz.io</a>, <a href="https://github.com/decalage2/ViperMonkey">ViperMonkey</a>, <a href="https://github.com/bontchev/pcodedmp">pcodedmp</a>, <a href="https://dridex.malwareconfig.com">dridex.malwareconfig.com</a>, <a href="https://github.com/countercept/snake">Snake</a>, <a href="https://github.com/cryps1s/DARKSURGEON">DARKSURGEON</a>, <a href="https://github.com/ctxis/CAPE">CAPE</a>, <a href="https://www.cse-cst.gc.ca/en/assemblyline">AssemblyLine</a>, <a href="https://malshare.io">malshare.io</a>, <a href="https://www.adlice.com/download/mrf/">Malware Repository Framework (MRF)</a>, <a href="https://github.com/Tigzy/malware-repo">malware-repo</a>, <a href="https://github.com/MalwareCantFly/Vba2Graph">Vba2Graph</a>, <a href="https://github.com/target/strelka">Strelka</a>, <a href="https://stoq.punchcyber.com/">stoQ</a>, <a href="https://yomi.yoroi.company">YOMI</a>, and probably <a href="https://www.virustotal.com">VirusTotal</a>. And quite a few <a href="https://github.com/search?q=oletools&amp;type=Repositories">other projects on GitHub</a>. (Please <a href="(http://decalage.info/contact)">contact me</a> if you have or know a project using oletools)</p>
70 <h2 id="download-and-install">Download and Install:</h2> 90 <h2 id="download-and-install">Download and Install:</h2>
71 <p>The recommended way to download and install/update the <strong>latest stable release</strong> of oletools is to use <a href="https://pip.pypa.io/en/stable/installing/">pip</a>:</p> 91 <p>The recommended way to download and install/update the <strong>latest stable release</strong> of oletools is to use <a href="https://pip.pypa.io/en/stable/installing/">pip</a>:</p>
72 <ul> 92 <ul>
@@ -89,7 +109,7 @@ @@ -89,7 +109,7 @@
89 <p>The code is available in <a href="https://github.com/decalage2/oletools">a GitHub repository</a>. You may use it to submit enhancements using forks and pull requests.</p> 109 <p>The code is available in <a href="https://github.com/decalage2/oletools">a GitHub repository</a>. You may use it to submit enhancements using forks and pull requests.</p>
90 <h2 id="license">License</h2> 110 <h2 id="license">License</h2>
91 <p>This license applies to the python-oletools package, apart from the thirdparty folder which contains third-party files published with their own license.</p> 111 <p>This license applies to the python-oletools package, apart from the thirdparty folder which contains third-party files published with their own license.</p>
92 -<p>The python-oletools package is copyright (c) 2012-2018 Philippe Lagadec (http://www.decalage.info)</p> 112 +<p>The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (http://www.decalage.info)</p>
93 <p>All rights reserved.</p> 113 <p>All rights reserved.</p>
94 <p>Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:</p> 114 <p>Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:</p>
95 <ul> 115 <ul>
oletools/README.rst
1 python-oletools 1 python-oletools
2 =============== 2 ===============
3 3
4 -|PyPI| |Build Status| 4 +|PyPI| |Build Status| |Say Thanks!|
5 5
6 `oletools <http://www.decalage.info/python/oletools>`__ is a package of 6 `oletools <http://www.decalage.info/python/oletools>`__ is a package of
7 python tools to analyze `Microsoft OLE2 7 python tools to analyze `Microsoft OLE2
@@ -29,7 +29,35 @@ Software. @@ -29,7 +29,35 @@ Software.
29 News 29 News
30 ---- 30 ----
31 31
32 -- **2018-05-30 v0.53**: 32 +- **2019-05-22 v0.54.2**:
  33 +
  34 + - bugfix release: fixed several issues related to encrypted
  35 + documents and XLM/XLF Excel 4 macros
  36 + - msoffcrypto-tool is now installed by default to handle encrypted
  37 + documents
  38 + - olevba and msodde now handle documents encrypted with common
  39 + passwords such as 123, 1234, 4321, 12345, 123456, VelvetSweatShop
  40 + automatically.
  41 +
  42 +- **2019-04-04 v0.54**:
  43 +
  44 + - olevba, msodde: added support for encrypted MS Office files
  45 + - olevba: added detection and extraction of XLM/XLF Excel 4 macros
  46 + (thanks to plugin_biff from Didier Stevens' oledump)
  47 + - olevba, mraptor: added detection of VBA running Excel 4 macros
  48 + - olevba: detect and display special characters such as backspace
  49 + - olevba: colorized output showing suspicious keywords in the VBA
  50 + code
  51 + - olevba, mraptor: full Python 3 compatibility, no separate
  52 + olevba3/mraptor3 anymore
  53 + - olevba: improved handling of code pages and unicode
  54 + - olevba: fixed a false-positive in VBA macro detection
  55 + - rtfobj: improved OLE Package handling, improved Equation object
  56 + detection
  57 + - oleobj: added detection of external links to objects in OpenXML
  58 + - replaced third party packages by PyPI dependencies
  59 +
  60 +- 2018-05-30 v0.53:
33 61
34 - olevba and mraptor can now parse Word/PowerPoint 2007+ pure XML 62 - olevba and mraptor can now parse Word/PowerPoint 2007+ pure XML
35 files (aka Flat OPC format) 63 files (aka Flat OPC format)
@@ -115,6 +143,7 @@ Projects using oletools: @@ -115,6 +143,7 @@ Projects using oletools:
115 oletools are used by a number of projects and online malware analysis 143 oletools are used by a number of projects and online malware analysis
116 services, including `Viper <http://viper.li/>`__, 144 services, including `Viper <http://viper.li/>`__,
117 `REMnux <https://remnux.org/>`__, 145 `REMnux <https://remnux.org/>`__,
  146 +`FLARE-VM <https://github.com/fireeye/flare-vm>`__,
118 `FAME <https://certsocietegenerale.github.io/fame/>`__, 147 `FAME <https://certsocietegenerale.github.io/fame/>`__,
119 `Hybrid-analysis.com <https://www.hybrid-analysis.com/>`__, `Joe 148 `Hybrid-analysis.com <https://www.hybrid-analysis.com/>`__, `Joe
120 Sandbox <https://www.document-analyzer.net/>`__, 149 Sandbox <https://www.document-analyzer.net/>`__,
@@ -126,10 +155,21 @@ Sandbox &lt;https://github.com/cuckoosandbox/cuckoo&gt;`__, @@ -126,10 +155,21 @@ Sandbox &lt;https://github.com/cuckoosandbox/cuckoo&gt;`__,
126 `pcodedmp <https://github.com/bontchev/pcodedmp>`__, 155 `pcodedmp <https://github.com/bontchev/pcodedmp>`__,
127 `dridex.malwareconfig.com <https://dridex.malwareconfig.com>`__, 156 `dridex.malwareconfig.com <https://dridex.malwareconfig.com>`__,
128 `Snake <https://github.com/countercept/snake>`__, 157 `Snake <https://github.com/countercept/snake>`__,
129 -`DARKSURGEON <https://github.com/cryps1s/DARKSURGEON>`__, and probably  
130 -`VirusTotal <https://www.virustotal.com>`__. (Please `contact  
131 -me <(http://decalage.info/contact)>`__ if you have or know a project  
132 -using oletools) 158 +`DARKSURGEON <https://github.com/cryps1s/DARKSURGEON>`__,
  159 +`CAPE <https://github.com/ctxis/CAPE>`__,
  160 +`AssemblyLine <https://www.cse-cst.gc.ca/en/assemblyline>`__,
  161 +`malshare.io <https://malshare.io>`__, `Malware Repository Framework
  162 +(MRF) <https://www.adlice.com/download/mrf/>`__,
  163 +`malware-repo <https://github.com/Tigzy/malware-repo>`__,
  164 +`Vba2Graph <https://github.com/MalwareCantFly/Vba2Graph>`__,
  165 +`Strelka <https://github.com/target/strelka>`__,
  166 +`stoQ <https://stoq.punchcyber.com/>`__,
  167 +`YOMI <https://yomi.yoroi.company>`__, and probably
  168 +`VirusTotal <https://www.virustotal.com>`__. And quite a few `other
  169 +projects on
  170 +GitHub <https://github.com/search?q=oletools&type=Repositories>`__.
  171 +(Please `contact me <http://decalage.info/contact>`__ if you have or
  172 +know a project using oletools)
133 173
134 Download and Install: 174 Download and Install:
135 --------------------- 175 ---------------------
@@ -186,7 +226,7 @@ This license applies to the python-oletools package, apart from the @@ -186,7 +226,7 @@ This license applies to the python-oletools package, apart from the
186 thirdparty folder which contains third-party files published with their 226 thirdparty folder which contains third-party files published with their
187 own license. 227 own license.
188 228
189 -The python-oletools package is copyright (c) 2012-2018 Philippe Lagadec 229 +The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec
190 (http://www.decalage.info) 230 (http://www.decalage.info)
191 231
192 All rights reserved. 232 All rights reserved.
@@ -243,3 +283,5 @@ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. @@ -243,3 +283,5 @@ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
243 :target: https://pypi.org/project/oletools/ 283 :target: https://pypi.org/project/oletools/
244 .. |Build Status| image:: https://travis-ci.org/decalage2/oletools.svg?branch=master 284 .. |Build Status| image:: https://travis-ci.org/decalage2/oletools.svg?branch=master
245 :target: https://travis-ci.org/decalage2/oletools 285 :target: https://travis-ci.org/decalage2/oletools
  286 +.. |Say Thanks!| image:: https://img.shields.io/badge/Say%20Thanks-!-1EAEDB.svg
  287 + :target: https://saythanks.io/to/decalage2
oletools/common/clsid.py
@@ -12,7 +12,7 @@ http://www.decalage.info/python/oletools @@ -12,7 +12,7 @@ http://www.decalage.info/python/oletools
12 12
13 #=== LICENSE ================================================================== 13 #=== LICENSE ==================================================================
14 14
15 -# oletools are copyright (c) 2018 Philippe Lagadec (http://www.decalage.info) 15 +# oletools are copyright (c) 2018-2019 Philippe Lagadec (http://www.decalage.info)
16 # All rights reserved. 16 # All rights reserved.
17 # 17 #
18 # Redistribution and use in source and binary forms, with or without modification, 18 # Redistribution and use in source and binary forms, with or without modification,
@@ -43,7 +43,7 @@ http://www.decalage.info/python/oletools @@ -43,7 +43,7 @@ http://www.decalage.info/python/oletools
43 # 2018-04-18 PL: - added known-bad CLSIDs from Cuckoo sandbox (issue #290) 43 # 2018-04-18 PL: - added known-bad CLSIDs from Cuckoo sandbox (issue #290)
44 # 2018-05-08 PL: - added more CLSIDs (issues #299, #304), merged and sorted 44 # 2018-05-08 PL: - added more CLSIDs (issues #299, #304), merged and sorted
45 45
46 -__version__ = '0.54dev3' 46 +__version__ = '0.54'
47 47
48 48
49 # REFERENCES: 49 # REFERENCES:
@@ -137,9 +137,23 @@ KNOWN_CLSIDS = { @@ -137,9 +137,23 @@ KNOWN_CLSIDS = {
137 '85131630-480C-11D2-B1F9-00C04F86C324': 'scrrun.dll - JS File Host Encode Object (ProgID: JSFile.HostEncode)', 137 '85131630-480C-11D2-B1F9-00C04F86C324': 'scrrun.dll - JS File Host Encode Object (ProgID: JSFile.HostEncode)',
138 '85131631-480C-11D2-B1F9-00C04F86C324': 'scrrun.dll - VBS File Host Encode Object (ProgID: VBSFile.HostEncode)', 138 '85131631-480C-11D2-B1F9-00C04F86C324': 'scrrun.dll - VBS File Host Encode Object (ProgID: VBSFile.HostEncode)',
139 '8627E73B-B5AA-4643-A3B0-570EDA17E3E7': 'UmOutlookAddin.ButtonBar (potential exploit document CVE-2016-0042 / MS16-014)', 139 '8627E73B-B5AA-4643-A3B0-570EDA17E3E7': 'UmOutlookAddin.ButtonBar (potential exploit document CVE-2016-0042 / MS16-014)',
  140 + '88D969E5-F192-11D4-A65F-0040963251E5': 'Msxml2.DOMDocument.5.0',
  141 + '88D969E9-F192-11D4-A65F-0040963251E5': 'Msxml2.DSOControl.5.0',
  142 + '88D969E6-F192-11D4-A65F-0040963251E5': 'Msxml2.FreeThreadedDOMDocument.5.0',
  143 + '88D969F5-F192-11D4-A65F-0040963251E5': 'Msxml2.MXDigitalSignature.5.0',
  144 + '88D969F0-F192-11D4-A65F-0040963251E5': 'Msxml2.MXHTMLWriter.5.0',
  145 + '88D969F1-F192-11D4-A65F-0040963251E5': 'Msxml2.MXNamespaceManager.5.0',
  146 + '88D969EF-F192-11D4-A65F-0040963251E5': 'Msxml2.MXXMLWriter.5.0',
  147 + '88D969EE-F192-11D4-A65F-0040963251E5': 'Msxml2.SAXAttributes.5.0',
  148 + '88D969EC-8B8B-4C3D-859E-AF6CD158BE0F': 'Msxml2.SAXXMLReader.5.0',
  149 + '88D969EB-F192-11D4-A65F-0040963251E5': 'Msxml2.ServerXMLHTTP.5.0',
  150 + '88D969EA-F192-11D4-A65F-0040963251E5': 'Msxml2.XMLHTTP.5.0',
  151 + '88D969E7-F192-11D4-A65F-0040963251E5': 'Msxml2.XMLSchemaCache.5.0',
  152 + '88D969E8-F192-11D4-A65F-0040963251E5': 'Msxml2.XSLTemplate.5.0',
140 '8E75D913-3D21-11D2-85C4-080009A0C626': 'AutoCAD 2004-2006 Document', 153 '8E75D913-3D21-11D2-85C4-080009A0C626': 'AutoCAD 2004-2006 Document',
141 '9181DC5F-E07D-418A-ACA6-8EEA1ECB8E9E': 'MSCOMCTL.TreeCtrl (may trigger CVE-2012-0158)', 154 '9181DC5F-E07D-418A-ACA6-8EEA1ECB8E9E': 'MSCOMCTL.TreeCtrl (may trigger CVE-2012-0158)',
142 '975797FC-4E2A-11D0-B702-00C04FD8DBF7': 'Loads ELSEXT.DLL (Known Related to CVE-2015-6128)', 155 '975797FC-4E2A-11D0-B702-00C04FD8DBF7': 'Loads ELSEXT.DLL (Known Related to CVE-2015-6128)',
  156 + '978C9E23-D4B0-11CE-BF2D-00AA003F40D0': 'Microsoft Forms 2.0 Label (Forms.Label.1)',
143 '996BF5E0-8044-4650-ADEB-0B013914E99C': 'MSCOMCTL.ListViewCtrl (may trigger CVE-2012-0158)', 157 '996BF5E0-8044-4650-ADEB-0B013914E99C': 'MSCOMCTL.ListViewCtrl (may trigger CVE-2012-0158)',
144 'A08A033D-1A75-4AB6-A166-EAD02F547959': 'otkloadr WRAssembly Object (can be used to bypass ASLR after triggering an exploit)', 158 'A08A033D-1A75-4AB6-A166-EAD02F547959': 'otkloadr WRAssembly Object (can be used to bypass ASLR after triggering an exploit)',
145 'B54F3741-5B07-11CF-A4B0-00AA004A55E8': 'vbscript.dll - VB Script Language (ProgID: VBS, VBScript)', 159 'B54F3741-5B07-11CF-A4B0-00AA004A55E8': 'vbscript.dll - VB Script Language (ProgID: VBS, VBScript)',
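The new KNOWN_CLSIDS entries are plain string keys in upper case without braces. A minimal sketch of how such a table might be queried, assuming a CLSID can arrive in lower case or wrapped in `{...}`; the helpers `describe_clsid` and `cve_related` are hypothetical and not part of oletools, and only three entries are copied from the diff:

```python
# Three entries copied from KNOWN_CLSIDS in the diff above (excerpt only)
KNOWN_CLSIDS = {
    '88D969EA-F192-11D4-A65F-0040963251E5': 'Msxml2.XMLHTTP.5.0',
    '978C9E23-D4B0-11CE-BF2D-00AA003F40D0': 'Microsoft Forms 2.0 Label (Forms.Label.1)',
    '9181DC5F-E07D-418A-ACA6-8EEA1ECB8E9E': 'MSCOMCTL.TreeCtrl (may trigger CVE-2012-0158)',
}


def describe_clsid(clsid):
    """Hypothetical helper: normalize a CLSID string, then look it up (None if unknown)."""
    key = clsid.strip().strip('{}').upper()
    return KNOWN_CLSIDS.get(key)


def cve_related(clsids):
    """Keep only the CLSIDs whose description mentions a CVE."""
    return [c for c in clsids if 'CVE' in (describe_clsid(c) or '')]
```

Normalizing before the lookup keeps the table itself in a single canonical form, so new entries only need to be added once.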
oletools/common/codepages.py 0 → 100644
  1 +"""
  2 +codepages.py
  3 +
  4 +codepages is a python module to map code pages (numbers) to Python codecs,
  5 +in order to decode bytes to unicode.
  6 +It also provides the name/description of code pages.
  7 +
  8 +Author: Philippe Lagadec - http://www.decalage.info
  9 +License: BSD, see source code or documentation
  10 +
  11 +codepages is part of the python-oletools package:
  12 +http://www.decalage.info/python/oletools
  13 +"""
  14 +
  15 +# === LICENSE ==================================================================
  16 +
  17 +# codepages is copyright (c) 2018-2019 Philippe Lagadec (http://www.decalage.info)
  18 +# All rights reserved.
  19 +#
  20 +# Redistribution and use in source and binary forms, with or without modification,
  21 +# are permitted provided that the following conditions are met:
  22 +#
  23 +# * Redistributions of source code must retain the above copyright notice, this
  24 +# list of conditions and the following disclaimer.
  25 +# * Redistributions in binary form must reproduce the above copyright notice,
  26 +# this list of conditions and the following disclaimer in the documentation
  27 +# and/or other materials provided with the distribution.
  28 +#
  29 +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
  30 +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  31 +# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
  32 +# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
  33 +# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  34 +# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  35 +# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  36 +# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
  37 +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  38 +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  39 +
  40 +
  41 +# -----------------------------------------------------------------------------
  42 +# CHANGELOG:
  43 +# 2018-12-13 v0.54 PL: - first version
  44 +# 2019-01-30 PL: - added a few code pages from xlrd
  45 +
  46 +__version__ = '0.54'
  47 +
  48 +# -----------------------------------------------------------------------------
  49 +# TODO:
  50 +# TODO: check also http://www.aivosto.com/articles/charsets-codepages.html
  51 +# TODO: https://en.wikipedia.org/wiki/Code_page
  52 +
  53 +# -----------------------------------------------------------------------------
  54 +# REFERENCES:
  55 +# - https://docs.microsoft.com/en-gb/windows/desktop/Intl/code-page-identifiers
  56 +
  57 +
  58 +# --- IMPORTS -----------------------------------------------------------------
  59 +
  60 +import codecs
  61 +
  62 +# === CONSTANTS ===============================================================
  63 +
  64 +# Code page names from https://docs.microsoft.com/en-gb/windows/desktop/Intl/code-page-identifiers
  65 +# Retrieved on the 2018-12-13
  66 +# How it was converted to Python:
  67 +# 1) copy the table data (3 columns) from browser into Excel
  68 +# 2) use the following formula to concatenate 1st and 3rd columns: =A1 & ": " & "'" & C1 & "',"
  69 +# 3) copy from Excel into Python
  70 +
  71 +CODEPAGE_NAME = {
  72 + 37: 'IBM EBCDIC US-Canada',
  73 + 437: 'OEM United States',
  74 + 500: 'IBM EBCDIC International',
  75 + 708: 'Arabic (ASMO 708)',
  76 + 709: 'Arabic (ASMO-449+, BCON V4)',
  77 + 710: 'Arabic - Transparent Arabic',
  78 + 720: 'Arabic (Transparent ASMO); Arabic (DOS)',
  79 + 737: 'OEM Greek (formerly 437G); Greek (DOS)',
  80 + 775: 'OEM Baltic; Baltic (DOS)',
  81 + 850: 'OEM Multilingual Latin 1; Western European (DOS)',
  82 + 852: 'OEM Latin 2; Central European (DOS)',
  83 + 855: 'OEM Cyrillic (primarily Russian)',
  84 + 857: 'OEM Turkish; Turkish (DOS)',
  85 + 858: 'OEM Multilingual Latin 1 + Euro symbol',
  86 + 860: 'OEM Portuguese; Portuguese (DOS)',
  87 + 861: 'OEM Icelandic; Icelandic (DOS)',
  88 + 862: 'OEM Hebrew; Hebrew (DOS)',
  89 + 863: 'OEM French Canadian; French Canadian (DOS)',
  90 + 864: 'OEM Arabic; Arabic (864)',
  91 + 865: 'OEM Nordic; Nordic (DOS)',
  92 + 866: 'OEM Russian; Cyrillic (DOS)',
  93 + 869: 'OEM Modern Greek; Greek, Modern (DOS)',
  94 + 870: 'IBM EBCDIC Multilingual/ROECE (Latin 2); IBM EBCDIC Multilingual Latin 2',
  95 + 874: 'ANSI/OEM Thai (ISO 8859-11); Thai (Windows)',
  96 + 875: 'IBM EBCDIC Greek Modern',
  97 + 932: 'ANSI/OEM Japanese; Japanese (Shift-JIS)',
  98 + 936: 'ANSI/OEM Simplified Chinese (PRC, Singapore); Chinese Simplified (GB2312)',
  99 + 949: 'ANSI/OEM Korean (Unified Hangul Code)',
  100 + 950: 'ANSI/OEM Traditional Chinese (Taiwan; Hong Kong SAR, PRC); Chinese Traditional (Big5)',
  101 + 1026: 'IBM EBCDIC Turkish (Latin 5)',
  102 + 1047: 'IBM EBCDIC Latin 1/Open System',
  103 + 1140: 'IBM EBCDIC US-Canada (037 + Euro symbol); IBM EBCDIC (US-Canada-Euro)',
  104 + 1141: 'IBM EBCDIC Germany (20273 + Euro symbol); IBM EBCDIC (Germany-Euro)',
  105 + 1142: 'IBM EBCDIC Denmark-Norway (20277 + Euro symbol); IBM EBCDIC (Denmark-Norway-Euro)',
  106 + 1143: 'IBM EBCDIC Finland-Sweden (20278 + Euro symbol); IBM EBCDIC (Finland-Sweden-Euro)',
  107 + 1144: 'IBM EBCDIC Italy (20280 + Euro symbol); IBM EBCDIC (Italy-Euro)',
  108 + 1145: 'IBM EBCDIC Latin America-Spain (20284 + Euro symbol); IBM EBCDIC (Spain-Euro)',
  109 + 1146: 'IBM EBCDIC United Kingdom (20285 + Euro symbol); IBM EBCDIC (UK-Euro)',
  110 + 1147: 'IBM EBCDIC France (20297 + Euro symbol); IBM EBCDIC (France-Euro)',
  111 + 1148: 'IBM EBCDIC International (500 + Euro symbol); IBM EBCDIC (International-Euro)',
  112 + 1149: 'IBM EBCDIC Icelandic (20871 + Euro symbol); IBM EBCDIC (Icelandic-Euro)',
  113 + 1200: 'Unicode UTF-16, little endian byte order (BMP of ISO 10646); available only to managed applications',
  114 + 1201: 'Unicode UTF-16, big endian byte order; available only to managed applications',
  115 + 1250: 'ANSI Central European; Central European (Windows)',
  116 + 1251: 'ANSI Cyrillic; Cyrillic (Windows)',
  117 + 1252: 'ANSI Latin 1; Western European (Windows)',
  118 + 1253: 'ANSI Greek; Greek (Windows)',
  119 + 1254: 'ANSI Turkish; Turkish (Windows)',
  120 + 1255: 'ANSI Hebrew; Hebrew (Windows)',
  121 + 1256: 'ANSI Arabic; Arabic (Windows)',
  122 + 1257: 'ANSI Baltic; Baltic (Windows)',
  123 + 1258: 'ANSI/OEM Vietnamese; Vietnamese (Windows)',
  124 + 1361: 'Korean (Johab)',
  125 + 10000: 'MAC Roman; Western European (Mac)',
  126 + 10001: 'Japanese (Mac)',
  127 + 10002: 'MAC Traditional Chinese (Big5); Chinese Traditional (Mac)',
  128 + 10003: 'Korean (Mac)',
  129 + 10004: 'Arabic (Mac)',
  130 + 10005: 'Hebrew (Mac)',
  131 + 10006: 'Greek (Mac)',
  132 + 10007: 'Cyrillic (Mac)',
  133 + 10008: 'MAC Simplified Chinese (GB 2312); Chinese Simplified (Mac)',
  134 + 10010: 'Romanian (Mac)',
  135 + 10017: 'Ukrainian (Mac)',
  136 + 10021: 'Thai (Mac)',
  137 + 10029: 'MAC Latin 2; Central European (Mac)',
  138 + 10079: 'Icelandic (Mac)',
  139 + 10081: 'Turkish (Mac)',
  140 + 10082: 'Croatian (Mac)',
  141 + 12000: 'Unicode UTF-32, little endian byte order; available only to managed applications',
  142 + 12001: 'Unicode UTF-32, big endian byte order; available only to managed applications',
  143 + 20000: 'CNS Taiwan; Chinese Traditional (CNS)',
  144 + 20001: 'TCA Taiwan',
  145 + 20002: 'Eten Taiwan; Chinese Traditional (Eten)',
  146 + 20003: 'IBM5550 Taiwan',
  147 + 20004: 'TeleText Taiwan',
  148 + 20005: 'Wang Taiwan',
  149 + 20105: 'IA5 (IRV International Alphabet No. 5, 7-bit); Western European (IA5)',
  150 + 20106: 'IA5 German (7-bit)',
  151 + 20107: 'IA5 Swedish (7-bit)',
  152 + 20108: 'IA5 Norwegian (7-bit)',
  153 + 20127: 'US-ASCII (7-bit)',
  154 + 20261: 'T.61',
  155 + 20269: 'ISO 6937 Non-Spacing Accent',
  156 + 20273: 'IBM EBCDIC Germany',
  157 + 20277: 'IBM EBCDIC Denmark-Norway',
  158 + 20278: 'IBM EBCDIC Finland-Sweden',
  159 + 20280: 'IBM EBCDIC Italy',
  160 + 20284: 'IBM EBCDIC Latin America-Spain',
  161 + 20285: 'IBM EBCDIC United Kingdom',
  162 + 20290: 'IBM EBCDIC Japanese Katakana Extended',
  163 + 20297: 'IBM EBCDIC France',
  164 + 20420: 'IBM EBCDIC Arabic',
  165 + 20423: 'IBM EBCDIC Greek',
  166 + 20424: 'IBM EBCDIC Hebrew',
  167 + 20833: 'IBM EBCDIC Korean Extended',
  168 + 20838: 'IBM EBCDIC Thai',
  169 + 20866: 'Russian (KOI8-R); Cyrillic (KOI8-R)',
  170 + 20871: 'IBM EBCDIC Icelandic',
  171 + 20880: 'IBM EBCDIC Cyrillic Russian',
  172 + 20905: 'IBM EBCDIC Turkish',
  173 + 20924: 'IBM EBCDIC Latin 1/Open System (1047 + Euro symbol)',
  174 + 20932: 'Japanese (JIS 0208-1990 and 0212-1990)',
  175 + 20936: 'Simplified Chinese (GB2312); Chinese Simplified (GB2312-80)',
  176 + 20949: 'Korean Wansung',
  177 + 21025: 'IBM EBCDIC Cyrillic Serbian-Bulgarian',
  178 + 21027: '(deprecated)',
  179 + 21866: 'Ukrainian (KOI8-U); Cyrillic (KOI8-U)',
  180 + 28591: 'ISO 8859-1 Latin 1; Western European (ISO)',
  181 + 28592: 'ISO 8859-2 Central European; Central European (ISO)',
  182 + 28593: 'ISO 8859-3 Latin 3',
  183 + 28594: 'ISO 8859-4 Baltic',
  184 + 28595: 'ISO 8859-5 Cyrillic',
  185 + 28596: 'ISO 8859-6 Arabic',
  186 + 28597: 'ISO 8859-7 Greek',
  187 + 28598: 'ISO 8859-8 Hebrew; Hebrew (ISO-Visual)',
  188 + 28599: 'ISO 8859-9 Turkish',
  189 + 28603: 'ISO 8859-13 Estonian',
  190 + 28605: 'ISO 8859-15 Latin 9',
  191 + 29001: 'Europa 3',
  192 + 38598: 'ISO 8859-8 Hebrew; Hebrew (ISO-Logical)',
  193 + 50220: 'ISO 2022 Japanese with no halfwidth Katakana; Japanese (JIS)',
  194 + 50221: 'ISO 2022 Japanese with halfwidth Katakana; Japanese (JIS-Allow 1 byte Kana)',
  195 + 50222: 'ISO 2022 Japanese JIS X 0201-1989; Japanese (JIS-Allow 1 byte Kana - SO/SI)',
  196 + 50225: 'ISO 2022 Korean',
  197 + 50227: 'ISO 2022 Simplified Chinese; Chinese Simplified (ISO 2022)',
  198 + 50229: 'ISO 2022 Traditional Chinese',
  199 + 50930: 'EBCDIC Japanese (Katakana) Extended',
  200 + 50931: 'EBCDIC US-Canada and Japanese',
  201 + 50933: 'EBCDIC Korean Extended and Korean',
  202 + 50935: 'EBCDIC Simplified Chinese Extended and Simplified Chinese',
  203 + 50936: 'EBCDIC Simplified Chinese',
  204 + 50937: 'EBCDIC US-Canada and Traditional Chinese',
  205 + 50939: 'EBCDIC Japanese (Latin) Extended and Japanese',
  206 + 51932: 'EUC Japanese',
  207 + 51936: 'EUC Simplified Chinese; Chinese Simplified (EUC)',
  208 + 51949: 'EUC Korean',
  209 + 51950: 'EUC Traditional Chinese',
  210 + 52936: 'HZ-GB2312 Simplified Chinese; Chinese Simplified (HZ)',
  211 + 54936: 'Windows XP and later: GB18030 Simplified Chinese (4 byte); Chinese Simplified (GB18030)',
  212 + 57002: 'ISCII Devanagari',
  213 + 57003: 'ISCII Bangla',
  214 + 57004: 'ISCII Tamil',
  215 + 57005: 'ISCII Telugu',
  216 + 57006: 'ISCII Assamese',
  217 + 57007: 'ISCII Odia',
  218 + 57008: 'ISCII Kannada',
  219 + 57009: 'ISCII Malayalam',
  220 + 57010: 'ISCII Gujarati',
  221 + 57011: 'ISCII Punjabi',
  222 + 65000: 'Unicode (UTF-7)',
  223 + 65001: 'Unicode (UTF-8)',
  224 +}
  225 +
  226 +
  227 +# Mapping from codepages to Python codecs, when 'cpXXX' does not work
  228 +# (inspired from http://stackoverflow.com/questions/1592925/decoding-mac-os-text-in-python)
  229 +CODEPAGE_TO_CODEC = {
  230 + 37: 'cp037',
  231 + 708: 'arabic', # not found: Arabic (ASMO 708) => arabic = iso-8859-6
  232 + 709: 'arabic', # not found: Arabic (ASMO-449+, BCON V4) => arabic = iso-8859-6
  233 + 710: 'arabic', # not found: Arabic - Transparent Arabic => arabic = iso-8859-6
  234 + 870: 'latin2', # IBM EBCDIC Multilingual/ROECE (Latin 2); IBM EBCDIC Multilingual Latin 2
  235 + 1047: 'latin1', # IBM EBCDIC Latin 1/Open System
  236 + 1141: 'cp273', # IBM EBCDIC Germany (20273 + Euro symbol); IBM EBCDIC (Germany-Euro)
  237 + 1200: 'utf_16_le', # Unicode UTF-16, little endian byte order (BMP of ISO 10646); available only to managed applications
  238 + 1201: 'utf_16_be', # Unicode UTF-16, big endian byte order; available only to managed applications
  239 +
  240 + 10000: 'mac-roman',
  241 + 10001: 'shiftjis', # not found: 'mac-shift-jis',
  242 + 10002: 'big5', # not found: 'mac-big5',
  243 + 10003: 'ascii', # nothing appropriate found: 'mac-hangul',
  244 + 10004: 'mac-arabic',
  245 + 10005: 'hebrew', # not found: 'mac-hebrew',
  246 + 10006: 'mac-greek',
  247 + #10007: 'ascii', # nothing appropriate found: 'mac-russian',
  248 + 10007: 'mac_cyrillic', # guess (from xlrd)
  249 + 10008: 'gb2312', # not found: 'mac-gb2312',
  250 + 10021: 'thai', # not found: mac-thai',
  251 + #10029: 'maccentraleurope', # not found: 'mac-east europe',
  252 + 10029: 'mac_latin2', # guess (from xlrd)
  253 + 10079: 'mac_iceland', # guess (from xlrd)
  254 + 10081: 'mac-turkish',
  255 +
  256 + 12000: 'utf_32_le', # Unicode UTF-32, little endian byte order
  257 + 12001: 'utf_32_be', # Unicode UTF-32, big endian byte order
  258 +
  259 + 20127: 'ascii',
  260 +
  261 + 28591: 'latin1',
  262 + 28592: 'iso8859_2',
  263 + 28593: 'iso8859_3',
  264 + 28594: 'iso8859_4',
  265 + 28595: 'iso8859_5',
  266 + 28596: 'iso8859_6',
  267 + 28597: 'iso8859_7',
  268 + 28598: 'iso8859_8',
  269 + 28599: 'iso8859_9',
  270 + 28603: 'iso8859_13',
  271 + 28605: 'iso8859_15',
  272 +
  273 + 32768: 'mac_roman', # from xlrd
  274 + 32769: 'cp1252', # from xlrd
  275 + 38598: 'iso8859_8',
  276 +
  277 + 65000: 'utf7',
  278 + 65001: 'utf8',
  279 +}
  280 +
  281 +
  282 +# === FUNCTIONS ==============================================================
  283 +
  284 +def codepage2codec(codepage):
  285 +    """
  286 +    convert a codepage number to a Python codec.
  287 +    If the corresponding codec cannot be found, returns "utf8" by default.
  288 +
  289 +    :param codepage: int, code page number
  290 +    :return: str, Python codec name
  291 +    """
  292 +    if codepage in CODEPAGE_TO_CODEC:
  293 +        codec = CODEPAGE_TO_CODEC[codepage]
  294 +    else:
  295 +        codec = 'cp%d' % codepage
  296 +    try:
  297 +        codecs.lookup(codec)
  298 +    except LookupError:
  299 +        #log.error('Codec not found for code page %d, using UTF-8 as fallback.' % codepage)
  300 +        codec = 'utf8'
  301 +    return codec
  302 +
  303 +
  304 +def get_codepage_name(codepage):
  305 +    """
  306 +    return the name of a codepage based on its number
  307 +    :param codepage: int, codepage number
  308 +    :return: str, codepage name
  309 +    """
  310 +    return CODEPAGE_NAME.get(codepage, 'Unknown code page')
  311 +
  312 +
  313 +# === MAIN: TESTS ============================================================
  314 +
  315 +if __name__ == '__main__':
  316 +    for cp in sorted(CODEPAGE_NAME.keys()):
  317 +        print('Code Page: %d => codec: %s - %s' % (cp, codepage2codec(cp), CODEPAGE_NAME[cp]))
0 \ No newline at end of file 318 \ No newline at end of file
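`codepage2codec` tries the explicit table first, then the generic `cpNNN` codec name, and falls back to UTF-8. The sketch below reproduces that fallback order with a small excerpt of CODEPAGE_TO_CODEC so it runs without oletools installed; `decode_bytes` is a hypothetical helper showing how the codec name might be used to decode raw bytes taken from a document:

```python
import codecs

# Excerpt of CODEPAGE_TO_CODEC from codepages.py (only a few entries, for illustration)
CODEPAGE_TO_CODEC = {
    1200: 'utf_16_le',
    10000: 'mac-roman',
    65001: 'utf8',
}


def codepage2codec(codepage):
    """Same fallback order as in the diff: explicit table, then 'cpNNN', then utf8."""
    if codepage in CODEPAGE_TO_CODEC:
        codec = CODEPAGE_TO_CODEC[codepage]
    else:
        codec = 'cp%d' % codepage
    try:
        codecs.lookup(codec)  # raises LookupError if Python has no such codec
    except LookupError:
        codec = 'utf8'
    return codec


def decode_bytes(data, codepage):
    """Hypothetical helper: decode raw bytes using the document's code page."""
    return data.decode(codepage2codec(codepage), errors='replace')
```

With `errors='replace'`, an unexpected byte sequence degrades to U+FFFD instead of aborting the analysis. That is a reasonable choice for malware triage, where the code page field itself may be wrong.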
oletools/common/errors.py
@@ -4,10 +4,42 @@ Errors used in several tools to avoid duplication @@ -4,10 +4,42 @@ Errors used in several tools to avoid duplication
4 .. codeauthor:: Intra2net AG <info@intra2net.com> 4 .. codeauthor:: Intra2net AG <info@intra2net.com>
5 """ 5 """
6 6
7 -class FileIsEncryptedError(ValueError): 7 +class CryptoErrorBase(ValueError):
  8 + """Base class for crypto-based exceptions."""
  9 + pass
  10 +
  11 +
  12 +class CryptoLibNotImported(CryptoErrorBase, ImportError):
  13 + """Exception thrown if msoffcrypto is needed but could not be imported."""
  14 +
  15 + def __init__(self):
  16 + super(CryptoLibNotImported, self).__init__(
  17 + 'msoffcrypto-tool is not installed. Please run "pip install msoffcrypto-tool" or see https://github.com/nolze/msoffcrypto-tool')
  18 +
  19 +
  20 +class UnsupportedEncryptionError(CryptoErrorBase):
8 """Exception thrown if file is encrypted and cannot deal with it.""" 21 """Exception thrown if file is encrypted and cannot deal with it."""
9 - # see also: same class in olevba[3] and record_base  
10 def __init__(self, filename=None): 22 def __init__(self, filename=None):
11 - super(FileIsEncryptedError, self).__init__( 23 + super(UnsupportedEncryptionError, self).__init__(
12 'Office file {}is encrypted, not yet supported' 24 'Office file {}is encrypted, not yet supported'
13 .format('' if filename is None else filename + ' ')) 25 .format('' if filename is None else filename + ' '))
  26 +
  27 +
  28 +class WrongEncryptionPassword(CryptoErrorBase):
  29 + """Exception thrown if encryption could be handled but the passwords were wrong."""
  30 + def __init__(self, filename=None):
  31 + super(WrongEncryptionPassword, self).__init__(
  32 + 'Given passwords could not decrypt office file{}, use option -p to specify the password'
  33 + .format('' if filename is None else ' ' + filename))
  34 +
  35 +
  36 +class MaxCryptoNestingReached(CryptoErrorBase):
  37 + """
  38 + Exception thrown if decryption is too deeply layered.
  39 +
  40 + (...or the decryption code creates an infinite loop)
  41 + """
  42 + def __init__(self, n_layers, filename=None):
  43 + super(MaxCryptoNestingReached, self).__init__(
  44 + 'Encountered more than {} layers of encryption for office file{}'
  45 + .format(n_layers, '' if filename is None else ' ' + filename))
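Because every crypto exception now derives from `CryptoErrorBase`, a caller can handle all encryption failures in one `except` clause, while `CryptoLibNotImported` remains catchable as an `ImportError`. A minimal sketch, reusing two of the classes from the diff (one message shortened); `decrypt_document` and `analyze` are hypothetical stand-ins, not oletools functions:

```python
class CryptoErrorBase(ValueError):
    """Base class for crypto-based exceptions (as in errors.py)."""


class CryptoLibNotImported(CryptoErrorBase, ImportError):
    """Also an ImportError, so generic import-failure handlers still catch it."""

    def __init__(self):
        super(CryptoLibNotImported, self).__init__(
            'msoffcrypto-tool is not installed')  # message shortened for the sketch


class WrongEncryptionPassword(CryptoErrorBase):
    def __init__(self, filename=None):
        super(WrongEncryptionPassword, self).__init__(
            'Given passwords could not decrypt office file{}, use option -p to specify the password'
            .format('' if filename is None else ' ' + filename))


def decrypt_document(filename, passwords):
    """Hypothetical decryption step: always fails here, to exercise the handler."""
    raise WrongEncryptionPassword(filename)


def analyze(filename, passwords=('VelvetSweatShop',)):
    try:
        decrypt_document(filename, passwords)
    except CryptoErrorBase as exc:
        # a single except clause covers every crypto-related failure
        return 'skipped: {}'.format(exc)
    return 'analyzed'
```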
oletools/common/log_helper/_json_formatter.py
@@ -13,8 +13,13 @@ class JsonFormatter(logging.Formatter): @@ -13,8 +13,13 @@ class JsonFormatter(logging.Formatter):
13 Since we don't buffer messages, we always prepend messages with a comma to make 13 Since we don't buffer messages, we always prepend messages with a comma to make
14 the output JSON-compatible. The only exception is when printing the first line, 14 the output JSON-compatible. The only exception is when printing the first line,
15 so we need to keep track of it. 15 so we need to keep track of it.
  16 +
  17 + We assume that all input comes from the OletoolsLoggerAdapter which
  18 + ensures that there is a `type` field in the record. Otherwise we would have
  19 + to add a try-except around the access to `record.type`.
16 """ 20 """
17 - json_dict = dict(msg=record.msg, level=record.levelname) 21 + json_dict = dict(msg=record.msg.replace('\n', ' '), level=record.levelname)
  22 + json_dict['type'] = record.type
18 formatted_message = ' ' + json.dumps(json_dict) 23 formatted_message = ' ' + json.dumps(json_dict)
19 24
20 if self._is_first_line: 25 if self._is_first_line:
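The formatter change flattens newlines and adds the record's `type` to each JSON object. The diff is truncated at the first-line check, so the sketch below assumes that only the very first entry is emitted without a leading comma, as the docstring describes. It also assumes a fallback `type` of `'msg'` when no adapter has set one; the real class instead relies on OletoolsLoggerAdapter. `SketchJsonFormatter` is a simplified, hypothetical class, not the oletools implementation:

```python
import json
import logging


class SketchJsonFormatter(logging.Formatter):
    """Simplified take on the JsonFormatter change: one JSON object per record."""

    def __init__(self):
        super().__init__()
        self._is_first_line = True

    def format(self, record):
        # newlines are flattened so each message stays on one output line
        json_dict = dict(msg=record.msg.replace('\n', ' '), level=record.levelname)
        # assumption: fall back to 'msg' when no adapter has set record.type
        json_dict['type'] = getattr(record, 'type', 'msg')
        formatted_message = ' ' + json.dumps(json_dict)
        if self._is_first_line:
            # assumption: only the very first entry is emitted without a comma
            self._is_first_line = False
            return formatted_message
        return ',' + formatted_message
```

Formatting two records in a row produces a leading-space first entry and comma-prefixed later entries, so the output can be wrapped in `[` and `]` to form a valid JSON list.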
oletools/common/log_helper/_logger_adapter.py
@@ -8,18 +8,45 @@ class OletoolsLoggerAdapter(logging.LoggerAdapter): @@ -8,18 +8,45 @@ class OletoolsLoggerAdapter(logging.LoggerAdapter):
8 """ 8 """
9 _json_enabled = None 9 _json_enabled = None
10 10
11 - def print_str(self, message): 11 + def print_str(self, message, **kwargs):
12 """ 12 """
13 This function replaces normal print() calls so we can format them as JSON 13 This function replaces normal print() calls so we can format them as JSON
14 when needed or just print them right away otherwise. 14 when needed or just print them right away otherwise.
15 """ 15 """
16 if self._json_enabled and self._json_enabled(): 16 if self._json_enabled and self._json_enabled():
17 # Messages from this function should always be printed, 17 # Messages from this function should always be printed,
18 - # so when using JSON we log using the same level that set  
19 - self.log(_root_logger_wrapper.level(), message) 18 + # so when using JSON we log using the same level that was set.
  19 + # Additional information in kwargs is added to LogRecord
  20 + self.log(_root_logger_wrapper.level(), message, extra=kwargs)
20 else: 21 else:
21 print(message) 22 print(message)
22 23
  24 + def log(self, lvl, msg, *args, **kwargs):
  25 + """
  26 + Run :py:meth:`process` on kwargs, then forward to actual logger.
  27 +
  28 + This is based on the logging cookbook, section "Using LoggerAdapter to
  29 + impart contextual information".
  30 + """
  31 + msg, kwargs = self.process(msg, kwargs)
  32 + self.logger.log(lvl, msg, *args, **kwargs)
  33 +
  34 + def process(self, msg, kwargs):
  35 + """
  36 + Ensure `kwargs['extra']['type']` exists, init with given arg `type`.
  37 +
  38 + The `type` field will be added to the :py:class:`logging.LogRecord` and
  39 + is used by the :py:class:`JsonFormatter`.
  40 + """
  41 + if 'extra' not in kwargs:
  42 + kwargs['extra'] = {}
  43 + if 'type' in kwargs:
  44 + kwargs['extra']['type'] = kwargs['type']
  45 + del kwargs['type'] # downstream loggers cannot deal with this
  46 + if 'type' not in kwargs['extra']:
  47 + kwargs['extra']['type'] = 'msg' # type will be added to LogRecord
  48 + return msg, kwargs
  49 +
23 def set_json_enabled_function(self, json_enabled): 50 def set_json_enabled_function(self, json_enabled):
24 """ 51 """
25 Set a function to be called to check whether JSON output is enabled. 52 Set a function to be called to check whether JSON output is enabled.
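The `process()` override above uses the `LoggerAdapter` technique from the logging cookbook. It moves a custom keyword into `extra`, so the keyword ends up as an attribute of the `LogRecord`, where a formatter can read it. Below is a minimal sketch of the same pattern. It assumes Python 3, where the stock `LoggerAdapter.log()` already calls `process()` after the level check, so this sketch does not override `log()`. `TypedAdapter` and `ListHandler` are hypothetical names. Unlike the code above, this version copies `extra` instead of modifying the caller's dictionary.

```python
import logging


class TypedAdapter(logging.LoggerAdapter):
    """Hypothetical adapter that turns a `type=` keyword into a record attribute."""

    def process(self, msg, kwargs):
        # copy so that the caller's `extra` dict is never modified
        extra = dict(kwargs.get('extra') or {})
        if 'type' in kwargs:
            # Logger.log() raises TypeError on unknown keywords, so move it
            extra['type'] = kwargs.pop('type')
        extra.setdefault('type', 'msg')
        kwargs['extra'] = extra
        return msg, kwargs


class ListHandler(logging.Handler):
    """Collects records in memory so their attributes can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)
```

For example, `adapter.warning('found %d macros', 2, type='olevba')` produces a record whose `type` attribute is `'olevba'`. Calls without `type=` get `'msg'`.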
oletools/crypto.py 0 → 100644
  1 +#!/usr/bin/env python
  2 +"""
  3 +crypto.py
  4 +
  5 +Module to be used by other scripts and modules in oletools, that provides
  6 +information on encryption in OLE files.
  7 +
  8 +Uses :py:mod:`msoffcrypto-tool` to decrypt if it is available. Otherwise
  9 +decryption will fail with an ImportError.
  10 +
  11 +Encryption/Write-Protection can be realized in many different ways. They range
  12 +from setting a single flag in an otherwise unprotected file to embedding a
  13 +regular file (e.g. xlsx) in an EncryptedStream inside an OLE file. That means
  14 +that (1) lots of bad things are accessible even if no encryption password
  15 +is known, and (2) even basic attributes like the file type can change by
  16 +decryption. Therefore I suggest the following general routine to deal with
  17 +potentially encrypted files::
  18 +
  19 + def script_main_function(input_file, passwords, crypto_nesting=0, args=None):
  20 + '''Wrapper around main function to deal with encrypted files.'''
  21 + initial_stuff(input_file, args)
  22 + result = None
  23 + try:
  24 + result = do_your_thing_assuming_no_encryption(input_file)
  25 + if not crypto.is_encrypted(input_file):
  26 + return result
  27 + except Exception:
  28 + if not crypto.is_encrypted(input_file):
  29 + raise
  30 + # we reach this point only if file is encrypted
  31 + # check if this is an encrypted file in an encrypted file in an ...
  32 + if crypto_nesting >= crypto.MAX_NESTING_DEPTH:
  33 + raise crypto.MaxCryptoNestingReached(crypto_nesting, input_file)
  34 + decrypted_file = None
  35 + try:
  36 + decrypted_file = crypto.decrypt(input_file, passwords)
  37 + if decrypted_file is None:
  38 + raise crypto.WrongEncryptionPassword(input_file)
  39 + # might still be encrypted, so call this again recursively
  40 + return script_main_function(decrypted_file, passwords,
  41 + crypto_nesting+1, args)
  42 + except Exception:
  43 + raise
  44 + finally: # clean up
  45 + try: # (maybe file was not yet created)
  46 + os.unlink(decrypted_file)
  47 + except Exception:
  48 + pass
  49 +
  50 +(Realized e.g. in :py:mod:`oletools.msodde`).
  51 +That means that caller code needs another wrapper around its main function. I
  52 +did try it another way first (transparent on-demand decryption) but for the
  53 +above reasons I believe this is the better way. Also, non-top-level code can
  54 +just assume that it works on unencrypted data and fail with an exception if
  55 +encrypted data makes its work impossible. No need to check `if is_encrypted()`
  56 +at the start of functions.
  57 +
  58 +.. seealso:: [MS-OFFCRYPTO]
  59 +.. seealso:: https://github.com/nolze/msoffcrypto-tool
  60 +
  61 +crypto is part of the python-oletools package:
  62 +http://www.decalage.info/python/oletools
  63 +"""
  64 +
  65 +# === LICENSE =================================================================
  66 +
  67 +# crypto is copyright (c) 2014-2019 Philippe Lagadec (http://www.decalage.info)
  68 +# All rights reserved.
  69 +#
  70 +# Redistribution and use in source and binary forms, with or without
  71 +# modification, are permitted provided that the following conditions are met:
  72 +#
  73 +# * Redistributions of source code must retain the above copyright notice,
  74 +# this list of conditions and the following disclaimer.
  75 +# * Redistributions in binary form must reproduce the above copyright notice,
  76 +# this list of conditions and the following disclaimer in the documentation
  77 +# and/or other materials provided with the distribution.
  78 +#
  79 +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
  80 +# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
  81 +# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
  82 +# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
  83 +# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
  84 +# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
  85 +# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
  86 +# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
  87 +# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
  88 +# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
  89 +# POSSIBILITY OF SUCH DAMAGE.
  90 +
  91 +# -----------------------------------------------------------------------------
  92 +# CHANGELOG:
  93 +# 2019-02-14 v0.01 CH: - first version with encryption check from oleid
  94 +# 2019-04-01 v0.54 PL: - fixed bug in is_encrypted_ole
  95 +# 2019-05-23 PL: - added DEFAULT_PASSWORDS list
  96 +
  97 +__version__ = '0.54.2'
  98 +
  99 +import sys
  100 +import struct
  101 +import os
  102 +from os.path import splitext, isfile
  103 +from tempfile import mkstemp
  104 +import zipfile
  105 +import logging
  106 +
  107 +from olefile import OleFileIO
  108 +
  109 +try:
  110 + import msoffcrypto
  111 +except ImportError:
  112 + msoffcrypto = None
  113 +
  114 +# IMPORTANT: it should be possible to run oletools directly as scripts
  115 +# in any directory without installing them with pip or setup.py.
  116 +# In that case, relative imports are NOT usable.
  117 +# And to enable Python 2+3 compatibility, we need to use absolute imports,
  118 +# so we add the oletools parent folder to sys.path (absolute+normalized path):
  119 +_thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__)))
  120 +_parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
  121 +if _parent_dir not in sys.path:
  122 + sys.path.insert(0, _parent_dir)
  123 +
  124 +from oletools.common.errors import CryptoErrorBase, WrongEncryptionPassword, \
  125 + UnsupportedEncryptionError, MaxCryptoNestingReached, CryptoLibNotImported
  126 +from oletools.common.log_helper import log_helper
  127 +
  128 +
  129 +#: if there is an encrypted file embedded in an encrypted file,
  130 +#: how deep down do we go
  131 +MAX_NESTING_DEPTH = 10
  132 +
  133 +# === LOGGING =================================================================
  134 +
  135 +# TODO: use log_helper instead
  136 +
  137 +def get_logger(name, level=logging.CRITICAL+1):
  138 + """
  139 + Create a suitable logger object for this module.
  140 + The goal is not to change settings of the root logger, to avoid getting
  141 + other modules' logs on the screen.
  142 + If a logger exists with same name, reuse it. (Else it would have duplicate
  143 + handlers and messages would be doubled.)
  144 + The level is set to CRITICAL+1 by default, to avoid any logging.
  145 + """
  146 + # First, test if there is already a logger with the same name, else it
  147 + # will generate duplicate messages (due to duplicate handlers):
  148 + if name in logging.Logger.manager.loggerDict:
  149 + # NOTE: another less intrusive but more "hackish" solution would be to
  150 + # use getLogger then test if its effective level is not default.
  151 + logger = logging.getLogger(name)
  152 + # make sure level is OK:
  153 + logger.setLevel(level)
  154 + return logger
  155 + # get a new logger:
  156 + logger = logging.getLogger(name)
  157 + # only add a NullHandler for this logger, it is up to the application
  158 + # to configure its own logging:
  159 + logger.addHandler(logging.NullHandler())
  160 + logger.setLevel(level)
  161 + return logger
  162 +
  163 +# a global logger object used for debugging:
  164 +log = get_logger('crypto')
  165 +
  166 +def enable_logging():
  167 + """
  168 + Enable logging for this module (disabled by default).
  169 + This will set the module-specific logger level to NOTSET, which
  170 + means the main application controls the actual logging level.
  171 + """
  172 + log.setLevel(logging.NOTSET)
  173 +
  174 +
  175 +def is_encrypted(some_file):
  176 + """
  177 + Determine whether document contains encrypted content.
  178 +
  179 + This should return False for documents that are just write-protected or
  180 + signed or finalized. It should return True if ANY content of the file is
  181 + encrypted and can therefore not be analyzed by other oletools modules
  182 + without being given a password.
  183 +
  184 + Exception: there are ways to write-protect an Office document by embedding
  185 + it as an encrypted stream with a hard-coded standard password into an otherwise
  186 + empty OLE file. From an office user point of view, this is no encryption,
  187 + but regarding file structure this is encryption, so we return `True` for
  188 + these.
  189 +
  190 + This should not raise exceptions needlessly.
  191 +
  192 + This implementation is rather simple: it returns True if the file contains
  193 + streams with typical encryption names (cf. [MS-OFFCRYPTO]). It does not
  194 + test whether these streams actually contain data or whether the ole file
  195 + structure contains the necessary references to these. It also checks the
  196 + "well-known property" PIDSI_DOC_SECURITY if the SummaryInformation stream
  197 + is accessible (cf. [MS-OLEPS] 2.25.1).
  198 +
  199 + :param some_file: File name or an opened OleFileIO
  200 + :type some_file: :py:class:`olefile.OleFileIO` or `str`
  201 + :returns: True if (and only if) the file contains encrypted content
  202 + """
  203 + log.debug('is_encrypted')
  204 +
  205 + # ask msoffcrypto if possible
  206 + if check_msoffcrypto():
  207 + log.debug('Checking for encryption using msoffcrypto')
  208 + file_handle = None
  209 + file_pos = None
  210 + try:
  211 + if isinstance(some_file, OleFileIO):
  212 + # TODO: hacky, replace once msoffcrypto-tool accepts OleFileIO
  213 + file_handle = some_file.fp
  214 + file_pos = file_handle.tell()
  215 + file_handle.seek(0)
  216 + else:
  217 + file_handle = open(some_file, 'rb')
  218 +
  219 + return msoffcrypto.OfficeFile(file_handle).is_encrypted()
  220 +
  221 + except Exception as exc:
  222 + log.warning('msoffcrypto failed to interpret file {} or determine '
  223 + 'whether it is encrypted: {}'
  224 + .format(file_handle.name, exc))
  225 +
  226 + finally:
  227 + try:
  228 + if file_pos is not None: # input was OleFileIO
  229 + file_handle.seek(file_pos)
  230 + else: # input was file name
  231 + file_handle.close()
  232 + except Exception as exc:
  233 + log.warning('Ignoring error during clean up: {}'.format(exc))
  234 +
  235 + # if that failed, try ourselves with older and less accurate code
  236 + try:
  237 + if isinstance(some_file, OleFileIO):
  238 + return _is_encrypted_ole(some_file)
  239 + if zipfile.is_zipfile(some_file):
  240 + return _is_encrypted_zip(some_file)
  241 + # otherwise assume it is the name of an ole file
  242 + with OleFileIO(some_file) as ole:
  243 + return _is_encrypted_ole(ole)
  244 + except Exception as exc:
  245 + log.warning('Failed to check {} for encryption ({}); assume it is not '
  246 + 'encrypted.'.format(some_file, exc))
  247 +
  248 + return False
  249 +
  250 +
  251 +def _is_encrypted_zip(filename):
  252 + """Specialization of :py:func:`is_encrypted` for zip-based files."""
  253 + log.debug('Checking for encryption in zip file')
  254 + # TODO: distinguish OpenXML from normal zip files
  255 + # try to decrypt a few bytes from first entry
  256 + with zipfile.ZipFile(filename, 'r') as zipper:
  257 + first_entry = zipper.infolist()[0]
  258 + try:
  259 + with zipper.open(first_entry, 'r') as reader:
  260 + reader.read(min(16, first_entry.file_size))
  261 + return False
  262 + except RuntimeError as rt_err:
  263 + return 'crypt' in str(rt_err)
  264 +
  265 +
  266 +def _is_encrypted_ole(ole):
  267 + """Specialization of :py:func:`is_encrypted` for ole files."""
  268 + log.debug('Checking for encryption in OLE file')
  269 + # check well known property for password protection
  270 + # (this field may be missing for PowerPoint 2000, for example)
  271 + # TODO: check whether password protection always implies encryption. Could
  272 + # write-protection or signing with password trigger this as well?
  273 + if ole.exists("\x05SummaryInformation"):
  274 + suminfo_data = ole.getproperties("\x05SummaryInformation")
  275 + if 0x13 in suminfo_data and (suminfo_data[0x13] & 1):
  276 + return True
  277 +
  278 + # check a few stream names
  279 + # TODO: check whether these actually contain data and whether other
  280 + # necessary properties exist / are set
  281 + if ole.exists('EncryptionInfo'):
  282 + log.debug('found stream EncryptionInfo')
  283 + return True
  284 + # or an encrypted ppt file
  285 + if ole.exists('EncryptedSummary') and \
  286 + not ole.exists('SummaryInformation'):
  287 + return True
  288 +
  289 + # Word-specific old encryption:
  290 + if ole.exists('WordDocument'):
  291 + # check for Word-specific encryption flag:
  292 + stream = None
  293 + try:
  294 + stream = ole.openstream(["WordDocument"])
  295 + # skip the first 10 header bytes
  296 + stream.read(10)
  297 + # read flag structure:
  298 + temp16 = struct.unpack("H", stream.read(2))[0]
  299 + f_encrypted = (temp16 & 0x0100) >> 8
  300 + if f_encrypted:
  301 + return True
  302 + finally:
  303 + if stream is not None:
  304 + stream.close()
  305 +
  306 + # no indication of encryption
  307 + return False
  308 +
  309 +
  310 +#: one way to achieve "write protection" in office files is to encrypt the file
  311 +#: using this password
  312 +WRITE_PROTECT_ENCRYPTION_PASSWORD = 'VelvetSweatshop'
  313 +
  314 +#: list of common passwords to be tried by default, used by malware
  315 +DEFAULT_PASSWORDS = [WRITE_PROTECT_ENCRYPTION_PASSWORD, '123', '1234', '12345', '123456', '4321']
  316 +
  317 +
  318 +def _check_msoffcrypto():
  319 + """Raise a :py:class:`CryptoLibNotImported` if msoffcrypto not imported."""
  320 + if msoffcrypto is None:
  321 + raise CryptoLibNotImported()
  322 +
  323 +
  324 +def check_msoffcrypto():
  325 + """Return `True` iff :py:mod:`msoffcrypto` could be imported."""
  326 + return msoffcrypto is not None
  327 +
  328 +
  329 +def decrypt(filename, passwords=None, **temp_file_args):
  330 + """
  331 + Try to decrypt an encrypted file
  332 +
  333 + This function tries to decrypt the given file using a given set of
  334 + passwords. If no password is given, tries `DEFAULT_PASSWORDS` (starting
  335 + with the write-protection password). Creates a file with decrypted data
  336 + whose file name is returned. If the decryption fails, None is returned.
  337 +
  338 + :param str filename: path to an ole file on disc
  339 + :param passwords: list/set/tuple/... of passwords or a single password or
  340 + None
  341 + :type passwords: iterable or str or None
  342 + :param temp_file_args: arguments for :py:func:`tempfile.mkstemp` e.g.,
  343 + `dir` or `prefix`. `suffix` will default to
  344 + suffix of input `filename`, `prefix` defaults to
  345 + `oletools-decrypt-`; `text` will be ignored
  346 + :returns: name of the decrypted temporary file (type str) or `None`
  347 + :raises: :py:class:`ImportError` if :py:mod:`msoffcrypto-tool` is not found
  348 + :raises: :py:class:`ValueError` if the given file is not encrypted
  349 + """
  350 + _check_msoffcrypto()
  351 +
  352 + # normalize passwords so we always have a list/tuple
  353 + if isinstance(passwords, str):
  354 + passwords = (passwords, )
  355 + elif not passwords:
  356 + passwords = DEFAULT_PASSWORDS
  357 +
  358 + # check temp file args
  359 + if 'prefix' not in temp_file_args:
  360 + temp_file_args['prefix'] = 'oletools-decrypt-'
  361 + if 'suffix' not in temp_file_args:
  362 + temp_file_args['suffix'] = splitext(filename)[1]
  363 + temp_file_args['text'] = False
  364 +
  365 + decrypt_file = None
  366 + with open(filename, 'rb') as reader:
  367 + try:
  368 + crypto_file = msoffcrypto.OfficeFile(reader)
  369 + except Exception as exc: # e.g. ppt, not yet supported by msoffcrypto
  370 + if 'Unrecognized file format' in str(exc):
  371 + log.debug('Caught exception', exc_info=True)
  372 +
  373 + # raise different exception without stack trace of original exc
  374 + if sys.version_info.major == 2:
  375 + raise UnsupportedEncryptionError(filename)
  376 + else:
  377 + # this is a syntax error in python 2, so wrap it in exec()
  378 + exec('raise UnsupportedEncryptionError(filename) from None')
  379 + else:
  380 + raise
  381 + if not crypto_file.is_encrypted():
  382 + raise ValueError('Given input file {} is not encrypted!'
  383 + .format(filename))
  384 +
  385 + for password in passwords:
  386 + log.debug('Trying to decrypt with password {!r}'.format(password))
  387 + write_descriptor = None
  388 + write_handle = None
  389 + decrypt_file = None
  390 + try:
  391 + crypto_file.load_key(password=password)
  392 +
  393 + # create temp file
  394 + write_descriptor, decrypt_file = mkstemp(**temp_file_args)
  395 + write_handle = os.fdopen(write_descriptor, 'wb')
  396 + write_descriptor = None # is now handled via write_handle
  397 + crypto_file.decrypt(write_handle)
  398 +
  399 + # decryption was successful; clean up and return
  400 + write_handle.close()
  401 + write_handle = None
  402 + break
  403 + except Exception:
  404 + log.debug('Failed to decrypt', exc_info=True)
  405 +
  406 + # error-clean up: close everything and del temp file
  407 + if write_handle:
  408 + write_handle.close()
  409 + elif write_descriptor:
  410 + os.close(write_descriptor)
  411 + if decrypt_file and isfile(decrypt_file):
  412 + os.unlink(decrypt_file)
  413 + decrypt_file = None
  414 + # if we reach this, all passwords were tried without success
  415 + log.debug('All passwords failed')
  416 + return decrypt_file
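`_is_encrypted_ole` reads the 16-bit flag word at offset 10 of the `WordDocument` stream and tests it against the mask `0x0100`, which is the `fEncrypted` bit of the FibBase structure described in [MS-DOC]. The format string `"H"` uses the host's native byte order, while Word binary structures are little-endian, so on a big-endian host the flag could be misread. Below is a minimal sketch of the same test with an explicit `'<H'`, working on raw bytes. The helper name `word_fib_is_encrypted` is hypothetical and not part of oletools.

```python
import struct

# In FibBase, five 16-bit fields (wIdent, nFib, unused, lid, pnNext) precede
# the flag word, so the flags start at byte offset 10.
FIB_FLAGS_OFFSET = 10
# Same mask as in _is_encrypted_ole: the fEncrypted bit.
F_ENCRYPTED = 0x0100


def word_fib_is_encrypted(fib_bytes):
    """Return True if fEncrypted is set in the first bytes of a WordDocument
    stream (hypothetical helper, not part of oletools)."""
    if len(fib_bytes) < FIB_FLAGS_OFFSET + 2:
        raise ValueError('need at least 12 bytes of the WordDocument stream')
    # '<H' = little-endian unsigned 16-bit, independent of the host platform
    (flags,) = struct.unpack_from('<H', fib_bytes, FIB_FLAGS_OFFSET)
    return bool(flags & F_ENCRYPTED)
```

With olefile, the input could come from `ole.openstream('WordDocument').read(12)`.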
oletools/doc/Home.html
@@ -16,7 +16,7 @@ @@ -16,7 +16,7 @@
16 <![endif]--> 16 <![endif]-->
17 </head> 17 </head>
18 <body> 18 <body>
19 -<h1 id="python-oletools-v0.53-documentation">python-oletools v0.53 documentation</h1> 19 +<h1 id="python-oletools-v0.54-documentation">python-oletools v0.54 documentation</h1>
20 <p>This is the home page of the documentation for python-oletools. The latest version can be found <a href="https://github.com/decalage2/oletools/wiki">online</a>, otherwise a copy is provided in the doc subfolder of the package.</p> 20 <p>This is the home page of the documentation for python-oletools. The latest version can be found <a href="https://github.com/decalage2/oletools/wiki">online</a>, otherwise a copy is provided in the doc subfolder of the package.</p>
21 <p><a href="http://www.decalage.info/python/oletools">python-oletools</a> is a package of python tools to analyze <a href="http://en.wikipedia.org/wiki/Compound_File_Binary_Format">Microsoft OLE2 files</a> (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office documents or Outlook messages, mainly for malware analysis, forensics and debugging. It is based on the <a href="http://www.decalage.info/olefile">olefile</a> parser. See <a href="http://www.decalage.info/python/oletools" class="uri">http://www.decalage.info/python/oletools</a> for more info.</p> 21 <p><a href="http://www.decalage.info/python/oletools">python-oletools</a> is a package of python tools to analyze <a href="http://en.wikipedia.org/wiki/Compound_File_Binary_Format">Microsoft OLE2 files</a> (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office documents or Outlook messages, mainly for malware analysis, forensics and debugging. It is based on the <a href="http://www.decalage.info/olefile">olefile</a> parser. See <a href="http://www.decalage.info/python/oletools" class="uri">http://www.decalage.info/python/oletools</a> for more info.</p>
22 <p><strong>Quick links:</strong> <a href="http://www.decalage.info/python/oletools">Home page</a> - <a href="https://github.com/decalage2/oletools/wiki/Install">Download/Install</a> - <a href="https://github.com/decalage2/oletools/wiki">Documentation</a> - <a href="https://github.com/decalage2/oletools/issues">Report Issues/Suggestions/Questions</a> - <a href="http://decalage.info/contact">Contact the Author</a> - <a href="https://github.com/decalage2/oletools">Repository</a> - <a href="https://twitter.com/decalage2">Updates on Twitter</a></p> 22 <p><strong>Quick links:</strong> <a href="http://www.decalage.info/python/oletools">Home page</a> - <a href="https://github.com/decalage2/oletools/wiki/Install">Download/Install</a> - <a href="https://github.com/decalage2/oletools/wiki">Documentation</a> - <a href="https://github.com/decalage2/oletools/issues">Report Issues/Suggestions/Questions</a> - <a href="http://decalage.info/contact">Contact the Author</a> - <a href="https://github.com/decalage2/oletools">Repository</a> - <a href="https://twitter.com/decalage2">Updates on Twitter</a></p>
oletools/doc/Home.md
1 -python-oletools v0.53 documentation 1 +python-oletools v0.54 documentation
2 =================================== 2 ===================================
3 3
4 This is the home page of the documentation for python-oletools. The latest version can be found 4 This is the home page of the documentation for python-oletools. The latest version can be found
oletools/doc/Install.html
@@ -16,28 +16,35 @@ @@ -16,28 +16,35 @@
16 <![endif]--> 16 <![endif]-->
17 </head> 17 </head>
18 <body> 18 <body>
19 -<h1 id="how-to-download-and-install-python-oletools">How to Download and Install python-oletools</h1> 19 +<h1 id="how-to-download-and-install-oletools">How to Download and Install oletools</h1>
20 <h2 id="pre-requisites">Pre-requisites</h2> 20 <h2 id="pre-requisites">Pre-requisites</h2>
21 -<p>The recommended Python version to run oletools is <strong>Python 2.7</strong>. Python 2.6 is also supported, but as it is not tested as often as 2.7, some features might not work as expected.</p>  
22 -<p>Since oletools v0.50, thanks to contributions by <span class="citation" data-cites="Sebdraven">[@Sebdraven]</span>(https://twitter.com/Sebdraven), most tools can also run with <strong>Python 3.x</strong>. As this is quite new, please <a href="(https://github.com/decalage2/oletools/issues)">report any issue</a> you may encounter.</p> 21 +<p>The recommended Python version to run oletools is the latest <strong>Python 3.x</strong> (3.7 for now). Python 2.7 is still supported, but as it will reach end of life in 2020 (see https://pythonclock.org/), it is highly recommended to switch to Python 3 now.</p>
23 <h2 id="recommended-way-to-downloadinstallupdate-oletools-pip">Recommended way to Download+Install/Update oletools: pip</h2> 22 <h2 id="recommended-way-to-downloadinstallupdate-oletools-pip">Recommended way to Download+Install/Update oletools: pip</h2>
24 <p>Pip is included with Python since version 2.7.9 and 3.4. If it is not installed on your system, either upgrade Python or see https://pip.pypa.io/en/stable/installing/</p> 23 <p>Pip is included with Python since version 2.7.9 and 3.4. If it is not installed on your system, either upgrade Python or see https://pip.pypa.io/en/stable/installing/</p>
25 <h3 id="linux-mac-osx-unix">Linux, Mac OSX, Unix</h3> 24 <h3 id="linux-mac-osx-unix">Linux, Mac OSX, Unix</h3>
26 <p>To download and install/update the latest release version of oletools, run the following command in a shell:</p> 25 <p>To download and install/update the latest release version of oletools, run the following command in a shell:</p>
27 <pre class="text"><code>sudo -H pip install -U oletools</code></pre> 26 <pre class="text"><code>sudo -H pip install -U oletools</code></pre>
  27 +<p>Replace <code>pip</code> with <code>pip3</code> or <code>pip2</code> to install on a specific Python version.</p>
28 <p><strong>Important</strong>: Since version 0.50, pip will automatically create convenient command-line scripts in /usr/local/bin to run all the oletools from any directory.</p> 28 <p><strong>Important</strong>: Since version 0.50, pip will automatically create convenient command-line scripts in /usr/local/bin to run all the oletools from any directory.</p>
29 <h3 id="windows">Windows</h3> 29 <h3 id="windows">Windows</h3>
30 <p>To download and install/update the latest release version of oletools, run the following command in a cmd window:</p> 30 <p>To download and install/update the latest release version of oletools, run the following command in a cmd window:</p>
31 <pre class="text"><code>pip install -U oletools</code></pre> 31 <pre class="text"><code>pip install -U oletools</code></pre>
  32 +<p>Replace <code>pip</code> with <code>pip3</code> or <code>pip2</code> to install on a specific Python version.</p>
  33 +<p><strong>Note</strong>: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip and install for all users. If that is not possible, you may also install only for the current user by adding the <code>--user</code> option:</p>
  34 +<pre class="text"><code>pip3 install -U --user oletools</code></pre>
32 <p><strong>Important</strong>: Since version 0.50, pip will automatically create convenient command-line scripts to run all the oletools from any directory: olevba, mraptor, oleid, rtfobj, etc.</p> 35 <p><strong>Important</strong>: Since version 0.50, pip will automatically create convenient command-line scripts to run all the oletools from any directory: olevba, mraptor, oleid, rtfobj, etc.</p>
33 <h2 id="how-to-install-the-latest-development-version">How to install the latest development version</h2> 36 <h2 id="how-to-install-the-latest-development-version">How to install the latest development version</h2>
34 <p>If you want to benefit from the latest improvements in the development version, you may also use pip:</p> 37 <p>If you want to benefit from the latest improvements in the development version, you may also use pip:</p>
35 <h3 id="linux-mac-osx-unix-1">Linux, Mac OSX, Unix</h3> 38 <h3 id="linux-mac-osx-unix-1">Linux, Mac OSX, Unix</h3>
36 <pre class="text"><code>sudo -H pip install -U https://github.com/decalage2/oletools/archive/master.zip</code></pre> 39 <pre class="text"><code>sudo -H pip install -U https://github.com/decalage2/oletools/archive/master.zip</code></pre>
  40 +<p>Replace <code>pip</code> with <code>pip3</code> or <code>pip2</code> to install on a specific Python version.</p>
37 <h3 id="windows-1">Windows</h3> 41 <h3 id="windows-1">Windows</h3>
38 <pre class="text"><code>pip install -U https://github.com/decalage2/oletools/archive/master.zip</code></pre> 42 <pre class="text"><code>pip install -U https://github.com/decalage2/oletools/archive/master.zip</code></pre>
  43 +<p>Replace <code>pip</code> with <code>pip3</code> or <code>pip2</code> to install on a specific Python version.</p>
  44 +<p><strong>Note</strong>: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip and install for all users. If that is not possible, you may also install only for the current user by adding the <code>--user</code> option:</p>
  45 +<pre class="text"><code>pip3 install -U --user https://github.com/decalage2/oletools/archive/master.zip</code></pre>
39 <h2 id="how-to-install-offline---computer-without-internet-access">How to install offline - Computer without Internet access</h2> 46 <h2 id="how-to-install-offline---computer-without-internet-access">How to install offline - Computer without Internet access</h2>
40 -<p>First, download the oletools archive on a computer with Internet access: * Latest stable version: from https://github.com/decalage2/oletools/releases * Development version: https://github.com/decalage2/oletools/archive/master.zip</p> 47 +<p>First, download the oletools archive on a computer with Internet access:</p><ul><li>Latest stable version: from https://pypi.org/project/oletools/ or https://github.com/decalage2/oletools/releases</li><li>Development version: https://github.com/decalage2/oletools/archive/master.zip</li></ul>
41 <p>Copy the archive file to the target computer.</p> 48 <p>Copy the archive file to the target computer.</p>
42 <p>On Linux, Mac OSX, Unix, run the following command using the filename of the archive that you downloaded:</p> 49 <p>On Linux, Mac OSX, Unix, run the following command using the filename of the archive that you downloaded:</p>
43 <pre class="text"><code>sudo -H pip install -U oletools.zip</code></pre> 50 <pre class="text"><code>sudo -H pip install -U oletools.zip</code></pre>
oletools/doc/Install.md
1 -How to Download and Install python-oletools  
2 -=========================================== 1 +How to Download and Install oletools
  2 +====================================
3 3
4 Pre-requisites 4 Pre-requisites
5 -------------- 5 --------------
6 6
7 -The recommended Python version to run oletools is **Python 2.7**.  
8 -Python 2.6 is also supported, but as it is not tested as often as 2.7, some features  
9 -might not work as expected.  
10 -  
11 -Since oletools v0.50, thanks to contributions by [@Sebdraven](https://twitter.com/Sebdraven),  
12 -most tools can also run with **Python 3.x**. As this is quite new, please  
13 -[report any issue]((https://github.com/decalage2/oletools/issues)) you may encounter.  
14 -  
15 - 7 +The recommended Python version to run oletools is the latest **Python 3.x** (3.7 for now).
  8 +Python 2.7 is still supported, but as it will reach end of life in 2020 (see https://pythonclock.org/), it is highly
  9 +recommended to switch to Python 3 now.
16 10
17 Recommended way to Download+Install/Update oletools: pip 11 Recommended way to Download+Install/Update oletools: pip
18 -------------------------------------------------------- 12 --------------------------------------------------------
@@ -29,6 +23,8 @@ run the following command in a shell: @@ -29,6 +23,8 @@ run the following command in a shell:
29 sudo -H pip install -U oletools 23 sudo -H pip install -U oletools
30 ``` 24 ```
31 25
  26 +Replace `pip` with `pip3` or `pip2` to install on a specific Python version.
  27 +
32 **Important**: Since version 0.50, pip will automatically create convenient command-line scripts 28 **Important**: Since version 0.50, pip will automatically create convenient command-line scripts
33 in /usr/local/bin to run all the oletools from any directory. 29 in /usr/local/bin to run all the oletools from any directory.
34 30
@@ -41,6 +37,16 @@ run the following command in a cmd window: @@ -41,6 +37,16 @@ run the following command in a cmd window:
41 pip install -U oletools 37 pip install -U oletools
42 ``` 38 ```
43 39
  40 +Replace `pip` with `pip3` or `pip2` to install on a specific Python version.
  41 +
  42 +**Note**: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip
  43 +and install for all users. If that is not possible, you may also install only for the current user
  44 +by adding the `--user` option:
  45 +
  46 +```text
  47 +pip3 install -U --user oletools
  48 +```
  49 +
44 **Important**: Since version 0.50, pip will automatically create convenient command-line scripts 50 **Important**: Since version 0.50, pip will automatically create convenient command-line scripts
45 to run all the oletools from any directory: olevba, mraptor, oleid, rtfobj, etc. 51 to run all the oletools from any directory: olevba, mraptor, oleid, rtfobj, etc.
46 52
@@ -57,17 +63,29 @@ you may also use pip:
57 sudo -H pip install -U https://github.com/decalage2/oletools/archive/master.zip 63 sudo -H pip install -U https://github.com/decalage2/oletools/archive/master.zip
58 ``` 64 ```
59 65
  66 +Replace `pip` by `pip3` or `pip2` to install on a specific Python version.
  67 +
60 ### Windows 68 ### Windows
61 69
62 ```text 70 ```text
63 pip install -U https://github.com/decalage2/oletools/archive/master.zip 71 pip install -U https://github.com/decalage2/oletools/archive/master.zip
64 ``` 72 ```
65 73
  74 +Replace `pip` by `pip3` or `pip2` to install on a specific Python version.
  75 +
  76 +**Note**: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip
  77 +and install for all users. If that is not possible, you may also install only for the current user
  78 +by adding the `--user` option:
  79 +
  80 +```text
  81 +pip3 install -U --user https://github.com/decalage2/oletools/archive/master.zip
  82 +```
  83 +
66 How to install offline - Computer without Internet access 84 How to install offline - Computer without Internet access
67 --------------------------------------------------------- 85 ---------------------------------------------------------
68 86
69 First, download the oletools archive on a computer with Internet access: 87 First, download the oletools archive on a computer with Internet access:
70 -* Latest stable version: from https://github.com/decalage2/oletools/releases 88 +* Latest stable version: from https://pypi.org/project/oletools/ or https://github.com/decalage2/oletools/releases
71 * Development version: https://github.com/decalage2/oletools/archive/master.zip 89 * Development version: https://github.com/decalage2/oletools/archive/master.zip
72 90
73 Copy the archive file to the target computer. 91 Copy the archive file to the target computer.
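The INSTALL changes above tell readers to swap `pip` for `pip3` or `pip2` to target a specific Python version, and to add `--user` when an Administrator cmd window is not available. A related approach, assumed here rather than taken from this commit, is to run pip as a module of a named interpreter (`python3 -m pip ...`), so there is no doubt about which Python receives the package. The minimal sketch below only assembles that command line; `pip_install_command` is a hypothetical helper, not part of oletools.

```python
import sys
from typing import List, Optional


def pip_install_command(target: str = "oletools",
                        python: Optional[str] = None,
                        user: bool = False) -> List[str]:
    """Build a `<python> -m pip install -U [--user] <target>` command line.

    `python` is the interpreter to install into; it defaults to the one
    running this code. `target` may be a PyPI name or an archive URL such as
    https://github.com/decalage2/oletools/archive/master.zip.
    """
    cmd = [python or sys.executable, "-m", "pip", "install", "-U"]
    if user:
        # Per-user install, for when admin/root rights are not available.
        cmd.append("--user")
    cmd.append(target)
    return cmd
```

Passing the returned list to `subprocess.run(..., check=True)` would then perform the install; the same helper accepts the GitHub `master.zip` URL shown above as `target`.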
oletools/doc/License.html
@@ -18,7 +18,7 @@
18 <body> 18 <body>
19 <h1 id="license-for-python-oletools">License for python-oletools</h1> 19 <h1 id="license-for-python-oletools">License for python-oletools</h1>
20 <p>This license applies to the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package, apart from the thirdparty folder which contains third-party files published with their own license.</p> 20 <p>This license applies to the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package, apart from the thirdparty folder which contains third-party files published with their own license.</p>
21 -<p>The python-oletools package is copyright (c) 2012-2018 Philippe Lagadec (<a href="http://www.decalage.info" class="uri">http://www.decalage.info</a>)</p> 21 +<p>The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (<a href="http://www.decalage.info" class="uri">http://www.decalage.info</a>)</p>
22 <p>All rights reserved.</p> 22 <p>All rights reserved.</p>
23 <p>Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:</p> 23 <p>Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:</p>
24 <ul> 24 <ul>
oletools/doc/License.md
@@ -4,7 +4,7 @@ License for python-oletools
4 This license applies to the [python-oletools](http://www.decalage.info/python/oletools) package, apart from the 4 This license applies to the [python-oletools](http://www.decalage.info/python/oletools) package, apart from the
5 thirdparty folder which contains third-party files published with their own license. 5 thirdparty folder which contains third-party files published with their own license.
6 6
7 -The python-oletools package is copyright (c) 2012-2018 Philippe Lagadec ([http://www.decalage.info](http://www.decalage.info)) 7 +The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec ([http://www.decalage.info](http://www.decalage.info))
8 8
9 All rights reserved. 9 All rights reserved.
10 10
oletools/doc/mraptor.html
@@ -24,7 +24,7 @@
24 <p>mraptor can be used either as a command-line tool, or as a python module from your own applications.</p> 24 <p>mraptor can be used either as a command-line tool, or as a python module from your own applications.</p>
25 <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p> 25 <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p>
26 <h2 id="usage">Usage</h2> 26 <h2 id="usage">Usage</h2>
27 -<pre class="text"><code>Usage: mraptor.py [options] &lt;filename&gt; [filename2 ...] 27 +<pre class="text"><code>Usage: mraptor [options] &lt;filename&gt; [filename2 ...]
28 28
29 Options: 29 Options:
30 -h, --help show this help message and exit 30 -h, --help show this help message and exit
@@ -49,15 +49,15 @@ An exit code is returned based on the analysis result:
49 - 20: SUSPICIOUS</code></pre> 49 - 20: SUSPICIOUS</code></pre>
50 <h3 id="examples">Examples</h3> 50 <h3 id="examples">Examples</h3>
51 <p>Scan a single file:</p> 51 <p>Scan a single file:</p>
52 -<pre class="text"><code>mraptor.py file.doc</code></pre> 52 +<pre class="text"><code>mraptor file.doc</code></pre>
53 <p>Scan a single file, stored in a Zip archive with password “infected”:</p> 53 <p>Scan a single file, stored in a Zip archive with password “infected”:</p>
54 -<pre class="text"><code>mraptor.py malicious_file.xls.zip -z infected</code></pre> 54 +<pre class="text"><code>mraptor malicious_file.xls.zip -z infected</code></pre>
55 <p>Scan a collection of files stored in a folder:</p> 55 <p>Scan a collection of files stored in a folder:</p>
56 -<pre class="text"><code>mraptor.py &quot;MalwareZoo/VBA/*&quot;</code></pre> 56 +<pre class="text"><code>mraptor &quot;MalwareZoo/VBA/*&quot;</code></pre>
57 <p><strong>Important</strong>: on Linux/MacOSX, always add double quotes around a file name when you use wildcards such as <code>*</code> and <code>?</code>. Otherwise, the shell may replace the argument with the actual list of files matching the wildcards before starting the script.</p> 57 <p><strong>Important</strong>: on Linux/MacOSX, always add double quotes around a file name when you use wildcards such as <code>*</code> and <code>?</code>. Otherwise, the shell may replace the argument with the actual list of files matching the wildcards before starting the script.</p>
58 <p><img src="mraptor1.png" /></p> 58 <p><img src="mraptor1.png" /></p>
59 <h2 id="python-3-support---mraptor3">Python 3 support - mraptor3</h2> 59 <h2 id="python-3-support---mraptor3">Python 3 support - mraptor3</h2>
60 -<p>As of v0.50, mraptor has been ported to Python 3 thanks to <span class="citation" data-cites="sebdraven">@sebdraven</span>. However, the differences between Python 2 and 3 are significant and for now there is a separate version of mraptor named mraptor3 to be used with Python 3.</p> 60 +<p>Since v0.54, mraptor is fully compatible with both Python 2 and 3. There is no need to use mraptor3 anymore, however it is still present for backward compatibility.</p>
61 <hr /> 61 <hr />
62 <h2 id="how-to-use-mraptor-in-python-applications">How to use mraptor in Python applications</h2> 62 <h2 id="how-to-use-mraptor-in-python-applications">How to use mraptor in Python applications</h2>
63 <p>TODO</p> 63 <p>TODO</p>
oletools/doc/mraptor.md
@@ -24,7 +24,7 @@ It is part of the [python-oletools](http://www.decalage.info/python/oletools) pa
24 ## Usage 24 ## Usage
25 25
26 ```text 26 ```text
27 -Usage: mraptor.py [options] <filename> [filename2 ...] 27 +Usage: mraptor [options] <filename> [filename2 ...]
28 28
29 Options: 29 Options:
30 -h, --help show this help message and exit 30 -h, --help show this help message and exit
@@ -54,19 +54,19 @@ An exit code is returned based on the analysis result:
54 Scan a single file: 54 Scan a single file:
55 55
56 ```text 56 ```text
57 -mraptor.py file.doc 57 +mraptor file.doc
58 ``` 58 ```
59 59
60 Scan a single file, stored in a Zip archive with password "infected": 60 Scan a single file, stored in a Zip archive with password "infected":
61 61
62 ```text 62 ```text
63 -mraptor.py malicious_file.xls.zip -z infected 63 +mraptor malicious_file.xls.zip -z infected
64 ``` 64 ```
65 65
66 Scan a collection of files stored in a folder: 66 Scan a collection of files stored in a folder:
67 67
68 ```text 68 ```text
69 -mraptor.py "MalwareZoo/VBA/*" 69 +mraptor "MalwareZoo/VBA/*"
70 ``` 70 ```
71 71
72 **Important**: on Linux/MacOSX, always add double quotes around a file name when you use 72 **Important**: on Linux/MacOSX, always add double quotes around a file name when you use
@@ -77,10 +77,8 @@ list of files matching the wildcards before starting the script.
77 77
78 ## Python 3 support - mraptor3 78 ## Python 3 support - mraptor3
79 79
80 -As of v0.50, mraptor has been ported to Python 3 thanks to @sebdraven.  
81 -However, the differences between Python 2 and 3 are significant and for now  
82 -there is a separate version of mraptor named mraptor3 to be used with  
83 -Python 3. 80 +Since v0.54, mraptor is fully compatible with both Python 2 and 3.
  81 +There is no need to use mraptor3 anymore, however it is still present for backward compatibility.
84 82
85 83
86 -------------------------------------------------------------------------- 84 --------------------------------------------------------------------------
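mraptor reports its verdict through the process exit code (the hunks above only show `20: SUSPICIOUS`). The sketch below assumes the remaining values listed in mraptor's full help text (0, 1, 2, 10); confirm them with `mraptor -h` for your version. It also shows why quoting wildcards matters: `subprocess.run` with an argument list starts no shell, so a pattern like `"MalwareZoo/VBA/*"` reaches mraptor unexpanded, just as the double quotes ensure on the command line. `describe_exit_code` and `run_mraptor` are hypothetical helper names.

```python
import subprocess
from typing import List

# Exit codes as listed in mraptor's help text; only 20 appears in the
# excerpt above, so verify the others with `mraptor -h`.
MRAPTOR_EXIT_CODES = {
    0: "No Macro",
    1: "Not MS Office",
    2: "Macro OK",
    10: "ERROR",
    20: "SUSPICIOUS",
}


def describe_exit_code(code: int) -> str:
    """Map an mraptor exit code to its label, or 'UNKNOWN (<code>)'."""
    return MRAPTOR_EXIT_CODES.get(code, "UNKNOWN ({})".format(code))


def run_mraptor(patterns: List[str]) -> str:
    """Run the mraptor CLI (must be on PATH) and describe its exit code.

    No shell is involved, so wildcard patterns are handed to mraptor
    as-is instead of being expanded first.
    """
    result = subprocess.run(["mraptor"] + list(patterns))
    return describe_exit_code(result.returncode)
```

How mraptor combines several files into a single exit code is not covered by this excerpt, so check that behavior before relying on `run_mraptor` for multi-file patterns.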
oletools/doc/olebrowse.html
@@ -26,7 +26,7 @@
26 <p>And for Python 3:</p> 26 <p>And for Python 3:</p>
27 <pre><code>sudo apt-get install python3-tk</code></pre> 27 <pre><code>sudo apt-get install python3-tk</code></pre>
28 <h2 id="usage">Usage</h2> 28 <h2 id="usage">Usage</h2>
29 -<pre><code>olebrowse.py [file]</code></pre> 29 +<pre><code>olebrowse [file]</code></pre>
30 <p>If you provide a file it will be opened, else a dialog will allow you to browse folders to open a file. Then if it is a valid OLE file, the list of data streams will be displayed. You can select a stream, and then either view its content in a builtin hexadecimal viewer, or save it to a file for further analysis.</p> 30 <p>If you provide a file it will be opened, else a dialog will allow you to browse folders to open a file. Then if it is a valid OLE file, the list of data streams will be displayed. You can select a stream, and then either view its content in a builtin hexadecimal viewer, or save it to a file for further analysis.</p>
31 <h2 id="screenshots">Screenshots</h2> 31 <h2 id="screenshots">Screenshots</h2>
32 <p>Main menu, showing all streams in the OLE file:</p> 32 <p>Main menu, showing all streams in the OLE file:</p>
oletools/doc/olebrowse.md
@@ -30,9 +30,9 @@ sudo apt-get install python3-tk
30 30
31 Usage 31 Usage
32 ----- 32 -----
33 -  
34 - olebrowse.py [file]  
35 - 33 +```
  34 +olebrowse [file]
  35 +```
36 If you provide a file it will be opened, else a dialog will allow you to browse 36 If you provide a file it will be opened, else a dialog will allow you to browse
37 folders to open a file. Then if it is a valid OLE file, the list of data streams 37 folders to open a file. Then if it is a valid OLE file, the list of data streams
38 will be displayed. You can select a stream, and then either view its content 38 will be displayed. You can select a stream, and then either view its content
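olebrowse is a Tkinter GUI, which is why the page above asks for `python3-tk` on Debian/Ubuntu. A quick pre-flight check, sketched below for Python 3 (where the module is named `tkinter`; Python 2 uses `Tkinter`), can tell whether that dependency and the pip-generated `olebrowse` script are both available before launching it. The function names are hypothetical, and the check tests the interpreter running it, which may differ from the one the `olebrowse` script uses.

```python
import shutil


def tkinter_available() -> bool:
    """True if this Python 3 interpreter can import Tkinter."""
    try:
        import tkinter  # noqa: F401  (python3-tk on Debian/Ubuntu)
    except ImportError:
        return False
    return True


def olebrowse_ready() -> bool:
    """True if Tkinter imports and an `olebrowse` script is on PATH."""
    return tkinter_available() and shutil.which("olebrowse") is not None
```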
oletools/doc/oledir.html
@@ -21,10 +21,21 @@
21 <p>It can be used either as a command-line tool, or as a python module from your own applications.</p> 21 <p>It can be used either as a command-line tool, or as a python module from your own applications.</p>
22 <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p> 22 <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p>
23 <h2 id="usage">Usage</h2> 23 <h2 id="usage">Usage</h2>
24 -<pre class="text"><code>Usage: oledir.py &lt;filename&gt;</code></pre> 24 +<pre class="text"><code>Usage: oledir [options] &lt;filename&gt; [filename2 ...]
  25 +
  26 +Options:
  27 + -h, --help show this help message and exit
  28 + -r find files recursively in subdirectories.
  29 + -z ZIP_PASSWORD, --zip=ZIP_PASSWORD
  30 + if the file is a zip archive, open all files from it,
  31 + using the provided password (requires Python 2.6+)
  32 + -f ZIP_FNAME, --zipfname=ZIP_FNAME
  33 + if the file is a zip archive, file(s) to be opened
  34 + within the zip. Wildcards * and ? are supported.
  35 + (default:*)</code></pre>
25 <h3 id="examples">Examples</h3> 36 <h3 id="examples">Examples</h3>
26 <p>Scan a single file:</p> 37 <p>Scan a single file:</p>
27 -<pre class="text"><code>oledir.py file.doc</code></pre> 38 +<pre class="text"><code>oledir file.doc</code></pre>
28 <p><img src="oledir.png" /></p> 39 <p><img src="oledir.png" /></p>
29 <hr /> 40 <hr />
30 <h2 id="how-to-use-oledir-in-python-applications">How to use oledir in Python applications</h2> 41 <h2 id="how-to-use-oledir-in-python-applications">How to use oledir in Python applications</h2>
oletools/doc/oledir.md
@@ -11,7 +11,18 @@ It is part of the [python-oletools](http://www.decalage.info/python/oletools) pa
11 ## Usage 11 ## Usage
12 12
13 ```text 13 ```text
14 -Usage: oledir.py <filename> 14 +Usage: oledir [options] <filename> [filename2 ...]
  15 +
  16 +Options:
  17 + -h, --help show this help message and exit
  18 + -r find files recursively in subdirectories.
  19 + -z ZIP_PASSWORD, --zip=ZIP_PASSWORD
  20 + if the file is a zip archive, open all files from it,
  21 + using the provided password (requires Python 2.6+)
  22 + -f ZIP_FNAME, --zipfname=ZIP_FNAME
  23 + if the file is a zip archive, file(s) to be opened
  24 + within the zip. Wildcards * and ? are supported.
  25 + (default:*)
15 ``` 26 ```
16 27
17 ### Examples 28 ### Examples
@@ -19,7 +30,7 @@ Usage: oledir.py <filename>
19 Scan a single file: 30 Scan a single file:
20 31
21 ```text 32 ```text
22 -oledir.py file.doc 33 +oledir file.doc
23 ``` 34 ```
24 35
25 ![](oledir.png) 36 ![](oledir.png)
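The new oledir options `-z` and `-f` open files inside a zip archive, optionally password-protected, and select members with `*` and `?` wildcards (default `*`). The sketch below illustrates that selection with the standard `zipfile` and `fnmatch` modules; it is not oledir's own code, and `iter_zip_members` is a hypothetical name. Note that `zipfile` can decrypt only legacy ZipCrypto archives, not AES-encrypted ones.

```python
import fnmatch
import zipfile
from typing import Iterator, Optional, Tuple


def iter_zip_members(zip_file, pattern: str = "*",
                     password: Optional[str] = None) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, data) for archive members whose names match `pattern`.

    `zip_file` is a path or a binary file object. `password` plays the role
    of -z and `pattern` the role of -f in this illustration.
    """
    pwd = password.encode("utf-8") if password is not None else None
    with zipfile.ZipFile(zip_file) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            if fnmatch.fnmatch(name, pattern):
                yield name, zf.read(name, pwd=pwd)
```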
oletools/doc/oleid.html
@@ -107,10 +107,10 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni
107 <li>CSV output</li> 107 <li>CSV output</li>
108 </ul> 108 </ul>
109 <h2 id="usage">Usage</h2> 109 <h2 id="usage">Usage</h2>
110 -<pre class="text"><code>oleid.py &lt;file&gt;</code></pre> 110 +<pre class="text"><code>oleid &lt;file&gt;</code></pre>
111 <h3 id="example">Example</h3> 111 <h3 id="example">Example</h3>
112 <p>Analyzing a Word document containing a Flash object and VBA macros:</p> 112 <p>Analyzing a Word document containing a Flash object and VBA macros:</p>
113 -<pre class="text"><code>C:\oletools&gt;oleid.py word_flash_vba.doc 113 +<pre class="text"><code>C:\oletools&gt;oleid word_flash_vba.doc
114 114
115 Filename: word_flash_vba.doc 115 Filename: word_flash_vba.doc
116 +-------------------------------+-----------------------+ 116 +-------------------------------+-----------------------+
oletools/doc/oleid.md
@@ -32,7 +32,7 @@ Planned improvements:
32 ## Usage 32 ## Usage
33 33
34 ```text 34 ```text
35 -oleid.py <file> 35 +oleid <file>
36 ``` 36 ```
37 37
38 ### Example 38 ### Example
@@ -40,7 +40,7 @@ oleid.py <file>
40 Analyzing a Word document containing a Flash object and VBA macros: 40 Analyzing a Word document containing a Flash object and VBA macros:
41 41
42 ```text 42 ```text
43 -C:\oletools>oleid.py word_flash_vba.doc 43 +C:\oletools>oleid word_flash_vba.doc
44 44
45 Filename: word_flash_vba.doc 45 Filename: word_flash_vba.doc
46 +-------------------------------+-----------------------+ 46 +-------------------------------+-----------------------+
oletools/doc/olemap.html
@@ -21,10 +21,10 @@
21 <p>It can be used either as a command-line tool, or as a python module from your own applications.</p> 21 <p>It can be used either as a command-line tool, or as a python module from your own applications.</p>
22 <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p> 22 <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p>
23 <h2 id="usage">Usage</h2> 23 <h2 id="usage">Usage</h2>
24 -<pre class="text"><code>Usage: olemap.py &lt;filename&gt;</code></pre> 24 +<pre class="text"><code>Usage: olemap &lt;filename&gt;</code></pre>
25 <h3 id="examples">Examples</h3> 25 <h3 id="examples">Examples</h3>
26 <p>Scan a single file:</p> 26 <p>Scan a single file:</p>
27 -<pre class="text"><code>olemap.py file.doc</code></pre> 27 +<pre class="text"><code>olemap file.doc</code></pre>
28 <p><img src="olemap1.png" /></p> 28 <p><img src="olemap1.png" /></p>
29 <p><img src="olemap2.png" /></p> 29 <p><img src="olemap2.png" /></p>
30 <hr /> 30 <hr />
oletools/doc/olemap.md
@@ -10,7 +10,7 @@ It is part of the [python-oletools](http://www.decalage.info/python/oletools) pa
10 ## Usage 10 ## Usage
11 11
12 ```text 12 ```text
13 -Usage: olemap.py <filename> 13 +Usage: olemap <filename>
14 ``` 14 ```
15 15
16 ### Examples 16 ### Examples
@@ -18,7 +18,7 @@ Usage: olemap.py <filename>
18 Scan a single file: 18 Scan a single file:
19 19
20 ```text 20 ```text
21 -olemap.py file.doc 21 +olemap file.doc
22 ``` 22 ```
23 23
24 ![](olemap1.png) 24 ![](olemap1.png)
oletools/doc/olemeta.html
@@ -20,7 +20,7 @@
20 <p>olemeta is a script to parse OLE files such as MS Office documents (e.g. Word, Excel), to extract all standard properties present in the OLE file.</p> 20 <p>olemeta is a script to parse OLE files such as MS Office documents (e.g. Word, Excel), to extract all standard properties present in the OLE file.</p>
21 <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p> 21 <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p>
22 <h2 id="usage">Usage</h2> 22 <h2 id="usage">Usage</h2>
23 -<pre class="text"><code>olemeta.py &lt;file&gt;</code></pre> 23 +<pre class="text"><code>olemeta &lt;file&gt;</code></pre>
24 <h3 id="example">Example</h3> 24 <h3 id="example">Example</h3>
25 <p><img src="olemeta1.png" /></p> 25 <p><img src="olemeta1.png" /></p>
26 <h2 id="how-to-use-olemeta-in-python-applications">How to use olemeta in Python applications</h2> 26 <h2 id="how-to-use-olemeta-in-python-applications">How to use olemeta in Python applications</h2>
oletools/doc/olemeta.md
@@ -9,7 +9,7 @@ It is part of the [python-oletools](http://www.decalage.info/python/oletools) pa
9 ## Usage 9 ## Usage
10 10
11 ```text 11 ```text
12 -olemeta.py <file> 12 +olemeta <file>
13 ``` 13 ```
14 14
15 ### Example 15 ### Example
oletools/doc/oletimes.html
@@ -20,10 +20,10 @@
20 <p>oletimes is a script to parse OLE files such as MS Office documents (e.g. Word, Excel), to extract creation and modification times of all streams and storages in the OLE file.</p> 20 <p>oletimes is a script to parse OLE files such as MS Office documents (e.g. Word, Excel), to extract creation and modification times of all streams and storages in the OLE file.</p>
21 <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p> 21 <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p>
22 <h2 id="usage">Usage</h2> 22 <h2 id="usage">Usage</h2>
23 -<pre class="text"><code>oletimes.py &lt;file&gt;</code></pre> 23 +<pre class="text"><code>oletimes &lt;file&gt;</code></pre>
24 <h3 id="example">Example</h3> 24 <h3 id="example">Example</h3>
25 <p>Checking the malware sample <a href="https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/">DIAN_caso-5415.doc</a>:</p> 25 <p>Checking the malware sample <a href="https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/">DIAN_caso-5415.doc</a>:</p>
26 -<pre class="text"><code>&gt;oletimes.py DIAN_caso-5415.doc 26 +<pre class="text"><code>&gt;oletimes DIAN_caso-5415.doc
27 27
28 +----------------------------+---------------------+---------------------+ 28 +----------------------------+---------------------+---------------------+
29 | Stream/Storage name | Modification Time | Creation Time | 29 | Stream/Storage name | Modification Time | Creation Time |
oletools/doc/oletimes.md
@@ -10,7 +10,7 @@ It is part of the [python-oletools](http://www.decalage.info/python/oletools) pa
10 ## Usage 10 ## Usage
11 11
12 ```text 12 ```text
13 -oletimes.py <file> 13 +oletimes <file>
14 ``` 14 ```
15 15
16 ### Example 16 ### Example
@@ -18,7 +18,7 @@ oletimes.py <file>
18 Checking the malware sample [DIAN_caso-5415.doc](https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/): 18 Checking the malware sample [DIAN_caso-5415.doc](https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/):
19 19
20 ```text 20 ```text
21 ->oletimes.py DIAN_caso-5415.doc 21 +>oletimes DIAN_caso-5415.doc
22 22
23 +----------------------------+---------------------+---------------------+ 23 +----------------------------+---------------------+---------------------+
24 | Stream/Storage name | Modification Time | Creation Time | 24 | Stream/Storage name | Modification Time | Creation Time |
oletools/doc/olevba.html
@@ -127,56 +127,65 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni
127 <li>olevba scans the macro source code and the deobfuscated strings to find suspicious keywords, auto-executable macros and potential IOCs (URLs, IP addresses, e-mail addresses, executable filenames, etc).</li> 127 <li>olevba scans the macro source code and the deobfuscated strings to find suspicious keywords, auto-executable macros and potential IOCs (URLs, IP addresses, e-mail addresses, executable filenames, etc).</li>
128 </ol> 128 </ol>
129 <h2 id="usage">Usage</h2> 129 <h2 id="usage">Usage</h2>
130 -<pre class="text"><code>Usage: olevba.py [options] &lt;filename&gt; [filename2 ...]  
131 - 130 +<pre class="text"><code>Usage: olevba [options] &lt;filename&gt; [filename2 ...]
  131 +
132 Options: 132 Options:
133 -h, --help show this help message and exit 133 -h, --help show this help message and exit
134 -r find files recursively in subdirectories. 134 -r find files recursively in subdirectories.
135 -z ZIP_PASSWORD, --zip=ZIP_PASSWORD 135 -z ZIP_PASSWORD, --zip=ZIP_PASSWORD
136 if the file is a zip archive, open all files from it, 136 if the file is a zip archive, open all files from it,
137 - using the provided password (requires Python 2.6+) 137 + using the provided password.
  138 + -p PASSWORD, --password=PASSWORD
  139 + if encrypted office files are encountered, try
  140 + decryption with this password. May be repeated.
138 -f ZIP_FNAME, --zipfname=ZIP_FNAME 141 -f ZIP_FNAME, --zipfname=ZIP_FNAME
139 if the file is a zip archive, file(s) to be opened 142 if the file is a zip archive, file(s) to be opened
140 within the zip. Wildcards * and ? are supported. 143 within the zip. Wildcards * and ? are supported.
141 (default:*) 144 (default:*)
142 - -t, --triage triage mode, display results as a summary table  
143 - (default for multiple files)  
144 - -d, --detailed detailed mode, display full results (default for  
145 - single file)  
146 -a, --analysis display only analysis results, not the macro source 145 -a, --analysis display only analysis results, not the macro source
147 code 146 code
148 -c, --code display only VBA source code, do not analyze it 147 -c, --code display only VBA source code, do not analyze it
149 - -i INPUT, --input=INPUT  
150 - input file containing VBA source code to be analyzed  
151 - (no parsing)  
152 --decode display all the obfuscated strings with their decoded 148 --decode display all the obfuscated strings with their decoded
153 content (Hex, Base64, StrReverse, Dridex, VBA). 149 content (Hex, Base64, StrReverse, Dridex, VBA).
154 --attr display the attribute lines at the beginning of VBA 150 --attr display the attribute lines at the beginning of VBA
155 source code 151 source code
156 --reveal display the macro source code after replacing all the 152 --reveal display the macro source code after replacing all the
157 - obfuscated strings by their decoded content.</code></pre> 153 + obfuscated strings by their decoded content.
  154 + -l LOGLEVEL, --loglevel=LOGLEVEL
  155 + logging level debug/info/warning/error/critical
  156 + (default=warning)
  157 + --deobf Attempt to deobfuscate VBA expressions (slow)
  158 + --relaxed Do not raise errors if opening of substream fails
  159 +
  160 + Output mode (mutually exclusive):
  161 + -t, --triage triage mode, display results as a summary table
  162 + (default for multiple files)
  163 + -d, --detailed detailed mode, display full results (default for
  164 + single file)
  165 + -j, --json json mode, detailed in json format (never default)</code></pre>
  166 +<p><strong>New in v0.54:</strong> the -p option can now be used to decrypt encrypted documents using the provided password(s).</p>
158 <h3 id="examples">Examples</h3> 167 <h3 id="examples">Examples</h3>
159 <p>Scan a single file:</p> 168 <p>Scan a single file:</p>
160 -<pre class="text"><code>olevba.py file.doc</code></pre> 169 +<pre class="text"><code>olevba file.doc</code></pre>
161 <p>Scan a single file, stored in a Zip archive with password “infected”:</p> 170 <p>Scan a single file, stored in a Zip archive with password “infected”:</p>
162 -<pre class="text"><code>olevba.py malicious_file.xls.zip -z infected</code></pre> 171 +<pre class="text"><code>olevba malicious_file.xls.zip -z infected</code></pre>
163 <p>Scan a single file, showing all obfuscated strings decoded:</p> 172 <p>Scan a single file, showing all obfuscated strings decoded:</p>
164 -<pre class="text"><code>olevba.py file.doc --decode</code></pre> 173 +<pre class="text"><code>olevba file.doc --decode</code></pre>
165 <p>Scan a single file, showing the macro source code with VBA strings deobfuscated:</p> 174 <p>Scan a single file, showing the macro source code with VBA strings deobfuscated:</p>
166 -<pre class="text"><code>olevba.py file.doc --reveal</code></pre> 175 +<pre class="text"><code>olevba file.doc --reveal</code></pre>
167 <p>Scan VBA source code extracted into a text file:</p> 176 <p>Scan VBA source code extracted into a text file:</p>
168 -<pre class="text"><code>olevba.py source_code.vba</code></pre> 177 +<pre class="text"><code>olevba source_code.vba</code></pre>
169 <p>Scan a collection of files stored in a folder:</p> 178 <p>Scan a collection of files stored in a folder:</p>
170 -<pre class="text"><code>olevba.py &quot;MalwareZoo/VBA/*&quot;</code></pre> 179 +<pre class="text"><code>olevba &quot;MalwareZoo/VBA/*&quot;</code></pre>
171 <p>NOTE: On Linux, MacOSX and other Unix variants, it is required to add double quotes around wildcards. Otherwise, they will be expanded by the shell instead of olevba.</p> 180 <p>NOTE: On Linux, MacOSX and other Unix variants, it is required to add double quotes around wildcards. Otherwise, they will be expanded by the shell instead of olevba.</p>
172 <p>Scan all .doc and .xls files, recursively in all subfolders:</p> 181 <p>Scan all .doc and .xls files, recursively in all subfolders:</p>
173 -<pre class="text"><code>olevba.py &quot;MalwareZoo/VBA/*.doc&quot; &quot;MalwareZoo/VBA/*.xls&quot; -r</code></pre> 182 +<pre class="text"><code>olevba &quot;MalwareZoo/VBA/*.doc&quot; &quot;MalwareZoo/VBA/*.xls&quot; -r</code></pre>
174 <p>Scan all .doc files within all .zip files with password, recursively:</p> 183 <p>Scan all .doc files within all .zip files with password, recursively:</p>
175 -<pre class="text"><code>olevba.py &quot;MalwareZoo/VBA/*.zip&quot; -r -z infected -f &quot;*.doc&quot;</code></pre> 184 +<pre class="text"><code>olevba &quot;MalwareZoo/VBA/*.zip&quot; -r -z infected -f &quot;*.doc&quot;</code></pre>
176 <h3 id="detailed-analysis-mode-default-for-single-file">Detailed analysis mode (default for single file)</h3> 185 <h3 id="detailed-analysis-mode-default-for-single-file">Detailed analysis mode (default for single file)</h3>
177 <p>When a single file is scanned, or when using the option -d, all details of the analysis are displayed.</p> 186 <p>When a single file is scanned, or when using the option -d, all details of the analysis are displayed.</p>
178 <p>For example, checking the malware sample <a href="https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/">DIAN_caso-5415.doc</a>:</p> 187 <p>For example, checking the malware sample <a href="https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/">DIAN_caso-5415.doc</a>:</p>
179 -<pre class="text"><code>&gt;olevba.py c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip -z infected 188 +<pre class="text"><code>&gt;olevba c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip -z infected
180 =============================================================================== 189 ===============================================================================
181 FILE: DIAN_caso-5415.doc.malware in c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip 190 FILE: DIAN_caso-5415.doc.malware in c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip
182 Type: OLE 191 Type: OLE
@@ -246,7 +255,7 @@ ANALYSIS:
246 <li><strong>V</strong>: VBA string expressions (potential obfuscation)</li> 255 <li><strong>V</strong>: VBA string expressions (potential obfuscation)</li>
247 </ul> 256 </ul>
248 <p>Here is an example:</p> 257 <p>Here is an example:</p>
249 -<pre class="text"><code>c:\&gt;olevba.py \MalwareZoo\VBA\samples\* 258 +<pre class="text"><code>c:\&gt;olevba \MalwareZoo\VBA\samples\*
250 Flags Filename 259 Flags Filename
251 ----------- ----------------------------------------------------------------- 260 ----------- -----------------------------------------------------------------
252 OLE:MASI--- \MalwareZoo\VBA\samples\DIAN_caso-5415.doc.malware 261 OLE:MASI--- \MalwareZoo\VBA\samples\DIAN_caso-5415.doc.malware
@@ -266,7 +275,7 @@ OpX:MASI--- \MalwareZoo\VBA\samples\RottenKitten.xlsb.malware
266 OLE:MASI-B- \MalwareZoo\VBA\samples\ROVNIX.doc.malware 275 OLE:MASI-B- \MalwareZoo\VBA\samples\ROVNIX.doc.malware
267 OLE:MA----- \MalwareZoo\VBA\samples\Word within Word macro auto.doc</code></pre> 276 OLE:MA----- \MalwareZoo\VBA\samples\Word within Word macro auto.doc</code></pre>
268 <h2 id="python-3-support---olevba3">Python 3 support - olevba3</h2> 277 <h2 id="python-3-support---olevba3">Python 3 support - olevba3</h2>
269 -<p>As of v0.50, olevba has been ported to Python 3 thanks to <span class="citation" data-cites="sebdraven">@sebdraven</span>. However, the differences between Python 2 and 3 are significant and for now there is a separate version of olevba named olevba3 to be used with Python 3.</p> 278 +<p>Since v0.54, olevba is fully compatible with both Python 2 and 3. There is no need to use olevba3 anymore, however it is still present for backward compatibility.</p>
270 <hr /> 279 <hr />
271 <h2 id="how-to-use-olevba-in-python-applications">How to use olevba in Python applications</h2> 280 <h2 id="how-to-use-olevba-in-python-applications">How to use olevba in Python applications</h2>
272 <p>olevba may be used to open a MS Office file, detect if it contains VBA macros, extract and analyze the VBA source code from your own python applications.</p> 281 <p>olevba may be used to open a MS Office file, detect if it contains VBA macros, extract and analyze the VBA source code from your own python applications.</p>
oletools/doc/olevba.md
@@ -67,85 +67,95 @@ and potential IOCs (URLs, IP addresses, e-mail addresses, executable filenames, @@ -67,85 +67,95 @@ and potential IOCs (URLs, IP addresses, e-mail addresses, executable filenames,
67 ## Usage 67 ## Usage
68 68
69 ```text 69 ```text
70 -Usage: olevba.py [options] <filename> [filename2 ...]  
71 - 70 +Usage: olevba [options] <filename> [filename2 ...]
  71 +
72 Options: 72 Options:
73 -h, --help show this help message and exit 73 -h, --help show this help message and exit
74 -r find files recursively in subdirectories. 74 -r find files recursively in subdirectories.
75 -z ZIP_PASSWORD, --zip=ZIP_PASSWORD 75 -z ZIP_PASSWORD, --zip=ZIP_PASSWORD
76 if the file is a zip archive, open all files from it, 76 if the file is a zip archive, open all files from it,
77 - using the provided password (requires Python 2.6+) 77 + using the provided password.
  78 + -p PASSWORD, --password=PASSWORD
  79 + if encrypted office files are encountered, try
  80 + decryption with this password. May be repeated.
78 -f ZIP_FNAME, --zipfname=ZIP_FNAME 81 -f ZIP_FNAME, --zipfname=ZIP_FNAME
79 if the file is a zip archive, file(s) to be opened 82 if the file is a zip archive, file(s) to be opened
80 within the zip. Wildcards * and ? are supported. 83 within the zip. Wildcards * and ? are supported.
81 (default:*) 84 (default:*)
82 - -t, --triage triage mode, display results as a summary table  
83 - (default for multiple files)  
84 - -d, --detailed detailed mode, display full results (default for  
85 - single file)  
86 -a, --analysis display only analysis results, not the macro source 85 -a, --analysis display only analysis results, not the macro source
87 code 86 code
88 -c, --code display only VBA source code, do not analyze it 87 -c, --code display only VBA source code, do not analyze it
89 - -i INPUT, --input=INPUT  
90 - input file containing VBA source code to be analyzed  
91 - (no parsing)  
92 --decode display all the obfuscated strings with their decoded 88 --decode display all the obfuscated strings with their decoded
93 content (Hex, Base64, StrReverse, Dridex, VBA). 89 content (Hex, Base64, StrReverse, Dridex, VBA).
94 --attr display the attribute lines at the beginning of VBA 90 --attr display the attribute lines at the beginning of VBA
95 source code 91 source code
96 --reveal display the macro source code after replacing all the 92 --reveal display the macro source code after replacing all the
97 obfuscated strings by their decoded content. 93 obfuscated strings by their decoded content.
  94 + -l LOGLEVEL, --loglevel=LOGLEVEL
  95 + logging level debug/info/warning/error/critical
  96 + (default=warning)
  97 + --deobf Attempt to deobfuscate VBA expressions (slow)
  98 + --relaxed Do not raise errors if opening of substream fails
  99 +
  100 + Output mode (mutually exclusive):
  101 + -t, --triage triage mode, display results as a summary table
  102 + (default for multiple files)
  103 + -d, --detailed detailed mode, display full results (default for
  104 + single file)
  105 + -j, --json json mode, detailed in json format (never default)
98 ``` 106 ```
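The updated usage text says that `-p` may be repeated and that `-t`, `-d` and `-j` are mutually exclusive. Detailed output is the default for a single file, triage for several files, and json is never the default. The sketch below models those rules with `argparse`. It is a hypothetical re-creation for illustration only. olevba's own option parser is not shown in this diff, and `build_parser` and `resolve_output_mode` are made-up names.

```python
import argparse


def build_parser():
    """Hypothetical argparse model of a few olevba options (not olevba's own parser)."""
    parser = argparse.ArgumentParser(prog="olevba")
    parser.add_argument("filenames", nargs="+", metavar="filename")
    # -p may be repeated: every occurrence is appended to one list
    parser.add_argument("-p", "--password", action="append", dest="passwords",
                        help="try decryption with this password; may be repeated")
    # output modes share one dest; the group makes argparse reject e.g. -t -d
    mode = parser.add_mutually_exclusive_group()
    for short, long_, const in (("-t", "--triage", "triage"),
                                ("-d", "--detailed", "detailed"),
                                ("-j", "--json", "json")):
        mode.add_argument(short, long_, dest="output_mode",
                          action="store_const", const=const)
    return parser


def resolve_output_mode(args):
    """Documented defaults: detailed for one file, triage for several, json never by default."""
    if args.output_mode is not None:
        return args.output_mode
    return "detailed" if len(args.filenames) == 1 else "triage"
```

Keeping all three modes on a single `dest` means the rest of a program only has to inspect one attribute. If no mode flag is given, that attribute stays `None` and the file count decides the mode.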
99 107
  108 +**New in v0.54:** the -p option can now be used to decrypt encrypted documents using the provided password(s).
  109 +
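Because `-p` can be given several times, each candidate password must be tried in turn until one of them decrypts the document. The sketch below shows only that loop. `try_passwords` is a hypothetical name, and `decrypt` is a hypothetical callback that is assumed to raise an exception when the password is wrong. The diff does not show how olevba performs the actual decryption.

```python
from typing import Callable, Iterable, Optional, Tuple


def try_passwords(decrypt: Callable[[str], bytes],
                  passwords: Iterable[str]) -> Tuple[Optional[str], Optional[bytes]]:
    """Return (password, decrypted_data) for the first password decrypt() accepts.

    decrypt is a hypothetical callback assumed to raise an exception when the
    password is wrong. (None, None) means that no candidate worked.
    """
    for password in passwords:
        try:
            return password, decrypt(password)
        except Exception:
            # wrong password (or data that cannot be decrypted): try the next one
            continue
    return None, None
```

With msoffcrypto-tool, which the `.travis.yml` change in this commit installs, such a callback could open the file with `msoffcrypto.OfficeFile`, call `load_key(password=...)`, and then call `decrypt()` into an `io.BytesIO`. Whether a wrong password raises an exception, and which one, depends on the file format and the library version, so that part is an assumption.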
100 ### Examples 110 ### Examples
101 111
102 Scan a single file: 112 Scan a single file:
103 113
104 ```text 114 ```text
105 -olevba.py file.doc 115 +olevba file.doc
106 ``` 116 ```
107 117
108 Scan a single file, stored in a Zip archive with password "infected": 118 Scan a single file, stored in a Zip archive with password "infected":
109 119
110 ```text 120 ```text
111 -olevba.py malicious_file.xls.zip -z infected 121 +olevba malicious_file.xls.zip -z infected
112 ``` 122 ```
113 123
114 Scan a single file, showing all obfuscated strings decoded: 124 Scan a single file, showing all obfuscated strings decoded:
115 125
116 ```text 126 ```text
117 -olevba.py file.doc --decode 127 +olevba file.doc --decode
118 ``` 128 ```
119 129
120 Scan a single file, showing the macro source code with VBA strings deobfuscated: 130 Scan a single file, showing the macro source code with VBA strings deobfuscated:
121 131
122 ```text 132 ```text
123 -olevba.py file.doc --reveal 133 +olevba file.doc --reveal
124 ``` 134 ```
125 135
126 Scan VBA source code extracted into a text file: 136 Scan VBA source code extracted into a text file:
127 137
128 ```text 138 ```text
129 -olevba.py source_code.vba 139 +olevba source_code.vba
130 ``` 140 ```
131 141
132 Scan a collection of files stored in a folder: 142 Scan a collection of files stored in a folder:
133 143
134 ```text 144 ```text
135 -olevba.py "MalwareZoo/VBA/*" 145 +olevba "MalwareZoo/VBA/*"
136 ``` 146 ```
137 NOTE: On Linux, MacOSX and other Unix variants, it is required to add double quotes around wildcards. Otherwise, they will be expanded by the shell instead of olevba. 147 NOTE: On Linux, MacOSX and other Unix variants, it is required to add double quotes around wildcards. Otherwise, they will be expanded by the shell instead of olevba.
138 148
139 Scan all .doc and .xls files, recursively in all subfolders: 149 Scan all .doc and .xls files, recursively in all subfolders:
140 150
141 ```text 151 ```text
142 -olevba.py "MalwareZoo/VBA/*.doc" "MalwareZoo/VBA/*.xls" -r 152 +olevba "MalwareZoo/VBA/*.doc" "MalwareZoo/VBA/*.xls" -r
143 ``` 153 ```
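When the pattern is quoted, the shell passes it through unchanged, so the program has to expand the wildcard itself. In oletools this is done by the bundled `xglob` module (`xglob.iter_files(..., recursive=...)` in the code removed from mraptor3.py). The sketch below is not that code. It is a minimal stdlib illustration, with the hypothetical name `expand_pattern`, of expanding a pattern with optional recursion into subfolders, as `-r` does.

```python
import fnmatch
import os


def expand_pattern(pattern, recursive=False):
    """Yield the files matching a pattern such as 'MalwareZoo/VBA/*.doc'.

    Only the file-name part may contain * and ?; the directory part is used
    as-is. With recursive=True, subdirectories are searched too (like -r).
    """
    directory, name_pattern = os.path.split(pattern)
    directory = directory or os.curdir
    if recursive:
        walker = os.walk(directory)
    else:
        # only the top directory: the first tuple produced by os.walk(),
        # or an empty listing if the directory does not exist
        walker = [next(os.walk(directory), (directory, [], []))]
    for dirpath, _dirnames, filenames in walker:
        for filename in sorted(filenames):
            if fnmatch.fnmatch(filename, name_pattern):
                yield os.path.join(dirpath, filename)
```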
144 154
145 Scan all .doc files within all .zip files with password, recursively: 155 Scan all .doc files within all .zip files with password, recursively:
146 156
147 ```text 157 ```text
148 -olevba.py "MalwareZoo/VBA/*.zip" -r -z infected -f "*.doc" 158 +olevba "MalwareZoo/VBA/*.zip" -r -z infected -f "*.doc"
149 ``` 159 ```
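`-z` and `-f` work together. Every archive member whose name matches the `-f` wildcard is opened with the `-z` password, and directory entries are skipped, as the removed mraptor3.py code does. The sketch below assumes the standard `zipfile` and `fnmatch` modules. `iter_zip_members` is a hypothetical name and is not the `xglob` implementation. Note that `zipfile` can only decrypt the legacy ZIP (ZipCrypto) encryption, not AES.

```python
import fnmatch
import zipfile


def iter_zip_members(zip_path, password=None, pattern="*"):
    """Yield (member_name, data) for files in zip_path whose name matches pattern.

    password: str or None, like the value given to -z (encoded to bytes here).
    pattern: shell-style wildcard with * and ?, like the value given to -f.
    """
    pwd = password.encode("utf-8") if password else None
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            # ignore directory names stored in the zip file
            if info.filename.endswith("/"):
                continue
            if not fnmatch.fnmatch(info.filename, pattern):
                continue
            # pwd is only used for encrypted members; plain members ignore it
            with zf.open(info, pwd=pwd) as member:
                yield info.filename, member.read()
```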
150 160
151 161
@@ -156,7 +166,7 @@ When a single file is scanned, or when using the option -d, all details of the a @@ -156,7 +166,7 @@ When a single file is scanned, or when using the option -d, all details of the a
156 For example, checking the malware sample [DIAN_caso-5415.doc](https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/): 166 For example, checking the malware sample [DIAN_caso-5415.doc](https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/):
157 167
158 ```text 168 ```text
159 ->olevba.py c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip -z infected 169 +>olevba c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip -z infected
160 =============================================================================== 170 ===============================================================================
161 FILE: DIAN_caso-5415.doc.malware in c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip 171 FILE: DIAN_caso-5415.doc.malware in c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip
162 Type: OLE 172 Type: OLE
@@ -233,7 +243,7 @@ The following flags show the results of the analysis: @@ -233,7 +243,7 @@ The following flags show the results of the analysis:
233 Here is an example: 243 Here is an example:
234 244
235 ```text 245 ```text
236 -c:\>olevba.py \MalwareZoo\VBA\samples\* 246 +c:\>olevba \MalwareZoo\VBA\samples\*
237 Flags Filename 247 Flags Filename
238 ----------- ----------------------------------------------------------------- 248 ----------- -----------------------------------------------------------------
239 OLE:MASI--- \MalwareZoo\VBA\samples\DIAN_caso-5415.doc.malware 249 OLE:MASI--- \MalwareZoo\VBA\samples\DIAN_caso-5415.doc.malware
@@ -256,10 +266,9 @@ OLE:MA----- \MalwareZoo\VBA\samples\Word within Word macro auto.doc @@ -256,10 +266,9 @@ OLE:MA----- \MalwareZoo\VBA\samples\Word within Word macro auto.doc
256 266
257 ## Python 3 support - olevba3 267 ## Python 3 support - olevba3
258 268
259 -As of v0.50, olevba has been ported to Python 3 thanks to @sebdraven.  
260 -However, the differences between Python 2 and 3 are significant and for now  
261 -there is a separate version of olevba named olevba3 to be used with  
262 -Python 3. 269 +Since v0.54, olevba is fully compatible with both Python 2 and 3.
  270 +There is no need to use olevba3 anymore, however it is still present for backward compatibility.
  271 +
263 272
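This kind of backward compatibility can be kept with a thin stub module that emits a `DeprecationWarning` and then forwards to the new module. mraptor3.py in this commit is such a stub: its visible part warns and then fixes up `sys.path`. The sketch below reproduces the pattern with two hypothetical throwaway modules, `newtool` and `oldtool3`, written to a temporary directory. The `from newtool import *` re-export is an assumption about how the forwarding can be done. `DeprecationWarning` is ignored by default outside `__main__`, so the sketch enables it explicitly while importing.

```python
import importlib
import os
import sys
import textwrap
import warnings

# Hypothetical module sources, used only for this illustration.
NEW_MODULE = textwrap.dedent("""
    def main():
        return 'newtool main'
""")

STUB_MODULE = textwrap.dedent("""
    # oldtool3 is a stub that redirects to newtool, for backwards compatibility
    import warnings
    warnings.warn('oldtool3 is deprecated, newtool should be used instead.',
                  DeprecationWarning)
    from newtool import *  # re-export the public names of the new module
""")


def write_modules(root):
    """Write the new module and its deprecated stub into directory root."""
    for name, source in (("newtool", NEW_MODULE), ("oldtool3", STUB_MODULE)):
        with open(os.path.join(root, name + ".py"), "w") as f:
            f.write(source)


def import_stub(root):
    """Import the stub from root and return (module, captured warnings)."""
    if root not in sys.path:
        sys.path.insert(0, root)
    importlib.invalidate_caches()
    with warnings.catch_warnings(record=True) as caught:
        # DeprecationWarning is hidden by default outside __main__; show it here
        warnings.simplefilter("always")
        stub = importlib.import_module("oldtool3")
    return stub, caught
```

Old entry points keep working unchanged, while anyone who enables warnings is told which module to switch to.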
264 -------------------------------------------------------------------------- 273 --------------------------------------------------------------------------
265 274
oletools/doc/pyxswf.html
@@ -24,7 +24,7 @@ @@ -24,7 +24,7 @@
24 <p>It can also extract Flash objects from RTF documents, by parsing embedded objects encoded in hexadecimal format (-f option).</p> 24 <p>It can also extract Flash objects from RTF documents, by parsing embedded objects encoded in hexadecimal format (-f option).</p>
25 <p>For this, simply add the -o option to work on OLE streams rather than raw files, or the -f option to work on RTF files.</p> 25 <p>For this, simply add the -o option to work on OLE streams rather than raw files, or the -f option to work on RTF files.</p>
26 <h2 id="usage">Usage</h2> 26 <h2 id="usage">Usage</h2>
27 -<pre class="text"><code>Usage: pyxswf.py [options] &lt;file.bad&gt; 27 +<pre class="text"><code>Usage: pyxswf [options] &lt;file.bad&gt;
28 28
29 Options: 29 Options:
30 -o, --ole Parse an OLE file (e.g. Word, Excel) to look for SWF 30 -o, --ole Parse an OLE file (e.g. Word, Excel) to look for SWF
@@ -46,18 +46,18 @@ Options: @@ -46,18 +46,18 @@ Options:
46 contain SWFs. Must provide path in quotes 46 contain SWFs. Must provide path in quotes
47 -c, --compress Compresses the SWF using Zlib</code></pre> 47 -c, --compress Compresses the SWF using Zlib</code></pre>
48 <h3 id="example-1---detecting-and-extracting-a-swf-file-from-a-word-document-on-windows">Example 1 - detecting and extracting a SWF file from a Word document on Windows:</h3> 48 <h3 id="example-1---detecting-and-extracting-a-swf-file-from-a-word-document-on-windows">Example 1 - detecting and extracting a SWF file from a Word document on Windows:</h3>
49 -<pre class="text"><code>C:\oletools&gt;pyxswf.py -o word_flash.doc 49 +<pre class="text"><code>C:\oletools&gt;pyxswf -o word_flash.doc
50 OLE stream: &#39;Contents&#39; 50 OLE stream: &#39;Contents&#39;
51 [SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents 51 [SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents
52 [ADDR] SWF 1 at 0x8 - FWS Header 52 [ADDR] SWF 1 at 0x8 - FWS Header
53 53
54 -C:\oletools&gt;pyxswf.py -xo word_flash.doc 54 +C:\oletools&gt;pyxswf -xo word_flash.doc
55 OLE stream: &#39;Contents&#39; 55 OLE stream: &#39;Contents&#39;
56 [SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents 56 [SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents
57 [ADDR] SWF 1 at 0x8 - FWS Header 57 [ADDR] SWF 1 at 0x8 - FWS Header
58 [FILE] Carved SWF MD5: 2498e9c0701dc0e461ab4358f9102bc5.swf</code></pre> 58 [FILE] Carved SWF MD5: 2498e9c0701dc0e461ab4358f9102bc5.swf</code></pre>
59 <h3 id="example-2---detecting-and-extracting-a-swf-file-from-a-rtf-document-on-windows">Example 2 - detecting and extracting a SWF file from a RTF document on Windows:</h3> 59 <h3 id="example-2---detecting-and-extracting-a-swf-file-from-a-rtf-document-on-windows">Example 2 - detecting and extracting a SWF file from a RTF document on Windows:</h3>
60 -<pre class="text"><code>C:\oletools&gt;pyxswf.py -xf &quot;rtf_flash.rtf&quot; 60 +<pre class="text"><code>C:\oletools&gt;pyxswf -xf &quot;rtf_flash.rtf&quot;
61 RTF embedded object size 1498557 at index 000036DD 61 RTF embedded object size 1498557 at index 000036DD
62 [SUMMARY] 1 SWF(s) in MD5:46a110548007e04f4043785ac4184558:RTF_embedded_object_0 62 [SUMMARY] 1 SWF(s) in MD5:46a110548007e04f4043785ac4184558:RTF_embedded_object_0
63 00036DD 63 00036DD
oletools/doc/pyxswf.md
@@ -21,7 +21,7 @@ For this, simply add the -o option to work on OLE streams rather than raw files, @@ -21,7 +21,7 @@ For this, simply add the -o option to work on OLE streams rather than raw files,
21 ## Usage 21 ## Usage
22 22
23 ```text 23 ```text
24 -Usage: pyxswf.py [options] <file.bad> 24 +Usage: pyxswf [options] <file.bad>
25 25
26 Options: 26 Options:
27 -o, --ole Parse an OLE file (e.g. Word, Excel) to look for SWF 27 -o, --ole Parse an OLE file (e.g. Word, Excel) to look for SWF
@@ -47,12 +47,12 @@ Options: @@ -47,12 +47,12 @@ Options:
47 ### Example 1 - detecting and extracting a SWF file from a Word document on Windows: 47 ### Example 1 - detecting and extracting a SWF file from a Word document on Windows:
48 48
49 ```text 49 ```text
50 -C:\oletools>pyxswf.py -o word_flash.doc 50 +C:\oletools>pyxswf -o word_flash.doc
51 OLE stream: 'Contents' 51 OLE stream: 'Contents'
52 [SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents 52 [SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents
53 [ADDR] SWF 1 at 0x8 - FWS Header 53 [ADDR] SWF 1 at 0x8 - FWS Header
54 54
55 -C:\oletools>pyxswf.py -xo word_flash.doc 55 +C:\oletools>pyxswf -xo word_flash.doc
56 OLE stream: 'Contents' 56 OLE stream: 'Contents'
57 [SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents 57 [SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents
58 [ADDR] SWF 1 at 0x8 - FWS Header 58 [ADDR] SWF 1 at 0x8 - FWS Header
@@ -62,7 +62,7 @@ OLE stream: 'Contents' @@ -62,7 +62,7 @@ OLE stream: 'Contents'
62 ### Example 2 - detecting and extracting a SWF file from a RTF document on Windows: 62 ### Example 2 - detecting and extracting a SWF file from a RTF document on Windows:
63 63
64 ```text 64 ```text
65 -C:\oletools>pyxswf.py -xf "rtf_flash.rtf" 65 +C:\oletools>pyxswf -xf "rtf_flash.rtf"
66 RTF embedded object size 1498557 at index 000036DD 66 RTF embedded object size 1498557 at index 000036DD
67 [SUMMARY] 1 SWF(s) in MD5:46a110548007e04f4043785ac4184558:RTF_embedded_object_0 67 [SUMMARY] 1 SWF(s) in MD5:46a110548007e04f4043785ac4184558:RTF_embedded_object_0
68 00036DD 68 00036DD
oletools/ezhexviewer.py
@@ -16,7 +16,7 @@ Usage in a python application: @@ -16,7 +16,7 @@ Usage in a python application:
16 16
17 ezhexviewer project website: http://www.decalage.info/python/ezhexviewer 17 ezhexviewer project website: http://www.decalage.info/python/ezhexviewer
18 18
19 -ezhexviewer is copyright (c) 2012-2017, Philippe Lagadec (http://www.decalage.info) 19 +ezhexviewer is copyright (c) 2012-2019, Philippe Lagadec (http://www.decalage.info)
20 All rights reserved. 20 All rights reserved.
21 21
22 Redistribution and use in source and binary forms, with or without modification, 22 Redistribution and use in source and binary forms, with or without modification,
@@ -50,7 +50,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. @@ -50,7 +50,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
50 # 2017-04-26 PL: - fixed absolute imports (issue #141) 50 # 2017-04-26 PL: - fixed absolute imports (issue #141)
51 # 2018-09-15 v0.54 PL: - easygui is now a dependency 51 # 2018-09-15 v0.54 PL: - easygui is now a dependency
52 52
53 -__version__ = '0.54dev1' 53 +__version__ = '0.54'
54 54
55 #----------------------------------------------------------------------------- 55 #-----------------------------------------------------------------------------
56 # TODO: 56 # TODO:
oletools/mraptor.py
@@ -23,7 +23,7 @@ http://www.decalage.info/python/oletools @@ -23,7 +23,7 @@ http://www.decalage.info/python/oletools
23 23
24 # === LICENSE ================================================================== 24 # === LICENSE ==================================================================
25 25
26 -# MacroRaptor is copyright (c) 2016-2018 Philippe Lagadec (http://www.decalage.info) 26 +# MacroRaptor is copyright (c) 2016-2019 Philippe Lagadec (http://www.decalage.info)
27 # All rights reserved. 27 # All rights reserved.
28 # 28 #
29 # Redistribution and use in source and binary forms, with or without modification, 29 # Redistribution and use in source and binary forms, with or without modification,
@@ -58,8 +58,9 @@ http://www.decalage.info/python/oletools @@ -58,8 +58,9 @@ http://www.decalage.info/python/oletools
58 # 2016-12-21 v0.51 PL: - added more ActiveX macro triggers 58 # 2016-12-21 v0.51 PL: - added more ActiveX macro triggers
59 # 2017-03-08 PL: - fixed absolute imports 59 # 2017-03-08 PL: - fixed absolute imports
60 # 2018-05-25 v0.53 PL: - added Word/PowerPoint 2007+ XML (aka Flat OPC) issue #283 60 # 2018-05-25 v0.53 PL: - added Word/PowerPoint 2007+ XML (aka Flat OPC) issue #283
  61 +# 2019-04-04 v0.54 PL: - added ExecuteExcel4Macro, ShellExecuteA, XLM keywords
61 62
62 -__version__ = '0.53' 63 +__version__ = '0.54'
63 64
64 #------------------------------------------------------------------------------ 65 #------------------------------------------------------------------------------
65 # TODO: 66 # TODO:
@@ -119,20 +120,21 @@ re_autoexec = re.compile(r'(?i)\b(?:Auto(?:Exec|_?Open|_?Close|Exit|New)' + @@ -119,20 +120,21 @@ re_autoexec = re.compile(r'(?i)\b(?:Auto(?:Exec|_?Open|_?Close|Exit|New)' +
119 r'|DocumentComplete|DownloadBegin|DownloadComplete|FileDownload' + 120 r'|DocumentComplete|DownloadBegin|DownloadComplete|FileDownload' +
120 r'|NavigateComplete2|NavigateError|ProgressChange|PropertyChange' + 121 r'|NavigateComplete2|NavigateError|ProgressChange|PropertyChange' +
121 r'|SetSecureLockIcon|StatusTextChange|TitleChange|MouseMove' + 122 r'|SetSecureLockIcon|StatusTextChange|TitleChange|MouseMove' +
122 - r'|MouseEnter|MouseLeave|))\b') 123 + r'|MouseEnter|MouseLeave))|Auto_Ope\b')
  124 +# TODO: "Auto_Ope" is temporarily here because of a bug in plugin_biff, which misses the last byte in "Auto_Open"...
123 125
124 # MS-VBAL 5.4.5.1 Open Statement: 126 # MS-VBAL 5.4.5.1 Open Statement:
125 RE_OPEN_WRITE = r'(?:\bOpen\b[^\n]+\b(?:Write|Append|Binary|Output|Random)\b)' 127 RE_OPEN_WRITE = r'(?:\bOpen\b[^\n]+\b(?:Write|Append|Binary|Output|Random)\b)'
126 128
127 re_write = re.compile(r'(?i)\b(?:FileCopy|CopyFile|Kill|CreateTextFile|' 129 re_write = re.compile(r'(?i)\b(?:FileCopy|CopyFile|Kill|CreateTextFile|'
128 - + r'VirtualAlloc|RtlMoveMemory|URLDownloadToFileA?|AltStartupPath|' 130 + + r'VirtualAlloc|RtlMoveMemory|URLDownloadToFileA?|AltStartupPath|WriteProcessMemory|'
129 + r'ADODB\.Stream|WriteText|SaveToFile|SaveAs|SaveAsRTF|FileSaveAs|MkDir|RmDir|SaveSetting|SetAttr)\b|' + RE_OPEN_WRITE) 131 + r'ADODB\.Stream|WriteText|SaveToFile|SaveAs|SaveAsRTF|FileSaveAs|MkDir|RmDir|SaveSetting|SetAttr)\b|' + RE_OPEN_WRITE)
130 132
131 # MS-VBAL 5.2.3.5 External Procedure Declaration 133 # MS-VBAL 5.2.3.5 External Procedure Declaration
132 RE_DECLARE_LIB = r'(?:\bDeclare\b[^\n]+\bLib\b)' 134 RE_DECLARE_LIB = r'(?:\bDeclare\b[^\n]+\bLib\b)'
133 135
134 re_execute = re.compile(r'(?i)\b(?:Shell|CreateObject|GetObject|SendKeys|' 136 re_execute = re.compile(r'(?i)\b(?:Shell|CreateObject|GetObject|SendKeys|'
135 - + r'MacScript|FollowHyperlink|CreateThread|ShellExecute)\b|' + RE_DECLARE_LIB) 137 + + r'MacScript|FollowHyperlink|CreateThread|ShellExecuteA?|ExecuteExcel4Macro|EXEC|REGISTER)\b|' + RE_DECLARE_LIB)
136 138
137 139
138 # === CLASSES ================================================================= 140 # === CLASSES =================================================================
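This hunk adds keywords to the write and execute lists and adds a temporary `Auto_Ope` alternative to the auto-exec pattern. The sketch below shows how the three patterns combine into MacroRaptor's A/W/X flags. `re_write` and `re_execute` are copied from the new side of the diff. `re_autoexec` is abridged (most ActiveX event names are omitted) but keeps the new group structure and the `Auto_Ope` alternative. The flag and "suspicious" logic follows the `MacroRaptor.scan()` / `get_flags()` code visible in the removed mraptor3.py. `macro_flags` is a hypothetical helper, and unlike MacroRaptor it does not collapse long VBA lines first.

```python
import re

# Abridged copy of the new re_autoexec: most ActiveX event names are left out.
re_autoexec = re.compile(r'(?i)\b(?:Auto(?:Exec|_?Open|_?Close|Exit|New)'
                         r'|Document(?:_?Open|_Close|_?BeforeClose|Change|_New)'
                         r'|NewDocument|Workbook(?:_Open|_Activate|_Close)'
                         r'|\w+_(?:Painted|Painting|GotFocus|LostFocus|MouseHover))'
                         r'|Auto_Ope\b')

# MS-VBAL 5.4.5.1 Open Statement
RE_OPEN_WRITE = r'(?:\bOpen\b[^\n]+\b(?:Write|Append|Binary|Output|Random)\b)'
re_write = re.compile(r'(?i)\b(?:FileCopy|CopyFile|Kill|CreateTextFile|'
                      r'VirtualAlloc|RtlMoveMemory|URLDownloadToFileA?|AltStartupPath|WriteProcessMemory|'
                      r'ADODB\.Stream|WriteText|SaveToFile|SaveAs|SaveAsRTF|FileSaveAs|MkDir|RmDir|SaveSetting|SetAttr)\b|'
                      + RE_OPEN_WRITE)

# MS-VBAL 5.2.3.5 External Procedure Declaration
RE_DECLARE_LIB = r'(?:\bDeclare\b[^\n]+\bLib\b)'
re_execute = re.compile(r'(?i)\b(?:Shell|CreateObject|GetObject|SendKeys|'
                        r'MacScript|FollowHyperlink|CreateThread|ShellExecuteA?|ExecuteExcel4Macro|EXEC|REGISTER)\b|'
                        + RE_DECLARE_LIB)


def macro_flags(vba_code):
    """Return (flags, suspicious) in the A/W/X style of MacroRaptor.get_flags()."""
    autoexec = re_autoexec.search(vba_code) is not None
    write = re_write.search(vba_code) is not None
    execute = re_execute.search(vba_code) is not None
    flags = ('A' if autoexec else '-') + ('W' if write else '-') + ('X' if execute else '-')
    # same rule as MacroRaptor.scan(): auto-exec combined with write or execute
    return flags, autoexec and (write or execute)
```

In the new pattern, the trailing `\b` belongs only to the `Auto_Ope` alternative. A name such as `AutoOpenX` therefore also matches (as `AutoOpen`), whereas the old pattern ended the whole group with `\b`.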
oletools/mraptor3.py
1 #!/usr/bin/env python 1 #!/usr/bin/env python
2 -"""  
3 -mraptor.py - MacroRaptor  
4 2
5 -MacroRaptor is a script to parse OLE and OpenXML files such as MS Office  
6 -documents (e.g. Word, Excel), to detect malicious macros. 3 +# mraptor3 is a stub that redirects to mraptor.py, for backwards compatibility
7 4
8 -Supported formats:  
9 -- Word 97-2003 (.doc, .dot), Word 2007+ (.docm, .dotm)  
10 -- Excel 97-2003 (.xls), Excel 2007+ (.xlsm, .xlsb)  
11 -- PowerPoint 97-2003 (.ppt), PowerPoint 2007+ (.pptm, .ppsm)  
12 -- Word/PowerPoint 2007+ XML (aka Flat OPC)  
13 -- Word 2003 XML (.xml)  
14 -- Word/Excel Single File Web Page / MHTML (.mht)  
15 -- Publisher (.pub) 5 +import sys, os, warnings
16 6
17 -Author: Philippe Lagadec - http://www.decalage.info  
18 -License: BSD, see source code or documentation  
19 -  
20 -MacroRaptor is part of the python-oletools package:  
21 -http://www.decalage.info/python/oletools  
22 -"""  
23 -  
24 -# === LICENSE ==================================================================  
25 -  
26 -# MacroRaptor is copyright (c) 2016-2018 Philippe Lagadec (http://www.decalage.info)  
27 -# All rights reserved.  
28 -#  
29 -# Redistribution and use in source and binary forms, with or without modification,  
30 -# are permitted provided that the following conditions are met:  
31 -#  
32 -# * Redistributions of source code must retain the above copyright notice, this  
33 -# list of conditions and the following disclaimer.  
34 -# * Redistributions in binary form must reproduce the above copyright notice,  
35 -# this list of conditions and the following disclaimer in the documentation  
36 -# and/or other materials provided with the distribution.  
37 -#  
38 -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND  
39 -# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED  
40 -# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE  
41 -# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE  
42 -# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL  
43 -# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR  
44 -# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER  
45 -# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,  
46 -# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE  
47 -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.  
48 -  
49 -#------------------------------------------------------------------------------  
50 -# CHANGELOG:  
51 -# 2016-02-23 v0.01 PL: - first version  
52 -# 2016-02-29 v0.02 PL: - added Workbook_Activate, FileSaveAs  
53 -# 2016-03-04 v0.03 PL: - returns an exit code based on the overall result  
54 -# 2016-03-08 v0.04 PL: - collapse long lines before analysis  
55 -# 2016-07-19 v0.50 SL: - converted to Python 3  
56 -# 2016-08-26 PL: - changed imports for Python 3  
57 -# 2017-04-26 v0.51 PL: - fixed absolute imports (issue #141)  
58 -# 2017-06-29 PL: - synced with mraptor.py 0.51  
59 -# 2018-05-25 v0.53 PL: - added Word/PowerPoint 2007+ XML (aka Flat OPC) issue #283  
60 -  
61 -__version__ = '0.53'  
62 -  
63 -#------------------------------------------------------------------------------  
64 -# TODO:  
65 -  
66 -  
67 -#--- IMPORTS ------------------------------------------------------------------  
68 -  
69 -import sys, os, logging, optparse, re 7 +warnings.warn('mraptor3 is deprecated, mraptor should be used instead.', DeprecationWarning)
70 8
71 # IMPORTANT: it should be possible to run oletools directly as scripts 9 # IMPORTANT: it should be possible to run oletools directly as scripts
72 # in any directory without installing them with pip or setup.py. 10 # in any directory without installing them with pip or setup.py.
@@ -74,280 +12,12 @@ import sys, os, logging, optparse, re @@ -74,280 +12,12 @@ import sys, os, logging, optparse, re
74 # And to enable Python 2+3 compatibility, we need to use absolute imports, 12 # And to enable Python 2+3 compatibility, we need to use absolute imports,
75 # so we add the oletools parent folder to sys.path (absolute+normalized path): 13 # so we add the oletools parent folder to sys.path (absolute+normalized path):
76 _thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__))) 14 _thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__)))
77 -# print('_thismodule_dir = %r' % _thismodule_dir)  
78 _parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..')) 15 _parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
79 -# print('_parent_dir = %r' % _thirdparty_dir)  
80 -if not _parent_dir in sys.path: 16 +if _parent_dir not in sys.path:
81 sys.path.insert(0, _parent_dir) 17 sys.path.insert(0, _parent_dir)
82 18
83 -from oletools.thirdparty.xglob import xglob  
84 -from oletools.thirdparty.tablestream import tablestream  
85 -  
86 -# import the python 3 version of olevba  
87 -from oletools import olevba3 as olevba  
88 -from oletools.olevba3 import TYPE2TAG  
89 -  
90 -# === LOGGING =================================================================  
91 -  
92 -# a global logger object used for debugging:  
93 -log = olevba.get_logger('mraptor')  
94 -  
95 -  
96 -#--- CONSTANTS ----------------------------------------------------------------  
97 -  
98 -# URL and message to report issues:  
99 -# TODO: make it a common variable for all oletools  
100 -URL_ISSUES = 'https://github.com/decalage2/oletools/issues'  
101 -MSG_ISSUES = 'Please report this issue on %s' % URL_ISSUES  
102 -  
103 -# 'AutoExec', 'AutoOpen', 'Auto_Open', 'AutoClose', 'Auto_Close', 'AutoNew', 'AutoExit',  
104 -# 'Document_Open', 'DocumentOpen',  
105 -# 'Document_Close', 'DocumentBeforeClose', 'Document_BeforeClose',  
106 -# 'DocumentChange','Document_New',  
107 -# 'NewDocument'  
108 -# 'Workbook_Open', 'Workbook_Close',  
109 -# *_Painted such as InkPicture1_Painted  
110 -# *_GotFocus|LostFocus|MouseHover for other ActiveX objects  
111 -# reference: http://www.greyhathacker.net/?p=948  
112 -  
113 -# TODO: check if line also contains Sub or Function  
114 -re_autoexec = re.compile(r'(?i)\b(?:Auto(?:Exec|_?Open|_?Close|Exit|New)' +  
115 - r'|Document(?:_?Open|_Close|_?BeforeClose|Change|_New)' +  
116 - r'|NewDocument|Workbook(?:_Open|_Activate|_Close)' +  
117 - r'|\w+_(?:Painted|Painting|GotFocus|LostFocus|MouseHover' +  
118 - r'|Layout|Click|Change|Resize|BeforeNavigate2|BeforeScriptExecute' +  
119 - r'|DocumentComplete|DownloadBegin|DownloadComplete|FileDownload' +  
120 - r'|NavigateComplete2|NavigateError|ProgressChange|PropertyChange' +  
121 - r'|SetSecureLockIcon|StatusTextChange|TitleChange|MouseMove' +  
122 - r'|MouseEnter|MouseLeave|))\b')  
123 -  
124 -# MS-VBAL 5.4.5.1 Open Statement:  
125 -RE_OPEN_WRITE = r'(?:\bOpen\b[^\n]+\b(?:Write|Append|Binary|Output|Random)\b)'  
126 -  
127 -re_write = re.compile(r'(?i)\b(?:FileCopy|CopyFile|Kill|CreateTextFile|'  
128 - + r'VirtualAlloc|RtlMoveMemory|URLDownloadToFileA?|AltStartupPath|'  
129 - + r'ADODB\.Stream|WriteText|SaveToFile|SaveAs|SaveAsRTF|FileSaveAs|MkDir|RmDir|SaveSetting|SetAttr)\b|' + RE_OPEN_WRITE)  
130 -  
131 -# MS-VBAL 5.2.3.5 External Procedure Declaration  
132 -RE_DECLARE_LIB = r'(?:\bDeclare\b[^\n]+\bLib\b)'  
133 -  
134 -re_execute = re.compile(r'(?i)\b(?:Shell|CreateObject|GetObject|SendKeys|'  
135 - + r'MacScript|FollowHyperlink|CreateThread|ShellExecute)\b|' + RE_DECLARE_LIB)  
136 -  
137 -  
138 -# === CLASSES =================================================================  
139 -  
140 -class Result_NoMacro(object):  
141 - exit_code = 0  
142 - color = 'green'  
143 - name = 'No Macro'  
144 -  
145 -  
146 -class Result_NotMSOffice(object):  
147 - exit_code = 1  
148 - color = 'green'  
149 - name = 'Not MS Office'  
150 -  
151 -  
152 -class Result_MacroOK(object):  
153 - exit_code = 2  
154 - color = 'cyan'  
155 - name = 'Macro OK'  
156 -  
157 -  
158 -class Result_Error(object):  
159 - exit_code = 10  
160 - color = 'yellow'  
161 - name = 'ERROR'  
162 -  
163 -  
164 -class Result_Suspicious(object):  
165 - exit_code = 20  
166 - color = 'red'  
167 - name = 'SUSPICIOUS'  
168 -  
169 -  
170 -class MacroRaptor(object):  
171 - """  
172 - class to scan VBA macro code to detect if it is malicious  
173 - """  
174 - def __init__(self, vba_code):  
175 - """  
176 - MacroRaptor constructor  
177 - :param vba_code: string containing the VBA macro code  
178 - """  
179 - # collapse long lines first  
180 - self.vba_code = olevba.vba_collapse_long_lines(vba_code)  
181 - self.autoexec = False  
182 - self.write = False  
183 - self.execute = False  
184 - self.flags = ''  
185 - self.suspicious = False  
186 - self.autoexec_match = None  
187 - self.write_match = None  
188 - self.execute_match = None  
189 - self.matches = []  
190 -  
191 - def scan(self):  
192 - """  
193 - Scan the VBA macro code to detect if it is malicious  
194 - :return:  
195 - """  
196 - m = re_autoexec.search(self.vba_code)  
197 - if m is not None:  
198 - self.autoexec = True  
199 - self.autoexec_match = m.group()  
200 - self.matches.append(m.group())  
201 - m = re_write.search(self.vba_code)  
202 - if m is not None:  
203 - self.write = True  
204 - self.write_match = m.group()  
205 - self.matches.append(m.group())  
206 - m = re_execute.search(self.vba_code)  
207 - if m is not None:  
208 - self.execute = True  
209 - self.execute_match = m.group()  
210 - self.matches.append(m.group())  
211 - if self.autoexec and (self.execute or self.write):  
212 - self.suspicious = True  
213 -  
214 - def get_flags(self):  
215 - flags = ''  
216 - flags += 'A' if self.autoexec else '-'  
217 - flags += 'W' if self.write else '-'  
218 - flags += 'X' if self.execute else '-'  
219 - return flags  
220 -  
221 -  
222 -# === MAIN ====================================================================  
223 -  
224 -def main():  
225 - """  
226 - Main function, called when olevba is run from the command line  
227 - """  
228 - global log  
229 - DEFAULT_LOG_LEVEL = "warning" # Default log level  
230 - LOG_LEVELS = {  
231 - 'debug': logging.DEBUG,  
232 - 'info': logging.INFO,  
233 - 'warning': logging.WARNING,  
234 - 'error': logging.ERROR,  
235 - 'critical': logging.CRITICAL  
236 - }  
237 -  
238 - usage = 'usage: %prog [options] <filename> [filename2 ...]'  
239 - parser = optparse.OptionParser(usage=usage)  
240 - parser.add_option("-r", action="store_true", dest="recursive",  
241 - help='find files recursively in subdirectories.')  
242 - parser.add_option("-z", "--zip", dest='zip_password', type='str', default=None,  
243 - help='if the file is a zip archive, open all files from it, using the provided password (requires Python 2.6+)')  
244 - parser.add_option("-f", "--zipfname", dest='zip_fname', type='str', default='*',  
245 - help='if the file is a zip archive, file(s) to be opened within the zip. Wildcards * and ? are supported. (default:*)')  
246 - parser.add_option('-l', '--loglevel', dest="loglevel", action="store", default=DEFAULT_LOG_LEVEL,  
247 - help="logging level debug/info/warning/error/critical (default=%default)")  
248 - parser.add_option("-m", '--matches', action="store_true", dest="show_matches",  
249 - help='Show matched strings.')  
250 -  
251 - # TODO: add logfile option  
252 -  
253 - (options, args) = parser.parse_args()  
254 -  
255 - # Print help if no arguments are passed  
256 - if len(args) == 0:  
257 - print('MacroRaptor %s - http://decalage.info/python/oletools' % __version__)  
258 - print('This is work in progress, please report issues at %s' % URL_ISSUES)  
259 - print(__doc__)  
260 - parser.print_help()  
261 - print('\nAn exit code is returned based on the analysis result:')  
262 - for result in (Result_NoMacro, Result_NotMSOffice, Result_MacroOK, Result_Error, Result_Suspicious):  
263 - print(' - %d: %s' % (result.exit_code, result.name))  
264 - sys.exit()  
265 -  
266 - # print banner with version  
267 - print('MacroRaptor %s - http://decalage.info/python/oletools' % __version__)  
268 - print('This is work in progress, please report issues at %s' % URL_ISSUES)  
269 -  
270 - logging.basicConfig(level=LOG_LEVELS[options.loglevel], format='%(levelname)-8s %(message)s')  
271 - # enable logging in the modules:  
272 - log.setLevel(logging.NOTSET)  
273 -  
274 - t = tablestream.TableStream(style=tablestream.TableStyleSlim,  
275 - header_row=['Result', 'Flags', 'Type', 'File'],  
276 - column_width=[10, 5, 4, 56])  
277 -  
278 - exitcode = -1  
279 - global_result = None  
280 - # TODO: handle errors in xglob, to continue processing the next files  
281 - for container, filename, data in xglob.iter_files(args, recursive=options.recursive,  
282 - zip_password=options.zip_password, zip_fname=options.zip_fname):  
283 - # ignore directory names stored in zip files:  
284 - if container and filename.endswith('/'):  
285 - continue  
286 - full_name = '%s in %s' % (filename, container) if container else filename  
287 - # try:  
288 - # # Open the file  
289 - # if data is None:  
290 - # data = open(filename, 'rb').read()  
291 - # except:  
292 - # log.exception('Error when opening file %r' % full_name)  
293 - # continue  
294 - if isinstance(data, Exception):  
295 - result = Result_Error  
296 - t.write_row([result.name, '', '', full_name],  
297 - colors=[result.color, None, None, None])  
298 - t.write_row(['', '', '', str(data)],  
299 - colors=[None, None, None, result.color])  
300 - else:  
301 - filetype = '???'  
302 - try:  
303 - vba_parser = olevba.VBA_Parser(filename=filename, data=data, container=container)  
304 - filetype = TYPE2TAG[vba_parser.type]  
305 - except Exception as e:  
306 - # log.error('Error when parsing VBA macros from file %r' % full_name)  
307 - # TODO: distinguish actual errors from non-MSOffice files  
308 - result = Result_Error  
309 - t.write_row([result.name, '', filetype, full_name],  
310 - colors=[result.color, None, None, None])  
311 - t.write_row(['', '', '', str(e)],  
312 - colors=[None, None, None, result.color])  
313 - continue  
314 - if vba_parser.detect_vba_macros():  
315 - vba_code_all_modules = ''  
316 - try:  
317 - for (subfilename, stream_path, vba_filename, vba_code) in vba_parser.extract_all_macros():  
318 - vba_code_all_modules += vba_code.decode('utf-8','replace') + '\n'  
319 - except Exception as e:  
320 - # log.error('Error when parsing VBA macros from file %r' % full_name)  
321 - result = Result_Error  
322 - t.write_row([result.name, '', TYPE2TAG[vba_parser.type], full_name],  
323 - colors=[result.color, None, None, None])  
324 - t.write_row(['', '', '', str(e)],  
325 - colors=[None, None, None, result.color])  
326 - continue  
327 - mraptor = MacroRaptor(vba_code_all_modules)  
328 - mraptor.scan()  
329 - if mraptor.suspicious:  
330 - result = Result_Suspicious  
331 - else:  
332 - result = Result_MacroOK  
333 - t.write_row([result.name, mraptor.get_flags(), filetype, full_name],  
334 - colors=[result.color, None, None, None])  
335 - if mraptor.matches and options.show_matches:  
336 - t.write_row(['', '', '', 'Matches: %r' % mraptor.matches])  
337 - else:  
338 - result = Result_NoMacro  
339 - t.write_row([result.name, '', filetype, full_name],  
340 - colors=[result.color, None, None, None])  
341 - if result.exit_code > exitcode:  
342 - global_result = result  
343 - exitcode = result.exit_code  
344 -  
345 - print('')  
346 - print('Flags: A=AutoExec, W=Write, X=Execute')  
347 - print('Exit code: %d - %s' % (exitcode, global_result.name))  
348 - sys.exit(exitcode) 19 +from oletools.mraptor import *
  20 +from oletools.mraptor import __doc__, __version__
349 21
350 if __name__ == '__main__': 22 if __name__ == '__main__':
351 main() 23 main()
352 -  
353 -# Soundtrack: "Dark Child" by Marlon Williams  
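The code removed above now lives in `oletools.mraptor` and is brought in by the star import. Its decision rule is compact: a macro is flagged suspicious only when an auto-exec trigger (A) is combined with a write (W) or execute (X) operation, and `main()` returns the highest exit code seen across all files, starting from -1. The sketch below reproduces only that logic. The regexes `RE_AUTOEXEC`, `RE_WRITE` and `RE_EXECUTE`, the class `MiniRaptor` and the helper `worst_exit_code` are simplified, hypothetical stand-ins, not the real mraptor patterns or API.

```python
import re

# Illustrative patterns only: the real mraptor regexes cover many more keywords.
RE_AUTOEXEC = re.compile(r'\b(AutoOpen|Document_Open|Workbook_Open)\b', re.IGNORECASE)
RE_WRITE = re.compile(r'\b(Kill|Open\s+\S+\s+For\s+(Output|Binary|Append))\b', re.IGNORECASE)
RE_EXECUTE = re.compile(r'\b(Shell|CreateObject)\b', re.IGNORECASE)


class MiniRaptor(object):
    """Hypothetical, simplified version of the A/W/X flag logic."""

    def __init__(self, vba_code):
        self.vba_code = vba_code
        self.autoexec = self.write = self.execute = False
        self.suspicious = False
        self.matches = []

    def scan(self):
        for attr, regex in (('autoexec', RE_AUTOEXEC),
                            ('write', RE_WRITE),
                            ('execute', RE_EXECUTE)):
            m = regex.search(self.vba_code)
            if m is not None:
                setattr(self, attr, True)
                self.matches.append(m.group())
        # suspicious only if the code runs automatically AND writes or executes
        self.suspicious = self.autoexec and (self.write or self.execute)

    def get_flags(self):
        return (('A' if self.autoexec else '-') +
                ('W' if self.write else '-') +
                ('X' if self.execute else '-'))


def worst_exit_code(codes):
    """Highest per-file exit code, starting from -1 like the removed main()."""
    return max([-1] + list(codes))
```

Keeping the "worst result wins" aggregation means a single suspicious file in a batch determines the overall exit code.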
oletools/mraptor_milter.py
@@ -98,18 +98,7 @@ from oletools import olevba, mraptor @@ -98,18 +98,7 @@ from oletools import olevba, mraptor
98 98
99 from Milter.utils import parse_addr 99 from Milter.utils import parse_addr
100 100
101 -if sys.version_info[0] <= 2:  
102 - # Python 2.x  
103 - if sys.version_info[1] <= 6:  
104 - # Python 2.6  
105 - # use is_zipfile backported from Python 2.7:  
106 - from oletools.thirdparty.zipfile27 import is_zipfile  
107 - else:  
108 - # Python 2.7  
109 - from zipfile import is_zipfile  
110 -else:  
111 - # Python 3.x+  
112 - from zipfile import is_zipfile 101 +from zipfile import is_zipfile
113 102
114 103
115 104
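The version switch could be dropped because `zipfile.is_zipfile` is in the standard library of every supported Python version, and since Python 2.7 / 3.1 it accepts a file-like object as well as a path. So the Python 2.6 backport from `oletools.thirdparty.zipfile27` is no longer needed. A minimal sketch, assuming the attachment is already held in memory as bytes (`looks_like_zip` is a hypothetical helper name):

```python
import io
import zipfile


def looks_like_zip(data):
    """Return True if the given bytes form a valid ZIP archive."""
    # is_zipfile accepts file-like objects, so no temporary file is needed
    return zipfile.is_zipfile(io.BytesIO(data))


# build a tiny zip in memory to demonstrate
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('hello.txt', 'hi')
zip_bytes = buf.getvalue()
```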
oletools/msodde.py
@@ -11,7 +11,6 @@ Supported formats: @@ -11,7 +11,6 @@ Supported formats:
11 - RTF 11 - RTF
12 - CSV (exported from / imported into Excel) 12 - CSV (exported from / imported into Excel)
13 - XML (exported from Word 2003, Word 2007+, Excel 2003, (Excel 2007+?) 13 - XML (exported from Word 2003, Word 2007+, Excel 2003, (Excel 2007+?)
14 -- raises an error if run with files encrypted using MS Crypto API RC4  
15 14
16 Author: Philippe Lagadec - http://www.decalage.info 15 Author: Philippe Lagadec - http://www.decalage.info
17 License: BSD, see source code or documentation 16 License: BSD, see source code or documentation
@@ -22,7 +21,7 @@ http://www.decalage.info/python/oletools @@ -22,7 +21,7 @@ http://www.decalage.info/python/oletools
22 21
23 # === LICENSE ================================================================= 22 # === LICENSE =================================================================
24 23
25 -# msodde is copyright (c) 2017-2018 Philippe Lagadec (http://www.decalage.info) 24 +# msodde is copyright (c) 2017-2019 Philippe Lagadec (http://www.decalage.info)
26 # All rights reserved. 25 # All rights reserved.
27 # 26 #
28 # Redistribution and use in source and binary forms, with or without 27 # Redistribution and use in source and binary forms, with or without
@@ -52,19 +51,30 @@ from __future__ import print_function @@ -52,19 +51,30 @@ from __future__ import print_function
52 51
53 import argparse 52 import argparse
54 import os 53 import os
55 -from os.path import abspath, dirname  
56 import sys 54 import sys
57 import re 55 import re
58 import csv 56 import csv
59 57
60 import olefile 58 import olefile
61 59
  60 +# IMPORTANT: it should be possible to run oletools directly as scripts
  61 +# in any directory without installing them with pip or setup.py.
  62 +# In that case, relative imports are NOT usable.
  63 +# And to enable Python 2+3 compatibility, we need to use absolute imports,
  64 +# so we add the oletools parent folder to sys.path (absolute+normalized path):
  65 +_thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__)))
  66 +# print('_thismodule_dir = %r' % _thismodule_dir)
  67 +_parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
  68 +# print('_parent_dir = %r' % _parent_dir)
  69 +if _parent_dir not in sys.path:
  70 + sys.path.insert(0, _parent_dir)
  71 +
62 from oletools import ooxml 72 from oletools import ooxml
63 from oletools import xls_parser 73 from oletools import xls_parser
64 from oletools import rtfobj 74 from oletools import rtfobj
65 -from oletools import oleid 75 +from oletools.ppt_record_parser import is_ppt
  76 +from oletools import crypto
66 from oletools.common.log_helper import log_helper 77 from oletools.common.log_helper import log_helper
67 -from oletools.common.errors import FileIsEncryptedError  
68 78
69 # ----------------------------------------------------------------------------- 79 # -----------------------------------------------------------------------------
70 # CHANGELOG: 80 # CHANGELOG:
@@ -88,8 +98,11 @@ from oletools.common.errors import FileIsEncryptedError @@ -88,8 +98,11 @@ from oletools.common.errors import FileIsEncryptedError
88 # 2018-03-21 CH: - added detection for various CSV formulas (issue #259) 98 # 2018-03-21 CH: - added detection for various CSV formulas (issue #259)
89 # 2018-09-11 v0.54 PL: - olefile is now a dependency 99 # 2018-09-11 v0.54 PL: - olefile is now a dependency
90 # 2018-10-25 CH: - detect encryption and raise error if detected 100 # 2018-10-25 CH: - detect encryption and raise error if detected
  101 +# 2019-03-25 CH: - added decryption of password-protected files
  102 +# 2019-07-17 v0.55 CH: - fixed issue #267, unicode error on Python 2
  103 +
91 104
92 -__version__ = '0.54dev4' 105 +__version__ = '0.55.dev3'
93 106
94 # ----------------------------------------------------------------------------- 107 # -----------------------------------------------------------------------------
95 # TODO: field codes can be in headers/footers/comments - parse these 108 # TODO: field codes can be in headers/footers/comments - parse these
@@ -305,6 +318,9 @@ def process_args(cmd_line_args=None): @@ -305,6 +318,9 @@ def process_args(cmd_line_args=None):
305 default=DEFAULT_LOG_LEVEL, 318 default=DEFAULT_LOG_LEVEL,
306 help="logging level debug/info/warning/error/critical " 319 help="logging level debug/info/warning/error/critical "
307 "(default=%(default)s)") 320 "(default=%(default)s)")
  321 + parser.add_argument("-p", "--password", type=str, action='append',
  322 + help='if encrypted office files are encountered, try '
  323 + 'decryption with this password. May be repeated.')
308 filter_group = parser.add_argument_group( 324 filter_group = parser.add_argument_group(
309 title='Filter which OpenXML field commands are returned', 325 title='Filter which OpenXML field commands are returned',
310 description='Only applies to OpenXML (e.g. docx) and rtf, not to OLE ' 326 description='Only applies to OpenXML (e.g. docx) and rtf, not to OLE '
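With `action='append'` and no explicit default, argparse leaves `args.password` as `None` when `-p` is absent and collects every occurrence into a list otherwise. This is why `process_maybe_encrypted` has to accept either `None` or a list. A minimal standalone parser that reproduces only this one option (not the full msodde parser):

```python
import argparse

parser = argparse.ArgumentParser(description='sketch of the -p/--password option')
parser.add_argument('-p', '--password', type=str, action='append',
                    help='if encrypted office files are encountered, try '
                         'decryption with this password. May be repeated.')

no_pw = parser.parse_args([]).password                           # None
two_pw = parser.parse_args(['-p', 'pw1', '--password', 'pw2']).password
```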
@@ -348,14 +364,13 @@ def process_doc_field(data): @@ -348,14 +364,13 @@ def process_doc_field(data):
348 """ check if field instructions start with DDE 364 """ check if field instructions start with DDE
349 365
350 expects unicode input, returns unicode output (empty if not dde) """ 366 expects unicode input, returns unicode output (empty if not dde) """
351 - logger.debug('processing field {0}'.format(data)) 367 + logger.debug(u'processing field {0}'.format(data))
352 368
353 if data.lstrip().lower().startswith(u'dde'): 369 if data.lstrip().lower().startswith(u'dde'):
354 return data 370 return data
355 - elif data.lstrip().lower().startswith(u'\x00d\x00d\x00e\x00'): 371 + if data.lstrip().lower().startswith(u'\x00d\x00d\x00e\x00'):
356 return data 372 return data
357 - else:  
358 - return u'' 373 + return u''
359 374
360 375
361 OLE_FIELD_START = 0x13 376 OLE_FIELD_START = 0x13
@@ -379,7 +394,7 @@ def process_doc_stream(stream): @@ -379,7 +394,7 @@ def process_doc_stream(stream):
379 while True: 394 while True:
380 idx += 1 395 idx += 1
381 char = stream.read(1) # loop over every single byte 396 char = stream.read(1) # loop over every single byte
382 - if len(char) == 0: 397 + if len(char) == 0: # pylint: disable=len-as-condition
383 break 398 break
384 else: 399 else:
385 char = ord(char) 400 char = ord(char)
@@ -417,7 +432,7 @@ def process_doc_stream(stream): @@ -417,7 +432,7 @@ def process_doc_stream(stream):
417 pass 432 pass
418 elif len(field_contents) > OLE_FIELD_MAX_SIZE: 433 elif len(field_contents) > OLE_FIELD_MAX_SIZE:
419 logger.debug('field exceeds max size of {0}. Ignore rest' 434 logger.debug('field exceeds max size of {0}. Ignore rest'
420 - .format(OLE_FIELD_MAX_SIZE)) 435 + .format(OLE_FIELD_MAX_SIZE))
421 max_size_exceeded = True 436 max_size_exceeded = True
422 437
423 # appending a raw byte to a unicode string here. Not clean but 438 # appending a raw byte to a unicode string here. Not clean but
@@ -437,7 +452,7 @@ def process_doc_stream(stream): @@ -437,7 +452,7 @@ def process_doc_stream(stream):
437 logger.debug('big field was not a field after all') 452 logger.debug('big field was not a field after all')
438 453
439 logger.debug('Checked {0} characters, found {1} fields' 454 logger.debug('Checked {0} characters, found {1} fields'
440 - .format(idx, len(result_parts))) 455 + .format(idx, len(result_parts)))
441 456
442 return result_parts 457 return result_parts
443 458
@@ -462,11 +477,10 @@ def process_doc(ole): @@ -462,11 +477,10 @@ def process_doc(ole):
462 direntry = ole._load_direntry(sid) 477 direntry = ole._load_direntry(sid)
463 is_stream = direntry.entry_type == olefile.STGTY_STREAM 478 is_stream = direntry.entry_type == olefile.STGTY_STREAM
464 logger.debug('direntry {:2d} {}: {}' 479 logger.debug('direntry {:2d} {}: {}'
465 - .format(sid, '[orphan]' if is_orphan else direntry.name,  
466 - 'is stream of size {}'.format(direntry.size)  
467 - if is_stream else  
468 - 'no stream ({})'  
469 - .format(direntry.entry_type))) 480 + .format(sid, '[orphan]' if is_orphan else direntry.name,
  481 + 'is stream of size {}'.format(direntry.size)
  482 + if is_stream else
  483 + 'no stream ({})'.format(direntry.entry_type)))
470 if is_stream: 484 if is_stream:
471 new_parts = process_doc_stream( 485 new_parts = process_doc_stream(
472 ole._open(direntry.isectStart, direntry.size)) 486 ole._open(direntry.isectStart, direntry.size))
@@ -480,17 +494,23 @@ def process_xls(filepath): @@ -480,17 +494,23 @@ def process_xls(filepath):
480 """ find dde links in excel ole file """ 494 """ find dde links in excel ole file """
481 495
482 result = [] 496 result = []
483 - for stream in xls_parser.XlsFile(filepath).iter_streams():  
484 - if not isinstance(stream, xls_parser.WorkbookStream):  
485 - continue  
486 - for record in stream.iter_records():  
487 - if not isinstance(record, xls_parser.XlsRecordSupBook): 497 + xls_file = None
  498 + try:
  499 + xls_file = xls_parser.XlsFile(filepath)
  500 + for stream in xls_file.iter_streams():
  501 + if not isinstance(stream, xls_parser.WorkbookStream):
488 continue 502 continue
489 - if record.support_link_type in (  
490 - xls_parser.XlsRecordSupBook.LINK_TYPE_OLE_DDE,  
491 - xls_parser.XlsRecordSupBook.LINK_TYPE_EXTERNAL):  
492 - result.append(record.virt_path.replace(u'\u0003', u' '))  
493 - return u'\n'.join(result) 503 + for record in stream.iter_records():
  504 + if not isinstance(record, xls_parser.XlsRecordSupBook):
  505 + continue
  506 + if record.support_link_type in (
  507 + xls_parser.XlsRecordSupBook.LINK_TYPE_OLE_DDE,
  508 + xls_parser.XlsRecordSupBook.LINK_TYPE_EXTERNAL):
  509 + result.append(record.virt_path.replace(u'\u0003', u' '))
  510 + return u'\n'.join(result)
  511 + finally:
  512 + if xls_file is not None:
  513 + xls_file.close()
494 514
495 515
496 def process_docx(filepath, field_filter_mode=None): 516 def process_docx(filepath, field_filter_mode=None):
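The new `process_xls` ensures that the `XlsFile` is closed even when record parsing raises. The same guarantee can be expressed with `contextlib.closing`, which calls `close()` when the `with` block exits, whether normally or through an exception. The sketch below uses a hypothetical `FakeXlsFile` stand-in instead of `xls_parser.XlsFile` so that it runs on its own. If the constructor itself fails, there is nothing to close, which is the case the original handles by initialising `xls_file = None` before the `try`.

```python
from contextlib import closing


class FakeXlsFile(object):
    """Hypothetical stand-in for xls_parser.XlsFile: iter_streams() and close()."""

    def __init__(self, streams):
        self.streams = streams
        self.closed = False

    def iter_streams(self):
        for stream in self.streams:
            if isinstance(stream, Exception):
                raise stream  # simulate a parse error in the middle of the file
            yield stream

    def close(self):
        self.closed = True


def collect_streams(xls_file):
    # closing() calls xls_file.close() on normal exit and on exceptions,
    # equivalent to the try/finally in the new process_xls
    with closing(xls_file) as handle:
        return list(handle.iter_streams())
```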
@@ -525,7 +545,8 @@ def process_docx(filepath, field_filter_mode=None): @@ -525,7 +545,8 @@ def process_docx(filepath, field_filter_mode=None):
525 else: 545 else:
526 elem = curr_elem 546 elem = curr_elem
527 if elem is None: 547 if elem is None:
528 - raise BadOOXML(filepath, 'Got "None"-Element from iter_xml') 548 + raise ooxml.BadOOXML(filepath,
  549 + 'Got "None"-Element from iter_xml')
529 550
530 # check if FLDCHARTYPE and whether "begin" or "end" tag 551 # check if FLDCHARTYPE and whether "begin" or "end" tag
531 attrib_type = elem.attrib.get(ATTR_W_FLDCHARTYPE[0]) or \ 552 attrib_type = elem.attrib.get(ATTR_W_FLDCHARTYPE[0]) or \
@@ -535,7 +556,7 @@ def process_docx(filepath, field_filter_mode=None): @@ -535,7 +556,7 @@ def process_docx(filepath, field_filter_mode=None):
535 level += 1 556 level += 1
536 if attrib_type == "end": 557 if attrib_type == "end":
537 level -= 1 558 level -= 1
538 - if level == 0 or level == -1: # edge-case; level gets -1 559 + if level in (0, -1): # edge-case; level gets -1
539 all_fields.append(ddetext) 560 all_fields.append(ddetext)
540 ddetext = u'' 561 ddetext = u''
541 level = 0 # reset edge-case 562 level = 0 # reset edge-case
@@ -564,6 +585,7 @@ def process_docx(filepath, field_filter_mode=None): @@ -564,6 +585,7 @@ def process_docx(filepath, field_filter_mode=None):
564 585
565 586
566 def unquote(field): 587 def unquote(field):
  588 + """TODO: document what exactly is happening here..."""
567 if "QUOTE" not in field or NO_QUOTES: 589 if "QUOTE" not in field or NO_QUOTES:
568 return field 590 return field
569 # split into components 591 # split into components
@@ -605,8 +627,8 @@ def field_is_blacklisted(contents): @@ -605,8 +627,8 @@ def field_is_blacklisted(contents):
605 index = FIELD_BLACKLIST_CMDS.index(words[0].lower()) 627 index = FIELD_BLACKLIST_CMDS.index(words[0].lower())
606 except ValueError: # first word is no blacklisted command 628 except ValueError: # first word is no blacklisted command
607 return False 629 return False
608 - logger.debug('trying to match "{0}" to blacklist command {1}'  
609 - .format(contents, FIELD_BLACKLIST[index])) 630 + logger.debug(u'trying to match "{0}" to blacklist command {1}'
  631 + .format(contents, FIELD_BLACKLIST[index]))
610 _, nargs_required, nargs_optional, sw_with_arg, sw_solo, sw_format \ 632 _, nargs_required, nargs_optional, sw_with_arg, sw_solo, sw_format \
611 = FIELD_BLACKLIST[index] 633 = FIELD_BLACKLIST[index]
612 634
@@ -617,12 +639,13 @@ def field_is_blacklisted(contents): @@ -617,12 +639,13 @@ def field_is_blacklisted(contents):
617 break 639 break
618 nargs += 1 640 nargs += 1
619 if nargs < nargs_required: 641 if nargs < nargs_required:
620 - logger.debug('too few args: found {0}, but need at least {1} in "{2}"'  
621 - .format(nargs, nargs_required, contents)) 642 + logger.debug(u'too few args: found {0}, but need at least {1} in "{2}"'
  643 + .format(nargs, nargs_required, contents))
622 return False 644 return False
623 - elif nargs > nargs_required + nargs_optional:  
624 - logger.debug('too many args: found {0}, but need at most {1}+{2} in "{3}"'  
625 - .format(nargs, nargs_required, nargs_optional, contents)) 645 + if nargs > nargs_required + nargs_optional:
  646 + logger.debug(u'too many args: found {0}, but need at most {1}+{2} in '
  647 + u'"{3}"'
  648 + .format(nargs, nargs_required, nargs_optional, contents))
626 return False 649 return False
627 650
628 # check switches 651 # check switches
@@ -631,15 +654,15 @@ def field_is_blacklisted(contents): @@ -631,15 +654,15 @@ def field_is_blacklisted(contents):
631 for word in words[1+nargs:]: 654 for word in words[1+nargs:]:
632 if expect_arg: # this is an argument for the last switch 655 if expect_arg: # this is an argument for the last switch
633 if arg_choices and (word not in arg_choices): 656 if arg_choices and (word not in arg_choices):
634 - logger.debug('Found invalid switch argument "{0}" in "{1}"'  
635 - .format(word, contents)) 657 + logger.debug(u'Found invalid switch argument "{0}" in "{1}"'
  658 + .format(word, contents))
636 return False 659 return False
637 expect_arg = False 660 expect_arg = False
638 arg_choices = [] # in general, do not enforce choices 661 arg_choices = [] # in general, do not enforce choices
639 continue # "no further questions, your honor" 662 continue # "no further questions, your honor"
640 elif not FIELD_SWITCH_REGEX.match(word): 663 elif not FIELD_SWITCH_REGEX.match(word):
641 - logger.debug('expected switch, found "{0}" in "{1}"'  
642 - .format(word, contents)) 664 + logger.debug(u'expected switch, found "{0}" in "{1}"'
  665 + .format(word, contents))
643 return False 666 return False
644 # we want a switch and we got a valid one 667 # we want a switch and we got a valid one
645 switch = word[1] 668 switch = word[1]
@@ -660,8 +683,8 @@ def field_is_blacklisted(contents): @@ -660,8 +683,8 @@ def field_is_blacklisted(contents):
660 if 'numeric' in sw_format: 683 if 'numeric' in sw_format:
661 arg_choices = [] # too many choices to list them here 684 arg_choices = [] # too many choices to list them here
662 else: 685 else:
663 - logger.debug('unexpected switch {0} in "{1}"'  
664 - .format(switch, contents)) 686 + logger.debug(u'unexpected switch {0} in "{1}"'
  687 + .format(switch, contents))
665 return False 688 return False
666 689
667 # if nothing went wrong sofar, the contents seems to match the blacklist 690 # if nothing went wrong sofar, the contents seems to match the blacklist
@@ -676,7 +699,7 @@ def process_xlsx(filepath): @@ -676,7 +699,7 @@ def process_xlsx(filepath):
676 tag = elem.tag.lower() 699 tag = elem.tag.lower()
677 if tag == 'ddelink' or tag.endswith('}ddelink'): 700 if tag == 'ddelink' or tag.endswith('}ddelink'):
678 # we have found a dde link. Try to get more info about it 701 # we have found a dde link. Try to get more info about it
679 - link_info = ['DDE-Link'] 702 + link_info = []
680 if 'ddeService' in elem.attrib: 703 if 'ddeService' in elem.attrib:
681 link_info.append(elem.attrib['ddeService']) 704 link_info.append(elem.attrib['ddeService'])
682 if 'ddeTopic' in elem.attrib: 705 if 'ddeTopic' in elem.attrib:
@@ -687,16 +710,15 @@ def process_xlsx(filepath): @@ -687,16 +710,15 @@ def process_xlsx(filepath):
687 for subfile, content_type, handle in parser.iter_non_xml(): 710 for subfile, content_type, handle in parser.iter_non_xml():
688 try: 711 try:
689 logger.info('Parsing non-xml subfile {0} with content type {1}' 712 logger.info('Parsing non-xml subfile {0} with content type {1}'
690 - .format(subfile, content_type)) 713 + .format(subfile, content_type))
691 for record in xls_parser.parse_xlsb_part(handle, content_type, 714 for record in xls_parser.parse_xlsb_part(handle, content_type,
692 subfile): 715 subfile):
693 logger.debug('{0}: {1}'.format(subfile, record)) 716 logger.debug('{0}: {1}'.format(subfile, record))
694 if isinstance(record, xls_parser.XlsbBeginSupBook) and \ 717 if isinstance(record, xls_parser.XlsbBeginSupBook) and \
695 record.link_type == \ 718 record.link_type == \
696 xls_parser.XlsbBeginSupBook.LINK_TYPE_DDE: 719 xls_parser.XlsbBeginSupBook.LINK_TYPE_DDE:
697 - dde_links.append('DDE-Link ' + record.string1 + ' ' +  
698 - record.string2)  
699 - except Exception: 720 + dde_links.append(record.string1 + ' ' + record.string2)
  721 + except Exception as exc:
700 if content_type.startswith('application/vnd.ms-excel.') or \ 722 if content_type.startswith('application/vnd.ms-excel.') or \
701 content_type.startswith('application/vnd.ms-office.'): # pylint: disable=bad-indentation 723 content_type.startswith('application/vnd.ms-office.'): # pylint: disable=bad-indentation
702 # should really be able to parse these either as xml or records 724 # should really be able to parse these either as xml or records
@@ -727,7 +749,8 @@ class RtfFieldParser(rtfobj.RtfParser): @@ -727,7 +749,8 @@ class RtfFieldParser(rtfobj.RtfParser):
727 749
728 def open_destination(self, destination): 750 def open_destination(self, destination):
729 if destination.cword == b'fldinst': 751 if destination.cword == b'fldinst':
730 - logger.debug('*** Start field data at index %Xh' % destination.start) 752 + logger.debug('*** Start field data at index %Xh'
  753 + % destination.start)
731 754
732 def close_destination(self, destination): 755 def close_destination(self, destination):
733 if destination.cword == b'fldinst': 756 if destination.cword == b'fldinst':
@@ -758,7 +781,7 @@ def process_rtf(file_handle, field_filter_mode=None): @@ -758,7 +781,7 @@ def process_rtf(file_handle, field_filter_mode=None):
758 all_fields = [field.decode('ascii') for field in rtfparser.fields] 781 all_fields = [field.decode('ascii') for field in rtfparser.fields]
759 # apply field command filter 782 # apply field command filter
760 logger.debug('found {1} fields, filtering with mode "{0}"' 783 logger.debug('found {1} fields, filtering with mode "{0}"'
761 - .format(field_filter_mode, len(all_fields))) 784 + .format(field_filter_mode, len(all_fields)))
762 if field_filter_mode in (FIELD_FILTER_ALL, None): 785 if field_filter_mode in (FIELD_FILTER_ALL, None):
763 clean_fields = all_fields 786 clean_fields = all_fields
764 elif field_filter_mode == FIELD_FILTER_DDE: 787 elif field_filter_mode == FIELD_FILTER_DDE:
@@ -815,11 +838,12 @@ def process_csv(filepath): @@ -815,11 +838,12 @@ def process_csv(filepath):
815 results, _ = process_csv_dialect(file_handle, delim) 838 results, _ = process_csv_dialect(file_handle, delim)
816 except csv.Error: # e.g. sniffing fails 839 except csv.Error: # e.g. sniffing fails
817 logger.debug('failed to csv-parse with delimiter {0!r}' 840 logger.debug('failed to csv-parse with delimiter {0!r}'
818 - .format(delim)) 841 + .format(delim))
819 842
820 if is_small and not results: 843 if is_small and not results:
821 # try whole file as single cell, since sniffing fails in this case 844 # try whole file as single cell, since sniffing fails in this case
822 - logger.debug('last attempt: take whole file as single unquoted cell') 845 + logger.debug('last attempt: take whole file as single unquoted '
  846 + 'cell')
823 file_handle.seek(0) 847 file_handle.seek(0)
824 match = CSV_DDE_FORMAT.match(file_handle.read(CSV_SMALL_THRESH)) 848 match = CSV_DDE_FORMAT.match(file_handle.read(CSV_SMALL_THRESH))
825 if match: 849 if match:
@@ -836,8 +860,8 @@ def process_csv_dialect(file_handle, delimiters): @@ -836,8 +860,8 @@ def process_csv_dialect(file_handle, delimiters):
836 delimiters=delimiters) 860 delimiters=delimiters)
837 dialect.strict = False # microsoft is never strict 861 dialect.strict = False # microsoft is never strict
838 logger.debug('sniffed csv dialect with delimiter {0!r} ' 862 logger.debug('sniffed csv dialect with delimiter {0!r} '
839 - 'and quote char {1!r}'  
840 - .format(dialect.delimiter, dialect.quotechar)) 863 + 'and quote char {1!r}'
  864 + .format(dialect.delimiter, dialect.quotechar))
841 865
842 # rewind file handle to start 866 # rewind file handle to start
843 file_handle.seek(0) 867 file_handle.seek(0)
@@ -877,7 +901,7 @@ def process_excel_xml(filepath): @@ -877,7 +901,7 @@ def process_excel_xml(filepath):
877 break 901 break
878 if formula is None: 902 if formula is None:
879 continue 903 continue
880 - logger.debug('found cell with formula {0}'.format(formula)) 904 + logger.debug(u'found cell with formula {0}'.format(formula))
881 match = re.match(XML_DDE_FORMAT, formula) 905 match = re.match(XML_DDE_FORMAT, formula)
882 if match: 906 if match:
883 dde_links.append(u' '.join(match.groups()[:2])) 907 dde_links.append(u' '.join(match.groups()[:2]))
@@ -891,19 +915,11 @@ def process_file(filepath, field_filter_mode=None): @@ -891,19 +915,11 @@ def process_file(filepath, field_filter_mode=None):
891 if xls_parser.is_xls(filepath): 915 if xls_parser.is_xls(filepath):
892 logger.debug('Process file as excel 2003 (xls)') 916 logger.debug('Process file as excel 2003 (xls)')
893 return process_xls(filepath) 917 return process_xls(filepath)
894 -  
895 - # encrypted files also look like ole, even if office 2007+ (xml-based)  
896 - # so check for encryption, first  
897 - ole = olefile.OleFileIO(filepath, path_encoding=None)  
898 - oid = oleid.OleID(ole)  
899 - if oid.check_encrypted().value:  
900 - log.debug('is encrypted - raise error')  
901 - raise FileIsEncryptedError(filepath)  
902 - elif oid.check_powerpoint().value:  
903 - log.debug('is ppt - cannot have DDE') 918 + if is_ppt(filepath):
  919 + logger.debug('is ppt - cannot have DDE')
904 return u'' 920 return u''
905 - else:  
906 - logger.debug('Process file as word 2003 (doc)') 921 + logger.debug('Process file as word 2003 (doc)')
  922 + with olefile.OleFileIO(filepath, path_encoding=None) as ole:
907 return process_doc(ole) 923 return process_doc(ole)
908 924
909 with open(filepath, 'rb') as file_handle: 925 with open(filepath, 'rb') as file_handle:
@@ -921,22 +937,77 @@ def process_file(filepath, field_filter_mode=None): @@ -921,22 +937,77 @@ def process_file(filepath, field_filter_mode=None):
921 if doctype == ooxml.DOCTYPE_EXCEL: 937 if doctype == ooxml.DOCTYPE_EXCEL:
922 logger.debug('Process file as excel 2007+ (xlsx)') 938 logger.debug('Process file as excel 2007+ (xlsx)')
923 return process_xlsx(filepath) 939 return process_xlsx(filepath)
924 - elif doctype in (ooxml.DOCTYPE_EXCEL_XML, ooxml.DOCTYPE_EXCEL_XML2003): 940 + if doctype in (ooxml.DOCTYPE_EXCEL_XML, ooxml.DOCTYPE_EXCEL_XML2003):
925 logger.debug('Process file as xml from excel 2003/2007+') 941 logger.debug('Process file as xml from excel 2003/2007+')
926 return process_excel_xml(filepath) 942 return process_excel_xml(filepath)
927 - elif doctype in (ooxml.DOCTYPE_WORD_XML, ooxml.DOCTYPE_WORD_XML2003): 943 + if doctype in (ooxml.DOCTYPE_WORD_XML, ooxml.DOCTYPE_WORD_XML2003):
928 logger.debug('Process file as xml from word 2003/2007+') 944 logger.debug('Process file as xml from word 2003/2007+')
929 return process_docx(filepath) 945 return process_docx(filepath)
930 - elif doctype is None: 946 + if doctype is None:
931 logger.debug('Process file as csv') 947 logger.debug('Process file as csv')
932 return process_csv(filepath) 948 return process_csv(filepath)
933 - else: # could be docx; if not: this is the old default code path  
934 - logger.debug('Process file as word 2007+ (docx)')  
935 - return process_docx(filepath, field_filter_mode) 949 + # could be docx; if not: this is the old default code path
  950 + logger.debug('Process file as word 2007+ (docx)')
  951 + return process_docx(filepath, field_filter_mode)
936 952
937 953
938 # === MAIN ================================================================= 954 # === MAIN =================================================================
939 955
  956 +
  957 +def process_maybe_encrypted(filepath, passwords=None, crypto_nesting=0,
  958 + **kwargs):
  959 + """
  960 + Process a file that might be encrypted.
  961 +
  962 + Calls :py:func:`process_file` and if that fails tries to decrypt and
  963 + process the result. Based on recommendation in module doc string of
  964 + :py:mod:`oletools.crypto`.
  965 +
  966 + :param str filepath: path to file on disc.
  967 + :param passwords: list of passwords (str) to try for decryption or None
  968 + :param int crypto_nesting: How many decryption layers were already used to
  969 + get the given file.
  970 + :param kwargs: same as :py:func:`process_file`
  971 + :returns: same as :py:func:`process_file`
  972 + """
  973 + result = u''
  974 + try:
  975 + result = process_file(filepath, **kwargs)
  976 + if not crypto.is_encrypted(filepath):
  977 + return result
  978 + except Exception:
  979 + logger.debug('Ignoring exception:', exc_info=True)
  980 + if not crypto.is_encrypted(filepath):
  981 + raise
  982 +
  983 + # we reach this point only if file is encrypted
  984 + # check if this is an encrypted file in an encrypted file in an ...
  985 + if crypto_nesting >= crypto.MAX_NESTING_DEPTH:
  986 + raise crypto.MaxCryptoNestingReached(crypto_nesting, filepath)
  987 +
  988 + decrypted_file = None
  989 + if passwords is None:
  990 + passwords = crypto.DEFAULT_PASSWORDS
  991 + else:
  992 + passwords = list(passwords) + crypto.DEFAULT_PASSWORDS
  993 + try:
  994 + logger.debug('Trying to decrypt file')
  995 + decrypted_file = crypto.decrypt(filepath, passwords)
  996 + if not decrypted_file:
  997 + logger.error('Decrypt failed, run with debug output to get details')
  998 + raise crypto.WrongEncryptionPassword(filepath)
  999 + logger.info('Analyze decrypted file')
  1000 + result = process_maybe_encrypted(decrypted_file, passwords,
  1001 + crypto_nesting+1, **kwargs)
  1002 + finally: # clean up
  1003 + try: # (maybe file was not yet created)
  1004 + os.unlink(decrypted_file)
  1005 + except Exception:
  1006 + logger.debug('Ignoring exception closing decrypted file:',
  1007 + exc_info=True)
  1008 + return result
  1009 +
  1010 +
940 def main(cmd_line_args=None): 1011 def main(cmd_line_args=None):
941 """ Main function, called if this file is called as a script 1012 """ Main function, called if this file is called as a script
942 1013
@@ -961,13 +1032,16 @@ def main(cmd_line_args=None): @@ -961,13 +1032,16 @@ def main(cmd_line_args=None):
961 text = '' 1032 text = ''
962 return_code = 1 1033 return_code = 1
963 try: 1034 try:
964 - text = process_file(args.filepath, args.field_filter_mode) 1035 + text = process_maybe_encrypted(
  1036 + args.filepath, args.password,
  1037 + field_filter_mode=args.field_filter_mode)
965 return_code = 0 1038 return_code = 0
966 except Exception as exc: 1039 except Exception as exc:
967 - logger.exception(exc.message) 1040 + logger.exception(str(exc))
968 1041
969 logger.print_str('DDE Links:') 1042 logger.print_str('DDE Links:')
970 - logger.print_str(text) 1043 + for link in text.splitlines():
  1044 + logger.print_str(link, type='dde-link')
971 1045
972 log_helper.end_logging() 1046 log_helper.end_logging()
973 1047
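The decryption loop in `process_maybe_encrypted` can be sketched without oletools. In the minimal sketch below, `process`, `is_encrypted` and `decrypt` are hypothetical callables standing in for `process_file`, `crypto.is_encrypted` and `crypto.decrypt`. `decrypt` is assumed to return the path of a temporary decrypted file, or a false value when no password works. `MAX_NESTING` and `DEFAULT_PASSWORDS` are illustrative placeholders, not the values defined in `oletools.crypto`, and built-in exceptions replace `MaxCryptoNestingReached` and `WrongEncryptionPassword`.

```python
import os

# Illustrative placeholders only; oletools.crypto defines its own values.
MAX_NESTING = 10
DEFAULT_PASSWORDS = ['default-password']


def process_maybe_encrypted(path, process, is_encrypted, decrypt,
                            passwords=None, nesting=0):
    """Process path; if it is encrypted, decrypt it and recurse."""
    result = None
    try:
        result = process(path)
        if not is_encrypted(path):
            return result
    except Exception:
        # parsing an encrypted file usually fails: only re-raise if the
        # failure cannot be explained by encryption
        if not is_encrypted(path):
            raise
    # an encrypted file inside an encrypted file inside ...: stop somewhere
    if nesting >= MAX_NESTING:
        raise RuntimeError('too many nested encryption layers: %d' % nesting)
    # caller-supplied passwords are tried before the defaults
    candidates = list(passwords or []) + DEFAULT_PASSWORDS
    decrypted = None
    try:
        decrypted = decrypt(path, candidates)
        if not decrypted:
            raise ValueError('no password worked for %s' % path)
        return process_maybe_encrypted(decrypted, process, is_encrypted,
                                       decrypt, passwords, nesting + 1)
    finally:
        # remove the temporary decrypted file, if one was created
        if decrypted and os.path.exists(decrypted):
            os.unlink(decrypted)
```

One deliberate difference from the code in this diff: the recursive call receives the caller's original password list, so the default passwords are not appended again at every nesting level.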
oletools/olebrowse.py
@@ -12,7 +12,7 @@ olebrowse project website: http://www.decalage.info/python/olebrowse @@ -12,7 +12,7 @@ olebrowse project website: http://www.decalage.info/python/olebrowse
12 olebrowse is part of the python-oletools package: 12 olebrowse is part of the python-oletools package:
13 http://www.decalage.info/python/oletools 13 http://www.decalage.info/python/oletools
14 14
15 -olebrowse is copyright (c) 2012-2017, Philippe Lagadec (http://www.decalage.info) 15 +olebrowse is copyright (c) 2012-2019, Philippe Lagadec (http://www.decalage.info)
16 All rights reserved. 16 All rights reserved.
17 17
18 Redistribution and use in source and binary forms, with or without modification, 18 Redistribution and use in source and binary forms, with or without modification,
@@ -43,7 +43,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. @@ -43,7 +43,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
43 # 2017-04-26 v0.51 PL: - fixed absolute imports (issue #141) 43 # 2017-04-26 v0.51 PL: - fixed absolute imports (issue #141)
44 # 2018-09-11 v0.54 PL: - olefile is now a dependency 44 # 2018-09-11 v0.54 PL: - olefile is now a dependency
45 45
46 -__version__ = '0.54dev1' 46 +__version__ = '0.54'
47 47
48 #------------------------------------------------------------------------------ 48 #------------------------------------------------------------------------------
49 # TODO: 49 # TODO:
oletools/oledir.py
@@ -14,7 +14,7 @@ http://www.decalage.info/python/oletools @@ -14,7 +14,7 @@ http://www.decalage.info/python/oletools
14 14
15 #=== LICENSE ================================================================== 15 #=== LICENSE ==================================================================
16 16
17 -# oledir is copyright (c) 2015-2018 Philippe Lagadec (http://www.decalage.info) 17 +# oledir is copyright (c) 2015-2019 Philippe Lagadec (http://www.decalage.info)
18 # All rights reserved. 18 # All rights reserved.
19 # 19 #
20 # Redistribution and use in source and binary forms, with or without modification, 20 # Redistribution and use in source and binary forms, with or without modification,
@@ -53,7 +53,7 @@ from __future__ import print_function @@ -53,7 +53,7 @@ from __future__ import print_function
53 # 2018-08-28 v0.54 PL: - olefile is now a dependency 53 # 2018-08-28 v0.54 PL: - olefile is now a dependency
54 # 2018-10-06 - colorclass is now a dependency 54 # 2018-10-06 - colorclass is now a dependency
55 55
56 -__version__ = '0.54dev1' 56 +__version__ = '0.54'
57 57
58 #------------------------------------------------------------------------------ 58 #------------------------------------------------------------------------------
59 # TODO: 59 # TODO:
oletools/oleform.py
1 #!/usr/bin/env python 1 #!/usr/bin/env python
2 2
  3 +# REFERENCES:
  4 +# - MS-OFORMS: https://msdn.microsoft.com/en-us/library/office/cc313125%28v=office.12%29.aspx?f=255&MSPPError=-2147217396
  5 +
3 # CHANGELOG: 6 # CHANGELOG:
4 # 2018-02-19 v0.53 PL: - fixed issue #260, removed long integer literals 7 # 2018-02-19 v0.53 PL: - fixed issue #260, removed long integer literals
5 8
oletools/oleid.py
@@ -17,7 +17,7 @@ http://www.decalage.info/python/oletools @@ -17,7 +17,7 @@ http://www.decalage.info/python/oletools
17 17
18 #=== LICENSE ================================================================= 18 #=== LICENSE =================================================================
19 19
20 -# oleid is copyright (c) 2012-2018, Philippe Lagadec (http://www.decalage.info) 20 +# oleid is copyright (c) 2012-2019, Philippe Lagadec (http://www.decalage.info)
21 # All rights reserved. 21 # All rights reserved.
22 # 22 #
23 # Redistribution and use in source and binary forms, with or without 23 # Redistribution and use in source and binary forms, with or without
@@ -59,7 +59,7 @@ from __future__ import print_function @@ -59,7 +59,7 @@ from __future__ import print_function
59 # 2018-10-19 CH: - accept olefile as well as filename, return Indicators, 59 # 2018-10-19 CH: - accept olefile as well as filename, return Indicators,
60 # improve encryption detection for ppt 60 # improve encryption detection for ppt
61 61
62 -__version__ = '0.54dev4' 62 +__version__ = '0.54'
63 63
64 64
65 #------------------------------------------------------------------------------ 65 #------------------------------------------------------------------------------
@@ -80,22 +80,26 @@ __version__ = '0.54dev4' @@ -80,22 +80,26 @@ __version__ = '0.54dev4'
80 80
81 #=== IMPORTS ================================================================= 81 #=== IMPORTS =================================================================
82 82
83 -import argparse, sys, re, zlib, struct 83 +import argparse, sys, re, zlib, struct, os
84 from os.path import dirname, abspath 84 from os.path import dirname, abspath
85 85
86 -# little hack to allow absolute imports even if oletools is not installed  
87 -# (required to run oletools directly as scripts in any directory).  
88 -try:  
89 - from oletools.thirdparty import prettytable  
90 -except ImportError:  
91 - PARENT_DIR = dirname(dirname(abspath(__file__)))  
92 - if PARENT_DIR not in sys.path:  
93 - sys.path.insert(0, PARENT_DIR)  
94 - del PARENT_DIR  
95 - from oletools.thirdparty import prettytable  
96 -  
97 import olefile 86 import olefile
98 87
  88 +# IMPORTANT: it should be possible to run oletools directly as scripts
  89 +# in any directory without installing them with pip or setup.py.
  90 +# In that case, relative imports are NOT usable.
  91 +# And to enable Python 2+3 compatibility, we need to use absolute imports,
  92 +# so we add the oletools parent folder to sys.path (absolute+normalized path):
  93 +_thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__)))
  94 +# print('_thismodule_dir = %r' % _thismodule_dir)
  95 +_parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
  96 +# print('_parent_dir = %r' % _parent_dir)
  97 +if _parent_dir not in sys.path:
  98 + sys.path.insert(0, _parent_dir)
  99 +
  100 +from oletools.thirdparty.prettytable import prettytable
  101 +from oletools import crypto
  102 +
99 103
100 104
101 #=== FUNCTIONS =============================================================== 105 #=== FUNCTIONS ===============================================================
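The import preamble above runs at module import time. Wrapped in a hypothetical function `add_parent_dir_to_path` purely for illustration, it amounts to the following sketch. Putting the parent of the `oletools` package directory first on `sys.path` lets absolute imports such as `from oletools import crypto` resolve to the local checkout even when oletools is not installed.

```python
import os
import sys


def add_parent_dir_to_path(module_file):
    """Insert the parent of the directory containing module_file at the
    front of sys.path (absolute + normalized), unless already present."""
    this_dir = os.path.normpath(os.path.abspath(os.path.dirname(module_file)))
    parent_dir = os.path.normpath(os.path.join(this_dir, '..'))
    # the membership test avoids growing sys.path on repeated imports
    if parent_dir not in sys.path:
        sys.path.insert(0, parent_dir)
    return parent_dir
```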
@@ -279,20 +283,7 @@ class OleID(object): @@ -279,20 +283,7 @@ class OleID(object):
279 self.indicators.append(encrypted) 283 self.indicators.append(encrypted)
280 if not self.ole: 284 if not self.ole:
281 return None 285 return None
282 - # check if bit 1 of security field = 1:  
283 - # (this field may be missing for Powerpoint2000, for example)  
284 - if self.suminfo_data is None:  
285 - self.check_properties()  
286 - if 0x13 in self.suminfo_data:  
287 - if self.suminfo_data[0x13] & 1:  
288 - encrypted.value = True  
289 - # check if this is an OpenXML encrypted file  
290 - elif self.ole.exists('EncryptionInfo'):  
291 - encrypted.value = True  
292 - # or an encrypted ppt file  
293 - if self.ole.exists('EncryptedSummary') and \  
294 - not self.ole.exists('SummaryInformation'):  
295 - encrypted.value = True 286 + encrypted.value = crypto.is_encrypted(self.ole)
296 return encrypted 287 return encrypted
297 288
298 def check_word(self): 289 def check_word(self):
@@ -316,27 +307,7 @@ class OleID(object): @@ -316,27 +307,7 @@ class OleID(object):
316 return None, None 307 return None, None
317 if self.ole.exists('WordDocument'): 308 if self.ole.exists('WordDocument'):
318 word.value = True 309 word.value = True
319 - # check for Word-specific encryption flag:  
320 - stream = None  
321 - try:  
322 - stream = self.ole.openstream(["WordDocument"])  
323 - # pass header 10 bytes  
324 - stream.read(10)  
325 - # read flag structure:  
326 - temp16 = struct.unpack("H", stream.read(2))[0]  
327 - f_encrypted = (temp16 & 0x0100) >> 8  
328 - if f_encrypted:  
329 - # correct encrypted indicator if present or add one  
330 - encrypt_ind = self.get_indicator('encrypted')  
331 - if encrypt_ind:  
332 - encrypt_ind.value = True  
333 - else:  
334 - self.indicators.append('encrypted', True, name='Encrypted')  
335 - except Exception:  
336 - raise  
337 - finally:  
338 - if stream is not None:  
339 - stream.close() 310 +
340 # check for VBA macros: 311 # check for VBA macros:
341 if self.ole.exists('Macros'): 312 if self.ole.exists('Macros'):
342 macros.value = True 313 macros.value = True
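The Word-specific check removed here, which `crypto.is_encrypted` is presumably meant to cover now, reads the fEncrypted bit (0x0100) from the 16-bit flag word at offset 10 of the FibBase at the start of the `WordDocument` stream. A minimal standalone sketch, assuming the caller has already read the first bytes of that stream (for example via olefile's `openstream`). The function name is hypothetical. Unlike the removed code, it uses the explicit little-endian format `'<H'` instead of native byte order.

```python
import struct

# FibBase starts with wIdent, nFib, unused, lid, pnNext (5 x 2 bytes)
FIB_FLAGS_OFFSET = 10
# fEncrypted bit of the flag word that follows
F_ENCRYPTED = 0x0100


def word_fib_is_encrypted(word_document_start):
    """Return True if the fEncrypted flag is set in the first bytes of a
    WordDocument stream."""
    if len(word_document_start) < FIB_FLAGS_OFFSET + 2:
        raise ValueError('data too short to contain the FibBase flags')
    # '<H': the FIB is little-endian whatever the host byte order
    (flags,) = struct.unpack_from('<H', word_document_start, FIB_FLAGS_OFFSET)
    return bool(flags & F_ENCRYPTED)
```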
oletools/olemap.py
@@ -13,7 +13,7 @@ http://www.decalage.info/python/oletools @@ -13,7 +13,7 @@ http://www.decalage.info/python/oletools
13 13
14 #=== LICENSE ================================================================== 14 #=== LICENSE ==================================================================
15 15
16 -# olemap is copyright (c) 2015-2018 Philippe Lagadec (http://www.decalage.info) 16 +# olemap is copyright (c) 2015-2019 Philippe Lagadec (http://www.decalage.info)
17 # All rights reserved. 17 # All rights reserved.
18 # 18 #
19 # Redistribution and use in source and binary forms, with or without modification, 19 # Redistribution and use in source and binary forms, with or without modification,
@@ -52,8 +52,9 @@ http://www.decalage.info/python/oletools @@ -52,8 +52,9 @@ http://www.decalage.info/python/oletools
52 # 2017-03-23 PL: - only display the header by default 52 # 2017-03-23 PL: - only display the header by default
53 # - added option --exdata to display extra data in hex 53 # - added option --exdata to display extra data in hex
54 # 2018-08-28 v0.54 PL: - olefile is now a dependency 54 # 2018-08-28 v0.54 PL: - olefile is now a dependency
  55 +# 2019-07-10 v0.55 PL: - fixed display of OLE header CLSID (issue #394)
55 56
56 -__version__ = '0.54dev1' 57 +__version__ = '0.55.dev3'
57 58
58 #------------------------------------------------------------------------------ 59 #------------------------------------------------------------------------------
59 # TODO: 60 # TODO:
@@ -121,7 +122,7 @@ def show_header(ole, extra_data=False): @@ -121,7 +122,7 @@ def show_header(ole, extra_data=False):
121 print("OLE HEADER:") 122 print("OLE HEADER:")
122 t = tablestream.TableStream([24, 16, 79-(4+24+16)], header_row=['Attribute', 'Value', 'Description']) 123 t = tablestream.TableStream([24, 16, 79-(4+24+16)], header_row=['Attribute', 'Value', 'Description'])
123 t.write_row(['OLE Signature (hex)', binascii.b2a_hex(ole.header_signature).upper(), 'Should be D0CF11E0A1B11AE1']) 124 t.write_row(['OLE Signature (hex)', binascii.b2a_hex(ole.header_signature).upper(), 'Should be D0CF11E0A1B11AE1'])
124 - t.write_row(['Header CLSID (hex)', binascii.b2a_hex(ole.header_clsid).upper(), 'Should be 0']) 125 + t.write_row(['Header CLSID', ole.header_clsid, 'Should be empty (0)'])
125 t.write_row(['Minor Version', '%04X' % ole.minor_version, 'Should be 003E']) 126 t.write_row(['Minor Version', '%04X' % ole.minor_version, 'Should be 003E'])
126 t.write_row(['Major Version', '%04X' % ole.dll_version, 'Should be 3 or 4']) 127 t.write_row(['Major Version', '%04X' % ole.dll_version, 'Should be 3 or 4'])
127 t.write_row(['Byte Order', '%04X' % ole.byte_order, 'Should be FFFE (little endian)']) 128 t.write_row(['Byte Order', '%04X' % ole.byte_order, 'Should be FFFE (little endian)'])
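The new row prints `ole.header_clsid` as is. This suggests that olefile already exposes the header CLSID as a formatted string, empty when the CLSID is null (hence "Should be empty (0)"), so hex-encoding it again was wrong (issue #394). The sketch below shows such a conversion with the standard `uuid` module. `clsid_to_str` is a hypothetical helper, not olefile's API; it assumes the 16 bytes use the little-endian GUID layout found in OLE files.

```python
import uuid


def clsid_to_str(raw):
    """Format a 16-byte little-endian CLSID as an upper-case GUID string;
    an all-zero CLSID gives an empty string."""
    if len(raw) != 16:
        raise ValueError('a CLSID is 16 bytes long')
    if raw == b'\x00' * 16:
        return ''
    return str(uuid.UUID(bytes_le=raw)).upper()
```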
oletools/olemeta.py
@@ -15,7 +15,7 @@ http://www.decalage.info/python/oletools @@ -15,7 +15,7 @@ http://www.decalage.info/python/oletools
15 15
16 #=== LICENSE ================================================================= 16 #=== LICENSE =================================================================
17 17
18 -# olemeta is copyright (c) 2013-2018, Philippe Lagadec (http://www.decalage.info) 18 +# olemeta is copyright (c) 2013-2019, Philippe Lagadec (http://www.decalage.info)
19 # All rights reserved. 19 # All rights reserved.
20 # 20 #
21 # Redistribution and use in source and binary forms, with or without modification, 21 # Redistribution and use in source and binary forms, with or without modification,
@@ -51,7 +51,7 @@ http://www.decalage.info/python/oletools @@ -51,7 +51,7 @@ http://www.decalage.info/python/oletools
51 # 2017-05-04 PL: - added optparse and xglob (issue #141) 51 # 2017-05-04 PL: - added optparse and xglob (issue #141)
52 # 2018-09-11 v0.54 PL: - olefile is now a dependency 52 # 2018-09-11 v0.54 PL: - olefile is now a dependency
53 53
54 -__version__ = '0.54dev1' 54 +__version__ = '0.54'
55 55
56 #------------------------------------------------------------------------------ 56 #------------------------------------------------------------------------------
57 # TODO: 57 # TODO:
oletools/oleobj.py
@@ -14,7 +14,7 @@ http://www.decalage.info/python/oletools @@ -14,7 +14,7 @@ http://www.decalage.info/python/oletools
14 14
15 # === LICENSE ================================================================= 15 # === LICENSE =================================================================
16 16
17 -# oleobj is copyright (c) 2015-2018 Philippe Lagadec (http://www.decalage.info) 17 +# oleobj is copyright (c) 2015-2019 Philippe Lagadec (http://www.decalage.info)
18 # All rights reserved. 18 # All rights reserved.
19 # 19 #
20 # Redistribution and use in source and binary forms, with or without 20 # Redistribution and use in source and binary forms, with or without
@@ -89,7 +89,7 @@ from oletools.ooxml import XmlParser @@ -89,7 +89,7 @@ from oletools.ooxml import XmlParser
89 # 2018-09-11 v0.54 PL: - olefile is now a dependency 89 # 2018-09-11 v0.54 PL: - olefile is now a dependency
90 # 2018-10-30 SA: - added detection of external links (PR #317) 90 # 2018-10-30 SA: - added detection of external links (PR #317)
91 91
92 -__version__ = '0.54dev4' 92 +__version__ = '0.54'
93 93
94 # ----------------------------------------------------------------------------- 94 # -----------------------------------------------------------------------------
95 # TODO: 95 # TODO:
@@ -526,29 +526,35 @@ def find_ole_in_ppt(filename): @@ -526,29 +526,35 @@ def find_ole_in_ppt(filename):
526 can contain the actual embedded file we are looking for (caller will check 526 can contain the actual embedded file we are looking for (caller will check
527 for these). 527 for these).
528 """ 528 """
529 - for stream in PptFile(filename).iter_streams():  
530 - for record_idx, record in enumerate(stream.iter_records()):  
531 - if isinstance(record, PptRecordExOleVbaActiveXAtom):  
532 - ole = None  
533 - try:  
534 - data_start = next(record.iter_uncompressed())  
535 - if data_start[:len(olefile.MAGIC)] != olefile.MAGIC:  
536 - continue # could be an ActiveX control or VBA Storage  
537 -  
538 - # otherwise, this should be an OLE object  
539 - log.debug('Found record with embedded ole object in ppt '  
540 - '(stream "{0}", record no {1})'  
541 - .format(stream.name, record_idx))  
542 - ole = record.get_data_as_olefile()  
543 - yield ole  
544 - except IOError:  
545 - log.warning('Error reading data from {0} stream or '  
546 - 'interpreting it as OLE object'  
547 - .format(stream.name))  
548 - log.debug('', exc_info=True)  
549 - finally:  
550 - if ole is not None:  
551 - ole.close() 529 + ppt_file = None
  530 + try:
  531 + ppt_file = PptFile(filename)
  532 + for stream in ppt_file.iter_streams():
  533 + for record_idx, record in enumerate(stream.iter_records()):
  534 + if isinstance(record, PptRecordExOleVbaActiveXAtom):
  535 + ole = None
  536 + try:
  537 + data_start = next(record.iter_uncompressed())
  538 + if data_start[:len(olefile.MAGIC)] != olefile.MAGIC:
  539 + continue # could be ActiveX control / VBA Storage
  540 +
  541 + # otherwise, this should be an OLE object
  542 + log.debug('Found record with embedded ole object in '
  543 + 'ppt (stream "{0}", record no {1})'
  544 + .format(stream.name, record_idx))
  545 + ole = record.get_data_as_olefile()
  546 + yield ole
  547 + except IOError:
  548 + log.warning('Error reading data from {0} stream or '
  549 + 'interpreting it as OLE object'
  550 + .format(stream.name))
  551 + log.debug('', exc_info=True)
  552 + finally:
  553 + if ole is not None:
  554 + ole.close()
  555 + finally:
  556 + if ppt_file is not None:
  557 + ppt_file.close()
552 558
553 559
554 class FakeFile(io.RawIOBase): 560 class FakeFile(io.RawIOBase):
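The new `try`/`finally` around `PptFile` ensures the file is closed even if the caller stops iterating early. Closing a generator (explicitly, or when it is garbage-collected) raises `GeneratorExit` at the suspended `yield`, which runs the `finally` clause. A self-contained sketch follows. `Container` and `find_ole_blobs` are hypothetical stand-ins for `PptFile` and `find_ole_in_ppt`, and `OLE_MAGIC` is the OLE signature D0CF11E0A1B11AE1 that olemap checks.

```python
OLE_MAGIC = b'\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1'


class Container(object):
    """Hypothetical stand-in for PptFile: holds raw blobs, must be closed."""

    def __init__(self, blobs):
        self.blobs = blobs
        self.closed = False

    def iter_blobs(self):
        return iter(self.blobs)

    def close(self):
        self.closed = True


def find_ole_blobs(open_container):
    """Yield blobs that start with the OLE magic; always close the container."""
    container = None
    try:
        # open inside the try, like `ppt_file = PptFile(filename)`
        container = open_container()
        for blob in container.iter_blobs():
            if blob[:len(OLE_MAGIC)] != OLE_MAGIC:
                continue  # e.g. an ActiveX control or VBA storage
            yield blob
    finally:
        # runs when the loop ends, on error, or when the generator is closed
        if container is not None:
            container.close()
```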
@@ -750,13 +756,13 @@ def process_file(filename, data, output_dir=None): @@ -750,13 +756,13 @@ def process_file(filename, data, output_dir=None):
750 756
751 xml_parser = None 757 xml_parser = None
752 if is_zipfile(filename): 758 if is_zipfile(filename):
753 - log.info('file is a OOXML file, looking for relationships with external links') 759 + log.info('file could be an OOXML file, looking for relationships with '
  760 + 'external links')
754 xml_parser = XmlParser(filename) 761 xml_parser = XmlParser(filename)
755 for relationship, target in find_external_relationships(xml_parser): 762 for relationship, target in find_external_relationships(xml_parser):
756 did_dump = True 763 did_dump = True
757 print("Found relationship '%s' with external link %s" % (relationship, target)) 764 print("Found relationship '%s' with external link %s" % (relationship, target))
758 765
759 -  
760 # look for ole files inside file (e.g. unzip docx) 766 # look for ole files inside file (e.g. unzip docx)
761 # have to finish work on every ole stream inside iteration, since handles 767 # have to finish work on every ole stream inside iteration, since handles
762 # are closed in find_ole 768 # are closed in find_ole
@@ -765,9 +771,9 @@ def process_file(filename, data, output_dir=None): @@ -765,9 +771,9 @@ def process_file(filename, data, output_dir=None):
765 continue 771 continue
766 772
767 for path_parts in ole.listdir(): 773 for path_parts in ole.listdir():
  774 + stream_path = '/'.join(path_parts)
  775 + log.debug('Checking stream %r', stream_path)
768 if path_parts[-1] == '\x01Ole10Native': 776 if path_parts[-1] == '\x01Ole10Native':
769 - stream_path = '/'.join(path_parts)  
770 - log.debug('Checking stream %r', stream_path)  
771 stream = None 777 stream = None
772 try: 778 try:
773 stream = ole.openstream(path_parts) 779 stream = ole.openstream(path_parts)
oletools/oletimes.py
@@ -16,7 +16,7 @@ http://www.decalage.info/python/oletools @@ -16,7 +16,7 @@ http://www.decalage.info/python/oletools
16 16
17 #=== LICENSE ================================================================= 17 #=== LICENSE =================================================================
18 18
19 -# oletimes is copyright (c) 2013-2017, Philippe Lagadec (http://www.decalage.info) 19 +# oletimes is copyright (c) 2013-2019, Philippe Lagadec (http://www.decalage.info)
20 # All rights reserved. 20 # All rights reserved.
21 # 21 #
22 # Redistribution and use in source and binary forms, with or without modification, 22 # Redistribution and use in source and binary forms, with or without modification,
@@ -52,7 +52,7 @@ http://www.decalage.info/python/oletools @@ -52,7 +52,7 @@ http://www.decalage.info/python/oletools
52 # 2017-05-04 PL: - added optparse and xglob (issue #141) 52 # 2017-05-04 PL: - added optparse and xglob (issue #141)
53 # 2018-09-11 v0.54 PL: - olefile is now a dependency 53 # 2018-09-11 v0.54 PL: - olefile is now a dependency
54 54
55 -__version__ = '0.54dev1' 55 +__version__ = '0.54'
56 56
57 #------------------------------------------------------------------------------ 57 #------------------------------------------------------------------------------
58 # TODO: 58 # TODO:
oletools/olevba.py
@@ -7,14 +7,14 @@ olevba is a script to parse OLE and OpenXML files such as MS Office documents @@ -7,14 +7,14 @@ olevba is a script to parse OLE and OpenXML files such as MS Office documents
7 and analyze malicious macros. 7 and analyze malicious macros.
8 8
9 Supported formats: 9 Supported formats:
10 -- Word 97-2003 (.doc, .dot), Word 2007+ (.docm, .dotm)  
11 -- Excel 97-2003 (.xls), Excel 2007+ (.xlsm, .xlsb)  
12 -- PowerPoint 97-2003 (.ppt), PowerPoint 2007+ (.pptm, .ppsm)  
13 -- Word/PowerPoint 2007+ XML (aka Flat OPC)  
14 -- Word 2003 XML (.xml)  
15 -- Word/Excel Single File Web Page / MHTML (.mht)  
16 -- Publisher (.pub)  
17 -- raises an error if run with files encrypted using MS Crypto API RC4 10 + - Word 97-2003 (.doc, .dot), Word 2007+ (.docm, .dotm)
  11 + - Excel 97-2003 (.xls), Excel 2007+ (.xlsm, .xlsb)
  12 + - PowerPoint 97-2003 (.ppt), PowerPoint 2007+ (.pptm, .ppsm)
  13 + - Word/PowerPoint 2007+ XML (aka Flat OPC)
  14 + - Word 2003 XML (.xml)
  15 + - Word/Excel Single File Web Page / MHTML (.mht)
  16 + - Publisher (.pub)
  17 + - raises an error if run with files encrypted using MS Crypto API RC4
18 18
19 Author: Philippe Lagadec - http://www.decalage.info 19 Author: Philippe Lagadec - http://www.decalage.info
20 License: BSD, see source code or documentation 20 License: BSD, see source code or documentation
@@ -28,7 +28,7 @@ https://github.com/unixfreak0037/officeparser @@ -28,7 +28,7 @@ https://github.com/unixfreak0037/officeparser
28 28
29 # === LICENSE ================================================================== 29 # === LICENSE ==================================================================
30 30
31 -# olevba is copyright (c) 2014-2018 Philippe Lagadec (http://www.decalage.info) 31 +# olevba is copyright (c) 2014-2019 Philippe Lagadec (http://www.decalage.info)
32 # All rights reserved. 32 # All rights reserved.
33 # 33 #
34 # Redistribution and use in source and binary forms, with or without modification, 34 # Redistribution and use in source and binary forms, with or without modification,
@@ -210,8 +210,16 @@ from __future__ import print_function @@ -210,8 +210,16 @@ from __future__ import print_function
210 # 2018-09-11 v0.54 PL: - olefile is now a dependency 210 # 2018-09-11 v0.54 PL: - olefile is now a dependency
211 # 2018-10-08 PL: - replace backspace before printing to console (issue #358) 211 # 2018-10-08 PL: - replace backspace before printing to console (issue #358)
212 # 2018-10-25 CH: - detect encryption and raise error if detected 212 # 2018-10-25 CH: - detect encryption and raise error if detected
  213 +# 2018-12-03 PL: - uses tablestream (+colors) instead of prettytable
  214 +# 2018-12-06 PL: - colorize the suspicious keywords found in VBA code
  215 +# 2019-01-01 PL: - removed support for Python 2.6
  216 +# 2019-03-18 PL: - added XLM/XLF macros detection for Excel OLE files
  217 +# 2019-03-25 CH: - added decryption of password-protected files
  218 +# 2019-04-09 PL: - decompress_stream accepts bytes (issue #422)
  219 +# 2019-05-23 v0.55 PL: - added option --pcode to call pcodedmp and display P-code
  220 +# 2019-06-05 PL: - added VBA stomping detection
213 221
214 -__version__ = '0.54dev4' 222 +__version__ = '0.55.dev3'
215 223
216 #------------------------------------------------------------------------------ 224 #------------------------------------------------------------------------------
217 # TODO: 225 # TODO:
@@ -236,23 +244,20 @@ __version__ = '0.54dev4' @@ -236,23 +244,20 @@ __version__ = '0.54dev4'
236 # - extract_macros: use combined struct.unpack instead of many calls 244 # - extract_macros: use combined struct.unpack instead of many calls
237 # - all except clauses should target specific exceptions 245 # - all except clauses should target specific exceptions
238 246
239 -#------------------------------------------------------------------------------ 247 +# ------------------------------------------------------------------------------
240 # REFERENCES: 248 # REFERENCES:
241 # - [MS-OVBA]: Microsoft Office VBA File Format Structure 249 # - [MS-OVBA]: Microsoft Office VBA File Format Structure
242 # http://msdn.microsoft.com/en-us/library/office/cc313094%28v=office.12%29.aspx 250 # http://msdn.microsoft.com/en-us/library/office/cc313094%28v=office.12%29.aspx
243 # - officeparser: https://github.com/unixfreak0037/officeparser 251 # - officeparser: https://github.com/unixfreak0037/officeparser
244 252
245 253
246 -#--- IMPORTS ------------------------------------------------------------------ 254 +# --- IMPORTS ------------------------------------------------------------------
247 255
248 import sys 256 import sys
249 import os 257 import os
250 import logging 258 import logging
251 import struct 259 import struct
252 -try:  
253 - from cStringIO import StringIO  
254 -except ImportError:  
255 - from io import StringIO 260 +from io import BytesIO, StringIO
256 import math 261 import math
257 import zipfile 262 import zipfile
258 import re 263 import re
@@ -261,7 +266,7 @@ import binascii @@ -261,7 +266,7 @@ import binascii
261 import base64 266 import base64
262 import zlib 267 import zlib
263 import email # for MHTML parsing 268 import email # for MHTML parsing
264 -import string # for printable 269 +import string # for printable
265 import json # for json output mode (argument --json) 270 import json # for json output mode (argument --json)
266 271
267 # import lxml or ElementTree for XML parsing: 272 # import lxml or ElementTree for XML parsing:
@@ -297,11 +302,11 @@ _thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__))) @@ -297,11 +302,11 @@ _thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__)))
297 # print('_thismodule_dir = %r' % _thismodule_dir) 302 # print('_thismodule_dir = %r' % _thismodule_dir)
298 _parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..')) 303 _parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
299 # print('_parent_dir = %r' % _parent_dir) 304 # print('_parent_dir = %r' % _parent_dir)
300 -if not _parent_dir in sys.path: 305 +if _parent_dir not in sys.path:
301 sys.path.insert(0, _parent_dir) 306 sys.path.insert(0, _parent_dir)
302 307
303 import olefile 308 import olefile
304 -from oletools.thirdparty.prettytable import prettytable 309 +from oletools.thirdparty.tablestream import tablestream
305 from oletools.thirdparty.xglob import xglob, PathNotFoundException 310 from oletools.thirdparty.xglob import xglob, PathNotFoundException
306 from pyparsing import \ 311 from pyparsing import \
307 CaselessKeyword, CaselessLiteral, Combine, Forward, Literal, \ 312 CaselessKeyword, CaselessLiteral, Combine, Forward, Literal, \
@@ -311,9 +316,8 @@ from pyparsing import \ @@ -311,9 +316,8 @@ from pyparsing import \
311 from oletools import ppt_parser 316 from oletools import ppt_parser
312 from oletools import oleform 317 from oletools import oleform
313 from oletools import rtfobj 318 from oletools import rtfobj
314 -from oletools import oleid  
315 -from oletools.common.errors import FileIsEncryptedError  
316 - 319 +from oletools import crypto
  320 +from oletools.common import codepages
317 321
318 # monkeypatch email to fix issue #32: 322 # monkeypatch email to fix issue #32:
319 # allow header lines without ":" 323 # allow header lines without ":"
@@ -324,30 +328,77 @@ email.feedparser.headerRE = re.compile(r'^(From |[\041-\071\073-\176]{1,}:?|[\t @@ -324,30 +328,77 @@ email.feedparser.headerRE = re.compile(r'^(From |[\041-\071\073-\176]{1,}:?|[\t
324 328
325 if sys.version_info[0] <= 2: 329 if sys.version_info[0] <= 2:
326 # Python 2.x 330 # Python 2.x
327 - if sys.version_info[1] <= 6:  
328 - # Python 2.6  
329 - # use is_zipfile backported from Python 2.7:  
330 - from thirdparty.zipfile27 import is_zipfile  
331 - else:  
332 - # Python 2.7  
333 - from zipfile import is_zipfile 331 + PYTHON2 = True
  332 + # to use ord on bytes/bytearray items the same way in Python 2+3
  333 + # on Python 2, just use the normal ord() because items are bytes
  334 + byte_ord = ord
  335 + #: Default string encoding for the olevba API
  336 + DEFAULT_API_ENCODING = 'utf8' # on Python 2: UTF-8 (bytes)
334 else: 337 else:
335 # Python 3.x+ 338 # Python 3.x+
336 - from zipfile import is_zipfile 339 + PYTHON2 = False
  340 +
  341 + # to use ord on bytes/bytearray items the same way in Python 2+3
  342 + # on Python 3, items are int, so just return the item
  343 + def byte_ord(x):
  344 + return x
337 # xrange is now called range: 345 # xrange is now called range:
338 xrange = range 346 xrange = range
  347 + # unichr does not exist anymore, only chr:
  348 + unichr = chr
  349 + # json2ascii also needs "unicode":
  350 + unicode = str
  351 + from functools import reduce
  352 + #: Default string encoding for the olevba API
  353 + DEFAULT_API_ENCODING = None # on Python 3: None (unicode)
  354 + # Python 3.0 - 3.4 support:
  355 + # From https://gist.github.com/ynkdir/867347/c5e188a4886bc2dd71876c7e069a7b00b6c16c61
  356 + if sys.version_info < (3, 5):
  357 + import codecs
  358 + _backslashreplace_errors = codecs.lookup_error("backslashreplace")
  359 +
  360 + def backslashreplace_errors(exc):
  361 + if isinstance(exc, UnicodeDecodeError):
  362 + u = "".join("\\x{0:02x}".format(c) for c in exc.object[exc.start:exc.end])
  363 + return u, exc.end
  364 + return _backslashreplace_errors(exc)
  365 +
  366 + codecs.register_error("backslashreplace", backslashreplace_errors)
  367 +
  368 +
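Before Python 3.5, the built-in `backslashreplace` error handler only worked for encoding, not decoding. The shim above works around this for Python 3.0 - 3.4. The sketch below shows the same mechanism without overriding the built-in handler: it registers a handler under a hypothetical name, `hex_escape_demo`. For decode errors it produces the same output that Python 3.5+ gives with `backslashreplace`.

```python
import codecs


def hex_escape_errors(exc):
    """Decode-error handler: replace undecodable bytes with \\xNN escapes."""
    if isinstance(exc, UnicodeDecodeError):
        bad = bytearray(exc.object[exc.start:exc.end])
        escaped = u''.join(u'\\x{0:02x}'.format(b) for b in bad)
        # resume decoding right after the offending bytes
        return escaped, exc.end
    # encoding errors: defer to the standard backslashreplace behaviour
    return codecs.backslashreplace_errors(exc)


codecs.register_error('hex_escape_demo', hex_escape_errors)
```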
  369 +def unicode2str(unicode_string):
  370 + """
  371 + convert a unicode string to a native str:
  372 + - on Python 3, it returns the same string
  373 + - on Python 2, the string is encoded with UTF-8 to a bytes str
  374 + :param unicode_string: unicode string to be converted
  375 + :return: the string converted to str
  376 + :rtype: str
  377 + """
  378 + if PYTHON2:
  379 + return unicode_string.encode('utf8', errors='replace')
  380 + else:
  381 + return unicode_string
339 382
340 -# === LOGGING =================================================================  
341 383
342 -class NullHandler(logging.Handler): 384 +def bytes2str(bytes_string, encoding='utf8'):
343 """ 385 """
344 - Log Handler without output, to avoid printing messages if logging is not  
345 - configured by the main application.  
346 - Python 2.7 has logging.NullHandler, but this is necessary for 2.6:  
347 - see https://docs.python.org/2.6/library/logging.html#configuring-logging-for-a-library 386 + convert a bytes string to a native str:
  387 + - on Python 2, it returns the same string (bytes=str)
  388 + - on Python 3, the string is decoded using the provided encoding
  389 + (UTF-8 by default) to a unicode str
  390 + :param bytes_string: bytes string to be converted
  391 + :param encoding: codec to be used for decoding
  392 + :return: the string converted to str
  393 + :rtype: str
348 """ 394 """
349 - def emit(self, record):  
350 - pass 395 + if PYTHON2:
  396 + return bytes_string
  397 + else:
  398 + return bytes_string.decode(encoding, errors='replace')
  399 +
  400 +
  401 +# === LOGGING =================================================================
351 402
352 def get_logger(name, level=logging.CRITICAL+1): 403 def get_logger(name, level=logging.CRITICAL+1):
353 """ 404 """
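On Python 3.5 and later, the built-in `backslashreplace` handler already supports decoding, so an override like the one in this hunk only matters on older interpreters. Below is a minimal sketch of the same idea, assuming Python 3 for testing. It uses a hypothetical handler name (`sketch_backslashreplace`) and a hypothetical helper (`bytes_to_native_str`), so the built-in handler is left untouched. Iterating over a `bytearray` yields integers on both Python 2 and 3, which is what `"{0:02x}".format()` needs. The helper also passes its `encoding` argument through to `decode()` instead of a fixed codec.

```python
import codecs
import sys

# assumption: same meaning as the PYTHON2 flag used in olevba
PYTHON2 = sys.version_info[0] == 2


def backslashreplace_decode(exc):
    """Replace undecodable bytes with \\xNN escapes (decode errors only)."""
    if isinstance(exc, UnicodeDecodeError):
        # bytearray() yields ints on Python 2 and 3, as format(..., '02x') needs
        bad = bytearray(exc.object[exc.start:exc.end])
        return u"".join(u"\\x{0:02x}".format(b) for b in bad), exc.end
    raise exc


# hypothetical handler name, so the built-in "backslashreplace" is untouched
codecs.register_error("sketch_backslashreplace", backslashreplace_decode)


def bytes_to_native_str(data, encoding="utf8"):
    """Return a native str, decoding with the requested encoding."""
    if PYTHON2:
        return data  # bytes is str on Python 2
    return data.decode(encoding, "sketch_backslashreplace")
```

For example, `bytes_to_native_str(b"abc\xff")` keeps the undecodable byte visible as the four characters `\xff` instead of dropping it.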
@@ -361,7 +412,7 @@ def get_logger(name, level=logging.CRITICAL+1): @@ -361,7 +412,7 @@ def get_logger(name, level=logging.CRITICAL+1):
361 # First, test if there is already a logger with the same name, else it 412 # First, test if there is already a logger with the same name, else it
362 # will generate duplicate messages (due to duplicate handlers): 413 # will generate duplicate messages (due to duplicate handlers):
363 if name in logging.Logger.manager.loggerDict: 414 if name in logging.Logger.manager.loggerDict:
364 - #NOTE: another less intrusive but more "hackish" solution would be to 415 + # NOTE: another less intrusive but more "hackish" solution would be to
365 # use getLogger then test if its effective level is not default. 416 # use getLogger then test if its effective level is not default.
366 logger = logging.getLogger(name) 417 logger = logging.getLogger(name)
367 # make sure level is OK: 418 # make sure level is OK:
@@ -371,7 +422,7 @@ def get_logger(name, level=logging.CRITICAL+1): @@ -371,7 +422,7 @@ def get_logger(name, level=logging.CRITICAL+1):
371 logger = logging.getLogger(name) 422 logger = logging.getLogger(name)
372 # only add a NullHandler for this logger, it is up to the application 423 # only add a NullHandler for this logger, it is up to the application
373 # to configure its own logging: 424 # to configure its own logging:
374 - logger.addHandler(NullHandler()) 425 + logger.addHandler(logging.NullHandler())
375 logger.setLevel(level) 426 logger.setLevel(level)
376 return logger 427 return logger
377 428
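The `get_logger` changes replace the custom `NullHandler` class with `logging.NullHandler`, which is available since Python 2.7 and 3.1. Below is a minimal sketch of the same library-logging pattern, using a hypothetical name (`get_library_logger`). It reuses an existing logger instead of stacking duplicate handlers, and it attaches only a `NullHandler` so nothing is printed until the application configures logging.

```python
import logging


def get_library_logger(name, level=logging.CRITICAL + 1):
    """Hypothetical helper following the get_logger pattern."""
    if name in logging.Logger.manager.loggerDict:
        # logger already exists: reuse it without adding another handler
        logger = logging.getLogger(name)
        logger.setLevel(level)
        return logger
    logger = logging.getLogger(name)
    # NullHandler discards records unless the application configures logging
    logger.addHandler(logging.NullHandler())
    logger.setLevel(level)
    return logger
```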
@@ -388,6 +439,7 @@ def enable_logging(): @@ -388,6 +439,7 @@ def enable_logging():
388 log.setLevel(logging.NOTSET) 439 log.setLevel(logging.NOTSET)
389 # Also enable logging in the ppt_parser module: 440 # Also enable logging in the ppt_parser module:
390 ppt_parser.enable_logging() 441 ppt_parser.enable_logging()
  442 + crypto.enable_logging()
391 443
392 444
393 445
@@ -564,7 +616,8 @@ AUTOEXEC_KEYWORDS = { @@ -564,7 +616,8 @@ AUTOEXEC_KEYWORDS = {
564 616
565 # MS Excel: 617 # MS Excel:
566 'Runs when the Excel Workbook is opened': 618 'Runs when the Excel Workbook is opened':
567 - ('Auto_Open', 'Workbook_Open', 'Workbook_Activate'), 619 + ('Auto_Open', 'Workbook_Open', 'Workbook_Activate', 'Auto_Ope'),
  620 + # TODO: "Auto_Ope" is temporarily here because of a bug in plugin_biff, which misses the last byte in "Auto_Open"...
568 'Runs when the Excel Workbook is closed': 621 'Runs when the Excel Workbook is closed':
569 ('Auto_Close', 'Workbook_Close'), 622 ('Auto_Close', 'Workbook_Close'),
570 623
@@ -600,9 +653,10 @@ SUSPICIOUS_KEYWORDS = { @@ -600,9 +653,10 @@ SUSPICIOUS_KEYWORDS = {
600 ('CreateTextFile', 'ADODB.Stream', 'WriteText', 'SaveToFile'), 653 ('CreateTextFile', 'ADODB.Stream', 'WriteText', 'SaveToFile'),
601 #CreateTextFile: http://msdn.microsoft.com/en-us/library/office/gg264617%28v=office.15%29.aspx 654 #CreateTextFile: http://msdn.microsoft.com/en-us/library/office/gg264617%28v=office.15%29.aspx
602 #ADODB.Stream sample: http://pastebin.com/Z4TMyuq6 655 #ADODB.Stream sample: http://pastebin.com/Z4TMyuq6
  656 + # ShellExecute: https://twitter.com/StanHacked/status/1075088449768693762
603 'May run an executable file or a system command': 657 'May run an executable file or a system command':
604 ('Shell', 'vbNormal', 'vbNormalFocus', 'vbHide', 'vbMinimizedFocus', 'vbMaximizedFocus', 'vbNormalNoFocus', 658 ('Shell', 'vbNormal', 'vbNormalFocus', 'vbHide', 'vbMinimizedFocus', 'vbMaximizedFocus', 'vbNormalNoFocus',
605 - 'vbMinimizedNoFocus', 'WScript.Shell', 'Run', 'ShellExecute'), 659 + 'vbMinimizedNoFocus', 'WScript.Shell', 'Run', 'ShellExecute', 'ShellExecuteA', 'shell32'),
606 # MacScript: see https://msdn.microsoft.com/en-us/library/office/gg264812.aspx 660 # MacScript: see https://msdn.microsoft.com/en-us/library/office/gg264812.aspx
607 'May run an executable file or a system command on a Mac': 661 'May run an executable file or a system command on a Mac':
608 ('MacScript',), 662 ('MacScript',),
@@ -620,6 +674,8 @@ SUSPICIOUS_KEYWORDS = { @@ -620,6 +674,8 @@ SUSPICIOUS_KEYWORDS = {
620 'invoke-command', 'scriptblock', 'Invoke-Expression', 'AuthorizationManager'), 674 'invoke-command', 'scriptblock', 'Invoke-Expression', 'AuthorizationManager'),
621 'May run an executable file or a system command using PowerShell': 675 'May run an executable file or a system command using PowerShell':
622 ('Start-Process',), 676 ('Start-Process',),
  677 + 'May run an executable file or a system command using Excel 4 Macros (XLM/XLF)':
  678 + ('EXEC',),
623 'May hide the application': 679 'May hide the application':
624 ('Application.Visible', 'ShowWindow', 'SW_HIDE'), 680 ('Application.Visible', 'ShowWindow', 'SW_HIDE'),
625 'May create a directory': 681 'May create a directory':
@@ -635,6 +691,8 @@ SUSPICIOUS_KEYWORDS = { @@ -635,6 +691,8 @@ SUSPICIOUS_KEYWORDS = {
635 ('New-Object',), 691 ('New-Object',),
636 'May run an application (if combined with CreateObject)': 692 'May run an application (if combined with CreateObject)':
637 ('Shell.Application',), 693 ('Shell.Application',),
  694 + 'May run an Excel 4 Macro (aka XLM/XLF)':
  695 + ('ExecuteExcel4Macro',),
638 'May enumerate application windows (if combined with Shell.Application object)': 696 'May enumerate application windows (if combined with Shell.Application object)':
639 ('Windows', 'FindWindow'), 697 ('Windows', 'FindWindow'),
640 'May run code from a DLL': 698 'May run code from a DLL':
@@ -643,9 +701,12 @@ SUSPICIOUS_KEYWORDS = { @@ -643,9 +701,12 @@ SUSPICIOUS_KEYWORDS = {
643 'May run code from a library on a Mac': 701 'May run code from a library on a Mac':
644 #TODO: regex to find declare+lib on same line - see mraptor 702 #TODO: regex to find declare+lib on same line - see mraptor
645 ('libc.dylib', 'dylib'), 703 ('libc.dylib', 'dylib'),
  704 + 'May run code from a DLL using Excel 4 Macros (XLM/XLF)':
  705 + ('REGISTER',),
646 'May inject code into another process': 706 'May inject code into another process':
647 - ('CreateThread', 'VirtualAlloc', # (issue #9) suggested by Davy Douhine - used by MSF payload  
648 - 'VirtualAllocEx', 'RtlMoveMemory', 707 + ('CreateThread', 'CreateUserThread', 'VirtualAlloc', # (issue #9) suggested by Davy Douhine - used by MSF payload
  708 + 'VirtualAllocEx', 'RtlMoveMemory', 'WriteProcessMemory',
  709 + 'SetContextThread', 'QueueApcThread', 'WriteVirtualMemory', 'VirtualProtect'
649 ), 710 ),
650 'May run a shellcode in memory': 711 'May run a shellcode in memory':
651 ('EnumSystemLanguageGroupsW?', # Used by Hancitor in Oct 2016 712 ('EnumSystemLanguageGroupsW?', # Used by Hancitor in Oct 2016
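The keyword tables map a description to a tuple of keywords. Below is a hedged sketch of how such a table could be scanned, assuming case-insensitive, whole-word matching. `KEYWORDS` is a small hypothetical subset, and `scan_keywords` is not olevba's function. Whole-word matching also shows why the `Auto_Ope` workaround is harmless: it matches a truncated `Auto_Ope` name but not the full `Auto_Open`. Because this sketch escapes every keyword, a regex-style entry such as `EnumSystemLanguageGroupsW?` would need separate handling.

```python
import re

# Hypothetical subset of the description -> keywords tables in olevba
KEYWORDS = {
    'Runs when the Excel Workbook is opened':
        ('Auto_Open', 'Workbook_Open', 'Workbook_Activate', 'Auto_Ope'),
    'May run an Excel 4 Macro (aka XLM/XLF)':
        ('ExecuteExcel4Macro',),
    'May inject code into another process':
        ('CreateThread', 'VirtualAlloc', 'WriteProcessMemory'),
}


def scan_keywords(vba_code, table=KEYWORDS):
    """Return (keyword, description) pairs found in vba_code."""
    results = []
    for description, keywords in table.items():
        for keyword in keywords:
            # whole-word, case-insensitive match; keywords are escaped,
            # so regex-style entries would need separate handling
            pattern = r'(?i)\b' + re.escape(keyword) + r'\b'
            if re.search(pattern, vba_code):
                results.append((keyword, description))
    return results
```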
@@ -777,7 +838,8 @@ re_dridex_string = re.compile(r'"[0-9A-Za-z]{20,}"') @@ -777,7 +838,8 @@ re_dridex_string = re.compile(r'"[0-9A-Za-z]{20,}"')
777 re_nothex_check = re.compile(r'[G-Zg-z]') 838 re_nothex_check = re.compile(r'[G-Zg-z]')
778 839
779 # regex to extract printable strings (at least 5 chars) from VBA Forms: 840 # regex to extract printable strings (at least 5 chars) from VBA Forms:
780 -re_printable_string = re.compile(r'[\t\r\n\x20-\xFF]{5,}') 841 +# (must be bytes for Python 3)
  842 +re_printable_string = re.compile(b'[\\t\\r\\n\\x20-\\xFF]{5,}')
781 843
782 844
783 # === PARTIAL VBA GRAMMAR ==================================================== 845 # === PARTIAL VBA GRAMMAR ====================================================
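On Python 3, a `str` pattern cannot be applied to `bytes`, which is why `re_printable_string` is now compiled from a bytes literal. The short sketch below shows the bytes pattern extracting runs of at least 5 printable bytes from binary data. `printable_strings` is a hypothetical helper.

```python
import re

# bytes pattern: required to search bytes data on Python 3
re_printable_string = re.compile(b'[\\t\\r\\n\\x20-\\xFF]{5,}')


def printable_strings(data):
    """Return every run of at least 5 printable bytes in data."""
    return re_printable_string.findall(data)
```

Applying the equivalent `str` pattern to the same `bytes` input raises `TypeError` on Python 3.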
@@ -918,10 +980,13 @@ vba_chr = Suppress( @@ -918,10 +980,13 @@ vba_chr = Suppress(
918 def vba_chr_tostr(t): 980 def vba_chr_tostr(t):
919 try: 981 try:
920 i = t[0] 982 i = t[0]
921 - # normal, non-unicode character:  
922 if i>=0 and i<=255: 983 if i>=0 and i<=255:
  984 + # normal, non-unicode character:
  985 + # TODO: check if it needs to be converted to bytes for Python 3
923 return VbaExpressionString(chr(i)) 986 return VbaExpressionString(chr(i))
924 else: 987 else:
  988 + # unicode character
  989 + # Note: this distinction is only needed for Python 2
925 return VbaExpressionString(unichr(i).encode('utf-8', 'backslashreplace')) 990 return VbaExpressionString(unichr(i).encode('utf-8', 'backslashreplace'))
926 except ValueError: 991 except ValueError:
927 log.exception('ERROR: incorrect parameter value for chr(): %r' % i) 992 log.exception('ERROR: incorrect parameter value for chr(): %r' % i)
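The TODO in `vba_chr_tostr` asks whether `Chr()` results should become bytes on Python 3. Below is a Python 3-only sketch with a hypothetical `vba_chr` helper that shows both options. `chr()` already returns a text character for any code point, while `bytes(bytearray([i]))` gives the raw single byte that `chr(i)` produced on Python 2. The sketch does not model VBA-specific argument ranges.

```python
def vba_chr(i, as_bytes=False):
    """Hypothetical Chr() helper for Python 3; not olevba's implementation."""
    if not 0 <= i <= 0x10FFFF:
        # mirrors the ValueError path in vba_chr_tostr
        raise ValueError('incorrect parameter value for chr(): %r' % i)
    if as_bytes and i <= 255:
        # one raw byte, like chr(i) on Python 2
        return bytes(bytearray([i]))
    # text character (code points above 255 are always returned as text)
    return chr(i)
```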
@@ -1188,8 +1253,9 @@ def decompress_stream(compressed_container): @@ -1188,8 +1253,9 @@ def decompress_stream(compressed_container):
1188 """ 1253 """
1189 Decompress a stream according to MS-OVBA section 2.4.1 1254 Decompress a stream according to MS-OVBA section 2.4.1
1190 1255
1191 - compressed_container: string compressed according to the MS-OVBA 2.4.1.3.6 Compression algorithm  
1192 - return the decompressed container as a string (bytes) 1256 + :param compressed_container: bytearray or bytes compressed according to the MS-OVBA 2.4.1.3.6 Compression algorithm
  1257 + :return: the decompressed container as a bytes string
  1258 + :rtype: bytes
1193 """ 1259 """
1194 # 2.4.1.2 State Variables 1260 # 2.4.1.2 State Variables
1195 1261
@@ -1211,10 +1277,14 @@ def decompress_stream(compressed_container): @@ -1211,10 +1277,14 @@ def decompress_stream(compressed_container):
1211 # DecompressedChunkStart: The location of the first byte of the DecompressedChunk (section 2.4.1.1.3) within the 1277 # DecompressedChunkStart: The location of the first byte of the DecompressedChunk (section 2.4.1.1.3) within the
1212 # DecompressedBuffer (section 2.4.1.1.2). 1278 # DecompressedBuffer (section 2.4.1.1.2).
1213 1279
1214 - decompressed_container = '' # result 1280 + # Check that the input is a bytearray, otherwise convert it (assuming it is bytes):
  1281 + if not isinstance(compressed_container, bytearray):
  1282 + compressed_container = bytearray(compressed_container)
  1283 + # raise TypeError('decompress_stream requires a bytearray as input')
  1284 + decompressed_container = bytearray() # result
1215 compressed_current = 0 1285 compressed_current = 0
1216 1286
1217 - sig_byte = ord(compressed_container[compressed_current]) 1287 + sig_byte = compressed_container[compressed_current]
1218 if sig_byte != 0x01: 1288 if sig_byte != 0x01:
1219 raise ValueError('invalid signature byte {0:02X}'.format(sig_byte)) 1289 raise ValueError('invalid signature byte {0:02X}'.format(sig_byte))
1220 1290
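Converting the input to a `bytearray` up front is what allows the `ord()` calls to be dropped. Indexing a `bytearray` returns an `int` on both Python 2 and 3, whereas indexing a Python 2 `str` returns a one-character string. Below is a minimal sketch with hypothetical helper names.

```python
def first_byte(data):
    """Return the first byte of bytes/bytearray data as an int (Py2 and Py3)."""
    # bytearray indexing yields an int on both versions, so no ord() is needed
    return bytearray(data)[0]


def check_signature(data):
    """Raise ValueError unless data starts with the 0x01 signature byte."""
    sig_byte = first_byte(data)
    if sig_byte != 0x01:
        raise ValueError('invalid signature byte {0:02X}'.format(sig_byte))
    return sig_byte
```

`bytearray(data)` copies its input, so a real decompressor would convert once, as `decompress_stream` does, rather than on every access.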
@@ -1260,7 +1330,7 @@ def decompress_stream(compressed_container): @@ -1260,7 +1330,7 @@ def decompress_stream(compressed_container):
1260 # MS-OVBA 2.4.1.3.3 Decompressing a RawChunk 1330 # MS-OVBA 2.4.1.3.3 Decompressing a RawChunk
1261 # uncompressed chunk: read the next 4096 bytes as-is 1331 # uncompressed chunk: read the next 4096 bytes as-is
1262 #TODO: check if there are at least 4096 bytes left 1332 #TODO: check if there are at least 4096 bytes left
1263 - decompressed_container += compressed_container[compressed_current:compressed_current + 4096] 1333 + decompressed_container.extend(compressed_container[compressed_current:compressed_current + 4096])
1264 compressed_current += 4096 1334 compressed_current += 4096
1265 else: 1335 else:
1266 # MS-OVBA 2.4.1.3.2 Decompressing a CompressedChunk 1336 # MS-OVBA 2.4.1.3.2 Decompressing a CompressedChunk
@@ -1271,7 +1341,7 @@ def decompress_stream(compressed_container): @@ -1271,7 +1341,7 @@ def decompress_stream(compressed_container):
1271 # log.debug('compressed_current = %d / compressed_end = %d' % (compressed_current, compressed_end)) 1341 # log.debug('compressed_current = %d / compressed_end = %d' % (compressed_current, compressed_end))
1272 # FlagByte: 8 bits indicating if the following 8 tokens are either literal (1 byte of plain text) or 1342 # FlagByte: 8 bits indicating if the following 8 tokens are either literal (1 byte of plain text) or
1273 # copy tokens (reference to a previous literal token) 1343 # copy tokens (reference to a previous literal token)
1274 - flag_byte = ord(compressed_container[compressed_current]) 1344 + flag_byte = compressed_container[compressed_current]
1275 compressed_current += 1 1345 compressed_current += 1
1276 for bit_index in xrange(0, 8): 1346 for bit_index in xrange(0, 8):
1277 # log.debug('bit_index=%d / compressed_current=%d / compressed_end=%d' % (bit_index, compressed_current, compressed_end)) 1347 # log.debug('bit_index=%d / compressed_current=%d / compressed_end=%d' % (bit_index, compressed_current, compressed_end))
@@ -1283,7 +1353,7 @@ def decompress_stream(compressed_container): @@ -1283,7 +1353,7 @@ def decompress_stream(compressed_container):
1283 #log.debug('bit_index=%d: flag_bit=%d' % (bit_index, flag_bit)) 1353 #log.debug('bit_index=%d: flag_bit=%d' % (bit_index, flag_bit))
1284 if flag_bit == 0: # LiteralToken 1354 if flag_bit == 0: # LiteralToken
1285 # copy one byte directly to output 1355 # copy one byte directly to output
1286 - decompressed_container += compressed_container[compressed_current] 1356 + decompressed_container.append(compressed_container[compressed_current])
1287 compressed_current += 1 1357 compressed_current += 1
1288 else: # CopyToken 1358 else: # CopyToken
1289 # MS-OVBA 2.4.1.3.19.2 Unpack CopyToken 1359 # MS-OVBA 2.4.1.3.19.2 Unpack CopyToken
@@ -1299,520 +1369,664 @@ def decompress_stream(compressed_container): @@ -1299,520 +1369,664 @@ def decompress_stream(compressed_container):
1299 #log.debug('offset=%d length=%d' % (offset, length)) 1369 #log.debug('offset=%d length=%d' % (offset, length))
1300 copy_source = len(decompressed_container) - offset 1370 copy_source = len(decompressed_container) - offset
1301 for index in xrange(copy_source, copy_source + length): 1371 for index in xrange(copy_source, copy_source + length):
1302 - decompressed_container += decompressed_container[index] 1372 + decompressed_container.append(decompressed_container[index])
1303 compressed_current += 2 1373 compressed_current += 2
1304 - return decompressed_container 1374 + return bytes(decompressed_container)
1305 1375
1306 1376
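For reference, below is a self-contained sketch of the MS-OVBA 2.4.1 decompression loop operating on a `bytearray`. It is written independently of `decompress_stream`, and the name `ovba_decompress` is hypothetical. It uses `extend()` for the 4096-byte raw-chunk slice and `append()` for single integer bytes. The CopyToken offset/length split is derived from the number of bytes already decompressed in the current chunk, with a minimum of 4 offset bits, following the spec's CopyToken Help rule. Error handling is deliberately minimal.

```python
import struct


def ovba_decompress(data):
    """Sketch of MS-OVBA 2.4.1 decompression; returns bytes."""
    data = bytearray(data)
    if not data or data[0] != 0x01:
        raise ValueError('invalid signature byte')
    out = bytearray()
    pos = 1
    # stop when fewer than 2 bytes remain for a chunk header
    while pos + 2 <= len(data):
        # CompressedChunkHeader: 16-bit little-endian
        header = struct.unpack_from('<H', data, pos)[0]
        if (header >> 12) & 0x07 != 0b011:
            raise ValueError('invalid chunk signature')
        chunk_size = (header & 0x0FFF) + 3  # includes the 2 header bytes
        compressed = (header >> 15) & 0x01
        chunk_end = min(len(data), pos + chunk_size)
        pos += 2
        chunk_start = len(out)  # DecompressedChunkStart
        if not compressed:
            # RawChunk: 4096 bytes copied as-is
            out.extend(data[pos:pos + 4096])
            pos += 4096
            continue
        while pos < chunk_end:
            flag_byte = data[pos]
            pos += 1
            for bit_index in range(8):
                if pos >= chunk_end:
                    break
                if not (flag_byte >> bit_index) & 1:
                    # LiteralToken: one byte copied directly
                    out.append(data[pos])
                    pos += 1
                    continue
                # CopyToken: split depends on bytes decoded in this chunk
                token = struct.unpack_from('<H', data, pos)[0]
                difference = len(out) - chunk_start
                # ceil(log2(difference)), but at least 4 offset bits
                bit_count = max((difference - 1).bit_length(), 4)
                length_mask = 0xFFFF >> bit_count
                length = (token & length_mask) + 3
                offset = (token >> (16 - bit_count)) + 1
                copy_source = len(out) - offset
                # byte-by-byte copy, so overlapping runs replicate correctly
                for index in range(copy_source, copy_source + length):
                    out.append(out[index])
                pos += 2
    return bytes(out)
```

The copy loop runs one byte at a time so that a token whose source overlaps its destination (offset smaller than length) replicates the run. For example, offset 3 and length 6 after `abc` yield `abcabcabc`.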
1307 -def _extract_vba(ole, vba_root, project_path, dir_path, relaxed=False): 1377 +class VBA_Module(object):
1308 """ 1378 """
1309 - Extract VBA macros from an OleFileIO object.  
1310 - Internal function, do not call directly.  
1311 -  
1312 - vba_root: path to the VBA root storage, containing the VBA storage and the PROJECT stream  
1313 - vba_project: path to the PROJECT stream  
1314 - :param relaxed: If True, only create info/debug log entry if data is not as expected  
1315 - (e.g. opening substream fails); if False, raise an error in this case  
1316 - This is a generator, yielding (stream path, VBA filename, VBA source code) for each VBA code stream 1379 + Class to parse a VBA module from an OLE file, and to store all the corresponding
  1380 + metadata and VBA source code.
1317 """ 1381 """
1318 - # Open the PROJECT stream:  
1319 - project = ole.openstream(project_path)  
1320 - log.debug('relaxed is %s' % relaxed)  
1321 -  
1322 - # sample content of the PROJECT stream:  
1323 -  
1324 - ## ID="{5312AC8A-349D-4950-BDD0-49BE3C4DD0F0}"  
1325 - ## Document=ThisDocument/&H00000000  
1326 - ## Module=NewMacros  
1327 - ## Name="Project"  
1328 - ## HelpContextID="0"  
1329 - ## VersionCompatible32="393222000"  
1330 - ## CMG="F1F301E705E705E705E705"  
1331 - ## DPB="8F8D7FE3831F2020202020"  
1332 - ## GC="2D2FDD81E51EE61EE6E1"  
1333 - ##  
1334 - ## [Host Extender Info]  
1335 - ## &H00000001={3832D640-CF90-11CF-8E43-00A0C911005A};VBE;&H00000000  
1336 - ## &H00000002={000209F2-0000-0000-C000-000000000046};Word8.0;&H00000000  
1337 - ##  
1338 - ## [Workspace]  
1339 - ## ThisDocument=22, 29, 339, 477, Z  
1340 - ## NewMacros=-4, 42, 832, 510, C  
1341 -  
1342 - code_modules = {}  
1343 -  
1344 - for line in project:  
1345 - line = line.strip()  
1346 - if '=' in line:  
1347 - # split line at the 1st equal sign:  
1348 - name, value = line.split('=', 1)  
1349 - # looking for code modules  
1350 - # add the code module as a key in the dictionary  
1351 - # the value will be the extension needed later  
1352 - # The value is converted to lowercase, to allow case-insensitive matching (issue #3)  
1353 - value = value.lower()  
1354 - if name == 'Document':  
1355 - # split value at the 1st slash, keep 1st part:  
1356 - value = value.split('/', 1)[0]  
1357 - code_modules[value] = CLASS_EXTENSION  
1358 - elif name == 'Module':  
1359 - code_modules[value] = MODULE_EXTENSION  
1360 - elif name == 'Class':  
1361 - code_modules[value] = CLASS_EXTENSION  
1362 - elif name == 'BaseClass':  
1363 - code_modules[value] = FORM_EXTENSION  
1364 -  
1365 - # read data from dir stream (compressed)  
1366 - dir_compressed = ole.openstream(dir_path).read()  
1367 -  
1368 - def check_value(name, expected, value):  
1369 - if expected != value:  
1370 - if relaxed:  
1371 - log.error("invalid value for {0} expected {1:04X} got {2:04X}"  
1372 - .format(name, expected, value))  
1373 - else:  
1374 - raise UnexpectedDataError(dir_path, name, expected, value)  
1375 -  
1376 - dir_stream = StringIO(decompress_stream(dir_compressed))  
1377 -  
1378 - # PROJECTSYSKIND Record  
1379 - projectsyskind_id = struct.unpack("<H", dir_stream.read(2))[0]  
1380 - check_value('PROJECTSYSKIND_Id', 0x0001, projectsyskind_id)  
1381 - projectsyskind_size = struct.unpack("<L", dir_stream.read(4))[0]  
1382 - check_value('PROJECTSYSKIND_Size', 0x0004, projectsyskind_size)  
1383 - projectsyskind_syskind = struct.unpack("<L", dir_stream.read(4))[0]  
1384 - if projectsyskind_syskind == 0x00:  
1385 - log.debug("16-bit Windows")  
1386 - elif projectsyskind_syskind == 0x01:  
1387 - log.debug("32-bit Windows")  
1388 - elif projectsyskind_syskind == 0x02:  
1389 - log.debug("Macintosh")  
1390 - elif projectsyskind_syskind == 0x03:  
1391 - log.debug("64-bit Windows")  
1392 - else:  
1393 - log.error("invalid PROJECTSYSKIND_SysKind {0:04X}".format(projectsyskind_syskind))  
1394 -  
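Most of the fixed-size `dir` stream records share one layout: a 2-byte Id, a 4-byte Size and a little-endian value. Below is a hedged sketch that reads the first four records in the order the removed code reads them: PROJECTSYSKIND, PROJECTLCID, PROJECTLCIDINVOKE and PROJECTCODEPAGE. The helper names are hypothetical. Unlike `check_value` in relaxed mode, this sketch raises on any mismatch instead of logging.

```python
import io
import struct

SYSKIND_NAMES = {0x00: '16-bit Windows', 0x01: '32-bit Windows',
                 0x02: 'Macintosh', 0x03: '64-bit Windows'}


def read_u16(stream):
    return struct.unpack('<H', stream.read(2))[0]


def read_u32(stream):
    return struct.unpack('<L', stream.read(4))[0]


def read_fixed_record(stream, expected_id, expected_size):
    """Read Id (u16), Size (u32) and a 2- or 4-byte little-endian value."""
    record_id = read_u16(stream)
    if record_id != expected_id:
        raise ValueError('record Id: expected {0:04X}, got {1:04X}'
                         .format(expected_id, record_id))
    size = read_u32(stream)
    if size != expected_size:
        raise ValueError('record Size: expected {0}, got {1}'
                         .format(expected_size, size))
    return read_u16(stream) if size == 2 else read_u32(stream)


def parse_leading_records(dir_data):
    """Parse the first fixed-size records of decompressed dir stream data."""
    stream = io.BytesIO(dir_data)
    syskind = read_fixed_record(stream, 0x0001, 4)      # PROJECTSYSKIND
    lcid = read_fixed_record(stream, 0x0002, 4)         # PROJECTLCID
    lcid_invoke = read_fixed_record(stream, 0x0014, 4)  # PROJECTLCIDINVOKE
    codepage = read_fixed_record(stream, 0x0003, 2)     # PROJECTCODEPAGE
    return SYSKIND_NAMES.get(syskind, 'unknown'), lcid, lcid_invoke, codepage
```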
1395 - # PROJECTLCID Record  
1396 - projectlcid_id = struct.unpack("<H", dir_stream.read(2))[0]  
1397 - check_value('PROJECTLCID_Id', 0x0002, projectlcid_id)  
1398 - projectlcid_size = struct.unpack("<L", dir_stream.read(4))[0]  
1399 - check_value('PROJECTLCID_Size', 0x0004, projectlcid_size)  
1400 - projectlcid_lcid = struct.unpack("<L", dir_stream.read(4))[0]  
1401 - check_value('PROJECTLCID_Lcid', 0x409, projectlcid_lcid)  
1402 -  
1403 - # PROJECTLCIDINVOKE Record  
1404 - projectlcidinvoke_id = struct.unpack("<H", dir_stream.read(2))[0]  
1405 - check_value('PROJECTLCIDINVOKE_Id', 0x0014, projectlcidinvoke_id)  
1406 - projectlcidinvoke_size = struct.unpack("<L", dir_stream.read(4))[0]  
1407 - check_value('PROJECTLCIDINVOKE_Size', 0x0004, projectlcidinvoke_size)  
1408 - projectlcidinvoke_lcidinvoke = struct.unpack("<L", dir_stream.read(4))[0]  
1409 - check_value('PROJECTLCIDINVOKE_LcidInvoke', 0x409, projectlcidinvoke_lcidinvoke)  
1410 -  
1411 - # PROJECTCODEPAGE Record  
1412 - projectcodepage_id = struct.unpack("<H", dir_stream.read(2))[0]  
1413 - check_value('PROJECTCODEPAGE_Id', 0x0003, projectcodepage_id)  
1414 - projectcodepage_size = struct.unpack("<L", dir_stream.read(4))[0]  
1415 - check_value('PROJECTCODEPAGE_Size', 0x0002, projectcodepage_size)  
1416 - projectcodepage_codepage = struct.unpack("<H", dir_stream.read(2))[0]  
1417 -  
1418 - # PROJECTNAME Record  
1419 - projectname_id = struct.unpack("<H", dir_stream.read(2))[0]  
1420 - check_value('PROJECTNAME_Id', 0x0004, projectname_id)  
1421 - projectname_sizeof_projectname = struct.unpack("<L", dir_stream.read(4))[0]  
1422 - if projectname_sizeof_projectname < 1 or projectname_sizeof_projectname > 128:  
1423 - log.error("PROJECTNAME_SizeOfProjectName value not in range: {0}".format(projectname_sizeof_projectname))  
1424 - projectname_projectname = dir_stream.read(projectname_sizeof_projectname)  
1425 - unused = projectname_projectname  
1426 -  
1427 - # PROJECTDOCSTRING Record  
1428 - projectdocstring_id = struct.unpack("<H", dir_stream.read(2))[0]  
1429 - check_value('PROJECTDOCSTRING_Id', 0x0005, projectdocstring_id)  
1430 - projectdocstring_sizeof_docstring = struct.unpack("<L", dir_stream.read(4))[0]  
1431 - if projectdocstring_sizeof_docstring > 2000:  
1432 - log.error(  
1433 - "PROJECTDOCSTRING_SizeOfDocString value not in range: {0}".format(projectdocstring_sizeof_docstring))  
1434 - projectdocstring_docstring = dir_stream.read(projectdocstring_sizeof_docstring)  
1435 - projectdocstring_reserved = struct.unpack("<H", dir_stream.read(2))[0]  
1436 - check_value('PROJECTDOCSTRING_Reserved', 0x0040, projectdocstring_reserved)  
1437 - projectdocstring_sizeof_docstring_unicode = struct.unpack("<L", dir_stream.read(4))[0]  
1438 - if projectdocstring_sizeof_docstring_unicode % 2 != 0:  
1439 - log.error("PROJECTDOCSTRING_SizeOfDocStringUnicode is not even")  
1440 - projectdocstring_docstring_unicode = dir_stream.read(projectdocstring_sizeof_docstring_unicode)  
1441 - unused = projectdocstring_docstring  
1442 - unused = projectdocstring_docstring_unicode  
1443 -  
1444 - # PROJECTHELPFILEPATH Record - MS-OVBA 2.3.4.2.1.7  
1445 - projecthelpfilepath_id = struct.unpack("<H", dir_stream.read(2))[0]  
1446 - check_value('PROJECTHELPFILEPATH_Id', 0x0006, projecthelpfilepath_id)  
1447 - projecthelpfilepath_sizeof_helpfile1 = struct.unpack("<L", dir_stream.read(4))[0]  
1448 - if projecthelpfilepath_sizeof_helpfile1 > 260:  
1449 - log.error(  
1450 - "PROJECTHELPFILEPATH_SizeOfHelpFile1 value not in range: {0}".format(projecthelpfilepath_sizeof_helpfile1))  
1451 - projecthelpfilepath_helpfile1 = dir_stream.read(projecthelpfilepath_sizeof_helpfile1)  
1452 - projecthelpfilepath_reserved = struct.unpack("<H", dir_stream.read(2))[0]  
1453 - check_value('PROJECTHELPFILEPATH_Reserved', 0x003D, projecthelpfilepath_reserved)  
1454 - projecthelpfilepath_sizeof_helpfile2 = struct.unpack("<L", dir_stream.read(4))[0]  
1455 - if projecthelpfilepath_sizeof_helpfile2 != projecthelpfilepath_sizeof_helpfile1:  
1456 - log.error("PROJECTHELPFILEPATH_SizeOfHelpFile1 does not equal PROJECTHELPFILEPATH_SizeOfHelpFile2")  
1457 - projecthelpfilepath_helpfile2 = dir_stream.read(projecthelpfilepath_sizeof_helpfile2)  
1458 - if projecthelpfilepath_helpfile2 != projecthelpfilepath_helpfile1:  
1459 - log.error("PROJECTHELPFILEPATH_HelpFile1 does not equal PROJECTHELPFILEPATH_HelpFile2")  
1460 -  
1461 - # PROJECTHELPCONTEXT Record  
1462 - projecthelpcontext_id = struct.unpack("<H", dir_stream.read(2))[0]  
1463 - check_value('PROJECTHELPCONTEXT_Id', 0x0007, projecthelpcontext_id)  
1464 - projecthelpcontext_size = struct.unpack("<L", dir_stream.read(4))[0]  
1465 - check_value('PROJECTHELPCONTEXT_Size', 0x0004, projecthelpcontext_size)  
1466 - projecthelpcontext_helpcontext = struct.unpack("<L", dir_stream.read(4))[0]  
1467 - unused = projecthelpcontext_helpcontext  
1468 -  
1469 - # PROJECTLIBFLAGS Record  
1470 - projectlibflags_id = struct.unpack("<H", dir_stream.read(2))[0]  
1471 - check_value('PROJECTLIBFLAGS_Id', 0x0008, projectlibflags_id)  
1472 - projectlibflags_size = struct.unpack("<L", dir_stream.read(4))[0]  
1473 - check_value('PROJECTLIBFLAGS_Size', 0x0004, projectlibflags_size)  
1474 - projectlibflags_projectlibflags = struct.unpack("<L", dir_stream.read(4))[0]  
1475 - check_value('PROJECTLIBFLAGS_ProjectLibFlags', 0x0000, projectlibflags_projectlibflags)  
1476 -  
1477 - # PROJECTVERSION Record  
1478 - projectversion_id = struct.unpack("<H", dir_stream.read(2))[0]  
1479 - check_value('PROJECTVERSION_Id', 0x0009, projectversion_id)  
1480 - projectversion_reserved = struct.unpack("<L", dir_stream.read(4))[0]  
1481 - check_value('PROJECTVERSION_Reserved', 0x0004, projectversion_reserved)  
1482 - projectversion_versionmajor = struct.unpack("<L", dir_stream.read(4))[0]  
1483 - projectversion_versionminor = struct.unpack("<H", dir_stream.read(2))[0]  
1484 - unused = projectversion_versionmajor  
1485 - unused = projectversion_versionminor  
1486 -  
1487 - # PROJECTCONSTANTS Record  
1488 - projectconstants_id = struct.unpack("<H", dir_stream.read(2))[0]  
1489 - check_value('PROJECTCONSTANTS_Id', 0x000C, projectconstants_id)  
1490 - projectconstants_sizeof_constants = struct.unpack("<L", dir_stream.read(4))[0]  
1491 - if projectconstants_sizeof_constants > 1015:  
1492 - log.error(  
1493 - "PROJECTCONSTANTS_SizeOfConstants value not in range: {0}".format(projectconstants_sizeof_constants))  
1494 - projectconstants_constants = dir_stream.read(projectconstants_sizeof_constants)  
1495 - projectconstants_reserved = struct.unpack("<H", dir_stream.read(2))[0]  
1496 - check_value('PROJECTCONSTANTS_Reserved', 0x003C, projectconstants_reserved)  
1497 - projectconstants_sizeof_constants_unicode = struct.unpack("<L", dir_stream.read(4))[0]  
1498 - if projectconstants_sizeof_constants_unicode % 2 != 0:  
1499 - log.error("PROJECTCONSTANTS_SizeOfConstantsUnicode is not even")  
1500 - projectconstants_constants_unicode = dir_stream.read(projectconstants_sizeof_constants_unicode)  
1501 - unused = projectconstants_constants  
1502 - unused = projectconstants_constants_unicode  
1503 -  
1504 - # array of REFERENCE records  
1505 - check = None  
1506 - while True:  
1507 - check = struct.unpack("<H", dir_stream.read(2))[0]  
1508 - log.debug("reference type = {0:04X}".format(check))  
1509 - if check == 0x000F:  
1510 - break  
1511 -  
1512 - if check == 0x0016:  
1513 - # REFERENCENAME  
1514 - reference_id = check  
1515 - reference_sizeof_name = struct.unpack("<L", dir_stream.read(4))[0]  
1516 - reference_name = dir_stream.read(reference_sizeof_name)  
1517 - reference_reserved = struct.unpack("<H", dir_stream.read(2))[0]  
1518 - # According to [MS-OVBA] 2.3.4.2.2.2 REFERENCENAME Record:  
1519 - # "Reserved (2 bytes): MUST be 0x003E. MUST be ignored."  
1520 - # So let's ignore it, otherwise it crashes on some files (issue #132)  
1521 - # PR #135 by @c1fe:  
1522 - # contrary to the specification I think that the unicode name  
1523 - # is optional. if reference_reserved is not 0x003E I think it  
1524 - # is actually the start of another REFERENCE record  
1525 - # at least when projectsyskind_syskind == 0x02 (Macintosh)  
1526 - if reference_reserved == 0x003E:  
1527 - #if reference_reserved not in (0x003E, 0x000D):  
1528 - # raise UnexpectedDataError(dir_path, 'REFERENCE_Reserved',  
1529 - # 0x0003E, reference_reserved)  
1530 - reference_sizeof_name_unicode = struct.unpack("<L", dir_stream.read(4))[0]  
1531 - reference_name_unicode = dir_stream.read(reference_sizeof_name_unicode)  
1532 - unused = reference_id  
1533 - unused = reference_name  
1534 - unused = reference_name_unicode  
1535 - continue  
1536 - else:  
1537 - check = reference_reserved  
1538 - log.debug("reference type = {0:04X}".format(check))  
1539 -  
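The REFERENCENAME branch treats the unicode name as optional (PR #135): if the Reserved field is not 0x003E, those two bytes are taken as the Id of the next REFERENCE record. Below is a minimal sketch of that heuristic with hypothetical helper names. It assumes the caller has already consumed the 0x0016 Id.

```python
import struct


def read_u16(stream):
    return struct.unpack('<H', stream.read(2))[0]


def read_u32(stream):
    return struct.unpack('<L', stream.read(4))[0]


def read_reference_name(stream):
    """
    Parse a REFERENCENAME body; return (name, name_unicode, next_id).
    next_id is set when the optional unicode part is absent.
    """
    name = stream.read(read_u32(stream))
    reserved = read_u16(stream)
    if reserved != 0x003E:
        # no unicode name: these 2 bytes start the next REFERENCE record
        return name, None, reserved
    name_unicode = stream.read(read_u32(stream))
    return name, name_unicode, None
```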
1540 - if check == 0x0033:  
1541 - # REFERENCEORIGINAL (followed by REFERENCECONTROL)  
1542 - referenceoriginal_id = check  
1543 - referenceoriginal_sizeof_libidoriginal = struct.unpack("<L", dir_stream.read(4))[0]  
1544 - referenceoriginal_libidoriginal = dir_stream.read(referenceoriginal_sizeof_libidoriginal)  
1545 - unused = referenceoriginal_id  
1546 - unused = referenceoriginal_libidoriginal  
1547 - continue  
1548 -  
1549 - if check == 0x002F:  
1550 - # REFERENCECONTROL  
1551 - referencecontrol_id = check  
1552 - referencecontrol_sizetwiddled = struct.unpack("<L", dir_stream.read(4))[0] # ignore  
1553 - referencecontrol_sizeof_libidtwiddled = struct.unpack("<L", dir_stream.read(4))[0]  
1554 - referencecontrol_libidtwiddled = dir_stream.read(referencecontrol_sizeof_libidtwiddled)  
1555 - referencecontrol_reserved1 = struct.unpack("<L", dir_stream.read(4))[0] # ignore  
1556 - check_value('REFERENCECONTROL_Reserved1', 0x0000, referencecontrol_reserved1)  
1557 - referencecontrol_reserved2 = struct.unpack("<H", dir_stream.read(2))[0] # ignore  
1558 - check_value('REFERENCECONTROL_Reserved2', 0x0000, referencecontrol_reserved2)  
1559 - unused = referencecontrol_id  
1560 - unused = referencecontrol_sizetwiddled  
1561 - unused = referencecontrol_libidtwiddled  
1562 - # optional field  
1563 - check2 = struct.unpack("<H", dir_stream.read(2))[0]  
1564 - if check2 == 0x0016:  
1565 - referencecontrol_namerecordextended_id = check  
1566 - referencecontrol_namerecordextended_sizeof_name = struct.unpack("<L", dir_stream.read(4))[0]  
1567 - referencecontrol_namerecordextended_name = dir_stream.read(  
1568 - referencecontrol_namerecordextended_sizeof_name)  
1569 - referencecontrol_namerecordextended_reserved = struct.unpack("<H", dir_stream.read(2))[0]  
1570 - if referencecontrol_namerecordextended_reserved == 0x003E:  
1571 - referencecontrol_namerecordextended_sizeof_name_unicode = struct.unpack("<L", dir_stream.read(4))[0]  
1572 - referencecontrol_namerecordextended_name_unicode = dir_stream.read(  
1573 - referencecontrol_namerecordextended_sizeof_name_unicode)  
1574 - referencecontrol_reserved3 = struct.unpack("<H", dir_stream.read(2))[0]  
1575 - unused = referencecontrol_namerecordextended_id  
1576 - unused = referencecontrol_namerecordextended_name  
1577 - unused = referencecontrol_namerecordextended_name_unicode  
1578 - else:  
1579 - referencecontrol_reserved3 = referencecontrol_namerecordextended_reserved  
1580 - else:  
1581 - referencecontrol_reserved3 = check2  
1582 -  
1583 - check_value('REFERENCECONTROL_Reserved3', 0x0030, referencecontrol_reserved3)  
1584 - referencecontrol_sizeextended = struct.unpack("<L", dir_stream.read(4))[0]  
1585 - referencecontrol_sizeof_libidextended = struct.unpack("<L", dir_stream.read(4))[0]  
1586 - referencecontrol_libidextended = dir_stream.read(referencecontrol_sizeof_libidextended)  
1587 - referencecontrol_reserved4 = struct.unpack("<L", dir_stream.read(4))[0]  
1588 - referencecontrol_reserved5 = struct.unpack("<H", dir_stream.read(2))[0]  
1589 - referencecontrol_originaltypelib = dir_stream.read(16)  
1590 - referencecontrol_cookie = struct.unpack("<L", dir_stream.read(4))[0]  
1591 - unused = referencecontrol_sizeextended  
1592 - unused = referencecontrol_libidextended  
1593 - unused = referencecontrol_reserved4  
1594 - unused = referencecontrol_reserved5  
1595 - unused = referencecontrol_originaltypelib  
1596 - unused = referencecontrol_cookie  
1597 - continue  
1598 -  
1599 - if check == 0x000D:  
1600 - # REFERENCEREGISTERED  
1601 - referenceregistered_id = check  
1602 - referenceregistered_size = struct.unpack("<L", dir_stream.read(4))[0]  
1603 - referenceregistered_sizeof_libid = struct.unpack("<L", dir_stream.read(4))[0]  
1604 - referenceregistered_libid = dir_stream.read(referenceregistered_sizeof_libid)  
1605 - referenceregistered_reserved1 = struct.unpack("<L", dir_stream.read(4))[0]  
1606 - check_value('REFERENCEREGISTERED_Reserved1', 0x0000, referenceregistered_reserved1)  
1607 - referenceregistered_reserved2 = struct.unpack("<H", dir_stream.read(2))[0]  
1608 - check_value('REFERENCEREGISTERED_Reserved2', 0x0000, referenceregistered_reserved2)  
1609 - unused = referenceregistered_id  
1610 - unused = referenceregistered_size  
1611 - unused = referenceregistered_libid  
1612 - continue  
1613 1382
1614 - if check == 0x000E:  
1615 - # REFERENCEPROJECT  
1616 - referenceproject_id = check  
1617 - referenceproject_size = struct.unpack("<L", dir_stream.read(4))[0]  
1618 - referenceproject_sizeof_libidabsolute = struct.unpack("<L", dir_stream.read(4))[0]  
1619 - referenceproject_libidabsolute = dir_stream.read(referenceproject_sizeof_libidabsolute)  
1620 - referenceproject_sizeof_libidrelative = struct.unpack("<L", dir_stream.read(4))[0]  
1621 - referenceproject_libidrelative = dir_stream.read(referenceproject_sizeof_libidrelative)  
1622 - referenceproject_majorversion = struct.unpack("<L", dir_stream.read(4))[0]  
1623 - referenceproject_minorversion = struct.unpack("<H", dir_stream.read(2))[0]  
1624 - unused = referenceproject_id  
1625 - unused = referenceproject_size  
1626 - unused = referenceproject_libidabsolute  
1627 - unused = referenceproject_libidrelative  
1628 - unused = referenceproject_majorversion  
1629 - unused = referenceproject_minorversion  
1630 - continue 1383 + def __init__(self, project, dir_stream, module_index):
  1384 + """
  1385 + Parse a VBA Module record from the dir stream of a VBA project.
  1386 + Reference: MS-OVBA 2.3.4.2.3.2 MODULE Record
1631 1387
1632 - log.error('invalid or unknown check Id {0:04X}'.format(check))  
1633 - # raise an exception instead of stopping abruptly (issue #180)  
1634 - raise UnexpectedDataError(dir_path, 'reference type', (0x0F, 0x16, 0x33, 0x2F, 0x0D, 0x0E), check)  
1635 - #sys.exit(0)  
1636 -  
1637 - projectmodules_id = check #struct.unpack("<H", dir_stream.read(2))[0]  
1638 - check_value('PROJECTMODULES_Id', 0x000F, projectmodules_id)  
1639 - projectmodules_size = struct.unpack("<L", dir_stream.read(4))[0]  
1640 - check_value('PROJECTMODULES_Size', 0x0002, projectmodules_size)  
1641 - projectmodules_count = struct.unpack("<H", dir_stream.read(2))[0]  
1642 - projectmodules_projectcookierecord_id = struct.unpack("<H", dir_stream.read(2))[0]  
1643 - check_value('PROJECTMODULES_ProjectCookieRecord_Id', 0x0013, projectmodules_projectcookierecord_id)  
1644 - projectmodules_projectcookierecord_size = struct.unpack("<L", dir_stream.read(4))[0]  
1645 - check_value('PROJECTMODULES_ProjectCookieRecord_Size', 0x0002, projectmodules_projectcookierecord_size)  
1646 - projectmodules_projectcookierecord_cookie = struct.unpack("<H", dir_stream.read(2))[0]  
1647 - unused = projectmodules_projectcookierecord_cookie  
1648 -  
1649 - # short function to simplify unicode text output  
1650 - uni_out = lambda unicode_text: unicode_text.encode('utf-8', 'replace')  
1651 -  
1652 - log.debug("parsing {0} modules".format(projectmodules_count))  
1653 - for projectmodule_index in xrange(0, projectmodules_count): 1388 + :param VBA_Project project: VBA_Project, corresponding VBA project
  1389 + :param olefile.OleStream dir_stream: olefile.OleStream, file object containing the module record
  1390 + :param int module_index: int, index of the module in the VBA project list
  1391 + """
  1392 + #: reference to the VBA project for later use (VBA_Project)
  1393 + self.project = project
  1394 + #: VBA module name (unicode str)
  1395 + self.name = None
  1396 + #: VBA module name as a native str (utf8 bytes on py2, str on py3)
  1397 + self.name_str = None
  1398 + #: VBA module name from the MODULENAMEUNICODE record (unicode str)
  1399 + self._name_unicode = None
  1400 + #: Stream name containing the VBA module (unicode str)
  1401 + self.streamname = None
  1402 + #: Stream name containing the VBA module as a native str (utf8 bytes on py2, str on py3)
  1403 + self.streamname_str = None
  1404 + self._streamname_unicode = None
  1405 + self.docstring = None
  1406 + self._docstring_unicode = None
  1407 + self.textoffset = None
  1408 + self.type = None
  1409 + self.readonly = False
  1410 + self.private = False
  1411 + #: VBA source code in bytes format, using the original code page from the VBA project
  1412 + self.code_raw = None
  1413 + #: VBA source code in unicode format (unicode for Python2, str for Python 3)
  1414 + self.code = None
  1415 + #: VBA source code in native str format (str encoded with UTF-8 for Python 2, str for Python 3)
  1416 + self.code_str = None
  1417 + #: VBA module file name including an extension based on the module type such as bas, cls, frm (unicode str)
  1418 + self.filename = None
  1419 + #: VBA module file name in native str format (str)
  1420 + self.filename_str = None
  1421 + self.code_path = None
1654 try: 1422 try:
1655 - modulename_id = struct.unpack("<H", dir_stream.read(2))[0]  
1656 - check_value('MODULENAME_Id', 0x0019, modulename_id)  
1657 - modulename_sizeof_modulename = struct.unpack("<L", dir_stream.read(4))[0]  
1658 - modulename_modulename = dir_stream.read(modulename_sizeof_modulename)  
1659 - # TODO: preset variables to avoid "referenced before assignment" errors  
1660 - modulename_unicode_modulename_unicode = '' 1423 + # 2.3.4.2.3.2.1 MODULENAME Record
  1424 + # Specifies a VBA identifier as the name of the containing MODULE Record
  1425 + _id = struct.unpack("<H", dir_stream.read(2))[0]
  1426 + project.check_value('MODULENAME_Id', 0x0019, _id)
  1427 + size = struct.unpack("<L", dir_stream.read(4))[0]
  1428 + modulename_bytes = dir_stream.read(size)
  1429 + # Module name always stored as Unicode:
  1430 + self.name = project.decode_bytes(modulename_bytes)
  1431 + self.name_str = unicode2str(self.name)
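Several dir stream records parsed in this diff, such as MODULENAME above, share the same layout: Id (u16), then Size (u32), then Data. A small helper for that pattern could look like the sketch below. read_sized_record is a hypothetical name. The explicit truncation checks are an addition: the diff's code would instead fail inside struct.unpack, or silently receive fewer bytes from read(). The cp1252 codec in the usage line is an assumption; in practice it comes from the PROJECTCODEPAGE record.

```python
import io
import struct


def read_sized_record(stream, expected_id):
    """Read a record laid out as Id (u16), Size (u32), Data (Size bytes).

    Raises ValueError on an unexpected Id or when the stream ends early.
    """
    header = stream.read(6)
    if len(header) != 6:
        raise ValueError("truncated record header")
    record_id, size = struct.unpack("<HL", header)
    if record_id != expected_id:
        raise ValueError("expected record Id 0x%04X, got 0x%04X" % (expected_id, record_id))
    data = stream.read(size)
    if len(data) != size:
        raise ValueError("truncated record data")
    return data


# Usage: a MODULENAME record (Id 0x0019) holding the MBCS name "Module1".
record = struct.pack("<HL", 0x0019, 7) + b"Module1"
module_name = read_sized_record(io.BytesIO(record), 0x0019).decode("cp1252")
```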
1661 # account for optional sections 1432 # account for optional sections
  1433 + # TODO: shouldn't this be a loop? (check MS-OVBA)
1662 section_id = struct.unpack("<H", dir_stream.read(2))[0] 1434 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1663 if section_id == 0x0047: 1435 if section_id == 0x0047:
1664 - modulename_unicode_id = section_id  
1665 - modulename_unicode_sizeof_modulename_unicode = struct.unpack("<L", dir_stream.read(4))[0]  
1666 - modulename_unicode_modulename_unicode = dir_stream.read(  
1667 - modulename_unicode_sizeof_modulename_unicode).decode('UTF-16LE', 'replace')  
1668 - # just guessing that this is the same encoding as used in OleFileIO  
1669 - unused = modulename_unicode_id 1436 + # 2.3.4.2.3.2.2 MODULENAMEUNICODE Record
  1437 + # Specifies a VBA identifier as the name of the containing MODULE Record (section 2.3.4.2.3.2).
  1438 + # MUST contain the UTF-16 encoding of MODULENAME Record
  1439 + size = struct.unpack("<L", dir_stream.read(4))[0]
  1440 + self._name_unicode = dir_stream.read(size).decode('UTF-16LE', 'replace')
1670 section_id = struct.unpack("<H", dir_stream.read(2))[0] 1441 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1671 if section_id == 0x001A: 1442 if section_id == 0x001A:
1672 - modulestreamname_id = section_id  
1673 - modulestreamname_sizeof_streamname = struct.unpack("<L", dir_stream.read(4))[0]  
1674 - modulestreamname_streamname = dir_stream.read(modulestreamname_sizeof_streamname)  
1675 - modulestreamname_reserved = struct.unpack("<H", dir_stream.read(2))[0]  
1676 - check_value('MODULESTREAMNAME_Reserved', 0x0032, modulestreamname_reserved)  
1677 - modulestreamname_sizeof_streamname_unicode = struct.unpack("<L", dir_stream.read(4))[0]  
1678 - modulestreamname_streamname_unicode = dir_stream.read(  
1679 - modulestreamname_sizeof_streamname_unicode).decode('UTF-16LE', 'replace')  
1680 - # just guessing that this is the same encoding as used in OleFileIO  
1681 - unused = modulestreamname_id 1443 + # 2.3.4.2.3.2.3 MODULESTREAMNAME Record
  1444 + # Specifies the stream name of the ModuleStream (section 2.3.4.3) in the VBA Storage (section 2.3.4)
  1445 + # corresponding to the containing MODULE Record
  1446 + size = struct.unpack("<L", dir_stream.read(4))[0]
  1447 + streamname_bytes = dir_stream.read(size)
  1448 + # Store it as Unicode:
  1449 + self.streamname = project.decode_bytes(streamname_bytes)
  1450 + self.streamname_str = unicode2str(self.streamname)
  1451 + reserved = struct.unpack("<H", dir_stream.read(2))[0]
  1452 + project.check_value('MODULESTREAMNAME_Reserved', 0x0032, reserved)
  1453 + size = struct.unpack("<L", dir_stream.read(4))[0]
  1454 + self._streamname_unicode = dir_stream.read(size).decode('UTF-16LE', 'replace')
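MODULESTREAMNAME stores the same string twice: an MBCS copy, a reserved marker (0x0032), then a UTF-16LE copy. Other records in this diff use the same layout with different markers: MODULEDOCSTRING (0x0048), PROJECTDOCSTRING (0x0040), PROJECTHELPFILEPATH (0x003D) and PROJECTCONSTANTS (0x003C). Note that the diff keeps the MODULEDOCSTRING Unicode copy as undecoded bytes. Below is a hypothetical helper for this shape. It assumes the record Id has already been consumed, as in the code above.

```python
import io
import struct


def read_dual_string(stream, reserved_marker, codec):
    """Read SizeOfX (u32), X (MBCS), Reserved (u16), SizeOfXUnicode (u32), XUnicode (UTF-16LE).

    Returns (mbcs_text, unicode_text), both decoded with errors replaced.
    """
    (size,) = struct.unpack("<L", stream.read(4))
    mbcs = stream.read(size)
    (reserved,) = struct.unpack("<H", stream.read(2))
    if reserved != reserved_marker:
        raise ValueError("expected reserved marker 0x%04X, got 0x%04X" % (reserved_marker, reserved))
    (size_unicode,) = struct.unpack("<L", stream.read(4))
    unicode_bytes = stream.read(size_unicode)
    return mbcs.decode(codec, "replace"), unicode_bytes.decode("utf-16-le", "replace")


# Usage: a MODULESTREAMNAME body for a name with a non-ASCII character.
name = u"Modul\u00e91"
encoded = name.encode("cp1252")
encoded16 = name.encode("utf-16-le")
body = (struct.pack("<L", len(encoded)) + encoded + struct.pack("<H", 0x0032)
        + struct.pack("<L", len(encoded16)) + encoded16)
mbcs_name, unicode_name = read_dual_string(io.BytesIO(body), 0x0032, "cp1252")
```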
1682 section_id = struct.unpack("<H", dir_stream.read(2))[0] 1455 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1683 if section_id == 0x001C: 1456 if section_id == 0x001C:
1684 - moduledocstring_id = section_id  
1685 - check_value('MODULEDOCSTRING_Id', 0x001C, moduledocstring_id)  
1686 - moduledocstring_sizeof_docstring = struct.unpack("<L", dir_stream.read(4))[0]  
1687 - moduledocstring_docstring = dir_stream.read(moduledocstring_sizeof_docstring)  
1688 - moduledocstring_reserved = struct.unpack("<H", dir_stream.read(2))[0]  
1689 - check_value('MODULEDOCSTRING_Reserved', 0x0048, moduledocstring_reserved)  
1690 - moduledocstring_sizeof_docstring_unicode = struct.unpack("<L", dir_stream.read(4))[0]  
1691 - moduledocstring_docstring_unicode = dir_stream.read(moduledocstring_sizeof_docstring_unicode)  
1692 - unused = moduledocstring_docstring  
1693 - unused = moduledocstring_docstring_unicode 1457 + # 2.3.4.2.3.2.4 MODULEDOCSTRING Record
  1458 + # Specifies the description for the containing MODULE Record
  1459 + size = struct.unpack("<L", dir_stream.read(4))[0]
  1460 + docstring_bytes = dir_stream.read(size)
  1461 + self.docstring = project.decode_bytes(docstring_bytes)
  1462 + reserved = struct.unpack("<H", dir_stream.read(2))[0]
  1463 + project.check_value('MODULEDOCSTRING_Reserved', 0x0048, reserved)
  1464 + size = struct.unpack("<L", dir_stream.read(4))[0]
  1465 + self._docstring_unicode = dir_stream.read(size)
1694 section_id = struct.unpack("<H", dir_stream.read(2))[0] 1466 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1695 if section_id == 0x0031: 1467 if section_id == 0x0031:
1696 - moduleoffset_id = section_id  
1697 - check_value('MODULEOFFSET_Id', 0x0031, moduleoffset_id)  
1698 - moduleoffset_size = struct.unpack("<L", dir_stream.read(4))[0]  
1699 - check_value('MODULEOFFSET_Size', 0x0004, moduleoffset_size)  
1700 - moduleoffset_textoffset = struct.unpack("<L", dir_stream.read(4))[0] 1468 + # 2.3.4.2.3.2.5 MODULEOFFSET Record
  1469 + # Specifies the location of the source code within the ModuleStream (section 2.3.4.3)
  1470 + # that corresponds to the containing MODULE Record
  1471 + size = struct.unpack("<L", dir_stream.read(4))[0]
  1472 + project.check_value('MODULEOFFSET_Size', 0x0004, size)
  1473 + self.textoffset = struct.unpack("<L", dir_stream.read(4))[0]
1701 section_id = struct.unpack("<H", dir_stream.read(2))[0] 1474 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1702 if section_id == 0x001E: 1475 if section_id == 0x001E:
1703 - modulehelpcontext_id = section_id  
1704 - check_value('MODULEHELPCONTEXT_Id', 0x001E, modulehelpcontext_id) 1476 + # 2.3.4.2.3.2.6 MODULEHELPCONTEXT Record
  1477 + # Specifies the Help topic identifier for the containing MODULE Record
1705 modulehelpcontext_size = struct.unpack("<L", dir_stream.read(4))[0] 1478 modulehelpcontext_size = struct.unpack("<L", dir_stream.read(4))[0]
1706 - check_value('MODULEHELPCONTEXT_Size', 0x0004, modulehelpcontext_size)  
1707 - modulehelpcontext_helpcontext = struct.unpack("<L", dir_stream.read(4))[0]  
1708 - unused = modulehelpcontext_helpcontext 1479 + project.check_value('MODULEHELPCONTEXT_Size', 0x0004, modulehelpcontext_size)
  1480 + # HelpContext (4 bytes): An unsigned integer that specifies the Help topic identifier
  1481 + # in the Help file specified by PROJECTHELPFILEPATH Record
  1482 + helpcontext = struct.unpack("<L", dir_stream.read(4))[0]
1709 section_id = struct.unpack("<H", dir_stream.read(2))[0] 1483 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1710 if section_id == 0x002C: 1484 if section_id == 0x002C:
1711 - modulecookie_id = section_id  
1712 - check_value('MODULECOOKIE_Id', 0x002C, modulecookie_id)  
1713 - modulecookie_size = struct.unpack("<L", dir_stream.read(4))[0]  
1714 - check_value('MODULECOOKIE_Size', 0x0002, modulecookie_size)  
1715 - modulecookie_cookie = struct.unpack("<H", dir_stream.read(2))[0]  
1716 - unused = modulecookie_cookie 1485 + # 2.3.4.2.3.2.7 MODULECOOKIE Record
  1486 + # Specifies ignored data.
  1487 + size = struct.unpack("<L", dir_stream.read(4))[0]
  1488 + project.check_value('MODULECOOKIE_Size', 0x0002, size)
  1489 + cookie = struct.unpack("<H", dir_stream.read(2))[0]
1717 section_id = struct.unpack("<H", dir_stream.read(2))[0] 1490 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1718 if section_id == 0x0021 or section_id == 0x0022: 1491 if section_id == 0x0021 or section_id == 0x0022:
1719 - moduletype_id = section_id  
1720 - moduletype_reserved = struct.unpack("<L", dir_stream.read(4))[0]  
1721 - unused = moduletype_id  
1722 - unused = moduletype_reserved 1492 + # 2.3.4.2.3.2.8 MODULETYPE Record
  1493 + # Specifies whether the containing MODULE Record (section 2.3.4.2.3.2) is a procedural module,
  1494 + # document module, class module, or designer module.
  1495 + # Id (2 bytes): An unsigned integer that specifies the identifier for this record.
  1496 + # MUST be 0x0021 when the containing MODULE Record (section 2.3.4.2.3.2) is a procedural module.
  1497 + # MUST be 0x0022 when the containing MODULE Record (section 2.3.4.2.3.2) is a document module,
  1498 + # class module, or designer module.
  1499 + self.type = section_id
  1500 + reserved = struct.unpack("<L", dir_stream.read(4))[0]
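self.type stores the raw MODULETYPE record Id. Per the spec text quoted in the comments above, 0x0021 marks a procedural module, while 0x0022 covers document, class and designer modules. This record alone therefore cannot tell those three apart. Below is a tiny hypothetical helper that turns the Id into a label.

```python
MODULE_TYPE_PROCEDURAL = 0x0021
MODULE_TYPE_DOCUMENT_CLASS_DESIGNER = 0x0022


def describe_module_type(type_id):
    """Map a MODULETYPE record Id (or None if the record was absent) to a label."""
    if type_id is None:
        return "unknown (no MODULETYPE record)"
    if type_id == MODULE_TYPE_PROCEDURAL:
        return "procedural module"
    if type_id == MODULE_TYPE_DOCUMENT_CLASS_DESIGNER:
        return "document, class or designer module"
    raise ValueError("invalid MODULETYPE Id 0x%04X" % type_id)
```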
1723 section_id = struct.unpack("<H", dir_stream.read(2))[0] 1501 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1724 if section_id == 0x0025: 1502 if section_id == 0x0025:
1725 - modulereadonly_id = section_id  
1726 - check_value('MODULEREADONLY_Id', 0x0025, modulereadonly_id)  
1727 - modulereadonly_reserved = struct.unpack("<L", dir_stream.read(4))[0]  
1728 - check_value('MODULEREADONLY_Reserved', 0x0000, modulereadonly_reserved) 1503 + # 2.3.4.2.3.2.9 MODULEREADONLY Record
  1504 + # Specifies that the containing MODULE Record (section 2.3.4.2.3.2) is read-only.
  1505 + self.readonly = True
  1506 + reserved = struct.unpack("<L", dir_stream.read(4))[0]
  1507 + project.check_value('MODULEREADONLY_Reserved', 0x0000, reserved)
1729 section_id = struct.unpack("<H", dir_stream.read(2))[0] 1508 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1730 if section_id == 0x0028: 1509 if section_id == 0x0028:
1731 - moduleprivate_id = section_id  
1732 - check_value('MODULEPRIVATE_Id', 0x0028, moduleprivate_id)  
1733 - moduleprivate_reserved = struct.unpack("<L", dir_stream.read(4))[0]  
1734 - check_value('MODULEPRIVATE_Reserved', 0x0000, moduleprivate_reserved) 1510 + # 2.3.4.2.3.2.10 MODULEPRIVATE Record
  1511 + # Specifies that the containing MODULE Record (section 2.3.4.2.3.2) is only usable from within
  1512 + # the current VBA project.
  1513 + self.private = True
  1514 + reserved = struct.unpack("<L", dir_stream.read(4))[0]
  1515 + project.check_value('MODULEPRIVATE_Reserved', 0x0000, reserved)
1735 section_id = struct.unpack("<H", dir_stream.read(2))[0] 1516 section_id = struct.unpack("<H", dir_stream.read(2))[0]
1736 if section_id == 0x002B: # TERMINATOR 1517 if section_id == 0x002B: # TERMINATOR
1737 - module_reserved = struct.unpack("<L", dir_stream.read(4))[0]  
1738 - check_value('MODULE_Reserved', 0x0000, module_reserved) 1518 + # Terminator (2 bytes): An unsigned integer that specifies the end of this record. MUST be 0x002B.
  1519 + # Reserved (4 bytes): MUST be 0x00000000. MUST be ignored.
  1520 + reserved = struct.unpack("<L", dir_stream.read(4))[0]
  1521 + project.check_value('MODULE_Reserved', 0x0000, reserved)
1739 section_id = None 1522 section_id = None
1740 if section_id != None: 1523 if section_id != None:
1741 log.warning('unknown or invalid module section id {0:04X}'.format(section_id)) 1524 log.warning('unknown or invalid module section id {0:04X}'.format(section_id))
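The TODO above asks whether the optional-section handling should be a loop. Below is one possible loop-based sketch that dispatches on each record Id until the 0x002B terminator. It is not the olevba implementation and differs in three ways:

- It accepts the optional records in any order, which is more permissive than the fixed sequence above.
- It keeps MBCS names as raw bytes for the caller to decode.
- It skips the reserved-value checks that the diff performs through check_value.

parse_module_record is a hypothetical name.

```python
import io
import struct

TERMINATOR = 0x002B


def parse_module_record(stream):
    """Loop-based sketch of MODULE record parsing; returns a dict of fields seen."""
    def u16():
        return struct.unpack("<H", stream.read(2))[0]

    def u32():
        return struct.unpack("<L", stream.read(4))[0]

    def sized():
        return stream.read(u32())

    fields = {"readonly": False, "private": False}
    while True:
        record_id = u16()  # struct.error on a truncated stream, so no endless loop
        if record_id == 0x0019:              # MODULENAME
            fields["name"] = sized()
        elif record_id == 0x0047:            # MODULENAMEUNICODE
            fields["name_unicode"] = sized().decode("utf-16-le", "replace")
        elif record_id == 0x001A:            # MODULESTREAMNAME
            fields["streamname"] = sized()
            u16()                            # Reserved (0x0032)
            fields["streamname_unicode"] = sized().decode("utf-16-le", "replace")
        elif record_id == 0x001C:            # MODULEDOCSTRING
            fields["docstring"] = sized()
            u16()                            # Reserved (0x0048)
            sized()                          # DocStringUnicode, ignored here
        elif record_id == 0x0031:            # MODULEOFFSET: Size (4) + TextOffset
            u32()
            fields["textoffset"] = u32()
        elif record_id == 0x001E:            # MODULEHELPCONTEXT: Size (4) + HelpContext
            u32()
            fields["helpcontext"] = u32()
        elif record_id == 0x002C:            # MODULECOOKIE: Size (2) + Cookie
            u32()
            u16()
        elif record_id in (0x0021, 0x0022):  # MODULETYPE: Reserved (4)
            fields["type"] = record_id
            u32()
        elif record_id == 0x0025:            # MODULEREADONLY: Reserved (4)
            fields["readonly"] = True
            u32()
        elif record_id == 0x0028:            # MODULEPRIVATE: Reserved (4)
            fields["private"] = True
            u32()
        elif record_id == TERMINATOR:        # Terminator + Reserved (4)
            u32()
            return fields
        else:
            raise ValueError("unknown module record Id 0x%04X" % record_id)
```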
1742 -  
1743 - log.debug('Project CodePage = %d' % projectcodepage_codepage)  
1744 - if projectcodepage_codepage in MAC_CODEPAGES:  
1745 - vba_codec = MAC_CODEPAGES[projectcodepage_codepage]  
1746 - else:  
1747 - vba_codec = 'cp%d' % projectcodepage_codepage  
1748 - log.debug("ModuleName = {0}".format(modulename_modulename))  
1749 - log.debug("ModuleNameUnicode = {0}".format(uni_out(modulename_unicode_modulename_unicode)))  
1750 - log.debug("StreamName = {0}".format(modulestreamname_streamname))  
1751 - try:  
1752 - streamname_unicode = modulestreamname_streamname.decode(vba_codec)  
1753 - except UnicodeError as ue:  
1754 - log.debug('failed to decode stream name {0!r} with codec {1}'  
1755 - .format(uni_out(streamname_unicode), vba_codec))  
1756 - streamname_unicode = modulestreamname_streamname.decode(vba_codec, errors='replace')  
1757 - log.debug("StreamName.decode('%s') = %s" % (vba_codec, uni_out(streamname_unicode)))  
1758 - log.debug("StreamNameUnicode = {0}".format(uni_out(modulestreamname_streamname_unicode)))  
1759 - log.debug("TextOffset = {0}".format(moduleoffset_textoffset))  
1760 - 1525 +
  1526 + log.debug("Module Name = {0}".format(self.name_str))
  1527 + # log.debug("Module Name Unicode = {0}".format(self._name_unicode))
  1528 + log.debug("Stream Name = {0}".format(self.streamname_str))
  1529 + # log.debug("Stream Name Unicode = {0}".format(self._streamname_unicode))
  1530 + log.debug("TextOffset = {0}".format(self.textoffset))
  1531 +
1761 code_data = None 1532 code_data = None
1762 - try_names = streamname_unicode, \  
1763 - modulename_unicode_modulename_unicode, \  
1764 - modulestreamname_streamname_unicode 1533 + # let's try the different names we have, just in case some are missing:
  1534 + try_names = (self.streamname, self._streamname_unicode, self.name, self._name_unicode)
1765 for stream_name in try_names: 1535 for stream_name in try_names:
1766 # TODO: if olefile._find were less private, could replace this 1536 # TODO: if olefile._find were less private, could replace this
1767 # try-except with calls to it 1537 # try-except with calls to it
1768 - try:  
1769 - code_path = vba_root + u'VBA/' + stream_name  
1770 - log.debug('opening VBA code stream %s' % uni_out(code_path))  
1771 - code_data = ole.openstream(code_path).read()  
1772 - break  
1773 - except IOError as ioe:  
1774 - log.debug('failed to open stream VBA/%r (%r), try other name'  
1775 - % (uni_out(stream_name), ioe))  
1776 - 1538 + if stream_name is not None:
  1539 + try:
  1540 + self.code_path = project.vba_root + u'VBA/' + stream_name
  1541 + log.debug('opening VBA code stream %s' % self.code_path)
  1542 + code_data = project.ole.openstream(self.code_path).read()
  1543 + break
  1544 + except IOError as ioe:
  1545 + log.debug('failed to open stream VBA/%r (%r), try other name'
  1546 + % (stream_name, ioe))
  1547 +
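The loop above tries several candidate names for the module stream and skips the ones that are None. Below is the same idea as a standalone helper. open_first_stream is hypothetical. It takes any callable that returns the stream bytes or raises IOError, for example `lambda p: ole.openstream(p).read()` with an olefile.OleFileIO object.

```python
def open_first_stream(open_stream, vba_root, candidates):
    """Return (path, data) for the first candidate under vba_root + 'VBA/' that opens.

    None candidates are skipped; returns (None, None) if no candidate opens.
    """
    for name in candidates:
        if name is None:
            continue
        path = vba_root + u"VBA/" + name
        try:
            return path, open_stream(path)
        except IOError:
            continue  # try the next candidate name
    return None, None
```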
1777 if code_data is None: 1548 if code_data is None:
1778 log.info("Could not open stream %d of %d ('VBA/' + one of %r)!" 1549 log.info("Could not open stream %d of %d ('VBA/' + one of %r)!"
1779 - % (projectmodule_index, projectmodules_count,  
1780 - '/'.join("'" + uni_out(stream_name) + "'"  
1781 - for stream_name in try_names)))  
1782 - if relaxed:  
1783 - continue # ... with next submodule 1550 + % (module_index, project.modules_count,
  1551 + '/'.join("'" + stream_name + "'"
  1552 + for stream_name in try_names if stream_name is not None)))
  1553 + if project.relaxed:
  1554 + return # ... continue with next submodule
1784 else: 1555 else:
1785 - raise SubstreamOpenError('[BASE]', 'VBA/' +  
1786 - uni_out(modulename_unicode_modulename_unicode))  
1787 - 1556 + raise SubstreamOpenError('[BASE]', 'VBA/' + self.name)
  1557 +
1788 log.debug("length of code_data = {0}".format(len(code_data))) 1558 log.debug("length of code_data = {0}".format(len(code_data)))
1789 - log.debug("offset of code_data = {0}".format(moduleoffset_textoffset))  
1790 - code_data = code_data[moduleoffset_textoffset:] 1559 + log.debug("offset of code_data = {0}".format(self.textoffset))
  1560 + code_data = code_data[self.textoffset:]
1791 if len(code_data) > 0: 1561 if len(code_data) > 0:
1792 - code_data = decompress_stream(code_data) 1562 + code_data = decompress_stream(bytearray(code_data))
  1563 + # store the raw code encoded as bytes with the project's code page:
  1564 + self.code_raw = code_data
  1565 + # decode it to unicode:
  1566 + self.code = project.decode_bytes(code_data)
  1567 + # also store a native str version:
  1568 + self.code_str = unicode2str(self.code)
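Both the module code and the dir stream go through decompress_stream, olevba's implementation of the MS-OVBA compression format; its body is not part of this diff. Below is an independent minimal sketch of the decompression algorithm described in MS-OVBA 2.4.1, for illustration only. decompress_container is a hypothetical name, and it does no validation beyond the container and chunk signatures.

```python
import struct


def decompress_container(data):
    """Decompress an MS-OVBA CompressedContainer (signature byte 0x01 + chunks)."""
    data = bytearray(data)
    if not data or data[0] != 0x01:
        raise ValueError("missing CompressedContainer signature byte 0x01")
    out = bytearray()
    pos = 1
    while pos < len(data):
        # Chunk header (u16): 12-bit size, 3-bit signature 0b011, 1-bit compressed flag
        header = struct.unpack("<H", bytes(data[pos:pos + 2]))[0]
        if (header >> 12) & 0x07 != 0b011:
            raise ValueError("invalid chunk signature")
        chunk_end = min(pos + (header & 0x0FFF) + 3, len(data))  # size includes header
        compressed = header & 0x8000
        pos += 2
        chunk_start = len(out)  # decompressed position where this chunk begins
        if not compressed:
            out += data[pos:pos + 4096]  # raw chunk: 4096 literal bytes
            pos += 4096
            continue
        while pos < chunk_end:
            flags = data[pos]  # one flag bit per following token, LSB first
            pos += 1
            for bit in range(8):
                if pos >= chunk_end:
                    break
                if not (flags >> bit) & 1:  # literal token
                    out.append(data[pos])
                    pos += 1
                    continue
                # Copy token: the offset/length split depends on the chunk position
                token = struct.unpack("<H", bytes(data[pos:pos + 2]))[0]
                pos += 2
                difference = len(out) - chunk_start
                bit_count = max((difference - 1).bit_length(), 4)  # ceil(log2), min 4
                length = (token & (0xFFFF >> bit_count)) + 3
                offset = (token >> (16 - bit_count)) + 1
                for _ in range(length):  # byte by byte: source and target may overlap
                    out.append(out[-offset])
        pos = chunk_end
    return bytes(out)


# Usage: literals "abc" followed by one copy token (offset 3, length 6).
example = decompress_container(b"\x01\x05\xb0\x08abc\x03\x20")  # b"abcabcabc"
```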
1793 # case-insensitive search in the code_modules dict to find the file extension: 1569 # case-insensitive search in the code_modules dict to find the file extension:
1794 - filext = code_modules.get(modulename_modulename.lower(), 'bin')  
1795 - filename = '{0}.{1}'.format(modulename_modulename, filext)  
1796 - #TODO: also yield the codepage so that callers can decode it properly  
1797 - yield (code_path, filename, code_data)  
1798 - # print '-'*79  
1799 - # print filename  
1800 - # print ''  
1801 - # print code_data  
1802 - # print ''  
1803 - log.debug('extracted file {0}'.format(filename)) 1570 + filext = self.project.module_ext.get(self.name.lower(), 'vba')
  1571 + self.filename = u'{0}.{1}'.format(self.name, filext)
  1572 + self.filename_str = unicode2str(self.filename)
  1573 + log.debug('extracted file {0}'.format(self.filename_str))
1804 else: 1574 else:
1805 - log.warning("module stream {0} has code data length 0".format(modulestreamname_streamname)) 1575 + log.warning("module stream {0} has code data length 0".format(self.streamname_str))
1806 except (UnexpectedDataError, SubstreamOpenError): 1576 except (UnexpectedDataError, SubstreamOpenError):
1807 raise 1577 raise
1808 except Exception as exc: 1578 except Exception as exc:
1809 - log.info('Error parsing module {0} of {1} in _extract_vba:'  
1810 - .format(projectmodule_index, projectmodules_count), 1579 + log.info('Error parsing module {0} of {1}:'
  1580 + .format(module_index, project.modules_count),
1811 exc_info=True) 1581 exc_info=True)
1812 - if not relaxed: 1582 + if not project.relaxed:
1813 raise 1583 raise
1814 - _ = unused # make pylint happy: now variable "unused" is being used ;-)  
1815 - return 1584 +
  1585 +
  1586 +class VBA_Project(object):
  1587 + """
  1588 + Class to parse a VBA project from an OLE file, and to store all the corresponding
  1589 + metadata and VBA modules.
  1590 + """
  1591 +
  1592 + def __init__(self, ole, vba_root, project_path, dir_path, relaxed=False):
  1593 + """
  1594 + Parse a VBA project from an OleFileIO object (ole), starting from its dir stream (dir_path).
  1595 +
  1596 + :param vba_root: path to the VBA root storage, containing the VBA storage and the PROJECT stream
  1597 + :param project_path: path to the PROJECT stream
  1598 + :param relaxed: If True, only create info/debug log entry if data is not as expected
  1599 + (e.g. opening substream fails); if False, raise an error in this case
  1600 + """
  1601 + self.ole = ole
  1602 + self.vba_root = vba_root
  1603 + self.project_path = project_path
  1604 + self.dir_path = dir_path
  1605 + self.relaxed = relaxed
  1606 + #: VBA modules contained in the project (list of VBA_Module objects)
  1607 + self.modules = []
  1608 + #: file extension for each VBA module
  1609 + self.module_ext = {}
  1610 + log.debug('Parsing the dir stream from %r' % dir_path)
  1611 + # read data from dir stream (compressed)
  1612 + dir_compressed = ole.openstream(dir_path).read()
  1613 + # decompress it:
  1614 + dir_stream = BytesIO(decompress_stream(bytearray(dir_compressed)))
  1615 + # store reference for later use:
  1616 + self.dir_stream = dir_stream
  1617 +
  1618 + # reference: MS-OVBA 2.3.4.2 dir Stream: Version Independent Project Information
  1619 +
  1620 + # PROJECTSYSKIND Record
  1621 + # Specifies the platform for which the VBA project is created.
  1622 + projectsyskind_id = struct.unpack("<H", dir_stream.read(2))[0]
  1623 + self.check_value('PROJECTSYSKIND_Id', 0x0001, projectsyskind_id)
  1624 + projectsyskind_size = struct.unpack("<L", dir_stream.read(4))[0]
  1625 + self.check_value('PROJECTSYSKIND_Size', 0x0004, projectsyskind_size)
  1626 + self.syskind = struct.unpack("<L", dir_stream.read(4))[0]
  1627 + SYSKIND_NAME = {
  1628 + 0x00: "16-bit Windows",
  1629 + 0x01: "32-bit Windows",
  1630 + 0x02: "Macintosh",
  1631 + 0x03: "64-bit Windows"
  1632 + }
  1633 + self.syskind_name = SYSKIND_NAME.get(self.syskind, 'Unknown')
  1634 + log.debug("PROJECTSYSKIND_SysKind: %d - %s" % (self.syskind, self.syskind_name))
  1635 + if self.syskind not in SYSKIND_NAME:
  1636 + log.error("invalid PROJECTSYSKIND_SysKind {0:04X}".format(self.syskind))
  1637 +
  1638 + # PROJECTLCID Record
  1639 + # Specifies the VBA project's LCID.
  1640 + projectlcid_id = struct.unpack("<H", dir_stream.read(2))[0]
  1641 + self.check_value('PROJECTLCID_Id', 0x0002, projectlcid_id)
  1642 + projectlcid_size = struct.unpack("<L", dir_stream.read(4))[0]
  1643 + self.check_value('PROJECTLCID_Size', 0x0004, projectlcid_size)
  1644 + # Lcid (4 bytes): An unsigned integer that specifies the LCID value for the VBA project. MUST be 0x00000409.
  1645 + self.lcid = struct.unpack("<L", dir_stream.read(4))[0]
  1646 + self.check_value('PROJECTLCID_Lcid', 0x409, self.lcid)
  1647 +
  1648 + # PROJECTLCIDINVOKE Record
  1649 + # Specifies an LCID value used for Invoke calls on an Automation server as specified in [MS-OAUT] section 3.1.4.4.
  1650 + projectlcidinvoke_id = struct.unpack("<H", dir_stream.read(2))[0]
  1651 + self.check_value('PROJECTLCIDINVOKE_Id', 0x0014, projectlcidinvoke_id)
  1652 + projectlcidinvoke_size = struct.unpack("<L", dir_stream.read(4))[0]
  1653 + self.check_value('PROJECTLCIDINVOKE_Size', 0x0004, projectlcidinvoke_size)
  1654 + # LcidInvoke (4 bytes): An unsigned integer that specifies the LCID value used for Invoke calls. MUST be 0x00000409.
  1655 + self.lcidinvoke = struct.unpack("<L", dir_stream.read(4))[0]
  1656 + self.check_value('PROJECTLCIDINVOKE_LcidInvoke', 0x409, self.lcidinvoke)
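PROJECTSYSKIND, PROJECTLCID and PROJECTLCIDINVOKE above, and PROJECTCODEPAGE just below, each carry a single fixed-size integer after an Id and a Size field. Below is a hypothetical helper for that shape. It raises ValueError where the diff calls self.check_value.

```python
import io
import struct


def read_fixed_record(stream, expected_id, fmt):
    """Read Id (u16), Size (u32), Value, where Value has the fixed struct format fmt.

    Size must equal struct.calcsize(fmt), e.g. 4 for "<L" or 2 for "<H".
    """
    record_id, size = struct.unpack("<HL", stream.read(6))
    if record_id != expected_id:
        raise ValueError("expected record Id 0x%04X, got 0x%04X" % (expected_id, record_id))
    if size != struct.calcsize(fmt):
        raise ValueError("unexpected Size %d for record 0x%04X" % (size, record_id))
    return struct.unpack(fmt, stream.read(size))[0]


# Usage: SYSKIND (32-bit Windows), LCID, LCIDINVOKE and CODEPAGE records back to back.
records = io.BytesIO(struct.pack("<HLL", 0x0001, 4, 0x01)
                     + struct.pack("<HLL", 0x0002, 4, 0x409)
                     + struct.pack("<HLL", 0x0014, 4, 0x409)
                     + struct.pack("<HLH", 0x0003, 2, 1252))
syskind = read_fixed_record(records, 0x0001, "<L")
lcid = read_fixed_record(records, 0x0002, "<L")
lcidinvoke = read_fixed_record(records, 0x0014, "<L")
codepage = read_fixed_record(records, 0x0003, "<H")
```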
  1657 +
  1658 + # PROJECTCODEPAGE Record
  1659 + # Specifies the VBA project's code page.
  1660 + projectcodepage_id = struct.unpack("<H", dir_stream.read(2))[0]
  1661 + self.check_value('PROJECTCODEPAGE_Id', 0x0003, projectcodepage_id)
  1662 + projectcodepage_size = struct.unpack("<L", dir_stream.read(4))[0]
  1663 + self.check_value('PROJECTCODEPAGE_Size', 0x0002, projectcodepage_size)
  1664 + self.codepage = struct.unpack("<H", dir_stream.read(2))[0]
  1665 + self.codepage_name = codepages.get_codepage_name(self.codepage)
  1666 + log.debug('Project Code Page: %r - %s' % (self.codepage, self.codepage_name))
  1667 + self.codec = codepages.codepage2codec(self.codepage)
  1668 + log.debug('Python codec corresponding to code page %d: %s' % (self.codepage, self.codec))
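The removed code mapped the project code page to a Python codec name itself: it looked up MAC_CODEPAGES first and otherwise used 'cp%d'. The new code delegates to codepages.codepage2codec and self.decode_bytes, whose bodies are not shown in this diff. Below is a rough approximation of the removed approach. Three parts are assumptions, not olevba's behaviour: the subset of Mac code pages listed, the 65001 to utf_8 mapping, and the latin_1 fallback.

```python
import codecs

# Mac code pages have no "cpNNNNN" codec in Python, so map them explicitly (assumed subset).
MAC_CODEPAGES = {
    10000: "mac_roman",
    10006: "mac_greek",
    10007: "mac_cyrillic",
    10029: "mac_latin2",
    10079: "mac_iceland",
    10081: "mac_turkish",
}


def codepage_to_codec(codepage, fallback="latin_1"):
    """Return a Python codec name for a Windows code page number, or fallback if unknown."""
    if codepage in MAC_CODEPAGES:
        name = MAC_CODEPAGES[codepage]
    elif codepage == 65001:
        name = "utf_8"
    else:
        name = "cp%d" % codepage
    try:
        codecs.lookup(name)  # raises LookupError if Python has no such codec
    except LookupError:
        return fallback
    return name


def decode_bytes(raw, codec):
    """Decode MBCS bytes from the dir stream, replacing undecodable bytes."""
    return raw.decode(codec, "replace")
```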
  1669 +
  1670 +
  1671 + # PROJECTNAME Record
  1672 + # Specifies a unique VBA identifier as the name of the VBA project.
  1673 + projectname_id = struct.unpack("<H", dir_stream.read(2))[0]
  1674 + self.check_value('PROJECTNAME_Id', 0x0004, projectname_id)
  1675 + sizeof_projectname = struct.unpack("<L", dir_stream.read(4))[0]
  1676 + log.debug('Project name size: %d bytes' % sizeof_projectname)
  1677 + if sizeof_projectname < 1 or sizeof_projectname > 128:
  1678 + # TODO: raise an actual error? What is MS Office's behaviour?
  1679 + log.error("PROJECTNAME_SizeOfProjectName value not in range [1-128]: {0}".format(sizeof_projectname))
  1680 + projectname_bytes = dir_stream.read(sizeof_projectname)
  1681 + self.projectname = self.decode_bytes(projectname_bytes)
  1682 +
  1683 +
  1684 + # PROJECTDOCSTRING Record
  1685 + # Specifies the description for the VBA project.
  1686 + projectdocstring_id = struct.unpack("<H", dir_stream.read(2))[0]
  1687 + self.check_value('PROJECTDOCSTRING_Id', 0x0005, projectdocstring_id)
  1688 + projectdocstring_sizeof_docstring = struct.unpack("<L", dir_stream.read(4))[0]
  1689 + if projectdocstring_sizeof_docstring > 2000:
  1690 + log.error(
  1691 + "PROJECTDOCSTRING_SizeOfDocString value not in range: {0}".format(projectdocstring_sizeof_docstring))
  1692 + # DocString (variable): An array of SizeOfDocString bytes that specifies the description for the VBA project.
  1693 + # MUST contain MBCS characters encoded using the code page specified in PROJECTCODEPAGE (section 2.3.4.2.1.4).
  1694 + # MUST NOT contain null characters.
  1695 + docstring_bytes = dir_stream.read(projectdocstring_sizeof_docstring)
  1696 + self.docstring = self.decode_bytes(docstring_bytes)
  1697 + projectdocstring_reserved = struct.unpack("<H", dir_stream.read(2))[0]
  1698 + self.check_value('PROJECTDOCSTRING_Reserved', 0x0040, projectdocstring_reserved)
  1699 + projectdocstring_sizeof_docstring_unicode = struct.unpack("<L", dir_stream.read(4))[0]
  1700 + if projectdocstring_sizeof_docstring_unicode % 2 != 0:
  1701 + log.error("PROJECTDOCSTRING_SizeOfDocStringUnicode is not even")
  1702 + # DocStringUnicode (variable): An array of SizeOfDocStringUnicode bytes that specifies the description for the
  1703 + # VBA project. MUST contain UTF-16 characters. MUST NOT contain null characters.
  1704 + # MUST contain the UTF-16 encoding of DocString.
  1705 + docstring_unicode_bytes = dir_stream.read(projectdocstring_sizeof_docstring_unicode)
  1706 + self.docstring_unicode = docstring_unicode_bytes.decode('UTF-16LE', errors='replace')
  1707 +
  1708 + # PROJECTHELPFILEPATH Record - MS-OVBA 2.3.4.2.1.7
  1709 + projecthelpfilepath_id = struct.unpack("<H", dir_stream.read(2))[0]
  1710 + self.check_value('PROJECTHELPFILEPATH_Id', 0x0006, projecthelpfilepath_id)
  1711 + projecthelpfilepath_sizeof_helpfile1 = struct.unpack("<L", dir_stream.read(4))[0]
  1712 + if projecthelpfilepath_sizeof_helpfile1 > 260:
  1713 + log.error(
  1714 + "PROJECTHELPFILEPATH_SizeOfHelpFile1 value not in range: {0}".format(projecthelpfilepath_sizeof_helpfile1))
  1715 + projecthelpfilepath_helpfile1 = dir_stream.read(projecthelpfilepath_sizeof_helpfile1)
  1716 + projecthelpfilepath_reserved = struct.unpack("<H", dir_stream.read(2))[0]
  1717 + self.check_value('PROJECTHELPFILEPATH_Reserved', 0x003D, projecthelpfilepath_reserved)
  1718 + projecthelpfilepath_sizeof_helpfile2 = struct.unpack("<L", dir_stream.read(4))[0]
  1719 + if projecthelpfilepath_sizeof_helpfile2 != projecthelpfilepath_sizeof_helpfile1:
  1720 + log.error("PROJECTHELPFILEPATH_SizeOfHelpFile1 does not equal PROJECTHELPFILEPATH_SizeOfHelpFile2")
  1721 + projecthelpfilepath_helpfile2 = dir_stream.read(projecthelpfilepath_sizeof_helpfile2)
  1722 + if projecthelpfilepath_helpfile2 != projecthelpfilepath_helpfile1:
  1723 + log.error("PROJECTHELPFILEPATH_HelpFile1 does not equal PROJECTHELPFILEPATH_HelpFile2")
  1724 +
  1725 + # PROJECTHELPCONTEXT Record
  1726 + projecthelpcontext_id = struct.unpack("<H", dir_stream.read(2))[0]
  1727 + self.check_value('PROJECTHELPCONTEXT_Id', 0x0007, projecthelpcontext_id)
  1728 + projecthelpcontext_size = struct.unpack("<L", dir_stream.read(4))[0]
  1729 + self.check_value('PROJECTHELPCONTEXT_Size', 0x0004, projecthelpcontext_size)
  1730 + projecthelpcontext_helpcontext = struct.unpack("<L", dir_stream.read(4))[0]
  1731 + unused = projecthelpcontext_helpcontext
  1732 +
  1733 + # PROJECTLIBFLAGS Record
  1734 + projectlibflags_id = struct.unpack("<H", dir_stream.read(2))[0]
  1735 + self.check_value('PROJECTLIBFLAGS_Id', 0x0008, projectlibflags_id)
  1736 + projectlibflags_size = struct.unpack("<L", dir_stream.read(4))[0]
  1737 + self.check_value('PROJECTLIBFLAGS_Size', 0x0004, projectlibflags_size)
  1738 + projectlibflags_projectlibflags = struct.unpack("<L", dir_stream.read(4))[0]
  1739 + self.check_value('PROJECTLIBFLAGS_ProjectLibFlags', 0x0000, projectlibflags_projectlibflags)
  1740 +
  1741 + # PROJECTVERSION Record
  1742 + projectversion_id = struct.unpack("<H", dir_stream.read(2))[0]
  1743 + self.check_value('PROJECTVERSION_Id', 0x0009, projectversion_id)
  1744 + projectversion_reserved = struct.unpack("<L", dir_stream.read(4))[0]
  1745 + self.check_value('PROJECTVERSION_Reserved', 0x0004, projectversion_reserved)
  1746 + projectversion_versionmajor = struct.unpack("<L", dir_stream.read(4))[0]
  1747 + projectversion_versionminor = struct.unpack("<H", dir_stream.read(2))[0]
  1748 + unused = projectversion_versionmajor
  1749 + unused = projectversion_versionminor
  1750 +
  1751 + # PROJECTCONSTANTS Record
  1752 + projectconstants_id = struct.unpack("<H", dir_stream.read(2))[0]
  1753 + self.check_value('PROJECTCONSTANTS_Id', 0x000C, projectconstants_id)
  1754 + projectconstants_sizeof_constants = struct.unpack("<L", dir_stream.read(4))[0]
  1755 + if projectconstants_sizeof_constants > 1015:
  1756 + log.error(
  1757 + "PROJECTCONSTANTS_SizeOfConstants value not in range: {0}".format(projectconstants_sizeof_constants))
  1758 + projectconstants_constants = dir_stream.read(projectconstants_sizeof_constants)
  1759 + projectconstants_reserved = struct.unpack("<H", dir_stream.read(2))[0]
  1760 + self.check_value('PROJECTCONSTANTS_Reserved', 0x003C, projectconstants_reserved)
  1761 + projectconstants_sizeof_constants_unicode = struct.unpack("<L", dir_stream.read(4))[0]
  1762 + if projectconstants_sizeof_constants_unicode % 2 != 0:
  1763 + log.error("PROJECTCONSTANTS_SizeOfConstantsUnicode is not even")
  1764 + projectconstants_constants_unicode = dir_stream.read(projectconstants_sizeof_constants_unicode)
  1765 + unused = projectconstants_constants
  1766 + unused = projectconstants_constants_unicode
  1767 +
  1768 + # array of REFERENCE records
  1769 + # Specifies a reference to an Automation type library or VBA project.
  1770 + check = None
  1771 + while True:
  1772 + check = struct.unpack("<H", dir_stream.read(2))[0]
  1773 + log.debug("reference type = {0:04X}".format(check))
  1774 + if check == 0x000F:
  1775 + break
  1776 +
  1777 + if check == 0x0016:
  1778 + # REFERENCENAME
  1779 + # Specifies the name of a referenced VBA project or Automation type library.
  1780 + reference_id = check
  1781 + reference_sizeof_name = struct.unpack("<L", dir_stream.read(4))[0]
  1782 + reference_name = dir_stream.read(reference_sizeof_name)
  1783 + log.debug('REFERENCE name: %s' % unicode2str(self.decode_bytes(reference_name)))
  1784 + reference_reserved = struct.unpack("<H", dir_stream.read(2))[0]
  1785 + # According to [MS-OVBA] 2.3.4.2.2.2 REFERENCENAME Record:
  1786 + # "Reserved (2 bytes): MUST be 0x003E. MUST be ignored."
  1787 + # So let's ignore it, otherwise it crashes on some files (issue #132)
  1788 + # PR #135 by @c1fe:
  1789 + # contrary to the specification I think that the unicode name
  1790 + # is optional. If reference_reserved is not 0x003E, I think it
  1791 + # is actually the start of another REFERENCE record
  1792 + # at least when projectsyskind_syskind == 0x02 (Macintosh)
  1793 + if reference_reserved == 0x003E:
  1794 + #if reference_reserved not in (0x003E, 0x000D):
  1795 + # raise UnexpectedDataError(dir_path, 'REFERENCE_Reserved',
  1796 + # 0x0003E, reference_reserved)
  1797 + reference_sizeof_name_unicode = struct.unpack("<L", dir_stream.read(4))[0]
  1798 + reference_name_unicode = dir_stream.read(reference_sizeof_name_unicode)
  1799 + unused = reference_id
  1800 + unused = reference_name
  1801 + unused = reference_name_unicode
  1802 + continue
  1803 + else:
  1804 + check = reference_reserved
  1805 + log.debug("reference type = {0:04X}".format(check))
  1806 +
  1807 + if check == 0x0033:
  1808 + # REFERENCEORIGINAL (followed by REFERENCECONTROL)
  1809 + # Specifies the identifier of the Automation type library the containing REFERENCECONTROL's
  1810 + # (section 2.3.4.2.2.3) twiddled type library was generated from.
  1811 + referenceoriginal_id = check
  1812 + referenceoriginal_sizeof_libidoriginal = struct.unpack("<L", dir_stream.read(4))[0]
  1813 + referenceoriginal_libidoriginal = dir_stream.read(referenceoriginal_sizeof_libidoriginal)
  1814 + log.debug('REFERENCE original lib id: %s' % unicode2str(self.decode_bytes(referenceoriginal_libidoriginal)))
  1815 + unused = referenceoriginal_id
  1816 + unused = referenceoriginal_libidoriginal
  1817 + continue
  1818 +
  1819 + if check == 0x002F:
  1820 + # REFERENCECONTROL
  1821 + # Specifies a reference to a twiddled type library and its extended type library.
  1822 + referencecontrol_id = check
  1823 + referencecontrol_sizetwiddled = struct.unpack("<L", dir_stream.read(4))[0] # ignore
  1824 + referencecontrol_sizeof_libidtwiddled = struct.unpack("<L", dir_stream.read(4))[0]
  1825 + referencecontrol_libidtwiddled = dir_stream.read(referencecontrol_sizeof_libidtwiddled)
  1826 + log.debug('REFERENCE control twiddled lib id: %s' % unicode2str(self.decode_bytes(referencecontrol_libidtwiddled)))
  1827 + referencecontrol_reserved1 = struct.unpack("<L", dir_stream.read(4))[0] # ignore
  1828 + self.check_value('REFERENCECONTROL_Reserved1', 0x0000, referencecontrol_reserved1)
  1829 + referencecontrol_reserved2 = struct.unpack("<H", dir_stream.read(2))[0] # ignore
  1830 + self.check_value('REFERENCECONTROL_Reserved2', 0x0000, referencecontrol_reserved2)
  1831 + unused = referencecontrol_id
  1832 + unused = referencecontrol_sizetwiddled
  1833 + unused = referencecontrol_libidtwiddled
  1834 + # optional field
  1835 + check2 = struct.unpack("<H", dir_stream.read(2))[0]
  1836 + if check2 == 0x0016:
  1837 + referencecontrol_namerecordextended_id = check2
  1838 + referencecontrol_namerecordextended_sizeof_name = struct.unpack("<L", dir_stream.read(4))[0]
  1839 + referencecontrol_namerecordextended_name = dir_stream.read(
  1840 + referencecontrol_namerecordextended_sizeof_name)
  1841 + log.debug('REFERENCE control name record extended: %s' % unicode2str(
  1842 + self.decode_bytes(referencecontrol_namerecordextended_name)))
  1843 + referencecontrol_namerecordextended_reserved = struct.unpack("<H", dir_stream.read(2))[0]
  1844 + if referencecontrol_namerecordextended_reserved == 0x003E:
  1845 + referencecontrol_namerecordextended_sizeof_name_unicode = struct.unpack("<L", dir_stream.read(4))[0]
  1846 + referencecontrol_namerecordextended_name_unicode = dir_stream.read(
  1847 + referencecontrol_namerecordextended_sizeof_name_unicode)
  1848 + referencecontrol_reserved3 = struct.unpack("<H", dir_stream.read(2))[0]
  1849 + unused = referencecontrol_namerecordextended_id
  1850 + unused = referencecontrol_namerecordextended_name
  1851 + unused = referencecontrol_namerecordextended_name_unicode
  1852 + else:
  1853 + referencecontrol_reserved3 = referencecontrol_namerecordextended_reserved
  1854 + else:
  1855 + referencecontrol_reserved3 = check2
  1856 +
  1857 + self.check_value('REFERENCECONTROL_Reserved3', 0x0030, referencecontrol_reserved3)
  1858 + referencecontrol_sizeextended = struct.unpack("<L", dir_stream.read(4))[0]
  1859 + referencecontrol_sizeof_libidextended = struct.unpack("<L", dir_stream.read(4))[0]
  1860 + referencecontrol_libidextended = dir_stream.read(referencecontrol_sizeof_libidextended)
  1861 + referencecontrol_reserved4 = struct.unpack("<L", dir_stream.read(4))[0]
  1862 + referencecontrol_reserved5 = struct.unpack("<H", dir_stream.read(2))[0]
  1863 + referencecontrol_originaltypelib = dir_stream.read(16)
  1864 + referencecontrol_cookie = struct.unpack("<L", dir_stream.read(4))[0]
  1865 + unused = referencecontrol_sizeextended
  1866 + unused = referencecontrol_libidextended
  1867 + unused = referencecontrol_reserved4
  1868 + unused = referencecontrol_reserved5
  1869 + unused = referencecontrol_originaltypelib
  1870 + unused = referencecontrol_cookie
  1871 + continue
  1872 +
  1873 + if check == 0x000D:
  1874 + # REFERENCEREGISTERED
  1875 + # Specifies a reference to an Automation type library.
  1876 + referenceregistered_id = check
  1877 + referenceregistered_size = struct.unpack("<L", dir_stream.read(4))[0]
  1878 + referenceregistered_sizeof_libid = struct.unpack("<L", dir_stream.read(4))[0]
  1879 + referenceregistered_libid = dir_stream.read(referenceregistered_sizeof_libid)
  1880 + log.debug('REFERENCE registered lib id: %s' % unicode2str(self.decode_bytes(referenceregistered_libid)))
  1881 + referenceregistered_reserved1 = struct.unpack("<L", dir_stream.read(4))[0]
  1882 + self.check_value('REFERENCEREGISTERED_Reserved1', 0x0000, referenceregistered_reserved1)
  1883 + referenceregistered_reserved2 = struct.unpack("<H", dir_stream.read(2))[0]
  1884 + self.check_value('REFERENCEREGISTERED_Reserved2', 0x0000, referenceregistered_reserved2)
  1885 + unused = referenceregistered_id
  1886 + unused = referenceregistered_size
  1887 + unused = referenceregistered_libid
  1888 + continue
  1889 +
  1890 + if check == 0x000E:
  1891 + # REFERENCEPROJECT
  1892 + # Specifies a reference to an external VBA project.
  1893 + referenceproject_id = check
  1894 + referenceproject_size = struct.unpack("<L", dir_stream.read(4))[0]
  1895 + referenceproject_sizeof_libidabsolute = struct.unpack("<L", dir_stream.read(4))[0]
  1896 + referenceproject_libidabsolute = dir_stream.read(referenceproject_sizeof_libidabsolute)
  1897 + log.debug('REFERENCE project lib id absolute: %s' % unicode2str(self.decode_bytes(referenceproject_libidabsolute)))
  1898 + referenceproject_sizeof_libidrelative = struct.unpack("<L", dir_stream.read(4))[0]
  1899 + referenceproject_libidrelative = dir_stream.read(referenceproject_sizeof_libidrelative)
  1900 + log.debug('REFERENCE project lib id relative: %s' % unicode2str(self.decode_bytes(referenceproject_libidrelative)))
  1901 + referenceproject_majorversion = struct.unpack("<L", dir_stream.read(4))[0]
  1902 + referenceproject_minorversion = struct.unpack("<H", dir_stream.read(2))[0]
  1903 + unused = referenceproject_id
  1904 + unused = referenceproject_size
  1905 + unused = referenceproject_libidabsolute
  1906 + unused = referenceproject_libidrelative
  1907 + unused = referenceproject_majorversion
  1908 + unused = referenceproject_minorversion
  1909 + continue
  1910 +
  1911 + log.error('invalid or unknown check Id {0:04X}'.format(check))
  1912 + # raise an exception instead of stopping abruptly (issue #180)
  1913 + raise UnexpectedDataError(dir_path, 'reference type', (0x0F, 0x16, 0x33, 0x2F, 0x0D, 0x0E), check)
  1914 + #sys.exit(0)
  1915 +
  1916 + def check_value(self, name, expected, value):
  1917 + if expected != value:
  1918 + if self.relaxed:
  1919 + log.error("invalid value for {0} expected {1:04X} got {2:04X}"
  1920 + .format(name, expected, value))
  1921 + else:
  1922 + raise UnexpectedDataError(self.dir_path, name, expected, value)
  1923 +
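Most records read from the dir stream above share one layout: a 2-byte Id and a 4-byte Size, both little-endian (`"<H"` and `"<L"` in `struct` notation), followed by the record data. Each field is validated with `check_value`, which only logs an error in relaxed mode and raises otherwise. The sketch below isolates that pattern for the fixed-size 4-byte records such as PROJECTHELPCONTEXT and PROJECTLIBFLAGS. It is a minimal illustration, not olevba code: `read_u32_record` is a hypothetical helper, and `UnexpectedDataError` here is only a stand-in for olevba's class of the same name. Records with a different layout, such as PROJECTVERSION, need their own reading code.

```python
import io
import logging
import struct

log = logging.getLogger(__name__)


class UnexpectedDataError(Exception):
    """Hypothetical stand-in for the olevba exception of the same name."""


def check_value(name, expected, value, relaxed=False):
    # Same policy as VBA_Project.check_value: log in relaxed mode, raise otherwise
    if expected != value:
        if relaxed:
            log.error("invalid value for %s expected %04X got %04X", name, expected, value)
        else:
            raise UnexpectedDataError(name, expected, value)


def read_u32_record(stream, name, expected_id, relaxed=False):
    """Read Id (2 bytes), Size (4 bytes, must be 4) and a 4-byte value, all little-endian."""
    record_id = struct.unpack("<H", stream.read(2))[0]
    check_value(name + '_Id', expected_id, record_id, relaxed)
    size = struct.unpack("<L", stream.read(4))[0]
    check_value(name + '_Size', 0x0004, size, relaxed)
    return struct.unpack("<L", stream.read(4))[0]


# PROJECTHELPCONTEXT (Id 0x0007) followed by PROJECTLIBFLAGS (Id 0x0008)
data = struct.pack("<HLL", 0x0007, 4, 42) + struct.pack("<HLL", 0x0008, 4, 0)
stream = io.BytesIO(data)
help_context = read_u32_record(stream, 'PROJECTHELPCONTEXT', 0x0007)
lib_flags = read_u32_record(stream, 'PROJECTLIBFLAGS', 0x0008)
print(help_context, lib_flags)
```

Reading sequentially from one file-like object keeps the offset bookkeeping implicit, which is also how the dir stream parsing above proceeds.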
  1924 + def parse_project_stream(self):
  1925 + """
  1926 + Parse the PROJECT stream from the VBA project
  1927 + :return: None, the result is stored in self.module_ext
  1928 + """
  1929 + # Open the PROJECT stream:
  1930 + # reference: [MS-OVBA] 2.3.1 PROJECT Stream
  1931 + project_stream = self.ole.openstream(self.project_path)
  1932 +
  1933 + # sample content of the PROJECT stream:
  1934 +
  1935 + ## ID="{5312AC8A-349D-4950-BDD0-49BE3C4DD0F0}"
  1936 + ## Document=ThisDocument/&H00000000
  1937 + ## Module=NewMacros
  1938 + ## Name="Project"
  1939 + ## HelpContextID="0"
  1940 + ## VersionCompatible32="393222000"
  1941 + ## CMG="F1F301E705E705E705E705"
  1942 + ## DPB="8F8D7FE3831F2020202020"
  1943 + ## GC="2D2FDD81E51EE61EE6E1"
  1944 + ##
  1945 + ## [Host Extender Info]
  1946 + ## &H00000001={3832D640-CF90-11CF-8E43-00A0C911005A};VBE;&H00000000
  1947 + ## &H00000002={000209F2-0000-0000-C000-000000000046};Word8.0;&H00000000
  1948 + ##
  1949 + ## [Workspace]
  1950 + ## ThisDocument=22, 29, 339, 477, Z
  1951 + ## NewMacros=-4, 42, 832, 510, C
  1952 +
  1953 + self.module_ext = {}
  1954 +
  1955 + for line in project_stream:
  1956 + line = self.decode_bytes(line)
  1957 + log.debug('PROJECT: %r' % line)
  1958 + line = line.strip()
  1959 + if '=' in line:
  1960 + # split line at the 1st equal sign:
  1961 + name, value = line.split('=', 1)
  1962 + # looking for code modules
  1963 + # add the code module as a key in the dictionary
  1964 + # the value will be the extension needed later
  1965 + # The value is converted to lowercase, to allow case-insensitive matching (issue #3)
  1966 + value = value.lower()
  1967 + if name == 'Document':
  1968 + # split value at the 1st slash, keep 1st part:
  1969 + value = value.split('/', 1)[0]
  1970 + self.module_ext[value] = CLASS_EXTENSION
  1971 + elif name == 'Module':
  1972 + self.module_ext[value] = MODULE_EXTENSION
  1973 + elif name == 'Class':
  1974 + self.module_ext[value] = CLASS_EXTENSION
  1975 + elif name == 'BaseClass':
  1976 + self.module_ext[value] = FORM_EXTENSION
  1977 +
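The PROJECT stream is plain text, so building `module_ext` comes down to splitting each `name=value` line at the first equal sign and mapping the lowercased module name to a file extension. A standalone sketch of the same logic, operating on already decoded lines (no OLE access). The helper `parse_project_lines` is hypothetical, and the three extension constants are assumed values standing in for olevba's `MODULE_EXTENSION`, `CLASS_EXTENSION` and `FORM_EXTENSION`:

```python
# Assumed values for the olevba extension constants
MODULE_EXTENSION = 'bas'
CLASS_EXTENSION = 'cls'
FORM_EXTENSION = 'frm'


def parse_project_lines(lines):
    """Map lowercased module names to file extensions (hypothetical helper)."""
    module_ext = {}
    for line in lines:
        line = line.strip()
        if '=' not in line:
            continue
        # split at the 1st equal sign only
        name, value = line.split('=', 1)
        value = value.lower()
        if name == 'Document':
            # drop the "/&H00000000" suffix
            module_ext[value.split('/', 1)[0]] = CLASS_EXTENSION
        elif name == 'Module':
            module_ext[value] = MODULE_EXTENSION
        elif name == 'Class':
            module_ext[value] = CLASS_EXTENSION
        elif name == 'BaseClass':
            module_ext[value] = FORM_EXTENSION
    return module_ext


sample = '''ID="{5312AC8A-349D-4950-BDD0-49BE3C4DD0F0}"
Document=ThisDocument/&H00000000
Module=NewMacros
Name="Project"

[Workspace]
ThisDocument=22, 29, 339, 477, Z
NewMacros=-4, 42, 832, 510, C'''

print(parse_project_lines(sample.splitlines()))
# {'thisdocument': 'cls', 'newmacros': 'bas'}
```

Lines from the `[Workspace]` and `[Host Extender Info]` sections are ignored because their names never equal one of the four keys.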
  1978 + def parse_modules(self):
  1979 + dir_stream = self.dir_stream
  1980 + # projectmodules_id has already been read by the previous loop = 0x000F
  1981 + # projectmodules_id = check #struct.unpack("<H", dir_stream.read(2))[0]
  1982 + # self.check_value('PROJECTMODULES_Id', 0x000F, projectmodules_id)
  1983 + projectmodules_size = struct.unpack("<L", dir_stream.read(4))[0]
  1984 + self.check_value('PROJECTMODULES_Size', 0x0002, projectmodules_size)
  1985 + self.modules_count = struct.unpack("<H", dir_stream.read(2))[0]
  1986 + _id = struct.unpack("<H", dir_stream.read(2))[0]
  1987 + self.check_value('PROJECTMODULES_ProjectCookieRecord_Id', 0x0013, _id)
  1988 + size = struct.unpack("<L", dir_stream.read(4))[0]
  1989 + self.check_value('PROJECTMODULES_ProjectCookieRecord_Size', 0x0002, size)
  1990 + projectcookierecord_cookie = struct.unpack("<H", dir_stream.read(2))[0]
  1991 + unused = projectcookierecord_cookie
  1992 +
  1993 + log.debug("parsing {0} modules".format(self.modules_count))
  1994 + for module_index in xrange(0, self.modules_count):
  1995 + module = VBA_Module(self, self.dir_stream, module_index=module_index)
  1996 + self.modules.append(module)
  1997 + yield (module.code_path, module.filename_str, module.code_str)
  1998 + _ = unused # make pylint happy: now variable "unused" is being used ;-)
  1999 + return
  2000 +
  2001 + def decode_bytes(self, bytes_string, errors='replace'):
  2002 + """
  2003 + Decode a bytes string to a unicode string, using the project code page
  2004 + :param bytes_string: bytes, bytes string to be decoded
  2005 + :param errors: str, mode to handle unicode conversion errors
  2006 + :return: str/unicode, decoded string
  2007 + """
  2008 + return bytes_string.decode(self.codec, errors=errors)
  2009 +
  2010 +
  2011 +
  2012 +def _extract_vba(ole, vba_root, project_path, dir_path, relaxed=False):
  2013 + """
  2014 + Extract VBA macros from an OleFileIO object.
  2015 + Internal function, do not call directly.
  2016 +
  2017 + vba_root: path to the VBA root storage, containing the VBA storage and the PROJECT stream
  2018 + project_path: path to the PROJECT stream, dir_path: path to the dir stream
  2019 + :param relaxed: If True, only create info/debug log entry if data is not as expected
  2020 + (e.g. opening substream fails); if False, raise an error in this case
  2021 + This is a generator, yielding (stream path, VBA filename, VBA source code) for each VBA code stream
  2022 + """
  2023 + log.debug('relaxed is %s' % relaxed)
  2024 +
  2025 + project = VBA_Project(ole, vba_root, project_path, dir_path, relaxed=relaxed)
  2026 + project.parse_project_stream()
  2027 +
  2028 + for code_path, filename, code_data in project.parse_modules():
  2029 + yield (code_path, filename, code_data)
1816 2030
1817 2031
1818 def vba_collapse_long_lines(vba_code): 2032 def vba_collapse_long_lines(vba_code):
@@ -1824,9 +2038,13 @@ def vba_collapse_long_lines(vba_code): @@ -1824,9 +2038,13 @@ def vba_collapse_long_lines(vba_code):
1824 :return: str, VBA module code with long lines collapsed 2038 :return: str, VBA module code with long lines collapsed
1825 """ 2039 """
1826 # TODO: use a regex instead, to allow whitespaces after the underscore? 2040 # TODO: use a regex instead, to allow whitespaces after the underscore?
1827 - vba_code = vba_code.replace(' _\r\n', ' ')  
1828 - vba_code = vba_code.replace(' _\r', ' ')  
1829 - vba_code = vba_code.replace(' _\n', ' ') 2041 + try:
  2042 + vba_code = vba_code.replace(' _\r\n', ' ')
  2043 + vba_code = vba_code.replace(' _\r', ' ')
  2044 + vba_code = vba_code.replace(' _\n', ' ')
  2045 + except:
  2046 + log.exception('type(vba_code)=%s' % type(vba_code))
  2047 + raise
1830 return vba_code 2048 return vba_code
1831 2049
1832 2050
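The TODO above asks whether a regex could also accept whitespace after the underscore. One possible sketch (an assumption, not what olevba does) matches `' _'`, optional spaces or tabs, then a CRLF, CR or LF line ending. CRLF is listed first in the alternation so that it is consumed as a single line break. `collapse_long_lines_regex` is a hypothetical name:

```python
import re

# ' _' + optional trailing spaces/tabs + one line ending (CRLF tried first)
_CONTINUATION = re.compile(r' _[ \t]*(?:\r\n|\r|\n)')


def collapse_long_lines_regex(vba_code):
    """Join VBA continuation lines, tolerating whitespace after the underscore."""
    return _CONTINUATION.sub(' ', vba_code)


print(repr(collapse_long_lines_regex('x = _  \n1')))          # 'x = 1'
print(repr(collapse_long_lines_regex('Dim s _\r\nAs String')))  # 'Dim s As String'
```

Identifiers containing an underscore, such as `my_var`, are left alone because the pattern requires a space before the underscore and a line ending after it.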
@@ -1875,7 +2093,7 @@ def detect_autoexec(vba_code, obfuscation=None): @@ -1875,7 +2093,7 @@ def detect_autoexec(vba_code, obfuscation=None):
1875 for keyword in keywords: 2093 for keyword in keywords:
1876 #TODO: if keyword is already a compiled regex, use it as-is 2094 #TODO: if keyword is already a compiled regex, use it as-is
1877 # search using regex to detect word boundaries: 2095 # search using regex to detect word boundaries:
1878 - match = re.search(r'(?i)\b' + keyword + r'\b', vba_code) 2096 + match = re.search(r'(?i)\b' + re.escape(keyword) + r'\b', vba_code)
1879 if match: 2097 if match:
1880 #if keyword.lower() in vba_code: 2098 #if keyword.lower() in vba_code:
1881 found_keyword = match.group() 2099 found_keyword = match.group()
@@ -1901,7 +2119,8 @@ def detect_suspicious(vba_code, obfuscation=None): @@ -1901,7 +2119,8 @@ def detect_suspicious(vba_code, obfuscation=None):
1901 for description, keywords in SUSPICIOUS_KEYWORDS.items(): 2119 for description, keywords in SUSPICIOUS_KEYWORDS.items():
1902 for keyword in keywords: 2120 for keyword in keywords:
1903 # search using regex to detect word boundaries: 2121 # search using regex to detect word boundaries:
1904 - match = re.search(r'(?i)\b' + keyword + r'\b', vba_code) 2122 + # note: each keyword must be escaped if it contains special chars such as '\'
  2123 + match = re.search(r'(?i)\b' + re.escape(keyword) + r'\b', vba_code)
1905 if match: 2124 if match:
1906 #if keyword.lower() in vba_code: 2125 #if keyword.lower() in vba_code:
1907 found_keyword = match.group() 2126 found_keyword = match.group()
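Why `re.escape` matters here: the keywords are literal VBA tokens, and some contain regex metacharacters. A minimal sketch using the same pattern as `detect_autoexec` and `detect_suspicious`, wrapped in a hypothetical `find_keyword` helper. Without escaping, the `.` in `Shell.Application` matches any character. Note that `\b` only marks a boundary next to a word character, so a keyword ending with a non-word character (for example `Chr$`) will not match when it is followed by another non-word character such as `(`.

```python
import re


def find_keyword(keyword, vba_code):
    """Case-insensitive whole-word search for a literal keyword (hypothetical helper)."""
    pattern = r'(?i)\b' + re.escape(keyword) + r'\b'
    match = re.search(pattern, vba_code)
    return match.group() if match else None


code = 'Set o = CreateObject("shell.application")'
print(find_keyword('Shell.Application', code))                 # shell.application
print(find_keyword('Shell.Application', 'ShellXApplication'))  # None
# the unescaped pattern gives a false positive on the same input:
print(re.search(r'(?i)\bShell.Application\b', 'ShellXApplication') is not None)  # True
```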
@@ -1909,7 +2128,9 @@ def detect_suspicious(vba_code, obfuscation=None): @@ -1909,7 +2128,9 @@ def detect_suspicious(vba_code, obfuscation=None):
1909 for description, keywords in SUSPICIOUS_KEYWORDS_NOREGEX.items(): 2128 for description, keywords in SUSPICIOUS_KEYWORDS_NOREGEX.items():
1910 for keyword in keywords: 2129 for keyword in keywords:
1911 if keyword.lower() in vba_code: 2130 if keyword.lower() in vba_code:
1912 - results.append((keyword, description + obf_text)) 2131 + # avoid reporting backspace chars out of plain VBA code:
  2132 + if not (keyword == '\b' and obfuscation is not None):
  2133 + results.append((keyword, description + obf_text))
1913 return results 2134 return results
1914 2135
1915 2136
@@ -1947,7 +2168,7 @@ def detect_hex_strings(vba_code): @@ -1947,7 +2168,7 @@ def detect_hex_strings(vba_code):
1947 for match in re_hex_string.finditer(vba_code): 2168 for match in re_hex_string.finditer(vba_code):
1948 value = match.group() 2169 value = match.group()
1949 if value not in found: 2170 if value not in found:
1950 - decoded = binascii.unhexlify(value) 2171 + decoded = bytes2str(binascii.unhexlify(value))
1951 results.append((value, decoded)) 2172 results.append((value, decoded))
1952 found.add(value) 2173 found.add(value)
1953 return results 2174 return results
@@ -1972,7 +2193,7 @@ def detect_base64_strings(vba_code): @@ -1972,7 +2193,7 @@ def detect_base64_strings(vba_code):
1972 # only keep new values and not in the whitelist: 2193 # only keep new values and not in the whitelist:
1973 if value not in found and value.lower() not in BASE64_WHITELIST: 2194 if value not in found and value.lower() not in BASE64_WHITELIST:
1974 try: 2195 try:
1975 - decoded = base64.b64decode(value) 2196 + decoded = bytes2str(base64.b64decode(value))
1976 results.append((value, decoded)) 2197 results.append((value, decoded))
1977 found.add(value) 2198 found.add(value)
1978 except (TypeError, ValueError) as exc: 2199 except (TypeError, ValueError) as exc:
@@ -2000,7 +2221,7 @@ def detect_dridex_strings(vba_code): @@ -2000,7 +2221,7 @@ def detect_dridex_strings(vba_code):
2000 continue 2221 continue
2001 if value not in found: 2222 if value not in found:
2002 try: 2223 try:
2003 - decoded = DridexUrlDecode(value) 2224 + decoded = bytes2str(DridexUrlDecode(value))
2004 results.append((value, decoded)) 2225 results.append((value, decoded))
2005 found.add(value) 2226 found.add(value)
2006 except Exception as exc: 2227 except Exception as exc:
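`bytes2str` is an oletools helper whose definition is not part of this diff. Assuming it converts decoded bytes to the native `str` type (unchanged on Python 2, decoded on Python 3), a minimal stand-in could look like this. The UTF-8 codec and the `errors='replace'` policy are assumptions, not confirmed oletools behavior:

```python
import base64
import binascii
import sys

PYTHON2 = sys.version_info[0] < 3


def bytes2str(bytes_string, encoding='utf8'):
    """Hypothetical stand-in: return a native str from a bytes string."""
    if PYTHON2:
        # on Python 2, bytes and str are the same type
        return bytes_string
    # on Python 3, decode, replacing invalid sequences with U+FFFD
    return bytes_string.decode(encoding, errors='replace')


print(bytes2str(binascii.unhexlify('48656C6C6F')))  # Hello
print(bytes2str(base64.b64decode('SGVsbG8=')))      # Hello
```

Converting at the point of decoding keeps the `(value, decoded)` tuples homogeneous, so callers can concatenate or print them without caring about the Python version.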
@@ -2047,7 +2268,8 @@ def detect_vba_strings(vba_code): @@ -2047,7 +2268,8 @@ def detect_vba_strings(vba_code):
2047 2268
2048 2269
2049 def json2ascii(json_obj, encoding='utf8', errors='replace'): 2270 def json2ascii(json_obj, encoding='utf8', errors='replace'):
2050 - """ ensure there is no unicode in json and all strings are safe to decode 2271 + """
  2272 + ensure there is no unicode in json and all strings are safe to decode
2051 2273
2052 works recursively, decodes and re-encodes every string to/from unicode 2274 works recursively, decodes and re-encodes every string to/from unicode
2053 to ensure there will be no trouble in loading the dumped json output 2275 to ensure there will be no trouble in loading the dumped json output
@@ -2057,20 +2279,32 @@ def json2ascii(json_obj, encoding='utf8', errors='replace'): @@ -2057,20 +2279,32 @@ def json2ascii(json_obj, encoding='utf8', errors='replace'):
2057 elif isinstance(json_obj, (bool, int, float)): 2279 elif isinstance(json_obj, (bool, int, float)):
2058 pass 2280 pass
2059 elif isinstance(json_obj, str): 2281 elif isinstance(json_obj, str):
2060 - # de-code and re-encode  
2061 - dencoded = json_obj.decode(encoding, errors).encode(encoding, errors)  
2062 - if dencoded != json_obj:  
2063 - log.debug('json2ascii: replaced: {0} (len {1})'  
2064 - .format(json_obj, len(json_obj)))  
2065 - log.debug('json2ascii: with: {0} (len {1})'  
2066 - .format(dencoded, len(dencoded)))  
2067 - return dencoded  
2068 - elif isinstance(json_obj, unicode):  
2069 - log.debug('json2ascii: encode unicode: {0}'  
2070 - .format(json_obj.encode(encoding, errors))) 2282 + if PYTHON2:
  2283 + # de-code and re-encode
  2284 + dencoded = json_obj.decode(encoding, errors).encode(encoding, errors)
  2285 + if dencoded != json_obj:
  2286 + log.debug('json2ascii: replaced: {0} (len {1})'
  2287 + .format(json_obj, len(json_obj)))
  2288 + log.debug('json2ascii: with: {0} (len {1})'
  2289 + .format(dencoded, len(dencoded)))
  2290 + return dencoded
  2291 + else:
  2292 + # on Python 3, just keep Unicode strings as-is:
  2293 + return json_obj
  2294 + elif isinstance(json_obj, unicode) and PYTHON2:
  2295 + # On Python 2, encode unicode to bytes:
  2296 + json_obj_bytes = json_obj.encode(encoding, errors)
  2297 + log.debug('json2ascii: encode unicode: {0}'.format(json_obj_bytes))
  2298 + # cannot put original into logger
  2299 + # print 'original: ' json_obj
  2300 + return json_obj_bytes
  2301 + elif isinstance(json_obj, bytes) and not PYTHON2:
  2302 + # On Python 3, decode bytes to unicode str
  2303 + json_obj_str = json_obj.decode(encoding, errors)
  2304 + log.debug('json2ascii: encode unicode: {0}'.format(json_obj_str))
2071 # cannot put original into logger 2305 # cannot put original into logger
2072 # print 'original: ' json_obj 2306 # print 'original: ' json_obj
2073 - return json_obj.encode(encoding, errors) 2307 + return json_obj_str
2074 elif isinstance(json_obj, dict): 2308 elif isinstance(json_obj, dict):
2075 for key in json_obj: 2309 for key in json_obj:
2076 json_obj[key] = json2ascii(json_obj[key]) 2310 json_obj[key] = json2ascii(json_obj[key])
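On Python 3, the branches of `json2ascii` reduce to one rule: decode any bytes into `str` and leave other scalars alone, so that `json.dumps` accepts the result. A Python 3-only sketch of that rule. Unlike `json2ascii`, this hypothetical `json_to_text` returns new containers instead of updating dicts in place, and it does not convert dict keys:

```python
import json


def json_to_text(obj, encoding='utf8', errors='replace'):
    """Recursively decode bytes values so that json.dumps() accepts them (Python 3)."""
    if isinstance(obj, bytes):
        return obj.decode(encoding, errors)
    if isinstance(obj, dict):
        return {key: json_to_text(value, encoding, errors) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [json_to_text(item, encoding, errors) for item in obj]
    # str, int, float, bool and None are already JSON-serializable
    return obj


result = json_to_text({'code': b'Sub AutoOpen()', 'strings': [b'\xff', 42]})
print(json.dumps(result))
```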
@@ -2096,7 +2330,6 @@ def print_json(json_dict=None, _json_is_first=False, _json_is_last=False, @@ -2096,7 +2330,6 @@ def print_json(json_dict=None, _json_is_first=False, _json_is_last=False,
2096 :param bool _json_is_last: set to True only for very last entry to complete 2330 :param bool _json_is_last: set to True only for very last entry to complete
2097 the top-level json-list 2331 the top-level json-list
2098 """ 2332 """
2099 -  
2100 if json_dict and json_parts: 2333 if json_dict and json_parts:
2101 raise ValueError('Invalid json argument: want either single dict or ' 2334 raise ValueError('Invalid json argument: want either single dict or '
2102 'key=value parts but got both)') 2335 'key=value parts but got both)')
@@ -2177,7 +2410,7 @@ class VBA_Scanner(object): @@ -2177,7 +2410,7 @@ class VBA_Scanner(object):
2177 # StrReverse after hex decoding: 2410 # StrReverse after hex decoding:
2178 self.code_hex_rev += '\n' + decoded[::-1] 2411 self.code_hex_rev += '\n' + decoded[::-1]
2179 # StrReverse before hex decoding: 2412 # StrReverse before hex decoding:
2180 - self.code_rev_hex += '\n' + binascii.unhexlify(encoded[::-1]) 2413 + self.code_rev_hex += '\n' + bytes2str(binascii.unhexlify(encoded[::-1]))
2181 #example: https://malwr.com/analysis/NmFlMGI4YTY1YzYyNDkwNTg1ZTBiZmY5OGI3YjlhYzU/ 2414 #example: https://malwr.com/analysis/NmFlMGI4YTY1YzYyNDkwNTg1ZTBiZmY5OGI3YjlhYzU/
2182 #TODO: also append the full code reversed if StrReverse? (risk of false positives?) 2415 #TODO: also append the full code reversed if StrReverse? (risk of false positives?)
2183 # Detect Base64-encoded strings 2416 # Detect Base64-encoded strings
@@ -2287,7 +2520,7 @@ def scan_vba(vba_code, include_decoded_strings, deobfuscate=False): @@ -2287,7 +2520,7 @@ def scan_vba(vba_code, include_decoded_strings, deobfuscate=False):
2287 :param include_decoded_strings: bool, if True all encoded strings will be included with their decoded content. 2520 :param include_decoded_strings: bool, if True all encoded strings will be included with their decoded content.
2288 :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow) 2521 :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
2289 :return: list of tuples (type, keyword, description) 2522 :return: list of tuples (type, keyword, description)
2290 - (type = 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String' or 'Dridex String') 2523 + with type = 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String' or 'Dridex String'
2291 """ 2524 """
2292 return VBA_Scanner(vba_code).scan(include_decoded_strings, deobfuscate) 2525 return VBA_Scanner(vba_code).scan(include_decoded_strings, deobfuscate)
2293 2526
@@ -2297,44 +2530,38 @@ def scan_vba(vba_code, include_decoded_strings, deobfuscate=False): @@ -2297,44 +2530,38 @@ def scan_vba(vba_code, include_decoded_strings, deobfuscate=False):
2297 class VBA_Parser(object): 2530 class VBA_Parser(object):
2298 """ 2531 """
2299 Class to parse MS Office files, to detect VBA macros and extract VBA source code 2532 Class to parse MS Office files, to detect VBA macros and extract VBA source code
2300 - Supported file formats:  
2301 - - Word 97-2003 (.doc, .dot)  
2302 - - Word 2007+ (.docm, .dotm)  
2303 - - Word 2003 XML (.xml)  
2304 - - Word MHT - Single File Web Page / MHTML (.mht)  
2305 - - Excel 97-2003 (.xls)  
2306 - - Excel 2007+ (.xlsm, .xlsb)  
2307 - - PowerPoint 97-2003 (.ppt)  
2308 - - PowerPoint 2007+ (.pptm, .ppsm)  
2309 """ 2533 """
2310 2534
2311 - def __init__(self, filename, data=None, container=None, relaxed=False): 2535 + def __init__(self, filename, data=None, container=None, relaxed=False, encoding=DEFAULT_API_ENCODING):
2312 """ 2536 """
2313 Constructor for VBA_Parser 2537 Constructor for VBA_Parser
2314 2538
2315 - :param filename: filename or path of file to parse, or file-like object 2539 + :param str filename: filename or path of file to parse, or file-like object
2316 2540
2317 - :param data: None or bytes str, if None the file will be read from disk (or from the file-like object).  
2318 - If data is provided as a bytes string, it will be parsed as the content of the file in memory,  
2319 - and not read from disk. Note: files must be read in binary mode, i.e. open(f, 'rb'). 2541 + :param bytes data: None or bytes string; if None, the file will be read from disk (or from the file-like object).
  2542 + If data is provided as a bytes string, it will be parsed as the content of the file in memory,
  2543 + and not read from disk. Note: files must be read in binary mode, i.e. open(f, 'rb').
2320 2544
2321 - :param container: str, path and filename of container if the file is within  
2322 - a zip archive, None otherwise. 2545 + :param str container: path and filename of container if the file is within
  2546 + a zip archive, None otherwise.
2323 2547
2324 - :param relaxed: if True, treat mal-formed documents and missing streams more like MS office:  
2325 - do nothing; if False (default), raise errors in these cases 2548 + :param bool relaxed: if True, treat mal-formed documents and missing streams more like MS office:
  2549 + do nothing; if False (default), raise errors in these cases
2326 2550
2327 - raises a FileOpenError if all attemps to interpret the data header failed 2551 + :param str encoding: encoding for VBA source code and strings.
  2552 + Default: UTF-8 bytes strings on Python 2, unicode strings on Python 3 (None)
  2553 +
  2554 + raises a FileOpenError if all attempts to interpret the data header failed.
2328 """ 2555 """
2329 - #TODO: filename should only be a string, data should be used for the file-like object  
2330 - #TODO: filename should be mandatory, optional data is a string or file-like object  
2331 - #TODO: also support olefile and zipfile as input 2556 + # TODO: filename should only be a string, data should be used for the file-like object
  2557 + # TODO: filename should be mandatory, optional data is a string or file-like object
  2558 + # TODO: also support olefile and zipfile as input
2332 if data is None: 2559 if data is None:
2333 # open file from disk: 2560 # open file from disk:
2334 _file = filename 2561 _file = filename
2335 else: 2562 else:
2336 # file already read in memory, make it a file-like object for zipfile: 2563 # file already read in memory, make it a file-like object for zipfile:
2337 - _file = StringIO(data) 2564 + _file = BytesIO(data)
2338 #self.file = _file 2565 #self.file = _file
2339 self.ole_file = None 2566 self.ole_file = None
2340 self.ole_subfiles = [] 2567 self.ole_subfiles = []
@@ -2359,6 +2586,13 @@ class VBA_Parser(object): @@ -2359,6 +2586,13 @@ class VBA_Parser(object):
2359 self.nb_base64strings = 0 2586 self.nb_base64strings = 0
2360 self.nb_dridexstrings = 0 2587 self.nb_dridexstrings = 0
2361 self.nb_vbastrings = 0 2588 self.nb_vbastrings = 0
  2589 + #: Encoding for VBA source code and strings returned by all methods
  2590 + self.encoding = encoding
  2591 + self.xlm_macros = []
  2592 + #: Output from pcodedmp, disassembly of the VBA P-code
  2593 + self.pcodedmp_output = None
  2594 + #: Flag set to True/False if VBA stomping detected
  2595 + self.vba_stomping_detected = None
2362 2596
2363 # if filename is None: 2597 # if filename is None:
2364 # if isinstance(_file, basestring): 2598 # if isinstance(_file, basestring):
@@ -2372,15 +2606,9 @@ class VBA_Parser(object): @@ -2372,15 +2606,9 @@ class VBA_Parser(object):
2372 # This looks like an OLE file 2606 # This looks like an OLE file
2373 self.open_ole(_file) 2607 self.open_ole(_file)
2374 2608
2375 - # check whether file is encrypted (need to do this before try ppt)  
2376 - log.debug('Check encryption of ole file')  
2377 - crypt_indicator = oleid.OleID(self.ole_file).check_encrypted()  
2378 - if crypt_indicator.value:  
2379 - raise FileIsEncryptedError(filename)  
2380 -  
2381 # if this worked, try whether it is a ppt file (special ole file) 2609 # if this worked, try whether it is a ppt file (special ole file)
2382 self.open_ppt() 2610 self.open_ppt()
2383 - if self.type is None and is_zipfile(_file): 2611 + if self.type is None and zipfile.is_zipfile(_file):
2384 # Zip file, which may be an OpenXML document 2612 # Zip file, which may be an OpenXML document
2385 self.open_openxml(_file) 2613 self.open_openxml(_file)
2386 if self.type is None: 2614 if self.type is None:
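The switch from `StringIO` to `BytesIO` matters on Python 3, where `io.StringIO` only holds text while `zipfile` needs a binary file-like object. `zipfile.is_zipfile` accepts such an object as well as a path. A small sketch with a hypothetical `looks_like_zip` helper:

```python
import io
import zipfile


def looks_like_zip(data):
    """Check in-memory bytes for a zip structure, e.g. an OpenXML document (hypothetical helper)."""
    return zipfile.is_zipfile(io.BytesIO(data))


buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as zf:
    zf.writestr('[Content_Types].xml', '<Types/>')

ole_like = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1' + b'\x00' * 100  # OLE magic, not a zip

print(looks_like_zip(buf.getvalue()))  # True
print(looks_like_zip(ole_like))        # False
```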
@@ -2600,12 +2828,12 @@ class VBA_Parser(object): @@ -2600,12 +2828,12 @@ class VBA_Parser(object):
2600 try: 2828 try:
2601 # parse the MIME content 2829 # parse the MIME content
2602 # remove any leading whitespace or newline (workaround for issue in email package) 2830 # remove any leading whitespace or newline (workaround for issue in email package)
2603 - stripped_data = data.lstrip('\r\n\t ') 2831 + stripped_data = data.lstrip(b'\r\n\t ')
2604 # strip any junk from the beginning of the file 2832 # strip any junk from the beginning of the file
2605 # (issue #31 fix by Greg C - gdigreg) 2833 # (issue #31 fix by Greg C - gdigreg)
2606 # TODO: improve keywords to avoid false positives 2834 # TODO: improve keywords to avoid false positives
2607 - mime_offset = stripped_data.find('MIME')  
2608 - content_offset = stripped_data.find('Content') 2835 + mime_offset = stripped_data.find(b'MIME')
  2836 + content_offset = stripped_data.find(b'Content')
2609 # if "MIME" is found, and located before "Content": 2837 # if "MIME" is found, and located before "Content":
2610 if -1 < mime_offset <= content_offset: 2838 if -1 < mime_offset <= content_offset:
2611 stripped_data = stripped_data[mime_offset:] 2839 stripped_data = stripped_data[mime_offset:]
@@ -2614,7 +2842,11 @@ class VBA_Parser(object): @@ -2614,7 +2842,11 @@ class VBA_Parser(object):
2614 elif content_offset > -1: 2842 elif content_offset > -1:
2615 stripped_data = stripped_data[content_offset:] 2843 stripped_data = stripped_data[content_offset:]
2616 # TODO: quick and dirty fix: insert a standard line with MIME-Version header? 2844 # TODO: quick and dirty fix: insert a standard line with MIME-Version header?
2617 - mhtml = email.message_from_string(stripped_data) 2845 + if PYTHON2:
  2846 + mhtml = email.message_from_string(stripped_data)
  2847 + else:
  2848 + # on Python 3, need to use message_from_bytes instead:
  2849 + mhtml = email.message_from_bytes(stripped_data)
2618 # find all the attached files: 2850 # find all the attached files:
2619 for part in mhtml.walk(): 2851 for part in mhtml.walk():
2620 content_type = part.get_content_type() # always returns a value 2852 content_type = part.get_content_type() # always returns a value
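On Python 3 the file content is `bytes`, so both the keyword search (`b'MIME'`, `b'Content'`) and the parser call have to work on bytes. This is why the two hunks above switch to bytes literals and to `email.message_from_bytes`. Below is a minimal standalone sketch of the same strip-then-parse logic using only the standard library. The helper name `parse_mht_bytes` and the sample message are illustrative and are not part of olevba.

```python
import email
from email.message import Message


def parse_mht_bytes(data: bytes) -> Message:
    """Parse MHTML content given as raw bytes (Python 3 sketch)."""
    # Remove leading whitespace/newlines: the argument must be bytes, not str.
    stripped = data.lstrip(b'\r\n\t ')
    mime_offset = stripped.find(b'MIME')
    content_offset = stripped.find(b'Content')
    # Skip any junk before the first header keyword, as in the hunk above.
    if -1 < mime_offset <= content_offset:
        stripped = stripped[mime_offset:]
    elif content_offset > -1:
        stripped = stripped[content_offset:]
    # message_from_bytes accepts bytes; message_from_string would need str.
    return email.message_from_bytes(stripped)


sample = (b'\r\n junk MIME-Version: 1.0\r\n'
          b'Content-Type: text/plain\r\n\r\nhello\r\n')
msg = parse_mht_bytes(sample)
print(msg.get_content_type())
```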
@@ -2627,7 +2859,7 @@ class VBA_Parser(object): @@ -2627,7 +2859,7 @@ class VBA_Parser(object):
2627 # using the ActiveMime/MSO format (zlib-compressed), and Base64 encoded. 2859 # using the ActiveMime/MSO format (zlib-compressed), and Base64 encoded.
2628 # decompress the zlib data starting at offset 0x32, which is the OLE container: 2860 # decompress the zlib data starting at offset 0x32, which is the OLE container:
2629 # check ActiveMime header: 2861 # check ActiveMime header:
2630 - if isinstance(part_data, str) and is_mso_file(part_data): 2862 + if isinstance(part_data, bytes) and is_mso_file(part_data):
2631 log.debug('Found ActiveMime header, decompressing MSO container') 2863 log.debug('Found ActiveMime header, decompressing MSO container')
2632 try: 2864 try:
2633 ole_data = mso_file_extract(part_data) 2865 ole_data = mso_file_extract(part_data)
@@ -2697,7 +2929,9 @@ class VBA_Parser(object): @@ -2697,7 +2929,9 @@ class VBA_Parser(object):
2697 """ 2929 """
2698 log.info('Opening text file %s' % self.filename) 2930 log.info('Opening text file %s' % self.filename)
2699 # directly store the source code: 2931 # directly store the source code:
2700 - self.vba_code_all_modules = data 2932 + # On Python 2, store it as a raw bytes string
  2933 + # On Python 3, convert it to unicode assuming it was encoded with UTF-8
  2934 + self.vba_code_all_modules = bytes2str(data)
2701 self.contains_macros = True 2935 self.contains_macros = True
2702 # set type only if parsing succeeds 2936 # set type only if parsing succeeds
2703 self.type = TYPE_TEXT 2937 self.type = TYPE_TEXT
@@ -2853,7 +3087,7 @@ class VBA_Parser(object): @@ -2853,7 +3087,7 @@ class VBA_Parser(object):
2853 log.debug('%r...[much more data]...%r' % (data[:100], data[-50:])) 3087 log.debug('%r...[much more data]...%r' % (data[:100], data[-50:]))
2854 else: 3088 else:
2855 log.debug(repr(data)) 3089 log.debug(repr(data))
2856 - if 'Attribut\x00' in data: 3090 + if b'Attribut\x00' in data:
2857 log.debug('Found VBA compressed code') 3091 log.debug('Found VBA compressed code')
2858 self.contains_macros = True 3092 self.contains_macros = True
2859 except IOError as exc: 3093 except IOError as exc:
@@ -2862,8 +3096,44 @@ class VBA_Parser(object): @@ -2862,8 +3096,44 @@ class VBA_Parser(object):
2862 log.debug('Trace:', exc_trace=True) 3096 log.debug('Trace:', exc_trace=True)
2863 else: 3097 else:
2864 raise SubstreamOpenError(self.filename, d.name, exc) 3098 raise SubstreamOpenError(self.filename, d.name, exc)
  3099 + if self.detect_xlm_macros():
  3100 + self.contains_macros = True
2865 return self.contains_macros 3101 return self.contains_macros
2866 3102
  3103 + def detect_xlm_macros(self):
  3104 + from oletools.thirdparty.oledump.plugin_biff import cBIFF
  3105 + self.xlm_macros = []
  3106 + if self.ole_file is None:
  3107 + return False
  3108 + for excel_stream in ('Workbook', 'Book'):
  3109 + if self.ole_file.exists(excel_stream):
  3110 + log.debug('Found Excel stream %r' % excel_stream)
  3111 + data = self.ole_file.openstream(excel_stream).read()
  3112 + log.debug('Running BIFF plugin from oledump')
  3113 + try:
  3114 + biff_plugin = cBIFF(name=[excel_stream], stream=data, options='-x')
  3115 + self.xlm_macros = biff_plugin.Analyze()
  3116 + if len(self.xlm_macros)>0:
  3117 + log.debug('Found XLM macros')
  3118 + return True
  3119 + except:
  3120 + log.exception('Error when running oledump.plugin_biff, please report to %s' % URL_OLEVBA_ISSUES)
  3121 + return False
  3122 +
  3123 +
  3124 + def encode_string(self, unicode_str):
  3125 + """
  3126 + Encode a unicode string to bytes or str, using the specified encoding
  3127 + for the VBA_parser. By default, it will be bytes/UTF-8 on Python 2, and
  3128 + a normal unicode string on Python 3.
  3129 + :param str unicode_str: string to be encoded
  3130 + :return: encoded string
  3131 + """
  3132 + if self.encoding is None:
  3133 + return unicode_str
  3134 + else:
  3135 + return unicode_str.encode(self.encoding, errors='replace')
  3136 +
2867 def extract_macros(self): 3137 def extract_macros(self):
2868 """ 3138 """
2869 Extract and decompress source code for each VBA macro found in the file 3139 Extract and decompress source code for each VBA macro found in the file
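The new `encode_string` method only has two branches. With `encoding=None` (the Python 3 default described in its docstring), the unicode string is returned unchanged. Otherwise the string is encoded, and any character the codec cannot represent is replaced. The sketch below reproduces that behaviour as a free function for illustration. The function signature here is an assumption, because in olevba it is a method reading `self.encoding`.

```python
from typing import Optional, Union


def encode_string(unicode_str: str, encoding: Optional[str]) -> Union[str, bytes]:
    """Free-function mirror of VBA_Parser.encode_string (sketch)."""
    # No encoding configured: keep a normal unicode str.
    if encoding is None:
        return unicode_str
    # Otherwise encode, replacing characters the codec cannot represent.
    return unicode_str.encode(encoding, errors='replace')


print(encode_string('Sub Auto_Open()', None))
print(encode_string('caf\u00e9', 'ascii'))
```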
@@ -2920,18 +3190,33 @@ class VBA_Parser(object): @@ -2920,18 +3190,33 @@ class VBA_Parser(object):
2920 # read data 3190 # read data
2921 log.debug('Reading data from stream %r' % d.name) 3191 log.debug('Reading data from stream %r' % d.name)
2922 data = ole._open(d.isectStart, d.size).read() 3192 data = ole._open(d.isectStart, d.size).read()
2923 - for match in re.finditer(r'\x00Attribut[^e]', data, flags=re.IGNORECASE): 3193 + for match in re.finditer(b'\\x00Attribut[^e]', data, flags=re.IGNORECASE):
2924 start = match.start() - 3 3194 start = match.start() - 3
2925 log.debug('Found VBA compressed code at index %X' % start) 3195 log.debug('Found VBA compressed code at index %X' % start)
2926 compressed_code = data[start:] 3196 compressed_code = data[start:]
2927 try: 3197 try:
2928 - vba_code = decompress_stream(compressed_code) 3198 + vba_code = decompress_stream(bytearray(compressed_code))
  3199 + # TODO vba_code = self.encode_string(vba_code)
2929 yield (self.filename, d.name, d.name, vba_code) 3200 yield (self.filename, d.name, d.name, vba_code)
2930 except Exception as exc: 3201 except Exception as exc:
2931 # display the exception with full stack trace for debugging 3202 # display the exception with full stack trace for debugging
2932 log.debug('Error processing stream %r in file %r (%s)' % (d.name, self.filename, exc)) 3203 log.debug('Error processing stream %r in file %r (%s)' % (d.name, self.filename, exc))
2933 log.debug('Traceback:', exc_info=True) 3204 log.debug('Traceback:', exc_info=True)
2934 # do not raise the error, as it is unlikely to be a compressed macro stream 3205 # do not raise the error, as it is unlikely to be a compressed macro stream
  3206 + if self.xlm_macros:
  3207 + vba_code = ''
  3208 + for line in self.xlm_macros:
  3209 + vba_code += "' " + line + '\n'
  3210 + yield ('xlm_macro', 'xlm_macro', 'xlm_macro.txt', vba_code)
  3211 + # Analyse the VBA P-code to detect VBA stomping:
  3212 + # If stomping is detected, add a fake VBA module with the P-code as source comments
  3213 + # so that VBA_Scanner can find keywords and IOCs in it
  3214 + if self.detect_vba_stomping():
  3215 + vba_code = ''
  3216 + for line in self.pcodedmp_output.splitlines():
  3217 + vba_code += "' " + line + '\n'
  3218 + yield ('VBA P-code', 'VBA P-code', 'VBA_P-code.txt', vba_code)
  3219 +
2935 3220
2936 def extract_all_macros(self): 3221 def extract_all_macros(self):
2937 """ 3222 """
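The hunk above uses two small techniques. The first is a bytes regex pattern: on Python 3, `re` cannot apply a str pattern to bytes data. The second turns XLM macros and P-code into VBA comment lines, so that `VBA_Scanner` can search them like any other module. Both are sketched below. The helper names are hypothetical.

The explanation for the `- 3` offset is based on my reading of the MS-OVBA compression layout: one signature byte, then a 2-byte chunk header, then a flag byte `\x00` before the first eight literal bytes. It is an assumption and is not stated in the diff.

```python
import re

# Bytes pattern: the doubled backslash leaves "\x00" for the regex engine.
# [^e] rejects plain-text "Attribute", keeping only compressed streams.
RE_ATTRIBUT = re.compile(b'\\x00Attribut[^e]', flags=re.IGNORECASE)


def find_compressed_vba_offsets(data: bytes) -> list:
    """Return candidate start offsets of compressed VBA code (sketch).

    Assumed layout: signature byte + 2-byte chunk header precede the
    matched flag byte, hence the start is 3 bytes before the match.
    """
    return [m.start() - 3 for m in RE_ATTRIBUT.finditer(data)]


def lines_as_vba_comments(lines) -> str:
    """Prefix each line with "' " so it reads as a VBA comment."""
    return ''.join("' " + line + '\n' for line in lines)


blob = b'\x01\xaa\xb1\x00Attribut\x00e VB_Name'
print(find_compressed_vba_offsets(blob))
print(lines_as_vba_comments(['EXEC("calc")', 'HALT()']), end='')
```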
@@ -2953,6 +3238,8 @@ class VBA_Parser(object): @@ -2953,6 +3238,8 @@ class VBA_Parser(object):
2953 """ 3238 """
2954 runs extract_macros and analyze the source code of all VBA macros 3239 runs extract_macros and analyze the source code of all VBA macros
2955 found in the file. 3240 found in the file.
  3241 + All results are stored in self.analysis_results.
  3242 + If called more than once, simply returns the previous results.
2956 """ 3243 """
2957 if self.detect_vba_macros(): 3244 if self.detect_vba_macros():
2958 # if the analysis was already done, avoid doing it twice: 3245 # if the analysis was already done, avoid doing it twice:
@@ -2969,6 +3256,13 @@ class VBA_Parser(object): @@ -2969,6 +3256,13 @@ class VBA_Parser(object):
2969 # Analyze the whole code at once: 3256 # Analyze the whole code at once:
2970 scanner = VBA_Scanner(self.vba_code_all_modules) 3257 scanner = VBA_Scanner(self.vba_code_all_modules)
2971 self.analysis_results = scanner.scan(show_decoded_strings, deobfuscate) 3258 self.analysis_results = scanner.scan(show_decoded_strings, deobfuscate)
  3259 + if self.detect_vba_stomping():
  3260 + log.debug('adding VBA stomping to suspicious keywords')
  3261 + keyword = 'VBA Stomping'
  3262 + description = 'VBA Stomping was detected: the VBA source code and P-code are different, '\
  3263 + 'this may have been used to hide malicious code'
  3264 + scanner.suspicious_keywords.append((keyword, description))
  3265 + scanner.results.append(('Suspicious', keyword, description))
2972 autoexec, suspicious, iocs, hexstrings, base64strings, dridex, vbastrings = scanner.scan_summary() 3266 autoexec, suspicious, iocs, hexstrings, base64strings, dridex, vbastrings = scanner.scan_summary()
2973 self.nb_autoexec += autoexec 3267 self.nb_autoexec += autoexec
2974 self.nb_suspicious += suspicious 3268 self.nb_suspicious += suspicious
@@ -3080,11 +3374,12 @@ class VBA_Parser(object): @@ -3080,11 +3374,12 @@ class VBA_Parser(object):
3080 """ 3374 """
3081 Extract printable strings from each VBA Form found in the file 3375 Extract printable strings from each VBA Form found in the file
3082 3376
3083 - Iterator: yields (filename, stream_path, vba_filename, vba_code) for each VBA macro found 3377 + Iterator: yields (filename, stream_path, form_string) for each printable string found in forms
3084 If the file is OLE, filename is the path of the file. 3378 If the file is OLE, filename is the path of the file.
3085 If the file is OpenXML, filename is the path of the OLE subfile containing VBA macros 3379 If the file is OpenXML, filename is the path of the OLE subfile containing VBA macros
3086 within the zip archive, e.g. word/vbaProject.bin. 3380 within the zip archive, e.g. word/vbaProject.bin.
3087 If the file is PPT, result is as for OpenXML but filename is useless 3381 If the file is PPT, result is as for OpenXML but filename is useless
  3382 + Note: form_string is a raw bytes string on Python 2, a unicode str on Python 3
3088 """ 3383 """
3089 if self.ole_file is None: 3384 if self.ole_file is None:
3090 # This may be either an OpenXML/PPT or a text file: 3385 # This may be either an OpenXML/PPT or a text file:
@@ -3107,7 +3402,13 @@ class VBA_Parser(object): @@ -3107,7 +3402,13 @@ class VBA_Parser(object):
3107 # Extract printable strings from the form object stream "o": 3402 # Extract printable strings from the form object stream "o":
3108 for m in re_printable_string.finditer(form_data): 3403 for m in re_printable_string.finditer(form_data):
3109 log.debug('Printable string found in form: %r' % m.group()) 3404 log.debug('Printable string found in form: %r' % m.group())
3110 - yield (self.filename, '/'.join(o_stream), m.group()) 3405 + # On Python 3, convert bytes string to unicode str:
  3406 + if PYTHON2:
  3407 + found_str = m.group()
  3408 + else:
  3409 + found_str = m.group().decode('utf8', errors='replace')
  3410 + if found_str != 'Tahoma':
  3411 + yield (self.filename, '/'.join(o_stream), found_str)
3111 3412
3112 def extract_form_strings_extended(self): 3413 def extract_form_strings_extended(self):
3113 if self.ole_file is None: 3414 if self.ole_file is None:
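In the form-string hunk above, matches are still found on raw bytes. They are then decoded to `str` on Python 3 with `errors='replace'`, so that undecodable bytes cannot raise an exception. The font name `Tahoma` is dropped, presumably because it is the default UserForm font and only adds noise. The sketch below shows the Python 3 path. The printable-string pattern is an assumption, because `re_printable_string` is not defined in this part of the diff.

```python
import re

# Assumed pattern: runs of at least 4 printable ASCII bytes.
# The real re_printable_string in olevba may differ.
RE_PRINTABLE = re.compile(rb'[\x20-\x7e]{4,}')


def form_strings(form_data: bytes):
    """Yield printable strings from a form stream as str (Python 3 sketch)."""
    for m in RE_PRINTABLE.finditer(form_data):
        found = m.group().decode('utf8', errors='replace')
        # Skip the font name, as the hunk above does.
        if found != 'Tahoma':
            yield found


data = b'\x00\x01Tahoma\x00\x02http://evil.example/x\x00ab\x00'
print(list(form_strings(data)))
```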
@@ -3128,6 +3429,136 @@ class VBA_Parser(object): @@ -3128,6 +3429,136 @@ class VBA_Parser(object):
3128 for variable in oleform.extract_OleFormVariables(ole, form_storage): 3429 for variable in oleform.extract_OleFormVariables(ole, form_storage):
3129 yield (self.filename, '/'.join(form_storage), variable) 3430 yield (self.filename, '/'.join(form_storage), variable)
3130 3431
  3432 + def extract_pcode(self):
  3433 + """
  3434 + Extract and disassemble the VBA P-code, using pcodedmp
  3435 +
  3436 + :return: VBA P-code disassembly
  3437 + :rtype: str
  3438 + """
  3439 + # only run it once:
  3440 + if self.pcodedmp_output is None:
  3441 + log.debug('Calling pcodedmp to extract and disassemble the VBA P-code')
  3442 + # import pcodedmp here to avoid circular imports:
  3443 + try:
  3444 + from pcodedmp import pcodedmp
  3445 + except Exception as e:
  3446 + # This may happen with Pypy, because pcodedmp imports win_unicode_console...
  3447 + # TODO: this is a workaround, we just ignore P-code
  3448 + # TODO: here we just use log.info, because the word "error" in the output makes some of the tests fail...
  3449 + log.info('Exception when importing pcodedmp: {}'.format(e))
  3450 + self.pcodedmp_output = ''
  3451 + return ''
  3452 + # logging is disabled after importing pcodedmp, need to re-enable it
  3453 + # This is because pcodedmp imports olevba again :-/
  3454 + # TODO: here it works only if logging was enabled, need to change pcodedmp!
  3455 + enable_logging()
  3456 + # pcodedmp prints all its output to sys.stdout, so we need to capture it so that
  3457 + # we can process the results later on.
  3458 + # save sys.stdout, then modify it to capture pcodedmp's output:
  3459 + # stdout = sys.stdout
  3460 + if PYTHON2:
  3461 + # on Python 2, console output is bytes
  3462 + output = BytesIO()
  3463 + else:
  3464 + # on Python 3, console output is unicode
  3465 + output = StringIO()
  3466 + # sys.stdout = output
  3467 + # we need to fake an argparser for those two args used by pcodedmp:
  3468 + class args:
  3469 + disasmOnly = True
  3470 + verbose = False
  3471 + try:
  3472 + # TODO: handle files in memory too
  3473 + log.debug('before pcodedmp')
  3474 + pcodedmp.processFile(self.filename, args, output_file=output)
  3475 + log.debug('after pcodedmp')
  3476 + except Exception as e:
  3477 + # print('Error while running pcodedmp: {}'.format(e), file=sys.stderr, flush=True)
  3478 + # set sys.stdout back to its original value
  3479 + # sys.stdout = stdout
  3480 + log.exception('Error while running pcodedmp')
  3481 + # finally:
  3482 + # # set sys.stdout back to its original value
  3483 + # sys.stdout = stdout
  3484 + self.pcodedmp_output = output.getvalue()
  3485 + # print(self.pcodedmp_output)
  3486 + # log.debug(self.pcodedmp_output)
  3487 + return self.pcodedmp_output
  3488 +
  3489 + def detect_vba_stomping(self):
  3490 + """
  3491 + Detect VBA stomping, by comparing the keywords present in the P-code and
  3492 + in the VBA source code.
  3493 +
  3494 + :return: True if VBA stomping detected, False otherwise
  3495 + :rtype: bool
  3496 + """
  3497 + # only run it once:
  3498 + if self.vba_stomping_detected is None:
  3499 + log.debug('Analysing the P-code to detect VBA stomping')
  3500 + self.extract_pcode()
  3501 + # print('pcodedmp OK')
  3502 + log.debug('pcodedmp OK')
  3503 + # process the output to extract keywords, to detect VBA stomping
  3504 + keywords = set()
  3505 + for line in self.pcodedmp_output.splitlines():
  3506 + if line.startswith('\t'):
  3507 + log.debug('P-code: ' + line.strip())
  3508 + tokens = line.split(None, 1)
  3509 + mnemonic = tokens[0]
  3510 + args = ''
  3511 + if len(tokens) == 2:
  3512 + args = tokens[1].strip()
  3513 + # log.debug(repr([mnemonic, args]))
  3514 + # if mnemonic in ('VarDefn',):
  3515 + # # just add the rest of the line
  3516 + # keywords.add(args)
  3517 + # if mnemonic == 'FuncDefn':
  3518 + # # function definition: just strip parentheses
  3519 + # funcdefn = args.strip('()')
  3520 + # keywords.add(funcdefn)
  3521 + if mnemonic in ('ArgsCall', 'ArgsLd', 'St', 'Ld', 'MemSt', 'Label'):
  3522 + # add 1st argument:
  3523 + name = args.split(None, 1)[0]
  3524 + # sometimes pcodedmp reports names like "id_FFFF", which are not
  3525 + # directly present in the VBA source code
  3526 + # (for example "Me" in VBA appears as id_FFFF in P-code)
  3527 + if not name.startswith('id_'):
  3528 + keywords.add(name)
  3529 + if mnemonic == 'LitStr':
  3530 + # re_string = re.compile(r'\"([^\"]|\"\")*\"')
  3531 + # for match in re_string.finditer(line):
  3532 + # print('\t' + match.group())
  3533 + # the string is the 2nd argument:
  3534 + s = args.split(None, 1)[1]
  3535 + # tricky issue: when a string contains double quotes inside,
  3536 + # pcodedmp returns a single ", whereas in the VBA source code
  3537 + # it is always a double "".
  3538 + # We have to remove the " around the strings, then double the remaining ",
  3539 + # and put back the " around:
  3540 + if len(s)>=2:
  3541 + assert(s[0]=='"' and s[-1]=='"')
  3542 + s = s[1:-1]
  3543 + s = s.replace('"', '""')
  3544 + s = '"' + s + '"'
  3545 + keywords.add(s)
  3546 + log.debug('Keywords extracted from P-code: ' + repr(sorted(keywords)))
  3547 + self.vba_stomping_detected = False
  3548 + # TODO: add a method to get all VBA code as one string
  3549 + vba_code_all_modules = ''
  3550 + for (_, _, _, vba_code) in self.extract_all_macros():
  3551 + vba_code_all_modules += vba_code + '\n'
  3552 + for keyword in keywords:
  3553 + if keyword not in vba_code_all_modules:
  3554 + log.debug('Keyword {!r} not found in VBA code'.format(keyword))
  3555 + log.debug('VBA STOMPING DETECTED!')
  3556 + self.vba_stomping_detected = True
  3557 + break
  3558 + if not self.vba_stomping_detected:
  3559 + log.debug('No VBA stomping detected.')
  3560 + return self.vba_stomping_detected
  3561 +
3131 def close(self): 3562 def close(self):
3132 """ 3563 """
3133 Close all the open files. This method must be called after usage, if 3564 Close all the open files. This method must be called after usage, if
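The stomping check in `detect_vba_stomping` works by comparing keywords. It collects identifiers and string literals from the P-code disassembly, then reports stomping if any of them is missing from the VBA source. The sketch below isolates that logic so it can be tested without pcodedmp.

This sketch makes a few assumptions:
- Input lines follow the shape the hunk above parses: tab-indented lines of the form mnemonic, then arguments.
- The sample lines are made up for illustration.
- The length guards replace the `assert` used in the original code.

```python
NAME_MNEMONICS = ('ArgsCall', 'ArgsLd', 'St', 'Ld', 'MemSt', 'Label')


def pcode_keywords(pcode_lines):
    """Collect names and string literals from pcodedmp-style lines."""
    keywords = set()
    for line in pcode_lines:
        # Only tab-indented lines are instructions.
        if not line.startswith('\t'):
            continue
        tokens = line.split(None, 1)
        mnemonic = tokens[0]
        args = tokens[1].strip() if len(tokens) == 2 else ''
        if mnemonic in NAME_MNEMONICS and args:
            name = args.split(None, 1)[0]
            # "id_XXXX" names (e.g. "Me") never appear literally in source.
            if not name.startswith('id_'):
                keywords.add(name)
        elif mnemonic == 'LitStr':
            parts = args.split(None, 1)
            if len(parts) == 2:
                s = parts[1]
                # P-code shows embedded quotes once; VBA source doubles them.
                if len(s) >= 2 and s[0] == '"' and s[-1] == '"':
                    s = '"' + s[1:-1].replace('"', '""') + '"'
                keywords.add(s)
    return keywords


def is_stomped(pcode_lines, vba_source):
    """True if at least one P-code keyword is absent from the source."""
    return any(k not in vba_source for k in pcode_keywords(pcode_lines))


pcode = ['Module: Module1', '\tLd Shell ', '\tLitStr 0x0004 "calc"',
         '\tArgsCall Shell 0x0001', '\tLd id_FFFF',
         '\tLitStr 0x0008 "say "hi""']
print(sorted(pcode_keywords(pcode)))
```

Like the original, this check gives a false positive whenever a keyword is rendered differently in P-code and in source. That is one reason the CLI flags the detection as experimental.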
@@ -3156,11 +3587,11 @@ class VBA_Parser_CLI(VBA_Parser): @@ -3156,11 +3587,11 @@ class VBA_Parser_CLI(VBA_Parser):
3156 super(VBA_Parser_CLI, self).__init__(*args, **kwargs) 3587 super(VBA_Parser_CLI, self).__init__(*args, **kwargs)
3157 3588
3158 3589
3159 - def print_analysis(self, show_decoded_strings=False, deobfuscate=False): 3590 + def run_analysis(self, show_decoded_strings=False, deobfuscate=False):
3160 """ 3591 """
3161 - Analyze the provided VBA code, and print the results in a table 3592 + Analyze the provided VBA code, without printing the results (yet)
  3593 + All results are stored in self.analysis_results.
3162 3594
3163 - :param vba_code: str, VBA source code to be analyzed  
3164 :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content. 3595 :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content.
3165 :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow) 3596 :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
3166 :return: None 3597 :return: None
@@ -3169,21 +3600,37 @@ class VBA_Parser_CLI(VBA_Parser): @@ -3169,21 +3600,37 @@ class VBA_Parser_CLI(VBA_Parser):
3169 if sys.stdout.isatty(): 3600 if sys.stdout.isatty():
3170 print('Analysis...\r', end='') 3601 print('Analysis...\r', end='')
3171 sys.stdout.flush() 3602 sys.stdout.flush()
3172 - results = self.analyze_macros(show_decoded_strings, deobfuscate) 3603 + self.analyze_macros(show_decoded_strings, deobfuscate)
  3604 +
  3605 +
  3606 + def print_analysis(self, show_decoded_strings=False, deobfuscate=False):
  3607 + """
  3608 + print the analysis results in a table
  3609 +
  3610 + :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content.
  3611 + :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
  3612 + :return: None
  3613 + """
  3614 + results = self.analysis_results
3173 if results: 3615 if results:
3174 - t = prettytable.PrettyTable(('Type', 'Keyword', 'Description'))  
3175 - t.align = 'l'  
3176 - t.max_width['Type'] = 10  
3177 - t.max_width['Keyword'] = 20  
3178 - t.max_width['Description'] = 39 3616 + t = tablestream.TableStream(column_width=(10, 20, 45),
  3617 + header_row=('Type', 'Keyword', 'Description'))
  3618 + COLOR_TYPE = {
  3619 + 'AutoExec': 'yellow',
  3620 + 'Suspicious': 'red',
  3621 + 'IOC': 'cyan',
  3622 + }
3179 for kw_type, keyword, description in results: 3623 for kw_type, keyword, description in results:
3180 # handle non printable strings: 3624 # handle non printable strings:
3181 if not is_printable(keyword): 3625 if not is_printable(keyword):
3182 keyword = repr(keyword) 3626 keyword = repr(keyword)
3183 if not is_printable(description): 3627 if not is_printable(description):
3184 description = repr(description) 3628 description = repr(description)
3185 - t.add_row((kw_type, keyword, description))  
3186 - print(t) 3629 + color_type = COLOR_TYPE.get(kw_type, None)
  3630 + t.write_row((kw_type, keyword, description), colors=(color_type, None, None))
  3631 + t.close()
  3632 + if self.vba_stomping_detected:
  3633 + print('VBA Stomping detection is experimental: please report any false positive/negative at https://github.com/decalage2/oletools/issues')
3187 else: 3634 else:
3188 print('No suspicious keyword or IOC found.') 3635 print('No suspicious keyword or IOC found.')
3189 3636
@@ -3204,10 +3651,29 @@ class VBA_Parser_CLI(VBA_Parser): @@ -3204,10 +3651,29 @@ class VBA_Parser_CLI(VBA_Parser):
3204 return [dict(type=kw_type, keyword=keyword, description=description) 3651 return [dict(type=kw_type, keyword=keyword, description=description)
3205 for kw_type, keyword, description in self.analyze_macros(show_decoded_strings, deobfuscate)] 3652 for kw_type, keyword, description in self.analyze_macros(show_decoded_strings, deobfuscate)]
3206 3653
  3654 + def colorize_keywords(self, vba_code):
  3655 + """
  3656 + Colorize keywords found during the VBA code analysis
  3657 + :param vba_code: str, VBA code to be colorized
  3658 + :return: str, VBA code including color tags for Colorclass
  3659 + """
  3660 + results = self.analysis_results
  3661 + if results:
  3662 + COLOR_TYPE = {
  3663 + 'AutoExec': 'yellow',
  3664 + 'Suspicious': 'red',
  3665 + 'IOC': 'cyan',
  3666 + }
  3667 + for kw_type, keyword, description in results:
  3668 + color_type = COLOR_TYPE.get(kw_type, None)
  3669 + if color_type:
  3670 + vba_code = vba_code.replace(keyword, '{auto%s}%s{/%s}' % (color_type, keyword, color_type))
  3671 + return vba_code
  3672 +
3207 def process_file(self, show_decoded_strings=False, 3673 def process_file(self, show_decoded_strings=False,
3208 display_code=True, hide_attributes=True, 3674 display_code=True, hide_attributes=True,
3209 vba_code_only=False, show_deobfuscated_code=False, 3675 vba_code_only=False, show_deobfuscated_code=False,
3210 - deobfuscate=False): 3676 + deobfuscate=False, pcode=False):
3211 """ 3677 """
3212 Process a single file 3678 Process a single file
3213 3679
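`colorize_keywords` does not print anything itself. It wraps each AutoExec, Suspicious or IOC keyword in colorclass markup, and the caller then passes the result to `colorclass.Color` when stdout is a terminal. The sketch below builds only the tagged string, so it runs without colorclass installed. The function signature, which takes `results` as an argument instead of reading `self.analysis_results`, is an assumption made for testability.

```python
COLOR_TYPE = {'AutoExec': 'yellow', 'Suspicious': 'red', 'IOC': 'cyan'}


def colorize_keywords(vba_code, results):
    """Wrap found keywords in colorclass-style tags (sketch).

    results: iterable of (kw_type, keyword, description) tuples, the shape
    stored in analysis_results. Types without a color are left untouched.
    """
    for kw_type, keyword, _description in results:
        color = COLOR_TYPE.get(kw_type)
        if color:
            vba_code = vba_code.replace(
                keyword, '{auto%s}%s{/%s}' % (color, keyword, color))
    return vba_code


results = [('AutoExec', 'AutoOpen', 'runs on open'),
           ('Suspicious', 'Shell', 'may run a command'),
           ('Hex String', '41', 'hex encoded')]
print(colorize_keywords('Sub AutoOpen()\n Shell "x"', results))
```

Replacements are applied one after another. As a result, a keyword that also occurs inside an earlier, longer keyword gets nested tags.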
@@ -3219,6 +3685,7 @@ class VBA_Parser_CLI(VBA_Parser): @@ -3219,6 +3685,7 @@ class VBA_Parser_CLI(VBA_Parser):
3219 otherwise each module is analyzed separately (old behaviour) 3685 otherwise each module is analyzed separately (old behaviour)
3220 :param hide_attributes: bool, if True the first lines starting with "Attribute VB" are hidden (default) 3686 :param hide_attributes: bool, if True the first lines starting with "Attribute VB" are hidden (default)
3221 :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow) 3687 :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)
  3688 + :param pcode: bool, if True, call pcodedmp to disassemble the P-code and display it
3222 """ 3689 """
3223 #TODO: replace print by writing to a provided output file (sys.stdout by default) 3690 #TODO: replace print by writing to a provided output file (sys.stdout by default)
3224 # fix conflicting parameters: 3691 # fix conflicting parameters:
@@ -3234,6 +3701,8 @@ class VBA_Parser_CLI(VBA_Parser): @@ -3234,6 +3701,8 @@ class VBA_Parser_CLI(VBA_Parser):
3234 #TODO: handle olefile errors, when an OLE file is malformed 3701 #TODO: handle olefile errors, when an OLE file is malformed
3235 print('Type: %s'% self.type) 3702 print('Type: %s'% self.type)
3236 if self.detect_vba_macros(): 3703 if self.detect_vba_macros():
  3704 + # run analysis before displaying VBA code, in order to colorize found keywords
  3705 + self.run_analysis(show_decoded_strings=show_decoded_strings, deobfuscate=deobfuscate)
3237 #print 'Contains VBA Macros:' 3706 #print 'Contains VBA Macros:'
3238 for (subfilename, stream_path, vba_filename, vba_code) in self.extract_all_macros(): 3707 for (subfilename, stream_path, vba_filename, vba_code) in self.extract_all_macros():
3239 if hide_attributes: 3708 if hide_attributes:
@@ -3251,21 +3720,30 @@ class VBA_Parser_CLI(VBA_Parser): @@ -3251,21 +3720,30 @@ class VBA_Parser_CLI(VBA_Parser):
3251 print('(empty macro)') 3720 print('(empty macro)')
3252 else: 3721 else:
3253 # check if the VBA code contains special characters such as backspace (issue #358) 3722 # check if the VBA code contains special characters such as backspace (issue #358)
3254 - if b'\x08' in vba_code_filtered: 3723 + if '\x08' in vba_code_filtered:
3255 log.warning('The VBA code contains special characters such as backspace, that may be used for obfuscation.') 3724 log.warning('The VBA code contains special characters such as backspace, that may be used for obfuscation.')
3256 if sys.stdout.isatty(): 3725 if sys.stdout.isatty():
3257 # if the standard output is the console, we'll display colors 3726 # if the standard output is the console, we'll display colors
3258 backspace = colorclass.Color(b'{autored}\\x08{/red}') 3727 backspace = colorclass.Color(b'{autored}\\x08{/red}')
3259 else: 3728 else:
3260 - backspace = b'\x08' 3729 + backspace = '\x08'
3261 # replace backspace by "\x08" for display 3730 # replace backspace by "\x08" for display
3262 - vba_code_filtered = vba_code_filtered.replace(b'\x08', backspace) 3731 + vba_code_filtered = vba_code_filtered.replace('\x08', backspace)
  3732 + try:
  3733 + # Colorize the interesting keywords in the output:
  3734 + # (unless the output is redirected to a file)
  3735 + if sys.stdout.isatty():
  3736 + vba_code_filtered = colorclass.Color(self.colorize_keywords(vba_code_filtered))
  3737 + except UnicodeError:
  3738 + # TODO better handling of Unicode
  3739 + log.error('Unicode conversion to be fixed before colorizing the output')
3263 print(vba_code_filtered) 3740 print(vba_code_filtered)
3264 for (subfilename, stream_path, form_string) in self.extract_form_strings(): 3741 for (subfilename, stream_path, form_string) in self.extract_form_strings():
3265 - print('-' * 79)  
3266 - print('VBA FORM STRING IN %r - OLE stream: %r' % (subfilename, stream_path))  
3267 - print('- ' * 39)  
3268 - print(form_string) 3742 + if form_string is not None:
  3743 + print('-' * 79)
  3744 + print('VBA FORM STRING IN %r - OLE stream: %r' % (subfilename, stream_path))
  3745 + print('- ' * 39)
  3746 + print(form_string)
3269 try: 3747 try:
3270 for (subfilename, stream_path, form_variables) in self.extract_form_strings_extended(): 3748 for (subfilename, stream_path, form_variables) in self.extract_form_strings_extended():
3271 if form_variables is not None: 3749 if form_variables is not None:
@@ -3277,6 +3755,11 @@ class VBA_Parser_CLI(VBA_Parser): @@ -3277,6 +3755,11 @@ class VBA_Parser_CLI(VBA_Parser):
3277 # display the exception with full stack trace for debugging 3755 # display the exception with full stack trace for debugging
3278 log.info('Error parsing form: %s' % exc) 3756 log.info('Error parsing form: %s' % exc)
3279 log.debug('Traceback:', exc_info=True) 3757 log.debug('Traceback:', exc_info=True)
  3758 + if pcode:
  3759 + print('-' * 79)
  3760 + print('P-CODE disassembly:')
  3761 + pcode = self.extract_pcode()
  3762 + print(pcode)
3280 3763
3281 if not vba_code_only: 3764 if not vba_code_only:
3282 # analyse the code from all modules at once: 3765 # analyse the code from all modules at once:
@@ -3398,16 +3881,6 @@ class VBA_Parser_CLI(VBA_Parser): @@ -3398,16 +3881,6 @@ class VBA_Parser_CLI(VBA_Parser):
3398 3881
3399 line = '%-12s %s' % (flags, self.filename) 3882 line = '%-12s %s' % (flags, self.filename)
3400 print(line) 3883 print(line)
3401 -  
3402 - # old table display:  
3403 - # macros = autoexec = suspicious = iocs = hexstrings = 'no'  
3404 - # if nb_macros: macros = 'YES:%d' % nb_macros  
3405 - # if nb_autoexec: autoexec = 'YES:%d' % nb_autoexec  
3406 - # if nb_suspicious: suspicious = 'YES:%d' % nb_suspicious  
3407 - # if nb_iocs: iocs = 'YES:%d' % nb_iocs  
3408 - # if nb_hexstrings: hexstrings = 'YES:%d' % nb_hexstrings  
3409 - # # 2nd line = info  
3410 - # print '%-8s %-7s %-7s %-7s %-7s %-7s' % (self.type, macros, autoexec, suspicious, iocs, hexstrings)  
3411 except Exception as exc: 3884 except Exception as exc:
3412 # display the exception with full stack trace for debugging only 3885 # display the exception with full stack trace for debugging only
3413 log.debug('Error processing file %s (%s)' % (self.filename, exc), 3886 log.debug('Error processing file %s (%s)' % (self.filename, exc),
@@ -3415,20 +3888,6 @@ class VBA_Parser_CLI(VBA_Parser): @@ -3415,20 +3888,6 @@ class VBA_Parser_CLI(VBA_Parser):
3415 raise ProcessingError(self.filename, exc) 3888 raise ProcessingError(self.filename, exc)
3416 3889
3417 3890
3418 - # t = prettytable.PrettyTable(('filename', 'type', 'macros', 'autoexec', 'suspicious', 'ioc', 'hexstrings'),  
3419 - # header=False, border=False)  
3420 - # t.align = 'l'  
3421 - # t.max_width['filename'] = 30  
3422 - # t.max_width['type'] = 10  
3423 - # t.max_width['macros'] = 6  
3424 - # t.max_width['autoexec'] = 6  
3425 - # t.max_width['suspicious'] = 6  
3426 - # t.max_width['ioc'] = 6  
3427 - # t.max_width['hexstrings'] = 6  
3428 - # t.add_row((filename, ftype, macros, autoexec, suspicious, iocs, hexstrings))  
3429 - # print t  
3430 -  
3431 -  
3432 #=== MAIN ===================================================================== 3891 #=== MAIN =====================================================================
3433 3892
3434 def parse_args(cmd_line_args=None): 3893 def parse_args(cmd_line_args=None):
@@ -3452,7 +3911,11 @@ def parse_args(cmd_line_args=None): @@ -3452,7 +3911,11 @@ def parse_args(cmd_line_args=None):
3452 parser.add_option("-r", action="store_true", dest="recursive", 3911 parser.add_option("-r", action="store_true", dest="recursive",
3453 help='find files recursively in subdirectories.') 3912 help='find files recursively in subdirectories.')
3454 parser.add_option("-z", "--zip", dest='zip_password', type='str', default=None, 3913 parser.add_option("-z", "--zip", dest='zip_password', type='str', default=None,
3455 - help='if the file is a zip archive, open all files from it, using the provided password (requires Python 2.6+)') 3914 + help='if the file is a zip archive, open all files from it, using the provided password.')
  3915 + parser.add_option("-p", "--password", type='str', action='append',
  3916 + default=[],
  3917 + help='if encrypted office files are encountered, try '
  3918 + 'decryption with this password. May be repeated.')
3456 parser.add_option("-f", "--zipfname", dest='zip_fname', type='str', default='*', 3919 parser.add_option("-f", "--zipfname", dest='zip_fname', type='str', default='*',
3457 help='if the file is a zip archive, file(s) to be opened within the zip. Wildcards * and ? are supported. (default:*)') 3920 help='if the file is a zip archive, file(s) to be opened within the zip. Wildcards * and ? are supported. (default:*)')
3458 # output mode; could make this even simpler with add_option(type='choice') but that would make 3921 # output mode; could make this even simpler with add_option(type='choice') but that would make
@@ -3484,12 +3947,17 @@ def parse_args(cmd_line_args=None): @@ -3484,12 +3947,17 @@ def parse_args(cmd_line_args=None):
3484 help="Attempt to deobfuscate VBA expressions (slow)") 3947 help="Attempt to deobfuscate VBA expressions (slow)")
3485 parser.add_option('--relaxed', dest="relaxed", action="store_true", default=False, 3948 parser.add_option('--relaxed', dest="relaxed", action="store_true", default=False,
3486 help="Do not raise errors if opening of substream fails") 3949 help="Do not raise errors if opening of substream fails")
  3950 + parser.add_option('--pcode', dest="pcode", action="store_true", default=False,
  3951 + help="Disassemble and display the P-code (using pcodedmp)")
3487 3952
3488 (options, args) = parser.parse_args(cmd_line_args) 3953 (options, args) = parser.parse_args(cmd_line_args)
3489 3954
3490 # Print help if no arguments are passed 3955 # Print help if no arguments are passed
3491 if len(args) == 0: 3956 if len(args) == 0:
3492 - print('olevba %s - http://decalage.info/python/oletools' % __version__) 3957 + # print banner with version
  3958 + python_version = '%d.%d.%d' % sys.version_info[0:3]
  3959 + print('olevba %s on Python %s - http://decalage.info/python/oletools' %
  3960 + (__version__, python_version))
3493 print(__doc__) 3961 print(__doc__)
3494 parser.print_help() 3962 parser.print_help()
3495 sys.exit(RETURN_WRONG_ARGS) 3963 sys.exit(RETURN_WRONG_ARGS)
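The version banner now also reports the interpreter version, and the same three lines appear again in `main`. A small helper could remove that duplication. The sketch below assumes a hypothetical `banner()` function that is not part of olevba.

The sketch relies on one detail of `sys.version_info`: slicing it returns a plain `(major, minor, micro)` tuple, which fills the three `%d` placeholders directly.

```python
import sys


def banner(version, url='http://decalage.info/python/oletools'):
    # sys.version_info[0:3] is a plain tuple (major, minor, micro),
    # so it can be used directly as the argument tuple for '%d.%d.%d'
    python_version = '%d.%d.%d' % sys.version_info[0:3]
    return 'olevba %s on Python %s - %s' % (version, python_version, url)


print(banner('0.54'))  # '0.54' is only an example version string
```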
@@ -3499,6 +3967,112 @@ def parse_args(cmd_line_args=None): @@ -3499,6 +3967,112 @@ def parse_args(cmd_line_args=None):
3499 return options, args 3967 return options, args
3500 3968
3501 3969
  3970 +def process_file(filename, data, container, options, crypto_nesting=0):
  3971 + """
  3972 + Part of main function that processes a single file.
  3973 +
  3974 + This handles exceptions and encryption.
  3975 +
  3976 + Returns a single code summarizing the status of processing of this file
  3977 + """
  3978 + try:
  3979 + # Open the file
  3980 + vba_parser = VBA_Parser_CLI(filename, data=data, container=container,
  3981 + relaxed=options.relaxed)
  3982 +
  3983 + if options.output_mode == 'detailed':
  3984 + # fully detailed output
  3985 + vba_parser.process_file(show_decoded_strings=options.show_decoded_strings,
  3986 + display_code=options.display_code,
  3987 + hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,
  3988 + show_deobfuscated_code=options.show_deobfuscated_code,
  3989 + deobfuscate=options.deobfuscate, pcode=options.pcode)
  3990 + elif options.output_mode == 'triage':
  3991 + # summarized output for triage:
  3992 + vba_parser.process_file_triage(show_decoded_strings=options.show_decoded_strings,
  3993 + deobfuscate=options.deobfuscate)
  3994 + elif options.output_mode == 'json':
  3995 + print_json(
  3996 + vba_parser.process_file_json(show_decoded_strings=options.show_decoded_strings,
  3997 + display_code=options.display_code,
  3998 + hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,
  3999 + show_deobfuscated_code=options.show_deobfuscated_code,
  4000 + deobfuscate=options.deobfuscate))
  4001 + else: # (should be impossible)
  4002 + raise ValueError('unexpected output mode: "{0}"!'.format(options.output_mode))
  4003 +
  4004 + # even if processing succeeds, file might still be encrypted
  4005 + log.debug('Checking for encryption (normal)')
  4006 + if not crypto.is_encrypted(filename):
  4007 + log.debug('no encryption detected')
  4008 + return RETURN_OK
  4009 + except Exception as exc:
  4010 + log.debug('Checking for encryption (after exception)')
  4011 + if crypto.is_encrypted(filename):
  4012 + pass # deal with this below
  4013 + else:
  4014 + if isinstance(exc, (SubstreamOpenError, UnexpectedDataError)):
  4015 + if options.output_mode in ('triage', 'unspecified'):
  4016 + print('%-12s %s - Error opening substream or unexpected ' \
  4017 + 'content' % ('?', filename))
  4018 + elif options.output_mode == 'json':
  4019 + print_json(file=filename, type='error',
  4020 + error=type(exc).__name__, message=str(exc))
  4021 + else:
  4022 + log.exception('Error opening substream or unexpected '
  4023 + 'content in %s' % filename)
  4024 + return RETURN_OPEN_ERROR
  4025 + elif isinstance(exc, FileOpenError):
  4026 + if options.output_mode in ('triage', 'unspecified'):
  4027 + print('%-12s %s - File format not supported' % ('?', filename))
  4028 + elif options.output_mode == 'json':
  4029 + print_json(file=filename, type='error',
  4030 + error=type(exc).__name__, message=str(exc))
  4031 + else:
  4032 + log.exception('Failed to open %s -- probably not supported!' % filename)
  4033 + return RETURN_OPEN_ERROR
  4034 + elif isinstance(exc, ProcessingError):
  4035 + if options.output_mode in ('triage', 'unspecified'):
  4036 + print('%-12s %s - %s' % ('!ERROR', filename, exc.orig_exc))
  4037 + elif options.output_mode == 'json':
  4038 + print_json(file=filename, type='error',
  4039 + error=type(exc).__name__,
  4040 + message=str(exc.orig_exc))
  4041 + else:
  4042 + log.exception('Error processing file %s (%s)!'
  4043 + % (filename, exc.orig_exc))
  4044 + return RETURN_PARSE_ERROR
  4045 + else:
  4046 + raise # let caller deal with this
  4047 +
  4048 + # we reach this point only if file is encrypted
  4049 + # check if this is an encrypted file in an encrypted file in an ...
  4050 + if crypto_nesting >= crypto.MAX_NESTING_DEPTH:
  4051 + raise crypto.MaxCryptoNestingReached(crypto_nesting, filename)
  4052 +
  4053 + decrypted_file = None
  4054 + try:
  4055 + log.debug('Checking encryption passwords {}'.format(options.password))
  4056 + passwords = options.password + crypto.DEFAULT_PASSWORDS
  4057 + decrypted_file = crypto.decrypt(filename, passwords)
  4058 + if not decrypted_file:
  4059 + log.error('Decrypt failed, run with debug output to get details')
  4060 + raise crypto.WrongEncryptionPassword(filename)
  4061 + log.info('Working on decrypted file')
  4062 + return process_file(decrypted_file, data, container or filename,
  4063 + options, crypto_nesting+1)
  4064 + except Exception:
  4065 + raise
  4066 + finally: # clean up
  4067 + try:
  4068 + log.debug('Removing crypt temp file {}'.format(decrypted_file))
  4069 + os.unlink(decrypted_file)
  4070 + except Exception: # e.g. file does not exist or is None
  4071 + pass
  4072 + # no idea what to return now
  4073 + raise Exception('Programming error -- should never have reached this!')
  4074 +
  4075 +
3502 def main(cmd_line_args=None): 4076 def main(cmd_line_args=None):
3503 """ 4077 """
3504 Main function, called when olevba is run from the command line 4078 Main function, called when olevba is run from the command line
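The new `process_file` follows a "try, then decrypt and recurse" flow:

1. It analyzes the file.
2. It checks for encryption after both success and failure.
3. It enforces a nesting limit.
4. It decrypts to a temporary file, recurses on that file, and always removes the file in `finally`.

The sketch below reproduces only that control flow, under these assumptions:

- The analysis, encryption check and decryption are passed in as plain functions. They stand in for `VBA_Parser_CLI`, `crypto.is_encrypted` and `crypto.decrypt`.
- `MAX_NESTING_DEPTH = 10`, `MaxNestingReached`, `WrongPassword` and `analyze_with_decryption` are hypothetical stand-ins, not the oletools definitions.
- Error-to-return-code mapping is left out.

```python
import os
import tempfile  # used by the caller-side demo/test to create sample files

MAX_NESTING_DEPTH = 10  # assumed limit; the real value lives in oletools.crypto


class MaxNestingReached(Exception):
    """Hypothetical: raised when encrypted files are nested too deeply."""


class WrongPassword(Exception):
    """Hypothetical: raised when no candidate password decrypts the file."""


def analyze_with_decryption(filename, analyze, is_encrypted, decrypt,
                            passwords, nesting=0):
    """Analyze filename; if it is encrypted, decrypt to a temp file and recurse.

    analyze(path) -> result, is_encrypted(path) -> bool and
    decrypt(path, passwords) -> temp file path or None are caller-supplied.
    """
    try:
        result = analyze(filename)
        if not is_encrypted(filename):
            return result          # normal case: plain file analyzed fine
    except Exception:
        if not is_encrypted(filename):
            raise                  # a genuine parsing error, not encryption
    # only reached if the file is encrypted
    if nesting >= MAX_NESTING_DEPTH:
        raise MaxNestingReached(filename)
    decrypted = None
    try:
        decrypted = decrypt(filename, passwords)
        if not decrypted:
            raise WrongPassword(filename)
        return analyze_with_decryption(decrypted, analyze, is_encrypted,
                                       decrypt, passwords, nesting + 1)
    finally:
        if decrypted:
            os.unlink(decrypted)   # never leave the plaintext copy behind
```

Because the cleanup sits in `finally`, each recursion level removes its own temporary file. This happens after the inner call returns or raises, so a file nested in two encryption layers leaves no plaintext on disk.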
@@ -3517,52 +4091,60 @@ def main(cmd_line_args=None): @@ -3517,52 +4091,60 @@ def main(cmd_line_args=None):
3517 url='http://decalage.info/python/oletools', 4091 url='http://decalage.info/python/oletools',
3518 type='MetaInformation', _json_is_first=True) 4092 type='MetaInformation', _json_is_first=True)
3519 else: 4093 else:
3520 - print('olevba %s - http://decalage.info/python/oletools' % __version__) 4094 + # print banner with version
  4095 + python_version = '%d.%d.%d' % sys.version_info[0:3]
  4096 + print('olevba %s on Python %s - http://decalage.info/python/oletools' %
  4097 + (__version__, python_version))
3521 4098
3522 logging.basicConfig(level=options.loglevel, format='%(levelname)-8s %(message)s') 4099 logging.basicConfig(level=options.loglevel, format='%(levelname)-8s %(message)s')
3523 # enable logging in the modules: 4100 # enable logging in the modules:
3524 enable_logging() 4101 enable_logging()
3525 4102
3526 - # Old display with number of items detected:  
3527 - # print '%-8s %-7s %-7s %-7s %-7s %-7s' % ('Type', 'Macros', 'AutoEx', 'Susp.', 'IOCs', 'HexStr')  
3528 - # print '%-8s %-7s %-7s %-7s %-7s %-7s' % ('-'*8, '-'*7, '-'*7, '-'*7, '-'*7, '-'*7)  
3529 -  
3530 # with the option --reveal, make sure --deobf is also enabled: 4103 # with the option --reveal, make sure --deobf is also enabled:
3531 if options.show_deobfuscated_code and not options.deobfuscate: 4104 if options.show_deobfuscated_code and not options.deobfuscate:
3532 - log.info('set --deobf because --reveal was set') 4105 + log.debug('set --deobf because --reveal was set')
3533 options.deobfuscate = True 4106 options.deobfuscate = True
3534 if options.output_mode == 'triage' and options.show_deobfuscated_code: 4107 if options.output_mode == 'triage' and options.show_deobfuscated_code:
3535 - log.info('ignoring option --reveal in triage output mode') 4108 + log.debug('ignoring option --reveal in triage output mode')
  4109 +
  4110 + # gather info on all files that must be processed
  4111 + # ignore directory names stored in zip files:
  4112 + all_input_info = tuple((container, filename, data) for
  4113 + container, filename, data in xglob.iter_files(
  4114 + args, recursive=options.recursive,
  4115 + zip_password=options.zip_password,
  4116 + zip_fname=options.zip_fname)
  4117 + if not (container and filename.endswith('/')))
  4118 +
  4119 + # specify output mode if options -t, -d and -j were not specified
  4120 + if options.output_mode == 'unspecified':
  4121 + if len(all_input_info) == 1:
  4122 + options.output_mode = 'detailed'
  4123 + else:
  4124 + options.output_mode = 'triage'
3536 4125
3537 - # Column headers (do not know how many files there will be yet, so if no output_mode  
3538 - # was specified, we will print triage for first file --> need these headers)  
3539 - if options.output_mode in ('triage', 'unspecified'): 4126 + # Column headers for triage mode
  4127 + if options.output_mode == 'triage':
3540 print('%-12s %-65s' % ('Flags', 'Filename')) 4128 print('%-12s %-65s' % ('Flags', 'Filename'))
3541 print('%-12s %-65s' % ('-' * 11, '-' * 65)) 4129 print('%-12s %-65s' % ('-' * 11, '-' * 65))
3542 4130
3543 previous_container = None 4131 previous_container = None
3544 count = 0 4132 count = 0
3545 container = filename = data = None 4133 container = filename = data = None
3546 - vba_parser = None  
3547 return_code = RETURN_OK 4134 return_code = RETURN_OK
3548 try: 4135 try:
3549 - for container, filename, data in xglob.iter_files(args, recursive=options.recursive,  
3550 - zip_password=options.zip_password, zip_fname=options.zip_fname):  
3551 - # ignore directory names stored in zip files:  
3552 - if container and filename.endswith('/'):  
3553 - continue  
3554 - 4136 + for container, filename, data in all_input_info:
3555 # handle errors from xglob 4137 # handle errors from xglob
3556 if isinstance(data, Exception): 4138 if isinstance(data, Exception):
3557 if isinstance(data, PathNotFoundException): 4139 if isinstance(data, PathNotFoundException):
3558 - if options.output_mode in ('triage', 'unspecified'): 4140 + if options.output_mode == 'triage':
3559 print('%-12s %s - File not found' % ('?', filename)) 4141 print('%-12s %s - File not found' % ('?', filename))
3560 elif options.output_mode != 'json': 4142 elif options.output_mode != 'json':
3561 log.error('Given path %r does not exist!' % filename) 4143 log.error('Given path %r does not exist!' % filename)
3562 return_code = RETURN_FILE_NOT_FOUND if return_code == 0 \ 4144 return_code = RETURN_FILE_NOT_FOUND if return_code == 0 \
3563 else RETURN_SEVERAL_ERRS 4145 else RETURN_SEVERAL_ERRS
3564 else: 4146 else:
3565 - if options.output_mode in ('triage', 'unspecified'): 4147 + if options.output_mode == 'triage':
3566 print('%-12s %s - Failed to read from zip file %s' % ('?', filename, container)) 4148 print('%-12s %s - Failed to read from zip file %s' % ('?', filename, container))
3567 elif options.output_mode != 'json': 4149 elif options.output_mode != 'json':
3568 log.error('Exception opening/reading %r from zip file %r: %s' 4150 log.error('Exception opening/reading %r from zip file %r: %s'
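Two changes in `main` work together:

- The output of `xglob.iter_files` is now materialized into a tuple, with zip directory entries skipped, before any file is processed.
- The total count then decides the output mode: a single file gets `detailed` output, several get `triage`.

Materializing is what removes the old "print triage for the first file, then details if it turns out to be the only one" workaround. The trade-off is that all `(container, filename, data)` tuples are held at once, which may include in-memory data read from zip archives.

The sketch below covers only this step. `gather_inputs` and `choose_output_mode` are hypothetical names, and any iterable of 3-tuples stands in for `xglob.iter_files`.

```python
def gather_inputs(items):
    # items: iterable of (container, filename, data) tuples, like the ones
    # yielded by xglob.iter_files (any iterable works for this sketch).
    # Materialize into a tuple so the number of files is known up front,
    # skipping directory entries stored inside zip containers.
    return tuple((container, filename, data)
                 for container, filename, data in items
                 if not (container and filename.endswith('/')))


def choose_output_mode(requested, inputs):
    # Keep an explicit -t/-d/-j choice; otherwise one file -> 'detailed',
    # several (or none) -> 'triage'.
    if requested != 'unspecified':
        return requested
    return 'detailed' if len(inputs) == 1 else 'triage'


inputs = gather_inputs([(None, 'a.doc', None),
                        ('x.zip', 'dir/', None),
                        ('x.zip', 'dir/b.xls', b'...')])
print(len(inputs), choose_output_mode('unspecified', inputs))
```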
@@ -3574,107 +4156,42 @@ def main(cmd_line_args=None): @@ -3574,107 +4156,42 @@ def main(cmd_line_args=None):
3574 error=type(data).__name__, message=str(data)) 4156 error=type(data).__name__, message=str(data))
3575 continue 4157 continue
3576 4158
3577 - try:  
3578 - # close the previous file if analyzing several:  
3579 - # (this must be done here to avoid closing the file if there is only 1,  
3580 - # to fix issue #219)  
3581 - if vba_parser is not None:  
3582 - vba_parser.close()  
3583 - # Open the file  
3584 - vba_parser = VBA_Parser_CLI(filename, data=data, container=container,  
3585 - relaxed=options.relaxed)  
3586 -  
3587 - if options.output_mode == 'detailed':  
3588 - # fully detailed output  
3589 - vba_parser.process_file(show_decoded_strings=options.show_decoded_strings,  
3590 - display_code=options.display_code,  
3591 - hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,  
3592 - show_deobfuscated_code=options.show_deobfuscated_code,  
3593 - deobfuscate=options.deobfuscate)  
3594 - elif options.output_mode in ('triage', 'unspecified'):  
3595 - # print container name when it changes:  
3596 - if container != previous_container:  
3597 - if container is not None:  
3598 - print('\nFiles in %s:' % container)  
3599 - previous_container = container  
3600 - # summarized output for triage:  
3601 - vba_parser.process_file_triage(show_decoded_strings=options.show_decoded_strings,  
3602 - deobfuscate=options.deobfuscate)  
3603 - elif options.output_mode == 'json':  
3604 - print_json(  
3605 - vba_parser.process_file_json(show_decoded_strings=options.show_decoded_strings,  
3606 - display_code=options.display_code,  
3607 - hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,  
3608 - show_deobfuscated_code=options.show_deobfuscated_code,  
3609 - deobfuscate=options.deobfuscate))  
3610 - else: # (should be impossible)  
3611 - raise ValueError('unexpected output mode: "{0}"!'.format(options.output_mode))  
3612 - count += 1  
3613 -  
3614 - except (SubstreamOpenError, UnexpectedDataError) as exc:  
3615 - if options.output_mode in ('triage', 'unspecified'):  
3616 - print('%-12s %s - Error opening substream or uenxpected ' \  
3617 - 'content' % ('?', filename))  
3618 - elif options.output_mode == 'json':  
3619 - print_json(file=filename, type='error',  
3620 - error=type(exc).__name__, message=str(exc))  
3621 - else:  
3622 - log.exception('Error opening substream or unexpected '  
3623 - 'content in %s' % filename)  
3624 - return_code = RETURN_OPEN_ERROR if return_code == 0 \  
3625 - else RETURN_SEVERAL_ERRS  
3626 - except FileOpenError as exc:  
3627 - if options.output_mode in ('triage', 'unspecified'):  
3628 - print('%-12s %s - File format not supported' % ('?', filename))  
3629 - elif options.output_mode == 'json':  
3630 - print_json(file=filename, type='error',  
3631 - error=type(exc).__name__, message=str(exc))  
3632 - else:  
3633 - log.exception('Failed to open %s -- probably not supported!' % filename)  
3634 - return_code = RETURN_OPEN_ERROR if return_code == 0 \  
3635 - else RETURN_SEVERAL_ERRS  
3636 - except ProcessingError as exc:  
3637 - if options.output_mode in ('triage', 'unspecified'):  
3638 - print('%-12s %s - %s' % ('!ERROR', filename, exc.orig_exc))  
3639 - elif options.output_mode == 'json':  
3640 - print_json(file=filename, type='error',  
3641 - error=type(exc).__name__,  
3642 - message=str(exc.orig_exc))  
3643 - else:  
3644 - log.exception('Error processing file %s (%s)!'  
3645 - % (filename, exc.orig_exc))  
3646 - return_code = RETURN_PARSE_ERROR if return_code == 0 \  
3647 - else RETURN_SEVERAL_ERRS  
3648 - except FileIsEncryptedError as exc:  
3649 - if options.output_mode in ('triage', 'unspecified'):  
3650 - print('%-12s %s - File is encrypted' % ('!ERROR', filename))  
3651 - elif options.output_mode == 'json':  
3652 - print_json(file=filename, type='error',  
3653 - error=type(exc).__name__, message=str(exc))  
3654 - else:  
3655 - log.exception('File %s is encrypted!' % (filename))  
3656 - return_code = RETURN_ENCRYPTED if return_code == 0 \  
3657 - else RETURN_SEVERAL_ERRS  
3658 - # Here we do not close the vba_parser, because process_file may need it below. 4159 + if options.output_mode == 'triage':
  4160 + # print container name when it changes:
  4161 + if container != previous_container:
  4162 + if container is not None:
  4163 + print('\nFiles in %s:' % container)
  4164 + previous_container = container
  4165 +
  4166 + # process the file, handling errors and encryption
  4167 + curr_return_code = process_file(filename, data, container, options)
  4168 + count += 1
  4169 +
  4170 + # adjust overall return code
  4171 + if curr_return_code == RETURN_OK:
  4172 + continue # do not modify overall return code
  4173 + if return_code == RETURN_OK:
  4174 + return_code = curr_return_code # first error return code
  4175 + else:
  4176 + return_code = RETURN_SEVERAL_ERRS # several errors
3659 4177
3660 if options.output_mode == 'triage': 4178 if options.output_mode == 'triage':
3661 print('\n(Flags: OpX=OpenXML, XML=Word2003XML, FlX=FlatOPC XML, MHT=MHTML, TXT=Text, M=Macros, ' \ 4179 print('\n(Flags: OpX=OpenXML, XML=Word2003XML, FlX=FlatOPC XML, MHT=MHTML, TXT=Text, M=Macros, ' \
3662 'A=Auto-executable, S=Suspicious keywords, I=IOCs, H=Hex strings, ' \ 4180 'A=Auto-executable, S=Suspicious keywords, I=IOCs, H=Hex strings, ' \
3663 'B=Base64 strings, D=Dridex strings, V=VBA strings, ?=Unknown)\n') 4181 'B=Base64 strings, D=Dridex strings, V=VBA strings, ?=Unknown)\n')
3664 4182
3665 - if count == 1 and options.output_mode == 'unspecified':  
3666 - # if options -t, -d and -j were not specified and it's a single file, print details:  
3667 - vba_parser.process_file(show_decoded_strings=options.show_decoded_strings,  
3668 - display_code=options.display_code,  
3669 - hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,  
3670 - show_deobfuscated_code=options.show_deobfuscated_code,  
3671 - deobfuscate=options.deobfuscate)  
3672 -  
3673 if options.output_mode == 'json': 4183 if options.output_mode == 'json':
3674 # print last json entry (a last one without a comma) and closing ] 4184 # print last json entry (a last one without a comma) and closing ]
3675 print_json(type='MetaInformation', return_code=return_code, 4185 print_json(type='MetaInformation', return_code=return_code,
3676 n_processed=count, _json_is_last=True) 4186 n_processed=count, _json_is_last=True)
3677 4187
  4188 + except crypto.CryptoErrorBase as exc:
  4189 + log.exception('Problems with encryption in main: {}'.format(exc),
  4190 + exc_info=True)
  4191 + if return_code == RETURN_OK:
  4192 + return_code = RETURN_ENCRYPTED
  4193 + else:
  4194 + return_code = RETURN_SEVERAL_ERRS
3678 except Exception as exc: 4195 except Exception as exc:
3679 # some unexpected error, maybe some of the types caught in except clauses 4196 # some unexpected error, maybe some of the types caught in except clauses
3680 # above were not sufficient. This is very bad, so log complete trace at exception level 4197 # above were not sufficient. This is very bad, so log complete trace at exception level
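The loop in `main` now folds each file's status into the overall exit code with a simple rule:

- A success leaves the overall code unchanged.
- The first error keeps its specific code.
- Any further error collapses the overall code to `RETURN_SEVERAL_ERRS`.

The same rule must be written with an assignment (`=`) wherever it is applied by hand, for example in the `CryptoErrorBase` handler. A comparison such as `return_code == RETURN_SEVERAL_ERRS` only evaluates to a boolean and leaves `return_code` unchanged.

The sketch below expresses the rule as a pure function, which makes it easy to test. The constant values are placeholders, not olevba's actual `RETURN_*` values, and `combine` is a hypothetical name.

```python
from functools import reduce

# Placeholder values; the real RETURN_* constants are defined in olevba.
RETURN_OK = 0
RETURN_OPEN_ERROR = 1
RETURN_PARSE_ERROR = 2
RETURN_SEVERAL_ERRS = 3


def combine(overall, current):
    """Fold one file's status into the overall exit code."""
    if current == RETURN_OK:
        return overall            # successes never change the overall code
    if overall == RETURN_OK:
        return current            # first error: keep its specific code
    return RETURN_SEVERAL_ERRS    # any later error collapses to "several"


# usage: fold the per-file codes, starting from RETURN_OK
codes = [RETURN_OK, RETURN_PARSE_ERROR, RETURN_OK]
print(reduce(combine, codes, RETURN_OK))
```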
oletools/olevba3.py
1 #!/usr/bin/env python 1 #!/usr/bin/env python
2 -"""  
3 -olevba3.py  
4 2
5 -olevba is a script to parse OLE and OpenXML files such as MS Office documents  
6 -(e.g. Word, Excel), to extract VBA Macro code in clear text, deobfuscate  
7 -and analyze malicious macros. 3 +# olevba3 is a stub that redirects to olevba.py, for backwards compatibility
8 4
9 -olevba3 is the version of olevba that runs on Python 3.x. 5 +import sys, os, warnings
10 6
11 -Supported formats:  
12 -- Word 97-2003 (.doc, .dot), Word 2007+ (.docm, .dotm)  
13 -- Excel 97-2003 (.xls), Excel 2007+ (.xlsm, .xlsb)  
14 -- PowerPoint 97-2003 (.ppt), PowerPoint 2007+ (.pptm, .ppsm)  
15 -- Word/PowerPoint 2007+ XML (aka Flat OPC)  
16 -- Word 2003 XML (.xml)  
17 -- Word/Excel Single File Web Page / MHTML (.mht)  
18 -- Publisher (.pub)  
19 -- raises an error if run with files encrypted using MS Crypto API RC4  
20 -  
21 -Author: Philippe Lagadec - http://www.decalage.info  
22 -License: BSD, see source code or documentation  
23 -  
24 -olevba is part of the python-oletools package:  
25 -http://www.decalage.info/python/oletools  
26 -  
27 -olevba is based on source code from officeparser by John William Davison  
28 -https://github.com/unixfreak0037/officeparser  
29 -"""  
30 -  
31 -# === LICENSE ==================================================================  
32 -  
33 -# olevba is copyright (c) 2014-2018 Philippe Lagadec (http://www.decalage.info)  
34 -# All rights reserved.  
35 -#  
36 -# Redistribution and use in source and binary forms, with or without modification,  
37 -# are permitted provided that the following conditions are met:  
38 -#  
39 -# * Redistributions of source code must retain the above copyright notice, this  
40 -# list of conditions and the following disclaimer.  
41 -# * Redistributions in binary form must reproduce the above copyright notice,  
42 -# this list of conditions and the following disclaimer in the documentation  
43 -# and/or other materials provided with the distribution.  
44 -#  
45 -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND  
46 -# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED  
47 -# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE  
48 -# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE  
49 -# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL  
50 -# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR  
51 -# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER  
52 -# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,  
53 -# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE  
54 -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.  
55 -  
56 -  
57 -# olevba contains modified source code from the officeparser project, published  
58 -# under the following MIT License (MIT):  
59 -#  
60 -# officeparser is copyright (c) 2014 John William Davison  
61 -#  
62 -# Permission is hereby granted, free of charge, to any person obtaining a copy  
63 -# of this software and associated documentation files (the "Software"), to deal  
64 -# in the Software without restriction, including without limitation the rights  
65 -# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell  
66 -# copies of the Software, and to permit persons to whom the Software is  
67 -# furnished to do so, subject to the following conditions:  
68 -#  
69 -# The above copyright notice and this permission notice shall be included in all  
70 -# copies or substantial portions of the Software.  
71 -#  
72 -# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR  
73 -# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,  
74 -# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE  
75 -# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER  
76 -# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,  
77 -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE  
78 -# SOFTWARE.  
79 -  
80 -from __future__ import print_function  
81 -  
82 -  
83 -#------------------------------------------------------------------------------  
84 -# CHANGELOG:  
85 -# 2014-08-05 v0.01 PL: - first version based on officeparser code  
86 -# 2014-08-14 v0.02 PL: - fixed bugs in code, added license from officeparser  
87 -# 2014-08-15 PL: - fixed incorrect value check in projecthelpfilepath Record  
88 -# 2014-08-15 v0.03 PL: - refactored extract_macros to support OpenXML formats  
89 -# and to find the VBA project root anywhere in the file  
90 -# 2014-11-29 v0.04 PL: - use olefile instead of OleFileIO_PL  
91 -# 2014-12-05 v0.05 PL: - refactored most functions into a class, new API  
92 -# - added detect_vba_macros  
93 -# 2014-12-10 v0.06 PL: - hide first lines with VB attributes  
94 -# - detect auto-executable macros  
95 -# - ignore empty macros  
96 -# 2014-12-14 v0.07 PL: - detect_autoexec() is now case-insensitive  
97 -# 2014-12-15 v0.08 PL: - improved display for empty macros  
98 -# - added pattern extraction  
99 -# 2014-12-25 v0.09 PL: - added suspicious keywords detection  
100 -# 2014-12-27 v0.10 PL: - added OptionParser, main and process_file  
101 -# - uses xglob to scan several files with wildcards  
102 -# - option -r to recurse subdirectories  
103 -# - option -z to scan files in password-protected zips  
104 -# 2015-01-02 v0.11 PL: - improved filter_vba to detect colons  
105 -# 2015-01-03 v0.12 PL: - fixed detect_patterns to detect all patterns  
106 -# - process_file: improved display, shows container file  
107 -# - improved list of executable file extensions  
108 -# 2015-01-04 v0.13 PL: - added several suspicious keywords, improved display  
109 -# 2015-01-08 v0.14 PL: - added hex strings detection and decoding  
110 -# - fixed issue #2, decoding VBA stream names using  
111 -# specified codepage and unicode stream names  
112 -# 2015-01-11 v0.15 PL: - added new triage mode, options -t and -d  
113 -# 2015-01-16 v0.16 PL: - fix for issue #3 (exception when module name="text")  
114 -# - added several suspicious keywords  
115 -# - added option -i to analyze VBA source code directly  
116 -# 2015-01-17 v0.17 PL: - removed .com from the list of executable extensions  
117 -# - added scan_vba to run all detection algorithms  
118 -# - decoded hex strings are now also scanned + reversed  
119 -# 2015-01-23 v0.18 PL: - fixed issue #3, case-insensitive search in code_modules  
120 -# 2015-01-24 v0.19 PL: - improved the detection of IOCs obfuscated with hex  
121 -# strings and StrReverse  
122 -# 2015-01-26 v0.20 PL: - added option --hex to show all hex strings decoded  
123 -# 2015-01-29 v0.21 PL: - added Dridex obfuscation decoding  
124 -# - improved display, shows obfuscation name  
125 -# 2015-02-01 v0.22 PL: - fixed issue #4: regex for URL, e-mail and exe filename  
126 -# - added Base64 obfuscation decoding (contribution from  
127 -# @JamesHabben)  
128 -# 2015-02-03 v0.23 PL: - triage now uses VBA_Scanner results, shows Base64 and  
129 -# Dridex strings  
130 -# - exception handling in detect_base64_strings  
131 -# 2015-02-07 v0.24 PL: - renamed option --hex to --decode, fixed display  
132 -# - display exceptions with stack trace  
133 -# - added several suspicious keywords  
134 -# - improved Base64 detection and decoding  
135 -# - fixed triage mode not to scan attrib lines  
136 -# 2015-03-04 v0.25 PL: - added support for Word 2003 XML  
137 -# 2015-03-22 v0.26 PL: - added suspicious keywords for sandboxing and  
138 -# virtualisation detection  
139 -# 2015-05-06 v0.27 PL: - added support for MHTML files with VBA macros  
140 -# (issue #10 reported by Greg from SpamStopsHere)  
141 -# 2015-05-24 v0.28 PL: - improved support for MHTML files with modified header  
142 -# (issue #11 reported by Thomas Chopitea)  
143 -# 2015-05-26 v0.29 PL: - improved MSO files parsing, taking into account  
144 -# various data offsets (issue #12)  
145 -# - improved detection of MSO files, avoiding incorrect  
146 -# parsing errors (issue #7)  
147 -# 2015-05-29 v0.30 PL: - added suspicious keywords suggested by @ozhermit,  
148 -# Davy Douhine (issue #9), issue #13  
149 -# 2015-06-16 v0.31 PL: - added generic VBA expression deobfuscation (chr,asc,etc)  
150 -# 2015-06-19 PL: - added options -a, -c, --each, --attr  
151 -# 2015-06-21 v0.32 PL: - always display decoded strings which are printable  
152 -# - fix VBA_Scanner.scan to return raw strings, not repr()  
153 -# 2015-07-09 v0.40 PL: - removed usage of sys.stderr which causes issues  
154 -# 2015-07-12 PL: - added Hex function decoding to VBA Parser  
155 -# 2015-07-13 PL: - added Base64 function decoding to VBA Parser  
156 -# 2015-09-06 PL: - improved VBA_Parser, refactored the main functions  
157 -# 2015-09-13 PL: - moved main functions to a class VBA_Parser_CLI  
158 -# - fixed issue when analysis was done twice  
159 -# 2015-09-15 PL: - remove duplicate IOCs from results  
160 -# 2015-09-16 PL: - join long VBA lines ending with underscore before scan  
161 -# - disabled unused option --each  
162 -# 2015-09-22 v0.41 PL: - added new option --reveal  
163 -# - added suspicious strings for PowerShell.exe options  
164 -# 2015-10-09 v0.42 PL: - VBA_Parser: split each format into a separate method  
165 -# 2015-10-10 PL: - added support for text files with VBA source code  
166 -# 2015-11-17 PL: - fixed bug with --decode option  
167 -# 2015-12-16 PL: - fixed bug in main (no options input anymore)  
168 -# - improved logging, added -l option  
169 -# 2016-01-31 PL: - fixed issue #31 in VBA_Parser.open_mht  
170 -# - fixed issue #32 by monkeypatching email.feedparser  
171 -# 2016-02-07 PL: - KeyboardInterrupt is now raised properly  
172 -# 2016-02-20 v0.43 PL: - fixed issue #34 in the VBA parser and vba_chr  
173 -# 2016-02-29 PL: - added Workbook_Activate to suspicious keywords  
174 -# 2016-03-08 v0.44 PL: - added VBA Form strings extraction and analysis  
175 -# 2016-03-04 v0.45 CH: - added JSON output (by Christian Herdtweck)  
176 -# 2016-03-16 CH: - added option --no-deobfuscate (temporary)  
177 -# 2016-04-19 v0.46 PL: - new option --deobf instead of --no-deobfuscate  
178 -# - updated suspicious keywords  
179 -# 2016-05-04 v0.47 PL: - look for VBA code in any stream including orphans  
180 -# 2016-04-28 CH: - return an exit code depending on the results  
181 -# - improved error and exception handling  
182 -# - improved JSON output  
183 -# 2016-05-12 CH: - added support for PowerPoint 97-2003 files  
184 -# 2016-06-06 CH: - improved handling of unicode VBA module names  
185 -# 2016-06-07 CH: - added option --relaxed, stricter parsing by default  
186 -# 2016-06-12 v0.50 PL: - fixed small bugs in VBA parsing code  
187 -# 2016-07-01 PL: - fixed issue #58 with format() to support Python 2.6  
188 -# 2016-07-29 CH: - fixed several bugs including #73 (Mac Roman encoding)  
189 -# 2016-08-31 PL: - added autoexec keyword InkPicture_Painted  
190 -# - detect_autoexec now returns the exact keyword found  
191 -# 2016-09-05 PL: - added autoexec keywords for MS Publisher (.pub)  
192 -# 2016-09-06 PL: - fixed issue #20, is_zipfile on Python 2.6  
193 -# 2016-09-12 PL: - enabled packrat to improve pyparsing performance  
194 -# 2016-10-25 PL: - fixed raise and print statements for Python 3  
195 -# 2016-11-03 v0.51 PL: - added EnumDateFormats and EnumSystemLanguageGroupsW  
196 -# 2017-02-07 PL: - temporary fix for issue #132  
197 -# - added keywords for Mac-specific macros (issue #130)  
198 -# 2017-03-08 PL: - fixed absolute imports  
199 -# 2017-03-16 PL: - fixed issues #148 and #149 for option --reveal  
200 -# 2017-05-19 PL: - added enable_logging to fix issue #154  
201 -# 2017-05-31 c1fe: - PR #135 fixing issue #132 for some Mac files  
202 -# 2017-06-08 PL: - fixed issue #122 Chr() with negative numbers  
203 -# 2017-06-15 PL: - deobfuscation line by line to handle large files  
204 -# 2017-07-11 v0.52 PL: - raise exception instead of sys.exit (issue #180)  
205 -# 2018-03-19 PL: - removed pyparsing from the thirdparty subfolder  
206 -# 2018-05-13 v0.53 PL: - added support for Word/PowerPoint 2007+ XML (FlatOPC)  
207 -# (issue #283)  
208 -# 2018-06-11 v0.53.1 MHW: - fixed #320: chr instead of unichr on python 3  
209 -# 2018-06-12 MHW: - fixed #322: import reduce from functools  
210 -# 2018-09-11 v0.54 PL: - olefile is now a dependency  
211 -# 2018-10-25 CH: - detect encryption and raise error if detected  
212 -  
213 -__version__ = '0.54dev4'  
214 -  
215 -#------------------------------------------------------------------------------  
216 -# TODO:  
217 -# + setup logging (common with other oletools)  
218 -# + add xor bruteforcing like bbharvest  
219 -# + options -a and -c should imply -d  
220 -  
221 -# TODO later:  
222 -# + performance improvement: instead of searching each keyword separately,  
223 -# first split vba code into a list of words (per line), then check each  
224 -# word against a dict. (or put vba words into a set/dict?)  
225 -# + for regex, maybe combine them into a single re with named groups?  
226 -# + add Yara support, include sample rules? plugins like balbuzard?  
227 -# + add balbuzard support  
228 -# + output to file (replace print by file.write, sys.stdout by default)  
229 -# + look for VBA in embedded documents (e.g. Excel in Word)  
230 -# + support SRP streams (see Lenny's article + links and sample)  
231 -# - python 3.x support  
232 -# - check VBA macros in Visio, Access, Project, etc  
233 -# - extract_macros: convert to a class, split long function into smaller methods  
234 -# - extract_macros: read bytes from stream file objects instead of strings  
235 -# - extract_macros: use combined struct.unpack instead of many calls  
236 -# - all except clauses should target specific exceptions  
237 -  
238 -#------------------------------------------------------------------------------  
239 -# REFERENCES:  
240 -# - [MS-OVBA]: Microsoft Office VBA File Format Structure  
241 -# http://msdn.microsoft.com/en-us/library/office/cc313094%28v=office.12%29.aspx  
242 -# - officeparser: https://github.com/unixfreak0037/officeparser  
243 -  
244 -  
245 -#--- IMPORTS ------------------------------------------------------------------  
246 -  
247 -import sys  
248 -import os  
249 -import logging  
250 -import struct  
251 -from _io import StringIO,BytesIO  
252 -import math  
253 -import zipfile  
254 -import re  
255 -import optparse  
256 -import binascii  
257 -import base64  
258 -import zlib  
259 -import email # for MHTML parsing  
260 -import string # for printable  
261 -import json # for json output mode (argument --json)  
262 -from functools import reduce  
263 -  
264 -# import lxml or ElementTree for XML parsing:  
265 -try:  
266 - # lxml: best performance for XML processing  
267 - import lxml.etree as ET  
268 -except ImportError:  
269 - try:  
270 - # Python 2.5+: batteries included  
271 - import xml.etree.cElementTree as ET  
272 - except ImportError:  
273 - try:  
274 - # Python <2.5: standalone ElementTree install  
275 - import elementtree.cElementTree as ET  
276 - except ImportError:  
277 - raise ImportError("lxml or ElementTree are not installed, " \  
278 - + "see http://codespeak.net/lxml " \  
279 - + "or http://effbot.org/zone/element-index.htm") 7 +warnings.warn('olevba3 is deprecated, olevba should be used instead.', DeprecationWarning)
280 8
281 # IMPORTANT: it should be possible to run oletools directly as scripts 9 # IMPORTANT: it should be possible to run oletools directly as scripts
282 # in any directory without installing them with pip or setup.py. 10 # in any directory without installing them with pip or setup.py.
@@ -284,3374 +12,13 @@ except ImportError: @@ -284,3374 +12,13 @@ except ImportError:
284 # And to enable Python 2+3 compatibility, we need to use absolute imports, 12 # And to enable Python 2+3 compatibility, we need to use absolute imports,
285 # so we add the oletools parent folder to sys.path (absolute+normalized path): 13 # so we add the oletools parent folder to sys.path (absolute+normalized path):
286 _thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__))) 14 _thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__)))
287 -# print('_thismodule_dir = %r' % _thismodule_dir)  
288 _parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..')) 15 _parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..'))
289 -# print('_parent_dir = %r' % _thirdparty_dir)  
290 -if not _parent_dir in sys.path: 16 +if _parent_dir not in sys.path:
291 sys.path.insert(0, _parent_dir) 17 sys.path.insert(0, _parent_dir)
292 18
293 -import olefile  
294 -from oletools.thirdparty.prettytable import prettytable  
295 -from oletools.thirdparty.xglob import xglob, PathNotFoundException  
296 -from pyparsing import \  
297 - CaselessKeyword, CaselessLiteral, Combine, Forward, Literal, \  
298 - Optional, QuotedString,Regex, Suppress, Word, WordStart, \  
299 - alphanums, alphas, hexnums,nums, opAssoc, srange, \  
300 - infixNotation, ParserElement  
301 -import oletools.ppt_parser as ppt_parser  
302 -from oletools import rtfobj  
303 -from oletools import oleid  
304 -from oletools.common.errors import FileIsEncryptedError  
305 -  
306 -# monkeypatch email to fix issue #32:  
307 -# allow header lines without ":"  
308 -import email.feedparser  
309 -email.feedparser.headerRE = re.compile(r'^(From |[\041-\071\073-\176]{1,}:?|[\t ])')  
310 -  
311 -# === PYTHON 2+3 SUPPORT ======================================================  
312 -  
313 -if sys.version_info[0] <= 2:  
314 - # Python 2.x  
315 - if sys.version_info[1] <= 6:  
316 - # Python 2.6  
317 - # use is_zipfile backported from Python 2.7:  
318 - from thirdparty.zipfile27 import is_zipfile  
319 - else:  
320 - # Python 2.7  
321 - from zipfile import is_zipfile  
322 -else:  
323 - # Python 3.x+  
324 - from zipfile import is_zipfile  
325 - # xrange is now called range:  
326 - xrange = range  
327 -  
328 -  
329 -# === PYTHON 3.0 - 3.4 SUPPORT ======================================================  
330 -  
331 -# From https://gist.github.com/ynkdir/867347/c5e188a4886bc2dd71876c7e069a7b00b6c16c61  
332 -  
333 -if sys.version_info >= (3, 0) and sys.version_info < (3, 5):  
334 - import codecs  
335 -  
336 - _backslashreplace_errors = codecs.lookup_error("backslashreplace")  
337 -  
338 - def backslashreplace_errors(exc):  
339 - if isinstance(exc, UnicodeDecodeError):  
340 - u = "".join("\\x{0:02x}".format(c) for c in exc.object[exc.start:exc.end])  
341 - return (u, exc.end)  
342 - return _backslashreplace_errors(exc)  
343 -  
344 - codecs.register_error("backslashreplace", backslashreplace_errors)  
345 -  
346 -  
347 -# === LOGGING =================================================================  
348 -  
349 -class NullHandler(logging.Handler):  
350 - """  
351 - Log Handler without output, to avoid printing messages if logging is not  
352 - configured by the main application.  
353 - Python 2.7 has logging.NullHandler, but this is necessary for 2.6:  
354 - see https://docs.python.org/2.6/library/logging.html#configuring-logging-for-a-library  
355 - """  
356 - def emit(self, record):  
357 - pass  
358 -  
359 -def get_logger(name, level=logging.CRITICAL+1):  
360 - """  
361 - Create a suitable logger object for this module.  
362 - The goal is not to change settings of the root logger, to avoid getting  
363 - other modules' logs on the screen.  
364 - If a logger exists with same name, reuse it. (Else it would have duplicate  
365 - handlers and messages would be doubled.)  
366 - The level is set to CRITICAL+1 by default, to avoid any logging.  
367 - """  
368 - # First, test if there is already a logger with the same name, else it  
369 - # will generate duplicate messages (due to duplicate handlers):  
370 - if name in logging.Logger.manager.loggerDict:  
371 - #NOTE: another less intrusive but more "hackish" solution would be to  
372 - # use getLogger then test if its effective level is not default.  
373 - logger = logging.getLogger(name)  
374 - # make sure level is OK:  
375 - logger.setLevel(level)  
376 - return logger  
377 - # get a new logger:  
378 - logger = logging.getLogger(name)  
379 - # only add a NullHandler for this logger, it is up to the application  
380 - # to configure its own logging:  
381 - logger.addHandler(NullHandler())  
382 - logger.setLevel(level)  
383 - return logger  
384 -  
385 -# a global logger object used for debugging:  
386 -log = get_logger('olevba')  
387 -  
388 -  
389 -def enable_logging():  
390 - """  
391 - Enable logging for this module (disabled by default).  
392 - This will set the module-specific logger level to NOTSET, which  
393 - means the main application controls the actual logging level.  
394 - """  
395 - log.setLevel(logging.NOTSET)  
396 - # Also enable logging in the ppt_parser module:  
397 - ppt_parser.enable_logging()  
398 -  
399 -  
400 -  
401 -#=== EXCEPTIONS ==============================================================  
402 -  
403 -class OlevbaBaseException(Exception):  
404 - """ Base class for exceptions produced here for simpler except clauses """  
405 - def __init__(self, msg, filename=None, orig_exc=None, **kwargs):  
406 - if orig_exc:  
407 - super(OlevbaBaseException, self).__init__(msg +  
408 - ' ({0})'.format(orig_exc),  
409 - **kwargs)  
410 - else:  
411 - super(OlevbaBaseException, self).__init__(msg, **kwargs)  
412 - self.msg = msg  
413 - self.filename = filename  
414 - self.orig_exc = orig_exc  
415 -  
416 -  
417 -class FileOpenError(OlevbaBaseException):  
418 - """ raised by VBA_Parser constructor if all open_... attempts failed  
419 -  
420 - probably means the file type is not supported  
421 - """  
422 -  
423 - def __init__(self, filename, orig_exc=None):  
424 - super(FileOpenError, self).__init__(  
425 - 'Failed to open file %s' % filename, filename, orig_exc)  
426 -  
427 -  
428 -class ProcessingError(OlevbaBaseException):  
429 - """ raised by VBA_Parser.process_file* functions """  
430 -  
431 - def __init__(self, filename, orig_exc):  
432 - super(ProcessingError, self).__init__(  
433 - 'Error processing file %s' % filename, filename, orig_exc)  
434 -  
435 -  
436 -class MsoExtractionError(RuntimeError, OlevbaBaseException):  
437 - """ raised by mso_file_extract if parsing MSO/ActiveMIME data failed """  
438 -  
439 - def __init__(self, msg):  
440 - MsoExtractionError.__init__(self, msg)  
441 - OlevbaBaseException.__init__(self, msg)  
442 -  
443 -  
444 -class SubstreamOpenError(FileOpenError):  
445 - """ special kind of FileOpenError: file is a substream of original file """  
446 -  
447 - def __init__(self, filename, subfilename, orig_exc=None):  
448 - super(SubstreamOpenError, self).__init__(  
449 - str(filename) + '/' + str(subfilename), orig_exc)  
450 - self.filename = filename # overwrite setting in OlevbaBaseException  
451 - self.subfilename = subfilename  
452 -  
453 -  
454 -class UnexpectedDataError(OlevbaBaseException):  
455 - """ raised when parsing is strict (=not relaxed) and data is unexpected """  
456 -  
457 - def __init__(self, stream_path, variable, expected, value):  
458 - if isinstance(expected, int):  
459 - es = '{0:04X}'.format(expected)  
460 - elif isinstance(expected, tuple):  
461 - es = ','.join('{0:04X}'.format(e) for e in expected)  
462 - es = '({0})'.format(es)  
463 - else:  
464 - raise ValueError('Unknown type encountered: {0}'.format(type(expected)))  
465 - super(UnexpectedDataError, self).__init__(  
466 - 'Unexpected value in {0} for variable {1}: '  
467 - 'expected {2} but found {3:04X}!'  
468 - .format(stream_path, variable, es, value))  
469 - self.stream_path = stream_path  
470 - self.variable = variable  
471 - self.expected = expected  
472 - self.value = value  
473 -  
474 -#--- CONSTANTS ----------------------------------------------------------------  
475 -  
476 -# return codes  
477 -RETURN_OK = 0  
478 -RETURN_WARNINGS = 1 # (reserved, not used yet)  
479 -RETURN_WRONG_ARGS = 2 # (fixed, built into optparse)  
480 -RETURN_FILE_NOT_FOUND = 3  
481 -RETURN_XGLOB_ERR = 4  
482 -RETURN_OPEN_ERROR = 5  
483 -RETURN_PARSE_ERROR = 6  
484 -RETURN_SEVERAL_ERRS = 7  
485 -RETURN_UNEXPECTED = 8  
486 -RETURN_ENCRYPTED = 9  
487 -  
488 -# MAC codepages (from http://stackoverflow.com/questions/1592925/decoding-mac-os-text-in-python)  
489 -MAC_CODEPAGES = {  
490 - 10000: 'mac-roman',  
491 - 10001: 'shiftjis', # not found: 'mac-shift-jis',  
492 - 10003: 'ascii', # nothing appropriate found: 'mac-hangul',  
493 - 10008: 'gb2321', # not found: 'mac-gb2312',  
494 - 10002: 'big5', # not found: 'mac-big5',  
495 - 10005: 'hebrew', # not found: 'mac-hebrew',  
496 - 10004: 'mac-arabic',  
497 - 10006: 'mac-greek',  
498 - 10081: 'mac-turkish',  
499 - 10021: 'thai', # not found: mac-thai',  
500 - 10029: 'maccentraleurope', # not found: 'mac-east europe',  
501 - 10007: 'ascii', # nothing appropriate found: 'mac-russian',  
502 -}  
503 -  
504 -# URL and message to report issues:  
505 -URL_OLEVBA_ISSUES = 'https://github.com/decalage2/oletools/issues'  
506 -MSG_OLEVBA_ISSUES = 'Please report this issue on %s' % URL_OLEVBA_ISSUES  
507 -  
508 -# Container types:  
509 -TYPE_OLE = 'OLE'  
510 -TYPE_OpenXML = 'OpenXML'  
511 -TYPE_FlatOPC_XML = 'FlatOPC_XML'  
512 -TYPE_Word2003_XML = 'Word2003_XML'  
513 -TYPE_MHTML = 'MHTML'  
514 -TYPE_TEXT = 'Text'  
515 -TYPE_PPT = 'PPT'  
516 -  
517 -# short tag to display file types in triage mode:  
518 -TYPE2TAG = {  
519 - TYPE_OLE: 'OLE:',  
520 - TYPE_OpenXML: 'OpX:',  
521 - TYPE_FlatOPC_XML: 'FlX:',  
522 - TYPE_Word2003_XML: 'XML:',  
523 - TYPE_MHTML: 'MHT:',  
524 - TYPE_TEXT: 'TXT:',  
525 - TYPE_PPT: 'PPT',  
526 -}  
527 -  
528 -  
529 -# MSO files ActiveMime header magic  
530 -MSO_ACTIVEMIME_HEADER = b'ActiveMime'  
531 -  
532 -MODULE_EXTENSION = "bas"  
533 -CLASS_EXTENSION = "cls"  
534 -FORM_EXTENSION = "frm"  
535 -  
536 -# Namespaces and tags for Word2003 XML parsing:  
537 -NS_W = '{http://schemas.microsoft.com/office/word/2003/wordml}'  
538 -# the tag <w:binData w:name="editdata.mso"> contains the VBA macro code:  
539 -TAG_BINDATA = NS_W + 'binData'  
540 -ATTR_NAME = NS_W + 'name'  
541 -  
542 -# Namespaces and tags for Word/PowerPoint 2007+ XML parsing:  
543 -# root: <pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">  
544 -NS_XMLPACKAGE = '{http://schemas.microsoft.com/office/2006/xmlPackage}'  
545 -TAG_PACKAGE = NS_XMLPACKAGE + 'package'  
546 -# the tag <pkg:part> includes <pkg:binaryData> that contains the VBA macro code in Base64:  
547 -# <pkg:part pkg:name="/word/vbaProject.bin" pkg:contentType="application/vnd.ms-office.vbaProject"><pkg:binaryData>  
548 -TAG_PKGPART = NS_XMLPACKAGE + 'part'  
549 -ATTR_PKG_NAME = NS_XMLPACKAGE + 'name'  
550 -ATTR_PKG_CONTENTTYPE = NS_XMLPACKAGE + 'contentType'  
551 -CTYPE_VBAPROJECT = "application/vnd.ms-office.vbaProject"  
552 -TAG_PKGBINDATA = NS_XMLPACKAGE + 'binaryData'  
553 -  
554 -# Keywords to detect auto-executable macros  
555 -AUTOEXEC_KEYWORDS = {  
556 - # MS Word:  
557 - 'Runs when the Word document is opened':  
558 - ('AutoExec', 'AutoOpen', 'DocumentOpen'),  
559 - 'Runs when the Word document is closed':  
560 - ('AutoExit', 'AutoClose', 'Document_Close', 'DocumentBeforeClose'),  
561 - 'Runs when the Word document is modified':  
562 - ('DocumentChange',),  
563 - 'Runs when a new Word document is created':  
564 - ('AutoNew', 'Document_New', 'NewDocument'),  
565 -  
566 - # MS Word and Publisher:  
567 - 'Runs when the Word or Publisher document is opened':  
568 - ('Document_Open',),  
569 - 'Runs when the Publisher document is closed':  
570 - ('Document_BeforeClose',),  
571 -  
572 - # MS Excel:  
573 - 'Runs when the Excel Workbook is opened':  
574 - ('Auto_Open', 'Workbook_Open', 'Workbook_Activate'),  
575 - 'Runs when the Excel Workbook is closed':  
576 - ('Auto_Close', 'Workbook_Close'),  
577 -  
578 - # any MS Office application:  
579 - 'Runs when the file is opened (using InkPicture ActiveX object)':  
580 - # ref:https://twitter.com/joe4security/status/770691099988025345  
581 - (r'\w+_Painted',),  
582 - 'Runs when the file is opened and ActiveX objects trigger events':  
583 - (r'\w+_(?:GotFocus|LostFocus|MouseHover)',),  
584 -}  
585 -  
586 -# Suspicious Keywords that may be used by malware  
587 -# See VBA language reference: http://msdn.microsoft.com/en-us/library/office/jj692818%28v=office.15%29.aspx  
588 -SUSPICIOUS_KEYWORDS = {  
589 - #TODO: use regex to support variable whitespaces  
590 - 'May read system environment variables':  
591 - ('Environ',),  
592 - 'May open a file':  
593 - ('Open',),  
594 - 'May write to a file (if combined with Open)':  
595 - #TODO: regex to find Open+Write on same line  
596 - ('Write', 'Put', 'Output', 'Print #'),  
597 - 'May read or write a binary file (if combined with Open)':  
598 - #TODO: regex to find Open+Binary on same line  
599 - ('Binary',),  
600 - 'May copy a file':  
601 - ('FileCopy', 'CopyFile'),  
602 - #FileCopy: http://msdn.microsoft.com/en-us/library/office/gg264390%28v=office.15%29.aspx  
603 - #CopyFile: http://msdn.microsoft.com/en-us/library/office/gg264089%28v=office.15%29.aspx  
604 - 'May delete a file':  
605 - ('Kill',),  
606 - 'May create a text file':  
607 - ('CreateTextFile', 'ADODB.Stream', 'WriteText', 'SaveToFile'),  
608 - #CreateTextFile: http://msdn.microsoft.com/en-us/library/office/gg264617%28v=office.15%29.aspx  
609 - #ADODB.Stream sample: http://pastebin.com/Z4TMyuq6  
610 - 'May run an executable file or a system command':  
611 - ('Shell', 'vbNormal', 'vbNormalFocus', 'vbHide', 'vbMinimizedFocus', 'vbMaximizedFocus', 'vbNormalNoFocus',  
612 - 'vbMinimizedNoFocus', 'WScript.Shell', 'Run', 'ShellExecute'),  
613 - # MacScript: see https://msdn.microsoft.com/en-us/library/office/gg264812.aspx  
614 - 'May run an executable file or a system command on a Mac':  
615 - ('MacScript',),  
616 - 'May run an executable file or a system command on a Mac (if combined with libc.dylib)':  
617 - ('system', 'popen', r'exec[lv][ep]?'),  
618 - #Shell: http://msdn.microsoft.com/en-us/library/office/gg278437%28v=office.15%29.aspx  
619 - #WScript.Shell+Run sample: http://pastebin.com/Z4TMyuq6  
620 - 'May run PowerShell commands':  
621 - #sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/  
622 - #also: https://bitbucket.org/decalage/oletools/issues/14/olevba-library-update-ioc  
623 - # ref: https://blog.netspi.com/15-ways-to-bypass-the-powershell-execution-policy/  
624 - # TODO: add support for keywords starting with a non-alpha character, such as "-noexit"  
625 - # TODO: '-command', '-EncodedCommand', '-scriptblock'  
626 - ('PowerShell', 'noexit', 'ExecutionPolicy', 'noprofile', 'command', 'EncodedCommand',  
627 - 'invoke-command', 'scriptblock', 'Invoke-Expression', 'AuthorizationManager'),  
628 - 'May run an executable file or a system command using PowerShell':  
629 - ('Start-Process',),  
630 - 'May hide the application':  
631 - ('Application.Visible', 'ShowWindow', 'SW_HIDE'),  
632 - 'May create a directory':  
633 - ('MkDir',),  
634 - 'May save the current workbook':  
635 - ('ActiveWorkbook.SaveAs',),  
636 - 'May change which directory contains files to open at startup':  
637 - #TODO: confirm the actual effect  
638 - ('Application.AltStartupPath',),  
639 - 'May create an OLE object':  
640 - ('CreateObject',),  
641 - 'May create an OLE object using PowerShell':  
642 - ('New-Object',),  
643 - 'May run an application (if combined with CreateObject)':  
644 - ('Shell.Application',),  
645 - 'May enumerate application windows (if combined with Shell.Application object)':  
646 - ('Windows', 'FindWindow'),  
647 - 'May run code from a DLL':  
648 - #TODO: regex to find declare+lib on same line - see mraptor  
649 - ('Lib',),  
650 - 'May run code from a library on a Mac':  
651 - #TODO: regex to find declare+lib on same line - see mraptor  
652 - ('libc.dylib', 'dylib'),  
653 - 'May inject code into another process':  
654 - ('CreateThread', 'VirtualAlloc', # (issue #9) suggested by Davy Douhine - used by MSF payload  
655 - 'VirtualAllocEx', 'RtlMoveMemory',  
656 - ),  
657 - 'May run a shellcode in memory':  
658 - ('EnumSystemLanguageGroupsW?', # Used by Hancitor in Oct 2016  
659 - 'EnumDateFormats(?:W|(?:Ex){1,2})?'), # see https://msdn.microsoft.com/en-us/library/windows/desktop/dd317810(v=vs.85).aspx  
660 - 'May download files from the Internet':  
661 - #TODO: regex to find urlmon+URLDownloadToFileA on same line  
662 - ('URLDownloadToFileA', 'Msxml2.XMLHTTP', 'Microsoft.XMLHTTP',  
663 - 'MSXML2.ServerXMLHTTP', # suggested in issue #13  
664 - 'User-Agent', # sample from @ozhermit: http://pastebin.com/MPc3iV6z  
665 - ),  
666 - 'May download files from the Internet using PowerShell':  
667 - #sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/  
668 - ('Net.WebClient', 'DownloadFile', 'DownloadString'),  
669 - 'May control another application by simulating user keystrokes':  
670 - ('SendKeys', 'AppActivate'),  
671 - #SendKeys: http://msdn.microsoft.com/en-us/library/office/gg278655%28v=office.15%29.aspx  
672 - 'May attempt to obfuscate malicious function calls':  
673 - ('CallByName',),  
674 - #CallByName: http://msdn.microsoft.com/en-us/library/office/gg278760%28v=office.15%29.aspx  
675 - 'May attempt to obfuscate specific strings (use option --deobf to deobfuscate)':  
676 - #TODO: regex to find several Chr*, not just one  
677 - ('Chr', 'ChrB', 'ChrW', 'StrReverse', 'Xor'),  
678 - #Chr: http://msdn.microsoft.com/en-us/library/office/gg264465%28v=office.15%29.aspx  
679 - 'May read or write registry keys':  
680 - #sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/  
681 - ('RegOpenKeyExA', 'RegOpenKeyEx', 'RegCloseKey'),  
682 - 'May read registry keys':  
683 - #sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/  
684 - ('RegQueryValueExA', 'RegQueryValueEx',  
685 - 'RegRead', #with Wscript.Shell  
686 - ),  
687 - 'May detect virtualization':  
688 - # sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/  
689 - (r'SYSTEM\ControlSet001\Services\Disk\Enum', 'VIRTUAL', 'VMWARE', 'VBOX'),  
690 - 'May detect Anubis Sandbox':  
691 - # sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/  
692 - # NOTES: this sample also checks App.EXEName but that seems to be a bug, it works in VB6 but not in VBA  
693 - # ref: http://www.syssec-project.eu/m/page-media/3/disarm-raid11.pdf  
694 - ('GetVolumeInformationA', 'GetVolumeInformation', # with kernel32.dll  
695 - '1824245000', r'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProductId',  
696 - '76487-337-8429955-22614', 'andy', 'sample', r'C:\exec\exec.exe', 'popupkiller'  
697 - ),  
698 - 'May detect Sandboxie':  
699 - # sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/  
700 - # ref: http://www.cplusplus.com/forum/windows/96874/  
701 - ('SbieDll.dll', 'SandboxieControlWndClass'),  
702 - 'May detect Sunbelt Sandbox':  
703 - # ref: http://www.cplusplus.com/forum/windows/96874/  
704 - (r'C:\file.exe',),  
705 - 'May detect Norman Sandbox':  
706 - # ref: http://www.cplusplus.com/forum/windows/96874/  
707 - ('currentuser',),  
708 - 'May detect CW Sandbox':  
709 - # ref: http://www.cplusplus.com/forum/windows/96874/  
710 - ('Schmidti',),  
711 - 'May detect WinJail Sandbox':  
712 - # ref: http://www.cplusplus.com/forum/windows/96874/  
713 - ('Afx:400000:0',),  
714 -}  
715 -  
716 -# Regular Expression for a URL:  
717 -# http://en.wikipedia.org/wiki/Uniform_resource_locator  
718 -# http://www.w3.org/Addressing/URL/uri-spec.html  
719 -#TODO: also support username:password@server  
720 -#TODO: other protocols (file, gopher, wais, ...?)  
721 -SCHEME = r'\b(?:http|ftp)s?'  
722 -# see http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains  
723 -TLD = r'(?:xn--[a-zA-Z0-9]{4,20}|[a-zA-Z]{2,20})'  
724 -DNS_NAME = r'(?:[a-zA-Z0-9\-\.]+\.' + TLD + ')'  
725 -#TODO: IPv6 - see https://www.debuggex.com/  
726 -# A literal numeric IPv6 address may be given, but must be enclosed in [ ] e.g. [db8:0cec::99:123a]  
727 -NUMBER_0_255 = r'(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])'  
728 -IPv4 = r'(?:' + NUMBER_0_255 + r'\.){3}' + NUMBER_0_255  
729 -# IPv4 must come before the DNS name because it is more specific  
730 -SERVER = r'(?:' + IPv4 + '|' + DNS_NAME + ')'  
731 -PORT = r'(?:\:[0-9]{1,5})?'  
732 -SERVER_PORT = SERVER + PORT  
733 -URL_PATH = r'(?:/[a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~]*)?' # [^\.\,\)\(\s"]  
734 -URL_RE = SCHEME + r'\://' + SERVER_PORT + URL_PATH  
735 -re_url = re.compile(URL_RE)  
736 -  
737 -  
738 -# Patterns to be extracted (IP addresses, URLs, etc)  
739 -# From patterns.py in balbuzard  
740 -RE_PATTERNS = (  
741 - ('URL', re.compile(URL_RE)),  
742 - ('IPv4 address', re.compile(IPv4)),  
743 - # TODO: add IPv6  
744 - ('E-mail address', re.compile(r'(?i)\b[A-Z0-9._%+-]+@' + SERVER + '\b')),  
745 - # ('Domain name', re.compile(r'(?=^.{1,254}$)(^(?:(?!\d+\.|-)[a-zA-Z0-9_\-]{1,63}(?<!-)\.?)+(?:[a-zA-Z]{2,})$)')),  
746 - # Executable file name with known extensions (except .com which is present in many URLs, and .application):  
747 - ("Executable file name", re.compile(  
748 - r"(?i)\b\w+\.(EXE|PIF|GADGET|MSI|MSP|MSC|VBS|VBE|VB|JSE|JS|WSF|WSC|WSH|WS|BAT|CMD|DLL|SCR|HTA|CPL|CLASS|JAR|PS1XML|PS1|PS2XML|PS2|PSC1|PSC2|SCF|LNK|INF|REG)\b")),  
749 - # Sources: http://www.howtogeek.com/137270/50-file-extensions-that-are-potentially-dangerous-on-windows/  
750 - # TODO: https://support.office.com/en-us/article/Blocked-attachments-in-Outlook-3811cddc-17c3-4279-a30c-060ba0207372#__attachment_file_types  
751 - # TODO: add win & unix file paths  
752 - #('Hex string', re.compile(r'(?:[0-9A-Fa-f]{2}){4,}')),  
753 -)  
754 -  
755 -# regex to detect strings encoded in hexadecimal  
756 -re_hex_string = re.compile(r'(?:[0-9A-Fa-f]{2}){4,}')  
757 -  
758 -# regex to detect strings encoded in base64  
759 -#re_base64_string = re.compile(r'"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"')  
760 -# better version from balbuzard, less false positives:  
761 -# (plain version without double quotes, used also below in quoted_base64_string)  
762 -BASE64_RE = r'(?:[A-Za-z0-9+/]{4}){1,}(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)?'  
763 -re_base64_string = re.compile('"' + BASE64_RE + '"')  
764 -# white list of common strings matching the base64 regex, but which are not base64 strings (all lowercase):  
765 -BASE64_WHITELIST = set(['thisdocument', 'thisworkbook', 'test', 'temp', 'http', 'open', 'exit'])  
766 -  
767 -# regex to detect strings encoded with a specific Dridex algorithm  
768 -# (see https://github.com/JamesHabben/MalwareStuff)  
769 -re_dridex_string = re.compile(r'"[0-9A-Za-z]{20,}"')  
770 -# regex to check that it is not just a hex string:  
771 -re_nothex_check = re.compile(r'[G-Zg-z]')  
772 -  
773 -# regex to extract printable strings (at least 5 chars) from VBA Forms:  
774 -re_printable_string = re.compile(b'[\\t\\r\\n\\x20-\\xFF]{5,}')  
775 -  
776 -  
777 -# === PARTIAL VBA GRAMMAR ====================================================  
778 -  
779 -# REFERENCES:  
780 -# - [MS-VBAL]: VBA Language Specification  
781 -# https://msdn.microsoft.com/en-us/library/dd361851.aspx  
782 -# - pyparsing: http://pyparsing.wikispaces.com/  
783 -  
784 -# TODO: set whitespaces according to VBA  
785 -# TODO: merge extended lines before parsing  
786 -  
787 -# Enable PackRat for better performance:  
788 -# (see https://pythonhosted.org/pyparsing/pyparsing.ParserElement-class.html#enablePackrat)  
789 -ParserElement.enablePackrat()  
790 -  
791 -# VBA identifier chars (from MS-VBAL 3.3.5)  
792 -vba_identifier_chars = alphanums + '_'  
793 -  
794 -class VbaExpressionString(str):  
795 - """  
796 - Class identical to str, used to distinguish plain strings from strings  
797 - obfuscated using VBA expressions (Chr, StrReverse, etc)  
798 - Usage: each VBA expression parse action should convert strings to  
799 - VbaExpressionString.  
800 - Then isinstance(s, VbaExpressionString) is True only for VBA expressions.  
801 - (see detect_vba_strings)  
802 - """  
803 - # TODO: use Unicode everywhere instead of str  
804 - pass  
805 -  
806 -  
807 -# --- NUMBER TOKENS ----------------------------------------------------------  
808 -  
809 -# 3.3.2 Number Tokens  
810 -# INTEGER = integer-literal ["%" / "&" / "^"]  
811 -# integer-literal = decimal-literal / octal-literal / hex-literal  
812 -# decimal-literal = 1*decimal-digit  
813 -# octal-literal = "&" [%x004F / %x006F] 1*octal-digit  
814 -# ; & or &o or &O  
815 -# hex-literal = "&" (%x0048 / %x0068) 1*hex-digit  
816 -# ; &h or &H  
817 -# octal-digit = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7"  
818 -# decimal-digit = octal-digit / "8" / "9"  
819 -# hex-digit = decimal-digit / %x0041-0046 / %x0061-0066 ;A-F / a-f  
820 -  
821 -# NOTE: here Combine() is required to avoid spaces between elements  
822 -# NOTE: here WordStart is necessary to avoid matching a number preceded by  
823 -# letters or underscore (e.g. "VBT1" or "ABC_34"), when using scanString  
824 -decimal_literal = Combine(Optional('-') + WordStart(vba_identifier_chars) + Word(nums)  
825 - + Suppress(Optional(Word('%&^', exact=1))))  
826 -decimal_literal.setParseAction(lambda t: int(t[0]))  
827 -  
828 -octal_literal = Combine(Suppress(Literal('&') + Optional((CaselessLiteral('o')))) + Word(srange('[0-7]'))  
829 - + Suppress(Optional(Word('%&^', exact=1))))  
830 -octal_literal.setParseAction(lambda t: int(t[0], base=8))  
831 -  
832 -hex_literal = Combine(Suppress(CaselessLiteral('&h')) + Word(srange('[0-9a-fA-F]'))  
833 - + Suppress(Optional(Word('%&^', exact=1))))  
834 -hex_literal.setParseAction(lambda t: int(t[0], base=16))  
835 -  
836 -integer = decimal_literal | octal_literal | hex_literal  
837 -  
838 -  
839 -# --- QUOTED STRINGS ---------------------------------------------------------  
840 -  
841 -# 3.3.4 String Tokens  
842 -# STRING = double-quote *string-character (double-quote / line-continuation / LINE-END)  
843 -# double-quote = %x0022 ; "  
844 -# string-character = NO-LINE-CONTINUATION ((double-quote double-quote) termination-character)  
845 -  
846 -quoted_string = QuotedString('"', escQuote='""')  
847 -quoted_string.setParseAction(lambda t: str(t[0]))  
848 -  
849 -  
850 -#--- VBA Expressions ---------------------------------------------------------  
851 -  
852 -# See MS-VBAL 5.6 Expressions  
853 -  
854 -# need to pre-declare using Forward() because it is recursive  
855 -# VBA string expression and integer expression  
856 -vba_expr_str = Forward()  
857 -vba_expr_int = Forward()  
858 -  
859 -# --- CHR --------------------------------------------------------------------  
860 -  
861 -# MS-VBAL 6.1.2.11.1.4 Chr / Chr$  
862 -# Function Chr(CharCode As Long) As Variant  
863 -# Function Chr$(CharCode As Long) As String  
864 -# Parameter Description  
865 -# CharCode Long whose value is a code point.  
866 -# Returns a String data value consisting of a single character containing the character whose code  
867 -# point is the data value of the argument.  
868 -# - If the argument is not in the range 0 to 255, Error Number 5 ("Invalid procedure call or  
869 -# argument") is raised unless the implementation supports a character set with a larger code point  
870 -# range.  
871 -# - If the argument value is in the range of 0 to 127, it is interpreted as a 7-bit ASCII code point.  
872 -# - If the argument value is in the range of 128 to 255, the code point interpretation of the value is  
873 -# implementation defined.  
874 -# - Chr$ has the same runtime semantics as Chr, however the declared type of its function result is  
875 -# String rather than Variant.  
876 -  
877 -# 6.1.2.11.1.5 ChrB / ChrB$  
878 -# Function ChrB(CharCode As Long) As Variant  
879 -# Function ChrB$(CharCode As Long) As String  
880 -# CharCode Long whose value is a code point.  
881 -# Returns a String data value consisting of a single byte character whose code point value is the  
882 -# data value of the argument.  
883 -# - If the argument is not in the range 0 to 255, Error Number 6 ("Overflow") is raised.  
884 -# - ChrB$ has the same runtime semantics as ChrB however the declared type of its function result  
885 -# is String rather than Variant.  
886 -# - Note: the ChrB function is used with byte data contained in a String. Instead of returning a  
887 -# character, which may be one or two bytes, ChrB always returns a single byte. The ChrW function  
888 -# returns a String containing the Unicode character except on platforms where Unicode is not  
889 -# supported, in which case, the behavior is identical to the Chr function.  
890 -  
891 -# 6.1.2.11.1.6 ChrW/ ChrW$  
892 -# Function ChrW(CharCode As Long) As Variant  
893 -# Function ChrW$(CharCode As Long) As String  
894 -# CharCode Long whose value is a code point.  
895 -# Returns a String data value consisting of a single character containing the character whose code  
896 -# point is the data value of the argument.  
897 -# - If the argument is not in the range -32,767 to 65,535 then Error Number 5 ("Invalid procedure  
898 -# call or argument") is raised.  
899 -# - If the argument is a negative value it is treated as if it was the value: CharCode + 65,536.  
900 -# - If the implementation uses 16-bit Unicode code points, the argument's data value is interpreted as a  
901 -# 16-bit Unicode code point.  
902 -# - If the implementation does not support Unicode, ChrW has the same semantics as Chr.  
903 -# - ChrW$ has the same runtime semantics as ChrW, however the declared type of its function result  
904 -# is String rather than Variant.  
905 -  
906 -# Chr, Chr$, ChrB, ChrW(int) => char  
907 -vba_chr = Suppress(  
908 - Combine(WordStart(vba_identifier_chars) + CaselessLiteral('Chr')  
909 - + Optional(CaselessLiteral('B') | CaselessLiteral('W')) + Optional('$'))  
910 - + '(') + vba_expr_int + Suppress(')')  
911 -  
912 -def vba_chr_tostr(t):  
913 - try:  
914 - i = t[0]  
915 - # normal, non-unicode character:  
916 - if i>=0 and i<=255:  
917 - return VbaExpressionString(chr(i))  
918 - else:  
919 - return VbaExpressionString(chr(i).encode('utf-8', 'backslashreplace'))  
920 - except ValueError:  
921 - log.exception('ERROR: incorrect parameter value for chr(): %r' % i)  
922 - return VbaExpressionString('Chr(%r)' % i)  
923 -  
924 -vba_chr.setParseAction(vba_chr_tostr)  
925 -  
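The range rules quoted above for Chr and ChrW can be written out explicitly. This is a hypothetical sketch for Python 3, where `chr()` covers the whole Unicode range. It does not reproduce the behavior of `vba_chr_tostr`. It raises `ValueError` where VBA would raise Error 5, and it assumes a Latin-1 interpretation for Chr codes 128 to 255, which MS-VBAL leaves implementation-defined.

```python
def vba_chr(char_code):
    """Sketch of Chr/Chr$ (MS-VBAL 6.1.2.11.1.4)."""
    if not 0 <= char_code <= 255:
        raise ValueError('Error 5: Invalid procedure call or argument')
    # 0-127 is 7-bit ASCII; 128-255 is implementation-defined (Latin-1 assumed here)
    return chr(char_code)


def vba_chrw(char_code):
    """Sketch of ChrW/ChrW$ (MS-VBAL 6.1.2.11.1.6)."""
    # range as quoted in the comment above
    if not -32767 <= char_code <= 65535:
        raise ValueError('Error 5: Invalid procedure call or argument')
    if char_code < 0:
        # negative values are treated as CharCode + 65,536
        char_code += 65536
    return chr(char_code)
```

For example, `vba_chrw(-1)` returns `'\uffff'` and `vba_chrw(8364)` returns the euro sign `'\u20ac'`.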
926 -  
927 -# --- ASC --------------------------------------------------------------------  
928 -  
929 -# Asc(char) => int  
930 -#TODO: see MS-VBAL 6.1.2.11.1.1 page 240 => AscB, AscW  
931 -vba_asc = Suppress(CaselessKeyword('Asc') + '(') + vba_expr_str + Suppress(')')  
932 -vba_asc.setParseAction(lambda t: ord(t[0][0])) # Asc returns the code of the first character  
933 -  
934 -  
935 -# --- VAL --------------------------------------------------------------------  
936 -  
937 -# Val(string) => int  
938 -# TODO: make sure the behavior of VBA's val is fully covered  
939 -vba_val = Suppress(CaselessKeyword('Val') + '(') + vba_expr_str + Suppress(')')  
940 -vba_val.setParseAction(lambda t: int(t[0].strip()))  
941 -  
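The parse action above only handles a string that is entirely an integer once its ends are stripped. VBA's Val is more lenient: it drops blanks, tabs and line feeds anywhere in the argument, and it stops at the first character that cannot be part of a number, returning 0 if there is none. The sketch below is partial and hypothetical. It covers decimal integers only, with no decimal point, exponent, or &H/&O prefixes, and it returns an int where VBA returns a Double.

```python
import re


def vba_val_int(text):
    """Approximate VBA Val() for decimal integers only (hypothetical helper)."""
    # Val ignores blanks, tabs and line feeds anywhere in the string
    compact = re.sub(r'[ \t\n]', '', text)
    # read the leading, optionally signed, run of digits; no digits -> 0
    match = re.match(r'[+-]?[0-9]+', compact)
    return int(match.group(0)) if match else 0
```

For example, `vba_val_int('    1615 198th Street N.E.')` gives 1615198, while the parse action above would fail on the same input.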
942 -  
943 -# --- StrReverse() --------------------------------------------------------------------  
944 -  
945 -# StrReverse(string) => string  
946 -strReverse = Suppress(CaselessKeyword('StrReverse') + '(') + vba_expr_str + Suppress(')')  
947 -strReverse.setParseAction(lambda t: VbaExpressionString(str(t[0])[::-1]))  
948 -  
949 -  
950 -# --- ENVIRON() --------------------------------------------------------------------  
951 -  
952 -# Environ("name") => just translated to "%name%", that is enough for malware analysis  
953 -environ = Suppress(CaselessKeyword('Environ') + '(') + vba_expr_str + Suppress(')')  
954 -environ.setParseAction(lambda t: VbaExpressionString('%%%s%%' % t[0]))  
955 -  
956 -  
957 -# --- IDENTIFIER -------------------------------------------------------------  
958 -  
959 -#TODO: see MS-VBAL 3.3.5 page 33  
960 -# 3.3.5 Identifier Tokens  
961 -# Latin-identifier = first-Latin-identifier-character *subsequent-Latin-identifier-character  
962 -# first-Latin-identifier-character = (%x0041-005A / %x0061-007A) ; A-Z / a-z  
963 -# subsequent-Latin-identifier-character = first-Latin-identifier-character / DIGIT / %x5F ; underscore  
964 -latin_identifier = Word(initChars=alphas, bodyChars=alphanums + '_')  
965 -  
966 -# --- HEX FUNCTION -----------------------------------------------------------  
967 -  
968 -# match any custom function name with a hex string as argument:  
969 -# TODO: accept vba_expr_str_item as argument, check if it is a hex or base64 string at runtime  
970 -  
971 -# quoted string made of at least two 2-digit hexadecimal numbers (i.e. at least 2 bytes):  
972 -quoted_hex_string = Suppress('"') + Combine(Word(hexnums, exact=2) * (2, None)) + Suppress('"')  
973 -quoted_hex_string.setParseAction(lambda t: str(t[0]))  
974 -  
975 -hex_function_call = Suppress(latin_identifier) + Suppress('(') + \  
976 - quoted_hex_string('hex_string') + Suppress(')')  
977 -hex_function_call.setParseAction(lambda t: VbaExpressionString(binascii.a2b_hex(t.hex_string)))  
978 -  
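To see what `hex_function_call` catches, here is an approximate regular-expression version. It is hypothetical and not used by olevba. It matches any identifier called with a single quoted argument made of at least two 2-digit hex numbers, and decodes that argument with `binascii.a2b_hex` as the parse action above does. It only allows whitespace around the parentheses, which may be stricter than the pyparsing grammar.

```python
import binascii
import re

# identifier ( "hexhex..." ) with at least two hex byte pairs inside the quotes
HEX_CALL_RE = re.compile(r'\b[A-Za-z][A-Za-z0-9_]*\s*\(\s*"((?:[0-9A-Fa-f]{2}){2,})"\s*\)')


def decode_hex_calls(vba_code):
    """Return the decoded bytes of every custom-function call with a hex string argument."""
    return [binascii.a2b_hex(m.group(1)) for m in HEX_CALL_RE.finditer(vba_code)]
```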
979 -  
980 -# --- BASE64 FUNCTION -----------------------------------------------------------  
981 -  
982 -# match any custom function name with a Base64 string as argument:  
983 -# TODO: accept vba_expr_str_item as argument, check if it is a hex or base64 string at runtime  
984 -  
985 -# quoted string matching the Base64 regular expression BASE64_RE:  
986 -quoted_base64_string = Suppress('"') + Regex(BASE64_RE) + Suppress('"')  
987 -quoted_base64_string.setParseAction(lambda t: str(t[0]))  
988 -  
989 -base64_function_call = Suppress(latin_identifier) + Suppress('(') + \  
990 - quoted_base64_string('base64_string') + Suppress(')')  
991 -base64_function_call.setParseAction(lambda t: VbaExpressionString(binascii.a2b_base64(t.base64_string)))  
992 -  
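The same idea applies to `base64_function_call`. `BASE64_RE` is defined elsewhere in olevba and is not shown here, so the pattern below is an assumed stand-in. It requires at least one full 4-character group plus optional `=` padding. Like the grammar, it can also match ordinary short words that happen to be valid Base64.

```python
import base64
import re

# Assumed Base64 pattern (stand-in for BASE64_RE): full 4-char groups + optional padding
_B64 = r'(?:[A-Za-z0-9+/]{4})+(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?'
B64_CALL_RE = re.compile(r'\b[A-Za-z][A-Za-z0-9_]*\s*\(\s*"(' + _B64 + r')"\s*\)')


def decode_base64_calls(vba_code):
    """Return the decoded bytes of every custom-function call with a Base64 argument."""
    return [base64.b64decode(m.group(1)) for m in B64_CALL_RE.finditer(vba_code)]
```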
993 -  
994 -# ---STRING EXPRESSION -------------------------------------------------------  
995 -  
996 -def concat_strings_list(tokens):  
997 - """  
998 - parse action to concatenate strings in a VBA expression with operators '+' or '&'  
999 - """  
1000 - # extract argument from the tokens:  
1001 - # expected to be a tuple containing a list of strings such as [a,'&',b,'&',c,...]  
1002 - strings = tokens[0][::2]  
1003 - return VbaExpressionString(''.join(strings))  
1004 -  
1005 -  
1006 -vba_expr_str_item = (vba_chr | strReverse | environ | quoted_string | hex_function_call | base64_function_call)  
1007 -  
1008 -vba_expr_str <<= infixNotation(vba_expr_str_item,  
1009 - [  
1010 - ("+", 2, opAssoc.LEFT, concat_strings_list),  
1011 - ("&", 2, opAssoc.LEFT, concat_strings_list),  
1012 - ])  
1013 -  
1014 -  
1015 -# --- INTEGER EXPRESSION -------------------------------------------------------  
1016 -  
1017 -def sum_ints_list(tokens):  
1018 - """  
1019 - parse action to sum integers in a VBA expression with operator '+'  
1020 - """  
1021 - # extract argument from the tokens:  
1022 - # expected to be a tuple containing a list of integers such as [a,'+',b,'+',c,...]  
1023 - integers = tokens[0][::2]  
1024 - return sum(integers)  
1025 -  
1026 -  
1027 -def subtract_ints_list(tokens):  
1028 - """  
1029 - parse action to subtract integers in a VBA expression with operator '-'  
1030 - """  
1031 - # extract argument from the tokens:  
1032 - # expected to be a tuple containing a list of integers such as [a,'-',b,'-',c,...]  
1033 - integers = tokens[0][::2]  
1034 - return reduce(lambda x,y:x-y, integers)  
1035 -  
1036 -  
1037 -def multiply_ints_list(tokens):  
1038 - """  
1039 - parse action to multiply integers in a VBA expression with operator '*'  
1040 - """  
1041 - # extract argument from the tokens:  
1042 - # expected to be a tuple containing a list of integers such as [a,'*',b,'*',c,...]  
1043 - integers = tokens[0][::2]  
1044 - return reduce(lambda x,y:x*y, integers)  
1045 -  
1046 -  
1047 -def divide_ints_list(tokens):  
1048 - """  
1049 - parse action to divide integers in a VBA expression with operator '/'  
1050 - """  
1051 - # extract argument from the tokens:  
1052 - # expected to be a tuple containing a list of integers such as [a,'/',b,'/',c,...]  
1053 - integers = tokens[0][::2]  
1054 - return reduce(lambda x,y:x/y, integers)  
1055 -  
1056 -  
1057 -vba_expr_int_item = (vba_asc | vba_val | integer)  
1058 -  
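1059 -# operators associativity: https://en.wikipedia.org/wiki/Operator_associativity  
1060 -# NOTE: VBA gives * and / the same precedence; as separate levels below, 8 / 2 * 2 becomes 8 / (2 * 2)  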
1061 -  
1062 -vba_expr_int <<= infixNotation(vba_expr_int_item,  
1063 - [  
1064 - ("*", 2, opAssoc.LEFT, multiply_ints_list),  
1065 - ("/", 2, opAssoc.LEFT, divide_ints_list),  
1066 - ("-", 2, opAssoc.LEFT, subtract_ints_list),  
1067 - ("+", 2, opAssoc.LEFT, sum_ints_list),  
1068 - ])  
1069 -  
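For a left-associative binary level, `infixNotation` hands the parse action a flat list such as `[a, op, b, op, c]` (this is `tokens[0]` in the functions above). Because `*` and `/` are registered as two separate levels, `8 / 2 * 2` is grouped as `8 / (2 * 2)`. VBA, by contrast, gives them the same precedence and evaluates them left to right, and the same holds for `+` and `-`. The sketch below shows one parse action that evaluates a mixed level left to right. It could be attached to a single combined level, for example one using `oneOf('* /')` as the operator; the helper names are hypothetical. In VBA, `/` is floating-point division (`\` is integer division), so `operator.truediv` is used.

```python
import operator

# VBA arithmetic operators mapped to Python functions; '/' is floating-point division
_VBA_OPS = {
    '*': operator.mul,
    '/': operator.truediv,
    '+': operator.add,
    '-': operator.sub,
}


def eval_left_to_right(items):
    """Evaluate [a, op, b, op, c, ...] left to right, all operators at one precedence level."""
    result = items[0]
    # pair each operator with the operand that follows it
    for op, value in zip(items[1::2], items[2::2]):
        result = _VBA_OPS[op](result, value)
    return result


def mul_div_ints_list(tokens):
    """Parse action sketch for a combined '*' / '/' level (hypothetical)."""
    return eval_left_to_right(tokens[0])
```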
1070 -  
1071 -# see detect_vba_strings for the deobfuscation code using this grammar  
1072 -  
1073 -# === MSO/ActiveMime files parsing ===========================================  
1074 -  
1075 -def is_mso_file(data):  
1076 - """  
1077 - Check if the provided data is the content of a MSO/ActiveMime file, such as  
1078 - the ones created by Outlook in some cases, or Word/Excel when saving a  
1079 - file with the MHTML format or the Word 2003 XML format.  
1080 - This function only checks the ActiveMime magic at the beginning of data.  
1081 - :param data: bytes string, MSO/ActiveMime file content  
1082 - :return: bool, True if the file is MSO, False otherwise  
1083 - """  
1084 - return data.startswith(MSO_ACTIVEMIME_HEADER)  
1085 -  
1086 -  
1087 -# regex to find zlib block headers, starting with byte 0x78 = 'x'  
1088 -re_zlib_header = re.compile(b'x') # bytes pattern, since data is a bytes string  
1089 -  
1090 -  
1091 -def mso_file_extract(data):  
1092 - """  
1093 - Extract the data stored into a MSO/ActiveMime file, such as  
1094 - the ones created by Outlook in some cases, or Word/Excel when saving a  
1095 - file with the MHTML format or the Word 2003 XML format.  
1096 -  
1097 - :param data: bytes string, MSO/ActiveMime file content  
1098 - :return: bytes string, extracted data (uncompressed)  
1099 -  
1100 - raise a MsoExtractionError if the data cannot be extracted  
1101 - """  
1102 - # check the magic:  
1103 - assert is_mso_file(data)  
1104 -  
1105 - # In all the samples seen so far, Word always uses an offset of 0x32,  
1106 - # and Excel 0x22A. But we read the offset from the header to be more  
1107 - # generic.  
1108 - offsets = [0x32, 0x22A]  
1109 -  
1110 - # First, attempt to get the compressed data offset from the header  
1111 - # According to my tests, it should be an unsigned 16-bit little-endian integer  
1112 - # at offset 0x1E, to which 46 must be added:  
1113 - try:  
1114 - offset = struct.unpack_from('<H', data, offset=0x1E)[0] + 46  
1115 - log.debug('Parsing MSO file: data offset = 0x%X' % offset)  
1116 - offsets.insert(0, offset) # insert at beginning of offsets  
1117 - except struct.error as exc:  
1118 - log.info('Unable to parse MSO/ActiveMime file header (%s)' % exc)  
1119 - log.debug('Trace:', exc_info=True)  
1120 - raise MsoExtractionError('Unable to parse MSO/ActiveMime file header')  
1121 - # now try offsets  
1122 - for start in offsets:  
1123 - try:  
1124 - log.debug('Attempting zlib decompression from MSO file offset 0x%X' % start)  
1125 - extracted_data = zlib.decompress(data[start:])  
1126 - return extracted_data  
1127 - except zlib.error as exc:  
1128 - log.info('zlib decompression failed for offset %s (%s)'  
1129 - % (start, exc))  
1130 - log.debug('Trace:', exc_info=True)  
1131 - # None of the guessed offsets worked, let's try brute-forcing by looking  
1132 - # for potential zlib-compressed blocks starting with 0x78:  
1133 - log.debug('Looking for potential zlib-compressed blocks in MSO file')  
1134 - for match in re_zlib_header.finditer(data):  
1135 - start = match.start()  
1136 - try:  
1137 - log.debug('Attempting zlib decompression from MSO file offset 0x%X' % start)  
1138 - extracted_data = zlib.decompress(data[start:])  
1139 - return extracted_data  
1140 - except zlib.error as exc:  
1141 - log.info('zlib decompression failed (%s)' % exc)  
1142 - log.debug('Trace:', exc_info=True)  
1143 - raise MsoExtractionError('Unable to decompress data from a MSO/ActiveMime file')  
1144 -  
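The offset logic of `mso_file_extract` can be exercised on a synthetic sample. This condensed sketch is hypothetical: `extract_mso_payload` is not an olevba function, and `MSO_HEADER = b'ActiveMime'` is assumed to match `MSO_ACTIVEMIME_HEADER`. It tries the header-derived offset first, then the Word and Excel offsets, then every `0x78` byte, and returns the first successful zlib decompression.

```python
import re
import struct
import zlib

MSO_HEADER = b'ActiveMime'  # assumed value of MSO_ACTIVEMIME_HEADER


def extract_mso_payload(data):
    """Condensed, hypothetical version of mso_file_extract() for illustration."""
    if not data.startswith(MSO_HEADER):
        raise ValueError('not an MSO/ActiveMime file')
    # unsigned 16-bit little-endian value at 0x1E, plus 46, then the usual Word/Excel offsets
    candidates = [struct.unpack_from('<H', data, 0x1E)[0] + 46, 0x32, 0x22A]
    # fallback: any 0x78 ('x') byte may start a zlib stream
    candidates += [m.start() for m in re.finditer(b'x', data)]
    for start in candidates:
        try:
            return zlib.decompress(data[start:])
        except zlib.error:
            continue
    raise ValueError('no zlib stream found')
```

A quick check is to build a 0x32-byte header whose field at 0x1E holds `0x32 - 46`, append `zlib.compress(payload)`, and confirm that the payload comes back.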
1145 -  
1146 -#--- FUNCTIONS ----------------------------------------------------------------  
1147 -  
1148 -# set of printable characters, for is_printable  
1149 -_PRINTABLE_SET = set(string.printable)  
1150 -  
1151 -def is_printable(s):  
1152 - """  
1153 - returns True if string s only contains printable ASCII characters  
1154 - (i.e. contained in string.printable)  
1155 - Similar to Python 3's str.isprintable, but ASCII-only and also accepting whitespace such as tab and newline; works on Python 2.x.  
1156 - :param s: str  
1157 - :return: bool  
1158 - """  
1159 - # inspired from http://stackoverflow.com/questions/3636928/test-if-a-python-string-is-printable  
1160 - # check if the set of chars from s is contained into the set of printable chars:  
1161 - return set(s).issubset(_PRINTABLE_SET)  
1162 -  
1163 -  
1164 -def copytoken_help(decompressed_current, decompressed_chunk_start):  
1165 - """  
1166 - compute bit masks to decode a CopyToken according to MS-OVBA 2.4.1.3.19.1 CopyToken Help  
1167 -  
1168 - decompressed_current: number of decompressed bytes so far, i.e. len(decompressed_container)  
1169 - decompressed_chunk_start: offset of the current chunk in the decompressed container  
1170 - return length_mask, offset_mask, bit_count, maximum_length  
1171 - """  
1172 - difference = decompressed_current - decompressed_chunk_start  
1173 - bit_count = int(math.ceil(math.log(difference, 2)))  
1174 - bit_count = max([bit_count, 4])  
1175 - length_mask = 0xFFFF >> bit_count  
1176 - offset_mask = ~length_mask  
1177 - maximum_length = (0xFFFF >> bit_count) + 3  
1178 - return length_mask, offset_mask, bit_count, maximum_length  
1179 -  
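`copytoken_help` derives `bit_count` from a floating-point logarithm. The sketch below is an integer-only variant of the same MS-OVBA 2.4.1.3.19.1 computation (the names are hypothetical). `(difference - 1).bit_length()` is the smallest bit count whose power of two is at least `difference`, i.e. ceil(log2(difference)) for difference >= 1. The offset mask is kept within 16 bits instead of being a negative Python integer. The second function follows the CopyToken unpacking used in `decompress_stream`.

```python
def copytoken_help_int(decompressed_current, decompressed_chunk_start):
    """Integer-only variant of copytoken_help (MS-OVBA 2.4.1.3.19.1), hypothetical."""
    difference = decompressed_current - decompressed_chunk_start
    # smallest bit_count with 2 ** bit_count >= difference, but never less than 4
    bit_count = max((difference - 1).bit_length(), 4)
    length_mask = 0xFFFF >> bit_count
    offset_mask = ~length_mask & 0xFFFF  # 16-bit complement of length_mask
    maximum_length = length_mask + 3
    return length_mask, offset_mask, bit_count, maximum_length


def unpack_copytoken(copy_token, decompressed_current, decompressed_chunk_start):
    """Return (offset, length) for a CopyToken (MS-OVBA 2.4.1.3.19.2)."""
    length_mask, offset_mask, bit_count, _ = copytoken_help_int(
        decompressed_current, decompressed_chunk_start)
    length = (copy_token & length_mask) + 3
    offset = ((copy_token & offset_mask) >> (16 - bit_count)) + 1
    return offset, length
```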
1180 -  
1181 -def decompress_stream(compressed_container):  
1182 - """  
1183 - Decompress a stream according to MS-OVBA section 2.4.1  
1184 -  
1185 - compressed_container: string compressed according to the MS-OVBA 2.4.1.3.6 Compression algorithm  
1186 - return the decompressed container as a string (bytes)  
1187 - """  
1188 - # 2.4.1.2 State Variables  
1189 -  
1190 - # The following state is maintained for the CompressedContainer (section 2.4.1.1.1):  
1191 - # CompressedRecordEnd: The location of the byte after the last byte in the CompressedContainer (section 2.4.1.1.1).  
1192 - # CompressedCurrent: The location of the next byte in the CompressedContainer (section 2.4.1.1.1) to be read by  
1193 - # decompression or to be written by compression.  
1194 -  
1195 - # The following state is maintained for the current CompressedChunk (section 2.4.1.1.4):  
1196 - # CompressedChunkStart: The location of the first byte of the CompressedChunk (section 2.4.1.1.4) within the  
1197 - # CompressedContainer (section 2.4.1.1.1).  
1198 -  
1199 - # The following state is maintained for a DecompressedBuffer (section 2.4.1.1.2):  
1200 - # DecompressedCurrent: The location of the next byte in the DecompressedBuffer (section 2.4.1.1.2) to be written by  
1201 - # decompression or to be read by compression.  
1202 - # DecompressedBufferEnd: The location of the byte after the last byte in the DecompressedBuffer (section 2.4.1.1.2).  
1203 -  
1204 - # The following state is maintained for the current DecompressedChunk (section 2.4.1.1.3):  
1205 - # DecompressedChunkStart: The location of the first byte of the DecompressedChunk (section 2.4.1.1.3) within the  
1206 - # DecompressedBuffer (section 2.4.1.1.2).  
1207 -  
1208 - decompressed_container = bytearray() # result  
1209 - compressed_current = 0  
1210 -  
1211 - sig_byte = compressed_container[compressed_current]  
1212 - if sig_byte != 0x01:  
1213 - raise ValueError('invalid signature byte {0:02X}'.format(sig_byte))  
1214 -  
1215 - compressed_current += 1  
1216 -  
1217 - #NOTE: the definition of CompressedRecordEnd is ambiguous. Here we assume that  
1218 - # CompressedRecordEnd = len(compressed_container)  
1219 - while compressed_current < len(compressed_container):  
1220 - # 2.4.1.1.5  
1221 - compressed_chunk_start = compressed_current  
1222 - # chunk header = first 16 bits  
1223 - compressed_chunk_header = \  
1224 - struct.unpack("<H", compressed_container[compressed_chunk_start:compressed_chunk_start + 2])[0]  
1225 - # chunk size = low 12 bits of header + 3  
1226 - chunk_size = (compressed_chunk_header & 0x0FFF) + 3  
1227 - # chunk signature = 3 next bits - should always be 0b011  
1228 - chunk_signature = (compressed_chunk_header >> 12) & 0x07  
1229 - if chunk_signature != 0b011:  
1230 - raise ValueError('Invalid CompressedChunkSignature in VBA compressed stream')  
1231 - # chunk flag = next bit - 1 == compressed, 0 == uncompressed  
1232 - chunk_flag = (compressed_chunk_header >> 15) & 0x01  
1233 - log.debug("chunk size = {0}, compressed flag = {1}".format(chunk_size, chunk_flag))  
1234 -  
1235 - #MS-OVBA 2.4.1.3.12: the maximum size of a chunk including its header is 4098 bytes (header 2 + data 4096)  
1236 - # The minimum size is 3 bytes  
1237 - # NOTE: there seems to be a typo in MS-OVBA: the check should be against 4098, not 4095 (which is the max value  
1238 - # in the chunk header before adding 3).  
1239 - # Also, the first test is not useful, since a 12-bit value + 3 cannot be larger than 4098.  
1240 - if chunk_flag == 1 and chunk_size > 4098:  
1241 - raise ValueError('CompressedChunkSize > 4098 but CompressedChunkFlag == 1')  
1242 - if chunk_flag == 0 and chunk_size != 4098:  
1243 - raise ValueError('CompressedChunkSize != 4098 but CompressedChunkFlag == 0')  
1244 -  
1245 - # check if chunk_size goes beyond the compressed data, instead of silently cutting it:  
1246 - #TODO: raise an exception?  
1247 - if compressed_chunk_start + chunk_size > len(compressed_container):  
1248 - log.warning('Chunk size is larger than remaining compressed data')  
1249 - compressed_end = min([len(compressed_container), compressed_chunk_start + chunk_size])  
1250 - # read after chunk header:  
1251 - compressed_current = compressed_chunk_start + 2  
1252 -  
1253 - if chunk_flag == 0:  
1254 - # MS-OVBA 2.4.1.3.3 Decompressing a RawChunk  
1255 - # uncompressed chunk: read the next 4096 bytes as-is  
1256 - #TODO: check if there are at least 4096 bytes left  
1257 - decompressed_container.extend(compressed_container[compressed_current:compressed_current + 4096])  
1258 - compressed_current += 4096  
1259 - else:  
1260 - # MS-OVBA 2.4.1.3.2 Decompressing a CompressedChunk  
1261 - # compressed chunk  
1262 - decompressed_chunk_start = len(decompressed_container)  
1263 - while compressed_current < compressed_end:  
1264 - # MS-OVBA 2.4.1.3.4 Decompressing a TokenSequence  
1265 - # log.debug('compressed_current = %d / compressed_end = %d' % (compressed_current, compressed_end))  
1266 - # FlagByte: 8 bits indicating if the following 8 tokens are either literal (1 byte of plain text) or  
1267 - # copy tokens (reference to a previous literal token)  
1268 - flag_byte = compressed_container[compressed_current]  
1269 - compressed_current += 1  
1270 - for bit_index in range(0, 8):  
1271 - # log.debug('bit_index=%d / compressed_current=%d / compressed_end=%d' % (bit_index, compressed_current, compressed_end))  
1272 - if compressed_current >= compressed_end:  
1273 - break  
1274 - # MS-OVBA 2.4.1.3.5 Decompressing a Token  
1275 - # MS-OVBA 2.4.1.3.17 Extract FlagBit  
1276 - flag_bit = (flag_byte >> bit_index) & 1  
1277 - #log.debug('bit_index=%d: flag_bit=%d' % (bit_index, flag_bit))  
1278 - if flag_bit == 0: # LiteralToken  
1279 - # copy one byte directly to output  
1280 - decompressed_container.extend([compressed_container[compressed_current]])  
1281 - compressed_current += 1  
1282 - else: # CopyToken  
1283 - # MS-OVBA 2.4.1.3.19.2 Unpack CopyToken  
1284 - copy_token = \  
1285 - struct.unpack("<H", compressed_container[compressed_current:compressed_current + 2])[0]  
1286 - #TODO: check this  
1287 - length_mask, offset_mask, bit_count, _ = copytoken_help(  
1288 - len(decompressed_container), decompressed_chunk_start)  
1289 - length = (copy_token & length_mask) + 3  
1290 - temp1 = copy_token & offset_mask  
1291 - temp2 = 16 - bit_count  
1292 - offset = (temp1 >> temp2) + 1  
1293 - #log.debug('offset=%d length=%d' % (offset, length))  
1294 - copy_source = len(decompressed_container) - offset  
1295 - for index in range(copy_source, copy_source + length):  
1296 - decompressed_container.extend([decompressed_container[index]])  
1297 - compressed_current += 2  
1298 - return bytes(decompressed_container)  
1299 -  
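Below is a compact, self-contained sketch of the same MS-OVBA 2.4.1 algorithm, handy for checking the decompressor against hand-built containers. It is a hypothetical function, not part of olevba. It skips the chunk-size sanity checks above and computes the bit count with integers, following MS-OVBA 2.4.1.3.19.1.

```python
import struct


def ovba_decompress(container):
    """Sketch of MS-OVBA 2.4.1 decompression for a bytes CompressedContainer."""
    if not container or container[0] != 0x01:
        raise ValueError('invalid signature byte')
    out = bytearray()
    pos = 1
    while pos < len(container):
        chunk_start = pos
        header = struct.unpack('<H', container[pos:pos + 2])[0]
        chunk_size = (header & 0x0FFF) + 3           # low 12 bits + 3
        if ((header >> 12) & 0x07) != 0b011:         # signature bits
            raise ValueError('invalid CompressedChunkSignature')
        is_compressed = (header >> 15) & 0x01        # flag bit
        chunk_end = min(len(container), chunk_start + chunk_size)
        pos = chunk_start + 2
        if not is_compressed:
            # RawChunk: 4096 bytes copied as-is
            out.extend(container[pos:pos + 4096])
            pos += 4096
            continue
        out_chunk_start = len(out)
        while pos < chunk_end:
            flag_byte = container[pos]
            pos += 1
            for bit_index in range(8):
                if pos >= chunk_end:
                    break
                if not (flag_byte >> bit_index) & 1:
                    out.append(container[pos])       # LiteralToken
                    pos += 1
                    continue
                # CopyToken: offset in the high bit_count bits, length in the rest
                token = struct.unpack('<H', container[pos:pos + 2])[0]
                difference = len(out) - out_chunk_start
                bit_count = max((difference - 1).bit_length(), 4)
                length = (token & (0xFFFF >> bit_count)) + 3
                offset = (token >> (16 - bit_count)) + 1
                source = len(out) - offset
                for index in range(source, source + length):
                    out.append(out[index])           # byte by byte: copies may overlap
                pos += 2
    return bytes(out)
```

For instance, the 7-byte container `01 03 B0 02 61 00 00` holds one chunk containing a literal `a` followed by a CopyToken with offset 1 and length 3, and it decompresses to `b'aaaa'`. The byte-by-byte copy is what allows a copy to overlap the bytes it is producing.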
1300 -  
1301 -def _extract_vba(ole, vba_root, project_path, dir_path, relaxed=False):  
1302 - """  
1303 - Extract VBA macros from an OleFileIO object.  
1304 - Internal function, do not call directly.  
1305 -  
1306 - vba_root: path to the VBA root storage, containing the VBA storage and the PROJECT stream  
1307 - project_path: path to the PROJECT stream; dir_path: path to the compressed dir stream  
1308 - :param relaxed: If True, only create info/debug log entry if data is not as expected  
1309 - (e.g. opening substream fails); if False, raise an error in this case  
1310 - This is a generator, yielding (stream path, VBA filename, VBA source code) for each VBA code stream  
1311 - """  
1312 - # Open the PROJECT stream:  
1313 - project = ole.openstream(project_path)  
1314 - log.debug('relaxed is %s' % relaxed)  
1315 -  
1316 - # sample content of the PROJECT stream:  
1317 -  
1318 - ## ID="{5312AC8A-349D-4950-BDD0-49BE3C4DD0F0}"  
1319 - ## Document=ThisDocument/&H00000000  
1320 - ## Module=NewMacros  
1321 - ## Name="Project"  
1322 - ## HelpContextID="0"  
1323 - ## VersionCompatible32="393222000"  
1324 - ## CMG="F1F301E705E705E705E705"  
1325 - ## DPB="8F8D7FE3831F2020202020"  
1326 - ## GC="2D2FDD81E51EE61EE6E1"  
1327 - ##  
1328 - ## [Host Extender Info]  
1329 - ## &H00000001={3832D640-CF90-11CF-8E43-00A0C911005A};VBE;&H00000000  
1330 - ## &H00000002={000209F2-0000-0000-C000-000000000046};Word8.0;&H00000000  
1331 - ##  
1332 - ## [Workspace]  
1333 - ## ThisDocument=22, 29, 339, 477, Z  
1334 - ## NewMacros=-4, 42, 832, 510, C  
1335 -  
1336 - code_modules = {}  
1337 -  
1338 - for line in project:  
1339 - line = line.strip().decode('utf-8','ignore')  
1340 - if '=' in line:  
1341 - # split line at the 1st equal sign:  
1342 - name, value = line.split('=', 1)  
1343 - # looking for code modules  
1344 - # add the code module as a key in the dictionary  
1345 - # the value will be the extension needed later  
1346 - # The value is converted to lowercase, to allow case-insensitive matching (issue #3)  
1347 - value = value.lower()  
1348 - if name == 'Document':  
1349 - # split value at the 1st slash, keep 1st part:  
1350 - value = value.split('/', 1)[0]  
1351 - code_modules[value] = CLASS_EXTENSION  
1352 - elif name == 'Module':  
1353 - code_modules[value] = MODULE_EXTENSION  
1354 - elif name == 'Class':  
1355 - code_modules[value] = CLASS_EXTENSION  
1356 - elif name == 'BaseClass':  
1357 - code_modules[value] = FORM_EXTENSION  
1358 -  
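Here is the PROJECT-stream loop above wrapped as a standalone function, so that it can be checked against the sample content quoted earlier. The function name is an assumption, and the extension values `'cls'`, `'bas'` and `'frm'` are assumptions standing in for `CLASS_EXTENSION`, `MODULE_EXTENSION` and `FORM_EXTENSION`.

```python
CLASS_EXT, MODULE_EXT, FORM_EXT = 'cls', 'bas', 'frm'  # assumed extension values

# PROJECT stream keys that declare code modules, and the extension used for each
_MODULE_KINDS = {'Document': CLASS_EXT, 'Module': MODULE_EXT,
                 'Class': CLASS_EXT, 'BaseClass': FORM_EXT}


def parse_project_modules(project_lines):
    """Map lowercase module names to file extensions from PROJECT stream lines (bytes)."""
    code_modules = {}
    for raw_line in project_lines:
        line = raw_line.strip().decode('utf-8', 'ignore')
        if '=' not in line:
            continue
        name, value = line.split('=', 1)
        value = value.lower()  # case-insensitive matching (issue #3)
        if name == 'Document':
            # "ThisDocument/&H00000000": keep the part before the first slash
            value = value.split('/', 1)[0]
        extension = _MODULE_KINDS.get(name)
        if extension is not None:
            code_modules[value] = extension
    return code_modules
```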
1359 - # read data from dir stream (compressed)  
1360 - dir_compressed = ole.openstream(dir_path).read()  
1361 -  
1362 - def check_value(name, expected, value):  
1363 - if expected != value:  
1364 - if relaxed:  
1365 - log.error("invalid value for {0} expected {1:04X} got {2:04X}"  
1366 - .format(name, expected, value))  
1367 - else:  
1368 - raise UnexpectedDataError(dir_path, name, expected, value)  
1369 -  
1370 - dir_stream = BytesIO(decompress_stream(dir_compressed))  
1371 -  
1372 - # PROJECTSYSKIND Record  
1373 - projectsyskind_id = struct.unpack("<H", dir_stream.read(2))[0]  
1374 - check_value('PROJECTSYSKIND_Id', 0x0001, projectsyskind_id)  
1375 - projectsyskind_size = struct.unpack("<L", dir_stream.read(4))[0]  
1376 - check_value('PROJECTSYSKIND_Size', 0x0004, projectsyskind_size)  
1377 - projectsyskind_syskind = struct.unpack("<L", dir_stream.read(4))[0]  
1378 - if projectsyskind_syskind == 0x00:  
1379 - log.debug("16-bit Windows")  
1380 - elif projectsyskind_syskind == 0x01:  
1381 - log.debug("32-bit Windows")  
1382 - elif projectsyskind_syskind == 0x02:  
1383 - log.debug("Macintosh")  
1384 - elif projectsyskind_syskind == 0x03:  
1385 - log.debug("64-bit Windows")  
1386 - else:  
1387 - log.error("invalid PROJECTSYSKIND_SysKind {0:04X}".format(projectsyskind_syskind))  
1388 -  
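PROJECTSYSKIND, PROJECTLCID, PROJECTLCIDINVOKE, PROJECTHELPCONTEXT and PROJECTLIBFLAGS share one little-endian layout: a 2-byte Id, a 4-byte Size equal to 4, then a 4-byte value. The sketch below is a hypothetical reader for that layout. It raises instead of logging, and it includes the SysKind names from the branch above.

```python
import io
import struct

SYSKIND_NAMES = {0x00: '16-bit Windows', 0x01: '32-bit Windows',
                 0x02: 'Macintosh', 0x03: '64-bit Windows'}


def read_u32_record(stream, expected_id):
    """Read a record made of Id (u16), Size (u32, must be 4) and a u32 value."""
    record_id, size = struct.unpack('<HL', stream.read(6))
    if record_id != expected_id or size != 4:
        raise ValueError('unexpected record 0x%04X with size %d' % (record_id, size))
    return struct.unpack('<L', stream.read(4))[0]
```

For example, reading `io.BytesIO(struct.pack('<HLL', 0x0001, 4, 0x01))` with `expected_id=0x0001` yields SysKind 1, i.e. `'32-bit Windows'`.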
1389 - # PROJECTLCID Record  
1390 - projectlcid_id = struct.unpack("<H", dir_stream.read(2))[0]  
1391 - check_value('PROJECTLCID_Id', 0x0002, projectlcid_id)  
1392 - projectlcid_size = struct.unpack("<L", dir_stream.read(4))[0]  
1393 - check_value('PROJECTLCID_Size', 0x0004, projectlcid_size)  
1394 - projectlcid_lcid = struct.unpack("<L", dir_stream.read(4))[0]  
1395 - check_value('PROJECTLCID_Lcid', 0x409, projectlcid_lcid)  
1396 -  
1397 - # PROJECTLCIDINVOKE Record  
1398 - projectlcidinvoke_id = struct.unpack("<H", dir_stream.read(2))[0]  
1399 - check_value('PROJECTLCIDINVOKE_Id', 0x0014, projectlcidinvoke_id)  
1400 - projectlcidinvoke_size = struct.unpack("<L", dir_stream.read(4))[0]  
1401 - check_value('PROJECTLCIDINVOKE_Size', 0x0004, projectlcidinvoke_size)  
1402 - projectlcidinvoke_lcidinvoke = struct.unpack("<L", dir_stream.read(4))[0]  
1403 - check_value('PROJECTLCIDINVOKE_LcidInvoke', 0x409, projectlcidinvoke_lcidinvoke)  
1404 -  
1405 - # PROJECTCODEPAGE Record  
1406 - projectcodepage_id = struct.unpack("<H", dir_stream.read(2))[0]  
1407 - check_value('PROJECTCODEPAGE_Id', 0x0003, projectcodepage_id)  
1408 - projectcodepage_size = struct.unpack("<L", dir_stream.read(4))[0]  
1409 - check_value('PROJECTCODEPAGE_Size', 0x0002, projectcodepage_size)  
1410 - projectcodepage_codepage = struct.unpack("<H", dir_stream.read(2))[0]  
1411 -  
1412 - # PROJECTNAME Record  
1413 - projectname_id = struct.unpack("<H", dir_stream.read(2))[0]  
1414 - check_value('PROJECTNAME_Id', 0x0004, projectname_id)  
1415 - projectname_sizeof_projectname = struct.unpack("<L", dir_stream.read(4))[0]  
1416 - if projectname_sizeof_projectname < 1 or projectname_sizeof_projectname > 128:  
1417 - log.error("PROJECTNAME_SizeOfProjectName value not in range: {0}".format(projectname_sizeof_projectname))  
1418 - projectname_projectname = dir_stream.read(projectname_sizeof_projectname)  
1419 - unused = projectname_projectname  
1420 -  
1421 - # PROJECTDOCSTRING Record  
1422 - projectdocstring_id = struct.unpack("<H", dir_stream.read(2))[0]  
1423 - check_value('PROJECTDOCSTRING_Id', 0x0005, projectdocstring_id)  
1424 - projectdocstring_sizeof_docstring = struct.unpack("<L", dir_stream.read(4))[0]  
1425 - if projectdocstring_sizeof_docstring > 2000:  
1426 - log.error(  
1427 - "PROJECTDOCSTRING_SizeOfDocString value not in range: {0}".format(projectdocstring_sizeof_docstring))  
1428 - projectdocstring_docstring = dir_stream.read(projectdocstring_sizeof_docstring)  
1429 - projectdocstring_reserved = struct.unpack("<H", dir_stream.read(2))[0]  
1430 - check_value('PROJECTDOCSTRING_Reserved', 0x0040, projectdocstring_reserved)  
1431 - projectdocstring_sizeof_docstring_unicode = struct.unpack("<L", dir_stream.read(4))[0]  
1432 - if projectdocstring_sizeof_docstring_unicode % 2 != 0:  
1433 - log.error("PROJECTDOCSTRING_SizeOfDocStringUnicode is not even")  
1434 - projectdocstring_docstring_unicode = dir_stream.read(projectdocstring_sizeof_docstring_unicode)  
1435 - unused = projectdocstring_docstring  
1436 - unused = projectdocstring_docstring_unicode  
1437 -  
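PROJECTDOCSTRING stores a size-prefixed copy of the text, a 2-byte Reserved marker (0x0040), then a size-prefixed Unicode copy. PROJECTCONSTANTS uses the same layout with Id 0x000C and Reserved 0x003C. The sketch below is a hypothetical reader for the PROJECTDOCSTRING case. It raises instead of logging, leaves the first copy as bytes because that copy uses the project code page, and decodes the second copy as UTF-16LE, since MS-OVBA states that it contains UTF-16 characters.

```python
import io
import struct


def read_docstring_record(stream):
    """Read a PROJECTDOCSTRING record: Id 0x0005, ANSI copy, Reserved 0x0040, UTF-16LE copy."""
    record_id, size = struct.unpack('<HL', stream.read(6))
    if record_id != 0x0005:
        raise ValueError('expected PROJECTDOCSTRING, got Id 0x%04X' % record_id)
    # the first copy is in the project code page, so it is left as bytes here
    docstring_ansi = stream.read(size)
    reserved, size_unicode = struct.unpack('<HL', stream.read(6))
    if reserved != 0x0040 or size_unicode % 2 != 0:
        raise ValueError('malformed PROJECTDOCSTRING record')
    docstring_unicode = stream.read(size_unicode).decode('utf-16-le')
    return docstring_ansi, docstring_unicode
```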
1438 - # PROJECTHELPFILEPATH Record - MS-OVBA 2.3.4.2.1.7  
1439 - projecthelpfilepath_id = struct.unpack("<H", dir_stream.read(2))[0]  
1440 - check_value('PROJECTHELPFILEPATH_Id', 0x0006, projecthelpfilepath_id)  
1441 - projecthelpfilepath_sizeof_helpfile1 = struct.unpack("<L", dir_stream.read(4))[0]  
1442 - if projecthelpfilepath_sizeof_helpfile1 > 260:  
1443 - log.error(  
1444 - "PROJECTHELPFILEPATH_SizeOfHelpFile1 value not in range: {0}".format(projecthelpfilepath_sizeof_helpfile1))  
1445 - projecthelpfilepath_helpfile1 = dir_stream.read(projecthelpfilepath_sizeof_helpfile1)  
1446 - projecthelpfilepath_reserved = struct.unpack("<H", dir_stream.read(2))[0]  
1447 - check_value('PROJECTHELPFILEPATH_Reserved', 0x003D, projecthelpfilepath_reserved)  
1448 - projecthelpfilepath_sizeof_helpfile2 = struct.unpack("<L", dir_stream.read(4))[0]  
1449 - if projecthelpfilepath_sizeof_helpfile2 != projecthelpfilepath_sizeof_helpfile1:  
1450 - log.error("PROJECTHELPFILEPATH_SizeOfHelpFile1 does not equal PROJECTHELPFILEPATH_SizeOfHelpFile2")  
1451 - projecthelpfilepath_helpfile2 = dir_stream.read(projecthelpfilepath_sizeof_helpfile2)  
1452 - if projecthelpfilepath_helpfile2 != projecthelpfilepath_helpfile1:  
1453 - log.error("PROJECTHELPFILEPATH_HelpFile1 does not equal PROJECTHELPFILEPATH_HelpFile2")  
1454 -  
1455 - # PROJECTHELPCONTEXT Record  
1456 - projecthelpcontext_id = struct.unpack("<H", dir_stream.read(2))[0]  
1457 - check_value('PROJECTHELPCONTEXT_Id', 0x0007, projecthelpcontext_id)  
1458 - projecthelpcontext_size = struct.unpack("<L", dir_stream.read(4))[0]  
1459 - check_value('PROJECTHELPCONTEXT_Size', 0x0004, projecthelpcontext_size)  
1460 - projecthelpcontext_helpcontext = struct.unpack("<L", dir_stream.read(4))[0]  
1461 - unused = projecthelpcontext_helpcontext  
1462 -  
1463 - # PROJECTLIBFLAGS Record  
1464 - projectlibflags_id = struct.unpack("<H", dir_stream.read(2))[0]  
1465 - check_value('PROJECTLIBFLAGS_Id', 0x0008, projectlibflags_id)  
1466 - projectlibflags_size = struct.unpack("<L", dir_stream.read(4))[0]  
1467 - check_value('PROJECTLIBFLAGS_Size', 0x0004, projectlibflags_size)  
1468 - projectlibflags_projectlibflags = struct.unpack("<L", dir_stream.read(4))[0]  
1469 - check_value('PROJECTLIBFLAGS_ProjectLibFlags', 0x0000, projectlibflags_projectlibflags)  
1470 -  
1471 - # PROJECTVERSION Record  
1472 - projectversion_id = struct.unpack("<H", dir_stream.read(2))[0]  
1473 - check_value('PROJECTVERSION_Id', 0x0009, projectversion_id)  
1474 - projectversion_reserved = struct.unpack("<L", dir_stream.read(4))[0]  
1475 - check_value('PROJECTVERSION_Reserved', 0x0004, projectversion_reserved)  
1476 - projectversion_versionmajor = struct.unpack("<L", dir_stream.read(4))[0]  
1477 - projectversion_versionminor = struct.unpack("<H", dir_stream.read(2))[0]  
1478 - unused = projectversion_versionmajor  
1479 - unused = projectversion_versionminor  
1480 -  
1481 - # PROJECTCONSTANTS Record  
1482 - projectconstants_id = struct.unpack("<H", dir_stream.read(2))[0]  
1483 - check_value('PROJECTCONSTANTS_Id', 0x000C, projectconstants_id)  
1484 - projectconstants_sizeof_constants = struct.unpack("<L", dir_stream.read(4))[0]  
1485 - if projectconstants_sizeof_constants > 1015:  
1486 - log.error(  
1487 - "PROJECTCONSTANTS_SizeOfConstants value not in range: {0}".format(projectconstants_sizeof_constants))  
1488 - projectconstants_constants = dir_stream.read(projectconstants_sizeof_constants)  
1489 - projectconstants_reserved = struct.unpack("<H", dir_stream.read(2))[0]  
1490 - check_value('PROJECTCONSTANTS_Reserved', 0x003C, projectconstants_reserved)  
1491 - projectconstants_sizeof_constants_unicode = struct.unpack("<L", dir_stream.read(4))[0]  
1492 - if projectconstants_sizeof_constants_unicode % 2 != 0:  
1493 - log.error("PROJECTCONSTANTS_SizeOfConstantsUnicode is not even")  
1494 - projectconstants_constants_unicode = dir_stream.read(projectconstants_sizeof_constants_unicode)  
1495 - unused = projectconstants_constants  
1496 - unused = projectconstants_constants_unicode  
1497 -  
1498 - # array of REFERENCE records  
1499 - check = None  
1500 - while True:  
1501 - check = struct.unpack("<H", dir_stream.read(2))[0]  
1502 - log.debug("reference type = {0:04X}".format(check))  
1503 - if check == 0x000F:  
1504 - break  
1505 -  
1506 - if check == 0x0016:  
1507 - # REFERENCENAME  
1508 - reference_id = check  
1509 - reference_sizeof_name = struct.unpack("<L", dir_stream.read(4))[0]  
1510 - reference_name = dir_stream.read(reference_sizeof_name)  
1511 - reference_reserved = struct.unpack("<H", dir_stream.read(2))[0]  
1512 - # According to [MS-OVBA] 2.3.4.2.2.2 REFERENCENAME Record:  
1513 - # "Reserved (2 bytes): MUST be 0x003E. MUST be ignored."  
1514 - # So let's ignore it, otherwise it crashes on some files (issue #132)  
1515 - # PR #135 by @c1fe:  
1516 - # Contrary to the specification, I think that the Unicode name  
1517 - # is optional. If reference_reserved is not 0x003E, I think it  
1518 - # is actually the start of another REFERENCE record,  
1519 - # at least when projectsyskind_syskind == 0x02 (Macintosh).  
1520 - if reference_reserved == 0x003E:  
1521 - #if reference_reserved not in (0x003E, 0x000D):  
1522 - # raise UnexpectedDataError(dir_path, 'REFERENCE_Reserved',  
1523 - # 0x0003E, reference_reserved)  
1524 - reference_sizeof_name_unicode = struct.unpack("<L", dir_stream.read(4))[0]  
1525 - reference_name_unicode = dir_stream.read(reference_sizeof_name_unicode)  
1526 - unused = reference_id  
1527 - unused = reference_name  
1528 - unused = reference_name_unicode  
1529 - continue  
1530 - else:  
1531 - check = reference_reserved  
1532 - log.debug("reference type = {0:04X}".format(check))  
1533 -  
1534 - if check == 0x0033:  
1535 - # REFERENCEORIGINAL (followed by REFERENCECONTROL)  
1536 - referenceoriginal_id = check  
1537 - referenceoriginal_sizeof_libidoriginal = struct.unpack("<L", dir_stream.read(4))[0]  
1538 - referenceoriginal_libidoriginal = dir_stream.read(referenceoriginal_sizeof_libidoriginal)  
1539 - unused = referenceoriginal_id  
1540 - unused = referenceoriginal_libidoriginal  
1541 - continue  
1542 -  
1543 - if check == 0x002F:  
1544 - # REFERENCECONTROL  
1545 - referencecontrol_id = check  
1546 - referencecontrol_sizetwiddled = struct.unpack("<L", dir_stream.read(4))[0] # ignore  
1547 - referencecontrol_sizeof_libidtwiddled = struct.unpack("<L", dir_stream.read(4))[0]  
1548 - referencecontrol_libidtwiddled = dir_stream.read(referencecontrol_sizeof_libidtwiddled)  
1549 - referencecontrol_reserved1 = struct.unpack("<L", dir_stream.read(4))[0] # ignore  
1550 - check_value('REFERENCECONTROL_Reserved1', 0x0000, referencecontrol_reserved1)  
1551 - referencecontrol_reserved2 = struct.unpack("<H", dir_stream.read(2))[0] # ignore  
1552 - check_value('REFERENCECONTROL_Reserved2', 0x0000, referencecontrol_reserved2)  
1553 - unused = referencecontrol_id  
1554 - unused = referencecontrol_sizetwiddled  
1555 - unused = referencecontrol_libidtwiddled  
1556 - # optional field  
1557 - check2 = struct.unpack("<H", dir_stream.read(2))[0]  
1558 - if check2 == 0x0016:  
1559 - referencecontrol_namerecordextended_id = check2  
1560 - referencecontrol_namerecordextended_sizeof_name = struct.unpack("<L", dir_stream.read(4))[0]  
1561 - referencecontrol_namerecordextended_name = dir_stream.read(  
1562 - referencecontrol_namerecordextended_sizeof_name)  
1563 - referencecontrol_namerecordextended_reserved = struct.unpack("<H", dir_stream.read(2))[0]  
1564 - if referencecontrol_namerecordextended_reserved == 0x003E:  
1565 - referencecontrol_namerecordextended_sizeof_name_unicode = struct.unpack("<L", dir_stream.read(4))[0]  
1566 - referencecontrol_namerecordextended_name_unicode = dir_stream.read(  
1567 - referencecontrol_namerecordextended_sizeof_name_unicode)  
1568 - referencecontrol_reserved3 = struct.unpack("<H", dir_stream.read(2))[0]  
1569 - unused = referencecontrol_namerecordextended_id  
1570 - unused = referencecontrol_namerecordextended_name  
1571 - unused = referencecontrol_namerecordextended_name_unicode  
1572 - else:  
1573 - referencecontrol_reserved3 = referencecontrol_namerecordextended_reserved  
1574 - else:  
1575 - referencecontrol_reserved3 = check2  
1576 -  
1577 - check_value('REFERENCECONTROL_Reserved3', 0x0030, referencecontrol_reserved3)  
1578 - referencecontrol_sizeextended = struct.unpack("<L", dir_stream.read(4))[0]  
1579 - referencecontrol_sizeof_libidextended = struct.unpack("<L", dir_stream.read(4))[0]  
1580 - referencecontrol_libidextended = dir_stream.read(referencecontrol_sizeof_libidextended)  
1581 - referencecontrol_reserved4 = struct.unpack("<L", dir_stream.read(4))[0]  
1582 - referencecontrol_reserved5 = struct.unpack("<H", dir_stream.read(2))[0]  
1583 - referencecontrol_originaltypelib = dir_stream.read(16)  
1584 - referencecontrol_cookie = struct.unpack("<L", dir_stream.read(4))[0]  
1585 - unused = referencecontrol_sizeextended  
1586 - unused = referencecontrol_libidextended  
1587 - unused = referencecontrol_reserved4  
1588 - unused = referencecontrol_reserved5  
1589 - unused = referencecontrol_originaltypelib  
1590 - unused = referencecontrol_cookie  
1591 - continue  
1592 -  
1593 - if check == 0x000D:  
1594 - # REFERENCEREGISTERED  
1595 - referenceregistered_id = check  
1596 - referenceregistered_size = struct.unpack("<L", dir_stream.read(4))[0]  
1597 - referenceregistered_sizeof_libid = struct.unpack("<L", dir_stream.read(4))[0]  
1598 - referenceregistered_libid = dir_stream.read(referenceregistered_sizeof_libid)  
1599 - referenceregistered_reserved1 = struct.unpack("<L", dir_stream.read(4))[0]  
1600 - check_value('REFERENCEREGISTERED_Reserved1', 0x0000, referenceregistered_reserved1)  
1601 - referenceregistered_reserved2 = struct.unpack("<H", dir_stream.read(2))[0]  
1602 - check_value('REFERENCEREGISTERED_Reserved2', 0x0000, referenceregistered_reserved2)  
1603 - unused = referenceregistered_id  
1604 - unused = referenceregistered_size  
1605 - unused = referenceregistered_libid  
1606 - continue  
1607 -  
1608 - if check == 0x000E:  
1609 - # REFERENCEPROJECT  
1610 - referenceproject_id = check  
1611 - referenceproject_size = struct.unpack("<L", dir_stream.read(4))[0]  
1612 - referenceproject_sizeof_libidabsolute = struct.unpack("<L", dir_stream.read(4))[0]  
1613 - referenceproject_libidabsolute = dir_stream.read(referenceproject_sizeof_libidabsolute)  
1614 - referenceproject_sizeof_libidrelative = struct.unpack("<L", dir_stream.read(4))[0]  
1615 - referenceproject_libidrelative = dir_stream.read(referenceproject_sizeof_libidrelative)  
1616 - referenceproject_majorversion = struct.unpack("<L", dir_stream.read(4))[0]  
1617 - referenceproject_minorversion = struct.unpack("<H", dir_stream.read(2))[0]  
1618 - unused = referenceproject_id  
1619 - unused = referenceproject_size  
1620 - unused = referenceproject_libidabsolute  
1621 - unused = referenceproject_libidrelative  
1622 - unused = referenceproject_majorversion  
1623 - unused = referenceproject_minorversion  
1624 - continue  
1625 -  
1626 - log.error('invalid or unknown check Id {0:04X}'.format(check))  
1627 - # raise an exception instead of stopping abruptly (issue #180)  
1628 - raise UnexpectedDataError(dir_path, 'reference type', (0x0F, 0x16, 0x33, 0x2F, 0x0D, 0x0E), check)  
1629 - #sys.exit(0)  
1630 -  
1631 - projectmodules_id = check #struct.unpack("<H", dir_stream.read(2))[0]  
1632 - check_value('PROJECTMODULES_Id', 0x000F, projectmodules_id)  
1633 - projectmodules_size = struct.unpack("<L", dir_stream.read(4))[0]  
1634 - check_value('PROJECTMODULES_Size', 0x0002, projectmodules_size)  
1635 - projectmodules_count = struct.unpack("<H", dir_stream.read(2))[0]  
1636 - projectmodules_projectcookierecord_id = struct.unpack("<H", dir_stream.read(2))[0]  
1637 - check_value('PROJECTMODULES_ProjectCookieRecord_Id', 0x0013, projectmodules_projectcookierecord_id)  
1638 - projectmodules_projectcookierecord_size = struct.unpack("<L", dir_stream.read(4))[0]  
1639 - check_value('PROJECTMODULES_ProjectCookieRecord_Size', 0x0002, projectmodules_projectcookierecord_size)  
1640 - projectmodules_projectcookierecord_cookie = struct.unpack("<H", dir_stream.read(2))[0]  
1641 - unused = projectmodules_projectcookierecord_cookie  
1642 -  
1643 - # short function to simplify unicode text output  
1644 - uni_out = lambda unicode_text: unicode_text.encode('utf-8', 'replace')  
1645 -  
1646 - log.debug("parsing {0} modules".format(projectmodules_count))  
1647 - for projectmodule_index in range(0, projectmodules_count):  
1648 - try:  
1649 - modulename_id = struct.unpack("<H", dir_stream.read(2))[0]  
1650 - check_value('MODULENAME_Id', 0x0019, modulename_id)  
1651 - modulename_sizeof_modulename = struct.unpack("<L", dir_stream.read(4))[0]  
1652 - modulename_modulename = dir_stream.read(modulename_sizeof_modulename).decode('utf-8', 'backslashreplace')  
1653 - # TODO: preset variables to avoid "referenced before assignment" errors  
1654 - modulename_unicode_modulename_unicode = ''  
1655 - # account for optional sections  
1656 - section_id = struct.unpack("<H", dir_stream.read(2))[0]  
1657 - if section_id == 0x0047:  
1658 - modulename_unicode_id = section_id  
1659 - modulename_unicode_sizeof_modulename_unicode = struct.unpack("<L", dir_stream.read(4))[0]  
1660 - modulename_unicode_modulename_unicode = dir_stream.read(  
1661 - modulename_unicode_sizeof_modulename_unicode).decode('UTF-16LE', 'replace')  
1662 - # just guessing that this is the same encoding as used in OleFileIO  
1663 - unused = modulename_unicode_id  
1664 - section_id = struct.unpack("<H", dir_stream.read(2))[0]  
1665 - if section_id == 0x001A:  
1666 - modulestreamname_id = section_id  
1667 - modulestreamname_sizeof_streamname = struct.unpack("<L", dir_stream.read(4))[0]  
1668 - modulestreamname_streamname = dir_stream.read(modulestreamname_sizeof_streamname)  
1669 - modulestreamname_reserved = struct.unpack("<H", dir_stream.read(2))[0]  
1670 - check_value('MODULESTREAMNAME_Reserved', 0x0032, modulestreamname_reserved)  
1671 - modulestreamname_sizeof_streamname_unicode = struct.unpack("<L", dir_stream.read(4))[0]  
1672 - modulestreamname_streamname_unicode = dir_stream.read(  
1673 - modulestreamname_sizeof_streamname_unicode).decode('UTF-16LE', 'replace')  
1674 - # just guessing that this is the same encoding as used in OleFileIO  
1675 - unused = modulestreamname_id  
1676 - section_id = struct.unpack("<H", dir_stream.read(2))[0]  
1677 - if section_id == 0x001C:  
1678 - moduledocstring_id = section_id  
1679 - check_value('MODULEDOCSTRING_Id', 0x001C, moduledocstring_id)  
1680 - moduledocstring_sizeof_docstring = struct.unpack("<L", dir_stream.read(4))[0]  
1681 - moduledocstring_docstring = dir_stream.read(moduledocstring_sizeof_docstring)  
1682 - moduledocstring_reserved = struct.unpack("<H", dir_stream.read(2))[0]  
1683 - check_value('MODULEDOCSTRING_Reserved', 0x0048, moduledocstring_reserved)  
1684 - moduledocstring_sizeof_docstring_unicode = struct.unpack("<L", dir_stream.read(4))[0]  
1685 - moduledocstring_docstring_unicode = dir_stream.read(moduledocstring_sizeof_docstring_unicode)  
1686 - unused = moduledocstring_docstring  
1687 - unused = moduledocstring_docstring_unicode  
1688 - section_id = struct.unpack("<H", dir_stream.read(2))[0]  
1689 - if section_id == 0x0031:  
1690 - moduleoffset_id = section_id  
1691 - check_value('MODULEOFFSET_Id', 0x0031, moduleoffset_id)  
1692 - moduleoffset_size = struct.unpack("<L", dir_stream.read(4))[0]  
1693 - check_value('MODULEOFFSET_Size', 0x0004, moduleoffset_size)  
1694 - moduleoffset_textoffset = struct.unpack("<L", dir_stream.read(4))[0]  
1695 - section_id = struct.unpack("<H", dir_stream.read(2))[0]  
1696 - if section_id == 0x001E:  
1697 - modulehelpcontext_id = section_id  
1698 - check_value('MODULEHELPCONTEXT_Id', 0x001E, modulehelpcontext_id)  
1699 - modulehelpcontext_size = struct.unpack("<L", dir_stream.read(4))[0]  
1700 - check_value('MODULEHELPCONTEXT_Size', 0x0004, modulehelpcontext_size)  
1701 - modulehelpcontext_helpcontext = struct.unpack("<L", dir_stream.read(4))[0]  
1702 - unused = modulehelpcontext_helpcontext  
1703 - section_id = struct.unpack("<H", dir_stream.read(2))[0]  
1704 - if section_id == 0x002C:  
1705 - modulecookie_id = section_id  
1706 - check_value('MODULECOOKIE_Id', 0x002C, modulecookie_id)  
1707 - modulecookie_size = struct.unpack("<L", dir_stream.read(4))[0]  
1708 - check_value('MODULECOOKIE_Size', 0x0002, modulecookie_size)  
1709 - modulecookie_cookie = struct.unpack("<H", dir_stream.read(2))[0]  
1710 - unused = modulecookie_cookie  
1711 - section_id = struct.unpack("<H", dir_stream.read(2))[0]  
1712 - if section_id == 0x0021 or section_id == 0x0022:  
1713 - moduletype_id = section_id  
1714 - moduletype_reserved = struct.unpack("<L", dir_stream.read(4))[0]  
1715 - unused = moduletype_id  
1716 - unused = moduletype_reserved  
1717 - section_id = struct.unpack("<H", dir_stream.read(2))[0]  
1718 - if section_id == 0x0025:  
1719 - modulereadonly_id = section_id  
1720 - check_value('MODULEREADONLY_Id', 0x0025, modulereadonly_id)  
1721 - modulereadonly_reserved = struct.unpack("<L", dir_stream.read(4))[0]  
1722 - check_value('MODULEREADONLY_Reserved', 0x0000, modulereadonly_reserved)  
1723 - section_id = struct.unpack("<H", dir_stream.read(2))[0]  
1724 - if section_id == 0x0028:  
1725 - moduleprivate_id = section_id  
1726 - check_value('MODULEPRIVATE_Id', 0x0028, moduleprivate_id)  
1727 - moduleprivate_reserved = struct.unpack("<L", dir_stream.read(4))[0]  
1728 - check_value('MODULEPRIVATE_Reserved', 0x0000, moduleprivate_reserved)  
1729 - section_id = struct.unpack("<H", dir_stream.read(2))[0]  
1730 - if section_id == 0x002B: # TERMINATOR  
1731 - module_reserved = struct.unpack("<L", dir_stream.read(4))[0]  
1732 - check_value('MODULE_Reserved', 0x0000, module_reserved)  
1733 - section_id = None  
1734 - if section_id != None:  
1735 - log.warning('unknown or invalid module section id {0:04X}'.format(section_id))  
1736 -  
1737 - log.debug('Project CodePage = %d' % projectcodepage_codepage)  
1738 - if projectcodepage_codepage in MAC_CODEPAGES:  
1739 - vba_codec = MAC_CODEPAGES[projectcodepage_codepage]  
1740 - else:  
1741 - vba_codec = 'cp%d' % projectcodepage_codepage  
1742 - log.debug("ModuleName = {0}".format(modulename_modulename))  
1743 - log.debug("ModuleNameUnicode = {0}".format(uni_out(modulename_unicode_modulename_unicode)))  
1744 - log.debug("StreamName = {0}".format(modulestreamname_streamname))  
1745 - try:  
1746 - streamname_unicode = modulestreamname_streamname.decode(vba_codec)  
1747 - except UnicodeError as ue:  
1748 - log.debug('failed to decode stream name {0!r} with codec {1}'  
1749 - .format(modulestreamname_streamname, vba_codec))  
1750 - streamname_unicode = modulestreamname_streamname.decode(vba_codec, errors='replace')  
1751 - log.debug("StreamName.decode('%s') = %s" % (vba_codec, uni_out(streamname_unicode)))  
1752 - log.debug("StreamNameUnicode = {0}".format(uni_out(modulestreamname_streamname_unicode)))  
1753 - log.debug("TextOffset = {0}".format(moduleoffset_textoffset))  
1754 -  
1755 - code_data = None  
1756 - try_names = streamname_unicode, \  
1757 - modulename_unicode_modulename_unicode, \  
1758 - modulestreamname_streamname_unicode  
1759 - for stream_name in try_names:  
1760 - # TODO: if olefile._find were less private, could replace this  
1761 - # try-except with calls to it  
1762 - try:  
1763 - code_path = vba_root + u'VBA/' + stream_name  
1764 - log.debug('opening VBA code stream %s' % uni_out(code_path))  
1765 - code_data = ole.openstream(code_path).read()  
1766 - break  
1767 - except IOError as ioe:  
1768 - log.debug('failed to open stream VBA/%r (%r), try other name'  
1769 - % (uni_out(stream_name), ioe))  
1770 -  
1771 - if code_data is None:  
1772 - log.info("Could not open stream %d of %d ('VBA/' + one of %r)!"  
1773 - % (projectmodule_index, projectmodules_count,  
1774 - '/'.join("'" + stream_name + "'"  
1775 - for stream_name in try_names)))  
1776 - if relaxed:  
1777 - continue # ... with next submodule  
1778 - else:  
1779 - raise SubstreamOpenError('[BASE]', 'VBA/' +  
1780 - modulename_unicode_modulename_unicode)  
1781 -  
1782 - log.debug("length of code_data = {0}".format(len(code_data)))  
1783 - log.debug("offset of code_data = {0}".format(moduleoffset_textoffset))  
1784 - code_data = code_data[moduleoffset_textoffset:]  
1785 - if len(code_data) > 0:  
1786 - code_data = decompress_stream(code_data)  
1787 - # case-insensitive search in the code_modules dict to find the file extension:  
1788 - filext = code_modules.get(modulename_modulename.lower(), 'bin')  
1789 - filename = '{0}.{1}'.format(modulename_modulename, filext)  
1790 - #TODO: also yield the codepage so that callers can decode it properly  
1791 - yield (code_path, filename, code_data)  
1792 - # print '-'*79  
1793 - # print filename  
1794 - # print ''  
1795 - # print code_data  
1796 - # print ''  
1797 - log.debug('extracted file {0}'.format(filename))  
1798 - else:  
1799 - log.warning("module stream {0} has code data length 0".format(modulestreamname_streamname))  
1800 - except (UnexpectedDataError, SubstreamOpenError):  
1801 - raise  
1802 - except Exception as exc:  
1803 - log.info('Error parsing module {0} of {1} in _extract_vba:'  
1804 - .format(projectmodule_index, projectmodules_count),  
1805 - exc_info=True)  
1806 - if not relaxed:  
1807 - raise  
1808 - _ = unused # make pylint happy: now variable "unused" is being used ;-)  
1809 - return  
1810 -  
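Most of the dir stream parsing above follows one pattern from [MS-OVBA]: a 2-byte little-endian record Id, a 4-byte little-endian Size, then Size bytes of payload, with some records being optional. Below is a minimal sketch of that pattern. It uses hypothetical helpers `read_record` and `peek_id` (not part of olevba) and an in-memory `io.BytesIO` in place of the decompressed dir stream.

```python
import io
import struct


def read_record(stream, expected_id=None):
    """Read one Id (uint16) / Size (uint32) / Data record, little-endian.

    Hypothetical helper for illustration; olevba inlines these reads.
    """
    header = stream.read(6)
    if len(header) < 6:
        raise EOFError('truncated record header')
    record_id, size = struct.unpack('<HL', header)
    if expected_id is not None and record_id != expected_id:
        raise ValueError('expected Id 0x%04X, got 0x%04X' % (expected_id, record_id))
    data = stream.read(size)
    if len(data) != size:
        raise EOFError('truncated record data')
    return record_id, data


def peek_id(stream):
    """Return the next 2-byte Id without consuming it (None at end of stream)."""
    pos = stream.tell()
    raw = stream.read(2)
    stream.seek(pos)
    return struct.unpack('<H', raw)[0] if len(raw) == 2 else None
```

olevba itself reads each Id unconditionally and branches on its value (the `section_id` chain). That approach works on any stream, whereas peeking with `tell()`/`seek()` assumes the stream is seekable.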
1811 -  
1812 -def vba_collapse_long_lines(vba_code):  
1813 - """  
1814 - Parse a VBA module code to detect continuation line characters (underscore) and  
1815 - collapse split lines. Continuation line characters are replaced by spaces.  
1816 -  
1817 - :param vba_code: str, VBA module code  
1818 - :return: str, VBA module code with long lines collapsed  
1819 - """  
1820 - # TODO: use a regex instead, to allow whitespace after the underscore?  
1821 - vba_code = vba_code.replace(' _\r\n', ' ')  
1822 - vba_code = vba_code.replace(' _\r', ' ')  
1823 - vba_code = vba_code.replace(' _\n', ' ')  
1824 - return vba_code  
1825 -  
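The TODO in vba_collapse_long_lines suggests a regex so that trailing whitespace after the continuation underscore is tolerated. The sketch below shows one way to do that with a hypothetical `collapse_long_lines_re`. Like the original, it requires a space before the underscore and replaces the whole continuation with a single space. It is not the function olevba ships.

```python
import re

# A space, an underscore, optional trailing spaces/tabs, then any line ending.
# \r\n is listed first so a Windows line ending is consumed as one unit.
CONTINUATION_RE = re.compile(r' _[ \t]*(?:\r\n|\r|\n)')


def collapse_long_lines_re(vba_code):
    """Join VBA lines split with ' _', tolerating whitespace after the underscore."""
    return CONTINUATION_RE.sub(' ', vba_code)
```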
1826 -  
1827 -def filter_vba(vba_code):  
1828 - """  
1829 - Filter VBA source code to remove the first lines starting with "Attribute VB_",  
1830 - which are automatically added by MS Office and not displayed in the VBA Editor.  
1831 - This should only be used when displaying source code for human analysis.  
1832 -  
1833 - Note: lines are not filtered if they contain a colon, because it could be  
1834 - used to hide malicious instructions.  
1835 -  
1836 - :param vba_code: str, VBA source code  
1837 - :return: str, filtered VBA source code  
1838 - """  
1839 - vba_lines = vba_code.splitlines()  
1840 - start = 0  
1841 - for line in vba_lines:  
1842 - if line.startswith("Attribute VB_") and not ':' in line:  
1843 - start += 1  
1844 - else:  
1845 - break  
1846 - #TODO: also remove empty lines?  
1847 - vba = '\n'.join(vba_lines[start:])  
1848 - return vba  
1849 -  
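The header-stripping rule in filter_vba can be expressed with `itertools.dropwhile`. The sketch below uses a hypothetical `strip_vb_attributes` and is intended to behave the same way as the loop above. The colon check matters because a colon is VBA's statement separator, so a line such as `Attribute VB_Name = "M": Shell "calc"` carries a second, executable statement and must stay visible.

```python
from itertools import dropwhile


def strip_vb_attributes(vba_code):
    """Drop leading 'Attribute VB_' lines; stop at the first line that is not
    one, or that contains a colon (which could chain another statement)."""
    def is_header(line):
        return line.startswith('Attribute VB_') and ':' not in line
    return '\n'.join(dropwhile(is_header, vba_code.splitlines()))
```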
1850 -  
1851 -def detect_autoexec(vba_code, obfuscation=None):  
1852 - """  
1853 - Detect if the VBA code contains keywords corresponding to macros running  
1854 - automatically when triggered by specific actions (e.g. when a document is  
1855 - opened or closed).  
1856 -  
1857 - :param vba_code: str, VBA source code  
1858 - :param obfuscation: None or str, name of obfuscation to be added to description  
1859 - :return: list of str tuples (keyword, description)  
1860 - """  
1861 - #TODO: merge code with detect_suspicious  
1862 - # case-insensitive search  
1863 - #vba_code = vba_code.lower()  
1864 - results = []  
1865 - obf_text = ''  
1866 - if obfuscation:  
1867 - obf_text = ' (obfuscation: %s)' % obfuscation  
1868 - for description, keywords in AUTOEXEC_KEYWORDS.items():  
1869 - for keyword in keywords:  
1870 - #TODO: if keyword is already a compiled regex, use it as-is  
1871 - # search using regex to detect word boundaries:  
1872 - match = re.search(r'(?i)\b' + keyword + r'\b', vba_code)  
1873 - if match:  
1874 - #if keyword.lower() in vba_code:  
1875 - found_keyword = match.group()  
1876 - results.append((found_keyword, description + obf_text))  
1877 - return results  
1878 -  
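detect_autoexec wraps each keyword in `\b` and uses the inline `(?i)` flag, so matching is case-insensitive but only on whole words, and the reported keyword keeps the casing found in the code. Below is a minimal sketch of that search. `DEMO_AUTOEXEC` is a hypothetical, heavily trimmed table and is not olevba's AUTOEXEC_KEYWORDS. As in detect_autoexec, keywords are not escaped, so entries may be regex fragments.

```python
import re

# Hypothetical keyword table for illustration only.
DEMO_AUTOEXEC = {
    'Runs when the Word document is opened': ('Document_Open', 'AutoOpen'),
    'Runs when the Excel workbook is opened': ('Workbook_Open',),
}


def find_keywords(vba_code, table):
    """Return (keyword as found in the code, description) for each hit."""
    results = []
    for description, keywords in table.items():
        for keyword in keywords:
            # \b on both sides: 'MyDocument_OpenX' must not match 'Document_Open'
            match = re.search(r'(?i)\b' + keyword + r'\b', vba_code)
            if match:
                results.append((match.group(), description))
    return results
```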
1879 -  
1880 -def detect_suspicious(vba_code, obfuscation=None):  
1881 - """  
1882 - Detect if the VBA code contains suspicious keywords corresponding to  
1883 - potential malware behaviour.  
1884 -  
1885 - :param vba_code: str, VBA source code  
1886 - :param obfuscation: None or str, name of obfuscation to be added to description  
1887 - :return: list of str tuples (keyword, description)  
1888 - """  
1889 - # case-insensitive search  
1890 - #vba_code = vba_code.lower()  
1891 - results = []  
1892 - obf_text = ''  
1893 - if obfuscation:  
1894 - obf_text = ' (obfuscation: %s)' % obfuscation  
1895 - for description, keywords in SUSPICIOUS_KEYWORDS.items():  
1896 - for keyword in keywords:  
1897 - # search using regex to detect word boundaries:  
1898 - match = re.search(r'(?i)\b' + re.escape(keyword) + r'\b', vba_code)  
1899 - if match:  
1900 - #if keyword.lower() in vba_code:  
1901 - found_keyword = match.group()  
1902 - results.append((found_keyword, description + obf_text))  
1903 - return results  
1904 -  
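Unlike detect_autoexec, detect_suspicious passes each keyword through `re.escape` before adding the `\b` anchors, so characters such as `.` are matched literally. The sketch below shows the difference with a hypothetical `keyword_regex` helper and a sample keyword, `WScript.Shell`. It also illustrates a regex caveat: `\b` only marks a boundary next to a word character, so a keyword ending in a symbol (here `Chr$` followed by `(`) does not match with a trailing `\b`.

```python
import re


def keyword_regex(keyword, escape=True):
    """Case-insensitive, word-bounded regex for a keyword (illustrative helper)."""
    body = re.escape(keyword) if escape else keyword
    return re.compile(r'(?i)\b' + body + r'\b')
```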
1905 -  
1906 -def detect_patterns(vba_code, obfuscation=None):  
1907 - """  
1908 - Detect if the VBA code contains specific patterns such as IP addresses,  
1909 - URLs, e-mail addresses, executable file names, etc.  
1910 -  
1911 - :param vba_code: str, VBA source code  
1912 - :return: list of str tuples (pattern type, value)  
1913 - """  
1914 - results = []  
1915 - found = set()  
1916 - obf_text = ''  
1917 - if obfuscation:  
1918 - obf_text = ' (obfuscation: %s)' % obfuscation  
1919 - for pattern_type, pattern_re in RE_PATTERNS:  
1920 - for match in pattern_re.finditer(vba_code):  
1921 - value = match.group()  
1922 - if value not in found:  
1923 - results.append((pattern_type + obf_text, value))  
1924 - found.add(value)  
1925 - return results  
1926 -  
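In detect_patterns, the `found` set is shared across all pattern types. A value is therefore reported once, under the first pattern type that matches it, while distinct values (for example a URL and the IP address inside it) are each reported. The sketch below shows that behaviour with a hypothetical `detect_patterns_demo` and two simplified regexes. They are stand-ins, not olevba's RE_PATTERNS.

```python
import re

# Simplified, hypothetical patterns for illustration only.
DEMO_PATTERNS = [
    ('URL', re.compile(r'https?://[^\s"\']+')),
    ('IPv4 address', re.compile(r'\b(?:\d{1,3}\.){3}\d{1,3}\b')),
]


def detect_patterns_demo(vba_code, patterns, obfuscation=None):
    results = []
    found = set()  # shared by all pattern types, as in detect_patterns
    obf_text = ' (obfuscation: %s)' % obfuscation if obfuscation else ''
    for pattern_type, pattern_re in patterns:
        for match in pattern_re.finditer(vba_code):
            value = match.group()
            if value not in found:
                results.append((pattern_type + obf_text, value))
                found.add(value)
    return results
```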
1927 -  
1928 -def detect_hex_strings(vba_code):  
1929 - """  
1930 - Detect if the VBA code contains strings encoded in hexadecimal.  
1931 -  
1932 - :param vba_code: str, VBA source code  
1933 - :return: list of str tuples (encoded string, decoded string)  
1934 - """  
1935 - results = []  
1936 - found = set()  
1937 - for match in re_hex_string.finditer(vba_code):  
1938 - value = match.group()  
1939 - if value not in found:  
1940 - decoded = binascii.unhexlify(value)  
1941 - results.append((value, decoded.decode('utf-8', 'backslashreplace')))  
1942 - found.add(value)  
1943 - return results  
1944 -  
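detect_hex_strings decodes each distinct hex run with `binascii.unhexlify` and turns undecodable bytes into `\xNN` escapes via `backslashreplace`. The actual `re_hex_string` is defined elsewhere in olevba. The sketch below substitutes a hypothetical `HEX_RE` (four or more pairs of hex digits) in a stand-alone `detect_hex_demo`. Such a pattern will also match long decimal numbers and hex-looking words, so treat its hits as candidates.

```python
import binascii
import re

# Hypothetical stand-in for olevba's re_hex_string.
HEX_RE = re.compile(r'(?:[0-9A-Fa-f]{2}){4,}')


def detect_hex_demo(vba_code):
    results = []
    found = set()
    for match in HEX_RE.finditer(vba_code):
        value = match.group()
        if value not in found:
            decoded = binascii.unhexlify(value)  # even length guaranteed by the regex
            results.append((value, decoded.decode('utf-8', 'backslashreplace')))
            found.add(value)
    return results
```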
1945 -  
1946 -def detect_base64_strings(vba_code):  
1947 - """  
1948 - Detect if the VBA code contains strings encoded in base64.  
1949 -  
1950 - :param vba_code: str, VBA source code  
1951 - :return: list of str tuples (encoded string, decoded string)  
1952 - """  
1953 - #TODO: avoid matching simple hex strings as base64?  
1954 - results = []  
1955 - found = set()  
1956 - for match in re_base64_string.finditer(vba_code):  
1957 - # extract the base64 string without quotes:  
1958 - value = match.group().strip('"')  
1959 - # check it is not just a hex string:  
1960 - if not re_nothex_check.search(value):  
1961 - continue  
1962 - # only keep new values and not in the whitelist:  
1963 - if value not in found and value.lower() not in BASE64_WHITELIST:  
1964 - try:  
1965 - decoded = base64.b64decode(value)  
1966 - results.append((value, decoded.decode('utf-8','replace')))  
1967 - found.add(value)  
1968 - except (TypeError, ValueError) as exc:  
1969 - log.debug('Failed to base64-decode (%s)' % exc)  
1970 - # if an exception occurs, it is likely not a base64-encoded string  
1971 - return results  
1972 -  
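detect_base64_strings skips candidates made only of hex digits, since those are better explained as hex strings. It also ignores whitelisted values and treats decoding errors as "not base64". The sketch below follows those steps in a stand-alone `detect_base64_demo`. `BASE64_RE` and `NOT_HEX_RE` are hypothetical stand-ins for olevba's `re_base64_string` and `re_nothex_check`, and the whitelist check is omitted.

```python
import base64
import binascii
import re

# Hypothetical patterns: a double-quoted run of base64 quads with optional padding,
# and any character that cannot appear in a hex string.
BASE64_RE = re.compile(r'"(?:[A-Za-z0-9+/]{4})+(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"')
NOT_HEX_RE = re.compile(r'[G-Zg-z=+/]')


def detect_base64_demo(vba_code):
    results = []
    found = set()
    for match in BASE64_RE.finditer(vba_code):
        value = match.group().strip('"')
        if not NOT_HEX_RE.search(value):
            continue  # only hex digits: leave it to the hex detector
        if value in found:
            continue
        try:
            decoded = base64.b64decode(value, validate=True)
        except (binascii.Error, ValueError):
            continue  # most likely not base64 after all
        results.append((value, decoded.decode('utf-8', 'replace')))
        found.add(value)
    return results
```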
1973 -  
1974 -def detect_dridex_strings(vba_code):  
1975 - """  
1976 - Detect if the VBA code contains strings obfuscated with a specific algorithm found in Dridex samples.  
1977 -  
1978 - :param vba_code: str, VBA source code  
1979 - :return: list of str tuples (encoded string, decoded string)  
1980 - """  
1981 - # TODO: move this at the beginning of script  
1982 - from oletools.thirdparty.DridexUrlDecoder.DridexUrlDecoder import DridexUrlDecode  
1983 -  
1984 - results = []  
1985 - found = set()  
1986 - for match in re_dridex_string.finditer(vba_code):  
1987 - value = match.group()[1:-1]  
1988 - # check it is not just a hex string:  
1989 - if not re_nothex_check.search(value):  
1990 - continue  
1991 - if value not in found:  
1992 - try:  
1993 - decoded = DridexUrlDecode(value)  
1994 - results.append((value, decoded))  
1995 - found.add(value)  
1996 - except Exception as exc:  
1997 - log.debug('Failed to Dridex-decode (%s)' % exc)  
1998 - # if an exception occurs, it is likely not a dridex-encoded string  
1999 - return results  
2000 -  
2001 -  
2002 -def detect_vba_strings(vba_code):  
2003 - """  
2004 - Detect if the VBA code contains strings obfuscated with VBA expressions  
2005 - using keywords such as Chr, Asc, Val, StrReverse, etc.  
2006 -  
2007 - :param vba_code: str, VBA source code  
2008 - :return: list of str tuples (encoded string, decoded string)  
2009 - """  
2010 - # TODO: handle exceptions  
2011 - results = []  
2012 - found = set()  
2013 - # IMPORTANT: to extract the actual VBA expressions found in the code,  
2014 - # we must expand tabs to have the same string as pyparsing.  
2015 - # Otherwise, start and end offsets are incorrect.  
2016 - vba_code = vba_code.expandtabs()  
2017 - # Split the VBA code line by line to avoid MemoryError on large scripts:  
2018 - for vba_line in vba_code.splitlines():  
2019 - for tokens, start, end in vba_expr_str.scanString(vba_line):  
2020 - encoded = vba_line[start:end]  
2021 - decoded = tokens[0]  
2022 - if isinstance(decoded, VbaExpressionString):  
2023 - # This is a VBA expression, not a simple string  
2024 - # print 'VBA EXPRESSION: encoded=%r => decoded=%r' % (encoded, decoded)  
2025 - # remove parentheses and quotes from original string:  
2026 - # if encoded.startswith('(') and encoded.endswith(')'):  
2027 - # encoded = encoded[1:-1]  
2028 - # if encoded.startswith('"') and encoded.endswith('"'):  
2029 - # encoded = encoded[1:-1]  
2030 - # avoid duplicates and simple strings:  
2031 - if encoded not in found and decoded != encoded:  
2032 - results.append((encoded, decoded))  
2033 - found.add(encoded)  
2034 - # else:  
2035 - # print 'VBA STRING: encoded=%r => decoded=%r' % (encoded, decoded)  
2036 - return results  
2037 -  
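detect_vba_strings relies on a pyparsing grammar (`vba_expr_str`) that evaluates VBA expressions built from Chr, Asc, Val, StrReverse and similar functions. The sketch below does not reproduce that grammar. It is a deliberately narrow, regex-based substitute that only decodes concatenations of `Chr(n)`, `Chr$(n)` or `ChrW(n)` calls joined by `&` or `+`, and it shows the kind of (encoded, decoded) pair the real function reports. `decode_chr_concat` is a hypothetical name.

```python
import re

# One Chr-style call with a decimal argument, and a chain of them joined by & or +.
CHR_CALL = re.compile(r'(?i)\bChrW?\$?\(\s*(\d+)\s*\)')
CHR_CONCAT = re.compile(
    r'(?i)\bChrW?\$?\(\s*\d+\s*\)(?:\s*[&+]\s*\bChrW?\$?\(\s*\d+\s*\))*')


def decode_chr_concat(vba_line):
    """Return (encoded expression, decoded string) for each Chr(...) chain."""
    results = []
    for match in CHR_CONCAT.finditer(vba_line):
        encoded = match.group()
        decoded = ''.join(chr(int(code)) for code in CHR_CALL.findall(encoded))
        results.append((encoded, decoded))
    return results
```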
2038 -  
2039 -def json2ascii(json_obj, encoding='utf8', errors='replace'):  
2040 - """ ensure there is no unicode in json and all strings are safe to decode  
2041 -  
2042 - works recursively, decodes and re-encodes every string to/from unicode  
2043 - to ensure there will be no trouble in loading the dumped json output  
2044 - """  
2045 - if json_obj is None:  
2046 - pass  
2047 - elif isinstance(json_obj, (bool, int, float)):  
2048 - pass  
2049 - elif isinstance(json_obj, str):  
2050 - # str is already unicode in Python 3: nothing to decode or re-encode  
2051 - dencoded = json_obj  
2052 - if dencoded != json_obj:  
2053 - log.debug('json2ascii: replaced: {0} (len {1})'  
2054 - .format(json_obj, len(json_obj)))  
2055 - log.debug('json2ascii: with: {0} (len {1})'  
2056 - .format(dencoded, len(dencoded)))  
2057 - return dencoded  
2058 - elif isinstance(json_obj, bytes):  
2059 - log.debug('json2ascii: encode unicode: {0}'  
2060 - .format(json_obj.decode(encoding, errors)))  
2061 - # cannot put original into logger  
2062 - # print 'original: ' json_obj  
2063 - return json_obj.decode(encoding, errors)  
2064 - elif isinstance(json_obj, dict):  
2065 - for key in json_obj:  
2066 - json_obj[key] = json2ascii(json_obj[key])  
2067 - elif isinstance(json_obj, (list, tuple)):  
2068 - # rebuild the sequence: rebinding a loop variable would not update it  
2069 - json_obj = type(json_obj)(json2ascii(item) for item in json_obj)  
2070 - else:  
2071 - log.debug('unexpected type in json2ascii: {0} -- leave as is'  
2072 - .format(type(json_obj)))  
2073 - return json_obj  
2074 -  
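json2ascii prepares values for `json.dumps` by turning bytes into str. Assuming that bytes are the only type needing conversion (Python 3), the same goal can be reached with a hypothetical `json_safe` helper that returns new containers instead of mutating the input. Building new containers also sidesteps a common pitfall: rebinding a loop variable never updates the list being iterated. Tuples become lists, which is how `json.dumps` serializes them anyway. Only bytes keys are decoded, so dict keys stay hashable.

```python
import json


def json_safe(obj, encoding='utf8', errors='replace'):
    """Return a copy of obj in which every bytes value is decoded to str."""
    if isinstance(obj, bytes):
        return obj.decode(encoding, errors)
    if isinstance(obj, dict):
        return {
            (key.decode(encoding, errors) if isinstance(key, bytes) else key):
                json_safe(value, encoding, errors)
            for key, value in obj.items()
        }
    if isinstance(obj, (list, tuple)):
        return [json_safe(item, encoding, errors) for item in obj]
    return obj  # None, bool, int, float, str and unknown types pass through
```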
2075 -  
2076 -def print_json(json_dict=None, _json_is_first=False, _json_is_last=False,  
2077 - **json_parts):  
2078 - """ line-wise print of json.dumps(json2ascii(..)) with options and indent+1  
2079 -  
2080 - can use in two ways:  
2081 - (1) print_json(some_dict)  
2082 - (2) print_json(key1=value1, key2=value2, ...)  
2083 -  
2084 - :param bool _json_is_first: set to True only for the very first entry, to open  
2085 - the top-level json-list  
2086 - :param bool _json_is_last: set to True only for the very last entry, to close  
2087 - the top-level json-list  
2088 - """  
2089 - if json_dict and json_parts:  
2090 - raise ValueError('Invalid json argument: want either single dict or '  
2091 - 'key=value parts but got both')  
2092 - elif (json_dict is not None) and (not isinstance(json_dict, dict)):  
2093 - raise ValueError('Invalid json argument: want either single dict or '  
2094 - 'key=value parts but got {0} instead of dict'  
2095 - .format(type(json_dict)))  
2096 - if json_parts:  
2097 - json_dict = json_parts  
2098 -  
2099 - if _json_is_first:  
2100 - print('[')  
2101 -  
2102 - lines = json.dumps(json2ascii(json_dict), check_circular=False,  
2103 - indent=4, ensure_ascii=False).splitlines()  
2104 - for line in lines[:-1]:  
2105 - print(' {0}'.format(line))  
2106 - if _json_is_last:  
2107 - print(' {0}'.format(lines[-1])) # print last line without comma  
2108 - print(']')  
2109 - else:  
2110 - print(' {0},'.format(lines[-1])) # print last line with comma  
2111 -  
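print_json streams a top-level JSON list one entry at a time. It prints `[` before the first entry, a trailing comma after every entry but the last, and `]` after the last, which presumably lets results be printed as each file is processed. Below is a minimal stand-alone sketch of that idea. It uses a hypothetical `print_json_entry` that writes to any file-like object (added so the output is easy to check) and skips the json2ascii step. If no entry is ever printed, the caller must emit `[]` itself.

```python
import io
import json
import sys


def print_json_entry(obj, is_first=False, is_last=False, out=None):
    """Print obj as one element of a streamed top-level JSON list."""
    out = sys.stdout if out is None else out
    if is_first:
        print('[', file=out)
    lines = json.dumps(obj, indent=4, ensure_ascii=False).splitlines()
    for line in lines[:-1]:
        print('    ' + line, file=out)          # indent one extra level
    if is_last:
        print('    ' + lines[-1], file=out)     # last entry: no comma
        print(']', file=out)
    else:
        print('    ' + lines[-1] + ',', file=out)
```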
2112 -  
2113 -class VBA_Scanner(object):  
2114 - """  
2115 - Class to scan the source code of a VBA module to find obfuscated strings,  
2116 - suspicious keywords, IOCs, auto-executable macros, etc.  
2117 - """  
2118 -  
2119 - def __init__(self, vba_code):  
2120 - """  
2121 - VBA_Scanner constructor  
2122 -  
2123 - :param vba_code: str, VBA source code to be analyzed  
2124 - """  
2125 - if isinstance(vba_code, bytes):  
2126 - vba_code = vba_code.decode('utf-8', 'backslashreplace')  
2127 - # join long lines ending with " _":  
2128 - self.code = vba_collapse_long_lines(vba_code)  
2129 - self.code_hex = ''  
2130 - self.code_hex_rev = ''  
2131 - self.code_rev_hex = ''  
2132 - self.code_base64 = ''  
2133 - self.code_dridex = ''  
2134 - self.code_vba = ''  
2135 - self.strReverse = None  
2136 - # results = None before scanning, then a list of tuples after scanning  
2137 - self.results = None  
2138 - self.autoexec_keywords = None  
2139 - self.suspicious_keywords = None  
2140 - self.iocs = None  
2141 - self.hex_strings = None  
2142 - self.base64_strings = None  
2143 - self.dridex_strings = None  
2144 - self.vba_strings = None  
2145 -  
2146 -  
2147 - def scan(self, include_decoded_strings=False, deobfuscate=False):  
2148 - """  
2149 - Analyze the provided VBA code to detect suspicious keywords,  
2150 - auto-executable macros, IOC patterns, obfuscation patterns  
2151 - such as hex-encoded strings.  
2152 -  
2153 - :param include_decoded_strings: bool, if True, all encoded strings will be included with their decoded content.  
2154 - :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)  
2155 - :return: list of tuples (type, keyword, description)  
2156 - (type = 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String', 'Dridex string' or 'VBA string')  
2157 - """  
2158 - # First, detect and extract hex-encoded strings:  
2159 - self.hex_strings = detect_hex_strings(self.code)  
2160 - # detect if the code contains StrReverse:  
2161 - self.strReverse = False  
2162 - if 'strreverse' in self.code.lower(): self.strReverse = True  
2163 - # Then append the decoded strings to the VBA code, to detect obfuscated IOCs and keywords:  
2164 - for encoded, decoded in self.hex_strings:  
2165 - self.code_hex += '\n' + decoded  
2166 - # if the code contains "StrReverse", also append the hex strings in reverse order:  
2167 - if self.strReverse:  
2168 - # StrReverse after hex decoding:  
2169 - self.code_hex_rev += '\n' + decoded[::-1]  
2170 - # StrReverse before hex decoding:  
2171 - self.code_rev_hex += '\n' + str(binascii.unhexlify(encoded[::-1]))  
2172 - #example: https://malwr.com/analysis/NmFlMGI4YTY1YzYyNDkwNTg1ZTBiZmY5OGI3YjlhYzU/  
2173 - #TODO: also append the full code reversed if StrReverse? (risk of false positives?)  
2174 - # Detect Base64-encoded strings  
2175 - self.base64_strings = detect_base64_strings(self.code)  
2176 - for encoded, decoded in self.base64_strings:  
2177 - self.code_base64 += '\n' + decoded  
2178 - # Detect Dridex-encoded strings  
2179 - self.dridex_strings = detect_dridex_strings(self.code)  
2180 - for encoded, decoded in self.dridex_strings:  
2181 - self.code_dridex += '\n' + decoded  
2182 - # Detect obfuscated strings in VBA expressions  
2183 - if deobfuscate:  
2184 - self.vba_strings = detect_vba_strings(self.code)  
2185 - else:  
2186 - self.vba_strings = []  
2187 - for encoded, decoded in self.vba_strings:  
2188 - self.code_vba += '\n' + decoded  
2189 - results = []  
2190 - self.autoexec_keywords = []  
2191 - self.suspicious_keywords = []  
2192 - self.iocs = []  
2193 -  
2194 - for code, obfuscation in (  
2195 - (self.code, None),  
2196 - (self.code_hex, 'Hex'),  
2197 - (self.code_hex_rev, 'Hex+StrReverse'),  
2198 - (self.code_rev_hex, 'StrReverse+Hex'),  
2199 - (self.code_base64, 'Base64'),  
2200 - (self.code_dridex, 'Dridex'),  
2201 - (self.code_vba, 'VBA expression'),  
2202 - ):  
2203 - if isinstance(code,bytes):  
2204 - code=code.decode('utf-8','backslashreplace')  
2205 - self.autoexec_keywords += detect_autoexec(code, obfuscation)  
2206 - self.suspicious_keywords += detect_suspicious(code, obfuscation)  
2207 - self.iocs += detect_patterns(code, obfuscation)  
2208 -  
2209 - # If hex-encoded strings were discovered, add an item to suspicious keywords:  
2210 - if self.hex_strings:  
2211 - self.suspicious_keywords.append(('Hex Strings',  
2212 - 'Hex-encoded strings were detected, may be used to obfuscate strings (option --decode to see all)'))  
2213 - if self.base64_strings:  
2214 - self.suspicious_keywords.append(('Base64 Strings',  
2215 - 'Base64-encoded strings were detected, may be used to obfuscate strings (option --decode to see all)'))  
2216 - if self.dridex_strings:  
2217 - self.suspicious_keywords.append(('Dridex Strings',  
2218 - 'Dridex-encoded strings were detected, may be used to obfuscate strings (option --decode to see all)'))  
2219 - if self.vba_strings:  
2220 - self.suspicious_keywords.append(('VBA obfuscated Strings',  
2221 - 'VBA string expressions were detected, may be used to obfuscate strings (option --decode to see all)'))  
2222 - # use a set to avoid duplicate keywords  
2223 - keyword_set = set()  
2224 - for keyword, description in self.autoexec_keywords:  
2225 - if keyword not in keyword_set:  
2226 - results.append(('AutoExec', keyword, description))  
2227 - keyword_set.add(keyword)  
2228 - keyword_set = set()  
2229 - for keyword, description in self.suspicious_keywords:  
2230 - if keyword not in keyword_set:  
2231 - results.append(('Suspicious', keyword, description))  
2232 - keyword_set.add(keyword)  
2233 - keyword_set = set()  
2234 - for pattern_type, value in self.iocs:  
2235 - if value not in keyword_set:  
2236 - results.append(('IOC', value, pattern_type))  
2237 - keyword_set.add(value)  
2238 -  
2239 - # include decoded strings only if they are printable or if --decode option:  
2240 - for encoded, decoded in self.hex_strings:  
2241 - if include_decoded_strings or is_printable(decoded):  
2242 - results.append(('Hex String', decoded, encoded))  
2243 - for encoded, decoded in self.base64_strings:  
2244 - if include_decoded_strings or is_printable(decoded):  
2245 - results.append(('Base64 String', decoded, encoded))  
2246 - for encoded, decoded in self.dridex_strings:  
2247 - if include_decoded_strings or is_printable(decoded):  
2248 - results.append(('Dridex string', decoded, encoded))  
2249 - for encoded, decoded in self.vba_strings:  
2250 - if include_decoded_strings or is_printable(decoded):  
2251 - results.append(('VBA string', decoded, encoded))  
2252 - self.results = results  
2253 - return results  
2254 -  
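This sketch makes the three hex-decoding variants in scan() concrete, using a hypothetical helper decode_hex_variants(). It decodes the bytes explicitly as latin-1, which is an assumption. On Python 3, str() applied to bytes returns their repr (for example "b'calc'") rather than the decoded text.

```python
import binascii


def decode_hex_variants(encoded: str) -> dict:
    """Return the three decodings scan() considers for one hex string (sketch)."""
    decoded = binascii.unhexlify(encoded).decode('latin-1')
    return {
        # plain hex decoding
        'Hex': decoded,
        # StrReverse applied after hex decoding
        'Hex+StrReverse': decoded[::-1],
        # StrReverse applied to the hex digits before decoding
        'StrReverse+Hex': binascii.unhexlify(encoded[::-1]).decode('latin-1'),
    }


for name, value in decode_hex_variants('36c61636').items():
    print(name, repr(value))
```

For instance, a macro might store "36c61636" and call StrReverse on it before hex decoding. That yields "calc" only through the StrReverse+Hex variant, which is why both orders are appended to the scanned code.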
2255 - def scan_summary(self):  
2256 - """  
2257 - Analyze the provided VBA code to detect suspicious keywords,  
2258 - auto-executable macros, IOC patterns, obfuscation patterns  
2259 - such as hex-encoded strings.  
2260 -  
2261 - :return: tuple with the number of items found for each category:  
2262 - (autoexec, suspicious, IOCs, hex, base64, dridex, vba)  
2263 - """  
2264 - # avoid scanning the same code twice:  
2265 - if self.results is None:  
2266 - self.scan()  
2267 - return (len(self.autoexec_keywords), len(self.suspicious_keywords),  
2268 - len(self.iocs), len(self.hex_strings), len(self.base64_strings),  
2269 - len(self.dridex_strings), len(self.vba_strings))  
2270 -  
2271 -  
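scan() removes duplicates per category when building its results. It keys on the keyword, or on the value for IOCs. By contrast, scan_summary() returns len() of the raw lists, so its counts may include the same keyword found in several decoded layers. Below is a sketch of the order-preserving de-duplication, with a hypothetical helper dedup_by_first() and illustrative descriptions.

```python
from typing import Iterable, List, Tuple


def dedup_by_first(items: Iterable[Tuple[str, str]],
                   key_index: int = 0) -> List[Tuple[str, str]]:
    """Keep the first tuple seen for each key, preserving order (sketch)."""
    seen = set()
    result = []
    for item in items:
        key = item[key_index]
        if key not in seen:
            seen.add(key)
            result.append(item)
    return result


suspicious = [
    ('Shell', 'illustrative description'),
    ('Shell', 'illustrative description'),  # same keyword found again in decoded code
    ('CreateObject', 'illustrative description'),
]
print(len(suspicious), len(dedup_by_first(suspicious)))  # 3 2
```

For IOCs, scan() stores (pattern_type, value) pairs and de-duplicates on the value, which corresponds to key_index=1 here.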
2272 -def scan_vba(vba_code, include_decoded_strings, deobfuscate=False):  
2273 - """  
2274 - Analyze the provided VBA code to detect suspicious keywords,  
2275 - auto-executable macros, IOC patterns, obfuscation patterns  
2276 - such as hex-encoded strings.  
2277 - (shortcut for VBA_Scanner(vba_code).scan())  
2278 -  
2279 - :param vba_code: str, VBA source code to be analyzed  
2280 - :param include_decoded_strings: bool, if True all encoded strings will be included with their decoded content.  
2281 - :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)  
2282 - :return: list of tuples (type, keyword, description)  
2283 - (type = 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String', 'Dridex string' or 'VBA string')  
2284 - """  
2285 - return VBA_Scanner(vba_code).scan(include_decoded_strings, deobfuscate)  
2286 -  
2287 -  
2288 -#=== CLASSES =================================================================  
2289 -  
2290 -class VBA_Parser(object):  
2291 - """  
2292 - Class to parse MS Office files, to detect VBA macros and extract VBA source code  
2293 - Supported file formats:  
2294 - - Word 97-2003 (.doc, .dot)  
2295 - - Word 2007+ (.docm, .dotm)  
2296 - - Word 2003 XML (.xml)  
2297 - - Word MHT - Single File Web Page / MHTML (.mht)  
2298 - - Excel 97-2003 (.xls)  
2299 - - Excel 2007+ (.xlsm, .xlsb)  
2300 - - PowerPoint 97-2003 (.ppt)  
2301 - - PowerPoint 2007+ (.pptm, .ppsm)  
2302 - """  
2303 -  
2304 - def __init__(self, filename, data=None, container=None, relaxed=False):  
2305 - """  
2306 - Constructor for VBA_Parser  
2307 -  
2308 - :param filename: filename or path of file to parse, or file-like object  
2309 -  
2310 - :param data: None or bytes str, if None the file will be read from disk (or from the file-like object).  
2311 - If data is provided as a bytes string, it will be parsed as the content of the file in memory,  
2312 - and not read from disk. Note: files must be read in binary mode, i.e. open(f, 'rb').  
2313 -  
2314 - :param container: str, path and filename of container if the file is within  
2315 - a zip archive, None otherwise.  
2316 -  
2317 - :param relaxed: if True, treat malformed documents and missing streams more like MS Office:  
2318 - do nothing; if False (default), raise errors in these cases  
2319 -  
2320 - raises a FileOpenError if all attempts to interpret the data header fail  
2321 - """  
2322 - #TODO: filename should only be a string, data should be used for the file-like object  
2323 - #TODO: filename should be mandatory, optional data is a string or file-like object  
2324 - #TODO: also support olefile and zipfile as input  
2325 - if data is None:  
2326 - # open file from disk:  
2327 - _file = filename  
2328 - else:  
2329 - # file already read in memory, make it a file-like object for zipfile:  
2330 - _file = BytesIO(data)  
2331 - #self.file = _file  
2332 - self.ole_file = None  
2333 - self.ole_subfiles = []  
2334 - self.filename = filename  
2335 - self.container = container  
2336 - self.relaxed = relaxed  
2337 - self.type = None  
2338 - self.vba_projects = None  
2339 - self.vba_forms = None  
2340 - self.contains_macros = None # will be set to True or False by detect_macros  
2341 - self.vba_code_all_modules = None # to store the source code of all modules  
2342 - # list of tuples for each module: (subfilename, stream_path, vba_filename, vba_code)  
2343 - self.modules = None  
2344 - # Analysis results: list of tuples (type, keyword, description) - See VBA_Scanner  
2345 - self.analysis_results = None  
2346 - # statistics for the scan summary and flags  
2347 - self.nb_macros = 0  
2348 - self.nb_autoexec = 0  
2349 - self.nb_suspicious = 0  
2350 - self.nb_iocs = 0  
2351 - self.nb_hexstrings = 0  
2352 - self.nb_base64strings = 0  
2353 - self.nb_dridexstrings = 0  
2354 - self.nb_vbastrings = 0  
2355 -  
2356 - # if filename is None:  
2357 - # if isinstance(_file, basestring):  
2358 - # if len(_file) < olefile.MINIMAL_OLEFILE_SIZE:  
2359 - # self.filename = _file  
2360 - # else:  
2361 - # self.filename = '<file in bytes string>'  
2362 - # else:  
2363 - # self.filename = '<file-like object>'  
2364 - if olefile.isOleFile(_file):  
2365 - # This looks like an OLE file  
2366 - self.open_ole(_file)  
2367 -  
2368 - # check whether file is encrypted (need to do this before trying ppt)  
2369 - log.debug('Check encryption of ole file')  
2370 - crypt_indicator = oleid.OleID(self.ole_file).check_encrypted()  
2371 - if crypt_indicator.value:  
2372 - raise FileIsEncryptedError(filename)  
2373 -  
2374 - # if this worked, try whether it is a ppt file (special ole file)  
2375 - self.open_ppt()  
2376 - if self.type is None and is_zipfile(_file):  
2377 - # Zip file, which may be an OpenXML document  
2378 - self.open_openxml(_file)  
2379 - if self.type is None:  
2380 - # read file from disk, check if it is a Word 2003 XML file (WordProcessingML), Excel 2003 XML,  
2381 - # or a plain text file containing VBA code  
2382 - if data is None:  
2383 - with open(filename, 'rb') as file_handle:  
2384 - data = file_handle.read()  
2385 - # check if it is a Word 2003 XML file (WordProcessingML): must contain the namespace  
2386 - if b'http://schemas.microsoft.com/office/word/2003/wordml' in data:  
2387 - self.open_word2003xml(data)  
2388 - # check if it is a Word/PowerPoint 2007+ XML file (Flat OPC): must contain the namespace  
2389 - if b'http://schemas.microsoft.com/office/2006/xmlPackage' in data:  
2390 - self.open_flatopc(data)  
2391 - # store a lowercase version for the next tests:  
2392 - data_lowercase = data.lower()  
2393 - # check if it is a MHT file (MIME HTML, Word or Excel saved as "Single File Web Page"):  
2394 - # According to my tests, these files usually start with "MIME-Version: 1.0" on the 1st line  
2395 - # BUT Word accepts a blank line or other MIME headers inserted before,  
2396 - # and even whitespaces in between "MIME", "-", "Version" and ":". The version number is ignored.  
2397 - # And the line is case insensitive.  
2398 - # so we'll just check the presence of mime, version and multipart anywhere:  
2399 - if self.type is None and b'mime' in data_lowercase and b'version' in data_lowercase \  
2400 - and b'multipart' in data_lowercase:  
2401 - self.open_mht(data)  
2402 - #TODO: handle exceptions  
2403 - #TODO: Excel 2003 XML  
2404 - # Check whether this is rtf  
2405 - if rtfobj.is_rtf(data, treat_str_as_data=True):  
2406 - # Ignore RTF: it contains no macros itself, and the methods here will not find macros  
2407 - # in embedded objects. Run rtfobj and repeat on its output.  
2408 - msg = '%s is RTF, need to run rtfobj.py and find VBA Macros in its output.' % self.filename  
2409 - log.info(msg)  
2410 - raise FileOpenError(msg)  
2411 - # Check if this is a plain text VBA or VBScript file:  
2412 - # To avoid scanning binary files, we simply check for some control chars:  
2413 - if self.type is None and b'\x00' not in data:  
2414 - self.open_text(data)  
2415 - if self.type is None:  
2416 - # At this stage, could not match a known format:  
2417 - msg = '%s is not a supported file type, cannot extract VBA Macros.' % self.filename  
2418 - log.info(msg)  
2419 - raise FileOpenError(msg)  
2420 -  
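The constructor tries the formats in a fixed order: OLE, ZIP/OpenXML, XML namespaces, MHT, RTF, then plain text. The cascade can be sketched as a pure function over the raw bytes, with some assumptions:

- sniff_format() is a hypothetical name.
- The ZIP test uses the local file header signature, whereas zipfile.is_zipfile() actually looks for the end-of-central-directory record.
- The RTF signature `{\rt` is an assumption about what rtfobj.is_rtf() checks.
- Unlike the constructor, the sketch returns the first match only.

```python
OLE_MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'
ZIP_MAGIC = b'PK\x03\x04'  # assumption: local file header at offset 0
WORD2003_NS = b'http://schemas.microsoft.com/office/word/2003/wordml'
FLATOPC_NS = b'http://schemas.microsoft.com/office/2006/xmlPackage'


def sniff_format(data: bytes) -> str:
    """Guess the container type from raw bytes (hypothetical sketch)."""
    if data.startswith(OLE_MAGIC):
        return 'OLE'
    if data.startswith(ZIP_MAGIC):
        return 'OpenXML'
    if WORD2003_NS in data:
        return 'Word2003_XML'
    if FLATOPC_NS in data:
        return 'FlatOPC_XML'
    low = data.lower()
    # same loose MHT heuristic as the constructor
    if b'mime' in low and b'version' in low and b'multipart' in low:
        return 'MHTML'
    if data.lstrip().startswith(b'{\\rt'):
        return 'RTF'
    # no NUL byte: treat as VBA/VBScript source text
    if b'\x00' not in data:
        return 'Text'
    return 'unknown'


print(sniff_format(b'Sub AutoOpen()\r\nEnd Sub'))  # Text
```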
2421 - def open_ole(self, _file):  
2422 - """  
2423 - Open an OLE file  
2424 - :param _file: filename or file contents in a file object  
2425 - :return: nothing  
2426 - """  
2427 - log.info('Opening OLE file %s' % self.filename)  
2428 - try:  
2429 - # Open and parse the OLE file, using unicode for path names:  
2430 - self.ole_file = olefile.OleFileIO(_file, path_encoding=None)  
2431 - # set type only if parsing succeeds  
2432 - self.type = TYPE_OLE  
2433 - except (IOError, TypeError, ValueError) as exc:  
2434 - # TODO: handle OLE parsing exceptions  
2435 - log.info('Failed OLE parsing for file %r (%s)' % (self.filename, exc))  
2436 - log.debug('Trace:', exc_info=True)  
2437 -  
2438 -  
2439 - def open_openxml(self, _file):  
2440 - """  
2441 - Open an OpenXML file  
2442 - :param _file: filename or file contents in a file object  
2443 - :return: nothing  
2444 - """  
2445 - # This looks like a zip file, need to look for vbaProject.bin inside  
2446 - # It can be any OLE file inside the archive  
2447 - #...because vbaProject.bin can be renamed:  
2448 - # see http://www.decalage.info/files/JCV07_Lagadec_OpenDocument_OpenXML_v4_decalage.pdf#page=18  
2449 - log.info('Opening ZIP/OpenXML file %s' % self.filename)  
2450 - try:  
2451 - z = zipfile.ZipFile(_file)  
2452 - #TODO: check if this is actually an OpenXML file  
2453 - #TODO: if the zip file is encrypted, suggest to use the -z option, or try '-z infected' automatically  
2454 - # check each file within the zip if it is an OLE file, by reading its magic:  
2455 - for subfile in z.namelist():  
2456 - with z.open(subfile) as file_handle:  
2457 - magic = file_handle.read(len(olefile.MAGIC))  
2458 - if magic == olefile.MAGIC:  
2459 - log.debug('Opening OLE file %s within zip' % subfile)  
2460 - with z.open(subfile) as file_handle:  
2461 - ole_data = file_handle.read()  
2462 - try:  
2463 - self.ole_subfiles.append(  
2464 - VBA_Parser(filename=subfile, data=ole_data,  
2465 - relaxed=self.relaxed))  
2466 - except OlevbaBaseException as exc:  
2467 - if self.relaxed:  
2468 - log.info('%s is not a valid OLE file (%s)' % (subfile, exc))  
2469 - log.debug('Trace:', exc_info=True)  
2470 - continue  
2471 - else:  
2472 - raise SubstreamOpenError(self.filename, subfile,  
2473 - exc)  
2474 - z.close()  
2475 - # set type only if parsing succeeds  
2476 - self.type = TYPE_OpenXML  
2477 - except OlevbaBaseException as exc:  
2478 - if self.relaxed:  
2479 - log.info('Error {0} caught in Zip/OpenXML parsing for file {1}'  
2480 - .format(exc, self.filename))  
2481 - log.debug('Trace:', exc_info=True)  
2482 - else:  
2483 - raise  
2484 - except (RuntimeError, zipfile.BadZipfile, zipfile.LargeZipFile, IOError) as exc:  
2485 - # TODO: handle parsing exceptions  
2486 - log.info('Failed Zip/OpenXML parsing for file %r (%s)'  
2487 - % (self.filename, exc))  
2488 - log.debug('Trace:', exc_info=True)  
2489 -  
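open_openxml() does not rely on the name vbaProject.bin. Instead, it opens every zip member and compares its first bytes with the OLE magic. The sketch below does the same check with only the standard library. find_ole_members() is a hypothetical name, and OLE_MAGIC is the standard 8-byte OLE header signature, equal to olefile.MAGIC.

```python
import io
import zipfile

OLE_MAGIC = b'\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1'


def find_ole_members(zip_bytes: bytes) -> dict:
    """Map zip member names to their content when they start with the OLE magic."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        for name in z.namelist():
            # read only the first bytes to check the signature
            with z.open(name) as fh:
                magic = fh.read(len(OLE_MAGIC))
            if magic == OLE_MAGIC:
                found[name] = z.read(name)
    return found


# build a small in-memory archive with a renamed VBA project
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as z:
    z.writestr('[Content_Types].xml', '<Types/>')
    z.writestr('word/renamed.bin', OLE_MAGIC + b'\x00' * 504)
print(sorted(find_ole_members(buf.getvalue())))  # ['word/renamed.bin']
```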
2490 - def open_word2003xml(self, data):  
2491 - """  
2492 - Open a Word 2003 XML file  
2493 - :param data: file contents in a string or bytes  
2494 - :return: nothing  
2495 - """  
2496 - log.info('Opening Word 2003 XML file %s' % self.filename)  
2497 - try:  
2498 - # parse the XML content  
2499 - # TODO: handle XML parsing exceptions  
2500 - et = ET.fromstring(data)  
2501 - # find all the binData elements:  
2502 - for bindata in et.iter(TAG_BINDATA):  
2503 - # the binData content is an OLE container for the VBA project, compressed  
2504 - # using the ActiveMime/MSO format (zlib-compressed), and Base64 encoded.  
2505 - # get the filename:  
2506 - fname = bindata.get(ATTR_NAME, 'noname.mso')  
2507 - # decode the base64 activemime  
2508 - mso_data = binascii.a2b_base64(bindata.text)  
2509 - if is_mso_file(mso_data):  
2510 - # decompress the zlib data stored in the MSO file, which is the OLE container:  
2511 - # TODO: handle different offsets => separate function  
2512 - try:  
2513 - ole_data = mso_file_extract(mso_data)  
2514 - self.ole_subfiles.append(  
2515 - VBA_Parser(filename=fname, data=ole_data,  
2516 - relaxed=self.relaxed))  
2517 - except OlevbaBaseException as exc:  
2518 - if self.relaxed:  
2519 - log.info('Error parsing subfile {0}: {1}'  
2520 - .format(fname, exc))  
2521 - log.debug('Trace:', exc_info=True)  
2522 - else:  
2523 - raise SubstreamOpenError(self.filename, fname, exc)  
2524 - else:  
2525 - log.info('%s is not a valid MSO file' % fname)  
2526 - # set type only if parsing succeeds  
2527 - self.type = TYPE_Word2003_XML  
2528 - except OlevbaBaseException as exc:  
2529 - if self.relaxed:  
2530 - log.info('Failed XML parsing for file %r (%s)' % (self.filename, exc))  
2531 - log.debug('Trace:', exc_info=True)  
2532 - else:  
2533 - raise  
2534 - except Exception as exc:  
2535 - # TODO: differentiate exceptions for each parsing stage  
2536 - # (but ET may come from different libs, with no well-documented exceptions in the API)  
2537 - # found: XMLSyntaxError  
2538 - log.info('Failed XML parsing for file %r (%s)' % (self.filename, exc))  
2539 - log.debug('Trace:', exc_info=True)  
2540 -  
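Each binData element holds a Base64-encoded ActiveMime (MSO) blob, whose zlib stream contains the OLE container. The sketch below shows that decoding chain under two assumptions. First, the blob starts with the ASCII signature ActiveMime. Second, the zlib data begins at offset 0x32, as the open_mht() comments state; real files may use other offsets, as the TODO notes. extract_mso() and decode_bindata() are hypothetical, not mso_file_extract().

```python
import base64
import binascii
import zlib

ACTIVEMIME_HEADER = b'ActiveMime'  # assumption: MSO blobs start with this signature
ZLIB_OFFSET = 0x32                 # assumption: fixed offset of the zlib stream


def extract_mso(mso_data: bytes) -> bytes:
    """Decompress the OLE container stored in an ActiveMime blob (sketch)."""
    if not mso_data.startswith(ACTIVEMIME_HEADER):
        raise ValueError('not an ActiveMime/MSO blob')
    return zlib.decompress(mso_data[ZLIB_OFFSET:])


def decode_bindata(text: str) -> bytes:
    """Base64-decode binData text (line breaks are ignored) and extract the OLE data."""
    return extract_mso(binascii.a2b_base64(text))


# build a fake blob: header, padding up to 0x32, then zlib data
payload = b'fake OLE container'
blob = ACTIVEMIME_HEADER + b'\x00' * (ZLIB_OFFSET - len(ACTIVEMIME_HEADER)) + zlib.compress(payload)
bindata_text = base64.encodebytes(blob).decode('ascii')
print(decode_bindata(bindata_text))  # b'fake OLE container'
```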
2541 - def open_flatopc(self, data):  
2542 - """  
2543 - Open a Word or PowerPoint 2007+ XML file, aka "Flat OPC"  
2544 - :param data: file contents in a string or bytes  
2545 - :return: nothing  
2546 - """  
2547 - log.info('Opening Flat OPC Word/PowerPoint XML file %s' % self.filename)  
2548 - try:  
2549 - # parse the XML content  
2550 - # TODO: handle XML parsing exceptions  
2551 - et = ET.fromstring(data)  
2552 - # TODO: check root node namespace and tag  
2553 - # find all the pkg:part elements:  
2554 - for pkgpart in et.iter(TAG_PKGPART):  
2555 - fname = pkgpart.get(ATTR_PKG_NAME, 'unknown')  
2556 - content_type = pkgpart.get(ATTR_PKG_CONTENTTYPE, 'unknown')  
2557 - if content_type == CTYPE_VBAPROJECT:  
2558 - for bindata in pkgpart.iterfind(TAG_PKGBINDATA):  
2559 - try:  
2560 - ole_data = binascii.a2b_base64(bindata.text)  
2561 - self.ole_subfiles.append(  
2562 - VBA_Parser(filename=fname, data=ole_data,  
2563 - relaxed=self.relaxed))  
2564 - except OlevbaBaseException as exc:  
2565 - if self.relaxed:  
2566 - log.info('Error parsing subfile {0}: {1}'  
2567 - .format(fname, exc))  
2568 - log.debug('Trace:', exc_info=True)  
2569 - else:  
2570 - raise SubstreamOpenError(self.filename, fname, exc)  
2571 - # set type only if parsing succeeds  
2572 - self.type = TYPE_FlatOPC_XML  
2573 - except OlevbaBaseException as exc:  
2574 - if self.relaxed:  
2575 - log.info('Failed XML parsing for file %r (%s)' % (self.filename, exc))  
2576 - log.debug('Trace:', exc_info=True)  
2577 - else:  
2578 - raise  
2579 - except Exception as exc:  
2580 - # TODO: differentiate exceptions for each parsing stage  
2581 - # (but ET may come from different libs, with no well-documented exceptions in the API)  
2582 - # found: XMLSyntaxError  
2583 - log.info('Failed XML parsing for file %r (%s)' % (self.filename, exc))  
2584 - log.debug('Trace:', exc_info=True)  
2585 -  
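In a Flat OPC file, each package part is a pkg:part element, and the VBA project is stored Base64-encoded in a pkg:binaryData child. Below is a sketch with ElementTree. The namespaced tag and attribute names and the content type application/vnd.ms-office.vbaProject are assumptions. They stand in for the TAG_PKGPART, ATTR_PKG_CONTENTTYPE and CTYPE_VBAPROJECT constants defined elsewhere in olevba.

```python
import binascii
import xml.etree.ElementTree as ET

PKG_NS = 'http://schemas.microsoft.com/office/2006/xmlPackage'
TAG_PART = '{%s}part' % PKG_NS
TAG_BINDATA = '{%s}binaryData' % PKG_NS
ATTR_NAME = '{%s}name' % PKG_NS
ATTR_CTYPE = '{%s}contentType' % PKG_NS
CTYPE_VBA = 'application/vnd.ms-office.vbaProject'  # assumption


def iter_vba_parts(xml_data: bytes):
    """Yield (part name, decoded bytes) for each VBA project part (sketch)."""
    root = ET.fromstring(xml_data)
    for part in root.iter(TAG_PART):
        if part.get(ATTR_CTYPE) != CTYPE_VBA:
            continue
        name = part.get(ATTR_NAME, 'unknown')
        for bindata in part.iterfind(TAG_BINDATA):
            yield name, binascii.a2b_base64(bindata.text)


sample = (b'<pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage">'
          b'<pkg:part pkg:name="/word/vbaProject.bin" pkg:contentType="application/vnd.ms-office.vbaProject">'
          b'<pkg:binaryData>0M8R4KGxGuE=</pkg:binaryData></pkg:part>'
          b'<pkg:part pkg:name="/word/document.xml" pkg:contentType="application/xml"><pkg:xmlData/></pkg:part>'
          b'</pkg:package>')
print(list(iter_vba_parts(sample)))
```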
2586 - def open_mht(self, data):  
2587 - """  
2588 - Open a MHTML file  
2589 - :param data: file contents in a string or bytes  
2590 - :return: nothing  
2591 - """  
2592 - log.info('Opening MHTML file %s' % self.filename)  
2593 - try:  
2594 - if isinstance(data,bytes):  
2595 - data = data.decode('utf8', 'backslashreplace')  
2596 - # parse the MIME content  
2597 - # remove any leading whitespace or newline (workaround for issue in email package)  
2598 - stripped_data = data.lstrip('\r\n\t ')  
2599 - # strip any junk from the beginning of the file  
2600 - # (issue #31 fix by Greg C - gdigreg)  
2601 - # TODO: improve keywords to avoid false positives  
2602 - mime_offset = stripped_data.find('MIME')  
2603 - content_offset = stripped_data.find('Content')  
2604 - # if "MIME" is found, and located before "Content":  
2605 - if -1 < mime_offset <= content_offset:  
2606 - stripped_data = stripped_data[mime_offset:]  
2607 - # else if "Content" is found, and before "MIME"  
2608 - # TODO: can it work without "MIME" at all?  
2609 - elif content_offset > -1:  
2610 - stripped_data = stripped_data[content_offset:]  
2611 - # TODO: quick and dirty fix: insert a standard line with MIME-Version header?  
2612 - mhtml = email.message_from_string(stripped_data)  
2613 - # find all the attached files:  
2614 - for part in mhtml.walk():  
2615 - content_type = part.get_content_type() # always returns a value  
2616 - fname = part.get_filename(None) # returns None if it fails  
2617 - # TODO: get content-location if no filename  
2618 - log.debug('MHTML part: filename=%r, content-type=%r' % (fname, content_type))  
2619 - part_data = part.get_payload(decode=True)  
2620 - # VBA macros are stored in a binary file named "editdata.mso".  
2621 - # the data content is an OLE container for the VBA project, compressed  
2622 - # using the ActiveMime/MSO format (zlib-compressed), and Base64 encoded.  
2623 - # decompress the zlib data starting at offset 0x32, which is the OLE container:  
2624 - # check ActiveMime header:  
2625 -  
2626 - if (isinstance(part_data, str) or isinstance(part_data, bytes)) and is_mso_file(part_data):  
2627 - log.debug('Found ActiveMime header, decompressing MSO container')  
2628 - try:  
2629 - ole_data = mso_file_extract(part_data)  
2630 -  
2631 - # TODO: check if it is actually an OLE file  
2632 - # TODO: get the MSO filename from content_location?  
2633 - self.ole_subfiles.append(  
2634 - VBA_Parser(filename=fname, data=ole_data,  
2635 - relaxed=self.relaxed))  
2636 - except OlevbaBaseException as exc:  
2637 - if self.relaxed:  
2638 - log.info('%s does not contain a valid OLE file (%s)'  
2639 - % (fname, exc))  
2640 - log.debug('Trace:', exc_info=True)  
2641 - # TODO: bug here - need to split in smaller functions/classes?  
2642 - else:  
2643 - raise SubstreamOpenError(self.filename, fname, exc)  
2644 - else:  
2645 - log.debug('type(part_data) = %s' % type(part_data))  
2646 - try:  
2647 - log.debug('part_data[0:20] = %r' % part_data[0:20])  
2648 - except TypeError as err:  
2649 - log.debug('part_data has no __getitem__')  
2650 - # set type only if parsing succeeds  
2651 - self.type = TYPE_MHTML  
2652 - except OlevbaBaseException:  
2653 - raise  
2654 - except Exception:  
2655 - log.info('Failed MIME parsing for file %r - %s'  
2656 - % (self.filename, MSG_OLEVBA_ISSUES))  
2657 - log.debug('Trace:', exc_info=True)  
2658 -  
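open_mht() strips leading junk, parses the MIME structure with the email package, and looks for parts whose decoded payload starts with an ActiveMime header. Below is a reduced sketch with three simplifications:

- find_activemime_parts() is a hypothetical name.
- It only trims text before the first "MIME", instead of the full MIME/Content offset logic.
- The editdata.mso part with content type application/x-mso in the sample is illustrative.

```python
import base64
import email


def find_activemime_parts(mht_text: str):
    """Return (content type, payload) for parts holding an ActiveMime blob (sketch)."""
    # drop any junk before the first "MIME" header (simplified)
    offset = mht_text.find('MIME')
    if offset > 0:
        mht_text = mht_text[offset:]
    msg = email.message_from_string(mht_text)
    found = []
    for part in msg.walk():
        payload = part.get_payload(decode=True)  # None for multipart containers
        if isinstance(payload, bytes) and payload.startswith(b'ActiveMime'):
            found.append((part.get_content_type(), payload))
    return found


mso = base64.b64encode(b'ActiveMime' + b'\x00' * 4).decode('ascii')
sample = ('junk before the headers\n'
          'MIME-Version: 1.0\n'
          'Content-Type: multipart/related; boundary="XX"\n'
          '\n'
          '--XX\n'
          'Content-Type: text/html\n'
          '\n'
          '<html></html>\n'
          '--XX\n'
          'Content-Location: editdata.mso\n'
          'Content-Type: application/x-mso\n'
          'Content-Transfer-Encoding: base64\n'
          '\n'
          + mso + '\n'
          '--XX--\n')
print(find_activemime_parts(sample))
```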
2659 - def open_ppt(self):  
2660 - """ try to interpret self.ole_file as PowerPoint 97-2003 using PptParser  
2661 -  
2662 - Although self.ole_file is a valid olefile.OleFileIO, we set  
2663 - self.ole_file = None in here and instead set self.ole_subfiles to the  
2664 - VBA ole streams found within the main ole file. That makes most of the  
2665 - code below treat this like an OpenXML file and only look at the  
2666 - ole_subfiles (except find_vba_* which needs to explicitly check for  
2667 - self.type)  
2668 - """  
2669 -  
2670 - log.info('Check whether OLE file is PPT')  
2671 - try:  
2672 - ppt = ppt_parser.PptParser(self.ole_file, fast_fail=True)  
2673 - for vba_data in ppt.iter_vba_data():  
2674 - self.ole_subfiles.append(VBA_Parser(None, vba_data,  
2675 - container='PptParser'))  
2676 - log.info('File is PPT')  
2677 - self.ole_file.close() # just in case  
2678 - self.ole_file = None # required to make other methods look at ole_subfiles  
2679 - self.type = TYPE_PPT  
2680 - except Exception as exc:  
2681 - if self.container == 'PptParser':  
2682 - # this is a subfile of a ppt, so it is expected not to be a ppt itself  
2683 - log.debug('PPT subfile is not a PPT file')  
2684 - else:  
2685 - log.debug("File appears not to be a ppt file (%s)" % exc)  
2686 -  
2687 -  
2688 - def open_text(self, data):  
2689 - """  
2690 - Open a text file containing VBA or VBScript source code  
2691 - :param data: file contents in a string or bytes  
2692 - :return: nothing  
2693 - """  
2694 - log.info('Opening text file %s' % self.filename)  
2695 - # directly store the source code:  
2696 - if isinstance(data,bytes):  
2697 - data=data.decode('utf8','backslashreplace')  
2698 - self.vba_code_all_modules = data  
2699 - self.contains_macros = True  
2700 - # set type only if parsing succeeds  
2701 - self.type = TYPE_TEXT  
2702 -  
2703 -  
2704 - def find_vba_projects(self):  
2705 - """  
2706 - Finds all the VBA projects stored in an OLE file.  
2707 -  
2708 - Return None if the file is not OLE but OpenXML.  
2709 - Return a list of tuples (vba_root, project_path, dir_path) for each VBA project.  
2710 - vba_root is the path of the root OLE storage containing the VBA project,  
2711 - including a trailing slash unless it is the root of the OLE file.  
2712 - project_path is the path of the OLE stream named "PROJECT" within the VBA project.  
2713 - dir_path is the path of the OLE stream named "VBA/dir" within the VBA project.  
2714 -  
2715 - If this function returns an empty list for one of the supported formats  
2716 - (i.e. Word, Excel, PowerPoint), then the file does not contain VBA macros.  
2717 -  
2718 - :return: None if OpenXML file, list of tuples (vba_root, project_path, dir_path)  
2719 - for each VBA project found if OLE file  
2720 - """  
2721 - log.debug('VBA_Parser.find_vba_projects')  
2722 -  
2723 - # if the file is not OLE but OpenXML, return None:  
2724 - if self.ole_file is None and self.type != TYPE_PPT:  
2725 - return None  
2726 -  
2727 - # if this method has already been called, return previous result:  
2728 - if self.vba_projects is not None:  
2729 - return self.vba_projects  
2730 -  
2731 - # if this is a ppt file (PowerPoint 97-2003):  
2732 - # self.ole_file is None but the ole_subfiles do contain vba_projects  
2733 - # (like for OpenXML files).  
2734 - if self.type == TYPE_PPT:  
2735 - # TODO: so far, this function is never called for PPT files, but  
2736 - # if that happens, the information about which ole file contains  
2737 - # which storage is lost!  
2738 - log.warning('Returned info is not complete for PPT types!')  
2739 - self.vba_projects = []  
2740 - for subfile in self.ole_subfiles:  
2741 - self.vba_projects.extend(subfile.find_vba_projects())  
2742 - return self.vba_projects  
2743 -  
2744 - # Find the VBA project root (different in MS Word, Excel, etc):  
2745 - # - Word 97-2003: Macros  
2746 - # - Excel 97-2003: _VBA_PROJECT_CUR  
2747 - # - PowerPoint 97-2003: PptParser has identified ole_subfiles  
2748 - # - Word 2007+: word/vbaProject.bin in zip archive, then the VBA project is the root of vbaProject.bin.  
2749 - # - Excel 2007+: xl/vbaProject.bin in zip archive, then same as Word  
2750 - # - PowerPoint 2007+: ppt/vbaProject.bin in zip archive, then same as Word  
2751 - # - Visio 2007: not supported yet (different file structure)  
2752 -  
2753 - # According to MS-OVBA section 2.2.1:  
2754 - # - the VBA project root storage MUST contain a VBA storage and a PROJECT stream  
2755 - # - The root/VBA storage MUST contain a _VBA_PROJECT stream and a dir stream  
2756 - # - all names are case-insensitive  
2757 -  
2758 - def check_vba_stream(ole, vba_root, stream_path):  
2759 - full_path = vba_root + stream_path  
2760 - if ole.exists(full_path) and ole.get_type(full_path) == olefile.STGTY_STREAM:  
2761 - log.debug('Found %s stream: %s' % (stream_path, full_path))  
2762 - return full_path  
2763 - else:  
2764 - log.debug('Missing %s stream, this is not a valid VBA project structure' % stream_path)  
2765 - return False  
2766 -  
2767 - # start with an empty list:  
2768 - self.vba_projects = []  
2769 - # Look for any storage containing those storage/streams:  
2770 - ole = self.ole_file  
2771 - for storage in ole.listdir(streams=False, storages=True):  
2772 - log.debug('Checking storage %r' % storage)  
2773 - # Look for a storage ending with "VBA":  
2774 - if storage[-1].upper() == 'VBA':  
2775 - log.debug('Found VBA storage: %s' % ('/'.join(storage)))  
2776 - vba_root = '/'.join(storage[:-1])  
2777 - # Add a trailing slash to vba_root, unless it is the root of the OLE file:  
2778 - # (used later to append all the child streams/storages)  
2779 - if vba_root != '':  
2780 - vba_root += '/'  
2781 - log.debug('Checking vba_root="%s"' % vba_root)  
2782 -  
2783 - # Check if the VBA root storage also contains a PROJECT stream:  
2784 - project_path = check_vba_stream(ole, vba_root, 'PROJECT')  
2785 - if not project_path: continue  
2786 - # Check if the VBA root storage also contains a VBA/_VBA_PROJECT stream:  
2787 - vba_project_path = check_vba_stream(ole, vba_root, 'VBA/_VBA_PROJECT')  
2788 - if not vba_project_path: continue  
2789 - # Check if the VBA root storage also contains a VBA/dir stream:  
2790 - dir_path = check_vba_stream(ole, vba_root, 'VBA/dir')  
2791 - if not dir_path: continue  
2792 - # Now we are pretty sure it is a VBA project structure  
2793 - log.debug('VBA root storage: "%s"' % vba_root)  
2794 - # append the results to the list as a tuple for later use:  
2795 - self.vba_projects.append((vba_root, project_path, dir_path))  
2796 - return self.vba_projects  
2797 -  
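find_vba_projects() accepts a storage named VBA (case-insensitively) only when its parent also holds PROJECT, VBA/_VBA_PROJECT and VBA/dir streams, per MS-OVBA 2.2.1. The same rule can be sketched without olefile, under these assumptions:

- find_vba_roots() is a hypothetical name.
- storages mimics the lists returned by listdir(streams=False, storages=True).
- A set of lowercase stream paths stands in for ole.exists() and ole.get_type().

```python
def find_vba_roots(storages, streams):
    """Return (vba_root, project_path, dir_path) for each valid VBA project (sketch).

    storages: iterable of storage paths, each a list of names
    streams:  set of lowercase stream paths joined with '/'
    """
    projects = []
    for storage in storages:
        if storage[-1].upper() != 'VBA':
            continue
        vba_root = '/'.join(storage[:-1])
        # trailing slash unless the project is at the root of the OLE file
        if vba_root:
            vba_root += '/'
        project = vba_root + 'PROJECT'
        vba_project = vba_root + 'VBA/_VBA_PROJECT'
        dir_stream = vba_root + 'VBA/dir'
        # all three streams are required (names are case-insensitive)
        if all(p.lower() in streams for p in (project, vba_project, dir_stream)):
            projects.append((vba_root, project, dir_stream))
    return projects


word_storages = [['Macros'], ['Macros', 'VBA']]
word_streams = {'macros/project', 'macros/vba/_vba_project', 'macros/vba/dir'}
print(find_vba_roots(word_storages, word_streams))
```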
2798 - def detect_vba_macros(self):  
2799 - """  
2800 - Detect the potential presence of VBA macros in the file, by checking  
2801 - if it contains VBA projects. Both OLE and OpenXML files are supported.  
2802 -  
2803 - Important: for now, results are accurate only for Word, Excel and PowerPoint  
2804 -  
2805 - Note: this method does NOT attempt to check the actual presence or validity  
2806 - of VBA macro source code, so there might be false positives.  
2807 - It may also detect VBA macros in files embedded within the main file,  
2808 - for example an Excel workbook with macros embedded into a Word  
2809 - document without macros may be detected, without distinction.  
2810 -  
2811 - :return: bool, True if at least one VBA project has been found, False otherwise  
2812 - """  
2813 - #TODO: return None or raise exception if format not supported  
2814 - #TODO: return the number of VBA projects found instead of True/False?  
2815 - # if this method was already called, return the previous result:  
2816 - if self.contains_macros is not None:  
2817 - return self.contains_macros  
2818 - # if OpenXML/PPT, check all the OLE subfiles:  
2819 - if self.ole_file is None:  
2820 - for ole_subfile in self.ole_subfiles:  
2821 - if ole_subfile.detect_vba_macros():  
2822 - self.contains_macros = True  
2823 - return True  
2824 - # otherwise, no macro found:  
2825 - self.contains_macros = False  
2826 - return False  
2827 - # otherwise it's an OLE file, find VBA projects:  
2828 - vba_projects = self.find_vba_projects()  
2829 - if len(vba_projects) == 0:  
2830 - self.contains_macros = False  
2831 - else:  
2832 - self.contains_macros = True  
2833 - # Also look for VBA code in any stream including orphans  
2834 - # (happens in some malformed files)  
2835 - ole = self.ole_file  
2836 - for sid in xrange(len(ole.direntries)):  
2837 - # check if id is already done above:  
2838 - log.debug('Checking DirEntry #%d' % sid)  
2839 - d = ole.direntries[sid]  
2840 - if d is None:  
2841 - # this direntry is not part of the tree: either unused or an orphan  
2842 - d = ole._load_direntry(sid)  
2843 - log.debug('This DirEntry is an orphan or unused')  
2844 - if d.entry_type == olefile.STGTY_STREAM:  
2845 - # read data  
2846 - log.debug('Reading data from stream %r - size: %d bytes' % (d.name, d.size))  
2847 - try:  
2848 - data = ole._open(d.isectStart, d.size).read()  
2849 - log.debug('Read %d bytes' % len(data))  
2850 - if len(data) > 200:  
2851 - log.debug('%r...[much more data]...%r' % (data[:100], data[-50:]))  
2852 - else:  
2853 - log.debug(repr(data))  
2854 - if 'Attribut\x00' in data.decode('utf-8', 'ignore'):  
2855 - log.debug('Found VBA compressed code')  
2856 - self.contains_macros = True  
2857 - except IOError as exc:  
2858 - if self.relaxed:  
2859 - log.info('Error when reading OLE Stream %r' % d.name)  
2860 - log.debug('Trace:', exc_info=True)  
2861 - else:  
2862 - raise SubstreamOpenError(self.filename, d.name, exc)  
2863 - return self.contains_macros  
2864 -  
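The orphan-stream check in detect_vba_macros relies on a byte pattern. In an MS-OVBA compressed container, a module that starts with "Attribute VB_Name" is stored as a flag byte, the eight literal bytes "Attribut", then the next flag byte, which is 0x00 when the following eight tokens ("e VB_Nam") are also literals. The sketch below isolates that heuristic and tests it on raw bytes instead of decoding to str first. The function name and both sample buffers are made up for illustration; the header bytes in the compressed sample are arbitrary.

```python
def looks_like_compressed_vba(data: bytes) -> bool:
    """Heuristic: True if data seems to contain MS-OVBA compressed VBA source.

    Compressed source keeps the literal run b"Attribut" followed by a flag
    byte (0x00 when the next 8 tokens are literals), whereas uncompressed
    source contains b"Attribute" instead.
    """
    return b'Attribut\x00' in data


# made-up samples: arbitrary container header bytes, then the literal runs
compressed_sample = b'\x01\x19\xb2\x00Attribut\x00e VB_Nam\x00e = "Module1"'
plain_sample = b'Attribute VB_Name = "Module1"\r\n'
print(looks_like_compressed_vba(compressed_sample))  # True
print(looks_like_compressed_vba(plain_sample))       # False
```

Like the original check, this can produce false positives on unrelated binary data. It only selects candidate streams for decompression.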
2865 - def extract_macros(self):  
2866 - """  
2867 - Extract and decompress source code for each VBA macro found in the file  
2868 -  
2869 - Iterator: yields (filename, stream_path, vba_filename, vba_code) for each VBA macro found  
2870 - If the file is OLE, filename is the path of the file.  
2871 - If the file is OpenXML, filename is the path of the OLE subfile containing VBA macros  
2872 - within the zip archive, e.g. word/vbaProject.bin.  
2873 - If the file is PPT, result is as for OpenXML but filename is useless  
2874 - """  
2875 - log.debug('extract_macros:')  
2876 - if self.ole_file is None:  
2877 - # This may be either an OpenXML/PPT or a text file:  
2878 - if self.type == TYPE_TEXT:  
2879 - # This is a text file, yield the full code:  
2880 - yield (self.filename, '', self.filename, self.vba_code_all_modules)  
2881 - else:  
2882 - # OpenXML/PPT: recursively yield results from each OLE subfile:  
2883 - for ole_subfile in self.ole_subfiles:  
2884 - for results in ole_subfile.extract_macros():  
2885 - yield results  
2886 - else:  
2887 - # This is an OLE file:  
2888 - self.find_vba_projects()  
2889 - # set of stream ids  
2890 - vba_stream_ids = set()  
2891 - for vba_root, project_path, dir_path in self.vba_projects:  
2892 - # extract all VBA macros from that VBA root storage:  
2893 - # The function _extract_vba may fail on some files (issue #132)  
2894 - try:  
2895 - for stream_path, vba_filename, vba_code in \  
2896 - _extract_vba(self.ole_file, vba_root, project_path,  
2897 - dir_path, self.relaxed):  
2898 - # store direntry ids in a set:  
2899 - vba_stream_ids.add(self.ole_file._find(stream_path))  
2900 - yield (self.filename, stream_path, vba_filename, vba_code)  
2901 - except Exception as e:  
2902 - log.exception('Error in _extract_vba')  
2903 - # Also look for VBA code in any stream including orphans  
2904 - # (happens in some malformed files)  
2905 - ole = self.ole_file  
2906 - for sid in xrange(len(ole.direntries)):  
2907 - # check if id is already done above:  
2908 - log.debug('Checking DirEntry #%d' % sid)  
2909 - if sid in vba_stream_ids:  
2910 - log.debug('Already extracted')  
2911 - continue  
2912 - d = ole.direntries[sid]  
2913 - if d is None:  
2914 - # this direntry is not part of the tree: either unused or an orphan  
2915 - d = ole._load_direntry(sid)  
2916 - log.debug('This DirEntry is an orphan or unused')  
2917 - if d.entry_type == olefile.STGTY_STREAM:  
2918 - # read data  
2919 - log.debug('Reading data from stream %r' % d.name)  
2920 - data = ole._open(d.isectStart, d.size).read()  
2921 - for match in re.finditer(b'\\x00Attribut[^e]', data, flags=re.IGNORECASE):  
2922 - start = match.start() - 3  
2923 - log.debug('Found VBA compressed code at index %X' % start)  
2924 - compressed_code = data[start:]  
2925 - try:  
2926 - vba_code = decompress_stream(compressed_code)  
2927 - yield (self.filename, d.name, d.name, vba_code)  
2928 - except Exception as exc:  
2929 - # display the exception with full stack trace for debugging  
2930 - log.debug('Error processing stream %r in file %r (%s)' % (d.name, self.filename, exc))  
2931 - log.debug('Traceback:', exc_info=True)  
2932 - # do not raise the error, as it is unlikely to be a compressed macro stream  
2933 -  
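The fallback loop in extract_macros locates compressed code with the regex `\x00Attribut[^e]` and starts decompressing 3 bytes before the match. This works because a compressed container begins with the signature byte 0x01 followed by a 2-byte chunk header, and only then comes the first flag byte. The sketch below reproduces that offset calculation without olefile. The function name is hypothetical, and the check that the byte at the computed offset really is 0x01 is an extra filter that the original code does not apply.

```python
import re

# Same pattern as extract_macros: flag byte 0x00, the literals "Attribut",
# then the next flag byte. [^e] rejects uncompressed "Attribute" text.
VBA_MARKER = re.compile(b'\\x00Attribut[^e]', flags=re.IGNORECASE)


def find_compressed_vba_offsets(data: bytes) -> list:
    """Return candidate start offsets of MS-OVBA compressed containers."""
    offsets = []
    for match in VBA_MARKER.finditer(data):
        # signature byte 0x01 + 2-byte chunk header precede the flag byte
        start = match.start() - 3
        # extra sanity check (not in olevba): require the 0x01 signature
        if start >= 0 and data[start] == 0x01:
            offsets.append(start)
    return offsets


data = b'junk' + b'\x01\x19\xb2' + b'\x00Attribut\x00e VB_Nam' + b'Attribute VB_Name'
print(find_compressed_vba_offsets(data))  # [4]
```

Each offset would then be passed as `data[start:]` to a decompressor such as olevba's decompress_stream.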
2934 - def extract_all_macros(self):  
2935 - """  
2936 - Extract and decompress source code for each VBA macro found in the file  
2937 - by calling extract_macros(), store the results as a list of tuples  
2938 - (filename, stream_path, vba_filename, vba_code) in self.modules.  
2939 - See extract_macros for details.  
2940 - """  
2941 - if self.modules is None:  
2942 - self.modules = []  
2943 - for (subfilename, stream_path, vba_filename, vba_code) in self.extract_macros():  
2944 - self.modules.append((subfilename, stream_path, vba_filename, vba_code))  
2945 - self.nb_macros = len(self.modules)  
2946 - return self.modules  
2947 -  
2948 -  
2949 -  
2950 - def analyze_macros(self, show_decoded_strings=False, deobfuscate=False):  
2951 - """  
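2952 - Run extract_all_macros() and analyze the source code of all VBA macros and VBA form  
2953 - strings found in the file, merged and analyzed at once. Results are cached in self.analysis_results.  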
2954 - """  
2955 - if self.detect_vba_macros():  
2956 - # if the analysis was already done, avoid doing it twice:  
2957 - if self.analysis_results is not None:  
2958 - return self.analysis_results  
2959 - # variable to merge source code from all modules:  
2960 - if self.vba_code_all_modules is None:  
2961 - self.vba_code_all_modules = ''  
2962 - for (_, _, _, vba_code) in self.extract_all_macros():  
2963 - #TODO: filter code? (each module)  
2964 - if isinstance(vba_code, bytes):  
2965 - vba_code = vba_code.decode('utf-8', 'ignore')  
2966 - self.vba_code_all_modules += vba_code + '\n'  
2967 - for (_, _, form_string) in self.extract_form_strings():  
2968 - self.vba_code_all_modules += form_string.decode('utf-8', 'ignore') + '\n'  
2969 - # Analyze the whole code at once:  
2970 - scanner = VBA_Scanner(self.vba_code_all_modules)  
2971 - self.analysis_results = scanner.scan(show_decoded_strings, deobfuscate)  
2972 - autoexec, suspicious, iocs, hexstrings, base64strings, dridex, vbastrings = scanner.scan_summary()  
2973 - self.nb_autoexec += autoexec  
2974 - self.nb_suspicious += suspicious  
2975 - self.nb_iocs += iocs  
2976 - self.nb_hexstrings += hexstrings  
2977 - self.nb_base64strings += base64strings  
2978 - self.nb_dridexstrings += dridex  
2979 - self.nb_vbastrings += vbastrings  
2980 -  
2981 - return self.analysis_results  
2982 -  
2983 -  
2984 - def reveal(self):  
2985 - # we only want printable strings:  
2986 - analysis = self.analyze_macros(show_decoded_strings=False)  
2987 - # to avoid replacing short strings contained into longer strings, we sort the analysis results  
2988 - # based on the length of the encoded string, in reverse order:  
2989 - analysis = sorted(analysis, key=lambda type_decoded_encoded: len(type_decoded_encoded[2]), reverse=True)  
2990 - # normally now self.vba_code_all_modules contains source code from all modules  
2991 - # Need to collapse long lines:  
2992 - deobf_code = vba_collapse_long_lines(self.vba_code_all_modules)  
2993 - deobf_code = filter_vba(deobf_code)  
2994 - for kw_type, decoded, encoded in analysis:  
2995 - if kw_type == 'VBA string':  
2996 - #print '%3d occurrences: %r => %r' % (deobf_code.count(encoded), encoded, decoded)  
2997 - # need to add double quotes around the decoded strings  
2998 - # after escaping double-quotes as double-double-quotes for VBA:  
2999 - decoded = decoded.replace('"', '""')  
3000 - decoded = '"%s"' % decoded  
3001 - # if the encoded string is enclosed in parentheses,  
3002 - # keep them in the decoded version:  
3003 - if encoded.startswith('(') and encoded.endswith(')'):  
3004 - decoded = '(%s)' % decoded  
3005 - deobf_code = deobf_code.replace(encoded, decoded)  
3006 - # # TODO: there is a bug somewhere which creates double returns '\r\r'  
3007 - # deobf_code = deobf_code.replace('\r\r', '\r')  
3008 - return deobf_code  
3009 - #TODO: re-run the analysis several times if hex or base64 strings are revealed  
3010 -  
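The substitution step of reveal() can be shown in isolation. This sketch assumes that analysis results are (kw_type, decoded, encoded) tuples, as in the loop above, and it skips the vba_collapse_long_lines/filter_vba preprocessing. reveal_strings is a hypothetical name. Sorting by the length of the encoded string, longest first, keeps a short encoded expression that also appears inside a longer one from being replaced first and breaking the longer match.

```python
def reveal_strings(code, analysis):
    """Replace encoded VBA expressions in code by quoted decoded strings."""
    # keep only 'VBA string' findings, longest encoded expression first
    items = sorted((a for a in analysis if a[0] == 'VBA string'),
                   key=lambda a: len(a[2]), reverse=True)
    for _, decoded, encoded in items:
        # VBA escapes a double quote inside a string literal as ""
        literal = '"%s"' % decoded.replace('"', '""')
        # keep surrounding parentheses if the encoded expression had them
        if encoded.startswith('(') and encoded.endswith(')'):
            literal = '(%s)' % literal
        code = code.replace(encoded, literal)
    return code


code = 'a = StrReverse("olleh")\nb = (Chr(65) & Chr(66))'
analysis = [('VBA string', 'hello', 'StrReverse("olleh")'),
            ('VBA string', 'AB', '(Chr(65) & Chr(66))'),
            ('Hex String', 'x', '78')]
print(reveal_strings(code, analysis))
```

Findings of other types, such as the hex string here, are left untouched, as in reveal().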
3011 -  
3012 - def find_vba_forms(self):  
3013 - """  
3014 - Find all the VBA forms stored in an OLE file (or in the OLE subfiles of a PPT file).  
3015 -  
3016 - Return None if the file is not OLE but OpenXML.  
3017 - Otherwise, return a list of storage paths, one for each VBA form found. Each path  
3018 - is a list of storage names (as returned by olefile's listdir) for a storage  
3019 - containing both a stream named "o" (embedded controls) and a stream named "f"  
3020 - (FormControl properties), as specified in MS-OFORMS section 2.1.2.  
3021 - The list is also stored in self.vba_forms.  
3022 -  
3023 - If this function returns an empty list for one of the supported formats  
3024 - (i.e. Word, Excel, PowerPoint), then the file does not contain VBA forms.  
3025 -  
3026 - :return: None if OpenXML file, otherwise a list of storage paths (lists of str),  
3027 - one for each VBA form found  
3028 - """  
3029 - log.debug('VBA_Parser.find_vba_forms')  
3030 -  
3031 - # if the file is not OLE but OpenXML, return None:  
3032 - if self.ole_file is None and self.type != TYPE_PPT:  
3033 - return None  
3034 -  
3035 - # if this method has already been called, return previous result:  
3036 - # if self.vba_projects is not None:  
3037 - # return self.vba_projects  
3038 -  
3039 - # According to MS-OFORMS section 2.1.2 Control Streams:  
3040 - # - A parent control, that is, a control that can contain embedded controls,  
3041 - # MUST be persisted as a storage that contains multiple streams.  
3042 - # - All parent controls MUST contain a FormControl. The FormControl  
3043 - # properties are persisted to a stream (1) as specified in section 2.1.1.2.  
3044 - # The name of this stream (1) MUST be "f".  
3045 - # - Embedded controls that cannot themselves contain other embedded  
3046 - # controls are persisted sequentially as FormEmbeddedActiveXControls  
3047 - # to a stream (1) contained in the same storage as the parent control.  
3048 - # The name of this stream (1) MUST be "o".  
3049 - # - all names are case-insensitive  
3050 -  
3051 - if self.type == TYPE_PPT:  
3052 - # TODO: so far, this function is never called for PPT files, but  
3053 - # if that happens, the information is lost which ole file contains  
3054 - # which storage!  
3055 - ole_files = self.ole_subfiles  
3056 - log.warning('Returned info is not complete for PPT types!')  
3057 - else:  
3058 - ole_files = [self.ole_file, ]  
3059 -  
3060 - # start with an empty list:  
3061 - self.vba_forms = []  
3062 -  
3063 - # Loop over OLE files (the main file, or the OLE subfiles for PPT)  
3064 - for ole in ole_files:  
3065 - # Look for any storage containing those storage/streams:  
3066 - for storage in ole.listdir(streams=False, storages=True):  
3067 - log.debug('Checking storage %r' % storage)  
3068 - # Look for two streams named 'o' and 'f':  
3069 - o_stream = storage + ['o']  
3070 - f_stream = storage + ['f']  
3071 - log.debug('Checking if streams %r and %r exist' % (f_stream, o_stream))  
3072 - if ole.exists(o_stream) and ole.get_type(o_stream) == olefile.STGTY_STREAM \  
3073 - and ole.exists(f_stream) and ole.get_type(f_stream) == olefile.STGTY_STREAM:  
3074 - form_path = '/'.join(storage)  
3075 - log.debug('Found VBA Form: %r' % form_path)  
3076 - self.vba_forms.append(storage)  
3077 - return self.vba_forms  
3078 -  
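The MS-OFORMS rule applied by find_vba_forms is "a storage is a form if it directly contains both an 'o' stream and an 'f' stream, names compared case-insensitively". The sketch below applies that rule to a plain list of stream paths instead of an open olefile object. find_form_storages is a hypothetical helper, and its input shape (one list of names per stream) is assumed to match what olefile's listdir() returns.

```python
def find_form_storages(stream_paths):
    """Return the storages that directly hold both an "o" and an "f" stream.

    stream_paths: list of stream paths, each a list of names such as
    ['Macros', 'UserForm1', 'f'].
    """
    children = {}
    for path in stream_paths:
        parent = tuple(path[:-1])
        # stream names are case-insensitive, so compare in lower case
        children.setdefault(parent, set()).add(path[-1].lower())
    # skip the root storage (empty parent): only sub-storages are checked above
    return sorted(list(parent) for parent, names in children.items()
                  if parent and {'o', 'f'} <= names)


paths = [['Macros', 'UserForm1', 'f'], ['Macros', 'UserForm1', 'o'],
         ['Macros', 'UserForm1', '\x03VBFrame'], ['Macros', 'VBA', 'dir'],
         ['Other', 'F'], ['Other', 'O'], ['WordDocument'], ['f'], ['o']]
print(find_form_storages(paths))
```

Grouping by parent storage makes this a single pass over the streams. The loop in find_vba_forms instead calls exists()/get_type() on each storage.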
3079 - def extract_form_strings(self):  
3080 - """  
3081 - Extract printable strings from each VBA Form found in the file  
3082 -  
3083 - Iterator: yields (filename, stream_path, form_string) for each printable string found (form_string is bytes)  
3084 - If the file is OLE, filename is the path of the file.  
3085 - If the file is OpenXML, filename is the path of the OLE subfile containing VBA forms  
3086 - within the zip archive, e.g. word/vbaProject.bin.  
3087 - If the file is PPT, result is as for OpenXML but filename is useless  
3088 - """  
3089 - if self.ole_file is None:  
3090 - # This may be either an OpenXML/PPT or a text file:  
3091 - if self.type == TYPE_TEXT:  
3092 - # This is a text file, return no results:  
3093 - return  
3094 - else:  
3095 - # OpenXML/PPT: recursively yield results from each OLE subfile:  
3096 - for ole_subfile in self.ole_subfiles:  
3097 - for results in ole_subfile.extract_form_strings():  
3098 - yield results  
3099 - else:  
3100 - # This is an OLE file:  
3101 - self.find_vba_forms()  
3102 - ole = self.ole_file  
3103 - for form_storage in self.vba_forms:  
3104 - o_stream = form_storage + ['o']  
3105 - log.debug('Opening form object stream %r' % '/'.join(o_stream))  
3106 - form_data = ole.openstream(o_stream).read()  
3107 - # Extract printable strings from the form object stream "o":  
3108 - for m in re_printable_string.finditer(form_data):  
3109 - log.debug('Printable string found in form: %r' % m.group())  
3110 - yield (self.filename, '/'.join(o_stream), m.group())  
3111 -  
3112 -  
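extract_form_strings scans each form object stream "o" with re_printable_string, which is defined elsewhere in olevba and not shown in this hunk. The sketch below uses an assumed pattern: runs of at least 4 printable ASCII bytes, similar to the Unix `strings` tool. The real pattern may differ, and printable_strings is a hypothetical name.

```python
import re

# Assumed pattern: runs of 4+ printable ASCII bytes (the actual
# re_printable_string in olevba may be defined differently).
PRINTABLE_RUN = re.compile(rb'[\x20-\x7e]{4,}')


def printable_strings(form_data: bytes):
    """Yield printable byte strings found in a form object stream."""
    for match in PRINTABLE_RUN.finditer(form_data):
        yield match.group()


sample = b'\x00\x02\x18\x00Hello\x00ab\x00cmd.exe /c calc\x00'
print([s.decode('ascii') for s in printable_strings(sample)])
# ['Hello', 'cmd.exe /c calc']
```

Like the original, this yields bytes, so callers decode them before printing or merging them into the analyzed code.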
3113 - def close(self):  
3114 - """  
3115 - Close all the open files. This method must be called after usage, if  
3116 - the application is opening many files.  
3117 - """  
3118 - if self.ole_file is None:  
3119 - if self.ole_subfiles is not None:  
3120 - for ole_subfile in self.ole_subfiles:  
3121 - ole_subfile.close()  
3122 - else:  
3123 - self.ole_file.close()  
3124 -  
3125 -  
3126 -  
3127 -class VBA_Parser_CLI(VBA_Parser):  
3128 - """  
3129 - VBA parser and analyzer, adding methods for the command line interface  
3130 - of olevba. (see VBA_Parser)  
3131 - """  
3132 -  
3133 - def __init__(self, *args, **kwargs):  
3134 - """  
3135 - Constructor for VBA_Parser_CLI.  
3136 - Calls __init__ from VBA_Parser with all arguments --> see doc there  
3137 - """  
3138 - super(VBA_Parser_CLI, self).__init__(*args, **kwargs)  
3139 -  
3140 -  
3141 - def print_analysis(self, show_decoded_strings=False, deobfuscate=False):  
3142 - """  
3143 - Analyze the VBA code of all modules found in the file, and print the results in a table  
3144 - (all modules are merged and analyzed at once, see analyze_macros).  
3145 -  
3146 - :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content.  
3147 - :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)  
3148 - :return: None  
3149 - """  
3150 - # print a waiting message only if the output is not redirected to a file:  
3151 - if sys.stdout.isatty():  
3152 - print('Analysis...\r', end='')  
3153 - sys.stdout.flush()  
3154 - results = self.analyze_macros(show_decoded_strings, deobfuscate)  
3155 - if results:  
3156 - t = prettytable.PrettyTable(('Type', 'Keyword', 'Description'))  
3157 - t.align = 'l'  
3158 - t.max_width['Type'] = 10  
3159 - t.max_width['Keyword'] = 20  
3160 - t.max_width['Description'] = 39  
3161 - for kw_type, keyword, description in results:  
3162 - # handle non printable strings:  
3163 - if not is_printable(keyword):  
3164 - keyword = repr(keyword)  
3165 - if not is_printable(description):  
3166 - description = repr(description)  
3167 - t.add_row((kw_type, keyword, description))  
3168 - print(t)  
3169 - else:  
3170 - print('No suspicious keyword or IOC found.')  
3171 -  
3172 - def print_analysis_json(self, show_decoded_strings=False, deobfuscate=False):  
3173 - """  
3174 - Analyze the VBA code of all modules found in the file, and return the results  
3175 - as a list of dicts that can be serialized to json (see analyze_macros).  
3176 -  
3177 - :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content.  
3178 - :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)  
3179 -  
3180 - :return: list of dicts with keys 'type', 'keyword' and 'description'  
3181 - """  
3182 - # print a waiting message only if the output is not redirected to a file:  
3183 - if sys.stdout.isatty():  
3184 - print('Analysis...\r', end='')  
3185 - sys.stdout.flush()  
3186 - return [dict(type=kw_type, keyword=keyword, description=description)  
3187 - for kw_type, keyword, description in self.analyze_macros(show_decoded_strings, deobfuscate)]  
3188 -  
3189 - def process_file(self, show_decoded_strings=False,  
3190 - display_code=True, hide_attributes=True,  
3191 - vba_code_only=False, show_deobfuscated_code=False,  
3192 - deobfuscate=False):  
3193 - """  
3194 - Process a single file (opened in the constructor) and print detailed results.  
3195 - All modules are merged for a single analysis.  
3196 -  
3197 - :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content.  
3198 - :param display_code: bool, if False VBA source code is not displayed (default True)  
3199 - :param hide_attributes: bool, if True the first lines starting with "Attribute VB" are hidden (default)  
3200 - :param vba_code_only: bool, if True only the VBA source code is displayed, without analysis (default False)  
3201 - :param show_deobfuscated_code: bool, if True also display the source code with obfuscated strings  
3202 - replaced by their decoded content (default False)  
3203 - :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)  
3204 - """  
3205 - #TODO: replace print by writing to a provided output file (sys.stdout by default)  
3206 - # fix conflicting parameters:  
3207 - if vba_code_only and not display_code:  
3208 - display_code = True  
3209 - if self.container:  
3210 - display_filename = '%s in %s' % (self.filename, self.container)  
3211 - else:  
3212 - display_filename = self.filename  
3213 - print('=' * 79)  
3214 - print('FILE: %s' % display_filename)  
3215 - try:  
3216 - #TODO: handle olefile errors, when an OLE file is malformed  
3217 - print('Type: %s'% self.type)  
3218 - if self.detect_vba_macros():  
3219 - #print 'Contains VBA Macros:'  
3220 - for (subfilename, stream_path, vba_filename, vba_code) in self.extract_all_macros():  
3221 - if isinstance(vba_code, bytes):  
3222 - vba_code = vba_code.decode('utf-8', 'backslashreplace')  
3223 - if hide_attributes:  
3224 - # hide attribute lines:  
3225 - vba_code_filtered = filter_vba(vba_code)  
3226 - else:  
3227 - vba_code_filtered = vba_code  
3228 - print('-' * 79)  
3229 - print('VBA MACRO %s ' % vba_filename)  
3230 - print('in file: %s - OLE stream: %s' % (subfilename, repr(stream_path)))  
3231 - if display_code:  
3232 - print('- ' * 39)  
3233 - # detect empty macros:  
3234 - if vba_code_filtered.strip() == '':  
3235 - print('(empty macro)')  
3236 - else:  
3237 - print(vba_code_filtered)  
3238 - for (subfilename, stream_path, form_string) in self.extract_form_strings():  
3239 - print('-' * 79)  
3240 - print('VBA FORM STRING IN %r - OLE stream: %r' % (subfilename, stream_path))  
3241 - print('- ' * 39)  
3242 - print(form_string.decode('utf-8', 'ignore'))  
3243 - if not vba_code_only:  
3244 - # analyse the code from all modules at once:  
3245 - self.print_analysis(show_decoded_strings, deobfuscate)  
3246 - if show_deobfuscated_code:  
3247 - print('MACRO SOURCE CODE WITH DEOBFUSCATED VBA STRINGS (EXPERIMENTAL):\n\n')  
3248 - print(self.reveal())  
3249 - else:  
3250 - print('No VBA macros found.')  
3251 - except OlevbaBaseException:  
3252 - raise  
3253 - except Exception as exc:  
3254 - # display the exception with full stack trace for debugging  
3255 - log.info('Error processing file %s (%s)' % (self.filename, exc))  
3256 - log.debug('Traceback:', exc_info=True)  
3257 - raise ProcessingError(self.filename, exc)  
3258 - print('')  
3259 -  
3260 -  
3261 - def process_file_json(self, show_decoded_strings=False,  
3262 - display_code=True, hide_attributes=True,  
3263 - vba_code_only=False, show_deobfuscated_code=False,  
3264 - deobfuscate=False):  
3265 - """  
3266 - Process a single file (opened in the constructor) and return the results as a dict  
3267 -  
3268 - every "show" or "print" here is to be translated as "add to json"  
3269 -  
3270 - :param show_decoded_strings: bool, if True hex-encoded strings will be added with their decoded content.  
3271 - :param display_code: bool, if False VBA source code is not added (default True)  
3272 - :param hide_attributes: bool, if True the first lines starting with "Attribute VB" are hidden (default)  
3273 - :param vba_code_only: bool, if True only the VBA source code is added, without analysis (default False)  
3274 - :param show_deobfuscated_code: bool, if True the deobfuscated source code is added (default False)  
3275 - :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow)  
3276 - :return: dict with keys container, file, type, macros, analysis, code_deobfuscated,  
3277 - do_deobfuscate and json_conversion_successful  
3278 - """  
3279 - #TODO: fix conflicting parameters (?)  
3280 -  
3281 - if vba_code_only and not display_code:  
3282 - display_code = True  
3283 -  
3284 - result = {}  
3285 -  
3286 - if self.container:  
3287 - result['container'] = self.container  
3288 - else:  
3289 - result['container'] = None  
3290 - result['file'] = self.filename  
3291 - result['json_conversion_successful'] = False  
3292 - result['analysis'] = None  
3293 - result['code_deobfuscated'] = None  
3294 - result['do_deobfuscate'] = deobfuscate  
3295 -  
3296 - try:  
3297 - #TODO: handle olefile errors, when an OLE file is malformed  
3298 - result['type'] = self.type  
3299 - macros = []  
3300 - if self.detect_vba_macros():  
3301 - for (subfilename, stream_path, vba_filename, vba_code) in self.extract_all_macros():  
3302 - curr_macro = {}  
3303 - if isinstance(vba_code, bytes):  
3304 - vba_code = vba_code.decode('utf-8', 'backslashreplace')  
3305 -  
3306 - if hide_attributes:  
3307 - # hide attribute lines:  
3308 - vba_code_filtered = filter_vba(vba_code)  
3309 - else:  
3310 - vba_code_filtered = vba_code  
3311 -  
3312 - curr_macro['vba_filename'] = vba_filename  
3313 - curr_macro['subfilename'] = subfilename  
3314 - curr_macro['ole_stream'] = stream_path  
3315 - if display_code:  
3316 - curr_macro['code'] = vba_code_filtered.strip()  
3317 - else:  
3318 - curr_macro['code'] = None  
3319 - macros.append(curr_macro)  
3320 - if not vba_code_only:  
3321 - # analyse the code from all modules at once:  
3322 - result['analysis'] = self.print_analysis_json(show_decoded_strings,  
3323 - deobfuscate)  
3324 - if show_deobfuscated_code:  
3325 - result['code_deobfuscated'] = self.reveal()  
3326 - result['macros'] = macros  
3327 - result['json_conversion_successful'] = True  
3328 - except Exception as exc:  
3329 - # display the exception with full stack trace for debugging  
3330 - log.info('Error processing file %s (%s)' % (self.filename, exc))  
3331 - log.debug('Traceback:', exc_info=True)  
3332 - raise ProcessingError(self.filename, exc)  
3333 -  
3334 - return result  
3335 -  
3336 -  
3337 - def process_file_triage(self, show_decoded_strings=False, deobfuscate=False):  
3338 - """  
3339 - Process a file in triage mode, showing only summary results on one line.  
3340 - """  
3341 - #TODO: replace print by writing to a provided output file (sys.stdout by default)  
3342 - try:  
3343 - #TODO: handle olefile errors, when an OLE file is malformed  
3344 - if self.detect_vba_macros():  
3345 - # print a waiting message only if the output is not redirected to a file:  
3346 - if sys.stdout.isatty():  
3347 - print('Analysis...\r', end='')  
3348 - sys.stdout.flush()  
3349 - self.analyze_macros(show_decoded_strings=show_decoded_strings,  
3350 - deobfuscate=deobfuscate)  
3351 - flags = TYPE2TAG[self.type]  
3352 - macros = autoexec = suspicious = iocs = hexstrings = base64obf = dridex = vba_obf = '-'  
3353 - if self.contains_macros: macros = 'M'  
3354 - if self.nb_autoexec: autoexec = 'A'  
3355 - if self.nb_suspicious: suspicious = 'S'  
3356 - if self.nb_iocs: iocs = 'I'  
3357 - if self.nb_hexstrings: hexstrings = 'H'  
3358 - if self.nb_base64strings: base64obf = 'B'  
3359 - if self.nb_dridexstrings: dridex = 'D'  
3360 - if self.nb_vbastrings: vba_obf = 'V'  
3361 - flags += '%s%s%s%s%s%s%s%s' % (macros, autoexec, suspicious, iocs, hexstrings,  
3362 - base64obf, dridex, vba_obf)  
3363 -  
3364 - line = '%-12s %s' % (flags, self.filename)  
3365 - print(line)  
3366 -  
3367 - # old table display:  
3368 - # macros = autoexec = suspicious = iocs = hexstrings = 'no'  
3369 - # if nb_macros: macros = 'YES:%d' % nb_macros  
3370 - # if nb_autoexec: autoexec = 'YES:%d' % nb_autoexec  
3371 - # if nb_suspicious: suspicious = 'YES:%d' % nb_suspicious  
3372 - # if nb_iocs: iocs = 'YES:%d' % nb_iocs  
3373 - # if nb_hexstrings: hexstrings = 'YES:%d' % nb_hexstrings  
3374 - # # 2nd line = info  
3375 - # print '%-8s %-7s %-7s %-7s %-7s %-7s' % (self.type, macros, autoexec, suspicious, iocs, hexstrings)  
3376 - except Exception as exc:  
3377 - # display the exception with full stack trace for debugging only  
3378 - log.debug('Error processing file %s (%s)' % (self.filename, exc),  
3379 - exc_info=True)  
3380 - raise ProcessingError(self.filename, exc)  
3381 -  
3382 -  
3383 - # t = prettytable.PrettyTable(('filename', 'type', 'macros', 'autoexec', 'suspicious', 'ioc', 'hexstrings'),  
3384 - # header=False, border=False)  
3385 - # t.align = 'l'  
3386 - # t.max_width['filename'] = 30  
3387 - # t.max_width['type'] = 10  
3388 - # t.max_width['macros'] = 6  
3389 - # t.max_width['autoexec'] = 6  
3390 - # t.max_width['suspicious'] = 6  
3391 - # t.max_width['ioc'] = 6  
3392 - # t.max_width['hexstrings'] = 6  
3393 - # t.add_row((filename, ftype, macros, autoexec, suspicious, iocs, hexstrings))  
3394 - # print t  
3395 -  
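process_file_triage builds its one-line summary from a type tag followed by eight flag letters (M, A, S, I, H, B, D, V), each replaced by '-' when the matching counter is zero. The sketch below reproduces that layout with a table-driven helper. TYPE2TAG is not shown in this hunk, so the 'OLE:' and 'OpX:' tags used here are assumptions, and triage_line and FLAG_LETTERS are hypothetical names.

```python
# (counter name, flag letter) in the same order as process_file_triage
FLAG_LETTERS = (
    ('macros', 'M'), ('autoexec', 'A'), ('suspicious', 'S'), ('iocs', 'I'),
    ('hexstrings', 'H'), ('base64strings', 'B'), ('dridexstrings', 'D'),
    ('vbastrings', 'V'),
)


def triage_line(type_tag, counts, filename):
    """Build a triage summary line: type tag + 8 flag letters + filename.

    counts maps names from FLAG_LETTERS to a number or bool; a missing or
    zero entry is shown as '-'.
    """
    flags = type_tag + ''.join(letter if counts.get(name) else '-'
                               for name, letter in FLAG_LETTERS)
    return '%-12s %s' % (flags, filename)


print(triage_line('OLE:', {'macros': True, 'autoexec': 1, 'suspicious': 3},
                  'sample.doc'))
# OLE:MAS----- sample.doc
```

Keeping the letters in one table ties the column order to a single definition instead of eight separate `if` statements.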
3396 -  
3397 -#=== MAIN =====================================================================  
3398 -  
3399 -def parse_args(cmd_line_args=None):  
3400 - """ parse command line arguments (the given ones, or sys.argv by default) """  
3401 -  
3402 - DEFAULT_LOG_LEVEL = "warning" # Default log level  
3403 - LOG_LEVELS = {  
3404 - 'debug': logging.DEBUG,  
3405 - 'info': logging.INFO,  
3406 - 'warning': logging.WARNING,  
3407 - 'error': logging.ERROR,  
3408 - 'critical': logging.CRITICAL  
3409 - }  
3410 -  
3411 - usage = 'usage: olevba [options] <filename> [filename2 ...]'  
3412 - parser = optparse.OptionParser(usage=usage)  
3413 - # parser.add_option('-o', '--outfile', dest='outfile',  
3414 - # help='output file')  
3415 - # parser.add_option('-c', '--csv', dest='csv',  
3416 - # help='export results to a CSV file')  
3417 - parser.add_option("-r", action="store_true", dest="recursive",  
3418 - help='find files recursively in subdirectories.')  
3419 - parser.add_option("-z", "--zip", dest='zip_password', type='str', default=None,  
3420 - help='if the file is a zip archive, open all files from it, using the provided password (requires Python 2.6+)')  
3421 - parser.add_option("-f", "--zipfname", dest='zip_fname', type='str', default='*',  
3422 - help='if the file is a zip archive, file(s) to be opened within the zip. Wildcards * and ? are supported. (default:*)')  
3423 - # output mode; could make this even simpler with add_option(type='choice') but that would make  
3424 - # cmd line interface incompatible...  
3425 - modes = optparse.OptionGroup(parser, title='Output mode (mutually exclusive)')  
3426 - modes.add_option("-t", '--triage', action="store_const", dest="output_mode",  
3427 - const='triage', default='unspecified',  
3428 - help='triage mode, display results as a summary table (default for multiple files)')  
3429 - modes.add_option("-d", '--detailed', action="store_const", dest="output_mode",  
3430 - const='detailed', default='unspecified',  
3431 - help='detailed mode, display full results (default for single file)')  
3432 - modes.add_option("-j", '--json', action="store_const", dest="output_mode",  
3433 - const='json', default='unspecified',  
3434 - help='json mode, detailed in json format (never default)')  
3435 - parser.add_option_group(modes)  
3436 - parser.add_option("-a", '--analysis', action="store_false", dest="display_code", default=True,  
3437 - help='display only analysis results, not the macro source code')  
3438 - parser.add_option("-c", '--code', action="store_true", dest="vba_code_only", default=False,  
3439 - help='display only VBA source code, do not analyze it')  
3440 - parser.add_option("--decode", action="store_true", dest="show_decoded_strings",  
3441 - help='display all the obfuscated strings with their decoded content (Hex, Base64, StrReverse, Dridex, VBA).')  
3442 - parser.add_option("--attr", action="store_false", dest="hide_attributes", default=True,  
3443 - help='display the attribute lines at the beginning of VBA source code')  
3444 - parser.add_option("--reveal", action="store_true", dest="show_deobfuscated_code",  
3445 - help='display the macro source code after replacing all the obfuscated strings by their decoded content.')  
3446 - parser.add_option('-l', '--loglevel', dest="loglevel", action="store", default=DEFAULT_LOG_LEVEL,  
3447 - help="logging level debug/info/warning/error/critical (default=%default)")  
3448 - parser.add_option('--deobf', dest="deobfuscate", action="store_true", default=False,  
3449 - help="Attempt to deobfuscate VBA expressions (slow)")  
3450 - parser.add_option('--relaxed', dest="relaxed", action="store_true", default=False,  
3451 - help="Do not raise errors if opening of substream fails")  
3452 -  
3453 - (options, args) = parser.parse_args(cmd_line_args)  
3454 -  
3455 - # Print help if no arguments are passed  
3456 - if len(args) == 0:  
3457 - print('olevba %s - http://decalage.info/python/oletools' % __version__)  
3458 - print(__doc__)  
3459 - parser.print_help()  
3460 - sys.exit(RETURN_WRONG_ARGS)  
3461 -  
3462 - options.loglevel = LOG_LEVELS[options.loglevel]  
3463 -  
3464 - return options, args  
3465 -  
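parse_args implements the mutually exclusive output modes by pointing several store_const options at the same dest, so the last mode given on the command line wins and 'unspecified' remains when none is given. The reduced sketch below keeps only that mechanism and the log level mapping. parse_output_mode is a hypothetical helper. Unlike the original, it passes choices= for --loglevel, so optparse reports a mistyped level as a usage error instead of the LOG_LEVELS lookup raising KeyError.

```python
import logging
import optparse

LOG_LEVELS = {'debug': logging.DEBUG, 'info': logging.INFO,
              'warning': logging.WARNING, 'error': logging.ERROR,
              'critical': logging.CRITICAL}


def parse_output_mode(argv):
    """Parse a reduced olevba-like command line (hypothetical helper)."""
    parser = optparse.OptionParser(usage='usage: %prog [options] <filename>')
    modes = optparse.OptionGroup(parser, 'Output mode (mutually exclusive)')
    # several options share one dest: the last one on the command line wins
    for short, long_name, mode in (('-t', '--triage', 'triage'),
                                   ('-d', '--detailed', 'detailed'),
                                   ('-j', '--json', 'json')):
        modes.add_option(short, long_name, action='store_const',
                         dest='output_mode', const=mode, default='unspecified')
    parser.add_option_group(modes)
    # choices= makes optparse reject unknown levels with a usage error
    parser.add_option('-l', '--loglevel', dest='loglevel', default='warning',
                      choices=list(LOG_LEVELS))
    options, args = parser.parse_args(argv)
    options.loglevel = LOG_LEVELS[options.loglevel]
    return options, args


options, args = parse_output_mode(['-j', 'a.doc'])
print(options.output_mode, args)
```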
3466 -  
3467 -def main(cmd_line_args=None):  
3468 - """  
3469 - Main function, called when olevba is run from the command line  
3470 -  
3471 - Optional argument: command line arguments to be forwarded to OptionParser  
3472 - in parse_args. Per default (cmd_line_args=None), sys.argv is used. This option was  
3473 - mainly added for unit testing.  
3474 - """  
3475 -  
3476 - options, args = parse_args(cmd_line_args)  
3477 -  
3478 - # provide info about tool and its version  
3479 - if options.output_mode == 'json':  
3480 - # print first json entry with meta info and opening '['  
3481 - print_json(script_name='olevba', version=__version__,  
3482 - url='http://decalage.info/python/oletools',  
3483 - type='MetaInformation', _json_is_first=True)  
3484 - else:  
3485 - print('olevba3 %s - http://decalage.info/python/oletools' % __version__)  
3486 -  
3487 - logging.basicConfig(level=options.loglevel, format='%(levelname)-8s %(message)s')  
3488 - # enable logging in the modules:  
3489 - enable_logging()  
3490 -  
3491 - # Old display with number of items detected:  
3492 - # print '%-8s %-7s %-7s %-7s %-7s %-7s' % ('Type', 'Macros', 'AutoEx', 'Susp.', 'IOCs', 'HexStr')  
3493 - # print '%-8s %-7s %-7s %-7s %-7s %-7s' % ('-'*8, '-'*7, '-'*7, '-'*7, '-'*7, '-'*7)  
3494 -  
3495 - # with the option --reveal, make sure --deobf is also enabled:  
3496 - if options.show_deobfuscated_code and not options.deobfuscate:  
3497 - log.info('set --deobf because --reveal was set')  
3498 - options.deobfuscate = True  
3499 - if options.output_mode == 'triage' and options.show_deobfuscated_code:  
3500 - log.info('ignoring option --reveal in triage output mode')  
3501 -  
3502 - # Column headers (do not know how many files there will be yet, so if no output_mode  
3503 - # was specified, we will print triage for first file --> need these headers)  
3504 - if options.output_mode in ('triage', 'unspecified'):  
3505 - print('%-12s %-65s' % ('Flags', 'Filename'))  
3506 - print('%-12s %-65s' % ('-' * 11, '-' * 65))  
3507 -  
3508 - previous_container = None  
3509 - count = 0  
3510 - container = filename = data = None  
3511 - vba_parser = None  
3512 - return_code = RETURN_OK  
3513 - try:  
3514 - for container, filename, data in xglob.iter_files(args, recursive=options.recursive,  
3515 - zip_password=options.zip_password, zip_fname=options.zip_fname):  
3516 - # ignore directory names stored in zip files:  
3517 - if container and filename.endswith('/'):  
3518 - continue  
3519 -  
3520 - # handle errors from xglob  
3521 - if isinstance(data, Exception):  
3522 - if isinstance(data, PathNotFoundException):  
3523 - if options.output_mode in ('triage', 'unspecified'):  
3524 - print('%-12s %s - File not found' % ('?', filename))  
3525 - elif options.output_mode != 'json':  
3526 - log.error('Given path %r does not exist!' % filename)  
3527 - return_code = RETURN_FILE_NOT_FOUND if return_code == 0 \  
3528 - else RETURN_SEVERAL_ERRS  
3529 - else:  
3530 - if options.output_mode in ('triage', 'unspecified'):  
3531 - print('%-12s %s - Failed to read from zip file %s' % ('?', filename, container))  
3532 - elif options.output_mode != 'json':  
3533 - log.error('Exception opening/reading %r from zip file %r: %s'  
3534 - % (filename, container, data))  
3535 - return_code = RETURN_XGLOB_ERR if return_code == 0 \  
3536 - else RETURN_SEVERAL_ERRS  
3537 - if options.output_mode == 'json':  
3538 - print_json(file=filename, type='error',  
3539 - error=type(data).__name__, message=str(data))  
3540 - continue  
3541 -  
3542 - try:  
3543 - # Open the file  
3544 - vba_parser = VBA_Parser_CLI(filename, data=data, container=container,  
3545 - relaxed=options.relaxed)  
3546 -  
3547 - if options.output_mode == 'detailed':  
3548 - # fully detailed output  
3549 - vba_parser.process_file(show_decoded_strings=options.show_decoded_strings,  
3550 - display_code=options.display_code,  
3551 - hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,  
3552 - show_deobfuscated_code=options.show_deobfuscated_code,  
3553 - deobfuscate=options.deobfuscate)  
3554 - elif options.output_mode in ('triage', 'unspecified'):  
3555 - # print container name when it changes:  
3556 - if container != previous_container:  
3557 - if container is not None:  
3558 - print('\nFiles in %s:' % container)  
3559 - previous_container = container  
3560 - # summarized output for triage:  
3561 - vba_parser.process_file_triage(show_decoded_strings=options.show_decoded_strings,  
3562 - deobfuscate=options.deobfuscate)  
3563 - elif options.output_mode == 'json':  
3564 - print_json(  
3565 - vba_parser.process_file_json(show_decoded_strings=options.show_decoded_strings,  
3566 - display_code=options.display_code,  
3567 - hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,  
3568 - show_deobfuscated_code=options.show_deobfuscated_code,  
3569 - deobfuscate=options.deobfuscate))  
3570 - else: # (should be impossible)  
3571 - raise ValueError('unexpected output mode: "{0}"!'.format(options.output_mode))  
3572 - count += 1  
3573 -  
3574 - except (SubstreamOpenError, UnexpectedDataError) as exc:  
3575 - if options.output_mode in ('triage', 'unspecified'):  
3576 - print('%-12s %s - Error opening substream or unexpected ' \  
3577 - 'content' % ('?', filename))  
3578 - elif options.output_mode == 'json':  
3579 - print_json(file=filename, type='error',  
3580 - error=type(exc).__name__, message=str(exc))  
3581 - else:  
3582 - log.exception('Error opening substream or unexpected '  
3583 - 'content in %s' % filename)  
3584 - return_code = RETURN_OPEN_ERROR if return_code == 0 \  
3585 - else RETURN_SEVERAL_ERRS  
3586 - except FileOpenError as exc:  
3587 - if options.output_mode in ('triage', 'unspecified'):  
3588 - print('%-12s %s - File format not supported' % ('?', filename))  
3589 - elif options.output_mode == 'json':  
3590 - print_json(file=filename, type='error',  
3591 - error=type(exc).__name__, message=str(exc))  
3592 - else:  
3593 - log.exception('Failed to open %s -- probably not supported!' % filename)  
3594 - return_code = RETURN_OPEN_ERROR if return_code == 0 \  
3595 - else RETURN_SEVERAL_ERRS  
3596 - except ProcessingError as exc:  
3597 - if options.output_mode in ('triage', 'unspecified'):  
3598 - print('%-12s %s - %s' % ('!ERROR', filename, exc.orig_exc))  
3599 - elif options.output_mode == 'json':  
3600 - print_json(file=filename, type='error',  
3601 - error=type(exc).__name__,  
3602 - message=str(exc.orig_exc))  
3603 - else:  
3604 - log.exception('Error processing file %s (%s)!'  
3605 - % (filename, exc.orig_exc))  
3606 - return_code = RETURN_PARSE_ERROR if return_code == 0 \  
3607 - else RETURN_SEVERAL_ERRS  
3608 - except FileIsEncryptedError as exc:  
3609 - if options.output_mode in ('triage', 'unspecified'):  
3610 - print('%-12s %s - File is encrypted' % ('!ERROR', filename))  
3611 - elif options.output_mode == 'json':  
3612 - print_json(file=filename, type='error',  
3613 - error=type(exc).__name__, message=str(exc))  
3614 - else:  
3615 - log.exception('File %s is encrypted!' % (filename))  
3616 - return_code = RETURN_ENCRYPTED if return_code == 0 \  
3617 - else RETURN_SEVERAL_ERRS  
3618 - # Here we do not close the vba_parser, because process_file may need it below.  
3619 -  
3620 - finally:  
3621 - if vba_parser is not None:  
3622 - vba_parser.close()  
3623 -  
3624 - if options.output_mode == 'triage':  
3625 - print('\n(Flags: OpX=OpenXML, XML=Word2003XML, FlX=FlatOPC XML, MHT=MHTML, TXT=Text, M=Macros, ' \  
3626 - 'A=Auto-executable, S=Suspicious keywords, I=IOCs, H=Hex strings, ' \  
3627 - 'B=Base64 strings, D=Dridex strings, V=VBA strings, ?=Unknown)\n')  
3628 -  
3629 - if count == 1 and options.output_mode == 'unspecified':  
3630 - # if options -t, -d and -j were not specified and it's a single file, print details:  
3631 - vba_parser.process_file(show_decoded_strings=options.show_decoded_strings,  
3632 - display_code=options.display_code,  
3633 - hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only,  
3634 - show_deobfuscated_code=options.show_deobfuscated_code,  
3635 - deobfuscate=options.deobfuscate)  
3636 -  
3637 - if options.output_mode == 'json':  
3638 - # print last json entry (a last one without a comma) and closing ]  
3639 - print_json(type='MetaInformation', return_code=return_code,  
3640 - n_processed=count, _json_is_last=True)  
3641 -  
3642 - except Exception as exc:  
3643 - # some unexpected error, maybe some of the types caught in except clauses  
3644 - # above were not sufficient. This is very bad, so log complete trace at exception level  
3645 - # and do not care about output mode  
3646 - log.exception('Unhandled exception in main: %s' % exc, exc_info=True)  
3647 - return_code = RETURN_UNEXPECTED # even if there were others before -- this is more important  
3648 - # TODO: print msg with URL to report issues (except in JSON mode)  
3649 -  
3650 - # done. exit  
3651 - log.debug('will exit now with code %s' % return_code)  
3652 - sys.exit(return_code) 19 +from oletools.olevba import *
  20 +from oletools.olevba import __doc__, __version__
3653 21
3654 if __name__ == '__main__': 22 if __name__ == '__main__':
3655 main() 23 main()
3656 24
3657 -# This was coded while listening to "Dust" from I Love You But I've Chosen Darkness  
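Every error handler in the removed `main()` above sets the exit status the same way. The first error stores its own `RETURN_*` code, and any later error replaces it with `RETURN_SEVERAL_ERRS`. The sketch below isolates that rule. The numeric values are placeholders, because olevba defines the real constants elsewhere and this diff does not show them, and `merge_return_code` is a hypothetical name.

```python
# Placeholder values for illustration only; olevba defines its own RETURN_* constants.
RETURN_OK = 0
RETURN_FILE_NOT_FOUND = 3
RETURN_OPEN_ERROR = 5
RETURN_SEVERAL_ERRS = 7


def merge_return_code(current, new_error):
    """Return the exit status after another error has occurred.

    The first error keeps its specific code. Any further error, of whatever
    kind, turns the status into RETURN_SEVERAL_ERRS (same test as in main()).
    """
    return new_error if current == RETURN_OK else RETURN_SEVERAL_ERRS


# Example: one missing file, then one file that cannot be opened.
status = RETURN_OK
status = merge_return_code(status, RETURN_FILE_NOT_FOUND)  # -> RETURN_FILE_NOT_FOUND
status = merge_return_code(status, RETURN_OPEN_ERROR)      # -> RETURN_SEVERAL_ERRS
```

An unhandled exception does not follow this rule. In that case `main()` sets `RETURN_UNEXPECTED` unconditionally, overriding any earlier code.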
oletools/ooxml.py
@@ -16,11 +16,11 @@ TODO: "xml2003" == "flatopc"? @@ -16,11 +16,11 @@ TODO: "xml2003" == "flatopc"?
16 """ 16 """
17 17
18 import sys 18 import sys
19 -from oletools.common.log_helper import log_helper  
20 from zipfile import ZipFile, BadZipfile, is_zipfile 19 from zipfile import ZipFile, BadZipfile, is_zipfile
21 from os.path import splitext 20 from os.path import splitext
22 import io 21 import io
23 import re 22 import re
  23 +from oletools.common.log_helper import log_helper
24 24
25 # import lxml or ElementTree for XML parsing: 25 # import lxml or ElementTree for XML parsing:
26 try: 26 try:
@@ -107,16 +107,14 @@ def debug_str(elem): @@ -107,16 +107,14 @@ def debug_str(elem):
107 text = u', '.join(parts) 107 text = u', '.join(parts)
108 if len(text) > 150: 108 if len(text) > 150:
109 return text[:147] + u'...]' 109 return text[:147] + u'...]'
110 - else:  
111 - return text + u']' 110 + return text + u']'
112 111
113 112
114 def isstr(some_var): 113 def isstr(some_var):
115 """ version-independent test for isinstance(some_var, (str, unicode)) """ 114 """ version-independent test for isinstance(some_var, (str, unicode)) """
116 if sys.version_info.major == 2: 115 if sys.version_info.major == 2:
117 return isinstance(some_var, basestring) # true for str and unicode 116 return isinstance(some_var, basestring) # true for str and unicode
118 - else:  
119 - return isinstance(some_var, str) # there is no unicode 117 + return isinstance(some_var, str) # there is no unicode
120 118
121 119
122 ############################################################################### 120 ###############################################################################
@@ -136,23 +134,29 @@ def get_type(filename): @@ -136,23 +134,29 @@ def get_type(filename):
136 prog_id = match.groups()[0] 134 prog_id = match.groups()[0]
137 if prog_id == WORD_XML_PROG_ID: 135 if prog_id == WORD_XML_PROG_ID:
138 return DOCTYPE_WORD_XML 136 return DOCTYPE_WORD_XML
139 - elif prog_id == EXCEL_XML_PROG_ID: 137 + if prog_id == EXCEL_XML_PROG_ID:
140 return DOCTYPE_EXCEL_XML 138 return DOCTYPE_EXCEL_XML
141 - else:  
142 - return DOCTYPE_NONE 139 + return DOCTYPE_NONE
143 140
144 is_doc = False 141 is_doc = False
145 is_xls = False 142 is_xls = False
146 is_ppt = False 143 is_ppt = False
147 - for _, elem, _ in parser.iter_xml(FILE_CONTENT_TYPES):  
148 - logger.debug(u' ' + debug_str(elem))  
149 - try:  
150 - content_type = elem.attrib['ContentType']  
151 - except KeyError: # ContentType not an attr  
152 - continue  
153 - is_xls |= content_type.startswith(CONTENT_TYPES_EXCEL)  
154 - is_doc |= content_type.startswith(CONTENT_TYPES_WORD)  
155 - is_ppt |= content_type.startswith(CONTENT_TYPES_PPT) 144 + try:
  145 + for _, elem, _ in parser.iter_xml(FILE_CONTENT_TYPES):
  146 + logger.debug(u' ' + debug_str(elem))
  147 + try:
  148 + content_type = elem.attrib['ContentType']
  149 + except KeyError: # ContentType not an attr
  150 + continue
  151 + is_xls |= content_type.startswith(CONTENT_TYPES_EXCEL)
  152 + is_doc |= content_type.startswith(CONTENT_TYPES_WORD)
  153 + is_ppt |= content_type.startswith(CONTENT_TYPES_PPT)
  154 + except BadOOXML as oo_err:
  155 + if oo_err.more_info.startswith('invalid subfile') and \
  156 + FILE_CONTENT_TYPES in oo_err.more_info:
  157 + # no FILE_CONTENT_TYPES in zip, so probably no ms office xml.
  158 + return DOCTYPE_NONE
  159 + raise
156 160
157 if is_doc and not is_xls and not is_ppt: 161 if is_doc and not is_xls and not is_ppt:
158 return DOCTYPE_WORD 162 return DOCTYPE_WORD
@@ -162,9 +166,8 @@ def get_type(filename): @@ -162,9 +166,8 @@ def get_type(filename):
162 return DOCTYPE_POWERPOINT 166 return DOCTYPE_POWERPOINT
163 if not is_doc and not is_xls and not is_ppt: 167 if not is_doc and not is_xls and not is_ppt:
164 return DOCTYPE_NONE 168 return DOCTYPE_NONE
165 - else:  
166 - logger.warning('Encountered contradictory content types')  
167 - return DOCTYPE_MIXED 169 + logger.warning('Encountered contradictory content types')
  170 + return DOCTYPE_MIXED
168 171
169 172
170 def is_ooxml(filename): 173 def is_ooxml(filename):
@@ -177,6 +180,7 @@ def is_ooxml(filename): @@ -177,6 +180,7 @@ def is_ooxml(filename):
177 return False 180 return False
178 if doctype == DOCTYPE_NONE: 181 if doctype == DOCTYPE_NONE:
179 return False 182 return False
  183 + return True
180 184
181 185
182 ############################################################################### 186 ###############################################################################
@@ -216,6 +220,7 @@ class ZipSubFile(object): @@ -216,6 +220,7 @@ class ZipSubFile(object):
216 See also (and maybe could some day merge with): 220 See also (and maybe could some day merge with):
217 ppt_record_parser.IterStream; also: oleobj.FakeFile 221 ppt_record_parser.IterStream; also: oleobj.FakeFile
218 """ 222 """
  223 + CHUNK_SIZE = 4096
219 224
220 def __init__(self, container, filename, mode='r', size=None): 225 def __init__(self, container, filename, mode='r', size=None):
221 """ remember all necessary vars but do not open yet """ 226 """ remember all necessary vars but do not open yet """
@@ -253,7 +258,7 @@ class ZipSubFile(object): @@ -253,7 +258,7 @@ class ZipSubFile(object):
253 # print('ZipSubFile: opened; size={}'.format(self.size)) 258 # print('ZipSubFile: opened; size={}'.format(self.size))
254 return self 259 return self
255 260
256 - def write(self, *args, **kwargs): # pylint: disable=unused-argument,no-self-use 261 + def write(self, *args, **kwargs):
257 """ write is not allowed """ 262 """ write is not allowed """
258 raise IOError('writing not implemented') 263 raise IOError('writing not implemented')
259 264
@@ -311,10 +316,9 @@ class ZipSubFile(object): @@ -311,10 +316,9 @@ class ZipSubFile(object):
311 """ helper for seek: skip forward by given amount using read() """ 316 """ helper for seek: skip forward by given amount using read() """
312 # print('ZipSubFile: seek by skipping {} bytes starting at {}' 317 # print('ZipSubFile: seek by skipping {} bytes starting at {}'
313 # .format(self.pos, to_skip)) 318 # .format(self.pos, to_skip))
314 - CHUNK_SIZE = 4096  
315 - n_chunks, leftover = divmod(to_skip, CHUNK_SIZE) 319 + n_chunks, leftover = divmod(to_skip, self.CHUNK_SIZE)
316 for _ in range(n_chunks): 320 for _ in range(n_chunks):
317 - self.read(CHUNK_SIZE) # just read and discard 321 + self.read(self.CHUNK_SIZE) # just read and discard
318 self.read(leftover) 322 self.read(leftover)
319 # print('ZipSubFile: seek by skipping done, pos now {}' 323 # print('ZipSubFile: seek by skipping done, pos now {}'
320 # .format(self.pos)) 324 # .format(self.pos))
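The seek helper in the hunk above moves forward by reading and discarding data in `CHUNK_SIZE` pieces, so it only needs the stream to support `read()`. The diff also turns that size into a class attribute. Below is a standalone sketch of the same technique for any readable stream. `skip_forward` is a hypothetical name and is not part of ooxml.py.

```python
import io

CHUNK_SIZE = 4096  # same value as ZipSubFile.CHUNK_SIZE in the diff


def skip_forward(stream, to_skip, chunk_size=CHUNK_SIZE):
    """Advance a read-only stream by to_skip bytes without seek().

    Data is read and discarded in fixed-size chunks, so memory use stays
    bounded by chunk_size no matter how far we skip.
    """
    n_chunks, leftover = divmod(to_skip, chunk_size)
    for _ in range(n_chunks):
        stream.read(chunk_size)  # just read and discard
    stream.read(leftover)


# Usage on sample data: 10240 bytes cycling through 0..255.
stream = io.BytesIO(bytes(range(256)) * 40)
skip_forward(stream, 5000)  # one full chunk of 4096 plus 904 leftover bytes
```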
@@ -417,8 +421,7 @@ class XmlParser(object): @@ -417,8 +421,7 @@ class XmlParser(object):
417 if match: 421 if match:
418 self._is_single_xml = True 422 self._is_single_xml = True
419 return True 423 return True
420 - if not match:  
421 - raise BadOOXML(self.filename, 'is no zip and has no prog_id') 424 + raise BadOOXML(self.filename, 'is no zip and has no prog_id')
422 425
423 def iter_files(self, args=None): 426 def iter_files(self, args=None):
424 """ Find files in zip or just give single xml file """ 427 """ Find files in zip or just give single xml file """
@@ -433,17 +436,14 @@ class XmlParser(object): @@ -433,17 +436,14 @@ class XmlParser(object):
433 subfiles = None 436 subfiles = None
434 try: 437 try:
435 zipper = ZipFile(self.filename) 438 zipper = ZipFile(self.filename)
436 - try:  
437 - _ = zipper.getinfo(FILE_CONTENT_TYPES)  
438 - except KeyError:  
439 - raise BadOOXML(self.filename,  
440 - 'No content type information')  
441 if not args: 439 if not args:
442 subfiles = zipper.namelist() 440 subfiles = zipper.namelist()
443 elif isstr(args): 441 elif isstr(args):
444 subfiles = [args, ] 442 subfiles = [args, ]
445 else: 443 else:
446 - subfiles = tuple(args) # make a copy in case orig changes 444 + # make a copy in case original args are modified
  445 + # Not sure whether this really is needed...
  446 + subfiles = tuple(arg for arg in args)
447 447
448 for subfile in subfiles: 448 for subfile in subfiles:
449 with zipper.open(subfile, 'r') as handle: 449 with zipper.open(subfile, 'r') as handle:
@@ -451,10 +451,12 @@ class XmlParser(object): @@ -451,10 +451,12 @@ class XmlParser(object):
451 if not args: 451 if not args:
452 self.did_iter_all = True 452 self.did_iter_all = True
453 except KeyError as orig_err: 453 except KeyError as orig_err:
  454 + # Note: do not change text of this message without adjusting
  455 + # conditions in except handlers
454 raise BadOOXML(self.filename, 456 raise BadOOXML(self.filename,
455 'invalid subfile: ' + str(orig_err)) 457 'invalid subfile: ' + str(orig_err))
456 except BadZipfile: 458 except BadZipfile:
457 - raise BadOOXML(self.filename, 'neither zip nor xml') 459 + raise BadOOXML(self.filename, 'not in zip format')
458 finally: 460 finally:
459 if zipper: 461 if zipper:
460 zipper.close() 462 zipper.close()
@@ -503,7 +505,7 @@ class XmlParser(object): @@ -503,7 +505,7 @@ class XmlParser(object):
503 if event == 'start': 505 if event == 'start':
504 if elem.tag in want_tags: 506 if elem.tag in want_tags:
505 logger.debug('remember start of tag {0} at {1}' 507 logger.debug('remember start of tag {0} at {1}'
506 - .format(elem.tag, depth)) 508 + .format(elem.tag, depth))
507 inside_tags.append((elem.tag, depth)) 509 inside_tags.append((elem.tag, depth))
508 depth += 1 510 depth += 1
509 continue 511 continue
@@ -519,18 +521,18 @@ class XmlParser(object): @@ -519,18 +521,18 @@ class XmlParser(object):
519 inside_tags.pop() 521 inside_tags.pop()
520 else: 522 else:
521 logger.error('found end for wanted tag {0} ' 523 logger.error('found end for wanted tag {0} '
522 - 'but last start tag {1} does not'  
523 - ' match'.format(curr_tag,  
524 - inside_tags[-1])) 524 + 'but last start tag {1} does not'
  525 + ' match'.format(curr_tag,
  526 + inside_tags[-1]))
525 # try to recover: close all deeper tags 527 # try to recover: close all deeper tags
526 while inside_tags and \ 528 while inside_tags and \
527 inside_tags[-1][1] >= depth: 529 inside_tags[-1][1] >= depth:
528 logger.debug('recover: pop {0}' 530 logger.debug('recover: pop {0}'
529 - .format(inside_tags[-1])) 531 + .format(inside_tags[-1]))
530 inside_tags.pop() 532 inside_tags.pop()
531 except IndexError: # no inside_tag[-1] 533 except IndexError: # no inside_tag[-1]
532 logger.error('found end of {0} at depth {1} but ' 534 logger.error('found end of {0} at depth {1} but '
533 - 'no start event') 535 + 'no start event')
534 # yield element 536 # yield element
535 if is_wanted or not want_tags: 537 if is_wanted or not want_tags:
536 yield subfile, elem, depth 538 yield subfile, elem, depth
@@ -544,7 +546,7 @@ class XmlParser(object): @@ -544,7 +546,7 @@ class XmlParser(object):
544 except ET.ParseError as err: 546 except ET.ParseError as err:
545 self.subfiles_no_xml.add(subfile) 547 self.subfiles_no_xml.add(subfile)
546 if subfile is None: # this is no zip subfile but single xml 548 if subfile is None: # this is no zip subfile but single xml
547 - raise BadOOXML(self.filename, 'is neither zip nor xml') 549 + raise BadOOXML(self.filename, 'content is not valid XML')
548 elif subfile.endswith('.xml'): 550 elif subfile.endswith('.xml'):
549 log = logger.warning 551 log = logger.warning
550 else: 552 else:
@@ -568,21 +570,30 @@ class XmlParser(object): @@ -568,21 +570,30 @@ class XmlParser(object):
568 570
569 defaults = [] 571 defaults = []
570 files = [] 572 files = []
571 - for _, elem, _ in self.iter_xml(FILE_CONTENT_TYPES):  
572 - if elem.tag.endswith('Default'):  
573 - extension = elem.attrib['Extension']  
574 - if extension.startswith('.'):  
575 - extension = extension[1:]  
576 - defaults.append((extension, elem.attrib['ContentType']))  
577 - logger.debug('found content type for extension {0[0]}: {0[1]}'  
578 - .format(defaults[-1]))  
579 - elif elem.tag.endswith('Override'):  
580 - subfile = elem.attrib['PartName']  
581 - if subfile.startswith('/'):  
582 - subfile = subfile[1:]  
583 - files.append((subfile, elem.attrib['ContentType']))  
584 - logger.debug('found content type for subfile {0[0]}: {0[1]}'  
585 - .format(files[-1])) 573 + try:
  574 + for _, elem, _ in self.iter_xml(FILE_CONTENT_TYPES):
  575 + if elem.tag.endswith('Default'):
  576 + extension = elem.attrib['Extension']
  577 + if extension.startswith('.'):
  578 + extension = extension[1:]
  579 + defaults.append((extension, elem.attrib['ContentType']))
  580 + logger.debug('found content type for extension {0[0]}: '
  581 + '{0[1]}'.format(defaults[-1]))
  582 + elif elem.tag.endswith('Override'):
  583 + subfile = elem.attrib['PartName']
  584 + if subfile.startswith('/'):
  585 + subfile = subfile[1:]
  586 + files.append((subfile, elem.attrib['ContentType']))
  587 + logger.debug('found content type for subfile {0[0]}: '
  588 + '{0[1]}'.format(files[-1]))
  589 + except BadOOXML as oo_err:
  590 + if oo_err.more_info.startswith('invalid subfile') and \
  591 + FILE_CONTENT_TYPES in oo_err.more_info:
  592 + # no FILE_CONTENT_TYPES in zip, so probably no ms office xml.
  593 + # Maybe OpenDocument format? In any case, try to analyze.
  594 + pass
  595 + else:
  596 + raise
586 return dict(files), dict(defaults) 597 return dict(files), dict(defaults)
587 598
588 def iter_non_xml(self): 599 def iter_non_xml(self):
@@ -599,7 +610,7 @@ class XmlParser(object): @@ -599,7 +610,7 @@ class XmlParser(object):
599 """ 610 """
600 if not self.did_iter_all: 611 if not self.did_iter_all:
601 logger.warning('Did not iterate through complete file. ' 612 logger.warning('Did not iterate through complete file. '
602 - 'Should run iter_xml() without args, first.') 613 + 'Should run iter_xml() without args, first.')
603 if not self.subfiles_no_xml: 614 if not self.subfiles_no_xml:
604 return 615 return
605 616
@@ -631,7 +642,7 @@ def test(): @@ -631,7 +642,7 @@ def test():
631 642
632 see module doc for more info 643 see module doc for more info
633 """ 644 """
634 - log_helper.enable_logging(False, logger.DEBUG) 645 + log_helper.enable_logging(False, 'debug')
635 if len(sys.argv) != 2: 646 if len(sys.argv) != 2:
636 print(u'To test this code, give me a single file as arg') 647 print(u'To test this code, give me a single file as arg')
637 return 2 648 return 2
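The `XmlParser` hunk above collects content types from the `FILE_CONTENT_TYPES` part (in OOXML packages, the standard part `[Content_Types].xml`). It maps `Default` entries by extension and `Override` entries by part name. When that part is missing, it now keeps going instead of raising. The sketch below reproduces that logic with the standard library only, assuming a plain zip file as input. It does not use oletools' `XmlParser` or `BadOOXML`, and all function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTENT_TYPES_PART = '[Content_Types].xml'  # standard OPC part name


def parse_content_types(xml_bytes):
    """Return (files, defaults): part name -> type, extension -> type."""
    files, defaults = {}, {}
    for elem in ET.fromstring(xml_bytes):  # children of the <Types> root
        if elem.tag.endswith('Default'):
            extension = elem.attrib['Extension']
            if extension.startswith('.'):
                extension = extension[1:]
            defaults[extension] = elem.attrib['ContentType']
        elif elem.tag.endswith('Override'):
            part = elem.attrib['PartName']
            if part.startswith('/'):
                part = part[1:]
            files[part] = elem.attrib['ContentType']
    return files, defaults


def content_types_from_zip(zip_file):
    """A zip without the content-types part is not treated as an error.

    It simply yields no content types (e.g. an OpenDocument file).
    """
    with zipfile.ZipFile(zip_file) as zipper:
        try:
            data = zipper.read(CONTENT_TYPES_PART)
        except KeyError:  # ZipFile.read raises KeyError for a missing member
            return {}, {}
    return parse_content_types(data)


def make_zip(members):
    """Build an in-memory zip file from {name: bytes} (test helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zipper:
        for name, data in members.items():
            zipper.writestr(name, data)
    buf.seek(0)
    return buf


# Usage with a tiny hand-written package.
SAMPLE = (b'<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
          b'<Default Extension="xml" ContentType="application/xml"/>'
          b'<Override PartName="/word/document.xml" ContentType="application/'
          b'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
          b'</Types>')
files, defaults = content_types_from_zip(make_zip({CONTENT_TYPES_PART: SAMPLE}))
```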
oletools/ppt_parser.py
@@ -43,7 +43,7 @@ file structure and will replace this module some time soon! @@ -43,7 +43,7 @@ file structure and will replace this module some time soon!
43 # 2017-04-23 v0.51 PL: - fixed absolute imports and issue #101 43 # 2017-04-23 v0.51 PL: - fixed absolute imports and issue #101
44 # 2018-09-11 v0.54 PL: - olefile is now a dependency 44 # 2018-09-11 v0.54 PL: - olefile is now a dependency
45 45
46 -__version__ = '0.54dev1' 46 +__version__ = '0.54'
47 47
48 48
49 # --- IMPORTS ------------------------------------------------------------------ 49 # --- IMPORTS ------------------------------------------------------------------
oletools/ppt_record_parser.py
@@ -63,7 +63,6 @@ except ImportError: @@ -63,7 +63,6 @@ except ImportError:
63 sys.path.insert(0, PARENT_DIR) 63 sys.path.insert(0, PARENT_DIR)
64 del PARENT_DIR 64 del PARENT_DIR
65 from oletools import record_base 65 from oletools import record_base
66 -from oletools.common.errors import FileIsEncryptedError  
67 66
68 67
69 # types of relevant records (there are much more than listed here) 68 # types of relevant records (there are much more than listed here)
@@ -109,10 +108,11 @@ RECORD_TYPES = dict([ @@ -109,10 +108,11 @@ RECORD_TYPES = dict([
109 ]) 108 ])
110 109
111 110
112 -# record types where version is not 0x0 or 0xf 111 +# record types where version is not 0x0 or 0x1 or 0xf
113 VERSION_EXCEPTIONS = dict([ 112 VERSION_EXCEPTIONS = dict([
114 (0x0400, 2), # rt_vbainfoatom 113 (0x0400, 2), # rt_vbainfoatom
115 (0x03ef, 2), # rt_slideatom 114 (0x03ef, 2), # rt_slideatom
  115 + (0xe9c7, 7), # tests/test-data/encrypted/encrypted.ppt, not investigated
116 ]) 116 ])
117 117
118 118
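According to the comment on `VERSION_EXCEPTIONS`, most PPT records carry version 0x0, 0x1 or 0xf. The few record types that use another version are listed together with that version. This diff does not show how ppt_record_parser applies the table. The check below is therefore a hypothetical reading of that rule: the dict is copied from the hunk, and `version_is_expected` is an illustrative name.

```python
# Copied from the diff: record types whose version is not 0x0, 0x1 or 0xf.
VERSION_EXCEPTIONS = dict([
    (0x0400, 2),   # rt_vbainfoatom
    (0x03ef, 2),   # rt_slideatom
    (0xe9c7, 7),   # seen in tests/test-data/encrypted/encrypted.ppt
])

USUAL_VERSIONS = (0x0, 0x1, 0xf)


def version_is_expected(rec_type, rec_ver):
    """Hypothetical check, not the ppt_record_parser code.

    A listed exception type must have its listed version; any other
    record type must have one of the usual versions.
    """
    if rec_type in VERSION_EXCEPTIONS:
        return rec_ver == VERSION_EXCEPTIONS[rec_type]
    return rec_ver in USUAL_VERSIONS
```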
@@ -149,6 +149,10 @@ def is_ppt(filename): @@ -149,6 +149,10 @@ def is_ppt(filename):
149 Param filename can be anything that OleFileIO constructor accepts: name of 149 Param filename can be anything that OleFileIO constructor accepts: name of
150 file or file data or data stream. 150 file or file data or data stream.
151 151
  152 + Will not try to decrypt the file, nor even try to determine whether it is
  153 + encrypted. If the file is encrypted, it will either raise an error or just
  154 + return `False`.
  155 +
152 see also: oleid.OleID.check_powerpoint 156 see also: oleid.OleID.check_powerpoint
153 """ 157 """
154 have_current_user = False 158 have_current_user = False
@@ -170,7 +174,7 @@ def is_ppt(filename): @@ -170,7 +174,7 @@ def is_ppt(filename):
170 for record in stream.iter_records(): 174 for record in stream.iter_records():
171 if record.type == 0x0ff5: # UserEditAtom 175 if record.type == 0x0ff5: # UserEditAtom
172 have_user_edit = True 176 have_user_edit = True
173 - elif record.type == 0x1772: # PersisDirectoryAtom 177 + elif record.type == 0x1772: # PersistDirectoryAtom
174 have_persist_dir = True 178 have_persist_dir = True
175 elif record.type == 0x03e8: # DocumentContainer 179 elif record.type == 0x03e8: # DocumentContainer
176 have_document_container = True 180 have_document_container = True
@@ -181,13 +185,12 @@ def is_ppt(filename): @@ -181,13 +185,12 @@ def is_ppt(filename):
181 return True 185 return True
182 else: # ignore other streams/storages since they are optional 186 else: # ignore other streams/storages since they are optional
183 continue 187 continue
184 - except FileIsEncryptedError:  
185 - assert ppt_file is not None, \  
186 - 'Encryption error should not be raised from just opening OLE file.'  
187 - # just rely on stream names, copied from oleid  
188 - return ppt_file.exists('PowerPoint Document')  
189 - except Exception:  
190 - pass 188 + except Exception as exc:
  189 + logging.debug('Ignoring exception in is_ppt, assume is not ppt',
  190 + exc_info=True)
  191 + finally:
  192 + if ppt_file is not None:
  193 + ppt_file.close()
191 return False 194 return False
192 195
193 196
oletools/pyxswf.py
@@ -25,7 +25,7 @@ http://www.decalage.info/python/oletools @@ -25,7 +25,7 @@ http://www.decalage.info/python/oletools
25 25
26 #=== LICENSE ================================================================= 26 #=== LICENSE =================================================================
27 27
28 -# pyxswf is copyright (c) 2012-2016, Philippe Lagadec (http://www.decalage.info) 28 +# pyxswf is copyright (c) 2012-2019, Philippe Lagadec (http://www.decalage.info)
29 # All rights reserved. 29 # All rights reserved.
30 # 30 #
31 # Redistribution and use in source and binary forms, with or without modification, 31 # Redistribution and use in source and binary forms, with or without modification,
@@ -59,7 +59,7 @@ http://www.decalage.info/python/oletools @@ -59,7 +59,7 @@ http://www.decalage.info/python/oletools
59 # 2016-11-01 PL: - replaced StringIO by BytesIO for Python 3 59 # 2016-11-01 PL: - replaced StringIO by BytesIO for Python 3
60 # 2018-09-11 v0.54 PL: - olefile is now a dependency 60 # 2018-09-11 v0.54 PL: - olefile is now a dependency
61 61
62 -__version__ = '0.54dev1' 62 +__version__ = '0.54'
63 63
64 #------------------------------------------------------------------------------ 64 #------------------------------------------------------------------------------
65 # TODO: 65 # TODO:
oletools/record_base.py
@@ -8,7 +8,10 @@ This is the case for xls and ppt, so classes are bases for xls_parser.py and @@ -8,7 +8,10 @@ This is the case for xls and ppt, so classes are bases for xls_parser.py and
8 ppt_record_parser.py . 8 ppt_record_parser.py .
9 """ 9 """
10 10
11 -# === LICENSE ================================================================= 11 +# === LICENSE ==================================================================
  12 +
  13 +# record_base is copyright (c) 2014-2019 Philippe Lagadec (http://www.decalage.info)
  14 +# All rights reserved.
12 # 15 #
13 # Redistribution and use in source and binary forms, with or without 16 # Redistribution and use in source and binary forms, with or without
14 # modification, are permitted provided that the following conditions are met: 17 # modification, are permitted provided that the following conditions are met:
@@ -37,8 +40,10 @@ from __future__ import print_function @@ -37,8 +40,10 @@ from __future__ import print_function
37 # CHANGELOG: 40 # CHANGELOG:
38 # 2017-11-30 v0.01 CH: - first version based on xls_parser 41 # 2017-11-30 v0.01 CH: - first version based on xls_parser
39 # 2018-09-11 v0.54 PL: - olefile is now a dependency 42 # 2018-09-11 v0.54 PL: - olefile is now a dependency
  43 +# 2019-01-30 PL: - fixed import to avoid mixing installed oletools
  44 +# and dev version
40 45
41 -__version__ = '0.54dev1' 46 +__version__ = '0.54'
42 47
43 # ----------------------------------------------------------------------------- 48 # -----------------------------------------------------------------------------
44 # TODO: 49 # TODO:
@@ -63,16 +68,12 @@ import logging @@ -63,16 +68,12 @@ import logging
63 68
64 import olefile 69 import olefile
65 70
66 -try:  
67 - from oletools.common.errors import FileIsEncryptedError  
68 -except ImportError:  
69 - # little hack to allow absolute imports even if oletools is not installed.  
70 - PARENT_DIR = os.path.normpath(os.path.dirname(os.path.dirname(  
71 - os.path.abspath(__file__))))  
72 - if PARENT_DIR not in sys.path:  
73 - sys.path.insert(0, PARENT_DIR)  
74 - del PARENT_DIR  
75 - from oletools.common.errors import FileIsEncryptedError 71 +# little hack to allow absolute imports even if oletools is not installed.
  72 +PARENT_DIR = os.path.normpath(os.path.dirname(os.path.dirname(
  73 + os.path.abspath(__file__))))
  74 +if PARENT_DIR not in sys.path:
  75 + sys.path.insert(0, PARENT_DIR)
  76 +del PARENT_DIR
76 from oletools import oleid 77 from oletools import oleid
77 78
78 79
@@ -125,10 +126,9 @@ class OleRecordFile(olefile.OleFileIO): @@ -125,10 +126,9 @@ class OleRecordFile(olefile.OleFileIO):
125 """ 126 """
126 127
127 def open(self, filename, *args, **kwargs): 128 def open(self, filename, *args, **kwargs):
128 - """Call OleFileIO.open, raise error if is encrypted.""" 129 + """Call OleFileIO.open."""
129 #super(OleRecordFile, self).open(filename, *args, **kwargs) 130 #super(OleRecordFile, self).open(filename, *args, **kwargs)
130 OleFileIO.open(self, filename, *args, **kwargs) 131 OleFileIO.open(self, filename, *args, **kwargs)
131 - self.is_encrypted = oleid.OleID(self).check_encrypted().value  
132 132
133 @classmethod 133 @classmethod
134 def stream_class_for_name(cls, stream_name): 134 def stream_class_for_name(cls, stream_name):
@@ -161,8 +161,7 @@ class OleRecordFile(olefile.OleFileIO): @@ -161,8 +161,7 @@ class OleRecordFile(olefile.OleFileIO):
161 stream = clz(self._open(direntry.isectStart, direntry.size), 161 stream = clz(self._open(direntry.isectStart, direntry.size),
162 direntry.size, 162 direntry.size,
163 None if is_orphan else direntry.name, 163 None if is_orphan else direntry.name,
164 - direntry.entry_type,  
165 - self.is_encrypted) 164 + direntry.entry_type)
166 yield stream 165 yield stream
167 stream.close() 166 stream.close()
168 167
@@ -175,14 +174,13 @@ class OleRecordStream(object): @@ -175,14 +174,13 @@ class OleRecordStream(object):
175 abstract base class 174 abstract base class
176 """ 175 """
177 176
178 - def __init__(self, stream, size, name, stream_type, is_encrypted=False): 177 + def __init__(self, stream, size, name, stream_type):
179 self.stream = stream 178 self.stream = stream
180 self.size = size 179 self.size = size
181 self.name = name 180 self.name = name
182 if stream_type not in ENTRY_TYPE2STR: 181 if stream_type not in ENTRY_TYPE2STR:
183 raise ValueError('Unknown stream type: {0}'.format(stream_type)) 182 raise ValueError('Unknown stream type: {0}'.format(stream_type))
184 self.stream_type = stream_type 183 self.stream_type = stream_type
185 - self.is_encrypted = is_encrypted  
186 184
187 def read_record_head(self): 185 def read_record_head(self):
188 """ read first few bytes of record to determine size and type 186 """ read first few bytes of record to determine size and type
@@ -211,9 +209,6 @@ class OleRecordStream(object): @@ -211,9 +209,6 @@ class OleRecordStream(object):
211 209
212 Stream must be positioned at start of records (e.g. start of stream). 210 Stream must be positioned at start of records (e.g. start of stream).
213 """ 211 """
214 - if self.is_encrypted:  
215 - raise FileIsEncryptedError()  
216 -  
217 while True: 212 while True:
218 # unpacking as in olevba._extract_vba 213 # unpacking as in olevba._extract_vba
219 pos = self.stream.tell() 214 pos = self.stream.tell()
oletools/rtfobj.py
@@ -17,7 +17,7 @@ http://www.decalage.info/python/oletools @@ -17,7 +17,7 @@ http://www.decalage.info/python/oletools
17 17
18 #=== LICENSE ================================================================= 18 #=== LICENSE =================================================================
19 19
20 -# rtfobj is copyright (c) 2012-2018, Philippe Lagadec (http://www.decalage.info) 20 +# rtfobj is copyright (c) 2012-2019, Philippe Lagadec (http://www.decalage.info)
21 # All rights reserved. 21 # All rights reserved.
22 # 22 #
23 # Redistribution and use in source and binary forms, with or without modification, 23 # Redistribution and use in source and binary forms, with or without modification,
@@ -88,8 +88,10 @@ http://www.decalage.info/python/oletools @@ -88,8 +88,10 @@ http://www.decalage.info/python/oletools
88 # 2018-05-31 v0.53.1 PP: - fixed issue #316: whitespace after \bin on Python 3 88 # 2018-05-31 v0.53.1 PP: - fixed issue #316: whitespace after \bin on Python 3
89 # 2018-06-22 v0.53.2 PL: - fixed issue #327: added "\pnaiu" & "\pnaiud" 89 # 2018-06-22 v0.53.2 PL: - fixed issue #327: added "\pnaiu" & "\pnaiud"
90 # 2018-09-11 v0.54 PL: - olefile is now a dependency 90 # 2018-09-11 v0.54 PL: - olefile is now a dependency
  91 +# 2019-07-08 v0.55 MM: - added URL carver for CVE-2017-0199 (Equation Editor) PR #460
  92 +# - added SCT to the list of executable file extensions PR #461
91 93
92 -__version__ = '0.54dev1' 94 +__version__ = '0.55.dev3'
93 95
94 # ------------------------------------------------------------------------------ 96 # ------------------------------------------------------------------------------
95 # TODO: 97 # TODO:
@@ -103,7 +105,7 @@ __version__ = '0.54dev1' @@ -103,7 +105,7 @@ __version__ = '0.54dev1'
103 105
104 # === IMPORTS ================================================================= 106 # === IMPORTS =================================================================
105 107
106 -import re, os, sys, binascii, logging, optparse 108 +import re, os, sys, binascii, logging, optparse, hashlib
107 import os.path 109 import os.path
108 from time import time 110 from time import time
109 111
@@ -268,7 +270,7 @@ re_delim_hexblock = re.compile(DELIMITER + PATTERN) @@ -268,7 +270,7 @@ re_delim_hexblock = re.compile(DELIMITER + PATTERN)
268 270
269 # TODO: use a frozenset instead of a regex? 271 # TODO: use a frozenset instead of a regex?
270 re_executable_extensions = re.compile( 272 re_executable_extensions = re.compile(
271 - r"(?i)\.(EXE|COM|PIF|GADGET|MSI|MSP|MSC|VBS|VBE|VB|JSE|JS|WSF|WSC|WSH|WS|BAT|CMD|DLL|SCR|HTA|CPL|CLASS|JAR|PS1XML|PS1|PS2XML|PS2|PSC1|PSC2|SCF|LNK|INF|REG)\b") 273 + r"(?i)\.(BAT|CLASS|CMD|CPL|DLL|EXECOM|GADGET|HTA|INF|JAR|JS|JSE|LNK|MSC|MSI|MSP|PIF|PS1|PS1XML|PS2|PS2XML|PSC1|PSC2|REG|SCF|SCR|SCT|VB|VBE|VBS|WS|WSC|WSF|WSH)\b")
272 274
273 # Destination Control Words, according to MS RTF Specifications v1.9.1: 275 # Destination Control Words, according to MS RTF Specifications v1.9.1:
274 DESTINATION_CONTROL_WORDS = frozenset(( 276 DESTINATION_CONTROL_WORDS = frozenset((
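The TODO above asks whether a frozenset could replace the regex. Below is a minimal sketch of that alternative. It assumes the check is only ever applied to a single extension string, such as the second item returned by `os.path.splitext()`. The names `EXECUTABLE_EXTENSIONS` and `is_executable_extension` are hypothetical and are not part of rtfobj. The set lists the same extensions as the new pattern, except that `EXE` and `COM` appear as separate entries. The `+` alternation contains `EXECOM`, which matches neither `.exe` nor `.com` and looks like a missing `|`.

```python
import os.path

# Hypothetical replacement for re_executable_extensions: same extensions as
# the pattern in this diff, with EXE and COM listed as separate entries.
EXECUTABLE_EXTENSIONS = frozenset(
    '.' + ext for ext in (
        'bat', 'class', 'cmd', 'com', 'cpl', 'dll', 'exe', 'gadget', 'hta',
        'inf', 'jar', 'js', 'jse', 'lnk', 'msc', 'msi', 'msp', 'pif', 'ps1',
        'ps1xml', 'ps2', 'ps2xml', 'psc1', 'psc2', 'reg', 'scf', 'scr', 'sct',
        'vb', 'vbe', 'vbs', 'ws', 'wsc', 'wsf', 'wsh',
    )
)


def is_executable_extension(ext):
    """Return True if ext (e.g. '.EXE' from os.path.splitext) is executable.

    The comparison is case-insensitive, like the (?i) flag in the regex.
    Empty or None extensions are never considered executable.
    """
    if not ext:
        return False
    return ext.lower() in EXECUTABLE_EXTENSIONS


_, ext = os.path.splitext('invoice.SCT')
print(is_executable_extension(ext))
```

A set lookup is an exact match. Unlike `re.match(...)` with a trailing `\b`, it does not accept an extension followed by extra non-word characters (for example `'.exe '`).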
@@ -678,6 +680,7 @@ class RtfObjParser(RtfParser): @@ -678,6 +680,7 @@ class RtfObjParser(RtfParser):
678 rtfobj.hexdata = hexdata 680 rtfobj.hexdata = hexdata
679 object_data = binascii.unhexlify(hexdata) 681 object_data = binascii.unhexlify(hexdata)
680 rtfobj.rawdata = object_data 682 rtfobj.rawdata = object_data
  683 + rtfobj.rawdata_md5 = hashlib.md5(object_data).hexdigest()
681 # TODO: check if all hex data is extracted properly 684 # TODO: check if all hex data is extracted properly
682 685
683 obj = oleobj.OleObject() 686 obj = oleobj.OleObject()
@@ -687,6 +690,7 @@ class RtfObjParser(RtfParser): @@ -687,6 +690,7 @@ class RtfObjParser(RtfParser):
687 rtfobj.class_name = obj.class_name 690 rtfobj.class_name = obj.class_name
688 rtfobj.oledata_size = obj.data_size 691 rtfobj.oledata_size = obj.data_size
689 rtfobj.oledata = obj.data 692 rtfobj.oledata = obj.data
  693 + rtfobj.oledata_md5 = hashlib.md5(obj.data).hexdigest()
690 rtfobj.is_ole = True 694 rtfobj.is_ole = True
691 if obj.class_name.lower() == b'package': 695 if obj.class_name.lower() == b'package':
692 opkg = oleobj.OleNativeStream(bindata=obj.data, 696 opkg = oleobj.OleNativeStream(bindata=obj.data,
@@ -695,6 +699,7 @@ class RtfObjParser(RtfParser): @@ -695,6 +699,7 @@ class RtfObjParser(RtfParser):
695 rtfobj.src_path = opkg.src_path 699 rtfobj.src_path = opkg.src_path
696 rtfobj.temp_path = opkg.temp_path 700 rtfobj.temp_path = opkg.temp_path
697 rtfobj.olepkgdata = opkg.data 701 rtfobj.olepkgdata = opkg.data
  702 + rtfobj.olepkgdata_md5 = hashlib.md5(opkg.data).hexdigest()
698 rtfobj.is_package = True 703 rtfobj.is_package = True
699 else: 704 else:
700 if olefile.isOleFile(obj.data): 705 if olefile.isOleFile(obj.data):
@@ -878,15 +883,23 @@ def process_file(container, filename, data, output_dir=None, save_object=False): @@ -878,15 +883,23 @@ def process_file(container, filename, data, output_dir=None, save_object=False):
878 ole_column += '\nFilename: %r' % rtfobj.filename 883 ole_column += '\nFilename: %r' % rtfobj.filename
879 ole_column += '\nSource path: %r' % rtfobj.src_path 884 ole_column += '\nSource path: %r' % rtfobj.src_path
880 ole_column += '\nTemp path = %r' % rtfobj.temp_path 885 ole_column += '\nTemp path = %r' % rtfobj.temp_path
  886 + ole_column += '\nMD5 = %r' % rtfobj.olepkgdata_md5
881 ole_color = 'yellow' 887 ole_color = 'yellow'
882 # check if the file extension is executable: 888 # check if the file extension is executable:
883 - _, ext = os.path.splitext(rtfobj.filename)  
884 - log.debug('File extension: %r' % ext)  
885 - if re_executable_extensions.match(ext): 889 +
  890 + _, temp_ext = os.path.splitext(rtfobj.temp_path)
  891 + log.debug('Temp path extension: %r' % temp_ext)
  892 + _, file_ext = os.path.splitext(rtfobj.filename)
  893 + log.debug('File extension: %r' % file_ext)
  894 +
  895 + if temp_ext != file_ext:
  896 + ole_column += "\nMODIFIED FILE EXTENSION"
  897 +
  898 + if re_executable_extensions.match(temp_ext) or re_executable_extensions.match(file_ext):
886 ole_color = 'red' 899 ole_color = 'red'
887 ole_column += '\nEXECUTABLE FILE' 900 ole_column += '\nEXECUTABLE FILE'
888 - # else:  
889 - # pkg_column = 'Not an OLE Package' 901 + else:
  902 + ole_column += '\nMD5 = %r' % rtfobj.oledata_md5
890 if rtfobj.clsid is not None: 903 if rtfobj.clsid is not None:
891 ole_column += '\nCLSID: %s' % rtfobj.clsid 904 ole_column += '\nCLSID: %s' % rtfobj.clsid
892 ole_column += '\n%s' % rtfobj.clsid_desc 905 ole_column += '\n%s' % rtfobj.clsid_desc
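The mismatch check above compares the extensions of `temp_path` and `filename` using `os.path.splitext`. OLE Package paths are normally Windows paths. On Linux or macOS, `os.path` treats a backslash as an ordinary character, so a path such as `C:\dir.v2\payload` gets the bogus extension `.v2\payload`. The standard `ntpath` module is the Windows flavour of `os.path` and can be imported on any platform. It splits such paths the way Windows would.

The sketch below makes a few assumptions:
- Either value may be `None`.
- The helper names `windows_ext` and `extension_mismatch` are hypothetical.
- The comparison is case-insensitive. This is a design choice the diff does not make: the diff would report `a.EXE` vs `a.exe` as a modified extension.

```python
import ntpath


def windows_ext(path):
    """Return the lower-cased extension of a Windows-style path, or ''."""
    if not path:
        return ''
    # ntpath treats both '\\' and '/' as separators, whatever the host OS
    return ntpath.splitext(path)[1].lower()


def extension_mismatch(filename, temp_path):
    """True if the package filename and temp path have different extensions."""
    return windows_ext(filename) != windows_ext(temp_path)


print(extension_mismatch('report.pdf', r'C:\Users\a\AppData\Local\Temp\report.exe'))
```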
@@ -896,7 +909,28 @@ def process_file(container, filename, data, output_dir=None, save_object=False): @@ -896,7 +909,28 @@ def process_file(container, filename, data, output_dir=None, save_object=False):
896 # http://www.kb.cert.org/vuls/id/921560 909 # http://www.kb.cert.org/vuls/id/921560
897 if rtfobj.class_name == b'OLE2Link': 910 if rtfobj.class_name == b'OLE2Link':
898 ole_color = 'red' 911 ole_color = 'red'
899 - ole_column += '\nPossibly an exploit for the OLE2Link vulnerability (VU#921560, CVE-2017-0199)' 912 + ole_column += '\nPossibly an exploit for the OLE2Link vulnerability (VU#921560, CVE-2017-0199)\n'
  913 + # https://bitbucket.org/snippets/Alexander_Hanel/7Adpp
  914 + found_list = re.findall(r'[a-fA-F0-9\x0D\x0A]{128,}',data)
  915 + urls = []
  916 + for item in found_list:
  917 + try:
  918 + temp = item.replace("\x0D\x0A","").decode("hex")
  919 + except:
  920 + continue
  921 + pat = re.compile(r'(?:[\x20-\x7E][\x00]){3,}')
  922 + words = [w.decode('utf-16le') for w in pat.findall(temp)]
  923 + for w in words:
  924 + if "http" in w:
  925 + urls.append(w)
  926 + urls = sorted(set(urls))
  927 + if urls:
  928 + ole_column += 'URL extracted: ' + ', '.join(urls)
  929 + # Detect Equation Editor exploit
  930 + # https://www.kb.cert.org/vuls/id/421280/
  931 + elif rtfobj.class_name.lower() == b'equation.3':
  932 + ole_color = 'red'
  933 + ole_column += '\nPossibly an exploit for the Equation Editor vulnerability (VU#421280, CVE-2017-11882)'
900 else: 934 else:
901 ole_column = 'Not a well-formed OLE object' 935 ole_column = 'Not a well-formed OLE object'
902 tstream.write_row(( 936 tstream.write_row((
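The URL carver added for OLE2Link objects appears to rely on Python 2 semantics. On Python 3, `data` is `bytes`, so matching it with a `str` pattern raises `TypeError`, and `str` has no `.decode("hex")`. The sketch below is a Python 3 version of the same idea:
1. Find long runs of hex digits, possibly broken by line breaks.
2. Hex-decode each run.
3. Keep printable UTF-16LE strings that contain `http`.

It assumes `data` is the raw RTF content as `bytes`. The function name `carve_urls` is hypothetical. The sketch also strips lone `\r` or `\n` bytes, which the original `replace("\x0D\x0A", "")` would leave in place.

```python
import binascii
import re

# Runs of at least 128 hex digits, possibly broken by CR/LF (as in \objdata)
RE_HEX_RUN = re.compile(rb'[a-fA-F0-9\r\n]{128,}')
# At least 3 printable ASCII characters encoded as UTF-16LE (char + NUL byte)
RE_UTF16_STR = re.compile(rb'(?:[\x20-\x7E]\x00){3,}')


def carve_urls(data):
    """Return sorted unique UTF-16LE strings containing 'http' found in
    the hex-encoded runs of data (bytes)."""
    urls = set()
    for run in RE_HEX_RUN.findall(data):
        hex_digits = run.replace(b'\r', b'').replace(b'\n', b'')
        try:
            decoded = binascii.unhexlify(hex_digits)
        except binascii.Error:
            continue  # e.g. odd number of digits: skip this run
        for raw in RE_UTF16_STR.findall(decoded):
            word = raw.decode('utf-16le')
            if 'http' in word:
                urls.add(word)
    return sorted(urls)
```

As in the diff, this scans the whole RTF `data`, not only the current object, so a document with several OLE2Link objects reports the same URL list for each one.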
@@ -930,6 +964,7 @@ def process_file(container, filename, data, output_dir=None, save_object=False): @@ -930,6 +964,7 @@ def process_file(container, filename, data, output_dir=None, save_object=False):
930 else: 964 else:
931 fname = '%s_object_%08X.noname' % (fname_prefix, rtfobj.start) 965 fname = '%s_object_%08X.noname' % (fname_prefix, rtfobj.start)
932 print(' saving to file %s' % fname) 966 print(' saving to file %s' % fname)
  967 + print(' md5 %s' % rtfobj.olepkgdata_md5)
933 open(fname, 'wb').write(rtfobj.olepkgdata) 968 open(fname, 'wb').write(rtfobj.olepkgdata)
934 # When format_id=TYPE_LINKED, oledata_size=None 969 # When format_id=TYPE_LINKED, oledata_size=None
935 elif rtfobj.is_ole and rtfobj.oledata_size is not None: 970 elif rtfobj.is_ole and rtfobj.oledata_size is not None:
@@ -947,11 +982,13 @@ def process_file(container, filename, data, output_dir=None, save_object=False): @@ -947,11 +982,13 @@ def process_file(container, filename, data, output_dir=None, save_object=False):
947 ext = 'bin' 982 ext = 'bin'
948 fname = '%s_object_%08X.%s' % (fname_prefix, rtfobj.start, ext) 983 fname = '%s_object_%08X.%s' % (fname_prefix, rtfobj.start, ext)
949 print(' saving to file %s' % fname) 984 print(' saving to file %s' % fname)
  985 + print(' md5 %s' % rtfobj.oledata_md5)
950 open(fname, 'wb').write(rtfobj.oledata) 986 open(fname, 'wb').write(rtfobj.oledata)
951 else: 987 else:
952 print('Saving raw data in object #%d:' % i) 988 print('Saving raw data in object #%d:' % i)
953 fname = '%s_object_%08X.raw' % (fname_prefix, rtfobj.start) 989 fname = '%s_object_%08X.raw' % (fname_prefix, rtfobj.start)
954 print(' saving object to file %s' % fname) 990 print(' saving object to file %s' % fname)
  991 + print(' md5 %s' % rtfobj.rawdata_md5)
955 open(fname, 'wb').write(rtfobj.rawdata) 992 open(fname, 'wb').write(rtfobj.rawdata)
956 993
957 994
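`process_file` now prints an MD5 for each saved object, computed from the in-memory data. A file written when `save_object` is set can be checked against that value by re-hashing it from disk. Below is a minimal sketch; `md5_file` is a hypothetical helper, not part of rtfobj. Reading in chunks keeps memory use flat for large embedded objects.

```python
import hashlib


def md5_file(path, chunk_size=65536):
    """Return the hex MD5 digest of a file, reading it chunk by chunk."""
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() with a b'' sentinel stops when read() hits end of file
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```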
@@ -1035,4 +1072,3 @@ if __name__ == '__main__': @@ -1035,4 +1072,3 @@ if __name__ == '__main__':
1035 main() 1072 main()
1036 1073
1037 # This code was developed while listening to The Mary Onettes "Lost" 1074 # This code was developed while listening to The Mary Onettes "Lost"
1038 -  
oletools/thirdparty/oledump/__init__.py 0 → 100644
oletools/thirdparty/oledump/plugin_biff.py 0 → 100644
  1 +#!/usr/bin/env python
  2 +
  3 +__description__ = 'BIFF plugin for oledump.py'
  4 +__author__ = 'Didier Stevens'
  5 +__version__ = '0.0.5'
  6 +__date__ = '2019/03/06'
  7 +
  8 +# Slightly modified version by Philippe Lagadec to be imported into olevba
  9 +
  10 +"""
  11 +
  12 +Source code put in public domain by Didier Stevens, no Copyright
  13 +https://DidierStevens.com
  14 +Use at your own risk
  15 +
  16 +History:
  17 + 2014/11/15: start
  18 + 2014/11/21: changed interface: added options; added options -a (asciidump) and -s (strings)
  19 + 2017/12/10: 0.0.2 added optparse & option -o
  20 + 2017/12/12: added option -f
  21 + 2017/12/13: added 0x support for option -f
  22 + 2018/10/24: 0.0.3 started coding Excel 4.0 macro support
  23 + 2018/10/25: continue
  24 + 2018/10/26: continue
  25 + 2019/01/05: 0.0.4 added option -x
  26 + 2019/03/06: 0.0.5 enhanced parsing of formula expressions
  27 +
  28 +Todo:
  29 +"""
  30 +
  31 +import struct
  32 +import re
  33 +import optparse
  34 +import binascii
  35 +import sys
  36 +
  37 +# from olevba:
  38 +
  39 +if sys.version_info[0] <= 2:
  40 + # Python 2.x
  41 + PYTHON2 = True
  42 +else:
  43 + # Python 3.x+
  44 + PYTHON2 = False
  45 +
  46 +def unicode2str(unicode_string):
  47 + """
  48 + convert a unicode string to a native str:
  49 + - on Python 3, it returns the same string
  50 + - on Python 2, the string is encoded with UTF-8 to a bytes str
  51 + :param unicode_string: unicode string to be converted
  52 + :return: the string converted to str
  53 + :rtype: str
  54 + """
  55 + if PYTHON2:
  56 + return unicode_string.encode('utf8', errors='replace')
  57 + else:
  58 + return unicode_string
  59 +
  60 +
  61 +def bytes2str(bytes_string, encoding='utf8'):
  62 + """
  63 + convert a bytes string to a native str:
  64 + - on Python 2, it returns the same string (bytes=str)
  65 + - on Python 3, the string is decoded using the provided encoding
  66 + (UTF-8 by default) to a unicode str
  67 + :param bytes_string: bytes string to be converted
  68 + :param encoding: codec to be used for decoding
  69 + :return: the string converted to str
  70 + :rtype: str
  71 + """
  72 + if PYTHON2:
  73 + return bytes_string
  74 + else:
  75 + return bytes_string.decode(encoding, errors='replace')
  76 +
  77 +
  78 +dTokens = {
  79 +0x01: 'ptgExp',
  80 +0x02: 'ptgTbl',
  81 +0x03: 'ptgAdd',
  82 +0x04: 'ptgSub',
  83 +0x05: 'ptgMul',
  84 +0x06: 'ptgDiv',
  85 +0x07: 'ptgPower',
  86 +0x08: 'ptgConcat',
  87 +0x09: 'ptgLT',
  88 +0x0A: 'ptgLE',
  89 +0x0B: 'ptgEQ',
  90 +0x0C: 'ptgGE',
  91 +0x0D: 'ptgGT',
  92 +0x0E: 'ptgNE',
  93 +0x0F: 'ptgIsect',
  94 +0x10: 'ptgUnion',
  95 +0x11: 'ptgRange',
  96 +0x12: 'ptgUplus',
  97 +0x13: 'ptgUminus',
  98 +0x14: 'ptgPercent',
  99 +0x15: 'ptgParen',
  100 +0x16: 'ptgMissArg',
  101 +0x17: 'ptgStr',
  102 +0x19: 'ptgAttr',
  103 +0x1A: 'ptgSheet',
  104 +0x1B: 'ptgEndSheet',
  105 +0x1C: 'ptgErr',
  106 +0x1D: 'ptgBool',
  107 +0x1E: 'ptgInt',
  108 +0x1F: 'ptgNum',
  109 +0x20: 'ptgArray',
  110 +0x21: 'ptgFunc',
  111 +0x22: 'ptgFuncVar',
  112 +0x23: 'ptgName',
  113 +0x24: 'ptgRef',
  114 +0x25: 'ptgArea',
  115 +0x26: 'ptgMemArea',
  116 +0x27: 'ptgMemErr',
  117 +0x28: 'ptgMemNoMem',
  118 +0x29: 'ptgMemFunc',
  119 +0x2A: 'ptgRefErr',
  120 +0x2B: 'ptgAreaErr',
  121 +0x2C: 'ptgRefN',
  122 +0x2D: 'ptgAreaN',
  123 +0x2E: 'ptgMemAreaN',
  124 +0x2F: 'ptgMemNoMemN',
  125 +0x39: 'ptgNameX',
  126 +0x3A: 'ptgRef3d',
  127 +0x3B: 'ptgArea3d',
  128 +0x3C: 'ptgRefErr3d',
  129 +0x3D: 'ptgAreaErr3d',
  130 +0x40: 'ptgArrayV',
  131 +0x41: 'ptgFuncV',
  132 +0x42: 'ptgFuncVarV',
  133 +0x43: 'ptgNameV',
  134 +0x44: 'ptgRefV',
  135 +0x45: 'ptgAreaV',
  136 +0x46: 'ptgMemAreaV',
  137 +0x47: 'ptgMemErrV',
  138 +0x48: 'ptgMemNoMemV',
  139 +0x49: 'ptgMemFuncV',
  140 +0x4A: 'ptgRefErrV',
  141 +0x4B: 'ptgAreaErrV',
  142 +0x4C: 'ptgRefNV',
  143 +0x4D: 'ptgAreaNV',
  144 +0x4E: 'ptgMemAreaNV',
  145 +0x4F: 'ptgMemNoMemNV',
  146 +0x58: 'ptgFuncCEV',
  147 +0x59: 'ptgNameXV',
  148 +0x5A: 'ptgRef3dV',
  149 +0x5B: 'ptgArea3dV',
  150 +0x5C: 'ptgRefErr3dV',
  151 +0x5D: 'ptgAreaErr3dV',
  152 +0x60: 'ptgArrayA',
  153 +0x61: 'ptgFuncA',
  154 +0x62: 'ptgFuncVarA',
  155 +0x63: 'ptgNameA',
  156 +0x64: 'ptgRefA',
  157 +0x65: 'ptgAreaA',
  158 +0x66: 'ptgMemAreaA',
  159 +0x67: 'ptgMemErrA',
  160 +0x68: 'ptgMemNoMemA',
  161 +0x69: 'ptgMemFuncA',
  162 +0x6A: 'ptgRefErrA',
  163 +0x6B: 'ptgAreaErrA',
  164 +0x6C: 'ptgRefNA',
  165 +0x6D: 'ptgAreaNA',
  166 +0x6E: 'ptgMemAreaNA',
  167 +0x6F: 'ptgMemNoMemNA',
  168 +0x78: 'ptgFuncCEA',
  169 +0x79: 'ptgNameXA',
  170 +0x7A: 'ptgRef3dA',
  171 +0x7B: 'ptgArea3dA',
  172 +0x7C: 'ptgRefErr3dA',
  173 +0x7D: 'ptgAreaErr3dA',
  174 +}
  175 +
  176 +#https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/00b5dd7d-51ca-4938-b7b7-483fe0e5933b
  177 +dFunctions = {
  178 +0x0000: 'COUNT',
  179 +0x0001: 'IF',
  180 +0x0002: 'ISNA',
  181 +0x0003: 'ISERROR',
  182 +0x0004: 'SUM',
  183 +0x0005: 'AVERAGE',
  184 +0x0006: 'MIN',
  185 +0x0007: 'MAX',
  186 +0x0008: 'ROW',
  187 +0x0009: 'COLUMN',
  188 +0x000A: 'NA',
  189 +0x000B: 'NPV',
  190 +0x000C: 'STDEV',
  191 +0x000D: 'DOLLAR',
  192 +0x000E: 'FIXED',
  193 +0x000F: 'SIN',
  194 +0x0010: 'COS',
  195 +0x0011: 'TAN',
  196 +0x0012: 'ATAN',
  197 +0x0013: 'PI',
  198 +0x0014: 'SQRT',
  199 +0x0015: 'EXP',
  200 +0x0016: 'LN',
  201 +0x0017: 'LOG10',
  202 +0x0018: 'ABS',
  203 +0x0019: 'INT',
  204 +0x001A: 'SIGN',
  205 +0x001B: 'ROUND',
  206 +0x001C: 'LOOKUP',
  207 +0x001D: 'INDEX',
  208 +0x001E: 'REPT',
  209 +0x001F: 'MID',
  210 +0x0020: 'LEN',
  211 +0x0021: 'VALUE',
  212 +0x0022: 'TRUE',
  213 +0x0023: 'FALSE',
  214 +0x0024: 'AND',
  215 +0x0025: 'OR',
  216 +0x0026: 'NOT',
  217 +0x0027: 'MOD',
  218 +0x0028: 'DCOUNT',
  219 +0x0029: 'DSUM',
  220 +0x002A: 'DAVERAGE',
  221 +0x002B: 'DMIN',
  222 +0x002C: 'DMAX',
  223 +0x002D: 'DSTDEV',
  224 +0x002E: 'VAR',
  225 +0x002F: 'DVAR',
  226 +0x0030: 'TEXT',
  227 +0x0031: 'LINEST',
  228 +0x0032: 'TREND',
  229 +0x0033: 'LOGEST',
  230 +0x0034: 'GROWTH',
  231 +0x0035: 'GOTO',
  232 +0x0036: 'HALT',
  233 +0x0037: 'RETURN',
  234 +0x0038: 'PV',
  235 +0x0039: 'FV',
  236 +0x003A: 'NPER',
  237 +0x003B: 'PMT',
  238 +0x003C: 'RATE',
  239 +0x003D: 'MIRR',
  240 +0x003E: 'IRR',
  241 +0x003F: 'RAND',
  242 +0x0040: 'MATCH',
  243 +0x0041: 'DATE',
  244 +0x0042: 'TIME',
  245 +0x0043: 'DAY',
  246 +0x0044: 'MONTH',
  247 +0x0045: 'YEAR',
  248 +0x0046: 'WEEKDAY',
  249 +0x0047: 'HOUR',
  250 +0x0048: 'MINUTE',
  251 +0x0049: 'SECOND',
  252 +0x004A: 'NOW',
  253 +0x004B: 'AREAS',
  254 +0x004C: 'ROWS',
  255 +0x004D: 'COLUMNS',
  256 +0x004E: 'OFFSET',
  257 +0x004F: 'ABSREF',
  258 +0x0050: 'RELREF',
  259 +0x0051: 'ARGUMENT',
  260 +0x0052: 'SEARCH',
  261 +0x0053: 'TRANSPOSE',
  262 +0x0054: 'ERROR',
  263 +0x0055: 'STEP',
  264 +0x0056: 'TYPE',
  265 +0x0057: 'ECHO',
  266 +0x0058: 'SET.NAME',
  267 +0x0059: 'CALLER',
  268 +0x005A: 'DEREF',
  269 +0x005B: 'WINDOWS',
  270 +0x005C: 'SERIES',
  271 +0x005D: 'DOCUMENTS',
  272 +0x005E: 'ACTIVE.CELL',
  273 +0x005F: 'SELECTION',
  274 +0x0060: 'RESULT',
  275 +0x0061: 'ATAN2',
  276 +0x0062: 'ASIN',
  277 +0x0063: 'ACOS',
  278 +0x0064: 'CHOOSE',
  279 +0x0065: 'HLOOKUP',
  280 +0x0066: 'VLOOKUP',
  281 +0x0067: 'LINKS',
  282 +0x0068: 'INPUT',
  283 +0x0069: 'ISREF',
  284 +0x006A: 'GET.FORMULA',
  285 +0x006B: 'GET.NAME',
  286 +0x006C: 'SET.VALUE',
  287 +0x006D: 'LOG',
  288 +0x006E: 'EXEC',
  289 +0x006F: 'CHAR',
  290 +0x0070: 'LOWER',
  291 +0x0071: 'UPPER',
  292 +0x0072: 'PROPER',
  293 +0x0073: 'LEFT',
  294 +0x0074: 'RIGHT',
  295 +0x0075: 'EXACT',
  296 +0x0076: 'TRIM',
  297 +0x0077: 'REPLACE',
  298 +0x0078: 'SUBSTITUTE',
  299 +0x0079: 'CODE',
  300 +0x007A: 'NAMES',
  301 +0x007B: 'DIRECTORY',
  302 +0x007C: 'FIND',
  303 +0x007D: 'CELL',
  304 +0x007E: 'ISERR',
  305 +0x007F: 'ISTEXT',
  306 +0x0080: 'ISNUMBER',
  307 +0x0081: 'ISBLANK',
  308 +0x0082: 'T',
  309 +0x0083: 'N',
  310 +0x0084: 'FOPEN',
  311 +0x0085: 'FCLOSE',
  312 +0x0086: 'FSIZE',
  313 +0x0087: 'FREADLN',
  314 +0x0088: 'FREAD',
  315 +0x0089: 'FWRITELN',
  316 +0x008A: 'FWRITE',
  317 +0x008B: 'FPOS',
  318 +0x008C: 'DATEVALUE',
  319 +0x008D: 'TIMEVALUE',
  320 +0x008E: 'SLN',
  321 +0x008F: 'SYD',
  322 +0x0090: 'DDB',
  323 +0x0091: 'GET.DEF',
  324 +0x0092: 'REFTEXT',
  325 +0x0093: 'TEXTREF',
  326 +0x0094: 'INDIRECT',
  327 +0x0095: 'REGISTER',
  328 +0x0096: 'CALL',
  329 +0x0097: 'ADD.BAR',
  330 +0x0098: 'ADD.MENU',
  331 +0x0099: 'ADD.COMMAND',
  332 +0x009A: 'ENABLE.COMMAND',
  333 +0x009B: 'CHECK.COMMAND',
  334 +0x009C: 'RENAME.COMMAND',
  335 +0x009D: 'SHOW.BAR',
  336 +0x009E: 'DELETE.MENU',
  337 +0x009F: 'DELETE.COMMAND',
  338 +0x00A0: 'GET.CHART.ITEM',
  339 +0x00A1: 'DIALOG.BOX',
  340 +0x00A2: 'CLEAN',
  341 +0x00A3: 'MDETERM',
  342 +0x00A4: 'MINVERSE',
  343 +0x00A5: 'MMULT',
  344 +0x00A6: 'FILES',
  345 +0x00A7: 'IPMT',
  346 +0x00A8: 'PPMT',
  347 +0x00A9: 'COUNTA',
  348 +0x00AA: 'CANCEL.KEY',
  349 +0x00AB: 'FOR',
  350 +0x00AC: 'WHILE',
  351 +0x00AD: 'BREAK',
  352 +0x00AE: 'NEXT',
  353 +0x00AF: 'INITIATE',
  354 +0x00B0: 'REQUEST',
  355 +0x00B1: 'POKE',
  356 +0x00B2: 'EXECUTE',
  357 +0x00B3: 'TERMINATE',
  358 +0x00B4: 'RESTART',
  359 +0x00B5: 'HELP',
  360 +0x00B6: 'GET.BAR',
  361 +0x00B7: 'PRODUCT',
  362 +0x00B8: 'FACT',
  363 +0x00B9: 'GET.CELL',
  364 +0x00BA: 'GET.WORKSPACE',
  365 +0x00BB: 'GET.WINDOW',
  366 +0x00BC: 'GET.DOCUMENT',
  367 +0x00BD: 'DPRODUCT',
  368 +0x00BE: 'ISNONTEXT',
  369 +0x00BF: 'GET.NOTE',
  370 +0x00C0: 'NOTE',
  371 +0x00C1: 'STDEVP',
  372 +0x00C2: 'VARP',
  373 +0x00C3: 'DSTDEVP',
  374 +0x00C4: 'DVARP',
  375 +0x00C5: 'TRUNC',
  376 +0x00C6: 'ISLOGICAL',
  377 +0x00C7: 'DCOUNTA',
  378 +0x00C8: 'DELETE.BAR',
  379 +0x00C9: 'UNREGISTER',
  380 +0x00CC: 'USDOLLAR',
  381 +0x00CD: 'FINDB',
  382 +0x00CE: 'SEARCHB',
  383 +0x00CF: 'REPLACEB',
  384 +0x00D0: 'LEFTB',
  385 +0x00D1: 'RIGHTB',
  386 +0x00D2: 'MIDB',
  387 +0x00D3: 'LENB',
  388 +0x00D4: 'ROUNDUP',
  389 +0x00D5: 'ROUNDDOWN',
  390 +0x00D6: 'ASC',
  391 +0x00D7: 'DBCS',
  392 +0x00D8: 'RANK',
  393 +0x00DB: 'ADDRESS',
  394 +0x00DC: 'DAYS360',
  395 +0x00DD: 'TODAY',
  396 +0x00DE: 'VDB',
  397 +0x00DF: 'ELSE',
  398 +0x00E0: 'ELSE.IF',
  399 +0x00E1: 'END.IF',
  400 +0x00E2: 'FOR.CELL',
  401 +0x00E3: 'MEDIAN',
  402 +0x00E4: 'SUMPRODUCT',
  403 +0x00E5: 'SINH',
  404 +0x00E6: 'COSH',
  405 +0x00E7: 'TANH',
  406 +0x00E8: 'ASINH',
  407 +0x00E9: 'ACOSH',
  408 +0x00EA: 'ATANH',
  409 +0x00EB: 'DGET',
  410 +0x00EC: 'CREATE.OBJECT',
  411 +0x00ED: 'VOLATILE',
  412 +0x00EE: 'LAST.ERROR',
  413 +0x00EF: 'CUSTOM.UNDO',
  414 +0x00F0: 'CUSTOM.REPEAT',
  415 +0x00F1: 'FORMULA.CONVERT',
  416 +0x00F2: 'GET.LINK.INFO',
  417 +0x00F3: 'TEXT.BOX',
  418 +0x00F4: 'INFO',
  419 +0x00F5: 'GROUP',
  420 +0x00F6: 'GET.OBJECT',
  421 +0x00F7: 'DB',
  422 +0x00F8: 'PAUSE',
  423 +0x00FB: 'RESUME',
  424 +0x00FC: 'FREQUENCY',
  425 +0x00FD: 'ADD.TOOLBAR',
  426 +0x00FE: 'DELETE.TOOLBAR',
  427 +0x00FF: 'User Defined Function',
  428 +0x0100: 'RESET.TOOLBAR',
  429 +0x0101: 'EVALUATE',
  430 +0x0102: 'GET.TOOLBAR',
  431 +0x0103: 'GET.TOOL',
  432 +0x0104: 'SPELLING.CHECK',
  433 +0x0105: 'ERROR.TYPE',
  434 +0x0106: 'APP.TITLE',
  435 +0x0107: 'WINDOW.TITLE',
  436 +0x0108: 'SAVE.TOOLBAR',
  437 +0x0109: 'ENABLE.TOOL',
  438 +0x010A: 'PRESS.TOOL',
  439 +0x010B: 'REGISTER.ID',
  440 +0x010C: 'GET.WORKBOOK',
  441 +0x010D: 'AVEDEV',
  442 +0x010E: 'BETADIST',
  443 +0x010F: 'GAMMALN',
  444 +0x0110: 'BETAINV',
  445 +0x0111: 'BINOMDIST',
  446 +0x0112: 'CHIDIST',
  447 +0x0113: 'CHIINV',
  448 +0x0114: 'COMBIN',
  449 +0x0115: 'CONFIDENCE',
  450 +0x0116: 'CRITBINOM',
  451 +0x0117: 'EVEN',
  452 +0x0118: 'EXPONDIST',
  453 +0x0119: 'FDIST',
  454 +0x011A: 'FINV',
  455 +0x011B: 'FISHER',
  456 +0x011C: 'FISHERINV',
  457 +0x011D: 'FLOOR',
  458 +0x011E: 'GAMMADIST',
  459 +0x011F: 'GAMMAINV',
  460 +0x0120: 'CEILING',
  461 +0x0121: 'HYPGEOMDIST',
  462 +0x0122: 'LOGNORMDIST',
  463 +0x0123: 'LOGINV',
  464 +0x0124: 'NEGBINOMDIST',
  465 +0x0125: 'NORMDIST',
  466 +0x0126: 'NORMSDIST',
  467 +0x0127: 'NORMINV',
  468 +0x0128: 'NORMSINV',
  469 +0x0129: 'STANDARDIZE',
  470 +0x012A: 'ODD',
  471 +0x012B: 'PERMUT',
  472 +0x012C: 'POISSON',
  473 +0x012D: 'TDIST',
  474 +0x012E: 'WEIBULL',
  475 +0x012F: 'SUMXMY2',
  476 +0x0130: 'SUMX2MY2',
  477 +0x0131: 'SUMX2PY2',
  478 +0x0132: 'CHITEST',
  479 +0x0133: 'CORREL',
  480 +0x0134: 'COVAR',
  481 +0x0135: 'FORECAST',
  482 +0x0136: 'FTEST',
  483 +0x0137: 'INTERCEPT',
  484 +0x0138: 'PEARSON',
  485 +0x0139: 'RSQ',
  486 +0x013A: 'STEYX',
  487 +0x013B: 'SLOPE',
  488 +0x013C: 'TTEST',
  489 +0x013D: 'PROB',
  490 +0x013E: 'DEVSQ',
  491 +0x013F: 'GEOMEAN',
  492 +0x0140: 'HARMEAN',
  493 +0x0141: 'SUMSQ',
  494 +0x0142: 'KURT',
  495 +0x0143: 'SKEW',
  496 +0x0144: 'ZTEST',
  497 +0x0145: 'LARGE',
  498 +0x0146: 'SMALL',
  499 +0x0147: 'QUARTILE',
  500 +0x0148: 'PERCENTILE',
  501 +0x0149: 'PERCENTRANK',
  502 +0x014A: 'MODE',
  503 +0x014B: 'TRIMMEAN',
  504 +0x014C: 'TINV',
  505 +0x014E: 'MOVIE.COMMAND',
  506 +0x014F: 'GET.MOVIE',
  507 +0x0150: 'CONCATENATE',
  508 +0x0151: 'POWER',
  509 +0x0152: 'PIVOT.ADD.DATA',
  510 +0x0153: 'GET.PIVOT.TABLE',
  511 +0x0154: 'GET.PIVOT.FIELD',
  512 +0x0155: 'GET.PIVOT.ITEM',
  513 +0x0156: 'RADIANS',
  514 +0x0157: 'DEGREES',
  515 +0x0158: 'SUBTOTAL',
  516 +0x0159: 'SUMIF',
  517 +0x015A: 'COUNTIF',
  518 +0x015B: 'COUNTBLANK',
  519 +0x015C: 'SCENARIO.GET',
  520 +0x015D: 'OPTIONS.LISTS.GET',
  521 +0x015E: 'ISPMT',
  522 +0x015F: 'DATEDIF',
  523 +0x0160: 'DATESTRING',
  524 +0x0161: 'NUMBERSTRING',
  525 +0x0162: 'ROMAN',
  526 +0x0163: 'OPEN.DIALOG',
  527 +0x0164: 'SAVE.DIALOG',
  528 +0x0165: 'VIEW.GET',
  529 +0x0166: 'GETPIVOTDATA',
  530 +0x0167: 'HYPERLINK',
  531 +0x0168: 'PHONETIC',
  532 +0x0169: 'AVERAGEA',
  533 +0x016A: 'MAXA',
  534 +0x016B: 'MINA',
  535 +0x016C: 'STDEVPA',
  536 +0x016D: 'VARPA',
  537 +0x016E: 'STDEVA',
  538 +0x016F: 'VARA',
  539 +0x0170: 'BAHTTEXT',
  540 +0x0171: 'THAIDAYOFWEEK',
  541 +0x0172: 'THAIDIGIT',
  542 +0x0173: 'THAIMONTHOFYEAR',
  543 +0x0174: 'THAINUMSOUND',
  544 +0x0175: 'THAINUMSTRING',
  545 +0x0176: 'THAISTRINGLENGTH',
  546 +0x0177: 'ISTHAIDIGIT',
  547 +0x0178: 'ROUNDBAHTDOWN',
  548 +0x0179: 'ROUNDBAHTUP',
  549 +0x017A: 'THAIYEAR',
  550 +0x017B: 'RTD',
  551 +
  552 +0x8076: 'ALERT',
  553 +}
  554 +
  555 +dOpcodes = {
  556 + 0x06: 'FORMULA : Cell Formula',
  557 + 0x0A: 'EOF : End of File',
  558 + 0x0C: 'CALCCOUNT : Iteration Count',
  559 + 0x0D: 'CALCMODE : Calculation Mode',
  560 + 0x0E: 'PRECISION : Precision',
  561 + 0x0F: 'REFMODE : Reference Mode',
  562 + 0x10: 'DELTA : Iteration Increment',
  563 + 0x11: 'ITERATION : Iteration Mode',
  564 + 0x12: 'PROTECT : Protection Flag',
  565 + 0x13: 'PASSWORD : Protection Password',
  566 + 0x14: 'HEADER : Print Header on Each Page',
  567 + 0x15: 'FOOTER : Print Footer on Each Page',
  568 + 0x16: 'EXTERNCOUNT : Number of External References',
  569 + 0x17: 'EXTERNSHEET : External Reference',
  570 + 0x18: 'LABEL : Cell Value, String Constant',
  571 + 0x19: 'WINDOWPROTECT : Windows Are Protected',
  572 + 0x1A: 'VERTICALPAGEBREAKS : Explicit Column Page Breaks',
  573 + 0x1B: 'HORIZONTALPAGEBREAKS : Explicit Row Page Breaks',
  574 + 0x1C: 'NOTE : Comment Associated with a Cell',
  575 + 0x1D: 'SELECTION : Current Selection',
  576 + 0x22: '1904 : 1904 Date System',
  577 + 0x26: 'LEFTMARGIN : Left Margin Measurement',
  578 + 0x27: 'RIGHTMARGIN : Right Margin Measurement',
  579 + 0x28: 'TOPMARGIN : Top Margin Measurement',
  580 + 0x29: 'BOTTOMMARGIN : Bottom Margin Measurement',
  581 + 0x2A: 'PRINTHEADERS : Print Row/Column Labels',
  582 + 0x2B: 'PRINTGRIDLINES : Print Gridlines Flag',
  583 + 0x2F: 'FILEPASS : File Is Password-Protected',
  584 + 0x3C: 'CONTINUE : Continues Long Records',
  585 + 0x3D: 'WINDOW1 : Window Information',
  586 + 0x40: 'BACKUP : Save Backup Version of the File',
  587 + 0x41: 'PANE : Number of Panes and Their Position',
  588 + 0x42: 'CODENAME : VBE Object Name',
  589 + 0x42: 'CODEPAGE : Default Code Page',
  590 + 0x4D: 'PLS : Environment-Specific Print Record',
  591 + 0x50: 'DCON : Data Consolidation Information',
  592 + 0x51: 'DCONREF : Data Consolidation References',
  593 + 0x52: 'DCONNAME : Data Consolidation Named References',
  594 + 0x55: 'DEFCOLWIDTH : Default Width for Columns',
  595 + 0x59: 'XCT : CRN Record Count',
  596 + 0x5A: 'CRN : Nonresident Operands',
  597 + 0x5B: 'FILESHARING : File-Sharing Information',
  598 + 0x5C: 'WRITEACCESS : Write Access User Name',
  599 + 0x5D: 'OBJ : Describes a Graphic Object',
  600 + 0x5E: 'UNCALCED : Recalculation Status',
  601 + 0x5F: 'SAVERECALC : Recalculate Before Save',
  602 + 0x60: 'TEMPLATE : Workbook Is a Template',
  603 + 0x63: 'OBJPROTECT : Objects Are Protected',
  604 + 0x7D: 'COLINFO : Column Formatting Information',
  605 + 0x7E: 'RK : Cell Value, RK Number',
  606 + 0x7F: 'IMDATA : Image Data',
  607 + 0x80: 'GUTS : Size of Row and Column Gutters',
  608 + 0x81: 'WSBOOL : Additional Workspace Information',
  609 + 0x82: 'GRIDSET : State Change of Gridlines Option',
  610 + 0x83: 'HCENTER : Center Between Horizontal Margins',
  611 + 0x84: 'VCENTER : Center Between Vertical Margins',
  612 + 0x85: 'BOUNDSHEET : Sheet Information',
  613 + 0x86: 'WRITEPROT : Workbook Is Write-Protected',
  614 + 0x87: 'ADDIN : Workbook Is an Add-in Macro',
  615 + 0x88: 'EDG : Edition Globals',
  616 + 0x89: 'PUB : Publisher',
  617 + 0x8C: 'COUNTRY : Default Country and WIN.INI Country',
  618 + 0x8D: 'HIDEOBJ : Object Display Options',
  619 + 0x90: 'SORT : Sorting Options',
  620 + 0x91: 'SUB : Subscriber',
  621 + 0x92: 'PALETTE : Color Palette Definition',
  622 + 0x94: 'LHRECORD : .WK? File Conversion Information',
  623 + 0x95: 'LHNGRAPH : Named Graph Information',
  624 + 0x96: 'SOUND : Sound Note',
  625 + 0x98: 'LPR : Sheet Was Printed Using LINE.PRINT(',
  626 + 0x99: 'STANDARDWIDTH : Standard Column Width',
  627 + 0x9A: 'FNGROUPNAME : Function Group Name',
  628 + 0x9B: 'FILTERMODE : Sheet Contains Filtered List',
  629 + 0x9C: 'FNGROUPCOUNT : Built-in Function Group Count',
  630 + 0x9D: 'AUTOFILTERINFO : Drop-Down Arrow Count',
  631 + 0x9E: 'AUTOFILTER : AutoFilter Data',
  632 + 0xA0: 'SCL : Window Zoom Magnification',
  633 + 0xA1: 'SETUP : Page Setup',
  634 + 0xA9: 'COORDLIST : Polygon Object Vertex Coordinates',
  635 + 0xAB: 'GCW : Global Column-Width Flags',
  636 + 0xAE: 'SCENMAN : Scenario Output Data',
  637 + 0xAF: 'SCENARIO : Scenario Data',
  638 + 0xB0: 'SXVIEW : View Definition',
  639 + 0xB1: 'SXVD : View Fields',
  640 + 0xB2: 'SXVI : View Item',
  641 + 0xB4: 'SXIVD : Row/Column Field IDs',
  642 + 0xB5: 'SXLI : Line Item Array',
  643 + 0xB6: 'SXPI : Page Item',
  644 + 0xB8: 'DOCROUTE : Routing Slip Information',
  645 + 0xB9: 'RECIPNAME : Recipient Name',
  646 + 0xBC: 'SHRFMLA : Shared Formula',
  647 + 0xBD: 'MULRK : Multiple RK Cells',
  648 + 0xBE: 'MULBLANK : Multiple Blank Cells',
  649 + 0xC1: 'MMS : ADDMENU / DELMENU Record Group Count',
  650 + 0xC2: 'ADDMENU : Menu Addition',
  651 + 0xC3: 'DELMENU : Menu Deletion',
  652 + 0xC5: 'SXDI : Data Item',
  653 + 0xC6: 'SXDB : PivotTable Cache Data',
  654 + 0xCD: 'SXSTRING : String',
  655 + 0xD0: 'SXTBL : Multiple Consolidation Source Info',
  656 + 0xD1: 'SXTBRGIITM : Page Item Name Count',
  657 + 0xD2: 'SXTBPG : Page Item Indexes',
  658 + 0xD3: 'OBPROJ : Visual Basic Project',
  659 + 0xD5: 'SXIDSTM : Stream ID',
  660 + 0xD6: 'RSTRING : Cell with Character Formatting',
  661 + 0xD7: 'DBCELL : Stream Offsets',
  662 + 0xDA: 'BOOKBOOL : Workbook Option Flag',
  663 + 0xDC: 'PARAMQRY : Query Parameters',
  664 + 0xDC: 'SXEXT : External Source Information',
  665 + 0xDD: 'SCENPROTECT : Scenario Protection',
  666 + 0xDE: 'OLESIZE : Size of OLE Object',
  667 + 0xDF: 'UDDESC : Description String for Chart Autoformat',
  668 + 0xE0: 'XF : Extended Format',
  669 + 0xE1: 'INTERFACEHDR : Beginning of User Interface Records',
  670 + 0xE2: 'INTERFACEEND : End of User Interface Records',
  671 + 0xE3: 'SXVS : View Source',
  672 + 0xE5: 'MERGECELLS : Merged Cells',
  673 + 0xEA: 'TABIDCONF : Sheet Tab ID of Conflict History',
  674 + 0xEB: 'MSODRAWINGGROUP : Microsoft Office Drawing Group',
  675 + 0xEC: 'MSODRAWING : Microsoft Office Drawing',
  676 + 0xED: 'MSODRAWINGSELECTION : Microsoft Office Drawing Selection',
  677 + 0xF0: 'SXRULE : PivotTable Rule Data',
  678 + 0xF1: 'SXEX : PivotTable View Extended Information',
  679 + 0xF2: 'SXFILT : PivotTable Rule Filter',
  680 + 0xF4: 'SXDXF : Pivot Table Formatting',
  681 + 0xF5: 'SXITM : Pivot Table Item Indexes',
  682 + 0xF6: 'SXNAME : PivotTable Name',
  683 + 0xF7: 'SXSELECT : PivotTable Selection Information',
  684 + 0xF8: 'SXPAIR : PivotTable Name Pair',
  685 + 0xF9: 'SXFMLA : Pivot Table Parsed Expression',
  686 + 0xFB: 'SXFORMAT : PivotTable Format Record',
  687 + 0xFC: 'SST : Shared String Table',
  688 + 0xFD: 'LABELSST : Cell Value, String Constant/ SST',
  689 + 0xFF: 'EXTSST : Extended Shared String Table',
  690 + 0x100: 'SXVDEX : Extended PivotTable View Fields',
  691 + 0x103: 'SXFORMULA : PivotTable Formula Record',
  692 + 0x122: 'SXDBEX : PivotTable Cache Data',
  693 + 0x13D: 'TABID : Sheet Tab Index Array',
  694 + 0x160: 'USESELFS : Natural Language Formulas Flag',
  695 + 0x161: 'DSF : Double Stream File',
  696 + 0x162: 'XL5MODIFY : Flag for DSF',
  697 + 0x1A5: 'FILESHARING2 : File-Sharing Information for Shared Lists',
  698 + 0x1A9: 'USERBVIEW : Workbook Custom View Settings',
  699 + 0x1AA: 'USERSVIEWBEGIN : Custom View Settings',
  700 + 0x1AB: 'USERSVIEWEND : End of Custom View Records',
  701 + 0x1AD: 'QSI : External Data Range',
  702 + 0x1AE: 'SUPBOOK : Supporting Workbook',
  703 + 0x1AF: 'PROT4REV : Shared Workbook Protection Flag',
  704 + 0x1B0: 'CONDFMT : Conditional Formatting Range Information',
  705 + 0x1B1: 'CF : Conditional Formatting Conditions',
  706 + 0x1B2: 'DVAL : Data Validation Information',
  707 + 0x1B5: 'DCONBIN : Data Consolidation Information',
  708 + 0x1B6: 'TXO : Text Object',
  709 + 0x1B7: 'REFRESHALL : Refresh Flag',
  710 + 0x1B8: 'HLINK : Hyperlink',
  711 + 0x1BB: 'SXFDBTYPE : SQL Datatype Identifier',
  712 + 0x1BC: 'PROT4REVPASS : Shared Workbook Protection Password',
  713 + 0x1BE: 'DV : Data Validation Criteria',
  714 + 0x1C0: 'EXCEL9FILE : Excel 9 File',
  715 + 0x1C1: 'RECALCID : Recalc Information',
  716 + 0x200: 'DIMENSIONS : Cell Table Size',
  717 + 0x201: 'BLANK : Cell Value, Blank Cell',
  718 + 0x203: 'NUMBER : Cell Value, Floating-Point Number',
  719 + 0x204: 'LABEL : Cell Value, String Constant',
  720 + 0x205: 'BOOLERR : Cell Value, Boolean or Error',
  721 + 0x207: 'STRING : String Value of a Formula',
  722 + 0x208: 'ROW : Describes a Row',
  723 + 0x20B: 'INDEX : Index Record',
  724 + 0x218: 'NAME : Defined Name',
  725 + 0x221: 'ARRAY : Array-Entered Formula',
  726 + 0x223: 'EXTERNNAME : Externally Referenced Name',
  727 + 0x225: 'DEFAULTROWHEIGHT : Default Row Height',
  728 + 0x231: 'FONT : Font Description',
  729 + 0x236: 'TABLE : Data Table',
  730 + 0x23E: 'WINDOW2 : Sheet Window Information',
  731 + 0x293: 'STYLE : Style Information',
  732 + 0x406: 'FORMULA : Cell Formula',
  733 + 0x41E: 'FORMAT : Number Format',
  734 + 0x800: 'HLINKTOOLTIP : Hyperlink Tooltip',
  735 + 0x801: 'WEBPUB : Web Publish Item',
  736 + 0x802: 'QSISXTAG : PivotTable and Query Table Extensions',
  737 + 0x803: 'DBQUERYEXT : Database Query Extensions',
  738 + 0x804: 'EXTSTRING : FRT String',
  739 + 0x805: 'TXTQUERY : Text Query Information',
  740 + 0x806: 'QSIR : Query Table Formatting',
  741 + 0x807: 'QSIF : Query Table Field Formatting',
  742 + 0x809: 'BOF : Beginning of File',
  743 + 0x80A: 'OLEDBCONN : OLE Database Connection',
  744 + 0x80B: 'WOPT : Web Options',
  745 + 0x80C: 'SXVIEWEX : Pivot Table OLAP Extensions',
  746 + 0x80D: 'SXTH : PivotTable OLAP Hierarchy',
  747 + 0x80E: 'SXPIEX : OLAP Page Item Extensions',
  748 + 0x80F: 'SXVDTEX : View Dimension OLAP Extensions',
  749 + 0x810: 'SXVIEWEX9 : Pivot Table Extensions',
  750 + 0x812: 'CONTINUEFRT : Continued FRT',
  751 + 0x813: 'REALTIMEDATA : Real-Time Data (RTD)',
  752 + 0x862: 'SHEETEXT : Extra Sheet Info',
  753 + 0x863: 'BOOKEXT : Extra Book Info',
  754 + 0x864: 'SXADDL : Pivot Table Additional Info',
  755 + 0x865: 'CRASHRECERR : Crash Recovery Error',
  756 + 0x866: 'HFPicture : Header / Footer Picture',
  757 + 0x867: 'FEATHEADR : Shared Feature Header',
  758 + 0x868: 'FEAT : Shared Feature Record',
  759 + 0x86A: 'DATALABEXT : Chart Data Label Extension',
  760 + 0x86B: 'DATALABEXTCONTENTS : Chart Data Label Extension Contents',
  761 + 0x86C: 'CELLWATCH : Cell Watch',
  762 + 0x86d: 'FEATINFO : Shared Feature Info Record',
  763 + 0x871: 'FEATHEADR11 : Shared Feature Header 11',
  764 + 0x872: 'FEAT11 : Shared Feature 11 Record',
  765 + 0x873: 'FEATINFO11 : Shared Feature Info 11 Record',
  766 + 0x874: 'DROPDOWNOBJIDS : Drop Down Object',
  767 + 0x875: 'CONTINUEFRT11 : Continue FRT 11',
  768 + 0x876: 'DCONN : Data Connection',
  769 + 0x877: 'LIST12 : Extra Table Data Introduced in Excel 2007',
  770 + 0x878: 'FEAT12 : Shared Feature 12 Record',
  771 + 0x879: 'CONDFMT12 : Conditional Formatting Range Information 12',
  772 + 0x87A: 'CF12 : Conditional Formatting Condition 12',
  773 + 0x87B: 'CFEX : Conditional Formatting Extension',
  774 + 0x87C: 'XFCRC : XF Extensions Checksum',
  775 + 0x87D: 'XFEXT : XF Extension',
  776 + 0x87E: 'EZFILTER12 : AutoFilter Data Introduced in Excel 2007',
  777 + 0x87F: 'CONTINUEFRT12 : Continue FRT 12',
  778 + 0x881: 'SXADDL12 : Additional Workbook Connections Information',
  779 + 0x884: 'MDTINFO : Information about a Metadata Type',
  780 + 0x885: 'MDXSTR : MDX Metadata String',
  781 + 0x886: 'MDXTUPLE : Tuple MDX Metadata',
  782 + 0x887: 'MDXSET : Set MDX Metadata',
  783 + 0x888: 'MDXPROP : Member Property MDX Metadata',
  784 + 0x889: 'MDXKPI : Key Performance Indicator MDX Metadata',
  785 + 0x88A: 'MDTB : Block of Metadata Records',
  786 + 0x88B: 'PLV : Page Layout View Settings in Excel 2007',
  787 + 0x88C: 'COMPAT12 : Compatibility Checker 12',
  788 + 0x88D: 'DXF : Differential XF',
  789 + 0x88E: 'TABLESTYLES : Table Styles',
  790 + 0x88F: 'TABLESTYLE : Table Style',
  791 + 0x890: 'TABLESTYLEELEMENT : Table Style Element',
  792 + 0x892: 'STYLEEXT : Named Cell Style Extension',
  793 + 0x893: 'NAMEPUBLISH : Publish To Excel Server Data for Name',
  794 + 0x894: 'NAMECMT : Name Comment',
  795 + 0x895: 'SORTDATA12 : Sort Data 12',
  796 + 0x896: 'THEME : Theme',
  797 + 0x897: 'GUIDTYPELIB : VB Project Typelib GUID',
  798 + 0x898: 'FNGRP12 : Function Group',
  799 + 0x899: 'NAMEFNGRP12 : Extra Function Group',
  800 + 0x89A: 'MTRSETTINGS : Multi-Threaded Calculation Settings',
  801 + 0x89B: 'COMPRESSPICTURES : Automatic Picture Compression Mode',
  802 + 0x89C: 'HEADERFOOTER : Header Footer',
  803 + 0x8A3: 'FORCEFULLCALCULATION : Force Full Calculation Settings',
  804 + 0x8c1: 'LISTOBJ : List Object',
  805 + 0x8c2: 'LISTFIELD : List Field',
  806 + 0x8c3: 'LISTDV : List Data Validation',
  807 + 0x8c4: 'LISTCONDFMT : List Conditional Formatting',
  808 + 0x8c5: 'LISTCF : List Cell Formatting',
  809 + 0x8c6: 'FMQRY : Filemaker queries',
  810 + 0x8c7: 'FMSQRY : File maker queries',
  811 + 0x8c8: 'PLV : Page Layout View in Mac Excel 11',
  812 + 0x8c9: 'LNEXT : Extension information for borders in Mac Office 11',
  813 + 0x8ca: 'MKREXT : Extension information for markers in Mac Office 11'
  814 +}
  815 +
  816 +
  817 +# CIC: Call If Callable
  818 +def CIC(expression):
  819 + if callable(expression):
  820 + return expression()
  821 + else:
  822 + return expression
  823 +
  824 +
  825 +# IFF: IF Function
  826 +def IFF(expression, valueTrue, valueFalse):
  827 + if expression:
  828 + return CIC(valueTrue)
  829 + else:
  830 + return CIC(valueFalse)
  831 +
  832 +
  833 +def CombineHexASCII(hexDump, asciiDump, length):
  834 + if hexDump == '':
  835 + return ''
  836 + return hexDump + ' ' + (' ' * (3 * (length - len(asciiDump)))) + asciiDump
  837 +
  838 +def HexASCII(data, length=16):
  839 + result = []
  840 + if len(data) > 0:
  841 + hexDump = ''
  842 + asciiDump = ''
  843 + for i, b in enumerate(data):
  844 + if i % length == 0:
  845 + if hexDump != '':
  846 + result.append(CombineHexASCII(hexDump, asciiDump, length))
  847 + hexDump = '%08X:' % i
  848 + asciiDump = ''
  849 + hexDump += ' %02X' % ord(b)
  850 + asciiDump += IFF(ord(b) >= 32, b, '.')
  851 + result.append(CombineHexASCII(hexDump, asciiDump, length))
  852 + return result
  853 +
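Note on `HexASCII` above: it calls `ord(b)` on every item of `data`. However, `cBIFF.Analyze` passes slices of a `bytearray`, and iterating a `bytearray` already yields ints on both Python 2 and 3. As a result, the `-a/--hexascii` path appears to raise `TypeError`. The sketch below shows one way to produce the same kind of dump from either `bytes` or `bytearray`. `hex_ascii` is a hypothetical helper, not part of oletools. Unlike the original, it treats only 0x20-0x7E as printable, so bytes >= 0x7F are shown as `.`.

```python
def hex_ascii(data, width=16):
    """Return hex/ASCII dump lines for bytes or bytearray input (hypothetical helper)."""
    buf = bytearray(data)  # items are ints on Python 2 and 3, so no ord() is needed
    lines = []
    for offset in range(0, len(buf), width):
        chunk = buf[offset:offset + width]
        hex_part = ' '.join('%02X' % b for b in chunk)
        # Only 0x20-0x7E are shown as text; every other byte becomes '.'
        text_part = ''.join(chr(b) if 32 <= b < 127 else '.' for b in chunk)
        # Pad the hex column so the text column lines up on a short last row
        lines.append('%08X: %-*s  %s' % (offset, width * 3 - 1, hex_part, text_part))
    return lines


for line in hex_ascii(b'Auto_Open\x00\x01', 8):
    print(line)
```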
  854 +def StringsASCII(data):
  855 + """
  856 + Extract a list of plain ASCII strings of 4+ chars found in data.
  857 + :param data: bytearray or bytes
  858 + :return: list of str (converted to unicode on Python 3)
  859 + """
  860 + # list of bytes strings:
  861 + bytes_strings = re.findall(b'[^\x00-\x08\x0A-\x1F\x7F-\xFF]{4,}', bytes(data))
  862 + return [bytes2str(bs) for bs in bytes_strings]
  863 +
  864 +def StringsUNICODE(data):
  865 + """
  866 + Extract a list of Unicode strings (made of 4+ plain ASCII characters only) found in data.
  867 + :param data: bytearray or bytes
  868 + :return: list of str (converted to unicode on Python 3)
  869 + """
  870 + # list of bytes strings:
  871 + # TODO: check if the null byte should be before or after the ascii byte
  872 + bytes_strings = [foundunicodestring.replace(b'\x00', b'') for foundunicodestring, dummy in re.findall(b'(([^\x00-\x08\x0A-\x1F\x7F-\xFF]\x00){4,})', bytes(data))]
  873 + return [bytes2str(bs) for bs in bytes_strings]
  874 +
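About the TODO in `StringsUNICODE`: BIFF8 stores uncompressed Unicode text as UTF-16LE. For plain-ASCII characters, the low (ASCII) byte therefore comes first and the NUL high byte follows it, which is what the existing regex assumes. The sketch below restates both extractors with the same accepted byte set (TAB plus 0x20-0x7E, the complement of the negated class used above). It decodes the UTF-16 runs directly instead of stripping NUL bytes. The function names are hypothetical.

```python
import re

# TAB plus printable ASCII: the same byte set as [^\x00-\x08\x0A-\x1F\x7F-\xFF]
_ASCII_RUN = re.compile(rb'[\x09\x20-\x7E]{4,}')
# The same characters, each followed by a NUL high byte (UTF-16LE)
_UTF16LE_RUN = re.compile(rb'(?:[\x09\x20-\x7E]\x00){4,}')


def ascii_strings(data):
    """Runs of 4+ printable ASCII bytes, decoded to str."""
    return [m.decode('ascii') for m in _ASCII_RUN.findall(bytes(data))]


def utf16le_strings(data):
    """Runs of 4+ ASCII-range UTF-16LE characters, decoded to str."""
    return [m.decode('utf-16-le') for m in _UTF16LE_RUN.findall(bytes(data))]


print(utf16le_strings('Auto_Open'.encode('utf-16-le')))
```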
  875 +def Strings(data, encodings='sL'):
  876 + """
  877 +
  878 + :param data bytearray: bytearray, data to be scanned for strings
  879 + :param encodings:
  880 + :return: dict with key = 's' or 'L', values = list of str
  881 + """
  882 + dStrings = {}
  883 + for encoding in encodings:
  884 + if encoding == 's':
  885 + dStrings[encoding] = StringsASCII(data)
  886 + elif encoding == 'L':
  887 + dStrings[encoding] = StringsUNICODE(data)
  888 + return dStrings
  889 +
  890 +def ContainsWord(word, expression):
  891 + return struct.pack('<H', word) in expression
  892 +
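`ContainsWord` backs the fallback at the end of `ParseExpression`: when parsing stops early, the remaining bytes are searched for the little-endian function IDs of EXEC (0x006E) and REGISTER (0x0095). This is a raw byte search, so it can produce false positives, which is presumably why the message says "Could contain". For example, 0x6E is ASCII `n`, so any UTF-16LE text containing an `n` also matches EXEC. A minimal restatement, with a hypothetical name:

```python
import struct


def contains_word(word, data):
    # Same check as ContainsWord: look for the 16-bit value in little-endian byte order
    return struct.pack('<H', word) in bytes(data)


# A genuine ptgFuncVarV token calling EXEC with one argument...
print(contains_word(0x6E, bytes([0x42, 0x01, 0x6E, 0x00])))
# ...but plain UTF-16LE text containing 'n' also matches
print(contains_word(0x6E, 'Open'.encode('utf-16-le')))
```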
  893 +# https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/6e5eed10-5b77-43d6-8dd0-37345f8654ad
  894 +def ParseLoc(expression):
  895 + """
  896 +
  897 + :param expression bytearray: bytearray, data to be parsed
  898 + :return:
  899 + :rtype: str
  900 + """
  901 + formatcodes = 'HH'
  902 + formatsize = struct.calcsize(formatcodes)
  903 + row, column = struct.unpack(formatcodes, expression[0:formatsize])
  904 + rowRelative = column & 0x8000
  905 + colRelative = column & 0x4000
  906 + column = column & 0x3FFF
  907 + if rowRelative:
  908 + rowindicator = '~'
  909 + else:
  910 + rowindicator = ''
  911 + row += 1
  912 + if colRelative:
  913 + colindicator = '~'
  914 + else:
  915 + colindicator = ''
  916 + column += 1
  917 + return 'R%s%dC%s%d' % (rowindicator, row, colindicator, column)
  918 +
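`ParseLoc` reads a row word followed by a column word. In the column word, bit 0x8000 flags a relative row, bit 0x4000 flags a relative column, and the low 14 bits hold the column index. Absolute parts are printed 1-based, while relative parts are printed raw with a `~` prefix. The sketch below restates that logic with an explicit little-endian format. The original uses native `'HH'`, which gives the same result on little-endian hosts. `parse_loc` is a hypothetical name.

```python
import struct


def parse_loc(data):
    """Mirror ParseLoc for a 4-byte cell location (hypothetical helper)."""
    row, col_word = struct.unpack('<HH', bytes(data[:4]))
    row_relative = col_word & 0x8000
    col_relative = col_word & 0x4000
    column = col_word & 0x3FFF
    # Relative parts are printed raw with '~'; absolute ones are made 1-based
    row_txt = '~%d' % row if row_relative else '%d' % (row + 1)
    col_txt = '~%d' % column if col_relative else '%d' % (column + 1)
    return 'R%sC%s' % (row_txt, col_txt)


print(parse_loc(struct.pack('<HH', 4, 2)))
```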
  919 +def ParseExpression(expression):
  920 + '''
  921 + Parse an expression into a human readable string.
  922 +
  923 + :param expression bytearray: bytearray, expression data to be parsed
  924 + :return: str, parsed expression as a string (bytes on Python 2, unicode on python 3)
  925 + :rtype: str
  926 + '''
  927 + result = ''
  928 + while len(expression) > 0:
  929 + ptgid = expression[0] # int
  930 + expression = expression[1:] # bytearray
  931 + if ptgid in dTokens:
  932 + result += dTokens[ptgid] + ' '
  933 + if ptgid == 0x17: # ptgStr
  934 + length = expression[0] # int
  935 + expression = expression[1:]
  936 + if expression[0] == 0: # probably BIFF8 -> UNICODE (compressed)
  937 + expression = expression[1:]
  938 + result += '"%s" ' % bytes2str(expression[:length])
  939 + expression = expression[length:]
  940 + elif ptgid == 0x19: # ptgAttr
  941 + grbit = expression[0] # int
  942 + expression = expression[1:]
  943 + if grbit & 0x04:
  944 + result += 'CHOOSE '
  945 + break
  946 + else:
  947 + expression = expression[2:]
  948 + elif ptgid == 0x16 or ptgid == 0x0e: # 0x0E: 'ptgNE', 0x16: 'ptgMissArg'
  949 + pass
  950 + elif ptgid == 0x1e: # ptgInt
  951 + result += '%d ' % (expression[0] + expression[1] * 0x100)
  952 + expression = expression[2:]
  953 + elif ptgid == 0x41: # ptgFuncV
  954 + functionid = expression[0] + expression[1] * 0x100
  955 + result += '%s (0x%04x) ' % (dFunctions.get(functionid, '*UNKNOWN FUNCTION*'), functionid)
  956 + expression = expression[2:]
  957 + elif ptgid == 0x22 or ptgid == 0x42: # 0x22: 'ptgFuncVar', 0x42: 'ptgFuncVarV'
  958 + functionid = expression[1] + expression[2] * 0x100
  959 + result += 'args %d func %s (0x%04x) ' % (expression[0], dFunctions.get(functionid, '*UNKNOWN FUNCTION*'), functionid)
  960 + expression = expression[3:]
  961 + elif ptgid == 0x23: # ptgName
  962 + result += '%04x ' % (expression[0] + expression[1] * 0x100)
  963 + # TODO: looks like we're skipping quite a few bytes
  964 + expression = expression[14:]
  965 + elif ptgid == 0x1f: # ptgNum
  966 + result += 'FLOAT '
  967 + # TODO: looks like we're skipping quite a few bytes
  968 + expression = expression[8:]
  969 + elif ptgid == 0x26: # ptgMemArea
  970 + expression = expression[4:] # skipping 4 bytes
  971 + expression = expression[expression[0] + expression[1] * 0x100:]
  972 + result += 'REFERENCE-EXPRESSION '
  973 + elif ptgid == 0x01: # ptgExp
  974 + formatcodes = 'HH'
  975 + formatsize = struct.calcsize(formatcodes)
  976 + row, column = struct.unpack(formatcodes, expression[0:formatsize])
  977 + expression = expression[formatsize:]
  978 + result += 'R%dC%d ' % (row + 1, column + 1)
  979 + elif ptgid == 0x24 or ptgid == 0x44: # 0x24: 'ptgRef', 0x44: 'ptgRefV'
  980 + result += '%s ' % ParseLoc(expression)
  981 + expression = expression[4:]
  982 + elif ptgid == 0x3A or ptgid == 0x5A: # 0x3A: 'ptgRef3d', 0x5A: 'ptgRef3dV'
  983 + result += '%s ' % ParseLoc(expression[2:])
  984 + expression = expression[6:]
  985 + else:
  986 + break
  987 + else:
  988 + result += '*UNKNOWN TOKEN* '
  989 + break
  990 + if len(expression) == 0:
  991 + return result
  992 + else:
  993 + # 0x006E: 'EXEC', 0x0095: 'REGISTER'
  994 + functions = [dFunctions[functionid] for functionid in [0x6E, 0x95] if ContainsWord(functionid, expression)]
  995 + if functions != []:
  996 + message = ' Could contain following functions: ' + ','.join(functions) + ' -'
  997 + else:
  998 + message = ''
  999 + return result + ' *INCOMPLETE FORMULA PARSING*' + message + ' Remaining, unparsed expression: ' + repr(expression)
  1000 +
  1001 +
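`ParseExpression` walks the formula one ptg token at a time. Each token ID determines how many bytes follow, and parsing stops at the first token it does not know. The Python 3 sketch below handles three of those tokens the same way: ptgInt (a 16-bit integer), ptgStr, and ptgFuncVar/ptgFuncVarV (an argument-count byte followed by a 16-bit function ID).

For ptgStr, the original only skips the byte after the length when that byte is 0. The sketch assumes the BIFF8 layout, where that byte carries a flag meaning "2 bytes per character", and also decodes the UTF-16LE case. The two-entry `FUNCTIONS` table and `walk_tokens` are hypothetical stand-ins for `dFunctions` and `ParseExpression`. The sketch does no bounds checking beyond what Python raises on truncated input.

```python
import struct

# Hypothetical two-entry table; ParseExpression uses the full dFunctions dict
FUNCTIONS = {0x6E: 'EXEC', 0x95: 'REGISTER'}


def walk_tokens(expr):
    """Decode a small subset of ptg tokens from formula bytes (illustrative only)."""
    expr = bytes(expr)
    out = []
    i = 0
    while i < len(expr):
        ptg = expr[i]
        i += 1
        if ptg == 0x1E:  # ptgInt: unsigned 16-bit integer
            (value,) = struct.unpack_from('<H', expr, i)
            out.append(str(value))
            i += 2
        elif ptg == 0x17:  # ptgStr: character count, flags byte, then characters
            count, flags = expr[i], expr[i + 1]
            i += 2
            if flags & 0x01:  # assumed high-byte flag: 2 bytes per character
                text = expr[i:i + 2 * count].decode('utf-16-le')
                i += 2 * count
            else:  # compressed: 1 byte per character
                text = expr[i:i + count].decode('latin-1')
                i += count
            out.append('"%s"' % text)
        elif ptg in (0x22, 0x42):  # ptgFuncVar / ptgFuncVarV: arg count, then function id
            argc = expr[i]
            (func_id,) = struct.unpack_from('<H', expr, i + 1)
            out.append('%s(%d args)' % (FUNCTIONS.get(func_id, hex(func_id)), argc))
            i += 3
        else:
            out.append('<unhandled ptg 0x%02X>' % ptg)
            break
    return out


print(walk_tokens(bytes([0x17, 4, 0]) + b'calc' + bytes([0x42, 1, 0x6E, 0x00])))
```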
  1002 +class cBIFF(object): # cPluginParent):
  1003 + macroOnly = False
  1004 + name = 'BIFF plugin'
  1005 +
  1006 + def __init__(self, name, stream, options):
  1007 + self.streamname = name
  1008 + self.stream = stream
  1009 + self.options = options
  1010 + self.ran = False
  1011 +
  1012 + def Analyze(self):
  1013 + result = []
  1014 + macros4Found = False
  1015 + if self.streamname in [['Workbook'], ['Book']]:
  1016 + self.ran = True
  1017 + # use a bytearray to have Python 2+3 compatibility with the same code (no need for ord())
  1018 + stream = bytearray(self.stream)
  1019 +
  1020 + oParser = optparse.OptionParser()
  1021 + oParser.add_option('-s', '--strings', action='store_true', default=False, help='Dump strings')
  1022 + oParser.add_option('-a', '--hexascii', action='store_true', default=False, help='Dump hex ascii')
  1023 + oParser.add_option('-x', '--xlm', action='store_true', default=False, help='Select all records relevant for Excel 4.0 macros')
  1024 + oParser.add_option('-o', '--opcode', type=str, default='', help='Opcode to filter for')
  1025 + oParser.add_option('-f', '--find', type=str, default='', help='Content to search for')
  1026 + (options, args) = oParser.parse_args(self.options.split(' '))
  1027 +
  1028 + if options.find.startswith('0x'):
  1029 + options.find = binascii.a2b_hex(options.find[2:])
  1030 +
  1031 + while len(stream)>0:
  1032 + formatcodes = 'HH'
  1033 + formatsize = struct.calcsize(formatcodes)
  1034 + # print('formatsize=%d' % formatsize)
  1035 + opcode, length = struct.unpack(formatcodes, stream[0:formatsize])
  1036 + # print('opcode=%d length=%d len(stream)=%d' % (opcode, length, len(stream)))
  1037 + stream = stream[formatsize:]
  1038 + data = stream[:length]
  1039 + stream = stream[length:]
  1040 +
  1041 + if opcode in dOpcodes:
  1042 + opcodename = dOpcodes[opcode]
  1043 + else:
  1044 + opcodename = ''
  1045 + line = '%04x %6d %s' % (opcode, length, opcodename)
  1046 + # print(line)
  1047 +
  1048 + # FORMULA record
  1049 + if opcode == 0x06 and len(data) >= 21:
  1050 + formatcodes = 'HH'
  1051 + formatsize = struct.calcsize(formatcodes)
  1052 + row, column = struct.unpack(formatcodes, data[0:formatsize])
  1053 + formatcodes = 'H'
  1054 + formatsize = struct.calcsize(formatcodes)
  1055 + length = struct.unpack(formatcodes, data[20:20 + formatsize])[0]
  1056 + expression = data[22:]
  1057 + line += ' - R%dC%d len=%d %s' % (row + 1, column + 1, length, ParseExpression(expression))
  1058 + # print(line)
  1059 +
  1060 + # FORMULA record #a# difference BIFF4 and BIFF5+
  1061 + if opcode == 0x18 and len(data) >= 16:
  1062 + if data[0] & 0x20:
  1063 + dBuildInNames = {1: 'Auto_Open', 2: 'Auto_Close'}
  1064 + code = data[14]
  1065 + if code == 0: #a# hack with BIFF8 Unicode
  1066 + code = data[15]
  1067 + line += ' - build-in-name %d %s' % (code, dBuildInNames.get(code, '?'))
  1068 + else:
  1069 + pass
  1070 + line += ' - %s' % bytes2str(data[14:14+data[3]])
  1071 + # print(line)
  1072 +
  1073 + # BOUNDSHEET record
  1074 + if opcode == 0x85 and len(data) >= 6:
  1075 + dSheetType = {0: 'worksheet or dialog sheet', 1: 'Excel 4.0 macro sheet', 2: 'chart', 6: 'Visual Basic module'}
  1076 + if data[5] == 1:
  1077 + macros4Found = True
  1078 + dSheetState = {0: 'visible', 1: 'hidden', 2: 'very hidden'}
  1079 + line += ' - %s, %s' % (dSheetType.get(data[5], '%02x' % data[5]), dSheetState.get(data[4], '%02x' % data[4]))
  1080 + # print(line)
  1081 +
  1082 + # STRING record
  1083 + if opcode == 0x207 and len(data) >= 4:
  1084 + values = list(Strings(data[3:]).values())
  1085 + strings = ''
  1086 + if values[0] != []:
  1087 + strings += ' '.join(values[0])
  1088 + if values[1] != []:
  1089 + if strings != '':
  1090 + strings += ' '
  1091 + strings += ' '.join(values[1])
  1092 + line += ' - %s' % strings
  1093 + # print(line)
  1094 +
  1095 + if options.find == '' and options.opcode == '' and not options.xlm or options.opcode != '' and options.opcode.lower() in line.lower() or options.find != '' and options.find in data or options.xlm and opcode in [0x06, 0x18, 0x85, 0x207]:
  1096 + result.append(line)
  1097 +
  1098 + if options.hexascii:
  1099 + result.extend(' ' + foundstring for foundstring in HexASCII(data, 8))
  1100 + elif options.strings:
  1101 + dEncodings = {'s': 'ASCII', 'L': 'UNICODE'}
  1102 + for encoding, strings in Strings(data).items():
  1103 + if len(strings) > 0:
  1104 + result.append(' ' + dEncodings[encoding] + ':')
  1105 + result.extend(' ' + foundstring for foundstring in strings)
  1106 +
  1107 + if options.xlm and not macros4Found:
  1108 + result = []
  1109 +
  1110 + return result
  1111 +
  1112 +# AddPlugin(cBIFF)
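The core of `cBIFF.Analyze` is a walk over BIFF records: a 16-bit opcode, a 16-bit payload length, then the payload. As written, the loop re-slices `stream` on every record, which copies the remaining bytes each time. If 1 to 3 trailing bytes are left, `struct.unpack` on the short slice raises `struct.error`.

The sketch below uses an offset instead and stops when a full header no longer fits. It then applies the same BOUNDSHEET (0x85) test that sets `macros4Found`: byte 4 is the visibility state and byte 5 is the sheet type, with 1 meaning an Excel 4.0 macro sheet. Both function names are hypothetical.

```python
import struct


def iter_biff_records(stream):
    """Yield (opcode, payload) for each BIFF record in stream (hypothetical helper)."""
    stream = bytes(stream)
    offset = 0
    # Stop when fewer than 4 bytes remain instead of letting struct fail
    while offset + 4 <= len(stream):
        opcode, length = struct.unpack_from('<HH', stream, offset)
        offset += 4
        yield opcode, stream[offset:offset + length]
        offset += length


def has_excel4_macro_sheet(stream):
    # BOUNDSHEET (0x85): payload byte 5 is the sheet type, 1 = Excel 4.0 macro sheet
    return any(opcode == 0x85 and len(data) >= 6 and data[5] == 1
               for opcode, data in iter_biff_records(stream))


record = struct.pack('<HH', 0x85, 6) + bytes([0, 0, 0, 0, 1, 1])
print(has_excel4_macro_sheet(record))
```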
oletools/thirdparty/tablestream/tablestream.py
@@ -55,8 +55,9 @@ from __future__ import print_function @@ -55,8 +55,9 @@ from __future__ import print_function
55 # 2016-08-28 v0.07 PL: - support for both Python 2.6+ and 3.x 55 # 2016-08-28 v0.07 PL: - support for both Python 2.6+ and 3.x
56 # - all cells are converted to unicode 56 # - all cells are converted to unicode
57 # 2018-09-22 v0.08 PL: - removed mention to oletools' thirdparty folder 57 # 2018-09-22 v0.08 PL: - removed mention to oletools' thirdparty folder
  58 +# 2019-03-27 v0.09 PL: - slight fix, TableStyleSlim inherits from TableStyle
58 59
59 -__version__ = '0.08' 60 +__version__ = '0.09'
60 61
61 #------------------------------------------------------------------------------ 62 #------------------------------------------------------------------------------
62 # TODO: 63 # TODO:
@@ -174,7 +175,7 @@ class TableStyle(object): @@ -174,7 +175,7 @@ class TableStyle(object):
174 bottom_right = u'+' 175 bottom_right = u'+'
175 176
176 177
177 -class TableStyleSlim(object): 178 +class TableStyleSlim(TableStyle):
178 """ 179 """
179 Style for a TableStream. 180 Style for a TableStream.
180 Example: 181 Example:
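The tablestream change makes `TableStyleSlim` a subclass of `TableStyle` instead of `object`. With the previous base, any style attribute the slim style did not define itself would raise `AttributeError` when looked up. As a subclass, such lookups presumably fall back to the `TableStyle` defaults. Apart from `bottom_right`, which appears in the hunk above, the attribute names and values in this sketch are hypothetical.

```python
class TableStyle(object):
    # Hypothetical subset of border characters (bottom_right appears in the real class)
    top_left = u'+'
    horizontal = u'-'
    bottom_right = u'+'


class TableStyleSlim(TableStyle):
    # Override only what differs; everything else is inherited from TableStyle
    top_left = u' '


print(repr(TableStyleSlim.top_left), repr(TableStyleSlim.bottom_right))
```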
oletools/thirdparty/xglob/xglob.py
1 -#! /usr/bin/env python2  
2 -"""  
3 -xglob  
4 -  
5 -xglob is a python package to list files matching wildcards (*, ?, []),  
6 -extending the functionality of the glob module from the standard python  
7 -library (https://docs.python.org/2/library/glob.html).  
8 -  
9 -Main features:  
10 -- recursive file listing (including subfolders)  
11 -- file listing within Zip archives  
12 -- helper function to open files specified as arguments, supporting files  
13 - within zip archives encrypted with a password  
14 -  
15 -Author: Philippe Lagadec - http://www.decalage.info  
16 -License: BSD, see source code or documentation  
17 -  
18 -For more info and updates: http://www.decalage.info/xglob  
19 -"""  
20 -  
21 -# LICENSE:  
22 -#  
23 -# xglob is copyright (c) 2013-2016, Philippe Lagadec (http://www.decalage.info)  
24 -# All rights reserved.  
25 -#  
26 -# Redistribution and use in source and binary forms, with or without modification,  
27 -# are permitted provided that the following conditions are met:  
28 -#  
29 -# * Redistributions of source code must retain the above copyright notice, this  
30 -# list of conditions and the following disclaimer.  
31 -# * Redistributions in binary form must reproduce the above copyright notice,  
32 -# this list of conditions and the following disclaimer in the documentation  
33 -# and/or other materials provided with the distribution.  
34 -#  
35 -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND  
36 -# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED  
37 -# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE  
38 -# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE  
39 -# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL  
40 -# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR  
41 -# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER  
42 -# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,  
43 -# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE  
44 -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.  
45 -  
46 -  
47 -#------------------------------------------------------------------------------  
48 -# CHANGELOG:  
49 -# 2013-12-04 v0.01 PL: - scan several files from command line args  
50 -# 2014-01-14 v0.02 PL: - added riglob, ziglob  
51 -# 2014-12-26 v0.03 PL: - moved code from balbuzard into a separate package  
52 -# 2015-01-03 v0.04 PL: - fixed issues in iter_files + yield container name  
53 -# 2016-02-24 v0.05 PL: - do not stop on exceptions, return them as data  
54 -# - fixed issue when using wildcards with empty path  
55 -# 2016-04-28 v0.06 CH: - improved handling of non-existing files  
56 -# (by Christian Herdtweck)  
57 -  
58 -__version__ = '0.06'  
59 -  
60 -  
61 -#=== IMPORTS =================================================================  
62 -  
63 -import os, fnmatch, glob, zipfile  
64 -  
65 -#=== EXCEPTIONS ==============================================================  
66 -  
67 -class PathNotFoundException(Exception):  
68 - """ raised if given a fixed file/dir (not a glob) that does not exist """  
69 - def __init__(self, path):  
70 - super(PathNotFoundException, self).__init__(  
71 - 'Given path does not exist: %r' % path)  
72 -  
73 -  
74 -#=== FUNCTIONS ===============================================================  
75 -  
76 -# recursive glob function to find files in any subfolder:  
77 -# inspired by http://stackoverflow.com/questions/14798220/how-can-i-search-sub-folders-using-glob-glob-module-in-python  
78 -def rglob (path, pattern='*.*'):  
79 - """  
80 - Recursive glob:  
81 - similar to glob.glob, but finds files recursively in all subfolders of path.  
82 - path: root directory where to search files  
83 - pattern: pattern for filenames, using wildcards, e.g. *.txt  
84 - """  
85 - #TODO: more compatible API with glob: use single param, split path from pattern  
86 - return [os.path.join(dirpath, f)  
87 - for dirpath, dirnames, files in os.walk(path)  
88 - for f in fnmatch.filter(files, pattern)]  
89 -  
90 -  
91 -def riglob (pathname):  
92 - """  
93 - Recursive iglob:  
94 - similar to glob.iglob, but finds files recursively in all subfolders of path.  
95 - pathname: root directory where to search files followed by pattern for  
96 - filenames, using wildcards, e.g. *.txt  
97 - """  
98 - path, filespec = os.path.split(pathname)  
99 - # fix path if empty:  
100 - if path == '':  
101 - path = '.'  
102 - # print 'riglob: path=%r, filespec=%r' % (path, filespec)  
103 - for dirpath, dirnames, files in os.walk(path):  
104 - for f in fnmatch.filter(files, filespec):  
105 - yield os.path.join(dirpath, f)  
106 -  
107 -  
108 -def ziglob (zipfileobj, pathname):  
109 - """  
110 - iglob in a zip:  
111 - similar to glob.iglob, but finds files within a zip archive.  
112 - - zipfileobj: zipfile.ZipFile object  
113 - - pathname: root directory where to search files followed by pattern for  
114 - filenames, using wildcards, e.g. *.txt  
115 - """  
116 - files = zipfileobj.namelist()  
117 - #for f in files: print f  
118 - for f in fnmatch.filter(files, pathname):  
119 - yield f  
120 -  
121 -  
122 -def iter_files(files, recursive=False, zip_password=None, zip_fname='*'):  
123 - """  
124 - Open each file provided as argument:  
125 - - files is a list of arguments  
126 - - if zip_password is None, each file is listed without reading its content.  
127 - Wilcards are supported.  
128 - - if not, then each file is opened as a zip archive with the provided password  
129 - - then files matching zip_fname are opened from the zip archive  
130 -  
131 - Iterator: yields (container, filename, data) for each file. If zip_password is None, then  
132 - only the filename is returned, container and data=None. Otherwise container is the  
133 - filename of the container (zip file), and data is the file content (or an exception).  
134 - If a given filename is not a glob and does not exist, the triplet  
135 - (None, filename, PathNotFoundException) is yielded. (Globs matching nothing  
136 - do not trigger exceptions)  
137 - """  
138 - #TODO: catch exceptions and yield them for the caller (no file found, file is not zip, wrong password, etc)  
139 - #TODO: use logging instead of printing  
140 - #TODO: split in two simpler functions, the caller knows if it's a zip or not  
141 - # print 'iter_files: files=%r, recursive=%s' % (files, recursive)  
142 - # choose recursive or non-recursive iglob:  
143 - if recursive:  
144 - iglob = riglob  
145 - else:  
146 - iglob = glob.iglob  
147 - for filespec in files:  
148 - if not is_glob(filespec) and not os.path.exists(filespec):  
149 - yield None, filespec, PathNotFoundException(filespec)  
150 - continue  
151 - for filename in iglob(filespec):  
152 - if zip_password is not None:  
153 - # Each file is expected to be a zip archive:  
154 - #print 'Opening zip archive %s with provided password' % filename  
155 - z = zipfile.ZipFile(filename, 'r')  
156 - #print 'Looking for file(s) matching "%s"' % zip_fname  
157 - for subfilename in ziglob(z, zip_fname):  
158 - #print 'Opening file in zip archive:', filename  
159 - try:  
160 - data = z.read(subfilename, zip_password)  
161 - yield filename, subfilename, data  
162 - except Exception as e:  
163 - yield filename, subfilename, e  
164 - z.close()  
165 - else:  
166 - # normal file  
167 - # do not read the file content, just yield the filename  
168 - yield None, filename, None  
169 - #print 'Opening file', filename  
170 - #data = open(filename, 'rb').read()  
171 - #yield None, filename, data  
172 -  
173 -  
174 -def is_glob(filespec):  
175 - """ determine if given file specification is a single file name or a glob  
176 -  
177 - python's glob and fnmatch can only interpret ?, *, [list], and [ra-nge],  
178 - (and combinations: hex_*_[A-Fabcdef0-9]).  
179 - The special chars *?[-] can only be escaped using []  
180 - --> file_name is not a glob  
181 - --> file?name is a glob  
182 - --> file* is a glob  
183 - --> file[-._]name is a glob  
184 - --> file[?]name is not a glob (matches literal "file?name")  
185 - --> file[*]name is not a glob (matches literal "file*name")  
186 - --> file[-]name is not a glob (matches literal "file-name")  
187 - --> file-name is not a glob  
188 -  
189 - Also, obviously incorrect globs are treated as non-globs  
190 - --> file[name is not a glob (matches literal "file[name")  
191 - --> file]-[name is treated as a glob  
192 - (it is not a valid glob but detecting errors like this requires  
193 - sophisticated regular expression matching)  
194 -  
195 - Python's glob also works with globs in directory-part of path  
196 - --> dir-part of path is analyzed just like filename-part  
197 - --> thirdparty/*/xglob.py is a (valid) glob  
198 -  
199 - TODO: create a correct regexp to test for validity of ranges  
200 - """  
201 -  
202 - # remove escaped special chars  
203 - cleaned = filespec.replace('[*]', '').replace('[?]', '') \  
204 - .replace('[[]', '').replace('[]]', '').replace('[-]', '')  
205 -  
206 - # check if special chars remain  
207 - return '*' in cleaned or '?' in cleaned or \  
208 - ('[' in cleaned and ']' in cleaned)
  1 +#! /usr/bin/env python2
  2 +"""
  3 +xglob
  4 +
  5 +xglob is a python package to list files matching wildcards (*, ?, []),
  6 +extending the functionality of the glob module from the standard python
  7 +library (https://docs.python.org/2/library/glob.html).
  8 +
  9 +Main features:
  10 +- recursive file listing (including subfolders)
  11 +- file listing within Zip archives
  12 +- helper function to open files specified as arguments, supporting files
  13 + within zip archives encrypted with a password
  14 +
  15 +Author: Philippe Lagadec - http://www.decalage.info
  16 +License: BSD, see source code or documentation
  17 +
  18 +For more info and updates: http://www.decalage.info/xglob
  19 +"""
  20 +
  21 +# LICENSE:
  22 +#
  23 +# xglob is copyright (c) 2013-2018, Philippe Lagadec (http://www.decalage.info)
  24 +# All rights reserved.
  25 +#
  26 +# Redistribution and use in source and binary forms, with or without modification,
  27 +# are permitted provided that the following conditions are met:
  28 +#
  29 +# * Redistributions of source code must retain the above copyright notice, this
  30 +# list of conditions and the following disclaimer.
  31 +# * Redistributions in binary form must reproduce the above copyright notice,
  32 +# this list of conditions and the following disclaimer in the documentation
  33 +# and/or other materials provided with the distribution.
  34 +#
  35 +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
  36 +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  37 +# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
  38 +# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
  39 +# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  40 +# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  41 +# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  42 +# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
  43 +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  44 +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  45 +
  46 +
  47 +#------------------------------------------------------------------------------
  48 +# CHANGELOG:
  49 +# 2013-12-04 v0.01 PL: - scan several files from command line args
  50 +# 2014-01-14 v0.02 PL: - added riglob, ziglob
  51 +# 2014-12-26 v0.03 PL: - moved code from balbuzard into a separate package
  52 +# 2015-01-03 v0.04 PL: - fixed issues in iter_files + yield container name
  53 +# 2016-02-24 v0.05 PL: - do not stop on exceptions, return them as data
  54 +# - fixed issue when using wildcards with empty path
  55 +# 2016-04-28 v0.06 CH: - improved handling of non-existing files
  56 +# (by Christian Herdtweck)
  57 +# 2018-12-08 v0.07 PL: - fixed issue #373, zip password must be bytes
  58 +
  59 +__version__ = '0.07'
  60 +
  61 +
  62 +#=== IMPORTS =================================================================
  63 +
  64 +import os, fnmatch, glob, zipfile
  65 +
  66 +#=== EXCEPTIONS ==============================================================
  67 +
  68 +class PathNotFoundException(Exception):
  69 + """ raised if given a fixed file/dir (not a glob) that does not exist """
  70 + def __init__(self, path):
  71 + super(PathNotFoundException, self).__init__(
  72 + 'Given path does not exist: %r' % path)
  73 +
  74 +
  75 +#=== FUNCTIONS ===============================================================
  76 +
  77 +# recursive glob function to find files in any subfolder:
  78 +# inspired by http://stackoverflow.com/questions/14798220/how-can-i-search-sub-folders-using-glob-glob-module-in-python
  79 +def rglob (path, pattern='*.*'):
  80 + """
  81 + Recursive glob:
  82 + similar to glob.glob, but finds files recursively in all subfolders of path.
  83 + path: root directory where to search files
  84 + pattern: pattern for filenames, using wildcards, e.g. *.txt
  85 + """
  86 + #TODO: more compatible API with glob: use single param, split path from pattern
  87 + return [os.path.join(dirpath, f)
  88 + for dirpath, dirnames, files in os.walk(path)
  89 + for f in fnmatch.filter(files, pattern)]
  90 +
  91 +
  92 +def riglob (pathname):
  93 + """
  94 + Recursive iglob:
  95 + similar to glob.iglob, but finds files recursively in all subfolders of path.
  96 + pathname: root directory where to search files followed by pattern for
  97 + filenames, using wildcards, e.g. *.txt
  98 + """
  99 + path, filespec = os.path.split(pathname)
  100 + # fix path if empty:
  101 + if path == '':
  102 + path = '.'
  103 + # print 'riglob: path=%r, filespec=%r' % (path, filespec)
  104 + for dirpath, dirnames, files in os.walk(path):
  105 + for f in fnmatch.filter(files, filespec):
  106 + yield os.path.join(dirpath, f)
  107 +
  108 +
  109 +def ziglob (zipfileobj, pathname):
  110 + """
  111 + iglob in a zip:
  112 + similar to glob.iglob, but finds files within a zip archive.
  113 + - zipfileobj: zipfile.ZipFile object
  114 + - pathname: root directory where to search files followed by pattern for
  115 + filenames, using wildcards, e.g. *.txt
  116 + """
  117 + files = zipfileobj.namelist()
  118 + #for f in files: print f
  119 + for f in fnmatch.filter(files, pathname):
  120 + yield f
  121 +
  122 +
  123 +def iter_files(files, recursive=False, zip_password=None, zip_fname='*'):
  124 + """
  125 + Open each file provided as argument:
  126 + - files is a list of arguments
  127 + - if zip_password is None, each file is listed without reading its content.
  128 + Wildcards are supported.
  129 + - if not, then each file is opened as a zip archive with the provided password
  130 + - then files matching zip_fname are opened from the zip archive
  131 +
  132 + Iterator: yields (container, filename, data) for each file. If zip_password is None, then
  133 + only the filename is returned, container and data=None. Otherwise container is the
  134 + filename of the container (zip file), and data is the file content (or an exception).
  135 + If a given filename is not a glob and does not exist, the triplet
  136 + (None, filename, PathNotFoundException) is yielded. (Globs matching nothing
  137 + do not trigger exceptions)
  138 + """
  139 + #TODO: catch exceptions and yield them for the caller (no file found, file is not zip, wrong password, etc)
  140 + #TODO: use logging instead of printing
  141 + #TODO: split in two simpler functions, the caller knows if it's a zip or not
  142 + # print 'iter_files: files=%r, recursive=%s' % (files, recursive)
  143 + # choose recursive or non-recursive iglob:
  144 + if recursive:
  145 + iglob = riglob
  146 + else:
  147 + iglob = glob.iglob
  148 + for filespec in files:
  149 + if not is_glob(filespec) and not os.path.exists(filespec):
  150 + yield None, filespec, PathNotFoundException(filespec)
  151 + continue
  152 + for filename in iglob(filespec):
  153 + if zip_password is not None:
  154 + # Each file is expected to be a zip archive:
  155 + # The zip password must be bytes, not unicode/str:
  156 + if not isinstance(zip_password, bytes):
  157 + zip_password = bytes(zip_password, encoding='utf8')
  158 + # print('Opening zip archive %s with provided password' % filename)
  159 + # print('zip password: %r' % zip_password)
  160 + # print(type(zip_password))
  161 + z = zipfile.ZipFile(filename, 'r')
  162 + #print 'Looking for file(s) matching "%s"' % zip_fname
  163 + for subfilename in ziglob(z, zip_fname):
  164 + #print 'Opening file in zip archive:', filename
  165 + try:
  166 + data = z.read(subfilename, zip_password)
  167 + yield filename, subfilename, data
  168 + except Exception as e:
  169 + yield filename, subfilename, e
  170 + z.close()
  171 + else:
  172 + # normal file
  173 + # do not read the file content, just yield the filename
  174 + yield None, filename, None
  175 + #print 'Opening file', filename
  176 + #data = open(filename, 'rb').read()
  177 + #yield None, filename, data
  178 +
  179 +
  180 +def is_glob(filespec):
  181 + """ determine if given file specification is a single file name or a glob
  182 +
  183 + python's glob and fnmatch can only interpret ?, *, [list], and [ra-nge],
  184 + (and combinations: hex_*_[A-Fabcdef0-9]).
  185 + The special chars *?[-] can only be escaped using []
  186 + --> file_name is not a glob
  187 + --> file?name is a glob
  188 + --> file* is a glob
  189 + --> file[-._]name is a glob
  190 + --> file[?]name is not a glob (matches literal "file?name")
  191 + --> file[*]name is not a glob (matches literal "file*name")
  192 + --> file[-]name is not a glob (matches literal "file-name")
  193 + --> file-name is not a glob
  194 +
  195 + Also, obviously incorrect globs are treated as non-globs
  196 + --> file[name is not a glob (matches literal "file[name")
  197 + --> file]-[name is treated as a glob
  198 + (it is not a valid glob but detecting errors like this requires
  199 + sophisticated regular expression matching)
  200 +
  201 + Python's glob also works with globs in directory-part of path
  202 + --> dir-part of path is analyzed just like filename-part
  203 + --> thirdparty/*/xglob.py is a (valid) glob
  204 +
  205 + TODO: create a correct regexp to test for validity of ranges
  206 + """
  207 +
  208 + # remove escaped special chars
  209 + cleaned = filespec.replace('[*]', '').replace('[?]', '') \
  210 + .replace('[[]', '').replace('[]]', '').replace('[-]', '')
  211 +
  212 + # check if special chars remain
  213 + return '*' in cleaned or '?' in cleaned or \
  214 + ('[' in cleaned and ']' in cleaned)
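The v0.07 changelog entry (issue #373) says the zip password must be bytes, and `iter_files` converts it with `bytes(zip_password, encoding='utf8')`. That call form only exists on Python 3. On Python 2, where `bytes` is an alias of `str`, it would raise `TypeError` if a `unicode` password were passed. Below is a minimal sketch of a version-neutral conversion, combined with the ziglob-style member filtering, using only the standard library. The helper names `to_zip_password` and `iter_zip_members` are hypothetical and not part of xglob. Because Python's `zipfile` cannot create encrypted archives, the demo uses an unencrypted one; `ZipFile.read()` simply ignores the password for unencrypted members.

```python
import fnmatch
import io
import zipfile


def to_zip_password(password):
    """Return password as bytes (or None), as ZipFile.read() requires."""
    if password is None or isinstance(password, bytes):
        return password
    # .encode() exists on str (Python 3) and unicode (Python 2) alike
    return password.encode('utf-8')


def iter_zip_members(zip_bytes, pattern='*', password=None):
    """Yield (member_name, data_or_exception) for members matching pattern.

    Hypothetical helper mirroring the ziglob + z.read() loop in iter_files,
    but working on an in-memory archive.
    """
    pwd = to_zip_password(password)
    with zipfile.ZipFile(io.BytesIO(zip_bytes), 'r') as archive:
        for name in fnmatch.filter(archive.namelist(), pattern):
            try:
                yield name, archive.read(name, pwd)
            except Exception as exc:  # like xglob: report errors as data
                yield name, exc


# Build a small unencrypted archive in memory for demonstration
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as archive:
    archive.writestr('sample.doc', b'DOC')
    archive.writestr('readme.txt', b'TXT')

print(list(iter_zip_members(buf.getvalue(), '*.doc', password=u'infected')))
# [('sample.doc', b'DOC')]
```

Returning exceptions as data, rather than raising, follows the v0.05 changelog entry ("do not stop on exceptions, return them as data"). This way a single bad member does not abort the scan of the rest of the archive.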
oletools/thirdparty/zipfile27/LICENSE.txt deleted
1 -Python 2.7 license  
2 -  
3 -This is the official license for the Python 2.7 release:  
4 -  
5 -A. HISTORY OF THE SOFTWARE  
6 -==========================  
7 -  
8 -Python was created in the early 1990s by Guido van Rossum at Stichting  
9 -Mathematisch Centrum (CWI, see http://www.cwi.nl) in the Netherlands  
10 -as a successor of a language called ABC. Guido remains Python's  
11 -principal author, although it includes many contributions from others.  
12 -  
13 -In 1995, Guido continued his work on Python at the Corporation for  
14 -National Research Initiatives (CNRI, see http://www.cnri.reston.va.us)  
15 -in Reston, Virginia where he released several versions of the  
16 -software.  
17 -  
18 -In May 2000, Guido and the Python core development team moved to  
19 -BeOpen.com to form the BeOpen PythonLabs team. In October of the same  
20 -year, the PythonLabs team moved to Digital Creations (now Zope  
21 -Corporation, see http://www.zope.com). In 2001, the Python Software  
22 -Foundation (PSF, see http://www.python.org/psf/) was formed, a  
23 -non-profit organization created specifically to own Python-related  
24 -Intellectual Property. Zope Corporation is a sponsoring member of  
25 -the PSF.  
26 -  
27 -All Python releases are Open Source (see http://www.opensource.org for  
28 -the Open Source Definition). Historically, most, but not all, Python  
29 -releases have also been GPL-compatible; the table below summarizes  
30 -the various releases.  
31 -  
32 -    Release         Derived     Year        Owner       GPL-
33 -                    from                                compatible? (1)
34 -
35 -    0.9.0 thru 1.2              1991-1995   CWI         yes
36 -    1.3 thru 1.5.2  1.2         1995-1999   CNRI        yes
37 -    1.6             1.5.2       2000        CNRI        no
38 -    2.0             1.6         2000        BeOpen.com  no
39 -    1.6.1           1.6         2001        CNRI        yes (2)
40 -    2.1             2.0+1.6.1   2001        PSF         no
41 -    2.0.1           2.0+1.6.1   2001        PSF         yes
42 -    2.1.1           2.1+2.0.1   2001        PSF         yes
43 -    2.2             2.1.1       2001        PSF         yes
44 -    2.1.2           2.1.1       2002        PSF         yes
45 -    2.1.3           2.1.2       2002        PSF         yes
46 -    2.2.1           2.2         2002        PSF         yes
47 -    2.2.2           2.2.1       2002        PSF         yes
48 -    2.2.3           2.2.2       2003        PSF         yes
49 -    2.3             2.2.2       2002-2003   PSF         yes
50 -    2.3.1           2.3         2002-2003   PSF         yes
51 -    2.3.2           2.3.1       2002-2003   PSF         yes
52 -    2.3.3           2.3.2       2002-2003   PSF         yes
53 -    2.3.4           2.3.3       2004        PSF         yes
54 -    2.3.5           2.3.4       2005        PSF         yes
55 -    2.4             2.3         2004        PSF         yes
56 -    2.4.1           2.4         2005        PSF         yes
57 -    2.4.2           2.4.1       2005        PSF         yes
58 -    2.4.3           2.4.2       2006        PSF         yes
59 -    2.5             2.4         2006        PSF         yes
60 -    2.7             2.6         2010        PSF         yes
61 -  
62 -Footnotes:  
63 -  
64 -(1) GPL-compatible doesn't mean that we're distributing Python under  
65 - the GPL. All Python licenses, unlike the GPL, let you distribute  
66 - a modified version without making your changes open source. The  
67 - GPL-compatible licenses make it possible to combine Python with  
68 - other software that is released under the GPL; the others don't.  
69 -  
70 -(2) According to Richard Stallman, 1.6.1 is not GPL-compatible,  
71 - because its license has a choice of law clause. According to  
72 - CNRI, however, Stallman's lawyer has told CNRI's lawyer that 1.6.1  
73 - is "not incompatible" with the GPL.  
74 -  
75 -Thanks to the many outside volunteers who have worked under Guido's  
76 -direction to make these releases possible.  
77 -  
78 -  
79 -B. TERMS AND CONDITIONS FOR ACCESSING OR OTHERWISE USING PYTHON  
80 -===============================================================  
81 -  
82 -PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2  
83 ---------------------------------------------  
84 -  
85 -1. This LICENSE AGREEMENT is between the Python Software Foundation  
86 -("PSF"), and the Individual or Organization ("Licensee") accessing and  
87 -otherwise using this software ("Python") in source or binary form and  
88 -its associated documentation.  
89 -  
90 -2. Subject to the terms and conditions of this License Agreement, PSF  
91 -hereby grants Licensee a nonexclusive, royalty-free, world-wide  
92 -license to reproduce, analyze, test, perform and/or display publicly,  
93 -prepare derivative works, distribute, and otherwise use Python  
94 -alone or in any derivative version, provided, however, that PSF's  
95 -License Agreement and PSF's notice of copyright, i.e., "Copyright (c)  
96 -2001, 2002, 2003, 2004, 2005, 2006 Python Software Foundation; All Rights  
97 -Reserved" are retained in Python alone or in any derivative version  
98 -prepared by Licensee.  
99 -  
100 -3. In the event Licensee prepares a derivative work that is based on  
101 -or incorporates Python or any part thereof, and wants to make  
102 -the derivative work available to others as provided herein, then  
103 -Licensee hereby agrees to include in any such work a brief summary of  
104 -the changes made to Python.  
105 -  
106 -4. PSF is making Python available to Licensee on an "AS IS"  
107 -basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR  
108 -IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND  
109 -DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS  
110 -FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON WILL NOT  
111 -INFRINGE ANY THIRD PARTY RIGHTS.  
112 -  
113 -5. PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON  
114 -FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS  
115 -A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON,  
116 -OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.  
117 -  
118 -6. This License Agreement will automatically terminate upon a material  
119 -breach of its terms and conditions.  
120 -  
121 -7. Nothing in this License Agreement shall be deemed to create any  
122 -relationship of agency, partnership, or joint venture between PSF and  
123 -Licensee. This License Agreement does not grant permission to use PSF  
124 -trademarks or trade name in a trademark sense to endorse or promote  
125 -products or services of Licensee, or any third party.  
126 -  
127 -8. By copying, installing or otherwise using Python, Licensee  
128 -agrees to be bound by the terms and conditions of this License  
129 -Agreement.  
130 -  
131 -  
132 -BEOPEN.COM LICENSE AGREEMENT FOR PYTHON 2.0  
133 --------------------------------------------  
134 -  
135 -BEOPEN PYTHON OPEN SOURCE LICENSE AGREEMENT VERSION 1  
136 -  
137 -1. This LICENSE AGREEMENT is between BeOpen.com ("BeOpen"), having an  
138 -office at 160 Saratoga Avenue, Santa Clara, CA 95051, and the  
139 -Individual or Organization ("Licensee") accessing and otherwise using  
140 -this software in source or binary form and its associated  
141 -documentation ("the Software").  
142 -  
143 -2. Subject to the terms and conditions of this BeOpen Python License  
144 -Agreement, BeOpen hereby grants Licensee a non-exclusive,  
145 -royalty-free, world-wide license to reproduce, analyze, test, perform  
146 -and/or display publicly, prepare derivative works, distribute, and  
147 -otherwise use the Software alone or in any derivative version,  
148 -provided, however, that the BeOpen Python License is retained in the  
149 -Software, alone or in any derivative version prepared by Licensee.  
150 -  
151 -3. BeOpen is making the Software available to Licensee on an "AS IS"  
152 -basis. BEOPEN MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR  
153 -IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, BEOPEN MAKES NO AND  
154 -DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS  
155 -FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE WILL NOT  
156 -INFRINGE ANY THIRD PARTY RIGHTS.  
157 -  
158 -4. BEOPEN SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF THE  
159 -SOFTWARE FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS  
160 -AS A RESULT OF USING, MODIFYING OR DISTRIBUTING THE SOFTWARE, OR ANY  
161 -DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.  
162 -  
163 -5. This License Agreement will automatically terminate upon a material  
164 -breach of its terms and conditions.  
165 -  
166 -6. This License Agreement shall be governed by and interpreted in all  
167 -respects by the law of the State of California, excluding conflict of  
168 -law provisions. Nothing in this License Agreement shall be deemed to  
169 -create any relationship of agency, partnership, or joint venture  
170 -between BeOpen and Licensee. This License Agreement does not grant  
171 -permission to use BeOpen trademarks or trade names in a trademark  
172 -sense to endorse or promote products or services of Licensee, or any  
173 -third party. As an exception, the "BeOpen Python" logos available at  
174 -http://www.pythonlabs.com/logos.html may be used according to the  
175 -permissions granted on that web page.  
176 -  
177 -7. By copying, installing or otherwise using the software, Licensee  
178 -agrees to be bound by the terms and conditions of this License  
179 -Agreement.  
180 -  
181 -  
182 -CNRI LICENSE AGREEMENT FOR PYTHON 1.6.1  
183 ----------------------------------------  
184 -  
185 -1. This LICENSE AGREEMENT is between the Corporation for National  
186 -Research Initiatives, having an office at 1895 Preston White Drive,  
187 -Reston, VA 20191 ("CNRI"), and the Individual or Organization  
188 -("Licensee") accessing and otherwise using Python 1.6.1 software in  
189 -source or binary form and its associated documentation.  
190 -  
191 -2. Subject to the terms and conditions of this License Agreement, CNRI  
192 -hereby grants Licensee a nonexclusive, royalty-free, world-wide  
193 -license to reproduce, analyze, test, perform and/or display publicly,  
194 -prepare derivative works, distribute, and otherwise use Python 1.6.1  
195 -alone or in any derivative version, provided, however, that CNRI's  
196 -License Agreement and CNRI's notice of copyright, i.e., "Copyright (c)  
197 -1995-2001 Corporation for National Research Initiatives; All Rights  
198 -Reserved" are retained in Python 1.6.1 alone or in any derivative  
199 -version prepared by Licensee. Alternately, in lieu of CNRI's License  
200 -Agreement, Licensee may substitute the following text (omitting the  
201 -quotes): "Python 1.6.1 is made available subject to the terms and  
202 -conditions in CNRI's License Agreement. This Agreement together with  
203 -Python 1.6.1 may be located on the Internet using the following  
204 -unique, persistent identifier (known as a handle): 1895.22/1013. This  
205 -Agreement may also be obtained from a proxy server on the Internet  
206 -using the following URL: http://hdl.handle.net/1895.22/1013".  
207 -  
208 -3. In the event Licensee prepares a derivative work that is based on  
209 -or incorporates Python 1.6.1 or any part thereof, and wants to make  
210 -the derivative work available to others as provided herein, then  
211 -Licensee hereby agrees to include in any such work a brief summary of  
212 -the changes made to Python 1.6.1.  
213 -  
214 -4. CNRI is making Python 1.6.1 available to Licensee on an "AS IS"  
215 -basis. CNRI MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR  
216 -IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, CNRI MAKES NO AND  
217 -DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS  
218 -FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON 1.6.1 WILL NOT  
219 -INFRINGE ANY THIRD PARTY RIGHTS.  
220 -  
221 -5. CNRI SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON  
222 -1.6.1 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS  
223 -A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 1.6.1,  
224 -OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF.  
225 -  
226 -6. This License Agreement will automatically terminate upon a material  
227 -breach of its terms and conditions.  
228 -  
229 -7. This License Agreement shall be governed by the federal  
230 -intellectual property law of the United States, including without  
231 -limitation the federal copyright law, and, to the extent such  
232 -U.S. federal law does not apply, by the law of the Commonwealth of  
233 -Virginia, excluding Virginia's conflict of law provisions.  
234 -Notwithstanding the foregoing, with regard to derivative works based  
235 -on Python 1.6.1 that incorporate non-separable material that was  
236 -previously distributed under the GNU General Public License (GPL), the  
237 -law of the Commonwealth of Virginia shall govern this License  
238 -Agreement only as to issues arising under or with respect to  
239 -Paragraphs 4, 5, and 7 of this License Agreement. Nothing in this  
240 -License Agreement shall be deemed to create any relationship of  
241 -agency, partnership, or joint venture between CNRI and Licensee. This  
242 -License Agreement does not grant permission to use CNRI trademarks or  
243 -trade name in a trademark sense to endorse or promote products or  
244 -services of Licensee, or any third party.  
245 -  
246 -8. By clicking on the "ACCEPT" button where indicated, or by copying,  
247 -installing or otherwise using Python 1.6.1, Licensee agrees to be  
248 -bound by the terms and conditions of this License Agreement.  
249 -  
250 - ACCEPT  
251 -  
252 -  
253 -CWI LICENSE AGREEMENT FOR PYTHON 0.9.0 THROUGH 1.2  
254 ---------------------------------------------------  
255 -  
256 -Copyright (c) 1991 - 1995, Stichting Mathematisch Centrum Amsterdam,  
257 -The Netherlands. All rights reserved.  
258 -  
259 -Permission to use, copy, modify, and distribute this software and its  
260 -documentation for any purpose and without fee is hereby granted,  
261 -provided that the above copyright notice appear in all copies and that  
262 -both that copyright notice and this permission notice appear in  
263 -supporting documentation, and that the name of Stichting Mathematisch  
264 -Centrum or CWI not be used in advertising or publicity pertaining to  
265 -distribution of the software without specific, written prior  
266 -permission.  
267 -  
268 -STICHTING MATHEMATISCH CENTRUM DISCLAIMS ALL WARRANTIES WITH REGARD TO  
269 -THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND  
270 -FITNESS, IN NO EVENT SHALL STICHTING MATHEMATISCH CENTRUM BE LIABLE  
271 -FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES  
272 -WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN  
273 -ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT  
274 -OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.  
275 -  
oletools/thirdparty/zipfile27/__init__.py deleted
1 -# Excerpt from the zipfile module from Python 2.7, to enable is_zipfile  
2 -# to check any file object (e.g. in memory), for Python 2.6.  
3 -# is_zipfile in Python 2.6 can only check files on disk.  
4 -  
5 -# This code from Python 2.7 was not modified.  
6 -  
7 -# 2016-09-06 v0.01 PL: - first version  
8 -  
9 -  
10 -from zipfile import _EndRecData  
11 -  
12 -def _check_zipfile(fp):  
13 - try:  
14 - if _EndRecData(fp):  
15 - return True # file has correct magic number  
16 - except IOError:  
17 - pass  
18 - return False  
19 -  
20 -def is_zipfile(filename):  
21 - """Quickly see if a file is a ZIP file by checking the magic number.  
22 -  
23 - The filename argument may be a file or file-like object too.  
24 - """  
25 - result = False  
26 - try:  
27 - if hasattr(filename, "read"):  
28 - result = _check_zipfile(fp=filename)  
29 - else:  
30 - with open(filename, "rb") as fp:  
31 - result = _check_zipfile(fp)  
32 - except IOError:  
33 - pass  
34 - return result  
35 -  
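The deleted `zipfile27` module was a backport that let `is_zipfile` check file-like objects (for example, in-memory data) on Python 2.6. Since Python 2.7, the standard `zipfile.is_zipfile` accepts a file or file-like object directly. That is presumably why the backport could be dropped along with Python 2.6 support. Here is a minimal sketch; the helper name `looks_like_zip` is hypothetical.

```python
import io
import zipfile


def looks_like_zip(data):
    """Check in-memory bytes for a ZIP end-of-central-directory record."""
    # zipfile.is_zipfile accepts file-like objects since Python 2.7
    return zipfile.is_zipfile(io.BytesIO(data))


buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as archive:
    archive.writestr('hello.txt', b'hello')
zip_bytes = buf.getvalue()

print(looks_like_zip(zip_bytes))             # True
print(looks_like_zip(b'not a zip archive'))  # False
```

Like the removed `_check_zipfile`, this only looks for the end-of-archive signature ("magic number"). It does not validate the archive's members.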
oletools/xls_parser.py
@@ -5,7 +5,7 @@ Read storages, (sub-)streams, records from xls file
5 # 5 #
6 # === LICENSE ================================================================== 6 # === LICENSE ==================================================================
7 7
8 -# xls_parser is copyright (c) 2014-2018 Philippe Lagadec (http://www.decalage.info) 8 +# xls_parser is copyright (c) 2014-2019 Philippe Lagadec (http://www.decalage.info)
9 # All rights reserved. 9 # All rights reserved.
10 # 10 #
11 # Redistribution and use in source and binary forms, with or without modification, 11 # Redistribution and use in source and binary forms, with or without modification,
@@ -33,8 +33,10 @@ Read storages, (sub-)streams, records from xls file
33 # 2017-11-02 v0.1 CH: - first version 33 # 2017-11-02 v0.1 CH: - first version
34 # 2017-11-02 v0.2 CH: - move some code to record_base.py 34 # 2017-11-02 v0.2 CH: - move some code to record_base.py
35 # (to avoid copy-and-paste in ppt_parser.py) 35 # (to avoid copy-and-paste in ppt_parser.py)
  36 +# 2019-01-30 v0.54 PL: - fixed import to avoid mixing installed oletools
  37 +# and dev version
36 38
37 -__version__ = '0.2' 39 +__version__ = '0.54'
38 40
39 # ----------------------------------------------------------------------------- 41 # -----------------------------------------------------------------------------
40 # TODO: 42 # TODO:
@@ -56,17 +58,14 @@ import os.path
56 from struct import unpack 58 from struct import unpack
57 import logging 59 import logging
58 60
59 -try:  
60 - from oletools import record_base  
61 -except ImportError:  
62 - # little hack to allow absolute imports even if oletools is not installed.  
63 - # Copied from olevba.py  
64 - PARENT_DIR = os.path.normpath(os.path.dirname(os.path.dirname(  
65 - os.path.abspath(__file__))))  
66 - if PARENT_DIR not in sys.path:  
67 - sys.path.insert(0, PARENT_DIR)  
68 - del PARENT_DIR  
69 - from oletools import record_base 61 +# little hack to allow absolute imports even if oletools is not installed.
  62 +# Copied from olevba.py
  63 +PARENT_DIR = os.path.normpath(os.path.dirname(os.path.dirname(
  64 + os.path.abspath(__file__))))
  65 +if PARENT_DIR not in sys.path:
  66 + sys.path.insert(0, PARENT_DIR)
  67 +del PARENT_DIR
  68 +from oletools import record_base
70 69
71 70
72 # === PYTHON 2+3 SUPPORT ====================================================== 71 # === PYTHON 2+3 SUPPORT ======================================================
@@ -89,12 +88,18 @@ def is_xls(filename):
89 substream. 88 substream.
90 See also: oleid.OleID.check_excel 89 See also: oleid.OleID.check_excel
91 """ 90 """
  91 + xls_file = None
92 try: 92 try:
93 - for stream in XlsFile(filename).iter_streams(): 93 + xls_file = XlsFile(filename)
  94 + for stream in xls_file.iter_streams():
94 if isinstance(stream, WorkbookStream): 95 if isinstance(stream, WorkbookStream):
95 return True 96 return True
96 except Exception: 97 except Exception:
97 - pass 98 + logging.debug('Ignoring exception in is_xls, assume is not xls',
  99 + exc_info=True)
  100 + finally:
  101 + if xls_file is not None:
  102 + xls_file.close()
98 return False 103 return False
99 104
100 105
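The `is_xls` change sets `xls_file = None` before the `try`. This lets the `finally` clause close the file in every case: when the loop returns early, when it runs to completion, or when the constructor itself raises. It also replaces the silent `pass` with `logging.debug(..., exc_info=True)`. The sketch below reproduces that control flow with a hypothetical `FakeContainer` standing in for `XlsFile`, so that the behaviour can be checked without an actual .xls file. `contains_stream` is likewise a hypothetical name.

```python
import logging


class FakeContainer(object):
    """Hypothetical stand-in for XlsFile that records whether close() ran."""

    def __init__(self, streams):
        self.streams = list(streams)
        self.closed = False

    def iter_streams(self):
        for stream in self.streams:
            yield stream

    def close(self):
        self.closed = True


def contains_stream(open_container, wanted):
    """Return True if the container yields `wanted`; always close it."""
    container = None          # defined before try, so finally can test it
    try:
        container = open_container()
        for stream in container.iter_streams():
            if stream == wanted:
                return True   # the finally block still runs first
    except Exception:
        logging.debug('Ignoring exception, assume no match', exc_info=True)
    finally:
        if container is not None:
            container.close()
    return False


found = FakeContainer(['Summary', 'Workbook'])
print(contains_stream(lambda: found, 'Workbook'), found.closed)  # True True
```

If the constructor raises, `container` is still `None`, so `finally` skips `close()` instead of failing with a `NameError` or `AttributeError`.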
@@ -102,7 +107,7 @@ def read_unicode(data, start_idx, n_chars):
102 """ read a unicode string from a XLUnicodeStringNoCch structure """ 107 """ read a unicode string from a XLUnicodeStringNoCch structure """
103 # first bit 0x0 --> only low-bytes are saved, all high bytes are 0 108 # first bit 0x0 --> only low-bytes are saved, all high bytes are 0
104 # first bit 0x1 --> 2 bytes per character 109 # first bit 0x1 --> 2 bytes per character
105 - low_bytes_only = (ord(data[start_idx]) == 0) 110 + low_bytes_only = (ord(data[start_idx:start_idx+1]) == 0)
106 if low_bytes_only: 111 if low_bytes_only:
107 end_idx = start_idx + 1 + n_chars 112 end_idx = start_idx + 1 + n_chars
108 return data[start_idx+1:end_idx].decode('ascii'), end_idx 113 return data[start_idx+1:end_idx].decode('ascii'), end_idx
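The `read_unicode` change from `ord(data[start_idx])` to `ord(data[start_idx:start_idx+1])` is a Python 2/3 compatibility fix. Indexing a Python 3 `bytes` object already returns an `int`, and `ord()` rejects an `int`. A one-byte slice, however, is a `str` on Python 2 and `bytes` on Python 3, and `ord()` accepts both. Below is a small sketch of the same idiom. The helper `flag_byte` is hypothetical, and the 3-character record is made-up sample data.

```python
def flag_byte(data, idx):
    """Return data[idx] as an int on both Python 2 (str) and Python 3 (bytes)."""
    # A one-byte slice is a str on Python 2 and bytes on Python 3;
    # ord() accepts a length-1 object of either type.
    return ord(data[idx:idx + 1])


record = b'\x00ABC'   # flag 0x00: low bytes only, then 3 ASCII chars
n_chars = 3
print(flag_byte(record, 0))          # 0
if flag_byte(record, 0) == 0:
    text = record[1:1 + n_chars].decode('ascii')
    print(text)                      # ABC

try:
    ord(record[0])   # on Python 3, record[0] is already the int 0
except TypeError as exc:
    print('direct indexing fails on Python 3:', exc)
```

One caveat: if `idx` is past the end of the data, the slice is empty and `ord(b'')` raises `TypeError`. Plain indexing would raise `IndexError` instead.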
@@ -350,6 +355,7 @@ class XlsRecordSupBook(XlsRecord):
350 LINK_TYPE_EXTERNAL = 'external workbook' 355 LINK_TYPE_EXTERNAL = 'external workbook'
351 356
352 def finish_constructing(self, _): 357 def finish_constructing(self, _):
  358 + """Finish constructing this record; called at end of constructor."""
353 # set defaults 359 # set defaults
354 self.ctab = None 360 self.ctab = None
355 self.cch = None 361 self.cch = None
requirements.txt
1 pyparsing>=2.2.0 1 pyparsing>=2.2.0
2 -olefile>=0.45 2 +olefile>=0.46
  3 +easygui
  4 +colorclass
  5 +msoffcrypto-tool
  6 +pcodedmp>=1.2.5
3 \ No newline at end of file 7 \ No newline at end of file
setup.py
@@ -28,6 +28,9 @@ to install this package.
28 # 2018-09-15 PL: - easygui is now a dependency 28 # 2018-09-15 PL: - easygui is now a dependency
29 # 2018-09-22 PL: - colorclass is now a dependency 29 # 2018-09-22 PL: - colorclass is now a dependency
30 # 2018-10-27 PL: - fixed issue #359 (bug when importing log_helper) 30 # 2018-10-27 PL: - fixed issue #359 (bug when importing log_helper)
  31 +# 2019-02-26 CH: - add optional dependency msoffcrypto for decryption
  32 +# 2019-05-22 PL: - 'msoffcrypto-tool' is now a required dependency
  33 +# 2019-05-23 v0.55 PL: - added pcodedmp as dependency
31 34
32 #--- TODO --------------------------------------------------------------------- 35 #--- TODO ---------------------------------------------------------------------
33 36
@@ -47,7 +50,7 @@ import os, fnmatch
47 #--- METADATA ----------------------------------------------------------------- 50 #--- METADATA -----------------------------------------------------------------
48 51
49 name = "oletools" 52 name = "oletools"
50 -version = '0.54dev4' 53 +version = '0.55.dev3'
51 desc = "Python tools to analyze security characteristics of MS Office and OLE files (also called Structured Storage, Compound File Binary Format or Compound Document File Format), for Malware Analysis and Incident Response #DFIR" 54 desc = "Python tools to analyze security characteristics of MS Office and OLE files (also called Structured Storage, Compound File Binary Format or Compound Document File Format), for Malware Analysis and Incident Response #DFIR"
52 long_desc = open('oletools/README.rst').read() 55 long_desc = open('oletools/README.rst').read()
53 author = "Philippe Lagadec" 56 author = "Philippe Lagadec"
@@ -73,6 +76,7 @@ classifiers=[
73 "Programming Language :: Python :: 3.4", 76 "Programming Language :: Python :: 3.4",
74 "Programming Language :: Python :: 3.5", 77 "Programming Language :: Python :: 3.5",
75 "Programming Language :: Python :: 3.6", 78 "Programming Language :: Python :: 3.6",
  79 + "Programming Language :: Python :: 3.7",
76 "Topic :: Security", 80 "Topic :: Security",
77 "Topic :: Software Development :: Libraries :: Python Modules", 81 "Topic :: Software Development :: Libraries :: Python Modules",
78 ] 82 ]
@@ -89,7 +93,7 @@ packages=[
89 'oletools.thirdparty.xglob', 93 'oletools.thirdparty.xglob',
90 'oletools.thirdparty.DridexUrlDecoder', 94 'oletools.thirdparty.DridexUrlDecoder',
91 'oletools.thirdparty.tablestream', 95 'oletools.thirdparty.tablestream',
92 - 'oletools.thirdparty.zipfile27', 96 + 'oletools.thirdparty.oledump',
93 ] 97 ]
94 ##setupdir = '.' 98 ##setupdir = '.'
95 ##package_dir={'': setupdir} 99 ##package_dir={'': setupdir}
@@ -177,9 +181,6 @@ package_data={
177 'oletools.thirdparty.DridexUrlDecoder': [ 181 'oletools.thirdparty.DridexUrlDecoder': [
178 'LICENSE.txt', 182 'LICENSE.txt',
179 ], 183 ],
180 - 'oletools.thirdparty.zipfile27': [  
181 - 'LICENSE.txt',  
182 - ],  
183 # 'oletools.thirdparty.tablestream': [ 184 # 'oletools.thirdparty.tablestream': [
184 # 'LICENSE', 'README', 185 # 'LICENSE', 'README',
185 # ], 186 # ],
@@ -305,11 +306,11 @@ def main():
305 author_email=author_email, 306 author_email=author_email,
306 url=url, 307 url=url,
307 license=license, 308 license=license,
308 -## package_dir=package_dir, 309 + # package_dir=package_dir,
309 packages=packages, 310 packages=packages,
310 package_data = package_data, 311 package_data = package_data,
311 download_url=download_url, 312 download_url=download_url,
312 -# data_files=data_files, 313 + # data_files=data_files,
313 entry_points=entry_points, 314 entry_points=entry_points,
314 test_suite="tests", 315 test_suite="tests",
315 # scripts=scripts, 316 # scripts=scripts,
@@ -318,6 +319,8 @@ def main():
318 "olefile>=0.46", 319 "olefile>=0.46",
319 "easygui", 320 "easygui",
320 'colorclass', 321 'colorclass',
  322 + 'msoffcrypto-tool',
  323 + 'pcodedmp>=1.2.5',
321 ], 324 ],
322 ) 325 )
323 326
tests/common/log_helper/log_helper_test_imported.py
@@ -11,6 +11,8 @@ INFO_MESSAGE = 'imported: info log' @@ -11,6 +11,8 @@ INFO_MESSAGE = 'imported: info log'
11 WARNING_MESSAGE = 'imported: warning log' 11 WARNING_MESSAGE = 'imported: warning log'
12 ERROR_MESSAGE = 'imported: error log' 12 ERROR_MESSAGE = 'imported: error log'
13 CRITICAL_MESSAGE = 'imported: critical log' 13 CRITICAL_MESSAGE = 'imported: critical log'
  14 +RESULT_MESSAGE = 'imported: result log'
  15 +RESULT_TYPE = 'imported: result'
14 16
15 logger = log_helper.get_or_create_silent_logger('test_imported', logging.ERROR) 17 logger = log_helper.get_or_create_silent_logger('test_imported', logging.ERROR)
16 18
@@ -21,3 +23,4 @@ def log(): @@ -21,3 +23,4 @@ def log():
21 logger.warning(WARNING_MESSAGE) 23 logger.warning(WARNING_MESSAGE)
22 logger.error(ERROR_MESSAGE) 24 logger.error(ERROR_MESSAGE)
23 logger.critical(CRITICAL_MESSAGE) 25 logger.critical(CRITICAL_MESSAGE)
  26 + logger.info(RESULT_MESSAGE, type=RESULT_TYPE)
tests/common/log_helper/log_helper_test_main.py
@@ -9,6 +9,8 @@ INFO_MESSAGE = 'main: info log' @@ -9,6 +9,8 @@ INFO_MESSAGE = 'main: info log'
9 WARNING_MESSAGE = 'main: warning log' 9 WARNING_MESSAGE = 'main: warning log'
10 ERROR_MESSAGE = 'main: error log' 10 ERROR_MESSAGE = 'main: error log'
11 CRITICAL_MESSAGE = 'main: critical log' 11 CRITICAL_MESSAGE = 'main: critical log'
  12 +RESULT_MESSAGE = 'main: result log'
  13 +RESULT_TYPE = 'main: result'
12 14
13 logger = log_helper.get_or_create_silent_logger('test_main') 15 logger = log_helper.get_or_create_silent_logger('test_main')
14 16
@@ -32,12 +34,16 @@ def init_logging_and_log(args): @@ -32,12 +34,16 @@ def init_logging_and_log(args):
32 level = args[-1] 34 level = args[-1]
33 use_json = 'as-json' in args 35 use_json = 'as-json' in args
34 throw = 'throw' in args 36 throw = 'throw' in args
  37 + percent_autoformat = '%-autoformat' in args
35 38
36 if 'enable' in args: 39 if 'enable' in args:
37 log_helper.enable_logging(use_json, level, stream=sys.stdout) 40 log_helper.enable_logging(use_json, level, stream=sys.stdout)
38 41
39 _log() 42 _log()
40 43
  44 + if percent_autoformat:
  45 + logger.info('The %s is %d.', 'answer', 47)
  46 +
41 if throw: 47 if throw:
42 raise Exception('An exception occurred before ending the logging') 48 raise Exception('An exception occurred before ending the logging')
43 49
@@ -50,6 +56,7 @@ def _log(): @@ -50,6 +56,7 @@ def _log():
50 logger.warning(WARNING_MESSAGE) 56 logger.warning(WARNING_MESSAGE)
51 logger.error(ERROR_MESSAGE) 57 logger.error(ERROR_MESSAGE)
52 logger.critical(CRITICAL_MESSAGE) 58 logger.critical(CRITICAL_MESSAGE)
  59 + logger.info(RESULT_MESSAGE, type=RESULT_TYPE)
53 log_helper_test_imported.log() 60 log_helper_test_imported.log()
54 61
55 62
tests/common/log_helper/test_log_helper.py
@@ -13,9 +13,11 @@ from tests.common.log_helper import log_helper_test_main @@ -13,9 +13,11 @@ from tests.common.log_helper import log_helper_test_main
13 from tests.common.log_helper import log_helper_test_imported 13 from tests.common.log_helper import log_helper_test_imported
14 from os.path import dirname, join, relpath, abspath 14 from os.path import dirname, join, relpath, abspath
15 15
  16 +from tests.test_utils import PROJECT_ROOT
  17 +
16 # this is the common base of "tests" and "oletools" dirs 18 # this is the common base of "tests" and "oletools" dirs
17 -ROOT_DIRECTORY = abspath(join(__file__, '..', '..', '..', '..'))  
18 -TEST_FILE = relpath(join(dirname(__file__), 'log_helper_test_main.py'), ROOT_DIRECTORY) 19 +TEST_FILE = relpath(join(dirname(abspath(__file__)), 'log_helper_test_main.py'),
  20 + PROJECT_ROOT)
19 PYTHON_EXECUTABLE = sys.executable 21 PYTHON_EXECUTABLE = sys.executable
20 22
21 MAIN_LOG_MESSAGES = [ 23 MAIN_LOG_MESSAGES = [
@@ -59,6 +61,62 @@ class TestLogHelper(unittest.TestCase): @@ -59,6 +61,62 @@ class TestLogHelper(unittest.TestCase):
59 log_helper_test_imported.CRITICAL_MESSAGE 61 log_helper_test_imported.CRITICAL_MESSAGE
60 ]) 62 ])
61 63
  64 + def test_logs_type_ignored(self):
  65 + """Run test script with logging enabled at info level. Want no type."""
  66 + output = self._run_test(['enable', 'info'])
  67 +
  68 + expect = '\n'.join([
  69 + 'INFO ' + log_helper_test_main.INFO_MESSAGE,
  70 + 'WARNING ' + log_helper_test_main.WARNING_MESSAGE,
  71 + 'ERROR ' + log_helper_test_main.ERROR_MESSAGE,
  72 + 'CRITICAL ' + log_helper_test_main.CRITICAL_MESSAGE,
  73 + 'INFO ' + log_helper_test_main.RESULT_MESSAGE,
  74 + 'INFO ' + log_helper_test_imported.INFO_MESSAGE,
  75 + 'WARNING ' + log_helper_test_imported.WARNING_MESSAGE,
  76 + 'ERROR ' + log_helper_test_imported.ERROR_MESSAGE,
  77 + 'CRITICAL ' + log_helper_test_imported.CRITICAL_MESSAGE,
  78 + 'INFO ' + log_helper_test_imported.RESULT_MESSAGE,
  79 + ])
  80 + self.assertEqual(output, expect)
  81 +
  82 + def test_logs_type_in_json(self):
  83 + """Check type field is contained in json log."""
  84 + output = self._run_test(['enable', 'as-json', 'info'])
  85 +
  86 + # convert to json preserving order of output
  87 + jout = json.loads(output)
  88 +
  89 + jexpect = [
  90 + dict(type='msg', level='INFO',
  91 + msg=log_helper_test_main.INFO_MESSAGE),
  92 + dict(type='msg', level='WARNING',
  93 + msg=log_helper_test_main.WARNING_MESSAGE),
  94 + dict(type='msg', level='ERROR',
  95 + msg=log_helper_test_main.ERROR_MESSAGE),
  96 + dict(type='msg', level='CRITICAL',
  97 + msg=log_helper_test_main.CRITICAL_MESSAGE),
  98 + # this is the important entry (has a different "type" field):
  99 + dict(type=log_helper_test_main.RESULT_TYPE, level='INFO',
  100 + msg=log_helper_test_main.RESULT_MESSAGE),
  101 + dict(type='msg', level='INFO',
  102 + msg=log_helper_test_imported.INFO_MESSAGE),
  103 + dict(type='msg', level='WARNING',
  104 + msg=log_helper_test_imported.WARNING_MESSAGE),
  105 + dict(type='msg', level='ERROR',
  106 + msg=log_helper_test_imported.ERROR_MESSAGE),
  107 + dict(type='msg', level='CRITICAL',
  108 + msg=log_helper_test_imported.CRITICAL_MESSAGE),
  109 + # ... and this:
  110 + dict(type=log_helper_test_imported.RESULT_TYPE, level='INFO',
  111 + msg=log_helper_test_imported.RESULT_MESSAGE),
  112 + ]
  113 + self.assertEqual(jout, jexpect)
  114 +
  115 + def test_percent_autoformat(self):
  116 + """Test that auto-formatting of log strings with `%` works."""
  117 + output = self._run_test(['enable', '%-autoformat', 'info'])
  118 + self.assertIn('The answer is 47.', output)
  119 +
62 def test_json_correct_on_exceptions(self): 120 def test_json_correct_on_exceptions(self):
63 """ 121 """
64 Test that even on unhandled exceptions our JSON is always correct 122 Test that even on unhandled exceptions our JSON is always correct
@@ -72,10 +130,10 @@ class TestLogHelper(unittest.TestCase): @@ -72,10 +130,10 @@ class TestLogHelper(unittest.TestCase):
72 def _assert_json_messages(self, output, messages): 130 def _assert_json_messages(self, output, messages):
73 try: 131 try:
74 json_data = json.loads(output) 132 json_data = json.loads(output)
75 - self.assertEquals(len(json_data), len(messages)) 133 + self.assertEqual(len(json_data), len(messages))
76 134
77 for i in range(len(messages)): 135 for i in range(len(messages)):
78 - self.assertEquals(messages[i], json_data[i]['msg']) 136 + self.assertEqual(messages[i], json_data[i]['msg'])
79 except ValueError: 137 except ValueError:
80 self.fail('Invalid json:\n' + output) 138 self.fail('Invalid json:\n' + output)
81 139
@@ -90,9 +148,9 @@ class TestLogHelper(unittest.TestCase): @@ -90,9 +148,9 @@ class TestLogHelper(unittest.TestCase):
90 child = subprocess.Popen( 148 child = subprocess.Popen(
91 [PYTHON_EXECUTABLE, TEST_FILE] + args, 149 [PYTHON_EXECUTABLE, TEST_FILE] + args,
92 shell=False, 150 shell=False,
93 - env={'PYTHONPATH': ROOT_DIRECTORY}, 151 + env={'PYTHONPATH': PROJECT_ROOT},
94 universal_newlines=True, 152 universal_newlines=True,
95 - cwd=ROOT_DIRECTORY, 153 + cwd=PROJECT_ROOT,
96 stdin=None, 154 stdin=None,
97 stdout=subprocess.PIPE, 155 stdout=subprocess.PIPE,
98 stderr=subprocess.PIPE 156 stderr=subprocess.PIPE
@@ -102,7 +160,7 @@ class TestLogHelper(unittest.TestCase): @@ -102,7 +160,7 @@ class TestLogHelper(unittest.TestCase):
102 if not isinstance(output, str): 160 if not isinstance(output, str):
103 output = output.decode('utf-8') 161 output = output.decode('utf-8')
104 162
105 - self.assertEquals(child.returncode == 0, should_succeed) 163 + self.assertEqual(child.returncode == 0, should_succeed)
106 164
107 return output.strip() 165 return output.strip()
108 166
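The new log_helper tests expect a `type=` keyword on ordinary logging calls, with `'msg'` as the default type in JSON output, and `%`-style arguments formatted as usual. The standard `logging` module does not accept a `type=` keyword by itself. The sketch below shows one way to get that behaviour with a `logging.LoggerAdapter` that moves `type` into `extra`. `TypedAdapter`, `JsonLineFormatter`, `ListHandler` and `demo` are hypothetical names. This is not how oletools' `log_helper` is actually implemented.

```python
import json
import logging


class TypedAdapter(logging.LoggerAdapter):
    """Hypothetical adapter that accepts an extra `type=` keyword on log calls."""

    def process(self, msg, kwargs):
        # Move the custom keyword into `extra` so Logger.log() accepts it;
        # ordinary messages default to type 'msg', as the tests above expect.
        log_type = kwargs.pop('type', 'msg')
        extra = dict(kwargs.get('extra') or {})
        extra['type'] = log_type
        kwargs['extra'] = extra
        return msg, kwargs


class JsonLineFormatter(logging.Formatter):
    """Render each record as a JSON object with type, level and msg fields."""

    def format(self, record):
        return json.dumps({'type': getattr(record, 'type', 'msg'),
                           'level': record.levelname,
                           # getMessage() applies %-style args ("%-autoformat")
                           'msg': record.getMessage()})


class ListHandler(logging.Handler):
    """Collect formatted records in memory instead of writing to a stream."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(self.format(record))


def demo():
    base = logging.getLogger('typed-demo')
    base.setLevel(logging.INFO)
    base.propagate = False
    handler = ListHandler()
    handler.setFormatter(JsonLineFormatter())
    base.addHandler(handler)
    log = TypedAdapter(base, {})
    log.info('main: info log')
    log.info('main: result log', type='main: result')
    log.info('The %s is %d.', 'answer', 47)
    base.removeHandler(handler)
    return [json.loads(line) for line in handler.lines]
```

Keeping `type` in `extra` means any formatter can choose to ignore it, as in the plain-text test. Only the JSON formatter surfaces it.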
tests/msodde/test_basic.py
@@ -9,11 +9,16 @@ Ensure that @@ -9,11 +9,16 @@ Ensure that
9 from __future__ import print_function 9 from __future__ import print_function
10 10
11 import unittest 11 import unittest
12 -from oletools import msodde  
13 -from tests.test_utils import DATA_BASE_DIR as BASE_DIR 12 +import sys
14 import os 13 import os
15 -from os.path import join 14 +from os.path import join, basename
16 from traceback import print_exc 15 from traceback import print_exc
  16 +import json
  17 +from collections import OrderedDict
  18 +from oletools import msodde
  19 +from oletools.crypto import \
  20 + WrongEncryptionPassword, CryptoLibNotImported, check_msoffcrypto
  21 +from tests.test_utils import call_and_capture, DATA_BASE_DIR as BASE_DIR
17 22
18 23
19 class TestReturnCode(unittest.TestCase): 24 class TestReturnCode(unittest.TestCase):
@@ -46,15 +51,21 @@ class TestReturnCode(unittest.TestCase): @@ -46,15 +51,21 @@ class TestReturnCode(unittest.TestCase):
46 51
47 def test_invalid_none(self): 52 def test_invalid_none(self):
48 """ check that no file argument leads to non-zero exit status """ 53 """ check that no file argument leads to non-zero exit status """
49 - self.do_test_validity('', True) 54 + if sys.hexversion > 0x03030000: # version 3.3 and higher
  55 + # different errors probably depending on whether msoffcrypto is
  56 + # available or not
  57 + expect_error = (AttributeError, FileNotFoundError)
  58 + else:
  59 + expect_error = (AttributeError, IOError)
  60 + self.do_test_validity('', expect_error)
50 61
51 def test_invalid_empty(self): 62 def test_invalid_empty(self):
52 """ check that empty file argument leads to non-zero exit status """ 63 """ check that empty file argument leads to non-zero exit status """
53 - self.do_test_validity(join(BASE_DIR, 'basic/empty'), True) 64 + self.do_test_validity(join(BASE_DIR, 'basic/empty'), Exception)
54 65
55 def test_invalid_text(self): 66 def test_invalid_text(self):
56 """ check that text file argument leads to non-zero exit status """ 67 """ check that text file argument leads to non-zero exit status """
57 - self.do_test_validity(join(BASE_DIR, 'basic/text'), True) 68 + self.do_test_validity(join(BASE_DIR, 'basic/text'), Exception)
58 69
59 def test_encrypted(self): 70 def test_encrypted(self):
60 """ 71 """
@@ -64,28 +75,56 @@ class TestReturnCode(unittest.TestCase): @@ -64,28 +75,56 @@ class TestReturnCode(unittest.TestCase):
64 Encryption) is tested. 75 Encryption) is tested.
65 """ 76 """
66 CRYPT_DIR = join(BASE_DIR, 'encrypted') 77 CRYPT_DIR = join(BASE_DIR, 'encrypted')
67 - ADD_ARGS = '', '-j', '-d', '-f', '-a' 78 + have_crypto = check_msoffcrypto()
68 for filename in os.listdir(CRYPT_DIR): 79 for filename in os.listdir(CRYPT_DIR):
69 - full_name = join(CRYPT_DIR, filename)  
70 - for args in ADD_ARGS:  
71 - self.do_test_validity(args + ' ' + full_name, True)  
72 -  
73 - def do_test_validity(self, args, expect_error=False):  
74 - """ helper for test_valid_doc[x] """  
75 - have_exception = False 80 + if have_crypto and 'standardpassword' in filename:
  81 + # these are automagically decrypted
  82 + self.do_test_validity(join(CRYPT_DIR, filename))
  83 + elif have_crypto:
  84 + self.do_test_validity(join(CRYPT_DIR, filename),
  85 + WrongEncryptionPassword)
  86 + else:
  87 + self.do_test_validity(join(CRYPT_DIR, filename),
  88 + CryptoLibNotImported)
  89 +
  90 + def do_test_validity(self, filename, expect_error=None):
  91 + """ helper for test_[in]valid_* """
  92 + found_error = None
  93 + # DEBUG: print('Testing file {}'.format(filename))
76 try: 94 try:
77 - msodde.process_file(args, msodde.FIELD_FILTER_BLACKLIST)  
78 - except Exception:  
79 - have_exception = True  
80 - print_exc()  
81 - except SystemExit as exc: # sys.exit() was called  
82 - have_exception = True  
83 - if exc.code is None:  
84 - have_exception = False  
85 -  
86 - self.assertEqual(expect_error, have_exception,  
87 - msg='Args={0}, expect={1}, exc={2}'  
88 - .format(args, expect_error, have_exception)) 95 + msodde.process_maybe_encrypted(filename,
  96 + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
  97 + except Exception as exc:
  98 + found_error = exc
  99 + # DEBUG: print_exc()
  100 +
  101 + if expect_error and not found_error:
  102 + self.fail('Expected {} but msodde finished without errors for {}'
  103 + .format(expect_error, filename))
  104 + elif not expect_error and found_error:
  105 + self.fail('Unexpected error {} from msodde for {}'
  106 + .format(found_error, filename))
  107 + elif expect_error and not isinstance(found_error, expect_error):
  108 + self.fail('Wrong kind of error {} from msodde for {}, expected {}'
  109 + .format(type(found_error), filename, expect_error))
  110 +
  111 +
  112 +@unittest.skipIf(not check_msoffcrypto(),
  113 + 'Module msoffcrypto not installed for {}'
  114 + .format(basename(sys.executable)))
  115 +class TestErrorOutput(unittest.TestCase):
  116 + """msodde does not specify error by return code but text output."""
  117 +
  118 + def test_crypt_output(self):
  119 + """Check for helpful error message when failing to decrypt."""
  120 + for suffix in 'doc', 'docm', 'docx', 'ppt', 'pptm', 'pptx', 'xls', \
  121 + 'xlsb', 'xlsm', 'xlsx':
  122 + example_file = join(BASE_DIR, 'encrypted', 'encrypted.' + suffix)
  123 + output, ret_code = call_and_capture('msodde', [example_file, ],
  124 + accept_nonzero_exit=True)
  125 + self.assertEqual(ret_code, 1)
  126 + self.assertIn('passwords could not decrypt office file', output,
  127 + msg='Unexpected output: {}'.format(output.strip()))
89 128
90 129
91 class TestDdeLinks(unittest.TestCase): 130 class TestDdeLinks(unittest.TestCase):
@@ -100,33 +139,37 @@ class TestDdeLinks(unittest.TestCase): @@ -100,33 +139,37 @@ class TestDdeLinks(unittest.TestCase):
100 def test_with_dde(self): 139 def test_with_dde(self):
101 """ check that dde links appear on stdout """ 140 """ check that dde links appear on stdout """
102 filename = 'dde-test-from-office2003.doc' 141 filename = 'dde-test-from-office2003.doc'
103 - output = msodde.process_file(  
104 - join(BASE_DIR, 'msodde', filename), msodde.FIELD_FILTER_BLACKLIST) 142 + output = msodde.process_maybe_encrypted(
  143 + join(BASE_DIR, 'msodde', filename),
  144 + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
105 self.assertNotEqual(len(self.get_dde_from_output(output)), 0, 145 self.assertNotEqual(len(self.get_dde_from_output(output)), 0,
106 msg='Found no dde links in output of ' + filename) 146 msg='Found no dde links in output of ' + filename)
107 147
108 def test_no_dde(self): 148 def test_no_dde(self):
109 """ check that no dde links appear on stdout """ 149 """ check that no dde links appear on stdout """
110 filename = 'harmless-clean.doc' 150 filename = 'harmless-clean.doc'
111 - output = msodde.process_file(  
112 - join(BASE_DIR, 'msodde', filename), msodde.FIELD_FILTER_BLACKLIST) 151 + output = msodde.process_maybe_encrypted(
  152 + join(BASE_DIR, 'msodde', filename),
  153 + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
113 self.assertEqual(len(self.get_dde_from_output(output)), 0, 154 self.assertEqual(len(self.get_dde_from_output(output)), 0,
114 msg='Found dde links in output of ' + filename) 155 msg='Found dde links in output of ' + filename)
115 156
116 def test_with_dde_utf16le(self): 157 def test_with_dde_utf16le(self):
117 """ check that dde links appear on stdout """ 158 """ check that dde links appear on stdout """
118 filename = 'dde-test-from-office2013-utf_16le-korean.doc' 159 filename = 'dde-test-from-office2013-utf_16le-korean.doc'
119 - output = msodde.process_file(  
120 - join(BASE_DIR, 'msodde', filename), msodde.FIELD_FILTER_BLACKLIST) 160 + output = msodde.process_maybe_encrypted(
  161 + join(BASE_DIR, 'msodde', filename),
  162 + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
121 self.assertNotEqual(len(self.get_dde_from_output(output)), 0, 163 self.assertNotEqual(len(self.get_dde_from_output(output)), 0,
122 msg='Found no dde links in output of ' + filename) 164 msg='Found no dde links in output of ' + filename)
123 165
124 def test_excel(self): 166 def test_excel(self):
125 """ check that dde links are found in excel 2007+ files """ 167 """ check that dde links are found in excel 2007+ files """
126 - expect = ['DDE-Link cmd /c calc.exe', ] 168 + expect = ['cmd /c calc.exe', ]
127 for extn in 'xlsx', 'xlsm', 'xlsb': 169 for extn in 'xlsx', 'xlsm', 'xlsb':
128 - output = msodde.process_file(  
129 - join(BASE_DIR, 'msodde', 'dde-test.' + extn), msodde.FIELD_FILTER_BLACKLIST) 170 + output = msodde.process_maybe_encrypted(
  171 + join(BASE_DIR, 'msodde', 'dde-test.' + extn),
  172 + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
130 173
131 self.assertEqual(expect, self.get_dde_from_output(output), 174 self.assertEqual(expect, self.get_dde_from_output(output),
132 msg='unexpected output for dde-test.{0}: {1}' 175 msg='unexpected output for dde-test.{0}: {1}'
@@ -136,8 +179,9 @@ class TestDdeLinks(unittest.TestCase): @@ -136,8 +179,9 @@ class TestDdeLinks(unittest.TestCase):
136 """ check that dde in xml from word / excel is found """ 179 """ check that dde in xml from word / excel is found """
137 for name_part in 'excel2003', 'word2003', 'word2007': 180 for name_part in 'excel2003', 'word2003', 'word2007':
138 filename = 'dde-in-' + name_part + '.xml' 181 filename = 'dde-in-' + name_part + '.xml'
139 - output = msodde.process_file(  
140 - join(BASE_DIR, 'msodde', filename), msodde.FIELD_FILTER_BLACKLIST) 182 + output = msodde.process_maybe_encrypted(
  183 + join(BASE_DIR, 'msodde', filename),
  184 + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
141 links = self.get_dde_from_output(output) 185 links = self.get_dde_from_output(output)
142 self.assertEqual(len(links), 1, 'found {0} dde-links in {1}' 186 self.assertEqual(len(links), 1, 'found {0} dde-links in {1}'
143 .format(len(links), filename)) 187 .format(len(links), filename))
@@ -149,15 +193,17 @@ class TestDdeLinks(unittest.TestCase): @@ -149,15 +193,17 @@ class TestDdeLinks(unittest.TestCase):
149 def test_clean_rtf_blacklist(self): 193 def test_clean_rtf_blacklist(self):
150 """ find a lot of hyperlinks in rtf spec """ 194 """ find a lot of hyperlinks in rtf spec """
151 filename = 'RTF-Spec-1.7.rtf' 195 filename = 'RTF-Spec-1.7.rtf'
152 - output = msodde.process_file(  
153 - join(BASE_DIR, 'msodde', filename), msodde.FIELD_FILTER_BLACKLIST) 196 + output = msodde.process_maybe_encrypted(
  197 + join(BASE_DIR, 'msodde', filename),
  198 + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST)
154 self.assertEqual(len(self.get_dde_from_output(output)), 1413) 199 self.assertEqual(len(self.get_dde_from_output(output)), 1413)
155 200
156 def test_clean_rtf_ddeonly(self): 201 def test_clean_rtf_ddeonly(self):
157 """ find no dde links in rtf spec """ 202 """ find no dde links in rtf spec """
158 filename = 'RTF-Spec-1.7.rtf' 203 filename = 'RTF-Spec-1.7.rtf'
159 - output = msodde.process_file(  
160 - join(BASE_DIR, 'msodde', filename), msodde.FIELD_FILTER_DDE) 204 + output = msodde.process_maybe_encrypted(
  205 + join(BASE_DIR, 'msodde', filename),
  206 + field_filter_mode=msodde.FIELD_FILTER_DDE)
161 self.assertEqual(len(self.get_dde_from_output(output)), 0, 207 self.assertEqual(len(self.get_dde_from_output(output)), 0,
162 msg='Found dde links in output of ' + filename) 208 msg='Found dde links in output of ' + filename)
163 209
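The rewritten `do_test_validity` helper separates three failure cases: an expected error that never happened, an unexpected error, and an error of the wrong kind. Because `isinstance()` accepts a tuple of classes, `expect_error` can be a single exception class or a tuple such as `(AttributeError, IOError)`. The sketch below isolates that decision as a pure function. `validity_failure` is a hypothetical name, not part of the test suite. It uses `is None` rather than truthiness to test for "no error".

```python
def validity_failure(filename, expect_error=None, found_error=None):
    """Return a failure message, or None if the outcome matches expectations.

    `expect_error` may be an exception class or a tuple of classes, exactly
    as accepted by isinstance(); `found_error` is the caught exception or None.
    """
    if expect_error and found_error is None:
        return ('Expected {} but finished without errors for {}'
                .format(expect_error, filename))
    if not expect_error and found_error is not None:
        return 'Unexpected error {} for {}'.format(found_error, filename)
    if expect_error and not isinstance(found_error, expect_error):
        return ('Wrong kind of error {} for {}, expected {}'
                .format(type(found_error), filename, expect_error))
    return None  # outcome matches the expectation
```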
tests/msodde/test_crypto.py 0 → 100644
  1 +"""Check decryption of files from msodde works."""
  2 +
  3 +import sys
  4 +import unittest
  5 +from os.path import basename, join as pjoin
  6 +
  7 +from tests.test_utils import DATA_BASE_DIR, call_and_capture
  8 +
  9 +from oletools import crypto
  10 +
  11 +
  12 +@unittest.skipIf(not crypto.check_msoffcrypto(),
  13 + 'Module msoffcrypto not installed for {}'
  14 + .format(basename(sys.executable)))
  15 +class MsoddeCryptoTest(unittest.TestCase):
  16 + """Test integration of decryption in msodde."""
  17 +
  18 + def test_standard_password(self):
  19 + """Check dde-link is found in xls[mb] sample files."""
  20 + for suffix in 'xls', 'xlsx', 'xlsm', 'xlsb':
  21 + example_file = pjoin(DATA_BASE_DIR, 'encrypted',
  22 + 'dde-test-encrypt-standardpassword.' + suffix)
  23 + output, _ = call_and_capture('msodde', [example_file, ])
  24 + self.assertIn('\nDDE Links:\ncmd /c calc.exe\n', output,
  25 + msg='Unexpected output {!r} for {}'
  26 + .format(output, suffix))
  27 +
  28 + # TODO: add more, in particular a sample with a "proper" password
  29 +
  30 +
  31 +if __name__ == '__main__':
  32 + unittest.main()
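The crypto tests are skipped via `unittest.skipIf` when `msoffcrypto` cannot be used. The sketch below shows a generic import probe in the same spirit. `module_available` and `NeedsCryptoTest` are hypothetical. oletools' own `crypto.check_msoffcrypto()` may perform different or additional checks.

```python
import importlib
import unittest


def module_available(name):
    """Return True if `name` can be imported (sketch, not check_msoffcrypto)."""
    try:
        importlib.import_module(name)
    except ImportError:  # includes ModuleNotFoundError on Python 3.6+
        return False
    return True


@unittest.skipIf(not module_available('msoffcrypto'),
                 'Module msoffcrypto not installed')
class NeedsCryptoTest(unittest.TestCase):
    """Only runs when the optional dependency can be imported."""

    def test_import(self):
        import msoffcrypto  # reached only if the probe succeeded
        self.assertIsNotNone(msoffcrypto)
```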
tests/oleid/test_basic.py
@@ -20,7 +20,7 @@ class TestOleIDBasic(unittest.TestCase): @@ -20,7 +20,7 @@ class TestOleIDBasic(unittest.TestCase):
20 """Run all files in test-data through oleid and compare to known output""" 20 """Run all files in test-data through oleid and compare to known output"""
21 # this relies on order of indicators being constant, could relax that 21 # this relies on order of indicators being constant, could relax that
22 # Also requires that files have the correct suffixes (no rtf in doc) 22 # Also requires that files have the correct suffixes (no rtf in doc)
23 - NON_OLE_SUFFIXES = ('.xml', '.csv', '.rtf', '') 23 + NON_OLE_SUFFIXES = ('.xml', '.csv', '.rtf', '', '.odt', '.ods', '.odp')
24 NON_OLE_VALUES = (False, ) 24 NON_OLE_VALUES = (False, )
25 WORD = b'Microsoft Office Word' 25 WORD = b'Microsoft Office Word'
26 PPT = b'Microsoft Office PowerPoint' 26 PPT = b'Microsoft Office PowerPoint'
@@ -121,6 +121,33 @@ class TestOleIDBasic(unittest.TestCase): @@ -121,6 +121,33 @@ class TestOleIDBasic(unittest.TestCase):
121 'msodde/harmless-clean.docx': (False,), 121 'msodde/harmless-clean.docx': (False,),
122 'oleform/oleform-PR314.docm': (False,), 122 'oleform/oleform-PR314.docm': (False,),
123 'basic/encrypted.docx': CRYPT, 123 'basic/encrypted.docx': CRYPT,
  124 + 'oleobj/external_link/sample_with_external_link_to_doc.docx': (False,),
  125 + 'oleobj/external_link/sample_with_external_link_to_doc.xlsb': (False,),
  126 + 'oleobj/external_link/sample_with_external_link_to_doc.dotm': (False,),
  127 + 'oleobj/external_link/sample_with_external_link_to_doc.xlsm': (False,),
  128 + 'oleobj/external_link/sample_with_external_link_to_doc.pptx': (False,),
  129 + 'oleobj/external_link/sample_with_external_link_to_doc.dotx': (False,),
  130 + 'oleobj/external_link/sample_with_external_link_to_doc.docm': (False,),
  131 + 'oleobj/external_link/sample_with_external_link_to_doc.potm': (False,),
  132 + 'oleobj/external_link/sample_with_external_link_to_doc.xlsx': (False,),
  133 + 'oleobj/external_link/sample_with_external_link_to_doc.potx': (False,),
  134 + 'oleobj/external_link/sample_with_external_link_to_doc.ppsm': (False,),
  135 + 'oleobj/external_link/sample_with_external_link_to_doc.pptm': (False,),
  136 + 'oleobj/external_link/sample_with_external_link_to_doc.ppsx': (False,),
  137 + 'encrypted/autostart-encrypt-standardpassword.xlsm':
  138 + (True, False, 'unknown', True, False, False, False, False, False, False, 0),
  139 + 'encrypted/autostart-encrypt-standardpassword.xls':
  140 + (True, True, EXCEL, True, False, True, True, False, False, False, 0),
  141 + 'encrypted/dde-test-encrypt-standardpassword.xlsx':
  142 + (True, False, 'unknown', True, False, False, False, False, False, False, 0),
  143 + 'encrypted/dde-test-encrypt-standardpassword.xlsm':
  144 + (True, False, 'unknown', True, False, False, False, False, False, False, 0),
  145 + 'encrypted/autostart-encrypt-standardpassword.xlsb':
  146 + (True, False, 'unknown', True, False, False, False, False, False, False, 0),
  147 + 'encrypted/dde-test-encrypt-standardpassword.xls':
  148 + (True, True, EXCEL, True, False, False, True, False, False, False, 0),
  149 + 'encrypted/dde-test-encrypt-standardpassword.xlsb':
  150 + (True, False, 'unknown', True, False, False, False, False, False, False, 0),
124 } 151 }
125 152
126 indicator_names = [] 153 indicator_names = []
@@ -148,7 +175,8 @@ class TestOleIDBasic(unittest.TestCase): @@ -148,7 +175,8 @@ class TestOleIDBasic(unittest.TestCase):
148 OLE_VALUES[name])) 175 OLE_VALUES[name]))
149 except KeyError: 176 except KeyError:
150 print('Should add oleid output for {} to {} ({})' 177 print('Should add oleid output for {} to {} ({})'
151 - .format(name, __name__, values[3:])) 178 + .format(name, __name__, values))
  179 +
152 180
153 # just in case somebody calls this file as a script 181 # just in case somebody calls this file as a script
154 if __name__ == '__main__': 182 if __name__ == '__main__':
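The empty string in `NON_OLE_SUFFIXES` covers sample files without an extension, such as `basic/empty` and `basic/text`: `os.path.splitext` returns `''` as their suffix. The sketch below is a minimal, hypothetical classifier showing that behaviour with the tuple from the diff. The real test may classify samples differently.

```python
from os.path import splitext

# Suffixes copied from the updated oleid test above.
NON_OLE_SUFFIXES = ('.xml', '.csv', '.rtf', '', '.odt', '.ods', '.odp')


def is_non_ole_sample(name):
    """Return True if the sample's suffix marks it as non-OLE (sketch)."""
    return splitext(name)[1] in NON_OLE_SUFFIXES
```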
tests/oleobj/test_basic.py
@@ -8,7 +8,7 @@ from hashlib import md5 @@ -8,7 +8,7 @@ from hashlib import md5
8 from glob import glob 8 from glob import glob
9 9
10 # Directory with test data, independent of current working directory 10 # Directory with test data, independent of current working directory
11 -from tests.test_utils import DATA_BASE_DIR 11 +from tests.test_utils import DATA_BASE_DIR, call_and_capture
12 from oletools import oleobj 12 from oletools import oleobj
13 13
14 14
@@ -41,8 +41,10 @@ SAMPLES += tuple( @@ -41,8 +41,10 @@ SAMPLES += tuple(
41 'ab8c65e4c0fc51739aa66ca5888265b4') 41 'ab8c65e4c0fc51739aa66ca5888265b4')
42 for extn in ('xls', 'xlsx', 'xlsb', 'xlsm', 'xla', 'xlam', 'xlt', 'xltm', 42 for extn in ('xls', 'xlsx', 'xlsb', 'xlsm', 'xla', 'xlam', 'xlt', 'xltm',
43 'xltx', 'ppt', 'pptx', 'pptm', 'pps', 'ppsx', 'ppsm', 'pot', 43 'xltx', 'ppt', 'pptx', 'pptm', 'pps', 'ppsx', 'ppsm', 'pot',
44 - 'potx', 'potm') 44 + 'potx', 'potm', 'ods', 'odp')
45 ) 45 )
  46 +SAMPLES += (('embedded-simple-2007.odt', 'simple-text-file.txt',
  47 + 'bd5c063a5a43f67b3c50dc7b0f1195af'), )
46 48
47 49
48 def calc_md5(filename): 50 def calc_md5(filename):
@@ -79,10 +81,6 @@ class TestOleObj(unittest.TestCase): @@ -79,10 +81,6 @@ class TestOleObj(unittest.TestCase):
79 """ fixture start: create temp dir """ 81 """ fixture start: create temp dir """
80 self.temp_dir = mkdtemp(prefix='oletools-oleobj-') 82 self.temp_dir = mkdtemp(prefix='oletools-oleobj-')
81 self.did_fail = False 83 self.did_fail = False
82 - if DEBUG:  
83 - import logging  
84 - logging.basicConfig(level=logging.DEBUG if DEBUG else logging.INFO)  
85 - oleobj.log.setLevel(logging.NOTSET)  
86 84
87 def tearDown(self): 85 def tearDown(self):
88 """ fixture end: remove temp dir """ 86 """ fixture end: remove temp dir """
@@ -99,7 +97,8 @@ class TestOleObj(unittest.TestCase): @@ -99,7 +97,8 @@ class TestOleObj(unittest.TestCase):
99 """ 97 """
100 test that oleobj can be called with -i and -v 98 test that oleobj can be called with -i and -v
101 99
102 - this is the way that amavisd calls oleobj, thinking it is ripOLE 100 + This is how ripOLE used to be often called (e.g. by amavisd-new);
  101 + ensure oleobj is a compatible replacement.
103 """ 102 """
104 self.do_test_md5(['-d', self.temp_dir, '-v', '-i']) 103 self.do_test_md5(['-d', self.temp_dir, '-v', '-i'])
105 104
@@ -110,35 +109,52 @@ class TestOleObj(unittest.TestCase): @@ -110,35 +109,52 @@ class TestOleObj(unittest.TestCase):
110 'embedded-simple-2007.xml', 109 'embedded-simple-2007.xml',
111 'embedded-simple-2007-as2003.xml'): 110 'embedded-simple-2007-as2003.xml'):
112 full_name = join(DATA_BASE_DIR, 'oleobj', sample_name) 111 full_name = join(DATA_BASE_DIR, 'oleobj', sample_name)
113 - ret_val = oleobj.main(args + [full_name, ]) 112 + output, ret_val = call_and_capture('oleobj', args + [full_name, ],
  113 + accept_nonzero_exit=True)
114 if glob(self.temp_dir + 'ole-object-*'): 114 if glob(self.temp_dir + 'ole-object-*'):
115 - self.fail('found embedded data in {0}'.format(sample_name))  
116 - self.assertEqual(ret_val, oleobj.RETURN_NO_DUMP) 115 + self.fail('found embedded data in {0}. Output:\n{1}'
  116 + .format(sample_name, output))
  117 + self.assertEqual(ret_val, oleobj.RETURN_NO_DUMP,
  118 + msg='Wrong return value {} for {}. Output:\n{}'
  119 + .format(ret_val, sample_name, output))
117 120
118 - def do_test_md5(self, args, test_fun=oleobj.main): 121 + def do_test_md5(self, args, test_fun=None, only_run_every=1):
119 """ helper for test_md5 and test_md5_args """ 122 """ helper for test_md5 and test_md5_args """
120 - # name of sample, extension of embedded file, md5 hash of embedded file  
121 data_dir = join(DATA_BASE_DIR, 'oleobj') 123 data_dir = join(DATA_BASE_DIR, 'oleobj')
122 - for sample_name, embedded_name, expect_hash in SAMPLES:  
123 - ret_val = test_fun(args + [join(data_dir, sample_name), ])  
124 - self.assertEqual(ret_val, oleobj.RETURN_DID_DUMP) 124 +
  125 + # name of sample, extension of embedded file, md5 hash of embedded file
  126 + for sample_index, (sample_name, embedded_name, expect_hash) \
  127 + in enumerate(SAMPLES):
  128 + if sample_index % only_run_every != 0:
  129 + continue
  130 + args_with_path = args + [join(data_dir, sample_name), ]
  131 + if test_fun is None:
  132 + output, ret_val = call_and_capture('oleobj', args_with_path,
  133 + accept_nonzero_exit=True)
  134 + else:
  135 + ret_val = test_fun(args_with_path)
  136 + output = '[output: see above]'
  137 + self.assertEqual(ret_val, oleobj.RETURN_DID_DUMP,
  138 + msg='Wrong return value {} for {}. Output:\n{}'
  139 + .format(ret_val, sample_name, output))
125 expect_name = join(self.temp_dir, 140 expect_name = join(self.temp_dir,
126 sample_name + '_' + embedded_name) 141 sample_name + '_' + embedded_name)
127 if not isfile(expect_name): 142 if not isfile(expect_name):
128 self.did_fail = True 143 self.did_fail = True
129 - self.fail('{0} not created from {1}'.format(expect_name,  
130 - sample_name)) 144 + self.fail('{0} not created from {1}. Output:\n{2}'
  145 + .format(expect_name, sample_name, output))
131 continue 146 continue
132 md5_hash = calc_md5(expect_name) 147 md5_hash = calc_md5(expect_name)
133 if md5_hash != expect_hash: 148 if md5_hash != expect_hash:
134 self.did_fail = True 149 self.did_fail = True
135 - self.fail('Wrong md5 {0} of {1} from {2}'  
136 - .format(md5_hash, expect_name, sample_name)) 150 + self.fail('Wrong md5 {0} of {1} from {2}. Output:\n{3}'
  151 + .format(md5_hash, expect_name, sample_name, output))
137 continue 152 continue
138 153
139 def test_non_streamed(self): 154 def test_non_streamed(self):
140 """ Ensure old oleobj behaviour still works: pre-read whole file """ 155 """ Ensure old oleobj behaviour still works: pre-read whole file """
141 - return self.do_test_md5(['-d', self.temp_dir], test_fun=preread_file) 156 + return self.do_test_md5(['-d', self.temp_dir], test_fun=preread_file,
  157 + only_run_every=4)
142 158
143 159
144 # just in case somebody calls this file as a script 160 # just in case somebody calls this file as a script
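The new `only_run_every` parameter of `do_test_md5` thins out the sample list. Only samples whose index is a multiple of `only_run_every` are processed, so `test_non_streamed` with `only_run_every=4` checks roughly a quarter of `SAMPLES`. The generator below reproduces just that selection rule. `every_nth` is a hypothetical name.

```python
def every_nth(samples, only_run_every=1):
    """Yield the samples that do_test_md5 would process for a given step."""
    for index, sample in enumerate(samples):
        if index % only_run_every != 0:
            continue  # skipped, as in do_test_md5
        yield sample
```

With the default of 1 every sample runs, so the streamed tests keep full coverage.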
tests/oleobj/test_external_links.py
@@ -6,7 +6,7 @@ import os @@ -6,7 +6,7 @@ import os
6 from os import path 6 from os import path
7 7
8 # Directory with test data, independent of current working directory 8 # Directory with test data, independent of current working directory
9 -from tests.test_utils import DATA_BASE_DIR 9 +from tests.test_utils import DATA_BASE_DIR, call_and_capture
10 from oletools import oleobj 10 from oletools import oleobj
11 11
12 BASE_DIR = path.join(DATA_BASE_DIR, 'oleobj', 'external_link') 12 BASE_DIR = path.join(DATA_BASE_DIR, 'oleobj', 'external_link')
@@ -22,8 +22,11 @@ class TestExternalLinks(unittest.TestCase): @@ -22,8 +22,11 @@ class TestExternalLinks(unittest.TestCase):
22 for filename in filenames: 22 for filename in filenames:
23 file_path = path.join(dirpath, filename) 23 file_path = path.join(dirpath, filename)
24 24
25 - ret_val = oleobj.main([file_path])  
26 - self.assertEqual(ret_val, oleobj.RETURN_DID_DUMP) 25 + output, ret_val = call_and_capture('oleobj', [file_path, ],
  26 + accept_nonzero_exit=True)
  27 + self.assertEqual(ret_val, oleobj.RETURN_DID_DUMP,
  28 + msg='Wrong return value {} for {}. Output:\n{}'
  29 + .format(ret_val, filename, output))
27 30
28 31
29 # just in case somebody calls this file as a script 32 # just in case somebody calls this file as a script
tests/olevba/test_basic.py
@@ -3,21 +3,71 @@ Test basic functionality of olevba[3] @@ -3,21 +3,71 @@ Test basic functionality of olevba[3]
3 """ 3 """
4 4
5 import unittest 5 import unittest
6 -import sys  
7 -if sys.version_info.major <= 2:  
8 - from oletools import olevba  
9 -else:  
10 - from oletools import olevba3 as olevba  
11 import os 6 import os
12 from os.path import join 7 from os.path import join
  8 +import re
13 9
14 # Directory with test data, independent of current working directory 10 # Directory with test data, independent of current working directory
15 -from tests.test_utils import DATA_BASE_DIR 11 +from tests.test_utils import DATA_BASE_DIR, call_and_capture
16 12
17 13
18 class TestOlevbaBasic(unittest.TestCase): 14 class TestOlevbaBasic(unittest.TestCase):
19 """Tests olevba basic functionality""" 15 """Tests olevba basic functionality"""
20 16
  17 + def test_text_behaviour(self):
  18 + """Test behaviour of olevba when presented with a pure text file."""
  19 + self.do_test_behaviour('text')
  20 +
  21 + def test_empty_behaviour(self):
  22 + """Test behaviour of olevba when presented with an empty file."""
  23 + self.do_test_behaviour('empty')
  24 +
  25 + def do_test_behaviour(self, filename):
  26 + """Helper for test_{text,empty}_behaviour."""
  27 + input_file = join(DATA_BASE_DIR, 'basic', filename)
  28 + output, _ = call_and_capture('olevba', args=(input_file, ))
  29 +
  30 + # check output
  31 + self.assertTrue(re.search(r'^Type:\s+Text\s*$', output, re.MULTILINE),
  32 + msg='"Type: Text" not found in output:\n' + output)
  33 + self.assertTrue(re.search(r'^No suspicious .+ found.$', output,
  34 + re.MULTILINE),
  35 + msg='"No suspicious...found" not found in output:\n' + \
  36 + output)
  37 + self.assertNotIn('error', output.lower())
  38 +
  39 + # check warnings
  40 + for line in output.splitlines():
  41 + if line.startswith('WARNING ') and 'encrypted' in line:
  42 + continue # encryption warnings are ok
  43 + elif 'warn' in line.lower():
  44 + raise self.fail('Found "warn" in output line: "{}"'
  45 + .format(line.rstrip()))
  46 + self.assertIn('not encrypted', output)
  47 +
  48 + def test_rtf_behaviour(self):
  49 + """Test behaviour of olevba when presented with an rtf file."""
  50 + input_file = join(DATA_BASE_DIR, 'msodde', 'RTF-Spec-1.7.rtf')
  51 + output, ret_code = call_and_capture('olevba', args=(input_file, ),
  52 + accept_nonzero_exit=True)
  53 +
  54 + # check that return code is olevba.RETURN_OPEN_ERROR
  55 + self.assertEqual(ret_code, 5)
  56 +
  57 + # check output:
  58 + self.assertIn('FileOpenError', output)
  59 + self.assertIn('is RTF', output)
  60 + self.assertIn('rtfobj.py', output)
  61 + self.assertIn('not encrypted', output)
  62 +
  63 + # check warnings
  64 + for line in output.splitlines():
  65 + if line.startswith('WARNING ') and 'encrypted' in line:
  66 + continue # encryption warnings are ok
  67 + elif 'warn' in line.lower():
  68 + raise self.fail('Found "warn" in output line: "{}"'
  69 + .format(line.rstrip()))
  70 +
21 def test_crypt_return(self): 71 def test_crypt_return(self):
22 """ 72 """
23 Tests that encrypted files give a certain return code. 73 Tests that encrypted files give a certain return code.
@@ -28,15 +78,23 @@ class TestOlevbaBasic(unittest.TestCase): @@ -28,15 +78,23 @@ class TestOlevbaBasic(unittest.TestCase):
28 CRYPT_DIR = join(DATA_BASE_DIR, 'encrypted') 78 CRYPT_DIR = join(DATA_BASE_DIR, 'encrypted')
29 CRYPT_RETURN_CODE = 9 79 CRYPT_RETURN_CODE = 9
30 ADD_ARGS = [], ['-d', ], ['-a', ], ['-j', ], ['-t', ] 80 ADD_ARGS = [], ['-d', ], ['-a', ], ['-j', ], ['-t', ]
  81 + EXCEPTIONS = ['autostart-encrypt-standardpassword.xls', # These ...
  82 + 'autostart-encrypt-standardpassword.xlsm', # files ...
  83 + 'autostart-encrypt-standardpassword.xlsb', # are ...
  84 + 'dde-test-encrypt-standardpassword.xls', # automati...
  85 + 'dde-test-encrypt-standardpassword.xlsx', # ...cally...
  86 + 'dde-test-encrypt-standardpassword.xlsm', # decrypted.
  87 + 'dde-test-encrypt-standardpassword.xlsb']
31 for filename in os.listdir(CRYPT_DIR): 88 for filename in os.listdir(CRYPT_DIR):
  89 + if filename in EXCEPTIONS:
  90 + continue
32 full_name = join(CRYPT_DIR, filename) 91 full_name = join(CRYPT_DIR, filename)
33 for args in ADD_ARGS: 92 for args in ADD_ARGS:
34 - try:  
35 - ret_code = olevba.main(args + [full_name, ])  
36 - except SystemExit as se:  
37 - ret_code = se.code or 0 # se.code can be None 93 + _, ret_code = call_and_capture('olevba',
  94 + args=[full_name, ] + args,
  95 + accept_nonzero_exit=True)
38 self.assertEqual(ret_code, CRYPT_RETURN_CODE, 96 self.assertEqual(ret_code, CRYPT_RETURN_CODE,
39 - msg='Wrong return code {} for args {}' 97 + msg='Wrong return code {} for args {}'\
40 .format(ret_code, args + [filename, ])) 98 .format(ret_code, args + [filename, ]))
41 99
42 100
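Both do_test_behaviour (used by test_text_behaviour and test_empty_behaviour) and test_rtf_behaviour repeat the same warning-scanning loop. A minimal sketch of factoring it into a shared helper, assuming the same rule as the loop above (skip lines starting with `WARNING ` that mention `encrypted`, flag any other line containing `warn`). The name `unexpected_warnings` is hypothetical and not part of oletools or its test suite.

```python
def unexpected_warnings(output):
    """Return the lines of `output` that look like unexpected warnings.

    Hypothetical helper: lines starting with 'WARNING ' that mention
    'encrypted' are expected and skipped; any other line containing 'warn'
    (case-insensitive) is collected, with trailing whitespace stripped.
    """
    bad = []
    for line in output.splitlines():
        if line.startswith('WARNING ') and 'encrypted' in line:
            continue  # encryption warnings are ok
        if 'warn' in line.lower():
            bad.append(line.rstrip())
    return bad


print(unexpected_warnings('Type: Text\nWARNING file is not encrypted\n'
                          'warning: odd thing  \nOK\n'))
# ['warning: odd thing']
```

Inside a test this could become a single `self.assertEqual(unexpected_warnings(output), [], msg=output)`. Note that `TestCase.fail()` already raises on its own, so the `raise` in front of `self.fail(...)` in the loops above never takes effect.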
tests/olevba/test_crypto.py 0 → 100644
  1 +"""Check decryption of files from olevba works."""
  2 +
  3 +import sys
  4 +import unittest
  5 +from os.path import basename, join as pjoin
  6 +import json
  7 +from collections import OrderedDict
  8 +
  9 +from tests.test_utils import DATA_BASE_DIR, call_and_capture
  10 +
  11 +from oletools import crypto
  12 +
  13 +
  14 +@unittest.skipIf(not crypto.check_msoffcrypto(),
  15 + 'Module msoffcrypto not installed for {}'
  16 + .format(basename(sys.executable)))
  17 +class OlevbaCryptoWriteProtectTest(unittest.TestCase):
  18 + """
  19 + Test documents that are 'write-protected' through encryption.
  20 +
  21 + Excel has a way to 'write-protect' documents by encrypting them with a
  22 + hard-coded standard password. When looking at the file-structure you see
  23 + an OLE-file with streams `EncryptedPackage`, `StrongEncryptionSpace`, and
  24 + `EncryptionInfo`. Contained in the first is the actual file. When opening
  25 + such a file in excel, it is decrypted without the user noticing.
  26 +
  27 + Olevba should detect such encryption, try to decrypt with the standard
  28 + password and look for VBA code in the decrypted file.
  29 +
  30 + All these tests are skipped if the module `msoffcrypto-tool` is not
  31 + installed.
  32 + """
  33 + def test_autostart(self):
  34 + """Check that autostart macro is found in xls[mb] sample file."""
  35 + for suffix in 'xlsm', 'xlsb':
  36 + example_file = pjoin(
  37 + DATA_BASE_DIR, 'encrypted',
  38 + 'autostart-encrypt-standardpassword.' + suffix)
  39 + output, _ = call_and_capture('olevba', args=('-j', example_file),
  40 + exclude_stderr=True)
  41 + data = json.loads(output, object_pairs_hook=OrderedDict)
  42 + # debug: json.dump(data, sys.stdout, indent=4)
  43 + self.assertEqual(len(data), 4)
  44 + self.assertIn('script_name', data[0])
  45 + self.assertIn('version', data[0])
  46 + self.assertEqual(data[0]['type'], 'MetaInformation')
  47 + self.assertIn('return_code', data[-1])
  48 + self.assertEqual(data[-1]['type'], 'MetaInformation')
  49 + self.assertEqual(data[1]['container'], None)
  50 + self.assertEqual(data[1]['file'], example_file)
  51 + self.assertEqual(data[1]['analysis'], None)
  52 + self.assertEqual(data[1]['macros'], [])
  53 + self.assertEqual(data[1]['type'], 'OLE')
  54 + self.assertEqual(data[2]['container'], example_file)
  55 + self.assertNotEqual(data[2]['file'], example_file)
  56 + self.assertEqual(data[2]['type'], "OpenXML")
  57 + analysis = data[2]['analysis']
  58 + self.assertEqual(analysis[0]['type'], 'AutoExec')
  59 + self.assertEqual(analysis[0]['keyword'], 'Auto_Open')
  60 + macros = data[2]['macros']
  61 + self.assertEqual(macros[0]['vba_filename'], 'Modul1.bas')
  62 + self.assertIn('Sub Auto_Open()', macros[0]['code'])
  63 +
  64 +
  65 +if __name__ == '__main__':
  66 + unittest.main()
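test_autostart checks the layout of `olevba -j` output: a JSON list whose first and last entries have type `MetaInformation`, an `OLE` entry for the encrypted wrapper (with `analysis` set to None), and an `OpenXML` entry for the decrypted content whose `analysis` contains an `AutoExec` item. A minimal sketch of consuming that layout, assuming only the fields the test asserts; the function name, the sample file names and the sample values are all hypothetical.

```python
import json


def autoexec_keywords(olevba_json):
    """Map each analysed file to the AutoExec keywords olevba reported for it.

    Assumes the layout checked in test_autostart: a JSON list whose first and
    last entries have type 'MetaInformation', with one entry per file between.
    """
    results = {}
    for entry in json.loads(olevba_json):
        if entry.get('type') == 'MetaInformation':
            continue  # header (script_name, version) or footer (return_code)
        # 'analysis' is None for the outer encrypted OLE wrapper
        analysis = entry.get('analysis') or []
        results[entry['file']] = [item['keyword'] for item in analysis
                                  if item.get('type') == 'AutoExec']
    return results


# Hypothetical sample: only the fields asserted in test_autostart are shown;
# the file names and values are made up for illustration.
SAMPLE = json.dumps([
    {'type': 'MetaInformation', 'script_name': 'olevba', 'version': '0.0'},
    {'type': 'OLE', 'file': 'sample.xlsm', 'container': None,
     'analysis': None, 'macros': []},
    {'type': 'OpenXML', 'file': 'decrypted.xlsm', 'container': 'sample.xlsm',
     'analysis': [{'type': 'AutoExec', 'keyword': 'Auto_Open'}],
     'macros': [{'vba_filename': 'Modul1.bas',
                 'code': 'Sub Auto_Open()\nEnd Sub'}]},
    {'type': 'MetaInformation', 'return_code': 0},
])

print(autoexec_keywords(SAMPLE))
# {'sample.xlsm': [], 'decrypted.xlsm': ['Auto_Open']}
```

This only works if stderr is kept out of the parsed text, which is why the test passes `exclude_stderr=True` to `call_and_capture`.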
tests/ooxml/test_basic.py
@@ -33,6 +33,8 @@ class TestOOXML(unittest.TestCase): @@ -33,6 +33,8 @@ class TestOOXML(unittest.TestCase):
33 pptx=ooxml.DOCTYPE_POWERPOINT, pptm=ooxml.DOCTYPE_POWERPOINT, 33 pptx=ooxml.DOCTYPE_POWERPOINT, pptm=ooxml.DOCTYPE_POWERPOINT,
34 ppsx=ooxml.DOCTYPE_POWERPOINT, ppsm=ooxml.DOCTYPE_POWERPOINT, 34 ppsx=ooxml.DOCTYPE_POWERPOINT, ppsm=ooxml.DOCTYPE_POWERPOINT,
35 potx=ooxml.DOCTYPE_POWERPOINT, potm=ooxml.DOCTYPE_POWERPOINT, 35 potx=ooxml.DOCTYPE_POWERPOINT, potm=ooxml.DOCTYPE_POWERPOINT,
  36 + ods=ooxml.DOCTYPE_NONE, odt=ooxml.DOCTYPE_NONE,
  37 + odp=ooxml.DOCTYPE_NONE,
36 ) 38 )
37 39
38 # files that are neither OLE nor xml: 40 # files that are neither OLE nor xml:
tests/ooxml/test_zip_sub_file.py
@@ -144,15 +144,15 @@ class TestZipSubFile(unittest.TestCase): @@ -144,15 +144,15 @@ class TestZipSubFile(unittest.TestCase):
144 self.subfile.seek(0, os.SEEK_END) 144 self.subfile.seek(0, os.SEEK_END)
145 self.compare.seek(0, os.SEEK_END) 145 self.compare.seek(0, os.SEEK_END)
146 146
147 - self.assertEquals(self.compare.read(10), self.subfile.read(10))  
148 - self.assertEquals(self.compare.tell(), self.subfile.tell()) 147 + self.assertEqual(self.compare.read(10), self.subfile.read(10))
  148 + self.assertEqual(self.compare.tell(), self.subfile.tell())
149 149
150 self.subfile.seek(0) 150 self.subfile.seek(0)
151 self.compare.seek(0) 151 self.compare.seek(0)
152 self.subfile.seek(len(FILE_CONTENTS) - 1) 152 self.subfile.seek(len(FILE_CONTENTS) - 1)
153 self.compare.seek(len(FILE_CONTENTS) - 1) 153 self.compare.seek(len(FILE_CONTENTS) - 1)
154 - self.assertEquals(self.compare.read(10), self.subfile.read(10))  
155 - self.assertEquals(self.compare.tell(), self.subfile.tell()) 154 + self.assertEqual(self.compare.read(10), self.subfile.read(10))
  155 + self.assertEqual(self.compare.tell(), self.subfile.tell())
156 156
157 def test_error_seek(self): 157 def test_error_seek(self):
158 """ test correct behaviour if seek beyond end (no exception) """ 158 """ test correct behaviour if seek beyond end (no exception) """
tests/ppt_parser/test_basic.py
@@ -16,7 +16,7 @@ class TestBasic(unittest.TestCase): @@ -16,7 +16,7 @@ class TestBasic(unittest.TestCase):
16 16
17 def test_is_ppt(self): 17 def test_is_ppt(self):
18 """ test ppt_record_parser.is_ppt(filename) """ 18 """ test ppt_record_parser.is_ppt(filename) """
19 - exceptions = [] 19 + exceptions = ['encrypted.ppt', ] # actually is ppt but embedded
20 for base_dir, _, files in os.walk(DATA_BASE_DIR): 20 for base_dir, _, files in os.walk(DATA_BASE_DIR):
21 for filename in files: 21 for filename in files:
22 if filename in exceptions: 22 if filename in exceptions:
tests/test-data/encrypted/autostart-encrypt-standardpassword.xls 0 → 100644
tests/test-data/encrypted/autostart-encrypt-standardpassword.xlsb 0 → 100644
tests/test-data/encrypted/autostart-encrypt-standardpassword.xlsm 0 → 100644
tests/test-data/encrypted/dde-test-encrypt-standardpassword.xls 0 → 100644
tests/test-data/encrypted/dde-test-encrypt-standardpassword.xlsb 0 → 100644
tests/test-data/encrypted/dde-test-encrypt-standardpassword.xlsm 0 → 100644
tests/test-data/encrypted/dde-test-encrypt-standardpassword.xlsx 0 → 100644
tests/test-data/oleobj/embedded-simple-2007.odp 0 → 100644
tests/test-data/oleobj/embedded-simple-2007.ods 0 → 100644
tests/test-data/oleobj/embedded-simple-2007.odt 0 → 100644
tests/test_utils/__init__.py
1 -from os.path import dirname, join  
2 -  
3 -# Directory with test data, independent of current working directory  
4 -DATA_BASE_DIR = join(dirname(dirname(__file__)), 'test-data') 1 +from .utils import *
tests/test_utils/utils.py 0 → 100644
  1 +#!/usr/bin/env python3
  2 +
  3 +"""Utils generally useful for unittests."""
  4 +
  5 +import sys
  6 +import os
  7 +from os.path import dirname, join, abspath
  8 +from subprocess import check_output, PIPE, STDOUT, CalledProcessError
  9 +
  10 +
  11 +# Base dir of project, contains subdirs "tests" and "oletools" and README.md
  12 +PROJECT_ROOT = dirname(dirname(dirname(abspath(__file__))))
  13 +
  14 +# Directory with test data, independent of current working directory
  15 +DATA_BASE_DIR = join(PROJECT_ROOT, 'tests', 'test-data')
  16 +
  17 +# Directory with source code
  18 +SOURCE_BASE_DIR = join(PROJECT_ROOT, 'oletools')
  19 +
  20 +
  21 +def call_and_capture(module, args=None, accept_nonzero_exit=False,
  22 + exclude_stderr=False):
  23 + """
  24 + Run module as script, capturing and returning output and return code.
  25 +
  26 + This is the best way to capture a module's stdout and stderr; trying to
  27 + modify sys.stdout/sys.stderr to StringIO-Buffers frequently causes trouble.
  28 +
  29 + Only drawback so far: stdout and stderr are merged into one (which is
  30 + what users see on their shell as well). When testing for json-compatible
  31 + output you should set `exclude_stderr` to `True`, since otherwise
  32 + unforeseen warnings on stderr (e.g. issued by pypy) would mess up your json.
  33 +
  34 + :param str module: name of module to test, e.g. `olevba`
  35 + :param args: command-line arguments passed to the module
  36 + :param bool accept_nonzero_exit: Return non-0 return codes instead of raising CalledProcessError
  37 + :param bool exclude_stderr: Exclude output to `sys.stderr` from output
  38 + (e.g. if parsing output through json)
  39 + :returns: output, ret_code
  40 + :rtype: str, int
  41 + """
  42 + # create a PYTHONPATH environment var to prefer our current code
  43 + env = os.environ.copy()
  44 + try:
  45 + env['PYTHONPATH'] = SOURCE_BASE_DIR + os.pathsep + \
  46 + os.environ['PYTHONPATH']
  47 + except KeyError:
  48 + env['PYTHONPATH'] = SOURCE_BASE_DIR
  49 +
  50 + # hack: in python2 output encoding (sys.stdout.encoding) was None
  51 + # although sys.getdefaultencoding() and sys.getfilesystemencoding() were ok
  52 + # TODO: maybe can remove this once branch
  53 + # "encoding-for-non-unicode-environments" is merged
  54 + if 'PYTHONIOENCODING' not in env:
  55 + env['PYTHONIOENCODING'] = 'utf8'
  56 +
  57 + # ensure args is a tuple
  58 + my_args = tuple(args) if args else ()
  59 +
  60 + ret_code = -1
  61 + try:
  62 + output = check_output((sys.executable, '-m', module) + my_args,
  63 + universal_newlines=True, env=env,
  64 + stderr=PIPE if exclude_stderr else STDOUT)
  65 + ret_code = 0
  66 +
  67 + except CalledProcessError as err:
  68 + if accept_nonzero_exit:
  69 + ret_code = err.returncode
  70 + output = err.output
  71 + else:
  72 + print(err.output)
  73 + raise
  74 +
  75 + return output, ret_code