Commit c47b13c16461fd24a8217e12419989c0a81721fc

Authored by Philippe Lagadec
1 parent dfd6b4f0

updated documentation for v0.41

oletools/README.html
@@ -4,7 +4,8 @@
4 4 <p>Note: python-oletools is not related to OLETools published by BeCubed Software.</p>
5 5 <h2 id="news">News</h2>
6 6 <ul>
7 -<li><strong>2015-09-17 v0.40</strong>: Improved macro deobfuscation in <a href="https://bitbucket.org/decalage/oletools/wiki/olevba">olevba</a>, to decode Hex and Base64 within VBA expressions. Display printable deobfuscated strings by default. Improved the VBA_Parser API. Improved performance. Fixed <a href="https://bitbucket.org/decalage/oletools/issue/23">issue #23</a> with sys.stderr.</li>
  7 +<li><strong>2015-09-22 v0.41</strong>: added new --reveal option to <a href="https://bitbucket.org/decalage/oletools/wiki/olevba">olevba</a>, to show the macro code with VBA strings deobfuscated.</li>
  8 +<li>2015-09-17 v0.40: Improved macro deobfuscation in <a href="https://bitbucket.org/decalage/oletools/wiki/olevba">olevba</a>, to decode Hex and Base64 within VBA expressions. Display printable deobfuscated strings by default. Improved the VBA_Parser API. Improved performance. Fixed <a href="https://bitbucket.org/decalage/oletools/issue/23">issue #23</a> with sys.stderr.</li>
8 9 <li>2015-06-19 v0.12: <a href="https://bitbucket.org/decalage/oletools/wiki/olevba">olevba</a> can now deobfuscate VBA expressions with any combination of Chr, Asc, Val, StrReverse, Environ, +, &amp;, using a VBA parser built with <a href="http://pyparsing.wikispaces.com">pyparsing</a>. New options to display only the analysis results or only the macros source code. The analysis is now done on all the VBA modules at once.</li>
9 10 <li>2015-05-29 v0.11: Improved parsing of MHTML and ActiveMime/MSO files in <a href="https://bitbucket.org/decalage/oletools/wiki/olevba">olevba</a>, added several suspicious keywords to VBA scanner (thanks to <span class="citation">@ozhermit</span> and Davy Douhine for the suggestions)</li>
10 11 <li>2015-05-06 v0.10: <a href="https://bitbucket.org/decalage/oletools/wiki/olevba">olevba</a> now supports Word MHTML files with macros, aka &quot;Single File Web Page&quot; (.mht) - see <a href="https://bitbucket.org/decalage/oletools/issue/10">issue #10</a> for more info</li>
oletools/README.rst
@@ -26,7 +26,10 @@ Software.
26 26 News
27 27 ----
28 28
29 -- **2015-09-17 v0.40**: Improved macro deobfuscation in
  29 +- **2015-09-22 v0.41**: added new --reveal option to
  30 + `olevba <https://bitbucket.org/decalage/oletools/wiki/olevba>`__, to
  31 + show the macro code with VBA strings deobfuscated.
  32 +- 2015-09-17 v0.40: Improved macro deobfuscation in
30 33 `olevba <https://bitbucket.org/decalage/oletools/wiki/olevba>`__, to
31 34 decode Hex and Base64 within VBA expressions. Display printable
32 35 deobfuscated strings by default. Improved the VBA\_Parser API.
oletools/doc/Home.html
1 -<p>python-oletools v0.40 documentation</p>
  1 +<p>python-oletools v0.41 documentation</p>
2 2 <p>===================================</p>
3 3 <p>This is the home page of the documentation for python-oletools. The latest version can be found</p>
4 4 <p><a href="https://bitbucket.org/decalage/oletools/wiki">online</a>, otherwise a copy is provided in the doc subfolder of the package.</p>
oletools/doc/Home.md
1 -python-oletools v0.40 documentation  
2 -===================================  
3 -  
4 -This is the home page of the documentation for python-oletools. The latest version can be found  
5 -[online](https://bitbucket.org/decalage/oletools/wiki), otherwise a copy is provided in the doc subfolder of the package.  
6 -  
7 -[python-oletools](http://www.decalage.info/python/oletools) is a package of python tools to analyze  
8 -[Microsoft OLE2 files](http://en.wikipedia.org/wiki/Compound_File_Binary_Format)  
9 -(also called Structured Storage, Compound File Binary Format or Compound Document File Format),  
10 -such as Microsoft Office documents or Outlook messages, mainly for malware analysis, forensics and debugging.  
11 -It is based on the [olefile](http://www.decalage.info/olefile) parser.  
12 -See [http://www.decalage.info/python/oletools](http://www.decalage.info/python/oletools) for more info.  
13 -  
14 -**Quick links:** [Home page](http://www.decalage.info/python/oletools) -  
15 -[Download/Install](https://bitbucket.org/decalage/oletools/wiki/Install) -  
16 -[Documentation](https://bitbucket.org/decalage/oletools/wiki) -  
17 -[Report Issues/Suggestions/Questions](https://bitbucket.org/decalage/oletools/issues?status=new&status=open) -  
18 -[Contact the author](http://decalage.info/contact) -  
19 -[Repository](https://bitbucket.org/decalage/oletools) -  
20 -[Updates on Twitter](https://twitter.com/decalage2)  
21 -  
22 -Note: python-oletools is not related to OLETools published by BeCubed Software.  
23 -  
24 -Tools in python-oletools:  
25 --------------------------  
26 -  
27 -- **[[olebrowse]]**: A simple GUI to browse OLE files (e.g. MS Word, Excel, Powerpoint documents), to  
28 - view and extract individual data streams.  
29 -- **[[oleid]]**: a tool to analyze OLE files to detect specific characteristics usually found in malicious files.  
30 -- **[[olemeta]]**: a tool to extract all standard properties (metadata) from OLE files.  
31 -- **[[oletimes]]**: a tool to extract creation and modification timestamps of all streams and storages.  
32 -- **[[olevba]]**: a tool to extract and analyze VBA Macro source code from MS Office documents (OLE and OpenXML).  
33 -- **[[pyxswf]]**: a tool to detect, extract and analyze Flash objects (SWF) that may  
34 - be embedded in files such as MS Office documents (e.g. Word, Excel) and RTF,  
35 - which is especially useful for malware analysis.  
36 -- **[[rtfobj]]**: a tool and python module to extract embedded objects from RTF files.  
37 -- and a few others (coming soon)  
38 -  
39 ---------------------------------------------------------------------------  
40 -  
41 -python-oletools documentation  
42 ------------------------------  
43 -  
44 -- [[Home]]  
45 -- [[License]]  
46 -- [[Install]]  
47 -- [[Contribute]], Suggest Improvements or Report Issues  
48 -- Tools:  
49 - - [[olebrowse]]  
50 - - [[oleid]]  
51 - - [[olemeta]]  
52 - - [[oletimes]]  
53 - - [[olevba]]  
54 - - [[pyxswf]]  
  1 +python-oletools v0.41 documentation
  2 +===================================
  3 +
  4 +This is the home page of the documentation for python-oletools. The latest version can be found
  5 +[online](https://bitbucket.org/decalage/oletools/wiki), otherwise a copy is provided in the doc subfolder of the package.
  6 +
  7 +[python-oletools](http://www.decalage.info/python/oletools) is a package of python tools to analyze
  8 +[Microsoft OLE2 files](http://en.wikipedia.org/wiki/Compound_File_Binary_Format)
  9 +(also called Structured Storage, Compound File Binary Format or Compound Document File Format),
  10 +such as Microsoft Office documents or Outlook messages, mainly for malware analysis, forensics and debugging.
  11 +It is based on the [olefile](http://www.decalage.info/olefile) parser.
  12 +See [http://www.decalage.info/python/oletools](http://www.decalage.info/python/oletools) for more info.
  13 +
  14 +**Quick links:** [Home page](http://www.decalage.info/python/oletools) -
  15 +[Download/Install](https://bitbucket.org/decalage/oletools/wiki/Install) -
  16 +[Documentation](https://bitbucket.org/decalage/oletools/wiki) -
  17 +[Report Issues/Suggestions/Questions](https://bitbucket.org/decalage/oletools/issues?status=new&status=open) -
  18 +[Contact the author](http://decalage.info/contact) -
  19 +[Repository](https://bitbucket.org/decalage/oletools) -
  20 +[Updates on Twitter](https://twitter.com/decalage2)
  21 +
  22 +Note: python-oletools is not related to OLETools published by BeCubed Software.
  23 +
  24 +Tools in python-oletools:
  25 +-------------------------
  26 +
  27 +- **[[olebrowse]]**: A simple GUI to browse OLE files (e.g. MS Word, Excel, Powerpoint documents), to
  28 + view and extract individual data streams.
  29 +- **[[oleid]]**: a tool to analyze OLE files to detect specific characteristics usually found in malicious files.
  30 +- **[[olemeta]]**: a tool to extract all standard properties (metadata) from OLE files.
  31 +- **[[oletimes]]**: a tool to extract creation and modification timestamps of all streams and storages.
  32 +- **[[olevba]]**: a tool to extract and analyze VBA Macro source code from MS Office documents (OLE and OpenXML).
  33 +- **[[pyxswf]]**: a tool to detect, extract and analyze Flash objects (SWF) that may
  34 + be embedded in files such as MS Office documents (e.g. Word, Excel) and RTF,
  35 + which is especially useful for malware analysis.
  36 +- **[[rtfobj]]**: a tool and python module to extract embedded objects from RTF files.
  37 +- and a few others (coming soon)
  38 +
  39 +--------------------------------------------------------------------------
  40 +
  41 +python-oletools documentation
  42 +-----------------------------
  43 +
  44 +- [[Home]]
  45 +- [[License]]
  46 +- [[Install]]
  47 +- [[Contribute]], Suggest Improvements or Report Issues
  48 +- Tools:
  49 + - [[olebrowse]]
  50 + - [[oleid]]
  51 + - [[olemeta]]
  52 + - [[oletimes]]
  53 + - [[olevba]]
  54 + - [[pyxswf]]
55 55 - [[rtfobj]]
56 56 \ No newline at end of file
oletools/doc/olevba.html
@@ -106,7 +106,11 @@ Options:
106 106
107 107 --attr display the attribute lines at the beginning of VBA
108 108
109 - source code</code></pre>
  109 + source code
  110 +
  111 + --reveal display the macro source code after replacing all the
  112 +
  113 + obfuscated strings by their decoded content.</code></pre>
110 114 <h3 id="examples">Examples</h3>
111 115 <p>Scan a single file:</p>
112 116 <pre><code>olevba.py file.doc</code></pre>
@@ -114,6 +118,8 @@ Options:
114 118 <pre><code>olevba.py malicious_file.xls.zip -z infected</code></pre>
115 119 <p>Scan a single file, showing all obfuscated strings decoded:</p>
116 120 <pre><code>olevba.py file.doc --decode</code></pre>
  121 +<p>Scan a single file, showing the macro source code with VBA strings deobfuscated:</p>
  122 +<pre><code>olevba.py file.doc --reveal</code></pre>
117 123 <p>Scan VBA source code extracted into a text file:</p>
118 124 <pre><code>olevba.py -i source_code.vba</code></pre>
119 125 <p>Scan a collection of files stored in a folder:</p>
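Note on the new option: `--decode` lists obfuscated strings next to their decoded values, while `--reveal` prints the macro source with those strings replaced in place. The sketch below illustrates the in-place idea only. It is a toy, not olevba's implementation: the `reveal` helper, its regular expression and the sample macro line are all assumptions made for illustration, and it handles nothing beyond `Chr(n)` calls joined with `&` or `+`.

```python
import re

# Hypothetical pattern: a run of Chr(n) calls joined by & or +,
# e.g. Chr(99) & Chr(109) & Chr(100)
CHR_CHAIN = re.compile(r'Chr\(\d+\)(?:\s*[&+]\s*Chr\(\d+\))*', re.IGNORECASE)


def reveal(vba_code):
    """Replace each Chr(...) chain with the quoted string it evaluates to."""
    def decode(match):
        # Collect the numeric character codes inside the parentheses
        codes = re.findall(r'\((\d+)\)', match.group(0))
        text = ''.join(chr(int(code)) for code in codes)
        # VBA escapes a double quote inside a string literal by doubling it
        return '"' + text.replace('"', '""') + '"'
    return CHR_CHAIN.sub(decode, vba_code)


sample = 'Shell Chr(99) & Chr(109) & Chr(100)'
print(reveal(sample))
```

The rest of the line is left untouched, so the result still reads as VBA source, which is the point of showing the code rather than a table of strings.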
oletools/doc/olevba.md
1 -olevba  
2 -======  
3 -  
4 -olevba is a script to parse OLE and OpenXML files such as MS Office documents  
5 -(e.g. Word, Excel), to **detect VBA Macros**, extract their **source code** in clear text,  
6 -and detect security-related patterns such as **auto-executable macros**, **suspicious  
7 -VBA keywords** used by malware, anti-sandboxing and anti-virtualization techniques,  
8 -and potential **IOCs** (IP addresses, URLs, executable filenames, etc).  
9 -It also detects and decodes several common **obfuscation methods including Hex encoding,  
10 -StrReverse, Base64, Dridex, VBA expressions**, and extracts IOCs from decoded strings.  
11 -  
12 -It can be used either as a command-line tool, or as a python module from your own applications.  
13 -  
14 -It is part of the [python-oletools](http://www.decalage.info/python/oletools) package.  
15 -  
16 -olevba is based on source code from [officeparser](https://github.com/unixfreak0037/officeparser)  
17 -by John William Davison, with significant modifications.  
18 -  
19 -## Supported formats  
20 -  
21 -- Word 97-2003 (.doc, .dot)  
22 -- Word 2007+ (.docm, .dotm)  
23 -- Word 2003 XML (.xml)  
24 -- Word/Excel MHTML, aka Single File Web Page (.mht)  
25 -- Excel 97-2003 (.xls)  
26 -- Excel 2007+ (.xlsm, .xlsb)  
27 -- PowerPoint 2007+ (.pptm, .ppsm)  
28 -  
29 -## Main Features  
30 -  
31 -- Detect VBA macros in MS Office 97-2003 and 2007+ files, XML, MHT  
32 -- Extract VBA macro source code  
33 -- Detect auto-executable macros  
34 -- Detect suspicious VBA keywords often used by malware  
35 -- Detect anti-sandboxing and anti-virtualization techniques  
36 -- Detect and decodes strings obfuscated with Hex/Base64/StrReverse/Dridex  
37 -- Deobfuscates VBA expressions with any combination of Chr, Asc, Val, StrReverse, Environ, +, &, using a VBA parser built with  
38 -[pyparsing](http://pyparsing.wikispaces.com), including custom Hex and Base64 encodings  
39 -- Extract IOCs/patterns of interest such as IP addresses, URLs, e-mail addresses and executable file names  
40 -- Scan multiple files and sample collections (wildcards, recursive)  
41 -- Triage mode for a summary view of multiple files  
42 -- Scan malware samples in password-protected Zip archives  
43 -- Python API to use olevba from your applications  
44 -  
45 -MS Office files encrypted with a password are also supported, because VBA macro code is never  
46 -encrypted, only the content of the document.  
47 -  
48 -## About VBA Macros  
49 -  
50 -See [this article](http://www.decalage.info/en/vba_tools) for more information and technical details about VBA Macros  
51 -and how they are stored in MS Office documents.  
52 -  
53 -## How it works  
54 -  
55 -1. olevba checks the file type: If it is an OLE file (i.e MS Office 97-2003), it is parsed right away.  
56 -1. If it is a zip file (i.e. MS Office 2007+), XML or MHTML, olevba looks for all OLE files stored in it (e.g. vbaProject.bin, editdata.mso), and opens them.  
57 -1. olevba identifies all the VBA projects stored in the OLE structure.  
58 -1. Each VBA project is parsed to find the corresponding OLE streams containing macro code.  
59 -1. In each of these OLE streams, the VBA macro source code is extracted and decompressed (RLE compression).  
60 -1. olevba looks for specific strings obfuscated with various algorithms (Hex, Base64, StrReverse, Dridex, VBA expressions).  
61 -1. olevba scans the macro source code and the deobfuscated strings to find suspicious keywords, auto-executable macros  
62 -and potential IOCs (URLs, IP addresses, e-mail addresses, executable filenames, etc).  
63 -  
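Note on steps 1 and 2 of the list above: the container check can be sketched with file signatures. This is an assumption-laden illustration, not olevba's code: `sniff_container` and the constant names are hypothetical, and only the two signature values (the Compound File Binary header and the zip local file header) are standard facts.

```python
# Standard file signatures; the constant names are made up for this sketch
OLE_MAGIC = b'\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1'  # Compound File Binary (OLE2)
ZIP_MAGIC = b'PK\x03\x04'                        # zip local file header


def sniff_container(data):
    """Return a rough container label based on the first bytes of a file."""
    if data.startswith(OLE_MAGIC):
        return 'OLE'          # e.g. MS Office 97-2003: parse right away
    if data.startswith(ZIP_MAGIC):
        return 'Zip'          # e.g. MS Office 2007+: look for OLE parts inside
    return 'other'            # XML, MHTML or unsupported: needs further checks


print(sniff_container(OLE_MAGIC + b'\x00' * 8))
print(sniff_container(b'PK\x03\x04rest-of-archive'))
print(sniff_container(b'MIME-Version: 1.0'))
```

A real parser has to go further for the `other` case, since XML and MHTML are text formats without a fixed binary signature.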
64 -  
65 -## Usage  
66 -  
67 - :::text  
68 - Usage: olevba.py [options] <filename> [filename2 ...]  
69 -  
70 - Options:  
71 -   -h, --help            show this help message and exit  
72 -   -r                    find files recursively in subdirectories.  
73 -   -z ZIP_PASSWORD, --zip=ZIP_PASSWORD  
74 -                         if the file is a zip archive, open all files from it,  
75 -                         using the provided password (requires Python 2.6+)  
76 -   -f ZIP_FNAME, --zipfname=ZIP_FNAME  
77 -                         if the file is a zip archive, file(s) to be opened  
78 -                         within the zip. Wildcards * and ? are supported.  
79 -                         (default:*)  
80 -   -t, --triage          triage mode, display results as a summary table  
81 -                         (default for multiple files)  
82 -   -d, --detailed        detailed mode, display full results (default for  
83 -                         single file)  
84 -   -a, --analysis        display only analysis results, not the macro source  
85 -                         code  
86 -   -c, --code            display only VBA source code, do not analyze it  
87 -   -i INPUT, --input=INPUT  
88 -                         input file containing VBA source code to be analyzed  
89 -                         (no parsing)  
90 -   --decode              display all the obfuscated strings with their decoded  
91 -                         content (Hex, Base64, StrReverse, Dridex, VBA).  
92 -   --attr                display the attribute lines at the beginning of VBA  
93 -                         source code  
94 -  
95 -### Examples  
96 -  
97 -Scan a single file:  
98 -  
99 - :::text  
100 - olevba.py file.doc  
101 -  
102 -Scan a single file, stored in a Zip archive with password "infected":  
103 -  
104 - :::text  
105 - olevba.py malicious_file.xls.zip -z infected  
106 -  
107 -Scan a single file, showing all obfuscated strings decoded:  
108 -  
109 - :::text  
110 - olevba.py file.doc --decode  
111 -  
112 -Scan VBA source code extracted into a text file:  
113 -  
114 - :::text  
115 - olevba.py -i source_code.vba  
116 -  
117 -Scan a collection of files stored in a folder:  
118 -  
119 - :::text  
120 - olevba.py MalwareZoo/VBA/*  
121 -  
122 -Scan all .doc and .xls files, recursively in all subfolders:  
123 -  
124 - :::text  
125 - olevba.py MalwareZoo/VBA/*.doc MalwareZoo/VBA/*.xls -r  
126 -  
127 -Scan all .doc files within all .zip files with password, recursively:  
128 -  
129 - :::text  
130 - olevba.py MalwareZoo/VBA/*.zip -r -z infected -f *.doc  
131 -  
132 -  
133 -### Detailed analysis mode (default for single file)  
134 -  
135 -When a single file is scanned, or when using the option -d, all details of the analysis are displayed.  
136 -  
137 -For example, checking the malware sample [DIAN_caso-5415.doc](https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/):  
138 -  
139 - :::text  
140 - >olevba.py c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip -z infected  
141 - ===============================================================================  
142 - FILE: DIAN_caso-5415.doc.malware in c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip  
143 - Type: OLE  
144 - -------------------------------------------------------------------------------  
145 - VBA MACRO ThisDocument.cls  
146 - in file: DIAN_caso-5415.doc.malware - OLE stream: Macros/VBA/ThisDocument  
147 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  
148 - Option Explicit  
149 - Private Declare Function URLDownloadToFileA Lib "urlmon" (ByVal FVQGKS As Long,_  
150 - ByVal WSGSGY As String, ByVal IFRRFV As String, ByVal NCVOLV As Long, _  
151 - ByVal HQTLDG As Long) As Long  
152 - Sub AutoOpen()  
153 - Auto_Open  
154 - End Sub  
155 - Sub Auto_Open()  
156 - SNVJYQ  
157 - End Sub  
158 - Public Sub SNVJYQ()  
159 - [Malicious Code...]  
160 - End Sub  
161 - Function OGEXYR(XSTAHU As String, PHHWIV As String) As Boolean  
162 - [Malicious Code...]  
163 - Application.DisplayAlerts = False  
164 - Application.Quit  
165 - End Function  
166 - Sub Workbook_Open()  
167 - Auto_Open  
168 - End Sub  
169 -  
170 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  
171 - ANALYSIS:  
172 - +------------+----------------------+-----------------------------------------+  
173 - | Type       | Keyword              | Description                             |  
174 - +------------+----------------------+-----------------------------------------+  
175 - | AutoExec   | AutoOpen             | Runs when the Word document is opened   |  
176 - | AutoExec   | Auto_Open            | Runs when the Excel Workbook is opened  |  
177 - | AutoExec   | Workbook_Open        | Runs when the Excel Workbook is opened  |  
178 - | Suspicious | Lib                  | May run code from a DLL                 |  
179 - | Suspicious | Shell                | May run an executable file or a system  |  
180 - |            |                      | command                                 |  
181 - | Suspicious | Environ              | May read system environment variables   |  
182 - | Suspicious | URLDownloadToFileA   | May download files from the Internet    |  
183 - | IOC        | http://germanya.com. | URL                                     |  
184 - |            | ec/logs/test.exe"    |                                         |  
185 - | IOC        | http://germanya.com. | URL                                     |  
186 - |            | ec/logs/counter.php" |                                         |  
187 - | IOC        | germanya.com         | Executable file name                    |  
188 - | IOC        | test.exe             | Executable file name                    |  
189 - | IOC        | sfjozjero.exe        | Executable file name                    |  
190 - +------------+----------------------+-----------------------------------------+  
191 -  
192 -### Triage mode (default for multiple files)  
193 -  
194 -When several files are scanned, or when using the option -t, a summary of the analysis for each file is displayed.  
195 -This is more convenient for quick triage of a collection of suspicious files.  
196 -  
197 -The following flags show the results of the analysis:  
198 -  
199 -- **OLE**: the file type is OLE, for example MS Office 97-2003  
200 -- **OpX**: the file type is OpenXML, for example MS Office 2007+  
201 -- **XML**: the file type is Word 2003 XML  
202 -- **MHT**: the file type is Word MHTML, aka Single File Web Page (.mht)  
203 -- **?**: the file type is not supported  
204 -- **M**: contains VBA Macros  
205 -- **A**: auto-executable macros  
206 -- **S**: suspicious VBA keywords  
207 -- **I**: potential IOCs  
208 -- **H**: hex-encoded strings (potential obfuscation)  
209 -- **B**: Base64-encoded strings (potential obfuscation)  
210 -- **D**: Dridex-encoded strings (potential obfuscation)  
211 -- **V**: VBA string expressions (potential obfuscation)  
212 -  
213 -Here is an example:  
214 -  
215 - :::text  
216 - c:\>olevba.py \MalwareZoo\VBA\samples\*  
217 - Flags       Filename  
218 - ----------- -----------------------------------------------------------------  
219 - OLE:MASI--- \MalwareZoo\VBA\samples\DIAN_caso-5415.doc.malware  
220 - OLE:MASIH-- \MalwareZoo\VBA\samples\DRIDEX_1.doc.malware  
221 - OLE:MASIH-- \MalwareZoo\VBA\samples\DRIDEX_2.doc.malware  
222 - OLE:MASI--- \MalwareZoo\VBA\samples\DRIDEX_3.doc.malware  
223 - OLE:MASIH-- \MalwareZoo\VBA\samples\DRIDEX_4.doc.malware  
224 - OLE:MASIH-- \MalwareZoo\VBA\samples\DRIDEX_5.doc.malware  
225 - OLE:MASIH-- \MalwareZoo\VBA\samples\DRIDEX_6.doc.malware  
226 - OLE:MAS---- \MalwareZoo\VBA\samples\DRIDEX_7.doc.malware  
227 - OLE:MASIH-- \MalwareZoo\VBA\samples\DRIDEX_8.doc.malware  
228 - OLE:MASIHBD \MalwareZoo\VBA\samples\DRIDEX_9.xls.malware  
229 - OLE:MASIH-- \MalwareZoo\VBA\samples\DRIDEX_A.doc.malware  
230 - OLE:------- \MalwareZoo\VBA\samples\Normal_Document.doc  
231 - OLE:M------ \MalwareZoo\VBA\samples\Normal_Document_Macro.doc  
232 - OpX:MASI--- \MalwareZoo\VBA\samples\RottenKitten.xlsb.malware  
233 - OLE:MASI-B- \MalwareZoo\VBA\samples\ROVNIX.doc.malware  
234 - OLE:MA----- \MalwareZoo\VBA\samples\Word within Word macro auto.doc  
235 -  
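Note on the triage output above: each row can be decoded mechanically from the flag list. The sketch below is a hypothetical helper, not part of olevba: `parse_triage_line` and `FLAG_MEANINGS` are names invented here, and the meanings are copied from the flag list in this document. It assumes the flags field contains no spaces and is followed by the file name.

```python
# Flag letters and meanings, copied from the list in the documentation
FLAG_MEANINGS = {
    'M': 'contains VBA Macros',
    'A': 'auto-executable macros',
    'S': 'suspicious VBA keywords',
    'I': 'potential IOCs',
    'H': 'hex-encoded strings',
    'B': 'Base64-encoded strings',
    'D': 'Dridex-encoded strings',
    'V': 'VBA string expressions',
}


def parse_triage_line(line):
    """Split one triage row into (file type, list of findings, file name)."""
    # Split once on whitespace so file names containing spaces stay intact
    flags_field, filename = line.split(None, 1)
    file_type, flags = flags_field.split(':', 1)
    # A dash means the corresponding check found nothing
    found = [FLAG_MEANINGS.get(c, 'unknown flag ' + c) for c in flags if c != '-']
    return file_type, found, filename.strip()


row = r'OLE:MASI-B- \MalwareZoo\VBA\samples\ROVNIX.doc.malware'
file_type, found, filename = parse_triage_line(row)
print(file_type)
print(found)
print(filename)
```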
236 -  
237 ---------------------------------------------------------------------------  
238 -  
239 -## How to use olevba in Python applications  
240 -  
241 -olevba may be used to open a MS Office file, detect if it contains VBA macros, extract and analyze the VBA source code  
242 -from your own python applications.  
243 -  
244 -IMPORTANT: olevba is currently under active development, therefore this API is likely to change.  
245 -  
246 -### Import olevba  
247 -  
248 -First, import the **oletools.olevba** package, using at least the VBA_Parser and VBA_Scanner classes:  
249 -  
250 - :::python  
251 - from oletools.olevba import VBA_Parser, TYPE_OLE, TYPE_OpenXML, TYPE_Word2003_XML, TYPE_MHTML  
252 -  
253 -### Parse a MS Office file  
254 -  
255 -To parse a file on disk, create an instance of the **VBA_Parser** class, providing the name of the file to open as parameter.  
256 -For example:  
257 -  
258 - :::python  
259 - vbaparser = VBA_Parser('my_file_with_macros.doc')  
260 -  
261 -The file may also be provided as a bytes string containing its data. In that case, the actual  
262 -filename must be provided for reference, and the file content with the data parameter. For example:  
263 -  
264 - :::python  
265 - myfile = 'my_file_with_macros.doc'  
266 - filedata = open(myfile, 'rb').read()  
267 - vbaparser = VBA_Parser(myfile, data=filedata)  
268 -  
269 -VBA_Parser will raise an exception if the file is not a supported format, such as OLE (MS Office 97-2003), OpenXML  
270 -(MS Office 2007+), MHTML or Word 2003 XML.  
271 -  
272 -After parsing the file, the attribute **VBA_Parser.type** is a string indicating the file type.  
273 -It can be either TYPE_OLE, TYPE_OpenXML, TYPE_Word2003_XML or TYPE_MHTML. (constants defined in the olevba module)  
274 -  
275 -### Detect VBA macros  
276 -  
277 -The method **detect_vba_macros** of a VBA_Parser object returns True if VBA macros have been found in the file,  
278 -False otherwise.  
279 -  
280 - :::python  
281 - if vbaparser.detect_vba_macros():  
282 -     print 'VBA Macros found'  
283 - else:  
284 -     print 'No VBA Macros found'  
285 -  
286 -Note: The detection algorithm looks for streams and storage with specific names in the OLE structure, which works fine  
287 -for all the supported formats listed above. However, for some formats such as PowerPoint 97-2003, this method will  
288 -always return False because VBA Macros are stored in a different way which is not yet supported by olevba.  
289 -  
290 -Moreover, if the file contains an embedded document (e.g. an Excel workbook inserted into a Word document), this method  
291 -may return True if the embedded document contains VBA Macros, even if the main document does not.  
292 -  
293 -### Extract VBA Macro Source Code  
294 -  
295 -The method **extract_macros** extracts and decompresses source code for each VBA macro found in the file (possibly  
296 -including embedded files). It is a generator yielding a tuple (filename, stream_path, vba_filename, vba_code)  
297 -for each VBA macro found.  
298 -  
299 -- filename: If the file is OLE (MS Office 97-2003), filename is the path of the file.  
300 - If the file is OpenXML (MS Office 2007+), filename is the path of the OLE subfile containing VBA macros within the zip archive,  
301 - e.g. word/vbaProject.bin.  
302 -- stream_path: path of the OLE stream containing the VBA macro source code  
303 -- vba_filename: corresponding VBA filename  
304 -- vba_code: string containing the VBA source code in clear text  
305 -  
306 -Example:  
307 -  
308 - :::python  
309 - for (filename, stream_path, vba_filename, vba_code) in vbaparser.extract_macros():  
310 -     print '-'*79  
311 -     print 'Filename :', filename  
312 -     print 'OLE stream :', stream_path  
313 -     print 'VBA filename:', vba_filename  
314 -     print '- '*39  
315 -     print vba_code  
316 -  
317 -Alternatively, the VBA_Parser method **extract_all_macros** returns the same results as a list of tuples.  
318 -  
319 -### Analyze VBA Source Code  
320 -  
321 -Since version 0.40, the VBA_Parser class provides simpler methods than VBA_Scanner to analyze all macros contained  
322 -in a file:  
323 -  
324 -The methods **scan** or **scan_summary** from the class **VBA_Parser** can be used to scan the source code of all  
325 -VBA modules to find obfuscated strings, suspicious keywords, IOCs, auto-executable macros, etc.  
326 -  
327 -scan() takes an optional argument include_decoded_strings: if set to True, the results will contain all the encoded  
328 -strings found in the code (Hex, Base64, Dridex) with their decoded value.  
329 -By default, it will include the strings which contain printable characters only.  
330 -  
331 -**VBA_Parser.scan()** returns a list of tuples (type, keyword, description), one for each item in the results.  
332 -  
333 -- type may be either 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String', 'Dridex String' or  
334 - 'VBA obfuscated Strings'.  
335 -- keyword is the string found for auto-executable macros, suspicious keywords or IOCs. For obfuscated strings, it is  
336 - the decoded value of the string.  
337 -- description provides a description of the keyword. For obfuscated strings, it is the encoded value of the string.  
338 -  
339 -Example:  
340 -  
341 - :::python  
342 - results = vbaparser.scan()  
343 - for kw_type, keyword, description in results:  
344 -     print 'type=%s - keyword=%s - description=%s' % (kw_type, keyword, description)  
345 -  
346 -**VBA_Parser.scan_summary()** returns a tuple with the number of items found for each category:  
347 -(autoexec, suspicious, IOCs, hex, base64, dridex, vbastrings).  
348 -  
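Note on the result shape described above: because `scan()` is documented as returning (type, keyword, description) tuples, per-category counts can be derived from that list. The sketch below does not call olevba. The `results` list is hand-written in the documented shape, using rows from the analysis table shown earlier in this file, and `count_by_type` is a hypothetical helper name.

```python
from collections import Counter

# Hand-written sample in the documented (type, keyword, description) shape;
# it was not produced by running olevba
results = [
    ('AutoExec', 'AutoOpen', 'Runs when the Word document is opened'),
    ('Suspicious', 'Shell', 'May run an executable file or a system command'),
    ('Suspicious', 'Environ', 'May read system environment variables'),
    ('IOC', 'test.exe', 'Executable file name'),
]


def count_by_type(results):
    """Count how many result tuples fall into each type."""
    return Counter(kw_type for kw_type, _keyword, _description in results)


counts = count_by_type(results)
for kw_type in ('AutoExec', 'Suspicious', 'IOC', 'Hex String'):
    # A Counter returns 0 for categories that never occurred
    print('%s: %d' % (kw_type, counts[kw_type]))
```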
349 -  
350 -  
351 -### Close the VBA_Parser  
352 -  
353 -After usage, it is better to call the **close** method of the VBA_Parser object, to make sure the file is closed,  
354 -especially if your application is parsing many files.  
355 -  
356 - :::python  
357 - vbaparser.close()  
358 -  
359 -  
360 ---------------------------------------------------------------------------  
361 -  
362 -## Deprecated API  
363 -  
364 -The following methods and functions are still functional, but their usage is not recommended  
365 -since they have been replaced by better solutions.  
366 -  
367 -### VBA_Scanner (deprecated)  
368 -  
369 -Note: this API is under active development and may change in the future.  
370 -  
371 -The class **VBA_Scanner** can be used to scan the source code of a VBA module to find obfuscated strings,  
372 -suspicious keywords, IOCs, auto-executable macros, etc.  
373 -  
374 -First, create a VBA_Scanner object with a string containing the VBA source code (for example returned by the  
375 -extract_macros method). Then call the methods **scan** or **scan_summary** to get the results of the analysis.  
376 -  
377 -scan() takes an optional argument include_decoded_strings: if set to True, the results will contain all the encoded  
378 -strings found in the code (Hex, Base64, Dridex) with their decoded value.  
379 -  
380 -**scan** returns a list of tuples (type, keyword, description), one for each item in the results.  
381 -  
382 -- type may be either 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String' or 'Dridex String'.  
383 -- keyword is the string found for auto-executable macros, suspicious keywords or IOCs. For obfuscated strings, it is  
384 - the decoded value of the string.  
385 -- description provides a description of the keyword. For obfuscated strings, it is the encoded value of the string.  
386 -  
387 -Example:  
388 -  
389 - :::python  
390 - vba_scanner = VBA_Scanner(vba_code)  
391 - results = vba_scanner.scan(include_decoded_strings=True)  
392 - for kw_type, keyword, description in results:  
393 -     print 'type=%s - keyword=%s - description=%s' % (kw_type, keyword, description)  
394 -  
395 -The function **scan_vba** is a shortcut for VBA_Scanner(vba_code).scan():  
396 -  
397 - :::python  
398 - results = scan_vba(vba_code, include_decoded_strings=True)  
399 - for kw_type, keyword, description in results:  
400 -     print 'type=%s - keyword=%s - description=%s' % (kw_type, keyword, description)  
401 -  
402 -**scan_summary** returns a tuple with the number of items found for each category:  
403 -(autoexec, suspicious, IOCs, hex, base64, dridex).  
404 -  
405 -  
406 -### Detect auto-executable macros (deprecated)  
407 -  
408 -**Deprecated**: It is preferable to use either scan_vba or VBA_Scanner to get all results at once.  
409 -  
410 -The function **detect_autoexec** checks if VBA macro code contains specific macro names  
411 -that will be triggered when the document/workbook is opened, closed, changed, etc.  
412 -  
413 -It returns a list of tuples containing two strings, the detected keyword, and the  
414 -description of the trigger. (See the malware example above)  
415 -  
416 -Sample usage:  
417 -  
418 - :::python  
419 - from oletools.olevba import detect_autoexec  
420 - autoexec_keywords = detect_autoexec(vba_code)  
421 - if autoexec_keywords:  
422 - print 'Auto-executable macro keywords found:'  
423 - for keyword, description in autoexec_keywords:  
424 - print '%s: %s' % (keyword, description)  
425 - else:  
426 - print 'Auto-executable macro keywords: None found'  
427 -  
428 -  
429 -### Detect suspicious VBA keywords (deprecated)  
430 -  
431 -**Deprecated**: It is preferable to use either scan_vba or VBA_Scanner to get all results at once.  
432 -  
433 -The function **detect_suspicious** checks if VBA macro code contains specific  
434 -keywords often used by malware to act on the system (create files, run  
435 -commands or applications, write to the registry, etc).  
436 -  
437 -It returns a list of tuples containing two strings, the detected keyword, and the  
438 -description of the corresponding malicious behaviour. (See the malware example above)  
439 -  
440 -Sample usage:  
441 -  
442 - :::python  
443 - from oletools.olevba import detect_suspicious  
444 - suspicious_keywords = detect_suspicious(vba_code)  
445 - if suspicious_keywords:  
446 - print 'Suspicious VBA keywords found:'  
447 - for keyword, description in suspicious_keywords:  
448 - print '%s: %s' % (keyword, description)  
449 - else:  
450 - print 'Suspicious VBA keywords: None found'  
451 -  
452 -  
453 -### Extract potential IOCs (deprecated)  
454 -  
455 -**Deprecated**: It is preferable to use either scan_vba or VBA_Scanner to get all results at once.  
456 -  
457 -The function **detect_patterns** checks if VBA macro code contains specific  
458 -patterns of interest, that may be useful for malware analysis and detection  
459 -(potential Indicators of Compromise): IP addresses, e-mail addresses,  
460 -URLs, executable file names.  
461 -  
462 -It returns a list of tuples containing two strings, the pattern type, and the  
463 -extracted value. (See the malware example above)  
464 -  
465 -Sample usage:  
466 -  
467 - :::python  
468 - from oletools.olevba import detect_patterns  
469 - patterns = detect_patterns(vba_code)  
470 - if patterns:  
471 - print 'Patterns found:'  
472 - for pattern_type, value in patterns:  
473 - print '%s: %s' % (pattern_type, value)  
474 - else:  
475 - print 'Patterns: None found'  
476 -  
477 -  
478 ---------------------------------------------------------------------------  
479 -  
480 -python-oletools documentation  
481 ------------------------------  
482 -  
483 -- [[Home]]  
484 -- [[License]]  
485 -- [[Install]]  
486 -- [[Contribute]], Suggest Improvements or Report Issues  
487 -- Tools:  
488 - - [[olebrowse]]  
489 - - [[oleid]]  
490 - - [[olemeta]]  
491 - - [[oletimes]]  
492 - - [[olevba]]  
493 - - [[pyxswf]]
  1 +olevba
  2 +======
  3 +
  4 +olevba is a script to parse OLE and OpenXML files such as MS Office documents
  5 +(e.g. Word, Excel), to **detect VBA Macros**, extract their **source code** in clear text,
  6 +and detect security-related patterns such as **auto-executable macros**, **suspicious
  7 +VBA keywords** used by malware, anti-sandboxing and anti-virtualization techniques,
  8 +and potential **IOCs** (IP addresses, URLs, executable filenames, etc).
  9 +It also detects and decodes several common **obfuscation methods including Hex encoding,
  10 +StrReverse, Base64, Dridex, VBA expressions**, and extracts IOCs from decoded strings.
  11 +
  12 +It can be used either as a command-line tool, or as a python module from your own applications.
  13 +
  14 +It is part of the [python-oletools](http://www.decalage.info/python/oletools) package.
  15 +
  16 +olevba is based on source code from [officeparser](https://github.com/unixfreak0037/officeparser)
  17 +by John William Davison, with significant modifications.
  18 +
  19 +## Supported formats
  20 +
  21 +- Word 97-2003 (.doc, .dot)
  22 +- Word 2007+ (.docm, .dotm)
  23 +- Word 2003 XML (.xml)
  24 +- Word/Excel MHTML, aka Single File Web Page (.mht)
  25 +- Excel 97-2003 (.xls)
  26 +- Excel 2007+ (.xlsm, .xlsb)
  27 +- PowerPoint 2007+ (.pptm, .ppsm)
  28 +
  29 +## Main Features
  30 +
  31 +- Detect VBA macros in MS Office 97-2003 and 2007+ files, XML, MHT
  32 +- Extract VBA macro source code
  33 +- Detect auto-executable macros
  34 +- Detect suspicious VBA keywords often used by malware
  35 +- Detect anti-sandboxing and anti-virtualization techniques
  36 +- Detect and decode strings obfuscated with Hex/Base64/StrReverse/Dridex
  37 +- Deobfuscate VBA expressions with any combination of Chr, Asc, Val, StrReverse, Environ, +, &, using a VBA parser built with
  38 +[pyparsing](http://pyparsing.wikispaces.com), including custom Hex and Base64 encodings
  39 +- Extract IOCs/patterns of interest such as IP addresses, URLs, e-mail addresses and executable file names
  40 +- Scan multiple files and sample collections (wildcards, recursive)
  41 +- Triage mode for a summary view of multiple files
  42 +- Scan malware samples in password-protected Zip archives
  43 +- Python API to use olevba from your applications
  44 +
  45 +MS Office files encrypted with a password are also supported, because VBA macro code is never
  46 +encrypted, only the content of the document.
  47 +
  48 +## About VBA Macros
  49 +
  50 +See [this article](http://www.decalage.info/en/vba_tools) for more information and technical details about VBA Macros
  51 +and how they are stored in MS Office documents.
  52 +
  53 +## How it works
  54 +
  55 +1. olevba checks the file type: if it is an OLE file (i.e. MS Office 97-2003), it is parsed right away.
  56 +1. If it is a zip file (i.e. MS Office 2007+), XML or MHTML, olevba looks for all OLE files stored in it (e.g. vbaProject.bin, editdata.mso), and opens them.
  57 +1. olevba identifies all the VBA projects stored in the OLE structure.
  58 +1. Each VBA project is parsed to find the corresponding OLE streams containing macro code.
  59 +1. In each of these OLE streams, the VBA macro source code is extracted and decompressed (RLE compression).
  60 +1. olevba looks for specific strings obfuscated with various algorithms (Hex, Base64, StrReverse, Dridex, VBA expressions).
  61 +1. olevba scans the macro source code and the deobfuscated strings to find suspicious keywords, auto-executable macros
  62 +and potential IOCs (URLs, IP addresses, e-mail addresses, executable filenames, etc).
  63 +
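The first two steps depend on telling an OLE container apart from a zip container. A minimal sketch of that idea, assuming a check on the leading bytes of the file, is shown here. `guess_container` is a hypothetical helper written for illustration, not olevba's actual detection code; only the two signatures are well-known constants.

```python
# Illustrative sketch only: this is NOT olevba's own detection logic.
OLE_MAGIC = b'\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1'  # OLE2 compound file signature
ZIP_MAGIC = b'PK\x03\x04'                        # zip local file header (OpenXML container)


def guess_container(data):
    """Return 'OLE', 'Zip' or 'Other' from the first bytes of a file.

    Hypothetical helper: XML and MHTML files have no fixed signature,
    so they fall into 'Other' and would need a content-based check.
    """
    if data.startswith(OLE_MAGIC):
        return 'OLE'
    if data.startswith(ZIP_MAGIC):
        return 'Zip'
    return 'Other'
```

A caller would typically read only the first few bytes of the file and pass them to this function before deciding how to open it.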
  64 +
  65 +## Usage
  66 +
  67 + :::text
  68 + Usage: olevba.py [options] <filename> [filename2 ...]
  69 +
  70 + Options:
  71 + -h, --help show this help message and exit
  72 + -r find files recursively in subdirectories.
  73 + -z ZIP_PASSWORD, --zip=ZIP_PASSWORD
  74 + if the file is a zip archive, open all files from it,
  75 + using the provided password (requires Python 2.6+)
  76 + -f ZIP_FNAME, --zipfname=ZIP_FNAME
  77 + if the file is a zip archive, file(s) to be opened
  78 + within the zip. Wildcards * and ? are supported.
  79 + (default:*)
  80 + -t, --triage triage mode, display results as a summary table
  81 + (default for multiple files)
  82 + -d, --detailed detailed mode, display full results (default for
  83 + single file)
  84 + -a, --analysis display only analysis results, not the macro source
  85 + code
  86 + -c, --code display only VBA source code, do not analyze it
  87 + -i INPUT, --input=INPUT
  88 + input file containing VBA source code to be analyzed
  89 + (no parsing)
  90 + --decode display all the obfuscated strings with their decoded
  91 + content (Hex, Base64, StrReverse, Dridex, VBA).
  92 + --attr display the attribute lines at the beginning of VBA
  93 + source code
  94 + --reveal display the macro source code after replacing all the
  95 + obfuscated strings by their decoded content.
  96 +
  97 +### Examples
  98 +
  99 +Scan a single file:
  100 +
  101 + :::text
  102 + olevba.py file.doc
  103 +
  104 +Scan a single file, stored in a Zip archive with password "infected":
  105 +
  106 + :::text
  107 + olevba.py malicious_file.xls.zip -z infected
  108 +
  109 +Scan a single file, showing all obfuscated strings decoded:
  110 +
  111 + :::text
  112 + olevba.py file.doc --decode
  113 +
  114 +Scan a single file, showing the macro source code with VBA strings deobfuscated:
  115 +
  116 + :::text
  117 + olevba.py file.doc --reveal
  118 +
  119 +Scan VBA source code extracted into a text file:
  120 +
  121 + :::text
  122 + olevba.py -i source_code.vba
  123 +
  124 +Scan a collection of files stored in a folder:
  125 +
  126 + :::text
  127 + olevba.py MalwareZoo/VBA/*
  128 +
  129 +Scan all .doc and .xls files, recursively in all subfolders:
  130 +
  131 + :::text
  132 + olevba.py MalwareZoo/VBA/*.doc MalwareZoo/VBA/*.xls -r
  133 +
  134 +Scan all .doc files within all .zip files with password, recursively:
  135 +
  136 + :::text
  137 + olevba.py MalwareZoo/VBA/*.zip -r -z infected -f *.doc
  138 +
  139 +
  140 +### Detailed analysis mode (default for single file)
  141 +
  142 +When a single file is scanned, or when using the option -d, all details of the analysis are displayed.
  143 +
  144 +For example, checking the malware sample [DIAN_caso-5415.doc](https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/):
  145 +
  146 + :::text
  147 + >olevba.py c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip -z infected
  148 + ===============================================================================
  149 + FILE: DIAN_caso-5415.doc.malware in c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip
  150 + Type: OLE
  151 + -------------------------------------------------------------------------------
  152 + VBA MACRO ThisDocument.cls
  153 + in file: DIAN_caso-5415.doc.malware - OLE stream: Macros/VBA/ThisDocument
  154 + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  155 + Option Explicit
  156 + Private Declare Function URLDownloadToFileA Lib "urlmon" (ByVal FVQGKS As Long,_
  157 + ByVal WSGSGY As String, ByVal IFRRFV As String, ByVal NCVOLV As Long, _
  158 + ByVal HQTLDG As Long) As Long
  159 + Sub AutoOpen()
  160 + Auto_Open
  161 + End Sub
  162 + Sub Auto_Open()
  163 + SNVJYQ
  164 + End Sub
  165 + Public Sub SNVJYQ()
  166 + [Malicious Code...]
  167 + End Sub
  168 + Function OGEXYR(XSTAHU As String, PHHWIV As String) As Boolean
  169 + [Malicious Code...]
  170 + Application.DisplayAlerts = False
  171 + Application.Quit
  172 + End Function
  173 + Sub Workbook_Open()
  174 + Auto_Open
  175 + End Sub
  176 +
  177 + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
  178 + ANALYSIS:
  179 + +------------+----------------------+-----------------------------------------+
  180 + | Type | Keyword | Description |
  181 + +------------+----------------------+-----------------------------------------+
  182 + | AutoExec | AutoOpen | Runs when the Word document is opened |
  183 + | AutoExec | Auto_Open | Runs when the Excel Workbook is opened |
  184 + | AutoExec | Workbook_Open | Runs when the Excel Workbook is opened |
  185 + | Suspicious | Lib | May run code from a DLL |
  186 + | Suspicious | Shell | May run an executable file or a system |
  187 + | | | command |
  188 + | Suspicious | Environ | May read system environment variables |
  189 + | Suspicious | URLDownloadToFileA | May download files from the Internet |
  190 + | IOC | http://germanya.com. | URL |
  191 + | | ec/logs/test.exe" | |
  192 + | IOC | http://germanya.com. | URL |
  193 + | | ec/logs/counter.php" | |
  194 + | IOC | germanya.com | Executable file name |
  195 + | IOC | test.exe | Executable file name |
  196 + | IOC | sfjozjero.exe | Executable file name |
  197 + +------------+----------------------+-----------------------------------------+
  198 +
  199 +### Triage mode (default for multiple files)
  200 +
  201 +When several files are scanned, or when using the option -t, a summary of the analysis for each file is displayed.
  202 +This is more convenient for quick triage of a collection of suspicious files.
  203 +
  204 +The following flags show the results of the analysis:
  205 +
  206 +- **OLE**: the file type is OLE, for example MS Office 97-2003
  207 +- **OpX**: the file type is OpenXML, for example MS Office 2007+
  208 +- **XML**: the file type is Word 2003 XML
  209 +- **MHT**: the file type is Word MHTML, aka Single File Web Page (.mht)
  210 +- **?**: the file type is not supported
  211 +- **M**: contains VBA Macros
  212 +- **A**: auto-executable macros
  213 +- **S**: suspicious VBA keywords
  214 +- **I**: potential IOCs
  215 +- **H**: hex-encoded strings (potential obfuscation)
  216 +- **B**: Base64-encoded strings (potential obfuscation)
  217 +- **D**: Dridex-encoded strings (potential obfuscation)
  218 +- **V**: VBA string expressions (potential obfuscation)
  219 +
  220 +Here is an example:
  221 +
  222 + :::text
  223 + c:\>olevba.py \MalwareZoo\VBA\samples\*
  224 + Flags Filename
  225 + ----------- -----------------------------------------------------------------
  226 + OLE:MASI--- \MalwareZoo\VBA\samples\DIAN_caso-5415.doc.malware
  227 + OLE:MASIH-- \MalwareZoo\VBA\samples\DRIDEX_1.doc.malware
  228 + OLE:MASIH-- \MalwareZoo\VBA\samples\DRIDEX_2.doc.malware
  229 + OLE:MASI--- \MalwareZoo\VBA\samples\DRIDEX_3.doc.malware
  230 + OLE:MASIH-- \MalwareZoo\VBA\samples\DRIDEX_4.doc.malware
  231 + OLE:MASIH-- \MalwareZoo\VBA\samples\DRIDEX_5.doc.malware
  232 + OLE:MASIH-- \MalwareZoo\VBA\samples\DRIDEX_6.doc.malware
  233 + OLE:MAS---- \MalwareZoo\VBA\samples\DRIDEX_7.doc.malware
  234 + OLE:MASIH-- \MalwareZoo\VBA\samples\DRIDEX_8.doc.malware
  235 + OLE:MASIHBD \MalwareZoo\VBA\samples\DRIDEX_9.xls.malware
  236 + OLE:MASIH-- \MalwareZoo\VBA\samples\DRIDEX_A.doc.malware
  237 + OLE:------- \MalwareZoo\VBA\samples\Normal_Document.doc
  238 + OLE:M------ \MalwareZoo\VBA\samples\Normal_Document_Macro.doc
  239 + OpX:MASI--- \MalwareZoo\VBA\samples\RottenKitten.xlsb.malware
  240 + OLE:MASI-B- \MalwareZoo\VBA\samples\ROVNIX.doc.malware
  241 + OLE:MA----- \MalwareZoo\VBA\samples\Word within Word macro auto.doc
  242 +
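When post-processing triage output in a script, the flags column can be turned back into readable labels. The sketch below assumes the `TYPE:FLAGS` layout shown in the example above, with `-` marking an unset flag. `decode_triage_flags` and `FLAG_MEANINGS` are hypothetical names that are not part of olevba.

```python
# Illustrative sketch: maps the triage flag letters listed above to labels.
FLAG_MEANINGS = {
    'M': 'VBA macros',
    'A': 'auto-executable macros',
    'S': 'suspicious VBA keywords',
    'I': 'potential IOCs',
    'H': 'hex-encoded strings',
    'B': 'Base64-encoded strings',
    'D': 'Dridex-encoded strings',
    'V': 'VBA string expressions',
}


def decode_triage_flags(flags):
    """Split e.g. 'OLE:MASIH--' into (file_type, [labels of the flags that are set])."""
    file_type, _, letters = flags.partition(':')
    found = [FLAG_MEANINGS[c] for c in letters if c in FLAG_MEANINGS]
    return file_type, found
```

Because the function only looks at the letters that are present, it does not depend on the exact width of the flags column.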
  243 +
  244 +--------------------------------------------------------------------------
  245 +
  246 +## How to use olevba in Python applications
  247 +
  248 +olevba may be used from your own Python applications to open an MS Office file, detect whether it contains VBA macros,
  249 +and extract and analyze the VBA source code.
  250 +
  251 +IMPORTANT: olevba is currently under active development, therefore this API is likely to change.
  252 +
  253 +### Import olevba
  254 +
  255 +First, import the **oletools.olevba** module, using at least the VBA_Parser class and the file type constants:
  256 +
  257 + :::python
  258 + from oletools.olevba import VBA_Parser, TYPE_OLE, TYPE_OpenXML, TYPE_Word2003_XML, TYPE_MHTML
  259 +
  260 +### Parse a MS Office file
  261 +
  262 +To parse a file on disk, create an instance of the **VBA_Parser** class, providing the name of the file to open as parameter.
  263 +For example:
  264 +
  265 + :::python
  266 + vbaparser = VBA_Parser('my_file_with_macros.doc')
  267 +
  268 +The file may also be provided as a bytes string containing its data. In that case, the actual
  269 +filename must still be provided for reference, and the file content is passed with the data parameter. For example:
  270 +
  271 + :::python
  272 + myfile = 'my_file_with_macros.doc'
  273 + filedata = open(myfile, 'rb').read()
  274 + vbaparser = VBA_Parser(myfile, data=filedata)
  275 +
  276 +VBA_Parser will raise an exception if the file is not a supported format, such as OLE (MS Office 97-2003), OpenXML
  277 +(MS Office 2007+), MHTML or Word 2003 XML.
  278 +
  279 +After parsing the file, the attribute **VBA_Parser.type** is a string indicating the file type.
  280 +It can be either TYPE_OLE, TYPE_OpenXML, TYPE_Word2003_XML or TYPE_MHTML. (constants defined in the olevba module)
  281 +
  282 +### Detect VBA macros
  283 +
  284 +The method **detect_vba_macros** of a VBA_Parser object returns True if VBA macros have been found in the file,
  285 +False otherwise.
  286 +
  287 + :::python
  288 + if vbaparser.detect_vba_macros():
  289 + print 'VBA Macros found'
  290 + else:
  291 + print 'No VBA Macros found'
  292 +
  293 +Note: The detection algorithm looks for streams and storage with specific names in the OLE structure, which works fine
  294 +for all the supported formats listed above. However, for some formats such as PowerPoint 97-2003, this method will
  295 +always return False because VBA Macros are stored in a different way which is not yet supported by olevba.
  296 +
  297 +Moreover, if the file contains an embedded document (e.g. an Excel workbook inserted into a Word document), this method
  298 +may return True if the embedded document contains VBA Macros, even if the main document does not.
  299 +
  300 +### Extract VBA Macro Source Code
  301 +
  302 +The method **extract_macros** extracts and decompresses source code for each VBA macro found in the file (possibly
  303 +including embedded files). It is a generator yielding a tuple (filename, stream_path, vba_filename, vba_code)
  304 +for each VBA macro found.
  305 +
  306 +- filename: If the file is OLE (MS Office 97-2003), filename is the path of the file.
  307 + If the file is OpenXML (MS Office 2007+), filename is the path of the OLE subfile containing VBA macros within the zip archive,
  308 + e.g. word/vbaProject.bin.
  309 +- stream_path: path of the OLE stream containing the VBA macro source code
  310 +- vba_filename: corresponding VBA filename
  311 +- vba_code: string containing the VBA source code in clear text
  312 +
  313 +Example:
  314 +
  315 + :::python
  316 + for (filename, stream_path, vba_filename, vba_code) in vbaparser.extract_macros():
  317 + print '-'*79
  318 + print 'Filename :', filename
  319 + print 'OLE stream :', stream_path
  320 + print 'VBA filename:', vba_filename
  321 + print '- '*39
  322 + print vba_code
  323 +
  324 +Alternatively, the VBA_Parser method **extract_all_macros** returns the same results as a list of tuples.
  325 +
  326 +### Analyze VBA Source Code
  327 +
  328 +Since version 0.40, the VBA_Parser class provides simpler methods than VBA_Scanner to analyze all macros contained
  329 +in a file:
  330 +
  331 +The methods **scan** or **scan_summary** from the class **VBA_Parser** can be used to scan the source code of all
  332 +VBA modules to find obfuscated strings, suspicious keywords, IOCs, auto-executable macros, etc.
  333 +
  334 +scan() takes an optional argument include_decoded_strings: if set to True, the results will contain all the encoded
  335 +strings found in the code (Hex, Base64, Dridex) with their decoded value.
  336 +By default, it will include the strings which contain printable characters only.
  337 +
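To make the "decoded value" and "printable characters only" ideas concrete, here is a small sketch using only the Python standard library. It is written for Python 3 and does not reproduce olevba's own decoding or filtering code; `decode_hex`, `decode_base64` and `is_printable` are hypothetical helpers.

```python
import base64
import binascii
import string

# Byte values considered printable (assumption: ASCII printable set, including whitespace)
PRINTABLE = set(string.printable.encode('ascii'))


def decode_hex(text):
    """Decode a hex-encoded string such as '687474703a2f2f' to bytes."""
    return binascii.unhexlify(text)


def decode_base64(text):
    """Decode a Base64-encoded string to bytes."""
    return base64.b64decode(text)


def is_printable(data):
    """Return True if every byte of data is a printable ASCII character."""
    return all(byte in PRINTABLE for byte in data)
```

A filter in this spirit would keep a decoded string only when `is_printable` returns True, which is one plausible reading of the default behaviour described above.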
  338 +**VBA_Parser.scan()** returns a list of tuples (type, keyword, description), one for each item in the results.
  339 +
  340 +- type may be either 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String', 'Dridex String' or
  341 + 'VBA obfuscated Strings'.
  342 +- keyword is the string found for auto-executable macros, suspicious keywords or IOCs. For obfuscated strings, it is
  343 + the decoded value of the string.
  344 +- description provides a description of the keyword. For obfuscated strings, it is the encoded value of the string.
  345 +
  346 +Example:
  347 +
  348 + :::python
  349 + results = vbaparser.scan()
  350 + for kw_type, keyword, description in results:
  351 + print 'type=%s - keyword=%s - description=%s' % (kw_type, keyword, description)
  352 +
  353 +**VBA_Parser.scan_summary()** returns a tuple with the number of items found for each category:
  354 +(autoexec, suspicious, IOCs, hex, base64, dridex, vbastrings).
  355 +
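If only some of the counts are needed, the list returned by scan() can also be tallied directly. The sketch below assumes the (type, keyword, description) tuple layout described above. `count_by_type` is a hypothetical helper, not part of olevba, and the sample rows are copied from the analysis table earlier in this page.

```python
from collections import Counter


def count_by_type(results):
    """Count (type, keyword, description) tuples per result type."""
    return Counter(kw_type for kw_type, _keyword, _description in results)


# Sample rows taken from the detailed analysis example above
sample_results = [
    ('AutoExec', 'AutoOpen', 'Runs when the Word document is opened'),
    ('AutoExec', 'Auto_Open', 'Runs when the Excel Workbook is opened'),
    ('Suspicious', 'Lib', 'May run code from a DLL'),
    ('IOC', 'test.exe', 'Executable file name'),
]
counts = count_by_type(sample_results)
```

A Counter returns 0 for types that are absent, so `counts['Hex String']` can be read without checking for the key first.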
  356 +
  357 +
  358 +### Close the VBA_Parser
  359 +
  360 +After usage, it is better to call the **close** method of the VBA_Parser object, to make sure the file is closed,
  361 +especially if your application is parsing many files.
  362 +
  363 + :::python
  364 + vbaparser.close()
  365 +
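When many files are parsed in a loop, one way to guarantee that close is always called, even if an exception is raised, is `contextlib.closing` from the standard library, which works with any object that has a close method. In the sketch below, `FakeParser` is a stand-in class so the example runs without oletools installed, and `has_macros` is a hypothetical helper; in real code the VBA_Parser class would be passed as `parser_class`.

```python
from contextlib import closing


class FakeParser(object):
    """Stand-in for VBA_Parser (assumption: only the two methods used below)."""

    def __init__(self, filename):
        self.filename = filename
        self.closed = False

    def detect_vba_macros(self):
        # Arbitrary placeholder logic, for illustration only
        return self.filename.endswith('.doc')

    def close(self):
        self.closed = True


def has_macros(filename, parser_class=FakeParser):
    """Open a parser, check for macros, and always close the parser afterwards."""
    with closing(parser_class(filename)) as parser:
        return parser.detect_vba_macros()
```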
  366 +
  367 +--------------------------------------------------------------------------
  368 +
  369 +## Deprecated API
  370 +
  371 +The following methods and functions are still functional, but their usage is not recommended
  372 +since they have been replaced by better solutions.
  373 +
  374 +### VBA_Scanner (deprecated)
  375 +
  376 +Note: this API is under active development and may change in the future.
  377 +
  378 +The class **VBA_Scanner** can be used to scan the source code of a VBA module to find obfuscated strings,
  379 +suspicious keywords, IOCs, auto-executable macros, etc.
  380 +
  381 +First, create a VBA_Scanner object with a string containing the VBA source code (for example returned by the
  382 +extract_macros method). Then call the methods **scan** or **scan_summary** to get the results of the analysis.
  383 +
  384 +scan() takes an optional argument include_decoded_strings: if set to True, the results will contain all the encoded
  385 +strings found in the code (Hex, Base64, Dridex) with their decoded value.
  386 +
  387 +**scan** returns a list of tuples (type, keyword, description), one for each item in the results.
  388 +
  389 +- type may be either 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String' or 'Dridex String'.
  390 +- keyword is the string found for auto-executable macros, suspicious keywords or IOCs. For obfuscated strings, it is
  391 + the decoded value of the string.
  392 +- description provides a description of the keyword. For obfuscated strings, it is the encoded value of the string.
  393 +
  394 +Example:
  395 +
  396 + :::python
  397 + vba_scanner = VBA_Scanner(vba_code)
  398 + results = vba_scanner.scan(include_decoded_strings=True)
  399 + for kw_type, keyword, description in results:
  400 + print 'type=%s - keyword=%s - description=%s' % (kw_type, keyword, description)
  401 +
  402 +The function **scan_vba** is a shortcut for VBA_Scanner(vba_code).scan():
  403 +
  404 + :::python
  405 + results = scan_vba(vba_code, include_decoded_strings=True)
  406 + for kw_type, keyword, description in results:
  407 + print 'type=%s - keyword=%s - description=%s' % (kw_type, keyword, description)
  408 +
  409 +**scan_summary** returns a tuple with the number of items found for each category:
  410 +(autoexec, suspicious, IOCs, hex, base64, dridex).
  411 +
  412 +
  413 +### Detect auto-executable macros (deprecated)
  414 +
  415 +**Deprecated**: It is preferable to use either scan_vba or VBA_Scanner to get all results at once.
  416 +
  417 +The function **detect_autoexec** checks if VBA macro code contains specific macro names
  418 +that will be triggered when the document/workbook is opened, closed, changed, etc.
  419 +
  420 +It returns a list of tuples containing two strings, the detected keyword, and the
  421 +description of the trigger. (See the malware example above)
  422 +
  423 +Sample usage:
  424 +
  425 + :::python
  426 + from oletools.olevba import detect_autoexec
  427 + autoexec_keywords = detect_autoexec(vba_code)
  428 + if autoexec_keywords:
  429 + print 'Auto-executable macro keywords found:'
  430 + for keyword, description in autoexec_keywords:
  431 + print '%s: %s' % (keyword, description)
  432 + else:
  433 + print 'Auto-executable macro keywords: None found'
  434 +
  435 +
  436 +### Detect suspicious VBA keywords (deprecated)
  437 +
  438 +**Deprecated**: It is preferable to use either scan_vba or VBA_Scanner to get all results at once.
  439 +
  440 +The function **detect_suspicious** checks if VBA macro code contains specific
  441 +keywords often used by malware to act on the system (create files, run
  442 +commands or applications, write to the registry, etc).
  443 +
  444 +It returns a list of tuples containing two strings, the detected keyword, and the
  445 +description of the corresponding malicious behaviour. (See the malware example above)
  446 +
  447 +Sample usage:
  448 +
  449 + :::python
  450 + from oletools.olevba import detect_suspicious
  451 + suspicious_keywords = detect_suspicious(vba_code)
  452 + if suspicious_keywords:
  453 + print 'Suspicious VBA keywords found:'
  454 + for keyword, description in suspicious_keywords:
  455 + print '%s: %s' % (keyword, description)
  456 + else:
  457 + print 'Suspicious VBA keywords: None found'
  458 +
  459 +
  460 +### Extract potential IOCs (deprecated)
  461 +
  462 +**Deprecated**: It is preferable to use either scan_vba or VBA_Scanner to get all results at once.
  463 +
  464 +The function **detect_patterns** checks if VBA macro code contains specific
  465 +patterns of interest, that may be useful for malware analysis and detection
  466 +(potential Indicators of Compromise): IP addresses, e-mail addresses,
  467 +URLs, executable file names.
  468 +
  469 +It returns a list of tuples containing two strings, the pattern type, and the
  470 +extracted value. (See the malware example above)
  471 +
  472 +Sample usage:
  473 +
  474 + :::python
  475 + from oletools.olevba import detect_patterns
  476 + patterns = detect_patterns(vba_code)
  477 + if patterns:
  478 + print 'Patterns found:'
  479 + for pattern_type, value in patterns:
  480 + print '%s: %s' % (pattern_type, value)
  481 + else:
  482 + print 'Patterns: None found'
  483 +
  484 +
  485 +--------------------------------------------------------------------------
  486 +
  487 +python-oletools documentation
  488 +-----------------------------
  489 +
  490 +- [[Home]]
  491 +- [[License]]
  492 +- [[Install]]
  493 +- [[Contribute]], Suggest Improvements or Report Issues
  494 +- Tools:
  495 + - [[olebrowse]]
  496 + - [[oleid]]
  497 + - [[olemeta]]
  498 + - [[oletimes]]
  499 + - [[olevba]]
  500 + - [[pyxswf]]
494 - [[rtfobj]] 501 - [[rtfobj]]
495 \ No newline at end of file 502 \ No newline at end of file
oletools/olevba.py
@@ -2100,6 +2100,7 @@ class VBA_Parser_CLI(VBA_Parser): @@ -2100,6 +2100,7 @@ class VBA_Parser_CLI(VBA_Parser):
2100 2100
2101 2101
2102 def reveal(self): 2102 def reveal(self):
  2103 + #TODO: move this code to the VBA_Parser class (without print)
2103 print 'MACRO SOURCE CODE WITH DEOBFUSCATED VBA STRINGS (EXPERIMENTAL):\n' 2104 print 'MACRO SOURCE CODE WITH DEOBFUSCATED VBA STRINGS (EXPERIMENTAL):\n'
2104 # we only want printable strings: 2105 # we only want printable strings:
2105 analysis = self.analyze_macros(show_decoded_strings=False) 2106 analysis = self.analyze_macros(show_decoded_strings=False)