Commit c1d26ba7fe93a2070ccc675a2a94c2c948ec2577

Authored by Philippe Lagadec
1 parent 03c0a9ec

pyxswf v0.02: added extraction from RTF embedded objects, with new rtfobj module

README.md
@@ -12,13 +12,14 @@ Tools in python-oletools: @@ -12,13 +12,14 @@ Tools in python-oletools:
12 view and extract individual data streams. 12 view and extract individual data streams.
13 - **oleid**: a tool to analyze OLE files to detect specific characteristics that could potentially indicate that the file is suspicious or malicious. 13 - **oleid**: a tool to analyze OLE files to detect specific characteristics that could potentially indicate that the file is suspicious or malicious.
14 - **pyxswf**: a tool to detect, extract and analyze Flash objects (SWF) that may 14 - **pyxswf**: a tool to detect, extract and analyze Flash objects (SWF) that may
15 - be embedded in files such as MS Office documents (e.g. Word, Excel), 15 + be embedded in files such as MS Office documents (e.g. Word, Excel) and RTF,
16 which is especially useful for malware analysis. 16 which is especially useful for malware analysis.
17 - and a few others (coming soon) 17 - and a few others (coming soon)
18 18
19 News 19 News
20 ---- 20 ----
21 21
  22 +- 2012-11-09 v0.03: Improved pyxswf to extract Flash objects from RTF
22 - 2012-10-29 v0.02: Added oleid 23 - 2012-10-29 v0.02: Added oleid
23 - 2012-10-09 v0.01: Initial version of olebrowse and pyxswf 24 - 2012-10-09 v0.01: Initial version of olebrowse and pyxswf
24 - see changelog in source code for more info. 25 - see changelog in source code for more info.
@@ -84,13 +85,18 @@ their OLE structure properly, which is necessary when streams are fragmented. @@ -84,13 +85,18 @@ their OLE structure properly, which is necessary when streams are fragmented.
84 Stream fragmentation is a known obfuscation technique, as explained on 85 Stream fragmentation is a known obfuscation technique, as explained on
85 [http://www.breakingpointsystems.com/resources/blog/evasion-with-ole2-fragmentation/](http://www.breakingpointsystems.com/resources/blog/evasion-with-ole2-fragmentation/) 86 [http://www.breakingpointsystems.com/resources/blog/evasion-with-ole2-fragmentation/](http://www.breakingpointsystems.com/resources/blog/evasion-with-ole2-fragmentation/)
86 87
87 -For this, simply add the -o option to work on OLE streams rather than raw files. 88 +It can also extract Flash objects from RTF documents, by parsing embedded objects encoded in hexadecimal format (-f option).
  89 +
  90 +
  91 +For this, simply add the -o option to work on OLE streams rather than raw files, or the -f option to work on RTF files.
88 92
89 Usage: pyxswf.py [options] <file.bad> 93 Usage: pyxswf.py [options] <file.bad>
90 94
91 Options: 95 Options:
92 -o, --ole Parse an OLE file (e.g. Word, Excel) to look for SWF 96 -o, --ole Parse an OLE file (e.g. Word, Excel) to look for SWF
93 in each stream 97 in each stream
  98 + -f, --rtf Parse an RTF file to look for SWF in each embedded
  99 + object
94 -x, --extract Extracts the embedded SWF(s), names it MD5HASH.swf & 100 -x, --extract Extracts the embedded SWF(s), names it MD5HASH.swf &
95 saves it in the working dir. No addition args needed 101 saves it in the working dir. No addition args needed
96 -h, --help show this help message and exit 102 -h, --help show this help message and exit
@@ -106,7 +112,7 @@ For this, simply add the -o option to work on OLE streams rather than raw files. @@ -106,7 +112,7 @@ For this, simply add the -o option to work on OLE streams rather than raw files.
106 contain SWFs. Must provide path in quotes 112 contain SWFs. Must provide path in quotes
107 -c, --compress Compresses the SWF using Zlib 113 -c, --compress Compresses the SWF using Zlib
108 114
109 -Example - detecting and extracting a SWF file from a Word document on Windows: 115 +Example 1 - detecting and extracting a SWF file from a Word document on Windows:
110 116
111 C:\oletools>pyxswf.py -o word_flash.doc 117 C:\oletools>pyxswf.py -o word_flash.doc
112 OLE stream: 'Contents' 118 OLE stream: 'Contents'
@@ -118,7 +124,16 @@ Example - detecting and extracting a SWF file from a Word document on Windows: @@ -118,7 +124,16 @@ Example - detecting and extracting a SWF file from a Word document on Windows:
118 [SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents 124 [SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents
119 [ADDR] SWF 1 at 0x8 - FWS Header 125 [ADDR] SWF 1 at 0x8 - FWS Header
120 [FILE] Carved SWF MD5: 2498e9c0701dc0e461ab4358f9102bc5.swf 126 [FILE] Carved SWF MD5: 2498e9c0701dc0e461ab4358f9102bc5.swf
121 - 127 +
  128 +Example 2 - detecting and extracting a SWF file from a RTF document on Windows:
  129 +
  130 + C:\oletools>pyxswf.py -xf "rtf_flash.rtf"
  131 + RTF embedded object size 1498557 at index 000036DD
  132 + [SUMMARY] 1 SWF(s) in MD5:46a110548007e04f4043785ac4184558:RTF_embedded_object_0
  133 + 00036DD
  134 + [ADDR] SWF 1 at 0xc40 - FWS Header
  135 + [FILE] Carved SWF MD5: 2498e9c0701dc0e461ab4358f9102bc5.swf
  136 +
122 For more info, see [http://www.decalage.info/python/pyxswf](http://www.decalage.info/python/pyxswf) 137 For more info, see [http://www.decalage.info/python/pyxswf](http://www.decalage.info/python/pyxswf)
123 138
124 139
oletools/README.txt
@@ -25,12 +25,13 @@ Tools in python-oletools: @@ -25,12 +25,13 @@ Tools in python-oletools:
25 suspicious or malicious. 25 suspicious or malicious.
26 - **pyxswf**: a tool to detect, extract and analyze Flash objects (SWF) 26 - **pyxswf**: a tool to detect, extract and analyze Flash objects (SWF)
27 that may be embedded in files such as MS Office documents (e.g. Word, 27 that may be embedded in files such as MS Office documents (e.g. Word,
28 - Excel), which is especially useful for malware analysis. 28 + Excel) and RTF, which is especially useful for malware analysis.
29 - and a few others (coming soon) 29 - and a few others (coming soon)
30 30
31 News 31 News
32 ---- 32 ----
33 33
  34 +- 2012-11-09 v0.03: Improved pyxswf to extract Flash objects from RTF
34 - 2012-10-29 v0.02: Added oleid 35 - 2012-10-29 v0.02: Added oleid
35 - 2012-10-09 v0.01: Initial version of olebrowse and pyxswf 36 - 2012-10-09 v0.01: Initial version of olebrowse and pyxswf
36 - see changelog in source code for more info. 37 - see changelog in source code for more info.
@@ -112,8 +113,11 @@ are fragmented. Stream fragmentation is a known obfuscation technique, @@ -112,8 +113,11 @@ are fragmented. Stream fragmentation is a known obfuscation technique,
112 as explained on 113 as explained on
113 `http://www.breakingpointsystems.com/resources/blog/evasion-with-ole2-fragmentation/ <http://www.breakingpointsystems.com/resources/blog/evasion-with-ole2-fragmentation/>`_ 114 `http://www.breakingpointsystems.com/resources/blog/evasion-with-ole2-fragmentation/ <http://www.breakingpointsystems.com/resources/blog/evasion-with-ole2-fragmentation/>`_
114 115
  116 +It can also extract Flash objects from RTF documents, by parsing
  117 +embedded objects encoded in hexadecimal format (-f option).
  118 +
115 For this, simply add the -o option to work on OLE streams rather than 119 For this, simply add the -o option to work on OLE streams rather than
116 -raw files. 120 +raw files, or the -f option to work on RTF files.
117 121
118 :: 122 ::
119 123
@@ -122,6 +126,8 @@ raw files. @@ -122,6 +126,8 @@ raw files.
122 Options: 126 Options:
123 -o, --ole Parse an OLE file (e.g. Word, Excel) to look for SWF 127 -o, --ole Parse an OLE file (e.g. Word, Excel) to look for SWF
124 in each stream 128 in each stream
  129 + -f, --rtf Parse an RTF file to look for SWF in each embedded
  130 + object
125 -x, --extract Extracts the embedded SWF(s), names it MD5HASH.swf & 131 -x, --extract Extracts the embedded SWF(s), names it MD5HASH.swf &
126 saves it in the working dir. No addition args needed 132 saves it in the working dir. No addition args needed
127 -h, --help show this help message and exit 133 -h, --help show this help message and exit
@@ -137,7 +143,7 @@ raw files. @@ -137,7 +143,7 @@ raw files.
137 contain SWFs. Must provide path in quotes 143 contain SWFs. Must provide path in quotes
138 -c, --compress Compresses the SWF using Zlib 144 -c, --compress Compresses the SWF using Zlib
139 145
140 -Example - detecting and extracting a SWF file from a Word document on 146 +Example 1 - detecting and extracting a SWF file from a Word document on
141 Windows: 147 Windows:
142 148
143 :: 149 ::
@@ -153,6 +159,18 @@ Windows: @@ -153,6 +159,18 @@ Windows:
153 [ADDR] SWF 1 at 0x8 - FWS Header 159 [ADDR] SWF 1 at 0x8 - FWS Header
154 [FILE] Carved SWF MD5: 2498e9c0701dc0e461ab4358f9102bc5.swf 160 [FILE] Carved SWF MD5: 2498e9c0701dc0e461ab4358f9102bc5.swf
155 161
  162 +Example 2 - detecting and extracting a SWF file from a RTF document on
  163 +Windows:
  164 +
  165 +::
  166 +
  167 + C:\oletools>pyxswf.py -xf "rtf_flash.rtf"
  168 + RTF embedded object size 1498557 at index 000036DD
  169 + [SUMMARY] 1 SWF(s) in MD5:46a110548007e04f4043785ac4184558:RTF_embedded_object_0
  170 + 00036DD
  171 + [ADDR] SWF 1 at 0xc40 - FWS Header
  172 + [FILE] Carved SWF MD5: 2498e9c0701dc0e461ab4358f9102bc5.swf
  173 +
156 For more info, see 174 For more info, see
157 `http://www.decalage.info/python/pyxswf <http://www.decalage.info/python/pyxswf>`_ 175 `http://www.decalage.info/python/pyxswf <http://www.decalage.info/python/pyxswf>`_
158 176
oletools/pyxswf.py
1 #!/usr/bin/env python 1 #!/usr/bin/env python
2 """ 2 """
3 -pyxswf.py - Philippe Lagadec 2012-09-17 3 +pyxswf.py
4 4
5 pyxswf is a script to detect, extract and analyze Flash objects (SWF) that may 5 pyxswf is a script to detect, extract and analyze Flash objects (SWF) that may
6 be embedded in files such as MS Office documents (e.g. Word, Excel), 6 be embedded in files such as MS Office documents (e.g. Word, Excel),
7 which is especially useful for malware analysis. 7 which is especially useful for malware analysis.
  8 +
8 pyxswf is an extension to xxxswf.py published by Alexander Hanel on 9 pyxswf is an extension to xxxswf.py published by Alexander Hanel on
9 http://hooked-on-mnemonics.blogspot.nl/2011/12/xxxswfpy.html 10 http://hooked-on-mnemonics.blogspot.nl/2011/12/xxxswfpy.html
10 Compared to xxxswf, it can extract streams from MS Office documents by parsing 11 Compared to xxxswf, it can extract streams from MS Office documents by parsing
11 -their OLE structure properly, which is necessary when streams are fragmented. 12 +their OLE structure properly (-o option), which is necessary when streams are
  13 +fragmented.
12 Stream fragmentation is a known obfuscation technique, as explained on 14 Stream fragmentation is a known obfuscation technique, as explained on
13 http://www.breakingpointsystems.com/resources/blog/evasion-with-ole2-fragmentation/ 15 http://www.breakingpointsystems.com/resources/blog/evasion-with-ole2-fragmentation/
14 16
  17 +It can also extract Flash objects from RTF documents, by parsing embedded
  18 +objects encoded in hexadecimal format (-f option).
  19 +
15 pyxswf project website: http://www.decalage.info/python/pyxswf 20 pyxswf project website: http://www.decalage.info/python/pyxswf
16 21
17 pyxswf is part of the python-oletools package: 22 pyxswf is part of the python-oletools package:
@@ -41,18 +46,19 @@ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE @@ -41,18 +46,19 @@ OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
41 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 46 OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
42 """ 47 """
43 48
44 -__version__ = '0.01' 49 +__version__ = '0.02'
45 50
46 #------------------------------------------------------------------------------ 51 #------------------------------------------------------------------------------
47 # CHANGELOG: 52 # CHANGELOG:
48 # 2012-09-17 v0.01 PL: - first version 53 # 2012-09-17 v0.01 PL: - first version
  54 +# 2012-11-09 v0.02 PL: - added RTF embedded objects extraction
49 55
50 #------------------------------------------------------------------------------ 56 #------------------------------------------------------------------------------
51 # TODO: 57 # TODO:
52 # - check if file is OLE 58 # - check if file is OLE
53 # - support -r 59 # - support -r
54 60
55 -import optparse, sys, os 61 +import optparse, sys, os, rtfobj, StringIO
56 from thirdparty.xxxswf import xxxswf 62 from thirdparty.xxxswf import xxxswf
57 from thirdparty.OleFileIO_PL import OleFileIO_PL 63 from thirdparty.OleFileIO_PL import OleFileIO_PL
58 64
@@ -76,6 +82,7 @@ def main(): @@ -76,6 +82,7 @@ def main():
76 parser.add_option('-c', '--compress', action='store_true', dest='compress', help='Compresses the SWF using Zlib') 82 parser.add_option('-c', '--compress', action='store_true', dest='compress', help='Compresses the SWF using Zlib')
77 83
78 parser.add_option('-o', '--ole', action='store_true', dest='ole', help='Parse an OLE file (e.g. Word, Excel) to look for SWF in each stream') 84 parser.add_option('-o', '--ole', action='store_true', dest='ole', help='Parse an OLE file (e.g. Word, Excel) to look for SWF in each stream')
  85 + parser.add_option('-f', '--rtf', action='store_true', dest='rtf', help='Parse an RTF file to look for SWF in each embedded object')
79 86
80 87
81 (options, args) = parser.parse_args() 88 (options, args) = parser.parse_args()
@@ -85,6 +92,7 @@ def main(): @@ -85,6 +92,7 @@ def main():
85 parser.print_help() 92 parser.print_help()
86 return 93 return
87 94
  95 + # OLE MODE:
88 if options.ole: 96 if options.ole:
89 for filename in args: 97 for filename in args:
90 ole = OleFileIO_PL.OleFileIO(filename) 98 ole = OleFileIO_PL.OleFileIO(filename)
@@ -99,6 +107,18 @@ def main(): @@ -99,6 +107,18 @@ def main():
99 xxxswf.disneyland(f, direntry.name, options) 107 xxxswf.disneyland(f, direntry.name, options)
100 f.close() 108 f.close()
101 ole.close() 109 ole.close()
  110 +
  111 + # RTF MODE:
  112 + elif options.rtf:
  113 + for filename in args:
  114 + for index, data in rtfobj.rtf_iter_objects(filename):
  115 + if 'FWS' in data or 'CWS' in data:
  116 + print 'RTF embedded object size %d at index %08X' % (len(data), index)
  117 + f = StringIO.StringIO(data)
  118 + name = 'RTF_embedded_object_%08X' % index
  119 + # call xxxswf to scan or extract Flash files:
  120 + xxxswf.disneyland(f, name, options)
  121 +
102 else: 122 else:
103 xxxswf.main() 123 xxxswf.main()
104 124
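The RTF mode added in this hunk can be sketched in isolation as follows. This is a minimal Python 3 illustration under stated assumptions, not the committed Python 2 code: `may_contain_swf` and `swf_candidates` are hypothetical helper names, `io.BytesIO` stands in for `StringIO.StringIO`, and the call into xxxswf is deliberately left out.

```python
import io

# SWF files start with 'FWS' (uncompressed) or 'CWS' (zlib-compressed).
SWF_SIGNATURES = (b'FWS', b'CWS')


def may_contain_swf(data):
    """Cheap pre-filter: True if any SWF signature occurs anywhere in data."""
    return any(sig in data for sig in SWF_SIGNATURES)


def swf_candidates(objects):
    """
    Given (index, data) pairs such as those yielded by rtf_iter_objects,
    yield (name, file_like) for each object worth scanning for SWF content.
    The name format mirrors the one used in the diff above.
    """
    for index, data in objects:
        if may_contain_swf(data):
            name = 'RTF_embedded_object_%08X' % index
            # Wrap the bytes in a file-like object for a stream-based scanner.
            yield name, io.BytesIO(data)


if __name__ == '__main__':
    # Hypothetical sample data, for illustration only.
    demo = [(0x36DD, b'\x00\x01FWS\x09rest'), (0x10, b'no flash here')]
    for name, stream in swf_candidates(demo):
        print(name, len(stream.read()))
```

The substring test is only a pre-filter: a match means the object is worth handing to the SWF scanner, not that it holds a valid SWF file.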
oletools/rtfobj.py 0 → 100644
  1 +#!/usr/bin/env python
  2 +"""
  3 +rtfobj.py - Philippe Lagadec 2012-11-09
  4 +
  5 +rtfobj is a Python module to extract embedded objects from RTF files, such as
  6 +OLE objects. It can be used as a Python library or a command-line tool.
  7 +
  8 +Usage: rtfobj.py <file.rtf>
  9 +
  10 +rtfobj project website: http://www.decalage.info/python/rtfobj
  11 +
  12 +rtfobj is part of the python-oletools package:
  13 +http://www.decalage.info/python/oletools
  14 +
  15 +rtfobj is copyright (c) 2012, Philippe Lagadec (http://www.decalage.info)
  16 +All rights reserved.
  17 +
  18 +Redistribution and use in source and binary forms, with or without modification,
  19 +are permitted provided that the following conditions are met:
  20 +
  21 + * Redistributions of source code must retain the above copyright notice, this
  22 + list of conditions and the following disclaimer.
  23 + * Redistributions in binary form must reproduce the above copyright notice,
  24 + this list of conditions and the following disclaimer in the documentation
  25 + and/or other materials provided with the distribution.
  26 +
  27 +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
  28 +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
  29 +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
  30 +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
  31 +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
  32 +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
  33 +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
  34 +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
  35 +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
  36 +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  37 +"""
  38 +
  39 +__version__ = '0.01'
  40 +
  41 +#------------------------------------------------------------------------------
  42 +# CHANGELOG:
  43 +# 2012-11-09 v0.01 PL: - first version
  44 +
  45 +#------------------------------------------------------------------------------
  46 +# TODO:
  47 +# - improve regex pattern for better performance?
  48 +
  49 +import re, sys, string, binascii
  50 +
  51 +# REGEX pattern to extract embedded OLE objects in hexadecimal format:
  52 +# alphanum digit: [0-9A-Fa-f]
  53 +# hex char = two alphanum digits: [0-9A-Fa-f]{2}
  54 +# several hex chars, at least 4: (?:[0-9A-Fa-f]{2}){4,}
  55 +# at least 4 hex chars, followed by whitespace or CR/LF: (?:[0-9A-Fa-f]{2}){4,}\s*
  56 +PATTERN = r'(?:(?:[0-9A-Fa-f]{2})+\s*)*(?:[0-9A-Fa-f]{2}){4,}'
  57 +
  58 +# a dummy translation table for str.translate, which does not change anything:
  59 +TRANSTABLE_NOCHANGE = string.maketrans('', '')
  60 +
  61 +
  62 +def rtf_iter_objects (filename, min_size=32):
  63 + """
  64 + Open a RTF file, extract each embedded object encoded in hexadecimal of
  65 + size > min_size, yield the index of the object in the RTF file and its data
  66 + in binary format.
  67 + This is an iterator.
  68 + """
  69 + data = open(filename, 'rb').read()
  70 + for m in re.finditer(PATTERN, data):
  71 + found = m.group(0)
  72 + # remove all whitespace and line feeds:
  73 + #NOTE: with Python 2.6+, we could use None instead of TRANSTABLE_NOCHANGE
  74 + found = found.translate(TRANSTABLE_NOCHANGE, ' \t\r\n\f\v')
  75 + found = binascii.unhexlify(found)
  76 + #print repr(found)
  77 + if len(found)>min_size:
  78 + yield m.start(), found
  79 +
  80 +if __name__ == '__main__':
  81 + if len(sys.argv) < 2:
  82 + sys.exit(__doc__)
  83 + for index, data in rtf_iter_objects(sys.argv[1]):
  84 + print 'found object size %d at index %08X' % (len(data), index)
  85 + fname = 'object_%08X.bin' % index
  86 + print 'saving to file %s' % fname
  87 + open(fname, 'wb').write(data)
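For readers on Python 3, the extraction logic of rtf_iter_objects can be approximated as below. This is a sketch under assumptions rather than a port of the module: `iter_hex_objects` is a hypothetical name, it takes bytes already read from the file instead of a filename, and the regular expression is the same as PATTERN above, compiled for bytes input.

```python
import binascii
import re

# Same expression as PATTERN in rtfobj.py, applied to bytes:
# runs of hex digit pairs, optionally broken up by whitespace,
# ending with at least 4 consecutive pairs.
HEX_RUN = re.compile(rb'(?:(?:[0-9A-Fa-f]{2})+\s*)*(?:[0-9A-Fa-f]{2}){4,}')


def iter_hex_objects(data, min_size=32):
    """
    Yield (index, decoded_bytes) for each hex-encoded run found in data
    whose decoded size is greater than min_size.
    """
    for m in HEX_RUN.finditer(data):
        # Remove whitespace and line breaks before decoding the hex digits.
        hex_digits = m.group(0).translate(None, b' \t\r\n\f\v')
        decoded = binascii.unhexlify(hex_digits)
        if len(decoded) > min_size:
            yield m.start(), decoded


# Hypothetical RTF fragment with one hex-encoded object, for illustration.
prefix = b'{\\rtf1 {\\object\\objdata '
payload = b'FWS' + b'\x00' * 40
sample = prefix + binascii.hexlify(payload) + b'}}'
objects = list(iter_hex_objects(sample))

if __name__ == '__main__':
    for index, obj in objects:
        print('found object size %d at index %08X' % (len(obj), index))
```

Passing the data in as bytes keeps the function easy to test and leaves file handling (for example a `with open(filename, 'rb')` block) to the caller.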