Commit 677d9ad57da9670ed58cd2d824b5baf7ac7c5c64
Merge remote-tracking branch 'upstream/master'
Showing 93 changed files with 4772 additions and 5845 deletions
.travis.yml
INSTALL.txt
| 1 | -How to Download and Install python-oletools | |
| 2 | -=========================================== | |
| 1 | +How to Download and Install oletools | |
| 2 | +==================================== | |
| 3 | 3 | |
| 4 | 4 | Pre-requisites |
| 5 | 5 | -------------- |
| 6 | 6 | |
| 7 | -The recommended Python version to run oletools is Python 2.7. | |
| 8 | -Python 2.6 is also supported, but as it is not tested as often as 2.7, some features | |
| 9 | -might not work as expected. | |
| 10 | - | |
| 11 | -Since v0.50, oletools can also run with Python 3.x. As this is quite new, please | |
| 12 | -report any issue you may encounter. | |
| 13 | - | |
| 7 | +The recommended Python version to run oletools is the latest **Python 3.x** (3.7 for now). | |
| 8 | +Python 2.7 is still supported, but as it will become end of life in 2020 (see https://pythonclock.org/), it is highly | |
| 9 | +recommended to switch to Python 3 now. | |
| 14 | 10 | |
| 15 | 11 | Recommended way to Download+Install/Update oletools: pip |
| 16 | 12 | -------------------------------------------------------- |
| ... | ... | @@ -23,7 +19,11 @@ system, either upgrade Python or see https://pip.pypa.io/en/stable/installing/ |
| 23 | 19 | To download and install/update the latest release version of oletools, |
| 24 | 20 | run the following command in a shell: |
| 25 | 21 | |
| 22 | +```text | |
| 26 | 23 | sudo -H pip install -U oletools |
| 24 | +``` | |
| 25 | + | |
| 26 | +Replace `pip` by `pip3` or `pip2` to install on a specific Python version. | |
| 27 | 27 | |
| 28 | 28 | **Important**: Since version 0.50, pip will automatically create convenient command-line scripts |
| 29 | 29 | in /usr/local/bin to run all the oletools from any directory. |
| ... | ... | @@ -33,7 +33,19 @@ in /usr/local/bin to run all the oletools from any directory. |
| 33 | 33 | To download and install/update the latest release version of oletools, |
| 34 | 34 | run the following command in a cmd window: |
| 35 | 35 | |
| 36 | +```text | |
| 36 | 37 | pip install -U oletools |
| 38 | +``` | |
| 39 | + | |
| 40 | +Replace `pip` by `pip3` or `pip2` to install on a specific Python version. | |
| 41 | + | |
| 42 | +**Note**: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip | |
| 43 | +and install for all users. If that is not possible, you may also install only for the current user | |
| 44 | +by adding the `--user` option: | |
| 45 | + | |
| 46 | +```text | |
| 47 | +pip3 install -U --user oletools | |
| 48 | +``` | |
| 37 | 49 | |
| 38 | 50 | **Important**: Since version 0.50, pip will automatically create convenient command-line scripts |
| 39 | 51 | to run all the oletools from any directory: olevba, mraptor, oleid, rtfobj, etc. |
| ... | ... | @@ -47,18 +59,33 @@ you may also use pip: |
| 47 | 59 | |
| 48 | 60 | ### Linux, Mac OSX, Unix |
| 49 | 61 | |
| 62 | +```text | |
| 50 | 63 | sudo -H pip install -U https://github.com/decalage2/oletools/archive/master.zip |
| 64 | +``` | |
| 65 | + | |
| 66 | +Replace `pip` by `pip3` or `pip2` to install on a specific Python version. | |
| 51 | 67 | |
| 52 | 68 | ### Windows |
| 53 | 69 | |
| 70 | +```text | |
| 54 | 71 | pip install -U https://github.com/decalage2/oletools/archive/master.zip |
| 72 | +``` | |
| 73 | + | |
| 74 | +Replace `pip` by `pip3` or `pip2` to install on a specific Python version. | |
| 75 | + | |
| 76 | +**Note**: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip | |
| 77 | +and install for all users. If that is not possible, you may also install only for the current user | |
| 78 | +by adding the `--user` option: | |
| 55 | 79 | |
| 80 | +```text | |
| 81 | +pip3 install -U --user https://github.com/decalage2/oletools/archive/master.zip | |
| 82 | +``` | |
| 56 | 83 | |
| 57 | 84 | How to install offline - Computer without Internet access |
| 58 | 85 | --------------------------------------------------------- |
| 59 | 86 | |
| 60 | 87 | First, download the oletools archive on a computer with Internet access: |
| 61 | -* Latest stable version: from https://github.com/decalage2/oletools/releases | |
| 88 | +* Latest stable version: from https://pypi.org/project/oletools/ or https://github.com/decalage2/oletools/releases | |
| 62 | 89 | * Development version: https://github.com/decalage2/oletools/archive/master.zip |
| 63 | 90 | |
| 64 | 91 | Copy the archive file to the target computer. |
| ... | ... | @@ -66,11 +93,15 @@ Copy the archive file to the target computer. |
| 66 | 93 | On Linux, Mac OSX, Unix, run the following command using the filename of the |
| 67 | 94 | archive that you downloaded: |
| 68 | 95 | |
| 96 | +```text | |
| 69 | 97 | sudo -H pip install -U oletools.zip |
| 98 | +``` | |
| 70 | 99 | |
| 71 | 100 | On Windows: |
| 72 | 101 | |
| 102 | +```text | |
| 73 | 103 | pip install -U oletools.zip |
| 104 | +``` | |
| 74 | 105 | |
| 75 | 106 | |
| 76 | 107 | Old school install using setup.py |
| ... | ... | @@ -88,9 +119,12 @@ Then extract the archive, open a shell and go to the oletools directory. |
| 88 | 119 | |
| 89 | 120 | ### Linux, Mac OSX, Unix |
| 90 | 121 | |
| 122 | +```text | |
| 91 | 123 | sudo -H python setup.py install |
| 124 | +``` | |
| 92 | 125 | |
| 93 | 126 | ### Windows: |
| 94 | 127 | |
| 128 | +```text | |
| 95 | 129 | python setup.py install |
| 96 | - | |
| 130 | +``` | ... | ... |
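The INSTALL.txt changes above pick a Python version by swapping `pip` for `pip3` or `pip2`, and add `--user` for a per-user install. A minimal sketch of the same choice, assuming the interpreter is invoked as `python -m pip` instead (a substitute for the `pip3`/`pip2` launchers named in the diff, not something INSTALL.txt itself prescribes). The helper name `pip_install_command` is hypothetical.

```python
import sys


def pip_install_command(target="oletools", python=sys.executable, user=False):
    """Build the argument list for installing/upgrading `target` with pip.

    `python` selects the interpreter (and therefore the Python version),
    which plays the role of choosing between pip, pip2 and pip3.
    `user=True` adds --user, for installs without administrator rights.
    """
    cmd = [python, "-m", "pip", "install", "-U"]
    if user:
        cmd.append("--user")
    cmd.append(target)
    return cmd


# Example: per-user install of the development archive for "python3".
# The list is only built here; pass it to subprocess.run() to execute it.
example = pip_install_command(
    "https://github.com/decalage2/oletools/archive/master.zip",
    python="python3",
    user=True,
)
print(" ".join(example))
```

Building the command as a list keeps it safe to hand to `subprocess.run` without shell quoting; on Linux or macOS a system-wide install would still need the `sudo -H` prefix shown in the diff.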
LICENSE.md
0 → 100644
| 1 | +This license applies to the python-oletools package, apart from the thirdparty folder which contains third-party files | |
| 2 | +published with their own license. | |
| 3 | + | |
| 4 | +The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (http://www.decalage.info) | |
| 5 | + | |
| 6 | +All rights reserved. | |
| 7 | + | |
| 8 | +Redistribution and use in source and binary forms, with or without modification, | |
| 9 | +are permitted provided that the following conditions are met: | |
| 10 | + | |
| 11 | + * Redistributions of source code must retain the above copyright notice, this | |
| 12 | + list of conditions and the following disclaimer. | |
| 13 | + * Redistributions in binary form must reproduce the above copyright notice, | |
| 14 | + this list of conditions and the following disclaimer in the documentation | |
| 15 | + and/or other materials provided with the distribution. | |
| 16 | + | |
| 17 | +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND | |
| 18 | +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | |
| 19 | +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | |
| 20 | +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE | |
| 21 | +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
| 22 | +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR | |
| 23 | +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER | |
| 24 | +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, | |
| 25 | +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | |
| 26 | +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | |
| 27 | + | |
| 28 | + | |
| 29 | +---------- | |
| 30 | + | |
| 31 | +olevba contains modified source code from the officeparser project, published | |
| 32 | +under the following MIT License (MIT): | |
| 33 | + | |
| 34 | +officeparser is copyright (c) 2014 John William Davison | |
| 35 | + | |
| 36 | +Permission is hereby granted, free of charge, to any person obtaining a copy | |
| 37 | +of this software and associated documentation files (the "Software"), to deal | |
| 38 | +in the Software without restriction, including without limitation the rights | |
| 39 | +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | |
| 40 | +copies of the Software, and to permit persons to whom the Software is | |
| 41 | +furnished to do so, subject to the following conditions: | |
| 42 | + | |
| 43 | +The above copyright notice and this permission notice shall be included in all | |
| 44 | +copies or substantial portions of the Software. | |
| 45 | + | |
| 46 | +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | |
| 47 | +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | |
| 48 | +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | |
| 49 | +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | |
| 50 | +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |
| 51 | +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | |
| 52 | +SOFTWARE. | ... | ... |
MANIFEST.in
0 → 100644
| 1 | +include install.bat | |
| 2 | +include INSTALL.txt | |
| 3 | +include README.md | |
| 4 | +include requirements.txt | |
| 5 | +include oletools/README.rst | |
| 6 | +include oletools/README.html | |
| 7 | +include oletools/LICENSE.txt | |
| 8 | +include oletools/DocVarDump.vba | |
| 9 | +recursive-include oletools/thirdparty *.* | |
| 10 | +recursive-include cheatsheet *.* | |
| 11 | +global-exclude *.pyc | |
| 12 | + | |
| 13 | +recursive-include tests *.py | |
| 14 | +graft tests/test-data | ... | ... |
README.md
| ... | ... | @@ -26,7 +26,25 @@ Note: python-oletools is not related to OLETools published by BeCubed Software. |
| 26 | 26 | News |
| 27 | 27 | ---- |
| 28 | 28 | |
| 29 | -- **2018-05-30 v0.53**: | |
| 29 | +- **2019-05-22 v0.54.2**: | |
| 30 | + - bugfix release: fixed several issues related to encrypted documents | |
| 31 | + and XLM/XLF Excel 4 macros | |
| 32 | + - msoffcrypto-tool is now installed by default to handle encrypted documents | |
| 33 | + - olevba and msodde now handle documents encrypted with common passwords such | |
| 34 | + as 123, 1234, 4321, 12345, 123456, VelvetSweatShop automatically. | |
| 35 | +- **2019-04-04 v0.54**: | |
| 36 | + - olevba, msodde: added support for encrypted MS Office files | |
| 37 | + - olevba: added detection and extraction of XLM/XLF Excel 4 macros (thanks to plugin_biff from Didier Stevens' oledump) | |
| 38 | + - olevba, mraptor: added detection of VBA running Excel 4 macros | |
| 39 | + - olevba: detect and display special characters such as backspace | |
| 40 | + - olevba: colorized output showing suspicious keywords in the VBA code | |
| 41 | + - olevba, mraptor: full Python 3 compatibility, no separate olevba3/mraptor3 anymore | |
| 42 | + - olevba: improved handling of code pages and unicode | |
| 43 | + - olevba: fixed a false-positive in VBA macro detection | |
| 44 | + - rtfobj: improved OLE Package handling, improved Equation object detection | |
| 45 | + - oleobj: added detection of external links to objects in OpenXML | |
| 46 | + - replaced third party packages by PyPI dependencies | |
| 47 | +- 2018-05-30 v0.53: | |
| 30 | 48 | - olevba and mraptor can now parse Word/PowerPoint 2007+ pure XML files (aka Flat OPC format) |
| 31 | 49 | - improved support for VBA forms in olevba (oleform) |
| 32 | 50 | - rtfobj now displays the CLSID of OLE objects, which is the best way to identify them. Known-bad CLSIDs such as MS Equation Editor are highlighted in red. |
| ... | ... | @@ -75,26 +93,38 @@ Projects using oletools: |
| 75 | 93 | ------------------------ |
| 76 | 94 | |
| 77 | 95 | oletools are used by a number of projects and online malware analysis services, |
| 78 | -including [Viper](http://viper.li/), [REMnux](https://remnux.org/), | |
| 96 | +including | |
| 97 | +[ACE](https://github.com/IntegralDefense/ACE), | |
| 98 | +[Anlyz.io](https://sandbox.anlyz.io/), | |
| 99 | +[AssemblyLine](https://www.cse-cst.gc.ca/en/assemblyline), | |
| 100 | +[CAPE](https://github.com/ctxis/CAPE), | |
| 101 | +[Cuckoo Sandbox](https://github.com/cuckoosandbox/cuckoo), | |
| 102 | +[DARKSURGEON](https://github.com/cryps1s/DARKSURGEON), | |
| 103 | +[Deepviz](https://sandbox.deepviz.com/), | |
| 104 | +[dridex.malwareconfig.com](https://dridex.malwareconfig.com), | |
| 79 | 105 | [FAME](https://certsocietegenerale.github.io/fame/), |
| 106 | +[FLARE-VM](https://github.com/fireeye/flare-vm), | |
| 80 | 107 | [Hybrid-analysis.com](https://www.hybrid-analysis.com/), |
| 81 | 108 | [Joe Sandbox](https://www.document-analyzer.net/), |
| 82 | -[Deepviz](https://sandbox.deepviz.com/), | |
| 83 | 109 | [Laika BOSS](https://github.com/lmco/laikaboss), |
| 84 | -[Cuckoo Sandbox](https://github.com/cuckoosandbox/cuckoo), | |
| 85 | -[Anlyz.io](https://sandbox.anlyz.io/), | |
| 86 | -[ViperMonkey](https://github.com/decalage2/ViperMonkey), | |
| 87 | -[pcodedmp](https://github.com/bontchev/pcodedmp), | |
| 88 | -[dridex.malwareconfig.com](https://dridex.malwareconfig.com), | |
| 89 | -[Snake](https://github.com/countercept/snake), | |
| 90 | -[DARKSURGEON](https://github.com/cryps1s/DARKSURGEON), | |
| 91 | -[CAPE](https://github.com/ctxis/CAPE), | |
| 92 | -[AssemblyLine](https://www.cse-cst.gc.ca/en/assemblyline), | |
| 110 | +[MacroMilter](https://github.com/sbidy/MacroMilter), | |
| 93 | 111 | [malshare.io](https://malshare.io), |
| 94 | -[Malware Repository Framework (MRF)](https://www.adlice.com/download/mrf/), | |
| 95 | 112 | [malware-repo](https://github.com/Tigzy/malware-repo), |
| 96 | -[Vba2Graph](https://github.com/MalwareCantFly/Vba2Graph), | |
| 113 | +[Malware Repository Framework (MRF)](https://www.adlice.com/download/mrf/), | |
| 114 | +[olefy](https://github.com/HeinleinSupport/olefy), | |
| 115 | +[PeekabooAV](https://github.com/scVENUS/PeekabooAV), | |
| 116 | +[pcodedmp](https://github.com/bontchev/pcodedmp), | |
| 117 | +[PyCIRCLean](https://github.com/CIRCL/PyCIRCLean), | |
| 118 | +[REMnux](https://remnux.org/), | |
| 119 | +[Snake](https://github.com/countercept/snake), | |
| 120 | +[SNDBOX](https://app.sndbox.com), | |
| 97 | 121 | [Strelka](https://github.com/target/strelka), |
| 122 | +[stoQ](https://stoq.punchcyber.com/), | |
| 123 | +[TheHive/Cortex](https://github.com/TheHive-Project/Cortex-Analyzers), | |
| 124 | +[Vba2Graph](https://github.com/MalwareCantFly/Vba2Graph), | |
| 125 | +[Viper](http://viper.li/), | |
| 126 | +[ViperMonkey](https://github.com/decalage2/ViperMonkey), | |
| 127 | +[YOMI](https://yomi.yoroi.company), | |
| 98 | 128 | and probably [VirusTotal](https://www.virustotal.com). |
| 99 | 129 | And quite a few [other projects on GitHub](https://github.com/search?q=oletools&type=Repositories). |
| 100 | 130 | (Please [contact me]((http://decalage.info/contact)) if you have or know |
| ... | ... | @@ -149,7 +179,7 @@ License |
| 149 | 179 | This license applies to the python-oletools package, apart from the thirdparty folder which contains third-party files |
| 150 | 180 | published with their own license. |
| 151 | 181 | |
| 152 | -The python-oletools package is copyright (c) 2012-2018 Philippe Lagadec (http://www.decalage.info) | |
| 182 | +The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (http://www.decalage.info) | |
| 153 | 183 | |
| 154 | 184 | All rights reserved. |
| 155 | 185 | ... | ... |
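The v0.54.2 news entry in the README.md diff above says olevba and msodde now try a list of common passwords on encrypted documents. The sketch below illustrates that general idea only; it is not the oletools implementation. The names `find_password` and `try_decrypt` are hypothetical, and `try_decrypt` is assumed to be a caller-supplied function that raises an exception when a password is wrong.

```python
# Candidate passwords listed in the v0.54.2 news entry.
COMMON_PASSWORDS = ["123", "1234", "4321", "12345", "123456", "VelvetSweatShop"]


def find_password(try_decrypt, candidates=COMMON_PASSWORDS):
    """Return the first candidate accepted by `try_decrypt`, or None.

    `try_decrypt(password)` is a hypothetical callable that attempts to
    decrypt a document and raises an exception if the password is wrong.
    """
    for password in candidates:
        try:
            try_decrypt(password)
        except Exception:
            # Broad catch on purpose: the sketch makes no assumption about
            # which exception type the decryption backend raises.
            continue
        return password
    return None


# Stand-in for a real decryption routine, for demonstration only.
def fake_decrypt(password):
    if password != "VelvetSweatShop":
        raise ValueError("wrong password")


print(find_password(fake_decrypt))
```

In practice `try_decrypt` would wrap a decryption library such as msoffcrypto-tool, which the same news entry names as a default dependency; consult that library's documentation for the exact calls and exception types.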
oletools/LICENSE.txt
| 1 | -LICENSE for the python-oletools package: | |
| 2 | - | |
| 3 | -This license applies to the python-oletools package, apart from the thirdparty | |
| 4 | -folder which contains third-party files published with their own license. | |
| 5 | - | |
| 6 | -The python-oletools package is copyright (c) 2012-2018 Philippe Lagadec (http://www.decalage.info) | |
| 7 | - | |
| 8 | -All rights reserved. | |
| 9 | - | |
| 10 | -Redistribution and use in source and binary forms, with or without modification, | |
| 11 | -are permitted provided that the following conditions are met: | |
| 12 | - | |
| 13 | - * Redistributions of source code must retain the above copyright notice, this | |
| 14 | - list of conditions and the following disclaimer. | |
| 15 | - * Redistributions in binary form must reproduce the above copyright notice, | |
| 16 | - this list of conditions and the following disclaimer in the documentation | |
| 17 | - and/or other materials provided with the distribution. | |
| 18 | - | |
| 19 | -THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND | |
| 20 | -ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | |
| 21 | -WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | |
| 22 | -DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE | |
| 23 | -FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
| 24 | -DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR | |
| 25 | -SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER | |
| 26 | -CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, | |
| 27 | -OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | |
| 28 | -OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | |
| 29 | - | |
| 30 | - | |
| 31 | ----------- | |
| 32 | - | |
| 33 | -olevba contains modified source code from the officeparser project, published | |
| 34 | -under the following MIT License (MIT): | |
| 35 | - | |
| 36 | -officeparser is copyright (c) 2014 John William Davison | |
| 37 | - | |
| 38 | -Permission is hereby granted, free of charge, to any person obtaining a copy | |
| 39 | -of this software and associated documentation files (the "Software"), to deal | |
| 40 | -in the Software without restriction, including without limitation the rights | |
| 41 | -to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | |
| 42 | -copies of the Software, and to permit persons to whom the Software is | |
| 43 | -furnished to do so, subject to the following conditions: | |
| 44 | - | |
| 45 | -The above copyright notice and this permission notice shall be included in all | |
| 46 | -copies or substantial portions of the Software. | |
| 47 | - | |
| 48 | -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | |
| 49 | -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | |
| 50 | -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | |
| 51 | -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | |
| 52 | -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |
| 53 | -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | |
| 54 | -SOFTWARE. | |
| 1 | +LICENSE for the python-oletools package: | |
| 2 | + | |
| 3 | +This license applies to the python-oletools package, apart from the thirdparty | |
| 4 | +folder which contains third-party files published with their own license. | |
| 5 | + | |
| 6 | +The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (http://www.decalage.info) | |
| 7 | + | |
| 8 | +All rights reserved. | |
| 9 | + | |
| 10 | +Redistribution and use in source and binary forms, with or without modification, | |
| 11 | +are permitted provided that the following conditions are met: | |
| 12 | + | |
| 13 | + * Redistributions of source code must retain the above copyright notice, this | |
| 14 | + list of conditions and the following disclaimer. | |
| 15 | + * Redistributions in binary form must reproduce the above copyright notice, | |
| 16 | + this list of conditions and the following disclaimer in the documentation | |
| 17 | + and/or other materials provided with the distribution. | |
| 18 | + | |
| 19 | +THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND | |
| 20 | +ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | |
| 21 | +WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | |
| 22 | +DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE | |
| 23 | +FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
| 24 | +DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR | |
| 25 | +SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER | |
| 26 | +CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, | |
| 27 | +OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | |
| 28 | +OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | |
| 29 | + | |
| 30 | + | |
| 31 | +---------- | |
| 32 | + | |
| 33 | +olevba contains modified source code from the officeparser project, published | |
| 34 | +under the following MIT License (MIT): | |
| 35 | + | |
| 36 | +officeparser is copyright (c) 2014 John William Davison | |
| 37 | + | |
| 38 | +Permission is hereby granted, free of charge, to any person obtaining a copy | |
| 39 | +of this software and associated documentation files (the "Software"), to deal | |
| 40 | +in the Software without restriction, including without limitation the rights | |
| 41 | +to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | |
| 42 | +copies of the Software, and to permit persons to whom the Software is | |
| 43 | +furnished to do so, subject to the following conditions: | |
| 44 | + | |
| 45 | +The above copyright notice and this permission notice shall be included in all | |
| 46 | +copies or substantial portions of the Software. | |
| 47 | + | |
| 48 | +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | |
| 49 | +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | |
| 50 | +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | |
| 51 | +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | |
| 52 | +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |
| 53 | +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | |
| 54 | +SOFTWARE. | ... | ... |
oletools/README.html
| ... | ... | @@ -17,13 +17,33 @@ |
| 17 | 17 | </head> |
| 18 | 18 | <body> |
| 19 | 19 | <h1 id="python-oletools">python-oletools</h1> |
| 20 | -<p><a href="https://pypi.org/project/oletools/"><img src="https://img.shields.io/pypi/v/oletools.svg" alt="PyPI" /></a> <a href="https://travis-ci.org/decalage2/oletools"><img src="https://travis-ci.org/decalage2/oletools.svg?branch=master" alt="Build Status" /></a></p> | |
| 20 | +<p><a href="https://pypi.org/project/oletools/"><img src="https://img.shields.io/pypi/v/oletools.svg" alt="PyPI" /></a> <a href="https://travis-ci.org/decalage2/oletools"><img src="https://travis-ci.org/decalage2/oletools.svg?branch=master" alt="Build Status" /></a> <a href="https://saythanks.io/to/decalage2"><img src="https://img.shields.io/badge/Say%20Thanks-!-1EAEDB.svg" alt="Say Thanks!" /></a></p> | |
| 21 | 21 | <p><a href="http://www.decalage.info/python/oletools">oletools</a> is a package of python tools to analyze <a href="http://en.wikipedia.org/wiki/Compound_File_Binary_Format">Microsoft OLE2 files</a> (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office documents or Outlook messages, mainly for malware analysis, forensics and debugging. It is based on the <a href="http://www.decalage.info/olefile">olefile</a> parser. See <a href="http://www.decalage.info/python/oletools" class="uri">http://www.decalage.info/python/oletools</a> for more info.</p> |
| 22 | 22 | <p><strong>Quick links:</strong> <a href="http://www.decalage.info/python/oletools">Home page</a> - <a href="https://github.com/decalage2/oletools/wiki/Install">Download/Install</a> - <a href="https://github.com/decalage2/oletools/wiki">Documentation</a> - <a href="https://github.com/decalage2/oletools/issues">Report Issues/Suggestions/Questions</a> - <a href="http://decalage.info/contact">Contact the Author</a> - <a href="https://github.com/decalage2/oletools">Repository</a> - <a href="https://twitter.com/decalage2">Updates on Twitter</a> <a href="https://github.com/decalage2/oletools/blob/master/cheatsheet/oletools_cheatsheet.pdf">Cheatsheet</a></p> |
| 23 | 23 | <p>Note: python-oletools is not related to OLETools published by BeCubed Software.</p> |
| 24 | 24 | <h2 id="news">News</h2> |
| 25 | 25 | <ul> |
| 26 | -<li><strong>2018-05-30 v0.53</strong>: | |
| 26 | +<li><strong>2019-05-22 v0.54.2</strong>: | |
| 27 | +<ul> | |
| 28 | +<li>bugfix release: fixed several issues related to encrypted documents and XLM/XLF Excel 4 macros</li> | |
| 29 | +<li>msoffcrypto-tool is now installed by default to handle encrypted documents</li> | |
| 30 | +<li>olevba and msodde now handle documents encrypted with common passwords such as 123, 1234, 4321, 12345, 123456, VelvetSweatShop automatically.</li> | |
| 31 | +</ul></li> | |
| 32 | +<li><strong>2019-04-04 v0.54</strong>: | |
| 33 | +<ul> | |
| 34 | +<li>olevba, msodde: added support for encrypted MS Office files</li> | |
| 35 | +<li>olevba: added detection and extraction of XLM/XLF Excel 4 macros (thanks to plugin_biff from Didier Stevens' oledump)</li> | |
| 36 | +<li>olevba, mraptor: added detection of VBA running Excel 4 macros</li> | |
| 37 | +<li>olevba: detect and display special characters such as backspace</li> | |
| 38 | +<li>olevba: colorized output showing suspicious keywords in the VBA code</li> | |
| 39 | +<li>olevba, mraptor: full Python 3 compatibility, no separate olevba3/mraptor3 anymore</li> | |
| 40 | +<li>olevba: improved handling of code pages and unicode</li> | |
| 41 | +<li>olevba: fixed a false-positive in VBA macro detection</li> | |
| 42 | +<li>rtfobj: improved OLE Package handling, improved Equation object detection</li> | |
| 43 | +<li>oleobj: added detection of external links to objects in OpenXML</li> | |
| 44 | +<li>replaced third party packages by PyPI dependencies</li> | |
| 45 | +</ul></li> | |
| 46 | +<li>2018-05-30 v0.53: | |
| 27 | 47 | <ul> |
| 28 | 48 | <li>olevba and mraptor can now parse Word/PowerPoint 2007+ pure XML files (aka Flat OPC format)</li> |
| 29 | 49 | <li>improved support for VBA forms in olevba (oleform)</li> |
| ... | ... | @@ -66,7 +86,7 @@ |
| 66 | 86 | <li><a href="https://github.com/decalage2/oletools/wiki/olemap">olemap</a>: to display a map of all the sectors in an OLE file.</li> |
| 67 | 87 | </ul> |
| 68 | 88 | <h2 id="projects-using-oletools">Projects using oletools:</h2> |
| 69 | -<p>oletools are used by a number of projects and online malware analysis services, including <a href="http://viper.li/">Viper</a>, <a href="https://remnux.org/">REMnux</a>, <a href="https://certsocietegenerale.github.io/fame/">FAME</a>, <a href="https://www.hybrid-analysis.com/">Hybrid-analysis.com</a>, <a href="https://www.document-analyzer.net/">Joe Sandbox</a>, <a href="https://sandbox.deepviz.com/">Deepviz</a>, <a href="https://github.com/lmco/laikaboss">Laika BOSS</a>, <a href="https://github.com/cuckoosandbox/cuckoo">Cuckoo Sandbox</a>, <a href="https://sandbox.anlyz.io/">Anlyz.io</a>, <a href="https://github.com/decalage2/ViperMonkey">ViperMonkey</a>, <a href="https://github.com/bontchev/pcodedmp">pcodedmp</a>, <a href="https://dridex.malwareconfig.com">dridex.malwareconfig.com</a>, <a href="https://github.com/countercept/snake">Snake</a>, <a href="https://github.com/cryps1s/DARKSURGEON">DARKSURGEON</a>, and probably <a href="https://www.virustotal.com">VirusTotal</a>. (Please <a href="(http://decalage.info/contact)">contact me</a> if you have or know a project using oletools)</p> | |
| 89 | +<p>oletools are used by a number of projects and online malware analysis services, including <a href="http://viper.li/">Viper</a>, <a href="https://remnux.org/">REMnux</a>, <a href="https://github.com/fireeye/flare-vm">FLARE-VM</a>, <a href="https://certsocietegenerale.github.io/fame/">FAME</a>, <a href="https://www.hybrid-analysis.com/">Hybrid-analysis.com</a>, <a href="https://www.document-analyzer.net/">Joe Sandbox</a>, <a href="https://sandbox.deepviz.com/">Deepviz</a>, <a href="https://github.com/lmco/laikaboss">Laika BOSS</a>, <a href="https://github.com/cuckoosandbox/cuckoo">Cuckoo Sandbox</a>, <a href="https://sandbox.anlyz.io/">Anlyz.io</a>, <a href="https://github.com/decalage2/ViperMonkey">ViperMonkey</a>, <a href="https://github.com/bontchev/pcodedmp">pcodedmp</a>, <a href="https://dridex.malwareconfig.com">dridex.malwareconfig.com</a>, <a href="https://github.com/countercept/snake">Snake</a>, <a href="https://github.com/cryps1s/DARKSURGEON">DARKSURGEON</a>, <a href="https://github.com/ctxis/CAPE">CAPE</a>, <a href="https://www.cse-cst.gc.ca/en/assemblyline">AssemblyLine</a>, <a href="https://malshare.io">malshare.io</a>, <a href="https://www.adlice.com/download/mrf/">Malware Repository Framework (MRF)</a>, <a href="https://github.com/Tigzy/malware-repo">malware-repo</a>, <a href="https://github.com/MalwareCantFly/Vba2Graph">Vba2Graph</a>, <a href="https://github.com/target/strelka">Strelka</a>, <a href="https://stoq.punchcyber.com/">stoQ</a>, <a href="https://yomi.yoroi.company">YOMI</a>, and probably <a href="https://www.virustotal.com">VirusTotal</a>. And quite a few <a href="https://github.com/search?q=oletools&type=Repositories">other projects on GitHub</a>. (Please <a href="(http://decalage.info/contact)">contact me</a> if you have or know a project using oletools)</p> | |
| 70 | 90 | <h2 id="download-and-install">Download and Install:</h2> |
| 71 | 91 | <p>The recommended way to download and install/update the <strong>latest stable release</strong> of oletools is to use <a href="https://pip.pypa.io/en/stable/installing/">pip</a>:</p> |
| 72 | 92 | <ul> |
| ... | ... | @@ -89,7 +109,7 @@ |
| 89 | 109 | <p>The code is available in <a href="https://github.com/decalage2/oletools">a GitHub repository</a>. You may use it to submit enhancements using forks and pull requests.</p> |
| 90 | 110 | <h2 id="license">License</h2> |
| 91 | 111 | <p>This license applies to the python-oletools package, apart from the thirdparty folder which contains third-party files published with their own license.</p> |
| 92 | -<p>The python-oletools package is copyright (c) 2012-2018 Philippe Lagadec (http://www.decalage.info)</p> | |
| 112 | +<p>The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (http://www.decalage.info)</p> | |
| 93 | 113 | <p>All rights reserved.</p> |
| 94 | 114 | <p>Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:</p> |
| 95 | 115 | <ul> | ... | ... |
oletools/README.rst
| 1 | 1 | python-oletools |
| 2 | 2 | =============== |
| 3 | 3 | |
| 4 | -|PyPI| |Build Status| | |
| 4 | +|PyPI| |Build Status| |Say Thanks!| | |
| 5 | 5 | |
| 6 | 6 | `oletools <http://www.decalage.info/python/oletools>`__ is a package of |
| 7 | 7 | python tools to analyze `Microsoft OLE2 |
| ... | ... | @@ -29,7 +29,35 @@ Software. |
| 29 | 29 | News |
| 30 | 30 | ---- |
| 31 | 31 | |
| 32 | -- **2018-05-30 v0.53**: | |
| 32 | +- **2019-05-22 v0.54.2**: | |
| 33 | + | |
| 34 | + - bugfix release: fixed several issues related to encrypted | |
| 35 | + documents and XLM/XLF Excel 4 macros | |
| 36 | + - msoffcrypto-tool is now installed by default to handle encrypted | |
| 37 | + documents | |
| 38 | + - olevba and msodde now handle documents encrypted with common | |
| 39 | + passwords such as 123, 1234, 4321, 12345, 123456, VelvetSweatShop | |
| 40 | + automatically. | |
| 41 | + | |
| 42 | +- **2019-04-04 v0.54**: | |
| 43 | + | |
| 44 | + - olevba, msodde: added support for encrypted MS Office files | |
| 45 | + - olevba: added detection and extraction of XLM/XLF Excel 4 macros | |
| 46 | + (thanks to plugin_biff from Didier Stevens' oledump) | |
| 47 | + - olevba, mraptor: added detection of VBA running Excel 4 macros | |
| 48 | + - olevba: detect and display special characters such as backspace | |
| 49 | + - olevba: colorized output showing suspicious keywords in the VBA | |
| 50 | + code | |
| 51 | + - olevba, mraptor: full Python 3 compatibility, no separate | |
| 52 | + olevba3/mraptor3 anymore | |
| 53 | + - olevba: improved handling of code pages and unicode | |
| 54 | + - olevba: fixed a false-positive in VBA macro detection | |
| 55 | + - rtfobj: improved OLE Package handling, improved Equation object | |
| 56 | + detection | |
| 57 | + - oleobj: added detection of external links to objects in OpenXML | |
| 58 | + - replaced third party packages by PyPI dependencies | |
| 59 | + | |
| 60 | +- 2018-05-30 v0.53: | |
| 33 | 61 | |
| 34 | 62 | - olevba and mraptor can now parse Word/PowerPoint 2007+ pure XML |
| 35 | 63 | files (aka Flat OPC format) |
| ... | ... | @@ -115,6 +143,7 @@ Projects using oletools: |
| 115 | 143 | oletools are used by a number of projects and online malware analysis |
| 116 | 144 | services, including `Viper <http://viper.li/>`__, |
| 117 | 145 | `REMnux <https://remnux.org/>`__, |
| 146 | +`FLARE-VM <https://github.com/fireeye/flare-vm>`__, | |
| 118 | 147 | `FAME <https://certsocietegenerale.github.io/fame/>`__, |
| 119 | 148 | `Hybrid-analysis.com <https://www.hybrid-analysis.com/>`__, `Joe |
| 120 | 149 | Sandbox <https://www.document-analyzer.net/>`__, |
| ... | ... | @@ -126,10 +155,21 @@ Sandbox <https://github.com/cuckoosandbox/cuckoo>`__, |
| 126 | 155 | `pcodedmp <https://github.com/bontchev/pcodedmp>`__, |
| 127 | 156 | `dridex.malwareconfig.com <https://dridex.malwareconfig.com>`__, |
| 128 | 157 | `Snake <https://github.com/countercept/snake>`__, |
| 129 | -`DARKSURGEON <https://github.com/cryps1s/DARKSURGEON>`__, and probably | |
| 130 | -`VirusTotal <https://www.virustotal.com>`__. (Please `contact | |
| 131 | -me <(http://decalage.info/contact)>`__ if you have or know a project | |
| 132 | -using oletools) | |
| 158 | +`DARKSURGEON <https://github.com/cryps1s/DARKSURGEON>`__, | |
| 159 | +`CAPE <https://github.com/ctxis/CAPE>`__, | |
| 160 | +`AssemblyLine <https://www.cse-cst.gc.ca/en/assemblyline>`__, | |
| 161 | +`malshare.io <https://malshare.io>`__, `Malware Repository Framework | |
| 162 | +(MRF) <https://www.adlice.com/download/mrf/>`__, | |
| 163 | +`malware-repo <https://github.com/Tigzy/malware-repo>`__, | |
| 164 | +`Vba2Graph <https://github.com/MalwareCantFly/Vba2Graph>`__, | |
| 165 | +`Strelka <https://github.com/target/strelka>`__, | |
| 166 | +`stoQ <https://stoq.punchcyber.com/>`__, | |
| 167 | +`YOMI <https://yomi.yoroi.company>`__, and probably | |
| 168 | +`VirusTotal <https://www.virustotal.com>`__. And quite a few `other | |
| 169 | +projects on | |
| 170 | +GitHub <https://github.com/search?q=oletools&type=Repositories>`__. | |
| 171 | +(Please `contact me <http://decalage.info/contact>`__ if you have or | |
| 172 | +know a project using oletools) | |
| 133 | 173 | |
| 134 | 174 | Download and Install: |
| 135 | 175 | --------------------- |
| ... | ... | @@ -186,7 +226,7 @@ This license applies to the python-oletools package, apart from the |
| 186 | 226 | thirdparty folder which contains third-party files published with their |
| 187 | 227 | own license. |
| 188 | 228 | |
| 189 | -The python-oletools package is copyright (c) 2012-2018 Philippe Lagadec | |
| 229 | +The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec | |
| 190 | 230 | (http://www.decalage.info) |
| 191 | 231 | |
| 192 | 232 | All rights reserved. |
| ... | ... | @@ -243,3 +283,5 @@ SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. |
| 243 | 283 | :target: https://pypi.org/project/oletools/ |
| 244 | 284 | .. |Build Status| image:: https://travis-ci.org/decalage2/oletools.svg?branch=master |
| 245 | 285 | :target: https://travis-ci.org/decalage2/oletools |
| 286 | +.. |Say Thanks!| image:: https://img.shields.io/badge/Say%20Thanks-!-1EAEDB.svg | |
| 287 | + :target: https://saythanks.io/to/decalage2 | ... | ... |
oletools/common/clsid.py
| ... | ... | @@ -12,7 +12,7 @@ http://www.decalage.info/python/oletools |
| 12 | 12 | |
| 13 | 13 | #=== LICENSE ================================================================== |
| 14 | 14 | |
| 15 | -# oletools are copyright (c) 2018 Philippe Lagadec (http://www.decalage.info) | |
| 15 | +# oletools are copyright (c) 2018-2019 Philippe Lagadec (http://www.decalage.info) | |
| 16 | 16 | # All rights reserved. |
| 17 | 17 | # |
| 18 | 18 | # Redistribution and use in source and binary forms, with or without modification, |
| ... | ... | @@ -43,7 +43,7 @@ http://www.decalage.info/python/oletools |
| 43 | 43 | # 2018-04-18 PL: - added known-bad CLSIDs from Cuckoo sandbox (issue #290) |
| 44 | 44 | # 2018-05-08 PL: - added more CLSIDs (issues #299, #304), merged and sorted |
| 45 | 45 | |
| 46 | -__version__ = '0.54dev3' | |
| 46 | +__version__ = '0.54' | |
| 47 | 47 | |
| 48 | 48 | |
| 49 | 49 | # REFERENCES: |
| ... | ... | @@ -137,9 +137,23 @@ KNOWN_CLSIDS = { |
| 137 | 137 | '85131630-480C-11D2-B1F9-00C04F86C324': 'scrrun.dll - JS File Host Encode Object (ProgID: JSFile.HostEncode)', |
| 138 | 138 | '85131631-480C-11D2-B1F9-00C04F86C324': 'scrrun.dll - VBS File Host Encode Object (ProgID: VBSFile.HostEncode)', |
| 139 | 139 | '8627E73B-B5AA-4643-A3B0-570EDA17E3E7': 'UmOutlookAddin.ButtonBar (potential exploit document CVE-2016-0042 / MS16-014)', |
| 140 | + '88D969E5-F192-11D4-A65F-0040963251E5': 'Msxml2.DOMDocument.5.0', | |
| 141 | + '88D969E9-F192-11D4-A65F-0040963251E5': 'Msxml2.DSOControl.5.0', | |
| 142 | + '88D969E6-F192-11D4-A65F-0040963251E5': 'Msxml2.FreeThreadedDOMDocument.5.0', | |
| 143 | + '88D969F5-F192-11D4-A65F-0040963251E5': 'Msxml2.MXDigitalSignature.5.0', | |
| 144 | + '88D969F0-F192-11D4-A65F-0040963251E5': 'Msxml2.MXHTMLWriter.5.0', | |
| 145 | + '88D969F1-F192-11D4-A65F-0040963251E5': 'Msxml2.MXNamespaceManager.5.0', | |
| 146 | + '88D969EF-F192-11D4-A65F-0040963251E5': 'Msxml2.MXXMLWriter.5.0', | |
| 147 | + '88D969EE-F192-11D4-A65F-0040963251E5': 'Msxml2.SAXAttributes.5.0', | |
| 148 | + '88D969EC-8B8B-4C3D-859E-AF6CD158BE0F': 'Msxml2.SAXXMLReader.5.0', | |
| 149 | + '88D969EB-F192-11D4-A65F-0040963251E5': 'Msxml2.ServerXMLHTTP.5.0', | |
| 150 | + '88D969EA-F192-11D4-A65F-0040963251E5': 'Msxml2.XMLHTTP.5.0', | |
| 151 | + '88D969E7-F192-11D4-A65F-0040963251E5': 'Msxml2.XMLSchemaCache.5.0', | |
| 152 | + '88D969E8-F192-11D4-A65F-0040963251E5': 'Msxml2.XSLTemplate.5.0', | |
| 140 | 153 | '8E75D913-3D21-11D2-85C4-080009A0C626': 'AutoCAD 2004-2006 Document', |
| 141 | 154 | '9181DC5F-E07D-418A-ACA6-8EEA1ECB8E9E': 'MSCOMCTL.TreeCtrl (may trigger CVE-2012-0158)', |
| 142 | 155 | '975797FC-4E2A-11D0-B702-00C04FD8DBF7': 'Loads ELSEXT.DLL (Known Related to CVE-2015-6128)', |
| 156 | + '978C9E23-D4B0-11CE-BF2D-00AA003F40D0': 'Microsoft Forms 2.0 Label (Forms.Label.1)', | |
| 143 | 157 | '996BF5E0-8044-4650-ADEB-0B013914E99C': 'MSCOMCTL.ListViewCtrl (may trigger CVE-2012-0158)', |
| 144 | 158 | 'A08A033D-1A75-4AB6-A166-EAD02F547959': 'otkloadr WRAssembly Object (can be used to bypass ASLR after triggering an exploit)', |
| 145 | 159 | 'B54F3741-5B07-11CF-A4B0-00AA004A55E8': 'vbscript.dll - VB Script Language (ProgID: VBS, VBScript)', | ... | ... |
oletools/common/codepages.py
0 → 100644
| 1 | +""" | |
| 2 | +codepages.py | |
| 3 | + | |
| 4 | +codepages is a python module to map code pages (numbers) to Python codecs, | |
| 5 | +in order to decode bytes to unicode. | |
| 6 | +It also provides the name/description of code pages. | |
| 7 | + | |
| 8 | +Author: Philippe Lagadec - http://www.decalage.info | |
| 9 | +License: BSD, see source code or documentation | |
| 10 | + | |
| 11 | +codepages is part of the python-oletools package: | |
| 12 | +http://www.decalage.info/python/oletools | |
| 13 | +""" | |
| 14 | + | |
| 15 | +# === LICENSE ================================================================== | |
| 16 | + | |
| 17 | +# codepages is copyright (c) 2018-2019 Philippe Lagadec (http://www.decalage.info) | |
| 18 | +# All rights reserved. | |
| 19 | +# | |
| 20 | +# Redistribution and use in source and binary forms, with or without modification, | |
| 21 | +# are permitted provided that the following conditions are met: | |
| 22 | +# | |
| 23 | +# * Redistributions of source code must retain the above copyright notice, this | |
| 24 | +# list of conditions and the following disclaimer. | |
| 25 | +# * Redistributions in binary form must reproduce the above copyright notice, | |
| 26 | +# this list of conditions and the following disclaimer in the documentation | |
| 27 | +# and/or other materials provided with the distribution. | |
| 28 | +# | |
| 29 | +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND | |
| 30 | +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | |
| 31 | +# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | |
| 32 | +# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE | |
| 33 | +# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
| 34 | +# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR | |
| 35 | +# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER | |
| 36 | +# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, | |
| 37 | +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | |
| 38 | +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | |
| 39 | + | |
| 40 | + | |
| 41 | +# ----------------------------------------------------------------------------- | |
| 42 | +# CHANGELOG: | |
| 43 | +# 2018-12-13 v0.54 PL: - first version | |
| 44 | +# 2019-01-30 PL: - added a few code pages from xlrd | |
| 45 | + | |
| 46 | +__version__ = '0.54' | |
| 47 | + | |
| 48 | +# ----------------------------------------------------------------------------- | |
| 49 | +# TODO: | |
| 50 | +# TODO: check also http://www.aivosto.com/articles/charsets-codepages.html | |
| 51 | +# TODO: https://en.wikipedia.org/wiki/Code_page | |
| 52 | + | |
| 53 | +# ----------------------------------------------------------------------------- | |
| 54 | +# REFERENCES: | |
| 55 | +# - https://docs.microsoft.com/en-gb/windows/desktop/Intl/code-page-identifiers | |
| 56 | + | |
| 57 | + | |
| 58 | +# --- IMPORTS ----------------------------------------------------------------- | |
| 59 | + | |
| 60 | +import codecs | |
| 61 | + | |
| 62 | +# === CONSTANTS =============================================================== | |
| 63 | + | |
| 64 | +# Code page names from https://docs.microsoft.com/en-gb/windows/desktop/Intl/code-page-identifiers | |
| 65 | +# Retrieved on the 2018-12-13 | |
| 66 | +# How it was converted to Python: | |
| 67 | +# 1) copy the table data (3 columns) from browser into Excel | |
| 68 | +# 2) use the following formula to concatenate 1st and 3rd columns: =A1 & ": " & "'" & C1 & "'," | |
| 69 | +# 3) copy from Excel into Python | |
| 70 | + | |
| 71 | +CODEPAGE_NAME = { | |
| 72 | + 37: 'IBM EBCDIC US-Canada', | |
| 73 | + 437: 'OEM United States', | |
| 74 | + 500: 'IBM EBCDIC International', | |
| 75 | + 708: 'Arabic (ASMO 708)', | |
| 76 | + 709: 'Arabic (ASMO-449+, BCON V4)', | |
| 77 | + 710: 'Arabic - Transparent Arabic', | |
| 78 | + 720: 'Arabic (Transparent ASMO); Arabic (DOS)', | |
| 79 | + 737: 'OEM Greek (formerly 437G); Greek (DOS)', | |
| 80 | + 775: 'OEM Baltic; Baltic (DOS)', | |
| 81 | + 850: 'OEM Multilingual Latin 1; Western European (DOS)', | |
| 82 | + 852: 'OEM Latin 2; Central European (DOS)', | |
| 83 | + 855: 'OEM Cyrillic (primarily Russian)', | |
| 84 | + 857: 'OEM Turkish; Turkish (DOS)', | |
| 85 | + 858: 'OEM Multilingual Latin 1 + Euro symbol', | |
| 86 | + 860: 'OEM Portuguese; Portuguese (DOS)', | |
| 87 | + 861: 'OEM Icelandic; Icelandic (DOS)', | |
| 88 | + 862: 'OEM Hebrew; Hebrew (DOS)', | |
| 89 | + 863: 'OEM French Canadian; French Canadian (DOS)', | |
| 90 | + 864: 'OEM Arabic; Arabic (864)', | |
| 91 | + 865: 'OEM Nordic; Nordic (DOS)', | |
| 92 | + 866: 'OEM Russian; Cyrillic (DOS)', | |
| 93 | + 869: 'OEM Modern Greek; Greek, Modern (DOS)', | |
| 94 | + 870: 'IBM EBCDIC Multilingual/ROECE (Latin 2); IBM EBCDIC Multilingual Latin 2', | |
| 95 | + 874: 'ANSI/OEM Thai (ISO 8859-11); Thai (Windows)', | |
| 96 | + 875: 'IBM EBCDIC Greek Modern', | |
| 97 | + 932: 'ANSI/OEM Japanese; Japanese (Shift-JIS)', | |
| 98 | + 936: 'ANSI/OEM Simplified Chinese (PRC, Singapore); Chinese Simplified (GB2312)', | |
| 99 | + 949: 'ANSI/OEM Korean (Unified Hangul Code)', | |
| 100 | + 950: 'ANSI/OEM Traditional Chinese (Taiwan; Hong Kong SAR, PRC); Chinese Traditional (Big5)', | |
| 101 | + 1026: 'IBM EBCDIC Turkish (Latin 5)', | |
| 102 | + 1047: 'IBM EBCDIC Latin 1/Open System', | |
| 103 | + 1140: 'IBM EBCDIC US-Canada (037 + Euro symbol); IBM EBCDIC (US-Canada-Euro)', | |
| 104 | + 1141: 'IBM EBCDIC Germany (20273 + Euro symbol); IBM EBCDIC (Germany-Euro)', | |
| 105 | + 1142: 'IBM EBCDIC Denmark-Norway (20277 + Euro symbol); IBM EBCDIC (Denmark-Norway-Euro)', | |
| 106 | + 1143: 'IBM EBCDIC Finland-Sweden (20278 + Euro symbol); IBM EBCDIC (Finland-Sweden-Euro)', | |
| 107 | + 1144: 'IBM EBCDIC Italy (20280 + Euro symbol); IBM EBCDIC (Italy-Euro)', | |
| 108 | + 1145: 'IBM EBCDIC Latin America-Spain (20284 + Euro symbol); IBM EBCDIC (Spain-Euro)', | |
| 109 | + 1146: 'IBM EBCDIC United Kingdom (20285 + Euro symbol); IBM EBCDIC (UK-Euro)', | |
| 110 | + 1147: 'IBM EBCDIC France (20297 + Euro symbol); IBM EBCDIC (France-Euro)', | |
| 111 | + 1148: 'IBM EBCDIC International (500 + Euro symbol); IBM EBCDIC (International-Euro)', | |
| 112 | + 1149: 'IBM EBCDIC Icelandic (20871 + Euro symbol); IBM EBCDIC (Icelandic-Euro)', | |
| 113 | + 1200: 'Unicode UTF-16, little endian byte order (BMP of ISO 10646); available only to managed applications', | |
| 114 | + 1201: 'Unicode UTF-16, big endian byte order; available only to managed applications', | |
| 115 | + 1250: 'ANSI Central European; Central European (Windows)', | |
| 116 | + 1251: 'ANSI Cyrillic; Cyrillic (Windows)', | |
| 117 | + 1252: 'ANSI Latin 1; Western European (Windows)', | |
| 118 | + 1253: 'ANSI Greek; Greek (Windows)', | |
| 119 | + 1254: 'ANSI Turkish; Turkish (Windows)', | |
| 120 | + 1255: 'ANSI Hebrew; Hebrew (Windows)', | |
| 121 | + 1256: 'ANSI Arabic; Arabic (Windows)', | |
| 122 | + 1257: 'ANSI Baltic; Baltic (Windows)', | |
| 123 | + 1258: 'ANSI/OEM Vietnamese; Vietnamese (Windows)', | |
| 124 | + 1361: 'Korean (Johab)', | |
| 125 | + 10000: 'MAC Roman; Western European (Mac)', | |
| 126 | + 10001: 'Japanese (Mac)', | |
| 127 | + 10002: 'MAC Traditional Chinese (Big5); Chinese Traditional (Mac)', | |
| 128 | + 10003: 'Korean (Mac)', | |
| 129 | + 10004: 'Arabic (Mac)', | |
| 130 | + 10005: 'Hebrew (Mac)', | |
| 131 | + 10006: 'Greek (Mac)', | |
| 132 | + 10007: 'Cyrillic (Mac)', | |
| 133 | + 10008: 'MAC Simplified Chinese (GB 2312); Chinese Simplified (Mac)', | |
| 134 | + 10010: 'Romanian (Mac)', | |
| 135 | + 10017: 'Ukrainian (Mac)', | |
| 136 | + 10021: 'Thai (Mac)', | |
| 137 | + 10029: 'MAC Latin 2; Central European (Mac)', | |
| 138 | + 10079: 'Icelandic (Mac)', | |
| 139 | + 10081: 'Turkish (Mac)', | |
| 140 | + 10082: 'Croatian (Mac)', | |
| 141 | + 12000: 'Unicode UTF-32, little endian byte order; available only to managed applications', | |
| 142 | + 12001: 'Unicode UTF-32, big endian byte order; available only to managed applications', | |
| 143 | + 20000: 'CNS Taiwan; Chinese Traditional (CNS)', | |
| 144 | + 20001: 'TCA Taiwan', | |
| 145 | + 20002: 'Eten Taiwan; Chinese Traditional (Eten)', | |
| 146 | + 20003: 'IBM5550 Taiwan', | |
| 147 | + 20004: 'TeleText Taiwan', | |
| 148 | + 20005: 'Wang Taiwan', | |
| 149 | + 20105: 'IA5 (IRV International Alphabet No. 5, 7-bit); Western European (IA5)', | |
| 150 | + 20106: 'IA5 German (7-bit)', | |
| 151 | + 20107: 'IA5 Swedish (7-bit)', | |
| 152 | + 20108: 'IA5 Norwegian (7-bit)', | |
| 153 | + 20127: 'US-ASCII (7-bit)', | |
| 154 | + 20261: 'T.61', | |
| 155 | + 20269: 'ISO 6937 Non-Spacing Accent', | |
| 156 | + 20273: 'IBM EBCDIC Germany', | |
| 157 | + 20277: 'IBM EBCDIC Denmark-Norway', | |
| 158 | + 20278: 'IBM EBCDIC Finland-Sweden', | |
| 159 | + 20280: 'IBM EBCDIC Italy', | |
| 160 | + 20284: 'IBM EBCDIC Latin America-Spain', | |
| 161 | + 20285: 'IBM EBCDIC United Kingdom', | |
| 162 | + 20290: 'IBM EBCDIC Japanese Katakana Extended', | |
| 163 | + 20297: 'IBM EBCDIC France', | |
| 164 | + 20420: 'IBM EBCDIC Arabic', | |
| 165 | + 20423: 'IBM EBCDIC Greek', | |
| 166 | + 20424: 'IBM EBCDIC Hebrew', | |
| 167 | + 20833: 'IBM EBCDIC Korean Extended', | |
| 168 | + 20838: 'IBM EBCDIC Thai', | |
| 169 | + 20866: 'Russian (KOI8-R); Cyrillic (KOI8-R)', | |
| 170 | + 20871: 'IBM EBCDIC Icelandic', | |
| 171 | + 20880: 'IBM EBCDIC Cyrillic Russian', | |
| 172 | + 20905: 'IBM EBCDIC Turkish', | |
| 173 | + 20924: 'IBM EBCDIC Latin 1/Open System (1047 + Euro symbol)', | |
| 174 | + 20932: 'Japanese (JIS 0208-1990 and 0212-1990)', | |
| 175 | + 20936: 'Simplified Chinese (GB2312); Chinese Simplified (GB2312-80)', | |
| 176 | + 20949: 'Korean Wansung', | |
| 177 | + 21025: 'IBM EBCDIC Cyrillic Serbian-Bulgarian', | |
| 178 | + 21027: '(deprecated)', | |
| 179 | + 21866: 'Ukrainian (KOI8-U); Cyrillic (KOI8-U)', | |
| 180 | + 28591: 'ISO 8859-1 Latin 1; Western European (ISO)', | |
| 181 | + 28592: 'ISO 8859-2 Central European; Central European (ISO)', | |
| 182 | + 28593: 'ISO 8859-3 Latin 3', | |
| 183 | + 28594: 'ISO 8859-4 Baltic', | |
| 184 | + 28595: 'ISO 8859-5 Cyrillic', | |
| 185 | + 28596: 'ISO 8859-6 Arabic', | |
| 186 | + 28597: 'ISO 8859-7 Greek', | |
| 187 | + 28598: 'ISO 8859-8 Hebrew; Hebrew (ISO-Visual)', | |
| 188 | + 28599: 'ISO 8859-9 Turkish', | |
| 189 | + 28603: 'ISO 8859-13 Estonian', | |
| 190 | + 28605: 'ISO 8859-15 Latin 9', | |
| 191 | + 29001: 'Europa 3', | |
| 192 | + 38598: 'ISO 8859-8 Hebrew; Hebrew (ISO-Logical)', | |
| 193 | + 50220: 'ISO 2022 Japanese with no halfwidth Katakana; Japanese (JIS)', | |
| 194 | + 50221: 'ISO 2022 Japanese with halfwidth Katakana; Japanese (JIS-Allow 1 byte Kana)', | |
| 195 | + 50222: 'ISO 2022 Japanese JIS X 0201-1989; Japanese (JIS-Allow 1 byte Kana - SO/SI)', | |
| 196 | + 50225: 'ISO 2022 Korean', | |
| 197 | + 50227: 'ISO 2022 Simplified Chinese; Chinese Simplified (ISO 2022)', | |
| 198 | + 50229: 'ISO 2022 Traditional Chinese', | |
| 199 | + 50930: 'EBCDIC Japanese (Katakana) Extended', | |
| 200 | + 50931: 'EBCDIC US-Canada and Japanese', | |
| 201 | + 50933: 'EBCDIC Korean Extended and Korean', | |
| 202 | + 50935: 'EBCDIC Simplified Chinese Extended and Simplified Chinese', | |
| 203 | + 50936: 'EBCDIC Simplified Chinese', | |
| 204 | + 50937: 'EBCDIC US-Canada and Traditional Chinese', | |
| 205 | + 50939: 'EBCDIC Japanese (Latin) Extended and Japanese', | |
| 206 | + 51932: 'EUC Japanese', | |
| 207 | + 51936: 'EUC Simplified Chinese; Chinese Simplified (EUC)', | |
| 208 | + 51949: 'EUC Korean', | |
| 209 | + 51950: 'EUC Traditional Chinese', | |
| 210 | + 52936: 'HZ-GB2312 Simplified Chinese; Chinese Simplified (HZ)', | |
| 211 | + 54936: 'Windows XP and later: GB18030 Simplified Chinese (4 byte); Chinese Simplified (GB18030)', | |
| 212 | + 57002: 'ISCII Devanagari', | |
| 213 | + 57003: 'ISCII Bangla', | |
| 214 | + 57004: 'ISCII Tamil', | |
| 215 | + 57005: 'ISCII Telugu', | |
| 216 | + 57006: 'ISCII Assamese', | |
| 217 | + 57007: 'ISCII Odia', | |
| 218 | + 57008: 'ISCII Kannada', | |
| 219 | + 57009: 'ISCII Malayalam', | |
| 220 | + 57010: 'ISCII Gujarati', | |
| 221 | + 57011: 'ISCII Punjabi', | |
| 222 | + 65000: 'Unicode (UTF-7)', | |
| 223 | + 65001: 'Unicode (UTF-8)', | |
| 224 | +} | |
| 225 | + | |
| 226 | + | |
| 227 | +# Mapping from codepages to Python codecs, when 'cpXXX' does not work | |
| 228 | +# (inspired from http://stackoverflow.com/questions/1592925/decoding-mac-os-text-in-python) | |
| 229 | +CODEPAGE_TO_CODEC = { | |
| 230 | + 37: 'cp037', | |
| 231 | + 708: 'arabic', # not found: Arabic (ASMO 708) => arabic = iso-8859-6 | |
| 232 | + 709: 'arabic', # not found: Arabic (ASMO-449+, BCON V4) => arabic = iso-8859-6 | |
| 233 | + 710: 'arabic', # not found: Arabic - Transparent Arabic => arabic = iso-8859-6 | |
| 234 | + 870: 'latin2', # IBM EBCDIC Multilingual/ROECE (Latin 2); IBM EBCDIC Multilingual Latin 2 | |
| 235 | + 1047: 'latin1', # IBM EBCDIC Latin 1/Open System | |
| 236 | + 1141: 'cp273', # IBM EBCDIC Germany (20273 + Euro symbol); IBM EBCDIC (Germany-Euro) | |
| 237 | + 1200: 'utf_16_le', # Unicode UTF-16, little endian byte order (BMP of ISO 10646); available only to managed applications | |
| 238 | + 1201: 'utf_16_be', # Unicode UTF-16, big endian byte order; available only to managed applications | |
| 239 | + | |
| 240 | + 10000: 'mac-roman', | |
| 241 | + 10001: 'shiftjis', # not found: 'mac-shift-jis', | |
| 242 | + 10002: 'big5', # not found: 'mac-big5', | |
| 243 | + 10003: 'ascii', # nothing appropriate found: 'mac-hangul', | |
| 244 | + 10004: 'mac-arabic', | |
| 245 | + 10005: 'hebrew', # not found: 'mac-hebrew', | |
| 246 | + 10006: 'mac-greek', | |
| 247 | + #10007: 'ascii', # nothing appropriate found: 'mac-russian', | |
| 248 | + 10007: 'mac_cyrillic', # guess (from xlrd) | |
| 249 | + 10008: 'gb2312', # not found: 'mac-gb2312', | |
| 250 | + 10021: 'thai', # not found: 'mac-thai', | |
| 251 | + #10029: 'maccentraleurope', # not found: 'mac-east europe', | |
| 252 | + 10029: 'mac_latin2', # guess (from xlrd) | |
| 253 | + 10079: 'mac_iceland', # guess (from xlrd) | |
| 254 | + 10081: 'mac-turkish', | |
| 255 | + | |
| 256 | + 12000: 'utf_32_le', # Unicode UTF-32, little endian byte order | |
| 257 | + 12001: 'utf_32_be', # Unicode UTF-32, big endian byte order | |
| 258 | + | |
| 259 | + 20127: 'ascii', | |
| 260 | + | |
| 261 | + 28591: 'latin1', | |
| 262 | + 28592: 'iso8859_2', | |
| 263 | + 28593: 'iso8859_3', | |
| 264 | + 28594: 'iso8859_4', | |
| 265 | + 28595: 'iso8859_5', | |
| 266 | + 28596: 'iso8859_6', | |
| 267 | + 28597: 'iso8859_7', | |
| 268 | + 28598: 'iso8859_8', | |
| 269 | + 28599: 'iso8859_9', | |
| 270 | + 28603: 'iso8859_13', | |
| 271 | + 28605: 'iso8859_15', | |
| 272 | + | |
| 273 | + 32768: 'mac_roman', # from xlrd | |
| 274 | + 32769: 'cp1252', # from xlrd | |
| 275 | + 38598: 'iso8859_8', | |
| 276 | + | |
| 277 | + 65000: 'utf7', | |
| 278 | + 65001: 'utf8', | |
| 279 | +} | |
| 280 | + | |
| 281 | + | |
| 282 | +# === FUNCTIONS ============================================================== | |
| 283 | + | |
| 284 | +def codepage2codec(codepage): | |
| 285 | + """ | |
| 286 | + convert a codepage number to a Python codec. | |
| 287 | + If the corresponding codec cannot be found, returns "utf8" by default. | |
| 288 | + | |
| 289 | + :param codepage: int, code page number | |
| 290 | + :return: str, Python codec name | |
| 291 | + """ | |
| 292 | + if codepage in CODEPAGE_TO_CODEC: | |
| 293 | + codec = CODEPAGE_TO_CODEC[codepage] | |
| 294 | + else: | |
| 295 | + codec = 'cp%d' % codepage | |
| 296 | + try: | |
| 297 | + codecs.lookup(codec) | |
| 298 | + except LookupError: | |
| 299 | + #log.error('Codec not found for code page %d, using UTF-8 as fallback.' % codepage) | |
| 300 | + codec = 'utf8' | |
| 301 | + return codec | |
| 302 | + | |
| 303 | + | |
| 304 | +def get_codepage_name(codepage): | |
| 305 | + """ | |
| 306 | + return the name of a codepage based on its number | |
| 307 | + :param codepage: int, codepage number | |
| 308 | + :return: str, codepage name | |
| 309 | + """ | |
| 310 | + return CODEPAGE_NAME.get(codepage, 'Unknown code page') | |
| 311 | + | |
| 312 | + | |
| 313 | +# === MAIN: TESTS ============================================================ | |
| 314 | + | |
| 315 | +if __name__ == '__main__': | |
| 316 | + for cp in sorted(CODEPAGE_NAME.keys()): | |
| 317 | + print('Code Page: %d => codec: %s - %s' % (cp, codepage2codec(cp), CODEPAGE_NAME[cp])) | |
| 0 | 318 | \ No newline at end of file | ... | ... |
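As a rough illustration of how the lookup in `codepage2codec` is meant to be used, the sketch below reproduces its fallback chain on a three-entry excerpt of `CODEPAGE_TO_CODEC`. It assumes the `codecs.lookup` check applies to both branches (the diff view flattens indentation, so this is a guess), and `decode_bytes` is a hypothetical helper that is not part of oletools.

```python
import codecs

# Three entries copied from CODEPAGE_TO_CODEC in the diff above (excerpt only).
CODEPAGE_TO_CODEC = {
    1200: 'utf_16_le',
    10000: 'mac-roman',
    65001: 'utf8',
}


def codepage2codec(codepage):
    """Map a code page number to a Python codec name, falling back to utf8."""
    # 1) explicit override table, 2) generic 'cpXXX' name
    codec = CODEPAGE_TO_CODEC.get(codepage, 'cp%d' % codepage)
    try:
        # 3) make sure Python actually knows this codec
        codecs.lookup(codec)
    except LookupError:
        codec = 'utf8'
    return codec


def decode_bytes(data, codepage):
    """Hypothetical helper: decode bytes using the codec chosen for codepage."""
    # errors='replace' avoids an exception when the fallback codec is a poor fit
    return data.decode(codepage2codec(codepage), errors='replace')


if __name__ == '__main__':
    for cp in (1252, 10000, 99999):
        print('code page %d -> codec %s' % (cp, codepage2codec(cp)))
    print(decode_bytes(b'caf\xe9', 1252))
```

Replacing undecodable bytes rather than raising is a design choice of this sketch, not something the diff prescribes; callers that need strict decoding can drop the `errors` argument.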
oletools/common/errors.py
| ... | ... | @@ -4,10 +4,42 @@ Errors used in several tools to avoid duplication |
| 4 | 4 | .. codeauthor:: Intra2net AG <info@intra2net.com> |
| 5 | 5 | """ |
| 6 | 6 | |
| 7 | -class FileIsEncryptedError(ValueError): | |
| 7 | +class CryptoErrorBase(ValueError): | |
| 8 | + """Base class for crypto-based exceptions.""" | |
| 9 | + pass | |
| 10 | + | |
| 11 | + | |
| 12 | +class CryptoLibNotImported(CryptoErrorBase, ImportError): | |
| 13 | + """Exception thrown if msoffcrypto is needed but could not be imported.""" | |
| 14 | + | |
| 15 | + def __init__(self): | |
| 16 | + super(CryptoLibNotImported, self).__init__( | |
| 17 | + 'msoffcrypto-tool is not installed. Please run "pip install msoffcrypto-tool" or see https://github.com/nolze/msoffcrypto-tool') | |
| 18 | + | |
| 19 | + | |
| 20 | +class UnsupportedEncryptionError(CryptoErrorBase): | |
| 8 | 21 | """Exception thrown if file is encrypted and cannot deal with it.""" |
| 9 | - # see also: same class in olevba[3] and record_base | |
| 10 | 22 | def __init__(self, filename=None): |
| 11 | - super(FileIsEncryptedError, self).__init__( | |
| 23 | + super(UnsupportedEncryptionError, self).__init__( | |
| 12 | 24 | 'Office file {}is encrypted, not yet supported' |
| 13 | 25 | .format('' if filename is None else filename + ' ')) |
| 26 | + | |
| 27 | + | |
| 28 | +class WrongEncryptionPassword(CryptoErrorBase): | |
| 29 | + """Exception thrown if encryption could be handled but passwords wrong.""" | |
| 30 | + def __init__(self, filename=None): | |
| 31 | + super(WrongEncryptionPassword, self).__init__( | |
| 32 | + 'Given passwords could not decrypt office file{}, use option -p to specify the password' | |
| 33 | + .format('' if filename is None else ' ' + filename)) | |
| 34 | + | |
| 35 | + | |
| 36 | +class MaxCryptoNestingReached(CryptoErrorBase): | |
| 37 | + """ | |
| 38 | + Exception thrown if decryption is too deeply layered. | |
| 39 | + | |
| 40 | + (...or decrypt code creates inf loop) | |
| 41 | + """ | |
| 42 | + def __init__(self, n_layers, filename=None): | |
| 43 | + super(MaxCryptoNestingReached, self).__init__( | |
| 44 | + 'Encountered more than {} layers of encryption for office file{}' | |
| 45 | + .format(n_layers, '' if filename is None else ' ' + filename)) | ... | ... |
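The new hierarchy lets callers handle all encryption problems with one `except CryptoErrorBase` clause, while `CryptoLibNotImported` still matches `ImportError`. A minimal sketch of that behaviour follows; the class bodies are abbreviated copies of the ones in the diff, and `classify` is a hypothetical helper added only for illustration.

```python
class CryptoErrorBase(ValueError):
    """Base class for crypto-based exceptions (abbreviated copy)."""


class CryptoLibNotImported(CryptoErrorBase, ImportError):
    """Raised when msoffcrypto is needed but could not be imported."""

    def __init__(self):
        super(CryptoLibNotImported, self).__init__(
            'msoffcrypto-tool is not installed')


class WrongEncryptionPassword(CryptoErrorBase):
    """Raised when none of the given passwords decrypts the file."""

    def __init__(self, filename=None):
        super(WrongEncryptionPassword, self).__init__(
            'Given passwords could not decrypt office file{}'
            .format('' if filename is None else ' ' + filename))


def classify(exc):
    """Hypothetical helper: sort an exception into a coarse category."""
    # Check ImportError first: CryptoLibNotImported matches both tests.
    if isinstance(exc, ImportError):
        return 'missing-dependency'
    if isinstance(exc, CryptoErrorBase):
        return 'crypto-failure'
    return 'other'


if __name__ == '__main__':
    for error in (CryptoLibNotImported(),
                  WrongEncryptionPassword('sample.doc'),
                  KeyError('x')):
        print(type(error).__name__, '->', classify(error))
```

Because every class also derives from `ValueError`, existing code that caught the old `FileIsEncryptedError` as a `ValueError` should keep working, although the class name itself has changed.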
oletools/common/log_helper/_json_formatter.py
| ... | ... | @@ -13,8 +13,13 @@ class JsonFormatter(logging.Formatter): |
| 13 | 13 | Since we don't buffer messages, we always prepend messages with a comma to make |
| 14 | 14 | the output JSON-compatible. The only exception is when printing the first line, |
| 15 | 15 | so we need to keep track of it. |
| 16 | + | |
| 17 | + We assume that all input comes from the OletoolsLoggerAdapter which | |
| 18 | + ensures that there is a `type` field in the record. Otherwise we would | |
| 19 | + have to add a try-except around the access to `record.type`. | |
| 16 | 20 | """ |
| 17 | - json_dict = dict(msg=record.msg, level=record.levelname) | |
| 21 | + json_dict = dict(msg=record.msg.replace('\n', ' '), level=record.levelname) | |
| 22 | + json_dict['type'] = record.type | |
| 18 | 23 | formatted_message = ' ' + json.dumps(json_dict) |
| 19 | 24 | |
| 20 | 25 | if self._is_first_line: | ... | ... |
oletools/common/log_helper/_logger_adapter.py
| ... | ... | @@ -8,18 +8,45 @@ class OletoolsLoggerAdapter(logging.LoggerAdapter): |
| 8 | 8 | """ |
| 9 | 9 | _json_enabled = None |
| 10 | 10 | |
| 11 | - def print_str(self, message): | |
| 11 | + def print_str(self, message, **kwargs): | |
| 12 | 12 | """ |
| 13 | 13 | This function replaces normal print() calls so we can format them as JSON |
| 14 | 14 | when needed or just print them right away otherwise. |
| 15 | 15 | """ |
| 16 | 16 | if self._json_enabled and self._json_enabled(): |
| 17 | 17 | # Messages from this function should always be printed, |
| 18 | - # so when using JSON we log using the same level that set | |
| 19 | - self.log(_root_logger_wrapper.level(), message) | |
| 18 | + # so when using JSON we log using the same level that is set. | |
| 19 | + # Additional information in kwargs is added to LogRecord | |
| 20 | + self.log(_root_logger_wrapper.level(), message, extra=kwargs) | |
| 20 | 21 | else: |
| 21 | 22 | print(message) |
| 22 | 23 | |
| 24 | + def log(self, lvl, msg, *args, **kwargs): | |
| 25 | + """ | |
| 26 | + Run :py:meth:`process` on kwargs, then forward to actual logger. | |
| 27 | + | |
| 28 | + This is based on the logging cookbook, section "Using LoggerAdapter to | |
| 29 | + impart contextual information". | |
| 30 | + """ | |
| 31 | + msg, kwargs = self.process(msg, kwargs) | |
| 32 | + self.logger.log(lvl, msg, *args, **kwargs) | |
| 33 | + | |
| 34 | + def process(self, msg, kwargs): | |
| 35 | + """ | |
| 36 | + Ensure `kwargs['extra']['type']` exists, init with given arg `type`. | |
| 37 | + | |
| 38 | + The `type` field will be added to the :py:class:`logging.LogRecord` and | |
| 39 | + is used by the :py:class:`JsonFormatter`. | |
| 40 | + """ | |
| 41 | + if 'extra' not in kwargs: | |
| 42 | + kwargs['extra'] = {} | |
| 43 | + if 'type' in kwargs: | |
| 44 | + kwargs['extra']['type'] = kwargs['type'] | |
| 45 | + del kwargs['type'] # downstream loggers cannot deal with this | |
| 46 | + if 'type' not in kwargs['extra']: | |
| 47 | + kwargs['extra']['type'] = 'msg' # type will be added to LogRecord | |
| 48 | + return msg, kwargs | |
| 49 | + | |
| 23 | 50 | def set_json_enabled_function(self, json_enabled): |
| 24 | 51 | """ |
| 25 | 52 | Set a function to be called to check whether JSON output is enabled. | ... | ... |
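The interplay between the adapter's `process` method and the JSON formatter can be sketched with the standard library alone. The code below is an approximation under stated assumptions: `TypeAdapter`, `JsonLineFormatter` and `demo` are hypothetical names, and the formatter omits the leading-comma bookkeeping that the real `JsonFormatter` performs.

```python
import io
import json
import logging


class TypeAdapter(logging.LoggerAdapter):
    """Hypothetical stand-in for OletoolsLoggerAdapter.process()."""

    def process(self, msg, kwargs):
        extra = kwargs.setdefault('extra', {})
        if 'type' in kwargs:
            # Move `type` into `extra`; Logger.log() rejects unknown kwargs.
            extra['type'] = kwargs.pop('type')
        extra.setdefault('type', 'msg')  # default when the caller gave none
        return msg, kwargs


class JsonLineFormatter(logging.Formatter):
    """Simplified formatter: one JSON object per record."""

    def format(self, record):
        return json.dumps({
            'msg': record.getMessage().replace('\n', ' '),
            'level': record.levelname,
            'type': record.type,  # set through `extra` by the adapter
        })


def demo():
    """Log two messages and return the parsed JSON records."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonLineFormatter())
    logger = logging.getLogger('oletools_json_sketch')
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        log = TypeAdapter(logger, {})
        log.info('first\nline')
        log.warning('found macro', type='detection')
    finally:
        logger.removeHandler(handler)
    return [json.loads(line) for line in stream.getvalue().splitlines()]


if __name__ == '__main__':
    for entry in demo():
        print(entry)
```

Stripping newlines from the message keeps each record on a single line, which is what makes the line-by-line parsing in `demo` possible.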
oletools/crypto.py
0 → 100644
| 1 | +#!/usr/bin/env python | |
| 2 | +""" | |
| 3 | +crypto.py | |
| 4 | + | |
| 5 | +Module to be used by other scripts and modules in oletools, that provides | |
| 6 | +information on encryption in OLE files. | |
| 7 | + | |
| 8 | +Uses :py:mod:`msoffcrypto-tool` to decrypt if it is available. Otherwise | |
| 9 | +decryption will fail with an ImportError. | |
| 10 | + | |
| 11 | +Encryption/Write-Protection can be realized in many different ways. They range | |
| 12 | +from setting a single flag in an otherwise unprotected file to embedding a | |
| 13 | +regular file (e.g. xlsx) in an EncryptedStream inside an OLE file. That means | |
| 14 | +that (1) lots of bad things are accessible even if no encryption password | |
| 15 | +is known, and (2) even basic attributes like the file type can change by | |
| 16 | +decryption. Therefore I suggest the following general routine to deal with | |
| 17 | +potentially encrypted files:: | |
| 18 | + | |
| 19 | + def script_main_function(input_file, passwords, crypto_nesting=0, args=None): | |
| 20 | + '''Wrapper around main function to deal with encrypted files.''' | |
| 21 | + initial_stuff(input_file, args) | |
| 22 | + result = None | |
| 23 | + try: | |
| 24 | + result = do_your_thing_assuming_no_encryption(input_file) | |
| 25 | + if not crypto.is_encrypted(input_file): | |
| 26 | + return result | |
| 27 | + except Exception: | |
| 28 | + if not crypto.is_encrypted(input_file): | |
| 29 | + raise | |
| 30 | + # we reach this point only if file is encrypted | |
| 31 | + # check if this is an encrypted file in an encrypted file in an ... | |
| 32 | + if crypto_nesting >= crypto.MAX_NESTING_DEPTH: | |
| 33 | + raise crypto.MaxCryptoNestingReached(crypto_nesting, input_file) | |
| 34 | + decrypted_file = None | |
| 35 | + try: | |
| 36 | + decrypted_file = crypto.decrypt(input_file, passwords) | |
| 37 | + if decrypted_file is None: | |
| 38 | + raise crypto.WrongEncryptionPassword(input_file) | |
| 39 | + # might still be encrypted, so call this again recursively | |
| 40 | + return script_main_function(decrypted_file, passwords, | |
| 41 | + crypto_nesting+1, args) | |
| 42 | + except Exception: | |
| 43 | + raise | |
| 44 | + finally: # clean up | |
| 45 | + try: # (maybe file was not yet created) | |
| 46 | + os.unlink(decrypted_file) | |
| 47 | + except Exception: | |
| 48 | + pass | |
| 49 | + | |
| 50 | +(Realized e.g. in :py:mod:`oletools.msodde`). | |
| 51 | +That means that caller code needs another wrapper around its main function. I | |
| 52 | +did try it another way first (a transparent on-demand unencrypt) but for the | |
| 53 | +above reasons I believe this is the better way. Also, non-top-level-code can | |
| 54 | +just assume that it works on unencrypted data and fail with an exception if | |
| 55 | +encrypted data makes its work impossible. No need to check `if is_encrypted()` | |
| 56 | +at the start of functions. | |
| 57 | + | |
| 58 | +.. seealso:: [MS-OFFCRYPTO] | |
| 59 | +.. seealso:: https://github.com/nolze/msoffcrypto-tool | |
| 60 | + | |
| 61 | +crypto is part of the python-oletools package: | |
| 62 | +http://www.decalage.info/python/oletools | |
| 63 | +""" | |
| 64 | + | |
| 65 | +# === LICENSE ================================================================= | |
| 66 | + | |
| 67 | +# crypto is copyright (c) 2014-2019 Philippe Lagadec (http://www.decalage.info) | |
| 68 | +# All rights reserved. | |
| 69 | +# | |
| 70 | +# Redistribution and use in source and binary forms, with or without | |
| 71 | +# modification, are permitted provided that the following conditions are met: | |
| 72 | +# | |
| 73 | +# * Redistributions of source code must retain the above copyright notice, | |
| 74 | +# this list of conditions and the following disclaimer. | |
| 75 | +# * Redistributions in binary form must reproduce the above copyright notice, | |
| 76 | +# this list of conditions and the following disclaimer in the documentation | |
| 77 | +# and/or other materials provided with the distribution. | |
| 78 | +# | |
| 79 | +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" | |
| 80 | +# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | |
| 81 | +# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE | |
| 82 | +# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE | |
| 83 | +# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR | |
| 84 | +# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF | |
| 85 | +# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS | |
| 86 | +# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN | |
| 87 | +# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) | |
| 88 | +# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE | |
| 89 | +# POSSIBILITY OF SUCH DAMAGE. | |
| 90 | + | |
| 91 | +# ----------------------------------------------------------------------------- | |
| 92 | +# CHANGELOG: | |
| 93 | +# 2019-02-14 v0.01 CH: - first version with encryption check from oleid | |
| 94 | +# 2019-04-01 v0.54 PL: - fixed bug in is_encrypted_ole | |
| 95 | +# 2019-05-23 PL: - added DEFAULT_PASSWORDS list | |
| 96 | + | |
| 97 | +__version__ = '0.54.2' | |
| 98 | + | |
| 99 | +import sys | |
| 100 | +import struct | |
| 101 | +import os | |
| 102 | +from os.path import splitext, isfile | |
| 103 | +from tempfile import mkstemp | |
| 104 | +import zipfile | |
| 105 | +import logging | |
| 106 | + | |
| 107 | +from olefile import OleFileIO | |
| 108 | + | |
| 109 | +try: | |
| 110 | + import msoffcrypto | |
| 111 | +except ImportError: | |
| 112 | + msoffcrypto = None | |
| 113 | + | |
| 114 | +# IMPORTANT: it should be possible to run oletools directly as scripts | |
| 115 | +# in any directory without installing them with pip or setup.py. | |
| 116 | +# In that case, relative imports are NOT usable. | |
| 117 | +# And to enable Python 2+3 compatibility, we need to use absolute imports, | |
| 118 | +# so we add the oletools parent folder to sys.path (absolute+normalized path): | |
| 119 | +_thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__))) | |
| 120 | +_parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..')) | |
| 121 | +if _parent_dir not in sys.path: | |
| 122 | + sys.path.insert(0, _parent_dir) | |
| 123 | + | |
| 124 | +from oletools.common.errors import CryptoErrorBase, WrongEncryptionPassword, \ | |
| 125 | + UnsupportedEncryptionError, MaxCryptoNestingReached, CryptoLibNotImported | |
| 126 | +from oletools.common.log_helper import log_helper | |
| 127 | + | |
| 128 | + | |
| 129 | +#: if there is an encrypted file embedded in an encrypted file, | |
| 130 | +#: how deep down do we go | |
| 131 | +MAX_NESTING_DEPTH = 10 | |
| 132 | + | |
| 133 | +# === LOGGING ================================================================= | |
| 134 | + | |
| 135 | +# TODO: use log_helper instead | |
| 136 | + | |
| 137 | +def get_logger(name, level=logging.CRITICAL+1): | |
| 138 | + """ | |
| 139 | + Create a suitable logger object for this module. | |
| 140 | + The goal is not to change settings of the root logger, to avoid getting | |
| 141 | + other modules' logs on the screen. | |
| 142 | + If a logger exists with same name, reuse it. (Else it would have duplicate | |
| 143 | + handlers and messages would be doubled.) | |
| 144 | + The level is set to CRITICAL+1 by default, to avoid any logging. | |
| 145 | + """ | |
| 146 | + # First, test if there is already a logger with the same name, else it | |
| 147 | + # will generate duplicate messages (due to duplicate handlers): | |
| 148 | + if name in logging.Logger.manager.loggerDict: | |
| 149 | + # NOTE: another less intrusive but more "hackish" solution would be to | |
| 150 | + # use getLogger then test if its effective level is not default. | |
| 151 | + logger = logging.getLogger(name) | |
| 152 | + # make sure level is OK: | |
| 153 | + logger.setLevel(level) | |
| 154 | + return logger | |
| 155 | + # get a new logger: | |
| 156 | + logger = logging.getLogger(name) | |
| 157 | + # only add a NullHandler for this logger, it is up to the application | |
| 158 | + # to configure its own logging: | |
| 159 | + logger.addHandler(logging.NullHandler()) | |
| 160 | + logger.setLevel(level) | |
| 161 | + return logger | |
| 162 | + | |
| 163 | +# a global logger object used for debugging: | |
| 164 | +log = get_logger('crypto') | |
| 165 | + | |
| 166 | +def enable_logging(): | |
| 167 | + """ | |
| 168 | + Enable logging for this module (disabled by default). | |
| 169 | + This will set the module-specific logger level to NOTSET, which | |
| 170 | + means the main application controls the actual logging level. | |
| 171 | + """ | |
| 172 | + log.setLevel(logging.NOTSET) | |
| 173 | + | |
| 174 | + | |
| 175 | +def is_encrypted(some_file): | |
| 176 | + """ | |
| 177 | + Determine whether document contains encrypted content. | |
| 178 | + | |
| 179 | + This should return False for documents that are just write-protected or | |
| 180 | + signed or finalized. It should return True if ANY content of the file is | |
| 181 | + encrypted and can therefore not be analyzed by other oletools modules | |
| 182 | + without being given a password. | |
| 183 | + | |
| 184 | + Exception: there are ways to write-protect an office document by embedding | |
| 185 | + it as encrypted stream with hard-coded standard password into an otherwise | |
| 186 | + empty OLE file. From an office user point of view, this is no encryption, | |
| 187 | + but regarding file structure this is encryption, so we return `True` for | |
| 188 | + these. | |
| 189 | + | |
| 190 | + This should not raise exceptions needlessly. | |
| 191 | + | |
| 192 | + This implementation is rather simple: it returns True if the file contains | |
| 193 | + streams with typical encryption names (c.f. [MS-OFFCRYPTO]). It does not | |
| 194 | + test whether these streams actually contain data or whether the ole file | |
| 195 | + structure contains the necessary references to these. It also checks the | |
| 196 | + "well-known property" PIDSI_DOC_SECURITY if the SummaryInformation stream | |
| 197 | + is accessible (c.f. [MS-OLEPS] 2.25.1) | |
| 198 | + | |
| 199 | + :param some_file: File name or an opened OleFileIO | |
| 200 | + :type some_file: :py:class:`olefile.OleFileIO` or `str` | |
| 201 | + :returns: True if (and only if) the file contains encrypted content | |
| 202 | + """ | |
| 203 | + log.debug('is_encrypted') | |
| 204 | + | |
| 205 | + # ask msoffcrypto if possible | |
| 206 | + if check_msoffcrypto(): | |
| 207 | + log.debug('Checking for encryption using msoffcrypto') | |
| 208 | + file_handle = None | |
| 209 | + file_pos = None | |
| 210 | + try: | |
| 211 | + if isinstance(some_file, OleFileIO): | |
| 212 | + # TODO: hacky, replace once msoffcrypto-tool accepts OleFileIO | |
| 213 | + file_handle = some_file.fp | |
| 214 | + file_pos = file_handle.tell() | |
| 215 | + file_handle.seek(0) | |
| 216 | + else: | |
| 217 | + file_handle = open(some_file, 'rb') | |
| 218 | + | |
| 219 | + return msoffcrypto.OfficeFile(file_handle).is_encrypted() | |
| 220 | + | |
| 221 | + except Exception as exc: | |
| 222 | + log.warning('msoffcrypto failed to interpret file {} or determine ' | |
| 223 | + 'whether it is encrypted: {}' | |
| 224 | + .format(getattr(file_handle, 'name', some_file), exc)) | |
| 225 | + | |
| 226 | + finally: | |
| 227 | + try: | |
| 228 | + if file_pos is not None: # input was OleFileIO | |
| 229 | + file_handle.seek(file_pos) | |
| 230 | + else: # input was file name | |
| 231 | + file_handle.close() | |
| 232 | + except Exception as exc: | |
| 233 | + log.warning('Ignoring error during clean up: {}'.format(exc)) | |
| 234 | + | |
| 235 | + # if that failed, try ourselves with older and less accurate code | |
| 236 | + try: | |
| 237 | + if isinstance(some_file, OleFileIO): | |
| 238 | + return _is_encrypted_ole(some_file) | |
| 239 | + if zipfile.is_zipfile(some_file): | |
| 240 | + return _is_encrypted_zip(some_file) | |
| 241 | + # otherwise assume it is the name of an ole file | |
| 242 | + with OleFileIO(some_file) as ole: | |
| 243 | + return _is_encrypted_ole(ole) | |
| 244 | + except Exception as exc: | |
| 245 | + log.warning('Failed to check {} for encryption ({}); assume it is not ' | |
| 246 | + 'encrypted.'.format(some_file, exc)) | |
| 247 | + | |
| 248 | + return False | |
| 249 | + | |
| 250 | + | |
| 251 | +def _is_encrypted_zip(filename): | |
| 252 | + """Specialization of :py:func:`is_encrypted` for zip-based files.""" | |
| 253 | + log.debug('Checking for encryption in zip file') | |
| 254 | + # TODO: distinguish OpenXML from normal zip files | |
| 255 | + # try to decrypt a few bytes from first entry | |
| 256 | + with zipfile.ZipFile(filename, 'r') as zipper: | |
| 257 | + first_entry = zipper.infolist()[0] | |
| 258 | + try: | |
| 259 | + with zipper.open(first_entry, 'r') as reader: | |
| 260 | + reader.read(min(16, first_entry.file_size)) | |
| 261 | + return False | |
| 262 | + except RuntimeError as rt_err: | |
| 263 | + return 'crypt' in str(rt_err) | |
| 264 | + | |
| 265 | + | |
| 266 | +def _is_encrypted_ole(ole): | |
| 267 | + """Specialization of :py:func:`is_encrypted` for ole files.""" | |
| 268 | + log.debug('Checking for encryption in OLE file') | |
| 269 | + # check well known property for password protection | |
| 270 | + # (this field may be missing for Powerpoint2000, for example) | |
| 271 | + # TODO: check whether password protection always implies encryption. Could | |
| 272 | + # write-protection or signing with password trigger this as well? | |
| 273 | + if ole.exists("\x05SummaryInformation"): | |
| 274 | + suminfo_data = ole.getproperties("\x05SummaryInformation") | |
| 275 | + if 0x13 in suminfo_data and (suminfo_data[0x13] & 1): | |
| 276 | + return True | |
| 277 | + | |
| 278 | + # check a few stream names | |
| 279 | + # TODO: check whether these actually contain data and whether other | |
| 280 | + # necessary properties exist / are set | |
| 281 | + if ole.exists('EncryptionInfo'): | |
| 282 | + log.debug('found stream EncryptionInfo') | |
| 283 | + return True | |
| 284 | + # or an encrypted ppt file | |
| 285 | + if ole.exists('EncryptedSummary') and \ | |
| 286 | + not ole.exists('SummaryInformation'): | |
| 287 | + return True | |
| 288 | + | |
| 289 | + # Word-specific old encryption: | |
| 290 | + if ole.exists('WordDocument'): | |
| 291 | + # check for Word-specific encryption flag: | |
| 292 | + stream = None | |
| 293 | + try: | |
| 294 | + stream = ole.openstream(["WordDocument"]) | |
| 295 | + # skip the first 10 bytes of the header | |
| 296 | + stream.read(10) | |
| 297 | + # read flag structure: | |
| 298 | + temp16 = struct.unpack("<H", stream.read(2))[0] | |
| 299 | + f_encrypted = (temp16 & 0x0100) >> 8 | |
| 300 | + if f_encrypted: | |
| 301 | + return True | |
| 302 | + finally: | |
| 303 | + if stream is not None: | |
| 304 | + stream.close() | |
| 305 | + | |
| 306 | + # no indication of encryption | |
| 307 | + return False | |
| 308 | + | |
| 309 | + | |
| 310 | +#: one way to achieve "write protection" in office files is to encrypt the file | |
| 311 | +#: using this password | |
| 312 | +WRITE_PROTECT_ENCRYPTION_PASSWORD = 'VelvetSweatshop' | |
| 313 | + | |
| 314 | +#: list of common passwords to be tried by default, used by malware | |
| 315 | +DEFAULT_PASSWORDS = [WRITE_PROTECT_ENCRYPTION_PASSWORD, '123', '1234', '12345', '123456', '4321'] | |
| 316 | + | |
| 317 | + | |
| 318 | +def _check_msoffcrypto(): | |
| 319 | + """Raise a :py:class:`CryptoLibNotImported` if msoffcrypto not imported.""" | |
| 320 | + if msoffcrypto is None: | |
| 321 | + raise CryptoLibNotImported() | |
| 322 | + | |
| 323 | + | |
| 324 | +def check_msoffcrypto(): | |
| 325 | + """Return `True` iff :py:mod:`msoffcrypto` could be imported.""" | |
| 326 | + return msoffcrypto is not None | |
| 327 | + | |
| 328 | + | |
| 329 | +def decrypt(filename, passwords=None, **temp_file_args): | |
| 330 | + """ | |
| 331 | + Try to decrypt an encrypted file | |
| 332 | + | |
| 333 | + This function tries to decrypt the given file using a given set of | |
| 334 | + passwords. If no password is given, tries the passwords in | |
| 335 | + DEFAULT_PASSWORDS. Creates a file with decrypted data whose file name is returned. | |
| 336 | + If the decryption fails, None is returned. | |
| 337 | + | |
| 338 | + :param str filename: path to an ole file on disc | |
| 339 | + :param passwords: list/set/tuple/... of passwords or a single password or | |
| 340 | + None | |
| 341 | + :type passwords: iterable or str or None | |
| 342 | + :param temp_file_args: arguments for :py:func:`tempfile.mkstemp` e.g., | |
| 343 | + `dir` or `prefix`. `suffix` will default to | |
| 344 | + suffix of input `filename`, `prefix` defaults to | |
| 345 | + `oletools-decrypt-`; `text` will be ignored | |
| 346 | + :returns: name of the decrypted temporary file (type str) or `None` | |
| 347 | + :raises: :py:class:`ImportError` if :py:mod:`msoffcrypto-tool` not found | |
| 348 | + :raises: :py:class:`ValueError` if the given file is not encrypted | |
| 349 | + """ | |
| 350 | + _check_msoffcrypto() | |
| 351 | + | |
| 352 | + # normalize password so we always have a list/tuple | |
| 353 | + if isinstance(passwords, str): | |
| 354 | + passwords = (passwords, ) | |
| 355 | + elif not passwords: | |
| 356 | + passwords = DEFAULT_PASSWORDS | |
| 357 | + | |
| 358 | + # check temp file args | |
| 359 | + if 'prefix' not in temp_file_args: | |
| 360 | + temp_file_args['prefix'] = 'oletools-decrypt-' | |
| 361 | + if 'suffix' not in temp_file_args: | |
| 362 | + temp_file_args['suffix'] = splitext(filename)[1] | |
| 363 | + temp_file_args['text'] = False | |
| 364 | + | |
| 365 | + decrypt_file = None | |
| 366 | + with open(filename, 'rb') as reader: | |
| 367 | + try: | |
| 368 | + crypto_file = msoffcrypto.OfficeFile(reader) | |
| 369 | + except Exception as exc: # e.g. ppt, not yet supported by msoffcrypto | |
| 370 | + if 'Unrecognized file format' in str(exc): | |
| 371 | + log.debug('Caught exception', exc_info=True) | |
| 372 | + | |
| 373 | + # raise different exception without stack trace of original exc | |
| 374 | + if sys.version_info.major == 2: | |
| 375 | + raise UnsupportedEncryptionError(filename) | |
| 376 | + else: | |
| 377 | + # this is a syntax error in python 2, so wrap it in exec() | |
| 378 | + exec('raise UnsupportedEncryptionError(filename) from None') | |
| 379 | + else: | |
| 380 | + raise | |
| 381 | + if not crypto_file.is_encrypted(): | |
| 382 | + raise ValueError('Given input file {} is not encrypted!' | |
| 383 | + .format(filename)) | |
| 384 | + | |
| 385 | + for password in passwords: | |
| 386 | + log.debug('Trying to decrypt with password {!r}'.format(password)) | |
| 387 | + write_descriptor = None | |
| 388 | + write_handle = None | |
| 389 | + decrypt_file = None | |
| 390 | + try: | |
| 391 | + crypto_file.load_key(password=password) | |
| 392 | + | |
| 393 | + # create temp file | |
| 394 | + write_descriptor, decrypt_file = mkstemp(**temp_file_args) | |
| 395 | + write_handle = os.fdopen(write_descriptor, 'wb') | |
| 396 | + write_descriptor = None # is now handled via write_handle | |
| 397 | + crypto_file.decrypt(write_handle) | |
| 398 | + | |
| 399 | + # decryption was successful; clean up and return | |
| 400 | + write_handle.close() | |
| 401 | + write_handle = None | |
| 402 | + break | |
| 403 | + except Exception: | |
| 404 | + log.debug('Failed to decrypt', exc_info=True) | |
| 405 | + | |
| 406 | + # error-clean up: close everything and del temp file | |
| 407 | + if write_handle: | |
| 408 | + write_handle.close() | |
| 409 | + elif write_descriptor: | |
| 410 | + os.close(write_descriptor) | |
| 411 | + if decrypt_file and isfile(decrypt_file): | |
| 412 | + os.unlink(decrypt_file) | |
| 413 | + decrypt_file = None | |
| 414 | + # if we reach this, all passwords were tried without success | |
| 415 | + log.debug('All passwords failed') | |
| 416 | + return decrypt_file | ... | ... |
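Review note on the `WordDocument` check in `_is_encrypted_ole` above: a minimal, standalone sketch of the same bit test, assuming the 16-bit flag field sits 10 bytes into the stream, as the code above reads it. The helper name `word_header_says_encrypted` and the two synthetic headers are hypothetical, made up for illustration; they are not part of oletools. The sketch reads the field explicitly as little-endian (`<H`), so the result does not depend on the byte order of the host.

```python
import struct

FIB_FLAGS_OFFSET = 10       # bytes skipped before the 16-bit flag field
F_ENCRYPTED_MASK = 0x0100   # same mask as used in _is_encrypted_ole


def word_header_says_encrypted(header):
    """Return True if the encryption bit is set in a WordDocument header.

    Hypothetical helper: `header` is the first 12 (or more) bytes of the
    WordDocument stream, passed in as a bytes object.
    """
    if len(header) < FIB_FLAGS_OFFSET + 2:
        raise ValueError('need at least 12 bytes of the WordDocument stream')
    # '<H' = little-endian unsigned 16-bit integer, independent of host CPU
    (flags,) = struct.unpack_from('<H', header, FIB_FLAGS_OFFSET)
    return bool(flags & F_ENCRYPTED_MASK)


# Synthetic headers for illustration only, NOT taken from real documents
plain = b'\x00' * 10 + struct.pack('<H', 0x0000)
protected = b'\x00' * 10 + struct.pack('<H', 0x0100)
```

Working on a bytes object rather than an open stream keeps the check easy to unit-test; a caller would pass in something like the first 12 bytes read from the stream.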
oletools/doc/Home.html
| ... | ... | @@ -16,7 +16,7 @@ |
| 16 | 16 | <![endif]--> |
| 17 | 17 | </head> |
| 18 | 18 | <body> |
| 19 | -<h1 id="python-oletools-v0.53-documentation">python-oletools v0.53 documentation</h1> | |
| 19 | +<h1 id="python-oletools-v0.54-documentation">python-oletools v0.54 documentation</h1> | |
| 20 | 20 | <p>This is the home page of the documentation for python-oletools. The latest version can be found <a href="https://github.com/decalage2/oletools/wiki">online</a>, otherwise a copy is provided in the doc subfolder of the package.</p> |
| 21 | 21 | <p><a href="http://www.decalage.info/python/oletools">python-oletools</a> is a package of python tools to analyze <a href="http://en.wikipedia.org/wiki/Compound_File_Binary_Format">Microsoft OLE2 files</a> (also called Structured Storage, Compound File Binary Format or Compound Document File Format), such as Microsoft Office documents or Outlook messages, mainly for malware analysis, forensics and debugging. It is based on the <a href="http://www.decalage.info/olefile">olefile</a> parser. See <a href="http://www.decalage.info/python/oletools" class="uri">http://www.decalage.info/python/oletools</a> for more info.</p> |
| 22 | 22 | <p><strong>Quick links:</strong> <a href="http://www.decalage.info/python/oletools">Home page</a> - <a href="https://github.com/decalage2/oletools/wiki/Install">Download/Install</a> - <a href="https://github.com/decalage2/oletools/wiki">Documentation</a> - <a href="https://github.com/decalage2/oletools/issues">Report Issues/Suggestions/Questions</a> - <a href="http://decalage.info/contact">Contact the Author</a> - <a href="https://github.com/decalage2/oletools">Repository</a> - <a href="https://twitter.com/decalage2">Updates on Twitter</a></p> | ... | ... |
oletools/doc/Home.md
oletools/doc/Install.html
| ... | ... | @@ -16,28 +16,35 @@ |
| 16 | 16 | <![endif]--> |
| 17 | 17 | </head> |
| 18 | 18 | <body> |
| 19 | -<h1 id="how-to-download-and-install-python-oletools">How to Download and Install python-oletools</h1> | |
| 19 | +<h1 id="how-to-download-and-install-oletools">How to Download and Install oletools</h1> | |
| 20 | 20 | <h2 id="pre-requisites">Pre-requisites</h2> |
| 21 | -<p>The recommended Python version to run oletools is <strong>Python 2.7</strong>. Python 2.6 is also supported, but as it is not tested as often as 2.7, some features might not work as expected.</p> | |
| 22 | -<p>Since oletools v0.50, thanks to contributions by <span class="citation" data-cites="Sebdraven">[@Sebdraven]</span>(https://twitter.com/Sebdraven), most tools can also run with <strong>Python 3.x</strong>. As this is quite new, please <a href="(https://github.com/decalage2/oletools/issues)">report any issue</a> you may encounter.</p> | |
| 21 | +<p>The recommended Python version to run oletools is the latest <strong>Python 3.x</strong> (3.7 for now). Python 2.7 is still supported, but as it will become end of life in 2020 (see https://pythonclock.org/), it is highly recommended to switch to Python 3 now.</p> | |
| 23 | 22 | <h2 id="recommended-way-to-downloadinstallupdate-oletools-pip">Recommended way to Download+Install/Update oletools: pip</h2> |
| 24 | 23 | <p>Pip is included with Python since version 2.7.9 and 3.4. If it is not installed on your system, either upgrade Python or see https://pip.pypa.io/en/stable/installing/</p> |
| 25 | 24 | <h3 id="linux-mac-osx-unix">Linux, Mac OSX, Unix</h3> |
| 26 | 25 | <p>To download and install/update the latest release version of oletools, run the following command in a shell:</p> |
| 27 | 26 | <pre class="text"><code>sudo -H pip install -U oletools</code></pre> |
| 27 | +<p>Replace <code>pip</code> by <code>pip3</code> or <code>pip2</code> to install on a specific Python version.</p> | |
| 28 | 28 | <p><strong>Important</strong>: Since version 0.50, pip will automatically create convenient command-line scripts in /usr/local/bin to run all the oletools from any directory.</p> |
| 29 | 29 | <h3 id="windows">Windows</h3> |
| 30 | 30 | <p>To download and install/update the latest release version of oletools, run the following command in a cmd window:</p> |
| 31 | 31 | <pre class="text"><code>pip install -U oletools</code></pre> |
| 32 | +<p>Replace <code>pip</code> by <code>pip3</code> or <code>pip2</code> to install on a specific Python version.</p> | |
| 33 | +<p><strong>Note</strong>: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip and install for all users. If that is not possible, you may also install only for the current user by adding the <code>--user</code> option:</p> | |
| 34 | +<pre class="text"><code>pip3 install -U --user oletools</code></pre> | |
| 32 | 35 | <p><strong>Important</strong>: Since version 0.50, pip will automatically create convenient command-line scripts to run all the oletools from any directory: olevba, mraptor, oleid, rtfobj, etc.</p> |
| 33 | 36 | <h2 id="how-to-install-the-latest-development-version">How to install the latest development version</h2> |
| 34 | 37 | <p>If you want to benefit from the latest improvements in the development version, you may also use pip:</p> |
| 35 | 38 | <h3 id="linux-mac-osx-unix-1">Linux, Mac OSX, Unix</h3> |
| 36 | 39 | <pre class="text"><code>sudo -H pip install -U https://github.com/decalage2/oletools/archive/master.zip</code></pre> |
| 40 | +<p>Replace <code>pip</code> by <code>pip3</code> or <code>pip2</code> to install on a specific Python version.</p> | |
| 37 | 41 | <h3 id="windows-1">Windows</h3> |
| 38 | 42 | <pre class="text"><code>pip install -U https://github.com/decalage2/oletools/archive/master.zip</code></pre> |
| 43 | +<p>Replace <code>pip</code> by <code>pip3</code> or <code>pip2</code> to install on a specific Python version.</p> | |
| 44 | +<p><strong>Note</strong>: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip and install for all users. If that is not possible, you may also install only for the current user by adding the <code>--user</code> option:</p> | |
| 45 | +<pre class="text"><code>pip3 install -U --user https://github.com/decalage2/oletools/archive/master.zip</code></pre> | |
| 39 | 46 | <h2 id="how-to-install-offline---computer-without-internet-access">How to install offline - Computer without Internet access</h2> |
| 40 | -<p>First, download the oletools archive on a computer with Internet access: * Latest stable version: from https://github.com/decalage2/oletools/releases * Development version: https://github.com/decalage2/oletools/archive/master.zip</p> | |
| 47 | +<p>First, download the oletools archive on a computer with Internet access: * Latest stable version: from https://pypi.org/project/oletools/ or https://github.com/decalage2/oletools/releases * Development version: https://github.com/decalage2/oletools/archive/master.zip</p> | |
| 41 | 48 | <p>Copy the archive file to the target computer.</p> |
| 42 | 49 | <p>On Linux, Mac OSX, Unix, run the following command using the filename of the archive that you downloaded:</p> |
| 43 | 50 | <pre class="text"><code>sudo -H pip install -U oletools.zip</code></pre> | ... | ... |
oletools/doc/Install.md
| 1 | -How to Download and Install python-oletools | |
| 2 | -=========================================== | |
| 1 | +How to Download and Install oletools | |
| 2 | +==================================== | |
| 3 | 3 | |
| 4 | 4 | Pre-requisites |
| 5 | 5 | -------------- |
| 6 | 6 | |
| 7 | -The recommended Python version to run oletools is **Python 2.7**. | |
| 8 | -Python 2.6 is also supported, but as it is not tested as often as 2.7, some features | |
| 9 | -might not work as expected. | |
| 10 | - | |
| 11 | -Since oletools v0.50, thanks to contributions by [@Sebdraven](https://twitter.com/Sebdraven), | |
| 12 | -most tools can also run with **Python 3.x**. As this is quite new, please | |
| 13 | -[report any issue]((https://github.com/decalage2/oletools/issues)) you may encounter. | |
| 14 | - | |
| 15 | - | |
| 7 | +The recommended Python version to run oletools is the latest **Python 3.x** (3.7 for now). | |
| 8 | +Python 2.7 is still supported, but as it will become end of life in 2020 (see https://pythonclock.org/), it is highly | |
| 9 | +recommended to switch to Python 3 now. | |
| 16 | 10 | |
| 17 | 11 | Recommended way to Download+Install/Update oletools: pip |
| 18 | 12 | -------------------------------------------------------- |
| ... | ... | @@ -29,6 +23,8 @@ run the following command in a shell: |
| 29 | 23 | sudo -H pip install -U oletools |
| 30 | 24 | ``` |
| 31 | 25 | |
| 26 | +Replace `pip` by `pip3` or `pip2` to install on a specific Python version. | |
| 27 | + | |
| 32 | 28 | **Important**: Since version 0.50, pip will automatically create convenient command-line scripts |
| 33 | 29 | in /usr/local/bin to run all the oletools from any directory. |
| 34 | 30 | |
| ... | ... | @@ -41,6 +37,16 @@ run the following command in a cmd window: |
| 41 | 37 | pip install -U oletools |
| 42 | 38 | ``` |
| 43 | 39 | |
| 40 | +Replace `pip` by `pip3` or `pip2` to install on a specific Python version. | |
| 41 | + | |
| 42 | +**Note**: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip | |
| 43 | +and install for all users. If that is not possible, you may also install only for the current user | |
| 44 | +by adding the `--user` option: | |
| 45 | + | |
| 46 | +```text | |
| 47 | +pip3 install -U --user oletools | |
| 48 | +``` | |
| 49 | + | |
| 44 | 50 | **Important**: Since version 0.50, pip will automatically create convenient command-line scripts |
| 45 | 51 | to run all the oletools from any directory: olevba, mraptor, oleid, rtfobj, etc. |
| 46 | 52 | |
| ... | ... | @@ -57,17 +63,29 @@ you may also use pip: |
| 57 | 63 | sudo -H pip install -U https://github.com/decalage2/oletools/archive/master.zip |
| 58 | 64 | ``` |
| 59 | 65 | |
| 66 | +Replace `pip` by `pip3` or `pip2` to install on a specific Python version. | |
| 67 | + | |
| 60 | 68 | ### Windows |
| 61 | 69 | |
| 62 | 70 | ```text |
| 63 | 71 | pip install -U https://github.com/decalage2/oletools/archive/master.zip |
| 64 | 72 | ``` |
| 65 | 73 | |
| 74 | +Replace `pip` by `pip3` or `pip2` to install on a specific Python version. | |
| 75 | + | |
| 76 | +**Note**: with Python 3, you may need to open a cmd window with Administrator privileges in order to run pip | |
| 77 | +and install for all users. If that is not possible, you may also install only for the current user | |
| 78 | +by adding the `--user` option: | |
| 79 | + | |
| 80 | +```text | |
| 81 | +pip3 install -U --user https://github.com/decalage2/oletools/archive/master.zip | |
| 82 | +``` | |
| 83 | + | |
| 66 | 84 | How to install offline - Computer without Internet access |
| 67 | 85 | --------------------------------------------------------- |
| 68 | 86 | |
| 69 | 87 | First, download the oletools archive on a computer with Internet access: |
| 70 | -* Latest stable version: from https://github.com/decalage2/oletools/releases | |
| 88 | +* Latest stable version: from https://pypi.org/project/oletools/ or https://github.com/decalage2/oletools/releases | |
| 71 | 89 | * Development version: https://github.com/decalage2/oletools/archive/master.zip |
| 72 | 90 | |
| 73 | 91 | Copy the archive file to the target computer. | ... | ... |
oletools/doc/License.html
| ... | ... | @@ -18,7 +18,7 @@ |
| 18 | 18 | <body> |
| 19 | 19 | <h1 id="license-for-python-oletools">License for python-oletools</h1> |
| 20 | 20 | <p>This license applies to the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package, apart from the thirdparty folder which contains third-party files published with their own license.</p> |
| 21 | -<p>The python-oletools package is copyright (c) 2012-2018 Philippe Lagadec (<a href="http://www.decalage.info" class="uri">http://www.decalage.info</a>)</p> | |
| 21 | +<p>The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec (<a href="http://www.decalage.info" class="uri">http://www.decalage.info</a>)</p> | |
| 22 | 22 | <p>All rights reserved.</p> |
| 23 | 23 | <p>Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:</p> |
| 24 | 24 | <ul> | ... | ... |
oletools/doc/License.md
| ... | ... | @@ -4,7 +4,7 @@ License for python-oletools |
| 4 | 4 | This license applies to the [python-oletools](http://www.decalage.info/python/oletools) package, apart from the |
| 5 | 5 | thirdparty folder which contains third-party files published with their own license. |
| 6 | 6 | |
| 7 | -The python-oletools package is copyright (c) 2012-2018 Philippe Lagadec ([http://www.decalage.info](http://www.decalage.info)) | |
| 7 | +The python-oletools package is copyright (c) 2012-2019 Philippe Lagadec ([http://www.decalage.info](http://www.decalage.info)) | |
| 8 | 8 | |
| 9 | 9 | All rights reserved. |
| 10 | 10 | ... | ... |
oletools/doc/mraptor.html
| ... | ... | @@ -24,7 +24,7 @@ |
| 24 | 24 | <p>mraptor can be used either as a command-line tool, or as a python module from your own applications.</p> |
| 25 | 25 | <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p> |
| 26 | 26 | <h2 id="usage">Usage</h2> |
| 27 | -<pre class="text"><code>Usage: mraptor.py [options] <filename> [filename2 ...] | |
| 27 | +<pre class="text"><code>Usage: mraptor [options] <filename> [filename2 ...] | |
| 28 | 28 | |
| 29 | 29 | Options: |
| 30 | 30 | -h, --help show this help message and exit |
| ... | ... | @@ -49,15 +49,15 @@ An exit code is returned based on the analysis result: |
| 49 | 49 | - 20: SUSPICIOUS</code></pre> |
| 50 | 50 | <h3 id="examples">Examples</h3> |
| 51 | 51 | <p>Scan a single file:</p> |
| 52 | -<pre class="text"><code>mraptor.py file.doc</code></pre> | |
| 52 | +<pre class="text"><code>mraptor file.doc</code></pre> | |
| 53 | 53 | <p>Scan a single file, stored in a Zip archive with password “infected”:</p> |
| 54 | -<pre class="text"><code>mraptor.py malicious_file.xls.zip -z infected</code></pre> | |
| 54 | +<pre class="text"><code>mraptor malicious_file.xls.zip -z infected</code></pre> | |
| 55 | 55 | <p>Scan a collection of files stored in a folder:</p> |
| 56 | -<pre class="text"><code>mraptor.py "MalwareZoo/VBA/*"</code></pre> | |
| 56 | +<pre class="text"><code>mraptor "MalwareZoo/VBA/*"</code></pre> | |
| 57 | 57 | <p><strong>Important</strong>: on Linux/MacOSX, always add double quotes around a file name when you use wildcards such as <code>*</code> and <code>?</code>. Otherwise, the shell may replace the argument with the actual list of files matching the wildcards before starting the script.</p> |
| 58 | 58 | <p><img src="mraptor1.png" /></p> |
| 59 | 59 | <h2 id="python-3-support---mraptor3">Python 3 support - mraptor3</h2> |
| 60 | -<p>As of v0.50, mraptor has been ported to Python 3 thanks to <span class="citation" data-cites="sebdraven">@sebdraven</span>. However, the differences between Python 2 and 3 are significant and for now there is a separate version of mraptor named mraptor3 to be used with Python 3.</p> | |
| 60 | +<p>Since v0.54, mraptor is fully compatible with both Python 2 and 3. There is no need to use mraptor3 anymore, however it is still present for backward compatibility.</p> | |
| 61 | 61 | <hr /> |
| 62 | 62 | <h2 id="how-to-use-mraptor-in-python-applications">How to use mraptor in Python applications</h2> |
| 63 | 63 | <p>TODO</p> | ... | ... |
oletools/doc/mraptor.md
| ... | ... | @@ -24,7 +24,7 @@ It is part of the [python-oletools](http://www.decalage.info/python/oletools) pa |
| 24 | 24 | ## Usage |
| 25 | 25 | |
| 26 | 26 | ```text |
| 27 | -Usage: mraptor.py [options] <filename> [filename2 ...] | |
| 27 | +Usage: mraptor [options] <filename> [filename2 ...] | |
| 28 | 28 | |
| 29 | 29 | Options: |
| 30 | 30 | -h, --help show this help message and exit |
| ... | ... | @@ -54,19 +54,19 @@ An exit code is returned based on the analysis result: |
| 54 | 54 | Scan a single file: |
| 55 | 55 | |
| 56 | 56 | ```text |
| 57 | -mraptor.py file.doc | |
| 57 | +mraptor file.doc | |
| 58 | 58 | ``` |
| 59 | 59 | |
| 60 | 60 | Scan a single file, stored in a Zip archive with password "infected": |
| 61 | 61 | |
| 62 | 62 | ```text |
| 63 | -mraptor.py malicious_file.xls.zip -z infected | |
| 63 | +mraptor malicious_file.xls.zip -z infected | |
| 64 | 64 | ``` |
| 65 | 65 | |
| 66 | 66 | Scan a collection of files stored in a folder: |
| 67 | 67 | |
| 68 | 68 | ```text |
| 69 | -mraptor.py "MalwareZoo/VBA/*" | |
| 69 | +mraptor "MalwareZoo/VBA/*" | |
| 70 | 70 | ``` |
| 71 | 71 | |
| 72 | 72 | **Important**: on Linux/MacOSX, always add double quotes around a file name when you use |
| ... | ... | @@ -77,10 +77,8 @@ list of files matching the wildcards before starting the script. |
| 77 | 77 | |
| 78 | 78 | ## Python 3 support - mraptor3 |
| 79 | 79 | |
| 80 | -As of v0.50, mraptor has been ported to Python 3 thanks to @sebdraven. | |
| 81 | -However, the differences between Python 2 and 3 are significant and for now | |
| 82 | -there is a separate version of mraptor named mraptor3 to be used with | |
| 83 | -Python 3. | |
| 80 | +Since v0.54, mraptor is fully compatible with both Python 2 and 3. | |
| 81 | +There is no need to use mraptor3 anymore, however it is still present for backward compatibility. | |
| 84 | 82 | |
| 85 | 83 | |
| 86 | 84 | -------------------------------------------------------------------------- | ... | ... |
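
Note on the mraptor docs changed above: they state that an exit code is returned based on the analysis result, with 20 meaning SUSPICIOUS. A minimal sketch of how a calling script might act on that code, assuming the `mraptor` command-line script is on the PATH. The names `SUSPICIOUS`, `scan_exit_code` and `is_suspicious` are illustrative and not part of oletools.

```python
import subprocess

SUSPICIOUS = 20  # exit code documented by mraptor for a SUSPICIOUS result


def scan_exit_code(command):
    """Run a scanner command (a list of arguments) and return its exit code."""
    completed = subprocess.run(
        command,
        stdout=subprocess.DEVNULL,  # only the exit code matters here
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode


def is_suspicious(command):
    """Return True when the command exits with the documented SUSPICIOUS code."""
    return scan_exit_code(command) == SUSPICIOUS
```

With mraptor installed, a call such as `is_suspicious(["mraptor", "file.doc"])` would follow the new script name used in this commit. Other exit codes are deliberately not interpreted here.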
oletools/doc/olebrowse.html
| ... | ... | @@ -26,7 +26,7 @@ |
| 26 | 26 | <p>And for Python 3:</p> |
| 27 | 27 | <pre><code>sudo apt-get install python3-tk</code></pre> |
| 28 | 28 | <h2 id="usage">Usage</h2> |
| 29 | -<pre><code>olebrowse.py [file]</code></pre> | |
| 29 | +<pre><code>olebrowse [file]</code></pre> | |
| 30 | 30 | <p>If you provide a file it will be opened, else a dialog will allow you to browse folders to open a file. Then if it is a valid OLE file, the list of data streams will be displayed. You can select a stream, and then either view its content in a builtin hexadecimal viewer, or save it to a file for further analysis.</p> |
| 31 | 31 | <h2 id="screenshots">Screenshots</h2> |
| 32 | 32 | <p>Main menu, showing all streams in the OLE file:</p> | ... | ... |
oletools/doc/olebrowse.md
| ... | ... | @@ -30,9 +30,9 @@ sudo apt-get install python3-tk |
| 30 | 30 | |
| 31 | 31 | Usage |
| 32 | 32 | ----- |
| 33 | - | |
| 34 | - olebrowse.py [file] | |
| 35 | - | |
| 33 | +``` | |
| 34 | +olebrowse [file] | |
| 35 | +``` | |
| 36 | 36 | If you provide a file it will be opened, else a dialog will allow you to browse |
| 37 | 37 | folders to open a file. Then if it is a valid OLE file, the list of data streams |
| 38 | 38 | will be displayed. You can select a stream, and then either view its content | ... | ... |
oletools/doc/oledir.html
| ... | ... | @@ -21,10 +21,21 @@ |
| 21 | 21 | <p>It can be used either as a command-line tool, or as a python module from your own applications.</p> |
| 22 | 22 | <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p> |
| 23 | 23 | <h2 id="usage">Usage</h2> |
| 24 | -<pre class="text"><code>Usage: oledir.py <filename></code></pre> | |
| 24 | +<pre class="text"><code>Usage: oledir [options] <filename> [filename2 ...] | |
| 25 | + | |
| 26 | +Options: | |
| 27 | + -h, --help show this help message and exit | |
| 28 | + -r find files recursively in subdirectories. | |
| 29 | + -z ZIP_PASSWORD, --zip=ZIP_PASSWORD | |
| 30 | + if the file is a zip archive, open all files from it, | |
| 31 | + using the provided password (requires Python 2.6+) | |
| 32 | + -f ZIP_FNAME, --zipfname=ZIP_FNAME | |
| 33 | + if the file is a zip archive, file(s) to be opened | |
| 34 | + within the zip. Wildcards * and ? are supported. | |
| 35 | + (default:*)</code></pre> | |
| 25 | 36 | <h3 id="examples">Examples</h3> |
| 26 | 37 | <p>Scan a single file:</p> |
| 27 | -<pre class="text"><code>oledir.py file.doc</code></pre> | |
| 38 | +<pre class="text"><code>oledir file.doc</code></pre> | |
| 28 | 39 | <p><img src="oledir.png" /></p> |
| 29 | 40 | <hr /> |
| 30 | 41 | <h2 id="how-to-use-oledir-in-python-applications">How to use oledir in Python applications</h2> | ... | ... |
oletools/doc/oledir.md
| ... | ... | @@ -11,7 +11,18 @@ It is part of the [python-oletools](http://www.decalage.info/python/oletools) pa |
| 11 | 11 | ## Usage |
| 12 | 12 | |
| 13 | 13 | ```text |
| 14 | -Usage: oledir.py <filename> | |
| 14 | +Usage: oledir [options] <filename> [filename2 ...] | |
| 15 | + | |
| 16 | +Options: | |
| 17 | + -h, --help show this help message and exit | |
| 18 | + -r find files recursively in subdirectories. | |
| 19 | + -z ZIP_PASSWORD, --zip=ZIP_PASSWORD | |
| 20 | + if the file is a zip archive, open all files from it, | |
| 21 | + using the provided password (requires Python 2.6+) | |
| 22 | + -f ZIP_FNAME, --zipfname=ZIP_FNAME | |
| 23 | + if the file is a zip archive, file(s) to be opened | |
| 24 | + within the zip. Wildcards * and ? are supported. | |
| 25 | + (default:*) | |
| 15 | 26 | ``` |
| 16 | 27 | |
| 17 | 28 | ### Examples |
| ... | ... | @@ -19,7 +30,7 @@ Usage: oledir.py <filename> |
| 19 | 30 | Scan a single file: |
| 20 | 31 | |
| 21 | 32 | ```text |
| 22 | -oledir.py file.doc | |
| 33 | +oledir file.doc | |
| 23 | 34 | ``` |
| 24 | 35 | |
| 25 | 36 |  | ... | ... |
oletools/doc/oleid.html
| ... | ... | @@ -107,10 +107,10 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni |
| 107 | 107 | <li>CSV output</li> |
| 108 | 108 | </ul> |
| 109 | 109 | <h2 id="usage">Usage</h2> |
| 110 | -<pre class="text"><code>oleid.py <file></code></pre> | |
| 110 | +<pre class="text"><code>oleid <file></code></pre> | |
| 111 | 111 | <h3 id="example">Example</h3> |
| 112 | 112 | <p>Analyzing a Word document containing a Flash object and VBA macros:</p> |
| 113 | -<pre class="text"><code>C:\oletools>oleid.py word_flash_vba.doc | |
| 113 | +<pre class="text"><code>C:\oletools>oleid word_flash_vba.doc | |
| 114 | 114 | |
| 115 | 115 | Filename: word_flash_vba.doc |
| 116 | 116 | +-------------------------------+-----------------------+ | ... | ... |
oletools/doc/oleid.md
| ... | ... | @@ -32,7 +32,7 @@ Planned improvements: |
| 32 | 32 | ## Usage |
| 33 | 33 | |
| 34 | 34 | ```text |
| 35 | -oleid.py <file> | |
| 35 | +oleid <file> | |
| 36 | 36 | ``` |
| 37 | 37 | |
| 38 | 38 | ### Example |
| ... | ... | @@ -40,7 +40,7 @@ oleid.py <file> |
| 40 | 40 | Analyzing a Word document containing a Flash object and VBA macros: |
| 41 | 41 | |
| 42 | 42 | ```text |
| 43 | -C:\oletools>oleid.py word_flash_vba.doc | |
| 43 | +C:\oletools>oleid word_flash_vba.doc | |
| 44 | 44 | |
| 45 | 45 | Filename: word_flash_vba.doc |
| 46 | 46 | +-------------------------------+-----------------------+ | ... | ... |
oletools/doc/olemap.html
| ... | ... | @@ -21,10 +21,10 @@ |
| 21 | 21 | <p>It can be used either as a command-line tool, or as a python module from your own applications.</p> |
| 22 | 22 | <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p> |
| 23 | 23 | <h2 id="usage">Usage</h2> |
| 24 | -<pre class="text"><code>Usage: olemap.py <filename></code></pre> | |
| 24 | +<pre class="text"><code>Usage: olemap <filename></code></pre> | |
| 25 | 25 | <h3 id="examples">Examples</h3> |
| 26 | 26 | <p>Scan a single file:</p> |
| 27 | -<pre class="text"><code>olemap.py file.doc</code></pre> | |
| 27 | +<pre class="text"><code>olemap file.doc</code></pre> | |
| 28 | 28 | <p><img src="olemap1.png" /></p> |
| 29 | 29 | <p><img src="olemap2.png" /></p> |
| 30 | 30 | <hr /> | ... | ... |
oletools/doc/olemap.md
| ... | ... | @@ -10,7 +10,7 @@ It is part of the [python-oletools](http://www.decalage.info/python/oletools) pa |
| 10 | 10 | ## Usage |
| 11 | 11 | |
| 12 | 12 | ```text |
| 13 | -Usage: olemap.py <filename> | |
| 13 | +Usage: olemap <filename> | |
| 14 | 14 | ``` |
| 15 | 15 | |
| 16 | 16 | ### Examples |
| ... | ... | @@ -18,7 +18,7 @@ Usage: olemap.py <filename> |
| 18 | 18 | Scan a single file: |
| 19 | 19 | |
| 20 | 20 | ```text |
| 21 | -olemap.py file.doc | |
| 21 | +olemap file.doc | |
| 22 | 22 | ``` |
| 23 | 23 | |
| 24 | 24 |  | ... | ... |
oletools/doc/olemeta.html
| ... | ... | @@ -20,7 +20,7 @@ |
| 20 | 20 | <p>olemeta is a script to parse OLE files such as MS Office documents (e.g. Word, Excel), to extract all standard properties present in the OLE file.</p> |
| 21 | 21 | <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p> |
| 22 | 22 | <h2 id="usage">Usage</h2> |
| 23 | -<pre class="text"><code>olemeta.py <file></code></pre> | |
| 23 | +<pre class="text"><code>olemeta <file></code></pre> | |
| 24 | 24 | <h3 id="example">Example</h3> |
| 25 | 25 | <p><img src="olemeta1.png" /></p> |
| 26 | 26 | <h2 id="how-to-use-olemeta-in-python-applications">How to use olemeta in Python applications</h2> | ... | ... |
oletools/doc/olemeta.md
oletools/doc/oletimes.html
| ... | ... | @@ -20,10 +20,10 @@ |
| 20 | 20 | <p>oletimes is a script to parse OLE files such as MS Office documents (e.g. Word, Excel), to extract creation and modification times of all streams and storages in the OLE file.</p> |
| 21 | 21 | <p>It is part of the <a href="http://www.decalage.info/python/oletools">python-oletools</a> package.</p> |
| 22 | 22 | <h2 id="usage">Usage</h2> |
| 23 | -<pre class="text"><code>oletimes.py <file></code></pre> | |
| 23 | +<pre class="text"><code>oletimes <file></code></pre> | |
| 24 | 24 | <h3 id="example">Example</h3> |
| 25 | 25 | <p>Checking the malware sample <a href="https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/">DIAN_caso-5415.doc</a>:</p> |
| 26 | -<pre class="text"><code>>oletimes.py DIAN_caso-5415.doc | |
| 26 | +<pre class="text"><code>>oletimes DIAN_caso-5415.doc | |
| 27 | 27 | |
| 28 | 28 | +----------------------------+---------------------+---------------------+ |
| 29 | 29 | | Stream/Storage name | Modification Time | Creation Time | | ... | ... |
oletools/doc/oletimes.md
| ... | ... | @@ -10,7 +10,7 @@ It is part of the [python-oletools](http://www.decalage.info/python/oletools) pa |
| 10 | 10 | ## Usage |
| 11 | 11 | |
| 12 | 12 | ```text |
| 13 | -oletimes.py <file> | |
| 13 | +oletimes <file> | |
| 14 | 14 | ``` |
| 15 | 15 | |
| 16 | 16 | ### Example |
| ... | ... | @@ -18,7 +18,7 @@ oletimes.py <file> |
| 18 | 18 | Checking the malware sample [DIAN_caso-5415.doc](https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/): |
| 19 | 19 | |
| 20 | 20 | ```text |
| 21 | ->oletimes.py DIAN_caso-5415.doc | |
| 21 | +>oletimes DIAN_caso-5415.doc | |
| 22 | 22 | |
| 23 | 23 | +----------------------------+---------------------+---------------------+ |
| 24 | 24 | | Stream/Storage name | Modification Time | Creation Time | | ... | ... |
oletools/doc/olevba.html
| ... | ... | @@ -127,56 +127,65 @@ code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warni |
| 127 | 127 | <li>olevba scans the macro source code and the deobfuscated strings to find suspicious keywords, auto-executable macros and potential IOCs (URLs, IP addresses, e-mail addresses, executable filenames, etc).</li> |
| 128 | 128 | </ol> |
| 129 | 129 | <h2 id="usage">Usage</h2> |
| 130 | -<pre class="text"><code>Usage: olevba.py [options] <filename> [filename2 ...] | |
| 131 | - | |
| 130 | +<pre class="text"><code>Usage: olevba [options] <filename> [filename2 ...] | |
| 131 | + | |
| 132 | 132 | Options: |
| 133 | 133 | -h, --help show this help message and exit |
| 134 | 134 | -r find files recursively in subdirectories. |
| 135 | 135 | -z ZIP_PASSWORD, --zip=ZIP_PASSWORD |
| 136 | 136 | if the file is a zip archive, open all files from it, |
| 137 | - using the provided password (requires Python 2.6+) | |
| 137 | + using the provided password. | |
| 138 | + -p PASSWORD, --password=PASSWORD | |
| 139 | + if encrypted office files are encountered, try | |
| 140 | + decryption with this password. May be repeated. | |
| 138 | 141 | -f ZIP_FNAME, --zipfname=ZIP_FNAME |
| 139 | 142 | if the file is a zip archive, file(s) to be opened |
| 140 | 143 | within the zip. Wildcards * and ? are supported. |
| 141 | 144 | (default:*) |
| 142 | - -t, --triage triage mode, display results as a summary table | |
| 143 | - (default for multiple files) | |
| 144 | - -d, --detailed detailed mode, display full results (default for | |
| 145 | - single file) | |
| 146 | 145 | -a, --analysis display only analysis results, not the macro source |
| 147 | 146 | code |
| 148 | 147 | -c, --code display only VBA source code, do not analyze it |
| 149 | - -i INPUT, --input=INPUT | |
| 150 | - input file containing VBA source code to be analyzed | |
| 151 | - (no parsing) | |
| 152 | 148 | --decode display all the obfuscated strings with their decoded |
| 153 | 149 | content (Hex, Base64, StrReverse, Dridex, VBA). |
| 154 | 150 | --attr display the attribute lines at the beginning of VBA |
| 155 | 151 | source code |
| 156 | 152 | --reveal display the macro source code after replacing all the |
| 157 | - obfuscated strings by their decoded content.</code></pre> | |
| 153 | + obfuscated strings by their decoded content. | |
| 154 | + -l LOGLEVEL, --loglevel=LOGLEVEL | |
| 155 | + logging level debug/info/warning/error/critical | |
| 156 | + (default=warning) | |
| 157 | + --deobf Attempt to deobfuscate VBA expressions (slow) | |
| 158 | + --relaxed Do not raise errors if opening of substream fails | |
| 159 | + | |
| 160 | + Output mode (mutually exclusive): | |
| 161 | + -t, --triage triage mode, display results as a summary table | |
| 162 | + (default for multiple files) | |
| 163 | + -d, --detailed detailed mode, display full results (default for | |
| 164 | + single file) | |
| 165 | + -j, --json json mode, detailed in json format (never default)</code></pre> | |
| 166 | +<p><strong>New in v0.54:</strong> the -p option can now be used to decrypt encrypted documents using the provided password(s).</p> | |
| 158 | 167 | <h3 id="examples">Examples</h3> |
| 159 | 168 | <p>Scan a single file:</p> |
| 160 | -<pre class="text"><code>olevba.py file.doc</code></pre> | |
| 169 | +<pre class="text"><code>olevba file.doc</code></pre> | |
| 161 | 170 | <p>Scan a single file, stored in a Zip archive with password “infected”:</p> |
| 162 | -<pre class="text"><code>olevba.py malicious_file.xls.zip -z infected</code></pre> | |
| 171 | +<pre class="text"><code>olevba malicious_file.xls.zip -z infected</code></pre> | |
| 163 | 172 | <p>Scan a single file, showing all obfuscated strings decoded:</p> |
| 164 | -<pre class="text"><code>olevba.py file.doc --decode</code></pre> | |
| 173 | +<pre class="text"><code>olevba file.doc --decode</code></pre> | |
| 165 | 174 | <p>Scan a single file, showing the macro source code with VBA strings deobfuscated:</p> |
| 166 | -<pre class="text"><code>olevba.py file.doc --reveal</code></pre> | |
| 175 | +<pre class="text"><code>olevba file.doc --reveal</code></pre> | |
| 167 | 176 | <p>Scan VBA source code extracted into a text file:</p> |
| 168 | -<pre class="text"><code>olevba.py source_code.vba</code></pre> | |
| 177 | +<pre class="text"><code>olevba source_code.vba</code></pre> | |
| 169 | 178 | <p>Scan a collection of files stored in a folder:</p> |
| 170 | -<pre class="text"><code>olevba.py "MalwareZoo/VBA/*"</code></pre> | |
| 179 | +<pre class="text"><code>olevba "MalwareZoo/VBA/*"</code></pre> | |
| 171 | 180 | <p>NOTE: On Linux, MacOSX and other Unix variants, it is required to add double quotes around wildcards. Otherwise, they will be expanded by the shell instead of olevba.</p> |
| 172 | 181 | <p>Scan all .doc and .xls files, recursively in all subfolders:</p> |
| 173 | -<pre class="text"><code>olevba.py "MalwareZoo/VBA/*.doc" "MalwareZoo/VBA/*.xls" -r</code></pre> | |
| 182 | +<pre class="text"><code>olevba "MalwareZoo/VBA/*.doc" "MalwareZoo/VBA/*.xls" -r</code></pre> | |
| 174 | 183 | <p>Scan all .doc files within all .zip files with password, recursively:</p> |
| 175 | -<pre class="text"><code>olevba.py "MalwareZoo/VBA/*.zip" -r -z infected -f "*.doc"</code></pre> | |
| 184 | +<pre class="text"><code>olevba "MalwareZoo/VBA/*.zip" -r -z infected -f "*.doc"</code></pre> | |
| 176 | 185 | <h3 id="detailed-analysis-mode-default-for-single-file">Detailed analysis mode (default for single file)</h3> |
| 177 | 186 | <p>When a single file is scanned, or when using the option -d, all details of the analysis are displayed.</p> |
| 178 | 187 | <p>For example, checking the malware sample <a href="https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/">DIAN_caso-5415.doc</a>:</p> |
| 179 | -<pre class="text"><code>>olevba.py c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip -z infected | |
| 188 | +<pre class="text"><code>>olevba c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip -z infected | |
| 180 | 189 | =============================================================================== |
| 181 | 190 | FILE: DIAN_caso-5415.doc.malware in c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip |
| 182 | 191 | Type: OLE |
| ... | ... | @@ -246,7 +255,7 @@ ANALYSIS: |
| 246 | 255 | <li><strong>V</strong>: VBA string expressions (potential obfuscation)</li> |
| 247 | 256 | </ul> |
| 248 | 257 | <p>Here is an example:</p> |
| 249 | -<pre class="text"><code>c:\>olevba.py \MalwareZoo\VBA\samples\* | |
| 258 | +<pre class="text"><code>c:\>olevba \MalwareZoo\VBA\samples\* | |
| 250 | 259 | Flags Filename |
| 251 | 260 | ----------- ----------------------------------------------------------------- |
| 252 | 261 | OLE:MASI--- \MalwareZoo\VBA\samples\DIAN_caso-5415.doc.malware |
| ... | ... | @@ -266,7 +275,7 @@ OpX:MASI--- \MalwareZoo\VBA\samples\RottenKitten.xlsb.malware |
| 266 | 275 | OLE:MASI-B- \MalwareZoo\VBA\samples\ROVNIX.doc.malware |
| 267 | 276 | OLE:MA----- \MalwareZoo\VBA\samples\Word within Word macro auto.doc</code></pre> |
| 268 | 277 | <h2 id="python-3-support---olevba3">Python 3 support - olevba3</h2> |
| 269 | -<p>As of v0.50, olevba has been ported to Python 3 thanks to <span class="citation" data-cites="sebdraven">@sebdraven</span>. However, the differences between Python 2 and 3 are significant and for now there is a separate version of olevba named olevba3 to be used with Python 3.</p> | |
| 278 | +<p>Since v0.54, olevba is fully compatible with both Python 2 and 3. There is no need to use olevba3 anymore, however it is still present for backward compatibility.</p> | |
| 270 | 279 | <hr /> |
| 271 | 280 | <h2 id="how-to-use-olevba-in-python-applications">How to use olevba in Python applications</h2> |
| 272 | 281 | <p>olevba may be used to open a MS Office file, detect if it contains VBA macros, extract and analyze the VBA source code from your own python applications.</p> | ... | ... |
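
Note on the wildcard examples in the olevba docs above: quoting a pattern such as `"MalwareZoo/VBA/*"` hands the pattern itself to the tool, which then has to expand it. A rough sketch of that expansion with the standard `glob` module follows. It only illustrates the idea and is not taken from olevba, and `expand_patterns` is a hypothetical name.

```python
import glob


def expand_patterns(arguments):
    """Expand each argument as a wildcard pattern; keep it unchanged if nothing matches."""
    expanded = []
    for argument in arguments:
        matches = sorted(glob.glob(argument))  # sorted for a stable order
        expanded.extend(matches if matches else [argument])
    return expanded
```

When the pattern is not quoted on Linux/MacOSX, the shell performs this expansion before the script starts, so the script receives the list of matching files instead of the pattern.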
oletools/doc/olevba.md
| ... | ... | @@ -67,85 +67,95 @@ and potential IOCs (URLs, IP addresses, e-mail addresses, executable filenames, |
| 67 | 67 | ## Usage |
| 68 | 68 | |
| 69 | 69 | ```text |
| 70 | -Usage: olevba.py [options] <filename> [filename2 ...] | |
| 71 | - | |
| 70 | +Usage: olevba [options] <filename> [filename2 ...] | |
| 71 | + | |
| 72 | 72 | Options: |
| 73 | 73 | -h, --help show this help message and exit |
| 74 | 74 | -r find files recursively in subdirectories. |
| 75 | 75 | -z ZIP_PASSWORD, --zip=ZIP_PASSWORD |
| 76 | 76 | if the file is a zip archive, open all files from it, |
| 77 | - using the provided password (requires Python 2.6+) | |
| 77 | + using the provided password. | |
| 78 | + -p PASSWORD, --password=PASSWORD | |
| 79 | + if encrypted office files are encountered, try | |
| 80 | + decryption with this password. May be repeated. | |
| 78 | 81 | -f ZIP_FNAME, --zipfname=ZIP_FNAME |
| 79 | 82 | if the file is a zip archive, file(s) to be opened |
| 80 | 83 | within the zip. Wildcards * and ? are supported. |
| 81 | 84 | (default:*) |
| 82 | - -t, --triage triage mode, display results as a summary table | |
| 83 | - (default for multiple files) | |
| 84 | - -d, --detailed detailed mode, display full results (default for | |
| 85 | - single file) | |
| 86 | 85 | -a, --analysis display only analysis results, not the macro source |
| 87 | 86 | code |
| 88 | 87 | -c, --code display only VBA source code, do not analyze it |
| 89 | - -i INPUT, --input=INPUT | |
| 90 | - input file containing VBA source code to be analyzed | |
| 91 | - (no parsing) | |
| 92 | 88 | --decode display all the obfuscated strings with their decoded |
| 93 | 89 | content (Hex, Base64, StrReverse, Dridex, VBA). |
| 94 | 90 | --attr display the attribute lines at the beginning of VBA |
| 95 | 91 | source code |
| 96 | 92 | --reveal display the macro source code after replacing all the |
| 97 | 93 | obfuscated strings by their decoded content. |
| 94 | + -l LOGLEVEL, --loglevel=LOGLEVEL | |
| 95 | + logging level debug/info/warning/error/critical | |
| 96 | + (default=warning) | |
| 97 | + --deobf Attempt to deobfuscate VBA expressions (slow) | |
| 98 | + --relaxed Do not raise errors if opening of substream fails | |
| 99 | + | |
| 100 | + Output mode (mutually exclusive): | |
| 101 | + -t, --triage triage mode, display results as a summary table | |
| 102 | + (default for multiple files) | |
| 103 | + -d, --detailed detailed mode, display full results (default for | |
| 104 | + single file) | |
| 105 | + -j, --json json mode, detailed in json format (never default) | |
| 98 | 106 | ``` |
| 99 | 107 | |
| 108 | +**New in v0.54:** the -p option can now be used to decrypt encrypted documents using the provided password(s). | |
| 109 | + | |
| 100 | 110 | ### Examples |
| 101 | 111 | |
| 102 | 112 | Scan a single file: |
| 103 | 113 | |
| 104 | 114 | ```text |
| 105 | -olevba.py file.doc | |
| 115 | +olevba file.doc | |
| 106 | 116 | ``` |
| 107 | 117 | |
| 108 | 118 | Scan a single file, stored in a Zip archive with password "infected": |
| 109 | 119 | |
| 110 | 120 | ```text |
| 111 | -olevba.py malicious_file.xls.zip -z infected | |
| 121 | +olevba malicious_file.xls.zip -z infected | |
| 112 | 122 | ``` |
| 113 | 123 | |
| 114 | 124 | Scan a single file, showing all obfuscated strings decoded: |
| 115 | 125 | |
| 116 | 126 | ```text |
| 117 | -olevba.py file.doc --decode | |
| 127 | +olevba file.doc --decode | |
| 118 | 128 | ``` |
| 119 | 129 | |
| 120 | 130 | Scan a single file, showing the macro source code with VBA strings deobfuscated: |
| 121 | 131 | |
| 122 | 132 | ```text |
| 123 | -olevba.py file.doc --reveal | |
| 133 | +olevba file.doc --reveal | |
| 124 | 134 | ``` |
| 125 | 135 | |
| 126 | 136 | Scan VBA source code extracted into a text file: |
| 127 | 137 | |
| 128 | 138 | ```text |
| 129 | -olevba.py source_code.vba | |
| 139 | +olevba source_code.vba | |
| 130 | 140 | ``` |
| 131 | 141 | |
| 132 | 142 | Scan a collection of files stored in a folder: |
| 133 | 143 | |
| 134 | 144 | ```text |
| 135 | -olevba.py "MalwareZoo/VBA/*" | |
| 145 | +olevba "MalwareZoo/VBA/*" | |
| 136 | 146 | ``` |
| 137 | 147 | NOTE: On Linux, MacOSX and other Unix variants, it is required to add double quotes around wildcards. Otherwise, they will be expanded by the shell instead of olevba. |
| 138 | 148 | |
| 139 | 149 | Scan all .doc and .xls files, recursively in all subfolders: |
| 140 | 150 | |
| 141 | 151 | ```text |
| 142 | -olevba.py "MalwareZoo/VBA/*.doc" "MalwareZoo/VBA/*.xls" -r | |
| 152 | +olevba "MalwareZoo/VBA/*.doc" "MalwareZoo/VBA/*.xls" -r | |
| 143 | 153 | ``` |
| 144 | 154 | |
| 145 | 155 | Scan all .doc files within all .zip files with password, recursively: |
| 146 | 156 | |
| 147 | 157 | ```text |
| 148 | -olevba.py "MalwareZoo/VBA/*.zip" -r -z infected -f "*.doc" | |
| 158 | +olevba "MalwareZoo/VBA/*.zip" -r -z infected -f "*.doc" | |
| 149 | 159 | ``` |
| 150 | 160 | |
| 151 | 161 | |
| ... | ... | @@ -156,7 +166,7 @@ When a single file is scanned, or when using the option -d, all details of the a |
| 156 | 166 | For example, checking the malware sample [DIAN_caso-5415.doc](https://malwr.com/analysis/M2I4YWRhM2IwY2QwNDljN2E3ZWFjYTg3ODk4NmZhYmE/): |
| 157 | 167 | |
| 158 | 168 | ```text |
| 159 | ->olevba.py c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip -z infected | |
| 169 | +>olevba c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip -z infected | |
| 160 | 170 | =============================================================================== |
| 161 | 171 | FILE: DIAN_caso-5415.doc.malware in c:\MalwareZoo\VBA\DIAN_caso-5415.doc.zip |
| 162 | 172 | Type: OLE |
| ... | ... | @@ -233,7 +243,7 @@ The following flags show the results of the analysis: |
| 233 | 243 | Here is an example: |
| 234 | 244 | |
| 235 | 245 | ```text |
| 236 | -c:\>olevba.py \MalwareZoo\VBA\samples\* | |
| 246 | +c:\>olevba \MalwareZoo\VBA\samples\* | |
| 237 | 247 | Flags Filename |
| 238 | 248 | ----------- ----------------------------------------------------------------- |
| 239 | 249 | OLE:MASI--- \MalwareZoo\VBA\samples\DIAN_caso-5415.doc.malware |
| ... | ... | @@ -256,10 +266,9 @@ OLE:MA----- \MalwareZoo\VBA\samples\Word within Word macro auto.doc |
| 256 | 266 | |
| 257 | 267 | ## Python 3 support - olevba3 |
| 258 | 268 | |
| 259 | -As of v0.50, olevba has been ported to Python 3 thanks to @sebdraven. | |
| 260 | -However, the differences between Python 2 and 3 are significant and for now | |
| 261 | -there is a separate version of olevba named olevba3 to be used with | |
| 262 | -Python 3. | |
| 269 | +Since v0.54, olevba is fully compatible with both Python 2 and 3. | |
| 270 | +There is no need to use olevba3 anymore, however it is still present for backward compatibility. | |
| 271 | + | |
| 263 | 272 | |
| 264 | 273 | -------------------------------------------------------------------------- |
| 265 | 274 | ... | ... |
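
Note on the `-p PASSWORD` option documented above: the help text says it "May be repeated". One way to get a repeatable option with the standard `optparse` module is `action="append"`, sketched below. This is an illustration under that assumption and not a copy of the olevba option parser. `build_parser` and `parse_passwords` are hypothetical names.

```python
import optparse


def build_parser():
    """Create a parser with a repeatable -p/--password option."""
    parser = optparse.OptionParser(
        usage="usage: %prog [options] <filename> [filename2 ...]")
    parser.add_option(
        "-p", "--password", dest="password", action="append", default=None,
        help="password to try on encrypted files; may be repeated")
    return parser


def parse_passwords(argv):
    """Return (passwords, filenames) parsed from a list of arguments."""
    options, filenames = build_parser().parse_args(argv)
    # default=None avoids sharing one mutable list between parse_args calls
    return (options.password or []), filenames
```

For example, `parse_passwords(["-p", "one", "-p", "two", "file.doc"])` collects both passwords in order. `argparse` offers the same `action="append"` behaviour for newer code.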
oletools/doc/pyxswf.html
| ... | ... | @@ -24,7 +24,7 @@ |
| 24 | 24 | <p>It can also extract Flash objects from RTF documents, by parsing embedded objects encoded in hexadecimal format (-f option).</p> |
| 25 | 25 | <p>For this, simply add the -o option to work on OLE streams rather than raw files, or the -f option to work on RTF files.</p> |
| 26 | 26 | <h2 id="usage">Usage</h2> |
| 27 | -<pre class="text"><code>Usage: pyxswf.py [options] <file.bad> | |
| 27 | +<pre class="text"><code>Usage: pyxswf [options] <file.bad> | |
| 28 | 28 | |
| 29 | 29 | Options: |
| 30 | 30 | -o, --ole Parse an OLE file (e.g. Word, Excel) to look for SWF |
| ... | ... | @@ -46,18 +46,18 @@ Options: |
| 46 | 46 | contain SWFs. Must provide path in quotes |
| 47 | 47 | -c, --compress Compresses the SWF using Zlib</code></pre> |
| 48 | 48 | <h3 id="example-1---detecting-and-extracting-a-swf-file-from-a-word-document-on-windows">Example 1 - detecting and extracting a SWF file from a Word document on Windows:</h3> |
| 49 | -<pre class="text"><code>C:\oletools>pyxswf.py -o word_flash.doc | |
| 49 | +<pre class="text"><code>C:\oletools>pyxswf -o word_flash.doc | |
| 50 | 50 | OLE stream: 'Contents' |
| 51 | 51 | [SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents |
| 52 | 52 | [ADDR] SWF 1 at 0x8 - FWS Header |
| 53 | 53 | |
| 54 | -C:\oletools>pyxswf.py -xo word_flash.doc | |
| 54 | +C:\oletools>pyxswf -xo word_flash.doc | |
| 55 | 55 | OLE stream: 'Contents' |
| 56 | 56 | [SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents |
| 57 | 57 | [ADDR] SWF 1 at 0x8 - FWS Header |
| 58 | 58 | [FILE] Carved SWF MD5: 2498e9c0701dc0e461ab4358f9102bc5.swf</code></pre> |
| 59 | 59 | <h3 id="example-2---detecting-and-extracting-a-swf-file-from-a-rtf-document-on-windows">Example 2 - detecting and extracting a SWF file from a RTF document on Windows:</h3> |
| 60 | -<pre class="text"><code>C:\oletools>pyxswf.py -xf "rtf_flash.rtf" | |
| 60 | +<pre class="text"><code>C:\oletools>pyxswf -xf "rtf_flash.rtf" | |
| 61 | 61 | RTF embedded object size 1498557 at index 000036DD |
| 62 | 62 | [SUMMARY] 1 SWF(s) in MD5:46a110548007e04f4043785ac4184558:RTF_embedded_object_0 |
| 63 | 63 | 00036DD | ... | ... |
oletools/doc/pyxswf.md
| ... | ... | @@ -21,7 +21,7 @@ For this, simply add the -o option to work on OLE streams rather than raw files, |
| 21 | 21 | ## Usage |
| 22 | 22 | |
| 23 | 23 | ```text |
| 24 | -Usage: pyxswf.py [options] <file.bad> | |
| 24 | +Usage: pyxswf [options] <file.bad> | |
| 25 | 25 | |
| 26 | 26 | Options: |
| 27 | 27 | -o, --ole Parse an OLE file (e.g. Word, Excel) to look for SWF |
| ... | ... | @@ -47,12 +47,12 @@ Options: |
| 47 | 47 | ### Example 1 - detecting and extracting a SWF file from a Word document on Windows: |
| 48 | 48 | |
| 49 | 49 | ```text |
| 50 | -C:\oletools>pyxswf.py -o word_flash.doc | |
| 50 | +C:\oletools>pyxswf -o word_flash.doc | |
| 51 | 51 | OLE stream: 'Contents' |
| 52 | 52 | [SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents |
| 53 | 53 | [ADDR] SWF 1 at 0x8 - FWS Header |
| 54 | 54 | |
| 55 | -C:\oletools>pyxswf.py -xo word_flash.doc | |
| 55 | +C:\oletools>pyxswf -xo word_flash.doc | |
| 56 | 56 | OLE stream: 'Contents' |
| 57 | 57 | [SUMMARY] 1 SWF(s) in MD5:993664cc86f60d52d671b6610813cfd1:Contents |
| 58 | 58 | [ADDR] SWF 1 at 0x8 - FWS Header |
| ... | ... | @@ -62,7 +62,7 @@ OLE stream: 'Contents' |
| 62 | 62 | ### Example 2 - detecting and extracting a SWF file from a RTF document on Windows: |
| 63 | 63 | |
| 64 | 64 | ```text |
| 65 | -C:\oletools>pyxswf.py -xf "rtf_flash.rtf" | |
| 65 | +C:\oletools>pyxswf -xf "rtf_flash.rtf" | |
| 66 | 66 | RTF embedded object size 1498557 at index 000036DD |
| 67 | 67 | [SUMMARY] 1 SWF(s) in MD5:46a110548007e04f4043785ac4184558:RTF_embedded_object_0 |
| 68 | 68 | 00036DD | ... | ... |
oletools/ezhexviewer.py
| ... | ... | @@ -16,7 +16,7 @@ Usage in a python application: |
| 16 | 16 | |
| 17 | 17 | ezhexviewer project website: http://www.decalage.info/python/ezhexviewer |
| 18 | 18 | |
| 19 | -ezhexviewer is copyright (c) 2012-2017, Philippe Lagadec (http://www.decalage.info) | |
| 19 | +ezhexviewer is copyright (c) 2012-2019, Philippe Lagadec (http://www.decalage.info) | |
| 20 | 20 | All rights reserved. |
| 21 | 21 | |
| 22 | 22 | Redistribution and use in source and binary forms, with or without modification, |
| ... | ... | @@ -50,7 +50,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
| 50 | 50 | # 2017-04-26 PL: - fixed absolute imports (issue #141) |
| 51 | 51 | # 2018-09-15 v0.54 PL: - easygui is now a dependency |
| 52 | 52 | |
| 53 | -__version__ = '0.54dev1' | |
| 53 | +__version__ = '0.54' | |
| 54 | 54 | |
| 55 | 55 | #----------------------------------------------------------------------------- |
| 56 | 56 | # TODO: | ... | ... |
oletools/mraptor.py
| ... | ... | @@ -23,7 +23,7 @@ http://www.decalage.info/python/oletools |
| 23 | 23 | |
| 24 | 24 | # === LICENSE ================================================================== |
| 25 | 25 | |
| 26 | -# MacroRaptor is copyright (c) 2016-2018 Philippe Lagadec (http://www.decalage.info) | |
| 26 | +# MacroRaptor is copyright (c) 2016-2019 Philippe Lagadec (http://www.decalage.info) | |
| 27 | 27 | # All rights reserved. |
| 28 | 28 | # |
| 29 | 29 | # Redistribution and use in source and binary forms, with or without modification, |
| ... | ... | @@ -58,8 +58,9 @@ http://www.decalage.info/python/oletools |
| 58 | 58 | # 2016-12-21 v0.51 PL: - added more ActiveX macro triggers |
| 59 | 59 | # 2017-03-08 PL: - fixed absolute imports |
| 60 | 60 | # 2018-05-25 v0.53 PL: - added Word/PowerPoint 2007+ XML (aka Flat OPC) issue #283 |
| 61 | +# 2019-04-04 v0.54 PL: - added ExecuteExcel4Macro, ShellExecuteA, XLM keywords | |
| 61 | 62 | |
| 62 | -__version__ = '0.53' | |
| 63 | +__version__ = '0.54' | |
| 63 | 64 | |
| 64 | 65 | #------------------------------------------------------------------------------ |
| 65 | 66 | # TODO: |
| ... | ... | @@ -119,20 +120,21 @@ re_autoexec = re.compile(r'(?i)\b(?:Auto(?:Exec|_?Open|_?Close|Exit|New)' + |
| 119 | 120 | r'|DocumentComplete|DownloadBegin|DownloadComplete|FileDownload' + |
| 120 | 121 | r'|NavigateComplete2|NavigateError|ProgressChange|PropertyChange' + |
| 121 | 122 | r'|SetSecureLockIcon|StatusTextChange|TitleChange|MouseMove' + |
| 122 | - r'|MouseEnter|MouseLeave|))\b') | |
| 123 | + r'|MouseEnter|MouseLeave))|Auto_Ope\b') | |
| 124 | +# TODO: "Auto_Ope" is temporarily here because of a bug in plugin_biff, which misses the last byte in "Auto_Open"... | |
| 123 | 125 | |
| 124 | 126 | # MS-VBAL 5.4.5.1 Open Statement: |
| 125 | 127 | RE_OPEN_WRITE = r'(?:\bOpen\b[^\n]+\b(?:Write|Append|Binary|Output|Random)\b)' |
| 126 | 128 | |
| 127 | 129 | re_write = re.compile(r'(?i)\b(?:FileCopy|CopyFile|Kill|CreateTextFile|' |
| 128 | - + r'VirtualAlloc|RtlMoveMemory|URLDownloadToFileA?|AltStartupPath|' | |
| 130 | + + r'VirtualAlloc|RtlMoveMemory|URLDownloadToFileA?|AltStartupPath|WriteProcessMemory|' | |
| 129 | 131 | + r'ADODB\.Stream|WriteText|SaveToFile|SaveAs|SaveAsRTF|FileSaveAs|MkDir|RmDir|SaveSetting|SetAttr)\b|' + RE_OPEN_WRITE) |
| 130 | 132 | |
| 131 | 133 | # MS-VBAL 5.2.3.5 External Procedure Declaration |
| 132 | 134 | RE_DECLARE_LIB = r'(?:\bDeclare\b[^\n]+\bLib\b)' |
| 133 | 135 | |
| 134 | 136 | re_execute = re.compile(r'(?i)\b(?:Shell|CreateObject|GetObject|SendKeys|' |
| 135 | - + r'MacScript|FollowHyperlink|CreateThread|ShellExecute)\b|' + RE_DECLARE_LIB) | |
| 137 | + + r'MacScript|FollowHyperlink|CreateThread|ShellExecuteA?|ExecuteExcel4Macro|EXEC|REGISTER)\b|' + RE_DECLARE_LIB) | |
| 136 | 138 | |
| 137 | 139 | |
| 138 | 140 | # === CLASSES ================================================================= | ... | ... |
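Review note on the mraptor.py hunk above: the sketch below exercises the new `re_execute` pattern exactly as it appears on the "+" side of the diff, to show which of the added keywords (`ShellExecuteA`, `ExecuteExcel4Macro`, `EXEC`, `REGISTER`) now match. The sample strings are hypothetical and written only for this illustration. The second half uses a deliberately reduced stand-in for `re_autoexec` (not the full pattern) to show how the trailing `\b` is scoped once `|Auto_Ope\b` sits outside the main group. This is ordinary regex precedence, but whether it matters for the real pattern is worth checking against the complete source.

```python
import re

# Patterns copied from the "+" side of the mraptor.py hunk.
RE_DECLARE_LIB = r'(?:\bDeclare\b[^\n]+\bLib\b)'
re_execute = re.compile(
    r'(?i)\b(?:Shell|CreateObject|GetObject|SendKeys|'
    + r'MacScript|FollowHyperlink|CreateThread|ShellExecuteA?|ExecuteExcel4Macro|EXEC|REGISTER)\b|'
    + RE_DECLARE_LIB)


def first_execute_match(code):
    """Return the first 'execute' keyword found in code, or None."""
    match = re_execute.search(code)
    return match.group() if match else None


# Hypothetical macro fragments, not taken from any real sample.
vba_call = 'Application.ExecuteExcel4Macro "EXEC(""calc.exe"")"'
api_call = 'ret = ShellExecuteA(0, "open", url, "", "", 1)'
xlm_formula = '=exec("cmd.exe /c calc")'
harmless = 'MsgBox "hello"'

# Reduced stand-ins for re_autoexec (assumption: most alternatives omitted).
# In "unbounded" the final \b only applies to the Auto_Ope branch, because
# the top-level "|" splits the pattern in two. In "bounded" it applies to all.
unbounded = re.compile(
    r'(?i)\b(?:Auto(?:Exec|_?Open|_?Close|Exit|New)'
    r'|\w+_(?:MouseEnter|MouseLeave))|Auto_Ope\b')
bounded = re.compile(
    r'(?i)\b(?:Auto(?:Exec|_?Open|_?Close|Exit|New)'
    r'|\w+_(?:MouseEnter|MouseLeave)|Auto_Ope)\b')


def first_match(pattern, text):
    """Return the matched text for pattern in text, or None."""
    match = pattern.search(text)
    return match.group() if match else None


if __name__ == '__main__':
    for sample in (vba_call, api_call, xlm_formula, harmless):
        print(repr(sample), '->', first_execute_match(sample))
    for word in ('AutoExecutive', 'Auto_Ope'):
        print(word, first_match(unbounded, word), first_match(bounded, word))
```

With the reduced patterns, the unbounded form reports `AutoExec` inside the longer identifier `AutoExecutive`, while the bounded form does not. Both forms still accept the truncated `Auto_Ope` that the TODO comment refers to.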
oletools/mraptor3.py
| 1 | 1 | #!/usr/bin/env python |
| 2 | -""" | |
| 3 | -mraptor.py - MacroRaptor | |
| 4 | 2 | |
| 5 | -MacroRaptor is a script to parse OLE and OpenXML files such as MS Office | |
| 6 | -documents (e.g. Word, Excel), to detect malicious macros. | |
| 3 | +# mraptor3 is a stub that redirects to mraptor.py, for backwards compatibility | |
| 7 | 4 | |
| 8 | -Supported formats: | |
| 9 | -- Word 97-2003 (.doc, .dot), Word 2007+ (.docm, .dotm) | |
| 10 | -- Excel 97-2003 (.xls), Excel 2007+ (.xlsm, .xlsb) | |
| 11 | -- PowerPoint 97-2003 (.ppt), PowerPoint 2007+ (.pptm, .ppsm) | |
| 12 | -- Word/PowerPoint 2007+ XML (aka Flat OPC) | |
| 13 | -- Word 2003 XML (.xml) | |
| 14 | -- Word/Excel Single File Web Page / MHTML (.mht) | |
| 15 | -- Publisher (.pub) | |
| 5 | +import sys, os, warnings | |
| 16 | 6 | |
| 17 | -Author: Philippe Lagadec - http://www.decalage.info | |
| 18 | -License: BSD, see source code or documentation | |
| 19 | - | |
| 20 | -MacroRaptor is part of the python-oletools package: | |
| 21 | -http://www.decalage.info/python/oletools | |
| 22 | -""" | |
| 23 | - | |
| 24 | -# === LICENSE ================================================================== | |
| 25 | - | |
| 26 | -# MacroRaptor is copyright (c) 2016-2018 Philippe Lagadec (http://www.decalage.info) | |
| 27 | -# All rights reserved. | |
| 28 | -# | |
| 29 | -# Redistribution and use in source and binary forms, with or without modification, | |
| 30 | -# are permitted provided that the following conditions are met: | |
| 31 | -# | |
| 32 | -# * Redistributions of source code must retain the above copyright notice, this | |
| 33 | -# list of conditions and the following disclaimer. | |
| 34 | -# * Redistributions in binary form must reproduce the above copyright notice, | |
| 35 | -# this list of conditions and the following disclaimer in the documentation | |
| 36 | -# and/or other materials provided with the distribution. | |
| 37 | -# | |
| 38 | -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND | |
| 39 | -# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | |
| 40 | -# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | |
| 41 | -# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE | |
| 42 | -# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
| 43 | -# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR | |
| 44 | -# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER | |
| 45 | -# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, | |
| 46 | -# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | |
| 47 | -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | |
| 48 | - | |
| 49 | -#------------------------------------------------------------------------------ | |
| 50 | -# CHANGELOG: | |
| 51 | -# 2016-02-23 v0.01 PL: - first version | |
| 52 | -# 2016-02-29 v0.02 PL: - added Workbook_Activate, FileSaveAs | |
| 53 | -# 2016-03-04 v0.03 PL: - returns an exit code based on the overall result | |
| 54 | -# 2016-03-08 v0.04 PL: - collapse long lines before analysis | |
| 55 | -# 2016-07-19 v0.50 SL: - converted to Python 3 | |
| 56 | -# 2016-08-26 PL: - changed imports for Python 3 | |
| 57 | -# 2017-04-26 v0.51 PL: - fixed absolute imports (issue #141) | |
| 58 | -# 2017-06-29 PL: - synced with mraptor.py 0.51 | |
| 59 | -# 2018-05-25 v0.53 PL: - added Word/PowerPoint 2007+ XML (aka Flat OPC) issue #283 | |
| 60 | - | |
| 61 | -__version__ = '0.53' | |
| 62 | - | |
| 63 | -#------------------------------------------------------------------------------ | |
| 64 | -# TODO: | |
| 65 | - | |
| 66 | - | |
| 67 | -#--- IMPORTS ------------------------------------------------------------------ | |
| 68 | - | |
| 69 | -import sys, os, logging, optparse, re | |
| 7 | +warnings.warn('mraptor3 is deprecated, mraptor should be used instead.', DeprecationWarning) | |
| 70 | 8 | |
| 71 | 9 | # IMPORTANT: it should be possible to run oletools directly as scripts |
| 72 | 10 | # in any directory without installing them with pip or setup.py. |
| ... | ... | @@ -74,280 +12,12 @@ import sys, os, logging, optparse, re |
| 74 | 12 | # And to enable Python 2+3 compatibility, we need to use absolute imports, |
| 75 | 13 | # so we add the oletools parent folder to sys.path (absolute+normalized path): |
| 76 | 14 | _thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__))) |
| 77 | -# print('_thismodule_dir = %r' % _thismodule_dir) | |
| 78 | 15 | _parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..')) |
| 79 | -# print('_parent_dir = %r' % _thirdparty_dir) | |
| 80 | -if not _parent_dir in sys.path: | |
| 16 | +if _parent_dir not in sys.path: | |
| 81 | 17 | sys.path.insert(0, _parent_dir) |
| 82 | 18 | |
| 83 | -from oletools.thirdparty.xglob import xglob | |
| 84 | -from oletools.thirdparty.tablestream import tablestream | |
| 85 | - | |
| 86 | -# import the python 3 version of olevba | |
| 87 | -from oletools import olevba3 as olevba | |
| 88 | -from oletools.olevba3 import TYPE2TAG | |
| 89 | - | |
| 90 | -# === LOGGING ================================================================= | |
| 91 | - | |
| 92 | -# a global logger object used for debugging: | |
| 93 | -log = olevba.get_logger('mraptor') | |
| 94 | - | |
| 95 | - | |
| 96 | -#--- CONSTANTS ---------------------------------------------------------------- | |
| 97 | - | |
| 98 | -# URL and message to report issues: | |
| 99 | -# TODO: make it a common variable for all oletools | |
| 100 | -URL_ISSUES = 'https://github.com/decalage2/oletools/issues' | |
| 101 | -MSG_ISSUES = 'Please report this issue on %s' % URL_ISSUES | |
| 102 | - | |
| 103 | -# 'AutoExec', 'AutoOpen', 'Auto_Open', 'AutoClose', 'Auto_Close', 'AutoNew', 'AutoExit', | |
| 104 | -# 'Document_Open', 'DocumentOpen', | |
| 105 | -# 'Document_Close', 'DocumentBeforeClose', 'Document_BeforeClose', | |
| 106 | -# 'DocumentChange','Document_New', | |
| 107 | -# 'NewDocument' | |
| 108 | -# 'Workbook_Open', 'Workbook_Close', | |
| 109 | -# *_Painted such as InkPicture1_Painted | |
| 110 | -# *_GotFocus|LostFocus|MouseHover for other ActiveX objects | |
| 111 | -# reference: http://www.greyhathacker.net/?p=948 | |
| 112 | - | |
| 113 | -# TODO: check if line also contains Sub or Function | |
| 114 | -re_autoexec = re.compile(r'(?i)\b(?:Auto(?:Exec|_?Open|_?Close|Exit|New)' + | |
| 115 | - r'|Document(?:_?Open|_Close|_?BeforeClose|Change|_New)' + | |
| 116 | - r'|NewDocument|Workbook(?:_Open|_Activate|_Close)' + | |
| 117 | - r'|\w+_(?:Painted|Painting|GotFocus|LostFocus|MouseHover' + | |
| 118 | - r'|Layout|Click|Change|Resize|BeforeNavigate2|BeforeScriptExecute' + | |
| 119 | - r'|DocumentComplete|DownloadBegin|DownloadComplete|FileDownload' + | |
| 120 | - r'|NavigateComplete2|NavigateError|ProgressChange|PropertyChange' + | |
| 121 | - r'|SetSecureLockIcon|StatusTextChange|TitleChange|MouseMove' + | |
| 122 | - r'|MouseEnter|MouseLeave|))\b') | |
| 123 | - | |
| 124 | -# MS-VBAL 5.4.5.1 Open Statement: | |
| 125 | -RE_OPEN_WRITE = r'(?:\bOpen\b[^\n]+\b(?:Write|Append|Binary|Output|Random)\b)' | |
| 126 | - | |
| 127 | -re_write = re.compile(r'(?i)\b(?:FileCopy|CopyFile|Kill|CreateTextFile|' | |
| 128 | - + r'VirtualAlloc|RtlMoveMemory|URLDownloadToFileA?|AltStartupPath|' | |
| 129 | - + r'ADODB\.Stream|WriteText|SaveToFile|SaveAs|SaveAsRTF|FileSaveAs|MkDir|RmDir|SaveSetting|SetAttr)\b|' + RE_OPEN_WRITE) | |
| 130 | - | |
| 131 | -# MS-VBAL 5.2.3.5 External Procedure Declaration | |
| 132 | -RE_DECLARE_LIB = r'(?:\bDeclare\b[^\n]+\bLib\b)' | |
| 133 | - | |
| 134 | -re_execute = re.compile(r'(?i)\b(?:Shell|CreateObject|GetObject|SendKeys|' | |
| 135 | - + r'MacScript|FollowHyperlink|CreateThread|ShellExecute)\b|' + RE_DECLARE_LIB) | |
| 136 | - | |
| 137 | - | |
| 138 | -# === CLASSES ================================================================= | |
| 139 | - | |
| 140 | -class Result_NoMacro(object): | |
| 141 | - exit_code = 0 | |
| 142 | - color = 'green' | |
| 143 | - name = 'No Macro' | |
| 144 | - | |
| 145 | - | |
| 146 | -class Result_NotMSOffice(object): | |
| 147 | - exit_code = 1 | |
| 148 | - color = 'green' | |
| 149 | - name = 'Not MS Office' | |
| 150 | - | |
| 151 | - | |
| 152 | -class Result_MacroOK(object): | |
| 153 | - exit_code = 2 | |
| 154 | - color = 'cyan' | |
| 155 | - name = 'Macro OK' | |
| 156 | - | |
| 157 | - | |
| 158 | -class Result_Error(object): | |
| 159 | - exit_code = 10 | |
| 160 | - color = 'yellow' | |
| 161 | - name = 'ERROR' | |
| 162 | - | |
| 163 | - | |
| 164 | -class Result_Suspicious(object): | |
| 165 | - exit_code = 20 | |
| 166 | - color = 'red' | |
| 167 | - name = 'SUSPICIOUS' | |
| 168 | - | |
| 169 | - | |
| 170 | -class MacroRaptor(object): | |
| 171 | - """ | |
| 172 | - class to scan VBA macro code to detect if it is malicious | |
| 173 | - """ | |
| 174 | - def __init__(self, vba_code): | |
| 175 | - """ | |
| 176 | - MacroRaptor constructor | |
| 177 | - :param vba_code: string containing the VBA macro code | |
| 178 | - """ | |
| 179 | - # collapse long lines first | |
| 180 | - self.vba_code = olevba.vba_collapse_long_lines(vba_code) | |
| 181 | - self.autoexec = False | |
| 182 | - self.write = False | |
| 183 | - self.execute = False | |
| 184 | - self.flags = '' | |
| 185 | - self.suspicious = False | |
| 186 | - self.autoexec_match = None | |
| 187 | - self.write_match = None | |
| 188 | - self.execute_match = None | |
| 189 | - self.matches = [] | |
| 190 | - | |
| 191 | - def scan(self): | |
| 192 | - """ | |
| 193 | - Scan the VBA macro code to detect if it is malicious | |
| 194 | - :return: | |
| 195 | - """ | |
| 196 | - m = re_autoexec.search(self.vba_code) | |
| 197 | - if m is not None: | |
| 198 | - self.autoexec = True | |
| 199 | - self.autoexec_match = m.group() | |
| 200 | - self.matches.append(m.group()) | |
| 201 | - m = re_write.search(self.vba_code) | |
| 202 | - if m is not None: | |
| 203 | - self.write = True | |
| 204 | - self.write_match = m.group() | |
| 205 | - self.matches.append(m.group()) | |
| 206 | - m = re_execute.search(self.vba_code) | |
| 207 | - if m is not None: | |
| 208 | - self.execute = True | |
| 209 | - self.execute_match = m.group() | |
| 210 | - self.matches.append(m.group()) | |
| 211 | - if self.autoexec and (self.execute or self.write): | |
| 212 | - self.suspicious = True | |
| 213 | - | |
| 214 | - def get_flags(self): | |
| 215 | - flags = '' | |
| 216 | - flags += 'A' if self.autoexec else '-' | |
| 217 | - flags += 'W' if self.write else '-' | |
| 218 | - flags += 'X' if self.execute else '-' | |
| 219 | - return flags | |
| 220 | - | |
| 221 | - | |
| 222 | -# === MAIN ==================================================================== | |
| 223 | - | |
| 224 | -def main(): | |
| 225 | - """ | |
| 226 | - Main function, called when olevba is run from the command line | |
| 227 | - """ | |
| 228 | - global log | |
| 229 | - DEFAULT_LOG_LEVEL = "warning" # Default log level | |
| 230 | - LOG_LEVELS = { | |
| 231 | - 'debug': logging.DEBUG, | |
| 232 | - 'info': logging.INFO, | |
| 233 | - 'warning': logging.WARNING, | |
| 234 | - 'error': logging.ERROR, | |
| 235 | - 'critical': logging.CRITICAL | |
| 236 | - } | |
| 237 | - | |
| 238 | - usage = 'usage: %prog [options] <filename> [filename2 ...]' | |
| 239 | - parser = optparse.OptionParser(usage=usage) | |
| 240 | - parser.add_option("-r", action="store_true", dest="recursive", | |
| 241 | - help='find files recursively in subdirectories.') | |
| 242 | - parser.add_option("-z", "--zip", dest='zip_password', type='str', default=None, | |
| 243 | - help='if the file is a zip archive, open all files from it, using the provided password (requires Python 2.6+)') | |
| 244 | - parser.add_option("-f", "--zipfname", dest='zip_fname', type='str', default='*', | |
| 245 | - help='if the file is a zip archive, file(s) to be opened within the zip. Wildcards * and ? are supported. (default:*)') | |
| 246 | - parser.add_option('-l', '--loglevel', dest="loglevel", action="store", default=DEFAULT_LOG_LEVEL, | |
| 247 | - help="logging level debug/info/warning/error/critical (default=%default)") | |
| 248 | - parser.add_option("-m", '--matches', action="store_true", dest="show_matches", | |
| 249 | - help='Show matched strings.') | |
| 250 | - | |
| 251 | - # TODO: add logfile option | |
| 252 | - | |
| 253 | - (options, args) = parser.parse_args() | |
| 254 | - | |
| 255 | - # Print help if no arguments are passed | |
| 256 | - if len(args) == 0: | |
| 257 | - print('MacroRaptor %s - http://decalage.info/python/oletools' % __version__) | |
| 258 | - print('This is work in progress, please report issues at %s' % URL_ISSUES) | |
| 259 | - print(__doc__) | |
| 260 | - parser.print_help() | |
| 261 | - print('\nAn exit code is returned based on the analysis result:') | |
| 262 | - for result in (Result_NoMacro, Result_NotMSOffice, Result_MacroOK, Result_Error, Result_Suspicious): | |
| 263 | - print(' - %d: %s' % (result.exit_code, result.name)) | |
| 264 | - sys.exit() | |
| 265 | - | |
| 266 | - # print banner with version | |
| 267 | - print('MacroRaptor %s - http://decalage.info/python/oletools' % __version__) | |
| 268 | - print('This is work in progress, please report issues at %s' % URL_ISSUES) | |
| 269 | - | |
| 270 | - logging.basicConfig(level=LOG_LEVELS[options.loglevel], format='%(levelname)-8s %(message)s') | |
| 271 | - # enable logging in the modules: | |
| 272 | - log.setLevel(logging.NOTSET) | |
| 273 | - | |
| 274 | - t = tablestream.TableStream(style=tablestream.TableStyleSlim, | |
| 275 | - header_row=['Result', 'Flags', 'Type', 'File'], | |
| 276 | - column_width=[10, 5, 4, 56]) | |
| 277 | - | |
| 278 | - exitcode = -1 | |
| 279 | - global_result = None | |
| 280 | - # TODO: handle errors in xglob, to continue processing the next files | |
| 281 | - for container, filename, data in xglob.iter_files(args, recursive=options.recursive, | |
| 282 | - zip_password=options.zip_password, zip_fname=options.zip_fname): | |
| 283 | - # ignore directory names stored in zip files: | |
| 284 | - if container and filename.endswith('/'): | |
| 285 | - continue | |
| 286 | - full_name = '%s in %s' % (filename, container) if container else filename | |
| 287 | - # try: | |
| 288 | - # # Open the file | |
| 289 | - # if data is None: | |
| 290 | - # data = open(filename, 'rb').read() | |
| 291 | - # except: | |
| 292 | - # log.exception('Error when opening file %r' % full_name) | |
| 293 | - # continue | |
| 294 | - if isinstance(data, Exception): | |
| 295 | - result = Result_Error | |
| 296 | - t.write_row([result.name, '', '', full_name], | |
| 297 | - colors=[result.color, None, None, None]) | |
| 298 | - t.write_row(['', '', '', str(data)], | |
| 299 | - colors=[None, None, None, result.color]) | |
| 300 | - else: | |
| 301 | - filetype = '???' | |
| 302 | - try: | |
| 303 | - vba_parser = olevba.VBA_Parser(filename=filename, data=data, container=container) | |
| 304 | - filetype = TYPE2TAG[vba_parser.type] | |
| 305 | - except Exception as e: | |
| 306 | - # log.error('Error when parsing VBA macros from file %r' % full_name) | |
| 307 | - # TODO: distinguish actual errors from non-MSOffice files | |
| 308 | - result = Result_Error | |
| 309 | - t.write_row([result.name, '', filetype, full_name], | |
| 310 | - colors=[result.color, None, None, None]) | |
| 311 | - t.write_row(['', '', '', str(e)], | |
| 312 | - colors=[None, None, None, result.color]) | |
| 313 | - continue | |
| 314 | - if vba_parser.detect_vba_macros(): | |
| 315 | - vba_code_all_modules = '' | |
| 316 | - try: | |
| 317 | - for (subfilename, stream_path, vba_filename, vba_code) in vba_parser.extract_all_macros(): | |
| 318 | - vba_code_all_modules += vba_code.decode('utf-8','replace') + '\n' | |
| 319 | - except Exception as e: | |
| 320 | - # log.error('Error when parsing VBA macros from file %r' % full_name) | |
| 321 | - result = Result_Error | |
| 322 | - t.write_row([result.name, '', TYPE2TAG[vba_parser.type], full_name], | |
| 323 | - colors=[result.color, None, None, None]) | |
| 324 | - t.write_row(['', '', '', str(e)], | |
| 325 | - colors=[None, None, None, result.color]) | |
| 326 | - continue | |
| 327 | - mraptor = MacroRaptor(vba_code_all_modules) | |
| 328 | - mraptor.scan() | |
| 329 | - if mraptor.suspicious: | |
| 330 | - result = Result_Suspicious | |
| 331 | - else: | |
| 332 | - result = Result_MacroOK | |
| 333 | - t.write_row([result.name, mraptor.get_flags(), filetype, full_name], | |
| 334 | - colors=[result.color, None, None, None]) | |
| 335 | - if mraptor.matches and options.show_matches: | |
| 336 | - t.write_row(['', '', '', 'Matches: %r' % mraptor.matches]) | |
| 337 | - else: | |
| 338 | - result = Result_NoMacro | |
| 339 | - t.write_row([result.name, '', filetype, full_name], | |
| 340 | - colors=[result.color, None, None, None]) | |
| 341 | - if result.exit_code > exitcode: | |
| 342 | - global_result = result | |
| 343 | - exitcode = result.exit_code | |
| 344 | - | |
| 345 | - print('') | |
| 346 | - print('Flags: A=AutoExec, W=Write, X=Execute') | |
| 347 | - print('Exit code: %d - %s' % (exitcode, global_result.name)) | |
| 348 | - sys.exit(exitcode) | |
| 19 | +from oletools.mraptor import * | |
| 20 | +from oletools.mraptor import __doc__, __version__ | |
| 349 | 21 | |
| 350 | 22 | if __name__ == '__main__': |
| 351 | 23 | main() |
| 352 | - | |
| 353 | -# Soundtrack: "Dark Child" by Marlon Williams | ... | ... |
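Review note on the mraptor3.py hunk above: the file is reduced to a stub that warns and then re-exports everything from `oletools.mraptor`. The sketch below imitates that pattern without requiring oletools. `real_tool` and `old_tool` are hypothetical names, and the stub body is executed from a string purely so the example is self-contained. It also shows why the stub needs the separate `from ... import __doc__, __version__` line: a star import skips names that begin with an underscore (unless the module defines `__all__`).

```python
import sys
import types
import warnings

# Hypothetical stand-in for the real module, registered so it can be imported.
real = types.ModuleType('real_tool')
real.__doc__ = 'real_tool documentation'
real.__version__ = '0.54'
real.main = lambda: 'main() from real_tool'
sys.modules['real_tool'] = real

# Body of a redirecting stub, mirroring the structure of the new mraptor3.py.
STUB_SOURCE = (
    "import warnings\n"
    "warnings.warn('old_tool is deprecated, real_tool should be used instead.',\n"
    "              DeprecationWarning)\n"
    "from real_tool import *\n"
    "from real_tool import __doc__, __version__\n"
)


def load_namespace(source):
    """Execute module-level source and return the resulting namespace."""
    namespace = {}
    exec(source, namespace)
    return namespace


# DeprecationWarning is hidden by Python's default filters in many
# situations, so the sketch records warnings explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    stub = load_namespace(STUB_SOURCE)

# A star import on its own does not bring in underscore-prefixed names.
star_only = load_namespace("from real_tool import *\n")

if __name__ == '__main__':
    print(stub['main']())
    print(stub['__version__'])
    print([str(w.message) for w in caught])
    print('__version__' in star_only)
```

Callers that still import the old module name keep working, and they see the warning once warnings are enabled (for example with `python -W default`).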
oletools/mraptor_milter.py
| ... | ... | @@ -98,18 +98,7 @@ from oletools import olevba, mraptor |
| 98 | 98 | |
| 99 | 99 | from Milter.utils import parse_addr |
| 100 | 100 | |
| 101 | -if sys.version_info[0] <= 2: | |
| 102 | - # Python 2.x | |
| 103 | - if sys.version_info[1] <= 6: | |
| 104 | - # Python 2.6 | |
| 105 | - # use is_zipfile backported from Python 2.7: | |
| 106 | - from oletools.thirdparty.zipfile27 import is_zipfile | |
| 107 | - else: | |
| 108 | - # Python 2.7 | |
| 109 | - from zipfile import is_zipfile | |
| 110 | -else: | |
| 111 | - # Python 3.x+ | |
| 112 | - from zipfile import is_zipfile | |
| 101 | +from zipfile import is_zipfile | |
| 113 | 102 | |
| 114 | 103 | |
| 115 | 104 | ... | ... |
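Review note on the mraptor_milter.py hunk above: with the Python 2.6 branch gone, the standard library's `zipfile.is_zipfile` is used directly. A minimal sketch of that call follows, using in-memory data so it runs anywhere. The payloads are made up for the example, and nothing here reflects how the milter actually feeds attachments to the function.

```python
import io
import zipfile

# Build a small zip archive in memory (hypothetical content).
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, 'w') as archive:
    archive.writestr('hello.txt', 'hi')
zip_buffer.seek(0)

# A payload that is clearly not a zip archive.
other_buffer = io.BytesIO(b'this is not a zip archive. ' * 10)

# is_zipfile accepts a file name or a file-like object.
zip_detected = zipfile.is_zipfile(zip_buffer)
other_detected = zipfile.is_zipfile(other_buffer)

if __name__ == '__main__':
    print(zip_detected, other_detected)
```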
oletools/msodde.py
| ... | ... | @@ -11,7 +11,6 @@ Supported formats: |
| 11 | 11 | - RTF |
| 12 | 12 | - CSV (exported from / imported into Excel) |
| 13 | 13 | - XML (exported from Word 2003, Word 2007+, Excel 2003, (Excel 2007+?) |
| 14 | -- raises an error if run with files encrypted using MS Crypto API RC4 | |
| 15 | 14 | |
| 16 | 15 | Author: Philippe Lagadec - http://www.decalage.info |
| 17 | 16 | License: BSD, see source code or documentation |
| ... | ... | @@ -22,7 +21,7 @@ http://www.decalage.info/python/oletools |
| 22 | 21 | |
| 23 | 22 | # === LICENSE ================================================================= |
| 24 | 23 | |
| 25 | -# msodde is copyright (c) 2017-2018 Philippe Lagadec (http://www.decalage.info) | |
| 24 | +# msodde is copyright (c) 2017-2019 Philippe Lagadec (http://www.decalage.info) | |
| 26 | 25 | # All rights reserved. |
| 27 | 26 | # |
| 28 | 27 | # Redistribution and use in source and binary forms, with or without |
| ... | ... | @@ -52,19 +51,30 @@ from __future__ import print_function |
| 52 | 51 | |
| 53 | 52 | import argparse |
| 54 | 53 | import os |
| 55 | -from os.path import abspath, dirname | |
| 56 | 54 | import sys |
| 57 | 55 | import re |
| 58 | 56 | import csv |
| 59 | 57 | |
| 60 | 58 | import olefile |
| 61 | 59 | |
| 60 | +# IMPORTANT: it should be possible to run oletools directly as scripts | |
| 61 | +# in any directory without installing them with pip or setup.py. | |
| 62 | +# In that case, relative imports are NOT usable. | |
| 63 | +# And to enable Python 2+3 compatibility, we need to use absolute imports, | |
| 64 | +# so we add the oletools parent folder to sys.path (absolute+normalized path): | |
| 65 | +_thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__))) | |
| 66 | +# print('_thismodule_dir = %r' % _thismodule_dir) | |
| 67 | +_parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..')) | |
| 68 | +# print('_parent_dir = %r' % _parent_dir) | |
| 69 | +if _parent_dir not in sys.path: | |
| 70 | + sys.path.insert(0, _parent_dir) | |
| 71 | + | |
| 62 | 72 | from oletools import ooxml |
| 63 | 73 | from oletools import xls_parser |
| 64 | 74 | from oletools import rtfobj |
| 65 | -from oletools import oleid | |
| 75 | +from oletools.ppt_record_parser import is_ppt | |
| 76 | +from oletools import crypto | |
| 66 | 77 | from oletools.common.log_helper import log_helper |
| 67 | -from oletools.common.errors import FileIsEncryptedError | |
| 68 | 78 | |
| 69 | 79 | # ----------------------------------------------------------------------------- |
| 70 | 80 | # CHANGELOG: |
| ... | ... | @@ -88,8 +98,11 @@ from oletools.common.errors import FileIsEncryptedError |
| 88 | 98 | # 2018-03-21 CH: - added detection for various CSV formulas (issue #259) |
| 89 | 99 | # 2018-09-11 v0.54 PL: - olefile is now a dependency |
| 90 | 100 | # 2018-10-25 CH: - detect encryption and raise error if detected |
| 101 | +# 2019-03-25 CH: - added decryption of password-protected files | |
| 102 | +# 2019-07-17 v0.55 CH: - fixed issue #267, unicode error on Python 2 | |
| 103 | + | |
| 91 | 104 | |
| 92 | -__version__ = '0.54dev4' | |
| 105 | +__version__ = '0.55.dev3' | |
| 93 | 106 | |
| 94 | 107 | # ----------------------------------------------------------------------------- |
| 95 | 108 | # TODO: field codes can be in headers/footers/comments - parse these |
| ... | ... | @@ -305,6 +318,9 @@ def process_args(cmd_line_args=None): |
| 305 | 318 | default=DEFAULT_LOG_LEVEL, |
| 306 | 319 | help="logging level debug/info/warning/error/critical " |
| 307 | 320 | "(default=%(default)s)") |
| 321 | + parser.add_argument("-p", "--password", type=str, action='append', | |
| 322 | + help='if encrypted office files are encountered, try ' | |
| 323 | + 'decryption with this password. May be repeated.') | |
| 308 | 324 | filter_group = parser.add_argument_group( |
| 309 | 325 | title='Filter which OpenXML field commands are returned', |
| 310 | 326 | description='Only applies to OpenXML (e.g. docx) and rtf, not to OLE ' |
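Review note on the `process_args` hunk above: the new `-p/--password` option uses `action='append'`, so each occurrence on the command line adds one entry to a list. The sketch below isolates that behaviour in a standalone parser. `build_parser` and the sample passwords are hypothetical and are not part of msodde.

```python
import argparse


def build_parser():
    """Build a tiny parser with one repeatable option (illustration only)."""
    parser = argparse.ArgumentParser(description='repeatable option sketch')
    parser.add_argument('-p', '--password', type=str, action='append',
                        help='password to try; may be repeated')
    return parser


# Two occurrences, mixing the short and long forms.
with_passwords = build_parser().parse_args(['-p', 'first', '--password', 'second'])
# No occurrence at all: the attribute is None, not an empty list.
without_passwords = build_parser().parse_args([])

if __name__ == '__main__':
    print(with_passwords.password)
    print(without_passwords.password)
```

Because the attribute is `None` when the option is never given, code that iterates over the passwords would typically guard with something like `options.password or []`.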
| ... | ... | @@ -348,14 +364,13 @@ def process_doc_field(data): |
| 348 | 364 | """ check if field instructions start with DDE |
| 349 | 365 | |
| 350 | 366 | expects unicode input, returns unicode output (empty if not dde) """ |
| 351 | - logger.debug('processing field {0}'.format(data)) | |
| 367 | + logger.debug(u'processing field {0}'.format(data)) | |
| 352 | 368 | |
| 353 | 369 | if data.lstrip().lower().startswith(u'dde'): |
| 354 | 370 | return data |
| 355 | - elif data.lstrip().lower().startswith(u'\x00d\x00d\x00e\x00'): | |
| 371 | + if data.lstrip().lower().startswith(u'\x00d\x00d\x00e\x00'): | |
| 356 | 372 | return data |
| 357 | - else: | |
| 358 | - return u'' | |
| 373 | + return u'' | |
| 359 | 374 | |
| 360 | 375 | |
| 361 | 376 | OLE_FIELD_START = 0x13 |
| ... | ... | @@ -379,7 +394,7 @@ def process_doc_stream(stream): |
| 379 | 394 | while True: |
| 380 | 395 | idx += 1 |
| 381 | 396 | char = stream.read(1) # loop over every single byte |
| 382 | - if len(char) == 0: | |
| 397 | + if len(char) == 0: # pylint: disable=len-as-condition | |
| 383 | 398 | break |
| 384 | 399 | else: |
| 385 | 400 | char = ord(char) |
| ... | ... | @@ -417,7 +432,7 @@ def process_doc_stream(stream): |
| 417 | 432 | pass |
| 418 | 433 | elif len(field_contents) > OLE_FIELD_MAX_SIZE: |
| 419 | 434 | logger.debug('field exceeds max size of {0}. Ignore rest' |
| 420 | - .format(OLE_FIELD_MAX_SIZE)) | |
| 435 | + .format(OLE_FIELD_MAX_SIZE)) | |
| 421 | 436 | max_size_exceeded = True |
| 422 | 437 | |
| 423 | 438 | # appending a raw byte to a unicode string here. Not clean but |
| ... | ... | @@ -437,7 +452,7 @@ def process_doc_stream(stream): |
| 437 | 452 | logger.debug('big field was not a field after all') |
| 438 | 453 | |
| 439 | 454 | logger.debug('Checked {0} characters, found {1} fields' |
| 440 | - .format(idx, len(result_parts))) | |
| 455 | + .format(idx, len(result_parts))) | |
| 441 | 456 | |
| 442 | 457 | return result_parts |
| 443 | 458 | |
| ... | ... | @@ -462,11 +477,10 @@ def process_doc(ole): |
| 462 | 477 | direntry = ole._load_direntry(sid) |
| 463 | 478 | is_stream = direntry.entry_type == olefile.STGTY_STREAM |
| 464 | 479 | logger.debug('direntry {:2d} {}: {}' |
| 465 | - .format(sid, '[orphan]' if is_orphan else direntry.name, | |
| 466 | - 'is stream of size {}'.format(direntry.size) | |
| 467 | - if is_stream else | |
| 468 | - 'no stream ({})' | |
| 469 | - .format(direntry.entry_type))) | |
| 480 | + .format(sid, '[orphan]' if is_orphan else direntry.name, | |
| 481 | + 'is stream of size {}'.format(direntry.size) | |
| 482 | + if is_stream else | |
| 483 | + 'no stream ({})'.format(direntry.entry_type))) | |
| 470 | 484 | if is_stream: |
| 471 | 485 | new_parts = process_doc_stream( |
| 472 | 486 | ole._open(direntry.isectStart, direntry.size)) |
| ... | ... | @@ -480,17 +494,23 @@ def process_xls(filepath): |
| 480 | 494 | """ find dde links in excel ole file """ |
| 481 | 495 | |
| 482 | 496 | result = [] |
| 483 | - for stream in xls_parser.XlsFile(filepath).iter_streams(): | |
| 484 | - if not isinstance(stream, xls_parser.WorkbookStream): | |
| 485 | - continue | |
| 486 | - for record in stream.iter_records(): | |
| 487 | - if not isinstance(record, xls_parser.XlsRecordSupBook): | |
| 497 | + xls_file = None | |
| 498 | + try: | |
| 499 | + xls_file = xls_parser.XlsFile(filepath) | |
| 500 | + for stream in xls_file.iter_streams(): | |
| 501 | + if not isinstance(stream, xls_parser.WorkbookStream): | |
| 488 | 502 | continue |
| 489 | - if record.support_link_type in ( | |
| 490 | - xls_parser.XlsRecordSupBook.LINK_TYPE_OLE_DDE, | |
| 491 | - xls_parser.XlsRecordSupBook.LINK_TYPE_EXTERNAL): | |
| 492 | - result.append(record.virt_path.replace(u'\u0003', u' ')) | |
| 493 | - return u'\n'.join(result) | |
| 503 | + for record in stream.iter_records(): | |
| 504 | + if not isinstance(record, xls_parser.XlsRecordSupBook): | |
| 505 | + continue | |
| 506 | + if record.support_link_type in ( | |
| 507 | + xls_parser.XlsRecordSupBook.LINK_TYPE_OLE_DDE, | |
| 508 | + xls_parser.XlsRecordSupBook.LINK_TYPE_EXTERNAL): | |
| 509 | + result.append(record.virt_path.replace(u'\u0003', u' ')) | |
| 510 | + return u'\n'.join(result) | |
| 511 | + finally: | |
| 512 | + if xls_file is not None: | |
| 513 | + xls_file.close() | |
| 494 | 514 | |
| 495 | 515 | |
| 496 | 516 | def process_docx(filepath, field_filter_mode=None): |
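Review note on the `process_xls` hunk above: the rewrite wraps the parsing loop in `try`/`finally` so the file object is closed even when iteration raises. The sketch below reproduces that shape with a hypothetical `FakeXlsFile` class standing in for `xls_parser.XlsFile`. Only the control flow is meant to carry over, not the parser API.

```python
class FakeXlsFile(object):
    """Hypothetical stand-in for a parser object that must be closed."""
    opened = []  # keeps every instance so the sketch can inspect it later

    def __init__(self, path):
        self.path = path
        self.closed = False
        FakeXlsFile.opened.append(self)

    def iter_links(self, fail=False):
        """Yield one made-up link, then optionally simulate a parse error."""
        yield u'cmd\u0003/c calc'
        if fail:
            raise ValueError('simulated parsing failure')

    def close(self):
        self.closed = True


def collect_links(path, fail=False):
    """Join all links from path, always closing the file afterwards."""
    xls_file = None
    try:
        xls_file = FakeXlsFile(path)
        return u'\n'.join(link.replace(u'\u0003', u' ')
                          for link in xls_file.iter_links(fail))
    finally:
        # Runs on normal return and on exceptions alike; the None check
        # covers the case where the constructor itself raised.
        if xls_file is not None:
            xls_file.close()


if __name__ == '__main__':
    print(collect_links('sample.xls'))
    try:
        collect_links('broken.xls', fail=True)
    except ValueError as exc:
        print('error:', exc)
    print([handle.closed for handle in FakeXlsFile.opened])
```

When the object only needs `close()` called, `contextlib.closing` is a possible alternative to the explicit `finally`. The explicit form used in the diff has the advantage of also covering a constructor that raises.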
| ... | ... | @@ -525,7 +545,8 @@ def process_docx(filepath, field_filter_mode=None): |
| 525 | 545 | else: |
| 526 | 546 | elem = curr_elem |
| 527 | 547 | if elem is None: |
| 528 | - raise BadOOXML(filepath, 'Got "None"-Element from iter_xml') | |
| 548 | + raise ooxml.BadOOXML(filepath, | |
| 549 | + 'Got "None"-Element from iter_xml') | |
| 529 | 550 | |
| 530 | 551 | # check if FLDCHARTYPE and whether "begin" or "end" tag |
| 531 | 552 | attrib_type = elem.attrib.get(ATTR_W_FLDCHARTYPE[0]) or \ |
| ... | ... | @@ -535,7 +556,7 @@ def process_docx(filepath, field_filter_mode=None): |
| 535 | 556 | level += 1 |
| 536 | 557 | if attrib_type == "end": |
| 537 | 558 | level -= 1 |
| 538 | - if level == 0 or level == -1: # edge-case; level gets -1 | |
| 559 | + if level in (0, -1): # edge-case; level gets -1 | |
| 539 | 560 | all_fields.append(ddetext) |
| 540 | 561 | ddetext = u'' |
| 541 | 562 | level = 0 # reset edge-case |
| ... | ... | @@ -564,6 +585,7 @@ def process_docx(filepath, field_filter_mode=None): |
| 564 | 585 | |
| 565 | 586 | |
| 566 | 587 | def unquote(field): |
| 588 | + """TODO: document what exactly is happening here...""" | |
| 567 | 589 | if "QUOTE" not in field or NO_QUOTES: |
| 568 | 590 | return field |
| 569 | 591 | # split into components |
| ... | ... | @@ -605,8 +627,8 @@ def field_is_blacklisted(contents): |
| 605 | 627 | index = FIELD_BLACKLIST_CMDS.index(words[0].lower()) |
| 606 | 628 | except ValueError: # first word is no blacklisted command |
| 607 | 629 | return False |
| 608 | - logger.debug('trying to match "{0}" to blacklist command {1}' | |
| 609 | - .format(contents, FIELD_BLACKLIST[index])) | |
| 630 | + logger.debug(u'trying to match "{0}" to blacklist command {1}' | |
| 631 | + .format(contents, FIELD_BLACKLIST[index])) | |
| 610 | 632 | _, nargs_required, nargs_optional, sw_with_arg, sw_solo, sw_format \ |
| 611 | 633 | = FIELD_BLACKLIST[index] |
| 612 | 634 | |
| ... | ... | @@ -617,12 +639,13 @@ def field_is_blacklisted(contents): |
| 617 | 639 | break |
| 618 | 640 | nargs += 1 |
| 619 | 641 | if nargs < nargs_required: |
| 620 | - logger.debug('too few args: found {0}, but need at least {1} in "{2}"' | |
| 621 | - .format(nargs, nargs_required, contents)) | |
| 642 | + logger.debug(u'too few args: found {0}, but need at least {1} in "{2}"' | |
| 643 | + .format(nargs, nargs_required, contents)) | |
| 622 | 644 | return False |
| 623 | - elif nargs > nargs_required + nargs_optional: | |
| 624 | - logger.debug('too many args: found {0}, but need at most {1}+{2} in "{3}"' | |
| 625 | - .format(nargs, nargs_required, nargs_optional, contents)) | |
| 645 | + if nargs > nargs_required + nargs_optional: | |
| 646 | + logger.debug(u'too many args: found {0}, but need at most {1}+{2} in ' | |
| 647 | + u'"{3}"' | |
| 648 | + .format(nargs, nargs_required, nargs_optional, contents)) | |
| 626 | 649 | return False |
| 627 | 650 | |
| 628 | 651 | # check switches |
| ... | ... | @@ -631,15 +654,15 @@ def field_is_blacklisted(contents): |
| 631 | 654 | for word in words[1+nargs:]: |
| 632 | 655 | if expect_arg: # this is an argument for the last switch |
| 633 | 656 | if arg_choices and (word not in arg_choices): |
| 634 | - logger.debug('Found invalid switch argument "{0}" in "{1}"' | |
| 635 | - .format(word, contents)) | |
| 657 | + logger.debug(u'Found invalid switch argument "{0}" in "{1}"' | |
| 658 | + .format(word, contents)) | |
| 636 | 659 | return False |
| 637 | 660 | expect_arg = False |
| 638 | 661 | arg_choices = [] # in general, do not enforce choices |
| 639 | 662 | continue # "no further questions, your honor" |
| 640 | 663 | elif not FIELD_SWITCH_REGEX.match(word): |
| 641 | - logger.debug('expected switch, found "{0}" in "{1}"' | |
| 642 | - .format(word, contents)) | |
| 664 | + logger.debug(u'expected switch, found "{0}" in "{1}"' | |
| 665 | + .format(word, contents)) | |
| 643 | 666 | return False |
| 644 | 667 | # we want a switch and we got a valid one |
| 645 | 668 | switch = word[1] |
| ... | ... | @@ -660,8 +683,8 @@ def field_is_blacklisted(contents): |
| 660 | 683 | if 'numeric' in sw_format: |
| 661 | 684 | arg_choices = [] # too many choices to list them here |
| 662 | 685 | else: |
| 663 | - logger.debug('unexpected switch {0} in "{1}"' | |
| 664 | - .format(switch, contents)) | |
| 686 | + logger.debug(u'unexpected switch {0} in "{1}"' | |
| 687 | + .format(switch, contents)) | |
| 665 | 688 | return False |
| 666 | 689 | |
| 667 | 690 | # if nothing went wrong so far, the contents seem to match the blacklist |
| ... | ... | @@ -676,7 +699,7 @@ def process_xlsx(filepath): |
| 676 | 699 | tag = elem.tag.lower() |
| 677 | 700 | if tag == 'ddelink' or tag.endswith('}ddelink'): |
| 678 | 701 | # we have found a dde link. Try to get more info about it |
| 679 | - link_info = ['DDE-Link'] | |
| 702 | + link_info = [] | |
| 680 | 703 | if 'ddeService' in elem.attrib: |
| 681 | 704 | link_info.append(elem.attrib['ddeService']) |
| 682 | 705 | if 'ddeTopic' in elem.attrib: |
| ... | ... | @@ -687,16 +710,15 @@ def process_xlsx(filepath): |
| 687 | 710 | for subfile, content_type, handle in parser.iter_non_xml(): |
| 688 | 711 | try: |
| 689 | 712 | logger.info('Parsing non-xml subfile {0} with content type {1}' |
| 690 | - .format(subfile, content_type)) | |
| 713 | + .format(subfile, content_type)) | |
| 691 | 714 | for record in xls_parser.parse_xlsb_part(handle, content_type, |
| 692 | 715 | subfile): |
| 693 | 716 | logger.debug('{0}: {1}'.format(subfile, record)) |
| 694 | 717 | if isinstance(record, xls_parser.XlsbBeginSupBook) and \ |
| 695 | 718 | record.link_type == \ |
| 696 | 719 | xls_parser.XlsbBeginSupBook.LINK_TYPE_DDE: |
| 697 | - dde_links.append('DDE-Link ' + record.string1 + ' ' + | |
| 698 | - record.string2) | |
| 699 | - except Exception: | |
| 720 | + dde_links.append(record.string1 + ' ' + record.string2) | |
| 721 | + except Exception as exc: | |
| 700 | 722 | if content_type.startswith('application/vnd.ms-excel.') or \ |
| 701 | 723 | content_type.startswith('application/vnd.ms-office.'): # pylint: disable=bad-indentation |
| 702 | 724 | # should really be able to parse these either as xml or records |
| ... | ... | @@ -727,7 +749,8 @@ class RtfFieldParser(rtfobj.RtfParser): |
| 727 | 749 | |
| 728 | 750 | def open_destination(self, destination): |
| 729 | 751 | if destination.cword == b'fldinst': |
| 730 | - logger.debug('*** Start field data at index %Xh' % destination.start) | |
| 752 | + logger.debug('*** Start field data at index %Xh' | |
| 753 | + % destination.start) | |
| 731 | 754 | |
| 732 | 755 | def close_destination(self, destination): |
| 733 | 756 | if destination.cword == b'fldinst': |
| ... | ... | @@ -758,7 +781,7 @@ def process_rtf(file_handle, field_filter_mode=None): |
| 758 | 781 | all_fields = [field.decode('ascii') for field in rtfparser.fields] |
| 759 | 782 | # apply field command filter |
| 760 | 783 | logger.debug('found {1} fields, filtering with mode "{0}"' |
| 761 | - .format(field_filter_mode, len(all_fields))) | |
| 784 | + .format(field_filter_mode, len(all_fields))) | |
| 762 | 785 | if field_filter_mode in (FIELD_FILTER_ALL, None): |
| 763 | 786 | clean_fields = all_fields |
| 764 | 787 | elif field_filter_mode == FIELD_FILTER_DDE: |
| ... | ... | @@ -815,11 +838,12 @@ def process_csv(filepath): |
| 815 | 838 | results, _ = process_csv_dialect(file_handle, delim) |
| 816 | 839 | except csv.Error: # e.g. sniffing fails |
| 817 | 840 | logger.debug('failed to csv-parse with delimiter {0!r}' |
| 818 | - .format(delim)) | |
| 841 | + .format(delim)) | |
| 819 | 842 | |
| 820 | 843 | if is_small and not results: |
| 821 | 844 | # try whole file as single cell, since sniffing fails in this case |
| 822 | - logger.debug('last attempt: take whole file as single unquoted cell') | |
| 845 | + logger.debug('last attempt: take whole file as single unquoted ' | |
| 846 | + 'cell') | |
| 823 | 847 | file_handle.seek(0) |
| 824 | 848 | match = CSV_DDE_FORMAT.match(file_handle.read(CSV_SMALL_THRESH)) |
| 825 | 849 | if match: |
| ... | ... | @@ -836,8 +860,8 @@ def process_csv_dialect(file_handle, delimiters): |
| 836 | 860 | delimiters=delimiters) |
| 837 | 861 | dialect.strict = False # microsoft is never strict |
| 838 | 862 | logger.debug('sniffed csv dialect with delimiter {0!r} ' |
| 839 | - 'and quote char {1!r}' | |
| 840 | - .format(dialect.delimiter, dialect.quotechar)) | |
| 863 | + 'and quote char {1!r}' | |
| 864 | + .format(dialect.delimiter, dialect.quotechar)) | |
| 841 | 865 | |
| 842 | 866 | # rewind file handle to start |
| 843 | 867 | file_handle.seek(0) |
| ... | ... | @@ -877,7 +901,7 @@ def process_excel_xml(filepath): |
| 877 | 901 | break |
| 878 | 902 | if formula is None: |
| 879 | 903 | continue |
| 880 | - logger.debug('found cell with formula {0}'.format(formula)) | |
| 904 | + logger.debug(u'found cell with formula {0}'.format(formula)) | |
| 881 | 905 | match = re.match(XML_DDE_FORMAT, formula) |
| 882 | 906 | if match: |
| 883 | 907 | dde_links.append(u' '.join(match.groups()[:2])) |
| ... | ... | @@ -891,19 +915,11 @@ def process_file(filepath, field_filter_mode=None): |
| 891 | 915 | if xls_parser.is_xls(filepath): |
| 892 | 916 | logger.debug('Process file as excel 2003 (xls)') |
| 893 | 917 | return process_xls(filepath) |
| 894 | - | |
| 895 | - # encrypted files also look like ole, even if office 2007+ (xml-based) | |
| 896 | - # so check for encryption, first | |
| 897 | - ole = olefile.OleFileIO(filepath, path_encoding=None) | |
| 898 | - oid = oleid.OleID(ole) | |
| 899 | - if oid.check_encrypted().value: | |
| 900 | - log.debug('is encrypted - raise error') | |
| 901 | - raise FileIsEncryptedError(filepath) | |
| 902 | - elif oid.check_powerpoint().value: | |
| 903 | - log.debug('is ppt - cannot have DDE') | |
| 918 | + if is_ppt(filepath): | |
| 919 | + logger.debug('is ppt - cannot have DDE') | |
| 904 | 920 | return u'' |
| 905 | - else: | |
| 906 | - logger.debug('Process file as word 2003 (doc)') | |
| 921 | + logger.debug('Process file as word 2003 (doc)') | |
| 922 | + with olefile.OleFileIO(filepath, path_encoding=None) as ole: | |
| 907 | 923 | return process_doc(ole) |
| 908 | 924 | |
| 909 | 925 | with open(filepath, 'rb') as file_handle: |
| ... | ... | @@ -921,22 +937,77 @@ def process_file(filepath, field_filter_mode=None): |
| 921 | 937 | if doctype == ooxml.DOCTYPE_EXCEL: |
| 922 | 938 | logger.debug('Process file as excel 2007+ (xlsx)') |
| 923 | 939 | return process_xlsx(filepath) |
| 924 | - elif doctype in (ooxml.DOCTYPE_EXCEL_XML, ooxml.DOCTYPE_EXCEL_XML2003): | |
| 940 | + if doctype in (ooxml.DOCTYPE_EXCEL_XML, ooxml.DOCTYPE_EXCEL_XML2003): | |
| 925 | 941 | logger.debug('Process file as xml from excel 2003/2007+') |
| 926 | 942 | return process_excel_xml(filepath) |
| 927 | - elif doctype in (ooxml.DOCTYPE_WORD_XML, ooxml.DOCTYPE_WORD_XML2003): | |
| 943 | + if doctype in (ooxml.DOCTYPE_WORD_XML, ooxml.DOCTYPE_WORD_XML2003): | |
| 928 | 944 | logger.debug('Process file as xml from word 2003/2007+') |
| 929 | 945 | return process_docx(filepath) |
| 930 | - elif doctype is None: | |
| 946 | + if doctype is None: | |
| 931 | 947 | logger.debug('Process file as csv') |
| 932 | 948 | return process_csv(filepath) |
| 933 | - else: # could be docx; if not: this is the old default code path | |
| 934 | - logger.debug('Process file as word 2007+ (docx)') | |
| 935 | - return process_docx(filepath, field_filter_mode) | |
| 949 | + # could be docx; if not: this is the old default code path | |
| 950 | + logger.debug('Process file as word 2007+ (docx)') | |
| 951 | + return process_docx(filepath, field_filter_mode) | |
| 936 | 952 | |
| 937 | 953 | |
| 938 | 954 | # === MAIN ================================================================= |
| 939 | 955 | |
| 956 | + | |
| 957 | +def process_maybe_encrypted(filepath, passwords=None, crypto_nesting=0, | |
| 958 | + **kwargs): | |
| 959 | + """ | |
| 960 | + Process a file that might be encrypted. | |
| 961 | + | |
| 962 | + Calls :py:func:`process_file` and if that fails tries to decrypt and | |
| 963 | + process the result. Based on recommendation in module doc string of | |
| 964 | + :py:mod:`oletools.crypto`. | |
| 965 | + | |
| 966 | + :param str filepath: path to file on disc. | |
| 967 | + :param passwords: list of passwords (str) to try for decryption or None | |
| 968 | + :param int crypto_nesting: How many decryption layers were already used to | |
| 969 | + get the given file. | |
| 970 | + :param kwargs: same as :py:func:`process_file` | |
| 971 | + :returns: same as :py:func:`process_file` | |
| 972 | + """ | |
| 973 | + result = u'' | |
| 974 | + try: | |
| 975 | + result = process_file(filepath, **kwargs) | |
| 976 | + if not crypto.is_encrypted(filepath): | |
| 977 | + return result | |
| 978 | + except Exception: | |
| 979 | + logger.debug('Ignoring exception:', exc_info=True) | |
| 980 | + if not crypto.is_encrypted(filepath): | |
| 981 | + raise | |
| 982 | + | |
| 983 | + # we reach this point only if file is encrypted | |
| 984 | + # check if this is an encrypted file in an encrypted file in an ... | |
| 985 | + if crypto_nesting >= crypto.MAX_NESTING_DEPTH: | |
| 986 | + raise crypto.MaxCryptoNestingReached(crypto_nesting, filepath) | |
| 987 | + | |
| 988 | + decrypted_file = None | |
| 989 | + if passwords is None: | |
| 990 | + passwords = crypto.DEFAULT_PASSWORDS | |
| 991 | + else: | |
| 992 | + passwords = list(passwords) + crypto.DEFAULT_PASSWORDS | |
| 993 | + try: | |
| 994 | + logger.debug('Trying to decrypt file') | |
| 995 | + decrypted_file = crypto.decrypt(filepath, passwords) | |
| 996 | + if not decrypted_file: | |
| 997 | + logger.error('Decrypt failed, run with debug output to get details') | |
| 998 | + raise crypto.WrongEncryptionPassword(filepath) | |
| 999 | + logger.info('Analyze decrypted file') | |
| 1000 | + result = process_maybe_encrypted(decrypted_file, passwords, | |
| 1001 | + crypto_nesting+1, **kwargs) | |
| 1002 | + finally: # clean up | |
| 1003 | + try: # (maybe file was not yet created) | |
| 1004 | + os.unlink(decrypted_file) | |
| 1005 | + except Exception: | |
| 1006 | + logger.debug('Ignoring exception closing decrypted file:', | |
| 1007 | + exc_info=True) | |
| 1008 | + return result | |
| 1009 | + | |
| 1010 | + | |
| 940 | 1011 | def main(cmd_line_args=None): |
| 941 | 1012 | """ Main function, called if this file is called as a script |
| 942 | 1013 | |
| ... | ... | @@ -961,13 +1032,16 @@ def main(cmd_line_args=None): |
| 961 | 1032 | text = '' |
| 962 | 1033 | return_code = 1 |
| 963 | 1034 | try: |
| 964 | - text = process_file(args.filepath, args.field_filter_mode) | |
| 1035 | + text = process_maybe_encrypted( | |
| 1036 | + args.filepath, args.password, | |
| 1037 | + field_filter_mode=args.field_filter_mode) | |
| 965 | 1038 | return_code = 0 |
| 966 | 1039 | except Exception as exc: |
| 967 | - logger.exception(exc.message) | |
| 1040 | + logger.exception(str(exc)) | |
| 968 | 1041 | |
| 969 | 1042 | logger.print_str('DDE Links:') |
| 970 | - logger.print_str(text) | |
| 1043 | + for link in text.splitlines(): | |
| 1044 | + logger.print_str(link, type='dde-link') | |
| 971 | 1045 | |
| 972 | 1046 | log_helper.end_logging() |
| 973 | 1047 | ... | ... |
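
Review note: the new `process_maybe_encrypted` in the diff above first tries the normal parser, and only when the file turns out to be encrypted does it decrypt to a temporary file, recurse with an incremented nesting counter, and remove the temporary file in a `finally` block. The control flow can be sketched in isolation as follows. This is a minimal sketch, not the oletools implementation: `MAGIC`, `is_encrypted`, `decrypt`, `process` and `MaxNestingReached` are hypothetical toy stand-ins for `oletools.crypto` and `process_file`, and the nesting limit of 10 is an assumed value.

```python
import os
import tempfile

MAX_NESTING_DEPTH = 10  # assumed value; stands in for crypto.MAX_NESTING_DEPTH
MAGIC = b'ENC:'         # toy "encryption" marker, not a real file format


class MaxNestingReached(Exception):
    """Too many decryption layers were peeled off (hypothetical name)."""


def is_encrypted(path):
    """Toy check: a file is 'encrypted' if it starts with MAGIC."""
    with open(path, 'rb') as handle:
        return handle.read(len(MAGIC)) == MAGIC


def decrypt(path):
    """Strip one toy encryption layer into a new temp file; return its path."""
    with open(path, 'rb') as handle:
        data = handle.read()[len(MAGIC):]
    fd, out_path = tempfile.mkstemp()
    with os.fdopen(fd, 'wb') as out:
        out.write(data)
    return out_path


def process(path):
    """Toy parser: fails on encrypted input, otherwise returns the text."""
    with open(path, 'rb') as handle:
        data = handle.read()
    if data.startswith(MAGIC):
        raise ValueError('cannot parse encrypted data')
    return data.decode('ascii')


def process_maybe_encrypted(path, nesting=0):
    """Process path; if it is encrypted, decrypt and process the result."""
    result = u''
    try:
        result = process(path)
        if not is_encrypted(path):
            return result
    except Exception:
        if not is_encrypted(path):
            raise  # a genuine parse error, unrelated to encryption
    # only encrypted files reach this point
    if nesting >= MAX_NESTING_DEPTH:
        raise MaxNestingReached(path)
    decrypted = None
    try:
        decrypted = decrypt(path)
        result = process_maybe_encrypted(decrypted, nesting + 1)
    finally:
        if decrypted is not None:
            os.unlink(decrypted)  # always remove the temporary plaintext
    return result
```

The nesting counter is what keeps a file that decrypts to yet another encrypted file from recursing without bound, and the `finally` block ensures that decrypted content does not linger on disk when parsing fails.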
oletools/olebrowse.py
| ... | ... | @@ -12,7 +12,7 @@ olebrowse project website: http://www.decalage.info/python/olebrowse |
| 12 | 12 | olebrowse is part of the python-oletools package: |
| 13 | 13 | http://www.decalage.info/python/oletools |
| 14 | 14 | |
| 15 | -olebrowse is copyright (c) 2012-2017, Philippe Lagadec (http://www.decalage.info) | |
| 15 | +olebrowse is copyright (c) 2012-2019, Philippe Lagadec (http://www.decalage.info) | |
| 16 | 16 | All rights reserved. |
| 17 | 17 | |
| 18 | 18 | Redistribution and use in source and binary forms, with or without modification, |
| ... | ... | @@ -43,7 +43,7 @@ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
| 43 | 43 | # 2017-04-26 v0.51 PL: - fixed absolute imports (issue #141) |
| 44 | 44 | # 2018-09-11 v0.54 PL: - olefile is now a dependency |
| 45 | 45 | |
| 46 | -__version__ = '0.54dev1' | |
| 46 | +__version__ = '0.54' | |
| 47 | 47 | |
| 48 | 48 | #------------------------------------------------------------------------------ |
| 49 | 49 | # TODO: | ... | ... |
oletools/oledir.py
| ... | ... | @@ -14,7 +14,7 @@ http://www.decalage.info/python/oletools |
| 14 | 14 | |
| 15 | 15 | #=== LICENSE ================================================================== |
| 16 | 16 | |
| 17 | -# oledir is copyright (c) 2015-2018 Philippe Lagadec (http://www.decalage.info) | |
| 17 | +# oledir is copyright (c) 2015-2019 Philippe Lagadec (http://www.decalage.info) | |
| 18 | 18 | # All rights reserved. |
| 19 | 19 | # |
| 20 | 20 | # Redistribution and use in source and binary forms, with or without modification, |
| ... | ... | @@ -53,7 +53,7 @@ from __future__ import print_function |
| 53 | 53 | # 2018-08-28 v0.54 PL: - olefile is now a dependency |
| 54 | 54 | # 2018-10-06 - colorclass is now a dependency |
| 55 | 55 | |
| 56 | -__version__ = '0.54dev1' | |
| 56 | +__version__ = '0.54' | |
| 57 | 57 | |
| 58 | 58 | #------------------------------------------------------------------------------ |
| 59 | 59 | # TODO: | ... | ... |
oletools/oleform.py
oletools/oleid.py
| ... | ... | @@ -17,7 +17,7 @@ http://www.decalage.info/python/oletools |
| 17 | 17 | |
| 18 | 18 | #=== LICENSE ================================================================= |
| 19 | 19 | |
| 20 | -# oleid is copyright (c) 2012-2018, Philippe Lagadec (http://www.decalage.info) | |
| 20 | +# oleid is copyright (c) 2012-2019, Philippe Lagadec (http://www.decalage.info) | |
| 21 | 21 | # All rights reserved. |
| 22 | 22 | # |
| 23 | 23 | # Redistribution and use in source and binary forms, with or without |
| ... | ... | @@ -59,7 +59,7 @@ from __future__ import print_function |
| 59 | 59 | # 2018-10-19 CH: - accept olefile as well as filename, return Indicators, |
| 60 | 60 | # improve encryption detection for ppt |
| 61 | 61 | |
| 62 | -__version__ = '0.54dev4' | |
| 62 | +__version__ = '0.54' | |
| 63 | 63 | |
| 64 | 64 | |
| 65 | 65 | #------------------------------------------------------------------------------ |
| ... | ... | @@ -80,22 +80,26 @@ __version__ = '0.54dev4' |
| 80 | 80 | |
| 81 | 81 | #=== IMPORTS ================================================================= |
| 82 | 82 | |
| 83 | -import argparse, sys, re, zlib, struct | |
| 83 | +import argparse, sys, re, zlib, struct, os | |
| 84 | 84 | from os.path import dirname, abspath |
| 85 | 85 | |
| 86 | -# little hack to allow absolute imports even if oletools is not installed | |
| 87 | -# (required to run oletools directly as scripts in any directory). | |
| 88 | -try: | |
| 89 | - from oletools.thirdparty import prettytable | |
| 90 | -except ImportError: | |
| 91 | - PARENT_DIR = dirname(dirname(abspath(__file__))) | |
| 92 | - if PARENT_DIR not in sys.path: | |
| 93 | - sys.path.insert(0, PARENT_DIR) | |
| 94 | - del PARENT_DIR | |
| 95 | - from oletools.thirdparty import prettytable | |
| 96 | - | |
| 97 | 86 | import olefile |
| 98 | 87 | |
| 88 | +# IMPORTANT: it should be possible to run oletools directly as scripts | |
| 89 | +# in any directory without installing them with pip or setup.py. | |
| 90 | +# In that case, relative imports are NOT usable. | |
| 91 | +# And to enable Python 2+3 compatibility, we need to use absolute imports, | |
| 92 | +# so we add the oletools parent folder to sys.path (absolute+normalized path): | |
| 93 | +_thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__))) | |
| 94 | +# print('_thismodule_dir = %r' % _thismodule_dir) | |
| 95 | +_parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..')) | |
| 96 | +# print('_parent_dir = %r' % _parent_dir) | |
| 97 | +if _parent_dir not in sys.path: | |
| 98 | + sys.path.insert(0, _parent_dir) | |
| 99 | + | |
| 100 | +from oletools.thirdparty.prettytable import prettytable | |
| 101 | +from oletools import crypto | |
| 102 | + | |
| 99 | 103 | |
| 100 | 104 | |
| 101 | 105 | #=== FUNCTIONS =============================================================== |
| ... | ... | @@ -279,20 +283,7 @@ class OleID(object): |
| 279 | 283 | self.indicators.append(encrypted) |
| 280 | 284 | if not self.ole: |
| 281 | 285 | return None |
| 282 | - # check if bit 1 of security field = 1: | |
| 283 | - # (this field may be missing for Powerpoint2000, for example) | |
| 284 | - if self.suminfo_data is None: | |
| 285 | - self.check_properties() | |
| 286 | - if 0x13 in self.suminfo_data: | |
| 287 | - if self.suminfo_data[0x13] & 1: | |
| 288 | - encrypted.value = True | |
| 289 | - # check if this is an OpenXML encrypted file | |
| 290 | - elif self.ole.exists('EncryptionInfo'): | |
| 291 | - encrypted.value = True | |
| 292 | - # or an encrypted ppt file | |
| 293 | - if self.ole.exists('EncryptedSummary') and \ | |
| 294 | - not self.ole.exists('SummaryInformation'): | |
| 295 | - encrypted.value = True | |
| 286 | + encrypted.value = crypto.is_encrypted(self.ole) | |
| 296 | 287 | return encrypted |
| 297 | 288 | |
| 298 | 289 | def check_word(self): |
| ... | ... | @@ -316,27 +307,7 @@ class OleID(object): |
| 316 | 307 | return None, None |
| 317 | 308 | if self.ole.exists('WordDocument'): |
| 318 | 309 | word.value = True |
| 319 | - # check for Word-specific encryption flag: | |
| 320 | - stream = None | |
| 321 | - try: | |
| 322 | - stream = self.ole.openstream(["WordDocument"]) | |
| 323 | - # pass header 10 bytes | |
| 324 | - stream.read(10) | |
| 325 | - # read flag structure: | |
| 326 | - temp16 = struct.unpack("H", stream.read(2))[0] | |
| 327 | - f_encrypted = (temp16 & 0x0100) >> 8 | |
| 328 | - if f_encrypted: | |
| 329 | - # correct encrypted indicator if present or add one | |
| 330 | - encrypt_ind = self.get_indicator('encrypted') | |
| 331 | - if encrypt_ind: | |
| 332 | - encrypt_ind.value = True | |
| 333 | - else: | |
| 334 | - self.indicators.append('encrypted', True, name='Encrypted') | |
| 335 | - except Exception: | |
| 336 | - raise | |
| 337 | - finally: | |
| 338 | - if stream is not None: | |
| 339 | - stream.close() | |
| 310 | + | |
| 340 | 311 | # check for VBA macros: |
| 341 | 312 | if self.ole.exists('Macros'): |
| 342 | 313 | macros.value = True | ... | ... |
oletools/olemap.py
| ... | ... | @@ -13,7 +13,7 @@ http://www.decalage.info/python/oletools |
| 13 | 13 | |
| 14 | 14 | #=== LICENSE ================================================================== |
| 15 | 15 | |
| 16 | -# olemap is copyright (c) 2015-2018 Philippe Lagadec (http://www.decalage.info) | |
| 16 | +# olemap is copyright (c) 2015-2019 Philippe Lagadec (http://www.decalage.info) | |
| 17 | 17 | # All rights reserved. |
| 18 | 18 | # |
| 19 | 19 | # Redistribution and use in source and binary forms, with or without modification, |
| ... | ... | @@ -52,8 +52,9 @@ http://www.decalage.info/python/oletools |
| 52 | 52 | # 2017-03-23 PL: - only display the header by default |
| 53 | 53 | # - added option --exdata to display extra data in hex |
| 54 | 54 | # 2018-08-28 v0.54 PL: - olefile is now a dependency |
| 55 | +# 2019-07-10 v0.55 PL: - fixed display of OLE header CLSID (issue #394) | |
| 55 | 56 | |
| 56 | -__version__ = '0.54dev1' | |
| 57 | +__version__ = '0.55.dev3' | |
| 57 | 58 | |
| 58 | 59 | #------------------------------------------------------------------------------ |
| 59 | 60 | # TODO: |
| ... | ... | @@ -121,7 +122,7 @@ def show_header(ole, extra_data=False): |
| 121 | 122 | print("OLE HEADER:") |
| 122 | 123 | t = tablestream.TableStream([24, 16, 79-(4+24+16)], header_row=['Attribute', 'Value', 'Description']) |
| 123 | 124 | t.write_row(['OLE Signature (hex)', binascii.b2a_hex(ole.header_signature).upper(), 'Should be D0CF11E0A1B11AE1']) |
| 124 | - t.write_row(['Header CLSID (hex)', binascii.b2a_hex(ole.header_clsid).upper(), 'Should be 0']) | |
| 125 | + t.write_row(['Header CLSID', ole.header_clsid, 'Should be empty (0)']) | |
| 125 | 126 | t.write_row(['Minor Version', '%04X' % ole.minor_version, 'Should be 003E']) |
| 126 | 127 | t.write_row(['Major Version', '%04X' % ole.dll_version, 'Should be 3 or 4']) |
| 127 | 128 | t.write_row(['Byte Order', '%04X' % ole.byte_order, 'Should be FFFE (little endian)']) | ... | ... |
oletools/olemeta.py
| ... | ... | @@ -15,7 +15,7 @@ http://www.decalage.info/python/oletools |
| 15 | 15 | |
| 16 | 16 | #=== LICENSE ================================================================= |
| 17 | 17 | |
| 18 | -# olemeta is copyright (c) 2013-2018, Philippe Lagadec (http://www.decalage.info) | |
| 18 | +# olemeta is copyright (c) 2013-2019, Philippe Lagadec (http://www.decalage.info) | |
| 19 | 19 | # All rights reserved. |
| 20 | 20 | # |
| 21 | 21 | # Redistribution and use in source and binary forms, with or without modification, |
| ... | ... | @@ -51,7 +51,7 @@ http://www.decalage.info/python/oletools |
| 51 | 51 | # 2017-05-04 PL: - added optparse and xglob (issue #141) |
| 52 | 52 | # 2018-09-11 v0.54 PL: - olefile is now a dependency |
| 53 | 53 | |
| 54 | -__version__ = '0.54dev1' | |
| 54 | +__version__ = '0.54' | |
| 55 | 55 | |
| 56 | 56 | #------------------------------------------------------------------------------ |
| 57 | 57 | # TODO: | ... | ... |
oletools/oleobj.py
| ... | ... | @@ -14,7 +14,7 @@ http://www.decalage.info/python/oletools |
| 14 | 14 | |
| 15 | 15 | # === LICENSE ================================================================= |
| 16 | 16 | |
| 17 | -# oleobj is copyright (c) 2015-2018 Philippe Lagadec (http://www.decalage.info) | |
| 17 | +# oleobj is copyright (c) 2015-2019 Philippe Lagadec (http://www.decalage.info) | |
| 18 | 18 | # All rights reserved. |
| 19 | 19 | # |
| 20 | 20 | # Redistribution and use in source and binary forms, with or without |
| ... | ... | @@ -89,7 +89,7 @@ from oletools.ooxml import XmlParser |
| 89 | 89 | # 2018-09-11 v0.54 PL: - olefile is now a dependency |
| 90 | 90 | # 2018-10-30 SA: - added detection of external links (PR #317) |
| 91 | 91 | |
| 92 | -__version__ = '0.54dev4' | |
| 92 | +__version__ = '0.54' | |
| 93 | 93 | |
| 94 | 94 | # ----------------------------------------------------------------------------- |
| 95 | 95 | # TODO: |
| ... | ... | @@ -526,29 +526,35 @@ def find_ole_in_ppt(filename): |
| 526 | 526 | can contain the actual embedded file we are looking for (caller will check |
| 527 | 527 | for these). |
| 528 | 528 | """ |
| 529 | - for stream in PptFile(filename).iter_streams(): | |
| 530 | - for record_idx, record in enumerate(stream.iter_records()): | |
| 531 | - if isinstance(record, PptRecordExOleVbaActiveXAtom): | |
| 532 | - ole = None | |
| 533 | - try: | |
| 534 | - data_start = next(record.iter_uncompressed()) | |
| 535 | - if data_start[:len(olefile.MAGIC)] != olefile.MAGIC: | |
| 536 | - continue # could be an ActiveX control or VBA Storage | |
| 537 | - | |
| 538 | - # otherwise, this should be an OLE object | |
| 539 | - log.debug('Found record with embedded ole object in ppt ' | |
| 540 | - '(stream "{0}", record no {1})' | |
| 541 | - .format(stream.name, record_idx)) | |
| 542 | - ole = record.get_data_as_olefile() | |
| 543 | - yield ole | |
| 544 | - except IOError: | |
| 545 | - log.warning('Error reading data from {0} stream or ' | |
| 546 | - 'interpreting it as OLE object' | |
| 547 | - .format(stream.name)) | |
| 548 | - log.debug('', exc_info=True) | |
| 549 | - finally: | |
| 550 | - if ole is not None: | |
| 551 | - ole.close() | |
| 529 | + ppt_file = None | |
| 530 | + try: | |
| 531 | + ppt_file = PptFile(filename) | |
| 532 | + for stream in ppt_file.iter_streams(): | |
| 533 | + for record_idx, record in enumerate(stream.iter_records()): | |
| 534 | + if isinstance(record, PptRecordExOleVbaActiveXAtom): | |
| 535 | + ole = None | |
| 536 | + try: | |
| 537 | + data_start = next(record.iter_uncompressed()) | |
| 538 | + if data_start[:len(olefile.MAGIC)] != olefile.MAGIC: | |
| 539 | + continue # could be ActiveX control / VBA Storage | |
| 540 | + | |
| 541 | + # otherwise, this should be an OLE object | |
| 542 | + log.debug('Found record with embedded ole object in ' | |
| 543 | + 'ppt (stream "{0}", record no {1})' | |
| 544 | + .format(stream.name, record_idx)) | |
| 545 | + ole = record.get_data_as_olefile() | |
| 546 | + yield ole | |
| 547 | + except IOError: | |
| 548 | + log.warning('Error reading data from {0} stream or ' | |
| 549 | + 'interpreting it as OLE object' | |
| 550 | + .format(stream.name)) | |
| 551 | + log.debug('', exc_info=True) | |
| 552 | + finally: | |
| 553 | + if ole is not None: | |
| 554 | + ole.close() | |
| 555 | + finally: | |
| 556 | + if ppt_file is not None: | |
| 557 | + ppt_file.close() | |
| 552 | 558 | |
| 553 | 559 | |
| 554 | 560 | class FakeFile(io.RawIOBase): |
| ... | ... | @@ -750,13 +756,13 @@ def process_file(filename, data, output_dir=None): |
| 750 | 756 | |
| 751 | 757 | xml_parser = None |
| 752 | 758 | if is_zipfile(filename): |
| 753 | - log.info('file is a OOXML file, looking for relationships with external links') | |
| 759 | + log.info('file could be an OOXML file, looking for relationships with ' | |
| 760 | + 'external links') | |
| 754 | 761 | xml_parser = XmlParser(filename) |
| 755 | 762 | for relationship, target in find_external_relationships(xml_parser): |
| 756 | 763 | did_dump = True |
| 757 | 764 | print("Found relationship '%s' with external link %s" % (relationship, target)) |
| 758 | 765 | |
| 759 | - | |
| 760 | 766 | # look for ole files inside file (e.g. unzip docx) |
| 761 | 767 | # have to finish work on every ole stream inside iteration, since handles |
| 762 | 768 | # are closed in find_ole |
| ... | ... | @@ -765,9 +771,9 @@ def process_file(filename, data, output_dir=None): |
| 765 | 771 | continue |
| 766 | 772 | |
| 767 | 773 | for path_parts in ole.listdir(): |
| 774 | + stream_path = '/'.join(path_parts) | |
| 775 | + log.debug('Checking stream %r', stream_path) | |
| 768 | 776 | if path_parts[-1] == '\x01Ole10Native': |
| 769 | - stream_path = '/'.join(path_parts) | |
| 770 | - log.debug('Checking stream %r', stream_path) | |
| 771 | 777 | stream = None |
| 772 | 778 | try: |
| 773 | 779 | stream = ole.openstream(path_parts) | ... | ... |
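
Review note: the `find_ole_in_ppt` change above wraps the generator body in `try`/`finally` so that the `PptFile` is closed even when the caller stops iterating early. The sketch below illustrates why this works for generators. `Resource` and `iter_items` are hypothetical names used only for illustration; they are not part of oletools.

```python
class Resource(object):
    """Hypothetical stand-in for PptFile: records whether close() was called."""

    def __init__(self, items):
        self.items = items
        self.closed = False

    def close(self):
        self.closed = True


def iter_items(resource):
    """Yield items; close the resource when iteration ends for any reason."""
    try:
        for item in resource.items:
            yield item
    finally:
        # runs on normal exhaustion, on an exception in the loop body,
        # and when the consumer calls close() on the generator
        resource.close()
```

Calling `close()` on a suspended generator raises `GeneratorExit` at the paused `yield`, so the `finally` clause also runs when a consumer abandons the loop and closes the generator explicitly. Without an explicit `close()`, cleanup is deferred until the generator object is garbage-collected.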
oletools/oletimes.py
| ... | ... | @@ -16,7 +16,7 @@ http://www.decalage.info/python/oletools |
| 16 | 16 | |
| 17 | 17 | #=== LICENSE ================================================================= |
| 18 | 18 | |
| 19 | -# oletimes is copyright (c) 2013-2017, Philippe Lagadec (http://www.decalage.info) | |
| 19 | +# oletimes is copyright (c) 2013-2019, Philippe Lagadec (http://www.decalage.info) | |
| 20 | 20 | # All rights reserved. |
| 21 | 21 | # |
| 22 | 22 | # Redistribution and use in source and binary forms, with or without modification, |
| ... | ... | @@ -52,7 +52,7 @@ http://www.decalage.info/python/oletools |
| 52 | 52 | # 2017-05-04 PL: - added optparse and xglob (issue #141) |
| 53 | 53 | # 2018-09-11 v0.54 PL: - olefile is now a dependency |
| 54 | 54 | |
| 55 | -__version__ = '0.54dev1' | |
| 55 | +__version__ = '0.54' | |
| 56 | 56 | |
| 57 | 57 | #------------------------------------------------------------------------------ |
| 58 | 58 | # TODO: | ... | ... |
oletools/olevba.py
| ... | ... | @@ -7,14 +7,14 @@ olevba is a script to parse OLE and OpenXML files such as MS Office documents |
| 7 | 7 | and analyze malicious macros. |
| 8 | 8 | |
| 9 | 9 | Supported formats: |
| 10 | -- Word 97-2003 (.doc, .dot), Word 2007+ (.docm, .dotm) | |
| 11 | -- Excel 97-2003 (.xls), Excel 2007+ (.xlsm, .xlsb) | |
| 12 | -- PowerPoint 97-2003 (.ppt), PowerPoint 2007+ (.pptm, .ppsm) | |
| 13 | -- Word/PowerPoint 2007+ XML (aka Flat OPC) | |
| 14 | -- Word 2003 XML (.xml) | |
| 15 | -- Word/Excel Single File Web Page / MHTML (.mht) | |
| 16 | -- Publisher (.pub) | |
| 17 | -- raises an error if run with files encrypted using MS Crypto API RC4 | |
| 10 | + - Word 97-2003 (.doc, .dot), Word 2007+ (.docm, .dotm) | |
| 11 | + - Excel 97-2003 (.xls), Excel 2007+ (.xlsm, .xlsb) | |
| 12 | + - PowerPoint 97-2003 (.ppt), PowerPoint 2007+ (.pptm, .ppsm) | |
| 13 | + - Word/PowerPoint 2007+ XML (aka Flat OPC) | |
| 14 | + - Word 2003 XML (.xml) | |
| 15 | + - Word/Excel Single File Web Page / MHTML (.mht) | |
| 16 | + - Publisher (.pub) | |
| 17 | + - raises an error if run with files encrypted using MS Crypto API RC4 | |
| 18 | 18 | |
| 19 | 19 | Author: Philippe Lagadec - http://www.decalage.info |
| 20 | 20 | License: BSD, see source code or documentation |
| ... | ... | @@ -28,7 +28,7 @@ https://github.com/unixfreak0037/officeparser |
| 28 | 28 | |
| 29 | 29 | # === LICENSE ================================================================== |
| 30 | 30 | |
| 31 | -# olevba is copyright (c) 2014-2018 Philippe Lagadec (http://www.decalage.info) | |
| 31 | +# olevba is copyright (c) 2014-2019 Philippe Lagadec (http://www.decalage.info) | |
| 32 | 32 | # All rights reserved. |
| 33 | 33 | # |
| 34 | 34 | # Redistribution and use in source and binary forms, with or without modification, |
| ... | ... | @@ -210,8 +210,16 @@ from __future__ import print_function |
| 210 | 210 | # 2018-09-11 v0.54 PL: - olefile is now a dependency |
| 211 | 211 | # 2018-10-08 PL: - replace backspace before printing to console (issue #358) |
| 212 | 212 | # 2018-10-25 CH: - detect encryption and raise error if detected |
| 213 | +# 2018-12-03 PL: - uses tablestream (+colors) instead of prettytable | |
| 214 | +# 2018-12-06 PL: - colorize the suspicious keywords found in VBA code | |
| 215 | +# 2019-01-01 PL: - removed support for Python 2.6 | |
| 216 | +# 2019-03-18 PL: - added XLM/XLF macros detection for Excel OLE files | |
| 217 | +# 2019-03-25 CH: - added decryption of password-protected files | |
| 218 | +# 2019-04-09 PL: - decompress_stream accepts bytes (issue #422) | |
| 219 | +# 2019-05-23 v0.55 PL: - added option --pcode to call pcodedmp and display P-code | |
| 220 | +# 2019-06-05 PL: - added VBA stomping detection | |
| 213 | 221 | |
| 214 | -__version__ = '0.54dev4' | |
| 222 | +__version__ = '0.55.dev3' | |
| 215 | 223 | |
| 216 | 224 | #------------------------------------------------------------------------------ |
| 217 | 225 | # TODO: |
| ... | ... | @@ -236,23 +244,20 @@ __version__ = '0.54dev4' |
| 236 | 244 | # - extract_macros: use combined struct.unpack instead of many calls |
| 237 | 245 | # - all except clauses should target specific exceptions |
| 238 | 246 | |
| 239 | -#------------------------------------------------------------------------------ | |
| 247 | +# ------------------------------------------------------------------------------ | |
| 240 | 248 | # REFERENCES: |
| 241 | 249 | # - [MS-OVBA]: Microsoft Office VBA File Format Structure |
| 242 | 250 | # http://msdn.microsoft.com/en-us/library/office/cc313094%28v=office.12%29.aspx |
| 243 | 251 | # - officeparser: https://github.com/unixfreak0037/officeparser |
| 244 | 252 | |
| 245 | 253 | |
| 246 | -#--- IMPORTS ------------------------------------------------------------------ | |
| 254 | +# --- IMPORTS ------------------------------------------------------------------ | |
| 247 | 255 | |
| 248 | 256 | import sys |
| 249 | 257 | import os |
| 250 | 258 | import logging |
| 251 | 259 | import struct |
| 252 | -try: | |
| 253 | - from cStringIO import StringIO | |
| 254 | -except ImportError: | |
| 255 | - from io import StringIO | |
| 260 | +from io import BytesIO, StringIO | |
| 256 | 261 | import math |
| 257 | 262 | import zipfile |
| 258 | 263 | import re |
| ... | ... | @@ -261,7 +266,7 @@ import binascii |
| 261 | 266 | import base64 |
| 262 | 267 | import zlib |
| 263 | 268 | import email # for MHTML parsing |
| 264 | -import string # for printable | |
| 269 | +import string # for printable | |
| 265 | 270 | import json # for json output mode (argument --json) |
| 266 | 271 | |
| 267 | 272 | # import lxml or ElementTree for XML parsing: |
| ... | ... | @@ -297,11 +302,11 @@ _thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__))) |
| 297 | 302 | # print('_thismodule_dir = %r' % _thismodule_dir) |
| 298 | 303 | _parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..')) |
| 299 | 304 | # print('_parent_dir = %r' % _parent_dir) |
| 300 | -if not _parent_dir in sys.path: | |
| 305 | +if _parent_dir not in sys.path: | |
| 301 | 306 | sys.path.insert(0, _parent_dir) |
| 302 | 307 | |
| 303 | 308 | import olefile |
| 304 | -from oletools.thirdparty.prettytable import prettytable | |
| 309 | +from oletools.thirdparty.tablestream import tablestream | |
| 305 | 310 | from oletools.thirdparty.xglob import xglob, PathNotFoundException |
| 306 | 311 | from pyparsing import \ |
| 307 | 312 | CaselessKeyword, CaselessLiteral, Combine, Forward, Literal, \ |
| ... | ... | @@ -311,9 +316,8 @@ from pyparsing import \ |
| 311 | 316 | from oletools import ppt_parser |
| 312 | 317 | from oletools import oleform |
| 313 | 318 | from oletools import rtfobj |
| 314 | -from oletools import oleid | |
| 315 | -from oletools.common.errors import FileIsEncryptedError | |
| 316 | - | |
| 319 | +from oletools import crypto | |
| 320 | +from oletools.common import codepages | |
| 317 | 321 | |
| 318 | 322 | # monkeypatch email to fix issue #32: |
| 319 | 323 | # allow header lines without ":" |
| ... | ... | @@ -324,30 +328,77 @@ email.feedparser.headerRE = re.compile(r'^(From |[\041-\071\073-\176]{1,}:?|[\t |
| 324 | 328 | |
| 325 | 329 | if sys.version_info[0] <= 2: |
| 326 | 330 | # Python 2.x |
| 327 | - if sys.version_info[1] <= 6: | |
| 328 | - # Python 2.6 | |
| 329 | - # use is_zipfile backported from Python 2.7: | |
| 330 | - from thirdparty.zipfile27 import is_zipfile | |
| 331 | - else: | |
| 332 | - # Python 2.7 | |
| 333 | - from zipfile import is_zipfile | |
| 331 | + PYTHON2 = True | |
| 332 | + # to use ord on bytes/bytearray items the same way in Python 2+3 | |
| 333 | + # on Python 2, just use the normal ord() because items are bytes | |
| 334 | + byte_ord = ord | |
| 335 | + #: Default string encoding for the olevba API | |
| 336 | + DEFAULT_API_ENCODING = 'utf8' # on Python 2: UTF-8 (bytes) | |
| 334 | 337 | else: |
| 335 | 338 | # Python 3.x+ |
| 336 | - from zipfile import is_zipfile | |
| 339 | + PYTHON2 = False | |
| 340 | + | |
| 341 | + # to use ord on bytes/bytearray items the same way in Python 2+3 | |
| 342 | + # on Python 3, items are int, so just return the item | |
| 343 | + def byte_ord(x): | |
| 344 | + return x | |
| 337 | 345 | # xrange is now called range: |
| 338 | 346 | xrange = range |
| 347 | + # unichr does not exist anymore, only chr: | |
| 348 | + unichr = chr | |
| 349 | + # json2ascii also needs "unicode": | |
| 350 | + unicode = str | |
| 351 | + from functools import reduce | |
| 352 | + #: Default string encoding for the olevba API | |
| 353 | + DEFAULT_API_ENCODING = None # on Python 3: None (unicode) | |
| 354 | + # Python 3.0 - 3.4 support: | |
| 355 | + # From https://gist.github.com/ynkdir/867347/c5e188a4886bc2dd71876c7e069a7b00b6c16c61 | |
| 356 | + if sys.version_info < (3, 5): | |
| 357 | + import codecs | |
| 358 | + _backslashreplace_errors = codecs.lookup_error("backslashreplace") | |
| 359 | + | |
| 360 | + def backslashreplace_errors(exc): | |
| 361 | + if isinstance(exc, UnicodeDecodeError): | |
| 362 | + u = "".join("\\x{0:02x}".format(c) for c in exc.object[exc.start:exc.end]) | |
| 363 | + return u, exc.end | |
| 364 | + return _backslashreplace_errors(exc) | |
| 365 | + | |
| 366 | + codecs.register_error("backslashreplace", backslashreplace_errors) | |
| 367 | + | |
| 368 | + | |
| 369 | +def unicode2str(unicode_string): | |
| 370 | + """ | |
| 371 | + convert a unicode string to a native str: | |
| 372 | + - on Python 3, it returns the same string | |
| 373 | + - on Python 2, the string is encoded with UTF-8 to a bytes str | |
| 374 | + :param unicode_string: unicode string to be converted | |
| 375 | + :return: the string converted to str | |
| 376 | + :rtype: str | |
| 377 | + """ | |
| 378 | + if PYTHON2: | |
| 379 | + return unicode_string.encode('utf8', errors='replace') | |
| 380 | + else: | |
| 381 | + return unicode_string | |
| 339 | 382 | |
| 340 | -# === LOGGING ================================================================= | |
| 341 | 383 | |
| 342 | -class NullHandler(logging.Handler): | |
| 384 | +def bytes2str(bytes_string, encoding='utf8'): | |
| 343 | 385 | """ |
| 344 | - Log Handler without output, to avoid printing messages if logging is not | |
| 345 | - configured by the main application. | |
| 346 | - Python 2.7 has logging.NullHandler, but this is necessary for 2.6: | |
| 347 | - see https://docs.python.org/2.6/library/logging.html#configuring-logging-for-a-library | |
| 386 | + convert a bytes string to a native str: | |
| 387 | + - on Python 2, it returns the same string (bytes=str) | |
| 388 | + - on Python 3, the string is decoded using the provided encoding | |
| 389 | + (UTF-8 by default) to a unicode str | |
| 390 | + :param bytes_string: bytes string to be converted | |
| 391 | + :param encoding: codec to be used for decoding | |
| 392 | + :return: the string converted to str | |
| 393 | + :rtype: str | |
| 348 | 394 | """ |
| 349 | - def emit(self, record): | |
| 350 | - pass | |
| 395 | + if PYTHON2: | |
| 396 | + return bytes_string | |
| 397 | + else: | |
| 398 | + return bytes_string.decode('utf8', errors='replace') | |
| 399 | + | |
| 400 | + | |
| 401 | +# === LOGGING ================================================================= | |
| 351 | 402 | |
| 352 | 403 | def get_logger(name, level=logging.CRITICAL+1): |
| 353 | 404 | """ |
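The hunk above introduces the Python 2/3 compatibility helpers (`PYTHON2`, `byte_ord`, `unicode2str`, `bytes2str`). A minimal standalone sketch of the same idea follows. It is not the module itself: it assumes, based on the `bytes2str` docstring, that the `encoding` argument is meant to be passed through to `decode`.

```python
import sys

# True on Python 2.x, False on Python 3.x (same test as in the diff)
PYTHON2 = sys.version_info[0] <= 2

if PYTHON2:
    # items of a Python 2 str are 1-character strings, so ord() is needed
    byte_ord = ord
else:
    def byte_ord(x):
        # items of Python 3 bytes/bytearray are already ints
        return x


def bytes2str(bytes_string, encoding='utf8'):
    """Return a native str: unchanged on Python 2, decoded on Python 3.

    Assumption: the encoding parameter is forwarded to decode().
    """
    if PYTHON2:
        return bytes_string
    return bytes_string.decode(encoding, errors='replace')


def unicode2str(unicode_string):
    """Return a native str: UTF-8 encoded on Python 2, unchanged on Python 3."""
    if PYTHON2:
        return unicode_string.encode('utf8', errors='replace')
    return unicode_string


if __name__ == '__main__':
    print(bytes2str(b'Sub AutoOpen()'))
    print(bytes2str(b'caf\xe9', encoding='latin-1'))
    print(byte_ord(b'A'[0]))
```

With `errors='replace'`, undecodable bytes become U+FFFD instead of raising, which suits text pulled from untrusted documents.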
| ... | ... | @@ -361,7 +412,7 @@ def get_logger(name, level=logging.CRITICAL+1): |
| 361 | 412 | # First, test if there is already a logger with the same name, else it |
| 362 | 413 | # will generate duplicate messages (due to duplicate handlers): |
| 363 | 414 | if name in logging.Logger.manager.loggerDict: |
| 364 | - #NOTE: another less intrusive but more "hackish" solution would be to | |
| 415 | + # NOTE: another less intrusive but more "hackish" solution would be to | |
| 365 | 416 | # use getLogger then test if its effective level is not default. |
| 366 | 417 | logger = logging.getLogger(name) |
| 367 | 418 | # make sure level is OK: |
| ... | ... | @@ -371,7 +422,7 @@ def get_logger(name, level=logging.CRITICAL+1): |
| 371 | 422 | logger = logging.getLogger(name) |
| 372 | 423 | # only add a NullHandler for this logger, it is up to the application |
| 373 | 424 | # to configure its own logging: |
| 374 | - logger.addHandler(NullHandler()) | |
| 425 | + logger.addHandler(logging.NullHandler()) | |
| 375 | 426 | logger.setLevel(level) |
| 376 | 427 | return logger |
| 377 | 428 | |
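With Python 2.6 support dropped, the custom `NullHandler` class is replaced by `logging.NullHandler` from the standard library. A small sketch of the resulting library-logging pattern, simplified from the `get_logger` function shown in the diff (the logger name `demo_olevba` is made up for illustration):

```python
import logging


def get_logger(name, level=logging.CRITICAL + 1):
    """Return a silent logger, without adding duplicate handlers."""
    if name in logging.Logger.manager.loggerDict:
        # logger already exists: reuse it, only make sure the level is set
        logger = logging.getLogger(name)
        logger.setLevel(level)
        return logger
    logger = logging.getLogger(name)
    # NullHandler discards records; the application configures real output
    logger.addHandler(logging.NullHandler())
    logger.setLevel(level)
    return logger


if __name__ == '__main__':
    log = get_logger('demo_olevba')       # hypothetical logger name
    log = get_logger('demo_olevba')       # second call adds no handler
    print(len(log.handlers))
    log.critical('not shown: level is above CRITICAL')
```

The default level `logging.CRITICAL + 1` keeps the logger silent until the application lowers it, which is what `enable_logging()` does in the module.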
| ... | ... | @@ -388,6 +439,7 @@ def enable_logging(): |
| 388 | 439 | log.setLevel(logging.NOTSET) |
| 389 | 440 | # Also enable logging in the ppt_parser module: |
| 390 | 441 | ppt_parser.enable_logging() |
| 442 | + crypto.enable_logging() | |
| 391 | 443 | |
| 392 | 444 | |
| 393 | 445 | |
| ... | ... | @@ -564,7 +616,8 @@ AUTOEXEC_KEYWORDS = { |
| 564 | 616 | |
| 565 | 617 | # MS Excel: |
| 566 | 618 | 'Runs when the Excel Workbook is opened': |
| 567 | - ('Auto_Open', 'Workbook_Open', 'Workbook_Activate'), | |
| 619 | + ('Auto_Open', 'Workbook_Open', 'Workbook_Activate', 'Auto_Ope'), | |
| 620 | + # TODO: "Auto_Ope" is temporarily here because of a bug in plugin_biff, which misses the last byte in "Auto_Open"... | |
| 568 | 621 | 'Runs when the Excel Workbook is closed': |
| 569 | 622 | ('Auto_Close', 'Workbook_Close'), |
| 570 | 623 | |
| ... | ... | @@ -600,9 +653,10 @@ SUSPICIOUS_KEYWORDS = { |
| 600 | 653 | ('CreateTextFile', 'ADODB.Stream', 'WriteText', 'SaveToFile'), |
| 601 | 654 | #CreateTextFile: http://msdn.microsoft.com/en-us/library/office/gg264617%28v=office.15%29.aspx |
| 602 | 655 | #ADODB.Stream sample: http://pastebin.com/Z4TMyuq6 |
| 656 | + # ShellExecute: https://twitter.com/StanHacked/status/1075088449768693762 | |
| 603 | 657 | 'May run an executable file or a system command': |
| 604 | 658 | ('Shell', 'vbNormal', 'vbNormalFocus', 'vbHide', 'vbMinimizedFocus', 'vbMaximizedFocus', 'vbNormalNoFocus', |
| 605 | - 'vbMinimizedNoFocus', 'WScript.Shell', 'Run', 'ShellExecute'), | |
| 659 | + 'vbMinimizedNoFocus', 'WScript.Shell', 'Run', 'ShellExecute', 'ShellExecuteA', 'shell32'), | |
| 606 | 660 | # MacScript: see https://msdn.microsoft.com/en-us/library/office/gg264812.aspx |
| 607 | 661 | 'May run an executable file or a system command on a Mac': |
| 608 | 662 | ('MacScript',), |
| ... | ... | @@ -620,6 +674,8 @@ SUSPICIOUS_KEYWORDS = { |
| 620 | 674 | 'invoke-command', 'scriptblock', 'Invoke-Expression', 'AuthorizationManager'), |
| 621 | 675 | 'May run an executable file or a system command using PowerShell': |
| 622 | 676 | ('Start-Process',), |
| 677 | + 'May run an executable file or a system command using Excel 4 Macros (XLM/XLF)': | |
| 678 | + ('EXEC',), | |
| 623 | 679 | 'May hide the application': |
| 624 | 680 | ('Application.Visible', 'ShowWindow', 'SW_HIDE'), |
| 625 | 681 | 'May create a directory': |
| ... | ... | @@ -635,6 +691,8 @@ SUSPICIOUS_KEYWORDS = { |
| 635 | 691 | ('New-Object',), |
| 636 | 692 | 'May run an application (if combined with CreateObject)': |
| 637 | 693 | ('Shell.Application',), |
| 694 | + 'May run an Excel 4 Macro (aka XLM/XLF)': | |
| 695 | + ('ExecuteExcel4Macro',), | |
| 638 | 696 | 'May enumerate application windows (if combined with Shell.Application object)': |
| 639 | 697 | ('Windows', 'FindWindow'), |
| 640 | 698 | 'May run code from a DLL': |
| ... | ... | @@ -643,9 +701,12 @@ SUSPICIOUS_KEYWORDS = { |
| 643 | 701 | 'May run code from a library on a Mac': |
| 644 | 702 | #TODO: regex to find declare+lib on same line - see mraptor |
| 645 | 703 | ('libc.dylib', 'dylib'), |
| 704 | + 'May run code from a DLL using Excel 4 Macros (XLM/XLF)': | |
| 705 | + ('REGISTER',), | |
| 646 | 706 | 'May inject code into another process': |
| 647 | - ('CreateThread', 'VirtualAlloc', # (issue #9) suggested by Davy Douhine - used by MSF payload | |
| 648 | - 'VirtualAllocEx', 'RtlMoveMemory', | |
| 707 | + ('CreateThread', 'CreateUserThread', 'VirtualAlloc', # (issue #9) suggested by Davy Douhine - used by MSF payload | |
| 708 | + 'VirtualAllocEx', 'RtlMoveMemory', 'WriteProcessMemory', | |
| 709 | + 'SetContextThread', 'QueueApcThread', 'WriteVirtualMemory', 'VirtualProtect' | |
| 649 | 710 | ), |
| 650 | 711 | 'May run a shellcode in memory': |
| 651 | 712 | ('EnumSystemLanguageGroupsW?', # Used by Hancitor in Oct 2016 |
| ... | ... | @@ -777,7 +838,8 @@ re_dridex_string = re.compile(r'"[0-9A-Za-z]{20,}"') |
| 777 | 838 | re_nothex_check = re.compile(r'[G-Zg-z]') |
| 778 | 839 | |
| 779 | 840 | # regex to extract printable strings (at least 5 chars) from VBA Forms: |
| 780 | -re_printable_string = re.compile(r'[\t\r\n\x20-\xFF]{5,}') | |
| 841 | +# (must be bytes for Python 3) | |
| 842 | +re_printable_string = re.compile(b'[\\t\\r\\n\\x20-\\xFF]{5,}') | |
| 781 | 843 | |
| 782 | 844 | |
| 783 | 845 | # === PARTIAL VBA GRAMMAR ==================================================== |
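The change to `re_printable_string` is needed because, on Python 3, a pattern compiled from a `str` cannot be applied to `bytes` data. A short sketch of the bytes pattern in use (the sample data is made up for illustration):

```python
import re

# same pattern as in the diff: runs of 5+ printable bytes, as a bytes regex
re_printable_string = re.compile(b'[\\t\\r\\n\\x20-\\xFF]{5,}')

# hypothetical form data: two text runs separated by control bytes
data = b'\x00\x01Hello World\x00\x02abc\x00'

# 'abc' is shorter than 5 bytes, so only the first run is returned
found = re_printable_string.findall(data)
print(found)

# a str pattern applied to bytes fails on Python 3
try:
    re.compile(r'[\x20-\xFF]{5,}').findall(data)
    str_pattern_ok = True
except TypeError:
    str_pattern_ok = False
print(str_pattern_ok)
```

The doubled backslashes in the bytes literal leave the escapes for the regex engine to interpret, so the character class matches the same byte range as the earlier raw string.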
| ... | ... | @@ -918,10 +980,13 @@ vba_chr = Suppress( |
| 918 | 980 | def vba_chr_tostr(t): |
| 919 | 981 | try: |
| 920 | 982 | i = t[0] |
| 921 | - # normal, non-unicode character: | |
| 922 | 983 | if i>=0 and i<=255: |
| 984 | + # normal, non-unicode character: | |
| 985 | + # TODO: check if it needs to be converted to bytes for Python 3 | |
| 923 | 986 | return VbaExpressionString(chr(i)) |
| 924 | 987 | else: |
| 988 | + # unicode character | |
| 989 | + # Note: this distinction is only needed for Python 2 | |
| 925 | 990 | return VbaExpressionString(unichr(i).encode('utf-8', 'backslashreplace')) |
| 926 | 991 | except ValueError: |
| 927 | 992 | log.exception('ERROR: incorrect parameter value for chr(): %r' % i) |
| ... | ... | @@ -1188,8 +1253,9 @@ def decompress_stream(compressed_container): |
| 1188 | 1253 | """ |
| 1189 | 1254 | Decompress a stream according to MS-OVBA section 2.4.1 |
| 1190 | 1255 | |
| 1191 | - compressed_container: string compressed according to the MS-OVBA 2.4.1.3.6 Compression algorithm | |
| 1192 | - return the decompressed container as a string (bytes) | |
| 1256 | + :param compressed_container bytearray: bytearray or bytes compressed according to the MS-OVBA 2.4.1.3.6 Compression algorithm | |
| 1257 | + :return: the decompressed container as a bytes string | |
| 1258 | + :rtype: bytes | |
| 1193 | 1259 | """ |
| 1194 | 1260 | # 2.4.1.2 State Variables |
| 1195 | 1261 | |
| ... | ... | @@ -1211,10 +1277,14 @@ def decompress_stream(compressed_container): |
| 1211 | 1277 | # DecompressedChunkStart: The location of the first byte of the DecompressedChunk (section 2.4.1.1.3) within the |
| 1212 | 1278 | # DecompressedBuffer (section 2.4.1.1.2). |
| 1213 | 1279 | |
| 1214 | - decompressed_container = '' # result | |
| 1280 | + # Check the input is a bytearray, otherwise convert it (assuming it's bytes): | |
| 1281 | + if not isinstance(compressed_container, bytearray): | |
| 1282 | + compressed_container = bytearray(compressed_container) | |
| 1283 | + # raise TypeError('decompress_stream requires a bytearray as input') | |
| 1284 | + decompressed_container = bytearray() # result | |
| 1215 | 1285 | compressed_current = 0 |
| 1216 | 1286 | |
| 1217 | - sig_byte = ord(compressed_container[compressed_current]) | |
| 1287 | + sig_byte = compressed_container[compressed_current] | |
| 1218 | 1288 | if sig_byte != 0x01: |
| 1219 | 1289 | raise ValueError('invalid signature byte {0:02X}'.format(sig_byte)) |
| 1220 | 1290 | |
| ... | ... | @@ -1260,7 +1330,7 @@ def decompress_stream(compressed_container): |
| 1260 | 1330 | # MS-OVBA 2.4.1.3.3 Decompressing a RawChunk |
| 1261 | 1331 | # uncompressed chunk: read the next 4096 bytes as-is |
| 1262 | 1332 | #TODO: check if there are at least 4096 bytes left |
| 1263 | - decompressed_container += compressed_container[compressed_current:compressed_current + 4096] | |
| 1333 | + decompressed_container.extend([compressed_container[compressed_current:compressed_current + 4096]]) | |
| 1264 | 1334 | compressed_current += 4096 |
| 1265 | 1335 | else: |
| 1266 | 1336 | # MS-OVBA 2.4.1.3.2 Decompressing a CompressedChunk |
| ... | ... | @@ -1271,7 +1341,7 @@ def decompress_stream(compressed_container): |
| 1271 | 1341 | # log.debug('compressed_current = %d / compressed_end = %d' % (compressed_current, compressed_end)) |
| 1272 | 1342 | # FlagByte: 8 bits indicating if the following 8 tokens are either literal (1 byte of plain text) or |
| 1273 | 1343 | # copy tokens (reference to a previous literal token) |
| 1274 | - flag_byte = ord(compressed_container[compressed_current]) | |
| 1344 | + flag_byte = compressed_container[compressed_current] | |
| 1275 | 1345 | compressed_current += 1 |
| 1276 | 1346 | for bit_index in xrange(0, 8): |
| 1277 | 1347 | # log.debug('bit_index=%d / compressed_current=%d / compressed_end=%d' % (bit_index, compressed_current, compressed_end)) |
| ... | ... | @@ -1283,7 +1353,7 @@ def decompress_stream(compressed_container): |
| 1283 | 1353 | #log.debug('bit_index=%d: flag_bit=%d' % (bit_index, flag_bit)) |
| 1284 | 1354 | if flag_bit == 0: # LiteralToken |
| 1285 | 1355 | # copy one byte directly to output |
| 1286 | - decompressed_container += compressed_container[compressed_current] | |
| 1356 | + decompressed_container.extend([compressed_container[compressed_current]]) | |
| 1287 | 1357 | compressed_current += 1 |
| 1288 | 1358 | else: # CopyToken |
| 1289 | 1359 | # MS-OVBA 2.4.1.3.19.2 Unpack CopyToken |
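`decompress_stream` now builds its output in a `bytearray`. A brief sketch of how `bytearray.extend` and indexing behave for the three token types in these hunks. The helper `copy_token` and the sample bytes are hypothetical and only illustrate the pattern; this is not the module's code.

```python
src = bytearray(b'\x01abcdef')
out = bytearray()

# LiteralToken: indexing a bytearray gives an int, so it is wrapped in a
# list for extend() (append() is equivalent)
out.extend([src[1]])
out.append(src[2])

# RawChunk: a slice is already an iterable of ints, extend() takes it as-is
out.extend(src[3:6])

# wrapping a slice in a list makes extend() see one non-int item
try:
    bytearray().extend([src[3:6]])
    list_of_slice_ok = True
except TypeError:
    list_of_slice_ok = False


def copy_token(buf, offset, length):
    """Append `length` bytes starting `offset` bytes back from the end.

    Copied one byte at a time because the source range may overlap the
    bytes being appended (hypothetical helper, for illustration only).
    """
    start = len(buf) - offset
    for index in range(start, start + length):
        buf.append(buf[index])
    return buf


if __name__ == '__main__':
    print(bytes(out))
    print(list_of_slice_ok)
    print(bytes(copy_token(bytearray(b'ab'), 2, 6)))
```

The byte-by-byte copy matters when `length` exceeds `offset`: the token then repeats bytes it has just written, which a single slice copy would not reproduce.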
| ... | ... | @@ -1299,520 +1369,664 @@ def decompress_stream(compressed_container): |
| 1299 | 1369 | #log.debug('offset=%d length=%d' % (offset, length)) |
| 1300 | 1370 | copy_source = len(decompressed_container) - offset |
| 1301 | 1371 | for index in xrange(copy_source, copy_source + length): |
| 1302 | - decompressed_container += decompressed_container[index] | |
| 1372 | + decompressed_container.extend([decompressed_container[index]]) | |
| 1303 | 1373 | compressed_current += 2 |
| 1304 | - return decompressed_container | |
| 1374 | + return bytes(decompressed_container) | |
| 1305 | 1375 | |
| 1306 | 1376 | |
| 1307 | -def _extract_vba(ole, vba_root, project_path, dir_path, relaxed=False): | |
| 1377 | +class VBA_Module(object): | |
| 1308 | 1378 | """ |
| 1309 | - Extract VBA macros from an OleFileIO object. | |
| 1310 | - Internal function, do not call directly. | |
| 1311 | - | |
| 1312 | - vba_root: path to the VBA root storage, containing the VBA storage and the PROJECT stream | |
| 1313 | - vba_project: path to the PROJECT stream | |
| 1314 | - :param relaxed: If True, only create info/debug log entry if data is not as expected | |
| 1315 | - (e.g. opening substream fails); if False, raise an error in this case | |
| 1316 | - This is a generator, yielding (stream path, VBA filename, VBA source code) for each VBA code stream | |
| 1379 | + Class to parse a VBA module from an OLE file, and to store all the corresponding | |
| 1380 | + metadata and VBA source code. | |
| 1317 | 1381 | """ |
| 1318 | - # Open the PROJECT stream: | |
| 1319 | - project = ole.openstream(project_path) | |
| 1320 | - log.debug('relaxed is %s' % relaxed) | |
| 1321 | - | |
| 1322 | - # sample content of the PROJECT stream: | |
| 1323 | - | |
| 1324 | - ## ID="{5312AC8A-349D-4950-BDD0-49BE3C4DD0F0}" | |
| 1325 | - ## Document=ThisDocument/&H00000000 | |
| 1326 | - ## Module=NewMacros | |
| 1327 | - ## Name="Project" | |
| 1328 | - ## HelpContextID="0" | |
| 1329 | - ## VersionCompatible32="393222000" | |
| 1330 | - ## CMG="F1F301E705E705E705E705" | |
| 1331 | - ## DPB="8F8D7FE3831F2020202020" | |
| 1332 | - ## GC="2D2FDD81E51EE61EE6E1" | |
| 1333 | - ## | |
| 1334 | - ## [Host Extender Info] | |
| 1335 | - ## &H00000001={3832D640-CF90-11CF-8E43-00A0C911005A};VBE;&H00000000 | |
| 1336 | - ## &H00000002={000209F2-0000-0000-C000-000000000046};Word8.0;&H00000000 | |
| 1337 | - ## | |
| 1338 | - ## [Workspace] | |
| 1339 | - ## ThisDocument=22, 29, 339, 477, Z | |
| 1340 | - ## NewMacros=-4, 42, 832, 510, C | |
| 1341 | - | |
| 1342 | - code_modules = {} | |
| 1343 | - | |
| 1344 | - for line in project: | |
| 1345 | - line = line.strip() | |
| 1346 | - if '=' in line: | |
| 1347 | - # split line at the 1st equal sign: | |
| 1348 | - name, value = line.split('=', 1) | |
| 1349 | - # looking for code modules | |
| 1350 | - # add the code module as a key in the dictionary | |
| 1351 | - # the value will be the extension needed later | |
| 1352 | - # The value is converted to lowercase, to allow case-insensitive matching (issue #3) | |
| 1353 | - value = value.lower() | |
| 1354 | - if name == 'Document': | |
| 1355 | - # split value at the 1st slash, keep 1st part: | |
| 1356 | - value = value.split('/', 1)[0] | |
| 1357 | - code_modules[value] = CLASS_EXTENSION | |
| 1358 | - elif name == 'Module': | |
| 1359 | - code_modules[value] = MODULE_EXTENSION | |
| 1360 | - elif name == 'Class': | |
| 1361 | - code_modules[value] = CLASS_EXTENSION | |
| 1362 | - elif name == 'BaseClass': | |
| 1363 | - code_modules[value] = FORM_EXTENSION | |
| 1364 | - | |
| 1365 | - # read data from dir stream (compressed) | |
| 1366 | - dir_compressed = ole.openstream(dir_path).read() | |
| 1367 | - | |
| 1368 | - def check_value(name, expected, value): | |
| 1369 | - if expected != value: | |
| 1370 | - if relaxed: | |
| 1371 | - log.error("invalid value for {0} expected {1:04X} got {2:04X}" | |
| 1372 | - .format(name, expected, value)) | |
| 1373 | - else: | |
| 1374 | - raise UnexpectedDataError(dir_path, name, expected, value) | |
| 1375 | - | |
| 1376 | - dir_stream = StringIO(decompress_stream(dir_compressed)) | |
| 1377 | - | |
| 1378 | - # PROJECTSYSKIND Record | |
| 1379 | - projectsyskind_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1380 | - check_value('PROJECTSYSKIND_Id', 0x0001, projectsyskind_id) | |
| 1381 | - projectsyskind_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1382 | - check_value('PROJECTSYSKIND_Size', 0x0004, projectsyskind_size) | |
| 1383 | - projectsyskind_syskind = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1384 | - if projectsyskind_syskind == 0x00: | |
| 1385 | - log.debug("16-bit Windows") | |
| 1386 | - elif projectsyskind_syskind == 0x01: | |
| 1387 | - log.debug("32-bit Windows") | |
| 1388 | - elif projectsyskind_syskind == 0x02: | |
| 1389 | - log.debug("Macintosh") | |
| 1390 | - elif projectsyskind_syskind == 0x03: | |
| 1391 | - log.debug("64-bit Windows") | |
| 1392 | - else: | |
| 1393 | - log.error("invalid PROJECTSYSKIND_SysKind {0:04X}".format(projectsyskind_syskind)) | |
| 1394 | - | |
| 1395 | - # PROJECTLCID Record | |
| 1396 | - projectlcid_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1397 | - check_value('PROJECTLCID_Id', 0x0002, projectlcid_id) | |
| 1398 | - projectlcid_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1399 | - check_value('PROJECTLCID_Size', 0x0004, projectlcid_size) | |
| 1400 | - projectlcid_lcid = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1401 | - check_value('PROJECTLCID_Lcid', 0x409, projectlcid_lcid) | |
| 1402 | - | |
| 1403 | - # PROJECTLCIDINVOKE Record | |
| 1404 | - projectlcidinvoke_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1405 | - check_value('PROJECTLCIDINVOKE_Id', 0x0014, projectlcidinvoke_id) | |
| 1406 | - projectlcidinvoke_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1407 | - check_value('PROJECTLCIDINVOKE_Size', 0x0004, projectlcidinvoke_size) | |
| 1408 | - projectlcidinvoke_lcidinvoke = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1409 | - check_value('PROJECTLCIDINVOKE_LcidInvoke', 0x409, projectlcidinvoke_lcidinvoke) | |
| 1410 | - | |
| 1411 | - # PROJECTCODEPAGE Record | |
| 1412 | - projectcodepage_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1413 | - check_value('PROJECTCODEPAGE_Id', 0x0003, projectcodepage_id) | |
| 1414 | - projectcodepage_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1415 | - check_value('PROJECTCODEPAGE_Size', 0x0002, projectcodepage_size) | |
| 1416 | - projectcodepage_codepage = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1417 | - | |
| 1418 | - # PROJECTNAME Record | |
| 1419 | - projectname_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1420 | - check_value('PROJECTNAME_Id', 0x0004, projectname_id) | |
| 1421 | - projectname_sizeof_projectname = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1422 | - if projectname_sizeof_projectname < 1 or projectname_sizeof_projectname > 128: | |
| 1423 | - log.error("PROJECTNAME_SizeOfProjectName value not in range: {0}".format(projectname_sizeof_projectname)) | |
| 1424 | - projectname_projectname = dir_stream.read(projectname_sizeof_projectname) | |
| 1425 | - unused = projectname_projectname | |
| 1426 | - | |
| 1427 | - # PROJECTDOCSTRING Record | |
| 1428 | - projectdocstring_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1429 | - check_value('PROJECTDOCSTRING_Id', 0x0005, projectdocstring_id) | |
| 1430 | - projectdocstring_sizeof_docstring = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1431 | - if projectdocstring_sizeof_docstring > 2000: | |
| 1432 | - log.error( | |
| 1433 | - "PROJECTDOCSTRING_SizeOfDocString value not in range: {0}".format(projectdocstring_sizeof_docstring)) | |
| 1434 | - projectdocstring_docstring = dir_stream.read(projectdocstring_sizeof_docstring) | |
| 1435 | - projectdocstring_reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1436 | - check_value('PROJECTDOCSTRING_Reserved', 0x0040, projectdocstring_reserved) | |
| 1437 | - projectdocstring_sizeof_docstring_unicode = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1438 | - if projectdocstring_sizeof_docstring_unicode % 2 != 0: | |
| 1439 | - log.error("PROJECTDOCSTRING_SizeOfDocStringUnicode is not even") | |
| 1440 | - projectdocstring_docstring_unicode = dir_stream.read(projectdocstring_sizeof_docstring_unicode) | |
| 1441 | - unused = projectdocstring_docstring | |
| 1442 | - unused = projectdocstring_docstring_unicode | |
| 1443 | - | |
| 1444 | - # PROJECTHELPFILEPATH Record - MS-OVBA 2.3.4.2.1.7 | |
| 1445 | - projecthelpfilepath_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1446 | - check_value('PROJECTHELPFILEPATH_Id', 0x0006, projecthelpfilepath_id) | |
| 1447 | - projecthelpfilepath_sizeof_helpfile1 = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1448 | - if projecthelpfilepath_sizeof_helpfile1 > 260: | |
| 1449 | - log.error( | |
| 1450 | - "PROJECTHELPFILEPATH_SizeOfHelpFile1 value not in range: {0}".format(projecthelpfilepath_sizeof_helpfile1)) | |
| 1451 | - projecthelpfilepath_helpfile1 = dir_stream.read(projecthelpfilepath_sizeof_helpfile1) | |
| 1452 | - projecthelpfilepath_reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1453 | - check_value('PROJECTHELPFILEPATH_Reserved', 0x003D, projecthelpfilepath_reserved) | |
| 1454 | - projecthelpfilepath_sizeof_helpfile2 = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1455 | - if projecthelpfilepath_sizeof_helpfile2 != projecthelpfilepath_sizeof_helpfile1: | |
| 1456 | - log.error("PROJECTHELPFILEPATH_SizeOfHelpFile1 does not equal PROJECTHELPFILEPATH_SizeOfHelpFile2") | |
| 1457 | - projecthelpfilepath_helpfile2 = dir_stream.read(projecthelpfilepath_sizeof_helpfile2) | |
| 1458 | - if projecthelpfilepath_helpfile2 != projecthelpfilepath_helpfile1: | |
| 1459 | - log.error("PROJECTHELPFILEPATH_HelpFile1 does not equal PROJECTHELPFILEPATH_HelpFile2") | |
| 1460 | - | |
| 1461 | - # PROJECTHELPCONTEXT Record | |
| 1462 | - projecthelpcontext_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1463 | - check_value('PROJECTHELPCONTEXT_Id', 0x0007, projecthelpcontext_id) | |
| 1464 | - projecthelpcontext_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1465 | - check_value('PROJECTHELPCONTEXT_Size', 0x0004, projecthelpcontext_size) | |
| 1466 | - projecthelpcontext_helpcontext = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1467 | - unused = projecthelpcontext_helpcontext | |
| 1468 | - | |
| 1469 | - # PROJECTLIBFLAGS Record | |
| 1470 | - projectlibflags_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1471 | - check_value('PROJECTLIBFLAGS_Id', 0x0008, projectlibflags_id) | |
| 1472 | - projectlibflags_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1473 | - check_value('PROJECTLIBFLAGS_Size', 0x0004, projectlibflags_size) | |
| 1474 | - projectlibflags_projectlibflags = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1475 | - check_value('PROJECTLIBFLAGS_ProjectLibFlags', 0x0000, projectlibflags_projectlibflags) | |
| 1476 | - | |
| 1477 | - # PROJECTVERSION Record | |
| 1478 | - projectversion_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1479 | - check_value('PROJECTVERSION_Id', 0x0009, projectversion_id) | |
| 1480 | - projectversion_reserved = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1481 | - check_value('PROJECTVERSION_Reserved', 0x0004, projectversion_reserved) | |
| 1482 | - projectversion_versionmajor = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1483 | - projectversion_versionminor = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1484 | - unused = projectversion_versionmajor | |
| 1485 | - unused = projectversion_versionminor | |
| 1486 | - | |
| 1487 | - # PROJECTCONSTANTS Record | |
| 1488 | - projectconstants_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1489 | - check_value('PROJECTCONSTANTS_Id', 0x000C, projectconstants_id) | |
| 1490 | - projectconstants_sizeof_constants = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1491 | - if projectconstants_sizeof_constants > 1015: | |
| 1492 | - log.error( | |
| 1493 | - "PROJECTCONSTANTS_SizeOfConstants value not in range: {0}".format(projectconstants_sizeof_constants)) | |
| 1494 | - projectconstants_constants = dir_stream.read(projectconstants_sizeof_constants) | |
| 1495 | - projectconstants_reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1496 | - check_value('PROJECTCONSTANTS_Reserved', 0x003C, projectconstants_reserved) | |
| 1497 | - projectconstants_sizeof_constants_unicode = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1498 | - if projectconstants_sizeof_constants_unicode % 2 != 0: | |
| 1499 | - log.error("PROJECTCONSTANTS_SizeOfConstantsUnicode is not even") | |
| 1500 | - projectconstants_constants_unicode = dir_stream.read(projectconstants_sizeof_constants_unicode) | |
| 1501 | - unused = projectconstants_constants | |
| 1502 | - unused = projectconstants_constants_unicode | |
| 1503 | - | |
| 1504 | - # array of REFERENCE records | |
| 1505 | - check = None | |
| 1506 | - while True: | |
| 1507 | - check = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1508 | - log.debug("reference type = {0:04X}".format(check)) | |
| 1509 | - if check == 0x000F: | |
| 1510 | - break | |
| 1511 | - | |
| 1512 | - if check == 0x0016: | |
| 1513 | - # REFERENCENAME | |
| 1514 | - reference_id = check | |
| 1515 | - reference_sizeof_name = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1516 | - reference_name = dir_stream.read(reference_sizeof_name) | |
| 1517 | - reference_reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1518 | - # According to [MS-OVBA] 2.3.4.2.2.2 REFERENCENAME Record: | |
| 1519 | - # "Reserved (2 bytes): MUST be 0x003E. MUST be ignored." | |
| 1520 | - # So let's ignore it, otherwise it crashes on some files (issue #132) | |
| 1521 | - # PR #135 by @c1fe: | |
| 1522 | - # contrary to the specification I think that the unicode name | |
| 1523 | - # is optional. if reference_reserved is not 0x003E I think it | |
| 1524 | - # is actually the start of another REFERENCE record | |
| 1525 | - # at least when projectsyskind_syskind == 0x02 (Macintosh) | |
| 1526 | - if reference_reserved == 0x003E: | |
| 1527 | - #if reference_reserved not in (0x003E, 0x000D): | |
| 1528 | - # raise UnexpectedDataError(dir_path, 'REFERENCE_Reserved', | |
| 1529 | - # 0x0003E, reference_reserved) | |
| 1530 | - reference_sizeof_name_unicode = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1531 | - reference_name_unicode = dir_stream.read(reference_sizeof_name_unicode) | |
| 1532 | - unused = reference_id | |
| 1533 | - unused = reference_name | |
| 1534 | - unused = reference_name_unicode | |
| 1535 | - continue | |
| 1536 | - else: | |
| 1537 | - check = reference_reserved | |
| 1538 | - log.debug("reference type = {0:04X}".format(check)) | |
| 1539 | - | |
| 1540 | - if check == 0x0033: | |
| 1541 | - # REFERENCEORIGINAL (followed by REFERENCECONTROL) | |
| 1542 | - referenceoriginal_id = check | |
| 1543 | - referenceoriginal_sizeof_libidoriginal = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1544 | - referenceoriginal_libidoriginal = dir_stream.read(referenceoriginal_sizeof_libidoriginal) | |
| 1545 | - unused = referenceoriginal_id | |
| 1546 | - unused = referenceoriginal_libidoriginal | |
| 1547 | - continue | |
| 1548 | - | |
| 1549 | - if check == 0x002F: | |
| 1550 | - # REFERENCECONTROL | |
| 1551 | - referencecontrol_id = check | |
| 1552 | - referencecontrol_sizetwiddled = struct.unpack("<L", dir_stream.read(4))[0] # ignore | |
| 1553 | - referencecontrol_sizeof_libidtwiddled = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1554 | - referencecontrol_libidtwiddled = dir_stream.read(referencecontrol_sizeof_libidtwiddled) | |
| 1555 | - referencecontrol_reserved1 = struct.unpack("<L", dir_stream.read(4))[0] # ignore | |
| 1556 | - check_value('REFERENCECONTROL_Reserved1', 0x0000, referencecontrol_reserved1) | |
| 1557 | - referencecontrol_reserved2 = struct.unpack("<H", dir_stream.read(2))[0] # ignore | |
| 1558 | - check_value('REFERENCECONTROL_Reserved2', 0x0000, referencecontrol_reserved2) | |
| 1559 | - unused = referencecontrol_id | |
| 1560 | - unused = referencecontrol_sizetwiddled | |
| 1561 | - unused = referencecontrol_libidtwiddled | |
| 1562 | - # optional field | |
| 1563 | - check2 = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1564 | - if check2 == 0x0016: | |
| 1565 | - referencecontrol_namerecordextended_id = check | |
| 1566 | - referencecontrol_namerecordextended_sizeof_name = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1567 | - referencecontrol_namerecordextended_name = dir_stream.read( | |
| 1568 | - referencecontrol_namerecordextended_sizeof_name) | |
| 1569 | - referencecontrol_namerecordextended_reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1570 | - if referencecontrol_namerecordextended_reserved == 0x003E: | |
| 1571 | - referencecontrol_namerecordextended_sizeof_name_unicode = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1572 | - referencecontrol_namerecordextended_name_unicode = dir_stream.read( | |
| 1573 | - referencecontrol_namerecordextended_sizeof_name_unicode) | |
| 1574 | - referencecontrol_reserved3 = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1575 | - unused = referencecontrol_namerecordextended_id | |
| 1576 | - unused = referencecontrol_namerecordextended_name | |
| 1577 | - unused = referencecontrol_namerecordextended_name_unicode | |
| 1578 | - else: | |
| 1579 | - referencecontrol_reserved3 = referencecontrol_namerecordextended_reserved | |
| 1580 | - else: | |
| 1581 | - referencecontrol_reserved3 = check2 | |
| 1582 | - | |
| 1583 | - check_value('REFERENCECONTROL_Reserved3', 0x0030, referencecontrol_reserved3) | |
| 1584 | - referencecontrol_sizeextended = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1585 | - referencecontrol_sizeof_libidextended = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1586 | - referencecontrol_libidextended = dir_stream.read(referencecontrol_sizeof_libidextended) | |
| 1587 | - referencecontrol_reserved4 = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1588 | - referencecontrol_reserved5 = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1589 | - referencecontrol_originaltypelib = dir_stream.read(16) | |
| 1590 | - referencecontrol_cookie = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1591 | - unused = referencecontrol_sizeextended | |
| 1592 | - unused = referencecontrol_libidextended | |
| 1593 | - unused = referencecontrol_reserved4 | |
| 1594 | - unused = referencecontrol_reserved5 | |
| 1595 | - unused = referencecontrol_originaltypelib | |
| 1596 | - unused = referencecontrol_cookie | |
| 1597 | - continue | |
| 1598 | - | |
| 1599 | - if check == 0x000D: | |
| 1600 | - # REFERENCEREGISTERED | |
| 1601 | - referenceregistered_id = check | |
| 1602 | - referenceregistered_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1603 | - referenceregistered_sizeof_libid = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1604 | - referenceregistered_libid = dir_stream.read(referenceregistered_sizeof_libid) | |
| 1605 | - referenceregistered_reserved1 = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1606 | - check_value('REFERENCEREGISTERED_Reserved1', 0x0000, referenceregistered_reserved1) | |
| 1607 | - referenceregistered_reserved2 = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1608 | - check_value('REFERENCEREGISTERED_Reserved2', 0x0000, referenceregistered_reserved2) | |
| 1609 | - unused = referenceregistered_id | |
| 1610 | - unused = referenceregistered_size | |
| 1611 | - unused = referenceregistered_libid | |
| 1612 | - continue | |
| 1613 | 1382 | |
| 1614 | - if check == 0x000E: | |
| 1615 | - # REFERENCEPROJECT | |
| 1616 | - referenceproject_id = check | |
| 1617 | - referenceproject_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1618 | - referenceproject_sizeof_libidabsolute = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1619 | - referenceproject_libidabsolute = dir_stream.read(referenceproject_sizeof_libidabsolute) | |
| 1620 | - referenceproject_sizeof_libidrelative = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1621 | - referenceproject_libidrelative = dir_stream.read(referenceproject_sizeof_libidrelative) | |
| 1622 | - referenceproject_majorversion = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1623 | - referenceproject_minorversion = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1624 | - unused = referenceproject_id | |
| 1625 | - unused = referenceproject_size | |
| 1626 | - unused = referenceproject_libidabsolute | |
| 1627 | - unused = referenceproject_libidrelative | |
| 1628 | - unused = referenceproject_majorversion | |
| 1629 | - unused = referenceproject_minorversion | |
| 1630 | - continue | |
| 1383 | + def __init__(self, project, dir_stream, module_index): | |
| 1384 | + """ | |
| 1385 | + Parse a VBA Module record from the dir stream of a VBA project. | |
| 1386 | + Reference: MS-OVBA 2.3.4.2.3.2 MODULE Record | |
| 1631 | 1387 | |
| 1632 | - log.error('invalid or unknown check Id {0:04X}'.format(check)) | |
| 1633 | - # raise an exception instead of stopping abruptly (issue #180) | |
| 1634 | - raise UnexpectedDataError(dir_path, 'reference type', (0x0F, 0x16, 0x33, 0x2F, 0x0D, 0x0E), check) | |
| 1635 | - #sys.exit(0) | |
| 1636 | - | |
| 1637 | - projectmodules_id = check #struct.unpack("<H", dir_stream.read(2))[0] | |
| 1638 | - check_value('PROJECTMODULES_Id', 0x000F, projectmodules_id) | |
| 1639 | - projectmodules_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1640 | - check_value('PROJECTMODULES_Size', 0x0002, projectmodules_size) | |
| 1641 | - projectmodules_count = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1642 | - projectmodules_projectcookierecord_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1643 | - check_value('PROJECTMODULES_ProjectCookieRecord_Id', 0x0013, projectmodules_projectcookierecord_id) | |
| 1644 | - projectmodules_projectcookierecord_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1645 | - check_value('PROJECTMODULES_ProjectCookieRecord_Size', 0x0002, projectmodules_projectcookierecord_size) | |
| 1646 | - projectmodules_projectcookierecord_cookie = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1647 | - unused = projectmodules_projectcookierecord_cookie | |
| 1648 | - | |
| 1649 | - # short function to simplify unicode text output | |
| 1650 | - uni_out = lambda unicode_text: unicode_text.encode('utf-8', 'replace') | |
| 1651 | - | |
| 1652 | - log.debug("parsing {0} modules".format(projectmodules_count)) | |
| 1653 | - for projectmodule_index in xrange(0, projectmodules_count): | |
| 1388 | + :param VBA_Project project: VBA_Project, corresponding VBA project | |
| 1389 | + :param olefile.OleStream dir_stream: olefile.OleStream, file object containing the module record | |
| 1390 | + :param int module_index: int, index of the module in the VBA project list | |
| 1391 | + """ | |
| 1392 | + #: reference to the VBA project for later use (VBA_Project) | |
| 1393 | + self.project = project | |
| 1394 | + #: VBA module name (unicode str) | |
| 1395 | + self.name = None | |
| 1396 | + #: VBA module name as a native str (utf8 bytes on py2, str on py3) | |
| 1397 | + self.name_str = None | |
| 1398 | + #: VBA module name, unicode copy (unicode str) | |
| 1399 | + self._name_unicode = None | |
| 1400 | + #: Stream name containing the VBA module (unicode str) | |
| 1401 | + self.streamname = None | |
| 1402 | + #: Stream name containing the VBA module as a native str (utf8 bytes on py2, str on py3) | |
| 1403 | + self.streamname_str = None | |
| 1404 | + self._streamname_unicode = None | |
| 1405 | + self.docstring = None | |
| 1406 | + self._docstring_unicode = None | |
| 1407 | + self.textoffset = None | |
| 1408 | + self.type = None | |
| 1409 | + self.readonly = False | |
| 1410 | + self.private = False | |
| 1411 | + #: VBA source code in bytes format, using the original code page from the VBA project | |
| 1412 | + self.code_raw = None | |
| 1413 | + #: VBA source code in unicode format (unicode for Python2, str for Python 3) | |
| 1414 | + self.code = None | |
| 1415 | + #: VBA source code in native str format (str encoded with UTF-8 for Python 2, str for Python 3) | |
| 1416 | + self.code_str = None | |
| 1417 | + #: VBA module file name including an extension based on the module type such as bas, cls, frm (unicode str) | |
| 1418 | + self.filename = None | |
| 1419 | + #: VBA module file name in native str format (str) | |
| 1420 | + self.filename_str = None | |
| 1421 | + self.code_path = None | |
| 1654 | 1422 | try: |
| 1655 | - modulename_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1656 | - check_value('MODULENAME_Id', 0x0019, modulename_id) | |
| 1657 | - modulename_sizeof_modulename = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1658 | - modulename_modulename = dir_stream.read(modulename_sizeof_modulename) | |
| 1659 | - # TODO: preset variables to avoid "referenced before assignment" errors | |
| 1660 | - modulename_unicode_modulename_unicode = '' | |
| 1423 | + # 2.3.4.2.3.2.1 MODULENAME Record | |
| 1424 | + # Specifies a VBA identifier as the name of the containing MODULE Record | |
| 1425 | + _id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1426 | + project.check_value('MODULENAME_Id', 0x0019, _id) | |
| 1427 | + size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1428 | + modulename_bytes = dir_stream.read(size) | |
| 1429 | + # Module name always stored as Unicode: | |
| 1430 | + self.name = project.decode_bytes(modulename_bytes) | |
| 1431 | + self.name_str = unicode2str(self.name) | |
| 1661 | 1432 | # account for optional sections |
| 1433 | + # TODO: shouldn't this be a loop? (check MS-OVBA) | |
| 1662 | 1434 | section_id = struct.unpack("<H", dir_stream.read(2))[0] |
| 1663 | 1435 | if section_id == 0x0047: |
| 1664 | - modulename_unicode_id = section_id | |
| 1665 | - modulename_unicode_sizeof_modulename_unicode = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1666 | - modulename_unicode_modulename_unicode = dir_stream.read( | |
| 1667 | - modulename_unicode_sizeof_modulename_unicode).decode('UTF-16LE', 'replace') | |
| 1668 | - # just guessing that this is the same encoding as used in OleFileIO | |
| 1669 | - unused = modulename_unicode_id | |
| 1436 | + # 2.3.4.2.3.2.2 MODULENAMEUNICODE Record | |
| 1437 | + # Specifies a VBA identifier as the name of the containing MODULE Record (section 2.3.4.2.3.2). | |
| 1438 | + # MUST contain the UTF-16 encoding of MODULENAME Record | |
| 1439 | + size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1440 | + self._name_unicode = dir_stream.read(size).decode('UTF-16LE', 'replace') | |
| 1670 | 1441 | section_id = struct.unpack("<H", dir_stream.read(2))[0] |
| 1671 | 1442 | if section_id == 0x001A: |
| 1672 | - modulestreamname_id = section_id | |
| 1673 | - modulestreamname_sizeof_streamname = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1674 | - modulestreamname_streamname = dir_stream.read(modulestreamname_sizeof_streamname) | |
| 1675 | - modulestreamname_reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1676 | - check_value('MODULESTREAMNAME_Reserved', 0x0032, modulestreamname_reserved) | |
| 1677 | - modulestreamname_sizeof_streamname_unicode = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1678 | - modulestreamname_streamname_unicode = dir_stream.read( | |
| 1679 | - modulestreamname_sizeof_streamname_unicode).decode('UTF-16LE', 'replace') | |
| 1680 | - # just guessing that this is the same encoding as used in OleFileIO | |
| 1681 | - unused = modulestreamname_id | |
| 1443 | + # 2.3.4.2.3.2.3 MODULESTREAMNAME Record | |
| 1444 | + # Specifies the stream name of the ModuleStream (section 2.3.4.3) in the VBA Storage (section 2.3.4) | |
| 1445 | + # corresponding to the containing MODULE Record | |
| 1446 | + size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1447 | + streamname_bytes = dir_stream.read(size) | |
| 1448 | + # Store it as Unicode: | |
| 1449 | + self.streamname = project.decode_bytes(streamname_bytes) | |
| 1450 | + self.streamname_str = unicode2str(self.streamname) | |
| 1451 | + reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1452 | + project.check_value('MODULESTREAMNAME_Reserved', 0x0032, reserved) | |
| 1453 | + size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1454 | + self._streamname_unicode = dir_stream.read(size).decode('UTF-16LE', 'replace') | |
| 1682 | 1455 | section_id = struct.unpack("<H", dir_stream.read(2))[0] |
| 1683 | 1456 | if section_id == 0x001C: |
| 1684 | - moduledocstring_id = section_id | |
| 1685 | - check_value('MODULEDOCSTRING_Id', 0x001C, moduledocstring_id) | |
| 1686 | - moduledocstring_sizeof_docstring = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1687 | - moduledocstring_docstring = dir_stream.read(moduledocstring_sizeof_docstring) | |
| 1688 | - moduledocstring_reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1689 | - check_value('MODULEDOCSTRING_Reserved', 0x0048, moduledocstring_reserved) | |
| 1690 | - moduledocstring_sizeof_docstring_unicode = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1691 | - moduledocstring_docstring_unicode = dir_stream.read(moduledocstring_sizeof_docstring_unicode) | |
| 1692 | - unused = moduledocstring_docstring | |
| 1693 | - unused = moduledocstring_docstring_unicode | |
| 1457 | + # 2.3.4.2.3.2.4 MODULEDOCSTRING Record | |
| 1458 | + # Specifies the description for the containing MODULE Record | |
| 1459 | + size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1460 | + docstring_bytes = dir_stream.read(size) | |
| 1461 | + self.docstring = project.decode_bytes(docstring_bytes) | |
| 1462 | + reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1463 | + project.check_value('MODULEDOCSTRING_Reserved', 0x0048, reserved) | |
| 1464 | + size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1465 | + self._docstring_unicode = dir_stream.read(size) | |
| 1694 | 1466 | section_id = struct.unpack("<H", dir_stream.read(2))[0] |
| 1695 | 1467 | if section_id == 0x0031: |
| 1696 | - moduleoffset_id = section_id | |
| 1697 | - check_value('MODULEOFFSET_Id', 0x0031, moduleoffset_id) | |
| 1698 | - moduleoffset_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1699 | - check_value('MODULEOFFSET_Size', 0x0004, moduleoffset_size) | |
| 1700 | - moduleoffset_textoffset = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1468 | + # 2.3.4.2.3.2.5 MODULEOFFSET Record | |
| 1469 | + # Specifies the location of the source code within the ModuleStream (section 2.3.4.3) | |
| 1470 | + # that corresponds to the containing MODULE Record | |
| 1471 | + size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1472 | + project.check_value('MODULEOFFSET_Size', 0x0004, size) | |
| 1473 | + self.textoffset = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1701 | 1474 | section_id = struct.unpack("<H", dir_stream.read(2))[0] |
| 1702 | 1475 | if section_id == 0x001E: |
| 1703 | - modulehelpcontext_id = section_id | |
| 1704 | - check_value('MODULEHELPCONTEXT_Id', 0x001E, modulehelpcontext_id) | |
| 1476 | + # 2.3.4.2.3.2.6 MODULEHELPCONTEXT Record | |
| 1477 | + # Specifies the Help topic identifier for the containing MODULE Record | |
| 1705 | 1478 | modulehelpcontext_size = struct.unpack("<L", dir_stream.read(4))[0] |
| 1706 | - check_value('MODULEHELPCONTEXT_Size', 0x0004, modulehelpcontext_size) | |
| 1707 | - modulehelpcontext_helpcontext = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1708 | - unused = modulehelpcontext_helpcontext | |
| 1479 | + project.check_value('MODULEHELPCONTEXT_Size', 0x0004, modulehelpcontext_size) | |
| 1480 | + # HelpContext (4 bytes): An unsigned integer that specifies the Help topic identifier | |
| 1481 | + # in the Help file specified by PROJECTHELPFILEPATH Record | |
| 1482 | + helpcontext = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1709 | 1483 | section_id = struct.unpack("<H", dir_stream.read(2))[0] |
| 1710 | 1484 | if section_id == 0x002C: |
| 1711 | - modulecookie_id = section_id | |
| 1712 | - check_value('MODULECOOKIE_Id', 0x002C, modulecookie_id) | |
| 1713 | - modulecookie_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1714 | - check_value('MODULECOOKIE_Size', 0x0002, modulecookie_size) | |
| 1715 | - modulecookie_cookie = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1716 | - unused = modulecookie_cookie | |
| 1485 | + # 2.3.4.2.3.2.7 MODULECOOKIE Record | |
| 1486 | + # Specifies ignored data. | |
| 1487 | + size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1488 | + project.check_value('MODULECOOKIE_Size', 0x0002, size) | |
| 1489 | + cookie = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1717 | 1490 | section_id = struct.unpack("<H", dir_stream.read(2))[0] |
| 1718 | 1491 | if section_id == 0x0021 or section_id == 0x0022: |
| 1719 | - moduletype_id = section_id | |
| 1720 | - moduletype_reserved = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1721 | - unused = moduletype_id | |
| 1722 | - unused = moduletype_reserved | |
| 1492 | + # 2.3.4.2.3.2.8 MODULETYPE Record | |
| 1493 | + # Specifies whether the containing MODULE Record (section 2.3.4.2.3.2) is a procedural module, | |
| 1494 | + # document module, class module, or designer module. | |
| 1495 | + # Id (2 bytes): An unsigned integer that specifies the identifier for this record. | |
| 1496 | + # MUST be 0x0021 when the containing MODULE Record (section 2.3.4.2.3.2) is a procedural module. | |
| 1497 | + # MUST be 0x0022 when the containing MODULE Record (section 2.3.4.2.3.2) is a document module, | |
| 1498 | + # class module, or designer module. | |
| 1499 | + self.type = section_id | |
| 1500 | + reserved = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1723 | 1501 | section_id = struct.unpack("<H", dir_stream.read(2))[0] |
| 1724 | 1502 | if section_id == 0x0025: |
| 1725 | - modulereadonly_id = section_id | |
| 1726 | - check_value('MODULEREADONLY_Id', 0x0025, modulereadonly_id) | |
| 1727 | - modulereadonly_reserved = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1728 | - check_value('MODULEREADONLY_Reserved', 0x0000, modulereadonly_reserved) | |
| 1503 | + # 2.3.4.2.3.2.9 MODULEREADONLY Record | |
| 1504 | + # Specifies that the containing MODULE Record (section 2.3.4.2.3.2) is read-only. | |
| 1505 | + self.readonly = True | |
| 1506 | + reserved = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1507 | + project.check_value('MODULEREADONLY_Reserved', 0x0000, reserved) | |
| 1729 | 1508 | section_id = struct.unpack("<H", dir_stream.read(2))[0] |
| 1730 | 1509 | if section_id == 0x0028: |
| 1731 | - moduleprivate_id = section_id | |
| 1732 | - check_value('MODULEPRIVATE_Id', 0x0028, moduleprivate_id) | |
| 1733 | - moduleprivate_reserved = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1734 | - check_value('MODULEPRIVATE_Reserved', 0x0000, moduleprivate_reserved) | |
| 1510 | + # 2.3.4.2.3.2.10 MODULEPRIVATE Record | |
| 1511 | + # Specifies that the containing MODULE Record (section 2.3.4.2.3.2) is only usable from within | |
| 1512 | + # the current VBA project. | |
| 1513 | + self.private = True | |
| 1514 | + reserved = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1515 | + project.check_value('MODULEPRIVATE_Reserved', 0x0000, reserved) | |
| 1735 | 1516 | section_id = struct.unpack("<H", dir_stream.read(2))[0] |
| 1736 | 1517 | if section_id == 0x002B: # TERMINATOR |
| 1737 | - module_reserved = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1738 | - check_value('MODULE_Reserved', 0x0000, module_reserved) | |
| 1518 | + # Terminator (2 bytes): An unsigned integer that specifies the end of this record. MUST be 0x002B. | |
| 1519 | + # Reserved (4 bytes): MUST be 0x00000000. MUST be ignored. | |
| 1520 | + reserved = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1521 | + project.check_value('MODULE_Reserved', 0x0000, reserved) | |
| 1739 | 1522 | section_id = None |
| 1740 | 1523 | if section_id != None: |
| 1741 | 1524 | log.warning('unknown or invalid module section id {0:04X}'.format(section_id)) |
| 1742 | - | |
| 1743 | - log.debug('Project CodePage = %d' % projectcodepage_codepage) | |
| 1744 | - if projectcodepage_codepage in MAC_CODEPAGES: | |
| 1745 | - vba_codec = MAC_CODEPAGES[projectcodepage_codepage] | |
| 1746 | - else: | |
| 1747 | - vba_codec = 'cp%d' % projectcodepage_codepage | |
| 1748 | - log.debug("ModuleName = {0}".format(modulename_modulename)) | |
| 1749 | - log.debug("ModuleNameUnicode = {0}".format(uni_out(modulename_unicode_modulename_unicode))) | |
| 1750 | - log.debug("StreamName = {0}".format(modulestreamname_streamname)) | |
| 1751 | - try: | |
| 1752 | - streamname_unicode = modulestreamname_streamname.decode(vba_codec) | |
| 1753 | - except UnicodeError as ue: | |
| 1754 | - log.debug('failed to decode stream name {0!r} with codec {1}' | |
| 1755 | - .format(uni_out(streamname_unicode), vba_codec)) | |
| 1756 | - streamname_unicode = modulestreamname_streamname.decode(vba_codec, errors='replace') | |
| 1757 | - log.debug("StreamName.decode('%s') = %s" % (vba_codec, uni_out(streamname_unicode))) | |
| 1758 | - log.debug("StreamNameUnicode = {0}".format(uni_out(modulestreamname_streamname_unicode))) | |
| 1759 | - log.debug("TextOffset = {0}".format(moduleoffset_textoffset)) | |
| 1760 | - | |
| 1525 | + | |
| 1526 | + log.debug("Module Name = {0}".format(self.name_str)) | |
| 1527 | + # log.debug("Module Name Unicode = {0}".format(self._name_unicode)) | |
| 1528 | + log.debug("Stream Name = {0}".format(self.streamname_str)) | |
| 1529 | + # log.debug("Stream Name Unicode = {0}".format(self._streamname_unicode)) | |
| 1530 | + log.debug("TextOffset = {0}".format(self.textoffset)) | |
| 1531 | + | |
| 1761 | 1532 | code_data = None |
| 1762 | - try_names = streamname_unicode, \ | |
| 1763 | - modulename_unicode_modulename_unicode, \ | |
| 1764 | - modulestreamname_streamname_unicode | |
| 1533 | + # let's try the different names we have, just in case some are missing: | |
| 1534 | + try_names = (self.streamname, self._streamname_unicode, self.name, self._name_unicode) | |
| 1765 | 1535 | for stream_name in try_names: |
| 1766 | 1536 | # TODO: if olefile._find were less private, could replace this |
| 1767 | 1537 | # try-except with calls to it |
| 1768 | - try: | |
| 1769 | - code_path = vba_root + u'VBA/' + stream_name | |
| 1770 | - log.debug('opening VBA code stream %s' % uni_out(code_path)) | |
| 1771 | - code_data = ole.openstream(code_path).read() | |
| 1772 | - break | |
| 1773 | - except IOError as ioe: | |
| 1774 | - log.debug('failed to open stream VBA/%r (%r), try other name' | |
| 1775 | - % (uni_out(stream_name), ioe)) | |
| 1776 | - | |
| 1538 | + if stream_name is not None: | |
| 1539 | + try: | |
| 1540 | + self.code_path = project.vba_root + u'VBA/' + stream_name | |
| 1541 | + log.debug('opening VBA code stream %s' % self.code_path) | |
| 1542 | + code_data = project.ole.openstream(self.code_path).read() | |
| 1543 | + break | |
| 1544 | + except IOError as ioe: | |
| 1545 | + log.debug('failed to open stream VBA/%r (%r), try other name' | |
| 1546 | + % (stream_name, ioe)) | |
| 1547 | + | |
| 1777 | 1548 | if code_data is None: |
| 1778 | 1549 | log.info("Could not open stream %d of %d ('VBA/' + one of %r)!" |
| 1779 | - % (projectmodule_index, projectmodules_count, | |
| 1780 | - '/'.join("'" + uni_out(stream_name) + "'" | |
| 1781 | - for stream_name in try_names))) | |
| 1782 | - if relaxed: | |
| 1783 | - continue # ... with next submodule | |
| 1550 | + % (module_index, project.modules_count, | |
| 1551 | + '/'.join("'" + stream_name + "'" | |
| 1552 | + for stream_name in try_names if stream_name is not None))) | |
| 1553 | + if project.relaxed: | |
| 1554 | + return # ... continue with next submodule | |
| 1784 | 1555 | else: |
| 1785 | - raise SubstreamOpenError('[BASE]', 'VBA/' + | |
| 1786 | - uni_out(modulename_unicode_modulename_unicode)) | |
| 1787 | - | |
| 1556 | + raise SubstreamOpenError('[BASE]', 'VBA/' + self.name) | |
| 1557 | + | |
| 1788 | 1558 | log.debug("length of code_data = {0}".format(len(code_data))) |
| 1789 | - log.debug("offset of code_data = {0}".format(moduleoffset_textoffset)) | |
| 1790 | - code_data = code_data[moduleoffset_textoffset:] | |
| 1559 | + log.debug("offset of code_data = {0}".format(self.textoffset)) | |
| 1560 | + code_data = code_data[self.textoffset:] | |
| 1791 | 1561 | if len(code_data) > 0: |
| 1792 | - code_data = decompress_stream(code_data) | |
| 1562 | + code_data = decompress_stream(bytearray(code_data)) | |
| 1563 | + # store the raw code encoded as bytes with the project's code page: | |
| 1564 | + self.code_raw = code_data | |
| 1565 | + # decode it to unicode: | |
| 1566 | + self.code = project.decode_bytes(code_data) | |
| 1567 | + # also store a native str version: | |
| 1568 | + self.code_str = unicode2str(self.code) | |
| 1793 | 1569 | # case-insensitive search in the code_modules dict to find the file extension: |
| 1794 | - filext = code_modules.get(modulename_modulename.lower(), 'bin') | |
| 1795 | - filename = '{0}.{1}'.format(modulename_modulename, filext) | |
| 1796 | - #TODO: also yield the codepage so that callers can decode it properly | |
| 1797 | - yield (code_path, filename, code_data) | |
| 1798 | - # print '-'*79 | |
| 1799 | - # print filename | |
| 1800 | - # print '' | |
| 1801 | - # print code_data | |
| 1802 | - # print '' | |
| 1803 | - log.debug('extracted file {0}'.format(filename)) | |
| 1570 | + filext = self.project.module_ext.get(self.name.lower(), 'vba') | |
| 1571 | + self.filename = u'{0}.{1}'.format(self.name, filext) | |
| 1572 | + self.filename_str = unicode2str(self.filename) | |
| 1573 | + log.debug('extracted file {0}'.format(self.filename_str)) | |
| 1804 | 1574 | else: |
| 1805 | - log.warning("module stream {0} has code data length 0".format(modulestreamname_streamname)) | |
| 1575 | + log.warning("module stream {0} has code data length 0".format(self.streamname_str)) | |
| 1806 | 1576 | except (UnexpectedDataError, SubstreamOpenError): |
| 1807 | 1577 | raise |
| 1808 | 1578 | except Exception as exc: |
| 1809 | - log.info('Error parsing module {0} of {1} in _extract_vba:' | |
| 1810 | - .format(projectmodule_index, projectmodules_count), | |
| 1579 | + log.info('Error parsing module {0} of {1}:' | |
| 1580 | + .format(module_index, project.modules_count), | |
| 1811 | 1581 | exc_info=True) |
| 1812 | - if not relaxed: | |
| 1582 | + if not project.relaxed: | |
| 1813 | 1583 | raise |
| 1814 | - _ = unused # make pylint happy: now variable "unused" is being used ;-) | |
| 1815 | - return | |
| 1584 | + | |
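Note on the record-reading pattern used throughout `VBA_Module.__init__`: almost every field is read with `struct.unpack("<H", dir_stream.read(2))[0]` or `struct.unpack("<L", dir_stream.read(4))[0]`, followed by a size-prefixed payload. The sketch below shows one way that pattern could be factored out. The helper names `read_uint16`, `read_uint32` and `read_sized_record` are hypothetical and are not part of this commit or of oletools; the sample bytes are made up for illustration.

```python
import struct
from io import BytesIO


def read_uint16(stream):
    """Read a 2-byte little-endian unsigned integer from a file-like object."""
    return struct.unpack("<H", stream.read(2))[0]


def read_uint32(stream):
    """Read a 4-byte little-endian unsigned integer from a file-like object."""
    return struct.unpack("<L", stream.read(4))[0]


def read_sized_record(stream, expected_id):
    """
    Read a record laid out as: Id (2 bytes), Size (4 bytes), payload (Size bytes).
    Raises ValueError when the Id does not match (hypothetical error handling;
    the code in this diff calls project.check_value instead).
    """
    record_id = read_uint16(stream)
    if record_id != expected_id:
        raise ValueError(
            "unexpected record id {0:04X}, expected {1:04X}".format(record_id, expected_id))
    size = read_uint32(stream)
    return stream.read(size)


# Made-up sample: a MODULENAME-style record (Id 0x0019) carrying the bytes b"Module1"
sample = BytesIO(struct.pack("<HL", 0x0019, 7) + b"Module1")
name_bytes = read_sized_record(sample, 0x0019)
```

With helpers like these, each `if section_id == ...` branch would shrink to one or two calls, at the cost of an extra function call per field.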
| 1585 | + | |
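Note on decoding: the removed code built a codec name from the project code page (`'cp%d' % projectcodepage_codepage`, or a lookup in `MAC_CODEPAGES`) and retried with `errors='replace'` when decoding failed. The new code delegates this to `project.decode_bytes`, whose body is not shown in this part of the diff. The sketch below only illustrates the strategy visible in the removed lines; the function name `decode_with_codepage` and the one-entry Mac mapping are assumptions made for the example, not the actual oletools implementation.

```python
def decode_with_codepage(data, codepage, mac_codepages=None):
    """
    Decode bytes using a Windows-style code page number.
    Falls back to replacement characters when strict decoding fails.
    """
    if mac_codepages is None:
        # Assumed example mapping; the real MAC_CODEPAGES table is not shown here.
        mac_codepages = {10000: "mac-roman"}
    codec = mac_codepages.get(codepage, "cp%d" % codepage)
    try:
        return data.decode(codec)
    except UnicodeError:
        # Same fallback as the removed code: keep going, mark undecodable bytes
        return data.decode(codec, "replace")


text = decode_with_codepage(b"caf\xe9", 1252)      # strict decoding succeeds
lossy = decode_with_codepage(b"A\x81", 1252)       # 0x81 is undefined in cp1252
```

An unknown code page number would still raise `LookupError` in this sketch; how the real method handles that case cannot be told from the lines shown.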
| 1586 | +class VBA_Project(object): | |
| 1587 | + """ | |
| 1588 | + Class to parse a VBA project from an OLE file, and to store all the corresponding | |
| 1589 | + metadata and VBA modules. | |
| 1590 | + """ | |
| 1591 | + | |
| 1592 | + def __init__(self, ole, vba_root, project_path, dir_path, relaxed=False): | |
| 1593 | + """ | |
| 1594 | + Extract VBA macros from an OleFileIO object. | |
| 1595 | + | |
| 1596 | + :param vba_root: path to the VBA root storage, containing the VBA storage and the PROJECT stream | |
| 1597 | + :param project_path: path to the PROJECT stream | |
| 1598 | + :param relaxed: If True, only create info/debug log entry if data is not as expected | |
| 1599 | + (e.g. opening substream fails); if False, raise an error in this case | |
| 1600 | + """ | |
| 1601 | + self.ole = ole | |
| 1602 | + self.vba_root = vba_root | |
| 1603 | + self.project_path = project_path | |
| 1604 | + self.dir_path = dir_path | |
| 1605 | + self.relaxed = relaxed | |
| 1606 | + #: VBA modules contained in the project (list of VBA_Module objects) | |
| 1607 | + self.modules = [] | |
| 1608 | + #: file extension for each VBA module | |
| 1609 | + self.module_ext = {} | |
| 1610 | + log.debug('Parsing the dir stream from %r' % dir_path) | |
| 1611 | + # read data from dir stream (compressed) | |
| 1612 | + dir_compressed = ole.openstream(dir_path).read() | |
| 1613 | + # decompress it: | |
| 1614 | + dir_stream = BytesIO(decompress_stream(bytearray(dir_compressed))) | |
| 1615 | + # store reference for later use: | |
| 1616 | + self.dir_stream = dir_stream | |
| 1617 | + | |
| 1618 | + # reference: MS-OVBA 2.3.4.2 dir Stream: Version Independent Project Information | |
| 1619 | + | |
| 1620 | + # PROJECTSYSKIND Record | |
| 1621 | + # Specifies the platform for which the VBA project is created. | |
| 1622 | + projectsyskind_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1623 | + self.check_value('PROJECTSYSKIND_Id', 0x0001, projectsyskind_id) | |
| 1624 | + projectsyskind_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1625 | + self.check_value('PROJECTSYSKIND_Size', 0x0004, projectsyskind_size) | |
| 1626 | + self.syskind = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1627 | + SYSKIND_NAME = { | |
| 1628 | + 0x00: "16-bit Windows", | |
| 1629 | + 0x01: "32-bit Windows", | |
| 1630 | + 0x02: "Macintosh", | |
| 1631 | + 0x03: "64-bit Windows" | |
| 1632 | + } | |
| 1633 | + self.syskind_name = SYSKIND_NAME.get(self.syskind, 'Unknown') | |
| 1634 | + log.debug("PROJECTSYSKIND_SysKind: %d - %s" % (self.syskind, self.syskind_name)) | |
| 1635 | + if self.syskind not in SYSKIND_NAME: | |
| 1636 | + log.error("invalid PROJECTSYSKIND_SysKind {0:04X}".format(self.syskind)) | |
| 1637 | + | |
| 1638 | + # PROJECTLCID Record | |
| 1639 | + # Specifies the VBA project's LCID. | |
| 1640 | + projectlcid_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1641 | + self.check_value('PROJECTLCID_Id', 0x0002, projectlcid_id) | |
| 1642 | + projectlcid_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1643 | + self.check_value('PROJECTLCID_Size', 0x0004, projectlcid_size) | |
| 1644 | + # Lcid (4 bytes): An unsigned integer that specifies the LCID value for the VBA project. MUST be 0x00000409. | |
| 1645 | + self.lcid = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1646 | + self.check_value('PROJECTLCID_Lcid', 0x409, self.lcid) | |
| 1647 | + | |
| 1648 | + # PROJECTLCIDINVOKE Record | |
| 1649 | + # Specifies an LCID value used for Invoke calls on an Automation server as specified in [MS-OAUT] section 3.1.4.4. | |
| 1650 | + projectlcidinvoke_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1651 | + self.check_value('PROJECTLCIDINVOKE_Id', 0x0014, projectlcidinvoke_id) | |
| 1652 | + projectlcidinvoke_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1653 | + self.check_value('PROJECTLCIDINVOKE_Size', 0x0004, projectlcidinvoke_size) | |
| 1654 | + # LcidInvoke (4 bytes): An unsigned integer that specifies the LCID value used for Invoke calls. MUST be 0x00000409. | |
| 1655 | + self.lcidinvoke = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1656 | + self.check_value('PROJECTLCIDINVOKE_LcidInvoke', 0x409, self.lcidinvoke) | |
| 1657 | + | |
| 1658 | + # PROJECTCODEPAGE Record | |
| 1659 | + # Specifies the VBA project's code page. | |
| 1660 | + projectcodepage_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1661 | + self.check_value('PROJECTCODEPAGE_Id', 0x0003, projectcodepage_id) | |
| 1662 | + projectcodepage_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1663 | + self.check_value('PROJECTCODEPAGE_Size', 0x0002, projectcodepage_size) | |
| 1664 | + self.codepage = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1665 | + self.codepage_name = codepages.get_codepage_name(self.codepage) | |
| 1666 | + log.debug('Project Code Page: %r - %s' % (self.codepage, self.codepage_name)) | |
| 1667 | + self.codec = codepages.codepage2codec(self.codepage) | |
| 1668 | + log.debug('Python codec corresponding to code page %d: %s' % (self.codepage, self.codec)) | |
| 1669 | + | |
| 1670 | + | |
| 1671 | + # PROJECTNAME Record | |
| 1672 | + # Specifies a unique VBA identifier as the name of the VBA project. | |
| 1673 | + projectname_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1674 | + self.check_value('PROJECTNAME_Id', 0x0004, projectname_id) | |
| 1675 | + sizeof_projectname = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1676 | + log.debug('Project name size: %d bytes' % sizeof_projectname) | |
| 1677 | + if sizeof_projectname < 1 or sizeof_projectname > 128: | |
| 1678 | + # TODO: raise an actual error? What is MS Office's behaviour? | |
| 1679 | + log.error("PROJECTNAME_SizeOfProjectName value not in range [1-128]: {0}".format(sizeof_projectname)) | |
| 1680 | + projectname_bytes = dir_stream.read(sizeof_projectname) | |
| 1681 | + self.projectname = self.decode_bytes(projectname_bytes) | |
| 1682 | + | |
| 1683 | + | |
| 1684 | + # PROJECTDOCSTRING Record | |
| 1685 | + # Specifies the description for the VBA project. | |
| 1686 | + projectdocstring_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1687 | + self.check_value('PROJECTDOCSTRING_Id', 0x0005, projectdocstring_id) | |
| 1688 | + projectdocstring_sizeof_docstring = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1689 | + if projectdocstring_sizeof_docstring > 2000: | |
| 1690 | + log.error( | |
| 1691 | + "PROJECTDOCSTRING_SizeOfDocString value not in range: {0}".format(projectdocstring_sizeof_docstring)) | |
| 1692 | + # DocString (variable): An array of SizeOfDocString bytes that specifies the description for the VBA project. | |
| 1693 | + # MUST contain MBCS characters encoded using the code page specified in PROJECTCODEPAGE (section 2.3.4.2.1.4). | |
| 1694 | + # MUST NOT contain null characters. | |
| 1695 | + docstring_bytes = dir_stream.read(projectdocstring_sizeof_docstring) | |
| 1696 | + self.docstring = self.decode_bytes(docstring_bytes) | |
| 1697 | + projectdocstring_reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1698 | + self.check_value('PROJECTDOCSTRING_Reserved', 0x0040, projectdocstring_reserved) | |
| 1699 | + projectdocstring_sizeof_docstring_unicode = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1700 | + if projectdocstring_sizeof_docstring_unicode % 2 != 0: | |
| 1701 | + log.error("PROJECTDOCSTRING_SizeOfDocStringUnicode is not even") | |
| 1702 | + # DocStringUnicode (variable): An array of SizeOfDocStringUnicode bytes that specifies the description for the | |
| 1703 | + # VBA project. MUST contain UTF-16 characters. MUST NOT contain null characters. | |
| 1704 | + # MUST contain the UTF-16 encoding of DocString. | |
| 1705 | + docstring_unicode_bytes = dir_stream.read(projectdocstring_sizeof_docstring_unicode) | |
| 1706 | + self.docstring_unicode = docstring_unicode_bytes.decode('utf16', errors='replace') | |
| 1707 | + | |
| 1708 | + # PROJECTHELPFILEPATH Record - MS-OVBA 2.3.4.2.1.7 | |
| 1709 | + projecthelpfilepath_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1710 | + self.check_value('PROJECTHELPFILEPATH_Id', 0x0006, projecthelpfilepath_id) | |
| 1711 | + projecthelpfilepath_sizeof_helpfile1 = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1712 | + if projecthelpfilepath_sizeof_helpfile1 > 260: | |
| 1713 | + log.error( | |
| 1714 | + "PROJECTHELPFILEPATH_SizeOfHelpFile1 value not in range: {0}".format(projecthelpfilepath_sizeof_helpfile1)) | |
| 1715 | + projecthelpfilepath_helpfile1 = dir_stream.read(projecthelpfilepath_sizeof_helpfile1) | |
| 1716 | + projecthelpfilepath_reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1717 | + self.check_value('PROJECTHELPFILEPATH_Reserved', 0x003D, projecthelpfilepath_reserved) | |
| 1718 | + projecthelpfilepath_sizeof_helpfile2 = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1719 | + if projecthelpfilepath_sizeof_helpfile2 != projecthelpfilepath_sizeof_helpfile1: | |
| 1720 | + log.error("PROJECTHELPFILEPATH_SizeOfHelpFile1 does not equal PROJECTHELPFILEPATH_SizeOfHelpFile2") | |
| 1721 | + projecthelpfilepath_helpfile2 = dir_stream.read(projecthelpfilepath_sizeof_helpfile2) | |
| 1722 | + if projecthelpfilepath_helpfile2 != projecthelpfilepath_helpfile1: | |
| 1723 | + log.error("PROJECTHELPFILEPATH_HelpFile1 does not equal PROJECTHELPFILEPATH_HelpFile2") | |
| 1724 | + | |
| 1725 | + # PROJECTHELPCONTEXT Record | |
| 1726 | + projecthelpcontext_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1727 | + self.check_value('PROJECTHELPCONTEXT_Id', 0x0007, projecthelpcontext_id) | |
| 1728 | + projecthelpcontext_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1729 | + self.check_value('PROJECTHELPCONTEXT_Size', 0x0004, projecthelpcontext_size) | |
| 1730 | + projecthelpcontext_helpcontext = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1731 | + unused = projecthelpcontext_helpcontext | |
| 1732 | + | |
| 1733 | + # PROJECTLIBFLAGS Record | |
| 1734 | + projectlibflags_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1735 | + self.check_value('PROJECTLIBFLAGS_Id', 0x0008, projectlibflags_id) | |
| 1736 | + projectlibflags_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1737 | + self.check_value('PROJECTLIBFLAGS_Size', 0x0004, projectlibflags_size) | |
| 1738 | + projectlibflags_projectlibflags = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1739 | + self.check_value('PROJECTLIBFLAGS_ProjectLibFlags', 0x0000, projectlibflags_projectlibflags) | |
| 1740 | + | |
| 1741 | + # PROJECTVERSION Record | |
| 1742 | + projectversion_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1743 | + self.check_value('PROJECTVERSION_Id', 0x0009, projectversion_id) | |
| 1744 | + projectversion_reserved = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1745 | + self.check_value('PROJECTVERSION_Reserved', 0x0004, projectversion_reserved) | |
| 1746 | + projectversion_versionmajor = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1747 | + projectversion_versionminor = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1748 | + unused = projectversion_versionmajor | |
| 1749 | + unused = projectversion_versionminor | |
| 1750 | + | |
| 1751 | + # PROJECTCONSTANTS Record | |
| 1752 | + projectconstants_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1753 | + self.check_value('PROJECTCONSTANTS_Id', 0x000C, projectconstants_id) | |
| 1754 | + projectconstants_sizeof_constants = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1755 | + if projectconstants_sizeof_constants > 1015: | |
| 1756 | + log.error( | |
| 1757 | + "PROJECTCONSTANTS_SizeOfConstants value not in range: {0}".format(projectconstants_sizeof_constants)) | |
| 1758 | + projectconstants_constants = dir_stream.read(projectconstants_sizeof_constants) | |
| 1759 | + projectconstants_reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1760 | + self.check_value('PROJECTCONSTANTS_Reserved', 0x003C, projectconstants_reserved) | |
| 1761 | + projectconstants_sizeof_constants_unicode = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1762 | + if projectconstants_sizeof_constants_unicode % 2 != 0: | |
| 1763 | + log.error("PROJECTCONSTANTS_SizeOfConstantsUnicode is not even") | |
| 1764 | + projectconstants_constants_unicode = dir_stream.read(projectconstants_sizeof_constants_unicode) | |
| 1765 | + unused = projectconstants_constants | |
| 1766 | + unused = projectconstants_constants_unicode | |
| 1767 | + | |
| 1768 | + # array of REFERENCE records | |
| 1769 | + # Specifies a reference to an Automation type library or VBA project. | |
| 1770 | + check = None | |
| 1771 | + while True: | |
| 1772 | + check = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1773 | + log.debug("reference type = {0:04X}".format(check)) | |
| 1774 | + if check == 0x000F: | |
| 1775 | + break | |
| 1776 | + | |
| 1777 | + if check == 0x0016: | |
| 1778 | + # REFERENCENAME | |
| 1779 | + # Specifies the name of a referenced VBA project or Automation type library. | |
| 1780 | + reference_id = check | |
| 1781 | + reference_sizeof_name = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1782 | + reference_name = dir_stream.read(reference_sizeof_name) | |
| 1783 | + log.debug('REFERENCE name: %s' % unicode2str(self.decode_bytes(reference_name))) | |
| 1784 | + reference_reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1785 | + # According to [MS-OVBA] 2.3.4.2.2.2 REFERENCENAME Record: | |
| 1786 | + # "Reserved (2 bytes): MUST be 0x003E. MUST be ignored." | |
| 1787 | + # So let's ignore it, otherwise it crashes on some files (issue #132) | |
| 1788 | + # PR #135 by @c1fe: | |
| 1789 | + # contrary to the specification I think that the unicode name | |
| 1790 | + # is optional. if reference_reserved is not 0x003E I think it | |
| 1791 | + # is actually the start of another REFERENCE record | |
| 1792 | + # at least when projectsyskind_syskind == 0x02 (Macintosh) | |
| 1793 | + if reference_reserved == 0x003E: | |
| 1794 | + #if reference_reserved not in (0x003E, 0x000D): | |
| 1795 | + # raise UnexpectedDataError(dir_path, 'REFERENCE_Reserved', | |
| 1796 | + # 0x0003E, reference_reserved) | |
| 1797 | + reference_sizeof_name_unicode = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1798 | + reference_name_unicode = dir_stream.read(reference_sizeof_name_unicode) | |
| 1799 | + unused = reference_id | |
| 1800 | + unused = reference_name | |
| 1801 | + unused = reference_name_unicode | |
| 1802 | + continue | |
| 1803 | + else: | |
| 1804 | + check = reference_reserved | |
| 1805 | + log.debug("reference type = {0:04X}".format(check)) | |
| 1806 | + | |
| 1807 | + if check == 0x0033: | |
| 1808 | + # REFERENCEORIGINAL (followed by REFERENCECONTROL) | |
| 1809 | + # Specifies the identifier of the Automation type library the containing REFERENCECONTROL's | |
| 1810 | + # (section 2.3.4.2.2.3) twiddled type library was generated from. | |
| 1811 | + referenceoriginal_id = check | |
| 1812 | + referenceoriginal_sizeof_libidoriginal = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1813 | + referenceoriginal_libidoriginal = dir_stream.read(referenceoriginal_sizeof_libidoriginal) | |
| 1814 | + log.debug('REFERENCE original lib id: %s' % unicode2str(self.decode_bytes(referenceoriginal_libidoriginal))) | |
| 1815 | + unused = referenceoriginal_id | |
| 1816 | + unused = referenceoriginal_libidoriginal | |
| 1817 | + continue | |
| 1818 | + | |
| 1819 | + if check == 0x002F: | |
| 1820 | + # REFERENCECONTROL | |
| 1821 | + # Specifies a reference to a twiddled type library and its extended type library. | |
| 1822 | + referencecontrol_id = check | |
| 1823 | + referencecontrol_sizetwiddled = struct.unpack("<L", dir_stream.read(4))[0] # ignore | |
| 1824 | + referencecontrol_sizeof_libidtwiddled = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1825 | + referencecontrol_libidtwiddled = dir_stream.read(referencecontrol_sizeof_libidtwiddled) | |
| 1826 | + log.debug('REFERENCE control twiddled lib id: %s' % unicode2str(self.decode_bytes(referencecontrol_libidtwiddled))) | |
| 1827 | + referencecontrol_reserved1 = struct.unpack("<L", dir_stream.read(4))[0] # ignore | |
| 1828 | + self.check_value('REFERENCECONTROL_Reserved1', 0x0000, referencecontrol_reserved1) | |
| 1829 | + referencecontrol_reserved2 = struct.unpack("<H", dir_stream.read(2))[0] # ignore | |
| 1830 | + self.check_value('REFERENCECONTROL_Reserved2', 0x0000, referencecontrol_reserved2) | |
| 1831 | + unused = referencecontrol_id | |
| 1832 | + unused = referencecontrol_sizetwiddled | |
| 1833 | + unused = referencecontrol_libidtwiddled | |
| 1834 | + # optional field | |
| 1835 | + check2 = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1836 | + if check2 == 0x0016: | |
| 1837 | + referencecontrol_namerecordextended_id = check2 | |
| 1838 | + referencecontrol_namerecordextended_sizeof_name = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1839 | + referencecontrol_namerecordextended_name = dir_stream.read( | |
| 1840 | + referencecontrol_namerecordextended_sizeof_name) | |
| 1841 | + log.debug('REFERENCE control name record extended: %s' % unicode2str( | |
| 1842 | + self.decode_bytes(referencecontrol_namerecordextended_name))) | |
| 1843 | + referencecontrol_namerecordextended_reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1844 | + if referencecontrol_namerecordextended_reserved == 0x003E: | |
| 1845 | + referencecontrol_namerecordextended_sizeof_name_unicode = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1846 | + referencecontrol_namerecordextended_name_unicode = dir_stream.read( | |
| 1847 | + referencecontrol_namerecordextended_sizeof_name_unicode) | |
| 1848 | + referencecontrol_reserved3 = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1849 | + unused = referencecontrol_namerecordextended_id | |
| 1850 | + unused = referencecontrol_namerecordextended_name | |
| 1851 | + unused = referencecontrol_namerecordextended_name_unicode | |
| 1852 | + else: | |
| 1853 | + referencecontrol_reserved3 = referencecontrol_namerecordextended_reserved | |
| 1854 | + else: | |
| 1855 | + referencecontrol_reserved3 = check2 | |
| 1856 | + | |
| 1857 | + self.check_value('REFERENCECONTROL_Reserved3', 0x0030, referencecontrol_reserved3) | |
| 1858 | + referencecontrol_sizeextended = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1859 | + referencecontrol_sizeof_libidextended = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1860 | + referencecontrol_libidextended = dir_stream.read(referencecontrol_sizeof_libidextended) | |
| 1861 | + referencecontrol_reserved4 = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1862 | + referencecontrol_reserved5 = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1863 | + referencecontrol_originaltypelib = dir_stream.read(16) | |
| 1864 | + referencecontrol_cookie = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1865 | + unused = referencecontrol_sizeextended | |
| 1866 | + unused = referencecontrol_libidextended | |
| 1867 | + unused = referencecontrol_reserved4 | |
| 1868 | + unused = referencecontrol_reserved5 | |
| 1869 | + unused = referencecontrol_originaltypelib | |
| 1870 | + unused = referencecontrol_cookie | |
| 1871 | + continue | |
| 1872 | + | |
| 1873 | + if check == 0x000D: | |
| 1874 | + # REFERENCEREGISTERED | |
| 1875 | + # Specifies a reference to an Automation type library. | |
| 1876 | + referenceregistered_id = check | |
| 1877 | + referenceregistered_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1878 | + referenceregistered_sizeof_libid = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1879 | + referenceregistered_libid = dir_stream.read(referenceregistered_sizeof_libid) | |
| 1880 | + log.debug('REFERENCE registered lib id: %s' % unicode2str(self.decode_bytes(referenceregistered_libid))) | |
| 1881 | + referenceregistered_reserved1 = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1882 | + self.check_value('REFERENCEREGISTERED_Reserved1', 0x0000, referenceregistered_reserved1) | |
| 1883 | + referenceregistered_reserved2 = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1884 | + self.check_value('REFERENCEREGISTERED_Reserved2', 0x0000, referenceregistered_reserved2) | |
| 1885 | + unused = referenceregistered_id | |
| 1886 | + unused = referenceregistered_size | |
| 1887 | + unused = referenceregistered_libid | |
| 1888 | + continue | |
| 1889 | + | |
| 1890 | + if check == 0x000E: | |
| 1891 | + # REFERENCEPROJECT | |
| 1892 | + # Specifies a reference to an external VBA project. | |
| 1893 | + referenceproject_id = check | |
| 1894 | + referenceproject_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1895 | + referenceproject_sizeof_libidabsolute = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1896 | + referenceproject_libidabsolute = dir_stream.read(referenceproject_sizeof_libidabsolute) | |
| 1897 | + log.debug('REFERENCE project lib id absolute: %s' % unicode2str(self.decode_bytes(referenceproject_libidabsolute))) | |
| 1898 | + referenceproject_sizeof_libidrelative = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1899 | + referenceproject_libidrelative = dir_stream.read(referenceproject_sizeof_libidrelative) | |
| 1900 | + log.debug('REFERENCE project lib id relative: %s' % unicode2str(self.decode_bytes(referenceproject_libidrelative))) | |
| 1901 | + referenceproject_majorversion = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1902 | + referenceproject_minorversion = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1903 | + unused = referenceproject_id | |
| 1904 | + unused = referenceproject_size | |
| 1905 | + unused = referenceproject_libidabsolute | |
| 1906 | + unused = referenceproject_libidrelative | |
| 1907 | + unused = referenceproject_majorversion | |
| 1908 | + unused = referenceproject_minorversion | |
| 1909 | + continue | |
| 1910 | + | |
| 1911 | + log.error('invalid or unknown check Id {0:04X}'.format(check)) | |
| 1912 | + # raise an exception instead of stopping abruptly (issue #180) | |
| 1913 | + raise UnexpectedDataError(dir_path, 'reference type', (0x0F, 0x16, 0x33, 0x2F, 0x0D, 0x0E), check) | |
| 1914 | + #sys.exit(0) | |
| 1915 | + | |
| 1916 | + def check_value(self, name, expected, value): | |
| 1917 | + if expected != value: | |
| 1918 | + if self.relaxed: | |
| 1919 | + log.error("invalid value for {0} expected {1:04X} got {2:04X}" | |
| 1920 | + .format(name, expected, value)) | |
| 1921 | + else: | |
| 1922 | + raise UnexpectedDataError(self.dir_path, name, expected, value) | |
| 1923 | + | |
| 1924 | + def parse_project_stream(self): | |
| 1925 | + """ | |
| 1926 | + Parse the PROJECT stream from the VBA project | |
| 1927 | + :return: | |
| 1928 | + """ | |
| 1929 | + # Open the PROJECT stream: | |
| 1930 | + # reference: [MS-OVBA] 2.3.1 PROJECT Stream | |
| 1931 | + project_stream = self.ole.openstream(self.project_path) | |
| 1932 | + | |
| 1933 | + # sample content of the PROJECT stream: | |
| 1934 | + | |
| 1935 | + ## ID="{5312AC8A-349D-4950-BDD0-49BE3C4DD0F0}" | |
| 1936 | + ## Document=ThisDocument/&H00000000 | |
| 1937 | + ## Module=NewMacros | |
| 1938 | + ## Name="Project" | |
| 1939 | + ## HelpContextID="0" | |
| 1940 | + ## VersionCompatible32="393222000" | |
| 1941 | + ## CMG="F1F301E705E705E705E705" | |
| 1942 | + ## DPB="8F8D7FE3831F2020202020" | |
| 1943 | + ## GC="2D2FDD81E51EE61EE6E1" | |
| 1944 | + ## | |
| 1945 | + ## [Host Extender Info] | |
| 1946 | + ## &H00000001={3832D640-CF90-11CF-8E43-00A0C911005A};VBE;&H00000000 | |
| 1947 | + ## &H00000002={000209F2-0000-0000-C000-000000000046};Word8.0;&H00000000 | |
| 1948 | + ## | |
| 1949 | + ## [Workspace] | |
| 1950 | + ## ThisDocument=22, 29, 339, 477, Z | |
| 1951 | + ## NewMacros=-4, 42, 832, 510, C | |
| 1952 | + | |
| 1953 | + self.module_ext = {} | |
| 1954 | + | |
| 1955 | + for line in project_stream: | |
| 1956 | + line = self.decode_bytes(line) | |
| 1957 | + log.debug('PROJECT: %r' % line) | |
| 1958 | + line = line.strip() | |
| 1959 | + if '=' in line: | |
| 1960 | + # split line at the 1st equal sign: | |
| 1961 | + name, value = line.split('=', 1) | |
| 1962 | + # looking for code modules | |
| 1963 | + # add the code module as a key in the dictionary | |
| 1964 | + # the value will be the extension needed later | |
| 1965 | + # The value is converted to lowercase, to allow case-insensitive matching (issue #3) | |
| 1966 | + value = value.lower() | |
| 1967 | + if name == 'Document': | |
| 1968 | + # split value at the 1st slash, keep 1st part: | |
| 1969 | + value = value.split('/', 1)[0] | |
| 1970 | + self.module_ext[value] = CLASS_EXTENSION | |
| 1971 | + elif name == 'Module': | |
| 1972 | + self.module_ext[value] = MODULE_EXTENSION | |
| 1973 | + elif name == 'Class': | |
| 1974 | + self.module_ext[value] = CLASS_EXTENSION | |
| 1975 | + elif name == 'BaseClass': | |
| 1976 | + self.module_ext[value] = FORM_EXTENSION | |
| 1977 | + | |
| 1978 | + def parse_modules(self): | |
| 1979 | + dir_stream = self.dir_stream | |
| 1980 | + # projectmodules_id has already been read by the previous loop = 0x000F | |
| 1981 | + # projectmodules_id = check #struct.unpack("<H", dir_stream.read(2))[0] | |
| 1982 | + # self.check_value('PROJECTMODULES_Id', 0x000F, projectmodules_id) | |
| 1983 | + projectmodules_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1984 | + self.check_value('PROJECTMODULES_Size', 0x0002, projectmodules_size) | |
| 1985 | + self.modules_count = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1986 | + _id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1987 | + self.check_value('PROJECTMODULES_ProjectCookieRecord_Id', 0x0013, _id) | |
| 1988 | + size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1989 | + self.check_value('PROJECTMODULES_ProjectCookieRecord_Size', 0x0002, size) | |
| 1990 | + projectcookierecord_cookie = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1991 | + unused = projectcookierecord_cookie | |
| 1992 | + | |
| 1993 | + log.debug("parsing {0} modules".format(self.modules_count)) | |
| 1994 | + for module_index in xrange(0, self.modules_count): | |
| 1995 | + module = VBA_Module(self, self.dir_stream, module_index=module_index) | |
| 1996 | + self.modules.append(module) | |
| 1997 | + yield (module.code_path, module.filename_str, module.code_str) | |
| 1998 | + _ = unused # make pylint happy: now variable "unused" is being used ;-) | |
| 1999 | + return | |
| 2000 | + | |
| 2001 | + def decode_bytes(self, bytes_string, errors='replace'): | |
| 2002 | + """ | |
| 2003 | + Decode a bytes string to a unicode string, using the project code page | |
| 2004 | + :param bytes_string: bytes, bytes string to be decoded | |
| 2005 | + :param errors: str, mode to handle unicode conversion errors | |
| 2006 | + :return: str/unicode, decoded string | |
| 2007 | + """ | |
| 2008 | + return bytes_string.decode(self.codec, errors=errors) | |
| 2009 | + | |
| 2010 | + | |
| 2011 | + | |
| 2012 | +def _extract_vba(ole, vba_root, project_path, dir_path, relaxed=False): | |
| 2013 | + """ | |
| 2014 | + Extract VBA macros from an OleFileIO object. | |
| 2015 | + Internal function, do not call directly. | |
| 2016 | + | |
| 2017 | + vba_root: path to the VBA root storage, containing the VBA storage and the PROJECT stream | |
| 2018 | + project_path: path to the PROJECT stream | |
| 2019 | + :param relaxed: If True, only create info/debug log entry if data is not as expected | |
| 2020 | + (e.g. opening substream fails); if False, raise an error in this case | |
| 2021 | + This is a generator, yielding (stream path, VBA filename, VBA source code) for each VBA code stream | |
| 2022 | + """ | |
| 2023 | + log.debug('relaxed is %s' % relaxed) | |
| 2024 | + | |
| 2025 | + project = VBA_Project(ole, vba_root, project_path, dir_path, relaxed=relaxed) | |
| 2026 | + project.parse_project_stream() | |
| 2027 | + | |
| 2028 | + for code_path, filename, code_data in project.parse_modules(): | |
| 2029 | + yield (code_path, filename, code_data) | |
| 1816 | 2030 | |
| 1817 | 2031 | |
| 1818 | 2032 | def vba_collapse_long_lines(vba_code): |
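Most fixed-size records parsed in `VBA_Project.__init__` above share one layout: a 2-byte little-endian Id, a 4-byte little-endian Size, then Size bytes of payload. The following is a minimal sketch of that pattern on a synthetic PROJECTSYSKIND record. The helper name `read_fixed_record` is hypothetical and not part of oletools, and it always raises on a mismatch, whereas `check_value` only logs when `relaxed` is set.

```python
import struct
from io import BytesIO


def read_fixed_record(stream, expected_id, expected_size):
    """Read one (Id, Size, payload) record and return the raw payload bytes.

    Hypothetical helper for illustration only; not part of oletools.
    """
    record_id = struct.unpack("<H", stream.read(2))[0]  # 2-byte little-endian Id
    size = struct.unpack("<L", stream.read(4))[0]       # 4-byte little-endian Size
    if record_id != expected_id or size != expected_size:
        raise ValueError("unexpected record header: id=0x%04X size=%d"
                         % (record_id, size))
    return stream.read(size)


# Synthetic PROJECTSYSKIND record: Id=0x0001, Size=4, SysKind=0x01 (32-bit Windows)
sample = BytesIO(struct.pack("<HLL", 0x0001, 4, 0x01))
payload = read_fixed_record(sample, 0x0001, 4)
syskind = struct.unpack("<L", payload)[0]
```

Factoring the header check into one place is a design choice of this sketch; the code in the diff keeps each field read explicit, which makes it easier to compare line by line with the specification.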
| ... | ... | @@ -1824,9 +2038,13 @@ def vba_collapse_long_lines(vba_code): |
| 1824 | 2038 | :return: str, VBA module code with long lines collapsed |
| 1825 | 2039 | """ |
| 1826 | 2040 | # TODO: use a regex instead, to allow whitespaces after the underscore? |
| 1827 | - vba_code = vba_code.replace(' _\r\n', ' ') | |
| 1828 | - vba_code = vba_code.replace(' _\r', ' ') | |
| 1829 | - vba_code = vba_code.replace(' _\n', ' ') | |
| 2041 | + try: | |
| 2042 | + vba_code = vba_code.replace(' _\r\n', ' ') | |
| 2043 | + vba_code = vba_code.replace(' _\r', ' ') | |
| 2044 | + vba_code = vba_code.replace(' _\n', ' ') | |
| 2045 | + except Exception: | |
| 2046 | + log.exception('type(vba_code)=%s' % type(vba_code)) | |
| 2047 | + raise | |
| 1830 | 2048 | return vba_code |
| 1831 | 2049 | |
| 1832 | 2050 | |
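The TODO in `vba_collapse_long_lines` suggests using a regex so that whitespace after the underscore is tolerated. A possible sketch is shown here. The function name `collapse_long_lines_regex` and the choice to accept only spaces and tabs after the underscore are assumptions, not the project's implementation.

```python
import re

# A space, an underscore, optional trailing spaces/tabs, then any newline style.
# '\r\n' is listed first so that a Windows line ending is consumed as a whole.
_LINE_CONTINUATION = re.compile(r' _[ \t]*(?:\r\n|\r|\n)')


def collapse_long_lines_regex(vba_code):
    """Join VBA continuation lines, tolerating whitespace after the underscore."""
    return _LINE_CONTINUATION.sub(' ', vba_code)


joined = collapse_long_lines_regex('x = 1 + _  \r\n    2\n')
```

For input without trailing whitespace this gives the same result as the three `str.replace` calls in the diff.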
| ... | ... | @@ -1875,7 +2093,7 @@ def detect_autoexec(vba_code, obfuscation=None): |
| 1875 | 2093 | for keyword in keywords: |
| 1876 | 2094 | #TODO: if keyword is already a compiled regex, use it as-is |
| 1877 | 2095 | # search using regex to detect word boundaries: |
| 1878 | - match = re.search(r'(?i)\b' + keyword + r'\b', vba_code) | |
| 2096 | + match = re.search(r'(?i)\b' + re.escape(keyword) + r'\b', vba_code) | |
| 1879 | 2097 | if match: |
| 1880 | 2098 | #if keyword.lower() in vba_code: |
| 1881 | 2099 | found_keyword = match.group() |
| ... | ... | @@ -1901,7 +2119,8 @@ def detect_suspicious(vba_code, obfuscation=None): |
| 1901 | 2119 | for description, keywords in SUSPICIOUS_KEYWORDS.items(): |
| 1902 | 2120 | for keyword in keywords: |
| 1903 | 2121 | # search using regex to detect word boundaries: |
| 1904 | - match = re.search(r'(?i)\b' + keyword + r'\b', vba_code) | |
| 2122 | + # note: each keyword must be escaped if it contains special chars such as '\' | |
| 2123 | + match = re.search(r'(?i)\b' + re.escape(keyword) + r'\b', vba_code) | |
| 1905 | 2124 | if match: |
| 1906 | 2125 | #if keyword.lower() in vba_code: |
| 1907 | 2126 | found_keyword = match.group() |
| ... | ... | @@ -1909,7 +2128,9 @@ def detect_suspicious(vba_code, obfuscation=None): |
| 1909 | 2128 | for description, keywords in SUSPICIOUS_KEYWORDS_NOREGEX.items(): |
| 1910 | 2129 | for keyword in keywords: |
| 1911 | 2130 | if keyword.lower() in vba_code: |
| 1912 | - results.append((keyword, description + obf_text)) | |
| 2131 | + # report backspace chars only when found in plain VBA code, not in decoded strings: | |
| 2132 | + if not(keyword=='\b' and obfuscation is not None): | |
| 2133 | + results.append((keyword, description + obf_text)) | |
| 1913 | 2134 | return results |
| 1914 | 2135 | |
| 1915 | 2136 | |
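The switch to `re.escape(keyword)` in `detect_autoexec` and `detect_suspicious` matters whenever a keyword contains regex metacharacters such as `.` or `\`. A small sketch of the difference follows. The helper `find_keyword` and the sample strings are made up for illustration.

```python
import re


def find_keyword(keyword, text):
    """Return the matched text for a literal keyword, or None if absent."""
    match = re.search(r'(?i)\b' + re.escape(keyword) + r'\b', text)
    return match.group() if match else None


# The escaped dot only matches a literal '.', so 'cmdXexe' is not reported.
hit = find_keyword('cmd.exe', 'Shell "CMD.EXE /c calc"')
miss = find_keyword('cmd.exe', 'Shell "cmdXexe"')
```

Without escaping, the pattern `cmd.exe` would also match `cmdXexe`, because an unescaped `.` matches any character.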
| ... | ... | @@ -1947,7 +2168,7 @@ def detect_hex_strings(vba_code): |
| 1947 | 2168 | for match in re_hex_string.finditer(vba_code): |
| 1948 | 2169 | value = match.group() |
| 1949 | 2170 | if value not in found: |
| 1950 | - decoded = binascii.unhexlify(value) | |
| 2171 | + decoded = bytes2str(binascii.unhexlify(value)) | |
| 1951 | 2172 | results.append((value, decoded)) |
| 1952 | 2173 | found.add(value) |
| 1953 | 2174 | return results |
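On Python 3, `binascii.unhexlify` and `base64.b64decode` return `bytes`, which cannot be concatenated with `str`, hence the `bytes2str` wrappers added in these hunks. The definition of `bytes2str` is not shown in this diff, so the sketch below uses a hypothetical stand-in named `bytes_to_text`, and the `latin-1` default is an assumption.

```python
import binascii


def bytes_to_text(data, encoding='latin-1'):
    """Hypothetical stand-in for bytes2str: return str on Python 3.

    latin-1 maps every byte value to a character, so decoding never fails.
    """
    if isinstance(data, str):
        return data
    return data.decode(encoding, errors='replace')


decoded = bytes_to_text(binascii.unhexlify('48656C6C6F'))
combined = '\n' + decoded  # str + str, safe on Python 3
```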
| ... | ... | @@ -1972,7 +2193,7 @@ def detect_base64_strings(vba_code): |
| 1972 | 2193 | # only keep new values and not in the whitelist: |
| 1973 | 2194 | if value not in found and value.lower() not in BASE64_WHITELIST: |
| 1974 | 2195 | try: |
| 1975 | - decoded = base64.b64decode(value) | |
| 2196 | + decoded = bytes2str(base64.b64decode(value)) | |
| 1976 | 2197 | results.append((value, decoded)) |
| 1977 | 2198 | found.add(value) |
| 1978 | 2199 | except (TypeError, ValueError) as exc: |
| ... | ... | @@ -2000,7 +2221,7 @@ def detect_dridex_strings(vba_code): |
| 2000 | 2221 | continue |
| 2001 | 2222 | if value not in found: |
| 2002 | 2223 | try: |
| 2003 | - decoded = DridexUrlDecode(value) | |
| 2224 | + decoded = bytes2str(DridexUrlDecode(value)) | |
| 2004 | 2225 | results.append((value, decoded)) |
| 2005 | 2226 | found.add(value) |
| 2006 | 2227 | except Exception as exc: |
| ... | ... | @@ -2047,7 +2268,8 @@ def detect_vba_strings(vba_code): |
| 2047 | 2268 | |
| 2048 | 2269 | |
| 2049 | 2270 | def json2ascii(json_obj, encoding='utf8', errors='replace'): |
| 2050 | - """ ensure there is no unicode in json and all strings are safe to decode | |
| 2271 | + """ | |
| 2272 | + ensure there is no unicode in json and all strings are safe to decode | |
| 2051 | 2273 | |
| 2052 | 2274 | works recursively, decodes and re-encodes every string to/from unicode |
| 2053 | 2275 | to ensure there will be no trouble in loading the dumped json output |
| ... | ... | @@ -2057,20 +2279,32 @@ def json2ascii(json_obj, encoding='utf8', errors='replace'): |
| 2057 | 2279 | elif isinstance(json_obj, (bool, int, float)): |
| 2058 | 2280 | pass |
| 2059 | 2281 | elif isinstance(json_obj, str): |
| 2060 | - # de-code and re-encode | |
| 2061 | - dencoded = json_obj.decode(encoding, errors).encode(encoding, errors) | |
| 2062 | - if dencoded != json_obj: | |
| 2063 | - log.debug('json2ascii: replaced: {0} (len {1})' | |
| 2064 | - .format(json_obj, len(json_obj))) | |
| 2065 | - log.debug('json2ascii: with: {0} (len {1})' | |
| 2066 | - .format(dencoded, len(dencoded))) | |
| 2067 | - return dencoded | |
| 2068 | - elif isinstance(json_obj, unicode): | |
| 2069 | - log.debug('json2ascii: encode unicode: {0}' | |
| 2070 | - .format(json_obj.encode(encoding, errors))) | |
| 2282 | + if PYTHON2: | |
| 2283 | + # de-code and re-encode | |
| 2284 | + dencoded = json_obj.decode(encoding, errors).encode(encoding, errors) | |
| 2285 | + if dencoded != json_obj: | |
| 2286 | + log.debug('json2ascii: replaced: {0} (len {1})' | |
| 2287 | + .format(json_obj, len(json_obj))) | |
| 2288 | + log.debug('json2ascii: with: {0} (len {1})' | |
| 2289 | + .format(dencoded, len(dencoded))) | |
| 2290 | + return dencoded | |
| 2291 | + else: | |
| 2292 | + # on Python 3, just keep Unicode strings as-is: | |
| 2293 | + return json_obj | |
| 2294 | + elif isinstance(json_obj, unicode) and PYTHON2: | |
| 2295 | + # On Python 2, encode unicode to bytes: | |
| 2296 | + json_obj_bytes = json_obj.encode(encoding, errors) | |
| 2297 | + log.debug('json2ascii: encode unicode: {0}'.format(json_obj_bytes)) | |
| 2298 | + # cannot put original into logger | |
| 2299 | + # print 'original: ' json_obj | |
| 2300 | + return json_obj_bytes | |
| 2301 | + elif isinstance(json_obj, bytes) and not PYTHON2: | |
| 2302 | + # On Python 3, decode bytes to unicode str | |
| 2303 | + json_obj_str = json_obj.decode(encoding, errors) | |
| 2304 | + log.debug('json2ascii: decode bytes: {0}'.format(json_obj_str)) | |
| 2071 | 2305 | # cannot put original into logger |
| 2072 | 2306 | # print 'original: ' json_obj |
| 2073 | - return json_obj.encode(encoding, errors) | |
| 2307 | + return json_obj_str | |
| 2074 | 2308 | elif isinstance(json_obj, dict): |
| 2075 | 2309 | for key in json_obj: |
| 2076 | 2310 | json_obj[key] = json2ascii(json_obj[key]) |
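The Python 3 branch of the hunk above keeps `str` values as they are and decodes `bytes` values to `str`. A minimal sketch of that idea for nested JSON-like data follows. It assumes Python 3 only, and `to_text` is a hypothetical helper name, not part of olevba.

```python
def to_text(obj, encoding='utf8', errors='replace'):
    """Recursively decode bytes to str inside a JSON-like structure (Python 3)."""
    if isinstance(obj, bytes):
        # bytes are decoded, undecodable sequences are replaced
        return obj.decode(encoding, errors)
    if isinstance(obj, dict):
        return {key: to_text(value, encoding, errors) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_text(item, encoding, errors) for item in obj]
    # str, bool, int, float and None are returned unchanged
    return obj


sample = {'type': b'IOC', 'items': [b'caf\xc3\xa9', 'plain', 3, None]}
normalized = to_text(sample)
```

Unlike `json2ascii`, this sketch builds a new structure instead of modifying the input in place.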
| ... | ... | @@ -2096,7 +2330,6 @@ def print_json(json_dict=None, _json_is_first=False, _json_is_last=False, |
| 2096 | 2330 | :param bool _json_is_last: set to True only for very last entry to complete |
| 2097 | 2331 | the top-level json-list |
| 2098 | 2332 | """ |
| 2099 | - | |
| 2100 | 2333 | if json_dict and json_parts: |
| 2101 | 2334 | raise ValueError('Invalid json argument: want either single dict or ' |
| 2102 | 2335 | 'key=value parts but got both') |
| ... | ... | @@ -2177,7 +2410,7 @@ class VBA_Scanner(object): |
| 2177 | 2410 | # StrReverse after hex decoding: |
| 2178 | 2411 | self.code_hex_rev += '\n' + decoded[::-1] |
| 2179 | 2412 | # StrReverse before hex decoding: |
| 2180 | - self.code_rev_hex += '\n' + binascii.unhexlify(encoded[::-1]) | |
| 2413 | + self.code_rev_hex += '\n' + bytes2str(binascii.unhexlify(encoded[::-1])) | |
| 2181 | 2414 | #example: https://malwr.com/analysis/NmFlMGI4YTY1YzYyNDkwNTg1ZTBiZmY5OGI3YjlhYzU/ |
| 2182 | 2415 | #TODO: also append the full code reversed if StrReverse? (risk of false positives?) |
| 2183 | 2416 | # Detect Base64-encoded strings |
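The change above wraps the result of `binascii.unhexlify` in `bytes2str`, because on Python 3 the decoded value is `bytes` and cannot be concatenated to a `str`. The sketch below illustrates the two StrReverse orderings named in the comments. `bytes2str_sketch` is a hypothetical stand-in for the real helper, assumed here to decode as UTF-8.

```python
import binascii


def bytes2str_sketch(data, encoding='utf8'):
    # Hypothetical stand-in for olevba's bytes2str helper (assumed behaviour)
    return data.decode(encoding, errors='replace')


hex_of_reversed = '6f6c6c6568'  # hex encoding of "olleh"
reversed_hex = 'f6c6c65686'     # hex encoding of "hello", reversed character by character

# StrReverse after hex decoding: decode first, then reverse the text
hex_rev = bytes2str_sketch(binascii.unhexlify(hex_of_reversed))[::-1]
# StrReverse before hex decoding: reverse the hex digits, then decode
rev_hex = bytes2str_sketch(binascii.unhexlify(reversed_hex[::-1]))
```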
| ... | ... | @@ -2287,7 +2520,7 @@ def scan_vba(vba_code, include_decoded_strings, deobfuscate=False): |
| 2287 | 2520 | :param include_decoded_strings: bool, if True all encoded strings will be included with their decoded content. |
| 2288 | 2521 | :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow) |
| 2289 | 2522 | :return: list of tuples (type, keyword, description) |
| 2290 | - (type = 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String' or 'Dridex String') | |
| 2523 | + with type = 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String' or 'Dridex String' | |
| 2291 | 2524 | """ |
| 2292 | 2525 | return VBA_Scanner(vba_code).scan(include_decoded_strings, deobfuscate) |
| 2293 | 2526 | |
| ... | ... | @@ -2297,44 +2530,38 @@ def scan_vba(vba_code, include_decoded_strings, deobfuscate=False): |
| 2297 | 2530 | class VBA_Parser(object): |
| 2298 | 2531 | """ |
| 2299 | 2532 | Class to parse MS Office files, to detect VBA macros and extract VBA source code |
| 2300 | - Supported file formats: | |
| 2301 | - - Word 97-2003 (.doc, .dot) | |
| 2302 | - - Word 2007+ (.docm, .dotm) | |
| 2303 | - - Word 2003 XML (.xml) | |
| 2304 | - - Word MHT - Single File Web Page / MHTML (.mht) | |
| 2305 | - - Excel 97-2003 (.xls) | |
| 2306 | - - Excel 2007+ (.xlsm, .xlsb) | |
| 2307 | - - PowerPoint 97-2003 (.ppt) | |
| 2308 | - - PowerPoint 2007+ (.pptm, .ppsm) | |
| 2309 | 2533 | """ |
| 2310 | 2534 | |
| 2311 | - def __init__(self, filename, data=None, container=None, relaxed=False): | |
| 2535 | + def __init__(self, filename, data=None, container=None, relaxed=False, encoding=DEFAULT_API_ENCODING): | |
| 2312 | 2536 | """ |
| 2313 | 2537 | Constructor for VBA_Parser |
| 2314 | 2538 | |
| 2315 | - :param filename: filename or path of file to parse, or file-like object | |
| 2539 | + :param str filename: filename or path of file to parse, or file-like object | |
| 2316 | 2540 | |
| 2317 | - :param data: None or bytes str, if None the file will be read from disk (or from the file-like object). | |
| 2318 | - If data is provided as a bytes string, it will be parsed as the content of the file in memory, | |
| 2319 | - and not read from disk. Note: files must be read in binary mode, i.e. open(f, 'rb'). | |
| 2541 | + :param bytes data: None or bytes str, if None the file will be read from disk (or from the file-like object). | |
| 2542 | + If data is provided as a bytes string, it will be parsed as the content of the file in memory, | |
| 2543 | + and not read from disk. Note: files must be read in binary mode, i.e. open(f, 'rb'). | |
| 2320 | 2544 | |
| 2321 | - :param container: str, path and filename of container if the file is within | |
| 2322 | - a zip archive, None otherwise. | |
| 2545 | + :param str container: str, path and filename of container if the file is within | |
| 2546 | + a zip archive, None otherwise. | |
| 2323 | 2547 | |
| 2324 | - :param relaxed: if True, treat mal-formed documents and missing streams more like MS office: | |
| 2325 | - do nothing; if False (default), raise errors in these cases | |
| 2548 | + :param bool relaxed: if True, treat malformed documents and missing streams more like MS Office: | |
| 2549 | + do nothing; if False (default), raise errors in these cases | |
| 2326 | 2550 | |
| 2327 | - raises a FileOpenError if all attemps to interpret the data header failed | |
| 2551 | + :param str encoding: encoding for VBA source code and strings. | |
| 2552 | + Default: UTF-8 bytes strings on Python 2, unicode strings on Python 3 (None) | |
| 2553 | + | |
| 2554 | + raises a FileOpenError if all attempts to interpret the data header failed. | |
| 2328 | 2555 | """ |
| 2329 | - #TODO: filename should only be a string, data should be used for the file-like object | |
| 2330 | - #TODO: filename should be mandatory, optional data is a string or file-like object | |
| 2331 | - #TODO: also support olefile and zipfile as input | |
| 2556 | + # TODO: filename should only be a string, data should be used for the file-like object | |
| 2557 | + # TODO: filename should be mandatory, optional data is a string or file-like object | |
| 2558 | + # TODO: also support olefile and zipfile as input | |
| 2332 | 2559 | if data is None: |
| 2333 | 2560 | # open file from disk: |
| 2334 | 2561 | _file = filename |
| 2335 | 2562 | else: |
| 2336 | 2563 | # file already read in memory, make it a file-like object for zipfile: |
| 2337 | - _file = StringIO(data) | |
| 2564 | + _file = BytesIO(data) | |
| 2338 | 2565 | #self.file = _file |
| 2339 | 2566 | self.ole_file = None |
| 2340 | 2567 | self.ole_subfiles = [] |
| ... | ... | @@ -2359,6 +2586,13 @@ class VBA_Parser(object): |
| 2359 | 2586 | self.nb_base64strings = 0 |
| 2360 | 2587 | self.nb_dridexstrings = 0 |
| 2361 | 2588 | self.nb_vbastrings = 0 |
| 2589 | + #: Encoding for VBA source code and strings returned by all methods | |
| 2590 | + self.encoding = encoding | |
| 2591 | + self.xlm_macros = [] | |
| 2592 | + #: Output from pcodedmp, disassembly of the VBA P-code | |
| 2593 | + self.pcodedmp_output = None | |
| 2594 | + #: Flag set to True/False if VBA stomping detected | |
| 2595 | + self.vba_stomping_detected = None | |
| 2362 | 2596 | |
| 2363 | 2597 | # if filename is None: |
| 2364 | 2598 | # if isinstance(_file, basestring): |
| ... | ... | @@ -2372,15 +2606,9 @@ class VBA_Parser(object): |
| 2372 | 2606 | # This looks like an OLE file |
| 2373 | 2607 | self.open_ole(_file) |
| 2374 | 2608 | |
| 2375 | - # check whether file is encrypted (need to do this before try ppt) | |
| 2376 | - log.debug('Check encryption of ole file') | |
| 2377 | - crypt_indicator = oleid.OleID(self.ole_file).check_encrypted() | |
| 2378 | - if crypt_indicator.value: | |
| 2379 | - raise FileIsEncryptedError(filename) | |
| 2380 | - | |
| 2381 | 2609 | # if this worked, try whether it is a ppt file (special ole file) |
| 2382 | 2610 | self.open_ppt() |
| 2383 | - if self.type is None and is_zipfile(_file): | |
| 2611 | + if self.type is None and zipfile.is_zipfile(_file): | |
| 2384 | 2612 | # Zip file, which may be an OpenXML document |
| 2385 | 2613 | self.open_openxml(_file) |
| 2386 | 2614 | if self.type is None: |
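The constructor now wraps in-memory data in `BytesIO` and tests it with `zipfile.is_zipfile`, which accepts file-like objects as well as paths. A minimal sketch of that check follows. The archive member name and the non-zip sample bytes are made up for illustration.

```python
import zipfile
from io import BytesIO

# Build a small zip archive in memory to stand in for an OpenXML document
buffer = BytesIO()
with zipfile.ZipFile(buffer, 'w') as archive:
    archive.writestr('word/vbaProject.bin', b'placeholder')
data = buffer.getvalue()

# Binary file content must be wrapped in BytesIO, not StringIO, on Python 3
_file = BytesIO(data)
is_zip = zipfile.is_zipfile(_file)
_file.seek(0)
with zipfile.ZipFile(_file) as archive:
    names = archive.namelist()

# Data that starts with the OLE magic bytes is not recognised as a zip file
not_zip = zipfile.is_zipfile(BytesIO(b'\xd0\xcf\x11\xe0' + b'\x00' * 64))
```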
| ... | ... | @@ -2600,12 +2828,12 @@ class VBA_Parser(object): |
| 2600 | 2828 | try: |
| 2601 | 2829 | # parse the MIME content |
| 2602 | 2830 | # remove any leading whitespace or newline (workaround for issue in email package) |
| 2603 | - stripped_data = data.lstrip('\r\n\t ') | |
| 2831 | + stripped_data = data.lstrip(b'\r\n\t ') | |
| 2604 | 2832 | # strip any junk from the beginning of the file |
| 2605 | 2833 | # (issue #31 fix by Greg C - gdigreg) |
| 2606 | 2834 | # TODO: improve keywords to avoid false positives |
| 2607 | - mime_offset = stripped_data.find('MIME') | |
| 2608 | - content_offset = stripped_data.find('Content') | |
| 2835 | + mime_offset = stripped_data.find(b'MIME') | |
| 2836 | + content_offset = stripped_data.find(b'Content') | |
| 2609 | 2837 | # if "MIME" is found, and located before "Content": |
| 2610 | 2838 | if -1 < mime_offset <= content_offset: |
| 2611 | 2839 | stripped_data = stripped_data[mime_offset:] |
| ... | ... | @@ -2614,7 +2842,11 @@ class VBA_Parser(object): |
| 2614 | 2842 | elif content_offset > -1: |
| 2615 | 2843 | stripped_data = stripped_data[content_offset:] |
| 2616 | 2844 | # TODO: quick and dirty fix: insert a standard line with MIME-Version header? |
| 2617 | - mhtml = email.message_from_string(stripped_data) | |
| 2845 | + if PYTHON2: | |
| 2846 | + mhtml = email.message_from_string(stripped_data) | |
| 2847 | + else: | |
| 2848 | + # on Python 3, need to use message_from_bytes instead: | |
| 2849 | + mhtml = email.message_from_bytes(stripped_data) | |
| 2618 | 2850 | # find all the attached files: |
| 2619 | 2851 | for part in mhtml.walk(): |
| 2620 | 2852 | content_type = part.get_content_type() # always returns a value |
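The MHT hunk above switches the stripping and search calls to bytes literals and uses `email.message_from_bytes` on Python 3. The sketch below follows the same steps on a made-up sample. `parse_mht_bytes` is a hypothetical name, and the sketch covers the Python 3 path only.

```python
import email


def parse_mht_bytes(data):
    """Strip leading junk, then parse MIME content from bytes (Python 3 only)."""
    stripped = data.lstrip(b'\r\n\t ')
    mime_offset = stripped.find(b'MIME')
    content_offset = stripped.find(b'Content')
    if -1 < mime_offset <= content_offset:
        # "MIME" found, and located before "Content"
        stripped = stripped[mime_offset:]
    elif content_offset > -1:
        stripped = stripped[content_offset:]
    return email.message_from_bytes(stripped)


raw = b'\r\n junk MIME-Version: 1.0\r\nContent-Type: text/plain\r\n\r\nhello\r\n'
msg = parse_mht_bytes(raw)
content_types = [part.get_content_type() for part in msg.walk()]
```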
| ... | ... | @@ -2627,7 +2859,7 @@ class VBA_Parser(object): |
| 2627 | 2859 | # using the ActiveMime/MSO format (zlib-compressed), and Base64 encoded. |
| 2628 | 2860 | # decompress the zlib data starting at offset 0x32, which is the OLE container: |
| 2629 | 2861 | # check ActiveMime header: |
| 2630 | - if isinstance(part_data, str) and is_mso_file(part_data): | |
| 2862 | + if isinstance(part_data, bytes) and is_mso_file(part_data): | |
| 2631 | 2863 | log.debug('Found ActiveMime header, decompressing MSO container') |
| 2632 | 2864 | try: |
| 2633 | 2865 | ole_data = mso_file_extract(part_data) |
| ... | ... | @@ -2697,7 +2929,9 @@ class VBA_Parser(object): |
| 2697 | 2929 | """ |
| 2698 | 2930 | log.info('Opening text file %s' % self.filename) |
| 2699 | 2931 | # directly store the source code: |
| 2700 | - self.vba_code_all_modules = data | |
| 2932 | + # On Python 2, store it as a raw bytes string | |
| 2933 | + # On Python 3, convert it to unicode assuming it was encoded with UTF-8 | |
| 2934 | + self.vba_code_all_modules = bytes2str(data) | |
| 2701 | 2935 | self.contains_macros = True |
| 2702 | 2936 | # set type only if parsing succeeds |
| 2703 | 2937 | self.type = TYPE_TEXT |
| ... | ... | @@ -2853,7 +3087,7 @@ class VBA_Parser(object): |
| 2853 | 3087 | log.debug('%r...[much more data]...%r' % (data[:100], data[-50:])) |
| 2854 | 3088 | else: |
| 2855 | 3089 | log.debug(repr(data)) |
| 2856 | - if 'Attribut\x00' in data: | |
| 3090 | + if b'Attribut\x00' in data: | |
| 2857 | 3091 | log.debug('Found VBA compressed code') |
| 2858 | 3092 | self.contains_macros = True |
| 2859 | 3093 | except IOError as exc: |
| ... | ... | @@ -2862,8 +3096,44 @@ class VBA_Parser(object): |
| 2862 | 3096 | log.debug('Trace:', exc_info=True) |
| 2863 | 3097 | else: |
| 2864 | 3098 | raise SubstreamOpenError(self.filename, d.name, exc) |
| 3099 | + if self.detect_xlm_macros(): | |
| 3100 | + self.contains_macros = True | |
| 2865 | 3101 | return self.contains_macros |
| 2866 | 3102 | |
| 3103 | + def detect_xlm_macros(self): | |
| 3104 | + from oletools.thirdparty.oledump.plugin_biff import cBIFF | |
| 3105 | + self.xlm_macros = [] | |
| 3106 | + if self.ole_file is None: | |
| 3107 | + return False | |
| 3108 | + for excel_stream in ('Workbook', 'Book'): | |
| 3109 | + if self.ole_file.exists(excel_stream): | |
| 3110 | + log.debug('Found Excel stream %r' % excel_stream) | |
| 3111 | + data = self.ole_file.openstream(excel_stream).read() | |
| 3112 | + log.debug('Running BIFF plugin from oledump') | |
| 3113 | + try: | |
| 3114 | + biff_plugin = cBIFF(name=[excel_stream], stream=data, options='-x') | |
| 3115 | + self.xlm_macros = biff_plugin.Analyze() | |
| 3116 | + if len(self.xlm_macros)>0: | |
| 3117 | + log.debug('Found XLM macros') | |
| 3118 | + return True | |
| 3119 | + except Exception: | |
| 3120 | + log.exception('Error when running oledump.plugin_biff, please report to %s' % URL_OLEVBA_ISSUES) | |
| 3121 | + return False | |
| 3122 | + | |
| 3123 | + | |
| 3124 | + def encode_string(self, unicode_str): | |
| 3125 | + """ | |
| 3126 | + Encode a unicode string to bytes or str, using the specified encoding | |
| 3127 | + for the VBA_Parser. By default, the result is a UTF-8 bytes string on Python 2, and | |
| 3128 | + a normal unicode string on Python 3. | |
| 3129 | + :param str unicode_str: string to be encoded | |
| 3130 | + :return: encoded string | |
| 3131 | + """ | |
| 3132 | + if self.encoding is None: | |
| 3133 | + return unicode_str | |
| 3134 | + else: | |
| 3135 | + return unicode_str.encode(self.encoding, errors='replace') | |
| 3136 | + | |
| 2867 | 3137 | def extract_macros(self): |
| 2868 | 3138 | """ |
| 2869 | 3139 | Extract and decompress source code for each VBA macro found in the file |
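The `encode_string` method added in the hunk above returns the text unchanged when the encoding is `None` and encodes it with `errors='replace'` otherwise. A standalone sketch of that behaviour follows, written as a plain function with a hypothetical name rather than a method.

```python
def encode_string_sketch(unicode_str, encoding=None):
    """Return text unchanged if encoding is None, else encode it to bytes."""
    if encoding is None:
        return unicode_str
    # Characters that the target encoding cannot represent become "?"
    return unicode_str.encode(encoding, errors='replace')


as_text = encode_string_sketch('caf\u00e9')
as_utf8 = encode_string_sketch('caf\u00e9', 'utf-8')
as_ascii = encode_string_sketch('caf\u00e9', 'ascii')
```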
| ... | ... | @@ -2920,18 +3190,33 @@ class VBA_Parser(object): |
| 2920 | 3190 | # read data |
| 2921 | 3191 | log.debug('Reading data from stream %r' % d.name) |
| 2922 | 3192 | data = ole._open(d.isectStart, d.size).read() |
| 2923 | - for match in re.finditer(r'\x00Attribut[^e]', data, flags=re.IGNORECASE): | |
| 3193 | + for match in re.finditer(b'\\x00Attribut[^e]', data, flags=re.IGNORECASE): | |
| 2924 | 3194 | start = match.start() - 3 |
| 2925 | 3195 | log.debug('Found VBA compressed code at index %X' % start) |
| 2926 | 3196 | compressed_code = data[start:] |
| 2927 | 3197 | try: |
| 2928 | - vba_code = decompress_stream(compressed_code) | |
| 3198 | + vba_code = decompress_stream(bytearray(compressed_code)) | |
| 3199 | + # TODO vba_code = self.encode_string(vba_code) | |
| 2929 | 3200 | yield (self.filename, d.name, d.name, vba_code) |
| 2930 | 3201 | except Exception as exc: |
| 2931 | 3202 | # display the exception with full stack trace for debugging |
| 2932 | 3203 | log.debug('Error processing stream %r in file %r (%s)' % (d.name, self.filename, exc)) |
| 2933 | 3204 | log.debug('Traceback:', exc_info=True) |
| 2934 | 3205 | # do not raise the error, as it is unlikely to be a compressed macro stream |
| 3206 | + if self.xlm_macros: | |
| 3207 | + vba_code = '' | |
| 3208 | + for line in self.xlm_macros: | |
| 3209 | + vba_code += "' " + line + '\n' | |
| 3210 | + yield ('xlm_macro', 'xlm_macro', 'xlm_macro.txt', vba_code) | |
| 3211 | + # Analyse the VBA P-code to detect VBA stomping: | |
| 3212 | + # If stomping is detected, add a fake VBA module with the P-code as source comments | |
| 3213 | + # so that VBA_Scanner can find keywords and IOCs in it | |
| 3214 | + if self.detect_vba_stomping(): | |
| 3215 | + vba_code = '' | |
| 3216 | + for line in self.pcodedmp_output.splitlines(): | |
| 3217 | + vba_code += "' " + line + '\n' | |
| 3218 | + yield ('VBA P-code', 'VBA P-code', 'VBA_P-code.txt', vba_code) | |
| 3219 | + | |
| 2935 | 3220 | |
| 2936 | 3221 | def extract_all_macros(self): |
| 2937 | 3222 | """ |
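The search for orphan compressed VBA code in the hunk above now uses a bytes pattern, since a `str` pattern cannot be applied to `bytes` data on Python 3. The sketch below shows the match and the `start - 3` offset on made-up data. The sample bytes are illustrative and are not a real compressed VBA stream.

```python
import re

# Made-up stream: 4 junk bytes, 3 bytes before the marker, then the marker itself
data = b'junk\x01\x02\x03\x00Attribut\x00e VB_Name'

offsets = []
for match in re.finditer(b'\\x00Attribut[^e]', data, flags=re.IGNORECASE):
    # As in the hunk, the candidate compressed code is taken to start
    # 3 bytes before the matched marker
    offsets.append(match.start() - 3)

# Without the marker, nothing is found
no_match = list(re.finditer(b'\\x00Attribut[^e]', b'no marker here'))
```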
| ... | ... | @@ -2953,6 +3238,8 @@ class VBA_Parser(object): |
| 2953 | 3238 | """ |
| 2954 | 3239 | Run extract_macros and analyze the source code of all VBA macros |
| 2955 | 3240 | found in the file. |
| 3241 | + All results are stored in self.analysis_results. | |
| 3242 | + If called more than once, simply returns the previous results. | |
| 2956 | 3243 | """ |
| 2957 | 3244 | if self.detect_vba_macros(): |
| 2958 | 3245 | # if the analysis was already done, avoid doing it twice: |
| ... | ... | @@ -2969,6 +3256,13 @@ class VBA_Parser(object): |
| 2969 | 3256 | # Analyze the whole code at once: |
| 2970 | 3257 | scanner = VBA_Scanner(self.vba_code_all_modules) |
| 2971 | 3258 | self.analysis_results = scanner.scan(show_decoded_strings, deobfuscate) |
| 3259 | + if self.detect_vba_stomping(): | |
| 3260 | + log.debug('adding VBA stomping to suspicious keywords') | |
| 3261 | + keyword = 'VBA Stomping' | |
| 3262 | + description = 'VBA Stomping was detected: the VBA source code and P-code are different, '\ | |
| 3263 | + 'this may have been used to hide malicious code' | |
| 3264 | + scanner.suspicious_keywords.append((keyword, description)) | |
| 3265 | + scanner.results.append(('Suspicious', keyword, description)) | |
| 2972 | 3266 | autoexec, suspicious, iocs, hexstrings, base64strings, dridex, vbastrings = scanner.scan_summary() |
| 2973 | 3267 | self.nb_autoexec += autoexec |
| 2974 | 3268 | self.nb_suspicious += suspicious |
| ... | ... | @@ -3080,11 +3374,12 @@ class VBA_Parser(object): |
| 3080 | 3374 | """ |
| 3081 | 3375 | Extract printable strings from each VBA Form found in the file |
| 3082 | 3376 | |
| 3083 | - Iterator: yields (filename, stream_path, vba_filename, vba_code) for each VBA macro found | |
| 3377 | + Iterator: yields (filename, stream_path, form_string) for each printable string found in forms | |
| 3084 | 3378 | If the file is OLE, filename is the path of the file. |
| 3085 | 3379 | If the file is OpenXML, filename is the path of the OLE subfile containing VBA macros |
| 3086 | 3380 | within the zip archive, e.g. word/vbaProject.bin. |
| 3087 | 3381 | If the file is PPT, result is as for OpenXML but filename is useless |
| 3382 | + Note: form_string is a raw bytes string on Python 2, a unicode str on Python 3 | |
| 3088 | 3383 | """ |
| 3089 | 3384 | if self.ole_file is None: |
| 3090 | 3385 | # This may be either an OpenXML/PPT or a text file: |
| ... | ... | @@ -3107,7 +3402,13 @@ class VBA_Parser(object): |
| 3107 | 3402 | # Extract printable strings from the form object stream "o": |
| 3108 | 3403 | for m in re_printable_string.finditer(form_data): |
| 3109 | 3404 | log.debug('Printable string found in form: %r' % m.group()) |
| 3110 | - yield (self.filename, '/'.join(o_stream), m.group()) | |
| 3405 | + # On Python 3, convert bytes string to unicode str: | |
| 3406 | + if PYTHON2: | |
| 3407 | + found_str = m.group() | |
| 3408 | + else: | |
| 3409 | + found_str = m.group().decode('utf8', errors='replace') | |
| 3410 | + if found_str != 'Tahoma': | |
| 3411 | + yield (self.filename, '/'.join(o_stream), found_str) | |
| 3111 | 3412 | |
| 3112 | 3413 | def extract_form_strings_extended(self): |
| 3113 | 3414 | if self.ole_file is None: |
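The form-string hunk above decodes each match to `str` on Python 3 and skips the font name `Tahoma`. A minimal sketch of that filtering follows. The pattern `RE_PRINTABLE` is an assumption made for illustration and may differ from olevba's `re_printable_string`. The form data is also made up.

```python
import re

# Assumed pattern: runs of at least 5 printable ASCII bytes (illustrative only)
RE_PRINTABLE = re.compile(b'[\\t\\r\\n\\x20-\\x7e]{5,}')


def form_strings_sketch(form_data):
    """Yield printable strings from form data as str, skipping 'Tahoma'."""
    for m in RE_PRINTABLE.finditer(form_data):
        found_str = m.group().decode('utf8', errors='replace')
        if found_str != 'Tahoma':
            yield found_str


form_data = b'\x00\x01Tahoma\x00\x02cmd.exe /c calc\x00'
strings = list(form_strings_sketch(form_data))
```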
| ... | ... | @@ -3128,6 +3429,136 @@ class VBA_Parser(object): |
| 3128 | 3429 | for variable in oleform.extract_OleFormVariables(ole, form_storage): |
| 3129 | 3430 | yield (self.filename, '/'.join(form_storage), variable) |
| 3130 | 3431 | |
| 3432 | + def extract_pcode(self): | |
| 3433 | + """ | |
| 3434 | + Extract and disassemble the VBA P-code, using pcodedmp | |
| 3435 | + | |
| 3436 | + :return: VBA P-code disassembly | |
| 3437 | + :rtype: str | |
| 3438 | + """ | |
| 3439 | + # only run it once: | |
| 3440 | + if self.pcodedmp_output is None: | |
| 3441 | + log.debug('Calling pcodedmp to extract and disassemble the VBA P-code') | |
| 3442 | + # import pcodedmp here to avoid circular imports: | |
| 3443 | + try: | |
| 3444 | + from pcodedmp import pcodedmp | |
| 3445 | + except Exception as e: | |
| 3446 | + # This may happen with Pypy, because pcodedmp imports win_unicode_console... | |
| 3447 | + # TODO: this is a workaround, we just ignore P-code | |
| 3448 | + # TODO: here we just use log.info, because the word "error" in the output makes some of the tests fail... | |
| 3449 | + log.info('Exception when importing pcodedmp: {}'.format(e)) | |
| 3450 | + self.pcodedmp_output = '' | |
| 3451 | + return '' | |
| 3452 | + # logging is disabled after importing pcodedmp, need to re-enable it | |
| 3453 | + # This is because pcodedmp imports olevba again :-/ | |
| 3454 | + # TODO: here it works only if logging was enabled, need to change pcodedmp! | |
| 3455 | + enable_logging() | |
| 3456 | + # pcodedmp prints all its output to sys.stdout, so we need to capture it so that | |
| 3457 | + # we can process the results later on. | |
| 3458 | + # save sys.stdout, then modify it to capture pcodedmp's output: | |
| 3459 | + # stdout = sys.stdout | |
| 3460 | + if PYTHON2: | |
| 3461 | + # on Python 2, console output is bytes | |
| 3462 | + output = BytesIO() | |
| 3463 | + else: | |
| 3464 | + # on Python 3, console output is unicode | |
| 3465 | + output = StringIO() | |
| 3466 | + # sys.stdout = output | |
| 3467 | + # we need to fake an argparser for those two args used by pcodedmp: | |
| 3468 | + class args: | |
| 3469 | + disasmOnly = True | |
| 3470 | + verbose = False | |
| 3471 | + try: | |
| 3472 | + # TODO: handle files in memory too | |
| 3473 | + log.debug('before pcodedmp') | |
| 3474 | + pcodedmp.processFile(self.filename, args, output_file=output) | |
| 3475 | + log.debug('after pcodedmp') | |
| 3476 | + except Exception as e: | |
| 3477 | + # print('Error while running pcodedmp: {}'.format(e), file=sys.stderr, flush=True) | |
| 3478 | + # set sys.stdout back to its original value | |
| 3479 | + # sys.stdout = stdout | |
| 3480 | + log.exception('Error while running pcodedmp') | |
| 3481 | + # finally: | |
| 3482 | + # # set sys.stdout back to its original value | |
| 3483 | + # sys.stdout = stdout | |
| 3484 | + self.pcodedmp_output = output.getvalue() | |
| 3485 | + # print(self.pcodedmp_output) | |
| 3486 | + # log.debug(self.pcodedmp_output) | |
| 3487 | + return self.pcodedmp_output | |
| 3488 | + | |
| 3489 | + def detect_vba_stomping(self): | |
| 3490 | + """ | |
| 3491 | + Detect VBA stomping, by comparing the keywords present in the P-code and | |
| 3492 | + in the VBA source code. | |
| 3493 | + | |
| 3494 | + :return: True if VBA stomping detected, False otherwise | |
| 3495 | + :rtype: bool | |
| 3496 | + """ | |
| 3497 | + # only run it once: | |
| 3498 | + if self.vba_stomping_detected is None: | |
| 3499 | + log.debug('Analysing the P-code to detect VBA stomping') | |
| 3500 | + self.extract_pcode() | |
| 3501 | + # print('pcodedmp OK') | |
| 3502 | + log.debug('pcodedmp OK') | |
| 3503 | + # process the output to extract keywords, to detect VBA stomping | |
| 3504 | + keywords = set() | |
| 3505 | + for line in self.pcodedmp_output.splitlines(): | |
| 3506 | + if line.startswith('\t'): | |
| 3507 | + log.debug('P-code: ' + line.strip()) | |
| 3508 | + tokens = line.split(None, 1) | |
| 3509 | + mnemonic = tokens[0] | |
| 3510 | + args = '' | |
| 3511 | + if len(tokens) == 2: | |
| 3512 | + args = tokens[1].strip() | |
| 3513 | + # log.debug(repr([mnemonic, args])) | |
| 3514 | + # if mnemonic in ('VarDefn',): | |
| 3515 | + # # just add the rest of the line | |
| 3516 | + # keywords.add(args) | |
| 3517 | + # if mnemonic == 'FuncDefn': | |
| 3518 | + # # function definition: just strip parentheses | |
| 3519 | + # funcdefn = args.strip('()') | |
| 3520 | + # keywords.add(funcdefn) | |
| 3521 | + if mnemonic in ('ArgsCall', 'ArgsLd', 'St', 'Ld', 'MemSt', 'Label'): | |
| 3522 | + # add 1st argument: | |
| 3523 | + name = args.split(None, 1)[0] | |
| 3524 | + # sometimes pcodedmp reports names like "id_FFFF", which are not | |
| 3525 | + # directly present in the VBA source code | |
| 3526 | + # (for example "Me" in VBA appears as id_FFFF in P-code) | |
| 3527 | + if not name.startswith('id_'): | |
| 3528 | + keywords.add(name) | |
| 3529 | + if mnemonic == 'LitStr': | |
| 3530 | + # re_string = re.compile(r'\"([^\"]|\"\")*\"') | |
| 3531 | + # for match in re_string.finditer(line): | |
| 3532 | + # print('\t' + match.group()) | |
| 3533 | + # the string is the 2nd argument: | |
| 3534 | + s = args.split(None, 1)[1] | |
| 3535 | + # tricky issue: when a string contains double quotes inside, | |
| 3536 | + # pcodedmp returns a single ", whereas in the VBA source code | |
| 3537 | + # it is always a double "". | |
| 3538 | + # We have to remove the " around the strings, then double the remaining ", | |
| 3539 | + # and put back the " around: | |
| 3540 | + if len(s)>=2: | |
| 3541 | + assert(s[0]=='"' and s[-1]=='"') | |
| 3542 | + s = s[1:-1] | |
| 3543 | + s = s.replace('"', '""') | |
| 3544 | + s = '"' + s + '"' | |
| 3545 | + keywords.add(s) | |
| 3546 | + log.debug('Keywords extracted from P-code: ' + repr(sorted(keywords))) | |
| 3547 | + self.vba_stomping_detected = False | |
| 3548 | + # TODO: add a method to get all VBA code as one string | |
| 3549 | + vba_code_all_modules = '' | |
| 3550 | + for (_, _, _, vba_code) in self.extract_all_macros(): | |
| 3551 | + vba_code_all_modules += vba_code + '\n' | |
| 3552 | + for keyword in keywords: | |
| 3553 | + if keyword not in vba_code_all_modules: | |
| 3554 | + log.debug('Keyword {!r} not found in VBA code'.format(keyword)) | |
| 3555 | + log.debug('VBA STOMPING DETECTED!') | |
| 3556 | + self.vba_stomping_detected = True | |
| 3557 | + break | |
| 3558 | + if not self.vba_stomping_detected: | |
| 3559 | + log.debug('No VBA stomping detected.') | |
| 3560 | + return self.vba_stomping_detected | |
| 3561 | + | |
| 3131 | 3562 | def close(self): |
| 3132 | 3563 | """ |
| 3133 | 3564 | Close all the open files. This method must be called after usage, if |
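The stomping check added above collects names and string literals from the P-code disassembly and reports stomping as soon as one of them is absent from the VBA source. The sketch below isolates that comparison. The sample P-code text is hypothetical and only mimics the tab-indented `mnemonic args` lines that the hunk parses. The function names are also hypothetical, and the extra guards against empty lines are additions of this sketch.

```python
def pcode_keywords(pcode_text):
    """Collect identifiers and string literals from P-code disassembly lines."""
    keywords = set()
    for line in pcode_text.splitlines():
        if not line.startswith('\t'):
            continue
        tokens = line.split(None, 1)
        if not tokens:
            continue  # whitespace-only line
        mnemonic = tokens[0]
        args = tokens[1].strip() if len(tokens) == 2 else ''
        parts = args.split(None, 1)
        if mnemonic in ('ArgsCall', 'ArgsLd', 'St', 'Ld', 'MemSt', 'Label'):
            # First argument is a name; skip placeholders such as id_FFFF
            if parts and not parts[0].startswith('id_'):
                keywords.add(parts[0])
        elif mnemonic == 'LitStr' and len(parts) == 2:
            s = parts[1]
            if len(s) >= 2 and s[0] == '"' and s[-1] == '"':
                # VBA source doubles embedded quotes, so do the same here
                s = '"' + s[1:-1].replace('"', '""') + '"'
            keywords.add(s)
    return keywords


def missing_keywords(pcode_text, vba_source):
    """Return the P-code keywords that do not appear in the VBA source."""
    return sorted(k for k in pcode_keywords(pcode_text) if k not in vba_source)


pcode = 'Line #0:\n\tLitStr 0x0005 "hello"\n\tArgsCall MsgBox 0x0001\n\tLd id_FFFF\n'
intact = missing_keywords(pcode, 'Sub Test()\n    MsgBox "hello"\nEnd Sub\n')
stomped = missing_keywords(pcode, 'Sub Test()\nEnd Sub\n')
```

An empty result means every extracted keyword was found in the source. Any non-empty result corresponds to the case where the hunk sets `vba_stomping_detected` to True.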
| ... | ... | @@ -3156,11 +3587,11 @@ class VBA_Parser_CLI(VBA_Parser): |
| 3156 | 3587 | super(VBA_Parser_CLI, self).__init__(*args, **kwargs) |
| 3157 | 3588 | |
| 3158 | 3589 | |
| 3159 | - def print_analysis(self, show_decoded_strings=False, deobfuscate=False): | |
| 3590 | + def run_analysis(self, show_decoded_strings=False, deobfuscate=False): | |
| 3160 | 3591 | """ |
| 3161 | - Analyze the provided VBA code, and print the results in a table | |
| 3592 | + Analyze the provided VBA code, without printing the results (yet) | |
| 3593 | + All results are stored in self.analysis_results. | |
| 3162 | 3594 | |
| 3163 | - :param vba_code: str, VBA source code to be analyzed | |
| 3164 | 3595 | :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content. |
| 3165 | 3596 | :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow) |
| 3166 | 3597 | :return: None |
| ... | ... | @@ -3169,21 +3600,37 @@ class VBA_Parser_CLI(VBA_Parser): |
| 3169 | 3600 | if sys.stdout.isatty(): |
| 3170 | 3601 | print('Analysis...\r', end='') |
| 3171 | 3602 | sys.stdout.flush() |
| 3172 | - results = self.analyze_macros(show_decoded_strings, deobfuscate) | |
| 3603 | + self.analyze_macros(show_decoded_strings, deobfuscate) | |
| 3604 | + | |
| 3605 | + | |
| 3606 | + def print_analysis(self, show_decoded_strings=False, deobfuscate=False): | |
| 3607 | + """ | |
| 3608 | + Print the analysis results in a table | |
| 3609 | + | |
| 3610 | + :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content. | |
| 3611 | + :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow) | |
| 3612 | + :return: None | |
| 3613 | + """ | |
| 3614 | + results = self.analysis_results | |
| 3173 | 3615 | if results: |
| 3174 | - t = prettytable.PrettyTable(('Type', 'Keyword', 'Description')) | |
| 3175 | - t.align = 'l' | |
| 3176 | - t.max_width['Type'] = 10 | |
| 3177 | - t.max_width['Keyword'] = 20 | |
| 3178 | - t.max_width['Description'] = 39 | |
| 3616 | + t = tablestream.TableStream(column_width=(10, 20, 45), | |
| 3617 | + header_row=('Type', 'Keyword', 'Description')) | |
| 3618 | + COLOR_TYPE = { | |
| 3619 | + 'AutoExec': 'yellow', | |
| 3620 | + 'Suspicious': 'red', | |
| 3621 | + 'IOC': 'cyan', | |
| 3622 | + } | |
| 3179 | 3623 | for kw_type, keyword, description in results: |
| 3180 | 3624 | # handle non printable strings: |
| 3181 | 3625 | if not is_printable(keyword): |
| 3182 | 3626 | keyword = repr(keyword) |
| 3183 | 3627 | if not is_printable(description): |
| 3184 | 3628 | description = repr(description) |
| 3185 | - t.add_row((kw_type, keyword, description)) | |
| 3186 | - print(t) | |
| 3629 | + color_type = COLOR_TYPE.get(kw_type, None) | |
| 3630 | + t.write_row((kw_type, keyword, description), colors=(color_type, None, None)) | |
| 3631 | + t.close() | |
| 3632 | + if self.vba_stomping_detected: | |
| 3633 | + print('VBA Stomping detection is experimental: please report any false positive/negative at https://github.com/decalage2/oletools/issues') | |
| 3187 | 3634 | else: |
| 3188 | 3635 | print('No suspicious keyword or IOC found.') |
| 3189 | 3636 | |
| ... | ... | @@ -3204,10 +3651,29 @@ class VBA_Parser_CLI(VBA_Parser): |
| 3204 | 3651 | return [dict(type=kw_type, keyword=keyword, description=description) |
| 3205 | 3652 | for kw_type, keyword, description in self.analyze_macros(show_decoded_strings, deobfuscate)] |
| 3206 | 3653 | |
| 3654 | + def colorize_keywords(self, vba_code): | |
| 3655 | + """ | |
| 3656 | + Colorize keywords found during the VBA code analysis | |
| 3657 | + :param vba_code: str, VBA code to be colorized | |
| 3658 | + :return: str, VBA code including color tags for Colorclass | |
| 3659 | + """ | |
| 3660 | + results = self.analysis_results | |
| 3661 | + if results: | |
| 3662 | + COLOR_TYPE = { | |
| 3663 | + 'AutoExec': 'yellow', | |
| 3664 | + 'Suspicious': 'red', | |
| 3665 | + 'IOC': 'cyan', | |
| 3666 | + } | |
| 3667 | + for kw_type, keyword, description in results: | |
| 3668 | + color_type = COLOR_TYPE.get(kw_type, None) | |
| 3669 | + if color_type: | |
| 3670 | + vba_code = vba_code.replace(keyword, '{auto%s}%s{/%s}' % (color_type, keyword, color_type)) | |
| 3671 | + return vba_code | |
| 3672 | + | |
| 3207 | 3673 | def process_file(self, show_decoded_strings=False, |
| 3208 | 3674 | display_code=True, hide_attributes=True, |
| 3209 | 3675 | vba_code_only=False, show_deobfuscated_code=False, |
| 3210 | - deobfuscate=False): | |
| 3676 | + deobfuscate=False, pcode=False): | |
| 3211 | 3677 | """ |
| 3212 | 3678 | Process a single file |
| 3213 | 3679 | |
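`colorize_keywords` in the hunk above wraps each keyword that has a known type in `{auto<color>}...{/<color>}` tags, which colorclass later renders. The sketch below reproduces only the tag insertion, without importing colorclass. The function name and the sample results are made up.

```python
COLOR_TYPE = {'AutoExec': 'yellow', 'Suspicious': 'red', 'IOC': 'cyan'}


def colorize_keywords_sketch(vba_code, results):
    """Wrap keywords of known types in color tags, as plain text."""
    for kw_type, keyword, _description in results:
        color = COLOR_TYPE.get(kw_type)
        if color:
            vba_code = vba_code.replace(
                keyword, '{auto%s}%s{/%s}' % (color, keyword, color))
    return vba_code


results = [
    ('AutoExec', 'AutoOpen', 'Runs when the document is opened'),
    ('Suspicious', 'Shell', 'May run an executable file'),
    ('Hex String', 'abc', '616263'),  # no color defined for this type
]
tagged = colorize_keywords_sketch('Sub AutoOpen()\nShell "calc.exe"\nEnd Sub', results)
```

Because this relies on plain `str.replace`, a keyword that happens to match part of an already inserted tag would also be rewritten. The sample avoids that case.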
| ... | ... | @@ -3219,6 +3685,7 @@ class VBA_Parser_CLI(VBA_Parser): |
| 3219 | 3685 | otherwise each module is analyzed separately (old behaviour) |
| 3220 | 3686 | :param hide_attributes: bool, if True the first lines starting with "Attribute VB" are hidden (default) |
| 3221 | 3687 | :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow) |
| 3688 | + :param bool pcode: if True, call pcodedmp to disassemble P-code and display it | |
| 3222 | 3689 | """ |
| 3223 | 3690 | #TODO: replace print by writing to a provided output file (sys.stdout by default) |
| 3224 | 3691 | # fix conflicting parameters: |
| ... | ... | @@ -3234,6 +3701,8 @@ class VBA_Parser_CLI(VBA_Parser): |
| 3234 | 3701 | #TODO: handle olefile errors, when an OLE file is malformed |
| 3235 | 3702 | print('Type: %s'% self.type) |
| 3236 | 3703 | if self.detect_vba_macros(): |
| 3704 | + # run analysis before displaying VBA code, in order to colorize found keywords | |
| 3705 | + self.run_analysis(show_decoded_strings=show_decoded_strings, deobfuscate=deobfuscate) | |
| 3237 | 3706 | #print 'Contains VBA Macros:' |
| 3238 | 3707 | for (subfilename, stream_path, vba_filename, vba_code) in self.extract_all_macros(): |
| 3239 | 3708 | if hide_attributes: |
| ... | ... | @@ -3251,21 +3720,30 @@ class VBA_Parser_CLI(VBA_Parser): |
| 3251 | 3720 | print('(empty macro)') |
| 3252 | 3721 | else: |
| 3253 | 3722 | # check if the VBA code contains special characters such as backspace (issue #358) |
| 3254 | - if b'\x08' in vba_code_filtered: | |
| 3723 | + if '\x08' in vba_code_filtered: | |
| 3255 | 3724 | log.warning('The VBA code contains special characters such as backspace, that may be used for obfuscation.') |
| 3256 | 3725 | if sys.stdout.isatty(): |
| 3257 | 3726 | # if the standard output is the console, we'll display colors |
| 3258 | 3728 | backspace = colorclass.Color('{autored}\\x08{/red}') |
| 3259 | 3728 | else: |
| 3260 | - backspace = b'\x08' | |
| 3729 | + backspace = '\x08' | |
| 3261 | 3730 | # replace backspace by "\x08" for display |
| 3262 | - vba_code_filtered = vba_code_filtered.replace(b'\x08', backspace) | |
| 3731 | + vba_code_filtered = vba_code_filtered.replace('\x08', backspace) | |
| 3732 | + try: | |
| 3733 | + # Colorize the interesting keywords in the output: | |
| 3734 | + # (unless the output is redirected to a file) | |
| 3735 | + if sys.stdout.isatty(): | |
| 3736 | + vba_code_filtered = colorclass.Color(self.colorize_keywords(vba_code_filtered)) | |
| 3737 | + except UnicodeError: | |
| 3738 | + # TODO better handling of Unicode | |
| 3739 | + log.error('Unicode conversion to be fixed before colorizing the output') | |
| 3263 | 3740 | print(vba_code_filtered) |
| 3264 | 3741 | for (subfilename, stream_path, form_string) in self.extract_form_strings(): |
| 3265 | - print('-' * 79) | |
| 3266 | - print('VBA FORM STRING IN %r - OLE stream: %r' % (subfilename, stream_path)) | |
| 3267 | - print('- ' * 39) | |
| 3268 | - print(form_string) | |
| 3742 | + if form_string is not None: | |
| 3743 | + print('-' * 79) | |
| 3744 | + print('VBA FORM STRING IN %r - OLE stream: %r' % (subfilename, stream_path)) | |
| 3745 | + print('- ' * 39) | |
| 3746 | + print(form_string) | |
| 3269 | 3747 | try: |
| 3270 | 3748 | for (subfilename, stream_path, form_variables) in self.extract_form_strings_extended(): |
| 3271 | 3749 | if form_variables is not None: |
| ... | ... | @@ -3277,6 +3755,11 @@ class VBA_Parser_CLI(VBA_Parser): |
| 3277 | 3755 | # display the exception with full stack trace for debugging |
| 3278 | 3756 | log.info('Error parsing form: %s' % exc) |
| 3279 | 3757 | log.debug('Traceback:', exc_info=True) |
| 3758 | + if pcode: | |
| 3759 | + print('-' * 79) | |
| 3760 | + print('P-CODE disassembly:') | |
| 3761 | + pcode = self.extract_pcode() | |
| 3762 | + print(pcode) | |
| 3280 | 3763 | |
| 3281 | 3764 | if not vba_code_only: |
| 3282 | 3765 | # analyse the code from all modules at once: |
| ... | ... | @@ -3398,16 +3881,6 @@ class VBA_Parser_CLI(VBA_Parser): |
| 3398 | 3881 | |
| 3399 | 3882 | line = '%-12s %s' % (flags, self.filename) |
| 3400 | 3883 | print(line) |
| 3401 | - | |
| 3402 | - # old table display: | |
| 3403 | - # macros = autoexec = suspicious = iocs = hexstrings = 'no' | |
| 3404 | - # if nb_macros: macros = 'YES:%d' % nb_macros | |
| 3405 | - # if nb_autoexec: autoexec = 'YES:%d' % nb_autoexec | |
| 3406 | - # if nb_suspicious: suspicious = 'YES:%d' % nb_suspicious | |
| 3407 | - # if nb_iocs: iocs = 'YES:%d' % nb_iocs | |
| 3408 | - # if nb_hexstrings: hexstrings = 'YES:%d' % nb_hexstrings | |
| 3409 | - # # 2nd line = info | |
| 3410 | - # print '%-8s %-7s %-7s %-7s %-7s %-7s' % (self.type, macros, autoexec, suspicious, iocs, hexstrings) | |
| 3411 | 3884 | except Exception as exc: |
| 3412 | 3885 | # display the exception with full stack trace for debugging only |
| 3413 | 3886 | log.debug('Error processing file %s (%s)' % (self.filename, exc), |
| ... | ... | @@ -3415,20 +3888,6 @@ class VBA_Parser_CLI(VBA_Parser): |
| 3415 | 3888 | raise ProcessingError(self.filename, exc) |
| 3416 | 3889 | |
| 3417 | 3890 | |
| 3418 | - # t = prettytable.PrettyTable(('filename', 'type', 'macros', 'autoexec', 'suspicious', 'ioc', 'hexstrings'), | |
| 3419 | - # header=False, border=False) | |
| 3420 | - # t.align = 'l' | |
| 3421 | - # t.max_width['filename'] = 30 | |
| 3422 | - # t.max_width['type'] = 10 | |
| 3423 | - # t.max_width['macros'] = 6 | |
| 3424 | - # t.max_width['autoexec'] = 6 | |
| 3425 | - # t.max_width['suspicious'] = 6 | |
| 3426 | - # t.max_width['ioc'] = 6 | |
| 3427 | - # t.max_width['hexstrings'] = 6 | |
| 3428 | - # t.add_row((filename, ftype, macros, autoexec, suspicious, iocs, hexstrings)) | |
| 3429 | - # print t | |
| 3430 | - | |
| 3431 | - | |
| 3432 | 3891 | #=== MAIN ===================================================================== |
| 3433 | 3892 | |
| 3434 | 3893 | def parse_args(cmd_line_args=None): |
| ... | ... | @@ -3452,7 +3911,11 @@ def parse_args(cmd_line_args=None): |
| 3452 | 3911 | parser.add_option("-r", action="store_true", dest="recursive", |
| 3453 | 3912 | help='find files recursively in subdirectories.') |
| 3454 | 3913 | parser.add_option("-z", "--zip", dest='zip_password', type='str', default=None, |
| 3455 | - help='if the file is a zip archive, open all files from it, using the provided password (requires Python 2.6+)') | |
| 3914 | + help='if the file is a zip archive, open all files from it, using the provided password.') | |
| 3915 | + parser.add_option("-p", "--password", type='str', action='append', | |
| 3916 | + default=[], | |
| 3917 | + help='if encrypted office files are encountered, try ' | |
| 3918 | + 'decryption with this password. May be repeated.') | |
| 3456 | 3919 | parser.add_option("-f", "--zipfname", dest='zip_fname', type='str', default='*', |
| 3457 | 3920 | help='if the file is a zip archive, file(s) to be opened within the zip. Wildcards * and ? are supported. (default:*)') |
| 3458 | 3921 | # output mode; could make this even simpler with add_option(type='choice') but that would make |
| ... | ... | @@ -3484,12 +3947,17 @@ def parse_args(cmd_line_args=None): |
| 3484 | 3947 | help="Attempt to deobfuscate VBA expressions (slow)") |
| 3485 | 3948 | parser.add_option('--relaxed', dest="relaxed", action="store_true", default=False, |
| 3486 | 3949 | help="Do not raise errors if opening of substream fails") |
| 3950 | + parser.add_option('--pcode', dest="pcode", action="store_true", default=False, | |
| 3951 | + help="Disassemble and display the P-code (using pcodedmp)") | |
| 3487 | 3952 | |
| 3488 | 3953 | (options, args) = parser.parse_args(cmd_line_args) |
| 3489 | 3954 | |
| 3490 | 3955 | # Print help if no arguments are passed |
| 3491 | 3956 | if len(args) == 0: |
| 3492 | - print('olevba %s - http://decalage.info/python/oletools' % __version__) | |
| 3957 | + # print banner with version | |
| 3958 | + python_version = '%d.%d.%d' % sys.version_info[0:3] | |
| 3959 | + print('olevba %s on Python %s - http://decalage.info/python/oletools' % | |
| 3960 | + (__version__, python_version)) | |
| 3493 | 3961 | print(__doc__) |
| 3494 | 3962 | parser.print_help() |
| 3495 | 3963 | sys.exit(RETURN_WRONG_ARGS) |
| ... | ... | @@ -3499,6 +3967,112 @@ def parse_args(cmd_line_args=None): |
| 3499 | 3967 | return options, args |
| 3500 | 3968 | |
| 3501 | 3969 | |
| 3970 | +def process_file(filename, data, container, options, crypto_nesting=0): | |
| 3971 | + """ | |
| 3972 | + Part of main function that processes a single file. | |
| 3973 | + | |
| 3974 | + This handles exceptions and encryption. | |
| 3975 | + | |
| 3976 | + Returns a single code summarizing the status of processing of this file | |
| 3977 | + """ | |
| 3978 | + try: | |
| 3979 | + # Open the file | |
| 3980 | + vba_parser = VBA_Parser_CLI(filename, data=data, container=container, | |
| 3981 | + relaxed=options.relaxed) | |
| 3982 | + | |
| 3983 | + if options.output_mode == 'detailed': | |
| 3984 | + # fully detailed output | |
| 3985 | + vba_parser.process_file(show_decoded_strings=options.show_decoded_strings, | |
| 3986 | + display_code=options.display_code, | |
| 3987 | + hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only, | |
| 3988 | + show_deobfuscated_code=options.show_deobfuscated_code, | |
| 3989 | + deobfuscate=options.deobfuscate, pcode=options.pcode) | |
| 3990 | + elif options.output_mode == 'triage': | |
| 3991 | + # summarized output for triage: | |
| 3992 | + vba_parser.process_file_triage(show_decoded_strings=options.show_decoded_strings, | |
| 3993 | + deobfuscate=options.deobfuscate) | |
| 3994 | + elif options.output_mode == 'json': | |
| 3995 | + print_json( | |
| 3996 | + vba_parser.process_file_json(show_decoded_strings=options.show_decoded_strings, | |
| 3997 | + display_code=options.display_code, | |
| 3998 | + hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only, | |
| 3999 | + show_deobfuscated_code=options.show_deobfuscated_code, | |
| 4000 | + deobfuscate=options.deobfuscate)) | |
| 4001 | + else: # (should be impossible) | |
| 4002 | + raise ValueError('unexpected output mode: "{0}"!'.format(options.output_mode)) | |
| 4003 | + | |
| 4004 | + # even if processing succeeds, file might still be encrypted | |
| 4005 | + log.debug('Checking for encryption (normal)') | |
| 4006 | + if not crypto.is_encrypted(filename): | |
| 4007 | + log.debug('no encryption detected') | |
| 4008 | + return RETURN_OK | |
| 4009 | + except Exception as exc: | |
| 4010 | + log.debug('Checking for encryption (after exception)') | |
| 4011 | + if crypto.is_encrypted(filename): | |
| 4012 | + pass # deal with this below | |
| 4013 | + else: | |
| 4014 | + if isinstance(exc, (SubstreamOpenError, UnexpectedDataError)): | |
| 4015 | + if options.output_mode in ('triage', 'unspecified'): | |
| 4016 | + print('%-12s %s - Error opening substream or unexpected ' \ | |
| 4017 | + 'content' % ('?', filename)) | |
| 4018 | + elif options.output_mode == 'json': | |
| 4019 | + print_json(file=filename, type='error', | |
| 4020 | + error=type(exc).__name__, message=str(exc)) | |
| 4021 | + else: | |
| 4022 | + log.exception('Error opening substream or unexpected ' | |
| 4023 | + 'content in %s' % filename) | |
| 4024 | + return RETURN_OPEN_ERROR | |
| 4025 | + elif isinstance(exc, FileOpenError): | |
| 4026 | + if options.output_mode in ('triage', 'unspecified'): | |
| 4027 | + print('%-12s %s - File format not supported' % ('?', filename)) | |
| 4028 | + elif options.output_mode == 'json': | |
| 4029 | + print_json(file=filename, type='error', | |
| 4030 | + error=type(exc).__name__, message=str(exc)) | |
| 4031 | + else: | |
| 4032 | + log.exception('Failed to open %s -- probably not supported!' % filename) | |
| 4033 | + return RETURN_OPEN_ERROR | |
| 4034 | + elif isinstance(exc, ProcessingError): | |
| 4035 | + if options.output_mode in ('triage', 'unspecified'): | |
| 4036 | + print('%-12s %s - %s' % ('!ERROR', filename, exc.orig_exc)) | |
| 4037 | + elif options.output_mode == 'json': | |
| 4038 | + print_json(file=filename, type='error', | |
| 4039 | + error=type(exc).__name__, | |
| 4040 | + message=str(exc.orig_exc)) | |
| 4041 | + else: | |
| 4042 | + log.exception('Error processing file %s (%s)!' | |
| 4043 | + % (filename, exc.orig_exc)) | |
| 4044 | + return RETURN_PARSE_ERROR | |
| 4045 | + else: | |
| 4046 | + raise # let caller deal with this | |
| 4047 | + | |
| 4048 | + # we reach this point only if file is encrypted | |
| 4049 | + # check if this is an encrypted file in an encrypted file in an ... | |
| 4050 | + if crypto_nesting >= crypto.MAX_NESTING_DEPTH: | |
| 4051 | + raise crypto.MaxCryptoNestingReached(crypto_nesting, filename) | |
| 4052 | + | |
| 4053 | + decrypted_file = None | |
| 4054 | + try: | |
| 4055 | + log.debug('Checking encryption passwords {}'.format(options.password)) | |
| 4056 | + passwords = options.password + crypto.DEFAULT_PASSWORDS | |
| 4057 | + decrypted_file = crypto.decrypt(filename, passwords) | |
| 4058 | + if not decrypted_file: | |
| 4059 | + log.error('Decrypt failed, run with debug output to get details') | |
| 4060 | + raise crypto.WrongEncryptionPassword(filename) | |
| 4061 | + log.info('Working on decrypted file') | |
| 4062 | + return process_file(decrypted_file, data, container or filename, | |
| 4063 | + options, crypto_nesting+1) | |
| 4064 | + except Exception: | |
| 4065 | + raise | |
| 4066 | + finally: # clean up | |
| 4067 | + try: | |
| 4068 | + log.debug('Removing crypt temp file {}'.format(decrypted_file)) | |
| 4069 | + os.unlink(decrypted_file) | |
| 4070 | + except Exception: # e.g. file does not exist or is None | |
| 4071 | + pass | |
| 4072 | + # no idea what to return now | |
| 4073 | + raise Exception('Programming error -- should never have reached this!') | |
| 4074 | + | |
| 4075 | + | |
| 3502 | 4076 | def main(cmd_line_args=None): |
| 3503 | 4077 | """ |
| 3504 | 4078 | Main function, called when olevba is run from the command line |
| ... | ... | @@ -3517,52 +4091,60 @@ def main(cmd_line_args=None): |
| 3517 | 4091 | url='http://decalage.info/python/oletools', |
| 3518 | 4092 | type='MetaInformation', _json_is_first=True) |
| 3519 | 4093 | else: |
| 3520 | - print('olevba %s - http://decalage.info/python/oletools' % __version__) | |
| 4094 | + # print banner with version | |
| 4095 | + python_version = '%d.%d.%d' % sys.version_info[0:3] | |
| 4096 | + print('olevba %s on Python %s - http://decalage.info/python/oletools' % | |
| 4097 | + (__version__, python_version)) | |
| 3521 | 4098 | |
| 3522 | 4099 | logging.basicConfig(level=options.loglevel, format='%(levelname)-8s %(message)s') |
| 3523 | 4100 | # enable logging in the modules: |
| 3524 | 4101 | enable_logging() |
| 3525 | 4102 | |
| 3526 | - # Old display with number of items detected: | |
| 3527 | - # print '%-8s %-7s %-7s %-7s %-7s %-7s' % ('Type', 'Macros', 'AutoEx', 'Susp.', 'IOCs', 'HexStr') | |
| 3528 | - # print '%-8s %-7s %-7s %-7s %-7s %-7s' % ('-'*8, '-'*7, '-'*7, '-'*7, '-'*7, '-'*7) | |
| 3529 | - | |
| 3530 | 4103 | # with the option --reveal, make sure --deobf is also enabled: |
| 3531 | 4104 | if options.show_deobfuscated_code and not options.deobfuscate: |
| 3532 | - log.info('set --deobf because --reveal was set') | |
| 4105 | + log.debug('set --deobf because --reveal was set') | |
| 3533 | 4106 | options.deobfuscate = True |
| 3534 | 4107 | if options.output_mode == 'triage' and options.show_deobfuscated_code: |
| 3535 | - log.info('ignoring option --reveal in triage output mode') | |
| 4108 | + log.debug('ignoring option --reveal in triage output mode') | |
| 4109 | + | |
| 4110 | + # gather info on all files that must be processed | |
| 4111 | + # ignore directory names stored in zip files: | |
| 4112 | + all_input_info = tuple((container, filename, data) for | |
| 4113 | + container, filename, data in xglob.iter_files( | |
| 4114 | + args, recursive=options.recursive, | |
| 4115 | + zip_password=options.zip_password, | |
| 4116 | + zip_fname=options.zip_fname) | |
| 4117 | + if not (container and filename.endswith('/'))) | |
| 4118 | + | |
| 4119 | + # specify output mode if options -t, -d and -j were not specified | |
| 4120 | + if options.output_mode == 'unspecified': | |
| 4121 | + if len(all_input_info) == 1: | |
| 4122 | + options.output_mode = 'detailed' | |
| 4123 | + else: | |
| 4124 | + options.output_mode = 'triage' | |
| 3536 | 4125 | |
| 3537 | - # Column headers (do not know how many files there will be yet, so if no output_mode | |
| 3538 | - # was specified, we will print triage for first file --> need these headers) | |
| 3539 | - if options.output_mode in ('triage', 'unspecified'): | |
| 4126 | + # Column headers for triage mode | |
| 4127 | + if options.output_mode == 'triage': | |
| 3540 | 4128 | print('%-12s %-65s' % ('Flags', 'Filename')) |
| 3541 | 4129 | print('%-12s %-65s' % ('-' * 11, '-' * 65)) |
| 3542 | 4130 | |
| 3543 | 4131 | previous_container = None |
| 3544 | 4132 | count = 0 |
| 3545 | 4133 | container = filename = data = None |
| 3546 | - vba_parser = None | |
| 3547 | 4134 | return_code = RETURN_OK |
| 3548 | 4135 | try: |
| 3549 | - for container, filename, data in xglob.iter_files(args, recursive=options.recursive, | |
| 3550 | - zip_password=options.zip_password, zip_fname=options.zip_fname): | |
| 3551 | - # ignore directory names stored in zip files: | |
| 3552 | - if container and filename.endswith('/'): | |
| 3553 | - continue | |
| 3554 | - | |
| 4136 | + for container, filename, data in all_input_info: | |
| 3555 | 4137 | # handle errors from xglob |
| 3556 | 4138 | if isinstance(data, Exception): |
| 3557 | 4139 | if isinstance(data, PathNotFoundException): |
| 3558 | - if options.output_mode in ('triage', 'unspecified'): | |
| 4140 | + if options.output_mode == 'triage': | |
| 3559 | 4141 | print('%-12s %s - File not found' % ('?', filename)) |
| 3560 | 4142 | elif options.output_mode != 'json': |
| 3561 | 4143 | log.error('Given path %r does not exist!' % filename) |
| 3562 | 4144 | return_code = RETURN_FILE_NOT_FOUND if return_code == 0 \ |
| 3563 | 4145 | else RETURN_SEVERAL_ERRS |
| 3564 | 4146 | else: |
| 3565 | - if options.output_mode in ('triage', 'unspecified'): | |
| 4147 | + if options.output_mode == 'triage': | |
| 3566 | 4148 | print('%-12s %s - Failed to read from zip file %s' % ('?', filename, container)) |
| 3567 | 4149 | elif options.output_mode != 'json': |
| 3568 | 4150 | log.error('Exception opening/reading %r from zip file %r: %s' |
| ... | ... | @@ -3574,107 +4156,42 @@ def main(cmd_line_args=None): |
| 3574 | 4156 | error=type(data).__name__, message=str(data)) |
| 3575 | 4157 | continue |
| 3576 | 4158 | |
| 3577 | - try: | |
| 3578 | - # close the previous file if analyzing several: | |
| 3579 | - # (this must be done here to avoid closing the file if there is only 1, | |
| 3580 | - # to fix issue #219) | |
| 3581 | - if vba_parser is not None: | |
| 3582 | - vba_parser.close() | |
| 3583 | - # Open the file | |
| 3584 | - vba_parser = VBA_Parser_CLI(filename, data=data, container=container, | |
| 3585 | - relaxed=options.relaxed) | |
| 3586 | - | |
| 3587 | - if options.output_mode == 'detailed': | |
| 3588 | - # fully detailed output | |
| 3589 | - vba_parser.process_file(show_decoded_strings=options.show_decoded_strings, | |
| 3590 | - display_code=options.display_code, | |
| 3591 | - hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only, | |
| 3592 | - show_deobfuscated_code=options.show_deobfuscated_code, | |
| 3593 | - deobfuscate=options.deobfuscate) | |
| 3594 | - elif options.output_mode in ('triage', 'unspecified'): | |
| 3595 | - # print container name when it changes: | |
| 3596 | - if container != previous_container: | |
| 3597 | - if container is not None: | |
| 3598 | - print('\nFiles in %s:' % container) | |
| 3599 | - previous_container = container | |
| 3600 | - # summarized output for triage: | |
| 3601 | - vba_parser.process_file_triage(show_decoded_strings=options.show_decoded_strings, | |
| 3602 | - deobfuscate=options.deobfuscate) | |
| 3603 | - elif options.output_mode == 'json': | |
| 3604 | - print_json( | |
| 3605 | - vba_parser.process_file_json(show_decoded_strings=options.show_decoded_strings, | |
| 3606 | - display_code=options.display_code, | |
| 3607 | - hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only, | |
| 3608 | - show_deobfuscated_code=options.show_deobfuscated_code, | |
| 3609 | - deobfuscate=options.deobfuscate)) | |
| 3610 | - else: # (should be impossible) | |
| 3611 | - raise ValueError('unexpected output mode: "{0}"!'.format(options.output_mode)) | |
| 3612 | - count += 1 | |
| 3613 | - | |
| 3614 | - except (SubstreamOpenError, UnexpectedDataError) as exc: | |
| 3615 | - if options.output_mode in ('triage', 'unspecified'): | |
| 3616 | - print('%-12s %s - Error opening substream or uenxpected ' \ | |
| 3617 | - 'content' % ('?', filename)) | |
| 3618 | - elif options.output_mode == 'json': | |
| 3619 | - print_json(file=filename, type='error', | |
| 3620 | - error=type(exc).__name__, message=str(exc)) | |
| 3621 | - else: | |
| 3622 | - log.exception('Error opening substream or unexpected ' | |
| 3623 | - 'content in %s' % filename) | |
| 3624 | - return_code = RETURN_OPEN_ERROR if return_code == 0 \ | |
| 3625 | - else RETURN_SEVERAL_ERRS | |
| 3626 | - except FileOpenError as exc: | |
| 3627 | - if options.output_mode in ('triage', 'unspecified'): | |
| 3628 | - print('%-12s %s - File format not supported' % ('?', filename)) | |
| 3629 | - elif options.output_mode == 'json': | |
| 3630 | - print_json(file=filename, type='error', | |
| 3631 | - error=type(exc).__name__, message=str(exc)) | |
| 3632 | - else: | |
| 3633 | - log.exception('Failed to open %s -- probably not supported!' % filename) | |
| 3634 | - return_code = RETURN_OPEN_ERROR if return_code == 0 \ | |
| 3635 | - else RETURN_SEVERAL_ERRS | |
| 3636 | - except ProcessingError as exc: | |
| 3637 | - if options.output_mode in ('triage', 'unspecified'): | |
| 3638 | - print('%-12s %s - %s' % ('!ERROR', filename, exc.orig_exc)) | |
| 3639 | - elif options.output_mode == 'json': | |
| 3640 | - print_json(file=filename, type='error', | |
| 3641 | - error=type(exc).__name__, | |
| 3642 | - message=str(exc.orig_exc)) | |
| 3643 | - else: | |
| 3644 | - log.exception('Error processing file %s (%s)!' | |
| 3645 | - % (filename, exc.orig_exc)) | |
| 3646 | - return_code = RETURN_PARSE_ERROR if return_code == 0 \ | |
| 3647 | - else RETURN_SEVERAL_ERRS | |
| 3648 | - except FileIsEncryptedError as exc: | |
| 3649 | - if options.output_mode in ('triage', 'unspecified'): | |
| 3650 | - print('%-12s %s - File is encrypted' % ('!ERROR', filename)) | |
| 3651 | - elif options.output_mode == 'json': | |
| 3652 | - print_json(file=filename, type='error', | |
| 3653 | - error=type(exc).__name__, message=str(exc)) | |
| 3654 | - else: | |
| 3655 | - log.exception('File %s is encrypted!' % (filename)) | |
| 3656 | - return_code = RETURN_ENCRYPTED if return_code == 0 \ | |
| 3657 | - else RETURN_SEVERAL_ERRS | |
| 3658 | - # Here we do not close the vba_parser, because process_file may need it below. | |
| 4159 | + if options.output_mode == 'triage': | |
| 4160 | + # print container name when it changes: | |
| 4161 | + if container != previous_container: | |
| 4162 | + if container is not None: | |
| 4163 | + print('\nFiles in %s:' % container) | |
| 4164 | + previous_container = container | |
| 4165 | + | |
| 4166 | + # process the file, handling errors and encryption | |
| 4167 | + curr_return_code = process_file(filename, data, container, options) | |
| 4168 | + count += 1 | |
| 4169 | + | |
| 4170 | + # adjust overall return code | |
| 4171 | + if curr_return_code == RETURN_OK: | |
| 4172 | + continue # do not modify overall return code | |
| 4173 | + if return_code == RETURN_OK: | |
| 4174 | + return_code = curr_return_code # first error return code | |
| 4175 | + else: | |
| 4176 | + return_code = RETURN_SEVERAL_ERRS # several errors | |
| 3659 | 4177 | |
| 3660 | 4178 | if options.output_mode == 'triage': |
| 3661 | 4179 | print('\n(Flags: OpX=OpenXML, XML=Word2003XML, FlX=FlatOPC XML, MHT=MHTML, TXT=Text, M=Macros, ' \ |
| 3662 | 4180 | 'A=Auto-executable, S=Suspicious keywords, I=IOCs, H=Hex strings, ' \ |
| 3663 | 4181 | 'B=Base64 strings, D=Dridex strings, V=VBA strings, ?=Unknown)\n') |
| 3664 | 4182 | |
| 3665 | - if count == 1 and options.output_mode == 'unspecified': | |
| 3666 | - # if options -t, -d and -j were not specified and it's a single file, print details: | |
| 3667 | - vba_parser.process_file(show_decoded_strings=options.show_decoded_strings, | |
| 3668 | - display_code=options.display_code, | |
| 3669 | - hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only, | |
| 3670 | - show_deobfuscated_code=options.show_deobfuscated_code, | |
| 3671 | - deobfuscate=options.deobfuscate) | |
| 3672 | - | |
| 3673 | 4183 | if options.output_mode == 'json': |
| 3674 | 4184 | # print last json entry (a last one without a comma) and closing ] |
| 3675 | 4185 | print_json(type='MetaInformation', return_code=return_code, |
| 3676 | 4186 | n_processed=count, _json_is_last=True) |
| 3677 | 4187 | |
| 4188 | + except crypto.CryptoErrorBase as exc: | |
| 4189 | + log.exception('Problems with encryption in main: {}'.format(exc), | |
| 4190 | + exc_info=True) | |
| 4191 | + if return_code == RETURN_OK: | |
| 4192 | + return_code = RETURN_ENCRYPTED | |
| 4193 | + else: | |
| 4194 | + return_code = RETURN_SEVERAL_ERRS | |
| 3678 | 4195 | except Exception as exc: |
| 3679 | 4196 | # some unexpected error, maybe some of the types caught in except clauses |
| 3680 | 4197 | # above were not sufficient. This is very bad, so log complete trace at exception level |
| ... | ... |
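
The reworked `main()` above folds each per-file status from `process_file()` into one overall exit code: a success leaves it unchanged, the first failure sets it, and any further failure turns it into `RETURN_SEVERAL_ERRS`. The sketch below isolates that rule so it can be checked on its own. The helper names `combine_return_codes` and `overall_status` are hypothetical and not part of olevba, and the numeric values other than `RETURN_OK = 0` are assumptions, since the diff does not show the constants' definitions.

```python
# Placeholder exit codes: RETURN_OK == 0 matches the `return_code == 0` checks
# in the removed lines; the other values are assumptions for this sketch.
RETURN_OK = 0
RETURN_OPEN_ERROR = 5
RETURN_PARSE_ERROR = 6
RETURN_SEVERAL_ERRS = 7


def combine_return_codes(overall, current):
    """Hypothetical helper: fold one per-file status into the overall status."""
    if current == RETURN_OK:
        return overall              # a success never changes the overall code
    if overall == RETURN_OK:
        return current              # first failure: keep its specific code
    return RETURN_SEVERAL_ERRS      # any later failure: report "several errors"


def overall_status(per_file_codes):
    """Reduce a sequence of per-file codes to a single exit code."""
    overall = RETURN_OK
    for code in per_file_codes:
        overall = combine_return_codes(overall, code)
    return overall
```

Returning the combined value from one function means every branch at the call site is a plain assignment, for example `return_code = combine_return_codes(return_code, curr_return_code)`. Note that two failures of the same kind also collapse to `RETURN_SEVERAL_ERRS`, as in the loop in `main()`.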
oletools/olevba3.py
| 1 | 1 | #!/usr/bin/env python |
| 2 | -""" | |
| 3 | -olevba3.py | |
| 4 | 2 | |
| 5 | -olevba is a script to parse OLE and OpenXML files such as MS Office documents | |
| 6 | -(e.g. Word, Excel), to extract VBA Macro code in clear text, deobfuscate | |
| 7 | -and analyze malicious macros. | |
| 3 | +# olevba3 is a stub that redirects to olevba.py, for backwards compatibility | |
| 8 | 4 | |
| 9 | -olevba3 is the version of olevba that runs on Python 3.x. | |
| 5 | +import sys, os, warnings | |
| 10 | 6 | |
| 11 | -Supported formats: | |
| 12 | -- Word 97-2003 (.doc, .dot), Word 2007+ (.docm, .dotm) | |
| 13 | -- Excel 97-2003 (.xls), Excel 2007+ (.xlsm, .xlsb) | |
| 14 | -- PowerPoint 97-2003 (.ppt), PowerPoint 2007+ (.pptm, .ppsm) | |
| 15 | -- Word/PowerPoint 2007+ XML (aka Flat OPC) | |
| 16 | -- Word 2003 XML (.xml) | |
| 17 | -- Word/Excel Single File Web Page / MHTML (.mht) | |
| 18 | -- Publisher (.pub) | |
| 19 | -- raises an error if run with files encrypted using MS Crypto API RC4 | |
| 20 | - | |
| 21 | -Author: Philippe Lagadec - http://www.decalage.info | |
| 22 | -License: BSD, see source code or documentation | |
| 23 | - | |
| 24 | -olevba is part of the python-oletools package: | |
| 25 | -http://www.decalage.info/python/oletools | |
| 26 | - | |
| 27 | -olevba is based on source code from officeparser by John William Davison | |
| 28 | -https://github.com/unixfreak0037/officeparser | |
| 29 | -""" | |
| 30 | - | |
| 31 | -# === LICENSE ================================================================== | |
| 32 | - | |
| 33 | -# olevba is copyright (c) 2014-2018 Philippe Lagadec (http://www.decalage.info) | |
| 34 | -# All rights reserved. | |
| 35 | -# | |
| 36 | -# Redistribution and use in source and binary forms, with or without modification, | |
| 37 | -# are permitted provided that the following conditions are met: | |
| 38 | -# | |
| 39 | -# * Redistributions of source code must retain the above copyright notice, this | |
| 40 | -# list of conditions and the following disclaimer. | |
| 41 | -# * Redistributions in binary form must reproduce the above copyright notice, | |
| 42 | -# this list of conditions and the following disclaimer in the documentation | |
| 43 | -# and/or other materials provided with the distribution. | |
| 44 | -# | |
| 45 | -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND | |
| 46 | -# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | |
| 47 | -# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | |
| 48 | -# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE | |
| 49 | -# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
| 50 | -# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR | |
| 51 | -# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER | |
| 52 | -# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, | |
| 53 | -# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | |
| 54 | -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | |
| 55 | - | |
| 56 | - | |
| 57 | -# olevba contains modified source code from the officeparser project, published | |
| 58 | -# under the following MIT License (MIT): | |
| 59 | -# | |
| 60 | -# officeparser is copyright (c) 2014 John William Davison | |
| 61 | -# | |
| 62 | -# Permission is hereby granted, free of charge, to any person obtaining a copy | |
| 63 | -# of this software and associated documentation files (the "Software"), to deal | |
| 64 | -# in the Software without restriction, including without limitation the rights | |
| 65 | -# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | |
| 66 | -# copies of the Software, and to permit persons to whom the Software is | |
| 67 | -# furnished to do so, subject to the following conditions: | |
| 68 | -# | |
| 69 | -# The above copyright notice and this permission notice shall be included in all | |
| 70 | -# copies or substantial portions of the Software. | |
| 71 | -# | |
| 72 | -# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | |
| 73 | -# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | |
| 74 | -# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | |
| 75 | -# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | |
| 76 | -# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | |
| 77 | -# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE | |
| 78 | -# SOFTWARE. | |
| 79 | - | |
| 80 | -from __future__ import print_function | |
| 81 | - | |
| 82 | - | |
| 83 | -#------------------------------------------------------------------------------ | |
| 84 | -# CHANGELOG: | |
| 85 | -# 2014-08-05 v0.01 PL: - first version based on officeparser code | |
| 86 | -# 2014-08-14 v0.02 PL: - fixed bugs in code, added license from officeparser | |
| 87 | -# 2014-08-15 PL: - fixed incorrect value check in projecthelpfilepath Record | |
| 88 | -# 2014-08-15 v0.03 PL: - refactored extract_macros to support OpenXML formats | |
| 89 | -# and to find the VBA project root anywhere in the file | |
| 90 | -# 2014-11-29 v0.04 PL: - use olefile instead of OleFileIO_PL | |
| 91 | -# 2014-12-05 v0.05 PL: - refactored most functions into a class, new API | |
| 92 | -# - added detect_vba_macros | |
| 93 | -# 2014-12-10 v0.06 PL: - hide first lines with VB attributes | |
| 94 | -# - detect auto-executable macros | |
| 95 | -# - ignore empty macros | |
| 96 | -# 2014-12-14 v0.07 PL: - detect_autoexec() is now case-insensitive | |
| 97 | -# 2014-12-15 v0.08 PL: - improved display for empty macros | |
| 98 | -# - added pattern extraction | |
| 99 | -# 2014-12-25 v0.09 PL: - added suspicious keywords detection | |
| 100 | -# 2014-12-27 v0.10 PL: - added OptionParser, main and process_file | |
| 101 | -# - uses xglob to scan several files with wildcards | |
| 102 | -# - option -r to recurse subdirectories | |
| 103 | -# - option -z to scan files in password-protected zips | |
| 104 | -# 2015-01-02 v0.11 PL: - improved filter_vba to detect colons | |
| 105 | -# 2015-01-03 v0.12 PL: - fixed detect_patterns to detect all patterns | |
| 106 | -# - process_file: improved display, shows container file | |
| 107 | -# - improved list of executable file extensions | |
| 108 | -# 2015-01-04 v0.13 PL: - added several suspicious keywords, improved display | |
| 109 | -# 2015-01-08 v0.14 PL: - added hex strings detection and decoding | |
| 110 | -# - fixed issue #2, decoding VBA stream names using | |
| 111 | -# specified codepage and unicode stream names | |
| 112 | -# 2015-01-11 v0.15 PL: - added new triage mode, options -t and -d | |
| 113 | -# 2015-01-16 v0.16 PL: - fix for issue #3 (exception when module name="text") | |
| 114 | -# - added several suspicious keywords | |
| 115 | -# - added option -i to analyze VBA source code directly | |
| 116 | -# 2015-01-17 v0.17 PL: - removed .com from the list of executable extensions | |
| 117 | -# - added scan_vba to run all detection algorithms | |
| 118 | -# - decoded hex strings are now also scanned + reversed | |
| 119 | -# 2015-01-23 v0.18 PL: - fixed issue #3, case-insensitive search in code_modules | |
| 120 | -# 2015-01-24 v0.19 PL: - improved the detection of IOCs obfuscated with hex | |
| 121 | -# strings and StrReverse | |
| 122 | -# 2015-01-26 v0.20 PL: - added option --hex to show all hex strings decoded | |
| 123 | -# 2015-01-29 v0.21 PL: - added Dridex obfuscation decoding | |
| 124 | -# - improved display, shows obfuscation name | |
| 125 | -# 2015-02-01 v0.22 PL: - fixed issue #4: regex for URL, e-mail and exe filename | |
| 126 | -# - added Base64 obfuscation decoding (contribution from | |
| 127 | -# @JamesHabben) | |
| 128 | -# 2015-02-03 v0.23 PL: - triage now uses VBA_Scanner results, shows Base64 and | |
| 129 | -# Dridex strings | |
| 130 | -# - exception handling in detect_base64_strings | |
| 131 | -# 2015-02-07 v0.24 PL: - renamed option --hex to --decode, fixed display | |
| 132 | -# - display exceptions with stack trace | |
| 133 | -# - added several suspicious keywords | |
| 134 | -# - improved Base64 detection and decoding | |
| 135 | -# - fixed triage mode not to scan attrib lines | |
| 136 | -# 2015-03-04 v0.25 PL: - added support for Word 2003 XML | |
| 137 | -# 2015-03-22 v0.26 PL: - added suspicious keywords for sandboxing and | |
| 138 | -# virtualisation detection | |
| 139 | -# 2015-05-06 v0.27 PL: - added support for MHTML files with VBA macros | |
| 140 | -# (issue #10 reported by Greg from SpamStopsHere) | |
| 141 | -# 2015-05-24 v0.28 PL: - improved support for MHTML files with modified header | |
| 142 | -# (issue #11 reported by Thomas Chopitea) | |
| 143 | -# 2015-05-26 v0.29 PL: - improved MSO files parsing, taking into account | |
| 144 | -# various data offsets (issue #12) | |
| 145 | -# - improved detection of MSO files, avoiding incorrect | |
| 146 | -# parsing errors (issue #7) | |
| 147 | -# 2015-05-29 v0.30 PL: - added suspicious keywords suggested by @ozhermit, | |
| 148 | -# Davy Douhine (issue #9), issue #13 | |
| 149 | -# 2015-06-16 v0.31 PL: - added generic VBA expression deobfuscation (chr,asc,etc) | |
| 150 | -# 2015-06-19 PL: - added options -a, -c, --each, --attr | |
| 151 | -# 2015-06-21 v0.32 PL: - always display decoded strings which are printable | |
| 152 | -# - fix VBA_Scanner.scan to return raw strings, not repr() | |
| 153 | -# 2015-07-09 v0.40 PL: - removed usage of sys.stderr which causes issues | |
| 154 | -# 2015-07-12 PL: - added Hex function decoding to VBA Parser | |
| 155 | -# 2015-07-13 PL: - added Base64 function decoding to VBA Parser | |
| 156 | -# 2015-09-06 PL: - improved VBA_Parser, refactored the main functions | |
| 157 | -# 2015-09-13 PL: - moved main functions to a class VBA_Parser_CLI | |
| 158 | -# - fixed issue when analysis was done twice | |
| 159 | -# 2015-09-15 PL: - remove duplicate IOCs from results | |
| 160 | -# 2015-09-16 PL: - join long VBA lines ending with underscore before scan | |
| 161 | -# - disabled unused option --each | |
| 162 | -# 2015-09-22 v0.41 PL: - added new option --reveal | |
| 163 | -# - added suspicious strings for PowerShell.exe options | |
| 164 | -# 2015-10-09 v0.42 PL: - VBA_Parser: split each format into a separate method | |
| 165 | -# 2015-10-10 PL: - added support for text files with VBA source code | |
| 166 | -# 2015-11-17 PL: - fixed bug with --decode option | |
| 167 | -# 2015-12-16 PL: - fixed bug in main (no options input anymore) | |
| 168 | -# - improved logging, added -l option | |
| 169 | -# 2016-01-31 PL: - fixed issue #31 in VBA_Parser.open_mht | |
| 170 | -# - fixed issue #32 by monkeypatching email.feedparser | |
| 171 | -# 2016-02-07 PL: - KeyboardInterrupt is now raised properly | |
| 172 | -# 2016-02-20 v0.43 PL: - fixed issue #34 in the VBA parser and vba_chr | |
| 173 | -# 2016-02-29 PL: - added Workbook_Activate to suspicious keywords | |
| 174 | -# 2016-03-08 v0.44 PL: - added VBA Form strings extraction and analysis | |
| 175 | -# 2016-03-04 v0.45 CH: - added JSON output (by Christian Herdtweck) | |
| 176 | -# 2016-03-16 CH: - added option --no-deobfuscate (temporary) | |
| 177 | -# 2016-04-19 v0.46 PL: - new option --deobf instead of --no-deobfuscate | |
| 178 | -# - updated suspicious keywords | |
| 179 | -# 2016-05-04 v0.47 PL: - look for VBA code in any stream including orphans | |
| 180 | -# 2016-04-28 CH: - return an exit code depending on the results | |
| 181 | -# - improved error and exception handling | |
| 182 | -# - improved JSON output | |
| 183 | -# 2016-05-12 CH: - added support for PowerPoint 97-2003 files | |
| 184 | -# 2016-06-06 CH: - improved handling of unicode VBA module names | |
| 185 | -# 2016-06-07 CH: - added option --relaxed, stricter parsing by default | |
| 186 | -# 2016-06-12 v0.50 PL: - fixed small bugs in VBA parsing code | |
| 187 | -# 2016-07-01 PL: - fixed issue #58 with format() to support Python 2.6 | |
| 188 | -# 2016-07-29 CH: - fixed several bugs including #73 (Mac Roman encoding) | |
| 189 | -# 2016-08-31 PL: - added autoexec keyword InkPicture_Painted | |
| 190 | -# - detect_autoexec now returns the exact keyword found | |
| 191 | -# 2016-09-05 PL: - added autoexec keywords for MS Publisher (.pub) | |
| 192 | -# 2016-09-06 PL: - fixed issue #20, is_zipfile on Python 2.6 | |
| 193 | -# 2016-09-12 PL: - enabled packrat to improve pyparsing performance | |
| 194 | -# 2016-10-25 PL: - fixed raise and print statements for Python 3 | |
| 195 | -# 2016-11-03 v0.51 PL: - added EnumDateFormats and EnumSystemLanguageGroupsW | |
| 196 | -# 2017-02-07 PL: - temporary fix for issue #132 | |
| 197 | -# - added keywords for Mac-specific macros (issue #130) | |
| 198 | -# 2017-03-08 PL: - fixed absolute imports | |
| 199 | -# 2017-03-16 PL: - fixed issues #148 and #149 for option --reveal | |
| 200 | -# 2017-05-19 PL: - added enable_logging to fix issue #154 | |
| 201 | -# 2017-05-31 c1fe: - PR #135 fixing issue #132 for some Mac files | |
| 202 | -# 2017-06-08 PL: - fixed issue #122 Chr() with negative numbers | |
| 203 | -# 2017-06-15 PL: - deobfuscation line by line to handle large files | |
| 204 | -# 2017-07-11 v0.52 PL: - raise exception instead of sys.exit (issue #180) | |
| 205 | -# 2018-03-19 PL: - removed pyparsing from the thirdparty subfolder | |
| 206 | -# 2018-05-13 v0.53 PL: - added support for Word/PowerPoint 2007+ XML (FlatOPC) | |
| 207 | -# (issue #283) | |
| 208 | -# 2018-06-11 v0.53.1 MHW: - fixed #320: chr instead of unichr on python 3 | |
| 209 | -# 2018-06-12 MHW: - fixed #322: import reduce from functools | |
| 210 | -# 2018-09-11 v0.54 PL: - olefile is now a dependency | |
| 211 | -# 2018-10-25 CH: - detect encryption and raise error if detected | |
| 212 | - | |
| 213 | -__version__ = '0.54dev4' | |
| 214 | - | |
| 215 | -#------------------------------------------------------------------------------ | |
| 216 | -# TODO: | |
| 217 | -# + setup logging (common with other oletools) | |
| 218 | -# + add xor bruteforcing like bbharvest | |
| 219 | -# + options -a and -c should imply -d | |
| 220 | - | |
| 221 | -# TODO later: | |
| 222 | -# + performance improvement: instead of searching each keyword separately, | |
| 223 | -# first split vba code into a list of words (per line), then check each | |
| 224 | -# word against a dict. (or put vba words into a set/dict?) | |
| 225 | -# + for regex, maybe combine them into a single re with named groups? | |
| 226 | -# + add Yara support, include sample rules? plugins like balbuzard? | |
| 227 | -# + add balbuzard support | |
| 228 | -# + output to file (replace print by file.write, sys.stdout by default) | |
| 229 | -# + look for VBA in embedded documents (e.g. Excel in Word) | |
| 230 | -# + support SRP streams (see Lenny's article + links and sample) | |
| 231 | -# - python 3.x support | |
| 232 | -# - check VBA macros in Visio, Access, Project, etc | |
| 233 | -# - extract_macros: convert to a class, split long function into smaller methods | |
| 234 | -# - extract_macros: read bytes from stream file objects instead of strings | |
| 235 | -# - extract_macros: use combined struct.unpack instead of many calls | |
| 236 | -# - all except clauses should target specific exceptions | |
| 237 | - | |
| 238 | -#------------------------------------------------------------------------------ | |
| 239 | -# REFERENCES: | |
| 240 | -# - [MS-OVBA]: Microsoft Office VBA File Format Structure | |
| 241 | -# http://msdn.microsoft.com/en-us/library/office/cc313094%28v=office.12%29.aspx | |
| 242 | -# - officeparser: https://github.com/unixfreak0037/officeparser | |
| 243 | - | |
| 244 | - | |
| 245 | -#--- IMPORTS ------------------------------------------------------------------ | |
| 246 | - | |
| 247 | -import sys | |
| 248 | -import os | |
| 249 | -import logging | |
| 250 | -import struct | |
| 251 | | -from io import StringIO, BytesIO | |
| 252 | -import math | |
| 253 | -import zipfile | |
| 254 | -import re | |
| 255 | -import optparse | |
| 256 | -import binascii | |
| 257 | -import base64 | |
| 258 | -import zlib | |
| 259 | -import email # for MHTML parsing | |
| 260 | -import string # for printable | |
| 261 | -import json # for json output mode (argument --json) | |
| 262 | -from functools import reduce | |
| 263 | - | |
| 264 | -# import lxml or ElementTree for XML parsing: | |
| 265 | -try: | |
| 266 | - # lxml: best performance for XML processing | |
| 267 | - import lxml.etree as ET | |
| 268 | -except ImportError: | |
| 269 | - try: | |
| 270 | - # Python 2.5+: batteries included | |
| 271 | - import xml.etree.cElementTree as ET | |
| 272 | - except ImportError: | |
| 273 | - try: | |
| 274 | - # Python <2.5: standalone ElementTree install | |
| 275 | - import elementtree.cElementTree as ET | |
| 276 | - except ImportError: | |
| 277 | - raise ImportError("lxml or ElementTree are not installed, " \ | |
| 278 | - + "see http://codespeak.net/lxml " \ | |
| 279 | - + "or http://effbot.org/zone/element-index.htm") | |
| 7 | +warnings.warn('olevba3 is deprecated, olevba should be used instead.', DeprecationWarning) | |
| 280 | 8 | |
| 281 | 9 | # IMPORTANT: it should be possible to run oletools directly as scripts |
| 282 | 10 | # in any directory without installing them with pip or setup.py. |
| ... | ... | @@ -284,3374 +12,13 @@ except ImportError: |
| 284 | 12 | # And to enable Python 2+3 compatibility, we need to use absolute imports, |
| 285 | 13 | # so we add the oletools parent folder to sys.path (absolute+normalized path): |
| 286 | 14 | _thismodule_dir = os.path.normpath(os.path.abspath(os.path.dirname(__file__))) |
| 287 | -# print('_thismodule_dir = %r' % _thismodule_dir) | |
| 288 | 15 | _parent_dir = os.path.normpath(os.path.join(_thismodule_dir, '..')) |
| 289 | | -# print('_parent_dir = %r' % _parent_dir) | |
| 290 | -if not _parent_dir in sys.path: | |
| 16 | +if _parent_dir not in sys.path: | |
| 291 | 17 | sys.path.insert(0, _parent_dir) |
| 292 | 18 | |
| 293 | -import olefile | |
| 294 | -from oletools.thirdparty.prettytable import prettytable | |
| 295 | -from oletools.thirdparty.xglob import xglob, PathNotFoundException | |
| 296 | -from pyparsing import \ | |
| 297 | - CaselessKeyword, CaselessLiteral, Combine, Forward, Literal, \ | |
| 298 | - Optional, QuotedString,Regex, Suppress, Word, WordStart, \ | |
| 299 | - alphanums, alphas, hexnums,nums, opAssoc, srange, \ | |
| 300 | - infixNotation, ParserElement | |
| 301 | -import oletools.ppt_parser as ppt_parser | |
| 302 | -from oletools import rtfobj | |
| 303 | -from oletools import oleid | |
| 304 | -from oletools.common.errors import FileIsEncryptedError | |
| 305 | - | |
| 306 | -# monkeypatch email to fix issue #32: | |
| 307 | -# allow header lines without ":" | |
| 308 | -import email.feedparser | |
| 309 | -email.feedparser.headerRE = re.compile(r'^(From |[\041-\071\073-\176]{1,}:?|[\t ])') | |
| 310 | - | |
| 311 | -# === PYTHON 2+3 SUPPORT ====================================================== | |
| 312 | - | |
| 313 | -if sys.version_info[0] <= 2: | |
| 314 | - # Python 2.x | |
| 315 | - if sys.version_info[1] <= 6: | |
| 316 | - # Python 2.6 | |
| 317 | - # use is_zipfile backported from Python 2.7: | |
| 318 | - from thirdparty.zipfile27 import is_zipfile | |
| 319 | - else: | |
| 320 | - # Python 2.7 | |
| 321 | - from zipfile import is_zipfile | |
| 322 | -else: | |
| 323 | - # Python 3.x+ | |
| 324 | - from zipfile import is_zipfile | |
| 325 | - # xrange is now called range: | |
| 326 | - xrange = range | |
| 327 | - | |
| 328 | - | |
| 329 | -# === PYTHON 3.0 - 3.4 SUPPORT ====================================================== | |
| 330 | - | |
| 331 | -# From https://gist.github.com/ynkdir/867347/c5e188a4886bc2dd71876c7e069a7b00b6c16c61 | |
| 332 | - | |
| 333 | -if sys.version_info >= (3, 0) and sys.version_info < (3, 5): | |
| 334 | - import codecs | |
| 335 | - | |
| 336 | - _backslashreplace_errors = codecs.lookup_error("backslashreplace") | |
| 337 | - | |
| 338 | - def backslashreplace_errors(exc): | |
| 339 | - if isinstance(exc, UnicodeDecodeError): | |
| 340 | - u = "".join("\\x{0:02x}".format(c) for c in exc.object[exc.start:exc.end]) | |
| 341 | - return (u, exc.end) | |
| 342 | - return _backslashreplace_errors(exc) | |
| 343 | - | |
| 344 | - codecs.register_error("backslashreplace", backslashreplace_errors) | |
| 345 | - | |
| 346 | - | |
| 347 | -# === LOGGING ================================================================= | |
| 348 | - | |
| 349 | -class NullHandler(logging.Handler): | |
| 350 | - """ | |
| 351 | - Log Handler without output, to avoid printing messages if logging is not | |
| 352 | - configured by the main application. | |
| 353 | - Python 2.7 has logging.NullHandler, but this is necessary for 2.6: | |
| 354 | - see https://docs.python.org/2.6/library/logging.html#configuring-logging-for-a-library | |
| 355 | - """ | |
| 356 | - def emit(self, record): | |
| 357 | - pass | |
| 358 | - | |
| 359 | -def get_logger(name, level=logging.CRITICAL+1): | |
| 360 | - """ | |
| 361 | - Create a suitable logger object for this module. | |
| 362 | - The goal is not to change settings of the root logger, to avoid getting | |
| 363 | - other modules' logs on the screen. | |
| 364 | - If a logger exists with same name, reuse it. (Else it would have duplicate | |
| 365 | - handlers and messages would be doubled.) | |
| 366 | - The level is set to CRITICAL+1 by default, to avoid any logging. | |
| 367 | - """ | |
| 368 | - # First, test if there is already a logger with the same name, else it | |
| 369 | - # will generate duplicate messages (due to duplicate handlers): | |
| 370 | - if name in logging.Logger.manager.loggerDict: | |
| 371 | - #NOTE: another less intrusive but more "hackish" solution would be to | |
| 372 | - # use getLogger then test if its effective level is not default. | |
| 373 | - logger = logging.getLogger(name) | |
| 374 | - # make sure level is OK: | |
| 375 | - logger.setLevel(level) | |
| 376 | - return logger | |
| 377 | - # get a new logger: | |
| 378 | - logger = logging.getLogger(name) | |
| 379 | - # only add a NullHandler for this logger, it is up to the application | |
| 380 | - # to configure its own logging: | |
| 381 | - logger.addHandler(NullHandler()) | |
| 382 | - logger.setLevel(level) | |
| 383 | - return logger | |
| 384 | - | |
| 385 | -# a global logger object used for debugging: | |
| 386 | -log = get_logger('olevba') | |
| 387 | - | |
| 388 | - | |
| 389 | -def enable_logging(): | |
| 390 | - """ | |
| 391 | - Enable logging for this module (disabled by default). | |
| 392 | - This will set the module-specific logger level to NOTSET, which | |
| 393 | - means the main application controls the actual logging level. | |
| 394 | - """ | |
| 395 | - log.setLevel(logging.NOTSET) | |
| 396 | - # Also enable logging in the ppt_parser module: | |
| 397 | - ppt_parser.enable_logging() | |
| 398 | - | |
| 399 | - | |
| 400 | - | |
| 401 | -#=== EXCEPTIONS ============================================================== | |
| 402 | - | |
| 403 | -class OlevbaBaseException(Exception): | |
| 404 | - """ Base class for exceptions produced here for simpler except clauses """ | |
| 405 | - def __init__(self, msg, filename=None, orig_exc=None, **kwargs): | |
| 406 | - if orig_exc: | |
| 407 | - super(OlevbaBaseException, self).__init__(msg + | |
| 408 | - ' ({0})'.format(orig_exc), | |
| 409 | - **kwargs) | |
| 410 | - else: | |
| 411 | - super(OlevbaBaseException, self).__init__(msg, **kwargs) | |
| 412 | - self.msg = msg | |
| 413 | - self.filename = filename | |
| 414 | - self.orig_exc = orig_exc | |
| 415 | - | |
| 416 | - | |
| 417 | -class FileOpenError(OlevbaBaseException): | |
| 418 | - """ raised by VBA_Parser constructor if all open_... attempts failed | |
| 419 | - | |
| 420 | - probably means the file type is not supported | |
| 421 | - """ | |
| 422 | - | |
| 423 | - def __init__(self, filename, orig_exc=None): | |
| 424 | - super(FileOpenError, self).__init__( | |
| 425 | - 'Failed to open file %s' % filename, filename, orig_exc) | |
| 426 | - | |
| 427 | - | |
| 428 | -class ProcessingError(OlevbaBaseException): | |
| 429 | - """ raised by VBA_Parser.process_file* functions """ | |
| 430 | - | |
| 431 | - def __init__(self, filename, orig_exc): | |
| 432 | - super(ProcessingError, self).__init__( | |
| 433 | - 'Error processing file %s' % filename, filename, orig_exc) | |
| 434 | - | |
| 435 | - | |
| 436 | -class MsoExtractionError(RuntimeError, OlevbaBaseException): | |
| 437 | - """ raised by mso_file_extract if parsing MSO/ActiveMIME data failed """ | |
| 438 | - | |
| 439 | - def __init__(self, msg): | |
| 440 | - MsoExtractionError.__init__(self, msg) | |
| 441 | - OlevbaBaseException.__init__(self, msg) | |
| 442 | - | |
| 443 | - | |
| 444 | -class SubstreamOpenError(FileOpenError): | |
| 445 | - """ special kind of FileOpenError: file is a substream of original file """ | |
| 446 | - | |
| 447 | - def __init__(self, filename, subfilename, orig_exc=None): | |
| 448 | - super(SubstreamOpenError, self).__init__( | |
| 449 | - str(filename) + '/' + str(subfilename), orig_exc) | |
| 450 | - self.filename = filename # overwrite setting in OlevbaBaseException | |
| 451 | - self.subfilename = subfilename | |
| 452 | - | |
| 453 | - | |
| 454 | -class UnexpectedDataError(OlevbaBaseException): | |
| 455 | - """ raised when parsing is strict (=not relaxed) and data is unexpected """ | |
| 456 | - | |
| 457 | - def __init__(self, stream_path, variable, expected, value): | |
| 458 | - if isinstance(expected, int): | |
| 459 | - es = '{0:04X}'.format(expected) | |
| 460 | - elif isinstance(expected, tuple): | |
| 461 | - es = ','.join('{0:04X}'.format(e) for e in expected) | |
| 462 | - es = '({0})'.format(es) | |
| 463 | - else: | |
| 464 | - raise ValueError('Unknown type encountered: {0}'.format(type(expected))) | |
| 465 | - super(UnexpectedDataError, self).__init__( | |
| 466 | - 'Unexpected value in {0} for variable {1}: ' | |
| 467 | - 'expected {2} but found {3:04X}!' | |
| 468 | - .format(stream_path, variable, es, value)) | |
| 469 | - self.stream_path = stream_path | |
| 470 | - self.variable = variable | |
| 471 | - self.expected = expected | |
| 472 | - self.value = value | |
| 473 | - | |
| 474 | -#--- CONSTANTS ---------------------------------------------------------------- | |
| 475 | - | |
| 476 | -# return codes | |
| 477 | -RETURN_OK = 0 | |
| 478 | -RETURN_WARNINGS = 1 # (reserved, not used yet) | |
| 479 | -RETURN_WRONG_ARGS = 2 # (fixed, built into optparse) | |
| 480 | -RETURN_FILE_NOT_FOUND = 3 | |
| 481 | -RETURN_XGLOB_ERR = 4 | |
| 482 | -RETURN_OPEN_ERROR = 5 | |
| 483 | -RETURN_PARSE_ERROR = 6 | |
| 484 | -RETURN_SEVERAL_ERRS = 7 | |
| 485 | -RETURN_UNEXPECTED = 8 | |
| 486 | -RETURN_ENCRYPTED = 9 | |
| 487 | - | |
| 488 | -# MAC codepages (from http://stackoverflow.com/questions/1592925/decoding-mac-os-text-in-python) | |
| 489 | -MAC_CODEPAGES = { | |
| 490 | - 10000: 'mac-roman', | |
| 491 | - 10001: 'shiftjis', # not found: 'mac-shift-jis', | |
| 492 | - 10003: 'ascii', # nothing appropriate found: 'mac-hangul', | |
| 493 | - 10008: 'gb2321', # not found: 'mac-gb2312', | |
| 494 | - 10002: 'big5', # not found: 'mac-big5', | |
| 495 | - 10005: 'hebrew', # not found: 'mac-hebrew', | |
| 496 | - 10004: 'mac-arabic', | |
| 497 | - 10006: 'mac-greek', | |
| 498 | - 10081: 'mac-turkish', | |
| 499 | - 10021: 'thai', # not found: mac-thai', | |
| 500 | - 10029: 'maccentraleurope', # not found: 'mac-east europe', | |
| 501 | - 10007: 'ascii', # nothing appropriate found: 'mac-russian', | |
| 502 | -} | |
| 503 | - | |
| 504 | -# URL and message to report issues: | |
| 505 | -URL_OLEVBA_ISSUES = 'https://github.com/decalage2/oletools/issues' | |
| 506 | -MSG_OLEVBA_ISSUES = 'Please report this issue on %s' % URL_OLEVBA_ISSUES | |
| 507 | - | |
| 508 | -# Container types: | |
| 509 | -TYPE_OLE = 'OLE' | |
| 510 | -TYPE_OpenXML = 'OpenXML' | |
| 511 | -TYPE_FlatOPC_XML = 'FlatOPC_XML' | |
| 512 | -TYPE_Word2003_XML = 'Word2003_XML' | |
| 513 | -TYPE_MHTML = 'MHTML' | |
| 514 | -TYPE_TEXT = 'Text' | |
| 515 | -TYPE_PPT = 'PPT' | |
| 516 | - | |
| 517 | -# short tag to display file types in triage mode: | |
| 518 | -TYPE2TAG = { | |
| 519 | - TYPE_OLE: 'OLE:', | |
| 520 | - TYPE_OpenXML: 'OpX:', | |
| 521 | - TYPE_FlatOPC_XML: 'FlX:', | |
| 522 | - TYPE_Word2003_XML: 'XML:', | |
| 523 | - TYPE_MHTML: 'MHT:', | |
| 524 | - TYPE_TEXT: 'TXT:', | |
| 525 | - TYPE_PPT: 'PPT', | |
| 526 | -} | |
| 527 | - | |
| 528 | - | |
| 529 | -# MSO files ActiveMime header magic | |
| 530 | -MSO_ACTIVEMIME_HEADER = b'ActiveMime' | |
| 531 | - | |
| 532 | -MODULE_EXTENSION = "bas" | |
| 533 | -CLASS_EXTENSION = "cls" | |
| 534 | -FORM_EXTENSION = "frm" | |
| 535 | - | |
| 536 | -# Namespaces and tags for Word2003 XML parsing: | |
| 537 | -NS_W = '{http://schemas.microsoft.com/office/word/2003/wordml}' | |
| 538 | -# the tag <w:binData w:name="editdata.mso"> contains the VBA macro code: | |
| 539 | -TAG_BINDATA = NS_W + 'binData' | |
| 540 | -ATTR_NAME = NS_W + 'name' | |
| 541 | - | |
| 542 | -# Namespaces and tags for Word/PowerPoint 2007+ XML parsing: | |
| 543 | -# root: <pkg:package xmlns:pkg="http://schemas.microsoft.com/office/2006/xmlPackage"> | |
| 544 | -NS_XMLPACKAGE = '{http://schemas.microsoft.com/office/2006/xmlPackage}' | |
| 545 | -TAG_PACKAGE = NS_XMLPACKAGE + 'package' | |
| 546 | -# the tag <pkg:part> includes <pkg:binaryData> that contains the VBA macro code in Base64: | |
| 547 | -# <pkg:part pkg:name="/word/vbaProject.bin" pkg:contentType="application/vnd.ms-office.vbaProject"><pkg:binaryData> | |
| 548 | -TAG_PKGPART = NS_XMLPACKAGE + 'part' | |
| 549 | -ATTR_PKG_NAME = NS_XMLPACKAGE + 'name' | |
| 550 | -ATTR_PKG_CONTENTTYPE = NS_XMLPACKAGE + 'contentType' | |
| 551 | -CTYPE_VBAPROJECT = "application/vnd.ms-office.vbaProject" | |
| 552 | -TAG_PKGBINDATA = NS_XMLPACKAGE + 'binaryData' | |
| 553 | - | |
| 554 | -# Keywords to detect auto-executable macros | |
| 555 | -AUTOEXEC_KEYWORDS = { | |
| 556 | - # MS Word: | |
| 557 | - 'Runs when the Word document is opened': | |
| 558 | - ('AutoExec', 'AutoOpen', 'DocumentOpen'), | |
| 559 | - 'Runs when the Word document is closed': | |
| 560 | - ('AutoExit', 'AutoClose', 'Document_Close', 'DocumentBeforeClose'), | |
| 561 | - 'Runs when the Word document is modified': | |
| 562 | - ('DocumentChange',), | |
| 563 | - 'Runs when a new Word document is created': | |
| 564 | - ('AutoNew', 'Document_New', 'NewDocument'), | |
| 565 | - | |
| 566 | - # MS Word and Publisher: | |
| 567 | - 'Runs when the Word or Publisher document is opened': | |
| 568 | - ('Document_Open',), | |
| 569 | - 'Runs when the Publisher document is closed': | |
| 570 | - ('Document_BeforeClose',), | |
| 571 | - | |
| 572 | - # MS Excel: | |
| 573 | - 'Runs when the Excel Workbook is opened': | |
| 574 | - ('Auto_Open', 'Workbook_Open', 'Workbook_Activate'), | |
| 575 | - 'Runs when the Excel Workbook is closed': | |
| 576 | - ('Auto_Close', 'Workbook_Close'), | |
| 577 | - | |
| 578 | - # any MS Office application: | |
| 579 | - 'Runs when the file is opened (using InkPicture ActiveX object)': | |
| 580 | - # ref:https://twitter.com/joe4security/status/770691099988025345 | |
| 581 | - (r'\w+_Painted',), | |
| 582 | - 'Runs when the file is opened and ActiveX objects trigger events': | |
| 583 | - (r'\w+_(?:GotFocus|LostFocus|MouseHover)',), | |
| 584 | -} | |
| 585 | - | |
| 586 | -# Suspicious Keywords that may be used by malware | |
| 587 | -# See VBA language reference: http://msdn.microsoft.com/en-us/library/office/jj692818%28v=office.15%29.aspx | |
| 588 | -SUSPICIOUS_KEYWORDS = { | |
| 589 | - #TODO: use regex to support variable whitespaces | |
| 590 | - 'May read system environment variables': | |
| 591 | - ('Environ',), | |
| 592 | - 'May open a file': | |
| 593 | - ('Open',), | |
| 594 | - 'May write to a file (if combined with Open)': | |
| 595 | - #TODO: regex to find Open+Write on same line | |
| 596 | - ('Write', 'Put', 'Output', 'Print #'), | |
| 597 | - 'May read or write a binary file (if combined with Open)': | |
| 598 | - #TODO: regex to find Open+Binary on same line | |
| 599 | - ('Binary',), | |
| 600 | - 'May copy a file': | |
| 601 | - ('FileCopy', 'CopyFile'), | |
| 602 | - #FileCopy: http://msdn.microsoft.com/en-us/library/office/gg264390%28v=office.15%29.aspx | |
| 603 | - #CopyFile: http://msdn.microsoft.com/en-us/library/office/gg264089%28v=office.15%29.aspx | |
| 604 | - 'May delete a file': | |
| 605 | - ('Kill',), | |
| 606 | - 'May create a text file': | |
| 607 | - ('CreateTextFile', 'ADODB.Stream', 'WriteText', 'SaveToFile'), | |
| 608 | - #CreateTextFile: http://msdn.microsoft.com/en-us/library/office/gg264617%28v=office.15%29.aspx | |
| 609 | - #ADODB.Stream sample: http://pastebin.com/Z4TMyuq6 | |
| 610 | - 'May run an executable file or a system command': | |
| 611 | - ('Shell', 'vbNormal', 'vbNormalFocus', 'vbHide', 'vbMinimizedFocus', 'vbMaximizedFocus', 'vbNormalNoFocus', | |
| 612 | - 'vbMinimizedNoFocus', 'WScript.Shell', 'Run', 'ShellExecute'), | |
| 613 | - # MacScript: see https://msdn.microsoft.com/en-us/library/office/gg264812.aspx | |
| 614 | - 'May run an executable file or a system command on a Mac': | |
| 615 | - ('MacScript',), | |
| 616 | - 'May run an executable file or a system command on a Mac (if combined with libc.dylib)': | |
| 617 | - ('system', 'popen', r'exec[lv][ep]?'), | |
| 618 | - #Shell: http://msdn.microsoft.com/en-us/library/office/gg278437%28v=office.15%29.aspx | |
| 619 | - #WScript.Shell+Run sample: http://pastebin.com/Z4TMyuq6 | |
| 620 | - 'May run PowerShell commands': | |
| 621 | - #sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/ | |
| 622 | - #also: https://bitbucket.org/decalage/oletools/issues/14/olevba-library-update-ioc | |
| 623 | - # ref: https://blog.netspi.com/15-ways-to-bypass-the-powershell-execution-policy/ | |
| 624 | - # TODO: add support for keywords starting with a non-alpha character, such as "-noexit" | |
| 625 | - # TODO: '-command', '-EncodedCommand', '-scriptblock' | |
| 626 | - ('PowerShell', 'noexit', 'ExecutionPolicy', 'noprofile', 'command', 'EncodedCommand', | |
| 627 | - 'invoke-command', 'scriptblock', 'Invoke-Expression', 'AuthorizationManager'), | |
| 628 | - 'May run an executable file or a system command using PowerShell': | |
| 629 | - ('Start-Process',), | |
| 630 | - 'May hide the application': | |
| 631 | - ('Application.Visible', 'ShowWindow', 'SW_HIDE'), | |
| 632 | - 'May create a directory': | |
| 633 | - ('MkDir',), | |
| 634 | - 'May save the current workbook': | |
| 635 | - ('ActiveWorkbook.SaveAs',), | |
| 636 | - 'May change which directory contains files to open at startup': | |
| 637 | - #TODO: confirm the actual effect | |
| 638 | - ('Application.AltStartupPath',), | |
| 639 | - 'May create an OLE object': | |
| 640 | - ('CreateObject',), | |
| 641 | - 'May create an OLE object using PowerShell': | |
| 642 | - ('New-Object',), | |
| 643 | - 'May run an application (if combined with CreateObject)': | |
| 644 | - ('Shell.Application',), | |
| 645 | - 'May enumerate application windows (if combined with Shell.Application object)': | |
| 646 | - ('Windows', 'FindWindow'), | |
| 647 | - 'May run code from a DLL': | |
| 648 | - #TODO: regex to find declare+lib on same line - see mraptor | |
| 649 | - ('Lib',), | |
| 650 | - 'May run code from a library on a Mac': | |
| 651 | - #TODO: regex to find declare+lib on same line - see mraptor | |
| 652 | - ('libc.dylib', 'dylib'), | |
| 653 | - 'May inject code into another process': | |
| 654 | - ('CreateThread', 'VirtualAlloc', # (issue #9) suggested by Davy Douhine - used by MSF payload | |
| 655 | - 'VirtualAllocEx', 'RtlMoveMemory', | |
| 656 | - ), | |
| 657 | - 'May run a shellcode in memory': | |
| 658 | - ('EnumSystemLanguageGroupsW?', # Used by Hancitor in Oct 2016 | |
| 659 | - 'EnumDateFormats(?:W|(?:Ex){1,2})?'), # see https://msdn.microsoft.com/en-us/library/windows/desktop/dd317810(v=vs.85).aspx | |
| 660 | - 'May download files from the Internet': | |
| 661 | - #TODO: regex to find urlmon+URLDownloadToFileA on same line | |
| 662 | - ('URLDownloadToFileA', 'Msxml2.XMLHTTP', 'Microsoft.XMLHTTP', | |
| 663 | - 'MSXML2.ServerXMLHTTP', # suggested in issue #13 | |
| 664 | - 'User-Agent', # sample from @ozhermit: http://pastebin.com/MPc3iV6z | |
| 665 | - ), | |
| 666 | - 'May download files from the Internet using PowerShell': | |
| 667 | - #sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/ | |
| 668 | - ('Net.WebClient', 'DownloadFile', 'DownloadString'), | |
| 669 | - 'May control another application by simulating user keystrokes': | |
| 670 | - ('SendKeys', 'AppActivate'), | |
| 671 | - #SendKeys: http://msdn.microsoft.com/en-us/library/office/gg278655%28v=office.15%29.aspx | |
| 672 | - 'May attempt to obfuscate malicious function calls': | |
| 673 | - ('CallByName',), | |
| 674 | - #CallByName: http://msdn.microsoft.com/en-us/library/office/gg278760%28v=office.15%29.aspx | |
| 675 | - 'May attempt to obfuscate specific strings (use option --deobf to deobfuscate)': | |
| 676 | - #TODO: regex to find several Chr*, not just one | |
| 677 | - ('Chr', 'ChrB', 'ChrW', 'StrReverse', 'Xor'), | |
| 678 | - #Chr: http://msdn.microsoft.com/en-us/library/office/gg264465%28v=office.15%29.aspx | |
| 679 | - 'May read or write registry keys': | |
| 680 | - #sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/ | |
| 681 | - ('RegOpenKeyExA', 'RegOpenKeyEx', 'RegCloseKey'), | |
| 682 | - 'May read registry keys': | |
| 683 | - #sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/ | |
| 684 | - ('RegQueryValueExA', 'RegQueryValueEx', | |
| 685 | - 'RegRead', #with Wscript.Shell | |
| 686 | - ), | |
| 687 | - 'May detect virtualization': | |
| 688 | - # sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/ | |
| 689 | - (r'SYSTEM\ControlSet001\Services\Disk\Enum', 'VIRTUAL', 'VMWARE', 'VBOX'), | |
| 690 | - 'May detect Anubis Sandbox': | |
| 691 | - # sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/ | |
| 692 | - # NOTES: this sample also checks App.EXEName but that seems to be a bug, it works in VB6 but not in VBA | |
| 693 | - # ref: http://www.syssec-project.eu/m/page-media/3/disarm-raid11.pdf | |
| 694 | - ('GetVolumeInformationA', 'GetVolumeInformation', # with kernel32.dll | |
| 695 | - '1824245000', r'HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProductId', | |
| 696 | - '76487-337-8429955-22614', 'andy', 'sample', r'C:\exec\exec.exe', 'popupkiller' | |
| 697 | - ), | |
| 698 | - 'May detect Sandboxie': | |
| 699 | - # sample: https://malwr.com/analysis/M2NjZWNmMjA0YjVjNGVhYmJlZmFhNWY4NmQxZDllZTY/ | |
| 700 | - # ref: http://www.cplusplus.com/forum/windows/96874/ | |
| 701 | - ('SbieDll.dll', 'SandboxieControlWndClass'), | |
| 702 | - 'May detect Sunbelt Sandbox': | |
| 703 | - # ref: http://www.cplusplus.com/forum/windows/96874/ | |
| 704 | - (r'C:\file.exe',), | |
| 705 | - 'May detect Norman Sandbox': | |
| 706 | - # ref: http://www.cplusplus.com/forum/windows/96874/ | |
| 707 | - ('currentuser',), | |
| 708 | - 'May detect CW Sandbox': | |
| 709 | - # ref: http://www.cplusplus.com/forum/windows/96874/ | |
| 710 | - ('Schmidti',), | |
| 711 | - 'May detect WinJail Sandbox': | |
| 712 | - # ref: http://www.cplusplus.com/forum/windows/96874/ | |
| 713 | - ('Afx:400000:0',), | |
| 714 | -} | |
| 715 | - | |
| 716 | -# Regular Expression for a URL: | |
| 717 | -# http://en.wikipedia.org/wiki/Uniform_resource_locator | |
| 718 | -# http://www.w3.org/Addressing/URL/uri-spec.html | |
| 719 | -#TODO: also support username:password@server | |
| 720 | -#TODO: other protocols (file, gopher, wais, ...?) | |
| 721 | -SCHEME = r'\b(?:http|ftp)s?' | |
| 722 | -# see http://en.wikipedia.org/wiki/List_of_Internet_top-level_domains | |
| 723 | -TLD = r'(?:xn--[a-zA-Z0-9]{4,20}|[a-zA-Z]{2,20})' | |
| 724 | -DNS_NAME = r'(?:[a-zA-Z0-9\-\.]+\.' + TLD + ')' | |
| 725 | -#TODO: IPv6 - see https://www.debuggex.com/ | |
| 726 | -# A literal numeric IPv6 address may be given, but must be enclosed in [ ] e.g. [db8:0cec::99:123a] | |
| 727 | -NUMBER_0_255 = r'(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])' | |
| 728 | -IPv4 = r'(?:' + NUMBER_0_255 + r'\.){3}' + NUMBER_0_255 | |
| 729 | -# IPv4 must come before the DNS name because it is more specific | |
| 730 | -SERVER = r'(?:' + IPv4 + '|' + DNS_NAME + ')' | |
| 731 | -PORT = r'(?:\:[0-9]{1,5})?' | |
| 732 | -SERVER_PORT = SERVER + PORT | |
| 733 | -URL_PATH = r'(?:/[a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~]*)?' # [^\.\,\)\(\s"] | |
| 734 | -URL_RE = SCHEME + r'\://' + SERVER_PORT + URL_PATH | |
| 735 | -re_url = re.compile(URL_RE) | |
| 736 | - | |
| 737 | - | |
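Note on the URL pattern above: the regex is assembled from small named fragments (scheme, server, port, path). The following is a minimal standalone sketch that rebuilds the same fragments and runs them on a made-up VBA snippet. The helper name `find_urls` and the sample string are assumptions for illustration, not part of the removed module.

```python
import re

# Same building blocks as in the listing, reassembled for a standalone demo.
SCHEME = r'\b(?:http|ftp)s?'
TLD = r'(?:xn--[a-zA-Z0-9]{4,20}|[a-zA-Z]{2,20})'
DNS_NAME = r'(?:[a-zA-Z0-9\-\.]+\.' + TLD + ')'
NUMBER_0_255 = r'(?:25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9][0-9]|[0-9])'
IPv4 = r'(?:' + NUMBER_0_255 + r'\.){3}' + NUMBER_0_255
# IPv4 is tried before the DNS name because it is more specific
SERVER = r'(?:' + IPv4 + '|' + DNS_NAME + ')'
PORT = r'(?:\:[0-9]{1,5})?'
URL_PATH = r'(?:/[a-zA-Z0-9\-\._\?\,\'/\\\+&%\$#\=~]*)?'
re_url = re.compile(SCHEME + r'\://' + SERVER + PORT + URL_PATH)


def find_urls(text):
    """Hypothetical helper: return every URL matched by the composed regex."""
    return re_url.findall(text)


# Made-up sample resembling strings found in a macro
sample = 'Shell "http://example.com/payload.exe" : x = "ftp://10.0.0.1:8080/x"'
print(find_urls(sample))
```

Because every group is non-capturing, `findall` returns whole matches, and the path fragment stops at the closing double quote since `"` is not in its character class.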
| 738 | -# Patterns to be extracted (IP addresses, URLs, etc) | |
| 739 | -# From patterns.py in balbuzard | |
| 740 | -RE_PATTERNS = ( | |
| 741 | - ('URL', re.compile(URL_RE)), | |
| 742 | - ('IPv4 address', re.compile(IPv4)), | |
| 743 | - # TODO: add IPv6 | |
| 744 | - ('E-mail address', re.compile(r'(?i)\b[A-Z0-9._%+-]+@' + SERVER + r'\b')), | |
| 745 | - # ('Domain name', re.compile(r'(?=^.{1,254}$)(^(?:(?!\d+\.|-)[a-zA-Z0-9_\-]{1,63}(?<!-)\.?)+(?:[a-zA-Z]{2,})$)')), | |
| 746 | - # Executable file name with known extensions (except .com which is present in many URLs, and .application): | |
| 747 | - ("Executable file name", re.compile( | |
| 748 | - r"(?i)\b\w+\.(EXE|PIF|GADGET|MSI|MSP|MSC|VBS|VBE|VB|JSE|JS|WSF|WSC|WSH|WS|BAT|CMD|DLL|SCR|HTA|CPL|CLASS|JAR|PS1XML|PS1|PS2XML|PS2|PSC1|PSC2|SCF|LNK|INF|REG)\b")), | |
| 749 | - # Sources: http://www.howtogeek.com/137270/50-file-extensions-that-are-potentially-dangerous-on-windows/ | |
| 750 | - # TODO: https://support.office.com/en-us/article/Blocked-attachments-in-Outlook-3811cddc-17c3-4279-a30c-060ba0207372#__attachment_file_types | |
| 751 | - # TODO: add win & unix file paths | |
| 752 | - #('Hex string', re.compile(r'(?:[0-9A-Fa-f]{2}){4,}')), | |
| 753 | -) | |
| 754 | - | |
| 755 | -# regex to detect strings encoded in hexadecimal | |
| 756 | -re_hex_string = re.compile(r'(?:[0-9A-Fa-f]{2}){4,}') | |
| 757 | - | |
| 758 | -# regex to detect strings encoded in base64 | |
| 759 | -#re_base64_string = re.compile(r'"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"') | |
| 760 | -# better version from balbuzard, less false positives: | |
| 761 | -# (plain version without double quotes, used also below in quoted_base64_string) | |
| 762 | -BASE64_RE = r'(?:[A-Za-z0-9+/]{4}){1,}(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)?' | |
| 763 | -re_base64_string = re.compile('"' + BASE64_RE + '"') | |
| 764 | -# white list of common strings matching the base64 regex, but which are not base64 strings (all lowercase): | |
| 765 | -BASE64_WHITELIST = set(['thisdocument', 'thisworkbook', 'test', 'temp', 'http', 'open', 'exit']) | |
| 766 | - | |
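Note on the Base64 pattern and whitelist above: short words such as `test` are valid Base64 by syntax, which is why the whitelist exists. Below is a rough sketch of how the two could be combined. The function `decode_base64_candidates` and the sample line are hypothetical, and the actual detection code in the module may differ.

```python
import base64
import re

BASE64_RE = r'(?:[A-Za-z0-9+/]{4}){1,}(?:[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=|[A-Za-z0-9+/][AQgw]==)?'
re_base64_string = re.compile('"' + BASE64_RE + '"')
BASE64_WHITELIST = set(['thisdocument', 'thisworkbook', 'test', 'temp', 'http', 'open', 'exit'])


def decode_base64_candidates(vba_code):
    """Hypothetical helper: return (encoded, decoded) pairs, skipping whitelisted words."""
    results = []
    for match in re_base64_string.finditer(vba_code):
        encoded = match.group().strip('"')
        # common words that merely look like Base64 are ignored
        if encoded.lower() in BASE64_WHITELIST:
            continue
        results.append((encoded, base64.b64decode(encoded)))
    return results


# Made-up sample: two real Base64 strings and one whitelisted word
code = 'a = "dGVzdA==" : b = "test" : c = "Y21kLmV4ZQ=="'
print(decode_base64_candidates(code))
```

The whitelist comparison is done on the lowercased candidate, matching the comment that all whitelist entries are lowercase.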
| 767 | -# regex to detect strings encoded with a specific Dridex algorithm | |
| 768 | -# (see https://github.com/JamesHabben/MalwareStuff) | |
| 769 | -re_dridex_string = re.compile(r'"[0-9A-Za-z]{20,}"') | |
| 770 | -# regex to check that it is not just a hex string: | |
| 771 | -re_nothex_check = re.compile(r'[G-Zg-z]') | |
| 772 | - | |
| 773 | -# regex to extract printable strings (at least 5 chars) from VBA Forms: | |
| 774 | -re_printable_string = re.compile(b'[\\t\\r\\n\\x20-\\xFF]{5,}') | |
| 775 | - | |
| 776 | - | |
| 777 | -# === PARTIAL VBA GRAMMAR ==================================================== | |
| 778 | - | |
| 779 | -# REFERENCES: | |
| 780 | -# - [MS-VBAL]: VBA Language Specification | |
| 781 | -# https://msdn.microsoft.com/en-us/library/dd361851.aspx | |
| 782 | -# - pyparsing: http://pyparsing.wikispaces.com/ | |
| 783 | - | |
| 784 | -# TODO: set whitespaces according to VBA | |
| 785 | -# TODO: merge extended lines before parsing | |
| 786 | - | |
| 787 | -# Enable PackRat for better performance: | |
| 788 | -# (see https://pythonhosted.org/pyparsing/pyparsing.ParserElement-class.html#enablePackrat) | |
| 789 | -ParserElement.enablePackrat() | |
| 790 | - | |
| 791 | -# VBA identifier chars (from MS-VBAL 3.3.5) | |
| 792 | -vba_identifier_chars = alphanums + '_' | |
| 793 | - | |
| 794 | -class VbaExpressionString(str): | |
| 795 | - """ | |
| 796 | - Class identical to str, used to distinguish plain strings from strings | |
| 797 | - obfuscated using VBA expressions (Chr, StrReverse, etc) | |
| 798 | - Usage: each VBA expression parse action should convert strings to | |
| 799 | - VbaExpressionString. | |
| 800 | - Then isinstance(s, VbaExpressionString) is True only for VBA expressions. | |
| 801 | - (see detect_vba_strings) | |
| 802 | - """ | |
| 803 | - # TODO: use Unicode everywhere instead of str | |
| 804 | - pass | |
| 805 | - | |
| 806 | - | |
| 807 | -# --- NUMBER TOKENS ---------------------------------------------------------- | |
| 808 | - | |
| 809 | -# 3.3.2 Number Tokens | |
| 810 | -# INTEGER = integer-literal ["%" / "&" / "^"] | |
| 811 | -# integer-literal = decimal-literal / octal-literal / hex-literal | |
| 812 | -# decimal-literal = 1*decimal-digit | |
| 813 | -# octal-literal = "&" [%x004F / %x006F] 1*octal-digit | |
| 814 | -# ; & or &o or &O | |
| 815 | -# hex-literal = "&" (%x0048 / %x0068) 1*hex-digit | |
| 816 | -# ; &h or &H | |
| 817 | -# octal-digit = "0" / "1" / "2" / "3" / "4" / "5" / "6" / "7" | |
| 818 | -# decimal-digit = octal-digit / "8" / "9" | |
| 819 | -# hex-digit = decimal-digit / %x0041-0046 / %x0061-0066 ;A-F / a-f | |
| 820 | - | |
| 821 | -# NOTE: here Combine() is required to avoid spaces between elements | |
| 822 | -# NOTE: here WordStart is necessary to avoid matching a number preceded by | |
| 823 | -# letters or underscore (e.g. "VBT1" or "ABC_34"), when using scanString | |
| 824 | -decimal_literal = Combine(Optional('-') + WordStart(vba_identifier_chars) + Word(nums) | |
| 825 | - + Suppress(Optional(Word('%&^', exact=1)))) | |
| 826 | -decimal_literal.setParseAction(lambda t: int(t[0])) | |
| 827 | - | |
| 828 | -octal_literal = Combine(Suppress(Literal('&') + Optional((CaselessLiteral('o')))) + Word(srange('[0-7]')) | |
| 829 | - + Suppress(Optional(Word('%&^', exact=1)))) | |
| 830 | -octal_literal.setParseAction(lambda t: int(t[0], base=8)) | |
| 831 | - | |
| 832 | -hex_literal = Combine(Suppress(CaselessLiteral('&h')) + Word(srange('[0-9a-fA-F]')) | |
| 833 | - + Suppress(Optional(Word('%&^', exact=1)))) | |
| 834 | -hex_literal.setParseAction(lambda t: int(t[0], base=16)) | |
| 835 | - | |
| 836 | -integer = decimal_literal | octal_literal | hex_literal | |
| 837 | - | |
| 838 | - | |
| 839 | -# --- QUOTED STRINGS --------------------------------------------------------- | |
| 840 | - | |
| 841 | -# 3.3.4 String Tokens | |
| 842 | -# STRING = double-quote *string-character (double-quote / line-continuation / LINE-END) | |
| 843 | -# double-quote = %x0022 ; " | |
| 844 | -# string-character = NO-LINE-CONTINUATION ((double-quote double-quote) termination-character) | |
| 845 | - | |
| 846 | -quoted_string = QuotedString('"', escQuote='""') | |
| 847 | -quoted_string.setParseAction(lambda t: str(t[0])) | |
| 848 | - | |
| 849 | - | |
| 850 | -#--- VBA Expressions --------------------------------------------------------- | |
| 851 | - | |
| 852 | -# See MS-VBAL 5.6 Expressions | |
| 853 | - | |
| 854 | -# need to pre-declare using Forward() because it is recursive | |
| 855 | -# VBA string expression and integer expression | |
| 856 | -vba_expr_str = Forward() | |
| 857 | -vba_expr_int = Forward() | |
| 858 | - | |
| 859 | -# --- CHR -------------------------------------------------------------------- | |
| 860 | - | |
| 861 | -# MS-VBAL 6.1.2.11.1.4 Chr / Chr$ | |
| 862 | -# Function Chr(CharCode As Long) As Variant | |
| 863 | -# Function Chr$(CharCode As Long) As String | |
| 864 | -# Parameter Description | |
| 865 | -# CharCode Long whose value is a code point. | |
| 866 | -# Returns a String data value consisting of a single character containing the character whose code | |
| 867 | -# point is the data value of the argument. | |
| 868 | -# - If the argument is not in the range 0 to 255, Error Number 5 ("Invalid procedure call or | |
| 869 | -# argument") is raised unless the implementation supports a character set with a larger code point | |
| 870 | -# range. | |
| 871 | -# - If the argument value is in the range of 0 to 127, it is interpreted as a 7-bit ASCII code point. | |
| 872 | -# - If the argument value is in the range of 128 to 255, the code point interpretation of the value is | |
| 873 | -# implementation defined. | |
| 874 | -# - Chr$ has the same runtime semantics as Chr, however the declared type of its function result is | |
| 875 | -# String rather than Variant. | |
| 876 | - | |
| 877 | -# 6.1.2.11.1.5 ChrB / ChrB$ | |
| 878 | -# Function ChrB(CharCode As Long) As Variant | |
| 879 | -# Function ChrB$(CharCode As Long) As String | |
| 880 | -# CharCode Long whose value is a code point. | |
| 881 | -# Returns a String data value consisting of a single byte character whose code point value is the | |
| 882 | -# data value of the argument. | |
| 883 | -# - If the argument is not in the range 0 to 255, Error Number 6 ("Overflow") is raised. | |
| 884 | -# - ChrB$ has the same runtime semantics as ChrB however the declared type of its function result | |
| 885 | -# is String rather than Variant. | |
| 886 | -# - Note: the ChrB function is used with byte data contained in a String. Instead of returning a | |
| 887 | -# character, which may be one or two bytes, ChrB always returns a single byte. The ChrW function | |
| 888 | -# returns a String containing the Unicode character except on platforms where Unicode is not | |
| 889 | -# supported, in which case, the behavior is identical to the Chr function. | |
| 890 | - | |
| 891 | -# 6.1.2.11.1.6 ChrW/ ChrW$ | |
| 892 | -# Function ChrW(CharCode As Long) As Variant | |
| 893 | -# Function ChrW$(CharCode As Long) As String | |
| 894 | -# CharCode Long whose value is a code point. | |
| 895 | -# Returns a String data value consisting of a single character containing the character whose code | |
| 896 | -# point is the data value of the argument. | |
| 897 | -# - If the argument is not in the range -32,767 to 65,535 then Error Number 5 ("Invalid procedure | |
| 898 | -# call or argument") is raised. | |
| 899 | -# - If the argument is a negative value it is treated as if it was the value: CharCode + 65,536. | |
| 900 | -# - If the implementation uses 16-bit Unicode code points, the argument data value is interpreted | |
| 901 | -# as a 16-bit Unicode code point. | |
| 902 | -# - If the implementation does not support Unicode, ChrW has the same semantics as Chr. | |
| 903 | -# - ChrW$ has the same runtime semantics as ChrW, however the declared type of its function result | |
| 904 | -# is String rather than Variant. | |
| 905 | - | |
| 906 | -# Chr, Chr$, ChrB, ChrW(int) => char | |
| 907 | -vba_chr = Suppress( | |
| 908 | - Combine(WordStart(vba_identifier_chars) + CaselessLiteral('Chr') | |
| 909 | - + Optional(CaselessLiteral('B') | CaselessLiteral('W')) + Optional('$')) | |
| 910 | - + '(') + vba_expr_int + Suppress(')') | |
| 911 | - | |
| 912 | -def vba_chr_tostr(t): | |
| 913 | - try: | |
| 914 | - i = t[0] | |
| 915 | - # normal, non-unicode character: | |
| 916 | - if i>=0 and i<=255: | |
| 917 | - return VbaExpressionString(chr(i)) | |
| 918 | - else: | |
| 919 | - return VbaExpressionString(chr(i).encode('utf-8', 'backslashreplace')) | |
| 920 | - except ValueError: | |
| 921 | - log.exception('ERROR: incorrect parameter value for chr(): %r' % i) | |
| 922 | - return VbaExpressionString('Chr(%r)' % i) | |
| 923 | - | |
| 924 | -vba_chr.setParseAction(vba_chr_tostr) | |
| 925 | - | |
| 926 | - | |
| 927 | -# --- ASC -------------------------------------------------------------------- | |
| 928 | - | |
| 929 | -# Asc(char) => int | |
| 930 | -#TODO: see MS-VBAL 6.1.2.11.1.1 page 240 => AscB, AscW | |
| 931 | -vba_asc = Suppress(CaselessKeyword('Asc') + '(') + vba_expr_str + Suppress(')') | |
| 932 | -vba_asc.setParseAction(lambda t: ord(t[0])) | |
| 933 | - | |
| 934 | - | |
| 935 | -# --- VAL -------------------------------------------------------------------- | |
| 936 | - | |
| 937 | -# Val(string) => int | |
| 938 | -# TODO: make sure the behavior of VBA's val is fully covered | |
| 939 | -vba_val = Suppress(CaselessKeyword('Val') + '(') + vba_expr_str + Suppress(')') | |
| 940 | -vba_val.setParseAction(lambda t: int(t[0].strip())) | |
| 941 | - | |
| 942 | - | |
| 943 | -# --- StrReverse() -------------------------------------------------------------------- | |
| 944 | - | |
| 945 | -# StrReverse(string) => string | |
| 946 | -strReverse = Suppress(CaselessKeyword('StrReverse') + '(') + vba_expr_str + Suppress(')') | |
| 947 | -strReverse.setParseAction(lambda t: VbaExpressionString(str(t[0])[::-1])) | |
| 948 | - | |
| 949 | - | |
| 950 | -# --- ENVIRON() -------------------------------------------------------------------- | |
| 951 | - | |
| 952 | -# Environ("name") => just translated to "%name%", that is enough for malware analysis | |
| 953 | -environ = Suppress(CaselessKeyword('Environ') + '(') + vba_expr_str + Suppress(')') | |
| 954 | -environ.setParseAction(lambda t: VbaExpressionString('%%%s%%' % t[0])) | |
| 955 | - | |
| 956 | - | |
| 957 | -# --- IDENTIFIER ------------------------------------------------------------- | |
| 958 | - | |
| 959 | -#TODO: see MS-VBAL 3.3.5 page 33 | |
| 960 | -# 3.3.5 Identifier Tokens | |
| 961 | -# Latin-identifier = first-Latin-identifier-character *subsequent-Latin-identifier-character | |
| 962 | -# first-Latin-identifier-character = (%x0041-005A / %x0061-007A) ; A-Z / a-z | |
| 963 | -# subsequent-Latin-identifier-character = first-Latin-identifier-character / DIGIT / %x5F ; underscore | |
| 964 | -latin_identifier = Word(initChars=alphas, bodyChars=alphanums + '_') | |
| 965 | - | |
| 966 | -# --- HEX FUNCTION ----------------------------------------------------------- | |
| 967 | - | |
| 968 | -# match any custom function name with a hex string as argument: | |
| 969 | -# TODO: accept vba_expr_str_item as argument, check if it is a hex or base64 string at runtime | |
| 970 | - | |
| 971 | -# quoted string of at least two hexadecimal numbers of two digits: | |
| 972 | -quoted_hex_string = Suppress('"') + Combine(Word(hexnums, exact=2) * (2, None)) + Suppress('"') | |
| 973 | -quoted_hex_string.setParseAction(lambda t: str(t[0])) | |
| 974 | - | |
| 975 | -hex_function_call = Suppress(latin_identifier) + Suppress('(') + \ | |
| 976 | - quoted_hex_string('hex_string') + Suppress(')') | |
| 977 | -hex_function_call.setParseAction(lambda t: VbaExpressionString(binascii.a2b_hex(t.hex_string))) | |
| 978 | - | |
| 979 | - | |
| 980 | -# --- BASE64 FUNCTION ----------------------------------------------------------- | |
| 981 | - | |
| 982 | -# match any custom function name with a Base64 string as argument: | |
| 983 | -# TODO: accept vba_expr_str_item as argument, check if it is a hex or base64 string at runtime | |
| 984 | - | |
| 985 | -# quoted Base64 string (must match BASE64_RE): | |
| 986 | -quoted_base64_string = Suppress('"') + Regex(BASE64_RE) + Suppress('"') | |
| 987 | -quoted_base64_string.setParseAction(lambda t: str(t[0])) | |
| 988 | - | |
| 989 | -base64_function_call = Suppress(latin_identifier) + Suppress('(') + \ | |
| 990 | - quoted_base64_string('base64_string') + Suppress(')') | |
| 991 | -base64_function_call.setParseAction(lambda t: VbaExpressionString(binascii.a2b_base64(t.base64_string))) | |
| 992 | - | |
| 993 | - | |
| 994 | -# ---STRING EXPRESSION ------------------------------------------------------- | |
| 995 | - | |
| 996 | -def concat_strings_list(tokens): | |
| 997 | - """ | |
| 998 | - parse action to concatenate strings in a VBA expression with operators '+' or '&' | |
| 999 | - """ | |
| 1000 | - # extract argument from the tokens: | |
| 1001 | - # expected to be a tuple containing a list of strings such as [a,'&',b,'&',c,...] | |
| 1002 | - strings = tokens[0][::2] | |
| 1003 | - return VbaExpressionString(''.join(strings)) | |
| 1004 | - | |
| 1005 | - | |
| 1006 | -vba_expr_str_item = (vba_chr | strReverse | environ | quoted_string | hex_function_call | base64_function_call) | |
| 1007 | - | |
| 1008 | -vba_expr_str <<= infixNotation(vba_expr_str_item, | |
| 1009 | - [ | |
| 1010 | - ("+", 2, opAssoc.LEFT, concat_strings_list), | |
| 1011 | - ("&", 2, opAssoc.LEFT, concat_strings_list), | |
| 1012 | - ]) | |
| 1013 | - | |
| 1014 | - | |
| 1015 | -# --- INTEGER EXPRESSION ------------------------------------------------------- | |
| 1016 | - | |
| 1017 | -def sum_ints_list(tokens): | |
| 1018 | - """ | |
| 1019 | - parse action to sum integers in a VBA expression with operator '+' | |
| 1020 | - """ | |
| 1021 | - # extract argument from the tokens: | |
| 1022 | - # expected to be a tuple containing a list of integers such as [a,'+',b,'+',c,...] | |
| 1023 | - integers = tokens[0][::2] | |
| 1024 | - return sum(integers) | |
| 1025 | - | |
| 1026 | - | |
| 1027 | -def subtract_ints_list(tokens): | |
| 1028 | - """ | |
| 1029 | - parse action to subtract integers in a VBA expression with operator '-' | |
| 1030 | - """ | |
| 1031 | - # extract argument from the tokens: | |
| 1032 | - # expected to be a tuple containing a list of integers such as [a,'-',b,'-',c,...] | |
| 1033 | - integers = tokens[0][::2] | |
| 1034 | - return reduce(lambda x,y:x-y, integers) | |
| 1035 | - | |
| 1036 | - | |
| 1037 | -def multiply_ints_list(tokens): | |
| 1038 | - """ | |
| 1039 | - parse action to multiply integers in a VBA expression with operator '*' | |
| 1040 | - """ | |
| 1041 | - # extract argument from the tokens: | |
| 1042 | - # expected to be a tuple containing a list of integers such as [a,'*',b,'*',c,...] | |
| 1043 | - integers = tokens[0][::2] | |
| 1044 | - return reduce(lambda x,y:x*y, integers) | |
| 1045 | - | |
| 1046 | - | |
| 1047 | -def divide_ints_list(tokens): | |
| 1048 | - """ | |
| 1049 | - parse action to divide integers in a VBA expression with operator '/' | |
| 1050 | - """ | |
| 1051 | - # extract argument from the tokens: | |
| 1052 | - # expected to be a tuple containing a list of integers such as [a,'/',b,'/',c,...] | |
| 1053 | - integers = tokens[0][::2] | |
| 1054 | - return reduce(lambda x,y:x/y, integers) | |
| 1055 | - | |
| 1056 | - | |
| 1057 | -vba_expr_int_item = (vba_asc | vba_val | integer) | |
| 1058 | - | |
| 1059 | -# operators associativity: | |
| 1060 | -# https://en.wikipedia.org/wiki/Operator_associativity | |
| 1061 | - | |
| 1062 | -vba_expr_int <<= infixNotation(vba_expr_int_item, | |
| 1063 | - [ | |
| 1064 | - ("*", 2, opAssoc.LEFT, multiply_ints_list), | |
| 1065 | - ("/", 2, opAssoc.LEFT, divide_ints_list), | |
| 1066 | - ("-", 2, opAssoc.LEFT, subtract_ints_list), | |
| 1067 | - ("+", 2, opAssoc.LEFT, sum_ints_list), | |
| 1068 | - ]) | |
| 1069 | - | |
| 1070 | - | |
| 1071 | -# see detect_vba_strings for the deobfuscation code using this grammar | |
| 1072 | - | |
| 1073 | -# === MSO/ActiveMime files parsing =========================================== | |
| 1074 | - | |
| 1075 | -def is_mso_file(data): | |
| 1076 | - """ | |
| 1077 | - Check if the provided data is the content of a MSO/ActiveMime file, such as | |
| 1078 | - the ones created by Outlook in some cases, or Word/Excel when saving a | |
| 1079 | - file with the MHTML format or the Word 2003 XML format. | |
| 1080 | - This function only checks the ActiveMime magic at the beginning of data. | |
| 1081 | - :param data: bytes string, MSO/ActiveMime file content | |
| 1082 | - :return: bool, True if the file is MSO, False otherwise | |
| 1083 | - """ | |
| 1084 | - return data.startswith(MSO_ACTIVEMIME_HEADER) | |
| 1085 | - | |
| 1086 | - | |
| 1087 | -# regex to find zlib block headers, starting with byte 0x78 = 'x' | |
| 1088 | -re_zlib_header = re.compile(b'x') | |
| 1089 | - | |
| 1090 | - | |
| 1091 | -def mso_file_extract(data): | |
| 1092 | - """ | |
| 1093 | - Extract the data stored into a MSO/ActiveMime file, such as | |
| 1094 | - the ones created by Outlook in some cases, or Word/Excel when saving a | |
| 1095 | - file with the MHTML format or the Word 2003 XML format. | |
| 1096 | - | |
| 1097 | - :param data: bytes string, MSO/ActiveMime file content | |
| 1098 | - :return: bytes string, extracted data (uncompressed) | |
| 1099 | - | |
| 1100 | - raise a MsoExtractionError if the data cannot be extracted | |
| 1101 | - """ | |
| 1102 | - # check the magic: | |
| 1103 | - assert is_mso_file(data) | |
| 1104 | - | |
| 1105 | - # In all the samples seen so far, Word always uses an offset of 0x32, | |
| 1106 | - # and Excel 0x22A. But we read the offset from the header to be more | |
| 1107 | - # generic. | |
| 1108 | - offsets = [0x32, 0x22A] | |
| 1109 | - | |
| 1110 | - # First, attempt to get the compressed data offset from the header | |
| 1111 | - # According to my tests, it should be an unsigned 16 bits integer, | |
| 1112 | - # at offset 0x1E (little endian) + add 46: | |
| 1113 | - try: | |
| 1114 | - offset = struct.unpack_from('<H', data, offset=0x1E)[0] + 46 | |
| 1115 | - log.debug('Parsing MSO file: data offset = 0x%X' % offset) | |
| 1116 | - offsets.insert(0, offset) # insert at beginning of offsets | |
| 1117 | - except struct.error as exc: | |
| 1118 | - log.info('Unable to parse MSO/ActiveMime file header (%s)' % exc) | |
| 1119 | - log.debug('Trace:', exc_info=True) | |
| 1120 | - raise MsoExtractionError('Unable to parse MSO/ActiveMime file header') | |
| 1121 | - # now try offsets | |
| 1122 | - for start in offsets: | |
| 1123 | - try: | |
| 1124 | - log.debug('Attempting zlib decompression from MSO file offset 0x%X' % start) | |
| 1125 | - extracted_data = zlib.decompress(data[start:]) | |
| 1126 | - return extracted_data | |
| 1127 | - except zlib.error as exc: | |
| 1128 | - log.info('zlib decompression failed for offset %s (%s)' | |
| 1129 | - % (start, exc)) | |
| 1130 | - log.debug('Trace:', exc_info=True) | |
| 1131 | - # None of the guessed offsets worked, let's try brute-forcing by looking | |
| 1132 | - # for potential zlib-compressed blocks starting with 0x78: | |
| 1133 | - log.debug('Looking for potential zlib-compressed blocks in MSO file') | |
| 1134 | - for match in re_zlib_header.finditer(data): | |
| 1135 | - start = match.start() | |
| 1136 | - try: | |
| 1137 | - log.debug('Attempting zlib decompression from MSO file offset 0x%X' % start) | |
| 1138 | - extracted_data = zlib.decompress(data[start:]) | |
| 1139 | - return extracted_data | |
| 1140 | - except zlib.error as exc: | |
| 1141 | - log.info('zlib decompression failed (%s)' % exc) | |
| 1142 | - log.debug('Trace:', exc_info=True) | |
| 1143 | - raise MsoExtractionError('Unable to decompress data from a MSO/ActiveMime file') | |
| 1144 | - | |
| 1145 | - | |
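Note on the brute-force fallback in `mso_file_extract`: when none of the known offsets work, the function tries every `0x78` byte as a possible zlib stream start. Here is a small self-contained sketch of that idea. The helper `find_zlib_block` and the synthetic blob are assumptions for illustration, not real MSO data.

```python
import re
import zlib

# 0x78 ('x') is the usual first byte of a zlib stream; bytes pattern for bytes data
re_zlib_header = re.compile(b'x')


def find_zlib_block(data):
    """Hypothetical helper: return (offset, decompressed) for the first offset
    at which zlib decompression succeeds, or None if no offset works."""
    for match in re_zlib_header.finditer(data):
        start = match.start()
        try:
            return start, zlib.decompress(data[start:])
        except zlib.error:
            # false positive: an 'x' byte that does not start a valid stream
            continue
    return None


# Synthetic blob: padding, one decoy 'x' byte, then a real zlib stream
payload = b'Attribute VB_Name = "ThisDocument"'
blob = b'\x00' * 10 + b'x' + b'\x00' * 3 + zlib.compress(payload)
print(find_zlib_block(blob))
```

The decoy at offset 10 fails the zlib header check and is skipped, so the search continues to the real stream at offset 14.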
| 1146 | -#--- FUNCTIONS ---------------------------------------------------------------- | |
| 1147 | - | |
| 1148 | -# set of printable characters, for is_printable | |
| 1149 | -_PRINTABLE_SET = set(string.printable) | |
| 1150 | - | |
| 1151 | -def is_printable(s): | |
| 1152 | - """ | |
| 1153 | - returns True if string s only contains printable ASCII characters | |
| 1154 | - (i.e. contained in string.printable) | |
| 1155 | - This is similar to Python 3's str.isprintable, for Python 2.x. | |
| 1156 | - :param s: str | |
| 1157 | - :return: bool | |
| 1158 | - """ | |
| 1159 | - # inspired from http://stackoverflow.com/questions/3636928/test-if-a-python-string-is-printable | |
| 1160 | - # check if the set of chars from s is contained into the set of printable chars: | |
| 1161 | - return set(s).issubset(_PRINTABLE_SET) | |
| 1162 | - | |
| 1163 | - | |
| 1164 | -def copytoken_help(decompressed_current, decompressed_chunk_start): | |
| 1165 | - """ | |
| 1166 | - compute bit masks to decode a CopyToken according to MS-OVBA 2.4.1.3.19.1 CopyToken Help | |
| 1167 | - | |
| 1168 | - decompressed_current: number of decompressed bytes so far, i.e. len(decompressed_container) | |
| 1169 | - decompressed_chunk_start: offset of the current chunk in the decompressed container | |
| 1170 | - return length_mask, offset_mask, bit_count, maximum_length | |
| 1171 | - """ | |
| 1172 | - difference = decompressed_current - decompressed_chunk_start | |
| 1173 | - bit_count = int(math.ceil(math.log(difference, 2))) | |
| 1174 | - bit_count = max([bit_count, 4]) | |
| 1175 | - length_mask = 0xFFFF >> bit_count | |
| 1176 | - offset_mask = ~length_mask | |
| 1177 | - maximum_length = (0xFFFF >> bit_count) + 3 | |
| 1178 | - return length_mask, offset_mask, bit_count, maximum_length | |
| 1179 | - | |
| 1180 | - | |
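Note on `copytoken_help`: a CopyToken is 16 bits split into an offset part and a length part, and the split point moves as more of the chunk is decompressed. The sketch below uses an integer-only variant of the same computation and a hypothetical `unpack_copytoken` helper to show how the masks are applied. It replaces `math.ceil(math.log(difference, 2))` with `(difference - 1).bit_length()`, which gives the same value for `difference >= 1`, and it keeps the offset mask within 16 bits. Both choices are assumptions of this sketch, not the original code.

```python
def copytoken_masks(difference):
    """Integer-only variant of the mask computation (assumes difference >= 1)."""
    # (difference - 1).bit_length() equals ceil(log2(difference)) for difference >= 1
    bit_count = max((difference - 1).bit_length(), 4)
    length_mask = 0xFFFF >> bit_count
    # keep the mask within 16 bits instead of using a negative Python int
    offset_mask = ~length_mask & 0xFFFF
    maximum_length = length_mask + 3
    return length_mask, offset_mask, bit_count, maximum_length


def unpack_copytoken(copy_token, difference):
    """Hypothetical helper: split a 16-bit CopyToken into (offset, length)."""
    length_mask, offset_mask, bit_count, _ = copytoken_masks(difference)
    length = (copy_token & length_mask) + 3
    offset = ((copy_token & offset_mask) >> (16 - bit_count)) + 1
    return offset, length


# One byte decompressed so far in the chunk: 4 offset bits, 12 length bits
print(copytoken_masks(1))
# Token 0x1002 at that point: go back 2 bytes, copy 5 bytes
print(unpack_copytoken(0x1002, 1))
```

As the chunk grows, more bits go to the offset and fewer to the length, so the maximum copy length shrinks from 4098 down to 18 at 4096 decompressed bytes.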
| 1181 | -def decompress_stream(compressed_container): | |
| 1182 | - """ | |
| 1183 | - Decompress a stream according to MS-OVBA section 2.4.1 | |
| 1184 | - | |
| 1185 | - compressed_container: string compressed according to the MS-OVBA 2.4.1.3.6 Compression algorithm | |
| 1186 | - return the decompressed container as a string (bytes) | |
| 1187 | - """ | |
| 1188 | - # 2.4.1.2 State Variables | |
| 1189 | - | |
| 1190 | - # The following state is maintained for the CompressedContainer (section 2.4.1.1.1): | |
| 1191 | - # CompressedRecordEnd: The location of the byte after the last byte in the CompressedContainer (section 2.4.1.1.1). | |
| 1192 | - # CompressedCurrent: The location of the next byte in the CompressedContainer (section 2.4.1.1.1) to be read by | |
| 1193 | - # decompression or to be written by compression. | |
| 1194 | - | |
| 1195 | - # The following state is maintained for the current CompressedChunk (section 2.4.1.1.4): | |
| 1196 | - # CompressedChunkStart: The location of the first byte of the CompressedChunk (section 2.4.1.1.4) within the | |
| 1197 | - # CompressedContainer (section 2.4.1.1.1). | |
| 1198 | - | |
| 1199 | - # The following state is maintained for a DecompressedBuffer (section 2.4.1.1.2): | |
| 1200 | - # DecompressedCurrent: The location of the next byte in the DecompressedBuffer (section 2.4.1.1.2) to be written by | |
| 1201 | - # decompression or to be read by compression. | |
| 1202 | - # DecompressedBufferEnd: The location of the byte after the last byte in the DecompressedBuffer (section 2.4.1.1.2). | |
| 1203 | - | |
| 1204 | - # The following state is maintained for the current DecompressedChunk (section 2.4.1.1.3): | |
| 1205 | - # DecompressedChunkStart: The location of the first byte of the DecompressedChunk (section 2.4.1.1.3) within the | |
| 1206 | - # DecompressedBuffer (section 2.4.1.1.2). | |
| 1207 | - | |
| 1208 | - decompressed_container = bytearray() # result | |
| 1209 | - compressed_current = 0 | |
| 1210 | - | |
| 1211 | - sig_byte = compressed_container[compressed_current] | |
| 1212 | - if sig_byte != 0x01: | |
| 1213 | - raise ValueError('invalid signature byte {0:02X}'.format(sig_byte)) | |
| 1214 | - | |
| 1215 | - compressed_current += 1 | |
| 1216 | - | |
| 1217 | - #NOTE: the definition of CompressedRecordEnd is ambiguous. Here we assume that | |
| 1218 | - # CompressedRecordEnd = len(compressed_container) | |
| 1219 | - while compressed_current < len(compressed_container): | |
| 1220 | - # 2.4.1.1.5 | |
| 1221 | - compressed_chunk_start = compressed_current | |
| 1222 | - # chunk header = first 16 bits | |
| 1223 | - compressed_chunk_header = \ | |
| 1224 | - struct.unpack("<H", compressed_container[compressed_chunk_start:compressed_chunk_start + 2])[0] | |
| 1225 | - # chunk size = 12 first bits of header + 3 | |
| 1226 | - chunk_size = (compressed_chunk_header & 0x0FFF) + 3 | |
| 1227 | - # chunk signature = 3 next bits - should always be 0b011 | |
| 1228 | - chunk_signature = (compressed_chunk_header >> 12) & 0x07 | |
| 1229 | - if chunk_signature != 0b011: | |
| 1230 | - raise ValueError('Invalid CompressedChunkSignature in VBA compressed stream') | |
| 1231 | - # chunk flag = next bit - 1 == compressed, 0 == uncompressed | |
| 1232 | - chunk_flag = (compressed_chunk_header >> 15) & 0x01 | |
| 1233 | - log.debug("chunk size = {0}, compressed flag = {1}".format(chunk_size, chunk_flag)) | |
| 1234 | - | |
| 1235 | - #MS-OVBA 2.4.1.3.12: the maximum size of a chunk including its header is 4098 bytes (header 2 + data 4096) | |
| 1236 | - # The minimum size is 3 bytes | |
| 1237 | - # NOTE: there seems to be a typo in MS-OVBA, the check should be with 4098, not 4095 (which is the max value | |
| 1238 | - # in chunk header before adding 3). | |
| 1239 | - # Also the first test is not useful since a 12 bits value cannot be larger than 4095. | |
| 1240 | - if chunk_flag == 1 and chunk_size > 4098: | |
| 1241 | - raise ValueError('CompressedChunkSize > 4098 but CompressedChunkFlag == 1') | |
| 1242 | - if chunk_flag == 0 and chunk_size != 4098: | |
| 1243 | - raise ValueError('CompressedChunkSize != 4098 but CompressedChunkFlag == 0') | |
| 1244 | - | |
| 1245 | - # check if chunk_size goes beyond the compressed data, instead of silently cutting it: | |
| 1246 | - #TODO: raise an exception? | |
| 1247 | - if compressed_chunk_start + chunk_size > len(compressed_container): | |
| 1248 | - log.warning('Chunk size is larger than remaining compressed data') | |
| 1249 | - compressed_end = min([len(compressed_container), compressed_chunk_start + chunk_size]) | |
| 1250 | - # read after chunk header: | |
| 1251 | - compressed_current = compressed_chunk_start + 2 | |
| 1252 | - | |
| 1253 | - if chunk_flag == 0: | |
| 1254 | - # MS-OVBA 2.4.1.3.3 Decompressing a RawChunk | |
| 1255 | - # uncompressed chunk: read the next 4096 bytes as-is | |
| 1256 | - #TODO: check if there are at least 4096 bytes left | |
| 1257 | - decompressed_container.extend(compressed_container[compressed_current:compressed_current + 4096]) | |
| 1258 | - compressed_current += 4096 | |
| 1259 | - else: | |
| 1260 | - # MS-OVBA 2.4.1.3.2 Decompressing a CompressedChunk | |
| 1261 | - # compressed chunk | |
| 1262 | - decompressed_chunk_start = len(decompressed_container) | |
| 1263 | - while compressed_current < compressed_end: | |
| 1264 | - # MS-OVBA 2.4.1.3.4 Decompressing a TokenSequence | |
| 1265 | - # log.debug('compressed_current = %d / compressed_end = %d' % (compressed_current, compressed_end)) | |
| 1266 | - # FlagByte: 8 bits indicating if the following 8 tokens are either literal (1 byte of plain text) or | |
| 1267 | - # copy tokens (reference to a previous literal token) | |
| 1268 | - flag_byte = compressed_container[compressed_current] | |
| 1269 | - compressed_current += 1 | |
| 1270 | - for bit_index in range(0, 8): | |
| 1271 | - # log.debug('bit_index=%d / compressed_current=%d / compressed_end=%d' % (bit_index, compressed_current, compressed_end)) | |
| 1272 | - if compressed_current >= compressed_end: | |
| 1273 | - break | |
| 1274 | - # MS-OVBA 2.4.1.3.5 Decompressing a Token | |
| 1275 | - # MS-OVBA 2.4.1.3.17 Extract FlagBit | |
| 1276 | - flag_bit = (flag_byte >> bit_index) & 1 | |
| 1277 | - #log.debug('bit_index=%d: flag_bit=%d' % (bit_index, flag_bit)) | |
| 1278 | - if flag_bit == 0: # LiteralToken | |
| 1279 | - # copy one byte directly to output | |
| 1280 | - decompressed_container.extend([compressed_container[compressed_current]]) | |
| 1281 | - compressed_current += 1 | |
| 1282 | - else: # CopyToken | |
| 1283 | - # MS-OVBA 2.4.1.3.19.2 Unpack CopyToken | |
| 1284 | - copy_token = \ | |
| 1285 | - struct.unpack("<H", compressed_container[compressed_current:compressed_current + 2])[0] | |
| 1286 | - #TODO: check this | |
| 1287 | - length_mask, offset_mask, bit_count, _ = copytoken_help( | |
| 1288 | - len(decompressed_container), decompressed_chunk_start) | |
| 1289 | - length = (copy_token & length_mask) + 3 | |
| 1290 | - temp1 = copy_token & offset_mask | |
| 1291 | - temp2 = 16 - bit_count | |
| 1292 | - offset = (temp1 >> temp2) + 1 | |
| 1293 | - #log.debug('offset=%d length=%d' % (offset, length)) | |
| 1294 | - copy_source = len(decompressed_container) - offset | |
| 1295 | - for index in range(copy_source, copy_source + length): | |
| 1296 | - decompressed_container.extend([decompressed_container[index]]) | |
| 1297 | - compressed_current += 2 | |
| 1298 | - return bytes(decompressed_container) | |
| 1299 | - | |
| 1300 | - | |
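The CopyToken branch of the removed decompression loop can be illustrated in isolation. The following is a minimal sketch, not the code from this commit: the helper names `copytoken_masks`, `unpack_copy_token` and `apply_copy_token` are hypothetical, and the mask computation assumes the CopyToken Help rule of MS-OVBA (bit count is the ceiling of log2 of the distance from the chunk start, with a minimum of 4), since the body of `copytoken_help` is not shown in this section.

```python
import struct


def copytoken_masks(decompressed_len, chunk_start):
    """Return (length_mask, offset_mask, bit_count) for the current position.

    Assumption: follows the CopyToken Help rule; requires decompressed_len > chunk_start.
    """
    difference = decompressed_len - chunk_start
    # ceil(log2(difference)) for difference >= 1, computed with integers only
    bit_count = max((difference - 1).bit_length(), 4)
    length_mask = 0xFFFF >> bit_count
    offset_mask = ~length_mask & 0xFFFF
    return length_mask, offset_mask, bit_count


def unpack_copy_token(raw, decompressed_len, chunk_start):
    """Decode a 2-byte little-endian CopyToken into (offset, length)."""
    (copy_token,) = struct.unpack("<H", raw)
    length_mask, offset_mask, bit_count = copytoken_masks(decompressed_len, chunk_start)
    length = (copy_token & length_mask) + 3
    offset = ((copy_token & offset_mask) >> (16 - bit_count)) + 1
    return offset, length


def apply_copy_token(output, offset, length):
    """Append `length` bytes starting `offset` bytes back in a bytearray.

    Bytes are copied one at a time because source and destination may overlap.
    """
    source = len(output) - offset
    for index in range(source, source + length):
        output.append(output[index])


# Usage: three literal bytes followed by one copy token
out = bytearray(b"abc")
offset, length = unpack_copy_token(struct.pack("<H", 0x1002), len(out), 0)
apply_copy_token(out, offset, length)
```

Copying byte by byte, as the removed loop does, is what lets a short pattern repeat itself when the copy length exceeds the offset.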
| 1301 | -def _extract_vba(ole, vba_root, project_path, dir_path, relaxed=False): | |
| 1302 | - """ | |
| 1303 | - Extract VBA macros from an OleFileIO object. | |
| 1304 | - Internal function, do not call directly. | |
| 1305 | - | |
| 1306 | - vba_root: path to the VBA root storage, containing the VBA storage and the PROJECT stream | |
| 1307 | - project_path: path to the PROJECT stream | |
| 1308 | - :param relaxed: If True, only create info/debug log entry if data is not as expected | |
| 1309 | - (e.g. opening substream fails); if False, raise an error in this case | |
| 1310 | - This is a generator, yielding (stream path, VBA filename, VBA source code) for each VBA code stream | |
| 1311 | - """ | |
| 1312 | - # Open the PROJECT stream: | |
| 1313 | - project = ole.openstream(project_path) | |
| 1314 | - log.debug('relaxed is %s' % relaxed) | |
| 1315 | - | |
| 1316 | - # sample content of the PROJECT stream: | |
| 1317 | - | |
| 1318 | - ## ID="{5312AC8A-349D-4950-BDD0-49BE3C4DD0F0}" | |
| 1319 | - ## Document=ThisDocument/&H00000000 | |
| 1320 | - ## Module=NewMacros | |
| 1321 | - ## Name="Project" | |
| 1322 | - ## HelpContextID="0" | |
| 1323 | - ## VersionCompatible32="393222000" | |
| 1324 | - ## CMG="F1F301E705E705E705E705" | |
| 1325 | - ## DPB="8F8D7FE3831F2020202020" | |
| 1326 | - ## GC="2D2FDD81E51EE61EE6E1" | |
| 1327 | - ## | |
| 1328 | - ## [Host Extender Info] | |
| 1329 | - ## &H00000001={3832D640-CF90-11CF-8E43-00A0C911005A};VBE;&H00000000 | |
| 1330 | - ## &H00000002={000209F2-0000-0000-C000-000000000046};Word8.0;&H00000000 | |
| 1331 | - ## | |
| 1332 | - ## [Workspace] | |
| 1333 | - ## ThisDocument=22, 29, 339, 477, Z | |
| 1334 | - ## NewMacros=-4, 42, 832, 510, C | |
| 1335 | - | |
| 1336 | - code_modules = {} | |
| 1337 | - | |
| 1338 | - for line in project: | |
| 1339 | - line = line.strip().decode('utf-8','ignore') | |
| 1340 | - if '=' in line: | |
| 1341 | - # split line at the 1st equal sign: | |
| 1342 | - name, value = line.split('=', 1) | |
| 1343 | - # looking for code modules | |
| 1344 | - # add the code module as a key in the dictionary | |
| 1345 | - # the value will be the extension needed later | |
| 1346 | - # The value is converted to lowercase, to allow case-insensitive matching (issue #3) | |
| 1347 | - value = value.lower() | |
| 1348 | - if name == 'Document': | |
| 1349 | - # split value at the 1st slash, keep 1st part: | |
| 1350 | - value = value.split('/', 1)[0] | |
| 1351 | - code_modules[value] = CLASS_EXTENSION | |
| 1352 | - elif name == 'Module': | |
| 1353 | - code_modules[value] = MODULE_EXTENSION | |
| 1354 | - elif name == 'Class': | |
| 1355 | - code_modules[value] = CLASS_EXTENSION | |
| 1356 | - elif name == 'BaseClass': | |
| 1357 | - code_modules[value] = FORM_EXTENSION | |
| 1358 | - | |
| 1359 | - # read data from dir stream (compressed) | |
| 1360 | - dir_compressed = ole.openstream(dir_path).read() | |
| 1361 | - | |
| 1362 | - def check_value(name, expected, value): | |
| 1363 | - if expected != value: | |
| 1364 | - if relaxed: | |
| 1365 | - log.error("invalid value for {0} expected {1:04X} got {2:04X}" | |
| 1366 | - .format(name, expected, value)) | |
| 1367 | - else: | |
| 1368 | - raise UnexpectedDataError(dir_path, name, expected, value) | |
| 1369 | - | |
| 1370 | - dir_stream = BytesIO(decompress_stream(dir_compressed)) | |
| 1371 | - | |
| 1372 | - # PROJECTSYSKIND Record | |
| 1373 | - projectsyskind_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1374 | - check_value('PROJECTSYSKIND_Id', 0x0001, projectsyskind_id) | |
| 1375 | - projectsyskind_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1376 | - check_value('PROJECTSYSKIND_Size', 0x0004, projectsyskind_size) | |
| 1377 | - projectsyskind_syskind = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1378 | - if projectsyskind_syskind == 0x00: | |
| 1379 | - log.debug("16-bit Windows") | |
| 1380 | - elif projectsyskind_syskind == 0x01: | |
| 1381 | - log.debug("32-bit Windows") | |
| 1382 | - elif projectsyskind_syskind == 0x02: | |
| 1383 | - log.debug("Macintosh") | |
| 1384 | - elif projectsyskind_syskind == 0x03: | |
| 1385 | - log.debug("64-bit Windows") | |
| 1386 | - else: | |
| 1387 | - log.error("invalid PROJECTSYSKIND_SysKind {0:04X}".format(projectsyskind_syskind)) | |
| 1388 | - | |
| 1389 | - # PROJECTLCID Record | |
| 1390 | - projectlcid_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1391 | - check_value('PROJECTLCID_Id', 0x0002, projectlcid_id) | |
| 1392 | - projectlcid_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1393 | - check_value('PROJECTLCID_Size', 0x0004, projectlcid_size) | |
| 1394 | - projectlcid_lcid = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1395 | - check_value('PROJECTLCID_Lcid', 0x409, projectlcid_lcid) | |
| 1396 | - | |
| 1397 | - # PROJECTLCIDINVOKE Record | |
| 1398 | - projectlcidinvoke_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1399 | - check_value('PROJECTLCIDINVOKE_Id', 0x0014, projectlcidinvoke_id) | |
| 1400 | - projectlcidinvoke_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1401 | - check_value('PROJECTLCIDINVOKE_Size', 0x0004, projectlcidinvoke_size) | |
| 1402 | - projectlcidinvoke_lcidinvoke = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1403 | - check_value('PROJECTLCIDINVOKE_LcidInvoke', 0x409, projectlcidinvoke_lcidinvoke) | |
| 1404 | - | |
| 1405 | - # PROJECTCODEPAGE Record | |
| 1406 | - projectcodepage_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1407 | - check_value('PROJECTCODEPAGE_Id', 0x0003, projectcodepage_id) | |
| 1408 | - projectcodepage_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1409 | - check_value('PROJECTCODEPAGE_Size', 0x0002, projectcodepage_size) | |
| 1410 | - projectcodepage_codepage = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1411 | - | |
| 1412 | - # PROJECTNAME Record | |
| 1413 | - projectname_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1414 | - check_value('PROJECTNAME_Id', 0x0004, projectname_id) | |
| 1415 | - projectname_sizeof_projectname = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1416 | - if projectname_sizeof_projectname < 1 or projectname_sizeof_projectname > 128: | |
| 1417 | - log.error("PROJECTNAME_SizeOfProjectName value not in range: {0}".format(projectname_sizeof_projectname)) | |
| 1418 | - projectname_projectname = dir_stream.read(projectname_sizeof_projectname) | |
| 1419 | - unused = projectname_projectname | |
| 1420 | - | |
| 1421 | - # PROJECTDOCSTRING Record | |
| 1422 | - projectdocstring_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1423 | - check_value('PROJECTDOCSTRING_Id', 0x0005, projectdocstring_id) | |
| 1424 | - projectdocstring_sizeof_docstring = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1425 | - if projectdocstring_sizeof_docstring > 2000: | |
| 1426 | - log.error( | |
| 1427 | - "PROJECTDOCSTRING_SizeOfDocString value not in range: {0}".format(projectdocstring_sizeof_docstring)) | |
| 1428 | - projectdocstring_docstring = dir_stream.read(projectdocstring_sizeof_docstring) | |
| 1429 | - projectdocstring_reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1430 | - check_value('PROJECTDOCSTRING_Reserved', 0x0040, projectdocstring_reserved) | |
| 1431 | - projectdocstring_sizeof_docstring_unicode = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1432 | - if projectdocstring_sizeof_docstring_unicode % 2 != 0: | |
| 1433 | - log.error("PROJECTDOCSTRING_SizeOfDocStringUnicode is not even") | |
| 1434 | - projectdocstring_docstring_unicode = dir_stream.read(projectdocstring_sizeof_docstring_unicode) | |
| 1435 | - unused = projectdocstring_docstring | |
| 1436 | - unused = projectdocstring_docstring_unicode | |
| 1437 | - | |
| 1438 | - # PROJECTHELPFILEPATH Record - MS-OVBA 2.3.4.2.1.7 | |
| 1439 | - projecthelpfilepath_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1440 | - check_value('PROJECTHELPFILEPATH_Id', 0x0006, projecthelpfilepath_id) | |
| 1441 | - projecthelpfilepath_sizeof_helpfile1 = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1442 | - if projecthelpfilepath_sizeof_helpfile1 > 260: | |
| 1443 | - log.error( | |
| 1444 | - "PROJECTHELPFILEPATH_SizeOfHelpFile1 value not in range: {0}".format(projecthelpfilepath_sizeof_helpfile1)) | |
| 1445 | - projecthelpfilepath_helpfile1 = dir_stream.read(projecthelpfilepath_sizeof_helpfile1) | |
| 1446 | - projecthelpfilepath_reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1447 | - check_value('PROJECTHELPFILEPATH_Reserved', 0x003D, projecthelpfilepath_reserved) | |
| 1448 | - projecthelpfilepath_sizeof_helpfile2 = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1449 | - if projecthelpfilepath_sizeof_helpfile2 != projecthelpfilepath_sizeof_helpfile1: | |
| 1450 | - log.error("PROJECTHELPFILEPATH_SizeOfHelpFile1 does not equal PROJECTHELPFILEPATH_SizeOfHelpFile2") | |
| 1451 | - projecthelpfilepath_helpfile2 = dir_stream.read(projecthelpfilepath_sizeof_helpfile2) | |
| 1452 | - if projecthelpfilepath_helpfile2 != projecthelpfilepath_helpfile1: | |
| 1453 | - log.error("PROJECTHELPFILEPATH_HelpFile1 does not equal PROJECTHELPFILEPATH_HelpFile2") | |
| 1454 | - | |
| 1455 | - # PROJECTHELPCONTEXT Record | |
| 1456 | - projecthelpcontext_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1457 | - check_value('PROJECTHELPCONTEXT_Id', 0x0007, projecthelpcontext_id) | |
| 1458 | - projecthelpcontext_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1459 | - check_value('PROJECTHELPCONTEXT_Size', 0x0004, projecthelpcontext_size) | |
| 1460 | - projecthelpcontext_helpcontext = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1461 | - unused = projecthelpcontext_helpcontext | |
| 1462 | - | |
| 1463 | - # PROJECTLIBFLAGS Record | |
| 1464 | - projectlibflags_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1465 | - check_value('PROJECTLIBFLAGS_Id', 0x0008, projectlibflags_id) | |
| 1466 | - projectlibflags_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1467 | - check_value('PROJECTLIBFLAGS_Size', 0x0004, projectlibflags_size) | |
| 1468 | - projectlibflags_projectlibflags = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1469 | - check_value('PROJECTLIBFLAGS_ProjectLibFlags', 0x0000, projectlibflags_projectlibflags) | |
| 1470 | - | |
| 1471 | - # PROJECTVERSION Record | |
| 1472 | - projectversion_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1473 | - check_value('PROJECTVERSION_Id', 0x0009, projectversion_id) | |
| 1474 | - projectversion_reserved = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1475 | - check_value('PROJECTVERSION_Reserved', 0x0004, projectversion_reserved) | |
| 1476 | - projectversion_versionmajor = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1477 | - projectversion_versionminor = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1478 | - unused = projectversion_versionmajor | |
| 1479 | - unused = projectversion_versionminor | |
| 1480 | - | |
| 1481 | - # PROJECTCONSTANTS Record | |
| 1482 | - projectconstants_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1483 | - check_value('PROJECTCONSTANTS_Id', 0x000C, projectconstants_id) | |
| 1484 | - projectconstants_sizeof_constants = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1485 | - if projectconstants_sizeof_constants > 1015: | |
| 1486 | - log.error( | |
| 1487 | - "PROJECTCONSTANTS_SizeOfConstants value not in range: {0}".format(projectconstants_sizeof_constants)) | |
| 1488 | - projectconstants_constants = dir_stream.read(projectconstants_sizeof_constants) | |
| 1489 | - projectconstants_reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1490 | - check_value('PROJECTCONSTANTS_Reserved', 0x003C, projectconstants_reserved) | |
| 1491 | - projectconstants_sizeof_constants_unicode = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1492 | - if projectconstants_sizeof_constants_unicode % 2 != 0: | |
| 1493 | - log.error("PROJECTCONSTANTS_SizeOfConstantsUnicode is not even") | |
| 1494 | - projectconstants_constants_unicode = dir_stream.read(projectconstants_sizeof_constants_unicode) | |
| 1495 | - unused = projectconstants_constants | |
| 1496 | - unused = projectconstants_constants_unicode | |
| 1497 | - | |
| 1498 | - # array of REFERENCE records | |
| 1499 | - check = None | |
| 1500 | - while True: | |
| 1501 | - check = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1502 | - log.debug("reference type = {0:04X}".format(check)) | |
| 1503 | - if check == 0x000F: | |
| 1504 | - break | |
| 1505 | - | |
| 1506 | - if check == 0x0016: | |
| 1507 | - # REFERENCENAME | |
| 1508 | - reference_id = check | |
| 1509 | - reference_sizeof_name = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1510 | - reference_name = dir_stream.read(reference_sizeof_name) | |
| 1511 | - reference_reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1512 | - # According to [MS-OVBA] 2.3.4.2.2.2 REFERENCENAME Record: | |
| 1513 | - # "Reserved (2 bytes): MUST be 0x003E. MUST be ignored." | |
| 1514 | - # So let's ignore it, otherwise it crashes on some files (issue #132) | |
| 1515 | - # PR #135 by @c1fe: | |
| 1516 | - # contrary to the specification I think that the unicode name | |
| 1517 | - # is optional. if reference_reserved is not 0x003E I think it | |
| 1518 | - # is actually the start of another REFERENCE record | |
| 1519 | - # at least when projectsyskind_syskind == 0x02 (Macintosh) | |
| 1520 | - if reference_reserved == 0x003E: | |
| 1521 | - #if reference_reserved not in (0x003E, 0x000D): | |
| 1522 | - # raise UnexpectedDataError(dir_path, 'REFERENCE_Reserved', | |
| 1523 | - # 0x0003E, reference_reserved) | |
| 1524 | - reference_sizeof_name_unicode = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1525 | - reference_name_unicode = dir_stream.read(reference_sizeof_name_unicode) | |
| 1526 | - unused = reference_id | |
| 1527 | - unused = reference_name | |
| 1528 | - unused = reference_name_unicode | |
| 1529 | - continue | |
| 1530 | - else: | |
| 1531 | - check = reference_reserved | |
| 1532 | - log.debug("reference type = {0:04X}".format(check)) | |
| 1533 | - | |
| 1534 | - if check == 0x0033: | |
| 1535 | - # REFERENCEORIGINAL (followed by REFERENCECONTROL) | |
| 1536 | - referenceoriginal_id = check | |
| 1537 | - referenceoriginal_sizeof_libidoriginal = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1538 | - referenceoriginal_libidoriginal = dir_stream.read(referenceoriginal_sizeof_libidoriginal) | |
| 1539 | - unused = referenceoriginal_id | |
| 1540 | - unused = referenceoriginal_libidoriginal | |
| 1541 | - continue | |
| 1542 | - | |
| 1543 | - if check == 0x002F: | |
| 1544 | - # REFERENCECONTROL | |
| 1545 | - referencecontrol_id = check | |
| 1546 | - referencecontrol_sizetwiddled = struct.unpack("<L", dir_stream.read(4))[0] # ignore | |
| 1547 | - referencecontrol_sizeof_libidtwiddled = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1548 | - referencecontrol_libidtwiddled = dir_stream.read(referencecontrol_sizeof_libidtwiddled) | |
| 1549 | - referencecontrol_reserved1 = struct.unpack("<L", dir_stream.read(4))[0] # ignore | |
| 1550 | - check_value('REFERENCECONTROL_Reserved1', 0x0000, referencecontrol_reserved1) | |
| 1551 | - referencecontrol_reserved2 = struct.unpack("<H", dir_stream.read(2))[0] # ignore | |
| 1552 | - check_value('REFERENCECONTROL_Reserved2', 0x0000, referencecontrol_reserved2) | |
| 1553 | - unused = referencecontrol_id | |
| 1554 | - unused = referencecontrol_sizetwiddled | |
| 1555 | - unused = referencecontrol_libidtwiddled | |
| 1556 | - # optional field | |
| 1557 | - check2 = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1558 | - if check2 == 0x0016: | |
| 1559 | - referencecontrol_namerecordextended_id = check2 | |
| 1560 | - referencecontrol_namerecordextended_sizeof_name = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1561 | - referencecontrol_namerecordextended_name = dir_stream.read( | |
| 1562 | - referencecontrol_namerecordextended_sizeof_name) | |
| 1563 | - referencecontrol_namerecordextended_reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1564 | - if referencecontrol_namerecordextended_reserved == 0x003E: | |
| 1565 | - referencecontrol_namerecordextended_sizeof_name_unicode = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1566 | - referencecontrol_namerecordextended_name_unicode = dir_stream.read( | |
| 1567 | - referencecontrol_namerecordextended_sizeof_name_unicode) | |
| 1568 | - referencecontrol_reserved3 = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1569 | - unused = referencecontrol_namerecordextended_id | |
| 1570 | - unused = referencecontrol_namerecordextended_name | |
| 1571 | - unused = referencecontrol_namerecordextended_name_unicode | |
| 1572 | - else: | |
| 1573 | - referencecontrol_reserved3 = referencecontrol_namerecordextended_reserved | |
| 1574 | - else: | |
| 1575 | - referencecontrol_reserved3 = check2 | |
| 1576 | - | |
| 1577 | - check_value('REFERENCECONTROL_Reserved3', 0x0030, referencecontrol_reserved3) | |
| 1578 | - referencecontrol_sizeextended = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1579 | - referencecontrol_sizeof_libidextended = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1580 | - referencecontrol_libidextended = dir_stream.read(referencecontrol_sizeof_libidextended) | |
| 1581 | - referencecontrol_reserved4 = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1582 | - referencecontrol_reserved5 = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1583 | - referencecontrol_originaltypelib = dir_stream.read(16) | |
| 1584 | - referencecontrol_cookie = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1585 | - unused = referencecontrol_sizeextended | |
| 1586 | - unused = referencecontrol_libidextended | |
| 1587 | - unused = referencecontrol_reserved4 | |
| 1588 | - unused = referencecontrol_reserved5 | |
| 1589 | - unused = referencecontrol_originaltypelib | |
| 1590 | - unused = referencecontrol_cookie | |
| 1591 | - continue | |
| 1592 | - | |
| 1593 | - if check == 0x000D: | |
| 1594 | - # REFERENCEREGISTERED | |
| 1595 | - referenceregistered_id = check | |
| 1596 | - referenceregistered_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1597 | - referenceregistered_sizeof_libid = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1598 | - referenceregistered_libid = dir_stream.read(referenceregistered_sizeof_libid) | |
| 1599 | - referenceregistered_reserved1 = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1600 | - check_value('REFERENCEREGISTERED_Reserved1', 0x0000, referenceregistered_reserved1) | |
| 1601 | - referenceregistered_reserved2 = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1602 | - check_value('REFERENCEREGISTERED_Reserved2', 0x0000, referenceregistered_reserved2) | |
| 1603 | - unused = referenceregistered_id | |
| 1604 | - unused = referenceregistered_size | |
| 1605 | - unused = referenceregistered_libid | |
| 1606 | - continue | |
| 1607 | - | |
| 1608 | - if check == 0x000E: | |
| 1609 | - # REFERENCEPROJECT | |
| 1610 | - referenceproject_id = check | |
| 1611 | - referenceproject_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1612 | - referenceproject_sizeof_libidabsolute = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1613 | - referenceproject_libidabsolute = dir_stream.read(referenceproject_sizeof_libidabsolute) | |
| 1614 | - referenceproject_sizeof_libidrelative = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1615 | - referenceproject_libidrelative = dir_stream.read(referenceproject_sizeof_libidrelative) | |
| 1616 | - referenceproject_majorversion = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1617 | - referenceproject_minorversion = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1618 | - unused = referenceproject_id | |
| 1619 | - unused = referenceproject_size | |
| 1620 | - unused = referenceproject_libidabsolute | |
| 1621 | - unused = referenceproject_libidrelative | |
| 1622 | - unused = referenceproject_majorversion | |
| 1623 | - unused = referenceproject_minorversion | |
| 1624 | - continue | |
| 1625 | - | |
| 1626 | - log.error('invalid or unknown check Id {0:04X}'.format(check)) | |
| 1627 | - # raise an exception instead of stopping abruptly (issue #180) | |
| 1628 | - raise UnexpectedDataError(dir_path, 'reference type', (0x0F, 0x16, 0x33, 0x2F, 0x0D, 0x0E), check) | |
| 1629 | - #sys.exit(0) | |
| 1630 | - | |
| 1631 | - projectmodules_id = check #struct.unpack("<H", dir_stream.read(2))[0] | |
| 1632 | - check_value('PROJECTMODULES_Id', 0x000F, projectmodules_id) | |
| 1633 | - projectmodules_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1634 | - check_value('PROJECTMODULES_Size', 0x0002, projectmodules_size) | |
| 1635 | - projectmodules_count = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1636 | - projectmodules_projectcookierecord_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1637 | - check_value('PROJECTMODULES_ProjectCookieRecord_Id', 0x0013, projectmodules_projectcookierecord_id) | |
| 1638 | - projectmodules_projectcookierecord_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1639 | - check_value('PROJECTMODULES_ProjectCookieRecord_Size', 0x0002, projectmodules_projectcookierecord_size) | |
| 1640 | - projectmodules_projectcookierecord_cookie = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1641 | - unused = projectmodules_projectcookierecord_cookie | |
| 1642 | - | |
| 1643 | - # short function to simplify unicode text output | |
| 1644 | - uni_out = lambda unicode_text: unicode_text.encode('utf-8', 'replace') | |
| 1645 | - | |
| 1646 | - log.debug("parsing {0} modules".format(projectmodules_count)) | |
| 1647 | - for projectmodule_index in range(0, projectmodules_count): | |
| 1648 | - try: | |
| 1649 | - modulename_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1650 | - check_value('MODULENAME_Id', 0x0019, modulename_id) | |
| 1651 | - modulename_sizeof_modulename = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1652 | - modulename_modulename = dir_stream.read(modulename_sizeof_modulename).decode('utf-8', 'backslashreplace') | |
| 1653 | - # TODO: preset variables to avoid "referenced before assignment" errors | |
| 1654 | - modulename_unicode_modulename_unicode = '' | |
| 1655 | - # account for optional sections | |
| 1656 | - section_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1657 | - if section_id == 0x0047: | |
| 1658 | - modulename_unicode_id = section_id | |
| 1659 | - modulename_unicode_sizeof_modulename_unicode = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1660 | - modulename_unicode_modulename_unicode = dir_stream.read( | |
| 1661 | - modulename_unicode_sizeof_modulename_unicode).decode('UTF-16LE', 'replace') | |
| 1662 | - # just guessing that this is the same encoding as used in OleFileIO | |
| 1663 | - unused = modulename_unicode_id | |
| 1664 | - section_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1665 | - if section_id == 0x001A: | |
| 1666 | - modulestreamname_id = section_id | |
| 1667 | - modulestreamname_sizeof_streamname = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1668 | - modulestreamname_streamname = dir_stream.read(modulestreamname_sizeof_streamname) | |
| 1669 | - modulestreamname_reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1670 | - check_value('MODULESTREAMNAME_Reserved', 0x0032, modulestreamname_reserved) | |
| 1671 | - modulestreamname_sizeof_streamname_unicode = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1672 | - modulestreamname_streamname_unicode = dir_stream.read( | |
| 1673 | - modulestreamname_sizeof_streamname_unicode).decode('UTF-16LE', 'replace') | |
| 1674 | - # just guessing that this is the same encoding as used in OleFileIO | |
| 1675 | - unused = modulestreamname_id | |
| 1676 | - section_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1677 | - if section_id == 0x001C: | |
| 1678 | - moduledocstring_id = section_id | |
| 1679 | - check_value('MODULEDOCSTRING_Id', 0x001C, moduledocstring_id) | |
| 1680 | - moduledocstring_sizeof_docstring = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1681 | - moduledocstring_docstring = dir_stream.read(moduledocstring_sizeof_docstring) | |
| 1682 | - moduledocstring_reserved = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1683 | - check_value('MODULEDOCSTRING_Reserved', 0x0048, moduledocstring_reserved) | |
| 1684 | - moduledocstring_sizeof_docstring_unicode = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1685 | - moduledocstring_docstring_unicode = dir_stream.read(moduledocstring_sizeof_docstring_unicode) | |
| 1686 | - unused = moduledocstring_docstring | |
| 1687 | - unused = moduledocstring_docstring_unicode | |
| 1688 | - section_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1689 | - if section_id == 0x0031: | |
| 1690 | - moduleoffset_id = section_id | |
| 1691 | - check_value('MODULEOFFSET_Id', 0x0031, moduleoffset_id) | |
| 1692 | - moduleoffset_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1693 | - check_value('MODULEOFFSET_Size', 0x0004, moduleoffset_size) | |
| 1694 | - moduleoffset_textoffset = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1695 | - section_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1696 | - if section_id == 0x001E: | |
| 1697 | - modulehelpcontext_id = section_id | |
| 1698 | - check_value('MODULEHELPCONTEXT_Id', 0x001E, modulehelpcontext_id) | |
| 1699 | - modulehelpcontext_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1700 | - check_value('MODULEHELPCONTEXT_Size', 0x0004, modulehelpcontext_size) | |
| 1701 | - modulehelpcontext_helpcontext = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1702 | - unused = modulehelpcontext_helpcontext | |
| 1703 | - section_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1704 | - if section_id == 0x002C: | |
| 1705 | - modulecookie_id = section_id | |
| 1706 | - check_value('MODULECOOKIE_Id', 0x002C, modulecookie_id) | |
| 1707 | - modulecookie_size = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1708 | - check_value('MODULECOOKIE_Size', 0x0002, modulecookie_size) | |
| 1709 | - modulecookie_cookie = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1710 | - unused = modulecookie_cookie | |
| 1711 | - section_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1712 | - if section_id == 0x0021 or section_id == 0x0022: | |
| 1713 | - moduletype_id = section_id | |
| 1714 | - moduletype_reserved = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1715 | - unused = moduletype_id | |
| 1716 | - unused = moduletype_reserved | |
| 1717 | - section_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1718 | - if section_id == 0x0025: | |
| 1719 | - modulereadonly_id = section_id | |
| 1720 | - check_value('MODULEREADONLY_Id', 0x0025, modulereadonly_id) | |
| 1721 | - modulereadonly_reserved = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1722 | - check_value('MODULEREADONLY_Reserved', 0x0000, modulereadonly_reserved) | |
| 1723 | - section_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1724 | - if section_id == 0x0028: | |
| 1725 | - moduleprivate_id = section_id | |
| 1726 | - check_value('MODULEPRIVATE_Id', 0x0028, moduleprivate_id) | |
| 1727 | - moduleprivate_reserved = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1728 | - check_value('MODULEPRIVATE_Reserved', 0x0000, moduleprivate_reserved) | |
| 1729 | - section_id = struct.unpack("<H", dir_stream.read(2))[0] | |
| 1730 | - if section_id == 0x002B: # TERMINATOR | |
| 1731 | - module_reserved = struct.unpack("<L", dir_stream.read(4))[0] | |
| 1732 | - check_value('MODULE_Reserved', 0x0000, module_reserved) | |
| 1733 | - section_id = None | |
| 1734 | - if section_id != None: | |
| 1735 | - log.warning('unknown or invalid module section id {0:04X}'.format(section_id)) | |
| 1736 | - | |
| 1737 | - log.debug('Project CodePage = %d' % projectcodepage_codepage) | |
| 1738 | - if projectcodepage_codepage in MAC_CODEPAGES: | |
| 1739 | - vba_codec = MAC_CODEPAGES[projectcodepage_codepage] | |
| 1740 | - else: | |
| 1741 | - vba_codec = 'cp%d' % projectcodepage_codepage | |
| 1742 | - log.debug("ModuleName = {0}".format(modulename_modulename)) | |
| 1743 | - log.debug("ModuleNameUnicode = {0}".format(uni_out(modulename_unicode_modulename_unicode))) | |
| 1744 | - log.debug("StreamName = {0}".format(modulestreamname_streamname)) | |
| 1745 | - try: | |
| 1746 | - streamname_unicode = modulestreamname_streamname.decode(vba_codec) | |
| 1747 | - except UnicodeError as ue: | |
| 1748 | - log.debug('failed to decode stream name {0!r} with codec {1}' | |
| 1749 | - .format(modulestreamname_streamname, vba_codec)) | |
| 1750 | - streamname_unicode = modulestreamname_streamname.decode(vba_codec, errors='replace') | |
| 1751 | - log.debug("StreamName.decode('%s') = %s" % (vba_codec, uni_out(streamname_unicode))) | |
| 1752 | - log.debug("StreamNameUnicode = {0}".format(uni_out(modulestreamname_streamname_unicode))) | |
| 1753 | - log.debug("TextOffset = {0}".format(moduleoffset_textoffset)) | |
| 1754 | - | |
| 1755 | - code_data = None | |
| 1756 | - try_names = streamname_unicode, \ | |
| 1757 | - modulename_unicode_modulename_unicode, \ | |
| 1758 | - modulestreamname_streamname_unicode | |
| 1759 | - for stream_name in try_names: | |
| 1760 | - # TODO: if olefile._find were less private, could replace this | |
| 1761 | - # try-except with calls to it | |
| 1762 | - try: | |
| 1763 | - code_path = vba_root + u'VBA/' + stream_name | |
| 1764 | - log.debug('opening VBA code stream %s' % uni_out(code_path)) | |
| 1765 | - code_data = ole.openstream(code_path).read() | |
| 1766 | - break | |
| 1767 | - except IOError as ioe: | |
| 1768 | - log.debug('failed to open stream VBA/%r (%r), try other name' | |
| 1769 | - % (uni_out(stream_name), ioe)) | |
| 1770 | - | |
| 1771 | - if code_data is None: | |
| 1772 | - log.info("Could not open stream %d of %d ('VBA/' + one of %r)!" | |
| 1773 | - % (projectmodule_index, projectmodules_count, | |
| 1774 | - '/'.join("'" + uni_out(stream_name) + "'" | |
| 1775 | - for stream_name in try_names))) | |
| 1776 | - if relaxed: | |
| 1777 | - continue # ... with next submodule | |
| 1778 | - else: | |
| 1779 | - raise SubstreamOpenError('[BASE]', 'VBA/' + | |
| 1780 | - uni_out(modulename_unicode_modulename_unicode)) | |
| 1781 | - | |
| 1782 | - log.debug("length of code_data = {0}".format(len(code_data))) | |
| 1783 | - log.debug("offset of code_data = {0}".format(moduleoffset_textoffset)) | |
| 1784 | - code_data = code_data[moduleoffset_textoffset:] | |
| 1785 | - if len(code_data) > 0: | |
| 1786 | - code_data = decompress_stream(code_data) | |
| 1787 | - # case-insensitive search in the code_modules dict to find the file extension: | |
| 1788 | - filext = code_modules.get(modulename_modulename.lower(), 'bin') | |
| 1789 | - filename = '{0}.{1}'.format(modulename_modulename, filext) | |
| 1790 | - #TODO: also yield the codepage so that callers can decode it properly | |
| 1791 | - yield (code_path, filename, code_data) | |
| 1792 | - # print '-'*79 | |
| 1793 | - # print filename | |
| 1794 | - # print '' | |
| 1795 | - # print code_data | |
| 1796 | - # print '' | |
| 1797 | - log.debug('extracted file {0}'.format(filename)) | |
| 1798 | - else: | |
| 1799 | - log.warning("module stream {0} has code data length 0".format(modulestreamname_streamname)) | |
| 1800 | - except (UnexpectedDataError, SubstreamOpenError): | |
| 1801 | - raise | |
| 1802 | - except Exception as exc: | |
| 1803 | - log.info('Error parsing module {0} of {1} in _extract_vba:' | |
| 1804 | - .format(projectmodule_index, projectmodules_count), | |
| 1805 | - exc_info=True) | |
| 1806 | - if not relaxed: | |
| 1807 | - raise | |
| 1808 | - _ = unused # make pylint happy: now variable "unused" is being used ;-) | |
| 1809 | - return | |
| 1810 | - | |
| 1811 | - | |
| 1812 | -def vba_collapse_long_lines(vba_code): | |
| 1813 | - """ | |
| 1814 | - Parse a VBA module code to detect continuation line characters (underscore) and | |
| 1815 | - collapse split lines. Continuation line characters are replaced by spaces. | |
| 1816 | - | |
| 1817 | - :param vba_code: str, VBA module code | |
| 1818 | - :return: str, VBA module code with long lines collapsed | |
| 1819 | - """ | |
| 1820 | - # TODO: use a regex instead, to allow whitespaces after the underscore? | |
| 1821 | - vba_code = vba_code.replace(' _\r\n', ' ') | |
| 1822 | - vba_code = vba_code.replace(' _\r', ' ') | |
| 1823 | - vba_code = vba_code.replace(' _\n', ' ') | |
| 1824 | - return vba_code | |
| 1825 | - | |
| 1826 | - | |
| 1827 | -def filter_vba(vba_code): | |
| 1828 | - """ | |
| 1829 | - Filter VBA source code to remove the first lines starting with "Attribute VB_", | |
| 1830 | - which are automatically added by MS Office and not displayed in the VBA Editor. | |
| 1831 | - This should only be used when displaying source code for human analysis. | |
| 1832 | - | |
| 1833 | - Note: lines are not filtered if they contain a colon, because it could be | |
| 1834 | - used to hide malicious instructions. | |
| 1835 | - | |
| 1836 | - :param vba_code: str, VBA source code | |
| 1837 | - :return: str, filtered VBA source code | |
| 1838 | - """ | |
| 1839 | - vba_lines = vba_code.splitlines() | |
| 1840 | - start = 0 | |
| 1841 | - for line in vba_lines: | |
| 1842 | - if line.startswith("Attribute VB_") and not ':' in line: | |
| 1843 | - start += 1 | |
| 1844 | - else: | |
| 1845 | - break | |
| 1846 | - #TODO: also remove empty lines? | |
| 1847 | - vba = '\n'.join(vba_lines[start:]) | |
| 1848 | - return vba | |
| 1849 | - | |
| 1850 | - | |
| 1851 | -def detect_autoexec(vba_code, obfuscation=None): | |
| 1852 | - """ | |
| 1853 | - Detect if the VBA code contains keywords corresponding to macros running | |
| 1854 | - automatically when triggered by specific actions (e.g. when a document is | |
| 1855 | - opened or closed). | |
| 1856 | - | |
| 1857 | - :param vba_code: str, VBA source code | |
| 1858 | - :param obfuscation: None or str, name of obfuscation to be added to description | |
| 1859 | - :return: list of str tuples (keyword, description) | |
| 1860 | - """ | |
| 1861 | - #TODO: merge code with detect_suspicious | |
| 1862 | - # case-insensitive search | |
| 1863 | - #vba_code = vba_code.lower() | |
| 1864 | - results = [] | |
| 1865 | - obf_text = '' | |
| 1866 | - if obfuscation: | |
| 1867 | - obf_text = ' (obfuscation: %s)' % obfuscation | |
| 1868 | - for description, keywords in AUTOEXEC_KEYWORDS.items(): | |
| 1869 | - for keyword in keywords: | |
| 1870 | - #TODO: if keyword is already a compiled regex, use it as-is | |
| 1871 | - # search using regex to detect word boundaries: | |
| 1872 | - match = re.search(r'(?i)\b' + keyword + r'\b', vba_code) | |
| 1873 | - if match: | |
| 1874 | - #if keyword.lower() in vba_code: | |
| 1875 | - found_keyword = match.group() | |
| 1876 | - results.append((found_keyword, description + obf_text)) | |
| 1877 | - return results | |
| 1878 | - | |
| 1879 | - | |
| 1880 | -def detect_suspicious(vba_code, obfuscation=None): | |
| 1881 | - """ | |
| 1882 | - Detect if the VBA code contains suspicious keywords corresponding to | |
| 1883 | - potential malware behaviour. | |
| 1884 | - | |
| 1885 | - :param vba_code: str, VBA source code | |
| 1886 | - :param obfuscation: None or str, name of obfuscation to be added to description | |
| 1887 | - :return: list of str tuples (keyword, description) | |
| 1888 | - """ | |
| 1889 | - # case-insensitive search | |
| 1890 | - #vba_code = vba_code.lower() | |
| 1891 | - results = [] | |
| 1892 | - obf_text = '' | |
| 1893 | - if obfuscation: | |
| 1894 | - obf_text = ' (obfuscation: %s)' % obfuscation | |
| 1895 | - for description, keywords in SUSPICIOUS_KEYWORDS.items(): | |
| 1896 | - for keyword in keywords: | |
| 1897 | - # search using regex to detect word boundaries: | |
| 1898 | - match = re.search(r'(?i)\b' + re.escape(keyword) + r'\b', vba_code) | |
| 1899 | - if match: | |
| 1900 | - #if keyword.lower() in vba_code: | |
| 1901 | - found_keyword = match.group() | |
| 1902 | - results.append((found_keyword, description + obf_text)) | |
| 1903 | - return results | |
| 1904 | - | |
| 1905 | - | |
| 1906 | -def detect_patterns(vba_code, obfuscation=None): | |
| 1907 | - """ | |
| 1908 | - Detect if the VBA code contains specific patterns such as IP addresses, | |
| 1909 | - URLs, e-mail addresses, executable file names, etc. | |
| 1910 | - | |
| 1911 | - :param vba_code: str, VBA source code | |
| 1912 | - :return: list of str tuples (pattern type, value) | |
| 1913 | - """ | |
| 1914 | - results = [] | |
| 1915 | - found = set() | |
| 1916 | - obf_text = '' | |
| 1917 | - if obfuscation: | |
| 1918 | - obf_text = ' (obfuscation: %s)' % obfuscation | |
| 1919 | - for pattern_type, pattern_re in RE_PATTERNS: | |
| 1920 | - for match in pattern_re.finditer(vba_code): | |
| 1921 | - value = match.group() | |
| 1922 | - if value not in found: | |
| 1923 | - results.append((pattern_type + obf_text, value)) | |
| 1924 | - found.add(value) | |
| 1925 | - return results | |
| 1926 | - | |
| 1927 | - | |
| 1928 | -def detect_hex_strings(vba_code): | |
| 1929 | - """ | |
| 1930 | - Detect if the VBA code contains strings encoded in hexadecimal. | |
| 1931 | - | |
| 1932 | - :param vba_code: str, VBA source code | |
| 1933 | - :return: list of str tuples (encoded string, decoded string) | |
| 1934 | - """ | |
| 1935 | - results = [] | |
| 1936 | - found = set() | |
| 1937 | - for match in re_hex_string.finditer(vba_code): | |
| 1938 | - value = match.group() | |
| 1939 | - if value not in found: | |
| 1940 | - decoded = binascii.unhexlify(value) | |
| 1941 | - results.append((value, decoded.decode('utf-8', 'backslashreplace'))) | |
| 1942 | - found.add(value) | |
| 1943 | - return results | |
| 1944 | - | |
| 1945 | - | |
| 1946 | -def detect_base64_strings(vba_code): | |
| 1947 | - """ | |
| 1948 | - Detect if the VBA code contains strings encoded in base64. | |
| 1949 | - | |
| 1950 | - :param vba_code: str, VBA source code | |
| 1951 | - :return: list of str tuples (encoded string, decoded string) | |
| 1952 | - """ | |
| 1953 | - #TODO: avoid matching simple hex strings as base64? | |
| 1954 | - results = [] | |
| 1955 | - found = set() | |
| 1956 | - for match in re_base64_string.finditer(vba_code): | |
| 1957 | - # extract the base64 string without quotes: | |
| 1958 | - value = match.group().strip('"') | |
| 1959 | - # check it is not just a hex string: | |
| 1960 | - if not re_nothex_check.search(value): | |
| 1961 | - continue | |
| 1962 | - # only keep new values and not in the whitelist: | |
| 1963 | - if value not in found and value.lower() not in BASE64_WHITELIST: | |
| 1964 | - try: | |
| 1965 | - decoded = base64.b64decode(value) | |
| 1966 | - results.append((value, decoded.decode('utf-8','replace'))) | |
| 1967 | - found.add(value) | |
| 1968 | - except (TypeError, ValueError) as exc: | |
| 1969 | - log.debug('Failed to base64-decode (%s)' % exc) | |
| 1970 | - # if an exception occurs, it is likely not a base64-encoded string | |
| 1971 | - return results | |
| 1972 | - | |
| 1973 | - | |
| 1974 | -def detect_dridex_strings(vba_code): | |
| 1975 | - """ | |
| 1976 | - Detect if the VBA code contains strings obfuscated with a specific algorithm found in Dridex samples. | |
| 1977 | - | |
| 1978 | - :param vba_code: str, VBA source code | |
| 1979 | - :return: list of str tuples (encoded string, decoded string) | |
| 1980 | - """ | |
| 1981 | - # TODO: move this at the beginning of script | |
| 1982 | - from oletools.thirdparty.DridexUrlDecoder.DridexUrlDecoder import DridexUrlDecode | |
| 1983 | - | |
| 1984 | - results = [] | |
| 1985 | - found = set() | |
| 1986 | - for match in re_dridex_string.finditer(vba_code): | |
| 1987 | - value = match.group()[1:-1] | |
| 1988 | - # check it is not just a hex string: | |
| 1989 | - if not re_nothex_check.search(value): | |
| 1990 | - continue | |
| 1991 | - if value not in found: | |
| 1992 | - try: | |
| 1993 | - decoded = DridexUrlDecode(value) | |
| 1994 | - results.append((value, decoded)) | |
| 1995 | - found.add(value) | |
| 1996 | - except Exception as exc: | |
| 1997 | - log.debug('Failed to Dridex-decode (%s)' % exc) | |
| 1998 | - # if an exception occurs, it is likely not a dridex-encoded string | |
| 1999 | - return results | |
| 2000 | - | |
| 2001 | - | |
| 2002 | -def detect_vba_strings(vba_code): | |
| 2003 | - """ | |
| 2004 | - Detect if the VBA code contains strings obfuscated with VBA expressions | |
| 2005 | - using keywords such as Chr, Asc, Val, StrReverse, etc. | |
| 2006 | - | |
| 2007 | - :param vba_code: str, VBA source code | |
| 2008 | - :return: list of str tuples (encoded string, decoded string) | |
| 2009 | - """ | |
| 2010 | - # TODO: handle exceptions | |
| 2011 | - results = [] | |
| 2012 | - found = set() | |
| 2013 | - # IMPORTANT: to extract the actual VBA expressions found in the code, | |
| 2014 | - # we must expand tabs to have the same string as pyparsing. | |
| 2015 | - # Otherwise, start and end offsets are incorrect. | |
| 2016 | - vba_code = vba_code.expandtabs() | |
| 2017 | - # Split the VBA code line by line to avoid MemoryError on large scripts: | |
| 2018 | - for vba_line in vba_code.splitlines(): | |
| 2019 | - for tokens, start, end in vba_expr_str.scanString(vba_line): | |
| 2020 | - encoded = vba_line[start:end] | |
| 2021 | - decoded = tokens[0] | |
| 2022 | - if isinstance(decoded, VbaExpressionString): | |
| 2023 | - # This is a VBA expression, not a simple string | |
| 2024 | - # print 'VBA EXPRESSION: encoded=%r => decoded=%r' % (encoded, decoded) | |
| 2025 | - # remove parentheses and quotes from original string: | |
| 2026 | - # if encoded.startswith('(') and encoded.endswith(')'): | |
| 2027 | - # encoded = encoded[1:-1] | |
| 2028 | - # if encoded.startswith('"') and encoded.endswith('"'): | |
| 2029 | - # encoded = encoded[1:-1] | |
| 2030 | - # avoid duplicates and simple strings: | |
| 2031 | - if encoded not in found and decoded != encoded: | |
| 2032 | - results.append((encoded, decoded)) | |
| 2033 | - found.add(encoded) | |
| 2034 | - # else: | |
| 2035 | - # print 'VBA STRING: encoded=%r => decoded=%r' % (encoded, decoded) | |
| 2036 | - return results | |
| 2037 | - | |
| 2038 | - | |
| 2039 | -def json2ascii(json_obj, encoding='utf8', errors='replace'): | |
| 2040 | - """ ensure there is no unicode in json and all strings are safe to decode | |
| 2041 | - | |
| 2042 | - works recursively, decodes and re-encodes every string to/from unicode | |
| 2043 | - to ensure there will be no trouble in loading the dumped json output | |
| 2044 | - """ | |
| 2045 | - if json_obj is None: | |
| 2046 | - pass | |
| 2047 | - elif isinstance(json_obj, (bool, int, float)): | |
| 2048 | - pass | |
| 2049 | - elif isinstance(json_obj, str): | |
| 2050 | - # de-code and re-encode | |
| 2051 | - dencoded = json_obj | |
| 2052 | - if dencoded != json_obj: | |
| 2053 | - log.debug('json2ascii: replaced: {0} (len {1})' | |
| 2054 | - .format(json_obj, len(json_obj))) | |
| 2055 | - log.debug('json2ascii: with: {0} (len {1})' | |
| 2056 | - .format(dencoded, len(dencoded))) | |
| 2057 | - return dencoded | |
| 2058 | - elif isinstance(json_obj, bytes): | |
| 2059 | - log.debug('json2ascii: encode unicode: {0}' | |
| 2060 | - .format(json_obj.decode(encoding, errors))) | |
| 2061 | - # cannot put original into logger | |
| 2062 | - # print 'original: ' json_obj | |
| 2063 | - return json_obj.decode(encoding, errors) | |
| 2064 | - elif isinstance(json_obj, dict): | |
| 2065 | - for key in json_obj: | |
| 2066 | - json_obj[key] = json2ascii(json_obj[key]) | |
| 2067 | - elif isinstance(json_obj, (list,tuple)): | |
| 2068 | - for item in json_obj: | |
| 2069 | - item = json2ascii(item) | |
| 2070 | - else: | |
| 2071 | - log.debug('unexpected type in json2ascii: {0} -- leave as is' | |
| 2072 | - .format(type(json_obj))) | |
| 2073 | - return json_obj | |
| 2074 | - | |
| 2075 | - | |
| 2076 | -def print_json(json_dict=None, _json_is_first=False, _json_is_last=False, | |
| 2077 | - **json_parts): | |
| 2078 | - """ line-wise print of json.dumps(json2ascii(..)) with options and indent+1 | |
| 2079 | - | |
| 2080 | - can use in two ways: | |
| 2081 | - (1) print_json(some_dict) | |
| 2082 | - (2) print_json(key1=value1, key2=value2, ...) | |
| 2083 | - | |
| 2084 | - :param bool _json_is_first: set to True only for very first entry to complete | |
| 2085 | - the top-level json-list | |
| 2086 | - :param bool _json_is_last: set to True only for very last entry to complete | |
| 2087 | - the top-level json-list | |
| 2088 | - """ | |
| 2089 | - if json_dict and json_parts: | |
| 2090 | - raise ValueError('Invalid json argument: want either single dict or ' | |
| 2091 | - 'key=value parts but got both)') | |
| 2092 | - elif (json_dict is not None) and (not isinstance(json_dict, dict)): | |
| 2093 | - raise ValueError('Invalid json argument: want either single dict or ' | |
| 2094 | - 'key=value parts but got {0} instead of dict)' | |
| 2095 | - .format(type(json_dict))) | |
| 2096 | - if json_parts: | |
| 2097 | - json_dict = json_parts | |
| 2098 | - | |
| 2099 | - if _json_is_first: | |
| 2100 | - print('[') | |
| 2101 | - | |
| 2102 | - lines = json.dumps(json2ascii(json_dict), check_circular=False, | |
| 2103 | - indent=4, ensure_ascii=False).splitlines() | |
| 2104 | - for line in lines[:-1]: | |
| 2105 | - print(' {0}'.format(line)) | |
| 2106 | - if _json_is_last: | |
| 2107 | - print(' {0}'.format(lines[-1])) # print last line without comma | |
| 2108 | - print(']') | |
| 2109 | - else: | |
| 2110 | - print(' {0},'.format(lines[-1])) # print last line with comma | |
| 2111 | - | |
| 2112 | - | |
| 2113 | -class VBA_Scanner(object): | |
| 2114 | - """ | |
| 2115 | - Class to scan the source code of a VBA module to find obfuscated strings, | |
| 2116 | - suspicious keywords, IOCs, auto-executable macros, etc. | |
| 2117 | - """ | |
| 2118 | - | |
| 2119 | - def __init__(self, vba_code): | |
| 2120 | - """ | |
| 2121 | - VBA_Scanner constructor | |
| 2122 | - | |
| 2123 | - :param vba_code: str, VBA source code to be analyzed | |
| 2124 | - """ | |
| 2125 | - if isinstance(vba_code, bytes): | |
| 2126 | - vba_code = vba_code.decode('utf-8', 'backslashreplace') | |
| 2127 | - # join long lines ending with " _": | |
| 2128 | - self.code = vba_collapse_long_lines(vba_code) | |
| 2129 | - self.code_hex = '' | |
| 2130 | - self.code_hex_rev = '' | |
| 2131 | - self.code_rev_hex = '' | |
| 2132 | - self.code_base64 = '' | |
| 2133 | - self.code_dridex = '' | |
| 2134 | - self.code_vba = '' | |
| 2135 | - self.strReverse = None | |
| 2136 | - # results = None before scanning, then a list of tuples after scanning | |
| 2137 | - self.results = None | |
| 2138 | - self.autoexec_keywords = None | |
| 2139 | - self.suspicious_keywords = None | |
| 2140 | - self.iocs = None | |
| 2141 | - self.hex_strings = None | |
| 2142 | - self.base64_strings = None | |
| 2143 | - self.dridex_strings = None | |
| 2144 | - self.vba_strings = None | |
| 2145 | - | |
| 2146 | - | |
| 2147 | - def scan(self, include_decoded_strings=False, deobfuscate=False): | |
| 2148 | - """ | |
| 2149 | - Analyze the provided VBA code to detect suspicious keywords, | |
| 2150 | - auto-executable macros, IOC patterns, obfuscation patterns | |
| 2151 | - such as hex-encoded strings. | |
| 2152 | - | |
| 2153 | - :param include_decoded_strings: bool, if True, all encoded strings will be included with their decoded content. | |
| 2154 | - :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow) | |
| 2155 | - :return: list of tuples (type, keyword, description) | |
| 2156 | - (type = 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String', 'Dridex string' or 'VBA string') | |
| 2157 | - """ | |
| 2158 | - # First, detect and extract hex-encoded strings: | |
| 2159 | - self.hex_strings = detect_hex_strings(self.code) | |
| 2160 | - # detect if the code contains StrReverse: | |
| 2161 | - self.strReverse = False | |
| 2162 | - if 'strreverse' in self.code.lower(): self.strReverse = True | |
| 2163 | - # Then append the decoded strings to the VBA code, to detect obfuscated IOCs and keywords: | |
| 2164 | - for encoded, decoded in self.hex_strings: | |
| 2165 | - self.code_hex += '\n' + decoded | |
| 2166 | - # if the code contains "StrReverse", also append the hex strings in reverse order: | |
| 2167 | - if self.strReverse: | |
| 2168 | - # StrReverse after hex decoding: | |
| 2169 | - self.code_hex_rev += '\n' + decoded[::-1] | |
| 2170 | - # StrReverse before hex decoding: | |
| 2171 | - self.code_rev_hex += '\n' + str(binascii.unhexlify(encoded[::-1])) | |
| 2172 | - #example: https://malwr.com/analysis/NmFlMGI4YTY1YzYyNDkwNTg1ZTBiZmY5OGI3YjlhYzU/ | |
| 2173 | - #TODO: also append the full code reversed if StrReverse? (risk of false positives?) | |
| 2174 | - # Detect Base64-encoded strings | |
| 2175 | - self.base64_strings = detect_base64_strings(self.code) | |
| 2176 | - for encoded, decoded in self.base64_strings: | |
| 2177 | - self.code_base64 += '\n' + decoded | |
| 2178 | - # Detect Dridex-encoded strings | |
| 2179 | - self.dridex_strings = detect_dridex_strings(self.code) | |
| 2180 | - for encoded, decoded in self.dridex_strings: | |
| 2181 | - self.code_dridex += '\n' + decoded | |
| 2182 | - # Detect obfuscated strings in VBA expressions | |
| 2183 | - if deobfuscate: | |
| 2184 | - self.vba_strings = detect_vba_strings(self.code) | |
| 2185 | - else: | |
| 2186 | - self.vba_strings = [] | |
| 2187 | - for encoded, decoded in self.vba_strings: | |
| 2188 | - self.code_vba += '\n' + decoded | |
| 2189 | - results = [] | |
| 2190 | - self.autoexec_keywords = [] | |
| 2191 | - self.suspicious_keywords = [] | |
| 2192 | - self.iocs = [] | |
| 2193 | - | |
| 2194 | - for code, obfuscation in ( | |
| 2195 | - (self.code, None), | |
| 2196 | - (self.code_hex, 'Hex'), | |
| 2197 | - (self.code_hex_rev, 'Hex+StrReverse'), | |
| 2198 | - (self.code_rev_hex, 'StrReverse+Hex'), | |
| 2199 | - (self.code_base64, 'Base64'), | |
| 2200 | - (self.code_dridex, 'Dridex'), | |
| 2201 | - (self.code_vba, 'VBA expression'), | |
| 2202 | - ): | |
| 2203 | - if isinstance(code,bytes): | |
| 2204 | - code=code.decode('utf-8','backslashreplace') | |
| 2205 | - self.autoexec_keywords += detect_autoexec(code, obfuscation) | |
| 2206 | - self.suspicious_keywords += detect_suspicious(code, obfuscation) | |
| 2207 | - self.iocs += detect_patterns(code, obfuscation) | |
| 2208 | - | |
| 2209 | - # If hex-encoded strings were discovered, add an item to suspicious keywords: | |
| 2210 | - if self.hex_strings: | |
| 2211 | - self.suspicious_keywords.append(('Hex Strings', | |
| 2212 | - 'Hex-encoded strings were detected, may be used to obfuscate strings (option --decode to see all)')) | |
| 2213 | - if self.base64_strings: | |
| 2214 | - self.suspicious_keywords.append(('Base64 Strings', | |
| 2215 | - 'Base64-encoded strings were detected, may be used to obfuscate strings (option --decode to see all)')) | |
| 2216 | - if self.dridex_strings: | |
| 2217 | - self.suspicious_keywords.append(('Dridex Strings', | |
| 2218 | - 'Dridex-encoded strings were detected, may be used to obfuscate strings (option --decode to see all)')) | |
| 2219 | - if self.vba_strings: | |
| 2220 | - self.suspicious_keywords.append(('VBA obfuscated Strings', | |
| 2221 | - 'VBA string expressions were detected, may be used to obfuscate strings (option --decode to see all)')) | |
| 2222 | - # use a set to avoid duplicate keywords | |
| 2223 | - keyword_set = set() | |
| 2224 | - for keyword, description in self.autoexec_keywords: | |
| 2225 | - if keyword not in keyword_set: | |
| 2226 | - results.append(('AutoExec', keyword, description)) | |
| 2227 | - keyword_set.add(keyword) | |
| 2228 | - keyword_set = set() | |
| 2229 | - for keyword, description in self.suspicious_keywords: | |
| 2230 | - if keyword not in keyword_set: | |
| 2231 | - results.append(('Suspicious', keyword, description)) | |
| 2232 | - keyword_set.add(keyword) | |
| 2233 | - keyword_set = set() | |
| 2234 | - for pattern_type, value in self.iocs: | |
| 2235 | - if value not in keyword_set: | |
| 2236 | - results.append(('IOC', value, pattern_type)) | |
| 2237 | - keyword_set.add(value) | |
| 2238 | - | |
| 2239 | - # include decoded strings only if they are printable or if --decode option: | |
| 2240 | - for encoded, decoded in self.hex_strings: | |
| 2241 | - if include_decoded_strings or is_printable(decoded): | |
| 2242 | - results.append(('Hex String', decoded, encoded)) | |
| 2243 | - for encoded, decoded in self.base64_strings: | |
| 2244 | - if include_decoded_strings or is_printable(decoded): | |
| 2245 | - results.append(('Base64 String', decoded, encoded)) | |
| 2246 | - for encoded, decoded in self.dridex_strings: | |
| 2247 | - if include_decoded_strings or is_printable(decoded): | |
| 2248 | - results.append(('Dridex string', decoded, encoded)) | |
| 2249 | - for encoded, decoded in self.vba_strings: | |
| 2250 | - if include_decoded_strings or is_printable(decoded): | |
| 2251 | - results.append(('VBA string', decoded, encoded)) | |
| 2252 | - self.results = results | |
| 2253 | - return results | |
| 2254 | - | |
| 2255 | - def scan_summary(self): | |
| 2256 | - """ | |
| 2257 | - Analyze the provided VBA code to detect suspicious keywords, | |
| 2258 | - auto-executable macros, IOC patterns, obfuscation patterns | |
| 2259 | - such as hex-encoded strings. | |
| 2260 | - | |
| 2261 | - :return: tuple with the number of items found for each category: | |
| 2262 | - (autoexec, suspicious, IOCs, hex, base64, dridex, vba) | |
| 2263 | - """ | |
| 2264 | - # avoid scanning the same code twice: | |
| 2265 | - if self.results is None: | |
| 2266 | - self.scan() | |
| 2267 | - return (len(self.autoexec_keywords), len(self.suspicious_keywords), | |
| 2268 | - len(self.iocs), len(self.hex_strings), len(self.base64_strings), | |
| 2269 | - len(self.dridex_strings), len(self.vba_strings)) | |
| 2270 | - | |
| 2271 | - | |
| 2272 | -def scan_vba(vba_code, include_decoded_strings, deobfuscate=False): | |
| 2273 | - """ | |
| 2274 | - Analyze the provided VBA code to detect suspicious keywords, | |
| 2275 | - auto-executable macros, IOC patterns, obfuscation patterns | |
| 2276 | - such as hex-encoded strings. | |
| 2277 | - (shortcut for VBA_Scanner(vba_code).scan()) | |
| 2278 | - | |
| 2279 | - :param vba_code: str, VBA source code to be analyzed | |
| 2280 | - :param include_decoded_strings: bool, if True all encoded strings will be included with their decoded content. | |
| 2281 | - :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow) | |
| 2282 | - :return: list of tuples (type, keyword, description) | |
| 2283 | - (type = 'AutoExec', 'Suspicious', 'IOC', 'Hex String', 'Base64 String', 'Dridex string' or 'VBA string') | |
| 2284 | - """ | |
| 2285 | - return VBA_Scanner(vba_code).scan(include_decoded_strings, deobfuscate) | |
| 2286 | - | |
| 2287 | - | |
| 2288 | -#=== CLASSES ================================================================= | |
| 2289 | - | |
| 2290 | -class VBA_Parser(object): | |
| 2291 | - """ | |
| 2292 | - Class to parse MS Office files, to detect VBA macros and extract VBA source code | |
| 2293 | - Supported file formats: | |
| 2294 | - - Word 97-2003 (.doc, .dot) | |
| 2295 | - - Word 2007+ (.docm, .dotm) | |
| 2296 | - - Word 2003 XML (.xml) | |
| 2297 | - - Word MHT - Single File Web Page / MHTML (.mht) | |
| 2298 | - - Excel 97-2003 (.xls) | |
| 2299 | - - Excel 2007+ (.xlsm, .xlsb) | |
| 2300 | - - PowerPoint 97-2003 (.ppt) | |
| 2301 | - - PowerPoint 2007+ (.pptm, .ppsm) | |
| 2302 | - """ | |
| 2303 | - | |
| 2304 | - def __init__(self, filename, data=None, container=None, relaxed=False): | |
| 2305 | - """ | |
| 2306 | - Constructor for VBA_Parser | |
| 2307 | - | |
| 2308 | - :param filename: filename or path of file to parse, or file-like object | |
| 2309 | - | |
| 2310 | - :param data: None or bytes str, if None the file will be read from disk (or from the file-like object). | |
| 2311 | - If data is provided as a bytes string, it will be parsed as the content of the file in memory, | |
| 2312 | - and not read from disk. Note: files must be read in binary mode, i.e. open(f, 'rb'). | |
| 2313 | - | |
| 2314 | - :param container: str, path and filename of container if the file is within | |
| 2315 | - a zip archive, None otherwise. | |
| 2316 | - | |
| 2317 | - :param relaxed: if True, treat mal-formed documents and missing streams more like MS office: | |
| 2318 | - do nothing; if False (default), raise errors in these cases | |
| 2319 | - | |
| 2320 | - raises a FileOpenError if all attempts to interpret the data header failed | |
| 2321 | - """ | |
| 2322 | - #TODO: filename should only be a string, data should be used for the file-like object | |
| 2323 | - #TODO: filename should be mandatory, optional data is a string or file-like object | |
| 2324 | - #TODO: also support olefile and zipfile as input | |
| 2325 | - if data is None: | |
| 2326 | - # open file from disk: | |
| 2327 | - _file = filename | |
| 2328 | - else: | |
| 2329 | - # file already read in memory, make it a file-like object for zipfile: | |
| 2330 | - _file = BytesIO(data) | |
| 2331 | - #self.file = _file | |
| 2332 | - self.ole_file = None | |
| 2333 | - self.ole_subfiles = [] | |
| 2334 | - self.filename = filename | |
| 2335 | - self.container = container | |
| 2336 | - self.relaxed = relaxed | |
| 2337 | - self.type = None | |
| 2338 | - self.vba_projects = None | |
| 2339 | - self.vba_forms = None | |
| 2340 | - self.contains_macros = None # will be set to True or False by detect_macros | |
| 2341 | - self.vba_code_all_modules = None # to store the source code of all modules | |
| 2342 | - # list of tuples for each module: (subfilename, stream_path, vba_filename, vba_code) | |
| 2343 | - self.modules = None | |
| 2344 | - # Analysis results: list of tuples (type, keyword, description) - See VBA_Scanner | |
| 2345 | - self.analysis_results = None | |
| 2346 | - # statistics for the scan summary and flags | |
| 2347 | - self.nb_macros = 0 | |
| 2348 | - self.nb_autoexec = 0 | |
| 2349 | - self.nb_suspicious = 0 | |
| 2350 | - self.nb_iocs = 0 | |
| 2351 | - self.nb_hexstrings = 0 | |
| 2352 | - self.nb_base64strings = 0 | |
| 2353 | - self.nb_dridexstrings = 0 | |
| 2354 | - self.nb_vbastrings = 0 | |
| 2355 | - | |
| 2356 | - # if filename is None: | |
| 2357 | - # if isinstance(_file, basestring): | |
| 2358 | - # if len(_file) < olefile.MINIMAL_OLEFILE_SIZE: | |
| 2359 | - # self.filename = _file | |
| 2360 | - # else: | |
| 2361 | - # self.filename = '<file in bytes string>' | |
| 2362 | - # else: | |
| 2363 | - # self.filename = '<file-like object>' | |
| 2364 | - if olefile.isOleFile(_file): | |
| 2365 | - # This looks like an OLE file | |
| 2366 | - self.open_ole(_file) | |
| 2367 | - | |
| 2368 | - # check whether file is encrypted (need to do this before try ppt) | |
| 2369 | - log.debug('Check encryption of ole file') | |
| 2370 | - crypt_indicator = oleid.OleID(self.ole_file).check_encrypted() | |
| 2371 | - if crypt_indicator.value: | |
| 2372 | - raise FileIsEncryptedError(filename) | |
| 2373 | - | |
| 2374 | - # if this worked, try whether it is a ppt file (special ole file) | |
| 2375 | - self.open_ppt() | |
| 2376 | - if self.type is None and is_zipfile(_file): | |
| 2377 | - # Zip file, which may be an OpenXML document | |
| 2378 | - self.open_openxml(_file) | |
| 2379 | - if self.type is None: | |
| 2380 | - # read file from disk, check if it is a Word 2003 XML file (WordProcessingML), Excel 2003 XML, | |
| 2381 | - # or a plain text file containing VBA code | |
| 2382 | - if data is None: | |
| 2383 | - with open(filename, 'rb') as file_handle: | |
| 2384 | - data = file_handle.read() | |
| 2385 | - # check if it is a Word 2003 XML file (WordProcessingML): must contain the namespace | |
| 2386 | - if b'http://schemas.microsoft.com/office/word/2003/wordml' in data: | |
| 2387 | - self.open_word2003xml(data) | |
| 2388 | - # check if it is a Word/PowerPoint 2007+ XML file (Flat OPC): must contain the namespace | |
| 2389 | - if b'http://schemas.microsoft.com/office/2006/xmlPackage' in data: | |
| 2390 | - self.open_flatopc(data) | |
| 2391 | - # store a lowercase version for the next tests: | |
| 2392 | - data_lowercase = data.lower() | |
| 2393 | - # check if it is a MHT file (MIME HTML, Word or Excel saved as "Single File Web Page"): | |
| 2394 | - # According to my tests, these files usually start with "MIME-Version: 1.0" on the 1st line | |
| 2395 | - # BUT Word accepts a blank line or other MIME headers inserted before, | |
| 2396 | - # and even whitespaces in between "MIME", "-", "Version" and ":". The version number is ignored. | |
| 2397 | - # And the line is case insensitive. | |
| 2398 | - # so we'll just check the presence of mime, version and multipart anywhere: | |
| 2399 | - if self.type is None and b'mime' in data_lowercase and b'version' in data_lowercase \ | |
| 2400 | - and b'multipart' in data_lowercase: | |
| 2401 | - self.open_mht(data) | |
| 2402 | - #TODO: handle exceptions | |
| 2403 | - #TODO: Excel 2003 XML | |
| 2404 | - # Check whether this is rtf | |
| 2405 | - if rtfobj.is_rtf(data, treat_str_as_data=True): | |
| 2406 | - # Ignore RTF since it contains no macros and methods in here will not find macros | |
| 2407 | - # in embedded objects. run rtfobj and repeat on its output. | |
| 2408 | - msg = '%s is RTF, need to run rtfobj.py and find VBA Macros in its output.' % self.filename | |
| 2409 | - log.info(msg) | |
| 2410 | - raise FileOpenError(msg) | |
| 2411 | - # Check if this is a plain text VBA or VBScript file: | |
| 2412 | - # To avoid scanning binary files, we simply check for some control chars: | |
| 2413 | - if self.type is None and b'\x00' not in data: | |
| 2414 | - self.open_text(data) | |
| 2415 | - if self.type is None: | |
| 2416 | - # At this stage, could not match a known format: | |
| 2417 | - msg = '%s is not a supported file type, cannot extract VBA Macros.' % self.filename | |
| 2418 | - log.info(msg) | |
| 2419 | - raise FileOpenError(msg) | |
| 2420 | - | |
| 2421 | - def open_ole(self, _file): | |
| 2422 | - """ | |
| 2423 | - Open an OLE file | |
| 2424 | - :param _file: filename or file contents in a file object | |
| 2425 | - :return: nothing | |
| 2426 | - """ | |
| 2427 | - log.info('Opening OLE file %s' % self.filename) | |
| 2428 | - try: | |
| 2429 | - # Open and parse the OLE file, using unicode for path names: | |
| 2430 | - self.ole_file = olefile.OleFileIO(_file, path_encoding=None) | |
| 2431 | - # set type only if parsing succeeds | |
| 2432 | - self.type = TYPE_OLE | |
| 2433 | - except (IOError, TypeError, ValueError) as exc: | |
| 2434 | - # TODO: handle OLE parsing exceptions | |
| 2435 | - log.info('Failed OLE parsing for file %r (%s)' % (self.filename, exc)) | |
| 2436 | - log.debug('Trace:', exc_info=True) | |
| 2437 | - | |
| 2438 | - | |
| 2439 | - def open_openxml(self, _file): | |
| 2440 | - """ | |
| 2441 | - Open an OpenXML file | |
| 2442 | - :param _file: filename or file contents in a file object | |
| 2443 | - :return: nothing | |
| 2444 | - """ | |
| 2445 | - # This looks like a zip file, need to look for vbaProject.bin inside | |
| 2446 | - # It can be any OLE file inside the archive | |
| 2447 | - #...because vbaProject.bin can be renamed: | |
| 2448 | - # see http://www.decalage.info/files/JCV07_Lagadec_OpenDocument_OpenXML_v4_decalage.pdf#page=18 | |
| 2449 | - log.info('Opening ZIP/OpenXML file %s' % self.filename) | |
| 2450 | - try: | |
| 2451 | - z = zipfile.ZipFile(_file) | |
| 2452 | - #TODO: check if this is actually an OpenXML file | |
| 2453 | - #TODO: if the zip file is encrypted, suggest to use the -z option, or try '-z infected' automatically | |
| 2454 | - # check each file within the zip if it is an OLE file, by reading its magic: | |
| 2455 | - for subfile in z.namelist(): | |
| 2456 | - with z.open(subfile) as file_handle: | |
| 2457 | - magic = file_handle.read(len(olefile.MAGIC)) | |
| 2458 | - if magic == olefile.MAGIC: | |
| 2459 | - log.debug('Opening OLE file %s within zip' % subfile) | |
| 2460 | - with z.open(subfile) as file_handle: | |
| 2461 | - ole_data = file_handle.read() | |
| 2462 | - try: | |
| 2463 | - self.ole_subfiles.append( | |
| 2464 | - VBA_Parser(filename=subfile, data=ole_data, | |
| 2465 | - relaxed=self.relaxed)) | |
| 2466 | - except OlevbaBaseException as exc: | |
| 2467 | - if self.relaxed: | |
| 2468 | - log.info('%s is not a valid OLE file (%s)' % (subfile, exc)) | |
| 2469 | - log.debug('Trace:', exc_info=True) | |
| 2470 | - continue | |
| 2471 | - else: | |
| 2472 | - raise SubstreamOpenError(self.filename, subfile, | |
| 2473 | - exc) | |
| 2474 | - z.close() | |
| 2475 | - # set type only if parsing succeeds | |
| 2476 | - self.type = TYPE_OpenXML | |
| 2477 | - except OlevbaBaseException as exc: | |
| 2478 | - if self.relaxed: | |
| 2479 | - log.info('Error {0} caught in Zip/OpenXML parsing for file {1}' | |
| 2480 | - .format(exc, self.filename)) | |
| 2481 | - log.debug('Trace:', exc_info=True) | |
| 2482 | - else: | |
| 2483 | - raise | |
| 2484 | - except (RuntimeError, zipfile.BadZipfile, zipfile.LargeZipFile, IOError) as exc: | |
| 2485 | - # TODO: handle parsing exceptions | |
| 2486 | - log.info('Failed Zip/OpenXML parsing for file %r (%s)' | |
| 2487 | - % (self.filename, exc)) | |
| 2488 | - log.debug('Trace:', exc_info=True) | |
| 2489 | - | |
| 2490 | - def open_word2003xml(self, data): | |
| 2491 | - """ | |
| 2492 | - Open a Word 2003 XML file | |
| 2493 | - :param data: file contents in a string or bytes | |
| 2494 | - :return: nothing | |
| 2495 | - """ | |
| 2496 | - log.info('Opening Word 2003 XML file %s' % self.filename) | |
| 2497 | - try: | |
| 2498 | - # parse the XML content | |
| 2499 | - # TODO: handle XML parsing exceptions | |
| 2500 | - et = ET.fromstring(data) | |
| 2501 | - # find all the binData elements: | |
| 2502 | - for bindata in et.getiterator(TAG_BINDATA): | |
| 2503 | - # the binData content is an OLE container for the VBA project, compressed | |
| 2504 | - # using the ActiveMime/MSO format (zlib-compressed), and Base64 encoded. | |
| 2505 | - # get the filename: | |
| 2506 | - fname = bindata.get(ATTR_NAME, 'noname.mso') | |
| 2507 | - # decode the base64 activemime | |
| 2508 | - mso_data = binascii.a2b_base64(bindata.text) | |
| 2509 | - if is_mso_file(mso_data): | |
| 2510 | - # decompress the zlib data stored in the MSO file, which is the OLE container: | |
| 2511 | - # TODO: handle different offsets => separate function | |
| 2512 | - try: | |
| 2513 | - ole_data = mso_file_extract(mso_data) | |
| 2514 | - self.ole_subfiles.append( | |
| 2515 | - VBA_Parser(filename=fname, data=ole_data, | |
| 2516 | - relaxed=self.relaxed)) | |
| 2517 | - except OlevbaBaseException as exc: | |
| 2518 | - if self.relaxed: | |
| 2519 | - log.info('Error parsing subfile {0}: {1}' | |
| 2520 | - .format(fname, exc)) | |
| 2521 | - log.debug('Trace:', exc_info=True) | |
| 2522 | - else: | |
| 2523 | - raise SubstreamOpenError(self.filename, fname, exc) | |
| 2524 | - else: | |
| 2525 | - log.info('%s is not a valid MSO file' % fname) | |
| 2526 | - # set type only if parsing succeeds | |
| 2527 | - self.type = TYPE_Word2003_XML | |
| 2528 | - except OlevbaBaseException as exc: | |
| 2529 | - if self.relaxed: | |
| 2530 | - log.info('Failed XML parsing for file %r (%s)' % (self.filename, exc)) | |
| 2531 | - log.debug('Trace:', exc_info=True) | |
| 2532 | - else: | |
| 2533 | - raise | |
| 2534 | - except Exception as exc: | |
| 2535 | - # TODO: differentiate exceptions for each parsing stage | |
| 2536 | - # (but ET is different libs, no good exception description in API) | |
| 2537 | - # found: XMLSyntaxError | |
| 2538 | - log.info('Failed XML parsing for file %r (%s)' % (self.filename, exc)) | |
| 2539 | - log.debug('Trace:', exc_info=True) | |
| 2540 | - | |
| 2541 | - def open_flatopc(self, data): | |
| 2542 | - """ | |
| 2543 | - Open a Word or PowerPoint 2007+ XML file, aka "Flat OPC" | |
| 2544 | - :param data: file contents in a string or bytes | |
| 2545 | - :return: nothing | |
| 2546 | - """ | |
| 2547 | - log.info('Opening Flat OPC Word/PowerPoint XML file %s' % self.filename) | |
| 2548 | - try: | |
| 2549 | - # parse the XML content | |
| 2550 | - # TODO: handle XML parsing exceptions | |
| 2551 | - et = ET.fromstring(data) | |
| 2552 | - # TODO: check root node namespace and tag | |
| 2553 | - # find all the pkg:part elements: | |
| 2554 | - for pkgpart in et.iter(TAG_PKGPART): | |
| 2555 | - fname = pkgpart.get(ATTR_PKG_NAME, 'unknown') | |
| 2556 | - content_type = pkgpart.get(ATTR_PKG_CONTENTTYPE, 'unknown') | |
| 2557 | - if content_type == CTYPE_VBAPROJECT: | |
| 2558 | - for bindata in pkgpart.iterfind(TAG_PKGBINDATA): | |
| 2559 | - try: | |
| 2560 | - ole_data = binascii.a2b_base64(bindata.text) | |
| 2561 | - self.ole_subfiles.append( | |
| 2562 | - VBA_Parser(filename=fname, data=ole_data, | |
| 2563 | - relaxed=self.relaxed)) | |
| 2564 | - except OlevbaBaseException as exc: | |
| 2565 | - if self.relaxed: | |
| 2566 | - log.info('Error parsing subfile {0}: {1}' | |
| 2567 | - .format(fname, exc)) | |
| 2568 | - log.debug('Trace:', exc_info=True) | |
| 2569 | - else: | |
| 2570 | - raise SubstreamOpenError(self.filename, fname, exc) | |
| 2571 | - # set type only if parsing succeeds | |
| 2572 | - self.type = TYPE_FlatOPC_XML | |
| 2573 | - except OlevbaBaseException as exc: | |
| 2574 | - if self.relaxed: | |
| 2575 | - log.info('Failed XML parsing for file %r (%s)' % (self.filename, exc)) | |
| 2576 | - log.debug('Trace:', exc_info=True) | |
| 2577 | - else: | |
| 2578 | - raise | |
| 2579 | - except Exception as exc: | |
| 2580 | - # TODO: differentiate exceptions for each parsing stage | |
| 2581 | - # (but ET is different libs, no good exception description in API) | |
| 2582 | - # found: XMLSyntaxError | |
| 2583 | - log.info('Failed XML parsing for file %r (%s)' % (self.filename, exc)) | |
| 2584 | - log.debug('Trace:', exc_info=True) | |
| 2585 | - | |
| 2586 | - def open_mht(self, data): | |
| 2587 | - """ | |
| 2588 | - Open a MHTML file | |
| 2589 | - :param data: file contents in a string or bytes | |
| 2590 | - :return: nothing | |
| 2591 | - """ | |
| 2592 | - log.info('Opening MHTML file %s' % self.filename) | |
| 2593 | - try: | |
| 2594 | - if isinstance(data,bytes): | |
| 2595 | - data = data.decode('utf8', 'backslashreplace') | |
| 2596 | - # parse the MIME content | |
| 2597 | - # remove any leading whitespace or newline (workaround for issue in email package) | |
| 2598 | - stripped_data = data.lstrip('\r\n\t ') | |
| 2599 | - # strip any junk from the beginning of the file | |
| 2600 | - # (issue #31 fix by Greg C - gdigreg) | |
| 2601 | - # TODO: improve keywords to avoid false positives | |
| 2602 | - mime_offset = stripped_data.find('MIME') | |
| 2603 | - content_offset = stripped_data.find('Content') | |
| 2604 | - # if "MIME" is found, and located before "Content": | |
| 2605 | - if -1 < mime_offset <= content_offset: | |
| 2606 | - stripped_data = stripped_data[mime_offset:] | |
| 2607 | - # else if "Content" is found, and before "MIME" | |
| 2608 | - # TODO: can it work without "MIME" at all? | |
| 2609 | - elif content_offset > -1: | |
| 2610 | - stripped_data = stripped_data[content_offset:] | |
| 2611 | - # TODO: quick and dirty fix: insert a standard line with MIME-Version header? | |
| 2612 | - mhtml = email.message_from_string(stripped_data) | |
| 2613 | - # find all the attached files: | |
| 2614 | - for part in mhtml.walk(): | |
| 2615 | - content_type = part.get_content_type() # always returns a value | |
| 2616 | - fname = part.get_filename(None) # returns None if it fails | |
| 2617 | - # TODO: get content-location if no filename | |
| 2618 | - log.debug('MHTML part: filename=%r, content-type=%r' % (fname, content_type)) | |
| 2619 | - part_data = part.get_payload(decode=True) | |
| 2620 | - # VBA macros are stored in a binary file named "editdata.mso". | |
| 2621 | - # the data content is an OLE container for the VBA project, compressed | |
| 2622 | - # using the ActiveMime/MSO format (zlib-compressed), and Base64 encoded. | |
| 2623 | - # decompress the zlib data starting at offset 0x32, which is the OLE container: | |
| 2624 | - # check ActiveMime header: | |
| 2625 | - | |
| 2626 | - if (isinstance(part_data, str) or isinstance(part_data, bytes)) and is_mso_file(part_data): | |
| 2627 | - log.debug('Found ActiveMime header, decompressing MSO container') | |
| 2628 | - try: | |
| 2629 | - ole_data = mso_file_extract(part_data) | |
| 2630 | - | |
| 2631 | - # TODO: check if it is actually an OLE file | |
| 2632 | - # TODO: get the MSO filename from content_location? | |
| 2633 | - self.ole_subfiles.append( | |
| 2634 | - VBA_Parser(filename=fname, data=ole_data, | |
| 2635 | - relaxed=self.relaxed)) | |
| 2636 | - except OlevbaBaseException as exc: | |
| 2637 | - if self.relaxed: | |
| 2638 | - log.info('%s does not contain a valid OLE file (%s)' | |
| 2639 | - % (fname, exc)) | |
| 2640 | - log.debug('Trace:', exc_info=True) | |
| 2641 | - # TODO: bug here - need to split in smaller functions/classes? | |
| 2642 | - else: | |
| 2643 | - raise SubstreamOpenError(self.filename, fname, exc) | |
| 2644 | - else: | |
| 2645 | - log.debug('type(part_data) = %s' % type(part_data)) | |
| 2646 | - try: | |
| 2647 | - log.debug('part_data[0:20] = %r' % part_data[0:20]) | |
| 2648 | - except TypeError as err: | |
| 2649 | - log.debug('part_data has no __getitem__') | |
| 2650 | - # set type only if parsing succeeds | |
| 2651 | - self.type = TYPE_MHTML | |
| 2652 | - except OlevbaBaseException: | |
| 2653 | - raise | |
| 2654 | - except Exception: | |
| 2655 | - log.info('Failed MIME parsing for file %r - %s' | |
| 2656 | - % (self.filename, MSG_OLEVBA_ISSUES)) | |
| 2657 | - log.debug('Trace:', exc_info=True) | |
| 2658 | - | |
| 2659 | - def open_ppt(self): | |
| 2660 | - """ try to interpret self.ole_file as PowerPoint 97-2003 using PptParser | |
| 2661 | - | |
| 2662 | - Although self.ole_file is a valid olefile.OleFileIO, we set | |
| 2663 | - self.ole_file = None in here and instead set self.ole_subfiles to the | |
| 2664 | - VBA ole streams found within the main ole file. That makes most of the | |
| 2665 | - code below treat this like an OpenXML file and only look at the | |
| 2666 | - ole_subfiles (except find_vba_* which needs to explicitly check for | |
| 2667 | - self.type) | |
| 2668 | - """ | |
| 2669 | - | |
| 2670 | - log.info('Check whether OLE file is PPT') | |
| 2671 | - try: | |
| 2672 | - ppt = ppt_parser.PptParser(self.ole_file, fast_fail=True) | |
| 2673 | - for vba_data in ppt.iter_vba_data(): | |
| 2674 | - self.ole_subfiles.append(VBA_Parser(None, vba_data, | |
| 2675 | - container='PptParser')) | |
| 2676 | - log.info('File is PPT') | |
| 2677 | - self.ole_file.close() # just in case | |
| 2678 | - self.ole_file = None # required to make other methods look at ole_subfiles | |
| 2679 | - self.type = TYPE_PPT | |
| 2680 | - except Exception as exc: | |
| 2681 | - if self.container == 'PptParser': | |
| 2682 | - # this is a subfile of a ppt --> to be expected that is no ppt | |
| 2683 | - log.debug('PPT subfile is not a PPT file') | |
| 2684 | - else: | |
| 2685 | - log.debug("File appears not to be a ppt file (%s)" % exc) | |
| 2686 | - | |
| 2687 | - | |
| 2688 | - def open_text(self, data): | |
| 2689 | - """ | |
| 2690 | - Open a text file containing VBA or VBScript source code | |
| 2691 | - :param data: file contents in a string or bytes | |
| 2692 | - :return: nothing | |
| 2693 | - """ | |
| 2694 | - log.info('Opening text file %s' % self.filename) | |
| 2695 | - # directly store the source code: | |
| 2696 | - if isinstance(data,bytes): | |
| 2697 | - data=data.decode('utf8','backslashreplace') | |
| 2698 | - self.vba_code_all_modules = data | |
| 2699 | - self.contains_macros = True | |
| 2700 | - # set type only if parsing succeeds | |
| 2701 | - self.type = TYPE_TEXT | |
| 2702 | - | |
| 2703 | - | |
| 2704 | - def find_vba_projects(self): | |
| 2705 | - """ | |
| 2706 | - Finds all the VBA projects stored in an OLE file. | |
| 2707 | - | |
| 2708 | - Return None if the file is not OLE but OpenXML. | |
| 2709 | - Return a list of tuples (vba_root, project_path, dir_path) for each VBA project. | |
| 2710 | - vba_root is the path of the root OLE storage containing the VBA project, | |
| 2711 | - including a trailing slash unless it is the root of the OLE file. | |
| 2712 | - project_path is the path of the OLE stream named "PROJECT" within the VBA project. | |
| 2713 | - dir_path is the path of the OLE stream named "VBA/dir" within the VBA project. | |
| 2714 | - | |
| 2715 | - If this function returns an empty list for one of the supported formats | |
| 2716 | - (i.e. Word, Excel, Powerpoint), then the file does not contain VBA macros. | |
| 2717 | - | |
| 2718 | - :return: None if OpenXML file, list of tuples (vba_root, project_path, dir_path) | |
| 2719 | - for each VBA project found if OLE file | |
| 2720 | - """ | |
| 2721 | - log.debug('VBA_Parser.find_vba_projects') | |
| 2722 | - | |
| 2723 | - # if the file is not OLE but OpenXML, return None: | |
| 2724 | - if self.ole_file is None and self.type != TYPE_PPT: | |
| 2725 | - return None | |
| 2726 | - | |
| 2727 | - # if this method has already been called, return previous result: | |
| 2728 | - if self.vba_projects is not None: | |
| 2729 | - return self.vba_projects | |
| 2730 | - | |
| 2731 | - # if this is a ppt file (PowerPoint 97-2003): | |
| 2732 | - # self.ole_file is None but the ole_subfiles do contain vba_projects | |
| 2733 | - # (like for OpenXML files). | |
| 2734 | - if self.type == TYPE_PPT: | |
| 2735 | - # TODO: so far, this function is never called for PPT files, but | |
| 2736 | - # if that happens, the information is lost which ole file contains | |
| 2737 | - # which storage! | |
| 2738 | - log.warning('Returned info is not complete for PPT types!') | |
| 2739 | - self.vba_projects = [] | |
| 2740 | - for subfile in self.ole_subfiles: | |
| 2741 | - self.vba_projects.extend(subfile.find_vba_projects()) | |
| 2742 | - return self.vba_projects | |
| 2743 | - | |
| 2744 | - # Find the VBA project root (different in MS Word, Excel, etc): | |
| 2745 | - # - Word 97-2003: Macros | |
| 2746 | - # - Excel 97-2003: _VBA_PROJECT_CUR | |
| 2747 | - # - PowerPoint 97-2003: PptParser has identified ole_subfiles | |
| 2748 | - # - Word 2007+: word/vbaProject.bin in zip archive, then the VBA project is the root of vbaProject.bin. | |
| 2749 | - # - Excel 2007+: xl/vbaProject.bin in zip archive, then same as Word | |
| 2750 | - # - PowerPoint 2007+: ppt/vbaProject.bin in zip archive, then same as Word | |
| 2751 | - # - Visio 2007: not supported yet (different file structure) | |
| 2752 | - | |
| 2753 | - # According to MS-OVBA section 2.2.1: | |
| 2754 | - # - the VBA project root storage MUST contain a VBA storage and a PROJECT stream | |
| 2755 | - # - The root/VBA storage MUST contain a _VBA_PROJECT stream and a dir stream | |
| 2756 | - # - all names are case-insensitive | |
| 2757 | - | |
| 2758 | - def check_vba_stream(ole, vba_root, stream_path): | |
| 2759 | - full_path = vba_root + stream_path | |
| 2760 | - if ole.exists(full_path) and ole.get_type(full_path) == olefile.STGTY_STREAM: | |
| 2761 | - log.debug('Found %s stream: %s' % (stream_path, full_path)) | |
| 2762 | - return full_path | |
| 2763 | - else: | |
| 2764 | - log.debug('Missing %s stream, this is not a valid VBA project structure' % stream_path) | |
| 2765 | - return False | |
| 2766 | - | |
| 2767 | - # start with an empty list: | |
| 2768 | - self.vba_projects = [] | |
| 2769 | - # Look for any storage containing those storage/streams: | |
| 2770 | - ole = self.ole_file | |
| 2771 | - for storage in ole.listdir(streams=False, storages=True): | |
| 2772 | - log.debug('Checking storage %r' % storage) | |
| 2773 | - # Look for a storage ending with "VBA": | |
| 2774 | - if storage[-1].upper() == 'VBA': | |
| 2775 | - log.debug('Found VBA storage: %s' % ('/'.join(storage))) | |
| 2776 | - vba_root = '/'.join(storage[:-1]) | |
| 2777 | - # Add a trailing slash to vba_root, unless it is the root of the OLE file: | |
| 2778 | - # (used later to append all the child streams/storages) | |
| 2779 | - if vba_root != '': | |
| 2780 | - vba_root += '/' | |
| 2781 | - log.debug('Checking vba_root="%s"' % vba_root) | |
| 2782 | - | |
| 2783 | - # Check if the VBA root storage also contains a PROJECT stream: | |
| 2784 | - project_path = check_vba_stream(ole, vba_root, 'PROJECT') | |
| 2785 | - if not project_path: continue | |
| 2786 | - # Check if the VBA root storage also contains a VBA/_VBA_PROJECT stream: | |
| 2787 | - vba_project_path = check_vba_stream(ole, vba_root, 'VBA/_VBA_PROJECT') | |
| 2788 | - if not vba_project_path: continue | |
| 2789 | - # Check if the VBA root storage also contains a VBA/dir stream: | |
| 2790 | - dir_path = check_vba_stream(ole, vba_root, 'VBA/dir') | |
| 2791 | - if not dir_path: continue | |
| 2792 | - # Now we are pretty sure it is a VBA project structure | |
| 2793 | - log.debug('VBA root storage: "%s"' % vba_root) | |
| 2794 | - # append the results to the list as a tuple for later use: | |
| 2795 | - self.vba_projects.append((vba_root, project_path, dir_path)) | |
| 2796 | - return self.vba_projects | |
| 2797 | - | |
| 2798 | - def detect_vba_macros(self): | |
| 2799 | - """ | |
| 2800 | - Detect the potential presence of VBA macros in the file, by checking | |
| 2801 | - if it contains VBA projects. Both OLE and OpenXML files are supported. | |
| 2802 | - | |
| 2803 | - Important: for now, results are accurate only for Word, Excel and PowerPoint | |
| 2804 | - | |
| 2805 | - Note: this method does NOT attempt to check the actual presence or validity | |
| 2806 | - of VBA macro source code, so there might be false positives. | |
| 2807 | - It may also detect VBA macros in files embedded within the main file, | |
| 2808 | - for example an Excel workbook with macros embedded into a Word | |
| 2809 | - document without macros may be detected, without distinction. | |
| 2810 | - | |
| 2811 | - :return: bool, True if at least one VBA project has been found, False otherwise | |
| 2812 | - """ | |
| 2813 | - #TODO: return None or raise exception if format not supported | |
| 2814 | - #TODO: return the number of VBA projects found instead of True/False? | |
| 2815 | - # if this method was already called, return the previous result: | |
| 2816 | - if self.contains_macros is not None: | |
| 2817 | - return self.contains_macros | |
| 2818 | - # if OpenXML/PPT, check all the OLE subfiles: | |
| 2819 | - if self.ole_file is None: | |
| 2820 | - for ole_subfile in self.ole_subfiles: | |
| 2821 | - if ole_subfile.detect_vba_macros(): | |
| 2822 | - self.contains_macros = True | |
| 2823 | - return True | |
| 2824 | - # otherwise, no macro found: | |
| 2825 | - self.contains_macros = False | |
| 2826 | - return False | |
| 2827 | - # otherwise it's an OLE file, find VBA projects: | |
| 2828 | - vba_projects = self.find_vba_projects() | |
| 2829 | - if len(vba_projects) == 0: | |
| 2830 | - self.contains_macros = False | |
| 2831 | - else: | |
| 2832 | - self.contains_macros = True | |
| 2833 | - # Also look for VBA code in any stream including orphans | |
| 2834 | - # (happens in some malformed files) | |
| 2835 | - ole = self.ole_file | |
| 2836 | - for sid in xrange(len(ole.direntries)): | |
| 2837 | - # check if id is already done above: | |
| 2838 | - log.debug('Checking DirEntry #%d' % sid) | |
| 2839 | - d = ole.direntries[sid] | |
| 2840 | - if d is None: | |
| 2841 | - # this direntry is not part of the tree: either unused or an orphan | |
| 2842 | - d = ole._load_direntry(sid) | |
| 2843 | - log.debug('This DirEntry is an orphan or unused') | |
| 2844 | - if d.entry_type == olefile.STGTY_STREAM: | |
| 2845 | - # read data | |
| 2846 | - log.debug('Reading data from stream %r - size: %d bytes' % (d.name, d.size)) | |
| 2847 | - try: | |
| 2848 | - data = ole._open(d.isectStart, d.size).read() | |
| 2849 | - log.debug('Read %d bytes' % len(data)) | |
| 2850 | - if len(data) > 200: | |
| 2851 | - log.debug('%r...[much more data]...%r' % (data[:100], data[-50:])) | |
| 2852 | - else: | |
| 2853 | - log.debug(repr(data)) | |
| 2854 | - if 'Attribut\x00' in data.decode('utf-8', 'ignore'): | |
| 2855 | - log.debug('Found VBA compressed code') | |
| 2856 | - self.contains_macros = True | |
| 2857 | - except IOError as exc: | |
| 2858 | - if self.relaxed: | |
| 2859 | - log.info('Error when reading OLE Stream %r' % d.name) | |
| 2860 | - log.debug('Trace:', exc_info=True) | |
| 2861 | - else: | |
| 2862 | - raise SubstreamOpenError(self.filename, d.name, exc) | |
| 2863 | - return self.contains_macros | |
| 2864 | - | |
| 2865 | - def extract_macros(self): | |
| 2866 | - """ | |
| 2867 | - Extract and decompress source code for each VBA macro found in the file | |
| 2868 | - | |
| 2869 | - Iterator: yields (filename, stream_path, vba_filename, vba_code) for each VBA macro found | |
| 2870 | - If the file is OLE, filename is the path of the file. | |
| 2871 | - If the file is OpenXML, filename is the path of the OLE subfile containing VBA macros | |
| 2872 | - within the zip archive, e.g. word/vbaProject.bin. | |
| 2873 | - If the file is PPT, result is as for OpenXML but filename is useless | |
| 2874 | - """ | |
| 2875 | - log.debug('extract_macros:') | |
| 2876 | - if self.ole_file is None: | |
| 2877 | - # This may be either an OpenXML/PPT or a text file: | |
| 2878 | - if self.type == TYPE_TEXT: | |
| 2879 | - # This is a text file, yield the full code: | |
| 2880 | - yield (self.filename, '', self.filename, self.vba_code_all_modules) | |
| 2881 | - else: | |
| 2882 | - # OpenXML/PPT: recursively yield results from each OLE subfile: | |
| 2883 | - for ole_subfile in self.ole_subfiles: | |
| 2884 | - for results in ole_subfile.extract_macros(): | |
| 2885 | - yield results | |
| 2886 | - else: | |
| 2887 | - # This is an OLE file: | |
| 2888 | - self.find_vba_projects() | |
| 2889 | - # set of stream ids | |
| 2890 | - vba_stream_ids = set() | |
| 2891 | - for vba_root, project_path, dir_path in self.vba_projects: | |
| 2892 | - # extract all VBA macros from that VBA root storage: | |
| 2893 | - # The function _extract_vba may fail on some files (issue #132) | |
| 2894 | - try: | |
| 2895 | - for stream_path, vba_filename, vba_code in \ | |
| 2896 | - _extract_vba(self.ole_file, vba_root, project_path, | |
| 2897 | - dir_path, self.relaxed): | |
| 2898 | - # store direntry ids in a set: | |
| 2899 | - vba_stream_ids.add(self.ole_file._find(stream_path)) | |
| 2900 | - yield (self.filename, stream_path, vba_filename, vba_code) | |
| 2901 | - except Exception as e: | |
| 2902 | - log.exception('Error in _extract_vba') | |
| 2903 | - # Also look for VBA code in any stream including orphans | |
| 2904 | - # (happens in some malformed files) | |
| 2905 | - ole = self.ole_file | |
| 2906 | - for sid in xrange(len(ole.direntries)): | |
| 2907 | - # check if id is already done above: | |
| 2908 | - log.debug('Checking DirEntry #%d' % sid) | |
| 2909 | - if sid in vba_stream_ids: | |
| 2910 | - log.debug('Already extracted') | |
| 2911 | - continue | |
| 2912 | - d = ole.direntries[sid] | |
| 2913 | - if d is None: | |
| 2914 | - # this direntry is not part of the tree: either unused or an orphan | |
| 2915 | - d = ole._load_direntry(sid) | |
| 2916 | - log.debug('This DirEntry is an orphan or unused') | |
| 2917 | - if d.entry_type == olefile.STGTY_STREAM: | |
| 2918 | - # read data | |
| 2919 | - log.debug('Reading data from stream %r' % d.name) | |
| 2920 | - data = ole._open(d.isectStart, d.size).read() | |
| 2921 | - for match in re.finditer(b'\\x00Attribut[^e]', data, flags=re.IGNORECASE): | |
| 2922 | - start = match.start() - 3 | |
| 2923 | - log.debug('Found VBA compressed code at index %X' % start) | |
| 2924 | - compressed_code = data[start:] | |
| 2925 | - try: | |
| 2926 | - vba_code = decompress_stream(compressed_code) | |
| 2927 | - yield (self.filename, d.name, d.name, vba_code) | |
| 2928 | - except Exception as exc: | |
| 2929 | - # display the exception with full stack trace for debugging | |
| 2930 | - log.debug('Error processing stream %r in file %r (%s)' % (d.name, self.filename, exc)) | |
| 2931 | - log.debug('Traceback:', exc_info=True) | |
| 2932 | - # do not raise the error, as it is unlikely to be a compressed macro stream | |
| 2933 | - | |
| 2934 | - def extract_all_macros(self): | |
| 2935 | - """ | |
| 2936 | - Extract and decompress source code for each VBA macro found in the file | |
| 2937 | - by calling extract_macros(), store the results as a list of tuples | |
| 2938 | - (filename, stream_path, vba_filename, vba_code) in self.modules. | |
| 2939 | - See extract_macros for details. | |
| 2940 | - """ | |
| 2941 | - if self.modules is None: | |
| 2942 | - self.modules = [] | |
| 2943 | - for (subfilename, stream_path, vba_filename, vba_code) in self.extract_macros(): | |
| 2944 | - self.modules.append((subfilename, stream_path, vba_filename, vba_code)) | |
| 2945 | - self.nb_macros = len(self.modules) | |
| 2946 | - return self.modules | |
| 2947 | - | |
| 2948 | - | |
| 2949 | - | |
| 2950 | - def analyze_macros(self, show_decoded_strings=False, deobfuscate=False): | |
| 2951 | - """ | |
| 2952 | - Runs extract_macros and analyzes the source code of all VBA macros | |
| 2953 | - found in the file. | |
| 2954 | - """ | |
| 2955 | - if self.detect_vba_macros(): | |
| 2956 | - # if the analysis was already done, avoid doing it twice: | |
| 2957 | - if self.analysis_results is not None: | |
| 2958 | - return self.analysis_results | |
| 2959 | - # variable to merge source code from all modules: | |
| 2960 | - if self.vba_code_all_modules is None: | |
| 2961 | - self.vba_code_all_modules = '' | |
| 2962 | - for (_, _, _, vba_code) in self.extract_all_macros(): | |
| 2963 | - #TODO: filter code? (each module) | |
| 2964 | - if isinstance(vba_code, bytes): | |
| 2965 | - vba_code = vba_code.decode('utf-8', 'ignore') | |
| 2966 | - self.vba_code_all_modules += vba_code + '\n' | |
| 2967 | - for (_, _, form_string) in self.extract_form_strings(): | |
| 2968 | - self.vba_code_all_modules += form_string.decode('utf-8', 'ignore') + '\n' | |
| 2969 | - # Analyze the whole code at once: | |
| 2970 | - scanner = VBA_Scanner(self.vba_code_all_modules) | |
| 2971 | - self.analysis_results = scanner.scan(show_decoded_strings, deobfuscate) | |
| 2972 | - autoexec, suspicious, iocs, hexstrings, base64strings, dridex, vbastrings = scanner.scan_summary() | |
| 2973 | - self.nb_autoexec += autoexec | |
| 2974 | - self.nb_suspicious += suspicious | |
| 2975 | - self.nb_iocs += iocs | |
| 2976 | - self.nb_hexstrings += hexstrings | |
| 2977 | - self.nb_base64strings += base64strings | |
| 2978 | - self.nb_dridexstrings += dridex | |
| 2979 | - self.nb_vbastrings += vbastrings | |
| 2980 | - | |
| 2981 | - return self.analysis_results | |
| 2982 | - | |
| 2983 | - | |
| 2984 | - def reveal(self): | |
| 2985 | - # we only want printable strings: | |
| 2986 | - analysis = self.analyze_macros(show_decoded_strings=False) | |
| 2987 | - # to avoid replacing short strings contained into longer strings, we sort the analysis results | |
| 2988 | - # based on the length of the encoded string, in reverse order: | |
| 2989 | - analysis = sorted(analysis, key=lambda type_decoded_encoded: len(type_decoded_encoded[2]), reverse=True) | |
| 2990 | - # normally now self.vba_code_all_modules contains source code from all modules | |
| 2991 | - # Need to collapse long lines: | |
| 2992 | - deobf_code = vba_collapse_long_lines(self.vba_code_all_modules) | |
| 2993 | - deobf_code = filter_vba(deobf_code) | |
| 2994 | - for kw_type, decoded, encoded in analysis: | |
| 2995 | - if kw_type == 'VBA string': | |
| 2996 | - #print '%3d occurrences: %r => %r' % (deobf_code.count(encoded), encoded, decoded) | |
| 2997 | - # need to add double quotes around the decoded strings | |
| 2998 | - # after escaping double-quotes as double-double-quotes for VBA: | |
| 2999 | - decoded = decoded.replace('"', '""') | |
| 3000 | - decoded = '"%s"' % decoded | |
| 3001 | - # if the encoded string is enclosed in parentheses, | |
| 3002 | - # keep them in the decoded version: | |
| 3003 | - if encoded.startswith('(') and encoded.endswith(')'): | |
| 3004 | - decoded = '(%s)' % decoded | |
| 3005 | - deobf_code = deobf_code.replace(encoded, decoded) | |
| 3006 | - # # TODO: there is a bug somewhere which creates double returns '\r\r' | |
| 3007 | - # deobf_code = deobf_code.replace('\r\r', '\r') | |
| 3008 | - return deobf_code | |
| 3009 | - #TODO: rerun the analysis several times if hex or base64 strings are revealed | |
| 3010 | - | |
| 3011 | - | |
| 3012 | - def find_vba_forms(self): | |
| 3013 | - """ | |
| 3014 | - Finds all the VBA forms stored in an OLE file. | |
| 3015 | - | |
| 3016 | - Return None if the file is not OLE but OpenXML. | |
| 3017 | - Return a list of tuples (vba_root, project_path, dir_path) for each VBA project. | |
| 3018 | - vba_root is the path of the root OLE storage containing the VBA project, | |
| 3019 | - including a trailing slash unless it is the root of the OLE file. | |
| 3020 | - project_path is the path of the OLE stream named "PROJECT" within the VBA project. | |
| 3021 | - dir_path is the path of the OLE stream named "VBA/dir" within the VBA project. | |
| 3022 | - | |
| 3023 | - If this function returns an empty list for one of the supported formats | |
| 3024 | - (i.e. Word, Excel, Powerpoint), then the file does not contain VBA forms. | |
| 3025 | - | |
| 3026 | - :return: None if OpenXML file, list of tuples (vba_root, project_path, dir_path) | |
| 3027 | - for each VBA project found if OLE file | |
| 3028 | - """ | |
| 3029 | - log.debug('VBA_Parser.find_vba_forms') | |
| 3030 | - | |
| 3031 | - # if the file is not OLE but OpenXML, return None: | |
| 3032 | - if self.ole_file is None and self.type != TYPE_PPT: | |
| 3033 | - return None | |
| 3034 | - | |
| 3035 | - # if this method has already been called, return previous result: | |
| 3036 | - # if self.vba_projects is not None: | |
| 3037 | - # return self.vba_projects | |
| 3038 | - | |
| 3039 | - # According to MS-OFORMS section 2.1.2 Control Streams: | |
| 3040 | - # - A parent control, that is, a control that can contain embedded controls, | |
| 3041 | - # MUST be persisted as a storage that contains multiple streams. | |
| 3042 | - # - All parent controls MUST contain a FormControl. The FormControl | |
| 3043 | - # properties are persisted to a stream (1) as specified in section 2.1.1.2. | |
| 3044 | - # The name of this stream (1) MUST be "f". | |
| 3045 | - # - Embedded controls that cannot themselves contain other embedded | |
| 3046 | - # controls are persisted sequentially as FormEmbeddedActiveXControls | |
| 3047 | - # to a stream (1) contained in the same storage as the parent control. | |
| 3048 | - # The name of this stream (1) MUST be "o". | |
| 3049 | - # - all names are case-insensitive | |
| 3050 | - | |
| 3051 | - if self.type == TYPE_PPT: | |
| 3052 | - # TODO: so far, this function is never called for PPT files, but | |
| 3053 | - # if that happens, the information is lost which ole file contains | |
| 3054 | - # which storage! | |
| 3055 | - ole_files = self.ole_subfiles | |
| 3056 | - log.warning('Returned info is not complete for PPT types!') | |
| 3057 | - else: | |
| 3058 | - ole_files = [self.ole_file, ] | |
| 3059 | - | |
| 3060 | - # start with an empty list: | |
| 3061 | - self.vba_forms = [] | |
| 3062 | - | |
| 3063 | - # Loop over ole streams | |
| 3064 | - for ole in ole_files: | |
| 3065 | - # Look for any storage containing those storage/streams: | |
| 3066 | - for storage in ole.listdir(streams=False, storages=True): | |
| 3067 | - log.debug('Checking storage %r' % storage) | |
| 3068 | - # Look for two streams named 'o' and 'f': | |
| 3069 | - o_stream = storage + ['o'] | |
| 3070 | - f_stream = storage + ['f'] | |
| 3071 | - log.debug('Checking if streams %r and %r exist' % (f_stream, o_stream)) | |
| 3072 | - if ole.exists(o_stream) and ole.get_type(o_stream) == olefile.STGTY_STREAM \ | |
| 3073 | - and ole.exists(f_stream) and ole.get_type(f_stream) == olefile.STGTY_STREAM: | |
| 3074 | - form_path = '/'.join(storage) | |
| 3075 | - log.debug('Found VBA Form: %r' % form_path) | |
| 3076 | - self.vba_forms.append(storage) | |
| 3077 | - return self.vba_forms | |
| 3078 | - | |
| 3079 | - def extract_form_strings(self): | |
| 3080 | - """ | |
| 3081 | - Extract printable strings from each VBA Form found in the file | |
| 3082 | - | |
| 3083 | - Iterator: yields (filename, stream_path, vba_filename, vba_code) for each VBA macro found | |
| 3084 | - If the file is OLE, filename is the path of the file. | |
| 3085 | - If the file is OpenXML, filename is the path of the OLE subfile containing VBA macros | |
| 3086 | - within the zip archive, e.g. word/vbaProject.bin. | |
| 3087 | - If the file is PPT, result is as for OpenXML but filename is useless | |
| 3088 | - """ | |
| 3089 | - if self.ole_file is None: | |
| 3090 | - # This may be either an OpenXML/PPT or a text file: | |
| 3091 | - if self.type == TYPE_TEXT: | |
| 3092 | - # This is a text file, return no results: | |
| 3093 | - return | |
| 3094 | - else: | |
| 3095 | - # OpenXML/PPT: recursively yield results from each OLE subfile: | |
| 3096 | - for ole_subfile in self.ole_subfiles: | |
| 3097 | - for results in ole_subfile.extract_form_strings(): | |
| 3098 | - yield results | |
| 3099 | - else: | |
| 3100 | - # This is an OLE file: | |
| 3101 | - self.find_vba_forms() | |
| 3102 | - ole = self.ole_file | |
| 3103 | - for form_storage in self.vba_forms: | |
| 3104 | - o_stream = form_storage + ['o'] | |
| 3105 | - log.debug('Opening form object stream %r' % '/'.join(o_stream)) | |
| 3106 | - form_data = ole.openstream(o_stream).read() | |
| 3107 | - # Extract printable strings from the form object stream "o": | |
| 3108 | - for m in re_printable_string.finditer(form_data): | |
| 3109 | - log.debug('Printable string found in form: %r' % m.group()) | |
| 3110 | - yield (self.filename, '/'.join(o_stream), m.group()) | |
| 3111 | - | |
| 3112 | - | |
| 3113 | - def close(self): | |
| 3114 | - """ | |
| 3115 | - Close all the open files. This method must be called after usage, if | |
| 3116 | - the application is opening many files. | |
| 3117 | - """ | |
| 3118 | - if self.ole_file is None: | |
| 3119 | - if self.ole_subfiles is not None: | |
| 3120 | - for ole_subfile in self.ole_subfiles: | |
| 3121 | - ole_subfile.close() | |
| 3122 | - else: | |
| 3123 | - self.ole_file.close() | |
| 3124 | - | |
| 3125 | - | |
| 3126 | - | |
| 3127 | -class VBA_Parser_CLI(VBA_Parser): | |
| 3128 | - """ | |
| 3129 | - VBA parser and analyzer, adding methods for the command line interface | |
| 3130 | - of olevba. (see VBA_Parser) | |
| 3131 | - """ | |
| 3132 | - | |
| 3133 | - def __init__(self, *args, **kwargs): | |
| 3134 | - """ | |
| 3135 | - Constructor for VBA_Parser_CLI. | |
| 3136 | - Calls __init__ from VBA_Parser with all arguments --> see doc there | |
| 3137 | - """ | |
| 3138 | - super(VBA_Parser_CLI, self).__init__(*args, **kwargs) | |
| 3139 | - | |
| 3140 | - | |
| 3141 | - def print_analysis(self, show_decoded_strings=False, deobfuscate=False): | |
| 3142 | - """ | |
| 3143 | - Analyze the provided VBA code, and print the results in a table | |
| 3144 | - | |
| 3145 | - :param vba_code: str, VBA source code to be analyzed | |
| 3146 | - :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content. | |
| 3147 | - :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow) | |
| 3148 | - :return: None | |
| 3149 | - """ | |
| 3150 | - # print a waiting message only if the output is not redirected to a file: | |
| 3151 | - if sys.stdout.isatty(): | |
| 3152 | - print('Analysis...\r', end='') | |
| 3153 | - sys.stdout.flush() | |
| 3154 | - results = self.analyze_macros(show_decoded_strings, deobfuscate) | |
| 3155 | - if results: | |
| 3156 | - t = prettytable.PrettyTable(('Type', 'Keyword', 'Description')) | |
| 3157 | - t.align = 'l' | |
| 3158 | - t.max_width['Type'] = 10 | |
| 3159 | - t.max_width['Keyword'] = 20 | |
| 3160 | - t.max_width['Description'] = 39 | |
| 3161 | - for kw_type, keyword, description in results: | |
| 3162 | - # handle non printable strings: | |
| 3163 | - if not is_printable(keyword): | |
| 3164 | - keyword = repr(keyword) | |
| 3165 | - if not is_printable(description): | |
| 3166 | - description = repr(description) | |
| 3167 | - t.add_row((kw_type, keyword, description)) | |
| 3168 | - print(t) | |
| 3169 | - else: | |
| 3170 | - print('No suspicious keyword or IOC found.') | |
| 3171 | - | |
| 3172 | - def print_analysis_json(self, show_decoded_strings=False, deobfuscate=False): | |
| 3173 | - """ | |
| 3174 | - Analyze the provided VBA code, and return the results in json format | |
| 3175 | - | |
| 3176 | - :param vba_code: str, VBA source code to be analyzed | |
| 3177 | - :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content. | |
| 3178 | - :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow) | |
| 3179 | - | |
| 3180 | - :return: dict | |
| 3181 | - """ | |
| 3182 | - # print a waiting message only if the output is not redirected to a file: | |
| 3183 | - if sys.stdout.isatty(): | |
| 3184 | - print('Analysis...\r', end='') | |
| 3185 | - sys.stdout.flush() | |
| 3186 | - return [dict(type=kw_type, keyword=keyword, description=description) | |
| 3187 | - for kw_type, keyword, description in self.analyze_macros(show_decoded_strings, deobfuscate)] | |
| 3188 | - | |
| 3189 | - def process_file(self, show_decoded_strings=False, | |
| 3190 | - display_code=True, hide_attributes=True, | |
| 3191 | - vba_code_only=False, show_deobfuscated_code=False, | |
| 3192 | - deobfuscate=False): | |
| 3193 | - """ | |
| 3194 | - Process a single file | |
| 3195 | - | |
| 3196 | - :param filename: str, path and filename of file on disk, or within the container. | |
| 3197 | - :param data: bytes, content of the file if it is in a container, None if it is a file on disk. | |
| 3198 | - :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content. | |
| 3199 | - :param display_code: bool, if False VBA source code is not displayed (default True) | |
| 3200 | - :param global_analysis: bool, if True all modules are merged for a single analysis (default), | |
| 3201 | - otherwise each module is analyzed separately (old behaviour) | |
| 3202 | - :param hide_attributes: bool, if True the first lines starting with "Attribute VB" are hidden (default) | |
| 3203 | - :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow) | |
| 3204 | - """ | |
| 3205 | - #TODO: replace print by writing to a provided output file (sys.stdout by default) | |
| 3206 | - # fix conflicting parameters: | |
| 3207 | - if vba_code_only and not display_code: | |
| 3208 | - display_code = True | |
| 3209 | - if self.container: | |
| 3210 | - display_filename = '%s in %s' % (self.filename, self.container) | |
| 3211 | - else: | |
| 3212 | - display_filename = self.filename | |
| 3213 | - print('=' * 79) | |
| 3214 | - print('FILE: %s' % display_filename) | |
| 3215 | - try: | |
| 3216 | - #TODO: handle olefile errors, when an OLE file is malformed | |
| 3217 | - print('Type: %s'% self.type) | |
| 3218 | - if self.detect_vba_macros(): | |
| 3219 | - #print 'Contains VBA Macros:' | |
| 3220 | - for (subfilename, stream_path, vba_filename, vba_code) in self.extract_all_macros(): | |
| 3221 | - if hide_attributes: | |
| 3222 | - # hide attribute lines: | |
| 3223 | - if isinstance(vba_code,bytes): | |
| 3224 | - vba_code =vba_code.decode('utf-8','backslashreplace') | |
| 3225 | - vba_code_filtered = filter_vba(vba_code) | |
| 3226 | - else: | |
| 3227 | - vba_code_filtered = vba_code | |
| 3228 | - print('-' * 79) | |
| 3229 | - print('VBA MACRO %s ' % vba_filename) | |
| 3230 | - print('in file: %s - OLE stream: %s' % (subfilename, repr(stream_path))) | |
| 3231 | - if display_code: | |
| 3232 | - print('- ' * 39) | |
| 3233 | - # detect empty macros: | |
| 3234 | - if vba_code_filtered.strip() == '': | |
| 3235 | - print('(empty macro)') | |
| 3236 | - else: | |
| 3237 | - print(vba_code_filtered) | |
| 3238 | - for (subfilename, stream_path, form_string) in self.extract_form_strings(): | |
| 3239 | - print('-' * 79) | |
| 3240 | - print('VBA FORM STRING IN %r - OLE stream: %r' % (subfilename, stream_path)) | |
| 3241 | - print('- ' * 39) | |
| 3242 | - print(form_string.decode('utf-8', 'ignore')) | |
| 3243 | - if not vba_code_only: | |
| 3244 | - # analyse the code from all modules at once: | |
| 3245 | - self.print_analysis(show_decoded_strings, deobfuscate) | |
| 3246 | - if show_deobfuscated_code: | |
| 3247 | - print('MACRO SOURCE CODE WITH DEOBFUSCATED VBA STRINGS (EXPERIMENTAL):\n\n') | |
| 3248 | - print(self.reveal()) | |
| 3249 | - else: | |
| 3250 | - print('No VBA macros found.') | |
| 3251 | - except OlevbaBaseException: | |
| 3252 | - raise | |
| 3253 | - except Exception as exc: | |
| 3254 | - # display the exception with full stack trace for debugging | |
| 3255 | - log.info('Error processing file %s (%s)' % (self.filename, exc)) | |
| 3256 | - log.debug('Traceback:', exc_info=True) | |
| 3257 | - raise ProcessingError(self.filename, exc) | |
| 3258 | - print('') | |
| 3259 | - | |
| 3260 | - | |
| 3261 | - def process_file_json(self, show_decoded_strings=False, | |
| 3262 | - display_code=True, hide_attributes=True, | |
| 3263 | - vba_code_only=False, show_deobfuscated_code=False, | |
| 3264 | - deobfuscate=False): | |
| 3265 | - """ | |
| 3266 | - Process a single file | |
| 3267 | - | |
| 3268 | - every "show" or "print" here is to be translated as "add to json" | |
| 3269 | - | |
| 3270 | - :param filename: str, path and filename of file on disk, or within the container. | |
| 3271 | - :param data: bytes, content of the file if it is in a container, None if it is a file on disk. | |
| 3272 | - :param show_decoded_strings: bool, if True hex-encoded strings will be displayed with their decoded content. | |
| 3273 | - :param display_code: bool, if False VBA source code is not displayed (default True) | |
| 3274 | - :param global_analysis: bool, if True all modules are merged for a single analysis (default), | |
| 3275 | - otherwise each module is analyzed separately (old behaviour) | |
| 3276 | - :param hide_attributes: bool, if True the first lines starting with "Attribute VB" are hidden (default) | |
| 3277 | - :param deobfuscate: bool, if True attempt to deobfuscate VBA expressions (slow) | |
| 3278 | - """ | |
| 3279 | - #TODO: fix conflicting parameters (?) | |
| 3280 | - | |
| 3281 | - if vba_code_only and not display_code: | |
| 3282 | - display_code = True | |
| 3283 | - | |
| 3284 | - result = {} | |
| 3285 | - | |
| 3286 | - if self.container: | |
| 3287 | - result['container'] = self.container | |
| 3288 | - else: | |
| 3289 | - result['container'] = None | |
| 3290 | - result['file'] = self.filename | |
| 3291 | - result['json_conversion_successful'] = False | |
| 3292 | - result['analysis'] = None | |
| 3293 | - result['code_deobfuscated'] = None | |
| 3294 | - result['do_deobfuscate'] = deobfuscate | |
| 3295 | - | |
| 3296 | - try: | |
| 3297 | - #TODO: handle olefile errors, when an OLE file is malformed | |
| 3298 | - result['type'] = self.type | |
| 3299 | - macros = [] | |
| 3300 | - if self.detect_vba_macros(): | |
| 3301 | - for (subfilename, stream_path, vba_filename, vba_code) in self.extract_all_macros(): | |
| 3302 | - curr_macro = {} | |
| 3303 | - if isinstance(vba_code, bytes): | |
| 3304 | - vba_code = vba_code.decode('utf-8', 'backslashreplace') | |
| 3305 | - | |
| 3306 | - if hide_attributes: | |
| 3307 | - # hide attribute lines: | |
| 3308 | - vba_code_filtered = filter_vba(vba_code) | |
| 3309 | - else: | |
| 3310 | - vba_code_filtered = vba_code | |
| 3311 | - | |
| 3312 | - curr_macro['vba_filename'] = vba_filename | |
| 3313 | - curr_macro['subfilename'] = subfilename | |
| 3314 | - curr_macro['ole_stream'] = stream_path | |
| 3315 | - if display_code: | |
| 3316 | - curr_macro['code'] = vba_code_filtered.strip() | |
| 3317 | - else: | |
| 3318 | - curr_macro['code'] = None | |
| 3319 | - macros.append(curr_macro) | |
| 3320 | - if not vba_code_only: | |
| 3321 | - # analyse the code from all modules at once: | |
| 3322 | - result['analysis'] = self.print_analysis_json(show_decoded_strings, | |
| 3323 | - deobfuscate) | |
| 3324 | - if show_deobfuscated_code: | |
| 3325 | - result['code_deobfuscated'] = self.reveal() | |
| 3326 | - result['macros'] = macros | |
| 3327 | - result['json_conversion_successful'] = True | |
| 3328 | - except Exception as exc: | |
| 3329 | - # display the exception with full stack trace for debugging | |
| 3330 | - log.info('Error processing file %s (%s)' % (self.filename, exc)) | |
| 3331 | - log.debug('Traceback:', exc_info=True) | |
| 3332 | - raise ProcessingError(self.filename, exc) | |
| 3333 | - | |
| 3334 | - return result | |
| 3335 | - | |
| 3336 | - | |
| 3337 | - def process_file_triage(self, show_decoded_strings=False, deobfuscate=False): | |
| 3338 | - """ | |
| 3339 | - Process a file in triage mode, showing only summary results on one line. | |
| 3340 | - """ | |
| 3341 | - #TODO: replace print by writing to a provided output file (sys.stdout by default) | |
| 3342 | - try: | |
| 3343 | - #TODO: handle olefile errors, when an OLE file is malformed | |
| 3344 | - if self.detect_vba_macros(): | |
| 3345 | - # print a waiting message only if the output is not redirected to a file: | |
| 3346 | - if sys.stdout.isatty(): | |
| 3347 | - print('Analysis...\r', end='') | |
| 3348 | - sys.stdout.flush() | |
| 3349 | - self.analyze_macros(show_decoded_strings=show_decoded_strings, | |
| 3350 | - deobfuscate=deobfuscate) | |
| 3351 | - flags = TYPE2TAG[self.type] | |
| 3352 | - macros = autoexec = suspicious = iocs = hexstrings = base64obf = dridex = vba_obf = '-' | |
| 3353 | - if self.contains_macros: macros = 'M' | |
| 3354 | - if self.nb_autoexec: autoexec = 'A' | |
| 3355 | - if self.nb_suspicious: suspicious = 'S' | |
| 3356 | - if self.nb_iocs: iocs = 'I' | |
| 3357 | - if self.nb_hexstrings: hexstrings = 'H' | |
| 3358 | - if self.nb_base64strings: base64obf = 'B' | |
| 3359 | - if self.nb_dridexstrings: dridex = 'D' | |
| 3360 | - if self.nb_vbastrings: vba_obf = 'V' | |
| 3361 | - flags += '%s%s%s%s%s%s%s%s' % (macros, autoexec, suspicious, iocs, hexstrings, | |
| 3362 | - base64obf, dridex, vba_obf) | |
| 3363 | - | |
| 3364 | - line = '%-12s %s' % (flags, self.filename) | |
| 3365 | - print(line) | |
| 3366 | - | |
| 3367 | - # old table display: | |
| 3368 | - # macros = autoexec = suspicious = iocs = hexstrings = 'no' | |
| 3369 | - # if nb_macros: macros = 'YES:%d' % nb_macros | |
| 3370 | - # if nb_autoexec: autoexec = 'YES:%d' % nb_autoexec | |
| 3371 | - # if nb_suspicious: suspicious = 'YES:%d' % nb_suspicious | |
| 3372 | - # if nb_iocs: iocs = 'YES:%d' % nb_iocs | |
| 3373 | - # if nb_hexstrings: hexstrings = 'YES:%d' % nb_hexstrings | |
| 3374 | - # # 2nd line = info | |
| 3375 | - # print '%-8s %-7s %-7s %-7s %-7s %-7s' % (self.type, macros, autoexec, suspicious, iocs, hexstrings) | |
| 3376 | - except Exception as exc: | |
| 3377 | - # display the exception with full stack trace for debugging only | |
| 3378 | - log.debug('Error processing file %s (%s)' % (self.filename, exc), | |
| 3379 | - exc_info=True) | |
| 3380 | - raise ProcessingError(self.filename, exc) | |
| 3381 | - | |
| 3382 | - | |
| 3383 | - # t = prettytable.PrettyTable(('filename', 'type', 'macros', 'autoexec', 'suspicious', 'ioc', 'hexstrings'), | |
| 3384 | - # header=False, border=False) | |
| 3385 | - # t.align = 'l' | |
| 3386 | - # t.max_width['filename'] = 30 | |
| 3387 | - # t.max_width['type'] = 10 | |
| 3388 | - # t.max_width['macros'] = 6 | |
| 3389 | - # t.max_width['autoexec'] = 6 | |
| 3390 | - # t.max_width['suspicious'] = 6 | |
| 3391 | - # t.max_width['ioc'] = 6 | |
| 3392 | - # t.max_width['hexstrings'] = 6 | |
| 3393 | - # t.add_row((filename, ftype, macros, autoexec, suspicious, iocs, hexstrings)) | |
| 3394 | - # print t | |
| 3395 | - | |
| 3396 | - | |
| 3397 | -#=== MAIN ===================================================================== | |
| 3398 | - | |
| 3399 | -def parse_args(cmd_line_args=None): | |
| 3400 | - """ parse command line arguments (given ones or per default sys.argv) """ | |
| 3401 | - | |
| 3402 | - DEFAULT_LOG_LEVEL = "warning" # Default log level | |
| 3403 | - LOG_LEVELS = { | |
| 3404 | - 'debug': logging.DEBUG, | |
| 3405 | - 'info': logging.INFO, | |
| 3406 | - 'warning': logging.WARNING, | |
| 3407 | - 'error': logging.ERROR, | |
| 3408 | - 'critical': logging.CRITICAL | |
| 3409 | - } | |
| 3410 | - | |
| 3411 | - usage = 'usage: olevba [options] <filename> [filename2 ...]' | |
| 3412 | - parser = optparse.OptionParser(usage=usage) | |
| 3413 | - # parser.add_option('-o', '--outfile', dest='outfile', | |
| 3414 | - # help='output file') | |
| 3415 | - # parser.add_option('-c', '--csv', dest='csv', | |
| 3416 | - # help='export results to a CSV file') | |
| 3417 | - parser.add_option("-r", action="store_true", dest="recursive", | |
| 3418 | - help='find files recursively in subdirectories.') | |
| 3419 | - parser.add_option("-z", "--zip", dest='zip_password', type='str', default=None, | |
| 3420 | - help='if the file is a zip archive, open all files from it, using the provided password (requires Python 2.6+)') | |
| 3421 | - parser.add_option("-f", "--zipfname", dest='zip_fname', type='str', default='*', | |
| 3422 | - help='if the file is a zip archive, file(s) to be opened within the zip. Wildcards * and ? are supported. (default:*)') | |
| 3423 | - # output mode; could make this even simpler with add_option(type='choice') but that would make | |
| 3424 | - # cmd line interface incompatible... | |
| 3425 | - modes = optparse.OptionGroup(parser, title='Output mode (mutually exclusive)') | |
| 3426 | - modes.add_option("-t", '--triage', action="store_const", dest="output_mode", | |
| 3427 | - const='triage', default='unspecified', | |
| 3428 | - help='triage mode, display results as a summary table (default for multiple files)') | |
| 3429 | - modes.add_option("-d", '--detailed', action="store_const", dest="output_mode", | |
| 3430 | - const='detailed', default='unspecified', | |
| 3431 | - help='detailed mode, display full results (default for single file)') | |
| 3432 | - modes.add_option("-j", '--json', action="store_const", dest="output_mode", | |
| 3433 | - const='json', default='unspecified', | |
| 3434 | - help='json mode, detailed in json format (never default)') | |
| 3435 | - parser.add_option_group(modes) | |
| 3436 | - parser.add_option("-a", '--analysis', action="store_false", dest="display_code", default=True, | |
| 3437 | - help='display only analysis results, not the macro source code') | |
| 3438 | - parser.add_option("-c", '--code', action="store_true", dest="vba_code_only", default=False, | |
| 3439 | - help='display only VBA source code, do not analyze it') | |
| 3440 | - parser.add_option("--decode", action="store_true", dest="show_decoded_strings", | |
| 3441 | - help='display all the obfuscated strings with their decoded content (Hex, Base64, StrReverse, Dridex, VBA).') | |
| 3442 | - parser.add_option("--attr", action="store_false", dest="hide_attributes", default=True, | |
| 3443 | - help='display the attribute lines at the beginning of VBA source code') | |
| 3444 | - parser.add_option("--reveal", action="store_true", dest="show_deobfuscated_code", | |
| 3445 | - help='display the macro source code after replacing all the obfuscated strings by their decoded content.') | |
| 3446 | - parser.add_option('-l', '--loglevel', dest="loglevel", action="store", default=DEFAULT_LOG_LEVEL, | |
| 3447 | - help="logging level debug/info/warning/error/critical (default=%default)") | |
| 3448 | - parser.add_option('--deobf', dest="deobfuscate", action="store_true", default=False, | |
| 3449 | - help="Attempt to deobfuscate VBA expressions (slow)") | |
| 3450 | - parser.add_option('--relaxed', dest="relaxed", action="store_true", default=False, | |
| 3451 | - help="Do not raise errors if opening of substream fails") | |
| 3452 | - | |
| 3453 | - (options, args) = parser.parse_args(cmd_line_args) | |
| 3454 | - | |
| 3455 | - # Print help if no arguments are passed | |
| 3456 | - if len(args) == 0: | |
| 3457 | - print('olevba %s - http://decalage.info/python/oletools' % __version__) | |
| 3458 | - print(__doc__) | |
| 3459 | - parser.print_help() | |
| 3460 | - sys.exit(RETURN_WRONG_ARGS) | |
| 3461 | - | |
| 3462 | - options.loglevel = LOG_LEVELS[options.loglevel] | |
| 3463 | - | |
| 3464 | - return options, args | |
| 3465 | - | |
| 3466 | - | |
| 3467 | -def main(cmd_line_args=None): | |
| 3468 | - """ | |
| 3469 | - Main function, called when olevba is run from the command line | |
| 3470 | - | |
| 3471 | - Optional argument: command line arguments to be forwarded to ArgumentParser | |
| 3472 | - in process_args. Per default (cmd_line_args=None), sys.argv is used. Option | |
| 3473 | - mainly added for unit-testing | |
| 3474 | - """ | |
| 3475 | - | |
| 3476 | - options, args = parse_args(cmd_line_args) | |
| 3477 | - | |
| 3478 | - # provide info about tool and its version | |
| 3479 | - if options.output_mode == 'json': | |
| 3480 | - # print first json entry with meta info and opening '[' | |
| 3481 | - print_json(script_name='olevba', version=__version__, | |
| 3482 | - url='http://decalage.info/python/oletools', | |
| 3483 | - type='MetaInformation', _json_is_first=True) | |
| 3484 | - else: | |
| 3485 | - print('olevba3 %s - http://decalage.info/python/oletools' % __version__) | |
| 3486 | - | |
| 3487 | - logging.basicConfig(level=options.loglevel, format='%(levelname)-8s %(message)s') | |
| 3488 | - # enable logging in the modules: | |
| 3489 | - enable_logging() | |
| 3490 | - | |
| 3491 | - # Old display with number of items detected: | |
| 3492 | - # print '%-8s %-7s %-7s %-7s %-7s %-7s' % ('Type', 'Macros', 'AutoEx', 'Susp.', 'IOCs', 'HexStr') | |
| 3493 | - # print '%-8s %-7s %-7s %-7s %-7s %-7s' % ('-'*8, '-'*7, '-'*7, '-'*7, '-'*7, '-'*7) | |
| 3494 | - | |
| 3495 | - # with the option --reveal, make sure --deobf is also enabled: | |
| 3496 | - if options.show_deobfuscated_code and not options.deobfuscate: | |
| 3497 | - log.info('set --deobf because --reveal was set') | |
| 3498 | - options.deobfuscate = True | |
| 3499 | - if options.output_mode == 'triage' and options.show_deobfuscated_code: | |
| 3500 | - log.info('ignoring option --reveal in triage output mode') | |
| 3501 | - | |
| 3502 | - # Column headers (do not know how many files there will be yet, so if no output_mode | |
| 3503 | - # was specified, we will print triage for first file --> need these headers) | |
| 3504 | - if options.output_mode in ('triage', 'unspecified'): | |
| 3505 | - print('%-12s %-65s' % ('Flags', 'Filename')) | |
| 3506 | - print('%-12s %-65s' % ('-' * 11, '-' * 65)) | |
| 3507 | - | |
| 3508 | - previous_container = None | |
| 3509 | - count = 0 | |
| 3510 | - container = filename = data = None | |
| 3511 | - vba_parser = None | |
| 3512 | - return_code = RETURN_OK | |
| 3513 | - try: | |
| 3514 | - for container, filename, data in xglob.iter_files(args, recursive=options.recursive, | |
| 3515 | - zip_password=options.zip_password, zip_fname=options.zip_fname): | |
| 3516 | - # ignore directory names stored in zip files: | |
| 3517 | - if container and filename.endswith('/'): | |
| 3518 | - continue | |
| 3519 | - | |
| 3520 | - # handle errors from xglob | |
| 3521 | - if isinstance(data, Exception): | |
| 3522 | - if isinstance(data, PathNotFoundException): | |
| 3523 | - if options.output_mode in ('triage', 'unspecified'): | |
| 3524 | - print('%-12s %s - File not found' % ('?', filename)) | |
| 3525 | - elif options.output_mode != 'json': | |
| 3526 | - log.error('Given path %r does not exist!' % filename) | |
| 3527 | - return_code = RETURN_FILE_NOT_FOUND if return_code == 0 \ | |
| 3528 | - else RETURN_SEVERAL_ERRS | |
| 3529 | - else: | |
| 3530 | - if options.output_mode in ('triage', 'unspecified'): | |
| 3531 | - print('%-12s %s - Failed to read from zip file %s' % ('?', filename, container)) | |
| 3532 | - elif options.output_mode != 'json': | |
| 3533 | - log.error('Exception opening/reading %r from zip file %r: %s' | |
| 3534 | - % (filename, container, data)) | |
| 3535 | - return_code = RETURN_XGLOB_ERR if return_code == 0 \ | |
| 3536 | - else RETURN_SEVERAL_ERRS | |
| 3537 | - if options.output_mode == 'json': | |
| 3538 | - print_json(file=filename, type='error', | |
| 3539 | - error=type(data).__name__, message=str(data)) | |
| 3540 | - continue | |
| 3541 | - | |
| 3542 | - try: | |
| 3543 | - # Open the file | |
| 3544 | - vba_parser = VBA_Parser_CLI(filename, data=data, container=container, | |
| 3545 | - relaxed=options.relaxed) | |
| 3546 | - | |
| 3547 | - if options.output_mode == 'detailed': | |
| 3548 | - # fully detailed output | |
| 3549 | - vba_parser.process_file(show_decoded_strings=options.show_decoded_strings, | |
| 3550 | - display_code=options.display_code, | |
| 3551 | - hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only, | |
| 3552 | - show_deobfuscated_code=options.show_deobfuscated_code, | |
| 3553 | - deobfuscate=options.deobfuscate) | |
| 3554 | - elif options.output_mode in ('triage', 'unspecified'): | |
| 3555 | - # print container name when it changes: | |
| 3556 | - if container != previous_container: | |
| 3557 | - if container is not None: | |
| 3558 | - print('\nFiles in %s:' % container) | |
| 3559 | - previous_container = container | |
| 3560 | - # summarized output for triage: | |
| 3561 | - vba_parser.process_file_triage(show_decoded_strings=options.show_decoded_strings, | |
| 3562 | - deobfuscate=options.deobfuscate) | |
| 3563 | - elif options.output_mode == 'json': | |
| 3564 | - print_json( | |
| 3565 | - vba_parser.process_file_json(show_decoded_strings=options.show_decoded_strings, | |
| 3566 | - display_code=options.display_code, | |
| 3567 | - hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only, | |
| 3568 | - show_deobfuscated_code=options.show_deobfuscated_code, | |
| 3569 | - deobfuscate=options.deobfuscate)) | |
| 3570 | - else: # (should be impossible) | |
| 3571 | - raise ValueError('unexpected output mode: "{0}"!'.format(options.output_mode)) | |
| 3572 | - count += 1 | |
| 3573 | - | |
| 3574 | - except (SubstreamOpenError, UnexpectedDataError) as exc: | |
| 3575 | - if options.output_mode in ('triage', 'unspecified'): | |
| 3576 | - print('%-12s %s - Error opening substream or uenxpected ' \ | |
| 3577 | - 'content' % ('?', filename)) | |
| 3578 | - elif options.output_mode == 'json': | |
| 3579 | - print_json(file=filename, type='error', | |
| 3580 | - error=type(exc).__name__, message=str(exc)) | |
| 3581 | - else: | |
| 3582 | - log.exception('Error opening substream or unexpected ' | |
| 3583 | - 'content in %s' % filename) | |
| 3584 | - return_code = RETURN_OPEN_ERROR if return_code == 0 \ | |
| 3585 | - else RETURN_SEVERAL_ERRS | |
| 3586 | - except FileOpenError as exc: | |
| 3587 | - if options.output_mode in ('triage', 'unspecified'): | |
| 3588 | - print('%-12s %s - File format not supported' % ('?', filename)) | |
| 3589 | - elif options.output_mode == 'json': | |
| 3590 | - print_json(file=filename, type='error', | |
| 3591 | - error=type(exc).__name__, message=str(exc)) | |
| 3592 | - else: | |
| 3593 | - log.exception('Failed to open %s -- probably not supported!' % filename) | |
| 3594 | - return_code = RETURN_OPEN_ERROR if return_code == 0 \ | |
| 3595 | - else RETURN_SEVERAL_ERRS | |
| 3596 | - except ProcessingError as exc: | |
| 3597 | - if options.output_mode in ('triage', 'unspecified'): | |
| 3598 | - print('%-12s %s - %s' % ('!ERROR', filename, exc.orig_exc)) | |
| 3599 | - elif options.output_mode == 'json': | |
| 3600 | - print_json(file=filename, type='error', | |
| 3601 | - error=type(exc).__name__, | |
| 3602 | - message=str(exc.orig_exc)) | |
| 3603 | - else: | |
| 3604 | - log.exception('Error processing file %s (%s)!' | |
| 3605 | - % (filename, exc.orig_exc)) | |
| 3606 | - return_code = RETURN_PARSE_ERROR if return_code == 0 \ | |
| 3607 | - else RETURN_SEVERAL_ERRS | |
| 3608 | - except FileIsEncryptedError as exc: | |
| 3609 | - if options.output_mode in ('triage', 'unspecified'): | |
| 3610 | - print('%-12s %s - File is encrypted' % ('!ERROR', filename)) | |
| 3611 | - elif options.output_mode == 'json': | |
| 3612 | - print_json(file=filename, type='error', | |
| 3613 | - error=type(exc).__name__, message=str(exc)) | |
| 3614 | - else: | |
| 3615 | - log.exception('File %s is encrypted!' % (filename)) | |
| 3616 | - return_code = RETURN_ENCRYPTED if return_code == 0 \ | |
| 3617 | - else RETURN_SEVERAL_ERRS | |
| 3618 | - # Here we do not close the vba_parser, because process_file may need it below. | |
| 3619 | - | |
| 3620 | - finally: | |
| 3621 | - if vba_parser is not None: | |
| 3622 | - vba_parser.close() | |
| 3623 | - | |
| 3624 | - if options.output_mode == 'triage': | |
| 3625 | - print('\n(Flags: OpX=OpenXML, XML=Word2003XML, FlX=FlatOPC XML, MHT=MHTML, TXT=Text, M=Macros, ' \ | |
| 3626 | - 'A=Auto-executable, S=Suspicious keywords, I=IOCs, H=Hex strings, ' \ | |
| 3627 | - 'B=Base64 strings, D=Dridex strings, V=VBA strings, ?=Unknown)\n') | |
| 3628 | - | |
| 3629 | - if count == 1 and options.output_mode == 'unspecified': | |
| 3630 | - # if options -t, -d and -j were not specified and it's a single file, print details: | |
| 3631 | - vba_parser.process_file(show_decoded_strings=options.show_decoded_strings, | |
| 3632 | - display_code=options.display_code, | |
| 3633 | - hide_attributes=options.hide_attributes, vba_code_only=options.vba_code_only, | |
| 3634 | - show_deobfuscated_code=options.show_deobfuscated_code, | |
| 3635 | - deobfuscate=options.deobfuscate) | |
| 3636 | - | |
| 3637 | - if options.output_mode == 'json': | |
| 3638 | - # print last json entry (a last one without a comma) and closing ] | |
| 3639 | - print_json(type='MetaInformation', return_code=return_code, | |
| 3640 | - n_processed=count, _json_is_last=True) | |
| 3641 | - | |
| 3642 | - except Exception as exc: | |
| 3643 | - # some unexpected error, maybe some of the types caught in except clauses | |
| 3644 | - # above were not sufficient. This is very bad, so log complete trace at exception level | |
| 3645 | - # and do not care about output mode | |
| 3646 | - log.exception('Unhandled exception in main: %s' % exc, exc_info=True) | |
| 3647 | - return_code = RETURN_UNEXPECTED # even if there were others before -- this is more important | |
| 3648 | - # TODO: print msg with URL to report issues (except in JSON mode) | |
| 3649 | - | |
| 3650 | - # done. exit | |
| 3651 | - log.debug('will exit now with code %s' % return_code) | |
| 3652 | - sys.exit(return_code) | |
| 19 | +from oletools.olevba import * | |
| 20 | +from oletools.olevba import __doc__, __version__ | |
| 3653 | 21 | |
| 3654 | 22 | if __name__ == '__main__': |
| 3655 | 23 | main() |
| 3656 | 24 | |
| 3657 | -# This was coded while listening to "Dust" from I Love You But I've Chosen Darkness | ... | ... |
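The replacement above reduces this file to a thin wrapper: a star import followed by an explicit import of `__doc__` and `__version__`. The explicit line is needed because `from module import *` skips names that start with an underscore unless the module lists them in `__all__`. A minimal sketch of that behaviour, using a throwaway module (`demo_impl`, `make_demo_module` and `names_after` are hypothetical names for illustration, not part of oletools):

```python
import sys
import types


def make_demo_module(name='demo_impl'):
    """Register a throwaway module with one public and one dunder name."""
    mod = types.ModuleType(name)
    mod.__version__ = '1.0'
    mod.main = lambda: 'ran'
    sys.modules[name] = mod
    return mod


def names_after(statement):
    """Run an import statement in a fresh namespace and return that namespace."""
    namespace = {}
    exec(statement, namespace)
    return namespace


make_demo_module()
star = names_after('from demo_impl import *')
explicit = names_after('from demo_impl import __version__')
# The star import brings in main() but not __version__;
# the explicit import is what makes __version__ visible.
```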
oletools/ooxml.py
| ... | ... | @@ -16,11 +16,11 @@ TODO: "xml2003" == "flatopc"? |
| 16 | 16 | """ |
| 17 | 17 | |
| 18 | 18 | import sys |
| 19 | -from oletools.common.log_helper import log_helper | |
| 20 | 19 | from zipfile import ZipFile, BadZipfile, is_zipfile |
| 21 | 20 | from os.path import splitext |
| 22 | 21 | import io |
| 23 | 22 | import re |
| 23 | +from oletools.common.log_helper import log_helper | |
| 24 | 24 | |
| 25 | 25 | # import lxml or ElementTree for XML parsing: |
| 26 | 26 | try: |
| ... | ... | @@ -107,16 +107,14 @@ def debug_str(elem): |
| 107 | 107 | text = u', '.join(parts) |
| 108 | 108 | if len(text) > 150: |
| 109 | 109 | return text[:147] + u'...]' |
| 110 | - else: | |
| 111 | - return text + u']' | |
| 110 | + return text + u']' | |
| 112 | 111 | |
| 113 | 112 | |
| 114 | 113 | def isstr(some_var): |
| 115 | 114 | """ version-independent test for isinstance(some_var, (str, unicode)) """ |
| 116 | 115 | if sys.version_info.major == 2: |
| 117 | 116 | return isinstance(some_var, basestring) # true for str and unicode |
| 118 | - else: | |
| 119 | - return isinstance(some_var, str) # there is no unicode | |
| 117 | + return isinstance(some_var, str) # there is no unicode | |
| 120 | 118 | |
| 121 | 119 | |
| 122 | 120 | ############################################################################### |
| ... | ... | @@ -136,23 +134,29 @@ def get_type(filename): |
| 136 | 134 | prog_id = match.groups()[0] |
| 137 | 135 | if prog_id == WORD_XML_PROG_ID: |
| 138 | 136 | return DOCTYPE_WORD_XML |
| 139 | - elif prog_id == EXCEL_XML_PROG_ID: | |
| 137 | + if prog_id == EXCEL_XML_PROG_ID: | |
| 140 | 138 | return DOCTYPE_EXCEL_XML |
| 141 | - else: | |
| 142 | - return DOCTYPE_NONE | |
| 139 | + return DOCTYPE_NONE | |
| 143 | 140 | |
| 144 | 141 | is_doc = False |
| 145 | 142 | is_xls = False |
| 146 | 143 | is_ppt = False |
| 147 | - for _, elem, _ in parser.iter_xml(FILE_CONTENT_TYPES): | |
| 148 | - logger.debug(u' ' + debug_str(elem)) | |
| 149 | - try: | |
| 150 | - content_type = elem.attrib['ContentType'] | |
| 151 | - except KeyError: # ContentType not an attr | |
| 152 | - continue | |
| 153 | - is_xls |= content_type.startswith(CONTENT_TYPES_EXCEL) | |
| 154 | - is_doc |= content_type.startswith(CONTENT_TYPES_WORD) | |
| 155 | - is_ppt |= content_type.startswith(CONTENT_TYPES_PPT) | |
| 144 | + try: | |
| 145 | + for _, elem, _ in parser.iter_xml(FILE_CONTENT_TYPES): | |
| 146 | + logger.debug(u' ' + debug_str(elem)) | |
| 147 | + try: | |
| 148 | + content_type = elem.attrib['ContentType'] | |
| 149 | + except KeyError: # ContentType not an attr | |
| 150 | + continue | |
| 151 | + is_xls |= content_type.startswith(CONTENT_TYPES_EXCEL) | |
| 152 | + is_doc |= content_type.startswith(CONTENT_TYPES_WORD) | |
| 153 | + is_ppt |= content_type.startswith(CONTENT_TYPES_PPT) | |
| 154 | + except BadOOXML as oo_err: | |
| 155 | + if oo_err.more_info.startswith('invalid subfile') and \ | |
| 156 | + FILE_CONTENT_TYPES in oo_err.more_info: | |
| 157 | + # no FILE_CONTENT_TYPES in zip, so probably no ms office xml. | |
| 158 | + return DOCTYPE_NONE | |
| 159 | + raise | |
| 156 | 160 | |
| 157 | 161 | if is_doc and not is_xls and not is_ppt: |
| 158 | 162 | return DOCTYPE_WORD |
| ... | ... | @@ -162,9 +166,8 @@ def get_type(filename): |
| 162 | 166 | return DOCTYPE_POWERPOINT |
| 163 | 167 | if not is_doc and not is_xls and not is_ppt: |
| 164 | 168 | return DOCTYPE_NONE |
| 165 | - else: | |
| 166 | - logger.warning('Encountered contradictory content types') | |
| 167 | - return DOCTYPE_MIXED | |
| 169 | + logger.warning('Encountered contradictory content types') | |
| 170 | + return DOCTYPE_MIXED | |
| 168 | 171 | |
| 169 | 172 | |
| 170 | 173 | def is_ooxml(filename): |
| ... | ... | @@ -177,6 +180,7 @@ def is_ooxml(filename): |
| 177 | 180 | return False |
| 178 | 181 | if doctype == DOCTYPE_NONE: |
| 179 | 182 | return False |
| 183 | + return True | |
| 180 | 184 | |
| 181 | 185 | |
| 182 | 186 | ############################################################################### |
| ... | ... | @@ -216,6 +220,7 @@ class ZipSubFile(object): |
| 216 | 220 | See also (and maybe could some day merge with): |
| 217 | 221 | ppt_record_parser.IterStream; also: oleobj.FakeFile |
| 218 | 222 | """ |
| 223 | + CHUNK_SIZE = 4096 | |
| 219 | 224 | |
| 220 | 225 | def __init__(self, container, filename, mode='r', size=None): |
| 221 | 226 | """ remember all necessary vars but do not open yet """ |
| ... | ... | @@ -253,7 +258,7 @@ class ZipSubFile(object): |
| 253 | 258 | # print('ZipSubFile: opened; size={}'.format(self.size)) |
| 254 | 259 | return self |
| 255 | 260 | |
| 256 | - def write(self, *args, **kwargs): # pylint: disable=unused-argument,no-self-use | |
| 261 | + def write(self, *args, **kwargs): | |
| 257 | 262 | """ write is not allowed """ |
| 258 | 263 | raise IOError('writing not implemented') |
| 259 | 264 | |
| ... | ... | @@ -311,10 +316,9 @@ class ZipSubFile(object): |
| 311 | 316 | """ helper for seek: skip forward by given amount using read() """ |
| 312 | 317 | # print('ZipSubFile: seek by skipping {} bytes starting at {}' |
| 313 | 318 | # .format(self.pos, to_skip)) |
| 314 | - CHUNK_SIZE = 4096 | |
| 315 | - n_chunks, leftover = divmod(to_skip, CHUNK_SIZE) | |
| 319 | + n_chunks, leftover = divmod(to_skip, self.CHUNK_SIZE) | |
| 316 | 320 | for _ in range(n_chunks): |
| 317 | - self.read(CHUNK_SIZE) # just read and discard | |
| 321 | + self.read(self.CHUNK_SIZE) # just read and discard | |
| 318 | 322 | self.read(leftover) |
| 319 | 323 | # print('ZipSubFile: seek by skipping done, pos now {}' |
| 320 | 324 | # .format(self.pos)) |
| ... | ... | @@ -417,8 +421,7 @@ class XmlParser(object): |
| 417 | 421 | if match: |
| 418 | 422 | self._is_single_xml = True |
| 419 | 423 | return True |
| 420 | - if not match: | |
| 421 | - raise BadOOXML(self.filename, 'is no zip and has no prog_id') | |
| 424 | + raise BadOOXML(self.filename, 'is no zip and has no prog_id') | |
| 422 | 425 | |
| 423 | 426 | def iter_files(self, args=None): |
| 424 | 427 | """ Find files in zip or just give single xml file """ |
| ... | ... | @@ -433,17 +436,14 @@ class XmlParser(object): |
| 433 | 436 | subfiles = None |
| 434 | 437 | try: |
| 435 | 438 | zipper = ZipFile(self.filename) |
| 436 | - try: | |
| 437 | - _ = zipper.getinfo(FILE_CONTENT_TYPES) | |
| 438 | - except KeyError: | |
| 439 | - raise BadOOXML(self.filename, | |
| 440 | - 'No content type information') | |
| 441 | 439 | if not args: |
| 442 | 440 | subfiles = zipper.namelist() |
| 443 | 441 | elif isstr(args): |
| 444 | 442 | subfiles = [args, ] |
| 445 | 443 | else: |
| 446 | - subfiles = tuple(args) # make a copy in case orig changes | |
| 444 | + # make a copy in case original args are modified | |
| 445 | + # Not sure whether this really is needed... | |
| 446 | + subfiles = tuple(arg for arg in args) | |
| 447 | 447 | |
| 448 | 448 | for subfile in subfiles: |
| 449 | 449 | with zipper.open(subfile, 'r') as handle: |
| ... | ... | @@ -451,10 +451,12 @@ class XmlParser(object): |
| 451 | 451 | if not args: |
| 452 | 452 | self.did_iter_all = True |
| 453 | 453 | except KeyError as orig_err: |
| 454 | + # Note: do not change text of this message without adjusting | |
| 455 | + # conditions in except handlers | |
| 454 | 456 | raise BadOOXML(self.filename, |
| 455 | 457 | 'invalid subfile: ' + str(orig_err)) |
| 456 | 458 | except BadZipfile: |
| 457 | - raise BadOOXML(self.filename, 'neither zip nor xml') | |
| 459 | + raise BadOOXML(self.filename, 'not in zip format') | |
| 458 | 460 | finally: |
| 459 | 461 | if zipper: |
| 460 | 462 | zipper.close() |
| ... | ... | @@ -503,7 +505,7 @@ class XmlParser(object): |
| 503 | 505 | if event == 'start': |
| 504 | 506 | if elem.tag in want_tags: |
| 505 | 507 | logger.debug('remember start of tag {0} at {1}' |
| 506 | - .format(elem.tag, depth)) | |
| 508 | + .format(elem.tag, depth)) | |
| 507 | 509 | inside_tags.append((elem.tag, depth)) |
| 508 | 510 | depth += 1 |
| 509 | 511 | continue |
| ... | ... | @@ -519,18 +521,18 @@ class XmlParser(object): |
| 519 | 521 | inside_tags.pop() |
| 520 | 522 | else: |
| 521 | 523 | logger.error('found end for wanted tag {0} ' |
| 522 | - 'but last start tag {1} does not' | |
| 523 | - ' match'.format(curr_tag, | |
| 524 | - inside_tags[-1])) | |
| 524 | + 'but last start tag {1} does not' | |
| 525 | + ' match'.format(curr_tag, | |
| 526 | + inside_tags[-1])) | |
| 525 | 527 | # try to recover: close all deeper tags |
| 526 | 528 | while inside_tags and \ |
| 527 | 529 | inside_tags[-1][1] >= depth: |
| 528 | 530 | logger.debug('recover: pop {0}' |
| 529 | - .format(inside_tags[-1])) | |
| 531 | + .format(inside_tags[-1])) | |
| 530 | 532 | inside_tags.pop() |
| 531 | 533 | except IndexError: # no inside_tag[-1] |
| 532 | 534 | logger.error('found end of {0} at depth {1} but ' |
| 533 | - 'no start event') | |
| 535 | + 'no start event'.format(curr_tag, depth)) | |
| 534 | 536 | # yield element |
| 535 | 537 | if is_wanted or not want_tags: |
| 536 | 538 | yield subfile, elem, depth |
| ... | ... | @@ -544,7 +546,7 @@ class XmlParser(object): |
| 544 | 546 | except ET.ParseError as err: |
| 545 | 547 | self.subfiles_no_xml.add(subfile) |
| 546 | 548 | if subfile is None: # this is no zip subfile but single xml |
| 547 | - raise BadOOXML(self.filename, 'is neither zip nor xml') | |
| 549 | + raise BadOOXML(self.filename, 'content is not valid XML') | |
| 548 | 550 | elif subfile.endswith('.xml'): |
| 549 | 551 | log = logger.warning |
| 550 | 552 | else: |
| ... | ... | @@ -568,21 +570,30 @@ class XmlParser(object): |
| 568 | 570 | |
| 569 | 571 | defaults = [] |
| 570 | 572 | files = [] |
| 571 | - for _, elem, _ in self.iter_xml(FILE_CONTENT_TYPES): | |
| 572 | - if elem.tag.endswith('Default'): | |
| 573 | - extension = elem.attrib['Extension'] | |
| 574 | - if extension.startswith('.'): | |
| 575 | - extension = extension[1:] | |
| 576 | - defaults.append((extension, elem.attrib['ContentType'])) | |
| 577 | - logger.debug('found content type for extension {0[0]}: {0[1]}' | |
| 578 | - .format(defaults[-1])) | |
| 579 | - elif elem.tag.endswith('Override'): | |
| 580 | - subfile = elem.attrib['PartName'] | |
| 581 | - if subfile.startswith('/'): | |
| 582 | - subfile = subfile[1:] | |
| 583 | - files.append((subfile, elem.attrib['ContentType'])) | |
| 584 | - logger.debug('found content type for subfile {0[0]}: {0[1]}' | |
| 585 | - .format(files[-1])) | |
| 573 | + try: | |
| 574 | + for _, elem, _ in self.iter_xml(FILE_CONTENT_TYPES): | |
| 575 | + if elem.tag.endswith('Default'): | |
| 576 | + extension = elem.attrib['Extension'] | |
| 577 | + if extension.startswith('.'): | |
| 578 | + extension = extension[1:] | |
| 579 | + defaults.append((extension, elem.attrib['ContentType'])) | |
| 580 | + logger.debug('found content type for extension {0[0]}: ' | |
| 581 | + '{0[1]}'.format(defaults[-1])) | |
| 582 | + elif elem.tag.endswith('Override'): | |
| 583 | + subfile = elem.attrib['PartName'] | |
| 584 | + if subfile.startswith('/'): | |
| 585 | + subfile = subfile[1:] | |
| 586 | + files.append((subfile, elem.attrib['ContentType'])) | |
| 587 | + logger.debug('found content type for subfile {0[0]}: ' | |
| 588 | + '{0[1]}'.format(files[-1])) | |
| 589 | + except BadOOXML as oo_err: | |
| 590 | + if oo_err.more_info.startswith('invalid subfile') and \ | |
| 591 | + FILE_CONTENT_TYPES in oo_err.more_info: | |
| 592 | + # no FILE_CONTENT_TYPES in zip, so probably no ms office xml. | |
| 593 | + # Maybe OpenDocument format? In any case, try to analyze. | |
| 594 | + pass | |
| 595 | + else: | |
| 596 | + raise | |
| 586 | 597 | return dict(files), dict(defaults) |
| 587 | 598 | |
| 588 | 599 | def iter_non_xml(self): |
| ... | ... | @@ -599,7 +610,7 @@ class XmlParser(object): |
| 599 | 610 | """ |
| 600 | 611 | if not self.did_iter_all: |
| 601 | 612 | logger.warning('Did not iterate through complete file. ' |
| 602 | - 'Should run iter_xml() without args, first.') | |
| 613 | + 'Should run iter_xml() without args, first.') | |
| 603 | 614 | if not self.subfiles_no_xml: |
| 604 | 615 | return |
| 605 | 616 | |
| ... | ... | @@ -631,7 +642,7 @@ def test(): |
| 631 | 642 | |
| 632 | 643 | see module doc for more info |
| 633 | 644 | """ |
| 634 | - log_helper.enable_logging(False, logger.DEBUG) | |
| 645 | + log_helper.enable_logging(False, 'debug') | |
| 635 | 646 | if len(sys.argv) != 2: |
| 636 | 647 | print(u'To test this code, give me a single file as arg') |
| 637 | 648 | return 2 | ... | ... |
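The `get_type` and `get_content_types` hunks above stop treating a missing content-types part as a hard error: a zip without it is reported as "not OOXML" or analysed anyway. A minimal standalone sketch of the underlying check, assuming that `FILE_CONTENT_TYPES` is `'[Content_Types].xml'` (its value is not shown in this diff) and using a hypothetical helper name `has_content_types`:

```python
import io
import zipfile

# Assumption: this matches the value of ooxml.FILE_CONTENT_TYPES
CONTENT_TYPES_NAME = '[Content_Types].xml'


def has_content_types(file_or_path):
    """Return True if the zip container holds a content-types part."""
    if not zipfile.is_zipfile(file_or_path):
        return False
    with zipfile.ZipFile(file_or_path) as zipper:
        try:
            zipper.getinfo(CONTENT_TYPES_NAME)
        except KeyError:  # getinfo raises KeyError for a missing member
            return False
    return True


def make_zip(member_names):
    """Build a small in-memory zip with the given (empty-ish) members."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w') as zipper:
        for name in member_names:
            zipper.writestr(name, '<x/>')
    return buf
```

The real code takes a different route (it catches `BadOOXML` and inspects the message text), so this sketch only illustrates the condition being tested, not the module's control flow.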
oletools/ppt_parser.py
| ... | ... | @@ -43,7 +43,7 @@ file structure and will replace this module some time soon! |
| 43 | 43 | # 2017-04-23 v0.51 PL: - fixed absolute imports and issue #101 |
| 44 | 44 | # 2018-09-11 v0.54 PL: - olefile is now a dependency |
| 45 | 45 | |
| 46 | -__version__ = '0.54dev1' | |
| 46 | +__version__ = '0.54' | |
| 47 | 47 | |
| 48 | 48 | |
| 49 | 49 | # --- IMPORTS ------------------------------------------------------------------ | ... | ... |
oletools/ppt_record_parser.py
| ... | ... | @@ -63,7 +63,6 @@ except ImportError: |
| 63 | 63 | sys.path.insert(0, PARENT_DIR) |
| 64 | 64 | del PARENT_DIR |
| 65 | 65 | from oletools import record_base |
| 66 | -from oletools.common.errors import FileIsEncryptedError | |
| 67 | 66 | |
| 68 | 67 | |
| 69 | 68 | # types of relevant records (there are much more than listed here) |
| ... | ... | @@ -109,10 +108,11 @@ RECORD_TYPES = dict([ |
| 109 | 108 | ]) |
| 110 | 109 | |
| 111 | 110 | |
| 112 | -# record types where version is not 0x0 or 0xf | |
| 111 | +# record types where version is not 0x0 or 0x1 or 0xf | |
| 113 | 112 | VERSION_EXCEPTIONS = dict([ |
| 114 | 113 | (0x0400, 2), # rt_vbainfoatom |
| 115 | 114 | (0x03ef, 2), # rt_slideatom |
| 115 | + (0xe9c7, 7), # tests/test-data/encrypted/encrypted.ppt, not investigated | |
| 116 | 116 | ]) |
| 117 | 117 | |
| 118 | 118 | |
| ... | ... | @@ -149,6 +149,10 @@ def is_ppt(filename): |
| 149 | 149 | Param filename can be anything that OleFileIO constructor accepts: name of |
| 150 | 150 | file or file data or data stream. |
| 151 | 151 | |
| 152 | + Will not try to decrypt the file nor even try to determine whether it is | |
| 153 | + encrypted. If the file is encrypted will either raise an error or just | |
| 154 | + return `False`. | |
| 155 | + | |
| 152 | 156 | see also: oleid.OleID.check_powerpoint |
| 153 | 157 | """ |
| 154 | 158 | have_current_user = False |
| ... | ... | @@ -170,7 +174,7 @@ def is_ppt(filename): |
| 170 | 174 | for record in stream.iter_records(): |
| 171 | 175 | if record.type == 0x0ff5: # UserEditAtom |
| 172 | 176 | have_user_edit = True |
| 173 | - elif record.type == 0x1772: # PersisDirectoryAtom | |
| 177 | + elif record.type == 0x1772: # PersistDirectoryAtom | |
| 174 | 178 | have_persist_dir = True |
| 175 | 179 | elif record.type == 0x03e8: # DocumentContainer |
| 176 | 180 | have_document_container = True |
| ... | ... | @@ -181,13 +185,12 @@ def is_ppt(filename): |
| 181 | 185 | return True |
| 182 | 186 | else: # ignore other streams/storages since they are optional |
| 183 | 187 | continue |
| 184 | - except FileIsEncryptedError: | |
| 185 | - assert ppt_file is not None, \ | |
| 186 | - 'Encryption error should not be raised from just opening OLE file.' | |
| 187 | - # just rely on stream names, copied from oleid | |
| 188 | - return ppt_file.exists('PowerPoint Document') | |
| 189 | - except Exception: | |
| 190 | - pass | |
| 188 | + except Exception as exc: | |
| 189 | + logging.debug('Ignoring exception in is_ppt, assume is not ppt', | |
| 190 | + exc_info=True) | |
| 191 | + finally: | |
| 192 | + if ppt_file is not None: | |
| 193 | + ppt_file.close() | |
| 191 | 194 | return False |
| 192 | 195 | |
| 193 | 196 | ... | ... |
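The `VERSION_EXCEPTIONS` comment above refers to the version field of a PowerPoint record header. A minimal sketch of how such a header can be split, assuming the 8-byte little-endian layout from the MS-PPT `RecordHeader` structure (4 bits version, 12 bits instance, 16-bit type, 32-bit length); `parse_record_head` and `version_is_expected` are hypothetical names, and the module's own `read_record_head` is not shown in this diff:

```python
import struct

# Subset of VERSION_EXCEPTIONS from the hunk above
VERSION_EXCEPTIONS = {0x0400: 2, 0x03ef: 2}


def parse_record_head(data):
    """Split an 8-byte record header into (version, instance, type, length)."""
    ver_inst, rec_type, rec_len = struct.unpack('<HHL', data[:8])
    version = ver_inst & 0x000f   # low 4 bits
    instance = ver_inst >> 4      # remaining 12 bits
    return version, instance, rec_type, rec_len


def version_is_expected(version, rec_type):
    """Hypothetical check mirroring the comment on VERSION_EXCEPTIONS."""
    if rec_type in VERSION_EXCEPTIONS:
        return version == VERSION_EXCEPTIONS[rec_type]
    return version in (0x0, 0x1, 0xf)


# Example: a container record (version 0xf) of type 0x03e8, 1234 bytes long
head = struct.pack('<HHL', 0x000f, 0x03e8, 1234)
fields = parse_record_head(head)
```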
oletools/pyxswf.py
| ... | ... | @@ -25,7 +25,7 @@ http://www.decalage.info/python/oletools |
| 25 | 25 | |
| 26 | 26 | #=== LICENSE ================================================================= |
| 27 | 27 | |
| 28 | -# pyxswf is copyright (c) 2012-2016, Philippe Lagadec (http://www.decalage.info) | |
| 28 | +# pyxswf is copyright (c) 2012-2019, Philippe Lagadec (http://www.decalage.info) | |
| 29 | 29 | # All rights reserved. |
| 30 | 30 | # |
| 31 | 31 | # Redistribution and use in source and binary forms, with or without modification, |
| ... | ... | @@ -59,7 +59,7 @@ http://www.decalage.info/python/oletools |
| 59 | 59 | # 2016-11-01 PL: - replaced StringIO by BytesIO for Python 3 |
| 60 | 60 | # 2018-09-11 v0.54 PL: - olefile is now a dependency |
| 61 | 61 | |
| 62 | -__version__ = '0.54dev1' | |
| 62 | +__version__ = '0.54' | |
| 63 | 63 | |
| 64 | 64 | #------------------------------------------------------------------------------ |
| 65 | 65 | # TODO: | ... | ... |
oletools/record_base.py
| ... | ... | @@ -8,7 +8,10 @@ This is the case for xls and ppt, so classes are bases for xls_parser.py and |
| 8 | 8 | ppt_record_parser.py . |
| 9 | 9 | """ |
| 10 | 10 | |
| 11 | -# === LICENSE ================================================================= | |
| 11 | +# === LICENSE ================================================================== | |
| 12 | + | |
| 13 | +# record_base is copyright (c) 2014-2019 Philippe Lagadec (http://www.decalage.info) | |
| 14 | +# All rights reserved. | |
| 12 | 15 | # |
| 13 | 16 | # Redistribution and use in source and binary forms, with or without |
| 14 | 17 | # modification, are permitted provided that the following conditions are met: |
| ... | ... | @@ -37,8 +40,10 @@ from __future__ import print_function |
| 37 | 40 | # CHANGELOG: |
| 38 | 41 | # 2017-11-30 v0.01 CH: - first version based on xls_parser |
| 39 | 42 | # 2018-09-11 v0.54 PL: - olefile is now a dependency |
| 43 | +# 2019-01-30 PL: - fixed import to avoid mixing installed oletools | |
| 44 | +# and dev version | |
| 40 | 45 | |
| 41 | -__version__ = '0.54dev1' | |
| 46 | +__version__ = '0.54' | |
| 42 | 47 | |
| 43 | 48 | # ----------------------------------------------------------------------------- |
| 44 | 49 | # TODO: |
| ... | ... | @@ -63,16 +68,12 @@ import logging |
| 63 | 68 | |
| 64 | 69 | import olefile |
| 65 | 70 | |
| 66 | -try: | |
| 67 | - from oletools.common.errors import FileIsEncryptedError | |
| 68 | -except ImportError: | |
| 69 | - # little hack to allow absolute imports even if oletools is not installed. | |
| 70 | - PARENT_DIR = os.path.normpath(os.path.dirname(os.path.dirname( | |
| 71 | - os.path.abspath(__file__)))) | |
| 72 | - if PARENT_DIR not in sys.path: | |
| 73 | - sys.path.insert(0, PARENT_DIR) | |
| 74 | - del PARENT_DIR | |
| 75 | - from oletools.common.errors import FileIsEncryptedError | |
| 71 | +# little hack to allow absolute imports even if oletools is not installed. | |
| 72 | +PARENT_DIR = os.path.normpath(os.path.dirname(os.path.dirname( | |
| 73 | + os.path.abspath(__file__)))) | |
| 74 | +if PARENT_DIR not in sys.path: | |
| 75 | + sys.path.insert(0, PARENT_DIR) | |
| 76 | +del PARENT_DIR | |
| 76 | 77 | from oletools import oleid |
| 77 | 78 | |
| 78 | 79 | |
| ... | ... | @@ -125,10 +126,9 @@ class OleRecordFile(olefile.OleFileIO): |
| 125 | 126 | """ |
| 126 | 127 | |
| 127 | 128 | def open(self, filename, *args, **kwargs): |
| 128 | - """Call OleFileIO.open, raise error if is encrypted.""" | |
| 129 | + """Call OleFileIO.open.""" | |
| 129 | 130 | #super(OleRecordFile, self).open(filename, *args, **kwargs) |
| 130 | 131 | OleFileIO.open(self, filename, *args, **kwargs) |
| 131 | - self.is_encrypted = oleid.OleID(self).check_encrypted().value | |
| 132 | 132 | |
| 133 | 133 | @classmethod |
| 134 | 134 | def stream_class_for_name(cls, stream_name): |
| ... | ... | @@ -161,8 +161,7 @@ class OleRecordFile(olefile.OleFileIO): |
| 161 | 161 | stream = clz(self._open(direntry.isectStart, direntry.size), |
| 162 | 162 | direntry.size, |
| 163 | 163 | None if is_orphan else direntry.name, |
| 164 | - direntry.entry_type, | |
| 165 | - self.is_encrypted) | |
| 164 | + direntry.entry_type) | |
| 166 | 165 | yield stream |
| 167 | 166 | stream.close() |
| 168 | 167 | |
| ... | ... | @@ -175,14 +174,13 @@ class OleRecordStream(object): |
| 175 | 174 | abstract base class |
| 176 | 175 | """ |
| 177 | 176 | |
| 178 | - def __init__(self, stream, size, name, stream_type, is_encrypted=False): | |
| 177 | + def __init__(self, stream, size, name, stream_type): | |
| 179 | 178 | self.stream = stream |
| 180 | 179 | self.size = size |
| 181 | 180 | self.name = name |
| 182 | 181 | if stream_type not in ENTRY_TYPE2STR: |
| 183 | 182 | raise ValueError('Unknown stream type: {0}'.format(stream_type)) |
| 184 | 183 | self.stream_type = stream_type |
| 185 | - self.is_encrypted = is_encrypted | |
| 186 | 184 | |
| 187 | 185 | def read_record_head(self): |
| 188 | 186 | """ read first few bytes of record to determine size and type |
| ... | ... | @@ -211,9 +209,6 @@ class OleRecordStream(object): |
| 211 | 209 | |
| 212 | 210 | Stream must be positioned at start of records (e.g. start of stream). |
| 213 | 211 | """ |
| 214 | - if self.is_encrypted: | |
| 215 | - raise FileIsEncryptedError() | |
| 216 | - | |
| 217 | 212 | while True: |
| 218 | 213 | # unpacking as in olevba._extract_vba |
| 219 | 214 | pos = self.stream.tell() | ... | ... |
oletools/rtfobj.py
| ... | ... | @@ -17,7 +17,7 @@ http://www.decalage.info/python/oletools |
| 17 | 17 | |
| 18 | 18 | #=== LICENSE ================================================================= |
| 19 | 19 | |
| 20 | -# rtfobj is copyright (c) 2012-2018, Philippe Lagadec (http://www.decalage.info) | |
| 20 | +# rtfobj is copyright (c) 2012-2019, Philippe Lagadec (http://www.decalage.info) | |
| 21 | 21 | # All rights reserved. |
| 22 | 22 | # |
| 23 | 23 | # Redistribution and use in source and binary forms, with or without modification, |
| ... | ... | @@ -88,8 +88,10 @@ http://www.decalage.info/python/oletools |
| 88 | 88 | # 2018-05-31 v0.53.1 PP: - fixed issue #316: whitespace after \bin on Python 3 |
| 89 | 89 | # 2018-06-22 v0.53.2 PL: - fixed issue #327: added "\pnaiu" & "\pnaiud" |
| 90 | 90 | # 2018-09-11 v0.54 PL: - olefile is now a dependency |
| 91 | +# 2019-07-08 v0.55 MM: - added URL carver for CVE-2017-0199 (OLE2Link) PR #460 | |
| 92 | +# - added SCT to the list of executable file extensions PR #461 | |
| 91 | 93 | |
| 92 | -__version__ = '0.54dev1' | |
| 94 | +__version__ = '0.55.dev3' | |
| 93 | 95 | |
| 94 | 96 | # ------------------------------------------------------------------------------ |
| 95 | 97 | # TODO: |
| ... | ... | @@ -103,7 +105,7 @@ __version__ = '0.54dev1' |
| 103 | 105 | |
| 104 | 106 | # === IMPORTS ================================================================= |
| 105 | 107 | |
| 106 | -import re, os, sys, binascii, logging, optparse | |
| 108 | +import re, os, sys, binascii, logging, optparse, hashlib | |
| 107 | 109 | import os.path |
| 108 | 110 | from time import time |
| 109 | 111 | |
| ... | ... | @@ -268,7 +270,7 @@ re_delim_hexblock = re.compile(DELIMITER + PATTERN) |
| 268 | 270 | |
| 269 | 271 | # TODO: use a frozenset instead of a regex? |
| 270 | 272 | re_executable_extensions = re.compile( |
| 271 | - r"(?i)\.(EXE|COM|PIF|GADGET|MSI|MSP|MSC|VBS|VBE|VB|JSE|JS|WSF|WSC|WSH|WS|BAT|CMD|DLL|SCR|HTA|CPL|CLASS|JAR|PS1XML|PS1|PS2XML|PS2|PSC1|PSC2|SCF|LNK|INF|REG)\b") | |
| 273 | + r"(?i)\.(BAT|CLASS|CMD|CPL|DLL|EXE|COM|GADGET|HTA|INF|JAR|JS|JSE|LNK|MSC|MSI|MSP|PIF|PS1|PS1XML|PS2|PS2XML|PSC1|PSC2|REG|SCF|SCR|SCT|VB|VBE|VBS|WS|WSC|WSF|WSH)\b") | |
| 272 | 274 | |
| 273 | 275 | # Destination Control Words, according to MS RTF Specifications v1.9.1: |
| 274 | 276 | DESTINATION_CONTROL_WORDS = frozenset(( |
| ... | ... | @@ -678,6 +680,7 @@ class RtfObjParser(RtfParser): |
| 678 | 680 | rtfobj.hexdata = hexdata |
| 679 | 681 | object_data = binascii.unhexlify(hexdata) |
| 680 | 682 | rtfobj.rawdata = object_data |
| 683 | + rtfobj.rawdata_md5 = hashlib.md5(object_data).hexdigest() | |
| 681 | 684 | # TODO: check if all hex data is extracted properly |
| 682 | 685 | |
| 683 | 686 | obj = oleobj.OleObject() |
| ... | ... | @@ -687,6 +690,7 @@ class RtfObjParser(RtfParser): |
| 687 | 690 | rtfobj.class_name = obj.class_name |
| 688 | 691 | rtfobj.oledata_size = obj.data_size |
| 689 | 692 | rtfobj.oledata = obj.data |
| 693 | + rtfobj.oledata_md5 = hashlib.md5(obj.data).hexdigest() | |
| 690 | 694 | rtfobj.is_ole = True |
| 691 | 695 | if obj.class_name.lower() == b'package': |
| 692 | 696 | opkg = oleobj.OleNativeStream(bindata=obj.data, |
| ... | ... | @@ -695,6 +699,7 @@ class RtfObjParser(RtfParser): |
| 695 | 699 | rtfobj.src_path = opkg.src_path |
| 696 | 700 | rtfobj.temp_path = opkg.temp_path |
| 697 | 701 | rtfobj.olepkgdata = opkg.data |
| 702 | + rtfobj.olepkgdata_md5 = hashlib.md5(opkg.data).hexdigest() | |
| 698 | 703 | rtfobj.is_package = True |
| 699 | 704 | else: |
| 700 | 705 | if olefile.isOleFile(obj.data): |
| ... | ... | @@ -878,15 +883,23 @@ def process_file(container, filename, data, output_dir=None, save_object=False): |
| 878 | 883 | ole_column += '\nFilename: %r' % rtfobj.filename |
| 879 | 884 | ole_column += '\nSource path: %r' % rtfobj.src_path |
| 880 | 885 | ole_column += '\nTemp path = %r' % rtfobj.temp_path |
| 886 | + ole_column += '\nMD5 = %r' % rtfobj.olepkgdata_md5 | |
| 881 | 887 | ole_color = 'yellow' |
| 882 | 888 | # check if the file extension is executable: |
| 883 | - _, ext = os.path.splitext(rtfobj.filename) | |
| 884 | - log.debug('File extension: %r' % ext) | |
| 885 | - if re_executable_extensions.match(ext): | |
| 889 | + | |
| 890 | + _, temp_ext = os.path.splitext(rtfobj.temp_path) | |
| 891 | + log.debug('Temp path extension: %r' % temp_ext) | |
| 892 | + _, file_ext = os.path.splitext(rtfobj.filename) | |
| 893 | + log.debug('File extension: %r' % file_ext) | |
| 894 | + | |
| 895 | + if temp_ext != file_ext: | |
| 896 | + ole_column += "\nMODIFIED FILE EXTENSION" | |
| 897 | + | |
| 898 | + if re_executable_extensions.match(temp_ext) or re_executable_extensions.match(file_ext): | |
| 886 | 899 | ole_color = 'red' |
| 887 | 900 | ole_column += '\nEXECUTABLE FILE' |
| 888 | - # else: | |
| 889 | - # pkg_column = 'Not an OLE Package' | |
| 901 | + else: | |
| 902 | + ole_column += '\nMD5 = %r' % rtfobj.oledata_md5 | |
| 890 | 903 | if rtfobj.clsid is not None: |
| 891 | 904 | ole_column += '\nCLSID: %s' % rtfobj.clsid |
| 892 | 905 | ole_column += '\n%s' % rtfobj.clsid_desc |
| ... | ... | @@ -896,7 +909,28 @@ def process_file(container, filename, data, output_dir=None, save_object=False): |
| 896 | 909 | # http://www.kb.cert.org/vuls/id/921560 |
| 897 | 910 | if rtfobj.class_name == b'OLE2Link': |
| 898 | 911 | ole_color = 'red' |
| 899 | - ole_column += '\nPossibly an exploit for the OLE2Link vulnerability (VU#921560, CVE-2017-0199)' | |
| 912 | + ole_column += '\nPossibly an exploit for the OLE2Link vulnerability (VU#921560, CVE-2017-0199)\n' | |
| 913 | + # https://bitbucket.org/snippets/Alexander_Hanel/7Adpp | |
| 914 | + found_list = re.findall(r'[a-fA-F0-9\x0D\x0A]{128,}',data) | |
| 915 | + urls = [] | |
| 916 | + for item in found_list: | |
| 917 | + try: | |
| 918 | + temp = item.replace("\x0D\x0A","").decode("hex") | |
| 919 | + except: | |
| 920 | + continue | |
| 921 | + pat = re.compile(r'(?:[\x20-\x7E][\x00]){3,}') | |
| 922 | + words = [w.decode('utf-16le') for w in pat.findall(temp)] | |
| 923 | + for w in words: | |
| 924 | + if "http" in w: | |
| 925 | + urls.append(w) | |
| 926 | + urls = sorted(set(urls)) | |
| 927 | + if urls: | |
| 928 | + ole_column += 'URL extracted: ' + ', '.join(urls) | |
| 929 | + # Detect Equation Editor exploit | |
| 930 | + # https://www.kb.cert.org/vuls/id/421280/ | |
| 931 | + elif rtfobj.class_name.lower() == b'equation.3': | |
| 932 | + ole_color = 'red' | |
| 933 | + ole_column += '\nPossibly an exploit for the Equation Editor vulnerability (VU#421280, CVE-2017-11882)' | |
| 900 | 934 | else: |
| 901 | 935 | ole_column = 'Not a well-formed OLE object' |
| 902 | 936 | tstream.write_row(( |
| ... | ... | @@ -930,6 +964,7 @@ def process_file(container, filename, data, output_dir=None, save_object=False): |
| 930 | 964 | else: |
| 931 | 965 | fname = '%s_object_%08X.noname' % (fname_prefix, rtfobj.start) |
| 932 | 966 | print(' saving to file %s' % fname) |
| 967 | + print(' md5 %s' % rtfobj.olepkgdata_md5) | |
| 933 | 968 | open(fname, 'wb').write(rtfobj.olepkgdata) |
| 934 | 969 | # When format_id=TYPE_LINKED, oledata_size=None |
| 935 | 970 | elif rtfobj.is_ole and rtfobj.oledata_size is not None: |
| ... | ... | @@ -947,11 +982,13 @@ def process_file(container, filename, data, output_dir=None, save_object=False): |
| 947 | 982 | ext = 'bin' |
| 948 | 983 | fname = '%s_object_%08X.%s' % (fname_prefix, rtfobj.start, ext) |
| 949 | 984 | print(' saving to file %s' % fname) |
| 985 | + print(' md5 %s' % rtfobj.oledata_md5) | |
| 950 | 986 | open(fname, 'wb').write(rtfobj.oledata) |
| 951 | 987 | else: |
| 952 | 988 | print('Saving raw data in object #%d:' % i) |
| 953 | 989 | fname = '%s_object_%08X.raw' % (fname_prefix, rtfobj.start) |
| 954 | 990 | print(' saving object to file %s' % fname) |
| 991 | + print(' md5 %s' % rtfobj.rawdata_md5) | |
| 955 | 992 | open(fname, 'wb').write(rtfobj.rawdata) |
| 956 | 993 | |
| 957 | 994 | |
| ... | ... | @@ -1035,4 +1072,3 @@ if __name__ == '__main__': |
| 1035 | 1072 | main() |
| 1036 | 1073 | |
| 1037 | 1074 | # This code was developed while listening to The Mary Onettes "Lost" |
| 1038 | - | ... | ... |
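The URL carver added in `process_file` above calls `item.replace(...).decode("hex")`, which only works on Python 2 `str` objects. A minimal Python 3 sketch of the same idea, operating on bytes and using `binascii.unhexlify`; `carve_urls` is a hypothetical helper name, and unlike the diff it strips stray `\r` and `\n` individually rather than only `\r\n` pairs:

```python
import binascii
import re

# Long runs of hex digits, possibly broken up by CR/LF
RE_HEX_RUN = re.compile(rb'[a-fA-F0-9\x0D\x0A]{128,}')
# At least three printable ASCII characters encoded as UTF-16LE
RE_UTF16_TEXT = re.compile(rb'(?:[\x20-\x7E][\x00]){3,}')


def carve_urls(data):
    """Return sorted unique UTF-16LE strings containing 'http' found in hex runs."""
    urls = set()
    for item in RE_HEX_RUN.findall(data):
        hexdigits = item.replace(b'\r', b'').replace(b'\n', b'')
        try:
            decoded = binascii.unhexlify(hexdigits)
        except (binascii.Error, ValueError):
            continue  # e.g. odd number of hex digits: skip this run
        for word in RE_UTF16_TEXT.findall(decoded):
            text = word.decode('utf-16le')
            if 'http' in text:
                urls.add(text)
    return sorted(urls)


# Build a fake RTF fragment whose hex data hides a UTF-16LE URL
payload = b'\x01\x02' * 20 + 'http://example.com/a.hta'.encode('utf-16le') + b'\x00' * 20
hexdata = binascii.hexlify(payload)
sample = b'{\\rtf1 ' + hexdata[:100] + b'\r\n' + hexdata[100:] + b'}'
found = carve_urls(sample)
```

Catching only decoding errors, instead of the bare `except:` in the diff, keeps unrelated failures visible.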
oletools/thirdparty/oledump/__init__.py
0 → 100644
oletools/thirdparty/oledump/plugin_biff.py
0 → 100644
| 1 | +#!/usr/bin/env python | |
| 2 | + | |
| 3 | +__description__ = 'BIFF plugin for oledump.py' | |
| 4 | +__author__ = 'Didier Stevens' | |
| 5 | +__version__ = '0.0.5' | |
| 6 | +__date__ = '2019/03/06' | |
| 7 | + | |
| 8 | +# Slightly modified version by Philippe Lagadec to be imported into olevba | |
| 9 | + | |
| 10 | +""" | |
| 11 | + | |
| 12 | +Source code put in public domain by Didier Stevens, no Copyright | |
| 13 | +https://DidierStevens.com | |
| 14 | +Use at your own risk | |
| 15 | + | |
| 16 | +History: | |
| 17 | + 2014/11/15: start | |
| 18 | + 2014/11/21: changed interface: added options; added options -a (asciidump) and -s (strings) | |
| 19 | + 2017/12/10: 0.0.2 added optparse & option -o | |
| 20 | + 2017/12/12: added option -f | |
| 21 | + 2017/12/13: added 0x support for option -f | |
| 22 | + 2018/10/24: 0.0.3 started coding Excel 4.0 macro support | |
| 23 | + 2018/10/25: continue | |
| 24 | + 2018/10/26: continue | |
| 25 | + 2019/01/05: 0.0.4 added option -x | |
| 26 | + 2019/03/06: 0.0.5 enhanced parsing of formula expressions | |
| 27 | + | |
| 28 | +Todo: | |
| 29 | +""" | |
| 30 | + | |
| 31 | +import struct | |
| 32 | +import re | |
| 33 | +import optparse | |
| 34 | +import binascii | |
| 35 | +import sys | |
| 36 | + | |
| 37 | +# from olevba: | |
| 38 | + | |
| 39 | +if sys.version_info[0] <= 2: | |
| 40 | + # Python 2.x | |
| 41 | + PYTHON2 = True | |
| 42 | +else: | |
| 43 | + # Python 3.x+ | |
| 44 | + PYTHON2 = False | |
| 45 | + | |
| 46 | +def unicode2str(unicode_string): | |
| 47 | +    """ | |
| 48 | +    convert a unicode string to a native str: | |
| 49 | +    - on Python 3, it returns the same string | |
| 50 | +    - on Python 2, the string is encoded with UTF-8 to a bytes str | |
| 51 | +    :param unicode_string: unicode string to be converted | |
| 52 | +    :return: the string converted to str | |
| 53 | +    :rtype: str | |
| 54 | +    """ | |
| 55 | +    if PYTHON2: | |
| 56 | +        return unicode_string.encode('utf8', errors='replace') | |
| 57 | +    else: | |
| 58 | +        return unicode_string | |
| 59 | + | |
| 60 | + | |
| 61 | +def bytes2str(bytes_string, encoding='utf8'): | |
| 62 | +    """ | |
| 63 | +    convert a bytes string to a native str: | |
| 64 | +    - on Python 2, it returns the same string (bytes=str) | |
| 65 | +    - on Python 3, the string is decoded using the provided encoding | |
| 66 | +      (UTF-8 by default) to a unicode str | |
| 67 | +    :param bytes_string: bytes string to be converted | |
| 68 | +    :param encoding: codec to be used for decoding | |
| 69 | +    :return: the string converted to str | |
| 70 | +    :rtype: str | |
| 71 | +    """ | |
| 72 | +    if PYTHON2: | |
| 73 | +        return bytes_string | |
| 74 | +    else: | |
| 75 | +        return bytes_string.decode(encoding, errors='replace') | |
| 76 | + | |
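Review note: a minimal sketch of what `bytes2str` above does with malformed input. The helper is repeated here only so the snippet runs on its own, and the sample byte strings are illustrative rather than taken from a real document.

```python
import sys

PYTHON2 = sys.version_info[0] <= 2


def bytes2str(bytes_string, encoding='utf8'):
    """Same logic as the helper in the diff, repeated so this runs standalone."""
    if PYTHON2:
        return bytes_string
    return bytes_string.decode(encoding, errors='replace')


# Valid UTF-8 decodes cleanly. With errors='replace', an invalid byte is
# turned into U+FFFD on Python 3 instead of raising UnicodeDecodeError.
clean = bytes2str(b'Sheet1')
damaged = bytes2str(b'abc\xff')
print(repr(clean), repr(damaged))
```

Because of `errors='replace'`, callers never see a decoding exception, at the cost of silently losing the original byte values.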
| 77 | + | |
| 78 | +dTokens = { | |
| 79 | +0x01: 'ptgExp', | |
| 80 | +0x02: 'ptgTbl', | |
| 81 | +0x03: 'ptgAdd', | |
| 82 | +0x04: 'ptgSub', | |
| 83 | +0x05: 'ptgMul', | |
| 84 | +0x06: 'ptgDiv', | |
| 85 | +0x07: 'ptgPower', | |
| 86 | +0x08: 'ptgConcat', | |
| 87 | +0x09: 'ptgLT', | |
| 88 | +0x0A: 'ptgLE', | |
| 89 | +0x0B: 'ptgEQ', | |
| 90 | +0x0C: 'ptgGE', | |
| 91 | +0x0D: 'ptgGT', | |
| 92 | +0x0E: 'ptgNE', | |
| 93 | +0x0F: 'ptgIsect', | |
| 94 | +0x10: 'ptgUnion', | |
| 95 | +0x11: 'ptgRange', | |
| 96 | +0x12: 'ptgUplus', | |
| 97 | +0x13: 'ptgUminus', | |
| 98 | +0x14: 'ptgPercent', | |
| 99 | +0x15: 'ptgParen', | |
| 100 | +0x16: 'ptgMissArg', | |
| 101 | +0x17: 'ptgStr', | |
| 102 | +0x19: 'ptgAttr', | |
| 103 | +0x1A: 'ptgSheet', | |
| 104 | +0x1B: 'ptgEndSheet', | |
| 105 | +0x1C: 'ptgErr', | |
| 106 | +0x1D: 'ptgBool', | |
| 107 | +0x1E: 'ptgInt', | |
| 108 | +0x1F: 'ptgNum', | |
| 109 | +0x20: 'ptgArray', | |
| 110 | +0x21: 'ptgFunc', | |
| 111 | +0x22: 'ptgFuncVar', | |
| 112 | +0x23: 'ptgName', | |
| 113 | +0x24: 'ptgRef', | |
| 114 | +0x25: 'ptgArea', | |
| 115 | +0x26: 'ptgMemArea', | |
| 116 | +0x27: 'ptgMemErr', | |
| 117 | +0x28: 'ptgMemNoMem', | |
| 118 | +0x29: 'ptgMemFunc', | |
| 119 | +0x2A: 'ptgRefErr', | |
| 120 | +0x2B: 'ptgAreaErr', | |
| 121 | +0x2C: 'ptgRefN', | |
| 122 | +0x2D: 'ptgAreaN', | |
| 123 | +0x2E: 'ptgMemAreaN', | |
| 124 | +0x2F: 'ptgMemNoMemN', | |
| 125 | +0x39: 'ptgNameX', | |
| 126 | +0x3A: 'ptgRef3d', | |
| 127 | +0x3B: 'ptgArea3d', | |
| 128 | +0x3C: 'ptgRefErr3d', | |
| 129 | +0x3D: 'ptgAreaErr3d', | |
| 130 | +0x40: 'ptgArrayV', | |
| 131 | +0x41: 'ptgFuncV', | |
| 132 | +0x42: 'ptgFuncVarV', | |
| 133 | +0x43: 'ptgNameV', | |
| 134 | +0x44: 'ptgRefV', | |
| 135 | +0x45: 'ptgAreaV', | |
| 136 | +0x46: 'ptgMemAreaV', | |
| 137 | +0x47: 'ptgMemErrV', | |
| 138 | +0x48: 'ptgMemNoMemV', | |
| 139 | +0x49: 'ptgMemFuncV', | |
| 140 | +0x4A: 'ptgRefErrV', | |
| 141 | +0x4B: 'ptgAreaErrV', | |
| 142 | +0x4C: 'ptgRefNV', | |
| 143 | +0x4D: 'ptgAreaNV', | |
| 144 | +0x4E: 'ptgMemAreaNV', | |
| 145 | +0x4F: 'ptgMemNoMemNV', | |
| 146 | +0x58: 'ptgFuncCEV', | |
| 147 | +0x59: 'ptgNameXV', | |
| 148 | +0x5A: 'ptgRef3dV', | |
| 149 | +0x5B: 'ptgArea3dV', | |
| 150 | +0x5C: 'ptgRefErr3dV', | |
| 151 | +0x5D: 'ptgAreaErr3dV', | |
| 152 | +0x60: 'ptgArrayA', | |
| 153 | +0x61: 'ptgFuncA', | |
| 154 | +0x62: 'ptgFuncVarA', | |
| 155 | +0x63: 'ptgNameA', | |
| 156 | +0x64: 'ptgRefA', | |
| 157 | +0x65: 'ptgAreaA', | |
| 158 | +0x66: 'ptgMemAreaA', | |
| 159 | +0x67: 'ptgMemErrA', | |
| 160 | +0x68: 'ptgMemNoMemA', | |
| 161 | +0x69: 'ptgMemFuncA', | |
| 162 | +0x6A: 'ptgRefErrA', | |
| 163 | +0x6B: 'ptgAreaErrA', | |
| 164 | +0x6C: 'ptgRefNA', | |
| 165 | +0x6D: 'ptgAreaNA', | |
| 166 | +0x6E: 'ptgMemAreaNA', | |
| 167 | +0x6F: 'ptgMemNoMemNA', | |
| 168 | +0x78: 'ptgFuncCEA', | |
| 169 | +0x79: 'ptgNameXA', | |
| 170 | +0x7A: 'ptgRef3dA', | |
| 171 | +0x7B: 'ptgArea3dA', | |
| 172 | +0x7C: 'ptgRefErr3dA', | |
| 173 | +0x7D: 'ptgAreaErr3dA', | |
| 174 | +} | |
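Review note: the `dTokens` table above follows a regular pattern for tokens from 0x20 upward: the `V` and `A` variants of a token sit 0x20 and 0x40 above the plain one (for example 0x21 `ptgFunc`, 0x41 `ptgFuncV`, 0x61 `ptgFuncA`). The sketch below derives the name from that pattern. The function `ptg_name` and the excerpt table are hypothetical and not part of the plugin, which simply looks the byte up in `dTokens`.

```python
# Small excerpt of the dTokens table from the diff (plain tokens only).
BASE_TOKENS = {0x03: 'ptgAdd', 0x21: 'ptgFunc', 0x24: 'ptgRef'}

# Bits 5-6 of the token byte select the variant, as observed in the table.
CLASS_SUFFIX = {1: '', 2: 'V', 3: 'A'}


def ptg_name(token):
    """Return the ptg name for a token byte, using the 0x20/0x40 offset pattern."""
    if not 0 <= token <= 0x7F:
        raise ValueError('token out of range: %r' % token)
    if token < 0x20:
        # Operators and constants below 0x20 have no V/A variants in the table.
        return BASE_TOKENS.get(token, 'ptg?')
    base = 0x20 | (token & 0x1F)
    suffix = CLASS_SUFFIX[(token >> 5) & 0x03]
    return BASE_TOKENS.get(base, 'ptg?') + suffix


names = [ptg_name(t) for t in (0x03, 0x24, 0x41, 0x64)]
print(names)
```

A full lookup table, as used in the diff, is easier to audit against the specification; the computed form mainly helps to spot entries that break the pattern.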
| 175 | + | |
| 176 | +#https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/00b5dd7d-51ca-4938-b7b7-483fe0e5933b | |
| 177 | +dFunctions = { | |
| 178 | +0x0000: 'COUNT', | |
| 179 | +0x0001: 'IF', | |
| 180 | +0x0002: 'ISNA', | |
| 181 | +0x0003: 'ISERROR', | |
| 182 | +0x0004: 'SUM', | |
| 183 | +0x0005: 'AVERAGE', | |
| 184 | +0x0006: 'MIN', | |
| 185 | +0x0007: 'MAX', | |
| 186 | +0x0008: 'ROW', | |
| 187 | +0x0009: 'COLUMN', | |
| 188 | +0x000A: 'NA', | |
| 189 | +0x000B: 'NPV', | |
| 190 | +0x000C: 'STDEV', | |
| 191 | +0x000D: 'DOLLAR', | |
| 192 | +0x000E: 'FIXED', | |
| 193 | +0x000F: 'SIN', | |
| 194 | +0x0010: 'COS', | |
| 195 | +0x0011: 'TAN', | |
| 196 | +0x0012: 'ATAN', | |
| 197 | +0x0013: 'PI', | |
| 198 | +0x0014: 'SQRT', | |
| 199 | +0x0015: 'EXP', | |
| 200 | +0x0016: 'LN', | |
| 201 | +0x0017: 'LOG10', | |
| 202 | +0x0018: 'ABS', | |
| 203 | +0x0019: 'INT', | |
| 204 | +0x001A: 'SIGN', | |
| 205 | +0x001B: 'ROUND', | |
| 206 | +0x001C: 'LOOKUP', | |
| 207 | +0x001D: 'INDEX', | |
| 208 | +0x001E: 'REPT', | |
| 209 | +0x001F: 'MID', | |
| 210 | +0x0020: 'LEN', | |
| 211 | +0x0021: 'VALUE', | |
| 212 | +0x0022: 'TRUE', | |
| 213 | +0x0023: 'FALSE', | |
| 214 | +0x0024: 'AND', | |
| 215 | +0x0025: 'OR', | |
| 216 | +0x0026: 'NOT', | |
| 217 | +0x0027: 'MOD', | |
| 218 | +0x0028: 'DCOUNT', | |
| 219 | +0x0029: 'DSUM', | |
| 220 | +0x002A: 'DAVERAGE', | |
| 221 | +0x002B: 'DMIN', | |
| 222 | +0x002C: 'DMAX', | |
| 223 | +0x002D: 'DSTDEV', | |
| 224 | +0x002E: 'VAR', | |
| 225 | +0x002F: 'DVAR', | |
| 226 | +0x0030: 'TEXT', | |
| 227 | +0x0031: 'LINEST', | |
| 228 | +0x0032: 'TREND', | |
| 229 | +0x0033: 'LOGEST', | |
| 230 | +0x0034: 'GROWTH', | |
| 231 | +0x0035: 'GOTO', | |
| 232 | +0x0036: 'HALT', | |
| 233 | +0x0037: 'RETURN', | |
| 234 | +0x0038: 'PV', | |
| 235 | +0x0039: 'FV', | |
| 236 | +0x003A: 'NPER', | |
| 237 | +0x003B: 'PMT', | |
| 238 | +0x003C: 'RATE', | |
| 239 | +0x003D: 'MIRR', | |
| 240 | +0x003E: 'IRR', | |
| 241 | +0x003F: 'RAND', | |
| 242 | +0x0040: 'MATCH', | |
| 243 | +0x0041: 'DATE', | |
| 244 | +0x0042: 'TIME', | |
| 245 | +0x0043: 'DAY', | |
| 246 | +0x0044: 'MONTH', | |
| 247 | +0x0045: 'YEAR', | |
| 248 | +0x0046: 'WEEKDAY', | |
| 249 | +0x0047: 'HOUR', | |
| 250 | +0x0048: 'MINUTE', | |
| 251 | +0x0049: 'SECOND', | |
| 252 | +0x004A: 'NOW', | |
| 253 | +0x004B: 'AREAS', | |
| 254 | +0x004C: 'ROWS', | |
| 255 | +0x004D: 'COLUMNS', | |
| 256 | +0x004E: 'OFFSET', | |
| 257 | +0x004F: 'ABSREF', | |
| 258 | +0x0050: 'RELREF', | |
| 259 | +0x0051: 'ARGUMENT', | |
| 260 | +0x0052: 'SEARCH', | |
| 261 | +0x0053: 'TRANSPOSE', | |
| 262 | +0x0054: 'ERROR', | |
| 263 | +0x0055: 'STEP', | |
| 264 | +0x0056: 'TYPE', | |
| 265 | +0x0057: 'ECHO', | |
| 266 | +0x0058: 'SET.NAME', | |
| 267 | +0x0059: 'CALLER', | |
| 268 | +0x005A: 'DEREF', | |
| 269 | +0x005B: 'WINDOWS', | |
| 270 | +0x005C: 'SERIES', | |
| 271 | +0x005D: 'DOCUMENTS', | |
| 272 | +0x005E: 'ACTIVE.CELL', | |
| 273 | +0x005F: 'SELECTION', | |
| 274 | +0x0060: 'RESULT', | |
| 275 | +0x0061: 'ATAN2', | |
| 276 | +0x0062: 'ASIN', | |
| 277 | +0x0063: 'ACOS', | |
| 278 | +0x0064: 'CHOOSE', | |
| 279 | +0x0065: 'HLOOKUP', | |
| 280 | +0x0066: 'VLOOKUP', | |
| 281 | +0x0067: 'LINKS', | |
| 282 | +0x0068: 'INPUT', | |
| 283 | +0x0069: 'ISREF', | |
| 284 | +0x006A: 'GET.FORMULA', | |
| 285 | +0x006B: 'GET.NAME', | |
| 286 | +0x006C: 'SET.VALUE', | |
| 287 | +0x006D: 'LOG', | |
| 288 | +0x006E: 'EXEC', | |
| 289 | +0x006F: 'CHAR', | |
| 290 | +0x0070: 'LOWER', | |
| 291 | +0x0071: 'UPPER', | |
| 292 | +0x0072: 'PROPER', | |
| 293 | +0x0073: 'LEFT', | |
| 294 | +0x0074: 'RIGHT', | |
| 295 | +0x0075: 'EXACT', | |
| 296 | +0x0076: 'TRIM', | |
| 297 | +0x0077: 'REPLACE', | |
| 298 | +0x0078: 'SUBSTITUTE', | |
| 299 | +0x0079: 'CODE', | |
| 300 | +0x007A: 'NAMES', | |
| 301 | +0x007B: 'DIRECTORY', | |
| 302 | +0x007C: 'FIND', | |
| 303 | +0x007D: 'CELL', | |
| 304 | +0x007E: 'ISERR', | |
| 305 | +0x007F: 'ISTEXT', | |
| 306 | +0x0080: 'ISNUMBER', | |
| 307 | +0x0081: 'ISBLANK', | |
| 308 | +0x0082: 'T', | |
| 309 | +0x0083: 'N', | |
| 310 | +0x0084: 'FOPEN', | |
| 311 | +0x0085: 'FCLOSE', | |
| 312 | +0x0086: 'FSIZE', | |
| 313 | +0x0087: 'FREADLN', | |
| 314 | +0x0088: 'FREAD', | |
| 315 | +0x0089: 'FWRITELN', | |
| 316 | +0x008A: 'FWRITE', | |
| 317 | +0x008B: 'FPOS', | |
| 318 | +0x008C: 'DATEVALUE', | |
| 319 | +0x008D: 'TIMEVALUE', | |
| 320 | +0x008E: 'SLN', | |
| 321 | +0x008F: 'SYD', | |
| 322 | +0x0090: 'DDB', | |
| 323 | +0x0091: 'GET.DEF', | |
| 324 | +0x0092: 'REFTEXT', | |
| 325 | +0x0093: 'TEXTREF', | |
| 326 | +0x0094: 'INDIRECT', | |
| 327 | +0x0095: 'REGISTER', | |
| 328 | +0x0096: 'CALL', | |
| 329 | +0x0097: 'ADD.BAR', | |
| 330 | +0x0098: 'ADD.MENU', | |
| 331 | +0x0099: 'ADD.COMMAND', | |
| 332 | +0x009A: 'ENABLE.COMMAND', | |
| 333 | +0x009B: 'CHECK.COMMAND', | |
| 334 | +0x009C: 'RENAME.COMMAND', | |
| 335 | +0x009D: 'SHOW.BAR', | |
| 336 | +0x009E: 'DELETE.MENU', | |
| 337 | +0x009F: 'DELETE.COMMAND', | |
| 338 | +0x00A0: 'GET.CHART.ITEM', | |
| 339 | +0x00A1: 'DIALOG.BOX', | |
| 340 | +0x00A2: 'CLEAN', | |
| 341 | +0x00A3: 'MDETERM', | |
| 342 | +0x00A4: 'MINVERSE', | |
| 343 | +0x00A5: 'MMULT', | |
| 344 | +0x00A6: 'FILES', | |
| 345 | +0x00A7: 'IPMT', | |
| 346 | +0x00A8: 'PPMT', | |
| 347 | +0x00A9: 'COUNTA', | |
| 348 | +0x00AA: 'CANCEL.KEY', | |
| 349 | +0x00AB: 'FOR', | |
| 350 | +0x00AC: 'WHILE', | |
| 351 | +0x00AD: 'BREAK', | |
| 352 | +0x00AE: 'NEXT', | |
| 353 | +0x00AF: 'INITIATE', | |
| 354 | +0x00B0: 'REQUEST', | |
| 355 | +0x00B1: 'POKE', | |
| 356 | +0x00B2: 'EXECUTE', | |
| 357 | +0x00B3: 'TERMINATE', | |
| 358 | +0x00B4: 'RESTART', | |
| 359 | +0x00B5: 'HELP', | |
| 360 | +0x00B6: 'GET.BAR', | |
| 361 | +0x00B7: 'PRODUCT', | |
| 362 | +0x00B8: 'FACT', | |
| 363 | +0x00B9: 'GET.CELL', | |
| 364 | +0x00BA: 'GET.WORKSPACE', | |
| 365 | +0x00BB: 'GET.WINDOW', | |
| 366 | +0x00BC: 'GET.DOCUMENT', | |
| 367 | +0x00BD: 'DPRODUCT', | |
| 368 | +0x00BE: 'ISNONTEXT', | |
| 369 | +0x00BF: 'GET.NOTE', | |
| 370 | +0x00C0: 'NOTE', | |
| 371 | +0x00C1: 'STDEVP', | |
| 372 | +0x00C2: 'VARP', | |
| 373 | +0x00C3: 'DSTDEVP', | |
| 374 | +0x00C4: 'DVARP', | |
| 375 | +0x00C5: 'TRUNC', | |
| 376 | +0x00C6: 'ISLOGICAL', | |
| 377 | +0x00C7: 'DCOUNTA', | |
| 378 | +0x00C8: 'DELETE.BAR', | |
| 379 | +0x00C9: 'UNREGISTER', | |
| 380 | +0x00CC: 'USDOLLAR', | |
| 381 | +0x00CD: 'FINDB', | |
| 382 | +0x00CE: 'SEARCHB', | |
| 383 | +0x00CF: 'REPLACEB', | |
| 384 | +0x00D0: 'LEFTB', | |
| 385 | +0x00D1: 'RIGHTB', | |
| 386 | +0x00D2: 'MIDB', | |
| 387 | +0x00D3: 'LENB', | |
| 388 | +0x00D4: 'ROUNDUP', | |
| 389 | +0x00D5: 'ROUNDDOWN', | |
| 390 | +0x00D6: 'ASC', | |
| 391 | +0x00D7: 'DBCS', | |
| 392 | +0x00D8: 'RANK', | |
| 393 | +0x00DB: 'ADDRESS', | |
| 394 | +0x00DC: 'DAYS360', | |
| 395 | +0x00DD: 'TODAY', | |
| 396 | +0x00DE: 'VDB', | |
| 397 | +0x00DF: 'ELSE', | |
| 398 | +0x00E0: 'ELSE.IF', | |
| 399 | +0x00E1: 'END.IF', | |
| 400 | +0x00E2: 'FOR.CELL', | |
| 401 | +0x00E3: 'MEDIAN', | |
| 402 | +0x00E4: 'SUMPRODUCT', | |
| 403 | +0x00E5: 'SINH', | |
| 404 | +0x00E6: 'COSH', | |
| 405 | +0x00E7: 'TANH', | |
| 406 | +0x00E8: 'ASINH', | |
| 407 | +0x00E9: 'ACOSH', | |
| 408 | +0x00EA: 'ATANH', | |
| 409 | +0x00EB: 'DGET', | |
| 410 | +0x00EC: 'CREATE.OBJECT', | |
| 411 | +0x00ED: 'VOLATILE', | |
| 412 | +0x00EE: 'LAST.ERROR', | |
| 413 | +0x00EF: 'CUSTOM.UNDO', | |
| 414 | +0x00F0: 'CUSTOM.REPEAT', | |
| 415 | +0x00F1: 'FORMULA.CONVERT', | |
| 416 | +0x00F2: 'GET.LINK.INFO', | |
| 417 | +0x00F3: 'TEXT.BOX', | |
| 418 | +0x00F4: 'INFO', | |
| 419 | +0x00F5: 'GROUP', | |
| 420 | +0x00F6: 'GET.OBJECT', | |
| 421 | +0x00F7: 'DB', | |
| 422 | +0x00F8: 'PAUSE', | |
| 423 | +0x00FB: 'RESUME', | |
| 424 | +0x00FC: 'FREQUENCY', | |
| 425 | +0x00FD: 'ADD.TOOLBAR', | |
| 426 | +0x00FE: 'DELETE.TOOLBAR', | |
| 427 | +0x00FF: 'User Defined Function', | |
| 428 | +0x0100: 'RESET.TOOLBAR', | |
| 429 | +0x0101: 'EVALUATE', | |
| 430 | +0x0102: 'GET.TOOLBAR', | |
| 431 | +0x0103: 'GET.TOOL', | |
| 432 | +0x0104: 'SPELLING.CHECK', | |
| 433 | +0x0105: 'ERROR.TYPE', | |
| 434 | +0x0106: 'APP.TITLE', | |
| 435 | +0x0107: 'WINDOW.TITLE', | |
| 436 | +0x0108: 'SAVE.TOOLBAR', | |
| 437 | +0x0109: 'ENABLE.TOOL', | |
| 438 | +0x010A: 'PRESS.TOOL', | |
| 439 | +0x010B: 'REGISTER.ID', | |
| 440 | +0x010C: 'GET.WORKBOOK', | |
| 441 | +0x010D: 'AVEDEV', | |
| 442 | +0x010E: 'BETADIST', | |
| 443 | +0x010F: 'GAMMALN', | |
| 444 | +0x0110: 'BETAINV', | |
| 445 | +0x0111: 'BINOMDIST', | |
| 446 | +0x0112: 'CHIDIST', | |
| 447 | +0x0113: 'CHIINV', | |
| 448 | +0x0114: 'COMBIN', | |
| 449 | +0x0115: 'CONFIDENCE', | |
| 450 | +0x0116: 'CRITBINOM', | |
| 451 | +0x0117: 'EVEN', | |
| 452 | +0x0118: 'EXPONDIST', | |
| 453 | +0x0119: 'FDIST', | |
| 454 | +0x011A: 'FINV', | |
| 455 | +0x011B: 'FISHER', | |
| 456 | +0x011C: 'FISHERINV', | |
| 457 | +0x011D: 'FLOOR', | |
| 458 | +0x011E: 'GAMMADIST', | |
| 459 | +0x011F: 'GAMMAINV', | |
| 460 | +0x0120: 'CEILING', | |
| 461 | +0x0121: 'HYPGEOMDIST', | |
| 462 | +0x0122: 'LOGNORMDIST', | |
| 463 | +0x0123: 'LOGINV', | |
| 464 | +0x0124: 'NEGBINOMDIST', | |
| 465 | +0x0125: 'NORMDIST', | |
| 466 | +0x0126: 'NORMSDIST', | |
| 467 | +0x0127: 'NORMINV', | |
| 468 | +0x0128: 'NORMSINV', | |
| 469 | +0x0129: 'STANDARDIZE', | |
| 470 | +0x012A: 'ODD', | |
| 471 | +0x012B: 'PERMUT', | |
| 472 | +0x012C: 'POISSON', | |
| 473 | +0x012D: 'TDIST', | |
| 474 | +0x012E: 'WEIBULL', | |
| 475 | +0x012F: 'SUMXMY2', | |
| 476 | +0x0130: 'SUMX2MY2', | |
| 477 | +0x0131: 'SUMX2PY2', | |
| 478 | +0x0132: 'CHITEST', | |
| 479 | +0x0133: 'CORREL', | |
| 480 | +0x0134: 'COVAR', | |
| 481 | +0x0135: 'FORECAST', | |
| 482 | +0x0136: 'FTEST', | |
| 483 | +0x0137: 'INTERCEPT', | |
| 484 | +0x0138: 'PEARSON', | |
| 485 | +0x0139: 'RSQ', | |
| 486 | +0x013A: 'STEYX', | |
| 487 | +0x013B: 'SLOPE', | |
| 488 | +0x013C: 'TTEST', | |
| 489 | +0x013D: 'PROB', | |
| 490 | +0x013E: 'DEVSQ', | |
| 491 | +0x013F: 'GEOMEAN', | |
| 492 | +0x0140: 'HARMEAN', | |
| 493 | +0x0141: 'SUMSQ', | |
| 494 | +0x0142: 'KURT', | |
| 495 | +0x0143: 'SKEW', | |
| 496 | +0x0144: 'ZTEST', | |
| 497 | +0x0145: 'LARGE', | |
| 498 | +0x0146: 'SMALL', | |
| 499 | +0x0147: 'QUARTILE', | |
| 500 | +0x0148: 'PERCENTILE', | |
| 501 | +0x0149: 'PERCENTRANK', | |
| 502 | +0x014A: 'MODE', | |
| 503 | +0x014B: 'TRIMMEAN', | |
| 504 | +0x014C: 'TINV', | |
| 505 | +0x014E: 'MOVIE.COMMAND', | |
| 506 | +0x014F: 'GET.MOVIE', | |
| 507 | +0x0150: 'CONCATENATE', | |
| 508 | +0x0151: 'POWER', | |
| 509 | +0x0152: 'PIVOT.ADD.DATA', | |
| 510 | +0x0153: 'GET.PIVOT.TABLE', | |
| 511 | +0x0154: 'GET.PIVOT.FIELD', | |
| 512 | +0x0155: 'GET.PIVOT.ITEM', | |
| 513 | +0x0156: 'RADIANS', | |
| 514 | +0x0157: 'DEGREES', | |
| 515 | +0x0158: 'SUBTOTAL', | |
| 516 | +0x0159: 'SUMIF', | |
| 517 | +0x015A: 'COUNTIF', | |
| 518 | +0x015B: 'COUNTBLANK', | |
| 519 | +0x015C: 'SCENARIO.GET', | |
| 520 | +0x015D: 'OPTIONS.LISTS.GET', | |
| 521 | +0x015E: 'ISPMT', | |
| 522 | +0x015F: 'DATEDIF', | |
| 523 | +0x0160: 'DATESTRING', | |
| 524 | +0x0161: 'NUMBERSTRING', | |
| 525 | +0x0162: 'ROMAN', | |
| 526 | +0x0163: 'OPEN.DIALOG', | |
| 527 | +0x0164: 'SAVE.DIALOG', | |
| 528 | +0x0165: 'VIEW.GET', | |
| 529 | +0x0166: 'GETPIVOTDATA', | |
| 530 | +0x0167: 'HYPERLINK', | |
| 531 | +0x0168: 'PHONETIC', | |
| 532 | +0x0169: 'AVERAGEA', | |
| 533 | +0x016A: 'MAXA', | |
| 534 | +0x016B: 'MINA', | |
| 535 | +0x016C: 'STDEVPA', | |
| 536 | +0x016D: 'VARPA', | |
| 537 | +0x016E: 'STDEVA', | |
| 538 | +0x016F: 'VARA', | |
| 539 | +0x0170: 'BAHTTEXT', | |
| 540 | +0x0171: 'THAIDAYOFWEEK', | |
| 541 | +0x0172: 'THAIDIGIT', | |
| 542 | +0x0173: 'THAIMONTHOFYEAR', | |
| 543 | +0x0174: 'THAINUMSOUND', | |
| 544 | +0x0175: 'THAINUMSTRING', | |
| 545 | +0x0176: 'THAISTRINGLENGTH', | |
| 546 | +0x0177: 'ISTHAIDIGIT', | |
| 547 | +0x0178: 'ROUNDBAHTDOWN', | |
| 548 | +0x0179: 'ROUNDBAHTUP', | |
| 549 | +0x017A: 'THAIYEAR', | |
| 550 | +0x017B: 'RTD', | |
| 551 | + | |
| 552 | +0x8076: 'ALERT', | |
| 553 | +} | |
| 554 | + | |
| 555 | +dOpcodes = { | |
| 556 | + 0x06: 'FORMULA : Cell Formula', | |
| 557 | + 0x0A: 'EOF : End of File', | |
| 558 | + 0x0C: 'CALCCOUNT : Iteration Count', | |
| 559 | + 0x0D: 'CALCMODE : Calculation Mode', | |
| 560 | + 0x0E: 'PRECISION : Precision', | |
| 561 | + 0x0F: 'REFMODE : Reference Mode', | |
| 562 | + 0x10: 'DELTA : Iteration Increment', | |
| 563 | + 0x11: 'ITERATION : Iteration Mode', | |
| 564 | + 0x12: 'PROTECT : Protection Flag', | |
| 565 | + 0x13: 'PASSWORD : Protection Password', | |
| 566 | + 0x14: 'HEADER : Print Header on Each Page', | |
| 567 | + 0x15: 'FOOTER : Print Footer on Each Page', | |
| 568 | + 0x16: 'EXTERNCOUNT : Number of External References', | |
| 569 | + 0x17: 'EXTERNSHEET : External Reference', | |
| 570 | + 0x18: 'LABEL : Cell Value, String Constant', | |
| 571 | + 0x19: 'WINDOWPROTECT : Windows Are Protected', | |
| 572 | + 0x1A: 'VERTICALPAGEBREAKS : Explicit Column Page Breaks', | |
| 573 | + 0x1B: 'HORIZONTALPAGEBREAKS : Explicit Row Page Breaks', | |
| 574 | + 0x1C: 'NOTE : Comment Associated with a Cell', | |
| 575 | + 0x1D: 'SELECTION : Current Selection', | |
| 576 | + 0x22: '1904 : 1904 Date System', | |
| 577 | + 0x26: 'LEFTMARGIN : Left Margin Measurement', | |
| 578 | + 0x27: 'RIGHTMARGIN : Right Margin Measurement', | |
| 579 | + 0x28: 'TOPMARGIN : Top Margin Measurement', | |
| 580 | + 0x29: 'BOTTOMMARGIN : Bottom Margin Measurement', | |
| 581 | + 0x2A: 'PRINTHEADERS : Print Row/Column Labels', | |
| 582 | + 0x2B: 'PRINTGRIDLINES : Print Gridlines Flag', | |
| 583 | + 0x2F: 'FILEPASS : File Is Password-Protected', | |
| 584 | + 0x3C: 'CONTINUE : Continues Long Records', | |
| 585 | + 0x3D: 'WINDOW1 : Window Information', | |
| 586 | + 0x40: 'BACKUP : Save Backup Version of the File', | |
| 587 | + 0x41: 'PANE : Number of Panes and Their Position', | |
| 588 | + 0x1BA: 'CODENAME : VBE Object Name', | |
| 589 | + 0x42: 'CODEPAGE : Default Code Page', | |
| 590 | + 0x4D: 'PLS : Environment-Specific Print Record', | |
| 591 | + 0x50: 'DCON : Data Consolidation Information', | |
| 592 | + 0x51: 'DCONREF : Data Consolidation References', | |
| 593 | + 0x52: 'DCONNAME : Data Consolidation Named References', | |
| 594 | + 0x55: 'DEFCOLWIDTH : Default Width for Columns', | |
| 595 | + 0x59: 'XCT : CRN Record Count', | |
| 596 | + 0x5A: 'CRN : Nonresident Operands', | |
| 597 | + 0x5B: 'FILESHARING : File-Sharing Information', | |
| 598 | + 0x5C: 'WRITEACCESS : Write Access User Name', | |
| 599 | + 0x5D: 'OBJ : Describes a Graphic Object', | |
| 600 | + 0x5E: 'UNCALCED : Recalculation Status', | |
| 601 | + 0x5F: 'SAVERECALC : Recalculate Before Save', | |
| 602 | + 0x60: 'TEMPLATE : Workbook Is a Template', | |
| 603 | + 0x63: 'OBJPROTECT : Objects Are Protected', | |
| 604 | + 0x7D: 'COLINFO : Column Formatting Information', | |
| 605 | + 0x7E: 'RK : Cell Value, RK Number', | |
| 606 | + 0x7F: 'IMDATA : Image Data', | |
| 607 | + 0x80: 'GUTS : Size of Row and Column Gutters', | |
| 608 | + 0x81: 'WSBOOL : Additional Workspace Information', | |
| 609 | + 0x82: 'GRIDSET : State Change of Gridlines Option', | |
| 610 | + 0x83: 'HCENTER : Center Between Horizontal Margins', | |
| 611 | + 0x84: 'VCENTER : Center Between Vertical Margins', | |
| 612 | + 0x85: 'BOUNDSHEET : Sheet Information', | |
| 613 | + 0x86: 'WRITEPROT : Workbook Is Write-Protected', | |
| 614 | + 0x87: 'ADDIN : Workbook Is an Add-in Macro', | |
| 615 | + 0x88: 'EDG : Edition Globals', | |
| 616 | + 0x89: 'PUB : Publisher', | |
| 617 | + 0x8C: 'COUNTRY : Default Country and WIN.INI Country', | |
| 618 | + 0x8D: 'HIDEOBJ : Object Display Options', | |
| 619 | + 0x90: 'SORT : Sorting Options', | |
| 620 | + 0x91: 'SUB : Subscriber', | |
| 621 | + 0x92: 'PALETTE : Color Palette Definition', | |
| 622 | + 0x94: 'LHRECORD : .WK? File Conversion Information', | |
| 623 | + 0x95: 'LHNGRAPH : Named Graph Information', | |
| 624 | + 0x96: 'SOUND : Sound Note', | |
| 625 | + 0x98: 'LPR : Sheet Was Printed Using LINE.PRINT()', | |
| 626 | + 0x99: 'STANDARDWIDTH : Standard Column Width', | |
| 627 | + 0x9A: 'FNGROUPNAME : Function Group Name', | |
| 628 | + 0x9B: 'FILTERMODE : Sheet Contains Filtered List', | |
| 629 | + 0x9C: 'FNGROUPCOUNT : Built-in Function Group Count', | |
| 630 | + 0x9D: 'AUTOFILTERINFO : Drop-Down Arrow Count', | |
| 631 | + 0x9E: 'AUTOFILTER : AutoFilter Data', | |
| 632 | + 0xA0: 'SCL : Window Zoom Magnification', | |
| 633 | + 0xA1: 'SETUP : Page Setup', | |
| 634 | + 0xA9: 'COORDLIST : Polygon Object Vertex Coordinates', | |
| 635 | + 0xAB: 'GCW : Global Column-Width Flags', | |
| 636 | + 0xAE: 'SCENMAN : Scenario Output Data', | |
| 637 | + 0xAF: 'SCENARIO : Scenario Data', | |
| 638 | + 0xB0: 'SXVIEW : View Definition', | |
| 639 | + 0xB1: 'SXVD : View Fields', | |
| 640 | + 0xB2: 'SXVI : View Item', | |
| 641 | + 0xB4: 'SXIVD : Row/Column Field IDs', | |
| 642 | + 0xB5: 'SXLI : Line Item Array', | |
| 643 | + 0xB6: 'SXPI : Page Item', | |
| 644 | + 0xB8: 'DOCROUTE : Routing Slip Information', | |
| 645 | + 0xB9: 'RECIPNAME : Recipient Name', | |
| 646 | + 0xBC: 'SHRFMLA : Shared Formula', | |
| 647 | + 0xBD: 'MULRK : Multiple RK Cells', | |
| 648 | + 0xBE: 'MULBLANK : Multiple Blank Cells', | |
| 649 | + 0xC1: 'MMS : ADDMENU / DELMENU Record Group Count', | |
| 650 | + 0xC2: 'ADDMENU : Menu Addition', | |
| 651 | + 0xC3: 'DELMENU : Menu Deletion', | |
| 652 | + 0xC5: 'SXDI : Data Item', | |
| 653 | + 0xC6: 'SXDB : PivotTable Cache Data', | |
| 654 | + 0xCD: 'SXSTRING : String', | |
| 655 | + 0xD0: 'SXTBL : Multiple Consolidation Source Info', | |
| 656 | + 0xD1: 'SXTBRGIITM : Page Item Name Count', | |
| 657 | + 0xD2: 'SXTBPG : Page Item Indexes', | |
| 658 | + 0xD3: 'OBPROJ : Visual Basic Project', | |
| 659 | + 0xD5: 'SXIDSTM : Stream ID', | |
| 660 | + 0xD6: 'RSTRING : Cell with Character Formatting', | |
| 661 | + 0xD7: 'DBCELL : Stream Offsets', | |
| 662 | + 0xDA: 'BOOKBOOL : Workbook Option Flag', | |
| 663 | + 0xDC: 'PARAMQRY : Query Parameters', | |
| 664 | + 0xDC: 'SXEXT : External Source Information', | |
| 665 | + 0xDD: 'SCENPROTECT : Scenario Protection', | |
| 666 | + 0xDE: 'OLESIZE : Size of OLE Object', | |
| 667 | + 0xDF: 'UDDESC : Description String for Chart Autoformat', | |
| 668 | + 0xE0: 'XF : Extended Format', | |
| 669 | + 0xE1: 'INTERFACEHDR : Beginning of User Interface Records', | |
| 670 | + 0xE2: 'INTERFACEEND : End of User Interface Records', | |
| 671 | + 0xE3: 'SXVS : View Source', | |
| 672 | + 0xE5: 'MERGECELLS : Merged Cells', | |
| 673 | + 0xEA: 'TABIDCONF : Sheet Tab ID of Conflict History', | |
| 674 | + 0xEB: 'MSODRAWINGGROUP : Microsoft Office Drawing Group', | |
| 675 | + 0xEC: 'MSODRAWING : Microsoft Office Drawing', | |
| 676 | + 0xED: 'MSODRAWINGSELECTION : Microsoft Office Drawing Selection', | |
| 677 | + 0xF0: 'SXRULE : PivotTable Rule Data', | |
| 678 | + 0xF1: 'SXEX : PivotTable View Extended Information', | |
| 679 | + 0xF2: 'SXFILT : PivotTable Rule Filter', | |
| 680 | + 0xF4: 'SXDXF : Pivot Table Formatting', | |
| 681 | + 0xF5: 'SXITM : Pivot Table Item Indexes', | |
| 682 | + 0xF6: 'SXNAME : PivotTable Name', | |
| 683 | + 0xF7: 'SXSELECT : PivotTable Selection Information', | |
| 684 | + 0xF8: 'SXPAIR : PivotTable Name Pair', | |
| 685 | + 0xF9: 'SXFMLA : Pivot Table Parsed Expression', | |
| 686 | + 0xFB: 'SXFORMAT : PivotTable Format Record', | |
| 687 | + 0xFC: 'SST : Shared String Table', | |
| 688 | + 0xFD: 'LABELSST : Cell Value, String Constant/ SST', | |
| 689 | + 0xFF: 'EXTSST : Extended Shared String Table', | |
| 690 | + 0x100: 'SXVDEX : Extended PivotTable View Fields', | |
| 691 | + 0x103: 'SXFORMULA : PivotTable Formula Record', | |
| 692 | + 0x122: 'SXDBEX : PivotTable Cache Data', | |
| 693 | + 0x13D: 'TABID : Sheet Tab Index Array', | |
| 694 | + 0x160: 'USESELFS : Natural Language Formulas Flag', | |
| 695 | + 0x161: 'DSF : Double Stream File', | |
| 696 | + 0x162: 'XL5MODIFY : Flag for DSF', | |
| 697 | + 0x1A5: 'FILESHARING2 : File-Sharing Information for Shared Lists', | |
| 698 | + 0x1A9: 'USERBVIEW : Workbook Custom View Settings', | |
| 699 | + 0x1AA: 'USERSVIEWBEGIN : Custom View Settings', | |
| 700 | + 0x1AB: 'USERSVIEWEND : End of Custom View Records', | |
| 701 | + 0x1AD: 'QSI : External Data Range', | |
| 702 | + 0x1AE: 'SUPBOOK : Supporting Workbook', | |
| 703 | + 0x1AF: 'PROT4REV : Shared Workbook Protection Flag', | |
| 704 | + 0x1B0: 'CONDFMT : Conditional Formatting Range Information', | |
| 705 | + 0x1B1: 'CF : Conditional Formatting Conditions', | |
| 706 | + 0x1B2: 'DVAL : Data Validation Information', | |
| 707 | + 0x1B5: 'DCONBIN : Data Consolidation Information', | |
| 708 | + 0x1B6: 'TXO : Text Object', | |
| 709 | + 0x1B7: 'REFRESHALL : Refresh Flag', | |
| 710 | + 0x1B8: 'HLINK : Hyperlink', | |
| 711 | + 0x1BB: 'SXFDBTYPE : SQL Datatype Identifier', | |
| 712 | + 0x1BC: 'PROT4REVPASS : Shared Workbook Protection Password', | |
| 713 | + 0x1BE: 'DV : Data Validation Criteria', | |
| 714 | + 0x1C0: 'EXCEL9FILE : Excel 9 File', | |
| 715 | + 0x1C1: 'RECALCID : Recalc Information', | |
| 716 | + 0x200: 'DIMENSIONS : Cell Table Size', | |
| 717 | + 0x201: 'BLANK : Cell Value, Blank Cell', | |
| 718 | + 0x203: 'NUMBER : Cell Value, Floating-Point Number', | |
| 719 | + 0x204: 'LABEL : Cell Value, String Constant', | |
| 720 | + 0x205: 'BOOLERR : Cell Value, Boolean or Error', | |
| 721 | + 0x207: 'STRING : String Value of a Formula', | |
| 722 | + 0x208: 'ROW : Describes a Row', | |
| 723 | + 0x20B: 'INDEX : Index Record', | |
| 724 | + 0x218: 'NAME : Defined Name', | |
| 725 | + 0x221: 'ARRAY : Array-Entered Formula', | |
| 726 | + 0x223: 'EXTERNNAME : Externally Referenced Name', | |
| 727 | + 0x225: 'DEFAULTROWHEIGHT : Default Row Height', | |
| 728 | + 0x231: 'FONT : Font Description', | |
| 729 | + 0x236: 'TABLE : Data Table', | |
| 730 | + 0x23E: 'WINDOW2 : Sheet Window Information', | |
| 731 | + 0x293: 'STYLE : Style Information', | |
| 732 | + 0x406: 'FORMULA : Cell Formula', | |
| 733 | + 0x41E: 'FORMAT : Number Format', | |
| 734 | + 0x800: 'HLINKTOOLTIP : Hyperlink Tooltip', | |
| 735 | + 0x801: 'WEBPUB : Web Publish Item', | |
| 736 | + 0x802: 'QSISXTAG : PivotTable and Query Table Extensions', | |
| 737 | + 0x803: 'DBQUERYEXT : Database Query Extensions', | |
| 738 | + 0x804: 'EXTSTRING : FRT String', | |
| 739 | + 0x805: 'TXTQUERY : Text Query Information', | |
| 740 | + 0x806: 'QSIR : Query Table Formatting', | |
| 741 | + 0x807: 'QSIF : Query Table Field Formatting', | |
| 742 | + 0x809: 'BOF : Beginning of File', | |
| 743 | + 0x80A: 'OLEDBCONN : OLE Database Connection', | |
| 744 | + 0x80B: 'WOPT : Web Options', | |
| 745 | + 0x80C: 'SXVIEWEX : Pivot Table OLAP Extensions', | |
| 746 | + 0x80D: 'SXTH : PivotTable OLAP Hierarchy', | |
| 747 | + 0x80E: 'SXPIEX : OLAP Page Item Extensions', | |
| 748 | + 0x80F: 'SXVDTEX : View Dimension OLAP Extensions', | |
| 749 | + 0x810: 'SXVIEWEX9 : Pivot Table Extensions', | |
| 750 | + 0x812: 'CONTINUEFRT : Continued FRT', | |
| 751 | + 0x813: 'REALTIMEDATA : Real-Time Data (RTD)', | |
| 752 | + 0x862: 'SHEETEXT : Extra Sheet Info', | |
| 753 | + 0x863: 'BOOKEXT : Extra Book Info', | |
| 754 | + 0x864: 'SXADDL : Pivot Table Additional Info', | |
| 755 | + 0x865: 'CRASHRECERR : Crash Recovery Error', | |
| 756 | + 0x866: 'HFPicture : Header / Footer Picture', | |
| 757 | + 0x867: 'FEATHEADR : Shared Feature Header', | |
| 758 | + 0x868: 'FEAT : Shared Feature Record', | |
| 759 | + 0x86A: 'DATALABEXT : Chart Data Label Extension', | |
| 760 | + 0x86B: 'DATALABEXTCONTENTS : Chart Data Label Extension Contents', | |
| 761 | + 0x86C: 'CELLWATCH : Cell Watch', | |
| 762 | + 0x86d: 'FEATINFO : Shared Feature Info Record', | |
| 763 | + 0x871: 'FEATHEADR11 : Shared Feature Header 11', | |
| 764 | + 0x872: 'FEAT11 : Shared Feature 11 Record', | |
| 765 | + 0x873: 'FEATINFO11 : Shared Feature Info 11 Record', | |
| 766 | + 0x874: 'DROPDOWNOBJIDS : Drop Down Object', | |
| 767 | + 0x875: 'CONTINUEFRT11 : Continue FRT 11', | |
| 768 | + 0x876: 'DCONN : Data Connection', | |
| 769 | + 0x877: 'LIST12 : Extra Table Data Introduced in Excel 2007', | |
| 770 | + 0x878: 'FEAT12 : Shared Feature 12 Record', | |
| 771 | + 0x879: 'CONDFMT12 : Conditional Formatting Range Information 12', | |
| 772 | + 0x87A: 'CF12 : Conditional Formatting Condition 12', | |
| 773 | + 0x87B: 'CFEX : Conditional Formatting Extension', | |
| 774 | + 0x87C: 'XFCRC : XF Extensions Checksum', | |
| 775 | + 0x87D: 'XFEXT : XF Extension', | |
| 776 | + 0x87E: 'EZFILTER12 : AutoFilter Data Introduced in Excel 2007', | |
| 777 | + 0x87F: 'CONTINUEFRT12 : Continue FRT 12', | |
| 778 | + 0x881: 'SXADDL12 : Additional Workbook Connections Information', | |
| 779 | + 0x884: 'MDTINFO : Information about a Metadata Type', | |
| 780 | + 0x885: 'MDXSTR : MDX Metadata String', | |
| 781 | + 0x886: 'MDXTUPLE : Tuple MDX Metadata', | |
| 782 | + 0x887: 'MDXSET : Set MDX Metadata', | |
| 783 | + 0x888: 'MDXPROP : Member Property MDX Metadata', | |
| 784 | + 0x889: 'MDXKPI : Key Performance Indicator MDX Metadata', | |
| 785 | + 0x88A: 'MDTB : Block of Metadata Records', | |
| 786 | + 0x88B: 'PLV : Page Layout View Settings in Excel 2007', | |
| 787 | + 0x88C: 'COMPAT12 : Compatibility Checker 12', | |
| 788 | + 0x88D: 'DXF : Differential XF', | |
| 789 | + 0x88E: 'TABLESTYLES : Table Styles', | |
| 790 | + 0x88F: 'TABLESTYLE : Table Style', | |
| 791 | + 0x890: 'TABLESTYLEELEMENT : Table Style Element', | |
| 792 | + 0x892: 'STYLEEXT : Named Cell Style Extension', | |
| 793 | + 0x893: 'NAMEPUBLISH : Publish To Excel Server Data for Name', | |
| 794 | + 0x894: 'NAMECMT : Name Comment', | |
| 795 | + 0x895: 'SORTDATA12 : Sort Data 12', | |
| 796 | + 0x896: 'THEME : Theme', | |
| 797 | + 0x897: 'GUIDTYPELIB : VB Project Typelib GUID', | |
| 798 | + 0x898: 'FNGRP12 : Function Group', | |
| 799 | + 0x899: 'NAMEFNGRP12 : Extra Function Group', | |
| 800 | + 0x89A: 'MTRSETTINGS : Multi-Threaded Calculation Settings', | |
| 801 | + 0x89B: 'COMPRESSPICTURES : Automatic Picture Compression Mode', | |
| 802 | + 0x89C: 'HEADERFOOTER : Header Footer', | |
| 803 | + 0x8A3: 'FORCEFULLCALCULATION : Force Full Calculation Settings', | |
| 804 | + 0x8c1: 'LISTOBJ : List Object', | |
| 805 | + 0x8c2: 'LISTFIELD : List Field', | |
| 806 | + 0x8c3: 'LISTDV : List Data Validation', | |
| 807 | + 0x8c4: 'LISTCONDFMT : List Conditional Formatting', | |
| 808 | + 0x8c5: 'LISTCF : List Cell Formatting', | |
| 809 | + 0x8c6: 'FMQRY : Filemaker queries', | |
| 810 | + 0x8c7: 'FMSQRY : File maker queries', | |
| 811 | + 0x8c8: 'PLV : Page Layout View in Mac Excel 11', | |
| 812 | + 0x8c9: 'LNEXT : Extension information for borders in Mac Office 11', | |
| 813 | + 0x8ca: 'MKREXT : Extension information for markers in Mac Office 11' | |
| 814 | +} | |
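Review note: a minimal sketch of how a table like `dOpcodes` is typically used, assuming the usual BIFF record layout of a 2-byte little-endian record type followed by a 2-byte little-endian payload size. The names `iter_records` and `OPCODE_NAMES` and the sample stream are hypothetical and not taken from the plugin.

```python
import struct

# Two entries copied from dOpcodes in the diff; the real table is much larger.
OPCODE_NAMES = {
    0x0809: 'BOF : Beginning of File',
    0x000A: 'EOF : End of File',
}


def iter_records(stream):
    """Yield (opcode, description, payload) for each record in a BIFF stream."""
    position = 0
    while position + 4 <= len(stream):
        opcode, size = struct.unpack('<HH', stream[position:position + 4])
        payload = stream[position + 4:position + 4 + size]
        yield opcode, OPCODE_NAMES.get(opcode, 'unknown'), payload
        position += 4 + size


# Hand-built stream: a BOF record with a 2-byte payload, then an empty EOF.
sample = struct.pack('<HH', 0x0809, 2) + b'\x00\x06' + struct.pack('<HH', 0x000A, 0)
records = list(iter_records(sample))
for opcode, description, payload in records:
    print('%04x %s (%d bytes)' % (opcode, description, len(payload)))
```

Using `dict.get` with a fallback keeps the loop running when a record type is missing from the table. Note that a dict literal with a repeated key keeps only the last value, so repeated opcodes in such a table shadow each other silently.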
| 815 | + | |
| 816 | + | |
| 817 | +# CIC: Call If Callable | |
| 818 | +def CIC(expression): | |
| 819 | +    if callable(expression): | |
| 820 | +        return expression() | |
| 821 | +    else: | |
| 822 | +        return expression | |
| 823 | + | |
| 824 | + | |
| 825 | +# IFF: IF Function | |
| 826 | +def IFF(expression, valueTrue, valueFalse): | |
| 827 | +    if expression: | |
| 828 | +        return CIC(valueTrue) | |
| 829 | +    else: | |
| 830 | +        return CIC(valueFalse) | |
| 831 | + | |
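Review note: `CIC` and `IFF` above give a conditional expression whose branches can be deferred by wrapping them in a callable. A minimal sketch, with the two helpers condensed so the snippet runs on its own; the `divisor` example is illustrative only.

```python
def CIC(expression):
    """Call If Callable: call the argument if possible, else return it as is."""
    return expression() if callable(expression) else expression


def IFF(expression, valueTrue, valueFalse):
    """IF Function: evaluate only the branch selected by the condition."""
    return CIC(valueTrue) if expression else CIC(valueFalse)


divisor = 0
# The lambda is never called because the condition is false, so no
# ZeroDivisionError is raised; the plain fallback value is returned instead.
safe = IFF(divisor != 0, lambda: 10 / divisor, 'n/a')
print(safe)
```

One caveat of this design: any callable passed as a branch is invoked, so a function object cannot be returned as a plain value through `IFF`.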
| 832 | + | |
| 833 | +def CombineHexASCII(hexDump, asciiDump, length): | |
| 834 | +    if hexDump == '': | |
| 835 | +        return '' | |
| 836 | +    return hexDump + ' ' + (' ' * (3 * (length - len(asciiDump)))) + asciiDump | |
| 837 | + | |
| 838 | +def HexASCII(data, length=16): | |
| 839 | + result = [] | |
| 840 | + if len(data) > 0: | |
| 841 | + hexDump = '' | |
| 842 | + asciiDump = '' | |
| 843 | + for i, b in enumerate(bytearray(data)): | |
| 844 | + if i % length == 0: | |
| 845 | + if hexDump != '': | |
| 846 | + result.append(CombineHexASCII(hexDump, asciiDump, length)) | |
| 847 | + hexDump = '%08X:' % i | |
| 848 | + asciiDump = '' | |
| 849 | + hexDump += ' %02X' % b | |
| 850 | + asciiDump += IFF(b >= 32, chr(b), '.') | |
| 851 | + result.append(CombineHexASCII(hexDump, asciiDump, length)) | |
| 852 | + return result | |
| 853 | + | |
| 854 | +def StringsASCII(data): | |
| 855 | + """ | |
| 856 | + Extract a list of plain ASCII strings of 4+ chars found in data. | |
| 857 | + :param data: bytearray or bytes | |
| 858 | + :return: list of str (converted to unicode on Python 3) | |
| 859 | + """ | |
| 860 | + # list of bytes strings: | |
| 861 | + bytes_strings = re.findall(b'[^\x00-\x08\x0A-\x1F\x7F-\xFF]{4,}', bytes(data)) | |
| 862 | + return [bytes2str(bs) for bs in bytes_strings] | |
| 863 | + | |
| 864 | +def StringsUNICODE(data): | |
| 865 | + """ | |
| 866 | + Extract a list of Unicode strings (made of 4+ plain ASCII characters only) found in data. | |
| 867 | + :param data: bytearray or bytes | |
| 868 | + :return: list of str (converted to unicode on Python 3) | |
| 869 | + """ | |
| 870 | + # list of bytes strings: | |
| 871 | + # TODO: check if the null byte should be before or after the ascii byte | |
| 872 | + bytes_strings = [foundunicodestring.replace(b'\x00', b'') for foundunicodestring, dummy in re.findall(b'(([^\x00-\x08\x0A-\x1F\x7F-\xFF]\x00){4,})', bytes(data))] | |
| 873 | + return [bytes2str(bs) for bs in bytes_strings] | |
| 874 | + | |
| 875 | +def Strings(data, encodings='sL'): | |
| 876 | + """ | |
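| 877 | + Extract ASCII and/or Unicode strings from data, grouped by encoding code. | |
| 878 | + :param data bytearray: bytearray, data to be scanned for strings | |
| 879 | + :param encodings: str, one character per encoding to scan for: 's' (ASCII), 'L' (Unicode) | |
| 880 | + :return: dict with key = 's' or 'L', values = list of str | |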
| 881 | + """ | |
| 882 | + dStrings = {} | |
| 883 | + for encoding in encodings: | |
| 884 | + if encoding == 's': | |
| 885 | + dStrings[encoding] = StringsASCII(data) | |
| 886 | + elif encoding == 'L': | |
| 887 | + dStrings[encoding] = StringsUNICODE(data) | |
| 888 | + return dStrings | |
| 889 | + | |
| 890 | +def ContainsWord(word, expression): | |
| 891 | + return struct.pack('<H', word) in expression | |
| 892 | + | |
| 893 | +# https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-xls/6e5eed10-5b77-43d6-8dd0-37345f8654ad | |
| 894 | +def ParseLoc(expression): | |
| 895 | + """ | |
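| 896 | + Parse a cell location (row, column and relative flags) into an R1C1-style string. | |
| 897 | + :param expression bytearray: bytearray, data to be parsed | |
| 898 | + :return: location such as 'R1C1'; '~' marks a relative row or column | |
| 899 | + :rtype: str | |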
| 900 | + """ | |
| 901 | + formatcodes = 'HH' | |
| 902 | + formatsize = struct.calcsize(formatcodes) | |
| 903 | + row, column = struct.unpack(formatcodes, expression[0:formatsize]) | |
| 904 | + rowRelative = column & 0x8000 | |
| 905 | + colRelative = column & 0x4000 | |
| 906 | + column = column & 0x3FFF | |
| 907 | + if rowRelative: | |
| 908 | + rowindicator = '~' | |
| 909 | + else: | |
| 910 | + rowindicator = '' | |
| 911 | + row += 1 | |
| 912 | + if colRelative: | |
| 913 | + colindicator = '~' | |
| 914 | + else: | |
| 915 | + colindicator = '' | |
| 916 | + column += 1 | |
| 917 | + return 'R%s%dC%s%d' % (rowindicator, row, colindicator, column) | |
| 918 | + | |
| 919 | +def ParseExpression(expression): | |
| 920 | + ''' | |
| 921 | + Parse an expression into a human readable string. | |
| 922 | + | |
| 923 | + :param expression bytearray: bytearray, expression data to be parsed | |
| 924 | + :return: str, parsed expression as a string (bytes on Python 2, unicode on Python 3) | |
| 925 | + :rtype: str | |
| 926 | + ''' | |
| 927 | + result = '' | |
| 928 | + while len(expression) > 0: | |
| 929 | + ptgid = expression[0] # int | |
| 930 | + expression = expression[1:] # bytearray | |
| 931 | + if ptgid in dTokens: | |
| 932 | + result += dTokens[ptgid] + ' ' | |
| 933 | + if ptgid == 0x17: # ptgStr | |
| 934 | + length = expression[0] # int | |
| 935 | + expression = expression[1:] | |
| 936 | + if expression[0] == 0: # probably BIFF8 -> UNICODE (compressed) | |
| 937 | + expression = expression[1:] | |
| 938 | + result += '"%s" ' % bytes2str(expression[:length]) | |
| 939 | + expression = expression[length:] | |
| 940 | + elif ptgid == 0x19: # ptgAttr | |
| 941 | + grbit = expression[0] # int | |
| 942 | + expression = expression[1:] | |
| 943 | + if grbit & 0x04: | |
| 944 | + result += 'CHOOSE ' | |
| 945 | + break | |
| 946 | + else: | |
| 947 | + expression = expression[2:] | |
| 948 | + elif ptgid == 0x16 or ptgid == 0x0e: # 0x0E: 'ptgNE', 0x16: 'ptgMissArg' | |
| 949 | + pass | |
| 950 | + elif ptgid == 0x1e: # ptgInt | |
| 951 | + result += '%d ' % (expression[0] + expression[1] * 0x100) | |
| 952 | + expression = expression[2:] | |
| 953 | + elif ptgid == 0x41: # ptgFuncV | |
| 954 | + functionid = expression[0] + expression[1] * 0x100 | |
| 955 | + result += '%s (0x%04x) ' % (dFunctions.get(functionid, '*UNKNOWN FUNCTION*'), functionid) | |
| 956 | + expression = expression[2:] | |
| 957 | + elif ptgid == 0x22 or ptgid == 0x42: # 0x22: 'ptgFuncVar', 0x42: 'ptgFuncVarV' | |
| 958 | + functionid = expression[1] + expression[2] * 0x100 | |
| 959 | + result += 'args %d func %s (0x%04x) ' % (expression[0], dFunctions.get(functionid, '*UNKNOWN FUNCTION*'), functionid) | |
| 960 | + expression = expression[3:] | |
| 961 | + elif ptgid == 0x23: # ptgName | |
| 962 | + result += '%04x ' % (expression[0] + expression[1] * 0x100) | |
| 963 | + # TODO: looks like we're skipping quite a few bytes | |
| 964 | + expression = expression[14:] | |
| 965 | + elif ptgid == 0x1f: # ptgNum | |
| 966 | + result += 'FLOAT ' | |
| 967 | + # TODO: looks like we're skipping quite a few bytes | |
| 968 | + expression = expression[8:] | |
| 969 | + elif ptgid == 0x26: # ptgMemArea | |
| 970 | + expression = expression[4:] # skipping 4 bytes | |
| 971 | + expression = expression[expression[0] + expression[1] * 0x100:] | |
| 972 | + result += 'REFERENCE-EXPRESSION ' | |
| 973 | + elif ptgid == 0x01: # ptgExp | |
| 974 | + formatcodes = 'HH' | |
| 975 | + formatsize = struct.calcsize(formatcodes) | |
| 976 | + row, column = struct.unpack(formatcodes, expression[0:formatsize]) | |
| 977 | + expression = expression[formatsize:] | |
| 978 | + result += 'R%dC%d ' % (row + 1, column + 1) | |
| 979 | + elif ptgid == 0x24 or ptgid == 0x44: # 0x24: 'ptgRef', 0x44: 'ptgRefV' | |
| 980 | + result += '%s ' % ParseLoc(expression) | |
| 981 | + expression = expression[4:] | |
| 982 | + elif ptgid == 0x3A or ptgid == 0x5A: # 0x3A: 'ptgRef3d', 0x5A: 'ptgRef3dV' | |
| 983 | + result += '%s ' % ParseLoc(expression[2:]) | |
| 984 | + expression = expression[6:] | |
| 985 | + else: | |
| 986 | + break | |
| 987 | + else: | |
| 988 | + result += '*UNKNOWN TOKEN* ' | |
| 989 | + break | |
| 990 | + if len(expression) == 0: | |
| 991 | + return result | |
| 992 | + else: | |
| 993 | + # 0x006E: 'EXEC', 0x0095: 'REGISTER' | |
| 994 | + functions = [dFunctions[functionid] for functionid in [0x6E, 0x95] if ContainsWord(functionid, expression)] | |
| 995 | + if functions != []: | |
| 996 | + message = ' Could contain following functions: ' + ','.join(functions) + ' -' | |
| 997 | + else: | |
| 998 | + message = '' | |
| 999 | + return result + ' *INCOMPLETE FORMULA PARSING*' + message + ' Remaining, unparsed expression: ' + repr(expression) | |
| 1000 | + | |
| 1001 | + | |
| 1002 | +class cBIFF(object): # cPluginParent): | |
| 1003 | + macroOnly = False | |
| 1004 | + name = 'BIFF plugin' | |
| 1005 | + | |
| 1006 | + def __init__(self, name, stream, options): | |
| 1007 | + self.streamname = name | |
| 1008 | + self.stream = stream | |
| 1009 | + self.options = options | |
| 1010 | + self.ran = False | |
| 1011 | + | |
| 1012 | + def Analyze(self): | |
| 1013 | + result = [] | |
| 1014 | + macros4Found = False | |
| 1015 | + if self.streamname in [['Workbook'], ['Book']]: | |
| 1016 | + self.ran = True | |
| 1017 | + # use a bytearray to have Python 2+3 compatibility with the same code (no need for ord()) | |
| 1018 | + stream = bytearray(self.stream) | |
| 1019 | + | |
| 1020 | + oParser = optparse.OptionParser() | |
| 1021 | + oParser.add_option('-s', '--strings', action='store_true', default=False, help='Dump strings') | |
| 1022 | + oParser.add_option('-a', '--hexascii', action='store_true', default=False, help='Dump hex ascii') | |
| 1023 | + oParser.add_option('-x', '--xlm', action='store_true', default=False, help='Select all records relevant for Excel 4.0 macros') | |
| 1024 | + oParser.add_option('-o', '--opcode', type=str, default='', help='Opcode to filter for') | |
| 1025 | + oParser.add_option('-f', '--find', type=str, default='', help='Content to search for') | |
| 1026 | + (options, args) = oParser.parse_args(self.options.split(' ')) | |
| 1027 | + | |
| 1028 | + if options.find.startswith('0x'): | |
| 1029 | + options.find = binascii.a2b_hex(options.find[2:]) | |
| 1030 | + | |
| 1031 | + while len(stream)>0: | |
| 1032 | + formatcodes = 'HH' | |
| 1033 | + formatsize = struct.calcsize(formatcodes) | |
| 1034 | + # print('formatsize=%d' % formatsize) | |
| 1035 | + opcode, length = struct.unpack(formatcodes, stream[0:formatsize]) | |
| 1036 | + # print('opcode=%d length=%d len(stream)=%d' % (opcode, length, len(stream))) | |
| 1037 | + stream = stream[formatsize:] | |
| 1038 | + data = stream[:length] | |
| 1039 | + stream = stream[length:] | |
| 1040 | + | |
| 1041 | + if opcode in dOpcodes: | |
| 1042 | + opcodename = dOpcodes[opcode] | |
| 1043 | + else: | |
| 1044 | + opcodename = '' | |
| 1045 | + line = '%04x %6d %s' % (opcode, length, opcodename) | |
| 1046 | + # print(line) | |
| 1047 | + | |
| 1048 | + # FORMULA record | |
| 1049 | + if opcode == 0x06 and len(data) >= 21: | |
| 1050 | + formatcodes = 'HH' | |
| 1051 | + formatsize = struct.calcsize(formatcodes) | |
| 1052 | + row, column = struct.unpack(formatcodes, data[0:formatsize]) | |
| 1053 | + formatcodes = 'H' | |
| 1054 | + formatsize = struct.calcsize(formatcodes) | |
| 1055 | + length = struct.unpack(formatcodes, data[20:20 + formatsize])[0] | |
| 1056 | + expression = data[22:] | |
| 1057 | + line += ' - R%dC%d len=%d %s' % (row + 1, column + 1, length, ParseExpression(expression)) | |
| 1058 | + # print(line) | |
| 1059 | + | |
| 1060 | + # defined name (Lbl) record, opcode 0x18 #a# difference BIFF4 and BIFF5+ | |
| 1061 | + if opcode == 0x18 and len(data) >= 16: | |
| 1062 | + if data[0] & 0x20: | |
| 1063 | + dBuildInNames = {1: 'Auto_Open', 2: 'Auto_Close'} | |
| 1064 | + code = data[14] | |
| 1065 | + if code == 0: #a# hack with BIFF8 Unicode | |
| 1066 | + code = data[15] | |
| 1067 | + line += ' - build-in-name %d %s' % (code, dBuildInNames.get(code, '?')) | |
| 1068 | + else: | |
| 1069 | + pass | |
| 1070 | + line += ' - %s' % bytes2str(data[14:14+data[3]]) | |
| 1071 | + # print(line) | |
| 1072 | + | |
| 1073 | + # BOUNDSHEET record | |
| 1074 | + if opcode == 0x85 and len(data) >= 6: | |
| 1075 | + dSheetType = {0: 'worksheet or dialog sheet', 1: 'Excel 4.0 macro sheet', 2: 'chart', 6: 'Visual Basic module'} | |
| 1076 | + if data[5] == 1: | |
| 1077 | + macros4Found = True | |
| 1078 | + dSheetState = {0: 'visible', 1: 'hidden', 2: 'very hidden'} | |
| 1079 | + line += ' - %s, %s' % (dSheetType.get(data[5], '%02x' % data[5]), dSheetState.get(data[4], '%02x' % data[4])) | |
| 1080 | + # print(line) | |
| 1081 | + | |
| 1082 | + # STRING record | |
| 1083 | + if opcode == 0x207 and len(data) >= 4: | |
| 1084 | + values = list(Strings(data[3:]).values()) | |
| 1085 | + strings = '' | |
| 1086 | + if values[0] != []: | |
| 1087 | + strings += ' '.join(values[0]) | |
| 1088 | + if values[1] != []: | |
| 1089 | + if strings != '': | |
| 1090 | + strings += ' ' | |
| 1091 | + strings += ' '.join(values[1]) | |
| 1092 | + line += ' - %s' % strings | |
| 1093 | + # print(line) | |
| 1094 | + | |
| 1095 | + if options.find == '' and options.opcode == '' and not options.xlm or options.opcode != '' and options.opcode.lower() in line.lower() or options.find != '' and options.find in data or options.xlm and opcode in [0x06, 0x18, 0x85, 0x207]: | |
| 1096 | + result.append(line) | |
| 1097 | + | |
| 1098 | + if options.hexascii: | |
| 1099 | + result.extend(' ' + foundstring for foundstring in HexASCII(data, 8)) | |
| 1100 | + elif options.strings: | |
| 1101 | + dEncodings = {'s': 'ASCII', 'L': 'UNICODE'} | |
| 1102 | + for encoding, strings in Strings(data).items(): | |
| 1103 | + if len(strings) > 0: | |
| 1104 | + result.append(' ' + dEncodings[encoding] + ':') | |
| 1105 | + result.extend(' ' + foundstring for foundstring in strings) | |
| 1106 | + | |
| 1107 | + if options.xlm and not macros4Found: | |
| 1108 | + result = [] | |
| 1109 | + | |
| 1110 | + return result | |
| 1111 | + | |
| 1112 | +# AddPlugin(cBIFF) | ... | ... |
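
Review note on the record loop in `cBIFF.Analyze` above: each BIFF record is a 4-byte header (16-bit opcode, 16-bit payload length) followed by the payload. The sketch below isolates that walk as a generator. It is not part of the commit: `iter_biff_records` and the sample bytes are hypothetical, and it differs from the diff in two ways, both assumptions of this sketch. It uses the explicit little-endian format `'<HH'` where the diff uses native-order `'HH'`, and it stops when fewer than 4 bytes remain.

```python
import struct


def iter_biff_records(stream):
    """Yield (opcode, data) for each record in a BIFF stream (illustrative helper)."""
    header = struct.Struct('<HH')  # little-endian: opcode, payload length
    offset = 0
    # Stop cleanly if fewer than 4 bytes remain (truncated trailing header).
    while offset + header.size <= len(stream):
        opcode, length = header.unpack_from(stream, offset)
        offset += header.size
        yield opcode, bytes(stream[offset:offset + length])
        offset += length


# Made-up sample: a BOUNDSHEET record (0x85) with a 6-byte payload,
# followed by a record with opcode 0x0A and an empty payload.
sample = (struct.pack('<HH', 0x85, 6) + b'\x00\x00\x00\x00\x01\x01'
          + struct.pack('<HH', 0x0A, 0))
records = list(iter_biff_records(sample))
```

In the sample payload, byte 4 (sheet state) and byte 5 (sheet type) are both 1, which the plugin's `dSheetState` and `dSheetType` tables map to "hidden" and "Excel 4.0 macro sheet". Indexing by offset avoids re-slicing the remaining stream on every record, which can matter for large workbooks.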
oletools/thirdparty/tablestream/tablestream.py
| ... | ... | @@ -55,8 +55,9 @@ from __future__ import print_function |
| 55 | 55 | # 2016-08-28 v0.07 PL: - support for both Python 2.6+ and 3.x |
| 56 | 56 | # - all cells are converted to unicode |
| 57 | 57 | # 2018-09-22 v0.08 PL: - removed mention to oletools' thirdparty folder |
| 58 | +# 2019-03-27 v0.09 PL: - slight fix, TableStyleSlim inherits from TableStyle | |
| 58 | 59 | |
| 59 | -__version__ = '0.08' | |
| 60 | +__version__ = '0.09' | |
| 60 | 61 | |
| 61 | 62 | #------------------------------------------------------------------------------ |
| 62 | 63 | # TODO: |
| ... | ... | @@ -174,7 +175,7 @@ class TableStyle(object): |
| 174 | 175 | bottom_right = u'+' |
| 175 | 176 | |
| 176 | 177 | |
| 177 | -class TableStyleSlim(object): | |
| 178 | +class TableStyleSlim(TableStyle): | |
| 178 | 179 | """ |
| 179 | 180 | Style for a TableStream. |
| 180 | 181 | Example: | ... | ... |
oletools/thirdparty/xglob/xglob.py
| 1 | -#! /usr/bin/env python2 | |
| 2 | -""" | |
| 3 | -xglob | |
| 4 | - | |
| 5 | -xglob is a python package to list files matching wildcards (*, ?, []), | |
| 6 | -extending the functionality of the glob module from the standard python | |
| 7 | -library (https://docs.python.org/2/library/glob.html). | |
| 8 | - | |
| 9 | -Main features: | |
| 10 | -- recursive file listing (including subfolders) | |
| 11 | -- file listing within Zip archives | |
| 12 | -- helper function to open files specified as arguments, supporting files | |
| 13 | - within zip archives encrypted with a password | |
| 14 | - | |
| 15 | -Author: Philippe Lagadec - http://www.decalage.info | |
| 16 | -License: BSD, see source code or documentation | |
| 17 | - | |
| 18 | -For more info and updates: http://www.decalage.info/xglob | |
| 19 | -""" | |
| 20 | - | |
| 21 | -# LICENSE: | |
| 22 | -# | |
| 23 | -# xglob is copyright (c) 2013-2016, Philippe Lagadec (http://www.decalage.info) | |
| 24 | -# All rights reserved. | |
| 25 | -# | |
| 26 | -# Redistribution and use in source and binary forms, with or without modification, | |
| 27 | -# are permitted provided that the following conditions are met: | |
| 28 | -# | |
| 29 | -# * Redistributions of source code must retain the above copyright notice, this | |
| 30 | -# list of conditions and the following disclaimer. | |
| 31 | -# * Redistributions in binary form must reproduce the above copyright notice, | |
| 32 | -# this list of conditions and the following disclaimer in the documentation | |
| 33 | -# and/or other materials provided with the distribution. | |
| 34 | -# | |
| 35 | -# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND | |
| 36 | -# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | |
| 37 | -# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | |
| 38 | -# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE | |
| 39 | -# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
| 40 | -# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR | |
| 41 | -# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER | |
| 42 | -# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, | |
| 43 | -# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | |
| 44 | -# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | |
| 45 | - | |
| 46 | - | |
| 47 | -#------------------------------------------------------------------------------ | |
| 48 | -# CHANGELOG: | |
| 49 | -# 2013-12-04 v0.01 PL: - scan several files from command line args | |
| 50 | -# 2014-01-14 v0.02 PL: - added riglob, ziglob | |
| 51 | -# 2014-12-26 v0.03 PL: - moved code from balbuzard into a separate package | |
| 52 | -# 2015-01-03 v0.04 PL: - fixed issues in iter_files + yield container name | |
| 53 | -# 2016-02-24 v0.05 PL: - do not stop on exceptions, return them as data | |
| 54 | -# - fixed issue when using wildcards with empty path | |
| 55 | -# 2016-04-28 v0.06 CH: - improved handling of non-existing files | |
| 56 | -# (by Christian Herdtweck) | |
| 57 | - | |
| 58 | -__version__ = '0.06' | |
| 59 | - | |
| 60 | - | |
| 61 | -#=== IMPORTS ================================================================= | |
| 62 | - | |
| 63 | -import os, fnmatch, glob, zipfile | |
| 64 | - | |
| 65 | -#=== EXCEPTIONS ============================================================== | |
| 66 | - | |
| 67 | -class PathNotFoundException(Exception): | |
| 68 | - """ raised if given a fixed file/dir (not a glob) that does not exist """ | |
| 69 | - def __init__(self, path): | |
| 70 | - super(PathNotFoundException, self).__init__( | |
| 71 | - 'Given path does not exist: %r' % path) | |
| 72 | - | |
| 73 | - | |
| 74 | -#=== FUNCTIONS =============================================================== | |
| 75 | - | |
| 76 | -# recursive glob function to find files in any subfolder: | |
| 77 | -# inspired by http://stackoverflow.com/questions/14798220/how-can-i-search-sub-folders-using-glob-glob-module-in-python | |
| 78 | -def rglob (path, pattern='*.*'): | |
| 79 | - """ | |
| 80 | - Recursive glob: | |
| 81 | - similar to glob.glob, but finds files recursively in all subfolders of path. | |
| 82 | - path: root directory where to search files | |
| 83 | - pattern: pattern for filenames, using wildcards, e.g. *.txt | |
| 84 | - """ | |
| 85 | - #TODO: more compatible API with glob: use single param, split path from pattern | |
| 86 | - return [os.path.join(dirpath, f) | |
| 87 | - for dirpath, dirnames, files in os.walk(path) | |
| 88 | - for f in fnmatch.filter(files, pattern)] | |
| 89 | - | |
| 90 | - | |
| 91 | -def riglob (pathname): | |
| 92 | - """ | |
| 93 | - Recursive iglob: | |
| 94 | - similar to glob.iglob, but finds files recursively in all subfolders of path. | |
| 95 | - pathname: root directory where to search files followed by pattern for | |
| 96 | - filenames, using wildcards, e.g. *.txt | |
| 97 | - """ | |
| 98 | - path, filespec = os.path.split(pathname) | |
| 99 | - # fix path if empty: | |
| 100 | - if path == '': | |
| 101 | - path = '.' | |
| 102 | - # print 'riglob: path=%r, filespec=%r' % (path, filespec) | |
| 103 | - for dirpath, dirnames, files in os.walk(path): | |
| 104 | - for f in fnmatch.filter(files, filespec): | |
| 105 | - yield os.path.join(dirpath, f) | |
| 106 | - | |
| 107 | - | |
| 108 | -def ziglob (zipfileobj, pathname): | |
| 109 | - """ | |
| 110 | - iglob in a zip: | |
| 111 | - similar to glob.iglob, but finds files within a zip archive. | |
| 112 | - - zipfileobj: zipfile.ZipFile object | |
| 113 | - - pathname: root directory where to search files followed by pattern for | |
| 114 | - filenames, using wildcards, e.g. *.txt | |
| 115 | - """ | |
| 116 | - files = zipfileobj.namelist() | |
| 117 | - #for f in files: print f | |
| 118 | - for f in fnmatch.filter(files, pathname): | |
| 119 | - yield f | |
| 120 | - | |
| 121 | - | |
| 122 | -def iter_files(files, recursive=False, zip_password=None, zip_fname='*'): | |
| 123 | - """ | |
| 124 | - Open each file provided as argument: | |
| 125 | - - files is a list of arguments | |
| 126 | - - if zip_password is None, each file is listed without reading its content. | |
| 127 | - Wildcards are supported. | |
| 128 | - - if not, then each file is opened as a zip archive with the provided password | |
| 129 | - - then files matching zip_fname are opened from the zip archive | |
| 130 | - | |
| 131 | - Iterator: yields (container, filename, data) for each file. If zip_password is None, then | |
| 132 | - only the filename is returned, container and data=None. Otherwise container is the | |
| 133 | - filename of the container (zip file), and data is the file content (or an exception). | |
| 134 | - If a given filename is not a glob and does not exist, the triplet | |
| 135 | - (None, filename, PathNotFoundException) is yielded. (Globs matching nothing | |
| 136 | - do not trigger exceptions) | |
| 137 | - """ | |
| 138 | - #TODO: catch exceptions and yield them for the caller (no file found, file is not zip, wrong password, etc) | |
| 139 | - #TODO: use logging instead of printing | |
| 140 | - #TODO: split in two simpler functions, the caller knows if it's a zip or not | |
| 141 | - # print 'iter_files: files=%r, recursive=%s' % (files, recursive) | |
| 142 | - # choose recursive or non-recursive iglob: | |
| 143 | - if recursive: | |
| 144 | - iglob = riglob | |
| 145 | - else: | |
| 146 | - iglob = glob.iglob | |
| 147 | - for filespec in files: | |
| 148 | - if not is_glob(filespec) and not os.path.exists(filespec): | |
| 149 | - yield None, filespec, PathNotFoundException(filespec) | |
| 150 | - continue | |
| 151 | - for filename in iglob(filespec): | |
| 152 | - if zip_password is not None: | |
| 153 | - # Each file is expected to be a zip archive: | |
| 154 | - #print 'Opening zip archive %s with provided password' % filename | |
| 155 | - z = zipfile.ZipFile(filename, 'r') | |
| 156 | - #print 'Looking for file(s) matching "%s"' % zip_fname | |
| 157 | - for subfilename in ziglob(z, zip_fname): | |
| 158 | - #print 'Opening file in zip archive:', filename | |
| 159 | - try: | |
| 160 | - data = z.read(subfilename, zip_password) | |
| 161 | - yield filename, subfilename, data | |
| 162 | - except Exception as e: | |
| 163 | - yield filename, subfilename, e | |
| 164 | - z.close() | |
| 165 | - else: | |
| 166 | - # normal file | |
| 167 | - # do not read the file content, just yield the filename | |
| 168 | - yield None, filename, None | |
| 169 | - #print 'Opening file', filename | |
| 170 | - #data = open(filename, 'rb').read() | |
| 171 | - #yield None, filename, data | |
| 172 | - | |
| 173 | - | |
| 174 | -def is_glob(filespec): | |
| 175 | - """ determine if given file specification is a single file name or a glob | |
| 176 | - | |
| 177 | - python's glob and fnmatch can only interpret ?, *, [list], and [ra-nge], | |
| 178 | - (and combinations: hex_*_[A-Fabcdef0-9]). | |
| 179 | - The special chars *?[-] can only be escaped using [] | |
| 180 | - --> file_name is not a glob | |
| 181 | - --> file?name is a glob | |
| 182 | - --> file* is a glob | |
| 183 | - --> file[-._]name is a glob | |
| 184 | - --> file[?]name is not a glob (matches literal "file?name") | |
| 185 | - --> file[*]name is not a glob (matches literal "file*name") | |
| 186 | - --> file[-]name is not a glob (matches literal "file-name") | |
| 187 | - --> file-name is not a glob | |
| 188 | - | |
| 189 | - Also, obviously incorrect globs are treated as non-globs | |
| 190 | - --> file[name is not a glob (matches literal "file[name") | |
| 191 | - --> file]-[name is treated as a glob | |
| 192 | - (it is not a valid glob but detecting errors like this requires | |
| 193 | - sophisticated regular expression matching) | |
| 194 | - | |
| 195 | - Python's glob also works with globs in directory-part of path | |
| 196 | - --> dir-part of path is analyzed just like filename-part | |
| 197 | - --> thirdparty/*/xglob.py is a (valid) glob | |
| 198 | - | |
| 199 | - TODO: create a correct regexp to test for validity of ranges | |
| 200 | - """ | |
| 201 | - | |
| 202 | - # remove escaped special chars | |
| 203 | - cleaned = filespec.replace('[*]', '').replace('[?]', '') \ | |
| 204 | - .replace('[[]', '').replace('[]]', '').replace('[-]', '') | |
| 205 | - | |
| 206 | - # check if special chars remain | |
| 207 | - return '*' in cleaned or '?' in cleaned or \ | |
| 208 | - ('[' in cleaned and ']' in cleaned) | |
| 1 | +#! /usr/bin/env python2 | |
| 2 | +""" | |
| 3 | +xglob | |
| 4 | + | |
| 5 | +xglob is a python package to list files matching wildcards (*, ?, []), | |
| 6 | +extending the functionality of the glob module from the standard python | |
| 7 | +library (https://docs.python.org/2/library/glob.html). | |
| 8 | + | |
| 9 | +Main features: | |
| 10 | +- recursive file listing (including subfolders) | |
| 11 | +- file listing within Zip archives | |
| 12 | +- helper function to open files specified as arguments, supporting files | |
| 13 | + within zip archives encrypted with a password | |
| 14 | + | |
| 15 | +Author: Philippe Lagadec - http://www.decalage.info | |
| 16 | +License: BSD, see source code or documentation | |
| 17 | + | |
| 18 | +For more info and updates: http://www.decalage.info/xglob | |
| 19 | +""" | |
| 20 | + | |
| 21 | +# LICENSE: | |
| 22 | +# | |
| 23 | +# xglob is copyright (c) 2013-2018, Philippe Lagadec (http://www.decalage.info) | |
| 24 | +# All rights reserved. | |
| 25 | +# | |
| 26 | +# Redistribution and use in source and binary forms, with or without modification, | |
| 27 | +# are permitted provided that the following conditions are met: | |
| 28 | +# | |
| 29 | +# * Redistributions of source code must retain the above copyright notice, this | |
| 30 | +# list of conditions and the following disclaimer. | |
| 31 | +# * Redistributions in binary form must reproduce the above copyright notice, | |
| 32 | +# this list of conditions and the following disclaimer in the documentation | |
| 33 | +# and/or other materials provided with the distribution. | |
| 34 | +# | |
| 35 | +# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND | |
| 36 | +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED | |
| 37 | +# WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | |
| 38 | +# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE | |
| 39 | +# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | |
| 40 | +# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR | |
| 41 | +# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER | |
| 42 | +# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, | |
| 43 | +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | |
| 44 | +# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. | |
| 45 | + | |
| 46 | + | |
| 47 | +#------------------------------------------------------------------------------ | |
| 48 | +# CHANGELOG: | |
| 49 | +# 2013-12-04 v0.01 PL: - scan several files from command line args | |
| 50 | +# 2014-01-14 v0.02 PL: - added riglob, ziglob | |
| 51 | +# 2014-12-26 v0.03 PL: - moved code from balbuzard into a separate package | |
| 52 | +# 2015-01-03 v0.04 PL: - fixed issues in iter_files + yield container name | |
| 53 | +# 2016-02-24 v0.05 PL: - do not stop on exceptions, return them as data | |
| 54 | +# - fixed issue when using wildcards with empty path | |
| 55 | +# 2016-04-28 v0.06 CH: - improved handling of non-existing files | |
| 56 | +# (by Christian Herdtweck) | |
| 57 | +# 2018-12-08 v0.07 PL: - fixed issue #373, zip password must be bytes | |
| 58 | + | |
| 59 | +__version__ = '0.07' | |
| 60 | + | |
| 61 | + | |
| 62 | +#=== IMPORTS ================================================================= | |
| 63 | + | |
| 64 | +import os, fnmatch, glob, zipfile | |
| 65 | + | |
| 66 | +#=== EXCEPTIONS ============================================================== | |
| 67 | + | |
| 68 | +class PathNotFoundException(Exception): | |
| 69 | + """ raised if given a fixed file/dir (not a glob) that does not exist """ | |
| 70 | + def __init__(self, path): | |
| 71 | + super(PathNotFoundException, self).__init__( | |
| 72 | + 'Given path does not exist: %r' % path) | |
| 73 | + | |
| 74 | + | |
| 75 | +#=== FUNCTIONS =============================================================== | |
| 76 | + | |
| 77 | +# recursive glob function to find files in any subfolder: | |
| 78 | +# inspired by http://stackoverflow.com/questions/14798220/how-can-i-search-sub-folders-using-glob-glob-module-in-python | |
| 79 | +def rglob (path, pattern='*.*'): | |
| 80 | + """ | |
| 81 | + Recursive glob: | |
| 82 | + similar to glob.glob, but finds files recursively in all subfolders of path. | |
| 83 | + path: root directory where to search files | |
| 84 | + pattern: pattern for filenames, using wildcards, e.g. *.txt | |
| 85 | + """ | |
| 86 | + #TODO: more compatible API with glob: use single param, split path from pattern | |
| 87 | + return [os.path.join(dirpath, f) | |
| 88 | + for dirpath, dirnames, files in os.walk(path) | |
| 89 | + for f in fnmatch.filter(files, pattern)] | |
| 90 | + | |
| 91 | + | |
| 92 | +def riglob (pathname): | |
| 93 | + """ | |
| 94 | + Recursive iglob: | |
| 95 | + similar to glob.iglob, but finds files recursively in all subfolders of path. | |
| 96 | + pathname: root directory where to search files followed by pattern for | |
| 97 | + filenames, using wildcards, e.g. *.txt | |
| 98 | + """ | |
| 99 | + path, filespec = os.path.split(pathname) | |
| 100 | + # fix path if empty: | |
| 101 | + if path == '': | |
| 102 | + path = '.' | |
| 103 | + # print 'riglob: path=%r, filespec=%r' % (path, filespec) | |
| 104 | + for dirpath, dirnames, files in os.walk(path): | |
| 105 | + for f in fnmatch.filter(files, filespec): | |
| 106 | + yield os.path.join(dirpath, f) | |
| 107 | + | |
| 108 | + | |
| 109 | +def ziglob (zipfileobj, pathname): | |
| 110 | + """ | |
| 111 | + iglob in a zip: | |
| 112 | + similar to glob.iglob, but finds files within a zip archive. | |
| 113 | + - zipfileobj: zipfile.ZipFile object | |
| 114 | + - pathname: root directory where to search files followed by pattern for | |
| 115 | + filenames, using wildcards, e.g. *.txt | |
| 116 | + """ | |
| 117 | + files = zipfileobj.namelist() | |
| 118 | + #for f in files: print f | |
| 119 | + for f in fnmatch.filter(files, pathname): | |
| 120 | + yield f | |
| 121 | + | |
| 122 | + | |
| 123 | +def iter_files(files, recursive=False, zip_password=None, zip_fname='*'): | |
| 124 | + """ | |
| 125 | + Open each file provided as argument: | |
| 126 | + - files is a list of arguments | |
| 127 | + - if zip_password is None, each file is listed without reading its content. | |
| 128 | + Wildcards are supported. | |
| 129 | + - if not, then each file is opened as a zip archive with the provided password | |
| 130 | + - then files matching zip_fname are opened from the zip archive | |
| 131 | + | |
| 132 | + Iterator: yields (container, filename, data) for each file. If zip_password is None, then | |
| 133 | + only the filename is returned, container and data=None. Otherwise container is the | |
| 134 | + filename of the container (zip file), and data is the file content (or an exception). | |
| 135 | + If a given filename is not a glob and does not exist, the triplet | |
| 136 | + (None, filename, PathNotFoundException) is yielded. (Globs matching nothing | |
| 137 | + do not trigger exceptions) | |
| 138 | + """ | |
| 139 | + #TODO: catch exceptions and yield them for the caller (no file found, file is not zip, wrong password, etc) | |
| 140 | + #TODO: use logging instead of printing | |
| 141 | + #TODO: split in two simpler functions, the caller knows if it's a zip or not | |
| 142 | + # print 'iter_files: files=%r, recursive=%s' % (files, recursive) | |
| 143 | + # choose recursive or non-recursive iglob: | |
| 144 | + if recursive: | |
| 145 | + iglob = riglob | |
| 146 | + else: | |
| 147 | + iglob = glob.iglob | |
| 148 | + for filespec in files: | |
| 149 | + if not is_glob(filespec) and not os.path.exists(filespec): | |
| 150 | + yield None, filespec, PathNotFoundException(filespec) | |
| 151 | + continue | |
| 152 | + for filename in iglob(filespec): | |
| 153 | + if zip_password is not None: | |
| 154 | + # Each file is expected to be a zip archive: | |
| 155 | + # The zip password must be bytes, not unicode/str: | |
| 156 | + if not isinstance(zip_password, bytes): | |
| 157 | + zip_password = bytes(zip_password, encoding='utf8') | |
| 158 | + # print('Opening zip archive %s with provided password' % filename) | |
| 159 | + # print('zip password: %r' % zip_password) | |
| 160 | + # print(type(zip_password)) | |
| 161 | + z = zipfile.ZipFile(filename, 'r') | |
| 162 | + #print 'Looking for file(s) matching "%s"' % zip_fname | |
| 163 | + for subfilename in ziglob(z, zip_fname): | |
| 164 | + #print 'Opening file in zip archive:', filename | |
| 165 | + try: | |
| 166 | + data = z.read(subfilename, zip_password) | |
| 167 | + yield filename, subfilename, data | |
| 168 | + except Exception as e: | |
| 169 | + yield filename, subfilename, e | |
| 170 | + z.close() | |
| 171 | + else: | |
| 172 | + # normal file | |
| 173 | + # do not read the file content, just yield the filename | |
| 174 | + yield None, filename, None | |
| 175 | + #print 'Opening file', filename | |
| 176 | + #data = open(filename, 'rb').read() | |
| 177 | + #yield None, filename, data | |
| 178 | + | |
| 179 | + | |
| 180 | +def is_glob(filespec): | |
| 181 | + """ determine if given file specification is a single file name or a glob | |
| 182 | + | |
| 183 | + python's glob and fnmatch can only interpret ?, *, [list], and [ra-nge], | |
| 184 | + (and combinations: hex_*_[A-Fabcdef0-9]). | |
| 185 | + The special chars *?[-] can only be escaped using [] | |
| 186 | + --> file_name is not a glob | |
| 187 | + --> file?name is a glob | |
| 188 | + --> file* is a glob | |
| 189 | + --> file[-._]name is a glob | |
| 190 | + --> file[?]name is not a glob (matches literal "file?name") | |
| 191 | + --> file[*]name is not a glob (matches literal "file*name") | |
| 192 | + --> file[-]name is not a glob (matches literal "file-name") | |
| 193 | + --> file-name is not a glob | |
| 194 | + | |
| 195 | + Also, obviously incorrect globs are treated as non-globs | |
| 196 | + --> file[name is not a glob (matches literal "file[name") | |
| 197 | + --> file]-[name is treated as a glob | |
| 198 | + (it is not a valid glob but detecting errors like this requires | |
| 199 | + sophisticated regular expression matching) | |
| 200 | + | |
| 201 | + Python's glob also works with globs in directory-part of path | |
| 202 | + --> dir-part of path is analyzed just like filename-part | |
| 203 | + --> thirdparty/*/xglob.py is a (valid) glob | |
| 204 | + | |
| 205 | + TODO: create a correct regexp to test for validity of ranges | |
| 206 | + """ | |
| 207 | + | |
| 208 | + # remove escaped special chars | |
| 209 | + cleaned = filespec.replace('[*]', '').replace('[?]', '') \ | |
| 210 | + .replace('[[]', '').replace('[]]', '').replace('[-]', '') | |
| 211 | + | |
| 212 | + # check if special chars remain | |
| 213 | + return '*' in cleaned or '?' in cleaned or \ | |
| 214 | + ('[' in cleaned and ']' in cleaned) | ... | ... |
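
The escape rules listed in the `is_glob` docstring can be checked with a short standalone sketch. It restates the same logic outside the module (a copy for illustration, not an import of `oletools.thirdparty.xglob`); the `examples` list and `results` dict are names introduced here.

```python
def is_glob(filespec):
    """Return True if filespec still contains glob syntax once escapes are removed."""
    cleaned = filespec
    # Bracket-escaped special characters match literally, so drop them first.
    for escaped in ('[*]', '[?]', '[[]', '[]]', '[-]'):
        cleaned = cleaned.replace(escaped, '')
    # Whatever special characters remain make the spec a glob.
    return '*' in cleaned or '?' in cleaned or ('[' in cleaned and ']' in cleaned)


examples = [
    'file_name',
    'file?name',
    'file[?]name',
    'file[name',
    'file]-[name',
    'thirdparty/*/xglob.py',
]
results = {spec: is_glob(spec) for spec in examples}
for spec, flag in results.items():
    print('%-24s %s' % (spec, flag))
```

As the docstring warns, `file]-[name` is still reported as a glob because the check only looks for the presence of both brackets, not their order.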
oletools/thirdparty/zipfile27/LICENSE.txt deleted
| 1 | -Python 2.7 license | |
| 2 | - | |
| 3 | -This is the official license for the Python 2.7 release: | |
| 4 | - | |
| 5 | -A. HISTORY OF THE SOFTWARE | |
| 6 | -========================== | |
| 7 | - | |
| 8 | -Python was created in the early 1990s by Guido van Rossum at Stichting | |
| 9 | -Mathematisch Centrum (CWI, see http://www.cwi.nl) in the Netherlands | |
| 10 | -as a successor of a language called ABC. Guido remains Python's | |
| 11 | -principal author, although it includes many contributions from others. | |
| 12 | - | |
| 13 | -In 1995, Guido continued his work on Python at the Corporation for | |
| 14 | -National Research Initiatives (CNRI, see http://www.cnri.reston.va.us) | |
| 15 | -in Reston, Virginia where he released several versions of the | |
| 16 | -software. | |
| 17 | - | |
| 18 | -In May 2000, Guido and the Python core development team moved to | |
| 19 | -BeOpen.com to form the BeOpen PythonLabs team. In October of the same | |
| 20 | -year, the PythonLabs team moved to Digital Creations (now Zope | |
| 21 | -Corporation, see http://www.zope.com). In 2001, the Python Software | |
| 22 | -Foundation (PSF, see http://www.python.org/psf/) was formed, a | |
| 23 | -non-profit organization created specifically to own Python-related | |
| 24 | -Intellectual Property. Zope Corporation is a sponsoring member of | |
| 25 | -the PSF. | |
| 26 | - | |
| 27 | -All Python releases are Open Source (see http://www.opensource.org for | |
| 28 | -the Open Source Definition). Historically, most, but not all, Python | |
| 29 | -releases have also been GPL-compatible; the table below summarizes | |
| 30 | -the various releases. | |
| 31 | - | |
| 32 | - Release Derived Year Owner GPL- | |
| 33 | - from compatible? (1) | |
| 34 | - | |
| 35 | - 0.9.0 thru 1.2 1991-1995 CWI yes | |
| 36 | - 1.3 thru 1.5.2 1.2 1995-1999 CNRI yes | |
| 37 | - 1.6 1.5.2 2000 CNRI no | |
| 38 | - 2.0 1.6 2000 BeOpen.com no | |
| 39 | - 1.6.1 1.6 2001 CNRI yes (2) | |
| 40 | - 2.1 2.0+1.6.1 2001 PSF no | |
| 41 | - 2.0.1 2.0+1.6.1 2001 PSF yes | |
| 42 | - 2.1.1 2.1+2.0.1 2001 PSF yes | |
| 43 | - 2.2 2.1.1 2001 PSF yes | |
| 44 | - 2.1.2 2.1.1 2002 PSF yes | |
| 45 | - 2.1.3 2.1.2 2002 PSF yes | |
| 46 | - 2.2.1 2.2 2002 PSF yes | |
| 47 | - 2.2.2 2.2.1 2002 PSF yes | |
| 48 | - 2.2.3 2.2.2 2003 PSF yes | |
| 49 | - 2.3 2.2.2 2002-2003 PSF yes | |
| 50 | - 2.3.1 2.3 2002-2003 PSF yes | |
| 51 | - 2.3.2 2.3.1 2002-2003 PSF yes | |
| 52 | - 2.3.3 2.3.2 2002-2003 PSF yes | |
| 53 | - 2.3.4 2.3.3 2004 PSF yes | |
| 54 | - 2.3.5 2.3.4 2005 PSF yes | |
| 55 | - 2.4 2.3 2004 PSF yes | |
| 56 | - 2.4.1 2.4 2005 PSF yes | |
| 57 | - 2.4.2 2.4.1 2005 PSF yes | |
| 58 | - 2.4.3 2.4.2 2006 PSF yes | |
| 59 | - 2.5 2.4 2006 PSF yes | |
| 60 | - 2.7 2.6 2010 PSF yes | |
| 61 | - | |
| 62 | -Footnotes: | |
| 63 | - | |
| 64 | -(1) GPL-compatible doesn't mean that we're distributing Python under | |
| 65 | - the GPL. All Python licenses, unlike the GPL, let you distribute | |
| 66 | - a modified version without making your changes open source. The | |
| 67 | - GPL-compatible licenses make it possible to combine Python with | |
| 68 | - other software that is released under the GPL; the others don't. | |
| 69 | - | |
| 70 | -(2) According to Richard Stallman, 1.6.1 is not GPL-compatible, | |
| 71 | - because its license has a choice of law clause. According to | |
| 72 | - CNRI, however, Stallman's lawyer has told CNRI's lawyer that 1.6.1 | |
| 73 | - is "not incompatible" with the GPL. | |
| 74 | - | |
| 75 | -Thanks to the many outside volunteers who have worked under Guido's | |
| 76 | -direction to make these releases possible. | |
| 77 | - | |
| 78 | - | |
| 79 | -B. TERMS AND CONDITIONS FOR ACCESSING OR OTHERWISE USING PYTHON | |
| 80 | -=============================================================== | |
| 81 | - | |
| 82 | -PYTHON SOFTWARE FOUNDATION LICENSE VERSION 2 | |
| 83 | --------------------------------------------- | |
| 84 | - | |
| 85 | -1. This LICENSE AGREEMENT is between the Python Software Foundation | |
| 86 | -("PSF"), and the Individual or Organization ("Licensee") accessing and | |
| 87 | -otherwise using this software ("Python") in source or binary form and | |
| 88 | -its associated documentation. | |
| 89 | - | |
| 90 | -2. Subject to the terms and conditions of this License Agreement, PSF | |
| 91 | -hereby grants Licensee a nonexclusive, royalty-free, world-wide | |
| 92 | -license to reproduce, analyze, test, perform and/or display publicly, | |
| 93 | -prepare derivative works, distribute, and otherwise use Python | |
| 94 | -alone or in any derivative version, provided, however, that PSF's | |
| 95 | -License Agreement and PSF's notice of copyright, i.e., "Copyright (c) | |
| 96 | -2001, 2002, 2003, 2004, 2005, 2006 Python Software Foundation; All Rights | |
| 97 | -Reserved" are retained in Python alone or in any derivative version | |
| 98 | -prepared by Licensee. | |
| 99 | - | |
| 100 | -3. In the event Licensee prepares a derivative work that is based on | |
| 101 | -or incorporates Python or any part thereof, and wants to make | |
| 102 | -the derivative work available to others as provided herein, then | |
| 103 | -Licensee hereby agrees to include in any such work a brief summary of | |
| 104 | -the changes made to Python. | |
| 105 | - | |
| 106 | -4. PSF is making Python available to Licensee on an "AS IS" | |
| 107 | -basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR | |
| 108 | -IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND | |
| 109 | -DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS | |
| 110 | -FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON WILL NOT | |
| 111 | -INFRINGE ANY THIRD PARTY RIGHTS. | |
| 112 | - | |
| 113 | -5. PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON | |
| 114 | -FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS | |
| 115 | -A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON, | |
| 116 | -OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. | |
| 117 | - | |
| 118 | -6. This License Agreement will automatically terminate upon a material | |
| 119 | -breach of its terms and conditions. | |
| 120 | - | |
| 121 | -7. Nothing in this License Agreement shall be deemed to create any | |
| 122 | -relationship of agency, partnership, or joint venture between PSF and | |
| 123 | -Licensee. This License Agreement does not grant permission to use PSF | |
| 124 | -trademarks or trade name in a trademark sense to endorse or promote | |
| 125 | -products or services of Licensee, or any third party. | |
| 126 | - | |
| 127 | -8. By copying, installing or otherwise using Python, Licensee | |
| 128 | -agrees to be bound by the terms and conditions of this License | |
| 129 | -Agreement. | |
| 130 | - | |
| 131 | - | |
| 132 | -BEOPEN.COM LICENSE AGREEMENT FOR PYTHON 2.0 | |
| 133 | -------------------------------------------- | |
| 134 | - | |
| 135 | -BEOPEN PYTHON OPEN SOURCE LICENSE AGREEMENT VERSION 1 | |
| 136 | - | |
| 137 | -1. This LICENSE AGREEMENT is between BeOpen.com ("BeOpen"), having an | |
| 138 | -office at 160 Saratoga Avenue, Santa Clara, CA 95051, and the | |
| 139 | -Individual or Organization ("Licensee") accessing and otherwise using | |
| 140 | -this software in source or binary form and its associated | |
| 141 | -documentation ("the Software"). | |
| 142 | - | |
| 143 | -2. Subject to the terms and conditions of this BeOpen Python License | |
| 144 | -Agreement, BeOpen hereby grants Licensee a non-exclusive, | |
| 145 | -royalty-free, world-wide license to reproduce, analyze, test, perform | |
| 146 | -and/or display publicly, prepare derivative works, distribute, and | |
| 147 | -otherwise use the Software alone or in any derivative version, | |
| 148 | -provided, however, that the BeOpen Python License is retained in the | |
| 149 | -Software, alone or in any derivative version prepared by Licensee. | |
| 150 | - | |
| 151 | -3. BeOpen is making the Software available to Licensee on an "AS IS" | |
| 152 | -basis. BEOPEN MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR | |
| 153 | -IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, BEOPEN MAKES NO AND | |
| 154 | -DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS | |
| 155 | -FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE WILL NOT | |
| 156 | -INFRINGE ANY THIRD PARTY RIGHTS. | |
| 157 | - | |
| 158 | -4. BEOPEN SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF THE | |
| 159 | -SOFTWARE FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS | |
| 160 | -AS A RESULT OF USING, MODIFYING OR DISTRIBUTING THE SOFTWARE, OR ANY | |
| 161 | -DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. | |
| 162 | - | |
| 163 | -5. This License Agreement will automatically terminate upon a material | |
| 164 | -breach of its terms and conditions. | |
| 165 | - | |
| 166 | -6. This License Agreement shall be governed by and interpreted in all | |
| 167 | -respects by the law of the State of California, excluding conflict of | |
| 168 | -law provisions. Nothing in this License Agreement shall be deemed to | |
| 169 | -create any relationship of agency, partnership, or joint venture | |
| 170 | -between BeOpen and Licensee. This License Agreement does not grant | |
| 171 | -permission to use BeOpen trademarks or trade names in a trademark | |
| 172 | -sense to endorse or promote products or services of Licensee, or any | |
| 173 | -third party. As an exception, the "BeOpen Python" logos available at | |
| 174 | -http://www.pythonlabs.com/logos.html may be used according to the | |
| 175 | -permissions granted on that web page. | |
| 176 | - | |
| 177 | -7. By copying, installing or otherwise using the software, Licensee | |
| 178 | -agrees to be bound by the terms and conditions of this License | |
| 179 | -Agreement. | |
| 180 | - | |
| 181 | - | |
| 182 | -CNRI LICENSE AGREEMENT FOR PYTHON 1.6.1 | |
| 183 | ---------------------------------------- | |
| 184 | - | |
| 185 | -1. This LICENSE AGREEMENT is between the Corporation for National | |
| 186 | -Research Initiatives, having an office at 1895 Preston White Drive, | |
| 187 | -Reston, VA 20191 ("CNRI"), and the Individual or Organization | |
| 188 | -("Licensee") accessing and otherwise using Python 1.6.1 software in | |
| 189 | -source or binary form and its associated documentation. | |
| 190 | - | |
| 191 | -2. Subject to the terms and conditions of this License Agreement, CNRI | |
| 192 | -hereby grants Licensee a nonexclusive, royalty-free, world-wide | |
| 193 | -license to reproduce, analyze, test, perform and/or display publicly, | |
| 194 | -prepare derivative works, distribute, and otherwise use Python 1.6.1 | |
| 195 | -alone or in any derivative version, provided, however, that CNRI's | |
| 196 | -License Agreement and CNRI's notice of copyright, i.e., "Copyright (c) | |
| 197 | -1995-2001 Corporation for National Research Initiatives; All Rights | |
| 198 | -Reserved" are retained in Python 1.6.1 alone or in any derivative | |
| 199 | -version prepared by Licensee. Alternately, in lieu of CNRI's License | |
| 200 | -Agreement, Licensee may substitute the following text (omitting the | |
| 201 | -quotes): "Python 1.6.1 is made available subject to the terms and | |
| 202 | -conditions in CNRI's License Agreement. This Agreement together with | |
| 203 | -Python 1.6.1 may be located on the Internet using the following | |
| 204 | -unique, persistent identifier (known as a handle): 1895.22/1013. This | |
| 205 | -Agreement may also be obtained from a proxy server on the Internet | |
| 206 | -using the following URL: http://hdl.handle.net/1895.22/1013". | |
| 207 | - | |
| 208 | -3. In the event Licensee prepares a derivative work that is based on | |
| 209 | -or incorporates Python 1.6.1 or any part thereof, and wants to make | |
| 210 | -the derivative work available to others as provided herein, then | |
| 211 | -Licensee hereby agrees to include in any such work a brief summary of | |
| 212 | -the changes made to Python 1.6.1. | |
| 213 | - | |
| 214 | -4. CNRI is making Python 1.6.1 available to Licensee on an "AS IS" | |
| 215 | -basis. CNRI MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR | |
| 216 | -IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, CNRI MAKES NO AND | |
| 217 | -DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS | |
| 218 | -FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON 1.6.1 WILL NOT | |
| 219 | -INFRINGE ANY THIRD PARTY RIGHTS. | |
| 220 | - | |
| 221 | -5. CNRI SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON | |
| 222 | -1.6.1 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS | |
| 223 | -A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 1.6.1, | |
| 224 | -OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. | |
| 225 | - | |
| 226 | -6. This License Agreement will automatically terminate upon a material | |
| 227 | -breach of its terms and conditions. | |
| 228 | - | |
| 229 | -7. This License Agreement shall be governed by the federal | |
| 230 | -intellectual property law of the United States, including without | |
| 231 | -limitation the federal copyright law, and, to the extent such | |
| 232 | -U.S. federal law does not apply, by the law of the Commonwealth of | |
| 233 | -Virginia, excluding Virginia's conflict of law provisions. | |
| 234 | -Notwithstanding the foregoing, with regard to derivative works based | |
| 235 | -on Python 1.6.1 that incorporate non-separable material that was | |
| 236 | -previously distributed under the GNU General Public License (GPL), the | |
| 237 | -law of the Commonwealth of Virginia shall govern this License | |
| 238 | -Agreement only as to issues arising under or with respect to | |
| 239 | -Paragraphs 4, 5, and 7 of this License Agreement. Nothing in this | |
| 240 | -License Agreement shall be deemed to create any relationship of | |
| 241 | -agency, partnership, or joint venture between CNRI and Licensee. This | |
| 242 | -License Agreement does not grant permission to use CNRI trademarks or | |
| 243 | -trade name in a trademark sense to endorse or promote products or | |
| 244 | -services of Licensee, or any third party. | |
| 245 | - | |
| 246 | -8. By clicking on the "ACCEPT" button where indicated, or by copying, | |
| 247 | -installing or otherwise using Python 1.6.1, Licensee agrees to be | |
| 248 | -bound by the terms and conditions of this License Agreement. | |
| 249 | - | |
| 250 | - ACCEPT | |
| 251 | - | |
| 252 | - | |
| 253 | -CWI LICENSE AGREEMENT FOR PYTHON 0.9.0 THROUGH 1.2 | |
| 254 | --------------------------------------------------- | |
| 255 | - | |
| 256 | -Copyright (c) 1991 - 1995, Stichting Mathematisch Centrum Amsterdam, | |
| 257 | -The Netherlands. All rights reserved. | |
| 258 | - | |
| 259 | -Permission to use, copy, modify, and distribute this software and its | |
| 260 | -documentation for any purpose and without fee is hereby granted, | |
| 261 | -provided that the above copyright notice appear in all copies and that | |
| 262 | -both that copyright notice and this permission notice appear in | |
| 263 | -supporting documentation, and that the name of Stichting Mathematisch | |
| 264 | -Centrum or CWI not be used in advertising or publicity pertaining to | |
| 265 | -distribution of the software without specific, written prior | |
| 266 | -permission. | |
| 267 | - | |
| 268 | -STICHTING MATHEMATISCH CENTRUM DISCLAIMS ALL WARRANTIES WITH REGARD TO | |
| 269 | -THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND | |
| 270 | -FITNESS, IN NO EVENT SHALL STICHTING MATHEMATISCH CENTRUM BE LIABLE | |
| 271 | -FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES | |
| 272 | -WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN | |
| 273 | -ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT | |
| 274 | -OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. | |
| 275 | - |
oletools/thirdparty/zipfile27/__init__.py deleted
| 1 | -# Excerpt from the zipfile module from Python 2.7, to enable is_zipfile | |
| 2 | -# to check any file object (e.g. in memory), for Python 2.6. | |
| 3 | -# is_zipfile in Python 2.6 can only check files on disk. | |
| 4 | - | |
| 5 | -# This code from Python 2.7 was not modified. | |
| 6 | - | |
| 7 | -# 2016-09-06 v0.01 PL: - first version | |
| 8 | - | |
| 9 | - | |
| 10 | -from zipfile import _EndRecData | |
| 11 | - | |
| 12 | -def _check_zipfile(fp): | |
| 13 | - try: | |
| 14 | - if _EndRecData(fp): | |
| 15 | - return True # file has correct magic number | |
| 16 | - except IOError: | |
| 17 | - pass | |
| 18 | - return False | |
| 19 | - | |
| 20 | -def is_zipfile(filename): | |
| 21 | - """Quickly see if a file is a ZIP file by checking the magic number. | |
| 22 | - | |
| 23 | - The filename argument may be a file or file-like object too. | |
| 24 | - """ | |
| 25 | - result = False | |
| 26 | - try: | |
| 27 | - if hasattr(filename, "read"): | |
| 28 | - result = _check_zipfile(fp=filename) | |
| 29 | - else: | |
| 30 | - with open(filename, "rb") as fp: | |
| 31 | - result = _check_zipfile(fp) | |
| 32 | - except IOError: | |
| 33 | - pass | |
| 34 | - return result | |
| 35 | - |
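
The removed backport existed only so that `is_zipfile` could check file-like objects on Python 2.6. On Python 2.7 and 3.x the standard `zipfile.is_zipfile` already accepts them, which a minimal sketch can illustrate (the archive member name `hello.txt` and the variable names are made up for this example):

```python
import io
import zipfile

# Build a small zip archive entirely in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, 'w') as archive:
    archive.writestr('hello.txt', 'hello')
buf.seek(0)

# is_zipfile accepts a file-like object as well as a path.
in_memory_is_zip = zipfile.is_zipfile(buf)
plain_is_zip = zipfile.is_zipfile(io.BytesIO(b'not a zip archive'))
print(in_memory_is_zip, plain_is_zip)
```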
oletools/xls_parser.py
| ... | ... | @@ -5,7 +5,7 @@ Read storages, (sub-)streams, records from xls file |
| 5 | 5 | # |
| 6 | 6 | # === LICENSE ================================================================== |
| 7 | 7 | |
| 8 | -# xls_parser is copyright (c) 2014-2018 Philippe Lagadec (http://www.decalage.info) | |
| 8 | +# xls_parser is copyright (c) 2014-2019 Philippe Lagadec (http://www.decalage.info) | |
| 9 | 9 | # All rights reserved. |
| 10 | 10 | # |
| 11 | 11 | # Redistribution and use in source and binary forms, with or without modification, |
| ... | ... | @@ -33,8 +33,10 @@ Read storages, (sub-)streams, records from xls file |
| 33 | 33 | # 2017-11-02 v0.1 CH: - first version |
| 34 | 34 | # 2017-11-02 v0.2 CH: - move some code to record_base.py |
| 35 | 35 | # (to avoid copy-and-paste in ppt_parser.py) |
| 36 | +# 2019-01-30 v0.54 PL: - fixed import to avoid mixing installed oletools | |
| 37 | +# and dev version | |
| 36 | 38 | |
| 37 | -__version__ = '0.2' | |
| 39 | +__version__ = '0.54' | |
| 38 | 40 | |
| 39 | 41 | # ----------------------------------------------------------------------------- |
| 40 | 42 | # TODO: |
| ... | ... | @@ -56,17 +58,14 @@ import os.path |
| 56 | 58 | from struct import unpack |
| 57 | 59 | import logging |
| 58 | 60 | |
| 59 | -try: | |
| 60 | - from oletools import record_base | |
| 61 | -except ImportError: | |
| 62 | - # little hack to allow absolute imports even if oletools is not installed. | |
| 63 | - # Copied from olevba.py | |
| 64 | - PARENT_DIR = os.path.normpath(os.path.dirname(os.path.dirname( | |
| 65 | - os.path.abspath(__file__)))) | |
| 66 | - if PARENT_DIR not in sys.path: | |
| 67 | - sys.path.insert(0, PARENT_DIR) | |
| 68 | - del PARENT_DIR | |
| 69 | - from oletools import record_base | |
| 61 | +# little hack to allow absolute imports even if oletools is not installed. | |
| 62 | +# Copied from olevba.py | |
| 63 | +PARENT_DIR = os.path.normpath(os.path.dirname(os.path.dirname( | |
| 64 | + os.path.abspath(__file__)))) | |
| 65 | +if PARENT_DIR not in sys.path: | |
| 66 | + sys.path.insert(0, PARENT_DIR) | |
| 67 | +del PARENT_DIR | |
| 68 | +from oletools import record_base | |
| 70 | 69 | |
| 71 | 70 | |
| 72 | 71 | # === PYTHON 2+3 SUPPORT ====================================================== |
| ... | ... | @@ -89,12 +88,18 @@ def is_xls(filename): |
| 89 | 88 | substream. |
| 90 | 89 | See also: oleid.OleID.check_excel |
| 91 | 90 | """ |
| 91 | + xls_file = None | |
| 92 | 92 | try: |
| 93 | - for stream in XlsFile(filename).iter_streams(): | |
| 93 | + xls_file = XlsFile(filename) | |
| 94 | + for stream in xls_file.iter_streams(): | |
| 94 | 95 | if isinstance(stream, WorkbookStream): |
| 95 | 96 | return True |
| 96 | 97 | except Exception: |
| 97 | - pass | |
| 98 | + logging.debug('Ignoring exception in is_xls, assume is not xls', | |
| 99 | + exc_info=True) | |
| 100 | + finally: | |
| 101 | + if xls_file is not None: | |
| 102 | + xls_file.close() | |
| 98 | 103 | return False |
| 99 | 104 | |
| 100 | 105 | |
| ... | ... | @@ -102,7 +107,7 @@ def read_unicode(data, start_idx, n_chars): |
| 102 | 107 | """ read a unicode string from a XLUnicodeStringNoCch structure """ |
| 103 | 108 | # first bit 0x0 --> only low-bytes are saved, all high bytes are 0 |
| 104 | 109 | # first bit 0x1 --> 2 bytes per character |
| 105 | - low_bytes_only = (ord(data[start_idx]) == 0) | |
| 110 | + low_bytes_only = (ord(data[start_idx:start_idx+1]) == 0) | |
| 106 | 111 | if low_bytes_only: |
| 107 | 112 | end_idx = start_idx + 1 + n_chars |
| 108 | 113 | return data[start_idx+1:end_idx].decode('ascii'), end_idx |
| ... | ... | @@ -350,6 +355,7 @@ class XlsRecordSupBook(XlsRecord): |
| 350 | 355 | LINK_TYPE_EXTERNAL = 'external workbook' |
| 351 | 356 | |
| 352 | 357 | def finish_constructing(self, _): |
| 358 | + """Finish constructing this record; called at end of constructor.""" | |
| 353 | 359 | # set defaults |
| 354 | 360 | self.ctab = None |
| 355 | 361 | self.cch = None | ... | ... |
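
The `read_unicode` change swaps indexing for slicing because indexing a `bytes` object returns an `int` on Python 3 but a one-character `str` on Python 2, whereas a one-element slice is accepted by `ord()` on both. A small sketch of that idiom, using a hypothetical helper name `first_byte_is_zero` and made-up sample data:

```python
def first_byte_is_zero(data, start_idx):
    """Check the flag byte at start_idx in a way that works on Python 2 and 3."""
    # data[start_idx] would be an int on Python 3, and ord(int) raises TypeError.
    # A length-1 slice is bytes on Python 3 and str on Python 2; ord() takes both.
    return ord(data[start_idx:start_idx + 1]) == 0


low_bytes = first_byte_is_zero(b'\x00abc', 0)
two_bytes = first_byte_is_zero(b'\x01a\x00b\x00', 0)
print(low_bytes, two_bytes)
```

Note that if `start_idx` is past the end of `data`, the slice is empty and `ord()` raises `TypeError`, so callers still need valid offsets.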
requirements.txt
setup.py
| ... | ... | @@ -28,6 +28,9 @@ to install this package. |
| 28 | 28 | # 2018-09-15 PL: - easygui is now a dependency |
| 29 | 29 | # 2018-09-22 PL: - colorclass is now a dependency |
| 30 | 30 | # 2018-10-27 PL: - fixed issue #359 (bug when importing log_helper) |
| 31 | +# 2019-02-26 CH: - add optional dependency msoffcrypto for decryption | |
| 32 | +# 2019-05-22 PL: - 'msoffcrypto-tool' is now a required dependency | |
| 33 | +# 2019-05-23 v0.55 PL: - added pcodedmp as dependency | |
| 31 | 34 | |
| 32 | 35 | #--- TODO --------------------------------------------------------------------- |
| 33 | 36 | |
| ... | ... | @@ -47,7 +50,7 @@ import os, fnmatch |
| 47 | 50 | #--- METADATA ----------------------------------------------------------------- |
| 48 | 51 | |
| 49 | 52 | name = "oletools" |
| 50 | -version = '0.54dev4' | |
| 53 | +version = '0.55.dev3' | |
| 51 | 54 | desc = "Python tools to analyze security characteristics of MS Office and OLE files (also called Structured Storage, Compound File Binary Format or Compound Document File Format), for Malware Analysis and Incident Response #DFIR" |
| 52 | 55 | long_desc = open('oletools/README.rst').read() |
| 53 | 56 | author = "Philippe Lagadec" |
| ... | ... | @@ -73,6 +76,7 @@ classifiers=[ |
| 73 | 76 | "Programming Language :: Python :: 3.4", |
| 74 | 77 | "Programming Language :: Python :: 3.5", |
| 75 | 78 | "Programming Language :: Python :: 3.6", |
| 79 | + "Programming Language :: Python :: 3.7", | |
| 76 | 80 | "Topic :: Security", |
| 77 | 81 | "Topic :: Software Development :: Libraries :: Python Modules", |
| 78 | 82 | ] |
| ... | ... | @@ -89,7 +93,7 @@ packages=[ |
| 89 | 93 | 'oletools.thirdparty.xglob', |
| 90 | 94 | 'oletools.thirdparty.DridexUrlDecoder', |
| 91 | 95 | 'oletools.thirdparty.tablestream', |
| 92 | - 'oletools.thirdparty.zipfile27', | |
| 96 | + 'oletools.thirdparty.oledump', | |
| 93 | 97 | ] |
| 94 | 98 | ##setupdir = '.' |
| 95 | 99 | ##package_dir={'': setupdir} |
| ... | ... | @@ -177,9 +181,6 @@ package_data={ |
| 177 | 181 | 'oletools.thirdparty.DridexUrlDecoder': [ |
| 178 | 182 | 'LICENSE.txt', |
| 179 | 183 | ], |
| 180 | - 'oletools.thirdparty.zipfile27': [ | |
| 181 | - 'LICENSE.txt', | |
| 182 | - ], | |
| 183 | 184 | # 'oletools.thirdparty.tablestream': [ |
| 184 | 185 | # 'LICENSE', 'README', |
| 185 | 186 | # ], |
| ... | ... | @@ -305,11 +306,11 @@ def main(): |
| 305 | 306 | author_email=author_email, |
| 306 | 307 | url=url, |
| 307 | 308 | license=license, |
| 308 | -## package_dir=package_dir, | |
| 309 | + # package_dir=package_dir, | |
| 309 | 310 | packages=packages, |
| 310 | 311 | package_data = package_data, |
| 311 | 312 | download_url=download_url, |
| 312 | -# data_files=data_files, | |
| 313 | + # data_files=data_files, | |
| 313 | 314 | entry_points=entry_points, |
| 314 | 315 | test_suite="tests", |
| 315 | 316 | # scripts=scripts, |
| ... | ... | @@ -318,6 +319,8 @@ def main(): |
| 318 | 319 | "olefile>=0.46", |
| 319 | 320 | "easygui", |
| 320 | 321 | 'colorclass', |
| 322 | + 'msoffcrypto-tool', | |
| 323 | + 'pcodedmp>=1.2.5', | |
| 321 | 324 | ], |
| 322 | 325 | ) |
| 323 | 326 | ... | ... |
tests/common/log_helper/log_helper_test_imported.py
| ... | ... | @@ -11,6 +11,8 @@ INFO_MESSAGE = 'imported: info log' |
| 11 | 11 | WARNING_MESSAGE = 'imported: warning log' |
| 12 | 12 | ERROR_MESSAGE = 'imported: error log' |
| 13 | 13 | CRITICAL_MESSAGE = 'imported: critical log' |
| 14 | +RESULT_MESSAGE = 'imported: result log' | |
| 15 | +RESULT_TYPE = 'imported: result' | |
| 14 | 16 | |
| 15 | 17 | logger = log_helper.get_or_create_silent_logger('test_imported', logging.ERROR) |
| 16 | 18 | |
| ... | ... | @@ -21,3 +23,4 @@ def log(): |
| 21 | 23 | logger.warning(WARNING_MESSAGE) |
| 22 | 24 | logger.error(ERROR_MESSAGE) |
| 23 | 25 | logger.critical(CRITICAL_MESSAGE) |
| 26 | + logger.info(RESULT_MESSAGE, type=RESULT_TYPE) | ... | ... |
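
The `type=` keyword used above is a feature of oletools' `log_helper`; the standard library's `Logger.info` does not accept it. A rough standard-library analogue, assuming one wants a per-message `type` field in JSON output, is to pass it through `extra`. The formatter class, logger name, and messages below are hypothetical and are not how `log_helper` is implemented.

```python
import io
import json
import logging


class JsonTypeFormatter(logging.Formatter):
    """Emit one JSON object per record, with 'type' defaulting to 'msg'."""

    def format(self, record):
        return json.dumps({
            'type': getattr(record, 'type', 'msg'),
            'level': record.levelname,
            'msg': record.getMessage(),
        })


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(JsonTypeFormatter())

demo_logger = logging.getLogger('type_field_sketch')
demo_logger.setLevel(logging.INFO)
demo_logger.propagate = False  # keep the demo output out of the root logger
demo_logger.addHandler(handler)

demo_logger.info('plain message')
demo_logger.info('result message', extra={'type': 'result'})

entries = [json.loads(line) for line in stream.getvalue().splitlines()]
print(entries)
```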
tests/common/log_helper/log_helper_test_main.py
| ... | ... | @@ -9,6 +9,8 @@ INFO_MESSAGE = 'main: info log' |
| 9 | 9 | WARNING_MESSAGE = 'main: warning log' |
| 10 | 10 | ERROR_MESSAGE = 'main: error log' |
| 11 | 11 | CRITICAL_MESSAGE = 'main: critical log' |
| 12 | +RESULT_MESSAGE = 'main: result log' | |
| 13 | +RESULT_TYPE = 'main: result' | |
| 12 | 14 | |
| 13 | 15 | logger = log_helper.get_or_create_silent_logger('test_main') |
| 14 | 16 | |
| ... | ... | @@ -32,12 +34,16 @@ def init_logging_and_log(args): |
| 32 | 34 | level = args[-1] |
| 33 | 35 | use_json = 'as-json' in args |
| 34 | 36 | throw = 'throw' in args |
| 37 | + percent_autoformat = '%-autoformat' in args | |
| 35 | 38 | |
| 36 | 39 | if 'enable' in args: |
| 37 | 40 | log_helper.enable_logging(use_json, level, stream=sys.stdout) |
| 38 | 41 | |
| 39 | 42 | _log() |
| 40 | 43 | |
| 44 | + if percent_autoformat: | |
| 45 | + logger.info('The %s is %d.', 'answer', 47) | |
| 46 | + | |
| 41 | 47 | if throw: |
| 42 | 48 | raise Exception('An exception occurred before ending the logging') |
| 43 | 49 | |
| ... | ... | @@ -50,6 +56,7 @@ def _log(): |
| 50 | 56 | logger.warning(WARNING_MESSAGE) |
| 51 | 57 | logger.error(ERROR_MESSAGE) |
| 52 | 58 | logger.critical(CRITICAL_MESSAGE) |
| 59 | + logger.info(RESULT_MESSAGE, type=RESULT_TYPE) | |
| 53 | 60 | log_helper_test_imported.log() |
| 54 | 61 | |
| 55 | 62 | ... | ... |
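
The `%-autoformat` case relies on logging-style deferred formatting, where the arguments are merged into the message only when a record is emitted. The sketch below shows the same call against the standard `logging` module rather than `log_helper`; the logger name and output format string are assumptions made for the example.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s %(message)s'))

fmt_logger = logging.getLogger('percent_format_sketch')
fmt_logger.setLevel(logging.INFO)
fmt_logger.propagate = False  # keep the demo output out of the root logger
fmt_logger.addHandler(handler)

# The arguments are interpolated into the message when the record is formatted.
fmt_logger.info('The %s is %d.', 'answer', 47)

output = stream.getvalue()
print(output, end='')
```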
tests/common/log_helper/test_log_helper.py
| ... | ... | @@ -13,9 +13,11 @@ from tests.common.log_helper import log_helper_test_main |
| 13 | 13 | from tests.common.log_helper import log_helper_test_imported |
| 14 | 14 | from os.path import dirname, join, relpath, abspath |
| 15 | 15 | |
| 16 | +from tests.test_utils import PROJECT_ROOT | |
| 17 | + | |
| 16 | 18 | # this is the common base of "tests" and "oletools" dirs |
| 17 | -ROOT_DIRECTORY = abspath(join(__file__, '..', '..', '..', '..')) | |
| 18 | -TEST_FILE = relpath(join(dirname(__file__), 'log_helper_test_main.py'), ROOT_DIRECTORY) | |
| 19 | +TEST_FILE = relpath(join(dirname(abspath(__file__)), 'log_helper_test_main.py'), | |
| 20 | + PROJECT_ROOT) | |
| 19 | 21 | PYTHON_EXECUTABLE = sys.executable |
| 20 | 22 | |
| 21 | 23 | MAIN_LOG_MESSAGES = [ |
| ... | ... | @@ -59,6 +61,62 @@ class TestLogHelper(unittest.TestCase): |
| 59 | 61 | log_helper_test_imported.CRITICAL_MESSAGE |
| 60 | 62 | ]) |
| 61 | 63 | |
| 64 | + def test_logs_type_ignored(self): | |
| 65 | + """Run test script with logging enabled at info level. Want no type.""" | |
| 66 | + output = self._run_test(['enable', 'info']) | |
| 67 | + | |
| 68 | + expect = '\n'.join([ | |
| 69 | + 'INFO ' + log_helper_test_main.INFO_MESSAGE, | |
| 70 | + 'WARNING ' + log_helper_test_main.WARNING_MESSAGE, | |
| 71 | + 'ERROR ' + log_helper_test_main.ERROR_MESSAGE, | |
| 72 | + 'CRITICAL ' + log_helper_test_main.CRITICAL_MESSAGE, | |
| 73 | + 'INFO ' + log_helper_test_main.RESULT_MESSAGE, | |
| 74 | + 'INFO ' + log_helper_test_imported.INFO_MESSAGE, | |
| 75 | + 'WARNING ' + log_helper_test_imported.WARNING_MESSAGE, | |
| 76 | + 'ERROR ' + log_helper_test_imported.ERROR_MESSAGE, | |
| 77 | + 'CRITICAL ' + log_helper_test_imported.CRITICAL_MESSAGE, | |
| 78 | + 'INFO ' + log_helper_test_imported.RESULT_MESSAGE, | |
| 79 | + ]) | |
| 80 | + self.assertEqual(output, expect) | |
| 81 | + | |
| 82 | + def test_logs_type_in_json(self): | |
| 83 | + """Check type field is contained in json log.""" | |
| 84 | + output = self._run_test(['enable', 'as-json', 'info']) | |
| 85 | + | |
| 86 | + # convert to json preserving order of output | |
| 87 | + jout = json.loads(output) | |
| 88 | + | |
| 89 | + jexpect = [ | |
| 90 | + dict(type='msg', level='INFO', | |
| 91 | + msg=log_helper_test_main.INFO_MESSAGE), | |
| 92 | + dict(type='msg', level='WARNING', | |
| 93 | + msg=log_helper_test_main.WARNING_MESSAGE), | |
| 94 | + dict(type='msg', level='ERROR', | |
| 95 | + msg=log_helper_test_main.ERROR_MESSAGE), | |
| 96 | + dict(type='msg', level='CRITICAL', | |
| 97 | + msg=log_helper_test_main.CRITICAL_MESSAGE), | |
| 98 | + # this is the important entry (has a different "type" field): | |
| 99 | + dict(type=log_helper_test_main.RESULT_TYPE, level='INFO', | |
| 100 | + msg=log_helper_test_main.RESULT_MESSAGE), | |
| 101 | + dict(type='msg', level='INFO', | |
| 102 | + msg=log_helper_test_imported.INFO_MESSAGE), | |
| 103 | + dict(type='msg', level='WARNING', | |
| 104 | + msg=log_helper_test_imported.WARNING_MESSAGE), | |
| 105 | + dict(type='msg', level='ERROR', | |
| 106 | + msg=log_helper_test_imported.ERROR_MESSAGE), | |
| 107 | + dict(type='msg', level='CRITICAL', | |
| 108 | + msg=log_helper_test_imported.CRITICAL_MESSAGE), | |
| 109 | + # ... and this: | |
| 110 | + dict(type=log_helper_test_imported.RESULT_TYPE, level='INFO', | |
| 111 | + msg=log_helper_test_imported.RESULT_MESSAGE), | |
| 112 | + ] | |
| 113 | + self.assertEqual(jout, jexpect) | |
| 114 | + | |
| 115 | + def test_percent_autoformat(self): | |
| 116 | + """Test that auto-formatting of log strings with `%` works.""" | |
| 117 | + output = self._run_test(['enable', '%-autoformat', 'info']) | |
| 118 | + self.assertIn('The answer is 47.', output) | |
| 119 | + | |
| 62 | 120 | def test_json_correct_on_exceptions(self): |
| 63 | 121 | """ |
| 64 | 122 | Test that even on unhandled exceptions our JSON is always correct |
| ... | ... | @@ -72,10 +130,10 @@ class TestLogHelper(unittest.TestCase): |
| 72 | 130 | def _assert_json_messages(self, output, messages): |
| 73 | 131 | try: |
| 74 | 132 | json_data = json.loads(output) |
| 75 | - self.assertEquals(len(json_data), len(messages)) | |
| 133 | + self.assertEqual(len(json_data), len(messages)) | |
| 76 | 134 | |
| 77 | 135 | for i in range(len(messages)): |
| 78 | - self.assertEquals(messages[i], json_data[i]['msg']) | |
| 136 | + self.assertEqual(messages[i], json_data[i]['msg']) | |
| 79 | 137 | except ValueError: |
| 80 | 138 | self.fail('Invalid json:\n' + output) |
| 81 | 139 | |
| ... | ... | @@ -90,9 +148,9 @@ class TestLogHelper(unittest.TestCase): |
| 90 | 148 | child = subprocess.Popen( |
| 91 | 149 | [PYTHON_EXECUTABLE, TEST_FILE] + args, |
| 92 | 150 | shell=False, |
| 93 | - env={'PYTHONPATH': ROOT_DIRECTORY}, | |
| 151 | + env={'PYTHONPATH': PROJECT_ROOT}, | |
| 94 | 152 | universal_newlines=True, |
| 95 | - cwd=ROOT_DIRECTORY, | |
| 153 | + cwd=PROJECT_ROOT, | |
| 96 | 154 | stdin=None, |
| 97 | 155 | stdout=subprocess.PIPE, |
| 98 | 156 | stderr=subprocess.PIPE |
| ... | ... | @@ -102,7 +160,7 @@ class TestLogHelper(unittest.TestCase): |
| 102 | 160 | if not isinstance(output, str): |
| 103 | 161 | output = output.decode('utf-8') |
| 104 | 162 | |
| 105 | - self.assertEquals(child.returncode == 0, should_succeed) | |
| 163 | + self.assertEqual(child.returncode == 0, should_succeed) | |
| 106 | 164 | |
| 107 | 165 | return output.strip() |
| 108 | 166 | ... | ... |
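The log-helper tests above check two behaviours: a `type` field that appears only in JSON output (defaulting to `msg`), and `%`-style formatting of message arguments. A minimal sketch of how such behaviour could be produced with the standard `logging` module follows. The names `TypedJsonFormatter` and `render` are hypothetical and are not taken from oletools' `log_helper`, whose implementation is not shown in this diff.

```python
import json
import logging


class TypedJsonFormatter(logging.Formatter):
    """Render a record as a JSON object with 'type', 'level' and 'msg'."""

    def format(self, record):
        return json.dumps({
            # hypothetical attribute; falls back to 'msg' when not set
            'type': getattr(record, 'type', 'msg'),
            'level': record.levelname,
            # getMessage() applies %-formatting of msg with record.args
            'msg': record.getMessage(),
        })


def render(level, msg, *args, log_type='msg'):
    """Build a log record by hand and return its JSON representation."""
    record = logging.LogRecord('demo', level, 'demo.py', 0, msg, args, None)
    record.type = log_type
    return TypedJsonFormatter().format(record)


if __name__ == '__main__':
    print(render(logging.INFO, 'The %s is %d.', 'answer', 47))
    print(render(logging.INFO, 'finished', log_type='result'))
```

A plain-text formatter would simply ignore the extra attribute, which is the behaviour `test_logs_type_ignored` expects.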
tests/msodde/test_basic.py
| ... | ... | @@ -9,11 +9,16 @@ Ensure that |
| 9 | 9 | from __future__ import print_function |
| 10 | 10 | |
| 11 | 11 | import unittest |
| 12 | -from oletools import msodde | |
| 13 | -from tests.test_utils import DATA_BASE_DIR as BASE_DIR | |
| 12 | +import sys | |
| 14 | 13 | import os |
| 15 | -from os.path import join | |
| 14 | +from os.path import join, basename | |
| 16 | 15 | from traceback import print_exc |
| 16 | +import json | |
| 17 | +from collections import OrderedDict | |
| 18 | +from oletools import msodde | |
| 19 | +from oletools.crypto import \ | |
| 20 | + WrongEncryptionPassword, CryptoLibNotImported, check_msoffcrypto | |
| 21 | +from tests.test_utils import call_and_capture, DATA_BASE_DIR as BASE_DIR | |
| 17 | 22 | |
| 18 | 23 | |
| 19 | 24 | class TestReturnCode(unittest.TestCase): |
| ... | ... | @@ -46,15 +51,21 @@ class TestReturnCode(unittest.TestCase): |
| 46 | 51 | |
| 47 | 52 | def test_invalid_none(self): |
| 48 | 53 | """ check that no file argument leads to non-zero exit status """ |
| 49 | - self.do_test_validity('', True) | |
| 54 | + if sys.hexversion > 0x03030000: # version 3.3 and higher | |
| 55 | + # different errors probably depending on whether msoffcrypto is | |
| 56 | + # available or not | |
| 57 | + expect_error = (AttributeError, FileNotFoundError) | |
| 58 | + else: | |
| 59 | + expect_error = (AttributeError, IOError) | |
| 60 | + self.do_test_validity('', expect_error) | |
| 50 | 61 | |
| 51 | 62 | def test_invalid_empty(self): |
| 52 | 63 | """ check that empty file argument leads to non-zero exit status """ |
| 53 | - self.do_test_validity(join(BASE_DIR, 'basic/empty'), True) | |
| 64 | + self.do_test_validity(join(BASE_DIR, 'basic/empty'), Exception) | |
| 54 | 65 | |
| 55 | 66 | def test_invalid_text(self): |
| 56 | 67 | """ check that text file argument leads to non-zero exit status """ |
| 57 | - self.do_test_validity(join(BASE_DIR, 'basic/text'), True) | |
| 68 | + self.do_test_validity(join(BASE_DIR, 'basic/text'), Exception) | |
| 58 | 69 | |
| 59 | 70 | def test_encrypted(self): |
| 60 | 71 | """ |
| ... | ... | @@ -64,28 +75,56 @@ class TestReturnCode(unittest.TestCase): |
| 64 | 75 | Encryption) is tested. |
| 65 | 76 | """ |
| 66 | 77 | CRYPT_DIR = join(BASE_DIR, 'encrypted') |
| 67 | - ADD_ARGS = '', '-j', '-d', '-f', '-a' | |
| 78 | + have_crypto = check_msoffcrypto() | |
| 68 | 79 | for filename in os.listdir(CRYPT_DIR): |
| 69 | - full_name = join(CRYPT_DIR, filename) | |
| 70 | - for args in ADD_ARGS: | |
| 71 | - self.do_test_validity(args + ' ' + full_name, True) | |
| 72 | - | |
| 73 | - def do_test_validity(self, args, expect_error=False): | |
| 74 | - """ helper for test_valid_doc[x] """ | |
| 75 | - have_exception = False | |
| 80 | + if have_crypto and 'standardpassword' in filename: | |
| 81 | + # these are automagically decrypted | |
| 82 | + self.do_test_validity(join(CRYPT_DIR, filename)) | |
| 83 | + elif have_crypto: | |
| 84 | + self.do_test_validity(join(CRYPT_DIR, filename), | |
| 85 | + WrongEncryptionPassword) | |
| 86 | + else: | |
| 87 | + self.do_test_validity(join(CRYPT_DIR, filename), | |
| 88 | + CryptoLibNotImported) | |
| 89 | + | |
| 90 | + def do_test_validity(self, filename, expect_error=None): | |
| 91 | + """ helper for test_[in]valid_* """ | |
| 92 | + found_error = None | |
| 93 | + # DEBUG: print('Testing file {}'.format(filename)) | |
| 76 | 94 | try: |
| 77 | - msodde.process_file(args, msodde.FIELD_FILTER_BLACKLIST) | |
| 78 | - except Exception: | |
| 79 | - have_exception = True | |
| 80 | - print_exc() | |
| 81 | - except SystemExit as exc: # sys.exit() was called | |
| 82 | - have_exception = True | |
| 83 | - if exc.code is None: | |
| 84 | - have_exception = False | |
| 85 | - | |
| 86 | - self.assertEqual(expect_error, have_exception, | |
| 87 | - msg='Args={0}, expect={1}, exc={2}' | |
| 88 | - .format(args, expect_error, have_exception)) | |
| 95 | + msodde.process_maybe_encrypted(filename, | |
| 96 | + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST) | |
| 97 | + except Exception as exc: | |
| 98 | + found_error = exc | |
| 99 | + # DEBUG: print_exc() | |
| 100 | + | |
| 101 | + if expect_error and not found_error: | |
| 102 | + self.fail('Expected {} but msodde finished without errors for {}' | |
| 103 | + .format(expect_error, filename)) | |
| 104 | + elif not expect_error and found_error: | |
| 105 | + self.fail('Unexpected error {} from msodde for {}' | |
| 106 | + .format(found_error, filename)) | |
| 107 | + elif expect_error and not isinstance(found_error, expect_error): | |
| 108 | + self.fail('Wrong kind of error {} from msodde for {}, expected {}' | |
| 109 | + .format(type(found_error), filename, expect_error)) | |
| 110 | + | |
| 111 | + | |
| 112 | +@unittest.skipIf(not check_msoffcrypto(), | |
| 113 | + 'Module msoffcrypto not installed for {}' | |
| 114 | + .format(basename(sys.executable))) | |
| 115 | +class TestErrorOutput(unittest.TestCase): | |
| 116 | + """msodde does not specify error by return code but text output.""" | |
| 117 | + | |
| 118 | + def test_crypt_output(self): | |
| 119 | + """Check for helpful error message when failing to decrypt.""" | |
| 120 | + for suffix in 'doc', 'docm', 'docx', 'ppt', 'pptm', 'pptx', 'xls', \ | |
| 121 | + 'xlsb', 'xlsm', 'xlsx': | |
| 122 | + example_file = join(BASE_DIR, 'encrypted', 'encrypted.' + suffix) | |
| 123 | + output, ret_code = call_and_capture('msodde', [example_file, ], | |
| 124 | + accept_nonzero_exit=True) | |
| 125 | + self.assertEqual(ret_code, 1) | |
| 126 | + self.assertIn('passwords could not decrypt office file', output, | |
| 127 | + msg='Unexpected output: {}'.format(output.strip())) | |
| 89 | 128 | |
| 90 | 129 | |
| 91 | 130 | class TestDdeLinks(unittest.TestCase): |
| ... | ... | @@ -100,33 +139,37 @@ class TestDdeLinks(unittest.TestCase): |
| 100 | 139 | def test_with_dde(self): |
| 101 | 140 | """ check that dde links appear on stdout """ |
| 102 | 141 | filename = 'dde-test-from-office2003.doc' |
| 103 | - output = msodde.process_file( | |
| 104 | - join(BASE_DIR, 'msodde', filename), msodde.FIELD_FILTER_BLACKLIST) | |
| 142 | + output = msodde.process_maybe_encrypted( | |
| 143 | + join(BASE_DIR, 'msodde', filename), | |
| 144 | + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST) | |
| 105 | 145 | self.assertNotEqual(len(self.get_dde_from_output(output)), 0, |
| 106 | 146 | msg='Found no dde links in output of ' + filename) |
| 107 | 147 | |
| 108 | 148 | def test_no_dde(self): |
| 109 | 149 | """ check that no dde links appear on stdout """ |
| 110 | 150 | filename = 'harmless-clean.doc' |
| 111 | - output = msodde.process_file( | |
| 112 | - join(BASE_DIR, 'msodde', filename), msodde.FIELD_FILTER_BLACKLIST) | |
| 151 | + output = msodde.process_maybe_encrypted( | |
| 152 | + join(BASE_DIR, 'msodde', filename), | |
| 153 | + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST) | |
| 113 | 154 | self.assertEqual(len(self.get_dde_from_output(output)), 0, |
| 114 | 155 | msg='Found dde links in output of ' + filename) |
| 115 | 156 | |
| 116 | 157 | def test_with_dde_utf16le(self): |
| 117 | 158 | """ check that dde links appear on stdout """ |
| 118 | 159 | filename = 'dde-test-from-office2013-utf_16le-korean.doc' |
| 119 | - output = msodde.process_file( | |
| 120 | - join(BASE_DIR, 'msodde', filename), msodde.FIELD_FILTER_BLACKLIST) | |
| 160 | + output = msodde.process_maybe_encrypted( | |
| 161 | + join(BASE_DIR, 'msodde', filename), | |
| 162 | + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST) | |
| 121 | 163 | self.assertNotEqual(len(self.get_dde_from_output(output)), 0, |
| 122 | 164 | msg='Found no dde links in output of ' + filename) |
| 123 | 165 | |
| 124 | 166 | def test_excel(self): |
| 125 | 167 | """ check that dde links are found in excel 2007+ files """ |
| 126 | - expect = ['DDE-Link cmd /c calc.exe', ] | |
| 168 | + expect = ['cmd /c calc.exe', ] | |
| 127 | 169 | for extn in 'xlsx', 'xlsm', 'xlsb': |
| 128 | - output = msodde.process_file( | |
| 129 | - join(BASE_DIR, 'msodde', 'dde-test.' + extn), msodde.FIELD_FILTER_BLACKLIST) | |
| 170 | + output = msodde.process_maybe_encrypted( | |
| 171 | + join(BASE_DIR, 'msodde', 'dde-test.' + extn), | |
| 172 | + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST) | |
| 130 | 173 | |
| 131 | 174 | self.assertEqual(expect, self.get_dde_from_output(output), |
| 132 | 175 | msg='unexpected output for dde-test.{0}: {1}' |
| ... | ... | @@ -136,8 +179,9 @@ class TestDdeLinks(unittest.TestCase): |
| 136 | 179 | """ check that dde in xml from word / excel is found """ |
| 137 | 180 | for name_part in 'excel2003', 'word2003', 'word2007': |
| 138 | 181 | filename = 'dde-in-' + name_part + '.xml' |
| 139 | - output = msodde.process_file( | |
| 140 | - join(BASE_DIR, 'msodde', filename), msodde.FIELD_FILTER_BLACKLIST) | |
| 182 | + output = msodde.process_maybe_encrypted( | |
| 183 | + join(BASE_DIR, 'msodde', filename), | |
| 184 | + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST) | |
| 141 | 185 | links = self.get_dde_from_output(output) |
| 142 | 186 | self.assertEqual(len(links), 1, 'found {0} dde-links in {1}' |
| 143 | 187 | .format(len(links), filename)) |
| ... | ... | @@ -149,15 +193,17 @@ class TestDdeLinks(unittest.TestCase): |
| 149 | 193 | def test_clean_rtf_blacklist(self): |
| 150 | 194 | """ find a lot of hyperlinks in rtf spec """ |
| 151 | 195 | filename = 'RTF-Spec-1.7.rtf' |
| 152 | - output = msodde.process_file( | |
| 153 | - join(BASE_DIR, 'msodde', filename), msodde.FIELD_FILTER_BLACKLIST) | |
| 196 | + output = msodde.process_maybe_encrypted( | |
| 197 | + join(BASE_DIR, 'msodde', filename), | |
| 198 | + field_filter_mode=msodde.FIELD_FILTER_BLACKLIST) | |
| 154 | 199 | self.assertEqual(len(self.get_dde_from_output(output)), 1413) |
| 155 | 200 | |
| 156 | 201 | def test_clean_rtf_ddeonly(self): |
| 157 | 202 | """ find no dde links in rtf spec """ |
| 158 | 203 | filename = 'RTF-Spec-1.7.rtf' |
| 159 | - output = msodde.process_file( | |
| 160 | - join(BASE_DIR, 'msodde', filename), msodde.FIELD_FILTER_DDE) | |
| 204 | + output = msodde.process_maybe_encrypted( | |
| 205 | + join(BASE_DIR, 'msodde', filename), | |
| 206 | + field_filter_mode=msodde.FIELD_FILTER_DDE) | |
| 161 | 207 | self.assertEqual(len(self.get_dde_from_output(output)), 0, |
| 162 | 208 | msg='Found dde links in output of ' + filename) |
| 163 | 209 | ... | ... |
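The reworked `do_test_validity` helper above distinguishes three failure cases: an expected error that never occurred, an unexpected error, and an error of the wrong type. The same decision logic can be sketched independently of msodde. The function `check_outcome` below is a hypothetical stand-in written for illustration; it returns a message instead of calling `self.fail`.

```python
def check_outcome(func, expect_error=None):
    """Call func(); return None if it behaved as expected, else a message.

    expect_error may be None (no error expected), an exception class,
    or a tuple of exception classes, as accepted by isinstance().
    """
    found_error = None
    try:
        func()
    except Exception as exc:
        found_error = exc

    if expect_error and not found_error:
        return 'Expected {} but call finished without errors'.format(
            expect_error)
    if not expect_error and found_error:
        return 'Unexpected error {!r}'.format(found_error)
    if expect_error and not isinstance(found_error, expect_error):
        return 'Wrong kind of error {}, expected {}'.format(
            type(found_error), expect_error)
    return None


if __name__ == '__main__':
    print(check_outcome(lambda: int('x'), ValueError))      # expected error
    print(check_outcome(lambda: int('x'), (KeyError, IOError)))
```

Passing a tuple of exception classes works because `isinstance` accepts tuples, which is what `test_invalid_none` relies on.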
tests/msodde/test_crypto.py
0 → 100644
| 1 | +"""Check decryption of files from msodde works.""" | |
| 2 | + | |
| 3 | +import sys | |
| 4 | +import unittest | |
| 5 | +from os.path import basename, join as pjoin | |
| 6 | + | |
| 7 | +from tests.test_utils import DATA_BASE_DIR, call_and_capture | |
| 8 | + | |
| 9 | +from oletools import crypto | |
| 10 | + | |
| 11 | + | |
| 12 | +@unittest.skipIf(not crypto.check_msoffcrypto(), | |
| 13 | + 'Module msoffcrypto not installed for {}' | |
| 14 | + .format(basename(sys.executable))) | |
| 15 | +class MsoddeCryptoTest(unittest.TestCase): | |
| 16 | + """Test integration of decryption in msodde.""" | |
| 17 | + | |
| 18 | + def test_standard_password(self): | |
| 19 | + """Check dde-link is found in xls, xlsx, xlsm and xlsb sample files.""" | |
| 20 | + for suffix in 'xls', 'xlsx', 'xlsm', 'xlsb': | |
| 21 | + example_file = pjoin(DATA_BASE_DIR, 'encrypted', | |
| 22 | + 'dde-test-encrypt-standardpassword.' + suffix) | |
| 23 | + output, _ = call_and_capture('msodde', [example_file, ]) | |
| 24 | + self.assertIn('\nDDE Links:\ncmd /c calc.exe\n', output, | |
| 25 | + msg='Unexpected output {!r} for {}' | |
| 26 | + .format(output, suffix)) | |
| 27 | + | |
| 28 | + # TODO: add more, in particular a sample with a "proper" password | |
| 29 | + | |
| 30 | + | |
| 31 | +if __name__ == '__main__': | |
| 32 | + unittest.main() | ... | ... |
tests/oleid/test_basic.py
| ... | ... | @@ -20,7 +20,7 @@ class TestOleIDBasic(unittest.TestCase): |
| 20 | 20 | """Run all files in test-data through oleid and compare to known output""" |
| 21 | 21 | # this relies on order of indicators being constant, could relax that |
| 22 | 22 | # Also requires that files have the correct suffixes (no rtf in doc) |
| 23 | - NON_OLE_SUFFIXES = ('.xml', '.csv', '.rtf', '') | |
| 23 | + NON_OLE_SUFFIXES = ('.xml', '.csv', '.rtf', '', '.odt', '.ods', '.odp') | |
| 24 | 24 | NON_OLE_VALUES = (False, ) |
| 25 | 25 | WORD = b'Microsoft Office Word' |
| 26 | 26 | PPT = b'Microsoft Office PowerPoint' |
| ... | ... | @@ -121,6 +121,33 @@ class TestOleIDBasic(unittest.TestCase): |
| 121 | 121 | 'msodde/harmless-clean.docx': (False,), |
| 122 | 122 | 'oleform/oleform-PR314.docm': (False,), |
| 123 | 123 | 'basic/encrypted.docx': CRYPT, |
| 124 | + 'oleobj/external_link/sample_with_external_link_to_doc.docx': (False,), | |
| 125 | + 'oleobj/external_link/sample_with_external_link_to_doc.xlsb': (False,), | |
| 126 | + 'oleobj/external_link/sample_with_external_link_to_doc.dotm': (False,), | |
| 127 | + 'oleobj/external_link/sample_with_external_link_to_doc.xlsm': (False,), | |
| 128 | + 'oleobj/external_link/sample_with_external_link_to_doc.pptx': (False,), | |
| 129 | + 'oleobj/external_link/sample_with_external_link_to_doc.dotx': (False,), | |
| 130 | + 'oleobj/external_link/sample_with_external_link_to_doc.docm': (False,), | |
| 131 | + 'oleobj/external_link/sample_with_external_link_to_doc.potm': (False,), | |
| 132 | + 'oleobj/external_link/sample_with_external_link_to_doc.xlsx': (False,), | |
| 133 | + 'oleobj/external_link/sample_with_external_link_to_doc.potx': (False,), | |
| 134 | + 'oleobj/external_link/sample_with_external_link_to_doc.ppsm': (False,), | |
| 135 | + 'oleobj/external_link/sample_with_external_link_to_doc.pptm': (False,), | |
| 136 | + 'oleobj/external_link/sample_with_external_link_to_doc.ppsx': (False,), | |
| 137 | + 'encrypted/autostart-encrypt-standardpassword.xlsm': | |
| 138 | + (True, False, 'unknown', True, False, False, False, False, False, False, 0), | |
| 139 | + 'encrypted/autostart-encrypt-standardpassword.xls': | |
| 140 | + (True, True, EXCEL, True, False, True, True, False, False, False, 0), | |
| 141 | + 'encrypted/dde-test-encrypt-standardpassword.xlsx': | |
| 142 | + (True, False, 'unknown', True, False, False, False, False, False, False, 0), | |
| 143 | + 'encrypted/dde-test-encrypt-standardpassword.xlsm': | |
| 144 | + (True, False, 'unknown', True, False, False, False, False, False, False, 0), | |
| 145 | + 'encrypted/autostart-encrypt-standardpassword.xlsb': | |
| 146 | + (True, False, 'unknown', True, False, False, False, False, False, False, 0), | |
| 147 | + 'encrypted/dde-test-encrypt-standardpassword.xls': | |
| 148 | + (True, True, EXCEL, True, False, False, True, False, False, False, 0), | |
| 149 | + 'encrypted/dde-test-encrypt-standardpassword.xlsb': | |
| 150 | + (True, False, 'unknown', True, False, False, False, False, False, False, 0), | |
| 124 | 151 | } |
| 125 | 152 | |
| 126 | 153 | indicator_names = [] |
| ... | ... | @@ -148,7 +175,8 @@ class TestOleIDBasic(unittest.TestCase): |
| 148 | 175 | OLE_VALUES[name])) |
| 149 | 176 | except KeyError: |
| 150 | 177 | print('Should add oleid output for {} to {} ({})' |
| 151 | - .format(name, __name__, values[3:])) | |
| 178 | + .format(name, __name__, values)) | |
| 179 | + | |
| 152 | 180 | |
| 153 | 181 | # just in case somebody calls this file as a script |
| 154 | 182 | if __name__ == '__main__': | ... | ... |
tests/oleobj/test_basic.py
| ... | ... | @@ -8,7 +8,7 @@ from hashlib import md5 |
| 8 | 8 | from glob import glob |
| 9 | 9 | |
| 10 | 10 | # Directory with test data, independent of current working directory |
| 11 | -from tests.test_utils import DATA_BASE_DIR | |
| 11 | +from tests.test_utils import DATA_BASE_DIR, call_and_capture | |
| 12 | 12 | from oletools import oleobj |
| 13 | 13 | |
| 14 | 14 | |
| ... | ... | @@ -41,8 +41,10 @@ SAMPLES += tuple( |
| 41 | 41 | 'ab8c65e4c0fc51739aa66ca5888265b4') |
| 42 | 42 | for extn in ('xls', 'xlsx', 'xlsb', 'xlsm', 'xla', 'xlam', 'xlt', 'xltm', |
| 43 | 43 | 'xltx', 'ppt', 'pptx', 'pptm', 'pps', 'ppsx', 'ppsm', 'pot', |
| 44 | - 'potx', 'potm') | |
| 44 | + 'potx', 'potm', 'ods', 'odp') | |
| 45 | 45 | ) |
| 46 | +SAMPLES += (('embedded-simple-2007.odt', 'simple-text-file.txt', | |
| 47 | + 'bd5c063a5a43f67b3c50dc7b0f1195af'), ) | |
| 46 | 48 | |
| 47 | 49 | |
| 48 | 50 | def calc_md5(filename): |
| ... | ... | @@ -79,10 +81,6 @@ class TestOleObj(unittest.TestCase): |
| 79 | 81 | """ fixture start: create temp dir """ |
| 80 | 82 | self.temp_dir = mkdtemp(prefix='oletools-oleobj-') |
| 81 | 83 | self.did_fail = False |
| 82 | - if DEBUG: | |
| 83 | - import logging | |
| 84 | - logging.basicConfig(level=logging.DEBUG if DEBUG else logging.INFO) | |
| 85 | - oleobj.log.setLevel(logging.NOTSET) | |
| 86 | 84 | |
| 87 | 85 | def tearDown(self): |
| 88 | 86 | """ fixture end: remove temp dir """ |
| ... | ... | @@ -99,7 +97,8 @@ class TestOleObj(unittest.TestCase): |
| 99 | 97 | """ |
| 100 | 98 | test that oleobj can be called with -i and -v |
| 101 | 99 | |
| 102 | - this is the way that amavisd calls oleobj, thinking it is ripOLE | |
| 100 | + This is how ripOLE used to be often called (e.g. by amavisd-new); | |
| 101 | + ensure oleobj is a compatible replacement. | |
| 103 | 102 | """ |
| 104 | 103 | self.do_test_md5(['-d', self.temp_dir, '-v', '-i']) |
| 105 | 104 | |
| ... | ... | @@ -110,35 +109,52 @@ class TestOleObj(unittest.TestCase): |
| 110 | 109 | 'embedded-simple-2007.xml', |
| 111 | 110 | 'embedded-simple-2007-as2003.xml'): |
| 112 | 111 | full_name = join(DATA_BASE_DIR, 'oleobj', sample_name) |
| 113 | - ret_val = oleobj.main(args + [full_name, ]) | |
| 112 | + output, ret_val = call_and_capture('oleobj', args + [full_name, ], | |
| 113 | + accept_nonzero_exit=True) | |
| 114 | 114 | if glob(self.temp_dir + 'ole-object-*'): |
| 115 | - self.fail('found embedded data in {0}'.format(sample_name)) | |
| 116 | - self.assertEqual(ret_val, oleobj.RETURN_NO_DUMP) | |
| 115 | + self.fail('found embedded data in {0}. Output:\n{1}' | |
| 116 | + .format(sample_name, output)) | |
| 117 | + self.assertEqual(ret_val, oleobj.RETURN_NO_DUMP, | |
| 118 | + msg='Wrong return value {} for {}. Output:\n{}' | |
| 119 | + .format(ret_val, sample_name, output)) | |
| 117 | 120 | |
| 118 | - def do_test_md5(self, args, test_fun=oleobj.main): | |
| 121 | + def do_test_md5(self, args, test_fun=None, only_run_every=1): | |
| 119 | 122 | """ helper for test_md5 and test_md5_args """ |
| 120 | - # name of sample, extension of embedded file, md5 hash of embedded file | |
| 121 | 123 | data_dir = join(DATA_BASE_DIR, 'oleobj') |
| 122 | - for sample_name, embedded_name, expect_hash in SAMPLES: | |
| 123 | - ret_val = test_fun(args + [join(data_dir, sample_name), ]) | |
| 124 | - self.assertEqual(ret_val, oleobj.RETURN_DID_DUMP) | |
| 124 | + | |
| 125 | + # name of sample, extension of embedded file, md5 hash of embedded file | |
| 126 | + for sample_index, (sample_name, embedded_name, expect_hash) \ | |
| 127 | + in enumerate(SAMPLES): | |
| 128 | + if sample_index % only_run_every != 0: | |
| 129 | + continue | |
| 130 | + args_with_path = args + [join(data_dir, sample_name), ] | |
| 131 | + if test_fun is None: | |
| 132 | + output, ret_val = call_and_capture('oleobj', args_with_path, | |
| 133 | + accept_nonzero_exit=True) | |
| 134 | + else: | |
| 135 | + ret_val = test_fun(args_with_path) | |
| 136 | + output = '[output: see above]' | |
| 137 | + self.assertEqual(ret_val, oleobj.RETURN_DID_DUMP, | |
| 138 | + msg='Wrong return value {} for {}. Output:\n{}' | |
| 139 | + .format(ret_val, sample_name, output)) | |
| 125 | 140 | expect_name = join(self.temp_dir, |
| 126 | 141 | sample_name + '_' + embedded_name) |
| 127 | 142 | if not isfile(expect_name): |
| 128 | 143 | self.did_fail = True |
| 129 | - self.fail('{0} not created from {1}'.format(expect_name, | |
| 130 | - sample_name)) | |
| 144 | + self.fail('{0} not created from {1}. Output:\n{2}' | |
| 145 | + .format(expect_name, sample_name, output)) | |
| 131 | 146 | continue |
| 132 | 147 | md5_hash = calc_md5(expect_name) |
| 133 | 148 | if md5_hash != expect_hash: |
| 134 | 149 | self.did_fail = True |
| 135 | - self.fail('Wrong md5 {0} of {1} from {2}' | |
| 136 | - .format(md5_hash, expect_name, sample_name)) | |
| 150 | + self.fail('Wrong md5 {0} of {1} from {2}. Output:\n{3}' | |
| 151 | + .format(md5_hash, expect_name, sample_name, output)) | |
| 137 | 152 | continue |
| 138 | 153 | |
| 139 | 154 | def test_non_streamed(self): |
| 140 | 155 | """ Ensure old oleobj behaviour still works: pre-read whole file """ |
| 141 | - return self.do_test_md5(['-d', self.temp_dir], test_fun=preread_file) | |
| 156 | + return self.do_test_md5(['-d', self.temp_dir], test_fun=preread_file, | |
| 157 | + only_run_every=4) | |
| 142 | 158 | |
| 143 | 159 | |
| 144 | 160 | # just in case somebody calls this file as a script | ... | ... |
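The new `only_run_every` parameter of `do_test_md5` thins out the sample list so that the slower pre-read variant runs on every fourth sample only. A small sketch of that selection rule, using a hypothetical helper `select_samples` that is not part of the test suite:

```python
def select_samples(samples, only_run_every=1):
    """Keep the samples whose index is a multiple of only_run_every."""
    return [sample for index, sample in enumerate(samples)
            if index % only_run_every == 0]


if __name__ == '__main__':
    names = ['a.doc', 'b.docx', 'c.xls', 'd.xlsx', 'e.ppt', 'f.pptx']
    print(select_samples(names))                    # all six samples
    print(select_samples(names, only_run_every=4))  # indices 0 and 4
```

With the default of 1 every sample is kept, so existing callers are unaffected.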
tests/oleobj/test_external_links.py
| ... | ... | @@ -6,7 +6,7 @@ import os |
| 6 | 6 | from os import path |
| 7 | 7 | |
| 8 | 8 | # Directory with test data, independent of current working directory |
| 9 | -from tests.test_utils import DATA_BASE_DIR | |
| 9 | +from tests.test_utils import DATA_BASE_DIR, call_and_capture | |
| 10 | 10 | from oletools import oleobj |
| 11 | 11 | |
| 12 | 12 | BASE_DIR = path.join(DATA_BASE_DIR, 'oleobj', 'external_link') |
| ... | ... | @@ -22,8 +22,11 @@ class TestExternalLinks(unittest.TestCase): |
| 22 | 22 | for filename in filenames: |
| 23 | 23 | file_path = path.join(dirpath, filename) |
| 24 | 24 | |
| 25 | - ret_val = oleobj.main([file_path]) | |
| 26 | - self.assertEqual(ret_val, oleobj.RETURN_DID_DUMP) | |
| 25 | + output, ret_val = call_and_capture('oleobj', [file_path, ], | |
| 26 | + accept_nonzero_exit=True) | |
| 27 | + self.assertEqual(ret_val, oleobj.RETURN_DID_DUMP, | |
| 28 | + msg='Wrong return value {} for {}. Output:\n{}' | |
| 29 | + .format(ret_val, filename, output)) | |
| 27 | 30 | |
| 28 | 31 | |
| 29 | 32 | # just in case somebody calls this file as a script | ... | ... |
tests/olevba/test_basic.py
| ... | ... | @@ -3,21 +3,71 @@ Test basic functionality of olevba[3] |
| 3 | 3 | """ |
| 4 | 4 | |
| 5 | 5 | import unittest |
| 6 | -import sys | |
| 7 | -if sys.version_info.major <= 2: | |
| 8 | - from oletools import olevba | |
| 9 | -else: | |
| 10 | - from oletools import olevba3 as olevba | |
| 11 | 6 | import os |
| 12 | 7 | from os.path import join |
| 8 | +import re | |
| 13 | 9 | |
| 14 | 10 | # Directory with test data, independent of current working directory |
| 15 | -from tests.test_utils import DATA_BASE_DIR | |
| 11 | +from tests.test_utils import DATA_BASE_DIR, call_and_capture | |
| 16 | 12 | |
| 17 | 13 | |
| 18 | 14 | class TestOlevbaBasic(unittest.TestCase): |
| 19 | 15 | """Tests olevba basic functionality""" |
| 20 | 16 | |
| 17 | + def test_text_behaviour(self): | |
| 18 | + """Test behaviour of olevba when presented with pure text file.""" | |
| 19 | + self.do_test_behaviour('text') | |
| 20 | + | |
| 21 | + def test_empty_behaviour(self): | |
| 22 | + """Test behaviour of olevba when presented with an empty file.""" | |
| 23 | + self.do_test_behaviour('empty') | |
| 24 | + | |
| 25 | + def do_test_behaviour(self, filename): | |
| 26 | + """Helper for test_{text,empty}_behaviour.""" | |
| 27 | + input_file = join(DATA_BASE_DIR, 'basic', filename) | |
| 28 | + output, _ = call_and_capture('olevba', args=(input_file, )) | |
| 29 | + | |
| 30 | + # check output | |
| 31 | + self.assertTrue(re.search(r'^Type:\s+Text\s*$', output, re.MULTILINE), | |
| 32 | + msg='"Type: Text" not found in output:\n' + output) | |
| 33 | + self.assertTrue(re.search(r'^No suspicious .+ found.$', output, | |
| 34 | + re.MULTILINE), | |
| 35 | + msg='"No suspicious...found" not found in output:\n' + \ | |
| 36 | + output) | |
| 37 | + self.assertNotIn('error', output.lower()) | |
| 38 | + | |
| 39 | + # check warnings | |
| 40 | + for line in output.splitlines(): | |
| 41 | + if line.startswith('WARNING ') and 'encrypted' in line: | |
| 42 | + continue # encryption warnings are ok | |
| 43 | + elif 'warn' in line.lower(): | |
| 44 | + self.fail('Found "warn" in output line: "{}"' | |
| 45 | + .format(line.rstrip())) | |
| 46 | + self.assertIn('not encrypted', output) | |
| 47 | + | |
| 48 | + def test_rtf_behaviour(self): | |
| 49 | + """Test behaviour of olevba when presented with an rtf file.""" | |
| 50 | + input_file = join(DATA_BASE_DIR, 'msodde', 'RTF-Spec-1.7.rtf') | |
| 51 | + output, ret_code = call_and_capture('olevba', args=(input_file, ), | |
| 52 | + accept_nonzero_exit=True) | |
| 53 | + | |
| 54 | + # check that return code is olevba.RETURN_OPEN_ERROR | |
| 55 | + self.assertEqual(ret_code, 5) | |
| 56 | + | |
| 57 | + # check output: | |
| 58 | + self.assertIn('FileOpenError', output) | |
| 59 | + self.assertIn('is RTF', output) | |
| 60 | + self.assertIn('rtfobj.py', output) | |
| 61 | + self.assertIn('not encrypted', output) | |
| 62 | + | |
| 63 | + # check warnings | |
| 64 | + for line in output.splitlines(): | |
| 65 | + if line.startswith('WARNING ') and 'encrypted' in line: | |
| 66 | + continue # encryption warnings are ok | |
| 67 | + elif 'warn' in line.lower(): | |
| 68 | + self.fail('Found "warn" in output line: "{}"' | |
| 69 | + .format(line.rstrip())) | |
| 70 | + | |
| 21 | 71 | def test_crypt_return(self): |
| 22 | 72 | """ |
| 23 | 73 | Tests that encrypted files give a certain return code. |
| ... | ... | @@ -28,15 +78,23 @@ class TestOlevbaBasic(unittest.TestCase): |
| 28 | 78 | CRYPT_DIR = join(DATA_BASE_DIR, 'encrypted') |
| 29 | 79 | CRYPT_RETURN_CODE = 9 |
| 30 | 80 | ADD_ARGS = [], ['-d', ], ['-a', ], ['-j', ], ['-t', ] |
| 81 | + EXCEPTIONS = ['autostart-encrypt-standardpassword.xls', # These ... | |
| 82 | + 'autostart-encrypt-standardpassword.xlsm', # files ... | |
| 83 | + 'autostart-encrypt-standardpassword.xlsb', # are ... | |
| 84 | + 'dde-test-encrypt-standardpassword.xls', # automati... | |
| 85 | + 'dde-test-encrypt-standardpassword.xlsx', # ...cally... | |
| 86 | + 'dde-test-encrypt-standardpassword.xlsm', # decrypted. | |
| 87 | + 'dde-test-encrypt-standardpassword.xlsb'] | |
| 31 | 88 | for filename in os.listdir(CRYPT_DIR): |
| 89 | + if filename in EXCEPTIONS: | |
| 90 | + continue | |
| 32 | 91 | full_name = join(CRYPT_DIR, filename) |
| 33 | 92 | for args in ADD_ARGS: |
| 34 | - try: | |
| 35 | - ret_code = olevba.main(args + [full_name, ]) | |
| 36 | - except SystemExit as se: | |
| 37 | - ret_code = se.code or 0 # se.code can be None | |
| 93 | + _, ret_code = call_and_capture('olevba', | |
| 94 | + args=[full_name, ] + args, | |
| 95 | + accept_nonzero_exit=True) | |
| 38 | 96 | self.assertEqual(ret_code, CRYPT_RETURN_CODE, |
| 39 | - msg='Wrong return code {} for args {}' | |
| 97 | + msg='Wrong return code {} for args {}'\ | |
| 40 | 98 | .format(ret_code, args + [filename, ])) |
| 41 | 99 | |
| 42 | 100 | ... | ... |
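Several tests in this commit switch from calling `main()` in-process to running the tool as a child process through `call_and_capture` from `tests.test_utils`, and then inspect both the combined output and the return code. That helper is not part of this diff, so the sketch below is only an assumption about the general technique: `run_and_capture` is a hypothetical name, and it runs an inline Python snippet rather than an oletools module.

```python
import subprocess
import sys


def run_and_capture(code, args=(), accept_nonzero_exit=False):
    """Run a Python snippet in a child process.

    Returns (output, return_code), with stderr merged into stdout.
    Raises CalledProcessError on a non-zero exit status unless
    accept_nonzero_exit is True.
    """
    cmd = [sys.executable, '-c', code] + list(args)
    child = subprocess.Popen(cmd, shell=False,
                             stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT,
                             universal_newlines=True)
    output, _ = child.communicate()
    if child.returncode != 0 and not accept_nonzero_exit:
        raise subprocess.CalledProcessError(child.returncode, cmd, output)
    return output, child.returncode


if __name__ == '__main__':
    snippet = 'import sys; print("file is encrypted"); sys.exit(9)'
    text, code = run_and_capture(snippet, accept_nonzero_exit=True)
    print(code, text.strip())
```

Running the tool in a separate process keeps `sys.exit()` calls and global logging state of the tool from leaking into the test runner, which is presumably why the in-process `try/except SystemExit` blocks could be removed.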
tests/olevba/test_crypto.py
0 → 100644
| 1 | +"""Check decryption of files from olevba works.""" | |
| 2 | + | |
| 3 | +import sys | |
| 4 | +import unittest | |
| 5 | +from os.path import basename, join as pjoin | |
| 6 | +import json | |
| 7 | +from collections import OrderedDict | |
| 8 | + | |
| 9 | +from tests.test_utils import DATA_BASE_DIR, call_and_capture | |
| 10 | + | |
| 11 | +from oletools import crypto | |
| 12 | + | |
| 13 | + | |
| 14 | +@unittest.skipIf(not crypto.check_msoffcrypto(), | |
| 15 | + 'Module msoffcrypto not installed for {}' | |
| 16 | + .format(basename(sys.executable))) | |
| 17 | +class OlevbaCryptoWriteProtectTest(unittest.TestCase): | |
| 18 | + """ | |
| 19 | + Test documents that are 'write-protected' through encryption. | |
| 20 | + | |
| 21 | + Excel has a way to 'write-protect' documents by encrypting them with a | |
| 22 | + hard-coded standard password. When looking at the file-structure you see | |
| 23 | + an OLE-file with streams `EncryptedPackage`, `StrongEncryptionSpace`, and | |
| 24 | + `EncryptionInfo`. Contained in the first is the actual file. When opening | |
| 25 | + such a file in Excel, it is decrypted without the user noticing. | |
| 26 | + | |
| 27 | + Olevba should detect such encryption, try to decrypt with the standard | |
| 28 | + password and look for VBA code in the decrypted file. | |
| 29 | + | |
| 30 | + All these tests are skipped if the module `msoffcrypto-tool` is not | |
| 31 | + installed. | |
| 32 | + """ | |
| 33 | + def test_autostart(self): | |
| 34 | + """Check that autostart macro is found in xls[mb] sample file.""" | |
| 35 | + for suffix in 'xlsm', 'xlsb': | |
| 36 | + example_file = pjoin( | |
| 37 | + DATA_BASE_DIR, 'encrypted', | |
| 38 | + 'autostart-encrypt-standardpassword.' + suffix) | |
| 39 | + output, _ = call_and_capture('olevba', args=('-j', example_file), | |
| 40 | + exclude_stderr=True) | |
| 41 | + data = json.loads(output, object_pairs_hook=OrderedDict) | |
| 42 | + # debug: json.dump(data, sys.stdout, indent=4) | |
| 43 | + self.assertEqual(len(data), 4) | |
| 44 | + self.assertIn('script_name', data[0]) | |
| 45 | + self.assertIn('version', data[0]) | |
| 46 | + self.assertEqual(data[0]['type'], 'MetaInformation') | |
| 47 | + self.assertIn('return_code', data[-1]) | |
| 48 | + self.assertEqual(data[-1]['type'], 'MetaInformation') | |
| 49 | + self.assertEqual(data[1]['container'], None) | |
| 50 | + self.assertEqual(data[1]['file'], example_file) | |
| 51 | + self.assertEqual(data[1]['analysis'], None) | |
| 52 | + self.assertEqual(data[1]['macros'], []) | |
| 53 | + self.assertEqual(data[1]['type'], 'OLE') | |
| 54 | + self.assertEqual(data[2]['container'], example_file) | |
| 55 | + self.assertNotEqual(data[2]['file'], example_file) | |
| 56 | + self.assertEqual(data[2]['type'], "OpenXML") | |
| 57 | + analysis = data[2]['analysis'] | |
| 58 | + self.assertEqual(analysis[0]['type'], 'AutoExec') | |
| 59 | + self.assertEqual(analysis[0]['keyword'], 'Auto_Open') | |
| 60 | + macros = data[2]['macros'] | |
| 61 | + self.assertEqual(macros[0]['vba_filename'], 'Modul1.bas') | |
| 62 | + self.assertIn('Sub Auto_Open()', macros[0]['code']) | |
| 63 | + | |
| 64 | + | |
| 65 | +if __name__ == '__main__': | |
| 66 | + unittest.main() | ... | ... |
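
The assertions in `test_autostart` rely on the JSON output being a list whose first and last entries are `MetaInformation` records, with the per-file results in between. A minimal sketch of that framing check follows. The sample string is made up to match the shape the test asserts on and is not real olevba output; the helper name `check_meta_frame` is hypothetical.

```python
import json
from collections import OrderedDict

# Hypothetical sample shaped like the structure the test asserts on;
# this is NOT real olevba output.
SAMPLE = '''
[
  {"type": "MetaInformation", "script_name": "olevba", "version": "x.y"},
  {"type": "OLE", "container": null, "file": "sample.xlsm", "macros": []},
  {"type": "MetaInformation", "return_code": 0}
]
'''


def check_meta_frame(text):
    """Parse JSON text and return the entries between the two meta records."""
    # OrderedDict keeps the key order of the output, as in the test above
    data = json.loads(text, object_pairs_hook=OrderedDict)
    if data[0]['type'] != 'MetaInformation' or 'script_name' not in data[0]:
        raise ValueError('first entry is not the expected meta record')
    if data[-1]['type'] != 'MetaInformation' or 'return_code' not in data[-1]:
        raise ValueError('last entry is not the expected meta record')
    return data[1:-1]


entries = check_meta_frame(SAMPLE)
print([entry['type'] for entry in entries])
```

Separating the framing check from the per-file assertions would make a failure message point at the broken part of the output, at the cost of one more helper.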
tests/ooxml/test_basic.py
| ... | ... | @@ -33,6 +33,8 @@ class TestOOXML(unittest.TestCase): |
| 33 | 33 | pptx=ooxml.DOCTYPE_POWERPOINT, pptm=ooxml.DOCTYPE_POWERPOINT, |
| 34 | 34 | ppsx=ooxml.DOCTYPE_POWERPOINT, ppsm=ooxml.DOCTYPE_POWERPOINT, |
| 35 | 35 | potx=ooxml.DOCTYPE_POWERPOINT, potm=ooxml.DOCTYPE_POWERPOINT, |
| 36 | + ods=ooxml.DOCTYPE_NONE, odt=ooxml.DOCTYPE_NONE, | |
| 37 | + odp=ooxml.DOCTYPE_NONE, | |
| 36 | 38 | ) |
| 37 | 39 | |
| 38 | 40 | # files that are neither OLE nor xml: | ... | ... |
tests/ooxml/test_zip_sub_file.py
| ... | ... | @@ -144,15 +144,15 @@ class TestZipSubFile(unittest.TestCase): |
| 144 | 144 | self.subfile.seek(0, os.SEEK_END) |
| 145 | 145 | self.compare.seek(0, os.SEEK_END) |
| 146 | 146 | |
| 147 | - self.assertEquals(self.compare.read(10), self.subfile.read(10)) | |
| 148 | - self.assertEquals(self.compare.tell(), self.subfile.tell()) | |
| 147 | + self.assertEqual(self.compare.read(10), self.subfile.read(10)) | |
| 148 | + self.assertEqual(self.compare.tell(), self.subfile.tell()) | |
| 149 | 149 | |
| 150 | 150 | self.subfile.seek(0) |
| 151 | 151 | self.compare.seek(0) |
| 152 | 152 | self.subfile.seek(len(FILE_CONTENTS) - 1) |
| 153 | 153 | self.compare.seek(len(FILE_CONTENTS) - 1) |
| 154 | - self.assertEquals(self.compare.read(10), self.subfile.read(10)) | |
| 155 | - self.assertEquals(self.compare.tell(), self.subfile.tell()) | |
| 154 | + self.assertEqual(self.compare.read(10), self.subfile.read(10)) | |
| 155 | + self.assertEqual(self.compare.tell(), self.subfile.tell()) | |
| 156 | 156 | |
| 157 | 157 | def test_error_seek(self): |
| 158 | 158 | """ test correct behaviour if seek beyond end (no exception) """ | ... | ... |
tests/ppt_parser/test_basic.py
| ... | ... | @@ -16,7 +16,7 @@ class TestBasic(unittest.TestCase): |
| 16 | 16 | |
| 17 | 17 | def test_is_ppt(self): |
| 18 | 18 | """ test ppt_record_parser.is_ppt(filename) """ |
| 19 | - exceptions = [] | |
| 19 | + exceptions = ['encrypted.ppt', ] # actually is ppt but embedded | |
| 20 | 20 | for base_dir, _, files in os.walk(DATA_BASE_DIR): |
| 21 | 21 | for filename in files: |
| 22 | 22 | if filename in exceptions: | ... | ... |
tests/test-data/encrypted/autostart-encrypt-standardpassword.xls
0 → 100644
No preview for this file type
tests/test-data/encrypted/autostart-encrypt-standardpassword.xlsb
0 → 100644
No preview for this file type
tests/test-data/encrypted/autostart-encrypt-standardpassword.xlsm
0 → 100644
No preview for this file type
tests/test-data/encrypted/dde-test-encrypt-standardpassword.xls
0 → 100644
No preview for this file type
tests/test-data/encrypted/dde-test-encrypt-standardpassword.xlsb
0 → 100644
No preview for this file type
tests/test-data/encrypted/dde-test-encrypt-standardpassword.xlsm
0 → 100644
No preview for this file type
tests/test-data/encrypted/dde-test-encrypt-standardpassword.xlsx
0 → 100644
No preview for this file type
tests/test-data/oleobj/embedded-simple-2007.odp
0 → 100644
No preview for this file type
tests/test-data/oleobj/embedded-simple-2007.ods
0 → 100644
No preview for this file type
tests/test-data/oleobj/embedded-simple-2007.odt
0 → 100644
No preview for this file type
tests/test_utils/__init__.py
tests/test_utils/utils.py
0 → 100644
| 1 | +#!/usr/bin/env python3 | |
| 2 | + | |
| 3 | +"""Utils generally useful for unittests.""" | |
| 4 | + | |
| 5 | +import sys | |
| 6 | +import os | |
| 7 | +from os.path import dirname, join, abspath | |
| 8 | +from subprocess import check_output, PIPE, STDOUT, CalledProcessError | |
| 9 | + | |
| 10 | + | |
| 11 | +# Base dir of project, contains subdirs "tests" and "oletools" and README.md | |
| 12 | +PROJECT_ROOT = dirname(dirname(dirname(abspath(__file__)))) | |
| 13 | + | |
| 14 | +# Directory with test data, independent of current working directory | |
| 15 | +DATA_BASE_DIR = join(PROJECT_ROOT, 'tests', 'test-data') | |
| 16 | + | |
| 17 | +# Directory with source code | |
| 18 | +SOURCE_BASE_DIR = join(PROJECT_ROOT, 'oletools') | |
| 19 | + | |
| 20 | + | |
| 21 | +def call_and_capture(module, args=None, accept_nonzero_exit=False, | |
| 22 | + exclude_stderr=False): | |
| 23 | + """ | |
| 24 | + Run module as script, capturing and returning output and return code. | |
| 25 | + | |
| 26 | + This is the best way to capture a module's stdout and stderr; trying to | |
| 27 | + modify sys.stdout/sys.stderr to StringIO-Buffers frequently causes trouble. | |
| 28 | + | |
| 29 | + Only drawback so far: stdout and stderr are merged into one (which is | |
| 30 | + what users see on their shell as well). When testing for json-compatible | |
| 31 | + output you should set `exclude_stderr` to `True` since logging ignores | |
| 32 | + stderr, so unforeseen warnings (e.g. issued by pypy) would mess up your json. | |
| 33 | + | |
| 34 | + :param str module: name of module to test, e.g. `olevba` | |
| 35 | + :param args: arguments for module's main function | |
| 36 | + :param bool accept_nonzero_exit: Return the non-0 return code instead of raising an error | |
| 37 | + :param bool exclude_stderr: Exclude output to `sys.stderr` from output | |
| 38 | + (e.g. if parsing output through json) | |
| 39 | + :returns: output, ret_code | |
| 40 | + :rtype: str, int | |
| 41 | + """ | |
| 42 | + # create a PYTHONPATH environment var to prefer our current code | |
| 43 | + env = os.environ.copy() | |
| 44 | + try: | |
| 45 | + env['PYTHONPATH'] = SOURCE_BASE_DIR + os.pathsep + \ | |
| 46 | + os.environ['PYTHONPATH'] | |
| 47 | + except KeyError: | |
| 48 | + env['PYTHONPATH'] = SOURCE_BASE_DIR | |
| 49 | + | |
| 50 | + # hack: in python2 output encoding (sys.stdout.encoding) was None | |
| 51 | + # although sys.getdefaultencoding() and sys.getfilesystemencoding were ok | |
| 52 | + # TODO: maybe can remove this once branch | |
| 53 | + # "encoding-for-non-unicode-environments" is merged | |
| 54 | + if 'PYTHONIOENCODING' not in env: | |
| 55 | + env['PYTHONIOENCODING'] = 'utf8' | |
| 56 | + | |
| 57 | + # ensure args is a tuple | |
| 58 | + my_args = tuple(args) if args else () | |
| 59 | + | |
| 60 | + ret_code = -1 | |
| 61 | + try: | |
| 62 | + output = check_output((sys.executable, '-m', module) + my_args, | |
| 63 | + universal_newlines=True, env=env, | |
| 64 | + stderr=PIPE if exclude_stderr else STDOUT) | |
| 65 | + ret_code = 0 | |
| 66 | + | |
| 67 | + except CalledProcessError as err: | |
| 68 | + if accept_nonzero_exit: | |
| 69 | + ret_code = err.returncode | |
| 70 | + output = err.output | |
| 71 | + else: | |
| 72 | + print(err.output) | |
| 73 | + raise | |
| 74 | + | |
| 75 | + return output, ret_code | ... | ... |
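
The pattern used by `call_and_capture` (run a module with the current interpreter via `-m`, capture its output, and optionally tolerate a non-zero exit) can be tried in isolation. The sketch below is a simplified stand-in, not the helper from this commit: the name `run_module` is hypothetical, it assumes Python 3, it omits the `PYTHONPATH` handling, and it discards stderr with `DEVNULL` where the original passes `PIPE`.

```python
import os
import sys
from subprocess import check_output, CalledProcessError, STDOUT, DEVNULL


def run_module(module, args=(), accept_nonzero_exit=False,
               exclude_stderr=False):
    """Run `python -m module args...` and return (output, ret_code)."""
    env = os.environ.copy()
    # Same idea as in the helper above: force a usable output encoding
    env.setdefault('PYTHONIOENCODING', 'utf8')

    cmd = (sys.executable, '-m', module) + tuple(args)
    try:
        output = check_output(cmd, universal_newlines=True, env=env,
                              stderr=DEVNULL if exclude_stderr else STDOUT)
        return output, 0
    except CalledProcessError as err:
        if accept_nonzero_exit:
            # check_output attaches the captured output to the exception
            return err.output, err.returncode
        raise


# A stdlib module that prints and exits with code 0
ok_output, ok_code = run_module('calendar', ['2020', '1'])

# A module name that does not exist, so the interpreter exits non-zero
bad_output, bad_code = run_module('no_such_module_for_demo',
                                  accept_nonzero_exit=True)
print(ok_code, bad_code)
```

Returning the pair in the order `(output, ret_code)` matches the `return` statement of the helper above, which is the order the tests in this commit unpack.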