Sonntag, 4. November 2012

Towards an Anti-Anti-Reversing Framework

Abstract -- TL;DR

  • too many diverse anti-anti-reversing tools (="pro-reversing")
  • idea: create open-source library of pro-reversing techniques
  • compile as DLL and inject it into process to be reversed


I really like reversing and really dislike anti-reversing techniques -- unless my objective is to analyze them. So I've decided to think a bit about anti-anti reversing reversing techniques or, as I like to call them: pro-reversing techniques.

Again, I was motivated by my day job. A specimen of malware just did not want to run in a debugged environment. The culprit was a debugger detection code, which roughly looks like this:

call ds:HeapCreate
mov [ebp+hHeap], eax


mov edx, [ebp+hHeap]
mov [ebp+hHeap2], edx
mov eax, [ebp+hHeap2]
mov cl, [eax+8]
mov [ebp+heap_signature], cl
mov edx, [ebp+hHeap2]
mov al, [edx+0Ch]
mov [ebp+heap_flags], al


movzx ecx, [ebp+heap_signature]
cmp ecx, 0FFh
jnz short no_debugger
movzx edx, [ebp+heap_flags]
test edx, edx
jz short no_debugger
jmp debugger_detected


; do initialization stuff



As you can see, a heap is created and after a bit of shuffling some fields are checked. A very in-depth information about the various heap flags and fields can be found here. In the current malware specimen the fields Signature (offset 0x8) and Flags (offset 0xC) are being checked.Usually, the fields Flags and ForceFlags (offsets 0xC and 0x10, respectively) are checked, as both can be used as debugger check. Probably, again a case of malware authors not properly checking their code ;-)

Too Many Tools, Too Many Plugins

Another thing that annoys me is needing 1,000 plugins for 1,000 tools. If you search for pro-reversing plugins for your favorite debugger, you will be presented lots of choices. Take, for example, OllyDbg. The number of options is overwhelming, you get

  • Anti Anti Hardware Breakpoint
  • Olly Advanced (which is my favorite choice BTW)
  • HideDebugger
  • IsDebuggerPresent
  • aadp4olly 
  • Anti-Anti
  • Robin
  • ...

The list is by far not exhaustive and each plugin may have its own raison d'être. However, the inner workings of most plugins is almost certainly the same, as there's only a couple of frequently used anti-reversing techniques. Hence, the wheel gets reinvented over and over again.

A Solution

So, why this article? The idea is simple: create an open library of pro-reversing techniques (naturally test-driven developed) and put the whole bunch of techniques into a DLL. This DLL can be injected into the process to be analyzed and we can continue using our favorite analysis tools. Hence, you won't need to sift the Internet for other pro-reversing techniques anymore ;-)

 But Why "Towards?"

The title of this article would not contain a "towards" if there was no catch, however. The current version of the framework contains code that is test-driven developed and has been tested under the following OS:
  • Windows XP 32 bit
  • Windows Vista 32 bit
  • Windows Vista 64 bit
  • Windows 7 32 bit
  • Windows 7 64 bit
I left Windows XP 64 bit out on purpose as it is rarely used. If anyone requests it, I might give it a try, though.

Furthermore, the framework contains three pro-reversing techniques working under all of the above environments:
  • PEB BeingDebugged
  • NtGlobalFlags
  • Heap Flags and ForceFlag

We all know (or should know) Peter Ferrie's extensive list of anti-debugging tricks, so implementation is straightforward. He explains everything needed for implementing a pro-reversing technique and I shouldn't be at all surprised if he already implemented each technique...
As my time is restricted, I want to start a small crowd-sourcing project: If you stumble upon code that has anti-reversing properties, you are invited to adding it to the framework yourself or send the sample to me and I will implement the pro-reversing technique. Hence, the framework will grow with each use.

Unit Testing of Pro-Reversing Techniques

As setting up a framework for testing debugger detection is far from trivial, I will present the basic concept in this section. BTW: that's the reason why this article took so long ;-)

First off the easy part: testing the pro-reversing techniques is very easy, as we are presented the testing code in the malware specimens we analyze. Namely, they are the anti-reversing techniques themselves. For example, checking if the BeingDebugged flag is set boils down to a call to the Windows API function IsDebuggerPresent().

Now, we have to implement a basic debugger, as we want to check if a) the anti-reversing technique and b) the pro-reversing technique are correctly working. Luckily, the Windows API makes it very easy to write your own debugger. 

Now that we have a debugger, we also need a "debuggee", a standalone process that executes the anti-reversing and pro-reversing techniques and checks if they succeeded. For that purpose, I created a TCP server that executes a debugger check upon request. During the unit tests, this server is then started under the following environments:
  • without debugger
  • with debugger, but no pro-reversing techniques are enabled
  • with debugger and pro-reversing techniques enabled
  • with debugger and pro-reversing techniques enabled, then disabled
The last test ensures the DLL can be unloaded without any side effects. 

In my repository you can already find the necessary tool to inject a DLL into a new or running process. Furthermore, it offers the opportunity to leave the process suspended after injection in order to facilitate reverse engineering.The ProReverse library can be downloaded here.


Using the ProReverse library is very easy. Taking the sample mentioned above, there are two opportunities to inject the DLL and debug the process:
  1. Start the process and inject the DLL at the beginning, then attach the debugger. Invoke

    DllInjector.exe --run malware.exe --dll c:\absolute\path\to\ProReverse.dll -s

    and then attach your favorite debugger and continue the process.

  2. Start the process in a debugger and inject the DLL afterwards. Start your debugger, load malware.exe and pause it anywhere you want. Then invoke

    DllInjector.exe --pid 1234 --dll c:\absolute\path\to\ProReverse.dll

    1234 is the PID of malware.exe. In OllyDbg, I noticed that the injection thread would not directly execute.
    In order to overcome this problem, I simply changed the instructions at EIP of the malware to an endless loop (in memory view, go to EIP and enter EB FE), then continued the process. Instantly, the DLL was being loaded and I could change the bytes to the original ones, continuing reverse engineering the malware.
    Maybe, I will write a DLL injection plugin just for OllyDbg ;-)
Note: The path of the DLL doesn't have necessarily to be absolute, but under normal circumstances the DLL will not be in the same working directory as the malware to be analyzed.

    Dienstag, 23. Oktober 2012

    Enhancing IDA Pro. Part 1: Highlighting Calls and Crypto-Operations

    This blog entry is to be regarded as a short intermezzo -- I will explain in my next article why it was taking so long ;-)

    IDA Pro provides a nice interface and serves as good platform for reverse engineering. I miss, however, some features. Gladly, IDA provides a well-documented (*cough*) API that makes it easy for anyone enhance it as he wishes.

    In this mini-series I will present some scripts to ease the process of malware analysis with IDA.

    Recognizing Functions

    For example, IDA does not always correctly recognize functions as such.  A short script changes that:

    def findUnidentifiedFunctions():
     next = idaapi.cvar.inf.minEA
     while next != idaapi.BADADDR:
      next = idaapi.find_not_func(next, SEARCH_DOWN)
      flags = idaapi.getFlags(next)
      if idaapi.isCode(flags):

    This IDAPython snippet searches for code that is not inside a function and simply assumes that it should be in a function. Therefore, it creates a function. Simple as that.

    Highlighting Call Instructions

    In larger functions, you can easily miss some calls that lead to important sub-functions. Hence, it's a good idea to mark all calls with a special color. Note that the IDA coloring scheme is somewhat different from what you would expect: it uses 0xBBGGRR as coloring scheme.

    You can achieve this by the following snippet:

    COLOR_CALL = 0xffffd0
    call_instructions = [idaapi.NN_call, idaapi.NN_callfi, 
    if cmd.itype in self.call_instructions:
     idaapi.set_item_color(cmd.ea, COLOR_CALL)

    Highlighting Crypto-related Instructions

    Crypto-related instructions are assembler instructions that are the essence of many cryptographic functions. For example, almost each block cipher makes use of the XOR instruction. As this instruction is also used in order to zero registers, some care has to be put into finding the ones relevant for cryptography. Hence, I added another condition that checks if the operands differ:

    COLOR_CRYPTO = 0xffd2f8
    xor_instructions = [idaapi.NN_xor, idaapi.NN_pxor, 
     idaapi.NN_xorps, idaapi.NN_xorpd]
    if cmd.itype in self.xor_instructions:
     # check if different operands
     if cmd.Op1.type != cmd.Op2.type or
      cmd.Op1.reg != cmd.Op2.reg or
      cmd.Op1.value != cmd.Op2.value:
      idaapi.set_item_color(cmd.ea, COLOR_CRYPTO)

    There are several other vital cryptographic-related instructions such as ROL, ROR and NOT. But these don't need the check for different operands.

    Putting it all together and adding some "syntactic sugar" we get a nice script which you can download from my GitHub page. The output looks for example like this:

    Sample run for the initialAnalysis script

    Montag, 15. Oktober 2012

    Generic *Static* Unpacker For XPACK

    To me, static unpacking of executables always had the nimbus of aloofness, as I have tended to unpack the binaries dynamically, as seen on the last post. I could not resist looking behind the curtain of the most commonly used unpacking stub commonly employed in malware and see what's behind the myth of static unpacking. I already described it roughly in my previous blog entry, so you might want to read that first...

    [insert coffee break here -- for me at least, you go ahead and read the previous article ;-)]

    The XPACK Unpacker Stub

    Welcome back. So now that you know the basic concept of how the unpacking stub works I will go through to the central unpacking steps. From now on, I will call the packer/unpacker XPACK, as at a certain point the string "XPXAXCXK" appears. This string is too complicated to write all the time and PACK seems to be too generic, hence the name. BTW: this guy also calls it XPACK ;-)
    The code examples and addresses will refer to the sample with MD5 hash d5b0a448f59c3e946255333ad0ef5cc5.

    NB: you can't just disassemble the sample and go to the addresses, as the actual unpacking stub is encrypted, as can be seen in my last post.

    Chunk Assembly

    The first stage of the XPACK unpacker stub (function at address 004043E4) sifts byte by byte through the whole binary and searches for data chunks. These chunks have a special format, which has the following format:

    struct Chunk{
        unsigned char magic[8];
        ULONG hash;
        ULONG unknown;
        ULONG dechunkedOffset;
        ULONG size;
        unsigned char chunkData[];

    The data structure is somewhat interesting, as the magic is not a fixed number, but has to meet certain conditions. The conditions are:
    • neither of the first 4 bytes must be 0
    • (byte 0) XOR (byte 1) OR (byte 2) must be equal to (byte 4)
    • (byte 1) XOR (byte 2) OR (byte 3) must be equal to (byte 5)
    • (byte 2) XOR (byte 3) OR (byte 0) must be equal to (byte 6)
    • (byte 0) XOR (byte 3) OR (byte 1) must be equal to (byte 7)
    If these preconditions are met, a 32-bit hash value over these 8 bytes is calculated. The hash function -- as the other code -- is obfuscated in a way I already described in my last article. It looks very similar to CRC-32, as the following code shows (it starts at address 00403BED):

    ULONG modifiedCrc(const unsigned char *memArray)
        int hashTable[256];
        unsigned int hash;

        for (size_t i = 0; i < 256; i++)
            hashTable[i] = genHashTableElement(i);

        hash = 0xffffffff;
        for (size_t i = 0; i < 8; i++)
            hash = (hash >> 8) ^ hashTable[(hash ^ memArray[i]) & 0xff];

        return hash ^ 0xffffffff;

    Yes, boring code, you are right ;-) I took the liberty of cleaning up the code a little, that's why. Of special interest is the function genHashTableElement, as it contains code specific for each sample. It starts at 00403B45 and again, after some cleaning and slight modifications it roughly looks like this:

    #define previous(x) (x + 0x789ADA71) ^ 0x75683572
    #define next(x) ((x ^ 0x75683572) - 0x789ADA71)

    ULONG genHashTableElement(const ULONG v1)

        v1 = previous(v1);
        for (size_t i = 0; i < 8; i++)
            if ( next(v1) & 1 )
                v1 = (next(v1) >> 1) ^ 0xEDB88320;
                v1 = next(v1) >> 1;
        return v1;

    previous and next are sample-specific functions reversing each others' operations. You can easily see that  

    next(previous(x)) = x

    for any 32-bit integer. The functions are closely related to the obfuscation algorithm and probably have been introduced to make reversing harder. Nevertheless, we can rewrite the code to:

    ULONG genHashTableElement(const ULONG v1)

        for (size_t i = 0; i < 8; i++)
            if ( v1 & 1 )
                v1 = (v1 >> 1) ^ 0xEDB88320;
                v1 >>= 1;

            if (i == 7)   
                return v1;

            v1 = next(v1);


    This is pretty much the standard CRC-32 table generation algorithm, with the addition of the next call during each round but the very last. I don't know if that was the original intent of the authors or just a glitch in the implementation of the obfuscation engine, but the result is a polymorphic hash function.

    Once both, the preconditions and hash, match the input values, the data chunk of size size is copied to a target buffer at offset dechunkedOffset. Thus, the chunks don't have to be in a particular order in the binary. This procedure is repeated throughout the whole binary, and magically, all chunks fit neatly together in the target buffer.

    Decryption (1)

    The next step in the unpacking process of XPACK is a decryption layer (loop at 0040145F). In this sample, the decryption consists of the following code:

    for (size_t i = 0; i < s.length(); i++)
        s[i] = s[i] - i % 0x5a;

    In other samples, the code is slightly more complicated, but always has the pattern

        s[i] += a - (i % b);

    with a and b being constant byte values.

    Base-64 Decoding

    As a result of the first decryption, a standard base-64 encoded string appears, which is simply decoded. In the presented sample,  this happens in the function at address 00404434.

    Decryption (2)

    After the base-64 decoding, yet another decryption layer is employed. It is divided into two stages:
    1. A simple addition/subtraction round is applied on the data blob, which you can see in the following image:
      XPACK decryption stage 2a

      In high-level language, the code looks like

      void decrypt2a(unsigned char *buffer, DWORD len)
          unsigned int currentBlock = 0;

          if ( !len )

              for (size_t i = 0; i < 16 && i + currentBlock < len; i++)
                  buffer[i + currentBlock] += -2 - ((i + currentBlock) & 3) *

                    ((i + currentBlock) & 0x1F) * ((currentBlock >> 4) & 3);
              for (size_t i = 0; i < 8 && i + currentBlock < len; i++)
                  buffer[i + currentBlock] -= i;

              currentBlock += 64;

          while ( currentBlock < len );

    2. The buffer is XORed with a constant byte and yet another constant byte is added/subtracted (loop at address 00401582):

       for (size_t i = 0; i < s.length(); i++)
          s[i] = (s[i] ^ 0xA1) - 0x6E;

      Again, these constants are specific for this sample and differ from sample to sample. The general way XPACK implements this decryption layer is

          s[i] = (s[i] ^ c) - d;

      with c and d being bytes randomly chosen supposedly during packing of the original executable. 


    As result of the last decryption layer the following data structure emerges:

    struct XpackStruct{
        unsigned char magic[8];
        ULONG decompressedLength;
        ULONG compressedLength;
        unsigned char compressedBuffer[];

    magic is our long-awaited string "XPXAXCXK", compressedLength represents the length of the compressed buffer, while decompressedLength defines the length of the data once it is unpacked. After that, the structure holds the compressed data.

    Hence, the last step in the unpacking process is decompressing the buffer. In the presented sample, the function can be found at address 00401903. The algorithm is obviously statically linked into the binary as it does not bear the obfuscation methods found throughout the sample. Furthermore, it seems to be either compiled for minimal size or hand-crafted in assembler language, as some very uncommon patterns such as calls into the middle of functions etc. can be found.

    Unfortunately, I have no further information about the decompression algorithm would be happy if anyone could tell me. For a proper code listing, take a look at Decompressor.cpp at my GitHub page.

    Creating a Generic Unpacker

    In order to properly, and, most of all, generically, unpack XPACK-packed code, we have to overcome all of the challenges the authors implemented. In the following section, I will point out the weaknesses in each stage of the XPACK implementation. Furthermore, several ways of exploiting these weaknesses will be presented and my choices will be explained. So, let's cut to the chase.

    Chunk Assembly

    As the results of the hash function differ from sample to sample due to the CRC table generation (remember, the constant values of next(x) ((x ^ 0x75683572) - 0x789ADA71) differ in each sample), we are presented several choices:
    • skip the validation part of the hash function
    • search for possible constants in question throughout the original binary and basically brute-force them until we find a matching hash
    • decrypt the unpacker stub and search there for the hash function, then extract the constant
    I did a small evaluation if the constants really occur in the original XPACK-packed binary and indeed they do, so that's a possible solution. Here is one example of our constant appearing near the entry point:

    004046FF sub eax, 789ADA71h

    So we can skip option 3, as we already hold potential constants in our hands.

    On the other hand, why bother if the easiest solution works? A short evaluation of this possibility revealed only very few false positives. For example, when all magic bytes are the same,

    (a ^ a) | a == a

    always is true. However, this problem is easily overcome by adding another constraint, that no all bytes must be the same.

    Hence, I have chosen the first method of finding the chunks without the need of checking the hash function, and so far it works flawlessly.

    The idea of the XPACK authors to introduce a polymorphic header was good and new to me. But exactly the characteristics the authors introduced as initial constraints have proven as an easily exploitable weakness. Thus, the additional protection by means of a polymorphic hash function based on CRC-32 has become completely obsolete. You could consider that an implementation failure ;-)

    Decryption (1)

    We need to understand the algorithm employed by the first decryption layer if we want to solve it in a generic way. A bit of basic cryptanalysis is helpful here. Let's recap the algorithm:

    for (size_t i = 0; i < s.length(); i++)
        s[i] += a - (i % b);

    with a and b being byte values. Hence, we can see the algorithm as two steps:
    1. add a constant to each byte
    2. add a counter modulo  a constant to each byte
    True to the motto "reverse everything" we start with step 2, deriving the modulus. As we know the output has to be base-64, we can easily derive the modulus by the following algorithm:
    • iterate modulus from 2..256
    • apply s[i] += i % b to each byte of the buffer
    • count the number of different occurring byte values
    • stop if the number at most 65
    What happened here? We applied the reverse of -(i % b) to the values and thus have removed the modulo-counter. Hence we know that if we hit the correct modulus, we only have the part s[i] += a in our buffer. But this is just a "rotation" operation to each byte, it does not change the number of different values that occur in the data. And yes, the number 65 is correct as the base-64 alphabet contains a padding character usually being "=".

    As the size of the packed executable is in the range of some 100 KiB, I chose to only use the first KiB due to performance reasons. A couple of dozen bytes should do the trick as well.

    As a result of the analysis, we have successfully derived b and can now apply the reverse function, that is adding (i % b), to our buffer:

    s[i] = s[i] + a - (i % b) + (i % b) which now becomes
    s[i] = s[i] + a

    Now to step 1: We now hold the nearly-base-64 code and make the assumption that the + operator smoothly "rotates" the value over the -1..0 limit. Hence, we can create a (binary) histogram over the byte range where we already removed the modulus operation. I call it binary histogram as we are only interested if a byte occurs in the buffer or not.
    Then, we calculate that binary histogram over the well-known base-64 alphabet and simply rotate it until both histograms match. The number of rotation operations gets us offset a!

    Summarizing, we derived the modulus b by brute-forcing over all possible byte values until we get a buffer of at most 65 different characters. We then apply the reverse modulus function. After that, we generate the binary histograms over the well-known base-64 alphabet and the input values in order to match them, rotating one of the histograms element-wise (that is byte-wise). Once we find a match, we have successfully derived the offset a.

    We could have tried the naive approach, that is to brute-force each combination of a and b, until we have found a buffer that contains only bytes of the base-64 alphabet. The complexity of the naive approach, however, is roughly in 256², while the presented approach uses at most about 256+256 loop iterations. Hence, unpacking becomes way faster ;-)

    Again, the authors had a good idea but failed to implement it properly.

    Base-64 Decoding

    The base-64 decoding is straightforward. Also, the alphabet is standard, so no further comments are necessary in this section...

    Decryption (2)

    Again, stage "2a" of the decryption process is straightforward and we can utilize the extracted code.

    Remembering stage "2b" of the decryption, we have two byte variables:

        s[i] = (s[i] ^ c) - d;

    with c and d being byte constants randomly chosen at packing the original executable. We could use the naive approach and iterate over c and d, but that takes quite a long time and I have something way more nicer up my sleeve: differential cryptanalysis. Although it seems to be a bit pretentious to call it that way, my method can indeed be seen as it.

    We can rely on the fact that the first eight bytes of the decrypted buffer have to equal "XPXAXCXK". First, we try to get a valid XOR value c. So, brute-force over all possible values for c and check if the distance of magic[0] and magic[1] equals the distance of 'X' - 'P' and so on for the other seven characters. This will result in several possible values for c and we have to test each value.

    Getting the offset, i.e. the value for d is rather trivial: just calculate

    'X' - (magic[0] ^ c)

    and we're done. As we relied on the output of magic being "XPXAXCXK" we only have to check the remaining fields for validity. The only field we can utilize for that purpose is decompressedLength, as we don't know the uncompressed length of the binary data.

    So, how many potential solutions do we get and how many of these potential solutions will lead to a correct decompression of the binary? I will leave these questions for the interested reader to answer ;-)

    In summary, we again employed basic cryptanalysis in order to retrieve the values for c and d, pushing down the complexity from 256² to 256 + 256. But this time, I won't bash the XPACK authors again...


    The decompression is simple: just use the code ripped from the XPACK unpacker. It works like a charm!


    Writing a generic static unpacker for XPACK surely was fun. The XPACK creators went through a variety of painstaking efforts such as
    • spreading the packed contents in chunks all over the binary
    • no fixed header constants
    • polymorphic check sums
    • several layers of polymorphic encryption
    • compression and base-64 encoding
    With each step the authors tried to thwart the possibility of creating a static unpacker. However, each step of the packing process has its own unique weaknesses that have been exposed and successfully exploited in the presented article.

    A quick experiment over a collection of binaries revealed about a dozen of binaries packed with XPACK, of which 3 I previously had not recognized as such in a manual inspection. The unpacker is available at my GitHub page. As usual, the code has been developed test-driven. I'm really thankful I learned this technique as it makes the development process way easier. And not only that: reverse engineering becomes easier, too, as it often relies on the ability to develop agile. Just try it!

    Samstag, 6. Oktober 2012

    Using DLL Injection to Automatically Unpack Malware

    In this post, I will present DLL injection by means of automatically unpacking malware. But first, the most important question:

    What is DLL Injection and Reasons for Injecting a DLL

    DLL injection is one way of executing code in the context of another process. There are other techniques to execute code in another process, but essentially this is the easiest way to do it. As a DLL brings nifty features like automatic relocation of code good testability, you don't have to reinvent the wheel and do everything on your own.

    But why should you want to injecting a DLL into a foreign process? There are lots of reasons to inject a DLL. As you are within the process address space, you have full control over the process. You can read and write arbitrary memory locations, set hooks etc. with unsurpassed performance. You could basically do the same with a debugger, but it is way more convenient to do it in an injected DLL.

     Some showcases are:
    • creation of cheats, trainers
    • extracting passwords and encryption keys
    • unpacking packed/encrypted executables

    To me, especially the opportunity of unpacking and decrypting malware is very interesting. Basically, most malware samples are packed by the same packer or family of packers. In the following, I will shortly summarize how it works.


    The Malware Packer

     In order to evade anti-virus detection, the authors of the packer have devised an interesting unpacking procedure. Roughly, it can be summarized in the following stages:

    1. First, the unpacker stub does some inconspicuously looking stuff in order to thwart AV detection. The code is slightly obfuscated, but not as strong as to raise suspicion. Actually, the code that is being executed decrypts parts of the executable and jumps to it by self-modifying code. In the snippet below, you see how exactly the code is modified. The first instruction of the function that is supposedly called is changed to a jump to the newly decrypted code.

      mov [ebp+var_1], 0F6h
      mov al, [ebp+var_1]
      mov ecx, ptr_to_function
      xor al, 0A1h
      sub al, 6Eh
      mov [ecx], al ; =0xE9
      mov ecx, ptr_to_function
      mov [ecx+1], eax ; delta to decrypted code
      call eax

      As you can see (after doing some math), an unconditional near jmp is inserted right at the beginning of the function to be called. Hence, by calling a supposedly normal function, the decrypted code is executed.
    2. The decrypted stub allocates some memory and copies the whole executable to that memory. Them it does some relocation (as the base address has changed) and executes the entry point of executable. In the following code excerpt, you can see the generic calculation of the entry point:

      mov edx, [ebp+newImageBase]
      mov ecx, [edx+3Ch] ; e_lfanew
      add ecx, edx ; get PE header
      mov ebx, [ecx+28h] ; get AddressOfEntryPoint
      add ebx, edx ; add imageBase
      mov [ebp+vaOfEntryPoint], ebx
      mov ebx, [ebp+vaOfEntryPoint]
      call ebx

    3. Here, the next stage begins. At first glance it seems the same code is executed twice, but naturally, there's a deviation in control flow.

      For example, the the packer authors had to make sure that the encrypted code doesn't get decrypted twice. For that, they declared a global variable which in this sample initially holds the value 0x6E6C82B7. So upon first execution, the variable alreadyDecrypted is set to zero.

      mov eax, alreadyDecrypted
      cmp eax, 6E6C82B7h
      jnz dontInitialize

      mov alreadyDecrypted, 0

      In the decryption function, that variable is checked for zero, as you can see/calculate in the following snippet:

      mov [ebp+const_0DF2EF03], 0DF2EF03h
      mov edi, 75683572h
      mov esi, 789ADA71h
      mov eax, [ebp+const_0DF2EF03]
      mov ecx, alreadyDecrypted
      xor eax, edi
      sub eax, esi
      cmp eax, ecx ; eax = 0
      jnz dontDecrypt

      Once more, you see the obfuscation employed by the packer.

      Then, a lengthy function is executed that takes care of the actual unpacking process. It comprises the following steps:
      • gather chunks of the packed program from the executable memory space
      • BASE64-decode it
      • decompress it
      • write it section by section to the original executable's memory space, effectively overwriting all of the original code
      • fix imports etc. 

      After that, the OEP (original entry point) is called.

      The image below depicts a typical OEP of an unspecified malware. Note that after a call to some initialization function, the first API function it calls is SetErrorMode.

      Code at the OEP


      What are possible points to attack the unpacking process? Basically, you can grab the unpacked binary at two points:
      • first, when it is completely unpacked on the heap, but not yet written to the original executable's image space, and 
      • second, once the malware has reached its OEP.
      The second option is the most common and generic one when unpacking binaries, so I will explain that one. Naturally, you can write a static unpacker and perhaps one of my future posts will deal with that.

      One of the largest weaknesses are the memory allocations and setting the appropriate access rights. As a matter of fact, in order to write to the original executable's memory, the unpacker grants RWE access to the whole image space. Hence, it has no problems accessing and executing all data and code contained in it.

      If you set a breakpoint on VirtualProtect, you will see what I mean. There are very distinct calls to this function and the one setting the appropriate rights to the whole image space really sticks out.

      After a little research, I found two articles dealing with the unpacking process of  the packer (here and here), but both seem not aware that the technique presented in the following is really easily implemented.

      Once you have reached the VirtualProtect call that changes the access rights to RWE, you can change the flags to RW-only, hence execution of the unpacked binary will not be possible. So, once the unpacker tries to jump to the OEP, an exception will be raised due to missing execution rights.

      So, now that we know the correct location where to break the packer, how to unpack malware automatically?

      Here DLL injection enters the scene. The basic idea is very simple:
      • start the binary in suspended state
      • inject a DLL
        • this DLL sets a hook on VirtualProtect, changing RWE  to RW at the correct place
        • as backup, a hook on SetErrorMode is set. Hence, when encountering unknown packers, the binary won't be executed for too long.
      • resume the process

      Some other things have to be taken care of, like correctly dumping the process and rebuilding imports, but these are out of the scope of this article. If you encounter them yourself and don't know how to handle them, just ask me ;-)

      It seems not too easy to find a decent DLL injector. Especially, one that injects a DLL before the program starts (if there is one around, please tell me). As I could not find an injector that is capable of injecting right at program start, I coded my own. You can find it at my GitHub page. It uses code from Jan Newger, so kudos to him. I'm particularly fond of using test-driven development employing the googletest framework ;-)


      The presented technique works very well against the unpacker. So far, I've encountered about 50 samples and almost all can be unpacked using this technique. Furthermore, all unpackers that overwrite the original executable's image space can be unpacked by this technique. In future posts, I will evaluate this technique against other packers.

      Freitag, 28. September 2012

      "hello \x77\x6F\x72\x6C\x64"


      let me introduce me to you in some short words: my name is Sebastian and I'm working as Ph.D. student at Fraunhofer. In this blog, I will share some of my findings about malware I analyze and present some interesting reverse engineering projects I do in my spare time.

      So stay tuned for the next posts :-)