Windows x64 Shellcode - Part 2

Writing Custom Shellcode - This article is part of a series.

Part 2: This Article

Introduction

As mentioned in the previous part of this series, I’m going to build on the assembly program that we created, which launches a simple message box. In particular, I want to resolve the functions it calls at runtime instead of hardcoding their addresses. This is known as dynamically resolving a function’s address.

Approaches

There are two ways we can dynamically resolve the target function’s address:

Walk the PEB to find kernel32.dll and resolve the address of the famous GetProcAddress WinAPI. We can then reuse this function to acquire the address of any function exported by the DLLs loaded in the process. Typically, GetModuleHandle or LoadLibraryA is resolved as well to facilitate further function retrieval.
Reuse the function pointers that are already available in the target process’s Import Address Table (IAT).

Let’s go over how one could implement each of these approaches.

Walking the PEB

This method has the benefit of being very flexible. We don’t need to rely on any special scenarios, since kernel32.dll is one of the fundamental modules loaded when a process starts. Because it is present in virtually every Win32 process, GetModuleHandle/LoadLibraryA and GetProcAddress can be found there, and with those we can load any other module that we wish and look up any function that we choose.

However, there is a potential downside to using this technique. This is a method commonly used by shellcode to resolve function addresses, and can therefore seem suspicious to an AV or EDR. The offsets and lookup algorithm are characteristic and can easily be distinguished by a security solution.

Finding a Module

This is a well-known technique to retrieve the WinAPIs mentioned above, allowing us to call other functions with ease. First, we need the address of either LoadLibraryA or GetModuleHandle, which can be done by performing the following steps:

Get a pointer to the PEB.
Get a pointer to the Ldr member, which points to a PEB_LDR_DATA structure.
This structure contains a member called InMemoryOrderModuleList, which is the head of a doubly linked list of LIST_ENTRY structures, one per loaded module in the process. The Flink and Blink pointers of each LIST_ENTRY point to the InMemoryOrderLinks member embedded in the next and previous LDR_DATA_TABLE_ENTRY structures, not to the start of those structures.
Each LDR_DATA_TABLE_ENTRY has a UNICODE_STRING member named FullDllName, indicating the name of the DLL. The idea is to compare this value against the name of the target DLL, and once a match is found, save its DllBase value, which gives us the module’s base address.

The first LDR_DATA_TABLE_ENTRY represents the process itself, followed by ntdll.dll, kernel32.dll, and so on, depending on the process. Below is an implementation of the steps described above to print all of the loaded modules in the process.

#include <Windows.h>
#include <stdio.h>
#include <winnt.h>
#include <winternl.h>

#define info(msg, ...) printf("[i] " msg "\n", ##__VA_ARGS__)
#define warn(msg, ...) printf("[!] " msg "\n", ##__VA_ARGS__)

int main() {
    PPEB pPEB = (PPEB)__readgsqword(0x60);
    PPEB_LDR_DATA pPEBLdrData = (PPEB_LDR_DATA)(pPEB->Ldr);

    PLDR_DATA_TABLE_ENTRY pLdrDataTableEntry = NULL;
    PLIST_ENTRY pListEntryNode = &pPEBLdrData->InMemoryOrderModuleList;

    do {
        pLdrDataTableEntry = (PLDR_DATA_TABLE_ENTRY)pListEntryNode;
        if (pLdrDataTableEntry->DllBase == NULL) {
            pListEntryNode = pListEntryNode->Flink;
            continue;
        }
        info("Found loaded module '%ws'\n", pLdrDataTableEntry->FullDllName.Buffer);
        pListEntryNode = pListEntryNode->Flink;
    } while (pListEntryNode != &pPEBLdrData->InMemoryOrderModuleList);

    return 0;
}

List of loaded modules — The output of the above program.

The previous program only worked by luck. Read on to see where I went wrong.

As mentioned above, the InMemoryOrderModuleList member points to the head of a doubly linked list of LIST_ENTRY structures. I’m pretty rusty with data structures and was having some trouble with the logic used for traversing this list. I jumped into WinDbg to see why I was having some issues and noticed that the value of DllBase seemed to be… off.

Weird DllBase value — This is the address of the first `LDR_DATA_TABLE_ENTRY` entry corresponding to the executable itself. Note the value of `DllBase`. Also, take a look at the `InMemoryOrderLinks` value.

Something’s not making sense here… It turns out I had misunderstood how these structures are linked. Each LIST_ENTRY points to the next/previous LIST_ENTRY, and that LIST_ENTRY sits at an offset inside its “destination” LDR_DATA_TABLE_ENTRY rather than at the start of it. Therefore, we cannot use the value of Flink directly as a pointer to the entry. The simple fix (more of a “patch”, since a hardcoded offset isn’t easy to maintain and isn’t guaranteed to stay the same) would be to subtract the offset of the LIST_ENTRY within the LDR_DATA_TABLE_ENTRY (0x10 in this case) from the address pointed to by each Flink. However, we will use a solution made specifically for these cases: the CONTAINING_RECORD macro. After a bit of refactoring, the final code used to parse the loaded modules is shown below.

#include <Windows.h>
#include <stdio.h>
#include <winnt.h>
#include <winternl.h>

#define info(msg, ...) printf("[i] " msg "\n", ##__VA_ARGS__)
#define warn(msg, ...) printf("[!] " msg "\n", ##__VA_ARGS__)

int main() {
    PPEB pPEB = (PPEB)__readgsqword(0x60);
    PPEB_LDR_DATA pPEBLdrData = (PPEB_LDR_DATA)(pPEB->Ldr);

    PLIST_ENTRY pListHead = &pPEBLdrData->InMemoryOrderModuleList;
    PLIST_ENTRY pListEntry = pListHead->Flink;

    while (pListEntry != pListHead) {
        PLDR_DATA_TABLE_ENTRY pLdrDataTableEntry = CONTAINING_RECORD(
            pListEntry,
            LDR_DATA_TABLE_ENTRY,
            InMemoryOrderLinks
        );

        if (pLdrDataTableEntry->DllBase == NULL) {
            warn("Found module with NULL DllBase, skipping");
            pListEntry = pListEntry->Flink;
            continue;
        }

        // The info/warn macros already append a newline.
        info("Loaded module '%ws' @ 0x%p",
            pLdrDataTableEntry->FullDllName.Buffer,
            pLdrDataTableEntry->DllBase
        );

        pListEntry = pListEntry->Flink;
    }

    return 0;
}

Fixed list of loaded modules — The output of the fixed code.

The attentive reader may have realized that the incorrect version of this program was not printing FullDllName at all: because every access was shifted by 0x10, it was actually reading the Reserved4 member. If we subtract the offset (0x10) of the LIST_ENTRY from the address we had been treating as the LDR_DATA_TABLE_ENTRY, the values of the structure make a lot more sense.
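To make the 0x10 shift concrete, here is a minimal, portable sketch of the arithmetic. The `mock_*` types and both helper functions are hypothetical stand-ins that only reproduce the x64 field offsets of the winternl.h view of LDR_DATA_TABLE_ENTRY; this is not the real Windows header.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; fixed-width fields pin the x64 layout. */
typedef struct mock_list_entry {
    uint64_t flink;
    uint64_t blink;
} mock_list_entry;

typedef struct mock_unicode_string {
    uint16_t length;
    uint16_t maximum_length;
    uint32_t padding;                          /* implicit on x64, explicit here */
    uint64_t buffer;
} mock_unicode_string;

typedef struct mock_ldr_entry {
    uint64_t            reserved1[2];          /* 0x00 */
    mock_list_entry     in_memory_order_links; /* 0x10 */
    uint64_t            reserved2[2];          /* 0x20 */
    uint64_t            dll_base;              /* 0x30 */
    uint64_t            reserved3[2];          /* 0x38 */
    mock_unicode_string full_dll_name;         /* 0x48 */
    mock_unicode_string base_dll_name;         /* 0x58, where Reserved4 begins */
} mock_ldr_entry;

/* Same idea as CONTAINING_RECORD: step back from the embedded link
   to the start of the structure that contains it. */
static mock_ldr_entry *entry_from_link(mock_list_entry *link) {
    return (mock_ldr_entry *)((char *)link -
                              offsetof(mock_ldr_entry, in_memory_order_links));
}

/* If a link pointer is cast straight to an entry pointer, every field
   access lands this many bytes into the real entry instead. */
static size_t misread_offset(size_t field_offset) {
    return field_offset + offsetof(mock_ldr_entry, in_memory_order_links);
}
```

With these definitions, `misread_offset(offsetof(mock_ldr_entry, full_dll_name))` comes out to 0x58, the offset of `base_dll_name`, which matches what the buggy program printed. The same shift moves the DllBase read from 0x30 to 0x40, which would explain the odd DllBase value seen in WinDbg.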

In the official Microsoft documentation, there are many members marked as Reserved. Thankfully, some fantastic people have created online repositories documenting these “hidden” structures and other things that Microsoft has chosen to keep obscure. Two great examples are ntdoc and the Vergilius Project. If we look up the LDR_DATA_TABLE_ENTRY structure, we can conclude that the Reserved4 member corresponds to BaseDllName, of type UNICODE_STRING. If we apply this knowledge in WinDbg, we can see that what was being read in place of FullDllName was the value at Reserved4, which we now know is BaseDllName.

The value of `BaseDllName` was being printed before, not the actual `FullDllName`.
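The listings above only print the modules; the comparison step (match a name, then save DllBase) could look roughly like the sketch below. It runs over a mock circular list so that it is portable. `node`, `module_entry`, `list_append` and `find_module_base` are hypothetical names, and matching on the base name with a null-terminated string is a simplification: a real UNICODE_STRING carries an explicit Length and is not guaranteed to be null-terminated.

```c
#include <stddef.h>
#include <wchar.h>
#include <wctype.h>

typedef struct node { struct node *flink, *blink; } node; /* stand-in for LIST_ENTRY */

typedef struct module_entry {      /* hypothetical, heavily simplified entry */
    void          *reserved[2];
    node           links;          /* plays the role of InMemoryOrderLinks */
    void          *dll_base;
    const wchar_t *base_name;      /* plays the role of BaseDllName.Buffer */
} module_entry;

#define ENTRY_OF(link) \
    ((module_entry *)((char *)(link) - offsetof(module_entry, links)))

/* Case-insensitive comparison, since module names vary in case. */
static int names_equal_nocase(const wchar_t *a, const wchar_t *b) {
    while (*a && *b) {
        if (towlower((wint_t)*a) != towlower((wint_t)*b)) return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Insert n at the tail of the circular list headed by head. */
static void list_append(node *head, node *n) {
    n->flink = head;
    n->blink = head->blink;
    head->blink->flink = n;
    head->blink = n;
}

/* Walk from head->flink until we are back at the head, returning the
   base address of the first module whose name matches, or NULL. */
static void *find_module_base(node *head, const wchar_t *target) {
    for (node *n = head->flink; n != head; n = n->flink) {
        module_entry *e = ENTRY_OF(n);
        if (e->base_name && names_equal_nocase(e->base_name, target))
            return e->dll_base;
    }
    return NULL;
}
```

The loop stops when it reaches the head again, the same termination condition used in the fixed program.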

Finding a Function in the Module

Once we have the base address of kernel32.dll, we can proceed to resolve the addresses of the functions of interest. Specifically, we’re interested in LoadLibraryA or GetModuleHandle, as well as GetProcAddress. To locate a function in a DLL using the module’s base address, we can parse its export table as follows:

Verify the signatures in the IMAGE_DOS_HEADER and the IMAGE_NT_HEADERS (the latter is located through e_lfanew).
The IMAGE_OPTIONAL_HEADER contains an IMAGE_DATA_DIRECTORY array called DataDirectory. The RVA and size of the export table reside at index IMAGE_DIRECTORY_ENTRY_EXPORT (defined as 0).
The IMAGE_EXPORT_DIRECTORY contains several members of interest: NumberOfFunctions, NumberOfNames, AddressOfFunctions, AddressOfNames, and AddressOfNameOrdinals. AddressOfNames and AddressOfNameOrdinals are parallel arrays with NumberOfNames entries each, so we loop over them and perform a simple string comparison of each name against our target function’s name. Once found, the entry at the same index in AddressOfNameOrdinals gives us the index into AddressOfFunctions, which holds the function’s RVA.

A proof of concept that lists the exported functions of a module is presented below, run on the base address of kernel32.

#include <Windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <winnt.h>
#include <winternl.h>

#define info(msg, ...) printf("[i] " msg "\n", ##__VA_ARGS__)
// fail() is not shown in the original listing; this definition is an assumption.
#define fail(msg, ...) do { printf("[-] " msg "\n", ##__VA_ARGS__); exit(EXIT_FAILURE); } while (0)

int main() {

    // pTargetModuleBase already found by the PEB walk shown earlier.
    // GetModuleHandleA is only a stand-in so this listing compiles on its own.
    PBYTE pTargetModuleBase = (PBYTE)GetModuleHandleA("kernel32.dll");
    if (pTargetModuleBase == NULL) {
        fail("Failed to find target module");
    }
    info("Target module found at base address: %p", pTargetModuleBase);

    PIMAGE_DOS_HEADER pDosHeader = (PIMAGE_DOS_HEADER)pTargetModuleBase;
    if (pDosHeader->e_magic != IMAGE_DOS_SIGNATURE) { // check the DOS header signature ("MZ")
        fail("Invalid DOS header signature");
    }
    PIMAGE_NT_HEADERS pNtHeaders = (PIMAGE_NT_HEADERS)(pTargetModuleBase + pDosHeader->e_lfanew);
    if (pNtHeaders->Signature != IMAGE_NT_SIGNATURE) { // check the NT header signature ("PE\0\0")
        fail("Invalid NT header signature");
    }

    // base + RVA is never NULL, so validate the RVA itself
    DWORD exportDirRva = pNtHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
    if (exportDirRva == 0) {
        fail("Module has no export directory");
    }
    PIMAGE_EXPORT_DIRECTORY pExportDir = (PIMAGE_EXPORT_DIRECTORY)(pTargetModuleBase + exportDirRva);

    DWORD* pFunctionAddresses = (DWORD*)(pTargetModuleBase + pExportDir->AddressOfFunctions);
    DWORD* pFunctionNames = (DWORD*)(pTargetModuleBase + pExportDir->AddressOfNames);
    WORD* pFunctionOrdinals = (WORD*)(pTargetModuleBase + pExportDir->AddressOfNameOrdinals);

    // AddressOfNames and AddressOfNameOrdinals both hold NumberOfNames entries,
    // so NumberOfNames (not NumberOfFunctions) bounds the loop.
    info("%lu named exports in KERNEL32.DLL:", pExportDir->NumberOfNames);
    for (DWORD i = 0; i < pExportDir->NumberOfNames; i++) {
        PCHAR pFunctionName = (PCHAR)(pTargetModuleBase + pFunctionNames[i]);
        DWORD functionRva = pFunctionAddresses[pFunctionOrdinals[i]];
        printf("  %s at %p\n", pFunctionName, pTargetModuleBase + functionRva);
    }

    return 0;
}

kernel32 exported functions — The PoC code was added to the module finding code presented previously.
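The PoC lists every export; the lookup of a single name, as described in the steps above, might be sketched as follows. To keep it portable, `export_dir` is a hypothetical struct that mirrors the field layout of IMAGE_EXPORT_DIRECTORY, and `find_export_rva` is an illustrative name rather than an existing API. It takes the image base and the RVA of the export directory, and returns the function’s RVA (0 if the name is not found).

```c
#include <stdint.h>
#include <string.h>

/* Mirrors the field order and sizes of IMAGE_EXPORT_DIRECTORY. */
typedef struct export_dir {
    uint32_t characteristics;
    uint32_t time_date_stamp;
    uint16_t major_version;
    uint16_t minor_version;
    uint32_t name;
    uint32_t base;
    uint32_t number_of_functions;       /* offset 0x14 */
    uint32_t number_of_names;           /* offset 0x18 */
    uint32_t address_of_functions;      /* 0x1C: RVA of uint32_t[number_of_functions] */
    uint32_t address_of_names;          /* 0x20: RVA of uint32_t[number_of_names] */
    uint32_t address_of_name_ordinals;  /* 0x24: RVA of uint16_t[number_of_names] */
} export_dir;

static uint32_t find_export_rva(const uint8_t *base, uint32_t export_dir_rva,
                                const char *target) {
    const export_dir *dir = (const export_dir *)(base + export_dir_rva);
    const uint32_t *functions = (const uint32_t *)(base + dir->address_of_functions);
    const uint32_t *names     = (const uint32_t *)(base + dir->address_of_names);
    const uint16_t *ordinals  = (const uint16_t *)(base + dir->address_of_name_ordinals);

    /* The name and ordinal arrays are parallel: entry i of one belongs
       to entry i of the other. */
    for (uint32_t i = 0; i < dir->number_of_names; i++) {
        const char *name = (const char *)(base + names[i]);
        if (strcmp(name, target) != 0)
            continue;
        uint16_t index = ordinals[i];   /* index into the function table */
        if (index >= dir->number_of_functions)
            return 0;                   /* malformed table */
        return functions[index];
    }
    return 0;                           /* not found */
}
```

Adding the returned RVA to the module base gives the callable address, just as the PoC does when printing each export.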

Reusing the IAT

While researching this post, I found that this technique has already been investigated and is known as the Cordyceps technique. A link can be found here, where the original author describes it best.

This method of resolving function addresses is relatively new. As the section title suggests, the idea is to reuse the function pointers already present in the target process’s IAT. This means we won’t have to “walk the PEB” to find the target function, which in turn means the shellcode looks less suspicious to a security solution.

Implementation Details

As described in the original post, we can call a function as a regular program would do by reusing its IAT. Let’s see how a regular program would call the function that we’re interested in calling (MessageBoxA). I used the straightforward program shown below for this demonstration.

#include <windows.h>

int main() {
    MessageBoxA(NULL, "Hello, this is a simple message box!", "Simple Message Box", MB_OK);
    return 0;
}

If we open it up in a debugger and place a breakpoint on main() and step until we reach the call to MessageBoxA, we can see the following:

Notice the addresses that I’ve marked. According to the debugger, the expression ds:[<&MessageBoxA>] is actually ds:[00007FF7B3B82080 <simplemessageboxcaller.&MessageBoxA>], and its value is <user32.MessageBoxA>. This indicates that this expression is referring to address 0x00007FF7B3B82080 of the program, in which an address pointing to the MessageBoxA function in the user32.dll module resides. Let’s take a look for ourselves.

When browsing for this target address, I found that it is contained within the .rdata section. Apparently, these entries that are displayed in the figure above represent the IAT. And note that since the program is now loaded in memory, the destination addresses have been resolved! Take a look at the value stored at address 0x00007FF7B3B82080 shown below:

resolved address of MessageBoxA — The value is the address of the `MessageBoxA` function within the **user32** module memory space.

Now, we can analyze the call instruction that’s set to be executed. I’ll post the screenshot, which displays it again, here for convenience.

Looking at the opcodes and online resources, we can gather the following:

The first byte (0xFF) is the opcode, and the second byte (0x15) is the ModR/M byte, which breaks down as follows:
- mod (bits 7 - 6): 00 -> indirect addressing
- reg (bits 5 - 3): 010 -> /2, which means CALL (near, absolute indirect)
- r/m (bits 2 - 0): 101 -> with mod = 00, a 32-bit displacement follows; in 64-bit mode it is RIP-relative
The last 4 bytes hold that displacement (0x00001063, stored little-endian). The instruction therefore calls whatever address is stored 0x1063 bytes past the end of the instruction, that is, relative to the address of the next instruction. That address is calculated below:

$$ \texttt{00007FF7B3B81017} \; (\text{current address}) + \texttt{1063} \; (\text{displacement}) + \texttt{6} \; (\text{size of call instruction}) = \texttt{00007FF7B3B82080} $$
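The decoding and the address calculation can be double-checked with a few lines of C. This is only a sketch of the arithmetic; `decode_modrm` and `rip_relative_target` are hypothetical helper names, not part of any library.

```c
#include <stdint.h>

typedef struct modrm_fields {
    uint8_t mod;  /* bits 7-6 */
    uint8_t reg;  /* bits 5-3 */
    uint8_t rm;   /* bits 2-0 */
} modrm_fields;

/* Split a ModR/M byte into its three bit fields. */
static modrm_fields decode_modrm(uint8_t byte) {
    modrm_fields f;
    f.mod = (uint8_t)(byte >> 6);
    f.reg = (uint8_t)((byte >> 3) & 7);
    f.rm  = (uint8_t)(byte & 7);
    return f;
}

/* RIP-relative operands are relative to the address of the NEXT
   instruction, hence the added instruction length. The displacement
   is signed, so it is sign-extended before the addition. */
static uint64_t rip_relative_target(uint64_t instr_addr, uint8_t instr_len,
                                    int32_t disp) {
    return instr_addr + instr_len + (uint64_t)(int64_t)disp;
}
```

For the call above, `decode_modrm(0x15)` yields mod = 0, reg = 2, r/m = 5, and `rip_relative_target(0x00007FF7B3B81017, 6, 0x1063)` evaluates to 0x00007FF7B3B82080, the IAT slot shown in the debugger.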

To finalize this section, let’s take a look at this executable in PE Bear.

MessageBoxA in IAT — Note the ‘Call Via’ column. This value indicates the location where the target function’s address will be stored. The value `2080` coincides with the value (offset) that was calculated above.

Points to Consider

Obviously, due to the details described above, this method is meant to be used on an existing binary, since we need an IAT. Recall the assembly program written in the first part of this series; the call instruction was:

call r10 ; translates to `41 FF D2`

We would replace this with:

call qword ptr [rip + calculated_offset] ; translates to FF 15 <4 bytes indicating displacement>
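Going the other way, from a known IAT slot to the bytes of the replacement instruction, is the same arithmetic in reverse. The sketch below assumes we already know the address where the call will live and the address of the IAT slot; `encode_call_via_iat` is a hypothetical helper name.

```c
#include <stdint.h>

/* Write `FF 15 <disp32>` into out, where disp32 is measured from the
   end of the 6-byte instruction to the IAT slot.
   Returns 1 on success, 0 if the slot is out of reach of a 32-bit
   signed displacement. */
static int encode_call_via_iat(uint64_t call_addr, uint64_t iat_slot,
                               uint8_t out[6]) {
    int64_t disp = (int64_t)(iat_slot - (call_addr + 6));
    if (disp < INT32_MIN || disp > INT32_MAX)
        return 0;

    uint32_t d = (uint32_t)(int32_t)disp;
    out[0] = 0xFF;                       /* opcode */
    out[1] = 0x15;                       /* ModR/M: /2, RIP-relative */
    out[2] = (uint8_t)(d & 0xFF);        /* displacement, little-endian */
    out[3] = (uint8_t)((d >> 8) & 0xFF);
    out[4] = (uint8_t)((d >> 16) & 0xFF);
    out[5] = (uint8_t)((d >> 24) & 0xFF);
    return 1;
}
```

Using the addresses from the debugger session (call at 0x00007FF7B3B81017, slot at 0x00007FF7B3B82080) this produces `FF 15 63 10 00 00`. Keep in mind that this instruction is 6 bytes long while `call r10` is 3, so anything that follows it shifts by 3 bytes.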

Implementation

Walking the PEB in Assembly

Performing a PEB walk in assembly is actually easier than implementing the C version of it, since we don’t have to worry about types and typecasting. Given that the offsets of the values we’re interested in are public knowledge, all that needs to be done is follow a few pointers at known offsets to acquire the address of the target module (i.e. kernel32.dll).

The same search logic can be implemented in assembly, like so:

GetKernel32Address PROC

    mov rax, gs:[60h]           ; retrieve PEB pointer in x64
    mov rax, [rax + 18h]        ; Ldr member is at offset 0x18
    mov rax, [rax + 20h]        ; InMemoryOrderModuleList.Flink (offset 0x20) -> first entry (this program)
    mov rax, [rax]              ; second entry in the list (ntdll.dll)
    mov rax, [rax]              ; third entry in the list (kernel32.dll)
    mov rax, [rax - 10h + 30h]  ; step back to the start of the entry (-0x10), then read DllBase (+0x30)
    ret                         ; return the base address in rax

GetKernel32Address ENDP
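For readers who prefer C, the same chain of dereferences can be written with raw offsets. This is a sketch meant to run against mock memory, not a replacement for the assembly: `load_qword` and `third_module_base` are hypothetical names, and it bakes in the same assumption as the procedure above, namely that kernel32.dll is the third entry in the list.

```c
#include <stdint.h>
#include <string.h>

/* Read 8 bytes from an address, like `mov rax, [addr]`. */
static uint64_t load_qword(uint64_t addr) {
    uint64_t value;
    memcpy(&value, (const void *)(uintptr_t)addr, sizeof value);
    return value;
}

/* Mirrors GetKernel32Address one instruction at a time. */
static uint64_t third_module_base(uint64_t peb) {
    uint64_t ldr  = load_qword(peb + 0x18);   /* PEB.Ldr */
    uint64_t link = load_qword(ldr + 0x20);   /* first entry's InMemoryOrderLinks */
    link = load_qword(link);                  /* second entry */
    link = load_qword(link);                  /* third entry */
    return load_qword(link - 0x10 + 0x30);    /* back to entry start, then DllBase */
}
```

Because the position in the list is hardcoded, a process with a different load order would hand back the wrong module; comparing names, as in the C version earlier, avoids that.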

Finding a Function in Assembly

Now that we have the base address of a module in memory, we can iterate over its exports in search of the function that we’re interested in. An implementation of the logic we wrote before is presented below:

GetLoadLibraryAddress PROC

    mov rbx, rcx                            ; save a copy of kernel32 address in rbx so we can manipulate rcx
    xor rdx, rdx

    cmp word ptr [rcx], 5A4Dh               ; compare the first WORD with 5A4D ('MZ', IMAGE_DOS_SIGNATURE)
    jnz failed

    mov eax, dword ptr [rcx + 3Ch]          ; load e_lfanew, the dword at [rcx + 3C]
    lea rcx, [rcx + rax]                    ; load [rcx + rax] into rcx -> rcx is now pointing to the IMAGE_NT_HEADERS
    cmp dword ptr [rcx], 00004550h          ; compare dword in rcx with 00004550 (IMAGE_NT_SIGNATURE)
    jnz failed

    mov edx, dword ptr [rcx + 18h + 70h]    ; OptionalHeader is at 0x18 and DataDirectory at 0x70 within it; edx = RVA in DataDirectory[0]
    add rdx, rbx                            ; add the base address of kernel32.dll; index 0 is the export entry, so rdx now points to the IMAGE_EXPORT_DIRECTORY

    xor r11, r11
    mov r11d, dword ptr [rdx + 18h]         ; NumberOfNames is at offset 0x18 (it bounds the name and ordinal arrays)

    mov r8d, dword ptr [rdx + 1Ch]          ; AddressOfFunctions is at offset 0x1C
    add r8, rbx                             ; add the base address of kernel32.dll

    mov r9d, dword ptr [rdx + 20h]          ; AddressOfNames is at offset 0x20
    add r9, rbx                             ; add the base address of kernel32.dll

    mov r10d, dword ptr [rdx + 24h]         ; AddressOfNameOrdinals is at offset 0x24
    add r10, rbx                            ; add the base address of kernel32.dll

    xor rcx, rcx                            ; clear this register so we can use it as a counter

find_function:
    mov esi, dword ptr [r9 + rcx * 4]       ; rsi now holds the offset of the function name
    add rsi, rbx                            ; add the base address of kernel32.dll to rsi
    cmp dword ptr [rsi], 'daoL'             ; compare the first 4 bytes of the function name with 'Load' in LE
    jne next_function
    cmp dword ptr [rsi + 4], 'rbiL'         ; compare the next 4 bytes with 'Libr' in LE
    jne next_function
    cmp dword ptr [rsi + 8], 'Ayra'         ; compare the next 4 bytes with 'aryA' in LE
    jne next_function                       ; without this jump, the next cmp would overwrite the flags
    cmp byte ptr [rsi + 12], 00h            ; make sure the name ends here (rejects any longer name sharing this prefix)
    jne next_function

    movzx rax, word ptr [r10 + rcx * 2]     ; if we found the function name, load the corresponding ordinal into rax
    mov eax, dword ptr [r8 + rax * 4]       ; use the ordinal as index into AddressOfFunctions to get the function RVA
    add rax, rbx                            ; add the base address of kernel32.dll to rax
    test rax, rax                           ; check if rax is NULL
    jz failed                               ; if rax is NULL, jump to failed
    ret

next_function:
    inc rcx                                 ; increment the index
    cmp rcx, r11                            ; check if we have reached the end of the function names
    jae failed                              ; if we have, jump to failed
    jmp find_function                       ; otherwise, continue searching for the function name

failed:
    xor rax, rax
    ret

GetLoadLibraryAddress ENDP

The code above might be a bit difficult to follow, so I’ve also added its CFG below:

Putting it Together

Now that we can find LoadLibraryA, we can use it to load a module! And what better example than to load the module that exports MessageBoxA: user32.dll.

Entrypoint PROC

    push rbp
    mov rbp, rsp
    sub rsp, 16+32+8+8                      ; 16 bytes for the 'user32.dll' string + 32 bytes of shadow space + 16 bytes of padding (the total stays a multiple of 16)

    call GetKernel32Address                 ; this procedure doesn't manipulate the stack

    mov rcx, rax
    call GetLoadLibraryAddress              ; this procedure doesn't manipulate the stack
    mov r10, rax

    mov dword ptr [rbp - 10h], 72657375h    ; 'user' LE
    mov dword ptr [rbp - 0Ch], 642e3233h    ; '32.d' LE
    mov dword ptr [rbp - 08h], 00006c6ch    ; 'll' LE

    lea rcx, [rbp - 10h]                    ; rcx now points to the stack where 'user32.dll' is stored (in "our" first 16 bytes space)
    call r10

    add rsp, 16+32+8+8                      ; clean up the stack
    mov rsp, rbp
    pop rbp
    ret

Entrypoint ENDP
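The hexadecimal constants used for the stack string (and for the 'Load', 'Libr', 'aryA' comparisons earlier) are just four characters packed into a little-endian dword. A small helper can generate or verify them; `pack_le_dword` is a hypothetical name for this sketch.

```c
#include <stdint.h>
#include <string.h>

/* Pack up to the first 4 characters of s into a dword so that, once
   stored to memory on a little-endian machine, the bytes read back in
   string order. Missing characters are left as zero bytes, which also
   provides the terminator for the last chunk. */
static uint32_t pack_le_dword(const char *s) {
    uint32_t value = 0;
    size_t n = strlen(s);
    for (size_t i = 0; i < 4 && i < n; i++)
        value |= (uint32_t)(uint8_t)s[i] << (8 * i);
    return value;
}
```

For example, `pack_le_dword("user")` gives 0x72657375, `pack_le_dword("32.d")` gives 0x642e3233, and `pack_le_dword("ll")` gives 0x00006c6c, matching the three mov instructions above.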

There seems to be a bit of a snag

After writing the GetLoadLibraryAddress code, I thought it was going to be easy enough to call LoadLibraryA to load the user32.dll module by pushing the string onto the stack. This did not turn out to be the case. I miscalculated the stack space, and it turns out that the user32.dll string was being overwritten during the execution of LoadLibraryA. Let’s take a look. I originally had something like this:

Entrypoint PROC

    push rbp
    mov rbp, rsp
    sub rsp, 32+8+8             ; allocate shadow space

    call GetKernel32Address

    mov rcx, rax                ; save kernel32 address in rcx to call GetLoadLibraryAddress
    call GetLoadLibraryAddress
    mov r10, rax

    mov rax, 0000000000006c6ch  ; 'll'
    push rax
    mov rax, 642e323372657375h  ; 'user32.d'
    push rax

    mov rcx, rsp                ; rcx now points to the stack where 'user32.dll' is stored
    call r10                    ; call LoadLibraryA with the address of 'user32.dll' on the stack

    add rsp, 32+8+8             ; deallocate shadow space
    mov rsp, rbp
    pop rbp
    ret

Entrypoint ENDP

First, we check the stack right before executing call r10.

In retrospect, this was such a simple mistake, but it took a couple of hours for me to find… Let’s step through the code and see what happens with the stack string.

right after call r10 — Note here that rcx is still pointing to the **user32.dll** string on the stack

And then finally, we see what’s happening:

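This led to LoadLibraryA essentially being fed garbage, resulting in the target module not being loaded. The cause lies in the Windows x64 calling convention: the callee owns the 32 bytes of shadow space that start at the caller’s rsp at the moment of the call. Because the string was pushed after the shadow space had been reserved, it ended up at [rsp] and [rsp+8], right inside that area, and LoadLibraryA was free to write over it.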

While trying to solve this issue, I came up with a couple of ideas:

We could push the stack string “down” by pushing some garbage onto the stack “above” the stack string. Of course, we would have to make sure to load the correct address into rcx, since the top of the stack would contain the “garbage”. It would look something like this:

mov rax, 0000000000006c6ch  ; 'll'
push rax
mov rax, 642e323372657375h  ; 'user32.d'
push rax

xor rax, rax                ; clear rax
push rax                    ; nasty fix
push rax                    ; nasty fix

lea rcx, [rsp+10h]          ; rcx now points to the stack where 'user32.dll' is stored
call r10                    ; call LoadLibraryA with 'user32.dll' as the argument

This is what we saw before: the ‘user32.dll’ string is overwritten.

after — Now that we’ve pushed ‘sacrificial’ data onto the stack, the ‘user32.dll’ string remains untouched.

We could recalculate the amount of stack space we’re reserving and adjust accordingly. This is where I thought of pushing strings onto the stack using rbp, which I had completely forgotten about, so that modification is included in the below snippet:

Entrypoint PROC

    push rbp
    mov rbp, rsp
    sub rsp, 16+32+8+8                      ; 16 bytes for the 'user32.dll' string + 32 bytes of shadow space + 16 bytes of padding (the call pushes the return address itself)
                                            ; The callee is free to overwrite the 32 bytes of shadow space at [rsp, rsp+20h),
                                            ; which is what clobbered the string in the first attempt. The string now lives in
                                            ; its own 16 bytes just below rbp, well clear of the shadow space.

    call GetKernel32Address                 ; this procedure doesn't manipulate the stack

    mov rcx, rax
    call GetLoadLibraryAddress              ; this procedure doesn't manipulate the stack
    mov r10, rax

    mov dword ptr [rbp - 10h], 72657375h    ; 'user' LE
    mov dword ptr [rbp - 0Ch], 642e3233h    ; '32.d' LE
    mov dword ptr [rbp - 08h], 00006c6ch    ; 'll' LE

    lea rcx, [rbp - 10h]                    ; rcx now points to the stack where 'user32.dll' is stored (in "our" first 16 bytes space)
    call r10

    add rsp, 16+32+8+8                      ; clean up the stack
    mov rsp, rbp
    pop rbp
    ret

Entrypoint ENDP

I ended up going with option 2 since it resulted in cleaner code for me.
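A quick way to sanity-check a layout like this is to compare the string’s location against the 32 bytes that the callee may use, all expressed as offsets from rbp. The helper below is only interval arithmetic; `overlaps_home_space` is a hypothetical name, and the offsets are derived from the listings in this section.

```c
#include <stdint.h>

#define HOME_SPACE_SIZE 32  /* shadow space owned by the callee */

/* Returns 1 if the buffer [buf, buf + len) intersects the shadow space
   [rsp_at_call, rsp_at_call + 32), 0 otherwise. All arguments are
   offsets relative to rbp, so they are typically negative. */
static int overlaps_home_space(int64_t rsp_at_call, int64_t buf, int64_t len) {
    int64_t home_lo = rsp_at_call;
    int64_t home_hi = rsp_at_call + HOME_SPACE_SIZE;
    return buf < home_hi && buf + len > home_lo;
}
```

In the original attempt, rsp ends up at rbp-64 (48 reserved plus two pushes) with the 11-byte string at rbp-64, so `overlaps_home_space(-64, -64, 11)` returns 1. In option 2 the string sits at rbp-16 while rsp is at rbp-64, and `overlaps_home_space(-64, -16, 11)` returns 0. Option 1 is worth a second look: after the two extra pushes rsp is at rbp-80 and the string is still at rbp-64, so `overlaps_home_space(-80, -64, 11)` also returns 1. The string survived there only because LoadLibraryA did not happen to write to that part of the shadow space, which is another reason to prefer option 2.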

Demo Time

Now that that’s been cleared up, we can proceed with the long-awaited demo!

It doesn’t seem like much now, but we’ll use this knowledge in the next and penultimate part of this series: executing MessageBoxA!

Writing Custom Shellcode - This article is part of a series.

Part 1: Windows x64 Shellcode - Part 1

Part 2: This Article
