Introduction #
As mentioned in the previous part of this series, I’m going to build on the assembly program that we created, which launches a simple message box. In particular, I want to add a way to look up function addresses at runtime instead of hardcoding them. This is what’s known as dynamically resolving a function’s address.
Approaches #
There are two ways we can dynamically resolve the target function’s address:
- Resolve the address of the famous GetProcAddress WinAPI. We can then reuse this function to acquire the address of any function exported by the DLLs loaded in the process. Typically, the GetModuleHandle or LoadLibraryA WinAPI will also be resolved to facilitate further function retrieval.
- Use the functions available in the target process, located in its Import Address Table (IAT).
Let’s go over how one could implement each of these approaches.
Walking the PEB #
This method has the benefit of being very flexible. We don’t need to rely on any special scenarios, since kernel32.dll is one of the fundamental modules loaded when a process is executed. Because it will always be present, GetModuleHandle/LoadLibraryA and GetProcAddress can always be found, which lets us load any other module that we wish and look up any function that we choose.
However, there is a potential downside to using this technique. This is a method commonly used by shellcode to resolve function addresses, and can therefore seem suspicious to an AV or EDR. The offsets and lookup algorithm are characteristic and can easily be distinguished by a security solution.
Finding a Module #
This is a well-known technique to retrieve the WinAPIs mentioned above, allowing us to call other functions with ease. First, we need the base address of the module that exports LoadLibraryA or GetModuleHandle (kernel32.dll), which can be found by performing the following steps:
- Get a pointer to the PEB.
- Get a pointer to the Ldr member, which points to a PEB_LDR_DATA structure.
- This structure contains a member called InMemoryOrderModuleList, which is the head of a doubly linked list of LIST_ENTRY structures, each representing a loaded module in the process. Each of these structures has a forward and backward pointer (Flink and Blink) into the neighboring LDR_DATA_TABLE_ENTRY structures.
- Each LDR_DATA_TABLE_ENTRY has a UNICODE_STRING member named FullDllName, indicating the full path of the DLL. The idea is to compare this value against the name of the target DLL, and once a match is found, save its DllBase value, which gives us the module’s base address.
The first LDR_DATA_TABLE_ENTRY represents the process itself, followed by ntdll.dll, kernel32.dll, and so on, depending on the process. Below is my first attempt at implementing the steps described above to print all of the loaded modules in the process.
#include <Windows.h>
#include <stdio.h>
#include <winnt.h>
#include <winternl.h>
#define info(msg, ...) printf("[i] " msg "\n", ##__VA_ARGS__)
#define warn(msg, ...) printf("[!] " msg "\n", ##__VA_ARGS__)
int main() {
    PPEB pPEB = (PPEB)__readgsqword(0x60);
    PPEB_LDR_DATA pPEBLdrData = (PPEB_LDR_DATA)(pPEB->Ldr);
    PLDR_DATA_TABLE_ENTRY pLdrDataTableEntry = NULL;
    PLIST_ENTRY pListEntryNode = &pPEBLdrData->InMemoryOrderModuleList;
    do {
        pLdrDataTableEntry = (PLDR_DATA_TABLE_ENTRY)pListEntryNode;
        if (pLdrDataTableEntry->DllBase == NULL) {
            pListEntryNode = pListEntryNode->Flink;
            continue;
        }
        info("Found loaded module '%ws'\n", pLdrDataTableEntry->FullDllName.Buffer);
        pListEntryNode = pListEntryNode->Flink;
    } while (pListEntryNode != &pPEBLdrData->InMemoryOrderModuleList);
    return 0;
}
As mentioned above, the InMemoryOrderModuleList member points to the head of a doubly linked list of LIST_ENTRY structures. I’m pretty rusty with data structures and was having some trouble with the logic used for traversing this list. I jumped into WinDbg to see why I was having some issues and noticed that the value of DllBase seemed to be… off.
Figure: the LDR_DATA_TABLE_ENTRY entry corresponding to the executable itself. Note the value of DllBase. Also, take a look at the InMemoryOrderLinks value.
Something’s not making sense here… Turns out I had misunderstood how these structures are linked. Each LIST_ENTRY points to the next/previous LIST_ENTRY, and each of these LIST_ENTRY members sits at an offset inside its “destination” LDR_DATA_TABLE_ENTRY structure, not at its start. Therefore, we cannot just treat the value of Flink as a pointer to the entry. Of course, the simple fix (more like a “patch”, since a hardcoded offset isn’t easily maintainable and isn’t guaranteed to stay the same) would be to subtract the offset of the LIST_ENTRY within the LDR_DATA_TABLE_ENTRY (0x10 in this case) from the address pointed to by each Flink. However, we will use a solution made specifically for these cases: the CONTAINING_RECORD macro. After a bit of refactoring, the final code used to parse the loaded modules is shown below.
#include <Windows.h>
#include <stdio.h>
#include <winnt.h>
#include <winternl.h>
#define info(msg, ...) printf("[i] " msg "\n", ##__VA_ARGS__)
#define warn(msg, ...) printf("[!] " msg "\n", ##__VA_ARGS__)
int main() {
    PPEB pPEB = (PPEB)__readgsqword(0x60);
    PPEB_LDR_DATA pPEBLdrData = (PPEB_LDR_DATA)(pPEB->Ldr);
    // The list head lives inside PEB_LDR_DATA and is not itself a module entry
    PLIST_ENTRY pListHead = &pPEBLdrData->InMemoryOrderModuleList;
    PLIST_ENTRY pListEntry = pListHead->Flink;
    while (pListEntry != pListHead) {
        // Step back from the embedded LIST_ENTRY to the start of its LDR_DATA_TABLE_ENTRY
        PLDR_DATA_TABLE_ENTRY pLdrDataTableEntry = CONTAINING_RECORD(
            pListEntry,
            LDR_DATA_TABLE_ENTRY,
            InMemoryOrderLinks
        );
        if (pLdrDataTableEntry->DllBase == NULL) {
            // the macros already append a newline, so none is needed here
            warn("Found module with NULL DllBase, skipping");
            pListEntry = pListEntry->Flink;
            continue;
        }
        info("Loaded module '%ws' @ 0x%p",
            pLdrDataTableEntry->FullDllName.Buffer,
            pLdrDataTableEntry->DllBase
        );
        pListEntry = pListEntry->Flink;
    }
    return 0;
}
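To make the CONTAINING_RECORD idea a bit more concrete, here is a minimal, portable sketch of the same pattern that doesn't need Windows at all. The Link and Module types, the CONTAINER_OF macro, and both helper functions are hypothetical stand-ins that I made up for illustration (they are not the real LIST_ENTRY / LDR_DATA_TABLE_ENTRY definitions), and the module name is simplified to a plain C string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for LIST_ENTRY. */
typedef struct Link {
    struct Link *next; /* plays the role of Flink */
    struct Link *prev; /* plays the role of Blink */
} Link;

/* Hypothetical stand-in for LDR_DATA_TABLE_ENTRY. */
typedef struct Module {
    void       *reserved[2]; /* ensures `links` is NOT the first member */
    Link        links;       /* embedded node, like InMemoryOrderLinks */
    void       *base;        /* like DllBase */
    const char *name;        /* simplified: a C string, not a UNICODE_STRING */
} Module;

/* Same idea as CONTAINING_RECORD: step back from a member to its parent. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Append `node` to the circular list whose sentinel is `head`. */
void list_append(Link *head, Link *node) {
    node->prev = head->prev;
    node->next = head;
    head->prev->next = node;
    head->prev = node;
}

/* Walk the list and return the base of the module called `name`, or NULL. */
void *find_module_base(Link *head, const char *name) {
    for (Link *cur = head->next; cur != head; cur = cur->next) {
        /* `cur` points at the embedded links, not at the Module itself */
        Module *mod = CONTAINER_OF(cur, Module, links);
        if (strcmp(mod->name, name) == 0) {
            return mod->base;
        }
    }
    return NULL;
}
```

The sentinel head is never converted into a Module, which mirrors why the real loop starts at the head's Flink and stops once it comes back around to the head.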
The attentive reader may have realized that in the incorrect version of this program, the FullDllName member was actually being read from the location of the Reserved4 member. If we subtract the offset (0x10) of the LIST_ENTRY from the address that was being treated as the LDR_DATA_TABLE_ENTRY, we see that the values of the structure make a lot more sense.
In the official Microsoft documentation, there are many members marked as Reserved. Thankfully, some fantastic people have created online repositories documenting these “hidden” structures and other things that Microsoft has chosen to keep obscure. Two great examples are ntdoc and the Vergilius Project. If we look up the LDR_DATA_TABLE_ENTRY structure, we can conclude that the Reserved4 member must be BaseDllName of type UNICODE_STRING. If we apply this knowledge in WinDbg, we can see that FullDllName was in fact being replaced by the value of Reserved4, which, as we’ve now discovered, is actually BaseDllName.
Figure: BaseDllName was being printed before, not the actual FullDllName.
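The 0x10 shift can also be checked with plain offset arithmetic. The sketch below is my own mock of the x64 layout using fixed-width fields, so the offsets are the same on any host. MockUnicodeString, MockLdrEntry, and misread_offset are hypothetical names, and the field offsets are the ones discussed above (with BaseDllName sitting where winternl.h shows Reserved4).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of UNICODE_STRING on x64 (16 bytes). */
typedef struct MockUnicodeString {
    uint16_t Length;
    uint16_t MaximumLength;
    uint32_t Padding;   /* alignment gap before the 8-byte Buffer pointer */
    uint64_t Buffer;
} MockUnicodeString;

/* Hypothetical mirror of the documented part of LDR_DATA_TABLE_ENTRY on x64.
   This is NOT the real winternl.h definition, just the same offsets. */
typedef struct MockLdrEntry {
    uint64_t          Reserved1[2];           /* 0x00 */
    uint64_t          InMemoryOrderLinks[2];  /* 0x10: Flink, Blink */
    uint64_t          Reserved2[2];           /* 0x20 */
    uint64_t          DllBase;                /* 0x30 */
    uint64_t          Reserved3[2];           /* 0x38 */
    MockUnicodeString FullDllName;            /* 0x48 */
    MockUnicodeString BaseDllName;            /* 0x58 (Reserved4 in winternl.h) */
} MockLdrEntry;

/* Offset that is really read when a pointer to InMemoryOrderLinks is
   mistaken for a pointer to the start of the entry. */
size_t misread_offset(size_t intended_offset) {
    return offsetof(MockLdrEntry, InMemoryOrderLinks) + intended_offset;
}
```

With this model, asking for FullDllName (0x48) lands on 0x58, which is BaseDllName, and asking for DllBase (0x30) lands on 0x40, somewhere inside Reserved3. That would explain both the odd DllBase value and the short module names that the first version printed.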
Finding a Function in the Module #
Once we have the base address of kernel32.dll, we can proceed to resolve the addresses of the functions of interest. Specifically, we’re interested in LoadLibraryA or GetModuleHandle, as well as GetProcAddress. To locate a function in a DLL using the module’s base address, we can parse its export table as follows:
- Verify the DOS header signature at the module base, follow its e_lfanew field to the IMAGE_NT_HEADERS, and verify the signature there as well.
- The IMAGE_OPTIONAL_HEADER contains an IMAGE_DATA_DIRECTORY array called DataDirectory. The RVA and size of the export table reside at index IMAGE_DIRECTORY_ENTRY_EXPORT (defined as 0).
- The IMAGE_EXPORT_DIRECTORY contains several members of interest: NumberOfFunctions, NumberOfNames, AddressOfFunctions, AddressOfNames, and AddressOfNameOrdinals. AddressOfNames and AddressOfNameOrdinals are parallel arrays with NumberOfNames entries each, so by looping through them we can perform a simple string comparison of each name against our target function’s name. Once found, we use the corresponding ordinal as an index into AddressOfFunctions to retrieve the function’s RVA, and add the module base to get its address.
A proof of concept that lists the exported functions of a module is presented below, run on the base address of kernel32.
#include <Windows.h>
#include <stdio.h>
#include <winnt.h>
#include <winternl.h>
#define info(msg, ...) printf("[i] " msg "\n", ##__VA_ARGS__)
// fail() is not shown in the original snippet; it is assumed here to print the message and bail out
#define fail(msg, ...) do { printf("[-] " msg "\n", ##__VA_ARGS__); return 1; } while (0)
int main() {
    // pTargetModuleBase (a PBYTE) already found...
    if (pTargetModuleBase == NULL) {
        fail("Failed to find target module");
    }
    info("Target module found at base address: %p", pTargetModuleBase);
    PIMAGE_DOS_HEADER pDosHeader = (PIMAGE_DOS_HEADER)pTargetModuleBase;
    if (pDosHeader->e_magic != IMAGE_DOS_SIGNATURE) { // check the DOS header signature
        fail("Invalid DOS header signature");
    }
    PIMAGE_NT_HEADERS pNtHeaders = (PIMAGE_NT_HEADERS)(pTargetModuleBase + pDosHeader->e_lfanew);
    if (pNtHeaders->Signature != IMAGE_NT_SIGNATURE) { // check the NT header signature
        fail("Invalid NT header signature");
    }
    // an RVA of 0 means the module has no export directory (base + RVA itself is never NULL)
    DWORD exportDirRva = pNtHeaders->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT].VirtualAddress;
    if (exportDirRva == 0) {
        fail("Failed to get export directory");
    }
    PIMAGE_EXPORT_DIRECTORY pExportDir = (PIMAGE_EXPORT_DIRECTORY)(pTargetModuleBase + exportDirRva);
    DWORD* pFunctionAddresses = (DWORD*)(pTargetModuleBase + pExportDir->AddressOfFunctions);
    DWORD* pFunctionNames = (DWORD*)(pTargetModuleBase + pExportDir->AddressOfNames);
    WORD* pFunctionOrdinals = (WORD*)(pTargetModuleBase + pExportDir->AddressOfNameOrdinals);
    info("%lu named exports in KERNEL32.DLL:", pExportDir->NumberOfNames);
    // the name and ordinal arrays hold NumberOfNames entries, not NumberOfFunctions
    for (DWORD i = 0; i < pExportDir->NumberOfNames; i++) {
        if (pFunctionNames[i] == 0) {
            continue;
        }
        PCHAR pFunctionName = (PCHAR)(pTargetModuleBase + pFunctionNames[i]);
        // the ordinal paired with this name is the index into AddressOfFunctions
        DWORD functionRva = pFunctionAddresses[pFunctionOrdinals[i]];
        printf(" %s at %p\n", pFunctionName, (PVOID)(pTargetModuleBase + functionRva));
    }
    return 0;
}
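The PoC above lists every export, but resolving a single name walks the same three arrays. Here is a rough, portable sketch of that lookup, run against a tiny synthetic export table instead of a real module. MockExportDirectory mirrors the field order of IMAGE_EXPORT_DIRECTORY, while find_export_rva, build_demo_exports, and the "Alpha"/"Beta" exports are hypothetical names invented for this example. It loops over NumberOfNames, since that is how many entries the name and ordinal arrays hold.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical mirror of IMAGE_EXPORT_DIRECTORY (counts and RVAs only). */
typedef struct MockExportDirectory {
    uint32_t Characteristics;
    uint32_t TimeDateStamp;
    uint16_t MajorVersion;
    uint16_t MinorVersion;
    uint32_t Name;
    uint32_t Base;
    uint32_t NumberOfFunctions;
    uint32_t NumberOfNames;
    uint32_t AddressOfFunctions;
    uint32_t AddressOfNames;
    uint32_t AddressOfNameOrdinals;
} MockExportDirectory;

/* Return the RVA of the export called `wanted`, or 0 if no such name exists. */
uint32_t find_export_rva(const uint8_t *base, uint32_t export_dir_rva, const char *wanted) {
    MockExportDirectory dir;
    memcpy(&dir, base + export_dir_rva, sizeof dir);

    for (uint32_t i = 0; i < dir.NumberOfNames; i++) {
        uint32_t name_rva;
        uint16_t ordinal_index;
        uint32_t function_rva;

        memcpy(&name_rva, base + dir.AddressOfNames + 4u * i, sizeof name_rva);
        if (strcmp((const char *)(base + name_rva), wanted) != 0) {
            continue;
        }
        /* names[i] pairs with ordinals[i]; the ordinal indexes the function array */
        memcpy(&ordinal_index, base + dir.AddressOfNameOrdinals + 2u * i, sizeof ordinal_index);
        memcpy(&function_rva, base + dir.AddressOfFunctions + 4u * ordinal_index, sizeof function_rva);
        return function_rva;
    }
    return 0;
}

/* Fill `image` (at least 0x100 bytes) with a synthetic table: three functions,
   two of them named, with ordinals deliberately out of order.
   Returns the RVA of the export directory. */
uint32_t build_demo_exports(uint8_t *image) {
    const uint32_t dir_rva      = 0x10;
    const uint32_t functions[3] = { 0x1000, 0x2000, 0x3000 };
    const uint32_t names[2]     = { 0x80, 0x88 };  /* RVAs of "Alpha" and "Beta" */
    const uint16_t ordinals[2]  = { 2, 0 };        /* Alpha -> functions[2], Beta -> functions[0] */
    MockExportDirectory dir = {0};

    dir.NumberOfFunctions     = 3;
    dir.NumberOfNames         = 2;
    dir.AddressOfFunctions    = 0x40;
    dir.AddressOfNames        = 0x50;
    dir.AddressOfNameOrdinals = 0x60;

    memset(image, 0, 0x100);
    memcpy(image + dir_rva, &dir, sizeof dir);
    memcpy(image + 0x40, functions, sizeof functions);
    memcpy(image + 0x50, names, sizeof names);
    memcpy(image + 0x60, ordinals, sizeof ordinals);
    memcpy(image + 0x80, "Alpha", 6);
    memcpy(image + 0x88, "Beta", 5);
    return dir_rva;
}
```

Because the ordinals are shuffled in the demo table, looking up "Alpha" only returns the right RVA if the ordinal is used as the index rather than the loop counter. The sketch ignores forwarded exports and lookups by ordinal only.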
Reusing the IAT #
This method of resolving function addresses is relatively new. As its name suggests, the idea is to reuse the functions already present in the target process’s IAT. This means we won’t have to “walk the PEB” to find the target function, which in turn means the shellcode looks less suspicious to a security solution.
Implementation Details #
As described in the post covering this technique, we can call a function the way a regular program would, by reusing the program’s IAT. Let’s see how a regular program would call the function that we’re interested in calling (MessageBoxA). I used the straightforward program shown below for this demonstration.
#include <windows.h>
int main() {
MessageBoxA(NULL, "Hello, this is a simple message box!", "Simple Message Box", MB_OK);
return 0;
}
If we open it up in a debugger and place a breakpoint on main() and step until we reach the call to MessageBoxA, we can see the following:
Notice the addresses that I’ve marked. According to the debugger, the expression ds:[<&MessageBoxA>] is actually ds:[00007FF7B3B82080 <simplemessageboxcaller.&MessageBoxA>], and its value is <user32.MessageBoxA>. This indicates that this expression is referring to address 0x00007FF7B3B82080 of the program, in which an address pointing to the MessageBoxA function in the user32.dll module resides. Let’s take a look for ourselves.
When browsing for this target address, I found that it is contained within the .rdata section. Apparently, these entries that are displayed in the figure above represent the IAT. And note that since the program is now loaded in memory, the destination addresses have been resolved! Take a look at the value stored at address 0x00007FF7B3B82080 shown below:
Figure: the value stored at that address points to the MessageBoxA function within the user32 module memory space.
Now, we can analyze the call instruction that’s set to be executed. I’ll post the screenshot, which displays it again, here for convenience.
Looking at the opcodes and online resources, we can gather the following:
- The first byte (0xFF) is the opcode, and the second byte (0x15) is known as the ModR/M byte, which indicates the following:
    - mod (bits 7 - 6): 00 -> indirect addressing
    - reg (bits 5 - 3): 010 -> /2, which means CALL (near, absolute indirect)
    - r/m (bits 2 - 0): 101 -> displacement-only addressing, with the displacement given in the next 4 bytes (in 64-bit mode, it is relative to RIP)
- The last 4 bytes, therefore, hold the displacement. We can infer that the instruction calls whatever address is stored 0x1063 bytes past the end of this instruction. The location of that stored address is calculated below:
$$ \texttt{00007FF7B3B81017} \; (\text{current address}) + \texttt{1063} \; (\text{displacement}) + \texttt{6} \; (\text{size of call instruction}) = \texttt{00007FF7B3B82080} $$
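The same decoding can be written down as a few lines of C. This is just a sketch to double-check the arithmetic: ModRM, decode_modrm, read_disp32, and rip_relative_target are hypothetical helper names, and the instruction bytes used in the example (FF 15 63 10 00 00) are reconstructed from the opcode, ModR/M byte, and displacement discussed above rather than copied from the debugger.

```c
#include <stdint.h>

/* The three fields packed into a ModR/M byte. */
typedef struct ModRM {
    uint8_t mod; /* bits 7-6 */
    uint8_t reg; /* bits 5-3 */
    uint8_t rm;  /* bits 2-0 */
} ModRM;

ModRM decode_modrm(uint8_t byte) {
    ModRM out;
    out.mod = (uint8_t)(byte >> 6);
    out.reg = (uint8_t)((byte >> 3) & 0x7);
    out.rm  = (uint8_t)(byte & 0x7);
    return out;
}

/* Read a little-endian, signed 32-bit displacement. */
int32_t read_disp32(const uint8_t *p) {
    uint32_t raw = (uint32_t)p[0]
                 | ((uint32_t)p[1] << 8)
                 | ((uint32_t)p[2] << 16)
                 | ((uint32_t)p[3] << 24);
    return (int32_t)raw; /* two's complement reinterpretation */
}

/* For `FF 15 disp32`, the memory slot that holds the call target is
   (address of the NEXT instruction) + disp32. The instruction is 6 bytes. */
uint64_t rip_relative_target(uint64_t insn_addr, const uint8_t insn[6]) {
    int64_t disp = read_disp32(insn + 2);
    return insn_addr + 6 + (uint64_t)disp;
}
```

Feeding it the address 0x00007FF7B3B81017 and the bytes above gives 0x00007FF7B3B82080, matching the calculation. Note that this is the address of the IAT slot, and the actual call target is the pointer stored in that slot.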
To finalize this section, let’s take a look at this executable in PE Bear.
Figure: 2080 coincides with the value (offset) that was calculated above.
Points to Consider #
Obviously, due to the details described above, this method is meant to be used on an existing binary, since we need an IAT. Recall the assembly program written in the first part of this series; the call instruction was:
call r10 ; translates to `41 FF D2`
We would replace this with:
call qword ptr [rip + calculated_offset] ; translates to FF 15 <4 bytes indicating displacement>
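Going the other way, the displacement for the replacement instruction can be computed from the address where the call will live and the address of the IAT slot. The helper below is a minimal sketch under those assumptions; encode_call_rip_indirect is a hypothetical name, and the caller is assumed to already know both addresses.

```c
#include <stdint.h>

/* Emit `call qword ptr [rip + disp32]` (FF 15 <disp32>) into out[6].
   insn_addr: address where the first byte (FF) will be placed.
   slot_addr: address of the IAT slot holding the function pointer.
   Returns 0 on success, -1 if the slot is out of disp32 range. */
int encode_call_rip_indirect(uint64_t insn_addr, uint64_t slot_addr, uint8_t out[6]) {
    /* disp32 is measured from the end of the 6-byte instruction */
    int64_t disp = (int64_t)(slot_addr - (insn_addr + 6));
    if (disp < INT32_MIN || disp > INT32_MAX) {
        return -1;
    }
    uint32_t raw = (uint32_t)disp; /* low 32 bits of the two's complement value */
    out[0] = 0xFF;                 /* opcode */
    out[1] = 0x15;                 /* ModR/M: mod=00, reg=/2, r/m=101 */
    out[2] = (uint8_t)(raw & 0xFF);
    out[3] = (uint8_t)((raw >> 8) & 0xFF);
    out[4] = (uint8_t)((raw >> 16) & 0xFF);
    out[5] = (uint8_t)((raw >> 24) & 0xFF);
    return 0;
}
```

One thing to keep in mind is that call r10 is 3 bytes long while this form is 6 bytes, so the swap changes the size of the code and shifts everything after it.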
Implementation #
Walking the PEB in Assembly #
Performing a PEB walk in assembly is actually easier than implementing the C version of it, since we don’t have to worry about types and typecasting. Given that the offsets of the values we’re interested in are public knowledge, all that needs to be done is follow a few pointers at known offsets to acquire the address of the target module (i.e. kernel32.dll).
The same search logic can be implemented in assembly, like so:
GetKernel32Address PROC
    mov rax, gs:[60h]           ; retrieve PEB pointer in x64
    mov rax, [rax + 18h]        ; Ldr member is at offset 0x18
    mov rax, [rax + 20h]        ; InMemoryOrderModuleList.Flink at offset 0x20 -> first entry (this program)
    mov rax, [rax]              ; follow Flink -> second entry (ntdll.dll)
    mov rax, [rax]              ; follow Flink -> third entry (kernel32.dll)
    mov rax, [rax - 10h + 30h]  ; step back to the start of the entry (-0x10), then read DllBase (+0x30)
    ret                         ; return the base address of kernel32.dll in rax
GetKernel32Address ENDP
Finding a Function in Assembly #
Now that we have the base address of a module in memory, we can iterate over its exports in search of the function that we’re interested in. An implementation of the logic we wrote before is presented below:
GetLoadLibraryAddress PROC
    ; note: rbx and rsi are non-volatile in the x64 calling convention, so a caller that relies on them must save them first
    mov rbx, rcx                            ; save a copy of kernel32 address in rbx so we can manipulate rcx
    xor rdx, rdx
    cmp word ptr [rcx], 5A4Dh               ; compare USHORT at rcx with 5A4D ('MZ', IMAGE_DOS_SIGNATURE)
    jnz failed
    mov eax, dword ptr [rcx + 3Ch]          ; load e_lfanew, the dword at [rcx + 3C]
    lea rcx, [rcx + rax]                    ; rcx is now pointing to the IMAGE_NT_HEADERS
    cmp dword ptr [rcx], 00004550h          ; compare dword at rcx with 00004550 (IMAGE_NT_SIGNATURE)
    jnz failed
    mov edx, dword ptr [rcx + 18h + 70h]    ; OptionalHeader (+0x18) -> DataDirectory[0].VirtualAddress (+0x70), the RVA of the export directory
    add rdx, rbx                            ; add the base address of kernel32.dll; rdx now points to the IMAGE_EXPORT_DIRECTORY
    xor r11, r11
    mov r11d, dword ptr [rdx + 18h]         ; NumberOfNames is at offset 0x18 (the name array is what we iterate over)
    mov r8d, dword ptr [rdx + 1Ch]          ; AddressOfFunctions is at offset 0x1C
    add r8, rbx                             ; add the base address of kernel32.dll
    mov r9d, dword ptr [rdx + 20h]          ; AddressOfNames is at offset 0x20
    add r9, rbx                             ; add the base address of kernel32.dll
    mov r10d, dword ptr [rdx + 24h]         ; AddressOfNameOrdinals is at offset 0x24
    add r10, rbx                            ; add the base address of kernel32.dll
    xor rcx, rcx                            ; clear this register so we can use it as a counter
find_function:
    mov esi, dword ptr [r9 + rcx * 4]       ; rsi now holds the RVA of the function name
    add rsi, rbx                            ; add the base address of kernel32.dll to rsi
    cmp dword ptr [rsi], 'daoL'             ; compare the first 4 bytes of the function name with 'Load' in LE
    jne next_function
    cmp dword ptr [rsi + 4], 'rbiL'         ; compare the next 4 bytes with 'Libr' in LE
    jne next_function
    cmp dword ptr [rsi + 8], 'Ayra'         ; compare the next 4 bytes with 'aryA' in LE
    jne next_function                       ; needed here, since the next cmp overwrites the flags
    cmp byte ptr [rsi + 12], 00h            ; check the null terminator so longer names with the same prefix don't match
    jne next_function
    movzx rax, word ptr [r10 + rcx * 2]     ; if we found the function name, load the corresponding ordinal into rax
    mov eax, dword ptr [r8 + rax * 4]       ; use the ordinal as index in AddressOfFunctions to get the function RVA
    test eax, eax                           ; check if the RVA is 0 (after adding the base it could never be NULL)
    jz failed                               ; if it is, jump to failed
    add rax, rbx                            ; add the base address of kernel32.dll to rax
    ret
next_function:
    inc rcx                                 ; increment the index
    cmp rcx, r11                            ; check if we have reached the end of the function names
    jae failed                              ; if we have, jump to failed
    jmp find_function                       ; otherwise, continue searching for the function name
failed:
    xor rax, rax
    ret
GetLoadLibraryAddress ENDP
The code above might be a bit difficult to follow, so I’ve also added its CFG below:
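The chunked name comparison in find_function can also be sketched in C, which makes it easier to see which names it accepts. The helpers load_le32 and is_load_library_a are hypothetical names of mine. The three constants are the little-endian dword values of "Load", "Libr", and "aryA", and the caller is assumed to pass a buffer with at least 13 readable bytes, just like the assembly assumes when it reads from the mapped image.

```c
#include <stdint.h>

/* Read 4 bytes as a little-endian dword, independent of the host's byte order. */
uint32_t load_le32(const char *p) {
    const unsigned char *b = (const unsigned char *)p;
    return (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* Returns 1 only for the exact string "LoadLibraryA".
   Precondition: `name` has at least 13 readable bytes. */
int is_load_library_a(const char *name) {
    if (load_le32(name) != 0x64616F4Cu) {      /* "Load" */
        return 0;
    }
    if (load_le32(name + 4) != 0x7262694Cu) {  /* "Libr" */
        return 0;
    }
    if (load_le32(name + 8) != 0x41797261u) {  /* "aryA" */
        return 0;
    }
    return name[12] == '\0';                   /* reject longer names */
}
```

Every chunk needs its own early exit. If the result of the third comparison were ignored, "LoadLibraryW" would pass as well, since it is the same length and shares the first eight bytes.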
Putting it Together #
Now that we can find LoadLibraryA, we can use it to load a module! And what better example than to load the module that exports MessageBoxA: user32.dll.
Entrypoint PROC
    push rbp
    mov rbp, rsp
    sub rsp, 16+32+8+8                      ; space for the 'user32.dll' string (16) + shadow space (32) + 16 bytes of padding, which keeps rsp 16-byte aligned
    call GetKernel32Address                 ; this procedure doesn't manipulate the stack
    mov rcx, rax
    call GetLoadLibraryAddress              ; this procedure doesn't manipulate the stack
    mov r10, rax
    mov dword ptr [rbp - 10h], 72657375h    ; 'user' LE
    mov dword ptr [rbp - 0Ch], 642e3233h    ; '32.d' LE
    mov dword ptr [rbp - 08h], 00006c6ch    ; 'll' LE, followed by two null bytes that terminate the string
    lea rcx, [rbp - 10h]                    ; rcx now points to the stack where 'user32.dll' is stored (in "our" first 16 bytes space)
    call r10
    add rsp, 16+32+8+8                      ; clean up the stack
    mov rsp, rbp
    pop rbp
    ret
Entrypoint ENDP
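If the three immediates look a bit magical, the following sketch replays the dword stores in C and checks that they really spell out the string. store_le32 and build_user32_string are hypothetical helpers, and the 0xCC fill simply stands in for whatever leftover bytes the stack slot happens to contain.

```c
#include <stdint.h>
#include <string.h>

/* Store a dword the way x64 does: least significant byte first. */
void store_le32(uint8_t *dst, uint32_t value) {
    dst[0] = (uint8_t)(value & 0xFF);
    dst[1] = (uint8_t)((value >> 8) & 0xFF);
    dst[2] = (uint8_t)((value >> 16) & 0xFF);
    dst[3] = (uint8_t)((value >> 24) & 0xFF);
}

/* Mirror the three `mov dword ptr [rbp - ...]` stores into a 16-byte slot,
   where slot[0] corresponds to [rbp - 10h]. */
void build_user32_string(uint8_t slot[16]) {
    memset(slot, 0xCC, 16);             /* pretend the slot holds stale data */
    store_le32(slot + 0, 0x72657375u);  /* 'user' -> [rbp - 10h] */
    store_le32(slot + 4, 0x642e3233u);  /* '32.d' -> [rbp - 0Ch] */
    store_le32(slot + 8, 0x00006c6cu);  /* 'll' + two null bytes -> [rbp - 08h] */
}
```

The last four bytes of the slot are never written, but that doesn't matter, since the third store already supplies the null terminator right after "ll".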
There seems to be a bit of a snag #
After writing the GetLoadLibraryAddress code, I thought it was going to be easy enough to call LoadLibraryA to load the user32.dll module by pushing the string onto the stack. This did not turn out to be the case. I miscalculated the stack space, and it turns out that the user32.dll string was being overwritten during the execution of LoadLibraryA. Let’s take a look. I originally had something like this:
Entrypoint PROC
    push rbp
    mov rbp, rsp
    sub rsp, 32+8+8                 ; allocate shadow space
    call GetKernel32Address
    mov rcx, rax                    ; save kernel32 address in rcx to call GetLoadLibraryAddress
    call GetLoadLibraryAddress
    mov r10, rax
    mov rax, 0000000000006c6ch      ; 'll'
    push rax
    mov rax, 642e323372657375h      ; 'user32.d'
    push rax
    mov rcx, rsp                    ; rcx now points to the stack where 'user32.dll' is stored
    call r10                        ; call LoadLibraryA with the address of 'user32.dll' on the stack
    add rsp, 32+8+8                 ; deallocate shadow space
    mov rsp, rbp
    pop rbp
    ret
Entrypoint ENDP
First, we check the stack right before executing call r10.
In retrospect, this was such a simple mistake, but it took a couple of hours for me to find… Let’s step through the code and see what happens with the stack string.
And then finally, we see what’s happening:
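This led to LoadLibraryA essentially being fed garbage, resulting in the target module not being loaded. The two push instructions left the string in the 32 bytes directly above rsp at the time of the call, which is exactly the shadow space that the callee is free to overwrite.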
While trying to solve this issue, I came up with a couple of ideas:
- We could push the stack string “down” by pushing some garbage onto the stack “above” the stack string. Of course, we would have to make sure to load the correct address into rcx, since the top of the stack would contain the “garbage”. It would look something like this:
    mov rax, 0000000000006c6ch      ; 'll'
    push rax
    mov rax, 642e323372657375h      ; 'user32.d'
    push rax
    xor rax, rax                    ; clear rax
    push rax                        ; nasty fix
    push rax                        ; nasty fix
    lea rcx, [rsp+10h]              ; rcx now points to the stack where 'user32.dll' is stored
    call r10                        ; call LoadLibraryA with 'user32.dll' as the argument
- We could recalculate the amount of stack space we’re reserving and adjust accordingly. This is where I thought of writing the string into the stack frame relative to rbp instead of pushing it, which I had completely forgotten about, so that modification is included in the snippet below:
Entrypoint PROC
    push rbp
    mov rbp, rsp
    sub rsp, 16+32+8+8                      ; space for the 'user32.dll' string (16) + shadow space (32) + 16 bytes of padding, which keeps rsp 16-byte aligned
    ; The 32 bytes of shadow space (directly above rsp at the call) can be overwritten by the callee if it chooses to do so.
    ; The string therefore lives at the top of the frame ([rbp - 10h]), with 16 bytes of padding
    ; between it and the shadow space so everything stays nice and clean.
    call GetKernel32Address                 ; this procedure doesn't manipulate the stack
    mov rcx, rax
    call GetLoadLibraryAddress              ; this procedure doesn't manipulate the stack
    mov r10, rax
    mov dword ptr [rbp - 10h], 72657375h    ; 'user' LE
    mov dword ptr [rbp - 0Ch], 642e3233h    ; '32.d' LE
    mov dword ptr [rbp - 08h], 00006c6ch    ; 'll' LE, followed by two null bytes that terminate the string
    lea rcx, [rbp - 10h]                    ; rcx now points to the stack where 'user32.dll' is stored (in "our" first 16 bytes space)
    call r10
    add rsp, 16+32+8+8                      ; clean up the stack
    mov rsp, rbp
    pop rbp
    ret
Entrypoint ENDP
I ended up going with option 2 since it resulted in cleaner code for me.
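To sanity-check the layouts, I find it helpful to model where the string sits relative to rsp at the moment of the call instruction. The sketch below assumes the usual Windows x64 convention, where the 32 bytes starting at rsp at that moment are the callee's shadow space. overlaps_shadow_space is a hypothetical helper, and the offsets in the note that follows are read off the snippets above.

```c
#include <stdint.h>

#define SHADOW_SPACE_SIZE 32

/* Offsets are relative to rsp at the moment `call` executes.
   The callee's shadow space then covers [0, 32).
   Returns 1 if the string's bytes fall anywhere inside that range. */
int overlaps_shadow_space(int64_t str_offset_from_rsp, int64_t str_len) {
    int64_t str_end = str_offset_from_rsp + str_len;
    return str_offset_from_rsp < SHADOW_SPACE_SIZE && str_end > 0;
}
```

In the original attempt the 16-byte string sits at offset 0, so it overlaps. With option 2, rsp is rbp - 64 and the string is at rbp - 16, which puts it at offset 48, clear of the shadow space. By this model, option 1 only moves the string to offset 0x10, which is still inside the upper half of the shadow space, so it presumably works only as long as LoadLibraryA doesn't spill anything there. Four pushes with lea rcx, [rsp+20h] would move it out entirely.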
Demo Time #
Now that that’s been cleared up, we can proceed with the long-awaited demo!
It doesn’t seem like much now, but we’ll use this knowledge in the next and penultimate part of this series: executing MessageBoxA!