A Detailed Description of CVE-2016-0176 and Its Exploitation
Essentials of a Successful Pwn of Microsoft Edge
A successful Pwn of Microsoft Edge consists of two essential parts: browser RCE (Remote Code Execution) and browser sandbox bypass. Browser RCE is typically achieved by exploiting a JavaScript vulnerability, while the sandbox bypass can be achieved in different ways, such as a logical sandbox escape or EoP (Escalation of Privilege) through a kernel vulnerability.
The sandbox of Microsoft Edge is built upon the access check mechanism. In the Windows operating system, resources are shared system-wide; for example, a file or device can be shared across different processes. Some resources contain sensitive information, and others are critical to the functioning of the whole system, so corrupting them will crash the system. For those reasons, there must be strict checks when a process wants to access a specific resource; this is called the access check. When a resource is opened, the token of the subject process is checked against the security descriptor of the object resource. The access check consists of several elementary checks in different dimensions, such as ownership and group membership checks, privilege checks, integrity level and trust level checks, capability checks, etc. The previous generation of sandbox is based on the integrity level check: the sandboxed application runs at low integrity level, so it cannot access resources protected by medium or higher integrity level. Microsoft Edge adopts a new generation of sandbox based on AppContainer, where additional capability checks are conducted when accessing resources, besides the basic integrity level check. For more details about the access check mechanism, refer to my talk at ZeroNights 2015: Did You Get Your Token?
The most common approach to a sandbox bypass is EoP through kernel vulnerabilities, using DKOM (Direct Kernel Object Manipulation) on token objects.
CVE-2016-0176
This vulnerability is a heap overflow in the dxgkrnl.sys driver.
The abused data structure is shown below:
typedef struct _D3DKMT_PRESENTHISTORYTOKEN
I will use “history token” as an alias for this structure. The vulnerability has some prerequisites within this structure:
- The Model member should be set to D3DKMT_PM_REDIRECTED_FLIP;
- The TokenSize member should be set to 0x438.
You may have already guessed that the vulnerability is in the Token.Flip member, whose type is shown below:
typedef struct _D3DKMT_FLIPMODEL_PRESENTHISTORYTOKEN
Diving further into the last member, DirtyRegions:
typedef struct tagRECT
Now we reach the primitive level: there is a DWORD member NumRects and an array of RECT structures, Rects. This array has a fixed size of 16 elements, and each element is 0x10 bytes, so the size of Rects is 0x100 bytes.
The graph above shows the relationship and layout of the abused data structures. The left column is the data structure that we prepare in user mode and pass to the kernel-mode driver by calling the Win32 API D3DKMTPresent. The middle column is the data structure that the dxgkrnl.sys driver receives and maintains; it is copied from the user-mode buffer. The right column is the embedded union member Token.Flip. An important feature of Token.Flip is that it is the largest member of the union. Since the size of a union is determined by its largest member, the content of Token.Flip stretches to the end of the history token structure. This layout simplifies the exploitation to a large extent.
With knowledge of the abused data structures, the vulnerability is easy to understand. Below is the disassembly snippet that causes the overflow:
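The sizes and offsets described above can be cross-checked with a few compile-time assertions. This is only a minimal sketch, not the real header: `Rect` and `DirtyRegions` are hypothetical stand-ins that assume RECT is four 32-bit integers, and the 0x334 offset of NumRects inside the history token is taken as given (it is the offset the driver itself uses) rather than derived from the full structure definition.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for RECT: four 32-bit coordinates, 0x10 bytes.
struct Rect {
    std::int32_t left, top, right, bottom;
};

// Hypothetical stand-in for the DirtyRegions member: a count plus 16 rectangles.
struct DirtyRegions {
    std::uint32_t num_rects;  // NumRects
    Rect rects[16];           // Rects
};

// Assumed offset of NumRects inside the history token (the value the driver uses).
constexpr std::size_t kNumRectsOffset = 0x334;
// TokenSize required by the prerequisites listed earlier.
constexpr std::size_t kTokenSize = 0x438;

// Offset of Rects inside the history token.
constexpr std::size_t rects_offset() {
    return kNumRectsOffset + offsetof(DirtyRegions, rects);
}

// Offset one past the last legitimate RECT.
constexpr std::size_t rects_end() {
    return rects_offset() + sizeof(DirtyRegions::rects);
}

static_assert(sizeof(Rect) == 0x10, "each RECT is 0x10 bytes");
static_assert(sizeof(DirtyRegions::rects) == 0x100, "16 RECTs take 0x100 bytes");
static_assert(rects_offset() == 0x338, "Rects starts right after NumRects");
static_assert(rects_end() == kTokenSize, "Rects ends exactly where the token ends");
```

If the assumptions hold, the last assertion is the interesting one: there is nothing after Rects inside the history token, so any RECT beyond the 16th lies outside the structure.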
loc_1C009832A: DXGCONTEXT::SubmitPresentHistoryToken(......) + 0x67B
cmp dword ptr[r15 + 334h], 10h // NumRects
jbe short loc_1C009834B; Jump if Below or Equal(CF = 1 | ZF = 1)
call cs : __imp_WdLogNewEntry5_WdAssertion
mov rcx, rax
mov qword ptr[rax + 18h], 38h
call cs : __imp_WdLogEvent5_WdAssertion
loc_1C009834B: DXGCONTEXT::SubmitPresentHistoryToken (......) + 0x6B2
mov eax, [r15 + 334h]
shl eax, 4
add eax, 338h
jmp short loc_1C00983BD
loc_1C00983BD: DXGCONTEXT::SubmitPresentHistoryToken (......) + 0x6A5
lea r8d, [rax + 7]
mov rdx, r15; Src
mov eax, 0FFFFFFF8h;
mov rcx, rsi; Dst
and r8, rax; Size
call memmove
At the entry of this piece of code, the r15 register points to the history token buffer. The code first reads the DWORD at offset 0x334 and compares it with 0x10. We already know that this DWORD is the Token.Flip.NumRects field, so the code is checking whether this field exceeds the capacity of the embedded array Token.Flip.Rects. If you are auditing the code and see this check, you may feel frustrated and conclude that Microsoft already realized the potential problem and added a check. But if you read on, you will see that when the check fails, the code only logs the abnormal value to the watchdog driver through its assertion logic, and both branches of the comparison flow into the same code block at loc_1C009834B. You may then expect the watchdog driver to trigger a bug check in case of an overflow, but nothing actually happens. No matter what value is in the Token.Flip.NumRects field, execution reaches the block at loc_1C009834B, which does some arithmetic on Token.Flip.NumRects and then uses the result as the size of a memcpy operation.
I rewrote this piece of disassembly as C++ code below:
D3DKMT_PRESENTHISTORYTOKEN* hist_token_src = BufferPassedFromUserMode(…);
Things become clear in the C++ code: no matter what Token.Flip.NumRects is, the dxgkrnl.sys driver performs a memcpy operation. The source of this memcpy is the buffer we passed from user mode by calling the Win32 API D3DKMTPresent, and the destination is a buffer obtained from the kernel-mode pool through ExpInterlockedPopEntrySList. The size of the memcpy is the size of Token.Flip.NumRects array elements plus the size of the data before the array. If we pass a value larger than 0x10 in the Token.Flip.NumRects field of the user-mode buffer, an overflow in the kernel-mode paged pool occurs. We control the size of the overflow, as well as the first 0x38 bytes of its content. (0x38 more bytes can be set after the end of the history token; check the layout graph for more details.)
This vulnerability is interesting because Microsoft foresaw it but failed to prevent it. The lesson is not to fully trust a best practice, such as an assertion mechanism, unless you know exactly how it behaves.
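The size arithmetic from the disassembly (shl eax, 4; add eax, 338h; lea r8d, [rax + 7]; and with 0FFFFFFF8h) can be replayed in isolation. The sketch below is only a model of that calculation under the constants quoted in this post; the function names are mine and nothing here touches the driver.

```cpp
#include <cstdint>

constexpr std::uint32_t kBytesBeforeRects = 0x338;  // data before the Rects array
constexpr std::uint32_t kTokenSize = 0x438;         // legitimate history token size

// Copy size as computed by the driver: NumRects * 0x10 + 0x338, rounded up to 8,
// all in 32-bit arithmetic as in the disassembly.
constexpr std::uint32_t copy_size(std::uint32_t num_rects) {
    return ((num_rects << 4) + kBytesBeforeRects + 7u) & 0xFFFFFFF8u;
}

// Number of bytes copied past the end of a 0x438-byte history token.
constexpr std::uint32_t overflow_bytes(std::uint32_t num_rects) {
    return copy_size(num_rects) > kTokenSize ? copy_size(num_rects) - kTokenSize : 0u;
}

static_assert(copy_size(0x10) == 0x438, "16 RECTs fill the token exactly");
static_assert(overflow_bytes(0x10) == 0, "no overflow at the documented maximum");
static_assert(overflow_bytes(0x11) == 0x10, "each extra RECT adds 0x10 bytes");
```

With NumRects at 0x10 the copy is exactly one history token. Every additional rectangle extends the copy by 0x10 bytes, which is what makes the overflow size controllable.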
Exploitation
For the exploitation of a heap overflow, the layout of the destination buffer is very important. We already know that the destination buffer is allocated from the kernel-mode paged pool with the ExpInterlockedPopEntrySList function.
With a little debugging work, we can get some basic information about the destination buffer.
kd> u rip-6 L2
dxgkrnl!DXGCONTEXT::SubmitPresentHistoryToken+0x47b:
fffff801`cedb80fb call qword ptr [dxgkrnl!_imp_ExpInterlockedPopEntrySList (fffff801`ced77338)]
fffff801`cedb8101 test rax,rax
kd> !pool rax
Pool page ffffc0012764c5a0 region is Paged pool
*ffffc0012764b000 : large page allocation, tag is DxgK, size is 0x2290 bytes
Pooltag DxgK : Vista display driver support, Binary : dxgkrnl.sys
It is a large buffer of 0x2290 bytes. Since its size is larger than 1 page (a page is 0x1000 bytes), it is allocated as a large page allocation. In this case, 3 contiguous pages are consumed to serve the allocation request. The extra bytes after offset 0x2290 are reclaimed and linked back to the free list of the paged pool, and an extra separating pool entry tagged “Frag” is added between them. For more information about Windows kernel pool layout and large page allocation, please refer to Kernel Pool Exploitation on Windows 7. Below is how it looks at offset 0x2290:
kd> db ffffc0012764b000+0x2290 L40
ffffc001`2764d290 00 01 02 03 46 72 61 67-00 00 00 00 00 00 00 00 ....Frag........
ffffc001`2764d2a0 90 22 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ."..............
ffffc001`2764d2b0 02 01 01 00 46 72 65 65-0b 43 44 9e f1 81 a8 47 ....Free.CD....G
ffffc001`2764d2c0 01 01 04 03 4e 74 46 73-c0 32 42 3a 00 e0 ff ff ....NtFs.2B:....
DXGPRESENTHISTORYTOKENQUEUE::GrowPresentHistoryBuffer is responsible for allocating and managing history tokens as a singly-linked list. Each history token is 0x438 bytes in size, which extends to 0x450 bytes once the pool header and padding bytes are counted. The large page allocation is divided into 8 history tokens, linked in reverse order to form the singly-linked list. The dxgkrnl.sys driver intends to use this slist as a look-aside list for serving history token allocation requests.
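The numbers in the last few paragraphs fit together, which a short calculation makes explicit. This is just arithmetic on the values reported by the debugger and quoted above; the helper names are hypothetical, and the remaining 0x10 bytes are left uninterpreted because the post does not say what they hold.

```cpp
#include <cstdint>

constexpr std::uint32_t kPageSize  = 0x1000;  // one page
constexpr std::uint32_t kAllocSize = 0x2290;  // size reported by !pool
constexpr std::uint32_t kTokenSize = 0x438;   // one history token
constexpr std::uint32_t kSlotSize  = 0x450;   // token plus pool header and padding

// Pages consumed by the large page allocation.
constexpr std::uint32_t pages_needed() {
    return (kAllocSize + kPageSize - 1) / kPageSize;
}

// Bytes after offset 0x2290 that go back to the paged pool (Frag entry included).
constexpr std::uint32_t tail_bytes() {
    return pages_needed() * kPageSize - kAllocSize;
}

// Number of 0x450-byte slots that fit in the allocation, and what is left over.
constexpr std::uint32_t slot_count() { return kAllocSize / kSlotSize; }
constexpr std::uint32_t leftover_bytes() { return kAllocSize % kSlotSize; }

// Per-slot overhead on top of the 0x438-byte token.
constexpr std::uint32_t slot_overhead() { return kSlotSize - kTokenSize; }

static_assert(pages_needed() == 3, "0x2290 bytes need 3 pages");
static_assert(tail_bytes() == 0xD70, "remainder of the third page");
static_assert(slot_count() == 8, "8 history tokens per allocation");
static_assert(leftover_bytes() == 0x10, "0x10 bytes remain besides the 8 slots");
static_assert(slot_overhead() == 0x18, "header and padding per slot");
```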
This singly-linked list looks as below initially:
The singly-linked list looks as below after serving 1 history token allocation request:
The singly-linked list looks as below after serving 2 history token allocation requests:
With knowledge of the memory layout of the destination buffer of the heap overflow, we have 2 ideas for exploitation:
Idea 1. Overflow the buffer after offset 0x2290, which may be reused by some small allocations from the paged pool:
Idea 2. Overflow the adjacent history token’s header, which may let us abuse the singly-linked list:
The first exploitation idea has some limitations. Recall that we can control only 0x38 bytes of the overflowed content, which means we can control almost nothing but the padding bytes, the separating Frag pool entry and the following pool entry’s header.
The second exploitation idea seems promising. Although the Windows kernel now enforces strict validation of doubly-linked lists, there are no such checks for singly-linked lists, so we can play redirection tricks on the singly-linked list.
Let’s do some thought experiments, just like Einstein, for idea 2. In the above graphs, we see that after popping 2 history tokens out of the slist, we can overflow node B and overwrite the header of node A. Then we push node B back to the slist:
What happens after we push node A back to the slist? Will it redirect the next pointer to the overwritten QWORD?
Actually, this will never happen, because when we push node A back to the slist, the overwritten QWORD in node A’s header is restored to point to node B:
Then we try another possibility. First, get back to the state after popping 2 nodes out of the slist:
This time we first push node A back to the slist:
Then we overflow node B to overwrite node A’s header. Because node A has already been returned to the slist, its header will not be restored any more. Now the slist is broken and redirected to the overwritten QWORD:
After this series of thought experiments, idea 2 looks more promising for exploitation, so let’s get our hands dirty. It seems that we need to pop and push the slist in an irregular order to trigger the above redirection, with at least 2 consecutive pops. I made the following tries:
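The two orderings from the thought experiments can be replayed on a toy model. This is a simulation only, under loud assumptions: memory is a `std::map` from address to the node's Next QWORD, the base address and the overwritten value are made up, and the real SLIST header (depth, sequence number, interlocked operations) is ignored. It just shows why the order of "push A" and "overflow B" matters.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

constexpr std::uint64_t kBase = 0x10000;  // hypothetical address of the first slot
constexpr std::uint64_t kSlot = 0x450;    // slot size from the text
constexpr int kSlots = 8;
constexpr std::uint64_t kBogus = 0x4141414141414141ULL;  // made-up overwritten QWORD

// Toy slist: next_at plays the role of memory holding each node's Next field.
struct SList {
    std::uint64_t head = 0;  // 0 means empty
    std::map<std::uint64_t, std::uint64_t> next_at;

    void push(std::uint64_t node) {
        assert(node != 0);
        next_at[node] = head;  // pushing rewrites the node's Next field
        head = node;
    }
    std::uint64_t pop() {
        std::uint64_t node = head;
        if (node != 0) head = next_at[node];  // Next is trusted without validation
        return node;
    }
};

// Pushing the slots in address order links them in reverse order.
inline SList make_list() {
    SList list;
    for (int i = 0; i < kSlots; ++i) list.push(kBase + i * kSlot);
    return list;
}

// Overflowing the token at `node` clobbers the Next field of the following slot.
inline void overflow(SList& list, std::uint64_t node, std::uint64_t value) {
    list.next_at[node + kSlot] = value;
}

// Experiment 1: overflow B while A is still popped, then push B and A back.
// Returns the list head after A has been popped once more.
inline std::uint64_t overflow_then_push() {
    SList list = make_list();
    std::uint64_t a = list.pop();
    std::uint64_t b = list.pop();
    overflow(list, b, kBogus);
    list.push(b);
    list.push(a);      // the push repairs A's Next field
    list.pop();        // hands out A
    return list.head;  // node B, the list is intact
}

// Experiment 2: push A back first, then overflow B.
inline std::uint64_t push_then_overflow() {
    SList list = make_list();
    std::uint64_t a = list.pop();
    std::uint64_t b = list.pop();
    list.push(a);
    overflow(list, b, kBogus);  // A is already on the list, nothing repairs it
    list.pop();                 // hands out A
    return list.head;           // kBogus, the list is redirected
}
```

In this model `overflow_then_push()` ends with the head pointing at node B again, while `push_then_overflow()` ends with the head equal to the overwritten QWORD, so the next pop would hand out that value as if it were a history token buffer.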
1st Try: Loop calling D3DKMTPresent with overflowing fields set in the buffer.
This time I failed. It turned out that the loop just kept popping node A out and pushing node A back again, in which case I can only overflow as in idea 1. The reason is simple: those D3DKMTPresent API calls are served one after another, so we need to call the API concurrently.
2nd Try: Loop calling D3DKMTPresent with overflowing fields set in the buffer from multiple threads.
This time I failed again. After checking some of the disassembly, I believe the call stack of D3DKMTPresent is protected by a lock.
After those 2 tries, I started to doubt whether 2 consecutive pops were doable. I abandoned this doubt quickly after realizing that such a complex slist should not degenerate to a single element, so there should be other call stacks that pop from the slist. I wrote a WinDbg script for logging push and pop operations, and tried launching some graphics-intensive applications while running the 2nd try. Then a miracle happened: while I was playing the built-in Solitaire game, a double pop occurred. I debugged it and found that the BitBlt API triggers popping elements out of the slist from another call stack.
3rd and Last Try: Loop calling D3DKMTPresent with overflowing fields set in the buffer from multiple threads, while loop calling BitBlt from several other threads.
It succeeded in redirecting the next pointer in the slist, which led to an arbitrary write to kernel-mode memory. But it is still far from perfect: we need to find the tokens of the current process and the system process, and do token stealing. This process needs more than 1 read and write, but the trick above is not easily repeatable, especially under the strict rules of Pwn2Own 2016, which allow only 3 tries within 15 minutes. Some more tricks are needed.
Some More Tricks
Repeatable arbitrary read and write into kernel-mode memory
I used Win32k bitmap objects as intermediate targets. I first sprayed lots of bitmap objects into kernel-mode memory, and then guessed their addresses as targets of the redirected write. If I succeeded in hitting one of those bitmap objects, I modified the buffer pointer and size fields in it to make it point to another bitmap object. So 2 bitmap objects are in use: the first controls the address of the read or write, and the second does the actual read or write.
In fact, I sprayed bitmap objects across a 4GB range of memory. I first sprayed large 256MB bitmap objects to reserve contiguous and well-aligned pool memory, then replaced them with small 1MB bitmap objects whose addresses are aligned on a 0x100000 boundary, which makes guessing much easier.
An information leak is needed as a hint for guessing the addresses of the sprayed bitmap objects; this is done with the help of user32!gSharedInfo.
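The two-bitmap arrangement described above is easier to follow on a toy model. The sketch below is a pure simulation over a byte vector: `FakeKernel`, its two-QWORD "bitmap object" layout and the `set_bits`/`get_bits` helpers are all hypothetical stand-ins and do not reflect the real Win32k structures or any real API. It only illustrates how a single corrupted pointer turns one object into an address selector for the other.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Simulated kernel memory. A "bitmap object" is modelled as two QWORDs:
// [address of pixel buffer][size of pixel buffer]. This layout is made up.
struct FakeKernel {
    std::vector<std::uint8_t> mem;
    explicit FakeKernel(std::size_t size) : mem(size, 0) {}

    std::uint64_t load64(std::uint64_t addr) const {
        assert(addr + 8 <= mem.size());
        std::uint64_t v;
        std::memcpy(&v, &mem[addr], 8);
        return v;
    }
    void store64(std::uint64_t addr, std::uint64_t v) {
        assert(addr + 8 <= mem.size());
        std::memcpy(&mem[addr], &v, 8);
    }
    // Stand-ins for "write pixels" and "read pixels" on a bitmap object.
    void set_bits(std::uint64_t obj, const void* src, std::uint64_t n) {
        std::uint64_t buf = load64(obj), size = load64(obj + 8);
        assert(n <= size && buf + n <= mem.size());
        std::memcpy(&mem[buf], src, n);
    }
    void get_bits(std::uint64_t obj, void* dst, std::uint64_t n) const {
        std::uint64_t buf = load64(obj), size = load64(obj + 8);
        assert(n <= size && buf + n <= mem.size());
        std::memcpy(dst, &mem[buf], n);
    }
};

// First bitmap selects the address, second bitmap does the actual access.
struct RwPrimitive {
    FakeKernel& k;
    std::uint64_t selector, worker;

    void aim(std::uint64_t addr) { k.set_bits(selector, &addr, 8); }
    std::uint64_t read64(std::uint64_t addr) {
        aim(addr);
        std::uint64_t v = 0;
        k.get_bits(worker, &v, 8);
        return v;
    }
    void write64(std::uint64_t addr, std::uint64_t v) {
        aim(addr);
        k.set_bits(worker, &v, 8);
    }
};

// Builds two bitmaps, applies the one-time corruption, then reads and writes
// an unrelated location repeatedly. Returns the final value at that location.
inline std::uint64_t demo() {
    FakeKernel k(0x1000);
    const std::uint64_t selector = 0x100, worker = 0x200, target = 0x800;
    k.store64(selector, 0x300); k.store64(selector + 8, 0x40);
    k.store64(worker, 0x400);   k.store64(worker + 8, 0x40);
    k.store64(target, 0x1122334455667788ULL);

    // The single hard-to-repeat write: point the first bitmap at the second.
    k.store64(selector, worker);

    RwPrimitive rw{k, selector, worker};
    std::uint64_t seen = rw.read64(target);  // repeatable read
    rw.write64(target, seen + 1);            // repeatable write
    return k.load64(target);
}
```

After the single corrupting store, every further access in `demo()` goes through the ordinary read and write helpers of the two objects, which is why the primitive becomes repeatable.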
Token Stealing
With the ability to do repeatable arbitrary reads and writes, as well as the nt kernel module base address leaked through sidt, we can easily find the address of nt!PspCidTable. We can then find the _EPROCESS objects of the current and system processes by parsing this table, get the respective _TOKEN object addresses, and finally do the token stealing.
Exploitation Code (parts)
VOID ThPresent(THREAD_HOST * th)