The information has been provided by Derek Soeder.
* VMware Player version 2.0.5-Build 109488
* VMware Server version 1.0.7-Build 108231
* VMware Workstation version 6.0.5-Build 109488
* VMware Server version 1.0.8-Build 126538
* VMware Player version 2.5
* VMware Server version 2.0
* VMware Workstation version 6.5
Trap Flag Set by IRET Not Cleared for CCh Instruction (CVE-2008-4915)
If an interrupt occurs when the Trap Flag is set, a proper CPU clears the Trap Flag before transferring execution to the interrupt handler. The affected versions of VMware, however, exhibit a flaw in that the Trap Flag persists across the mode switch when a single-byte "INT 3" instruction (CCh only, not CDh/03h) executes, if the Trap Flag was set by a kernel-mode IRET. The result is that user-mode code can cause a single-step debug trap (#DB) to occur at the very first instruction of the INT 3 breakpoint (#BP) handler, if it can persuade the kernel to set the Trap Flag via an IRET. On x64 versions of Windows there are multiple techniques for accomplishing this, including the SetThreadContext and ZwContinue APIs, and the method used below.
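The difference between correct and flawed interrupt delivery can be sketched as a small model. This is only an illustration, not VMware's or any CPU's actual logic: the function names are hypothetical, only the Trap Flag is modeled (a real CPU also adjusts other flags on delivery), and the `flawed` parameter stands in for the specific conditions described above (CCh opcode, Trap Flag set by a kernel-mode IRET).

```python
TF = 0x100  # RFLAGS bit 8, the Trap Flag


def deliver_interrupt(rflags, flawed=False):
    """Return (rflags saved for the later IRET, rflags the handler starts with)."""
    saved = rflags  # the interrupted context keeps its TF in the saved frame
    if flawed:
        live = rflags  # flaw: TF survives into the handler
    else:
        live = rflags & ~TF  # correct behavior: TF is cleared before the handler runs
    return saved, live


def first_handler_instruction_traps(rflags, flawed=False):
    """True if a single-step #DB would hit the handler's first instruction."""
    _, live = deliver_interrupt(rflags, flawed)
    return bool(live & TF)
```

With `rflags = 0x302` (TF set), the correct path hands the handler `0x202`, while the flawed path leaves TF set, which is the condition that produces a #DB at the first instruction of the #BP handler.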
The following x64 assembly will take advantage of this emulation flaw on x64 Windows to produce a proof-of-concept triple fault:
PUSH 0x100 ; set RFLAGS.TF (Trap Flag) after POPFQ
POPFQ ; #DB occurs *after* next instruction
LOCK LAHF ; trick to make kernel IRETQ and set TF
INT 3 ; emulation flaw pertains to CCh opcode
Just executing "PUSH 0x100 / POPFQ / INT 3" is insufficient, as a Trap Flag set in user mode apparently will not be preserved across the "INT 3" switch into kernel mode. Although a SetThreadContext call that redirects RIP to an "INT 3" instruction and sets RFLAGS.TF would work, the above assembly takes a different approach.
The LOCK prefix is not allowed on most instructions, including LAHF, so "LOCK LAHF" causes an undefined opcode (#UD) fault. On x64 Windows, however, the #UD fault handler will actually emulate the LAHF instruction if it faults, because some x64 processors may not support SAHF and LAHF in 64-bit mode. (See ECX bit 0, "LahfSahf," for CPUID function 8000_0001h in the AMD "CPUID Specification", 25481.pdf.) The LOCK prefix will always force the instruction to cause a #UD fault, and the Windows instruction decoder (NT!KiOpDecode, called from NT!KiPreprocessFault) ignores the prefix when determining the instruction's opcode, so "LOCK LAHF" is as good as an unsupported LAHF. After determining the faulting instruction's opcode, the exception dispatching mechanism indirectly calls NT!KiOp_LSAHF, which advances RIP to point past the instruction, modifies RFLAGS (but not RFLAGS.TF) or AH as appropriate to accomplish emulation, and indicates that the fault handler should resume execution rather than dispatch an exception. As a result, this "LOCK LAHF" instruction leads the kernel to IRETQ directly to the "INT 3" instruction that follows it, with RFLAGS.TF still set from the preceding POPFQ, thereby providing an easy means of exploiting this VMware emulation flaw.
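The emulation step described above can be approximated with a short model. It is a sketch under stated assumptions, not a reconstruction of NT!KiOpDecode or NT!KiOp_LSAHF: the function name, the prefix set, and the return convention are all hypothetical, and REX prefixes are not modeled. It shows the two properties that matter here: the LOCK prefix (F0h) is skipped when locating the LAHF opcode (9Fh), and the emulation changes AH and RIP but leaves RFLAGS, including TF, untouched.

```python
LAHF_OPCODE = 0x9F
TF = 0x100
# Legacy prefixes a decoder might skip (assumed set, for illustration only).
LEGACY_PREFIXES = {0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65, 0x66, 0x67}


def emulate_lahf(code, rip, rax, rflags):
    """Return (new_rip, new_rax, new_rflags), or None if code is not LAHF."""
    i = 0
    while i < len(code) and code[i] in LEGACY_PREFIXES:
        i += 1  # ignore prefixes, including LOCK
    if i >= len(code) or code[i] != LAHF_OPCODE:
        return None
    # LAHF loads AH with SF:ZF:0:AF:0:PF:1:CF from the low byte of the flags.
    ah = (rflags & 0xD5) | 0x02
    new_rax = (rax & ~0xFF00) | (ah << 8)
    # RIP moves past prefixes plus the one-byte opcode; RFLAGS is unchanged.
    return rip + i + 1, new_rax, rflags
```

For the byte sequence F0h 9Fh CCh, the model resumes at the CCh byte with TF still set, matching the behavior the text attributes to the kernel's IRETQ.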
Unlike the first flaw, this flaw is not affected by VMware's "Disable acceleration" option, and does not require repetition due to a timing dependency. More important, however, is that this flaw can be reproduced easily and accidentally, during real-world usage, by attempting to single-step an "INT 3" instruction in a debugger. It is likely that other software developers, and possibly security researchers, have experienced unintended manifestations of this flaw.
This section gives a detailed account of how these emulation flaws can be exploited on Windows XP x64 and Windows Server 2003 x64. Exploitation on x64 versions of *BSD is also believed to be possible, but has not yet been proven, so a brief discussion of the BSD x64 kernel and also the Linux x64 kernel (which is believed to prevent exploitation) is presented first.
The assembly language entry points for BSD's x64 interrupt handlers are contained in "src/sys/arch/amd64/amd64/vector.S", and chiefly consist of the following "INTRENTRY" macro defined in "src/sys/arch/amd64/include/frameasm.h": (The identifier "SEL_UPL" that appears below is defined in "segments.h" as the value 3.)
#define INTRENTRY \
subq $32,%rsp ; \
testq $SEL_UPL,56(%rsp) ; \
je 98f ; \
swapgs ; \
movw %gs,0(%rsp) ; \
movw %fs,8(%rsp) ; \
movw %es,16(%rsp) ; \
movw %ds,24(%rsp) ; \
98: INTR_SAVE_GPRS
This prologue is simple and lacks any safeguards against exploitation of the VMware emulation flaws, and in fact, executing the three AT&T-syntax assembly instructions provided to demonstrate the first flaw will reboot the system. Exploitability then solely depends on how GS: is used throughout the rest of the exception handling code. The "INTRFASTEXIT" macro, also defined in "frameasm.h", similarly exhibits the simplest possible GS-swapping logic, with no safety checks:
#define INTRFASTEXIT \
INTR_RESTORE_GPRS ; \
testq $SEL_UPL,56(%rsp) ; \
je 99f ; \
cli ; \
swapgs ; \
movw 0(%rsp),%gs ; \
movw 8(%rsp),%fs ; \
movw 16(%rsp),%es ; \
movw 24(%rsp),%ds ; \
99: addq $48,%rsp ; \
iretq
Exploitation of these VMware flaws on BSD is very likely to be identical to exploitation of FreeBSD kernel vulnerability CVE-2008-3890 discovered by Nate Eldredge, although this has not been confirmed.
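The GS-swapping decision made by the macros above can be modeled in a few lines. This is a simplified sketch: the selector values are illustrative placeholders rather than values taken from the BSD source, and the function names are hypothetical. The point is that the entry code decides whether to swap purely from the privilege bits of the saved CS, so a kernel-mode exception that arrives while user GS is still active is never corrected.

```python
SEL_UPL = 3  # value given in the text for the user privilege level bits

# Illustrative selector values only (assumptions, not taken from the source).
KERNEL_CS = 0x08
USER_CS = 0x23


def intrentry_swaps_gs(saved_cs):
    """Mirror 'testq $SEL_UPL,56(%rsp) / je 98f': swap only if RPL bits are set."""
    return (saved_cs & SEL_UPL) != 0


def gs_after_entry(saved_cs, active_gs):
    """active_gs is 'user' or 'kernel' at the moment the exception is taken."""
    if intrentry_swaps_gs(saved_cs):
        return "kernel" if active_gs == "user" else "user"
    return active_gs  # no swap: whatever GS was active stays active
```

The two normal cases end with kernel GS active. The third case, a saved kernel CS with user GS active, is the mismatch that either emulation flaw produces.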
The Linux kernel is much more careful in its exception handlers, and although the safeguards do not seem to have been designed with knowledge of any specific CPU flaws in mind, they nonetheless offer a general robustness that prevents exploitation of the two VMware emulation flaws discussed in this document. The relevant Linux kernel source resides in "arch/x86/kernel/entry_64.S".
Most fault handlers, including the #GP fault handler ("general_protection"), are based on either the "errorentry" or "zeroentry" macro, both of which are defined as code that sets up an exception frame on the stack, then transfers control to the "error_entry" routine. The following excerpt illustrates the major safety check that thwarts exploitation of the first flaw: (Note that capital "CS" and "RIP" correspond to the stack offsets at which the return CS and return RIP are stored. Two definitions of "retint_kernel" are possible, but both lead to "retint_restore_args".)
/* ebx: no swapgs flag (1: don't need swapgs, 0: need it) */
/* There are two places in the kernel that can potentially fault with
usergs. Handle them here. The exception handlers after
iret run with kernel gs again, so don't set the user space flag.
retint_swapgs: /* return to user space */
retint_restore_args: /* return to kernel space */
Although this code uses SWAPGS in the same way as exploitable kernels, the code was written to survive kernel exceptions when user GS is still active, as the block comment suggests. If a fault occurs at the IRETQ instruction ("iret_label"), the code near "error_kernelspace" will recognize this and force a GS swap, which keeps the first emulation flaw from producing a condition where a fault on IRETQ will lead the exception handler to improperly operate on user GS. (Interrupt handlers, constructed using the "interrupt" macro, also flow to the same, shared IRETQ instruction at "iret_label".)
Exploitation of the second flaw is thwarted in an even more elaborate way, by the use of the "paranoidentry" macro for the #DB trap handler "debug". The following excerpt shows the code responsible for sanitizing the GS base address: (Note that "MSR_GS_BASE" refers to MSR C000_0101h, which contains the currently effective base address of GS, rather than MSR C000_0102h, which contains the inactive base address that will be made active by a SWAPGS instruction.)
.macro paranoidentry sym, ist=0, irqtrace=1
If the RDMSR instruction returns a negative EDX (bits 63..32 of the MSR's contents), then the current GS base address resides in kernel space, so no SWAPGS is necessary; otherwise, user GS is active, so a SWAPGS is needed to switch to kernel GS before subsequent kernel code can safely execute.
Because the #DB trap handler performs the extra sanitization of "paranoidentry", causing a single-step trap to occur in the INT 3 handler will not produce an exploitable GS mismatch. In short, the Linux x64 kernel appears immune to attempts to exploit either emulation flaw.
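The sign check on the high half of the GS base can be written out as a small model. It is a sketch of the decision only, not the kernel's code: the function name is hypothetical and the addresses used to exercise it are made-up examples of a kernel-half and a user-half address.

```python
def paranoid_needs_swapgs(gs_base):
    """Decide from the 64-bit GS base value whether SWAPGS is needed."""
    edx = (gs_base >> 32) & 0xFFFFFFFF  # RDMSR returns bits 63..32 in EDX
    edx_is_negative = bool(edx & 0x80000000)  # sign bit of EDX is bit 63 of the base
    # Negative EDX: base is in kernel space, kernel GS already active.
    # Otherwise user GS is active and must be swapped out first.
    return not edx_is_negative
```

Because the decision is based on the GS base itself rather than on the saved CS, it gives the right answer even when the exception was taken in kernel mode with user GS active.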
Windows XP x64 and Windows Server 2003 x64
Reliable exploitation of both VMware emulation flaws has been achieved on Windows XP x64 and Windows Server 2003 x64, allowing an unprivileged user to execute arbitrary code with kernel privileges. Since the relevant portions of the two operating systems' kernels are so similar, the following discussion applies equally to both.
Although the two emulation flaws discussed in this document are entirely separate, techniques for exploiting them largely overlap. Either flaw can be used to cause an unexpected kernel exception with user GS active -- the first flaw causes a #GP fault on the IRETQ of a hardware interrupt handler (typically NT!KiInterruptDispatchNoLock), while the second flaw triggers a #DB trap on the first instruction of the INT 3 handler (NT!KiBreakpointTrap). Whether the #GP fault handler (NT!KiGeneralProtectionFault) or the #DB trap handler (NT!KiDebugTrapOrFault) is invoked with user GS and kernel mode indicated as the previous mode, both end up calling NT!KiExceptionDispatch, and from this point exploitation is essentially identical between the two flaws.
Since exploitability hinges entirely on user control of GS during the execution of GS-dependent kernel code, GS-relative memory accesses in the code path starting with the interrupt handler are of the most interest. NT!KiGeneralProtectionFault and NT!KiDebugTrapOrFault both include the "LDMXCSR DWORD PTR GS:[0x180]" instruction, which will raise an undesirable #GP fault if that DWORD has any reserved MXCSR bits set, so GS:[0x180] (here referring to user GS, which will be treated like kernel GS during exploitation) should be assigned a value of zero.
The next important GS-relative access occurs in NT!KiDispatchException, which is called by NT!KiExceptionDispatch. Early in the function, the sequence "MOV RAX, GS:[0x20] / INC DWORD PTR [RAX+0x22A0]" is executed. (GS:[0x20] is the "KPCR.CurrentPrcb" pointer, and the field at offset 0x22A0 from there is "KPRCB.KeExceptionDispatchCount". On Vista x64, the increment is simply "INC DWORD PTR GS:[0x34FC]", and therefore cannot be used to modify arbitrary kernel memory.) Although other, subsequent GS-relative accesses are performed, controlling this increment alone is sufficient for exploitation.
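The effect of the two-instruction sequence can be illustrated with a toy memory model. This is a sketch only: the `Memory` class, the function name, and the addresses used to exercise it are hypothetical, and only the offsets 0x20 and 0x22A0 come from the text. It shows why the increment lands wherever the pointer stored at GS:[0x20] says, which is what makes control of GS equivalent to control of the incremented address.

```python
import struct

PRCB_PTR_OFFSET = 0x20          # GS:[0x20], KPCR.CurrentPrcb per the text
DISPATCH_COUNT_OFFSET = 0x22A0  # KPRCB.KeExceptionDispatchCount per the text


class Memory:
    """Tiny sparse byte-addressed memory, little-endian like x64."""

    def __init__(self):
        self.cells = {}

    def write(self, addr, data):
        for i, b in enumerate(data):
            self.cells[addr + i] = b

    def read(self, addr, n):
        return bytes(self.cells.get(addr + i, 0) for i in range(n))


def dispatch_count_increment(mem, gs_base):
    """Model 'MOV RAX, GS:[0x20] / INC DWORD PTR [RAX+0x22A0]'; return the address hit."""
    rax = struct.unpack("<Q", mem.read(gs_base + PRCB_PTR_OFFSET, 8))[0]
    addr = rax + DISPATCH_COUNT_OFFSET
    value = struct.unpack("<I", mem.read(addr, 4))[0]
    mem.write(addr, struct.pack("<I", (value + 1) & 0xFFFFFFFF))  # 32-bit wraparound
    return addr
```

Storing `target - 0x22A0` at offset 0x20 of the GS region causes the DWORD at `target` to be incremented by one.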
After the increment, NT!KiDispatchException calls NT!KeContextFromKframes and then NT!KiPreprocessFault, neither of which makes notable use of GS. The next "CALL" instruction, "CALL QWORD PTR [NT!KiDebugRoutine]", reads a function pointer global variable that points to NT!KdpStub if the kernel is not being debugged, or NT!KdpTrap if a kernel debugger is attached to the system. (This exploitation technique has only been shown to work when the kernel is not being debugged, which is assumed to be the only realistic attack scenario.)
The NT!KiDebugRoutine function pointer is writable and can therefore be the target of the user-controllable increment. If GS:[0x20] is pointed at &NT!KiDebugRoutine - 0x22A0 before one of the emulation flaws is exploited, NT!KiDebugRoutine will be incremented, and then its modified contents (NT!KdpStub + 1) will be called. The first instruction of NT!KdpStub is "SUB RSP, 0x58", which in machine code is "48/83/EC/58". Therefore, the instruction that gets executed at NT!KdpStub + 1 is "83/EC/58", or in assembly, "SUB ESP, 0x58". On the x64 architecture, instructions that perform a 32-bit write to a register implicitly zero the upper 32 bits of that register, so in this case, "SUB ESP, 0x58" subtracts 0x58 from ESP and clears bits 63..32 of RSP, resulting in an RSP that points into user-land.
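The byte-level effect of entering the instruction one byte late can be checked with simple arithmetic. The sketch below is illustrative: the helper names are hypothetical and the kernel stack pointer value is a made-up example, while the opcode bytes and the 0x58 immediate come from the text. Dropping the leading 48h byte removes the REX.W prefix, which turns the 64-bit subtraction into a 32-bit one whose result is zero-extended.

```python
KDPSTUB_PROLOGUE = bytes([0x48, 0x83, 0xEC, 0x58])  # SUB RSP, 0x58 (bytes from the text)


def has_rex_w(code):
    """True if the first byte is a REX prefix with the W (64-bit operand) bit set."""
    return len(code) > 0 and (code[0] & 0xF8) == 0x48


def sub_rsp_64(rsp, imm):
    """SUB RSP, imm: full 64-bit subtraction."""
    return (rsp - imm) & 0xFFFFFFFFFFFFFFFF


def sub_esp_32(rsp, imm):
    """SUB ESP, imm: 32-bit subtraction, result zero-extended into RSP."""
    return ((rsp & 0xFFFFFFFF) - imm) & 0xFFFFFFFF


example_rsp = 0xFFFFFADFF1234560  # hypothetical kernel stack pointer
```

For the example value, the intended instruction yields 0xFFFFFADFF1234508, whereas the instruction decoded at offset 1 yields 0x00000000F1234508, an address in the lower 4 GB.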
If the kernel stack pointer can be leaked, or even guessed to within a reasonable range, then memory can be allocated that covers the address of the DWORD-truncated kernel stack pointer, meaning that the kernel stack -- and therefore kernel execution -- can be controlled once NT!KdpStub returns. Because user GS will remain active until the exploit payload has a chance to execute, any hardware interrupts (interrupts are enabled before NT!KiExceptionDispatch is called) or page faults that occur before execution reaches the payload will cause a cascade of exceptions that culminates in a triple fault (reboot). Fortunately, the critical window is small and the exploit can take steps to reduce these risks; even relatively reckless exploitation has proven to be reliable.
Windows Vista x64
As mentioned above, incrementing arbitrary kernel memory is not possible on Windows Vista x64, because the "INC" instruction of interest modifies a GS-relative DWORD directly (and therefore can only increment a DWORD in user GS), rather than dereferencing a pointer retrieved from a GS-relative field. By carefully crafting user GS data, it may be possible to allow kernel execution to continue without disruption until some other exploitable operation is reached (perhaps RtlCaptureContext as called by KeBugCheckEx), but as of this writing, no such technique has been attempted.
This document discloses details of two VMware emulation flaws that have been proven exploitable on Windows XP x64 and Windows Server 2003 x64 for gaining kernel privileges. Excerpts from the *BSD and Linux x64 kernel source are examined for the sake of illustrating their presumed exploitability or resilience. Techniques are also presented for exploiting the "GS mismatch" condition caused by inducing unexpected kernel exceptions on x64 operating systems; such techniques are not specific to these VMware flaws, and may be applied in any case where a GS mismatch arises. Very specific implementation details of exploitation are omitted.
Other specific means of causing an operating system to experience an unsafely-handled kernel exception are not considered, as they would constitute new operating system vulnerabilities. To researchers and developers interested in finding such vulnerabilities, the author recommends first examining any kernel code that constructs or modifies an IRETQ stack frame, since returning to a non-canonical RIP, or returning to 32-bit mode with RIP >= 4GB, is the most straightforward way to experience a kernel fault with user GS active.
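The two return-address conditions named above can be expressed as simple predicates, which may be useful when auditing code that builds IRETQ frames. This is a sketch under assumptions: it assumes 48-bit virtual addresses, and the function names are hypothetical rather than drawn from any kernel.

```python
def is_canonical(addr, va_bits=48):
    """True if bits 63..(va_bits-1) of addr are all zeros or all ones."""
    addr &= 0xFFFFFFFFFFFFFFFF
    top = addr >> (va_bits - 1)  # bits 63..47 when va_bits is 48
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


def iretq_target_is_suspect(rip, returning_to_32bit):
    """Flag a return RIP that would make IRETQ fault, per the two cases in the text."""
    if returning_to_32bit:
        return rip >= (1 << 32)  # RIP >= 4GB on a return to 32-bit mode
    return not is_canonical(rip)  # non-canonical RIP on a 64-bit return
```

A kernel that can be made to place such a RIP into a frame it later returns through would experience the fault on IRETQ itself, after the switch back to user GS.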