
Extending the Traditional Hypervisor’s Approach of System Call Hooking in the Post-2018 Windows Operating Systems

2021-10-23 - Computer Architecture, Virtualization Technology

Introduction to KVA-Shadow

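When Google Project Zero disclosed the calamity called Meltdown in early 2018, all OS vendors worked on mitigations against it. Microsoft developed a mechanism called KVA-Shadow (Kernel Virtual Address Shadow). To understand what's going on in KVA-Shadow, we should observe the following:

  1. Each process has two page tables. The kernel page table maps the entire kernel address space, while the user page table (the shadow) maps only a limited portion of kernel-mode memory, chiefly the KVASCODE section and the per-processor data needed for transitions.
  2. While user-mode code runs, CR3 holds the user page table. Kernel entry points reachable from user mode, such as KiSystemCall64Shadow (the handler stored in the LSTAR MSR) and KiPageFaultShadow, reside in the KVASCODE section so that they are mapped in both page tables.
  3. When entered from user mode, these shadow entry points switch CR3 to the kernel page table before transferring control to the regular kernel code, which is not mapped in the user page table.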

With this knowledge, we may observe that if we change the value of the LSTAR MSR to the address of our own proxy handler, a syscall instruction would of course trigger a #PF exception: syscall does not switch CR3, and our proxy handler does not reside in the KVASCODE section, so it is not mapped by the user page table. This means the KiPageFaultShadow procedure would be invoked. This is interesting because the procedures in the KVASCODE section do not always switch the page table. Let's observe the KiPageFaultShadow function:

nt!KiPageFaultShadow:
fffff801`23211840 f644241001      test    byte ptr [rsp+10h],1
fffff801`23211845 746a            je      nt!KiPageFaultShadow+0x71 (fffff801`232118b1)

nt!KiPageFaultShadow+0x7:
fffff801`23211847 0f01f8          swapgs
fffff801`2321184a 0faee8          lfence
fffff801`2321184d 650fba24251890000001 bt  dword ptr gs:[9018h],1
fffff801`23211857 720c            jb      nt!KiPageFaultShadow+0x25 (fffff801`23211865)

nt!KiPageFaultShadow+0x19:
fffff801`23211859 65488b242500900000 mov   rsp,qword ptr gs:[9000h]
fffff801`23211862 0f22dc          mov     tmm,rsp                       ; actually mov cr3,rsp: WinDbg mislabels cr3 as tmm here

nt!KiPageFaultShadow+0x25:
fffff801`23211865 65488b242508900000 mov   rsp,qword ptr gs:[9008h]
fffff801`2321186e 654889342510000000 mov   qword ptr gs:[10h],rsi
fffff801`23211877 65488b342538000000 mov   rsi,qword ptr gs:[38h]
fffff801`23211880 4881c600420000  add     rsi,4200h
fffff801`23211887 ff76f8          push    qword ptr [rsi-8]
fffff801`2321188a ff76f0          push    qword ptr [rsi-10h]
fffff801`2321188d ff76e8          push    qword ptr [rsi-18h]
fffff801`23211890 ff76e0          push    qword ptr [rsi-20h]
fffff801`23211893 ff76d8          push    qword ptr [rsi-28h]
fffff801`23211896 ff76d0          push    qword ptr [rsi-30h]
fffff801`23211899 65488b342510000000 mov   rsi,qword ptr gs:[10h]
fffff801`232118a2 65488324251000000000 and qword ptr gs:[10h],0
fffff801`232118ac e94f379fff      jmp     nt!KiPageFault (fffff801`22c05000)

nt!KiPageFaultShadow+0x71:
fffff801`232118b1 0faee8          lfence
fffff801`232118b4 e947379fff      jmp     nt!KiPageFault (fffff801`22c05000)

The first instruction is worth mentioning: it tests bit 0 of the byte at [rsp+10h]. Because the processor pushes an error code for #PF exceptions, [rsp+10h] points to the saved CS selector, whose lowest 2 bits (the RPL) reflect the CPL at which the #PF exception occurred. Since Windows uses only rings 0 and 3, bit 0 is set only for faults taken from user mode. This means that if the page fault occurs in kernel mode, the je instruction jumps to KiPageFaultShadow+0x71, which in turn jumps to the KiPageFault procedure without switching the page table. With an LSTAR hook, the page fault occurs in kernel mode (syscall has already switched to CPL 0), yet the page table has not been switched, because the fault is raised as soon as the proxy handler is invoked. Therefore, the jump to KiPageFault, which is not mapped by the user page table, triggers another page fault, resulting in a #DF failure. Interestingly, KiDoubleFaultAbortShadow is not such a simpleton: it checks whether CR3 matches the page table that maps the entire kernel space, so it does not escalate into a triple-fault failure.
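The check at the top of KiPageFaultShadow can be modeled in a few lines of C. This is a sketch, not Windows code: the frame layout is the x64 exception frame for a fault that pushes an error code, the selectors 0x10 and 0x33 are the kernel and user code selectors used by 64-bit Windows, and the function name shadow_handler_switches_cr3 is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stack layout on entry to a #PF handler (the CPU pushes an error code). */
typedef struct {
    uint64_t error_code; /* [rsp+00h] */
    uint64_t rip;        /* [rsp+08h] */
    uint64_t cs;         /* [rsp+10h] */
    uint64_t rflags;     /* [rsp+18h] */
    uint64_t rsp;        /* [rsp+20h] */
    uint64_t ss;         /* [rsp+28h] */
} pf_frame;

/* The handler tests [rsp+10h], so the saved CS must sit at offset 0x10. */
static_assert(offsetof(pf_frame, cs) == 0x10, "saved CS is at [rsp+10h]");

/* Hypothetical model of "test byte ptr [rsp+10h],1 / je ...+0x71":
   returns true when the shadow handler would swapgs and switch CR3. */
static bool shadow_handler_switches_cr3(const pf_frame *frame)
{
    /* Bit 0 of the RPL is set only when the fault came from CPL 3. */
    return (frame->cs & 1) != 0;
}
```

For a fault raised by the first instruction fetch of an LSTAR proxy handler, the saved CS is the kernel selector 0x10, so the model returns false, mirroring the je to KiPageFaultShadow+0x71.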

Similarly, setting a breakpoint at the beginning of the KiSystemCall64Shadow function also induces a #DF failure. A breakpoint raises either a #DB or a #BP exception, and since the syscall instruction has already switched the processor to CPL 0 while CR3 still holds the user page table, the exception is taken as a kernel-mode exception: the RPL field of the saved CS selector is zero. The shadow handler (for example KiDebugTrapOrFaultShadow for #DB) therefore considers it unnecessary to switch the page table, so when it jumps to the KiDebugTrapOrFault function, a #PF exception is raised. Once again, because this exception is also taken in kernel mode, the #PF recurs, and the recurrence results in a #DF failure.

How to Hook the System Call MSR with a Hypervisor and Stay Compatible with KVA-Shadow

With the power of virtualization, we may set up the VMCS to intercept page-fault exceptions. Upon a page-fault interception, read the Exit Qualification field in the VMCS, which holds the linear address at which the page fault occurred. Compare it with the address of our proxy function. If they match, switch the guest CR3 to the proper page table and resume the guest without injecting the #PF, so the instruction fetch is retried; otherwise, inject the #PF back into the guest, remembering to set the guest CR2 to the faulting address yourself, because the intercepted page fault does not update CR2. The "proper page table" I am referring to is located at offset +0x28 of the KPROCESS structure (the DirectoryTableBase field). In addition, don't forget to invalidate the TLB via the invvpid instruction if VPID is enabled for the guest, because the guest is now supposed to run with a different set of address mappings. If the guest's TLB is not invalidated, all the effort put into #PF interception would be in vain.
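The decision made in the #PF VM-exit handler can be sketched as follows. This is not NoirVisor's code: the guest_state structure and its fields are hypothetical stand-ins for VMCS reads and writes (guest CR3, VM-entry event injection) and for the guest CR2 the hypervisor must maintain, and proxy_entry and kernel_dirbase are assumed to have been obtained elsewhere, the latter from KPROCESS+0x28 of the current process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the guest state a #PF VM-exit handler touches. */
typedef struct {
    uint64_t cr3;               /* guest CR3 field in the VMCS */
    uint64_t cr2;               /* guest CR2, maintained by the hypervisor */
    bool     inject_pf;         /* true: reinject #PF on the next VM-entry */
    uint32_t inject_error_code; /* error code delivered with the #PF */
    bool     flush_vpid;        /* true: execute invvpid before VM-entry */
} guest_state;

/* exit_qualification holds the faulting linear address for #PF exits;
   error_code comes from the VM-exit interruption error code field. */
static void handle_pf_exit(guest_state *g, uint64_t exit_qualification,
                           uint32_t error_code, uint64_t proxy_entry,
                           uint64_t kernel_dirbase)
{
    assert(kernel_dirbase != 0);
    if (exit_qualification == proxy_entry) {
        /* The fetch of our proxy failed under the user page table: switch
           to the kernel page table and retry the fetch without injecting. */
        g->cr3 = kernel_dirbase;
        g->flush_vpid = true;   /* drop stale user-table translations */
        g->inject_pf = false;
        return;
    }
    /* Not our fault: hand it back to the guest with CR2 set by hand. */
    g->cr2 = exit_qualification;
    g->inject_pf = true;
    g->inject_error_code = error_code;
}
```

Resuming without injection makes the guest re-execute the faulting fetch, this time under the kernel page table.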

In terms of hardware-accelerated virtualization, the fewer VM-exits the guest triggers, the better the guest performs. With Intel VT-x, we may filter out unwanted page-fault interceptions by setting the Page-Fault Error-Code Mask and Page-Fault Error-Code Match fields in the VMCS. When a page fault occurs, the processor computes a masked error code by ANDing the page-fault error code with the mask, then compares the result with the match value. If they are equal, a VM-exit occurs if #PF interception is set (bit 14 of the exception bitmap); if they are not equal, a VM-exit occurs if #PF interception is clear. So we should identify the exact trait of the page fault we intend to intercept: it comes from kernel mode and results from an instruction fetch. Accordingly, we set both the I/D (bit 4) and U/S (bit 2) bits in the Page-Fault Error-Code Mask field, but only the I/D bit in the Page-Fault Error-Code Match field. In this way, interceptions of common page faults, such as those caused by paged-out memory or by writes to read-only pages, are avoided, which effectively increases performance.
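The filtering rule can be expressed as a small C predicate. It is an illustrative model of the architectural check rather than something a hypervisor executes; the macro and function names are hypothetical, while the bit positions follow the Intel SDM's description of the page-fault error code and of bit 14 (#PF) of the exception bitmap.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Page-fault error code bits (Intel SDM, Interrupt 14, #PF). */
#define PFEC_P   (1u << 0)  /* 0 = non-present page */
#define PFEC_WR  (1u << 1)  /* 1 = write access */
#define PFEC_US  (1u << 2)  /* 1 = user-mode access */
#define PFEC_ID  (1u << 4)  /* 1 = instruction fetch */

#define EXCEPTION_BITMAP_PF (1u << 14)

/* Values described in the text: mask I/D and U/S, match only I/D,
   i.e. "supervisor-mode instruction fetch". */
#define PFEC_MASK  (PFEC_ID | PFEC_US)
#define PFEC_MATCH (PFEC_ID)

/* VT-x rule: if (PFEC & mask) == match, bit 14 of the exception bitmap
   decides whether a VM-exit occurs; otherwise its inverse decides. */
static bool pf_causes_vmexit(uint32_t exception_bitmap, uint32_t mask,
                             uint32_t match, uint32_t pfec)
{
    bool pf_bit = (exception_bitmap & EXCEPTION_BITMAP_PF) != 0;
    return ((pfec & mask) == match) ? pf_bit : !pf_bit;
}
```

With mask 0x14 and match 0x10, only supervisor-mode instruction fetches cause VM-exits, which is exactly the fault raised when a syscall lands on an unmapped proxy handler.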

Does this undo the Meltdown mitigation? No. User-mode programs still run with the page table that maps only a limited portion of kernel-mode memory, so the Meltdown mitigation remains in effect.

What about AMD-V? As a matter of fact, Windows disables the KVA-Shadow mechanism by default on machines with AMD CPUs, so perhaps we don't have to worry. However, AMD-V lacks the page-fault error-code filtering feature, so every #PF exception would be intercepted, and the optimization used on Intel VT-x is infeasible on AMD-V.

A Similar Approach to KVA-Shadow-Compatible System Call Hooking

Aidan Khoury introduced a method that purposefully disables the syscall and sysret instructions by clearing the SCE (System Call Extensions) bit in the EFER MSR:
https://revers.engineering/syscall-hooking-via-extended-feature-enable-register-efer/

To do this, you will have to intercept #UD exceptions and emulate these instructions on your own. Such emulation is not arcane, just mundane: do what the pseudocode in Intel's manual describes. In addition, you will also have to intercept reads of the EFER MSR so that the guest still sees SCE set; otherwise, PatchGuard could detect this modification and trigger a BSoD.
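As a rough sketch of that emulation, the function below follows the Intel SDM pseudocode for SYSCALL in 64-bit mode. The guest_regs and syscall_msrs structures are hypothetical: a real #UD handler would read and write these through the VMCS and its own saved register area, would first confirm that the bytes at the guest RIP are 0F 05 and that the guest is in 64-bit mode, and would also load the hidden CS/SS descriptor attributes, which are omitted here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of the guest registers the #UD handler modifies. */
typedef struct {
    uint64_t rip, rcx, r11, rflags;
    uint16_t cs, ss;
    uint8_t  cpl;
} guest_regs;

/* Values the guest believes are in the syscall MSRs. */
typedef struct {
    uint64_t star, lstar, fmask;
} syscall_msrs;

#define SYSCALL_INSN_LEN 2  /* encoding 0F 05 */

static void emulate_syscall(guest_regs *g, const syscall_msrs *m)
{
    uint16_t sel = (uint16_t)(m->star >> 32); /* IA32_STAR[47:32] */

    g->rcx = g->rip + SYSCALL_INSN_LEN;       /* return address */
    g->rip = m->lstar;                        /* guest's syscall handler */
    g->r11 = g->rflags;                       /* saved RFLAGS */
    g->rflags &= ~m->fmask;                   /* apply IA32_FMASK */
    g->cs = sel & 0xFFFC;                     /* RPL forced to 0 */
    g->ss = (uint16_t)(sel + 8);              /* SS sits just above CS */
    g->cpl = 0;
}
```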

You may also do something special and favorable with your hypervisor: mitigate the CVE-2012-0217 vulnerability in your sysret handler, although your system should already have this vulnerability mitigated, since it was disclosed nearly ten years ago. Nonetheless, the mitigation could be easier than you might imagine: remove the canonicality check on the return address from your emulation, because Intel VT-x does not require a canonical address to be loaded into the guest RIP. Upon VM-entry, the processor then detects the canonicality violation and raises #GP with CPL=3, so the vulnerability is mitigated.
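The corresponding SYSRET emulation might look like the sketch below, again following the Intel SDM pseudocode for the 64-bit (REX.W) form. The guest_regs structure is a hypothetical stand-in for guest state held in the VMCS and the hypervisor's register area, and descriptor attributes are omitted. The canonical-address check on RCX is deliberately left out, as described above, so the #GP for a non-canonical return address is left to the processor after VM-entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical view of the guest registers the #UD handler modifies. */
typedef struct {
    uint64_t rip, rcx, r11, rflags;
    uint16_t cs, ss;
    uint8_t  cpl;
} guest_regs;

/* Emulates 64-bit SYSRET. Intentionally no canonical check on RCX: the
   article relies on the fault being raised at CPL 3 after VM-entry. */
static void emulate_sysret64(guest_regs *g, uint64_t star)
{
    uint16_t sel = (uint16_t)(star >> 48);        /* IA32_STAR[63:48] */

    g->rip = g->rcx;
    g->rflags = (g->r11 & 0x3C7FD7ull) | 2;       /* clears RF, VM, reserved */
    g->cs = (uint16_t)((sel + 16) | 3);           /* 64-bit user CS */
    g->ss = (uint16_t)((sel + 8) | 3);            /* user SS */
    g->cpl = 3;
}
```

On 64-bit Windows, IA32_STAR[63:48] is 0x23, which yields the familiar user selectors CS = 0x33 and SS = 0x2B.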

Beware of Supervisor-Mode Access Prevention

Is the game over? The answer is yes and no. Yes, because our MSR hook is now compatible with the KVA-Shadow mechanism; no, because there is something more. If you inspect the system call handler in later versions of Windows 10 (at least version 1903), you will find an interesting instruction: stac.

IDA disassembly of KiSystemCall64Shadow

Let's trace backward. The control flow reaches the stac instruction only if the byte KeSmapEnabled is nonzero. This brings up an interesting x86 feature called Supervisor-Mode Access Prevention, often abbreviated as SMAP. This feature prevents kernel code from accidentally accessing user-mode memory, which may be attacker-controlled. For instance, OS vendors may mitigate the CVE-2018-8897 vulnerability by utilizing SMAP so that user-mode memory will not be accidentally treated as kernel-mode memory. The general rule of SMAP is that if CR4.SMAP is set, accessing user-mode memory in kernel mode while RFLAGS.AC is clear results in a #PF exception. If not properly considered, this prevents the system call handler from accessing parameters saved in user-mode memory. The solution is easy: execute the stac instruction before accessing user-mode memory in your proxy system call handler, and clac afterwards. As Windows does, only execute it when SMAP is enabled, because stac raises #UD on processors that do not support SMAP.
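The SMAP rule itself can be modeled in a few lines of C. This only illustrates the architectural check for explicit data accesses made at CPL 0 to 2; it is not code you would run in a handler, and the function name smap_blocks_access is hypothetical. CR4.SMAP is bit 21 and RFLAGS.AC is bit 18.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR4_SMAP  (1ull << 21)
#define RFLAGS_AC (1ull << 18)

/* Returns true if an explicit supervisor-mode data access to the given
   page raises #PF because of SMAP. */
static bool smap_blocks_access(uint64_t cr4, uint64_t rflags, bool page_is_user)
{
    if (!(cr4 & CR4_SMAP) || !page_is_user)
        return false;                 /* SMAP off, or a kernel page */
    return (rflags & RFLAGS_AC) == 0; /* stac sets AC, clac clears it */
}
```

In these terms, stac is what flips the second case from "blocked" to "allowed" while the proxy handler copies parameters from the user-mode stack.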

Novel PatchGuard Trick

This novel PatchGuard trick was already introduced by Aidan Khoury:
https://revers.engineering/patchguard-detection-of-hypervisor-based-instrospection-p2/

A method of countering this trick was also introduced in that article: unhook the system call when the guest writes some other value to LSTAR, and rehook it once the guest writes the original system call handler back to LSTAR. Make sure the unhook is also reflected in rdmsr interceptions, that is, a read of LSTAR returns the value the guest actually wrote, so that PatchGuard will not trigger a BSoD.
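The bookkeeping for this countermeasure can be sketched as follows. The lstar_hook structure and the two handler functions are hypothetical; a real VT-x implementation would intercept the accesses through the MSR bitmap and perform the actual wrmsr of hardware_lstar itself on the guest's behalf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-processor state for the LSTAR hook. */
typedef struct {
    uint64_t original_handler; /* guest's syscall handler captured at hook time */
    uint64_t proxy_handler;    /* our proxy */
    uint64_t hardware_lstar;   /* value actually loaded into LSTAR */
    bool     hooked;
} lstar_hook;

/* Intercepted wrmsr to LSTAR: unhook on foreign values, rehook on the original. */
static void on_wrmsr_lstar(lstar_hook *h, uint64_t value)
{
    if (value == h->original_handler) {
        h->hardware_lstar = h->proxy_handler;
        h->hooked = true;
    } else {
        h->hardware_lstar = value; /* the guest's value really takes effect */
        h->hooked = false;
    }
}

/* Intercepted rdmsr from LSTAR: the guest always sees what it last wrote. */
static uint64_t on_rdmsr_lstar(const lstar_hook *h)
{
    return h->hooked ? h->original_handler : h->hardware_lstar;
}
```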

Summary

This post introduced three things relevant to MSR hooking in the latest versions of Windows:

  1. How to make your MSR hook compatible with the KVA-Shadow mechanism.
  2. How to make your MSR hook compatible with the SMAP processor feature.
  3. How to make your MSR hook compatible with the novel PatchGuard trick.

By the way, feel free to visit Project NoirVisor on GitHub: https://github.com/Zero-Tang/NoirVisor
