The VMCS - Part #1
This article is based on extensive research, but I am not an expert in this field. My only intention was “to understand how virtualization works and share it” (a pretty naive target). Please tell me about any mistake you discover. I will correct it as quickly as possible.
So in the last part we talked about the life cycle of a VM.
But how does the processor get the information for setting up the guest: which registers should it load, and which interrupts or CPU instructions cause a VM-Exit?
For all this information the VMM needs the Virtual Machine Control Structure (VMCS), which holds the configuration of the guest and the rules it has to obey. So, what does this structure look like?
VMCS
 __________________________________
|                                  |
|  ID [30:0]     Shadow indicator  |
|__________________________________| [byte 4]
|                                  |
|     abort indicator (boring)     |
|__________________________________| [byte 8]
|                                  |
|         Guest-state Area         |
|__________________________________|
|                                  |
|         Host-State Area          |
|__________________________________|
|                                  |
|   VM execution control fields    |
|__________________________________|
|                                  |
|      VM-Exit control fields      |
|__________________________________|
|                                  |
|     VM-Entry control fields      |
|__________________________________|
|                                  |
|    VM-Exit information fields    |
|__________________________________| [byte 4096]
The first fields are pretty self-explanatory. The first 31 bits (30:0) are the ID of the VMCS. Every guest gets its own VMCS, so different guests can have different rule sets.
Although it seems reasonable that this ID is simply a memory address, it actually is not. It is the VMCS revision identifier, which tells the processor which VMCS format is in use. The processor returns it when you read a model-specific register (MSR), namely IA32_VMX_BASIC at index 0x480.
xor eax, eax
mov ecx, 0x480 ; index of the MSR which holds information about the VMCS (IA32_VMX_BASIC)
rdmsr          ; stores the VMCS identifier into eax (bits 30:0), the upper half of the MSR goes to edx
Model-specific registers are simply registers on your processor which either hold information (CPU name, cache size, features the CPU supports, debug info, timer etc.) or can be used to change the behavior of the CPU (mostly to turn features on or off).
Bit 31 is the shadow-VMCS indicator (it isn’t important for us now). The following 32 bits (63:32) are the VMX-abort indicator: they are always zero unless something went badly wrong during a VM-Exit.
The rest of the VMCS is the actually important stuff. This is where the rules for all the magic are written down.
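To make those first 8 bytes a bit more tangible, here is a minimal sketch in C. It only models the layout described above: the ID (Intel calls it the VMCS revision identifier) in bits 30:0, the shadow indicator in bit 31 and the abort indicator in the next 4 bytes. The names vmcs_header, vmcs_revision_id and vmcs_region_init are made up for this example, the 4096-byte size is taken from the diagram, and the raw MSR value has to come from somewhere else (reading an MSR only works in kernel mode).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* First 8 bytes of a VMCS region, as described in the text (hypothetical name). */
struct vmcs_header {
    uint32_t revision_and_shadow; /* bits 30:0 = ID, bit 31 = shadow indicator */
    uint32_t abort_indicator;     /* stays 0 unless a VMX abort happened */
};

/* Extract the ID from a raw value of the MSR at index 0x480. */
uint32_t vmcs_revision_id(uint64_t vmx_basic)
{
    return (uint32_t)(vmx_basic & 0x7FFFFFFFu); /* keep bits 30:0 only */
}

/* Zero a 4096-byte region and write the header, shadow indicator cleared. */
void vmcs_region_init(uint8_t *region, uint64_t vmx_basic)
{
    struct vmcs_header hdr = { vmcs_revision_id(vmx_basic), 0 };

    assert(region != NULL);
    memset(region, 0, 4096);
    memcpy(region, &hdr, sizeof hdr);
}
```

A real VMM would additionally have to make sure the region is 4 KiB aligned and would hand its physical address to the processor; that part is left out here.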
The guest-state area
Remember that the processor has to save all the register values and some other information of a guest when it exits? Well, these values go into this area. For example:
The host state area
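is the counterpart to the guest-state area. It holds more or less the same registers, but for the VMM: the processor loads them from here on a VM-Exit. Unlike the guest state, the processor does not save them by itself on a VM-Entry. The VMM has to write them into the VMCS beforehand.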
VM-execution control fields
Here you define what a guest is allowed to process and what causes a VM-Exit:
Oversimplified explanation of interrupts
An interrupt is a simple term to describe an interruption (o.O) of the processor by hardware or some software. E.g. when you press your power button long enough, it will send an interrupt request (IRQ) to your processor. The CPU will look up the IRQ in an interrupt descriptor table (IDT) and execute the instructions the entry in the IDT points to. In this example it will start a quick shutdown routine. In other cases it could give you a bluescreen or do something imperceptible.
- Processor based controls
By setting those bits you can flag some instructions so that they lead to a VM-Exit. For example: reading or writing the control registers CR3 and CR8, using I/O instructions, reading or writing the local descriptor table, or exiting after every single instruction.
Note: Not every processor supports all the settings listed in the Intel manuals. Hypervisors like Xen need to read MSRs before they turn some of those settings on.
- Additionally, you can define whether the VMM wants to use bitmaps. Bitmaps? What are bitmaps? Bitmaps are like arrays of bits, and each bit represents an exception, an MSR or an I/O port (depending on whether you’ve got an I/O, an MSR or an exception bitmap).
If for example an exception occurs while the guest is running on the CPU, the processor looks up the corresponding bit in the bitmap. If it is 1 -> VM-Exit. Is it 0? No VM-Exit. Same procedure if the guest wants to use an MSR or an I/O instruction.
Of course there are way more settings, but we want to keep it simple (and I’m not smart enough to understand them all).
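The lookup itself is nothing more than a bit test. Here is a minimal sketch in C, assuming the 32-bit exception bitmap (one bit per exception vector) and the two 4 KiB I/O bitmap pages (A for ports 0x0000-0x7FFF, B for 0x8000-0xFFFF) placed back to back in one 8192-byte buffer. All function names are made up for this example, and the extra filtering the processor applies to page faults is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if exception `vector` (0..31) is flagged in the exception bitmap. */
bool exception_causes_exit(uint32_t exception_bitmap, unsigned vector)
{
    assert(vector < 32);
    return (exception_bitmap >> vector) & 1u;
}

/* Flag an I/O port; `io_bitmap` must point to 8192 bytes (pages A and B). */
void io_bitmap_set(uint8_t *io_bitmap, uint16_t port)
{
    io_bitmap[port / 8] |= (uint8_t)(1u << (port % 8));
}

/* True if an access to `port` would cause a VM-Exit (bit is 1). */
bool io_port_causes_exit(const uint8_t *io_bitmap, uint16_t port)
{
    return (io_bitmap[port / 8] >> (port % 8)) & 1u;
}
```

For instance, after io_bitmap_set(bitmap, 0x3F8) a guest access to port 0x3F8 (the first serial port on a PC) would be flagged, while port 0x3F9 would still pass through untouched.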
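The note above about reading MSRs before turning a setting on can be sketched as well. As far as I understand the Intel manual, the capability MSR for the processor-based controls reports in its low 32 bits which bits must be 1, and in its high 32 bits which bits may be 1. The macro and function names below are made up for this example (the bit positions are the ones from the manual), and the raw MSR value is passed in as a plain number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A few processor-based control bits (names are illustrative). */
#define CTL_HLT_EXITING        (1u << 7)
#define CTL_CR3_LOAD_EXITING   (1u << 15)
#define CTL_CR3_STORE_EXITING  (1u << 16)
#define CTL_USE_IO_BITMAPS     (1u << 25)
#define CTL_USE_MSR_BITMAPS    (1u << 28)

/* True if the single control `bit` may be set to 1 on this processor. */
bool control_supported(uint32_t bit, uint64_t cap_msr)
{
    assert(bit != 0 && (bit & (bit - 1)) == 0); /* exactly one bit set */
    return ((uint32_t)(cap_msr >> 32) & bit) != 0;
}

/* Force the mandatory bits on and drop everything the CPU cannot do. */
uint32_t adjust_controls(uint32_t desired, uint64_t cap_msr)
{
    uint32_t must_be_one = (uint32_t)cap_msr;         /* low half  */
    uint32_t may_be_one  = (uint32_t)(cap_msr >> 32); /* high half */

    return (desired | must_be_one) & may_be_one;
}
```

Silently dropping unsupported bits is just one possible design. A real hypervisor would probably rather refuse to start when a setting it depends on is missing.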
The VM-Exit control fields mainly define which registers are loaded or stored during a VM-Exit (mostly MSRs and debug registers).
As above, the VM-Entry control fields list which registers are loaded or stored on a VM-Entry.
But on top of that the VMM can specify interrupts which shall be injected into the guest’s execution flow.
To inject an interrupt into the guest’s execution flow, the VMM just has to specify the vector, i.e. the entry in the interrupt descriptor table. The processor will then prepare the necessary registers (depending on the interrupt) and execute the interrupt handler once it has returned into non-root mode.
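What the VMM writes for an injection is a single 32-bit value. The sketch below builds it in C, assuming the layout from the Intel manual: vector in bits 7:0, type in bits 10:8, an "error code follows" flag in bit 11 and a valid flag in bit 31. The enum and function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Event types as encoded in bits 10:8 (only a subset is listed). */
enum intr_type {
    INTR_TYPE_EXTERNAL     = 0, /* external (hardware) interrupt */
    INTR_TYPE_NMI          = 2,
    INTR_TYPE_HW_EXCEPTION = 3,
    INTR_TYPE_SW_INTERRUPT = 4
};

/* Build the value for the VM-entry interruption-information field. */
uint32_t make_injection_info(uint8_t vector, enum intr_type type,
                             bool has_error_code)
{
    uint32_t info = vector;              /* bits 7:0: entry in the IDT */

    assert((unsigned)type <= 7);         /* the type field is 3 bits wide */
    info |= (uint32_t)type << 8;         /* bits 10:8: kind of event */
    if (has_error_code)
        info |= 1u << 11;                /* bit 11: deliver an error code */
    info |= 1u << 31;                    /* bit 31: mark the field as valid */
    return info;
}
```

The VMM would then write this value into the VMCS (field encoding 0x4016) before the next VM-Entry. Injecting vector 32 as an external interrupt, for example, gives 0x80000020.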
VM-Exit information field
How does the VMM know the reason for the VM-Exit? Because the processor sets the relevant bits in this field. The VMM can read them afterwards and knows which instruction the guest tried to execute.
The VMM provides the memory for the VMCS, but how the data is laid out inside it is totally up to the processor (researchers have found a way to locate it in memory anyway).
So in order to read and write bits in the VMCS you have to use the instructions VMREAD/VMWRITE.
mov rax, 0x6800 ; 0x6800 encodes the guest-state field which holds the value of CR0
mov rbx, 0x1337 ; set value which is written
vmwrite rax, rbx ; writes value to VMCS
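The number 0x6800 is not random, by the way. As far as I understand the Intel manual, a field encoding packs the area and the width of a field into a few bits: bit 0 is the access type, bits 9:1 the index, bits 11:10 the area and bits 14:13 the width. Here is a small C sketch that decodes it. The type and function names are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum vmcs_field_type {            /* bits 11:10 */
    VMCS_TYPE_CONTROL     = 0,
    VMCS_TYPE_EXIT_INFO   = 1,    /* read-only VM-Exit information */
    VMCS_TYPE_GUEST_STATE = 2,
    VMCS_TYPE_HOST_STATE  = 3
};

enum vmcs_field_width {           /* bits 14:13 */
    VMCS_WIDTH_16      = 0,
    VMCS_WIDTH_64      = 1,
    VMCS_WIDTH_32      = 2,
    VMCS_WIDTH_NATURAL = 3        /* as wide as a general-purpose register */
};

struct vmcs_field {
    bool                  high_access; /* bit 0: upper half of a 64-bit field */
    unsigned              index;       /* bits 9:1 */
    enum vmcs_field_type  type;
    enum vmcs_field_width width;
};

struct vmcs_field vmcs_decode_field(uint32_t encoding)
{
    struct vmcs_field f;

    assert((encoding >> 15) == 0);     /* bits 31:15 are reserved */
    f.high_access = encoding & 1u;
    f.index = (encoding >> 1) & 0x1FFu;
    f.type  = (enum vmcs_field_type)((encoding >> 10) & 3u);
    f.width = (enum vmcs_field_width)((encoding >> 13) & 3u);
    return f;
}
```

Decoding 0x6800 this way gives a natural-width field with index 0 in the guest-state area, which fits the CR0 comment in the snippet above.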
At this point just think about those settings again.
Before the code of a guest can run on the processor, the VMM gets a chance to manipulate the whole environment of the guest:
registers, interrupts, I/O-related things and the memory (we’ll come to this in the next part). So the VMM holds all the basic hardware strings in its hands. Hypervisors like Xen, KVM or Hyper-V are built on top of that and can provide you with a much more abstract and easier way to spawn and manage VMs.
This was the second part. Please let me know if you have any questions or want more details.
In case you want to program a basic hypervisor yourself, you can follow the first of those links down there.
Do it yourself
- https://rayanfam.com/topics/hypervisor-from-scratch-part-1/ (building a kernel module)
- https://software.intel.com/sites/default/files/managed/7c/f1/326019-sdm-vol-3c.pdf (raw information)
- https://www.codeproject.com/Articles/215458/Virtualization-for-System-Programmers (same content as this, but in different words and with a PoC)
But I want to read about hacky stuff!1!