linux

q3k/linux

Author	SHA1	Message	Date
Eddie Dong	8668a3c468	KVM: VMX: Reset mmu context when entering real mode Resetting an SMP guest will force AP enter real mode (RESET) with paging enabled in protected mode. While current enter_rmode() can only handle mode switch from nonpaging mode to real mode which leads to SMP reboot failure. Fix by reloading the mmu context on entering real mode. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-22 12:03:28 +02:00
Izik Eidus	7f2145ad6f	KVM: MMU: Set shadow pte atomically in mmu_pte_write_zap_pte() Setting shadow page table entry should be set atomicly using set_shadow_pte(). Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-22 12:03:28 +02:00
Avi Kivity	2e3e5882dc	KVM: MMU: Don't do GFP_NOWAIT allocations Before preempt notifiers, kvm needed to allocate memory with GFP_NOWAIT so as not to have to enable preemption and take a heavyweight exit. On oom, we'd fall back to a GFP_KERNEL allocation. With preemption notifiers, we can do a GFP_KERNEL allocation, and perform the heavyweight exit only if the kernel decides to put us to sleep. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:27 +02:00
Christian Ehrhardt	cbdd1bea2a	KVM: Rename kvm_arch_ops to kvm_x86_ops This patch just renames the current (misnamed) _arch namings to _x86 to ensure better readability when a real arch layer takes place. Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:27 +02:00
Shaohua Li	11ec280471	KVM: Convert vm lock to a mutex This allows the kvm mmu to perform sleepy operations, such as memory allocation. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:20 +02:00
Avi Kivity	15ad71460d	KVM: Use the scheduler preemption notifiers to make kvm preemptible Current kvm disables preemption while the new virtualization registers are in use. This of course is not very good for latency sensitive workloads (one use of virtualization is to offload user interface and other latency insensitive stuff to a container, so that it is easier to analyze the remaining workload). This patch re-enables preemption for kvm; preemption is now only disabled when switching the registers in and out, and during the switch to guest mode and back. Contains fixes from Shaohua Li <shaohua.li@intel.com>. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:20 +02:00
Shaohua Li	fe55188194	KVM: Move gfn_to_page out of kmap/unmap pairs gfn_to_page might sleep with swap support. Move it out of the kmap calls. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:19 +02:00
Rusty Russell	707d92fa72	KVM: Trivial: Use standard CR0 flags macros from asm/cpu-features.h The kernel now has asm/cpu-features.h: use those macros instead of inventing our own. Also spell out definition of CR0_RESEVED_BITS (no code change) and fix typo. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-10-13 10:18:18 +02:00
Avi Kivity	22d95b1282	KVM: MMU: Fix rare oops on guest context switch A guest context switch to an uncached cr3 can require allocation of shadow pages, but we only recycle shadow pages in kvm_mmu_page_fault(). Move shadow page recycling to mmu_topup_memory_caches(), which is called from both the page fault handler and from guest cr3 reload. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-09-14 13:59:55 -07:00
Avi Kivity	c4d198d518	KVM: MMU: Fix cleaning up the shadow page allocation cache __free_page() wants a struct page, not a virtual address. Signed-off-by: Avi Kivity <avi@qumranet.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-07-20 23:48:47 -07:00
Avi Kivity	c1158e63df	KVM: MMU: Fix oopses with SLUB The kvm mmu uses page->private on shadow page tables; so does slub, and an oops result. Fix by allocating regular pages for shadows instead of using slub. Tested-by: S.Çağlar Onur <caglar@pardus.org.tr> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-20 20:23:59 +03:00
Avi Kivity	90cb0529dd	KVM: Fix memory slot management functions for guest smp The memory slot management functions were oriented against vcpu 0, where they should be kvm-wide. This causes hangs starting X on guest smp. Fix by making the functions (and resultant tail in the mmu) non-vcpu-specific. Unfortunately this reduces the efficiency of the mmu object cache a bit. We may have to revisit this later. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-20 20:16:29 +03:00
Paul Mundt	20c2df83d2	mm: Remove slab destructors from kmem_cache_create(). Slab destructors were no longer supported after Christoph's `c59def9f22` change. They've been BUGs for both slab and slub, and slob never supported them either. This rips out support for the dtor pointer from kmem_cache_create() completely and fixes up every single callsite in the kernel (there were about 224, not including the slab allocator definitions themselves, or the documentation references). Signed-off-by: Paul Mundt <lethal@linux-sh.org>	2007-07-20 10:11:58 +09:00
Avi Kivity	e495606dd0	KVM: Clean up #includes Remove unnecessary ones, and rearange the remaining in the standard order. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:49 +03:00
Shaohua Li	88a97f0b2f	KVM: MMU: Fix Wrong tlb flush order Need to flush the tlb after updating a pte, not before. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:48 +03:00
Avi Kivity	d9e368d612	KVM: Flush remote tlbs when reducing shadow pte permissions When a vcpu causes a shadow tlb entry to have reduced permissions, it must also clear the tlb on remote vcpus. We do that by: - setting a bit on the vcpu that requests a tlb flush before the next entry - if the vcpu is currently executing, we send an ipi to make sure it exits before we continue Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:46 +03:00
Avi Kivity	7b53aa5650	KVM: Fix vcpu freeing for guest smp A vcpu can pin up to four mmu shadow pages, which means the freeing loop will never terminate. Fix by first unpinning shadow pages on all vcpus, then freeing shadow pages. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:45 +03:00
Avi Kivity	17c3ba9d37	KVM: Lazy guest cr3 switching Switch guest paging context may require us to allocate memory, which might fail. Instead of wiring up error paths everywhere, make context switching lazy and actually do the switch before the next guest entry, where we can return an error if allocation fails. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:45 +03:00
Avi Kivity	bd2b2baa5c	KVM: MMU: Remove unused large page marker This has not been used for some time, as the same information is available in the page header. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:45 +03:00
Avi Kivity	b64b3763a5	KVM: MMU: Don't cache guest access bits in the shadow page table This was once used to avoid accessing the guest pte when upgrading the shadow pte from read-only to read-write. But usually we need to set the guest pte dirty or accessed bits anyway, so this wasn't really exploited. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:44 +03:00
Avi Kivity	fd97dc516c	KVM: MMU: Simpify accessed/dirty/present/nx bit handling Always set the accessed and dirty bit (since having them cleared causes a read-modify-write cycle), always set the present bit, and copy the nx bit from the guest. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:44 +03:00
Avi Kivity	e663ee64ae	KVM: MMU: Make setting shadow ptes atomic on i386 Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:44 +03:00
Avi Kivity	97a0a01ea9	KVM: MMU: Fold fix_write_pf() into set_pte_common() This prevents some work from being performed twice, and, more importantly, reduces the number of places where we modify shadow ptes. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:43 +03:00
Avi Kivity	63b1ad24d2	KVM: MMU: Fold fix_read_pf() into set_pte_common() Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:43 +03:00
Avi Kivity	e60d75ea29	KVM: MMU: Move set_pte_common() to pte width dependent code In preparation of some modifications. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:43 +03:00
Avi Kivity	d3d25b048b	KVM: MMU: Use slab caches for shadow pages and their headers Use slab caches instead of a simple custom list. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:43 +03:00
Avi Kivity	47ad8e689b	KVM: MMU: Store shadow page tables as kernel virtual addresses, not physical Simpifies things a bit. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:40 +03:00
Avi Kivity	4b02d6daa1	KVM: MMU: Simplify kvm_mmu_free_page() a tiny bit Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:40 +03:00
Avi Kivity	0028425f64	KVM: Update shadow pte on write to guest pte A typical demand page/copy on write pattern is: - page fault on vaddr - kvm propagates fault to guest - guest handles fault, updates pte - kvm traps write, clears shadow pte, resumes guest - guest returns to userspace, re-faults on same vaddr - kvm installs shadow pte, resumes guest - guest continues So, three vmexits for a single guest page fault. But if instead of clearing the page table entry, we update to correspond to the value that the guest has just written, we eliminate the third vmexit. This patch does exactly that, reducing kbuild time by about 10%. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:39 +03:00
Avi Kivity	fce0657ff9	KVM: MMU: Respect nonpae pagetable quadrant when zapping ptes When a guest writes to a page that has an mmu shadow, we have to clear the shadow pte corresponding to the memory location touched by the guest. Now, in nonpae mode, a single guest page may have two or four shadow pages (because a nonpae page maps 4MB or 4GB, whereas the pae shadow maps 2MB or 1GB), so we when we look up the page we find up to three additional aliases for the page. Since we _clear_ the shadow pte, it doesn't matter except for a slight performance penalty, but if we want to _update_ the shadow pte instead of clearing it, it is vital that we don't modify the aliases. Fortunately, exactly which page is needed (the "quadrant") is easily computed, and is accessible in the shadow page header. All we need is to ignore shadow pages from the wrong quadrants. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:39 +03:00
Avi Kivity	09072daf37	KVM: Unify kvm_mmu_pre_write() and kvm_mmu_post_write() Instead of calling two functions and repeating expensive checks, call one function and provide it with before/after information. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:38 +03:00
Avi Kivity	e925c5ba93	KVM: Assume that writes smaller than 4 bytes are to non-pagetable pages This allows us to remove write protection earlier than otherwise. Should some mad OS choose to use byte writes to update pagetables, it will suffer a performance hit, but still work correctly. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-07-16 12:05:38 +03:00
Adrian Bunk	2807696c37	KVM: fix an if() condition It might have worked in this case since PT_PRESENT_MASK is 1, but let's express this correctly. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:31 +03:00
Avi Kivity	1165f5fec1	KVM: Per-vcpu statistics Make the exit statistics per-vcpu instead of global. This gives a 3.5% boost when running one virtual machine per core on my two socket dual core (4 cores total) machine. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:30 +03:00
Yaozu Dong	d6c69ee9a2	KVM: MMU: Avoid heavy ASSERT at non debug mode. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:30 +03:00
Avi Kivity	8c4385024d	KVM: Retry sleeping allocation if atomic allocation fails This avoids -ENOMEM under memory pressure. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:29 +03:00
Avi Kivity	b5a33a7572	KVM: Use slab caches to allocate mmu data structures Better leak detection, statistics, memory use, speed -- goodness all around. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:29 +03:00
Avi Kivity	417726a3fb	KVM: Handle partial pae pdptr Some guests (Solaris) do not set up all four pdptrs, but leave some invalid. kvm incorrectly treated these as valid page directories, pinning the wrong pages and causing general confusion. Fix by checking the valid bit of a pae pdpte. This closes sourceforge bug 1698922. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:29 +03:00
Avi Kivity	954bbbc236	KVM: Simply gfn_to_page() Mapping a guest page to a host page is a common operation. Currently, one has first to find the memory slot where the page belongs (gfn_to_memslot), then locate the page itself (gfn_to_page()). This is clumsy, and also won't work well with memory aliases. So simplify gfn_to_page() not to require memory slot translation first, and instead do it internally. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:28 +03:00
Dor Laor	e0fa826f96	KVM: Add mmu cache clear function Functions that play around with the physical memory map need a way to clear mappings to possibly nonexistent or invalid memory. Both the mmu cache and the processor tlb are cleared. Signed-off-by: Dor Laor <dor.laor@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:28 +03:00
Avi Kivity	36868f7b0e	KVM: Use list_move() Use list_move() where possible. Noticed by Dor Laor. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:27 +03:00
Avi Kivity	d28c6cfbbc	KVM: MMU: Fix hugepage pdes mapping same physical address with different access The kvm mmu keeps a shadow page for hugepage pdes; if several such pdes map the same physical address, they share the same shadow page. This is a fairly common case (kernel mappings on i386 nonpae Linux, for example). However, if the two pdes map the same memory but with different permissions, kvm will happily use the cached shadow page. If the access through the more permissive pde will occur after the access to the strict pde, an endless pagefault loop will be generated and the guest will make no progress. Fix by making the access permissions part of the cache lookup key. The fix allows Xen pae to boot on kvm and run guest domains. Thanks to Jeremy Fitzhardinge for reporting the bug and testing the fix. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:27 +03:00
Avi Kivity	aac012245a	KVM: MMU: Remove global pte tracking The initial, noncaching, version of the kvm mmu flushed the all nonglobal shadow page table translations (much like a native tlb flush). The new implementation flushes translations only when they change, rendering global pte tracking superfluous. This removes the unused tracking mechanism and storage space. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:25 +03:00
Avi Kivity	039576c03c	KVM: Avoid guest virtual addresses in string pio userspace interface The current string pio interface communicates using guest virtual addresses, relying on userspace to translate addresses and to check permissions. This interface cannot fully support guest smp, as the check needs to take into account two pages at one in case an unaligned string transfer straddles a page boundary. Change the interface not to communicate guest addresses at all; instead use a buffer page (mmaped by userspace) and do transfers there. The kernel manages the virtual to physical translation and can perform the checks atomically by taking the appropriate locks. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:25 +03:00
Avi Kivity	1ea252afcd	KVM: Fix bogus sign extension in mmu mapping audit When auditing a 32-bit guest on a 64-bit host, sign extension of the page table directory pointer table index caused bogus addresses to be shown on audit errors. Fix by declaring the index unsigned. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-05-03 10:52:23 +03:00
Avi Kivity	6b8d0f9b18	KVM: Fix off-by-one when writing to a nonpae guest pde Nonpae guest pdes are shadowed by two pae ptes, so we double the offset twice: once to account for the pte size difference, and once because we need to shadow pdes for a single guest pde. But when writing to the upper guest pde we also need to truncate the lower bits, otherwise the multiply shifts these bits into the pde index and causes an access to the wrong shadow pde. If we're at the end of the page (accessing the very last guest pde) we can even overflow into the next host page and oops. Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-04-19 18:39:26 +03:00
Avi Kivity	27aba76615	KVM: MMU: Fix host memory corruption on i386 with >= 4GB ram PAGE_MASK is an unsigned long, so using it to mask physical addresses on i386 (which are 64-bit wide) leads to truncation. This can result in page->private of unrelated memory pages being modified, with disasterous results. Fix by not using PAGE_MASK for physical addresses; instead calculate the correct value directly from PAGE_SIZE. Also fix a similar BUG_ON(). Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-03-18 10:49:09 +02:00
Avi Kivity	ac1b714e78	KVM: MMU: Fix guest writes to nonpae pde KVM shadow page tables are always in pae mode, regardless of the guest setting. This means that a guest pde (mapping 4MB of memory) is mapped to two shadow pdes (mapping 2MB each). When the guest writes to a pte or pde, we intercept the write and emulate it. We also remove any shadowed mappings corresponding to the write. Since the mmu did not account for the doubling in the number of pdes, it removed the wrong entry, resulting in a mismatch between shadow page tables and guest page tables, followed shortly by guest memory corruption. This patch fixes the problem by detecting the special case of writing to a non-pae pde and adjusting the address and number of shadow pdes zapped accordingly. Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-03-18 10:49:09 +02:00
Markus Rechberger	5972e9535e	KVM: Use page_private()/set_page_private() apis Besides using an established api, this allows using kvm in older kernels. Signed-off-by: Markus Rechberger <markus.rechberger@amd.com> Signed-off-by: Avi Kivity <avi@qumranet.com>	2007-03-04 11:12:39 +02:00
Al Viro	11718b4d6b	[PATCH] misc NULL noise removal Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-09 09:14:07 -08:00

1 2

84 commits