Commit graph

1764 commits

Author SHA1 Message Date
Joerg Roedel
d09beabd7c KVM: x86 emulator: Add check_perm callback
This patch adds a check_perm callback for each opcode into
the instruction emulator. This will be used to do all
necessary permission checks on instructions before checking
whether they are intercepted or not.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:57:01 -04:00
Joerg Roedel
775fde8648 KVM: x86 emulator: Don't write-back cpu-state on X86EMUL_INTERCEPTED
This patch prevents the changed CPU state to be written back
when the emulator detected that the instruction was
intercepted by the guest.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:57:00 -04:00
Avi Kivity
3c6e276f22 KVM: x86 emulator: add SVM intercepts
Add intercept codes for instructions defined by SVM as
interceptable.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:57:00 -04:00
Avi Kivity
c4f035c60d KVM: x86 emulator: add framework for instruction intercepts
When running in guest mode, certain instructions can be intercepted by
hardware.  This also holds for nested guests running on emulated
virtualization hardware, in particular instructions emulated by kvm
itself.

This patch adds a framework for intercepting instructions.  If an
instruction is marked for interception, and if we're running in guest
mode, a callback is called to check whether an intercept is needed or
not.  The callback is called at three points in time: immediately after
beginning execution, after checking privilge exceptions, and after
checking memory exception.  This suits the different interception points
defined for different instructions and for the various virtualization
instruction sets.

In addition, a new X86EMUL_INTERCEPT is defined, which any callback or
memory access may define, allowing the more complicated intercepts to be
implemented in existing callbacks.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:57:00 -04:00
Avi Kivity
aa97bb4891 KVM: x86 emulator: implement movdqu instruction (f3 0f 6f, f3 0f 7f)
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:59 -04:00
Avi Kivity
1253791df9 KVM: x86 emulator: SSE support
Add support for marking an instruction as SSE, switching registers used
to the SSE register file.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:59 -04:00
Avi Kivity
0d7cdee83a KVM: x86 emulator: Specialize decoding for insns with 66/f2/f3 prefixes
Most SIMD instructions use the 66/f2/f3 prefixes to distinguish between
different variants of the same instruction.  Usually the encoding is quite
regular, but in some cases (including non-SIMD instructions) the prefixes
generate very different instructions.  Examples include XCHG/PAUSE,
MOVQ/MOVDQA/MOVDQU, and MOVBE/CRC32.

Allow the emulator to handle these special cases by splitting such opcodes
into groups, with different decode flags and execution functions for different
prefixes.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:59 -04:00
Avi Kivity
5037f6f324 KVM: x86 emulator: define callbacks for using the guest fpu within the emulator
Needed for emulating fpu instructions.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:58 -04:00
Avi Kivity
1d6b114f20 KVM: x86 emulator: do not munge rep prefix
Currently we store a rep prefix as 1 or 2 depending on whether it is a REPE or
REPNE.  Since sse instructions depend on the prefix value, store it as the
original opcode to simplify things further on.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:58 -04:00
Avi Kivity
cef4dea07f KVM: 16-byte mmio support
Since sse instructions can issue 16-byte mmios, we need to support them.  We
can't increase the kvm_run mmio buffer size to 16 bytes without breaking
compatibility, so instead we break the large mmios into two smaller 8-byte
ones.  Since the bus is 64-bit we aren't breaking any atomicity guarantees.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:58 -04:00
Avi Kivity
5287f194bf KVM: Split mmio completion into a function
Make room for sse mmio completions.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:58 -04:00
Avi Kivity
70252a1053 KVM: extend in-kernel mmio to handle >8 byte transactions
Needed for coalesced mmio using sse.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:58 -04:00
Gleb Natapov
1499e54af0 KVM: x86: better fix for race between nmi injection and enabling nmi window
Fix race between nmi injection and enabling nmi window in a simpler way.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-05-11 07:56:57 -04:00
Marcelo Tosatti
c761e5868e Revert "KVM: Fix race between nmi injection and enabling nmi window"
This reverts commit f86368493e.

Simpler fix to follow.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-05-11 07:56:57 -04:00
Glauber Costa
3291892450 KVM: expose async pf through our standard mechanism
As Avi recently mentioned, the new standard mechanism for exposing features
is KVM_GET_SUPPORTED_CPUID, not spamming CAPs. For some reason async pf
missed that.

So expose async_pf here.

Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: Gleb Natapov <gleb@redhat.com>
CC: Avi Kivity <avi@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:57 -04:00
Avi Kivity
654f06fc65 KVM: VMX: simplify NMI mask management
Use vmx_set_nmi_mask() instead of open-coding management of
the hardware bit and the software hint (nmi_known_unmasked).

There's a slight change of behaviour when running without
hardware virtual NMI support - we now clear the NMI mask if
NMI delivery faulted in that case as well.  This improves
emulation accuracy.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:57 -04:00
Jan Kiszka
89a9fb78b5 KVM: SVM: Remove unused svm_features
We use boot_cpu_has now.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:57 -04:00
Avi Kivity
8878647585 KVM: VMX: Use cached VM_EXIT_INTR_INFO in handle_exception
vmx_complete_atomic_exit() cached it for us, so we can use it here.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:56 -04:00
Avi Kivity
c5ca8e572c KVM: VMX: Don't VMREAD VM_EXIT_INTR_INFO unconditionally
Only read it if we're going to use it later.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:56 -04:00
Avi Kivity
00eba012d5 KVM: VMX: Refactor vmx_complete_atomic_exit()
Move the exit reason checks to the front of the function, for early
exit in the common case.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:56 -04:00
Avi Kivity
f9902069c4 KVM: VMX: Qualify check for host NMI
Check for the exit reason first; this allows us, later,
to avoid a VMREAD for VM_EXIT_INTR_INFO_FIELD.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:56 -04:00
Avi Kivity
9d58b93192 KVM: VMX: Avoid vmx_recover_nmi_blocking() when unneeded
When we haven't injected an interrupt, we don't need to recover
the nmi blocking state (since the guest can't set it by itself).
This allows us to avoid a VMREAD later on.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:56 -04:00
Avi Kivity
69c7302890 KVM: VMX: Cache cpl
We may read the cpl quite often in the same vmexit (instruction privilege
check, memory access checks for instruction and operands), so we gain
a bit if we cache the value.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:54 -04:00
Avi Kivity
f4c63e5d5a KVM: VMX: Optimize vmx_get_cpl()
In long mode, vm86 mode is disallowed, so we need not check for
it.  Reading rflags.vm may require a VMREAD, so it is expensive.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:54 -04:00
Avi Kivity
6de12732c4 KVM: VMX: Optimize vmx_get_rflags()
If called several times within the same exit, return cached results.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:54 -04:00
Avi Kivity
f6e7847589 KVM: Use kvm_get_rflags() and kvm_set_rflags() instead of the raw versions
Some rflags bits are owned by the host, not guest, so we need to use
kvm_get_rflags() to strip those bits away or kvm_set_rflags() to add them
back.

Signed-off-by: Avi Kivity <avi@redhat.com>
2011-05-11 07:56:54 -04:00
Andre Przywara
bd22f5cfcf KVM: move and fix substitue search for missing CPUID entries
If KVM cannot find an exact match for a requested CPUID leaf, the
code will try to find the closest match instead of simply confessing
it's failure.
The implementation was meant to satisfy the CPUID specification, but
did not properly check for extended and standard leaves and also
didn't account for the index subleaf.
Beside that this rule only applies to CPUID intercepts, which is not
the only user of the kvm_find_cpuid_entry() function.

So fix this algorithm and call it from kvm_emulate_cpuid().
This fixes a crash of newer Linux kernels as KVM guests on
AMD Bulldozer CPUs, where bogus values were returned in response to
a CPUID intercept.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-04-06 13:15:56 +03:00
Andre Przywara
20800bc940 KVM: fix XSAVE bit scanning
When KVM scans the 0xD CPUID leaf for propagating the XSAVE save area
leaves, it assumes that the leaves are contigious and stops at the
first zero one. On AMD hardware there is a gap, though, as LWP uses
leaf 62 to announce it's state save area.
So lets iterate through all 64 possible leaves and simply skip zero
ones to also cover later features.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-04-06 13:15:55 +03:00
Linus Torvalds
f2e1fbb5f2 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Flush TLB if PGD entry is changed in i386 PAE mode
  x86, dumpstack: Correct stack dump info when frame pointer is available
  x86: Clean up csum-copy_64.S a bit
  x86: Fix common misspellings
  x86: Fix misspelling and align params
  x86: Use PentiumPro-optimized partial_csum() on VIA C7
2011-03-18 10:45:21 -07:00
Lucas De Marchi
0d2eb44f63 x86: Fix common misspellings
They were generated by 'codespell' and then manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
Cc: trivial@kernel.org
LKML-Reference: <1300389856-1099-3-git-send-email-lucas.demarchi@profusion.mobi>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-03-18 10:39:30 +01:00
Gleb Natapov
776e58ea3d KVM: unbreak userspace that does not sets tss address
Commit 6440e5967bc broke old userspaces that do not set tss address
before entering vcpu. Unbreak it by setting tss address to a safe
value on the first vcpu entry. New userspaces should set tss address,
so print warning in case it doesn't.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-03-17 13:08:35 -03:00
Xiao Guangrong
0f53b5b1c0 KVM: MMU: cleanup pte write path
This patch does:
- call vcpu->arch.mmu.update_pte directly
- use gfn_to_pfn_atomic in update_pte path

The suggestion is from Avi.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:35 -03:00
Xiao Guangrong
5d163b1c9d KVM: MMU: introduce a common function to get no-dirty-logged slot
Cleanup the code of pte_prefetch_gfn_to_memslot and mapping_level_dirty_bitmap

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:34 -03:00
Xiao Guangrong
40dcaa9f69 KVM: fix rcu usage in init_rmode_* functions
fix:
[ 3494.671786] stack backtrace:
[ 3494.671789] Pid: 10527, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23
[ 3494.671790] Call Trace:
[ 3494.671796]  [] ? lockdep_rcu_dereference+0x9d/0xa5
[ 3494.671826]  [] ? kvm_memslots+0x6b/0x73 [kvm]
[ 3494.671834]  [] ? gfn_to_memslot+0x16/0x4f [kvm]
[ 3494.671843]  [] ? gfn_to_hva+0x16/0x27 [kvm]
[ 3494.671851]  [] ? kvm_write_guest_page+0x31/0x83 [kvm]
[ 3494.671861]  [] ? kvm_clear_guest_page+0x1a/0x1c [kvm]
[ 3494.671867]  [] ? vmx_set_tss_addr+0x83/0x122 [kvm_intel]

and:
[ 8328.789599] stack backtrace:
[ 8328.789601] Pid: 18736, comm: qemu-system-x86 Not tainted 2.6.38-rc6+ #23
[ 8328.789603] Call Trace:
[ 8328.789609]  [] ? lockdep_rcu_dereference+0x9d/0xa5
[ 8328.789621]  [] ? kvm_memslots+0x6b/0x73 [kvm]
[ 8328.789628]  [] ? gfn_to_memslot+0x16/0x4f [kvm]
[ 8328.789635]  [] ? gfn_to_hva+0x16/0x27 [kvm]
[ 8328.789643]  [] ? kvm_write_guest_page+0x31/0x83 [kvm]
[ 8328.789699]  [] ? kvm_clear_guest_page+0x1a/0x1c [kvm]
[ 8328.789713]  [] ? vmx_create_vcpu+0x316/0x3c8 [kvm_intel]

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:34 -03:00
Nikola Ciprich
1aa8ceef03 KVM: fix kvmclock regression due to missing clock update
commit 387b9f97750444728962b236987fbe8ee8cc4f8c moved kvm_request_guest_time_update(vcpu),
breaking 32bit SMP guests using kvm-clock. Fix this by moving (new) clock update function
to proper place.

Signed-off-by: Nikola Ciprich <nikola.ciprich@linuxbox.cz>
Acked-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:33 -03:00
Gleb Natapov
399a40c92d KVM: emulator: Fix permission checking in io permission bitmap
Currently if io port + len crosses 8bit boundary in io permission bitmap the
check may allow IO that otherwise should not be allowed. The patch fixes that.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:33 -03:00
Gleb Natapov
5601d05b8c KVM: emulator: Fix io permission checking for 64bit guest
Current implementation truncates upper 32bit of TR base address during IO
permission bitmap check. The patch fixes this.

Reported-and-tested-by: Francis Moreau <francis.moro@gmail.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:33 -03:00
Avi Kivity
831ca6093c KVM: SVM: Load %gs earlier if CONFIG_X86_32_LAZY_GS=n
With CONFIG_CC_STACKPROTECTOR, we need a valid %gs at all times, so disable
lazy reload and do an eager reload immediately after the vmexit.

Reported-by: IVAN ANGELOV <ivangotoy@gmail.com>
Acked-By: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:33 -03:00
Takuya Yoshikawa
afc20184b7 KVM: x86: Remove useless regs_page pointer from kvm_lapic
Access to this page is mostly done through the regs member which holds
the address to this page.  The exceptions are in vmx_vcpu_reset() and
kvm_free_lapic() and these both can easily be converted to using regs.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:33 -03:00
Xiao Guangrong
676646ee4b KVM: MMU: remove unused macros
These macros are not used, so removed

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:32 -03:00
Xiao Guangrong
842f22ed9b KVM: MMU: cleanup page alloc and free
Using __get_free_page instead of alloc_page and page_address,
using free_page instead of __free_page and virt_to_page

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:32 -03:00
Xiao Guangrong
49b26e26e4 KVM: MMU: do not record gfn in kvm_mmu_pte_write
No need to record the gfn to verifier the pte has the same mode as
current vcpu, it's because we only speculatively update the pte only
if the pte and vcpu have the same mode

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:32 -03:00
Xiao Guangrong
48c0e4e906 KVM: MMU: move mmu pages calculated out of mmu lock
kvm_mmu_calculate_mmu_pages need to walk all memslots and it's protected by
kvm->slots_lock, so move it out of mmu spinlock

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:32 -03:00
Xiao Guangrong
1b7fd45c32 KVM: MMU: set spte accessed bit properly
Set spte accessed bit only if guest_initiated == 1 that means the really
accessed

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:32 -03:00
Xiao Guangrong
da8dc75f0c KVM: MMU: fix kvm_mmu_slot_remove_write_access dropping intermediate W bits
Only remove write access in the last sptes.

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:32 -03:00
Lai Jiangshan
1260edbe7d KVM: better readability of efer_reserved_bits
use EFER_SCE, EFER_LME and EFER_LMA instead of magic numbers.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:31 -03:00
Lai Jiangshan
d170c41906 KVM: Clear async page fault hash after switching to real mode
The hash array of async gfns may still contain some left gfns after
kvm_clear_async_pf_completion_queue() called, need to clear them.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:31 -03:00
Gleb Natapov
93ea5388ea KVM: VMX: Initialize vm86 TSS only once.
Currently vm86 task is initialized on each real mode entry and vcpu
reset. Initialization is done by zeroing TSS and updating relevant
fields. But since all vcpus are using the same TSS there is a race where
one vcpu may use TSS while other vcpu is initializing it, so the vcpu
that uses TSS will see wrong TSS content and will behave incorrectly.
Fix that by initializing TSS only once.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:31 -03:00
Gleb Natapov
a8ba6c2622 KVM: VMX: update live TR selector if it changes in real mode
When rmode.vm86 is active TR descriptor is updated with vm86 task values,
but selector is left intact. vmx_set_segment() makes sure that if TR
register is written into while vm86 is active the new values are saved
for use after vm86 is deactivated, but since selector is not updated on
vm86 activation/deactivation new value is lost. Fix this by writing new
selector into vmcs immediately.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2011-03-17 13:08:31 -03:00
Lai Jiangshan
a3b5ba49a8 KVM: VMX: add the __noclone attribute to vmx_vcpu_run
The changelog of 104f226 said "adds the __noclone attribute",
but it was missing in its patch. I think it is still needed.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2011-03-17 13:08:31 -03:00