Commit Graph

422 Commits (95cb229530f329ec8002274891793be9c91385f7)

Author SHA1 Message Date
Avi Kivity e5bb40251a KVM: Drop kvm_{load,put}_guest_fpu() exports
Not used anymore.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:36:04 -03:00
Avi Kivity 2608d7a12f KVM: Allow kvm_load_guest_fpu() even when !vcpu->fpu_active
This allows accessing the guest fpu from the instruction emulator, as well as
being symmetric with kvm_put_guest_fpu().

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:36:03 -03:00
Gleb Natapov ab344828eb KVM: x86: fix checking of cr0 validity
Move to/from Control Registers chapter of Intel SDM says.  "Reserved bits
in CR0 remain clear after any load of those registers; attempts to set
them have no impact". Control Register chapter says "Bits 63:32 of CR0 are
reserved and must be written with zeros. Writing a nonzero value to any
of the upper 32 bits results in a general-protection exception, #GP(0)."

This patch tries to implement this twisted logic.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Reported-by: Lorenzo Martignoni <martignlo@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:36:03 -03:00
Jan Kiszka c76de350c8 KVM: SVM: Clean up and enhance mov dr emulation
Enhance mov dr instruction emulation used by SVM so that it properly
handles dr4/5: alias to dr6/7 if cr4.de is cleared. Otherwise return
EMULATE_FAIL which will let our only possible caller in that scenario,
ud_interception, re-inject UD.

We do not need to inject faults, SVM does this for us (exceptions take
precedence over instruction interceptions). For the same reason, the
value overflow checks can be removed.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:36:02 -03:00
Marcelo Tosatti 6474920477 KVM: fix cleanup_srcu_struct on vm destruction
cleanup_srcu_struct on VM destruction remains broken:

BUG: unable to handle kernel paging request at ffffffffffffffff
IP: [<ffffffff802533d2>] srcu_read_lock+0x16/0x21
RIP: 0010:[<ffffffff802533d2>]  [<ffffffff802533d2>] srcu_read_lock+0x16/0x21
Call Trace:
 [<ffffffffa05354c4>] kvm_arch_vcpu_uninit+0x1b/0x48 [kvm]
 [<ffffffffa05339c6>] kvm_vcpu_uninit+0x9/0x15 [kvm]
 [<ffffffffa0569f7d>] vmx_free_vcpu+0x7f/0x8f [kvm_intel]
 [<ffffffffa05357b5>] kvm_arch_destroy_vm+0x78/0x111 [kvm]
 [<ffffffffa053315b>] kvm_put_kvm+0xd4/0xfe [kvm]

Move it to kvm_arch_destroy_vm.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
2010-03-01 12:36:01 -03:00
Gleb Natapov ccd469362e KVM: fix Hyper-V hypercall warnings and wrong mask value
Fix compilation warnings and wrong mask value.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:36:01 -03:00
Gleb Natapov c25bc1638a KVM: Implement NotifyLongSpinWait HYPER-V hypercall
Windows issues this hypercall after guest was spinning on a spinlock
for too many iterations.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:36:00 -03:00
Gleb Natapov 10388a0716 KVM: Add HYPER-V apic access MSRs
Implement HYPER-V apic MSRs. Spec defines three MSRs that speed-up
access to EOI/TPR/ICR apic registers for PV guests.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:36:00 -03:00
Gleb Natapov 55cd8e5a4e KVM: Implement bare minimum of HYPER-V MSRs
Minimum HYPER-V implementation should have GUEST_OS_ID, HYPERCALL and
VP_INDEX MSRs.

[avi: fix build on i386]

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:57 -03:00
Avi Kivity f9a48e6a18 KVM: Set cr0.et when the guest writes cr0
Follow the hardware.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:51 -03:00
Avi Kivity 02daab21d9 KVM: Lazify fpu activation and deactivation
Defer fpu deactivation as much as possible - if the guest fpu is loaded, keep
it loaded until the next heavyweight exit (where we are forced to unload it).
This reduces unnecessary exits.

We also defer fpu activation on clts; while clts signals the intent to use the
fpu, we can't be sure the guest will actually use it.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:50 -03:00
Avi Kivity 4d4ec08745 KVM: Replace read accesses of vcpu->arch.cr0 by an accessor
Since we'd like to allow the guest to own a few bits of cr0 at times, we need
to know when we access those bits.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:50 -03:00
Sheng Yang 17cc393596 KVM: x86: Rename gb_page_enable() to get_lpage_level() in kvm_x86_ops
Then the callback can provide the maximum supported large page level, which
is more flexible.

Also move the gb page support into x86_64 specific.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:35:46 -03:00
Marcelo Tosatti 79fac95ecf KVM: convert slots_lock to a mutex
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:35:45 -03:00
Marcelo Tosatti f656ce0185 KVM: switch vcpu context to use SRCU
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:35:45 -03:00
Marcelo Tosatti e93f8a0f82 KVM: convert io_bus to SRCU
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:35:45 -03:00
Marcelo Tosatti a983fb2387 KVM: x86: switch kvm_set_memory_alias to SRCU update
Using a similar two-step procedure as for memslots.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:35:45 -03:00
Marcelo Tosatti b050b015ab KVM: use SRCU for dirty log
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:35:44 -03:00
Marcelo Tosatti f7784b8ec9 KVM: split kvm_arch_set_memory_region into prepare and commit
Required for SRCU convertion later.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:35:44 -03:00
Marcelo Tosatti fef9cce0eb KVM: modify alias layout in x86s struct kvm_arch
Have a pointer to an allocated region inside x86's kvm_arch.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:35:43 -03:00
Marcelo Tosatti 46a26bf557 KVM: modify memslots layout in struct kvm
Have a pointer to an allocated region inside struct kvm.

[alex: fix ppc book 3s]

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:35:43 -03:00
Sheng Yang 4e47c7a6d7 KVM: VMX: Add instruction rdtscp support for guest
Before enabling, execution of "rdtscp" in guest would result in #UD.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:40 -03:00
Sheng Yang 0e85188049 KVM: Add cpuid_update() callback to kvm_x86_ops
Sometime, we need to adjust some state in order to reflect guest CPUID
setting, e.g. if we don't expose rdtscp to guest, we won't want to enable
it on hardware. cpuid_update() is introduced for this purpose.

Also export kvm_find_cpuid_entry() for later use.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:40 -03:00
Sheng Yang 2bf78fa7b9 KVM: Extended shared_msr_global to per CPU
shared_msr_global saved host value of relevant MSRs, but it have an
assumption that all MSRs it tracked shared the value across the different
CPUs. It's not true with some MSRs, e.g. MSR_TSC_AUX.

Extend it to per CPU to provide the support of MSR_TSC_AUX, and more
alike MSRs.

Notice now the shared_msr_global still have one assumption: it can only deal
with the MSRs that won't change in host after KVM module loaded.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:40 -03:00
Avi Kivity fc78f51938 KVM: Add accessor for reading cr4 (or some bits of cr4)
Some bits of cr4 can be owned by the guest on vmx, so when we read them,
we copy them to the vcpu structure.  In preparation for making the set of
guest-owned bits dynamic, use helpers to access these bits so we don't need
to know where the bit resides.

No changes to svm since all bits are host-owned there.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:39 -03:00
Marcelo Tosatti cb84b55f6c KVM: x86: raise TSS exception for NULL CS and SS segments
Windows 2003 uses task switch to triple fault and reboot (the other
exception being reserved pdptrs bits).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:35:38 -03:00
Eddie Dong 3fd28fce76 KVM: x86: make double/triple fault promotion generic to all exceptions
Move Double-Fault generation logic out of page fault
exception generating function to cover more generic case.

Signed-off-by: Eddie Dong <eddie.dong@intel.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:35:38 -03:00
Jason Wang 923de3cf5b kvmclock: count total_sleep_time when updating guest clock
Current kvm wallclock does not consider the total_sleep_time which could cause
wrong wallclock in guest after host suspend/resume. This patch solve
this issue by counting total_sleep_time to get the correct host boot time.

Cc: stable@kernel.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-02-09 19:20:15 +02:00
Wei Yongjun 443c39bc9e KVM: x86: Fix leak of free lapic date in kvm_arch_vcpu_init()
In function kvm_arch_vcpu_init(), if the memory malloc for
vcpu->arch.mce_banks is fail, it does not free the memory
of lapic date. This patch fixed it.

Cc: stable@kernel.org
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-01-25 12:26:40 -02:00
Wei Yongjun 36cb93fd6b KVM: x86: Fix probable memory leak of vcpu->arch.mce_banks
vcpu->arch.mce_banks is malloc in kvm_arch_vcpu_init(), but
never free in any place, this may cause memory leak. So this
patch fixed to free it in kvm_arch_vcpu_uninit().

Cc: stable@kernel.org
Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-01-25 12:26:40 -02:00
Jan Kiszka dab4b911a5 KVM: x86: Extend KVM_SET_VCPU_EVENTS with selective updates
User space may not want to overwrite asynchronously changing VCPU event
states on write-back. So allow to skip nmi.pending and sipi_vector by
setting corresponding bits in the flags field of kvm_vcpu_events.

[avi: advertise the bits in KVM_GET_VCPU_EVENTS]

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-27 13:36:33 -02:00
Linus Torvalds ed9216c171 Merge branch 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (84 commits)
  KVM: VMX: Fix comparison of guest efer with stale host value
  KVM: s390: Fix prefix register checking in arch/s390/kvm/sigp.c
  KVM: Drop user return notifier when disabling virtualization on a cpu
  KVM: VMX: Disable unrestricted guest when EPT disabled
  KVM: x86 emulator: limit instructions to 15 bytes
  KVM: s390: Make psw available on all exits, not just a subset
  KVM: x86: Add KVM_GET/SET_VCPU_EVENTS
  KVM: VMX: Report unexpected simultaneous exceptions as internal errors
  KVM: Allow internal errors reported to userspace to carry extra data
  KVM: Reorder IOCTLs in main kvm.h
  KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG
  KVM: only clear irq_source_id if irqchip is present
  KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic
  KVM: x86: disallow multiple KVM_CREATE_IRQCHIP
  KVM: VMX: Remove vmx->msr_offset_efer
  KVM: MMU: update invlpg handler comment
  KVM: VMX: move CR3/PDPTR update to vmx_set_cr3
  KVM: remove duplicated task_switch check
  KVM: powerpc: Fix BUILD_BUG_ON condition
  KVM: VMX: Use shared msr infrastructure
  ...

Trivial conflicts due to new Kconfig options in arch/Kconfig and kernel/Makefile
2009-12-08 08:02:38 -08:00
Avi Kivity d5696725b2 KVM: VMX: Fix comparison of guest efer with stale host value
update_transition_efer() masks out some efer bits when deciding whether
to switch the msr during guest entry; for example, NX is emulated using the
mmu so we don't need to disable it, and LMA/LME are handled by the hardware.

However, with shared msrs, the comparison is made against a stale value;
at the time of the guest switch we may be running with another guest's efer.

Fix by deferring the mask/compare to the actual point of guest entry.

Noted by Marcelo.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:34:20 +02:00
Avi Kivity 3548bab501 KVM: Drop user return notifier when disabling virtualization on a cpu
This way, we don't leave a dangling notifier on cpu hotunplug or module
unload.  In particular, module unload leaves the notifier pointing into
freed memory.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:26 +02:00
Jan Kiszka 3cfc3092f4 KVM: x86: Add KVM_GET/SET_VCPU_EVENTS
This new IOCTL exports all yet user-invisible states related to
exceptions, interrupts, and NMIs. Together with appropriate user space
changes, this fixes sporadic problems of vmsave/restore, live migration
and system reset.

[avi: future-proof abi by adding a flags field]

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:25 +02:00
Jan Kiszka 4f926bf291 KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG
Decouple KVM_GUESTDBG_INJECT_DB and KVM_GUESTDBG_INJECT_BP from
KVM_GUESTDBG_ENABLE, their are actually orthogonal. At this chance,
avoid triggering the WARN_ON in kvm_queue_exception if there is already
an exception pending and reject such invalid requests.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:24 +02:00
Marcelo Tosatti 2204ae3c96 KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic
Otherwise kvm might attempt to dereference a NULL pointer.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:23 +02:00
Marcelo Tosatti 3ddea128ad KVM: x86: disallow multiple KVM_CREATE_IRQCHIP
Otherwise kvm will leak memory on multiple KVM_CREATE_IRQCHIP.
Also serialize multiple accesses with kvm->lock.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:23 +02:00
Marcelo Tosatti 7c93be44a4 KVM: VMX: move CR3/PDPTR update to vmx_set_cr3
GUEST_CR3 is updated via kvm_set_cr3 whenever CR3 is modified from
outside guest context. Similarly pdptrs are updated via load_pdptrs.

Let kvm_set_cr3 perform the update, removing it from the vcpu_run
fast path.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Acked-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:22 +02:00
Gleb Natapov 1655e3a3dc KVM: remove duplicated task_switch check
Probably introduced by a bad merge.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:22 +02:00
Avi Kivity 18863bdd60 KVM: x86 shared msr infrastructure
The various syscall-related MSRs are fairly expensive to switch.  Currently
we switch them on every vcpu preemption, which is far too often:

- if we're switching to a kernel thread (idle task, threaded interrupt,
  kernel-mode virtio server (vhost-net), for example) and back, then
  there's no need to switch those MSRs since kernel threasd won't
  be exiting to userspace.

- if we're switching to another guest running an identical OS, most likely
  those MSRs will have the same value, so there's little point in reloading
  them.

- if we're running the same OS on the guest and host, the MSRs will have
  identical values and reloading is unnecessary.

This patch uses the new user return notifiers to implement last-minute
switching, and checks the msr values to avoid unnecessary reloading.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:21 +02:00
Glauber Costa afbcf7ab8d KVM: allow userspace to adjust kvmclock offset
When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.

Situation is much worse when we do the opposite, and migrate to a host with
a smaller monotonic clock.

This proposed ioctl will allow userspace to inform us what is the monotonic
clock value in the source host, so we can keep the time skew short, and
more importantly, never goes backwards. Userspace may also need to trigger
the current data, since from the first migration onwards, it won't be
reflected by a simple call to clock_gettime() anymore.

[marcelo: future-proof abi with a flags field]
[jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]

Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:19 +02:00
Jan Kiszka 94fe45da48 KVM: x86: Fix guest single-stepping while interruptible
Commit 705c5323 opened the doors of hell by unconditionally injecting
single-step flags as long as guest_debug signaled this. This doesn't
work when the guest branches into some interrupt or exception handler
and triggers a vmexit with flag reloading.

Fix it by saving cs:rip when user space requests single-stepping and
restricting the trace flag injection to this guest code position.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:19 +02:00
Ed Swierk ffde22ac53 KVM: Xen PV-on-HVM guest support
Support for Xen PV-on-HVM guests can be implemented almost entirely in
userspace, except for handling one annoying MSR that maps a Xen
hypercall blob into guest address space.

A generic mechanism to delegate MSR writes to userspace seems overkill
and risks encouraging similar MSR abuse in the future.  Thus this patch
adds special support for the Xen HVM MSR.

I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
KVM which MSR the guest will write to, as well as the starting address
and size of the hypercall blobs (one each for 32-bit and 64-bit) that
userspace has loaded from files.  When the guest writes to the MSR, KVM
copies one page of the blob from userspace to the guest.

I've tested this patch with a hacked-up version of Gerd's userspace
code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.

[jan: fix i386 build warning]
[avi: future proof abi with a flags field]

Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:18 +02:00
Jan Kiszka 94c30d9ca6 KVM: x86: Drop unneeded CONFIG_HAS_IOMEM check
This (broken) check dates back to the days when this code was shared
across architectures. x86 has IOMEM, so drop it.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:18 +02:00
Zachary Amsden 6b7d7e762b KVM: x86: Harden against cpufreq
If cpufreq can't determine the CPU khz, or cpufreq is not compiled in,
we should fallback to the measured TSC khz.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:18 +02:00
Joerg Roedel 532a46b989 KVM: SVM: Add tracepoint for skinit instruction
This patch adds a tracepoint for the event that the guest
executed the SKINIT instruction. This information is
important because SKINIT is an SVM extenstion not yet
implemented by nested SVM and we may need this information
for debugging hypervisors that do not yet run on nested SVM.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:16 +02:00
Joerg Roedel ec1ff79084 KVM: SVM: Add tracepoint for invlpga instruction
This patch adds a tracepoint for the event that the guest
executed the INVLPGA instruction.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:16 +02:00
Joerg Roedel 236649de33 KVM: SVM: Add tracepoint for #vmexit because intr pending
This patch adds a special tracepoint for the event that a
nested #vmexit is injected because kvm wants to inject an
interrupt into the guest.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:16 +02:00
Joerg Roedel 17897f3668 KVM: SVM: Add tracepoint for injected #vmexit
This patch adds a tracepoint for a nested #vmexit that gets
re-injected to the guest.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:15 +02:00
Joerg Roedel d8cabddf7e KVM: SVM: Add tracepoint for nested #vmexit
This patch adds a tracepoint for every #vmexit we get from a
nested guest.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:15 +02:00
Joerg Roedel 0ac406de8f KVM: SVM: Add tracepoint for nested vmrun
This patch adds a dedicated kvm tracepoint for a nested
vmrun.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:15 +02:00
Glauber Costa e3267cbbbf KVM: x86: include pvclock MSRs in msrs_to_save
For a while now, we are issuing a rdmsr instruction to find out which
msrs in our save list are really supported by the underlying machine.
However, it fails to account for kvm-specific msrs, such as the pvclock
ones.

This patch moves then to the beginning of the list, and skip testing them.

Cc: stable@kernel.org
Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:14 +02:00
Jan Kiszka 91586a3b7d KVM: x86: Rework guest single-step flag injection and filtering
Push TF and RF injection and filtering on guest single-stepping into the
vender get/set_rflags callbacks. This makes the whole mechanism more
robust wrt user space IOCTL order and instruction emulations.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:14 +02:00
Marcelo Tosatti a68a6a7282 KVM: x86: disable paravirt mmu reporting
Disable paravirt MMU capability reporting, so that new (or rebooted)
guests switch to native operation.

Paravirt MMU is a burden to maintain and does not bring significant
advantages compared to shadow anymore.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:14 +02:00
Jan Kiszka 355be0b930 KVM: x86: Refactor guest debug IOCTL handling
Much of so far vendor-specific code for setting up guest debug can
actually be handled by the generic code. This also fixes a minor deficit
in the SVM part /wrt processing KVM_GUESTDBG_ENABLE.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:14 +02:00
Juan Quintela 201d945bcf KVM: remove pre_task_link setting in save_state_to_tss16
Now, also remove pre_task_link setting in save_state_to_tss16.

  commit b237ac37a1
  Author: Gleb Natapov <gleb@redhat.com>
  Date:   Mon Mar 30 16:03:24 2009 +0300

    KVM: Fix task switch back link handling.

CC: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:13 +02:00
Zachary Amsden 0cca790753 KVM: Kill the confusing tsc_ref_khz and ref_freq variables
They are globals, not clearly protected by any ordering or locking, and
vulnerable to various startup races.

Instead, for variable TSC machines, register the cpufreq notifier and get
the TSC frequency directly from the cpufreq machinery.  Not only is it
always right, it is also perfectly accurate, as no error prone measurement
is required.

On such machines, when a new CPU online is brought online, it isn't clear what
frequency it will start with, and it may not correspond to the reference, thus
in hardware_enable we clear the cpu_tsc_khz variable to zero and make sure
it is set before running on a VCPU.

Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:12 +02:00
Zachary Amsden b820cc0ca2 KVM: Separate timer intialization into an indepedent function
Signed-off-by: Zachary Amsden <zamsden@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:12 +02:00
Alexander Graf 10474ae894 KVM: Activate Virtualization On Demand
X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).

Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.

To circumvent this problem at least a bit, this patch introduces on
demand activation of virtualization. This means, that instead
virtualization is enabled on creation of the first virtual machine
and disabled on destruction of the last one.

So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:10 +02:00
Avi Kivity 367e1319b2 KVM: Return -ENOTTY on unrecognized ioctls
Not the incorrect -EINVAL.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:08 +02:00
Gleb Natapov 680b3648ba KVM: Drop kvm->irq_lock lock from irq injection path
The only thing it protects now is interrupt injection into lapic and
this can work lockless. Even now with kvm->irq_lock in place access
to lapic is not entirely serialized since vcpu access doesn't take
kvm->irq_lock.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:08 +02:00
Gleb Natapov eba0226bdf KVM: Move IO APIC to its own lock
The allows removal of irq_lock from the injection path.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:08 +02:00
Avi Kivity 851ba6922a KVM: Don't pass kvm_run arguments
They're just copies of vcpu->run, which is readily accessible.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:06 +02:00
Ingo Molnar 96200591a3 Merge branch 'tracing/hw-breakpoints' into perf/core
Conflicts:
	arch/x86/kernel/kprobes.c
	kernel/trace/Makefile

Merge reason: hw-breakpoints perf integration is looking
              good in testing and in reviews, plus conflicts
              are mounting up - so merge & resolve.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-21 14:07:23 +01:00
Frederic Weisbecker 59d8eb53ea hw-breakpoints: Wrap in the KVM breakpoint active state check
Wrap in the cpu dr7 check that tells if we have active
breakpoints that need to be restored in the cpu.

This wrapper makes the check more self-explainable and also
reusable for any further other uses.

Reported-by: Jan Kiszka <jan.kiszka@web.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: "K. Prasad" <prasad@linux.vnet.ibm.com>
2009-11-10 11:23:43 +01:00
Frederic Weisbecker 24f1e32c60 hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events
This patch rebase the implementation of the breakpoints API on top of
perf events instances.

Each breakpoints are now perf events that handle the
register scheduling, thread/cpu attachment, etc..

The new layering is now made as follows:

       ptrace       kgdb      ftrace   perf syscall
          \          |          /         /
           \         |         /         /
                                        /
            Core breakpoint API        /
                                      /
                     |               /
                     |              /

              Breakpoints perf events

                     |
                     |

               Breakpoints PMU ---- Debug Register constraints handling
                                    (Part of core breakpoint API)
                     |
                     |

             Hardware debug registers

Reasons of this rewrite:

- Use the centralized/optimized pmu registers scheduling,
  implying an easier arch integration
- More powerful register handling: perf attributes (pinned/flexible
  events, exclusive/non-exclusive, tunable period, etc...)

Impact:

- New perf ABI: the hardware breakpoints counters
- Ptrace breakpoints setting remains tricky and still needs some per
  thread breakpoints references.

Todo (in the order):

- Support breakpoints perf counter events for perf tools (ie: implement
  perf_bpcounter_event())
- Support from perf tools

Changes in v2:

- Follow the perf "event " rename
- The ptrace regression have been fixed (ptrace breakpoint perf events
  weren't released when a task ended)
- Drop the struct hw_breakpoint and store generic fields in
  perf_event_attr.
- Separate core and arch specific headers, drop
  asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h
- Use new generic len/type for breakpoint
- Handle off case: when breakpoints api is not supported by an arch

Changes in v3:

- Fix broken CONFIG_KVM, we need to propagate the breakpoint api
  changes to kvm when we exit the guest and restore the bp registers
  to the host.

Changes in v4:

- Drop the hw_breakpoint_restore() stub as it is only used by KVM
- EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a
  module
- Restore the breakpoints unconditionally on kvm guest exit:
  TIF_DEBUG_THREAD doesn't anymore cover every cases of running
  breakpoints and vcpu->arch.switch_db_regs might not always be
  set when the guest used debug registers.
  (Waiting for a reliable optimization)

Changes in v5:

- Split-up the asm-generic/hw-breakpoint.h moving to
  linux/hw_breakpoint.h into a separate patch
- Optimize the breakpoints restoring while switching from kvm guest
  to host. We only want to restore the state if we have active
  breakpoints to the host, otherwise we don't care about messed-up
  address registers.
- Add asm/hw_breakpoint.h to Kbuild
- Fix bad breakpoint type in trace_selftest.c

Changes in v6:

- Fix wrong header inclusion in trace.h (triggered a build
  error with CONFIG_FTRACE_SELFTEST

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Prasad <prasad@linux.vnet.ibm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jan Kiszka <jan.kiszka@web.de>
Cc: Jiri Slaby <jirislaby@gmail.com>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Paul Mundt <lethal@linux-sh.org>
2009-11-08 15:34:42 +01:00
Gleb Natapov abb3911965 KVM: get_tss_base_addr() should return a gpa_t
If TSS we are switching to resides in high memory task switch will fail
since address will be truncated. Windows2k3 does this sometimes when
running with more then 4G

Cc: stable@kernel.org
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-11-04 12:42:36 -02:00
Jan Kiszka a9e38c3e01 KVM: x86: Catch potential overrun in MCE setup
We only allocate memory for 32 MCE banks (KVM_MAX_MCE_BANKS) but we
allow user space to fill up to 255 on setup (mcg_cap & 0xff), corrupting
kernel memory. Catch these overflows.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-11-04 12:42:35 -02:00
Frederic Weisbecker 0f8f86c7bd Merge commit 'perf/core' into perf/hw-breakpoint
Conflicts:
	kernel/Makefile
	kernel/trace/Makefile
	kernel/trace/trace.h
	samples/Makefile

Merge reason: We need to be uptodate with the perf events development
branch because we plan to rewrite the breakpoints API on top of
perf events.
2009-10-18 01:12:33 +02:00
Avi Kivity 6a54435560 KVM: Prevent overflow in KVM_GET_SUPPORTED_CPUID
The number of entries is multiplied by the entry size, which can
overflow on 32-bit hosts.  Bound the entry count instead.

Reported-by: David Wagner <daw@cs.berkeley.edu>
Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-10-04 17:04:16 +02:00
Ingo Molnar dca2d6ac09 Merge branch 'linus' into tracing/hw-breakpoints
Conflicts:
	arch/x86/kernel/process_64.c

Semantic conflict fixed in:
	arch/x86/kvm/x86.c

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-15 12:18:15 +02:00
Linus Torvalds 69def9f05d Merge branch 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (202 commits)
  MAINTAINERS: update KVM entry
  KVM: correct error-handling code
  KVM: fix compile warnings on s390
  KVM: VMX: Check cpl before emulating debug register access
  KVM: fix misreporting of coalesced interrupts by kvm tracer
  KVM: x86: drop duplicate kvm_flush_remote_tlb calls
  KVM: VMX: call vmx_load_host_state() only if msr is cached
  KVM: VMX: Conditionally reload debug register 6
  KVM: Use thread debug register storage instead of kvm specific data
  KVM guest: do not batch pte updates from interrupt context
  KVM: Fix coalesced interrupt reporting in IOAPIC
  KVM guest: fix bogus wallclock physical address calculation
  KVM: VMX: Fix cr8 exiting control clobbering by EPT
  KVM: Optimize kvm_mmu_unprotect_page_virt() for tdp
  KVM: Document KVM_CAP_IRQCHIP
  KVM: Protect update_cr8_intercept() when running without an apic
  KVM: VMX: Fix EPT with WP bit change during paging
  KVM: Use kvm_{read,write}_guest_virt() to read and write segment descriptors
  KVM: x86 emulator: Add adc and sbb missing decoder flags
  KVM: Add missing #include
  ...
2009-09-14 17:43:43 -07:00
Avi Kivity 0a79b00952 KVM: VMX: Check cpl before emulating debug register access
Debug registers may only be accessed from cpl 0.  Unfortunately, vmx will
code to emulate the instruction even though it was issued from guest
userspace, possibly leading to an unexpected trap later.

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-09-10 18:11:10 +03:00
Marcelo Tosatti e3904e6ed0 KVM: x86: drop duplicate kvm_flush_remote_tlb calls
kvm_mmu_slot_remove_write_access already calls it.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 18:11:08 +03:00
Avi Kivity 3d53c27d05 KVM: Use thread debug register storage instead of kvm specific data
Instead of saving the debug registers from the processor to a kvm data
structure, rely in the debug registers stored in the thread structure.
This allows us not to save dr6 and dr7.

Reduces lightweight vmexit cost by 350 cycles, or 11 percent.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 18:11:04 +03:00
Avi Kivity 88c808fd42 KVM: Protect update_cr8_intercept() when running without an apic
update_cr8_intercept() can be triggered from userspace while there
is no apic present.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 10:46:54 +03:00
Mikhail Ershov d9048d3278 KVM: Use kvm_{read,write}_guest_virt() to read and write segment descriptors
Segment descriptors tables can be placed on two non-contiguous pages.
This patch makes reading segment descriptors by linear address.

Signed-off-by: Mikhail Ershov <Mike.Ershov@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 10:46:52 +03:00
Avi Kivity 56e8231841 KVM: Rename x86_emulate.c to emulate.c
We're in arch/x86, what could we possibly be emulating?

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 10:46:45 +03:00
Anthony Liguori c0c7c04b87 KVM: When switching to a vm8086 task, load segments as 16-bit
According to 16.2.5 in the SDM, eflags.vm in the tss is consulted before loading
and new segments.  If eflags.vm == 1, then the segments are treated as 16-bit
segments.  The LDTR and TR are not normally available in vm86 mode so if they
happen to somehow get loaded, they need to be treated as 32-bit segments.

This fixes an invalid vmentry failure in a custom OS that was happening after
a task switch into vm8086 mode.  Since the segments were being mistakenly
treated as 32-bit, we loaded garbage state.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 10:46:44 +03:00
Gleb Natapov cb142eb743 KVM: Update cr8 intercept when APIC TPR is changed by userspace
Since on vcpu entry we do it only if apic is enabled we should do
it when TPR is changed while apic is disabled. This happens when windows
resets HW without setting TPR to zero.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 10:46:39 +03:00
Amit Shah 1f3ee616dd KVM: ignore reads to perfctr msrs
We ignore writes to the perfctr msrs. Ignore reads as well.

Kaspersky antivirus crashes Windows guests if it can't read
these MSRs.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:21 +03:00
Jan Kiszka 07708c4af1 KVM: x86: Disallow hypercalls for guest callers in rings > 0
So far unprivileged guest callers running in ring 3 can issue, e.g., MMU
hypercalls. Normally, such callers cannot provide any hand-crafted MMU
command structure as it has to be passed by its physical address, but
they can still crash the guest kernel by passing random addresses.

To close the hole, this patch considers hypercalls valid only if issued
from guest ring 0. This may still be relaxed on a per-hypercall base in
the future once required.

Cc: stable@kernel.org
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:20 +03:00
Joerg Roedel 344f414fa0 KVM: report 1GB page support to userspace
If userspace knows that the kernel part supports 1GB pages it can enable
the corresponding cpuid bit so that guests actually use GB pages.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:19 +03:00
Mikhail Ershov 5f0269f5d7 KVM: Align cr8 threshold when userspace changes cr8
Commit f0a3602c20 ("KVM: Move interrupt injection logic to x86.c") does not
update the cr8 intercept if the lapic is disabled, so when userspace updates
cr8, the cr8 threshold control is not updated and we are left with illegal
control fields.

Fix by explicitly resetting the cr8 threshold.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:17 +03:00
Sheng Yang b927a3cec0 KVM: VMX: Introduce KVM_SET_IDENTITY_MAP_ADDR ioctl
Now KVM allow guest to modify guest's physical address of EPT's identity mapping page.

(change from v1, discard unnecessary check, change ioctl to accept parameter
address rather than value)

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-09-10 08:33:16 +03:00
Akinobu Mita b792c344df KVM: x86: use kvm_get_gdt() and kvm_read_ldt()
Use kvm_get_gdt() and kvm_read_ldt() to reduce inline assembly code.

Cc: Avi Kivity <avi@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-09-10 08:33:15 +03:00
Akinobu Mita 46a359e715 KVM: x86: use get_desc_base() and get_desc_limit()
Use get_desc_base() and get_desc_limit() to get the base address and
limit in desc_struct.

Cc: Avi Kivity <avi@redhat.com>
Cc: kvm@vger.kernel.org
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-09-10 08:33:15 +03:00
Gleb Natapov a1b37100d9 KVM: Reduce runnability interface with arch support code
Remove kvm_cpu_has_interrupt() and kvm_arch_interrupt_allowed() from
interface between general code and arch code. kvm_arch_vcpu_runnable()
checks for interrupts instead.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:13 +03:00
Gleb Natapov b59bb7bdf0 KVM: Move exception handling to the same place as other events
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:13 +03:00
Gregory Haskins d34e6b175e KVM: add ioeventfd support
ioeventfd is a mechanism to register PIO/MMIO regions to trigger an eventfd
signal when written to by a guest.  Host userspace can register any
arbitrary IO address with a corresponding eventfd and then pass the eventfd
to a specific end-point of interest for handling.

Normal IO requires a blocking round-trip since the operation may cause
side-effects in the emulated model or may return data to the caller.
Therefore, an IO in KVM traps from the guest to the host, causes a VMX/SVM
"heavy-weight" exit back to userspace, and is ultimately serviced by qemu's
device model synchronously before returning control back to the vcpu.

However, there is a subclass of IO which acts purely as a trigger for
other IO (such as to kick off an out-of-band DMA request, etc).  For these
patterns, the synchronous call is particularly expensive since we really
only want to simply get our notification transmitted asychronously and
return as quickly as possible.  All the sychronous infrastructure to ensure
proper data-dependencies are met in the normal IO case are just unecessary
overhead for signalling.  This adds additional computational load on the
system, as well as latency to the signalling path.

Therefore, we provide a mechanism for registration of an in-kernel trigger
point that allows the VCPU to only require a very brief, lightweight
exit just long enough to signal an eventfd.  This also means that any
clients compatible with the eventfd interface (which includes userspace
and kernelspace equally well) can now register to be notified. The end
result should be a more flexible and higher performance notification API
for the backend KVM hypervisor and perhipheral components.

To test this theory, we built a test-harness called "doorbell".  This
module has a function called "doorbell_ring()" which simply increments a
counter for each time the doorbell is signaled.  It supports signalling
from either an eventfd, or an ioctl().

We then wired up two paths to the doorbell: One via QEMU via a registered
io region and through the doorbell ioctl().  The other is direct via
ioeventfd.

You can download this test harness here:

ftp://ftp.novell.com/dev/ghaskins/doorbell.tar.bz2

The measured results are as follows:

qemu-mmio:       110000 iops, 9.09us rtt
ioeventfd-mmio: 200100 iops, 5.00us rtt
ioeventfd-pio:  367300 iops, 2.72us rtt

I didn't measure qemu-pio, because I have to figure out how to register a
PIO region with qemu's device model, and I got lazy.  However, for now we
can extrapolate based on the data from the NULLIO runs of +2.56us for MMIO,
and -350ns for HC, we get:

qemu-pio:      153139 iops, 6.53us rtt
ioeventfd-hc: 412585 iops, 2.37us rtt

these are just for fun, for now, until I can gather more data.

Here is a graph for your convenience:

http://developer.novell.com/wiki/images/7/76/Iofd-chart.png

The conclusion to draw is that we save about 4us by skipping the userspace
hop.

--------------------

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:12 +03:00
Beth Kon e9f4275732 KVM: PIT support for HPET legacy mode
When kvm is in hpet_legacy_mode, the hpet is providing the timer
interrupt and the pit should not be. So in legacy mode, the pit timer
is destroyed, but the *state* of the pit is maintained. So if kvm or
the guest tries to modify the state of the pit, this modification is
accepted, *except* that the timer isn't actually started. When we exit
hpet_legacy_mode, the current state of the pit (which is up to date
since we've been accepting modifications) is used to restart the pit
timer.

The saved_mode code in kvm_pit_load_count temporarily changes mode to
0xff in order to destroy the timer, but then restores the actual
value, again maintaining "current" state of the pit for possible later
reenablement.

[avi: add some reserved storage in the ioctl; make SET_PIT2 IOW]
[marcelo: fix memory corruption due to reserved storage]

Signed-off-by: Beth Kon <eak@us.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:12 +03:00
Gleb Natapov 0d1de2d901 KVM: Always report x2apic as supported feature
We emulate x2apic in software, so host support is not required.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:11 +03:00
Gleb Natapov c7f0f24b1f KVM: No need to kick cpu if not in a guest mode
This will save a couple of IPIs.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:11 +03:00
Andre Przywara f7c6d14003 KVM: fix MMIO_CONF_BASE MSR access
Some Windows versions check whether the BIOS has setup MMI/O for
config space accesses on AMD Fam10h CPUs, we say "no" by returning 0 on
reads and only allow disabling of MMI/O CfgSpace setup by igoring "0" writes.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:10 +03:00
Jan Kiszka dc7e795e3d Revert "KVM: x86: check for cr3 validity in ioctl_set_sregs"
This reverts commit 6c20e1442bb1c62914bb85b7f4a38973d2a423ba.

To my understanding, it became obsolete with the advent of the more
robust check in mmu_alloc_roots (89da4ff17f). Moreover, it prevents
the conceptually safe pattern

 1. set sregs
 2. register mem-slots
 3. run vcpu

by setting a sticky triple fault during step 1.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:09 +03:00
Andre Przywara 6098ca939e KVM: handle AMD microcode MSR
Windows 7 tries to update the CPU's microcode on some processors,
so we ignore the MSR write here. The patchlevel register is already handled
(returning 0), because the MSR number is the same as Intel's.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:08 +03:00
Gleb Natapov 0105d1a526 KVM: x2apic interface to lapic
This patch implements MSR interface to local apic as defines by x2apic
Intel specification.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:08 +03:00
Gleb Natapov fc61b800f9 KVM: Add Directed EOI support to APIC emulation
Directed EOI is specified by x2APIC, but is available even when lapic is
in xAPIC mode.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:07 +03:00
Avi Kivity aec51dc4f1 KVM: Trace mmio
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:07 +03:00
Andre Przywara c323c0e5f0 KVM: Ignore PCI ECS I/O enablement
Linux guests will try to enable access to the extended PCI config space
via the I/O ports 0xCF8/0xCFC on AMD Fam10h CPU. Since we (currently?)
don't use ECS, simply ignore write and read attempts.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:06 +03:00
Michael S. Tsirkin bda9020e24 KVM: remove in_range from io devices
This changes bus accesses to use high-level kvm_io_bus_read/kvm_io_bus_write
functions. in_range now becomes unused so it is removed from device ops in
favor of read/write callbacks performing range checks internally.

This allows aliasing (mostly for in-kernel virtio), as well as better error
handling by making it possible to pass errors up to userspace.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:05 +03:00
Michael S. Tsirkin 108b56690f KVM: switch pit creation to slots_lock
switch pit creation to slots_lock. slots_lock is already taken for read
everywhere, so we only need to take it for write when creating pit.
This is in preparation to removing in_range and kvm->lock around it.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:05 +03:00
Andre Przywara ed85c06853 KVM: introduce module parameter for ignoring unknown MSRs accesses
KVM will inject a #GP into the guest if that tries to access unhandled
MSRs. This will crash many guests. Although it would be the correct
way to actually handle these MSRs, we introduce a runtime switchable
module param called "ignore_msrs" (defaults to 0). If this is Y, unknown
MSR reads will return 0, while MSR writes are simply dropped. In both cases
we print a message to dmesg to inform the user about that.

You can change the behaviour at any time by saying:

 # echo 1 > /sys/modules/kvm/parameters/ignore_msrs

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:03 +03:00
Andre Przywara 1fdbd48c24 KVM: ignore reads from AMDs C1E enabled MSR
If the Linux kernel detects an C1E capable AMD processor (K8 RevF and
higher), it will access a certain MSR on every attempt to go to halt.
Explicitly handle this read and return 0 to let KVM run a Linux guest
with the native AMD host CPU propagated to the guest.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:03 +03:00
Andre Przywara 8f1589d95e KVM: ignore AMDs HWCR register access to set the FFDIS bit
Linux tries to disable the flush filter on all AMD K8 CPUs. Since KVM
does not handle the needed MSR, the injected #GP will panic the Linux
kernel. Ignore setting of the HWCR.FFDIS bit in this MSR to let Linux
boot with an AMD K8 family guest CPU.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:02 +03:00
Marcelo Tosatti 894a9c5543 KVM: x86: missing locking in PIT/IRQCHIP/SET_BSP_CPU ioctl paths
Correct missing locking in a few places in x86's vm_ioctl handling path.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:02 +03:00
Andre Przywara 0cb5762ed2 KVM: Allow emulation of syscalls instructions on #UD
Add the opcodes for syscall, sysenter and sysexit to the list of instructions
handled by the undefined opcode handler.

Signed-off-by: Christoph Egger <christoph.egger@amd.com>
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:59 +03:00
Marcelo Tosatti 229456fc34 KVM: convert custom marker based tracing to event traces
This allows use of the powerful ftrace infrastructure.

See Documentation/trace/ for usage information.

[avi, stephen: various build fixes]
[sheng: fix control register breakage]

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:59 +03:00
Gleb Natapov 6edf14d8d0 KVM: Replace pending exception by PF if it happens serially
Replace previous exception with a new one in a hope that instruction
re-execution will regenerate lost exception.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:56 +03:00
Andre Przywara 71db602322 KVM: Move performance counter MSR access interception to generic x86 path
The performance counter MSRs are different for AMD and Intel CPUs and they
are chosen mainly by the CPUID vendor string. This patch catches writes to
all addresses (regardless of VMX/SVM path) and handles them in the generic
MSR handler routine. Writing a 0 into the event select register is something
we perfectly emulate ;-), so don't print out a warning to dmesg in this
case.
This fixes booting a 64bit Windows guest with an AMD CPUID on an Intel host.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:54 +03:00
Gleb Natapov 988a2cae6a KVM: Use macro to iterate over vcpus.
[christian: remove unused variables on s390]

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:52 +03:00
Gleb Natapov c5af89b68a KVM: Introduce kvm_vcpu_is_bsp() function.
Use it instead of open code "vcpu_id zero is BSP" assumption.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:51 +03:00
Avi Kivity 43a3795a3a KVM: MMU: Adjust pte accessors to explicitly indicate guest or shadow pte
Since the guest and host ptes can have wildly different format, adjust
the pte accessor names to indicate on which type of pte they operate on.

No functional changes.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:51 +03:00
Marcelo Tosatti fa40a8214b KVM: switch irq injection/acking data structures to irq_lock
Protect irq injection/acking data structures with a separate irq_lock
mutex. This fixes the following deadlock:

CPU A                               CPU B
kvm_vm_ioctl_deassign_dev_irq()
  mutex_lock(&kvm->lock);            worker_thread()
  -> kvm_deassign_irq()                -> kvm_assigned_dev_interrupt_work_handler()
    -> deassign_host_irq()               mutex_lock(&kvm->lock);
      -> cancel_work_sync() [blocked]

[gleb: fix ia64 path]

Reported-by: Alex Williamson <alex.williamson@hp.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:49 +03:00
Avi Kivity 6de4f3ada4 KVM: Cache pdptrs
Instead of reloading the pdptrs on every entry and exit (vmcs writes on vmx,
guest memory access on svm) extract them on demand.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:46 +03:00
Gregory Haskins d76685c4a0 KVM: cleanup io_device code
We modernize the io_device code so that we use container_of() instead of
dev->private, and move the vtable to a separate ops structure
(theoretically allows better caching for multiple instances of the same
ops structure)

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:45 +03:00
Jan Kiszka c5ff41ce66 KVM: Allow PIT emulation without speaker port
The in-kernel speaker emulation is only a dummy and also unneeded from
the performance point of view. Rather, it takes user space support to
generate sound output on the host, e.g. console beeps.

To allow this, introduce KVM_CREATE_PIT2 which controls in-kernel
speaker port emulation via a flag passed along the new IOCTL. It also
leaves room for future extensions of the PIT configuration interface.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:41 +03:00
Gregory Haskins 721eecbf4f KVM: irqfd
KVM provides a complete virtual system environment for guests, including
support for injecting interrupts modeled after the real exception/interrupt
facilities present on the native platform (such as the IDT on x86).
Virtual interrupts can come from a variety of sources (emulated devices,
pass-through devices, etc) but all must be injected to the guest via
the KVM infrastructure.  This patch adds a new mechanism to inject a specific
interrupt to a guest using a decoupled eventfd mechnanism:  Any legal signal
on the irqfd (using eventfd semantics from either userspace or kernel) will
translate into an injected interrupt in the guest at the next available
interrupt window.

Signed-off-by: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:41 +03:00
Gleb Natapov 787ff73637 KVM: Drop interrupt shadow when single stepping should be done only on VMX
The problem exists only on VMX. Also currently we skip this step if
there is pending exception. The patch fixes this too.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:41 +03:00
Mark McLoughlin cb007648de KVM: fix cpuid E2BIG handling for extended request types
If we run out of cpuid entries for extended request types
we should return -E2BIG, just like we do for the standard
request types.

Signed-off-by: Mark McLoughlin <markmc@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:39 +03:00
Jaswinder Singh Rajput 60af2ecdc5 KVM: Use MSR names in place of address
Replace 0xc0010010 with MSR_K8_SYSCFG and 0xc0010015 with MSR_K7_HWCR.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:39 +03:00
Huang Ying 890ca9aefa KVM: Add MCE support
The related MSRs are emulated. MCE capability is exported via
extension KVM_CAP_MCE and ioctl KVM_X86_GET_MCE_CAP_SUPPORTED.  A new
vcpu ioctl command KVM_X86_SETUP_MCE is used to setup MCE emulation
such as the mcg_cap. MCE is injected via vcpu ioctl command
KVM_X86_SET_MCE. Extended machine-check state (MCG_EXT_P) and CMCI are
not implemented.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:39 +03:00
Jaswinder Singh Rajput af24a4e4ae KVM: Replace MSR_IA32_TIME_STAMP_COUNTER with MSR_IA32_TSC of msr-index.h
Use standard msr-index.h's MSR declaration.

MSR_IA32_TSC is better than MSR_IA32_TIME_STAMP_COUNTER as it also solves
80 column issue.

Signed-off-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:38 +03:00
Ingo Molnar 5f9ece0240 Merge commit 'v2.6.31-rc7' into x86/cleanups
Merge reason: we were on -rc1 before - go up to -rc7

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-24 12:25:54 +02:00
Marcin Slusarz 9f51e24ee8 x86: Use printk_once()
Signed-off-by: Marcin Slusarz <marcin.slusarz@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
LKML-Reference: <1249847649-11631-6-git-send-email-marcin.slusarz@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-09 22:28:34 +02:00
Marcelo Tosatti d6289b9365 KVM: x86: verify MTRR/PAT validity
Do not allow invalid memory types in MTRR/PAT (generating a #GP
otherwise).

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-08-05 13:58:16 +03:00
Jan Kiszka e125e7b694 KVM: Fix KVM_GET_MSR_INDEX_LIST
So far, KVM copied the emulated_msrs (only MSR_IA32_MISC_ENABLE) to a
wrong address in user space due to broken pointer arithmetic. This
caused subtle corruption up there (missing MSR_IA32_MISC_ENABLE had
probably no practical relevance). Moreover, the size check for the
user-provided kvm_msr_list forgot about emulated MSRs.

Cc: stable@kernel.org
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-08-05 13:58:03 +03:00
Amit Shah 9e6996240a KVM: Ignore reads to K7 EVNTSEL MSRs
In commit 7fe29e0faa we ignored the
reads to the P6 EVNTSEL MSRs. That fixed crashes on Intel machines.

Ignore the reads to K7 EVNTSEL MSRs as well to fix this on AMD
hosts.

This fixes Kaspersky antivirus crashing Windows guests on AMD hosts.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-28 14:10:31 +03:00
Gleb Natapov 8db3baa2db KVM: Disable CR8 intercept if tpr patching is active
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:59 +03:00
Gleb Natapov 36752c9b91 KVM: Do not migrate pending software interrupts.
INTn will be re-executed after migration. If we wanted to migrate
pending software interrupt we would need to migrate interrupt type
and instruction length too, but we do not have all required info on
SVM, so SVM->VMX migration would need to re-execute INTn anyway. To
make it simple never migrate pending soft interrupt.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:59 +03:00
Gleb Natapov 6a8b1d1312 KVM: Always request IRQ/NMI window if an interrupt is pending
Currently they are not requested if there is pending exception.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:58 +03:00
Gleb Natapov 66fd3f7f90 KVM: Do not re-execute INTn instruction.
Re-inject event instead. This is what Intel suggest. Also use correct
instruction length when re-injecting soft fault/interrupt.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:58 +03:00
Gleb Natapov 923c61bbc6 KVM: Remove irq_pending bitmap
Only one interrupt vector can be injected from userspace irqchip at
any given time so no need to store it in a bitmap. Put it into interrupt
queue directly.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:57 +03:00
Gleb Natapov fa9726b073 KVM: Do not allow interrupt injection from userspace if there is a pending event.
The exception will immediately close the interrupt window.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:57 +03:00
Marcelo Tosatti 8986ecc0ef KVM: x86: check for cr3 validity in mmu_alloc_roots
Verify the cr3 address stored in vcpu->arch.cr3 points to an existant
memslot. If not, inject a triple fault.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:55 +03:00
Marcelo Tosatti 7c8a83b75a KVM: MMU: protect kvm_mmu_change_mmu_pages with mmu_lock
kvm_handle_hva, called by MMU notifiers, manipulates mmu data only with
the protection of mmu_lock.

Update kvm_mmu_change_mmu_pages callers to take mmu_lock, thus protecting
against kvm_handle_hva.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:54 +03:00
Glauber Costa 310b5d306c KVM: Deal with interrupt shadow state for emulated instructions
We currently unblock shadow interrupt state when we skip an instruction,
but failing to do so when we actually emulate one. This blocks interrupts
in key instruction blocks, in particular sti; hlt; sequences

If the instruction emulated is an sti, we have to block shadow interrupts.
The same goes for mov ss. pop ss also needs it, but we don't currently
emulate it.

Without this patch, I cannot boot gpxe option roms at vmx machines.
This is described at https://bugzilla.redhat.com/show_bug.cgi?id=494469

Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:54 +03:00
Glauber Costa 2809f5d2c4 KVM: Replace ->drop_interrupt_shadow() by ->set_interrupt_shadow()
This patch replaces drop_interrupt_shadow with the more
general set_interrupt_shadow, that can either drop or raise
it, depending on its parameter.  It also adds ->get_interrupt_shadow()
for future use.

Signed-off-by: Glauber Costa <glommer@redhat.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:54 +03:00
Marcelo Tosatti 32f8840064 KVM: use smp_send_reschedule in kvm_vcpu_kick
KVM uses a function call IPI to cause the exit of a guest running on a
physical cpu. For virtual interrupt notification there is no need to
wait on IPI receival, or to execute any function.

This is exactly what the reschedule IPI does, without the overhead
of function IPI. So use it instead of smp_call_function_single in
kvm_vcpu_kick.

Also change the "guest_mode" variable to a bit in vcpu->requests, and
use that to collapse multiple IPI's that would be issued between the
first one and zeroing of guest mode.

This allows kvm_vcpu_kick to called with interrupts disabled.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:53 +03:00
Avi Kivity d149c731e4 KVM: Update cpuid 1.ecx reporting
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:53 +03:00
Avi Kivity 7faa4ee1c7 KVM: Add AMD cpuid bit: cr8_legacy, abm, misaligned sse, sse4, 3dnow prefetch
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:52 +03:00
Avi Kivity 8d753f369b KVM: Fix cpuid feature misreporting
MTRR, PAT, MCE, and MCA are all supported (to some extent) but not reported.
Vista requires these features, so if userspace relies on kernel cpuid
reporting, it loses support for Vista.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:52 +03:00
Jan Kiszka d6a8c875f3 KVM: Drop request_nmi from stats
The stats entry request_nmi is no longer used as the related user space
interface was dropped. So clean it up.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:51 +03:00
Sheng Yang 4b12f0de33 KVM: Replace get_mt_mask_shift with get_mt_mask
Shadow_mt_mask is out of date, now it have only been used as a flag to indicate
if TDP enabled. Get rid of it and use tdp_enabled instead.

Also put memory type logical in kvm_x86_ops->get_mt_mask().

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:49 +03:00
Jan Blunck 9b62e5b10f KVM: Wake up waitqueue before calling get_cpu()
This moves the get_cpu() call down to be called after we wake up the
waiters. Therefore the waitqueue locks can safely be rt mutex.

Signed-off-by: Jan Blunck <jblunck@suse.de>
Signed-off-by: Sven-Thorsten Dietrich <sven@thebigcorporation.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:49 +03:00
Gleb Natapov 14d0bc1f7c KVM: Get rid of get_irq() callback
It just returns pending IRQ vector from the queue for VMX/SVM.
Get IRQ directly from the queue before migration and put it back
after.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:49 +03:00
Gleb Natapov 16d7a19117 KVM: Fix userspace IRQ chip migration
Re-put pending IRQ vector into interrupt_bitmap before migration.
Otherwise it will be lost if migration happens in the wrong time.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:48 +03:00
Gleb Natapov 95ba827313 KVM: SVM: Add NMI injection support
Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:48 +03:00
Gleb Natapov 1d6ed0cb95 KVM: Remove inject_pending_vectors() callback
It is the same as inject_pending_irq() for VMX/SVM now.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:47 +03:00