Commit Graph

365 Commits (4fdc18d15151f3e7cc43070ec8d13570431b5abc)

Author SHA1 Message Date
Alexander Graf 032c340731 KVM: PPC: Allocate vcpu struct using vmalloc
We used to use get_free_pages to allocate our vcpu struct. Unfortunately
that call failed on me several times after my machine had a big enough
uptime, as memory became too fragmented by then.

Fortunately, we don't need it to be page aligned any more! We can just
vmalloc it and everything's great.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:38:04 +03:00
Alexander Graf 964b6411af KVM: PPC: Simplify kvmppc_load_up_(FPU|VMX|VSX)
We don't need as complex code. I had some thinkos while writing it, figuring
I needed to support PPC32 paths on PPC64 which would have required DR=0, but
everything just runs fine with DR=1.

So let's make the functions simple C call wrappers that reserve some space on
the stack for the respective functions to clobber.

Fixes out-of-RMA-access (and thus guest FPU loading) on the PS3.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:38:01 +03:00
Alexander Graf 20a340abd3 KVM: PPC: Enable use of secondary htab bucket
We had code to make use of the secondary htab buckets, but kept that
disabled because it was unstable when I put it in.

I checked again if that's still the case and apparently it was only
exposing some instability that was there anyways before. I haven't
seen any badness related to usage of secondary htab entries so far.

This should speed up guest memory allocations by quite a bit, because
we now have more space to put PTEs in.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:37:58 +03:00
Alexander Graf c10207fe86 KVM: PPC: Add capability for paired singles
We need to tell userspace that we can emulate paired single instructions.
So let's add a capability export.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:37:47 +03:00
Alexander Graf 831317b605 KVM: PPC: Implement Paired Single emulation
The one big thing about the Gekko is paired singles.

Paired singles are an extension to the instruction set, that adds 32 single
precision floating point registers (qprs), some SPRs to modify the behavior
of paired singled operations and instructions to deal with qprs to the
instruction set.

Unfortunately, it also changes semantics of existing operations that affect
single values in FPRs. In most cases they get mirrored to the coresponding
QPR.

Thanks to that we need to emulate all FPU operations and all the new paired
single operations too.

In order to achieve that, we use the just introduced FPU call helpers to
call the real FPU whenever the guest wants to modify an FPR. Additionally
we also fix up the QPR values along the way.

That way we can execute paired single FPU operations without implementing a
soft fpu.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:35:27 +03:00
Alexander Graf e5c29e926c KVM: PPC: Enable program interrupt to do MMIO
When we get a program interrupt we usually don't expect it to perform an
MMIO operation. But why not? When we emulate paired singles, we can end
up loading or storing to an MMIO address - and the handling of those
happens in the program interrupt handler.

So let's teach the program interrupt handler how to deal with EMULATE_MMIO.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:35:24 +03:00
Alexander Graf dba2e123e7 KVM: PPC: Fix error in BAT assignment
BATs didn't work. Well, they did, but only up to BAT3. As soon as we
came to BAT4 the offset calculation was screwed up and we ended up
overwriting BAT0-3.

Fortunately, Linux hasn't been using BAT4+. It's still a good
idea to write correct code though.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:35:18 +03:00
Alexander Graf 963cf3dc63 KVM: PPC: Add helpers to call FPU instructions
To emulate paired single instructions, we need to be able to call FPU
operations from within the kernel. Since we don't want gcc to spill
arbitrary FPU code everywhere, we tell it to use a soft fpu.

Since we know we can really call the FPU in safe areas, let's also add
some calls that we can later use to actually execute real world FPU
operations on the host's FPU.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:35:15 +03:00
Alexander Graf aba3bd7ffe KVM: PPC: Make ext giveup non-static
We need to call the ext giveup handlers from code outside of book3s.c.
So let's make it non-static.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:35:12 +03:00
Alexander Graf 5467a97d0f KVM: PPC: Make software load/store return eaddr
The Book3S KVM implementation contains some helper functions to load and store
data from and to virtual addresses.

Unfortunately, this helper used to keep the physical address it so nicely
found out for us to itself. So let's change that and make it return the
physical address it resolved.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:35:09 +03:00
Alexander Graf 71db408936 KVM: PPC: Implement mtsr instruction emulation
The Book3S_32 specifications allows for two instructions to modify segment
registers: mtsrin and mtsr.

Most normal operating systems use mtsrin, because it allows to define which
segment it wants to change using a register. But since I was trying to run
an embedded guest, it turned out to be using mtsr with hardcoded values.

So let's also emulate mtsr. It's a valid instruction after all.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:35:06 +03:00
Alexander Graf e425a6de1a KVM: PPC: Fix typo in book3s_32 debug code
There's a typo in the debug ifdef of the book3s_32 mmu emulation. While trying
to debug something I stumbled across that and wanted to save anyone after me
(or myself later) from having to debug that again.

So let's fix the ifdef.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:35:03 +03:00
Alexander Graf d1bab74c51 KVM: PPC: Preload FPU when possible
There are some situations when we're pretty sure the guest will use the
FPU soon. So we can save the churn of going into the guest, finding out
it does want to use the FPU and going out again.

This patch adds preloading of the FPU when it's reasonable.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:34:59 +03:00
Alexander Graf c8c0b6f2f7 KVM: PPC: Combine extension interrupt handlers
When we for example get an Altivec interrupt, but our guest doesn't support
altivec, we need to inject a program interrupt, not an altivec interrupt.

The same goes for paired singles. When an altivec interrupt arrives, we're
pretty sure we need to emulate the instruction because it's a paired single
operation.

So let's make all the ext handlers aware that they need to jump to the
program interrupt handler when an extension interrupt arrives that
was not supposed to arrive for the guest CPU.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:34:56 +03:00
Alexander Graf d6d549b207 KVM: PPC: Add Gekko SPRs
The Gekko has some SPR values that differ from other PPC core values and
also some additional ones.

Let's add support for them in our mfspr/mtspr emulator.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:34:53 +03:00
Alexander Graf 3c402a75ea KVM: PPC: Add hidden flag for paired singles
The Gekko implements an extension called paired singles. When the guest wants
to use that extension, we need to make sure we're not running the host FPU,
because all FPU instructions need to get emulated to accomodate for additional
operations that occur.

This patch adds an hflag to track if we're in paired single mode or not.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:34:50 +03:00
Alexander Graf 37f5bca64e KVM: PPC: Add AGAIN type for emulation return
Emulation of an instruction can have different outcomes. It can succeed,
fail, require MMIO, do funky BookE stuff - or it can just realize something's
odd and will be fixed the next time around.

Exactly that is what EMULATE_AGAIN means. Using that flag we can now tell
the caller that nothing happened, but we still want to go back to the
guest and see what happens next time we come around.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:34:47 +03:00
Alexander Graf 3587d5348c KVM: PPC: Teach MMIO Signedness
The guest I was trying to get to run uses the LHA and LHAU instructions.
Those instructions basically do a load, but also sign extend the result.

Since we need to fill our registers by hand when doing MMIO, we also need
to sign extend manually.

This patch implements sign extended MMIO and the LHA(U) instructions.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:34:44 +03:00
Alexander Graf b104d06632 KVM: PPC: Enable MMIO to do 64 bits, fprs and qprs
Right now MMIO access can only happen for GPRs and is at most 32 bit wide.
That's actually enough for almost all types of hardware out there.

Unfortunately, the guest I was using used FPU writes to MMIO regions, so
it ended up writing 64 bit MMIOs using FPRs and QPRs.

So let's add code to handle those odd cases too.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-04-25 12:34:41 +03:00
Takuya Yoshikawa 87bf6e7de1 KVM: fix the handling of dirty bitmaps to avoid overflows
Int is not long enough to store the size of a dirty bitmap.

This patch fixes this problem with the introduction of a wrapper
function to calculate the sizes of dirty bitmaps.

Note: in mark_page_dirty(), we have to consider the fact that
  __set_bit() takes the offset as int, not long.

Signed-off-by: Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-04-20 13:06:55 +03:00
Tejun Heo 5a0e3ad6af include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files.  percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed.  Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability.  As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

  http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
  only the necessary includes are there.  ie. if only gfp is used,
  gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
  blocks and try to put the new include such that its order conforms
  to its surrounding.  It's put in the include block which contains
  core kernel includes, in the same order that the rest are ordered -
  alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
  doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
  because the file doesn't have fitting include block), it prints out
  an error message indicating which .h file needs to be added to the
  file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
   over 4000 files, deleting around 700 includes and adding ~480 gfp.h
   and ~3000 slab.h inclusions.  The script emitted errors for ~400
   files.

2. Each error was manually checked.  Some didn't need the inclusion,
   some needed manual addition while adding it to implementation .h or
   embedding .c file was more appropriate for others.  This step added
   inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
   from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
   e.g. lib/decompress_*.c used malloc/free() wrappers around slab
   APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
   editing them as sprinkling gfp.h and slab.h inclusions around .h
   files could easily lead to inclusion dependency hell.  Most gfp.h
   inclusion directives were ignored as stuff from gfp.h was usually
   wildly available and often used in preprocessor macros.  Each
   slab.h inclusion directive was examined and added manually as
   necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
   were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
   distributed build env didn't work with gcov compiles) and a few
   more options had to be turned off depending on archs to make things
   build (like ipr on powerpc/64 which failed due to missing writeq).

   * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
   * powerpc and powerpc64 SMP allmodconfig
   * sparc and sparc64 SMP allmodconfig
   * ia64 SMP allmodconfig
   * s390 SMP allmodconfig
   * alpha SMP allmodconfig
   * um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
   a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2010-03-30 22:02:32 +09:00
Linus Torvalds c812a51d11 Merge branch 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (145 commits)
  KVM: x86: Add KVM_CAP_X86_ROBUST_SINGLESTEP
  KVM: VMX: Update instruction length on intercepted BP
  KVM: Fix emulate_sys[call, enter, exit]()'s fault handling
  KVM: Fix segment descriptor loading
  KVM: Fix load_guest_segment_descriptor() to inject page fault
  KVM: x86 emulator: Forbid modifying CS segment register by mov instruction
  KVM: Convert kvm->requests_lock to raw_spinlock_t
  KVM: Convert i8254/i8259 locks to raw_spinlocks
  KVM: x86 emulator: disallow opcode 82 in 64-bit mode
  KVM: x86 emulator: code style cleanup
  KVM: Plan obsolescence of kernel allocated slots, paravirt mmu
  KVM: x86 emulator: Add LOCK prefix validity checking
  KVM: x86 emulator: Check CPL level during privilege instruction emulation
  KVM: x86 emulator: Fix popf emulation
  KVM: x86 emulator: Check IOPL level during io instruction emulation
  KVM: x86 emulator: fix memory access during x86 emulation
  KVM: x86 emulator: Add Virtual-8086 mode of emulation
  KVM: x86 emulator: Add group9 instruction decoding
  KVM: x86 emulator: Add group8 instruction decoding
  KVM: do not store wqh in irqfd
  ...

Trivial conflicts in Documentation/feature-removal-schedule.txt
2010-03-05 13:12:34 -08:00
Liu Yu daf5e27109 KVM: ppc/booke: Set ESR and DEAR when inject interrupt to guest
Old method prematurely sets ESR and DEAR.
Move this part after we decide to inject interrupt,
which is more like hardware behave.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:36:10 -03:00
Liu Yu da15bf436b KVM: PPC E500: fix tlbcfg emulation
commit 55fb1027c1cf9797dbdeab48180da530e81b1c39 doesn't update tlbcfg correctly.
Fix it.

And since guest OS likes 'fixed' hardware,
initialize tlbcfg everytime when guest access is useless.
So move this part to init code.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:36:06 -03:00
Liu Yu a9040f2742 KVM: PPC: Add PVR/PIR init for E500
commit 513579e3a3 change the way
we emulate PVR/PIR,
which left PVR/PIR uninitialized on E500, and make guest puzzled.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:36:05 -03:00
Liu Yu d86be077a4 KVM: PPC E500: Add register l1csr0 emulation
Latest kernel start to access l1csr0 to contron L1.
We just tell guest no operation is on going.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:36:05 -03:00
Marcelo Tosatti 6474920477 KVM: fix cleanup_srcu_struct on vm destruction
cleanup_srcu_struct on VM destruction remains broken:

BUG: unable to handle kernel paging request at ffffffffffffffff
IP: [<ffffffff802533d2>] srcu_read_lock+0x16/0x21
RIP: 0010:[<ffffffff802533d2>]  [<ffffffff802533d2>] srcu_read_lock+0x16/0x21
Call Trace:
 [<ffffffffa05354c4>] kvm_arch_vcpu_uninit+0x1b/0x48 [kvm]
 [<ffffffffa05339c6>] kvm_vcpu_uninit+0x9/0x15 [kvm]
 [<ffffffffa0569f7d>] vmx_free_vcpu+0x7f/0x8f [kvm_intel]
 [<ffffffffa05357b5>] kvm_arch_destroy_vm+0x78/0x111 [kvm]
 [<ffffffffa053315b>] kvm_put_kvm+0xd4/0xfe [kvm]

Move it to kvm_arch_destroy_vm.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Reported-by: Jan Kiszka <jan.kiszka@siemens.com>
2010-03-01 12:36:01 -03:00
Alexander Graf a76f8497fd KVM: PPC: Move Shadow MSR calculation to function
We keep a copy of the MSR around that we use when we go into the guest context.

That copy is basically the normal process MSR flags OR some allowed guest
specified MSR flags. We also AND the external providers into this, so we get
traps on FPU usage when we haven't activated it on the host yet.

Currently this calculation is part of the set_msr function that we use whenever
we set the guest MSR value. With the external providers, we also have the case
that we don't modify the guest's MSR, but only want to update the shadow MSR.

So let's move the shadow MSR parts to a separate function that we then use
whenever we only need to update it. That way we don't accidently kvm_vcpu_block
within a preempt notifier context.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:56 -03:00
Alexander Graf f7adbba1e5 KVM: PPC: Keep SRR1 flags around in shadow_msr
SRR1 stores more information that just the MSR value. It also stores
valuable information about the type of interrupt we received, for
example whether the storage interrupt we just got was because of a
missing htab entry or not.

We use that information to speed up the exit path.

Now if we get preempted before we can interpret the shadow_msr values,
we get into vcpu_put which then calls the MSR handler, which then sets
all the SRR1 information bits in shadow_msr to 0. Great.

So let's preserve the SRR1 specific bits in shadow_msr whenever we set
the MSR. They don't hurt.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:56 -03:00
Alexander Graf 180a34d2d3 KVM: PPC: Add support for FPU/Altivec/VSX
When our guest starts using either the FPU, Altivec or VSX we need to make
sure Linux knows about it and sneak into its process switching code
accordingly.

This patch makes accesses to the above parts of the system work inside the
VM.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:52 -03:00
Alexander Graf d5e528136c KVM: PPC: Add helper functions to call real mode loaders
Linux contains quite some bits of code to load FPU, Altivec and VSX lazily for
a task. It calls those bits in real mode, coming from an interrupt handler.

For KVM we better reuse those, so let's wrap a bit of trampoline magic around
them and then we can call them from normal module code.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:52 -03:00
Alexander Graf 4b5c9b7f9b KVM: PPC: Make large pages work
An SLB entry contains two pieces of information related to size:

  1) PTE size
  2) SLB size

The L bit defines the PTE be "large" (usually means 16MB),
SLB_VSID_B_1T defines that the SLB should span 1 GB instead of the
default 256MB.

Apparently I messed things up and just put those two in one box,
shaked it heavily and came up with the current code which handles
large pages incorrectly, because it also treats large page SLB entries
as "1TB" segment entries.

This patch splits those two features apart, making Linux guests boot
even when they have > 256MB.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:50 -03:00
Alexander Graf 5f2b105a1d KVM: PPC: Pass through program interrupts
When we get a program interrupt in guest kernel mode, we try to emulate the
instruction.

If that doesn't fail, we report to the user and try again - at the exact same
instruction pointer. So if the guest kernel really does trigger an invalid
instruction, we loop forever.

So let's better go and forward program exceptions to the guest when we don't
know the instruction we're supposed to emulate.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:49 -03:00
Alexander Graf ff1ca3f983 KVM: PPC: Pass program interrupt flags to the guest
When we need to reinject a program interrupt into the guest, we also need to
reinject the corresponding flags into the guest.

Signed-off-by: Alexander Graf <agraf@suse.de>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:49 -03:00
Alexander Graf d35feb26ef KVM: PPC: Fix HID5 setting code
The code to unset HID5.dcbz32 is broken.
This patch makes it do the right rotate magic.

Signed-off-by: Alexander Graf <agraf@suse.de>
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:49 -03:00
Alexander Graf 25a8a02d26 KVM: PPC: Emulate trap SRR1 flags properly
Book3S needs some flags in SRR1 to get to know details about an interrupt.

One such example is the trap instruction. It tells the guest kernel that
a program interrupt is due to a trap using a bit in SRR1.

This patch implements above behavior, making WARN_ON behave like WARN_ON.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:49 -03:00
Alexander Graf 021ec9c69f KVM: PPC: Call SLB patching code in interrupt safe manner
Currently we're racy when doing the transition from IR=1 to IR=0, from
the module memory entry code to the real mode SLB switching code.

To work around that I took a look at the RTAS entry code which is faced
with a similar problem and did the same thing:

  A small helper in linear mapped memory that does mtmsr with IR=0 and
  then RFIs info the actual handler.

Thanks to that trick we can safely take page faults in the entry code
and only need to be really wary of what to do as of the SLB switching
part.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:49 -03:00
Alexander Graf bc90923e27 KVM: PPC: Get rid of unnecessary RFI
Using an RFI in IR=1 is dangerous. We need to set two SRRs and then do an RFI
without getting interrupted at all, because every interrupt could potentially
overwrite the SRR values.

Fortunately, we don't need to RFI in at least this particular case of the code,
so we can just replace it with an mtmsr and b.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:49 -03:00
Alexander Graf b4433a7cce KVM: PPC: Implement 'skip instruction' mode
To fetch the last instruction we were interrupted on, we enable DR in early
exit code, where we are still in a very transitional phase between guest
and host state.

Most of the time this seemed to work, but another CPU can easily flush our
TLB and HTAB which makes us go in the Linux page fault handler which totally
breaks because we still use the guest's SLB entries.

To work around that, let's introduce a second KVM guest mode that defines
that whenever we get a trap, we don't call the Linux handler or go into
the KVM exit code, but just jump over the faulting instruction.

That way a potentially bad lwz doesn't trigger any faults and we can later
on interpret the invalid instruction we fetched as "fetch didn't work".

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:48 -03:00
Alexander Graf 7e57cba060 KVM: PPC: Use PACA backed shadow vcpu
We're being horribly racy right now. All the entry and exit code hijacks
random fields from the PACA that could easily be used by different code in
case we get interrupted, for example by a #MC or even page fault.

After discussing this with Ben, we figured it's best to reserve some more
space in the PACA and just shove off some vcpu state to there.

That way we can drastically improve the readability of the code, make it
less racy and less complex.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:48 -03:00
Alexander Graf 992b5b29b5 KVM: PPC: Add helpers for CR, XER
We now have helpers for the GPRs, so let's also add some for CR and XER.

Having them in the PACA simplifies code a lot, as we don't need to care
about where to store CC or not to overflow any integers.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:47 -03:00
Alexander Graf 8e5b26b55a KVM: PPC: Use accessor functions for GPR access
All code in PPC KVM currently accesses gprs in the vcpu struct directly.

While there's nothing wrong with that wrt the current way gprs are stored
and loaded, it doesn't suffice for the PACA acceleration that will follow
in this patchset.

So let's just create little wrapper inline functions that we call whenever
a GPR needs to be read from or written to. The compiled code shouldn't really
change at all for now.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:47 -03:00
Alexander Graf 97c4cfbe89 KVM: PPC: Enable lightweight exits again
The PowerPC C ABI defines that registers r14-r31 need to be preserved across
function calls. Since our exit handler is written in C, we can make use of that
and don't need to reload r14-r31 on every entry/exit cycle.

This technique is also used in the BookE code and is called "lightweight exits"
there. To follow the tradition, it's called the same in Book3S.

So far this optimization was disabled though, as the code didn't do what it was
expected to do, but failed to work.

This patch fixes and enables lightweight exits again.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:35:46 -03:00
Alexander Graf b480f780f0 KVM: PPC: Fix typo in rebolting code
When we're loading bolted entries into the SLB again, we're checking if an
entry is in use and only slbmte it when it is.

Unfortunately, the check always goes to the skip label of the first entry,
resulting in an endless loop when it actually gets triggered.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:35:46 -03:00
Marcelo Tosatti 79fac95ecf KVM: convert slots_lock to a mutex
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:35:45 -03:00
Marcelo Tosatti f7784b8ec9 KVM: split kvm_arch_set_memory_region into prepare and commit
Required for SRCU convertion later.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:35:44 -03:00
Marcelo Tosatti 46a26bf557 KVM: modify memslots layout in struct kvm
Have a pointer to an allocated region inside struct kvm.

[alex: fix ppc book 3s]

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:35:43 -03:00
Alexander Graf 0bb1fb7178 KVM: powerpc: Remove AGGRESSIVE_DEC
Because we now emulate the DEC interrupt according to real life behavior,
there's no need to keep the AGGRESSIVE_DEC hack around.

Let's just remove it.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:42 -03:00
Alexander Graf 7706664d39 KVM: powerpc: Improve DEC handling
We treated the DEC interrupt like an edge based one. This is not true for
Book3s. The DEC keeps firing until mtdec is issued again and thus clears
the interrupt line.

So let's implement this logic in KVM too. This patch moves the line clearing
from the firing of the interrupt to the mtdec emulation.

This makes PPC64 guests work without AGGRESSIVE_DEC defined.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:42 -03:00
Alexander Graf 583617b786 KVM: powerpc: Move vector to irqprio resolving to separate function
We're using a switch table to find the irqprio that belongs to a specific
interrupt vector. This table is part of the interrupt inject logic.

Since we'll add a new function to stop interrupts, let's move this table
out of the injection logic into a separate function.

Signed-off-by: Alexander Graf <agraf@suse.de>
Acked-by: Acked-by: Hollis Blanchard <hollis@penguinppc.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:41 -03:00
Avi Kivity 50eb2a3cd0 KVM: Add KVM_MMIO kconfig item
s390 doesn't have mmio, this will simplify ifdefing it out.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:41 -03:00
David S. Miller 2bb4646fce Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-02-16 22:09:29 -08:00
Alexander Graf e1f829b6f4 KVM: powerpc: Show timing option only on embedded
Embedded PowerPC KVM has an exit timing implementation to track and evaluate
how much time was spent in which exit path.

For Book3S, we don't implement it. So let's not expose it as a config option
either.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-01-25 12:26:36 -02:00
David S. Miller 51c24aaaca Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-01-23 00:31:06 -08:00
Michael S. Tsirkin 3a4d5c94e9 vhost_net: a kernel-level virtio server
What it is: vhost net is a character device that can be used to reduce
the number of system calls involved in virtio networking.
Existing virtio net code is used in the guest without modification.

There's similarity with vringfd, with some differences and reduced scope
- uses eventfd for signalling
- structures can be moved around in memory at any time (good for
  migration, bug work-arounds in userspace)
- write logging is supported (good for migration)
- support memory table and not just an offset (needed for kvm)

common virtio related code has been put in a separate file vhost.c and
can be made into a separate module if/when more backends appear.  I used
Rusty's lguest.c as the source for developing this part : this supplied
me with witty comments I wouldn't be able to write myself.

What it is not: vhost net is not a bus, and not a generic new system
call. No assumptions are made on how guest performs hypercalls.
Userspace hypervisors are supported as well as kvm.

How it works: Basically, we connect virtio frontend (configured by
userspace) to a backend. The backend could be a network device, or a tap
device.  Backend is also configured by userspace, including vlan/mac
etc.

Status: This works for me, and I haven't see any crashes.
Compared to userspace, people reported improved latency (as I save up to
4 system calls per packet), as well as better bandwidth and CPU
utilization.

Features that I plan to look at in the future:
- mergeable buffers
- zero copy
- scalability tuning: figure out the best threading model to use

Note on RCU usage (this is also documented in vhost.h, near
private_pointer which is the value protected by this variant of RCU):
what is happening is that the rcu_dereference() is being used in a
workqueue item.  The role of rcu_read_lock() is taken on by the start of
execution of the workqueue item, of rcu_read_unlock() by the end of
execution of the workqueue item, and of synchronize_rcu() by
flush_workqueue()/flush_work(). In the future we might need to apply
some gcc attribute or sparse annotation to the function passed to
INIT_WORK(). Paul's ack below is for this RCU usage.

(Includes fixes by Alan Cox <alan@linux.intel.com>,
David L Stevens <dlstevens@us.ibm.com>,
Chris Wright <chrisw@redhat.com>)

Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-01-15 01:43:29 -08:00
Alexander Graf 5279aeb4b9 KVM: powerpc: Fix mtsrin in book3s_64 mmu
We were shifting the Ks/Kp/N bits one bit too far on mtsrin. It took
me some time to figure that out, so I also put in some debugging and a
comment explaining the conversion.

This fixes current OpenBIOS boot on PPC64 KVM.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-27 13:36:34 -02:00
Benjamin Herrenschmidt bcd6acd51f Merge commit 'origin/master' into next
Conflicts:
	include/linux/kvm.h
2009-12-09 17:14:38 +11:00
Alexander Graf e15a113700 powerpc/kvm: Sync guest visible MMU state
Currently userspace has no chance to find out which virtual address space we're
in and resolve addresses. While that is a big problem for migration, it's also
unpleasent when debugging, as gdb and the monitor don't work on virtual
addresses.

This patch exports enough of the MMU segment state to userspace to make
debugging work and thus also includes the groundwork for migration.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-12-08 16:02:50 +11:00
Hollis Blanchard c0a187e12d KVM: powerpc: Fix BUILD_BUG_ON condition
The old BUILD_BUG_ON implementation didn't work with __builtin_constant_p().
Fixing that revealed this test had been inverted for a long time without
anybody noticing...

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:22 +02:00
Alexander Graf 10474ae894 KVM: Activate Virtualization On Demand
X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).

Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.

To circumvent this problem at least a bit, this patch introduces on
demand activation of virtualization. This means, that instead
virtualization is enabled on creation of the first virtual machine
and disabled on destruction of the last one.

So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:10 +02:00
Avi Kivity 367e1319b2 KVM: Return -ENOTTY on unrecognized ioctls
Not the incorrect -EINVAL.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:08 +02:00
Benjamin Herrenschmidt e0ea8b2c06 powerpc/kvm: Fix non-modular build
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 17:17:12 +11:00
Benjamin Herrenschmidt d4e09f8432 Merge branch 'kvm' into next 2009-11-05 17:16:13 +11:00
Benjamin Herrenschmidt 41c8c46bfe powerpc/kvm: Remove problematic BUILD_BUG_ON statement
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 17:16:08 +11:00
Benjamin Herrenschmidt 38634e6769 powerpc/kvm: Remove problematic BUILD_BUG_ON statement
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 17:10:34 +11:00
Alexander Graf 544c6761bb Use hrtimers for the decrementer
Following S390's good example we should use hrtimers for the decrementer too!
This patch converts the timer from the old mechanism to hrtimers.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:51:05 +11:00
Alexander Graf 346b2762a7 Fix trace.h
It looks like the variable "pc" is defined. At least the current code always
failed on me stating that "pc" is already defined somewhere else.

Let's use _pc instead, because that doesn't collide.

Is this the right approach? Does it break on 440 too? If not, why not?

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:50:27 +11:00
Alexander Graf c4f9c779f1 Include Book3s_64 target in buildsystem
Now we have everything in place to be able to build KVM, so let's add it
as config option and in the Makefile.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:50:26 +11:00
Alexander Graf 0186fd0373 Export KVM symbols for module
To be able to keep KVM as module, we need to export the SLB trampoline
addresses to the module, so it knows where to jump to.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:50:25 +11:00
Alexander Graf 513579e3a3 Add desktop PowerPC specific emulation
Little opcodes behave differently on desktop and embedded PowerPC cores.
In order to reflect those differences, let's add some #ifdef code to emulate.c.

We could probably also handle them in the core specific emulation files, but I
would prefer to reuse as much code as possible.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:49:57 +11:00
Alexander Graf 9a7a9b09fe Add mfdec emulation
We support setting the DEC to a certain value right now. Doing that basically
triggers the CPU local timer.

But there's also an mfdec command that enabled the OS to read the decrementor.

This is required at least by all desktop and server PowerPC Linux kernels. It
can't really hurt to allow embedded ones to do it as well though.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:49:56 +11:00
Alexander Graf c215c6e49f Add book3s_64 specific opcode emulation
There are generic parts of PowerPC that can be shared across all
implementations and specific parts that only apply to BookE or desktop PPCs.

This patch adds emulation for desktop specific opcodes that don't apply
to BookE CPUs.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:49:55 +11:00
Alexander Graf 0123518081 Add book3s_32 guest MMU
This patch adds an implementation for a G3/G4 MMU, so we can run G3 and
G4 guests in KVM on Book3s_64.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:49:55 +11:00
Alexander Graf e71b2a39af Add book3s_64 guest MMU
To be able to run a guest, we also need to implement a guest MMU.

This patch adds MMU handling for Book3s_64 guests.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:49:55 +11:00
Alexander Graf 0d8dc681c8 Add book3s_64 Host MMU handling
We designed the Book3S port of KVM as modular as possible. Most
of the code could be easily used on a Book3S_32 host as well.

The main difference between 32 and 64 bit cores is the MMU. To keep
things well separated, we treat the book3s_64 MMU as one possible compile
option.

This patch adds all the MMU helpers the rest of the code needs in
order to modify the host's MMU, like setting PTEs and segments.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:49:54 +11:00
Alexander Graf 2f4cf5e42d Add book3s.c
This adds the book3s core handling file. Here everything that is generic to
desktop PowerPC cores is handled, including interrupt injections, MSR settings,
etc.

It basically takes over the same role as booke.c for embedded PowerPCs.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:49:54 +11:00
Alexander Graf c862125c8a Add interrupt handling code
Getting from host state to the guest is only half the story. We also need
to return to our host context and handle whatever happened to get us out of
the guest.

On PowerPC every guest exit is an interrupt. So all we need to do is trap
the host's interrupt handlers and get into our #VMEXIT code to handle it.

PowerPCs also have a register that can add an offset to the interrupt handlers'
adresses which is what the booke KVM code uses. Unfortunately that is a
hypervisor ressource and we also want to be able to run KVM when we're running
in an LPAR. So we have to hook into the Linux interrupt handlers.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:49:54 +11:00
Alexander Graf 5126ed3760 Add SLB switching code for entry/exit
This is the really low level of guest entry/exit code.

Book3s_64 has an SLB, which stores all ESID -> VSID mappings we're
currently aware of.

The segments in the guest differ from the ones on the host, so we need
to switch the SLB to tell the MMU that we're in a new context.

So we store a shadow of the guest's SLB in the PACA, switch to that on
entry and only restore bolted entries on exit, leaving the rest to the
Linux SLB fault handler.

That way we get a really clean way of switching the SLB.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:49:53 +11:00
Alexander Graf 29eb61bca1 Add book3s_64 highmem asm code
This is the of entry / exit code. In order to switch between host and guest
context, we need to switch register state and call the exit code handler on
exit.

This assembly file does exactly that. To finally enter the guest it calls
into book3s_64_slb.S. On exit it gets jumped at from book3s_64_slb.S too.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:49:53 +11:00
Alexander Graf 4e755758cb Move dirty logging code to sub-arch
PowerPC code handles dirty logging in the generic parts atm. While this
is great for "return -ENOTSUPP", we need to be rather target specific
when actually implementing it.

So let's split it to implementation specific code, so we can implement
it for book3s.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-11-05 16:49:51 +11:00
Alexey Dobriyan 828c09509b const: constify remaining file_operations
[akpm@linux-foundation.org: fix KVM]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-01 16:11:11 -07:00
Linus Torvalds 723e9db7a4 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (134 commits)
  powerpc/nvram: Enable use Generic NVRAM driver for different size chips
  powerpc/iseries: Fix oops reading from /proc/iSeries/mf/*/cmdline
  powerpc/ps3: Workaround for flash memory I/O error
  powerpc/booke: Don't set DABR on 64-bit BookE, use DAC1 instead
  powerpc/perf_counters: Reduce stack usage of power_check_constraints
  powerpc: Fix bug where perf_counters breaks oprofile
  powerpc/85xx: Fix SMP compile error and allow NULL for smp_ops
  powerpc/irq: Improve nanodoc
  powerpc: Fix some late PowerMac G5 with PCIe ATI graphics
  powerpc/fsl-booke: Use HW PTE format if CONFIG_PTE_64BIT
  powerpc/book3e: Add missing page sizes
  powerpc/pseries: Fix to handle slb resize across migration
  powerpc/powermac: Thermal control turns system off too eagerly
  powerpc/pci: Merge ppc32 and ppc64 versions of phb_scan()
  powerpc/405ex: support cuImage via included dtb
  powerpc/405ex: provide necessary fixup function to support cuImage
  powerpc/40x: Add support for the ESTeem 195E (PPC405EP) SBC
  powerpc/44x: Add Eiger AMCC (AppliedMicro) PPC460SX evaluation board support.
  powerpc/44x: Update Arches defconfig
  powerpc/44x: Update Arches dts
  ...

Fix up conflicts in drivers/char/agp/uninorth-agp.c
2009-09-15 09:51:09 -07:00
Gleb Natapov a1b37100d9 KVM: Reduce runnability interface with arch support code
Remove kvm_cpu_has_interrupt() and kvm_arch_interrupt_allowed() from
interface between general code and arch code. kvm_arch_vcpu_runnable()
checks for interrupts instead.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:13 +03:00
Marcelo Tosatti 2023a29cbe KVM: remove old KVMTRACE support code
Return EOPNOTSUPP for KVM_TRACE_ENABLE/PAUSE/DISABLE ioctls.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:03 +03:00
Marcelo Tosatti 46f43c6ee0 KVM: powerpc: convert marker probes to event trace
[avi: make it build]
[avi: fold trace-arch.h into trace.h]

CC: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:33:03 +03:00
Gleb Natapov 988a2cae6a KVM: Use macro to iterate over vcpus.
[christian: remove unused variables on s390]

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:52 +03:00
Liu Yu 06579dd9c1 KVM: ppc: e500: Add MMUCFG and PVR emulation
Latest kernel started to use these two registers.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:47 +03:00
Liu Yu 5b7c1a2c17 KVM: ppc: e500: Directly pass pvr to guest
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:47 +03:00
Liu Yu 0cfb50e50d KVM: ppc: e500: Move to Book-3e MMU definitions
According to commit 70fe3af840.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:47 +03:00
Stephen Rothwell 2986b8c72c KVM: powerpc: fix some init/exit annotations
Fixes a couple of warnings like this one:

WARNING: arch/powerpc/kvm/kvm-440.o(.text+0x1e8c): Section mismatch in reference from the function kvmppc_44x_exit() to the function .exit.text:kvmppc_booke_exit()
The function kvmppc_44x_exit() references a function in an exit section.
Often the function kvmppc_booke_exit() has valid usage outside the exit section
and the fix is to remove the __exit annotation of kvmppc_booke_exit.

Also add some __init annotations on obvious routines.

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:44 +03:00
Avi Kivity 0ba12d1081 KVM: Move common KVM Kconfig items to new file virt/kvm/Kconfig
Reduce Kconfig code duplication.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-09-10 08:32:41 +03:00
Benjamin Herrenschmidt ee43eb788b powerpc: Use names rather than numbers for SPRGs (v2)
The kernel uses SPRG registers for various purposes, typically in
low level assembly code as scratch registers or to hold per-cpu
global infos such as the PACA or the current thread_info pointer.

We want to be able to easily shuffle the usage of those registers
as some implementations have specific constraints realted to some
of them, for example, some have userspace readable aliases, etc..
and the current choice isn't always the best.

This patch should not change any code generation, and replaces the
usage of SPRN_SPRGn everywhere in the kernel with a named replacement
and adds documentation next to the definition of the names as to
what those are used for on each processor family.

The only parts that still use the original numbers are bits of KVM
or suspend/resume code that just blindly needs to save/restore all
the SPRGs.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-08-20 10:12:27 +10:00
Michael Ellerman ba55bd7436 powerpc: Add configurable -Werror for arch/powerpc
Add the option to build the code under arch/powerpc with -Werror.

The intention is to make it harder for people to inadvertantly introduce
warnings in the arch/powerpc code. It needs to be configurable so that
if a warning is introduced, people can easily work around it while it's
being fixed.

The option is a negative, ie. don't enable -Werror, so that it will be
turned on for allyes and allmodconfig builds.

The default is n, in the hope that developers will build with -Werror,
that will probably lead to some build breaks, I am prepared to be flamed.

It's not enabled for math-emu, which is a steaming pile of warnings.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
2009-06-16 14:15:45 +10:00
Gleb Natapov 78646121e9 KVM: Fix interrupt unhalting a vcpu when it shouldn't
kvm_vcpu_block() unhalts vpu on an interrupt/timer without checking
if interrupt window is actually opened.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-06-10 11:48:33 +03:00
Liu Yu 046a48b35b KVM: ppc: e500: Fix the bug that KVM is unstable in SMP
TLB entry should enable memory coherence in SMP.

And like commit 631fba9dd3aca519355322cef035730609e91593,
remove guard attribute to enable the prefetch of guest memory.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:15 +02:00
Liu Yu bc35cbc85c KVM: ppc: e500: Fix the bug that mas0 update to wrong value when read TLB entry
Should clear and then update the next victim area here.

Guest kernel only read TLB1 when startup kernel,
this bug result in an extra 4K TLB1 mapping in guest from 0x0 to 0x0.

As the problem has no impact to bootup a guest,
we didn't notice it before.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:14 +02:00
Liu Yu b0a1835d53 KVM: ppc: Add emulation of E500 register mmucsr0
Latest kernel flushes TLB via mmucsr0.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:12 +02:00
Liu Yu 9aa4dd5e5f KVM: ppc: Move to new TLB invalidate interface
Commit 2a4aca1144 removed old method _tlbia().

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:05 +02:00
Liu Yu fb2838d446 KVM: ppc: Fix e500 warnings and some spelling problems
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:05 +02:00
Avi Kivity 5d9b8e30f5 KVM: Add CONFIG_HAVE_KVM_IRQCHIP
Two KVM archs support irqchips and two don't.  Add a Kconfig item to
make selecting between the two models easier.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:02 +02:00
Hollis Blanchard f5d0906b5b KVM: ppc: remove debug support broken by KVM debug rewrite
After the rewrite of KVM's debug support, this code doesn't even build any
more.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:03:01 +02:00
Hollis Blanchard bb3a8a178d KVM: ppc: Add extra E500 exceptions
e500 has additional interrupt vectors (and corresponding IVORs) for SPE and
performance monitoring interrupts.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:02:59 +02:00
Hollis Blanchard bdc89f13ec KVM: ppc: distinguish between interrupts and priorities
Although BOOKE_MAX_INTERRUPT has the right value, the meaning is not match.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:02:59 +02:00
Hollis Blanchard b52a638c39 KVM: ppc: Add kvmppc_mmu_dtlb/itlb_miss for booke
When itlb or dtlb miss happens, E500 needs to update some mmu registers.
So that the auto-load mechanism can work on E500 when write a tlb entry.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:02:59 +02:00
Hollis Blanchard bc8080cbcc KVM: ppc: E500 core-specific code
Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:02:58 +02:00
Hollis Blanchard 17c885eb5c KVM: ppc: ifdef iccci with CONFIG_44x
E500 deosn't support this instruction.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:02:58 +02:00
Hollis Blanchard f7b200af8f KVM: ppc: Add dbsr in kvm_vcpu_arch
Kernel for E500 need clear dbsr when startup.
So add dbsr register in kvm_vcpu_arch for BOOKE.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:02:57 +02:00
Hollis Blanchard d0c7dc0344 KVM: ppc: split out common Book E instruction emulation
The Book E code will be shared with e500.

I've left PID in kvmppc_core_emulate_op() just so that we don't need to move
kvmppc_set_pid() right now. Once we have the e500 implementation, we can
probably share that too.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:02:57 +02:00
Hollis Blanchard cea5d8c9de KVM: ppc: use macros instead of hardcoded literals for instruction decoding
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:02:57 +02:00
Hollis Blanchard f44353610b KVM: ppc: remove last 44x-specific bits from booke.c
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:02:57 +02:00
Hollis Blanchard fa86b8dda2 KVM: ppc: rename 44x MMU functions used in booke.c
e500 will provide its own implementation of these.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:02:56 +02:00
Hollis Blanchard be8d1cae07 KVM: ppc: turn tlb_xlate() into a per-core hook (and give it a better name)
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:02:56 +02:00
Hollis Blanchard 58a96214a3 KVM: ppc: change kvmppc_mmu_map() parameters
Passing just the TLB index will ease an e500 implementation.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:02:56 +02:00
Hollis Blanchard 475e7cdd69 KVM: ppc: small cosmetic changes to Book E DTLB miss handler
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:02:56 +02:00
Hollis Blanchard ecc0981ff0 KVM: ppc: cosmetic changes to mmu hook names
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:02:55 +02:00
Jan Kiszka d0bfb940ec KVM: New guest debug interface
This rips out the support for KVM_DEBUG_GUEST and introduces a new IOCTL
instead: KVM_SET_GUEST_DEBUG. The IOCTL payload consists of a generic
part, controlling the "main switch" and the single-step feature. The
arch specific part adds an x86 interface for intercepting both types of
debug exceptions separately and re-injecting them when the host was not
interested. Moveover, the foundation for guest debugging via debug
registers is layed.

To signal breakpoint events properly back to userland, an arch-specific
data block is now returned along KVM_EXIT_DEBUG. For x86, the arch block
contains the PC, the debug exception, and relevant debug registers to
tell debug events properly apart.

The availability of this new interface is signaled by
KVM_CAP_SET_GUEST_DEBUG. Empty stubs for not yet supported archs are
provided.

Note that both SVM and VTX are supported, but only the latter was tested
yet. Based on the experience with all those VTX corner case, I would be
fairly surprised if SVM will work out of the box.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-03-24 11:02:49 +02:00
Sheng Yang ad8ba2cd44 KVM: Add kvm_arch_sync_events to sync with asynchronize events
kvm_arch_sync_events is introduced to quiet down all other events may happen
contemporary with VM destroy process, like IRQ handler and work struct for
assigned device.

For kvm_arch_sync_events is called at the very beginning of kvm_destroy_vm(), so
the state of KVM here is legal and can provide a environment to quiet down other
events.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-02-15 02:47:36 +02:00
Avi Kivity ca9edaee1a KVM: Consolidate userspace memory capability reporting into common code
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:55:46 +02:00
Hollis Blanchard 7b7015914b KVM: ppc: mostly cosmetic updates to the exit timing accounting code
The only significant changes were to kvmppc_exit_timing_write() and
kvmppc_exit_timing_show(), both of which were dramatically simplified.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:55:41 +02:00
Hollis Blanchard 73e75b416f KVM: ppc: Implement in-kernel exit timing statistics
Existing KVM statistics are either just counters (kvm_stat) reported for
KVM generally or trace based aproaches like kvm_trace.
For KVM on powerpc we had the need to track the timings of the different exit
types. While this could be achieved parsing data created with a kvm_trace
extension this adds too much overhead (at least on embedded PowerPC) slowing
down the workloads we wanted to measure.

Therefore this patch adds a in-kernel exit timing statistic to the powerpc kvm
code. These statistic is available per vm&vcpu under the kvm debugfs directory.
As this statistic is low, but still some overhead it can be enabled via a
.config entry and should be off by default.

Since this patch touched all powerpc kvm_stat code anyway this code is now
merged and simplified together with the exit timing statistic code (still
working with exit timing disabled in .config).

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:55:41 +02:00
Hollis Blanchard c5fbdffbda KVM: ppc: save and restore guest mappings on context switch
Store shadow TLB entries in memory, but only use it on host context switch
(instead of every guest entry). This improves performance for most workloads on
440 by reducing the guest TLB miss rate.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:55:09 +02:00
Hollis Blanchard 7924bd4109 KVM: ppc: directly insert shadow mappings into the hardware TLB
Formerly, we used to maintain a per-vcpu shadow TLB and on every entry to the
guest would load this array into the hardware TLB. This consumed 1280 bytes of
memory (64 entries of 16 bytes plus a struct page pointer each), and also
required some assembly to loop over the array on every entry.

Instead of saving a copy in memory, we can just store shadow mappings directly
into the hardware TLB, accepting that the host kernel will clobber these as
part of the normal 440 TLB round robin. When we do that we need less than half
the memory, and we have decreased the exit handling time for all guest exits,
at the cost of increased number of TLB misses because the host overwrites some
guest entries.

These savings will be increased on processors with larger TLBs or which
implement intelligent flush instructions like tlbivax (which will avoid the
need to walk arrays in software).

In addition to that and to the code simplification, we have a greater chance of
leaving other host userspace mappings in the TLB, instead of forcing all
subsequent tasks to re-fault all their mappings.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:55:09 +02:00
Hollis Blanchard 891686188f KVM: ppc: support large host pages
KVM on 440 has always been able to handle large guest mappings with 4K host
pages -- we must, since the guest kernel uses 256MB mappings.

This patch makes KVM work when the host has large pages too (tested with 64K).

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:55:07 +02:00
Hollis Blanchard fe4e771d5c KVM: ppc: fix userspace mapping invalidation on context switch
We used to defer invalidating userspace TLB entries until jumping out of the
kernel. This was causing MMU weirdness most easily triggered by using a pipe in
the guest, e.g. "dmesg | tail". I believe the problem was that after the guest
kernel changed the PID (part of context switch), the old process's mappings
were still present, and so copy_to_user() on the "return to new process" path
ended up using stale mappings.

Testing with large pages (64K) exposed the problem, probably because with 4K
pages, pressure on the TLB faulted all process A's mappings out before the
guest kernel could insert any for process B.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:26 +02:00
Hollis Blanchard df9b856c45 KVM: ppc: use prefetchable mappings for guest memory
Bare metal Linux on 440 can "overmap" RAM in the kernel linear map, so that it
can use large (256MB) mappings even if memory isn't a multiple of 256MB. To
prevent the hardware prefetcher from loading from an invalid physical address
through that mapping, it's marked Guarded.

However, KVM must ensure that all guest mappings are backed by real physical
RAM (since a deliberate access through a guarded mapping could still cause a
machine check). Accordingly, we don't need to make our mappings guarded, so
let's allow prefetching as the designers intended.

Curiously this patch didn't affect performance at all on the quick test I
tried, but it's clearly the right thing to do anyways and may improve other
workloads.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:26 +02:00
Hollis Blanchard bf5d4025c9 KVM: ppc: use MMUCR accessor to obtain TID
We have an accessor; might as well use it.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:25 +02:00
Hollis Blanchard 74ef740da6 KVM: ppc: fix Kconfig constraints
Make sure that CONFIG_KVM cannot be selected without processor support
(currently, 440 is the only processor implementation available).

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:25 +02:00
Hollis Blanchard fcfdbd266a KVM: ppc: improve trap emulation
set ESR[PTR] when emulating a guest trap. This allows Linux guests to
properly handle WARN_ON() (i.e. detect that it's a non-fatal trap).

Also remove debugging printk in trap emulation.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:24 +02:00
Hollis Blanchard d4cf3892e5 KVM: ppc: optimize irq delivery path
In kvmppc_deliver_interrupt is just one case left in the switch and it is a
rare one (less than 8%) when looking at the exit numbers. Therefore we can
at least drop the switch/case and if an if. I inserted an unlikely too, but
that's open for discussion.

In kvmppc_can_deliver_interrupt all frequent cases are in the default case.
I know compilers are smart but we can make it easier for them. By writing
down all options and removing the default case combined with the fact that
ithe values are constants 0..15 should allow the compiler to write an easy
jump table.
Modifying kvmppc_can_deliver_interrupt pointed me to the fact that gcc seems
to be unable to reduce priority_exception[x] to a build time constant.
Therefore I changed the usage of the translation arrays in the interrupt
delivery path completely. It is now using priority without translation to irq
on the full irq delivery path.
To be able to do that ivpr regs are stored by their priority now.

Additionally the decision made in kvmppc_can_deliver_interrupt is already
sufficient to get the value of interrupt_msr_mask[x]. Therefore we can replace
the 16x4byte array used here with a single 4byte variable (might still be one
miss, but the chance to find this in cache should be better than the right
entry of the whole array).

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:23 +02:00
Hollis Blanchard 9ab80843c0 KVM: ppc: optimize find first bit
Since we use a unsigned long here anyway we can use the optimized __ffs.

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:23 +02:00
Hollis Blanchard 1b6766c7f3 KVM: ppc: optimize kvm stat handling
Currently we use an unnecessary if&switch to detect some cases.
To be honest we don't need the ligh_exits counter anyway, because we can
calculate it out of others. Sum_exits can also be calculated, so we can
remove that too.
MMIO, DCR  and INTR can be counted on other places without these
additional control structures (The INTR case was never hit anyway).

The handling of BOOKE_INTERRUPT_EXTERNAL/BOOKE_INTERRUPT_DECREMENTER is
similar, but we can avoid the additional if when copying 3 lines of code.
I thought about a goto there to prevent duplicate lines, but rewriting three
lines should be better style than a goto cross switch/case statements (its
also not enough code to justify a new inline function).

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:23 +02:00
Hollis Blanchard b8fd68ac8d KVM: ppc: fix set regs to take care of msr change
When changing some msr bits e.g. problem state we need to take special
care of that. We call the function in our mtmsr emulation (not needed for
wrtee[i]), but we don't call kvmppc_set_msr if we change msr via set_regs
ioctl.
It's a corner case we never hit so far, but I assume it should be
kvmppc_set_msr in our arch set regs function (I found it because it is also
a corner case when using pv support which would miss the update otherwise).

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:23 +02:00
Hollis Blanchard 5cf8ca2214 KVM: ppc: adjust vcpu types to support 64-bit cores
However, some of these fields could be split into separate per-core structures
in the future.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:22 +02:00
Hollis Blanchard db93f5745d KVM: ppc: create struct kvm_vcpu_44x and introduce container_of() accessor
This patch doesn't yet move all 44x-specific data into the new structure, but
is the first step down that path. In the future we may also want to create a
struct kvm_vcpu_booke.

Based on patch from Liu Yu <yu.liu@freescale.com>.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:22 +02:00
Hollis Blanchard 5cbb5106f5 KVM: ppc: Move the last bits of 44x code out of booke.c
Needed to port to other Book E processors.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:22 +02:00
Hollis Blanchard 75f74f0dbe KVM: ppc: refactor instruction emulation into generic and core-specific pieces
Cores provide 3 emulation hooks, implemented for example in the new
4xx_emulate.c:
kvmppc_core_emulate_op
kvmppc_core_emulate_mtspr
kvmppc_core_emulate_mfspr

Strictly speaking the last two aren't necessary, but provide for more
informative error reporting ("unknown SPR").

Long term I'd like to have instruction decoding autogenerated from tables of
opcodes, and that way we could aggregate universal, Book E, and core-specific
instructions more easily and without redundant switch statements.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:21 +02:00
Hollis Blanchard c381a04313 ppc: Create disassemble.h to extract instruction fields
This is used in a couple places in KVM, but isn't KVM-specific.

However, this patch doesn't modify other in-kernel emulation code:
- xmon uses a direct copy of ppc_opc.c from binutils
- emulate_instruction() doesn't need it because it can use a series
  of mask tests.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:21 +02:00
Hollis Blanchard 9dd921cfea KVM: ppc: Refactor powerpc.c to relocate 440-specific code
This introduces a set of core-provided hooks. For 440, some of these are
implemented by booke.c, with the rest in (the new) 44x.c.

Note that these hooks are link-time, not run-time. Since it is not possible to
build a single kernel for both e500 and 440 (for example), using function
pointers would only add overhead.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:52:21 +02:00
Hollis Blanchard d9fbd03d24 KVM: ppc: combine booke_guest.c and booke_host.c
The division was somewhat artificial and cumbersome, and had no functional
benefit anyways: we can only guests built for the real host processor.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:51:50 +02:00
Hollis Blanchard 0f55dc481e KVM: ppc: Rename "struct tlbe" to "struct kvmppc_44x_tlbe"
This will ease ports to other cores.

Also remove unused "struct kvm_tlb" while we're at it.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:51:50 +02:00
Hollis Blanchard a0d7b9f246 KVM: ppc: Move 440-specific TLB code into 44x_tlb.c
This will make it easier to provide implementations for other cores.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-12-31 16:51:50 +02:00
Paul Mackerras fad7b9b51e powerpc: Fix KVM build on ppc440
Commit 2a4aca1144 ("powerpc/mm: Split
low level tlb invalidate for nohash processors") changed a call to
_tlbia to _tlbil_all but didn't include the header that defines
_tlbil_all, leading to a build failure on 440 if KVM is enabled.
This fixes it.

Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-23 14:58:30 +11:00
Benjamin Herrenschmidt 2a4aca1144 powerpc/mm: Split low level tlb invalidate for nohash processors
Currently, the various forms of low level TLB invalidations are all
implemented in misc_32.S for 32-bit processors, in a fairly scary
mess of #ifdef's and with interesting duplication such as a whole
bunch of code for FSL _tlbie and _tlbia which are no longer used.

This moves things around such that _tlbie is now defined in
hash_low_32.S and is only used by the 32-bit hash code, and all
nohash CPUs use the various _tlbil_* forms that are now moved to
a new file, tlb_nohash_low.S.

I moved all the definitions for that stuff out of
include/asm/tlbflush.h as they are really internal mm stuff, into
mm/mmu_decl.h

The code should have no functional changes.  I kept some variants
inline for trivial forms on things like 40x and 8xx.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
2008-12-21 14:21:16 +11:00
Hollis Blanchard c30f8a6c6d KVM: ppc: stop leaking host memory on VM exit
When the VM exits, we must call put_page() for every page referenced in the
shadow TLB.

Without this patch, we usually leak 30-50 host pages (120 - 200 KiB with 4 KiB
pages). The maximum number of pages leaked is the size of our shadow TLB, 64
pages.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-11-25 12:02:48 +02:00
Marcelo Tosatti 4c2155ce81 KVM: switch to get_user_pages_fast
Convert gfn_to_pfn to use get_user_pages_fast, which can do lockless
pagetable lookups on x86. Kernel compilation on 4-way guest is 3.7%
faster on VMX.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2008-10-15 14:25:06 +02:00
Hollis Blanchard 0bd595fc22 KVM: ppc: kvmppc_44x_shadow_release() does not require mmap_sem to be locked
And it gets in the way of get_user_pages_fast().

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:26 +02:00
Hollis Blanchard 49dd2c4928 KVM: powerpc: Map guest userspace with TID=0 mappings
When we use TID=N userspace mappings, we must ensure that kernel mappings have
been destroyed when entering userspace. Using TID=1/TID=0 for kernel/user
mappings and running userspace with PID=0 means that userspace can't access the
kernel mappings, but the kernel can directly access userspace.

The net is that we don't need to flush the TLB on privilege switches, but we do
on guest context switches (which are far more infrequent). Guest boot time
performance improvement: about 30%.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:16 +02:00
Hollis Blanchard 83aae4a809 KVM: ppc: Write only modified shadow entries into the TLB on exit
Track which TLB entries need to be written, instead of overwriting everything
below the high water mark. Typically only a single guest TLB entry will be
modified in a single exit.

Guest boot time performance improvement: about 15%.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:16 +02:00
Hollis Blanchard 20754c2495 KVM: ppc: Stop saving host TLB state
We're saving the host TLB state to memory on every exit, but never using it.
Originally I had thought that we'd want to restore host TLB for heavyweight
exits, but that could actually hurt when context switching to an unrelated host
process (i.e. not qemu).

Since this decreases the performance penalty of all exits, this patch improves
guest boot time by about 15%.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:16 +02:00
Hollis Blanchard 6a0ab738ef KVM: ppc: guest breakpoint support
Allow host userspace to program hardware debug registers to set breakpoints
inside guests.

Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:16 +02:00
Christian Ehrhardt 3b4bd7969f KVM: ppc: trace powerpc instruction emulation
This patch adds a trace point for the instruction emulation on embedded powerpc
utilizing the KVM_TRACE interface.

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:15 +02:00
Jerone Young 31711f2294 KVM: ppc: adds trace points for ppc tlb activity
This patch adds trace points to track powerpc TLB activities using the
KVM_TRACE infrastructure.

Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:15 +02:00
Jerone Young 12f6755602 KVM: ppc: enable KVM_TRACE building for powerpc
This patch enables KVM_TRACE to build for PowerPC arch. This means just
adding sections to Kconfig and Makefile.

Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-10-15 10:15:15 +02:00
Hollis Blanchard cc04454fa8 KVM: ppc: fix invalidation of large guest pages
When guest invalidates a large tlb map, there may be more than one
corresponding shadow tlb maps that need to be invalidated. Use eaddr and eend
to find these shadow tlb maps.

Signed-off-by: Liu Yu <yu.liu@freescale.com>
Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-27 12:02:05 +03:00
Marcelo Tosatti 34d4cb8fca KVM: MMU: nuke shadowed pgtable pages and ptes on memslot destruction
Flush the shadow mmu before removing regions to avoid stale entries.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:40 +03:00
Laurent Vivier 588968b6b7 KVM: Add coalesced MMIO support (powerpc part)
This patch enables coalesced MMIO for powerpc architecture.
It defines KVM_MMIO_PAGE_OFFSET and KVM_CAP_COALESCED_MMIO.
It enables the compilation of coalesced_mmio.c.

Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:31 +03:00
Avi Kivity 7cc8883074 KVM: Remove decache_vcpus_on_cpu() and related callbacks
Obsoleted by the vmx-specific per-cpu list.

Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-07-20 12:42:25 +03:00
Hollis Blanchard 9dcb40e1aa KVM: ppc: Report bad GFNs
This code shouldn't be hit anyways, but when it is, it's useful to have a
little more information about the failure.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:22:41 +03:00
Hollis Blanchard 905fa4b9d6 KVM: ppc: Use a read lock around MMU operations, and release it on error
gfn_to_page() and kvm_release_page_clean() are called from other contexts with
mmap_sem locked only for reading.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:22:33 +03:00
Hollis Blanchard 52435b7c7a KVM: ppc: Remove unmatched kunmap() call
We're not calling kmap() now, so we shouldn't call kunmap() either. This has no
practical effect in the non-highmem case, which is why it hasn't caused more
obvious problems.

Pointed out by Anthony Liguori.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:22:25 +03:00
Hollis Blanchard ac3cd34e4e KVM: ppc: add lwzx/stwz emulation
Somehow these load/store instructions got missed before, but weren't used by
the guest so didn't break anything.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:22:17 +03:00
Hollis Blanchard ce263d70e5 KVM: ppc: Remove duplicate function
This was left behind from some code movement.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-06-06 21:22:09 +03:00
Christian Ehrhardt de368dceb3 KVM: ppc: deliver INTERRUPT_FP_UNAVAIL to the guest
This patch adds the delivery of INTERRUPT_FP_UNAVAIL exceptions to the guest.
It's needed if a guest uses ppc binaries using the Floating point instructions.

Signed-off-by: Christian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Acked-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:45 +03:00
Hollis Blanchard 45c5eb67da KVM: ppc: Handle guest idle by emulating MSR[WE] writes
This reduces host CPU usage when the guest is idle. However, the guest must
set MSR[WE] in its idle loop, which Linux did not do until 2.6.26.

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Signed-off-by: Jerone Young <jyoung5@us.ibm.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-05-04 14:44:44 +03:00
Hollis Blanchard bbf45ba57e KVM: ppc: PowerPC 440 KVM implementation
This functionality is definitely experimental, but is capable of running
unmodified PowerPC 440 Linux kernels as guests on a PowerPC 440 host. (Only
tested with 440EP "Bamboo" guests so far, but with appropriate userspace
support other SoC/board combinations should work.)

See Documentation/powerpc/kvm_440.txt for technical details.

[stephen: build fix]

Signed-off-by: Hollis Blanchard <hollisb@us.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Avi Kivity <avi@qumranet.com>
2008-04-27 18:21:39 +03:00