If the local apic initial count is zero, don't start a an hrtimer with infinite
frequency, locking up the host.
Signed-off-by: Avi Kivity <avi@qumranet.com>
the cr3 variable is now inside the vcpu->arch structure.
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
alloc_apic_access_page() can sleep, while vmx_vcpu_setup is called
inside a non preemptable region. Move it after put_cpu().
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
While installing Windows XP 64 bit wants to access the DEBUGCTL and the last
branch record (LBR) MSRs. Don't allowing this in KVM causes the installation to
crash. This patch allow the access to these MSRs and fixes the issue.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This patch replaces the mmap_sem lock for the memory slots with a new
kvm private lock, it is needed beacuse untill now there were cases where
kvm accesses user memory while holding the mmap semaphore.
Signed-off-by: Izik Eidus <izike@qumranet.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Injecting an GP when accessing this MSR lets Windows crash when running some
stress test tools in KVM. So this patch emulates access to this MSR.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
One of the use cases for the supported cpuid list is to create a "greatest
common denominator" of cpu capabilities in a server farm. As such, it is
useful to be able to get the list without creating a virtual machine first.
Since the code does not depend on the vm in any way, all that is needed is
to move it to the device ioctl handler. The capability identifier is also
changed so that binaries made against -rc1 will fail gracefully.
Signed-off-by: Avi Kivity <avi@qumranet.com>
Whilst working on getting a VM to initialize in to IA32e mode I found
this issue. set_cr0 relies on comparing the old cr0 to the new one to
work correctly. Move the assignment below so the compare can work.
Signed-off-by: Paul Knowles <paul@transitive.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
Explicitly enable the NM intercept in svm_set_cr0 if we enable TS in the guest
copy of CR0 for lazy FPU switching. This fixes guest SMP with Linux under SVM.
Without that patch Linux deadlocks or panics right after trying to boot the
other CPUs.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
If the guest writes to cr0 and leaves the TS flag at 0 while vcpu->fpu_active
is also 0, the TS flag in the guest's cr0 gets lost. This leads to corrupt FPU
state an causes Windows Vista 64bit to crash very soon after boot. This patch
fixes this bug.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Markus Rechberger <markus.rechberger@amd.com>
Signed-off-by: Avi Kivity <avi@qumranet.com>
This is unnecessary since it is already protected by
spin_lock_irq{save, restore} in clock.c.
Signed-off-by: eric miao <eric.miao@marvell.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
"cat /dev/mem" may cause kernel Oops for boards with PHYS_OFFSET != 0
because character device is mapped to addresses starting from zero
and there is no protection against such situation.
Patch just add this.
Signed-off-by: Alexandre Rusev <arusev@ru.mvista.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Convert debug-only (and removed) MODULE_PARM() to module_param().
Compiles cleanly (with DEBUG=1).
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
On Wed, Feb 20, 2008 at 11:50:33AM +0100, Guennadi Liakhovetski wrote:
> arch/arm/kernel/atags.c uses for some reason the
> KEXEC_BOOT_PARAMS_SIZE macro, which is only defined if CONFIG_KEXEC
> is set. So, either this macro should be defined always, or another
> macro should be used, or ATAGS_PROC should depend on KEXEC.
As the procfs export of ATAGS is not meant as a stable, general purpose
ABI it shouldn't be an independent, general configuration option.
This patch make ATAGS_PROC depend on KEXEC
Signed-off-by: Uli Luckas <u.luckas@road.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Fix 32-on-64 pvops kernel:
we don't want userspace using syscall/sysenter, even if the hypervisor
supports it, so mask it out from CPUID.
Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
revert the BTS ptrace extension for now.
based on general objections from Roland McGrath:
http://lkml.org/lkml/2008/2/21/323
we'll let the BTS functionality cook some more and re-enable
it in v2.6.26. We'll leave the dead code around to help the
development of this code.
(X86_BTS is not defined at the moment)
Signed-off-by: Ingo Molnar <mingo@elte.hu>
avoid over-eager large page splitup.
When the target area needs to be split or is split already (ioremap)
then the current code enforces the split of large mappings in the alias
regions even if we could avoid it.
Use a separate variable processed in the cpa_data structure to carry
the number of pages which have been processed instead of reusing the
numpages variable. This keeps numpages intact and gives the alias code
a chance to keep large mappings intact.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
delay the removal of this symbol export by one more kernel release,
giving external modules such as VirtualBox a chance to stop using it.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Jan Beulich noticed it during code review that if a driver's ioremap()
fails (say due to -ENOMEM) then we might leak the struct vm_area.
Free it properly.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Priit Laes discovered that the sed command processing nm output was
sensitive to locale settings. This was addressed in commit
03994f01e8 by using [:alnum:] in place of
[a-zA-Z0-9].
But that solution too is locale-dependent and may not always match
the identifiers it needs to. The better fix is just to run sed et al
with a fixed locale setting in all builds.
Signed-off-by: Roland McGrath <roland@redhat.com>
CC: Priit Laes <plaes@plaes.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
a recent fix:
commit ce28b9864b
Author: Thomas Gleixner <tglx@linutronix.de>
Date: Wed Feb 20 23:57:30 2008 +0100
x86: fix vsyscall wreckage
removed the broken /kernel/vsyscall64 handler completely.
This triggers the following debug check:
sysctl table check failed: /kernel/vsyscall64 No proc_handler
Restore the sane part of the proc handler.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix a kernel bug (vmware boot problem) reported by Tomasz Grobelny,
which occurs with certain .config variants and gccs.
The x86 TLS cleanup in commit efd1ca52d0
made the sys_set_thread_area and sys_get_thread_area functions ripe for
tail call optimization. If the compiler chooses to use it for them, it
can clobber the user trap frame because these are asmlinkage functions.
Reported-by: Tomasz Grobelny <tomasz@grobelny.oswiecenia.net>
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/blackfin-2.6: (35 commits)
Blackfin Serial Driver: Fix bug - Only insert UART rx char in timer task.
Blackfin Serial Driver: Fix bug - update tx dma buffer tail before wake up processes.
Blackfin Serial Driver: Fix bug - Increase buffer tail immediately before starting tx dma.
[Blackfin] serial driver: Add flow control support to bf54x
[Blackfin] serial driver: Fix bug Poll RTS/CTS status in DMA mode as well
[Blackfin] serial driver: ADSP-BF52x arch/mach support
[Blackfin] serial driver: use simpler comment headers and strip out information that is maintained in the scm's log
[Blackfin] serial driver: rework break flood anomaly handling to be more robust/realistic about what we can actually work around
[Blackfin] serial driver: fix bug - cache the bits of the LSR on systems where the LSR is read-to-clear
[Blackfin] serial driver: fix bug - should not wait for the TFI bit, just clear it when tx stop.
[Blackfin] serial driver: Fix bug serial driver in DMA mode spams history to console on shell restart
[Blackfin] serial driver: Fix bug Free rx dma buffer in shutdown.
[Blackfin] serial driver: Clean up UART DMA code.
Blackfin Serial driver: Fix bug - serial driver in PIO mode cant handle input very quickly
[Blackfin] arch: kill section mismatch warnings
[Blackfin] arch: handle the most common L1 shrinkage case (L1 does not exist for a part) so that any parts labeled for L1 instead get placed into external memory sections
[Blackfin] arch: add bfin_clear_PPIx_STATUS() helper funcs like we have for other parts
[Blackfin] arch: make sure we have proper description/copyright/license lines
[Blackfin] arch: Fix CONFIG_PM support for BF561
[Blackfin] arch: Remove DPMC char driver option
...
* git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
arch/sh/drivers/dma/dma-sh.c: Correct use of ! and &
serial: Move asm-sh/sci.h to linux/serial_sci.h.
sh: Fix up HAS_SR_RB typo in entry-macros.
maple: fix device detection
sh: fix rtc_resources setup for sh770x
sh: heartbeat: ioremap is expected to succeed
sh: Storage class should be before const qualifier
maple: remove unused variable
sh: SH5-103 needs to select CPU_SH5.
sh: Rename SH-3 CCR3 reg to avoid synclink_cs clash.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
[SPARC64]: Adjust kernel PC validation test in fault handler.
[SPARC64]: Loosen checks in exception table handling.
[SPARC64]: Fix section mismatch from kernel_map_range
[SPARC64]: Fix section mismatchs from dr_cpu_data
[SPARC]: Fix build in arch/sparc/kernel/led.c
Because of the new futex validation init handler, we have
to accept faults in init section text as well as the normal
kernel text.
Thanks to Tom Callaway for the bug report.
Signed-off-by: David S. Miller <davem@davemloft.net>
Iff the parent has TIF_DEBUG set, _and_ clone_flags includes
CLONE_PTRACE we should set the TIF_DEBUG flag for the child and
increment the ocd refcount. Otherwise, the TIF_DEBUG flag must be
unset.
Currently, the child inherits TIF_DEBUG from the parent before
copy_thread is called, so TIF_DEBUG may be already be set before we
determine whether the child is supposed to inherit debugging
capabilities from the parent or not. This means that ocd_enable()
won't increment the refcount, because TIF_DEBUG is already set, and
that TIF_DEBUG will be set for processes that aren't being debugged.
This leads to a refcounting asymmetry, which may show up as
------------[ cut here ]------------
Badness at arch/avr32/kernel/ocd.c:73
PC is at ocd_disable+0x34/0x60
LR is at put_lock_stats+0xa/0x20
as reported by David Brownell. Happens when strace'ing a process that
forks a new child process, e.g. "strace mount -tjffs2 mtd1 /mnt", and
subsequently killing the child process (e.g. "umount /mnt".)
Signed-off-by: Haavard Skinnemoen <hskinnemoen@atmel.com>
Some parts of the kernel now do things like do *_user() accesses while
set_fs(KERNEL_DS) that fault on purpose.
See, for example, the code added by changeset
a0c1e9073e ("futex: runtime enable pi
and robust functionality").
That trips up the ASI sanity checking we make in do_kernel_fault().
Just remove it for now. Maybe we can add it back later with an added
conditional which looks at the current get_fs() value.
Signed-off-by: David S. Miller <davem@davemloft.net>
In commit e6bafba5b4, a bug was fixed that
involved converting !x & y to !(x & y). The code below shows the same
pattern, and thus should perhaps be fixed in the same way.
This is not tested and clearly changes the semantics, so it is only
something to consider.
The semantic patch that makes this change is as follows:
(http://www.emn.fr/x-info/coccinelle/)
// <smpl>
@@ expression E1,E2; @@
(
!E1 & !E2
|
- !E1 & E2
+ !(E1 & E2)
)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
* git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86: (24 commits)
x86: no robust/pi futex for real i386 CPUs
x86: fix boot failure on 486 due to TSC breakage
x86: fix build on non-C locales.
x86: make c_idle.work have a static address.
x86: don't save unreliable stack trace entries
x86: don't make swapper_pg_pmd global
x86: don't print a warning when MTRR are blank and running in KVM
x86: fix execve with -fstack-protect
x86: fix vsyscall wreckage
x86: rename KERNEL_TEXT_SIZE => KERNEL_IMAGE_SIZE
x86: fix spontaneous reboot with allyesconfig bzImage
x86: remove double-checking empty zero pages debug
x86: notsc is ignored on common configurations
x86/mtrr: fix kernel-doc missing notation
x86: handle BIOSes which terminate e820 with CF=1 and no SMAP
x86: add comments for NOPs
x86: don't use P6_NOPs if compiling with CONFIG_X86_GENERIC
x86: require family >= 6 if we are using P6 NOPs
x86: do not promote TM3x00/TM5x00 to i686-class
x86: hpet fix docbook comment
...
> Diffing dmesg between git7 and git8 doesn't sched any light since
> git8 also removed the printouts of the x86 caps as they were being
> initialised and updated. I'm currently adding those printouts back
> in the hope of seeing where and when the caps get broken.
That turned out to be very illuminating:
--- dmesg-2.6.24-git7 2008-02-24 18:01:25.295851000 +0100
+++ dmesg-2.6.24-git8 2008-02-24 18:01:25.530358000 +0100
...
CPU: After generic identify, caps: 00000003 00000000 00000000 00000000 00000000 00000000 00000000 00000000
CPU: After all inits, caps: 00000003 00000000 00000000 00000000 00000000 00000000 00000000 00000000
+CPU: After applying cleared_cpu_caps, caps: 00000013 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Notice how the TSC cap bit goes from Off to On.
(The first two lines are printout loops from -git7 forward-ported
to -git8, the third line is the same printout loop added just after
the xor-with-cleared_cpu_caps[] loop.)
Here's how the breakage occurs:
1. arch/x86/kernel/tsc_32.c:tsc_init() sees !cpu_has_tsc,
so bails and calls setup_clear_cpu_cap(X86_FEATURE_TSC).
2. include/asm-x86/cpufeature.h:setup_clear_cpu_cap(bit) clears
the bit in boot_cpu_data and sets it in cleared_cpu_caps
3. arch/x86/kernel/cpu/common.c:identify_cpu() XORs all caps
in with cleared_cpu_caps
HOWEVER, at this point c->x86_capability correctly has TSC
Off, cleared_cpu_caps has TSC On, so the XOR incorrectly
sets TSC to On in c->x86_capability, with disastrous results.
The real bug is that clearing bits with XOR only works if the
bits are known to be 1 prior to the XOR, and that's not true here.
A simple fix is to convert the XOR to AND-NOT instead. The following
patch does that, and allows my 486 to boot 2.6.25-rc kernels again.
[ mingo@elte.hu: fixed a similar bug in setup_64.c as well. ]
The breakage was introduced via commit 7d851c8d3d.
Signed-off-by: Mikael Pettersson <mikpe@it.uu.se>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
For some locales regex range [a-zA-Z] does not work as it is supposed to.
so we have to use [:alnum:] and [:xdigit:] to make it work as intended.
[1] http://en.wikipedia.org/wiki/Estonian_alphabet
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, c_idle is declared in the stack, and thus, have no static address.
Peter Zijlstra points out this simple solution, in which c_idle.work
is initializated separatedly. Note that the INIT_WORK macro has a static
declaration of a key inside.
Signed-off-by: Glauber Costa <gcosta@redhat.com>
Acked-by: Peter Zijlstra <pzijlstr@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Currently, there is no way for print_stack_trace() to determine whether
a given stack trace entry was deemed reliable or not, simply because
save_stack_trace() does not record this information. (Perhaps needless
to say, this makes the saved stack traces A LOT harder to read, and
probably with no other benefits, since debugging features that use
save_stack_trace() most likely also require frame pointers, etc.)
This patch reverts to the old behaviour of only recording the reliable trace
entries for saved stack traces.
Signed-off-by: Vegard Nossum <vegardno@ifi.uio.no>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There doesn't seem to be any reason for swapper_pg_pmd being global.
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Inside a KVM virtual machine the MTRRs are usually blank. This confuses Linux
and causes a warning message at boot. This patch removes that warning message
when running Linux as a KVM guest.
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
pointed out by pageexec@freemail.hu:
> what happens here is that gcc treats the argument area as owned by the
> callee, not the caller and is allowed to do certain tricks. for ssp it
> will make a copy of the struct passed by value into the local variable
> area and pass *its* address down, and it won't copy it back into the
> original instance stored in the argument area.
>
> so once sys_execve returns, the pt_regs passed by value hasn't at all
> changed and its default content will cause a nice double fault (FWIW,
> this part took me the longest to debug, being down with cold didn't
> help it either ;).
To fix this we pass in pt_regs by pointer.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
based on a report from Arne Georg Gleditsch about user-space apps
misbehaving after toggling /proc/sys/kernel/vsyscall64, a review
of the code revealed that the "NOP patching" done there is
fundamentally unsafe for a number of reasons:
1) the patching code runs without synchronizing other CPUs
2) it inserts NOPs even if there is no clock source which provides vread
3) when the clock source changes to one without vread we run in
exactly the same problem as in #2
4) if nobody toggles the proc entry from 1 to 0 and to 1 again, then
the syscall is not patched out
as a result it is possible to break user-space via this patching.
The only safe thing for now is to remove the patching.
This code was broken since v2.6.21.
Reported-by: Arne Georg Gleditsch <arne.gleditsch@dolphinics.no>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The KERNEL_TEXT_SIZE constant was mis-named, as we not only map the kernel
text but data, bss and init sections as well.
That name led me on the wrong path with the KERNEL_TEXT_SIZE regression,
because i knew how big of _text_ my images have and i knew about the 40 MB
"text" limit so i wrongly thought to be on the safe side of the 40 MB limit
with my 29 MB of text, while the total image size was slightly above 40 MB.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
recently the 64-bit allyesconfig bzImage kernel started spontaneously
rebooting during early bootup.
after a few fun hours spent with early init debugging, it turns out
that we've got this rather annoying limit on the size of the kernel
image:
#define KERNEL_TEXT_SIZE (40*1024*1024)
which limit my vmlinux just happened to pass:
text data bss dec hex filename
29703744 4222751 8646224 42572719 2899baf vmlinux
40 MB is 42572719 bytes, so my vmlinux was just 1.5% above this limit :-/
So it happily crashed right in head_64.S, which - as we all know - is
the most debuggable code in the whole architecture ;-)
So increase the limit to allow an up to 128MB kernel image to be mapped.
(should anyone be that crazy or lazy)
We have a full 4K of pagetable (level2_kernel_pgt) allocated for these
mappings already, so there's no RAM overhead and the limit was rather
pointless and arbitrary.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
notsc is ignored in 32-bit kernels if CONFIG_X86_TSC is on.. which is
bad, fix it.
Signed-off-by: Pavel Machek <pavel@suse.cz>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Fix mtrr kernel-doc warning:
Warning(linux-2.6.24-git12//arch/x86/kernel/cpu/mtrr/main.c:677): No description found for parameter 'end_pfn'
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The proper way to terminate the e820 chain is with %ebx == 0 on the
last legitimate memory block. However, several BIOSes don't do that
and instead return error (CF = 1) when trying to read off the end of
the list. For this error return, %eax doesn't necessarily return the
SMAP signature -- correctly so, since %ah should contain an error code
in this case.
To deal with some particularly broken BIOSes, we clear the entire e820
chain if the SMAP signature is missing in the middle, indicating a
plain insane e820 implementation. However, we need to make the test
for CF = 1 before the SMAP check.
This fixes at least one HP laptop (nc6400) for which none of the
memory-probing methods (e820, e801, 88) functioned fully according to
spec.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
P6_NOPs are definitely not supported on some VIA CPUs, and possibly
(unverified) on AMD K7s. It is also the only thing that prevents a
686 kernel from running on Transmeta TM3x00/5x00 (Crusoe) series.
The performance benefit over generic NOPs is very small, so when
building for generic consumption, avoid using them.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The P6 family of NOPs are only available on family >= 6 or above, so
enforce that in the boot code.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
We have been promoting Transmeta TM3x00/TM5x00 chips to i686-class
based on the notion that they contain all the user-space visible
features of an i686-class chip. However, this is not actually true:
they lack the EA-taking long NOPs (0F 1F /0). Since this is a
userspace-visible incompatibility, downgrade these CPUs to the
manufacturer-defined i586 level.
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>