Commit graph

11540 commits

Author SHA1 Message Date
Namhyung Kim
6376b22975 kprobes: Make functions static
Make following (internal) functions static to make sparse
happier :-)

 * get_optimized_kprobe: only called from static functions
 * kretprobe_table_unlock: _lock function is static
 * kprobes_optinsn_template_holder: never called but holding asm code

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
LKML-Reference: <1284512670-2369-4-git-send-email-namhyung@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-15 10:44:01 +02:00
Ingo Molnar
3aabae7d9d Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core 2010-09-15 10:27:31 +02:00
Roland McGrath
eefdca043e x86-64, compat: Retruncate rax after ia32 syscall entry tracing
In commit d4d6715, we reopened an old hole for a 64-bit ptracer touching a
32-bit tracee in system call entry.  A %rax value set via ptrace at the
entry tracing stop gets used whole as a 32-bit syscall number, while we
only check the low 32 bits for validity.

Fix it by truncating %rax back to 32 bits after syscall_trace_enter,
in addition to testing the full 64 bits as has already been added.

Reported-by: Ben Hawkes <hawkes@sota.gen.nz>
Signed-off-by: Roland McGrath <roland@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-14 16:08:47 -07:00
H. Peter Anvin
36d001c70d x86-64, compat: Test %rax for the syscall number, not %eax
On 64 bits, we always, by necessity, jump through the system call
table via %rax.  For 32-bit system calls, in theory the system call
number is stored in %eax, and the code was testing %eax for a valid
system call number.  At one point we loaded the stored value back from
the stack to enforce zero-extension, but that was removed in checkin
d4d6715016.  An actual 32-bit process
will not be able to introduce a non-zero-extended number, but it can
happen via ptrace.

Instead of re-introducing the zero-extension, test what we are
actually going to use, i.e. %rax.  This only adds a handful of REX
prefixes to the code.

Reported-by: Ben Hawkes <hawkes@sota.gen.nz>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@kernel.org>
Cc: Roland McGrath <roland@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
2010-09-14 16:08:46 -07:00
H. Peter Anvin
c41d68a513 compat: Make compat_alloc_user_space() incorporate the access_ok()
compat_alloc_user_space() expects the caller to independently call
access_ok() to verify the returned area.  A missing call could
introduce problems on some architectures.

This patch incorporates the access_ok() check into
compat_alloc_user_space() and also adds a sanity check on the length.
The existing compat_alloc_user_space() implementations are renamed
arch_compat_alloc_user_space() and are used as part of the
implementation of the new global function.

This patch assumes NULL will cause __get_user()/__put_user() to either
fail or access userspace on all architectures.  This should be
followed by checking the return value of compat_access_user_space()
for NULL in the callers, at which time the access_ok() in the callers
can also be removed.

Reported-by: Ben Hawkes <hawkes@sota.gen.nz>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Chris Metcalf <cmetcalf@tilera.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: James Bottomley <jejb@parisc-linux.org>
Cc: Kyle McMartin <kyle@mcmartin.ca>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: <stable@kernel.org>
2010-09-14 16:08:45 -07:00
Thomas Gleixner
54ff7e595d x86: hpet: Work around hardware stupidity
This more or less reverts commits 08be979 (x86: Force HPET
readback_cmp for all ATI chipsets) and 30a564be (x86, hpet: Restrict
read back to affected ATI chipsets) to the status of commit 8da854c
(x86, hpet: Erratum workaround for read after write of HPET
comparator).

The delta to commit 8da854c is mostly comments and the change from
WARN_ONCE to printk_once as we know the call path of this function
already.

This needs really in depth explanation:

First of all the HPET design is a complete failure. Having a counter
compare register which generates an interrupt on matching values
forces the software to do at least one superfluous readback of the
counter register.

While it is nice in theory to program "absolute" time events it is
practically useless because the timer runs at some absurd frequency
which can never be matched to real world units. So we are forced to
calculate a relative delta and this forces a readout of the actual
counter value, adding the delta and programming the compare
register. When the delta is small enough we run into the danger that
we program a compare value which is already in the past. Due to the
compare for equal nature of HPET we need to read back the counter
value after writing the compare rehgister (btw. this is necessary for
absolute timeouts as well) to make sure that we did not miss the timer
event. We try to work around that by setting the minimum delta to a
value which is larger than the theoretical time which elapses between
the counter readout and the compare register write, but that's only
true in theory. A NMI or SMI which hits between the readout and the
write can easily push us beyond that limit. This would result in
waiting for the next HPET timer interrupt until the 32bit wraparound
of the counter happens which takes about 306 seconds.

So we designed the next event function to look like:

   match = read_cnt() + delta;
   write_compare_ref(match);
   return read_cnt() < match ? 0 : -ETIME;

At some point we got into trouble with certain ATI chipsets. Even the
above "safe" procedure failed. The reason was that the write to the
compare register was delayed probably for performance reasons. The
theory was that they wanted to avoid the synchronization of the write
with the HPET clock, which is understandable. So the write does not
hit the compare register directly instead it goes to some intermediate
register which is copied to the real compare register in sync with the
HPET clock. That opens another window for hitting the dreaded "wait
for a wraparound" problem.

To work around that "optimization" we added a read back of the compare
register which either enforced the update of the just written value or
just delayed the readout of the counter enough to avoid the issue. We
unfortunately never got any affirmative info from ATI/AMD about this.

One thing is sure, that we nuked the performance "optimization" that
way completely and I'm pretty sure that the result is worse than
before some HW folks came up with those.

Just for paranoia reasons I added a check whether the read back
compare register value was the same as the value we wrote right
before. That paranoia check triggered a couple of years after it was
added on an Intel ICH9 chipset. Venki added a workaround (commit
8da854c) which was reading the compare register twice when the first
check failed. We considered this to be a penalty in general and
restricted the readback (thus the wasted CPU cycles) to the known to
be affected ATI chipsets.

This turned out to be a utterly wrong decision. 2.6.35 testers
experienced massive problems and finally one of them bisected it down
to commit 30a564be which spured some further investigation.

Finally we got confirmation that the write to the compare register can
be delayed by up to two HPET clock cycles which explains the problems
nicely. All we can do about this is to go back to Venki's initial
workaround in a slightly modified version.

Just for the record I need to say, that all of this could have been
avoided if hardware designers and of course the HPET committee would
have thought about the consequences for a split second. It's out of my
comprehension why designing a working timer is so hard. There are two
ways to achieve it:

 1) Use a counter wrap around aware compare_reg <= counter_reg
    implementation instead of the easy compare_reg == counter_reg

    Downsides:

	- It needs more silicon.

	- It needs a readout of the counter to apply a relative
	  timeout. This is necessary as the counter does not run in
	  any useful (and adjustable) frequency and there is no
	  guarantee that the counter which is used for timer events is
	  the same which is used for reading the actual time (and
	  therefor for calculating the delta)

    Upsides:

	- None

  2) Use a simple down counter for relative timer events

    Downsides:

	- Absolute timeouts are not possible, which is not a problem
	  at all in the context of an OS and the expected
	  max. latencies/jitter (also see Downsides of #1)

   Upsides:

	- It needs less or equal silicon.

	- It works ALWAYS

	- It is way faster than a compare register based solution (One
	  write versus one write plus at least one and up to four
	  reads)

I would not be so grumpy about all of this, if I would not have been
ignored for many years when pointing out these flaws to various
hardware folks. I really hate timers (at least those which seem to be
designed by janitors).

Though finally we got a reasonable explanation plus a solution and I
want to thank all the folks involved in chasing it down and providing
valuable input to this.

Bisected-by: Nix <nix@esperi.org.uk>
Reported-by: Artur Skawina <art.08.09@gmail.com>
Reported-by: Damien Wyart <damien.wyart@free.fr>
Reported-by: John Drescher <drescherjm@gmail.com>
Cc: Venkatesh Pallipadi <venki@google.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: stable@kernel.org
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-09-15 00:55:13 +02:00
basile@opensource.dyc.edu
08c2b394b9 x86, build: Disable -fPIE when compiling with CONFIG_CC_STACKPROTECTOR=y
The arch/x86/Makefile uses scripts/gcc-x86_$(BITS)-has-stack-protector.sh
to check if cc1 supports -fstack-protector.  When -fPIE is passed to cc1,
these scripts fail causing stack protection to be disabled even when it
is available.

This fix is similar to commit c47efe5548

Reported-by: Kai Dietrich <mail@cleeus.de>
Signed-off-by: Magnus Granberg <zorry@gentoo.org>
LKML-Reference: <20100913101319.748A1148E216@opensource.dyc.edu>
Signed-off-by: Anthony G. Basile <basile@opensource.dyc.edu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-13 15:53:16 -07:00
Tetsuo Handa
2fd818642a x86, cpufeature: Suppress compiler warning with gcc 3.x
Gcc 3.x generates a warning

  arch/x86/include/asm/cpufeature.h: In function `__static_cpu_has':
  arch/x86/include/asm/cpufeature.h:326: warning: asm operand 1 probably doesn't match constraints

on each file.
But static_cpu_has() for gcc 3.x does not need __static_cpu_has().

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
LKML-Reference: <201008300127.o7U1RC6Z044051@www262.sakura.ne.jp>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-13 14:48:41 -07:00
Stephane Eranian
b0b2072df3 perf_events: Fix BTS interrupt handling to avoid being dazed by NMI (v2)
Fix a bug introduced with commit de725de and the change in the
meaning of the return value of intel_pmu_handle_irq(). With the
current code, when you are using the BTS, you get 'dazed by NMI'
each time the BTS buffer fills up.

BTS does interrupt on the PMU vector, thus NMI. You need to take
this into account in the return value of the function.

This version fixes initial patch which was missing changes to
perf_event_intel_ds.c.

Signed-off-by: Stephane Eranian <eranian@google.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Cc: peterz@infradead.org
Cc: paulus@samba.org
Cc: davem@davemloft.net
Cc: fweisbec@gmail.com
Cc: perfmon2-devel@lists.sf.net
Cc: eranian@gmail.com
Cc: robert.richter@amd.com
LKML-Reference: <4c8a1686.aae9d80a.5aa4.5e35@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-13 08:43:40 +02:00
Joe Perches
d0ed0c3266 x86: Remove pr_<level> uses of KERN_<level>
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Jiri Kosina <trivial@kernel.org>
LKML-Reference: <d40c60f4b036a26db8492848695bdafaa3b42791.1284267142.git.joe@perches.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-12 09:32:31 +02:00
Peter Zijlstra
5ee5e97ee9 x86, tsc: Fix a preemption leak in restore_sched_clock_state()
A real life genuine preemption leak..

Reported-and-tested-by: Jeff Chua <jeff.chua.linux@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-09-10 18:17:45 -07:00
Venkatesh Pallipadi
351e5a703a x86, mtrr: Support mtrr lookup for range spanning across MTRR range
mtrr_type_lookup [start:end] looked up the resultant MTRR type for that
range, based on fixed and all variable MTRR ranges. It did check for multiple
MTRR var ranges overlapping [start:end] and returned the net type.

However, if the [start:end] range spanned across any var MTRR range,
mtrr_type_lookup would return an error return of 0xFE. This was based on
typical usage of mtrr_type_lookup in PAT mapping, where region being
mapped would not normally span across MTRR ranges and also trying
to keep the code simple.

Mark recently reported the problem with this limitation. When there are
two continguous MTRR's of type "writeback" and if there is a memory mapping
over a region starting in one MTRR range and ending in another MTRR range,
such mapping will fallback to "uncached" due to the above limitation.

Change below adds support for such lookups spanning multiple MTRR ranges.
We now have a wrapper mtrr_type_lookup that dynamically splits such a region
into smaller chunks that fit within one MTRR range and does a
__mtrr_type_lookup on it and combine the results later.

Reported-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Venkatesh Pallipadi <venki@google.com>
LKML-Reference: <1284159350-19841-3-git-send-email-venki@google.com>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-10 16:11:20 -07:00
Venkatesh Pallipadi
a7f07cfbaa x86, mtrr: Refactor MTRR type overlap check code
Move the MTRR type overlap check into a new function. No functional change in
this patch. Just making it easier to add multiple region overlap check in
the following patch.

Signed-off-by: Venkatesh Pallipadi <venki@google.com>
LKML-Reference: <1284159350-19841-2-git-send-email-venki@google.com>
Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-10 16:11:10 -07:00
Jack Steiner
36ac4b987b x86, UV: Fix initialization of max_pnode
Fix calculation of "max_pnode" for systems where the the highest
blade has neither cpus or memory. (And, yes, although rare this
does occur).

Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20100910150808.GA19802@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-10 17:15:49 +02:00
Linus Torvalds
be6200aac9 Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Perform hardware_enable in CPU_STARTING callback
  KVM: i8259: fix migration
  KVM: fix i8259 oops when no vcpus are online
  KVM: x86 emulator: fix regression with cmpxchg8b on i386 hosts
2010-09-10 08:02:45 -07:00
Brian Gerst
b2b57fe053 x86, fpu: Merge fpu_save_init()
Make 64-bit use the 32-bit version of fpu_save_init().  Remove
unused clear_fpu_state().

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-13-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:17:36 -07:00
Brian Gerst
58a992b9cb x86-32, fpu: Rewrite fpu_save_init()
Rewrite fpu_save_init() to prepare for merging with 64-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-12-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:17:31 -07:00
Brian Gerst
eec73f813a x86, fpu: Remove PSHUFB_XMM5_* macros
The PSHUFB_XMM5_* macros are no longer used.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-11-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:17:25 -07:00
Brian Gerst
8eb91a577d x86, fpu: Remove unnecessary ifdefs from i387 code.
Remove ifdefs for code that the compiler can optimize away on 64-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-10-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:17:18 -07:00
Brian Gerst
a334fe43d8 x86-32, fpu: Remove math_emulate stub
check_fpu() in bugs.c halts boot if no FPU is found and math emulation
isn't enabled.  Therefore this stub will never be used.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-9-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:17:11 -07:00
Brian Gerst
820241356d x86-64, fpu: Simplify constraints for fxsave/fxtstor
Use the "R" constraint (legacy register) instead of listing all the
possible registers.  Clean up the comments as well.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-8-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:17:06 -07:00
Brian Gerst
10c11f3049 x86-64, fpu: Fix %cs value in convert_from_fxsr()
While %ds still contains the userspace selector, %cs is KERNEL_CS at
this point.  Always get %cs from pt_regs even for the current task.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-7-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:16:58 -07:00
Brian Gerst
a4d4fbc773 x86-64, fpu: Disable preemption when using TS_USEDFPU
Consolidates code and fixes the below race for 64-bit.

commit 9fa2f37bfeb798728241cc4a19578ce6e4258f25
Author: torvalds <torvalds>
Date:   Tue Sep 2 07:37:25 2003 +0000

    Be a lot more careful about TS_USEDFPU and preemption

    We had some races where we testecd (or set) TS_USEDFPU together
    with sequences that depended on the setting (like clearing or
    setting the TS flag in %cr0) and we could be preempted in between,
    which screws up the FPU state, since preemption will itself change
    USEDFPU and the TS flag.

    This makes it a lot more explicit: the "internal" low-level FPU
    functions ("__xxxx_fpu()") all require preemption to be disabled,
    and the exported "real" functions will make sure that is the case.

    One case - in __switch_to() - was switched to the non-preempt-safe
    internal version, since the scheduler itself has already disabled
    preemption.

    BKrev: 3f5448b5WRiQuyzAlbajs3qoQjSobw

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-6-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:16:45 -07:00
Brian Gerst
bfd946cb89 x86, fpu: Merge __save_init_fpu()
__save_init_fpu() is identical for 32-bit and 64-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-5-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:16:30 -07:00
Brian Gerst
51115d4d45 x86, fpu: Merge tolerant_fwait()
Commit e2e75c91 merged the math exception handler, allowing both 32-bit
and 64-bit to handle math exceptions from kernel mode.  Switch to using
the 64-bit version of tolerant_fwait() without fnclex, which simply
ignores the exception if one is still pending from userspace.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-4-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:16:25 -07:00
Brian Gerst
6ac8bac268 x86, fpu: Merge fpu_init()
Make fpu_init() handle 32-bit setup.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-3-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:16:20 -07:00
Brian Gerst
2df7a6e9e8 x86: Use correct type for %cr4
%cr4 is 64-bit in 64-bit mode (although the upper 32-bits are currently reserved).
Use unsigned long for the temporary variable to get the right size.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1283563039-3466-2-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-09 14:16:13 -07:00
Peter Zijlstra
15ac9a395a perf: Remove the sysfs bits
Neither the overcommit nor the reservation sysfs parameter were
actually working, remove them as they'll only get in the way.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09 20:46:31 +02:00
Peter Zijlstra
a4eaf7f146 perf: Rework the PMU methods
Replace pmu::{enable,disable,start,stop,unthrottle} with
pmu::{add,del,start,stop}, all of which take a flags argument.

The new interface extends the capability to stop a counter while
keeping it scheduled on the PMU. We replace the throttled state with
the generic stopped state.

This also allows us to efficiently stop/start counters over certain
code paths (like IRQ handlers).

It also allows scheduling a counter without it starting, allowing for
a generic frozen state (useful for rotating stopped counters).

The stopped state is implemented in two different ways, depending on
how the architecture implemented the throttled state:

 1) We disable the counter:
    a) the pmu has per-counter enable bits, we flip that
    b) we program a NOP event, preserving the counter state

 2) We store the counter state and ignore all read/overflow events

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09 20:46:30 +02:00
Peter Zijlstra
33696fc0d1 perf: Per PMU disable
Changes perf_disable() into perf_pmu_disable().

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09 20:46:29 +02:00
Peter Zijlstra
24cd7f54a0 perf: Reduce perf_disable() usage
Since the current perf_disable() usage is only an optimization,
remove it for now. This eases the removal of the __weak
hw_perf_enable() interface.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09 20:46:29 +02:00
Peter Zijlstra
b0a873ebbf perf: Register PMU implementations
Simple registration interface for struct pmu, this provides the
infrastructure for removing all the weak functions.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09 20:46:28 +02:00
Peter Zijlstra
51b0fe3954 perf: Deconstify struct pmu
sed -ie 's/const struct pmu\>/struct pmu/g' `git grep -l "const struct pmu\>"`

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: paulus <paulus@samba.org>
Cc: stephane eranian <eranian@googlemail.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Lin Ming <ming.m.lin@intel.com>
Cc: Yanmin <yanmin_zhang@linux.intel.com>
Cc: Deng-Cheng Zhu <dengcheng.zhu@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: Michael Cree <mcree@orcon.net.nz>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09 20:46:27 +02:00
Ingo Molnar
2aa61274ef Merge branch 'perf/urgent' into perf/core
Merge reason: Pick up pending fixes before applying dependent new changes.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09 20:40:08 +02:00
Cliff Wickman
37a2f9f30a x86, kdump: Change copy_oldmem_page() to use cached addressing
The copy of /proc/vmcore to a user buffer proceeds much faster
if the kernel addresses memory as cached.

With this patch we have seen an increase in transfer rate from
less than 15MB/s to 80-460MB/s, depending on size of the
transfer. This makes a big difference in time needed to save a
system dump.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: kexec@lists.infradead.org
Cc: <stable@kernel.org> # as far back as it would apply
LKML-Reference: <E1OtMLz-0001yp-Ia@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-09 09:46:23 +02:00
Andre Przywara
aeb9c7d618 x86, kvm: add new AMD SVM feature bits
The recently updated CPUID specification names new SVM feature bits.
Add them to the list of reported features.

Signed-off-by: Andre Przywara <andre.przywara@amd,com>
LKML-Reference: <1283778860-26843-5-git-send-email-andre.przywara@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-08 13:34:15 -07:00
Andre Przywara
6d886fd042 x86, cpu: Fix allowed CPUID bits for KVM guests
The AMD extensions to AVX (FMA4, XOP) work on the same YMM register set
as AVX, so they are safe for guests to use, as long as AVX itself
is allowed. Add F16C and AES on the way for the same reasons.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
LKML-Reference: <1283778860-26843-4-git-send-email-andre.przywara@amd.com>
Acked-by: Avi Kivity <avi@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-08 13:34:15 -07:00
Andre Przywara
33ed82fb6c x86, cpu: Update AMD CPUID feature bits
AMD's public CPUID specification has been updated and some bits have
got names. Add them to properly describe new CPU features.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
LKML-Reference: <1283778860-26843-3-git-send-email-andre.przywara@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-08 13:34:15 -07:00
Andre Przywara
7ef8aa72ab x86, cpu: Fix renamed, not-yet-shipping AMD CPUID feature bit
The AMD SSE5 feature set as-it has been replaced by some extensions
to the AVX instruction set. Thus the bit formerly advertised as SSE5
is re-used for one of these extensions (XOP).
Although this changes the /proc/cpuinfo output, it is not user visible, as
there are no CPUs (yet) having this feature.
To avoid confusion this should be added to the stable series, too.

Cc: stable@kernel.org [.32.x .34.x, .35.x]
Signed-off-by: Andre Przywara <andre.przywara@amd.com>
LKML-Reference: <1283778860-26843-2-git-send-email-andre.przywara@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-08 13:32:55 -07:00
Linus Torvalds
1faa6ec8cc Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mcheck: Avoid duplicate sysfs links/files for thresholding banks
  io-mapping: Fix the address space annotations
  x86: Fix the address space annotations of iomap_atomic_prot_pfn()
  x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline
  x86, hwmon: Fix unsafe smp_processor_id() in thermal_throttle_add_dev
2010-09-08 11:14:10 -07:00
Linus Torvalds
899edae615 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86: Try to handle unknown nmis with an enabled PMU
  perf, x86: Fix handle_irq return values
  perf, x86: Fix accidentally ack'ing a second event on intel perf counter
  oprofile, x86: fix init_sysfs() function stub
  lockup_detector: Sync touch_*_watchdog back to old semantics
  tracing: Fix a race in function profile
  oprofile, x86: fix init_sysfs error handling
  perf_events: Fix time tracking for events with pid != -1 and cpu != -1
  perf: Initialize callchains roots's childen hits
  oprofile: fix crash when accessing freed task structs
2010-09-08 11:13:16 -07:00
Gleb Natapov
eebb5f31b8 KVM: i8259: fix migration
Top of kvm_kpic_state structure should have the same memory layout as
kvm_pic_state since it is copied by memcpy.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-09-08 14:50:58 -03:00
Avi Kivity
ae0635b358 KVM: fix i8259 oops when no vcpus are online
If there are no vcpus, found will be NULL.  Check before doing anything with
it.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-09-08 14:50:56 -03:00
Avi Kivity
16518d5ada KVM: x86 emulator: fix regression with cmpxchg8b on i386 hosts
operand::val and operand::orig_val are 32-bit on i386, whereas cmpxchg8b
operands are 64-bit.

Fix by adding val64 and orig_val64 union members to struct operand, and
using them where needed.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-09-08 14:50:55 -03:00
Christian Dietrich
0f1cf415f0 x86: Remove unnecessary #ifdef ACPI/X86_IO_ACPI
The ACPI/X86_IO_ACPI ifdef isn't necessary at this point,
because it is checked in an outer ifdef level already and has no
effect here.

Cleanup only, no functional effect.

Signed-off-by: Christian Dietrich <qy03fugy@stud.informatik.uni-erlangen.de>
Acked-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: vamos-dev@i4.informatik.uni-erlangen.de
LKML-Reference: <d4376e6d79b8dc0f89a4b3ce4a880904a7b93ead.1283782701.git.qy03fugy@stud.informatik.uni-erlangen.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-08 08:14:02 +02:00
Linus Torvalds
d56557af19 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6:
  PCI: bus speed strings should be const
  PCI hotplug: Fix build with CONFIG_ACPI unset
  PCI: PCIe: Remove the port driver module exit routine
  PCI: PCIe: Move PCIe PME code to the pcie directory
  PCI: PCIe: Disable PCIe port services during port initialization
  PCI: PCIe: Ask BIOS for control of all native services at once
  ACPI/PCI: Negotiate _OSC control bits before requesting them
  ACPI/PCI: Do not preserve _OSC control bits returned by a query
  ACPI/PCI: Make acpi_pci_query_osc() return control bits
  ACPI/PCI: Reorder checks in acpi_pci_osc_control_set()
  PCI: PCIe: Introduce commad line switch for disabling port services
  PCI: PCIe AER: Introduce pci_aer_available()
  x86/PCI: only define pci_domain_nr if PCI and PCI_DOMAINS are set
  PCI: provide stub pci_domain_nr function for !CONFIG_PCI configs
2010-09-07 16:00:17 -07:00
Jin Dongming
592091c0e2 therm_throt.c: Trivial printk message fix for a unsuitable abbreviation of 'thermal'
In unexpected_thermal_interrupt(), "LVT TMR interrupt" is used
in error message.

I don't think TMR is a suitable abbreviation for thermal.
  1.TMR has been used in IA32 Architectures Software Developer's
    Manual, and is the abbreviation for Trigger Mode Register.
  2.There is not an standard abbreviation "TMR" defined for thermal
    in IA32 Architectures Software Developer's Manual.
  3.Though we could understand it as Thermal Monitor Register, it is
    easy to be misunderstood as a *TIMER* interrupt also.

I think this patch will fix it.

Signed-off-by: Jin Dongming <jin.dongming@np.css.fujitsu.com>
Reviewed-by: Jean Delvare <khali@linux-fr.org>
Cc: Brown Len <len.brown@intel.com>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
LKML-Reference: <4C7C492D.5020704@np.css.fujitsu.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-05 20:26:50 +02:00
Andreas Herrmann
1389298f7d x86, mcheck: Avoid duplicate sysfs links/files for thresholding banks
kobject_add_internal failed for threshold_bank2 with -EEXIST,
don't try to register things with the same name in the same
directory:

  Pid: 1, comm: swapper Tainted: G        W  2.6.31 #1
  Call Trace:
  [<ffffffff81161b07>] ? kobject_add_internal+0x156/0x180
  [<ffffffff81161cc0>] ? kobject_add+0x66/0x6b
  [<ffffffff81161793>] ? kobject_init+0x42/0x82
  [<ffffffff81161cf9>] ? kobject_create_and_add+0x34/0x63
  [<ffffffff81393963>] ? threshold_create_bank+0x14f/0x259
  [<ffffffff8139310a>] ? mce_create_device+0x8d/0x1b8
  [<ffffffff81646497>] ? threshold_init_device+0x3f/0x80
  [<ffffffff81646458>] ? threshold_init_device+0x0/0x80
  [<ffffffff81009050>] ? do_one_initcall+0x4f/0x143
  [<ffffffff816413a0>] ? kernel_init+0x14c/0x1a2
  [<ffffffff8100c8da>] ? child_rip+0xa/0x20
  [<ffffffff81641254>] ? kernel_init+0x0/0x1a2
  [<ffffffff8100c8d0>] ? child_rip+0x0/0x20
  kobject_create_and_add: kobject_add error: -17

(Probably the for_each_cpu loop should be entirely removed.)

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100827092006.GB5348@loge.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-05 14:35:49 +02:00
Andreas Herrmann
d9fadd7ba9 x86, AMD: Remove needless CPU family check (for L3 cache info)
Old 32-bit AMD CPUs (all w/o L3 cache) should always return 0
for cpuid_edx(0x80000006).

For unknown reason the 32-bit implementation differed from the
64-bit implementation. See commit 67cddd9479 ("i386: Add L3 cache
support to AMD CPUID4 emulation"). The current check is the
result of the x86 merge.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <20100902133710.GA5449@loge.amd.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-05 14:33:48 +02:00
Borislav Petkov
260133ab65 x86, GART: Disable GART table walk probes
Current code tramples over bit F3x90[6] which can be used to
disable GART table walk probes. However, this bit should be set
for performance reasons (speed up GART table walks). We are
allowed to do that since we put GART tables in UC memory later
anyway. Make it so.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <1283531981-7495-3-git-send-email-bp@amd64.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-05 14:28:34 +02:00
Borislav Petkov
57ab43e331 x86, GART: Remove superfluous AMD64_GARTEN
There is a GARTEN so use that and drop the duplicate.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <1283531981-7495-2-git-send-email-bp@amd64.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-05 14:28:34 +02:00
Francisco Jerez
cc1a8e5233 x86: Fix the address space annotations of iomap_atomic_prot_pfn()
This patch fixes the sparse warnings when the return pointer of
iomap_atomic_prot_pfn() is used as an argument of iowrite32()
and friends.

Signed-off-by: Francisco Jerez <currojerez@riseup.net>
LKML-Reference: <1283633804-11749-1-git-send-email-currojerez@riseup.net>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-05 14:26:14 +02:00
Wu Fengguang
1c5f50ee34 x86, mm: fix uninitialized addr in kernel_physical_mapping_init()
This re-adds the lost chunk in commit 9b861528a8.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Haicheng Li <haicheng.li@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <20100903090407.GA19771@localhost>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 11:40:11 +02:00
Jan Beulich
7fe977dab3 i386: Make kernel_execve() suitable for stack unwinding
The explicit saving and restoring of %ebx was confusing stack
unwind data consumers, and it is plain unnecessary to do this
within the asm(), since that was only introduced for PIC user
mode consumers of the original _syscall3() macro this was
derived from.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Arnd Bergmann <arnd@arndb.de>
LKML-Reference: <4C7FBC660200007800013F95@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 08:16:02 +02:00
Jan Beulich
df5d1874ce x86: Use {push,pop}{l,q}_cfi in more places
... plus additionally introduce {push,pop}f{l,q}_cfi. All in the
hope that the code becomes better readable this way (it gets
quite a bit smaller in any case).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
LKML-Reference: <4C7FBDA40200007800013FAF@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 08:14:11 +02:00
Jan Beulich
a34107b557 i386: Add unwind directives to syscall ptregs stubs
When these stubs are actual functions (i.e. having a return
instruction) and have stack manipulation instructions in them,
they should also be annotated to allow unwinding through them.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
LKML-Reference: <4C7FBCF00200007800013F99@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 08:14:10 +02:00
Jan Beulich
b1cccb1bb0 x86-64: Use symbolics instead of raw numbers in entry_64.S
... making the code a little less fragile.

Also use pushq_cfi instead of raw CFI annotations in two more
places, and add two missing annotations after stack pointer
adjustments which got modified here anyway.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
LKML-Reference: <4C7FBACF0200007800013F6A@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 08:14:10 +02:00
Jan Beulich
1f130a783a x86-64: Adjust frame type at paranoid_exit:
As this isn't an exception or interrupt entry point, it doesn't
have any of the hardware provide frame layouts active.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
LKML-Reference: <4C7FBAA80200007800013F67@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 08:14:10 +02:00
Jan Beulich
e6b04b6b5a x86-64: Fix unwind annotations in syscall stubs
With the return address removed from the stack, these should
really refer to their caller's register state.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Alexander van Heukelum <heukelum@fastmail.fm>
LKML-Reference: <4C7FBA3D0200007800013F61@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 08:14:09 +02:00
Robert Richter
4177c42a63 perf, x86: Try to handle unknown nmis with an enabled PMU
When the PMU is enabled it is valid to have unhandled nmis, two
events could trigger 'simultaneously' raising two back-to-back
NMIs. If the first NMI handles both, the latter will be empty
and daze the CPU.

The solution to avoid an 'unknown nmi' massage in this case was
simply to stop the nmi handler chain when the PMU is enabled by
stating the nmi was handled. This has the drawback that a) we
can not detect unknown nmis anymore, and b) subsequent nmi
handlers are not called.

This patch addresses this. Now, we check this unknown NMI if it
could be a PMU back-to-back NMI. Otherwise we pass it and let
the kernel handle the unknown nmi.

This is a debug log:

 cpu #6, nmi #32333, skip_nmi #32330, handled = 1, time = 1934364430
 cpu #6, nmi #32334, skip_nmi #32330, handled = 1, time = 1934704616
 cpu #6, nmi #32335, skip_nmi #32336, handled = 2, time = 1936032320
 cpu #6, nmi #32336, skip_nmi #32336, handled = 0, time = 1936034139
 cpu #6, nmi #32337, skip_nmi #32336, handled = 1, time = 1936120100
 cpu #6, nmi #32338, skip_nmi #32336, handled = 1, time = 1936404607
 cpu #6, nmi #32339, skip_nmi #32336, handled = 1, time = 1937983416
 cpu #6, nmi #32340, skip_nmi #32341, handled = 2, time = 1938201032
 cpu #6, nmi #32341, skip_nmi #32341, handled = 0, time = 1938202830
 cpu #6, nmi #32342, skip_nmi #32341, handled = 1, time = 1938443743
 cpu #6, nmi #32343, skip_nmi #32341, handled = 1, time = 1939956552
 cpu #6, nmi #32344, skip_nmi #32341, handled = 1, time = 1940073224
 cpu #6, nmi #32345, skip_nmi #32341, handled = 1, time = 1940485677
 cpu #6, nmi #32346, skip_nmi #32347, handled = 2, time = 1941947772
 cpu #6, nmi #32347, skip_nmi #32347, handled = 1, time = 1941949818
 cpu #6, nmi #32348, skip_nmi #32347, handled = 0, time = 1941951591
 Uhhuh. NMI received for unknown reason 00 on CPU 6.
 Do you have a strange power saving mode enabled?
 Dazed and confused, but trying to continue

Deltas:

 nmi #32334 340186
 nmi #32335 1327704
 nmi #32336 1819      <<<< back-to-back nmi [1]
 nmi #32337 85961
 nmi #32338 284507
 nmi #32339 1578809
 nmi #32340 217616
 nmi #32341 1798      <<<< back-to-back nmi [2]
 nmi #32342 240913
 nmi #32343 1512809
 nmi #32344 116672
 nmi #32345 412453
 nmi #32346 1462095   <<<< 1st nmi (standard) handling 2 counters
 nmi #32347 2046      <<<< 2nd nmi (back-to-back) handling one
 counter nmi #32348 1773      <<<< 3rd nmi (back-to-back)
 handling no counter! [3]

For  back-to-back nmi detection there are the following rules:

The PMU nmi handler was handling more than one counter and no
counter was handled in the subsequent nmi (see [1] and [2]
above).

There is another case if there are two subsequent back-to-back
nmis [3]. The 2nd is detected as back-to-back because the first
handled more than one counter. If the second handles one counter
and the 3rd handles nothing, we drop the 3rd nmi because it
could be a back-to-back nmi.

Signed-off-by: Robert Richter <robert.richter@amd.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
[ renamed nmi variable to pmu_nmi to avoid clash with .nmi in entry.S ]
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: peterz@infradead.org
Cc: gorcunov@gmail.com
Cc: fweisbec@gmail.com
Cc: ying.huang@intel.com
Cc: ming.m.lin@intel.com
Cc: eranian@google.com
LKML-Reference: <1283454469-1909-3-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 08:05:18 +02:00
Peter Zijlstra
de725dec9d perf, x86: Fix handle_irq return values
Now that we rely on the number of handled overflows, ensure all
handle_irq implementations actually return the right number.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: peterz@infradead.org
Cc: robert.richter@amd.com
Cc: gorcunov@gmail.com
Cc: fweisbec@gmail.com
Cc: ying.huang@intel.com
Cc: ming.m.lin@intel.com
Cc: eranian@google.com
LKML-Reference: <1283454469-1909-4-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 08:05:18 +02:00
Don Zickus
2e556b5b32 perf, x86: Fix accidentally ack'ing a second event on intel perf counter
During testing of a patch to stop having the perf subsytem
swallow nmis, it was uncovered that Nehalem boxes were randomly
getting unknown nmis when using the perf tool.

Moving the ack'ing of the PMI closer to when we get the status
allows the hardware to properly re-set the PMU bit signaling
another PMI was triggered during the processing of the first
PMI.  This allows the new logic for dealing with the
shortcomings of multiple PMIs to handle the extra NMI by
'eat'ing it later.

Now one can wonder why are we getting a second PMI when we
disable all the PMUs in the begining of the NMI handler to
prevent such a case, for that I do not know.  But I know the fix
below helps deal with this quirk.

Tested on multiple Nehalems where the problem was occuring.
With the patch, the code now loops a second time to handle the
second PMI (whereas before it was not).

Signed-off-by: Don Zickus <dzickus@redhat.com>
Cc: peterz@infradead.org
Cc: robert.richter@amd.com
Cc: gorcunov@gmail.com
Cc: fweisbec@gmail.com
Cc: ying.huang@intel.com
Cc: ming.m.lin@intel.com
Cc: eranian@google.com
LKML-Reference: <1283454469-1909-2-git-send-email-dzickus@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-03 08:05:17 +02:00
Ingo Molnar
b4c69d45c4 Merge branch 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/urgent 2010-09-01 22:31:07 +02:00
Robert Richter
269f45c250 oprofile, x86: fix init_sysfs() function stub
The use of the return value of init_sysfs() with commit

 10f0412 oprofile, x86: fix init_sysfs error handling

discovered the following build error for !CONFIG_PM:

 .../linux/arch/x86/oprofile/nmi_int.c: In function ‘op_nmi_init’:
 .../linux/arch/x86/oprofile/nmi_int.c:784: error: expected expression before ‘do’
 make[2]: *** [arch/x86/oprofile/nmi_int.o] Error 1
 make[1]: *** [arch/x86/oprofile] Error 2

This patch fixes this.

Reported-by: Ingo Molnar <mingo@elte.hu>
Cc: stable@kernel.org
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-09-01 21:23:01 +02:00
Cyrill Gorcunov
c9cf4a019c perf, x86, Pentium4: Add RAW events verification
Implements verification of

- Bits of ESCR EventMask field (meaningful bits in field are hardware
  predefined and others bits should be set to zero)

- INSTR_COMPLETED event (it is available on predefined cpu model only)

- Thread shared events (they should be guarded by "perf_event_paranoid"
  sysctl due to security reason). The side effect of this action is
  that PERF_COUNT_HW_BUS_CYCLES become a "paranoid" general event.

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Tested-by: Lin Ming <ming.m.lin@intel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <20100825182334.GB14874@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-09-01 08:26:56 +02:00
Robert Richter
10f0412f57 oprofile, x86: fix init_sysfs error handling
On failure init_sysfs() might not properly free resources. The error
code of the function is not checked. And, when reinitializing the exit
function might be called twice. This patch fixes all this.

Cc: stable@kernel.org
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-08-31 10:26:26 +02:00
Ingo Molnar
daab7fc734 Merge commit 'v2.6.36-rc3' into x86/memblock
Conflicts:
	arch/x86/kernel/trampoline.c
	mm/memblock.c

Merge reason: Resolve the conflicts, update to latest upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-31 09:45:46 +02:00
Konrad Rzeszutek Wilk
7ac41ccf47 x86, iommu: Fix IOMMU_INIT alignment rules
This boot crash was observed:

 DMA-API: preallocated 32768 debug entries
 DMA-API: debugging enabled by kernel config
 BUG: unable to handle kernel paging request at 19da8955
 IP: [<f4ffffff>] 0xf4ffffff
 *pde = 00000000

The crux of the failure was that even if we did not use any
of the .iommu_table section, the linker would still insert it
in the vmlinux file. This patch fixes that and also fixes the
runtime crash where we would try to access the array.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <1283191802-25086-1-git-send-email-konrad.wilk@oracle.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-31 08:06:10 +02:00
Julia Lawall
9fbaf49c7f x86, kmemcheck: Remove double test
The opcodes 0x2e and 0x3e are tested for in the first Group 2
line as well.

The sematic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@expression@
expression E;
@@

(
* E
  || ... || E
|
* E
  && ... && E
)
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
LKML-Reference: <1283010066-20935-5-git-send-email-julia@diku.dk>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-30 09:19:28 +02:00
Konrad Rzeszutek Wilk
6f44d0337c x86, doc: Adding comments about .iommu_table and its neighbors.
Updating the linker section with comments about .iommu_table and
some other ones that I know of.

CC: Sam Ravnborg <sam@ravnborg.org>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282933173-19960-1-git-send-email-konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 18:14:31 -07:00
Yinghai Lu
774ea0bcb2 x86: Remove old bootmem code
Requested by Ingo, Thomas and HPA.

The old bootmem code is no longer necessary, and the transition is
complete.  Remove it.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:14:37 -07:00
Yinghai Lu
6f2a75369e x86, memblock: Use memblock_memory_size()/memblock_free_memory_size() to get correct dma_reserve
memblock_memory_size() will return memory size in memblock.memory.region.
memblock_free_memory_size() will return free memory size in memblock.memory.region.

So We can get exact reseved size in specified range.

Set the size right after initmem_init(), because later bootmem API will
get area above 16M. (except some fallback).

Later after we remove the bootmem, We could call that just before paging_init().

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:13:54 -07:00
Yinghai Lu
a587d2daeb x86: Remove not used early_res code
and some functions in e820.c that are not used anymore

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:13:51 -07:00
Yinghai Lu
a9ce6bc151 x86, memblock: Replace e820_/_early string with memblock_
1.include linux/memblock.h directly. so later could reduce e820.h reference.
2 this patch is done by sed scripts mainly

-v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:13:47 -07:00
Yinghai Lu
72d7c3b33c x86: Use memblock to replace early_res
1. replace find_e820_area with memblock_find_in_range
2. replace reserve_early with memblock_x86_reserve_range
3. replace free_early with memblock_x86_free_range.
4. NO_BOOTMEM will switch to use memblock too.
5. use _e820, _early wrap in the patch, in following patch, will
   replace them all
6. because memblock_x86_free_range support partial free, we can remove some special care
7. Need to make sure that memblock_find_in_range() is called after memblock_x86_fill()
   so adjust some calling later in setup.c::setup_arch()
   -- corruption_check and mptable_update

-v2: Move reserve_brk() early
    Before fill_memblock_area, to avoid overlap between brk and memblock_find_in_range()
    that could happen We have more then 128 RAM entry in E820 tables, and
    memblock_x86_fill() could use memblock_find_in_range() to find a new place for
    memblock.memory.region array.
    and We don't need to use extend_brk() after fill_memblock_area()
    So move reserve_brk() early before fill_memblock_area().
-v3: Move find_smp_config early
    To make sure memblock_find_in_range not find wrong place, if BIOS doesn't put mptable
    in right place.
-v4: Treat RESERVED_KERN as RAM in memblock.memory. and they are already in
    memblock.reserved already..
    use __NOT_KEEP_MEMBLOCK to make sure memblock related code could be freed later.
-v5: Generic version __memblock_find_in_range() is going from high to low, and for 32bit
    active_region for 32bit does include high pages
    need to replace the limit with memblock.default_alloc_limit, aka get_max_mapped()
-v6: Use current_limit instead
-v7: check with MEMBLOCK_ERROR instead of -1ULL or -1L
-v8: Set memblock_can_resize early to handle EFI with more RAM entries
-v9: update after kmemleak changes in mainline

Suggested-by: David S. Miller <davem@davemloft.net>
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:12:29 -07:00
Yinghai Lu
301ff3e88e x86, memblock: Use memblock_debug to control debug message print out
Also let memblock_x86_reserve_range/memblock_x86_free_range could print out name if memblock=debug is
specified

will also print ther name when reserve_memblock_area/free_memblock_area are called.

-v2: according to Ingo, put " if (memblock_debug) " in one place

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:11:35 -07:00
Yinghai Lu
e82d42be24 x86, memblock: Add memblock_x86_memory_in_range()
It will return memory size in specified range according to memblock.memory.region

Try to share some code with memblock_x86_free_memory_in_range() by passing get_free to
__memblock_x86_memory_in_range().

-v2: Ben want _in_range in the name instead of size

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:11:30 -07:00
Yinghai Lu
b52c17ce85 x86, memblock: Add memblock_x86_free_memory_in_range()
It will return free memory size in specified range.

We can not use memory_size - reserved_size here, because some reserved area
may not be in the scope of memblock.memory.region.

Use memblock.memory.region subtracting memblock.reserved.region to get free range array.
then count size of all free ranges.

-v2: Ben insist on using _in_range

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:11:16 -07:00
Yinghai Lu
6bcc8176d0 x86, memblock: Add memblock_x86_find_in_range_node()
It can be used to find NODE_DATA for numa.

Need to make sure early_node_map[] is filled before it is called, otherwise
it will fallback to memblock_find_in_range(), with node range.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:10:57 -07:00
Yinghai Lu
88ba088c18 x86, memblock: Add memblock_x86_register_active_regions() and memblock_x86_hole_size()
memblock_x86_register_active_regions() will be used to fill early_node_map,
the result will be memblock.memory.region AND numa data

memblock_x86_hole_size will be used to find hole size on memblock.memory.region
with specified range.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:10:52 -07:00
Yinghai Lu
4d5cf86ce1 x86, memblock: Add get_free_all_memory_range()
get_free_all_memory_range is for CONFIG_NO_BOOTMEM=y, and will be called by
free_all_memory_core_early().

It will use early_node_map aka active ranges subtract memblock.reserved to
get all free range, and those ranges will convert to slab pages.

-v4: increase range size

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Jan Beulich <jbeulich@novell.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:10:48 -07:00
Yinghai Lu
9dc5d569c1 x86, memblock: Add memblock_x86_reserve_range/memblock_x86_free_range
They are wrappers for core versions, which take start/end/name instead
of base/size.  This will make x86 conversion eaasier.

could add more debug print out

-v2: change get_max_mapped() to memblock.default_alloc_limit according to Michael
      Ellerman and Ben
     change to memblock_x86_reserve_range and memblock_x86_free_range according to Michael Ellerman
-v3: call check_and_double after reserve/free, so could avoid to use
      find_memblock_area. Suggested by Michael Ellerman

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:10:41 -07:00
Yinghai Lu
27de794365 x86, memblock: Add memblock_x86_to_bootmem()
memblock_x86_to_bootmem() will reserve memblock.reserved.region in
bootmem after bootmem is set up.

We can use it to with all arches that support memblock later.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:08:21 -07:00
Yinghai Lu
f88eff74aa bootmem, x86: Add weak version of reserve_bootmem_generic
It will be used memblock_x86_to_bootmem converting

It is an wrapper for reserve_bootmem, and x86 64bit is using special one.

Also clean up that version for x86_64. We don't need to take care of numa
path for that, bootmem can handle it how

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:08:13 -07:00
Yinghai Lu
fb74fb6db9 x86, memblock: Add memblock_x86_find_in_range_size()
size is returned according free range.
Will be used to find free ranges for early_memtest and memory corruption check

Do not mess it up with lib/memblock.c yet.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-27 11:08:06 -07:00
Frederic Weisbecker
98ee74a75c Merge branch 'perf/urgent' into perf/core
Conflicts:
	tools/perf/util/callchain.h

Merge reason:
	Fix a non-trivial conflict with latest fixes
2010-08-27 02:30:07 +02:00
Shaohua Li
660a293ea9 x86, mm: Make spurious_fault check explicitly check the PRESENT bit
pte_present() returns true even present bit isn't set but _PAGE_PROTNONE
(global bit) bit is set. While with CONFIG_DEBUG_PAGEALLOC, free pages have
global bit set but present bit clear. This patch makes we could catch
free pages access with CONFIG_DEBUG_PAGEALLOC enabled.

[ hpa: added a comment in the code as a warning to janitors ]

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <1280217988.32400.75.camel@sli10-desk.sh.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 16:00:21 -07:00
Konrad Rzeszutek Wilk
ee1f284f38 x86, iommu: Utilize the IOMMU_INIT macros functionality.
We remove all of the sub-platform detection/init routines and instead
use on the .iommu_table array of structs to call the .early_init if
.detect returned a positive value. Also we can stop detecting other
IOMMUs if the IOMMU used the _FINISH type macro. During the
'pci_iommu_init' stage, we call .init for the second-stage
initialization if it was defined. Currently only SWIOTLB has this
defined and it used to de-allocate the SWIOTLB if the other detected
IOMMUs have deemed it unnecessary to use SWIOTLB.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282845485-8991-11-git-send-email-konrad.wilk@oracle.com>
CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 15:14:52 -07:00
Konrad Rzeszutek Wilk
22e6daf41b x86, GART/AMD-VI: Make AMD GART and IOMMU use IOMMU_INIT_* macros.
We utilize the IOMMU_INIT macros to create this dependency:

               [null]
                 |
       [pci_xen_swiotlb_detect]
                 |
       [pci_swiotlb_detect_override]
                 |
       [pci_swiotlb_detect_4gb]
                 |
         +-------+--------+
        /                  \
[detect_calgary]    [gart_iommu_hole_init]
                            |
                    [amd_iommu_detect]

Meaning that 'amd_iommu_detect' will be called after
'gart_iommu_hole_init'.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282845485-8991-9-git-send-email-konrad.wilk@oracle.com>
CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
CC: Joerg Roedel <joerg.roedel@amd.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 15:14:30 -07:00
Konrad Rzeszutek Wilk
d2aa232f3d x86, calgary: Make Calgary IOMMU use IOMMU_INIT_* macros.
We utilize the IOMMU_INIT macros to create this dependency:

     [pci_xen_swiotlb_detect]
         |
     [pci_swiotlb_detect_override]
         |
     [pci_swiotlb_detect_4gb]
         |
      [detect_calgary]

Meaning that 'detect_calgary' is going to be called after
'pci_swiotlb_detect'.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282845485-8991-8-git-send-email-konrad.wilk@oracle.com>
CC: Muli Ben-Yehuda <muli@il.ibm.com>
CC: "Jon D. Mason" <jdmason@kudzu.us>
CC: "Darrick J. Wong" <djwong@us.ibm.com>
CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 15:14:15 -07:00
Konrad Rzeszutek Wilk
5cb3a26793 x86, xen-swiotlb: Make Xen-SWIOTLB use IOMMU_INIT_* macros.
We utilize the IOMMU_INIT macros to create this dependency:

               [null]
                 |
       [pci_xen_swiotlb_detect]
                 |
       [pci_swiotlb_detect_override]
                 |
       [pci_swiotlb_detect_4gb]

In other words, we set 'pci_xen_swiotlb_detect' to be
the first detection to be run during start.

CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282845485-8991-7-git-send-email-konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 15:14:06 -07:00
Konrad Rzeszutek Wilk
c116c5457c x86, swiotlb: Make SWIOTLB use IOMMU_INIT_* macros.
We utilize the IOMMU_INIT macros to create this dependency:

       [pci_xen_swiotlb_detect]
                 |
       [pci_swiotlb_detect_override]
                 |
       [pci_swiotlb_detect_4gb]

And set the SWIOTLB IOMMU_INIT to utilize 'pci_swiotlb_init'
for .init and 'pci_swiotlb_late_init' for .late_init.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282845485-8991-6-git-send-email-konrad.wilk@oracle.com>
CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 15:13:37 -07:00
Konrad Rzeszutek Wilk
efa631c26d x86, swiotlb: Simplify SWIOTLB pci_swiotlb_detect routine.
In 'pci_swiotlb_detect' we used to do two different things:
 a). If user provided 'iommu=soft' or 'swiotlb=force' we
     would set swiotlb=1 and return 1 (and forcing pci-dma.c
     to call pci_swiotlb_init() immediately).
 b). If 4GB or more would be detected and if user did not specify
     iommu=off, we would set 'swiotlb=1' and return whatever 'a)'
     figured out.

We simplify this by splitting a) and b) in two different routines.

CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282845485-8991-5-git-send-email-konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 15:13:29 -07:00
Konrad Rzeszutek Wilk
5bef80a4b8 x86, iommu: Add proper dependency sort routine (and sanity check).
We are using a very simple sort routine which sorts the .iommu_table
array in the order of dependencies. Specifically each structure
of iommu_table_entry has a field 'depend' which contains the function
pointer to the IOMMU that MUST be run before us. We sort the array
of structures so that the struct iommu_table_entry with no
'depend' field are first, and then the subsequent ones are the
ones for which the 'depend' function has been already invoked
(in other words, precede us).

Using the kernel's version 'sort', which is a mergeheap is
feasible, but would require making the comparison operator
scan recursivly the array to satisfy the "heapify" process: setting the
levels properly. The end result would much more complex than it should
be an it is just much simpler to utilize this simple sort routine.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282845485-8991-4-git-send-email-konrad.wilk@oracle.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 15:13:19 -07:00
Konrad Rzeszutek Wilk
480125ba49 x86, iommu: Make all IOMMU's detection routines return a value.
We return 1 if the IOMMU has been detected. Zero or an error number
if we failed to find it. This is in preperation of using the IOMMU_INIT
so that we can detect whether an IOMMU is present. I have not
tested this for regression on Calgary, nor on AMD Vi chipsets as
I don't have that hardware.

CC: Muli Ben-Yehuda <muli@il.ibm.com>
CC: "Jon D. Mason" <jdmason@kudzu.us>
CC: "Darrick J. Wong" <djwong@us.ibm.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
CC: David Woodhouse <David.Woodhouse@intel.com>
CC: Chris Wright <chrisw@sous-sol.org>
CC: Yinghai Lu <yinghai@kernel.org>
CC: Joerg Roedel <joerg.roedel@amd.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282845485-8991-3-git-send-email-konrad.wilk@oracle.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 15:13:13 -07:00
Konrad Rzeszutek Wilk
0444ad93ea x86, iommu: Add IOMMU_INIT macros, .iommu_table section, and iommu_table_entry structure
This patch set adds a mechanism to "modularize" the IOMMUs we have
on X86. Currently the count of IOMMUs is up to six and they have a complex
relationship that requires careful execution order. 'pci_iommu_alloc'
does that today, but most folks are unhappy with how it does it.
This patch set addresses this and also paves a mechanism to jettison
unused IOMMUs during run-time. For details that sparked this, please
refer to: http://lkml.org/lkml/2010/8/2/282

The first solution that comes to mind is to convert wholesale
the IOMMU detection routines to be called during initcall
time frame. Unfortunately that misses the dependency relationship
that some of the IOMMUs have (for example: for AMD-Vi IOMMU to work,
GART detection MUST run first, and before all of that SWIOTLB MUST run).

The second solution would be to introduce a registration call wherein
the IOMMU would provide its detection/init routines and as well on what
MUST run before it. That would work, except that the 'pci_iommu_alloc'
which would run through this list, is called during mem_init. This means we
don't have any memory allocator, and it is so early that we haven't yet
started running through the initcall_t list.

This solution borrows concepts from the 2nd idea and from how
MODULE_INIT works. A macro is provided that each IOMMU uses to define
it's detect function and early_init (before the memory allocate is
active), and as well what other IOMMU MUST run before us.  Since most IOMMUs
depend on having SWIOTLB run first ("pci_swiotlb_detect") a convenience macro
to depends on that is also provided.

This macro is similar in design to MODULE_PARAM macro wherein
we setup a .iommu_table section in which we populate it with the values
that match a struct iommu_table_entry. During bootup we will sort
through the array so that the IOMMUs that MUST run before us are first
elements in the array. And then we just iterate through them calling the
detection routine and if appropiate, the init routines.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
LKML-Reference: <1282845485-8991-2-git-send-email-konrad.wilk@oracle.com>
CC: H. Peter Anvin <hpa@zytor.com>
CC: Fujita Tomonori <fujita.tomonori@lab.ntt.co.jp>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 15:12:53 -07:00
Haicheng Li
9b861528a8 x86-64, mem: Update all PGDs for direct mapping and vmemmap mapping changes
When memory hotplug-adding happens for a large enough area
that a new PGD entry is needed for the direct mapping, the PGDs
of other processes would not get updated. This leads to some CPUs
oopsing like below when they have to access the unmapped areas.

[ 1139.243192] BUG: soft lockup - CPU#0 stuck for 61s! [bash:6534]
[ 1139.243195] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc
dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc
lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib
i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore
[ 1139.243229] irq event stamp: 8538759
[ 1139.243230] hardirqs last  enabled at (8538759): [<ffffffff8100c3fc>] restore_args+0x0/0x30
[ 1139.243236] hardirqs last disabled at (8538757): [<ffffffff810422df>] __do_softirq+0x106/0x146
[ 1139.243240] softirqs last  enabled at (8538758): [<ffffffff81042310>] __do_softirq+0x137/0x146
[ 1139.243245] softirqs last disabled at (8538743): [<ffffffff8100cb5c>] call_softirq+0x1c/0x34
[ 1139.243249] CPU 0:
[ 1139.243250] Modules linked in: ipv6 autofs4 rfcomm l2cap crc16 bluetooth rfkill binfmt_misc
dm_mirror dm_region_hash dm_log dm_multipath dm_mod video output sbs sbshc fan battery ac parport_pc
lp parport joydev usbhid processor thermal thermal_sys container button rtc_cmos rtc_core rtc_lib
i2c_i801 i2c_core pcspkr uhci_hcd ohci_hcd ehci_hcd usbcore
[ 1139.243284] Pid: 6534, comm: bash Tainted: G   M       2.6.32-haicheng-cpuhp #7 QSSC-S4R
[ 1139.243287] RIP: 0010:[<ffffffff810ace35>]  [<ffffffff810ace35>] alloc_arraycache+0x35/0x69
[ 1139.243292] RSP: 0018:ffff8802799f9d78  EFLAGS: 00010286
[ 1139.243295] RAX: ffff8884ffc00000 RBX: ffff8802799f9d98 RCX: 0000000000000000
[ 1139.243297] RDX: 0000000000190018 RSI: 0000000000000001 RDI: ffff8884ffc00010
[ 1139.243300] RBP: ffffffff8100c34e R08: 0000000000000002 R09: 0000000000000000
[ 1139.243303] R10: ffffffff8246dda0 R11: 000000d08246dda0 R12: ffff8802599bfff0
[ 1139.243305] R13: ffff88027904c040 R14: ffff8802799f8000 R15: 0000000000000001
[ 1139.243308] FS:  00007fe81bfe86e0(0000) GS:ffff88000d800000(0000) knlGS:0000000000000000
[ 1139.243311] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1139.243313] CR2: ffff8884ffc00000 CR3: 000000026cf2d000 CR4: 00000000000006f0
[ 1139.243316] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 1139.243318] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 1139.243321] Call Trace:
[ 1139.243324]  [<ffffffff810ace29>] ? alloc_arraycache+0x29/0x69
[ 1139.243328]  [<ffffffff8135004e>] ? cpuup_callback+0x1b0/0x32a
[ 1139.243333]  [<ffffffff8105385d>] ? notifier_call_chain+0x33/0x5b
[ 1139.243337]  [<ffffffff810538a4>] ? __raw_notifier_call_chain+0x9/0xb
[ 1139.243340]  [<ffffffff8134ecfc>] ? cpu_up+0xb3/0x152
[ 1139.243344]  [<ffffffff813388ce>] ? store_online+0x4d/0x75
[ 1139.243348]  [<ffffffff811e53f3>] ? sysdev_store+0x1b/0x1d
[ 1139.243351]  [<ffffffff8110589f>] ? sysfs_write_file+0xe5/0x121
[ 1139.243355]  [<ffffffff810b539d>] ? vfs_write+0xae/0x14a
[ 1139.243358]  [<ffffffff810b587f>] ? sys_write+0x47/0x6f
[ 1139.243362]  [<ffffffff8100b9ab>] ? system_call_fastpath+0x16/0x1b

This patch makes sure to always replicate new direct mapping PGD entries
to the PGDs of all processes, as well as ensures corresponding vmemmap
mapping gets synced.

V1: initial code by Andi Kleen.
V2: fix several issues found in testing.
V3: as suggested by Wu Fengguang, reuse common code of vmalloc_sync_all().

[ hpa: changed pgd_change from int to bool ]

Originally-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
LKML-Reference: <4C6E4FD8.6080100@linux.intel.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 14:02:33 -07:00
Haicheng Li
6afb5157b9 x86, mm: Separate x86_64 vmalloc_sync_all() into separate functions
No behavior change.

Move some of vmalloc_sync_all() code into a new function
sync_global_pgds() that will be useful for memory hotplug.

Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
LKML-Reference: <4C6E4ECD.1090607@linux.intel.com>
Reviewed-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-26 14:02:29 -07:00
H. Peter Anvin
9ea77bdb39 x86, bios: Make the x86 early memory reservation a kernel option
Add a kernel command-line option so the x86 early memory reservation
size can be adjusted at runtime instead of only at compile time.

Suggested-by: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <tip-d0cd7425fab774a480cce17c2f649984312d0b55@git.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-25 17:10:49 -07:00
Borislav Petkov
acf01734b1 x86, tsc: Remove CPU frequency calibration on AMD
6b37f5a20c introduced the CPU frequency
calibration code for AMD CPUs whose TSCs didn't increment with the
core's P0 frequency. From F10h, revB onward, however, the TSC increment
rate is denoted by MSRC001_0015[24] and when this bit is set (which
should be done by the BIOS) the TSC increments with the P0 frequency
so the calibration is not needed and booting can be a couple of mcecs
faster on those machines.

Besides, there should be virtually no machines out there which don't
have this bit set, therefore this calibration can be safely removed. It
is a shaky hack anyway since it assumes implicitly that the core is in
P0 when BIOS hands off to the OS, which might not always be the case.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100825162823.GE26438@aftab>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-25 13:32:52 -07:00
Linus Torvalds
d4348c6789 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf, x86, Pentium4: Clear the P4_CCCR_FORCE_OVF flag
  tracing/trace_stack: Fix stack trace on ppc64
2010-08-25 10:50:07 -07:00
Linus Torvalds
5e686019df Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states
  sched: Fix rq->clock synchronization when migrating tasks
2010-08-25 08:40:56 -07:00
Lin Ming
8d33091992 perf, x86, Pentium4: Clear the P4_CCCR_FORCE_OVF flag
If on Pentium4 CPUs the FORCE_OVF flag is set then an NMI happens
on every event, which can generate a flood of NMIs. Clear it.

Reported-by: Vince Weaver <vweaver1@eecs.utk.edu>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-25 15:15:33 +02:00
Ingo Molnar
7de5d895b2 Merge branch 'linus' into perf/core
Merge reason: pick up perf fixes

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-25 13:10:00 +02:00
Lin Ming
04fba67163 perf: Remove unused variable
This fixes the following build warning introduced by the
callchain rework:

  arch/x86/kernel/cpu/perf_event.c:1574: warning: ‘perf_callchain_entry_nmi’ defined but not used

Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <1282718949.16443.75.camel@minggr.sh.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-25 13:09:41 +02:00
Hugh Dickins
b7d4608977 x86, mm: Fix CONFIG_VMSPLIT_1G and 2G_OPT trampoline
rc2 kernel crashes when booting second cpu on this CONFIG_VMSPLIT_2G_OPT
laptop: whereas cloning from kernel to low mappings pgd range does need
to limit by both KERNEL_PGD_PTRS and KERNEL_PGD_BOUNDARY, cloning kernel
pgd range itself must not be limited by the smaller KERNEL_PGD_BOUNDARY.

Signed-off-by: Hugh Dickins <hughd@google.com>
LKML-Reference: <alpine.LSU.2.00.1008242235120.2515@sister.anvils>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-24 23:05:17 -07:00
H. Peter Anvin
d0cd7425fa x86, bios: By default, reserve the low 64K for all BIOSes
The laundry list of BIOSes that need the low 64K reserved is getting
very long, so make it the default across all BIOSes.  This also allows
the code to be simplified and unified with the reservation code for
the first 4K.

This resolves kernel bugzilla 16661 and who knows what else...

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <tip-*@git.kernel.org>
2010-08-24 17:32:04 -07:00
Linus Torvalds
c05e1e23b8 Merge branch 'for-upstream/pvhvm' of git://xenbits.xensource.com/people/ianc/linux-2.6
* 'for-upstream/pvhvm' of git://xenbits.xensource.com/people/ianc/linux-2.6:
  xen: pvhvm: make it clearer that XEN_UNPLUG_* define bits in a bitfield
  xen: pvhvm: rename xen_emul_unplug=ignore to =unnnecessary
  xen: pvhvm: allow user to request no emulated device unplug
2010-08-23 18:29:18 -07:00
Alok Kataria
b0f4c062fb x86, paravirt: Remove alloc_pmd_clone hook, only used by VMI
VMI was the only user of the alloc_pmd_clone hook, given that VMI
is now removed we can also remove this hook.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
LKML-Reference: <1282608357.19396.36.camel@ank32.eng.vmware.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-23 17:09:44 -07:00
Alok Kataria
9863c90f68 x86, vmware: Remove deprecated VMI kernel support
With the recent innovations in CPU hardware acceleration technologies
from Intel and AMD, VMware ran a few experiments to compare these
techniques to guest paravirtualization technique on VMware's platform.
These hardware assisted virtualization techniques have outperformed the
performance benefits provided by VMI in most of the workloads. VMware
expects that these hardware features will be ubiquitous in a couple of
years, as a result, VMware has started a phased retirement of this
feature from the hypervisor.

Please note that VMI has always been an optimization and non-VMI kernels
still work fine on VMware's platform.
Latest versions of VMware's product which support VMI are,
Workstation 7.0 and VSphere 4.0 on ESX side, future maintainence
releases for these products will continue supporting VMI.

For more details about VMI retirement take a look at this,
http://blogs.vmware.com/guestosguide/2009/09/vmi-retirement.html

This feature removal was scheduled for 2.6.37 back in September 2009.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
LKML-Reference: <1282600151.19396.22.camel@ank32.eng.vmware.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-23 15:18:50 -07:00
Ma Ling
59daa706fb x86, mem: Optimize memcpy by avoiding memory false dependece
All read operations after allocation stage can run speculatively,
all write operation will run in program order, and if addresses are
different read may run before older write operation, otherwise wait
until write commit. However CPU don't check each address bit,
so read could fail to recognize different address even they
are in different page.For example if rsi is 0xf004, rdi is 0xe008,
in following operation there will generate big performance latency.
1. movq (%rsi),	%rax
2. movq %rax,	(%rdi)
3. movq 8(%rsi), %rax
4. movq %rax,	8(%rdi)

If %rsi and rdi were in really the same meory page, there are TRUE
read-after-write dependence because instruction 2 write 0x008 and
instruction 3 read 0x00c, the two address are overlap partially.
Actually there are in different page and no any issues,
but without checking each address bit CPU could think they are
in the same page, and instruction 3 have to wait for instruction 2
to write data into cache from write buffer, then load data from cache,
the cost time read spent is equal to mfence instruction. We may avoid it by
tuning operation sequence as follow.

1. movq 8(%rsi), %rax
2. movq %rax,	8(%rdi)
3. movq (%rsi),	%rax
4. movq %rax,	(%rdi)

Instruction 3 read 0x004, instruction 2 write address 0x010, no any
dependence.  At last on Core2 we gain 1.83x speedup compared with
original instruction sequence.  In this patch we first handle small
size(less 20bytes), then jump to different copy mode. Based on our
micro-benchmark small bytes from 1 to 127 bytes, we got up to 2X
improvement, and up to 1.5X improvement for 1024 bytes on Corei7.  (We
use our micro-benchmark, and will do further test according to your
requirment)

Signed-off-by: Ma Ling <ling.ma@intel.com>
LKML-Reference: <1277753065-18610-1-git-send-email-ling.ma@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-23 14:56:41 -07:00
Ma, Ling
fdf4289679 x86, mem: Don't implement forward memmove() as memcpy()
memmove() allow source and destination address to be overlap, but
there is no such limitation for memcpy().  Therefore, explicitly
implement memmove() in both the forwards and backward directions, to
give us the ability to optimize memcpy().

Signed-off-by: Ma Ling <ling.ma@intel.com>
LKML-Reference: <C10D3FB0CD45994C8A51FEC1227CE22F0E483AD86A@shsmsx502.ccr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-23 14:14:27 -07:00
Shaohua Li
61c77326d1 x86, mm: Avoid unnecessary TLB flush
In x86, access and dirty bits are set automatically by CPU when CPU accesses
memory. When we go into the code path of below flush_tlb_fix_spurious_fault(),
we already set dirty bit for pte and don't need flush tlb. This might mean
tlb entry in some CPUs hasn't dirty bit set, but this doesn't matter. When
the CPUs do page write, they will automatically check the bit and no software
involved.

On the other hand, flush tlb in below position is harmful. Test creates CPU
number of threads, each thread writes to a same but random address in same vma
range and we measure the total time. Under a 4 socket system, original time is
1.96s, while with the patch, the time is 0.8s. Under a 2 socket system, there is
20% time cut too. perf shows a lot of time are taking to send ipi/handle ipi for
tlb flush.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <20100816011655.GA362@sli10-desk.sh.intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Andrea Archangeli <aarcange@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-23 10:04:57 -07:00
Ian Campbell
1dc7ce99b0 xen: pvhvm: rename xen_emul_unplug=ignore to =unnnecessary
It is not immediately clear what this option causes to become
ignored. The actual meaning is that it is not necessary to unplug the
emulated devices to safely use the PV ones, even if the platform does
not support the unplug protocol. (pressumably the user will only add
this option if they have ensured that their domain configuration is
safe).

I think xen_emul_unplug=unnecessary better captures this.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
2010-08-23 11:59:29 +01:00
Ian Campbell
c93a4dfb31 xen: pvhvm: allow user to request no emulated device unplug
this allows the user to disable pvhvm and revert to emulated devices
in case of a system misconfiguration (e.g. initramfs with only
emulated drivers in it).

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: Stefano Stabellini <Stefano.Stabellini@eu.citrix.com>
2010-08-23 11:59:28 +01:00
Linus Torvalds
3dc8d7f07e Merge branch 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.36' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: PIT: free irq source id in handling error path
  KVM: destroy workqueue on kvm_create_pit() failures
  KVM: fix poison overwritten caused by using wrong xstate size
2010-08-22 11:27:36 -07:00
Samuel Thibault
ddb0c5a689 Replace Configure with Enable in description of MAXSMP
The "Configure" word tends to make user believe they have to say 'yes'
to be able to choose the number of procs/nodes.  "Enable" should be
unambiguous enough.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-21 12:38:58 -07:00
Sergey Senozhatsky
51e3c1b558 x86, hwmon: Fix unsafe smp_processor_id() in thermal_throttle_add_dev
Fix BUG: using smp_processor_id() in preemptible thermal_throttle_add_dev.
We know the cpu number when calling thermal_throttle_add_dev, so we can
remove smp_processor_id call in thermal_throttle_add_dev by supplying
the cpu number as argument.

This should resolve kernel bugzilla 16615/16629.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
LKML-Reference: <20100820073634.GB5209@swordfish.minsk.epam.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Joerg Roedel <Joerg.Roedel@amd.com>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-20 19:56:00 -07:00
Linus Torvalds
36423a5ed5 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, apic: Fix apic=debug boot crash
  x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues
  x86-32: Fix dummy trampoline-related inline stubs
  x86-32: Separate 1:1 pagetables from swapper_pg_dir
  x86, cpu: Fix regression in AMD errata checking code
2010-08-20 14:25:08 -07:00
Peter Zijlstra
ed80526166 perf: Remove superfluous return values from perf_callchain_*()
Fixes these build warnings introduced by the callchain
rework:

 arch/x86/kernel/cpu/perf_event.c: In function ‘perf_callchain_kernel’:
 arch/x86/kernel/cpu/perf_event.c:1646: warning: ‘return’ with a value, in function returning void
 arch/x86/kernel/cpu/perf_event.c: In function ‘perf_callchain_user’:
 arch/x86/kernel/cpu/perf_event.c:1699: warning: ‘return’ with a value, in function returning void
 arch/x86/kernel/cpu/perf_event.c: At top level:
 arch/x86/kernel/cpu/perf_event.c:1607: warning: ‘perf_callchain_entry_nmi’ defined but not used

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-20 14:59:39 +02:00
Suresh Siddha
cd7240c0b9 x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states
TSC's get reset after suspend/resume (even on cpu's with invariant TSC
which runs at a constant rate across ACPI P-, C- and T-states). And in
some systems BIOS seem to reinit TSC to arbitrary large value (still
sync'd across cpu's) during resume.

This leads to a scenario of scheduler rq->clock (sched_clock_cpu()) less
than rq->age_stamp (introduced in 2.6.32). This leads to a big value
returned by scale_rt_power() and the resulting big group power set by the
update_group_power() is causing improper load balancing between busy and
idle cpu's after suspend/resume.

This resulted in multi-threaded workloads (like kernel-compilation) go
slower after suspend/resume cycle on core i5 laptops.

Fix this by recomputing cyc2ns_offset's during resume, so that
sched_clock() continues from the point where it was left off during
suspend.

Reported-by: Florian Pritz <flo@xssn.at>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: <stable@kernel.org> # [v2.6.32+]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1282262618.2675.24.camel@sbsiddha-MOBL3.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-20 14:59:02 +02:00
Daniel Kiper
05e407603e x86, apic: Fix apic=debug boot crash
Fix a boot crash when apic=debug is used and the APIC is
not properly initialized.

This issue appears during Xen Dom0 kernel boot but the
fix is generic and the crash could occur on real hardware
as well.

Signed-off-by: Daniel Kiper <dkiper@net-space.pl>
Cc: xen-devel@lists.xensource.com
Cc: konrad.wilk@oracle.com
Cc: jeremy@goop.org
Cc: <stable@kernel.org> # .35.x, .34.x, .33.x, .32.x
LKML-Reference: <20100819224616.GB9967@router-fw-old.local.net-space.pl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-20 10:18:28 +02:00
Borislav Petkov
d7c53c9e82 x86, hotplug: Serialize CPU hotplug to avoid bringup concurrency issues
When testing cpu hotplug code on 32-bit we kept hitting the "CPU%d:
Stuck ??" message due to multiple cores concurrently accessing the
cpu_callin_mask, among others.

Since these codepaths are not protected from concurrent access due to
the fact that there's no sane reason for making an already complex
code unnecessarily more complex - we hit the issue only when insanely
switching cores off- and online - serialize hotplugging cores on the
sysfs level and be done with it.

[ v2.1: fix !HOTPLUG_CPU build ]

Cc: <stable@kernel.org>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20100819181029.GC17171@aftab>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-19 14:47:43 -07:00
Linus Torvalds
b3ea36b7a2 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  kprobes/x86: Fix the return address of multiple kretprobes
  perf tools: Fix build error on read only source.
  perf, x86: Fix Intel-nhm PMU programming errata workaround
2010-08-19 09:06:49 -07:00
KUMANO Syuhei
737480a0d5 kprobes/x86: Fix the return address of multiple kretprobes
Fix the return address of subsequent kretprobes when multiple
kretprobes are set on the same function.

For example:

 # cd /sys/kernel/debug/tracing
 # echo "r:event1 sys_symlink" > kprobe_events
 # echo "r:event2 sys_symlink" >> kprobe_events
 # echo 1 > events/kprobes/enable
 # ln -s /tmp/foo /tmp/bar

(without this patch)

 # cat trace
              ln-897   [000] 20404.133727: event1: (kretprobe_trampoline+0x0/0x4c <- sys_symlink)
              ln-897   [000] 20404.133747: event2: (system_call_fastpath+0x16/0x1b <- sys_symlink)

(with this patch)

 # cat trace
              ln-740   [000] 13799.491076: event1: (system_call_fastpath+0x16/0x1b <- sys_symlink)
              ln-740   [000] 13799.491096: event2: (system_call_fastpath+0x16/0x1b <- sys_symlink)

Signed-off-by: KUMANO Syuhei <kumano.prog@gmail.com>
Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
LKML-Reference: <1281853084.3254.11.camel@camp10-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-19 12:49:56 +02:00
Ingo Molnar
c8710ad389 Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/core 2010-08-19 12:48:09 +02:00
Frederic Weisbecker
927c7a9e92 perf: Fix race in callchains
Now that software events don't have interrupt disabled anymore in
the event path, callchains can nest on any context. So seperating
nmi and others contexts in two buffers has become racy.

Fix this by providing one buffer per nesting level. Given the size
of the callchain entries (2040 bytes * 4), we now need to allocate
them dynamically.

v2: Fixed put_callchain_entry call after recursion.
    Fix the type of the recursion, it must be an array.

v3: Use a manual pr cpu allocation (temporary solution until NMIs
    can safely access vmalloc'ed memory).
    Do a better separation between callchain reference tracking and
    allocation. Make the "put" path lockless for non-release cases.

v4: Protect the callchain buffers with rcu.

v5: Do the cpu buffers allocations node affine.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: David Miller <davem@davemloft.net>
Cc: Borislav Petkov <bp@amd64.org>
2010-08-19 01:32:31 +02:00
Frederic Weisbecker
f72c1a931e perf: Factorize callchain context handling
Store the kernel and user contexts from the generic layer instead
of archs, this gathers some repetitive code.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Tested-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Borislav Petkov <bp@amd64.org>
2010-08-19 01:32:11 +02:00
Frederic Weisbecker
56962b4449 perf: Generalize some arch callchain code
- Most archs use one callchain buffer per cpu, except x86 that needs
  to deal with NMIs. Provide a default perf_callchain_buffer()
  implementation that x86 overrides.

- Centralize all the kernel/user regs handling and invoke new arch
  handlers from there: perf_callchain_user() / perf_callchain_kernel()
  That avoid all the user_mode(), current->mm checks and so...

- Invert some parameters in perf_callchain_*() helpers: entry to the
  left, regs to the right, following the traditional (dst, src).

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Tested-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Borislav Petkov <bp@amd64.org>
2010-08-19 01:30:59 +02:00
Frederic Weisbecker
70791ce9ba perf: Generalize callchain_store()
callchain_store() is the same on every archs, inline it in
perf_event.h and rename it to perf_callchain_store() to avoid
any collision.

This removes repetitive code.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Tested-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Borislav Petkov <bp@amd64.org>
2010-08-19 01:30:11 +02:00
Frederic Weisbecker
c1a65932fd perf: Drop unappropriate tests on arch callchains
Drop the TASK_RUNNING test on user tasks for callchains as
this check doesn't seem to make any sense.

Also remove the tests for !current that is not supposed to
happen and current->pid as this should be handled at the
generic level, with exclude_idle attribute.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Tested-by: Will Deacon <will.deacon@arm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Miller <davem@davemloft.net>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Borislav Petkov <bp@amd64.org>
2010-08-19 01:29:35 +02:00
H. Peter Anvin
8848a91068 x86-32: Fix dummy trampoline-related inline stubs
Fix dummy inline stubs for trampoline-related functions when no
trampolines exist (until we get rid of the no-trampoline case
entirely.)

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <4C6C294D.3030404@zytor.com>
2010-08-18 12:42:24 -07:00
Joerg Roedel
fd89a13792 x86-32: Separate 1:1 pagetables from swapper_pg_dir
This patch fixes machine crashes which occur when heavily exercising the
CPU hotplug codepaths on a 32-bit kernel. These crashes are caused by
AMD Erratum 383 and result in a fatal machine check exception. Here's
the scenario:

1. On 32-bit, the swapper_pg_dir page table is used as the initial page
table for booting a secondary CPU.

2. To make this work, swapper_pg_dir needs a direct mapping of physical
memory in it (the low mappings). By adding those low, large page (2M)
mappings (PAE kernel), we create the necessary conditions for Erratum
383 to occur.

3. Other CPUs which do not participate in the off- and onlining game may
use swapper_pg_dir while the low mappings are present (when leave_mm is
called). For all steps below, the CPU referred to is a CPU that is using
swapper_pg_dir, and not the CPU which is being onlined.

4. The presence of the low mappings in swapper_pg_dir can result
in TLB entries for addresses below __PAGE_OFFSET to be established
speculatively. These TLB entries are marked global and large.

5. When the CPU with such TLB entry switches to another page table, this
TLB entry remains because it is global.

6. The process then generates an access to an address covered by the
above TLB entry but there is a permission mismatch - the TLB entry
covers a large global page not accessible to userspace.

7. Due to this permission mismatch a new 4kb, user TLB entry gets
established. Further, Erratum 383 provides for a small window of time
where both TLB entries are present. This results in an uncorrectable
machine check exception signalling a TLB multimatch which panics the
machine.

There are two ways to fix this issue:

        1. Always do a global TLB flush when a new cr3 is loaded and the
        old page table was swapper_pg_dir. I consider this a hack hard
        to understand and with performance implications

        2. Do not use swapper_pg_dir to boot secondary CPUs like 64-bit
        does.

This patch implements solution 2. It introduces a trampoline_pg_dir
which has the same layout as swapper_pg_dir with low_mappings. This page
table is used as the initial page table of the booting CPU. Later in the
bringup process, it switches to swapper_pg_dir and does a global TLB
flush. This fixes the crashes in our test cases.

-v2: switch to swapper_pg_dir right after entering start_secondary() so
that we are able to access percpu data which might not be mapped in the
trampoline page table.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <20100816123833.GB28147@aftab>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-18 09:17:20 -07:00
Hans Rosenfeld
07a7795ca2 x86, cpu: Fix regression in AMD errata checking code
A bug in the family-model-stepping matching code caused the presence of
errata to go undetected when OSVW was not used. This causes hangs on
some K8 systems because the E400 workaround is not enabled.

Signed-off-by: Hans Rosenfeld <hans.rosenfeld@amd.com>
LKML-Reference: <1282141190-930137-1-git-send-email-hans.rosenfeld@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-18 09:16:28 -07:00
Zhang, Yanmin
351af0725e perf, x86: Fix Intel-nhm PMU programming errata workaround
Fix the Errata AAK100/AAP53/BD53 workaround, the officialy documented
workaround we implemented in:

 11164cd: perf, x86: Add Nehelem PMU programming errata workaround

doesn't actually work fully and causes a stuck PMU state
under load and non-functioning perf profiling.

A functional workaround was found by trial & error.

Affects all Nehalem-class Intel PMUs.

Signed-off-by: Zhang Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1281073148.2125.63.camel@ymzhang.sh.intel.com>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: <stable@kernel.org> # .35.x
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-18 11:17:39 +02:00
Linus Torvalds
392abeea52 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  vt,console,kdb: preserve console_blanked while in kdb
  vt: fix regression warnings from KMS merge
  arm,kgdb: fix GDB_MAX_REGS no longer used
  kgdb: add missing __percpu markup in arch/x86/kernel/kgdb.c
  kdb: fix compile error without CONFIG_KALLSYMS
2010-08-17 18:36:19 -07:00
David Howells
d7627467b7 Make do_execve() take a const filename pointer
Make do_execve() take a const filename pointer so that kernel_execve() compiles
correctly on ARM:

arch/arm/kernel/sys_arm.c:88: warning: passing argument 1 of 'do_execve' discards qualifiers from pointer target type

This also requires the argv and envp arguments to be consted twice, once for
the pointer array and once for the strings the array points to.  This is
because do_execve() passes a pointer to the filename (now const) to
copy_strings_kernel().  A simpler alternative would be to cast the filename
pointer in do_execve() when it's passed to copy_strings_kernel().

do_execve() may not change any of the strings it is passed as part of the argv
or envp lists as they are some of them in .rodata, so marking these strings as
const should be fine.

Further kernel_execve() and sys_execve() need to be changed to match.

This has been test built on x86_64, frv, arm and mips.

Signed-off-by: David Howells <dhowells@redhat.com>
Tested-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-17 18:07:43 -07:00
Jesse Barnes
23b90cfd7b x86/PCI: only define pci_domain_nr if PCI and PCI_DOMAINS are set
Otherwise we'll duplicate definitions with the pci.h stubs.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-08-17 09:29:36 -07:00
Xiao Guangrong
6b5d7a9f6f KVM: PIT: free irq source id in handling error path
Free irq source id if create pit workqueue fail

Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-17 12:04:23 +03:00
Namhyung Kim
8c8aefce93 kgdb: add missing __percpu markup in arch/x86/kernel/kgdb.c
breakinfo->pev is a pointer to percpu pointer but was missing __percpu markup.
Add it.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-08-16 15:58:30 -05:00
Linus Torvalds
2245ba2a3a Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6
* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6:
  gcc-4.6: ACPI: fix unused but set variables in ACPI
  ACPI thermal: make procfs I/F depend on CONFIG_ACPI_PROCFS
  ACPI video: make procfs I/F depend on CONFIG_ACPI_PROCFS
  ACPI processor: remove deprecated ACPI procfs I/F
  ACPI power_resource: remove unused procfs I/F
  ACPI: remove deprecated ACPI procfs I/F
  ACPI: introduce drivers/acpi/sysfs.c
  ACPI: introduce module parameter acpi.aml_debug_output
  ACPI: introduce drivers/acpi/debugfs.c
  ACPI, APEI, ERST debug support
  ACPI, APEI, Manage GHES as platform devices
  ACPI, APEI, Rename CPER and GHES severity constants
  ACPI, APEI, Fix a typo of error path of apei_resources_request
  ACPI / ACPICA: Fix reference counting problems with GPE handlers
  ACPI: Add the check of ADR flag in course of finding ACPI handle for PCI device
  ACPI / Sleep: Drop acpi_suspend_finish()
  ACPI / Sleep: Consolidate suspend and hibernation routines
  ACPI / Wakeup: Simplify enabling of wakeup devices
  ACPI / Sleep: Rework enabling wakeup devices
  ACPI / Sleep: Free NVS copy if suspending of devices fails

Fixed up totally buggered "ACPI: fix unused but set variables in ACPI"
patch that doesn't even compile in the merge.

Thanks to Sedat Dilek <sedat.dilek@googlemail.com> for noticing the
breakage before I even pulled.  And a big "Grrr.." at Len for not even
bothering to compile the tree before asking me to pull.
2010-08-15 17:37:07 -07:00
Xiaotian Feng
3185bf8c23 KVM: destroy workqueue on kvm_create_pit() failures
kernel needs to destroy workqueue if kvm_create_pit() fails, otherwise
after pit is freed, the workqueue is leaked.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Gregory Haskins <ghaskins@novell.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-15 14:17:35 +03:00
Xiaotian Feng
f45755b834 KVM: fix poison overwritten caused by using wrong xstate size
fpu.state is allocated from task_xstate_cachep, the size of task_xstate_cachep
is xstate_size. xstate_size is set from cpuid instruction, which is often
smaller than sizeof(struct xsave_struct). kvm is using sizeof(struct xsave_struct)
to fill in/out fpu.state.xsave, as what we allocated for fpu.state is
xstate_size, kernel will write out of memory and caused poison/redzone/padding
overwritten warnings.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Reviewed-by: Sheng Yang <sheng@linux.intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Sheng Yang <sheng@linux.intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-08-15 14:10:15 +03:00
Len Brown
95ee46aa86 Merge branch 'linus' into release
Conflicts:
	drivers/acpi/debug.c

Signed-off-by: Len Brown <len.brown@intel.com>
2010-08-15 01:06:31 -04:00
Sam Ravnborg
8b1bb90701 defconfig reduction
Use the defconfig files generated by "make savedefconfig" for
remaining defconfig files.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2010-08-14 22:26:53 +02:00
Sam Ravnborg
bf56fba670 archs: replace unifdef-y with header-y
unifdef-y and header-y have same semantic, so drop unifdef-y

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
2010-08-14 22:26:51 +02:00
Linus Torvalds
c206d44ffd Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, UV: Make kdump avoid stack dumps - fix !CONFIG_KEXEC breakage
  x86, UV: Initialize BAU hub map
  x86, UV: Make kdump avoid stack dumps
2010-08-13 18:00:25 -07:00
Linus Torvalds
78b148696d Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] acpi-cpufreq: add missing __percpu markup
2010-08-13 17:59:09 -07:00
David Howells
c788732523 Mark arguments to certain syscalls as being const
Mark arguments to certain system calls as being const where they should be but
aren't.  The list includes:

 (*) The filename arguments of various stat syscalls, execve(), various utimes
     syscalls and some mount syscalls.

 (*) The filename arguments of some syscall helpers relating to the above.

 (*) The buffer argument of various write syscalls.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-13 16:53:13 -07:00
Linus Torvalds
4b17cafaa4 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits)
  perf: Add back list_head data types
  perf ui hist browser: Fixup key bindings
  perf ui browser: Add ui_browser__show counterpart: __hide
  perf annotate: Cycle thru sorted lines with samples
  perf ui: Make SPACE work as PGDN in all browsers
  perf annotate: Sort by hottest lines in the TUI
  perf ui: Complete the breakdown of util/newt.c
  perf ui: Move hists browser to util/ui/browsers/
  perf symbols: Ignore mapping symbols on ARM
  perf ui: Move map browser to util/ui/browsers/
  perf ui: Move annotate browser to util/ui/browsers/
  perf ui: Move ui_progress routines to separate file in util/ui/
  perf ui: Move ui_helpline routines to separate file in util/ui/
  perf ui: Shorten ui_browser member names
  perf, x86: P4 PMU -- update nmi irq statistics and unmask lvt entry properly
  perf ui: Start breaking down newt.c into multiple files
  perf tui: Introduce list_head based generic ui_browser refresh routine
  perf probe: Fix memory leaks in add_perf_probe_events
  perf probe: Fix to copy the type for raw parameters
  perf report: Speed up exit path
  ...
2010-08-13 10:39:30 -07:00
Linus Torvalds
36450e9c95 Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, UV: Initialize BAU MMRs only on hubs with cpus
  x86, UV: Modularize BAU send and wait
  x86, UV: BAU broadcast to the local hub
  x86, UV: Correct BAU regular message type
  x86, UV: Remove BAU check for stay-busy
  x86, UV: Correct BAU discovery of hubs and sockets
  x86, UV: Correct BAU software acknowledge
  x86, UV: BAU structure rearranging
  x86, UV: Shorten access to BAU statistics structure
  x86, UV: Disable BAU on network congestion
  x86, UV: BAU tunables into a debugfs file
  x86, UV: Calculate BAU destination timeout
2010-08-13 10:38:37 -07:00
Linus Torvalds
c029b55af7 Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, asm: Use a lower case name for the end macro in atomic64_386_32.S
  x86, asm: Refactor atomic64_386_32.S to support old binutils and be cleaner
  x86: Document __phys_reloc_hide() usage in __pa_symbol()
  x86, apic: Map the local apic when parsing the MP table.
2010-08-13 10:35:48 -07:00
Linus Torvalds
9605456919 x86: don't send SIGBUS for kernel page faults
It's wrong for several reasons, but the most direct one is that the
fault may be for the stack accesses to set up a previous SIGBUS.  When
we have a kernel exception, the kernel exception handler does all the
fixups, not some user-level signal handler.

Even apart from the nested SIGBUS issue, it's also wrong to give out
kernel fault addresses in the signal handler info block, or to send a
SIGBUS when a system call already returns EFAULT.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-13 09:49:20 -07:00
Namhyung Kim
3f6c4df7e1 [CPUFREQ] acpi-cpufreq: add missing __percpu markup
acpi_perf_data is a percpu pointer but was missing __percpu markup.
Add it.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-13 11:11:49 -04:00
Robert Richter
1f999ab5a1 x86, xsave: Disable xsave in i387 emulation mode
xsave is broken for (!HAVE_HWFP). This is the case if config
MATH_EMULATION is enabled, 'no387' kernel parameter is set and xsave
exists. xsave will not work because x86/math-emu and xsave share the
same memory. As this case can be treated as corner case we simply
disable xsave then.

Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1279731838-1522-7-git-send-email-robert.richter@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-12 14:16:54 -07:00
Robert Richter
f6e9456c92 x86, cleanup: Remove obsolete boot_cpu_id variable
boot_cpu_id is there for historical reasons and was renamed to
boot_cpu_physical_apicid in patch:

 c70dcb7 x86: change boot_cpu_id to boot_cpu_physical_apicid

However, there are some remaining occurrences of boot_cpu_id that are
never touched in the kernel and thus its value is always 0.

This patch removes boot_cpu_id completely.

Signed-off-by: Robert Richter <robert.richter@amd.com>
LKML-Reference: <1279731838-1522-8-git-send-email-robert.richter@amd.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-12 14:01:38 -07:00
Ingo Molnar
f46a680413 Merge branch 'linus' into perf/urgent
Merge reason: Fix upstream breakage introduced by:

 de5d9bf: Move list types from <linux/list.h> to <linux/types.h>.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-12 21:39:04 +02:00
Cliff Wickman
1d6225e8cc x86, UV: Make kdump avoid stack dumps - fix !CONFIG_KEXEC breakage
This replaces Version 1 of this patch, which broke the build when
CONFIG_KEXEC and CONFIG_CRASH_DUMP were configured off.  In that case
the storage for the 'in_crash_kexec' flag was never built.

This version defines that flag as 0 if CONFIG_KEXEC is not set.
The patch is tested with all combinations of those two options.

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <E1OiZcw-0001Hb-2g@eag09.americas.sgi.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-12 12:23:55 -07:00
Linus Torvalds
4032816dca Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] add missing __percpu markup in pcc-cpufreq.c
2010-08-12 10:07:11 -07:00
Chris Wilson
4936a3b90d x86/hpet: Use the FSEC_PER_SEC constant for femto-second periods
The current computation, introduced with f12a15be63, of FSEC_PER_SEC using
the multiplication of (FSEC_PER_NSEC * NSEC_PER_SEC) is performed only
with 32bit integers on small machines, resulting in an overflow and a
*very* short intervals being programmed.  An interrupt storm follows.

Note that we also have to specify FSEC_PER_SEC as being long long to
overcome the same limitations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: John Stultz <johnstul@us.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Ingo Molnar <mingo@elte.hu>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-12 09:53:39 -07:00
Namhyung Kim
a3da323420 [CPUFREQ] add missing __percpu markup in pcc-cpufreq.c
pcc_cpu_info is a percpu pointer but was missing __percpu markup.
Add it.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-12 12:38:06 -04:00
Linus Torvalds
26f0cf9181 Merge branch 'stable/xen-swiotlb-0.8.6' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
* 'stable/xen-swiotlb-0.8.6' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
  x86: Detect whether we should use Xen SWIOTLB.
  pci-swiotlb-xen: Add glue code to setup dma_ops utilizing xen_swiotlb_* functions.
  swiotlb-xen: SWIOTLB library for Xen PV guest with PCI passthrough.
  xen/mmu: inhibit vmap aliases rather than trying to clear them out
  vmap: add flag to allow lazy unmap to be disabled at runtime
  xen: Add xen_create_contiguous_region
  xen: Rename the balloon lock
  xen: Allow unprivileged Xen domains to create iomap pages
  xen: use _PAGE_IOMAP in ioremap to do machine mappings

Fix up trivial conflicts (adding both xen swiotlb and xen pci platform
driver setup close to each other) in drivers/xen/{Kconfig,Makefile} and
include/xen/xen-ops.h
2010-08-12 09:09:41 -07:00
Luca Barbieri
417484d47e x86, asm: Use a lower case name for the end macro in atomic64_386_32.S
Use a lowercase name for the end macro, which somehow fixes a binutils 2.16
problem.

Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
LKML-Reference: <tip-30246557a06bb20618bed906a06d1e1e0faa8bb4@git.kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-12 07:04:16 -07:00
Luca Barbieri
30246557a0 x86, asm: Refactor atomic64_386_32.S to support old binutils and be cleaner
The old code didn't work on binutils 2.12 because setting a symbol to
a register apparently requires a fairly recent version.

This commit refactors the code to use the C preprocessor instead, and
in the process makes the whole code a bit easier to understand.

The object code produced is unchanged as expected.

This fixes kernel bugzilla 16506.

Reported-by: Dieter Stussy <kd6lvw+software@kd6lvw.ampr.org>
Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: <stable@kernel.org> 2.6.35
LKML-Reference: <tip-*@git.kernel.org>
2010-08-11 21:03:28 -07:00
FUJITA Tomonori
3b9c6c11f5 dma-mapping: remove dma_is_consistent API
Architectures implement dma_is_consistent() in different ways (some
misinterpret the definition of API in DMA-API.txt).  So it hasn't been so
useful for drivers.  We have only one user of the API in tree.  Unlikely
out-of-tree drivers use the API.

Even if we fix dma_is_consistent() in some architectures, it doesn't look
useful at all.  It was invented long ago for some old systems that can't
allocate coherent memory at all.  It's better to export only APIs that are
definitely necessary for drivers.

Let's remove this API.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:21 -07:00
FUJITA Tomonori
4565f0170d dma-mapping: unify dma_get_cache_alignment implementations
dma_get_cache_alignment returns the minimum DMA alignment.  Architectures
defines it as ARCH_DMA_MINALIGN (formally ARCH_KMALLOC_MINALIGN).  So we
can unify dma_get_cache_alignment implementations.

Note that some architectures implement dma_get_cache_alignment wrongly.
dma_get_cache_alignment() should return the minimum DMA alignment.  So
fully-coherent architectures should return 1.  This patch also fixes this
issue.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:21 -07:00
Namhyung Kim
8fd49936a8 x86: Document __phys_reloc_hide() usage in __pa_symbol()
Until all supported versions of gcc recognize
-fno-strict-overflow, we should keep the RELOC_HIDE() magic in
__pa_symbol(). Comment it.

Signed-off-by: Namhyung Kim <namhyung@gmail.com>
LKML-Reference: <1281508661-29507-1-git-send-email-namhyung@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-11 08:43:49 +02:00
Linus Torvalds
8cbd84f2dd x86: fix up system call numbering nit
As pointed out by Jiri Slaby: when I resolved the the 32-bit x85 system
call entry tables for prlimit (due to the conflict with fanotify), I
forgot to add the numbering in comments that we do for every fifth entry.

Reported-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-10 15:35:10 -07:00
Linus Torvalds
2f9e825d3e Merge branch 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
  block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
  xen-blkfront: fix missing out label
  blkdev: fix blkdev_issue_zeroout return value
  block: update request stacking methods to support discards
  block: fix missing export of blk_types.h
  writeback: fix bad _bh spinlock nesting
  drbd: revert "delay probes", feature is being re-implemented differently
  drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
  drbd: Disable delay probes for the upcomming release
  writeback: cleanup bdi_register
  writeback: add new tracepoints
  writeback: remove unnecessary init_timer call
  writeback: optimize periodic bdi thread wakeups
  writeback: prevent unnecessary bdi threads wakeups
  writeback: move bdi threads exiting logic to the forker thread
  writeback: restructure bdi forker loop a little
  writeback: move last_active to bdi
  writeback: do not remove bdi from bdi_list
  writeback: simplify bdi code a little
  writeback: do not lose wake-ups in bdi threads
  ...

Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
drivers/scsi/scsi_error.c as per Jens.
2010-08-10 15:22:42 -07:00
Linus Torvalds
b34d8915c4 Merge branch 'writable_limits' of git://decibel.fi.muni.cz/~xslaby/linux
* 'writable_limits' of git://decibel.fi.muni.cz/~xslaby/linux:
  unistd: add __NR_prlimit64 syscall numbers
  rlimits: implement prlimit64 syscall
  rlimits: switch more rlimit syscalls to do_prlimit
  rlimits: redo do_setrlimit to more generic do_prlimit
  rlimits: add rlimit64 structure
  rlimits: do security check under task_lock
  rlimits: allow setrlimit to non-current tasks
  rlimits: split sys_setrlimit
  rlimits: selinux, do rlimits changes under task_lock
  rlimits: make sure ->rlim_max never grows in sys_setrlimit
  rlimits: add task_struct to update_rlimit_cpu
  rlimits: security, add task_struct to setrlimit

Fix up various system call number conflicts.  We not only added fanotify
system calls in the meantime, but asm-generic/unistd.h added a wait4
along with a range of reserved per-architecture system calls.
2010-08-10 12:07:51 -07:00
Linus Torvalds
8c8946f509 Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify
* 'for-linus' of git://git.infradead.org/users/eparis/notify: (132 commits)
  fanotify: use both marks when possible
  fsnotify: pass both the vfsmount mark and inode mark
  fsnotify: walk the inode and vfsmount lists simultaneously
  fsnotify: rework ignored mark flushing
  fsnotify: remove global fsnotify groups lists
  fsnotify: remove group->mask
  fsnotify: remove the global masks
  fsnotify: cleanup should_send_event
  fanotify: use the mark in handler functions
  audit: use the mark in handler functions
  dnotify: use the mark in handler functions
  inotify: use the mark in handler functions
  fsnotify: send fsnotify_mark to groups in event handling functions
  fsnotify: Exchange list heads instead of moving elements
  fsnotify: srcu to protect read side of inode and vfsmount locks
  fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called
  fsnotify: use _rcu functions for mark list traversal
  fsnotify: place marks on object in order of group memory address
  vfs/fsnotify: fsnotify_close can delay the final work in fput
  fsnotify: store struct file not struct path
  ...

Fix up trivial delete/modify conflict in fs/notify/inotify/inotify.c.
2010-08-10 11:39:13 -07:00
Suresh Siddha
d7a7c57393 x86, ia64, smp: use workqueues unconditionally during do_boot_cpu()
Workqueues are now initialized as part of the early_initcall().  So they
are available for use during cold boot process aswell.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Tony Luck <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09 20:45:06 -07:00
Andi Kleen
4e60c86bd9 gcc-4.6: mm: fix unused but set warnings
No real bugs, just some dead code and some fixups.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09 20:44:58 -07:00
Cesar Eduardo Barros
597781f3e5 kmap_atomic: make kunmap_atomic() harder to misuse
kunmap_atomic() is currently at level -4 on Rusty's "Hard To Misuse"
list[1] ("Follow common convention and you'll get it wrong"), except in
some architectures when CONFIG_DEBUG_HIGHMEM is set[2][3].

kunmap() takes a pointer to a struct page; kunmap_atomic(), however, takes
takes a pointer to within the page itself.  This seems to once in a while
trip people up (the convention they are following is the one from
kunmap()).

Make it much harder to misuse, by moving it to level 9 on Rusty's list[4]
("The compiler/linker won't let you get it wrong").  This is done by
refusing to build if the type of its first argument is a pointer to a
struct page.

The real kunmap_atomic() is renamed to kunmap_atomic_notypecheck()
(which is what you would call in case for some strange reason calling it
with a pointer to a struct page is not incorrect in your code).

The previous version of this patch was compile tested on x86-64.

[1] http://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html
[2] In these cases, it is at level 5, "Do it right or it will always
    break at runtime."
[3] At least mips and powerpc look very similar, and sparc also seems to
    share a common ancestor with both; there seems to be quite some
    degree of copy-and-paste coding here. The include/asm/highmem.h file
    for these three archs mention x86 CPUs at its top.
[4] http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html
[5] As an aside, could someone tell me why mn10300 uses unsigned long as
    the first parameter of kunmap_atomic() instead of void *?

Signed-off-by: Cesar Eduardo Barros <cesarb@cesarb.net>
Cc: Russell King <linux@arm.linux.org.uk> (arch/arm)
Cc: Ralf Baechle <ralf@linux-mips.org> (arch/mips)
Cc: David Howells <dhowells@redhat.com> (arch/frv, arch/mn10300)
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com> (arch/mn10300)
Cc: Kyle McMartin <kyle@mcmartin.ca> (arch/parisc)
Cc: Helge Deller <deller@gmx.de> (arch/parisc)
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org> (arch/parisc)
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> (arch/powerpc)
Cc: Paul Mackerras <paulus@samba.org> (arch/powerpc)
Cc: "David S. Miller" <davem@davemloft.net> (arch/sparc)
Cc: Thomas Gleixner <tglx@linutronix.de> (arch/x86)
Cc: Ingo Molnar <mingo@redhat.com> (arch/x86)
Cc: "H. Peter Anvin" <hpa@zytor.com> (arch/x86)
Cc: Arnd Bergmann <arnd@arndb.de> (include/asm-generic)
Cc: Rusty Russell <rusty@rustcorp.com.au> ("Hard To Misuse" list)
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-09 20:44:54 -07:00
Cyrill Gorcunov
1c250d709f perf, x86: P4 PMU -- update nmi irq statistics and unmask lvt entry properly
In case if last active performance counter is not overflowed at
moment of NMI being triggered by another counter, the irq
statistics may miss an update stage. As a more serious
consequence -- apic quirk may not be triggered so apic lvt entry
stay masked.

Tested-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <20100805150917.GA6311@lenovo>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-08 22:53:50 +02:00
Huang Ying
ad4ecef2f1 ACPI, APEI, Rename CPER and GHES severity constants
The abbreviation of severity should be SEV instead of SER, so the CPER
severity constants are renamed accordingly. GHES severity constants
are renamed in the same way too.

Signed-off-by: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2010-08-08 14:55:26 -04:00
Linus Torvalds
3b7433b8a8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (55 commits)
  workqueue: mark init_workqueues() as early_initcall()
  workqueue: explain for_each_*cwq_cpu() iterators
  fscache: fix build on !CONFIG_SYSCTL
  slow-work: kill it
  gfs2: use workqueue instead of slow-work
  drm: use workqueue instead of slow-work
  cifs: use workqueue instead of slow-work
  fscache: drop references to slow-work
  fscache: convert operation to use workqueue instead of slow-work
  fscache: convert object to use workqueue instead of slow-work
  workqueue: fix how cpu number is stored in work->data
  workqueue: fix mayday_mask handling on UP
  workqueue: fix build problem on !CONFIG_SMP
  workqueue: fix locking in retry path of maybe_create_worker()
  async: use workqueue for worker pool
  workqueue: remove WQ_SINGLE_CPU and use WQ_UNBOUND instead
  workqueue: implement unbound workqueue
  workqueue: prepare for WQ_UNBOUND implementation
  libata: take advantage of cmwq and remove concurrency limitations
  workqueue: fix worker management invocation without pending works
  ...

Fixed up conflicts in fs/cifs/* as per Tejun. Other trivial conflicts in
include/linux/workqueue.h, kernel/trace/Kconfig and kernel/workqueue.c
2010-08-07 12:42:58 -07:00
FUJITA Tomonori
7e005f7979 remove needless ISA_DMA_THRESHOLD
Architectures don't need to define ISA_DMA_THRESHOLD anymore.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: James Bottomley <James.Bottomley@suse.de>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:15:50 +02:00
Linus Torvalds
4a386c3e17 Merge branch 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-xsave-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, xsave: Make xstate_enable_boot_cpu() __init, protect on CPU 0
  x86, xsave: Add __init attribute to setup_xstate_features()
  x86, xsave: Make init_xstate_buf static
  x86, xsave: Check cpuid level for XSTATE_CPUID (0x0d)
  x86, xsave: Introduce xstate enable functions
  x86, xsave: Separate fpu and xsave initialization
  x86, xsave: Move boot cpu initialization to xsave_init()
  x86, xsave: 32/64 bit boot cpu check unification in initialization
  x86, xsave: Do not include asm/i387.h in asm/xsave.h
  x86, xsave: Use xsaveopt in context-switch path when supported
  x86, xsave: Sync xsave memory layout with its header for user handling
  x86, xsave: Track the offset, size of state in the xsave layout
2010-08-06 16:25:13 -07:00
Linus Torvalds
e8779776af Merge branch 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mce-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mce: Use HW_ERR in MCE handler
  x86, mce: Add HW_ERR printk prefix for hardware error logging
  x86, mce: Fix MSR_IA32_MCI_CTL2 CMCI threshold setup
  x86, mce: Rename MSR_IA32_MCx_CTL2 value
2010-08-06 16:24:51 -07:00
Linus Torvalds
3cf8ad3394 Merge branch 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-olpc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, olpc: Constify an olpc_ofw() arg
  x86, olpc: Use pr_debug() for EC commands
  x86, olpc: Add comment about implicit optimization barrier
  x86, olpc: Add support for calling into OpenFirmware
2010-08-06 16:24:34 -07:00
Linus Torvalds
66cd55d2b9 Merge branch 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, alternatives: BUG on encountering an invalid CPU feature number
  x86, alternatives: Fix one more open-coded 8-bit alternative number
  x86, alternatives: Use 16-bit numbers for cpufeature index
2010-08-06 16:24:17 -07:00
Linus Torvalds
75cb5fdce2 Merge branches 'x86-cleanups-for-linus', 'x86-vmware-for-linus', 'x86-mtrr-for-linus', 'x86-apic-for-linus', 'x86-fpu-for-linus' and 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Clean up arch/x86/kernel/cpu/mtrr/cleanup.c: use ";" not "," to terminate statements

* 'x86-vmware-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, vmware: Preset lpj values when on VMware.

* 'x86-mtrr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mtrr: Use stop machine context to rendezvous all the cpu's

* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/apic/es7000_32: Remove unused variable

* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Avoid unnecessary __clear_user() and xrstor in signal handling

* 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, vdso: Unmap vdso pages
2010-08-06 16:22:59 -07:00
Linus Torvalds
b62ad9ab18 Merge branch 'timers-timekeeping-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-timekeeping-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  um: Fix read_persistent_clock fallout
  kgdb: Do not access xtime directly
  powerpc: Clean up obsolete code relating to decrementer and timebase
  powerpc: Rework VDSO gettimeofday to prevent time going backwards
  clocksource: Add __clocksource_updatefreq_hz/khz methods
  x86: Convert common clocksources to use clocksource_register_hz/khz
  timekeeping: Make xtime and wall_to_monotonic static
  hrtimer: Cleanup direct access to wall_to_monotonic
  um: Convert to use read_persistent_clock
  timkeeping: Fix update_vsyscall to provide wall_to_monotonic offset
  powerpc: Cleanup xtime usage
  powerpc: Simplify update_vsyscall
  time: Kill off CONFIG_GENERIC_TIME
  time: Implement timespec_add
  x86: Fix vtime/file timestamp inconsistencies

Trivial conflicts in Documentation/feature-removal-schedule.txt

Much less trivial conflicts in arch/powerpc/kernel/time.c resolved as
per Thomas' earlier merge commit 47916be4e2 ("Merge branch
'powerpc.cherry-picks' into timers/clocksource")
2010-08-06 13:18:29 -07:00
H. Peter Anvin
7645e43204 x86, kvm: Remove cast obsoleted by set_64bit() prototype cleanup
KVM ended up having to put a pretty ugly wrapper around set_64bit()
in order to get the type right.  Now set_64bit() takes the expected
u64 type, and this wrapper can be cleaned up.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Avi Kivity <avi@redhat.com>
LKML-Reference: <4C5C4E7A.8040603@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-06 13:07:19 -07:00
Linus Torvalds
1cfd2bda8c Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (30 commits)
  PCI: update for owner removal from struct device_attribute
  PCI: Fix warnings when CONFIG_DMI unset
  PCI: Do not run NVidia quirks related to MSI with MSI disabled
  x86/PCI: use for_each_pci_dev()
  PCI: use for_each_pci_dev()
  PCI: MSI: Restore read_msi_msg_desc(); add get_cached_msi_msg_desc()
  PCI: export SMBIOS provided firmware instance and label to sysfs
  PCI: Allow read/write access to sysfs I/O port resources
  x86/PCI: use host bridge _CRS info on ASRock ALiveSATA2-GLAN
  PCI: remove unused HAVE_ARCH_PCI_SET_DMA_MAX_SEGMENT_{SIZE|BOUNDARY}
  PCI: disable mmio during bar sizing
  PCI: MSI: Remove unsafe and unnecessary hardware access
  PCI: Default PCIe ASPM control to on and require !EMBEDDED to disable
  PCI: kernel oops on access to pci proc file while hot-removal
  PCI: pci-sysfs: remove casts from void*
  ACPI: Disable ASPM if the platform won't provide _OSC control for PCIe
  PCI hotplug: make sure child bridges are enabled at hotplug time
  PCI hotplug: shpchp: Removed check for hotplug of display devices
  PCI hotplug: pciehp: Fixed return value sign for pciehp_unconfigure_device
  PCI: Don't enable aspm before drivers have had a chance to veto it
  ...
2010-08-06 11:44:36 -07:00
Linus Torvalds
cc41f5cede Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-2.6: (524 commits)
  Staging: wlan-ng: Update prism2_set_tx_power() to use mBm
  Staging: ti-st: update TODO
  Staging: wlags49_h2: use common PCI_VENDOR/DEVICE_ID name format
  Staging: comedi : fix brace coding style issue in wwrap.c
  Staging: quatech_usb2: remove unused qt2_box_flush function
  Staging: slicoss: Remove net_device_stats from the driver's private
  staging: rtl8192su: check whether requests succeeded
  staging: panel: fix error path
  staging: otus: check kmalloc() return value
  staging: octeon: check request_irq() return value
  Staging: wlan-ng: remove typedef in p80211hdr.h
  Staging: wlan-ng: fix checkpatch issues in headers.
  Staging: wlan-ng: remove typedef in p80211ioctl.h
  Staging: wlan-ng: fix style issues in p80211conv.h
  Staging: wlan-ng: fix style issues for p80211hdr.h
  staging: vt6656: removed NTSTATUS definition
  staging: vt6656: simplified tests involving both multi/broad-casts
  Staging: vt6655: replace BOOL with in kernel bool
  Staging: vt6655: replace FALSE with in kernel false
  Staging: vt6655: replace TRUE with in kernel true
  ...
2010-08-06 11:41:17 -07:00
Linus Torvalds
90c8327cad Merge branches 'x86-rwsem-for-linus' and 'x86-gcc46-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-rwsem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, rwsem: Minor cleanups
  x86, rwsem: Stay on fast path when count > 0 in __up_write()

* 'x86-gcc46-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, gcc-4.6: Fix set but not read variables
  x86, gcc-4.6: Avoid unused by set variables in rdmsr
2010-08-06 10:22:40 -07:00
Linus Torvalds
a417fb99de Merge branch 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-setup-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, setup: move isdigit.h to ctype.h, header files on top.
  x86, setup: enable early console output from the decompressor
  x86, setup: reorganize the early console setup
  x86, setup: Allow global variables and functions in the decompressor
  x86, setup: Only set early_serial_base after port is initialized
  x86, setup: Make the setup code also accept console=uart8250
  x86, setup: Early-boot serial I/O support
2010-08-06 10:20:47 -07:00
Linus Torvalds
9faa1e5942 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Ioremap: fix wrong physical address handling in PAT code
  x86, tlb: Clean up and correct used type
  x86, iomap: Fix wrong page aligned size calculation in ioremapping code
  x86, mm: Create symbolic index into address_markers array
  x86, ioremap: Fix normal ram range check
  x86, ioremap: Fix incorrect physical address handling in PAE mode
  x86-64, mm: Initialize VDSO earlier on 64 bits
  x86, kmmio/mmiotrace: Fix double free of kmmio_fault_pages
2010-08-06 10:17:52 -07:00
Linus Torvalds
d9a73c0016 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  um, x86: Cast to (u64 *) inside set_64bit()
  x86-32, asm: Directly access per-cpu GDT
  x86-64, asm: Directly access per-cpu IST
  x86, asm: Merge cmpxchg_486_u64() and cmpxchg8b_emu()
  x86, asm: Move cmpxchg emulation code to arch/x86/lib
  x86, asm: Clean up and simplify <asm/cmpxchg.h>
  x86, asm: Clean up and simplify set_64bit()
  x86: Add memory modify constraints to xchg() and cmpxchg()
  x86-64: Simplify loading initial_gs
  x86: Use symbolic MSR names
  x86: Remove redundant K6 MSRs
2010-08-06 10:07:34 -07:00
Linus Torvalds
b304441c6f Merge branch 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-build-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, vdso: Don't quote $nm in the script for checking vdso references
  x86, vdso: Error out if the vdso contains external references
2010-08-06 10:06:47 -07:00
Linus Torvalds
f7ddc2b6cd Merge branch 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mrst-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mrst: make mrst_timer_options an enum
  x86, mrst: make mrst_identify_cpu() an inline returning enum
  x86, mrst: add more timer config options
  x86, mrst: add cpu type detection
  x86: detect scattered cpuid features earlier
2010-08-06 10:06:28 -07:00
Linus Torvalds
a5e11599da Merge branch 'x86-hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, hwmon: Package Level Thermal/Power: pkgtemp documentation
  x86, hwmon: Package Level Thermal/Power: power limit
  x86, hwmon: Package Level Thermal/Power: thermal throttling handler
  x86, hwmon: Package Level Thermal/Power: pkgtemp hwmon driver
2010-08-06 10:02:58 -07:00
Linus Torvalds
0f477dd085 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix keeping track of AMD C1E
  x86, cpu: Package Level Thermal Control, Power Limit Notification definitions
  x86, cpu: Export AMD errata definitions
  x86, cpu: Use AMD errata checking framework for erratum 383
  x86, cpu: Clean up AMD erratum 400 workaround
  x86, cpu: AMD errata checking framework
  x86, cpu: Split addon_cpuid_features.c
  x86, cpu: Clean up formatting in cpufeature.h, remove override
  x86, cpu: Enumerate xsaveopt
  x86, cpu: Add xsaveopt cpufeature
  x86, cpu: Make init_scattered_cpuid_features() consider cpuid subleaves
  x86, cpu: Support the features flags in new CPUID leaf 7
  x86, cpu: Add CPU flags for F16C and RDRND
  x86: Look for IA32_ENERGY_PERF_BIAS support
  x86, AMD: Extend support to future families
  x86, cacheinfo: Carve out L3 cache slot accessors
  x86, xsave: Cleanup return codes in check_for_xstate()
2010-08-06 10:02:36 -07:00
Linus Torvalds
4aed2fd8e3 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (162 commits)
  tracing/kprobes: unregister_trace_probe needs to be called under mutex
  perf: expose event__process function
  perf events: Fix mmap offset determination
  perf, powerpc: fsl_emb: Restore setting perf_sample_data.period
  perf, powerpc: Convert the FSL driver to use local64_t
  perf tools: Don't keep unreferenced maps when unmaps are detected
  perf session: Invalidate last_match when removing threads from rb_tree
  perf session: Free the ref_reloc_sym memory at the right place
  x86,mmiotrace: Add support for tracing STOS instruction
  perf, sched migration: Librarize task states and event headers helpers
  perf, sched migration: Librarize the GUI class
  perf, sched migration: Make the GUI class client agnostic
  perf, sched migration: Make it vertically scrollable
  perf, sched migration: Parameterize cpu height and spacing
  perf, sched migration: Fix key bindings
  perf, sched migration: Ignore unhandled task states
  perf, sched migration: Handle ignored migrate out events
  perf: New migration tool overview
  tracing: Drop cpparg() macro
  perf: Use tracepoint_synchronize_unregister() to flush any pending tracepoint call
  ...

Fix up trivial conflicts in Makefile and drivers/cpufreq/cpufreq.c
2010-08-06 09:30:52 -07:00
Linus Torvalds
3a3527b646 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  Revert "net: Make accesses to ->br_port safe for sparse RCU"
  mce: convert to rcu_dereference_index_check()
  net: Make accesses to ->br_port safe for sparse RCU
  vfs: add fs.h to define struct file
  lockdep: Add an in_workqueue_context() lockdep-based test function
  rcu: add __rcu API for later sparse checking
  rcu: add an rcu_dereference_index_check()
  tree/tiny rcu: Add debug RCU head objects
  mm: remove all rcu head initializations
  fs: remove all rcu head initializations, except on_stack initializations
  powerpc: remove all rcu head initializations
2010-08-06 09:23:07 -07:00
Linus Torvalds
cc77b4db00 Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/amd-iommu: Export cache-coherency capability
  iommu-api: Extension to check for interrupt remapping
  x86/amd-iommu: Use for_each_pci_dev()
2010-08-06 09:22:39 -07:00
Eric W. Biederman
5989cd6a1c x86, apic: Map the local apic when parsing the MP table.
This fixes a regression in 2.6.35 from 2.6.34, that is
present for select models of Intel cpus when people are
using an MP table.

The commit cf7500c0ea
"x86, ioapic: In mpparse use mp_register_ioapic" started
calling mp_register_ioapic from MP_ioapic_info.  An extremely
simple change that was obviously correct.  Unfortunately
mp_register_ioapic did just a little more than the previous
hand crafted code and so we gained this call path.

The problem call path is:
MP_ioapic_info()
  mp_register_ioapic()
   io_apic_unique_id()
     io_apic_get_unique_id()
       get_physical_broadcast()
         modern_apic()
           lapic_get_version()
             apic_read(APIC_LVR)

Which turned out to be a problem because the local apic
was not mapped, at that point, unlike the similar point
in the ACPI parsing code.

This problem is fixed by mapping the local apic when
parsing the mptable as soon as we reasonably can.

Looking at the number of places we setup the fixmap for
the local apic, I see some serious simplification opportunities.
For the moment except for not duplicating the setting up of the
fixmap in init_apic_mappings, I have not acted on them.

The regression from 2.6.34 is tracked in bug
https://bugzilla.kernel.org/show_bug.cgi?id=16173

Cc: <stable@kernel.org> 2.6.35
Reported-by: David Hill <hilld@binarystorm.net>
Reported-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com>
Tested-by: Tvrtko Ursulin <tvrtko.ursulin@sophos.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
LKML-Reference: <m1eiee86jg.fsf_-_@fess.ebiederm.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-05 16:26:42 -07:00
Linus Torvalds
89a6c8cb9e Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb:
  debug_core,kdb: fix crash when arch does not have single step
  kgdb,x86: use macro HBP_NUM to replace magic number 4
  kgdb,mips: remove unused kgdb_cpu_doing_single_step operations
  mm,kdb,kgdb: Add a debug reference for the kdb kmap usage
  KGDB: Remove set but unused newPC
  ftrace,kdb: Allow dumping a specific cpu's buffer with ftdump
  ftrace,kdb: Extend kdb to be able to dump the ftrace buffer
  kgdb,powerpc: Replace hardcoded offset by BREAK_INSTR_SIZE
  arm,kgdb: Add ability to trap into debugger on notify_die
  gdbstub: do not directly use dbg_reg_def[] in gdb_cmd_reg_set()
  gdbstub: Implement gdbserial 'p' and 'P' packets
  kgdb,arm: Individual register get/set for arm
  kgdb,mips: Individual register get/set for mips
  kgdb,x86: Individual register get/set for x86
  kgdb,kdb: individual register set and and get API
  gdbstub: Optimize kgdb's "thread:" response for the gdb serial protocol
  kgdb: remove custom hex_to_bin()implementation
2010-08-05 15:59:48 -07:00
Greg Kroah-Hartman
e9563355ac Staging: Merge staging-next into Linus's tree
Conflicts:
	drivers/staging/Kconfig
	drivers/staging/batman-adv/bat_sysfs.c
	drivers/staging/batman-adv/device.c
	drivers/staging/batman-adv/hard-interface.c
	drivers/staging/cx25821/cx25821-audups11.c

Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-08-05 14:18:03 -07:00
Linus Torvalds
db7a1535d2 Merge branch 'upstream/xen' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen
* 'upstream/xen' of git://git.kernel.org/pub/scm/linux/kernel/git/jeremy/xen: (23 commits)
  xen/panic: use xen_reboot and fix smp_send_stop
  Xen: register panic notifier to take crashes of xen guests on panic
  xen: support large numbers of CPUs with vcpu info placement
  xen: drop xen_sched_clock in favour of using plain wallclock time
  pvops: do not notify callers from register_xenstore_notifier
  Introduce CONFIG_XEN_PVHVM compile option
  blkfront: do not create a PV cdrom device if xen_hvm_guest
  support multiple .discard.* sections to avoid section type conflicts
  xen/pvhvm: fix build problem when !CONFIG_XEN
  xenfs: enable for HVM domains too
  x86: Call HVMOP_pagetable_dying on exit_mmap.
  x86: Unplug emulated disks and nics.
  x86: Use xen_vcpuop_clockevent, xen_clocksource and xen wallclock.
  implement O_NONBLOCK for /proc/xen/xenbus
  xen: Fix find_unbound_irq in presence of ioapic irqs.
  xen: Add suspend/resume support for PV on HVM guests.
  xen: Xen PCI platform device driver.
  x86/xen: event channels delivery on HVM.
  x86: early PV on HVM features initialization.
  xen: Add support for HVM hypercalls.
  ...
2010-08-05 13:45:50 -07:00
Dongdong Deng
df4939350b kgdb,x86: use macro HBP_NUM to replace magic number 4
Use the macros provided by the HW breakpoint API.

Signed-off-by: Dongdong Deng <dongdong.deng@windriver.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-08-05 09:22:25 -05:00
Andi Kleen
9264b278be KGDB: Remove set but unused newPC
Found by gcc 4.6's new warnings

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
2010-08-05 09:22:24 -05:00
Jason Wessel
12bfa3de63 kgdb,x86: Individual register get/set for x86
Implement the ability to individually get and set registers for kdb
and kgdb for x86.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Acked-by: H. Peter Anvin <hpa@zytor.com>
CC: Ingo Molnar <mingo@redhat.com>
CC: x86@kernel.org
2010-08-05 09:22:20 -05:00
Josh Hunt
a7c55cbee0 oprofile: add support for Intel processor model 30
Newer Intel processors identifying themselves as model 30 are not recognized by
oprofile.

<cpuinfo snippet>
model           : 30
model name      : Intel(R) Xeon(R) CPU           X3470  @ 2.93GHz
</cpuinfo snippet>

Running oprofile on these machines gives the following:
+ opcontrol --init
+ opcontrol --list-events
oprofile: available events for CPU type "Intel Architectural Perfmon"

See Intel 64 and IA-32 Architectures Software Developer's Manual
Volume 3B (Document 253669) Chapter 18 for architectural perfmon events
This is a limited set of fallback events because oprofile doesn't know your CPU
CPU_CLK_UNHALTED: (counter: all)
        Clock cycles when not halted (min count: 6000)
INST_RETIRED: (counter: all)
        number of instructions retired (min count: 6000)
LLC_MISSES: (counter: all)
        Last level cache demand requests from this core that missed the LLC
(min count: 6000)
        Unit masks (default 0x41)
        ----------
        0x41: No unit mask
LLC_REFS: (counter: all)
        Last level cache demand requests from this core (min count: 6000)
        Unit masks (default 0x4f)
        ----------
        0x4f: No unit mask
BR_MISS_PRED_RETIRED: (counter: all)
        number of mispredicted branches retired (precise) (min count: 500)
+ opcontrol --shutdown

Tested using oprofile 0.9.6.

Signed-off-by: Josh Hunt <johunt@akamai.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-08-05 12:17:38 +02:00
Ingo Molnar
61be7fdec2 Merge branch 'perf/nmi' into perf/core
Conflicts:
	kernel/Makefile

Merge reason: Add the now complete topic, fix the conflict.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-05 08:45:05 +02:00
Linus Torvalds
3cfc2c42c1 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (48 commits)
  Documentation: update broken web addresses.
  fix comment typo "choosed" -> "chosen"
  hostap:hostap_hw.c Fix typo in comment
  Fix spelling contorller -> controller in comments
  Kconfig.debug: FAIL_IO_TIMEOUT: typo Faul -> Fault
  fs/Kconfig: Fix typo Userpace -> Userspace
  Removing dead MACH_U300_BS26
  drivers/infiniband: Remove unnecessary casts of private_data
  fs/ocfs2: Remove unnecessary casts of private_data
  libfc: use ARRAY_SIZE
  scsi: bfa: use ARRAY_SIZE
  drm: i915: use ARRAY_SIZE
  drm: drm_edid: use ARRAY_SIZE
  synclink: use ARRAY_SIZE
  block: cciss: use ARRAY_SIZE
  comment typo fixes: charater => character
  fix comment typos concerning "challenge"
  arm: plat-spear: fix typo in kerneldoc
  reiserfs: typo comment fix
  update email address
  ...
2010-08-04 15:31:02 -07:00
Jeremy Fitzhardinge
ca50a5f390 Merge branch 'upstream/pvhvm' into upstream/xen
* upstream/pvhvm:
  Introduce CONFIG_XEN_PVHVM compile option
  blkfront: do not create a PV cdrom device if xen_hvm_guest
  support multiple .discard.* sections to avoid section type conflicts
  xen/pvhvm: fix build problem when !CONFIG_XEN
  xenfs: enable for HVM domains too
  x86: Call HVMOP_pagetable_dying on exit_mmap.
  x86: Unplug emulated disks and nics.
  x86: Use xen_vcpuop_clockevent, xen_clocksource and xen wallclock.
  xen: Fix find_unbound_irq in presence of ioapic irqs.
  xen: Add suspend/resume support for PV on HVM guests.
  xen: Xen PCI platform device driver.
  x86/xen: event channels delivery on HVM.
  x86: early PV on HVM features initialization.
  xen: Add support for HVM hypercalls.

Conflicts:
	arch/x86/xen/enlighten.c
	arch/x86/xen/time.c
2010-08-04 14:49:16 -07:00
Jeremy Fitzhardinge
a70ce4b606 Merge branch 'upstream/core' into upstream/xen
* upstream/core:
  xen/panic: use xen_reboot and fix smp_send_stop
  Xen: register panic notifier to take crashes of xen guests on panic
  xen: support large numbers of CPUs with vcpu info placement
  xen: drop xen_sched_clock in favour of using plain wallclock time
  pvops: do not notify callers from register_xenstore_notifier
  xen: make sure pages are really part of domain before freeing
  xen: release unused free memory
2010-08-04 14:49:05 -07:00
Ian Campbell
086748e52f xen/panic: use xen_reboot and fix smp_send_stop
Offline vcpu when using stop_self.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-08-04 14:47:31 -07:00
Donald Dutile
f09f6d194d Xen: register panic notifier to take crashes of xen guests on panic
Register a panic notifier so that when the guest crashes it can shut
down the domain and indicate it was a crash to the host.

Signed-off-by: Donald Dutile <ddutile@redhat.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-08-04 14:47:31 -07:00
Mukesh Rathor
c06ee78d73 xen: support large numbers of CPUs with vcpu info placement
When vcpu info placement is supported, we're not limited to MAX_VIRT_CPUS
vcpus.  However, if it isn't supported, then ignore any excess vcpus.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-08-04 14:47:30 -07:00
Jeremy Fitzhardinge
8a22b9996b xen: drop xen_sched_clock in favour of using plain wallclock time
xen_sched_clock only counts unstolen time.  In principle this should
be useful to the Linux scheduler so that it knows how much time a process
actually consumed.  But in practice this doesn't work very well as the
scheduler expects the sched_clock time to be synchronized between
cpus.  It also uses sched_clock to measure the time a task spends
sleeping, in which case "unstolen time" isn't meaningful.

So just use plain xen_clocksource_read to return wallclock nanoseconds
for sched_clock.

Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2010-08-04 14:47:29 -07:00
Linus Torvalds
6ba74014c1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1443 commits)
  phy/marvell: add 88ec048 support
  igb: Program MDICNFG register prior to PHY init
  e1000e: correct MAC-PHY interconnect register offset for 82579
  hso: Add new product ID
  can: Add driver for esd CAN-USB/2 device
  l2tp: fix export of header file for userspace
  can-raw: Fix skb_orphan_try handling
  Revert "net: remove zap_completion_queue"
  net: cleanup inclusion
  phy/marvell: add 88e1121 interface mode support
  u32: negative offset fix
  net: Fix a typo from "dev" to "ndev"
  igb: Use irq_synchronize per vector when using MSI-X
  ixgbevf: fix null pointer dereference due to filter being set for VLAN 0
  e1000e: Fix irq_synchronize in MSI-X case
  e1000e: register pm_qos request on hardware activation
  ip_fragment: fix subtracting PPPOE_SES_HLEN from mtu twice
  net: Add getsockopt support for TCP thin-streams
  cxgb4: update driver version
  cxgb4: add new PCI IDs
  ...

Manually fix up conflicts in:
 - drivers/net/e1000e/netdev.c: due to pm_qos registration
   infrastructure changes
 - drivers/net/phy/marvell.c: conflict between adding 88ec048 support
   and cleaning up the IDs
 - drivers/net/wireless/ipw2x00/ipw2100.c: trivial ipw2100_pm_qos_req
   conflict (registration change vs marking it static)
2010-08-04 11:47:58 -07:00
Linus Torvalds
d5fc1d5175 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp:
  amd64_edac: Minor formatting fix
  amd64_edac: Fix operator precendence error
  edac, mc: Improve scrub rate handling
  amd64_edac: Correct scrub rate setting
  amd64_edac: Fix DCT base address selector
  amd64_edac: Remove polling mechanism
  x86, mce: Notify about corrected events too
  amd64_edac: Remove unneeded defines
  edac: Remove EDAC_DEBUG_VERBOSE
  amd64_edac: Sanitize syndrome extraction
2010-08-04 11:23:59 -07:00
Linus Torvalds
8d91530c5f Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
  [CPUFREQ] Remove pointless printk from p4-clockmod.
  [CPUFREQ] Fix section mismatch for powernow_cpu_init in powernow-k7.c
  [CPUFREQ] Fix section mismatch for longhaul_cpu_init.
  [CPUFREQ] Fix section mismatch for longrun_cpu_init.
  [CPUFREQ] powernow-k8: Fix misleading variable naming
  [CPUFREQ] Convert pci_table entries to PCI_VDEVICE (if PCI_ANY_ID is used)
  [CPUFREQ] arch/x86/kernel/cpu/cpufreq: use for_each_pci_dev()
  [CPUFREQ] fix brace coding style issue.
  [CPUFREQ] x86 cpufreq: Make trace_power_frequency cpufreq driver independent
  [CPUFREQ] acpi-cpufreq: Fix CPU_ANY CPUFREQ_{PRE,POST}CHANGE notification
  [CPUFREQ] ondemand: don't synchronize sample rate unless multiple cpus present
  [CPUFREQ] unexport (un)lock_policy_rwsem* functions
  [CPUFREQ] ondemand: Refactor frequency increase code
  [CPUFREQ] powernow-k8: On load failure, remind the user to enable support in BIOS setup
  [CPUFREQ] powernow-k8: Limit Pstate transition latency check
  [CPUFREQ] Fix PCC driver error path
  [CPUFREQ] fix double freeing in error path of pcc-cpufreq
  [CPUFREQ] pcc driver should check for pcch method before calling _OSC
  [CPUFREQ] fix memory leak in cpufreq_add_dev
  [CPUFREQ] revert "[CPUFREQ] remove rwsem lock from CPUFREQ_GOV_STOP call (second call site)"

Manually fix up non-data merge conflict introduced by new calling
conventions for trace_power_start() in commit 6f4f2723d0 ("x86
cpufreq: Make trace_power_frequency cpufreq driver independent"), which
didn't update the intel_idle native hardware cpuidle driver.
2010-08-04 11:13:36 -07:00
Linus Torvalds
c145307a11 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mjg59/platform-drivers-x86: (88 commits)
  ips driver: make it less chatty
  intel_scu_ipc: fix size field for intel_scu_ipc_command
  intel_scu_ipc: return -EIO for error condition in busy_loop
  intel_scu_ipc: fix data packing of PMIC command on Moorestown
  Clean up command packing on MRST.
  zero the stack buffer before giving random garbage to the SCU
  Fix stack buffer size for IPC writev messages
  intel_scu_ipc: Use the new cpu identification function
  intel_scu_ipc: tidy up unused bits
  Remove indirect read write api support.
  intel_scu_ipc: Support Medfield processors
  intel_scu_ipc: detect CPU type automatically
  x86 plat: limit x86 platform driver menu to X86
  acpi ec_sys: Be more cautious about ec write access
  acpi ec: Fix possible double io port registration
  hp-wmi: acpi_drivers.h is already included through acpi.h two lines below
  hp-wmi: Fix mixing up of and/or directive
  dell-laptop: make dell_laptop_i8042_filter() static
  asus-laptop: fix asus_input_init error path
  msi-wmi: make needlessly global symbols static
  ...
2010-08-04 10:44:06 -07:00
Ingo Molnar
12a81c8df1 Merge branch 'core' of git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile into perf/core 2010-08-04 16:25:47 +02:00
Jiri Kosina
d790d4d583 Merge branch 'master' into for-next 2010-08-04 15:14:38 +02:00
Fenghua Yu
0199114c31 x86, hwmon: Package Level Thermal/Power: power limit
Power limit notification feature is published in Intel 64 and IA-32
Architectures SDMV Vol 3A 14.5.6 Power Limit Notification.

It is implemented first on Intel Sandy Bridge platform.

The patch handles notification interrupt. Interrupt handler dumps power limit
information in log_buf, logs the event in mce log, and increases the event
counters (core_power_limit and package_power_limit). Upper level applications
could use the data to detect system health or diagnose functionality/performance
issues.

In the future, the event could be handled in a more fancy way.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
LKML-Reference: <1280448826-12004-5-git-send-email-fenghua.yu@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-03 15:58:56 -07:00
Fenghua Yu
55d435a227 x86, hwmon: Package Level Thermal/Power: thermal throttling handler
Add package level thermal throttle interrupt support. The interrupt handler
increases package level thermal throttle count. It also logs the event in MCE
log.

The package level thermal throttle interrupt happens across threads in a
package. Each thread handles the interrupt individually. User level application
is supposed to retrieve correct event count and log based on package/thread
topology. This is the same situation for core level interrupt handler. In the
future, interrupt may be reported only per package or per core.

core_throttle_count and package_throttle_count are used for user interface.
Previously only throttle_count is used for core throttle count. If you think
new core_throttle_count name breaks user interface, I can change this part.

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
LKML-Reference: <1280448826-12004-4-git-send-email-fenghua.yu@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-03 15:58:56 -07:00
Fenghua Yu
cb84b19474 x86, hwmon: Package Level Thermal/Power: pkgtemp hwmon driver
This patch adds a hwmon driver for package level thermal control. The driver
dumps package level thermal information through sysfs interface so that upper
level application (e.g. lm_sensor) can retrive the information.

Instead of having the package level hwmon code in coretemp, I write a seperate
driver pkgtemp because:

First, package level thermal sensors include not only sensors for each core,
but also sensors for uncore, memory controller or other components in the
package. Logically it will be clear to have a seperate hwmon driver for package
level hwmon to monitor wider range of sensors in a package. Merging package
thermal driver into core thermal driver doesn't make sense and may mislead.

Secondly, merging the two drivers together may cause coding mess. It's easier
to include various package level sensors info if more sensor information is
implemented. Coretemp code needs to consider a lot of legacy machine cases.
Pkgtemp code only considers platform starting from Sandy Bridge.

On a 1Sx4Cx2T Sandy Bridge platform, lm-sensors dumps the pkgtemp and coretemp:

pkgtemp-isa-0000
Adapter: ISA adapter
physical id 0: +33.0°C  (high = +79.0°C, crit = +99.0°C)

coretemp-isa-0000
Adapter: ISA adapter
Core 0:      +32.0°C  (high = +79.0°C, crit = +99.0°C)

coretemp-isa-0001
Adapter: ISA adapter
Core 1:      +32.0°C  (high = +79.0°C, crit = +99.0°C)

coretemp-isa-0002
Adapter: ISA adapter
Core 2:      +32.0°C  (high = +79.0°C, crit = +99.0°C)

coretemp-isa-0003
Adapter: ISA adapter
Core 3:      +32.0°C  (high = +79.0°C, crit = +99.0°C)

[ hpa: folded v3 patch removing improper global variable "SHOW" ]

Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
LKML-Reference: <1280448826-12004-3-git-send-email-fenghua.yu@intel.com>
Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-03 15:58:07 -07:00
Dave Jones
9d1f44ee20 [CPUFREQ] Remove pointless printk from p4-clockmod.
The only machines this is triggering on should be supported by
acpi-cpufreq or acpi's internal throttling.

Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:30 -04:00
Holger Freyther
307069cf6c [CPUFREQ] Fix section mismatch for powernow_cpu_init in powernow-k7.c
Use __cpuinit instead of __init for the cpufreq_driver
init function like it is done in powernow-k8.c.

This is removing the warning generated when compiling with
the CONFIG_DEBUG_SECTION_MISMATCH=y option.

Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:06 -04:00
Holger Freyther
2530573e45 [CPUFREQ] Fix section mismatch for longhaul_cpu_init.
Use __cpuinit instead of __init for the cpufreq_driver
init function like it is done in powernow-k8.c. Use the
__cpuinitdata for data used by the routines marked as __cpuinit.

This is removing the warning generated when compiling with
the CONFIG_DEBUG_SECTION_MISMATCH=y option.

Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:06 -04:00
Holger Freyther
7e2d811220 [CPUFREQ] Fix section mismatch for longrun_cpu_init.
Use __cpuinit instead of __init for the cpufreq_driver
init function like it is done in powernow-k8.c.

This is removing the warning generated when compiling with
the CONFIG_DEBUG_SECTION_MISMATCH=y option.

Signed-off-by: Holger Hans Peter Freyther <holger@moiji-mobile.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:06 -04:00
Borislav Petkov
b30d3304c9 [CPUFREQ] powernow-k8: Fix misleading variable naming
rdmsr() takes the lower 32 bits as a second argument and the high 32 as
a third. Fix the names accordingly since they were swapped.

There should be no functionality change resulting from this patch.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:06 -04:00
Peter Huewe
55c789bb2b [CPUFREQ] Convert pci_table entries to PCI_VDEVICE (if PCI_ANY_ID is used)
This patch converts pci_table entries, where .subvendor=PCI_ANY_ID and
.subdevice=PCI_ANY_ID, .class=0 and .class_mask=0, to use the
PCI_VDEVICE macro, and thus improves readability.

Signed-off-by: Peter Huewe <peterhuewe@gmx.de>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:05 -04:00
Kulikov Vasiliy
ccc5638a20 [CPUFREQ] arch/x86/kernel/cpu/cpufreq: use for_each_pci_dev()
Use for_each_pci_dev() to simplify the code.

Signed-off-by: Kulikov Vasiliy <segooon@gmail.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:05 -04:00
Thomas Renninger
6f4f2723d0 [CPUFREQ] x86 cpufreq: Make trace_power_frequency cpufreq driver independent
and fix the broken case if a core's frequency depends on others.

trace_power_frequency was only implemented in a rather ungeneric way
in acpi-cpufreq driver's target() function only.
-> Move the call to trace_power_frequency to
   cpufreq.c:cpufreq_notify_transition() where CPUFREQ_POSTCHANGE
   notifier is triggered.
   This will support power frequency tracing by all cpufreq drivers

trace_power_frequency did not trace frequency changes correctly when
the userspace governor was used or when CPU cores' frequency depend
on each other.
-> Moving this into the CPUFREQ_POSTCHANGE notifier and pass the cpu
   which gets switched automatically fixes this.

Robert Schoene provided some important fixes on top of my initial
quick shot version which are integrated in this patch:
- Forgot some changes in power_end trace (TP_printk/variable names)
- Variable dummy in power_end must now be cpu_id
- Use static 64 bit variable instead of unsigned int for cpu_id

Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: davej@redhat.com
CC: arjan@infradead.org
CC: linux-kernel@vger.kernel.org
CC: robert.schoene@tu-dresden.de
Tested-by: robert.schoene@tu-dresden.de
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:05 -04:00
Thomas Renninger
6b72e3934b [CPUFREQ] acpi-cpufreq: Fix CPU_ANY CPUFREQ_{PRE,POST}CHANGE notification
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: venki@google.com
CC: davej@redhat.com
CC: arjan@infradead.org
CC: linux-kernel@vger.kernel.org
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:05 -04:00
Marti Raudsepp
298decfbc4 [CPUFREQ] powernow-k8: On load failure, remind the user to enable support in BIOS setup
On Wed, 2010-01-20 at 16:56 +0100, Thomas Renninger wrote:
> But most often this happens if people upgrade their CPU and do not
> update their BIOS.
> Or the vendor does not recognise the new CPU even if the BIOS got
> updated.

Maybe some of those people just didn't realize it was disabled in BIOS?
If you tell users that it's a firmware bug then they'll probably just
give up.

> The itself message might be an enhancment, IMO it's not worth a patch.

Why do you think so? I spent an hour on hunting down the BIOS upgrade,
only to find that it didn't improve anything. It was a day later that I
realized that it might be a BIOS option; and the option was literally
the _last_ option in the whole BIOS setup. :)

This message would have saved the day.

> But do not revert the FW_BUG part!

Sure, you have a point here.

How about this patch?
2010-08-03 13:47:04 -04:00
Borislav Petkov
c2f4a2c6e0 [CPUFREQ] powernow-k8: Limit Pstate transition latency check
The Pstate transition latency check was added for broken F10h BIOSen
which wrongly contain a value of 0 for transition and bus master
latency. Fam11h and later, however, (will) have similar transition
latency so extend that behavior for them too.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:03 -04:00
Matthew Garrett
6ebdf777ba [CPUFREQ] Fix PCC driver error path
The PCC cpufreq driver unmaps the mailbox address range if any CPUs fail to
initialise, but doesn't do anything to remove the registered CPUs from the
cpufreq core resulting in failures further down the line. We're better off
simply returning a failure - the cpufreq core will unregister us cleanly if
we end up with no successfully registered CPUs. Tidy up the failure path
and also add a sanity check to ensure that the firmware gives us a realistic
frequency - the core deals badly with that being set to 0.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:02 -04:00
Daniel J Blueman
0d9715d64f [CPUFREQ] fix double freeing in error path of pcc-cpufreq
Prevent double freeing on error path.

Signed-off-by: Daniel J Blueman <daniel.blueman@gmail.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:02 -04:00
Matthew Garrett
5d77b85458 [CPUFREQ] pcc driver should check for pcch method before calling _OSC
The pcc specification documents an _OSC method that's incompatible with the
one defined as part of the ACPI spec. This shouldn't be a problem as both
are supposed to be guarded with a UUID. Unfortunately approximately nobody
(including HP, who wrote this spec) properly check the UUID on entry to the
_OSC call. Right now this could result in surprising behaviour if the pcc
driver performs an _OSC call on a machine that doesn't implement the pcc
specification. Check whether the PCCH method exists first in order to reduce
this probability.

Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Signed-off-by: Dave Jones <davej@redhat.com>
2010-08-03 13:47:02 -04:00
Borislav Petkov
98a5ae2d99 x86, mce: Notify about corrected events too
Notify all parties registered on the mce decoder chain about logged
correctable MCEs.

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Acked-by: Doug Thompson <dougthompson@xmission.com>
Acked-by: Ingo Molnar <mingo@elte.hu>
2010-08-03 16:14:02 +02:00
Sreedhara DS
804f8681a9 Remove indirect read write api support.
The firmware of production devices does not support this interface so this
is dead code.

Signed-off-by: Sreedhara DS <sreedhara.ds@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
2010-08-03 09:50:30 -04:00
Feng Tang
35f2915c3b intel_scu_ipc: add definitions for vRTC related command
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Matthew Garrett <mjg@redhat.com>
2010-08-03 09:48:47 -04:00
H. Peter Anvin
6238b47b58 x86, setup: move isdigit.h to ctype.h, header files on top.
It is a subset of <ctype.h> functionality, so name it ctype.h.  Also,
reorganize header files so #include statements are clustered near the
top as they should be.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <4C5752F2.8030206@kernel.org>
2010-08-02 21:07:20 -07:00
Yinghai Lu
8fee13a48e x86, setup: enable early console output from the decompressor
This enables the decompressor output to be seen on the serial console.
Most of the code is shared with the regular boot code.

We could add printf to the decompressor if needed, but currently there
is no sufficiently compelling user.

-v2: define BOOT_BOOT_H to avoid include boot.h
-v3: early_serial_base need to be static in misc.c ?
-v4: create seperate string.c printf.c cmdline.c early_serial_console.c
     after hpa's patch that allow global variables in compressed/misc stage
-v5: remove printf.c related

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-08-02 20:32:20 -07:00
Alok Kataria
9f242dc10e x86, vmware: Preset lpj values when on VMware.
When running on VMware's platform, we have seen situations where
the AP's try to calibrate the lpj values and fail to get good calibration
runs becasue of timing issues. As a result delays don't work correctly
on all cpus.

The solutions is to set preset_lpj value based on the current tsc frequency
value. This is similar to what KVM does as well.

Signed-off-by: Alok N Kataria <akataria@vmware.com>
LKML-Reference: <1280790637.14933.29.camel@ank32.eng.vmware.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-02 17:16:30 -07:00
Yinghai Lu
f4ed2877b1 x86, setup: reorganize the early console setup
Separate early_serial_console from tty.c

This allows for reuse of
early_serial_console.c/string.c/printf.c/cmdline.c in boot/compressed/.

-v2: according to hpa, don't include string.c etc
-v3: compressed/misc.c must have early_serial_base as static, so move it back to tty.c
     for setup code

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4C568D2B.205@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-02 15:51:56 -07:00
H. Peter Anvin
22a57f5896 x86, setup: Allow global variables and functions in the decompressor
In order for global variables and functions to work in the
decompressor, we need to fix up the GOT in assembly code.

Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
LKML-Reference: <4C57382E.8050501@zytor.com>
2010-08-02 15:34:44 -07:00
Shaohua Li
be783a4721 x86, vdso: Unmap vdso pages
We mapped vdso pages but never unmapped them and the virtual address
is lost after exiting from the function, so unmap vdso pages here.

Signed-off-by: Shaohua Li <shaohua.li@intel.com>
LKML-Reference: <20100802004934.GA2505@sli10-desk.sh.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-08-02 15:11:59 -07:00
Konrad Rzeszutek Wilk
fe96eb404e x86: Detect whether we should use Xen SWIOTLB.
It is paramount that we call pci_xen_swiotlb_detect before
pci_swiotlb_detect as both implementations use the 'swiotlb'
and 'swiotlb_force' flags. The pci-xen_swiotlb_detect inhibits
the swiotlb_force and swiotlb flag so that the native SWIOTLB
implementation is not enabled when running under Xen.

[since v1 changed two Cc's to Acked-by]

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
    [http://lkml.org/lkml/2010/7/27/374]
Cc: Albert Herranz <albert_herranz@yahoo.es>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
    [conditional http://lkml.org/lkml/2010/8/2/324]
Cc: x86@kernel.org
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-08-02 15:18:33 -04:00
Michal Schmidt
e8c534ec06 x86: Fix keeping track of AMD C1E
Accomodate the original C1E-aware idle routine to the different times
during boot when the BIOS enables C1E. While at it, remove the synthetic
CPUID flag in favor of a single global setting which denotes C1E status
on the system.

[ hpa: changed c1e_enabled to be a bool; clarified cpu bit 3:21 comment ]

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
LKML-Reference: <20100727165335.GA11630@aftab>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
2010-08-02 08:45:56 -07:00
Ingo Molnar
3772b73472 Merge commit 'v2.6.35' into perf/core
Conflicts:
	tools/perf/Makefile
	tools/perf/util/hist.c

Merge reason: Resolve the conflicts and update to latest upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-08-02 08:31:54 +02:00
Avi Kivity
3444d7da18 KVM: VMX: Fix host GDT.LIMIT corruption
vmx does not restore GDT.LIMIT to the host value, instead it sets it to 64KB.
This means host userspace can learn a few bits of host memory.

Fix by reloading GDTR when we load other host state.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-08-02 08:10:18 +03:00