Commit graph

5442 commits

Author SHA1 Message Date
Jan Beulich
350f8f5631 x86: Eliminate redundant/contradicting cache line size config options
Rather than having X86_L1_CACHE_BYTES and X86_L1_CACHE_SHIFT
(with inconsistent defaults), just having the latter suffices as
the former can be easily calculated from it.

To be consistent, also change X86_INTERNODE_CACHE_BYTES to
X86_INTERNODE_CACHE_SHIFT, and set it to 7 (128 bytes) for NUMA
to account for last level cache line size (which here matters
more than L1 cache line size).

Finally, make sure the default value for X86_L1_CACHE_SHIFT,
when X86_GENERIC is selected, is being seen before that for the
individual CPU model options (other than on x86-64, where
GENERIC_CPU is part of the choice construct, X86_GENERIC is a
separate option on ix86).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Acked-by: Ravikiran Thirumalai <kiran@scalex86.org>
Acked-by: Nick Piggin <npiggin@suse.de>
LKML-Reference: <4AFD5710020000780001F8F0@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-19 04:58:34 +01:00
Yinghai Lu
508d85c2c6 x86: When cleaning MTRRs, do not fold WP into UC
The current MTRR code treats WP as a form of UC.  This really isn't
desirable behaviour, except possibly in the case of severe MTRR
shortage.  Disable this, to allow legitimate uses of WP to remain
unmolested.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2009-11-17 10:26:53 -08:00
Kees Cook
4b0f3b81eb x86, mm: Report state of NX protections during boot
It is possible for x86_64 systems to lack the NX bit either due to the
hardware lacking support or the BIOS having turned off the CPU capability,
so NX status should be reported.  Additionally, anyone booting NX-capable
CPUs in 32bit mode without PAE will lack NX functionality, so this change
provides feedback for that case as well.

Signed-off-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1258154897-6770-6-git-send-email-hpa@zytor.com>
2009-11-16 13:44:59 -08:00
H. Peter Anvin
4763ed4d45 x86, mm: Clean up and simplify NX enablement
The 32- and 64-bit code used very different mechanisms for enabling
NX, but even the 32-bit code was enabling NX in head_32.S if it is
available.  Furthermore, we had a bewildering collection of tests for
the available of NX.

This patch:

a) merges the 32-bit set_nx() and the 64-bit check_efer() function
   into a single x86_configure_nx() function.  EFER control is left
   to the head code.

b) eliminates the nx_enabled variable entirely.  Things that need to
   test for NX enablement can verify __supported_pte_mask directly,
   and cpu_has_nx gives the supported status of NX.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Vegard Nossum <vegardno@ifi.uio.no>
Cc: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
Cc: Chris Wright <chrisw@sous-sol.org>
LKML-Reference: <1258154897-6770-5-git-send-email-hpa@zytor.com>
Acked-by: Kees Cook <kees.cook@canonical.com>
2009-11-16 13:44:59 -08:00
H. Peter Anvin
583140afb9 x86, pageattr: Make set_memory_(x|nx) aware of NX support
Make set_memory_x/set_memory_nx directly aware of if NX is supported
in the system or not, rather than requiring that every caller assesses
that support independently.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Tim Starling <tstarling@wikimedia.org>
Cc: Hannes Eder <hannes@hanneseder.net>
LKML-Reference: <1258154897-6770-4-git-send-email-hpa@zytor.com>
Acked-by: Kees Cook <kees.cook@canonical.com>
2009-11-16 13:44:58 -08:00
H. Peter Anvin
a7c4c0d934 x86, sleep: Always save the value of EFER
Always save the value of EFER, regardless of the state of NX.  Since
EFER may not actually exist, use rdmsr_safe() to do so.

v2: check the return value from rdmsr_safe() instead of relying on
    the output values being unchanged on error.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Nigel Cunningham <nigel@tuxonice.net>
LKML-Reference: <1258154897-6770-3-git-send-email-hpa@zytor.com>
Acked-by: Kees Cook <kees.cook@canonical.com>
2009-11-16 13:44:57 -08:00
H. Peter Anvin
8a50e5135a x86-32: Use symbolic constants, safer CPUID when enabling EFER.NX
Use symbolic constants rather than hard-coded values when setting
EFER.NX in head_32.S, and do a more rigorous test for the validity of
the response when probing for the extended CPUID range.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <1258154897-6770-2-git-send-email-hpa@zytor.com>
Acked-by: Kees Cook <kees.cook@canonical.com>
2009-11-16 13:44:56 -08:00
Yinghai Lu
196cf0d67a x86: Make sure wakeup trampoline code is below 1MB
Instead of using bootmem, try find_e820_area()/reserve_early(),
and call acpi_reserve_memory() early, to allocate the wakeup
trampoline code area below 1M.

This is more reliable, and it also removes a dependency on
bootmem.

-v2: change function name to acpi_reserve_wakeup_memory(),
     as suggested by Rafael.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: H. Peter Anvin <hpa@zytor.com>
Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: pm list <linux-pm@lists.linux-foundation.org>
Cc: Len Brown <lenb@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4AFA210B.3020207@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-11 20:14:32 +01:00
Suresh Siddha
55ca3cc174 x86_64, ftrace: Make ftrace use kernel identity mapping to modify code
On x86_64, kernel text mappings are mapped read-only with
CONFIG_DEBUG_RODATA. So use the kernel identity mapping instead
of the kernel text mapping to modify the kernel text.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20091029024821.080941108@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-02 17:16:36 +01:00
Suresh Siddha
d6cc1c3af7 x86-64: add comment for RODATA large page retainment
Add a comment explaining why RODATA is aligned to 2 MB.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-20 14:46:01 +09:00
Suresh Siddha
74e081797b x86-64: align RODATA kernel section to 2MB with CONFIG_DEBUG_RODATA
CONFIG_DEBUG_RODATA chops the large pages spanning boundaries of kernel
text/rodata/data to small 4KB pages as they are mapped with different
attributes (text as RO, RODATA as RO and NX etc).

On x86_64, preserve the large page mappings for kernel text/rodata/data
boundaries when CONFIG_DEBUG_RODATA is enabled. This is done by allowing the
RODATA section to be hugepage aligned and having same RWX attributes
for the 2MB page boundaries

Extra Memory pages padding the sections will be freed during the end of the boot
and the kernel identity mappings will have different RWX permissions compared to
the kernel text mappings.

Kernel identity mappings to these physical pages will be mapped with smaller
pages but large page mappings are still retained for kernel text,rodata,data
mappings.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20091014220254.190119924@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-20 14:46:00 +09:00
Suresh Siddha
b9af7c0d44 x86-64: preserve large page mapping for 1st 2MB kernel txt with CONFIG_DEBUG_RODATA
In the first 2MB, kernel text is co-located with kernel static
page tables setup by head_64.S.  CONFIG_DEBUG_RODATA chops this
2MB large page mapping to small 4KB pages as we mark the kernel text as RO,
leaving the static page tables as RW.

With CONFIG_DEBUG_RODATA disabled, OLTP run on NHM-EP shows 1% improvement
with 2% reduction in system time and 1% improvement in iowait idle time.

To recover this, move the kernel static page tables to .data section, so that
we don't have to break the first 2MB of kernel text to small pages with
CONFIG_DEBUG_RODATA.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20091014220254.063193621@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-10-20 14:46:00 +09:00
David Rientjes
8716273cae x86: Export srat physical topology
This is the counterpart to "x86: export k8 physical topology" for
SRAT. It is not as invasive because the acpi code already seperates
node setup into detection and registration steps, with the
exception of registering e820 active regions in
acpi_numa_memory_affinity_init().  This is now moved to
acpi_scan_nodes() if NUMA emulation is disabled or deferred.

acpi_numa_init() now returns a value which specifies whether an
underlying SRAT was located.  If so, that topology can be used by
the emulation code to interleave emulated nodes over physical nodes
or to register the nodes for ACPI.

acpi_get_nodes() may now be used to export the srat physical
topology of the machine for NUMA emulation.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251518580.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 22:56:46 +02:00
David Rientjes
8ee2debce3 x86: Export k8 physical topology
To eventually interleave emulated nodes over physical nodes, we
need to know the physical topology of the machine without actually
registering it.  This does the k8 node setup in two parts:
detection and registration.  NUMA emulation can then used the
physical topology detected to setup the address ranges of emulated
nodes accordingly.  If emulation isn't used, the k8 nodes are
registered as normal.

Two formals are added to the x86 NUMA setup functions: `acpi' and
`k8'. These represent whether ACPI or K8 NUMA has been detected;
both cannot be true at the same time.  This specifies to the NUMA
emulation code whether an underlying physical NUMA topology exists
and which interface to use.

This patch deals solely with separating the k8 setup path into
Northbridge detection and registration steps and leaves the ACPI
changes for a subsequent patch.  The `acpi' formal is added here,
however, to avoid touching all the header files again in the next
patch.

This approach also ensures emulated nodes will not span physical
nodes so the true memory latency is not misrepresented.

k8_get_nodes() may now be used to export the k8 physical topology
of the machine for NUMA emulation.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Ankita Garg <ankita@in.ibm.com>
Cc: Len Brown <len.brown@intel.com>
LKML-Reference: <alpine.DEB.1.00.0909251518400.14754@chino.kir.corp.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-12 22:56:45 +02:00
Yinghai Lu
15b812f1d0 pci: increase alignment to make more space for hidden code
As reported in

	http://bugzilla.kernel.org/show_bug.cgi?id=13940

on some system when acpi are enabled, acpi clears some BAR for some
devices without reason, and kernel will need to allocate devices for
them.  It then apparently hits some undocumented resource conflict,
resulting in non-working devices.

Try to increase alignment to get more safe range for unassigned devices.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-10-11 14:43:36 -07:00
Alexey Dobriyan
d43c36dc6b headers: remove sched.h from interrupt.h
After m68k's task_thread_info() doesn't refer to current,
it's possible to remove sched.h from interrupt.h and not break m68k!
Many thanks to Heiko Carstens for allowing this.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
2009-10-11 11:20:58 -07:00
Linus Torvalds
624235c5b3 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, pci: Correct spelling in a comment
  x86: Simplify bound checks in the MTRR code
  x86: EDAC: carve out AMD MCE decoding logic
  initcalls: Add early_initcall() for modules
  x86: EDAC: MCE: Fix MCE decoding callback logic
2009-10-08 12:06:36 -07:00
Arjan van de Ven
9bcbdd9c58 x86, timers: Check for pending timers after (device) interrupts
Now that range timers and deferred timers are common, I found a
problem with these using the "perf timechart" tool. Frans Pop also
reported high scheduler latencies via LatencyTop, when using
iwlagn.

It turns out that on x86, these two 'opportunistic' timers only get
checked when another "real" timer happens. These opportunistic
timers have the objective to save power by hitchhiking on other
wakeups, as to avoid CPU wakeups by themselves as much as possible.

The change in this patch runs this check not only at timer
interrupts, but at all (device) interrupts. The effect is that:

 1) the deferred timers/range timers get delayed less

 2) the range timers cause less wakeups by themselves because
    the percentage of hitchhiking on existing wakeup events goes up.

I've verified the working of the patch using "perf timechart", the
original exposed bug is gone with this patch. Frans also reported
success - the latencies are now down in the expected ~10 msec
range.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Tested-by: Frans Pop <elendil@planet.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Mike Galbraith <efault@gmx.de>
LKML-Reference: <20091008064041.67219b13@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-08 17:27:27 +02:00
Marin Mitov
e3be785fb5 x86, pci: Correct spelling in a comment
Signed-off-by: Marin Mitov <mitov@issp.bas.bg>
Cc: Joerg Roedel <joerg.roedel@amd.com>
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
LKML-Reference: <200910032045.02523.mitov@issp.bas.bg>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
======================================================
2009-10-03 20:35:16 +02:00
Arjan van de Ven
11879ba5d9 x86: Simplify bound checks in the MTRR code
The current bound checks for copy_from_user in the MTRR driver are
not as obvious as they could be, and gcc agrees with that.

This patch simplifies the boundary checks to the point that gcc can
now prove to itself that the copy_from_user() is never going past
its bounds.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <20090926205150.30797709@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-02 19:51:56 +02:00
Ingo Molnar
f436f8bb73 x86: EDAC: MCE: Fix MCE decoding callback logic
Make decoding of MCEs happen only on AMD hardware by registering a
non-default callback only on CPU families which support it.

While looking at the interaction of decode_mce() with the other MCE
code i also noticed a few other things and made the following
cleanups/fixes:

 - Fixed the mce_decode() weak alias - a weak alias is really not
   good here, it should be a proper callback. A weak alias will be
   overriden if a piece of code is built into the kernel - not
   good, obviously.

 - The patch initializes the callback on AMD family 10h and 11h.

 - Added the more correct fallback printk of:

	No support for human readable MCE decoding on this CPU type.
	Transcribe the message and run it through 'mcelog --ascii' to decode.

   On CPUs that dont have a decoder.

 - Made the surrounding code more readable.

Note that the callback allows us to have a default fallback -
without having to check the CPU versions during the printout
itself. When an EDAC module registers itself, it can install the
decode-print function.

(there's no unregister needed as this is core code.)

version -v2 by Borislav Petkov:

 - add K8 to the set of supported CPUs

 - always build in edac_mce_amd since we use an early_initcall now

 - fix checkpatch warnings

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
LKML-Reference: <20091001141432.GA11410@aftab>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-02 15:42:18 +02:00
Jason Wessel
ea3acb199a x86: earlyprintk: Fix regression to handle serial,ttySn as 1 arg
Commit c953094 ("early_printk: Allow more than one early console")
introduced a regression in the parsing of the earlyprintk= kernel
arguments.

If you specify "earlyprintk=serial,ttyS0,115200" as a kernel
argument, the "serial,ttyS" should be parsed as a single argument
and not as "serial" and then "ttyS".

Also update the documentation to reflect you can specify the ttyS
directly without the "serial" argument.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <gregkh@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
LKML-Reference: <4ABB7D5E.6000301@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-01 10:34:16 +02:00
Eric Dumazet
04edbdef02 x86: Don't generate cmpxchg8b_emu if CONFIG_X86_CMPXCHG64=y
Conditionaly compile cmpxchg8b_emu.o and EXPORT_SYMBOL(cmpxchg8b_emu).

This reduces the kernel size a bit.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Arjan van de Ven <arjan@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4AC43E7E.1000600@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-01 08:42:24 +02:00
Linus Torvalds
84d88d5d4e Merge branch 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  sched_clock: Fix atomicity/continuity bug by using cmpxchg64()
  x86: Provide an alternative() based cmpxchg64()
2009-09-30 15:10:40 -07:00
Arjan van de Ven
79e1dd05d1 x86: Provide an alternative() based cmpxchg64()
cmpxchg64() today generates, to quote Linus, "barf bag" code.

cmpxchg64() is about to get used in the scheduler to fix a bug there,
but it's a prerequisite that cmpxchg64() first be made non-sucking.

This patch turns cmpxchg64() into an efficient implementation that
uses the alternative() mechanism to just use the raw instruction on
all modern systems.

Note: the fallback is NOT smp safe, just like the current fallback
is not SMP safe. (Interested parties with i486 based SMP systems
are welcome to submit fix patches for that.)

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
[ fixed asm constraint bug ]
Fixed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20090930170754.0886ff2e@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-30 22:55:59 +02:00
Linus Torvalds
e207e143e2 Revert "x86, mce: do not compile mcelog message on AMD"
This reverts commit 22223c9b41, as
requested by Andi Kleen:

  "Obviously kernels compiled with AMD support can still run on non AMD
   systems, so messages like this can never be removed at compile time."

Requsted-by: Andi Kleen <andi@firstfloor.org>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-30 07:48:37 -07:00
Zhao Yakui
3e2ada5867 ACPI: fix Compaq Evo N800c (Pentium 4m) boot hang regression
Don't disable ARB_DISABLE when the familary ID is 0x0F.

http://bugzilla.kernel.org/show_bug.cgi?id=14211

This was a 2.6.31 regression, and so this patch
needs to be applied to 2.6.31.stable

Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-09-27 03:43:42 -04:00
Linus Torvalds
49e70dda35 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf tools: Dont use openat()
  perf tools: Fix buffer allocation
  perf tools: .gitignore += perf*.html
  perf tools: Handle relative paths while loading module symbols
  perf tools: Fix module symbol loading bug
  perf_event, x86: Fix 'perf sched record' crashing the machine
  perf_event: Update PERF_EVENT_FORK header definition
  perf stat: Fix zero total printouts
2009-09-26 10:15:33 -07:00
Linus Torvalds
5bb241b325 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Remove redundant non-NUMA topology functions
  x86: early_printk: Protect against using the same device twice
  x86: Reduce verbosity of "PAT enabled" kernel message
  x86: Reduce verbosity of "TSC is reliable" message
  x86: mce: Use safer ways to access MCE registers
  x86: mce, inject: Use real inject-msg in raise_local
  x86: mce: Fix thermal throttling message storm
  x86: mce: Clean up thermal throttling state tracking code
  x86: split NX setup into separate file to limit unstack-protected code
  xen: check EFER for NX before setting up GDT mapping
  x86: Cleanup linker script using new linker script macros.
  x86: Use section .data.page_aligned for the idt_table.
  x86: convert to use __HEAD and HEAD_TEXT macros.
  x86: convert compressed loader to use __HEAD and HEAD_TEXT macros.
  x86: fix fragile computation of vsyscall address
2009-09-26 10:13:35 -07:00
Ingo Molnar
704daf55c7 Merge branch 'x86/asm' into x86/urgent
Merge reason: The linker script cleanups are ready for upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-25 10:47:00 +02:00
Alexey Dobriyan
8d65af789f sysctl: remove "struct file *" argument of ->proc_handler
It's unused.

It isn't needed -- read or write flag is already passed and sysctl
shouldn't care about the rest.

It _was_ used in two places at arch/frv for some reason.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: David Howells <dhowells@redhat.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-24 07:21:04 -07:00
Jason Wessel
429a6e5e2c x86: early_printk: Protect against using the same device twice
If you use the kernel argument:

  earlyprintk=serial,ttyS0,115200

This will cause a recursive hang printing the same line
again and again:

 BIOS-e820: 000000003fff3000 - 0000000040000000 (ACPI data)
 BIOS-e820: 00000000e0000000 - 00000000f0000000 (reserved)
 BIOS-e820: 00000000fec00000 - 0000000100000000 (reserved)
bootconsole [earlyser0] enabled
Linux version 2.6.31-07863-gb64ada6 (mingo@sirius) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #16789 SMP Wed Sep 23 21:09:43 CEST 2009
Linux version 2.6.31-07863-gb64ada6 (mingo@sirius) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #16789 SMP Wed Sep 23 21:09:43 CEST 2009
Linux version 2.6.31-07863-gb64ada6 (mingo@sirius) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #16789 SMP Wed Sep 23 21:09:43 CEST 2009
Linux version 2.6.31-07863-gb64ada6 (mingo@sirius) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #16789 SMP Wed Sep 23 21:09:43 CEST 2009
Linux version 2.6.31-07863-gb64ada6 (mingo@sirius) (gcc version 4.3.2 20081105 (Red Hat 4.3.2-7) (GCC) ) #16789 SMP Wed Sep 23 21:09:43 CEST 2009

Instead warn the end user that they specified the device
a second time, and ignore that second console.

Reported-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Greg KH <gregkh@suse.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <4ABAAB89.1080407@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24 13:01:13 +02:00
Ingo Molnar
d2ff6de537 Merge branch 'linus' into x86/urgent
Merge reason: Queueing up dependent early-printk fix.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24 12:59:18 +02:00
Roland Dreier
ea01c0d731 x86: Reduce verbosity of "TSC is reliable" message
On modern systems, the kernel prints the message

    Skipping synchronization checks as TSC is reliable.

once for every non-boot CPU.

This gets kind of ridiculous on huge systems; for example, on a
64-thread system I was lucky enough to get:

    $ dmesg | grep 'TSC is reliable' | wc
         63     567    4221

There's no point to doing this for every CPU, since the code is
just checking the boot CPU anyway, so change this to a
printk_once() to make the message appears only once.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
LKML-Reference: <adazl8l2swc.fsf@cisco.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24 11:35:19 +02:00
Linus Torvalds
94a8d5caba Merge git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (39 commits)
  cpumask: Move deprecated functions to end of header.
  cpumask: remove unused deprecated functions, avoid accusations of insanity
  cpumask: use new-style cpumask ops in mm/quicklist.
  cpumask: use mm_cpumask() wrapper: x86
  cpumask: use mm_cpumask() wrapper: um
  cpumask: use mm_cpumask() wrapper: mips
  cpumask: use mm_cpumask() wrapper: mn10300
  cpumask: use mm_cpumask() wrapper: m32r
  cpumask: use mm_cpumask() wrapper: arm
  cpumask: Use accessors for cpu_*_mask: um
  cpumask: Use accessors for cpu_*_mask: powerpc
  cpumask: Use accessors for cpu_*_mask: mips
  cpumask: Use accessors for cpu_*_mask: m32r
  cpumask: remove arch_send_call_function_ipi
  cpumask: arch_send_call_function_ipi_mask: s390
  cpumask: arch_send_call_function_ipi_mask: powerpc
  cpumask: arch_send_call_function_ipi_mask: mips
  cpumask: arch_send_call_function_ipi_mask: m32r
  cpumask: arch_send_call_function_ipi_mask: alpha
  cpumask: remove obsolete topology_core_siblings and topology_thread_siblings: ia64
  ...
2009-09-23 18:14:11 -07:00
Alexey Dobriyan
2bcd57ab61 headers: utsname.h redux
* remove asm/atomic.h inclusion from linux/utsname.h --
   not needed after kref conversion
 * remove linux/utsname.h inclusion from files which do not need it

NOTE: it looks like fs/binfmt_elf.c do not need utsname.h, however
due to some personality stuff it _is_ needed -- cowardly leave ELF-related
headers and files alone.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-09-23 18:13:10 -07:00
Rusty Russell
78f1c4d6b0 cpumask: use mm_cpumask() wrapper: x86
Makes code futureproof against the impending change to mm->cpu_vm_mask (to be a pointer).

It's also a chance to use the new cpumask_ ops which take a pointer
(the older ones are deprecated, but there's no hurry for arch code).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-24 09:34:52 +09:30
Rusty Russell
1d1afc1957 cpumask: remove last assignment to mask field of struct irqaction.
This snuck in after the patch which removed all the others.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Ingo Molnar <mingo@elte.hu>
2009-09-24 09:34:37 +09:30
Li Zefan
79f5599772 cpumask: use zalloc_cpumask_var() where possible
Remove open-coded zalloc_cpumask_var() and zalloc_cpumask_var_node().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-24 09:34:24 +09:30
Linus Torvalds
c37efa9325 Merge git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next
* git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-next: (30 commits)
  Use macros for .data.page_aligned section.
  Use macros for .bss.page_aligned section.
  Use new __init_task_data macro in arch init_task.c files.
  kbuild: Don't define ALIGN and ENTRY when preprocessing linker scripts.
  arm, cris, mips, sparc, powerpc, um, xtensa: fix build with bash 4.0
  kbuild: add static to prototypes
  kbuild: fail build if recordmcount.pl fails
  kbuild: set -fconserve-stack option for gcc 4.5
  kbuild: echo the record_mcount command
  gconfig: disable "typeahead find" search in treeviews
  kbuild: fix cc1 options check to ensure we do not use -fPIC when compiling
  checkincludes.pl: add option to remove duplicates in place
  markup_oops: use modinfo to avoid confusion with underscored module names
  checkincludes.pl: provide usage helper
  checkincludes.pl: close file as soon as we're done with it
  ctags: usability fix
  kernel hacking: move STRIP_ASM_SYMS from General
  gitignore usr/initramfs_data.cpio.bz2 and usr/initramfs_data.cpio.lzma
  kbuild: Check if linker supports the -X option
  kbuild: introduce ld-option
  ...

Fix trivial conflict in scripts/basic/fixdep.c
2009-09-23 15:37:02 -07:00
Linus Torvalds
d19110baaf Merge branch 'x86/ptrace-syscall-exit' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland
* 'x86/ptrace-syscall-exit' of git://git.kernel.org/pub/scm/linux/kernel/git/frob/linux-2.6-roland:
  x86: ptrace: sysret path should reach syscall_trace_leave
2009-09-23 10:11:26 -07:00
Linus Torvalds
b09a75fc5e Merge git://git.infradead.org/iommu-2.6
* git://git.infradead.org/iommu-2.6: (23 commits)
  intel-iommu: Disable PMRs after we enable translation, not before
  intel-iommu: Kill DMAR_BROKEN_GFX_WA option.
  intel-iommu: Fix integer wrap on 32 bit kernels
  intel-iommu: Fix integer overflow in dma_pte_{clear_range,free_pagetable}()
  intel-iommu: Limit DOMAIN_MAX_PFN to fit in an 'unsigned long'
  intel-iommu: Fix kernel hang if interrupt remapping disabled in BIOS
  intel-iommu: Disallow interrupt remapping if not all ioapics covered
  intel-iommu: include linux/dmi.h to use dmi_ routines
  pci/dmar: correct off-by-one error in dmar_fault()
  intel-iommu: Cope with yet another BIOS screwup causing crashes
  intel-iommu: iommu init error path bug fixes
  intel-iommu: Mark functions with __init
  USB: Work around BIOS bugs by quiescing USB controllers earlier
  ia64: IOMMU passthrough mode shouldn't trigger swiotlb init
  intel-iommu: make domain_add_dev_info() call domain_context_mapping()
  intel-iommu: Unify hardware and software passthrough support
  intel-iommu: Cope with broken HP DC7900 BIOS
  iommu=pt is a valid early param
  intel-iommu: double kfree()
  intel-iommu: Kill pointless intel_unmap_single() function
  ...

Fixed up trivial include lines conflict in drivers/pci/intel-iommu.c
2009-09-23 10:06:10 -07:00
Linus Torvalds
746942d06a Merge branch 'sfi-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6
* 'sfi-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6:
  SFI: remove unneeded includes
  sfi: Remove unused code
  SFI: Hook PCI MMCONFIG
  x86: add arch-specific SFI support
  SFI: add capability to parse ACPI tables
  SFI: add platform-independent core support
  SFI: create linux/sfi.h
  SFI: Simple Firmware Interface - MAINTAINERS, Kconfig
2009-09-23 09:34:07 -07:00
Linus Torvalds
be90a49ca2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6: (142 commits)
  USB: Fix sysfs paths in documentation
  USB: skeleton: fix coding style issues.
  USB: O_NONBLOCK in read path of skeleton
  USB: make usb-skeleton honor O_NONBLOCK in write path
  USB: skel_read really sucks royally
  USB: Add hub descriptor update hook for xHCI
  USB: xhci: Support USB hubs.
  USB: xhci: Set multi-TT field for LS/FS devices under hubs.
  USB: xhci: Set route string for all devices.
  USB: xhci: Fix command wait list handling.
  USB: xhci: Change how xHCI commands are handled.
  USB: xhci: Refactor input device context setup.
  USB: xhci: Endpoint representation refactoring.
  USB: gadget: ether needs to select CRC32
  USB: fix USBTMC get_capabilities success handling
  USB: fix missing error check in probing
  USB: usbfs: add USBDEVFS_URB_BULK_CONTINUATION flag
  USB: support for autosuspend in sierra while online
  USB: ehci-dbgp,ehci: Allow dbpg to work with suspend/resume
  USB: ehci-dbgp,documentation: Documentation updates for ehci-dbgp
  ...
2009-09-23 09:25:16 -07:00
Ingo Molnar
11868a2dc4 x86: mce: Use safer ways to access MCE registers
Use rdmsrl_safe() when accessing MCE registers. While in
theory we always 'know' which ones are safe to access from
the capability bits, there's a lot of hardware variations
and reality might differ from theory, as it did in this case:

   http://bugzilla.kernel.org/show_bug.cgi?id=14204

[    0.010016] mce: CPU supports 5 MCE banks
[    0.011029] general protection fault: 0000 [#1]
[    0.011998] last sysfs file:
[    0.011998] Modules linked in:
[    0.011998]
[    0.011998] Pid: 0, comm: swapper Not tainted (2.6.31_router #1) HP Vectra
[    0.011998] EIP: 0060:[<c100d9b9>] EFLAGS: 00010246 CPU: 0
[    0.011998] EIP is at mce_rdmsrl+0x19/0x60
[    0.011998] EAX: 00000000 EBX: 00000001 ECX: 00000407 EDX: 08000000
[    0.011998] ESI: 00000000 EDI: 8c000000 EBP: 00000405 ESP: c17d5eac

So WARN_ONCE() instead of crashing the box.

( also fix a number of stylistic inconsistencies in the code. )

Note, we might still crash in wrmsrl() if we get that far, but
we shouldnt if the registers are truly inaccessible.

Reported-by: GNUtoo <GNUtoo@no-log.org>
Cc: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
LKML-Reference: <bug-14204-5438@http.bugzilla.kernel.org/>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-23 18:08:26 +02:00
Jason Wessel
c9530948bc early_printk: Allow more than one early console
It is desirable to be able to use one early boot device to debug
another or to have multiple places you can see the early boot
diagnostics, such as the vga screen or serial device.

This patch changes the early_printk console device registration to
allow more than one early printk device to get registered via
register_console().

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-23 06:46:38 -07:00
Jason Wessel
df6c516900 USB: ehci,dbgp,early_printk: split ehci debug driver from early_printk.c
Move the dbgp early printk driver in advance of refactoring and adding
new code, so the changes to this code are tracked separately from the
move of the code.

The drivers/usb/early directory will be the location of the current
and future early usb code for driving usb devices prior initializing
the standard interrupt driven USB drivers.

Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2009-09-23 06:46:38 -07:00
Peter Zijlstra
7d42896628 perf_event, x86: Fix 'perf sched record' crashing the machine
Chris Malley reported that 'perf sched record' sometimes
crashes his box with:

[  389.272175] BUG: unable to handle kernel paging request at ffffb300
[  389.272294] IP: [<c011b0bd>] default_send_IPI_self+0x1d/0x50
[  389.272366] *pde = 0073f067 *pte = 00000000
[  389.274708] Call Trace:
[  389.274752]  [<c010e3b4>] ?  set_perf_event_pending+0x14/0x20
[  389.274801]  [<c01b9751>] ?  perf_output_unlock+0x121/0x1a0
[  389.274848]  [<c01b981a>] ? perf_output_end+0x4a/0x70
[  389.274893]  [<c01ba690>] ?  __perf_event_overflow+0x240/0x2f0
[  389.274942]  [<c030963e>] ? atomic64_cmpxchg+0x1e/0x30
[  389.274988]  [<c01ba8f4>] ?  perf_swevent_ctx_event+0x1b4/0x1c0
[  389.275035]  [<c01ba773>] ?  perf_swevent_ctx_event+0x33/0x1c0
[  389.275081]  [<c01ba9a7>] ? do_perf_sw_event+0xa7/0x160
[  389.275127]  [<c01baae2>] ? perf_tp_event+0x82/0xa0
[  389.275174]  [<c012e9c6>] ?  ftrace_profile_sched_stat_runtime+0xe6/0x120
[  389.275224]  [<c012e8e0>] ?  ftrace_profile_sched_stat_runtime+0x0/0x120
[  389.275273]  [<c013c85a>] ? update_curr+0x18a/0x230
[  389.275318]  [<c013cdc5>] ?  put_prev_task_fair+0x155/0x160
[  389.275366]  [<c01618b5>] ? sched_clock_cpu+0xd5/0x110
[  389.275413]  [<c04e7525>] ? _spin_lock_irq+0x45/0x50
[  389.275458]  [<c04e424e>] ? schedule+0x20e/0xb10

The problem is that the box has no lapic enabled:

  [    0.042445] Local APIC not detected. Using dummy APIC emulation.

The below seems like the best fix. We disabled all lapic bits, except
the self-IPI-resend logic.

Reported-by: Chris Malley <mail@chrismalley.co.uk>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <7863dc4c0909221409v7893bfd3o4b590d5951a233ba@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-23 11:25:56 +02:00
Roland McGrath
8cb3ed1393 x86: ptrace: set TS_COMPAT when 32-bit ptrace sets orig_eax>=0
The 32-bit ptrace syscall on a 64-bit kernel (32-bit debugger on
32-bit task) behaves differently than a native 32-bit kernel.  When
setting a register state of orig_eax>=0 and eax=-ERESTART* when the
debugged task is NOT on its way out of a 32-bit syscall, the task will
fail to do the syscall restart logic that it should do.

Test case available at http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/erestartsys-trap.c?cvsroot=systemtap

This happens because the 32-bit ptrace syscall sets eax=0xffffffff
when it sets orig_eax>=0.  The resuming task will not sign-extend this
for the -ERESTART* check because TS_COMPAT is not set.  (So the task
thinks it is restarting after a 64-bit syscall, not a 32-bit one.)

The fix is to have 32-bit ptrace calls set TS_COMPAT when setting
orig_eax>=0.  This ensures that the 32-bit syscall restart logic
will apply when the child resumes.

Signed-off-by: Roland McGrath <roland@redhat.com>
2009-09-22 22:49:24 -07:00
Roland McGrath
08ff18e299 x86: ptrace: do not sign-extend orig_ax on write
The high 32 bits of orig_ax will be ignored when it matters,
so don't fiddle them when setting it.

Signed-off-by: Roland McGrath <roland@redhat.com>
2009-09-22 22:46:48 -07:00