Commit graph

1938 commits

Author SHA1 Message Date
Avi Kivity
0680fe5275 KVM: Bump maximum vcpu count to 64
With slots_lock converted to rcu, the entire kvm hotpath on modern processors
(with npt or ept) now scales beautifully.  Increase the maximum vcpu count to
64 to reflect this.

Signed-off-by: Avi Kivity <avi@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:35:45 -03:00
Marcelo Tosatti
a983fb2387 KVM: x86: switch kvm_set_memory_alias to SRCU update
Using a similar two-step procedure as for memslots.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:35:45 -03:00
Marcelo Tosatti
fef9cce0eb KVM: modify alias layout in x86s struct kvm_arch
Have a pointer to an allocated region inside x86's kvm_arch.

Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2010-03-01 12:35:43 -03:00
Sheng Yang
4e47c7a6d7 KVM: VMX: Add instruction rdtscp support for guest
Before enabling, execution of "rdtscp" in guest would result in #UD.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:40 -03:00
Sheng Yang
0e85188049 KVM: Add cpuid_update() callback to kvm_x86_ops
Sometime, we need to adjust some state in order to reflect guest CPUID
setting, e.g. if we don't expose rdtscp to guest, we won't want to enable
it on hardware. cpuid_update() is introduced for this purpose.

Also export kvm_find_cpuid_entry() for later use.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:40 -03:00
Avi Kivity
fc78f51938 KVM: Add accessor for reading cr4 (or some bits of cr4)
Some bits of cr4 can be owned by the guest on vmx, so when we read them,
we copy them to the vcpu structure.  In preparation for making the set of
guest-owned bits dynamic, use helpers to access these bits so we don't need
to know where the bit resides.

No changes to svm since all bits are host-owned there.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:39 -03:00
Avi Kivity
cdc0e24456 KVM: VMX: Move some cr[04] related constants to vmx.c
They have no place in common code.

Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:39 -03:00
Sheng Yang
59708670b6 KVM: VMX: Trap and invalid MWAIT/MONITOR instruction
We don't support these instructions, but guest can execute them even if the
feature('monitor') haven't been exposed in CPUID. So we would trap and inject
a #UD if guest try this way.

Cc: stable@kernel.org
Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2010-03-01 12:35:39 -03:00
Robert Richter
bb1165d688 perf, x86: rename macro in ARCH_PERFMON_EVENTSEL_ENABLE
For consistency reasons this patch renames
ARCH_PERFMON_EVENTSEL0_ENABLE to ARCH_PERFMON_EVENTSEL_ENABLE.

The following is performed:

 $ sed -i -e s/ARCH_PERFMON_EVENTSEL0_ENABLE/ARCH_PERFMON_EVENTSEL_ENABLE/g \
   arch/x86/include/asm/perf_event.h arch/x86/kernel/cpu/perf_event.c \
   arch/x86/kernel/cpu/perf_event_p6.c \
   arch/x86/kernel/cpu/perfctr-watchdog.c \
   arch/x86/oprofile/op_model_amd.c arch/x86/oprofile/op_model_ppro.c

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-03-01 14:21:23 +01:00
Joerg Roedel
5d214fe6e8 x86/amd-iommu: Protect IOMMU-API map/unmap path
This patch introduces a mutex to lock page table updates in
the IOMMU-API path. We can't use the spin_lock here because
this patch might sleep.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-03-01 14:16:22 +01:00
Robert Richter
a163b1099d perf, x86: add some IBS macros to perf_event.h
Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-03-01 11:23:15 +01:00
Robert Richter
1d6040f17d perf, x86: make IBS macros available in perf_event.h
This patch moves code from oprofile to perf_event.h to make it also
available for usage by perf.

Signed-off-by: Robert Richter <robert.richter@amd.com>
2010-03-01 11:23:15 +01:00
Linus Torvalds
30ff056c42 Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, uv: Remove recursion in uv_heartbeat_enable()
  x86, uv: uv_global_gru_mmr_address() macro fix
  x86, uv: Add serial number parameter to uv_bios_get_sn_info()
2010-02-28 11:00:55 -08:00
Linus Torvalds
6f5621cb16 Merge branch 'x86-ptrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-ptrace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, ptrace: Remove set_stopped_child_used_math() in [x]fpregs_set
  x86, ptrace: Simplify xstateregs_get()
  ptrace: Fix ptrace_regset() comments and diagnose errors specifically
  parisc: Disable CONFIG_HAVE_ARCH_TRACEHOOK
  ptrace: Add support for generic PTRACE_GETREGSET/PTRACE_SETREGSET
  x86, ptrace: regset extensions to support xstate
2010-02-28 10:59:44 -08:00
Linus Torvalds
c7e15899d0 Merge branch 'x86-pci-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-pci-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Enable NMI on all cpus on UV
  vgaarb: Add user selectability of the number of GPUS in a system
  vgaarb: Fix VGA arbiter to accept PCI domains other than 0
  x86, uv: Update UV arch to target Legacy VGA I/O correctly.
  pci: Update pci_set_vga_state() to call arch functions
2010-02-28 10:59:18 -08:00
Linus Torvalds
d6cd4715e2 Merge branch 'x86-rwsem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-rwsem-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-64, rwsem: Avoid store forwarding hazard in __downgrade_write
  x86-64, rwsem: 64-bit xadd rwsem implementation
  x86: Fix breakage of UML from the changes in the rwsem system
  x86-64: support native xadd rwsem implementation
  x86: clean up rwsem type system
2010-02-28 10:41:35 -08:00
Linus Torvalds
1eca9acbf9 Merge branch 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-numa-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, numa: Remove configurable node size support for numa emulation
  x86, numa: Add fixed node size option for numa emulation
  x86, numa: Fix numa emulation calculation of big nodes
  x86, acpi: Map hotadded cpu to correct node.
2010-02-28 10:39:36 -08:00
Linus Torvalds
46bbffad54 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mm: Unify kernel_physical_mapping_init() API
  x86, mm: Allow highmem user page tables to be disabled at boot time
  x86: Do not reserve brk for DMI if it's not going to be used
  x86: Convert tlbstate_lock to raw_spinlock
  x86: Use the generic page_is_ram()
  x86: Remove BIOS data range from e820
  Move page_is_ram() declaration to mm.h
  Generic page_is_ram: use __weak
  resources: introduce generic page_is_ram()
2010-02-28 10:38:45 -08:00
Linus Torvalds
85fe20bfd4 Merge branch 'x86-io-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-io-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Merge io.h
  x86: Simplify flush_write_buffers()
  x86: Clean up mem*io functions.
  x86-64: Use BUILDIO in io_64.h
  x86-64: Reorganize io_64.h
  x86-32: Remove _local variants of in/out from io_32.h
  x86-32: Move XQUAD definitions to numaq.h
2010-02-28 10:37:40 -08:00
Linus Torvalds
58f02db466 Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, cacheinfo: Enable L3 CID only on AMD
  x86, cacheinfo: Remove NUMA dependency, fix for AMD Fam10h rev D1
  x86, cpu: Print AMD virtualization features in /proc/cpuinfo
  x86, cacheinfo: Calculate L3 indices
  x86, cacheinfo: Add cache index disable sysfs attrs only to L3 caches
  x86, cacheinfo: Fix disabling of L3 cache indices
  intel-agp: Switch to wbinvd_on_all_cpus
  x86, lib: Add wbinvd smp helpers
2010-02-28 10:37:06 -08:00
Linus Torvalds
a7f16d10b5 Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Mark atomic irq ops raw for 32bit legacy
  x86: Merge show_regs()
  x86: Macroise x86 cache descriptors
  x86-32: clean up rwsem inline asm statements
  x86: Merge asm/atomic_{32,64}.h
  x86: Sync asm/atomic_32.h and asm/atomic_64.h
  x86: Split atomic64_t functions into seperate headers
  x86-64: Modify memcpy()/memset() alternatives mechanism
  x86-64: Modify copy_user_generic() alternatives mechanism
  x86: Lift restriction on the location of FIX_BTMAP_*
  x86, core: Optimize hweight32()
2010-02-28 10:35:09 -08:00
Linus Torvalds
6556a67435 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (172 commits)
  perf_event, amd: Fix spinlock initialization
  perf_event: Fix preempt warning in perf_clock()
  perf tools: Flush maps on COMM events
  perf_events, x86: Split PMU definitions into separate files
  perf annotate: Handle samples not at objdump output addr boundaries
  perf_events, x86: Remove superflous MSR writes
  perf_events: Simplify code by removing cpu argument to hw_perf_group_sched_in()
  perf_events, x86: AMD event scheduling
  perf_events: Add new start/stop PMU callbacks
  perf_events: Report the MMAP pgoff value in bytes
  perf annotate: Defer allocating sym_priv->hist array
  perf symbols: Improve debugging information about symtab origins
  perf top: Use a macro instead of a constant variable
  perf symbols: Check the right return variable
  perf/scripts: Tag syscall_name helper as not yet available
  perf/scripts: Add perf-trace-python Documentation
  perf/scripts: Remove unnecessary PyTuple resizes
  perf/scripts: Add syscall tracing scripts
  perf/scripts: Add Python scripting engine
  perf/scripts: Remove check-perf-trace from listed scripts
  ...

Fix trivial conflict in tools/perf/util/probe-event.c
2010-02-28 10:20:25 -08:00
Linus Torvalds
e0d272429a Merge branch 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (28 commits)
  ftrace: Add function names to dangling } in function graph tracer
  tracing: Simplify memory recycle of trace_define_field
  tracing: Remove unnecessary variable in print_graph_return
  tracing: Fix typo of info text in trace_kprobe.c
  tracing: Fix typo in prof_sysexit_enable()
  tracing: Remove CONFIG_TRACE_POWER from kernel config
  tracing: Fix ftrace_event_call alignment for use with gcc 4.5
  ftrace: Remove memory barriers from NMI code when not needed
  tracing/kprobes: Add short documentation for HAVE_REGS_AND_STACK_ACCESS_API
  s390: Add pt_regs register and stack access API
  tracing/kprobes: Make Kconfig dependencies generic
  tracing: Unify arch_syscall_addr() implementations
  tracing: Add notrace to TRACE_EVENT implementation functions
  ftrace: Allow to remove a single function from function graph filter
  tracing: Add correct/incorrect to sort keys for branch annotation output
  tracing: Simplify test for function_graph tracing start point
  tracing: Drop the tr check from the graph tracing path
  tracing: Add stack dump to trace_printk if stacktrace option is set
  tracing: Use appropriate perl constructs in recordmcount.pl
  tracing: optimize recordmcount.pl for offsets-handling
  ...
2010-02-28 10:17:55 -08:00
Linus Torvalds
f91b22c35f Merge branches 'core-ipi-for-linus', 'core-locking-for-linus', 'tracing-fixes-for-linus', 'x86-debug-for-linus', 'x86-doc-for-linus', 'x86-gpu-for-linus' and 'x86-rlimit-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-ipi-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  generic-ipi: Optimize accesses by using DEFINE_PER_CPU_SHARED_ALIGNED for IPI data

* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  plist: Fix grammar mistake, and c-style mistake

* 'tracing-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  kprobes: Add mcount to the kprobes blacklist

* 'x86-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86_64: Print modules like i386 does

* 'x86-doc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Put 'nopat' in kernel-parameters

* 'x86-gpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-64: Allow fbdev primary video code

* 'x86-rlimit-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Use helpers for rlimits
2010-02-28 10:04:02 -08:00
Ian Campbell
dad52fc011 x86, paravirt: Remove kmap_atomic_pte paravirt op.
Now that both Xen and VMI disable allocations of PTE pages from high
memory this paravirt op serves no further purpose.

This effectively reverts ce6234b5 "add kmap_atomic_pte for mapping
highpte pages".

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
LKML-Reference: <1267204562-11844-3-git-send-email-ian.campbell@citrix.com>
Acked-by: Alok Kataria <akataria@vmware.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-27 14:41:35 -08:00
Frederic Weisbecker
3d083407a1 x86/hw-breakpoints: Remove the name field
Remove the name field from the arch_hw_breakpoint. We never deal
with target symbols in the arch level, neither do we need to ever
store it. It's a legacy for the previous version of the x86
breakpoint backend.

Let's remove it.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-27 17:24:15 +01:00
Frederic Weisbecker
018cbffe68 Merge commit 'v2.6.33' into perf/core
Merge reason:
	__percpu annotations need the corresponding sparse address
space definition upstream.

Conflicts:
	tools/perf/util/probe-event.c (trivial)
2010-02-27 16:18:46 +01:00
Russ Anderson
78c0617646 x86: Enable NMI on all cpus on UV
Enable NMI on all cpus in UV system and add an NMI handler
to dump_stack on each cpu.

By default on x86 all the cpus except the boot cpu have NMI
masked off.  This patch enables NMI on all cpus in UV system
and adds an NMI handler to dump_stack on each cpu.  This
way if a system hangs we can NMI the machine and get a
backtrace from all the cpus.

Version 2: Use x86_platform driver mechanism for nmi init, per
           Ingo's suggestion.

Version 3: Clean up Ingo's nits.

Signed-off-by: Russ Anderson <rja@sgi.com>
LKML-Reference: <20100226164912.GA24439@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-27 12:34:21 +01:00
Ingo Molnar
6fb83029db Merge branch 'tracing/core' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing into tracing/core 2010-02-27 10:06:10 +01:00
Luca Barbieri
a7e926abc3 x86-32: Rewrite 32-bit atomic64 functions in assembly
This patch replaces atomic64_32.c with two assembly implementations,
one for 386/486 machines using pushf/cli/popf and one for 586+ machines
using cmpxchg8b.

The cmpxchg8b implementation provides the following advantages over the
current one:

1. Implements atomic64_add_unless, atomic64_dec_if_positive and
   atomic64_inc_not_zero

2. Uses the ZF flag changed by cmpxchg8b instead of doing a comparison

3. Uses custom register calling conventions that reduce or eliminate
   register moves to suit cmpxchg8b

4. Reads the initial value instead of using cmpxchg8b to do that.
   Currently we use lock xaddl and movl, which seems the fastest.

5. Does not use the lock prefix for atomic64_set
   64-bit writes are already atomic, so we don't need that.
   We still need it for atomic64_read to avoid restoring a value
   changed in the meantime.

6. Allocates registers as well or better than gcc

The 386 implementation provides support for 386 and 486 machines.
386/486 SMP is not supported (we dropped it), but such support can be
added easily if desired.

A pure assembly implementation is required due to the custom calling
conventions, and desire to use %ebp in atomic64_add_return (we need
7 registers...), as well as the ability to use pushf/popf in the 386
code without an intermediate pop/push.

The parameter names are changed to match the convention in atomic_64.h

Changes in v3 (due to rebasing to tip/x86/asm):
- Patches atomic64_32.h instead of atomic_32.h
- Uses the CALL alternative mechanism from commit
  1b1d925818

Changes in v2:
- Merged 386 and cx8 support in the same patch
- 386 support now done in assembly, C code no longer used at all
- cmpxchg64 is used for atomic64_cmpxchg
- stop using macros, use one-line inline functions instead
- miscellanous changes and improvements

Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
LKML-Reference: <1267005265-27958-5-git-send-email-luca@luca-barbieri.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-25 20:47:30 -08:00
Luca Barbieri
9c76b38476 x86-32: Allow UP/SMP lock replacement in cmpxchg64
Use the functionality just introduced in the previous patch: mark the
lock prefixes in cmpxchg64 alternatives for UP removal.

Changes in v2:
- Naming change

Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
LKML-Reference: <1267005265-27958-3-git-send-email-luca@luca-barbieri.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-25 20:47:03 -08:00
Luca Barbieri
b3ac891b67 x86: Add support for lock prefix in alternatives
The current lock prefix UP/SMP alternative code doesn't allow
LOCK_PREFIX to be used in alternatives code.

This patch solves the problem by adding a new LOCK_PREFIX_ALTERNATIVE_PATCH
macro that only records the lock prefix location but does not emit
the prefix.

The user of this macro can then start any alternative sequence with
"lock" and have it UP/SMP patched.

To make this work, the UP/SMP alternative code is changed to do the
lock/DS prefix switching only if the byte actually contains a lock or
DS prefix.

Thus, if an alternative without the "lock" is selected, it will now do
nothing instead of clobbering the code.

Changes in v2:
- Naming change
- Change label to not conflict with alternatives

Signed-off-by: Luca Barbieri <luca@luca-barbieri.com>
LKML-Reference: <1267005265-27958-2-git-send-email-luca@luca-barbieri.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-25 20:46:23 -08:00
Thomas Gleixner
d5d0e88c1e x86, olpc: Use pci subarch init for OLPC
Replace the #ifdef'ed OLPC-specific init functions by a conditional
x86_init function.  If the function returns 0 we leave pci_arch_init,
otherwise we continue.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: Andres Salomon <dilinger@collabora.co.uk>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318CE89@orsmsx508.amr.corp.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-25 19:26:23 -08:00
Thomas Gleixner
4fb6088a5c x86, pci: Add arch_init to x86_init abstraction
Added an abstraction function for arch specific init calls.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318CE84@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-25 19:24:43 -08:00
Russell King
9f33be2c3a Merge branches 'clks' and 'pnx' into devel 2010-02-25 22:10:38 +00:00
Masami Hiramatsu
c0f7ac3a9e kprobes/x86: Support kprobes jump optimization on x86
Introduce x86 arch-specific optimization code, which supports
both of x86-32 and x86-64.

This code also supports safety checking, which decodes whole of
a function in which probe is inserted, and checks following
conditions before optimization:
 - The optimized instructions which will be replaced by a jump instruction
   don't straddle the function boundary.
 - There is no indirect jump instruction, because it will jumps into
   the address range which is replaced by jump operand.
 - There is no jump/loop instruction which jumps into the address range
   which is replaced by jump operand.
 - Don't optimize kprobes if it is in functions into which fixup code will
   jumps.

This uses text_poke_multibyte() which doesn't support modifying
code on NMI/MCE handler. However, since kprobes itself doesn't
support NMI/MCE code probing, it's not a problem.

Changes in v9:
 - Use *_text_reserved() for checking the probe can be optimized.
 - Verify jump address range is in 2G range when preparing slot.
 - Backup original code when switching optimized buffer, instead of
   preparing buffer, because there can be int3 of other probes in
   preparing phase.
 - Check kprobe is disabled in arch_check_optimized_kprobe().
 - Strictly check indirect jump opcodes (ff /4, ff /5).

Changes in v6:
 - Split stop_machine-based jump patching code.
 - Update comments and coding style.

Changes in v5:
 - Introduce stop_machine-based jump replacing.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Anders Kaseorg <andersk@ksplice.com>
Cc: Tim Abbott <tabbott@ksplice.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
LKML-Reference: <20100225133446.6725.78994.stgit@localhost6.localdomain6>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-25 17:49:26 +01:00
Masami Hiramatsu
3d55cc8a05 x86: Add text_poke_smp for SMP cross modifying code
Add generic text_poke_smp for SMP which uses stop_machine()
to synchronize modifying code.
This stop_machine() method is officially described at "7.1.3
Handling Self- and Cross-Modifying Code" on the intel's
software developer's manual 3A.

Since stop_machine() can't protect code against NMI/MCE, this
function can not modify those handlers. And also, this function
is basically for modifying multibyte-single-instruction. For
modifying multibyte-multi-instructions, we need another special
trap & detour code.

This code originaly comes from immediate values with
stop_machine() version. Thanks Jason and Mathieu!

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Anders Kaseorg <andersk@ksplice.com>
Cc: Tim Abbott <tabbott@ksplice.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
LKML-Reference: <20100225133438.6725.80273.stgit@localhost6.localdomain6>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-25 17:49:26 +01:00
Masami Hiramatsu
d498f76395 kprobes/x86: Cleanup RELATIVEJUMP_INSTRUCTION to RELATIVEJUMP_OPCODE
Change RELATIVEJUMP_INSTRUCTION macro to RELATIVEJUMP_OPCODE
since it represents just the opcode byte.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Anders Kaseorg <andersk@ksplice.com>
Cc: Tim Abbott <tabbott@ksplice.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
LKML-Reference: <20100225133349.6725.99302.stgit@localhost6.localdomain6>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-25 17:49:24 +01:00
Ian Campbell
1431559200 x86, mm: Allow highmem user page tables to be disabled at boot time
Distros generally (I looked at Debian, RHEL5 and SLES11) seem to
enable CONFIG_HIGHPTE for any x86 configuration which has highmem
enabled. This means that the overhead applies even to machines which
have a fairly modest amount of high memory and which therefore do not
really benefit from allocating PTEs in high memory but still pay the
price of the additional mapping operations.

Running kernbench on a 4G box I found that with CONFIG_HIGHPTE=y but
no actual highptes being allocated there was a reduction in system
time used from 59.737s to 55.9s.

With CONFIG_HIGHPTE=y and highmem PTEs being allocated:
  Average Optimal load -j 4 Run (std deviation):
  Elapsed Time 175.396 (0.238914)
  User Time 515.983 (5.85019)
  System Time 59.737 (1.26727)
  Percent CPU 263.8 (71.6796)
  Context Switches 39989.7 (4672.64)
  Sleeps 42617.7 (246.307)

With CONFIG_HIGHPTE=y but with no highmem PTEs being allocated:
  Average Optimal load -j 4 Run (std deviation):
  Elapsed Time 174.278 (0.831968)
  User Time 515.659 (6.07012)
  System Time 55.9 (1.07799)
  Percent CPU 263.8 (71.266)
  Context Switches 39929.6 (4485.13)
  Sleeps 42583.7 (373.039)

This patch allows the user to control the allocation of PTEs in
highmem from the command line ("userpte=nohigh") but retains the
status-quo as the default.

It is possible that some simple heuristic could be developed which
allows auto-tuning of this option however I don't have a sufficiently
large machine available to me to perform any particularly meaningful
experiments. We could probably handwave up an argument for a threshold
at 16G of total RAM.

Assuming 768M of lowmem we have 196608 potential lowmem PTE
pages. Each page can map 2M of RAM in a PAE-enabled configuration,
meaning a maximum of 384G of RAM could potentially be mapped using
lowmem PTEs.

Even allowing generous factor of 10 to account for other required
lowmem allocations, generous slop to account for page sharing (which
reduces the total amount of RAM mappable by a given number of PT
pages) and other innacuracies in the estimations it would seem that
even a 32G machine would not have a particularly pressing need for
highmem PTEs. I think 32G could be considered to be at the upper bound
of what might be sensible on a 32 bit machine (although I think in
practice 64G is still supported).

It's seems questionable if HIGHPTE is even a win for any amount of RAM
you would sensibly run a 32 bit kernel on rather than going 64 bit.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
LKML-Reference: <1266403090-20162-1-git-send-email-ian.campbell@citrix.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-25 10:28:19 +01:00
Yinghai Lu
28c6a0ba30 x86, legacy_irq: Remove left over nr_legacy_irqs
nr_legacy_irqs and its ilk have moved to legacy_pic.

-v2: there is one in ioapic_.c

Singed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B84AAC4.2020204@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-24 11:01:34 -08:00
Jacob Pan
bb24c47161 x86, apbt: Moorestown APB system timer driver
Moorestown platform does not have PIT or HPET platform timers.  Instead it
has a bank of eight APB timers.  The number of available timers to the os
is exposed via SFI mtmr tables.  All APB timer interrupts are routed via
ioapic rtes and delivered as MSI.
Currently, we use timer 0 and 1 for per cpu clockevent devices, timer 2
for clocksource.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318D2D2@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-24 11:01:21 -08:00
Feng Tang
cf08945596 x86, mrst: Add vrtc platform data setup code
vRTC information is obtained from SFI tables on Moorestown, this patch parses
these tables and assign the information.

Signed-off-by: Feng Tang <feng.tang@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D0D@orsmsx508.amr.corp.intel.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-23 23:15:19 -08:00
Jacob Pan
16ab539585 x86, mrst: Add platform timer info parsing code
Moorestown platform timer information is obtained from SFI FW tables.
This patch parses SFI table then assign the irq information to mp_irqs.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D0B@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-23 23:15:19 -08:00
Jacob Pan
af2730f6ee x86, mrst: Fill in PCI functions in x86_init layer
This patch added Moorestown platform specific PCI init functions.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D0A@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-23 23:15:19 -08:00
Jacob Pan
4966e1affb x86, ioapic: Add dummy ioapic functions
Some ioapic extern functions are used when CONFIG_X86_IO_APIC is not
defined.  We need the dummy functions to avoid a compile time error.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A318DA07@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-23 23:14:07 -08:00
Jacob Pan
05ddafb17a x86, ioapic: Early enable ioapic for timer irq
Moorestown platform needs apic ready early for the system timer irq
which is delievered via ioapic.  Should not impact other platforms.

In the longer term, once ioapic setup is moved before late time init,
we will not need this patch to do early apic enabling.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D07@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-23 23:13:19 -08:00
Bjorn Helgaas
7bc5e3f2be x86/PCI: use host bridge _CRS info by default on 2008 and newer machines
The main benefit of using ACPI host bridge window information is that
we can do better resource allocation in systems with multiple host bridges,
e.g., http://bugzilla.kernel.org/show_bug.cgi?id=14183

Sometimes we need _CRS information even if we only have one host bridge,
e.g., https://bugs.launchpad.net/ubuntu/+source/linux/+bug/341681

Most of these systems are relatively new, so this patch turns on
"pci=use_crs" only on machines with a BIOS date of 2008 or newer.

Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
2010-02-23 09:43:42 -08:00
H. Peter Anvin
54b56170e4 Merge remote branch 'origin/x86/apic' into x86/mrst
Conflicts:
	arch/x86/kernel/apic/io_apic.c
2010-02-22 16:25:18 -08:00
H. Peter Anvin
d02e30c31c Merge branch 'x86/irq' into x86/apic
Merge reason:
	Conflicts in arch/x86/kernel/apic/io_apic.c

Resolved Conflicts:
	arch/x86/kernel/apic/io_apic.c

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-22 16:20:34 -08:00
Linus Torvalds
bee415ce42 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf probe: Init struct probe_point and set counter correctly
  hw-breakpoint: Keep track of dr7 local enable bits
  hw-breakpoints: Accept breakpoints on NULL address
  perf_events: Fix FORK events
2010-02-22 08:55:32 -08:00
H. Peter Anvin
aef55d4922 Merge branch 'x86/urgent' into x86/irq
Merge reason: conflict in arch/x86/kernel/apic/io_apic.c

Resolved Conflicts:
	arch/x86/kernel/apic/io_apic.c

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-20 22:54:05 -08:00
Russell King
4b3073e1c5 MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself
On VIVT ARM, when we have multiple shared mappings of the same file
in the same MM, we need to ensure that we have coherency across all
copies.  We do this via make_coherent() by making the pages
uncacheable.

This used to work fine, until we allowed highmem with highpte - we
now have a page table which is mapped as required, and is not available
for modification via update_mmu_cache().

Ralf Beache suggested getting rid of the PTE value passed to
update_mmu_cache():

  On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables
  to construct a pointer to the pte again.  Passing a pte_t * is much
  more elegant.  Maybe we might even replace the pte argument with the
  pte_t?

Ben Herrenschmidt would also like the pte pointer for PowerPC:

  Passing the ptep in there is exactly what I want.  I want that
  -instead- of the PTE value, because I have issue on some ppc cases,
  for I$/D$ coherency, where set_pte_at() may decide to mask out the
  _PAGE_EXEC.

So, pass in the mapped page table pointer into update_mmu_cache(), and
remove the PTE value, updating all implementations and call sites to
suit.

Includes a fix from Stephen Rothwell:

  sparc: fix fallout from update_mmu_cache API change

  Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>

Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
2010-02-20 16:41:46 +00:00
Jacob Pan
1f91233c26 x86, apic: Remove ioapic_disable_legacy()
The ioapic_disable_legacy() call is no longer needed for platforms do
not have legacy pic. the legacy pic abstraction has taken care it
automatically.

This patch also initialize irq-related static variables based on
information obtained from legacy_pic.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F0755A30A7660@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-19 17:16:38 -08:00
Jacob Pan
b81bb373a7 x86, pic: Make use of legacy_pic abstraction
This patch replaces legacy PIC-related global variable and functions
with the new legacy_pic abstraction.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D04@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-19 16:25:17 -08:00
Jacob Pan
ef3548668c x86, pic: Introduce legacy_pic abstraction
This patch makes i8259A like legacy programmable interrupt controller
code into a driver so that legacy pic functions can be selected at
runtime based on platform information, such as HW subarchitecure ID.
Default structure of legacy_pic maintains the current code path for
x86pc.

Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D03@orsmsx508.amr.corp.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-19 16:25:17 -08:00
Thomas Gleixner
9325a28ce2 x86: Add pcibios_fixup_irqs to x86_init
Platforms like Moorestown want to override the pcibios_fixup_irqs
default function. Add it to x86_init.pci.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80D00@orsmsx508.amr.corp.intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-19 16:12:39 -08:00
Thomas Gleixner
ab3b37937e x86: Add pci_init_irq to x86_init
Moorestown wants to reuse pcibios_init_irq but needs to provide its
own implementation of pci_enable_irq. After we distangled the init we
can move the init_irq call to x86_init and remove the pci_enable_irq
!= NULL check in pcibios_init_irq. pci_enable_irq is compile time
initialized to pirq_enable_irq and the special cases which override it
(visws and acpi) set the x86_init function pointer to noop. That
allows MSRT to override pci_enable_irq and otherwise run
pcibios_init_irq unmodified.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80CFF@orsmsx508.amr.corp.intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-19 16:12:33 -08:00
Thomas Gleixner
b72d0db9dd x86: Move pci init function to x86_init
The PCI initialization in pci_subsys_init() is a mess. pci_numaq_init,
pci_acpi_init, pci_visws_init and pci_legacy_init are called and each
implementation checks and eventually modifies the global variable
pcibios_scanned.

x86_init functions allow us to do this more elegant. The pci.init
function pointer is preset to pci_legacy_init. numaq, acpi and visws
can modify the pointer in their early setup functions. The functions
return 0 when they did the full initialization including bus scan. A
non zero return value indicates that pci_legacy_init needs to be
called either because the selected function failed or wants the
generic bus scan in pci_legacy_init to happen (e.g. visws).

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <43F901BD926A4E43B106BF17856F07559FB80CFE@orsmsx508.amr.corp.intel.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Jacob Pan <jacob.jun.pan@intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-19 16:12:29 -08:00
Frederic Weisbecker
326264a024 hw-breakpoint: Keep track of dr7 local enable bits
When the user enables breakpoints through dr7, he can choose
between "local" or "global" enable bits but given how linux is
implemented, both have the same effect.

That said we don't keep track how the user enabled the breakpoints
so when the user requests the dr7 value, we only translate the
"enabled" status using the global enabled bits. It means that if
the user enabled a breakpoint using the local enabled bit, reading
back dr7 will set the global bit and clear the local one.

Apps like Wine expect a full dr7 POKEUSER/PEEKUSER match for emulated
softwares that implement old reverse engineering protection schemes.

We fix that by keeping track of the whole dr7 value given by the user
in the thread structure to drop this bug. We'll think about
something more proper later.

This fixes a 2.6.32 - 2.6.33-x ptrace regression.

Reported-and-tested-by: Michael Stefaniuc <mstefani@redhat.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Maneesh Soni <maneesh@linux.vnet.ibm.com>
Cc: Alexandre Julliard <julliard@winehq.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Maciej Rutecki <maciej.rutecki@gmail.com>
2010-02-19 19:06:48 +01:00
Thomas Gleixner
b7e56edba4 Merge branch 'linus' into x86/mm
x86/mm is on 32-rc4 and missing the spinlock namespace changes which
are needed for further commits into this topic.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-02-17 18:28:05 +01:00
Mike Frysinger
e7b8e675d9 tracing: Unify arch_syscall_addr() implementations
Most implementations of arch_syscall_addr() are the same, so create a
default version in common code and move the one piece that differs (the
syscall table) to asm/syscall.h.  New arch ports don't have to waste
time copying & pasting this simple function.

The s390/sparc versions need to be different, so document why.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Paul Mundt <lethal@linux-sh.org>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <1264498803-17278-1-git-send-email-vapier@gentoo.org>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-02-17 13:07:21 +01:00
Yinghai Lu
580e0ad21d core: Move early_res from arch/x86 to kernel/
This makes the range reservation feature available to other
architectures.

-v2: add get_max_mapped, max_pfn_mapped only defined in x86...
     to fix PPC compiling
-v3: according to hpa, add CONFIG_HAVE_EARLY_RES
-v4: fix typo about EARLY_RES in config

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B7B5723.4070009@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-16 21:43:39 -08:00
Dave Airlie
477346ff74 x86-64: Allow fbdev primary video code
For some reason the 64-bit tree was doing this differently and
I can't see why it would need to.

This correct behaviour when you have two GPUs plugged in and
32-bit put the console in one place and 64-bit in another.

Signed-off-by: Dave Airlie <airlied@redhat.com>
LKML-Reference: <1262847894-27498-1-git-send-email-airlied@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-16 21:22:26 -08:00
Thomas Gleixner
5619c28061 x86: Convert i8259_lock to raw_spinlock
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-02-16 18:21:32 +01:00
Oleg Nesterov
11557b24fd x86: ELF_PLAT_INIT() shouldn't worry about TIF_IA32
The 64-bit version of ELF_PLAT_INIT() clears TIF_IA32, but at this point
it has already been cleared by SET_PERSONALITY == set_personality_64bit.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-02-16 08:51:49 -08:00
Ingo Molnar
17c0e7107b x86: Mark atomic irq ops raw for 32bit legacy
The atomic ops emulation for 32bit legacy CPUs floods the tracer with
irq off/on entries. The irq disabled regions are short and therefor
not interesting when chasing long irq disabled latencies. Mark them
raw and keep them out of the trace.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-02-16 17:19:11 +01:00
David Rientjes
8df5bb34de x86, numa: Add fixed node size option for numa emulation
numa=fake=N specifies the number of fake nodes, N, to partition the
system into and then allocates them by interleaving over physical nodes.
This requires knowledge of the system capacity when attempting to
allocate nodes of a certain size: either very large nodes to benchmark
scalability of code that operates on individual nodes, or very small
nodes to find bugs in the VM.

This patch introduces numa=fake=<size>[MG] so it is possible to specify
the size of each node to allocate.  When used, nodes of the size
specified will be allocated and interleaved over the set of physical
nodes.

FAKE_NODE_MIN_SIZE was also moved to the more-appropriate
include/asm/numa_64.h.

Signed-off-by: David Rientjes <rientjes@google.com>
LKML-Reference: <alpine.DEB.2.00.1002151342510.26927@chino.kir.corp.google.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-15 14:34:10 -08:00
Joerg Roedel
414bb144ef x86, cpu: Print AMD virtualization features in /proc/cpuinfo
This patch adds code to cpu initialization path to detect
the extended virtualization features of AMD cpus to show
them in /proc/cpuinfo.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
LKML-Reference: <1260792521-15212-1-git-send-email-joerg.roedel@amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-13 15:04:40 -08:00
Avi Kivity
0d1622d7f5 x86-64, rwsem: Avoid store forwarding hazard in __downgrade_write
The Intel Architecture Optimization Reference Manual states that a short
load that follows a long store to the same object will suffer a store
forwading penalty, particularly if the two accesses use different addresses.
Trivially, a long load that follows a short store will also suffer a penalty.

__downgrade_write() in rwsem incurs both penalties:  the increment operation
will not be able to reuse a recently-loaded rwsem value, and its result will
not be reused by any recently-following rwsem operation.

A comment in the code states that this is because 64-bit immediates are
special and expensive; but while they are slightly special (only a single
instruction allows them), they aren't expensive: a test shows that two loops,
one loading a 32-bit immediate and one loading a 64-bit immediate, both take
1.5 cycles per iteration.

Fix this by changing __downgrade_write to use the same add instruction on
i386 and on x86_64, so that it uses the same operand size as all the other
rwsem functions.

Signed-off-by: Avi Kivity <avi@redhat.com>
LKML-Reference: <1266049992-17419-1-git-send-email-avi@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-13 13:37:56 -08:00
Yinghai Lu
dd645cee7b x86: Add find_fw_memmap_area
... so we can move early_res up.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-27-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-12 09:42:40 -08:00
Yinghai Lu
9b3be9f992 Move round_up/down to kernel.h
... in preparation of moving early_res to kernel/.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-26-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-12 09:42:39 -08:00
Yinghai Lu
efdd0e81df x86: Move back find_e820_area to e820.c
Makes early_res.c more clean, so later could move it to /kernel.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-23-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-12 09:42:39 -08:00
Yinghai Lu
a678c2be75 x86: Separate early_res related code from e820.c
... to make e820.c smaller.

-v2: fix 32bit compiling with MAX_DMA32_PFN

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-21-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-12 09:42:38 -08:00
Yinghai Lu
08677214e3 x86: Make 64 bit use early_res instead of bootmem before slab
Finally we can use early_res to replace bootmem for x86_64 now.

Still can use CONFIG_NO_BOOTMEM to enable it or not.

-v2: fix 32bit compiling about MAX_DMA32_PFN
-v3: folded bug fix from LKML message below

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B747239.4070907@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-12 09:41:59 -08:00
Suresh Siddha
5b3efd5008 x86, ptrace: regset extensions to support xstate
Add the xstate regset support which helps extend the kernel ptrace and the
core-dump interfaces to support AVX state etc.

This regset interface is designed to support all the future state that gets
supported using xsave/xrstor infrastructure.

Looking at the memory layout saved by "xsave", one can't say which state
is represented in the memory layout. This is because if a particular state is
in init state, in the xsave hdr it can be represented by bit '0'. And hence
we can't really say by the xsave header wether a state is in init state or
the state is not saved in the memory layout.

And hence the xsave memory layout available through this regset
interface uses SW usable bytes [464..511] to convey what state is represented
in the memory layout.

First 8 bytes of the sw_usable_bytes[464..467] will be set to OS enabled xstate
mask(which is same as the 64bit mask returned by the xgetbv's xCR0).

The note NT_X86_XSTATE represents the extended state information in the
core file, using the above mentioned memory layout.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100211195614.802495327@sbs-t61.sc.intel.com>
Signed-off-by: Hongjiu Lu <hjl.tools@gmail.com>
Cc: Roland McGrath <roland@redhat.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-11 15:08:17 -08:00
Yinghai Lu
c252a5bb1f x86: Only call dma32_reserve_bootmem 64bit !CONFIG_NUMA
64bit NUMA already make enough space under 4G with new early_node_mem.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-16-git-send-email-yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10 17:47:18 -08:00
H. Peter Anvin
84abd88a70 Merge remote branch 'linus/master' into x86/bootmem 2010-02-10 16:55:28 -08:00
Yinghai Lu
18dce6ba5c x86: Fix SCI on IOAPIC != 0
Thomas Renninger <trenn@suse.de> reported on IBM x3330

booting a latest kernel on this machine results in:

PCI: PCI BIOS revision 2.10 entry at 0xfd61c, last bus=1
PCI: Using configuration type 1 for base access bio: create slab <bio-0> at 0
ACPI: SCI (IRQ30) allocation failed
ACPI Exception: AE_NOT_ACQUIRED, Unable to install System Control Interrupt handler (20090903/evevent-161)
ACPI: Unable to start the ACPI Interpreter

Later all kind of devices fail...

and bisect it down to this commit:
commit b9c61b7007

    x86/pci: update pirq_enable_irq() to setup io apic routing

it turns out we need to set irq routing for the sci on ioapic1 early.

-v2: make it work without sparseirq too.
-v3: fix checkpatch.pl warning, and cc to stable

Reported-by: Thomas Renninger <trenn@suse.de>
Bisected-by: Thomas Renninger <trenn@suse.de>
Tested-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <1265793639-15071-2-git-send-email-yinghai@kernel.org>
Cc: stable@kernel.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10 13:47:39 -08:00
Serge E. Hallyn
cf9db6c41f x86-32: Make AT_VECTOR_SIZE_ARCH=2
Both x86-32 and x86-64 with 32-bit compat use ARCH_DLINFO_IA32,
which defines two saved_auxv entries.  But system.h only defines
AT_VECTOR_SIZE_ARCH as 2 for CONFIG_IA32_EMULATION, not for
CONFIG_X86_32.  Fix that.

Signed-off-by: Serge E. Hallyn <serue@us.ibm.com>
LKML-Reference: <20100209023502.GA15408@us.ibm.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-09 16:05:08 -08:00
Mike Travis
841582ea9e x86, uv: Update UV arch to target Legacy VGA I/O correctly.
Add function to direct Legacy VGA I/O traffic to correct I/O Hub.

Signed-off-by: Mike Travis <travis@sgi.com>
LKML-Reference: <201002022238.o12McEbi018727@imap1.linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: David Airlie <airlied@linux.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-05 14:05:41 -08:00
Brian Gerst
1c5b9069e1 x86: Merge io.h
io_32.h and io_64.h are now identical.  Merge them into io.h.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1265380629-3212-8-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-05 13:57:40 -08:00
Brian Gerst
910bf6ad0b x86: Simplify flush_write_buffers()
Always make it an inline instead of using a macro for the no-op case.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1265380629-3212-7-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-05 13:57:38 -08:00
Brian Gerst
6175ddf06b x86: Clean up mem*io functions.
Iomem has no special significance on x86.  Use the standard mem*
functions instead of trying to call other versions.  Some fixups
are needed to match the function prototypes.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1265380629-3212-6-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-05 13:57:33 -08:00
Brian Gerst
2b4df4d4f7 x86-64: Use BUILDIO in io_64.h
Copied from io_32.h.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1265380629-3212-5-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-05 13:57:24 -08:00
Brian Gerst
2e16fc7728 x86-64: Reorganize io_64.h
Make it more similar to io_32.h.  No real code changes.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1265380629-3212-4-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-05 13:57:22 -08:00
Brian Gerst
bd2984e964 x86-32: Remove _local variants of in/out from io_32.h
These were leftover from the numaq support that was removed in commit
1fba38703d.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1265380629-3212-3-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-05 13:57:18 -08:00
Brian Gerst
5c64c7019e x86-32: Move XQUAD definitions to numaq.h
The XQUAD stuff is part of the NUMAQ architecture, so move it there.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1265380629-3212-2-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-05 13:57:12 -08:00
Masami Hiramatsu
2cfa19780d ftrace/alternatives: Introducing *_text_reserved functions
Introducing *_text_reserved functions for checking the text
address range is partially reserved or not. This patch provides
checking routines for x86 smp alternatives and dynamic ftrace.
Since both functions modify fixed pieces of kernel text, they
should reserve and protect those from other dynamic text
modifier, like kprobes.

This will also be extended when introducing other subsystems
which modify fixed pieces of kernel text. Dynamic text modifiers
should avoid those.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: przemyslaw@pawelczyk.it
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Jim Keniston <jkenisto@us.ibm.com>
Cc: Mathieu Desnoyers <compudj@krystal.dyndns.org>
Cc: Jason Baron <jbaron@redhat.com>
LKML-Reference: <20100202214911.4694.16587.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-02-04 09:36:19 +01:00
Tejun Heo
ab386128f2 Merge branch 'master' into percpu 2010-02-02 14:38:15 +09:00
Wu Fengguang
13ca0fcaa3 x86: Use the generic page_is_ram()
The generic resource based page_is_ram() works better with memory
hotplug/hotremove. So switch the x86 e820map based code to it.

CC: Andi Kleen <andi@firstfloor.org>
CC: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
CC: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
LKML-Reference: <20100122033004.470767217@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-01 16:58:17 -08:00
Linus Torvalds
4ca5ded2bd Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/agp: Fix agp_amd64_init regression
  x86: Add quirk for Intel DG45FC board to avoid low memory corruption
  x86: Add Dell OptiPlex 760 reboot quirk
  x86, UV: Fix RTC latency bug by reading replicated cachelines
  oprofile/x86: add Xeon 7500 series support
  oprofile/x86: fix crash when profiling more than 28 events
  lib/dma-debug.c: mark file-local struct symbol static.
  x86/amd-iommu: Fix deassignment of a device from the pt_domain
  x86/amd-iommu: Fix IOMMU-API initialization for iommu=pt
  x86/amd-iommu: Fix NULL pointer dereference in __detach_device()
  x86/amd-iommu: Fix possible integer overflow
2010-02-01 10:42:35 -08:00
H. Peter Anvin
05d43ed8a8 x86: get rid of the insane TIF_ABI_PENDING bit
Now that the previous commit made it possible to do the personality
setting at the point of no return, we do just that for ELF binaries.
And suddenly all the reasons for that insane TIF_ABI_PENDING bit go
away, and we can just make SET_PERSONALITY() just do the obvious thing
for a 32-bit compat process.

Everything becomes much more straightforward this way.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-01-29 08:22:01 -08:00
Ingo Molnar
ae7f6711d6 Merge branch 'perf/urgent' into perf/core
Merge reason: We want to queue up a dependent patch. Also update to
              later -rc's.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-29 10:36:22 +01:00
Peter Zijlstra
ed8777fc13 perf_events, x86: Fix event constraint masks
Since constraints are specified on the event number, not number
and unit mask shorten the constraint masks so that we'll
actually match something.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
LKML-Reference: <20100127221121.967610372@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-29 09:01:46 +01:00
Stephane Eranian
1da53e0230 perf_events, x86: Improve x86 event scheduling
This patch improves event scheduling by maximizing the use of PMU
registers regardless of the order in which events are created in a group.

The algorithm takes into account the list of counter constraints for each
event. It assigns events to counters from the most constrained, i.e.,
works on only one counter, to the least constrained, i.e., works on any
counter.

Intel Fixed counter events and the BTS special event are also handled via
this algorithm which is designed to be fairly generic.

The patch also updates the validation of an event to use the scheduling
algorithm. This will cause early failure in perf_event_open().

The 2nd version of this patch follows the model used by PPC, by running
the scheduling algorithm and the actual assignment separately. Actual
assignment takes place in hw_perf_enable() whereas scheduling is
implemented in hw_perf_group_sched_in() and x86_pmu_enable().

Signed-off-by: Stephane Eranian <eranian@google.com>
[ fixup whitespace and style nits as well as adding is_x86_event() ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
LKML-Reference: <4b5430c6.0f975e0a.1bf9.ffff85fe@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-29 09:01:33 +01:00
K.Prasad
40f9249a73 x86/debug: Clear reserved bits of DR6 in do_debug()
Clear the reserved bits from the stored copy of debug status
register (DR6).
This will help easy bitwise operations such as quick testing
of a debug event origin.

Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <20100128111401.GB13935@in.ibm.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-01-29 02:26:10 +01:00
Ingo Molnar
e0b5f80dd4 Merge branch 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2010-01-27 11:04:40 +01:00
H. Peter Anvin
b160091802 x86: Remove "x86 CPU features in debugfs" (CONFIG_X86_CPU_DEBUG)
CONFIG_X86_CPU_DEBUG, which provides some parsed versions of the x86
CPU configuration via debugfs, has caused boot failures on real
hardware.  The value of this feature has been marginal at best, as all
this information is already available to userspace via generic
interfaces.

Causes crashes that have not been fixed + minimal utility -> remove.

See the referenced LKML thread for more information.

Reported-by: Ozan Çağlayan <ozan@pardus.org.tr>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <alpine.LFD.2.00.1001221755320.13231@localhost.localdomain>
Cc: Jaswinder Singh Rajput <jaswinder@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: <stable@kernel.org>
2010-01-23 18:27:47 -08:00
Andreas Herrmann
3b2e3d85ae Revert "x86: ucode-amd: Load ucode-patches once ..."
Commit d1c84f79a6
leads to a regression when microcode_amd.c is compiled into the kernel.
It causes a big boot delay because the firmware is not available.
See http://marc.info/?l=linux-kernel&m=126267290920060

It also renders the reload sysfs attribute useless.
Fixing this is too intrusive for an -rc5 kernel.

Thus I'd like to restore the microcode loading behaviour of kernel
2.6.32.

CC: Gene Heskett <gene.heskett@verizon.net>
Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20100122203456.GB13792@alberich.amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-23 06:21:59 +01:00
Pallipadi, Venkatesh
73472a46b5 x86: Disable HPET MSI on ATI SB700/SB800
HPET MSI on platforms with ATI SB700/SB800 as they seem to have some
side-effects on floppy DMA. Do not use HPET MSI on such platforms.

Original problem report from Mark Hounschell
http://lkml.indiana.edu/hypermail/linux/kernel/0912.2/01118.html

[ This patch needs to go to stable as well. But, there are some
  conflicts that prevents the patch from going as is. I can
  rebase/resubmit to stable once the patch goes upstream.
  hpa: still Cc:'ing stable@ as an FYI. ]

Tested-by: Mark Hounschell <markh@compro.net>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Cc: <stable@kernel.org>
LKML-Reference: <20100121190952.GA32523@linux-os.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-23 06:21:58 +01:00
Borislav Petkov
a7b480e7f3 x86, lib: Add wbinvd smp helpers
Add wbinvd_on_cpu and wbinvd_on_all_cpus stubs for executing wbinvd on a
particular CPU.

[ hpa: renamed lib/smp.c to lib/cache-smp.c ]
[ hpa: wbinvd_on_all_cpus() returns int, but wbinvd() returns
  void.  Thus, the former cannot be a macro for the latter,
  replace with an inline function. ]

Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1264172467-25155-2-git-send-email-bp@amd64.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-22 16:05:42 -08:00
Joerg Roedel
f532509437 x86/amd-iommu: Fix IOMMU-API initialization for iommu=pt
This patch moves the initialization of the iommu-api out of
the dma-ops initialization code. This ensures that the
iommu-api is initialized even with iommu=pt.

Cc: stable@kernel.org
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2010-01-22 17:44:35 +01:00
Stephane Eranian
b27d515a49 perf: x86: Add support for the ANY bit
Propagate the ANY bit into the fixed counter config for v3 and higher.

Signed-off-by: Stephane Eranian <eranian@google.com>
[a.p.zijlstra@chello.nl: split from larger patch]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4b5430c6.0f975e0a.1bf9.ffff85fe@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-21 13:40:41 +01:00
Suresh Siddha
97943390b0 x86, irq: Don't block IRQ0_VECTOR..IRQ15_VECTOR's on all cpu's
Currently IRQ0..IRQ15 are assigned to IRQ0_VECTOR..IRQ15_VECTOR's on
all the cpu's.

If these IRQ's are handled by legacy pic controller, then the kernel
handles them only on cpu 0. So there is no need to block this vector
space on all cpu's.

Similarly if these IRQ's are handled by IO-APIC, then the IRQ affinity
will determine on which cpu's we need allocate the vector resource for
that particular IRQ. This can be done dynamically and here also there
is no need to block 16 vectors for IRQ0..IRQ15 on all cpu's.

Fix this by initially assigning IRQ0..IRQ15 to IRQ0_VECTOR..IRQ15_VECTOR's only
on cpu 0. If the legacy controllers like pic handles these irq's, then
this configuration will be fixed. If more modern controllers like IO-APIC
handle these IRQ's, then we start with this configuration and as IRQ's
migrate, vectors (/and cpu's) associated with these IRQ's change dynamically.

This will freeup the block of 16 vectors on other cpu's which don't handle
IRQ0..IRQ15, which can now be used for other IRQ's that the particular cpu
handle.

[ hpa: this also an architectural cleanup for future legacy-PIC-free
  configurations. ]
[ hpa: fixed typo NR_LEGACY_IRQS -> NR_IRQS_LEGACY ]

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1263932453.2814.52.camel@sbs-t61.sc.intel.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-19 13:40:29 -08:00
H. Peter Anvin
1838ef1d78 x86-64, rwsem: 64-bit xadd rwsem implementation
For x86-64, 32767 threads really is not enough.  Change rwsem_count_t
to a signed long, so that it is 64 bits on x86-64.

This required the following changes to the assembly code:

a) %z0 doesn't work on all versions of gcc!  At least gcc 4.4.2 as
   shipped with Fedora 12 emits "ll" not "q" for 64 bits, even for
   integer operands.  Newer gccs apparently do this correctly, but
   avoid this problem by using the _ASM_ macros instead of %z.
b) 64 bits immediates are only allowed in "movq $imm,%reg"
   constructs... no others.  Change some of the constraints to "e",
   and fix the one case where we would have had to use an invalid
   immediate -- in that case, we only care about the upper half
   anyway, so just access the upper half.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <tip-bafaecd11df15ad5b1e598adc7736afcd38ee13d@git.kernel.org>
2010-01-18 14:00:34 -08:00
Suresh Siddha
6579b47457 x86, irq: Use 0x20 for the IRQ_MOVE_CLEANUP_VECTOR instead of 0x1f
After talking to some more folks inside intel (Peter Anvin, Asit Mallick),
the safest option (for future compatibility etc) seen was to use vector 0x20
for IRQ_MOVE_CLEANUP_VECTOR instead of using vector 0x1f (which is documented as
reserved vector in the Intel IA32 manuals).

Also we don't need to reserve the entire privilege level (all 16 vectors in
the priority bucket that IRQ_MOVE_CLEANUP_VECTOR falls into), as the
x86 architecture (section 10.9.3 in SDM Vol3a) specifies that with in the
priority level, the higher the vector number the higher the priority.
And hence we don't need to reserve the complete priority level 0x20-0x2f for
the IRQ migration cleanup logic.

So change the IRQ_MOVE_CLEANUP_VECTOR to 0x20 and  allow 0x21-0x2f to be used
for device interrupts. 0x30-0x3f will be used for ISA interrupts (these
also can be migrated in the context of IOAPIC and hence need to be at a higher
priority level than IRQ_MOVE_CLEANUP_VECTOR).

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <20100114002118.521826763@sbs-t61.sc.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Maciej W. Rozycki <macro@linux-mips.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-18 10:59:59 -08:00
Linus Torvalds
330a518a1a Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, uv: Ensure hub revision set for all ACPI modes.
  x86, uv: Add function retrieving node controller revision number
  x86: xen: 64-bit kernel RPL should be 0
  x86: kernel_thread() -- initialize SS to a known state
  x86/agp: Fix agp_amd64_init and agp_amd64_cleanup
  x86: SGI UV: Fix mapping of MMIO registers
  x86: mce.h: Fix warning in header checks
2010-01-16 12:31:42 -08:00
Jack Steiner
7a1110e861 x86, uv: Add function retrieving node controller revision number
Add function for determining the revision id of the SGI UV
node controller chip (HUB). This function is needed in a
subsequent patch.

Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20100112210904.GA24546@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-15 11:08:55 -08:00
Linus Torvalds
5d0b7235d8 x86: clean up rwsem type system
The fast version of the rwsems (the code that uses xadd) has
traditionally only worked on x86-32, and as a result it mixes different
kinds of types wildly - they just all happen to be 32-bit.  We have
"long", we have "__s32", and we have "int".

To make it work on x86-64, the types suddenly matter a lot more.  It can
be either a 32-bit or 64-bit signed type, and both work (with the caveat
that a 32-bit counter will only have 15 bits of effective write
counters, so it's limited to 32767 users).  But whatever type you
choose, it needs to be used consistently.

This makes a new 'rwsem_counter_t', that is a 32-bit signed type.  For a
64-bit type, you'd need to also update the BIAS values.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <alpine.LFD.2.00.1001121755220.17145@localhost.localdomain>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-13 22:38:51 -08:00
Alan Cox
df39a2e48f x86: mce.h: Fix warning in header checks
Someone isn't reading their build output: Move the definition
out of the exported header.

Signed-off-by: Alan Cox <alan@linux.intel.com>
Cc: linux-kernel@vger.kernelorg
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13 10:41:22 +01:00
Masami Hiramatsu
aa5add93e9 x86/ptrace: Remove unused regs_get_argument_nth API
Because of dropping function argument syntax from kprobe-tracer,
we don't need this API anymore.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: linuxppc-dev@ozlabs.org
LKML-Reference: <20100105224656.19431.92588.stgit@dhcp-100-2-132.bos.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13 10:09:12 +01:00
Frederic Weisbecker
0fb8ee48d9 perf: Drop useless check for ignored frame
The check that ignores the debug and nmi stack frames is useless
now that we have a frame pointer that makes us start at the
right place. We don't anymore have to deal with these.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1262235183-5320-2-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13 10:09:08 +01:00
Ingo Molnar
61405fea92 Merge branch 'perf/urgent' into perf/core
Merge reason: queue up dependent patch, update to -rc4

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-13 10:08:50 +01:00
Linus Torvalds
59c33fa779 x86-32: clean up rwsem inline asm statements
This makes gcc use the right register names and instruction operand sizes
automatically for the rwsem inline asm statements.

So instead of using "(%%eax)" to specify the memory address that is the
semaphore, we use "(%1)" or similar. And instead of forcing the operation
to always be 32-bit, we use "%z0", taking the size from the actual
semaphore data structure itself.

This doesn't actually matter on x86-32, but if we want to use the same
inline asm for x86-64, we'll need to have the compiler generate the proper
64-bit names for the registers (%rax instead of %eax), and if we want to
use a 64-bit counter too (in order to avoid the 15-bit limit on the
write counter that limits concurrent users to 32767 threads), we'll need
to be able to generate instructions with "q" accesses rather than "l".

Since this header currently isn't enabled on x86-64, none of that matters,
but we do want to use the xadd version of the semaphores rather than have
to take spinlocks to do a rwsem. The mm->mmap_sem can be heavily contended
when you have lots of threads all taking page faults, and the fallback
rwsem code that uses a spinlock performs abysmally badly in that case.

[ hpa: modified the patch to skip size suffixes entirely when they are
  redundant due to register operands. ]

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <alpine.LFD.2.00.1001121613560.17145@localhost.localdomain>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-12 20:43:04 -08:00
Linus Torvalds
80e23b7cea Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-tip:
  x86, irq: Check move_in_progress before freeing the vector mapping
  x86: copy_from_user() should not return -EFAULT
  Revert "x86: Side-step lguest problem by only building cmpxchg8b_emu for pre-Pentium"
  x86/pci: Intel ioh bus num reg accessing fix
  x86: Fix size for ex trampoline with 32bit
2010-01-08 13:55:52 -08:00
Jack Steiner
e1e0138d7d x86, uv: uv_global_gru_mmr_address() macro fix
Fix bug in uv_global_gru_mmr_address macro.  Macro failed
to cast an int value to a long prior to a left shift > 32.

Signed-off-by: Jack Steiner <steiner@sgi.com>
LKML-Reference: <20100107161240.GA2610@sgi.com>
Cc: <stable@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-07 11:49:57 -08:00
Brian Gerst
5abbbbf0b0 x86: Merge asm/atomic_{32,64}.h
Merge the now identical code from asm/atomic_32.h and asm/atomic_64.h
into asm/atomic.h.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1262883215-4034-4-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-07 11:48:38 -08:00
Brian Gerst
3ce59bb835 x86: Sync asm/atomic_32.h and asm/atomic_64.h
Prepare for merging into asm/atomic.h.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1262883215-4034-3-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-07 11:47:55 -08:00
Brian Gerst
1a3b1d89ed x86: Split atomic64_t functions into seperate headers
Split atomic64_t functions out into separate headers, since they will
not be practical to merge between 32 and 64 bits.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1262883215-4034-2-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-07 11:47:31 -08:00
Heiko Carstens
409d02ef6d x86: copy_from_user() should not return -EFAULT
Callers of copy_from_user() expect it to return the number of bytes
it could not copy. In no case it is supposed to return -EFAULT.

In case of a detected buffer overflow just return the requested
length. In addition one could think of a memset that would clear
the size of the target object.

[ hpa: code is not in .32 so not needed for -stable ]

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
LKML-Reference: <20100105131911.GC5480@osiris.boeblingen.de.ibm.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-05 13:45:06 -08:00
Christoph Lameter
5917dae83c percpu, x86: Generic inc / dec percpu instructions
Optimize code generated for percpu access by checking for increment and
decrements.

tj: fix incorrect usage of __builtin_constant_p() and restructure
    percpu_add_op() macro.

Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2010-01-05 15:34:50 +09:00
Christoph Lameter
38b7827fcd local_t: Remove cpu_local_xx macros
These macros have not been used for awhile now.

Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
2010-01-05 15:34:49 +09:00
H. Peter Anvin
ea94396629 x86, apic: Don't waste a vector to improve vector spread
We want to use a vector-assignment sequence that avoids stumbling onto
0x80 earlier in the sequence, in order to improve the spread of
vectors across priority levels on machines with a small number of
interrupt sources.  Right now, this is done by simply making the first
vector (0x31 or 0x41) completely unusable.  This is unnecessary; all
we need is to start assignment at a +1 offset, we don't actually need
to prohibit the usage of this vector once we have wrapped around.

Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <4B426550.6000209@kernel.org>
2010-01-04 21:28:24 -08:00
H. Peter Anvin
99d113b17e x86, apic: Reclaim IDT vectors 0x20-0x2f
Reclaim 16 IDT vectors and make them available for general allocation.

Reclaim vectors 0x20-0x2f by reallocating the IRQ_MOVE_CLEANUP_VECTOR
to vector 0x1f.  This is in the range of vector numbers that is
officially reserved for the CPU (for exceptions), however, the use of
the APIC to generate any vector 0x10 or above is documented, and the
CPU internally can receive any vector number (the legacy BIOS uses INT
0x08-0x0f for interrupts, as messed up as that is.)

Since IRQ_MOVE_CLEANUP_VECTOR has to be alone in the lowest-numbered
priority level (block of 16), this effectively enables us to reclaim
an otherwise-unusable APIC priority level and put it to use.

Since this is a transient kernel-only allocation we can change it at
any time, and if/when there is an exception at vector 0x1f this
assignment needs to be changed as part of OS enabling that new feature.

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
LKML-Reference: <4B4284C6.9030107@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-04 21:12:52 -08:00
Tejun Heo
32032df6c2 Merge branch 'master' into percpu
Conflicts:
	arch/powerpc/platforms/pseries/hvCall.S
	include/linux/percpu.h
2010-01-05 09:17:33 +09:00
Linus Torvalds
2d959e9565 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/agp: Fix agp_amd64_init() initialization with CONFIG_GART_IOMMU enabled
  x86: SGI UV: Fix writes to led registers on remote uv hubs
  x86, kmemcheck: Use KERN_WARNING for error reporting
  x86: Use KERN_DEFAULT log-level in __show_regs()
  x86, compress: Force i386 instructions for the decompressor
  x86/amd-iommu: Fix initialization failure panic
  dma-debug: Do not add notifier when dma debugging is disabled.
  x86: Fix objdump version check in chkobjdump.awk for different formats.

Trivial conflicts in arch/x86/include/asm/uv/uv_hub.h due to me having
applied an earlier version of an SGI UV fix.
2009-12-31 11:54:13 -08:00
Linus Torvalds
b07d41b77e Merge branch 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: get rid of kvm_create_vm() unused label warning on s390
  KVM: powerpc: Fix mtsrin in book3s_64 mmu
  KVM: ia64: fix build breakage due to host spinlock change
  KVM: x86: Extend KVM_SET_VCPU_EVENTS with selective updates
  KVM: LAPIC: make sure IRR bitmap is scanned after vm load
  KVM: Fix possible circular locking in kvm_vm_ioctl_assign_device()
  KVM: MMU: remove prefault from invlpg handler
2009-12-30 12:56:17 -08:00
Mike Travis
9a7262a056 x86_64 SGI UV: Fix writes to led registers on remote uv hubs.
The wrong address was being used to write the SCIR led regs on remote
hubs.  Also, there was an inconsistency between how BIOS and the kernel
indexed these regs.  Standardize on using the lower 6 bits of the APIC
ID as the index.

This patch fixes the problem of writing to an errant address to a
cpu # >= 64.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Jack Steiner <steiner@sgi.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-30 12:25:26 -08:00
Jan Beulich
1b1d925818 x86-64: Modify copy_user_generic() alternatives mechanism
In order to avoid unnecessary chains of branches, rather than
implementing copy_user_generic() as a function consisting of
just a single (possibly patched) branch, instead properly deal
with patching call instructions in the alternative instructions
framework, and move the patching into the callers.

As a follow-on, one could also introduce something like
__EXPORT_SYMBOL_ALT() to avoid patching call sites in modules.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4B2BB8180200007800026AE7@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-30 11:57:31 +01:00
Jan Beulich
499a5f1efa x86: Lift restriction on the location of FIX_BTMAP_*
The early ioremap fixmap entries cover half (or for 32-bit
non-PAE, a quarter) of a page table, yet they got
uncondtitionally aligned so far to a 256-entry boundary. This is
not necessary if the range of page table entries anyway falls
into a single page table.

This buys back, for (theoretically) 50% of all configurations
(25% of all non-PAE ones), at least some of the lowmem
necessarily lost with commit e621bd1895.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4B2BB66F0200007800026AD6@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-30 11:57:30 +01:00
Yinghai Lu
9959c888a3 x86: Increase NR_IRQS and nr_irqs
I have a system with lots of igb and ixgbe, when iov/vf are
enabled for them, we hit the limit of 3064.

when system has 20 pcie installed, and one card has 2
functions, and one function needs 64 msi-x,
 may need 20 * 2 * 64 = 2560 for msi-x

but if iov and vf are enabled
 may need 20 * 2 * 64 * 3 = 7680 for msi-x
assume system with 5 ioapic, nr_irqs_gsi will be 120.

NR_CPUS = 512, and nr_cpu_ids = 128
will have NR_IRQS = 256 + 512 * 64 = 33024
will have nr_irqs = 120 + 8 * 128 + 120 * 64 = 8824

When SPARSE_IRQ is not set, there is no increase with kernel data
size.

when NR_CPUS=128, and SPARSE_IRQ is set:
   text		   data	    bss		   dec		 hex	filename
21837444	4216564	12480736	38534744	24bfe58	vmlinux.before
21837442	4216580	12480736	38534758	24bfe66	vmlinux.after
when NR_CPUS=4096, and SPARSE_IRQ is set
   text		   data	    bss		   dec		 hex	filename
21878619	5610244	13415392	40904255	270263f	vmlinux.before
21878617	5610244	13415392	40904253	270263d	vmlinux.after

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4B398ECD.1080506@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-30 11:55:59 +01:00
Mike Travis
39d3077099 x86: SGI UV: Fix writes to led registers on remote uv hubs
The wrong address was being used to write the SCIR led regs on
remote hubs.  Also, there was an inconsistency between how BIOS
and the kernel indexed these regs.  Standardize on using the
lower 6 bits of the APIC ID as the index.

This patch fixes the problem of writing to an errant address to
a cpu # >= 64.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Jack Steiner <steiner@sgi.com>
Cc: Robin Holt <holt@sgi.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@kernel.org
LKML-Reference: <4B3922F9.3060905@sgi.com>
[ v2: fix a number of annoying checkpatch artifacts and whitespace noise ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-29 06:47:39 +01:00
Naga Chumbalkar
fd2a50a024 x86, perfctr: Remove unused func avail_to_resrv_perfctr_nmi()
avail_to_resrv_perfctr_nmi() is neither EXPORT'd, nor used in
the file. So remove it.

Signed-off-by: Naga Chumbalkar <nagananda.chumbalkar@hp.com>
Acked-by: Cyrill Gorcunov <gorcunov@gmail.com>
Cc: oprofile-list@lists.sf.net
LKML-Reference: <20091224015441.6005.4408.sendpatchset@localhost.localdomain>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-28 09:36:46 +01:00
Ingo Molnar
605c1a187f Merge branch 'iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2009-12-28 09:23:13 +01:00
Jan Kiszka
dab4b911a5 KVM: x86: Extend KVM_SET_VCPU_EVENTS with selective updates
User space may not want to overwrite asynchronously changing VCPU event
states on write-back. So allow to skip nmi.pending and sipi_vector by
setting corresponding bits in the flags field of kvm_vcpu_events.

[avi: advertise the bits in KVM_GET_VCPU_EVENTS]

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-27 13:36:33 -02:00
Len Brown
da3df858c8 Merge branch 'pdc' into release 2009-12-24 01:17:21 -05:00
Alex Chiang
6c5807d7bc ACPI: processor: finish unifying arch_acpi_processor_init_pdc()
The only thing arch-specific about calling _PDC is what bits get
set in the input obj_list buffer.

There's no need for several levels of indirection to twiddle those
bits. Additionally, since we're just messing around with a buffer,
we can simplify the interface; no need to pass around the entire
struct acpi_processor * just to get at the buffer.

Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-12-22 03:24:13 -05:00
Alex Chiang
1d9cb470a7 ACPI: processor: introduce arch_has_acpi_pdc
arch dependent helper function that tells us if we should attempt to
evaluate _PDC on this machine or not.

The x86 implementation assumes that the CPUs in the machine must be
homogeneous, and that you cannot mix CPUs of different vendors.

Cc: Tony Luck <tony.luck@intel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Alex Chiang <achiang@hp.com>
Signed-off-by: Len Brown <len.brown@intel.com>
2009-12-22 03:24:10 -05:00
Linus Torvalds
eca9dfcd00 Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  perf session: Make events_stats u64 to avoid overflow on 32-bit arches
  hw-breakpoints: Fix hardware breakpoints -> perf events dependency
  perf events: Dont report side-band events on each cpu for per-task-per-cpu events
  perf events, x86/stacktrace: Fix performance/softlockup by providing a special frame pointer-only stack walker
  perf events, x86/stacktrace: Make stack walking optional
  perf events: Remove unused perf_counter.h header file
  perf probe: Check new event name
  kprobe-tracer: Check new event/group name
  perf probe: Check whether debugfs path is correct
  perf probe: Fix libdwarf include path for Debian
2009-12-19 09:48:42 -08:00
Linus Torvalds
3981e15286 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, irq: Allow 0xff for /proc/irq/[n]/smp_affinity on an 8-cpu system
  Makefile: Unexport LC_ALL instead of clearing it
  x86: Fix objdump version check in arch/x86/tools/chkobjdump.awk
  x86: Reenable TSC sync check at boot, even with NONSTOP_TSC
  x86: Don't use POSIX character classes in gen-insn-attr-x86.awk
  Makefile: set LC_CTYPE, LC_COLLATE, LC_NUMERIC to C
  x86: Increase MAX_EARLY_RES; insufficient on 32-bit NUMA
  x86: Fix checking of SRAT when node 0 ram is not from 0
  x86, cpuid: Add "volatile" to asm in native_cpuid()
  x86, msr: msrs_alloc/free for CONFIG_SMP=n
  x86, amd: Get multi-node CPU info from NodeId MSR instead of PCI config space
  x86: Add IA32_TSC_AUX MSR and use it
  x86, msr/cpuid: Register enough minors for the MSR and CPUID drivers
  initramfs: add missing decompressor error check
  bzip2: Add missing checks for malloc returning NULL
  bzip2/lzma/gzip: pre-boot malloc doesn't return NULL on failure
2009-12-19 09:48:14 -08:00
Suresh Siddha
18374d89e5 x86, irq: Allow 0xff for /proc/irq/[n]/smp_affinity on an 8-cpu system
John Blackwood reported:
> on an older Dell PowerEdge 6650 system with 8 cpus (4 are hyper-threaded),
> and  32 bit (x86) kernel, once you change the irq smp_affinity of an irq
> to be less than all cpus in the system, you can never change really the
> irq smp_affinity back to be all cpus in the system (0xff) again,
> even though no error status is returned on the "/bin/echo ff >
> /proc/irq/[n]/smp_affinity" operation.
>
> This is due to that fact that BAD_APICID has the same value as
> all cpus (0xff) on 32bit kernels, and thus the value returned from
> set_desc_affinity() via the cpu_mask_to_apicid_and() function is treated
> as a failure in set_ioapic_affinity_irq_desc(), and no affinity changes
> are made.

set_desc_affinity() is already checking if the incoming cpu mask
intersects with the cpu online mask or not. So there is no need
for the apic op cpu_mask_to_apicid_and() to check again
and return BAD_APICID.

Remove the BAD_APICID return value from cpu_mask_to_apicid_and()
and also fix set_desc_affinity() to return -1 instead of using BAD_APICID
to represent error conditions (as cpu_mask_to_apicid_and() can return
logical or physical apicid values and BAD_APICID is really to represent
bad physical apic id).

Reported-by: John Blackwood <john.blackwood@ccur.com>
Root-caused-by: John Blackwood <john.blackwood@ccur.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1261103386.2535.409.camel@sbs-t61>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-17 22:03:06 -08:00
Russ Anderson
b76365a18f x86, uv: Add serial number parameter to uv_bios_get_sn_info()
Add system_serial_number to the information returned by
uv_bios_get_sn_info() UV BIOS call.

Signed-off-by: Russ Anderson <rja@sgi.com>
LKML-Reference: <20091217165323.GA30774@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-17 11:33:51 -08:00
Linus Torvalds
5a865c0606 Merge branch 'for-33' of git://repo.or.cz/linux-kbuild
* 'for-33' of git://repo.or.cz/linux-kbuild: (29 commits)
  net: fix for utsrelease.h moving to generated
  gen_init_cpio: fixed fwrite warning
  kbuild: fix make clean after mismerge
  kbuild: generate modules.builtin
  genksyms: properly consider  EXPORT_UNUSED_SYMBOL{,_GPL}()
  score: add asm/asm-offsets.h wrapper
  unifdef: update to upstream revision 1.190
  kbuild: specify absolute paths for cscope
  kbuild: create include/generated in silentoldconfig
  scripts/package: deb-pkg: use fakeroot if available
  scripts/package: add KBUILD_PKG_ROOTCMD variable
  scripts/package: tar-pkg: use tar --owner=root
  Kbuild: clean up marker
  net: add net_tstamp.h to headers_install
  kbuild: move utsrelease.h to include/generated
  kbuild: move autoconf.h to include/generated
  drop explicit include of autoconf.h
  kbuild: move compile.h to include/generated
  kbuild: drop include/asm
  kbuild: do not check for include/asm-$ARCH
  ...

Fixed non-conflicting clean merge of modpost.c as per comments from
Stephen Rothwell (modpost.c had grown an include of linux/autoconf.h
that needed to be changed to generated/autoconf.h)
2009-12-17 07:23:42 -08:00
Frederic Weisbecker
06d65bda75 perf events, x86/stacktrace: Fix performance/softlockup by providing a special frame pointer-only stack walker
It's just wasteful for stacktrace users like perf to walk
through every entries on the stack whereas these only accept
reliable ones, ie: that the frame pointer validates.

Since perf requires pure reliable stacktraces, it needs a stack
walker based on frame pointers-only to optimize the stacktrace
processing.

This might solve some near-lockup scenarios that can be triggered
by call-graph tracing timer events.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1261024834-5336-2-git-send-regression-fweisbec@gmail.com>
[ v2: fix for modular builds and small detail tidyup ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-17 10:42:52 +01:00
Frederic Weisbecker
61c1917f47 perf events, x86/stacktrace: Make stack walking optional
The current print_context_stack helper that does the stack
walking job is good for usual stacktraces as it walks through
all the stack and reports even addresses that look unreliable,
which is nice when we don't have frame pointers for example.

But we have users like perf that only require reliable
stacktraces, and those may want a more adapted stack walker, so
lets make this function a callback in stacktrace_ops that users
can tune for their needs.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1261024834-5336-1-git-send-regression-fweisbec@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-17 09:56:19 +01:00
Suresh Siddha
45a94d7cd4 x86, cpuid: Add "volatile" to asm in native_cpuid()
xsave_cntxt_init() does something like:

	cpuid(0xd, ..);	// find out what features FP/SSE/.. etc are supported

	xsetbv();	// enable the features known to OS

	cpuid(0xd, ..);	// find out the size of the context for features enabled

Depending on what features get enabled in xsetbv(), value of the
cpuid.eax=0xd.ecx=0.ebx changes correspondingly (representing the
size of the context that is enabled).

As we don't have volatile keyword for native_cpuid(), gcc 4.1.2
optimizes away the second cpuid and the kernel continues to use
the cpuid information obtained before xsetbv(), ultimately leading to kernel
crash on processors supporting more state than the legacy FP/SSE.

Add "volatile" for native_cpuid().

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
LKML-Reference: <1261009542.2745.55.camel@sbs-t61.sc.intel.com>
Cc: stable@kernel.org
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-16 16:30:57 -08:00
Borislav Petkov
6ede31e030 x86, msr: msrs_alloc/free for CONFIG_SMP=n
Randy Dunlap reported the following build error:

"When CONFIG_SMP=n, CONFIG_X86_MSR=m:

ERROR: "msrs_free" [drivers/edac/amd64_edac_mod.ko] undefined!
ERROR: "msrs_alloc" [drivers/edac/amd64_edac_mod.ko] undefined!"

This is due to the fact that <arch/x86/lib/msr.c> is conditioned on
CONFIG_SMP and in the UP case we have only the stubs in the header.
Fork off SMP functionality into a new file (msr-smp.c) and build
msrs_{alloc,free} unconditionally.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Signed-off-by: Borislav Petkov <petkovbb@gmail.com>
LKML-Reference: <20091216231625.GD27228@liondog.tnic>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-16 15:36:32 -08:00
Andreas Herrmann
9d260ebc09 x86, amd: Get multi-node CPU info from NodeId MSR instead of PCI config space
Use NodeId MSR to get NodeId and number of nodes per processor.

Signed-off-by: Andreas Herrmann <andreas.herrmann3@amd.com>
LKML-Reference: <20091216144355.GB28798@alberich.amd.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-16 15:06:23 -08:00
Linus Torvalds
bac5e54c29 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (38 commits)
  direct I/O fallback sync simplification
  ocfs: stop using do_sync_mapping_range
  cleanup blockdev_direct_IO locking
  make generic_acl slightly more generic
  sanitize xattr handler prototypes
  libfs: move EXPORT_SYMBOL for d_alloc_name
  vfs: force reval of target when following LAST_BIND symlinks (try #7)
  ima: limit imbalance msg
  Untangling ima mess, part 3: kill dead code in ima
  Untangling ima mess, part 2: deal with counters
  Untangling ima mess, part 1: alloc_file()
  O_TRUNC open shouldn't fail after file truncation
  ima: call ima_inode_free ima_inode_free
  IMA: clean up the IMA counts updating code
  ima: only insert at inode creation time
  ima: valid return code from ima_inode_alloc
  fs: move get_empty_filp() deffinition to internal.h
  Sanitize exec_permission_lite()
  Kill cached_lookup() and real_lookup()
  Kill path_lookup_open()
  ...

Trivial conflicts in fs/direct-io.c
2009-12-16 12:04:02 -08:00
Linus Torvalds
61ecdb84c1 Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: Fix kprobes build with non-gawk awk
  x86: Split swiotlb initialization into two stages
  x86: Regex support and known-movable symbols for relocs, fix _end
  x86, msr: Remove incorrect, duplicated code in the MSR driver
  x86: Merge kernel_thread()
  x86: Sync 32/64-bit kernel_thread
  x86, 32-bit: Use same regs as 64-bit for kernel_thread_helper
  x86, 64-bit: Use user_mode() to determine new stack pointer in copy_thread()
  x86, 64-bit: Move kernel_thread to C
  x86-64, paravirt: Call set_iopl_mask() on 64 bits
  x86-32: Avoid pipeline serialization in PTREGSCALL1 and 2
  x86: Merge sys_clone
  x86, 32-bit: Convert sys_vm86 & sys_vm86old
  x86: Merge sys_sigaltstack
  x86: Merge sys_execve
  x86: Merge sys_iopl
  x86-32: Add new pt_regs stubs
  cpumask: Use modern cpumask style in arch/x86/kernel/cpu/mcheck/mce-inject.c
2009-12-16 12:02:37 -08:00
Al Viro
853b3da10d sanitize do_pipe_flags() callers in arch
* hpux_pipe() - no need to take BKL
* sys32_pipe() in arch/x86/ia32 and xtensa_pipe() in arch/xtensa -
	no need at all, since both functions are open-coded sys_pipe()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-16 12:16:40 -05:00
Jack Steiner
56abcf24ff gru: function to generate chipset IPI values
Create a function to generate the value that is written to the UV hub MMR
to cause an IPI interrupt to be sent.  The function will be used in the
GRU message queue error recovery code that sends IPIs to nodes in remote
partitions.

Signed-off-by: Jack Steiner <steiner@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:17 -08:00
Robin Holt
c2c9f11574 x86: uv: update XPC to handle updated BIOS interface
The UV BIOS has moved the location of some of their pointers to the
"partition reserved page" from memory into a uv hub MMR.  The GRU does not
support bcopy operations from MMR space so we need to special case the MMR
addresses using VLOAD operations.

Additionally, the BIOS call for registering a message queue watchlist has
removed the 'blade' value and eliminated the structure that was being
passed in.  This is also reflected in this patch.

Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:14 -08:00
Robin Holt
fae419f2ab x86: uv: introduce uv_gpa_is_mmr
Provide a mechanism for determining if a global physical address is
pointing to a UV hub MMR.

Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:13 -08:00
Robin Holt
729d69e699 x86: uv: introduce a means to translate from gpa -> socket_paddr
The UV BIOS has been updated to implement some of our interface
functionality differently than originally expected.  These patches update
the kernel to the bios implementation and include a few minor bug fixes
which prevent us from doing significant testing on real hardware.

This patch:

For SGI UV systems, translate from a global physical address back to a
socket physical address.  This does nothing to ensure the socket physical
address is actually addressable by the kernel.  That is the responsibility
of the user of the function.

Signed-off-by: Robin Holt <holt@sgi.com>
Cc: Jack Steiner <steiner@sgi.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:13 -08:00
Jan Beulich
ac2b3e67dd dma-mapping: fix off-by-one error in dma_capable()
dma_mask is, when interpreted as address, the last valid byte, and hence
comparison msut also be done using the last valid of the buffer in
question.

Also fix the open-coded instances in lib/swiotlb.c.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Becky Bruce <beckyb@kernel.crashing.org>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:12 -08:00
Christoph Hellwig
698ba7b5a3 elf: kill USE_ELF_CORE_DUMP
Currently all architectures but microblaze unconditionally define
USE_ELF_CORE_DUMP.  The microblaze omission seems like an error to me, so
let's kill this ifdef and make sure we are the same everywhere.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: <linux-arch@vger.kernel.org>
Cc: Michal Simek <michal.simek@petalogix.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:12 -08:00
Oleg Nesterov
7f38551fc3 ptrace: x86: implement user_single_step_siginfo()
Suggested by Roland.

Implement user_single_step_siginfo() for x86.  Extract this code from
send_sigtrap().

Since x86 calls tracehook_report_syscall_exit(step => 0) the new helper is
not used yet.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Roland McGrath <roland@redhat.com>
Cc: <linux-arch@vger.kernel.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-16 07:20:08 -08:00
Sheng Yang
5df974009f x86: Add IA32_TSC_AUX MSR and use it
Clean up write_tsc() and write_tscp_aux() by replacing
hardcoded values.

No change in functionality.

Signed-off-by: Sheng Yang <sheng@linux.intel.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
LKML-Reference: <1260942485-19156-4-git-send-email-sheng@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-16 09:02:42 +01:00
Ingo Molnar
ab1eebe77d Merge branch 'x86/asm' into x86/urgent
Merge reason: it's stable so lets push it upstream.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-15 20:33:28 +01:00
Linus Torvalds
8f0ddf91f2 Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (26 commits)
  clockevents: Convert to raw_spinlock
  clockevents: Make tick_device_lock static
  debugobjects: Convert to raw_spinlocks
  perf_event: Convert to raw_spinlock
  hrtimers: Convert to raw_spinlocks
  genirq: Convert irq_desc.lock to raw_spinlock
  smp: Convert smplocks to raw_spinlocks
  rtmutes: Convert rtmutex.lock to raw_spinlock
  sched: Convert pi_lock to raw_spinlock
  sched: Convert cpupri lock to raw_spinlock
  sched: Convert rt_runtime_lock to raw_spinlock
  sched: Convert rq->lock to raw_spinlock
  plist: Make plist debugging raw_spinlock aware
  bkl: Fixup core_lock fallout
  locking: Cleanup the name space completely
  locking: Further name space cleanups
  alpha: Fix fallout from locking changes
  locking: Implement new raw_spinlock
  locking: Convert raw_rwlock functions to arch_rwlock
  locking: Convert raw_rwlock to arch_rwlock
  ...
2009-12-15 09:02:01 -08:00
Andres Salomon
c95d1e53ed cs5535: drop the Geode-specific MFGPT/GPIO code
With generic modular drivers handling all of this stuff, the
geode-specific code can go away.  The cs5535-gpio, cs5535-mfgpt, and
cs5535-clockevt drivers now handle this.

Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Chris Ball <cjb@laptop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:28 -08:00
Andres Salomon
f3a57a60d3 cs5535: define lxfb/gxfb MSRs in linux/cs5535.h
..and include them in the lxfb/gxfb drivers rather than asm/geode.h (where
possible).

Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Chris Ball <cjb@laptop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:28 -08:00
Andres Salomon
f060f27007 cs5535: move VSA2 checks into linux/cs5535.h
Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Chris Ball <cjb@laptop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:28 -08:00
Andres Salomon
2e8c12436f cs5535: move the DIVIL MSR definition into linux/cs5535.h
The only thing that uses this is the reboot_fixups code.

Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Chris Ball <cjb@laptop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:28 -08:00
Andres Salomon
82dca611bb cs5535: add a generic MFGPT driver
This is based on the old code on arch/x86/kernel/mfgpt_32.c, except it's
not x86 specific, it's modular, and it makes use of a PCI BAR rather than
a random MSR.  Currently module unloading is not supported; it's uncertain
whether or not it can be made work with the hardware.

[akpm@linux-foundation.org: add X86 dependency]
Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Chris Ball <cjb@laptop.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:28 -08:00
Andres Salomon
3c55494670 ALSA: cs5535audio: free OLPC quirks from reliance on MGEODE_LX cpu optimization
Previously, OLPC support for the mic extensions was only enabled in the
ALSA driver if CONFIG_OLPC and CONFIG_MGEODE_LX were both set.  This was
because the old geode GPIO code was written in a manner that assumed
CONFIG_MGEODE_LX.  With the new cs553x-gpio driver, this is no longer the
case; as such, we can drop the requirement on CONFIG_MGEODE_LX and instead
include a requirement on GPIOLIB.

We use the generic GPIO API rather than the cs553x-specific API.

Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Cc: David Brownell <david-b@pacbell.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:27 -08:00
Andres Salomon
5f0a96b044 cs5535-gpio: add AMD CS5535/CS5536 GPIO driver support
This creates a CS5535/CS5536 GPIO driver which uses a gpio_chip backend
(allowing GPIO users to use the generic GPIO API if desired) while also
allowing architecture-specific users directly (via the cs5535_gpio_*
functions).

Tested on an OLPC machine.  Some Leemotes also use CS5536 (with a mips
cpu), which is why this is in drivers/gpio rather than arch/x86.
Currently, it conflicts with older geode GPIO support; once MFGPT support
is reworked to also be more generic, the older geode code will be removed.

Signed-off-by: Andres Salomon <dilinger@collabora.co.uk>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Jordan Crouse <jordan@cosmicpenguin.net>
Cc: David Brownell <david-b@pacbell.net>
Reviewed-by: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:27 -08:00
Lee Schermerhorn
4e25b2576e hugetlb: add generic definition of NUMA_NO_NODE
Move definition of NUMA_NO_NODE from ia64 and x86_64 arch specific headers
to generic header 'linux/numa.h' for use in generic code.  NUMA_NO_NODE
replaces bare '-1' where it's used in this series to indicate "no node id
specified".  Ultimately, it can be used to replace the -1 elsewhere where
it is used similarly.

Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Andi Kleen <andi@firstfloor.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: Adam Litke <agl@us.ibm.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Eric Whitney <eric.whitney@hp.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-15 08:53:12 -08:00
FUJITA Tomonori
186a25026c x86: Split swiotlb initialization into two stages
The commit f4780ca005 moves
swiotlb initialization before dma32_free_bootmem(). It's
supposed to fix a bug that the commit
75f1cdf1dd introduced, we
initialize SWIOTLB right after dma32_free_bootmem so we wrongly
steal memory area allocated for GART with broken BIOS earlier.

However, the above commit introduced another problem, which
likely breaks machines with huge amount of memory. Such a box
use the majority of DMA32_ZONE so there is no memory for
swiotlb.

With this patch, the x86 IOMMU initialization sequence are:

1. We set swiotlb to 1 in the case of (max_pfn > MAX_DMA32_PFN
   && !no_iommu). If swiotlb usage is forced by the boot option,
   we go to the step 3 and finish (we don't try to detect IOMMUs).

2. We call the detection functions of all the IOMMUs. The
   detection function sets x86_init.iommu.iommu_init to the IOMMU
   initialization function (so we can avoid calling the
   initialization functions of all the IOMMUs needlessly).

3. We initialize swiotlb (and set dma_ops to swiotlb_dma_ops) if
   swiotlb is set to 1.

4. If the IOMMU initialization function doesn't need swiotlb
   (e.g. the initialization is sucessful) then sets swiotlb to zero.

5. If we find that swiotlb is set to zero, we free swiotlb
   resource.

Reported-by: Yinghai Lu <yinghai@kernel.org>
Reported-by: Roland Dreier <rdreier@cisco.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
LKML-Reference: <20091215204729A.fujita.tomonori@lab.ntt.co.jp>
Tested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-15 13:01:57 +01:00
Thomas Gleixner
e5931943d0 locking: Convert raw_rwlock functions to arch_rwlock
Name space cleanup for rwlock functions. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
2009-12-14 23:55:32 +01:00
Thomas Gleixner
fb3a6bbc91 locking: Convert raw_rwlock to arch_rwlock
Not strictly necessary for -rt as -rt does not have non sleeping
rwlocks, but it's odd to not have a consistent naming convention.

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
2009-12-14 23:55:32 +01:00
Thomas Gleixner
0199c4e68d locking: Convert __raw_spin* functions to arch_spin*
Name space cleanup. No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
2009-12-14 23:55:32 +01:00
Thomas Gleixner
edc35bd72e locking: Rename __RAW_SPIN_LOCK_UNLOCKED to __ARCH_SPIN_LOCK_UNLOCKED
Further name space cleanup. No functional change

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
2009-12-14 23:55:32 +01:00
Thomas Gleixner
445c89514b locking: Convert raw_spinlock to arch_spinlock
The raw_spin* namespace was taken by lockdep for the architecture
specific implementations. raw_spin_* would be the ideal name space for
the spinlocks which are not converted to sleeping locks in preempt-rt.

Linus suggested to convert the raw_ to arch_ locks and cleanup the
name space instead of using an artifical name like core_spin,
atomic_spin or whatever

No functional change.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Ingo Molnar <mingo@elte.hu>
Cc: linux-arch@vger.kernel.org
2009-12-14 23:55:32 +01:00
Linus Torvalds
75b08038ce Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, mce: Clean up thermal init by introducing intel_thermal_supported()
  x86, mce: Thermal monitoring depends on APIC being enabled
  x86: Gart: fix breakage due to IOMMU initialization cleanup
  x86: Move swiotlb initialization before dma32_free_bootmem
  x86: Fix build warning in arch/x86/mm/mmio-mod.c
  x86: Remove usedac in feature-removal-schedule.txt
  x86: Fix duplicated UV BAU interrupt vector
  nvram: Fix write beyond end condition; prove to gcc copy is safe
  mm: Adjust do_pages_stat() so gcc can see copy_from_user() is safe
  x86: Limit the number of processor bootup messages
  x86: Remove enabling x2apic message for every CPU
  doc: Add documentation for bootloader_{type,version}
  x86, msr: Add support for non-contiguous cpumasks
  x86: Use find_e820() instead of hard coded trampoline address
  x86, AMD: Fix stale cpuid4_info shared_map data in shared_cpu_map cpumasks

Trivial percpu-naming-introduced conflicts in arch/x86/kernel/cpu/intel_cacheinfo.c
2009-12-14 12:36:46 -08:00
Linus Torvalds
d0316554d3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
  m68k: rename global variable vmalloc_end to m68k_vmalloc_end
  percpu: add missing per_cpu_ptr_to_phys() definition for UP
  percpu: Fix kdump failure if booted with percpu_alloc=page
  percpu: make misc percpu symbols unique
  percpu: make percpu symbols in ia64 unique
  percpu: make percpu symbols in powerpc unique
  percpu: make percpu symbols in x86 unique
  percpu: make percpu symbols in xen unique
  percpu: make percpu symbols in cpufreq unique
  percpu: make percpu symbols in oprofile unique
  percpu: make percpu symbols in tracer unique
  percpu: make percpu symbols under kernel/ and mm/ unique
  percpu: remove some sparse warnings
  percpu: make alloc_percpu() handle array types
  vmalloc: fix use of non-existent percpu variable in put_cpu_var()
  this_cpu: Use this_cpu_xx in trace_functions_graph.c
  this_cpu: Use this_cpu_xx for ftrace
  this_cpu: Use this_cpu_xx in nmi handling
  this_cpu: Use this_cpu operations in RCU
  this_cpu: Use this_cpu ops for VM statistics
  ...

Fix up trivial (famous last words) global per-cpu naming conflicts in
	arch/x86/kvm/svm.c
	mm/slab.c
2009-12-14 09:58:24 -08:00
Cliff Wickman
1d865fb728 x86: Fix duplicated UV BAU interrupt vector
Interrupt vector 0xec has been doubly defined in irq_vectors.h

It seems arbitrary whether LOCAL_PENDING_VECTOR or
UV_BAU_MESSAGE is the higher number.  As long as they are
unique. If they are not unique we'll hit a BUG in
alloc_system_vector().

Signed-off-by: Cliff Wickman <cpw@sgi.com>
Cc: <stable@kernel.org>
LKML-Reference: <E1NJ9Pe-0004P7-0Q@eag09.americas.sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-13 08:17:40 +01:00
Sam Ravnborg
559df2e021 kbuild: move asm-offsets.h to include/generated
The simplest method was to add an extra asm-offsets.h
file in arch/$ARCH/include/asm that references the generated file.

We can now migrate the architectures one-by-one to reference
the generated file direct - and when done we can delete the
temporary arch/$ARCH/include/asm/asm-offsets.h file.

Signed-off-by: Sam Ravnborg <sam@ravnborg.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Michal Marek <mmarek@suse.cz>
2009-12-12 13:08:14 +01:00
Linus Torvalds
756300983f Merge branch 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86/amd-iommu: Fix PCI hotplug with passthrough mode
  x86/amd-iommu: Fix passthrough mode
  x86: mmio-mod.c: Use pr_fmt
  x86: kmmio.c: Add and use pr_fmt(fmt)
  x86: i8254.c: Add pr_fmt(fmt)
  x86: setup_percpu.c: Use pr_<level> and add pr_fmt(fmt)
  x86: es7000_32.c: Use pr_<level> and add pr_fmt(fmt)
  x86: Print DMI_BOARD_NAME as well as DMI_PRODUCT_NAME from __show_regs()
  x86: Factor duplicated code out of __show_regs() into show_regs_common()
  arch/x86/kernel/microcode*: Use pr_fmt() and remove duplicated KERN_ERR prefix
  x86, mce: fix confusion between bank attributes and mce attributes
  x86/mce: Set up timer unconditionally
  x86: Fix bogus warning in apic_noop.apic_write()
  x86: Fix typo in arch/x86/mm/kmmio.c
  x86: ASUS P4S800 reboot=bios quirk
2009-12-11 20:47:59 -08:00
Linus Torvalds
aad3bf04dc Merge git://git.kernel.org/pub/scm/linux/kernel/git/viro/mmap
* git://git.kernel.org/pub/scm/linux/kernel/git/viro/mmap:
  Add missing alignment check in arch/score sys_mmap()
  fix broken aliasing checks for MAP_FIXED on sparc32, mips, arm and sh
  Get rid of open-coding in ia64_brk()
  sparc_brk() is not needed anymore
  switch do_brk() to get_unmapped_area()
  Take arch_mmap_check() into get_unmapped_area()
  fix a struct file leak in do_mmap_pgoff()
  Unify sys_mmap*
  Cut hugetlb case early for 32bit on ia64
  arch_mmap_check() on mn10300
  Kill ancient crap in s390 compat mmap
  arm: add arch_mmap_check(), get rid of sys_arm_mremap()
  file ->get_unmapped_area() shouldn't duplicate work of get_unmapped_area()
  kill useless checks in sparc mremap variants
  fix pgoff in "have to relocate" case of mremap()
  fix the arch checks in MREMAP_FIXED case
  fix checks for expand-in-place mremap
  do_mremap() untangling, part 3
  do_mremap() untangling, part 2
  untangling do_mremap(), part 1
2009-12-11 12:23:29 -08:00
Linus Torvalds
11bd04f6f3 Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6
* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (109 commits)
  PCI: fix coding style issue in pci_save_state()
  PCI: add pci_request_acs
  PCI: fix BUG_ON triggered by logical PCIe root port removal
  PCI: remove ifdefed pci_cleanup_aer_correct_error_status
  PCI: unconditionally clear AER uncorr status register during cleanup
  x86/PCI: claim SR-IOV BARs in pcibios_allocate_resource
  PCI: portdrv: remove redundant definitions
  PCI: portdrv: remove unnecessary struct pcie_port_data
  PCI: portdrv: minor cleanup for pcie_port_device_register
  PCI: portdrv: add missing irq cleanup
  PCI: portdrv: enable device before irq initialization
  PCI: portdrv: cleanup service irqs initialization
  PCI: portdrv: check capabilities first
  PCI: portdrv: move PME capability check
  PCI: portdrv: remove redundant pcie type calculation
  PCI: portdrv: cleanup pcie_device registration
  PCI: portdrv: remove redundant pcie_port_device_probe
  PCI: Always set prefetchable base/limit upper32 registers
  PCI: read-modify-write the pcie device control register when initiating pcie flr
  PCI: show dma_mask bits in /sys
  ...

Fixed up conflicts in:
	arch/x86/kernel/amd_iommu_init.c
	drivers/pci/dmar.c
	drivers/pci/hotplug/acpiphp_glue.c
2009-12-11 12:18:16 -08:00
Borislav Petkov
505422517d x86, msr: Add support for non-contiguous cpumasks
The current rd/wrmsr_on_cpus helpers assume that the supplied
cpumasks are contiguous. However, there are machines out there
like some K8 multinode Opterons which have a non-contiguous core
enumeration on each node (e.g. cores 0,2 on node 0 instead of 0,1), see
http://www.gossamer-threads.com/lists/linux/kernel/1160268.

This patch fixes out-of-bounds writes (see URL above) by adding per-CPU
msr structs which are used on the respective cores.

Additionally, two helpers, msrs_{alloc,free}, are provided for use by
the callers of the MSR accessors.

Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Mauro Carvalho Chehab <mchehab@redhat.com>
Cc: Aristeu Rozanski <aris@redhat.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Doug Thompson <dougthompson@xmission.com>
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <20091211171440.GD31998@aftab>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-11 10:59:21 -08:00
H. Peter Anvin
5c6baba84e Merge commit 'linus/master' into x86/urgent 2009-12-11 10:57:42 -08:00
Al Viro
f8b7256096 Unify sys_mmap*
New helper - sys_mmap_pgoff(); switch syscalls to using it.

Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2009-12-11 06:44:29 -05:00
Yinghai Lu
893f38d144 x86: Use find_e820() instead of hard coded trampoline address
Jens found the following crash/regression:

[    0.000000] found SMP MP-table at [ffff8800000fdd80] fdd80
[    0.000000] Kernel panic - not syncing: Overlapping early reservations 12-f011 MP-table mpc to 0-fff BIOS data page

and

[    0.000000] Kernel panic - not syncing: Overlapping early reservations 12-f011 MP-table mpc to 6000-7fff TRAMPOLINE

and bisected it to b24c2a9 ("x86: Move find_smp_config()
earlier and avoid bootmem usage").

It turns out the BIOS is using the first 64k for mptable,
without reserving it.

So try to find good range for the real-mode trampoline instead of
hard coding it, in case some bios tries to use that range for sth.

Reported-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Tested-by: Jens Axboe <jens.axboe@oracle.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
LKML-Reference: <4B21630A.6000308@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-11 09:28:22 +01:00
Ingo Molnar
9cf7826743 Merge branch 'amd-iommu/fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent 2009-12-10 14:25:48 +01:00
Joerg Roedel
8638c4914f x86/amd-iommu: Fix PCI hotplug with passthrough mode
The device change notifier is initialized in the dma_ops
initialization path. But this path is never executed for
iommu=pt. Move the notifier initialization to IOMMU hardware
init code to fix this.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-12-10 12:23:47 +01:00
Joerg Roedel
b7cc9554bc x86/amd-iommu: Fix passthrough mode
The data structure changes to use dev->archdata.iommu field
broke the iommu=pt mode because in this case the
dev->archdata.iommu was left uninitialized. This moves the
inititalization of the devices into the main init function
and fixes the problem.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-12-10 12:21:31 +01:00
Linus Torvalds
4ef58d4e2a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits)
  tree-wide: fix misspelling of "definition" in comments
  reiserfs: fix misspelling of "journaled"
  doc: Fix a typo in slub.txt.
  inotify: remove superfluous return code check
  hdlc: spelling fix in find_pvc() comment
  doc: fix regulator docs cut-and-pasteism
  mtd: Fix comment in Kconfig
  doc: Fix IRQ chip docs
  tree-wide: fix assorted typos all over the place
  drivers/ata/libata-sff.c: comment spelling fixes
  fix typos/grammos in Documentation/edac.txt
  sysctl: add missing comments
  fs/debugfs/inode.c: fix comment typos
  sgivwfb: Make use of ARRAY_SIZE.
  sky2: fix sky2_link_down copy/paste comment error
  tree-wide: fix typos "couter" -> "counter"
  tree-wide: fix typos "offest" -> "offset"
  fix kerneldoc for set_irq_msi()
  spidev: fix double "of of" in comment
  comment typo fix: sybsystem -> subsystem
  ...
2009-12-09 19:43:33 -08:00
Brian Gerst
f839bbc5c8 x86: Merge sys_clone
Change 32-bit sys_clone to new PTREGSCALL stub, and merge with 64-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-7-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-09 16:29:42 -08:00
Brian Gerst
f1382f157f x86, 32-bit: Convert sys_vm86 & sys_vm86old
Convert these to new PTREGSCALL stubs.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-6-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-09 16:29:23 -08:00
Brian Gerst
052acad48a x86: Merge sys_sigaltstack
Change 32-bit sys_sigaltstack to PTREGSCALL2, and merge with 64-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-5-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-09 16:28:59 -08:00
Brian Gerst
11cf88bd0b x86: Merge sys_execve
Change 32-bit sys_execve to PTREGSCALL3, and merge with 64-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-4-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-09 16:28:34 -08:00
Brian Gerst
27f59559d6 x86: Merge sys_iopl
Change 32-bit sys_iopl to PTREGSCALL1, and merge with 64-bit.

Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1260403316-5679-3-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-09 16:28:10 -08:00
Andy Isaacson
814e2c84a7 x86: Factor duplicated code out of __show_regs() into show_regs_common()
Unify x86_32 and x86_64 implementations of __show_regs() header,
standardizing on the x86_64 format string in the process. Also,
32-bit will now call print_modules.

Signed-off-by: Andy Isaacson <adi@hexapodia.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Robert Hancock <hancockrwd@gmail.com>
Cc: Richard Zidlicky <rz@linux-m68k.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <20091208082942.GA27174@hexapodia.org>
[ v2: resolved conflict ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-09 10:17:58 +01:00
Linus Torvalds
849e8dea09 Merge branch 'timers-for-linus-hpet' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'timers-for-linus-hpet' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: hpet: Make WARN_ON understandable
  x86: arch specific support for remapping HPET MSIs
  intr-remap: generic support for remapping HPET MSIs
  x86, hpet: Simplify the HPET code
  x86, hpet: Disable per-cpu hpet timer if ARAT is supported
2009-12-08 19:26:55 -08:00
Linus Torvalds
e069efb6bb Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  hwrng: core - Prevent too-small buffer sizes
  hwrng: virtio-rng - Convert to new API
  hwrng: core - Replace u32 in driver API with byte array
  crypto: ansi_cprng - Move FIPS functions under CONFIG_CRYPTO_FIPS
  crypto: testmgr - Add ghash algorithm test before provide to users
  crypto: ghash-clmulni-intel - Put proper .data section in place
  crypto: ghash-clmulni-intel - Use gas macro for PCLMULQDQ-NI and PSHUFB
  crypto: aesni-intel - Use gas macro for AES-NI instructions
  x86: Generate .byte code for some new instructions via gas macro
  crypto: ghash-intel - Fix irq_fpu_usable usage
  crypto: ghash-intel - Add PSHUFB macros
  crypto: ghash-intel - Hard-code pshufb
  crypto: ghash-intel - Fix building failure on x86_32
  crypto: testmgr - Fix warning
  crypto: ansi_cprng - Fix test in get_prng_bytes
  crypto: hash - Remove cra_u.{digest,hash}
  crypto: api - Remove digest case from procfs show handler
  crypto: hash - Remove legacy hash/digest code
  crypto: ansi_cprng - Add FIPS wrapper
  crypto: ghash - Add PCLMULQDQ accelerated implementation
2009-12-08 15:55:13 -08:00
Linus Torvalds
4646575daf Merge branch 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-uv-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: UV RTC: Always enable RTC clocksource
  x86: UV RTC: Rename generic_interrupt to x86_platform_ipi
  x86: UV RTC: Clean up error handling
  x86: UV RTC: Add clocksource only boot option
  x86: UV RTC: Fix early expiry handling
2009-12-08 13:38:21 -08:00
Linus Torvalds
e7522ed5c0 Merge branch 'x86-process-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-process-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86-64: merge the standard and compat start_thread() functions
  x86-64: make compat_start_thread() match start_thread()
2009-12-08 13:34:53 -08:00
Linus Torvalds
e33c019722 Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (36 commits)
  x86, mm: Correct the implementation of is_untracked_pat_range()
  x86/pat: Trivial: don't create debugfs for memtype if pat is disabled
  x86, mtrr: Fix sorting of mtrr after subtracting
  x86: Move find_smp_config() earlier and avoid bootmem usage
  x86, platform: Change is_untracked_pat_range() to bool; cleanup init
  x86: Change is_ISA_range() into an inline function
  x86, mm: is_untracked_pat_range() takes a normal semiclosed range
  x86, mm: Call is_untracked_pat_range() rather than is_ISA_range()
  x86: UV SGI: Don't track GRU space in PAT
  x86: SGI UV: Fix BAU initialization
  x86, numa: Use near(er) online node instead of roundrobin for NUMA
  x86, numa, bootmem: Only free bootmem on NUMA failure path
  x86: Change crash kernel to reserve via reserve_early()
  x86: Eliminate redundant/contradicting cache line size config options
  x86: When cleaning MTRRs, do not fold WP into UC
  x86: remove "extern" from function prototypes in <asm/proto.h>
  x86, mm: Report state of NX protections during boot
  x86, mm: Clean up and simplify NX enablement
  x86, pageattr: Make set_memory_(x|nx) aware of NX support
  x86, sleep: Always save the value of EFER
  ...

Fix up conflicts (added both iommu_shutdown and is_untracked_pat_range)
to 'struct x86_platform_ops') in
	arch/x86/include/asm/x86_init.h
	arch/x86/kernel/x86_init.c
2009-12-08 13:27:33 -08:00
Linus Torvalds
343036cea2 Merge branch 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-microcode-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86: ucode-amd: Move family check to microcde_amd.c's init function
  x86, ucode-amd: Ensure ucode update on suspend/resume after CPU off/online cycle
  x86: ucode-amd: Convert printk(KERN_*...) to pr_*(...)
  x86: ucode-amd: Don't warn when no ucode is available for a CPU revision
  x86: ucode-amd: Load ucode-patches once and not separately of each CPU
  x86, amd-ucode: Remove needless log messages
2009-12-08 13:25:10 -08:00
Linus Torvalds
6035ccd8e9 Merge branch 'for-2.6.33' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.33' of git://git.kernel.dk/linux-2.6-block: (113 commits)
  cfq-iosched: Do not access cfqq after freeing it
  block: include linux/err.h to use ERR_PTR
  cfq-iosched: use call_rcu() instead of doing grace period stall on queue exit
  blkio: Allow CFQ group IO scheduling even when CFQ is a module
  blkio: Implement dynamic io controlling policy registration
  blkio: Export some symbols from blkio as its user CFQ can be a module
  block: Fix io_context leak after failure of clone with CLONE_IO
  block: Fix io_context leak after clone with CLONE_IO
  cfq-iosched: make nonrot check logic consistent
  io controller: quick fix for blk-cgroup and modular CFQ
  cfq-iosched: move IO controller declerations to a header file
  cfq-iosched: fix compile problem with !CONFIG_CGROUP
  blkio: Documentation
  blkio: Wait on sync-noidle queue even if rq_noidle = 1
  blkio: Implement group_isolation tunable
  blkio: Determine async workload length based on total number of queues
  blkio: Wait for cfq queue to get backlogged if group is empty
  blkio: Propagate cgroup weight updation to cfq groups
  blkio: Drop the reference to queue once the task changes cgroup
  blkio: Provide some isolation between groups
  ...
2009-12-08 08:19:16 -08:00
Linus Torvalds
ed9216c171 Merge branch 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm
* 'kvm-updates/2.6.33' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (84 commits)
  KVM: VMX: Fix comparison of guest efer with stale host value
  KVM: s390: Fix prefix register checking in arch/s390/kvm/sigp.c
  KVM: Drop user return notifier when disabling virtualization on a cpu
  KVM: VMX: Disable unrestricted guest when EPT disabled
  KVM: x86 emulator: limit instructions to 15 bytes
  KVM: s390: Make psw available on all exits, not just a subset
  KVM: x86: Add KVM_GET/SET_VCPU_EVENTS
  KVM: VMX: Report unexpected simultaneous exceptions as internal errors
  KVM: Allow internal errors reported to userspace to carry extra data
  KVM: Reorder IOCTLs in main kvm.h
  KVM: x86: Polish exception injection via KVM_SET_GUEST_DEBUG
  KVM: only clear irq_source_id if irqchip is present
  KVM: x86: disallow KVM_{SET,GET}_LAPIC without allocated in-kernel lapic
  KVM: x86: disallow multiple KVM_CREATE_IRQCHIP
  KVM: VMX: Remove vmx->msr_offset_efer
  KVM: MMU: update invlpg handler comment
  KVM: VMX: move CR3/PDPTR update to vmx_set_cr3
  KVM: remove duplicated task_switch check
  KVM: powerpc: Fix BUILD_BUG_ON condition
  KVM: VMX: Use shared msr infrastructure
  ...

Trivial conflicts due to new Kconfig options in arch/Kconfig and kernel/Makefile
2009-12-08 08:02:38 -08:00
Linus Torvalds
d7fc02c7ba Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1815 commits)
  mac80211: fix reorder buffer release
  iwmc3200wifi: Enable wimax core through module parameter
  iwmc3200wifi: Add wifi-wimax coexistence mode as a module parameter
  iwmc3200wifi: Coex table command does not expect a response
  iwmc3200wifi: Update wiwi priority table
  iwlwifi: driver version track kernel version
  iwlwifi: indicate uCode type when fail dump error/event log
  iwl3945: remove duplicated event logging code
  b43: fix two warnings
  ipw2100: fix rebooting hang with driver loaded
  cfg80211: indent regulatory messages with spaces
  iwmc3200wifi: fix NULL pointer dereference in pmkid update
  mac80211: Fix TX status reporting for injected data frames
  ath9k: enable 2GHz band only if the device supports it
  airo: Fix integer overflow warning
  rt2x00: Fix padding bug on L2PAD devices.
  WE: Fix set events not propagated
  b43legacy: avoid PPC fault during resume
  b43: avoid PPC fault during resume
  tcp: fix a timewait refcnt race
  ...

Fix up conflicts due to sysctl cleanups (dead sysctl_check code and
CTL_UNNUMBERED removed) in
	kernel/sysctl_check.c
	net/ipv4/sysctl_net_ipv4.c
	net/ipv6/addrconf.c
	net/sctp/sysctl.c
2009-12-08 07:55:01 -08:00
Linus Torvalds
1557d33007 Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl-2.6: (43 commits)
  security/tomoyo: Remove now unnecessary handling of security_sysctl.
  security/tomoyo: Add a special case to handle accesses through the internal proc mount.
  sysctl: Drop & in front of every proc_handler.
  sysctl: Remove CTL_NONE and CTL_UNNUMBERED
  sysctl: kill dead ctl_handler definitions.
  sysctl: Remove the last of the generic binary sysctl support
  sysctl net: Remove unused binary sysctl code
  sysctl security/tomoyo: Don't look at ctl_name
  sysctl arm: Remove binary sysctl support
  sysctl x86: Remove dead binary sysctl support
  sysctl sh: Remove dead binary sysctl support
  sysctl powerpc: Remove dead binary sysctl support
  sysctl ia64: Remove dead binary sysctl support
  sysctl s390: Remove dead sysctl binary support
  sysctl frv: Remove dead binary sysctl support
  sysctl mips/lasat: Remove dead binary sysctl support
  sysctl drivers: Remove dead binary sysctl support
  sysctl crypto: Remove dead binary sysctl support
  sysctl security/keys: Remove dead binary sysctl support
  sysctl kernel: Remove binary sysctl logic
  ...
2009-12-08 07:38:50 -08:00
Jiri Kosina
d014d04386 Merge branch 'for-next' into for-linus
Conflicts:

	kernel/irq/chip.c
2009-12-07 18:36:35 +01:00
Linus Torvalds
83be7d764d Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  x86, msr, cpumask: Use struct cpumask rather than the deprecated cpumask_t
  x86, cpuid: Simplify the code in cpuid_open
  x86, cpuid: Remove the bkl from cpuid_open()
  x86, msr: Remove the bkl from msr_open()
  x86: AMD Geode LX optimizations
  x86, msr: Unify rdmsr_on_cpus/wrmsr_on_cpus
2009-12-05 15:32:35 -08:00
Linus Torvalds
ef26b1691d Merge branch 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
  include/linux/compiler-gcc4.h: Fix build bug - gcc-4.0.2 doesn't understand __builtin_object_size
  x86/alternatives: No need for alternatives-asm.h to re-invent stuff already in asm.h
  x86/alternatives: Check replacementlen <= instrlen at build time
  x86, 64-bit: Set data segments to null after switching to 64-bit mode
  x86: Clean up the loadsegment() macro
  x86: Optimize loadsegment()
  x86: Add missing might_fault() checks to copy_{to,from}_user()
  x86-64: __copy_from_user_inatomic() adjustments
  x86: Remove unused thread_return label from switch_to()
  x86, 64-bit: Fix bstep_iret jump
  x86: Don't use the strict copy checks when branch profiling is in use
  x86, 64-bit: Move K8 B step iret fixup to fault entry asm
  x86: Generate cmpxchg build failures
  x86: Add a Kconfig option to turn the copy_from_user warnings into errors
  x86: Turn the copy_from_user check into an (optional) compile time warning
  x86: Use __builtin_memset and __builtin_memcpy for memset/memcpy
  x86: Use __builtin_object_size() to validate the buffer size for copy_from_user()
2009-12-05 15:32:03 -08:00
Linus Torvalds
a77d2e081b Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (30 commits)
  x86, apic: Enable lapic nmi watchdog on AMD Family 11h
  x86: Remove unnecessary mdelay() from cpu_disable_common()
  x86, ioapic: Document another case when level irq is seen as an edge
  x86, ioapic: Fix the EOI register detection mechanism
  x86, io-apic: Move the effort of clearing remoteIRR explicitly before migrating the irq
  x86: SGI UV: Map low MMR ranges
  x86: apic: Print out SRAT table APIC id in hex
  x86: Re-get cfg_new in case reuse/move irq_desc
  x86: apic: Remove not needed #ifdef
  x86: io-apic: IO-APIC MMIO should not fail on resource insertion
  x86: Remove asm/apicnum.h
  x86: apic: Do not use stacked physid_mask_t
  x86, apic: Get rid of apicid_to_cpu_present assign on 64-bit
  x86, ioapic: Use snrpintf while set names for IO-APIC resourses
  x86, apic: Use PAGE_SIZE instead of numbers
  x86: Remove local_irq_enable()/local_irq_disable() in fixup_irqs()
  x86: Use EOI register in io-apic on intel platforms
  x86: Force irq complete move during cpu offline
  x86: Remove move_cleanup_count from irq_cfg
  x86, intr-remap: Avoid irq_chip mask/unmask in fixup_irqs() for intr-remapping
  ...
2009-12-05 15:31:25 -08:00
Linus Torvalds
c3fa27d136 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (470 commits)
  x86: Fix comments of register/stack access functions
  perf tools: Replace %m with %a in sscanf
  hw-breakpoints: Keep track of user disabled breakpoints
  tracing/syscalls: Make syscall events print callbacks static
  tracing: Add DEFINE_EVENT(), DEFINE_SINGLE_EVENT() support to docbook
  perf: Don't free perf_mmap_data until work has been done
  perf_event: Fix compile error
  perf tools: Fix _GNU_SOURCE macro related strndup() build error
  trace_syscalls: Remove unused syscall_name_to_nr()
  trace_syscalls: Simplify syscall profile
  trace_syscalls: Remove duplicate init_enter_##sname()
  trace_syscalls: Add syscall_nr field to struct syscall_metadata
  trace_syscalls: Remove enter_id exit_id
  trace_syscalls: Set event_enter_##sname->data to its metadata
  trace_syscalls: Remove unused event_syscall_enter and event_syscall_exit
  perf_event: Initialize data.period in perf_swevent_hrtimer()
  perf probe: Simplify event naming
  perf probe: Add --list option for listing current probe events
  perf probe: Add argv_split() from lib/argv_split.c
  perf probe: Move probe event utility functions to probe-event.c
  ...
2009-12-05 15:30:21 -08:00
David S. Miller
28b4d5cc17 Merge branch 'master' of /home/davem/src/GIT/linux-2.6/
Conflicts:
	drivers/net/pcmcia/fmvj18x_cs.c
	drivers/net/pcmcia/nmclan_cs.c
	drivers/net/pcmcia/xirc2ps_cs.c
	drivers/net/wireless/ray_cs.c
2009-12-05 15:22:26 -08:00
Linus Torvalds
7b626acb8f Merge branch 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'core-iommu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (63 commits)
  x86, Calgary IOMMU quirk: Find nearest matching Calgary while walking up the PCI tree
  x86/amd-iommu: Remove amd_iommu_pd_table
  x86/amd-iommu: Move reset_iommu_command_buffer out of locked code
  x86/amd-iommu: Cleanup DTE flushing code
  x86/amd-iommu: Introduce iommu_flush_device() function
  x86/amd-iommu: Cleanup attach/detach_device code
  x86/amd-iommu: Keep devices per domain in a list
  x86/amd-iommu: Add device bind reference counting
  x86/amd-iommu: Use dev->arch->iommu to store iommu related information
  x86/amd-iommu: Remove support for domain sharing
  x86/amd-iommu: Rearrange dma_ops related functions
  x86/amd-iommu: Move some pte allocation functions in the right section
  x86/amd-iommu: Remove iommu parameter from dma_ops_domain_alloc
  x86/amd-iommu: Use get_device_id and check_device where appropriate
  x86/amd-iommu: Move find_protection_domain to helper functions
  x86/amd-iommu: Simplify get_device_resources()
  x86/amd-iommu: Let domain_for_device handle aliases
  x86/amd-iommu: Remove iommu specific handling from dma_ops path
  x86/amd-iommu: Remove iommu parameter from __(un)map_single
  x86/amd-iommu: Make alloc_new_range aware of multiple IOMMUs
  ...
2009-12-05 09:49:07 -08:00
David Daney
a5fc5eba4d x86: Convert BUG() to use unreachable()
Use the new unreachable() macro instead of for(;;);.  When
allyesconfig is built with a GCC-4.5 snapshot on i686 the size of the
text segment is reduced by 3987 bytes (from 6827019 to 6823032).

Signed-off-by: David Daney <ddaney@caviumnetworks.com>
Acked-by: "H. Peter Anvin" <hpa@zytor.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ingo Molnar <mingo@redhat.com>
CC: x86@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2009-12-05 09:10:12 -08:00
Adam Buchbinder
6070d81eb5 tree-wide: fix misspelling of "definition" in comments
"Definition" is misspelled "defintion" in several comments; this
patch fixes them. No code changes.

Signed-off-by: Adam Buchbinder <adam.buchbinder@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-12-04 23:41:47 +01:00
André Goddard Rosa
af901ca181 tree-wide: fix assorted typos all over the place
That is "success", "unknown", "through", "performance", "[re|un]mapping"
, "access", "default", "reasonable", "[con]currently", "temperature"
, "channel", "[un]used", "application", "example","hierarchy", "therefore"
, "[over|under]flow", "contiguous", "threshold", "enough" and others.

Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2009-12-04 15:39:55 +01:00
Ingo Molnar
26fb20d008 Merge branch 'perf/mce' into perf/core
Merge reason: It's ready for v2.6.33.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-03 20:11:06 +01:00
Jens Axboe
220d0b1dbf Merge branch 'master' into for-2.6.33 2009-12-03 13:49:39 +01:00
Avi Kivity
d5696725b2 KVM: VMX: Fix comparison of guest efer with stale host value
update_transition_efer() masks out some efer bits when deciding whether
to switch the msr during guest entry; for example, NX is emulated using the
mmu so we don't need to disable it, and LMA/LME are handled by the hardware.

However, with shared msrs, the comparison is made against a stale value;
at the time of the guest switch we may be running with another guest's efer.

Fix by deferring the mask/compare to the actual point of guest entry.

Noted by Marcelo.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:34:20 +02:00
Avi Kivity
eb3c79e64a KVM: x86 emulator: limit instructions to 15 bytes
While we are never normally passed an instruction that exceeds 15 bytes,
smp games can cause us to attempt to interpret one, which will cause
large latencies in non-preempt hosts.

Cc: stable@kernel.org
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:25 +02:00
Jan Kiszka
3cfc3092f4 KVM: x86: Add KVM_GET/SET_VCPU_EVENTS
This new IOCTL exports all yet user-invisible states related to
exceptions, interrupts, and NMIs. Together with appropriate user space
changes, this fixes sporadic problems of vmsave/restore, live migration
and system reset.

[avi: future-proof abi by adding a flags field]

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:25 +02:00
Avi Kivity
18863bdd60 KVM: x86 shared msr infrastructure
The various syscall-related MSRs are fairly expensive to switch.  Currently
we switch them on every vcpu preemption, which is far too often:

- if we're switching to a kernel thread (idle task, threaded interrupt,
  kernel-mode virtio server (vhost-net), for example) and back, then
  there's no need to switch those MSRs since kernel threasd won't
  be exiting to userspace.

- if we're switching to another guest running an identical OS, most likely
  those MSRs will have the same value, so there's little point in reloading
  them.

- if we're running the same OS on the guest and host, the MSRs will have
  identical values and reloading is unnecessary.

This patch uses the new user return notifiers to implement last-minute
switching, and checks the msr values to avoid unnecessary reloading.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:21 +02:00
Glauber Costa
afbcf7ab8d KVM: allow userspace to adjust kvmclock offset
When we migrate a kvm guest that uses pvclock between two hosts, we may
suffer a large skew. This is because there can be significant differences
between the monotonic clock of the hosts involved. When a new host with
a much larger monotonic time starts running the guest, the view of time
will be significantly impacted.

Situation is much worse when we do the opposite, and migrate to a host with
a smaller monotonic clock.

This proposed ioctl will allow userspace to inform us what is the monotonic
clock value in the source host, so we can keep the time skew short, and
more importantly, never goes backwards. Userspace may also need to trigger
the current data, since from the first migration onwards, it won't be
reflected by a simple call to clock_gettime() anymore.

[marcelo: future-proof abi with a flags field]
[jan: fix KVM_GET_CLOCK by clearing flags field instead of checking it]

Signed-off-by: Glauber Costa <glommer@redhat.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:19 +02:00
Jan Kiszka
6be7d3062b KVM: SVM: Cleanup NMI singlestep
Push the NMI-related singlestep variable into vcpu_svm. It's dealing
with an AMD-specific deficit, nothing generic for x86.

Acked-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>

 arch/x86/include/asm/kvm_host.h |    1 -
 arch/x86/kvm/svm.c              |   12 +++++++-----
 2 files changed, 7 insertions(+), 6 deletions(-)
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:19 +02:00
Jan Kiszka
94fe45da48 KVM: x86: Fix guest single-stepping while interruptible
Commit 705c5323 opened the doors of hell by unconditionally injecting
single-step flags as long as guest_debug signaled this. This doesn't
work when the guest branches into some interrupt or exception handler
and triggers a vmexit with flag reloading.

Fix it by saving cs:rip when user space requests single-stepping and
restricting the trace flag injection to this guest code position.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:19 +02:00
Ed Swierk
ffde22ac53 KVM: Xen PV-on-HVM guest support
Support for Xen PV-on-HVM guests can be implemented almost entirely in
userspace, except for handling one annoying MSR that maps a Xen
hypercall blob into guest address space.

A generic mechanism to delegate MSR writes to userspace seems overkill
and risks encouraging similar MSR abuse in the future.  Thus this patch
adds special support for the Xen HVM MSR.

I implemented a new ioctl, KVM_XEN_HVM_CONFIG, that lets userspace tell
KVM which MSR the guest will write to, as well as the starting address
and size of the hypercall blobs (one each for 32-bit and 64-bit) that
userspace has loaded from files.  When the guest writes to the MSR, KVM
copies one page of the blob from userspace to the guest.

I've tested this patch with a hacked-up version of Gerd's userspace
code, booting a number of guests (CentOS 5.3 i386 and x86_64, and
FreeBSD 8.0-RC1 amd64) and exercising PV network and block devices.

[jan: fix i386 build warning]
[avi: future proof abi with a flags field]

Signed-off-by: Ed Swierk <eswierk@aristanetworks.com>
Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:18 +02:00
Mark Langsdorf
565d0998ec KVM: SVM: Support Pause Filter in AMD processors
New AMD processors (Family 0x10 models 8+) support the Pause
Filter Feature.  This feature creates a new field in the VMCB
called Pause Filter Count.  If Pause Filter Count is greater
than 0 and intercepting PAUSEs is enabled, the processor will
increment an internal counter when a PAUSE instruction occurs
instead of intercepting.  When the internal counter reaches the
Pause Filter Count value, a PAUSE intercept will occur.

This feature can be used to detect contended spinlocks,
especially when the lock holding VCPU is not scheduled.
Rescheduling another VCPU prevents the VCPU seeking the
lock from wasting its quantum by spinning idly.

Experimental results show that most spinlocks are held
for less than 1000 PAUSE cycles or more than a few
thousand.  Default the Pause Filter Counter to 3000 to
detect the contended spinlocks.

Processor support for this feature is indicated by a CPUID
bit.

On a 24 core system running 4 guests each with 16 VCPUs,
this patch improved overall performance of each guest's
32 job kernbench by approximately 3-5% when combined
with a scheduler algorithm thati caused the VCPU to
sleep for a brief period. Further performance improvement
may be possible with a more sophisticated yield algorithm.

Signed-off-by: Mark Langsdorf <mark.langsdorf@amd.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:17 +02:00
Zhai, Edwin
4b8d54f972 KVM: VMX: Add support for Pause-Loop Exiting
New NHM processors will support Pause-Loop Exiting by adding 2 VM-execution
control fields:
PLE_Gap    - upper bound on the amount of time between two successive
             executions of PAUSE in a loop.
PLE_Window - upper bound on the amount of time a guest is allowed to execute in
             a PAUSE loop

If the time, between this execution of PAUSE and previous one, exceeds the
PLE_Gap, processor consider this PAUSE belongs to a new loop.
Otherwise, processor determins the the total execution time of this loop(since
1st PAUSE in this loop), and triggers a VM exit if total time exceeds the
PLE_Window.
* Refer SDM volume 3b section 21.6.13 & 22.1.3.

Pause-Loop Exiting can be used to detect Lock-Holder Preemption, where one VP
is sched-out after hold a spinlock, then other VPs for same lock are sched-in
to waste the CPU time.

Our tests indicate that most spinlocks are held for less than 212 cycles.
Performance tests show that with 2X LP over-commitment we can get +2% perf
improvement for kernel build(Even more perf gain with more LPs).

Signed-off-by: Zhai Edwin <edwin.zhai@intel.com>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
2009-12-03 09:32:17 +02:00
Jan Kiszka
91586a3b7d KVM: x86: Rework guest single-step flag injection and filtering
Push TF and RF injection and filtering on guest single-stepping into the
vender get/set_rflags callbacks. This makes the whole mechanism more
robust wrt user space IOCTL order and instruction emulations.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:14 +02:00
Jan Kiszka
355be0b930 KVM: x86: Refactor guest debug IOCTL handling
Much of so far vendor-specific code for setting up guest debug can
actually be handled by the generic code. This also fixes a minor deficit
in the SVM part /wrt processing KVM_GUESTDBG_ENABLE.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:14 +02:00
Alexander Graf
10474ae894 KVM: Activate Virtualization On Demand
X86 CPUs need to have some magic happening to enable the virtualization
extensions on them. This magic can result in unpleasant results for
users, like blocking other VMMs from working (vmx) or using invalid TLB
entries (svm).

Currently KVM activates virtualization when the respective kernel module
is loaded. This blocks us from autoloading KVM modules without breaking
other VMMs.

To circumvent this problem at least a bit, this patch introduces on
demand activation of virtualization. This means, that instead
virtualization is enabled on creation of the first virtual machine
and disabled on destruction of the last one.

So using this, KVM can be easily autoloaded, while keeping other
hypervisors usable.

Signed-off-by: Alexander Graf <agraf@suse.de>
Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:10 +02:00
Gleb Natapov
136bdfeee7 KVM: Move irq ack notifier list to arch independent code
Mask irq notifier list is already there.

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:07 +02:00
Gleb Natapov
3e71f88bc9 KVM: Maintain back mapping from irqchip/pin to gsi
Maintain back mapping from irqchip/pin to gsi to speedup
interrupt acknowledgment notifications.

[avi: build fix on non-x86/ia64]

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:07 +02:00
Gleb Natapov
1a6e4a8c27 KVM: Move irq sharing information to irqchip level
This removes assumptions that max GSIs is smaller than number of pins.
Sharing is tracked on pin level not GSI level.

[avi: no PIC on ia64]

Signed-off-by: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:06 +02:00
Avi Kivity
851ba6922a KVM: Don't pass kvm_run arguments
They're just copies of vcpu->run, which is readily accessible.

Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:32:06 +02:00
Avi Kivity
58988b07cf Merge remote branch 'tip/x86/entry' into kvm-updates/2.6.33
Signed-off-by: Avi Kivity <avi@redhat.com>
2009-12-03 09:30:06 +02:00
Jan Beulich
99063c0bce x86/alternatives: No need for alternatives-asm.h to re-invent stuff already in asm.h
This at once also gets the alignment specification right for
x86-64.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4B0FF8F80200007800022708@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-02 11:39:45 +01:00
Jan Beulich
01be50a308 x86/alternatives: Check replacementlen <= instrlen at build time
Having run into the run-(boot-)time check a couple of times lately,
I finally took time to find a build-time check so that one doesn't
need to analyze the register/stack dump and resolve this (through
manual lookup in vmlinux) to the offending construct.

The assembler will emit a message like "Error: value of <num> too
large for field of 1 bytes at <offset>", which while not pointing
out the source location still makes analysis quite a bit easier.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
LKML-Reference: <4B0FF8AA0200007800022703@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-02 11:39:45 +01:00
Masami Hiramatsu
e859cf8656 x86: Fix comments of register/stack access functions
Fix typos and some redundant comments of register/stack access
functions in asm/ptrace.h.

Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com>
Cc: systemtap <systemtap@sources.redhat.com>
Cc: DLE <dle-develop@lists.sourceforge.net>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Roland McGrath <roland@redhat.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Wenji Huang <wenji.huang@oracle.com>
Cc: Mahesh J Salgaonkar <mahesh@linux.vnet.ibm.com>
LKML-Reference: <20091201000222.7669.7477.stgit@harusame>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Suggested-by: Wenji Huang <wenji.huang@oracle.com>
2009-12-02 10:22:22 +01:00
Herbert Xu
8386324381 Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 2009-12-01 15:16:22 +08:00
H. Peter Anvin
ccef086454 x86, mm: Correct the implementation of is_untracked_pat_range()
The semantics the PAT code expect of is_untracked_pat_range() is "is
this range completely contained inside the untracked region."  This
means that checkin 8a27138924 was
technically wrong, because the implementation needlessly confusing.

The sane interface is for it to take a semiclosed range like just
about everything else (as evidenced by the sheer number of "- 1"'s
removed by that patch) so change the actual implementation to match.

Reported-by: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jack Steiner <steiner@sgi.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
LKML-Reference: <20091119202341.GA4420@sgi.com>
2009-11-30 21:33:51 -08:00
Joerg Roedel
492667dacc x86/amd-iommu: Remove amd_iommu_pd_table
The data that was stored in this table is now available in
dev->archdata.iommu. So this table is not longer necessary.
This patch removes the remaining uses of that variable and
removes it from the code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:37 +01:00
Joerg Roedel
b00d3bcff4 x86/amd-iommu: Cleanup DTE flushing code
This patch cleans up the code to flush device table entries
in the IOMMU. With this chance the driver can get rid of the
iommu_queue_inv_dev_entry() function.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:36 +01:00
Joerg Roedel
7c392cbe98 x86/amd-iommu: Keep devices per domain in a list
This patch introduces a list to each protection domain which
keeps all devices associated with the domain. This can be
used later to optimize certain functions and to completly
remove the amd_iommu_pd_table.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:34 +01:00
Joerg Roedel
241000556f x86/amd-iommu: Add device bind reference counting
This patch adds a reference count to each device to count
how often the device was bound to that domain. This is
important for single devices that act as an alias for a
number of others. These devices must stay bound to their
domains until all devices that alias to it are unbound from
the same domain.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:33 +01:00
Joerg Roedel
657cbb6b6c x86/amd-iommu: Use dev->arch->iommu to store iommu related information
This patch changes IOMMU code to use dev->archdata->iommu to
store information about the alias device and the domain the
device is attached to.
This allows the driver to get rid of the amd_iommu_pd_table
in the future.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:32 +01:00
Joerg Roedel
8793abeb78 x86/amd-iommu: Remove support for domain sharing
This patch makes device isolation mandatory and removes
support for the amd_iommu=share option. This simplifies the
code in several places.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:20:32 +01:00
Joerg Roedel
318afd41d2 x86/amd-iommu: Make np-cache a global flag
The non-present cache flag was IOMMU local until now which
doesn't make sense. Make this a global flag so we can remove
the lase user of 'struct iommu' in the map/unmap path.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:16:29 +01:00
Joerg Roedel
aeb26f5533 x86/amd-iommu: Implement protection domain list
This patch adds code to keep a global list of all protection
domains. This allows to simplify the resume code.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 14:16:27 +01:00
Joerg Roedel
c459611424 x86/amd-iommu: Add per IOMMU reference counting
This patch adds reference counting for protection domains
per IOMMU. This allows a smarter TLB flushing strategy.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2009-11-27 11:45:50 +01:00